How to remove HTML special characters?

97

Either decode them using html_entity_decode or remove them using preg_replace:

$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content);

(from here)
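The two options in this answer behave differently on non-breaking spaces, which is the point heximal raises in a comment further down. A minimal sketch, assuming UTF-8 input and a made-up sample string, runs both options side by side:

```php
<?php
// Hypothetical sample: two &nbsp; entities around an encoded ampersand
$content = "Tom&nbsp;&amp;&nbsp;Jerry";

// Option 1: delete every entity outright (the regex from this answer)
$removed = preg_replace("/&#?[a-z0-9]+;/i", "", $content);

// Option 2: decode every entity into the character it stands for
$decoded = html_entity_decode($content, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $removed, "\n"; // TomJerry
echo $decoded, "\n"; // Tom & Jerry, where both spaces are U+00A0 (non-breaking)
```

Removing the entities glues the words together, while decoding keeps a (non-breaking) space between them.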

EDIT: Alternative based on Jacco's comment

it might be nice to replace the '+' with {2,8} or something similar. That limits the risk of replacing whole sentences when an unencoded '&' is present.

$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);
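To see what the {2,8} bound changes, here is a small comparison on a made-up string containing one real entity and a bare '&' followed by a long word and a ';' (the input is hypothetical):

```php
<?php
// Hypothetical input: "&amp;" is a real entity, "&notreallyanentity;" is not
$content = "Fish &amp; chips &notreallyanentity; done";

// Unbounded: any run of letters/digits between '&' and ';' is removed
$greedy = preg_replace("/&#?[a-z0-9]+;/i", "", $content);

// Bounded: only 2 to 8 characters between '&' and ';' are removed
$bounded = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $content);

echo $greedy, "\n";  // Fish  chips  done
echo $bounded, "\n"; // Fish  chips &notreallyanentity; done
```

The bound is a heuristic: it keeps text between a stray '&' and a distant ';', at the cost of missing any real entity longer than 8 characters.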


2009-03-18 10:16:58 schnaader

+3

it might be nice to replace the '+' with '{2,8] or something like that. That limits the risk of replacing whole sentences when an unencoded '&' is present. – Jacco

+0

Thanks, I added your comment and an alternative version to the answer. – schnaader

+0

I made a typo: it should be {2,8}, sorry about that – Jacco

18

Use html_entity_decode to convert HTML entities.

You need to pass the correct character set (its third argument) for it to work correctly.
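A minimal sketch of passing the flags and character set explicitly. The input string is made up, and ENT_HTML5 is an assumption chosen so that the HTML5 named-entity list is recognised:

```php
<?php
// Hypothetical input mixing named and numeric entities
$encoded = "caf&eacute; &amp; cr&egrave;me &#8364;5";

// Third argument: the character set of the decoded output
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded; // café & crème €5
```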


2009-03-18 10:15:19 andi

+1

this is more correct, because when we simply replace entities with an empty string we get a wrong result: all the non-breaking spaces collapse – heximal

7

Have a look at htmlentities() and html_entity_decode() here

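$orig = "I'll \"walk\" the <b>dog</b> now";
// ENT_COMPAT (double quotes only) was the default before PHP 8.1 and matches the outputs below
$a = htmlentities($orig, ENT_COMPAT);
$b = html_entity_decode($a, ENT_COMPAT);
echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
echo $b; // I'll "walk" the <b>dog</b> now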


2009-03-18 10:16:14 0xFF

+0

I want to remove those HTML special-character codes – Prashant

+0

this html_entity_decode($a); does the trick – 0xFF

1

A plain-vanilla way to do it with string functions, without involving the preg regex engine:

function remEntities($str) {
    // Find the first '&'
    $amp_pos = strpos($str, '&');
    while ($amp_pos !== false) {
        // Find the first ';' that comes after this '&'
        $semi_pos = strpos($str, ';', $amp_pos);
        if ($semi_pos === false) {
            break; // no closing ';', so there is nothing left to remove
        }
        // Treat '&...;' as an HTML entity and cut it out
        $str = substr($str, 0, $amp_pos) . substr($str, $semi_pos + 1);
        // Has another entity in it? Keep searching from the same position
        $amp_pos = strpos($str, '&', $amp_pos);
    }
    return $str;
}


2009-03-18 11:19:50 karim79

1

It looks like what you really want is:

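function xmlEntities($string) {
    $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
    $from = array();
    $to = array();
    foreach ($translationTable as $char => $entity) {
        $from[] = $entity;
        // mb_ord() returns the full code point; ord() would read only the first byte of a multi-byte UTF-8 character
        $to[] = '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return str_replace($from, $to, $string);
}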

This replaces named entities with their equivalent numeric entities.


2009-03-18 16:21:55 Jacco

1

<?php
function strip_only($str, $tags, $stripContent = false) {
    $content = '';
    if (!is_array($tags)) {
        // Accept a single tag name ('font') or a tag list ('<font><b>');
        // it is $tags, not $str, that must be checked for '>'
        $tags = (strpos($tags, '>') !== false
            ? explode('>', str_replace('<', '', $tags))
            : array($tags));
        if (end($tags) == '') array_pop($tags);
    }
    foreach ($tags as $tag) {
        // Optionally also remove everything up to the closing tag
        if ($stripContent)
            $content = '(.+</'.$tag.'[^>]*>|)';
        $str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
    }
    return $str;
}

$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
?>


2010-07-10 11:43:18 jahanzaib

1

The function I used for this, incorporating schnaader's update, is:

mysql_real_escape_string(
     preg_replace_callback("/&#?[a-z0-9]+;/i", function($m) {
      // $m[0] is the whole matched entity; the pattern has no capture group, so $m[1] does not exist
      return mb_convert_encoding($m[0], "UTF-8", "HTML-ENTITIES");
     }, strip_tags($row['cuerpo'])))

This strips every HTML tag and converts each HTML entity into a UTF-8 character, ready to be saved in MySQL.
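Since a later comment mentions an RSS feed rather than a database, here is a minimal sketch of the same per-match callback idea without the MySQL escaping step. It swaps mb_convert_encoding for html_entity_decode inside the callback (mbstring's HTML-ENTITIES mode is deprecated as of PHP 8.2), and the function name plainTextForFeed is hypothetical:

```php
<?php
// Hypothetical helper: strip tags, then decode each entity match individually
function plainTextForFeed(string $html): string
{
    $text = strip_tags($html);
    return preg_replace_callback(
        "/&#?[a-z0-9]+;/i",
        function (array $m): string {
            // $m[0] is the whole match, e.g. "&eacute;"
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $text
    );
}

echo plainTextForFeed("<p>Caf&eacute; &amp; bar</p>"); // Café & bar
```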


2011-07-14 15:08:37 Lalala

+0

I'm creating an RSS feed, not saving to SQL –

16

In addition to the good answers above, PHP also has a rather useful built-in filter function: filter_var.

To strip HTML tags (this filter also encodes quotes), use:

$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING); // FILTER_SANITIZE_STRING is deprecated as of PHP 8.1

More information:


2012-02-16 16:59:55 gpkamp

+2

w3schools is not a good reference site, see http://www.w3fools.com – Skuld

+1

I know the thread is a bit old, but I'm trying to solve the same problem. Unfortunately filter_var requires PHP 5.2 or newer... Otherwise this would be the answer (at least to my specific problem). Thanks. – ChronoFish

4

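This could work well for removing special characters (everything except letters, digits, underscores, dots, whitespace and hyphens).

$modifiedString = preg_replace("/[^a-zA-Z0-9_.\s-]/", "", $content); // '-' goes last so it is a literal hyphen, not a range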
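Keep in mind that a whitelist like this removes only the punctuation of an entity, so its letters survive. A small sketch on a made-up string, showing the result with and without decoding first (the decode step is an addition, not part of this answer):

```php
<?php
$content = "Fish &amp; chips!";
// '-' is placed last in the class so it is a literal, not a range
$pattern = '/[^a-zA-Z0-9_.\s-]/';

// Whitelist only: '&' and ';' are removed, but "amp" stays
$stripped = preg_replace($pattern, "", $content);

// Decode first, then whitelist: the decoded '&' itself is removed
$decodedThenStripped = preg_replace($pattern, "", html_entity_decode($content, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

echo $stripped, "\n";            // Fish amp chips
echo $decodedThenStripped, "\n"; // Fish  chips
```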


2013-03-29 09:58:05

2

What I did was use html_entity_decode, then strip_tags to remove them.
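A minimal sketch of that two-step approach on a made-up string. The order matters: decoding first turns encoded tags such as &lt;b&gt; into real tags, which strip_tags can then remove; in the reverse order the tags survive.

```php
<?php
// Hypothetical input containing encoded tags and an encoded ampersand
$input = "&lt;b&gt;bold&lt;/b&gt; &amp; plain";

// Decode first, then strip the tags the decoding produced
$decodeThenStrip = strip_tags(html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

// Reversed order: strip_tags sees no real tags, so the decoded tags remain
$stripThenDecode = html_entity_decode(strip_tags($input), ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decodeThenStrip, "\n"; // bold & plain
echo $stripThenDecode, "\n"; // <b>bold</b> & plain
```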


2013-12-16 15:36:35

2

Try this:

<?php 
$str = "\x8F!!!"; 

// Outputs an empty string 
echo htmlentities($str, ENT_QUOTES, "UTF-8"); 

// Outputs "!!!" 
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8"); 
?>


2014-03-11 04:11:08 RaGu

+3

Could you add a note on "why your code works"? That would make it clear to others. – Praveen

-1

$string = "äáčé"; 

$convert = Array(
     'ä'=>'a', 
     'Ä'=>'A', 
     'á'=>'a', 
     'Á'=>'A', 
     'à'=>'a', 
     'À'=>'A', 
     'ã'=>'a', 
     'Ã'=>'A', 
     'â'=>'a', 
     'Â'=>'A', 
     'č'=>'c', 
     'Č'=>'C', 
     'ć'=>'c', 
     'Ć'=>'C', 
     'ď'=>'d', 
     'Ď'=>'D', 
     'ě'=>'e', 
     'Ě'=>'E', 
     'é'=>'e', 
     'É'=>'E', 
     'ë'=>'e', 
    ); 

$string = strtr($string , $convert); 

echo $string; //aace


2015-05-13 11:32:12 Zombyii

+0

This doesn't answer the OP's question – FluffyKitten

0

You can try htmlspecialchars_decode($string). It works for me.

http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp
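htmlspecialchars_decode() only reverses the conversions made by htmlspecialchars() (the entities for &, ", ', < and >), so other named entities such as &eacute; or &nbsp; are left untouched. A small sketch on a made-up string:

```php
<?php
// Hypothetical input: special-character entities plus other named entities
$input = "&lt;p&gt;caf&eacute;&nbsp;&amp; bar&lt;/p&gt;";

// ENT_QUOTES: also decode the single- and double-quote entities
$decoded = htmlspecialchars_decode($input, ENT_QUOTES);

echo $decoded; // <p>caf&eacute;&nbsp;& bar</p>
```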


2015-10-01 12:56:02 surabhivin

+3

Downvoted for linking to w3schools instead of the official documentation: http://php.net/htmlspecialchars_decode That said, this doesn't solve the OP's question. –

0

If you want to convert HTML special characters rather than just removing them, and also strip tags to prepare the text as plain text, this is the solution that worked for me:

function htmlToPlainText($str){
    $str = html_entity_decode($str, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $str = htmlspecialchars_decode($str);
    $str = html_entity_decode($str);
    $str = strip_tags($str);
    return $str;
}

$string = '<p>this is (&nbsp;) a test</p>
<div>Yes this is! &amp; does it get "processed"? </div>';

echo htmlToPlainText($string);
// &nbsp; becomes a non-breaking space (U+00A0), &amp; becomes &, and the <p> and <div> tags are removed

html_entity_decode with ENT_QUOTES | ENT_XML1 converts things like the encoded ' (apostrophe), htmlspecialchars_decode converts things like &amp;, html_entity_decode converts &nbsp;, and strip_tags removes any HTML tags that are left.


2018-01-26 00:19:47 Jay
