How can I match the first subpattern in C#?

I wrote this pattern to match nested divs:

(<div[^>]*>(?:\g<1>|.)*?<\/div>)

This works fine, as you can see on regex101.

However, when I write the code below in C#:

Regex findDivs = new Regex("(<div[^>]*>(?:\\g<1>|.)*?<\\/div>)", RegexOptions.Singleline);

It throws an error:

Additional information: 
    parsing "(<div[^>]*>(?:\g<1>|.)*?<\/div>)" - 
     Unrecognized escape sequence \g.

As you can see, \g does not work in C#. How can I match the first subpattern then?


2016-05-24 João Ferreira

I like the top answer on this question about trying to match HTML with regular expressions: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags In short: 'Do not Parse HTML With Regex'

First of all, you should really use a regex tester that specifically targets C# to ensure compatibility. Second, take a look at this question: http://stackoverflow.com/questions/19596502/regex-nested-parentheses – juharr

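What you are looking for is balancing groups. .NET does not support the \g<1> recursion syntax, but it can track nesting depth with a named-group stack. Here is a one-to-one conversion of your regex for .NET: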

(?sx)<div[^>]*>     # Opening DIV 
    (?>       # Start of atomic group 
     (?:(?!</?div[^>]*>).)+ # (1) Any text other than open/close DIV 
     | <div[^>]*> (?<tag>) # Add 1 "tag" value to stack if opening DIV found 
     | </div> (?<-tag>)  # Remove 1 "tag" value from stack when closing DIV tag is found 
    )* 
    (?(tag)(?!))     # Check if "tag" stack is not empty (then fail) 
</div>

See the regex demo.
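To show how that pattern might be called from C#, here is a minimal sketch. The class name `NestedDivMatcher` and the method `FindTopLevelDivs` are hypothetical names introduced for illustration only. The pattern is the one from the answer, with `RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace` used in place of the inline `(?sx)`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class NestedDivMatcher
{
    // Balancing-group pattern: the named group "tag" acts as a depth counter.
    private static readonly Regex TopLevelDivs = new Regex(@"
        <div[^>]*>                        # opening DIV
        (?>                               # atomic group
            (?:(?!</?div[^>]*>).)+        # any text other than an open/close DIV
          | <div[^>]*> (?<tag>)           # nested opening DIV: push onto the stack
          | </div>     (?<-tag>)          # nested closing DIV: pop from the stack
        )*
        (?(tag)(?!))                      # fail if the stack is not empty
        </div>                            # closing DIV of the outermost element
        ",
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

    // Returns the text of every outermost <div>...</div> found in the input.
    public static List<string> FindTopLevelDivs(string html)
    {
        var result = new List<string>();
        foreach (Match m in TopLevelDivs.Matches(html))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Calling `NestedDivMatcher.FindTopLevelDivs` on a string with one nested and one standalone div should return two entries, each covering a whole outermost div. Each nested div is consumed as part of its parent rather than reported separately.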

However, you may really want to use HtmlAgilityPack to parse HTML.

The main point is to get an XPath that matches all DIV tags that have no ancestor with the same name. You might want something like this (untested):

// Requires: using System; using System.Collections.Generic; using System.Linq;
private List<string> GetTopmostDivs(string html)
{
    HtmlAgilityPack.HtmlDocument hap;
    Uri uriResult;
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    { // html is a URL: download and parse the page
        var web = new HtmlAgilityPack.HtmlWeb();
        hap = web.Load(uriResult.AbsoluteUri);
    }
    else
    { // html is a string of markup
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // Select every DIV that has no DIV ancestor, i.e. the topmost DIVs
    var nodes = hap.DocumentNode.SelectNodes("//div[not(ancestor::div)]");
    // SelectNodes returns null when nothing matches
    if (nodes != null)
        return nodes.Select(p => p.OuterHtml).ToList();
    else
        return new List<string>();
}
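If you want to try the XPath expression itself without installing HtmlAgilityPack, the sketch below runs the same `//div[not(ancestor::div)]` query through `System.Xml.XmlDocument`. This is a swapped-in technique, not what the answer uses: it only works on well-formed XML, so it is no substitute for a real HTML parser. The class name `TopmostDivXPathDemo` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical demo class; assumes the input is well-formed XML, unlike real-world HTML.
public static class TopmostDivXPathDemo
{
    public static List<string> GetTopmostDivs(string wellFormedXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(wellFormedXml); // throws XmlException on malformed markup

        var result = new List<string>();
        // Same XPath as in the answer: DIVs that have no DIV ancestor.
        XmlNodeList nodes = doc.SelectNodes("//div[not(ancestor::div)]");
        foreach (XmlNode node in nodes)
        {
            result.Add(node.OuterXml);
        }
        return result;
    }
}
```

For an input such as `<body><div id="a">x<div>y</div></div><p><div>w</div></p></body>`, the query should select the outer `div` with `id="a"` and the `div` inside `p`, but not the inner `div` nested in the first one.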


2016-05-24 19:50:04

This worked well. Thank you very much!!!

What you want to do is iterate over the capture groups. Here is an example:

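// 'test' (a collection of input strings) and 'regex' are assumed to be defined elsewhere
foreach (var s in test)
{
    Match match = regex.Match(s);
    if (!match.Success)
        continue;

    // match.Captures holds only the overall match; the capture groups are in match.Groups
    foreach (Group group in match.Groups)
    {
        Console.WriteLine("Index={0}, Value={1}", group.Index, group.Value);
    }
    // Group 1 is the first parenthesized subpattern
    Console.WriteLine(match.Groups[1].Value);
}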
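A self-contained version of the same idea might look like the sketch below. The class name `GroupInspector` and the method `DescribeGroups` are hypothetical, and the regex you pass in is up to you. It simply lists every group of the first match so you can see which one holds the value you are after.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for inspecting the groups of a match.
public static class GroupInspector
{
    // Returns one descriptive line per group of the first match (group 0 is the whole match).
    public static List<string> DescribeGroups(Regex regex, string input)
    {
        var lines = new List<string>();
        Match match = regex.Match(input);
        if (!match.Success)
        {
            return lines; // no match: nothing to describe
        }

        for (int i = 0; i < match.Groups.Count; i++)
        {
            Group g = match.Groups[i];
            lines.Add(string.Format("Group {0}: Index={1}, Value={2}", i, g.Index, g.Value));
        }
        return lines;
    }
}
```

Note that this only inspects groups in a pattern that already compiles in .NET. It does not by itself replace the `\g<1>` recursion from the question.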


2016-05-24 19:18:09

Sorry for being lazy... how exactly does this help?

You can run the match and then look at the capture groups. You will see that one of them contains the values you want. Fire up the debugger and take a look.
