Regular Expression Headache
30-03-2006, 01:41 PM
Post: #1
Hi guys,

My head is really hurting with this one:

I need to parse a string of HTML, finding certain phrases and replacing each of them with an '<a href' link.

The problem is that I also need the regex to NOT match phrases that are already inside links.

Are there any regex gurus out there who can help me solve this?

Bomberman
aka Colin Smith
http://www.smithcolin.co.uk
30-03-2006, 04:56 PM
Post: #2
 
Well, I have found 'a' solution. I would have preferred to do it purely with regular expressions, as that would mean less code and probably better efficiency.

Code:
public static string ConvertContent(string html)
        {
            DataTable dtbKeyword = Read();
            string retHtml =html;
            
            //retHtml = retHtml.Replace("<a","\r\n<a").Replace("</a>","</a>\r\n");
            foreach(DataRow dr in dtbKeyword.Rows)
            {
                string target = " target=\"_blank\" ";
                if(!(bool)dr["NewWindow"]) target = string.Empty;

                Match oMatch = null;
                int startfrom = 0;
                do
                {
                    //carry on from where left off
                    string endstring = retHtml.Substring(startfrom,retHtml.Length-startfrom);
                    //find the next match (if it exists)
                    oMatch = Regex.Match(endstring,Regex.Escape( ((string)dr["Keyword"])) ,RegexOptions.IgnoreCase | RegexOptions.Multiline);
                    
                    if(oMatch.Value != "")
                    {
                        //construct the new link
                        string link = "<a href=\"" + (string)dr["Hyperlink"] + "\"" + target + ">" + oMatch.Value + "</a>";
                        
                        //get the string up to the phrase found
                        string upto = retHtml.Substring(0,startfrom + oMatch.Index);
                        
                        //count the opening and closing 'a' tags up to the phrase (\b stops <abbr>, <area> etc. being counted)
                        MatchCollection starts = Regex.Matches(upto,@"<a\b",RegexOptions.IgnoreCase);
                        MatchCollection ends = Regex.Matches(upto,@"</a\b",RegexOptions.IgnoreCase);
                    
                        //get the string from the end of the found phrase to the end of the string
                        string downto = retHtml.Substring(startfrom + oMatch.Index + oMatch.Length,retHtml.Length - (startfrom + oMatch.Index + oMatch.Length));
                        
                        //if the numbers of opening and closing 'a' tags are equal, the phrase is not already in a link, so reconstruct the content string
                        if(starts.Count == ends.Count)
                        {
                            retHtml = upto + link + downto;
                            //set the new start-from index (just past the inserted link)
                            startfrom = startfrom + oMatch.Index + link.Length;
                        }
                        else
                            startfrom = startfrom + oMatch.Index + oMatch.Length;//set the new start-from index (just past the skipped phrase)
                    }
                }while(oMatch.Value != "");
            }

            return retHtml;
        }

If anyone does have a regex that could be used please let me know!
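
For reference, one regex-only approach that might work is a single alternation: let the pattern consume whole existing links and other tags first, and only treat what is left over as a keyword hit. The sketch below is untested against real content and makes several assumptions: the class name KeywordLinker and the method LinkKeyword are made up for illustration, the HTML is reasonably well formed (no '>' inside attribute values, no keywords inside script blocks or comments), and the hyperlink is inserted as-is, just like in the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordLinker
{
    // Hypothetical helper (not from the code above): links one keyword in a single pass.
    public static string LinkKeyword(string html, string keyword, string url, bool newWindow)
    {
        // An empty keyword would match everywhere, so leave the content alone.
        if (string.IsNullOrEmpty(keyword)) return html;

        string target = newWindow ? " target=\"_blank\"" : string.Empty;

        // Alternatives are tried left to right at each position:
        //   1. a whole existing <a ...>...</a> element,
        //   2. any other tag, so attribute values are never touched,
        //   3. the keyword itself, captured in the named group "kw".
        string pattern = @"<a\b[^>]*>.*?</a\s*>|<[^>]+>|(?<kw>" + Regex.Escape(keyword) + ")";

        return Regex.Replace(
            html,
            pattern,
            m => m.Groups["kw"].Success
                ? "<a href=\"" + url + "\"" + target + ">" + m.Value + "</a>"
                : m.Value, // existing link or tag: put it back unchanged
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling KeywordLinker.LinkKeyword("Visit foo and <a href=\"x\">foo here</a>", "foo", "http://example.com", false) should wrap only the first "foo" and leave the existing link alone. The DataTable loop would then just call this once per keyword row. For messy real-world HTML a proper parser is probably the safer choice.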

Bomberman
aka Colin Smith
http://www.smithcolin.co.uk