Several days ago, someone on the forum asked how to extract the text
from a hyperlink while preserving the other HTML tags.
It sounded interesting, so I did some research, but I couldn't find a direct
solution. So I decided to put together a simple regular expression to
do the job.
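Regular Expression: (<[aA]\b[^>]*>|</[aA]>)
Explanation:
<[aA]\b[^>]*> -- Removes an opening anchor tag such as <a href="a.aspx">. [aA] matches a lowercase or uppercase "a" (inside a character class, | is a literal character, not "or"), and \b makes sure the tag name ends right there, so tags such as <abbr> or <area> are left alone.
</[aA]> -- Removes the closing </a> (or </A>) tag.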
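If you need this in more than one place, a small reusable helper is handy. The sketch below is a minimal, hypothetical wrapper (the class name AnchorStripper and the method StripAnchors are illustrative, not from any library). It folds both alternatives into one equivalent pattern: /? makes the slash optional, and RegexOptions.IgnoreCase replaces the [aA] character classes. Keeping the Regex in a static readonly field means the pattern is parsed once instead of on every call.

```csharp
using System.Text.RegularExpressions;

public static class AnchorStripper
{
    // Hypothetical helper: matches <a ...> and </a> in any letter case.
    // \b after "a" ensures tags like <abbr> or <area> are not matched.
    private static readonly Regex AnchorTag =
        new Regex(@"</?a\b[^>]*>", RegexOptions.IgnoreCase);

    // Removes anchor tags but keeps the link text and every other tag.
    public static string StripAnchors(string html) =>
        AnchorTag.Replace(html, string.Empty);
}
```

For example, AnchorStripper.StripAnchors("<div><a href=\"http://www.ysatech.com/\">ysatech</a> <abbr>HTML</abbr></div>") returns <div>ysatech <abbr>HTML</abbr></div>.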
Example 1:
string str1 = "<a href=\"http://www.amazon.com/dp/0596528124/\" class=\"someclass\">Mastering Regular Expressions</a> -- <A href=\"http://cnn.com/\">CNN</a> <div><a href=\"http://blog.ysatech.com\">http://blog.ysatech.com</a></div>";
// Verbatim string (@"...") so that \b reaches the regex engine instead of becoming a backspace character
str1 = System.Text.RegularExpressions.Regex.Replace(str1, @"(<[aA]\b[^>]*>|</[aA]>)", "");
Result: Mastering Regular Expressions -- CNN <div>http://blog.ysatech.com</div>
Example 2:
string str2 = "<div><a href=\"http://www.ysatech.com/\" class=\"someclass\">ysatech</a></div>";
str2 = System.Text.RegularExpressions.Regex.Replace(str2, @"(<[aA]\b[^>]*>|</[aA]>)", "");
Result: <div>ysatech</div>
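One limitation worth knowing: [^>]* stops at the first > it sees, even inside an attribute value. For <a href="x" title="a > b">text</a>, the pattern removes only <a href="x" title="a > and leaves b">text behind. The sketch below is a hypothetical, stricter variant (the class name QuoteAwareAnchorStripper is illustrative) that steps over double- or single-quoted attribute values as a whole. It assumes the markup is reasonably well formed, with every quote closed.

```csharp
using System.Text.RegularExpressions;

public static class QuoteAwareAnchorStripper
{
    // Hypothetical variant of the pattern above. Inside the tag, each step
    // consumes either one character that is not >, " or ', or a complete
    // "..." or '...' attribute value, so a > inside quotes does not end the match.
    // In a C# verbatim string, "" stands for a single " character.
    private static readonly Regex AnchorTag = new Regex(
        @"</?a\b(?:[^>""']|""[^""]*""|'[^']*')*>",
        RegexOptions.IgnoreCase);

    // Removes anchor tags but keeps the link text and every other tag.
    public static string StripAnchors(string html) =>
        AnchorTag.Replace(html, string.Empty);
}
```

For example, QuoteAwareAnchorStripper.StripAnchors("<a href=\"x\" title=\"a > b\">text</a> <b>bold</b>") returns text <b>bold</b>. For arbitrary, possibly malformed HTML, a real HTML parser is still the safer tool.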