Hello everyone,
I have a static HTML page containing a table with some information in it, and I am trying to extract the contents of that table.
HTML table:
<table border="0" cellpadding="5" cellspacing="0" width="165">
<td nowrap>
<div class="saadirname">
<span class="saadirtext">
State College
First Grade
<span class="saadirheader">Mailing Address:</span><br>
<span class="saadirtext">
<!--If using company address-->
Welcome Society
Library Arch
# 20 State Street
Mail Road
<img src="/images/spacer.gif" alt="" height="1" width="5" border="0">
<img src="/images/spacer.gif" alt="" height="1" width="5" border="0">
<b class="saadirheader">Phone:</b> <span class="saadirtext">(916) 060-6480</span><br>
<b class="saadirheader">Fax:</b> <span class="saadirtext">(916) 264-6336</span><br>
<b class="saadirheader">Email:</b>
<a href="MailTo:[email protected] "><span class="saadirtext">
[email protected]</span></a><br>
<b class="saadirheader">Membership Type:</b>
<span class="saadirtext">Individual</span><br>
Program:
static void Main()
{
    StreamReader str = new StreamReader("C:\\member.html");
    string SFile = str.ReadToEnd();

    // IgnorePatternWhitespace lets the pattern span several lines.
    Regex regex = new Regex(
        @"<tr>
            ( \s* <td[^>]*> \s* <div[^>]*> \s*
              (\s*<!--((?!-->).)*-->)* \s*
              (?<value>.*?)
              (\s*<!--((?!-->).)*-->)* \s*
              </div> \s* </td> )+
          \s* </tr> ",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture |
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    foreach (Match m in regex.Matches(SFile))
    {
        foreach (Capture item in m.Groups["value"].Captures)
        {
            Console.WriteLine(item.Value);
        }
        Console.WriteLine();
    }

    Console.ReadLine();
}
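I am getting the output below: the entire table is printed with its tags still in place. Can the span and br tags be handled so that only the text is returned?
There is also one more table in the page footer. Can we control where processing starts or stops relative to a marker line, such as the <!--START FOOTER FILE--> comment visible in the output, so that the footer table is not picked up?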
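To make the two questions concrete, here is a rough sketch of the direction I have in mind. It is not tested against the real page. It assumes the <!--START FOOTER FILE--> comment always precedes the footer, and the names TableText, ExtractLines and FooterMarker are made up for illustration. It cuts the page at the marker, turns br tags into line breaks, strips all remaining tags, and keeps the non-empty lines of text.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class TableText
{
    // Assumption: this comment marks the start of the footer on every page.
    const string FooterMarker = "<!--START FOOTER FILE-->";

    public static List<string> ExtractLines(string html)
    {
        // 1. Drop everything from the footer marker onward.
        int cut = html.IndexOf(FooterMarker, StringComparison.OrdinalIgnoreCase);
        if (cut >= 0) html = html.Substring(0, cut);

        // 2. Remove the remaining comments and flatten the source line breaks,
        //    so that only <br> and </p> decide where a line ends.
        html = Regex.Replace(html, @"<!--.*?-->", "", RegexOptions.Singleline);
        html = Regex.Replace(html, @"\s+", " ");
        html = Regex.Replace(html, @"<br\s*/?>|</p>", "\n", RegexOptions.IgnoreCase);

        // 3. Strip every other tag (span, b, a, img, td, ...).
        html = Regex.Replace(html, @"<[^>]+>", " ");

        // 4. Decode entities, collapse runs of spaces, skip blank lines.
        var lines = new List<string>();
        foreach (string raw in WebUtility.HtmlDecode(html).Split('\n'))
        {
            string line = Regex.Replace(raw, @" +", " ").Trim();
            if (line.Length > 0) lines.Add(line);
        }
        return lines;
    }
}
```

It would be called as TableText.ExtractLines(SFile) in place of the regex loop in my program. Stripping tags with regular expressions is fragile on irregular markup, so a real HTML parser may well be the better answer; this is only meant to show the result I am after.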
Current output (from the regex program):
Pradeep G </div> <span class="saadirtext"> State College <br>
First Grade <br> Library <br> </span><br> <span class="saadirheader">Mailing Address:</span><br> <span class="saadirtext"> <!--If using company address--> Welcome Society
Library Arch <br> # 20 State Street <br> Mail Road , WI <img src="Newrecord_files/spacer.gif" alt="" border="0" height="1" width="5"> 5000-1000 <img src="Newrecord_files/spacer.gif" alt="" border="0" height="1" width="5"> IND </span> <p>
<b class="saadirheader">Phone:</b> <span class="saadirtext">(916) 060-6480</span><br> <b class="saadirheader">Fax:</b> <span class="saadirtext">(916) 264-6336</span><br>
<b class="saadirheader">Email:</b> <a href="mailto:[email protected]"><span class="saadirtext"> [email protected]</span></a><br>
<br> <b class="saadirheader">Membership Type:</b> <span class="saadirtext">Individual</span><br> <br> </p></td> </tr> </tbody></table> </td>
</tr> </tbody></table> <!--START FOOTER FILE-->
<table border="0" cellpadding="0" cellspacing="0" width="100%"> <tbody><tr> <td><img src="Newrecord_files/transparent.gif" alt="" border="0" height="0" hspace="0" vspace="0" width="0"></td> <td align="center" width="771"> <table border="0" cellpadding="0" cellspacing="0" width="771"> <tbody><tr>
<td> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tbody><tr> <td> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tbody><tr> <td valign="top"><img src="Newrecord_files/transparent.gif" alt="" border="0" height="25" hspace="0" vspace="0" width="165"></td> <td valign="top"><img src="Newrecord_files/transparent.gif" alt="" border="0" height="25" hspace="0" vspace="0" width="10"></td> </tr>
<tr> <td valign="top" width="165"><img src="Newrecord_files/transparent.gif" alt="" border="0" height="25" hspace="0" vspace="0" width="165"></td> <td valign="top"> <div id="footer" style="border-top: 1px solid rgb(204, 204, 204); padding: 10px 0pt 20px;"> <p>© The Archivists</p> <ul>