
Need some file parsing tips!!

mark


Aug 24 2006 4:07 AM
Hi,
Could anyone show me the best way to extract the info I need from the file below? Basically, the files are hand histories from a poker site, and I would like to keep track of how often a player calls, checks, folds and so on. The file is in HTML (hoping it posts OK). Alternatively, could anyone tell me whether it would be better to convert the file to XML (if possible) and extract the data that way?
Here is a short part of the file:

<html><body style="font-size:13px">
<a name="topofpage"><a/>
<table width="500" style="font-size:13px">
<tr bgcolor="#EEEEEE"><td  width="3%"><b>Idx</b></td><td  width="20%"><b>Date/Time</b></td><td  width="42%"><b>Table Name</b></td><td  width="20%"><b>Hand ID</b></td><td  width="15%"><b>Stakes</b></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">19</td><td  width="20%"><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">[Jul 31 16:04:10]</a></td><td  width="42%"><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">10474702-12327</a></td><td  width="15%"><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">18</td><td  width="20%"><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">[Jul 31 16:03:20]</a></td><td  width="42%"><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">10474702-12326</a></td><td  width="15%"><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">17</td><td  width="20%"><a href="#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50">[Jul 31 16:02:32]</a></td><td  width="42%"><a href="#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50">10474702-12325</a></td><td  width="15%"><a href="#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">16</td><td  width="20%"><a href="#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50">[Jul 31 16:01:45]</a></td><td  width="42%"><a href="#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50">10474702-12324</a></td><td  width="15%"><a href="#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">15</td><td  width="20%"><a href="#[Jul 31 16:00:43]~Casanova~10474702-12323~$0.25/$0.50">[Jul 31 16:00:43]</a></td><td  width="42%"><a href="#[Jul 31 16:00:43]~Casanova~10474702-12323~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 16:00:43]~Casanova~10474702-12323~$0.25/$0.50">10474702-12323</a></td><td  width="15%"><a href="#[Jul 31 16:00:43]~Casanova~10474702-12323~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">14</td><td  width="20%"><a href="#[Jul 31 15:59:13]~Casanova~10474702-12322~$0.25/$0.50">[Jul 31 15:59:13]</a></td><td  width="42%"><a href="#[Jul 31 15:59:13]~Casanova~10474702-12322~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:59:13]~Casanova~10474702-12322~$0.25/$0.50">10474702-12322</a></td><td  width="15%"><a href="#[Jul 31 15:59:13]~Casanova~10474702-12322~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">13</td><td  width="20%"><a href="#[Jul 31 15:58:00]~Casanova~10474702-12321~$0.25/$0.50">[Jul 31 15:58:00]</a></td><td  width="42%"><a href="#[Jul 31 15:58:00]~Casanova~10474702-12321~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:58:00]~Casanova~10474702-12321~$0.25/$0.50">10474702-12321</a></td><td  width="15%"><a href="#[Jul 31 15:58:00]~Casanova~10474702-12321~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">12</td><td  width="20%"><a href="#[Jul 31 15:57:31]~Casanova~10474702-12320~$0.25/$0.50">[Jul 31 15:57:31]</a></td><td  width="42%"><a href="#[Jul 31 15:57:31]~Casanova~10474702-12320~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:57:31]~Casanova~10474702-12320~$0.25/$0.50">10474702-12320</a></td><td  width="15%"><a href="#[Jul 31 15:57:31]~Casanova~10474702-12320~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">11</td><td  width="20%"><a href="#[Jul 31 15:56:44]~Casanova~10474702-12319~$0.25/$0.50">[Jul 31 15:56:44]</a></td><td  width="42%"><a href="#[Jul 31 15:56:44]~Casanova~10474702-12319~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:56:44]~Casanova~10474702-12319~$0.25/$0.50">10474702-12319</a></td><td  width="15%"><a href="#[Jul 31 15:56:44]~Casanova~10474702-12319~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">10</td><td  width="20%"><a href="#[Jul 31 15:56:22]~Casanova~10474702-12318~$0.25/$0.50">[Jul 31 15:56:22]</a></td><td  width="42%"><a href="#[Jul 31 15:56:22]~Casanova~10474702-12318~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:56:22]~Casanova~10474702-12318~$0.25/$0.50">10474702-12318</a></td><td  width="15%"><a href="#[Jul 31 15:56:22]~Casanova~10474702-12318~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">9</td><td  width="20%"><a href="#[Jul 31 15:54:56]~Casanova~10474702-12317~$0.25/$0.50">[Jul 31 15:54:56]</a></td><td  width="42%"><a href="#[Jul 31 15:54:56]~Casanova~10474702-12317~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:54:56]~Casanova~10474702-12317~$0.25/$0.50">10474702-12317</a></td><td  width="15%"><a href="#[Jul 31 15:54:56]~Casanova~10474702-12317~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">8</td><td  width="20%"><a href="#[Jul 31 15:53:43]~Casanova~10474702-12316~$0.25/$0.50">[Jul 31 15:53:43]</a></td><td  width="42%"><a href="#[Jul 31 15:53:43]~Casanova~10474702-12316~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:53:43]~Casanova~10474702-12316~$0.25/$0.50">10474702-12316</a></td><td  width="15%"><a href="#[Jul 31 15:53:43]~Casanova~10474702-12316~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">7</td><td  width="20%"><a href="#[Jul 31 15:52:18]~Casanova~10474702-12315~$0.25/$0.50">[Jul 31 15:52:18]</a></td><td  width="42%"><a href="#[Jul 31 15:52:18]~Casanova~10474702-12315~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:52:18]~Casanova~10474702-12315~$0.25/$0.50">10474702-12315</a></td><td  width="15%"><a href="#[Jul 31 15:52:18]~Casanova~10474702-12315~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">6</td><td  width="20%"><a href="#[Jul 31 15:50:59]~Casanova~10474702-12314~$0.25/$0.50">[Jul 31 15:50:59]</a></td><td  width="42%"><a href="#[Jul 31 15:50:59]~Casanova~10474702-12314~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:50:59]~Casanova~10474702-12314~$0.25/$0.50">10474702-12314</a></td><td  width="15%"><a href="#[Jul 31 15:50:59]~Casanova~10474702-12314~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">5</td><td  width="20%"><a href="#[Jul 31 15:49:37]~Casanova~10474702-12313~$0.25/$0.50">[Jul 31 15:49:37]</a></td><td  width="42%"><a href="#[Jul 31 15:49:37]~Casanova~10474702-12313~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:49:37]~Casanova~10474702-12313~$0.25/$0.50">10474702-12313</a></td><td  width="15%"><a href="#[Jul 31 15:49:37]~Casanova~10474702-12313~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">4</td><td  width="20%"><a href="#[Jul 31 15:48:10]~Casanova~10474702-12312~$0.25/$0.50">[Jul 31 15:48:10]</a></td><td  width="42%"><a href="#[Jul 31 15:48:10]~Casanova~10474702-12312~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:48:10]~Casanova~10474702-12312~$0.25/$0.50">10474702-12312</a></td><td  width="15%"><a href="#[Jul 31 15:48:10]~Casanova~10474702-12312~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">3</td><td  width="20%"><a href="#[Jul 31 15:47:20]~Casanova~10474702-12311~$0.25/$0.50">[Jul 31 15:47:20]</a></td><td  width="42%"><a href="#[Jul 31 15:47:20]~Casanova~10474702-12311~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:47:20]~Casanova~10474702-12311~$0.25/$0.50">10474702-12311</a></td><td  width="15%"><a href="#[Jul 31 15:47:20]~Casanova~10474702-12311~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">2</td><td  width="20%"><a href="#[Jul 31 15:46:31]~Casanova~10474702-12310~$0.25/$0.50">[Jul 31 15:46:31]</a></td><td  width="42%"><a href="#[Jul 31 15:46:31]~Casanova~10474702-12310~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:46:31]~Casanova~10474702-12310~$0.25/$0.50">10474702-12310</a></td><td  width="15%"><a href="#[Jul 31 15:46:31]~Casanova~10474702-12310~$0.25/$0.50">$0.25/$0.50</a></td></tr>
<tr bgcolor="#EEEEEE"><td width="3%">1</td><td  width="20%"><a href="#[Jul 31 15:45:09]~Casanova~10474702-12309~$0.25/$0.50">[Jul 31 15:45:09]</a></td><td  width="42%"><a href="#[Jul 31 15:45:09]~Casanova~10474702-12309~$0.25/$0.50">Casanova</a></td><td  width="20%"><a href="#[Jul 31 15:45:09]~Casanova~10474702-12309~$0.25/$0.50">10474702-12309</a></td><td  width="15%"><a href="#[Jul 31 15:45:09]~Casanova~10474702-12309~$0.25/$0.50">$0.25/$0.50</a></td></tr>
</table>
<br><br><a name="[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50"><a/>
<table width="500" style="font-size:13px">
<tr bgcolor="#CCCCCC" style="font-size:30px"><td colspan="4"><center>Real Money Ring Game</center></td></tr>
<tr bgcolor="#EEEEEE"><td width="40%"><b>Table Name</b></td><td width="20%"><b>Hand ID</b></td><td width="20%"><b>Game</b></td><td width="20%"><b>Stakes</b></td></tr>
<tr bgcolor="#CCCCCC"><td width="40%">Casanova</td><td width="20%">10474702-12327</td><td width="20%">Holdem Limit</td><td width="20%">$0.25/$0.50</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:20] : Hand Start.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:20] : Seat 1 : davebass has $31.99</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:20] : Seat 2 : PoKaBoT has $15.21</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:20] : Seat 4 : WC2006 has $9.82</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:20] : Seat 6 : xbambamx has $8.02</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:20] : Seat 8 : drurylane has $23.05</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:20] : Seat 9 : modeselect has $3.34</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:20] : xbambamx is the dealer.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:21] : drurylane posted small blind.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:21] : modeselect posted big blind.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:21] : Game [12327] started with 6 players.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:21] : Dealing Hole Cards.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:21] : Seat 2 : PoKaBoT has 5s Ts</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:25] : davebass folded.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:26] : PoKaBoT folded.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:29] : WC2006 called $0.25</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:32] : xbambamx folded.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:34] : drurylane called $0.13</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:35] : modeselect checked.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:36] : Dealing flop.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:36] : Board cards [As Kc 4d]</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:38] : drurylane checked.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:38] : modeselect checked.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:42] : WC2006 checked.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:42] : Dealing turn.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:42] : Board cards [As Kc 4d 8s]</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:45] : drurylane bet $0.50</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:45] : modeselect folded.</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:52] : WC2006 called $0.50 and raised $0.50</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:53] : drurylane called $0.50</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:53] : Dealing river.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:53] : Board cards [As Kc 4d 8s Js]</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:56] : drurylane checked.</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:57] : WC2006 bet $0.50</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:03:58] : drurylane called $0.50</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:03:58] : Showdown!</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:04:00] : Seat 4 : WC2006 has 8d Ac</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:04:00] : WC2006 has Two Pair: Aces and 8s</td></tr>
<tr bgcolor="#EEEEEE"><td colspan="4">[Jul 31 16:04:01] : WC2006 wins $3.57 with Two Pair: Aces and 8s</td></tr>
<tr bgcolor="#CCCCCC"><td colspan="4">[Jul 31 16:04:10] : Hand is over.</td></tr>
<td colspan="4"><br><br><center><a href="#topofpage">          Table of contents</a></center></td>
</table>
</body></html>

If I haven't been very clear, or this file has posted terribly, and you would like to help, I could always email you a file to take a look at. Many thanks.
The top part of the file, which looks extremely messy in the post, is not all that important; it's just the hand numbers. The main part is where you can see "Hand Start".
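
For illustration, here is a minimal sketch in Python of one possible way to do the counting. The sample is not well-formed XML (note the stray `<a/>`, the unclosed `<br>` tags and a `<td>` outside any `<tr>`), so converting it would take cleanup first. This sketch instead treats each `<td colspan="4">` log row as a line of text and matches it with regular expressions. The names `ROW`, `ACTION` and `count_actions` are made up for the example. It assumes player names contain no spaces, that the only action verbs are the ones visible in the sample (folded, checked, called, bet, raised), and that a "called ... and raised ..." row should count as a raise only.

```python
import re
from collections import Counter, defaultdict

# Text of one log row, e.g. "[Jul 31 16:03:25] : davebass folded."
# Captures everything after the "[timestamp] : " prefix.
ROW = re.compile(r'<td colspan="4">\[[^\]]+\] : (.*?)</td>')

# A player action: "<name> <verb>" at the start of the row text.
# Assumption: player names contain no spaces.
ACTION = re.compile(r'^(\S+) (folded|checked|called|bet)\b(.*)$')


def count_actions(html):
    """Return {player: Counter of action names} for the log rows in html."""
    stats = defaultdict(Counter)
    for text in ROW.findall(html):
        match = ACTION.match(text)
        if match is None:
            continue  # seat listings, blinds, board cards, showdown, etc.
        player, action, rest = match.groups()
        # Assumption: "called $0.50 and raised $0.50" counts as a raise only.
        if "raised" in rest:
            action = "raised"
        stats[player][action] += 1
    return stats


if __name__ == "__main__":
    # "hands.html" is a placeholder file name, not one from the post.
    with open("hands.html", encoding="utf-8") as handle:
        for player, actions in sorted(count_actions(handle.read()).items()):
            print(player, dict(actions))
```

Rows that are not player actions (seat listings, blinds, board cards, the showdown lines) simply fail the second pattern and are skipped. If the real files contain other verbs or player names with spaces, the `ACTION` pattern would need adjusting.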