I want to compare several HTML files, each of which contains many tags (plain strings), and I am looking for the approach with the lowest time complexity.
For example:
Below are two HTML files, each containing many tags, which are strings. I want to compare them and score how similar their tags are, using these formulas:
1. Average of similar tags of the two HTML files = ((Sum of the counts of the shared tags in the first file) + (Sum of the counts of the shared tags in the second file)) / ((Sum of the second column of the first file) + (Sum of the second column of the second file)).
2. Main function for calculating: F(File1, File2) = ((Number of tags that appear in both files) / ((Number of distinct tags in the first file + Number of distinct tags in the second file) - (Number of tags that appear in both files))) * (Average of similar tags of the two HTML files)
counter: holds the number of tags that the two HTML files have in common.
Note: The second column contains the number of occurrences of each tag in that HTML file.
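To make the grouping of the terms unambiguous, here are the same two formulas in symbols. The names are my own and only for illustration: T_1 and T_2 are the sets of tags of the two files, c_1(t) and c_2(t) are the counts of tag t in each file, and S is the set of shared tags.

```latex
S = T_1 \cap T_2

\mathrm{Avg}(File1, File2) =
  \frac{\sum_{t \in S} c_1(t) + \sum_{t \in S} c_2(t)}
       {\sum_{t \in T_1} c_1(t) + \sum_{t \in T_2} c_2(t)}

F(File1, File2) =
  \frac{|S|}{|T_1| + |T_2| - |S|} \cdot \mathrm{Avg}(File1, File2)
```

Since |T_1| + |T_2| - |S| is the size of the union of the two tag sets, the first factor of F is the Jaccard index of the tag sets.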
The first HTML file:
joiuh | 12 |
@62jj | 10 |
k6235 | 2 |
99ui* | 3 |
00Qyu67 | 9 |
*8455 | 7 |
Sum of the second column = 43
The second HTML file:
00Qyu67 | 20 |
NY%%%20 | 1 |
UWCN10 | 13 |
89PO* | 6 |
$$CS40 | 11 |
@62jj | 56 |
k6235 | 10 |
Sum of the second column = 117
In this example, the tags shared by both files are @62jj, 00Qyu67 and k6235, so:
Average of similar tags of the two HTML files: ((10 + 9 + 2) + (56 + 20 + 10)) / (43 + 117) = 107 / 160 ≈ 0.669
F(File1, File2) = (3 / (13 - 3)) * 0.669 ≈ 0.201
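Here is a minimal sketch of the calculation above, assuming each HTML file has already been reduced to a tag-to-count dictionary (the parsing step is not shown). The names `similarity` and `pairwise` are placeholders of mine, and returning 0.0 for empty input is my own assumption. With hash-based lookups, one comparison takes roughly linear time in the number of distinct tags on average, and I would like to know whether anything better is possible.

```python
from itertools import combinations


def similarity(counts_a, counts_b):
    """F(File1, File2) for two tag -> count dictionaries."""
    # Dictionary key views support set intersection directly.
    common = counts_a.keys() & counts_b.keys()
    total = sum(counts_a.values()) + sum(counts_b.values())
    # Distinct tags over both files: |T1| + |T2| - |shared|.
    distinct = len(counts_a) + len(counts_b) - len(common)
    if total == 0 or distinct == 0:
        return 0.0  # assumption: empty files score zero
    shared = sum(counts_a[t] + counts_b[t] for t in common)
    average = shared / total
    return len(common) / distinct * average


def pairwise(files):
    """Score every unordered pair in a name -> counts dictionary."""
    return {
        (a, b): similarity(files[a], files[b])
        for a, b in combinations(files, 2)
    }


file1 = {"joiuh": 12, "@62jj": 10, "k6235": 2,
         "99ui*": 3, "00Qyu67": 9, "*8455": 7}
file2 = {"00Qyu67": 20, "NY%%%20": 1, "UWCN10": 13, "89PO*": 6,
         "$$CS40": 11, "@62jj": 56, "k6235": 10}

print(round(similarity(file1, file2), 4))  # 0.2006
```

For several files, `pairwise({"a.html": file1, "b.html": file2})` returns one score per pair, so the total work grows with the number of pairs.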