Company
I am using regex with the following expressions:
1) r = New Regex("(?:""kop\x22\x3E)(\w[A-Z]\S+\x20\S+)(?:\b\x3c)", RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Compiled)
This gives the result:
Internal Links
2) r = New Regex("(?:""kop\x22\x3E)(\w[A-Z]\S+|\x20|\S+)(?:\b\x3c)", RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Compiled)
This gives the result:
Company
3) r = New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", RegexOptions.IgnoreCase Or RegexOptions.Compiled)
This gives the following result:
http://intranet.company.com/
http://intranet.company.com/email/
http://www.hotmail.com/
As output I want a text file in one of the following two layouts:
1)
Internal Links
url1=http://intranet.company.uk/,Intranet Pages
url2=http://intranet.company.de/email/,Email
url3=http://www.hotmail.com/,Hotmail Email
Company
url1=http://www.test.com/,Test Url
url2=http://sonelink.url.nl/,Somelink Url
2)
Internal Links,http://intranet.company.uk/,Intranet Pages
Internal Links,http://intranet.company.de/email/,Email
Internal Links,http://www.hotmail.com/,Hotmail Email
Company,http://www.test.com/,Test Url
Company,http://sonelink.url.nl/,Somelink Url
My questions:
1) How can I combine regex expressions 1 and 2 into a single expression?
(Neither expression works flawlessly on its own.)
2) How can I group the URLs (found with regex expression 3) by category, so that each group lists only its own URLs? For example,
under "Internal Links" only the following URLs must appear:
http://intranet.company.com/
http://intranet.company.com/email/
http://www.hotmail.com/
The URLs under "Company", however, must appear in their own group,
so "Company" will contain only the following URLs:
http://www.test.com/
http://sonelink.url.nl/
3) How can I extract the URL and the comment (the link text) from the following line?
Hotmail Email
|
I want the following result:
http://www.hotmail.com,Hotmail Email
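One way the three questions might be handled together is a single pass over the page with one combined pattern. This is only a sketch under assumptions the post does not confirm: each category heading is taken to be the text right after `"kop">`, and each link a plain `<a href="...">text</a>` element. The sample markup, the names `GroupLinksDemo` and `BuildLines`, and the group names `title`, `url` and `text` are made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module GroupLinksDemo
    ' Returns one "category,url,description" line per link, in page order.
    Function BuildLines(ByVal html As String) As String
        ' Alternative 1: the heading text between "kop"> and the next <.
        ' Alternative 2: an anchor with a quoted or unquoted href and its link text.
        Dim r As New Regex( _
            """kop""\s*>\s*(?<title>[^<]+?)\s*<" & _
            "|<a\b[^>]*?href\s*=\s*(?:""(?<url>[^""]*)""|(?<url>[^\s>]+))" & _
            "[^>]*>\s*(?<text>[^<]*?)\s*</a>", _
            RegexOptions.IgnoreCase Or RegexOptions.Compiled)

        Dim sb As New StringBuilder()
        Dim category As String = ""

        For Each m As Match In r.Matches(html)
            If m.Groups("title").Success Then
                ' A heading starts a new group; remember it for the links that follow.
                category = m.Groups("title").Value
            Else
                ' A link belongs to the most recently seen heading.
                sb.AppendLine(category & "," & m.Groups("url").Value & _
                              "," & m.Groups("text").Value)
            End If
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        ' Hypothetical sample markup; the real page may be structured differently.
        Dim html As String = _
            "<td class=""kop"">Internal Links</td>" & _
            "<a href=""http://intranet.company.com/"">Intranet Pages</a>" & _
            "<a href=""http://www.hotmail.com/"">Hotmail Email</a>" & _
            "<td class=""kop"">Company</td>" & _
            "<a href=""http://www.test.com/"">Test Url</a>"

        Console.Write(BuildLines(html))
    End Sub
End Module
```

With this sample markup the sketch prints the second layout: two lines starting with `Internal Links,` followed by `Company,http://www.test.com/,Test Url`. The first layout could be produced from the same loop by writing the category on its own line when it changes and keeping a per-category counter for the `urlN=` prefix. Because matches arrive in page order, no separate grouping step is needed, but that only holds if every link follows its heading in the markup.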
Please help me; your help is greatly appreciated.
MJCM