Introduction
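This article shows how to download the HTML of a web page in C# and strip the markup so that only the readable text is left. The same code works in console, Windows, or web applications. A console application is used here because it keeps the introductory example simple.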
In the console application, add the following using directives:
using System; // for Console
using System.Net; // to handle internet operations (WebRequest, WebResponse)
using System.IO; // to use streams (StreamReader)
using System.Text.RegularExpressions; // to clean up the downloaded HTML (Regex)
Loading the content
Create a WebRequest for the URL, then call GetResponse() to send the request and obtain the WebResponse:
string url = "http://www.google.com/";
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Wrap the response stream in a StreamReader, read the whole body into a string, and then close both the reader and the response:
StreamReader sr = new StreamReader(response.GetResponseStream());
string result = sr.ReadToEnd(); // the page's raw HTML
sr.Close(); // also closes the underlying response stream
response.Close(); // releases the connection
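The same download can be wrapped in a reusable method. The sketch below is a hedged variant, not the author's code: PageLoader and DownloadHtml are hypothetical names. The using statements dispose the response and the reader even when an exception is thrown partway through. For example, GetResponse() throws a WebException when an HTTP server returns an error status.

```csharp
using System.IO;
using System.Net;

public static class PageLoader
{
    // Hypothetical helper: downloads the resource at `url` and returns its body as a string.
    public static string DownloadHtml(string url)
    {
        WebRequest request = WebRequest.Create(url);

        // `using` disposes the response and the reader even if ReadToEnd throws.
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this helper, string result = PageLoader.DownloadHtml("http://www.google.com/"); replaces the request, response, and reader lines.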
To view the raw, unformatted HTML, write it to the console:
Console.WriteLine(result);
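WebRequest and HttpWebRequest still work, but starting with .NET 6 they are marked obsolete (warning SYSLIB0014), and Microsoft recommends HttpClient for new code. A minimal sketch of the same download with HttpClient follows. ModernPageLoader and DownloadHtmlAsync are hypothetical names. A single static HttpClient is reused because HttpClient is designed to be shared rather than created per request.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class ModernPageLoader
{
    // One shared HttpClient for the whole application; creating a new one
    // per request can exhaust sockets under load.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: downloads the page at `url` and returns the body as a string.
    // GetStringAsync throws HttpRequestException for non-success status codes.
    public static Task<string> DownloadHtmlAsync(string url)
    {
        return Client.GetStringAsync(url);
    }
}
```

In an async Main, string result = await ModernPageLoader.DownloadHtmlAsync("http://www.google.com/"); produces the same raw HTML string that the formatting steps expect.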
Formatting the result
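To strip the markup, use the static Regex.Replace method. Each call removes one kind of markup. Comments are removed first so that tags inside them cannot confuse the later patterns. RegexOptions.IgnoreCase lets the tag pattern also match uppercase tags such as <HTML>. The last call collapses every run of whitespace into a single space, which the original [\t\r\n] pattern did not do:
result = Regex.Replace(result, "<!--.*?-->", "", RegexOptions.Singleline); // Remove HTML comments
result = Regex.Replace(result, "<script.*?</script>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase); // Remove scripts
result = Regex.Replace(result, "<style.*?</style>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase); // Remove inline stylesheets
result = Regex.Replace(result, "<!.*?>", "", RegexOptions.Singleline); // Remove the doctype declaration
result = Regex.Replace(result, "</?[a-z][a-z0-9]*[^<>]*>", "", RegexOptions.IgnoreCase); // Remove HTML tags
result = Regex.Replace(result, "\\s+", " ").Trim(); // Collapse runs of whitespace into single spaces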
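The cleanup steps can also be collected into one reusable method. This is a sketch, not the author's code: HtmlText.StripHtml is a hypothetical name. The order used here (comments, scripts, styles, doctype, tags, whitespace) is chosen so that tags hidden inside comments disappear along with the comment. Like any regex-based approach, it is a heuristic. For example, a > character inside an attribute value ends the tag match early, so a real HTML parser is safer for arbitrary pages.

```csharp
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: removes markup from an HTML string and returns the visible text.
    public static string StripHtml(string html)
    {
        const RegexOptions ci = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        string text = Regex.Replace(html, "<!--.*?-->", "", RegexOptions.Singleline); // comments
        text = Regex.Replace(text, "<script.*?</script>", "", ci);                   // scripts
        text = Regex.Replace(text, "<style.*?</style>", "", ci);                     // stylesheets
        text = Regex.Replace(text, "<!.*?>", "", RegexOptions.Singleline);           // doctype
        text = Regex.Replace(text, "</?[a-z][a-z0-9]*[^<>]*>", "", RegexOptions.IgnoreCase); // tags
        return Regex.Replace(text, "\\s+", " ").Trim();                              // whitespace
    }
}
```

For example, HtmlText.StripHtml("<p>Hello, <b>world</b>!</p>") returns "Hello, world!". Tags are replaced with an empty string, as in the original, so adjacent block elements such as <p>a</p><p>b</p> run together as "ab".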
Finally, print the cleaned-up text to the console:
Console.WriteLine(result);
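The printed text can still contain HTML character entities such as &amp; or &nbsp;. One way to turn them into ordinary characters is System.Net.WebUtility.HtmlDecode. Apply it after the tags are stripped, because decoding first would turn &lt;b&gt; into a literal <b> that the tag pattern would then delete. HtmlEntities.Decode is a hypothetical helper and is not part of the original walkthrough.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlEntities
{
    // Hypothetical helper: replaces entities such as &amp; and &nbsp; with the
    // characters they stand for. Call it only after the tags have been stripped.
    public static string Decode(string text)
    {
        string decoded = WebUtility.HtmlDecode(text);
        // &nbsp; decodes to U+00A0, which \s matches, so it is collapsed as well.
        return Regex.Replace(decoded, "\\s+", " ").Trim();
    }
}
```

Used as result = HtmlEntities.Decode(result); just before the final Console.WriteLine, this turns "Fish &amp; chips" into "Fish & chips".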