This article will explain how to:

- Use a WebBrowser control in a Windows form
- Navigate to a web site in the WebBrowser control
- Access the HTML document in the WebBrowser control
- Use the Document Object Model to find data in a table in the WebBrowser control
- Create a web page from the data and show that web page in the WebBrowser control

Library of Congress Web Site

The sample for this article gets (scrapes) bibliographic (book) data from a page of the United States Library of Congress (LOC) catalog. To understand the sample, it helps to understand the web site pages being scraped. Go to the Library of Congress Online Catalog and do a search. For example, in the search text enter:

ksub "c#" not music

Then select "Expert Search" for the search type, and select one of the books in the results. If you do a search that results in just one book (such as an ISBN search), you will see the book data directly without the list of books. On the first page of the book data, there are tabs for "Brief Record", "Subjects/Content", "Full Record" and "MARC Tags". Click on the MARC Tags tab. The page that appears contains the data that this sample will scrape.

Note that we are getting the data from that web site for the purposes of this sample, but this is not the best way to get book data from the LOC using software; a better interface would be Z39.50. See the Library of Congress WWW/Z39.50 Gateway for more about Z39.50.

Overview

Finding data in a web page is often not easy. First we must find the data ourselves by viewing the page and the HTML, and then we must design a way for the program to find the data. Each web page is different, but often the elements have an id or a name that makes things easier. We can use the HtmlDocument.GetElementById method to find an element by id, or the HtmlElementCollection.GetElementsByName method to find elements by name.
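As a rough illustration of looking up elements by id or by name, the following minimal sketch hosts a WebBrowser control in a form, navigates to a page, and queries the document once loading has finished. The address, the id "results" and the name "Search_Arg" are placeholders assumed for the example; they are not taken from the LOC pages, and the form class itself is hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical form used only for this sketch.
public class LookupForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };

    public LookupForm()
    {
        Controls.Add(browser);
        // The document is only safe to inspect after it has loaded.
        browser.DocumentCompleted += OnDocumentCompleted;
        // Placeholder address (an assumption); substitute the page to scrape.
        browser.Navigate("https://example.com/record");
    }

    // Note: this event can fire more than once for pages that contain frames.
    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null) return;

        // Lookup by id; "results" is an assumed id.
        HtmlElement byId = doc.GetElementById("results");
        if (byId != null)
            Debug.WriteLine("Found by id: " + byId.TagName);

        // Lookup by name; "Search_Arg" is an assumed name.
        foreach (HtmlElement el in doc.All.GetElementsByName("Search_Arg"))
            Debug.WriteLine("Found by name: " + el.GetAttribute("value"));
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new LookupForm());
    }
}
```

GetElementById returns null when no element matches, so the result is checked before use. GetElementsByName returns a collection, which is simply empty when nothing matches.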
Sometimes we must find an element by iterating over (going to each next element of) the elements that precede it.

The Document Object Model (DOM) is a standard way to represent HTML in programs. Dynamic HTML (DHTML) is similar. You can learn more about each in About the W3C Document Object Model.

HTML Classes in the Forms Namespace and in mshtml

The System.Windows.Forms namespace has a few classes that help with the use of HTML in a WebBrowser control. The following table summarizes those classes.
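Returning to the iteration approach mentioned in the Overview, the sketch below shows one possible way to walk the rows and cells of the tables in a loaded document using the Windows Forms HTML classes. The helper class and method names are hypothetical, and the code assumes it is given the HtmlDocument of a page that has already finished loading (for example, from the WebBrowser control's Document property inside a DocumentCompleted handler).

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper used only for this sketch.
public static class TableScraper
{
    // Collects the text of each TD/TH cell, one list per TR element
    // found anywhere in the document. Rows without cells are skipped.
    public static List<List<string>> ReadRows(HtmlDocument doc)
    {
        var rows = new List<List<string>>();
        foreach (HtmlElement row in doc.GetElementsByTagName("TR"))
        {
            var cells = new List<string>();
            // Children holds only the direct child elements of the row.
            foreach (HtmlElement child in row.Children)
            {
                bool isCell =
                    string.Equals(child.TagName, "TD", StringComparison.OrdinalIgnoreCase) ||
                    string.Equals(child.TagName, "TH", StringComparison.OrdinalIgnoreCase);
                if (isCell)
                    cells.Add((child.InnerText ?? string.Empty).Trim());
            }
            if (cells.Count > 0)
                rows.Add(cells);
        }
        return rows;
    }
}
```

A real scraper would usually narrow the search to one specific table first, since this version gathers rows from every table on the page.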