Working with XML

This article has been excerpted from the book "A Programmer's Guide to ADO.NET in C#".

The programming world is moving more and more toward the Web, and Extensible Markup Language (XML) is an essential part of Web-based programming.

This article begins with basic definitions of Hypertext Markup Language (HTML), XML, and other Web-related technologies, continuing in coming articles. Then you'll take a look at the .NET Framework Library namespaces and classes that provide XML functionality in the .NET Framework.

I'll explain how to read, write and navigate XML documents using XML and Document Object Model (DOM) .NET classes. I'll also discuss XML transformations. This article also covers the relationship between ADO.NET and XML and shows how to mix them up and use rich ADO.NET database components to display and manipulate XML data. At the end of this article I'll cover the XPathNavigator class, which you can use to navigate through XML documents.

Defining XML-Related Terminology

The combination of the ADO.NET and XML .NET Framework application programming interfaces (APIs) provides a unified way to work with XML in the Microsoft .NET Framework. There are two ways to represent data using XML: in a tagged-text format, a metalanguage similar to HTML, and in a relational table format. You use ADO.NET to access relational table formats, and you use DOM to access the text format.

Before talking about the role of XML in the .NET Framework and how to work with it, it's important you understand the basic building blocks of XML and its related terminology. You'll learn the basic definitions of Standard Generalized Markup Language (SGML) and HTML in the following sections. If you're already familiar with these languages, you can skip to the "XML Overview" section.

Standard Generalized Markup Language (SGML)

In 1986, Standard Generalized Markup Language (SGML) became the international standard for representing electronic documents in a unified way. SGML provides a standard format for designing your own markup schemes. Markup is a way to represent some information about data.

Later, Hypertext Markup Language (HTML) became the international standard for representing documents on the Web in a unified way.

Hypertext Markup Language (HTML)

The HTML file format is a text format that relies heavily on markup tags. A tag is a piece of text that starts with < and ends with >, such as <name>. (An element consists of a pair of tags, starting with <name> and ending with </name>.) The language defines all of the markup tags. All browsers support HTML tags, which tell a browser how to display the text of an HTML document. You can create an HTML file using a simple text editor such as Notepad. After typing text in a text editor, you save the file with an .htm or .html extension.

Note: An HTML document is also called an HTML page or an HTML file.
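
The difference between a tag and an element can be seen by letting a parser report each tag it meets. The following is a minimal sketch, assuming Python and its standard html.parser module; neither is part of the original text, and the names TagLister and list_tags are hypothetical helpers introduced only for illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records every opening and closing tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for an opening tag such as <name>
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for a closing tag such as </name>
        self.events.append(("end", tag))


def list_tags(markup):
    """Return the (kind, tag name) pairs found in a piece of markup."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.events


# One element: an opening tag, some text, and the matching closing tag
print(list_tags("<name>Some text</name>"))
```

The single element in the sample produces two events, one for the opening tag and one for the closing tag; the text between them belongs to the element but is not itself a tag.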

Listing 6-1 shows an example of an HTML file. Type the following in a text editor and save it as myfile.htm.

Listing 6-1. A simple HTML file


<html>
<head>
    <title>A Test HTML Page</title>
</head>
<body>
    Here is the body part.
</body>
</html>

If you view this file in a browser, you'll see the text Here is the body part. In Listing 6-1, your HTML file starts with the <html> tag and ends with the </html> tag. The <html> tag tells a browser that this is the starting point of an HTML document, and the </html> tag tells it that this is the ending point. These tags are required in all HTML documents. The <head> tag contains header information about the document, which is not displayed in the browser window. The <body> and </body> tags, which are also required, enclose the main content of the document. As you can see, each element ends with a closing tag of the form </name>.
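
To make the split between header information and displayed content concrete, the sketch below separates the title from the body text of Listing 6-1. It assumes Python's standard html.parser module, which is not part of the original text; TitleBodySplitter and split_title_and_body are hypothetical names used only for this illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser

# The markup of Listing 6-1, repeated here so the example is self-contained
LISTING_6_1 = """<html>
<head>
    <title>A Test HTML Page</title>
</head>
<body>
    Here is the body part.
</body>
</html>"""


class TitleBodySplitter(HTMLParser):
    """Collects text inside <title> separately from text inside <body>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags that are currently open
        self.title = ""
        self.body_text = ""

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the most recent opening tag
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if "title" in self.open_tags:
            self.title += data
        elif "body" in self.open_tags:
            self.body_text += data


def split_title_and_body(markup):
    """Return (title, body text) with surrounding whitespace removed."""
    parser = TitleBodySplitter()
    parser.feed(markup)
    parser.close()
    return parser.title.strip(), parser.body_text.strip()


title, body = split_title_and_body(LISTING_6_1)
print("Title:", title)
print("Body: ", body)
```

Only the second value corresponds to what appears in the browser window; the title belongs to the header information.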

Note: HTML tags are not case sensitive. However, the World Wide Web Consortium (W3C) recommends using lowercase tags in HTML 4. The next generation of HTML, XHTML, doesn't support uppercase tags. (The W3C promotes the Web worldwide and works to make it more useful. You can find more information on the W3C at http://www.w3c.org.)
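
Case insensitivity is easy to observe with an HTML parser. The sketch below assumes Python's standard html.parser module (not part of the original text), which reports tag names in lowercase regardless of how they were typed; TagNameCollector and start_tag_names are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Collects the names of opening tags as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name already converted to lowercase
        self.names.append(tag)


def start_tag_names(markup):
    """Return the opening tag names found in the markup, in order."""
    parser = TagNameCollector()
    parser.feed(markup)
    parser.close()
    return parser.names


upper = start_tag_names("<BODY><P>Hello</P></BODY>")
lower = start_tag_names("<body><p>Hello</p></body>")
print(upper == lower)
```

Both spellings yield the same tag names, which is why an HTML 4 browser treats <BODY> and <body> alike.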

Tags can have attributes, which provide additional information about the tags. Attributes are part of the starting tag. For example:

<table border="0">

In this example, the <table> tag has an attribute named border whose value is 0. This value applies to the entire table element, up to the closing </table> tag. Table 6-1 describes some common HTML tags.
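
A parser sees an attribute as a name and value pair attached to the starting tag. The sketch below assumes Python's standard html.parser module, which is not part of the original text; AttributeCollector and read_attributes are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Records each opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.found.append((tag, dict(attrs)))


def read_attributes(markup):
    """Return (tag name, attribute dictionary) for every opening tag."""
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


print(read_attributes('<table border="0"></table>'))
```

The closing </table> tag carries no attributes, so only the starting tag contributes an entry.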

Table 6-1: Common HTML Tags


TAG             DESCRIPTION
<html>          Indicates the start and end of an HTML document
<title>         Contains the title of the page
<body>          Contains the main content, or body, of the page
<h1>...<h6>     Creates headings (from level 1 to 6)
<p>             Starts a new paragraph
<br>            Inserts a single line break
<hr>            Defines a horizontal rule
<!-- -->        Defines a comment in a document
<b>             Defines bold text
<i>             Defines italic text
<strong>        Defines strong text
<table>         Defines a table
<tr>            Defines a row of a table
<td>            Defines a cell of a table row
<font>          Defines a font name and size


There are more tags beyond those described in Table 6-1. In fact, the W3C's HTML 4 specification is quite extensive. However, discussing all of the HTML tags is beyond the scope of this article. Before moving to the next topic, you'll take a look at one more HTML example using the tags discussed in the table. Listing 6-2 shows you another HTML document example.

Listing 6-2: HTML tags and their usage


<html>
<head>
    <title>A Test HTML Page</title>
</head>
<!-- This is a comment -->
<body>
    <h1>Heading 1</h1>
    <h2>Heading 2</h2>
    <p>
        <b><i><font size="4">Bold and Italic Text.</font></i></b>
    </p>
    <table border="1" width="43%">
        <tr>
            <td width="50%">
                Row1, Column1
            </td>
            <td width="50%">
                Row1, Column2
            </td>
        </tr>
        <tr>
            <td width="50%">
                Row2, Column1
            </td>
            <td width="50%">
                Row2, Column2
            </td>
        </tr>
    </table>
</body>
</html>

Note: In Listing 6-2, the <font> and <td> tags contain size and width attributes, respectively. The size attribute tells the browser which font size to use, which is 4 in this example, and the width attribute tells the browser to display each table cell at 50 percent of the table's width.
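
The row and cell structure of the table in Listing 6-2, along with each cell's width attribute, can be recovered with a few lines of parsing code. This is a minimal sketch, assuming Python's standard html.parser module (not part of the original text); TableReader and read_table are hypothetical names, and the sketch handles only simple, non-nested tables like the one in the listing.

```python
from html.parser import HTMLParser

# The table from Listing 6-2, repeated so the example is self-contained
TABLE_FROM_LISTING_6_2 = """<table border="1" width="43%">
    <tr>
        <td width="50%">Row1, Column1</td>
        <td width="50%">Row1, Column2</td>
    </tr>
    <tr>
        <td width="50%">Row2, Column1</td>
        <td width="50%">Row2, Column2</td>
    </tr>
</table>"""


class TableReader(HTMLParser):
    """Collects table cells row by row as (width, text) pairs."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False
        self.cell_width = None
        self.cell_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # each <tr> starts a new row
        elif tag == "td":
            self.in_cell = True
            self.cell_width = dict(attrs).get("width")
            self.cell_text = ""

    def handle_data(self, data):
        if self.in_cell:
            self.cell_text += data

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            if not self.rows:             # tolerate a cell with no <tr>
                self.rows.append([])
            self.rows[-1].append((self.cell_width, self.cell_text.strip()))
            self.in_cell = False


def read_table(markup):
    """Return a list of rows, each a list of (width, text) cells."""
    parser = TableReader()
    parser.feed(markup)
    parser.close()
    return parser.rows


for row in read_table(TABLE_FROM_LISTING_6_2):
    print(row)
```

Each <tr> becomes one inner list and each <td> one pair, so the two rows of two cells in the listing come back as a 2-by-2 structure with the width value attached to every cell.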

Conclusion

I hope this article has helped you understand Hypertext Markup Language (HTML), XML, and other Web-related technologies. See the other articles on this website for further reference.

This essential guide to Microsoft's ADO.NET overviews C#, then leads you toward a deeper understanding of ADO.NET.
