XML Overview

This article has been excerpted from the book "A Programmer's Guide to ADO.NET in C#".

I'll now cover some XML-related terminology. So what exactly is XML? XML stands for Extensible Markup Language. It's a member of the SGML family and, like HTML, a markup language built from tags. If you've ever gotten your hands dirty with HTML, then XML will be a piece of cake.

Essentially, XML goes beyond HTML in power and flexibility. You don't have to work with a limited number of tags as you do in HTML. You can define your own tags, and you can store your data in a structured format.

Unlike HTML, XML stores and exchanges data. By contrast, HTML presents the data. You can create separate XML files to store data, which can be used as a data source for HTML and other applications.

You'll now see an XML example. Listing 6-3 shows a simple XML file: books.xml. By default, this file comes with Visual Studio .NET (VS .NET); if you have VS .NET or the .NET Framework installed on your machine, you probably have this file in your samples folder.

You'll create an XML file called books.xml, which will store data about books in a bookstore. You'll create a tag for each property of a book, such as a <title> tag that will store the title of the book and so on.

You can write an XML file in any XML editor or text editor. Type the code shown in Listing 6-3 and save the file as books.xml.

This file stores information about a bookstore. The root node of the document is <bookstore>. Other tags follow the <bookstore> tag, and the document ends with the </bookstore> tag. Other tags defined inside the <bookstore> tag are <book>, <title>, <author>, and <price>. Together, these tags store each book's title, author's name, and price.

Listing 6-3. Your first XML file sample


<?xml version ='1.0'?>
<bookstore>
  <book>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <firstname>Benjamin</firstname>
      <lastname>Franklin</lastname>
    </author>
    <price>8.99</price>
  </book>
  <book>
    <title>The Confidence Man</title>
    <author>
      <firstname>Herman</firstname>
      <lastname>Melville</lastname>
    </author>
    <price>11.99</price>
  </book>
  <book>
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>


The first line of an XML file looks like this: <?xml version='1.0'?>. This line is the XML declaration; it identifies the file as XML and states the XML version of the document. It doesn't execute anything, and because it isn't an element, it has no closing tag. As in HTML, every other element begins with a start tag such as <title> and ends with a matching end tag such as </title>. For example, the <title> tag stores the book's title like this: <title>The Gorgias</title>.

In Listing 6-3, <bookstore> is the root node. Every XML document must start with the root node's starting tag and end with the root node's ending tag; otherwise the XML parser gives an error. (I'll discuss XML parsers shortly.)
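The root-node rule can be tried out with a few lines of code. The following is a minimal sketch in Python, not code from the book (which targets C#); it uses the standard library's xml.etree.ElementTree module on a shortened copy of Listing 6-3. The names BOOKS_XML and has_single_root are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Listing 6-3 (two books), embedded so the sketch is self-contained.
BOOKS_XML = """<?xml version='1.0'?>
<bookstore>
  <book>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <firstname>Benjamin</firstname>
      <lastname>Franklin</lastname>
    </author>
    <price>8.99</price>
  </book>
  <book>
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>"""

# Parsing returns the root element of the document.
root = ET.fromstring(BOOKS_XML)
root_tag = root.tag

# Walk the <book> children of the root and pull out two of their properties.
titles = [book.findtext("title") for book in root.findall("book")]
prices = [float(book.findtext("price")) for book in root.findall("book")]
print(root_tag, titles, prices)


def has_single_root(text):
    """Hypothetical helper: True if the parser accepts the text as one document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A document with two top-level elements has no single root and is rejected.
print(has_single_root("<bookstore/>"))
print(has_single_root("<bookstore/><bookstore/>"))
```

The parser hands back the root element first, and everything else in the document is reached by navigating down from it.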

Now, if you view this document in a browser, the output looks like Listing 6-4.

Listing 6-4. Output of books.xml in the browser

xmlOutput.gif

Your browser recognizes the XML and colors it appropriately.

Important Characteristics of XML

There are a few things you need to know about XML. Unlike HTML, XML is case sensitive. In XML, <Books> and <books> are two different tags. Every XML document must be well formed, and every opening tag must have a matching closing tag. A document is well formed only if it follows the syntax rules of the language exactly as they are defined.

Improperly nested tags in XML will keep the document from being parsed properly. For example:

<b><i>Bold and Italic Text.</b></i>

is not well-formed. The well-formed version of the same code is this:

<b><i>Bold and Italic Text.</i></b>
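The well-formedness rules described so far (case sensitivity, closing tags, and proper nesting) can be checked mechanically. Here is a small Python sketch, assuming the standard library's xml.etree.ElementTree parser stands in for the browser's parser; the helper name check and the sample labels are illustrative only.

```python
import xml.etree.ElementTree as ET


def check(fragment):
    """Return 'well-formed' or the parser's complaint for an XML fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as err:
        return f"not well-formed ({err})"
    return "well-formed"


samples = [
    ("proper nesting", "<b><i>Bold and Italic Text.</i></b>"),
    ("crossed nesting", "<b><i>Bold and Italic Text.</b></i>"),
    ("case mismatch", "<Books><book/></books>"),
    ("missing closing tag", "<book><title>The Gorgias</book>"),
]

# Run every sample through the parser and collect the verdicts.
results = {label: check(fragment) for label, fragment in samples}
for label, verdict in results.items():
    print(f"{label}: {verdict}")
```

Only the first sample is accepted; the other three break one rule each and are rejected before any data can be read from them.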

Another difference between HTML and XML is that attribute values must be enclosed in quotes in XML. The listings here use double quotes, although XML also accepts single quotes. Attributes work like HTML attributes: they are extra information you can add to a tag. Leaving an attribute value unquoted is an error in XML. For example, Listing 6-5 correctly uses the attributes genre, publicationdate, and ISBN inside the <book> tag.

Listing 6-5. Attributes in XML files


<?xml version ='1.0'?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore>
  <book genre = "autobiography" publicationdate = "1981" ISBN ="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <firstname>Benjamin</firstname>
      <lastname>Franklin</lastname>
    </author>
    <price>8.99</price>
  </book>
</bookstore>


Output of the above code:

xmlOutput2.gif

The genre, publicationdate, and ISBN attributes store information about the category, publication date, and ISBN of the book, respectively. Browsers won't have a problem parsing the code in Listing 6-5, but if you remove the quotes from the attribute values like this:

<book genre = autobiography publicationdate = 1981 ISBN =1-861003-11-0>

then the browser will give the error message shown in Figure 6-1.

xmlerror.gif

Figure 6-1. XML attribute definition error message
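The same behavior can be reproduced outside a browser. The Python sketch below, which assumes the standard library's xml.etree.ElementTree parser in place of the browser's, reads the attributes from a trimmed copy of Listing 6-5 and then shows the unquoted variant being rejected. The constant and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Listing 6-5 with quoted attribute values.
QUOTED = """<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
</bookstore>"""

# The same fragment with the quotes removed from the attribute values.
UNQUOTED = """<bookstore>
  <book genre = autobiography publicationdate = 1981 ISBN =1-861003-11-0>
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
</bookstore>"""

# Attributes of an element are exposed as a dictionary of strings.
book = ET.fromstring(QUOTED).find("book")
attributes = dict(book.attrib)
print(attributes)

# Unquoted values are a syntax error, so the parser raises instead of guessing.
try:
    ET.fromstring(UNQUOTED)
    unquoted_accepted = True
except ET.ParseError as err:
    unquoted_accepted = False
    print("parser rejected unquoted attributes:", err)

# Single quotes are also valid delimiters for attribute values.
single_quoted = ET.fromstring("<book genre='autobiography'/>")
print(single_quoted.get("genre"))
```

Note that every attribute value comes back as a string, including the publication date; converting it to a number is left to the application.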

Another construct you might notice in Listing 6-5 is the <!-- ... --> pair, which marks a comment in an XML document.

Unlike HTML, XML preserves spaces, which means you'll see the white space in your document displayed in the browser.
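A quick way to see white-space preservation is to parse an element whose text contains extra spaces. This Python sketch uses the standard library's xml.etree.ElementTree module; the sample element and variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Leading, trailing, and repeated inner spaces are all part of the element's text.
title = ET.fromstring("<title> The   Gorgias </title>")
raw_text = title.text
print(repr(raw_text))

# Trimming is the application's decision, not the parser's.
trimmed = raw_text.strip()
print(repr(trimmed))
```

The parser returns the text exactly as written, so any cleanup such as strip() has to be done explicitly by the code that consumes the data.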

XML Parser

An XML parser is a program that sits between XML documents and the application using them. The job of a parser is to make sure the document meets the defined structure, validation rules, and constraints. You can define validation rules and constraints in a Document Type Definition (DTD) or schema.

An XML parser comes with Internet Explorer (IE) 4 or later and can read XML data, process it, generate a structured tree, and expose all data elements as Document Object Model (DOM) objects. The parser then makes the data available for further manipulation through scripting. After that, another application can handle this data.

The MSXML parser comes with IE 5 or later and resides in the MSXML.DLL library. The MSXML parser supports the W3C XML 1.0 and XML DOM recommendations, DTDs, schemas, and validations. You can use MSXML programmatically from languages such as JavaScript, VBScript, Visual Basic, Perl, and C++.
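The read, build-a-tree, expose-as-DOM sequence is not specific to IE. As a rough sketch, the Python standard library's xml.dom.minidom module (used here in place of the IE parser, which is an assumption of this example) follows the same pattern; the sample document and variable names are illustrative.

```python
from xml.dom import minidom

# Parse the text into a DOM tree.
doc = minidom.parseString(
    "<bookstore><book><title>The Gorgias</title><price>9.99</price></book></bookstore>"
)

# The document element is the root node of the tree.
root = doc.documentElement
print(root.tagName)

# Elements are DOM objects; their text lives in child text nodes.
title_node = doc.getElementsByTagName("title")[0]
title_text = title_node.firstChild.data
print(title_text)

# The tree can be manipulated in memory and serialized back to XML.
price_node = doc.getElementsByTagName("price")[0]
price_node.firstChild.data = "10.99"
updated = root.toxml()
print(updated)
```

Once the tree is built, the application works with nodes rather than raw text, which is what makes the later manipulation step possible.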

Uniform Resource Identifier (URI)

A Uniform Resource Identifier (URI) is the name of a resource available on the Internet. A URI contains three parts: the naming scheme (a protocol used to access the resource), the name of the machine (in the form of an Internet Protocol address or a host name) upon which the resource resides, and the name of the resource (the file name). For example, http://www.csharpcorner.com/Images/cshea1.gif is a URI where http is the scheme (the protocol), www.csharpcorner.com is the address of the machine (which is actually a conceptual name for the address), and Images/cshea1.gif is the location of the file on that machine.
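The three parts can be separated programmatically. A minimal Python sketch, using urlsplit from the standard library's urllib.parse module on the example URI above (the variable names are illustrative):

```python
from urllib.parse import urlsplit

# Split the example URI into its components.
parts = urlsplit("http://www.csharpcorner.com/Images/cshea1.gif")

scheme = parts.scheme    # the protocol used to access the resource
machine = parts.netloc   # the name of the machine hosting the resource
resource = parts.path    # the location of the file on that machine

print(scheme, machine, resource)
```

The path component keeps its leading slash, so it reads /Images/cshea1.gif rather than Images/cshea1.gif.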

Conclusion

I hope this article has helped you understand XML. See the other articles on this website for further reference.

This essential guide to Microsoft's ADO.NET overviews C#, then leads you toward deeper understanding of ADO.NET.
