An XML Document and its Items

This article has been excerpted from the book "A Programmer's Guide to ADO.NET in C#".

An XML document is a set of elements in a standard format. As mentioned earlier, a document is well formed if it contains one or more elements and follows the syntax rules of the language exactly. A document is valid if it has a DTD associated with it and complies with that DTD. An XML parser will parse only a well-formed document, but the document doesn't necessarily have to be valid. This means that a document must have at least one element (a root element), but it doesn't matter whether it uses a DTD.
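
The difference between well formed and valid can be checked with a non-validating parser. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module, which is not what the book uses; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A single root element and no DTD: well formed, although never validated.
no_dtd = "<StudentRecord><Name>Ann</Name></StudentRecord>"

# The <Name> start tag is never closed, so this is not well formed.
unclosed = "<StudentRecord><Name>Ann</StudentRecord>"

# No element at all, so there is no root element.
no_root = "just some text"

results = [is_well_formed(no_dtd), is_well_formed(unclosed), is_well_formed(no_root)]
```

Only the first sample is accepted, even though it has no DTD. This parser checks well-formedness only, so it cannot tell you whether a document is valid.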

An XML document has the following parts, each described in the sections that follow:

  • Prolog
  • DOCTYPE declaration
  • Start and end tags
  • Comments
  • Character and entity references
  • Empty elements
  • Processing instructions
  • CDATA section
  • Attributes
  • White spaces

Prolog

The prolog of a document appears before the root tag, and its information applies to the entire document. It can contain the character encoding, stylesheet references, a DOCTYPE declaration, comments, and processing instructions. This is an example of a prolog:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">
<!-- my comments -->
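
To see which items a parser finds in a prolog, you can walk the nodes that precede the root element. This sketch assumes Python's standard xml.dom.minidom module and appends a hypothetical empty <StudentRecord/> root so that the prolog above becomes a complete document.

```python
from xml.dom import minidom

PROLOG_DOC = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/xsl" href="books.xsl"?>\n'
    '<!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">\n'
    '<!-- my comments -->\n'
    '<StudentRecord/>'
)

doc = minidom.parseString(PROLOG_DOC)

# Everything that comes before the root element belongs to the prolog.
prolog_kinds = []
for node in doc.childNodes:
    if node is doc.documentElement:
        break
    prolog_kinds.append(type(node).__name__)

root_name = doc.documentElement.tagName
```

With this parser, the XML declaration itself does not show up as a node; only the stylesheet instruction, the DOCTYPE declaration, and the comment do.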

DOCTYPE Declaration

With the help of a DOCTYPE declaration, you can read the structure of your root element and DTD from external files. A DOCTYPE declaration names the root element and can reference a DTD (used for document validation). In a validating environment, a DOCTYPE declaration is a must. The DTD reference can even be a URI. For example:


<!DOCTYPE rootElement>

or

<!DOCTYPE rootElement SYSTEM "URIreference">

or

<!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">
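
A program can read the root element name and the DTD reference back from a DOCTYPE declaration. The sketch below again assumes Python's xml.dom.minidom; the helper name doctype_info is hypothetical, and each sample adds an empty root element so that it parses as a full document.

```python
from xml.dom import minidom


def doctype_info(text):
    """Return (root element name, system identifier) from the DOCTYPE, or None."""
    doctype = minidom.parseString(text).doctype
    if doctype is None:
        return None
    return (doctype.name, doctype.systemId)


with_dtd = doctype_info('<!DOCTYPE StudentRecord SYSTEM "mydtd.dtd"><StudentRecord/>')
root_only = doctype_info("<!DOCTYPE rootElement><rootElement/>")
no_doctype = doctype_info("<rootElement/>")
```

Note that minidom only records the reference to mydtd.dtd. It does not fetch the file or validate the document against it.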

Start and End tags

Start and end tags are the heart of the XML language. As mentioned earlier in the article, XML is nothing but a text file with start and end tags. Each element starts with a start tag, <TAG>, and ends with an end tag, </TAG>. If you want to add a tag called <book> to your XML file, it must start with <book> and end with </book>, as shown in this example:


<?xml version="1.0"?>
<book xmlns="http://www.c-sharpcorner.com/xmlNet">
  <title>The Autobiography of Benjamin Franklin</title>
  <author>
    <first-name>Benjamin</first-name>
    <last-name>Franklin</last-name>
  </author>
  <price>8.99</price>
</book>

Note: Empty elements don't have to follow this <tag>...</tag> pattern. I'll discuss empty tags later in the "Empty Elements" section.

Note: An element is another name for a start and end tag pair and the content between them.
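
The matching of start and end tags is easy to try out. The following sketch assumes Python's xml.etree.ElementTree and the book document shown above; the prefix b in the NS mapping is an arbitrary name chosen for this example.

```python
import xml.etree.ElementTree as ET

BOOK = """<book xmlns="http://www.c-sharpcorner.com/xmlNet">
  <title>The Autobiography of Benjamin Franklin</title>
  <author>
    <first-name>Benjamin</first-name>
    <last-name>Franklin</last-name>
  </author>
  <price>8.99</price>
</book>"""

# Map a prefix of our choosing to the default namespace declared on <book>.
NS = {"b": "http://www.c-sharpcorner.com/xmlNet"}

book = ET.fromstring(BOOK)
title = book.find("b:title", NS).text
last_name = book.find("b:author/b:last-name", NS).text
price = float(book.find("b:price", NS).text)

# Tag names are case sensitive, so </First-name> does not close <first-name>.
try:
    ET.fromstring("<first-name>Benjamin</First-name>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
```

Because the end tag must repeat the start tag's name exactly, a difference in case is enough for the parser to reject the document.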

Comments

Using comments in your code is good programming practice. Comments explain certain lines of code, which helps you and others understand it. You use the <!-- and --> pair to write comments in an XML document:


<!-- My comments here -->

<!-- This is a comment -->

XML parsers ignore comments.
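
As a quick check, the sketch below parses a small document with Python's xml.etree.ElementTree (an assumption of this example; the <book> content is made up). With its default settings, this parser leaves comments out of the resulting tree, although other APIs can expose them.

```python
import xml.etree.ElementTree as ET

DOC = """<book>
  <!-- My comments here -->
  <title>XML Basics</title>
  <!-- This is a comment -->
</book>"""

root = ET.fromstring(DOC)

# Only real elements end up as children; the two comments are dropped.
child_tags = [child.tag for child in root]
```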

CDATA Sections

What if you want to use < and > characters in your XML file but not as part of a tag? You can't use them directly because the XML parser will interpret them as the start and end of tags. CDATA sections provide the solution: they let you use XML markup characters in your documents and have the XML parser treat them as plain text. If you use the following line:


<![CDATA[I want to use < and >, characters]]>

the parser will treat those characters as data.

Another good example of CDATA is the following:

<![CDATA[<Title>This is the title of a page</Title>]]>

In this case, the parser will treat the <Title> tag as data, not as a markup tag.
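
The effect of a CDATA section can be confirmed by parsing it. This sketch assumes Python's xml.etree.ElementTree and wraps each CDATA section in a hypothetical parent element (<note> and <page>), because a CDATA section must appear inside an element.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring("<note><![CDATA[I want to use < and >, characters]]></note>")
second = ET.fromstring(
    "<page><![CDATA[<Title>This is the title of a page</Title>]]></page>"
)

# The markup characters arrive as ordinary text.
first_text = first.text
second_text = second.text

# <Title> was never parsed as an element, so <page> has no children.
second_child_count = len(second)
```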

Character and entity references

In some cases, you can't use a character directly in a document because of some limitation, such as the character being treated as a markup character, or a device or processor limitation.

By using character and entity references, you can include a character in a document by reference rather than by typing the character itself.

A character reference is the numeric code of a character, written in decimal or hexadecimal. You use the hash symbol (#) before the value, plus an x before a hexadecimal value. The XML parser takes care of the rest. For example, the character reference for the Return key is &#x000d;.

A reference starts with an ampersand (&) and a #, and it ends with a semicolon (;). The syntax for decimal and hexadecimal references is &#value; and &#xvalue;, respectively. XML also has some built-in entities. Use the lt, gt, and amp entities for less than, greater than, and ampersand, respectively. Table 6-2 shows the five XML built-in entities and their references. For example, if you want to write a > b or Jack & Jill, you can do that by using these entities:


a &gt; b and Jack &amp; Jill


Table 6-2. XML Built-in Entities

ENTITY    REFERENCE    DESCRIPTION
lt        &lt;         Less than: <
gt        &gt;         Greater than: >
amp       &amp;        Ampersand: &
apos      &apos;       Single quote: '
quot      &quot;       Double quote: "
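
In practice you rarely type these references by hand. The sketch below assumes Python's standard library: xml.sax.saxutils.escape produces the references for &, <, and >, and xml.etree.ElementTree resolves them again when parsing. The <text> wrapper element is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > with the built-in entity references.
escaped = escape("a > b and Jack & Jill")

# The parser turns the references back into the original characters.
parsed = ET.fromstring("<text>" + escaped + "</text>").text

# Decimal and hexadecimal character references for the same character.
letters = ET.fromstring("<text>&#65;&#x41;</text>").text
```

Here escaped holds "a &gt; b and Jack &amp; Jill", parsed holds the original string again, and both character references resolve to the letter A.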

Empty elements

An empty element has no content. You can write it as a start tag immediately followed by its end tag, or as a single tag that starts with < and ends with />. The text between these two symbols is the element name and any attributes. For example:


<Name></Name>
<IMG SRC="img.jpg" />
<tagname/>

are all examples of empty elements. The <IMG> tag specifies an inline image, and the SRC attribute specifies the image's location. The image can be in any format, though browsers generally support only GIF, JPEG, and PNG images.
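
A parser treats the long and short spellings of an empty element the same way. This is a small sketch that assumes Python's xml.etree.ElementTree; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

pair = ET.fromstring("<Name></Name>")
short = ET.fromstring("<Name/>")
image = ET.fromstring('<IMG SRC="img.jpg" />')

# Neither form has text or child elements.
both_empty = (
    pair.text is None and short.text is None and len(pair) == 0 and len(short) == 0
)

# An empty element can still carry attributes.
image_source = image.get("SRC")

# Both spellings serialize the same way.
same_output = ET.tostring(pair) == ET.tostring(short)
```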

Processing Instructions

Processing instructions (PIs) play a vital role in XML parsing. A PI holds instructions that are read by the parser and other programs. If you look at the first line of any of the XML samples discussed earlier, you'll notice that it uses PI syntax:


<?xml version="1.0"?>

All PIs start with <? and end with ?>. This is another example of a PI:


<?xml-stylesheet type="text/xsl" href="myxsl.xsl"?>

This PI tells a parser to apply a stylesheet to the document.
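
A program that processes the document can read each PI as a target name plus a data string. The sketch below assumes Python's xml.dom.minidom and a hypothetical empty <books/> root element.

```python
from xml.dom import minidom

DOC = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/xsl" href="myxsl.xsl"?>\n'
    '<books/>'
)

doc = minidom.parseString(DOC)

# Collect (target, data) for each processing instruction node.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
```

With this parser, only the xml-stylesheet instruction is reported. The first line of the document is consumed by the parser itself and does not appear as a PI node.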

Attributes

Attributes let you add extra information to an element without creating another element. An attribute is a name and value pair, and both the name and the value must be present. The attribute value must be enclosed in quotes (double quotes, as in the examples here, or single quotes); otherwise the parser will report an error. Listing 6-8 is an example of attributes in a <table> tag. In the example, the <table> tag has border and width attributes, and each <td> tag has a width attribute.

Listing 6-8. Attributes in the <table> tag


<table border="1" width="43%">
  <tr>
    <td width="50%">Row1, Column1</td>
    <td width="50%">Row1, Column2</td>
  </tr>
  <tr>
    <td width="50%">Row2, Column1</td>
    <td width="50%">Row2, Column2</td>
  </tr>
</table>
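
Reading attributes back is straightforward. This sketch assumes Python's xml.etree.ElementTree and uses a shortened, one-row version of the listing above.

```python
import xml.etree.ElementTree as ET

TABLE = """<table border="1" width="43%">
  <tr>
    <td width="50%">Row1, Column1</td>
    <td width="50%">Row1, Column2</td>
  </tr>
</table>"""

table = ET.fromstring(TABLE)

# Attributes are exposed as name and value pairs.
table_attributes = dict(table.attrib)
cell_widths = [td.get("width") for td in table.iter("td")]

# A value without quotes is a well-formedness error.
try:
    ET.fromstring("<table border=1/>")
    unquoted_rejected = False
except ET.ParseError:
    unquoted_rejected = True
```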

White spaces

XML preserves white space except in attribute values. That means white space in your document is kept when the document is displayed or processed. However, white space is not allowed before the XML declaration. The XML parser reports all white space in the document to the application. If white space appears before the declaration, the parser treats the declaration as an ordinary PI and reports an error.

For elements, the XML 1.0 standard defines the xml:space attribute to control how white space in a document is handled. The xml:space attribute accepts only two values: default and preserve. The default value is the same as not specifying an xml:space attribute; it lets the application treat white space as it would in a normal document. The preserve value tells the application to preserve white space in the element. The parser preserves spaces in attribute values, but it converts line breaks into single spaces.
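
These white space rules can be observed directly. The following sketch assumes Python's xml.etree.ElementTree; the <line> element and its note attribute are made up for this example.

```python
import xml.etree.ElementTree as ET

# White space inside element content reaches the application unchanged.
content = ET.fromstring("<line>  two  spaces  </line>").text

# A line break inside an attribute value is converted to a single space.
attribute = ET.fromstring('<line note="first\nsecond"/>').get("note")

# White space in front of the XML declaration is an error.
try:
    ET.fromstring(' <?xml version="1.0"?><line/>')
    leading_space_rejected = False
except ET.ParseError:
    leading_space_rejected = True
```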

Conclusion


I hope this article has helped you understand XML documents and their items. See the other articles on this website for further reference.


This essential guide to Microsoft's ADO.NET overviews C#, then leads you toward deeper understanding of ADO.NET.
