Easily Find Tags and Values in a Large XML Document Using XmlTextReader in C#

Liju Gopalan

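Use XmlTextReader to parse large XML documents. XmlTextReader provides fast, forward-only, non-cached access to the data, so it never holds the whole document in memory.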

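// Requires: using System.Xml; richTextBox1 is a RichTextBox on the form.
public void findAParticularNodesUsingTextReader()
{
    // XmlTextReader streams the file forward-only; the document is never fully loaded.
    using (XmlTextReader txtreaderObj = new XmlTextReader(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml"))
    {
        txtreaderObj.WhitespaceHandling = WhitespaceHandling.None;
        while (txtreaderObj.Read())
        {
            // Match only the <TotalPrice> start tag (the end tag has the same Name).
            if (txtreaderObj.Name.Equals("TotalPrice") && txtreaderObj.IsStartElement())
            {
                txtreaderObj.Read();   // move to the element's text node
                richTextBox1.AppendText(txtreaderObj.Value + " ");
            }
        }
    }
}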

Result

12.36 11.99 7.97
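The method above needs a file on disk and a richTextBox1 control. Below is a minimal self-contained sketch of the same streaming approach that reads from a string instead, so it can run anywhere. The class TotalPriceStreamer, the method ReadTotalPrices and the sample XML are hypothetical; the document structure is assumed from the /Orders/Order/TotalPrice path used later in this article. Instead of Read() followed by Value, it calls ReadElementContentAsString(), which returns the element's text and moves past its end tag.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: streams an XML string and collects the text of every
// <TotalPrice> element without building an in-memory tree.
public static class TotalPriceStreamer
{
    public static List<string> ReadTotalPrices(string xml)
    {
        var prices = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            // Loop on EOF rather than on Read(): ReadElementContentAsString already
            // advances the reader, so calling Read() after it could skip a sibling.
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "TotalPrice")
                {
                    prices.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return prices;
    }
}
```

To read a file instead, pass a path to the XmlTextReader constructor in place of the StringReader.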

For fast, read-only, query-based access to the data, use XPathDocument and XPathNavigator with an XPath query. Note that XPathDocument loads the entire document into memory.

// Requires: using System.Xml.XPath;
public void FindTagsUsingXPathNavigatorAndXPathDocumentNew()
{
    // XPathDocument loads the whole file into a read-only tree optimized for XPath.
    XPathDocument xpDoc = new XPathDocument(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml");
    XPathNavigator xpNav = xpDoc.CreateNavigator();
    // Compile the query once; the XPathExpression can be reused.
    XPathExpression xpExpression = xpNav.Compile(@"/Orders/Order/TotalPrice");
    XPathNodeIterator xpIter = xpNav.Select(xpExpression);
    while (xpIter.MoveNext())
    {
        richTextBox1.AppendText(xpIter.Current.Value + " ");
    }
}

Result

12.36 11.99 7.97
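A self-contained sketch of the XPath approach, again reading from a string; TotalPriceQuery and SelectValues are hypothetical names. The second query in the test shows what XPath adds over a plain reader: the filtering happens inside the expression itself.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper: loads the XML into a read-only XPathDocument and
// returns the string value of every node the expression selects.
public static class TotalPriceQuery
{
    public static List<string> SelectValues(string xml, string xpath)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        XPathExpression expr = nav.Compile(xpath);
        var values = new List<string>();
        XPathNodeIterator it = nav.Select(expr);
        while (it.MoveNext())
        {
            values.Add(it.Current.Value);
        }
        return values;
    }
}
```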

You can also combine XmlReader and XmlDocument. On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.

// Requires: using System.Xml;
public void UseXmlReaderAndXmlDocument()
{
    using (XmlReader RdrObj = XmlReader.Create(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml"))
    {
        while (RdrObj.Read())
        {
            // The NodeType check already restricts the match to start tags.
            if (RdrObj.NodeType == XmlNodeType.Element && RdrObj.Name == "TotalPrice")
            {
                RdrObj.Read();   // move to the element's text node
                richTextBox1.AppendText(RdrObj.Value + " ");
            }
        }
    }
}

Result

12.36 11.99 7.97

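// Requires: using System.Xml;
public void UseXmlReaderAndXmlDocumentNew()
{
    XmlDocument XmlDocObj = new XmlDocument();
    using (XmlReader RdrObj = XmlReader.Create(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml"))
    {
        RdrObj.MoveToContent();   // skip the XML declaration, comments and whitespace before the root
        while (!RdrObj.EOF)
        {
            if (RdrObj.NodeType == XmlNodeType.Element && RdrObj.Name == "TotalPrice")
            {
                // ReadNode() turns just this element into an XmlNode and moves the reader
                // past it, so this branch must not call Read() again.
                XmlNode priceNode = XmlDocObj.ReadNode(RdrObj);
                richTextBox1.AppendText(priceNode.InnerText + " ");
            }
            else
            {
                RdrObj.Read();
            }
        }
    }
}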
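Here is a minimal sketch of MoveToContent and Skip in isolation, assuming orders may contain child elements you do not need (the <Items> and <Note> elements in the test data are hypothetical, as is the OrderSkipper class). MoveToContent jumps over the declaration and comments to the root element, and Skip passes over an entire unwanted subtree in one call.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical example: reads each <Order>'s <TotalPrice> and uses Skip()
// to jump over every other child element, however large its subtree.
public static class OrderSkipper
{
    public static List<string> ReadPrices(string xml)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        var prices = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();             // past the declaration/comments to <Orders>
            reader.ReadStartElement("Orders");
            while (reader.IsStartElement("Order"))
            {
                reader.ReadStartElement("Order");
                while (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.Name == "TotalPrice")
                        prices.Add(reader.ReadElementContentAsString());
                    else
                        reader.Skip();          // skip the whole unwanted subtree
                }
                reader.ReadEndElement();        // consume </Order>
            }
        }
        return prices;
    }
}
```

This sketch assumes every <Order> has at least one child element; an empty <Order/> would need an IsEmptyElement check before ReadStartElement.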

Design Considerations

Avoid XML as much as possible.
Avoid processing large documents.
Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
Avoid DTDs, especially ID attributes and entity references.
Use streaming interfaces such as XmlReader or SAXdotnet.
Consider hard-coded processing, including validation.
Use short element and attribute names.
Consider sharing a NameTable, but only when the names are likely to be common across documents. As more and more unrelated names are added, the table becomes slower and slower.
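As a sketch of NameTable sharing (the class and method names are hypothetical), one NameTable is passed to several readers through XmlReaderSettings, and the element name is atomized once up front. Because a reader returns names from its NameTable, the returned name is the same string instance and can be compared by reference, the pattern described in the NameTable documentation.

```csharp
using System.IO;
using System.Xml;

// Hypothetical example: several readers share one NameTable, so element names
// can be compared by object reference instead of character by character.
public static class SharedNameTableDemo
{
    public static int CountTotalPrices(string[] documents)
    {
        var nameTable = new NameTable();
        object totalPrice = nameTable.Add("TotalPrice");   // atomized once
        var settings = new XmlReaderSettings { NameTable = nameTable };
        int count = 0;
        foreach (string xml in documents)
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read())
                {
                    // Reference comparison: both sides come from the same NameTable.
                    if (reader.NodeType == XmlNodeType.Element && (object)reader.LocalName == totalPrice)
                        count++;
                }
            }
        }
        return count;
    }
}
```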

Parsing XML

Use XmlTextReader to avoid validating readers.
When only one node is required, consider using XmlDocument.ReadNode() instead of loading the entire document with Load().
Set the XmlResolver property to null on XmlReaders that expose it, to avoid accessing external resources.
Make full use of MoveToContent() and Skip(); they avoid creating unnecessary names. However, the benefit nearly disappears when you use XmlValidatingReader.
Avoid accessing Value for Text/CDATA nodes as much as possible.
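A sketch of two of these tips together, using a hypothetical SingleNodeLoader class: the XmlResolver is set to null through XmlReaderSettings, and when the reader reaches the first <Order> element, XmlDocument.ReadNode() builds an XmlNode for that element and its subtree alone.

```csharp
using System.IO;
using System.Xml;

// Hypothetical example: stream to the first <Order> and materialize only that
// element, instead of loading the whole document with Load().
public static class SingleNodeLoader
{
    public static XmlNode ReadFirstOrder(string xml)
    {
        // No resolver: external DTDs and entities are never fetched.
        var settings = new XmlReaderSettings { XmlResolver = null };
        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Order")
                {
                    // Builds a node owned by doc (but not appended to it) for this subtree only.
                    return doc.ReadNode(reader);
                }
            }
        }
        return null;   // no <Order> element found
    }
}
```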

Validating XML

Avoid extraneous validation.
Consider caching schemas.
Avoid identity constraints (xs:key, xs:unique, xs:keyref), not only because they store key and field values for the entire document but also because the keys are boxed.
Avoid unnecessary strong typing; it results in calls to XmlSchemaDatatype.ParseValue(). (On the other hand, typed values can save you from accessing the Value string.)
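Caching schemas can look like the following minimal sketch: an XmlSchemaSet is built and compiled once into a static XmlReaderSettings, and every call reuses it. The class name and the one-element schema are hypothetical. With no ValidationEventHandler attached, a validation error surfaces as an XmlSchemaValidationException.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical example: compile the schema once and reuse the same settings
// for every document instead of re-reading the schema each time.
public static class CachedSchemaValidator
{
    private const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='TotalPrice' type='xs:decimal'/>" +
        "</xs:schema>";

    // Built on first use of the class, then shared by all calls.
    private static readonly XmlReaderSettings Settings = CreateSettings();

    private static XmlReaderSettings CreateSettings()
    {
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        schemas.Compile();
        return new XmlReaderSettings { ValidationType = ValidationType.Schema, Schemas = schemas };
    }

    public static bool IsValid(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), Settings))
            {
                while (reader.Read()) { }   // validation happens as the reader advances
            }
            return true;
        }
        catch (XmlSchemaValidationException)
        {
            return false;
        }
    }
}
```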

Writing XML

Write output directly whenever possible.
To save documents, pass an XmlTextWriter without indentation to Save() rather than using the TextWriter, Stream, or file-name overloads, which all indent the output. Indentation is only worth it when people will read the file.
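A sketch of compact saving with a hypothetical CompactSaver class. It uses XmlWriter.Create with Indent = false rather than constructing an XmlTextWriter directly; OmitXmlDeclaration is set only to keep the output minimal.

```csharp
using System.IO;
using System.Xml;

// Hypothetical example: save an XmlDocument through a non-indenting XmlWriter,
// so no extra whitespace is added to the output.
public static class CompactSaver
{
    public static string SaveCompact(XmlDocument doc)
    {
        var settings = new XmlWriterSettings { Indent = false, OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            doc.Save(writer);
        }   // disposing the writer flushes everything into the StringWriter
        return output.ToString();
    }
}
```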

DOM Processing

Avoid InnerXml. It internally creates XmlTextReader/XmlTextWriter. InnerText is fine.
Avoid PreviousSibling. XmlDocument is very inefficient for backward traversal.
Append nodes to the document as early as possible. Adding a large subtree at once results in a longer extra pass to check ID attributes.
Prefer FirstChild/NextSibling and avoid accessing ChildNodes, which creates an XmlNodeList that is not otherwise instantiated.
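The FirstChild/NextSibling pattern as a short sketch (SiblingWalker is a hypothetical name): it visits each child once without touching ChildNodes.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical example: list the names of an element's child elements by
// walking FirstChild/NextSibling, so no XmlNodeList is ever created.
public static class SiblingWalker
{
    public static List<string> ChildElementNames(XmlNode parent)
    {
        var names = new List<string>();
        for (XmlNode child = parent.FirstChild; child != null; child = child.NextSibling)
        {
            // Skip comments, text and other non-element children.
            if (child.NodeType == XmlNodeType.Element)
                names.Add(child.Name);
        }
        return names;
    }
}
```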

XPath Processing

Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode() to load a single node, but XPathDocument has no equivalent.
Avoid preceding-sibling and preceding axis queries, especially over XmlDocument. They result in sorting, and for XmlDocument they need access to PreviousSibling.
Avoid // (descendant-or-self). Most of the nodes it reaches are likely to be irrelevant.
Avoid position(), last(), and positional predicates (especially expressions like foo[last()-1]).
Compile XPath strings into XPathExpression objects and reuse them for frequent queries.
Don't run XPath queries more often than necessary. Each query is costly because it must Clone() the XPathNavigator.
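A sketch of compiling once and reusing, with a hypothetical OrderTotals class: the expression is compiled into a static XPathExpression with XPathExpression.Compile, spells out the full path instead of using //, and is evaluated against each new document. This assumes single-threaded use; sharing one compiled expression across threads is not covered here.

```csharp
using System.IO;
using System.Xml.XPath;

// Hypothetical example: the XPath string is parsed and compiled once, then
// reused for every document. The explicit path avoids a // descendant scan.
public static class OrderTotals
{
    private static readonly XPathExpression SumOfPrices =
        XPathExpression.Compile("sum(/Orders/Order/TotalPrice)");

    public static double Total(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return (double)nav.Evaluate(SumOfPrices);   // sum() yields a double
    }
}
```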

XSLT Processing

Reuse (cache) XslTransform objects.
Avoid key() in XSLT. It can return any kind of node, which prevents node-type-based optimization.
Avoid document(), especially with a non-static argument.
Pull style (for example xsl:for-each) is usually better than template match.
Minimize output size. More importantly, minimize input.
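A sketch of caching a transform, with a hypothetical CachedTransform class and stylesheet. It uses XslCompiledTransform, which replaced the obsolete XslTransform in .NET 2.0. The stylesheet is compiled once into a static field and written in pull style (xsl:for-each).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical example: compile the stylesheet once and reuse the same
// XslCompiledTransform for every transformation.
public static class CachedTransform
{
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:for-each select='Orders/Order'><xsl:value-of select='TotalPrice'/><xsl:text> </xsl:text></xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Loaded (compiled) once when the class is first used.
    private static readonly XslCompiledTransform Compiled = Load();

    private static XslCompiledTransform Load()
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(XmlReader.Create(new StringReader(Stylesheet)));
        return xslt;
    }

    public static string Apply(string xml)
    {
        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            Compiled.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```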
