Use XmlTextReader to parse large XML documents.
public void findAParticularNodesUsingTextReader()
{
    XmlTextReader txtreaderObj = new XmlTextReader(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml");
    txtreaderObj.WhitespaceHandling = WhitespaceHandling.None;
    // Stream through the document one node at a time.
    while (txtreaderObj.Read())
    {
        if (txtreaderObj.Name.Equals("TotalPrice") && txtreaderObj.IsStartElement())
        {
            // Advance to the text node inside <TotalPrice> and show its value.
            txtreaderObj.Read();
            richTextBox1.AppendText(txtreaderObj.Value);
        }
    }
    // Release the underlying file handle.
    txtreaderObj.Close();
}
Result
12.36 11.99 7.97
For faster, read-only, query-based access to data, use XPathDocument and XPathNavigator together with an XPath query.
public void FindTagsUsingXPthNaviatorAndXPathDocumentNew()
{
    XPathDocument xpDoc = new XPathDocument(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml");
    XPathNavigator xpNav = xpDoc.CreateNavigator();
    // Compile the query once so it can be reused.
    XPathExpression xpExpression = xpNav.Compile(@"/Orders/Order/TotalPrice");
    XPathNodeIterator xpIter = xpNav.Select(xpExpression);
    while (xpIter.MoveNext())
    {
        richTextBox1.AppendText(xpIter.Current.Value);
    }
}
Result
12.36 11.99 7.97
Combine XmlReader and XmlDocument. On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.
public void UserXmlReaderAndXmlDocument()
{
    XmlReader RdrObj = XmlReader.Create(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml");
    while (RdrObj.Read())
    {
        // A node of type Element is always a start tag, so no further check is needed.
        if (RdrObj.NodeType == XmlNodeType.Element && RdrObj.Name == "TotalPrice")
        {
            // Advance to the text node inside <TotalPrice> and show its value.
            RdrObj.Read();
            richTextBox1.AppendText(RdrObj.Value);
        }
    }
    RdrObj.Close();
}
Result
12.36 11.99 7.97
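The loop above visits every node in the file. A minimal sketch of the MoveToContent and Skip approach follows, assuming each Order may contain a bulky Items child that is of no interest. That element, the class name, and the method name are hypothetical and do not come from the sample file; the XML is passed as a string only to keep the sketch self-contained.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipExample
{
    // Collects the text of TotalPrice elements, jumping over every
    // <Items> subtree (a hypothetical unwanted element) in one call.
    public static List<string> ReadTotalPrices(string xml)
    {
        var prices = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // move past any prolog to the root element
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.Name == "TotalPrice")
                    {
                        // Reads the text and leaves the reader after </TotalPrice>.
                        prices.Add(reader.ReadElementContentAsString());
                        continue;
                    }
                    if (reader.Name == "Items")
                    {
                        // Leaves the reader after </Items> without visiting its children.
                        reader.Skip();
                        continue;
                    }
                }
                reader.Read();
            }
        }
        return prices;
    }
}
```

Because Skip and ReadElementContentAsString both advance the reader themselves, the loop calls Read only when neither ran; calling it unconditionally would step over the node that follows.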
public void UserXmlReaderAndXmlDocumentNew()
{
    XmlReader RdrObj = XmlReader.Create(@"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml");
    XmlDocument XmlDocObj = new XmlDocument();
    RdrObj.MoveToContent();
    // Calling Load() after reading to the end of the file leaves nothing to load,
    // so build a DOM node only for each element of interest while streaming.
    while (!RdrObj.EOF)
    {
        if (RdrObj.NodeType == XmlNodeType.Element && RdrObj.Name == "TotalPrice")
        {
            // ReadNode() consumes the element and leaves the reader on the node after it.
            XmlNode nodeObj = XmlDocObj.ReadNode(RdrObj);
            richTextBox1.AppendText(nodeObj.InnerText);
        }
        else
        {
            RdrObj.Read();
        }
    }
    RdrObj.Close();
}
Design Considerations
- Avoid XML wherever possible.
- Avoid processing large documents.
- Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
- Avoid DTDs, especially IDs and entity references.
- Use streaming interfaces such as XmlReader or SAXdotnet.
- Consider hard-coded processing, including validation.
- Shorten node names.
- Consider sharing a NameTable, but only when the same names are likely to recur often. As more and more irrelevant names accumulate, it becomes slower and slower.
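As an illustration of the NameTable point, the sketch below shares one NameTable across readers through XmlReaderSettings and compares element names by reference instead of by string content. The class, field, and method names are hypothetical, and the sketch assumes single-threaded use, since NameTable is not documented as thread-safe.

```csharp
using System.IO;
using System.Xml;

public static class SharedNames
{
    // One table reused by every reader created here.
    private static readonly NameTable Names = new NameTable();

    // Atomized once; readers using the same table return this same string object.
    private static readonly string TotalPrice = Names.Add("TotalPrice");

    public static int CountTotalPrices(string xml)
    {
        var settings = new XmlReaderSettings { NameTable = Names };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                // Reference comparison works because both strings come from the same table.
                if (reader.NodeType == XmlNodeType.Element &&
                    object.ReferenceEquals(reader.LocalName, TotalPrice))
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Every distinct name a reader encounters is added to the shared table and stays there, which is why sharing pays off only when documents use the same vocabulary.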
Parsing XML
- Use XmlTextReader and avoid validating readers.
- When only a single node is required, consider using XmlDocument.ReadNode() rather than loading the entire document with Load().
- Set the XmlResolver property to null on the XmlReaders that expose it, to avoid access to external resources.
- Make full use of MoveToContent() and Skip(). They avoid extraneous name creation. However, the gain is next to nothing when you use XmlValidatingReader.
- Avoid accessing Value for Text/CDATA nodes as much as possible.
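Two of the points above, disabling the resolver and loading a single node, can be sketched together. This is only an illustration under assumptions: it uses XmlReaderSettings with XmlReader.Create rather than XmlTextReader, the Order element name is inferred from the XPath used earlier, and the class and method names are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class PartialLoad
{
    // Returns the first <Order> element as a DOM node, or null if there is none.
    public static XmlNode ReadFirstOrder(string xml)
    {
        var settings = new XmlReaderSettings
        {
            XmlResolver = null,      // never fetch external resources
            IgnoreWhitespace = true,
            IgnoreComments = true
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Stream forward until the reader sits on the first <Order> start tag.
            if (reader.ReadToFollowing("Order"))
            {
                var doc = new XmlDocument();
                // Builds DOM nodes for this one subtree only.
                return doc.ReadNode(reader);
            }
        }
        return null;
    }
}
```

The rest of the document is never materialized as DOM nodes, so memory use depends on the size of the selected subtree rather than on the size of the file.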
Validating XML
- Avoid extraneous validation.
- Consider caching schemas.
- Avoid identity constraints. Not only do they store keys/fields for the entire document, but the keys are also boxed.
- Avoid extraneous strong typing. It results in calls to XmlSchemaDatatype.ParseValue(). Skipping it could also avoid access to the Value string.
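One way to cache schemas is to compile an XmlSchemaSet once and hand the same instance to every validating reader. The sketch below assumes the XmlSchemaSet and XmlReaderSettings APIs rather than XmlValidatingReader, and the inline schema, class name, and method names are hypothetical stand-ins for a real schema file.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class CachedValidation
{
    // Hypothetical schema: a single decimal-valued TotalPrice element.
    private const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='TotalPrice' type='xs:decimal'/>" +
        "</xs:schema>";

    // Parsed and compiled once, then reused for every document.
    private static readonly XmlSchemaSet Schemas = BuildSchemas();

    private static XmlSchemaSet BuildSchemas()
    {
        var set = new XmlSchemaSet();
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            set.Add(null, schemaReader); // null: use the schema's own target namespace
        }
        set.Compile();
        return set;
    }

    public static bool IsValid(string xml)
    {
        bool valid = true;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = Schemas
        };
        settings.ValidationEventHandler += (sender, e) => valid = false;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as the reader advances
        }
        return valid;
    }
}
```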
Writing XML
- Write output directly for as long as possible.
- To save documents, XmlTextWriter without indentation is better than TextWriter/Stream/file output (all of which are indented), unless the output is meant for human reading.
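A minimal sketch of writing output directly and without indentation follows. It assumes XmlWriter.Create with XmlWriterSettings in place of XmlTextWriter, and it writes to a StringBuilder only so the result is easy to inspect; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class DirectWrite
{
    // Streams the elements straight to the writer instead of building a DOM first.
    public static string WriteOrders(IEnumerable<decimal> prices)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings
        {
            Indent = false,            // no whitespace added for readability
            OmitXmlDeclaration = true
        };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Orders");
            foreach (decimal price in prices)
            {
                writer.WriteStartElement("Order");
                writer.WriteElementString("TotalPrice", XmlConvert.ToString(price));
                writer.WriteEndElement(); // </Order>
            }
            writer.WriteEndElement(); // </Orders>
        }
        return sb.ToString();
    }
}
```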
DOM Processing
- Avoid InnerXml. It internally creates an XmlTextReader/XmlTextWriter. InnerText is fine.
- Avoid PreviousSibling. XmlDocument is very inefficient for backward traversal.
- Append nodes as soon as possible. Adding a big subtree results in a longer extraneous run to check ID attributes.
- Prefer FirstChild/NextSibling and avoid accessing ChildNodes, which creates an XmlNodeList that is otherwise never instantiated.
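The FirstChild/NextSibling traversal can be sketched as a plain forward loop. The class and method names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ForwardTraversal
{
    // Lists the names of the element children of a node
    // without touching ChildNodes or PreviousSibling.
    public static List<string> ChildElementNames(XmlNode parent)
    {
        var names = new List<string>();
        for (XmlNode child = parent.FirstChild; child != null; child = child.NextSibling)
        {
            if (child.NodeType == XmlNodeType.Element)
            {
                names.Add(child.Name);
            }
        }
        return names;
    }
}
```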
XPath Processing
- Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode(), but XPathDocument has no equivalent.
- Avoid preceding-sibling and preceding axis queries, especially over XmlDocument. They result in sorting, and for XmlDocument they need access to PreviousSibling.
- Avoid // (descendant). Most of the returned nodes are likely to be irrelevant.
- Avoid position(), last() and positional predicates (especially things like foo[last()-1]).
- Compile the XPath string to an XPathExpression and reuse it for frequent queries.
- Don't run XPath queries frequently. They are costly because they must always Clone() XPathNavigators.
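Compiling once and reusing the expression might look like the sketch below. It assumes the static XPathExpression.Compile method and a query with no namespace prefixes; the class and method names are hypothetical, and sharing the compiled expression across threads is not something this sketch relies on.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class PriceQuery
{
    // Parsed once, then reused for every document queried.
    private static readonly XPathExpression TotalPrices =
        XPathExpression.Compile("/Orders/Order/TotalPrice");

    public static List<string> Run(string xml)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        var values = new List<string>();
        XPathNodeIterator it = nav.Select(TotalPrices);
        while (it.MoveNext())
        {
            values.Add(it.Current.Value);
        }
        return values;
    }
}
```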
XSLT Processing
- Reuse (cache) XslTransform objects.
- Avoid key() in XSLT. It can return any kind of node, which prevents node-type based optimization.
- Avoid document(), especially with a nonstatic argument.
- Pull style (for example xsl:for-each) is usually better than template matching.
- Minimize output size. More importantly, minimize input.
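Caching the transform can be sketched as follows. The sketch assumes XslCompiledTransform, the later replacement for XslTransform, and a hypothetical inline pull-style stylesheet that emits each TotalPrice followed by a semicolon; the class and method names are illustrative only.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class CachedTransform
{
    // Hypothetical stylesheet using xsl:for-each (pull style) with text output.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:for-each select='Orders/Order'><xsl:value-of select='TotalPrice'/>;</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Loaded and compiled once; every call to Apply reuses it.
    private static readonly XslCompiledTransform Transform = LoadTransform();

    private static XslCompiledTransform LoadTransform()
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }
        return xslt;
    }

    public static string Apply(string xml)
    {
        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            Transform.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```

Loading a stylesheet is the expensive step, so keeping the loaded object in a static field means repeated transforms pay only for the transformation itself.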