Automatic Reading and Verification of Microsoft Word Documents in C# using Aspose.Words


Introduction

Student Course Registration and Verification (SCRV) is a tool designed to automatically read and modify Roll Number slips in Microsoft Word format. It reads, parses and modifies the Word documents without manual intervention. SCRV takes Roll Number slips in Microsoft Word format as input, reads each document and compares its contents with an Oracle database. If required, the document is updated or corrected and then saved to a different location.

Tools Used: Visual Studio 2005, C#

Libraries Used:  Aspose.Words

Prerequisites

Make sure that the following components and tools are installed on the machine:

  1. Microsoft Visual Studio 2005.
  2. Aspose.Words component installed.

Adding Reference to Aspose.Words Library

Create a simple C# Windows project using the built-in wizard of Microsoft Visual Studio 2005. Give the project a meaningful name, such as SCRV (Student Course Registration and Verification). Next, add a reference to the Aspose.Words library so that it is available for opening, editing and manipulating Microsoft Word documents. The steps to add the reference are:

  • Right-click the References node in the Solution Explorer window of the project and select the Add Reference option.

        

  • Select the appropriate Aspose.Words version from the dialog. Press OK to proceed further.

        

  • The library is now available for use in the project. Add the required Aspose.Words namespaces to the source file so that the classes can be used without their fully qualified names.

    using System;
    using System.IO;

    using Aspose.Words;
    using Aspose.Words.Drawing;
    using Aspose.Words.Reporting;
    using Aspose.Words.Viewer;

    namespace SCRV
    {
        //The SCRV classes and the code in the following sections go here
    }

Reading Microsoft Word Documents

Reading a Microsoft Word document is easy with the Aspose.Words library. The following code segment reads and loads Microsoft Word (*.doc) files in a batch from a folder.

//Reading all the files in the folder

String[] files = Directory.GetFiles("Path");


//Loop through all the files and process them one by one

for (int i = 0; i < files.Length; i++)
{
    //Read the Microsoft Word file using the Aspose.Words library
    Aspose.Words.Document doc = new Document(files[i]);

    //Update the properties, verify the course table and save the document here
}
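Directory.GetFiles("Path") returns every file in the folder, not only Word documents. A search pattern such as "*.doc" narrows the result, but on the .NET Framework a pattern with a three-character extension also matches longer extensions that begin with it (for example .docx). A minimal sketch of an explicit extension check follows; the class and method names (WordFileFilter, IsWordDocument, GetWordDocuments) are hypothetical and the code uses only System.IO.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Aspose.Words): keeps only *.doc files so that
// other files in the folder are never passed to the Document constructor.
public static class WordFileFilter
{
    // True when the path ends in ".doc", ignoring case; ".docx" is not matched.
    public static bool IsWordDocument(string path)
    {
        return string.Equals(Path.GetExtension(path), ".doc", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the Word documents from the given paths, preserving their order.
    public static List<string> GetWordDocuments(IEnumerable<string> paths)
    {
        List<string> result = new List<string>();
        foreach (string path in paths)
        {
            if (IsWordDocument(path))
            {
                result.Add(path);
            }
        }
        return result;
    }
}
```

In the batch loop it can be used as `foreach (string file in WordFileFilter.GetWordDocuments(Directory.GetFiles("Path")))`. The helper deliberately avoids LINQ, which is not available in the C# 2.0 compiler shipped with Visual Studio 2005.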

Accessing Properties of Microsoft Word Document

Once a file is loaded, we can change its properties. For example, SCRV changes the author of the document to "Examination Center". The following code segment shows how to access and change the built-in properties of a Microsoft Word document in C#.

//Change the author of the document

doc.BuiltInDocumentProperties.Author = "Examination Center";

//Change the category of the document

doc.BuiltInDocumentProperties.Category = "New category";

//Add the comments

doc.BuiltInDocumentProperties.Comments = "Comments";

//Add the university name

doc.BuiltInDocumentProperties.Company = "University name";

//Add the subject

doc.BuiltInDocumentProperties.Subject = "Subject";

//Change the version of the document as required

doc.BuiltInDocumentProperties.Version   =   "3.0";

Parsing Microsoft Word Document

A Microsoft Word document has a complex structure. It consists of sections, paragraphs, tables, bookmarks, headers and footers. Aspose.Words offers many ways to access these elements, so the idea here is to use the simplest and most efficient way to access the contents (referred to as nodes).

 

Accessing Sections of Microsoft Word Document 

 

A Word document has one or more sections, and the Aspose.Words library lets us navigate through them. In our case the document has only one section, so we can get the first section of the document using the following code.

//Load the Microsoft Word document from a file

Aspose.Words.Document doc = new Document(filename);


//Get the first Section of the document

Section firstSection = doc.FirstSection;


Accessing Headers and Footers of Microsoft Word Document

After getting the section from the document, we have to change its header and footer. Each section has its own headers and footers, and the first page, even pages and odd pages can each have a different one. The following code loops through the headers and footers of a section so that their content can be read and changed.

//Load the Microsoft Word document from a file

Aspose.Words.Document doc = new Document(filename);

 

//Get the first (and only) section of the document

Section firstSection = doc.FirstSection;

 

//Loop through all the Header and Footers of the Section

foreach( HeaderFooter headerFooter in  firstSection.HeadersFooters)

{

    //If it's a header

    if (headerFooter.IsHeader)

    {

        //Modify the header here if required

    }
}

 

Accessing Tables of Microsoft Word Document

Aspose.Words provides easy access to every element of the document through properties. The Tables property is used to read and modify the content of the tables.

The Student Roll Number slip has two tables. The first table contains the personal data of the student, and the second contains information about the courses registered by the student. These are the courses our tool has to check and verify.

The table index is zero-based, so we can get the second table at index 1 from the body of the section using the following code.

//Get the table of the document from first section at index 1

Aspose.Words.Table courseTable = firstSection.Body.Tables[1];

 

Once we have found the right table, we need to access each of its rows. The Rows property of the Aspose.Words.Table class is used to read and modify the rows.

 

//Loop through all the rows of the table one by one

foreach(Aspose.Words.Row myRow in courseTable.Rows)

{
    //Code for the manipulation of the rows goes here
}

 

Similarly, the Aspose.Words.Cell class provides the functionality to read and parse the contents of a cell. Each row has a collection of cells (its Cells property), and each cell can be accessed by its index.

 

As we need to verify the course code and course title in that table, we access the cells with index 1 and 2 in each row. The course code and title are then verified against the database, and the content of the cells is updated if required.

//Get the second table of the document from first section

Aspose.Words.Table courseTable = firstSection.Body.Tables[1];

           

//Loop through all the rows of the table one by one

foreach(Aspose.Words.Row myRow in courseTable.Rows)

{

    //Get the cell with the course code
    Aspose.Words.Cell courseCode = myRow.Cells[1];

    //Get the cell with the course title
    Aspose.Words.Cell courseTitle = myRow.Cells[2];

    //Check if the information is correct
    if (!CheckValidity(courseCode.GetText(), courseTitle.GetText()))

    {

         //Modify the courseCode and courseTitle with correct information from database.

    }
}
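The loop above calls a CheckValidity method that the article does not show; in SCRV it checks the values against the Oracle database. A minimal sketch follows, assuming an in-memory dictionary of course codes to titles stands in for that database query. It also assumes the raw text returned by GetText() for a cell can carry Word control characters, namely the end-of-cell marker (char 7) and paragraph breaks ('\r'), which must be stripped before comparing. CourseVerifier, CleanCellText and GetCorrectTitle are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical verifier for the course table. The dictionary (course code -> title)
// is a stand-in for the Oracle database that SCRV actually queries.
public class CourseVerifier
{
    // End-of-cell marker (char 7), paragraph/line breaks and surrounding whitespace.
    private static readonly char[] CellControlChars = new char[] { '\a', '\r', '\n', ' ', '\t' };

    private readonly Dictionary<string, string> courses;

    public CourseVerifier(Dictionary<string, string> courses)
    {
        this.courses = courses;
    }

    // Strips the control characters from both ends of the raw cell text.
    public static string CleanCellText(string rawText)
    {
        return rawText == null ? string.Empty : rawText.Trim(CellControlChars);
    }

    // True when the course code is known and its title matches exactly.
    public bool CheckValidity(string rawCode, string rawTitle)
    {
        string correctTitle;
        if (!courses.TryGetValue(CleanCellText(rawCode), out correctTitle))
        {
            return false;
        }
        return string.Equals(correctTitle, CleanCellText(rawTitle), StringComparison.Ordinal);
    }

    // Returns the stored title for a course code, or null when the code is unknown.
    public string GetCorrectTitle(string rawCode)
    {
        string title;
        return courses.TryGetValue(CleanCellText(rawCode), out title) ? title : null;
    }
}
```

Inside the row loop, `verifier.CheckValidity(courseCode.GetText(), courseTitle.GetText())` can take the place of the original call, and GetCorrectTitle supplies the value to write back. Whether titles should match exactly or ignore case is a policy choice; this sketch uses an exact match. If the course table starts with a heading row, that row will fail the check and may need to be skipped.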


Saving the Word Document

Finally, after all the required modifications, we need to save the document. The Save method of the Aspose.Words.Document class saves the document to the specified file.


//Save the Document at location specified by outputFilename

doc.Save(outputFilename);
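The introduction states that corrected documents are saved at a different location. One way to derive the file name passed to Save is to keep each input file's name and place it in a separate output folder. A minimal sketch follows; OutputPaths.ForInput and the folder names are hypothetical, and only System.IO is used.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds the output path for a processed Roll Number slip
// by keeping the original file name and replacing the folder.
public static class OutputPaths
{
    public static string ForInput(string inputFile, string outputFolder)
    {
        if (string.IsNullOrEmpty(inputFile))
        {
            throw new ArgumentException("An input file name is required.", "inputFile");
        }
        // Path.GetFileName drops the input folder; Path.Combine adds the output folder.
        return Path.Combine(outputFolder, Path.GetFileName(inputFile));
    }
}
```

In the batch loop this becomes `doc.Save(OutputPaths.ForInput(files[i], outputFolder));`. Calling `Directory.CreateDirectory(outputFolder)` once before the loop makes sure the folder exists; it does nothing if the folder is already there.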


Conclusion

 

Using Aspose.Words, it's very simple to create, read and modify Microsoft Word documents in C#. This is only one case where we have automated our system; Aspose.Words gives the programmer full control over Microsoft Word documents.

 

Screen Shot of the Application (SCRV)
