Sort and Filter CSV Files With DataTable and DataView

Working with data, a common need is to manage streams that come from various software. Sometimes there's the necessity to make available to a program the data created by another one, and the phase of data exchange is often entrusted to text files, like Comma Separated Value (CSV) files. That's perfectly OK when the information can be acquired the way they present themselves, but if we need to further elaborate them before importing, a text file is not the most convenient format to work with. That's when DataTables and DataViews from the .NET Framework can be useful, helping us to manage our data in a tabular form. This article shows some brief examples using Visual Basic .NET.
A simple scenario
Consider a sample pipe-separated CSV file like the following:
  1. Name|Surname|Age|Occupation|City  
  2. John|Doe|30|Developer|New York  
  3. Jack|NoName|25|DBA|Los Angeles  
  4. Mario|Mario|42|Plumber|Unknown  
  5. Laura|Green|25|Developer|Unknown  
We have a first line of field headers and four rows of data. Filtering or sorting the data, while mantaining the original textual format, will be a not-so-quick task. We could import the file into an Excel worksheet, in a SQL Server table, and so on, but we do not have access to those tools. Maybe we must develop an interpreter that take the original file and produce a second file with the data revisited. There are many possible scenarios.

In those cases, a DataTable is surely what we need. Like the MSDN says, a DataTable represents a table of in-memory data. It is a sort of virtual table, in which we can store data in tabular form (in other words columns and rows), relying on the peculiarities of such a structure (data access, relations and so on). A DataTable can be bound to any control (WPF or WinForms) on which the DataSource or ItemSource property is available.

Create a DataTable from a CSV file
Let's create and populate a DataTable with the data above. For the sake of immediacy, I've written a short snippet to create a file containing our sample data.
Consider the following:
  1. Const sampleFile As String = "c:\temp\sample.txt"  
  2. '-- Create sample data, writing them to c:\temp\sample.txt  
  3. Dim _sampleData As String = "Name|Surname|Age|Occupation|City" & Environment.NewLine & _  
  4.                "John|Doe|30|Developer|New York" & Environment.NewLine & _  
  5.                 "Jack|NoName|25|DBA|Los Angeles" & Environment.NewLine & _  
  6.                 "Mario|Mario|42|Plumber|Unknown" & Environment.NewLine & _  
  7.                 "Laura|Green|25|Developer|Unknown"  
  9. IO.File.WriteAllText(sampleFile, _sampleData)  
The code above will create a file named "sample.txt" under the "c:\temp\" folder, writing the content of _sampleData in it. Starting from such a file, we need to initialize the DataTable and create the columns representing our data. Since we have the columns names on the first row of the file, we can write a snippet like this:
  1. '-- Create a datatable from our text file  
  2. Dim dt As New DataTable("Sample")  
  4. '-- Opens sample file, read first line, assign  
  5. For Each l As String In IO.File.ReadLines(sampleFile)(0).Split("|")  
  6.     dt.Columns.Add(l)  
  7. Next  
We've called our DataTable "Sample". Then, looping on the string array obtained by splitting by pipe the first line of our file, we've extracted each field name (Name, Surname, Age, Occupation, City). For each of those, a call to the Add function of the DataTable's DataColumnCollection will create the column named as the passed parameter (in our case, the field name itself).

The preceding creates the structure of our table. We must now fill it with data. That means we will apply a logic similar to the one used for columns, this time on rows, using the file lines from the second to the last one. 
  1. '-- Read sample data as rows  
  2. Dim nRow As Boolean = False  
  3. For Each l As String In IO.File.ReadLines(sampleFile)  
  4.     If Not (nRow) Then nRow = True : Continue For  
  5.     dt.Rows.Add(l.Split("|"))  
  6. Next  
Easy enough, we loop on all the file lines (skipping the first, used for column headers), each time adding a row to our DataTable. Since one of the Rows.Add overloads accepts an array of Objects, we can use the array obtained by splitting the read line by its separator (l.Split("|")). At the end of the loop, our DataTable will have a valid structure, and valid data too.

DataTable as DataSource
We can test our DataTable against a DataGridView (or any other control accepting a DataSource), to check everything is OK. I've added to my Form a DataGridView. We can bind the DataTable to the grid by doing:
  1. '-- The DataTable could be used as a Data Source  
  2. DataGridView1.DataSource = dt  
Running our program, the result will be:
Figure 1: GridView Data 
We can observe everything went OK; our data has correctly shown up. They are presented in the way the DataTable was populated; no filter applied, no specific sort order. Each line is found at the index it has in the file (in other words "John Doe" is the first row, "Laura Green" the last one).

Use DataView to sort and filter data
As stated by the MSDN, a DataView represents a databindable, customized view of a DataTable for sorting, filtering, searching, editing, and navigation. The DataView does not store data, but instead represents a connected view of its corresponding DataTable. DataViews allow us to customize the way our data is presented.

Let's say we want to order our DataTable by the Age columns, showing the older people first. We could do that easily, by writing: 
  1. Dim dv As New DataView(dt)  
  2. dv.Sort = "Age DESC"  
And then binding our grid to the DataView as in the following:
  1. DataGridView1.DataSource = dv  
If we run our program now, we'll see the rows will be sorted as requested. Sorting can be done on multiple columns, for example, if we want to sort by descendant Age and descending Name, the preceding Sort code will be changed to:
  1. dv.Sort = "Age DESC, Name DESC"  
Any other column could be added to the Sort property, separating each one of them by a comma. DESC and ASC predicates determines the sort direction, namely descending (from greater to smaller) and ascending (from smaller to greater).
Figure 2: Sort Age in Descending 
In the screenshot, note the descending sort by Age, and the secondary descending sort by Name in case of equal Age value. DataViews filter's capabilities are pretty straightforward too. Using the RowFilter property, we could write a concise expression to determine the data we want to show. For example, let's say we want to extract only those records in which the city is Unknown. We could write:
  1. dv.RowFilter = "City = 'Unknown'"  
And the result will be:
Figure 3:  Record where city is Unknown 
A multi-filter can be set by joining different expression with the logic predicates AND and OR. The expression:  
  1. dv.RowFilter = "Age < 30 OR Age=42"  
Will return the following:
Figure 4: Age between 30 & 42 
Because we have only two records where Age is < 30, and only one in which it is = 42.

From DataView to Text
Now, let's assume we want to create a second CSV file, containing only those records in which the field City contains a blank space, sorted by ascending Age field. A simple way to export our data could be:
  1. Dim dw As New DataView(dt)  
  2. dw.Sort = "Age ASC"  
  3. dw.RowFilter = "City LIKE '% %'"  
  5. Dim lines As String = ""  
  6. For Each r As DataRowView In dw  
  7.     lines &= String.Join("|", r.Row.ItemArray) & Environment.NewLine  
  8. Next  
  10. IO.File.WriteAllText("c:\temp\output.txt", lines)  
In other words, we declare a DataView based on the previously filled DataTable, specifying Sort and RowFilter properties. Then, executing a loop on each row of the view, we create a pipe-separated string obtained by joining each field of the row. Finally, the text data will be written to an arbitrary text file.
Figure 5: Arbitrary Text File 
Source Code
The sample code used in the article can be downloaded here:  Sort and Filter CSV files with DataTable and DataView