I recently have been engaged in an assignment where I had to copy large chunks of data from one dataset to another in memory. When I say large chunks, I am talking about hundreds of thousands of rows. Not only I had to copy the data but go through each row of the dataset, do some calculations, and update the rows of new dataset.
To do so, I decided to use DataTable objects. If you go through the rows of a DataTable and direct copy rows from one DataTable, you will get an exception saying "Row is already being used by another table".
The simplest way is to clone an existing DataTable, loop through all rows of source DataTable and copy data from column by column and add row to the destination DataTable. The following code does the same:
For Each dr As DataRow In sourceTable.Rows
r = destinationTable.NewRow
r("Name") = dr("Name")
r("City") = dr("City")
r("Cost") = dr("Cost")
destinationTable.Rows.Add(r)
Next
DataTable.ImportRow Method
The second method is using DataTable.ImportRow method. The ImportRow method of DataTable copies a row into a DataTable with all of the properties and data of the row. It actually calls NewRow method on destination DataTable with current table schema and sets DataRowState to Added. The following code does the same as we did in the previous code snippet.
For Each dr As DataRow In sourceTable.Rows
destinationTable.ImportRow(dr)
Next
Performance
The following chart compares the time versus number of records copied. If you have small number of rows, it will not make a big difference but when you start importing hundreds of thousands of rows, for example, 500,000 in my case, there was a difference of almost 9 seconds.
Here are the numbers for both methods:
When I tried to copy over 1 million rows, DataTable.ImportRow method took 17 seconds but I got memory exception using direct copy method.
Summary
When it comes to copying bulk of data in memory from one DataTable to another, DataTable.ImportRow method comes handy. This article compares both direct copy and ImportRow method and we saw ImportRow method may reduce the processing time when dealing with large data. See the attached source code for more details.