Thursday, February 17, 2011

When DataTable.Select() Is Slow, Use DataView.FindRows()

ADO.NET's DataTable.Select() comes with a runtime complexity of O(N). This turns into a problem when the DataTable contains a lot of rows, and DataTable.Select() is invoked many times. It's pretty obvious why - just like in LinqToObjects-queries - there is no indexed lookup. DataTable.Select() needs to scan through all rows, and compare all values.

What I usually ended up doing in those cases, was to loop over the DataTable once, and fill a Dictionary using the Select's lookup-criteria as the entry's key, and the DataRow as its value, and use the Dictionary for lookups from this point on. This works fine, but it clutters the code with all those Dictionaries, which - in addition - must be synchronized manually each time changes occur in the underlying DataTable.

The DataView class offers an alternative. By defining DataView's Sort-property in accordance to the search criteria, DataView.FindRows() then allows indexed lookup with the usual binary search tree lookup runtime complexity of O(log(n)) (yes I know, Hashtable lookup might even execute at O(1), but O(log(n)) will do here). For example:

// do this just one time
DataView view = new DataView(table);
view.Sort = "name, city";

// do this N times
DataRowView[] res = view.FindRows(
new object[] { "Earp", "Tombstone" });


I should probably mention that DataTables and DataRows actually do offer indexed searching in two special cases:

(1) When there is a primary key defined on a DataTable, DataTable.Rows.Find() (resp. DataTable.FindBy[KeyColName]() on typed DataSets) allows for indexed primary key loookup.
(2) When there is a child relation defined on a DataTable, DataRow.GetChildRows() (resp. DataRow.Get[ChildRelName]() on typed DataSets) will return the parent row's children by indexed lookup as well.