The title of this post might seem silly considering MSDN specifically says DataTables aren’t thread safe, but there’s more to it than that. MSDN says DataTables are not thread safe for WRITE operations, which is a major “duh” statement, but in reality they are not safe for some read operations either. I’ve been having some strange issues with DataRows in parallel loops, so I started to investigate.
I saw a post stating that DataTable.Select() is actually a write operation, so it isn’t thread safe and requires a lock. I don’t see anything on MSDN about this. That got me thinking, so I opened up dotPeek. What did I find?
DataTable.NewRow() is not thread safe! The point of this method is to create a new row with the same schema as the table. According to MSDN, the row is not automatically added to the table’s DataRowCollection when you call NewRow(); you have to add it yourself. So, for the most part, this looks like a read operation, which would be thread safe. BUT at one point in the internals, a call to NewRow(int) is made that creates the new record object and then stores that record in a class-scoped field before passing it along to a DataRowBuilder. That shared field is where it stops being thread safe.
I was seeing so many problems because I was calling NewRow() in a parallel loop, and the data was corrupted by the end. This was why. To combat this, I generate a cache of new rows in a ConcurrentBag<DataRow> before the parallel loop and then call TryTake() inside the loop. No more problems.
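Here is a minimal sketch of that workaround, not my production code. The table schema, the NewRowCacheDemo class, and the ComputeName helper are made up for illustration. I am also assuming you want to be conservative about everything that touches the table, so the sketch does the expensive work outside the lock and then fills in and adds the row under a lock, since Rows.Add() is a write operation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Threading.Tasks;

public static class NewRowCacheDemo
{
    // Hypothetical stand-in for whatever expensive work the loop really does.
    private static string ComputeName(int i)
    {
        return "Item " + i;
    }

    public static DataTable Build(int count)
    {
        // Hypothetical schema, for illustration only.
        var table = new DataTable("Items");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        // Create every row up front on a single thread, so NewRow()
        // is never called concurrently.
        var cache = new ConcurrentBag<DataRow>();
        for (int i = 0; i < count; i++)
        {
            cache.Add(table.NewRow());
        }

        var tableLock = new object();

        Parallel.For(0, count, i =>
        {
            // The per-item work runs in parallel, without touching the table.
            string name = ComputeName(i);

            DataRow row;
            if (!cache.TryTake(out row))
            {
                throw new InvalidOperationException("Row cache exhausted.");
            }

            // Assumption: serialize every mutation of table-owned state,
            // including filling in the detached row.
            lock (tableLock)
            {
                row["Id"] = i;
                row["Name"] = name;
                table.Rows.Add(row);
            }
        });

        return table;
    }

    public static void Main()
    {
        DataTable table = Build(1000);
        Console.WriteLine(table.Rows.Count); // prints 1000
    }
}
```

The bag only has to hand out each pre-built row exactly once, so the order in which TryTake() returns rows does not matter here.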
Microsoft MVP, MCTS, PostSharp MVP
I saw this and your previous post on using your thread safe pattern. I highly suspect that DataTable is simply manipulating a flat data object in the background that it will eventually render into HTML.
If you really want to optimize this, you may want to get the initial data from the database into a thread-safe collection and then run that through your pipeline. Then simply pass the pipeline output to the DataTable’s data source.
Sounds like fun! :)
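A rough sketch of that suggestion, under some assumptions: the Item class, the PipelineDemo class, and the upper-casing step are hypothetical placeholders for the real data and the real pipeline. The parallel stage only touches a ConcurrentQueue<Item>, and the DataTable is filled afterwards on a single thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical plain object standing in for one database record.
public sealed class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class PipelineDemo
{
    public static DataTable Run(Item[] source)
    {
        // Parallel stage: results go into a thread-safe collection,
        // and no DataTable member is touched here.
        var results = new ConcurrentQueue<Item>();
        Parallel.ForEach(source, item =>
        {
            results.Enqueue(new Item
            {
                Id = item.Id,
                Name = item.Name.ToUpperInvariant() // placeholder transformation
            });
        });

        // Single-threaded stage: build the table from the pipeline output.
        var table = new DataTable("Items");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (Item item in results.OrderBy(r => r.Id))
        {
            table.Rows.Add(item.Id, item.Name);
        }

        return table;
    }
}
```

The parallel stage finishes in arbitrary order, so the sketch sorts by Id before loading the rows.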
I’m working with the DataTable object (instead of DataSets). I would love to use POCOs instead of DataTables, but I can’t for a few reasons. Still, it’s important to pay attention when working with multi-threading and parallelism. I already knew DataSet, DataTable and DataRow were not thread safe, but I assumed some operations were. It turns out they aren’t!
What read operations are not thread-safe?