I mentioned that we have created our own thread pool implementation in RavenDB to handle our specific needs. A common scenario that ended up being quite costly for us was parallelizing similar work. For example, I have 15,000 documents to index. That means that we need to go over each of the documents and apply the indexing function. That is an embarrassingly parallel task, so it should be quite easy. One easy way to do that would be something like this:
foreach(var doc in docsToIndex)
Of course, that generates 15,000 entries for the thread pool, but that is fine.
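The loop body is not shown in this excerpt, so here is a minimal sketch of what queuing one work item per document could look like. `IndexDocument`, the `int` document type, and the `CountdownEvent` bookkeeping are assumptions made for illustration, not RavenDB's actual code.

```csharp
using System;
using System.Threading;

public static class PerDocumentIndexing
{
    // Hypothetical stand-in for the real indexing function.
    static int IndexDocument(int doc) => doc * 2;

    // Queues one thread pool work item per document and waits for all of them.
    public static int[] IndexAll(int[] docsToIndex)
    {
        var results = new int[docsToIndex.Length];
        if (docsToIndex.Length == 0)
            return results;

        using (var pending = new CountdownEvent(docsToIndex.Length))
        {
            for (var i = 0; i < docsToIndex.Length; i++)
            {
                var index = i; // copy the loop variable so each closure sees its own value
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        // Each work item writes to its own slot, so no locking is needed.
                        results[index] = IndexDocument(docsToIndex[index]);
                    }
                    finally
                    {
                        pending.Signal(); // always signal, or Wait() would block forever
                    }
                });
            }
            pending.Wait(); // block until every work item has signalled
        }
        return results;
    }
}
```

Calling `PerDocumentIndexing.IndexAll(docs)` with a 15,000 element array queues 15,000 separate work items, which is exactly the per-item overhead being discussed.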
Except that there is an issue here: we need to do something with the results of the indexing, namely, write them to the index. That means that even though we can parallelize the work, we still have a non-trivial amount of startup & shutdown costs. Just running the code like this would actually be much slower than running it in single threaded mode.
So, let us try a slightly better method:
foreach(var partition in docsToIndex.Partition(docsToIndex.Length / Environment.ProcessorCount))
If my machine has 8 cores, then this will queue 8 tasks to the thread pool, each indexing just under 2,000 documents.
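As a rough sketch of the partitioned approach, the code below splits the documents into one contiguous range per worker and queues one work item per range. The `Partition` helper here is a hypothetical stand-in written for this example (the signature of the original `Partition` extension is not shown in the excerpt), and `IndexDocument` is again an assumed placeholder. It uses ceiling division for the partition size, which is a deliberate deviation from the plain integer division in the snippet above: it avoids a zero-sized partition when there are fewer documents than cores, and never yields more partitions than requested.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PartitionedIndexing
{
    // Hypothetical stand-in for the real indexing function.
    static int IndexDocument(int doc) => doc * 2;

    // Hypothetical helper: splits docs into at most partitionCount contiguous ranges.
    public static List<ArraySegment<int>> Partition(int[] docs, int partitionCount)
    {
        var partitions = new List<ArraySegment<int>>();
        if (docs.Length == 0)
            return partitions;
        if (partitionCount < 1)
            partitionCount = 1;

        // Ceiling division: the last range may be smaller than the others.
        var size = (docs.Length + partitionCount - 1) / partitionCount;
        for (var offset = 0; offset < docs.Length; offset += size)
        {
            var count = Math.Min(size, docs.Length - offset);
            partitions.Add(new ArraySegment<int>(docs, offset, count));
        }
        return partitions;
    }

    // Queues one work item per partition instead of one per document.
    public static int[] IndexAll(int[] docsToIndex, int workerCount)
    {
        var results = new int[docsToIndex.Length];
        var partitions = Partition(docsToIndex, workerCount);
        if (partitions.Count == 0)
            return results;

        using (var pending = new CountdownEvent(partitions.Count))
        {
            foreach (var partition in partitions)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        // Ranges do not overlap, so each work item owns its result slots.
                        for (var i = 0; i < partition.Count; i++)
                        {
                            var position = partition.Offset + i;
                            results[position] = IndexDocument(docsToIndex[position]);
                        }
                    }
                    finally
                    {
                        pending.Signal(); // always signal, or Wait() would block forever
                    }
                });
            }
            pending.Wait(); // block until every partition has been processed
        }
        return results;
    }
}
```

With 15,000 documents and `workerCount` set to `Environment.ProcessorCount` on an 8 core machine, this queues 8 work items of 1,875 documents each.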