Thursday, 19 September 2013

Task Parallel Library and SQL Connections

Task Parallel Library and SQL Connections

I'm hoping someone can confirm what is actually happening here with TPL
and SQL connections.
Basically, I have a large application which, in essence, reads a table
from SQL Server, and then processes each row - serially. The processing of
each row can take quite some time. So, I thought to change this to use the
Task Parallel Library, with a "Parallel.ForEach" across the rows in the
datatable. This seems to work for a little while (minutes), then it all
goes pear-shaped with...
"The timeout period elapsed prior to obtaining a connection from the pool.
This may have occurred because all pooled connections were in use and max
pool size was reached."
Now, I surmised the following (which may of course be entirely wrong).
The "ForEach" creates tasks for each row, up to some limit based on the
number of cores (or whatever). Lets say 4 for want of a better idea. Each
of the four tasks gets a row, and goes off to process it. TPL waits until
the machine is not too busy, and fires up some more. I'm expecting a max
of four.
But that's not what I observe - and not what I think is happening.
So... I wrote a quick test (see below):
Sub Main()
Dim tbl As New DataTable()
FillTable(tbl)
Parallel.ForEach(tbl.AsEnumerable(), AddressOf ProcessRow)
End Sub
Private n As Integer = 0
Sub ProcessRow(row As DataRow, state As ParallelLoopState)
n += 1
Console.WriteLine("Starting thread {0}({1})", n,
Thread.CurrentThread.ManagedThreadId)
Using cnx As SqlConnection = New
SqlConnection(My.Settings.ConnectionString)
cnx.Open()
Thread.Sleep(TimeSpan.FromMinutes(5))
cnx.Close()
End Using
Console.WriteLine("Closing thread {0}({1})", n,
Thread.CurrentThread.ManagedThreadId)
n -= 1
End Sub
This creates way more than my guess at the number of tasks. So, I surmise
that TPL fires up tasks to the limit it thinks will keep my machine busy,
but hey, what's this, we're not very busy here, so lets start some more.
Still not very busy, so... etc. (seems like one new task a second -
roughly).
This is reasonable-ish, but I expect it to go pop 30 seconds (SQL
connection timeout) after when and if it gets 100 open SQL connections -
the default connection pool size - which it doesn't.
So, to scale it back a bit, I change my connection string to limit the max
pool size.
Sub Main()
Dim tbl As New DataTable()
Dim csb As New SqlConnectionStringBuilder(My.Settings.ConnectionString)
csb.MaxPoolSize = 10
csb.ApplicationName = "Test 1"
My.Settings("ConnectionString") = csb.ToString()
FillTable(tbl)
Parallel.ForEach(tbl.AsEnumerable(), AddressOf ProcessRow)
End Sub
I count the real number of connections to the SQL server, and as expected,
its 10. But my application has fired up 26 tasks - and then hangs. So,
setting the max pool size for SQL somehow limited the number of tasks to
26, but why no 27, and especially, why doesn't it fall over at 11 because
the pool is full ?
Obviously, somewhere along the line I'm asking for more work than my
machine can do, and I can add "MaxDegreesOfParallelism" to the ForEach,
but I'm interested in what's actually going on here.
PS.
Actually, after sitting with 26 tasks for (I'm guessing) 5 minutes, it
does fall over with the original (max pool size reached) error. Huh ?
Thanks.

No comments:

Post a Comment