Tuesday, August 16, 2011

ADO.NET Connection Pooling, Internals and Issues

Description

The creation of a new database connection is a relatively expensive process on server side. The server needs to allocate new memory, create a new session, allocate all needed client information, authenticate the client and so on. On the other side, it does not make sense to keep keep connections open for the whole lifetime of a application, since this causes a higher session management overhead on server side due to many concurrent connections.

To handle those two issues, ADO.NET uses connection pooling. This means, the well known SqlConnection object from .NET framework is just a wrapper for the real, internal database connection. Whenever a new SqlConnection becomes opened it asks the connection pool for an existing, and currently not used, internal connection. If an existing, free connection is available ADO.NET wont create a new database connection but reuse the existing one. As soon as the connection is not needed any more it becomes sent back into the connection pool. If a connection is not used for a longer time it becomes automatically closed by the ADO.NET connection pooler, to release resources on SQL Server. To determine if an internal connection can be used for a current request by a SqlConnection, ADO.NET compares several (not all) connection string information. This includes authentication information, database name, transaction context enlisting configuration and so forth.

This blog post focuses on ADO.NET SqlClient and SQL Server. Some of the information might be different when working with other database servers or other data providers.

Myth Close and Dispose

Many books and web resources preach that it is important to always Close and dispose a connection, to ensure that the internal connection will be sent back to the connection pool. Until .NET 3.5 even MSDN stated this, but this became fixed version 4.0. Both ways, calling the Close method or by disposing the SqlConnection will actually send the connection back to the pool. Since the using-directive provides a language integrated way to dispose a component as soon as it goes out of scope, I usually prefer this over calling the Close method in a try-catch-finally block.

Here's a little sample that shows that both ways cause a correct connection pooling:
string cnStr = "server=(local);database=Sandbox;trusted_connection=sspi;";
string sql = "SELECT connect_time FROM sys.dm_exec_connections WHERE session_id = @@SPID";
SqlConnection cn1 = new SqlConnection(cnStr);

// close the connection but don't dispose it
cn1.Open();
using (SqlCommand cmd = new SqlCommand(sql, cn1)) {
   DateTime loginTime = (DateTime)cmd.ExecuteScalar();
   Console.WriteLine("New connection: {0}", loginTime.ToString("HH:mm:ss.fff"));
}
cn1.Close();

// dispose but no close
Thread.Sleep(500);
using (SqlConnection cn = new SqlConnection(cnStr)) {
   cn.Open();
   using (SqlCommand cmd = new SqlCommand(sql, cn)) {
      DateTime loginTime = (DateTime)cmd.ExecuteScalar();
      Console.WriteLine("After Close only: {0}", loginTime.ToString("HH:mm:ss.fff"));
   }
}

// result afer dispose but no close
Thread.Sleep(500);
using (SqlConnection cn = new SqlConnection(cnStr)) {
   cn.Open();
   using (SqlCommand cmd = new SqlCommand(sql, cn)) {
      DateTime loginTime = (DateTime)cmd.ExecuteScalar();
      Console.WriteLine("After disposing only: {0}", loginTime.ToString("HH:mm:ss.fff"));
   }
}

cn1.Dispose();
Console.ReadLine();
By returning the connect_time column from sys.dm_exec_connections for our current session id we retrieve the creation time of the connection from SQL Server.

As we see, in all cases we reuse the same, internal connection to SQL Server.

How a Pooled Connection becomes Reset

Since ADO.NET does not know what happened in a previous session of a pooled connection, when it is returned to a new instance of a SqlConnection, it has to reset the connection before it can be safely reused for another session. When resetting a connection all created temporary objects become dropped, all previously allocated resources are freed and the initially connected database will be set - if it was changed in the previous session. This resetting is done by calling the SQL Server procedure sp_reset_connection.

This can cause some issues. The ADO.NET connection pooler does not reset the connection when it is sent back to the pool, but when it is returned to a new opening SqlConnection, as shown in the next sections.

In addition, the transaction isolation level will not be reset if you changed it in a previous session. If you change the transaction isolation level in some of your sessions you will need to manually reset it whenever you get a pooled connection. At the moment this behavior is by design, due to backward compatibility (see Microsoft Connect issue sp_reset_connection doesn't reset isolation level). (Thanks to Greg who suggested this!)

Temporary Tables and Named Constraints

When working with temporary tables in SQL Server, there is still an issue, names of temporary tables only need to be unique within the current connection, but names of constraints need to be unique for all current connections. Since SQL Server 2005 and following versions it is possible to define constraints without specifying a name for them. Previous versions did not support this feature and the syntax to specify a name for a table constraint is, for sure, still valid. When executing the following SQL statement in two concurrent connections you get an error message:

CREATE TABLE #t1 (
   Id INT NOT NULL CONSTRAINT PK_TEMP PRIMARY KEY CLUSTERED
)
Only the first connection is able to create the temp table, the second connection gets the following error:
Msg 2714, Level 16, State 5, Line 1
There is already an object named 'PK_TEMP' in the database.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

This also affects ADO.NET connection pooling. Say, we've got two different processes, the first creates a new entry in a database and the second one listens for those entries to process them. When the first process finishes its work without an explicit DROP of the temporary table, the connection stays open and the temporary table will stay in database until the connection is reused. If the second process tries to create the same temporary table it will receive the same exception.

Global Temporary Tables

Same problem as described for constraints of usual temporary tables applies to global temporary tables. Their name needs to be unique over all concurrent SQL Server connections. If one process creates a global temp table and does not explicitly drop it, when closing/disposing the SqlConnection, the table will stay on the server until the current process opens another connection to this server (and database). This behavior can cause confusing errors in a production system.

Release of Temporary Allocated Data

For sure, since temporary tables stay available until a connection becomes reused, this also implies that all data, stored in those tables, stay allocated until the process exists or reuses the connection.

If we have a process that creates a temporary table and fills it with hundreds of thousands of rows those data will stay in tempdb for probably longer than expected.

One way to ensure all allocated resources and temp tables become cleaned as soon as the connection is closed would be to deactivate connection pooling. However, we should always try to avoid this. The cleanest way to avoid all previously described issues is to always, explicitly drop temporary objects.

Pooled Connection became Invalid

If an connection becomes invalid while it is in the connection pool, it will still be returned for next usage. This could be caused by network problems, a restart of the SQL Server service, a manual KILL of the connection on database side or several other reasons.

Here is a little example that can be debugged to show this:
string cnStr = "server=(local);database=Sandbox;trusted_connection=sspi;";
         
using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand("SELECT @@SPID", cn)) {
   cn.Open();
   Debug.Print("SPID: {0}", cmd.ExecuteScalar());
}

// at this point, restart the service or KILL the connection on server side

using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand("SELECT @@SPID", cn)) {
   cn.Open();
   Debug.Print("SPID: {0}", cmd.ExecuteScalar());
}
When debugging this code and restarting the SQL Server service after the first connection became disposed and before the second connection becomes opened you will get an exception. In my case (since I'm working with a local instance of SQL Server):
A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)
At the time of this post, there is no built-in support of ADO.NET for SQL Server that tells the resource pool to verify the connection before it is returned to the SqlConnection. However, if occasional network issues or other problems, that cause an invalid internal connection, sometimes happen in your environment, you can implement your custom connection validation in a factory method that creates your connections.

string cnStr = "server=(local);database=Sandbox;trusted_connection=sspi;";
SqlConnection cn = new SqlConnection(cnStr);
cn.Open();

try {
   using (SqlCommand cmd = new SqlCommand("DECLARE @i INT", cn)) {
      cmd.ExecuteNonQuery();
   }
}
catch (SqlException ex) {
   if (ex.Message.StartsWith("A transport-level error has occurred ")) {
      cn = new SqlConnection(cnStr);
      cn.Open();
   }
   else
      throw;
}
return cn;
As you see, the connection sends a tiny SQL statement to the server before it is returned to the caller. If a transport-level exception occurs, the internal connection becomes removed from pool and another new connection will be created and returned. You can replace the "DECLARE @i INT;" SQL statement with an empty stored procedure called usp_VerifyConnection. This might be slightly faster (not tested, though), but will definitely look less confusing in SQL Server traces ;-).

Control the Connection Pool over Connection String

The SqlConnection.ConnectionString provides several attributes that allow to control the behavior of ADO.NET pooling.

Connection Lifetime can be used to specify a maximum life time for a connection in seconds. When a connection is returned to the pool this value will be compared to the (client side) creation time of the connection. If the connections lifetime is timed out, it will be closed and not put into pool. Attention, this implies if a connection with connection is sent back into the pool before its lifetime elapsed and not used for a long time, it will still stay opened and reused for one more time!
// connection string with a lifetime of 10 seconds
string cnStr = "server=(local);database=Sandbox;trusted_connection=sspi;connection lifetime=10";
string sql = "SELECT connect_time FROM sys.dm_exec_connections WHERE session_id = @@SPID";

// create a new connection with 10 seconds lifetime and send it back to the pool
using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand(sql, cn)) {
   cn.Open();
   DateTime connectTime = (DateTime)cmd.ExecuteScalar();
   Debug.Print("Connect time: {0}", connectTime.ToString("HH:mm:ss.fff"));
}

// wait 15 seconds
Thread.Sleep(15000);
// aquire a new connection and release it
using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand(sql, cn)) {
   cn.Open();
   DateTime connectTime = (DateTime)cmd.ExecuteScalar();
   Debug.Print("Connect time: {0}", connectTime.ToString("HH:mm:ss.fff"));
}

// create another connection
using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand(sql, cn)) {
   cn.Open();
   DateTime connectTime = (DateTime)cmd.ExecuteScalar();
   Debug.Print("Connect time: {0}", connectTime.ToString("HH:mm:ss.fff"));
}
And here the output of this test:
Connect time: 12:47:07.617
Connect time: 12:47:07.617
Connect time: 12:47:22.837
As we see, even if the specified connection lifetime is elapsed, the second connection will get the pooled connection. As long as the elapsed connection is not sent back to the pool it will stay active. Only the third connection retrieves a new created connection, since the connections lifetime was already elapsed as it was sent to the pool.

The default value for Connection Lifetime is zero, what means a maximum lifetime.

Min Pool Size and Max Pool Size can be used to specify a minimum/maximum count of pooled connections. The default value for Min Pool Size is zero, the default value of Max Pool Size is 100. When trying to use more concurrent connections than specified value of Max Pool Size you will get an exception.

The Pooling attribute (default 'true') says if connection pooling shall be used for the created connection or not. If you use a SqlConnection with pooling 'false' the database connection will always be newly created and closed after using it.
// connection string with "pooling=false"
string cnStr = "server=(local);database=Sandbox;trusted_connection=sspi;pooling=false";
string sql = "SELECT connect_time FROM sys.dm_exec_connections WHERE session_id = @@SPID";

// create a new connection and print the current session id
using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand(sql, cn)) {
   cn.Open();
   DateTime connectTime = (DateTime)cmd.ExecuteScalar();
   Debug.Print("Connect time: {0}", connectTime.ToString("HH:mm:ss.fff"));
}

// create a new connection and print the current session id
using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand cmd = new SqlCommand(sql, cn)) {
   cn.Open();
   DateTime connectTime = (DateTime)cmd.ExecuteScalar();
   Debug.Print("Connect time: {0}", connectTime.ToString("HH:mm:ss.fff"));
}

And here is the output:
Connect time: 13:41:32.790
Connect time: 13:41:33.050

As we see, the connect time on server side is different for both connections where it would be equal if we would have reused the same internal connection for the second SqlConnection.

Enlist (default 'true') attribute is a more special than the other connection pooling related attributes of connection string. If you don't work with distributed systems and/or Systems.Transactions namespace, you wont need this attribute. The Enlist attribute is important if you need to control how to span transactions over more than one transactional sub-system (like database servers, WCF services, ...). If Enlist is set to 'true' and a parent transaction context is available for the static property System.Transactions.Transaction.Current, ADO.NET will automatically register the SQL connection and its transaction in this transaction context. If you commit your database transaction (implicit or explicit) the transaction will stay pending until the parent transaction becomes committed or rolled back. Today, the most common way to manage those parent transactions is utilizing the TransactionScope class.

A hypothetical system, that might use transaction scopes, is a order management system that automatically creates a shipping order by utilizing a suppliers web-service when an order becomes created or approved. Here we need to update/create new data in a local database and create a shipping order over a web-service within one transaction. If both systems support (and allow) transactions a transaction scope can be used to ensure that all data are either written or not.

If Enlist is set to 'false', a new connection will not be enlisted to an existing transaction scope and behave like no transaction scope was available. All database transactions become committed, independent if the parent transaction will fail or succeed.

Control the Connection Pool over static SqlConnection methods

In addition to the control mechanisms provided by the connection string, ADO.NET SqlConnection provides two static methods to partially, or completely clear the current pool.

SqlConnection.ClearPool(SqlConnection) clears all pooled connections that match the provided connection - by its connection string.

SqlConnection.ClearAllPools clears all currently pooled SQL Server connections.

Pool Fragmentation

There are two more possible issues with ADO.NET connection pooling. Though, I find both are well described at SQL Server Connection Pooling (ADO.NET), so I just quote the paragraphs for sake of completeness.

Pool Fragmentation Due to Integrated Security (Windows Authentication)
Connections are pooled according to the connection string plus the user identity. Therefore, if you use Basic authentication or Windows Authentication on the Web site and an integrated security login, you get one pool per user. Although this improves the performance of subsequent database requests for a single user, that user cannot take advantage of connections made by other users. It also results in at least one connection per user to the database server. This is a side effect of a particular Web application architecture that developers must weigh against security and auditing requirements.
Pool Fragmentation Due to Many Databases
Many Internet service providers host several Web sites on a single server. They may use a single database to confirm a Forms authentication login and then open a connection to a specific database for that user or group of users. The connection to the authentication database is pooled and used by everyone. However, there is a separate pool of connections to each database, which increase the number of connections to the server.

This is also a side-effect of the application design. There is a relatively simple way to avoid this side effect without compromising security when you connect to SQL Server. Instead of connecting to a separate database for each user or group, connect to the same database on the server and then execute the Transact-SQL USE statement to change to the desired database. The following code fragment demonstrates creating an initial connection to the master database and then switching to the desired database specified in the databaseName string variable.

// Assumes that command is a SqlCommand object and that
// connectionString connects to master.
command.Text = "USE DatabaseName";
using (SqlConnection connection = new SqlConnection(
  connectionString))
  {
    connection.Open();
    command.ExecuteNonQuery();
  }
Two short things to the second quote. I would not restrict this possible issue to web applications. This kind of pool fragmentation can also happen on application servers that work with many databases on same database server. I don't know why MSDN suggests to use a explicit SqlCommand to change the database, instead of using the SqlConnection.ChangeDatabase instance method.

Sources

8 comments:

  1. Nice article. Thanks.

    In your section on sp_reset_connection, it would be good to mention that it does not reset the transaction isolation level on the connection.

    See
    http://connect.microsoft.com/SQLServer/feedback/details/243527/sp-reset-connection-doesnt-reset-isolation-level
    http://blogs.msdn.com/b/jimmymay/archive/2009/02/02/sp-reset-connection-does-not-reset-transaction-isolation-level-unexpected-behavior-by-design.aspx

    - Greg

    ReplyDelete
    Replies
    1. Hey Greg

      Thanks for your feedback and for this information! Just updated the section to reflect this issue.

      - Flo

      Delete
  2. This is the very helpful post, thanks for sharing with us. I've found another nice post over internet which also explained very well on connection string...
    http://mindstick.com/Blog/27/Connection%20String

    http://www.connectionstrings.com/sql-server-2005

    Thanks Everyone for your precious post it really helped me in completing my task.

    ReplyDelete
  3. SqlConnection.ChangeDatabase will create a different connection pool on application server. You can find out SqlConnection.Database property is changed to a different database, and connection pool performance counter is also increased.

    "USE DatabaseName" is executed on SQL Server instead, which leave the SqlConnection.Database unchanged, no extra connection pool is created.

    ReplyDelete
  4. Excellent Post. Also visit http://www.msnetframework.com/

    ReplyDelete
  5. Nice Information..
    You can see the below link information on ADO.NET.
    Its very helpful for the beginners
    http://codecreature.wordpress.com/2014/08/07/all-in-one-class-for-ado-net/

    ReplyDelete