Florian Reischl: Table-Valued Parameters - A Performance Comparison

As I wrote in some of my previous posts, table-valued parameters - which have been introduced with SQL Server 2008 - appear to be a great feature to enable client side bulk operations when working with SQL Server. They enable client data provider like ADO.NET to send a complete set of data to the server instead of calling serveral single row operations. Today we'll do some comparsions.

I'll start with a ~~short~~ (had to correct this...) description of the test environment. To use the best possible performance for both ways I'll use stored procedures.

The first test results focus the performance of single statements vs. bulk operations in a single client environemnt - means without any other concurrent activities. The second, more important tests cover a performance comparsion in a multi-client scenario. The last part will show a comparison for some larger bulk operations.

Test Environment

If you already read my previous posts you can skip this section. I try to keep this post uncoupled from my previous post since this covers another general topic (TVPs). So I have to repreat the test environment and the general way to use table-valued parameters from ADO.NET.

For my tests I'll used the following simple table to INSERT/UPDATE rows.

CREATE TABLE TestMany
(
   Id int IDENTITY(1,1) NOT NULL
      PRIMARY KEY CLUSTERED 
   ,SomeInt int NULL
   ,SomeDate datetime NULL
   ,SomeText varchar(100) NULL,
);

Whenever we intend to use table-valued parameters in SQL Server 2008, we need a table-valued type defined in SQL Server which can be used as a collection of Data Transfer Objects (DTOs) to transfer .NET native data to SQL Server.

CREATE TYPE TestManyMergeType AS TABLE(
   Id int NOT NULL
   ,Action char(1) NOT NULL
      CHECK (Action IN ('I', 'D', 'U'))
   ,SomeInt int NULL
   ,SomeDate datetime NULL
   ,SomeText varchar(100) NULL
   
   ,PRIMARY KEY CLUSTERED 
      (Id, Action)
);

This type covers our destination table with some slight exceptions. Since we will use table-valued parameters to push all changed client data in one package to SQL Server, we need a column which tells SS2k8 how to handle a distinct row. Therefore, we'll use the Action column which specifies the current row to be Inserted, Updated or Deleted. We don't use an IDENTITY specification for our "Id" column since we need to be able to specify an destination "Id" to be handled. To insert new rows we can use negative, unique "Id" values which are required to map newly inserted rows on client-side to their new server ids. To get the best selectivity we use a primary key over the "Id" and "Action" columns.

To provide a stored procedure which gets a table-valued parameter, the parameter has to be specified as READONLY in SS2k8.

CREATE PROCEDURE SaveTestMany
   @data TestManyMergeType READONLY
AS
   SET NOCOUNT ON;
   
   -- update
   UPDATE trg SET 
         SomeInt = src.SomeInt
         ,SomeDate = src.SomeDate
         ,SomeText = src.SomeText
      FROM TestMany trg 
         JOIN @data src 
         ON trg.Id = src.Id AND src.Action = 'U';

   -- insert
   INSERT INTO TestMany
      SELECT SomeInt,SomeDate,SomeText 
      FROM @data 
      WHERE Action = 'I' 
      ORDER BY Id

   -- return last IDENTITY and count of added rows
   SELECT SCOPE_IDENTITY(), @@ROWCOUNT;
GO

The important most important part of this procedure is the ORDER BY clause in the INSERT statement. This ensures that we can use the information about the last generated identity value (by SCOPE_IDENTITY()) and the count of inserted rows (by @@ROWCOUNT) to map the provided surrogate ids to the generated server ids. Maybe you think this might be dangerous without a MAXDOP option or anything else and maybe you are right, I just can tell you that I tried different order conditions and all rows have always been inserted in an exact order. So feel free to trust this or not. If you don't trust it (what is a good thing) do your own tests (what is a good thing, too) or just feel free to leave this post (what would be a really bad thing).

To provide a table-valued parameter to a stored procedure from ADO.NET you have two possibilities. You can either provide your data by passing a System.Data.DataTable (which appears be too blown just to provide simple data records in my opinion) or a custom implementation of an IEnumerable<SqlDataRecord>. I'll use the enumerator in my tests since it is way faster and does not require to copy all domain objects into a data table.

class TestManyEnumerator : IEnumerable<SqlDataRecord>
{
   public TestManyEnumerator(IEnumerable<TestMany> data) {
      _data = data;
   }

   private IEnumerable<TestMany> _data;

   public IEnumerator<SqlDataRecord> GetEnumerator() {
      SqlMetaData[] metaData = {
         new SqlMetaData("Id", SqlDbType.Int),
         new SqlMetaData("MergeAction", SqlDbType.Char, 1),
         new SqlMetaData("SomeInt", SqlDbType.Int),
         new SqlMetaData("SomeDate", SqlDbType.DateTime),
         new SqlMetaData("SomeText", SqlDbType.VarChar, 100),
      };

      foreach (var item in _data) {
         SqlDataRecord record = new SqlDataRecord(metaData);

         record.SetValue(0, item.Id);
         record.SetValue(1, item.ChangeState);
         record.SetValue(2, item.SomeInt);
         record.SetValue(3, item.SomeDate);
         record.SetValue(4, item.SomeText);

         yield return record;
      }
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}

This enumerator handles domain objects from type TestMany which map exactly to our above specified table-valued T-SQL type. ChangeState specifies the state ('I', 'D', 'U') of a particular object.

To call our procedure with a set of data to be updated we have to specify a SqlParameter with SqlDbType.Structured, the name of the destination table-valued type as TypeName and provide the previous enumerator as Value.

using (SqlConnection cn = GetConnection(true))
using (SqlTransaction tran = cn.BeginTransaction()) {
   // get test data
   IList<TestMany> data = GetTestData();

   // create a new command as stored procedure
   SqlCommand cmd = new SqlCommand("MergeTestMany", cn);
   cmd.CommandType = CommandType.StoredProcedure;
   cmd.Transaction = tran;
   // create a new parameter ("Structured" defines a TVP)
   SqlParameter p = cmd.Parameters.Add("@data", SqlDbType.Structured);
   // specify the name of the TVP
   p.TypeName = "TestManyMergeType";
   p.Value = new TestManyEnumerator(data);

   // catch the returned IDENTITY information
   using (SqlDataReader reader = cmd.ExecuteReader()) {
      reader.Read();
      int lastId = reader.IsDBNull(0) ? -1 : (int)reader.GetDecimal(0);
      int count = reader.GetInt32(1);
      // synchronize all newly added keys
      SyncAddedIds(data, count, lastId);
   }

   tran.Commit();
}

For a more detailed explanatin and the implementation of GetTestData():IList<TestMany> and GetConnection(open):SqlConnection please have a look here. For a generic approach, have a look at A Generic BDP Approach for SQL Server 2008.

The procedures to insert single rows into our TestMany table are just simple as they can be.

CREATE PROCEDURE [dbo].[InsertOneTestMany]
   @id INT OUTPUT
   ,@someInt INT
   ,@someDate DATETIME
   ,@someText VARCHAR(100)
AS
   SET NOCOUNT ON;
   
   INSERT INTO TestMany (SomeInt, SomeDate, SomeText)
      SELECT @someInt, @someDate, @someText;
   
   SELECT @id = SCOPE_IDENTITY();
GO

CREATE PROCEDURE [dbo].[UpdateOneTestMany]
   @id INT
   ,@someInt INT
   ,@someDate DATETIME
   ,@someText VARCHAR(100)
AS
   SET NOCOUNT ON;
   
   UPDATE TestMany SET 
         SomeInt = @someInt
         ,SomeDate = @someDate
         ,SomeText = @someText
      WHERE Id = @id;
GO

I think there is nothing which needs to be explained.

We're done with our test enviroment.

Not part of this Post

Calling a single statement procedure from .NET straight forward and part of many tutorials. Therefore I kept this source code away from here. I also don't show how to use multi-threading to call the above procedures in a concurrency scenario.

I kept DELETE operations out of the party since they are often done by special purge processes instead of usual business applications.

Since I don't have a home network, I did my tests with a SQL Server running on a VM on my local host. Therefore I cannot show you any comparison differences depending on network latency.

Single Client Tests

Let's compare the performance of single INSERT/UPDATE statements vs. bulk operations when using a single client connection. To get (almost) linear test results I executed each test configuration 1,000 times. One important thing is that I always called the specified count of INSERT and UPDATE operations in one round trip. So, "Count Insert/Update" with value "1" calls one INSERT and one UPDATE operation, count "50" affects hundred rows. I did my tests with a destination table with just 100,000 rows and witha a table with 10,000,000 rows. Both table have been recreated after/before each test configuration and flushed by calling "DBCC DROPCLEANBUFFERS() WITH NO_INFOMSGS;". To ensure as many index read operations as possible I arranged all operations over the complete complete id range of the table.

Duration on a destination table with 100,000 rows after 1,000 executions.

Insert/Update	Duration
Count	Single Statements	Bulk Operation
1	1,299 ms	3,285 ms
3	2,319 ms	2,995 ms
5	3,201 ms	3,007 ms
10	5,553 ms	3,121 ms
20	11,923 ms	3,449 ms
50	27,644 ms	4,769 ms

Duration on a destination table with 10,000,000 rows after 1,000 executions.

Insert/Update	Duration
Count	Single Statements	Bulk Operation
1	2,631 ms	4,076 ms
3	3,614 ms	4,200 ms
5	4,608 ms	4,470 ms
10	7,541 ms	4,381 ms
20	12,308 ms	4,846 ms
50	26,173 ms	6,336 ms

As you can see, there is a point at about five INSERT/UPDATE operations (means 10 rows!) in one single transaction. With less than ten operations for the same table single statements are faster than bulk operations with table-valued parameters. If we hit this count of rows to be changed, table-valued parameters become way faster than classical single row operations.

A work load between one and fifty INSERT and UPDATE tasks shows that single row operations take about 10 times longer (what's a good thing), the duration of bulk operations increase only about 1.5 times (what appears to be a better thing ;-) ).

Multi-Client Tests

As initially written. SS2k8 is a server and no geeky feature has any worth if it works only in a single-client scenario. Therefore, now we'll have a look at some multi-client tests.

Since concurrent executions take more time I ran the following tests only 100 times. I also changed the count of rows within the destination table to 500,000 what was a convenient value to arrange the threads over the whole table - to get the highest count of reads).

The following results show the overall duration of 100 executions.

Insert/Update	Duration
Count	Single Statements	Bulk Operation
1 Thread
1	159 ms	298 ms
3	237 ms	358 ms
5	344 ms	371 ms
10	562 ms	391 ms
20	1,040 ms	427 ms
50	2457 ms	558 ms
2 Threads
1	387 ms	348 ms
3	679 ms	368 ms
5	854 ms	438 ms
10	1,597 ms	421 ms
20	3,513 ms	575 ms
50	6,839 ms	830 ms
5 Threads
1	1,466 ms	936 ms
3	2,528 ms	943 ms
5	3,542 ms	1,007 ms
10	5,837 ms	1,097 ms
20	10,995 ms	1,293 ms
50	26,092 ms	2,251 ms
10 Threads
1	4,820 ms	1,604 ms
3	13,290 ms	1,936 ms
5	12,624 ms	1,838 ms
10	28,829 ms	2,012 ms
20	45,414 ms	2,789 ms
50	97,498 ms	4,610 ms

Again, remember "1" INSERT/UPDATE means two operations within one transaction. Since single row operations usually scale almost linear they are still the faster solution for single operations. However, as we see bulk operations using with table-valued parameters perform really fine in a multi-threaded environment.

Some Larger Bulks

Okay, we speak about bulk operations so finally, let's do some real bulk operations.

The following results show the complete duration of 10 executions.

Insert/Update	Duration
Count	Single Statements	Bulk Operation
1,000	4,845 ms	370 ms
5,000	23,190 ms	2,033 ms
10,000	46,539 ms	4,854 ms
20,000	95,086 ms	7,209 ms

In my opinion 400,000 changed rows (remember we did 20,000 INSERT and UPDATE operations and we executed 10 times!) in 7 seconds appear to be a really great performance. ;-)

Conclusion

Apparently table-valued parameters might be a great feature for systems which work with bulk data manipulation. This might be some web or rich client applications but especially backend processes like data loader or synchronization processes.

What does this mean? Probably nothing for a general statement. If you think bulk operations with table-valued parameters might be a nice feature for your project. Always do your own tests! We did no tests with a real network and we did not test DELETE operations.

As a general advice, always try to stay innovative but validate everything. My intention was not to show you that you have to change all your running code, but to think about some new features which might be helpful.

1 comment:

AnonymousOctober 21, 2010 at 8:15 AM
Regarding identity orders, I found this interesting:

INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

http://blogs.msdn.com/b/sqltips/archive/2005/07/20/441053.aspx

Note: Only a member of this blog may post a comment.

Monday, November 16, 2009

Table-Valued Parameters - A Performance Comparison