Florian Reischl: How to handle SQL Server IDENTITIES in .NET

Many people still don't use SQL Server IDENTITY columns to generate their primary key values. Most times they just don't know how to handle them in client applications. In this case implementations often end up with client side ID generations or GUID primary keys. This post tries to show why server side IDs are a good approach and how to work with.

IDENTITY is not a Sequence
First I have to say, IDENTITY columns are not made to represent a sequence which can/shall be shown to the user. What does this mean? Every ID is used exactly once, when you delete an existing ID it will not be reused for the next inserted row.

Here's a little sample to show this behavior.

DECLARE @t TABLE
(
   Id INT IDENTITY(1,1) 
      PRIMARY KEY CLUSTERED
   ,SomeInt INT
);

-- insert some rows
INSERT INTO @t (SomeInt) 
   VALUES (100)
         ,(100)
         ,(100);

SELECT * FROM @t;

Since now the data appear to have a correct sequence.

Id	SomeInt
1	100
2	100
3	100

Let's delete one row and add another one.

-- delete one row in the middle of the table
-- and insert another one
DELETE FROM @t WHERE Id = 2;

INSERT INTO @t (SomeInt) 
   VALUES (200);

SELECT * FROM @t;

As you see the deleted Id with value "2" will not be refilled with the new inserted row. The Id "2" was used, deleted and will never be reused by SQL Server (without using tricks like IDENTITY_INSERT or re-seed the IDENTITY).

Id	SomeInt
1	100
3	100
4	200

Why doesn't SQL Server decrement the ID "3" when "2" became deleted?
This would be a huge overhead for the database server. Adopt a table with 1,000,000 rows and delete ID "1". This would cause 999,999 rows to be decremented.

Why doesn't SQL Server reuse the ID "3" for the next inserted row?
In this case SQL Server would have to remember each deleted ID. This would affect the INSERT performance and the storage usage.

If you need a dense rank or a sequence without any gaps you can use functions like ROW_NUMBER() or DENSE_RANK() in your select statement. Anyway, usually this kind of sequences should be handled in client if possible. It's quiet simple to use a int variable to create a sequence to show in GUI.

Client Side ID Generation

Before we start with server side IDs, let's have a short look to the client side approaches.

One solution I've seen some times is a SELECT MAX(Id) FROM ... before execution of the INSERT statement. This has two huge problems! First, this is a huge performance problem since every INSERT causes an index scan. The second problem is, this solution is not safe. Keep in mind, a database server is a multi-user environment. Between your MAX select and the INSERT statement another user can INSERT the same ID and you get a primary key violation!

A second solution to handle client side IDs is to use a ID table which contains the next available. The client applications use this table to get the next IDs for their INSERT statements. This way to handle client side IDs is a good approach but needs a correct implementation. What does this mean? Suppose a table like this:

table_name	next_id
Table1	344828
Table2	6454
Table3	23432

If every client application selects one ID whenever it wants to insert a new row into a table, the whole system meets in one single table with just some rows, which are probably stored in same data and index page. The result would be a problem with locks on this table.

A correct solution to work with a ID table would be to use a client side ID caching. Every client selects a range of IDs to work with for the next INSERT statements. The cache size depends on the type of the client and the count of INSERT operations generated by this client. A windows application which does not generate too many new rows might be fine with a cache size of 10, a server side import process probably needs a cache size of 500 IDs or more.

Doesn't this mean that cached IDs will not be used if the client application exists when IDs are cached? Yes, but this doesn't matter. Keep in mind, primary key IDs are not made to show to a user. They are the database identity of a row. Since INT has a maximum value of 2,147,483,647 this should be enough for most systems - even if you loose some IDs. If you think you might exceed this count of IDs you can use BIGINT which has a maximum value of 9,223,372,036,854,775,807.

Database Structure
Before we start with server side IDENTITIES, you should know the database structure for the following tests.

And here the create statements for both tables.

CREATE TABLE Parent
(
 Id int IDENTITY(1,1) NOT NULL
    PRIMARY KEY CLUSTERED
 ,SomeInt int NULL
);

CREATE TABLE Child
(
   Id int IDENTITY(1,1) NOT NULL
      PRIMARY KEY CLUSTERED
   ,ParentId int NOT NULL
      REFERENCES Parent (Id)
   ,SomeInt int NULL
);

As you see, both tables have an IDENTITY column defined as primary key and "Child" table has a foreign key which points to "Parent" table.

Low Level ADO.NET
First we'll have a look to the low level way to work with server side IDs. Therefore we use a SqlCommand to execute an INSERT statement and retrieve the new id from server.

string cnStr = "Server=.;Database=Sandbox;Integrated Security=SSPI;";

// the INSERT sql statement. Notice the "SELECT @Id"!
string insertSql = @"INSERT INTO Parent (SomeInt) 
                        VALUES (@SomeInt); 
                     SELECT @Id = SCOPE_IDENTITY();";

using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand insertCmd = new SqlCommand(insertSql, cn))
{
   cn.Open();

   // Add the @SomeInt param with a value
   SqlParameter someIntParam = insertCmd.Parameters.Add(
      "@SomeInt", SqlDbType.Int);
   someIntParam.Value = 123;
   
   // Add the @Id param and define it as output parameter
   SqlParameter idParam = insertCmd.Parameters.Add(
      "@Id", SqlDbType.Int);
   idParam.Direction = ParameterDirection.Output;

   // execute the command
   insertCmd.ExecuteNonQuery();

   // show the "Id" after the data are written to server
   Console.WriteLine("after database insert: {0}", idParam.Value);
}

Since our "idParam" is specified as output parameter ADO retrieves the new SCOPE_IDENTITY() from server and we can work with it in our client.

Mostly we don't need to work with this low level implementations. Anyway, sometimes it's good to know how the upper level works.

Working with untyped DataTables

Still quiet low level but common. I've seen many solutions working with untyped DataTables to handle data from a database server. Usually I would say use at least typed DataSets or an O/R-mapper. Anyway. ;-)

How to handle server side IDs with a DataTable? The DataTable will be updated with a SqlDataAdapter, so there is no way to customize the parameters and handle the returned values, is there? To combine the columns of a DataTable with return values of a SqlParameter you can use the SqlParameter.SourceColumn property, which specifies the mapping between the parameter and the columns of the table.

string cnStr = "Server=.;Database=Sandbox;Integrated Security=SSPI;";

// the INSERT sql statement. Notice the "SELECT @Id"!
string insertSql = @"INSERT INTO Parent (SomeInt) 
                        VALUES (@SomeInt); 
                     SELECT @Id = SCOPE_IDENTITY();";

using (SqlConnection cn = new SqlConnection(cnStr))
using (SqlCommand insertCmd = new SqlCommand(insertSql, cn))
using (SqlDataAdapter adap = new SqlDataAdapter())
{
   // create a DataTable which represents our Parent table
   DataTable table = new DataTable();
   // specify Id columnd as AutoIncrement
   table.Columns.Add("Id", typeof(int)).AutoIncrement = true;
   table.Columns.Add("SomeInt", typeof(int));

   // Add a row to our table and show the current Id value (which is "0")
   DataRow row = table.Rows.Add(null, 1);
   Console.WriteLine("before database insert: {0}", row["Id"]);

   // add the "SomeInt" param
   insertCmd.Parameters.Add("@SomeInt", SqlDbType.Int, 4, "SomeInt");
   // add the "Id" param and define it as output param
   SqlParameter idParam = insertCmd.Parameters.Add(
      "@Id", SqlDbType.Int, 4, "Id");
   idParam.Direction = ParameterDirection.Output;

   // inject the adapter with the custom command and update data
   adap.InsertCommand = insertCmd;
   adap.Update(table);

   // show the "Id" after the data are written to server
   // this returns the new IDENTITY now.
   Console.WriteLine("after database insert: {0}", row["Id"]);
}

The fourth parameter of SqlCommand.Parameters.Add specifies the "SourceColumn". The data adapter uses this column to create the mapping between the parameters and the columns of the table to be updated. The console output shows the server side generated value of our "Id" column.

Working with typed DataSets

Welcome on "it works build in" level. Maybe some of you made bad experiences with IDENTITY columns and typed DataSets, let me tell you that's just caused of a tiny configuration issue within the DataSet designer.

If you already used the previously shown CREATE statements to create the sample tables "Parent" and "Child" on a sample database just create a typed DataSet named "DataSet1" in a Visual Studio C# project. You will find some detailed tutorials in MSDN, which show hot to handle this. Here I just want to show how to configure a correct handling of IDENTITY columns.

Have a look to the DataSet1.Designer.cs file and navigate to the ParentTableAdapter. You will find the following line in "InitAdapter()" method.

this._adapter.InsertCommand.CommandText = 
   "INSERT INTO [dbo].[Parent] ([SomeInt]) VALUES (@SomeInt);\r\n" +
   "SELECT Id, SomeInt FROM Parent WHERE (Id = SCOPE_IDENTITY())";

As you see, it selects all inserted values back to client as soon as the new row is inserted. Unfortunately, in default configuration the selected "Id" value will not be written to our related ChildDataTable. Go back to the DataSet Designer and do the following tasks:

Right click the relation between "Parent" and "Child" table and select "Edit Relation...". The relation configuration dialog appears.
In section "Choose what to create" select option "Both Relation and Foreign Key Constraint".
Change "Update Rule" to "Cascade".
Ensure "Accept/Reject Rule" is set to "None". If you change this option to "Cascade", the changes of our "Child" row will be automatically when the "Parent" table becomes saved.

The following picture shows the configuration.

Finally a little sample which can be copied and pasted to illustrate the correct identity handling with a typed DataSet.

// create a DataSet
DataSet1 ds = new DataSet1();

// get the parent table and add a new row
DataSet1.ParentDataTable parentTable = ds.Parent;
DataSet1.ParentRow parentRow = parentTable.AddParentRow(1);

// get the child table and add a new row
DataSet1.ChildDataTable childTable = ds.Child;
DataSet1.ChildRow childRow = childTable.AddChildRow(parentRow, 1);

// create the table adapters
DataSet1TableAdapters.ParentTableAdapter parentAdapter =
   new ConsoleApplication2.DataSet1TableAdapters.ParentTableAdapter();
DataSet1TableAdapters.ChildTableAdapter childAdapter =
   new ConsoleApplication2.DataSet1TableAdapters.ChildTableAdapter();

// update the parent table
parentAdapter.Update(parentTable);
// #######################
// At this point, the new IDENTITY value from parent row
// was written to our child row!
// #######################
Console.WriteLine(childRow.ParentId);

childAdapter.Update(childTable);

Special thanks at this point goes to ErfinderDesRades, a power-user state member of myCSharp.de - a great C# forum. He showed me the correct configuration of the DataSet.

LINQ 2 SQL and Entities Framework

LINQ 2 SQL as well as Entities Framework support IDENTITY columns out of the box, so there is nothing more to say about these O/R mappers.

Florian Reischl

Sunday, October 4, 2009

How to handle SQL Server IDENTITIES in .NET

No comments:

Post a Comment