Saturday, July 31, 2010

Business Process Modeling (BPM) with .NET Workflow Foundation (WF) 4.0

I'm currently reading the book Pro WF: Windows Workflow in .NET 4. As a short interim review: it's a great resource! I'll write a complete review when I'm done with the rest of the book, and I'll also follow up with a post that gives a complete overview of Workflow Foundation (WF) 4.0.

This post provides a simple example of how WF can be used by our clients to apply real Business Process Modeling (BPM).

Say we are developing a sales order system. Depending on an order's net amount, the order gets a discount. The calculation of this discount is usually best done in source code (at least in my opinion). So here is a class that represents our default implementation.
using System.Activities;

namespace WfActivityLibrary1 {
   public sealed class DefaultDiscount : CodeActivity {
      [RequiredArgument]
      public InArgument<Order> Order { get; set; }

      protected override void Execute(CodeActivityContext context) {
         Order order = Order.Get(context);

         if (order.NetAmount >= 1000)
            order.Discount = order.NetAmount * 0.2m;
         else if (order.NetAmount >= 500)
            order.Discount = order.NetAmount * 0.1m;
         else if (order.NetAmount >= 100)
            order.Discount = order.NetAmount * 0.05m;
         else
            order.Discount = 0m;
      }
   }
}
The base class CodeActivity is one of the Workflow Foundation base classes that can be used to implement custom workflow activities. Usually I'd move the business logic into a class within the Business Logic Layer (BLL) and consume that class from the WF activity. But that would add layers and complexity to this sample without any real value for the actual topic of this post: working with WF.

The problem with discounting is that operational business is very dynamic and the client usually needs to be able to offer sales campaigns to her/his customers. A campaign can be "20% for everything" or "50% off for premium customers".

Until now, this was the point where we started to create dozens of mapping tables and many more classes, containing hundreds of if-statements, to cover the different types of client discounts. The client got some new input masks within the application to manage all that new master data or (worse) a way to import Excel sheets maintained outside the system.

However, what if the client wants to offer a new discount based on information that is not yet covered by the "discount calculation" subsystem within our software? We started to add more master data tables, more classes, more if-statements and more master data masks within the application.
Humans are creative. Especially sales people, when they try to find new ways to sell their products. (Which is a good thing!) So requirements like discount calculations for sales campaigns often become spirals that never end.

Business Process Modeling

Often clients do not ask for complicated master data table relations or huge complexity within their system (which is often expensive to build). They don't understand why it is so complicated to combine discount rates dynamically in different ways. And from a workflow-based point of view, they are right.

Instead of all those master data tables and masks, the client would often prefer a way to really dynamically define their discounts from a few simple components. That's the point where an application that supports such features enters the space of Business Process Modeling (BPM). BPM-enabled software is a huge benefit to the client, since (s)he no longer needs to file a change request with the software vendor whenever the business changes. Instead of immutable behavior, the software can be adapted to the changing business processes.

BPM with WF

Based on our initial sample, let's take our already existing "Default Discount" workflow activity and one additional activity called "Fix Rate Discount". The "Fix Rate Discount" gets two arguments:
  • Order: The order to be discounted
  • Percentage: The percentage rate to apply to the order
I won't dwell on the source code of the "Fix Rate Discount" activity, since it is really simple and does not relate to business modeling at all.
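For the curious, here is a minimal sketch of how such an activity could look. The exact discount math is an assumption of this sketch; this version simply adds the fixed rate on top of any discount already applied.
using System.Activities;

namespace WfActivityLibrary1 {
   // sketch only: applies a fixed percentage discount to an order
   public sealed class FixRateDiscount : CodeActivity {
      [RequiredArgument]
      public InArgument<Order> Order { get; set; }

      [RequiredArgument]
      public InArgument<decimal> Percentage { get; set; }

      protected override void Execute(CodeActivityContext context) {
         Order order = Order.Get(context);
         decimal percentage = Percentage.Get(context);

         // assumption: the fixed rate is added on top of any existing discount
         order.Discount += order.NetAmount * (percentage / 100m);
      }
   }
}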
Armed with those two (notice, only two) workflow activities it's time to revisit our two sales campaigns.
  • 20% for everything
  • 50% off for premium customers
WF makes it possible to define new workflows not only in Visual Studio but in any kind of .NET client application, simply by hosting a special WF designer control. So we can include this control in our application (or an external BPM module) and give the client the ability to model their business using any of the defined activities. Even better, each new workflow is itself an activity and can be reused within any other workflow.
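To give an idea of how little is needed, here is a minimal rehosting sketch. The class name is illustrative, no toolbox or property grid is wired up yet, and references to System.Activities.Presentation and System.Activities.Core.Presentation are assumed.
using System.Activities.Core.Presentation;
using System.Activities.Presentation;
using System.Activities.Statements;
using System.Windows;
using System.Windows.Controls;

public class DesignerHostWindow : Window {
   public DesignerHostWindow() {
      // register the designers of the built-in WF 4.0 activities
      new DesignerMetadata().Register();

      // create the designer and load it with an empty sequence to start modeling
      WorkflowDesigner designer = new WorkflowDesigner();
      designer.Load(new Sequence());

      // the designer surface is a plain WPF control that can be placed anywhere
      Grid root = new Grid();
      root.Children.Add(designer.View);
      Content = root;
   }
}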

WF provides two different kinds of diagrams. Sequence diagrams represent a sequence of activities, executed top down. Flowchart diagrams look like the well-known flow charts from common BPM tools, Microsoft Visio, and most other modeling tools. Both diagram types can easily be combined with each other, and both support activities like "if" decisions and several kinds of loops.

I've chosen a Sequence diagram for our "20% for everything" campaign.


Here you see an excerpt of the WF workflow designer and the related tool windows. The main component is, of course, the diagram designer. Activities can be dragged into the designer and configured. The left toolbox (in my configuration) shows all workflow activities that are currently available. The properties window on the right side can be used to configure individual activities within a diagram and to exchange data. The tool window at the bottom is used to define input and output arguments of the current workflow. For instance, the current workflow expects an input argument of type "Order" called "OrderToDiscount". Those arguments are provided at runtime when the system calls the workflow. Within the diagram, the "DefaultDiscount" activity represents the system's built-in default discount calculation. The "Add 20% Discount" activity is a "FixRateDiscount" activity whose name I changed to be more meaningful within the diagram.
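For illustration, here is a hypothetical snippet of how the host application might execute such a client-modeled workflow. The class, method and file name "TwentyPercentCampaign.xaml" are assumptions; the "OrderToDiscount" argument is the one defined above.
using System.Activities;
using System.Activities.XamlIntegration;
using System.Collections.Generic;

public static class CampaignRunner {
   public static void ApplyCampaignDiscount(Order order) {
      // load the workflow the client modeled in the rehosted designer
      Activity campaign = ActivityXamlServices.Load("TwentyPercentCampaign.xaml");

      // the dictionary keys must match the workflow's input argument names
      var inputs = new Dictionary<string, object> { { "OrderToDiscount", order } };
      WorkflowInvoker.Invoke(campaign, inputs);
   }
}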
For the "50% of for premium customers" workflow, I've chosen a Flowchart diagram. Not because it wouldn't be possible to do this with a Sequence diagram, but to show the other diagram type.



The "Decision" activity represents an "if"-decision that determines if the customers category is "Premium". For sure, in a real system this information evaluation should be encapsulated in a custom activity. The "DefaultDiscount" represents our system default discount calculation. The "50% Discount" activity is another instance of our "FixRateDiscount" activity.

Instrumentation Instead of Development

Nevertheless, be careful not to put too much logic into those workflows. With some very powerful activities like "Assign", which assigns any value to any variable or property of an object, or "Invoke", which invokes any method on an object instance, you could use WF to develop almost anything. In my opinion, that is the wrong way to use workflow engines. Those tasks are usually better placed in source code.

I see a huge strength in workflows for enabling clients to change the processing behavior of a system. I don't see much value in letting the client develop her/his own system. Usually they neither have the skills to do this, nor are they interested in tiny low-level tasks within their models.

Monday, July 19, 2010

If You Cannot Use Identities; Cache Your Ids

Sometimes it is not possible to use IDENTITY columns in database tables. Unfortunately, one of the most common approaches to generating new IDs for non-IDENTITY columns is working with MAX(). I just saw this yet again in a blog post I don't want to link here. In my opinion, the generation of unique ID values is quite basic but should be done correctly, so this blog post is all about that topic.

If you are able to use IDENTITY columns in all tables, you are fine and you can stop reading here. :-)

Reasons Not To Work With Identity Columns

One reason might be that the framework in use does not support them, like Entity Framework in combination with SQL Server Compact. SQL Server Compact does not support batch execution, so every query allows only one SQL statement, and EF does not issue a second query after each INSERT to determine the last generated IDENTITY value.

Another reason not to work with IDENTITY columns are scenarios where computed IDs are required, as in distributed systems. I used them to guarantee a unique PRIMARY KEY across all databases (all over the world), where the first part of the primary key was an incrementing BIGINT value and the second part was a SMALLINT identifying the database where the row was created. Some of you might argue that I could have used a GUID instead; however, I still don't like GUIDs in databases since they are way too large and too slow.

The MAX() Approach

A very common approach to generating new ID values is to select the current maximum value of the table into which new rows have to be inserted.

Here is a plain SQL sample.
-- ======================================================================
CREATE TABLE Orders (
   Id INT NOT NULL PRIMARY KEY CLUSTERED
   ,OrderDate DATETIME2
);

-- ======================================================================
-- some existing rows
INSERT INTO Orders VALUES (1, SYSDATETIME());

-- ======================================================================
-- generate new ID values, using MAX
DECLARE @newId INT;

-- select current maximal ID + 1
SELECT @newId = ISNULL(MAX(Id), 0) + 1 FROM Orders;

INSERT INTO Orders VALUES(@newId, SYSDATETIME());
Unfortunately, this approach has some issues. First, let's have a look at performance. Each call of MAX() causes an index scan of the primary key, as shown in the execution plan below. SQL Server 2008 does a great job of avoiding a scan of the whole index; nevertheless, this extra read for every new row should be avoided.


The second, more important issue is concurrency. If we are working with SQL Server Compact, this issue does not exist, since that DBMS supports only one connection at a time ;-). In every other environment, where we have more than one concurrent user connection, we might (and occasionally will) get an error if two concurrent inserts happen and both connections sit between the SELECT of the MAX value and the subsequent INSERT at the same time. Both will read the same MAX ID from the table, but only one of them can use the newly calculated ID for its insert. The second one will get a primary key violation.

Using an ID Table

To avoid both issues described above, we can use an ID table containing, for each table, the table name and the next available ID.
-- ======================================================================
-- id table
CREATE TABLE Ids (
   Name VARCHAR(30) PRIMARY KEY CLUSTERED
   ,NextId BIGINT NOT NULL
);
-- ======================================================================
-- init ids for Orders table
INSERT INTO Ids VALUES('Orders', 1);
GO
-- ======================================================================
DECLARE @id BIGINT;
-- get the next available "Orders" id from Ids table
UPDATE Ids SET
   @id = NextId
   ,NextId += 1
WHERE Name = 'Orders';

INSERT INTO Orders VALUES(@id, SYSDATETIME());
Usually we would move the ID allocation into a procedure, which I left out here for the sake of brevity. This already looks like a much better approach, doesn't it?

However, this solution brings up another issue. Now the whole system has to access (with a write lock) the ID table for each single row inserted into any other table. This usually causes a system-wide performance problem when the system is under load. The solution for this problem is to cache IDs, as explained below.

ID Caching

To avoid the access problem with the ID table, clients can use a caching mechanism. This means a client does not allocate new IDs one by one but allocates a larger batch of IDs in one step and uses them for the next rows to be inserted.

The number of IDs to allocate in one step depends on the client and the number of rows it usually creates. A GUI client like a web or Windows application is often fine with an ID cache size of 10, whereas a data loader process might need a cache size of 1,000.

Of course, this means that some IDs are never used. For example, if a web application allocates 10 new IDs but the user enters only one new row, 9 IDs are lost. However, this doesn't really matter, since primary key IDs are only needed for uniqueness and should never be presented to the client as a gapless sequence.

The easiest way to allocate new IDs is a simple procedure which gets the name of the table to allocate IDs for and the count of IDs to be allocated, and returns the first ID of the allocated range.
CREATE PROCEDURE AllocateIds
   @nextId BIGINT OUTPUT
   ,@name VARCHAR(30)
   ,@count INT
AS
SET NOCOUNT ON;

UPDATE Ids SET
   @nextId = NextId
   ,NextId += @count
WHERE Name = @name;
GO

Client Side Caching

Caching IDs in a client application is usually quite easy, as long as the client is stateful, like typical web or Windows applications and most web services.

This IdGenerator is a very simple sample of how to cache IDs in the client, utilizing our previously shown stored procedure.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// note: this sample is not thread-safe; a multi-threaded host would need its own locking
static class IdGenerator {
   class IdEntry {
      public long NextId;
      public long MaxId;
   }
   // table-dependent cache
   static IDictionary<string, IdEntry> _cache = new Dictionary<string, IdEntry>();
   // connection and allocation information
   static string _connectionString;
   static int _allocationSize;
   // initializes the connection and allocation information
   public static void Init(string connectionString, int allocationSize) {
      _connectionString = connectionString;
      _allocationSize = allocationSize;
   }
   // public interface to allocate new ids
   public static long NextId(string tableName) {
      IdEntry entry = GetEntry(tableName);
      if (entry.NextId > entry.MaxId)
         AllocateNewIds(entry, tableName);
      return entry.NextId++;
   }
   // get an id entry for a specified table
   private static IdEntry GetEntry(string tableName) {
      IdEntry entry;
      if (!_cache.TryGetValue(tableName, out entry)) {
         entry = new IdEntry { NextId = 1 };
         _cache.Add(tableName, entry);
      }
      return entry;
   }
   // allocate new ids from database
   private static void AllocateNewIds(IdEntry entry, string tableName) {
      Console.WriteLine("Allocating new ids from database");
      using (SqlConnection connection = CreateConnection())
      using (SqlCommand command = new SqlCommand("AllocateIds", connection)) {
         command.CommandType = CommandType.StoredProcedure;

         SqlParameter nextId = command.Parameters.Add("@nextId", SqlDbType.BigInt);
         nextId.Direction = ParameterDirection.Output;
         command.Parameters.Add("@name", SqlDbType.VarChar, 30).Value = tableName;
         command.Parameters.Add("@count", SqlDbType.Int).Value = _allocationSize;
         command.ExecuteNonQuery();
         entry.NextId = (long)nextId.Value;
         entry.MaxId = entry.NextId + _allocationSize - 1;
      }
   }
   // create a new, open database connection
   private static SqlConnection CreateConnection() {
      SqlConnection connection = new SqlConnection(_connectionString);
      connection.Open();
      return connection;
   }
}
... and here is a sample of how to use the ID generator ...
IdGenerator.Init("Server=.;Database=Sandbox;Integrated Security=sspi;", 10);
for (int i = 0; i < 100; i++) {
   long id = IdGenerator.NextId("Orders");
   Console.WriteLine(id);
}

ID Tables And T-SQL

Well, while ID tables and caching are a good way to go in client applications, they are a bit tricky in T-SQL (and probably in most other SQL dialects). The problem is that T-SQL is stateless, and because of that it does not support caching; we don't have a "static" place where we can store our cached IDs.

One workaround is to create a temp table that looks like the stateful ID table and acts as our "own" cache. But due to the scope handling of SQL Server it is not possible to move the temp table creation into a procedure, so this workaround is quite awkward to maintain, since every script or procedure that wants client-like caching has to create its own temp table.

However, T-SQL scripts should usually not try to work like clients, in a row-based manner, so they should not really need an ID cache. Whenever new rows have to be inserted into a table, the executing procedure or script should determine the full count of needed rows, allocate the required range of IDs and insert all rows in one operation.

One thing that changes when working with ID tables is that IDs for single-row inserts should now be provided by the client. Why? Because of the missing ability to cache IDs in T-SQL. Where we usually had insert procedures like this ...
CREATE PROCEDURE InsertOrder
   @id INT OUTPUT
   ,@orderDate DATETIME2
AS
   INSERT INTO Orders (OrderDate)
      VALUES (@orderDate);
   SELECT @id = SCOPE_IDENTITY();
... we would have to act like this...
CREATE PROCEDURE InsertOrder
   @id INT OUTPUT
   ,@orderDate DATETIME2
AS
   EXECUTE AllocateIds @id OUTPUT, 'Orders', 1;
   INSERT INTO Orders (Id, OrderDate)
      VALUES (@id, @orderDate);
... but this would cause huge traffic on our ID table. So it is usually more productive to get the IDs from the client, which can easily handle the caching.
CREATE PROCEDURE InsertOrder
   @id INT
   ,@orderDate DATETIME2
AS
   INSERT INTO Orders (Id, OrderDate)
      VALUES (@id, @orderDate);
Some might say that they don't want to make clients responsible for providing the new primary key values. Still, SQL Server enforces the uniqueness for you ;-) and the price to pay is quite cheap compared to the other option.

Stateless Clients

Stateless clients have almost the same issues as T-SQL procedures and scripts. Due to their statelessness, they are usually not able to cache their IDs. The only additional option we get with stateless clients is to move the ID caching into an external resource like a WCF service. If a system has several stateful and several stateless components, it is an option to keep the stateful applications fast and provide a separate ID cache service for the stateless ones.
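As a sketch of that idea, the contract of such an ID service could be as small as this. The names are illustrative; it simply puts the IdGenerator shown above behind a WCF endpoint and serializes access to it.
using System.ServiceModel;

[ServiceContract]
public interface IIdService {
   [OperationContract]
   long NextId(string tableName);
}

// single, thread-safe service instance that owns the ID cache for all stateless callers;
// IdGenerator.Init(connectionString, allocationSize) is assumed to be called at service startup
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class IdService : IIdService {
   readonly object _sync = new object();

   public long NextId(string tableName) {
      lock (_sync) {
         // reuses the caching IdGenerator from above, which calls the AllocateIds procedure
         return IdGenerator.NextId(tableName);
      }
   }
}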

For sure, you could also consume this service from T-SQL (quite simple with SQLCLR, as I showed here), but using SQLCLR to access web services is still rarely done and the overhead is usually much larger than accessing the ID table directly. So the service should only be used from the client side, keeping SQL Server out of the service-based ID generation.

Friday, July 9, 2010

Defend Your Code

Have you ever explained a bug or a dirty implementation with one of these sentences?
  • "The (project) manager/marketer said, we have to meet the deadline."
  • "The (project) manager/product owner told you, a quick and dirty solution is okay for now."
  • "The XYZ told you, we NEED this new feature, no matter how it works behind."
Never? Well, I did. Not recently, but I did.

Who was responsible for the problem? Me. I put non-code requirements above system quality. I did not write the unit test that would have found the problem. I did not invest the time in refactoring to ensure a clean implementation. You might argue that somebody else decided. Nevertheless, it's my, and only my, job to keep the system clean, free of bugs and maintainable.

I see software projects as a play, and we, all the stakeholders, are the actors and actresses. The marketer's part is to push for new features. The part of the project manager, managers and product owners is to obsess over the schedule. And it's our part to defend the code!

What about "If I did not implement this feature in time, I'd got fired"? Probably not. Most Managers don't want buggy software or software that is hard to maintain for future requirements. Even if they don't say that very clear. Managers, and all other non developing stakeholders, just cannot rate importance of one or more unit tests and have no clue about refactoring (even if they say they do have).

If I were an electrician, the building owner and the foreman would always insist on the deadline. However, if there were no power when the building was done, mine would be the only neck on the line; not the building owner's, not the foreman's. It would be their part to push forward, and it would be my part to make (and keep!) everything working.

Wednesday, July 7, 2010

We Are Authors!

I just read this statement in Robert C. Martin's book Clean Code, where he primarily writes about how to write code. However, I see it as a more general point.

We are all authors, and we are writing almost all day long. Everything we write will be read by somebody, now or months from now, so we really should think about the reader while we are writing. What we write can leave readers annoyed with the author or grateful to him/her, and we should always aim for the latter.

Source Code

Have you ever seen a so-called "very flexible" solution like this?
// get data returns all new users to be created
object[] rows = GetData();
foreach (object[] row in rows) {


    // ix 0 represents the new user id
    // ix 1 is not needed
    // ix 2 is the name of the new user
    int i = (int)row[0];
    string n = (string)row[2];
    CreateUser(i, n);
}
Imagine the author is sitting right next to you and you are about to add some new features. He tells you that this approach, using untyped object arrays, is very flexible, because you don't have to change the interface if something new comes in.

I call this completely inflexible. It is impossible to change anything within the user structure, since it is almost impossible to find all the places where the structure is used.

How should I know that GetData returns users? Does it always return users? Why an array of object for the rows, instead of an ICollection<T> that tells the reader what's inside the collection? Why another array of object for each row, instead of a strongly typed NewUser type? Why all the noise comments, instead of self-describing variable names? What about "ix 1 is not needed"? Has it simply been empty so far? Is it reserved for future use? Am I just not allowed to look into it? And why the two empty lines at the beginning of the loop?

How about this?
ICollection<NewUser> newUsers = GetNewUsers();
foreach (NewUser user in newUsers) {
    CreateUser(user.Id, user.UserName);
}
This code does exactly the same as the previous version, but we don't need any comments. The reader immediately knows what's going on, and we are able to refactor NewUser whenever we want because we can find every place where it is used.
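For completeness, NewUser does not need to be anything fancy; a small, strongly typed DTO like this sketch would already do (the actual properties depend on what the import really needs).
// sketch of the strongly typed replacement for the untyped object[] rows
class NewUser {
    public int Id { get; set; }
    public string UserName { get; set; }
}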

I might ask myself: who will ever read my code? The most faithful reader of my code is me. When we have to change an implementation we wrote days, months or years ago, we start by reading the existing code. We jump back and forth to find out what happens, which objects are used and how they are composed to hit the target.

Writing clean source code means being kind to your readers; especially to yourself.

T-SQL

For sure, T-SQL is source code too. However, it felt important to me to give it its own section. Why? Because the ugliest code I've ever seen was written in T-SQL.

Have you ever seen a procedure or script like this?
insert Into TestData100K SELECT Top(10) id, BitVal, FLOATVAL from TestData10K WHERE 
ID > 99
and intval != 1
Unfortunately (yes, unfortunately) T-SQL is not case sensitive, so the upper/lower case of statements, or even of words within statements, often depends only on the occasional twitches of the writer's little finger. What about line breaks? How about indentation?

How about this?
INSERT INTO TestData100K
SELECT TOP(10) 
   Id
   ,BitVal
   ,FloatVal
FROM TestData10K 
WHERE ID > 99
   AND IntVal != 1;
I'd never claim that my way of writing T-SQL is the best, but you can be sure that all my T-SQL looks like this. It doesn't matter whether you prefer upper-case or lower-case keywords like SELECT, but make your decision and stick to it. Decide how and what to indent and where to put the commas.

The worst T-SQL I've seen was most often written by non-database developers who "had to do some ugly T-SQL". T-SQL, like any other programming language, is exactly as ugly as we write it. (Well, T-SQL offers more opportunities for writing bad code than most other languages.) So it's up to us to write pretty procedures.

Emails

Emails? Why bother? In my opinion, email has been a kind of comedown for human writing. In the good old days, when we wrote letters on paper, we all knew some formatting rules and we followed them. Today, with email, many people seem to have forgotten any kind of formatting.

Did you ever get a mail like this?
Subject: todo
Content:
Please implement NewOrder till friday. WE NEED TO GO ONLINE ON MONDAY!!!
thx

What does this subject tell me, except "somebody has to do something"? Who is responsible for implementing the new feature? I have to look at the email header to see whether I am the only person among the "To" recipients. If there is more than one "To" recipient, they all have to guess who is responsible. Why capitalize the production schedule? Capitalized text always feels like screaming. Does it mean I might have to work on the weekend to meet the deadline? Why those three exclamation marks? Do they mean I have to stay on Friday, even if it turns into Saturday, until I'm done? And after this harsh task assignment, a sloppy "thx"? Are you kidding me?

How about this?
Subject: Feature NewOrder (until Friday)
Content:
Hi Flo

Please implement the “NewOrder” until next Friday. It's important that we meet the deadline due to a presentation on Monday. Please let me know if you need some additional resources.

Thanks
John
Sure, this is much more text to write and to read. However, I'm not a machine, and I don't need (or want) my tasks optimized for fast processing (reading). Investing a little time to assign a task in a gracious form is a matter of respect.

I don't mean emails to a friend asking her out for lunch tomorrow. I tend to write those sloppily because the content is not really important, and I'd probably even speak in slang if I were standing in front of her.

Conclusion

I'd really appreciate it if more people felt like authors whenever they write. Keeping the reader in mind whenever we write is a kind of respect that is always worth the effort.

Admittedly, sometimes I wish I followed this commitment more consistently than I actually do. However, I know I'm not perfect, I know (at least some of) my personal issues, and I'm still working on them.

Tuesday, July 6, 2010

Table-Valued Parameters

This post was initiated by Ken Simmons (Blog | Twitter). He asked us to write a blog post describing our favorite feature of SQL Server. Ken's post here is not only interesting because of the MSDN package :-) but also got me thinking about what I have changed in my database solutions since SQL Server 2008.

For sure, this is not my first post about the SQL Server 2008 feature user-defined table-valued types. I wouldn't call them the greatest feature of SQL Server in general, but they are one of the greatest new features in version 2008.

Bulk Load by Ids

Often it is required to load many rows of one table by their IDs or by some other attribute. One example is reloading already known data.

One solution is to send one query for each item to be (re-)loaded. However, this causes many small queries, stressing the server and the network.

Another approach is to create one dynamic SQL query containing an IN clause with all IDs to be reloaded. However, this has several issues. Dynamic creation of SQL statements is always a potential open door for SQL injection. Those IN clauses make it impossible for SQL Server to reuse existing execution plans. IN clauses with many items require many resources on the server side and become slow. And this works only if there is a single unique column to filter on.

The (in my opinion) best solution before SQL Server 2008 was to create a concatenated string of all IDs to be loaded, send the string to the server, split it into a temp table and join this table to the real table to return the requested rows. I wrote a huge post about SQL split functions here. As cool as this approach is, though, it remains a little hack. When working with more than one column it becomes awkward, since you have to handle hierarchical data (lines and items) within the string.

User-defined table-valued types are a new and clean way to bulk load rows from the database. Let's start with the definition of a new type.
CREATE TYPE IntIdTable AS TABLE (
   Id INT NOT NULL PRIMARY KEY CLUSTERED
)
Now we are able to write a stored procedure which gets this type as a parameter. (I don't show the referenced table "TestData100K" because it doesn't really matter; among other columns, it contains one called "Id". Please, no bashing for the "SELECT *", it's a sample.)
CREATE PROCEDURE GetTestDataByIds
   @ids IntIdTable READONLY
AS
   SELECT td.*
   FROM TestData100K td
      JOIN @ids ids ON td.Id = ids.Id;

Okay, we're done on the server side; let's switch to the client.
The first thing we need in order to use a user-defined table type from the client is a class implementing IEnumerable<SqlDataRecord>. (Another way would be a DataTable, but I don't like that class due to its huge overhead; see the sketch after the test method below.)
using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Server;

class IntRecordEnumerator : IEnumerable<SqlDataRecord> {
   public IntRecordEnumerator(IEnumerable<int> values) {
      _values = values;
   }

   IEnumerable<int> _values;

   public IEnumerator<SqlDataRecord> GetEnumerator() {
      SqlMetaData metaData = new SqlMetaData("Id", SqlDbType.Int);

      foreach (int id in _values) {
         SqlDataRecord record = new SqlDataRecord(metaData);
         record.SetInt32(0, id);
         yield return record;
      }
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
Now we can use this class as the value of a SqlParameter. (Notice that the parameter requires SqlDbType.Structured and a TypeName that addresses the name of the table type.)
[TestMethod]
public void SearchIds() {
   string sql = "GetTestDataByIds";
   using (SqlConnection connection = CreateOpenConnection())
   using (SqlCommand command = new SqlCommand(sql, connection)) {
      command.CommandType = CommandType.StoredProcedure;
      SqlParameter param = command.Parameters.Add("@Ids", SqlDbType.Structured);
      param.TypeName = "IntIdTable";

      int[] idsToLoad = new int[] { 1, 2, 3, 4, 5 };
      param.Value = new IntRecordEnumerator(idsToLoad);

      using (SqlDataReader reader = command.ExecuteReader()) {
         while (reader.Read()) {
            int id = (int)reader["Id"];
            Console.Write(id);
            Assert.IsTrue(idsToLoad.Contains(id));
         }
      }
   }
}
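For comparison, the DataTable alternative mentioned above can be assigned to the same Structured parameter; here is a minimal sketch (more overhead, but no custom enumerator class), reusing the param configured in the test above.
// sketch: build a DataTable whose single column matches the IntIdTable type
DataTable idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (int id in new[] { 1, 2, 3, 4, 5 })
   idTable.Rows.Add(id);

// a DataTable can be passed directly as the value of a Structured parameter
param.Value = idTable;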

Streaming Data To The Server

In distributed systems it is important to synchronize data between different databases.

If it is possible to use database features like BCP or SSIS, that is often the best solution. However, sometimes additional work containing business logic is required, and those solutions would force a duplication of the business logic into the database. In addition, if data has to be updated (not only inserted), it has to be packed into a staging table before it can be handled by a procedure. Staging tables are nice temporary storage but can become tricky if there are concurrent synchronization jobs.

The classic and, as far as I know, most common solution today is to use single INSERT statements to load such bulk data row by row. As noted above, this becomes a little stress test for the database server and the network.

Another, somewhat esoteric, approach is to use SQL Server 2005's XML functionality to create one huge XML package containing all the data to be written to the database server. The problem is that XML is quite a chatty format and causes huge memory usage on both client and server.

Here, user-defined table-valued types can be used to push all the data into a procedure in one step. Business logic can be applied while streaming through the data to be synchronized. The procedure finally updates and inserts the data into its destination table. I first wrote about this approach here.

Again, we start with a table type.
CREATE TYPE ImportSampleTable AS TABLE (
   FirstName VARCHAR(30)
   ,LastName VARCHAR(30)
);
A simple procedure that gets the type as a parameter and inserts the rows into a table.
CREATE PROCEDURE ImportSampleFile
   @importData ImportSampleTable READONLY
AS
   INSERT INTO ImportSample
   SELECT *
   FROM @importData;

Heading over to the client ...
Here is an implementation of IEnumerable<SqlDataRecord> that streams a tab-separated file.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

class FileToLoad : IEnumerable<SqlDataRecord> {
   public FileToLoad(string fileName, SqlConnection connection) {
      _fileName = fileName;
      _connection = connection;
   }

   string _fileName;
   SqlConnection _connection;
   SqlMetaData[] _metaData;

   public IEnumerator<SqlDataRecord> GetEnumerator() {
      CreateMetaData();

      using (StreamReader reader = new StreamReader(_fileName)) {
         while (!reader.EndOfStream) {
            SqlDataRecord record = CreateRecord(reader);
            yield return record;
         }
      }
   }

   private void CreateMetaData() {
      _metaData = new SqlMetaData[] {
         new SqlMetaData("FirstName", SqlDbType.VarChar, 30),
         new SqlMetaData("LastName", SqlDbType.VarChar, 30),
      };
   }

   private SqlDataRecord CreateRecord(StreamReader reader) {
      string line = reader.ReadLine();
      SqlDataRecord record = new SqlDataRecord(_metaData);
      string[] lineItems = line.Split('\t');
      record.SetString(0, lineItems[0]);
      record.SetString(1, lineItems[1]);
      return record;
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
And again, we are ready to use the table-valued type as a parameter in another test.
[TestMethod]
public void StreamingLoad() {
   using (SqlConnection connection = CreateOpenConnection())
   using (SqlCommand command = new SqlCommand("ImportSampleFile", connection)) {
      command.CommandType = CommandType.StoredProcedure;
      SqlParameter param = command.Parameters.Add("@importData", SqlDbType.Structured);
      param.TypeName = "ImportSampleTable";
      // get a sample file
      string fileName = GetType().Assembly.Location;
      fileName = Path.GetDirectoryName(fileName);
      fileName = Path.Combine(fileName, "FileToLoad.txt");
      
      param.Value = new FileToLoad(fileName, connection);
      command.ExecuteNonQuery();
   }
}

Conclusion

User-defined table-valued types are a nice new feature of SQL Server 2008! I have not found many reasons to use them in plain T-SQL, but they provide native support for set-based solutions between client and server, and that is a really great enhancement of SQL Server.