Saturday, January 23, 2010

.NET 4.0 Lazy<T> Class

.NET 4.0 and Visual Studio 2010 are raring to go. The current release date is somewhere in May 2010.

One of the new features of the new framework is the Lazy<T> class, which supports lazy initialization of objects.

As I wrote in some of my previous blogs, I'm no big friend of (wrongly used) Lazy Loading. However, there are some use cases where this is a very powerful feature. Therefore I use the phrase "Lazy Initialization" instead of "Lazy Loading" in the samples for the Lazy<T> class below. "Lazy Loading" usually implies automatic loading of reference data in a Data Access Layer. When I speak about "Lazy Initialization" this means the lazy initialization of some (rarely used) object properties which require large or unmanaged system resources.

The following sample class shows a common case where lazy initialization becomes a very useful and often required implementation.
sealed class Product1 : IProduct {
   // -------------------------------------------------------------------
   public Product1(string name, Stream picture) {
      Name = name;
      Picture = picture;
   }
   // -------------------------------------------------------------------
   public string Name { get; set; }
   public Stream Picture { get; set; }
}
The "Picture" property of "Product" class is usually very rare needed in a whole system and it works with (probably) unmanaged resources which might cause a very high memory allocation. Always to initialize the picture of a product which might only be required in a web front-end or a sales application would cause a huge I/O traffic which is not required in system directions like billing or call-center order management.

Lazy<T> Class

The Lazy<T> class represents a wrapper class to handle this kind of property initialization.

As a quote from MSDN:
Although you can write your own code to perform lazy initialization, we recommend that you use Lazy instead. Lazy and its related types also support thread-safety and provide a consistent exception propagation policy.
If a system already works with its own custom lazy initialization, I'd advise staying with it. If lazy initialization is a new requirement, we shouldn't run into the evil Not Invented Here anti-pattern ;-) and should use the framework implementation.

Here's a list of methods and constructors provided by Lazy<T> class (descriptions from MSDN):
  • ctor(Func<T> valueFactory): Initializes a new instance of the Lazy(T) class. When lazy initialization occurs, the specified initialization function is used.
  • ctor(Func<T> valueFactory, bool isThreadSafe): Initializes a new instance of the Lazy(T) class. When lazy initialization occurs, the specified initialization function and initialization mode are used.
  • Value: Gets the lazily initialized value of the current Lazy(T) instance.
  • IsValueCreated: Gets a value that indicates whether lazy initialization has occurred for this Lazy(T) instance.
The Lazy<T> class provides some more members and constructors, but the above seem to be the most important ones in my opinion.
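To illustrate the basic behavior, here is a minimal sketch of my own (not taken from MSDN) - a few lines that could live in any Main method - which wraps a simple string creation in a Lazy<T> and checks IsValueCreated before and after the first access.
// minimal sketch: the factory delegate runs on the first access of Value
Lazy<string> lazyText = new Lazy<string>(() => {
   Console.WriteLine("## expensive initialization runs now ##");
   return DateTime.Now.ToLongTimeString();
});

Console.WriteLine(lazyText.IsValueCreated);   // False - nothing initialized yet
Console.WriteLine(lazyText.Value);            // factory runs here, exactly once
Console.WriteLine(lazyText.IsValueCreated);   // True
Console.WriteLine(lazyText.Value);            // cached value, the factory is not called again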

As a side note on the isThreadSafe parameter, a second quote from MSDN:
Some Lazy constructor overloads have a Boolean parameter named isThreadSafe that is used to specify whether the Value() property will be accessed from multiple threads. If you intend to access the property from just one thread, pass in false to obtain a modest performance benefit. If you intend to access the property from multiple threads, pass in true to instruct the Lazy instance to correctly handle race conditions in which one thread throws an exception at initialization time. If you use a constructor that does not have the isThreadSafe parameter, the value defaults to true.
If an application never accesses the value from multiple threads, passing false for the isThreadSafe parameter results in faster execution because there is no thread locking overhead.
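Here is a short sketch of my own to illustrate the difference:
// single-threaded usage: no locking overhead, slightly faster
Lazy<byte[]> localBuffer = new Lazy<byte[]>(() => new byte[1024 * 1024], false);
// shared usage: Lazy<T> internally handles race conditions during initialization
Lazy<byte[]> sharedBuffer = new Lazy<byte[]>(() => new byte[1024 * 1024], true);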

How to Use

First, the "Product" class needs to be modified to support lazy initialization. I moved the stream creation into an external factory class
// ======================================================================
sealed class Product2 : IProduct {
   // -------------------------------------------------------------------
   public Product2(string name, string fileName) {
      Name = name;
      _picture = new Lazy<Stream>(() => PictureFactory.GetPicture(fileName), true);
   }
   // -------------------------------------------------------------------
   public string Name { get; set; }
   public Stream Picture {
      get { return _picture.Value; }
      set { _picture = new Lazy<Stream>(() => value, true); }
   }
   Lazy<Stream> _picture;
}
// ======================================================================
static class PictureFactory {
   // -------------------------------------------------------------------
   // get picture from disc
   public static Stream GetPicture(string fileName) {
      Console.WriteLine("## Lazy Initializing Image ##");
      if (string.IsNullOrEmpty(fileName))
         fileName = @"C:\Temp\not_available.jpg";
      return new FileStream(fileName, FileMode.Open, FileAccess.Read);
   }
}
Instead of a direct member field mapping of the "Picture" property, the "Product" class now works with a Lazy<T> wrapper which only initializes the resources when they are requested. This implementation uses the isThreadSafe parameter with value true since we will reuse it below in a multi-threaded sample.

The "fileName" parameter in constructor and factory class to initialize the picture stream is just an example. If no file name was provided, the factory returns the stream of a "not_available.jpg" as default picture (see Special Case pattern).

Notice the Console.WriteLine within the factory class. It shows when the picture data actually become initialized.

The following sample creates some products and simulates rendering of the product pictures to the console.
// ======================================================================
class Program {
   // -------------------------------------------------------------------
   static void Main(string[] args) {
      Console.WriteLine("Creating products");
      List<IProduct> products = new List<IProduct>{
         new Product2("Lord of the Rings", null),
         new Product2("Avatar", @"C:\Temp\avatar.jpg"),
      };

      Console.WriteLine("Loop through products and print picture");
      products.ForEach(p => ConsolePictureRenderer.PrintPicture(p));
      products.ForEach(p => ConsolePictureRenderer.PrintPicture(p));
      Console.ReadKey();
   }
}
// ======================================================================
static class ConsolePictureRenderer {
   // -------------------------------------------------------------------
   public static void PrintPicture(IProduct product) {
      Console.WriteLine();
      // accessing the Picture property triggers the lazy initialization
      Stream picture = product.Picture;
      // simulate picture rendering by a simple console output
      Console.WriteLine("Picture of {0} ({1} bytes)", product.Name, picture.Length);
   }
}

As shown in the console output, the picture data become initialized when the Picture property is accessed for the first time. In addition, the picture factory is called only once for each product instance.



Multi-Threaded

As written above, the Lazy<T> class provides a thread-safe implementation of lazy initialization. That means the picture factory class does not need to implement any custom thread locking; it is handled automatically by the Lazy<T> class.
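As a side note, the boolean isThreadSafe parameter is a shortcut for the LazyThreadSafetyMode enumeration (System.Threading namespace): true corresponds to LazyThreadSafetyMode.ExecutionAndPublication and false to LazyThreadSafetyMode.None. The constructor call in "Product2" could therefore also be written like this:
_picture = new Lazy<Stream>(
   () => PictureFactory.GetPicture(fileName),
   LazyThreadSafetyMode.ExecutionAndPublication);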

As a last snippet, we reuse the "Product", "PictureFactory" and "ConsolePictureRenderer" classes from the previous sample and change the "Main" method to access the product pictures in a multi-threaded environment.
// ======================================================================
class Program {
   // -------------------------------------------------------------------
   static void Main(string[] args) {
      Console.WriteLine("Creating products");
      List<IProduct> products = new List<IProduct>{
         new Product2("Lord of the Rings", null),
         new Product2("Avatar", @"C:\Temp\test.jpg"),
      };

      Console.WriteLine("Loop through products and print picture");
      for (int i = 0; i < 4; i++) {
         ThreadPool.QueueUserWorkItem(foo =>
            products.ForEach(p =>
               ConsolePictureRenderer.PrintPicture(p)));
      }
      Console.ReadKey();
   }
}

Now the products and their pictures are accessed by four parallel threads. A second look at the console output shows that all pictures are still loaded only once, since the "Product" class initializes the Lazy<T> wrapper with the isThreadSafe parameter set to true.



Conclusion

The .NET 4.0 Lazy<T> class is a nice helper when working with large and/or unmanaged resources. Especially its thread-safe implementation increases its usability in today's multi-processor environments.

Monday, January 18, 2010

Concatenation of Text and Binary Data in T-SQL

Some months ago, I blogged about "String Split Functions" in SQL Server; today I'll write about the inverse of this topic: the concatenation of text and binary data from several rows, using T-SQL (and the available extensions). As we'll see, just like with string splitting, there are several ways to skin the cat.

Topics covered in this blog entry:
  • Classical, recursive @variable assignment within a SELECT statement
  • FOR XML
  • A WHILE loop utilizing the BLOB data types' WRITE() method
  • SQLCLR
Topics not covered in this blog entry:
  • Recursive @variable assignment within a CURSOR/WHILE-loop. This produces the same memory management as the SELECT based version but in a non-set-based manner.
  • SQLCLR as an aggregate function. The reason will be described later.

Test Data
Since we'll talk about performance in the topics below, here are two simple tables containing two different kinds of data to be concatenated.

The table below contains 200 rows with 180,000 characters and bytes per row. This means 36,000,000 characters/bytes will be concatenated.
CREATE TABLE LargeData (
   Id INT NOT NULL IDENTITY(1,1)
      PRIMARY KEY CLUSTERED
   ,Txt VARCHAR(MAX)
   ,Bin VARBINARY(MAX)
);

-- fill the table with 200 rows 180,000 characters per row
WITH Numbers AS (
   SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) Num
   FROM master.sys.all_columns
)
INSERT INTO LargeData
   SELECT TOP(200)
      -- text data
      REPLICATE(CONVERT(VARCHAR(MAX), NEWID()), 5000)
      -- binary data
      ,CONVERT(VARBINARY(MAX), REPLICATE(CONVERT(VARCHAR(MAX), NEWID()), 5000))
   FROM Numbers;

As a second kind of source data we'll work with a table containing 50,000 rows but only 16 characters and bytes per row. This means 800,000 characters/bytes to be concatenated.
CREATE TABLE ManyData (
   Id INT NOT NULL IDENTITY(1,1)
      PRIMARY KEY CLUSTERED
   ,Txt VARCHAR(16)
   ,Bin BINARY(16)
);

-- fill the table with 50,000 rows, 16 characters per row
WITH Numbers AS (
   SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) Num
   FROM master.sys.all_columns c1, master.sys.all_columns c2
)
INSERT INTO ManyData
   SELECT TOP(50000)
      -- text data
      LEFT(CONVERT(VARCHAR(36), NEWID()), 16)
      -- binary data
      ,CONVERT(VARBINARY(MAX), NEWID())
   FROM Numbers;

For the sake of brevity, I'll provide only one sample statement in the following topics. You can find the complete SQL script as an attachment at the end of this blog entry.

Classical @variable Assignment
The classical approach to concatenate data from rows into one column (or a variable, in our case) is a SELECT statement with a recursive variable assignment.
DECLARE 
   @txt VARCHAR(MAX) = ''
   ,@bin VARBINARY(MAX) = 0x;

SELECT 
      @txt += TextColumn
      ,@bin += BinaryColumn
   FROM AnyTable;
(The “+=”-Operator is a new feature of SQL Server 2008. If you work on SQL Server 2005, just replace it with a variable reassignment like “@txt = @txt + TextColumn”. The “+=”-Operator does exactly the same; it’s just a simplified way to write T-SQL.)

This is generally a good solution for three reasons: it's easy to use, it's set-based, and it's fast for small amounts of data. On the other hand, this kind of concatenation runs into problems if you have to handle many rows or large data, because all previously concatenated data are copied again and again.
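The effect is comparable to building a large string in a .NET loop by repeated concatenation instead of using a StringBuilder (the same idea the SQLCLR solution below is based on). A rough sketch, assuming rows is any IEnumerable<string>:
// repeated reassignment: every += copies all previously concatenated data again
string all = "";
foreach (string row in rows)
   all += row;                        // same pattern as @txt += TextColumn

// buffered approach: every row is written only once into a growing buffer
StringBuilder buffer = new StringBuilder();   // System.Text
foreach (string row in rows)
   buffer.Append(row);
string all2 = buffer.ToString();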

A look into Profiler shows huge resource usage, especially when many rows have to be concatenated. A concatenation of only 800,000 bytes or characters caused over 180,000,000 read and 1,400,000 write operations.



FOR XML
This approach tricks another built-in feature of SQL Server 2005 (and later versions). FOR XML is generally meant to create XML output from a SELECT statement. However, it can also be used to concatenate data without any XML markup.

I learned this trick on SQLServerCentral.com. One article that shows the usage is Jeff Moden's article about Performance Tuning: Concatenation Functions and Some Tuning Myths. (Special thanks to Bluedog67 for this link :-) .)

Another good T-SQL resource on the web is Mladen Prajdić's page, where this trick is shown in "SQL Server 2005: Concat values XML Style".

DECLARE @txt VARCHAR(MAX);
SELECT @txt = (SELECT Txt FROM LargeData FOR XML PATH(''));
The text provided within PATH('') is normally meant to specify the name of a parent XML element for the output data. If you provide an empty string, you get the concatenated text of all data returned by the query.

This is a very powerful solution for concatenating text data. Unfortunately, it has to be used carefully if the source data are unknown. Since this method works with XML, functional XML characters like ">" or "&" become escaped by this method. Binary data cannot be concatenated by FOR XML PATH().

A look into profiler shows a great resource usage and duration.



WRITE Method
SQL Server 2005 introduced the new BLOB data types VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX). Along with the previous BLOB data types (TEXT, NTEXT and IMAGE), the old BLOB functions like TEXTPTR or UPDATETEXT were marked as deprecated. As a replacement for those functions, SQL Server 2005 introduced a new method called WRITE(), which is a data type method - not a usual T-SQL function. This means it does not take a BLOB as a parameter but is called on a BLOB data type, like methods in an object-oriented programming language.

DECLARE 
   @all VARCHAR(MAX) = ''
   ,@buffer VARCHAR(MAX)
   ,@position INT = 0
   ,@len INT
   ,@i INT = 0
   ,@max INT = (SELECT MAX(Id) FROM LargeData);

WHILE (@i < @max)
BEGIN
   -- get the next row value
   SELECT TOP(1)
         @i = Id
         ,@buffer = Txt
         ,@len = LEN(Txt)
      FROM LargeData
      WHERE Id > @i
      ORDER BY Id;

   -- add the text to the existing text
   SET @all.WRITE(@buffer, @position, @len);
   SET @position += @len;
END;
Unfortunately, there seems to be no way (that I know of) to use the WRITE method in a set-based way. It is possible to use it within an UPDATE statement, but there it is called only once and not for each row of the source data.

A look into Profiler shows that the WRITE method is much faster than the classical approach of variable reassignment. It's two times faster with ten times less resource usage for large data, and it's 60% faster with about four times less resource usage when many rows have to be concatenated.



However, is there a way to make the WRITE method even faster? As I already wrote above, it can also be used in an UPDATE statement, so this should also be tried with a temporary table.
-- Large Binary
DECLARE 
   @all VARBINARY(MAX) = 0x
   ,@i INT = 0
   ,@max INT = (SELECT MAX(Id) FROM LargeData);

DECLARE @buffer TABLE (Bin VARBINARY(MAX));

INSERT INTO @buffer
   SELECT 0x;

WHILE (@i < @max)
BEGIN
   -- get the next row value
   WITH cte AS (
      SELECT TOP(1)
         Id
         ,Bin
      FROM LargeData
      WHERE Id > @i
      ORDER BY Id
   )
   UPDATE b SET
         -- get the next id
         @i = cte.Id
         -- write into temp table
         ,Bin.WRITE(cte.Bin, DATALENGTH(b.Bin), DATALENGTH(cte.Bin))
      FROM @buffer b CROSS JOIN cte
END;
SELECT @all = Bin FROM @buffer;

Sure, the SQL statement is a bit more complicated since we have to use a common table expression to get the (ORDERED) next row of source data, but what about performance?

A look into Profiler shows that execution for many small rows didn't really change. On the other hand, concatenation of large data became about 80% faster with only 30% of the resource usage. So this more complicated way to use the WRITE method might be a good investment if large data have to be handled.



SQLCLR
Well, if you never consider enabling CLR in SQL Server, this blog entry ends here :-).

Generally, SQLCLR provides two different ways to perform this concatenation.

The first way would be a user-defined aggregate function which gets the text or binary data to be concatenated. This solution would provide a very handy usage like this:
SELECT CONCATENATE_TEXT(Txt) FROM ManyData;
Unfortunately, this solution would require the resulting assembly to be set to unsafe mode and the server to be configured as TRUSTWORTHY. Why? All SQLCLR functions (procedures, functions, ...) are static methods on a wrapper .NET class, and aggregate functions are called again and again for each row to be processed. Within this function, the CLR only gets the current value to be handled, which means the CLR would have to remember the previously concatenated data somewhere in a static (shared) variable. Due to concurrent calls, this would require custom thread locking to store and retrieve the information. I find SQLCLR provides some powerful extensions, but an unsafe assembly tastes bad. For this reason, I'll not provide this solution for now. (If anybody is interested in it, please leave a short note and I'll provide it in another blog entry.)

The second way to handle the concatenation is a simple user-defined scalar function which gets the SQL query to be executed. This requires a dynamic SQL approach to use the function; though, it is inherently thread-safe because it doesn't require any class member variables.

Here's a C# method to concatenate text data.
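// required namespaces (for both methods below): System.Data.SqlClient,
// System.Data.SqlTypes, System.IO, System.Text and Microsoft.SqlServer.Server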
[Microsoft.SqlServer.Server.SqlFunction(
   DataAccess=DataAccessKind.Read,
   IsDeterministic=false,
   IsPrecise=true,
   SystemDataAccess=SystemDataAccessKind.None
   )]
public static SqlChars ConcatText (
   // the sql statement to be executed
   [SqlFacet(IsFixedLength=false ,IsNullable=false ,MaxSize=-1)]
   SqlString sqlIn
   ) {

   // create a System.Text.StringBuilder to push all text data into.
   StringBuilder buffer = new StringBuilder();

   // connect to database and execute the provided SQL statement
   using (SqlConnection cn = new SqlConnection("Context Connection=true")) {
      cn.Open();
      using (SqlCommand cmd = new SqlCommand(sqlIn.Value, cn))
      using (SqlDataReader reader = cmd.ExecuteReader()) {
         while (reader.Read()) {
            // get the next string value from reader and append it to buffer
            string str = (string)reader[0];
            buffer.Append(str);
         }
      }
   }

   // return the concatenated string
   return new SqlChars(buffer.ToString());
}

Here's a C# method to concatenate binary data.
[Microsoft.SqlServer.Server.SqlFunction(
      DataAccess=DataAccessKind.Read,
      IsDeterministic=true,
      IsPrecise=true,
      SystemDataAccess=SystemDataAccessKind.None
      )]
   public static SqlBytes ConcatBinary(
      // the sql statement to be executed
      [SqlFacet(IsFixedLength = false, IsNullable = false, MaxSize = -1)]
      SqlString sqlIn
      ) {

      // create a System.IO.MemoryStream to push all text data into
      MemoryStream stream = new MemoryStream();

      // connect to database and execute the provided SQL statement
      using (SqlConnection cn = new SqlConnection("Context Connection=true")) {
         cn.Open();
         using (SqlCommand cmd = new SqlCommand(sqlIn.Value, cn))
         using (SqlDataReader reader = cmd.ExecuteReader()) {
            while (reader.Read()) {
               // get the next binary value from reader and append it to buffer
               byte[] data = (byte[])reader[0];
               stream.Write(data, 0, data.Length);
            }
         }
      }
      // important! reset the stream position to the beginning
      stream.Seek(0, SeekOrigin.Begin);

      return new SqlBytes(stream);
   }

The call from T-SQL looks like any other scalar function call.
DECLARE @bin VARBINARY(MAX);
SELECT @bin = dbo.ConcatBinary('SELECT Bin FROM LargeData');

Due to the fact that SQLCLR (.NET in general) does not know ANSI text like VARCHAR, all non-Unicode data have to be translated into Unicode and back into ANSI. Therefore I extended these tests with two additional executions into NVARCHAR variables instead of VARCHAR.

A look into Profiler shows that only the XML approach is slightly faster than SQLCLR, which comes without any restrictions for functional characters and fully supports binary concatenation. In the case of large text data concatenated into an NVARCHAR variable, SQLCLR is even slightly faster than XML.



Conclusion
Classical concatenation by recursive variable assignment is a nice way to handle small amounts of data. In the case of larger data or many rows, it is not the best solution.

The new WRITE method is a nice extension of SQL Server, especially when working with larger data.

FOR XML PATH('') is a very powerful trick for text data concatenation. Use it (really!), as long as you keep in mind that it is a kind of hack and should be used carefully.

We all (including me!) know that there are only a few really useful applications for SQLCLR. But concatenation of data, just like string splitting, is one of them and might be considered if this is a common task in a SQL Server 2005/2008 environment.

Attachments

Post Scriptum
Thanks again to Bluedog67, who showed me another blog about string concatenation by Anith Sen at:
Concatenating Row Values in Transact-SQL
It also covers the CLR aggregate function.

... didn't know this blog entry before...

Thursday, January 14, 2010

Bug in SQL Profiler when working with Table-Valued Parameters and DATETIME

Just provided a new bug note at Microsoft® SQL Server Connect:
Profiler shows DATETIME for TVP as DATETIME2

If you work with table-valued types and table-valued parameters provided by a front-end application, you get invalid SQL Profiler output for DATETIME columns. Profiler shows DATETIME column inserts as DATETIME2 values (like '2010-01-14 18:38:09.2100000'). The problem with this output is the additional fractional-second digits, which cannot be converted to DATETIME.

For anybody who is interested, here is a short description of how to reproduce it.

Create a new table-valued type in SQL Server.
CREATE TYPE FooType AS TABLE (
   Dt DATETIME
);

And a little C# application that uses this table-valued type by executing an SqlCommand.
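// required namespaces: System, System.Collections, System.Collections.Generic,
// System.Data, System.Data.SqlClient and Microsoft.SqlServer.Server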
class TestEnumerator : IEnumerable<SqlDataRecord> {
   public IEnumerator<SqlDataRecord> GetEnumerator() {
      // create meta data for the table-valued parameter
      SqlMetaData[] meta = new SqlMetaData[] {
         new SqlMetaData("Dt", SqlDbType.DateTime)
      };

      // create a single record to be returned as enumerator
      SqlDataRecord record = new SqlDataRecord(meta);
      record.SetDateTime(0, DateTime.Now);
      // return the record as enumerator value
      yield return record;
   }

   IEnumerator IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
class Program {
   static void Main(string[] args) {
      string cnStr = Properties.Settings.Default.SandboxConnectionString;

      // connect to the database and create a new sql command that simply 
      // returns the count of rows within a table-valued parameter
      using (SqlConnection cn = new SqlConnection(cnStr))
      using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM @t", cn)) {
         cn.Open();

         // create and configure the table-valued parameter
         SqlParameter p = new SqlParameter("@t", SqlDbType.Structured);
         // specify the table-valued type
         p.TypeName = "FooType";
         p.Value = new TestEnumerator();
         cmd.Parameters.Add(p);

         // execute the command
         object result = cmd.ExecuteScalar();
      }
   }
}

Now start SQL Profiler and enable the "RPC:Completed" event. Start the C# application and you will get the following trace output:
declare @p3 dbo.FooType
insert into @p3 values('2010-01-14 18:38:09.2100000')

exec sp_executesql N'SELECT COUNT(*) FROM @t',N'@t FooType READONLY',@t=@p3

The following error occurs if you try to run this SQL in SSMS for debugging:
Msg 241, Level 16, State 1, Line 7
Conversion failed when converting date and/or time from character string.

Monday, January 11, 2010

How to Enable First Chance Exceptions in Visual Studio

This blog entry explains how to enable first chance exceptions in Visual Studio. I have explained this to several people, so it might be a basic configuration, but there are many people who apparently do not know it yet.

First chance exceptions mean that Visual Studio immediately stops when any kind of exception (e.g. NullReferenceException or InvalidOperationException) occurs. First chance exceptions are often disabled by the default Visual Studio installation. As a result, exceptions that are caught by a catch-block somewhere in the call stack will not stop the debugger at the point where the exception is created. The debugger only stops if an exception cannot be handled by any catch-block. So a developer often does not see where the exception was actually created and thrown.
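A small example of my own (the method is invented for illustration) of what this means: with the default settings the debugger will not break inside the following method because the exception is handled by the catch-block; with the "Thrown" option enabled for CLR exceptions (see below), it stops directly at the throwing line.
static int ParseConfigValue(string text) {
   try {
      return int.Parse(text);
   }
   catch (FormatException) {
      // the exception is swallowed here - without first chance exceptions
      // the debugger never breaks, although int.Parse threw a FormatException
      return 0;
   }
}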

Depending on your installation of Visual Studio, you might first have to add an additional command to the menu of your IDE. To do this, open the "Tools" menu and click "Customize...".


The "Customize" dialog opens where you can select additional items to be added to your menu or toolbars. Activate the "Commands" tab and select "Debug" in the left list with title "Categories". Select "Exceptions..." item from the right list with title "Commands" and drag it into your "Debug" menu on top of Visual Studio.


Now you can close the Customize dialog by hitting the "Close" button.

Now select the newly added menu item "Exceptions..." in the "Debug" menu. The "Exceptions" dialog appears, where you can configure the exception behavior of Visual Studio debug sessions. In the shown list, select "Common Language Runtime Exceptions" and activate the "Thrown" check-box on the right side.

As you may have noticed, you can also expand each exception category to select any specific kind of exception you want to enable or disable for first chance exception handling. This is useful for some special debugging sessions, but most of the time it's best to force Visual Studio to stop at any kind of thrown exception.

To close the dialog use the "Ok" button.

Now you can start debugging your project and Visual Studio stops as soon as any kind of exception occurs.


Happy coding :-)

Saturday, January 9, 2010

O/R-Mapper - Enemy Mine

These days, I hear many discussions pro and contra O/R-Mappers (ORMs). Jack Corbett, a professional SQL guy I really respect, speaks at SQLSaturday #32 about "Why Should I Use Stored Procedures?" in relation to ORM tools. I just had a nice discussion of my own with several guys at:
Sind O/R-Mapper potentiell ein schleichender Tod? (a German C# developer forum)
An attempt at an English translation:
Are O/R-Mappers a potential creeping death?

The quintessence of this discussion was: most experienced developers generally like the idea of ORMs, but currently they don't trust them in larger projects.

In this blog entry I'll try to address my main problems with O/R-Mappers.

The Holy Cow for Consumers

I often saw developers who forgot the "R" (relational) within the O/R-Mapper. It's a mapping tool to bring relational data into a domain object model. However, there is still a relational database system behind the curtain. It's very, very important to never forget this.

What does this mean?

The first thing is the query support. ORMs usually provide an object-oriented way to handle database queries. This is a nice feature (if implemented correctly by the ORM), but it's impossible for any tool to cover the whole power of RDBMS-specific SQL. Those queries are usually fine for simple cases like "give me a customer by her code" or "give me the unfinished orders of a specific customer" - that means queries on direct fields of the base table of a domain object and queries over one or maybe two related tables. As soon as you hit this boundary, you should consider implementing a custom SQL query (or procedure). Especially things like "LEFT JOIN", "OR" or the treacherous "IN" should always be done with custom SQL, since custom SQL directly exposes potential problems - for instance in SQL Server execution plans.

The second, most often misused feature of O/R-Mappers is lazy loading. There are some very rare(!) business cases where lazy loading is a good thing and should be used. Whenever I use lazy loading, I have to keep in mind that every access to a not-yet-loaded relation causes its own SQL query to be fired against the database server. Using lazy loading in the style of "always just load the root object and let the mapper load every other piece of information when I need it" quickly ends up in hundreds, thousands or more single queries stressing the server - Martin Fowler calls this "ripple loading".
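A schematic, self-contained simulation of such ripple loading (no concrete ORM API involved; the "queries" are just console output, and the relation reuses the Lazy<T> class from my previous post):
// uses System and System.Collections.Generic
// every access to the lazily loaded relation stands for one additional round trip
class Order {
   public Lazy<string> Customer = new Lazy<string>(() => {
      Console.WriteLine("SELECT ... FROM Customers WHERE CustomerId = ...");
      return "Some Customer";
   });
}

class RippleLoadingDemo {
   static void Main() {
      // 1 query to load the root objects
      Console.WriteLine("SELECT ... FROM Orders WHERE ...");
      List<Order> orders = new List<Order> { new Order(), new Order(), new Order() };

      // 1 additional query per order => 1 + N round trips instead of one joined query
      foreach (Order order in orders)
         Console.WriteLine("Customer: " + order.Customer.Value);
   }
}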

It's important to always keep in mind that an O/R-Mapper is a tool and nothing more, really. It's a tool to automate some boring work like the simple 1:1 mapping of relational data into a domain object and to automate some basic queries. Due to the technology that is covered by the tool - relational databases - every ORM is always a weak tool. That doesn't mean that the developers of ORMs are doing a bad job! The weakness stems from the fact that an ORM can only ever cover a very small and simple part of the SQL possibilities of today's RDBMSs. In particular, they cannot handle professional query tuning like analyzing execution plans, which may suggest a completely different query for the same result on different databases - sometimes even for the same RDBMS.

The Holy Cow Mislead by the Publishers

Sadly, some of the problems addressed above are often caused by the way those tools are promoted (or implemented). Every common O/R-Mapper is promoted as "the all-in-one". However, I think an ORM will never be able to handle every special case - on either the database or the domain model side.

While I pointed out the "R" as the most common potential problem when consuming an ORM, the "M" (mapper) is my main issue with today's ORMs. Sure, the mapping is the main part of an O/R-Mapper: it maps relational data into a domain object model and vice versa. Unfortunately, all mappers I know encapsulate this part of the tool as a black box. My problem with this encapsulation is a built-in tight coupling between database and domain model which is restricted to the mapping strength of the used tool. I think the only restriction on modelling should be the experience of the developers, DBAs and architects.

What does this mean?

In the domain model, this tight coupling sometimes prevents a more normalized form of objects than is available in the database. I don't speak about a denormalized database, but a domain object model often uses more than one object for one table. While an "Orders" table might contain customer information like "CustomerOrderRef" or "RequestedDeliveryDate", the "Order" domain object can contain a separate "CustomerInfo" reference which holds all the customer-specific information. As long as those related objects are mutable (which means they provide setter methods or properties), abstracting ORMs like NHibernate or Entity Framework are able to handle the mapping. However, if there is a mapping to an immutable value object, they hit the wall. A good example (again by Martin Fowler) for mapping columns to an immutable object is a Money object (not to be confused with the SQL Server data type MONEY). Money usually carries two pieces of information: an Amount and a Currency. A table usually contains an "Amount" column and a "CurrencyId" column, but this non-normalized information does not seem to be the best design in a domain model, especially if I want to be able to calculate with the money values. I'd be glad if I could extend the existing mapping with some custom work without having to expose the database columns in my domain objects.
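For illustration, such an immutable Money value object could look roughly like this (a sketch of my own, not taken from any specific mapper):
// both columns ("Amount", "CurrencyId") would map into this single immutable object
sealed class Money {
   public Money(decimal amount, string currency) {
      Amount = amount;
      Currency = currency;
   }

   public decimal Amount { get; private set; }
   public string Currency { get; private set; }

   // calculations return a new instance instead of mutating the existing one
   public Money Add(Money other) {
      if (Currency != other.Currency)
         throw new InvalidOperationException("Cannot add different currencies");
      return new Money(Amount + other.Amount, Currency);
   }
}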

On the other hand, the database can be more normalized than the domain model, which is the more commonly known situation. Again, ORMs like NHibernate and Entity Framework support simple denormalization of 1:1 table relations into a domain model - with some restrictions. A good example where ORMs cannot handle the mapping is EAV/CR tables. Generally, EAV/CR is not a common database design today, but there are good reasons to use it for some special cases. Think about software for a sports club which holds descriptive information for each member, storing all playing skills of all members for any kind of sport; this seems to be almost impossible in a usual relational database design. At this point I'd like to be able to map this very special form of data into my domain object model without leaving the whole universe of the rest of my model by defining my "Member" as a POCO (or POJO in the case of Java) which is not part of my mapper.

Last but not least, many ORMs access way too much information, which causes huge, unneeded I/O overhead. An order object and table might contain several pieces of information which are needed by very different parts of a system, like customer, shipping and billing information. ORMs either always load and write all the information or work with property/column-based lazy loading - which usually becomes the worst kind of ripple loading. There is some core information, like the table identity and an "OrderNumber", which is needed by almost every part of the system, but I don't need all the other information whenever I load an order from the database.

My Wishes for the Future

From the consumer side, I hope developers come back to the mindset that an O/R-Mapper is a (weak) tool and nothing more. Whenever they create a new ORM query object, they should reconsider whether the query that will be generated fits the possibilities of the mapper.

From the publisher side, I'm not looking for a mapper that handles the mapping of value objects or EAV/CR tables. These have just been examples of the boundaries of mappers - maybe there is even a mapper which supports one of those features. I'm looking for an O/R-Mapper which knows its limits and allows a tailor-made mapping without being forced to leave the whole universe of the mapper for change tracking, transaction handling and other useful features.

A current Spare Time Project

I've been thinking about O/R-Mappers, their strengths and their weaknesses for several months. I tried several mappers, and after some short moments of euphoria I hit the ground again due to restrictions which could not be handled without some ugly workarounds. About two months ago I started to design (and implement) my own version of an ORM. After more than 150 classes and the first tests it seems to fit my requirements, but who knows... Maybe it dies during the implementation phase. Maybe it works and I'll use it in one of my future projects. Maybe I'll publish it. If I publish it, maybe it becomes yet another stillborn O/R-Mapper on the internet; maybe there will be some guys and/or gals who try it and hate it, and maybe there will be some who try it and enjoy it.

Friday, January 8, 2010

Generic C# RingBuffer

Today I'll share an almost coding-only blog entry.

A RingBuffer (aka Circular Buffer) is a kind of collection with a fixed maximal count of items to be stored within it. After it reaches its capacity, the insertion of a new item causes the oldest item to be overwritten.

This kind of object is especially useful when working with large amounts of data where it is required to look back at some previous items (e.g. for validation). A RingBuffer provides this functionality without holding all the data ever stored within it.

This can be imagined like the window of a moving train: we see the next few trees and the previous few, but we don't see all the trees passed since the journey started.

Before I wrote this blog entry, I searched the .NET Framework and some common web sources. I have to say I was surprised that I did not find another (working) .NET implementation of a RingBuffer.

Here is my version of a generic RingBuffer. Source file and unit tests are attached at the bottom of this post.

/*
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
using System.Diagnostics;
*/


/// <summary>
/// Represents a fixed-length ring buffer to store a specified maximal count of items within.
/// </summary>
/// <typeparam name="T">The generic type of the items stored within the ring buffer.</typeparam>
[DebuggerDisplay("Count = {Count}")]
public class RingBuffer<T> : IList<T>, ICollection<T>, 
                             IEnumerable<T>, IEnumerable {
   /// <summary>
   /// Creates a new instance of a <see cref="RingBuffer&lt;T&gt;"/> with a 
   /// specified cache size.
   /// </summary>
   /// <param name="capacity">The maximal count of items to be stored within 
   /// the ring buffer.</param>
   public RingBuffer(int capacity) {
      // validate capacity
      if (capacity <= 0)
         throw new ArgumentException("Must be greater than zero", "capacity");
      // set capacity and init the cache
      Capacity = capacity;
      _buffer = new T[capacity];
   }

   /// <summary>
   /// the internal buffer
   /// </summary>
   T[] _buffer;
   /// <summary>
   /// The all-over position within the ring buffer. The position 
   /// increases continuously by adding new items to the buffer. This 
   /// value is needed to calculate the current relative position within the 
   /// buffer.
   /// </summary>
   int _position;
   /// <summary>
   /// The current version of the buffer, this is required for a correct 
   /// exception handling while enumerating over the items of the buffer.
   /// </summary>
   long _version;

   /// <summary>
   /// Gets or sets an item for a specified position within the ring buffer.
   /// </summary>
   /// <param name="index">The position to get or set an item.</param>
   /// <returns>The found item at the specified position within the ring buffer.
   /// </returns>
   /// <exception cref="IndexOutOfRangeException"></exception>
   public T this[int index] {
      get {
         // validate the index
         if (index < 0 || index >= Count)
            throw new IndexOutOfRangeException();
         // calculate the relative position within the rolling base array
         int index2 = (_position - Count + index) % Capacity;
         return _buffer[index2]; 
      }
      set { Insert(index, value); }
   }

   /// <summary>
   /// Gets the maximal count of items within the ring buffer.
   /// </summary>
   public int Capacity { get; private set; }
   /// <summary>
   /// Get the current count of items within the ring buffer.
   /// </summary>
   public int Count { get; private set; }
   
   /// <summary>
   /// Adds a new item to the buffer.
   /// </summary>
   /// <param name="item">The item to be added to the buffer.</param>
   public void Add(T item)
   {
       // avoid an arithmetic overflow
       if (_position == int.MaxValue)
           _position = _position%Capacity;
       // add a new item to the current relative position within the
       // buffer and increase the position
       _buffer[_position++ % Capacity] = item;
       // increase the count if capacity is not yet reached
       if (Count < Capacity) Count++;
       // buffer changed; next version
       _version++;
   }

   /// <summary>
   /// Clears the whole buffer and releases all referenced objects 
   /// currently stored within the buffer.
   /// </summary>
   public void Clear() {
      for (int i = 0; i < Count; i++)
         _buffer[i] = default(T);
      _position = 0;
      Count = 0;
      _version++;
   }

   /// <summary>
   /// Determines if a specified item is currently present within
   /// the buffer.
   /// </summary>
   /// <param name="item">The item to search for within the current
   /// buffer.</param>
   /// <returns>True if the specified item is currently present within 
   /// the buffer; otherwise false.</returns>
   public bool Contains(T item) {
      int index = IndexOf(item);
      return index != -1;
   }

   /// <summary>
   /// Copies the current items within the buffer to a specified array.
   /// </summary>
   /// <param name="array">The target array to copy the items of 
   /// the buffer to.</param>
   /// <param name="arrayIndex">The start position witihn the target
   /// array to start copying.</param>
   public void CopyTo(T[] array, int arrayIndex) {
      for (int i = 0; i < Count; i++) {
         array[i + arrayIndex] = _buffer[(_position - Count + i) % Capacity];
      }
   }

   /// <summary>
   /// Gets an enumerator over the current items within the buffer.
   /// </summary>
   /// <returns>An enumerator over the current items within the buffer.
   /// </returns>
   public IEnumerator<T> GetEnumerator() {
      long version = _version;
      for (int i = 0; i < Count; i++) {
         if (version != _version)
            throw new InvalidOperationException("Collection changed");
         yield return this[i];
      }
   }

   /// <summary>
   /// Gets the position of a specified item within the ring buffer.
   /// </summary>
   /// <param name="item">The item to get the current position for.</param>
   /// <returns>The zero based index of the found item within the 
   /// buffer. If the item was not present within the buffer, this
   /// method returns -1.</returns>
   public int IndexOf(T item) {
      // loop over the current count of items
      for (int i = 0; i < Count; i++) {
         // get the item at the relative position within the internal array
         T item2 = _buffer[(_position - Count + i) % Capacity];
         // if both items are null, return the position
         if (null == item && null == item2)
            return i;
         // if equal return the position
         if (item != null && item.Equals(item2))
            return i;
      }
      // nothing found
      return -1;
   }

   /// <summary>
   /// Inserts an item at a specified position into the buffer.
   /// </summary>
   /// <param name="index">The position within the buffer to add 
   /// the new item.</param>
   /// <param name="item">The new item to be added to the buffer.</param>
   /// <exception cref="IndexOutOfRangeException"></exception>
   /// <remarks>
   /// If the specified index is equal to the current count of items
   /// within the buffer, the specified item will be added.
   /// 
   /// <b>Warning</b>
   /// Frequent usage of this method might become a bad idea if you are 
   /// working with a large buffer capacity. The insertion of an item
   /// at a specified position within the buffer causes all present 
   /// items below the specified position to be moved one position.
   /// </remarks>
   public void Insert(int index, T item) {
      // validate index
      if (index < 0 || index > Count)
         throw new IndexOutOfRangeException();
      // add if index equals to count
      if (index == Count) {
         Add(item);
         return;
      }

      // get the maximal count of items to be moved
      int count = Math.Min(Count, Capacity - 1) - index;
      // get the relative position of the new item within the buffer
      int index2 = (_position - Count + index) % Capacity;

      // move all items below the specified position
      for (int i = index2 + count; i > index2; i--) {
         int to = i % Capacity;
         int from = (i - 1) % Capacity;
         _buffer[to] = _buffer[from];
      }

      // set the new item
      _buffer[index2] = item;

      // adjust storage information
      if (Count < Capacity) {
         Count++;
         _position++;
      }
      // buffer changed; next version
      _version++;
   }

   /// <summary>
   /// Removes a specified item from the current buffer.
   /// </summary>
   /// <param name="item">The item to be removed.</param>
   /// <returns>True if the specified item was successfully removed
   /// from the buffer; otherwise false.</returns>
   /// <remarks>
   /// <b>Warning</b>
   /// Frequent usage of this method might become a bad idea if you are 
   /// working with a large buffer capacity. The removing of an item 
   /// requires a scan of the buffer to get the position of the specified
   /// item. If the item was found, the deletion requires a move of all 
   /// items stored above the found position.
   /// </remarks>
   public bool Remove(T item) {
      // find the position of the specified item
      int index = IndexOf(item);
      // item was not found; return false
      if (index == -1)
         return false;
      // remove the item at the specified position
      RemoveAt(index);
      return true;
   }

   /// <summary>
   /// Removes an item at a specified position within the buffer.
   /// </summary>
   /// <param name="index">The position of the item to be removed.</param>
   /// <exception cref="IndexOutOfRangeException"></exception>
   /// <remarks>
   /// <b>Warning</b>
   /// Frequent usage of this method might become a bad idea if you are 
   /// working with a large buffer capacity. The deletion requires a move 
   /// of all items stored above the found position.
   /// </remarks>
   public void RemoveAt(int index) {
      // validate the index
      if (index < 0 || index >= Count)
         throw new IndexOutOfRangeException();
      // move all items above the specified position one step
      // closer to zero
      for (int i = index; i < Count - 1; i++) {
         // get the next relative target position of the item
         int to = (_position - Count + i) % Capacity;
         // get the next relative source position of the item
         int from = (_position - Count + i + 1) % Capacity;
         // move the item
         _buffer[to] = _buffer[from];
      }
      // get the relative position of the last item, which becomes empty
      // after deletion and set the item as empty
      int last = (_position - 1) % Capacity;
      _buffer[last] = default(T);
      // adjust storage information
      _position--;
      Count--;
      // buffer changed; next version
      _version++;
   }

   /// <summary>
   /// Gets if the buffer is read-only. This method always returns false.
   /// </summary>
   bool ICollection<T>.IsReadOnly { get { return false; } }

   /// <summary>
   /// See generic implementation of <see cref="GetEnumerator"/>.
   /// </summary>
   /// <returns>See generic implementation of <see cref="GetEnumerator"/>.
   /// </returns>
   IEnumerator IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
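
A short usage sketch (not part of the attached unit tests) that shows the overwrite behavior once the capacity is reached:
// capacity of 3: the fourth Add overwrites the oldest item
RingBuffer<int> buffer = new RingBuffer<int>(3);
buffer.Add(1);
buffer.Add(2);
buffer.Add(3);
buffer.Add(4);

Console.WriteLine(buffer.Count);        // 3
Console.WriteLine(buffer[0]);           // 2 - the oldest remaining item
Console.WriteLine(buffer[2]);           // 4 - the most recently added item
Console.WriteLine(buffer.Contains(1));  // False - has already been overwritten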

You can download the source file and the unit tests here:
RingBuffer.cs
RingBufferTests.cs