Sunday, May 30, 2010

Streaming Serialization with XmlSerializer

.NET provides several different ways to serialize and de-serialize objects. One of those serialization techniques is the XmlSerializer class. Generally, this class provides a nice, straightforward approach. However, there is one problem with this (and most other) serialization classes: it does not support streaming. If you want to (de-)serialize a really large number of objects, you might run into physical memory problems.

One solution is to implement the IXmlSerializable interface, which exposes the ReadXml(XmlReader) and WriteXml(XmlWriter) operations. However, this ends in quite a bunch of complicated code.

To handle the memory problem when using the XmlSerializer class, here you can find two simple wrapper classes which add streaming functionality.

XmlStreamingSerializer

The XmlStreamingSerializer creates an internal instance of an XmlSerializer and an XmlWriter which handles the persistence. To avoid writing the "xsi" and "xsd" namespaces again and again for each object to be serialized, it simply passes an empty XmlSerializerNamespaces instance. I found that trick in one of the comments on Scott Hanselman's blog about XmlFragmentWriter.
public class XmlStreamingSerializer<T> {
   // ------------------------------------------------------------------
   static XmlStreamingSerializer() {
      _ns = new XmlSerializerNamespaces();
      _ns.Add("", "");
   }
   // ------------------------------------------------------------------
   private XmlStreamingSerializer() {
      _serializer = new XmlSerializer(typeof(T));
   }
   // ------------------------------------------------------------------
   public XmlStreamingSerializer(TextWriter w)
      : this(XmlWriter.Create(w)) {
   }
   // ------------------------------------------------------------------
   public XmlStreamingSerializer(XmlWriter writer) : this() {
      _writer = writer;
      _writer.WriteStartDocument();
      _writer.WriteStartElement("ArrayOf" + typeof(T).Name);
   }
   // ==================================================================
   static XmlSerializerNamespaces _ns;
   XmlSerializer _serializer;
   XmlWriter _writer;
   bool _finished;
   // ==================================================================
   public void Finish() {
      _writer.WriteEndDocument();
      _writer.Flush();
      _finished = true;
   }
   // ------------------------------------------------------------------
   public void Close() {
      if (!_finished)
         Finish();
      _writer.Close();
   }
   // ------------------------------------------------------------------
   public void Serialize(T item) {
      _serializer.Serialize(_writer, item, _ns);
   }
}

XmlStreamingDeserializer

Like the serializer, the XmlStreamingDeserializer class wraps an instance of a .NET XmlSerializer. It uses an XmlReader to provide the streaming functionality and utilizes the XmlReader.ReadSubtree method to feed the current serialized item into the XmlSerializer.
public class XmlStreamingDeserializer<T> {
   // ------------------------------------------------------------------
   private XmlStreamingDeserializer() {
      _serializer = new XmlSerializer(typeof(T));
   }
   public XmlStreamingDeserializer(TextReader reader)
      : this(XmlReader.Create(reader)) {
   }
   public XmlStreamingDeserializer(XmlReader reader) : this() {
      _reader = reader;
   }
   // ==================================================================
   XmlSerializer _serializer;
   XmlReader _reader;
   // ==================================================================
   public void Close() {
      _reader.Close();
   }
   // ------------------------------------------------------------------
   public T Deserialize() {
      while (_reader.Read()) {
         if (_reader.NodeType == XmlNodeType.Element
               && _reader.Depth == 1
               && _reader.Name == typeof(T).Name) {
            XmlReader reader = _reader.ReadSubtree();
            return (T)_serializer.Deserialize(reader);
         }
      }
      return default(T);
   }
}

Some Tests

Here you can find some performance test results.

I serialized a very simple object structure shown below.
public class Foo {
   [XmlAttribute]
   public int Id { get; set; }
   [XmlAttribute]
   public string Bar { get; set; }
   public List<Foo> SubFoos { get; set; }
}
I serialized 10,000 instances, each with 100 sub-items. Maybe you don't have to serialize that many objects, but the serialized object graphs are often far deeper than two levels.
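To give an idea of how the test drives the two wrapper classes, here is a rough sketch of the test loop; the file name and the way the Foo instances are filled are simplified for this example.
// serialize 10,000 Foo instances one by one into a file
using (StreamWriter file = new StreamWriter("foos.xml")) {
   XmlStreamingSerializer<Foo> serializer = new XmlStreamingSerializer<Foo>(file);
   for (int i = 0; i < 10000; i++) {
      Foo foo = new Foo { Id = i, Bar = "Foo " + i, SubFoos = new List<Foo>() };
      for (int j = 0; j < 100; j++)
         foo.SubFoos.Add(new Foo { Id = j, Bar = "Sub " + j });
      // each item goes directly to the stream instead of being collected in memory
      serializer.Serialize(foo);
   }
   serializer.Close();
}
// read the items back one at a time
using (StreamReader file = new StreamReader("foos.xml")) {
   XmlStreamingDeserializer<Foo> deserializer = new XmlStreamingDeserializer<Foo>(file);
   Foo foo;
   // Deserialize() returns default(T) (null for reference types) when the end is reached
   while ((foo = deserializer.Deserialize()) != null) {
      // work with the current item
   }
   deserializer.Close();
}
The results of this test are shown below.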
Action                                            Duration (ms)   RAM (MB)
Serialization with XmlSerializer                  2954            134
Serialization with XmlStreamingSerializer         2391            13
De-serialization with XmlSerializer               3662            150
De-serialization with XmlStreamingDeserializer    2953            13
Those test results are not representative for any purpose. Especially the apparently faster processing of the streaming classes seems a bit curious to me. Nevertheless, the more important factor in these test results is the far lower memory requirement (about 10% here).

Possible Extensions

Both classes shown above are quite simple, and there are several possible extensions to make them more powerful.
  • One class for both. If you prefer a single class for both serialization and de-serialization, feel free to merge the two into one. I preferred the two-class solution to keep each of them more pure.
  • Configurable document element name. So far, the serializer creates a hard-coded document element name called "ArrayOfMyType". This matches the XmlSerializer behavior when serializing an IEnumerable<T>. Feel free to make this name configurable; a possible constructor overload is sketched after this list.
  • XML namespaces. The above serialization classes do not support XML namespaces. If you need namespaces, just add something like a public XmlSerializerNamespaces property.
  • Different objects. XML files often contain more than one type of object. Support for different object types could easily be added with a dictionary of serializers.
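As an example for the configurable document element name, a minimal sketch of an additional constructor could look like this (the parameter name rootElementName is my own suggestion, not part of the classes above):
// additional constructor with a configurable document element name (sketch)
public XmlStreamingSerializer(XmlWriter writer, string rootElementName) : this() {
   _writer = writer;
   _writer.WriteStartDocument();
   // fall back to the default "ArrayOf..." name if no name is given
   _writer.WriteStartElement(
      string.IsNullOrEmpty(rootElementName) ? "ArrayOf" + typeof(T).Name : rootElementName);
}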
The intention of this blog was not to present a fully dressed streaming XML framework. I tried to give a proof of concept for how to handle large amounts of objects in combination with the .NET XmlSerializer class.

Attachments

Here you can find the XmlStreamingSerializer and XmlStreamingDeserializer classes as well as a simple console application I used for my tests.

Friday, May 14, 2010

Layers, Boundaries And How To Exchange Data

One of the most popular (architectural) patterns is the Three-Layer design. Unfortunately, this pattern is often confused with the Three-Tier pattern. I follow the definition of authors like Martin Fowler, Thomas Erl and Craig Larman, which says:
  • The Three-Tier pattern describes a software deployment across three physical tiers. A common setup for this pattern is a database server, a web server and a web client (browser).
  • The Three-Layer pattern describes how to split software concerns into three different layers. These layers are a data access layer, a business logic layer and a presentation layer. The layers might also be deployed into different tiers, though this is optional.

The picture below shows the three layers and their positions.


The intention of this blog is not to describe how to implement the Three-Layer pattern; there are already millions of web resources which do this. This blog focuses on the data exchange between the data access layer (DAL) and the business logic layer (BLL).

The Architectural Challenge

Most sources describing the Three-Layer pattern advise separating the DAL and the BLL into two different assemblies. The reason is to avoid mixing different concerns, which is a good thing. This brings up an interesting issue: how do you create BLL objects (like Customer or Order) from the DAL?

Why? Because each layer should only communicate with the layer directly below it. A layer should never communicate with a layer above it. So the BLL communicates with the DAL, but the DAL never (actively) communicates with the BLL. However, the business objects (aka entities, domain objects, ...) are defined within the BLL, so how do you create objects which are not known within the DAL?

The following topics describe different solutions for how to get data from the DAL into the BLL and back.

DataTables

DataTables are one option to transport data from the DAL to the BLL. The DAL gets the data from the data store, transforms it into a DataTable and returns the table to the calling BLL. The BLL receives the DataTable and transforms the data into business objects. For saving changes, the BLL transforms the business objects back into a DataTable and calls a distinct save method of the DAL.

The advantages of DataTable objects are that they support rich features like complex sorting, searching and data binding. They are serializable and can be used for data exchange over the network.

The disadvantages are that the contained data is not guaranteed to be type safe (especially when not using typed DataSets). Another issue of DataTables is their overhead, since DataTable objects are very rich objects. Last but not least, this approach requires a huge amount of quite simple mapping code (= unhappy developer) within the BLL to map a DataTable into business objects and back into a DataTable.

// ====================================================================
// DAL
public DataTable GetCustomers(object criteria) {
   using (IDataReader reader = CreateReader(criteria)) {
      // create results DataTable
      DataTable result = new DataTable();
      result.Columns.Add("Id", typeof(int));
      result.Columns.Add("Name", typeof(string));
      // fill the table
      while (reader.Read()) {
         DataRow row = result.NewRow();
         row["Id"] = reader["Id"];
         row["Name"] = reader["Name"];
         result.Rows.Add(row);
      }

      return result;
   }
}
// --------------------------------------------------------------------
public void SaveCustomers(DataTable data) { /* do save*/ }
// ====================================================================
// BLL
public void DoFoo() {
   // get the DataTable from DAL
   DataTable data = DAL.GetCustomers(null);
   // convert the DataTable into business objects
   IList<Customer> customers = ConvertDataTable(data);
   // --------------------------------------------
   // do business stuff
   // --------------------------------------------
   // convert the business objects into a DataTable to return to DAL
   data = ConvertCustomers(customers);
   DAL.SaveCustomers(data);
}
// --------------------------------------------------------------------
private IList<Customer> ConvertDataTable(DataTable data) {
   // convert DataTable into business objects
   List<Customer> customers = new List<Customer>();
   foreach (var row in data.AsEnumerable()) {
      Customer cust = new Customer();
      cust.Id = (int)row["Id"];
      cust.Name = (string)row["Name"];
      customers.Add(cust);
   }
   return customers;
}
// --------------------------------------------------------------------
private DataTable ConvertCustomers(IEnumerable<Customer> customers) {
   // convert business objects into DataTable
   DataTable data = new DataTable();
   data.Columns.Add("Id", typeof(int));
   data.Columns.Add("Name", typeof(string));
   foreach (var cust in customers) {
      DataRow row = data.NewRow();
      row["Id"] = cust.Id;
      row["Name"] = cust.Name;
      data.Rows.Add(row);
   }
   return data;
}

Reflection

Another way to skin the cat is to use the .NET (or Java) reflection features to automate the mapping. The DAL gets the business object types from metadata (like a config file) and handles all the business object creation and field mapping dynamically.

The advantage of using reflection is that there is very little code to be written, and it can be reused for many projects.

A disadvantage of reflection is poor runtime performance. Customizations, like differences between the data source and the business object structures, are very difficult to implement, and error investigations become awkward.

// ====================================================================
// DAL
// the type of the destination business object (usually, from config)
private Type EntityType { get { return typeof(Customer); } }
// --------------------------------------------------------------------
public IList GetCustomers(object criteria) {
   List<object> result = new List<object>();
   using (IDataReader reader = CreateReader(criteria)) {
      while (reader.Read()) {
         // get empty constructor
         object entity = EntityType.GetConstructor(Type.EmptyTypes).Invoke(new object[0]);
         // fill all DB data into object properties (with same name)
         for (int i = 0; i < reader.FieldCount; i++) {
            string name = reader.GetName(i);
            PropertyInfo prop = EntityType.GetProperty(name);
            prop.SetValue(entity, reader.GetValue(i), null);
         }
         result.Add(entity);
      }
   }
   return result;
}
// --------------------------------------------------------------------
public void SaveCustomers(IList data) { /* do save*/ }
// ====================================================================
// BLL
public void DoFoo() {
   // get result from DAL
   IList untyped = DAL.GetCustomers(null);
   // convert into typed list
   List<Customer> customers = untyped.Cast<Customer>().ToList();
   // --------------------------------------------
   // do business stuff
   // --------------------------------------------
   DAL.SaveCustomers(customers);
}

Data Transfer Objects

Data Transfer Objects (DTOs) are a very clean and type safe solution for data exchange between the layers. They are usually defined within the DAL, filled there and mapped into the destination business objects in the BLL.

The advantages of DTOs are the strong typing and good maintainability. Any kind of mapping customization is simple to do.

The disadvantage is the huge amount of simple mapping code (= unhappy developer) within the BLL to convert received DTOs into business objects and back into DTOs for saving. The best way to avoid writing all that mapping code is to use some kind of source code generation, like a CASE tool.

// ====================================================================
// DAL
// a customer DTO object for data exchange
public class CustomerDTO {
   public int Id { get; set; }
   public string Name { get; set; }
}
// --------------------------------------------------------------------
public IList<CustomerDTO> GetCustomers(object criteria) {
   // create list of DTOs to exchange result data
   IList<CustomerDTO> result = new List<CustomerDTO>();
   using (IDataReader reader = CreateReader(criteria)) {
      while (reader.Read()) {
         // create DTOs
         CustomerDTO dto = new CustomerDTO();
         dto.Id = (int)reader["Id"];
         dto.Name = (string)reader["Name"];
         result.Add(dto);
      }
   }
   return result;
}
// --------------------------------------------------------------------
public void SaveCustomers(IEnumerable<CustomerDTO> dtos) { /* do save*/ }
// ====================================================================
// BLL
public void DoFoo() {
   // get DTOs
   IList<CustomerDTO> dtos = DAL.GetCustomers(null);
   // convert into business objects
   IList<Customer> customers = ConvertDTOs(dtos);
   // --------------------------------------------
   // do business stuff
   // --------------------------------------------
   // convert back into DTOs to exchange with DAL
   dtos = ConvertCustomers(customers);
   DAL.SaveCustomers(dtos);
}
// --------------------------------------------------------------------
private IList<Customer> ConvertDTOs(IList<CustomerDTO> dtos) {
   // convert DTOs into business objects
   List<Customer> customers = new List<Customer>();
   foreach (var dto in dtos) {
      Customer cust = new Customer();
      cust.Id = dto.Id;
      cust.Name = dto.Name;
      customers.Add(cust);
   }
   return customers;
}
// --------------------------------------------------------------------
private IList<CustomerDTO> ConvertCustomers(IEnumerable<Customer> customers) {
   // convert business objects into DTOs
   IList<CustomerDTO> dtos = new List<CustomerDTO>();
   foreach (var cust in customers) {
      CustomerDTO dto = new CustomerDTO();
      dto.Id = cust.Id;
      dto.Name = cust.Name;
      dtos.Add(dto);
   }
   return dtos;
}

Interfaces as DTOs

A last solution (which is my personal favorite) is to create an additional assembly (usually called Project.Core.dll) which contains only the interfaces that describe the data properties of the business objects. This library is referenced from both layers, DAL and BLL. It enables the DAL to assign all data directly to the destination objects, which are implemented within the BLL. To create new instances of those objects you can either use some kind of Dependency Injection (DI) framework, like the Unity application block from the Microsoft Enterprise Library (MEL), or something like an IFactory<T> interface, defined in the Core library, which acts as an Abstract Factory and is implemented within the BLL. To inject the DAL with the abstract factory, you can use DI or static code.

The advantages of this approach are all the advantages of DTOs. In addition, there is way less source code to be written, since no additional mapping into business objects is needed; the DTOs are covered by the interfaces, which are implemented by the business objects.

The disadvantages of this solution are the additional library containing the interfaces and the additional architectural complexity due to the DI framework and/or the factory class(es).

// ====================================================================
// Core library
// interface for Customer to exchange data
public interface ICustomer {
   int Id { get; set; }
   string Name { get; set; }
}
// --------------------------------------------------------------------
// factory interface to create new business objects from DAL
public interface IFactory<T> {
   T Create();
}
// ====================================================================
// DAL
public class CustomerMapper<T> : SamplesMapperBase
    where T : ICustomer {
   // the factory; injected at runtime by BLL or any kind of DI
   public IFactory<T> Factory { get; set; }

   public IList<T> GetCustomers(object criteria) {
      IList<T> result = new List<T>();
      using (IDataReader reader = CreateReader(criteria)) {
         while (reader.Read()) {
            // create business object instance from factory
            T customer = Factory.Create();
            customer.Id = (int)reader["Id"];
            customer.Name = (string)reader["Name"];
            result.Add(customer);
         }
      }
      return result;
   }

   public void SaveCustomers(IList<T> data) { /* do save*/ }
}
// ====================================================================
// BLL
// simple sample implementation for customer factory
class CustomerFactory : IFactory<Customer> {
   public Customer Create() {
      return new Customer();
   }
}
// --------------------------------------------------------------------
// instance of customer mapper. The generic parameter specifies the
// business objects type
public CustomerMapper<Customer> DAL { get; set; }
// --------------------------------------------------------------------
public void DoFoo() {
   // directly get the business objects from DAL
   IList<Customer> customers = DAL.GetCustomers(null);
   // --------------------------------------------
   // do business stuff
   // --------------------------------------------
   DAL.SaveCustomers(customers);
}
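For completeness, here is a minimal sketch of how the factory could be wired into the mapper with static code; the Initialize method is made up for this example, and I assume SamplesMapperBase has a parameterless constructor. A DI container could do the same via configuration.
// create the mapper and inject the abstract factory, so the DAL can
// create Customer instances that are defined within the BLL (sketch)
public void Initialize() {
   DAL = new CustomerMapper<Customer>();
   DAL.Factory = new CustomerFactory();
}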

Other Solutions

There might be a myriad of other solutions to exchange data between these two layers, like XML or untyped object arrays; in this blog I tried to show the (in my opinion) most popular and efficient ones.

<off topic>Long time ago since last blog... It's good to find some time for writing.</off topic>