Sunday, May 30, 2010

Streaming Serialization with XmlSerializer

.NET provides several ways to serialize and de-serialize objects, one of which is the XmlSerializer class. In general this class offers a nice, straightforward approach. However, like most other serialization classes, it has one problem: it does not support streaming. If you want to (de-)serialize a really large number of objects, you might run into physical memory problems.

One solution is to implement the IXmlSerializable interface, which exposes the ReadXml(XmlReader) and WriteXml(XmlWriter) operations. However, this ends up in quite a lot of complicated code.
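To give an impression of what that means, here is a minimal sketch of IXmlSerializable for a hypothetical Point type. The type and its two attributes are my assumption and are not part of the classes in this post. Even for two integers, ReadXml has to take care of consuming the whole element itself.

```csharp
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type, used only for illustration.
public class Point : IXmlSerializable {
   public int X { get; set; }
   public int Y { get; set; }

   // Returning null is the recommended implementation.
   public XmlSchema GetSchema() { return null; }

   public void WriteXml(XmlWriter writer) {
      // The serializer has already written the start tag of the element.
      writer.WriteAttributeString("X", XmlConvert.ToString(X));
      writer.WriteAttributeString("Y", XmlConvert.ToString(Y));
   }

   public void ReadXml(XmlReader reader) {
      // The reader is positioned on the start tag of the element.
      X = XmlConvert.ToInt32(reader.GetAttribute("X"));
      Y = XmlConvert.ToInt32(reader.GetAttribute("Y"));

      // ReadXml has to consume the complete element, including its end tag.
      bool isEmpty = reader.IsEmptyElement;
      reader.ReadStartElement();
      if (!isEmpty) {
         reader.ReadEndElement();
      }
   }
}
```

With child elements, collections and optional values, this hand-written code grows quickly, which is why I kept the XmlSerializer for the items themselves.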

To handle the memory problem when using the XmlSerializer class, here are two simple wrapper classes that provide streaming functionality.

XmlStreamingSerializer

The XmlStreamingSerializer creates an internal instance of an XmlSerializer and uses an XmlWriter, which handles the persistence. To avoid repeating the "xsi" and "xsd" namespace declarations on each serialized object, it simply passes an empty XmlSerializerNamespaces. I found that trick in one of the comments on Scott Hanselman's blog about XmlFragmentWriter.
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class XmlStreamingSerializer<T> {
   // ------------------------------------------------------------------
   static XmlStreamingSerializer() {
      _ns = new XmlSerializerNamespaces();
      _ns.Add("", "");
   }
   // ------------------------------------------------------------------
   private XmlStreamingSerializer() {
      _serializer = new XmlSerializer(typeof(T));
   }
   // ------------------------------------------------------------------
   public XmlStreamingSerializer(TextWriter w)
      : this(XmlWriter.Create(w)) {
   }
   // ------------------------------------------------------------------
   public XmlStreamingSerializer(XmlWriter writer) : this() {
      _writer = writer;
      _writer.WriteStartDocument();
      _writer.WriteStartElement("ArrayOf" + typeof(T).Name);
   }
   // ==================================================================
   private static readonly XmlSerializerNamespaces _ns;
   // Created once in the private constructor (no second field initializer).
   private readonly XmlSerializer _serializer;
   private XmlWriter _writer;
   private bool _finished;
   // ==================================================================
   public void Finish() {
      _writer.WriteEndDocument();
      _writer.Flush();
      _finished = true;
   }
   // ------------------------------------------------------------------
   public void Close() {
      if (!_finished)
         Finish();
      _writer.Close();
   }
   // ------------------------------------------------------------------
   public void Serialize(T item) {
      _serializer.Serialize(_writer, item, _ns);
   }
}
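The core of the class can be condensed into a few lines. The following is a minimal, self-contained sketch of the same technique, not the class itself: the Item type, the StreamingWriteDemo class and the StringWriter target are assumptions I made for illustration. In a real scenario the target would rather be a file.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical item type, used only for illustration.
public class Item {
   [XmlAttribute]
   public int Id { get; set; }
}

public static class StreamingWriteDemo {
   public static string WriteItems(int count) {
      var serializer = new XmlSerializer(typeof(Item));

      // An empty prefix/namespace pair suppresses the xsi and xsd declarations.
      var ns = new XmlSerializerNamespaces();
      ns.Add("", "");

      var text = new StringWriter();
      using (XmlWriter writer = XmlWriter.Create(text)) {
         writer.WriteStartDocument();
         writer.WriteStartElement("ArrayOfItem");
         for (int i = 0; i < count; i++) {
            // Each item is written immediately; no list of items is built up.
            serializer.Serialize(writer, new Item { Id = i }, ns);
         }
         // Closes all open elements, including the document element.
         writer.WriteEndDocument();
      }
      return text.ToString();
   }
}
```

Calling StreamingWriteDemo.WriteItems(2) should give one ArrayOfItem document element with two Item children and no namespace declarations on them.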

XmlStreamingDeserializer

Like the serializer, the XmlStreamingDeserializer class wraps an instance of a .NET XmlSerializer. It uses an XmlReader to provide the streaming functionality and the XmlReader.ReadSubtree method to hand the current serialized item to the XmlSerializer. Deserialize returns default(T) once no further item is found.
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class XmlStreamingDeserializer<T> {
   // ------------------------------------------------------------------
   private XmlStreamingDeserializer() {
      _serializer = new XmlSerializer(typeof(T));
   }
   // ------------------------------------------------------------------
   public XmlStreamingDeserializer(TextReader reader)
      : this(XmlReader.Create(reader)) {
   }
   // ------------------------------------------------------------------
   public XmlStreamingDeserializer(XmlReader reader) : this() {
      _reader = reader;
   }
   // ==================================================================
   // No XmlSerializerNamespaces are needed here; they only matter for writing.
   // The serializer is created once in the private constructor.
   private readonly XmlSerializer _serializer;
   private XmlReader _reader;
   // ==================================================================
   public void Close() {
      _reader.Close();
   }
   // ------------------------------------------------------------------
   public T Deserialize() {
      while (_reader.Read()) {
         if (_reader.NodeType == XmlNodeType.Element
               && _reader.Depth == 1
               && _reader.Name == typeof(T).Name) {
            // Closing the subtree reader leaves _reader at the end of the item.
            using (XmlReader reader = _reader.ReadSubtree()) {
               return (T)_serializer.Deserialize(reader);
            }
         }
      }
      return default(T);
   }
}
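If you prefer a foreach loop over repeated Deserialize calls, the same ReadSubtree technique can be wrapped into an iterator. This is a minimal sketch and not part of the class above; the Record type and the StreamingReadDemo class are assumptions I made for illustration. As in the class above, the element name is assumed to be the type name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical item type, used only for illustration.
public class Record {
   [XmlAttribute]
   public int Id { get; set; }
}

public static class StreamingReadDemo {
   public static IEnumerable<T> ReadItems<T>(TextReader input) {
      var serializer = new XmlSerializer(typeof(T));
      using (XmlReader reader = XmlReader.Create(input)) {
         while (reader.Read()) {
            // Items are the direct children of the document element.
            if (reader.NodeType == XmlNodeType.Element
                  && reader.Depth == 1
                  && reader.Name == typeof(T).Name) {
               // The subtree reader only sees the current item.
               using (XmlReader item = reader.ReadSubtree()) {
                  yield return (T)serializer.Deserialize(item);
               }
            }
         }
      }
   }
}
```

Because of the yield, only the item currently being processed has to be kept in memory, as long as the caller does not collect the results in a list.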

Some Tests

Here you can find some performance test results.

I serialized a very simple object structure shown below.
public class Foo {
   [XmlAttribute]
   public int Id { get; set; }
   [XmlAttribute]
   public string Bar { get; set; }
   public List<Foo> SubFoos { get; set; }
}
I serialized 10,000 instances, each with 100 sub items. You may not need to serialize that many objects, but real object graphs are often far deeper than two levels.
Action | Duration (ms) | RAM (MB)
Serialization with XmlSerializer | 2954 | 134
Serialization with XmlStreamingSerializer | 2391 | 13
De-serialization with XmlSerializer | 3662 | 150
De-serialization with XmlStreamingDeserializer | 2953 | 13
These test results are not representative for any purpose. In particular, the apparently faster processing of the streaming classes seems a bit curious to me. Nevertheless, the more important result of this test is the far lower memory requirement (about 10% here).

Possible Extensions

Both classes shown above are quite simple, and there are several possible extensions to make them more powerful.
  • One class for both. If you prefer one serialization class for serialization and de-serialization, feel free to merge both into one. I preferred the two-class solution to keep each of them more pure.
  • Configurable document element name. Currently, the serializer writes a hard-coded document element name, "ArrayOfMyType". This matches the behavior of the XmlSerializer when it serializes a collection of T, such as a List<T>. Feel free to make this configurable.
  • Xml Namespaces. The above serialization classes do not support XML namespaces. If you need namespaces, just add something like a public XmlSerializerNamespaces property.
  • Different objects. XML files often contain more than one type of objects. Support of different object types could easily be added by a dictionary of serializers.
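As an illustration of the last point, here is a minimal sketch of a reader that picks the serializer by element name. It is only one possible way to do it; the Cat and Dog types and the MixedReadDemo class are assumptions I made for illustration, and the element name is again assumed to be the type name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical item types, used only for illustration.
public class Cat { [XmlAttribute] public string Name { get; set; } }
public class Dog { [XmlAttribute] public string Name { get; set; } }

public static class MixedReadDemo {
   public static IEnumerable<object> ReadItems(TextReader input, params Type[] types) {
      // One serializer per supported element name.
      var serializers = new Dictionary<string, XmlSerializer>();
      foreach (Type type in types) {
         serializers[type.Name] = new XmlSerializer(type);
      }

      using (XmlReader reader = XmlReader.Create(input)) {
         while (reader.Read()) {
            XmlSerializer serializer;
            // Elements without a registered serializer are skipped.
            if (reader.NodeType == XmlNodeType.Element
                  && reader.Depth == 1
                  && serializers.TryGetValue(reader.Name, out serializer)) {
               using (XmlReader item = reader.ReadSubtree()) {
                  yield return serializer.Deserialize(item);
               }
            }
         }
      }
   }
}
```

The caller gets plain objects back and has to check their type, for example with the is or as operator.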
The intention of this post was not to present a full-fledged streaming XML framework. I tried to give a proof of concept for how to handle large numbers of objects in combination with the .NET XmlSerializer class.

Attachments

Here you can find the XmlStreamingSerializer and XmlStreamingDeserializer classes as well as a simple console application I used for my tests.
