Thursday, February 9, 2012

Tables are no Domain Objects: Data Type Transformations

This is the last post of a design basics series that focuses on circumstances where it can be useful to refine the structure of domain objects (aka entities) when they are dematerilized from a database (or other persistence storage). If you are interested you can find the whole list of posts on the bottom of the Introduction post.

In the previous part, Field Aggregations, I talked about cases where groups of two or more table columns can represent a logical collaboration. In this post we will have a look at opposed scenarios, where single table columns contain structured information. Depending on the purpose of the current project it can be useful to transform those columns into an object that provides a richer, more natural than simple strings or other base data types.

Binary Data

Whenever we find a column that stores binary data we should think about a transformation into something that is more reasonable in an object model. One reason is that binary data are often not wanted to be eager loaded, when the rest of the row is retrieved from the database, but are lazy loaded, when they are really needed. Therefore another interface than a simple byte array can provide a mechanism that loads the data only when they are accessed the first time.

Another reason, that makes binary data a good candidate for a transformation, is the fact that applications rarely really need a byte array but usually store and retrieve a specific kind of structured data. Often binary columns store pictures of products or people, documents like PDFs or text documents and so forth. (If there is another column that describes the documents extension or mime type you also found a candidate for a Field Aggregation.)

In case of pictures we should consider to return a System.Drawing.Image and encapsulate the code to initialize the image from the byte array into a distinct method. This not only provides a much more sophisticated interface but also avoids duplication of code if the image is needed at more than one position in our system.

When our column stores documents it depends on how the system works with them. If those documents are shown or edited by an embedded 3rd party component we could return something that is native supported by this library, what is often a System.IO.Stream or a file name. In case of a file name we could provide a class that writes the document byte array into a temp file and returns the name of the file. If the documents are meant to be shown or edited by their Windows default program we will usually need to write them into a temp file and do a shell-execute system call to open the default program.

Last but not least, the size of data we store can become really large. When working with a byte array interface we can run into major client and server resource issues if we hold, store, and retrieve the whole bulk of data in a single operation. A different interface can implement a streaming read/write functionality that helps to minimize those problems. (If you are working with SQL Server you might be interested in my SqlBinaryData class, which provides this functionality.)

Special Types Strings

Many entities consist of several string properties like names, addresses, descriptions and so forth and most of them are (technically) nothing more than text data to be stored. However, sometimes strings might need special validation or represent structures of sub-information that need to be addressed.

Here is a short list of string types that are candidates to be represented in a different way.

TypeExisting ClassDescription
Email AddressSystem.Net.Mail. MailAddressEmail addresses can contain the address and a display name and the host name can be useful for analysis or to aggregate customers or contacts.
Web SiteSystem.UriUsing the Uri class instead of a simple string not only gives several additional information like host name but also gives the advantage of a built in validation.
Phone NumberNone. Happy RegExA phone number can consist of country codes, regional codes the number and extensions for direct access. Cleaning phone numbers is a common and awkward(!) task when synchronizing systems or loading warehouses. If you find yourself in a project where you need to clean phone numbers start with a Google search for existing regular expressions.
Date/Time Offset or DurationSystem.TimeSpanIn our current project we had to store the duration of media content. Depending on the required accuracy, a duration can be stored as seconds/milliseconds/... but when shown to a user it is usually required to show time in a richer way instead of a 10 digit millisecond number.
Composite KeysNone.When working with decentralized databases it is a common approach to create object keys (identifiers) that consist of a division/location part and an incremental id that is only unique for the local database.

XML Property Bags

These days domain objects tend to specify more and more attributes. Ever seen a table with 100+ columns where at least 50 columns don't contain anything in 90% of the rows? Customers and business analysts can be really creative in defining tons of optional attributes for their orders, articles, contacts, employees, and so on. If those attributes are rarely available and not (or rare) search criteria they can be moved into a single XML column that represents a property bag. This not only makes the rest of the table much easier to read but also can cause a increased performance and eliminates the restriction of database page sizes.

However, while XML can be a good solution for the DBMS, it does not represent a very sophisticated interface for a domain object. The consumer of a business layer should not need to bother if a property is a table column or is stored in a XML property bag. Therefore the XML can be held inside of an object is responsible to easy get/set the properties.


Conclusion

As I stated in the introduction post of this series, it is impossible to give a halfway complete picture of things to consider when designing entities in a small blog series.

Nevertheless, I hope I've been able to show you a few things to bear on mind when designing the interface of your entities. The point is, the more natural and easy to use your domain objects are, the more you and your team can concentrate on developing functionality that provides a real business value. Due to the fact that domain objects tend to be excessively reused over broad parts of information systems an adequate effort in their initial design (or a refactoring) will return its invest quite soon.

Just keep in mind, the effort needs to be appropriate to its expected benefit. Use more time for often reused types of entities than less often used ones. Don't waste time in a (never) perfect designing if it most likely not give you a value.

One thing that is often called as conflict when talking about rich object designs and abstracted architectures is performance. I say this is (most of the time) incorrect, but this might become part of another post.

No comments:

Post a Comment