Thursday, February 9, 2012

Tables are no Domain Objects: Data Type Transformations

This is the last post of a design basics series that focuses on circumstances where it can be useful to refine the structure of domain objects (aka entities) when they are dematerilized from a database (or other persistence storage). If you are interested you can find the whole list of posts on the bottom of the Introduction post.

In the previous part, Field Aggregations, I talked about cases where groups of two or more table columns can represent a logical collaboration. In this post we will have a look at opposed scenarios, where single table columns contain structured information. Depending on the purpose of the current project it can be useful to transform those columns into an object that provides a richer, more natural than simple strings or other base data types.

Binary Data

Whenever we find a column that stores binary data we should think about a transformation into something that is more reasonable in an object model. One reason is that binary data are often not wanted to be eager loaded, when the rest of the row is retrieved from the database, but are lazy loaded, when they are really needed. Therefore another interface than a simple byte array can provide a mechanism that loads the data only when they are accessed the first time.

Another reason, that makes binary data a good candidate for a transformation, is the fact that applications rarely really need a byte array but usually store and retrieve a specific kind of structured data. Often binary columns store pictures of products or people, documents like PDFs or text documents and so forth. (If there is another column that describes the documents extension or mime type you also found a candidate for a Field Aggregation.)

In case of pictures we should consider to return a System.Drawing.Image and encapsulate the code to initialize the image from the byte array into a distinct method. This not only provides a much more sophisticated interface but also avoids duplication of code if the image is needed at more than one position in our system.

When our column stores documents it depends on how the system works with them. If those documents are shown or edited by an embedded 3rd party component we could return something that is native supported by this library, what is often a System.IO.Stream or a file name. In case of a file name we could provide a class that writes the document byte array into a temp file and returns the name of the file. If the documents are meant to be shown or edited by their Windows default program we will usually need to write them into a temp file and do a shell-execute system call to open the default program.

Last but not least, the size of data we store can become really large. When working with a byte array interface we can run into major client and server resource issues if we hold, store, and retrieve the whole bulk of data in a single operation. A different interface can implement a streaming read/write functionality that helps to minimize those problems. (If you are working with SQL Server you might be interested in my SqlBinaryData class, which provides this functionality.)

Special Types Strings

Many entities consist of several string properties like names, addresses, descriptions and so forth and most of them are (technically) nothing more than text data to be stored. However, sometimes strings might need special validation or represent structures of sub-information that need to be addressed.

Here is a short list of string types that are candidates to be represented in a different way.

TypeExisting ClassDescription
Email AddressSystem.Net.Mail. MailAddressEmail addresses can contain the address and a display name and the host name can be useful for analysis or to aggregate customers or contacts.
Web SiteSystem.UriUsing the Uri class instead of a simple string not only gives several additional information like host name but also gives the advantage of a built in validation.
Phone NumberNone. Happy RegExA phone number can consist of country codes, regional codes the number and extensions for direct access. Cleaning phone numbers is a common and awkward(!) task when synchronizing systems or loading warehouses. If you find yourself in a project where you need to clean phone numbers start with a Google search for existing regular expressions.
Date/Time Offset or DurationSystem.TimeSpanIn our current project we had to store the duration of media content. Depending on the required accuracy, a duration can be stored as seconds/milliseconds/... but when shown to a user it is usually required to show time in a richer way instead of a 10 digit millisecond number.
Composite KeysNone.When working with decentralized databases it is a common approach to create object keys (identifiers) that consist of a division/location part and an incremental id that is only unique for the local database.

XML Property Bags

These days domain objects tend to specify more and more attributes. Ever seen a table with 100+ columns where at least 50 columns don't contain anything in 90% of the rows? Customers and business analysts can be really creative in defining tons of optional attributes for their orders, articles, contacts, employees, and so on. If those attributes are rarely available and not (or rare) search criteria they can be moved into a single XML column that represents a property bag. This not only makes the rest of the table much easier to read but also can cause a increased performance and eliminates the restriction of database page sizes.

However, while XML can be a good solution for the DBMS, it does not represent a very sophisticated interface for a domain object. The consumer of a business layer should not need to bother if a property is a table column or is stored in a XML property bag. Therefore the XML can be held inside of an object is responsible to easy get/set the properties.


Conclusion

As I stated in the introduction post of this series, it is impossible to give a halfway complete picture of things to consider when designing entities in a small blog series.

Nevertheless, I hope I've been able to show you a few things to bear on mind when designing the interface of your entities. The point is, the more natural and easy to use your domain objects are, the more you and your team can concentrate on developing functionality that provides a real business value. Due to the fact that domain objects tend to be excessively reused over broad parts of information systems an adequate effort in their initial design (or a refactoring) will return its invest quite soon.

Just keep in mind, the effort needs to be appropriate to its expected benefit. Use more time for often reused types of entities than less often used ones. Don't waste time in a (never) perfect designing if it most likely not give you a value.

One thing that is often called as conflict when talking about rich object designs and abstracted architectures is performance. I say this is (most of the time) incorrect, but this might become part of another post.

Friday, February 3, 2012

Tables Are No Domain Objects: Field Aggregations

This is the fourth part of the series about "Tables Are No Domain Objects". After an Introduction the other parts covered Relation Transformations and Many To Many Relations.

In this post I want to talk about field aggregations. By nature table columns are arranged side by side on same hierarchy level. Sometimes there are two or more columns that are tightly coupled with each other, where one (or more) is useless with out the others. Whenever we find collaborations of columns in a table we also found something to consider to aggregate into own classes when dematerializing into our object model. This can avoid duplicate code, centralize logic, and effect more natural object interfaces.

Person Tables

Tables containing person information often contain columns that are more or less related to each other. I'm sure many of you have already seen databases with one or more tables like those shown here.


When designing the domain objects we could create two classes where each contains all columns of the based table, but there are two collaborations of columns that could be considered to be aggregated into own classes.


This not only enables us to reuse helper methods like those I denoted in the PersonName class, but also enables us to easily reuse other methods without designing methods that take many parameters (one for each property). Consider a method that creates a letterhead for a payment check (for employees) or a delivery note (for customers).

Money

At least since Martin Fowlers' legendary book Patterns Of Enterprise Application Architecture where he introduced the Money Pattern we know that calculating with money can be really awkward .

Since Mr. Fowler did a great job in his book I'll not try to explain the whole pattern but only give you a short summary. Apart from all the issues with multiplication and division of money values the pattern describes the relation between an amount column and a currency column. The following diagram shows a possible table structure.


By applying the Money pattern we will get the class structure shown in the diagram below, where all calculation logic is moved into a separate Money class. (BTW: You can find a really good .NET implementation at Code Project: A Money type for the CLR.)


If we would need to calculate with money values of different currencies we would even need a further class like a CurrencyCalculator, but this is far out of scope of this post.

System Information

Often database tables not only store business data but also contain different kinds of additional system information that might be less (or not) business relevant but might be needed for support, analysis or other investigations.

Here is a short list of possible system information columns.

NameShort Description
CreationDate The date/time where the row was inserted into the table.
LastUpdate The date/time of the rows last update. If there have not been any updates yet this value is often either NULL or equal to the creation date.
Creator A key that describes the employee who caused the insertion of the row.
LastUpdater A key that describes the employee who caused the last update of the row.
CreationHost The host computer that caused the insertion of the row.
LastUpdateHost The computer that caused the last update of the row.
Version The current version of the row, if the data in the table are version controlled.

And here are two possible database tables.


Moving the system information into an own class will give us entities that are easier to use, because they really focus on the business, not on the system.


In addition, keeping the properties in each of our entities can cause a remarkable amount of duplicated code or reflection based solutions that are hard to maintain, slow at runtime and not compiler checked.

Conclusion

I hope this post showed you another reason why it is not always the best way to design a domain object model with exactly the same structure as a database.

The easiest way to find candidates for field aggregations is to keep reviewing the table columns or class properties and search for columns/properties that are related to each other.