Thursday, February 9, 2012

Tables are no Domain Objects: Data Type Transformations

This is the last post of a design basics series that focuses on circumstances where it can be useful to refine the structure of domain objects (aka entities) when they are dematerialized from a database (or other persistent storage). If you are interested, you can find the whole list of posts at the bottom of the Introduction post.

In the previous part, Field Aggregations, I talked about cases where groups of two or more table columns can represent a logical collaboration. In this post we will have a look at the opposite scenario, where a single table column contains structured information. Depending on the purpose of the current project, it can be useful to transform those columns into an object that provides a richer, more natural interface than simple strings or other base data types.

Binary Data

Whenever we find a column that stores binary data we should think about a transformation into something that is more reasonable in an object model. One reason is that we often don't want binary data to be eagerly loaded when the rest of the row is retrieved from the database, but lazily loaded when they are really needed. An interface other than a simple byte array can provide a mechanism that loads the data only when they are accessed for the first time.
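To illustrate the lazy loading idea, here is a minimal sketch. The LazyBinary class and its loader delegate are my own hypothetical names; the delegate stands in for whatever data access call actually fetches the column.

```csharp
using System;

// Hypothetical wrapper: defers loading a binary column until first access.
public sealed class LazyBinary {
   private readonly Lazy<byte[]> _data;

   // 'loader' stands in for whatever DAL call fetches the column value.
   public LazyBinary(Func<byte[]> loader) {
      if (loader == null) throw new ArgumentNullException(nameof(loader));
      _data = new Lazy<byte[]>(loader);
   }

   // True once the loader has been executed.
   public bool IsLoaded { get { return _data.IsValueCreated; } }

   // Triggers the load on first access; later calls return the cached array.
   public byte[] GetBytes() { return _data.Value; }
}
```

An entity could expose a property of this type instead of a byte[] property, so the bytes are only fetched when a caller really asks for them.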

Another reason that makes binary data a good candidate for a transformation is the fact that applications rarely need a raw byte array but usually store and retrieve a specific kind of structured data. Often binary columns store pictures of products or people, documents like PDFs or text documents, and so forth. (If there is another column that describes the document's extension or MIME type, you have also found a candidate for a Field Aggregation.)

In the case of pictures we should consider returning a System.Drawing.Image and encapsulating the code that initializes the image from the byte array in a distinct method. This not only provides a much more sophisticated interface but also avoids duplication of code if the image is needed in more than one place in our system.

When our column stores documents, the right representation depends on how the system works with them. If those documents are shown or edited by an embedded 3rd party component, we could return something that is natively supported by this library, which is often a System.IO.Stream or a file name. In the case of a file name we could provide a class that writes the document byte array into a temp file and returns the name of the file. If the documents are meant to be shown or edited by their Windows default program, we will usually need to write them into a temp file and do a shell-execute system call to open the default program.
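The temp file approach could be sketched like this. TempDocument is a hypothetical helper, and I assume the file extension comes from a separate column.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes a document's bytes to a temp file and returns its path.
public static class TempDocument {
   // 'extension' (e.g. ".pdf") is assumed to come from a separate column.
   public static string Write(byte[] content, string extension) {
      if (content == null) throw new ArgumentNullException(nameof(content));
      // Random file name in the temp directory, with the document's extension.
      string fileName = Path.ChangeExtension(
         Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()), extension);
      File.WriteAllBytes(fileName, content);
      return fileName;
   }
}
```

The returned path can be handed to a 3rd party component, or passed to System.Diagnostics.Process.Start with a ProcessStartInfo whose UseShellExecute is true to open the default program. Cleaning up the temp file afterwards is left out of this sketch.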

Last but not least, the size of data we store can become really large. When working with a byte array interface we can run into major client and server resource issues if we hold, store, and retrieve the whole bulk of data in a single operation. A different interface can implement a streaming read/write functionality that helps to minimize those problems. (If you are working with SQL Server you might be interested in my SqlBinaryData class, which provides this functionality.)

Special Types of Strings

Many entities consist of several string properties like names, addresses, descriptions and so forth and most of them are (technically) nothing more than text data to be stored. However, sometimes strings might need special validation or represent structures of sub-information that need to be addressed.

Here is a short list of string types that are candidates to be represented in a different way.

Type | Existing Class | Description
Email Address | System.Net.Mail.MailAddress | Email addresses can contain the address and a display name and the host name can be useful for analysis or to aggregate customers or contacts.
Web Site | System.Uri | Using the Uri class instead of a simple string not only gives several additional information like host name but also gives the advantage of a built in validation.
Phone Number | None. Happy RegEx | A phone number can consist of country codes, regional codes, the number and extensions for direct access. Cleaning phone numbers is a common and awkward(!) task when synchronizing systems or loading warehouses. If you find yourself in a project where you need to clean phone numbers start with a Google search for existing regular expressions.
Date/Time Offset or Duration | System.TimeSpan | In our current project we had to store the duration of media content. Depending on the required accuracy, a duration can be stored as seconds/milliseconds/... but when shown to a user it is usually required to show time in a richer way instead of a 10 digit millisecond number.
Composite Keys | None. | When working with decentralized databases it is a common approach to create object keys (identifiers) that consist of a division/location part and an incremental id that is only unique for the local database.
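To make the first rows of the table a bit more tangible, here is a minimal sketch of an entity that maps raw column values to richer types. The Contact class, its property names, and the FromColumns method are hypothetical; MailAddress, Uri, and TimeSpan are the framework classes named in the table.

```csharp
using System;
using System.Net.Mail;

// Hypothetical Contact entity; the property names are illustrative only.
public class Contact {
   public MailAddress Email { get; set; }
   public Uri WebSite { get; set; }
   public TimeSpan MediaDuration { get; set; }

   // Host part of the email address, e.g. for grouping contacts by domain.
   public string EmailHost { get { return Email.Host; } }

   // Formats the duration as hh:mm:ss instead of a raw millisecond count.
   // Note: "hh" is the hours component, so durations of 24h or more need a days part.
   public string DurationText {
      get { return MediaDuration.ToString(@"hh\:mm\:ss"); }
   }

   // Maps raw column values to the richer types; invalid values throw.
   public static Contact FromColumns(string email, string webSite, long durationMs) {
      return new Contact {
         Email = new MailAddress(email),
         WebSite = new Uri(webSite, UriKind.Absolute),
         MediaDuration = TimeSpan.FromMilliseconds(durationMs)
      };
   }
}
```

With this in place, a column value like "Jane Doe <jane@example.com>" gives direct access to the display name and the host, and a stored duration of 5025000 milliseconds is shown as 01:23:45.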

XML Property Bags

These days domain objects tend to specify more and more attributes. Ever seen a table with 100+ columns where at least 50 columns don't contain anything in 90% of the rows? Customers and business analysts can be really creative in defining tons of optional attributes for their orders, articles, contacts, employees, and so on. If those attributes are rarely populated and are not (or only rarely) used as search criteria, they can be moved into a single XML column that represents a property bag. This not only makes the rest of the table much easier to read but can also improve performance and avoids the row size restrictions imposed by database pages.

However, while XML can be a good solution for the DBMS, it does not make a very sophisticated interface for a domain object. The consumer of a business layer should not need to care whether a property is a table column or is stored in an XML property bag. Therefore the XML can be held inside an object that is responsible for getting and setting the properties.
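Such a wrapper object could look roughly like the following sketch. XmlPropertyBag, the root element name "properties", and the Article.Color property are assumptions of mine for illustration; a real bag would probably also need typed accessors.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical wrapper around the content of an XML property-bag column.
public sealed class XmlPropertyBag {
   private readonly XElement _root;

   // 'xml' is the raw column value; null or empty starts a fresh bag.
   public XmlPropertyBag(string xml) {
      _root = string.IsNullOrEmpty(xml)
         ? new XElement("properties")
         : XElement.Parse(xml);
   }

   // Returns the stored value, or null if the property is not set.
   public string Get(string name) {
      XElement element = _root.Element(name);
      return element == null ? null : element.Value;
   }

   // Adds or updates a property; a null value removes it.
   public void Set(string name, string value) {
      _root.SetElementValue(name, value);
   }

   // Serializes the bag for writing back to the XML column.
   public string ToXml() { return _root.ToString(SaveOptions.DisableFormatting); }
}

// Hypothetical entity: callers cannot tell that Color lives in the XML column.
public class Article {
   private readonly XmlPropertyBag _optional = new XmlPropertyBag(null);

   public string Name { get; set; }

   public string Color {
      get { return _optional.Get("Color"); }
      set { _optional.Set("Color", value); }
   }
}
```

For the consumer, Article.Color behaves like any other property, which is exactly the point.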


Conclusion

As I stated in the introduction post of this series, it is impossible to give a halfway complete picture of things to consider when designing entities in a small blog series.

Nevertheless, I hope I've been able to show you a few things to bear in mind when designing the interface of your entities. The point is, the more natural and easy to use your domain objects are, the more you and your team can concentrate on developing functionality that provides real business value. Because domain objects tend to be reused extensively across broad parts of information systems, an adequate effort in their initial design (or a refactoring) will pay off quite soon.

Just keep in mind, the effort needs to be appropriate to its expected benefit. Spend more time on frequently reused types of entities than on rarely used ones. Don't waste time on a (never) perfect design if it most likely won't give you any value.

One thing that is often cited as being in conflict with rich object designs and abstracted architectures is performance. I say this is (most of the time) incorrect, but this might become part of another post.

Friday, February 3, 2012

Tables Are No Domain Objects: Field Aggregations

This is the fourth part of the series about "Tables Are No Domain Objects". After an Introduction the other parts covered Relation Transformations and Many To Many Relations.

In this post I want to talk about field aggregations. By nature, table columns are arranged side by side on the same hierarchy level. Sometimes there are two or more columns that are tightly coupled with each other, where one (or more) is useless without the others. Whenever we find such collaborations of columns in a table, we have also found candidates to aggregate into their own classes when dematerializing into our object model. This can avoid duplicate code, centralize logic, and produce more natural object interfaces.

Person Tables

Tables containing person information often contain columns that are more or less related to each other. I'm sure many of you have already seen databases with one or more tables like those shown here.


When designing the domain objects we could create two classes where each contains all columns of the underlying table, but there are two collaborations of columns that could be aggregated into their own classes.


This not only enables us to reuse helper methods like those I denoted in the PersonName class, but also enables us to easily reuse other methods without designing methods that take many parameters (one for each property). Consider a method that creates a letterhead for a payment check (for employees) or a delivery note (for customers).
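As a rough sketch of this idea, assuming the two collaborations are the name columns and the address columns: the member names of PersonName and Address and the Letterhead class are hypothetical and only serve to show how a method can take two aggregates instead of one parameter per column.

```csharp
using System;

// Hypothetical aggregate for the name columns of a person table.
public sealed class PersonName {
   public string Title { get; set; }
   public string FirstName { get; set; }
   public string LastName { get; set; }

   // "Doe, Jane" style, e.g. for sorted lists.
   public string GetSortName() { return LastName + ", " + FirstName; }

   // "Dr. Jane Doe" style; skips the title when it is empty.
   public string GetFullName() {
      string name = FirstName + " " + LastName;
      return string.IsNullOrEmpty(Title) ? name : Title + " " + name;
   }
}

// Hypothetical aggregate for the address columns.
public sealed class Address {
   public string Street { get; set; }
   public string ZipCode { get; set; }
   public string City { get; set; }
}

public static class Letterhead {
   // Works for employees and customers alike: two parameters instead of one per column.
   public static string Create(PersonName name, Address address) {
      return name.GetFullName() + Environment.NewLine
           + address.Street + Environment.NewLine
           + address.ZipCode + " " + address.City;
   }
}
```

Both an Employee and a Customer class could expose a PersonName and an Address property and share this method without any knowledge of each other.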

Money

At least since Martin Fowler's legendary book Patterns of Enterprise Application Architecture, where he introduced the Money pattern, we know that calculating with money can be really awkward.

Since Mr. Fowler did a great job in his book I'll not try to explain the whole pattern but only give you a short summary. Apart from all the issues with multiplication and division of money values the pattern describes the relation between an amount column and a currency column. The following diagram shows a possible table structure.


By applying the Money pattern we will get the class structure shown in the diagram below, where all calculation logic is moved into a separate Money class. (BTW: You can find a really good .NET implementation at Code Project: A Money type for the CLR.)


If we needed to calculate with money values of different currencies, we would need a further class like a CurrencyCalculator, but this is far beyond the scope of this post.
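For readers who don't have the book at hand, here is a heavily reduced sketch of what such a Money type could look like. It is not Mr. Fowler's implementation nor the Code Project one; the two-decimal rounding and the string currency code are simplifying assumptions.

```csharp
using System;

// Minimal sketch of a Money value type: amount and currency travel together.
public struct Money : IEquatable<Money> {
   public decimal Amount { get; private set; }
   public string Currency { get; private set; }   // e.g. "EUR" (assumption: plain string code)

   public Money(decimal amount, string currency) : this() {
      Amount = amount;
      Currency = currency;
   }

   // Adding is only allowed within the same currency.
   public static Money operator +(Money left, Money right) {
      if (left.Currency != right.Currency)
         throw new InvalidOperationException(
            "Cannot add " + left.Currency + " and " + right.Currency);
      return new Money(left.Amount + right.Amount, left.Currency);
   }

   // Rounding to two decimals is an assumption; real currencies differ.
   public static Money operator *(Money money, decimal factor) {
      decimal amount = Math.Round(money.Amount * factor, 2, MidpointRounding.AwayFromZero);
      return new Money(amount, money.Currency);
   }

   public bool Equals(Money other) {
      return Amount == other.Amount && Currency == other.Currency;
   }
   public override bool Equals(object obj) { return obj is Money && Equals((Money)obj); }
   public override int GetHashCode() {
      return Amount.GetHashCode() ^ (Currency ?? "").GetHashCode();
   }
}
```

An entity then exposes one Money property instead of separate amount and currency properties, and mixing currencies by accident fails loudly instead of silently producing a wrong sum.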

System Information

Often database tables not only store business data but also contain different kinds of additional system information that might be less (or not) business relevant but might be needed for support, analysis or other investigations.

Here is a short list of possible system information columns.

Name | Short Description
CreationDate | The date/time where the row was inserted into the table.
LastUpdate | The date/time of the row's last update. If there have not been any updates yet this value is often either NULL or equal to the creation date.
Creator | A key that describes the employee who caused the insertion of the row.
LastUpdater | A key that describes the employee who caused the last update of the row.
CreationHost | The host computer that caused the insertion of the row.
LastUpdateHost | The computer that caused the last update of the row.
Version | The current version of the row, if the data in the table are version controlled.

And here are two possible database tables.


Moving the system information into its own class gives us entities that are easier to use, because they really focus on the business, not on the system.


In addition, keeping the properties in each of our entities can cause a remarkable amount of duplicated code or reflection based solutions that are hard to maintain, slow at runtime and not compiler checked.
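A minimal sketch of such a class could look like this. The property names follow the column list above; the data types, the LastChange property, and the MarkUpdated method are assumptions of mine.

```csharp
using System;

// Hypothetical aggregate for the system columns listed above.
public sealed class SystemInfo {
   public DateTime CreationDate { get; set; }
   public DateTime? LastUpdate { get; set; }
   public int Creator { get; set; }
   public int? LastUpdater { get; set; }
   public string CreationHost { get; set; }
   public string LastUpdateHost { get; set; }
   public int Version { get; set; }

   // Latest change: the last update if there is one, else the creation date.
   public DateTime LastChange { get { return LastUpdate ?? CreationDate; } }

   // One place to record an update instead of repeating this in every entity.
   public void MarkUpdated(int updater, string host, DateTime timestamp) {
      LastUpdate = timestamp;
      LastUpdater = updater;
      LastUpdateHost = host;
      Version++;
   }
}

// The entity itself stays focused on business data.
public class Customer {
   public string Name { get; set; }
   public SystemInfo SystemInfo { get; set; }
}
```

Every entity that carries these columns can reuse the same class, so logic like MarkUpdated exists exactly once and stays compiler checked.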

Conclusion

I hope this post showed you another reason why it is not always the best way to design a domain object model with exactly the same structure as a database.

The easiest way to find candidates for field aggregations is to keep reviewing the table columns or class properties and search for columns/properties that are related to each other.

Wednesday, September 28, 2011

Domain Objects And Many To Many Relations

Today I want to discuss different kinds of many to many relations and cases where it can make sense to transform them into different structures when they are loaded from a database into an object graph of a business layer. For the sake of simplicity this post focuses on Microsoft Entity Framework and O/R-Mappers in general. Let me know if you are interested in how to handle many to many relations in native ADO.NET.

This is the third post of a blog series about designing domain objects in a business layer and the second part that gives attention to transformation of table structures into object structures. The first post "Tables Are No Domain Objects" gave an introduction to this series and showed some reasons why it can make sense to abstract O/R-Mappers in layers above the data access layer (DAL). In the second part "Tables Are No Domain Objects: Table Relation Transformations Part 1" we discussed foreign key fields, aggregations and status objects.

Basics

A many to many relation exists when two objects are related to each other and each of them can reference more than one object (row in the database) on the other side.

An example for a many to many relation is the relation between articles and their categories. Each category can be related to many articles, like a category "food" that references apples, pies and meat. On the other hand, an article "apple" can be categorized as "food" and "healthy".

In an object model, a many to many relation is exposed by two domain objects where each contains a collection of objects of the other type. Since databases don't provide complex column types like lists, many to many relations are realized by putting an intermediate link table between the two tables.

Simple Many To Many Relations

A simple many to many relation is given whenever the link table consists of nothing but the foreign keys which point to the rows of the two tables to be related to each other.


When working with native ADO.NET many to many relations are always a bit tricky but can, for sure, be handled. I'll focus on O/R-Mappers for now.

When working with a common O/R-Mapper, simple many to many relations are usually transformed automatically by the mapper. The link table stays hidden inside the mapper, and each of our two objects can provide a list of objects of the other type.

public partial class Article {
   public IList<Category> Categories { get; set; }
}

public partial class Category {
   public IList<Article> Articles { get; set; }
}
The ORM knows all values to be inserted into our link table and doesn't need to bother clients of our business layer with this table. If you are at the beginning of a project and your O/R-Mapper does not support simple many to many relations, I'd suggest considering another mapper.

Complex Many To Many Relations

A complex many to many relation is given when the link table contains any additional columns which are not the foreign keys of our domain objects base tables.


With this link table, an O/R-Mapper like Entity Framework runs into trouble. It is unable to fill our creation date column without an intermediate domain object that does nothing but hold the additional column. Our two domain objects will look like this.

public partial class Article {
   public List<ArticleCategory> ArticleCategories { get; set; }
}

public partial class Category {
   public List<ArticleCategory> ArticleCategories { get; set; }
}
This might be fine for EF, but usually that's not how we want to work with our objects in the main part of our system. Often columns like a creation date are only used for support or reporting purposes, and we don't want to think about the odd ArticleCategory object when adding new operational features.

Without some refining of our domain objects we will be forced to implement every access to an article's categories like this.
Article article = GetArticle();
var categories = from acl in article.ArticleCategories
                 select acl.Category;

// process the article and its categories
It is not only unnatural to always go through the intermediate object to get what we are really looking for; it also causes a tight coupling between our domain objects and the underlying database table structure. The worst case would be if we started with a simple many to many relation between articles and categories and a new requirement introduced the creation date column - and the resulting ArticleCategory object. Without some architectural effort we might have to refactor larger parts of our existing source code. Luckily, there are a few things we can do.

The easiest way to hide the relation object is to define the ArticleCategories property as private and provide a few methods that give us the opportunity to directly work with the referenced entities.
public partial class Article {
   public IEnumerable<Category> GetCategories() {
      return ArticleCategories.Select(acl => acl.Category);
   }

   public void AddCategory(Category category) {
      ArticleCategory categoryLink = new ArticleCategory();
      categoryLink.CreationDate = DateTime.Now;
      categoryLink.Article = this;
      categoryLink.Category = category;
      ArticleCategories.Add(categoryLink);
   }

   public void RemoveCategory(Category category) {
      var categoryLink = ArticleCategories.Where(
                           item => category.Equals(item.Category)).FirstOrDefault();
      if (categoryLink != null)
         ArticleCategories.Remove(categoryLink);
   }
}
// =========================================
// sample usage
Article article = GetArticle();

var categories = article.GetCategories();
// process categories

article.AddCategory(GetCategory());

Apart from the fact that we provide a more natural access to our categories, this also results in an architecture that is more robust against possible future changes - like additional fields in our link table.

If we want to go one step further we can provide an even more sophisticated interface to access our (indirectly) referenced domain objects. Unfortunately we cannot use a simple List<T> and copy all categories into it, because our ArticleCategories list would not be affected by any add/remove calls. This also makes it impossible to use a simple LINQ query that transforms the ArticleCategory objects into categories.

However, what we can do is implement a custom IList<T> that transforms a list of objects of one type into other objects by utilizing a provided delegate. In our case we need to transform a list of ArticleCategory objects into categories.

The following snippet shows how such a list could work.

public class TransformationList<T, TResult> : IList<TResult> {
   private IList<T> _list;
   private Func<T, TResult> _transform;
   private Func<TResult, T> _factory;

   // Constructor that creates a read-only version of the list
   public TransformationList(IList<T> list, 
                             Func<T, TResult> transformation)
      : this(list, transformation, null) {
   }
   // Constructor that creates a writable version of the list
   public TransformationList(IList<T> list, 
                             Func<T, TResult> transformation, 
                             Func<TResult, T> factory) {
      _list = list;
      _transform = transformation;
      _factory = factory;
   }

   // Indexer access
   public TResult this[int index] {
      get { return _transform(_list[index]); }
      set {
         EnsureWritable();
         _list[index] = _factory(value);
      }
   }

   // Count property works like a proxy
   public int Count { get { return _list.Count; } }

   // The list is read-only if no factory method provided
   public bool IsReadOnly { get { return _factory == null; } }

   // Ensures that the list is writable and uses the factory method to create a new item
   public void Insert(int index, TResult item) {
      EnsureWritable();
      _list.Insert(index, _factory(item));
   }

   // Read-only method uses the transformation method
   public bool Contains(TResult item) {
      return _list.Where(i => item.Equals(_transform(i))).Any();
   }

   // ensure that the list is writable
   private void EnsureWritable() {
      if (IsReadOnly)
         throw new InvalidOperationException("List is read only");
   }

   // and so forth...
}

The second constructor, which takes a second delegate as a factory method, makes the list writable and enables us to add new objects from outside without knowing that another, hidden object is materialized inside our transformation list.

This (reusable!) class enables us to expose our article's categories through a nice IList<Category> property.
public partial class Article {
   private IList<Category> _categories;

   public IList<Category> Categories {
      get {
         if (_categories == null)
            _categories = 
               new TransformationList<ArticleCategory, Category>(
                     ArticleCategories, 
                     (acl) => acl.Category,
                     (c) => CreateCategoryLink(c));
         return _categories;
      }
      set { _categories = value; }
   }

   // Creates the link object only. The TransformationList adds the result to
   // ArticleCategories itself, so adding it here would insert it twice.
   private ArticleCategory CreateCategoryLink(Category category) {
      ArticleCategory acl = new ArticleCategory();
      acl.CreationDate = DateTime.Now;
      acl.Article = this;
      acl.Category = category;
      return acl;
   }
}
// =========================================
// sample usage
Article article = GetArticle();

foreach (var category in article.Categories) {
   // process categories
}

article.Categories.Add(GetCategory());

Conclusion

Simple many to many relations are usually easy to work with, but even if an O/R-Mapper shows some weakness in its mapping features, we are still able to provide a reasonable interface to clients of our business layer and its domain objects.

Outlook

In the next part of this series we will look at version controlled data, the challenges they can cause, and ways to handle them.

Tuesday, September 6, 2011

Tables Are No Domain Objects: Table Relation Transformations Part 1

This is the second part of a blog series 'Tables Are No Domain Objects'. In this post we will discuss where database relations are good candidates to be accessed in a different manner when they are represented by domain objects of our business layer.

The most obvious kind of database table relation is a one (A) to many (B) relation where rows in table B hold a foreign key column that points to a unique key (usually the primary key) of table A. Most data access layers, based on an O/R-Mapper or custom mappings, do a good job of mapping this kind of database relation to objects, but there are some cases where it can make sense to transform those relations into a different structure or provide a different access than given by our database.

Foreign Key Fields And Transparent Database Relations

Before we step into more specific types of relations, there is one very basic thing each of us should think about when starting to design a new business layer. In a database, relations are always represented by foreign keys, but when data are loaded into an object structure we can use object references, so we don't really need those foreign key fields as part of our objects. For instance, a sales order line object does not need to hold the ID of its parent sales order; it can hold an object reference to the sales order. One good reason to keep foreign key fields present in our domain objects is to have some additional logging and debugging information. However, we should never implement business logic on those fields; instead, all business logic should be implemented on the corresponding object references. (Very rare exceptions prove the rule, though.)

This was already discussed in the previous blog post but should be recalled for the sake of completeness. O/R-Mappers like Entity Framework or NHibernate provide a powerful query interface to access data, but using those queries in our business or presentation layers causes a tight coupling between our source code and the database structure. Apart from other issues discussed in the other post, queries like this can become an issue if we ever need to refactor our database structure or domain objects.

var orders = from o in efContext.Orders
             where o.CustomerId == currentCustomer.Id
             select o;

foreach (var order in orders) {
   // process order
}
Instead of this, it is usually much safer to provide strongly typed access methods from our data access layer.
var orders = myDataContext.Orders.GetForCustomer(currentCustomer);

foreach (var order in orders) {
   // process order
}
Please read the previous post (Tables Are No Domain Objects Part 1) to see further issues, especially when using Entity Framework.

Aggregations

In general I'm not as restrictive as other architects, who say it is always a bad solution to access any related objects of a current object reference, but when it comes to aggregations it can sometimes be dangerous to be done from outside of the class that holds the objects to be aggregated.

One of the most common examples of an aggregation that we should consider encapsulating is the calculation of a sales order's price based on the prices of its line items.

SalesOrder order = GetOrder();
decimal orderPrice = 
   order.SalesOrderLines.Sum(line => line.ArticlePrice * line.ItemCount);
At the very beginning of a new system this could work pretty nicely. The problem is, what if the calculation of the sales order's price ever changes? Salespeople are creative in finding new ways to sell the company's products, and usually it is only a matter of time until discount features are required. Discounts can be a special offer for specific articles or article categories, a graduated discount depending on the order's overall price, or many other types. Now we can run into trouble if we calculate a sales order's price from outside. A better solution is to put the aggregation into the sales order class.

public partial class SalesOrder {
   public decimal GetPrice() {
      return SalesOrderLines.Sum(line => line.ArticlePrice * line.ItemCount);
   }
}
For now we have only encapsulated the calculation that we did from outside (which already avoids duplicating the logic that multiplies with the sales line's item count), but when it comes to discounting we don't need to scan our whole source code to find all places where an order's price is calculated. We only have to adapt the body of our SalesOrder.GetPrice method, and the rest of the system doesn't even notice the new calculation.
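To show what such a later change could look like, here is a sketch with a made-up discount rule (5% off once the order total reaches 1000). The rule, the constants, and the simplified classes are purely illustrative; the important part is that only the body of GetPrice changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SalesOrderLine {
   public decimal ArticlePrice { get; set; }
   public int ItemCount { get; set; }
}

public class SalesOrder {
   // Hypothetical rule: 5% off once the order total reaches 1000.
   private const decimal DiscountThreshold = 1000m;
   private const decimal DiscountRate = 0.05m;

   public SalesOrder() { SalesOrderLines = new List<SalesOrderLine>(); }

   public List<SalesOrderLine> SalesOrderLines { get; private set; }

   // Callers keep calling GetPrice(); they never see the discount logic.
   public decimal GetPrice() {
      decimal total = SalesOrderLines.Sum(line => line.ArticlePrice * line.ItemCount);
      if (total >= DiscountThreshold)
         total -= total * DiscountRate;
      return total;
   }
}
```

Client code that already calls order.GetPrice() picks up the discount without being touched.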

(The approach to do money calculations with decimal becomes part of a subsequent blog of this series.)

Status Objects

Status objects are special kinds of domain objects that describe the current status of their parent objects. They usually exist in a collaboration of their parent domain object and a description object that describes the current status.


In addition to providing the current status of their parent object, status objects are often used to log the operational history of an object, since they are usually not deleted or updated after their creation. Since each new status object can change the state of its parent, they often drive important rules for the processing of an object.

As an example, say we have a parent SalesOrder domain object that holds a list of SalesOrderStatus objects, each of which is described by a referenced SalesOrderStatusDescription object. Now what if we want to know if an order's current status is "Closed"? Without some design effort we would have to do something like this.
public partial class SalesOrderStatusDescription {
   // status description code constants
   public const string ClosedCode = "Closed";
   // ...
}
// =========================================
// sample usage
SalesOrder order = GetOrder();

bool isClosed = (from status in order.SalesOrderStatus
                 where status.OrderId == order.Id
                 orderby status.CreationDate descending
                 select status.SalesOrderStatusDescription.Code)
                 .First()
                 == SalesOrderStatusDescription.ClosedCode;
Apart from the fact that this causes tight coupling between three different domain objects and their base tables, it would be crap if we always had to do this in upper layers just to get an object's current state.

A first thing we can do to prettify this is to introduce an enum that represents either the possible codes of our SalesOrderStatusDescription objects or the possible foreign key values pointing to the SalesOrderStatusDescription primary keys. Since the first option would require us to always load the descriptions to parse the code fields, we will go with the foreign key solution, which causes less database load. Yes, I know we should try to never base any functionality on foreign key values, but I tend to see this as one of the valid exceptions. Our description IDs are usually immutable, and it does not make a big difference whether our source code is coupled to the Code column of the status description or to its primary key.

public enum SalesOrderStatusCode : int {
   Created = 1,
   Approved = 2,
   Delivered = 3,
   Payed = 4,
   Closed = 5,
}
The next step is to add a new property to our sales order status that represents the value of the enum. Unfortunately, Entity Framework does not provide native support for enums (as of this writing), so we need a workaround that casts the foreign key's value.
public partial class SalesOrderStatus {
   public SalesOrderStatusCode Code {
      get { return (SalesOrderStatusCode)SalesOrderStatusDescriptionId; }
      set { SalesOrderStatusDescriptionId = (int)value; }
   }
}
Okay, now we are able to shorten the previous snippet a little bit, but without one more method we would still need to traverse the list of all existing status objects whenever we want to know the current one. Since the current status of an object is usually needed in many places, we should add a method to our sales order that encapsulates the traversal and returns the code of the current status.

public partial class SalesOrder {
   public SalesOrderStatusCode GetCurrentStatusCode() {
      return (from status in SalesOrderStatus
              orderby status.CreationDate descending
              select status.Code)
              .First();
   }
}
// =========================================
// sample usage
SalesOrder order = GetOrder();
bool isClosed = order.GetCurrentStatusCode() == SalesOrderStatusCode.Closed;
This interface is much niftier and will make life much easier in client code.

As an optional last step, we could add an IsClosed method to our order. I usually do this only for the most important states of an object, though.
public partial class SalesOrder {
   public bool IsClosed() {
      return GetCurrentStatusCode() == SalesOrderStatusCode.Closed;
   }
}
// =========================================
// sample usage
SalesOrder order = GetOrder();
bool isClosed = order.IsClosed();
Now our sales order provides a really handy interface that helps us to concentrate on other things when implementing features that need to work with the sales order status.

Last but not least, we should add a corresponding method to set the new status of an order.
public partial class SalesOrder {
   public void SetStatus(SalesOrderStatusCode code) {
      SalesOrderStatus status = new SalesOrderStatus();
      status.CreationDate = DateTime.Now;
      status.Code = code;
      status.SalesOrder = this;
      SalesOrderStatus.Add(status);
   }
}
There is one line in this method that could cause problems. Setting the CreationDate by using the local host's time is only safe if we are sure that all client PCs are synchronized with the same time server; otherwise the creation dates of new status objects can deviate from each other. Since the creation date is essential for these objects, this could cause issues in production. One thing we can do is to use the time from a central server, like the database server, instead of trusting the clients.
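One way to prepare for that is to hide the source of "now" behind a small abstraction. The IClock interface, the FixedClock class, and the simplified SalesOrder below are hypothetical; a production implementation of IClock might ask the database server for its current time, which is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction: where "now" comes from is decided in one place.
public interface IClock {
   DateTime Now { get; }
}

// Fixed clock for tests; a production clock might query the database server.
public sealed class FixedClock : IClock {
   private readonly DateTime _now;
   public FixedClock(DateTime now) { _now = now; }
   public DateTime Now { get { return _now; } }
}

public enum SalesOrderStatusCode { Created = 1, Approved = 2, Delivered = 3, Payed = 4, Closed = 5 }

public class SalesOrderStatus {
   public DateTime CreationDate { get; set; }
   public SalesOrderStatusCode Code { get; set; }
}

// Simplified sales order that receives its clock from outside.
public class SalesOrder {
   private readonly IClock _clock;

   public SalesOrder(IClock clock) {
      _clock = clock;
      Statuses = new List<SalesOrderStatus>();
   }

   public List<SalesOrderStatus> Statuses { get; private set; }

   // The creation date comes from the injected clock, not from the local host.
   public void SetStatus(SalesOrderStatusCode code) {
      Statuses.Add(new SalesOrderStatus { CreationDate = _clock.Now, Code = code });
   }
}
```

Whether the clock is passed to the entity, as in this sketch, or the date is set by the data access layer when saving is a design decision that depends on the rest of the architecture.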

Since there are usually many more places where we need to know the current status of an object than places where the status is changed, I tend to add fewer strongly typed set methods like SetPayed().

As we have seen, due to their importance and the complicated access, status objects are usually good candidates to be handled in a very different way than they are stored in our database and some architectural effort to get them into a more fashionable, object-oriented structure can be a good investment.

Performance Tuning. Since this series concentrates on designing our domain objects I kept this until now, but our current solution always has to retrieve all existing status objects from the database server, which causes unneeded network traffic and database load. We should consider adding a method to our data access layer that loads only the current status, description ID, or description code instead of loading all status objects, unless they are already loaded anyway. This is another important reason why we should encapsulate the get method, since we then need to change only one place.

Outlook

In the next post we will continue the discussion of table relation transformations.

We will have a look at many to many relations where we might need to handle the weakness of O/R-Mappers.

As the last part of the discussion about table relation transformations we will have a look at versioned data.