Wednesday, September 28, 2011

Domain Objects And Many To Many Relations

Today I want to discuss different kinds of many to many relations and cases where it can make sense to transform them into different structures when they are loaded from a database into the object graph of a business layer. For the sake of simplicity, this post focuses on Microsoft Entity Framework and O/R-Mappers in general. Let me know if you are interested in how to handle many to many relations in native ADO.NET.

This is the third post of a blog series about designing domain objects in a business layer and the second part that deals with the transformation of table structures into object structures. The first post "Tables Are No Domain Objects" gave an introduction to this series and showed some reasons why it can make sense to abstract O/R-Mappers in layers above the data access layer (DAL). In the second part "Tables Are No Domain Objects: Table Relation Transformations Part 1" we discussed foreign key fields, aggregations and status objects.

Basics

A many to many relation is given when two objects are related to each other and each of them can reference more than one object (row in the database) on the other side.

An example of a many to many relation is the relation between articles and their categories. Each category can be related to many articles, like a category "food" that references apples, pies and meat. On the other hand, an article "apple" can be categorized as both "food" and "healthy".

In an object model, a many to many relation is exposed by two domain objects where each contains a collection of objects of the other type. Since databases don't provide complex column types like lists, many to many relations are realized by putting an intermediate link table between the two tables.

Simple Many To Many Relations

A simple many to many relation is given whenever the link table consists of nothing but the foreign keys which point to the rows of the two tables to be related to each other.
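To make the role of the link table concrete, here is a minimal sketch of what a mapper roughly has to do when it materializes a simple many to many relation. The names ArticleCategoryRow and LinkTableMapper are hypothetical and only serve as illustration; a real O/R-Mapper does this internally and in its own way.

```csharp
using System.Collections.Generic;

public class Article {
   public Article() { Categories = new List<Category>(); }
   public int Id { get; set; }
   public string Name { get; set; }
   public IList<Category> Categories { get; private set; }
}

public class Category {
   public Category() { Articles = new List<Article>(); }
   public int Id { get; set; }
   public string Name { get; set; }
   public IList<Article> Articles { get; private set; }
}

// Hypothetical row of a simple link table: nothing but the two foreign keys
public class ArticleCategoryRow {
   public int ArticleId { get; set; }
   public int CategoryId { get; set; }
}

public static class LinkTableMapper {
   // Wires up both sides of the relation from the rows of the link table
   public static void Connect(IDictionary<int, Article> articles,
                              IDictionary<int, Category> categories,
                              IEnumerable<ArticleCategoryRow> links) {
      foreach (ArticleCategoryRow link in links) {
         Article article = articles[link.ArticleId];
         Category category = categories[link.CategoryId];
         article.Categories.Add(category);
         category.Articles.Add(article);
      }
   }
}
```

After the call to Connect, neither side needs to know about the link rows anymore; each object simply exposes a list of the other type.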


When working with native ADO.NET, many to many relations are always a bit tricky but can certainly be handled. I'll focus on O/R-Mappers for now.

When working with a common O/R-Mapper, simple many to many relations are usually transformed automatically by the mapper. The link table stays hidden inside the mapper, and each of our two objects can provide a list of objects of the other type.

public partial class Article {
   public IList<Category> Categories { get; set; }
}

public partial class Category {
   public IList<Article> Articles { get; set; }
}
The ORM knows all values to be inserted into our link table and doesn't need to bother clients of our business layer with this table. If you are at the beginning of a project and your O/R-Mapper does not support simple many to many relations, I'd suggest considering another mapper.

Complex Many To Many Relations

A complex many to many relation is given when the link table contains additional columns besides the foreign keys that point to the base tables of our domain objects.


With such a link table, for example one that holds an additional creation date column, an O/R-Mapper like Entity Framework runs into trouble. It is unable to fill the creation date column without an intermediate domain object that does nothing but hold the additional column. Our two domain objects will look like this.

public partial class Article {
   public List<ArticleCategory> ArticleCategories { get; set; }
}

public partial class Category {
   public List<ArticleCategory> ArticleCategories { get; set; }
}
This might be fine for EF, but usually that's not how we want to work with our objects in the main part of our system. Often columns like a creation date are only used for support or reporting purposes, and we don't want to think about the odd ArticleCategory object when adding new operational features.

Without some refinement of our domain objects, we will be forced to implement every access to an article's categories like this.
Article article = GetArticle();
var categories = from acl in article.ArticleCategories
                 select acl.Category;

// process the article and its categories
It is not only unnatural to always go through the intermediate object to get what we are really looking for; it also causes a tight coupling between our domain objects and the underlying database table structure. The worst case would be that we started with a simple many to many relation between articles and categories and a new requirement introduces the creation date column - and with it the ArticleCategory object. Without some architectural effort we might have to refactor larger parts of our existing source code. Luckily, there are a few things we can do.

The easiest way to hide the relation object is to define the ArticleCategories property as private and provide a few methods that let us work directly with the referenced entities.
public partial class Article {
   public IEnumerable<Category> GetCategories() {
      return ArticleCategories.Select(acl => acl.Category);
   }

   public void AddCategory(Category category) {
      ArticleCategory categoryLink = new ArticleCategory();
      categoryLink.CreationDate = DateTime.Now;
      categoryLink.Article = this;
      categoryLink.Category = category;
      ArticleCategories.Add(categoryLink);
   }

   public void RemoveCategory(Category category) {
      var categoryLink = ArticleCategories.Where(
                           item => category.Equals(item.Category)).FirstOrDefault();
      if (categoryLink != null)
         ArticleCategories.Remove(categoryLink);
   }
}
// =========================================
// sample usage
Article article = GetArticle();

var categories = article.GetCategories();
// process categories

article.AddCategory(GetCategory());

Apart from providing a more natural access to our categories, this also gives us an architecture that is more robust against possible future changes - like additional fields in our link table.

If we want to go one step further, we can provide an even more sophisticated interface to access our (indirectly) referenced domain objects. Unfortunately, we cannot use a simple List<T> and copy all categories into it, because our ArticleCategories list would not be affected by any add/remove calls on the copy. For the same reason it is impossible to use a simple LINQ query that transforms the ArticleCategory objects into categories.
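The problem with a copied list can be shown in a few lines. This is only a sketch with stripped-down stand-ins for Category and ArticleCategory; the helper class CopyProblemDemo is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Category {
   public string Name { get; set; }
}

public class ArticleCategory {
   public Category Category { get; set; }
}

public static class CopyProblemDemo {
   // Adds a category to a copy of the categories and returns the number
   // of link objects afterwards. The link list does not notice the add.
   public static int AddToCopy(List<ArticleCategory> links, Category category) {
      List<Category> copy = links.Select(acl => acl.Category).ToList();
      copy.Add(category);   // changes only the copy
      return links.Count;   // still the original number of links
   }
}
```

Whatever is added to the copy never reaches the link list, so nothing would be written to the link table when the changes are saved.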

However, what we can do is implement a custom IList<T> that transforms a list of objects of one type into other objects by utilizing a provided delegate. In our case we need to transform a list of ArticleCategory objects into categories.

The following snippet shows how such a list could work.

public class TransformationList<T, TResult> : IList<TResult> {
   private IList<T> _list;
   private Func<T, TResult> _transform;
   private Func<TResult, T> _factory;

   // Constructor that creates a read-only version of the list
   public TransformationList(IList<T> list, 
                             Func<T, TResult> transformation)
      : this(list, transformation, null) {
   }
   // Constructor that creates a writable version of the list
   public TransformationList(IList<T> list, 
                             Func<T, TResult> transformation, 
                             Func<TResult, T> factory) {
      _list = list;
      _transform = transformation;
      _factory = factory;
   }

   // Indexer access
   public TResult this[int index] {
      get { return _transform(_list[index]); }
      set {
         EnsureWritable();
         _list[index] = _factory(value);
      }
   }

   // Count property works like a proxy
   public int Count { get { return _list.Count; } }

   // The list is read-only if no factory method provided
   public bool IsReadOnly { get { return _factory == null; } }

   // Ensures that the list is writable and uses the factory method to create a new item
   public void Insert(int index, TResult item) {
      EnsureWritable();
      _list.Insert(index, _factory(item));
   }

   // Read-only method uses the transformation method
   public bool Contains(TResult item) {
      return _list.Where(i => item.Equals(_transform(i))).Any();
   }

   // ensure that the list is writable
   private void EnsureWritable() {
      if (IsReadOnly)
         throw new InvalidOperationException("List is read only");
   }

   // and so forth...
}

The second constructor, which takes a second delegate as a factory method, makes the list writable and enables us to add new objects from outside without knowing that another, hidden object is materialized inside our transformation list.
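For reference, here is one possible way to fill in the members that the snippet leaves out. It is a sketch, not the definitive implementation: the members not shown in the snippet (Add, Remove, IndexOf and so on) and the use of EqualityComparer<TResult>.Default are assumptions. The list is treated as read-only when no factory was provided.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class TransformationList<T, TResult> : IList<TResult> {
   private readonly IList<T> _list;
   private readonly Func<T, TResult> _transform;
   private readonly Func<TResult, T> _factory;

   // Read-only version of the list
   public TransformationList(IList<T> list, Func<T, TResult> transformation)
      : this(list, transformation, null) {
   }

   // Writable version of the list
   public TransformationList(IList<T> list, Func<T, TResult> transformation,
                             Func<TResult, T> factory) {
      _list = list;
      _transform = transformation;
      _factory = factory;
   }

   public TResult this[int index] {
      get { return _transform(_list[index]); }
      set { EnsureWritable(); _list[index] = _factory(value); }
   }

   public int Count { get { return _list.Count; } }

   // Read-only if no factory method was provided
   public bool IsReadOnly { get { return _factory == null; } }

   public void Add(TResult item) {
      EnsureWritable();
      _list.Add(_factory(item));
   }

   public void Insert(int index, TResult item) {
      EnsureWritable();
      _list.Insert(index, _factory(item));
   }

   public void Clear() {
      EnsureWritable();
      _list.Clear();
   }

   // Compares the transformed items, not the hidden source items
   public int IndexOf(TResult item) {
      var comparer = EqualityComparer<TResult>.Default;
      for (int i = 0; i < _list.Count; i++)
         if (comparer.Equals(item, _transform(_list[i])))
            return i;
      return -1;
   }

   public bool Contains(TResult item) {
      return IndexOf(item) >= 0;
   }

   public bool Remove(TResult item) {
      EnsureWritable();
      int index = IndexOf(item);
      if (index < 0)
         return false;
      _list.RemoveAt(index);
      return true;
   }

   public void RemoveAt(int index) {
      EnsureWritable();
      _list.RemoveAt(index);
   }

   public void CopyTo(TResult[] array, int arrayIndex) {
      foreach (TResult item in this)
         array[arrayIndex++] = item;
   }

   public IEnumerator<TResult> GetEnumerator() {
      return _list.Select(_transform).GetEnumerator();
   }

   IEnumerator IEnumerable.GetEnumerator() {
      return GetEnumerator();
   }

   private void EnsureWritable() {
      if (IsReadOnly)
         throw new InvalidOperationException("List is read only");
   }
}
```

Note that the list itself adds the object returned by the factory to the wrapped list, so the factory should only create the hidden object and not add it anywhere on its own.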

This (reusable!) class enables us to expose our article's categories through a nice IList<Category> property.
public partial class Article {
   private IList<Category> _categories;

   public IList<Category> Categories {
      get {
         if (_categories == null)
            _categories = 
               new TransformationList<ArticleCategory, Category>(
                     ArticleCategories, 
                     (acl) => acl.Category,
                     (c) => CreateCategoryLink(c));
         return _categories;
      }
      set { _categories = value; }
   }

   // Only creates the link object. The transformation list adds the
   // returned object to ArticleCategories itself, so adding it here
   // as well would store every link twice.
   private ArticleCategory CreateCategoryLink(Category category) {
      ArticleCategory acl = new ArticleCategory();
      acl.CreationDate = DateTime.Now;
      acl.Article = this;
      acl.Category = category;
      return acl;
   }
}
// =========================================
// sample usage
Article article = GetArticle();

foreach (var category in article.Categories) {
   // process categories
}

article.Categories.Add(GetCategory());

Conclusion

Simple many to many relations are usually easy to work with, but even if an O/R-Mapper shows some weakness in its mapping features, we are still able to provide a reasonable interface to clients of our business layer and its domain objects.

Outlook

In the next part of this series we will look at version controlled data, what challenges they can cause and ways to handle them.

Tuesday, September 6, 2011

Tables Are No Domain Objects: Table Relation Transformations Part 1

This is the second part of the blog series 'Tables Are No Domain Objects'. In this post we will discuss which database relations are good candidates to be accessed in a different manner when they are represented by domain objects of our business layer.

The most obvious kind of database table relation is a one (A) to many (B) relation where rows in table B hold a foreign key column that points to a unique key (usually the primary key) of table A. Most data access layers, whether based on an O/R-Mapper or on custom mappings, do a good job of mapping this kind of database relation into objects, but there are some cases where it can make sense to transform those relations into a different structure or provide a different access than the one given by our database.

Foreign Key Fields And Transparent Database Relations

Before we step into more specific types of relations, there is one very basic thing each of us should think about when starting to design a new business layer. In a database, relations are always represented by foreign keys, but when data are loaded into an object structure we can use object references, so we don't really need those foreign key fields as part of our objects. For instance, a sales order line object does not need to hold the ID of its parent sales order; it can hold an object reference to the sales order. One good reason to keep foreign key fields present in our domain objects is to have some additional logging and debugging information. However, we should never implement any business logic on those fields; instead, all business logic should always be implemented on the corresponding object references. (Very rare exceptions prove the rule though.)

This was already discussed in the previous blog post but should be recalled for the sake of completeness. O/R-Mappers like Entity Framework or NHibernate provide a powerful query interface to access data, but using those queries in our business layer or presentation layers will cause a tight coupling between our source code and the database structure. Apart from other issues discussed in the other post, queries like this can become an issue if we ever need to refactor our database structure or domain objects.

var orders = from o in efContext.Orders
             where o.CustomerId == currentCustomer.Id
             select o;

foreach (var order in orders) {
   // process order
}
Instead, it is usually much safer to provide strongly typed access methods from our data access layer.
var orders = myDataContext.Orders.GetForCustomer(currentCustomer);

foreach (var order in orders) {
   // process order
}
Please read the previous post (Tables Are No Domain Objects Part 1) to see further issues, especially when using Entity Framework.

Aggregations

In general I'm not as restrictive as other architects, who say it is always a bad solution to access any related objects of a current object reference, but when it comes to aggregations it can sometimes be dangerous to calculate them from outside of the class that holds the objects to be aggregated.

One of the most common examples of an aggregation that we should consider encapsulating is the calculation of the price of a sales order, which is based on the prices of its line items.

SalesOrder order = GetOrder();
decimal orderPrice = 
   order.SalesOrderLines.Sum(line => line.ArticlePrice * line.ItemCount);
At the very beginning of a new system this could work pretty nicely. The problem is, what if the calculation of the sales order's price ever changes? Salespeople are creative in finding new ways to sell the company's products, and usually it is only a matter of time until discount features are required. Discounts can be a special offer for specific articles or article categories, a graduated discount depending on the order's total price, or many other types. Now we can run into trouble if we calculate a sales order's price from outside. A better solution is to put the aggregation into the sales order class.

public partial class SalesOrder {
   public decimal GetPrice() {
      return SalesOrderLines.Sum(line => line.ArticlePrice * line.ItemCount);
   }
}
For now we have only encapsulated the calculation that we previously did from outside (which already avoids duplicating the logic that multiplies the price with the sales line's item count), but when it comes to discounting we don't need to scan our whole source code to find all places where an order's price is calculated. We only have to adapt the body of our SalesOrder.GetPrice method, and the rest of the system doesn't even notice the new calculation.
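As a sketch of how such a change stays local, assume a simple percentage discount on the whole order. The DiscountPercent property is purely hypothetical and not part of the model discussed here; the classes are stripped-down stand-ins.

```csharp
using System.Collections.Generic;
using System.Linq;

public class SalesOrderLine {
   public decimal ArticlePrice { get; set; }
   public int ItemCount { get; set; }
}

public class SalesOrder {
   public SalesOrder() { SalesOrderLines = new List<SalesOrderLine>(); }

   public IList<SalesOrderLine> SalesOrderLines { get; private set; }

   // Hypothetical discount in percent (0 to 100) on the whole order
   public decimal DiscountPercent { get; set; }

   // The only place that knows how an order's price is calculated
   public decimal GetPrice() {
      decimal gross = SalesOrderLines.Sum(line => line.ArticlePrice * line.ItemCount);
      return gross * (100m - DiscountPercent) / 100m;
   }
}
```

Client code keeps calling order.GetPrice() and does not need to change when the discount is introduced.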

(The approach of doing money calculations with decimal will be covered in a subsequent post of this series.)

Status Objects

Status objects are special kinds of domain objects that describe the current status of their parent objects. They usually collaborate with their parent domain object and a description object that describes the current status.


In addition to providing the current status of their parent object, status objects are often used to log an operational history of an object, since they are usually not deleted or updated after their creation. Since each new status object can change the state of its parent, they are often very important for the rules that govern the processing of an object.

As an example, say we have a parent SalesOrder domain object that holds a list of SalesOrderStatus objects, where each status is described by a referenced SalesOrderStatusDescription object. Now what if we want to know whether an order's current status is "Closed"? Without some design effort we would have to do something like this.
public partial class SalesOrderStatusDescription {
   // status description code constants
   public const string ClosedCode = "Closed";
   // ...
}
// =========================================
// sample usage
SalesOrder order = GetOrder();

bool isClosed = (from status in order.SalesOrderStatus
                 where status.OrderId == order.Id
                 orderby status.CreationDate descending
                 select status.SalesOrderStatusDescription.Code)
                 .First()
                 == SalesOrderStatusDescription.ClosedCode;
Apart from the fact that this causes a tight coupling between three different domain objects and their base tables, it would be a real pain if we always had to do this in upper layers just to get an object's current state.

A first thing we can do to prettify this is to introduce an enum that represents either the possible codes of our SalesOrderStatusDescription objects or the possible foreign key values pointing to the SalesOrderStatusDescription primary keys. Since the first option would require us to always load the descriptions to parse the code fields, we will go with the foreign key solution, which causes less database load. Yes, I know we should try to never base any functionality on foreign key values, but I tend to see this as one of the valid exceptions. Our description IDs are usually immutable, and it does not make a big difference whether our source code is coupled to the Code column of the status description or to its primary key.

public enum SalesOrderStatusCode : int {
   Created = 1,
   Approved = 2,
   Delivered = 3,
   Payed = 4,
   Closed = 5,
}
The next step is to add a new property to our sales order status that represents the value of the enum. Unfortunately, Entity Framework (at the time of writing) does not provide native support for enums, so we need a workaround that casts the foreign key's value.
public partial class SalesOrderStatus {
   public SalesOrderStatusCode Code {
      get { return (SalesOrderStatusCode)SalesOrderStatusDescriptionId; }
      set { SalesOrderStatusDescriptionId = (int)value; }
   }
}
Okay, now we are able to shorten the previous snippet a little bit, but without one more method we would still need to traverse the list of all existing status objects whenever we want to know the current one. Since the current status of an object is usually a widely needed piece of information, we should add a method to our sales order that encapsulates the traversal and returns the code of the current status.

public partial class SalesOrder {
   public SalesOrderStatusCode GetCurrentStatusCode() {
      return (from status in SalesOrderStatus
              orderby status.CreationDate descending
              select status.Code)
              .First();
   }
}
// =========================================
// sample usage
SalesOrder order = GetOrder();
bool isClosed = order.GetCurrentStatusCode() == SalesOrderStatusCode.Closed;
This interface is much niftier and will make life much easier in client code.

As an optional last step we could add an IsClosed method to our order. I usually do this only for the most important states of an object though.
public partial class SalesOrder {
   public bool IsClosed() {
      return GetCurrentStatusCode() == SalesOrderStatusCode.Closed;
   }
}
// =========================================
// sample usage
SalesOrder order = GetOrder();
bool isClosed = order.IsClosed();
Now our sales order provides a really handy interface that helps us to concentrate on other things when implementing features that need to work with the sales order status.

Last but not least, we should add a corresponding method to set the new status of an order.
public partial class SalesOrder {
   public void SetStatus(SalesOrderStatusCode code) {
      SalesOrderStatus status = new SalesOrderStatus();
      status.CreationDate = DateTime.Now;
      status.Code = code;
      status.SalesOrder = this;
      SalesOrderStatus.Add(status);
   }
}
There is one line in this method that could cause problems. Setting the CreationDate from the local host's time is only safe if we are sure that all client PCs are synchronized with the same time server; otherwise the creation dates of new status objects can deviate from each other. Since the creation date is essential for these objects, this could cause issues in production. One thing we can do is use the time from a central server, like the database server, instead of trusting the clients.
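One way to keep the time source exchangeable is a small clock abstraction. This is a sketch under assumptions: IClock, LocalClock and DelegateClock are hypothetical names, the classes are stripped-down stand-ins, and how the delegate actually obtains the server time (for example by a query against the database server) is left open.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the source of the current time
public interface IClock {
   DateTime Now { get; }
}

// Trusts the local machine
public class LocalClock : IClock {
   public DateTime Now { get { return DateTime.Now; } }
}

// Asks a delegate for the time, e.g. one that queries a central server
public class DelegateClock : IClock {
   private readonly Func<DateTime> _getTime;
   public DelegateClock(Func<DateTime> getTime) { _getTime = getTime; }
   public DateTime Now { get { return _getTime(); } }
}

public enum SalesOrderStatusCode {
   Created = 1, Approved = 2, Delivered = 3, Payed = 4, Closed = 5
}

public class SalesOrderStatus {
   public DateTime CreationDate { get; set; }
   public SalesOrderStatusCode Code { get; set; }
}

public class SalesOrder {
   private readonly IClock _clock;

   public SalesOrder(IClock clock) {
      _clock = clock;
      SalesOrderStatus = new List<SalesOrderStatus>();
   }

   public IList<SalesOrderStatus> SalesOrderStatus { get; private set; }

   public void SetStatus(SalesOrderStatusCode code) {
      SalesOrderStatus status = new SalesOrderStatus();
      status.CreationDate = _clock.Now;   // no direct use of DateTime.Now
      status.Code = code;
      SalesOrderStatus.Add(status);
   }
}
```

A side effect of this design is that tests can pass in a clock with a fixed time.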

Since there are usually many more places where we need to know the current status of an object than places where the status is changed, I tend to add fewer strongly typed set methods like SetPayed().

As we have seen, due to their importance and the complicated access, status objects are usually good candidates to be handled in a very different way than they are stored in our database and some architectural effort to get them into a more fashionable, object-oriented structure can be a good investment.

Performance Tuning. Since this series concentrates on designing our domain objects, I have kept this for last, but our current solution always has to retrieve all existing status objects from the database server, which causes unneeded network traffic and database load. We should consider adding a method to our data access layer that loads only the current status, description ID or description code, instead of loading all status objects, unless they are already loaded anyway. This is another important reason why we should encapsulate the get method, since we then need to change only one place.
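Such a data access method could look roughly like the following sketch. It assumes the data access layer can query the status rows through an IQueryable<T>; SalesOrderStatusRow and SalesOrderStatusRepository are hypothetical names. With an in-memory list the query simply runs locally, while a LINQ provider should usually be able to translate it into a single statement that returns one value.

```csharp
using System;
using System.Linq;

// Hypothetical flat representation of one row of the status table
public class SalesOrderStatusRow {
   public int SalesOrderId { get; set; }
   public int SalesOrderStatusDescriptionId { get; set; }
   public DateTime CreationDate { get; set; }
}

public class SalesOrderStatusRepository {
   private readonly IQueryable<SalesOrderStatusRow> _statusRows;

   public SalesOrderStatusRepository(IQueryable<SalesOrderStatusRow> statusRows) {
      _statusRows = statusRows;
   }

   // Returns only the description ID of the newest status of one order
   // instead of materializing the whole status history.
   public int GetCurrentStatusDescriptionId(int salesOrderId) {
      return _statusRows
         .Where(s => s.SalesOrderId == salesOrderId)
         .OrderByDescending(s => s.CreationDate)
         .Select(s => s.SalesOrderStatusDescriptionId)
         .First();
   }
}
```

The returned ID can then be cast to the SalesOrderStatusCode enum in the same way as in the Code property shown earlier.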

Outlook

In the next post we will continue the discussion of table relation transformations.

We will have a look at many to many relations where we might need to handle the weakness of O/R-Mappers.

As last part of the discussion about table relation transformations we will have a look at versioned data.

Thursday, September 1, 2011

Tables Are No Domain Objects Part 1

This is the first part of a blog series where I want to discuss ways of how to design rich domain objects provided by the Business Logic Layer of a software system.

In most business applications we have to work with data stored in a database or provided by web services. For the sake of simplicity, we will focus on database based storage. However, when working with a thin web service layer that provides its data almost (or completely) in a database table style, some of the things we'll discuss here might still fit.

Today we are armed with shiny O/R-Mappers like NHibernate, Active Record or Microsoft Entity Framework. Some of those even provide a nice wizard that only needs a database connection to create a full set of domain objects. It seems like we can immediately start to create our first front end dialog/view/page and use those classes.

Cool, this "software architecture" and layering people are talking about is a wizard. Now we've got it. Thanks for participating in the discussion, we are finished. Or, probably not... ;-)

Sounds funny? Unfortunately, quite a few applications are based on this kind of "software architecture" and "business layers". In smaller projects this might even work, but if a system becomes more complex, say with some thousands of lines of code (how about millions of lines?), this approach can quickly become a dead end. A table-like domain object is often not ideally structured when we need to implement our business logic or display it in a rich presentation layer.

Sure, we cannot cover everything that might need to be considered when designing domain objects and structuring a business layer in a blog series, so we will concentrate on a few topics that I find more important and discuss some examples for each of them. At the current stage of the software industry I don't think that anybody is at a point where (s)he knows everything about architecting a business layer - or any other layer. If we look at the last 10 years of the software industry we see way too many changes of mind. Every new pattern, technology or practice is preached as the golden calf. I'd say, as long as there is not a full decade without big changes in architectural and technical approaches, we will not know that we have reached the end - and I guess we are far away from the beginning of that decade.

With this knowledge about imperfection, we should still always do our best to design a powerful and usable business layer. The business layer is the heart of every application. It should encapsulate as much business logic as possible to avoid duplication of source code. After this encapsulation of logic, the next goal is to provide this logic through an easy to use interface for those (us?) who will consume it in a front end application like a Windows UI or web UI, but also in web or Windows services.

Here is a short overview of the topics we will cover in this series.

Before we dig deeper into the design of domain objects, we will start in this post with a look at O/R-Mappers, if actually used, and reasons why we should consider keeping them inaccessible to layers other than our Data Access Layer.

In a following post we will talk about Table Relations and Class References. We will look at database relations like aggregates or status objects that might be candidates to be represented in a different manner when they are loaded into a domain object hierarchy.

When talking about Field Aggregations I will try to show groups of database columns that it can make sense to arrange in referenced helper classes or structs instead of keeping them directly in our entities.

We will discuss Data Type Transformations and see some cases where it can be helpful to transform data received from a database into a different type, or where one database column contains structural information.

In the last part of this series we will talk about Reusable Entities. There are some types of domain objects that appear in many projects. Some of them are good candidates to be implemented in base libraries and reused in other projects; other frequently appearing entities are harder or (almost) impossible to reuse.

How to Safely Use an O/R-Mapper

I know, this topic is not directly related to designing domain objects. However, since O/R-Mappers are often used as the central component to access an underlying database, it makes sense to me to start with this topic.

We should always consider abstracting the native instance of the O/R-Mapper, and I will try to show a few reasons here. The first thing, when talking about designing domain objects, is that we might run into architectural restrictions caused by restrictions of the ORM. If we use the mapper class directly in our whole system, we might not be able to design our objects as we would like to.

Apart from the architectural reasons, we should look a little closer at some other reasons why we should consider avoiding a tight coupling with the O/R-Mapper.

(Examples and issues here are based on Microsoft Entity Framework. Some of them will also apply to other ORMs and others might not.)

Let's do a jump start with a very simple domain model that contains only one entity SalesOrder.


Here is a very simple example of using Microsoft Entity Framework with this domain model; it already contains a few pitfalls.
using (var ctx = new Model1Container()) {
   DateTime since = DateTime.Today.AddMonths(-1);
   var orders = from o in ctx.SalesOrders
                where o.CreationDate >= since
                select o;

   foreach (var order in orders) {
      // set approved
   }

   // do something else

   foreach (var order in orders) {
      // create invoices
   }
}

The worst issue in this piece of code is one that is specifically related to the architecture of Entity Framework. The LINQ query used to get the sales orders does not actually return a materialized set of objects; instead it only returns an EF implementation of IQueryable<SalesOrder>. This IQueryable can be seen as an object version of a SQL statement: whenever we start to traverse the result, we fire another database query. This causes unnecessary database load and (worse) can produce different results for each loop. If a new order is created between the first and the second foreach loop, we will get this order in our second loop and the order will be invoiced before it was approved. A simple solution is to put the LINQ query into parentheses and call the ToList() extension method on it. This copies the results into a List<T>, and all list operations work on this in-memory copy from then on. The problem is, if we ever forget the call to ToList() we will run into this trap again.
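The effect can be reproduced without a database. The following sketch uses LINQ to Objects over a plain list as a stand-in for the EF query (an assumption made only to keep the example self-contained); the query is re-evaluated on every traversal in the same way, while the ToList() snapshot stays stable. DeferredExecutionDemo is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo {
   // Returns the number of orders seen by the first traversal, by the
   // second traversal and by the snapshot taken before both.
   public static int[] Run() {
      var creationDates = new List<DateTime> { new DateTime(2011, 8, 20) };
      DateTime since = new DateTime(2011, 8, 1);

      // Deferred: this only describes the query, nothing is evaluated yet
      IEnumerable<DateTime> query = creationDates.Where(d => d >= since);

      // Snapshot: evaluates the query once and copies the result
      List<DateTime> snapshot = query.ToList();

      int firstLoop = query.Count();

      // A new order is created between the two traversals
      creationDates.Add(new DateTime(2011, 8, 25));

      int secondLoop = query.Count();

      return new[] { firstLoop, secondLoop, snapshot.Count };
   }
}
```

Run() returns 1, 2 and 1: the second traversal of the query sees the new order, the snapshot does not.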

A much safer way to use an O/R-Mapper is to create our own data storage class that wraps the EF container and provides custom methods for the needed data. At this point we stand at a crossroads where we have to decide between implementing a Gateway (Martin Fowler, Patterns of Enterprise Application Architecture) that provides all data access methods directly from our data storage class, or implementing Repositories (Martin Fowler, PoEAA) for each of our domain object types. In smaller projects a Gateway is usually the better solution because it is easier to implement. The drawback of this pattern is that a Gateway class can become large when there are dozens or hundreds of access methods. The Repository approach needs more effort to set up, since we need one repository for each type of domain object, but it keeps classes better arranged. Due to our large domain model I decided to use a Repository.
// Custom data store class that wraps the EF container
public class DataStore : IDisposable {
   private Model1Container _efContainer;
   private SalesOrderRepository _salesOrders;
      
   public DataStore() {
      _efContainer = new Model1Container();
   }

   public SalesOrderRepository SalesOrders {
      get { return _salesOrders 
                   ?? (_salesOrders = new SalesOrderRepository(_efContainer)); }
   }

   public void Dispose() {
      _efContainer.Dispose();
   }
}
// ========================================================================
// custom repository to provide access to sales orders
// (public, because it is exposed by the public DataStore.SalesOrders property)
public class SalesOrderRepository {
   Model1Container _efContainer;

   internal SalesOrderRepository(Model1Container efContainer) {
      _efContainer = efContainer;
   }

   public IEnumerable<SalesOrder> GetCreatedSince(DateTime date) {
      return (from o in _efContainer.SalesOrders
               where o.CreationDate >= date
               select o).ToList();
   }
}
// ========================================================================
// sample usage
using (var data = new DataStore()) {
   DateTime since = DateTime.Today.AddMonths(-1);
   var orders = data.SalesOrders.GetCreatedSince(since);

   foreach (var order in orders) {
      // set approved
   }

   // will always be the same results
   foreach (var order in orders) {
      // create invoices
   }
}

Now we need to remember only once that we have to add this odd ToList() call to avoid the previously described issues. Unfortunately, this was only one, EF related, issue of directly accessing an O/R-Mapper from higher layers.

Another issue is the SalesOrder.IsCanceled property. In most parts of our software we might not want or need to work with canceled orders. Similar fields in other objects could be IsActive, IsDeleted and so forth. By encapsulating our O/R-Mapper we only need to change a few access methods to eliminate unwanted access to canceled orders. Sure, there are still parts of the system that might want to work with those orders, but we should consider providing them through more descriptive methods like GetCreatedWithCanceledSince or simply GetCanceledSince, depending on what is required.
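A sketch of such access methods, assuming the repository can query the orders through an IQueryable<T>; the constructor parameter replaces the EF container here only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SalesOrder {
   public DateTime CreationDate { get; set; }
   public bool IsCanceled { get; set; }
}

public class SalesOrderRepository {
   private readonly IQueryable<SalesOrder> _salesOrders;

   public SalesOrderRepository(IQueryable<SalesOrder> salesOrders) {
      _salesOrders = salesOrders;
   }

   // Default access: canceled orders are filtered out in one place
   public IEnumerable<SalesOrder> GetCreatedSince(DateTime date) {
      return _salesOrders
         .Where(o => o.CreationDate >= date && !o.IsCanceled)
         .ToList();
   }

   // Explicit access for the few callers that need canceled orders
   public IEnumerable<SalesOrder> GetCanceledSince(DateTime date) {
      return _salesOrders
         .Where(o => o.CreationDate >= date && o.IsCanceled)
         .ToList();
   }
}
```

Callers of GetCreatedSince never see a canceled order and do not have to remember the filter themselves.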

A non-abstracted O/R-Mapper access usually also means a tight coupling between our database and our source code. If we ever need to change the structure of tables, we can only hope that our ORM is able to handle the new structure without changing the domain model and, if it is able to handle the new mapping, that this still gives us the advantage we were aiming for with our database changes. There can be database changes that most O/R-Mappers will most likely be unable to handle. Say we need to introduce some EAV/CR tables to move some rarely needed columns out of our main tables. When using a custom data store class, we can relatively easily adopt an internal hybrid that keeps using the ORM for most data while doing other mappings with native ADO.NET features.

What if not all data we need to work with come from our own database? I've just seen a thread in the MSDN Software Architecture Forums, "Centralizing Duplicate Functionality and Data", where employee data had to be taken from one single source for all software systems of the company. Most O/R-Mappers support only one database at a time, but we can hold several different instances of them when working with a wrapping data storage class. Even if some of the data we work with are not provided by a database but, say, by a web service, we can still provide one homogeneous source of our data to the other layers.

As a last reason for now to abstract an O/R-Mapper, we should keep the possibility of horizontal scaling in mind. With one database server we have only one option to get better performance from it: buying a bigger box. However, there is a point where vertical scaling reaches its limit. Most software systems consist of, say, 80% read and only 20% write operations. With a strong data access layer we are able to set up several read-only database servers to handle all the read operations while all write operations are directed to the master server.
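Inside a wrapping data access layer, the decision which server to use can be hidden in one small class. The following is only a sketch: ConnectionRouter is a hypothetical helper, the strings stand for whatever identifies a server (for example connection strings), and the simple round robin is not thread safe.

```csharp
using System.Collections.Generic;

// Hypothetical helper that decides which server a data access method uses
public class ConnectionRouter {
   private readonly string _master;
   private readonly IList<string> _readReplicas;
   private int _next;

   public ConnectionRouter(string master, IList<string> readReplicas) {
      _master = master;
      _readReplicas = readReplicas;
   }

   // All write operations go to the master server
   public string GetWriteConnection() {
      return _master;
   }

   // Read operations are spread over the read-only servers (round robin);
   // without any read-only server they fall back to the master.
   public string GetReadConnection() {
      if (_readReplicas.Count == 0)
         return _master;
      string connection = _readReplicas[_next];
      _next = (_next + 1) % _readReplicas.Count;
      return connection;
   }
}
```

Repository methods that only read would ask for a read connection, while everything that saves changes would use the write connection. The rest of the system never notices how many servers exist.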

There are still more reasons why we should consider keeping our O/R-Mapper inaccessible to the main parts of the system, but this might become part of another blog entry.

Upcoming Posts

I will add a link to all related posts here.