Thursday, September 1, 2011

Tables Are No Domain Objects Part 1

This is the first part of a blog series in which I want to discuss ways to design rich domain objects provided by the Business Logic Layer of a software system.

In most business applications we have to work with data stored in a database or provided by web services. For the sake of simplicity, we will focus on database-based storage. However, when working with a thin web service layer that provides its data almost (or completely) in database table style, some of the things we'll discuss here might still apply.

Today we are armed with shiny O/R-Mappers like NHibernate, Active Record or Microsoft Entity Framework. Some of those even provide a nice wizard that only needs a database connection to create a full set of domain objects. It seems like we can immediately start to create our first front end dialog/view/page and use those classes.

Cool, this "software architecture" and layering that people are talking about is a wizard. Now we've got it. Thanks for participating in the discussion, we are finished. Or, probably not... ;-)

Sounds funny? Unfortunately, quite a few applications are based on this kind of "software architecture" and "business layer". For smaller projects this might even work, but once a system becomes more complex, say with some thousands of lines of code (how about millions of lines?), this approach can quickly become a dead end. A table-like domain object is often not well structured for implementing our business logic or for display in a rich presentation layer.

Sure, a blog series cannot cover everything that needs to be considered when designing domain objects and structuring a business layer, so we will concentrate on a few topics that I find most important and discuss some examples for each of them. At the current stage of the software industry I don't even think that anybody is at a point where (s)he knows everything about architecting a business layer - or any other layer. If we look at the last 10 years of the software industry, we see way too many changes of mind. Every new pattern, technology or practice is preached as the golden calf. I'd say, as long as there has not been a full decade without big changes in architectural and technical approaches, we will not know that we have reached the end - and I guess we are far away from the beginning of this decade.

Knowing about this imperfection, we should still always do our best to design a powerful and usable business layer. The business layer is the heart of every application. It should encapsulate as much business logic as possible to avoid duplication of source code. Once the logic is encapsulated, the main goal becomes providing it through an easy-to-use interface for those (us?) who will consume it in a front end application like a Windows UI or web UI, but also in web services or Windows services.

Here is a short overview of the topics we will cover in this series.

Before we dig deeper into designing domain objects, we will start in this post with a look at O/R-Mappers, if one is used at all, and at reasons why we should consider keeping them inaccessible to all layers other than our Data Access Layer.

In a following post we will talk about Table Relations and Class References. We will look at database relations like aggregates or status objects that might be candidates to be represented in a different manner when they are loaded into a domain object hierarchy.

When talking about Field Aggregations I will try to show groups of database columns that can sensibly be arranged in referenced helper classes or structs instead of being kept directly in our entities.

We will discuss Data Type Transformations and see some cases where it can be helpful to transform data received from a database into a different type, or where one database column contains structural information.

In the last part of this series we will talk about Reusable Entities. There are some types of domain objects that appear in many projects. Some of them are good candidates to be implemented in base libraries and reused in other projects, while other frequently appearing entities are harder or (almost) impossible to reuse.

How to Safely Use an O/R-Mapper

I know, this topic is not directly related to designing domain objects. However, since O/R-Mappers are often used as the central component to access an underlying database, it makes sense to me to start with this topic.

We should always consider abstracting the native instance of the O/R-Mapper, and I will try to show a few reasons here. The first one, when talking about designing domain objects, is that we might run into architectural restrictions caused by restrictions of the ORM. If we use the mapper classes directly throughout our whole system, we might not be able to design our objects as we would like to.

Apart from the architectural reasons, we should look a little closer at some other reasons why we should consider avoiding tight coupling with the O/R-Mapper.

(Examples and issues here are based on Microsoft Entity Framework. Some of them will also apply to other ORMs and others might not.)

Let's jump right in with a very simple domain model that contains only one entity, SalesOrder.


Here is a very simple example of using Microsoft Entity Framework with this domain model. It already contains a few pitfalls.
using (var ctx = new Model1Container()) {
   DateTime since = DateTime.Today.AddMonths(-1);
   var orders = from o in ctx.SalesOrders
                where o.CreationDate >= since
                select o;

   foreach (var order in orders) {
      // set approved
   }

   // do something else

   foreach (var order in orders) {
      // create invoices
   }
}

The worst issue of this piece of code is one that is specifically related to the architecture of Entity Framework. The LINQ query used to get the sales orders does not actually return a materialized result of objects. Instead it only returns an EF implementation of IQueryable<SalesOrder>. This IQueryable can be seen as an object version of a SQL statement: every time we start to traverse the result, we fire another database query. This causes unnecessary database load and (worse) can produce different results for each loop. If a new order is created between the first and the second foreach loop, we will get this order in our second loop, and the order is invoiced before it was approved.

A simple solution is to wrap the LINQ query in parentheses and call the ToList() extension method on it. This copies the results into a List<T>, and from then on all list operations work in memory without touching the database. The problem is, if we ever forget the call to ToList(), we will run into this trap again.
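The re-execution effect can be observed without a database. The following is only a minimal sketch: it uses LINQ to Objects over an in-memory list as a stand-in for the EF query (an assumption made to keep the example self-contained; with EF each enumeration would send a SQL statement instead). The two-property SalesOrder class and the DeferredExecutionDemo class are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced version of the SalesOrder entity
public class SalesOrder {
   public int Id { get; set; }
   public DateTime CreationDate { get; set; }
}

public static class DeferredExecutionDemo {
   // Returns { first deferred count, second deferred count, snapshot count }
   public static int[] Run() {
      // in-memory stand-in for the SalesOrders table
      var table = new List<SalesOrder> {
         new SalesOrder { Id = 1, CreationDate = DateTime.Today }
      };
      DateTime since = DateTime.Today.AddMonths(-1);

      // deferred: this only describes the query, nothing is evaluated yet
      var query = from o in table
                  where o.CreationDate >= since
                  select o;

      // snapshot: ToList() evaluates the query once and copies the results
      var snapshot = query.ToList();

      int first = query.Count();   // evaluates the query

      // another order arrives between the two evaluations
      table.Add(new SalesOrder { Id = 2, CreationDate = DateTime.Today });

      int second = query.Count();  // evaluates the query again

      return new[] { first, second, snapshot.Count };
   }
}
```

The two deferred counts differ because the query is evaluated again after the insert, while the snapshot taken with ToList() keeps its original size.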

A much safer way to use an O/R-Mapper is to create our own data storage class that wraps the EF container and provides custom methods for the needed data. At this point we stand at a crossroads where we have to decide between implementing a Gateway (Martin Fowler, Patterns of Enterprise Application Architecture), which provides all data access methods directly from our data storage class, and implementing Repositories (Martin Fowler, PoEAA) for each of our domain object types. In smaller projects a Gateway is usually the better solution because it is easier to implement. The drawback of this pattern is that a Gateway class can become large when there are dozens or hundreds of access methods. The Repository approach needs more effort to set up, since we need one repository for each type of domain object, but it keeps the classes better organized. Due to our large domain model I decided to use a Repository.
// Custom data store class that wraps the EF container
public class DataStore : IDisposable {
   private Model1Container _efContainer;
   private SalesOrderRepository _salesOrders;
      
   public DataStore() {
      _efContainer = new Model1Container();
   }

   public SalesOrderRepository SalesOrders {
      get { return _salesOrders 
                   ?? (_salesOrders = new SalesOrderRepository(_efContainer)); }
   }

   public void Dispose() {
      _efContainer.Dispose();
   }
}
// ========================================================================
// custom repository to provide access to sales orders
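// public, because the public DataStore.SalesOrders property exposes this type;
// the internal constructor still keeps instantiation inside the data access layer
public class SalesOrderRepository {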
   Model1Container _efContainer;

   internal SalesOrderRepository(Model1Container efContainer) {
      _efContainer = efContainer;
   }

   public IEnumerable<SalesOrder> GetCreatedSince(DateTime date) {
      return (from o in _efContainer.SalesOrders
               where o.CreationDate >= date
               select o).ToList();
   }
}
// ========================================================================
// sample usage
using (var data = new DataStore()) {
   DateTime since = DateTime.Today.AddMonths(-1);
   var orders = data.SalesOrders.GetCreatedSince(since);

   foreach (var order in orders) {
      // set approved
   }

   // will always be the same results
   foreach (var order in orders) {
      // create invoices
   }
}

Now we need to remember only once that we have to add this odd ToList() call to avoid the previously described issues. Unfortunately, this was only one, EF-related, issue of directly accessing an O/R-Mapper from higher layers.

Another issue is the SalesOrder.IsCanceled property. In most parts of our software we might not want or need to work with canceled orders. Similar fields in other objects could be IsActive, IsDeleted and so forth. By encapsulating our O/R-Mapper we only need to change a few access methods to eliminate unwanted access to canceled orders. Sure, there are still parts of the system that might want to work with those orders, but we should consider providing them through more descriptive methods like GetCreatedWithCanceledSince or simply GetCanceledSince, depending on what is required.
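As a rough sketch of this idea, a repository could filter canceled orders by default and offer a separately named method for the callers that need them. The reduced SalesOrder class, the IQueryable constructor parameter (used here so the sketch runs against an in-memory list instead of the EF container) and the CanceledFilterDemo class are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced version of the SalesOrder entity
public class SalesOrder {
   public int Id { get; set; }
   public DateTime CreationDate { get; set; }
   public bool IsCanceled { get; set; }
}

public class SalesOrderRepository {
   private readonly IQueryable<SalesOrder> _orders;

   // with EF, the caller would pass the container's SalesOrders set here
   public SalesOrderRepository(IQueryable<SalesOrder> orders) {
      _orders = orders;
   }

   // default view: canceled orders are filtered out in one single place
   public IEnumerable<SalesOrder> GetCreatedSince(DateTime date) {
      return _orders.Where(o => !o.IsCanceled && o.CreationDate >= date).ToList();
   }

   // descriptive name for the few callers that need canceled orders
   public IEnumerable<SalesOrder> GetCanceledSince(DateTime date) {
      return _orders.Where(o => o.IsCanceled && o.CreationDate >= date).ToList();
   }
}

public static class CanceledFilterDemo {
   // Returns { number of active orders, number of canceled orders }
   public static int[] Run() {
      var orders = new List<SalesOrder> {
         new SalesOrder { Id = 1, CreationDate = DateTime.Today, IsCanceled = false },
         new SalesOrder { Id = 2, CreationDate = DateTime.Today, IsCanceled = true }
      }.AsQueryable();

      var repository = new SalesOrderRepository(orders);
      DateTime since = DateTime.Today.AddMonths(-1);

      return new[] {
         repository.GetCreatedSince(since).Count(),
         repository.GetCanceledSince(since).Count()
      };
   }
}
```

If the rule for "canceled" ever changes, only the two repository methods have to be touched, not every query in the higher layers.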

Non-abstracted O/R-Mapper access usually also means a tight coupling between our database and our source code. If we ever need to change the structure of tables, we can only hope that our ORM is able to handle the new structure without changing the domain model and, if it is able to handle the new mapping, that it still gives us the advantage we were aiming for with our database changes. There are database changes that most O/R-Mappers will most likely not be able to handle. Say we need to introduce some EAV/CR tables to move some rarely needed columns out of our main tables. When using a custom data store class, we can relatively easily adopt an internal hybrid that keeps using the ORM while doing other mappings with native ADO.NET features.

What if not all the data we need to work with comes from our own database? I've just seen a thread in the MSDN Software Architecture Forums, "Centralizing Duplicate Functionality and Data", where employee data had to be taken from one single source for all software systems of the company. Most O/R-Mappers support only one database, but we can hold several different instances of them when working with a wrapping data storage class. Even if some of the data we work with is not provided by a database but, say, by a web service, we can still provide one homogeneous source of our data to the other layers.
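One way this could look, purely as a sketch: the repository depends on a small source interface, so the other layers never learn whether the data comes from our database, a second database or a web service. All names here (Employee, IEmployeeSource, InMemoryEmployeeSource, EmployeeRepository) are hypothetical, and the in-memory source merely stands in for a real web service client.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee {
   public int Id { get; set; }
   public string Name { get; set; }
}

// Hypothetical abstraction over wherever employee data really lives
public interface IEmployeeSource {
   IEnumerable<Employee> LoadAll();
}

// In-memory stand-in for a client of the company-wide employee service
public class InMemoryEmployeeSource : IEmployeeSource {
   public IEnumerable<Employee> LoadAll() {
      return new List<Employee> {
         new Employee { Id = 1, Name = "Alice" },
         new Employee { Id = 2, Name = "Bob" }
      };
   }
}

// Repository exposed by the data store; callers never see the source
public class EmployeeRepository {
   private readonly IEmployeeSource _source;

   public EmployeeRepository(IEmployeeSource source) {
      _source = source;
   }

   // returns null when no employee has the given id
   public Employee GetById(int id) {
      return _source.LoadAll().FirstOrDefault(e => e.Id == id);
   }
}
```

The data store would hand out an EmployeeRepository next to the SalesOrderRepository, and swapping the employee source later would not affect any consumer.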

As a last reason, for now, to abstract an O/R-Mapper, we should keep the possibility of horizontal scaling in mind. With one database server we have only one option to get better performance from it: buying a bigger box. However, there is a point where vertical scaling reaches its limit. Many software systems perform far more reads than writes, say 80% read and only 20% write operations. With a strong data access layer we are able to set up several read-only database servers to handle all the read operations, while all write operations are directed to the master server.
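Inside the data access layer, such a split could be prepared with a small routing helper like the one sketched below. ConnectionRouter and its method names are hypothetical, the connection strings are placeholders, and the sketch is not thread-safe. How the read-only servers receive their data (replication) is outside its scope.

```csharp
using System.Collections.Generic;

// Hypothetical helper that decides which database server a data store talks to
public class ConnectionRouter {
   private readonly string _master;
   private readonly IList<string> _readReplicas;
   private int _next;

   public ConnectionRouter(string master, IList<string> readReplicas) {
      _master = master;
      _readReplicas = readReplicas;
   }

   // all write operations are directed to the master server
   public string ForWrite() {
      return _master;
   }

   // read operations are spread round-robin over the read-only servers;
   // without any read-only server we fall back to the master
   public string ForRead() {
      if (_readReplicas.Count == 0) {
         return _master;
      }
      string result = _readReplicas[_next];
      _next = (_next + 1) % _readReplicas.Count;
      return result;
   }
}
```

A repository would ask the router for a read connection in its query methods and for a write connection when saving, so the higher layers stay unaware of how many servers exist.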

There are still more reasons why we should consider keeping our O/R-Mapper inaccessible to the main parts of the system, but this might become part of another blog entry.

Upcoming Posts

I will add a link to all related posts here.

1 comment:

  1. I am not that much familiar with this information. Now only I have got it. Thanks for sharing...
