DataObjects.Net vs NHibernate: conceptual differences and feature-based comparison

This article describes the most important differences between two ORM frameworks: DataObjects.Net and NHibernate. It is written by one of the DataObjects.Net framework architects, so it is likely biased toward DataObjects.Net, although the author has tried to minimize that bias. A partial comparison of DataObjects.Net to other widely adopted ORM frameworks (e.g. Entity Framework) is also included.

This article is written for people who are familiar with NHibernate or at least its major concepts.

Contents:

Conceptual differences

1. Externally stored state

2. Persistence awareness

3. Common persistence behavior and services

Feature-based comparison

Feature matrix building rules

Feature descriptions

Descriptions of features with standard meaning are omitted.

True POCO / PI

Disconnected scenario support / offline operation mode

Inheritance mapping

Associations (relationships)

Mapping of generic types

Query related performance features

Partial and eager fetches

Session / DataContext / UoW features

Operation logging

Unification

Conceptual differences

NHibernate (NH) is designed to provide core ORM functions (namely, mapping objects to database structures) for nearly any object. In general, persistent types are not required to inherit from any special base type or implement any special interfaces.

That’s called “persistence ignorance” (PI). It brings some advantages, and many people like this approach. It is very flexible, since it allows you to construct nearly everything around it (at least, as it initially seems).

Probably, the strongest argument of people preferring this approach is the “less is more” principle: the fewer limitations and features you have to know, the simpler and faster it is to learn the system. That’s true: all you need to use NHibernate (more precisely, its basic functions) is to study its mapping APIs and the methods of the Session type.

By contrast, DataObjects.Net (DO) requires persistent types to inherit from its own base types (such as Entity) and to declare persistent fields the way it prescribes.

This means that to use DataObjects.Net, you should study not only its mapping rules and Session API, but also the behavior of its base types and built-in services.

As you can see, that is not much more in general, but honestly, it is noticeably more.

The key question is: what do you get in exchange, and why did we decide that POCO/PI wouldn’t suit DataObjects.Net?

In exchange, you can be sure that any persistent object you deal with shares the following common properties:

1. Externally stored state

In fact, any persistent object you use has just a few inherited fields, and one of these fields stores a reference to its persistent state container. This approach allows DataObjects.Net to fully control that state - e.g. load it lazily, track changes, and discard it on rollback.
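A minimal sketch of the idea (all type and member names below are illustrative, not the actual DataObjects.Net API):

using System.Collections.Generic;

// Illustrative state container; the ORM owns and populates it.
public sealed class EntityState
{
  private readonly Dictionary<string, object> values = new Dictionary<string, object>();
  public object GetValue(string fieldName) { return values[fieldName]; }
  public void SetValue(string fieldName, object value) { values[fieldName] = value; }
}

public abstract class PersistentBase
{
  // The only inherited instance field: a reference to the external state.
  internal EntityState State;

  // Persistent "fields" are just accessors over that external state:
  protected T GetFieldValue<T>(string fieldName) { return (T) State.GetValue(fieldName); }
  protected void SetFieldValue<T>(string fieldName, T value) { State.SetValue(fieldName, value); }
}

Since the state container belongs to the ORM rather than to the object, the ORM can populate, version or evict it without touching the entity instance itself.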

2. Persistence awareness

All persistent entities and services in DataObjects.Net directly violate the persistence ignorance principle - in fact, this is global persistence awareness. Despite some disadvantages (they are discussed further on), this enables your entities to access the persistent storage, which is quite convenient.

To illustrate why persistence awareness is attractive, let’s see what you must do to run a query inside a method of e.g. a POCO entity without violating PI: you must either pass an abstract query service into the method, inject it into the entity, or resolve it via some service locator.

You must notice that in any of these cases your entity still depends on a persistence-related abstraction and on the infrastructure delivering it.

I.e. the code is aware of the persistence engine anyway (a query service is your own abstract version of a part of the persistence engine) - the only benefit you have is that you can (theoretically) provide a different implementation of this engine (i.e. provide a query service specific to a particular backend). A sketch of this approach follows below.
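For illustration, here is a typical PI-preserving variant with an injected query service (all names below are illustrative):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Your own abstraction over a part of the persistence engine:
public interface IQueryService
{
  IEnumerable<T> Query<T>(Expression<Func<T, bool>> predicate);
}

public class Order
{
  public Customer Customer { get; set; }
  public bool IsOverdue { get; set; }
}

public class Customer
{
  private readonly IQueryService queries; // must be injected somehow

  public Customer(IQueryService queries)
  {
    this.queries = queries;
  }

  public bool HasOverdueOrders()
  {
    // The entity formally stays a POCO, but its code is clearly
    // written against a persistence-like abstraction anyway.
    return queries.Query<Order>(o => o.Customer == this && o.IsOverdue).Any();
  }
}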

It is also worth mentioning that if you’re using lazy loading, you’re already using one of the approaches described here (by default, via NHibernate proxies).

Now let’s think about when it’s really both necessary and possible for you to change the underlying persistence engine (i.e. to benefit from PI - not theoretically, but actually):

Also, after studying DataObjects.Net you’ll understand that even ORM tools can be quite different in terms of the APIs they provide, and the fight for PI may easily make you sacrifice some useful ORM features. That’s a very common choice: to build a common abstraction API over N tools, you must rely on a common subset of their features (= never use anything else directly, even if it solves your problem precisely) and abstract everything else.

So the question is: is it worth investing the time and money in building a system ready for migration to some other ORM tool or to a custom DAL?

Hopefully, it is now clear why we decided to sacrifice PI and provide a richer API instead.

3. Common persistence behavior and services

This was only touched on above, so it is worth describing more precisely.

DataObjects.Net allows each Session to operate in one of two primary modes:

SessionOptions.ServerProfile indicates that all changes must be flushed continuously - in such a way that queries executed by user code always provide a consistent result. It’s a kind of WYSIWYG mode ensuring that BLL code can safely rely on what it sees - without worrying about caches, statement batching and so on. That’s quite convenient for any server-side code: developers should write the essence (the logic), and DataObjects.Net should take care of everything else.

Any changes made to an Entity are either saved or cancelled - this happens automatically, together with the nearest query (DataObjects.Net mixes CUD statements with queries in batches) or on completion of the current transaction. If changes are cancelled (e.g. as a result of a rollback), this is immediately visible.

If validation logic is implemented, it’s guaranteed that a transaction with invalid entities won’t be committed.
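A sketch of what this looks like in code, assuming domain is an already built Domain (DataObjects.Net-style API; exact member names and entity constructor requirements may differ between versions):

using (var session = domain.OpenSession())       // ServerProfile assumed here
using (var tx = session.OpenTransaction())
{
  var book = new Book { Title = "New Book" };    // no explicit Save call
  var count = session.Query.All<Book>().Count(); // pending changes are flushed
                                                 // automatically before the query,
                                                 // so the new book is counted
  tx.Complete(); // commit; without it the changes are rolled back
}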

SessionOptions.ClientProfile indicates that all changes must be cached and delayed until an explicit command, although, again, DataObjects.Net does a lot here to ensure that everything you see (mainly, except query results) is consistent. In particular, synchronization of paired associations and recursive Entity removal work the same in this mode. Moreover, the explicit SaveChanges command actually tries to replay all the recorded BLL operations on the server side - that’s completely different from what other ORM frameworks do in this case (they compute the changed rows and flush the differences to the database).
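A corresponding ClientProfile sketch (the SessionConfiguration constructor shape shown here is an assumption; SaveChanges is the explicit command mentioned above):

var config = new SessionConfiguration(SessionOptions.ClientProfile); // assumed ctor shape
using (var session = domain.OpenSession(config))
{
  var book = session.Query.All<Book>().First();
  book.Title = "Renamed"; // the change is cached locally as an operation
  session.SaveChanges();  // replays the recorded operations on the server side
}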

Any Entity provides a unified Key, version information, change tracking state, and a set of well-known events (e.g. on field value change or on removal).

That’s not everything, but all major features are there.

Feature-based comparison

An in-depth look at all the features we could identify is expressed in this feature matrix.

It’s pretty large - there are about 250 features, carefully selected, grouped and organized to make the relationships between them easier to understand.

Feature matrix building rules

First of all, definitions:

The rules are:

1. All features should be grouped into a tree-like structure, so that the intended usage of each feature and the relationships between nearby features are easily understandable.

2. All performance-related features must be located in the “Performance” section.

3. The matrix should be as short as possible, so we must avoid:

4. Each cell should contain one of the following abbreviations:

The meaning of these abbreviations is explained on the Remarks sheet.

If several abbreviations apply, the topmost one from the above list must be used.

Feature descriptions

The content of this part of the document is preliminary. After this part was ready, it became clear it should be written in a different way: instead of focusing on feature descriptions, it should describe how the differences in the feature matrix affect development, and especially how to handle each particular case when DataObjects.Net does not offer an equivalent feature. So it will be rewritten in the near future.

Descriptions of features with standard meaning are omitted.
True POCO / PI

The ability to map absolutely arbitrary types to relational structures.

Disconnected scenario support / offline operation mode
Inheritance mapping

“Large inheritance hierarchies support”: the ORM is capable of joining only the tables related to ancestors (superclasses) while running queries involving types mapped with class-table (table-per-type, TPT) inheritance mapping.

Most ORM tools build queries for TPT mapping by joining the whole inheritance hierarchy (proof). So e.g. if you query for a Dog type (a descendant of Animal), the tables for Cat, Mouse and so on will be joined as well - i.e. not just the tables of ancestors, but also the tables of all the siblings and descendants.

5-6 joins turn any query into a performance killer: even if SQL Server were able to produce the best possible plan for such a query (which is also harder when lots of tables are used in the query), it might still be too slow because of the joins - i.e. the joins will “eat” 90 or more percent of the query cost.
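For example (a sketch using the hierarchy above; Age is an assumed field):

// Dog inherits from Animal; Cat and Mouse are its siblings.
var oldDogs = session.Query.All<Dog>()
  .Where(d => d.Age > 10)
  .ToList();
// With large hierarchies support, only the Animal and Dog tables are
// joined here; many ORMs would also join the Cat, Mouse, etc. tables,
// although they cannot contribute a single row to the result.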

More detailed description can be found here.

“Type discriminators - Automatically provided”: the ORM is capable of automatically assigning and maintaining the associations between types and type discriminator values.

Associations (relationships)

“Automatic sync. of inverse (paired) associations”: inverse associations are automatically synchronized by the ORM, so e.g. the following test should pass:

var book   = new Book(...);
var author = new Author(...);
book.Authors.Add(author);
Assert.IsTrue(author.Books.Contains(book));

Mapping of generic types

“Open generic types (automatic)”: the ORM is capable of automatically mapping all appropriate generic instances of a particular open generic type.

For example, if the open generic type is FullTextInfo&lt;T&gt; where T : IFullTextIndexable, a set of instances like FullTextInfo&lt;DocumentBase&gt;, FullTextInfo&lt;AccountBase&gt; and so on must be registered. All T substitutions there are the most base types satisfying the constraints on T in their inheritance hierarchies - but obviously, only the ones that are mapped.
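A sketch of such a type, written with DataObjects.Net-style attributes (treat the exact declarations as illustrative):

public interface IFullTextIndexable { }

[HierarchyRoot]
public class FullTextInfo<T> : Entity
  where T : Entity, IFullTextIndexable
{
  [Field, Key]
  public int Id { get; private set; }

  [Field]
  public T Target { get; set; }

  [Field]
  public string IndexedText { get; set; }
}

// With automatic open generic mapping, registering FullTextInfo<T> is
// enough: the ORM finds the mapped most base types implementing
// IFullTextIndexable (e.g. DocumentBase, AccountBase) and maps
// FullTextInfo<DocumentBase>, FullTextInfo<AccountBase>, etc. itself.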

Query related performance features

The features covered in this section are mostly described in this cycle of posts. The tricky ones are:

“Generated SQL is ready for query plan caching”: subsequent executions of the same query with different parameters won’t make the database server generate a new query plan. At the least, this implies that query constants and variables are translated into SQL parameters rather than inlined into the SQL text.

See e.g. this article for further details.
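For instance, the two queries below differ only in the value of a local variable; an ORM with this feature translates both into the same parameterized SQL (e.g. “... WHERE Title = @p0”), so the server compiles a single plan:

string title1 = "Dune", title2 = "Ubik";
var q1 = session.Query.All<Book>().Where(b => b.Title == title1).ToList();
var q2 = session.Query.All<Book>().Where(b => b.Title == title2).ToList();
// An ORM that inlines constants into the SQL text would instead emit
// two distinct statements and force two plan compilations.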

“On-demand (partial) materialization”: the ORM materializes parts of the query result during enumeration rather than on execution - i.e. continuously, not all at once on the first IEnumerator.MoveNext() call.

This feature is especially useful when large result sets must be browsed, although in this case you also need either IStatelessSession or GC-friendly L1 cache mode support.

“MARS support”: makes on-demand materialization much more attractive, since in this case you can run other queries while browsing a large result set.
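A sketch of how the two features combine (Process and the Review type are assumptions standing in for your own code and model):

foreach (var book in session.Query.All<Book>()) // materialized one by one
{
  // With MARS, this nested query can run while the data reader behind
  // the outer enumeration is still open:
  var reviewCount = session.Query.All<Review>().Count(r => r.Book == book);
  Process(book, reviewCount);
}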

Partial and eager fetches

“Eager fetches / prefetch paths / .Include”: the meaning of this feature is obvious, but it can be implemented in two different ways:

If you’re interested in more pros and cons, please refer to this issue.
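Either way, from the user’s perspective it usually looks like this (Prefetch is the DataObjects.Net-style name; NHibernate uses Fetch/FetchMany, Entity Framework uses Include):

var books = session.Query.All<Book>()
  .Prefetch(b => b.Authors) // fetch the association together with the books
  .ToList();
// Without this, touching book.Authors in a loop produces the classic
// N+1 query pattern.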

Session / DataContext / UoW features

“GC-friendly mode”: the ORM can be configured to keep at least all the referenced and changed entities in the L1 cache; non-referenced and unchanged entities can be purged from the L1 cache when necessary.

This mode allows browsing large sets of entities part by part in the same session and transaction, as long as the working set of entities fits in RAM.

"Ghost objects": see the description of “ghost objects” in Ayende’s blog; my comments there describe some pros and cons. This isn’t a feature, which presence is definitely good, but a bit unexpected behavior (if you don’t know about it) intended to resolve the performance optimization issue that can be solved differently (e.g. with batch-based prefetch and entity type caching).

Operation logging

This feature allows you to:

Unification

“Unified “key” object” implies that roughly the following code should work:

EntityKey key = session.GetEntityKey(book); // Getting the unified key
Assert.AreSame(book, session.GetEntityByKey(key));
Assert.AreSame(book,
  session.All<Book>() // LINQ query
    .Where(b => session.GetEntityKey(b)==key) // Using the key in a LINQ query
    .SingleOrDefault());

As you can see, we never mentioned the actual key type in this example (Int32 / String / composite key, etc.).

“Unified “version” object” implies that roughly the following code should work:

EntityVersion version = session.GetEntityVersion(book);
Assert.AreEqual(version, session.GetEntityVersion(book));
book.Title = book.Title + ", The Sequel"; // Title is changed
Assert.AreNotEqual(version, session.GetEntityVersion(book));

Normally there should be a way to specify which entity fields are included in the entity version, and which aren’t.
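For example, a field-level marker could serve this purpose (the SkipVersion attribute below is purely hypothetical, shown only to illustrate the idea):

public class Book : Entity
{
  [Field]
  public string Title { get; set; }        // participates in the version

  [Field, SkipVersion]                     // hypothetical attribute
  public DateTime LastViewed { get; set; } // excluded from the version
}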

“Unified exception types” implies that roughly the following code should work:

for (int retryNumber = 0; retryNumber < 10; retryNumber++) {
  try {
    RunATransaction();
    break;
  }
  catch (ReprocessableException) {
    // Optionally: Thread.Sleep(new Random().Next(10));
    continue;
  }
}

The goal of unified exceptions is to make such code fully RDBMS-independent.