Sunday, October 19, 2008

My First NHibernate Project, Part 3

Part 1, Part 2

Lessons Learned

What did I take away from this whole experience? For one thing, now that I've worked with NHibernate a bit, I've come to really enjoy it. Its ability to map classes to database structures and its handling of many of the most annoying pieces of data layer coding should make writing my next data access layer much easier. NHibernate has its own little quirks that must be accounted for, but overall, if you prefer working with actual objects in your code instead of DataSets, Tables, and Rows, then NHibernate's for you.

Throughout the course of this exercise I've encountered several things that had me pulling out my hair until I found the answer. I've also discovered several tips and tricks to make my life a bit easier. Since I know I'm not the only one to encounter these issues, here's my list of top tips for working with NHibernate.

Don't forget to embed the mapping files!
This is an easy thing to forget and one that bit me more than once after adding new tables to my database, writing the mappings and classes, and then getting all kinds of obscure errors when trying to use them! It also appears to be one of the most common beginner mistakes, based on the number of times I've seen this tip mentioned. Always, always, always remember to change the mapping file's Build Action (in the Properties Window) to "Embedded Resource." If you don't, NHibernate won't be able to find the mapping file and you'll encounter all sorts of confusing exceptions.

Don't lose your identity
NConstruct created an "Id" property in each of my entity classes that's mapped to the primary key of the underlying table. I like this convention and have stuck with it, but with one small modification... I'm not exposing the property's Set block publicly. In C# this is very easy because you can set access to the Get and Set blocks separately, so I made my Id's Get block public but my Set block protected (I considered making BOTH protected since you shouldn't be using your primary key in any business logic, but eventually decided against it). VB.NET has allowed the same thing since Visual Basic 2005 (a Protected Set inside a Public Property). On earlier versions you can't do this if you map the database columns directly to properties, but if you map the database columns to fields you should be able to write your Id property using the ReadOnly modifier.

Of course, this only applies if your primary keys are automatically generated by the database (which is generally a very good idea). If you are assigning the primary keys yourself and need to have the consumer of the entity objects provide the key value for new objects, then you will need to make your Id property read/write.

If your id is set by the database, remember to use the <generator class="native"/> tag inside the "id" tag (you could also use "identity," "sequence," or "hilo" depending on your database). If your application is assigning the identity to the object, use <generator class="assigned"/> instead. If your entity class is responsible for assigning its own id, then you will need to use one of the other supplied generator classes or write your own generator class.
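As a rough sketch of the read-only identity idea, the class below exposes Id publicly but keeps its setter protected, which Visual Basic 2005 and later accept. The Document class, its Title property, and the DocumentId column in the commented mapping are placeholders for illustration, not code from my project, and the mapping assumes a database-generated key.

```vbnet
Imports System

' Hypothetical entity; all names are illustrative only.
' Matching mapping fragment (assumes the database generates the key):
'   <id name="Id" column="DocumentId">
'     <generator class="native"/>
'   </id>
Public Class Document
    Private _id As Integer
    Private _title As String

    ' Anyone can read the Id, but only the class itself and its subclasses
    ' can assign it. NHibernate sets it through reflection.
    Public Overridable Property Id() As Integer
        Get
            Return _id
        End Get
        Protected Set(ByVal value As Integer)
            _id = value
        End Set
    End Property

    ' An ordinary read/write property for comparison.
    Public Overridable Property Title() As String
        Get
            Return _title
        End Get
        Set(ByVal value As String)
            _title = value
        End Set
    End Property
End Class
```

Code outside the class hierarchy that tries `someDocument.Id = 5` will no longer compile, which is exactly the protection I was after.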

Closing Sessions is good, but creates extra work
The easiest approach to session management is to create one session, keep it open, and use it throughout the lifetime of the application. However, since Sessions encapsulate database connections, which are a limited resource on the database, this usually isn't the best approach... you can probably get away with it on a single-user database, but generally this is a "bad thing".

Since database interaction in typical applications (i.e. information is retrieved from the database and presented to the user; the user reads and possibly edits the data; any changes are written back to the database) is very "bursty," sessions kept open throughout the life of an application will remain idle most of the time, consuming database resources unnecessarily. A better approach is to only keep sessions open long enough to perform the required database CRUD (create, read, update, delete) operations and then close the session to free up database resources for other users.

This approach leads to a more complicated session management strategy, however. For one thing, you need to reconnect your entities to a session before you can save any changes to them back to the database. This could be a newly-created session, or a session that you've reconnected to the database. I've found that the latter approach works best for me.

Basically, the first time I need a session I get one from the SessionFactory. After I'm done with it, I flush it and disconnect it. Disconnecting the session frees up the database connection but leaves the session object available for reuse, freeing my application from the expense of having to create a new session the next time I need one. When it's time to save or update the database, I reconnect the session, lock the entity to the session, and call the appropriate session methods to persist my changes back to the database.
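Here's a minimal sketch of that disconnect/reconnect cycle. It is not my actual data layer: the SessionManager class is a made-up name, and the sketch assumes a single-threaded application and an already-configured ISessionFactory. It needs a reference to the NHibernate assembly. The session calls it uses (OpenSession, IsOpen, IsConnected, Reconnect, Flush, Disconnect) are the NHibernate 1.2/2.0 ISession and ISessionFactory members.

```vbnet
Imports NHibernate

' Sketch only: SessionManager is a hypothetical helper, not part of NHibernate.
Public Class SessionManager
    Private ReadOnly _factory As ISessionFactory
    Private _session As ISession

    Public Sub New(ByVal factory As ISessionFactory)
        _factory = factory
    End Sub

    ' Returns the shared session, creating it on first use and
    ' reconnecting it if it was disconnected after the last operation.
    Public Function GetSession() As ISession
        If _session Is Nothing OrElse Not _session.IsOpen Then
            _session = _factory.OpenSession()
        ElseIf Not _session.IsConnected Then
            _session.Reconnect()
        End If
        Return _session
    End Function

    ' Writes pending changes, then releases the database connection while
    ' keeping the session object (and the entities it tracks) for reuse.
    Public Sub DoSessionCleanup()
        If _session IsNot Nothing AndAlso _session.IsOpen _
                AndAlso _session.IsConnected Then
            _session.Flush()
            _session.Disconnect()
        End If
    End Sub
End Class
```

Each data layer method then wraps its work in Try ... Finally, calling GetSession() at the top and DoSessionCleanup() in the Finally block.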

Lazy loading is very useful... but tricky
If you follow the same decision I did and disconnect your sessions when not in use, you'll probably run afoul of one of the other issues I had: problems retrieving items from collections that were defined as lazy loading. While lazy loading is definitely useful since it delays reading data from the database until it's actually needed, it won't work if the session that loaded the entity is disconnected. There are various ways of dealing with this, but that's a broad enough topic for an article of its own (soon to come).

Sets... not as easy to work with as you'd think
Sets are one of the main types of collections in NHibernate. Unless you have a relationship between tables where the child table has a column that explicitly stores sequential index values, you'll most likely be retrieving child records as an ISet(Of T). The concrete class that you'll be dealing with (if the collection is loaded and not proxied because it's defined as lazy loaded) is a HashedSet.

If you're used to automatically going to some variant of the List class whenever you need a collection (and especially if you migrated to VB.NET from VB6), you'll be shocked to discover that ISet doesn't have an Item property. ISet is not an indexed collection. In fact, that's the whole point of being a set: a set is an unordered collection of distinct values. What this means to you as a developer is that only one of any specific entity can exist within the set and that the set is not sorted or organized in any specific way. Technically, if you retrieve the elements in a set there is a predictable order in which they will be retrieved, but for all practical intents, you cannot rely on the elements in a set to be retrieved in any specific order. A set exists to hold the child entities related to your parent entity, nothing else.

This can take some getting used to. While constructs like For Each x In y... Next will work on Sets, constructs like
    For i As Integer = 0 To fields.Count - 1
        Dim x As EntityType = fields(i)
        ' do something here with x
    Next
will not because there is no default Item property (there's no Item property at all). This means that if you are used to accessing collection items through their index your strategies for accessing items in the collection are going to have to be a bit different.
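A small sketch of the two workarounds I tend to reach for: enumerate the set to find what you want, or copy it into a List when you really do need positions. To keep the example self-contained it uses the framework's HashSet(Of String) (.NET 3.5) as a stand-in for NHibernate's ISet/HashedSet. The module and function names are invented for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SetAccessDemo
    ' Copies the items into a list and sorts them, giving the caller
    ' a stable order and index-based access.
    Public Function ToSortedList(ByVal items As ICollection(Of String)) As List(Of String)
        Dim result As New List(Of String)(items)
        result.Sort(StringComparer.Ordinal)
        Return result
    End Function

    ' Finds an item by enumerating; returns Nothing when there is no match.
    Public Function FindByPrefix(ByVal items As IEnumerable(Of String), _
                                 ByVal prefix As String) As String
        For Each item As String In items
            If item.StartsWith(prefix, StringComparison.Ordinal) Then Return item
        Next
        Return Nothing
    End Function
End Module
```

The copy is a snapshot: adding or removing items in the list does not touch the underlying set, so changes to the collection itself still have to go through the set's own Add and Remove methods.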

Objects in collections need special care
I had an entity class that contained a collection of entities from a related table (in this case a form Document and its collection of Fields). When I tried to remove one of the fields from the document I discovered that the collection's Remove method was failing and I didn't know why. I finally solved the problem but it took quite some time to figure out (1) what exactly was happening, and (2) what to do about it. The ins and outs of this one are a whole article by itself, so I'll try to summarize this one here and write a more detailed explanation later.

Basically, what was happening was this: I had overridden my entity class' Equals() and GetHashCode() methods and I was apparently a bit over-zealous in what I felt was necessary to say that two instances were "equal" to each other. The real problem was actually my implementation of GetHashCode(), but you can't really override one without overriding the other (more on that in a later article).

Since my collection of fields was an unordered collection, it was defined in the mapping file as a Set. In my code, this set was implemented as a HashedSet. The importance of this is the fact that when an item is added to a HashedSet, its hashcode is used as the key for storing the entity. If you later try to remove that item (or find it using Contains) and pass in a reference to an item in the collection, the HashedSet calculates the hashcode of the item being passed in and uses that hashcode to find the item in the collection.

I got into trouble because the properties of the object I used in my hashcode were not immutable. Somewhere between adding the field to the collection and trying to remove it, some of the properties that are used in the hashcode were changed. As a result, when I tried to remove the item, its current hashcode no longer matched the hashcode that was used to place it in the collection so the HashedSet was unable to find the item again to remove it.

In short, the solution was to change which properties I used in my hashcode (I got a little less zealous about what constituted a "unique" entity) and to make those properties immutable. The actual implementation turned out to be a real pain because that meant having to make a number of changes in the rest of my code to accommodate the fact that I could no longer change the values of those properties. Since GetHashCode and Equals are closely related it also meant changing the implementation of Equals and rethinking exactly what I mean by "equals." To be totally consistent, this also meant changing my implementation of GetHashCode and Equals in several other entity classes as well as modifying code that was no longer valid because the properties used in those methods were now also immutable.
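The failure is easy to reproduce outside NHibernate. The sketch below is not my real Field entity: MutableField, StableField, and the Guid key are stand-ins invented for the demonstration, and the framework's HashSet(Of T) (.NET 3.5) plays the part of HashedSet. The first class hashes on a property that can change; the second hashes only on a value fixed at construction.

```vbnet
Imports System
Imports System.Collections.Generic

' Problem case: the hash code is derived from a property that can change.
Public Class MutableField
    Public Position As Integer

    Public Sub New(ByVal initialPosition As Integer)
        Position = initialPosition
    End Sub

    Public Overrides Function Equals(ByVal obj As Object) As Boolean
        Dim other As MutableField = TryCast(obj, MutableField)
        Return other IsNot Nothing AndAlso other.Position = Position
    End Function

    Public Overrides Function GetHashCode() As Integer
        Return Position
    End Function
End Class

' Fixed case: Equals and GetHashCode use only a key that never changes.
Public Class StableField
    Private ReadOnly _key As Guid = Guid.NewGuid()
    Public Position As Integer

    Public Overrides Function Equals(ByVal obj As Object) As Boolean
        Dim other As StableField = TryCast(obj, StableField)
        Return other IsNot Nothing AndAlso other._key.Equals(_key)
    End Function

    Public Overrides Function GetHashCode() As Integer
        Return _key.GetHashCode()
    End Function
End Class

Public Module HashCodeDemo
    ' Returns False: the item was stored under hash 1 but is looked up under hash 2.
    Public Function RemoveMutableAfterChange() As Boolean
        Dim fields As New HashSet(Of MutableField)()
        Dim f As New MutableField(1)
        fields.Add(f)
        f.Position = 2
        Return fields.Remove(f)
    End Function

    ' Returns True: changing Position does not affect the hash code.
    Public Function RemoveStableAfterChange() As Boolean
        Dim fields As New HashSet(Of StableField)()
        Dim f As New StableField()
        fields.Add(f)
        f.Position = 2
        Return fields.Remove(f)
    End Function
End Module
```

Note that in the problem case the item is still physically in the set (Count stays at 1); it just can't be found by Remove or Contains any more.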

There's a lot more involved in the underlying theory behind object equality, hashcodes, and my final implementations, but those are topics for a later discussion.

Bulk Updates -- Not NHibernate's cup of tea
As much as I'm growing to like NHibernate, I discovered one weakness: its lack of support for bulk updates. I believe that the recently-released version 2.0 of NHibernate is making inroads on this problem, but with NHibernate 1.2 it doesn't appear that bulk updates/deletes/inserts are possible. In order to insert, update, or delete multiple items using NHibernate 1.2 it appears you only have two choices:
  1. Perform each insert, update, or delete one at a time
  2. Bypass NHibernate and perform your mass operation using an ADO.NET Command.
The first option is obviously very inefficient, especially if you have to retrieve each entity, update it, and save the changes back to the database. Unless you're only dealing with a small number of entities, this isn't really even an option. The second option is really the only option of the two. Fortunately, you won't have to create and maintain a second connection to the database. You can share the connection object used by NHibernate by calling s.Connection.CreateCommand().
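As a sketch of the second option, the helper below builds a parameterized bulk DELETE on whatever connection it is given, so it can be handed the session's own connection. The Fields table, the DocumentId column, and the function name are hypothetical, and the "@name" parameter syntax assumes SQL Server; other providers use different markers.

```vbnet
Imports System
Imports System.Data

Public Module BulkOperations
    ' Builds (but does not execute) a bulk DELETE for all fields of one document.
    ' Table and column names are placeholders for illustration.
    Public Function CreateDeleteFieldsCommand(ByVal connection As IDbConnection, _
                                              ByVal documentId As Integer) As IDbCommand
        Dim cmd As IDbCommand = connection.CreateCommand()
        cmd.CommandText = "DELETE FROM Fields WHERE DocumentId = @documentId"

        ' Use a parameter rather than concatenating the value into the SQL.
        Dim p As IDbDataParameter = cmd.CreateParameter()
        p.ParameterName = "@documentId"
        p.DbType = DbType.Int32
        p.Value = documentId
        cmd.Parameters.Add(p)

        Return cmd
    End Function
End Module
```

With a session s in hand, the call looks like CreateDeleteFieldsCommand(s.Connection, documentId).ExecuteNonQuery(). If an NHibernate transaction is active, the command has to be enlisted in it first by calling Enlist(cmd) on the transaction. Keep in mind that NHibernate knows nothing about rows changed this way, so entities already loaded in the session can be stale afterward.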


From what I understand, in NHibernate 2.0 you can execute bulk queries. To do this, you can write a named SQL Query, get it into an IQuery object, and then call ExecuteUpdate() on that object. The next release of NHibernate should go even further by allowing you to create mass update queries using HQL. For more information about bulk queries, check out this article.


Saturday, October 4, 2008

My First NHibernate Project, Part 2

Part 1, Part 3

The next time...


Now that I have one project under my belt, what will I do differently next time? I think now that I have more experience and a better understanding of NHibernate I'd probably do several things a bit differently if I were starting this project over. Here's a list of the main things I'd consider the next time I started a new project using NHibernate.

NConstruct... to use or not to use?
Although I'll consider NConstruct again (if nothing else, their development staff appears quite responsive to feedback and is very quick to incorporate requests), I'll probably try creating at least some of the classes and mapping files myself. Maybe I'm just a bit of a control freak, but now that I'm familiar with the file structures, I think it will be easier to create them myself than to use a tool and then have to correct the code generated by that tool. Still, it was able to generate quite a lot of code and XML with very little effort, so I'd probably still use it for most of my basic entities unless I decide to use interfaces and then just clean up the property names afterward (Refactor! is a pretty handy tool for helping with that). Maybe I'll beat my DBA around the head a little bit as well to see if I can get him to loosen up on his table naming rules a bit, which would drastically cut down on the amount of fixing up I need to do to property names!

Interfaces
Next time I think I'll also look into using interfaces with NHibernate. This is a bit of a trade-off, so I'm not sure whether I'll stick with that approach though. On one hand, using interfaces makes unit testing easier. On the other hand, with interfaces you can't map database columns to fields... you must map to properties, which may not be acceptable if your setter method does things that you don't need to do when initializing the object from the database. For example, if your workplace is anything like mine, you may have to work with a legacy database with some not-too-clean data. You might want to add validation in your setter so that new data going in will be clean and shiny, but couldn't use that setter when initializing an object from the database because you need to accept the record as-is, errors and all.

Components
Components sound interesting because they're basically a way to deconstruct part of a table into a separate class which is then included as a property in the table's entity representation. The classic example of this is a table that includes address information (such as almost any database that includes contact records). The address fields (street address, city, state, zip code, etc.) can be defined as a component of type Address and the entity will then contain a field of type Address that contains the values of these columns. This is a good way to create small, general-purpose business objects that can be reused in different applications. In my current project I have a couple of tables that could have been represented this way had I been able to absorb everything about NHibernate at once! Instead, I modified the NConstruct-derived classes for these tables so that they implemented a pre-existing business object interface, encapsulated a concrete instance of that interface within the class to hold the properties of that interface, and modified the generated getters and setters of the entity class to use that instance. This could have been accomplished more easily using a component (although in this specific case it was a better implementation choice to not use a component, but at the time I didn't have enough familiarity with NHibernate to evaluate the applicability of components to the problem).

Named Queries
Named queries could also be useful for encapsulating some of the data layer. My data layer code consists of a number of methods that query the database to return single entities or collections of entities that match specific criteria. It also contains methods that return collections of arrays consisting of specific properties from entities (for example, when I only need a couple properties such as the id and name in order to populate a combobox or a list). This means that the data layer consists mainly of methods that look basically like this:

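Public Function GetSomeEntitiesBySomeProperty(ByVal propertyValue As PropertyType) As IList(Of EntityType)
    Try
        Dim s As ISession = GetSession()
        ' queryString is the HQL for this lookup, with one positional parameter.
        ' SetString assumes a String value; use the IQuery setter that matches PropertyType.
        Dim entities As IList(Of EntityType) = s.CreateQuery(queryString) _
            .SetString(0, propertyValue).List(Of EntityType)()
        Return entities
    Finally
        ' Always release the connection, even if the query throws.
        DoSessionCleanup()
    End Try
End Function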

where GetSession and DoSessionCleanup are methods in my data layer that manage the reuse of Session objects.

The question then is where does the query string come from? It could be hard-coded into each method in a string, but that's a fairly inflexible approach. It could also be coded into the method using properties of the ICriteria interface, but that suffers from the same problem (not to mention it's just much too wordy for my taste, but if you like the declarative programming approach, feel free to go for it; I find that approach quite useful when writing unit tests in NUnit, but I'll probably pass in favor of HQL and SQL queries in NHibernate).

I ended up putting my query strings in Resource strings, the advantage being that it takes the query strings out of my code and into a resource file where it can be modified without touching the actual code. I could achieve basically the same end result using named queries, so that's an approach I'll probably look into next time (or maybe this time... there's still time to move the queries from the resource file into the mapping files if I want to give it a shot). The main advantage of moving the queries from the resource file into the named queries is that named queries are parsed once as opposed to queries passed in as strings, which are parsed each time unless they're cached (I believe NHibernate currently caches a certain number of the most recently used queries).
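For reference, a named query is declared with a query element in a mapping file and fetched by name at runtime through the session's GetNamedQuery method. The fragment below is only a sketch: the query name, the Field entity, and its Document and Name properties are placeholders rather than my actual mappings.

```xml
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
  <!-- HQL query with one named parameter, :documentId -->
  <query name="Field.ByDocumentId">
    <![CDATA[
      from Field f
      where f.Document.Id = :documentId
      order by f.Name
    ]]>
  </query>
</hibernate-mapping>
```

In the data layer method, s.CreateQuery(queryString) would then become s.GetNamedQuery("Field.ByDocumentId"), followed by a call such as SetInt32("documentId", someId) to bind the parameter by name.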

Generics and ICriteria
Since the majority of my queries are very similar -- "Get all of the {some object}s where {some property} is {some value}" -- occasionally with an "ordered by {some property}," I could probably also drastically cut down on the number of queries I need to write if I used some combination of ICriteria and generic methods. But wait... didn't I just say that ICriteria wasn't any more flexible than hard-coding the queries as strings? Yes, but for basic lookups you can get a lot of bang for your buck if you write your lookup method using generics and a few input parameters. Here's a quick example of my previous pseudocode implemented using generics:
Public Function GetFilteredEntities(Of T As EntityBase) _
        (ByVal propertyName As String, ByVal propertyValue As Object, _
        Optional ByVal orderBy As String = Nothing) As IList(Of T)
    Try
        Dim s As ISession = GetSession()
        ' Filter on a single property: WHERE {propertyName} = {propertyValue}
        Dim crit As ICriteria = s.CreateCriteria(GetType(T)) _
            .Add(Expression.Eq(propertyName, propertyValue))
        ' Sort only when the caller supplied a non-blank property name
        If orderBy IsNot Nothing AndAlso Not String.IsNullOrEmpty(orderBy.Trim()) Then
            crit.AddOrder(Order.Asc(orderBy))
        End If
        ' Use the generic List so the result matches the IList(Of T) return type
        Return crit.List(Of T)()
    Finally
        DoSessionCleanup()
    End Try
End Function

This one method could replace over a dozen methods in my current data layer, along with their associated query string resources.

While generic classes are a very powerful feature, I'm starting to believe that generic methods may be even more powerful. By using generics, this method can return a list of any entity type instead of a single type. In the above example, all of my entity classes inherit from the base class EntityBase so I specified that the generic type passed in be of that type or a subclass. Since each entity will have different properties and I may need to filter the same entity by different properties, I am passing in the name of the property by which to filter as a parameter and the value of that property as another parameter. Also, since ordering the results is a common requirement, I've added an optional parameter to specify a property by which to order the results.

Granted, this one method only handles a very limited set of possible queries, but I've found that this limited set actually covers most of the queries I need to do. In my experience I'm usually doing one of three things. Most often, I'm getting a single item from the database by Id. If I'm not getting a single item, I'm either getting everything from a table (but usually only if the table is fairly small), in which case I could use a method like the one above but without the propertyValue and propertyName parameters, or I'm getting a simple subset of a table, filtered on a single value (all users in group X, all orders for customer Y, all items for order Z, etc.). In this last case, the method above would work perfectly well. I could also use the method listed above for the first case where I'm getting a single item by Id, but I'd probably write another method similar to the one above except that it executes a Load() method instead of using ICriteria and returns a single item instead of a collection containing one item.

If you're concerned with needing to filter on multiple criteria, you could easily extend the method listed above by passing in a dictionary of property name/value pairs instead of the single propertyName, propertyValue parameters. The method would then iterate over the dictionary, adding crit.Add(Expression.Eq(prop.Key, prop.Value)) for each property in the dictionary. This would build a filter where all of the properties must equal their specified values. If you wanted to return everything from a table, pass in an empty dictionary (I'd probably allow Nothing as a value and check for that to skip the loop as well).
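A sketch of that dictionary-based variation follows. I haven't run this version against my own project; it assumes it lives in the same data layer class as the GetFilteredEntities method shown earlier (same imports, same EntityBase base class, same GetSession and DoSessionCleanup helpers). The name GetEntitiesMatchingAll and the filters parameter are new here.

```vbnet
' Sketch only: belongs inside the existing data layer class, next to the
' single-property version. GetSession and DoSessionCleanup are that class's helpers.
Public Function GetEntitiesMatchingAll(Of T As EntityBase)( _
        ByVal filters As IDictionary(Of String, Object), _
        Optional ByVal orderBy As String = Nothing) As IList(Of T)
    Try
        Dim crit As ICriteria = GetSession().CreateCriteria(GetType(T))

        ' Nothing or an empty dictionary means "no filter": return every row.
        If filters IsNot Nothing Then
            For Each prop As KeyValuePair(Of String, Object) In filters
                ' Each Add is combined with the previous ones using AND.
                crit.Add(Expression.Eq(prop.Key, prop.Value))
            Next
        End If

        If Not String.IsNullOrEmpty(orderBy) Then
            crit.AddOrder(Order.Asc(orderBy))
        End If

        Return crit.List(Of T)()
    Finally
        DoSessionCleanup()
    End Try
End Function
```

A caller would fill a Dictionary(Of String, Object) with entries such as "Status" and "LastName" (made-up property names) and pass it in; the keys must be mapped property names of T, not column names.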

If you really wanted to get fancy, you could probably come up with ways to pass in a data structure that can specify more advanced filtering such as "greater than" and "less than" instead of just "equals," or to allow "or" as well as "and," but then your setup for the method and handling of the data structure start becoming too cumbersome to be useful, in my opinion. Unless I really needed the data layer to be totally dynamic, I'd go as far as the parameter arrays and write specialized methods for the remaining 10% of my queries that can't fit into my general-purpose methods.

Coming up

In part three of my series on my first NHibernate project, I'll be discussing the main lessons I've learned and giving a few tips for new NHibernate users.