Sunday, October 19, 2008

My First NHibernate Project, Part 3

Part 1, Part 2

Lessons Learned

What did I take away from this whole experience? For one thing, now that I've worked with NHibernate a bit, I've come to really enjoy it. Its ability to map classes to database structures and its handling of may of the most annoying pieces of data layer coding should make writing my next data access layer much easier. NHibernate has its own little quirks that must be accounted for, but over all if you prefer working with actual objects in your code instead of DataSets, Tables, and Rows, then NHibernate's for you.

Throughout the course of this exercise I've encountered several things that had me pulling out my hair until I found the answer. I've also discovered several tips and tricks to make my life a bit easier. Since I know I'm not the only one to encounter these issues, here's my list of top tips for working with NHibernate.

Don't forget to embed the mapping files!
This is an easy thing to forget and one that bit me more than once after adding new tables to my database, writing the mappings and classes, and then getting all kinds of obscure errors when trying to use them! It's also appears to be one of the most common beginner mistakes based on the number of times I've seen this tip mentioned. Alway, always, always remember to change the Build Action (in the Properties Window) to "Embedded resource." If you don't, NHibernate won't be able to find the mapping file and you'll encounter all sorts of confusing exceptions.

Don't lose your identity
NConstruct created an "Id" property in each of my entity classes that's mapped to the primary key of the underlying table. I like this convention and have stuck with it, but with one small modification... I'm not exposing the properties' Set block publicly. In C# this is very easy because you can set access to the Get and Set blocks separately so I made my Id's Get block public but my Set block Protected (I considered making BOTH protected since you shouldn't be using your primary key in any business logic but eventually decided against it). You can't do this in VB.NET (unless it's been added since .NET 3.0), so you can't do this if you map the database columns directly to properties, but if you map the database columns to fields you should be able to write your Id property using the ReadOnly modifier.Of course, this only applies if your primary keys are automatically generated by the database (which is generally a very good idea). If you are assigning the primary keys yourself and need to have the consumer of the entity objects provide the key value for new objects then you will need to make your Id field read/write.

If your id is set by the database, remember to use the <generator class="native"/> tag inside the "id" tag (you could also use "identity," "sequence," or "hilo" depending on your database). If your application is assigning the identity to the object, use the "assigned" tag. If your entity class is responsible for assigning its own id, then you will need to use one of the other supplied generator classes or write your own generator class.

Closing Sessions is good, but creates extra work
The easiest approach to session management is to create one session, keep it open, and use it throughout the lifetime of the application. However, since Sessions encapsulate database connections, which are a limited resource on the database, this usually isn't the best approach... you can probably get away with it on a single-user database, but generally this is a "bad thing".

Since database interaction in typical applications (i.e. information is retrieved from the database and presented to the user; the user reads and possibly edits the data; any changes are written back to the database) are very "bursty," sessions kept open throughout the life of an application will remain idle most of the time, consuming database resources unnecessarily. A better approach is to only keep sessions open long enough to perform the required database CRUD (create, read, update, delete) operations and then close the session to free up database resources for other users.

This approach leads to a more complicated session management strategy, however. For one thing, you need to reconnect your entities to a session before you can save any changes to them back to the database. This could be a newly-created session, or a session that you've reconnected to the database. I've found that the latter approach works best for me.

Basically, the first time I need a session I get one from the SessionFactory. After I'm done with it, I flush it and disconnect it. Disconnecting the session frees up the database connection but leaves the session object available for reuse, freeing my application from the expense of having to create a new session the next time I need one. When it's time to save or update the database, I reconnect the session, lock the entity to the session, and call the appropriate session methods to persist my changes back to the database.

Lazy loading is very useful... but tricky
If you follow the same decision I did and disconnect your sessions when not in use, you'll probably run afoul of one of the other issues I had: problems retrieving items from collections that were defined as lazy loading. While lazy loading is definitely useful since it delays reading data from the database until it's actually needed, it won't work if the session that loaded the entity is disconnected. There are various ways of dealing with this, but that's a broad enough topic for an article of its own (soon to come).

Sets... not as easy to work with as you'd think
Sets are one of the main types of collections in NHibernate. Unless you have a relationship between tables where the child table has a column that explicitly stores sequential index values, you'll most likely be retrieving child records as an ISet(Of T). The concrete class that you'll be dealing with (if the collection is loaded and not proxied because it's defined as lazy loaded) is a HashedSet.

If you're used to automatically going to some variant of the List class whenever you need a collection (and especially if you migrated to VB.NET from VB6), you'll be shocked to discover that ISet doesn't have an Item property. ISet is not an indexed collection. In fact, that's the whole point of being a set: a set is an unordered collection of distinct values. What this means to you as a developer is that only one of any specific entity can exist within the set and that the set is not sorted or organized in any specific way. Technically, if you retrieve the elements in a set there is a predictable order in which they will be retrieved, but for all practical intents, you cannot rely on the elements in a set to be retrieved in any specific order. A set exists to hold the child entities related to your parent entity, nothing else.

This can take some getting used to. While constructs like For Each x In y... Next will work on Sets, constructs like
For i As Integer = 0 To set.Length -1
Dim x as entityType = set(i)
' do something here with x
will not because there is no default Item property (there's no Item property at all). This means that if you are used to accessing collection items through their index your strategies for accessing items in the collection are going to have to be a bit different.

Objects in collections need special care
I had an entity class that contained a collection of entities from a related table (in this case a form Document and its collection of Fields). When tried to remove one of the fields from the document I discovered that the collection's Remove method was failing and I didn't know why. I finally solved the problem but it took quite some time to figure out (1) what exactly was happening, and (2) what to do about it. The ins and outs of this one are a whole article by itself, so I'll try to summarize this one here and write a more detailed explanation later.

Basically, what was happening was this: I had overridden my entity class' Equals() and GetHashCode() methods and I was apparently a bit over-zealous in what I felt was necessary to say that two instances were "equal" to each other. The real problem was actually my implementation of GetHashCode(), but you can't really override one without overriding the other (more on that in a later article).

Since my collection of fields was an unordered collection, it was defined in the mapping file as a Set. In my code, this set was implemented as a HashedSet. The importance of this is the fact that when an item is added to a HashedSet, its hashcode is used as the key for storing the entity. If you later try to remove that item (or find it using Contains) and pass in a reference to an item in the collection, the HashedSet calculates the hashcode of the item being passed in and uses that hashcode to find the item in the collection.

I got into trouble because the properties of the object I used in my hashcode were not immutable. Somewhere between adding the field to the collection and trying to remove it, some of the properties that are used in the hashcode were changed. As a result, when I tried to remove the item, its current hashcode no longer matched the hashcode that was used to place it in the collection so the HashedSet was unable to find the item again to remove it.

In short, the solution was to change which properties I used in my hashcode (I got a little less zealous about what constituted a "unique" entity) and to make those properties immutable. The actual implementation turned out to be a real pain because that meant having to make a number of changes in the rest of my code to accomodate the fact that I could no longer change the values of those properties. Since GetHashCode and Equals are closely related it also meant changing the implementation of Equals and rethinking exactly what I mean by "equals." To be totally consistant, this also meant changing my implementation of GetHashCode and Equals in several other entity classes as well as modifying code that was no longer valid because the properties used in those methods were now also immutable.

There's a lot more involved in the underlying theory behind object equality, hashcodes, and my final implementations, but those are topics for a later discussion.

Bulk Updates -- Not NHibernate's cup of tea
As much as I'm growing to like NHibernate, I discovered one weakness: it's lack of support for bulk updates. I believe that the recently-released version 2.0 of NHibernate is making inroads on this problem, but with NHibernate 1.2 it doesn't appear that bulk updates/deletes/inserts are possible. In order to insert, update, or delete multiple items using NHibernate 1.2 it appears you only have two choices:
  1. Perform each insert, update, or delete one at a time
  2. Bypass NHibernate and perform your mass operation using an ADO.NET Command.
The first option is obviously very inefficient, especially if you have to retrieve each entity, update it, and save the changes back to the database. Unless you're only dealing with a small number of entities, this isn't really even an option. The second option is really the only option of the two. Fortunately, you won't have to create and maintain a second connection to the database. You can share the connection object used by NHibernate by calling s.Connection.CreateCommand().

From what I understand, in NHibernate 2.0 you can execute bulk queries. To do this, you can write a named SQL Query, get it into an IQuery object, and then call ExecuteUpdate() on that object. The next release of NHibernate should go even further by allowing you to create mass update queries using HQL. For more information about bulk queries, check out this article.

full window
full window