Wednesday, August 5, 2009

Validation within the Entity Framework

During last year's DevConnections in Vegas I could definitely feel a big divide about the Entity Framework. Some people were diving right into it, others were rather critical, especially people who had been betting on LINQ to SQL. I think whenever Microsoft makes a large structural change in its preferred data access layer, there are bound to be some waves. Personally, I never really got into datasets. They are useful in some scenarios, but my preference truly is an object layer like the Entity Framework. So I have taken a deep dive into the Entity Framework, actually developed a large system in it, and came out alive and breathing (rather than floating belly up). There are definitely some gotchas that you have to pick up on, certainly in the first release of EF, but overall it is workable.

One of my preferences is to always put an additional layer between a framework like EF and the actual application: a kind of mini-framework where you can deal with oddities, bugs and often-used extensions in one place, without copying them left and right through your code. Part of the framework that we developed on top of EF is a validation engine.

The idea goes back to some work I did in the mid-90s on automatic generation of test cases: make validation an integral part of the application, so that the actual communication continues to be checked against the contract. The idea was applicable to a communications bus. You can validate the actual communication against a model of what it is supposed to be and pinpoint any differences. It can also be helpful in plugging in test mock-ups. As a side note, that same work also included a note on applying the capability maturity model to testing. We threw the idea out as not being practical for the study we were doing. Enter T-Map about two years later. Oh well, I never said I am doing this work because of my keen business instinct. A similar example closer to the .NET home is the code contracts library added to .NET 4.0.

The validation framework does something similar for the interface between the application and the database. By providing a predefined set of validation rules we reduce the number of mistakes made in actually implementing the validation. You can also code custom validation rules for more complex scenarios.

I quickly discovered a rub, though, that gave me the typical 'duh' moment. An invariant defined in your contract is an invariant in any state. If you state that within a class the value of property A is always bigger than the value of property B, that is always going to be enforced. Sounds like what you want, but what about when you are assigning values to an object of that class? Programmers will quickly become annoyed once they figure out that they have to make sure the invariant holds after every single assignment. For instance: property A has value 1 and property B has value 0. You want to assign property A value 3 and property B value 2. If you assign B first, you end up with A = 1 and B = 2 and violate the invariant, so the only way to do it is to assign the value to property A first (an easy way around this is a simultaneous assignment operator, but C# doesn't have that). If you have many classes with this issue splattered all over them, you will see your programmers transform from quiet geeks into raving monsters. So the validation framework is capable of selectively applying rules based upon the state of the entity.
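To make the idea concrete, here is a minimal sketch of state-dependent rule application. It is written in Python rather than C# purely for brevity, and it is not the actual framework: the names `EntityState`, `Rule`, `Pair` and `validate` are all made up for illustration. The A > B rule is only enforced when the entity is about to be saved, so the intermediate assignment is allowed.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, FrozenSet


class EntityState(Enum):          # hypothetical states, for illustration only
    EDITING = auto()              # values are still being assigned
    SAVING = auto()               # entity is about to be persisted


@dataclass
class Pair:                       # hypothetical entity with the A > B invariant
    a: int
    b: int


@dataclass(frozen=True)
class Rule:
    message: str
    check: Callable[[Pair], bool]
    applies_in: FrozenSet[EntityState]   # states in which the rule is enforced


def validate(entity, state, rules):
    """Return the messages of all rules that apply in `state` and fail."""
    return [r.message for r in rules
            if state in r.applies_in and not r.check(entity)]


RULES = [
    Rule("A must be greater than B",
         lambda e: e.a > e.b,
         frozenset({EntityState.SAVING})),
]

pair = Pair(a=1, b=0)
pair.b = 2                        # temporarily breaks A > B
errors_while_editing = validate(pair, EntityState.EDITING, RULES)
errors_if_saved_now = validate(pair, EntityState.SAVING, RULES)
pair.a = 3                        # invariant restored before saving
errors_on_save = validate(pair, EntityState.SAVING, RULES)
```

The order of the two assignments no longer matters, because the rule is simply skipped while the entity is being edited and only checked at the point where it matters.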

A core assumption of the validation framework is that everything is a function. So rather than evaluating the value of a property, we evaluate a function on the value of a property, and that function can also consider the object that the property belongs to and the evaluation context (which contains a reference to the EF object context). It's flexible enough that I only had to implement one custom validation rule for the whole project. All other validations were covered by configuring instances of the standard rules that the framework provides.
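A rough sketch of what "everything is a function" can look like, again in Python and again with invented names (`Context`, `required`, `max_length`, `unique_code`, `Product`); the real framework runs on top of EF and its API is not shown in this post. Each rule is a function of the property value, the owning entity and an evaluation context, and the "standard rules" are just small factories that you configure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Context:
    """Hypothetical evaluation context; a real one would reference the EF object context."""
    existing_codes: set = field(default_factory=set)


# A rule is a function: (value, entity, context) -> error message or None.
Rule = Callable[[Any, Any, Context], Optional[str]]


def required() -> Rule:
    return lambda value, entity, ctx: (
        "value is required" if value in (None, "") else None)


def max_length(limit: int) -> Rule:
    return lambda value, entity, ctx: (
        None if value is None or len(value) <= limit else f"longer than {limit}")


def unique_code() -> Rule:
    # Looks beyond the entity itself by consulting the context.
    return lambda value, entity, ctx: (
        "code already in use" if value in ctx.existing_codes else None)


def validate(entity, rules_by_property, ctx):
    """Apply every configured rule to its property and collect the failures."""
    errors = []
    for prop, rules in rules_by_property.items():
        value = getattr(entity, prop)
        for rule in rules:
            message = rule(value, entity, ctx)
            if message:
                errors.append(f"{prop}: {message}")
    return errors


@dataclass
class Product:                    # hypothetical entity
    code: str
    name: str


# "Configuring instances of the standard rules" for one entity type.
PRODUCT_RULES = {
    "code": [required(), max_length(5), unique_code()],
    "name": [required()],
}
```

Calling `validate(Product("ABC", ""), PRODUCT_RULES, Context(existing_codes={"ABC"}))` reports a duplicate code and a missing name. A custom rule is nothing special in this setup: it is one more function with the same signature.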

Very early on I specify the rules (mostly in XML; there is seldom a need for custom rules that are coded). As the code is developed, the validation acts as a constant reminder of what the stored data should adhere to. The nice thing about the Entity Framework is that the object state manager allows you to access old and new values rather easily, which makes it easier to develop fast and complex validations. During development the rules are refined. Sometimes I feel like loosening up a rule, but most of the time I end up changing the code to comply, since it is the right thing to do :). In production, the rules are still validated and constantly check the data going into the database, catching odd scenarios that were not covered originally.
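As an illustration of why access to old and new values is handy, here is a small Python sketch. The `Change` class is a made-up stand-in for a tracked entry that exposes original and current values, and the two rules are invented examples, not rules from the actual project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical stand-in for a tracked entry with original and current values."""
    original: dict
    current: dict


def immutable(prop):
    """Rule: the property may not change once it has been stored."""
    def rule(change: Change) -> Optional[str]:
        if change.original[prop] != change.current[prop]:
            return f"{prop} may not change once stored"
        return None
    return rule


def non_decreasing(prop):
    """Rule: the new value may not be lower than the stored value."""
    def rule(change: Change) -> Optional[str]:
        old, new = change.original[prop], change.current[prop]
        if new < old:
            return f"{prop} may not decrease ({old} -> {new})"
        return None
    return rule


def validate(change, rules):
    """Run all rules against one change and collect the failures."""
    return [message for message in (rule(change) for rule in rules) if message]


RULES = [immutable("order_no"), non_decreasing("revision")]
```

Rules like these cannot be expressed by looking at the current values alone, which is why having both versions of the entity at hand during the save makes a difference.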

On top of that it is rather easy to add audit trails (automatically track who changed what fields) and support standard fields. For instance, if it's standard to have fields to track who created/updated a record and when it was created/updated, you can easily implement a standard interface and automatically fill these fields as the Entity Framework saves changes.
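The standard-fields part can be sketched as follows, once more in Python with invented names (`Audited`, `Customer`, `Context.save_changes`). This toy context only mimics the shape of the idea: entities that implement the standard interface get their who/when fields filled at save time, so no caller has to remember to do it. In EF itself the natural place for such a hook would be the object context's `SavingChanges` event.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Audited:
    """Hypothetical 'standard fields' interface: who/when for create and update."""
    created_by: Optional[str] = None
    created_at: Optional[datetime] = None
    updated_by: Optional[str] = None
    updated_at: Optional[datetime] = None


@dataclass
class Customer(Audited):          # hypothetical entity
    name: str = ""


class Context:
    """Toy stand-in for an object context that tracks added and modified entities."""

    def __init__(self, user, clock=lambda: datetime.now(timezone.utc)):
        self.user = user
        self.clock = clock
        self.added = []
        self.modified = []

    def save_changes(self):
        now = self.clock()
        for entity in self.added:
            if isinstance(entity, Audited):
                entity.created_by, entity.created_at = self.user, now
                entity.updated_by, entity.updated_at = self.user, now
        for entity in self.modified:
            if isinstance(entity, Audited):
                entity.updated_by, entity.updated_at = self.user, now
        saved = len(self.added) + len(self.modified)
        self.added, self.modified = [], []   # a real context would write to the database here
        return saved
```

An audit trail of changed fields would hang off the same hook, using the old and new values of each modified entity.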

I definitely enjoyed working with the Entity Framework, and it has lifted development standards way above the "we call it a business layer, but really it just provides easy access to our stored procedures" approach. We have been capable of performing complex changes in a very short amount of time. The major pains are probably:

  • Integration with UI components through databinding is sometimes awkward or inefficient. I truly hope that the integration will become good enough to be able to generate complex LINQ to Entities queries automatically with efficient use of paging.
  • Explicit loading. I wrapped that in some additional methods and used some extension methods to make it easier and to avoid having the code littered with explicit load statements.
  • Attach/Detach: the rule is that an attach automatically cascades through the associations while a detach doesn't. The object-oriented approach of the Entity Framework came to good use when implementing some what-if scenarios. But some of the results of that evaluation had to be saved, while temporary artifacts should never end up in the database. That caused some rather ugly scenes with additional DeepDetach methods to clean everything up. Having two different contexts (one that contains the changes you want to save and one that contains the changes you don't want to save) is an option, but since objects can't be in two contexts, that also requires some additional code. Related to that is the rule that you can't save changes in a context until all reads have been completed. That makes 'pipeline' processing, where you save intermediate results, more complicated.
