Composable Data Validation with Haskell
Recently, a client asked us to work on a new Rules Engine for them. This system serves as the back end to a highly configurable dashboard where non-technical users define business rules for how a certain component of the system behaves. When deployed, this system has to handle a substantial production workload, so performance is also a key consideration.
So, our primary requirements were that this rules engine would be:
- Configurable - The end user should be given toggles to get the desired behavior from the fixed kinds of rules.
- Fast - This system should be able to handle several hundred validation requests per second.
- Robust - The system should not go down and should be easy to modify on the fly.
A secondary requirement related to robustness would be that the rules be:
- Declarative - A product manager should be able to look at the rule code and have a reasonably accurate understanding as to what the rule is doing.
Creating an eDSL
In order to meet the above requirements, we decided to write a small embedded domain-specific language (eDSL) to enable writing declarative validation rules. This article will show a simplified version of the actual language being used in production. We will be using a shallow embedding also known as a monomorphic final tagless encoding.
The first question that we must ask ourselves is not what operations we want to perform but what is the essence of our problem or domain, this is often called denotational design.
In our case, this is relatively simple—a validation is a mapping from an input value to whether it is valid or not. Putting this into code:
newtype ValidationRule a = ValidationRule { validate :: a -> Bool }
Given this as a basis we can start to write functions that help us to build these validation rules, for our basic rules we will use a convention of a trailing “_” to avoid name collisions. A simple example is the equality rule:
eq_ :: Eq a => a -> ValidationRule a
= ValidationRule $ \actual -> actual == ruleValue eq_ ruleValue
In the above, ruleValue
is a value that is embedded in the ValidationRule
whereas actual
is the value that this rule will attempt to validate during a validation operation. An example of how this rule can be built and configured is:
equalsFive :: ValidationRule Int
= eq_ 5 equalsFive
Already we see that rules are composed of both static and dynamic parts. The shape of the rule is statically determined by the function eq_
but the value (which is 5
in our example) can be any runtime value. eq_
isn’t the only rule we may want. Let’s make a few more basic rules:
gt_ :: Ord a => a -> ValidationRule a
lt_,= ValidationRule $ \actual -> actual > ruleValue
gt_ ruleValue = ValidationRule $ \actual -> actual < ruleValue lt_ ruleValue
Making rules composable
With this, our eDSL gives us a way to compare a value to something, but this alone is not very useful. We can make bespoke rules each time there is a new business use case, but what we need is the ability to build up larger validations. In boolean logic we have two main operations for conjunction and disjunction also know as AND
and OR
. Let’s write them:
or_ :: ValidationRule a -> ValidationRule a -> ValidationRule a
and_,= ValidationRule $ \actual ->
and_ rule1 rule2 && validate rule2 actual
validate rule1 actual = ValidationRule $ \actual ->
or_ rule1 rule2 || validate rule2 actual validate rule1 actual
It’s easy for us to add support for the inverse, or NOT
:
not_ :: ValidationRule a -> ValidationRule a
= ValidationRule $ \actual -> not $ validate rule actual not_ rule
We can compose the rules we have written so far to create new combinators. Let’s create the greater than or equal to operator (>=
in most languages):
geq_ :: Ord a => a -> ValidationRule a
= gt_ value `or_` eq_ value geq_ value
Handling complex data types
At the moment, there is still a key limitation with our language. All of these rules need to be of the same type and we do not have a way to change that type. Usually when we see something that looks like f a
and we want f b
we reach for fmap
on the Functor
type class. Let’s try to write that for our type:
instance Functor ValidationRule where
fmap :: (a -> b) -> ValidationRule a -> ValidationRule b
fmap f rule = ValidationRule $ \actual ->
???? validate rule
We have run into a problem, we have a function a -> b
and a ValidationRule a
and want a ValidationRule b
. This means when we try to construct our new validation rule we have a value actual :: b
but validate rule
needs something of type a
. If we had a function b -> a
we could achieve this. However, f
is the reverse of what we need. It seems like we want something that is like a reverse functor:
notQuiteFmap :: (b -> a) -> ValidationRule a -> ValidationRule b
= ValidationRule $ \actual ->
notQuiteFmap f rule validate rule (f actual)
This function has a name, contramap
, and it belongs to the Contravariant
Functor also known as a Cofunctor. The documentation for Contravariant
defines the difference between the two:
Whereas in Haskell, one can think of a Functor as containing or producing values, a contravariant functor is a functor that can be thought of as consuming values.
In fact, the example used is Predicate a
which is exactly the same as our type ValidationRule a
. It’s always nice when you can find a preexisting type to validate your approach.
The example provided (checking whether an account balance is overdrawn) is good demonstration of how to use contramap
, so let’s implement it. First we set up the data type of our account and our negative rule which is in terms of an Integer
:
data Account = Account
accountBalance :: Integer
{ accountName :: Text
, accountOwner :: Text
,
}
negative_ :: ValidationRule Integer
= lt_ 0 negative_
We want to be able to validate an Account
but we only have a rule that works on Integer
and we don’t want to be writing lots of one-off rules. This is where we can apply the contramap
function:
overdrawn :: ValidationRule Account
= contramap accountBalance negative_ overdrawn
That’s fairly terse, so what does it mean? In plain English we can say an Account
(the type that’s being validated) is overdrawn
(the name of the function) if the accountBalance
(the name of the field we are using contramap
with) is negative_
(the rule we contramapped). We can imagine this being used in a larger validation:
accountOwnedBy :: Text -> ValidationRule Account
= contramap accountOwner $ eq_ owner
accountOwnedBy owner
withdrawalAllowed :: ValidationRule Account
=
withdrawalAllowed "Alice" `and_` (not_ overdrawn) accountOwnedBy
So we are now able to make fairly complex validation rules which read almost verbatim like English. “A withdrawal on this Account is allowed if the account is owned by Alice and it is not overdrawn.” That’s pretty cool but we’re starting to see the ergonomics of our API fray.
Adding validation results
Looking at the above example, we know if a withdrawal is allowed or not but we don’t know why it is not allowed. For example, the failure may have occurred due to the account being overdrawn, but it could also have been because the owner of the account was Bob rather than Alice.
Let’s revise our semantic domain a bit to help us keep track of why something failed. In order to track this we’re going to define a Validation
data type instead of using a Bool
for our return value:
type ErrMsg = Text
data Validation err
= Success
| Failure err
type ValidationResult = Validation [ErrMsg]
success :: ValidationResult
= Success
success
failure :: ErrMsg -> ValidationResult
= Failure [errMsg]
failure errMsg
newtype ValidationRule a =
ValidationRule { validate :: a -> ValidationResult }
We’re going to need to rewrite our core functions now to use the new representation. We will only rewrite a few of them. The rest are left as an exercise to the reader.
eq_ :: (Show a, Eq a) => a -> ValidationRule a
= ValidationRule $ \actual ->
eq_ value if actual == value then
successelse
$ "Expected " <> show actual <> " to equal " <> show value)
failure (Text.pack
and_ :: ValidationRule a -> ValidationRule a -> ValidationRule a
= ValidationRule $ \actual ->
and_ rule1 rule2 case (validate rule1 actual, validate rule2 actual) of
Failure e1, Failure e2) -> Failure (e1 <> e2)
(Failure e1, _) -> Failure e1
(Failure e2) -> Failure e2
(_, Success, Success) -> Success
(
or_ :: ValidationRule a -> ValidationRule a -> ValidationRule a
= ValidationRule $ \actual ->
or_ rule1 rule2 case (validate rule1 actual, validate rule2 actual) of
Failure e1, Failure e2) -> Failure (e1 <> e2)
(Success, _) -> Success
(Success) -> Success (_,
Now we can run our validate function and if there is a failure we will know why the validation failed. And the best part is, we don’t actually have to change how the top level rule is written (though we do have to recompile since we’re not taking full advantage of the polymorphic final tagless approach). Since only our primitives are aware of ValidationResult
, we did not have to update any of the more sophisticated business rules during this change.
Conclusion
In this post we solved the problem of configurable validation by creating an eDSL that allows us to create sophisticated business rules through the composition of primitive rules. The performance of validating inputs using this approach is very good since there is almost no interpretive overhead when composing functions as we have done here.
The eDSL we have shown here is only slightly simpler than what we built for our client, and illustrates the core ideas clearly. A language like this one can continue to evolve as necessary, oftentimes without requiring existing rules to be rewritten, as we saw above. One aspect we haven’t covered in this post is adding context to rules, giving us the ability to see where one of our primitive rules failed relative to a larger validation (e.g. we expected “Alice” but got “Bob” in the context of validating the account owner).
Ben Levy and Christian Charukiewicz are Partners and Principal Software Engineers at Foxhound Systems. At Foxhound Systems, we focus on building fast and reliable custom software. Are you looking for help with something you’re working on? Reach out to us at [email protected].