
DMN: Data Validation Reconsidered

By Bruce Silver

Read Time: 5 Minutes

In a post last summer, I began to question the value of assigning constraints like a range of allowed values to datatypes in DMN, in favor of performing data validation using a decision table inside the decision model.

There were a few reasons for this:

But since version 11.4, Trisotech has significantly enhanced both constraint definition and type checking, and I am now changing my tune. You still need a bit of data validation with a decision table for a few types of errors – for example, when a valid value for the variable depends on the values of other variables – but I now think it’s best to do most data validation with type checking against the item definition. In this post we’ll see how that works.

The root of the problem is this: You can say some input data conforms to a certain datatype, but you cannot guarantee that an invoking client will actually provide that. It could be an invalid value or missing entirely. Without type checking, your logic may initially produce null but downstream that null will likely generate a cryptic runtime error. So you really want to catch the problem up front.

Enhanced Constraint Definition

Prior to version 11.4, the allowed constraints on an item definition mirrored the simple unary tests allowed in a decision table input entry. You could specify either an enumerated list or a numeric range, but that’s it. Now constraints can be defined as any generalized unary test, meaning any FEEL expression, replacing the variable name with the ? character. For example, to specify item definition tInteger as an integer, you can say it is basic type Number with the constraint ? = floor(?). And here is something interesting: An expression like that used in a decision table requires first ensuring that the value is a Number. If it is Text or null, the floor() function will return a runtime error. For example, in a decision table rule validating integer myInt, where true returns an error message, you would need something like if ? instance of Number then not(?=floor(?)) else true. But with type checking on tInteger as defined above, you don’t need the instance of condition. Type checking generates an appropriate error message when the base type is incorrect, simplifying the type check expressions. Even better, with each constraint specified for an item definition, modelers can define their own error message and unique error code, simplifying documentation and training.
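To make the two-stage check concrete, here is a minimal Python sketch of the idea, not the tool's implementation: the base type is checked first, and only then is the constraint ? = floor(?) evaluated. The function name check_integer and the codes TYPE and INT-001 are hypothetical, chosen only for illustration.

```python
import math

def check_integer(value):
    """Return a list of (code, message) errors for a tInteger-like value."""
    # Stage 1: base type check. bool is excluded because Python treats
    # True/False as ints; NaN and infinity are rejected as non-numbers here.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not math.isfinite(value):
        return [("TYPE", "Value must be a Number")]
    # Stage 2: the constraint, corresponding to the unary test  ? = floor(?)
    if value != math.floor(value):
        return [("INT-001", "Value must be an integer")]
    return []

print(check_integer(4.2))
print(check_integer(None))
```

Because the base type is handled before the constraint runs, the constraint itself never has to guard against Text or null, which is the simplification described above.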

Here is an example. Input data User is structured type tUser with three components: Name, which is text; ID, text with the format A followed by 5 digits; and Cost Center, a positive integer. All three components are required. Now in the item definition dialog, you can add a constraint expression as a generalized unary test, assign it a unique Validation Code, and specify the Validation Message returned on an error. Below you see the item definition for tName. Here we actually defined two expressions, one to test for a null value and a second one to test for an empty string. If an invoking client omits Name entirely, the value is null. But when you test it in the DMN modeling environment, leaving Name blank in the HTML form produces not null but the empty string. (You can get it to produce null by clicking Delete in that field.) So you really need to test both conditions. It turns out null returns both error messages.

The type definitions for all three components are shown below. ID uses the matches function, which checks the value against a regular expression. We don’t need to test for null or empty string explicitly, as both of those will trigger the error. And Cost Center uses the integer test we discussed previously.
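As a rough illustration of how the three component constraints behave together, here is a Python sketch under stated assumptions. The validation codes, messages, and the names CONSTRAINTS and validate_user are hypothetical, and the regular expression is one reading of "A followed by 5 digits"; the actual item definitions are FEEL unary tests in the tool.

```python
import math
import re

def _is_number(v):
    # A finite int or float; bool is excluded because it is an int subclass.
    return isinstance(v, (int, float)) and not isinstance(v, bool) and math.isfinite(v)

# field -> list of (validation code, validation message, test); all hypothetical.
CONSTRAINTS = {
    "Name": [
        ("NAME-1", "Name is required", lambda v: v is not None),
        # A null Name also fails this test, so null yields both messages.
        ("NAME-2", "Name must not be empty", lambda v: isinstance(v, str) and v != ""),
    ],
    "ID": [
        # Null and the empty string fail the pattern, so no separate test is needed.
        ("ID-1", "ID must be A followed by 5 digits",
         lambda v: isinstance(v, str) and re.fullmatch(r"A[0-9]{5}", v) is not None),
    ],
    "Cost Center": [
        ("CC-1", "Cost Center must be a positive integer",
         lambda v: _is_number(v) and v > 0 and v == math.floor(v)),
    ],
}

def validate_user(user):
    """Return {field: [(code, message), ...]} for every failed constraint."""
    errors = {}
    for field, rules in CONSTRAINTS.items():
        value = user.get(field)  # a missing component is treated as null
        failed = [(code, msg) for code, msg, ok in rules if not ok(value)]
        if failed:
            errors[field] = failed
    return errors

print(validate_user({"Name": "", "ID": "A1234", "Cost Center": 2.5}))
```

Calling validate_user({}) reports all three components, with two entries for Name, mirroring the observation above that null triggers both Name messages.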

Type Checking in Operation

Let’s see what happens with invalid data. Below you see the simple decision Hello New User, which returns a welcome message. What if we run it in Execution/Test with all three fields blank? The data is never submitted to the decision. Instead, the modeler-defined Validation Message and Validation Code are shown in red below each invalid field. If this decision service were deployed, a call to it omitting all three elements would return an HTTP 400 error response, also containing the modeler-defined Validation Message and Validation Code.

Validation Levels

This is all very nice! But it’s quite a change from the way the software worked previously. For that reason, you can set the validation level in the tool under File/Preferences.

As you can see, you have three choices. None turns off type checking; it is how the software worked previously. External data entry effectively checks only input data. If you’ve thoroughly tested your model, that’s the only place invalid data is going to appear. Always type checks everything. Trisotech recommends using Always until you’ve thoroughly tested your model.

A Benefit for Students

In my DMN training, I am beginning to see the value of that. To be certified, students must create a decision model containing certain required elements, including some advanced ones like iteration and filter. And they often make mistakes with the type definitions, not so much in the input data but in the decision logic, for example, when a decision’s value expression does not produce data consistent with the assigned type. Now, instead of me telling the student about this problem, the software does it for me automatically! It’s too early to tell whether students like that or not, but in the end it’s a big help to everyone.

Follow Bruce Silver on Method & Style.


Two DMN Solutions to the Same Problem

By Trisotech

Read Time: 5 Minutes

Often, multiple methods can be used to solve the same business problem. In this blog we will briefly explore two different ways to create a DMN solution to the same decision problem and discuss the pros and cons of each solution.

The Problem

As part of a regulatory process, a government agency wants to determine if an applicant is eligible for a resident permit.

The rule is simple enough. An applicant is eligible for a resident permit if, while married, the applicant and spouse have shared the same address for at least 7 of the last 10 years.

In terms of inputs to make that decision, we are provided with three lists:

1. A list of periods living at an address for applicant (From, To, Address)
2. A list of periods living at an address for spouse (From, To, Address)
3. A list of applicant and spouse marriage periods (From, To)

Modeling the Decision as Stated (Model 1)

The agency suggests that we calculate the time in years, months and days where the above time periods overlap, then evaluate whether this condition has lasted at least 7 of the last 10 years.

The Decision Requirements Diagram (DRD) below captures that method.

Model 1
Eligibility DRD as stated

In the DRD above, the three provided lists are used as Data Inputs to a Decision that produces a collection of Periods Overlaps, which is then submitted to a Decision on whether these Periods Overlaps add up to Seven of the last ten years.

Model 1
Defining our Input and Output types

For our inputs, we first define a type tPeriod as follows:

The List of applicant and spouse marriage periods (From, To) is then simply defined as a Collection of tPeriod.

As for both the List of periods living at an address for applicant (From, To, Address) and the List of periods living at an address for spouse (From, To, Address), they are defined as a Collection of tLivingAdress that reuses our tPeriod type definition above.

Our Periods Overlaps Decision will then produce a Collection of tPeriod, and finally our Seven of the last ten years Decision will provide us with a Boolean true or false.

Model 1
Period Overlaps logic

To obtain the Periods Overlaps Decision which calculates the overlap periods where the Applicant was married and living at the same address as Spouse, we will progressively build a Boxed Context. This is done here to help more novice readers. Advanced users may skip to the completed Boxed Context at the end of this section.

Our first step is to find Common Addresses between the applicant and the spouse, so that we can filter the periods down to only those at common addresses. By doing so, we avoid any further processing of periods when they were not living together.

To achieve this, we take the Address portion of each entry in the List of periods living at an address for applicant and filter that list by retaining only those addresses that also appear as the Address portion of an entry in the List of periods living at an address for spouse. This provides us with a collection of Common Addresses for both the applicant and spouse. As you can see below, the expression language FEEL from DMN makes this logic quite simple to follow and author. Note that the natural language annotations above the double blue lines make the logic even more obvious to a novice decision modeler or reader.

Using the collection of Common Addresses, we will now filter the Applicant Periods and the Spouse Periods down to only those that were at a common address.
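The two filtering steps just described can be sketched in Python as follows. This is an illustration of the logic rather than the FEEL boxed expressions from the model; the function names and the sample data are hypothetical, and each period is assumed to be a dictionary with From, To and Address entries.

```python
from datetime import date

def common_addresses(applicant_periods, spouse_periods):
    """Addresses of the applicant that also appear among the spouse's addresses."""
    spouse_addresses = [p["Address"] for p in spouse_periods]
    return [p["Address"] for p in applicant_periods if p["Address"] in spouse_addresses]

def at_common_address(periods, addresses):
    """Keep only the periods spent at one of the common addresses."""
    return [p for p in periods if p["Address"] in addresses]

# Hypothetical sample data
applicant = [
    {"From": date(2010, 1, 1), "To": date(2014, 12, 31), "Address": "1 Main St"},
    {"From": date(2015, 1, 1), "To": date(2023, 12, 31), "Address": "9 Oak Ave"},
]
spouse = [
    {"From": date(2012, 1, 1), "To": date(2023, 12, 31), "Address": "9 Oak Ave"},
]

shared = common_addresses(applicant, spouse)
applicant_common = at_common_address(applicant, shared)
spouse_common = at_common_address(spouse, shared)
print(shared, len(applicant_common), len(spouse_common))
```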

We can now identify the Cohabitation Periods. To achieve this, we will iterate over the Applicant Periods and the Spouse Periods at a common address identified just before and extract only the subperiods that overlap. There is no prebuilt FEEL function that extracts overlapping subperiods, so we need to create a Business Knowledge Model (BKM) that looks at two periods and returns either an overlap subperiod or null. Here is the logic of that BKM.

We can invoke this BKM from within our logic as shown below. Note the Overlap Interval function invocation in the Then portion below.
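As a point of comparison, a function with the same contract as that BKM (two periods in, an overlap subperiod or null out) might look like this in Python. It is a sketch that assumes periods are dictionaries with inclusive From and To dates; the name overlap_interval is borrowed from the BKM name, but the body is not taken from the model.

```python
from datetime import date

def overlap_interval(a, b):
    """Return the overlapping subperiod of a and b, or None if they do not overlap."""
    start = max(a["From"], b["From"])  # the later of the two start dates
    end = min(a["To"], b["To"])        # the earlier of the two end dates
    if start > end:
        return None
    return {"From": start, "To": end}

p1 = {"From": date(2015, 1, 1), "To": date(2018, 12, 31)}
p2 = {"From": date(2017, 6, 1), "To": date(2020, 5, 31)}
p3 = {"From": date(2021, 1, 1), "To": date(2021, 12, 31)}
print(overlap_interval(p1, p2))
print(overlap_interval(p1, p3))
```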

The invocation of this new BKM also affects our DRD. As it turns out, we will also need it to decide on the Seven of the last ten years decision later. Here is the DRD now augmented with the two invocations.

We can now complete our Boxed Expression for the Periods Overlaps decision. Armed with the Collection of Cohabitation Periods, we can now look for overlaps with the Marriage periods and return a single Collection of Periods Overlaps by flattening the Cohabitation in Marriage Collection, while taking care to remove the null entries that our BKM may have introduced when there was no overlap.

Below is the complete Boxed Expression for the Periods Overlaps decision.
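The flatten-and-remove-nulls step can be sketched in Python like this. It assumes the same dictionary representation of periods and a hypothetical overlap_interval helper that returns None when two periods do not overlap; the sample data is invented for illustration.

```python
from datetime import date

def overlap_interval(a, b):
    """Overlapping subperiod of a and b (inclusive dates), or None."""
    start, end = max(a["From"], b["From"]), min(a["To"], b["To"])
    return {"From": start, "To": end} if start <= end else None

def periods_overlaps(cohabitation_periods, marriage_periods):
    """Overlap every cohabitation period with every marriage period."""
    # Nested collection: one inner list per cohabitation period.
    nested = [[overlap_interval(c, m) for m in marriage_periods]
              for c in cohabitation_periods]
    # Flatten to a single list and drop the None entries (no overlap).
    return [p for row in nested for p in row if p is not None]

cohabitation = [
    {"From": date(2012, 1, 1), "To": date(2014, 12, 31)},
    {"From": date(2016, 1, 1), "To": date(2023, 12, 31)},
]
marriages = [{"From": date(2014, 6, 1), "To": date(2020, 12, 31)}]
result = periods_overlaps(cohabitation, marriages)
print(result)
```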

Model 1
Seven of the Last Ten Years Logic

Having obtained a Collection of Periods Overlaps for the applicant and spouse, we can now decide if these overlaps add up to Seven of the last ten years. This Decision simply returns TRUE if the applicant and spouse were living together and married for at least seven of the last ten years. We will not go into all the details this time, as by now you can read the boxed context and its annotations on your own. We will only bring your attention to the invocation of our BKM Overlap Interval, created before, in the return portion of the Periods Overlaps in the last 10 years entry below.

Briefly:
1. Determine the period equating to the last 10 years using today’s date
2. Find the periods from the previous Periods Overlaps decision that occurred during the last 10 years
3. Convert those to the number of days for each period
4. Sum the days and return TRUE if the number of days overlapped is greater than or equal to the number of days in 7 years.
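The four steps above can be sketched in Python as follows. Several details are assumptions made for the sketch rather than facts from the model: periods are dictionaries with inclusive From and To dates, the periods passed in do not overlap one another, "the number of days in 7 years" is taken as the number of days between today and the same date seven years earlier, and today is passed in as a parameter so the example is repeatable. The helper names are hypothetical.

```python
from datetime import date

def overlap_interval(a, b):
    """Overlapping subperiod of a and b (inclusive dates), or None."""
    start, end = max(a["From"], b["From"]), min(a["To"], b["To"])
    return {"From": start, "To": end} if start <= end else None

def years_before(day, years):
    """Same calendar date a number of years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def seven_of_last_ten_years(periods_overlaps, today):
    # 1. The period covering the last 10 years
    last_ten_years = {"From": years_before(today, 10), "To": today}
    # 2. Parts of the overlap periods that fall within the last 10 years
    recent = [overlap_interval(p, last_ten_years) for p in periods_overlaps]
    # 3. Number of days in each remaining period (both end dates counted)
    days = [(p["To"] - p["From"]).days + 1 for p in recent if p is not None]
    # 4. Compare the total with the number of days in 7 years
    required = (today - years_before(today, 7)).days
    return sum(days) >= required

today = date(2024, 1, 1)
long_overlap = [{"From": date(2010, 1, 1), "To": date(2022, 1, 1)}]
short_overlap = [{"From": date(2019, 1, 1), "To": date(2024, 1, 1)}]
print(seven_of_last_ten_years(long_overlap, today))
print(seven_of_last_ten_years(short_overlap, today))
```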

In the end, our decision logic turned out somewhat complex when literally following the seemingly simple suggestion to calculate the time in years, months and days where the time periods overlap, and then evaluate whether this condition has lasted at least 7 of the last 10 years. I wonder if there is another way to tackle this problem?

Modeling the Decision in a Simplified Way (Model 2)

Upon further analysis of the problem, another simpler model (Model 2) that produces the same results can be created. This decision model requires only a single decision context and is much easier to understand than the original model even though it does not use the explicit steps described in the problem statement.

Rather than using periods as the major organizing principle, this second decision model is driven by iterating through each day in the last ten-year period and checking, for each day, whether all the overlapping requirements are met.

Model 2
Simplified Model DRD

Model 2
Seven of the Last Ten Years Logic

Note that we use the same three lists as inputs, keeping their type definitions untouched. Model 2 differs in how we express the logic for the Seven of the last ten years decision.

A brief description of this approach goes like this:

1. Determine the interval of days that represents the Last 10 years using today’s date
2. For each day during the Last 10 years, validate if they were Married and if they were living at the Same Address; if yes, then count this day
3. Compute the total Number of days Married and Living Together by summing all the days that met all criteria from the previous step
4. Validate if the Number of days Married and Living Together is greater than or equal to 7 years

Model 2
Day in Period BKM (Function)

You probably noticed that we introduced a Business Knowledge Model (BKM) in Model 2 as well. This function is invoked by Married, Applicant Address, and Spouse Address in the above Context and returns a Boolean TRUE if the input day is in the input period.
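For readers who prefer code, the day-by-day approach and its Day in Period function can be sketched in Python. This is an illustration only: it assumes periods are dictionaries with inclusive From and To dates (plus Address for the living periods), that a person has at most one address on any given day, and that "7 years" means the number of days between today and the same date seven years earlier. The helper names address_on and years_before are hypothetical, and today is a parameter so that the example is repeatable.

```python
from datetime import date, timedelta

def day_in_period(day, period):
    """True if the day falls within the period (both end dates included)."""
    return period["From"] <= day <= period["To"]

def address_on(day, living_periods):
    """Address lived at on the given day, or None if no period covers it."""
    for p in living_periods:
        if day_in_period(day, p):
            return p["Address"]
    return None

def years_before(day, years):
    """Same calendar date a number of years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def seven_of_last_ten_years(applicant, spouse, marriages, today):
    day = years_before(today, 10)  # first day of the Last 10 years
    days_together = 0
    while day <= today:
        married = any(day_in_period(day, m) for m in marriages)
        applicant_address = address_on(day, applicant)
        spouse_address = address_on(day, spouse)
        if married and applicant_address is not None and applicant_address == spouse_address:
            days_together += 1
        day += timedelta(days=1)
    return days_together >= (today - years_before(today, 7)).days

today = date(2024, 1, 1)
applicant = [{"From": date(2010, 1, 1), "To": today, "Address": "1 Main St"}]
marriages = [{"From": date(2012, 1, 1), "To": today}]
spouse_early = [{"From": date(2015, 1, 1), "To": today, "Address": "1 Main St"}]
spouse_late = [{"From": date(2018, 6, 1), "To": today, "Address": "1 Main St"}]
print(seven_of_last_ten_years(applicant, spouse_early, marriages, today))
print(seven_of_last_ten_years(applicant, spouse_late, marriages, today))
```

Adding a new per-day requirement in this style amounts to one more condition inside the loop, at the price of always visiting every day in the ten-year window.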

Key Takeaways

Using this simple problem, we have presented two quite different ways to model the same decision. Each model has pros and cons to be considered.

In the original model (Model 1), we have created a solution that literally follows the description of the problem as provided. This typically shows the problem is clearly understood and leads to a solution that allows the business participants to follow the Decision Requirement Diagram (DRD) of the decision as they understand it. On the positive side, the approach taken in Model 1 is computationally efficient as we first discard all periods where the Applicant and Spouse were not living at the same address prior to doing any other checks. However, the resulting logic may seem quite complex for what seems like a simple problem. Some rather advanced list creation and comparison logic were needed, and the logic can look a little daunting.

In the second model (Model 2), while the DRD is not as literal to the problem statement as the DRD of Model 1, we obtain logic that is much simpler to understand and maintain. Because conditions are checked for each day, it is easy to see how changes in regulations that add new overlapping condition requirements could be introduced into this logic, which is not so obvious for the logic of Model 1. On the negative side, Model 2 will always loop over every single day whether there is overlap or not, making it less computationally efficient than Model 1.

There you have it. Model 1 leads to a DRD closer to the problem definition, with complex logic that is computationally efficient, while Model 2 leads to simpler logic that is easier to understand, maintain and modify but is less computationally efficient. The choice is yours.

There are often other real-world practical considerations to factor in. Perhaps the most important is resource usage when the model is automated and placed in production. Perhaps it is understandability and maintainability. Some models will perform much faster than others depending on the selection and structure of the context logic. Because DMN allows powerful operations with simple syntax, it is not always obvious how a specific model will perform when automated. This should be tested and optimized if high volumes of automated decisions are to be processed.
