
More On DMN Validation

By Bruce Silver

Read Time: 5 Minutes

This month we return to a topic I’ve written about twice before, data validation in DMN models. This post, in which I will describe a third method, is hopefully the last word.

Beginning decision modelers generally assume that the input data supplied at execution time is complete and valid. But that is not always the case, and when input data is missing or invalid the invoked decision service returns either an error or an incorrect result. When the service returns an error result, typically processing stops at the first one and the error message generated deep within the runtime is too cryptic to be helpful to the modeler. So it is important to precede the main decision logic with a data validation service, either as part of the same decision model or a separate one. It should report all validation errors, not stop at the first one, and should allow more helpful, modeler-defined error messages. There is more than one way to do that, and it turns out that the design of that validation service depends on details of the use case.

The first method, which I wrote about in April 2021, uses a Collect decision table with generalized unary tests to find null or invalid input values, as you see below. When I introduced my DMN training, I thought this was the best way to do it, but it’s really ideal only for the simple models I was using in that training. That is because the method assumes that values used in the logic are easily extracted from the input data, and that the rule logic is readily expressed in a generalized unary test. Moreover, because a decision table that hits an error usually fails without indicating which rule had the problem, the method assumes a modest number of rules with fairly simple validation expressions. As a consequence, this method is best used when:

The second method, which I wrote about in March 2023, takes advantage of enhanced type checking against the item definition, a new feature of DMN 1.5. Unlike the first method, this one returns an error result when validation errors are present, but it returns all errors, not just the first one, each with a modeler-defined error message. Below you see the enhanced type definition, using generalized unary tests, and the modeler-defined error messages when testing in the Trisotech Decision Modeler. Those same error messages are returned in the fault message when executed as a decision service. On the Trisotech platform, this enhanced type checking can be either disabled, enabled only for input data, or enabled for input data and decisions.
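Conceptually, this enhanced type checking evaluates every constraint and collects every violation along with its modeler-defined message. Here is a minimal Python sketch of that behavior; the field names and constraints are hypothetical, purely for illustration:

```python
# Hypothetical sketch of type-checking-style validation: every field is
# checked against its constraint, all violations are collected (not just
# the first), and each failure carries a modeler-defined message.

def validate(record: dict, constraints: dict) -> list:
    """Return the messages for every constraint the record violates."""
    errors = []
    for field, (predicate, message) in constraints.items():
        if not predicate(record.get(field)):
            errors.append(message)
    return errors

# Illustrative constraints, analogous to generalized unary tests
constraints = {
    "age":   (lambda v: v is not None and 0 <= v <= 120, "age must be between 0 and 120"),
    "state": (lambda v: v in {"CA", "NY", "TX"},         "state must be a valid code"),
}

print(validate({"age": 150, "state": "CA"}, constraints))
# → ['age must be between 0 and 120']
```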

This method of data validation avoids many of the limitations of the first method, but it cannot be used if you want the decision service to return a normal response, not a fault, when validation errors are present. Thus it is applicable when:

More recently I have been involved in a large data validation project in which neither of these methods is ideal. Here the input data is a massive data structure containing several hundred elements to be validated, and we want validation errors to generate a normal response, not a fault, with helpful error messages. Moreover, data values used in the rules are often buried deeply within the structure, and many of them recur, so simply extracting their values properly is non-trivial. Think of a tax return or loan application. And once you’ve extracted the needed data values, the validation rules themselves may be complex, depending on conditions of many other variables in the structure.

For these reasons, neither of the two methods described in my previous posts fit the bill here. Because an element’s validation rule can be a complex expression involving multiple elements, this rules out the type-checking method and is a problem as well for the Collect decision table. Decision tables also add the problem of testing. When you have many rules, some of them are going to be coded incorrectly the first time, and if a rule returns an error the whole decision table fails, so debugging is extremely difficult. Moreover, if a rule fails to return the expected result, you need to be able to determine whether it’s because you have incorrectly extracted the data element value or you have incorrectly defined the rule logic. Your validation method needs to separate those concerns.

This defines a new set of requirements:

The third method thus requires a more complex architecture, comprising:

While overkill for simple validation services, in complex validation scenarios this method has a number of distinct advantages over the other two:

Let’s walk through this third data validation method. We start with the Extraction service. The input data Complex Input has the structure shown here:

In this case there is only one non-repeating component, containing just two child elements, and one repeating component, also containing just two child elements. In the project I am working on, there are around 10 non-repeating components and 50 repeating components, many containing 10 or more child elements. So this model is much simpler than the one in my engagement.

The Extraction DRD has a separate branch for each non-repeating and each repeating component. Repeating element branches must iterate a BKM that extracts the individual elements for that instance.

The decisions ending in “Elements” extract all the variables referenced in the validation rules. These are not identical to the elements contained in Complex Input. For example, element A1 is just the value of the input data element A1, but element A1Other is either the input data element A2, if the value of A1 is “Other”, or null otherwise.
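A Python sketch may make the conditional extraction concrete; the element names come from the post, but the code is my illustration of the described behavior, not the actual DMN logic:

```python
# Sketch of an "Elements" extraction decision: A1 is copied directly,
# while A1Other takes the value of A2 only when A1 is "Other".

def extract_elements(complex_input: dict) -> dict:
    a1 = complex_input.get("A1")
    return {
        "A1": a1,
        "A1Other": complex_input.get("A2") if a1 == "Other" else None,
    }

print(extract_elements({"A1": "Other", "A2": "free text"}))
# → {'A1': 'Other', 'A1Other': 'free text'}
```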

Repeating component branches must iterate a BKM that extracts the variable from a single instance of the branch.

In this case, we are extracting three variables – C1, AllCn, and Ctotal – although AllCn is just used in the calculation of Ctotal, not used in a rule. The goal of Extraction is just to obtain the values of variables used in the validation rules.

The ExtractAll service will be invoked by the Rules, and again the model has one branch for each non-repeating component and one for each repeating component. Encapsulating ExtractAll as a separate service is not necessary in a model this simple, but when there are dozens of branches it helps.

Let’s focus on Repeating Component Errors, which iterates a BKM that reports errors for a single instance of that branch.

In this example we have just two validation rules. One reports an error if element C1 is null, i.e. missing in the input. The other reports an error if element Ctotal is not greater than 0. The BKM here is a context, one context entry per rule, and all context entries have the same type, tRuleData, with the four components shown here. We could have added a fifth component containing the error message text, but here we assume that is looked up from a separate table based on the RuleID.
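Sketched in Python, the BKM might look like the following. RuleID and isError are named in the post, but the other two tRuleData component names used here (Element and Value) are assumptions for illustration, since the actual components appear only in the post's figure:

```python
# Sketch of the error-reporting BKM: a context with one entry per rule,
# each entry a tRuleData-like record. Component names Element and Value
# are hypothetical; RuleID and isError are from the post.

def repeating_component_errors(instance: dict) -> dict:
    c1, ctotal = instance.get("C1"), instance.get("Ctotal")
    return {
        # Rule 1: error if C1 is null (missing in the input)
        "rule1": {"RuleID": "R001", "Element": "C1",
                  "Value": c1, "isError": c1 is None},
        # Rule 2: error if Ctotal is not greater than 0
        "rule2": {"RuleID": "R002", "Element": "Ctotal",
                  "Value": ctotal, "isError": not (ctotal is not None and ctotal > 0)},
    }
```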

So the datatype tRepeatingComponentError is a context containing a context, and the decision Repeating Component Errors is a collection of a context containing a context. And to collect all the errors, we have one of these for each branch in the model.

That is an unwieldy format. We’d really like to collect the output for all the rules – with isError either true or false – in a single table. The decision ErrorTable provides that, using the little-known FEEL function get entries(). This function converts a context into a table of key-value pairs, and we want to apply it to the inner context, i.e. a single context entry of Repeating Component Errors.

It might take a minute to wrap your head around this logic. Here fRow is a function definition – basically a BKM as a context entry – that converts the output of get entries() into a table row containing the key as a column. For non-repeating branches, we iterate over each error, calling get entries() on each one. This generates a table with one row per error and five columns. For repeating branches, we need to iterate over both the branches and for the errors in each branch, an iteration nested in another iteration. That creates a list of lists, so we need the flatten() function to make that a simple list, again one row per error (across all instances of the branch) and five columns. In the final result box, we just concatenate the tables to make one table for all errors in the model.
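Mirroring the get entries() and flatten logic in Python may help in wrapping your head around it. This is an illustrative sketch, not the FEEL itself, and it is reduced to two data columns for brevity:

```python
# Sketch of ErrorTable: each rule's context entry becomes one table row
# carrying its key as a column; nested iteration over repeating-branch
# instances is flattened into a single list of rows.

def get_entries(context: dict) -> list:
    # analogous to FEEL get entries(): context -> list of {key, value} pairs
    return [{"key": k, "value": v} for k, v in context.items()]

def f_row(entry: dict) -> dict:
    # analogous to fRow: merge the entry's key into its data row
    return {"key": entry["key"], **entry["value"]}

def error_table(branch_errors: list) -> list:
    # iterate over instances, then over each instance's entries (nested),
    # then flatten the list of lists into one row per rule evaluation
    nested = [[f_row(e) for e in get_entries(instance)] for instance in branch_errors]
    return [row for rows in nested for row in rows]

errors = [{"rule1": {"RuleID": "R001", "isError": True}}]
print(error_table(errors))
# → [{'key': 'rule1', 'RuleID': 'R001', 'isError': True}]
```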

Here is the output of ErrorTable when run with the inputs below:

ErrorTable as shown here lists all the rules, whether an error or not. This is good for testing your logic. Once tested, you can easily filter this table to list only rules for which isError is true.

Bottom Line: Validating input data is always important in real-world decision services. We’ve now seen three different ways to do it, with different features and applicable in different use cases.

Follow Bruce Silver on Method & Style.



Calendar Arithmetic in DMN

By Bruce Silver

Read Time: 4 Minutes

Often in decision models you need to calculate a date or duration. For example, an application must be submitted within 90 days of some event, or a vaccine should not be administered within 120 days of a previous dose. DMN has powerful calendar arithmetic features. This post will illustrate how to use them.

ISO 8601 Format

The FEEL expressions for dates, times, and durations may look strange, but don’t blame DMN for that. They are based on the formats defined by ISO 8601, the international standard for dates and times in many computer languages. In FEEL expressions, dates and times can be specified by using a constructor function, e.g. date(), applied to a literal string value in ISO format, as seen below. For input and output data values, the constructor function is omitted.

A date is a 4-digit year, hyphen, 2-digit month, hyphen, 2-digit day. For example, to express May 3, 2017 as an input data value, just write 2017-05-03. But in a FEEL expression you must write date(“2017-05-03”).

Time is based on a 24-hour clock, no am/pm. The format is a 2-digit hour, 2-digit minute, 2-digit second, with optional fractional seconds after the decimal point. There is also an optional time offset field, representing the time zone expressed as an offset from UTC, what they used to call Greenwich Mean Time. For example, to specify 1:10 pm and 30 seconds Pacific Daylight Time, which is UTC minus 7 hours, you would enter 13:10:30-07:00 as an input data value, or time(“13:10:30-07:00”) in a FEEL expression. To specify the time as UTC, you can either use 00:00 as the time offset, or the letter Z, which stands for Zulu, the military symbol for UTC. DateTime values concatenate the date and time formats with a capital T separating them, as you see here. In a FEEL expression, you must use the proper constructor function with the ISO text string enclosed in quotes.
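For comparison, the same ISO 8601 strings parse directly with Python's standard library, which follows the identical format:

```python
# The ISO 8601 literals from the text, parsed with Python's stdlib.
from datetime import date, time, datetime

d = date.fromisoformat("2017-05-03")                      # FEEL: date("2017-05-03")
t = time.fromisoformat("13:10:30-07:00")                  # FEEL: time("13:10:30-07:00")
dt = datetime.fromisoformat("2017-05-03T13:10:30-07:00")  # date + capital T + time
print(dt.isoformat())
# → 2017-05-03T13:10:30-07:00
```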

Date and Time Components

It is possible in FEEL to extract the year, month, or day component from a date, time, or dateTime using a dot followed by the component name: year, month, day, hour, minute, second, or time offset. For example, the expression date(“2017-05-03”).year returns the number 2017. All of these extractions return a number except for the time offset component, which returns not a number but a duration, in a format we’ll discuss shortly.

DMN also provides an attribute, weekday, which is not really a component but is likewise extracted via dot notation, returning an integer from 1 to 7, with 1 meaning Monday and 7 meaning Sunday.
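These extractions happen to line up with Python's date attributes; in particular, FEEL's weekday convention (1 = Monday through 7 = Sunday) matches Python's isoweekday():

```python
# Component extraction on a date, mirroring the FEEL dot notation.
from datetime import date

d = date.fromisoformat("2017-05-03")
print(d.year)          # FEEL: date("2017-05-03").year    → 2017
print(d.month)         # FEEL: .month                     → 5
print(d.isoweekday())  # FEEL: .weekday                   → 3 (Wednesday)
```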

Durations

The interval between two dates, times, or dateTimes defines a duration. DMN, like ISO, defines two kinds of duration: days and time duration, and years and months duration. Days and time duration is equivalent to the number of seconds in the duration. The ISO format is PdDThHmMsS, where the lower case d, h, m, and s are integers indicating the days, hours, minutes, and seconds in the duration. If any of them is zero, it is omitted along with the corresponding uppercase D, H, M, or S. And they are supposed to be normalized so that the sum of the component values is minimized.

For example a duration of 61 seconds could be written P0DT61S, but we can omit the 0D, and the normalized form is 1 minute 1 second, so the correct value is PT1M1S. In a FEEL expression, you need to enclose that in quotes and make it the argument of the duration constructor function: duration(“PT1M1S”).

Days and time duration is the one normally used in calendar arithmetic, but for long durations the alternative years and months duration is available, equivalent to the number of whole months included in the duration. The ISO format is PyYmM, where lower case y and m are again numbers representing the number of years and months in the duration, and again the normalized form minimizes the sum of those component values. So a duration of 14 months would be written in normalized form as P1Y2M, and in FEEL, duration(“P1Y2M”). Since months contain varying numbers of days, the precise value of years and months duration is the number of months in between the start date and the end date, plus 1 if the day of the end month is greater than or equal to the day of the start month, or plus 0 otherwise.
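The whole-months rule just described can be transcribed into a short Python sketch; this is my illustration of the rule, not DMN code:

```python
# Whole months between two dates: the raw calendar-month difference,
# minus one if the end day-of-month has not yet reached the start
# day-of-month (equivalent to the "plus 1 if >= " rule in the text).
from datetime import date

def whole_months(start: date, end: date) -> int:
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months

print(whole_months(date(2023, 1, 15), date(2024, 3, 15)))
# → 14, i.e. P1Y2M in normalized form
```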

As with dates and times, you can extract the components of a duration using a dot notation. For days and time duration, the components are days, hours, minutes, and seconds. For years and months duration, the components are years and months.

Arithmetic Expressions

The point of all this is to be able to do calendar arithmetic.

Let’s start with addition of a duration.

A common use of calendar arithmetic is finding the difference between two dates or dateTimes.

You can multiply a duration by a number to get another duration of the same type, either days and time duration or years and months duration.

You can divide a duration by a number to get another duration of the same type. But the really useful one is dividing a duration by a duration, giving a number. For example, to find the number of seconds in a year, the expression duration(“P365D”)/duration(“PT1S”) returns the correct value, the number 31536000. Note this is not the result returned by extracting the seconds component: the expression duration(“P365D”).seconds returns 0.
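Python's timedelta behaves the same way: dividing one duration by another yields a plain number. A quick check of the seconds-in-a-year example:

```python
# Duration divided by duration gives a number, as in FEEL.
from datetime import timedelta

seconds_in_year = timedelta(days=365) / timedelta(seconds=1)
print(seconds_in_year)
# → 31536000.0, matching duration("P365D")/duration("PT1S")
```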

Example:
Refund Eligibility

Here is a simple example of calendar arithmetic from the DMN training. Given the purchase date and the timestamp of item return, determine the eligibility for a refund, based on simple rules.

The solution, using calendar arithmetic, is shown below:

Here we use a context in which the first context entry computes a days and time duration by subtracting the two dateTimes. The second context entry is a decision table that applies the rules. Note we can use durations like any other FEEL type in the decision table input entries.
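The same pattern can be sketched in Python: compute the elapsed duration first, then apply threshold rules to it. The 30- and 90-day thresholds and the outcome labels here are hypothetical, since the actual decision table appears only as a figure:

```python
# Sketch of the refund-eligibility pattern: one step computes the
# duration, the next applies rules to it. Thresholds and outcome
# labels are invented for illustration.
from datetime import datetime

def refund_eligibility(purchase_date: datetime, return_timestamp: datetime) -> str:
    elapsed = return_timestamp - purchase_date   # first context entry
    if elapsed.days < 30:                        # decision table rules
        return "Full refund"
    elif elapsed.days < 90:
        return "Store credit"
    return "Not eligible"

print(refund_eligibility(datetime(2024, 1, 1), datetime(2024, 1, 20)))
# → Full refund
```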

Example:
Unix Timestamp

A second example comes from the Low-Code Business Automation training. It is common for databases and REST services to express dateTime values as a simple number. One common format used is the Unix timestamp, defined as the number of seconds since January 1, 1970, midnight UTC. To convert a Unix timestamp to a FEEL dateTime, you can use the BKM below:

The BKM multiplies the duration of one second by timestamp, a number, returning a days and time duration, and then adds that to the dateTime of January 1, 1970 midnight UTC, giving a dateTime. And you can perform the reverse mapping with the BKM below:

This time we subtract January 1, 1970 midnight UTC from the FEEL dateTime, giving a days and time duration, and then divide that by the duration one second, giving a number.
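Both conversions transcribe naturally into Python, which supports the same epoch-plus-duration arithmetic:

```python
# Unix timestamp conversions via duration arithmetic, mirroring the BKMs.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # 1970-01-01T00:00:00Z

def from_unix(timestamp: float) -> datetime:
    # one second times the number, added to the epoch
    return EPOCH + timedelta(seconds=1) * timestamp

def to_unix(dt: datetime) -> float:
    # subtract the epoch, then divide by one second
    return (dt - EPOCH) / timedelta(seconds=1)

print(from_unix(0))
# → 1970-01-01 00:00:00+00:00
```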

Calendar arithmetic is used all the time in both decision models and Low-Code business automation models. While the formats are unfamiliar at first, FEEL makes calendar arithmetic very easy. It’s all explained in the training. Both the DMN Method and Style course and the Low-Code Business Automation course provide 60-day use of the Trisotech platform and include post-class certification. Check it out!
