CEL expressions for HL7 message validation
Common Expression Language lets you write HL7 validation rules as one-line boolean expressions. Compiled at startup, evaluated per message, mapped to ACK codes.
Common Expression Language (CEL) is a small expression language designed for evaluating boolean conditions. Google created it, and it is used in Kubernetes admission policies, Google Cloud IAM conditions, and Firebase security rules. An expression is parsed and type-checked once, then evaluated in microseconds. CEL is not Turing-complete: it has no unbounded loops, and expressions have no side effects. That makes it safe to run on untrusted input.
These same properties make it useful for HL7 message validation. A validation rule is a boolean expression: given the fields of an HL7 message, does this message meet the criteria? True means accept. False means reject.
Why CEL instead of code
The alternative to CEL is writing validation logic in your application language. Parse the message, check the fields, return a result. This works, but it means validation rules live in compiled code. Changing a rule requires a code change, a build, and a deploy.
CEL rules are configuration. They live in a YAML file alongside the rest of the server config. Add a rule, restart the server, done. No compilation step for the operator.
The trade-off is expressiveness. CEL can call only the built-in functions and macros the host exposes, such as size() and .all(). It can’t define new functions, query databases, or make HTTP requests. If your validation requires looking up a patient in an external system, CEL won’t do it. For field-level checks like required fields, allowed values, and pattern matching, it’s a good fit.
Available fields
CEL rules evaluate against parsed fields from four HL7 segments. Every field is a string. Missing segments return empty maps. Missing lists return empty lists.
MSH (message header)
| Key | HL7 field | Example |
|---|---|---|
| msg_type | MSH-9.1 | ADT |
| trigger | MSH-9.2 | A01 |
| sending_app | MSH-3 | HIS |
| sending_fac | MSH-4 | CENTRAL_HOSPITAL |
| receiving_app | MSH-5 | LAB |
| receiving_fac | MSH-6 | PATHOLOGY |
| control_id | MSH-10 | MSG00001 |
| version | MSH-12 | 2.3 |
PID (patient identity)
| Key | HL7 field | Example |
|---|---|---|
| id | PID-3.1 | 123456 |
| name | PID-5 | VIRTANEN^MATTI |
| dob | PID-7 | 19800215 |
| sex | PID-8 | M |
| ssn | PID-19 | 010180-123A |
| country | PID-11.6 | FI |
PV1 (patient visit)
| Key | HL7 field | Example |
|---|---|---|
| patient_class | PV1-2 | I |
| assigned_location | PV1-3 | ICU^BED3 |
| attending_doctor | PV1-7 | LAHTINEN^ANNA |
| admit_datetime | PV1-44 | 20260115083000 |
OBX (observation)
| Key | HL7 field | Example |
|---|---|---|
| value_type | OBX-2 | NM |
| identifier | OBX-3 | WBC^White Blood Cell |
| value | OBX-5 | 7.5 |
| unit | OBX-6 | 10*9/L |
| status | OBX-11 | F |
The obx variable contains the first OBX segment. The obx_list variable
contains all OBX segments as a list of maps with the same keys.
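To make the shape of these variables concrete, the sketch below builds the same kind of maps from a raw message in Python. It is an illustration only: the `parse` function, the sample message, the subset of keys it extracts, and the choice to fill absent fields with empty strings are all assumptions, not the server's actual parser.

```python
# Hypothetical illustration of the variable layout; not the server's parser.
SAMPLE = "\r".join([
    "MSH|^~\\&|HIS|CENTRAL_HOSPITAL|LAB|PATHOLOGY|20260115083000||ORU^R01|MSG00001|P|2.3",
    "PID|1||123456||VIRTANEN^MATTI||19800215|M",
    "OBX|1|NM|WBC^White Blood Cell||7.5|10*9/L|||||F",
    "OBX|2|NM|HGB^Hemoglobin||140|g/L|||||P",
])


def field(parts, index):
    """Return the field at index, or "" when the segment is too short."""
    return parts[index] if index < len(parts) else ""


def component(value, n):
    """Return the n-th (1-based) ^-separated component, or ""."""
    comps = value.split("^")
    return comps[n - 1] if n <= len(comps) else ""


def parse(message):
    # Segment maps always exist; they stay empty when the segment is missing.
    ctx = {"msh": {}, "pid": {}, "pv1": {}, "obx": {}, "obx_list": []}
    for line in message.split("\r"):
        parts = line.split("|")
        if parts[0] == "MSH":
            # MSH-1 is the field separator itself, so MSH-n sits at index n-1.
            ctx["msh"] = {
                "msg_type": component(field(parts, 8), 1),
                "trigger": component(field(parts, 8), 2),
                "sending_fac": field(parts, 3),
                "control_id": field(parts, 9),
                "version": field(parts, 11),
            }
        elif parts[0] == "PID":
            # For every other segment, field n sits at index n.
            ctx["pid"] = {
                "id": component(field(parts, 3), 1),
                "name": field(parts, 5),
                "dob": field(parts, 7),
                "sex": field(parts, 8),
            }
        elif parts[0] == "OBX":
            ctx["obx_list"].append({
                "value_type": field(parts, 2),
                "identifier": field(parts, 3),
                "value": field(parts, 5),
                "unit": field(parts, 6),
                "status": field(parts, 11),
            })
    if ctx["obx_list"]:
        ctx["obx"] = ctx["obx_list"][0]  # obx mirrors the first OBX segment
    return ctx


ctx = parse(SAMPLE)
print(ctx["msh"]["msg_type"], ctx["pid"]["id"], len(ctx["obx_list"]))
```

The sample has no PV1 segment, so `ctx["pv1"]` stays an empty map, which matches the "missing segments return empty maps" behavior described above.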
Writing rules
A rule has three parts: a name, a CEL expression, and an error message. The expression must evaluate to a boolean. Rules are evaluated in order. The first failure short-circuits. No further rules run.
rules:
  - name: require-patient-id
    expression: pid.id != ""
    message: "PID-3.1 (patient ID) is required"
If pid.id is empty, the server returns an AR (Application Reject) ACK with
the message “PID-3.1 (patient ID) is required.” The sender knows exactly what’s
wrong.
Simple field checks
Require a field to be present:
- name: require-sending-facility
  expression: msh.sending_fac != ""
  message: "MSH-4 (sending facility) is required"
- name: require-control-id
  expression: msh.control_id != ""
  message: "MSH-10 (message control ID) is required"
Restrict to specific values:
- name: adt-only
  expression: msh.msg_type == "ADT"
  message: "Only ADT messages accepted at this endpoint"
- name: allowed-versions
  expression: msh.version == "2.3" || msh.version == "2.4" || msh.version == "2.5"
  message: "HL7 version must be 2.3, 2.4, or 2.5"
The in operator
Check membership in a list:
- name: admit-discharge-transfer
  expression: msh.trigger in ["A01", "A02", "A03", "A04", "A08"]
  message: "Only ADT triggers A01-A04 and A08 are accepted"
- name: known-sender
  expression: msh.sending_fac in ["HOSP_A", "HOSP_B", "LAB_CENTRAL"]
  message: "Unknown sending facility"
Conditional logic
Rules can use && (and), || (or), and ! (not). A common pattern is “if
condition A holds, then condition B must also hold”:
- name: inpatient-requires-location
  expression: pv1.patient_class != "I" || pv1.assigned_location != ""
  message: "Inpatient encounters (PV1-2=I) must have an assigned location (PV1-3)"
This reads: “either the patient is not an inpatient, or they have a location.” If the patient is an inpatient and has no location, both sides are false and the rule fails.
This is a standard pattern for conditional requirements. The expression
A != X || B != "" is equivalent to “if A equals X, then B must not be empty.”
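The equivalence can be checked exhaustively. The short Python sketch below is an analogue rather than CEL itself: `rule` mirrors the expression above using Python's `or`, `implication` spells out the "if A then B" reading, and the loop confirms they agree for every combination of a few sample values. The function names and sample values are illustrative assumptions.

```python
from itertools import product


def rule(patient_class, assigned_location):
    """Python analogue of: pv1.patient_class != "I" || pv1.assigned_location != "" """
    return patient_class != "I" or assigned_location != ""


def implication(patient_class, assigned_location):
    """The same requirement written as an explicit if/then."""
    if patient_class == "I":
        return assigned_location != ""
    return True  # the requirement does not apply to non-inpatients


# Compare both forms over every combination of the sample values.
classes = ["I", "O", "E", ""]
locations = ["", "ICU^BED3"]
mismatches = [
    (pc, loc)
    for pc, loc in product(classes, locations)
    if rule(pc, loc) != implication(pc, loc)
]
print(mismatches)
```

The only combination for which both forms return false is an inpatient with an empty location.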
Safe field access
Direct field access on a missing key causes an evaluation error:
# Dangerous if PID segment is missing or country is empty
- name: finnish-only
  expression: pid.country == "FI"
If the message has no PID segment, pid is an empty map. Accessing
pid.country on an empty map is a key-not-found error. The server returns AE
(Application Error) instead of AR (Application Reject). The sender retries,
gets the same AE, retries again. Wrong behavior.
The safe access operator ? with .orValue() handles missing keys:
- name: finnish-only
  expression: pid[?'country'].orValue("") == "FI"
  message: "Only Finnish patients accepted at this endpoint"
If country is missing, .orValue("") returns an empty string. The expression
evaluates to "" == "FI", which is false. The server returns AR with a clear
message. The sender knows not to retry.
Use ? and .orValue() whenever a field might be absent. The four segment
maps (msh, pid, pv1, obx) are always present but may be empty if the
corresponding segment is missing from the message.
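Python dictionaries behave in a closely analogous way, which can help build intuition. In the sketch below, indexing a missing key raises `KeyError` (the counterpart of the CEL evaluation error), while `dict.get` with a default plays the role of `?` with `.orValue("")`. This is an analogy under that assumption, not a description of how the server evaluates CEL.

```python
def unsafe_finnish_only(pid):
    # Analogue of: pid.country == "FI"
    return pid["country"] == "FI"


def safe_finnish_only(pid):
    # Analogue of: pid[?'country'].orValue("") == "FI"
    return pid.get("country", "") == "FI"


empty_pid = {}  # what a rule sees when the PID segment is missing

try:
    unsafe_finnish_only(empty_pid)
    unsafe_outcome = "evaluated"
except KeyError:
    unsafe_outcome = "error"  # would surface to the sender as AE

safe_outcome = safe_finnish_only(empty_pid)  # a clean False, reported as AR
print(unsafe_outcome, safe_outcome)
```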
OBX list operations
Messages often carry multiple OBX (observation) segments. Lab results, vital
signs, and diagnostic reports typically have one OBX per observation value. The
obx_list variable gives access to all of them.
Require all observations to have final status
- name: all-observations-final
  expression: obx_list.all(o, o.status == "F")
  message: "All OBX segments must have final status (OBX-11=F)"
.all() returns true only if every element in the list satisfies the condition.
If any OBX has a status other than “F” (final), the rule fails.
Require at least one final observation
- name: has-observations
  expression: obx_list.exists(o, o.status == "F")
  message: "At least one final OBX segment is required"
.exists() returns true if any element satisfies the condition. This is weaker
than .all(). It passes even if some OBX segments have preliminary status,
as long as at least one is final.
Check observation values
- name: observations-have-values
  expression: obx_list.all(o, o.value != "")
  message: "All OBX segments must have a value (OBX-5)"
Empty list behavior
If a message has no OBX segments, obx_list is an empty list.
obx_list.all(...) on an empty list returns true (vacuous truth).
obx_list.exists(...) on an empty list returns false.
This matters for rules that combine checks:
# Passes messages with no OBX segments (all of zero is true)
- name: all-final
  expression: obx_list.all(o, o.status == "F")
# Rejects messages with no OBX segments (exists on empty is false)
- name: has-final
  expression: obx_list.exists(o, o.status == "F")
If you want to require that OBX segments are present and all have final status, use both:
- name: has-observations
  expression: size(obx_list) > 0
  message: "At least one OBX segment is required"
- name: all-observations-final
  expression: obx_list.all(o, o.status == "F")
  message: "All OBX segments must have final status"
Rule order matters. The has-observations rule runs first. If it fails, the
all-observations-final rule never evaluates.
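Python's built-in `all()` and `any()` follow the same empty-list conventions as `.all()` and `.exists()`, so the behavior is easy to try out. The sketch below is a Python analogue with hypothetical helper names, not CEL itself.

```python
def all_final(obx_list):
    # Analogue of: obx_list.all(o, o.status == "F")
    return all(o["status"] == "F" for o in obx_list)


def has_final(obx_list):
    # Analogue of: obx_list.exists(o, o.status == "F")
    return any(o["status"] == "F" for o in obx_list)


def present_and_all_final(obx_list):
    # Analogue of running size(obx_list) > 0 first, then the .all() rule
    return len(obx_list) > 0 and all_final(obx_list)


no_obx = []
mixed = [{"status": "F"}, {"status": "P"}]
final_only = [{"status": "F"}, {"status": "F"}]

for label, obx in [("empty", no_obx), ("mixed", mixed), ("final", final_only)]:
    print(label, all_final(obx), has_final(obx), present_and_all_final(obx))
```

Only the combined check rejects the empty list and the mixed list while accepting the list where every observation is final.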
How rules map to ACK codes
The validation result determines which acknowledgment code the sender receives:
| What happened | ACK code | Sender action |
|---|---|---|
| All rules pass | AA | Move to next message |
| A rule evaluates to false | AR | Fix the message, don’t retry |
| A rule fails to evaluate | AE | Retry (receiver’s problem) |
| CEL syntax error at startup | (none) | Server won’t start |
The distinction between AR and AE is why safe field access matters. A rule that
errors because of a missing key returns AE, which tells the sender to retry. A
rule that evaluates cleanly to false returns AR, which tells the sender the
message is wrong. Use ? and .orValue() to keep validation failures in the
AR category.
Rules are compiled when the server starts. A syntax error, a type mismatch, or a reference to an undeclared variable prevents startup. This means a bad rule fails the boot, not a message at 3am.
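The mapping in the table above can be sketched as a small evaluation loop. This is a minimal Python sketch under stated assumptions, not the server's implementation: each rule's check is a Python callable standing in for a compiled CEL program, a raised exception stands in for a CEL evaluation error, and the `validate` function and `RULES` list are hypothetical names.

```python
# Hypothetical stand-ins: each check is a Python callable playing the role
# of a compiled CEL program; an exception plays the role of an evaluation error.
RULES = [
    ("require-control-id",
     lambda c: c["msh"]["control_id"] != "",
     "MSH-10 (message control ID) is required"),
    ("finnish-only",
     lambda c: c["pid"]["country"] == "FI",  # unsafe access, on purpose
     "Only Finnish patients accepted at this endpoint"),
]


def validate(rules, ctx):
    """Evaluate rules in order and return (ack_code, text)."""
    for name, check, message in rules:
        try:
            passed = check(ctx)
        except Exception as exc:
            # The rule could not be evaluated: receiver-side error.
            return "AE", f"rule {name!r} failed to evaluate: {exc!r}"
        if not passed:
            # First clean failure short-circuits; later rules never run.
            return "AR", message
    return "AA", ""


ok = {"msh": {"control_id": "MSG00001"}, "pid": {"country": "FI"}}
no_control_id = {"msh": {"control_id": ""}, "pid": {"country": "FI"}}
no_pid = {"msh": {"control_id": "MSG00001"}, "pid": {}}

for ctx in (ok, no_control_id, no_pid):
    print(validate(RULES, ctx)[0])
```

The third message shows why the unsafe `finnish-only` rule is a problem: a missing PID segment produces AE rather than AR.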
A complete example
A configuration for a lab results endpoint that accepts ORU messages from known senders, requires patient identification, and enforces final observation status:
rules:
  - name: require-control-id
    expression: msh.control_id != ""
    message: "MSH-10 (message control ID) is required"
  - name: oru-only
    expression: msh.msg_type == "ORU"
    message: "Only ORU messages accepted at this endpoint"
  - name: known-sender
    expression: msh.sending_fac in ["LAB_CENTRAL", "LAB_NORTH", "LAB_SOUTH"]
    message: "Unknown sending facility"
  - name: require-patient-id
    expression: pid.id != ""
    message: "PID-3.1 (patient ID) is required"
  - name: require-observations
    expression: size(obx_list) > 0
    message: "At least one OBX segment is required"
  - name: all-observations-final
    expression: obx_list.all(o, o.status == "F")
    message: "All OBX segments must have final status (OBX-11=F)"
Six rules, evaluated in order. A message missing a control ID is rejected before the message type is checked. An ORU from an unknown sender is rejected before the patient ID is checked. The sender gets a specific error message for the first rule that fails.
For background on the acknowledgment codes these rules produce, see HL7 ACK and NAK codes: AA, AR, AE explained. For the protocol layer beneath validation, see What is MLLP and how does it work.