Troubleshooting
Common issues and solutions.
Server won’t start
Symptom: The process exits with listen tcp :2575: bind: address already in use.
Cause: Another process is already listening on that port.
Fix: Identify the process holding the port:
lsof -i :2575
Stop that process, or use a different address:
LISTEN_ADDR=:2576 ./mllp-server
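Before restarting, it can help to probe whether the port is actually occupied. A minimal Python sketch; the host and port below are the doc's defaults, so adjust them to match your LISTEN_ADDR:

```python
import socket

def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is already accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the default MLLP port before starting the server.
if port_in_use("127.0.0.1", 2575):
    print("port 2575 is taken; pick another LISTEN_ADDR")
```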
TLS handshake failures
Symptom: Clients log tls: handshake failure or certificate signed by unknown authority.
Cause: Certificate and key don’t match, the certificate expired, or the client doesn’t trust the CA. Clients using TLS 1.1 or earlier are also rejected — the server requires TLS 1.2 minimum.
Fix: Verify the certificate:
openssl s_client -connect localhost:2575 -showcerts
openssl x509 -in /path/to/cert.pem -noout -dates
Reload updated certificates without restarting:
kill -HUP $(pgrep mllp-server)
See Configuration for TLS_CERT_FILE, TLS_KEY_FILE, TLS_CLIENT_CA, and TLS_MIN_VERSION.
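Clients you control should match the server's TLS 1.2 floor explicitly. A minimal Python sketch of a compatible client-side context; the ca_file parameter is illustrative (point it at whichever CA signed the server certificate):

```python
import ssl

def client_context(ca_file=None) -> ssl.SSLContext:
    """Build a client-side TLS context compatible with a server that
    requires TLS 1.2 minimum. ca_file is the CA bundle to trust; with
    None, the system's default trust store is used."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A client built with this context will never offer TLS 1.1 or earlier, so the "handshake failure" from a protocol-version mismatch cannot occur; remaining failures point at the certificate chain instead.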
Messages rejected with AR
Symptom: Sender receives a NAK with acknowledgement code AR.
Cause: A CEL validation rule evaluated to false. AR means the message was structurally valid but failed a business rule. The NAK message text includes the name of the failing rule.
Fix: Check the rule in config.yaml:
validation:
  rules:
    - name: require-patient-id
      expression: pid['id'] != ""
      message: "PID-3.1 is required"
Test against a sample message. Ensure the field path matches the actual HL7 structure. See Configuration — CEL variables for the full list of available fields.
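To sanity-check what a rule like require-patient-id sees, extract the field by hand from a sample message. A rough Python sketch, assuming conventional HL7 v2 delimiters (segments separated by carriage returns, '|' for fields, '^' for components); the exact mapping the server uses for pid['id'] may differ, so treat this as a cross-check, not a reference:

```python
def pid_3_1(raw: str) -> str:
    """Extract PID-3.1 (first component of the patient identifier list)
    from a raw HL7 v2 message."""
    for segment in raw.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            # fields[3] is PID-3; its first '^' component is PID-3.1.
            return fields[3].split("^")[0] if len(fields) > 3 else ""
    return ""

sample = (
    "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|20240101120000||ADT^A01|MSG001|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||DOE^JOHN\r"
)
print(pid_3_1(sample))  # → 12345
```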
Messages rejected with AE
Symptom: Sender receives a NAK with acknowledgement code AE.
Cause: A CEL expression caused an evaluation error — typically referencing a field that doesn’t exist in the message, or a type mismatch.
Fix: Check the server logs for the specific error. Common causes:
- Accessing a field absent in the message — use the ? operator for safe access: pid[?'country'].orValue("")
- Comparing incompatible types (string vs integer)
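The difference between strict and safe access can be illustrated with a Python analogue (this is not CEL; it only mirrors the behaviour of the [?...] optional access with orValue):

```python
def safe_get(fields: dict, key: str, default: str = "") -> str:
    """Python analogue of CEL's pid[?'country'].orValue(""):
    return the field if present, otherwise a default, without raising."""
    return fields.get(key, default)

pid = {"id": "12345"}            # no 'country' field present
print(safe_get(pid, "country"))  # empty string instead of an error
```

Strict access (pid["country"] in Python, pid['country'] in CEL) fails on the missing key; the safe form degrades to a default, so the rule evaluates to true or false instead of erroring out with AE.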
Connector not receiving messages
Symptom: Messages are accepted (sender gets AA) but the downstream system never receives them.
Cause: The connector’s CEL filter doesn’t match, or the connector is disabled.
Fix: Check the connector in config.yaml:
connectors:
  - name: my-connector
    disabled: false
    filter: msh.msg_type == "ADT"
    url: http://downstream/hl7
Temporarily set filter: "true" to match all messages; once messages flow through, tighten the expression again. Enable debug logging:
LOG_LEVEL=debug ./mllp-server
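To rule out the sender or the network, a hand-rolled test client is useful. A minimal Python sketch using standard MLLP framing (a 0x0B byte before the payload, 0x1C 0x0D after it); send_test_message is a name invented here, not part of the server's toolchain:

```python
import socket

VT, FS, CR = b"\x0b", b"\x1c", b"\r"  # MLLP start block, end block, carriage return

def mllp_frame(message: str) -> bytes:
    """Wrap an HL7 message in MLLP framing: <VT> payload <FS><CR>."""
    return VT + message.encode("utf-8") + FS + CR

def mllp_unframe(data: bytes) -> str:
    """Strip MLLP framing from a received block (e.g. the ACK/NAK)."""
    return data.removeprefix(VT).removesuffix(FS + CR).decode("utf-8")

def send_test_message(host: str, port: int, message: str) -> str:
    """Send one framed message and return the unframed acknowledgement."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(mllp_frame(message))
        return mllp_unframe(conn.recv(65536))
```

If the acknowledgement comes back AA but the downstream still sees nothing, the problem is on the connector side, not the sender's.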
DLQ filling up
Symptom: The dead letter queue grows. Messages are not reaching the downstream system.
Cause: The downstream endpoint is unreachable, or the connector has exhausted its retry attempts.
Fix: Verify connectivity:
curl -v http://downstream/hl7
Increase max_attempts in the connector config if the downstream has intermittent availability. Inspect queued messages with the CLI tool (stop the server first — the database uses an exclusive lock).
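The interaction between max_attempts and an intermittently available downstream can be pictured as a retry loop with backoff. An illustrative Python sketch; the connector's actual retry policy and delays may differ:

```python
import time

def deliver_with_retries(send, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry `send` (any callable that raises on failure) with exponential
    backoff. After max_attempts failures the error propagates; in the
    server, that is the point where a message lands in the DLQ."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With more attempts and growing delays, brief downstream outages are absorbed; only a sustained outage exhausts the budget and fills the DLQ.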
High memory usage
Symptom: Process RSS grows beyond expected levels under load.
Cause: Too many concurrent connections held open, each with its own read buffer. Large HL7 messages compound this.
Fix: Cap concurrent connections:
MAX_CONNECTIONS=200 ./mllp-server
Monitor with the mllp.connections.active metric. See Configuration for OpenTelemetry setup.
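Conceptually, MAX_CONNECTIONS behaves like a semaphore around connection handling: each accepted connection takes a slot, and no more than the cap run at once. An illustrative Python sketch, not the server's implementation:

```python
import threading

class ConnectionGate:
    """Cap concurrent connections at `limit`; extra connections are
    refused rather than queued (one possible policy)."""
    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False when all slots are taken.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        self._sem.release()
```

Since each open connection carries its own read buffer, bounding the slot count also bounds the buffer memory, which is why lowering the cap lowers RSS under load.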
Logs not rotating
Symptom: The log file grows without bound. SIGHUP from outside a container has no effect.
Cause: The signal doesn’t reach the process inside the container.
Fix: Send the signal directly:
kubectl exec <pod> -- kill -HUP 1
Or switch to stdout and let the container runtime handle rotation:
LOG_OUTPUT=stdout ./mllp-server
Log level not changing at runtime
Symptom: Debug output doesn’t appear after attempting to enable it at runtime.
Cause: SIGUSR1 was not sent to the process.
Fix:
kill -USR1 $(pgrep mllp-server)
Each signal cycles the level: debug → info → warn → error → debug.
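The cycling behaviour is easy to mirror in a few lines, which can be handy when testing tooling that sends the signal. A Python analogue of the described SIGUSR1 handling:

```python
import signal

LEVELS = ["debug", "info", "warn", "error"]

class LevelCycler:
    """Analogue of the server's SIGUSR1 behaviour: each signal advances
    the log level one step and wraps around after 'error'."""
    def __init__(self, start: str = "info"):
        self.index = LEVELS.index(start)

    def cycle(self, signum=None, frame=None):
        self.index = (self.index + 1) % len(LEVELS)
        return LEVELS[self.index]

cycler = LevelCycler()
signal.signal(signal.SIGUSR1, cycler.cycle)  # kill -USR1 <pid> now cycles the level
```

Because the level wraps around, sending the signal repeatedly always returns to the level you started from; count your signals if you only want one step.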
CLI can’t open database
Symptom: mllp-cli fails with timeout: file is locked.
Cause: The server holds an exclusive lock on the database file. Only one process can access it at a time.
Fix: Stop the server first:
kill $(pgrep mllp-server)
./mllp-cli -db outbox.db
In Kubernetes, scale the deployment to zero replicas before running maintenance on the database volume.
Certificate reload failed after SIGHUP
Symptom: Server logs a permission or path error after receiving SIGHUP. The old certificate continues to be served.
Cause: The files at TLS_CERT_FILE or TLS_KEY_FILE are missing, moved, or not readable by the server process.
Fix: Confirm the files exist and are readable:
ls -l "$TLS_CERT_FILE" "$TLS_KEY_FILE"
SIGHUP only re-reads files at the paths configured at startup. If a secrets manager writes to a new path, update the environment variables and restart.