Troubleshooting
Common issues and solutions.
Server won’t start
Symptom: The process exits with listen tcp :2575: bind: address already in use.
Cause: Another process is already listening on that port.
Fix: Identify the process holding the port:
lsof -i :2575
Stop that process, or use a different address:
LISTEN_ADDR=:2576 ./mllp-server
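Before restarting, it can help to probe whether the port is actually occupied. A minimal Python sketch; the host and port below are the doc's defaults, so adjust them to match your LISTEN_ADDR:

```python
import socket

def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is already accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the default MLLP port before starting the server.
if port_in_use("127.0.0.1", 2575):
    print("port 2575 is taken; pick another LISTEN_ADDR")
```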
TLS handshake failures
Symptom: Clients log tls: handshake failure or certificate signed by unknown authority.
Cause: Certificate and key don’t match, the certificate expired, or the client doesn’t trust the CA. Clients using TLS 1.1 or earlier are also rejected — the server requires TLS 1.2 minimum.
Fix: Verify the certificate:
openssl s_client -connect localhost:2575 -showcerts
openssl x509 -in /path/to/cert.pem -noout -dates
Reload updated certificates without restarting:
kill -HUP $(pgrep mllp-server)
See Configuration for TLS_CERT_FILE, TLS_KEY_FILE, TLS_CLIENT_CA, and TLS_MIN_VERSION.
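Clients you control should match the server's TLS 1.2 floor explicitly. A minimal Python sketch of a compatible client-side context; the ca_file parameter is illustrative (point it at whichever CA signed the server certificate):

```python
import ssl

def client_context(ca_file=None) -> ssl.SSLContext:
    """Build a client-side TLS context compatible with a server that
    requires TLS 1.2 minimum. ca_file is the CA bundle to trust; with
    None, the system's default trust store is used."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A client built with this context will never offer TLS 1.1 or earlier, so the "handshake failure" from a protocol-version mismatch cannot occur; remaining failures point at the certificate chain instead.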
Messages rejected with AR
Symptom: Sender receives a NAK with acknowledgement code AR.
Cause: A CEL validation rule evaluated to false. AR means the message was structurally valid but failed a business rule. The NAK message text includes the name of the failing rule.
Fix: Check the rule in config.yaml:
validation:
  rules:
    - name: require-patient-id
      expression: pid['id'] != ""
      message: "PID-3.1 is required"
Test against a sample message. Ensure the field path matches the actual HL7 structure. See Configuration — CEL variables for the full list of available fields.
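To sanity-check what a rule like require-patient-id sees, extract the field by hand from a sample message. A rough Python sketch, assuming conventional HL7 v2 delimiters (segments separated by carriage returns, '|' for fields, '^' for components); the exact mapping the server uses for pid['id'] may differ, so treat this as a cross-check, not a reference:

```python
def pid_3_1(raw: str) -> str:
    """Extract PID-3.1 (first component of the patient identifier list)
    from a raw HL7 v2 message."""
    for segment in raw.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            # fields[3] is PID-3; its first '^' component is PID-3.1.
            return fields[3].split("^")[0] if len(fields) > 3 else ""
    return ""

sample = (
    "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|20240101120000||ADT^A01|MSG001|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||DOE^JOHN\r"
)
print(pid_3_1(sample))  # → 12345
```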
Messages rejected with AE
Symptom: Sender receives a NAK with acknowledgement code AE.
Cause: A CEL expression caused an evaluation error — typically referencing a field that doesn’t exist in the message, or a type mismatch.
Fix: Check the server logs for the specific error. Common causes:
- Accessing a field absent in the message — use the ? operator for safe access: pid[?'country'].orValue("")
- Comparing incompatible types (string vs integer)
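The difference between strict and safe access can be illustrated with a Python analogue (this is not CEL; it only mirrors the behaviour of the [?...] optional access with orValue):

```python
def safe_get(fields: dict, key: str, default: str = "") -> str:
    """Python analogue of CEL's pid[?'country'].orValue(""):
    return the field if present, otherwise a default, without raising."""
    return fields.get(key, default)

pid = {"id": "12345"}            # no 'country' field present
print(safe_get(pid, "country"))  # empty string instead of an error
```

Strict access (pid["country"] in Python, pid['country'] in CEL) fails on the missing key; the safe form degrades to a default, so the rule evaluates to true or false instead of erroring out with AE.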
Connector not receiving messages
Symptom: Messages are accepted (sender gets AA) but the downstream system never receives them.
Cause: The connector’s CEL filter doesn’t match, or the connector is disabled.
Fix: Check the connector in config.yaml:
connectors:
  - name: my-connector
    disabled: false
    filter: msh.msg_type == "ADT"
    url: http://downstream/hl7
Temporarily set filter: "true" to match all messages; once messages flow through, tighten the expression again. Enable debug logging:
LOG_LEVEL=debug ./mllp-server
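To rule out the sender or the network, a hand-rolled test client is useful. A minimal Python sketch using standard MLLP framing (a 0x0B byte before the payload, 0x1C 0x0D after it); send_test_message is a name invented here, not part of the server's toolchain:

```python
import socket

VT, FS, CR = b"\x0b", b"\x1c", b"\r"  # MLLP start block, end block, carriage return

def mllp_frame(message: str) -> bytes:
    """Wrap an HL7 message in MLLP framing: <VT> payload <FS><CR>."""
    return VT + message.encode("utf-8") + FS + CR

def mllp_unframe(data: bytes) -> str:
    """Strip MLLP framing from a received block (e.g. the ACK/NAK)."""
    return data.removeprefix(VT).removesuffix(FS + CR).decode("utf-8")

def send_test_message(host: str, port: int, message: str) -> str:
    """Send one framed message and return the unframed acknowledgement."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(mllp_frame(message))
        return mllp_unframe(conn.recv(65536))
```

If the acknowledgement comes back AA but the downstream still sees nothing, the problem is on the connector side, not the sender's.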
DLQ filling up
Symptom: The dead letter queue grows. Messages are not reaching the downstream system.
Cause: The downstream endpoint is unreachable, or the connector has exhausted its retry attempts.
Fix: Verify connectivity:
curl -v http://downstream/hl7
Increase max_attempts in the connector config if the downstream has intermittent availability. Inspect queued messages with the CLI tool (stop the server first — the database uses an exclusive lock).
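The interaction between max_attempts and an intermittently available downstream can be pictured as a retry loop with backoff. An illustrative Python sketch; the connector's actual retry policy and delays may differ:

```python
import time

def deliver_with_retries(send, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry `send` (any callable that raises on failure) with exponential
    backoff. After max_attempts failures the error propagates; in the
    server, that is the point where a message lands in the DLQ."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With more attempts and growing delays, brief downstream outages are absorbed; only a sustained outage exhausts the budget and fills the DLQ.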
High memory usage
Symptom: Process RSS grows beyond expected levels under load.
Cause: Too many concurrent connections held open, each with its own read buffer. Large HL7 messages compound this.
Fix: Cap concurrent connections:
MAX_CONNECTIONS=200 ./mllp-server
Monitor with the mllp.connections.active metric. See Configuration for OpenTelemetry setup.
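Conceptually, MAX_CONNECTIONS behaves like a semaphore around connection handling: each accepted connection takes a slot, and no more than the cap run at once. An illustrative Python sketch, not the server's implementation:

```python
import threading

class ConnectionGate:
    """Cap concurrent connections at `limit`; extra connections are
    refused rather than queued (one possible policy)."""
    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False when all slots are taken.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        self._sem.release()
```

Since each open connection carries its own read buffer, bounding the slot count also bounds the buffer memory, which is why lowering the cap lowers RSS under load.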
Logs not rotating
Symptom: The log file grows without bound. SIGHUP from outside a container has no effect.
Cause: The signal doesn’t reach the process inside the container.
Fix: Send the signal directly:
kubectl exec <pod> -- kill -HUP 1
Or switch to stdout and let the container runtime handle rotation:
LOG_OUTPUT=stdout ./mllp-server
Log level not changing at runtime
Symptom: Debug output doesn’t appear after attempting to enable it at runtime.
Cause: SIGUSR1 was not sent to the process.
Fix:
kill -USR1 $(pgrep mllp-server)
Each signal cycles the level: debug → info → warn → error → debug.
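The cycling behaviour is easy to mirror in a few lines, which can be handy when testing tooling that sends the signal. A Python analogue of the described SIGUSR1 handling:

```python
import signal

LEVELS = ["debug", "info", "warn", "error"]

class LevelCycler:
    """Analogue of the server's SIGUSR1 behaviour: each signal advances
    the log level one step and wraps around after 'error'."""
    def __init__(self, start: str = "info"):
        self.index = LEVELS.index(start)

    def cycle(self, signum=None, frame=None):
        self.index = (self.index + 1) % len(LEVELS)
        return LEVELS[self.index]

cycler = LevelCycler()
signal.signal(signal.SIGUSR1, cycler.cycle)  # kill -USR1 <pid> now cycles the level
```

Because the level wraps around, sending the signal repeatedly always returns to the level you started from; count your signals if you only want one step.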
CLI can’t open database
Symptom: mllp-cli fails with timeout: file is locked.
Cause: The server holds an exclusive lock on the database file. Only one process can access it at a time.
Fix: Stop the server first:
kill $(pgrep mllp-server)
./mllp-cli -db outbox.db
In Kubernetes, scale the deployment to zero replicas before running maintenance on the database volume.
Certificate reload failed after SIGHUP
Symptom: Server logs a permission or path error after receiving SIGHUP. The old certificate continues to be served.
Cause: The files at TLS_CERT_FILE or TLS_KEY_FILE are missing, moved, or not readable by the server process.
Fix: Confirm the files exist and are readable:
ls -l "$TLS_CERT_FILE" "$TLS_KEY_FILE"
SIGHUP only re-reads files at the paths configured at startup. If a secrets manager writes to a new path, update the environment variables and restart.