Monitoring

HX-SDP exposes Prometheus metrics and structured logs for integration with standard observability stacks.


Prometheus metrics

Both HX-Gate and HX-Engine expose a /metrics endpoint (enabled by default):

# Gate metrics
curl http://localhost:8080/metrics

# Engine metrics
curl http://localhost:8000/metrics

Key metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| hx_gate_requests_total | counter | method, path, status | Total requests handled by gate |
| hx_gate_request_duration_seconds | histogram | method, path | Request latency |
| hx_gate_cus_total | counter | tenant_id, operation | CUs consumed per tenant |
| hx_gate_rate_limit_rejections_total | counter | tenant_id | Requests rejected by rate limiter |
| hx_engine_compression_ratio | histogram | domain, verdict | Compression ratio distribution |
| hx_engine_put_duration_seconds | histogram | domain | PUT operation latency |
| hx_engine_query_duration_seconds | histogram | metric | Query operation latency |
| hx_engine_active_keys | gauge | namespace | Number of active keys per namespace |
| hx_engine_storage_bytes | gauge | namespace | Total TT-core bytes per namespace |
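
To read one of these metrics from a script rather than by eye, the text served at /metrics can be parsed directly. The following is a minimal sketch, assuming the endpoints serve the standard Prometheus text exposition format. The helper name parse_samples and the sample values are hypothetical, and label values containing escaped quotes are not handled.

```python
import re

# Matches `name{labels} value` or `name value`; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output; real values come from `curl .../metrics`.
SAMPLE_TEXT = """\
# TYPE hx_engine_active_keys gauge
hx_engine_active_keys{namespace="production"} 1200
hx_engine_active_keys{namespace="staging"} 35
hx_engine_storage_bytes{namespace="production"} 5.0e+06
"""

keys_by_namespace = {
    labels["namespace"]: value
    for name, labels, value in parse_samples(SAMPLE_TEXT)
    if name == "hx_engine_active_keys"
}
print(keys_by_namespace)
```

For anything beyond a quick check, a Prometheus server or an existing client library is the more robust choice; this parser only covers the simple cases shown.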

Prometheus scrape config

scrape_configs:
  - job_name: hx-gate
    static_configs:
      - targets: ["gate:8080"]
    metrics_path: /metrics
    scrape_interval: 15s

  - job_name: hx-engine
    static_configs:
      - targets: ["engine:8000"]
    metrics_path: /metrics
    scrape_interval: 15s

Audit log

The gate writes a JSONL audit log of every operation:

Location: HX_GATE_AUDIT_LOG_PATH (default: /var/log/hx-gate/audit.jsonl)

Format (one JSON object per line; the example below is pretty-printed across several lines for readability):

{
  "ts": "2026-01-15T10:30:42.123Z",
  "tenant_id": "acme-corp",
  "method": "POST",
  "path": "/v1/put",
  "namespace": "production",
  "status": 200,
  "cus": 1.0,
  "latency_ms": 42.3,
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
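
Because each line is a standalone JSON object, the log can be aggregated with a few lines of Python. The sketch below sums CUs and counts failed requests per tenant, using only the fields shown in the example entry. The function name summarize_audit, the tenant "globex", and the sample values are hypothetical.

```python
import json
from collections import defaultdict


def summarize_audit(lines):
    """Aggregate CUs and error counts per tenant from JSONL audit lines."""
    cus = defaultdict(float)
    errors = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip partially written lines, e.g. when reading a live file.
            continue
        tenant = entry.get("tenant_id", "unknown")
        cus[tenant] += entry.get("cus", 0.0)
        if entry.get("status", 0) >= 400:
            errors[tenant] += 1
    return dict(cus), dict(errors)


# Hypothetical audit entries, one JSON object per line.
SAMPLE_LINES = [
    '{"tenant_id": "acme-corp", "path": "/v1/put", "status": 200, "cus": 1.0}',
    '{"tenant_id": "acme-corp", "path": "/v1/put", "status": 500, "cus": 0.5}',
    '{"tenant_id": "globex", "path": "/v1/put", "status": 429, "cus": 0.0}',
]

cus_by_tenant, errors_by_tenant = summarize_audit(SAMPLE_LINES)
print(cus_by_tenant)
print(errors_by_tenant)
```

An open file object can be passed in place of the list, since iterating over a file yields one line at a time.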

Tail the audit log

# Live stream
docker exec hx-gate tail -f /var/log/hx-gate/audit.jsonl | jq .

# Filter by tenant
docker exec hx-gate cat /var/log/hx-gate/audit.jsonl | \
  jq 'select(.tenant_id == "acme-corp")'

# Errors only
docker exec hx-gate cat /var/log/hx-gate/audit.jsonl | \
  jq 'select(.status >= 400)'

Log rotation

The audit log grows without bound unless it is rotated. A logrotate configuration such as the following keeps 30 daily compressed archives:

# /etc/logrotate.d/hx-gate
/var/log/hx-gate/audit.jsonl {
    daily
    rotate 30
    compress
    missingok
    notifempty
    copytruncate
}

Structured application logs

Both services emit structured JSON logs to stdout:

docker logs hx-gate --tail 50 -f
docker logs hx-engine --tail 50 -f

Set the log level of each service via environment variables:

HX_GATE_LOG_LEVEL=debug      # debug | info | warning | error
HOLONOMIX_LOG_LEVEL=debug

Grafana dashboard

Use the following PromQL queries as panels for an HX-SDP dashboard:

Request volume

rate(hx_gate_requests_total[5m])

P99 latency

histogram_quantile(0.99, rate(hx_gate_request_duration_seconds_bucket[5m]))
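
The P99 value is an estimate, not an exact measurement: histogram_quantile locates the bucket that contains the requested rank and interpolates linearly inside it. The sketch below illustrates that calculation on cumulative bucket counts. It is a simplified illustration, not Prometheus's implementation, and the bucket boundaries and counts are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan  # Not reached for well-formed input.


# Hypothetical latency buckets in seconds: (le, cumulative count).
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (5.0, 100), (math.inf, 100)]

p99 = histogram_quantile(0.99, BUCKETS)
p50 = histogram_quantile(0.50, BUCKETS)
print(p99, p50)
```

With these buckets the 99th request falls between 1.0 s and 5.0 s, so the estimate lands halfway through that bucket. Wide upper buckets therefore make P99 coarse, which is worth keeping in mind when reading the panel.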

CU consumption rate (per tenant)

sum by (tenant_id) (rate(hx_gate_cus_total[1h]))

Error rate

sum(rate(hx_gate_requests_total{status=~"5.."}[5m])) /
sum(rate(hx_gate_requests_total[5m]))
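
The error-rate query divides the 5xx request rate by the total request rate. The same arithmetic is sketched below on per-status request counts for one window. The function name error_ratio and the counts are hypothetical. The pattern uses a full match because PromQL regex matchers are anchored at both ends.

```python
import re

# Equivalent of the PromQL matcher status=~"5.." (anchored on both sides).
FIVE_XX = re.compile(r"5..")


def error_ratio(requests_by_status):
    """Return the share of requests whose status code is 5xx."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(
        count
        for status, count in requests_by_status.items()
        if FIVE_XX.fullmatch(status)
    )
    return errors / total


# Hypothetical request counts observed over a 5 minute window.
WINDOW = {"200": 980, "404": 12, "500": 6, "503": 2}

ratio = error_ratio(WINDOW)
print(f"{ratio:.2%}")
```

Note one difference from PromQL: when no 5xx series exist, the query returns no data instead of zero, so a dashboard panel may show a gap where this function returns 0.0.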

Storage growth

hx_engine_storage_bytes

Compression efficiency

histogram_quantile(0.5, rate(hx_engine_compression_ratio_bucket[1h]))

Alerts

Recommended alert rules:

| Alert | Condition | Severity |
|---|---|---|
| High error rate | 5xx rate > 1% for 5 min | Critical |
| Engine unreachable | up{job="hx-engine"} == 0 for 2 min | Critical |
| CU quota approaching | tenant CU usage > 80% of quota | Warning |
| Rate limit storms | rate limit rejections > 100/min | Warning |
| Storage growing fast | storage_bytes increase > 1 GB/hr | Info |
| High P99 latency | P99 > 5s for 5 min | Warning |

Docker health checks

The Compose file includes built-in health checks:

gate:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    interval: 30s
    timeout: 5s
    retries: 3

engine:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
    interval: 30s
    timeout: 5s
    retries: 3

Monitor container status:

docker compose -f deploy/hx-gate/docker-compose.fleet.yml ps
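
The same /health endpoints can be polled from outside the containers, for example from a deployment script. The sketch below is one way to do that with the Python standard library, assuming the services are reachable on the localhost ports used above. The names is_healthy, report, and ENDPOINTS are hypothetical, and the opener parameter exists only so the check can be exercised without a network.

```python
import urllib.request

# Health endpoints taken from the Compose health checks above.
ENDPOINTS = {
    "gate": "http://localhost:8080/health",
    "engine": "http://localhost:8000/health",
}


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET `url` answers with a 2xx status, roughly like `curl -f`."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts, and urllib's HTTPError for 4xx/5xx.
        return False


def report(endpoints=ENDPOINTS, opener=urllib.request.urlopen):
    """Return a mapping of service name to health status."""
    return {
        name: is_healthy(url, opener=opener)
        for name, url in endpoints.items()
    }
```

Calling report() returns a dictionary such as one entry per service with True or False, which a script can use to decide whether to proceed or exit with a non-zero status.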