Monitoring

HX-SDP exposes Prometheus metrics and structured logs for integration with standard observability stacks.

Prometheus metrics

Both HX-Gate and HX-Engine expose a /metrics endpoint (enabled by default):

# Gate metrics
curl http://localhost:8080/metrics

# Engine metrics
curl http://localhost:8000/metrics
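To read these endpoints from a script rather than a Prometheus server, the text exposition format can be parsed with a few lines of Python. This is a minimal sketch, assuming plain `name{labels} value` lines; `parse_metrics` and `fetch_metrics` are hypothetical helpers, and the parser does not handle escaped quotes inside label values.

```python
import re
import urllib.request

# Simplified matcher for one sample line: name{labels} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$')
# Matches key="value" pairs; escaped quotes inside values are NOT supported
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text, prefix="hx_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or not m.group(1).startswith(prefix):
            continue  # not a sample line, or not an HX-SDP metric
        name, raw_labels, raw_value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))  # float() also accepts +Inf/NaN
    return samples


def fetch_metrics(url="http://localhost:8080/metrics", timeout=5.0):
    """Download the raw exposition text from a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("http://localhost:8000/metrics"))` lists the engine's `hx_*` samples with their labels.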

Key metrics

Metric	Type	Labels	Description
`hx_gate_requests_total`	counter	`method`, `path`, `status`	Total requests handled by the gate
`hx_gate_request_duration_seconds`	histogram	`method`, `path`	Request latency
`hx_gate_cus_total`	counter	`tenant_id`, `operation`	CUs consumed per tenant
`hx_gate_rate_limit_rejections_total`	counter	`tenant_id`	Requests rejected by rate limiter
`hx_engine_compression_ratio`	histogram	`domain`, `verdict`	Compression ratio distribution
`hx_engine_put_duration_seconds`	histogram	`domain`	PUT operation latency
`hx_engine_query_duration_seconds`	histogram	`metric`	Query operation latency
`hx_engine_active_keys`	gauge	`namespace`	Number of active keys per namespace
`hx_engine_storage_bytes`	gauge	`namespace`	Total TT-core bytes per namespace

Prometheus scrape config

scrape_configs:
  - job_name: hx-gate
    static_configs:
      - targets: ["gate:8080"]
    metrics_path: /metrics
    scrape_interval: 15s

  - job_name: hx-engine
    static_configs:
      - targets: ["engine:8000"]
    metrics_path: /metrics
    scrape_interval: 15s

Audit log

The gate writes one JSON line to its audit log for every operation:

Location: set by HX_GATE_AUDIT_LOG_PATH (default: /var/log/hx-gate/audit.jsonl)

Format (one JSON object per line):

{
  "ts": "2026-01-15T10:30:42.123Z",
  "tenant_id": "acme-corp",
  "method": "POST",
  "path": "/v1/put",
  "namespace": "production",
  "status": 200,
  "cus": 1.0,
  "latency_ms": 42.3,
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
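A typed view of one audit record can make downstream scripts less error-prone. This is a minimal sketch: `AuditRecord` and `parse_audit_line` are hypothetical names, and the sketch assumes every record carries the nine fields shown above (a missing field raises `TypeError`; extra fields are ignored).

```python
import json
import os
from dataclasses import dataclass, fields

# Same lookup the docs describe: HX_GATE_AUDIT_LOG_PATH, else the default path
AUDIT_LOG_PATH = os.environ.get("HX_GATE_AUDIT_LOG_PATH", "/var/log/hx-gate/audit.jsonl")


@dataclass
class AuditRecord:
    ts: str            # ISO 8601 timestamp, e.g. "2026-01-15T10:30:42.123Z"
    tenant_id: str
    method: str
    path: str
    namespace: str
    status: int        # HTTP status returned by the gate
    cus: float         # CUs charged for the operation
    latency_ms: float
    request_id: str


def parse_audit_line(line: str) -> AuditRecord:
    """Parse one JSONL line, dropping any keys the dataclass does not declare."""
    data = json.loads(line)
    known = {f.name for f in fields(AuditRecord)}
    return AuditRecord(**{k: v for k, v in data.items() if k in known})
```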

Tail the audit log

# Live stream
docker exec hx-gate tail -f /var/log/hx-gate/audit.jsonl | jq .

# Filter by tenant
docker exec hx-gate cat /var/log/hx-gate/audit.jsonl | \
  jq 'select(.tenant_id == "acme-corp")'

# Errors only
docker exec hx-gate cat /var/log/hx-gate/audit.jsonl | \
  jq 'select(.status >= 400)'
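For summaries beyond what a single jq filter gives, the same stream can be aggregated in Python. The sketch below uses a hypothetical `summarize` function that counts requests, CUs and errors (status >= 400, the same threshold as the jq filter above) per tenant; it assumes `tenant_id` and `status` are present in every record, as in the sample.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Aggregate request count, CUs and 4xx/5xx errors per tenant from audit JSONL lines."""
    summary = defaultdict(lambda: {"requests": 0, "cus": 0.0, "errors": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        s = summary[rec["tenant_id"]]
        s["requests"] += 1
        s["cus"] += rec.get("cus", 0.0)
        if rec["status"] >= 400:  # matches jq 'select(.status >= 400)'
            s["errors"] += 1
    return dict(summary)
```

In a script, `summarize(sys.stdin)` consumes the output of `docker exec hx-gate cat /var/log/hx-gate/audit.jsonl` piped into it.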

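Log rotation

The gate does not rotate the audit log itself, so the file grows without bound. Configure logrotate for it; the rule below assumes logrotate can see the log path (for example, because /var/log/hx-gate is bind-mounted from the host). `copytruncate` lets the gate keep writing to its already-open file without a restart, at the cost of possibly losing lines written between the copy and the truncate: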

# /etc/logrotate.d/hx-gate
/var/log/hx-gate/audit.jsonl {
    daily
    rotate 30
    compress
    missingok
    notifempty
    copytruncate
}
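With this rule, rotated files use logrotate's default numeric naming: `compress` without `delaycompress` yields `audit.jsonl.1.gz` (newest) through `audit.jsonl.30.gz` (oldest). A minimal sketch for reading the full history in chronological order, assuming that naming (no `dateext`); `iter_audit_lines` is a hypothetical helper.

```python
import gzip
import re
from pathlib import Path


def iter_audit_lines(log_path="/var/log/hx-gate/audit.jsonl"):
    """Yield audit lines oldest-first: rotated audit.jsonl.N[.gz] files, then the live file."""
    live = Path(log_path)
    pattern = re.compile(re.escape(live.name) + r"\.(\d+)(\.gz)?$")
    rotated = []
    for p in live.parent.glob(live.name + ".*"):
        m = pattern.match(p.name)
        if m:
            rotated.append((int(m.group(1)), p))
    # A higher rotation number means an older file, so read in descending order
    for _, p in sorted(rotated, reverse=True):
        opener = gzip.open if p.suffix == ".gz" else open
        with opener(p, "rt", encoding="utf-8") as f:
            yield from f
    if live.exists():
        with open(live, "r", encoding="utf-8") as f:
            yield from f
```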

Structured application logs

Both services emit structured JSON logs to stdout:

docker logs hx-gate --tail 50 -f
docker logs hx-engine --tail 50 -f

Set the log level via environment variables:

HX_GATE_LOG_LEVEL=debug      # debug | info | warning | error
HOLONOMIX_LOG_LEVEL=debug
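Because the records are JSON, they can be filtered by severity on the host, for example when reading `docker logs hx-gate` output. This sketch assumes each record stores its level in a field named `level`, which is a guess about the log schema; `filter_logs` is hypothetical and silently skips lines that are not JSON objects.

```python
import json

# Severity order taken from the documented values: debug | info | warning | error
LEVELS = ["debug", "info", "warning", "error"]


def filter_logs(lines, min_level="warning", level_key="level"):
    """Yield parsed records at or above min_level.

    level_key is an ASSUMED field name; adjust it to the real log schema.
    """
    threshold = LEVELS.index(min_level)
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output (e.g. startup banners)
        if not isinstance(rec, dict):
            continue
        level = str(rec.get(level_key, "")).lower()
        if level in LEVELS and LEVELS.index(level) >= threshold:
            yield rec
```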

Grafana dashboard

The following PromQL queries cover the core panels of an HX-SDP dashboard:

Request volume

rate(hx_gate_requests_total[5m])

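P99 latency

Sum the bucket rates by `le` before taking the quantile; without the sum, the panel plots a separate P99 for every `method`/`path` series:

histogram_quantile(0.99, sum by (le) (rate(hx_gate_request_duration_seconds_bucket[5m])))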
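`histogram_quantile` estimates a quantile from cumulative bucket counts (the `le` label), interpolating linearly inside the bucket that contains the requested rank; to get a single P99 across all series, the bucket rates must first be merged with `sum by (le) (...)`. The sketch below reimplements that interpolation for illustration only. It is a simplified model rather than Prometheus's own code, and it takes a hypothetical list of `(upper_bound, cumulative_count)` pairs.

```python
import bisect
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets [(upper_bound, cumulative_count), ...].

    The last bucket must be +Inf; observations are assumed evenly spread
    inside the bucket that contains the target rank.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    counts = [c for _, c in buckets]
    b = bisect.bisect_left(counts, rank)  # first bucket whose cumulative count >= rank
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lies in the +Inf bucket: return highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, end, count = 0.0, buckets[b][0], buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan  # empty first bucket with q == 0
    return start + (end - start) * (rank / count)
```

With buckets `(0.1, 50), (0.5, 90), (1.0, 100), (+Inf, 100)`, the 0.99 quantile falls in the 0.5 to 1.0 bucket and interpolates to about 0.95 s.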

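CU consumption rate (per tenant)

rate() returns CUs per second, averaged over the last hour; summing by `tenant_id` merges the `operation` series into one line per tenant:

sum by (tenant_id) (rate(hx_gate_cus_total[1h]))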

Error rate

sum(rate(hx_gate_requests_total{status=~"5.."}[5m])) /
sum(rate(hx_gate_requests_total[5m]))
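The ratio divides the increase of 5xx responses by the increase of all responses over the same window. A simplified sketch of that arithmetic over raw counter samples, including the reset handling that `rate()` applies (a drop in value means the counter restarted from zero); it omits PromQL's extrapolation to the window edges, and `counter_increase` and `error_ratio` are hypothetical names.

```python
import math
import re


def counter_increase(samples):
    """Total increase of a counter across consecutive raw samples."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter was reset to 0, so the whole new value counts as increase
        total += cur - prev if cur >= prev else cur
    return total


def error_ratio(series):
    """series maps a status code (str) to its raw counter samples over the window."""
    errors = sum(counter_increase(s) for status, s in series.items()
                 if re.fullmatch(r"5..", status))  # same match as status=~"5.."
    total = sum(counter_increase(s) for s in series.values())
    # PromQL yields NaN for 0/0 (no traffic); mirror that instead of raising
    return errors / total if total else math.nan
```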

Storage growth

hx_engine_storage_bytes

Compression efficiency

Median compression ratio per `domain`:

histogram_quantile(0.5, sum by (le, domain) (rate(hx_engine_compression_ratio_bucket[1h])))

Alerts

Recommended alert rules:

Alert	Condition	Severity
High error rate	5xx rate > 1% for 5 min	Critical
Engine unreachable	`up{job="hx-engine"} == 0` for 2 min	Critical
CU quota approaching	tenant CU usage > 80% of quota	Warning
Rate-limit storm	rate-limit rejections > 100/min	Warning
Storage growing fast	`hx_engine_storage_bytes` increase > 1 GB/hr	Info
High P99 latency	P99 > 5s for 5 min	Warning
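A sketch of these rules as a Prometheus rule file, generated from Python so the expressions live in one place. The alert names, the `hx-sdp` group name and the exact expressions are assumptions derived from the table: the CU quota alert is omitted because no quota metric is listed above, the rate-limit alert sums across tenants, and 1 GB is taken as 10^9 bytes. JSON is a subset of YAML 1.2, so the output should load as a rule file; validate it with `promtool check rules` before deploying.

```python
import json

# ASSUMED rule set derived from the alert table; tune names and thresholds as needed
RULES = [
    {
        "alert": "HXHighErrorRate",
        "expr": 'sum(rate(hx_gate_requests_total{status=~"5.."}[5m])) '
                "/ sum(rate(hx_gate_requests_total[5m])) > 0.01",
        "for": "5m",
        "labels": {"severity": "critical"},
    },
    {
        "alert": "HXEngineUnreachable",
        "expr": 'up{job="hx-engine"} == 0',
        "for": "2m",
        "labels": {"severity": "critical"},
    },
    {
        # per-second rejection rate * 60 = rejections per minute
        "alert": "HXRateLimitStorm",
        "expr": "sum(rate(hx_gate_rate_limit_rejections_total[5m])) * 60 > 100",
        "labels": {"severity": "warning"},
    },
    {
        # delta() is the gauge counterpart of increase()
        "alert": "HXStorageGrowingFast",
        "expr": "sum(delta(hx_engine_storage_bytes[1h])) > 1e9",
        "labels": {"severity": "info"},
    },
    {
        "alert": "HXHighP99Latency",
        "expr": "histogram_quantile(0.99, sum by (le) "
                "(rate(hx_gate_request_duration_seconds_bucket[5m]))) > 5",
        "for": "5m",
        "labels": {"severity": "warning"},
    },
]


def render_rule_file(rules, group="hx-sdp"):
    """Return a Prometheus rule file as JSON text (JSON is also valid YAML 1.2)."""
    return json.dumps({"groups": [{"name": group, "rules": rules}]}, indent=2)
```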

Docker health checks

The Compose file defines health checks for both services:

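services:
  gate:
    healthcheck:
      # curl -f exits non-zero on HTTP >= 400; curl must be present in the image
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3   # marked unhealthy after 3 consecutive failures

  engine:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3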

Monitor container status (once a check has run, the STATUS column shows `healthy` or `unhealthy`):

docker compose -f deploy/hx-gate/docker-compose.fleet.yml ps
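The same check can be reproduced outside Docker, for example from a deployment script. A simplified sketch of the healthcheck loop: `http_probe` succeeds when the URL answers without an HTTP error, as `curl -f` does, and `check_health` reports healthy on the first success or unhealthy after `retries` consecutive failures. Both names are hypothetical, and Docker's real scheduler (start period, per-probe timeout handling) is only approximated.

```python
import http.client
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if GET url succeeds without an HTTP error status, like `curl -f`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # HTTPError (status >= 400), URLError, refused connections and timeouts land here
        return False


def check_health(probe, interval=30.0, retries=3, sleep=time.sleep):
    """Return "healthy" on the first successful probe, "unhealthy" after `retries` failures."""
    for attempt in range(1, retries + 1):
        if probe():
            return "healthy"
        if attempt < retries:
            sleep(interval)  # wait between probes, like the Compose `interval`
    return "unhealthy"
```

For example, `check_health(lambda: http_probe("http://localhost:8080/health", timeout=5), interval=30, retries=3)` mirrors the gate's Compose settings.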