Health Checks

The Nexus Broker continuously monitors integration health across two dimensions: provider-level (is the upstream API alive?) and connection-level (is this user's credential still valid?). Both run as background workers inside the broker process.

Background Workers

HealthWorker: Provider-Level (5-minute interval)

Probes every registered OAuth2 provider by sending a synthetic token request to its `token_url`, crafted so that it should be rejected with `invalid_grant`. This deliberately bad request shows whether the provider's API is reachable and responding to OAuth traffic, without requiring a real user credential.

| Provider response | Status set |
|---|---|
| `400 Bad Request` or `401 Unauthorized` | `healthy` (API is alive and rejecting correctly) |
| `5xx Server Error` | `unhealthy` (API is down) |
| `200 OK` (unexpected for an invalid grant) | `degraded` (API behaving abnormally) |
| Network error / timeout | `unhealthy` |
| No `token_url` configured | `unknown` |

For non-OAuth2 providers (API key, basic auth), the worker sends a `HEAD` request to `user_info_endpoint` or `api_base_url`. Any non-5xx response is treated as healthy.

Concurrency: at most 10 providers are checked concurrently, bounded by a semaphore and a `sync.WaitGroup`.

ConnectionHealthWorker: Connection-Level (1-minute interval)

Validates individual user connections in batches of 100 on each tick of a fixed ticker, prioritising connections that have never been checked or are the most overdue. Each check has a 15-second timeout, and a shared `http.Client` is reused across checks for connection pooling.

| Auth type | Check method |
|---|---|
| `oauth2` | Attempt a background token refresh via `ConnectionService.Refresh` |
| `api_key` | Decrypt the credential, extract the `api_key` field, and send a `GET` to `user_info_endpoint` using the provider's configured `AuthHeader` |
| `basic_auth` | Decrypt the credential, extract `username`/`password`, and send a `GET` to `user_info_endpoint` with `Authorization: Basic` |
| No endpoint configured | Mark `unknown` |

OAuth2 status code handling: The worker inspects `RefreshResponse.StatusCode` to distinguish definitive credential errors from transient failures:

| Upstream status | `health_status` set | `connection.status` changed? |
|---|---|---|
| Refresh succeeds | `healthy` | No |
| `400` / `401` (`invalid_grant`, revoked) | `expired` | Yes → `expired` (if provider healthy) |
| `403` (scope issue) | `degraded` | No |
| `5xx` (upstream error) | `unhealthy` | No |
| Network error / `nil` response | `degraded` | No |

Provider shielding: Before expiring a connection, the worker cross-references the upstream provider's `health_status`. If the provider is `unhealthy` or `degraded`, the connection is marked `unhealthy` (retriable) instead of `expired` (terminal). This prevents mass expiration during transient upstream outages.
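Provider shielding reduces to one check applied before any expiry. In this hypothetical `shield` helper, a provider `health_status` of `unknown` lets the expiry through, because this page names only `unhealthy` and `degraded` as shielding states. A stricter variant could shield on anything other than `healthy`, which would match the "(if provider healthy)" wording in the table.

```go
package main

import "fmt"

// shield turns a would-be expiry into a retriable unhealthy status when the
// provider itself is unhealthy or degraded, so an outage cannot mass-expire.
func shield(providerHealth, health string, expire bool) (string, bool) {
	if expire && (providerHealth == "unhealthy" || providerHealth == "degraded") {
		return "unhealthy", false
	}
	return health, expire
}

func main() {
	fmt.Println(shield("unhealthy", "expired", true)) // unhealthy false
	fmt.Println(shield("healthy", "expired", true))   // expired true
}
```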

Error handling: If `UpdateStatus` fails when expiring a connection, the worker logs the error and skips the `health_status` write. This avoids leaving a connection with `health_status` set to `expired` while its `status` is unchanged.

Concurrency: at most 20 connections are checked concurrently (semaphore + `sync.WaitGroup`).

`health_status` Values

Both `provider_profiles` and `connections` use the same status vocabulary, although the provider worker never sets `expired`:

| Value | Meaning |
|---|---|
| `healthy` | Last check passed |
| `unhealthy` | Last check failed; retriable (transient upstream error or provider-shielded) |
| `degraded` | Partial failure: scope issues, network errors, or internal errors where credential validity is unknown |
| `expired` | Credential confirmed invalid (400/401); user must re-authenticate |
| `unknown` | Not yet checked, or not enough information to check |
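A client consuming these values might wrap them in a typed string. This is a hypothetical client-side sketch, not a type exported by the broker. Mapping unrecognised values to `unknown` is a design choice that keeps older clients working if new values appear.

```go
package main

import "fmt"

// HealthStatus mirrors the health_status vocabulary (hypothetical client type).
type HealthStatus string

const (
	Healthy   HealthStatus = "healthy"
	Unhealthy HealthStatus = "unhealthy"
	Degraded  HealthStatus = "degraded"
	Expired   HealthStatus = "expired"
	Unknown   HealthStatus = "unknown"
)

// ParseHealthStatus validates a raw API value, falling back to Unknown.
func ParseHealthStatus(s string) (HealthStatus, error) {
	switch h := HealthStatus(s); h {
	case Healthy, Unhealthy, Degraded, Expired, Unknown:
		return h, nil
	default:
		return Unknown, fmt.Errorf("unrecognised health_status %q", s)
	}
}

// NeedsReauth reports whether the user must re-authenticate.
func (h HealthStatus) NeedsReauth() bool { return h == Expired }

func main() {
	for _, raw := range []string{"healthy", "expired", "bogus"} {
		h, err := ParseHealthStatus(raw)
		fmt.Println(h, h.NeedsReauth(), err)
	}
}
```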

API Endpoints

`GET /providers/health`

Returns the health status of all registered providers. No credentials are included.

```http
GET /providers/health
Authorization: X-API-Key <key>
```

```json
[
  {
    "id": "uuid",
    "name": "google",
    "health_status": "healthy",
    "last_health_check_at": "2026-05-19T07:00:00Z",
    "health_message": ""
  },
  {
    "id": "uuid",
    "name": "stripe",
    "health_status": "unhealthy",
    "last_health_check_at": "2026-05-19T07:05:00Z",
    "health_message": "upstream returned 503"
  }
]
```

Returns `[]` (not `null`) when no providers exist.

`GET /connections?workspace_id={workspace_id}`

Returns all non-pending connections in the workspace, each with its health status. No credentials or tokens are included.

```http
GET /connections?workspace_id=ws-123
Authorization: X-API-Key <key>
```

```json
[
  {
    "id": "uuid",
    "provider_id": "uuid",
    "provider_name": "google",
    "auth_type": "oauth2",
    "status": "active",
    "scopes": ["email", "calendar.read"],
    "health_status": "healthy",
    "last_health_check_at": "2026-05-19T07:00:00Z",
    "created_at": "2026-05-01T00:00:00Z",
    "updated_at": "2026-05-19T07:00:00Z"
  }
]
```

Use case: Rendering a connections dashboard with live health indicators.
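For the dashboard use case, a client might build the query URL and group connections by `health_status`. `connectionsURL` and `groupByHealth` are hypothetical helpers, and only a subset of the response fields is decoded. `url.Values` handles escaping of the workspace ID.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
)

// Connection mirrors the GET /connections fields a dashboard needs.
type Connection struct {
	ID           string `json:"id"`
	ProviderName string `json:"provider_name"`
	Status       string `json:"status"`
	HealthStatus string `json:"health_status"`
}

// connectionsURL builds {base}/connections?workspace_id=... with escaping.
func connectionsURL(base, workspaceID string) string {
	q := url.Values{"workspace_id": {workspaceID}}
	return base + "/connections?" + q.Encode()
}

// groupByHealth buckets provider names by health_status for indicators.
func groupByHealth(conns []Connection) map[string][]string {
	groups := make(map[string][]string)
	for _, c := range conns {
		groups[c.HealthStatus] = append(groups[c.HealthStatus], c.ProviderName)
	}
	return groups
}

func main() {
	body := []byte(`[
	  {"id":"1","provider_name":"google","status":"active","health_status":"healthy"},
	  {"id":"2","provider_name":"stripe","status":"expired","health_status":"expired"}
	]`)
	var conns []Connection
	if err := json.Unmarshal(body, &conns); err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(connectionsURL("https://broker.example.com", "ws-123"))
	groups := groupByHealth(conns)
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Println(k, groups[k])
	}
}
```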

`GET /connections/{id}/token` (enhanced)

The existing token endpoint now includes `health_status` in its response, alongside `credentials` and `strategy`.

```json
{
  "strategy": { "type": "oauth2" },
  "credentials": { "access_token": "..." },
  "health_status": "healthy"
}
```

Use case: Showing an inline warning or re-auth prompt when consuming a credential.

Worker Mode

Health workers run inside the standard broker process. For deployments that need to separate HTTP serving from background polling, pass `--worker-only` to the binary:

```shell
nexus-broker --worker-only
```

In this mode, the HTTP server does not start. The process listens for `SIGINT`/`SIGTERM` and cancels the worker context, signalling in-flight checks to stop. Note: the current implementation does not explicitly wait for worker goroutines to finish before exiting, so checks still in flight at shutdown may be cut off.

The same Docker image and environment variables are used — just override the container command.

Database Migrations

Health check columns are added by the incremental migration scripts. To apply them, run:

```shell
nexus-broker migrate up
```

This applies all pending migrations in order (`13_add_provider_health.sql`, `14_add_connection_health.sql`, `15_add_connection_health_index.sql`, etc.). There is no need to run individual scripts; the migrator tracks which ones have already been applied.