Health Checks
The Nexus Broker continuously monitors integration health across two dimensions: provider-level (is the upstream API alive?) and connection-level (is this user's credential still valid?). Both run as background workers inside the broker process.
Background Workers
HealthWorker: Provider-Level (5-minute interval)
Probes every registered OAuth2 provider by sending a synthetic invalid_grant request to its token_url. This deliberately bad request tells us whether the provider's API is reachable and responding to OAuth traffic without requiring a real user credential.
| Provider Response | Status Set |
|---|---|
| `400 Bad Request` or `401 Unauthorized` | `healthy`: API is alive and rejecting correctly |
| `5xx Server Error` | `unhealthy`: API is down |
| `200 OK` (unexpected for invalid grant) | `degraded`: API behaving abnormally |
| Network error / timeout | `unhealthy` |
| No `token_url` configured | `unknown` |
For non-OAuth2 providers (API key, basic auth), the worker makes a HEAD request to user_info_endpoint or api_base_url. Any non-5xx response is treated as healthy.
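The mapping above can be sketched as two small classification functions. This is a minimal illustration, not the broker's actual code: the function names and the `errTimeout` value are hypothetical, and two cases are assumptions because the text does not cover them (status codes outside the table for OAuth2 probes, and network errors on the HEAD probe).

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout is a hypothetical stand-in for any network error or timeout.
var errTimeout = errors.New("timeout")

// classifyOAuth2Probe maps the outcome of the synthetic invalid-grant request
// to a provider health status, following the table in this section.
func classifyOAuth2Probe(tokenURL string, statusCode int, err error) string {
	switch {
	case tokenURL == "":
		return "unknown" // no token_url configured, nothing to probe
	case err != nil:
		return "unhealthy" // network error or timeout
	case statusCode == 400 || statusCode == 401:
		return "healthy" // alive and correctly rejecting the bad grant
	case statusCode >= 500 && statusCode <= 599:
		return "unhealthy" // API is down
	case statusCode == 200:
		return "degraded" // accepting an invalid grant is abnormal
	default:
		// Assumption: the text does not say how other codes are treated.
		return "unknown"
	}
}

// classifyHeadProbe applies the non-OAuth2 rule: any non-5xx response is
// healthy. Treating a network error as unhealthy is an assumption.
func classifyHeadProbe(statusCode int, err error) string {
	if err != nil || (statusCode >= 500 && statusCode <= 599) {
		return "unhealthy"
	}
	return "healthy"
}

func main() {
	url := "https://provider.example/token" // placeholder URL
	fmt.Println(classifyOAuth2Probe(url, 400, nil))      // healthy
	fmt.Println(classifyOAuth2Probe(url, 503, nil))      // unhealthy
	fmt.Println(classifyOAuth2Probe(url, 0, errTimeout)) // unhealthy
	fmt.Println(classifyHeadProbe(404, nil))             // healthy
}
```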
Concurrency: max 10 providers checked concurrently (semaphore + WaitGroup).
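The semaphore + WaitGroup pattern mentioned here is commonly written in Go with a buffered channel as the counting semaphore. The sketch below shows that general pattern; `checkAll` and its parameters are hypothetical names, and the broker's real worker may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// checkAll runs check once per provider with at most limit checks in flight.
func checkAll(providers []string, limit int, check func(name string)) {
	sem := make(chan struct{}, limit) // buffered channel as a counting semaphore
	var wg sync.WaitGroup
	for _, name := range providers {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit checks are already running
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the check ends
			check(name)
		}(name)
	}
	wg.Wait() // return only after every check has finished
}

func main() {
	var checked int64
	providers := []string{"google", "stripe", "github"}
	checkAll(providers, 10, func(name string) {
		atomic.AddInt64(&checked, 1)
	})
	fmt.Println("checked:", atomic.LoadInt64(&checked)) // checked: 3
}
```

The same shape would apply to the connection-level worker with a limit of 20.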
ConnectionHealthWorker: Connection-Level (1-minute interval)
Validates individual user connections in batches of 100 on a fixed ticker, prioritising those never checked or longest overdue. Each check has a 15-second timeout. A shared http.Client is reused across checks for connection pooling.
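The batch ordering (never checked first, then longest overdue) can be illustrated with an in-memory sort. This is only a sketch of the ordering rule: the `connection` type, `nextBatch`, and `checkedAgo` are hypothetical, and the broker may well express the same ordering in its database query rather than in Go.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// connection is a hypothetical, trimmed-down record.
// LastHealthCheckAt is nil when the connection has never been checked.
type connection struct {
	ID                string
	LastHealthCheckAt *time.Time
}

// checkedAgo builds a connection whose last check happened ago in the past.
func checkedAgo(id string, ago time.Duration) connection {
	t := time.Now().Add(-ago)
	return connection{ID: id, LastHealthCheckAt: &t}
}

// nextBatch returns up to size connections: never-checked ones first,
// then those with the oldest last check. The input slice is not modified.
func nextBatch(conns []connection, size int) []connection {
	sorted := make([]connection, len(conns))
	copy(sorted, conns)
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := sorted[i].LastHealthCheckAt, sorted[j].LastHealthCheckAt
		if a == nil || b == nil {
			return a == nil && b != nil // nil (never checked) sorts first
		}
		return a.Before(*b) // older check sorts first
	})
	if len(sorted) > size {
		sorted = sorted[:size]
	}
	return sorted
}

func main() {
	conns := []connection{
		checkedAgo("recent", time.Minute),
		{ID: "never"},
		checkedAgo("old", 10*time.Minute),
	}
	for _, c := range nextBatch(conns, 2) {
		fmt.Println(c.ID) // never, then old
	}
}
```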
| Auth Type | Check Method |
|---|---|
| `oauth2` | Attempt a background token refresh via `ConnectionService.Refresh` |
| `api_key` | Decrypt credential, extract `api_key` field, GET to `user_info_endpoint` using provider's configured `AuthHeader` |
| `basic_auth` | Decrypt credential, extract username/password, GET to `user_info_endpoint` with `Authorization: Basic` |
| No endpoint configured | Mark `unknown` |
OAuth2 status code handling: The worker inspects RefreshResponse.StatusCode to distinguish definitive credential errors from transient failures:
| Upstream Status | `health_status` set | `connection.status` changed? |
|---|---|---|
| Refresh succeeds | `healthy` | No |
| 400 / 401 (`invalid_grant`, revoked) | `expired` | Yes → `expired` (if provider healthy) |
| 403 (scope issue) | `degraded` | No |
| 5xx (upstream error) | `unhealthy` | No |
| Network error / nil response | `degraded` | No |
Provider shielding: Before expiring a connection, the worker cross-references the upstream provider's health_status. If the provider is unhealthy or degraded, the connection is marked unhealthy (retriable) instead of expired (terminal). This prevents mass-expiration during transient upstream outages.
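Taken together, the status table and the shielding rule amount to a small decision function. The sketch below is one way to express it, assuming hypothetical `refreshOutcome` and `decision` types; it follows the prose rule (shield when the provider is `unhealthy` or `degraded`), and how a provider in the `unknown` state is treated is not specified in the text, so here it does not shield.

```go
package main

import "fmt"

// refreshOutcome is a hypothetical summary of a ConnectionService.Refresh attempt.
type refreshOutcome struct {
	OK         bool // the refresh succeeded
	StatusCode int  // upstream HTTP status; 0 means no response (network error)
}

// decision is what the worker would persist for the connection.
type decision struct {
	HealthStatus     string // value written to health_status
	ExpireConnection bool   // whether connection.status changes to "expired"
}

// decide applies the status table plus provider shielding.
func decide(out refreshOutcome, providerHealth string) decision {
	switch {
	case out.OK:
		return decision{"healthy", false}
	case out.StatusCode == 400 || out.StatusCode == 401:
		// Provider shielding: do not expire while the provider itself is failing.
		if providerHealth == "unhealthy" || providerHealth == "degraded" {
			return decision{"unhealthy", false}
		}
		return decision{"expired", true}
	case out.StatusCode == 403:
		return decision{"degraded", false} // scope issue
	case out.StatusCode >= 500 && out.StatusCode <= 599:
		return decision{"unhealthy", false} // upstream error
	default:
		// Network error / nil response. Assumption: unlisted codes land here too.
		return decision{"degraded", false}
	}
}

func main() {
	revoked := refreshOutcome{StatusCode: 400}
	fmt.Printf("%+v\n", decide(revoked, "healthy"))   // expired, status changes
	fmt.Printf("%+v\n", decide(revoked, "unhealthy")) // shielded, stays retriable
}
```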
Error handling: If UpdateStatus fails when expiring a connection, the worker logs the error and skips the health_status write to avoid leaving the connection in an inconsistent state.
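The ordering described here (status first, health second, and no health write if the first fails) might look like the following. Only `UpdateStatus` is named in the text; the `store` interface, `UpdateHealthStatus`, and the in-memory fake are assumptions made so the sketch can run.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// store is a hypothetical persistence interface. UpdateHealthStatus is an
// assumed counterpart to the UpdateStatus call named in the text.
type store interface {
	UpdateStatus(connID, status string) error
	UpdateHealthStatus(connID, healthStatus string) error
}

// expire changes connection.status first and writes health_status only if
// that succeeded, so the two fields never disagree about expiry.
func expire(s store, connID string) error {
	if err := s.UpdateStatus(connID, "expired"); err != nil {
		log.Printf("expire connection %s: %v", connID, err)
		return err // skip the health_status write
	}
	return s.UpdateHealthStatus(connID, "expired")
}

// memStore is an in-memory fake used only to exercise the sketch.
type memStore struct {
	failStatus bool
	status     map[string]string
	health     map[string]string
}

func newMemStore(failStatus bool) *memStore {
	return &memStore{
		failStatus: failStatus,
		status:     map[string]string{},
		health:     map[string]string{},
	}
}

func (m *memStore) UpdateStatus(id, st string) error {
	if m.failStatus {
		return errors.New("database unavailable")
	}
	m.status[id] = st
	return nil
}

func (m *memStore) UpdateHealthStatus(id, hs string) error {
	m.health[id] = hs
	return nil
}

func main() {
	s := newMemStore(true) // simulate a failing UpdateStatus
	err := expire(s, "conn-1")
	fmt.Println(err != nil, len(s.health)) // true 0
}
```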
Concurrency: max 20 connections checked concurrently (semaphore + WaitGroup).
health_status Values
Both provider_profiles and connections share the same status vocabulary:
| Value | Meaning |
|---|---|
| `healthy` | Last check passed |
| `unhealthy` | Last check failed; retriable (transient upstream or provider-shielded) |
| `degraded` | Partial failure: scope issues, network errors, or internal errors where credential validity is unknown |
| `expired` | Credential confirmed invalid (400/401); user must re-authenticate |
| `unknown` | Not yet checked, or not enough information to check |
API Endpoints
GET /providers/health
Returns the health status of all registered providers. No credentials are included.
```json
[
  {
    "id": "uuid",
    "name": "google",
    "health_status": "healthy",
    "last_health_check_at": "2026-05-19T07:00:00Z",
    "health_message": ""
  },
  {
    "id": "uuid",
    "name": "stripe",
    "health_status": "unhealthy",
    "last_health_check_at": "2026-05-19T07:05:00Z",
    "health_message": "upstream returned 503"
  }
]
```
Returns [] (not null) when no providers exist.
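A client might consume this endpoint roughly as follows. The struct mirrors the fields in the sample response; `fetchProviderHealth` and `newStubBroker` are hypothetical helpers, the stub server stands in for a real broker, and any authentication the broker requires is omitted because the text does not describe it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// providerHealth mirrors one element of the sample response.
type providerHealth struct {
	ID                string    `json:"id"`
	Name              string    `json:"name"`
	HealthStatus      string    `json:"health_status"`
	LastHealthCheckAt time.Time `json:"last_health_check_at"`
	HealthMessage     string    `json:"health_message"`
}

// fetchProviderHealth calls GET {baseURL}/providers/health and decodes the list.
func fetchProviderHealth(client *http.Client, baseURL string) ([]providerHealth, error) {
	resp, err := client.Get(baseURL + "/providers/health")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var out []providerHealth
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

// newStubBroker serves a fixed body on /providers/health, for demonstration only.
func newStubBroker(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/providers/health" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newStubBroker(`[{"id":"uuid","name":"stripe","health_status":"unhealthy",` +
		`"last_health_check_at":"2026-05-19T07:05:00Z","health_message":"upstream returned 503"}]`)
	defer srv.Close()

	providers, err := fetchProviderHealth(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range providers {
		fmt.Println(p.Name, p.HealthStatus, p.HealthMessage)
	}
}
```

Because the endpoint returns `[]` rather than `null` for an empty list, the decoded slice is non-nil and empty, so callers can range over it without a nil check either way.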
GET /connections?workspace_id={workspace_id}
Returns all non-pending connections for a workspace with health status. No credentials or tokens are included.
```json
[
  {
    "id": "uuid",
    "provider_id": "uuid",
    "provider_name": "google",
    "auth_type": "oauth2",
    "status": "active",
    "scopes": ["email", "calendar.read"],
    "health_status": "healthy",
    "last_health_check_at": "2026-05-19T07:00:00Z",
    "created_at": "2026-05-01T00:00:00Z",
    "updated_at": "2026-05-19T07:00:00Z"
  }
]
```
Use case: Rendering a connections dashboard with live health indicators.
GET /connections/{id}/token (enhanced)
The existing token endpoint now includes health_status in its response alongside credentials and strategy.
```json
{
  "strategy": { "type": "oauth2" },
  "credentials": { "access_token": "..." },
  "health_status": "healthy"
}
```
Use case: Showing an inline warning or re-auth prompt when consuming a credential.
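As an illustration of that use case, a consumer could map `health_status` to a UI hint when it reads the token response. The mapping below is an assumption, not something the broker prescribes: `expired` prompts re-authentication, `unhealthy` and `degraded` show a warning, and everything else shows nothing. The names `tokenResponse`, `uiHint`, and `hintFromBody` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenResponse mirrors the sample response; strategy and credentials are
// kept as raw JSON because their shape depends on the auth type.
type tokenResponse struct {
	Strategy     json.RawMessage `json:"strategy"`
	Credentials  json.RawMessage `json:"credentials"`
	HealthStatus string          `json:"health_status"`
}

// uiHint returns "reauth", "warning", or "none" for a health_status value.
func uiHint(healthStatus string) string {
	switch healthStatus {
	case "expired":
		return "reauth" // credential confirmed invalid
	case "unhealthy", "degraded":
		return "warning" // may recover without user action
	default:
		return "none" // healthy, unknown, or an unrecognised value
	}
}

// hintFromBody decodes a token response body and returns the UI hint.
func hintFromBody(body []byte) (string, error) {
	var tr tokenResponse
	if err := json.Unmarshal(body, &tr); err != nil {
		return "", err
	}
	return uiHint(tr.HealthStatus), nil
}

func main() {
	body := []byte(`{"strategy":{"type":"oauth2"},` +
		`"credentials":{"access_token":"..."},"health_status":"expired"}`)
	hint, err := hintFromBody(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(hint) // reauth
}
```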
Worker Mode
Health workers run inside the standard broker process. For deployments that need to separate HTTP serving from background polling, pass --worker-only to the binary.
In this mode, the HTTP server does not start. The process listens for SIGINT/SIGTERM and cancels the worker context, signalling in-flight checks to stop. Note: the current implementation does not explicitly wait for worker goroutines to complete before exiting.
The same Docker image and environment variables are used; only the container command is overridden.
Database Migrations
Health check columns are added automatically by the incremental migration scripts. Running the migrator applies all pending migrations in order (13_add_provider_health.sql, 14_add_connection_health.sql, 15_add_connection_health_index.sql, etc.). There is no need to run individual scripts, because the migrator tracks which have already been applied.