feat: add Logto healthcheck to /health endpoint #2

maximus merged 1 commit from issue-1-logto-healthcheck into main 2026-04-22 01:56:23 +00:00

Fixes #1.

Changes

  • New logto: {status, responseTimeMs, error?} field in /health response
  • URL configurable via LOGTO_HEALTH_URL env (default: https://auth.lacompagniemaximus.com/oidc/.well-known/openid-configuration)
  • 3s timeout via AbortController; /health stays HTTP 200 even if Logto is down
  • getCpuPercent converted to async (setTimeout-based delay) so the 500ms CPU sample and the Logto fetch run concurrently via Promise.all; total latency stays max(500ms, <=3000ms) instead of the sum
  • Commit project CLAUDE.md (previously untracked) with the new field documented
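
A minimal sketch of the probe described above follows. It assumes Node 18+, which provides a global `fetch` and `AbortController`. The function name `getLogtoHealth`, the `LOGTO_HEALTH_URL` variable, the default URL and the 3s bound come from this PR. The `fetchImpl` parameter and the exact error strings are illustrative assumptions, chosen to match the error column of the smoke tests rather than copied from `index.js`.

```javascript
// Sketch only: assumes Node 18+ (global fetch + AbortController).
// `fetchImpl` is a hypothetical injection point for testing.
const DEFAULT_LOGTO_URL =
  "https://auth.lacompagniemaximus.com/oidc/.well-known/openid-configuration";
const LOGTO_HEALTH_URL = process.env.LOGTO_HEALTH_URL || DEFAULT_LOGTO_URL;
const LOGTO_TIMEOUT_MS = 3000;

// Always resolves to { status, responseTimeMs, error? } and never rejects,
// so a Logto outage cannot turn /health into a 500.
async function getLogtoHealth(
  url = LOGTO_HEALTH_URL,
  timeoutMs = LOGTO_TIMEOUT_MS,
  fetchImpl = fetch
) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const start = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    const responseTimeMs = Date.now() - start;
    if (!res.ok) {
      // Reachable but unhealthy, e.g. a 404 on the discovery document.
      return { status: "down", responseTimeMs, error: `HTTP ${res.status}` };
    }
    return { status: "up", responseTimeMs };
  } catch (err) {
    const responseTimeMs = Date.now() - start;
    // An abort surfaces as an AbortError; other failures (DNS, refused
    // connection, TLS) keep their message, e.g. Node's "fetch failed".
    const error =
      err?.name === "AbortError" ? "timeout" : (err?.message ?? String(err));
    return { status: "down", responseTimeMs, error };
  } finally {
    clearTimeout(timer); // cleared on every path, so no leaked timer
  }
}
```

Injecting `fetchImpl` lets each failure branch be exercised without a real hanging endpoint. In production, the default global `fetch` is used.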

Smoke tests (local)

| Scenario | logto.status | responseTimeMs | error | HTTP | Total latency |
|---|---|---|---|---|---|
| Real Logto (happy) | up | 255 | - | 200 | 566 ms |
| DNS fail (.invalid) | down | 47 | fetch failed | 200 | 568 ms |
| HTTP 404 | down | 206 | HTTP 404 | 200 | 538 ms |
| Hanging endpoint (httpbin delay/10) | down | 3003 | timeout | 200 | 3061 ms |

The last row confirms the AbortController timeout bound and that /health still returns 200 when Logto is down.

Acceptance criteria

  • /health response includes logto: {status, responseTimeMs, error?}
  • Timeout bounded at 3s (no blocking beyond)
  • /health stays HTTP 200 when Logto is down
  • Total latency ~= max(500ms CPU, 3000ms Logto), not the sum (parallelized via Promise.all)
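
The max-not-sum behaviour depends on the CPU sampler yielding the event loop instead of blocking it. Below is one way to write such a sampler. It assumes CPU usage is derived from `os.cpus()` time deltas. The PR only states that the 500 ms wait became `setTimeout`-based, so the counter source here is an assumption.

```javascript
const os = require("node:os");
const { setTimeout: delay } = require("node:timers/promises");

// Sum idle and total CPU time across all cores (assumed counter source;
// the real service may read /proc/stat instead).
function cpuTimes() {
  let idle = 0;
  let total = 0;
  for (const { times } of os.cpus()) {
    idle += times.idle;
    total += times.user + times.nice + times.sys + times.idle + times.irq;
  }
  return { idle, total };
}

// Non-blocking: awaiting the delay yields the event loop (unlike
// execSync("sleep 0.5")), so a concurrent fetch keeps making progress.
async function getCpuPercent(sampleMs = 500) {
  const a = cpuTimes();
  await delay(sampleMs);
  const b = cpuTimes();
  const totalDelta = b.total - a.total;
  if (totalDelta <= 0) return 0; // e.g. os.cpus() empty in some sandboxes
  const busy = 1 - (b.idle - a.idle) / totalDelta;
  // Clamp to [0, 100] and round to one decimal place.
  return Math.min(100, Math.max(0, Math.round(busy * 1000) / 10));
}
```

Because `getCpuPercent()` awaits rather than blocks, `Promise.all([getCpuPercent(), getLogtoHealth()])` settles after roughly the slower of the two. That is the call shape quoted in the review.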
maximus added 1 commit 2026-04-22 01:38:37 +00:00
Fixes #1.

- New `logto: {status, responseTimeMs, error?}` field in /health response
- Configurable via LOGTO_HEALTH_URL env (default: auth.lacompagniemaximus.com
  OIDC discovery endpoint)
- 3s timeout via AbortController; /health stays HTTP 200 even if Logto is down
- getCpuPercent converted to async (setTimeout-based delay) so the 500ms CPU
  sample and the Logto fetch run concurrently via Promise.all; total latency
  stays max(500ms, <=3000ms) instead of the sum
- Commit project CLAUDE.md (previously untracked) with the new field documented

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
maximus added the status:review and type:feature labels 2026-04-22 01:38:49 +00:00

Verdict: APPROVE

Summary

Clean implementation of the Logto healthcheck with correct timeout bounding, proper parallelization, and fail-safe error handling. The smoke test matrix covers the four key scenarios (happy path, DNS fail, HTTP error, timeout) and confirms /health stays HTTP 200 when Logto is down.

Checklist walkthrough

Security — OK. No secrets in the diff, .env.example updated (not .env), URL goes through fetch (no shell/SQL). Auth gate still runs before getHealth().

Correctness — OK.

  • getCpuPercent rewrite from execSync("sleep 0.5") to await delay(500) removes blocking and enables parallelism.
  • Promise.all([getCpuPercent(), getLogtoHealth()]) gives max(500ms, <=3000ms) as claimed.
  • AbortController + clearTimeout in finally — no leaked timer.
  • getLogtoHealth always resolves (never throws), so /health stays 200 when Logto is down. Confirmed by row 4 of the smoke matrix (httpbin delay/10).
  • Error branches cover: network failure, HTTP non-2xx, and AbortError → "timeout".

Tests — No automated test suite exists for this ~127-line service, consistent with project conventions. The PR documents a 4-scenario manual smoke matrix including the timeout boundary. Acceptable.

Quality — Minimal, idiomatic, with a good inline comment explaining the async CPU sampler. CLAUDE.md now tracked and updated.

Data — N/A (no DB, no migrations).

Suggestions (non-blocking)

  1. In index.js:12, LOGTO_TIMEOUT_MS is hardcoded. If you ever want to tune it in prod without a rebuild, expose it via env. Not needed right now.
  2. index.js:155 — the 500 branch returns err.message (pre-existing pattern). Fine while the endpoint is auth-gated, but worth keeping in mind if the service is ever fronted by a public probe — a readFileSync error could leak a filesystem path.
  3. index.js:47-63 — optional: log a server-side line on Logto down so post-mortems don't depend on catching the JSON response at the moment it happened.
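
The three suggestions could look roughly like the following. All helper names are hypothetical and do not exist in the PR. Reading `LOGTO_TIMEOUT_MS` from the environment is the suggested change, with the current 3000 ms kept as the fallback.

```javascript
// Suggestion 1 (hypothetical): env-tunable timeout with a safe fallback.
function readTimeoutMs(env = process.env, fallback = 3000) {
  const raw = env.LOGTO_TIMEOUT_MS;
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  // Reject non-numeric or non-positive values instead of handing NaN to
  // setTimeout, which Node would clamp to 1 ms.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Suggestion 2 (hypothetical): log the detail server-side and return a
// generic body, so a readFileSync error cannot leak a filesystem path.
function internalErrorBody(err, log = console.error) {
  log("[health] getHealth failed:", err);
  return { error: "internal error" };
}

// Suggestion 3 (hypothetical): emit a server-side line when Logto is down,
// passing the result through unchanged for the JSON response.
function logIfDown(logto, log = console.warn) {
  if (logto.status === "down") {
    log(`[health] logto down after ${logto.responseTimeMs}ms: ${logto.error}`);
  }
  return logto;
}
```

In this sketch, the probe's timeout would come from `readTimeoutMs()`. Each Logto result would pass through `logIfDown()` before the response is built. The 500 branch would send `internalErrorBody(err)` instead of `err.message`.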
maximus merged commit fc3c3a9268 into main 2026-04-22 01:56:23 +00:00
Reference: maximus/vps-health-api#2