feat: add Logto healthcheck to /health endpoint #2

Merged

maximus merged 1 commit from issue-1-logto-healthcheck into main

2026-04-22 01:56:23 +00:00

maximus commented

2026-04-22 01:38:37 +00:00

Owner

Fixes #1.

Changes

New logto: {status, responseTimeMs, error?} field in /health response
URL configurable via LOGTO_HEALTH_URL env (default: https://auth.lacompagniemaximus.com/oidc/.well-known/openid-configuration)
3s timeout via AbortController; /health stays HTTP 200 even if Logto is down
getCpuPercent converted to async (setTimeout-based delay) so the 500ms CPU sample and the Logto fetch run concurrently via Promise.all; total latency is bounded by the slower of the two, max(500ms, <=3000ms), instead of their sum
Commit project CLAUDE.md (previously untracked) with the new field documented
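
For readers without the diff at hand, the timeout and fail-safe behaviour listed above could look roughly like the sketch below. It is not the merged code: the `fetchImpl` parameter is a hypothetical injection point added for testability, and the function signature and internal structure are assumptions. It relies on the global `fetch` and `AbortController` available in Node 18+.

```javascript
// Sketch only: signature and structure are assumptions, not the merged index.js.
const LOGTO_TIMEOUT_MS = 3000;

// Always resolves to { status, responseTimeMs, error? } and never throws,
// so the caller can keep answering HTTP 200 when Logto is unreachable.
async function getLogtoHealth(url, timeoutMs = LOGTO_TIMEOUT_MS, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    const responseTimeMs = Date.now() - startedAt;
    if (!res.ok) {
      // Non-2xx answers count as "down" and report the status code.
      return { status: "down", responseTimeMs, error: `HTTP ${res.status}` };
    }
    return { status: "up", responseTimeMs };
  } catch (err) {
    const responseTimeMs = Date.now() - startedAt;
    // fetch rejects with an error named "AbortError" when the signal fires.
    const error = err?.name === "AbortError" ? "timeout" : String(err?.message ?? err);
    return { status: "down", responseTimeMs, error };
  } finally {
    // Clear the timer on every path so no timeout outlives the request.
    clearTimeout(timer);
  }
}
```

Calling `getLogtoHealth(process.env.LOGTO_HEALTH_URL)` against an endpoint that answers 404 would resolve to an object with `status: "down"` and `error: "HTTP 404"`, which matches the error strings in the smoke-test table.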

Smoke tests (local)

| Scenario | logto.status | responseTimeMs | error | HTTP | Total latency |
|---|---|---|---|---|---|
| Real Logto (happy) | up | 255 | - | 200 | 566 ms |
| DNS fail (.invalid) | down | 47 | fetch failed | 200 | 568 ms |
| HTTP 404 | down | 206 | HTTP 404 | 200 | 538 ms |
| Hanging endpoint (httpbin delay/10) | down | 3003 | timeout | 200 | 3061 ms |

The last row confirms that the AbortController timeout bounds the check at about 3 s and that /health still returns 200 when Logto is down.

Acceptance criteria

/health response includes logto: {status, responseTimeMs, error?}
Timeout bounded at 3s (the Logto check never blocks longer than that)
/health stays HTTP 200 when Logto is down
Total latency ~= max(500ms CPU, 3000ms Logto), not the sum (parallelized via Promise.all)


maximus added 1 commit 2026-04-22 01:38:37 +00:00

feat: add Logto healthcheck to /health endpoint 28dd759f98

Fixes #1.

- New `logto: {status, responseTimeMs, error?}` field in /health response
- Configurable via LOGTO_HEALTH_URL env (default: auth.lacompagniemaximus.com
  OIDC discovery endpoint)
- 3s timeout via AbortController; /health stays HTTP 200 even if Logto is down
- getCpuPercent converted to async (setTimeout-based delay) so the 500ms CPU
  sample and the Logto fetch run concurrently via Promise.all; total latency
  stays max(500ms, <=3000ms) instead of the sum
- Commit project CLAUDE.md (previously untracked) with the new field documented

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maximus added the

status:review

type:feature

labels 2026-04-22 01:38:49 +00:00

maximus commented

2026-04-22 01:41:15 +00:00

Author

Owner

Verdict: APPROVE

Summary

Clean implementation of the Logto healthcheck with correct timeout bounding, proper parallelization, and fail-safe error handling. The smoke test matrix covers the four key scenarios (happy path, DNS failure, HTTP error, timeout) and confirms that /health stays HTTP 200 when Logto is down.

Checklist walkthrough

Security — OK. No secrets in the diff, .env.example updated (not .env), URL goes through fetch (no shell/SQL). Auth gate still runs before getHealth().

Correctness — OK.

getCpuPercent rewrite from execSync("sleep 0.5") to await delay(500) removes blocking and enables parallelism.
Promise.all([getCpuPercent(), getLogtoHealth()]) gives max(500ms, <=3000ms) as claimed.
AbortController + clearTimeout in finally — no leaked timer.
getLogtoHealth always resolves (never throws), so /health stays 200 when Logto is down. Confirmed by row 4 of the smoke matrix (httpbin delay/10).
Error branches cover: network failure, HTTP non-2xx, AbortError → "timeout".
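
The parallelization point can be illustrated with a small sketch. It is an approximation under stated assumptions, not the code under review: `readCpuTimes` is a hypothetical callback that returns cumulative `{ idle, total }` CPU ticks, `checkLogto` stands in for `getLogtoHealth`, and the shape returned by `getHealth` here is simplified.

```javascript
// Sketch only: readCpuTimes and checkLogto are hypothetical injected helpers.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Samples CPU usage over sampleMs without blocking the event loop.
async function getCpuPercent(readCpuTimes, sampleMs = 500) {
  const before = readCpuTimes();
  await delay(sampleMs); // yields to the event loop, unlike execSync("sleep 0.5")
  const after = readCpuTimes();
  const total = after.total - before.total;
  const idle = after.idle - before.idle;
  return total > 0 ? Math.round((1 - idle / total) * 100) : 0;
}

// Both tasks are started before either is awaited, so the wall-clock cost
// is that of the slower task rather than the sum of the two.
async function getHealth(readCpuTimes, checkLogto, sampleMs = 500) {
  const [cpuPercent, logto] = await Promise.all([
    getCpuPercent(readCpuTimes, sampleMs),
    checkLogto(),
  ]);
  return { cpuPercent, logto };
}
```

With the blocking `execSync` version, the Logto request could not start until the 500ms sleep had finished. Here the first CPU reading is taken, the timer is scheduled, and the Logto check starts immediately afterwards.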

Tests — No automated test suite exists for this ~127-line service, consistent with project conventions. The PR documents a 4-scenario manual smoke matrix including the timeout boundary. Acceptable.

Quality — Minimal, idiomatic, with a good inline comment explaining the async CPU sampler. CLAUDE.md now tracked and updated.

Data — N/A (no DB, no migrations).

Suggestions (non-blocking)

index.js:12: LOGTO_TIMEOUT_MS is hardcoded. If you ever want to tune it in prod without a rebuild, expose it via env. Not needed right now.
index.js:155: the 500 branch returns err.message (pre-existing pattern). Fine while the endpoint is auth-gated, but worth keeping in mind if the service is ever fronted by a public probe, because a readFileSync error could leak a filesystem path.
index.js:47-63: optional. Log a server-side line whenever Logto is reported down, so post-mortems don't depend on someone having captured the JSON response at the moment of the outage.
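
Should the first and third suggestions ever be picked up, one possible shape is sketched below. Everything in it is hypothetical: the env variable name `LOGTO_TIMEOUT_MS` is borrowed from the constant named in the review, and `readTimeoutMs`, `logIfDown`, and the log format are illustrative names that do not exist in the service.

```javascript
// Sketch only: helper names and log format are assumptions, not project code.

// Reads a positive integer timeout from an env-like object, falling back to
// the current hardcoded 3000 ms when the value is missing or invalid.
function readTimeoutMs(env, fallbackMs = 3000) {
  const parsed = Number.parseInt(env.LOGTO_TIMEOUT_MS ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

// Emits one server-side line per failed check and passes the result through,
// so it can wrap the health check without changing the response body.
function logIfDown(result, log = console.warn) {
  if (result.status === "down") {
    log(`[health] logto down: ${result.error} (${result.responseTimeMs} ms)`);
  }
  return result;
}
```

In the service this might be wired as `readTimeoutMs(process.env)` at startup and `logIfDown(await getLogtoHealth(...))` inside the handler. Falling back silently on an invalid value keeps /health working with a misconfigured environment, at the cost of hiding the typo.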


maximus merged commit fc3c3a9268 into main

2026-04-22 01:56:23 +00:00

maximus referenced this pull request from a commit

2026-04-22 01:56:24 +00:00

Merge pull request 'feat: add Logto healthcheck to /health endpoint' (#2) from issue-1-logto-healthcheck into main
