incidentops/SPECS.md
minhtrannhat 359291eec7 feat: project skeleton
- infra (k8s, kind, helm, docker) backbone is implemented
- security: implementation + unit tests are done
2025-11-21 12:00:00 +00:00

IncidentOps Specification

Multi-tenant incident management API. The org context is embedded in the JWT, so URLs carry no orgId.

Architecture

| Service | Stack | Purpose |
|---------|-------|---------|
| api | FastAPI, asyncpg | REST API, JWT auth, RBAC |
| worker | Celery, Redis | Notifications, escalations |
| web | Next.js | Dashboard (future) |

Infrastructure: PostgreSQL, Redis, ingress-nginx, Helm/Skaffold

Auth

JWT Access Token Claims

  • sub: user_id (uuid)
  • org_id: active org (uuid)
  • org_role: admin | member | viewer
  • iss: issuer (configurable, default: incidentops)
  • aud: audience (configurable, default: incidentops-api)
  • jti: unique token ID (uuid)
  • iat: issued at (unix timestamp)
  • exp: expiration (unix timestamp)
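
As an illustration of the claim set above, here is a minimal standard-library sketch that assembles the claims and signs them as an HS256 token. The helper names `build_claims` and `encode_hs256` are hypothetical, and the actual service may well use a JWT library instead of hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the compact JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_claims(user_id: str, org_id: str, org_role: str,
                 issuer: str = "incidentops",
                 audience: str = "incidentops-api",
                 ttl_minutes: int = 15) -> dict:
    """Assemble the access-token claims (hypothetical helper)."""
    now = int(time.time())
    return {
        "sub": user_id,
        "org_id": org_id,
        "org_role": org_role,  # admin | member | viewer
        "iss": issuer,
        "aud": audience,
        "jti": str(uuid.uuid4()),
        "iat": now,
        "exp": now + ttl_minutes * 60,
    }


def encode_hs256(claims: dict, secret: str) -> str:
    """Sign the claims as header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

Because org_id and org_role travel inside the token, a handler can scope every query to the caller's active org without reading an org identifier from the URL.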

Refresh Token

  • Opaque token returned in JSON (not cookie)
  • Stored hashed in DB with active_org_id
  • Rotated on refresh and org-switch
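
The refresh-token rules above can be sketched with an in-memory store standing in for the refresh_tokens table. The class name, the choice of SHA-256 as the hash, and the single-use behaviour on rotation are assumptions made for illustration, not confirmed details of the implementation.

```python
import hashlib
import secrets
from typing import Optional


class RefreshTokenStore:
    """In-memory stand-in for the refresh_tokens table (hypothetical)."""

    def __init__(self) -> None:
        # Maps the hash of each token to (user_id, active_org_id).
        # The raw token is never stored.
        self._rows = {}

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id: str, active_org_id: str) -> str:
        """Create an opaque token and persist only its hash."""
        token = secrets.token_urlsafe(32)
        self._rows[self._hash(token)] = (user_id, active_org_id)
        return token

    def is_active(self, token: str) -> bool:
        return self._hash(token) in self._rows

    def rotate(self, token: str, new_org_id: Optional[str] = None) -> str:
        """Invalidate the old token and issue a new one.

        Passing new_org_id models an org switch; omitting it models a refresh.
        """
        row = self._rows.pop(self._hash(token), None)
        if row is None:
            raise ValueError("unknown or already-rotated refresh token")
        user_id, org_id = row
        return self.issue(user_id, new_org_id or org_id)
```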

Endpoints

| Endpoint | Description |
|----------|-------------|
| POST /v1/auth/register | Create user + default org, return tokens |
| POST /v1/auth/login | Authenticate, return tokens |
| POST /v1/auth/refresh | Rotate refresh token, mint new access token |
| POST /v1/auth/switch-org | Change active org, rotate tokens |
| POST /v1/auth/logout | Revoke refresh token |

Authorization

Roles

| Role | Permissions |
|------|-------------|
| viewer | Read-only |
| member | viewer + create incidents, transitions, comments |
| admin | member + manage members, notification targets |

Enforcement

  • Role check via dependency injection
  • Ownership check: resource org_id must match JWT org_id
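
The two checks can be sketched as plain functions; in the API they would presumably sit inside a FastAPI dependency. The names `has_role`, `owns`, and `authorize`, and the numeric ranking of roles, are illustrative assumptions.

```python
# Roles are cumulative: each rank includes the permissions of the ranks below it.
ROLE_RANK = {"viewer": 0, "member": 1, "admin": 2}


def has_role(actual: str, minimum: str) -> bool:
    """True if the caller's role is at least the required one."""
    # Unknown roles rank below viewer and therefore never pass.
    return ROLE_RANK.get(actual, -1) >= ROLE_RANK[minimum]


def owns(claims: dict, resource_org_id: str) -> bool:
    """True if the resource belongs to the org named in the JWT."""
    return claims["org_id"] == resource_org_id


def authorize(claims: dict, minimum_role: str, resource_org_id: str) -> None:
    """Raise PermissionError unless both checks pass."""
    if not has_role(claims["org_role"], minimum_role):
        raise PermissionError(f"requires role {minimum_role} or higher")
    if not owns(claims, resource_org_id):
        raise PermissionError("resource belongs to a different org")
```

Running the ownership check on every resource lookup matters because URLs carry no org identifier: the JWT is the only source of tenant scope.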

API Routes

All under /v1. Auth required unless noted.

Org (implicit from JWT)

  • GET /org — current org summary
  • GET /org/members (admin)
  • GET /org/services
  • POST /org/services (member+)
  • GET /org/notification-targets (admin)
  • POST /org/notification-targets (admin)

Incidents

  • GET /incidents?status=&cursor=&limit=
  • POST /services/{serviceId}/incidents (member+)
  • GET /incidents/{incidentId}
  • GET /incidents/{incidentId}/events
  • POST /incidents/{incidentId}/transition (member+)
  • POST /incidents/{incidentId}/comment (member+)
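
The spec does not define how the cursor parameter on GET /incidents is encoded. One common approach, shown here purely as an assumption, is an opaque base64 cursor over a (created_at, id) sort key. All field and function names below are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(created_at: str, incident_id: str) -> str:
    """Pack the sort key of the last returned row into an opaque string."""
    raw = json.dumps([created_at, incident_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[str, str]:
    created_at, incident_id = json.loads(
        base64.urlsafe_b64decode(cursor.encode("ascii"))
    )
    return created_at, incident_id


def page(rows: List[dict], limit: int,
         cursor: Optional[str] = None) -> Tuple[List[dict], Optional[str]]:
    """Return one page from rows sorted newest-first by (created_at, id)."""
    if cursor:
        key = decode_cursor(cursor)
        # Keep only rows strictly older than the last row already returned.
        rows = [r for r in rows if (r["created_at"], r["id"]) < key]
    items = rows[:limit]
    more = len(rows) > limit
    next_cursor = (
        encode_cursor(items[-1]["created_at"], items[-1]["id"]) if more else None
    )
    return items, next_cursor
```

A client would pass the returned cursor back unchanged on the next request and stop when it comes back empty.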

Health

  • GET /healthz — liveness
  • GET /readyz — readiness (postgres + redis)

Incident State Machine

Triggered → Acknowledged → Mitigated → Resolved

  • Transitions validated at application level
  • Optimistic locking via version column
  • All changes recorded in incident_events
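
A minimal in-memory sketch of the three rules above follows. It assumes only single forward steps are legal; the spec shows the happy path but does not say whether states may be skipped. In the database, the version check would typically be an UPDATE guarded by `WHERE id = ... AND version = ...`. All names are hypothetical.

```python
from dataclasses import dataclass

# Assumed transition table: each state may only advance to the next one.
ALLOWED = {
    "triggered": {"acknowledged"},
    "acknowledged": {"mitigated"},
    "mitigated": {"resolved"},
    "resolved": set(),
}


class InvalidTransition(Exception):
    pass


class VersionConflict(Exception):
    pass


@dataclass
class Incident:
    id: str
    status: str = "triggered"
    version: int = 1


def can_transition(from_status: str, to_status: str) -> bool:
    return to_status in ALLOWED.get(from_status, set())


def transition(incident: Incident, to_status: str,
               expected_version: int, events: list) -> None:
    """Apply a transition if the caller saw the latest version."""
    if incident.version != expected_version:
        # Someone else changed the incident since the caller read it.
        raise VersionConflict(f"expected v{expected_version}, found v{incident.version}")
    if not can_transition(incident.status, to_status):
        raise InvalidTransition(f"{incident.status} -> {to_status}")
    # Record the change in the append-only timeline before mutating state.
    events.append({"incident_id": incident.id,
                   "from": incident.status, "to": to_status})
    incident.status = to_status
    incident.version += 1
```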

Database Schema

| Table | Purpose |
|-------|---------|
| users | User accounts |
| orgs | Organizations |
| org_members | User-org membership + role |
| services | Org-scoped services |
| incidents | Org-scoped incidents with version |
| incident_events | Append-only timeline |
| refresh_tokens | Token rotation + active org |
| notification_targets | Webhook/email/slack configs |
| notification_attempts | Delivery tracking (idempotent) |

Background Jobs (Celery)

| Task | Queue | Purpose |
|------|-------|---------|
| incident_triggered | default | Fan-out to notification targets |
| send_webhook | default | HTTP POST with retry |
| escalate_if_unacked | critical | Delayed escalation (stretch) |
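
The fan-out step can be modelled in plain Python without Celery to show how idempotent delivery tracking might work. Here a set stands in for a uniqueness constraint on notification_attempts, and `enqueue` stands in for dispatching a Celery task. Both are assumptions, as is treating every target as a webhook.

```python
from typing import Callable, Iterable, Set, Tuple


def incident_triggered(incident_id: str,
                       target_ids: Iterable[str],
                       attempts: Set[Tuple[str, str]],
                       enqueue: Callable[..., None]) -> int:
    """Queue one send_webhook job per target, skipping recorded pairs.

    Returns the number of jobs queued by this call.
    """
    queued = 0
    for target_id in target_ids:
        key = (incident_id, target_id)
        if key in attempts:
            # Already recorded, so a redelivered task does not notify twice.
            continue
        attempts.add(key)
        enqueue("send_webhook", incident_id, target_id)
        queued += 1
    return queued
```

Recording the attempt before enqueueing means that running the fan-out again for the same incident queues nothing new.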

Config (Environment)

| Variable | Required | Default |
|----------|----------|---------|
| DATABASE_URL | Yes | (none) |
| REDIS_URL | No | redis://localhost:6379/0 |
| JWT_SECRET_KEY | Yes | (none) |
| JWT_ALGORITHM | No | HS256 |
| JWT_ISSUER | No | incidentops |
| JWT_AUDIENCE | No | incidentops-api |
| ACCESS_TOKEN_EXPIRE_MINUTES | No | 15 |
| REFRESH_TOKEN_EXPIRE_DAYS | No | 30 |
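
The project structure lists pydantic-settings for app/config.py. The standard-library sketch below is only a stand-in showing the same required and default semantics. The `Settings` field names and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "JWT_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    jwt_secret_key: str
    redis_url: str = "redis://localhost:6379/0"
    jwt_algorithm: str = "HS256"
    jwt_issuer: str = "incidentops"
    jwt_audience: str = "incidentops-api"
    access_token_expire_minutes: int = 15
    refresh_token_expire_days: int = 30


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    defaults = Settings(database_url="", jwt_secret_key="")
    return Settings(
        database_url=env["DATABASE_URL"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        redis_url=env.get("REDIS_URL", defaults.redis_url),
        jwt_algorithm=env.get("JWT_ALGORITHM", defaults.jwt_algorithm),
        jwt_issuer=env.get("JWT_ISSUER", defaults.jwt_issuer),
        jwt_audience=env.get("JWT_AUDIENCE", defaults.jwt_audience),
        access_token_expire_minutes=int(
            env.get("ACCESS_TOKEN_EXPIRE_MINUTES", defaults.access_token_expire_minutes)
        ),
        refresh_token_expire_days=int(
            env.get("REFRESH_TOKEN_EXPIRE_DAYS", defaults.refresh_token_expire_days)
        ),
    )
```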

Development

Use uv for all Python operations:

# Install dependencies
uv sync

# Run tests
uv run pytest tests/

# Run the API server
uv run uvicorn app.main:app --reload

# Run migrations
uv run python migrations/migrate.py

Project Structure

incidentops/
├── app/
│   ├── main.py              # FastAPI entry
│   ├── config.py            # pydantic-settings
│   ├── db.py                # asyncpg pool
│   ├── core/                # security, exceptions
│   ├── api/v1/              # route handlers
│   ├── schemas/             # pydantic models
│   ├── repositories/        # data access
│   └── services/            # business logic
├── worker/
│   ├── celery_app.py
│   └── tasks/
├── migrations/
│   └── *.sql + migrate.py
├── helm/
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml