feat: project skeleton

- infra (k8s, kind, helm, docker) backbone is implemented
- security: implementation + unit tests are done
2025-11-21 12:00:00 -05:00
commit fbe9fbba6e
46 changed files with 3450 additions and 0 deletions

SPECS.md

@@ -0,0 +1,163 @@
# IncidentOps Specification
Multi-tenant incident management API. Org context embedded in JWT — no `orgId` in URLs.
## Architecture
| Service | Stack | Purpose |
|---------|-------|---------|
| **api** | FastAPI, asyncpg | REST API, JWT auth, RBAC |
| **worker** | Celery, Redis | Notifications, escalations |
| **web** | Next.js | Dashboard (future) |
**Infrastructure:** PostgreSQL, Redis, ingress-nginx, Helm/Skaffold
## Auth
### JWT Access Token Claims
- `sub`: user_id (uuid)
- `org_id`: active org (uuid)
- `org_role`: `admin | member | viewer`
- `iss`: issuer (configurable, default: `incidentops`)
- `aud`: audience (configurable, default: `incidentops-api`)
- `jti`: unique token ID (uuid)
- `iat`: issued at (unix timestamp)
- `exp`: expiration (unix timestamp)
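
To make the claim set concrete, here is a minimal sketch that builds the claims listed above and signs them with HS256 using only the standard library. The helper names (`build_claims`, `encode_hs256`, `decode_hs256`) are hypothetical and not taken from the codebase; a real service would more likely rely on a maintained JWT library instead of hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


def _b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def build_claims(user_id, org_id, org_role, *, issuer="incidentops",
                 audience="incidentops-api", ttl_minutes=15, now=None):
    """Assemble the access-token claims listed in the spec."""
    issued_at = int(now if now is not None else time.time())
    return {
        "sub": str(user_id),
        "org_id": str(org_id),
        "org_role": org_role,
        "iss": issuer,
        "aud": audience,
        "jti": str(uuid.uuid4()),
        "iat": issued_at,
        "exp": issued_at + ttl_minutes * 60,
    }


def encode_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_hs256(token: str, secret: str, *, issuer, audience, now=None) -> dict:
    """Verify signature, expiry, issuer and audience; return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # This sketch always verifies with HS256 and ignores the header's "alg".
    if not hmac.compare_digest(_b64url_decode(signature_b64), expected):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = int(now if now is not None else time.time())
    if claims["exp"] <= current:
        raise ValueError("token expired")
    if claims["iss"] != issuer or claims["aud"] != audience:
        raise ValueError("unexpected issuer or audience")
    return claims
```

Usage would look like `decode_hs256(encode_hs256(build_claims(uid, oid, "member"), secret), secret, issuer="incidentops", audience="incidentops-api")`, which returns the original claims while the token is unexpired.
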
### Refresh Token
- Opaque token returned in the JSON response body (not a cookie)
- Stored hashed in DB with `active_org_id`
- Rotated on refresh and org-switch
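
The rotation rules above can be sketched with an in-memory stand-in for the `refresh_tokens` table. Everything here is an assumption for illustration: the `RefreshTokenStore` class, the row fields, and the choice of SHA-256 for hashing are not specified by this document, and expiry (`REFRESH_TOKEN_EXPIRE_DAYS`) is left out to keep the sketch short.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def hash_token(raw: str) -> str:
    # Only the hash is stored; the raw token is shown to the client once.
    return hashlib.sha256(raw.encode()).hexdigest()


@dataclass
class RefreshTokenStore:
    """Hypothetical in-memory stand-in for the refresh_tokens table."""
    rows: dict = field(default_factory=dict)  # token_hash -> row

    def issue(self, user_id: str, active_org_id: str) -> str:
        raw = secrets.token_urlsafe(32)  # opaque, carries no claims
        self.rows[hash_token(raw)] = {
            "user_id": user_id,
            "active_org_id": active_org_id,
        }
        return raw

    def rotate(self, raw: str, new_org_id=None) -> str:
        """Consume the old token and issue a new one.

        Passing new_org_id models an org switch; omitting it models a
        plain refresh that keeps the active org.
        """
        row = self.rows.pop(hash_token(raw), None)
        if row is None:
            raise KeyError("unknown or already-used refresh token")
        return self.issue(row["user_id"], new_org_id or row["active_org_id"])
```

Because `rotate` removes the old row before issuing the replacement, presenting the same refresh token twice fails on the second attempt.
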
### Endpoints
| Endpoint | Description |
|----------|-------------|
| `POST /v1/auth/register` | Create user + default org, return tokens |
| `POST /v1/auth/login` | Authenticate, return tokens |
| `POST /v1/auth/refresh` | Rotate refresh token, mint new access token |
| `POST /v1/auth/switch-org` | Change active org, rotate tokens |
| `POST /v1/auth/logout` | Revoke refresh token |
## Authorization
### Roles
| Role | Permissions |
|------|-------------|
| viewer | Read-only |
| member | + create incidents, transitions, comments |
| admin | + manage members, notification targets |
### Enforcement
- Role check via dependency injection
- Ownership check: resource `org_id` must match JWT `org_id`
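
A minimal sketch of the two checks, written as plain functions so it runs without a web framework. The names `require_role` and `require_same_org`, the numeric role ranking, and the decision to raise a not-found error on an org mismatch (so other orgs' resources are not revealed) are assumptions, not behavior confirmed by this spec. In the API these would presumably be wrapped in FastAPI dependencies.

```python
# Roles are cumulative in the spec (viewer < member < admin),
# so a simple rank comparison covers "member+" style requirements.
ROLE_RANK = {"viewer": 0, "member": 1, "admin": 2}


class Forbidden(Exception):
    """Caller's role is too low for the operation."""


class NotFound(Exception):
    """Resource does not exist in the caller's org."""


def require_role(claims: dict, minimum: str) -> None:
    # Unknown or missing roles rank below viewer and are always rejected.
    if ROLE_RANK.get(claims.get("org_role"), -1) < ROLE_RANK[minimum]:
        raise Forbidden(f"requires role {minimum} or higher")


def require_same_org(claims: dict, resource_org_id) -> None:
    # Ownership check: the resource's org_id must match the JWT's org_id.
    if str(resource_org_id) != str(claims.get("org_id")):
        raise NotFound("resource not found")
```

A handler for `POST /incidents/{incidentId}/comment` would, under these assumptions, call `require_role(claims, "member")` and then `require_same_org(claims, incident_org_id)` before writing anything.
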
## API Routes
All under `/v1`. Auth required unless noted.
### Org (implicit from JWT)
- `GET /org` — current org summary
- `GET /org/members` (admin)
- `GET /org/services`
- `POST /org/services` (member+)
- `GET /org/notification-targets` (admin)
- `POST /org/notification-targets` (admin)
### Incidents
- `GET /incidents?status=&cursor=&limit=`
- `POST /services/{serviceId}/incidents` (member+)
- `GET /incidents/{incidentId}`
- `GET /incidents/{incidentId}/events`
- `POST /incidents/{incidentId}/transition` (member+)
- `POST /incidents/{incidentId}/comment` (member+)
### Health
- `GET /healthz` — liveness
- `GET /readyz` — readiness (postgres + redis)
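
One way the readiness probe could aggregate its dependency checks is sketched below. The `readiness` function, the 1-second timeout, and the 200/503 status mapping are assumptions; the check callables stand in for real PostgreSQL and Redis pings.

```python
import asyncio


async def readiness(checks: dict, timeout: float = 1.0) -> tuple:
    """Run all dependency checks concurrently.

    checks maps a dependency name to an async callable that raises on
    failure. Returns (http_status, per-dependency detail).
    """
    async def run(name, check):
        try:
            await asyncio.wait_for(check(), timeout=timeout)
            return name, "ok"
        except Exception as exc:  # any failure marks the dependency unready
            return name, f"error: {type(exc).__name__}"

    results = dict(await asyncio.gather(*(run(n, c) for n, c in checks.items())))
    status = 200 if all(v == "ok" for v in results.values()) else 503
    return status, results
```

Liveness (`/healthz`) would skip these checks entirely, since it only needs to show that the process is responding.
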
## Incident State Machine
```
Triggered → Acknowledged → Mitigated → Resolved
```
- Transitions validated at application level
- Optimistic locking via `version` column: an update applies only if the stored `version` still matches the value the caller read
- All changes recorded in `incident_events`
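
The three rules above can be sketched together in plain Python. This assumes the state machine is strictly linear (no skipping from Triggered straight to Resolved), which the diagram suggests but the spec does not state outright; the class and exception names are hypothetical, and the list passed as `events` stands in for the `incident_events` table.

```python
from dataclasses import dataclass

# Assumed strictly linear: each state has exactly one successor.
ALLOWED = {
    "triggered": {"acknowledged"},
    "acknowledged": {"mitigated"},
    "mitigated": {"resolved"},
    "resolved": set(),
}


class InvalidTransition(Exception):
    pass


class VersionConflict(Exception):
    pass


@dataclass
class Incident:
    id: str
    status: str = "triggered"
    version: int = 1


def transition(incident: Incident, to_status: str, expected_version: int,
               events: list) -> None:
    # Optimistic lock: in SQL this would be roughly
    # UPDATE incidents SET ... WHERE id = $1 AND version = $2,
    # treating zero affected rows as a conflict.
    if incident.version != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{incident.version}")
    if to_status not in ALLOWED.get(incident.status, set()):
        raise InvalidTransition(f"{incident.status} -> {to_status}")
    # Append-only timeline entry, then apply the change and bump the version.
    events.append({"incident_id": incident.id, "from": incident.status, "to": to_status})
    incident.status = to_status
    incident.version += 1
```

A caller that read the incident at version 1 and loses a race sees `VersionConflict` and can re-read before retrying.
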
## Database Schema
| Table | Purpose |
|-------|---------|
| `users` | User accounts |
| `orgs` | Organizations |
| `org_members` | User-org membership + role |
| `services` | Org-scoped services |
| `incidents` | Org-scoped incidents with version |
| `incident_events` | Append-only timeline |
| `refresh_tokens` | Token rotation + active org |
| `notification_targets` | Webhook/email/slack configs |
| `notification_attempts` | Delivery tracking (idempotent) |
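
The "idempotent" note on `notification_attempts` could be realized with a uniqueness constraint, so that a retried fan-out does not record the same delivery twice. The sketch below uses an in-memory SQLite database purely as a runnable stand-in for PostgreSQL (where `INSERT ... ON CONFLICT DO NOTHING` plays the same role); the column names and the `(incident_id, target_id)` key are assumptions, not the actual schema.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns; the unique pair is what makes inserts idempotent.
    conn.execute(
        """
        CREATE TABLE notification_attempts (
            incident_id TEXT NOT NULL,
            target_id   TEXT NOT NULL,
            status      TEXT NOT NULL,
            UNIQUE (incident_id, target_id)
        )
        """
    )
    return conn


def record_attempt(conn: sqlite3.Connection, incident_id: str, target_id: str) -> bool:
    """Return True if this call created the attempt, False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO notification_attempts (incident_id, target_id, status) "
        "VALUES (?, ?, 'pending')",
        (incident_id, target_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

A worker would send the notification only when `record_attempt` returns `True`, which makes duplicate task deliveries harmless.
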
## Background Jobs (Celery)
| Task | Queue | Purpose |
|------|-------|---------|
| `incident_triggered` | default | Fan-out to notification targets |
| `send_webhook` | default | HTTP POST with retry |
| `escalate_if_unacked` | critical | Delayed escalation (stretch) |
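
The queue assignments in the table map naturally onto a dictionary in the shape Celery's `task_routes` setting accepts. The sketch below builds that dictionary and adds an exponential backoff helper for the webhook retries. The `worker.tasks` module prefix is guessed from the project layout, and the backoff parameters are illustrative; the spec does not define a retry policy.

```python
# Task name -> queue, copied from the table above.
TASK_QUEUES = {
    "incident_triggered": "default",
    "send_webhook": "default",
    "escalate_if_unacked": "critical",
}


def build_task_routes(module: str = "worker.tasks") -> dict:
    """Build a mapping usable as Celery's task_routes setting.

    The module prefix is an assumption; real task names depend on
    where the task functions are defined.
    """
    return {f"{module}.{name}": {"queue": queue} for name, queue in TASK_QUEUES.items()}


def retry_delay(attempt: int, base: int = 2, cap: int = 300) -> int:
    """Seconds to wait before retry number `attempt` (0-based), capped."""
    return min(cap, base * 2 ** attempt)
```

Keeping escalations on a separate `critical` queue means a backlog of webhook retries on `default` would not delay them, provided a worker is consuming that queue.
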
## Config (Environment)
| Variable | Required | Default |
|----------|----------|---------|
| `DATABASE_URL` | Yes | — |
| `REDIS_URL` | No | `redis://localhost:6379/0` |
| `JWT_SECRET_KEY` | Yes | — |
| `JWT_ALGORITHM` | No | `HS256` |
| `JWT_ISSUER` | No | `incidentops` |
| `JWT_AUDIENCE` | No | `incidentops-api` |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | No | `15` |
| `REFRESH_TOKEN_EXPIRE_DAYS` | No | `30` |
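
The table translates directly into a settings object. The project structure names pydantic-settings for `app/config.py`; the sketch below uses a standard-library dataclass instead so it runs without extra dependencies, and the `Settings` and `load_settings` names are hypothetical. Fields without a default are the required variables.

```python
import os
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class Settings:
    # Required (no default)
    database_url: str
    jwt_secret_key: str
    # Optional, with the defaults from the table
    redis_url: str = "redis://localhost:6379/0"
    jwt_algorithm: str = "HS256"
    jwt_issuer: str = "incidentops"
    jwt_audience: str = "incidentops-api"
    access_token_expire_minutes: int = 15
    refresh_token_expire_days: int = 30


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    values = {}
    for f in fields(Settings):
        name = f.name.upper()
        raw = env.get(name)
        if raw is None:
            if f.default is MISSING:
                raise RuntimeError(f"missing required environment variable {name}")
            continue  # fall back to the dataclass default
        # Integer-typed settings are recognised by their integer default.
        values[f.name] = int(raw) if isinstance(f.default, int) else raw
    return Settings(**values)
```

Failing at startup when `DATABASE_URL` or `JWT_SECRET_KEY` is absent surfaces misconfiguration before the first request arrives.
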
## Development
Use `uv` for all Python operations:
```bash
# Install dependencies
uv sync
# Run tests
uv run pytest tests/
# Run the API server
uv run uvicorn app.main:app --reload
# Run migrations
uv run python migrations/migrate.py
```
## Project Structure
```
incidentops/
├── app/
│ ├── main.py # FastAPI entry
│ ├── config.py # pydantic-settings
│ ├── db.py # asyncpg pool
│ ├── core/ # security, exceptions
│ ├── api/v1/ # route handlers
│ ├── schemas/ # pydantic models
│ ├── repositories/ # data access
│ └── services/ # business logic
├── worker/
│ ├── celery_app.py
│ └── tasks/
├── migrations/
│ └── *.sql + migrate.py
├── helm/
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
```