feat: add observability stack and background task infrastructure

Add OpenTelemetry instrumentation with distributed tracing and metrics:
- Structured JSON logging with trace context correlation
- Auto-instrumentation for FastAPI, asyncpg, httpx, redis
- OTLP exporter for traces and Prometheus metrics endpoint

Implement Celery worker and notification task system:
- Celery app with Redis/SQS broker support and configurable queues
- Notification tasks for incident fan-out, webhooks, and escalations
- Pluggable TaskQueue abstraction with in-memory driver for testing

Add Grafana observability stack (Loki, Tempo, Prometheus, Grafana):
- OpenTelemetry Collector for receiving OTLP traces and logs
- Tempo for distributed tracing backend
- Loki for log aggregation with Promtail DaemonSet
- Prometheus for metrics scraping with RBAC configuration
- Grafana with pre-provisioned datasources and API overview dashboard
- Helm templates for all observability components

Enhance application infrastructure:
- Global exception handlers with structured ErrorResponse schema
- Request logging middleware with timing metrics
- Health check updated to verify task queue connectivity
- Non-root user in Dockerfile for security
- Init containers in Helm deployments for dependency ordering
- Production Helm values with autoscaling and retention policies
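
Note on the "Pluggable TaskQueue abstraction with in-memory driver for testing" item: the driver code is not part of the hunk shown on this page, so the following is only a minimal sketch of what such an abstraction could look like. The method names `enqueue` and `ping`, the class `InMemoryTaskQueue`, and the helper `notify_incident` are hypothetical and chosen for illustration; the commit's actual interface may differ.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Any


class TaskQueue(ABC):
    """Hypothetical driver interface; the real one in this commit may differ."""

    @abstractmethod
    def enqueue(self, task_name: str, payload: dict[str, Any]) -> None:
        """Hand a named task and its payload to the backing broker."""

    @abstractmethod
    def ping(self) -> bool:
        """Return True if the backend is reachable (e.g. for a health check)."""


class InMemoryTaskQueue(TaskQueue):
    """Test driver: records tasks in a deque instead of contacting a broker."""

    def __init__(self) -> None:
        self.tasks: deque[tuple[str, dict[str, Any]]] = deque()

    def enqueue(self, task_name: str, payload: dict[str, Any]) -> None:
        self.tasks.append((task_name, payload))

    def ping(self) -> bool:
        # Nothing external to reach, so the in-memory driver is always healthy.
        return True


def notify_incident(queue: TaskQueue, incident_id: str) -> None:
    """Hypothetical caller: depends only on the abstract interface."""
    queue.enqueue("notify_incident", {"incident_id": incident_id})
```

Because callers depend only on the abstract interface, a test can pass in `InMemoryTaskQueue()` and inspect `queue.tasks` afterwards, while production wiring would supply a broker-backed driver.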
@@ -3,6 +3,47 @@
 from pydantic import BaseModel, Field
 
 
+class ErrorDetail(BaseModel):
+    """Individual error detail for validation errors."""
+
+    loc: list[str | int] = Field(description="Location of the error (field path)")
+    msg: str = Field(description="Error message")
+    type: str = Field(description="Error type identifier")
+
+
+class ErrorResponse(BaseModel):
+    """Structured error response returned by all error handlers."""
+
+    error: str = Field(description="Error type (e.g., 'not_found', 'validation_error')")
+    message: str = Field(description="Human-readable error message")
+    details: list[ErrorDetail] | None = Field(
+        default=None, description="Additional error details for validation errors"
+    )
+    request_id: str | None = Field(
+        default=None, description="Request trace ID for debugging"
+    )
+
+    model_config = {
+        "json_schema_extra": {
+            "examples": [
+                {
+                    "error": "not_found",
+                    "message": "Incident not found",
+                    "request_id": "abc123def456",
+                },
+                {
+                    "error": "validation_error",
+                    "message": "Request validation failed",
+                    "details": [
+                        {"loc": ["body", "title"], "msg": "Field required", "type": "missing"}
+                    ],
+                    "request_id": "abc123def456",
+                },
+            ]
+        }
+    }
+
+
 class CursorParams(BaseModel):
     """Pagination parameters using cursor-based pagination."""
 