# Architecture - mockupAWS

## 1. Overview

mockupAWS is an AWS cost simulation platform that profiles log traffic and calculates the cost drivers (SQS, Lambda, Bedrock/LLM) before deploying to production.

**Architecture:** Layered architecture with the Repository and Service Layer patterns

**Paradigm:** Async-first (FastAPI + SQLAlchemy async)

**Deployment:** Container-based (Docker Compose)

---

## 2. System Architecture

### 2.1 High-Level Architecture

```
CLIENT LAYER
──────────────────────────────────────────────────────────────────────────────
  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────────┐
  │ Logstash         │  │ React Web UI     │  │ API Consumers            │
  │ (Log Source)     │  │ (Dashboard)      │  │ (CI/CD, Scripts)         │
  └────────┬─────────┘  └────────┬─────────┘  └───────────┬──────────────┘
           │ HTTP POST           │ HTTPS                  │ API Key + JWT
           │ /ingest             │ /api/v1/*              │ /api/v1/*
           ▼                     ▼                        ▼
API LAYER: FastAPI + Uvicorn (ASGI)
──────────────────────────────────────────────────────────────────────────────
  Middleware Stack
  ├── CORS
  ├── Rate Limiting (slowapi)
  ├── Authentication (JWT / API Key)
  ├── Request Validation (Pydantic)
  └── Error Handling

  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐
  │ /scenarios   │  │ /ingest      │  │ /reports     │  │ /pricing         │
  │ CRUD         │  │ (log         │  │ generate     │  │ (admin)          │
  │              │  │  intake)     │  │ download     │  │                  │
  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘
         │                 │                 │                   │
         ▼                 ▼                 ▼                   ▼
SERVICE LAYER
──────────────────────────────────────────────────────────────────────────────
  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────────────┐
  │ ScenarioService  │  │ IngestService    │  │ CostCalculator               │
  │ ───────────────  │  │ ─────────────    │  │ ──────────────               │
  │ • create()       │  │ • ingest_log()   │  │ • calculate_sqs_cost()       │
  │ • update()       │  │ • batch_process()│  │ • calculate_lambda_cost()    │
  │ • delete()       │  │ • deduplicate()  │  │ • calculate_bedrock_cost()   │
  │ • lifecycle()    │  │ • persist()      │  │ • get_total_cost()           │
  └──────────────────┘  └──────────────────┘  └──────────────────────────────┘
  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────────────┐
  │ ReportService    │  │ PIIDetector      │  │ TokenizerService             │
  │ ─────────────    │  │ ───────────      │  │ ────────────────             │
  │ • generate_csv() │  │ • detect_email() │  │ • count_tokens()             │
  │ • generate_pdf() │  │ • scan_patterns()│  │ • encode()                   │
  │ • compile()      │  │ • report()       │  │ • get_encoding()             │
  └──────────────────┘  └──────────────────┘  └──────────────────────────────┘
           │                                              │
           ▼                                              ▼
REPOSITORY LAYER
──────────────────────────────────────────────────────────────────────────────
  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────────────┐
  │ ScenarioRepo     │  │ LogRepo          │  │ PricingRepo                  │
  │ ────────────     │  │ ───────          │  │ ───────────                  │
  │ • get_by_id()    │  │ • save()         │  │ • get_by_service_region()    │
  │ • list()         │  │ • list_by_       │  │ • list_active()              │
  │ • create()       │  │   scenario()     │  │ • update()                   │
  │ • update()       │  │ • count_by_      │  │ • bulk_insert()              │
  │ • delete()       │  │   hash()         │  │                              │
  └──────────────────┘  └──────────────────┘  └──────────────────────────────┘
  ┌──────────────────┐  ┌──────────────────┐
  │ MetricRepo       │  │ ReportRepo       │
  │ ──────────       │  │ ──────────       │
  │ • save()         │  │ • save()         │
  │ • get_aggregated │  │ • list()         │
  │ • list_by_type() │  │ • delete()       │
  └──────────────────┘  └──────────────────┘
           │
           │ SQLAlchemy 2.0 Async
           │ asyncpg driver
           ▼
DATABASE LAYER: PostgreSQL 15+
──────────────────────────────────────────────────────────────────────────────
  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────────────┐
  │ scenarios        │  │ scenario_logs    │  │ aws_pricing                  │
  │ ─────────        │  │ ─────────────    │  │ ───────────                  │
  │ • metadata       │  │ • logs storage   │  │ • service prices             │
  │ • state machine  │  │ • hash for dedup │  │ • history tracking           │
  │ • cost totals    │  │ • PII flags      │  │ • region-specific            │
  └──────────────────┘  └──────────────────┘  └──────────────────────────────┘
  ┌──────────────────┐  ┌──────────────────┐
  │ scenario_metrics │  │ reports          │
  │ ──────────────── │  │ ───────          │
  │ • time-series    │  │ • generated      │
  │ • aggregates     │  │ • metadata       │
  │ • cost breakdown │  │ • file refs      │
  └──────────────────┘  └──────────────────┘
```

### 2.2 Layer Responsibilities

| Layer | Responsibility | Technologies |
|-------|----------------|--------------|
| **Client** | User interaction, log ingestion | Browser, Logstash, curl |
| **API** | Routing, validation, auth, middleware | FastAPI, Pydantic, slowapi |
| **Service** | Business logic, orchestration | Python async/await |
| **Repository** | Data access, query abstraction | SQLAlchemy 2.0 Repository pattern |
| **Database** | Persistence, ACID, queries | PostgreSQL 15+ |

---

## 3. Database Schema

### 3.1 Entity Relationship Diagram

Column names follow the DDL in section 3.2. The diagram abbreviates types and omits a few auxiliary columns.

```
SCHEMA ERD

┌────────────────────────────┐    ┌────────────────────────────┐
│ scenarios                  │    │ aws_pricing                │
├────────────────────────────┤    ├────────────────────────────┤
│ PK id: UUID                │    │ PK id: UUID                │
│    name: VARCHAR(255)      │    │    service: VARCHAR        │
│    description: TEXT       │    │    region: VARCHAR         │
│    tags: JSONB             │    │    tier: VARCHAR           │
│    status: ENUM            │    │    price_per_unit: DECIMAL │
│    region: VARCHAR         │    │    unit: VARCHAR           │
│    created_at: TS          │    │    effective_from: DATE    │
│    updated_at: TS          │    │    effective_to: DATE      │
│    started_at: TS          │    │    is_active: BOOL         │
│    completed_at: TS        │    │    source_url: VARCHAR     │
│    total_requests: INT     │    └────────────────────────────┘
│    total_cost_estimate: DEC│
└─────────────┬──────────────┘
              │
              │ 1:N
              ▼
┌────────────────────────────┐    ┌────────────────────────────┐
│ scenario_logs              │    │ scenario_metrics           │
├────────────────────────────┤    ├────────────────────────────┤
│ PK id: UUID                │    │ PK id: UUID                │
│ FK scenario_id: UUID       │    │ FK scenario_id: UUID       │
│    received_at: TS         │    │    timestamp: TS           │
│    message_hash: VARCHAR   │    │    metric_type: VARCHAR    │
│    message_preview: VARCHAR│    │    metric_name: VARCHAR    │
│    source: VARCHAR         │    │    value: DECIMAL          │
│    size_bytes: INT         │    │    unit: VARCHAR           │
│    has_pii: BOOL           │    │    metadata: JSONB         │
│    token_count: INT        │    └────────────────────────────┘
│    sqs_blocks: INT         │
└─────────────┬──────────────┘
              │
              │ 1:N (optional)
              ▼
┌────────────────────────────┐
│ reports                    │
├────────────────────────────┤
│ PK id: UUID                │
│ FK scenario_id: UUID       │
│    format: ENUM            │
│    file_path: VARCHAR      │
│    generated_at: TS        │
│    metadata: JSONB         │
└────────────────────────────┘
```

### 3.2 DDL - Schema Definition

```sql
-- ============================================
-- EXTENSIONS
-- ============================================
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";  -- For text search

-- ============================================
-- ENUMS
-- ============================================
CREATE TYPE scenario_status AS ENUM ('draft', 'running', 'completed', 'archived');
CREATE TYPE report_format AS ENUM ('pdf', 'csv');

-- ============================================
-- TABLE: scenarios
-- ============================================
CREATE TABLE scenarios (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    tags JSONB DEFAULT '[]'::jsonb,
    status scenario_status NOT NULL DEFAULT 'draft',
    region VARCHAR(50) NOT NULL DEFAULT 'us-east-1',
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMP WITH TIME ZONE,
    started_at TIMESTAMP WITH TIME ZONE,
    total_requests INTEGER NOT NULL DEFAULT 0,
    total_cost_estimate DECIMAL(12, 6) NOT NULL DEFAULT 0.000000,

    -- Constraints
    CONSTRAINT chk_name_not_empty CHECK (char_length(trim(name)) > 0),
    CONSTRAINT chk_region_not_empty CHECK (char_length(trim(region)) > 0)
);

-- Indexes
CREATE INDEX idx_scenarios_status ON scenarios(status);
CREATE INDEX idx_scenarios_region ON scenarios(region);
CREATE INDEX idx_scenarios_created_at ON scenarios(created_at DESC);
CREATE INDEX idx_scenarios_tags ON scenarios USING GIN(tags);

-- Trigger for updated_at
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_scenarios_updated_at
    BEFORE UPDATE ON scenarios
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- ============================================
-- TABLE: scenario_logs
-- ============================================
CREATE TABLE scenario_logs (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    received_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    message_hash VARCHAR(64) NOT NULL,  -- SHA256
    message_preview VARCHAR(500),
    source VARCHAR(100) DEFAULT 'unknown',
    size_bytes INTEGER NOT NULL DEFAULT 0,
    has_pii BOOLEAN NOT NULL DEFAULT FALSE,
    token_count INTEGER NOT NULL DEFAULT 0,
    sqs_blocks INTEGER NOT NULL DEFAULT 1,

    -- Constraints
    CONSTRAINT chk_size_positive CHECK (size_bytes >= 0),
    CONSTRAINT chk_token_positive CHECK (token_count >= 0),
    CONSTRAINT chk_blocks_positive CHECK (sqs_blocks >= 1)
);

-- Indexes
CREATE INDEX idx_logs_scenario_id ON scenario_logs(scenario_id);
CREATE INDEX idx_logs_received_at ON scenario_logs(received_at DESC);
CREATE INDEX idx_logs_message_hash ON scenario_logs(message_hash);
CREATE INDEX idx_logs_has_pii ON scenario_logs(has_pii) WHERE has_pii = TRUE;

-- ============================================
-- TABLE: scenario_metrics
-- ============================================
CREATE TABLE scenario_metrics (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    metric_type VARCHAR(50) NOT NULL,  -- 'sqs', 'lambda', 'bedrock', 'safety'
    metric_name VARCHAR(100) NOT NULL,
    value DECIMAL(15, 6) NOT NULL DEFAULT 0.000000,
    unit VARCHAR(20) NOT NULL,  -- 'count', 'bytes', 'tokens', 'usd', 'invocations'
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Indexes
CREATE INDEX idx_metrics_scenario_id ON scenario_metrics(scenario_id);
CREATE INDEX idx_metrics_timestamp ON scenario_metrics(timestamp DESC);
CREATE INDEX idx_metrics_type ON scenario_metrics(metric_type);
CREATE INDEX idx_metrics_scenario_type ON scenario_metrics(scenario_id, metric_type);

-- ============================================
-- TABLE: aws_pricing
-- ============================================
CREATE TABLE aws_pricing (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    service VARCHAR(50) NOT NULL,  -- 'sqs', 'lambda', 'bedrock'
    region VARCHAR(50) NOT NULL,
    tier VARCHAR(50) NOT NULL DEFAULT 'standard',
    price_per_unit DECIMAL(15, 10) NOT NULL,
    unit VARCHAR(20) NOT NULL,  -- 'per_million_requests', 'per_gb_second', 'per_1k_tokens'
    effective_from DATE NOT NULL DEFAULT CURRENT_DATE,
    effective_to DATE,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    source_url VARCHAR(500),
    description TEXT,

    -- Constraints
    CONSTRAINT chk_price_positive CHECK (price_per_unit >= 0),
    CONSTRAINT chk_valid_dates CHECK (effective_to IS NULL OR effective_to >= effective_from)
);

-- Uniqueness among active rows only. A table-level UNIQUE constraint cannot
-- carry a WHERE clause in PostgreSQL, so this is a partial unique index.
CREATE UNIQUE INDEX uq_pricing_unique_active
    ON aws_pricing(service, region, tier, effective_from)
    WHERE is_active = TRUE;

-- Indexes
CREATE INDEX idx_pricing_service ON aws_pricing(service);
CREATE INDEX idx_pricing_region ON aws_pricing(region);
CREATE INDEX idx_pricing_active ON aws_pricing(service, region, tier) WHERE is_active = TRUE;

-- ============================================
-- TABLE: reports
-- ============================================
CREATE TABLE reports (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    format report_format NOT NULL,
    file_path VARCHAR(500) NOT NULL,
    file_size_bytes INTEGER,
    generated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    generated_by VARCHAR(100),  -- user_id or api_key_id
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Indexes
CREATE INDEX idx_reports_scenario_id ON reports(scenario_id);
CREATE INDEX idx_reports_generated_at ON reports(generated_at DESC);
```

### 3.3 Key Queries

```sql
-- Query: Get scenario with aggregated metrics
SELECT
    s.*,
    COUNT(DISTINCT sl.id) as total_logs,
    COUNT(DISTINCT CASE WHEN sl.has_pii THEN sl.id END) as pii_violations,
    SUM(sl.token_count) as total_tokens,
    SUM(sl.sqs_blocks) as total_sqs_blocks
FROM scenarios s
LEFT JOIN scenario_logs sl ON s.id = sl.scenario_id
WHERE s.id = :scenario_id
GROUP BY s.id;

-- Query: Get cost breakdown by service
SELECT
    metric_type,
    SUM(value) as total_value,
    unit
FROM scenario_metrics
WHERE scenario_id = :scenario_id
  AND metric_name LIKE '%cost%'
GROUP BY metric_type, unit;

-- Query: Get active pricing for service/region
SELECT *
FROM aws_pricing
WHERE service = :service
  AND region = :region
  AND is_active = TRUE
  AND (effective_to IS NULL OR effective_to >= CURRENT_DATE)
ORDER BY effective_from DESC
LIMIT 1;
```

---

## 4. API Specifications

### 4.1 OpenAPI Overview

```yaml
openapi: 3.0.0
info:
  title: mockupAWS API
  version: 0.2.0
  description: AWS Cost Simulation Platform API

servers:
  - url: http://localhost:8000/api/v1
    description: Development server

security:
  - BearerAuth: []
  - ApiKeyAuth: []
```

### 4.2 Endpoints

#### Scenarios API

```yaml
# POST /scenarios - Create new scenario
request:
  content:
    application/json:
      schema:
        type: object
        required: [name, region]
        properties:
          name:
            type: string
            minLength: 1
            maxLength: 255
          description:
            type: string
          tags:
            type: array
            items:
              type: string
          region:
            type: string
            enum: [us-east-1, us-west-2, eu-west-1, eu-central-1]
          tier:
            type: string
            enum: [standard, on-demand]
            default: standard
response:
  201:
    content:
      application/json:
        schema:
          $ref: '#/components/schemas/Scenario'

# GET /scenarios - List scenarios
parameters:
  - name: status
    in: query
    schema:
      type: string
      enum: [draft, running, completed, archived]
  - name: region
    in: query
    schema:
      type: string
  - name: page
    in: query
    schema:
      type: integer
      default: 1
  - name: page_size
    in: query
    schema:
      type: integer
      default: 20
      maximum: 100
response:
  200:
    content:
      application/json:
        schema:
          type: object
          properties:
            items:
              type: array
              items:
                $ref: '#/components/schemas/Scenario'
            total:
              type: integer
            page:
              type: integer
            page_size:
              type: integer

# GET /scenarios/{id} - Get scenario details
# PUT /scenarios/{id} - Update scenario
# DELETE /scenarios/{id} - Delete scenario
# POST /scenarios/{id}/start - Start scenario
# POST /scenarios/{id}/stop - Stop scenario
# POST /scenarios/{id}/archive - Archive scenario
```

#### Ingest API

```yaml
# POST /ingest - Ingest log
headers:
  X-Scenario-ID:
    required: true
    schema:
      type: string
      format: uuid
request:
  content:
    application/json:
      schema:
        type: object
        required: [message]
        properties:
          message:
            type: string
            minLength: 1
          source:
            type: string
            default: unknown
response:
  202:
    description: Log accepted
    content:
      application/json:
        schema:
          type: object
          properties:
            status:
              type: string
              example: accepted
            log_id:
              type: string
              format: uuid
            estimated_cost_impact:
              type: number
  400:
    description: Invalid scenario or scenario not running
```

#### Metrics API

```yaml
# GET /scenarios/{id}/metrics - Get scenario metrics
response:
  200:
    content:
      application/json:
        schema:
          type: object
          properties:
            scenario_id:
              type: string
            summary:
              type: object
              properties:
                total_requests:
                  type: integer
                total_cost_usd:
                  type: number
                sqs_blocks:
                  type: integer
                lambda_invocations:
                  type: integer
                llm_tokens:
                  type: integer
                pii_violations:
                  type: integer
            cost_breakdown:
              type: array
              items:
                type: object
                properties:
                  service:
                    type: string
                  cost_usd:
                    type: number
                  percentage:
                    type: number
            timeseries:
              type: array
              items:
                type: object
                properties:
                  timestamp:
                    type: string
                    format: date-time
                  metric_type:
                    type: string
                  value:
                    type: number
```

#### Reports API

```yaml
# POST /scenarios/{id}/reports - Generate report
request:
  content:
    application/json:
      schema:
        type: object
        required: [format]
        properties:
          format:
            type: string
            enum: [pdf, csv]
          include_logs:
            type: boolean
            default: false
          date_from:
            type: string
            format: date-time
          date_to:
            type: string
            format: date-time
response:
  202:
    description: Report generation started
    content:
      application/json:
        schema:
          type: object
          properties:
            report_id:
              type: string
            status:
              type: string
              enum: [pending, processing, completed]
            download_url:
              type: string

# GET /reports/{id}/download - Download report
# GET /reports/{id}/status - Check report status
```

#### Pricing API (Admin)

```yaml
# GET /pricing - List pricing
# POST /pricing - Create pricing entry
# PUT /pricing/{id} - Update pricing
# DELETE /pricing/{id} - Delete pricing (soft delete)
```

### 4.3 Schemas

```yaml
components:
  schemas:
    Scenario:
      type: object
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
        description:
          type: string
        tags:
          type: array
          items:
            type: string
        status:
          type: string
          enum: [draft, running, completed, archived]
        region:
          type: string
        created_at:
          type: string
          format: date-time
        updated_at:
          type: string
          format: date-time
        completed_at:
          type: string
          format: date-time
        total_requests:
          type: integer
        total_cost_estimate:
          type: number

    LogEntry:
      type: object
      properties:
        id:
          type: string
          format: uuid
        scenario_id:
          type: string
          format: uuid
        received_at:
          type: string
          format: date-time
        message_hash:
          type: string
        message_preview:
          type: string
        source:
          type: string
        size_bytes:
          type: integer
        has_pii:
          type: boolean
        token_count:
          type: integer
        sqs_blocks:
          type: integer

  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
```

---

## 5. Data Flow

### 5.1 Log Ingestion Flow

```
┌──────────┐      POST /ingest       ┌──────────────┐
│  Client  │ ───────────────────────>│   FastAPI    │
│(Logstash)│  Headers:               │  Middleware  │
│          │  X-Scenario-ID: uuid    │              │
└──────────┘                         └──────┬───────┘
                                            │
                                            │ 1. Validate scenario exists & running
                                            │ 2. Parse JSON payload
                                            ▼
                                     ┌──────────────┐
                                     │    Ingest    │
                                     │   Service    │
                                     └──────┬───────┘
                                            │
                    ┌───────────────────────┼───────────────────────┐
                    │                       │                       │
                    ▼                       ▼                       ▼
             ┌──────────────┐        ┌──────────────┐        ┌──────────────┐
             │ PII Detector │        │SQS Calculator│        │  Tokenizer   │
             │ • check email│        │ • calc blocks│        │ • count      │
             └──────┬───────┘        └──────┬───────┘        └──────┬───────┘
                    │                       │                       │
                    │ has_pii: bool         │ sqs_blocks: int       │ tokens: int
                    └───────────────────────┼───────────────────────┘
                                            │
                                            ▼
                                     ┌──────────────┐
                                     │   LogRepo    │
                                     │    save()    │
                                     └──────┬───────┘
                                            │
                                            ▼
                                     ┌──────────────┐
                                     │  PostgreSQL  │
                                     │ scenario_logs│
                                     └──────────────┘
```

### 5.2 Scenario State Machine

```
             ┌────────────────────────────────────────────────────────┐
             │                                                        │
             ▼                                                        │
        ┌──────────┐     POST /start     ┌──────────┐                 │
┌───────│  DRAFT   │────────────────────>│ RUNNING  │                 │
│       └──────────┘                     └────┬─────┘                 │
│            ▲                                │                       │
│            │                                │ POST /stop            │
│            │ POST /archive                  ▼                       │
│            │                           ┌──────────┐                 │
│       ┌────┴────┐<─────────────────────│COMPLETED │─────────────────┘
│       │ARCHIVED │                      └──────────┘
└──────>└─────────┘
```

### 5.3 Cost Calculation Flow

```
COST CALCULATION PIPELINE

Input: scenario_logs row
├─ sqs_blocks
├─ token_count
└─ (future: lambda_gb_seconds)
         │
         ▼
┌─────────────────┐
│ Pricing Service │
│ • get_active()  │
└────────┬────────┘
         │ Query: SELECT * FROM aws_pricing
         │        WHERE service IN ('sqs', 'lambda', 'bedrock')
         │          AND region = :scenario_region
         │          AND is_active = TRUE
         ▼
COST FORMULAS
──────────────────────────────────────────────────────────────────────────

SQS Cost:
  cost = blocks × price_per_million / 1,000,000
  Example: 100 blocks × $0.40 / 1M = $0.00004

Lambda Cost:
  request_cost = invocations × price_per_million / 1,000,000
  compute_cost = gb_seconds × price_per_gb_second
  total = request_cost + compute_cost
  Example: 1M invoc × $0.20/1M + 10 GB-s × $0.00001667 = $0.20 + $0.00017

Bedrock Cost:
  input_cost = input_tokens × price_per_1k_input / 1,000
  output_cost = output_tokens × price_per_1k_output / 1,000
  total = input_cost + output_cost
  Example: 1000 tokens × $0.003/1K = $0.003

──────────────────────────────────────────────────────────────────────────
         │
         ▼
┌─────────────────────┐
│ Update scenarios    │
│ total_cost_estimate │
└─────────────────────┘
```

---

## 6. Security Architecture

### 6.1 Authentication & Authorization

```
AUTHENTICATION LAYERS
──────────────────────────────────────────────────────────────────

Layer 1: API Key (Programmatic Access)
├─ Header: X-API-Key: <api-key>
├─ Rate limiting: 1000 req/min
└─ Scope: /ingest, /metrics (read-only on other resources)

Layer 2: JWT Token (Web UI Access)
├─ Header: Authorization: Bearer <token>
├─ Expiration: 24h
├─ Refresh token: 7d
└─ Scope: Full access based on roles

Layer 3: Role-Based Access Control (RBAC)
├─ admin: Full access
├─ user: CRUD own scenarios, read pricing
└─ readonly: View only
```

### 6.2 Data Security

| Layer | Measure | Implementation |
|-------|---------|----------------|
| **Transport** | TLS 1.3 | Nginx reverse proxy |
| **Storage** | Hashing | SHA-256 for message_hash |
| **PII** | Detection + Truncation | Email regex, 500 char preview limit |
| **API** | Rate Limiting | slowapi: 100/min public, 1000/min authenticated |
| **DB** | Parameterized Queries | SQLAlchemy ORM (no raw SQL) |
| **Secrets** | Environment Variables | python-dotenv, Docker secrets |

### 6.3 PII Detection Strategy

```python
import re

# Pattern matching for common PII
def detect_pii(message: str) -> dict:
    patterns = {
        # [A-Za-z], not [A-Z|a-z]: inside a character class "|" is a literal pipe
        'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
        'credit_card': r'\b(?:\d[ -]*?){13,16}\b',
        'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    }

    # Count matches per PII type; types without matches are omitted
    results = {}
    for pii_type, pattern in patterns.items():
        matches = re.findall(pattern, message)
        if matches:
            results[pii_type] = len(matches)

    return {
        'has_pii': len(results) > 0,
        'pii_types': list(results.keys()),
        'total_matches': sum(results.values())
    }
```

---

## 7. Technology Stack

### 7.1 Backend

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| Framework | FastAPI | ≥0.110 | Web framework |
| Server | Uvicorn | ≥0.29 | ASGI server |
| Validation | Pydantic | ≥2.7 | Data validation |
| ORM | SQLAlchemy | ≥2.0 | Database ORM |
| Migrations | Alembic | latest | DB migrations |
| Driver | asyncpg | latest | Async PostgreSQL |
| Tokenizer | tiktoken | ≥0.6 | Token counting |
| Rate Limit | slowapi | latest | API rate limiting |
| Auth | python-jose | latest | JWT handling |
| Testing | pytest | ≥8.1 | Test framework |
| HTTP Client | httpx | ≥0.27 | Async HTTP |

### 7.2 Frontend

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| Framework | React | ≥18 | UI library |
| Language | TypeScript | ≥5.0 | Type safety |
| Build | Vite | latest | Build tool |
| Styling | Tailwind CSS | ≥3.4 | CSS framework |
| Components | shadcn/ui | latest | UI components |
| Charts | Recharts | latest | Data viz |
| State | React Query | ≥5.0 | Server state |
| HTTP | Axios | latest | HTTP client |
| Routing | React Router | ≥6.0 | Navigation |

### 7.3 Infrastructure

| Component | Technology | Purpose |
|-----------|------------|---------|
| Container | Docker | Application containers |
| Orchestration | Docker Compose | Multi-container dev |
| Database | PostgreSQL 15+ | Primary data store |
| Reverse Proxy | Nginx | SSL, static files |
| Process Manager | systemd / PM2 | Production process mgmt |

---

## 8. Project Structure

```
mockupAWS/
├── backend/
│   ├── src/
│   │   ├── __init__.py
│   │   ├── main.py  # FastAPI app entry
│   │   ├── config.py  # Settings & env vars
│   │   ├── dependencies.py  # FastAPI dependencies
│   │   ├── models/  # SQLAlchemy models
│   │   │   ├── __init__.py
│   │   │   ├── base.py  # Base model
│   │   │   ├── scenario.py
│   │   │   ├── scenario_log.py
│   │   │   ├── scenario_metric.py
│   │   │   ├── aws_pricing.py
│   │   │   └── report.py
│   │   ├── schemas/  # Pydantic schemas
│   │   │   ├── __init__.py
│   │   │   ├── scenario.py
│   │   │   ├── log.py
│   │   │   ├── metric.py
│   │   │   ├── pricing.py
│   │   │   └── report.py
│   │   ├── api/  # API routes
│   │   │   ├── __init__.py
│   │   │   ├── deps.py  # Dependencies
│   │   │   └── v1/
│   │   │       ├── __init__.py
│   │   │       ├── scenarios.py  # /scenarios/*
│   │   │       ├── ingest.py  # /ingest
│   │   │       ├── metrics.py  # /metrics
│   │   │       ├── reports.py  # /reports
│   │   │       └── pricing.py  # /pricing
│   │   ├── services/  # Business logic
│   │   │   ├── __init__.py
│   │   │   ├── scenario_service.py
│   │   │   ├── ingest_service.py
│   │   │   ├── cost_calculator.py
│   │   │   ├── report_service.py
│   │   │   └── pii_detector.py
│   │   ├── repositories/  # Data access
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── scenario_repo.py
│   │   │   ├── log_repo.py
│   │   │   ├── metric_repo.py
│   │   │   └── pricing_repo.py
│   │   ├── core/  # Core utilities
│   │   │   ├── __init__.py
│   │   │   ├── security.py  # Auth, JWT
│   │   │   ├── database.py  # DB connection
│   │   │   └── exceptions.py  # Custom exceptions
│   │   └── utils/  # Utilities
│   │       ├── __init__.py
│   │       └── hashing.py  # SHA-256 utils
│   ├── alembic/  # Database migrations
│   │   ├── versions/  # Migration files
│   │   ├── env.py
│   │   └── alembic.ini
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py  # pytest fixtures
│   │   ├── unit/
│   │   │   ├── test_services.py
│   │   │   └── test_cost_calculator.py
│   │   ├── integration/
│   │   │   ├── test_api_scenarios.py
│   │   │   ├── test_api_ingest.py
│   │   │   └── test_api_metrics.py
│   │   └── e2e/
│   │       └── test_full_flow.py
│   ├── Dockerfile
│   ├── pyproject.toml
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ui/  # shadcn/ui components
│   │   │   ├── layout/
│   │   │   │   ├── Header.tsx
│   │   │   │   ├── Sidebar.tsx
│   │   │   │   └── Layout.tsx
│   │   │   ├── scenarios/
│   │   │   │   ├── ScenarioList.tsx
│   │   │   │   ├── ScenarioCard.tsx
│   │   │   │   ├── ScenarioForm.tsx
│   │   │   │   └── ScenarioDetail.tsx
│   │   │   ├── metrics/
│   │   │   │   ├── MetricCard.tsx
│   │   │   │   ├── CostChart.tsx
│   │   │   │   └── MetricsDashboard.tsx
│   │   │   └── reports/
│   │   │       ├── ReportGenerator.tsx
│   │   │       └── ReportDownload.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── ScenariosPage.tsx
│   │   │   ├── ScenarioCreate.tsx
│   │   │   ├── ScenarioDetail.tsx
│   │   │   ├── Compare.tsx
│   │   │   ├── Reports.tsx
│   │   │   └── Settings.tsx
│   │   ├── hooks/
│   │   │   ├── useScenarios.ts
│   │   │   ├── useMetrics.ts
│   │   │   └── useReports.ts
│   │   ├── services/
│   │   │   ├── api.ts  # Axios config
│   │   │   ├── scenarioApi.ts
│   │   │   └── metricApi.ts
│   │   ├── types/
│   │   │   ├── scenario.ts
│   │   │   ├── metric.ts
│   │   │   └── api.ts
│   │   ├── context/
│   │   │   └── ThemeContext.tsx
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── public/
│   ├── index.html
│   ├── Dockerfile
│   ├── package.json
│   ├── tsconfig.json
│   ├── tailwind.config.js
│   └── vite.config.ts
│
├── docker-compose.yml
├── nginx.conf
├── .env.example
├── .env
├── .gitignore
└── README.md
```

---

## 9. Architectural Decisions

### DEC-001: Async-First Architecture

**Decision:** Use Python async/await across the whole stack (FastAPI, SQLAlchemy, asyncpg)

**Rationale:**
- High throughput required (>1000 RPS)
- I/O-bound operations (DB, tokenizer)
- Better resource utilization than a synchronous stack

**Alternatives considered:**
- Sync + ThreadPool: simpler but less efficient
- Celery + Redis: too complex for the use case

**Consequences:**
- Learning curve for async
- More complex debugging
- Better scalability

---

### DEC-002: Repository Pattern

**Decision:** Implement the Repository pattern for data access

**Rationale:**
- Separation between business logic and data access
- Easy testing with mock repositories
- Option to change the database in the future

**Structure:**

```python
from typing import Generic, TypeVar
from uuid import UUID

T = TypeVar("T")  # model type handled by a concrete repository


class BaseRepository(Generic[T]):
    async def get(self, id: UUID) -> T | None: ...
    async def list(self, **filters) -> list[T]: ...
    async def create(self, obj: T) -> T: ...
    async def update(self, id: UUID, data: dict) -> T: ...
    async def delete(self, id: UUID) -> bool: ...
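
# ---------------------------------------------------------------------------
# Illustrative sketch only, not part of the original design: a hypothetical
# in-memory repository exposing the same five methods, which could stand in
# for a database-backed repository in unit tests ("mock repositories" above).
# The names InMemoryRepository, _items and _demo are assumptions.
# ---------------------------------------------------------------------------
import asyncio
from typing import Any
from uuid import UUID, uuid4


class InMemoryRepository:
    """Test double that keeps plain dicts in memory, keyed by a generated UUID."""

    def __init__(self) -> None:
        self._items: dict[UUID, dict[str, Any]] = {}

    async def get(self, id: UUID) -> dict[str, Any] | None:
        return self._items.get(id)

    async def list(self, **filters: Any) -> list[dict[str, Any]]:
        # Keep only items whose fields equal every given filter value.
        return [
            item for item in self._items.values()
            if all(item.get(key) == value for key, value in filters.items())
        ]

    async def create(self, obj: dict[str, Any]) -> dict[str, Any]:
        item = {**obj, "id": uuid4()}
        self._items[item["id"]] = item
        return item

    async def update(self, id: UUID, data: dict) -> dict[str, Any]:
        self._items[id].update(data)  # raises KeyError if the id is unknown
        return self._items[id]

    async def delete(self, id: UUID) -> bool:
        return self._items.pop(id, None) is not None


async def _demo() -> tuple[int, bool]:
    """Create two scenarios, start the draft one, then list and delete."""
    repo = InMemoryRepository()
    draft = await repo.create({"name": "baseline", "status": "draft"})
    await repo.create({"name": "peak", "status": "running"})
    await repo.update(draft["id"], {"status": "running"})
    running = await repo.list(status="running")
    return len(running), await repo.delete(draft["id"])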
```

---

### DEC-003: Single Shared Log Table Instead of a Database per Scenario

**Decision:** Use a single `scenario_logs` table with a `scenario_id` FK instead of separate databases

**Rationale:**
- Simpler to manage
- Cross-scenario queries are possible (comparisons)
- Simpler backup/restore

**Alternatives considered:**
- Schema per scenario: too much overhead
- Separate databases: too complex for an MVP

---

### DEC-004: Message Hashing for Deduplication

**Decision:** Use a SHA-256 hash of the message for deduplication

**Rationale:**
- Privacy: full messages are not stored (only the hash and a preview of up to 500 characters)
- Performance: duplicate checks are an indexed lookup on `message_hash` (B-tree, O(log n)) rather than a comparison of full message bodies
- Storage: saves space

**Implementation:**

```python
import hashlib

message_hash = hashlib.sha256(message.encode()).hexdigest()
```

---

### DEC-005: Time-Series Metrics

**Decision:** Store metrics as a time series in `scenario_metrics`

**Rationale:**
- Trend analysis is possible
- Flexible aggregations
- Audit trail

**Trade-off:**
- More storage than aggregated fields
- More complex queries, but indexed

---

## 10. Performance Considerations

### 10.1 Database Optimization

| Optimization | Implementation | Benefit |
|--------------|----------------|---------|
| Indexes | B-tree on foreign keys, timestamps | Fast lookups |
| GIN | tags (JSONB) | Fast array search |
| Partitioning | scenario_logs by date (planned; not in the section 3.2 DDL) | Query pruning |
| Connection Pool | asyncpg pool (20-50 connections) | Concurrency |

### 10.2 Caching Strategy (Future)

```
Layer 1: In-memory (FastAPI state)
├─ Active scenario metadata
└─ AWS pricing (rarely changes)

Layer 2: Redis (future)
├─ Session storage
├─ Rate limiting counters
└─ Report generation status
```

### 10.3 Query Optimization

- Use `selectinload` for relationships
- Batch inserts for logs via PostgreSQL `COPY` (asyncpg `copy_records_to_table`; `copy_expert` is a psycopg2 API and is not available with asyncpg)
- Materialized views for reports
- Async tasks for heavy operations

---

## 11. Error Handling Strategy

### 11.1 Exception Hierarchy

```python
class AppException(Exception):
    """Base application exception"""
    status_code: int = 500
    code: str = "internal_error"

class NotFoundException(AppException):
    status_code = 404
    code = "not_found"

class ValidationException(AppException):
    status_code = 400
    code = "validation_error"

class ConflictException(AppException):
    status_code = 409
    code = "conflict"

class RateLimitException(AppException):
    status_code = 429
    code = "rate_limited"
```

### 11.2 Global Exception Handler

```python
from datetime import datetime, timezone

from fastapi import Request
from fastapi.responses import JSONResponse

# `app` is the FastAPI instance created in main.py
@app.exception_handler(AppException)
async def app_exception_handler(request: Request, exc: AppException):
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": exc.code,
            "message": str(exc),
            # timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
            "timestamp": datetime.now(timezone.utc).isoformat()
        }
    )
```

---

## 12. Deployment Architecture

### 12.1 Docker Compose (Development)

```yaml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: mockupaws
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mockupaws"]

  backend:
    build: ./backend
    environment:
      DATABASE_URL: postgresql+asyncpg://app:${DB_PASSWORD}@postgres:5432/mockupaws
    ports:
      - "8000:8000"
    depends_on:
      postgres:
        condition: service_healthy

  frontend:
    build: ./frontend
    ports:
      - "3000:80"
    depends_on:
      - backend

volumes:
  postgres_data:
```

### 12.2 Production Considerations

- Use managed PostgreSQL (AWS RDS, Azure PostgreSQL)
- Nginx as reverse proxy with SSL
- Environment-specific configuration
- Log aggregation (ELK or similar)
- Monitoring (Prometheus + Grafana)
- Health checks and readiness probes

---

*Document created by @spec-architect*
*Version: 1.0*
*Date: 2026-04-07*
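
The cost formulas in section 5.3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's `CostCalculator`: the function names and signatures are hypothetical, the prices are the example values from section 5.3 rather than current AWS prices, and `Decimal` is used here to avoid floating-point rounding on very small amounts.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
THOUSAND = Decimal(1_000)


def sqs_cost(blocks: int, price_per_million: Decimal) -> Decimal:
    """cost = blocks × price_per_million / 1,000,000"""
    return Decimal(blocks) * price_per_million / MILLION


def lambda_cost(
    invocations: int,
    gb_seconds: Decimal,
    price_per_million: Decimal,
    price_per_gb_second: Decimal,
) -> Decimal:
    """total = request_cost + compute_cost"""
    request_cost = Decimal(invocations) * price_per_million / MILLION
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


def bedrock_cost(
    input_tokens: int,
    output_tokens: int,
    price_per_1k_input: Decimal,
    price_per_1k_output: Decimal,
) -> Decimal:
    """total = input_cost + output_cost, both priced per 1,000 tokens"""
    input_cost = Decimal(input_tokens) * price_per_1k_input / THOUSAND
    output_cost = Decimal(output_tokens) * price_per_1k_output / THOUSAND
    return input_cost + output_cost


if __name__ == "__main__":
    # Example inputs taken from section 5.3 (illustrative prices only).
    print("SQS:    ", sqs_cost(100, Decimal("0.40")))
    print("Lambda: ", lambda_cost(1_000_000, Decimal(10),
                                  Decimal("0.20"), Decimal("0.00001667")))
    # The output-token price is a placeholder; section 5.3 gives a single rate.
    print("Bedrock:", bedrock_cost(1000, 0, Decimal("0.003"), Decimal("0.003")))
```

Taking prices as arguments keeps the functions pure, so they can be unit-tested without a database. In the design above the prices would be read from `aws_pricing` for the scenario's region before the formulas are applied.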