Luca Sacchi Ricciardi cd6f8ad166 docs: complete architecture specifications and project planning

Add comprehensive technical specifications for mockupAWS v0.2.0:

- export/architecture.md: Complete system architecture with:
  * Layered architecture diagram (Client → API → Service → Repository → DB)
  * Full database schema with DDL SQL (5 tables, indexes, constraints)
  * API specifications (OpenAPI format) for all endpoints
  * Security architecture (auth, PII detection, rate limiting)
  * Data flow diagrams (ingestion, cost calculation, state machine)
  * Technology stack details (backend, frontend, infrastructure)
  * Project structure for backend and frontend
  * 4 Architecture Decision Records (DEC-001 to DEC-004)

- export/kanban.md: Task breakdown with 32 tasks organized in:
  * Database setup (DB-001 to DB-007)
  * Backend models/schemas (BE-001 to BE-003)
  * Backend repositories (BE-004 to BE-008)
  * Backend services (BE-009 to BE-014)
  * Backend API (BE-015 to BE-020)
  * Testing (QA-001 to QA-003)

- export/progress.md: Project tracking initialized with:
  * Current status: 0% complete, Phase 1 setup
  * Sprint planning and metrics
  * Resource links and team assignments

All specifications follow 'Little Often' principle with tasks < 2 hours.

2026-04-07 13:10:12 +02:00


Architecture - mockupAWS

1. Overview

mockupAWS is an AWS cost simulation platform that profiles log traffic and calculates the cost drivers (SQS, Lambda, Bedrock/LLM) before a production deployment.

Architecture: Layered Architecture with the Repository and Service Layer patterns
Paradigm: Async-first (FastAPI + SQLAlchemy async)
Deployment: Container-based (Docker Compose)

2. System Architecture

2.1 High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                               CLIENT LAYER                                   │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────────┐  │
│  │   Logstash       │  │   React Web UI   │  │   API Consumers          │  │
│  │   (Log Source)   │  │   (Dashboard)    │  │   (CI/CD, Scripts)       │  │
│  └────────┬─────────┘  └────────┬─────────┘  └───────────┬──────────────┘  │
└───────────┼─────────────────────┼────────────────────────┼───────────────────┘
            │                     │                        │
            │ HTTP POST           │ HTTPS                  │ API Key + JWT
            │ /ingest             │ /api/v1/*              │ /api/v1/*
            ▼                     ▼                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                API LAYER                                     │
│                         FastAPI + Uvicorn (ASGI)                             │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │  Middleware Stack                                                    │   │
│  │  ├── CORS                                                            │   │
│  │  ├── Rate Limiting (slowapi)                                         │   │
│  │  ├── Authentication (JWT / API Key)                                  │   │
│  │  ├── Request Validation (Pydantic)                                   │   │
│  │  └── Error Handling                                                  │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐   │
│  │ /scenarios   │ │ /ingest      │ │ /reports     │ │ /pricing         │   │
│  │   CRUD       │ │   (log       │ │   generate   │ │   (admin)        │   │
│  │              │ │    intake)   │ │   download   │ │                  │   │
│  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘   │
└─────────┼────────────────┼────────────────┼──────────────────┼─────────────┘
          │                │                │                  │
          ▼                ▼                ▼                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             SERVICE LAYER                                    │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│  │  ScenarioService │ │  IngestService   │ │  CostCalculator              │ │
│  │  ─────────────── │ │  ──────────────  │ │  ─────────────               │ │
│  │  • create()      │ │  • ingest_log()  │ │  • calculate_sqs_cost()      │ │
│  │  • update()      │ │  • batch_process()│ │  • calculate_lambda_cost()   │ │
│  │  • delete()      │ │  • deduplicate() │ │  • calculate_bedrock_cost()  │ │
│  │  • lifecycle()   │ │  • persist()     │ │  • get_total_cost()          │ │
│  └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│  │  ReportService   │ │  PIIDetector     │ │  TokenizerService            │ │
│  │  ──────────────  │ │  ───────────     │ │  ───────────────             │ │
│  │  • generate_csv()│ │  • detect_email()│ │  • count_tokens()            │ │
│  │  • generate_pdf()│ │  • scan_patterns()│ │  • encode()                  │ │
│  │  • compile()     │ │  • report()      │ │  • get_encoding()            │ │
│  └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
└─────────┬──────────────────────────────────────────────────────┬────────────┘
          │                                                      │
          ▼                                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           REPOSITORY LAYER                                   │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│  │  ScenarioRepo    │ │  LogRepo         │ │  PricingRepo                 │ │
│  │  ─────────────   │ │  ───────         │ │  ──────────                  │ │
│  │  • get_by_id()   │ │  • save()        │ │  • get_by_service_region()   │ │
│  │  • list()        │ │  • list_by_      │ │  • list_active()             │ │
│  │  • create()      │ │    scenario()    │ │  • update()                  │ │
│  │  • update()      │ │  • count_by_     │ │  • bulk_insert()             │ │
│  │  • delete()      │ │    hash()        │ │                              │ │
│  └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│  ┌──────────────────┐ ┌──────────────────┐                                  │
│  │  MetricRepo      │ │  ReportRepo      │                                  │
│  │  ──────────      │ │  ──────────      │                                  │
│  │  • save()        │ │  • save()        │                                  │
│  │  • get_aggregated│ │  • list()        │                                  │
│  │  • list_by_type()│ │  • delete()      │                                  │
│  └──────────────────┘ └──────────────────┘                                  │
└─────────────────────────────────────────────────────────────────────────────┘
          │
          │ SQLAlchemy 2.0 Async
          │ asyncpg driver
          ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           DATABASE LAYER                                     │
│                              PostgreSQL 15+                                  │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│  │  scenarios       │ │  scenario_logs   │ │  aws_pricing                 │ │
│  │  ─────────       │ │  ─────────────   │ │  ───────────                 │ │
│  │  • metadata      │ │  • logs storage  │ │  • service prices            │ │
│  │  • state machine │ │  • hash for dedup│ │  • history tracking          │ │
│  │  • cost totals   │ │  • PII flags     │ │  • region-specific           │ │
│  └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│  ┌──────────────────┐ ┌──────────────────┐                                  │
│  │  scenario_metrics│ │  reports         │                                  │
│  │  ─────────────── │ │  ────────        │                                  │
│  │  • time-series   │ │  • generated     │                                  │
│  │  • aggregates    │ │  • metadata      │                                  │
│  │  • cost breakdown│ │  • file refs     │                                  │
│  └──────────────────┘ └──────────────────┘                                  │
└─────────────────────────────────────────────────────────────────────────────┘

2.2 Layer Responsibilities

Layer	Responsibilities	Technologies
Client	User interaction, log ingestion	Browser, Logstash, curl
API	Routing, validation, auth, middleware	FastAPI, Pydantic, slowapi
Service	Business logic, orchestration	Python async/await
Repository	Data access, query abstraction	SQLAlchemy 2.0 Repository pattern
Database	Persistence, ACID, queries	PostgreSQL 15+
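
The split between the Service and Repository layers can be illustrated with a minimal sketch. It uses an in-memory dictionary instead of PostgreSQL, and the class names `InMemoryScenarioRepo` and `ScenarioService` are assumptions made for illustration, not the project's actual classes.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Scenario:
    name: str
    region: str
    status: str = "draft"
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class InMemoryScenarioRepo:
    """Repository layer: hides storage details behind create/get methods."""

    def __init__(self) -> None:
        self._rows: Dict[uuid.UUID, Scenario] = {}

    async def create(self, scenario: Scenario) -> Scenario:
        self._rows[scenario.id] = scenario
        return scenario

    async def get_by_id(self, scenario_id: uuid.UUID) -> Optional[Scenario]:
        return self._rows.get(scenario_id)


class ScenarioService:
    """Service layer: business rules only, no knowledge of the storage backend."""

    def __init__(self, repo: InMemoryScenarioRepo) -> None:
        self._repo = repo

    async def create(self, name: str, region: str) -> Scenario:
        # Same rule as the chk_name_not_empty constraint, enforced before the DB.
        if not name.strip():
            raise ValueError("name must not be empty")
        return await self._repo.create(Scenario(name=name.strip(), region=region))


async def main() -> Scenario:
    service = ScenarioService(InMemoryScenarioRepo())
    return await service.create("baseline", "us-east-1")


created = asyncio.run(main())
```

Because the service only depends on the repository's methods, the in-memory class could be swapped for a SQLAlchemy-backed one without touching the business logic.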

3. Database Schema

3.1 Entity Relationship Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                              SCHEMA ERD                                  │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────┐         ┌─────────────────────┐
│     scenarios       │         │   aws_pricing       │
├─────────────────────┤         ├─────────────────────┤
│ PK id: UUID         │         │ PK id: UUID         │
│    name: VARCHAR(255)│         │    service: VARCHAR │
│    description: TEXT│         │    region: VARCHAR  │
│    tags: JSONB      │         │    tier: VARCHAR    │
│    status: ENUM     │         │    price: DECIMAL   │
│    region: VARCHAR  │         │    unit: VARCHAR    │
│    created_at: TS   │         │    effective_from: D│
│    updated_at: TS   │         │    effective_to: D  │
│    completed_at: TS │         │    is_active: BOOL  │
│    total_requests: INT│       │    source_url: TEXT │
│    total_cost: DEC  │         └─────────────────────┘
└──────────┬──────────┘
           │
           │ 1:N
           ▼
┌─────────────────────┐         ┌─────────────────────┐
│   scenario_logs     │         │  scenario_metrics   │
├─────────────────────┤         ├─────────────────────┤
│ PK id: UUID         │         │ PK id: UUID         │
│ FK scenario_id: UUID│         │ FK scenario_id: UUID│
│    received_at: TS  │         │    timestamp: TS    │
│    message_hash: V64│         │    metric_type: VAR │
│    message_preview  │         │    metric_name: VAR │
│    source: VARCHAR  │         │    value: DECIMAL   │
│    size_bytes: INT  │         │    unit: VARCHAR    │
│    has_pii: BOOL    │         │    metadata: JSONB  │
│    token_count: INT │         └─────────────────────┘
│    sqs_blocks: INT  │
└─────────────────────┘
           │
           │ 1:N (optional)
           ▼
┌─────────────────────┐
│      reports        │
├─────────────────────┤
│ PK id: UUID         │
│ FK scenario_id: UUID│
│    format: ENUM     │
│    file_path: TEXT  │
│    generated_at: TS │
│    metadata: JSONB  │
└─────────────────────┘

3.2 DDL - Schema Definition

-- ============================================
-- EXTENSIONS
-- ============================================
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm"; -- For text search

-- ============================================
-- ENUMS
-- ============================================
CREATE TYPE scenario_status AS ENUM ('draft', 'running', 'completed', 'archived');
CREATE TYPE report_format AS ENUM ('pdf', 'csv');

-- ============================================
-- TABLE: scenarios
-- ============================================
CREATE TABLE scenarios (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    tags JSONB DEFAULT '[]'::jsonb,
    status scenario_status NOT NULL DEFAULT 'draft',
    region VARCHAR(50) NOT NULL DEFAULT 'us-east-1',
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMP WITH TIME ZONE,
    started_at TIMESTAMP WITH TIME ZONE,
    total_requests INTEGER NOT NULL DEFAULT 0,
    total_cost_estimate DECIMAL(12, 6) NOT NULL DEFAULT 0.000000,
    
    -- Constraints
    CONSTRAINT chk_name_not_empty CHECK (char_length(trim(name)) > 0),
    CONSTRAINT chk_region_not_empty CHECK (char_length(trim(region)) > 0)
);

-- Indexes
CREATE INDEX idx_scenarios_status ON scenarios(status);
CREATE INDEX idx_scenarios_region ON scenarios(region);
CREATE INDEX idx_scenarios_created_at ON scenarios(created_at DESC);
CREATE INDEX idx_scenarios_tags ON scenarios USING GIN(tags);

-- Trigger for updated_at
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_scenarios_updated_at
    BEFORE UPDATE ON scenarios
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- ============================================
-- TABLE: scenario_logs
-- ============================================
CREATE TABLE scenario_logs (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    received_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    message_hash VARCHAR(64) NOT NULL, -- SHA256
    message_preview VARCHAR(500),
    source VARCHAR(100) DEFAULT 'unknown',
    size_bytes INTEGER NOT NULL DEFAULT 0,
    has_pii BOOLEAN NOT NULL DEFAULT FALSE,
    token_count INTEGER NOT NULL DEFAULT 0,
    sqs_blocks INTEGER NOT NULL DEFAULT 1,
    
    -- Constraints
    CONSTRAINT chk_size_positive CHECK (size_bytes >= 0),
    CONSTRAINT chk_token_positive CHECK (token_count >= 0),
    CONSTRAINT chk_blocks_positive CHECK (sqs_blocks >= 1)
);

-- Indexes
CREATE INDEX idx_logs_scenario_id ON scenario_logs(scenario_id);
CREATE INDEX idx_logs_received_at ON scenario_logs(received_at DESC);
CREATE INDEX idx_logs_message_hash ON scenario_logs(message_hash);
CREATE INDEX idx_logs_has_pii ON scenario_logs(has_pii) WHERE has_pii = TRUE;

-- ============================================
-- TABLE: scenario_metrics
-- ============================================
CREATE TABLE scenario_metrics (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    metric_type VARCHAR(50) NOT NULL, -- 'sqs', 'lambda', 'bedrock', 'safety'
    metric_name VARCHAR(100) NOT NULL,
    value DECIMAL(15, 6) NOT NULL DEFAULT 0.000000,
    unit VARCHAR(20) NOT NULL, -- 'count', 'bytes', 'tokens', 'usd', 'invocations'
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Indexes
CREATE INDEX idx_metrics_scenario_id ON scenario_metrics(scenario_id);
CREATE INDEX idx_metrics_timestamp ON scenario_metrics(timestamp DESC);
CREATE INDEX idx_metrics_type ON scenario_metrics(metric_type);
CREATE INDEX idx_metrics_scenario_type ON scenario_metrics(scenario_id, metric_type);

-- ============================================
-- TABLE: aws_pricing
-- ============================================
CREATE TABLE aws_pricing (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    service VARCHAR(50) NOT NULL, -- 'sqs', 'lambda', 'bedrock'
    region VARCHAR(50) NOT NULL,
    tier VARCHAR(50) NOT NULL DEFAULT 'standard',
    price_per_unit DECIMAL(15, 10) NOT NULL,
    unit VARCHAR(20) NOT NULL, -- 'per_million_requests', 'per_gb_second', 'per_1k_tokens'
    effective_from DATE NOT NULL DEFAULT CURRENT_DATE,
    effective_to DATE,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    source_url VARCHAR(500),
    description TEXT,
    
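    -- Constraints
    CONSTRAINT chk_price_positive CHECK (price_per_unit >= 0),
    CONSTRAINT chk_valid_dates CHECK (effective_to IS NULL OR effective_to >= effective_from)
);

-- PostgreSQL does not accept a WHERE clause on a table-level UNIQUE
-- constraint, so uniqueness among active rows is enforced with a
-- partial unique index.
CREATE UNIQUE INDEX uq_pricing_unique_active
    ON aws_pricing (service, region, tier, effective_from)
    WHERE is_active = TRUE;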

-- Indexes
CREATE INDEX idx_pricing_service ON aws_pricing(service);
CREATE INDEX idx_pricing_region ON aws_pricing(region);
CREATE INDEX idx_pricing_active ON aws_pricing(service, region, tier) WHERE is_active = TRUE;

-- ============================================
-- TABLE: reports
-- ============================================
CREATE TABLE reports (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    format report_format NOT NULL,
    file_path VARCHAR(500) NOT NULL,
    file_size_bytes INTEGER,
    generated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    generated_by VARCHAR(100), -- user_id or api_key_id
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Indexes
CREATE INDEX idx_reports_scenario_id ON reports(scenario_id);
CREATE INDEX idx_reports_generated_at ON reports(generated_at DESC);

3.3 Key Queries

-- Query: Get scenario with aggregated metrics
SELECT 
    s.*,
    COUNT(DISTINCT sl.id) as total_logs,
    COUNT(DISTINCT CASE WHEN sl.has_pii THEN sl.id END) as pii_violations,
    SUM(sl.token_count) as total_tokens,
    SUM(sl.sqs_blocks) as total_sqs_blocks
FROM scenarios s
LEFT JOIN scenario_logs sl ON s.id = sl.scenario_id
WHERE s.id = :scenario_id
GROUP BY s.id;

-- Query: Get cost breakdown by service
SELECT 
    metric_type,
    SUM(value) as total_value,
    unit
FROM scenario_metrics
WHERE scenario_id = :scenario_id
  AND metric_name LIKE '%cost%'
GROUP BY metric_type, unit;

-- Query: Get active pricing for service/region
SELECT *
FROM aws_pricing
WHERE service = :service
  AND region = :region
  AND is_active = TRUE
  AND (effective_to IS NULL OR effective_to >= CURRENT_DATE)
ORDER BY effective_from DESC
LIMIT 1;

4. API Specifications

4.1 OpenAPI Overview

openapi: 3.0.0
info:
  title: mockupAWS API
  version: 0.2.0
  description: AWS Cost Simulation Platform API

servers:
  - url: http://localhost:8000/api/v1
    description: Development server

security:
  - BearerAuth: []
  - ApiKeyAuth: []

4.2 Endpoints

Scenarios API

# POST /scenarios - Create new scenario
request:
  content:
    application/json:
      schema:
        type: object
        required: [name, region]
        properties:
          name:
            type: string
            minLength: 1
            maxLength: 255
          description:
            type: string
          tags:
            type: array
            items:
              type: string
          region:
            type: string
            enum: [us-east-1, us-west-2, eu-west-1, eu-central-1]
          tier:
            type: string
            enum: [standard, on-demand]
            default: standard

response:
  201:
    content:
      application/json:
        schema:
          $ref: '#/components/schemas/Scenario'

# GET /scenarios - List scenarios
parameters:
  - name: status
    in: query
    schema:
      type: string
      enum: [draft, running, completed, archived]
  - name: region
    in: query
    schema:
      type: string
  - name: page
    in: query
    schema:
      type: integer
      default: 1
  - name: page_size
    in: query
    schema:
      type: integer
      default: 20
      maximum: 100

response:
  200:
    content:
      application/json:
        schema:
          type: object
          properties:
            items:
              type: array
              items:
                $ref: '#/components/schemas/Scenario'
            total:
              type: integer
            page:
              type: integer
            page_size:
              type: integer

# GET /scenarios/{id} - Get scenario details
# PUT /scenarios/{id} - Update scenario
# DELETE /scenarios/{id} - Delete scenario
# POST /scenarios/{id}/start - Start scenario
# POST /scenarios/{id}/stop - Stop scenario
# POST /scenarios/{id}/archive - Archive scenario
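
The `page` and `page_size` parameters of `GET /scenarios` map onto an offset and a limit. The sketch below assumes 1-based pages (suggested by `default: 1`) and rejects out-of-range values instead of clamping them. Both choices, and the helper names `page_bounds` and `paginate`, are assumptions.

```python
def page_bounds(page: int = 1, page_size: int = 20, max_page_size: int = 100):
    """Translate 1-based page parameters into an (offset, limit) pair."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= max_page_size:
        raise ValueError(f"page_size must be between 1 and {max_page_size}")
    return (page - 1) * page_size, page_size


def paginate(items: list, page: int = 1, page_size: int = 20) -> dict:
    """Build a response body shaped like the documented list response."""
    offset, limit = page_bounds(page, page_size)
    return {
        "items": items[offset:offset + limit],
        "total": len(items),
        "page": page,
        "page_size": page_size,
    }
```

In a real repository the offset and limit would be passed to the query rather than applied to a list in memory.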

Ingest API

# POST /ingest - Ingest log
headers:
  X-Scenario-ID:
    required: true
    schema:
      type: string
      format: uuid

request:
  content:
    application/json:
      schema:
        type: object
        required: [message]
        properties:
          message:
            type: string
            minLength: 1
          source:
            type: string
            default: unknown

response:
  202:
    description: Log accepted
    content:
      application/json:
        schema:
          type: object
          properties:
            status:
              type: string
              example: accepted
            log_id:
              type: string
              format: uuid
            estimated_cost_impact:
              type: number

  400:
    description: Invalid scenario or scenario not running

Metrics API

# GET /scenarios/{id}/metrics - Get scenario metrics
response:
  200:
    content:
      application/json:
        schema:
          type: object
          properties:
            scenario_id:
              type: string
            summary:
              type: object
              properties:
                total_requests:
                  type: integer
                total_cost_usd:
                  type: number
                sqs_blocks:
                  type: integer
                lambda_invocations:
                  type: integer
                llm_tokens:
                  type: integer
                pii_violations:
                  type: integer
            cost_breakdown:
              type: array
              items:
                type: object
                properties:
                  service:
                    type: string
                  cost_usd:
                    type: number
                  percentage:
                    type: number
            timeseries:
              type: array
              items:
                type: object
                properties:
                  timestamp:
                    type: string
                    format: date-time
                  metric_type:
                    type: string
                  value:
                    type: number

Reports API

# POST /scenarios/{id}/reports - Generate report
request:
  content:
    application/json:
      schema:
        type: object
        required: [format]
        properties:
          format:
            type: string
            enum: [pdf, csv]
          include_logs:
            type: boolean
            default: false
          date_from:
            type: string
            format: date-time
          date_to:
            type: string
            format: date-time

response:
  202:
    description: Report generation started
    content:
      application/json:
        schema:
          type: object
          properties:
            report_id:
              type: string
            status:
              type: string
              enum: [pending, processing, completed]
            download_url:
              type: string

# GET /reports/{id}/download - Download report
# GET /reports/{id}/status - Check report status

Pricing API (Admin)

# GET /pricing - List pricing
# POST /pricing - Create pricing entry
# PUT /pricing/{id} - Update pricing
# DELETE /pricing/{id} - Delete pricing (soft delete)

4.3 Schemas

components:
  schemas:
    Scenario:
      type: object
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
        description:
          type: string
        tags:
          type: array
          items:
            type: string
        status:
          type: string
          enum: [draft, running, completed, archived]
        region:
          type: string
        created_at:
          type: string
          format: date-time
        updated_at:
          type: string
          format: date-time
        completed_at:
          type: string
          format: date-time
        total_requests:
          type: integer
        total_cost_estimate:
          type: number

    LogEntry:
      type: object
      properties:
        id:
          type: string
          format: uuid
        scenario_id:
          type: string
          format: uuid
        received_at:
          type: string
          format: date-time
        message_hash:
          type: string
        message_preview:
          type: string
        source:
          type: string
        size_bytes:
          type: integer
        has_pii:
          type: boolean
        token_count:
          type: integer
        sqs_blocks:
          type: integer

  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key

5. Data Flow

5.1 Log Ingestion Flow

┌──────────┐     POST /ingest        ┌──────────────┐
│  Client  │ ───────────────────────>│  FastAPI     │
│(Logstash)│  Headers:               │  Middleware  │
│          │   X-Scenario-ID: uuid   │              │
└──────────┘                         └──────┬───────┘
                                            │
                                            │ 1. Validate scenario exists & running
                                            │ 2. Parse JSON payload
                                            ▼
                                     ┌──────────────┐
                                     │  Ingest      │
                                     │  Service     │
                                     └──────┬───────┘
                                            │
                    ┌───────────────────────┼───────────────────────┐
                    │                       │                       │
                    ▼                       ▼                       ▼
            ┌──────────────┐       ┌──────────────┐       ┌──────────────┐
            │ PII Detector │       │ SQS Calculator│      │ Tokenizer    │
            │ • check email│       │ • calc blocks │      │ • count      │
            └──────┬───────┘       └──────┬───────┘      └──────┬───────┘
                   │                      │                     │
                   │ has_pii: bool        │ sqs_blocks: int     │ tokens: int
                   └──────────────────────┼─────────────────────┘
                                          │
                                          ▼
                                   ┌──────────────┐
                                   │  LogRepo     │
                                   │  save()      │
                                   └──────┬───────┘
                                          │
                                          ▼
                                   ┌──────────────┐
                                   │  PostgreSQL  │
                                   │ scenario_logs│
                                   └──────────────┘
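
The per-message fields produced by this flow can be sketched with the standard library alone. Several details are assumptions: the 64 KiB block size follows AWS's practice of billing each 64 KB chunk of an SQS payload as one request, the whitespace-based token count is a placeholder for the real tokenizer, and the names `LogRecord`, `approx_token_count` and `build_log_record` are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass

SQS_BLOCK_BYTES = 64 * 1024  # assumption: one billable block per 64 KiB of payload
PREVIEW_CHARS = 500          # matches message_preview VARCHAR(500)


@dataclass
class LogRecord:
    message_hash: str
    message_preview: str
    size_bytes: int
    sqs_blocks: int
    token_count: int


def approx_token_count(message: str) -> int:
    """Hypothetical stand-in for the tokenizer: counts whitespace-separated words."""
    return len(message.split())


def build_log_record(message: str) -> LogRecord:
    payload = message.encode("utf-8")
    return LogRecord(
        message_hash=hashlib.sha256(payload).hexdigest(),  # 64 hex characters
        message_preview=message[:PREVIEW_CHARS],
        size_bytes=len(payload),
        # At least one block, matching CHECK (sqs_blocks >= 1).
        sqs_blocks=max(1, math.ceil(len(payload) / SQS_BLOCK_BYTES)),
        token_count=approx_token_count(message),
    )
```

The hash is computed over the UTF-8 bytes, so two messages with identical text always produce the same `message_hash`, which is what deduplication relies on.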

5.2 Scenario State Machine

                    ┌─────────────────────────────────────────────────────────┐
                    │                                                         │
                    ▼                                                         │
              ┌──────────┐     POST /start     ┌──────────┐                  │
     ┌───────│  DRAFT   │────────────────────>│ RUNNING  │                  │
     │       └──────────┘                     └────┬─────┘                  │
     │            ▲                               │                        │
     │            │                               │ POST /stop             │
     │            │ POST /archive                 ▼                        │
     │            │                          ┌──────────┐                  │
     │       ┌────┴────┐<────────────────────│COMPLETED │──────────────────┘
     │       │ARCHIVED │                     └──────────┘
     └──────>└─────────┘
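
The labeled transitions in the diagram can be encoded as a lookup table. This sketch covers only the three transitions whose triggering endpoint is explicit (`/start`, `/stop`, `/archive` from COMPLETED). Any further arrows in the diagram are left out, and the names `ALLOWED_TRANSITIONS`, `InvalidTransition` and `next_status` are assumptions.

```python
ALLOWED_TRANSITIONS = {
    # (current status, action) -> next status
    ("draft", "start"): "running",
    ("running", "stop"): "completed",
    ("completed", "archive"): "archived",
}


class InvalidTransition(Exception):
    """Raised when an action is not valid for the scenario's current status."""


def next_status(current: str, action: str) -> str:
    """Return the status that follows `current` when `action` is requested."""
    try:
        return ALLOWED_TRANSITIONS[(current, action)]
    except KeyError:
        raise InvalidTransition(
            f"cannot {action} a scenario in status {current!r}"
        ) from None
```

Keeping the rules in one table makes it straightforward to add or remove transitions once the intended lifecycle is settled.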

5.3 Cost Calculation Flow

┌─────────────────────────────────────────────────────────────────────────┐
│                         COST CALCULATION PIPELINE                        │
└─────────────────────────────────────────────────────────────────────────┘

Input: scenario_logs row
├─ sqs_blocks
├─ token_count
└─ (future: lambda_gb_seconds)
         │
         ▼
┌─────────────────┐
│ Pricing Service │
│ • get_active()  │
└────────┬────────┘
         │ Query: SELECT * FROM aws_pricing
         │ WHERE service IN ('sqs', 'lambda', 'bedrock')
         │   AND region = :scenario_region
         │   AND is_active = TRUE
         ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              COST FORMULAS                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  SQS Cost:                                                              │
│    cost = blocks × price_per_million / 1,000,000                        │
│    Example: 100 blocks × $0.40 / 1M = $0.00004                          │
│                                                                         │
│  Lambda Cost:                                                           │
│    request_cost = invocations × price_per_million / 1,000,000           │
│    compute_cost = gb_seconds × price_per_gb_second                      │
│    total = request_cost + compute_cost                                  │
│    Example: 1M invoc × $0.20/1M + 10GBs × $0.00001667 = $0.20 + $0.00017│
│                                                                         │
│  Bedrock Cost:                                                          │
│    input_cost = input_tokens × price_per_1k_input / 1,000               │
│    output_cost = output_tokens × price_per_1k_output / 1,000            │
│    total = input_cost + output_cost                                     │
│    Example: 1000 tokens × $0.003/1K = $0.003                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│  Update         │
│  scenarios      │
│  total_cost     │
└─────────────────┘
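
The formulas above translate directly into code. The sketch below uses `Decimal` to avoid binary floating-point drift on very small amounts. The function names are assumptions, and the prices are the illustrative values from the examples in the box, not authoritative AWS prices.

```python
from decimal import Decimal

MILLION = Decimal("1000000")
THOUSAND = Decimal("1000")


def sqs_cost(blocks: int, price_per_million: Decimal) -> Decimal:
    return Decimal(blocks) * price_per_million / MILLION


def lambda_cost(invocations: int, gb_seconds: int,
                price_per_million: Decimal, price_per_gb_second: Decimal) -> Decimal:
    request_cost = Decimal(invocations) * price_per_million / MILLION
    compute_cost = Decimal(gb_seconds) * price_per_gb_second
    return request_cost + compute_cost


def bedrock_cost(input_tokens: int, output_tokens: int,
                 price_per_1k_input: Decimal, price_per_1k_output: Decimal) -> Decimal:
    input_cost = Decimal(input_tokens) * price_per_1k_input / THOUSAND
    output_cost = Decimal(output_tokens) * price_per_1k_output / THOUSAND
    return input_cost + output_cost


# The three worked examples from the box above.
sqs_example = sqs_cost(100, Decimal("0.40"))
lambda_example = lambda_cost(1_000_000, 10, Decimal("0.20"), Decimal("0.00001667"))
bedrock_example = bedrock_cost(1000, 0, Decimal("0.003"), Decimal("0"))
```

Prices should be passed as `Decimal` values built from strings, as here, so that a value such as `0.40` is stored exactly.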

6. Security Architecture

6.1 Authentication & Authorization

┌─────────────────────────────────────────────────────────────────┐
│                      AUTHENTICATION LAYERS                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Layer 1: API Key (Programmatic Access)                         │
│  ├─ Header: X-API-Key: <key>                                    │
│  ├─ Rate limiting: 1000 req/min                                 │
│  └─ Scope: /ingest, /metrics (read-only on other resources)     │
│                                                                  │
│  Layer 2: JWT Token (Web UI Access)                             │
│  ├─ Header: Authorization: Bearer <jwt>                         │
│  ├─ Expiration: 24h                                             │
│  ├─ Refresh token: 7d                                           │
│  └─ Scope: Full access based on roles                           │
│                                                                  │
│  Layer 3: Role-Based Access Control (RBAC)                      │
│  ├─ admin: Full access                                          │
│  ├─ user: CRUD own scenarios, read pricing                      │
│  └─ readonly: View only                                         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
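
The three roles can be expressed as a single permission check. This is a sketch under assumptions: the resource names `"scenario"` and `"pricing"`, the four CRUD action names, and the decision to deny a `user` access to scenarios owned by others are not specified in the source.

```python
def is_allowed(role: str, action: str, resource: str, is_owner: bool = False) -> bool:
    """Return True if `role` may perform `action` on `resource`.

    `action` is one of "create", "read", "update", "delete".
    """
    if role == "admin":
        return True  # full access
    if role == "readonly":
        return action == "read"  # view only
    if role == "user":
        if resource == "scenario":
            # CRUD on own scenarios; creating one makes the caller its owner.
            return action == "create" or is_owner
        if resource == "pricing":
            return action == "read"
    return False  # unknown roles and resources are denied by default
```

Denying by default means a newly added resource stays inaccessible to non-admin roles until a rule is written for it.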

6.2 Data Security

Layer	Measure	Implementation
Transport	TLS 1.3	Nginx reverse proxy
Storage	Hashing	SHA-256 for message_hash
PII	Detection + Truncation	Email regex, 500 char preview limit
API	Rate Limiting	slowapi: 100/min public, 1000/min authenticated
DB	Parameterized Queries	SQLAlchemy ORM (no raw SQL)
Secrets	Environment Variables	python-dotenv, Docker secrets
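
The PII row above (email detection plus a 500 character preview limit) might look like this in code. The helper name `build_preview` and its return shape are assumptions for illustration; the table does not say whether detected emails are also masked, so this sketch only flags and truncates.

```python
import re

# Email pattern and preview limit taken from the table above.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PREVIEW_LIMIT = 500


def build_preview(message: str, limit: int = PREVIEW_LIMIT) -> tuple[str, bool]:
    """Return (truncated preview, has_email) for a log message."""
    has_email = EMAIL_RE.search(message) is not None
    return message[:limit], has_email
```

Detection runs on the full message before truncation, so an address located after the first 500 characters is still flagged.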

6.3 PII Detection Strategy

import re

# Pattern matching for common PII
def detect_pii(message: str) -> dict:
    """Count matches of each PII pattern found in a log message."""
    patterns = {
        'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
        'credit_card': r'\b(?:\d[ -]*?){13,16}\b',
        'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    }
    
    # Map each PII type to the number of matches found
    results = {}
    for pii_type, pattern in patterns.items():
        matches = re.findall(pattern, message)
        if matches:
            results[pii_type] = len(matches)
    
    return {
        'has_pii': len(results) > 0,
        'pii_types': list(results.keys()),
        'total_matches': sum(results.values())
    }
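
The `credit_card` pattern above matches any run of 13 to 16 digits, so order numbers or timestamps can trigger it. One possible refinement, not part of the specification, is to confirm candidates with the Luhn checksum. The function name `luhn_valid` is an assumption.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 16:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A match would then count as a credit card only when `luhn_valid(match)` is true, which trades a small amount of CPU for fewer false positives.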

7. Technology Stack

7.1 Backend

Component	Technology	Version	Purpose
Framework	FastAPI	≥0.110	Web framework
Server	Uvicorn	≥0.29	ASGI server
Validation	Pydantic	≥2.7	Data validation
ORM	SQLAlchemy	≥2.0	Database ORM
Migrations	Alembic	latest	DB migrations
Driver	asyncpg	latest	Async PostgreSQL
Tokenizer	tiktoken	≥0.6	Token counting
Rate Limit	slowapi	latest	API rate limiting
Auth	python-jose	latest	JWT handling
Testing	pytest	≥8.1	Test framework
HTTP Client	httpx	≥0.27	Async HTTP

7.2 Frontend

Component	Technology	Version	Purpose
Framework	React	≥18	UI library
Language	TypeScript	≥5.0	Type safety
Build	Vite	latest	Build tool
Styling	Tailwind CSS	≥3.4	CSS framework
Components	shadcn/ui	latest	UI components
Charts	Recharts	latest	Data viz
State	React Query	≥5.0	Server state
HTTP	Axios	latest	HTTP client
Routing	React Router	≥6.0	Navigation

7.3 Infrastructure

Component	Technology	Purpose
Container	Docker	Application containers
Orchestration	Docker Compose	Multi-container dev
Database	PostgreSQL 15+	Primary data store
Reverse Proxy	Nginx	SSL, static files
Process Manager	systemd / PM2	Production process mgmt

8. Project Structure

mockupAWS/
├── backend/
│   ├── src/
│   │   ├── __init__.py
│   │   ├── main.py                 # FastAPI app entry
│   │   ├── config.py               # Settings & env vars
│   │   ├── dependencies.py         # FastAPI dependencies
│   │   ├── models/                 # SQLAlchemy models
│   │   │   ├── __init__.py
│   │   │   ├── base.py             # Base model
│   │   │   ├── scenario.py
│   │   │   ├── scenario_log.py
│   │   │   ├── scenario_metric.py
│   │   │   ├── aws_pricing.py
│   │   │   └── report.py
│   │   ├── schemas/                # Pydantic schemas
│   │   │   ├── __init__.py
│   │   │   ├── scenario.py
│   │   │   ├── log.py
│   │   │   ├── metric.py
│   │   │   ├── pricing.py
│   │   │   └── report.py
│   │   ├── api/                    # API routes
│   │   │   ├── __init__.py
│   │   │   ├── deps.py             # Dependencies
│   │   │   └── v1/
│   │   │       ├── __init__.py
│   │   │       ├── scenarios.py    # /scenarios/*
│   │   │       ├── ingest.py       # /ingest
│   │   │       ├── metrics.py      # /metrics
│   │   │       ├── reports.py      # /reports
│   │   │       └── pricing.py      # /pricing
│   │   ├── services/               # Business logic
│   │   │   ├── __init__.py
│   │   │   ├── scenario_service.py
│   │   │   ├── ingest_service.py
│   │   │   ├── cost_calculator.py
│   │   │   ├── report_service.py
│   │   │   └── pii_detector.py
│   │   ├── repositories/           # Data access
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── scenario_repo.py
│   │   │   ├── log_repo.py
│   │   │   ├── metric_repo.py
│   │   │   └── pricing_repo.py
│   │   ├── core/                   # Core utilities
│   │   │   ├── __init__.py
│   │   │   ├── security.py         # Auth, JWT
│   │   │   ├── database.py         # DB connection
│   │   │   └── exceptions.py       # Custom exceptions
│   │   └── utils/                  # Utilities
│   │       ├── __init__.py
│   │       └── hashing.py          # SHA-256 utils
│   ├── alembic/                    # Database migrations
│   │   ├── versions/               # Migration files
│   │   ├── env.py
│   │   └── alembic.ini
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py             # pytest fixtures
│   │   ├── unit/
│   │   │   ├── test_services.py
│   │   │   └── test_cost_calculator.py
│   │   ├── integration/
│   │   │   ├── test_api_scenarios.py
│   │   │   ├── test_api_ingest.py
│   │   │   └── test_api_metrics.py
│   │   └── e2e/
│   │       └── test_full_flow.py
│   ├── Dockerfile
│   ├── pyproject.toml
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ui/                 # shadcn/ui components
│   │   │   ├── layout/
│   │   │   │   ├── Header.tsx
│   │   │   │   ├── Sidebar.tsx
│   │   │   │   └── Layout.tsx
│   │   │   ├── scenarios/
│   │   │   │   ├── ScenarioList.tsx
│   │   │   │   ├── ScenarioCard.tsx
│   │   │   │   ├── ScenarioForm.tsx
│   │   │   │   └── ScenarioDetail.tsx
│   │   │   ├── metrics/
│   │   │   │   ├── MetricCard.tsx
│   │   │   │   ├── CostChart.tsx
│   │   │   │   └── MetricsDashboard.tsx
│   │   │   └── reports/
│   │   │       ├── ReportGenerator.tsx
│   │   │       └── ReportDownload.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── ScenariosPage.tsx
│   │   │   ├── ScenarioCreate.tsx
│   │   │   ├── ScenarioDetail.tsx
│   │   │   ├── Compare.tsx
│   │   │   ├── Reports.tsx
│   │   │   └── Settings.tsx
│   │   ├── hooks/
│   │   │   ├── useScenarios.ts
│   │   │   ├── useMetrics.ts
│   │   │   └── useReports.ts
│   │   ├── services/
│   │   │   ├── api.ts              # Axios config
│   │   │   ├── scenarioApi.ts
│   │   │   └── metricApi.ts
│   │   ├── types/
│   │   │   ├── scenario.ts
│   │   │   ├── metric.ts
│   │   │   └── api.ts
│   │   ├── context/
│   │   │   └── ThemeContext.tsx
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── public/
│   ├── index.html
│   ├── Dockerfile
│   ├── package.json
│   ├── tsconfig.json
│   ├── tailwind.config.js
│   └── vite.config.ts
│
├── docker-compose.yml
├── nginx.conf
├── .env.example
├── .env
├── .gitignore
└── README.md

9. Architectural Decisions

DEC-001: Async-First Architecture

Decision: Use Python async/await throughout the stack (FastAPI, SQLAlchemy, asyncpg)

Rationale:

High throughput required (>1000 RPS)
I/O bound operations (DB, tokenizer)
Better resource utilization than sync

Alternatives considered:

Sync + ThreadPool: Simpler but less efficient
Celery + Redis: Too complex for the use case

Consequences:

Learning curve for async
More complex debugging
Better scalability

DEC-002: Repository Pattern

Decision: Implement the Repository Pattern for data access

Rationale:

Separation between business logic and data access
Easy testing with mock repositories
Option to change the database in the future

Structure:

from typing import Generic, TypeVar
from uuid import UUID

T = TypeVar("T")  # Model type managed by the repository

class BaseRepository(Generic[T]):
    async def get(self, id: UUID) -> T | None: ...
    async def list(self, **filters) -> list[T]: ...
    async def create(self, obj: T) -> T: ...
    async def update(self, id: UUID, data: dict) -> T: ...
    async def delete(self, id: UUID) -> bool: ...

DEC-003: Separate Database per Scenario

Decision: Use a single scenario_logs table with a scenario_id FK instead of separate databases

Rationale:

Simpler to manage
Cross-scenario queries are possible (comparisons)
Simpler backup/restore

Alternatives considered:

Schema per scenario: Too much overhead
Separate databases: Too complex for the MVP
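
The cross-scenario comparison that a single table enables can be sketched with an in-memory SQLite database. SQLite stands in for PostgreSQL here, and the integer keys and the `token_count` column are assumptions made for the example; the real schema is defined in section 3.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE scenarios (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE scenario_logs (
        id INTEGER PRIMARY KEY,
        scenario_id INTEGER NOT NULL REFERENCES scenarios(id),
        token_count INTEGER NOT NULL  -- hypothetical column
    );
    """
)
conn.executemany(
    "INSERT INTO scenarios (id, name) VALUES (?, ?)",
    [(1, "baseline"), (2, "peak")],
)
conn.executemany(
    "INSERT INTO scenario_logs (scenario_id, token_count) VALUES (?, ?)",
    [(1, 100), (1, 150), (2, 400)],
)

# One query compares all scenarios because their logs share a table.
rows = conn.execute(
    """
    SELECT s.name, COUNT(l.id), SUM(l.token_count)
    FROM scenarios AS s
    JOIN scenario_logs AS l ON l.scenario_id = s.id
    GROUP BY s.id
    ORDER BY s.id
    """
).fetchall()
```

With one database per scenario, the same comparison would require a connection per scenario and merging the results in application code.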

DEC-004: Message Hashing for Deduplication

Decision: Use a SHA-256 hash of the message for deduplication

Rationale:

Privacy: Do not store full messages
Performance: Hash lookup O(1)
Storage: Saves space

Implementation:

import hashlib
message_hash = hashlib.sha256(message.encode()).hexdigest()
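
A minimal sketch of how the hash could drive deduplication follows. The `Deduplicator` class and its in-memory set are assumptions for illustration; in the described design the hashes would live in the database rather than in process memory.

```python
import hashlib


def message_hash(message: str) -> str:
    """Return the SHA-256 hex digest of a message (64 hex characters)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class Deduplicator:
    """Track which message hashes have been seen; only digests are kept."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, message: str) -> bool:
        digest = message_hash(message)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Set membership is O(1) on average, which mirrors the performance rationale above, and only the digest is retained, never the message text.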

DEC-005: Time-Series Metrics

Decision: Store metrics as a time series in scenario_metrics

Rationale:

Trend analysis is possible
Flexible aggregations
Audit trail

Trade-offs:

More storage than aggregated fields
More complex queries, but indexed
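
The flexibility of time-series storage can be illustrated by rolling raw points up into hourly buckets. This sketch works on plain tuples; the function name `hourly_totals` and the (timestamp, value) shape are assumptions, not the actual scenario_metrics columns.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def hourly_totals(points: list[tuple[datetime, Decimal]]) -> dict[datetime, Decimal]:
    """Sum metric values per hour, keyed by the start of each hour."""
    totals: dict[datetime, Decimal] = defaultdict(Decimal)
    for timestamp, value in points:
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += value
    return dict(sorted(totals.items()))


points = [
    (datetime(2026, 4, 7, 10, 5, tzinfo=timezone.utc), Decimal("0.10")),
    (datetime(2026, 4, 7, 10, 55, tzinfo=timezone.utc), Decimal("0.20")),
    (datetime(2026, 4, 7, 11, 1, tzinfo=timezone.utc), Decimal("0.05")),
]
totals = hourly_totals(points)
```

Because the raw points are kept, the same data can later be regrouped by day or by metric type, which a single pre-aggregated field would not allow.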

10. Performance Considerations

10.1 Database Optimization

Optimization	Implementation	Benefit
Indexes	B-tree on foreign keys, timestamps	Fast lookups
GIN	tags (JSONB)	Fast array search
Partitioning	scenario_logs by date	Query pruning
Connection Pool	asyncpg pool (20-50)	Concurrency

10.2 Caching Strategy (Future)

Layer 1: In-memory (FastAPI state)
├─ Active scenario metadata
└─ AWS pricing (rarely changes)

Layer 2: Redis (future)
├─ Session storage
├─ Rate limiting counters
└─ Report generation status
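
Layer 1 could be as simple as a value that is reloaded after a time-to-live expires. The sketch below is one possible shape, assuming a hypothetical `TTLCache` class and a loader callable that fetches pricing; neither appears in the specification.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache a single loaded value and reload it once the TTL has elapsed."""

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

An instance could be attached to the FastAPI application state at startup. Since AWS pricing rarely changes, a long TTL keeps database reads for pricing to a minimum.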

10.3 Query Optimization

Use selectinload for relationships
Batch inserts for logs via PostgreSQL COPY (asyncpg exposes copy_records_to_table; copy_expert is a psycopg2 API)
Materialized views for reports
Async tasks for heavy operations
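
Batch inserts for logs start with grouping incoming records into fixed-size chunks before each database round trip. A minimal sketch of that grouping step, with the helper name `chunked` and the batch size chosen only for illustration:

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


batches = list(chunked(range(5), 2))
```

Each yielded batch would then be written with a single bulk operation instead of one INSERT per log line.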

11. Error Handling Strategy

11.1 Exception Hierarchy

class AppException(Exception):
    """Base application exception"""
    status_code: int = 500
    code: str = "internal_error"

class NotFoundException(AppException):
    status_code = 404
    code = "not_found"

class ValidationException(AppException):
    status_code = 400
    code = "validation_error"

class ConflictException(AppException):
    status_code = 409
    code = "conflict"

class RateLimitException(AppException):
    status_code = 429
    code = "rate_limited"

11.2 Global Exception Handler

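from datetime import datetime, timezone

from fastapi import Request
from fastapi.responses import JSONResponse

# `app` is the FastAPI instance from main.py; AppException is defined in 11.1
@app.exception_handler(AppException)
async def app_exception_handler(request: Request, exc: AppException):
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": exc.code,
            "message": str(exc),
            # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
            "timestamp": datetime.now(timezone.utc).isoformat()
        }
    )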

12. Deployment Architecture

12.1 Docker Compose (Development)

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: mockupaws
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mockupaws"]

  backend:
    build: ./backend
    environment:
      DATABASE_URL: postgresql+asyncpg://app:${DB_PASSWORD}@postgres:5432/mockupaws
    ports:
      - "8000:8000"
    depends_on:
      postgres:
        condition: service_healthy

  frontend:
    build: ./frontend
    ports:
      - "3000:80"
    depends_on:
      - backend

volumes:
  postgres_data:
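
The backend service above receives DATABASE_URL through its environment. A hedged sketch of how a settings module might read and validate it is shown below; the `Settings` dataclass and `load_settings` function are assumptions, not the contents of the project's config.py.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DATABASE_URL and fail fast if it is missing or uses the wrong driver."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    if not url.startswith("postgresql+asyncpg://"):
        raise RuntimeError("DATABASE_URL must use the postgresql+asyncpg driver")
    return Settings(database_url=url)
```

Passing the environment as a parameter keeps the function testable with a plain dictionary, without touching the real process environment.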

12.2 Production Considerations

Use managed PostgreSQL (AWS RDS, Azure PostgreSQL)
Nginx as reverse proxy with SSL
Environment-specific configuration
Log aggregation (ELK or similar)
Monitoring (Prometheus + Grafana)
Health checks and readiness probes

Document created by @spec-architect
Version: 1.0
Date: 2026-04-07
