Add comprehensive technical specifications for mockupAWS v0.2.0:
- export/architecture.md: Complete system architecture with:
  * Layered architecture diagram (Client → API → Service → Repository → DB)
  * Full database schema with DDL SQL (5 tables, indexes, constraints)
  * API specifications (OpenAPI format) for all endpoints
  * Security architecture (auth, PII detection, rate limiting)
  * Data flow diagrams (ingestion, cost calculation, state machine)
  * Technology stack details (backend, frontend, infrastructure)
  * Project structure for backend and frontend
  * 4 Architecture Decision Records (DEC-001 to DEC-004)
- export/kanban.md: Task breakdown with 32 tasks organized in:
  * Database setup (DB-001 to DB-007)
  * Backend models/schemas (BE-001 to BE-003)
  * Backend repositories (BE-004 to BE-008)
  * Backend services (BE-009 to BE-014)
  * Backend API (BE-015 to BE-020)
  * Testing (QA-001 to QA-003)
- export/progress.md: Project tracking initialized with:
  * Current status: 0% complete, Phase 1 setup
  * Sprint planning and metrics
  * Resource links and team assignments
All specifications follow the 'Little Often' principle, with tasks under 2 hours.
Architecture - mockupAWS
1. Overview
mockupAWS is an AWS cost simulation platform that profiles log traffic and calculates the cost drivers (SQS, Lambda, Bedrock/LLM) before a production deployment.
Architecture: Layered architecture with the Repository and Service Layer patterns
Paradigm: Async-first (FastAPI + SQLAlchemy async)
Deployment: Container-based (Docker Compose)
2. System Architecture
2.1 High-Level Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Logstash │ │ React Web UI │ │ API Consumers │ │
│ │ (Log Source) │ │ (Dashboard) │ │ (CI/CD, Scripts) │ │
│ └────────┬─────────┘ └────────┬─────────┘ └───────────┬──────────────┘ │
└───────────┼─────────────────────┼────────────────────────┼───────────────────┘
│ │ │
│ HTTP POST │ HTTPS │ API Key + JWT
│ /ingest │ /api/v1/* │ /api/v1/*
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ API LAYER │
│ FastAPI + Uvicorn (ASGI) │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Middleware Stack │ │
│ │ ├── CORS │ │
│ │ ├── Rate Limiting (slowapi) │ │
│ │ ├── Authentication (JWT / API Key) │ │
│ │ ├── Request Validation (Pydantic) │ │
│ │ └── Error Handling │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ /scenarios │ │ /ingest │ │ /reports │ │ /pricing │ │
│ │ CRUD │ │ (log │ │ generate │ │ (admin) │ │
│ │ │ │ intake) │ │ download │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
└─────────┼────────────────┼────────────────┼──────────────────┼─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ SERVICE LAYER │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ ScenarioService │ │ IngestService │ │ CostCalculator │ │
│ │ ─────────────── │ │ ────────────── │ │ ───────────── │ │
│ │ • create() │ │ • ingest_log() │ │ • calculate_sqs_cost() │ │
│ │ • update() │ │ • batch_process()│ │ • calculate_lambda_cost() │ │
│ │ • delete() │ │ • deduplicate() │ │ • calculate_bedrock_cost() │ │
│ │ • lifecycle() │ │ • persist() │ │ • get_total_cost() │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ ReportService │ │ PIIDetector │ │ TokenizerService │ │
│ │ ────────────── │ │ ─────────── │ │ ─────────────── │ │
│ │ • generate_csv()│ │ • detect_email()│ │ • count_tokens() │ │
│ │ • generate_pdf()│ │ • scan_patterns()│ │ • encode() │ │
│ │ • compile() │ │ • report() │ │ • get_encoding() │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
└─────────┬──────────────────────────────────────────────────────┬────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ REPOSITORY LAYER │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ ScenarioRepo │ │ LogRepo │ │ PricingRepo │ │
│ │ ───────────── │ │ ─────── │ │ ────────── │ │
│ │ • get_by_id() │ │ • save() │ │ • get_by_service_region() │ │
│ │ • list() │ │ • list_by_ │ │ • list_active() │ │
│ │ • create() │ │ scenario() │ │ • update() │ │
│ │ • update() │ │ • count_by_ │ │ • bulk_insert() │ │
│ │ • delete() │ │ hash() │ │ │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ MetricRepo │ │ ReportRepo │ │
│ │ ────────── │ │ ────────── │ │
│ │ • save() │ │ • save() │ │
│ │ • get_aggregated()│ │ • list() │ │
│ │ • list_by_type()│ │ • delete() │ │
│ └──────────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
│ SQLAlchemy 2.0 Async
│ asyncpg driver
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATABASE LAYER │
│ PostgreSQL 15+ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ scenarios │ │ scenario_logs │ │ aws_pricing │ │
│ │ ───────── │ │ ───────────── │ │ ─────────── │ │
│ │ • metadata │ │ • logs storage │ │ • service prices │ │
│ │ • state machine │ │ • hash for dedup│ │ • history tracking │ │
│ │ • cost totals │ │ • PII flags │ │ • region-specific │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ scenario_metrics│ │ reports │ │
│ │ ─────────────── │ │ ──────── │ │
│ │ • time-series │ │ • generated │ │
│ │ • aggregates │ │ • metadata │ │
│ │ • cost breakdown│ │ • file refs │ │
│ └──────────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
2.2 Layer Responsibilities
| Layer | Responsibility | Technologies |
|---|---|---|
| Client | User interaction, log ingestion | Browser, Logstash, curl |
| API | Routing, validation, auth, middleware | FastAPI, Pydantic, slowapi |
| Service | Business logic, orchestration | Python async/await |
| Repository | Data access, query abstraction | SQLAlchemy 2.0, Repository pattern |
| Database | Persistence, ACID, queries | PostgreSQL 15+ |
3. Database Schema
3.1 Entity Relationship Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ SCHEMA ERD │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────┐ ┌─────────────────────┐
│ scenarios │ │ aws_pricing │
├─────────────────────┤ ├─────────────────────┤
│ PK id: UUID │ │ PK id: UUID │
│ name: VARCHAR(255)│ │ service: VARCHAR │
│ description: TEXT│ │ region: VARCHAR │
│ tags: JSONB │ │ tier: VARCHAR │
│ status: ENUM │ │ price_per_unit: DEC│
│ region: VARCHAR │ │ unit: VARCHAR │
│ created_at: TS │ │ effective_from: D│
│ updated_at: TS │ │ effective_to: D │
│ completed_at: TS │ │ is_active: BOOL │
│ total_requests: INT│ │ source_url: TEXT │
│ total_cost_estimate: DEC│ └─────────────────────┘
└──────────┬──────────┘
│
│ 1:N
▼
┌─────────────────────┐ ┌─────────────────────┐
│ scenario_logs │ │ scenario_metrics │
├─────────────────────┤ ├─────────────────────┤
│ PK id: UUID │ │ PK id: UUID │
│ FK scenario_id: UUID│ │ FK scenario_id: UUID│
│ received_at: TS │ │ timestamp: TS │
│ message_hash: V64│ │ metric_type: VAR │
│ message_preview │ │ metric_name: VAR │
│ source: VARCHAR │ │ value: DECIMAL │
│ size_bytes: INT │ │ unit: VARCHAR │
│ has_pii: BOOL │ │ metadata: JSONB │
│ token_count: INT │ └─────────────────────┘
│ sqs_blocks: INT │
└─────────────────────┘
│
│ 1:N (optional)
▼
┌─────────────────────┐
│ reports │
├─────────────────────┤
│ PK id: UUID │
│ FK scenario_id: UUID│
│ format: ENUM │
│ file_path: TEXT │
│ generated_at: TS │
│ metadata: JSONB │
└─────────────────────┘
3.2 DDL - Schema Definition
-- ============================================
-- EXTENSIONS
-- ============================================
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm"; -- For text search
-- ============================================
-- ENUMS
-- ============================================
CREATE TYPE scenario_status AS ENUM ('draft', 'running', 'completed', 'archived');
CREATE TYPE report_format AS ENUM ('pdf', 'csv');
-- ============================================
-- TABLE: scenarios
-- ============================================
CREATE TABLE scenarios (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(255) NOT NULL,
description TEXT,
tags JSONB DEFAULT '[]'::jsonb,
status scenario_status NOT NULL DEFAULT 'draft',
region VARCHAR(50) NOT NULL DEFAULT 'us-east-1',
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
completed_at TIMESTAMP WITH TIME ZONE,
started_at TIMESTAMP WITH TIME ZONE,
total_requests INTEGER NOT NULL DEFAULT 0,
total_cost_estimate DECIMAL(12, 6) NOT NULL DEFAULT 0.000000,
-- Constraints
CONSTRAINT chk_name_not_empty CHECK (char_length(trim(name)) > 0),
CONSTRAINT chk_region_not_empty CHECK (char_length(trim(region)) > 0)
);
-- Indexes
CREATE INDEX idx_scenarios_status ON scenarios(status);
CREATE INDEX idx_scenarios_region ON scenarios(region);
CREATE INDEX idx_scenarios_created_at ON scenarios(created_at DESC);
CREATE INDEX idx_scenarios_tags ON scenarios USING GIN(tags);
-- Trigger for updated_at
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TRIGGER update_scenarios_updated_at
BEFORE UPDATE ON scenarios
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
-- ============================================
-- TABLE: scenario_logs
-- ============================================
CREATE TABLE scenario_logs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
received_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
message_hash VARCHAR(64) NOT NULL, -- SHA256
message_preview VARCHAR(500),
source VARCHAR(100) DEFAULT 'unknown',
size_bytes INTEGER NOT NULL DEFAULT 0,
has_pii BOOLEAN NOT NULL DEFAULT FALSE,
token_count INTEGER NOT NULL DEFAULT 0,
sqs_blocks INTEGER NOT NULL DEFAULT 1,
-- Constraints
CONSTRAINT chk_size_positive CHECK (size_bytes >= 0),
CONSTRAINT chk_token_positive CHECK (token_count >= 0),
CONSTRAINT chk_blocks_positive CHECK (sqs_blocks >= 1)
);
-- Indexes
CREATE INDEX idx_logs_scenario_id ON scenario_logs(scenario_id);
CREATE INDEX idx_logs_received_at ON scenario_logs(received_at DESC);
CREATE INDEX idx_logs_message_hash ON scenario_logs(message_hash);
CREATE INDEX idx_logs_has_pii ON scenario_logs(has_pii) WHERE has_pii = TRUE;
-- ============================================
-- TABLE: scenario_metrics
-- ============================================
CREATE TABLE scenario_metrics (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
metric_type VARCHAR(50) NOT NULL, -- 'sqs', 'lambda', 'bedrock', 'safety'
metric_name VARCHAR(100) NOT NULL,
value DECIMAL(15, 6) NOT NULL DEFAULT 0.000000,
unit VARCHAR(20) NOT NULL, -- 'count', 'bytes', 'tokens', 'usd', 'invocations'
metadata JSONB DEFAULT '{}'::jsonb
);
-- Indexes
CREATE INDEX idx_metrics_scenario_id ON scenario_metrics(scenario_id);
CREATE INDEX idx_metrics_timestamp ON scenario_metrics(timestamp DESC);
CREATE INDEX idx_metrics_type ON scenario_metrics(metric_type);
CREATE INDEX idx_metrics_scenario_type ON scenario_metrics(scenario_id, metric_type);
-- ============================================
-- TABLE: aws_pricing
-- ============================================
CREATE TABLE aws_pricing (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
service VARCHAR(50) NOT NULL, -- 'sqs', 'lambda', 'bedrock'
region VARCHAR(50) NOT NULL,
tier VARCHAR(50) NOT NULL DEFAULT 'standard',
price_per_unit DECIMAL(15, 10) NOT NULL,
unit VARCHAR(20) NOT NULL, -- 'per_million_requests', 'per_gb_second', 'per_1k_tokens'
effective_from DATE NOT NULL DEFAULT CURRENT_DATE,
effective_to DATE,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
source_url VARCHAR(500),
description TEXT,
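-- Constraints
CONSTRAINT chk_price_positive CHECK (price_per_unit >= 0),
CONSTRAINT chk_valid_dates CHECK (effective_to IS NULL OR effective_to >= effective_from)
);
-- Uniqueness among active rows only. PostgreSQL table constraints do not
-- accept a WHERE clause, so this is expressed as a partial unique index.
CREATE UNIQUE INDEX uq_pricing_unique_active
ON aws_pricing(service, region, tier, effective_from)
WHERE is_active = TRUE;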
-- Indexes
CREATE INDEX idx_pricing_service ON aws_pricing(service);
CREATE INDEX idx_pricing_region ON aws_pricing(region);
CREATE INDEX idx_pricing_active ON aws_pricing(service, region, tier) WHERE is_active = TRUE;
-- ============================================
-- TABLE: reports
-- ============================================
CREATE TABLE reports (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
scenario_id UUID NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
format report_format NOT NULL,
file_path VARCHAR(500) NOT NULL,
file_size_bytes INTEGER,
generated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
generated_by VARCHAR(100), -- user_id or api_key_id
metadata JSONB DEFAULT '{}'::jsonb
);
-- Indexes
CREATE INDEX idx_reports_scenario_id ON reports(scenario_id);
CREATE INDEX idx_reports_generated_at ON reports(generated_at DESC);
3.3 Key Queries
-- Query: Get scenario with aggregated metrics
SELECT
s.*,
COUNT(DISTINCT sl.id) as total_logs,
COUNT(DISTINCT CASE WHEN sl.has_pii THEN sl.id END) as pii_violations,
SUM(sl.token_count) as total_tokens,
SUM(sl.sqs_blocks) as total_sqs_blocks
FROM scenarios s
LEFT JOIN scenario_logs sl ON s.id = sl.scenario_id
WHERE s.id = :scenario_id
GROUP BY s.id;
-- Query: Get cost breakdown by service
SELECT
metric_type,
SUM(value) as total_value,
unit
FROM scenario_metrics
WHERE scenario_id = :scenario_id
AND metric_name LIKE '%cost%'
GROUP BY metric_type, unit;
-- Query: Get active pricing for service/region
SELECT *
FROM aws_pricing
WHERE service = :service
AND region = :region
AND is_active = TRUE
AND (effective_to IS NULL OR effective_to >= CURRENT_DATE)
ORDER BY effective_from DESC
LIMIT 1;
4. API Specifications
4.1 OpenAPI Overview
openapi: 3.0.0
info:
  title: mockupAWS API
  version: 0.2.0
  description: AWS Cost Simulation Platform API
servers:
  - url: http://localhost:8000/api/v1
    description: Development server
security:
  - BearerAuth: []
  - ApiKeyAuth: []
4.2 Endpoints
Scenarios API
# POST /scenarios - Create new scenario
request:
  content:
    application/json:
      schema:
        type: object
        required: [name, region]
        properties:
          name:
            type: string
            minLength: 1
            maxLength: 255
          description:
            type: string
          tags:
            type: array
            items:
              type: string
          region:
            type: string
            enum: [us-east-1, us-west-2, eu-west-1, eu-central-1]
          tier:
            type: string
            enum: [standard, on-demand]
            default: standard
response:
  201:
    content:
      application/json:
        schema:
          $ref: '#/components/schemas/Scenario'
# GET /scenarios - List scenarios
parameters:
  - name: status
    in: query
    schema:
      type: string
      enum: [draft, running, completed, archived]
  - name: region
    in: query
    schema:
      type: string
  - name: page
    in: query
    schema:
      type: integer
      default: 1
  - name: page_size
    in: query
    schema:
      type: integer
      default: 20
      maximum: 100
response:
  200:
    content:
      application/json:
        schema:
          type: object
          properties:
            items:
              type: array
              items:
                $ref: '#/components/schemas/Scenario'
            total:
              type: integer
            page:
              type: integer
            page_size:
              type: integer
# GET /scenarios/{id} - Get scenario details
# PUT /scenarios/{id} - Update scenario
# DELETE /scenarios/{id} - Delete scenario
# POST /scenarios/{id}/start - Start scenario
# POST /scenarios/{id}/stop - Stop scenario
# POST /scenarios/{id}/archive - Archive scenario
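The `page` and `page_size` query parameters of `GET /scenarios` have to be turned into an offset and a limit before they reach the repository. A minimal sketch, assuming pages are 1-based (suggested by the default of 1); `page_to_offset_limit` is a hypothetical helper name.

```python
def page_to_offset_limit(page: int = 1, page_size: int = 20,
                         max_page_size: int = 100) -> tuple[int, int]:
    """Translate 1-based pagination parameters into (offset, limit)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= max_page_size:
        raise ValueError(f"page_size must be between 1 and {max_page_size}")
    return (page - 1) * page_size, page_size


offset, limit = page_to_offset_limit(page=3, page_size=50)
```

Rejecting an out-of-range `page_size` rather than clamping it silently is a design choice of this sketch; either behaviour is compatible with the `maximum: 100` in the specification.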
Ingest API
# POST /ingest - Ingest log
headers:
  X-Scenario-ID:
    required: true
    schema:
      type: string
      format: uuid
request:
  content:
    application/json:
      schema:
        type: object
        required: [message]
        properties:
          message:
            type: string
            minLength: 1
          source:
            type: string
            default: unknown
response:
  202:
    description: Log accepted
    content:
      application/json:
        schema:
          type: object
          properties:
            status:
              type: string
              example: accepted
            log_id:
              type: string
              format: uuid
            estimated_cost_impact:
              type: number
  400:
    description: Invalid scenario or scenario not running
Metrics API
# GET /scenarios/{id}/metrics - Get scenario metrics
response:
  200:
    content:
      application/json:
        schema:
          type: object
          properties:
            scenario_id:
              type: string
            summary:
              type: object
              properties:
                total_requests:
                  type: integer
                total_cost_usd:
                  type: number
                sqs_blocks:
                  type: integer
                lambda_invocations:
                  type: integer
                llm_tokens:
                  type: integer
                pii_violations:
                  type: integer
            cost_breakdown:
              type: array
              items:
                type: object
                properties:
                  service:
                    type: string
                  cost_usd:
                    type: number
                  percentage:
                    type: number
            timeseries:
              type: array
              items:
                type: object
                properties:
                  timestamp:
                    type: string
                    format: date-time
                  metric_type:
                    type: string
                  value:
                    type: number
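The `cost_breakdown` array of the metrics response pairs each service with its cost and its share of the total. One way it could be derived from per-service totals is sketched below; `build_cost_breakdown` is a hypothetical helper, and the input figures are made up for illustration.

```python
from decimal import Decimal


def build_cost_breakdown(costs_by_service: dict[str, Decimal]) -> list[dict]:
    """Build cost_breakdown items (service, cost_usd, percentage)."""
    total = sum(costs_by_service.values(), Decimal("0"))
    breakdown = []
    for service, cost in sorted(costs_by_service.items()):
        # Avoid dividing by zero when nothing has been ingested yet
        pct = (cost / total * 100) if total else Decimal("0")
        breakdown.append({
            "service": service,
            "cost_usd": float(cost),
            "percentage": float(round(pct, 2)),
        })
    return breakdown


breakdown = build_cost_breakdown({"sqs": Decimal("1"), "bedrock": Decimal("3")})
```

The arithmetic stays in `Decimal` until the final conversion, so rounding happens once, at serialization time.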
Reports API
# POST /scenarios/{id}/reports - Generate report
request:
  content:
    application/json:
      schema:
        type: object
        required: [format]
        properties:
          format:
            type: string
            enum: [pdf, csv]
          include_logs:
            type: boolean
            default: false
          date_from:
            type: string
            format: date-time
          date_to:
            type: string
            format: date-time
response:
  202:
    description: Report generation started
    content:
      application/json:
        schema:
          type: object
          properties:
            report_id:
              type: string
            status:
              type: string
              enum: [pending, processing, completed]
            download_url:
              type: string
# GET /reports/{id}/download - Download report
# GET /reports/{id}/status - Check report status
Pricing API (Admin)
# GET /pricing - List pricing
# POST /pricing - Create pricing entry
# PUT /pricing/{id} - Update pricing
# DELETE /pricing/{id} - Delete pricing (soft delete)
4.3 Schemas
components:
  schemas:
    Scenario:
      type: object
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
        description:
          type: string
        tags:
          type: array
          items:
            type: string
        status:
          type: string
          enum: [draft, running, completed, archived]
        region:
          type: string
        created_at:
          type: string
          format: date-time
        updated_at:
          type: string
          format: date-time
        completed_at:
          type: string
          format: date-time
        total_requests:
          type: integer
        total_cost_estimate:
          type: number
    LogEntry:
      type: object
      properties:
        id:
          type: string
          format: uuid
        scenario_id:
          type: string
          format: uuid
        received_at:
          type: string
          format: date-time
        message_hash:
          type: string
        message_preview:
          type: string
        source:
          type: string
        size_bytes:
          type: integer
        has_pii:
          type: boolean
        token_count:
          type: integer
        sqs_blocks:
          type: integer
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
5. Data Flow
5.1 Log Ingestion Flow
┌──────────┐ POST /ingest ┌──────────────┐
│ Client │ ───────────────────────>│ FastAPI │
│(Logstash)│ Headers: │ Middleware │
│ │ X-Scenario-ID: uuid │ │
└──────────┘ └──────┬───────┘
│
│ 1. Validate scenario exists & running
│ 2. Parse JSON payload
▼
┌──────────────┐
│ Ingest │
│ Service │
└──────┬───────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ PII Detector │ │ SQS Calculator│ │ Tokenizer │
│ • check email│ │ • calc blocks │ │ • count │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ has_pii: bool │ sqs_blocks: int │ tokens: int
└──────────────────────┼─────────────────────┘
│
▼
┌──────────────┐
│ LogRepo │
│ save() │
└──────┬───────┘
│
▼
┌──────────────┐
│ PostgreSQL │
│ scenario_logs│
└──────────────┘
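The per-message analysis in the flow above (PII check, SQS block count, token count, hash) can be sketched as a single pure function that produces the fields stored in `scenario_logs`. Several things here are assumptions: the 64 KiB block size (one billable SQS request per 64 KiB chunk), the email-only PII check, and the whitespace split, which is only a stand-in for the tiktoken-based count. `LogRecord` and `analyze_message` are hypothetical names.

```python
import hashlib
import math
import re
from dataclasses import dataclass

SQS_BLOCK_BYTES = 64 * 1024  # assumption: one billable request per 64 KiB chunk
PREVIEW_CHARS = 500          # matches message_preview VARCHAR(500)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


@dataclass
class LogRecord:
    message_hash: str
    message_preview: str
    size_bytes: int
    has_pii: bool
    token_count: int
    sqs_blocks: int


def analyze_message(message: str) -> LogRecord:
    """Compute the derived columns for one ingested log message."""
    raw = message.encode("utf-8")
    return LogRecord(
        message_hash=hashlib.sha256(raw).hexdigest(),
        message_preview=message[:PREVIEW_CHARS],
        size_bytes=len(raw),
        has_pii=EMAIL_RE.search(message) is not None,
        # Stand-in for tiktoken: whitespace split, not a real token count
        token_count=len(message.split()),
        # At least one block, as required by chk_blocks_positive
        sqs_blocks=max(1, math.ceil(len(raw) / SQS_BLOCK_BYTES)),
    )


record = analyze_message("login failed for alice@example.com")
```

Keeping the analysis free of I/O means the three branches of the diagram can be tested without a database, and `LogRepo.save()` only receives the finished record.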
5.2 Scenario State Machine
┌─────────────────────────────────────────────────────────┐
│ │
▼ │
┌──────────┐ POST /start ┌──────────┐ │
┌───────│ DRAFT │────────────────────>│ RUNNING │ │
│ └──────────┘ └────┬─────┘ │
│ ▲ │ │
│ │ │ POST /stop │
│ │ POST /archive ▼ │
│ │ ┌──────────┐ │
│ ┌────┴────┐<────────────────────│COMPLETED │──────────────────┘
│ │ARCHIVED │ └──────────┘
└──────>└─────────┘
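The labelled transitions of the state machine can be encoded as a lookup table, so that an invalid action is rejected in one place. This sketch covers only the three arrows that carry an endpoint label in the diagram (start, stop, archive); the unlabelled arrows are left out because their trigger is not specified. `ScenarioStatus` mirrors the `scenario_status` enum, while `apply_action` is a hypothetical helper.

```python
from enum import Enum


class ScenarioStatus(str, Enum):
    DRAFT = "draft"
    RUNNING = "running"
    COMPLETED = "completed"
    ARCHIVED = "archived"


# (current status, action) -> next status
TRANSITIONS = {
    (ScenarioStatus.DRAFT, "start"): ScenarioStatus.RUNNING,
    (ScenarioStatus.RUNNING, "stop"): ScenarioStatus.COMPLETED,
    (ScenarioStatus.COMPLETED, "archive"): ScenarioStatus.ARCHIVED,
}


def apply_action(status: ScenarioStatus, action: str) -> ScenarioStatus:
    """Return the next status, or raise ValueError for an invalid transition."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a scenario in status {status.value!r}"
        ) from None


next_status = apply_action(ScenarioStatus.DRAFT, "start")
```

In the service layer, the `ValueError` would typically be mapped to a 409 Conflict response.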
5.3 Cost Calculation Flow
┌─────────────────────────────────────────────────────────────────────────┐
│ COST CALCULATION PIPELINE │
└─────────────────────────────────────────────────────────────────────────┘
Input: scenario_logs row
├─ sqs_blocks
├─ token_count
└─ (future: lambda_gb_seconds)
│
▼
┌─────────────────┐
│ Pricing Service │
│ • get_active() │
└────────┬────────┘
│ Query: SELECT * FROM aws_pricing
│ WHERE service IN ('sqs', 'lambda', 'bedrock')
│ AND region = :scenario_region
│ AND is_active = TRUE
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ COST FORMULAS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SQS Cost: │
│ cost = blocks × price_per_million / 1,000,000 │
│ Example: 100 blocks × $0.40 / 1M = $0.00004 │
│ │
│ Lambda Cost: │
│ request_cost = invocations × price_per_million / 1,000,000 │
│ compute_cost = gb_seconds × price_per_gb_second │
│ total = request_cost + compute_cost │
│ Example: 1M invoc × $0.20/1M + 10GBs × $0.00001667 = $0.20 + $0.00017│
│ │
│ Bedrock Cost: │
│ input_cost = input_tokens × price_per_1k_input / 1,000 │
│ output_cost = output_tokens × price_per_1k_output / 1,000 │
│ total = input_cost + output_cost │
│ Example: 1000 tokens × $0.003/1K = $0.003 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Update │
│ scenarios │
│ total_cost │
└─────────────────┘
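The three formulas translate directly into code. The sketch below uses `Decimal` to match the `DECIMAL` columns of the schema and reuses the example figures from the box above; the function names are hypothetical and need not match the `CostCalculator` methods.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
THOUSAND = Decimal(1_000)


def sqs_cost(blocks: int, price_per_million: Decimal) -> Decimal:
    """cost = blocks x price_per_million / 1,000,000"""
    return Decimal(blocks) * price_per_million / MILLION


def lambda_cost(invocations: int, gb_seconds: Decimal,
                price_per_million: Decimal,
                price_per_gb_second: Decimal) -> Decimal:
    """total = request_cost + compute_cost"""
    request_cost = Decimal(invocations) * price_per_million / MILLION
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


def bedrock_cost(input_tokens: int, output_tokens: int,
                 price_per_1k_input: Decimal,
                 price_per_1k_output: Decimal) -> Decimal:
    """total = input_cost + output_cost, both priced per 1,000 tokens"""
    input_cost = Decimal(input_tokens) * price_per_1k_input / THOUSAND
    output_cost = Decimal(output_tokens) * price_per_1k_output / THOUSAND
    return input_cost + output_cost


# Example figures taken from the formulas above
sqs_example = sqs_cost(100, Decimal("0.40"))
lambda_example = lambda_cost(1_000_000, Decimal("10"),
                             Decimal("0.20"), Decimal("0.00001667"))
bedrock_example = bedrock_cost(1000, 0, Decimal("0.003"), Decimal("0"))
```

Passing prices as `Decimal` built from strings avoids the binary floating-point error that `float` would introduce at these magnitudes.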
6. Security Architecture
6.1 Authentication & Authorization
┌─────────────────────────────────────────────────────────────────┐
│ AUTHENTICATION LAYERS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: API Key (Programmatic Access) │
│ ├─ Header: X-API-Key: <key> │
│ ├─ Rate limiting: 1000 req/min │
│ └─ Scope: /ingest, /metrics (read-only on other resources) │
│ │
│ Layer 2: JWT Token (Web UI Access) │
│ ├─ Header: Authorization: Bearer <jwt> │
│ ├─ Expiration: 24h │
│ ├─ Refresh token: 7d │
│ └─ Scope: Full access based on roles │
│ │
│ Layer 3: Role-Based Access Control (RBAC) │
│ ├─ admin: Full access │
│ ├─ user: CRUD own scenarios, read pricing │
│ └─ readonly: View only │
│ │
└─────────────────────────────────────────────────────────────────┘
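The three roles of Layer 3 can be expressed as a small authorization check. This is only a sketch of the rules as listed: `is_allowed` is a hypothetical helper, the action and resource names are illustrative, and ownership is passed in as a flag because the schema above does not define an owner column.

```python
READ = "read"


def is_allowed(role: str, action: str, resource: str,
               is_owner: bool = False) -> bool:
    """Apply the admin / user / readonly rules from the RBAC layer."""
    if role == "admin":
        return True                  # full access
    if role == "readonly":
        return action == READ        # view only
    if role == "user":
        if resource == "scenarios":
            return is_owner          # CRUD, but only on own scenarios
        if resource == "pricing":
            return action == READ    # read pricing
    return False                     # unknown role or resource: deny


can_edit = is_allowed("user", "update", "scenarios", is_owner=True)
```

Denying by default in the final branch keeps unknown roles and resources from gaining access by accident.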
6.2 Data Security
| Layer | Measure | Implementation |
|---|---|---|
| Transport | TLS 1.3 | Nginx reverse proxy |
| Storage | Hashing | SHA-256 for message_hash |
| PII | Detection + Truncation | Email regex, 500 char preview limit |
| API | Rate Limiting | slowapi: 100/min public, 1000/min authenticated |
| DB | Parameterized Queries | SQLAlchemy ORM (no raw SQL) |
| Secrets | Environment Variables | python-dotenv, Docker secrets |
6.3 PII Detection Strategy
import re

# Pattern matching for common PII
def detect_pii(message: str) -> dict:
    """Count matches per PII type and summarize them."""
    patterns = {
        'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
        'credit_card': r'\b(?:\d[ -]*?){13,16}\b',
        'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    }
    results = {}
    for pii_type, pattern in patterns.items():
        matches = re.findall(pattern, message)
        if matches:
            results[pii_type] = len(matches)
    return {
        'has_pii': len(results) > 0,
        'pii_types': list(results.keys()),
        'total_matches': sum(results.values())
    }
7. Technology Stack
7.1 Backend
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | FastAPI | ≥0.110 | Web framework |
| Server | Uvicorn | ≥0.29 | ASGI server |
| Validation | Pydantic | ≥2.7 | Data validation |
| ORM | SQLAlchemy | ≥2.0 | Database ORM |
| Migrations | Alembic | latest | DB migrations |
| Driver | asyncpg | latest | Async PostgreSQL |
| Tokenizer | tiktoken | ≥0.6 | Token counting |
| Rate Limit | slowapi | latest | API rate limiting |
| Auth | python-jose | latest | JWT handling |
| Testing | pytest | ≥8.1 | Test framework |
| HTTP Client | httpx | ≥0.27 | Async HTTP |
7.2 Frontend
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | ≥18 | UI library |
| Language | TypeScript | ≥5.0 | Type safety |
| Build | Vite | latest | Build tool |
| Styling | Tailwind CSS | ≥3.4 | CSS framework |
| Components | shadcn/ui | latest | UI components |
| Charts | Recharts | latest | Data viz |
| State | React Query | ≥5.0 | Server state |
| HTTP | Axios | latest | HTTP client |
| Routing | React Router | ≥6.0 | Navigation |
7.3 Infrastructure
| Component | Technology | Purpose |
|---|---|---|
| Container | Docker | Application containers |
| Orchestration | Docker Compose | Multi-container dev |
| Database | PostgreSQL 15+ | Primary data store |
| Reverse Proxy | Nginx | SSL, static files |
| Process Manager | systemd / PM2 | Production process mgmt |
8. Project Structure
mockupAWS/
├── backend/
│ ├── src/
│ │ ├── __init__.py
│ │ ├── main.py # FastAPI app entry
│ │ ├── config.py # Settings & env vars
│ │ ├── dependencies.py # FastAPI dependencies
│ │ ├── models/ # SQLAlchemy models
│ │ │ ├── __init__.py
│ │ │ ├── base.py # Base model
│ │ │ ├── scenario.py
│ │ │ ├── scenario_log.py
│ │ │ ├── scenario_metric.py
│ │ │ ├── aws_pricing.py
│ │ │ └── report.py
│ │ ├── schemas/ # Pydantic schemas
│ │ │ ├── __init__.py
│ │ │ ├── scenario.py
│ │ │ ├── log.py
│ │ │ ├── metric.py
│ │ │ ├── pricing.py
│ │ │ └── report.py
│ │ ├── api/ # API routes
│ │ │ ├── __init__.py
│ │ │ ├── deps.py # Dependencies
│ │ │ └── v1/
│ │ │ ├── __init__.py
│ │ │ ├── scenarios.py # /scenarios/*
│ │ │ ├── ingest.py # /ingest
│ │ │ ├── metrics.py # /metrics
│ │ │ ├── reports.py # /reports
│ │ │ └── pricing.py # /pricing
│ │ ├── services/ # Business logic
│ │ │ ├── __init__.py
│ │ │ ├── scenario_service.py
│ │ │ ├── ingest_service.py
│ │ │ ├── cost_calculator.py
│ │ │ ├── report_service.py
│ │ │ └── pii_detector.py
│ │ ├── repositories/ # Data access
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── scenario_repo.py
│ │ │ ├── log_repo.py
│ │ │ ├── metric_repo.py
│ │ │ └── pricing_repo.py
│ │ ├── core/ # Core utilities
│ │ │ ├── __init__.py
│ │ │ ├── security.py # Auth, JWT
│ │ │ ├── database.py # DB connection
│ │ │ └── exceptions.py # Custom exceptions
│ │ └── utils/ # Utilities
│ │ ├── __init__.py
│ │ └── hashing.py # SHA-256 utils
│ ├── alembic/ # Database migrations
│ │ ├── versions/ # Migration files
│ │ ├── env.py
│ │ └── alembic.ini
│ ├── tests/
│ │ ├── __init__.py
│ │ ├── conftest.py # pytest fixtures
│ │ ├── unit/
│ │ │ ├── test_services.py
│ │ │ └── test_cost_calculator.py
│ │ ├── integration/
│ │ │ ├── test_api_scenarios.py
│ │ │ ├── test_api_ingest.py
│ │ │ └── test_api_metrics.py
│ │ └── e2e/
│ │ └── test_full_flow.py
│ ├── Dockerfile
│ ├── pyproject.toml
│ └── requirements.txt
│
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ ├── ui/ # shadcn/ui components
│ │ │ ├── layout/
│ │ │ │ ├── Header.tsx
│ │ │ │ ├── Sidebar.tsx
│ │ │ │ └── Layout.tsx
│ │ │ ├── scenarios/
│ │ │ │ ├── ScenarioList.tsx
│ │ │ │ ├── ScenarioCard.tsx
│ │ │ │ ├── ScenarioForm.tsx
│ │ │ │ └── ScenarioDetail.tsx
│ │ │ ├── metrics/
│ │ │ │ ├── MetricCard.tsx
│ │ │ │ ├── CostChart.tsx
│ │ │ │ └── MetricsDashboard.tsx
│ │ │ └── reports/
│ │ │ ├── ReportGenerator.tsx
│ │ │ └── ReportDownload.tsx
│ │ ├── pages/
│ │ │ ├── Dashboard.tsx
│ │ │ ├── ScenariosPage.tsx
│ │ │ ├── ScenarioCreate.tsx
│ │ │ ├── ScenarioDetail.tsx
│ │ │ ├── Compare.tsx
│ │ │ ├── Reports.tsx
│ │ │ └── Settings.tsx
│ │ ├── hooks/
│ │ │ ├── useScenarios.ts
│ │ │ ├── useMetrics.ts
│ │ │ └── useReports.ts
│ │ ├── services/
│ │ │ ├── api.ts # Axios config
│ │ │ ├── scenarioApi.ts
│ │ │ └── metricApi.ts
│ │ ├── types/
│ │ │ ├── scenario.ts
│ │ │ ├── metric.ts
│ │ │ └── api.ts
│ │ ├── context/
│ │ │ └── ThemeContext.tsx
│ │ ├── App.tsx
│ │ └── main.tsx
│ ├── public/
│ ├── index.html
│ ├── Dockerfile
│ ├── package.json
│ ├── tsconfig.json
│ ├── tailwind.config.js
│ └── vite.config.ts
│
├── docker-compose.yml
├── nginx.conf
├── .env.example
├── .env
├── .gitignore
└── README.md
9. Architectural Decisions
DEC-001: Async-First Architecture
Decision: Use Python async/await throughout the stack (FastAPI, SQLAlchemy, asyncpg)
Rationale:
- High throughput required (>1000 RPS)
- I/O-bound operations (DB, tokenizer)
- Better resource utilization than a sync stack
Alternatives considered:
- Sync + ThreadPool: Simpler but less efficient
- Celery + Redis: Too complex for this use case
Consequences:
- Learning curve for async
- More complex debugging
- Better scalability
DEC-002: Repository Pattern
Decision: Implement the Repository pattern for data access
Rationale:
- Separation between business logic and data access
- Easy testing with mock repositories
- Option to change the database in the future
Structure:
from typing import Generic, TypeVar
from uuid import UUID

T = TypeVar("T")

class BaseRepository(Generic[T]):
    async def get(self, id: UUID) -> T | None: ...
    async def list(self, **filters) -> list[T]: ...
    async def create(self, obj: T) -> T: ...
    async def update(self, id: UUID, data: dict) -> T: ...
    async def delete(self, id: UUID) -> bool: ...
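The testing benefit of the pattern can be illustrated with an in-memory repository that offers the same five methods. This is a sketch for test doubles only: the `Scenario` dataclass is a simplified stand-in for the SQLAlchemy model, and `InMemoryScenarioRepo` is a hypothetical name.

```python
import asyncio
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Scenario:
    name: str
    region: str = "us-east-1"
    id: UUID = field(default_factory=uuid4)


class InMemoryScenarioRepo:
    """Dict-backed repository with the BaseRepository method set."""

    def __init__(self):
        self._items: dict[UUID, Scenario] = {}

    async def get(self, id):
        return self._items.get(id)

    async def list(self, **filters):
        # Keep items whose attributes equal every given filter value
        return [s for s in self._items.values()
                if all(getattr(s, k) == v for k, v in filters.items())]

    async def create(self, obj):
        self._items[obj.id] = obj
        return obj

    async def update(self, id, data):
        obj = self._items[id]
        for key, value in data.items():
            setattr(obj, key, value)
        return obj

    async def delete(self, id):
        return self._items.pop(id, None) is not None


async def demo():
    repo = InMemoryScenarioRepo()
    created = await repo.create(Scenario(name="baseline"))
    await repo.update(created.id, {"region": "eu-west-1"})
    found = await repo.list(region="eu-west-1")
    deleted = await repo.delete(created.id)
    return found, deleted


found, deleted = asyncio.run(demo())
```

A service written against these methods can be exercised in unit tests without PostgreSQL, and switched to the SQLAlchemy-backed repository in integration tests.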
DEC-003: Single Shared Table Instead of a Database per Scenario
Decision: Use a single scenario_logs table with a scenario_id FK instead of separate databases
Rationale:
- Simpler to manage
- Cross-scenario queries are possible (comparisons)
- Simpler backup/restore
Alternatives considered:
- Schema per scenario: Too much overhead
- Separate databases: Too complex for an MVP
DEC-004: Message Hashing for Deduplication
Decision: Use a SHA-256 hash of the message for deduplication
Rationale:
- Privacy: Full messages are not stored
- Performance: Hash lookup is O(1) in memory and O(log n) through the B-tree index on message_hash
- Storage: Saves space
Implementation:
import hashlib
message_hash = hashlib.sha256(message.encode()).hexdigest()
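Deduplication on top of this hash amounts to a membership check. The sketch below keeps the seen hashes in an in-memory set purely for illustration; in the architecture described here the lookup would presumably go through `LogRepo.count_by_hash()` instead. `Deduplicator` is a hypothetical name.

```python
import hashlib


def message_hash(message: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded message (64 characters)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class Deduplicator:
    """Tracks message hashes seen so far (in-memory stand-in for the DB)."""

    def __init__(self):
        self._seen: set[str] = set()

    def is_duplicate(self, message: str) -> bool:
        digest = message_hash(message)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


dedup = Deduplicator()
first = dedup.is_duplicate("disk full on /dev/sda1")
second = dedup.is_duplicate("disk full on /dev/sda1")
```

Whether duplicates should be scoped per scenario or globally is not stated in this document; the sketch treats all messages as one pool.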
DEC-005: Time-Series Metrics
Decision: Store metrics as a time series in scenario_metrics
Rationale:
- Enables trend analysis
- Flexible aggregations
- Audit trail
Trade-offs:
- More storage than aggregated fields
- More complex queries, though indexed
10. Performance Considerations
10.1 Database Optimization
| Optimization | Implementation | Benefit |
|---|---|---|
| Indexes | B-tree on foreign keys, timestamps | Fast lookups |
| GIN | tags (JSONB) | Fast array search |
| Partitioning | scenario_logs by date | Query pruning |
| Connection Pool | asyncpg pool (20-50) | Concurrency |
10.2 Caching Strategy (Future)
Layer 1: In-memory (FastAPI state)
├─ Active scenario metadata
└─ AWS pricing (rarely changes)
Layer 2: Redis (future)
├─ Session storage
├─ Rate limiting counters
└─ Report generation status
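Layer 1 could be as simple as a time-to-live cache in front of the pricing lookup. The following is a sketch under assumptions: `TTLCache` is a hypothetical class, the 300-second TTL is arbitrary, and the clock is injectable so that expiry can be tested without sleeping.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: serve from memory
        value = loader()              # missing or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value


pricing_cache = TTLCache(ttl_seconds=300)
# The lambda stands in for a call to PricingRepo; the value is illustrative
price = pricing_cache.get_or_load(("sqs", "us-east-1"), lambda: "0.40")
```

Because pricing rarely changes, a generous TTL keeps most cost calculations off the database; this cache is per process, so several workers would each hold their own copy until Redis is introduced.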
10.3 Query Optimization
- Use selectinload for relationships
- Batch inserts for logs (copy_expert)
- Materialized views for reports
- Async tasks for heavy operations
11. Error Handling Strategy
11.1 Exception Hierarchy
class AppException(Exception):
    """Base application exception"""
    status_code: int = 500
    code: str = "internal_error"

class NotFoundException(AppException):
    status_code = 404
    code = "not_found"

class ValidationException(AppException):
    status_code = 400
    code = "validation_error"

class ConflictException(AppException):
    status_code = 409
    code = "conflict"

class RateLimitException(AppException):
    status_code = 429
    code = "rate_limited"
11.2 Global Exception Handler
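from datetime import datetime, timezone

from fastapi import Request
from fastapi.responses import JSONResponse

# `app` is the FastAPI instance created in main.py
@app.exception_handler(AppException)
async def app_exception_handler(request: Request, exc: AppException):
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": exc.code,
            "message": str(exc),
            # Timezone-aware UTC timestamp in ISO 8601 format
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )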
12. Deployment Architecture
12.1 Docker Compose (Development)
version: '3.8'
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: mockupaws
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mockupaws"]
  backend:
    build: ./backend
    environment:
      DATABASE_URL: postgresql+asyncpg://app:${DB_PASSWORD}@postgres:5432/mockupaws
    ports:
      - "8000:8000"
    depends_on:
      postgres:
        condition: service_healthy
  frontend:
    build: ./frontend
    ports:
      - "3000:80"
    depends_on:
      - backend
volumes:
  postgres_data:
12.2 Production Considerations
- Use managed PostgreSQL (AWS RDS, Azure PostgreSQL)
- Nginx as reverse proxy with SSL
- Environment-specific configuration
- Log aggregation (ELK or similar)
- Monitoring (Prometheus + Grafana)
- Health checks and readiness probes
Document created by @spec-architect
Version: 1.0
Date: 2026-04-07