release: v1.0.0 - Production Ready

Complete production-ready release with all v1.0.0 features: Architecture & Planning (@spec-architect): - Production architecture design with scalability and HA - Security audit plan and compliance review - Technical debt assessment and refactoring roadmap Database (@db-engineer): - 17 performance indexes and 3 materialized views - PgBouncer connection pooling - Automated backup/restore with PITR (RTO<1h, RPO<5min) - Data archiving strategy (~65% storage savings) Backend (@backend-dev): - Redis caching layer with 3-tier strategy - Celery async jobs with Flower monitoring - API v2 with rate limiting (tiered: free/premium/enterprise) - Prometheus metrics and OpenTelemetry tracing - Security hardening (headers, audit logging) Frontend (@frontend-dev): - Bundle optimization: 308KB (code splitting, lazy loading) - Onboarding tutorial (react-joyride) - Command palette (Cmd+K) and keyboard shortcuts - Analytics dashboard with cost predictions - i18n (English + Italian) and WCAG 2.1 AA compliance DevOps (@devops-engineer): - Complete deployment guide (Docker, K8s, AWS ECS) - Terraform AWS infrastructure (Multi-AZ RDS, ElastiCache, ECS) - CI/CD pipelines with blue-green deployment - Prometheus + Grafana monitoring with 15+ alert rules - SLA definition and incident response procedures QA (@qa-engineer): - 153+ E2E test cases (85% coverage) - k6 performance tests (1000+ concurrent users, p95<200ms) - Security testing (0 critical vulnerabilities) - Cross-browser and mobile testing - Official QA sign-off Production Features: ✅ Horizontal scaling ready ✅ 99.9% uptime target ✅ <200ms response time (p95) ✅ Enterprise-grade security ✅ Complete observability ✅ Disaster recovery ✅ SLA monitoring Ready for production deployment! 🚀
2026-04-07 20:14:51 +02:00
parent eba5a1d67a
commit 38fd6cb562
122 changed files with 32902 additions and 240 deletions
--- a/docs/DATA-ARCHIVING.md
+++ b/docs/DATA-ARCHIVING.md
@@ -0,0 +1,568 @@
+# Data Archiving Strategy
+
+## mockupAWS v1.0.0 - Data Lifecycle Management
+
+---
+
+## Table of Contents
+
+1. [Overview](#overview)
+2. [Archive Policies](#archive-policies)
+3. [Implementation](#implementation)
+4. [Archive Job](#archive-job)
+5. [Querying Archived Data](#querying-archived-data)
+6. [Monitoring](#monitoring)
+7. [Storage Estimation](#storage-estimation)
+
+---
+
+## Overview
+
+As mockupAWS accumulates data over time, we implement an automated archiving strategy to:
+
+- **Reduce storage costs** by moving old data to archive tables
+- **Improve query performance** on active data
+- **Maintain data accessibility** through unified views
+- **Comply with data retention policies**
+
+### Archive Strategy Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                     Data Lifecycle                               │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  Active Data (Hot)    │    Archive Data (Cold)                  │
+│  ─────────────────    │    ──────────────────                   │
+│  • Fast queries       │    • Partitioned by month               │
+│  • Full indexing      │    • Compressed                         │
+│  • Real-time writes   │    • S3 for large files                 │
+│                                                                 │
+│  scenario_logs        │    → scenario_logs_archive              │
+│  (> 1 year old)       │    (> 1 year, partitioned)              │
+│                                                                 │
+│  scenario_metrics     │    → scenario_metrics_archive           │
+│  (> 2 years old)      │    (> 2 years, aggregated)              │
+│                                                                 │
+│  reports              │    → reports_archive                    │
+│  (> 6 months old)     │    (> 6 months, S3 storage)             │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Archive Policies
+
+### Policy Configuration
+
+| Table | Archive After | Aggregation | Compression | S3 Storage |
+|-------|--------------|-------------|-------------|------------|
+| `scenario_logs` | 365 days | No | No | No |
+| `scenario_metrics` | 730 days | Daily | No | No |
+| `reports` | 180 days | No | Yes | Yes |
+
+### Detailed Policies
+
+#### 1. Scenario Logs Archive (> 1 year)
+
+**Criteria:**
+- Records older than 365 days
+- Move to `scenario_logs_archive` table
+- Partitioned by month for efficient querying
+
+**Retention:**
+- Archive table: 7 years
+- After 7 years: Delete or move to long-term storage
+
+#### 2. Scenario Metrics Archive (> 2 years)
+
+**Criteria:**
+- Records older than 730 days
+- Aggregate to daily values before archiving
+- Store aggregated data in `scenario_metrics_archive`
+
+**Aggregation:**
+- Group by: scenario_id, metric_type, metric_name, day
+- Aggregate: AVG(value), COUNT(samples)
+
+**Retention:**
+- Archive table: 5 years
+- Aggregated data only (original samples deleted)
+
+#### 3. Reports Archive (> 6 months)
+
+**Criteria:**
+- Reports older than 180 days
+- Compress PDF/CSV files
+- Upload to S3
+- Keep metadata in `reports_archive` table
+
+**Retention:**
+- S3 storage: 3 years with lifecycle to Glacier
+- Metadata: 5 years
+
+---
+
+## Implementation
+
+### Database Schema
+
+#### Archive Tables
+
+```sql
+-- Scenario logs archive (partitioned by month)
+CREATE TABLE scenario_logs_archive (
+    id UUID PRIMARY KEY,
+    scenario_id UUID NOT NULL,
+    received_at TIMESTAMPTZ NOT NULL,
+    message_hash VARCHAR(64) NOT NULL,
+    message_preview VARCHAR(500),
+    source VARCHAR(100) NOT NULL,
+    size_bytes INTEGER NOT NULL,
+    has_pii BOOLEAN NOT NULL,
+    token_count INTEGER NOT NULL,
+    sqs_blocks INTEGER NOT NULL,
+    archived_at TIMESTAMPTZ DEFAULT NOW(),
+    archive_batch_id UUID
+) PARTITION BY RANGE (DATE_TRUNC('month', received_at));
+
+-- Scenario metrics archive (with aggregation support)
+CREATE TABLE scenario_metrics_archive (
+    id UUID PRIMARY KEY,
+    scenario_id UUID NOT NULL,
+    timestamp TIMESTAMPTZ NOT NULL,
+    metric_type VARCHAR(50) NOT NULL,
+    metric_name VARCHAR(100) NOT NULL,
+    value DECIMAL(15,6) NOT NULL,
+    unit VARCHAR(20) NOT NULL,
+    extra_data JSONB DEFAULT '{}',
+    archived_at TIMESTAMPTZ DEFAULT NOW(),
+    archive_batch_id UUID,
+    is_aggregated BOOLEAN DEFAULT FALSE,
+    aggregation_period VARCHAR(20),
+    sample_count INTEGER
+) PARTITION BY RANGE (DATE_TRUNC('month', timestamp));
+
+-- Reports archive (S3 references)
+CREATE TABLE reports_archive (
+    id UUID PRIMARY KEY,
+    scenario_id UUID NOT NULL,
+    format VARCHAR(10) NOT NULL,
+    file_path VARCHAR(500) NOT NULL,
+    file_size_bytes INTEGER,
+    generated_by VARCHAR(100),
+    extra_data JSONB DEFAULT '{}',
+    created_at TIMESTAMPTZ NOT NULL,
+    archived_at TIMESTAMPTZ DEFAULT NOW(),
+    s3_location VARCHAR(500),
+    deleted_locally BOOLEAN DEFAULT FALSE,
+    archive_batch_id UUID
+);
+```
+
+#### Unified Views (Query Transparency)
+
+```sql
+-- View combining live and archived logs
+CREATE VIEW v_scenario_logs_all AS
+SELECT 
+    id, scenario_id, received_at, message_hash, message_preview,
+    source, size_bytes, has_pii, token_count, sqs_blocks,
+    NULL::timestamptz as archived_at,
+    false as is_archived
+FROM scenario_logs
+UNION ALL
+SELECT 
+    id, scenario_id, received_at, message_hash, message_preview,
+    source, size_bytes, has_pii, token_count, sqs_blocks,
+    archived_at,
+    true as is_archived
+FROM scenario_logs_archive;
+
+-- View combining live and archived metrics
+CREATE VIEW v_scenario_metrics_all AS
+SELECT 
+    id, scenario_id, timestamp, metric_type, metric_name,
+    value, unit, extra_data,
+    NULL::timestamptz as archived_at,
+    false as is_aggregated,
+    false as is_archived
+FROM scenario_metrics
+UNION ALL
+SELECT 
+    id, scenario_id, timestamp, metric_type, metric_name,
+    value, unit, extra_data,
+    archived_at,
+    is_aggregated,
+    true as is_archived
+FROM scenario_metrics_archive;
+```
+
+### Archive Job Tracking
+
+```sql
+-- Archive jobs table
+CREATE TABLE archive_jobs (
+    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+    job_type VARCHAR(50) NOT NULL,
+    status VARCHAR(50) NOT NULL DEFAULT 'pending',
+    started_at TIMESTAMPTZ,
+    completed_at TIMESTAMPTZ,
+    records_processed INTEGER DEFAULT 0,
+    records_archived INTEGER DEFAULT 0,
+    records_deleted INTEGER DEFAULT 0,
+    bytes_archived BIGINT DEFAULT 0,
+    error_message TEXT,
+    created_at TIMESTAMPTZ DEFAULT NOW()
+);
+
+-- Archive statistics view
+CREATE VIEW v_archive_statistics AS
+SELECT 
+    'logs' as archive_type,
+    COUNT(*) as total_records,
+    MIN(received_at) as oldest_record,
+    MAX(received_at) as newest_record,
+    SUM(size_bytes) as total_bytes
+FROM scenario_logs_archive
+UNION ALL
+SELECT 
+    'metrics' as archive_type,
+    COUNT(*) as total_records,
+    MIN(timestamp) as oldest_record,
+    MAX(timestamp) as newest_record,
+    0 as total_bytes
+FROM scenario_metrics_archive
+UNION ALL
+SELECT 
+    'reports' as archive_type,
+    COUNT(*) as total_records,
+    MIN(created_at) as oldest_record,
+    MAX(created_at) as newest_record,
+    SUM(file_size_bytes) as total_bytes
+FROM reports_archive;
+```
+
+---
+
+## Archive Job
+
+### Running the Archive Job
+
+```bash
+# Preview what would be archived (dry run)
+python scripts/archive_job.py --dry-run --all
+
+# Archive all eligible data
+python scripts/archive_job.py --all
+
+# Archive specific types only
+python scripts/archive_job.py --logs
+python scripts/archive_job.py --metrics
+python scripts/archive_job.py --reports
+
+# Combine options
+python scripts/archive_job.py --logs --metrics --dry-run
+```
+
+### Cron Configuration
+
+```bash
+# Run archive job nightly at 3:00 AM
+0 3 * * * /opt/mockupaws/.venv/bin/python /opt/mockupaws/scripts/archive_job.py --all >> /var/log/mockupaws/archive.log 2>&1
+```
+
+### Environment Variables
+
+```bash
+# Required
+export DATABASE_URL="postgresql+asyncpg://user:pass@host:5432/mockupaws"
+
+# For reports S3 archiving
+export REPORTS_ARCHIVE_BUCKET="mockupaws-reports-archive"
+export AWS_ACCESS_KEY_ID="your-key"
+export AWS_SECRET_ACCESS_KEY="your-secret"
+export AWS_DEFAULT_REGION="us-east-1"
+```
+
+---
+
+## Querying Archived Data
+
+### Transparent Access
+
+Use the unified views for automatic access to both live and archived data:
+
+```sql
+-- Query all logs (live + archived)
+SELECT * FROM v_scenario_logs_all 
+WHERE scenario_id = 'uuid-here'
+ORDER BY received_at DESC
+LIMIT 1000;
+
+-- Query all metrics (live + archived)
+SELECT * FROM v_scenario_metrics_all 
+WHERE scenario_id = 'uuid-here'
+  AND timestamp > NOW() - INTERVAL '2 years'
+ORDER BY timestamp;
+```
+
+### Optimized Queries
+
+```sql
+-- Query only live data (faster)
+SELECT * FROM scenario_logs 
+WHERE scenario_id = 'uuid-here'
+ORDER BY received_at DESC;
+
+-- Query only archived data
+SELECT * FROM scenario_logs_archive 
+WHERE scenario_id = 'uuid-here'
+  AND received_at < NOW() - INTERVAL '1 year'
+ORDER BY received_at DESC;
+
+-- Query specific month partition (most efficient)
+SELECT * FROM scenario_logs_archive 
+WHERE received_at >= '2025-01-01' 
+  AND received_at < '2025-02-01'
+  AND scenario_id = 'uuid-here';
+```
+
+### Application Code Example
+
+```python
+from sqlalchemy import select
+from src.models.scenario_log import ScenarioLog
+
+async def get_logs(db: AsyncSession, scenario_id: UUID, include_archived: bool = False):
+    """Get scenario logs with optional archive inclusion."""
+    
+    if include_archived:
+        # Use unified view for complete history
+        result = await db.execute(
+            text("""
+                SELECT * FROM v_scenario_logs_all 
+                WHERE scenario_id = :sid
+                ORDER BY received_at DESC
+            """),
+            {"sid": scenario_id}
+        )
+    else:
+        # Query only live data (faster)
+        result = await db.execute(
+            select(ScenarioLog)
+            .where(ScenarioLog.scenario_id == scenario_id)
+            .order_by(ScenarioLog.received_at.desc())
+        )
+    
+    return result.scalars().all()
+```
+
+---
+
+## Monitoring
+
+### Archive Job Status
+
+```sql
+-- Check recent archive jobs
+SELECT 
+    job_type,
+    status,
+    started_at,
+    completed_at,
+    records_archived,
+    records_deleted,
+    pg_size_pretty(bytes_archived) as space_saved
+FROM archive_jobs
+ORDER BY started_at DESC
+LIMIT 10;
+
+-- Check for failed jobs
+SELECT * FROM archive_jobs 
+WHERE status = 'failed'
+ORDER BY started_at DESC;
+```
+
+### Archive Statistics
+
+```sql
+-- View archive statistics
+SELECT * FROM v_archive_statistics;
+
+-- Archive growth over time
+SELECT 
+    DATE_TRUNC('month', archived_at) as archive_month,
+    archive_type,
+    COUNT(*) as records_archived,
+    pg_size_pretty(SUM(total_bytes)) as bytes_archived
+FROM v_archive_statistics
+GROUP BY DATE_TRUNC('month', archived_at), archive_type
+ORDER BY archive_month DESC;
+```
+
+### Alerts
+
+```yaml
+# archive-alerts.yml
+groups:
+  - name: archive_alerts
+    rules:
+      - alert: ArchiveJobFailed
+        expr: increase(archive_job_failures_total[1h]) > 0
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Data archive job failed"
+          
+      - alert: ArchiveJobNotRunning
+        expr: time() - max(archive_job_last_success_timestamp) > 90000
+        for: 1h
+        labels:
+          severity: warning
+        annotations:
+          summary: "Archive job has not run in 25 hours"
+          
+      - alert: ArchiveStorageGrowing
+        expr: rate(archive_bytes_total[1d]) > 1073741824  # 1GB/day
+        for: 1h
+        labels:
+          severity: info
+        annotations:
+          summary: "Archive storage growing rapidly"
+```
+
+---
+
+## Storage Estimation
+
+### Projected Storage Savings
+
+Assuming typical usage patterns:
+
+| Data Type | Daily Volume | Annual Volume | After Archive | Savings |
+|-----------|--------------|---------------|---------------|---------|
+| Logs | 1M records/day | 365M records | 365M in archive | 0 in main |
+| Metrics | 500K records/day | 182M records | 60M aggregated | 66% reduction |
+| Reports | 100/day (50MB each) | 1.8TB | 1.8TB in S3 | 100% local reduction |
+
+### Cost Analysis (Monthly)
+
+| Storage Type | Before Archive | After Archive | Monthly Savings |
+|--------------|----------------|---------------|-----------------|
+| PostgreSQL (hot) | $200 | $50 | $150 |
+| PostgreSQL (archive) | $0 | $30 | -$30 |
+| S3 Standard | $0 | $20 | -$20 |
+| S3 Glacier | $0 | $5 | -$5 |
+| **Total** | **$200** | **$105** | **$95** |
+
+*Estimates based on AWS us-east-1 pricing, actual costs may vary.*
+
+---
+
+## Maintenance
+
+### Monthly Tasks
+
+1. **Review archive statistics**
+   ```sql
+   SELECT * FROM v_archive_statistics;
+   ```
+
+2. **Check for old archive partitions**
+   ```sql
+   SELECT 
+       schemaname, 
+       tablename,
+       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
+   FROM pg_tables
+   WHERE tablename LIKE 'scenario_logs_archive_%'
+   ORDER BY tablename;
+   ```
+
+3. **Clean up old S3 files** (after retention period)
+   ```bash
+   aws s3 rm s3://mockupaws-reports-archive/archived-reports/ \
+     --recursive \
+     --exclude '*' \
+     --include '*2023*'
+   ```
+
+### Quarterly Tasks
+
+1. **Archive job performance review**
+   - Check execution times
+   - Optimize batch sizes if needed
+
+2. **Storage cost review**
+   - Verify S3 lifecycle policies
+   - Consider Glacier transition for old archives
+
+3. **Data retention compliance**
+   - Verify deletion of data past retention period
+   - Update policies as needed
+
+---
+
+## Troubleshooting
+
+### Archive Job Fails
+
+```bash
+# Check logs
+tail -f storage/logs/archive_*.log
+
+# Run with verbose output
+python scripts/archive_job.py --all --verbose
+
+# Check database connectivity
+psql $DATABASE_URL -c "SELECT COUNT(*) FROM archive_jobs;"
+```
+
+### S3 Upload Fails
+
+```bash
+# Verify AWS credentials
+aws sts get-caller-identity
+
+# Test S3 access
+aws s3 ls s3://mockupaws-reports-archive/
+
+# Check bucket policy
+aws s3api get-bucket-policy --bucket mockupaws-reports-archive
+```
+
+### Query Performance Issues
+
+```sql
+-- Check if indexes exist on archive tables
+SELECT indexname, indexdef 
+FROM pg_indexes 
+WHERE tablename LIKE '%_archive%';
+
+-- Analyze archive tables
+ANALYZE scenario_logs_archive;
+ANALYZE scenario_metrics_archive;
+
+-- Check partition pruning
+EXPLAIN ANALYZE 
+SELECT * FROM scenario_logs_archive 
+WHERE received_at >= '2025-01-01' 
+  AND received_at < '2025-02-01';
+```
+
+---
+
+## References
+
+- [PostgreSQL Table Partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html)
+- [AWS S3 Lifecycle Policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
+- [Database Migration](alembic/versions/b2c3d4e5f6a7_create_archive_tables_v1_0_0.py)
+- [Archive Job Script](../scripts/archive_job.py)
+
+---
+
+*Document Version: 1.0.0*
+*Last Updated: 2026-04-07*