lucasacchi/mockupAWS


Luca Sacchi Ricciardi 38fd6cb562

release: v1.0.0 - Production Ready

Complete production-ready release with all v1.0.0 features:

Architecture & Planning (@spec-architect):
- Production architecture design with scalability and HA
- Security audit plan and compliance review
- Technical debt assessment and refactoring roadmap

Database (@db-engineer):
- 17 performance indexes and 3 materialized views
- PgBouncer connection pooling
- Automated backup/restore with PITR (RTO<1h, RPO<5min)
- Data archiving strategy (~65% storage savings)

Backend (@backend-dev):
- Redis caching layer with 3-tier strategy
- Celery async jobs with Flower monitoring
- API v2 with rate limiting (tiered: free/premium/enterprise)
- Prometheus metrics and OpenTelemetry tracing
- Security hardening (headers, audit logging)

Frontend (@frontend-dev):
- Bundle optimization: 308KB (code splitting, lazy loading)
- Onboarding tutorial (react-joyride)
- Command palette (Cmd+K) and keyboard shortcuts
- Analytics dashboard with cost predictions
- i18n (English + Italian) and WCAG 2.1 AA compliance

DevOps (@devops-engineer):
- Complete deployment guide (Docker, K8s, AWS ECS)
- Terraform AWS infrastructure (Multi-AZ RDS, ElastiCache, ECS)
- CI/CD pipelines with blue-green deployment
- Prometheus + Grafana monitoring with 15+ alert rules
- SLA definition and incident response procedures

QA (@qa-engineer):
- 153+ E2E test cases (85% coverage)
- k6 performance tests (1000+ concurrent users, p95<200ms)
- Security testing (0 critical vulnerabilities)
- Cross-browser and mobile testing
- Official QA sign-off

Production Features:
✅ Horizontal scaling ready
✅ 99.9% uptime target
✅ <200ms response time (p95)
✅ Enterprise-grade security
✅ Complete observability
✅ Disaster recovery
✅ SLA monitoring

Ready for production deployment! 🚀

2026-04-07 20:14:51 +02:00


Backup & Restore Documentation

mockupAWS v1.0.0 - Database Disaster Recovery Guide

Overview
Recovery Objectives
Backup Strategy
Restore Procedures
Point-in-Time Recovery (PITR)
Disaster Recovery Procedures
Monitoring & Alerting
Troubleshooting

Overview

This document describes the backup, restore, and disaster recovery procedures for the mockupAWS PostgreSQL database.

Components

Automated Backups: Daily full backups via pg_dump
WAL Archiving: Continuous archiving for Point-in-Time Recovery
Encryption: AES-256 encryption for all backups
Storage: S3 with cross-region replication
Retention: 30 days for daily backups, 7 days for WAL archives

Recovery Objectives

| Metric | Target | Description |
|--------|--------|-------------|
| RTO | < 1 hour | Time to restore service after failure |
| RPO | < 5 minutes | Maximum data loss acceptable |
| Backup Window | 02:00-04:00 UTC | Daily backup execution time |
| Retention | 30 days | Backup retention period |
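
The RPO target can be checked mechanically from the age of the newest archived WAL segment. The following is a minimal sketch, not part of the project's scripts: the function name `rpo_within_target` is hypothetical, and it assumes the caller supplies two Unix timestamps (for example, one derived from `last_archived_time` in `pg_stat_archiver`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the newest archived WAL segment is
# younger than the RPO target (5 minutes = 300 seconds).
RPO_SECONDS=300

rpo_within_target() {
  local last_archived_epoch="$1" now_epoch="$2"
  local age=$(( now_epoch - last_archived_epoch ))
  [ "$age" -lt "$RPO_SECONDS" ]
}

# Example: a segment archived 120 s ago is inside the target.
if rpo_within_target 1000 1120; then
  echo "RPO ok"
else
  echo "RPO exceeded"
fi
```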

Backup Strategy

Backup Types

1. Full Backups (Daily)

Schedule: Daily at 02:00 UTC
Tool: pg_dump with custom format
Compression: gzip level 9
Encryption: AES-256-CBC
Retention: 30 days
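
The compression and encryption steps listed above could be chained as a single stream. This is a sketch under assumptions, since `backup.sh` itself is not shown here: the function names are hypothetical, the `openssl` flags mirror the decryption command in the Troubleshooting section, and the passphrase is read from the exported `BACKUP_ENCRYPTION_KEY` variable so it does not appear in the process list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from backup.sh.

# Compress stdin with gzip level 9, then encrypt with AES-256-CBC.
# BACKUP_ENCRYPTION_KEY must be exported before calling.
encrypt_backup_stream() {
  gzip -9 | openssl enc -aes-256-cbc -salt -pbkdf2 -pass env:BACKUP_ENCRYPTION_KEY
}

# Inverse operation: decrypt, then decompress.
decrypt_backup_stream() {
  openssl enc -aes-256-cbc -d -pbkdf2 -pass env:BACKUP_ENCRYPTION_KEY | gunzip
}

# Typical use (assumes DATABASE_URL is set and pg_dump is installed):
#   pg_dump "$DATABASE_URL" | encrypt_backup_stream > backup.sql.gz.enc
#   decrypt_backup_stream < backup.sql.gz.enc > backup.sql
```

Streaming avoids writing an unencrypted dump to disk, which also halves the temporary space the backup needs.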

2. WAL Archiving (Continuous)

Method: PostgreSQL archive_command
Frequency: Every WAL segment (16MB)
Storage: S3 Standard
Retention: 7 days

3. Configuration Backups

Files: postgresql.conf, pg_hba.conf
Schedule: Weekly
Storage: Version control + S3

Storage Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Primary Region │────▶│  S3 Standard    │────▶│  S3 Glacier     │
│  (us-east-1)    │     │  (30 days)      │     │  (long-term)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│ Secondary Region│
│ (eu-west-1)     │  ← Cross-region replication for DR
└─────────────────┘

Required Environment Variables

# Required
export DATABASE_URL="postgresql://user:pass@host:5432/dbname"
export BACKUP_BUCKET="mockupaws-backups-prod"
export BACKUP_ENCRYPTION_KEY="your-256-bit-key-here"

# Optional
export BACKUP_REGION="us-east-1"
export BACKUP_SECONDARY_REGION="eu-west-1"
export BACKUP_SECONDARY_BUCKET="mockupaws-backups-dr"
export BACKUP_RETENTION_DAYS="30"
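
A script can fail fast when a required variable is missing instead of failing halfway through a backup. The sketch below is illustrative only: `check_backup_env` is a hypothetical name, and it is not claimed to match what `backup.sh` or `restore.sh` actually do.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: report every required variable that is
# unset or empty, and return non-zero if any is missing.
check_backup_env() {
  local missing=0 name
  for name in DATABASE_URL BACKUP_BUCKET BACKUP_ENCRYPTION_KEY; do
    # ${!name:-} expands the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_backup_env || exit 1
```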

Restore Procedures

Quick Reference

| Scenario | Command | ETA |
|----------|---------|-----|
| Latest full backup | `./scripts/restore.sh latest` | 15-30 min |
| Specific backup | `./scripts/restore.sh s3://bucket/path` | 15-30 min |
| Point-in-Time | `./scripts/restore.sh latest --target-time "..."` | 30-60 min |
| Verify only | `./scripts/restore.sh <file> --verify-only` | 5-10 min |

Step-by-Step Restore

1. Pre-Restore Checklist

Identify target database (should be empty or disposable)
Ensure sufficient disk space (2x database size)
Verify backup integrity: ./scripts/restore.sh <backup> --verify-only
Notify team about maintenance window
Document current database state
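
The disk-space item in the checklist can be automated. The helpers below are a sketch with hypothetical names; they assume the restore target's data lives under a directory you pass in, and they compare bytes available there against twice the database size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the available bytes are at least
# twice the database size, as the checklist requires.
has_restore_headroom() {
  local db_bytes="$1" avail_bytes="$2"
  [ "$avail_bytes" -ge $(( db_bytes * 2 )) ]
}

# Bytes available on the filesystem that holds the given directory.
# POSIX "df -Pk" reports 1024-byte blocks; column 4 is "Available".
avail_bytes_for() {
  df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Usage (assumes DATABASE_URL points at the database being sized):
#   db_bytes="$(psql "$DATABASE_URL" -Atc 'SELECT pg_database_size(current_database());')"
#   has_restore_headroom "$db_bytes" "$(avail_bytes_for /var/lib/postgresql)" \
#     || echo "not enough disk space for restore" >&2
```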

2. Full Restore from Latest Backup

# Set environment variables
export DATABASE_URL="postgresql://postgres:password@localhost:5432/mockupaws"
export BACKUP_ENCRYPTION_KEY="your-encryption-key"
export BACKUP_BUCKET="mockupaws-backups-prod"

# Perform restore
./scripts/restore.sh latest

3. Restore from Specific Backup

# From S3
./scripts/restore.sh s3://mockupaws-backups-prod/backups/full/20260407/backup.enc

# From local file
./scripts/restore.sh /path/to/backup/mockupaws_full_20260407_120000.sql.gz.enc

4. Post-Restore Verification

# Check database connectivity
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM scenarios;"

# Verify key tables
psql "$DATABASE_URL" -c "\dt"

# Check recent data
psql "$DATABASE_URL" -c "SELECT MAX(created_at) FROM scenario_logs;"

Point-in-Time Recovery (PITR)

Prerequisites

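Base Backup: Physical base backup (for example, taken with pg_basebackup) from before the target time. A pg_dump archive is a logical backup and cannot be used to replay WAL.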
WAL Archives: All WAL segments from backup time to target time
Configuration: PostgreSQL configured for archiving
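
Because the base backup must predate the target time, picking it can be scripted. This is a sketch only: `latest_backup_before` is a hypothetical helper, and it assumes backup names embed a `YYYYMMDD_HHMMSS` stamp, as the local file name in the restore examples does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read backup names on stdin and print the newest
# one whose embedded YYYYMMDD_HHMMSS stamp is not later than the target.
# Returns non-zero when no backup qualifies.
latest_backup_before() {
  local target="$1" name stamp best="" best_stamp=""
  while IFS= read -r name; do
    # Extract the first YYYYMMDD_HHMMSS stamp in the name, if any.
    [[ "$name" =~ ([0-9]{8}_[0-9]{6}) ]] || continue
    stamp="${BASH_REMATCH[1]}"
    # Fixed-width stamps sort chronologically as plain strings.
    if [[ ! "$stamp" > "$target" ]] && [[ "$stamp" > "$best_stamp" ]]; then
      best="$name"
      best_stamp="$stamp"
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}

# Usage:
#   ls /path/to/backups | latest_backup_before 20260407_143000
```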

PostgreSQL Configuration

Add to postgresql.conf:

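# WAL Archiving
wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://mockupaws-wal-archive/wal/%f'
archive_timeout = 60    # force a segment switch at least every 60 s

# Recovery settings: set these only on the server being restored.
# They take effect only when recovery.signal exists in the data directory.
restore_command = 'aws s3 cp s3://mockupaws-wal-archive/wal/%f %p'
recovery_target_time = '2026-04-07 14:30:00 UTC'
recovery_target_action = promote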
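
PostgreSQL expects `archive_command` to return zero only when the segment was stored, and its documentation recommends refusing to overwrite an existing archive file. A wrapper along these lines could enforce that. It is a sketch, not the project's configuration: the script name `archive_wal.sh`, the function name, and the `WAL_BUCKET` variable are hypothetical, and the bucket default is taken from the `archive_command` shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, intended to be called as:
#   archive_command = '/opt/mockupaws/scripts/archive_wal.sh %p %f'
WAL_BUCKET="${WAL_BUCKET:-mockupaws-wal-archive}"

archive_wal() {
  local path="$1" file="$2"
  # head-object succeeds only if the key already exists in the bucket.
  if aws s3api head-object --bucket "$WAL_BUCKET" --key "wal/$file" >/dev/null 2>&1; then
    echo "refusing to overwrite existing segment: $file" >&2
    return 1
  fi
  # The exit status of the copy becomes the status of the function, so
  # PostgreSQL retries the segment if the upload fails.
  aws s3 cp "$path" "s3://$WAL_BUCKET/wal/$file"
}

# When run as a script: archive_wal "$1" "$2"
```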

PITR Procedure

# Restore to specific point in time
./scripts/restore.sh latest --target-time "2026-04-07 14:30:00"

Manual PITR (Advanced)

# 1. Stop PostgreSQL
sudo systemctl stop postgresql

# 2. Clear data directory
sudo rm -rf /var/lib/postgresql/data/*

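# 3. Restore a base backup taken BEFORE the target time.
#    Running pg_basebackup against the live primary copies its current
#    state, which cannot be recovered to an earlier point in time.
#    /path/to/base_backup.tar.gz is a placeholder for your stored backup.
sudo -u postgres tar -xzf /path/to/base_backup.tar.gz -C /var/lib/postgresql/data

# 4. Create recovery signal (as the postgres user, who owns the directory)
sudo -u postgres touch /var/lib/postgresql/data/recovery.signal

# 5. Configure recovery
sudo -u postgres tee -a /var/lib/postgresql/data/postgresql.conf >/dev/null <<EOF
restore_command = 'aws s3 cp s3://mockupaws-wal-archive/wal/%f %p'
recovery_target_time = '2026-04-07 14:30:00 UTC'
recovery_target_action = promote
EOF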

# 6. Start PostgreSQL
sudo systemctl start postgresql

# 7. Monitor recovery
psql -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();"
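
Recovery can take a while, so the monitoring step is easier to script as a polling loop. The sketch below uses a hypothetical function name and relies on `pg_is_in_recovery()`, which returns false once the server has been promoted. It assumes `psql` can connect with its default connection settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until the server leaves recovery mode or the
# attempt budget is exhausted.
#   $1 = number of checks (default 60), $2 = seconds between checks (default 5)
wait_for_promotion() {
  local attempts="${1:-60}" delay="${2:-5}" state i
  for (( i = 0; i < attempts; i++ )); do
    # -A (unaligned) and -t (tuples only) reduce the output to "t" or "f".
    state="$(psql -Atc 'SELECT pg_is_in_recovery();' 2>/dev/null || true)"
    if [ "$state" = "f" ]; then
      echo "recovery complete"
      return 0
    fi
    sleep "$delay"
  done
  echo "still in recovery after $attempts checks" >&2
  return 1
}

# Usage:
#   wait_for_promotion 120 10 || exit 1
```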

Disaster Recovery Procedures

DR Scenarios

Scenario 1: Database Corruption

# 1. Isolate corrupted database
psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'mockupaws' AND pid <> pg_backend_pid();"

# 2. Restore from latest backup
./scripts/restore.sh latest

# 3. Verify data integrity
./scripts/verify-data.sh

# 4. Resume application traffic

Scenario 2: Complete Region Failure

# 1. Activate DR region
export BACKUP_BUCKET="mockupaws-backups-dr"
export AWS_REGION="eu-west-1"

# 2. Restore to DR database
./scripts/restore.sh latest

# 3. Update DNS/application configuration
# Point to DR region database endpoint

# 4. Verify application functionality

Scenario 3: Accidental Data Deletion

# 1. Identify deletion timestamp (from logs)
DELETION_TIME="2026-04-07 15:23:00"

# 2. Restore to point just before deletion
./scripts/restore.sh latest --target-time "$DELETION_TIME"

# 3. Export missing data from the restored database
pg_dump --data-only --table=deleted_table "$DATABASE_URL" > missing_data.sql

# 4. Restore to current and import missing data

DR Testing Schedule

| Test Type | Frequency | Responsible |
|-----------|-----------|-------------|
| Backup verification | Daily | Automated |
| Restore test (dev) | Weekly | DevOps |
| Full DR drill | Monthly | SRE Team |
| Cross-region failover | Quarterly | Platform Team |

Monitoring & Alerting

Backup Monitoring

-- Check backup history
SELECT 
    backup_type,
    created_at,
    status,
    EXTRACT(EPOCH FROM (NOW() - created_at))/3600 as hours_since_backup
FROM backup_history 
ORDER BY created_at DESC 
LIMIT 10;

Prometheus Alerts

# backup-alerts.yml
groups:
  - name: backup_alerts
    rules:
      - alert: BackupNotRun
        expr: time() - max(backup_last_success_timestamp) > 90000
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Database backup has not run in 25 hours"
          
      - alert: BackupFailed
        expr: increase(backup_failures_total[1h]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Database backup failed"
          
      - alert: LowBackupStorage
        expr: s3_bucket_free_bytes / s3_bucket_total_bytes < 0.1
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Backup storage capacity < 10%"
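
The `BackupNotRun` rule needs something to publish `backup_last_success_timestamp`; its threshold of 90000 seconds is 25 hours. How this project exports the metric is not shown here, so the following is only one possible approach: it assumes node_exporter's textfile collector is enabled and reads `*.prom` files from the directory passed in. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish the time of the last successful backup in
# Prometheus text format.
#   $1 = textfile collector directory, $2 = Unix time (default: now)
write_backup_success_metric() {
  local dir="$1" now="${2:-$(date +%s)}"
  local tmp="$dir/backup.prom.$$"
  {
    echo "# HELP backup_last_success_timestamp Unix time of the last successful backup."
    echo "# TYPE backup_last_success_timestamp gauge"
    echo "backup_last_success_timestamp $now"
  } > "$tmp"
  # Rename into place so a scrape never reads a half-written file.
  mv "$tmp" "$dir/backup.prom"
}

# Usage, as the last step of a successful backup run:
#   write_backup_success_metric /var/lib/node_exporter/textfile
```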

Health Checks

# Check backup status
curl -f http://localhost:8000/health/backup || echo "Backup check failed"

# Check WAL archiving
psql -c "SELECT archived_count, failed_count FROM pg_stat_archiver;"

# Check replication lag (if applicable)
psql -c "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;"

Troubleshooting

Common Issues

Issue: Backup fails with "disk full"

# Check disk space
df -h

# Clean old backups
./scripts/backup.sh cleanup

# Or manually remove old local backups
find /path/to/backups -type f -mtime +7 -delete

Issue: Decryption fails

# Verify encryption key matches
export BACKUP_ENCRYPTION_KEY="correct-key"

# Test decryption
openssl enc -aes-256-cbc -d -pbkdf2 -in backup.enc -out backup.sql -pass env:BACKUP_ENCRYPTION_KEY

Issue: Restore fails with "database in use"

# Terminate connections
psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'mockupaws' AND pid <> pg_backend_pid();"

# Retry restore
./scripts/restore.sh latest

Issue: S3 upload fails

# Check AWS credentials
aws sts get-caller-identity

# Test S3 access
aws s3 ls "s3://$BACKUP_BUCKET/"

# Check bucket permissions
aws s3api get-bucket-acl --bucket "$BACKUP_BUCKET"

Log Files

| Log File | Purpose |
|----------|---------|
| `storage/logs/backup_*.log` | Backup execution logs |
| `storage/logs/restore_*.log` | Restore execution logs |
| `/var/log/postgresql/*.log` | PostgreSQL server logs |

Getting Help

Check this documentation
Review logs in storage/logs/
Contact: #database-ops Slack channel
Escalate to: on-call SRE (PagerDuty)

Appendix

A. Backup Retention Policy

| Backup Type | Retention | Storage Class |
|-------------|-----------|---------------|
| Daily Full | 30 days | S3 Standard-IA |
| Weekly Full | 12 weeks | S3 Standard-IA |
| Monthly Full | 12 months | S3 Glacier |
| Yearly Full | 7 years | S3 Glacier Deep Archive |
| WAL Archives | 7 days | S3 Standard |
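
The retention table translates directly into an expiry check that a cleanup job could use. This sketch is not the logic of `backup.sh cleanup`, which is not shown here. The function names and the type keywords are hypothetical, and months and years are approximated in days.

```shell
#!/usr/bin/env bash
# Hypothetical mapping of the retention table to days. Months and years
# are approximated (12 months = 365 days, 7 years = 2555 days).
retention_days_for() {
  case "$1" in
    daily)   echo 30 ;;
    weekly)  echo 84 ;;    # 12 weeks
    monthly) echo 365 ;;
    yearly)  echo 2555 ;;
    wal)     echo 7 ;;
    *)       echo "unknown backup type: $1" >&2; return 1 ;;
  esac
}

# Succeeds when a backup of the given type and age (in whole days) is
# past its retention period. Returns 2 for an unknown type.
is_expired() {
  local limit
  limit="$(retention_days_for "$1")" || return 2
  [ "$2" -gt "$limit" ]
}

# Usage:
#   is_expired daily 45 && echo "safe to delete"
```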

B. Backup Encryption

# Generate a 256-bit key once and keep it in a variable
BACKUP_ENCRYPTION_KEY="$(openssl rand -base64 32)"

# Store that same key in Secrets Manager
aws secretsmanager create-secret \
  --name mockupaws/backup-encryption-key \
  --secret-string "$BACKUP_ENCRYPTION_KEY"

C. Cron Configuration

# /etc/cron.d/mockupaws-backup
# Daily full backup at 02:00 UTC
0 2 * * * root /opt/mockupaws/scripts/backup.sh full >> /var/log/mockupaws/backup.log 2>&1

# Hourly WAL archive
0 * * * * root /opt/mockupaws/scripts/backup.sh wal >> /var/log/mockupaws/wal.log 2>&1

# Daily cleanup
0 4 * * * root /opt/mockupaws/scripts/backup.sh cleanup >> /var/log/mockupaws/cleanup.log 2>&1

Document History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2026-04-07 | DB Team | Initial release |

For questions or updates to this document, contact the Database Engineering team.
