Phase Plans (5 files):
- 04-RESEARCH.md: Domain research on Docker limits, healthchecks, EC2 parallels
- 04-VALIDATION.md: Success criteria and validation strategy
- 04-01-PLAN.md: Test infrastructure (RED phase)
- 04-02-PLAN.md: Diátaxis documentation
- 04-03-PLAN.md: Infrastructure implementation (GREEN phase)

Test Scripts (6 files, 1300+ lines):
- 01-resource-limits-test.sh: Validate INF-03 compliance
- 02-healthcheck-test.sh: Validate healthcheck configuration
- 03-enforcement-test.sh: Verify resource limits with docker stats
- 04-verify-infrastructure.sh: Infrastructure verification
- 99-final-verification.sh: End-to-end student verification
- run-all-tests.sh: Test orchestration with fail-fast
- quick-test.sh: Fast validation (<30s)

Documentation (11 files, 2500+ lines):

Tutorials (3):
- 01-set-resource-limits.md: EC2 instance types, Docker limits syntax
- 02-implement-healthchecks.md: ELB health check parallels
- 03-dependencies-with-health.md: depends_on with service_healthy

How-to Guides (4):
- check-resource-usage.md: docker stats monitoring
- test-limits-enforcement.md: Stress testing CPU/memory
- custom-healthcheck.md: HTTP, TCP, database healthchecks
- instance-type-mapping.md: Docker limits → EC2 mapping

Reference (3):
- compose-resources-syntax.md: Complete deploy.resources reference
- healthcheck-syntax.md: All healthcheck parameters
- ec2-instance-mapping.md: Instance type mapping table

Explanation (1):
- compute-ec2-parallels.md: Container=EC2, Limits=Instance Type, Healthcheck=ELB

Infrastructure:
- docker-compose.yml: 5 services (web, app, worker, db, stress-test)
  - All services: INF-03 compliant (cpus + memory limits)
  - All services: healthcheck configured
  - EC2 parallels: t2.nano, t2.micro, t2.small, t2.medium, m5.large
- Dockerfile: Alpine 3.19 + stress tools + non-root user

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Tutorial 2: Implementing Healthchecks

In this tutorial you will learn how to configure healthchecks for Docker containers, simulating AWS **ELB Health Checks**.

## Learning Objectives

By the end of this tutorial you will be able to:
- Understand what healthchecks are and why they matter
- Configure HTTP, CMD, and custom healthchecks
- Map Docker healthchecks to ELB Health Checks
- Monitor the health status of containers


---

## Prerequisites

- Completion of Tutorial 1: Configuring Resource Limits
- A `docker-compose.yml` with configured services
- Containers that expose HTTP ports or other monitorable services


---

## Part 1: Healthchecks - Core Concepts

### What Is a Healthcheck?

A **healthcheck** is a command that Docker runs periodically inside a container to verify that the container is "healthy".

**Container states:**
1. **created** - Container created but not started
2. **starting** - Container starting up (healthcheck in progress)
3. **healthy** - Healthcheck passing (container OK)
4. **unhealthy** - Healthcheck failing (container has a problem)
5. **exited** - Container terminated

Of these, `starting`, `healthy`, and `unhealthy` are health statuses produced by the healthcheck, while `created` and `exited` are container lifecycle states.

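
The health status can be read directly from the container metadata, which makes the states listed above easy to observe. A minimal sketch, assuming a container named `lab03-web` (the name used later in this tutorial); `health_status` is a hypothetical helper name, not a Docker command:

```bash
# Print the health status of a container: starting, healthy or unhealthy.
# Prints "none" when the container has no healthcheck configured.
# $1 = container name
health_status() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$1"
}
```

Calling `health_status lab03-web` right after startup would typically print `starting`, then `healthy` once the first check passes.
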
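### Why Are They Important?

- **Failover:** An orchestrator (for example Docker Swarm) can replace unhealthy containers; plain Docker Compose only reports the status
- **Zero-downtime:** A load balancer routes traffic only to healthy containers
- **Dependencies:** Other services can wait until the container becomes healthy
- **Monitoring:** The status is exposed through `docker ps` and `docker inspect`, so problems can be detected automatically
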
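
Docker itself only records the health status, so acting on it is left to you or to an orchestrator. A minimal sketch built on two standard Docker CLI filters; the helper names `list_unhealthy` and `watch_health_events` are hypothetical:

```bash
# List the names of all running containers currently reported as unhealthy.
list_unhealthy() {
  docker ps --filter "health=unhealthy" --format '{{.Names}}'
}

# Stream health status changes as they happen (blocks until interrupted).
watch_health_events() {
  docker events --filter "event=health_status"
}
```

The output of `list_unhealthy` could feed an alerting script or a manual `docker restart`; neither is part of this lab.
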
### Parallel: ELB Health Checks

| Docker | AWS ELB |
|--------|---------|
| healthcheck.test | Health check path/protocol |
| healthcheck.interval | Health check interval (default 30s) |
| healthcheck.timeout | Health check timeout (default 5s) |
| healthcheck.retries | Unhealthy threshold |
| healthcheck.start_period | Grace period (an Auto Scaling group or ECS service setting) |


---

## Part 2: Healthcheck Syntax in Docker Compose

### Basic Syntax

```yaml
services:
  web:
    image: nginx:alpine
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost/"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
```

### Parameters Explained

| Parameter | Default | Description |
|-----------|---------|-------------|
| test | - | Command to run (required) |
| interval | 30s | Time between checks |
| timeout | 30s | Maximum time a single check may take |
| retries | 3 | Consecutive failures before the container is marked unhealthy |
| start_period | 0s | Grace period at startup; failures during it do not count toward `retries` |

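
These parameters combine into a rough upper bound on how long a failure in a running container can go unnoticed: each of the `retries` failed checks may take up to `timeout`, and checks are spaced `interval` apart. The sketch below is back-of-the-envelope arithmetic, not Docker's exact scheduling, and `max_detection_seconds` is a hypothetical helper name:

```bash
# Approximate worst-case seconds before a failing container is marked unhealthy.
# $1 = interval (s), $2 = timeout (s), $3 = retries
max_detection_seconds() {
  echo $(( $3 * ($1 + $2) ))
}

# Values from the basic example above: interval 10s, timeout 5s, retries 3.
max_detection_seconds 10 5 3   # prints 45

# Docker defaults: interval 30s, timeout 30s, retries 3.
max_detection_seconds 30 30 3  # prints 180
```
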

---

## Part 3: Practice - HTTP Healthcheck for a Web Server

### Step 1: Add a healthcheck to the web service

Edit `docker-compose.yml` (the `web` entry goes under `services:`):

```yaml
  web:
    image: nginx:alpine
    container_name: lab03-web
    hostname: web

    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G

    # HTTP healthcheck
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost/"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s

    ports:
      - "127.0.0.1:8080:80"

    restart: unless-stopped
```

### Step 2: Start and verify

```bash
docker compose up -d web
```

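
After startup the container stays in `starting` until the first check passes, so scripts usually need to wait before using the service. A minimal polling sketch, assuming the `lab03-web` container name from Step 1; `wait_healthy` is a hypothetical helper name:

```bash
# Poll a container once per second until it reports "healthy".
# $1 = container name, $2 = maximum seconds to wait (default 60)
# Returns 0 when healthy, 1 on "unhealthy" or timeout.
wait_healthy() {
  name=$1
  deadline=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$deadline" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    case "$status" in
      healthy)   echo "$name is healthy after ${elapsed}s"; return 0 ;;
      unhealthy) echo "$name is unhealthy" >&2; return 1 ;;
    esac
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

For example, `wait_healthy lab03-web 30 && curl -s http://127.0.0.1:8080/` would only query the web server once it is reported healthy.
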
### Step 3: Monitor the health status

```bash
# Show the health status
docker ps

# Output:
# CONTAINER   IMAGE          STATUS
# lab03-web   nginx:alpine   Up 30 seconds (healthy)
```

### Step 4: Inspect the healthcheck details

```bash
docker inspect lab03-web --format '{{json .State.Health}}' | jq
```

**JSON output:**
```json
{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2024-03-25T10:00:00Z",
      "End": "2024-03-25T10:00:00Z",
      "ExitCode": 0,
      "Output": ""
    }
  ]
}
```


---

## Part 4: Practice - Database Healthcheck

### Step 1: Add the database service

```yaml
  db:
    image: postgres:16-alpine
    container_name: lab03-db
    hostname: db

    environment:
      POSTGRES_DB: lab03_db
      POSTGRES_USER: lab03_user
      POSTGRES_PASSWORD: lab03_password

    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G

    # Database healthcheck
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U lab03_user -d lab03_db || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

    volumes:
      - db-data:/var/lib/postgresql/data

    restart: unless-stopped
```

Note that the database:
- Has more `retries` (5 vs 3) because databases start more slowly
- Has a longer `start_period` (10s vs 5s) to extend the grace period

### Step 2: Verify that the database becomes healthy

```bash
docker compose up -d db

# Wait until it becomes healthy
watch -n 2 'docker ps --filter "name=lab03-db" --format "table {{.Names}}\t{{.Status}}"'
```

Press `Ctrl+C` when you see `(healthy)`.

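
As an alternative to watching the status by hand, Compose can block until the service reports healthy. A minimal sketch, assuming a Compose v2 release that supports the `--wait` flag of `docker compose up`; `start_and_wait` is a hypothetical helper name:

```bash
# Start a service and block until it is running and healthy.
# --wait implies detached mode, so -d is not needed.
# $1 = service name from docker-compose.yml
start_and_wait() {
  if docker compose up --wait "$1"; then
    echo "$1 is healthy"
  else
    echo "$1 failed to become healthy" >&2
    return 1
  fi
}

# Example (run manually): start_and_wait db
```
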

---

## Part 5: Practice - CMD-SHELL Healthcheck

For more complex commands, use `CMD-SHELL`, which runs the command string through the container's default shell so that operators such as `||` work. The example assumes that `curl` is installed in the image:

```yaml
  app:
    image: myapp:latest
    container_name: lab03-app
    hostname: app

    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 2G

    # Custom healthcheck with shell
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost/health || exit 1"]
      interval: 15s
      timeout: 3s
      retries: 3
      start_period: 30s

    restart: unless-stopped
```

### Healthcheck Examples

**HTTP with curl:**
```yaml
test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
```

**TCP connection:**
```yaml
test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
```

**File existence:**
```yaml
test: ["CMD-SHELL", "test -f /tmp/ready || exit 1"]
```

**Always succeeds:**
```yaml
test: ["CMD-SHELL", "exit 0"]
```

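
When several conditions must hold at once, the checks above can be combined in a small script that the `test` entry calls. This is a sketch only: the `/tmp/ready` marker and port 8080 come from the examples above, the script path and function names are hypothetical, and `nc` is assumed to be present in the image. A real script would end with a call to `run_healthcheck` and be referenced as `test: ["CMD", "/usr/local/bin/healthcheck.sh"]`.

```bash
#!/bin/sh
# Hypothetical /usr/local/bin/healthcheck.sh

# Succeeds when the readiness marker file exists.
check_ready_file() {
  test -f "$1"
}

# Succeeds when something is listening on the given local TCP port.
check_tcp_port() {
  nc -z localhost "$1"
}

# Returns 0 (healthy) only if every check passes, otherwise 1 (unhealthy).
# $1 = marker file (default /tmp/ready), $2 = port (default 8080)
run_healthcheck() {
  check_ready_file "${1:-/tmp/ready}" && check_tcp_port "${2:-8080}" || return 1
}
```
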

---

## Part 6: ELB Health Check Parallel

### Complete Mapping

| Docker Healthcheck | ELB Health Check | AWS Default | Docker Default |
|--------------------|------------------|-------------|----------------|
| test | Protocol + Path | TCP:80 | None (must be configured) |
| interval | Interval | 30s | 30s |
| timeout | Timeout | 5s | 30s |
| retries | Unhealthy Threshold | 2 | 3 |
| start_period | - | - | 0s |

### Equivalent AWS Configuration

**Docker:**
```yaml
healthcheck:
  test: ["CMD", "wget", "--spider", "-q", "http://localhost/health"]
  interval: 30s
  timeout: 5s
  retries: 2
```

**ELB (equivalent):**
```json
{
  "TargetGroup": {
    "HealthCheckProtocol": "HTTP",
    "HealthCheckPath": "/health",
    "HealthCheckIntervalSeconds": 30,
    "HealthCheckTimeoutSeconds": 5,
    "UnhealthyThresholdCount": 2
  }
}
```

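
For comparison, the same settings could be applied to an existing target group with the AWS CLI. This is a sketch and not part of the lab: it assumes the AWS CLI is installed and configured, the target group ARN is a placeholder you must supply, and `set_elb_health_check` is a hypothetical helper name.

```bash
# Apply the health check settings from the JSON above to a target group.
# $1 = ARN of an existing target group (placeholder, supply your own)
set_elb_health_check() {
  aws elbv2 modify-target-group \
    --target-group-arn "$1" \
    --health-check-protocol HTTP \
    --health-check-path /health \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 5 \
    --unhealthy-threshold-count 2
}

# Example (run manually): set_elb_health_check "<your-target-group-arn>"
```
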

---

## Part 7: Debugging Healthchecks

### Problem: Container stays in "starting"

**Cause:** No healthcheck has passed yet. The first check runs one `interval` after startup, and failures during `start_period` do not count toward `retries`.

**Solution:**
- Increase `start_period` if the service needs more time to initialize
- Reduce `interval` for more frequent checks
- Verify that the service is actually ready

### Problem: Container becomes "unhealthy"

**Debug:**
```bash
# View the healthcheck log
docker inspect lab03-web --format '{{range .State.Health.Log}}{{.Start}} - {{.ExitCode}} - {{.Output}}{{"\n"}}{{end}}'

# Run the command manually
docker exec lab03-web wget --spider -q http://localhost/
```

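
The two debugging commands can be wrapped so that only failures are shown and the exit code of a manual run is visible. A minimal sketch; `failed_checks` and `run_check_manually` are hypothetical helper names:

```bash
# Print only the failed entries from a container's healthcheck log.
# $1 = container name
failed_checks() {
  docker inspect "$1" --format \
    '{{range .State.Health.Log}}{{if ne .ExitCode 0}}{{.End}} exit={{.ExitCode}} {{.Output}}{{"\n"}}{{end}}{{end}}'
}

# Run a check command inside the container and report its exit code
# (0 = healthy, 1 = unhealthy).
# $1 = container name, remaining arguments = command to run
run_check_manually() {
  name=$1
  shift
  docker exec "$name" "$@"
  echo "exit code: $?"
}
```

For example, `run_check_manually lab03-web wget --spider -q http://localhost/` repeats the check from Step 1 and prints its exit code.
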
### Problem: Healthcheck is too slow

**Optimize:**
- Use lightweight HTTP checks (not heavy pages)
- Use TCP checks instead of HTTP where possible
- Reduce `timeout` if the check is fast

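
Before tuning `timeout`, it helps to know how long the check actually takes. A rough sketch with whole-second resolution, assuming the `lab03-web` container from Part 3; `time_check` is a hypothetical helper name:

```bash
# Measure how many whole seconds a check command takes inside a container.
# $1 = container name, remaining arguments = command to run
time_check() {
  name=$1
  shift
  start=$(date +%s)
  docker exec "$name" "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo "check took $((end - start))s"
}

# Example (run manually):
#   time_check lab03-web wget --spider -q http://localhost/
```
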

---

## Summary

In this tutorial you learned:

✓ **Concept:** Healthchecks verify the health status of containers
✓ **Syntax:** The `test`, `interval`, `timeout`, `retries`, and `start_period` parameters
✓ **Types:** HTTP, CMD-SHELL, and custom healthchecks
✓ **Parallel:** Docker healthcheck → ELB Health Check
✓ **Monitoring:** `docker ps` shows the status (healthy/unhealthy)


---

## Next Steps

In the next tutorial you will learn how to:
- Configure **dependencies** between services
- Use `depends_on` with `condition: service_healthy`
- Implement ordered startup for multi-tier applications

Continue with **Tutorial 3: Dependencies with Healthchecks** →