# Domain Pitfalls

**Domain:** Cloud Lab Projects with Docker Simulation
**Researched:** 2026-03-24
**Overall confidence:** HIGH (based on official Docker documentation and educational best practices)

## Critical Pitfalls

### Pitfall 1: Data Loss on Container Restart

**What goes wrong:**
Students lose all their work when containers are removed or recreated, for example when a lab is torn down and started again. Database data, uploaded files, and configuration changes disappear because they were written to the container's writable layer instead of a volume. A `docker stop` followed by `docker start` keeps that layer; `docker rm` discards it.

**Why it happens:**
- Beginners don't understand Docker's layered filesystem
- Tutorials often skip volume configuration for simplicity
- The difference between `docker stop` and `docker rm` isn't clear
- Anonymous volumes are confused with named volumes

**How to avoid:**
- Always use named Docker volumes for persistent data
- Explicitly declare volumes in `docker-compose.yml` under the top-level `volumes` key
- Teach the volume lifecycle: volumes persist after container removal
- Use the `--mount` flag syntax (more explicit) instead of `-v` for beginners

**Warning signs:**
- No top-level `volumes:` section in `docker-compose.yml`
- Using inline volume paths like `./data:/app/data` without explaining persistence
- Labs that work once but fail on restart
- Students asking "where did my data go?"

**Phase to address:**
- Lab 1 (IAM & Sicurezza) - Introduce volume concepts early
- Lab 4 (Storage) - Critical for object storage persistence
- Lab 5 (Database) - Essential for database data persistence

---

### Pitfall 2: Networking Confusion - Localhost vs Container Names

**What goes wrong:**
Students try to connect to services using `localhost` or `127.0.0.1` instead of container service names. Connections fail because each container has its own isolated network stack, so `localhost` refers to the container itself.

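The mistake described above can be caught before a lab is started. The following is a minimal sketch, assuming services receive their connection strings as URL-style environment variables; the function name `find_loopback_urls` and the variable names in the example are hypothetical and not taken from any lab.

```python
from urllib.parse import urlparse

# Hostnames that resolve to the container itself, not to another service.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def find_loopback_urls(environment: dict[str, str]) -> list[str]:
    """Return the names of variables whose URL points at a loopback host."""
    flagged = []
    for name, value in environment.items():
        host = urlparse(value).hostname  # None when the value is not a URL
        if host in LOOPBACK_HOSTS:
            flagged.append(name)
    return flagged


# Hypothetical environment of an application container.
app_env = {
    "DATABASE_URL": "postgresql://lab:secret@localhost:5432/labdb",
    "CACHE_URL": "redis://cache:6379/0",
    "LOG_LEVEL": "debug",
}

print(find_loopback_urls(app_env))
```

With the example environment, only `DATABASE_URL` is flagged; `CACHE_URL` already uses a service name (`cache`) and passes.
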
**Why it happens:**
- The mental model from local development doesn't translate to containers
- Docker's embedded DNS isn't explained
- Students don't understand that each container has its own `localhost`
- Exposed ports are confused with published ports

**How to avoid:**
- Teach Docker's internal DNS resolution first
- Always use service names for inter-container communication
- Create network diagrams showing container isolation
- Explain clearly the difference between `EXPOSE` (documentation only) and `-p` (publishes the port on the host)

**Warning signs:**
- "Connection refused" errors when using `localhost`
- Students asking "why can't I connect to the database?"
- Mixing up `localhost` inside the container with `localhost` on the host
- Port mapping confusion (internal vs external ports)

**Phase to address:**
- Lab 2 (Network) - Core networking concepts
- Lab 5 (Database) - Application-to-database connections

---

### Pitfall 3: OOM Killer - Resource Exhaustion

**What goes wrong:**
Containers, or the entire Docker daemon, are killed by the kernel's OOM (Out Of Memory) killer. Students lose work and the lab environment becomes unstable.

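A process killed this way is terminated with SIGKILL, which shows up as exit code 137 (128 plus signal number 9). The sketch below decodes exit codes that follow this convention. The helper name `describe_exit_code` is hypothetical, and a 137 only indicates SIGKILL, so it suggests an OOM kill rather than proving one.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code, decoding the 128 + signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:  # not a signal number known on this platform
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"exited with error status {code}"


for exit_code in (0, 1, 137, 143):
    print(exit_code, "->", describe_exit_code(exit_code))
```

To confirm an actual OOM kill, the `State.OOMKilled` field reported by `docker inspect` is a more direct signal than the exit code alone.
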
**Why it happens:**
- No memory limits set in docker-compose
- Multiple containers compete for host memory
- Students' machines have limited RAM (8 GB or less)
- Memory leaks in student code go unchecked

**How to avoid:**
- Set a memory limit for every service in docker-compose, using `mem_limit` or, in Compose file format v3+, `deploy.resources.limits.memory`
- Monitor with `docker stats`
- Teach students to check container resource usage
- Provide minimum host requirements (16 GB RAM recommended)

**Warning signs:**
- Containers randomly exit with code 137
- Host system becomes slow
- `docker ps` shows containers restarting repeatedly
- System logs mention "OOM killer"

**Phase to address:**
- Lab 3 (Compute) - Resource limits are essential here
- All labs - Enforce resource limits from the start

---

### Pitfall 4: Security Misconfiguration - Running as Root

**What goes wrong:**
Containers run as root by default, creating security vulnerabilities and permission issues with volume mounts.

**Why it happens:**
- Docker's default behavior is root unless specified otherwise
- Many base images do not define a non-root `USER`
- Beginners don't understand Linux user permissions
- Volume permission errors seem "easier" to fix with root

**How to avoid:**
- Always specify a non-root user, with the `user:` key in docker-compose or the `USER` instruction in the Dockerfile
- Create non-root users in Dockerfiles
- Teach Linux permission basics alongside Docker
- Use Docker's user namespaces for advanced labs

**Warning signs:**
- "Permission denied" errors with volumes
- Files created as root on the host system
- `docker inspect` shows an empty or `root` value for `Config.User`
- Running commands with `sudo` inside containers

**Phase to address:**
- Lab 1 (IAM & Sicurezza) - User permissions and security basics

---

### Pitfall 5: Port Conflicts and Binding Issues

**What goes wrong:**
Multiple students' labs conflict when using default ports. Services fail to start because ports are already in use.

**Why it happens:**
- Hardcoded default ports (3306, 5432, 8080, etc.)
- Multiple labs running simultaneously
- Port mapping flexibility isn't taught
- Students don't know how to check occupied ports

**How to avoid:**
- Use non-standard ports in examples (e.g., 5433 instead of 5432)
- Teach students to check port usage: `netstat -tuln` or `ss -tuln`
- Document all port mappings in lab materials
- Provide scripts to detect port conflicts

**Warning signs:**
- "port is already allocated" errors
- Services failing to start silently
- Students reporting "it worked yesterday but not today"
- Multiple labs can't run simultaneously

**Phase to address:**
- Lab 2 (Network) - Port mapping and exposure
- All labs - Use unique port ranges per lab

---

### Pitfall 6: depends_on Without Readiness Checks

**What goes wrong:**
Services start but fail because their dependencies aren't ready. Applications crash trying to connect to databases that are still initializing.

**Why it happens:**
- By default, `depends_on` only waits for the dependency's container to start, not for the service inside it to be ready
- Databases need time to initialize (can take 10-30 seconds)
- No healthcheck or readiness probe is configured
- Students assume "started" = "ready to accept connections"

**How to avoid:**
- Implement healthchecks for all services
- Where the Compose version supports it, pair `depends_on` with `condition: service_healthy` so dependents wait for a healthy state
- Use restart policies with delays
- Teach the difference between "running" and "healthy"
- Provide example healthcheck scripts for common services

**Warning signs:**
- Intermittent connection failures on lab startup
- "Connection refused" errors that go away with manual retry
- Services exiting and restarting
- Need to manually restart containers to make things work

**Phase to address:**
- Lab 3 (Compute) - Health checks and service readiness
- Lab 5 (Database) - Database initialization timing

---

### Pitfall 7: Orphaned Resources - Disk Space Exhaustion

**What goes wrong:**
Students' disk space fills up with stopped containers, unused volumes, and dangling images. Docker becomes unusable.

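A pre-flight check can surface this problem before Docker stops working. The sketch below uses only the Python standard library to test the free space on a path. The 10 GiB threshold and the function name `has_free_space` are assumptions for illustration, and the right path to check depends on where Docker stores its data on the host.

```python
import shutil

GIB = 1024 ** 3


def has_free_space(path: str, min_free_gib: float) -> bool:
    """Return True when the filesystem holding `path` has enough free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_gib * GIB


if __name__ == "__main__":
    # Assumed threshold; tune it to the images and volumes a lab needs.
    if has_free_space("/", min_free_gib=10):
        print("Enough free disk space to start the lab.")
    else:
        print("Low disk space: run the lab cleanup steps first.")
```

A check like this could run at the start of each lab, so that students are pointed to cleanup before they hit a "No space left on device" error.
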
**Why it happens:**
- Students never clean up resources
- No teaching of `docker system prune`
- Volumes aren't removed when containers are deleted
- Image layers accumulate during development

**How to avoid:**
- Teach cleanup commands in every lab
- Provide cleanup scripts
- Use the `--rm` flag for one-off containers
- Explain volume lifecycle and manual removal
- Monitor disk usage in lab instructions

**Warning signs:**
- Disk space warnings on host system
- `docker system df` shows large unused space
- Slow container startup due to image bloat
- "No space left on device" errors

**Phase to address:**
- Lab 1 (IAM & Sicurezza) - Docker basics and cleanup
- All labs - Include cleanup section in each lab

---

## Technical Debt Patterns

Shortcuts that seem reasonable but create long-term problems.

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Using `--privileged` flag | Fixes permission issues quickly | Major security hole, teaches bad practices | NEVER in educational context |
| Hardcoding IPs in configs | Works immediately for one student | Doesn't scale, breaks on different machines | NEVER - use service names |
| Using `network_mode: host` | Simplifies networking | Breaks isolation, conflicts between labs | Only for debugging, never in final labs |
| Anonymous volumes | Less configuration | Data loss, difficult cleanup | NEVER - always use named volumes |
| Ignoring healthchecks | Faster startup | Flaky services, difficult debugging | NEVER - causes intermittent failures |
| Using `latest` tags | No version numbers to track | Unexpected breaking changes | NEVER - use specific versions |
| Exposing all ports | "It just works" | Security issues, port conflicts | NEVER - expose only what's needed |

## Integration Gotchas

Common mistakes when connecting to external services.

| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| MinIO (S3) | Using AWS endpoint `s3.amazonaws.com` | Use `http://localhost:9000` from the host, or the container service name from other containers |
| MinIO Console | Confusing port 9000 (API) with 9001 (Console) | Document both ports clearly, separate service configs |
| PostgreSQL | Using default port 5432 (causes conflicts) | Use 5433 or another non-standard port |
| MySQL | Not setting `MYSQL_ROOT_PASSWORD` env var | Always set required environment variables |
| Redis | Using default port 6379 (may conflict) | Use non-standard port or proper networking |
| Networks | Using legacy `--link` flag | Use user-defined networks and service names |
| Volumes | Bind mounting to non-existent host paths | Create host directories first or use named volumes |
| DNS | Editing `/etc/hosts` inside containers | Use Docker's embedded DNS with service names |

## Performance Traps

Patterns that work at small scale but fail as usage grows.

| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| No resource limits | Works with 1-2 containers, host crashes at 5+ | Always set CPU and memory limits | 3-5 containers on 8 GB RAM |
| Single bridge network | Fine for simple apps, confusing for complex | Use multiple networks for isolation | 5+ services with different security needs |
| tmpfs for data | Fast but data loss on restart | Use named volumes for persistence | Immediately on container restart |
| Logging to json-file without rotation | Works initially, disk fills up | Set `max-size` and `max-file` options | After running labs repeatedly |
| Building images in Compose | Slow rebuilds, large layers | Use multi-stage builds, `.dockerignore` | After 3-4 iterations |
| Running everything on default network | Works until naming conflicts arise | Use custom networks per lab | When running multiple labs simultaneously |

## Security Mistakes

Domain-specific security issues beyond
general web security.

| Mistake | Risk | Prevention |
|---------|------|------------|
| Mounting Docker socket (`/var/run/docker.sock`) | Container breakout, root on host | NEVER do this in educational context |
| Running containers as root | Privilege escalation vulnerabilities | Always use `user:` directive |
| Exposing all ports to 0.0.0.0 | Services accessible externally | Bind to 127.0.0.1 or use internal networks |
| Using default credentials | Easy unauthorized access | Require password changes, document security |
| Sharing host PID namespace | Container can see host processes | Never use `pid: host` in labs |
| Ignoring cgroups | No resource isolation = DoS potential | Always set resource limits |
| Using `--privileged` flag | Complete host access | NEVER acceptable, teaches bad practices |
| Not using AppArmor/SELinux profiles | Missing security layer | Enable when available, document why |

## UX Pitfalls

Common user experience mistakes in this domain.

| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| Silent failures | Students don't know what went wrong | Always provide error messages and logs |
| No progress indicators | Students think labs are broken | Show startup progress, especially for databases |
| Hidden dependencies | Labs fail mysteriously | List all prerequisites clearly, check at startup |
| Complex YAML errors | Students stuck on syntax | Validate YAML before running, provide examples |
| No rollback capability | Mistakes require starting over | Git version control, snapshot instructions |
| Missing cleanup steps | Accumulating cruft, confusion | Provide cleanup scripts for each lab |
| Unclear cloud parallels | Students don't see the point | Explicitly map Docker concepts to AWS services |
| Assuming prior knowledge | Beginners get lost | Provide background reading, glossary of terms |

## "Looks Done But Isn't" Checklist

Things that appear complete but are missing critical pieces.

- [ ] **Volume Persistence:** Often missing named volume declarations. Verify that data survives `docker compose down` followed by `docker compose up`
- [ ] **Network Isolation:** Often missing a network per lab. Verify that containers can't accidentally talk between labs
- [ ] **Health Checks:** Often missing readiness verification. Verify that `docker ps` shows "healthy", not just "running"
- [ ] **Resource Limits:** Often missing CPU/memory constraints. Verify that `docker stats` shows limits
- [ ] **Cleanup Scripts:** Often missing tear-down instructions. Verify that `docker system df` is clean after the lab
- [ ] **Error Handling:** Often missing graceful failure modes. Verify that the lab handles common errors (port conflicts, missing volumes)
- [ ] **Cloud Parallels:** Often missing explicit mappings. Verify that each Docker component maps to a specific AWS service
- [ ] **Diátaxis Documentation:** Often missing explanation documents. Verify that each lab has all 4 document types

## Recovery Strategies

When pitfalls occur despite prevention, how to recover.

| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Data loss from missing volumes | HIGH | Rebuild from scratch, add volumes, restore from backup if available |
| OOM killer crash | MEDIUM | Add memory limits to all services, restart Docker daemon, free host memory |
| Port conflicts | LOW | Change port mappings, kill conflicting processes, restart services |
| Permission errors | MEDIUM | Add `user:` directive, chown volume directories, rebuild containers |
| Network connectivity issues | LOW | Verify service names, check network attachment, ping between containers |
| Disk space exhaustion | HIGH | Run `docker system prune -a`, remove unused volumes, clear build cache |
| Orphaned containers | LOW | Run `docker container prune`, remove stopped containers manually |

## Pitfall-to-Phase Mapping

How roadmap phases should address these pitfalls.

| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| Data loss on restart | Lab 1 - IAM & Sicurezza (introduce volumes), Lab 4 - Storage (critical) | Verify: `docker compose down` then `docker compose up` preserves data |
| Networking confusion | Lab 2 - Network (core concepts), Lab 5 - Database (application connections) | Verify: Service names resolve, inter-container communication works |
| OOM killer | Lab 3 - Compute (resource limits mandatory) | Verify: `docker stats` shows limits, no OOM errors under load |
| Running as root | Lab 1 - IAM & Sicurezza (user permissions) | Verify: `docker exec <container> whoami` shows non-root user |
| Port conflicts | Lab 2 - Network (port mapping), All labs (unique ports) | Verify: Multiple labs can run simultaneously without errors |
| depends_on without readiness | Lab 3 - Compute (healthchecks), Lab 5 - Database (initialization) | Verify: All services show "healthy" before app tries to connect |
| Orphaned resources | Lab 1 - IAM & Sicurezza (cleanup commands), All labs (cleanup section) | Verify: `docker system df` shows minimal unused space after cleanup |

## Sources

- [Docker Engine Security Documentation](https://docs.docker.com/engine/security/) - HIGH confidence, official Docker security best practices
- [Docker Volumes Documentation](https://docs.docker.com/storage/volumes/) - HIGH confidence, official volume management reference
- [Docker Compose File v3 Reference](https://docs.docker.com/compose/compose-file/compose-file-v3/) - HIGH confidence, official Compose specification
- Docker documentation on resource limits and OOM prevention
- Common educational patterns from container-based training courses
- Known issues from Docker-based learning environments

---

*Pitfalls research for: Cloud Lab Projects with Docker Simulation*
*Researched: 2026-03-24*