Domain Pitfalls
Domain: Cloud Lab Projects with Docker Simulation
Researched: 2026-03-24
Overall confidence: HIGH (based on official Docker documentation and educational best practices)
Critical Pitfalls
Pitfall 1: Data Loss on Container Restart
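What goes wrong: Students lose all their work when containers are removed and recreated (for example by docker compose down followed by docker compose up). Database data, uploaded files, and configuration changes disappear because data was written to the container's writable layer instead of volumes. A plain docker stop followed by docker start keeps the writable layer; docker rm discards it.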
Why it happens:
- Beginners don't understand Docker's layered filesystem
- Tutorials often skip volume configuration for simplicity
- The difference between docker stop and docker rm isn't clear
- Anonymous volumes vs named volumes confusion
How to avoid:
- Always use named Docker volumes for persistent data
- Explicitly declare volumes in docker-compose.yml under the top-level volumes key
- Teach volume lifecycle: volumes persist after container removal
- Use --mount flag syntax (more explicit) instead of -v for beginners
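The points above can be sketched as a Compose file. This is a minimal illustration, not a prescribed lab layout: the service name db, the volume name db_data, the postgres:16 image, and the placeholder password are all assumptions chosen for the example.

```yaml
# docker-compose.yml (sketch; names and image tag are illustrative)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder, not a real credential
    volumes:
      # Long (mount-style) syntax: a named volume mounted at the
      # directory where this image stores its database files.
      - type: volume
        source: db_data
        target: /var/lib/postgresql/data

volumes:
  db_data:   # top-level declaration makes this a named volume
```

With this layout, docker compose down removes the container but leaves db_data in place, so the data is still there after the next docker compose up. Adding -v to docker compose down removes the named volume as well, which is worth showing students as the deliberate way to reset a lab.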
Warning signs:
- No top-level volumes: section in docker-compose.yml
- Using inline volume paths like ./data:/app/data without explaining persistence
- Labs that work once but fail on restart
- Students asking "where did my data go?"
Phase to address:
- Lab 1 (IAM & Sicurezza) - Introduce volume concepts early
- Lab 4 (Storage) - Critical for object storage persistence
- Lab 5 (Database) - Essential for database data persistence
Pitfall 2: Networking Confusion - Localhost vs Container Names
What goes wrong:
Students try to connect to services using localhost or 127.0.0.1 instead of container service names. Connections fail because containers have isolated network stacks.
Why it happens:
- Mental model from local development doesn't translate to containers
- Docker's embedded DNS isn't explained
- Students don't understand that each container has its own localhost
- Confusion between exposed ports and published ports
How to avoid:
- Teach Docker's internal DNS resolution first
- Always use service names for inter-container communication
- Create network diagrams showing container isolation
- Explain the difference between EXPOSE (documentation only) and -p (publish to the host) clearly
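A small two-service sketch may help make the service-name rule concrete. The image example/lab-app:1.0 and the DATABASE_URL variable are hypothetical stand-ins for whatever application a lab uses; the credentials are placeholders.

```yaml
# docker-compose.yml (sketch; the app image and its env var are hypothetical)
services:
  app:
    image: example/lab-app:1.0
    environment:
      # "db" is the Compose service name, resolved by Docker's embedded DNS.
      # Using localhost here would point at the app container itself.
      DATABASE_URL: postgres://lab:change-me@db:5432/labdb
    ports:
      - "8081:8080"   # published: host port 8081 -> container port 8080

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: lab
      POSTGRES_PASSWORD: change-me
      POSTGRES_DB: labdb
    # No ports: entry. The database is still reachable from app at db:5432
    # over the shared Compose network, but not from the host.
```

The app talks to the database on the container port (5432) by service name, while the host only sees what is listed under ports. That contrast is the internal vs external port distinction students tend to mix up.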
Warning signs:
- Connection refused errors when using localhost
- Students asking "why can't I connect to the database?"
- Mixing up localhost inside container vs localhost on host
- Port mapping confusion (internal vs external ports)
Phase to address:
- Lab 2 (Network) - Core networking concepts
- Lab 5 (Database) - Application-to-database connections
Pitfall 3: OOM Killer - Resource Exhaustion
What goes wrong: Containers or the entire Docker daemon are killed by the kernel's OOM (Out Of Memory) killer. Students lose work and the lab environment becomes unstable.
Why it happens:
- No memory limits set in docker-compose
- Multiple containers compete for host memory
- Students' machines have limited RAM (8GB or less)
- Memory leaks in student code go unchecked
How to avoid:
- Always set mem_limit in docker-compose for each service
- Use deploy.resources.limits.memory in compose file format v3+
- Monitor with docker stats
- Teach students to check container resource usage
- Provide minimum host requirements (16GB RAM recommended)
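The two ways of declaring limits mentioned above might look like this. The service names, the example/lab-worker:1.0 image, and the specific values are assumptions for illustration; each service uses only one of the two forms so the settings cannot contradict each other.

```yaml
# docker-compose.yml (sketch; limits and names are illustrative)
services:
  web:
    image: nginx:1.27
    mem_limit: 256m     # service-level form
    cpus: 0.5

  worker:
    image: example/lab-worker:1.0   # hypothetical image
    deploy:
      resources:
        limits:
          memory: 512M  # deploy-style form
          cpus: "0.50"
```

Recent Docker Compose releases are expected to apply deploy.resources.limits to plain containers, while older docker-compose 1.x versions may ignore the deploy section unless run in compatibility mode, so it is worth confirming with docker stats that the limit column shows the configured value. Exit code 137 corresponds to 128 + 9 (SIGKILL), which is what the OOM killer sends; the State.OOMKilled field in docker inspect output indicates whether memory was the cause.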
Warning signs:
- Containers randomly exit with code 137
- Host system becomes slow
- docker ps shows containers restarting repeatedly
- System logs mention "OOM killer"
Phase to address:
- Lab 3 (Compute) - Resource limits are essential here
- All labs - Enforce resource limits from the start
Pitfall 4: Security Misconfiguration - Running as Root
What goes wrong: Containers run as root by default, creating security vulnerabilities and permission issues with volume mounts.
Why it happens:
- Docker's default behavior is root unless specified otherwise
- Base images often set USER root or leave USER unset, which also means root
- Beginners don't understand Linux user permissions
- Volume permission errors seem "easier" to fix with root
How to avoid:
- Always specify the user: directive in docker-compose or the USER instruction in the Dockerfile
- Create non-root users in Dockerfiles
- Teach Linux permission basics alongside Docker
- Use Docker's user namespaces for advanced labs
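As a rough sketch of the user: directive in Compose, assuming a hypothetical image example/lab-app:1.0 that already contains a /app/data directory owned by UID 1000:

```yaml
# docker-compose.yml (sketch; image, UID/GID, and paths are assumptions)
services:
  app:
    image: example/lab-app:1.0
    user: "1000:1000"   # UID:GID the container process runs as
    volumes:
      - app_data:/app/data

volumes:
  app_data:
```

If the mount target does not exist in the image, or is owned by root there, the process is likely to hit permission denied on first write, which is usually the moment students reach for root. Creating the directory with the right owner in the Dockerfile avoids that. When verifying, note that whoami may fail if the UID has no entry in the image's /etc/passwd; id -u prints the numeric UID in that case.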
Warning signs:
- Permission denied errors with volumes
- Files created as root on host system
- Security warnings in docker inspect
- Running commands with sudo inside containers
Phase to address: Lab 1 (IAM & Sicurezza) - User permissions and security basics
Pitfall 5: Port Conflicts and Binding Issues
What goes wrong: Multiple students' labs conflict when using default ports. Services fail to start because ports are already in use.
Why it happens:
- Hardcoded default ports (3306, 5432, 8080, etc.)
- Multiple labs running simultaneously
- Not teaching port mapping flexibility
- Students don't know how to check occupied ports
How to avoid:
- Use non-standard ports in examples (e.g., 5433 instead of 5432)
- Teach students to check port usage: netstat -tuln or ss -tuln
- Document all port mappings in lab materials
- Provide scripts to detect port conflicts
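One way to keep host ports flexible is to make the published port overridable. In this sketch the variable name LAB_DB_PORT is a made-up convention, and 5433 is simply the non-standard default suggested above.

```yaml
# docker-compose.yml (sketch; LAB_DB_PORT is a hypothetical variable name)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder
    ports:
      # host-ip:host-port:container-port
      # The host port falls back to 5433 when LAB_DB_PORT is unset;
      # the container port stays at PostgreSQL's usual 5432.
      - "127.0.0.1:${LAB_DB_PORT:-5433}:5432"
```

A student who already has something listening on 5433 can then start the lab with a different value of LAB_DB_PORT in the environment or in an .env file, without editing the Compose file.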
Warning signs:
- "port already allocated" errors
- Services failing to start silently
- Students reporting "it worked yesterday but not today"
- Multiple labs can't run simultaneously
Phase to address:
- Lab 2 (Network) - Port mapping and exposure
- All labs - Use unique port ranges per lab
Pitfall 6: depends_on Without Readiness Checks
What goes wrong: Services start but fail because dependencies aren't ready. Applications crash trying to connect to databases that are still initializing.
Why it happens:
- depends_on only waits for containers to start, not be ready
- Databases need time to initialize (can take 10-30 seconds)
- No healthcheck or readiness probe configured
- Students assume "started" = "ready to accept connections"
How to avoid:
- Implement healthchecks for all services
- Use restart policies with delays
- Teach the difference between "running" and "healthy"
- Provide example healthcheck scripts for common services
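A healthcheck combined with the long form of depends_on is one way to close the gap between "started" and "ready". The sketch below assumes PostgreSQL and a hypothetical application image; the timing values are guesses to be tuned per lab.

```yaml
# docker-compose.yml (sketch; app image and timings are assumptions)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: lab
      POSTGRES_PASSWORD: change-me
      POSTGRES_DB: labdb
    healthcheck:
      # pg_isready ships with the postgres image and exits 0
      # once the server accepts connections.
      test: ["CMD-SHELL", "pg_isready -U lab -d labdb"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  app:
    image: example/lab-app:1.0   # hypothetical image
    depends_on:
      db:
        condition: service_healthy   # wait for the healthcheck, not just start
```

With the short list form of depends_on, app would be started as soon as the db container exists. The condition: service_healthy form delays it until the healthcheck passes, which is the behavior students usually assume they are getting.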
Warning signs:
- Intermittent connection failures on lab startup
- "Connection refused" errors that go away with manual retry
- Services exiting and restarting
- Need to manually restart containers to make things work
Phase to address:
- Lab 3 (Compute) - Health checks and service readiness
- Lab 5 (Database) - Database initialization timing
Pitfall 7: Orphaned Resources - Disk Space Exhaustion
What goes wrong: Students' disk space fills up with stopped containers, unused volumes, and dangling images. Docker becomes unusable.
Why it happens:
- Students never clean up resources
- No teaching of docker system prune
- Volumes aren't removed when containers are deleted
- Image layers accumulate during development
How to avoid:
- Teach cleanup commands in every lab
- Provide cleanup scripts
- Use --rm flag for one-off containers
- Explain volume lifecycle and manual removal
- Monitor disk usage in lab instructions
Warning signs:
- Disk space warnings on host system
- docker system df shows large unused space
- Slow container startup due to image bloat
- "No space left on device" errors
Phase to address:
- Lab 1 (IAM & Sicurezza) - Docker basics and cleanup
- All labs - Include cleanup section in each lab
Technical Debt Patterns
Shortcuts that seem reasonable but create long-term problems.
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| Using --privileged flag | Fixes permission issues quickly | Major security hole, teaches bad practices | NEVER in educational context |
| Hardcoding IPs in configs | Works immediately for one student | Doesn't scale, breaks on different machines | NEVER - use service names |
| Using network_mode: host | Simplifies networking | Breaks isolation, conflicts between labs | Only for debugging, never in final labs |
| Anonymous volumes | Less configuration | Data loss, difficult cleanup | NEVER - always use named volumes |
| Ignoring healthchecks | Faster startup | Flaky services, difficult debugging | NEVER - causes intermittent failures |
| Using latest tags | No version numbers to track | Unexpected breaking changes | NEVER - use specific versions |
| Exposing all ports | "It just works" | Security issues, port conflicts | NEVER - expose only what's needed |
Integration Gotchas
Common mistakes when connecting to external services.
| Integration | Common Mistake | Correct Approach |
|---|---|---|
| MinIO (S3) | Using AWS endpoint s3.amazonaws.com | Use http://localhost:9000 or container service name |
| MinIO Console | Confusing port 9000 (API) with 9001 (Console) | Document both ports clearly, separate service configs |
| PostgreSQL | Using default port 5432 (causes conflicts) | Use 5433 or other non-standard port |
| MySQL | Not setting MYSQL_ROOT_PASSWORD env var | Always set required environment variables |
| Redis | Using default port 6379 (may conflict) | Use non-standard port or proper networking |
| Networks | Using legacy --link flag | Use user-defined networks and service names |
| Volumes | Bind mounting to non-existent host paths | Create host directories first or use named volumes |
| DNS | Using /etc/hosts inside containers | Use Docker's embedded DNS with service names |
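The MinIO rows in particular are easier to follow with a concrete service definition. This is a sketch under assumptions: the credentials are placeholders, the volume name is made up, and the image line is left without a tag only because no specific release is given here; a real lab should pin one.

```yaml
# docker-compose.yml (sketch; credentials and names are placeholders)
services:
  minio:
    image: minio/minio   # pin a specific release tag in real lab material
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: labadmin
      MINIO_ROOT_PASSWORD: change-me-please   # placeholder
    ports:
      - "127.0.0.1:9000:9000"   # S3-compatible API
      - "127.0.0.1:9001:9001"   # web console
    volumes:
      - minio_data:/data

volumes:
  minio_data:
```

From the host, an S3 client would point at http://localhost:9000 and the browser at port 9001. From another container on the same Compose network, the endpoint becomes http://minio:9000, using the service name instead of localhost.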
Performance Traps
Patterns that work at small scale but fail as usage grows.
| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| No resource limits | Works with 1-2 containers, host crashes at 5+ | Always set CPU and memory limits | 3-5 containers on 8GB RAM |
| Single bridge network | Fine for simple apps, confusing for complex | Use multiple networks for isolation | 5+ services with different security needs |
| tmpfs for data | Fast but data loss on restart | Use named volumes for persistence | Immediately on container restart |
| Logging to json-file without rotation | Works initially, disk fills up | Set max-size and max-file options | After running labs repeatedly |
| Building images in Compose | Slow rebuilds, large layers | Use multi-stage builds, .dockerignore | After 3-4 iterations |
| Running everything on default network | Works until naming conflicts arise | Use custom networks per lab | When running multiple labs simultaneously |
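Two of the preventions in this table, log rotation and a custom network per lab, can be expressed in a few lines of Compose. The network name lab2_net and the size limits are illustrative choices, not recommendations from the sources.

```yaml
# docker-compose.yml (sketch; network name and limits are illustrative)
services:
  web:
    image: nginx:1.27
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate when a log file reaches about 10 MB
        max-file: "3"     # keep at most three rotated files
    networks:
      - lab2_net

networks:
  lab2_net:   # dedicated network for this lab instead of the default one
```

The option values are quoted because the logging driver expects strings. Keeping each lab on its own named network also makes it easier to see, with docker network ls, which lab a leftover network belongs to.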
Security Mistakes
Domain-specific security issues beyond general web security.
| Mistake | Risk | Prevention |
|---|---|---|
| Mounting Docker socket (/var/run/docker.sock) | Container breakout, root on host | NEVER do this in educational context |
| Running containers as root | Privilege escalation vulnerabilities | Always use user: directive |
| Exposing all ports to 0.0.0.0 | Services accessible externally | Bind to 127.0.0.1 or use internal networks |
| Using default credentials | Easy unauthorized access | Require password changes, document security |
| Sharing host PID namespace | Container can see host processes | Never use pid: host in labs |
| Ignoring cgroups | No resource isolation = DoS potential | Always set resource limits |
| Using --privileged flag | Complete host access | NEVER acceptable, teaches bad practices |
| Not using AppArmor/SELinux profiles | Missing security layer | Enable when available, document why |
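The "bind to 127.0.0.1 or use internal networks" prevention could be sketched as follows. The network names frontend and backend and the application image are assumptions for the example.

```yaml
# docker-compose.yml (sketch; names and app image are assumptions)
services:
  app:
    image: example/lab-app:1.0   # hypothetical image
    ports:
      - "127.0.0.1:8081:8080"    # reachable only from the host itself
    networks:
      - frontend
      - backend

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder
    networks:
      - backend                  # no published ports at all

networks:
  frontend:
  backend:
    internal: true   # no external connectivity for this network
```

Here the database is reachable only by containers attached to backend, and the application is published on the loopback address rather than on every host interface.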
UX Pitfalls
Common user experience mistakes in this domain.
| Pitfall | User Impact | Better Approach |
|---|---|---|
| Silent failures | Students don't know what went wrong | Always provide error messages and logs |
| No progress indicators | Students think labs are broken | Show startup progress, especially for databases |
| Hidden dependencies | Labs fail mysteriously | List all prerequisites clearly, check at startup |
| Complex YAML errors | Students stuck on syntax | Validate YAML before running, provide examples |
| No rollback capability | Mistakes require starting over | Git version control, snapshot instructions |
| Missing cleanup steps | Accumulating cruft, confusion | Provide cleanup scripts for each lab |
| Unclear cloud parallels | Students don't see the point | Explicitly map Docker concepts to AWS services |
| Assuming prior knowledge | Beginners get lost | Provide background reading, glossary of terms |
"Looks Done But Isn't" Checklist
Things that appear complete but are missing critical pieces.
- Volume Persistence: Often missing named volume declarations; verify data survives docker compose down and docker compose up
- Network Isolation: Often missing network per lab; verify containers can't accidentally talk between labs
- Health Checks: Often missing readiness verification; verify docker ps shows "healthy" not just "running"
- Resource Limits: Often missing CPU/memory constraints; verify docker stats shows limits
- Cleanup Scripts: Often missing tear-down instructions; verify docker system df is clean after lab
- Error Handling: Often missing graceful failure modes; verify lab handles common errors (port conflicts, missing volumes)
- Cloud Parallels: Often missing explicit mappings; verify each Docker component maps to a specific AWS service
- Diátaxis Documentation: Often missing explanation documents; verify each lab has all 4 document types
Recovery Strategies
When pitfalls occur despite prevention, how to recover.
| Pitfall | Recovery Cost | Recovery Steps |
|---|---|---|
| Data loss from missing volumes | HIGH | Rebuild from scratch, add volumes, restore from backup if available |
| OOM killer crash | MEDIUM | Add memory limits to all services, restart Docker daemon, free host memory |
| Port conflicts | LOW | Change port mappings, kill conflicting processes, restart services |
| Permission errors | MEDIUM | Add user: directive, chown volume directories, rebuild containers |
| Network connectivity issues | LOW | Verify service names, check network attachment, ping between containers |
| Disk space exhaustion | HIGH | Run docker system prune -a, remove unused volumes, clear build cache |
| Orphaned containers | LOW | Run docker container prune, remove stopped containers manually |
Pitfall-to-Phase Mapping
How roadmap phases should address these pitfalls.
| Pitfall | Prevention Phase | Verification |
|---|---|---|
| Data loss on restart | Lab 1 - IAM & Sicurezza (introduce volumes), Lab 4 - Storage (critical) | Verify: docker compose down then docker compose up preserves data |
| Networking confusion | Lab 2 - Network (core concepts), Lab 5 - Database (application connections) | Verify: Service names resolve, inter-container communication works |
| OOM killer | Lab 3 - Compute (resource limits mandatory) | Verify: docker stats shows limits, no OOM errors under load |
| Running as root | Lab 1 - IAM & Sicurezza (user permissions) | Verify: docker exec -it <container> whoami shows non-root user |
| Port conflicts | Lab 2 - Network (port mapping), All labs (unique ports) | Verify: Multiple labs can run simultaneously without errors |
| depends_on without readiness | Lab 3 - Compute (healthchecks), Lab 5 - Database (initialization) | Verify: All services show "healthy" before app tries to connect |
| Orphaned resources | Lab 1 - IAM & Sicurezza (cleanup commands), All labs (cleanup section) | Verify: docker system df shows minimal unused space after cleanup |
Sources
- Docker Engine Security Documentation - HIGH confidence, official Docker security best practices
- Docker Volumes Documentation - HIGH confidence, official volume management reference
- Docker Compose File v3 Reference - HIGH confidence, official Compose specification
- Docker documentation on resource limits and OOM prevention
- Common educational patterns from container-based training courses
- Known issues from Docker-based learning environments
Pitfalls research for: Cloud Lab Projects with Docker Simulation
Researched: 2026-03-24