Domain Pitfalls

Domain: Cloud Lab Projects with Docker Simulation
Researched: 2026-03-24
Overall confidence: HIGH (based on official Docker documentation and educational best practices)

Critical Pitfalls

Pitfall 1: Data Loss on Container Restart

What goes wrong: Students lose all their work when containers are restarted or removed. Database data, uploaded files, and configuration changes disappear because data was written to the container's writable layer instead of volumes.

Why it happens:

  • Beginners don't understand Docker's layered filesystem
  • Tutorials often skip volume configuration for simplicity
  • The difference between docker stop vs docker rm isn't clear
  • Anonymous volumes vs named volumes confusion

How to avoid:

  • Always use named Docker volumes for persistent data
  • Explicitly declare volumes in docker-compose.yml under the top-level volumes key
  • Teach volume lifecycle: volumes persist after container removal
  • Use --mount flag syntax (more explicit) instead of -v for beginners

Warning signs:

  • No top-level volumes: section in docker-compose.yml
  • Using inline volume paths like ./data:/app/data without explaining persistence
  • Labs that work once but fail on restart
  • Students asking "where did my data go?"
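The warning signs above can be checked mechanically. The following is a minimal sketch, assuming the Compose file has already been parsed into a plain dictionary (for example with a YAML library, which is not shown here). The function name find_volume_problems is hypothetical, and only the short "source:target" volume syntax is handled.

```python
from typing import List


def find_volume_problems(compose: dict) -> List[str]:
    """Return warnings about volume usage in an already-parsed Compose file."""
    # Names declared under the top-level "volumes:" key.
    declared = set(compose.get("volumes") or {})
    problems = []
    for name, service in (compose.get("services") or {}).items():
        for entry in service.get("volumes") or []:
            if not isinstance(entry, str):
                continue  # long-form mappings are out of scope for this sketch
            parts = entry.split(":")
            source = parts[0]
            if len(parts) == 1:
                # Only a container path: Docker creates an anonymous volume.
                problems.append(f"{name}: anonymous volume at {source}")
            elif source.startswith((".", "/", "~")):
                # Host path: a bind mount, worth explaining to students.
                problems.append(f"{name}: bind mount {source}, confirm persistence is explained")
            elif source not in declared:
                problems.append(f"{name}: volume '{source}' is not declared under top-level volumes")
    return problems
```

Calling it on a service that lists "pgdata:/var/lib/postgresql/data" without a matching top-level declaration returns one warning; adding "pgdata" under the top-level volumes key clears it.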

Phase to address:

  • Lab 1 (IAM & Sicurezza) - Introduce volume concepts early
  • Lab 4 (Storage) - Critical for object storage persistence
  • Lab 5 (Database) - Essential for database data persistence


Pitfall 2: Networking Confusion - Localhost vs Container Names

What goes wrong: Students try to connect to services using localhost or 127.0.0.1 instead of container service names. Connections fail because containers have isolated network stacks.

Why it happens:

  • Mental model from local development doesn't translate to containers
  • Docker's embedded DNS isn't explained
  • Students don't understand that each container has its own localhost
  • Confusion between exposed ports and published ports

How to avoid:

  • Teach Docker's internal DNS resolution first
  • Always use service names for inter-container communication
  • Create network diagrams showing container isolation
  • Explain EXPOSE (docs) vs -p (publish) difference clearly

Warning signs:

  • Connection refused errors when using localhost
  • Students asking "why can't I connect to the database?"
  • Mixing up localhost inside container vs localhost on host
  • Port mapping confusion (internal vs external ports)

Phase to address:

  • Lab 2 (Network) - Core networking concepts
  • Lab 5 (Database) - Application-to-database connections


Pitfall 3: OOM Killer - Resource Exhaustion

What goes wrong: Containers or the entire Docker daemon are killed by the kernel's OOM (Out Of Memory) killer. Students lose work and the lab environment becomes unstable.

Why it happens:

  • No memory limits set in docker-compose
  • Multiple containers compete for host memory
  • Students' machines have limited RAM (8GB or less)
  • Memory leaks in student code go unchecked

How to avoid:

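  • Always set a memory limit for each service in docker-compose.yml, using either mem_limit or deploy.resources.limits.memory
  • mem_limit comes from Compose file format v2 and deploy.resources.limits.memory from v3+; the current Compose Specification accepts both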
  • Monitor with docker stats
  • Teach students to check container resource usage
  • Provide minimum host requirements (16GB RAM recommended)

Warning signs:

  • Containers randomly exit with code 137
  • Host system becomes slow
  • docker ps shows containers restarting repeatedly
  • System logs mention "OOM killer"
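Exit code 137 is not arbitrary: exit codes above 128 conventionally mean "killed by signal (code - 128)", and 137 - 128 = 9 is SIGKILL, the signal the OOM killer sends. The sketch below decodes such codes; the function name describe_exit_code is hypothetical, and the signal names assume a Unix-like host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code in terms of the 128 + signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"exit code {code} (unknown signal {number})"
        return f"killed by {name} (128 + {number})"
    return f"application error (exit code {code})"
```

SIGKILL can also come from docker kill or a forced stop, so 137 alone is not proof of an OOM kill. The State.OOMKilled field reported by docker inspect confirms it.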

Phase to address:

  • Lab 3 (Compute) - Resource limits are essential here
  • All labs - Enforce resource limits from the start


Pitfall 4: Security Misconfiguration - Running as Root

What goes wrong: Containers run as root by default, creating security vulnerabilities and permission issues with volume mounts.

Why it happens:

  • Docker's default behavior is root unless specified otherwise
  • Many base images do not set a USER at all, so processes default to root
  • Beginners don't understand Linux user permissions
  • Volume permission errors seem "easier" to fix with root

How to avoid:

  • Always specify user: directive in docker-compose or Dockerfile
  • Create non-root users in Dockerfiles
  • Teach Linux permission basics alongside Docker
  • Use Docker's user namespaces for advanced labs
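A lab application can also report these problems itself at startup instead of failing later with an opaque permission error. This is a minimal sketch, assuming a Unix-like container; the function name startup_user_problems and the idea of a single data directory are illustrative assumptions.

```python
import os
from typing import List, Optional


def startup_user_problems(data_dir: str, uid: Optional[int] = None) -> List[str]:
    """Return problems with the runtime user and the mounted data directory."""
    # The uid parameter exists so the check can be exercised without being root.
    uid = os.getuid() if uid is None else uid
    problems = []
    if uid == 0:
        problems.append("running as root (uid 0)")
    # os.access checks the real user of this process, and is False for missing paths.
    if not os.access(data_dir, os.W_OK):
        problems.append(f"{data_dir} is not writable by the current user")
    return problems
```

An entrypoint script could print the returned list and exit non-zero when it is not empty, which gives students a readable message at startup.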

Warning signs:

  • Permission denied errors with volumes
  • Files created as root on host system
  • docker inspect shows an empty Config.User, meaning the container runs as root
  • Running commands with sudo inside containers

Phase to address: Lab 1 (IAM & Sicurezza) - User permissions and security basics


Pitfall 5: Port Conflicts and Binding Issues

What goes wrong: Multiple students' labs conflict when using default ports. Services fail to start because ports are already in use.

Why it happens:

  • Hardcoded default ports (3306, 5432, 8080, etc.)
  • Multiple labs running simultaneously
  • Not teaching port mapping flexibility
  • Students don't know how to check occupied ports

How to avoid:

  • Use non-standard ports in examples (e.g., 5433 instead of 5432)
  • Teach students to check port usage: netstat -tuln or ss -tuln
  • Document all port mappings in lab materials
  • Provide scripts to detect port conflicts
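A conflict-detection script can be as small as trying to bind each port before the lab starts. The following is a sketch, not a definitive tool: the function names are hypothetical, only IPv4 TCP is checked, and ports below 1024 may be reported as busy simply because binding them needs extra privileges.

```python
import socket
from typing import Iterable, List


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Address already in use, or binding is not permitted.
            return False
        return True


def find_conflicts(ports: Iterable[int], host: str = "127.0.0.1") -> List[int]:
    """Return the subset of ports that cannot be bound."""
    return [port for port in ports if not port_is_free(port, host)]
```

A lab's pre-flight check could call find_conflicts with the host ports listed in its materials and tell the student which ones are taken before running docker compose up.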

Warning signs:

  • "port is already allocated" errors
  • Services failing to start silently
  • Students reporting "it worked yesterday but not today"
  • Multiple labs can't run simultaneously

Phase to address:

  • Lab 2 (Network) - Port mapping and exposure
  • All labs - Use unique port ranges per lab


Pitfall 6: depends_on Without Readiness Checks

What goes wrong: Services start but fail because dependencies aren't ready. Applications crash trying to connect to databases that are still initializing.

Why it happens:

  • By default, depends_on waits only for dependency containers to start, not for them to be ready
  • Databases need time to initialize (can take 10-30 seconds)
  • No healthcheck or readiness probe configured
  • Students assume "started" = "ready to accept connections"

How to avoid:

  • Implement healthchecks for all services, and have dependents wait on them with depends_on and condition: service_healthy
  • Use restart policies with delays
  • Teach the difference between "running" and "healthy"
  • Provide example healthcheck scripts for common services
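The gap between "started" and "ready" can be shown with a small wait loop that retries a TCP connection until it succeeds or a deadline passes. This is a sketch under assumptions: the function name wait_for_tcp and its default timings are hypothetical. An open port only shows the service is listening, so a real healthcheck for a database would normally run a service-specific command as well.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Retry a TCP connection until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Success means something is accepting connections on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An application container could call wait_for_tcp("db", 5432) before opening its first database connection, where "db" stands for whatever service name the lab uses.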

Warning signs:

  • Intermittent connection failures on lab startup
  • "Connection refused" errors that go away with manual retry
  • Services exiting and restarting
  • Need to manually restart containers to make things work

Phase to address:

  • Lab 3 (Compute) - Health checks and service readiness
  • Lab 5 (Database) - Database initialization timing


Pitfall 7: Orphaned Resources - Disk Space Exhaustion

What goes wrong: Students' disk space fills up with stopped containers, unused volumes, and dangling images. Docker becomes unusable.

Why it happens:

  • Students never clean up resources
  • docker system prune is never taught
  • Volumes aren't removed when containers are deleted
  • Image layers accumulate during development

How to avoid:

  • Teach cleanup commands in every lab
  • Provide cleanup scripts
  • Use --rm flag for one-off containers
  • Explain volume lifecycle and manual removal
  • Monitor disk usage in lab instructions
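Disk monitoring in lab instructions can start with a simple free-space check run before and after each lab. This is a minimal sketch: the function names and the 10% threshold are assumptions, and the path to check depends on where Docker stores its data (on Linux the default is /var/lib/docker; Docker Desktop keeps it inside a virtual machine, where this check does not apply directly).

```python
import shutil


def disk_free_fraction(path: str = "/") -> float:
    """Return the fraction of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_cleanup(path: str = "/", min_free: float = 0.10) -> bool:
    """Return True when free space drops below the min_free fraction."""
    return disk_free_fraction(path) < min_free
```

When needs_cleanup returns True, the lab can point students at its cleanup section rather than letting them hit "No space left on device" partway through an exercise.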

Warning signs:

  • Disk space warnings on host system
  • docker system df shows a large RECLAIMABLE value
  • Slow container startup due to image bloat
  • "No space left on device" errors

Phase to address:

  • Lab 1 (IAM & Sicurezza) - Docker basics and cleanup
  • All labs - Include cleanup section in each lab


Technical Debt Patterns

Shortcuts that seem reasonable but create long-term problems.

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| Using --privileged flag | Fixes permission issues quickly | Major security hole, teaches bad practices | NEVER in educational context |
| Hardcoding IPs in configs | Works immediately for one student | Doesn't scale, breaks on different machines | NEVER - use service names |
| Using network_mode: host | Simplifies networking | Breaks isolation, conflicts between labs | Only for debugging, never in final labs |
| Anonymous volumes | Less configuration | Data loss, difficult cleanup | NEVER - always use named volumes |
| Ignoring healthchecks | Faster startup | Flaky services, difficult debugging | NEVER - causes intermittent failures |
| Using latest tags | No version numbers to track | Unexpected breaking changes | NEVER - use specific versions |
| Exposing all ports | "It just works" | Security issues, port conflicts | NEVER - expose only what's needed |
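The "latest tags" row lends itself to an automated check over the image references used in lab files. The sketch below treats a reference as pinned when it carries a digest or an explicit tag other than latest. The function name is_pinned is hypothetical and the parsing is deliberately simplified, not a full image-reference parser.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference has a digest or a non-latest tag."""
    if "@" in image:
        return True  # digest reference such as name@sha256:...
    # Look only at the last path segment so a registry port is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all, which Docker resolves to latest
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")
```

For example, "postgres:16.2" counts as pinned, while "postgres", "postgres:latest" and "localhost:5000/app" do not.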

Integration Gotchas

Common mistakes when connecting to external services.

| Integration | Common Mistake | Correct Approach |
|---|---|---|
| MinIO (S3) | Using AWS endpoint s3.amazonaws.com | Use http://localhost:9000 from the host, or the container service name from other containers |
| MinIO Console | Confusing port 9000 (API) with 9001 (Console) | Document both ports clearly, separate service configs |
| PostgreSQL | Publishing on default host port 5432 (causes conflicts) | Publish on host port 5433 or another non-standard port |
| MySQL | Not setting MYSQL_ROOT_PASSWORD env var | Always set required environment variables |
| Redis | Publishing on default host port 6379 (may conflict) | Use a non-standard host port or rely on internal networking |
| Networks | Using legacy --link flag | Use user-defined networks and service names |
| Volumes | Bind mounting to non-existent host paths | Create host directories first or use named volumes |
| DNS | Editing /etc/hosts inside containers | Use Docker's embedded DNS with service names |

Performance Traps

Patterns that work at small scale but fail as usage grows.

| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| No resource limits | Works with 1-2 containers, host crashes at 5+ | Always set CPU and memory limits | 3-5 containers on 8 GB RAM |
| Single bridge network | Fine for simple apps, confusing for complex | Use multiple networks for isolation | 5+ services with different security needs |
| tmpfs for data | Fast but data loss on restart | Use named volumes for persistence | Immediately on container restart |
| Logging to json-file without rotation | Works initially, disk fills up | Set max-size and max-file options | After running labs repeatedly |
| Building images in Compose | Slow rebuilds, large layers | Use multi-stage builds, .dockerignore | After 3-4 iterations |
| Running everything on default network | Works until naming conflicts arise | Use custom networks per lab | When running multiple labs simultaneously |

Security Mistakes

Domain-specific security issues beyond general web security.

| Mistake | Risk | Prevention |
|---|---|---|
| Mounting Docker socket (/var/run/docker.sock) | Container breakout, root on host | NEVER do this in educational context |
| Running containers as root | Privilege escalation vulnerabilities | Always use user: directive |
| Exposing all ports to 0.0.0.0 | Services accessible externally | Bind to 127.0.0.1 or use internal networks |
| Using default credentials | Easy unauthorized access | Require password changes, document security |
| Sharing host PID namespace | Container can see host processes | Never use pid: host in labs |
| Ignoring cgroups | No resource isolation = DoS potential | Always set resource limits |
| Using --privileged flag | Complete host access | NEVER acceptable, teaches bad practices |
| Not using AppArmor/SELinux profiles | Missing security layer | Enable when available, document why |
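The 0.0.0.0 row can be checked against the short port syntax used in Compose files: a mapping without a host IP is published on all interfaces by default. This is a sketch with assumptions: the function name binds_all_interfaces is hypothetical, only IPv4 short-syntax strings are handled, and a daemon configured with a different default bind address would change the result.

```python
def binds_all_interfaces(port_mapping: str) -> bool:
    """Return True if a short-syntax port mapping publishes on every interface."""
    # Drop an optional protocol suffix such as "/tcp" or "/udp".
    spec = port_mapping.split("/", 1)[0]
    parts = spec.split(":")
    if len(parts) < 3:
        # "HOST_PORT:CONTAINER_PORT" or "CONTAINER_PORT": no host IP was given.
        return True
    # "HOST_IP:HOST_PORT:CONTAINER_PORT": an empty or wildcard IP is still all interfaces.
    return parts[0] in ("", "0.0.0.0")
```

For example, "5433:5432" is flagged, while "127.0.0.1:5433:5432" is not, because it is reachable only from the student's own machine.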

UX Pitfalls

Common user experience mistakes in this domain.

| Pitfall | User Impact | Better Approach |
|---|---|---|
| Silent failures | Students don't know what went wrong | Always provide error messages and logs |
| No progress indicators | Students think labs are broken | Show startup progress, especially for databases |
| Hidden dependencies | Labs fail mysteriously | List all prerequisites clearly, check at startup |
| Complex YAML errors | Students stuck on syntax | Validate YAML before running, provide examples |
| No rollback capability | Mistakes require starting over | Git version control, snapshot instructions |
| Missing cleanup steps | Accumulating cruft, confusion | Provide cleanup scripts for each lab |
| Unclear cloud parallels | Students don't see the point | Explicitly map Docker concepts to AWS services |
| Assuming prior knowledge | Beginners get lost | Provide background reading, glossary of terms |

"Looks Done But Isn't" Checklist

Things that appear complete but are missing critical pieces.

  • Volume Persistence: Often missing named volume declarations; verify data survives docker compose down followed by docker compose up
  • Network Isolation: Often missing a network per lab; verify containers can't accidentally talk between labs
  • Health Checks: Often missing readiness verification; verify docker ps shows "healthy", not just "running"
  • Resource Limits: Often missing CPU/memory constraints; verify docker stats shows limits
  • Cleanup Scripts: Often missing tear-down instructions; verify docker system df is clean after lab
  • Error Handling: Often missing graceful failure modes; verify lab handles common errors (port conflicts, missing volumes)
  • Cloud Parallels: Often missing explicit mappings; verify each Docker component maps to a specific AWS service
  • Diátaxis Documentation: Often missing explanation documents; verify each lab has all 4 document types

Recovery Strategies

When pitfalls occur despite prevention, how to recover.

| Pitfall | Recovery Cost | Recovery Steps |
|---|---|---|
| Data loss from missing volumes | HIGH | Rebuild from scratch, add volumes, restore from backup if available |
| OOM killer crash | MEDIUM | Add memory limits to all services, restart Docker daemon, free host memory |
| Port conflicts | LOW | Change port mappings, kill conflicting processes, restart services |
| Permission errors | MEDIUM | Add user: directive, chown volume directories, rebuild containers |
| Network connectivity issues | LOW | Verify service names, check network attachment, ping between containers |
| Disk space exhaustion | HIGH | Run docker system prune -a, remove unused volumes, clear build cache |
| Orphaned containers | LOW | Run docker container prune, remove stopped containers manually |

Pitfall-to-Phase Mapping

How roadmap phases should address these pitfalls.

| Pitfall | Prevention Phase | Verification |
|---|---|---|
| Data loss on restart | Lab 1 - IAM & Sicurezza (introduce volumes), Lab 4 - Storage (critical) | Verify: docker compose down then docker compose up preserves data |
| Networking confusion | Lab 2 - Network (core concepts), Lab 5 - Database (application connections) | Verify: Service names resolve, inter-container communication works |
| OOM killer | Lab 3 - Compute (resource limits mandatory) | Verify: docker stats shows limits, no OOM errors under load |
| Running as root | Lab 1 - IAM & Sicurezza (user permissions) | Verify: docker exec -it <container> whoami shows non-root user |
| Port conflicts | Lab 2 - Network (port mapping), All labs (unique ports) | Verify: Multiple labs can run simultaneously without errors |
| depends_on without readiness | Lab 3 - Compute (healthchecks), Lab 5 - Database (initialization) | Verify: All services show "healthy" before app tries to connect |
| Orphaned resources | Lab 1 - IAM & Sicurezza (cleanup commands), All labs (cleanup section) | Verify: docker system df shows minimal unused space after cleanup |

Pitfalls research for: Cloud Lab Projects with Docker Simulation
Researched: 2026-03-24