# Technical Specification - Secure Bash Log Ingestion (Sprint 2)

**Status:** 🟡 In Review
**Sprint:** 2
**Priority:** 🔴 Critical - Security Fix
**Author:** @tech-lead
**Date:** 2026-04-02
**Security Review:** Required before implementation

---

## 1. Overview

A rewrite of the log ingestion script with a focus on security, resolving the HIGH-severity vulnerabilities identified in the Sprint 1 Review. The script must be resistant to command injection, JSON injection, and path traversal.

### 1.1 Vulnerabilities Addressed (from Sprint 1 Review)

| Vulnerability | Severity | Sprint 1 Status | Sprint 2 Mitigation |
|---------------|----------|-----------------|---------------------|
| JSON Injection via Log Content | 🔴 HIGH | Incomplete escaping | jq-based JSON encoding |
| Path Traversal via LOG_SOURCES | 🔴 HIGH | Weak validation | Whitelist /var/log only |
| Command Injection | 🔴 HIGH | Implicit risk | Array-based commands, no eval |
| Race Condition on offset files | 🟡 MEDIUM | No atomicity | Atomic write (tmp + mv) |
| Information Disclosure | 🟡 MEDIUM | Full values logged | Masked sensitive data |
| No Webhook Authentication | 🔴 HIGH | None | HMAC-SHA256 signature |

---

## 2. Architecture

### 2.1 Modular Structure

```
secure_logwhisperer.sh
│
├── Configuration & Validation
│   ├── load_config()              # Load with validation
│   ├── validate_environment()     # Check jq, curl, permissions
│   └── validate_log_source()      # Whitelist /var/log paths
│
├── Input Sanitization
│   ├── sanitize_path()            # Path traversal prevention
│   ├── sanitize_log_line()        # DLP + control char removal
│   └── validate_line_length()     # MAX_LINE_LENGTH enforcement
│
├── Security Functions
│   ├── encode_json_payload()      # jq-based safe JSON encoding
│   ├── generate_hmac_signature()  # HMAC-SHA256 for webhook auth
│   └── sanitize_for_display()     # Mask sensitive data in logs
│
├── Core Logic
│   ├── tail_log_safe()            # Read logs without injection
│   ├── atomic_write_offset()      # Atomic file operations
│   └── dispatch_webhook_secure()  # Authenticated HTTP POST
│
└── Main Loop
    └── monitor_loop()             # Safe monitoring with error handling
```

### 2.2 Data Flow (Secure)

```
┌─────────────────┐
│   Log Source    │  /var/log/* only
│   (read-only)   │
└────────┬────────┘
         │
         ▼
┌──────────────────────────────────────┐
│  validate_log_source()               │
│  - Check path starts with /var/log   │
│  - Verify file is readable           │
│  - Reject symlinks outside /var/log  │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│  sanitize_log_line()                 │
│  - Remove control characters         │
│  - DLP: mask PII/secrets             │
│  - Truncate to MAX_LINE_LENGTH       │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│  encode_json_payload()               │
│  - Use jq for safe JSON encoding     │
│  - No manual string escaping         │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│  generate_hmac_signature()           │
│  - HMAC-SHA256(timestamp:payload)    │
│  - Prevent replay attacks            │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│  dispatch_webhook_secure()           │
│  - HTTPS only                        │
│  - X-LogWhisperer-Signature header   │
│  - Timeout and retry with backoff    │
└──────────────────────────────────────┘
```

---

## 3. Security Requirements

### 3.1 Input Validation

#### Path Validation (ANTI-PATH TRAVERSAL)

```bash
validate_log_source() {
    local path="$1"
    local resolved

    # Canonicalize first, so that ".." segments and symlinks cannot escape
    # the whitelist (readlink -f resolves both; GNU coreutils assumed)
    if ! resolved=$(readlink -f -- "$path") || [[ -z "$resolved" ]]; then
        log_error "Cannot resolve log source path: $path"
        return 1
    fi

    # The resolved path MUST be under /var/log/
    if [[ "$resolved" != /var/log/* ]]; then
        log_error "Invalid log source path: $path (resolves to $resolved, must be under /var/log/)"
        return 1
    fi

    # MUST be a regular file or FIFO
    if [[ ! -f "$resolved" && ! -p "$resolved" ]]; then
        log_error "Log source is not a regular file or FIFO: $path"
        return 1
    fi

    # MUST be readable
    if [[ ! -r "$resolved" ]]; then
        log_error "Log source not readable: $path"
        return 1
    fi

    return 0
}
```

#### Log Line Sanitization (DLP + ANTI-INJECTION)

```bash
sanitize_log_line() {
    local line="$1"

    # Remove control characters (keeps printable characters, tab, LF, and CR).
    # tr takes octal escapes; \xNN hex escapes are not portable.
    line=$(printf '%s' "$line" | tr -d '\000-\010\013\014\016-\037\177')

    # DLP: mask sensitive patterns BEFORE truncating, so that a secret cut
    # short by truncation cannot slip past the patterns.
    # The "i" (case-insensitive) flag requires GNU sed.
    # Passwords
    line=$(printf '%s' "$line" | sed -E 's/(password|passwd|pwd)=[^[:space:]]+/\1=***/gi')

    # Email addresses
    line=$(printf '%s' "$line" | sed -E 's/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/[EMAIL]/g')

    # API keys and tokens (16+ alphanumeric chars)
    line=$(printf '%s' "$line" | sed -E 's/(api[_-]?key|token|secret)=[a-zA-Z0-9]{16,}/\1=***/gi')

    # IP addresses
    line=$(printf '%s' "$line" | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[IP]/g')

    # Truncate to MAX_LINE_LENGTH
    if [[ ${#line} -gt $MAX_LINE_LENGTH ]]; then
        line="${line:0:$MAX_LINE_LENGTH}...[truncated]"
    fi

    printf '%s' "$line"
}
```

### 3.2 Safe JSON Encoding

#### ANTI-JSON INJECTION: Use jq

```bash
encode_json_payload() {
    local client_id="$1"
    local hostname="$2"
    local source="$3"
    local severity="$4"
    local raw_log="$5"
    local pattern="$6"
    local timestamp
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

    # Use jq for safe JSON encoding - no manual escaping.
    # --arg passes each value as a JSON string, so quotes, backslashes, and
    # newlines in the log content are escaped by jq itself.
    jq -n \
        --arg client_id "$client_id" \
        --arg hostname "$hostname" \
        --arg source "$source" \
        --arg severity "$severity" \
        --arg timestamp "$timestamp" \
        --arg raw_log "$raw_log" \
        --arg pattern "$pattern" \
        '{
            client_id: $client_id,
            hostname: $hostname,
            source: $source,
            severity: $severity,
            timestamp: $timestamp,
            raw_log: $raw_log,
            matched_pattern: $pattern
        }'
}
```

**Requirement:** `jq` must be installed. The script exits with an error if it is missing.
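
The "exit if missing" requirement could be checked up front along the following lines. This is a minimal sketch under stated assumptions, not the final `validate_environment()`: the helper name `find_missing_commands` is hypothetical, and the command list is taken from the dependency table in section 5.1.

```bash
# Hypothetical helper: print each command from the argument list that is
# not found on PATH (one per line). Returns 1 if any command is missing.
find_missing_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            missing=1
        fi
    done
    return "$missing"
}

# One possible shape for validate_environment(): fail when any required
# tool is absent and name the missing tools in the error message.
validate_environment() {
    local missing
    if ! missing=$(find_missing_commands jq curl openssl date); then
        printf 'FATAL: missing required commands: %s\n' \
            "$(printf '%s' "$missing" | tr '\n' ' ')" >&2
        return 1
    fi
}
```

The main script would then call something like `validate_environment || exit 1` before loading any log source.
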
### 3.3 Webhook Authentication

#### HMAC-SHA256 Signature

```bash
generate_hmac_signature() {
    local payload="$1"
    local timestamp
    timestamp=$(date +%s)

    # Generate signature: HMAC-SHA256 over "<timestamp>:<payload>".
    # Note: -hmac passes the secret as a command-line argument, which may be
    # visible to other local users in the process list.
    local signature
    signature=$(printf '%s:%s' "$timestamp" "$payload" | \
        openssl dgst -sha256 -hmac "$CLIENT_SECRET" | \
        sed 's/^.* //')

    printf '%s:%s' "$timestamp" "$signature"
}

dispatch_webhook_secure() {
    local payload="$1"
    local sig_data
    sig_data=$(generate_hmac_signature "$payload")

    local timestamp=${sig_data%%:*}
    local signature=${sig_data#*:}

    # Enforce HTTPS
    if [[ ! "$WEBHOOK_URL" =~ ^https:// ]]; then
        log_error "Webhook URL must use HTTPS"
        return 1
    fi

    # Send with signature headers.
    # --fail makes curl return non-zero on HTTP errors (status >= 400).
    # --retry-delay is omitted on purpose: a fixed delay would replace
    # curl's default exponential backoff between retries.
    curl -s --fail -X POST "$WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -H "X-LogWhisperer-Signature: $signature" \
        -H "X-LogWhisperer-Timestamp: $timestamp" \
        -d "$payload" \
        --max-time 30 \
        --retry 3
}
```

**New Configuration:** `CLIENT_SECRET` (shared secret for HMAC)

### 3.4 Atomic File Operations

#### ANTI-RACE CONDITION

```bash
atomic_write_offset() {
    local offset_file="$1"
    local offset_value="$2"
    local tmp_file="${offset_file}.tmp.$$"

    # Write to a temp file (PID suffix) next to the target, so the final
    # rename stays on one filesystem
    if ! printf '%s' "$offset_value" > "$tmp_file"; then
        rm -f -- "$tmp_file"
        return 1
    fi

    # Atomic move (a rename within the same filesystem)
    mv -f -- "$tmp_file" "$offset_file"
}
```

### 3.5 Safe Command Execution

#### ANTI-COMMAND INJECTION

```bash
# WRONG: "read" without IFS= trims leading/trailing whitespace, and a path
# that starts with "-" would be parsed by tail as an option
tail -n 0 -F "$log_source" 2>/dev/null | while read -r line; do
    ...
done

# CORRECT: array-based, no interpretation ("--" ends option parsing).
# "local" assumes this snippet lives inside a function.
local tail_cmd=("tail" "-n" "0" "-F" "--" "$log_source")
"${tail_cmd[@]}" 2>/dev/null | while IFS= read -r line; do
    ...
done
```

**Rules:**
- No `eval` anywhere
- No backtick command substitution on user input
- Use `printf %q` if a variable must be embedded in a command string
- Use arrays for complex commands

---

## 4. Configuration

### 4.1 New Config Parameters

```bash
# config.env
WEBHOOK_URL="https://your-n8n-instance.com/webhook/logwhisperer"
CLIENT_ID="unique-client-uuid"
CLIENT_SECRET="shared-secret-for-hmac"  # NEW
LOG_SOURCES="/var/log/syslog,/var/log/nginx/error.log"
POLL_INTERVAL=5
MAX_LINE_LENGTH=2000
OFFSET_DIR="/var/lib/logwhisperer"
```

### 4.2 Validation Requirements

| Parameter | Validation | Failure Action |
|-----------|------------|----------------|
| `WEBHOOK_URL` | MUST be HTTPS | Exit with error |
| `CLIENT_ID` | Valid UUID format | Exit with error |
| `CLIENT_SECRET` | Min 32 chars, no spaces | Exit with error |
| `LOG_SOURCES` | All paths MUST be under /var/log | Skip invalid paths, log warning |
| `MAX_LINE_LENGTH` | Integer between 500 and 10000 | Use default 2000 |

---

## 5. Dependencies

### 5.1 Required

| Tool | Purpose | Check in Script |
|------|---------|-----------------|
| `jq` | Safe JSON encoding | Exit if missing |
| `curl` | HTTP POST | Exit if missing |
| `openssl` | HMAC-SHA256 | Exit if missing |
| `date` | Timestamp generation | Exit if missing |

### 5.2 Optional

| Tool | Purpose | Fallback |
|------|---------|----------|
| `systemctl` | Service management | Skip systemd setup |

---

## 6. Error Handling

### 6.1 Error Levels

| Level | Description | Action |
|-------|-------------|--------|
| `FATAL` | Config invalid, security violation | Exit immediately |
| `ERROR` | Single log source unreadable | Skip source, continue |
| `WARN` | Retryable error (network) | Retry with backoff |
| `INFO` | Normal operation | Log and continue |

### 6.2 Graceful Degradation

```bash
# If one log source fails, continue with others
for source in "${LOG_SOURCES_ARRAY[@]}"; do
    if ! validate_log_source "$source"; then
        log_error "Skipping invalid source: $source"
        continue
    fi
    monitor_source "$source" &
done

# Keep the parent process alive while the background monitors run
wait
```

---

## 7. Testing Strategy

### 7.1 Security Test Cases (RED Phase)

| Test ID | Description | Expected Behavior |
|---------|-------------|-------------------|
| `SEC-001` | Path `/etc/passwd` in LOG_SOURCES | Rejected, logged as error |
| `SEC-002` | Path `../../../etc/shadow` | Rejected, logged as error |
| `SEC-003` | Symlink to `/etc/shadow` from /var/log | Rejected, logged as error |
| `SEC-004` | Log line with `"; rm -rf /;"` | Sanitized, no command execution |
| `SEC-005` | Log line with `password=secret123` | Masked as `password=***` in payload |
| `SEC-006` | Log line with `user@example.com` | Masked as `[EMAIL]` in payload |
| `SEC-007` | Missing jq binary | Exit with clear error message |
| `SEC-008` | HTTP webhook URL (non-HTTPS) | Exit with error |
| `SEC-009` | Payload tampering (wrong HMAC) | Webhook rejects (tested server-side) |
| `SEC-010` | Offset file corruption | Detected, reset to 0 (safe) |

### 7.2 Integration Tests

| Test ID | Description | Expected |
|---------|-------------|----------|
| `INT-001` | End-to-end with valid log | Payload delivered with HMAC |
| `INT-002` | Network timeout | Retry 3x, then skip |
| `INT-003` | Webhook returns 4xx | Stop retry, log error |
| `INT-004` | Multiple concurrent log sources | All monitored correctly |

---

## 8. Acceptance Criteria

### 8.1 Security

- [ ] All log sources validated against /var/log whitelist
- [ ] JSON encoding uses jq (no manual escaping)
- [ ] All payloads signed with HMAC-SHA256
- [ ] HTTPS enforced for webhooks
- [ ] DLP masking applied to PII/secrets
- [ ] Atomic writes for offset files
- [ ] No eval or command substitution on user input

### 8.2 Functionality

- [ ] Backward compatible with Sprint 1 config (minus security fixes)
- [ ] All Sprint 1 tests still pass (except where behavior changed for security)
- [ ] New security tests pass
- [ ] Graceful handling of missing jq/curl/openssl

### 8.3 Performance

- [ ] No significant slowdown (< 10% overhead)
- [ ] Sanitization completes in < 10ms per line
- [ ] HMAC generation < 5ms per payload

---

## 9. Migration from Sprint 1

### 9.1 Breaking Changes

| Aspect | Sprint 1 | Sprint 2 | Migration |
|--------|----------|----------|-----------|
| JSON Encoding | Manual sed | jq required | Install jq |
| Webhook Auth | None | HMAC | Add CLIENT_SECRET |
| Path Validation | None | /var/log only | Update config if needed |
| Dependencies | bash, curl | + jq, openssl | Update install.sh |

### 9.2 Upgrade Path

```bash
# install.sh will:
# 1. Check for jq, install if missing
# 2. Generate CLIENT_SECRET if not present
# 3. Validate existing LOG_SOURCES
# 4. Warn about paths outside /var/log
```

---

## 10. Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| jq not available on target | Medium | High | Fallback to Python JSON encoding |
| Performance degradation | Low | Medium | Benchmark tests |
| False positives in DLP | Medium | Low | Configurable DLP patterns |
| Backward compatibility | Medium | Medium | Major version bump, migration guide |

---

## 11. Notes for Implementation

### 11.1 @context-auditor Checklist

Before implementation, verify:

- [ ] Latest jq documentation for JSON encoding options
- [ ] Best practices for HMAC-SHA256 in bash
- [ ] curl security flags for production use

### 11.2 @security-auditor Pre-implementation Review

Required before GREEN phase:

- [ ] Review validate_log_source() logic
- [ ] Verify sanitize_log_line() regex patterns
- [ ] Check HMAC implementation for timing attacks
- [ ] Confirm atomic write implementation

### 11.3 @qa-engineer Test Requirements

Create tests for:

- [ ] All SEC-* test cases (RED phase)
- [ ] Integration with webhook signature verification
- [ ] Performance benchmarks

---

*Security First. Safety Always.*
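
As a starting point for the SEC-010 case ("Offset file corruption: detected, reset to 0"), the read side of the offset file might look like the sketch below. The function name `read_offset_safe` is hypothetical and does not appear in the module list; the only assumption about the file format is that a valid offset consists solely of decimal digits, as written by `atomic_write_offset()`.

```bash
# Hypothetical helper: print the stored offset, or 0 when the offset file
# is missing, unreadable, empty, or contains anything but decimal digits.
read_offset_safe() {
    local offset_file="$1"
    local value=""

    if [ -r "$offset_file" ]; then
        # Read the whole file; command substitution drops trailing newlines
        value=$(cat -- "$offset_file" 2>/dev/null)
    fi

    case "$value" in
        ''|*[!0-9]*)
            # Corrupt or empty content: fall back to the safe default
            printf '0'
            ;;
        *)
            printf '%s' "$value"
            ;;
    esac
}
```

Because the value is only ever compared against a digit pattern and never evaluated, a tampered offset file cannot inject commands; it simply causes the source to be re-read from the start.
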