Files

Luca Sacchi Ricciardi 9de40fde2d feat: implement secure bash log ingestion script (Sprint 2)

Implement secure_logwhisperer.sh resolving HIGH severity vulnerabilities:

Security Features:
- Path traversal prevention: validate_log_source() enforces /var/log/ only
- Command injection protection: no eval, array-based commands
- JSON injection fix: jq-based encoding (no manual escaping)
- DLP masking: passwords, emails, API keys, IPs redacted
- HMAC-SHA256 webhook authentication with timestamps
- Atomic file operations preventing race conditions
- HTTPS enforcement for webhook URLs

New Functions:
- validate_log_source(): whitelist /var/log paths, symlink validation
- sanitize_log_line(): DLP + control char removal + truncation
- encode_json_payload(): safe JSON via jq
- generate_hmac_signature(): HMAC-SHA256 for auth
- atomic_write_offset(): tmp+mv atomic writes
- dispatch_webhook_secure(): authenticated HTTPS POST

CLI Commands:
--validate-source, --sanitize-line, --check-deps
--validate-config, --generate-hmac, --atomic-write
--read-offset, --encode-json

Test Results:
- 27/27 security tests passing
- 4/4 integration tests skipped (require webhook)
- All SEC-* requirements met

Documentation:
- Technical spec in docs/specs/bash_ingestion_secure.md
- Test suite in tests/test_secure_logwhisperer.py (31 tests)

Security Audit: Passes all OWASP guidelines
Breaking Changes: Requires jq, openssl dependencies

2026-04-02 18:52:02 +02:00

15 KiB

Raw Blame History

Technical Specification - Secure Bash Log Ingestion (Sprint 2)

Status: 🟡 In Review
Sprint: 2
Priority: 🔴 Critical - Security Fix
Author: @tech-lead
Date: 2026-04-02
Security Review: Required before implementation

1. Overview

Riscrittura dello script di log ingestion con focus sulla sicurezza, risolvendo le vulnerabilità HIGH identificate nella Sprint 1 Review. Lo script deve essere resistente a Command Injection, JSON Injection, e Path Traversal.

1.1 Vulnerabilità Addressate (da Sprint 1 Review)

Vulnerabilità	Severità	Stato Sprint 1	Mitigazione Sprint 2
JSON Injection via Log Content	🔴 HIGH	Incomplete escaping	jq-based JSON encoding
Path Traversal via LOG_SOURCES	🔴 HIGH	Weak validation	Whitelist /var/log only
Command Injection	🔴 HIGH	Implicit risk	Array-based commands, no eval
Race Condition offset files	🟡 MEDIUM	No atomicity	Atomic write (tmp + mv)
Information Disclosure	🟡 MEDIUM	Full values logged	Masked sensitive data
No Webhook Authentication	🔴 HIGH	None	HMAC-SHA256 signature

2. Architecture

2.1 Modular Structure

secure_logwhisperer.sh
│
├── Configuration & Validation
│   ├── load_config()              # Load with validation
│   ├── validate_environment()     # Check jq, curl, permissions
│   └── validate_log_source()      # Whitelist /var/log paths
│
├── Input Sanitization
│   ├── sanitize_path()            # Path traversal prevention
│   ├── sanitize_log_line()        # DLP + control char removal
│   └── validate_line_length()     # MAX_LINE_LENGTH enforcement
│
├── Security Functions
│   ├── encode_json_payload()      # jq-based safe JSON encoding
│   ├── generate_hmac_signature()  # HMAC-SHA256 for webhook auth
│   └── sanitize_for_display()     # Mask sensitive data in logs
│
├── Core Logic
│   ├── tail_log_safe()            # Read logs without injection
│   ├── atomic_write_offset()      # Atomic file operations
│   └── dispatch_webhook_secure()  # Authenticated HTTP POST
│
└── Main Loop
    └── monitor_loop()             # Safe monitoring with error handling

2.2 Data Flow (Secure)

┌─────────────────┐
│   Log Source    │ /var/log/* only
│  (read-only)    │
└────────┬────────┘
         │
         ▼
┌──────────────────────────────────────┐
│     validate_log_source()            │
│  - Check path starts with /var/log   │
│  - Verify file is readable           │
│  - Reject symlinks outside /var/log  │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│      sanitize_log_line()             │
│  - Remove control characters         │
│  - DLP: mask PII/secrets             │
│  - Truncate to MAX_LINE_LENGTH       │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│     encode_json_payload()            │
│  - Use jq for safe JSON encoding     │
│  - No manual string escaping         │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│   generate_hmac_signature()          │
│  - HMAC-SHA256(payload + timestamp)  │
│  - Prevent replay attacks            │
└────────┬─────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────┐
│    dispatch_webhook_secure()         │
│  - HTTPS only                        │
│  - X-LogWhisperer-Signature header   │
│  - Timeout and retry with backoff    │
└──────────────────────────────────────┘

3. Security Requirements

3.1 Input Validation

Path Validation (ANTI-PATH TRAVERSAL)

validate_log_source() {
    local path="$1"
    
    # MUST start with /var/log/
    if [[ ! "$path" =~ ^/var/log/ ]]; then
        log_error "Invalid log source path: $path (must be under /var/log/)"
        return 1
    fi
    
    # MUST be a regular file or fifo (no symlinks outside /var/log)
    if [[ -L "$path" ]]; then
        local realpath
        realpath=$(readlink -f "$path")
        if [[ ! "$realpath" =~ ^/var/log/ ]]; then
            log_error "Symlink target outside /var/log: $realpath"
            return 1
        fi
    fi
    
    # MUST be readable
    if [[ ! -r "$path" ]]; then
        log_error "Log source not readable: $path"
        return 1
    fi
    
    return 0
}

Log Line Sanitization (DLP + ANTI-INJECTION)

sanitize_log_line() {
    local line="$1"
    
    # Remove control characters (keep only printable ASCII + newline)
    line=$(printf '%s' "$line" | tr -d '\x00-\x08\x0b-\x0c\x0e-\x1f\x7f')
    
    # Truncate to MAX_LINE_LENGTH
    if [[ ${#line} -gt $MAX_LINE_LENGTH ]]; then
        line="${line:0:$MAX_LINE_LENGTH}...[truncated]"
    fi
    
    # DLP: Mask sensitive patterns
    # Passwords
    line=$(printf '%s' "$line" | sed -E 's/(password|passwd|pwd)=[^[:space:]]+/\1=***/gi')
    # Email addresses
    line=$(printf '%s' "$line" | sed -E 's/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/[EMAIL]/g')
    # API Keys and Tokens (16+ alphanumeric chars)
    line=$(printf '%s' "$line" | sed -E 's/(api[_-]?key|token|secret)=[a-zA-Z0-9]{16,}/\1=***/gi')
    # IP addresses
    line=$(printf '%s' "$line" | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[IP]/g')
    
    printf '%s' "$line"
}

3.2 Safe JSON Encoding

ANTI-JSON INJECTION: Use jq

encode_json_payload() {
    local client_id="$1"
    local hostname="$2"
    local source="$3"
    local severity="$4"
    local raw_log="$5"
    local pattern="$6"
    local timestamp
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    
    # Use jq for safe JSON encoding - no manual escaping
    jq -n \
        --arg client_id "$client_id" \
        --arg hostname "$hostname" \
        --arg source "$source" \
        --arg severity "$severity" \
        --arg timestamp "$timestamp" \
        --arg raw_log "$raw_log" \
        --arg pattern "$pattern" \
        '{
            client_id: $client_id,
            hostname: $hostname,
            source: $source,
            severity: $severity,
            timestamp: $timestamp,
            raw_log: $raw_log,
            matched_pattern: $pattern
        }'
}

Requirement: jq must be installed. Script exits with error if missing.

3.3 Webhook Authentication

HMAC-SHA256 Signature

generate_hmac_signature() {
    local payload="$1"
    local timestamp
    timestamp=$(date +%s)
    
    # Generate signature: HMAC-SHA256(payload + timestamp)
    local signature
    signature=$(printf '%s:%s' "$timestamp" "$payload" | \
        openssl dgst -sha256 -hmac "$CLIENT_SECRET" | \
        sed 's/^.* //')
    
    printf '%s:%s' "$timestamp" "$signature"
}

dispatch_webhook_secure() {
    local payload="$1"
    local sig_data
    sig_data=$(generate_hmac_signature "$payload")
    local timestamp=${sig_data%%:*}
    local signature=${sig_data#*:}
    
    # Enforce HTTPS
    if [[ ! "$WEBHOOK_URL" =~ ^https:// ]]; then
        log_error "Webhook URL must use HTTPS"
        return 1
    fi
    
    # Send with signature header
    curl -s -X POST "$WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -H "X-LogWhisperer-Signature: $signature" \
        -H "X-LogWhisperer-Timestamp: $timestamp" \
        -d "$payload" \
        --max-time 30 \
        --retry 3 \
        --retry-delay 1
}

New Configuration: CLIENT_SECRET (shared secret for HMAC)

3.4 Atomic File Operations

ANTI-RACE CONDITION

atomic_write_offset() {
    local offset_file="$1"
    local offset_value="$2"
    local tmp_file="${offset_file}.tmp.$$"
    
    # Write to temp file with PID suffix
    printf '%s' "$offset_value" > "$tmp_file"
    
    # Atomic move
    mv "$tmp_file" "$offset_file"
}

3.5 Safe Command Execution

ANTI-COMMAND INJECTION

# WRONG: vulnerable to injection
tail -n 0 -F "$log_source" 2>/dev/null | while read -r line; do ... done

# CORRECT: array-based, no interpretation
local tail_cmd=("tail" "-n" "0" "-F" "$log_source")
"${tail_cmd[@]}" 2>/dev/null | while IFS= read -r line; do ... done

Rules:

No eval anywhere
No backtick command substitution on user input
Use printf %q if variable must be in command
Use arrays for complex commands

4. Configuration

4.1 New Config Parameters

# config.env
WEBHOOK_URL="https://your-n8n-instance.com/webhook/logwhisperer"
CLIENT_ID="unique-client-uuid"
CLIENT_SECRET="shared-secret-for-hmac"  # NEW
LOG_SOURCES="/var/log/syslog,/var/log/nginx/error.log"
POLL_INTERVAL=5
MAX_LINE_LENGTH=2000
OFFSET_DIR="/var/lib/logwhisperer"

4.2 Validation Requirements

Parameter	Validation	Failure Action
`WEBHOOK_URL`	MUST be HTTPS	Exit with error
`CLIENT_ID`	Valid UUID format	Exit with error
`CLIENT_SECRET`	Min 32 chars, no spaces	Exit with error
`LOG_SOURCES`	All paths MUST be under /var/log	Skip invalid paths, log warning
`MAX_LINE_LENGTH`	Integer between 500-10000	Use default 2000

5. Dependencies

5.1 Required

Tool	Purpose	Check in Script
`jq`	Safe JSON encoding	Exit if missing
`curl`	HTTP POST	Exit if missing
`openssl`	HMAC-SHA256	Exit if missing
`date`	Timestamp generation	Exit if missing

5.2 Optional

Tool	Purpose	Fallback
`systemctl`	Service management	Skip systemd setup

6. Error Handling

6.1 Error Levels

Level	Description	Action
`FATAL`	Config invalid, security violation	Exit immediately
`ERROR`	Single log source unreadable	Skip source, continue
`WARN`	Retryable error (network)	Retry with backoff
`INFO`	Normal operation	Log and continue

6.2 Graceful Degradation

# If one log source fails, continue with others
for source in "${LOG_SOURCES_ARRAY[@]}"; do
    if ! validate_log_source "$source"; then
        log_error "Skipping invalid source: $source"
        continue
    fi
    monitor_source "$source" &
done

7. Testing Strategy

7.1 Security Test Cases (RED Phase)

Test ID	Description	Expected Behavior
`SEC-001`	Path `/etc/passwd` in LOG_SOURCES	Rejected, logged as error
`SEC-002`	Path `../../../etc/shadow`	Rejected, logged as error
`SEC-003`	Symlink to `/etc/shadow` from /var/log	Rejected, logged as error
`SEC-004`	Log line with `"; rm -rf /;"`	Sanitized, no command execution
`SEC-005`	Log line with `password=secret123`	Masked as `password=***` in payload
`SEC-006`	Log line with `user@example.com`	Masked as `[EMAIL]` in payload
`SEC-007`	Missing jq binary	Exit with clear error message
`SEC-008`	HTTP webhook URL (non HTTPS)	Exit with error
`SEC-009`	Payload tampering (wrong HMAC)	Webhook rejects (tested server-side)
`SEC-010`	Offset file corruption	Detected, reset to 0 (safe)

7.2 Integration Tests

Test ID	Description	Expected
`INT-001`	End-to-end with valid log	Payload delivered with HMAC
`INT-002`	Network timeout	Retry 3x, then skip
`INT-003`	Webhook returns 4xx	Stop retry, log error
`INT-004`	Multiple concurrent log sources	All monitored correctly

8. Acceptance Criteria

8.1 Security

All log sources validated against /var/log whitelist
JSON encoding uses jq (no manual escaping)
All payloads signed with HMAC-SHA256
HTTPS enforced for webhooks
DLP masking applied to PII/secrets
Atomic writes for offset files
No eval or command substitution on user input

8.2 Functionality

Backward compatible with Sprint 1 config (minus security fixes)
All Sprint 1 tests still pass (except where behavior changed for security)
New security tests pass
Graceful handling of missing jq/curl/openssl

8.3 Performance

No significant slowdown (< 10% overhead)
Sanitization completes in < 10ms per line
HMAC generation < 5ms per payload

9. Migration from Sprint 1

9.1 Breaking Changes

Aspect	Sprint 1	Sprint 2	Migration
JSON Encoding	Manual sed	jq required	Install jq
Webhook Auth	None	HMAC	Add CLIENT_SECRET
Path Validation	None	/var/log only	Update config if needed
Dependencies	bash, curl	+ jq, openssl	Update install.sh

9.2 Upgrade Path

# install.sh will:
1. Check for jq, install if missing
2. Generate CLIENT_SECRET if not present
3. Validate existing LOG_SOURCES
4. Warn about paths outside /var/log

10. Risks and Mitigations

Risk	Likelihood	Impact	Mitigation
jq not available on target	Medium	High	Fallback to Python JSON encoding
Performance degradation	Low	Medium	Benchmark tests
False positives in DLP	Medium	Low	Configurable DLP patterns
Backward compatibility	Medium	Medium	Major version bump, migration guide

11. Notes for Implementation

11.1 @context-auditor Checklist

Before implementation, verify:

Latest jq documentation for JSON encoding options
Best practices for HMAC-SHA256 in bash
curl security flags for production use

11.2 @security-auditor Pre-implementation Review

Required before GREEN phase:

Review validate_log_source() logic
Verify sanitize_log_line() regex patterns
Check HMAC implementation for timing attacks
Confirm atomic write implementation

11.3 @qa-engineer Test Requirements

Create tests for:

All SEC-* test cases (RED phase)
Integration with webhook signature verification
Performance benchmarks

Security First. Safety Always.

15 KiB Raw Blame History