Adopt CrowdSec for intrusion detection

ADR-0040 ACCEPTED · 2026-01-30

CrowdSec for Intrusion Detection

Context

The application runs on a public-facing Hetzner cluster with SSH and Traefik exposed to the internet. Any public endpoint receives constant automated scanning from bots probing for common weaknesses (WordPress and phpMyAdmin paths, command injection, etc.).

Suricata, a deep packet inspection (DPI) IDS, was evaluated previously but provided limited value:

Most traffic is TLS-encrypted, so payloads can't be inspected
High false-positive rate from TCP stream reassembly
Heavy resource usage and ongoing rule tuning required

A lightweight, log-based detection system better fits our architecture, in which application logs (written after TLS termination) are the source of truth for request content.

Decision

Adopt CrowdSec for intrusion detection using log-based analysis with community threat intelligence.

Architecture

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Application     │────▶│    CrowdSec      │────▶│    Bouncer       │
│  Logs (Traefik,  │     │    Engine        │     │   (nftables)     │
│  SSH)            │     │                  │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │  CrowdSec Cloud  │
                       │  (CAPI - shared  │
                       │   blocklists)    │
                       └──────────────────┘

Detection Approach: Log-Based (Reactive)

CrowdSec reads application logs after requests are processed:

Request hits Traefik → is processed → is logged
CrowdSec parses the log line → matches a scenario (e.g., an SQL injection pattern)
Engine creates a ban decision → nftables bouncer blocks future requests from that IP

Trade-off: The first malicious request (or the series of requests needed to trigger a scenario) gets through. This is acceptable because:

Single probes returning 404 are reconnaissance, not successful attacks
Most scenarios require multiple bad requests before triggering (reduces false positives)
Attackers learn nothing useful from error responses
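Many CrowdSec scenarios behave like leaky buckets: each matching log event fills a per-IP bucket that drains over time, and a decision is produced only when the bucket overflows. The minimal Python model below shows why the first few probes get through and why a slow scanner may never trigger. It is a sketch, not CrowdSec's implementation; the `capacity` and `leak_interval` values, the 404-only filter, and the `(timestamp, ip, status)` event shape are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Per-IP bucket: fills by one per event, drains one unit per leak_interval."""

    capacity: int          # events the bucket holds before overflowing
    leak_interval: float   # seconds for one unit to drain out
    level: float = 0.0
    last_ts: float = 0.0

    def add(self, ts: float) -> bool:
        """Record one matching event at time ts; return True on overflow."""
        elapsed = max(0.0, ts - self.last_ts)
        # Drain whatever leaked out since the previous event.
        self.level = max(0.0, self.level - elapsed / self.leak_interval)
        self.last_ts = ts
        self.level += 1
        return self.level > self.capacity


def detect(events, capacity=5, leak_interval=10.0):
    """Replay (ts, ip, status) log events in time order; return {ip: ban_ts}.

    Hypothetical scenario: only 404 responses count as probe events.
    """
    buckets: dict[str, LeakyBucket] = {}
    bans: dict[str, float] = {}
    for ts, ip, status in sorted(events):
        if ip in bans or status != 404:
            continue  # already banned (bouncer drops traffic) or not a probe
        if ip not in buckets:
            buckets[ip] = LeakyBucket(capacity, leak_interval, last_ts=ts)
        if buckets[ip].add(ts):
            bans[ip] = ts  # the "ban decision" handed to the bouncer
    return bans


# A burst of 7 probes: the first 5 are only logged, the 6th overflows.
burst = [(float(t), "203.0.113.7", 404) for t in range(7)]
# A slow scanner: one probe every 20 s drains fully between hits.
slow = [(t * 20.0, "198.51.100.9", 404) for t in range(10)]
print(detect(burst + slow))  # {'203.0.113.7': 5.0}
```

Raising `capacity` trades fewer false positives for more requests getting through before a ban, which is the same trade-off described above.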

Proactive Blocking: CAPI Blocklists

CrowdSec's Central API (CAPI) provides crowd-sourced threat intelligence:

IPs attacking other CrowdSec users are shared (anonymized)
Known-bad IPs are blocked before they reach our servers
70k+ users contribute to the shared blocklist

This provides proactive protection without inline inspection overhead.
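On the enforcement side, the bouncer only has to check a source address against the current decisions, whether they came from local scenarios or from CAPI. The Python sketch below illustrates that matching with the standard `ipaddress` module. The `(value, origin)` decision tuples and the `"crowdsec"`/`"CAPI"` origin strings are assumptions for illustration; an nftables bouncer pushes decisions into nftables sets so the kernel does the lookup, rather than scanning a list like this.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_blocklist(decisions: Iterable[Tuple[str, str]]) -> List[Tuple[Network, str]]:
    """Turn (value, origin) decisions into networks; single IPs become /32 or /128."""
    return [(ipaddress.ip_network(value, strict=False), origin) for value, origin in decisions]


def match(blocklist: List[Tuple[Network, str]], src: str) -> Optional[str]:
    """Return the origin of the first decision covering src, or None if allowed."""
    addr = ipaddress.ip_address(src)
    for net, origin in blocklist:
        if addr in net:  # evaluates to False when IP versions differ
            return origin
    return None


blocklist = build_blocklist([
    ("203.0.113.7", "crowdsec"),    # hypothetical local scenario ban
    ("198.51.100.0/24", "CAPI"),    # hypothetical community blocklist range
])
print(match(blocklist, "198.51.100.42"))  # CAPI
print(match(blocklist, "192.0.2.1"))      # None
```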

Future Consideration: AppSec WAF (Inline)

For applications requiring inline blocking (e.g., payment processing, PII handling), CrowdSec offers an AppSec component:

┌──────────┐     ┌──────────────┐     ┌──────────┐
│  Client  │────▶│   Traefik    │────▶│   App    │
└──────────┘     │  + CrowdSec  │     └──────────┘
                 │   Plugin     │
                 └──────┬───────┘
                        │ (inspect before routing)
                        ▼
                 ┌──────────────┐
                 │  CrowdSec    │
                 │  AppSec      │
                 │  (port 7422) │
                 └──────────────┘
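The key difference from the log-based flow is ordering: the request is checked before routing, so a blocked request never reaches the app, but every request pays for the check. The sketch below shows that shape as a Python WSGI middleware. The `inspect` callback is a hypothetical stand-in for the round trip the Traefik plugin makes to the AppSec component on port 7422; it is not CrowdSec's API. The toy rule that flags `union select` in the query string is illustrative only.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import unquote_plus

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class InlineInspection:
    """Hypothetical middleware: ask an inspector before routing to the app."""

    def __init__(self, app: WSGIApp, inspect: Callable[[dict], bool]):
        self.app = app
        self.inspect = inspect  # returns True when the request must be blocked

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if self.inspect(environ):  # extra hop on every request
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def toy_inspector(environ: dict) -> bool:
    """Illustrative rule only; real AppSec rule sets are far richer."""
    query = unquote_plus(environ.get("QUERY_STRING", "")).lower()
    return "union select" in query


def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def call(wsgi_app: WSGIApp, query: str) -> Tuple[str, bytes]:
    """Invoke a WSGI app with a minimal environ and capture status and body."""
    statuses: List[str] = []
    body = wsgi_app({"QUERY_STRING": query}, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(body)


guarded = InlineInspection(app, toy_inspector)
print(call(guarded, "id=1+UNION+SELECT+password"))  # ('403 Forbidden', b'Forbidden\n')
print(call(guarded, "id=1"))                         # ('200 OK', b'ok\n')
```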

Requirements:

Traefik bouncer plugin v1.2.0+
crowdsecurity/appsec-virtual-patching collection
AppSec acquisition config

Trade-offs:

Adds latency to every request (extra hop)
More complex configuration
Higher resource usage

Current stance: Not needed for the pre-launch travel app. Log-based detection combined with CAPI blocklists is sufficient. Revisit if the app starts handling sensitive data or experiences targeted attacks.

Rationale

Why log-based over inline WAF?

Simpler architecture, fewer failure modes
No latency added to legitimate requests
CAPI blocklists provide most proactive value anyway
Our attack surface is small (a GraphQL API, not legacy PHP with known exploits)

Why CrowdSec over alternatives?

Lightweight compared to Suricata/Snort
Works with encrypted traffic (reads decrypted logs)
Community blocklists leverage collective intelligence
Good NixOS module support

Why per-node rather than centralized?

Each node receives the same CAPI blocklists independently
Avoids a single point of failure for security
Simpler than running a shared Local API (LAPI) for a 3-node cluster

Consequences

Positive

Low overhead: Reads logs asynchronously, adding no request latency
Community intelligence: Benefits from threat data shared by 70k+ users
Works with TLS: Inspects decrypted application logs
Automatic updates: Hub collections updated daily

Negative

Reactive detection: The first few malicious requests get through
Log dependency: Only sees what applications log
No payload inspection: Can't detect zero-days carried in request bodies unless those bodies are logged

Monitoring

Metrics exported to Prometheus/Grafana:

Active bans and alerts
Packets blocked (local vs CAPI)
Log lines parsed and scenario matches
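To check the local-vs-CAPI split without Grafana, one option is to read the engine's Prometheus `/metrics` output directly and sum a metric by its `origin` label. The parser below is a sketch that handles only simple `name{labels} value` lines of the Prometheus text exposition format. The metric name `cs_active_decisions` and its labels are assumed from recent CrowdSec releases and should be checked against the actual `/metrics` output; the sample numbers are made up.

```python
import re
from collections import defaultdict
from typing import Dict

# name{labels} value [timestamp]; label values containing '}' are not handled.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(text: str, metric: str, label: str) -> Dict[str, float]:
    """Sum every sample of `metric`, grouped by the value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        totals[labels.get(label, "")] += float(m.group("value"))
    return dict(totals)


# Made-up sample in the shape we assume the engine exposes.
SAMPLE = """\
# HELP cs_active_decisions Number of active decisions
# TYPE cs_active_decisions gauge
cs_active_decisions{action="ban",origin="CAPI",reason="http:scan"} 1200
cs_active_decisions{action="ban",origin="CAPI",reason="ssh:bruteforce"} 800
cs_active_decisions{action="ban",origin="crowdsec",reason="crowdsecurity/http-probing"} 3
"""
print(sum_by_label(SAMPLE, "cs_active_decisions", "origin"))  # {'CAPI': 2000.0, 'crowdsec': 3.0}
```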

References

CrowdSec Documentation
CrowdSec AppSec Quickstart
Traefik Bouncer Plugin
GitHub Issue #76: Replace Suricata with CrowdSec