Executive summary
- What broke: logging coverage was incomplete or misconfigured.
- Why it matters: detection and incident response depend on reliable telemetry.
- How to catch it: monitor logging configuration changes and event-volume health.
- How to prevent it: centralize logs, enforce guardrails, and alert on logging health.
Architecture
Baseline design: an organization-wide, multi-region trail covering every account, delivering to a protected central logging destination.
Diagram placeholder: AWS Accounts → Org Trail → Central Log Storage → SIEM / Alerts.
The misconfiguration
- Trail not multi-region (activity in another region not captured).
- Logs not centralized across accounts.
- Trail stopped or destination bucket policy modified.
- High-risk data events not enabled for sensitive services.
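The checks above can be automated. A minimal sketch follows; the dict shape loosely mirrors what CloudTrail's describe-trails and get-trail-status calls return, but the field names used here (particularly `DataEventsEnabled`) are illustrative assumptions, not the exact API response.

```python
# Sketch: flag the trail misconfigurations listed above, given a
# dict describing one trail. Field names are illustrative, not the
# literal CloudTrail API response shape.

def audit_trail(trail: dict) -> list[str]:
    """Return a list of findings for a single trail description."""
    findings = []
    if not trail.get("IsMultiRegionTrail", False):
        findings.append("trail is not multi-region")
    if not trail.get("IsOrganizationTrail", False):
        findings.append("trail is not organization-wide")
    if not trail.get("IsLogging", False):
        findings.append("trail is stopped")
    if not trail.get("DataEventsEnabled", False):
        findings.append("high-risk data events not enabled")
    return findings

example = {
    "Name": "org-trail",
    "IsMultiRegionTrail": False,
    "IsOrganizationTrail": True,
    "IsLogging": True,
    "DataEventsEnabled": False,
}
print(audit_trail(example))
# → ['trail is not multi-region', 'high-risk data events not enabled']
```

In practice this would run on a schedule against every account, with any non-empty finding list raised as an alert.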
Why this is dangerous
- Incident response impact: responders cannot reconstruct what happened, or when.
- Detection impact: alerts never fire because the events never arrive.
- False confidence: dashboards look quiet while the environment is effectively blind.
Exploitation simulation (isolated lab)
- Create a controlled event in a covered area and verify log capture.
- Create a controlled event in an uncovered area and confirm absence.
- Document timestamps and expected vs. actual telemetry.
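The documentation step can be scripted so expected-vs-actual results are recorded consistently. A minimal sketch, assuming hypothetical event IDs and a set of IDs found in central log storage:

```python
# Sketch: compare controlled test events against what actually landed
# in the log store. Event IDs, timestamps, and the captured set are
# hypothetical lab values.

def verify_coverage(expected: list[dict], captured: set[str]) -> list[dict]:
    """Annotate each expected test event with whether telemetry arrived."""
    report = []
    for event in expected:
        report.append({
            "event_id": event["event_id"],
            "created_at": event["created_at"],
            "covered_area": event["covered_area"],
            "captured": event["event_id"] in captured,
        })
    return report

expected = [
    # Controlled event in a covered area: should appear in logs.
    {"event_id": "sim-001", "created_at": "2024-05-01T10:00:00Z", "covered_area": True},
    # Controlled event in an uncovered area: expected to be absent.
    {"event_id": "sim-002", "created_at": "2024-05-01T10:05:00Z", "covered_area": False},
]
captured = {"sim-001"}  # IDs actually found in central log storage

for row in verify_coverage(expected, captured):
    status = "captured" if row["captured"] else "MISSING"
    print(f'{row["event_id"]} {row["created_at"]}: {status}')
```

A missing event in a supposedly covered area is the finding; a missing event in a known-uncovered area confirms the gap's boundaries.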
Detection strategy
- Configuration monitoring: alert when trails are stopped, deleted, or modified.
- Destination monitoring: alert on log bucket policy changes.
- Volume baselines: detect sudden drops in event flow.
The goal is to catch both configuration drift and telemetry that has silently stopped arriving.
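The volume-baseline idea can be sketched as a trailing-window comparison. The window size and drop threshold below are illustrative tuning choices, not prescribed values:

```python
# Sketch: flag a sudden drop in event volume relative to a trailing
# baseline. Window size and drop_ratio are illustrative choices.

def volume_drops(hourly_counts: list[int], window: int = 6,
                 drop_ratio: float = 0.5) -> list[int]:
    """Return indices of hours whose count fell below drop_ratio
    times the mean of the preceding `window` hours."""
    alerts = []
    for i in range(window, len(hourly_counts)):
        baseline = sum(hourly_counts[i - window:i]) / window
        if baseline > 0 and hourly_counts[i] < drop_ratio * baseline:
            alerts.append(i)
    return alerts

# Steady flow, then telemetry collapses for two hours, then recovers.
counts = [100, 110, 95, 105, 98, 102, 12, 8, 101]
print(volume_drops(counts))  # → [6, 7]
```

A drop alert is deliberately coarse: it cannot say why events stopped, only that the pipeline went quiet, which is exactly the signal a stopped trail or broken destination produces.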
Remediation
- Enable org-wide, multi-region logging.
- Centralize logs into a protected account.
- Define mandatory logging coverage for critical services.
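Mandatory coverage is easiest to enforce as a policy-as-data check: compare each account's enabled log sources against a required set. The account and source names below are hypothetical:

```python
# Sketch: check each account's enabled logging against a mandatory
# coverage policy. Account IDs and source names are hypothetical.

MANDATORY = {"management-events", "s3-data-events", "lambda-data-events"}

def coverage_gaps(accounts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each account to the mandatory log sources it is missing."""
    return {
        account: MANDATORY - enabled
        for account, enabled in accounts.items()
        if MANDATORY - enabled
    }

accounts = {
    "prod-111111111111": {"management-events", "s3-data-events",
                          "lambda-data-events"},
    "dev-222222222222": {"management-events"},
}
print(coverage_gaps(accounts))
# dev account is missing both data-event sources
```

Running this in CI or on a schedule turns "mandatory coverage" from a document into an enforced control.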
Hardening checklist
- Guardrails preventing logging disablement.
- Separate log ownership from workload teams.
- Immutable or append-only log storage.
- Monitoring that checks log freshness.
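The freshness check in the last item can be sketched as a last-seen-timestamp comparison. The source names and the 30-minute threshold are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Sketch: alert when a log source has not delivered events recently.
# Source names and the 30-minute threshold are illustrative.

def stale_sources(last_seen: dict[str, datetime],
                  now: datetime,
                  max_age: timedelta = timedelta(minutes=30)) -> list[str]:
    """Return sources whose newest event is older than max_age."""
    return sorted(s for s, ts in last_seen.items() if now - ts > max_age)

now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "cloudtrail-org": now - timedelta(minutes=5),  # fresh
    "vpc-flow-logs": now - timedelta(hours=3),     # stale
}
print(stale_sources(last_seen, now))  # → ['vpc-flow-logs']
```

Freshness alerting is the safety net for everything else on this list: even if a guardrail is bypassed, a source that goes silent still gets noticed.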
Lessons learned
- An absence of alerts can mean an absence of visibility, not an absence of activity.
- Logging must be treated as production infrastructure.
- Centralization reduces silent failure risk.