How to store and query logs
You should store logs to detect any potential errors within infrastructure and to respond to security incidents.
Your product should have a proportionate design for short and long term storage of logs and ensuring the Confidentiality, Integrity, and Availability of logs.
The NCSC Cyber Assessment Framework, which MHCLG must comply with, has an entire category dedicated to Security Monitoring.
Short-term storage
You should not need to store logs spanning long periods in your short-term queryable store. Practical retention periods for short-term queryable logs are:
- no more than 7 days for non-production environments
- no more than 30 days production environments
You should consider storing security and audit events for up to a year, this is because the average MTTD (Mean Time to Detect) is 204 days (over 6 months) to identify a breach, according to a 2023 IBM data breach study.
Your product may have legal or other requirements determining how long you should store logs. For example, the Payment Card Industry Data Security Standard (PCI-DSS) requires 3 months of easily accessible logs and 12 months of archived logs.
Long-term storage
When storing logs, you should focus on long-term durability for compliance reasons or to identify long-term performance trends.
You may consider archiving to S3 in human-readable files using your own process.
Log shipping
If you have set up log shipping, you should consider:
- how your logs will get to short and long-term stores - we recommend using Amazon CloudWatch to collect logs
- what happens if one of these stores is unavailable
- whether the store will be overloaded when it comes back online
Filtering out sensitive information
You should ensure that sensitive information, such as query parameters containing personally identifiable information (PII), is filtered out before logs are shipped. Make use of log filtering mechanisms which may be present in the application framework you are using, such as in Ruby on Rails.
Structured logging
In order to allow for rich querying of log data you should ensure that your logs are in a structured format.