Following the Google playbook for Site Reliability Engineering (SRE)
SRE, as practised within Google, is a way of managing the inherent tension between a product team’s wish to deliver features quickly and the responsibility, traditionally held by an operations team of sysadmins, to keep deployed environments stable.
Point 14 of the GOV.UK Service Standard sets out the responsibility of a GOV.UK service team to provide a reliable service. The Google SRE approach, which includes Service Level Objectives (SLOs), Service Level Indicators (SLIs) and error budgets, can be a useful one to adopt. It helps you set fine-grained service level expectations, and makes sure your team can deliver features and evolve your service while meeting the agreed service levels.
Read an introduction to setting Service Level Objectives (SLOs).
Consider running a Service Level Indicator (SLI) workshop.
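To show how an SLO, an SLI and an error budget relate to each other, here is a minimal sketch in Python. It assumes a request-based availability SLI (successful requests divided by total requests) and treats the error budget as the share of requests the SLO allows to fail. The function names and the figures are hypothetical and for illustration only. Your own SLIs and targets should come from your service and its users.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Return the measured SLI: the proportion of requests that succeeded."""
    if total_requests == 0:
        # No traffic in the window, so nothing has failed.
        return 1.0
    return (total_requests - failed_requests) / total_requests


def allowed_failures(slo_target: float, total_requests: int) -> float:
    """Return how many requests may fail in the window without breaching the SLO."""
    return (1 - slo_target) * total_requests


def error_budget_remaining(
    slo_target: float, total_requests: int, failed_requests: int
) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means the budget is untouched, 0.0 means it is fully spent,
    and a negative value means the SLO has been breached.
    """
    budget = allowed_failures(slo_target, total_requests)
    if budget == 0:
        # A 100% target, or no traffic, leaves no room for failures.
        return 1.0 if failed_requests == 0 else 0.0
    return 1 - failed_requests / budget


# Hypothetical figures: a 99.9% SLO over a window with 1,000,000 requests,
# 400 of which failed.
sli = availability_sli(1_000_000, 400)
remaining = error_budget_remaining(0.999, 1_000_000, 400)
print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.0%}")
```

With figures like these, the team has spent some of its budget but is still within the SLO, so it could carry on releasing features. A team that has spent its budget might agree to prioritise reliability work until the budget recovers.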