The push towards digital transformation and cloud-native infrastructure is massive, yet organizations also need to maintain legacy capabilities. With this pressure comes the need to manage operations with the same rigor and automation we apply to infrastructure, coding, and security. Many organizations have embraced the ideas of everything in a pipeline and all things as code. Teams are successfully deploying applications and the underlying frameworks, but the actual operation of service delivery and assurance is often an afterthought or purely reactive.
PagerDuty fills this gap with Operations as Code.
Operations as Code extends the principles of Infrastructure as Code (IaC) to operational procedures. It involves defining, managing, and executing operational tasks (such as defining escalation policies, building orchestrations that link runbooks, automating diagnostics, and standardizing incident workflows) using PagerDuty’s Terraform provider. This approach ensures that operational practices are standardized, version-controlled, and executable with minimal human intervention.
Full Service Ownership
One of the tenets PagerDuty has long subscribed to is Full Service Ownership – You build it, you run it, you own it. Operations as Code removes the dependency on centralized teams. As the need for speed increases, DevOps teams cannot be beholden to centralized ITSM or even PagerDuty admins to integrate new monitoring, enrich events, or create new runbooks.
Similarly, it makes little economic sense for centralized ServiceNow teams to spend expensive, specialized skills on monitoring integrations, event management, enrichment, and automation that can be managed via Operations as Code. These teams, especially in large organizations, are already stretched thin, and their backlog grows daily. Leveraging PagerDuty’s Terraform provider achieves the same goals while delivering better outcomes for everyone.
Leveraging Pipelines and Terraform for Operations
Terraform, traditionally used in IaC, is the lingua franca of DevOps. By writing Terraform configurations, teams can automate the provisioning and management of not only infrastructure, but also the components and workflows that ensure Operational Excellence. PagerDuty’s Terraform provider can build service definitions; configure users, teams, and roles; define escalation policies and schedules; and build event correlation, orchestration, and runbooks for automated diagnostics.
Continuous Integration and Continuous Deployment (CI/CD) play a crucial role in Operations as Code. By integrating operational tasks into CI/CD pipelines, you can ensure that changes are tested, reviewed, and deployed in a controlled and automated manner. Instead of directly changing configurations via PagerDuty’s UI or API, pipelines allow for version control, standardization, and rollback if there are errors.
Quality gates are traditionally used for code reviews, automated testing, security checks, and similar controls. For Operations as Code, they can enforce consistent service standards, such as a minimum three-tier escalation policy, a maximum time between escalations, minimum requirements for runbooks, and minimum enrichment via orchestrations.
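As a rough illustration of such a gate, here is a minimal Python sketch that a pipeline step could run against escalation policy definitions. It assumes each policy has already been extracted into a simplified dictionary with a `name` and a list of `rule` entries carrying `escalation_delay_in_minutes`; that shape, the function name, and the 30-minute ceiling are assumptions for illustration, not something PagerDuty prescribes.

```python
from typing import Any

MIN_TIERS = 3           # the minimum three-tier standard from the text
MAX_DELAY_MINUTES = 30  # hypothetical ceiling; choose your own value


def check_escalation_policy(policy: dict[str, Any]) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    name = policy.get("name", "<unnamed>")
    rules = policy.get("rule", [])
    violations = []

    # Standard 1: enough escalation tiers.
    if len(rules) < MIN_TIERS:
        violations.append(
            f"{name}: {len(rules)} escalation tier(s), need at least {MIN_TIERS}"
        )

    # Standard 2: no tier waits longer than the allowed delay.
    for tier, rule in enumerate(rules, start=1):
        delay = rule.get("escalation_delay_in_minutes", 0)
        if delay > MAX_DELAY_MINUTES:
            violations.append(
                f"{name}: tier {tier} waits {delay} min, limit is {MAX_DELAY_MINUTES}"
            )

    return violations
```

A pipeline could fail the build whenever the returned list is non-empty and print the violations so the owning team knows what to fix.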
This creates a great foundation for increasing operational maturity. It’s easy to start with basic templates and rules such as “never ship an app without a runbook”. You can use a quality gate to check that every service’s Terraform configuration includes a link to a Confluence document or knowledge base article.
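The runbook rule can be sketched in a few lines of Python. This example assumes service definitions have been reduced to dictionaries with a `name` and a `runbook_url` field; both the field name and the helper names are hypothetical, and how the link is actually stored in your Terraform configuration will vary.

```python
from urllib.parse import urlparse


def has_runbook_link(service: dict) -> bool:
    """True when the service carries a well-formed http(s) runbook URL."""
    url = service.get("runbook_url") or ""  # hypothetical field name
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def services_missing_runbooks(services: list[dict]) -> list[str]:
    """Names of services that would fail the 'never ship without a runbook' rule."""
    return [s.get("name", "<unnamed>") for s in services if not has_runbook_link(s)]
```

The check only confirms that a link is present and well formed. Verifying that the page exists or is current would need a separate step.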
You can then grow over time, identifying “Winners and Sinners” applications to baseline current operational maturity. Templates can be standardized and reused by teams that may not be as mature. One customer using this model found that services that met at least 5 of their 7 operational standards had about 30% better MTTR than those that didn’t. They expect this to lead to defining minimal operating standards and to breaking builds for teams and services that don’t meet expectations.
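A baseline like this can be computed with very little code. The sketch below assumes you already have, per service, a count of standards met and an MTTR in minutes; the field names `standards_met` and `mttr_minutes` are hypothetical, and the default threshold of 5 simply mirrors the customer example.

```python
from statistics import mean


def mttr_by_maturity(services: list[dict], threshold: int = 5) -> dict:
    """Split services by standards met and return the mean MTTR of each group."""
    meets = [s["mttr_minutes"] for s in services if s["standards_met"] >= threshold]
    below = [s["mttr_minutes"] for s in services if s["standards_met"] < threshold]
    return {
        "meets": mean(meets) if meets else None,
        "below": mean(below) if below else None,
    }


def relative_improvement(result: dict):
    """Fractional MTTR reduction of the 'meets' group versus the 'below' group."""
    if result["meets"] is None or not result["below"]:
        return None  # not enough data to compare
    return (result["below"] - result["meets"]) / result["below"]
```

A comparison like this shows correlation only. It is a starting point for deciding which standards to enforce, not proof that any single standard caused the improvement.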
Benefits of Operations as Code
Organizations that deploy Operations as Code will see several benefits, many with immediate return on investment (ROI).
Toil reduction is critical. Too much time is spent in “ClickOps”, and shifting away from manual configuration frees up time and resources for customer-impacting work. You will also reduce operational risk by ensuring traceability of changes to configurations, version control, and reusable templates. Similarly, you can operationalize governance and compliance by leveraging parsers, quality gates, and approved templates, while leadership can define minimum acceptable standards and expected outcomes.
Developer experience is improved by reducing ramp time for new team members, reducing the toil of keeping the lights on, and shifting break-fix work to junior team members, so senior staff can focus on reducing tech debt (or mining tech wealth, if you’re optimistic) to deliver great customer experiences.
Operational Excellence improves as repeatable outcomes and fewer errors reduce the frequency, severity, and duration of outages. You can shift away from tribal knowledge by giving senior people a simplified, repeatable method to record their innate knowledge, creating context for reuse by junior staff.
Getting Started
Talk to your PagerDuty contact about how to get started.
We’ll start with success metrics and then identify the areas where we can get a fast start with automation and templates. Where could you immediately reduce risk and what outcomes could you influence with standardizing operations?
We’ll look at the ability to start a Center of Excellence with the right enthusiasts and experts who can help with Q&A, become keepers of the templates, and help continuously improve automation and orchestration.
We’ll start with simple but impactful areas, and then focus on continuous improvement where we regularly review and improve your processes based on feedback and metrics.
What’s Next?
Operations as Code offers the promise of consistency, efficiency, and reliability by standardizing how you build operational tasks. By leveraging PagerDuty’s Terraform provider with your CI/CD pipelines, you can lead your teams in adopting this transformative approach. While challenges exist, they are readily surmountable with careful planning, execution, and continuous improvement, especially if you’ve engaged your PagerDuty team.
This simplified approach to Operations as Code can be a cornerstone of Operational Excellence, allowing your teams to move from a world of toil and break-fix to automation-driven full service ownership that will better serve your teams and, most importantly, your customers.
To get hands-on with this, sign up for a free trial today.
The post Operations as Code: Operational Excellence with PagerDuty appeared first on PagerDuty.