PagerDuty Incident Status in Slack

SlackOps for PagerDuty Part 4

View earlier blogs in this series here.

No matter how effective you are at resolving incidents, poor incident update practices lead to headaches all around. If you don’t provide an update, you can be sure someone will interrupt you to ask, “What’s the status on this incident?” When you have customer-impacting issues, not providing timely updates can significantly hurt customer satisfaction or lead to the outright loss of customers. A less obvious consequence is duplicated or wasted effort. If others don’t know what’s going on, they are likely to expend energy chasing down the same data or the wrong leads. This is especially true when multiple, possibly related incidents are going on at once, or with complex incidents that involve multiple teams. Getting PagerDuty Incident Status in Slack is the way to solve this problem.

There is really no good reason not to make regular incident updates, yet it happens all the time. It’s easy to forget to make an update when you are working the incident. Even if you are fortunate enough to have an organizational structure that allows for a comms owner on every incident, there are still often cases where updates are made internally but forgotten externally, or vice versa.

So let’s see how to speed up those update efforts and help ensure one never gets missed.

Step 1 Make your PagerDuty Incident Status Updates in Slack

With PagerDuty you can post a note to an incident or you can provide a status update. There are benefits to both and both can be done with RigD in Slack. Start by typing

add pagerduty note

Then provide the incident number and your note.

Add a PagerDuty Incident Note

Similarly for a PagerDuty incident status update start with

update pagerduty status

And again add the incident number and your status update.

Update PagerDuty Incident Status from Slack
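
If you are curious how a note and a status update differ on the PagerDuty side, here is a minimal sketch of the two requests. It assumes the public PagerDuty REST API v2 endpoints for incident notes and status updates, and it says nothing about how RigD makes these calls internally. The incident ID, API token, and email address are placeholders, and the function names are hypothetical. The code only builds the requests; nothing is sent until you hand one to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumption: the public PagerDuty REST API v2 base URL.
API_BASE = "https://api.pagerduty.com"


def _build_post(path, payload, api_token, from_email):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            # PagerDuty attributes the note or update to this user.
            "From": from_email,
        },
    )


def note_request(incident_id, text, api_token, from_email):
    """Request that adds a note to the incident's timeline."""
    payload = {"note": {"content": text}}
    return _build_post(f"/incidents/{incident_id}/notes", payload,
                       api_token, from_email)


def status_update_request(incident_id, message, api_token, from_email):
    """Request that posts a status update for the incident."""
    payload = {"message": message}
    return _build_post(f"/incidents/{incident_id}/status_updates", payload,
                       api_token, from_email)
```

To actually post, pass the returned request to urllib.request.urlopen with a real token, and check the endpoint details against the current PagerDuty API reference first.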

One additional manual convenience we provide is our incident activities menu, which appears with every PagerDuty incident feed notification and whenever you get incident details in Slack.

Take Any PagerDuty Incident Action from Slack

Step 2 Use RigD Automation to Set Up Incident Update Reminders

Making those updates ad hoc in Slack will definitely add a measure of convenience, but it won’t combat forgetfulness. To do that we need to set up automated update reminders. We will again use a RigD flow, one that both makes an incident status update and adds a note, thus ensuring no one misses the latest update. We have another helpful guide to speed up the setup. Start from the PagerDuty help by typing

help with pagerduty

Then choose the Automate Incident Updates button

Automate Incident Updates

Your first update should always be at a set amount of time; it’s the one update that most often gets missed or delayed while you try to validate the problem and assess the impact. Choose a time to make that first update. We recommend not more than 10 minutes.

How Long before you make your first update

Next you need to decide how to handle subsequent updates. Given that most major incidents last for hours, you will be making many follow-on updates, so you want to strike a balance in timing. You can also skip this input and choose the interval between updates manually after each update. This can be helpful in managing the balance between over- and under-communication, but don’t forget to set it each time!

Decide how frequently to make subsequent Incident Updates
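
To get a feel for how these two settings play out, here is a small sketch that lists when reminders would fire for a given first-update time and fixed interval. It is only an illustration of the timing, not RigD’s implementation; the function name and the 60-minute incident are made up for the example.

```python
def update_schedule(first_update_min, interval_min, incident_duration_min):
    """Return the minutes after kickoff at which an update reminder fires.

    The first reminder fires at first_update_min, then one fires every
    interval_min until the incident ends. The final resolution update is
    not included, since it is posted when the incident is resolved.
    """
    if first_update_min <= 0 or interval_min <= 0:
        raise ValueError("first_update_min and interval_min must be positive")

    reminders = []
    next_reminder = first_update_min
    while next_reminder < incident_duration_min:
        reminders.append(next_reminder)
        next_reminder += interval_min
    return reminders


# Hypothetical 60-minute incident: first update at 10 minutes, then every 15.
print(update_schedule(10, 15, 60))  # [10, 25, 40, 55]
```

A shorter interval means more reminders over the same incident, which is the balance between over- and under-communication you are tuning here.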

Finally, choose some text for the RigD alias trigger to make it easy to initiate the update automation during an incident.

Choose Text for your RigD Alias

You now have everything you need to never again forget to make an incident update. Let’s see how it works in practice by typing our alias text

!p1 updates

Adding a Flow

This update sequence will kick off in a Slack thread. Why do we use threads? Using a thread for this allows you to keep it in the forefront in Slack while you engage in discussion and coordination in your primary incident channel space. This helps reduce the potential to miss making an update and also prevents your update activity from distracting others in the main channel discussion.

Incident Updates in Slack Threads

Automated reminders reliably drive those incident updates, and making them right in Slack keeps them fast and simple.

Now when it comes to making customer-facing updates, we love Atlassian StatusPage and so do a lot of our customers. So we often see this flow modified to include both an internal update to the PagerDuty incident and an external update to a corresponding StatusPage incident. This is easy to do with RigD, and we are always available to help.

As with our previous parts, let’s take a look at the time savings and financial impact of this Slack-based approach. Assuming a relatively simple and well-understood update, posting it manually in the PagerDuty UI takes about 26 seconds. According to the PagerDuty ROI study, the average duration is 42 minutes for a regular incident and 66 minutes for an outage. If we make an update every 15 minutes plus a final resolution update, we get about 4 updates per incident. If we make both a status update and a note for completeness, we are looking at 1183 hours spent making updates annually, and updates account for $271,787 of the annual cost of outages. Using an automated RigD update flow, you are looking at 3 seconds to start the flow and about 6 seconds per update, for a total of just 36 seconds per incident. So implementing RigD for those updates wipes out $224,747 of outage costs and gives your team back 979 hours over the year.

You might be thinking that’s crazy; do companies really lose that much money? Consider that Amazon lost an estimated $90m in about 75 minutes according to this TechCrunch article. That’s $1.2 million per minute. It makes losing a couple hundred thousand dollars in a year seem minor. Sure, none of us are Amazon’s size, but every dollar and minute lost matters regardless of company size.
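
If you want to check the math or plug in your own numbers, here is a quick sketch of the arithmetic. It assumes the manual cost per incident is 26 seconds per post, times two posts (status update plus note), times four updates, and it takes the 36 seconds per incident for the automated flow as given. That breakdown is our reading of the figures above, not something taken from the ROI study itself.

```python
# Figures quoted in this post.
MANUAL_SECONDS_PER_POST = 26         # one post in the PagerDuty UI
POSTS_PER_UPDATE = 2                 # assumption: status update + note
UPDATES_PER_INCIDENT = 4             # every 15 minutes plus resolution
AUTOMATED_SECONDS_PER_INCIDENT = 36  # taken as given for the RigD flow
ANNUAL_MANUAL_HOURS = 1183
ANNUAL_MANUAL_COST = 271_787         # dollars

# Manual time spent on updates for a single incident.
manual_seconds = (MANUAL_SECONDS_PER_POST * POSTS_PER_UPDATE
                  * UPDATES_PER_INCIDENT)

# Share of the manual update time that the automated flow removes.
fraction_saved = 1 - AUTOMATED_SECONDS_PER_INCIDENT / manual_seconds

hours_saved = ANNUAL_MANUAL_HOURS * fraction_saved
dollars_saved = ANNUAL_MANUAL_COST * fraction_saved

# The Amazon comparison: $90m lost over about 75 minutes.
amazon_loss_per_minute = 90_000_000 / 75

print(f"Manual seconds per incident: {manual_seconds}")
print(f"Share of update time saved: {fraction_saved:.1%}")
print(f"Hours saved per year: {hours_saved:.0f}")
print(f"Dollars saved per year: ${dollars_saved:,.0f}")
print(f"Amazon loss per minute: ${amazon_loss_per_minute:,.0f}")
```

Under these assumptions the dollar savings come out to $224,747, and the hours saved land at about 978, within rounding of the 979 quoted above.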

Make sure you set up your PagerDuty incident status in Slack.

Learn more about RigD here, and give our Slack App a try.

Next Up in this Series: Part 5: Take Command of your PagerDuty Incident Response
