Incidents 101
Let's Talk Datadog
A Potential Incident And The Path Forward
Mock Incidents
Incidents 101 Continued
100

An event that disrupts or diminishes the quality of Spreedly’s service, impacting a customer's ability to use the product.

What is an incident? 

100

State the navigation path to the incident page in Datadog. 

What is "Service Management -> Incidents"? 

100

- Complete the statement below. -
If you are unsure whether something you are seeing is an incident, _____ _____, err on the side of caution, and page someone.

What is "say something"? 

100

These two ticket types are used for organizing and tracking tickets in Zendesk during an incident. 

What are "Problem" and "Incident" tickets? 

100

Spreedly currently has _____ severity levels.

What is "4"?

200

The goal of incident response is to ______ ______ as quickly as possible.

What is "restore service"? 

200

True or False. Datadog allows posting directly to the status page. 

What is "True"? 

200

Some (not all) requests to Spreedly’s public API to tokenize, deliver, authorize, or purchase are down. Give this situation a severity.

What is a "Severity 2"? 

200

State what's happening below.

What is "paging the incident coordinator on call"?

200

These three Slack channels are important during incidents.

What are:
-"incident-chat"
-"incident-response"
-"incident-post-review"

300

Part of this individual's responsibility is to:
-Declare the severity level of the Incident
-Approve public messaging
-Form the incident response team

What is "Incident Coordinator"? 

300

True or False: When declaring an incident in Datadog, an individual must choose the severity level and the correct Incident Coordinator before declaring the incident.

What is "False"?


These items are not mandatory, and the IC will update the fields. 

300

True or False: A former customer writes in and states that all transactions are down on the Datatrans gateway. You should initiate an incident.  

What is "False"? 

This is not a Spreedly customer, and therefore, we should not initiate an incident. 

300

You are the support engineer on call for incidents, and you receive a page from OpsGenie. Indicate the channel you should access first. 

What is "incident-chat"? 

300

In a _____ session, we review the problems, bottlenecks, mistakes, and successes that occurred during the incident.

What is a "Retro" session? 

400

This incident responder is called in to resolve the issue and identify the root cause of the incident. 

What is "Engineering Responder"? 

400

True or False. Datadog has built-in monitoring that can aid our engineers in the early detection of an issue. 

What is "True"? 

Datadog's monitoring enables our engineers to detect issues more quickly, even before customers are aware.

400

True or False: We don't have to engage the Account Manager for sensitive accounts after an incident. 

What is "False"?

We should engage the account manager in instances where we know the account is sensitive, and especially in sticky instances like incidents. 

400

List 1 of the three ways an incident may be triggered. 

1. What is "the customer" via Slack or Zendesk?
2. What is "monitoring"?
3. What is "red alert"?

 
400

True or False: You have been asked to update the status page of an ongoing incident. You should write a draft and post it without review. 

What is "False"?

You should write up a draft and have it reviewed by the Incident Coordinator. 

500

This incident responder is in charge of customer communications, updating the Status Page, and handling communications via Zendesk and other channels.

What is "Support Responder"? 

500

Name one of the two ways we can declare an incident.

1. What is "from Slack, type /datadog"?
Or
2. What is "from Datadog, Service Management → Incidents"?

See Visual: Declaring An Incident

500

All requests to Spreedly’s public API to tokenize, deliver, authorize, and purchase are down. Give this situation a severity. 

What is a "Severity 1"? 

500

True or False: A post-mortem will follow an incident that the incident coordinator labeled as a Sev1 or Sev2. 

What is "True"?

500

This channel is used after an incident to relay details of the post-mortem. 

What is "incident-post-review"?
