TSDB
OpenSearch
Bamboo
Mixed Bag
Fielding Alerts
100

Fastest command to determine TSDB app health status across a Pod

tbrpc status

100

Number of master nodes in the per-pod OpenSearch clusters

3

100

Department that ensures Build Agents are standardized

TechOps

100

What to do when vpn.logicmonitor.net is not working for you

1. Clear route tables

2. Try a location-specific VPN

100

Process for getting help after receiving an alert

1. Ping TOP Slack channel

2. Escalate alert via PD

200

Do this when Santaba queries to one TSDB server are failing

Shut down the TSDB app on that server

200

Number of data worker nodes in per-pod OpenSearch clusters

6

200

Department responsible for correcting/adjusting ACL permissions in Bamboo

InfoSec à la Engineering Access Teams

200

How can you quickly notify backup on-call that you need help?

Escalate via the PagerDuty app,

or file an SD ticket and let the page slip through

200

One way to keep support updated about a Service Disruption

Communicate via Slack in #help-support or #techops-support

Bonus Points for High Severity process

300

Common size of /db partition for Datacenter Pods and New Reduced AWS Volumes

25TB for DC

4TB for AWS

300

How to resolve out-of-storage alerts in OpenSearch?

Resize cluster EBS volumes on worker nodes

300

Reason we cannot safely stop Santaba deployments

The Ansible play, if cancelled midway, will leave Santaba in a non-functioning state.

300

True or False: this TF implementation is correct for our Pods project:
provider "aws" { region = local.region }
where
region = element(split("/", var.directory), 1)

False
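
One way to reason about the clue is to trace what `element(split("/", var.directory), 1)` evaluates to. The Python sketch below mimics Terraform's `split` and `element` semantics (including `element`'s wrap-around on the index). The helper names and the sample directory values are hypothetical, not the real Pods layout; the point is only that the result depends entirely on where the region sits in the path.

```python
def tf_split(separator: str, value: str) -> list[str]:
    """Mimic Terraform's split(separator, string)."""
    return value.split(separator)


def tf_element(items: list[str], index: int) -> str:
    """Mimic Terraform's element(list, index), which wraps the index modulo the list length."""
    if not items:
        raise ValueError("element() cannot be used with an empty list")
    return items[index % len(items)]


def region_from_directory(directory: str) -> str:
    """Evaluate element(split("/", directory), 1) as in the clue."""
    return tf_element(tf_split("/", directory), 1)


# Hypothetical values for var.directory (assumptions for illustration only).
for directory in ("aws/us-west-2/pod1", "pods/aws/us-west-2", "us-west-2"):
    print(directory, "->", region_from_directory(directory))
```

With these made-up inputs, only the first layout yields a region from index 1; the second yields `aws`, and the single-segment value wraps around to index 0.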

300

Best course of action for Santaba Consul Service Registration alerts at 2am

Restart Santaba

400

Which service produces data into the tsdb kafka topic

MPP - MetricsProcessorv2

400

How to resolve OpenSearch worker node over-utilization

Resize the instance types of the worker nodes

400

How Bamboo deploys non-k8s applications (and which are they)

Dockerized Ansible playbooks (Santaba, Reporting, TSDB)

400

Why is a Page On-Call or Service Disruption ticket not paging TechOps?

A ticket of that type already exists in Jira and is still in Open status

400

Way to get Development online to help with a Service Disruption

Escalate SD ticket in PagerDuty by reassigning to the correct Escalation Chain

500

Two ways to restore data to TSDB

1) backups localrestoreimmutable 

2) tbclone

500

Two ways we limit access to OpenSearch clusters

1) VPC facing (old ones were public facing)

2) IAM policies

500

How Bamboo deploys k8s applications

k8sdeployer

500

How you determine active companies on a Santaba using only mysql commands

SELECT name, status FROM santaba.companies WHERE status = 'active';
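
A minimal sketch of the query above, run from Python against an in-memory SQLite database standing in for MySQL. Only the table and column names come from the answer; the rows and the `suspended` status value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "santaba" so the
# schema-qualified name santaba.companies resolves as it would in MySQL.
conn.execute("ATTACH DATABASE ':memory:' AS santaba")
conn.execute("CREATE TABLE santaba.companies (name TEXT, status TEXT)")

# Hypothetical sample rows (assumption: not real customer data).
conn.executemany(
    "INSERT INTO santaba.companies (name, status) VALUES (?, ?)",
    [("acme", "active"), ("globex", "suspended"), ("initech", "active")],
)

# Same filter as the answer; ORDER BY is added only to make the output stable.
active = conn.execute(
    "SELECT name, status FROM santaba.companies "
    "WHERE status = 'active' ORDER BY name"
).fetchall()
print(active)
```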

500

How you recover a failed Santaba server

TF a new one, restore all customers (more details)