Fastest command to determine TSDB app health status across a Pod
tbrpc status
Number of master nodes in the per-pod OpenSearch clusters
3
Department that ensures Build Agents are standardized
TechOps
What to do when vpn.logicmonitor.net is not working for you
1. Clear route tables
2. Try the location-specific VPN
Process for getting help after receiving an alert
1. Ping TOP Slack channel
2. Escalate alert via PD
Do this when Santaba queries to one TSDB server are failing
Number of data worker nodes in per-pod OpenSearch clusters
6
Department responsible for correcting/adjusting ACL permissions in Bamboo
InfoSec, à la Engineering Access Teams
How can you quickly notify backup on-call that you need help?
Escalate via the PagerDuty app,
or file an SD ticket and let the page slip through
One way to keep support updated about a Service Disruption
Communicate via Slack in #help-support or #techops-support
Bonus Points for High Severity process
Common size of /db partition for Datacenter Pods and New Reduced AWS Volumes
25TB for DC
4TB for AWS
How to resolve out-of-storage alerts in OpenSearch?
Resize cluster EBS volumes on worker nodes
Reason we cannot safely stop Santaba deployments
The Ansible play, if cancelled midway, will leave Santaba in a non-functioning state
True or False
This TF implementation is correct for our Pods project:
provider "aws" { region = local.region }
where
region = element(split("/", var.directory), 1)
False
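To see what the expression in this card evaluates to, here is a minimal Python sketch that emulates Terraform's split and element functions. The directory values are hypothetical, since the card does not show what var.directory looks like, and the sketch does not attempt to say why the card's answer is False. It only shows that the result depends entirely on the shape of the path, and that element wraps around (index modulo list length) instead of failing on a short list.

```python
def tf_split(separator, value):
    """Emulate Terraform's split(): leading separators yield empty strings."""
    return value.split(separator)


def tf_element(items, index):
    """Emulate Terraform's element(): the index wraps around modulo the length."""
    if not items:
        raise ValueError("element() cannot be used with an empty list")
    return items[index % len(items)]


def region_from_directory(directory):
    """Python equivalent of element(split("/", var.directory), 1)."""
    return tf_element(tf_split("/", directory), 1)


if __name__ == "__main__":
    # Hypothetical directory values, for illustration only.
    for directory in ("pods/us-west-2/pod1", "us-west-2", "/pods/us-west-2"):
        print(repr(directory), "->", repr(region_from_directory(directory)))
```

With a path whose second segment is the region, the expression returns that region; with a single-segment value it silently wraps back to index 0; with a leading slash it returns the segment before the region.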
Best course of action for Santaba Consul Service Registration alerts at 2am
Restart Santaba
Which service produces data into the tsdb kafka topic
MPP - MetricsProcessorv2
How to resolve OpenSearch worker node over-utilization
Resize the instance types of the worker nodes
How Bamboo deploys non-k8s applications (and which ones they are)
Dockerized Ansible playbooks (Santaba, Reporting, TSDB)
Why is a Page On-Call or Service Disruption ticket not paging TechOps?
There exists a ticket of that type in Jira that's still left in Open status
Way to get Development online to help with a Service Disruption
Escalate SD ticket in PagerDuty by reassigning to the correct Escalation Chain
Two ways to restore data to TSDB
1) backups localrestoreimmutable
2) tbclone
Two ways we limit access to OpenSearch clusters
1) VPC facing (old ones were public facing)
2) IAM policies
How Bamboo deploys k8s applications
k8sdeployer
How you determine active companies on a Santaba server using only MySQL commands
SELECT name, status FROM santaba.companies WHERE status = 'active';
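The query above can be tried out without a Santaba server. The following is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The two columns come from the card; the sample rows, the 'inactive' status value, and the helper names are made up for illustration, and the real santaba.companies table will have more columns and possibly other status values.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in for the santaba schema (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the santaba.companies name resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS santaba")
    conn.execute("CREATE TABLE santaba.companies (name TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO santaba.companies (name, status) VALUES (?, ?)",
        [("acme", "active"), ("globex", "inactive"), ("initech", "active")],
    )
    return conn


def active_companies(conn):
    """Run the card's query with the status filter applied."""
    cursor = conn.execute(
        "SELECT name, status FROM santaba.companies WHERE status = 'active'"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = demo_connection()
    for name, status in active_companies(connection):
        print(name, status)
    connection.close()
```

Dropping the WHERE clause returns every company with its status, which is useful for checking which status values actually occur before filtering.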
How you recover a failed Santaba server