Interwebz
Bob LOBlaw
Mayday
EZ Points
Servers
100
Uh oh. Looks like we'll have to call the client's ISP to troubleshoot an internet down issue.


Where do you go to find out who their ISP is?

ITG Internet / WAN asset

100

You're working on a LOB app performance issue affecting everyone but need to restart SQL. Who do you contact to get approval? Do you need a CR?

The POC. No CR is needed since it's an incident, but you still need to notify CC Teams.

100

Something bad happened and you're trying to set a priority on the ticket but can't decide between two priorities (Med / High)

Which one do you set?

High. Always set the higher of the two priorities if you're unsure
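
The "pick the higher one" rule can be sketched in a few lines. This is only an illustration: the `PRIORITY_RANK` table and the `pick_priority` name are made up here, and the "Low" level is an assumption (this quiz only names Med, High and Critical).

```python
# Hypothetical ranking, lowest to highest. "Low" is an assumed level;
# the quiz itself only mentions Med, High and Priority 1 - Critical.
PRIORITY_RANK = {"Low": 0, "Med": 1, "High": 2, "Critical": 3}


def pick_priority(first: str, second: str) -> str:
    """When torn between two priorities, return the higher of the two."""
    return max(first, second, key=lambda name: PRIORITY_RANK[name])


print(pick_priority("Med", "High"))  # High
```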

100

What do you fill out after an incident is resolved?

Incident report

100

You use this to get pre-Windows lights out control on HP servers

Bonus points if you know the words in the acronym

Integrated Lights-Out (iLO)

200

Uh oh. User called to report their office internet is down. What's the first thing you do?

Hint: it's the first step in our network down ops manual article.

Check monitoring to verify that the alert or report from a user is indeed an outage.

200

Worldox is failing to launch for everyone. Who do you call if the client has both Baker Cadence and Worldox Support listed as vendors?

WD Support

BC is for consulting, projects or complex changes.

200

Who do you communicate incident updates to on the client side?

POC or alternate POCs if they're unavailable

200

Uh oh. Colo is officially down, what priority do we use?

Priority 1 - Critical

200

You use this to get pre-Windows lights out control on Dell servers

Bonus points if you know the words in the acronym

Integrated Dell Remote Access Controller (iDRAC)

300

Modem is offline and confirmed to be powered on and cables firmly reseated. Who do you call to fix?

ISP

300

Uh oh. PC Law is spitting out a funky error preventing users from using Matter Manager. You've gone through our docs, Googled and checked with our team but no solution in sight.

What do you do next?

Call the vendor for support

300

When you communicate outage updates, which medium do you use (phone, email, RFC 2549)? Why?

Phone (cell # if phones are down) because it's fastest, and if the network is down emails probably won't work

300

Ah man. A network is confirmed to be down. Which document should I follow to troubleshoot?

Network Down in the SDOM

300

Hmm... RAID controller is showing a failed disk. What do you do next?

Hint: this answer is pretty flexible so use your judgement

1. Check that we have a good backup, consider running another one right away, and shorten the time between backups.

2. Dispatch onsite tech to reseat the drive. If it still shows failed, remove the drive.

3. Contact the vendor while onsite to get a warranty claim opened.

400

In which order do you troubleshoot an internet down incident?

Put these in order: Router, AD / DNS server(s), Modem, Switch

  1. Is the modem online?
  2. Is the router online?
  3. Is the switch online?
  4. Is the internal DNS server online?
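
The order above can be sketched as a simple walk down the chain that stops at the first device that is offline. This is a rough sketch, not our tooling: `CHECK_ORDER` and `first_offline` are made-up names, and the checks are stand-in callables you would swap for whatever monitoring lookup you actually use.

```python
from typing import Callable, Dict, Optional

# Order taken from the answer above: modem, router, switch, internal DNS server.
CHECK_ORDER = ["modem", "router", "switch", "internal DNS server"]


def first_offline(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the checks in CHECK_ORDER and return the first device that is offline.

    Each check is a hypothetical callable returning True when the device is online.
    Returns None when everything in the chain is online.
    """
    for device in CHECK_ORDER:
        if not checks[device]():
            return device
    return None


# Stand-in checks: the modem is up, everything behind it looks down.
example_checks = {
    "modem": lambda: True,
    "router": lambda: False,
    "switch": lambda: False,
    "internal DNS server": lambda: False,
}
print(first_offline(example_checks))  # router
```

Stopping at the first failure matters because anything downstream of a dead device will also look offline, so the earliest failure is the one worth chasing.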

400

Oh man. You have to do some risky stuff to fix a LOB app issue after hours on a Windows VM.

What should you do before and after you apply the fix?

Create a Hyper-V (HV) checkpoint before applying the fix, and delete the checkpoint afterwards.

400

Uh oh. A security incident has occurred and you've confirmed it with the client and systems. You've notified the CC team and are working on the issue now.

What's your first priority?

Stop the attack in progress

400

Awww yeah. The server performance issue resolved itself.

Do we call it a win and close the issue?

No. If you did nothing, then it's likely to recur. Be a good tech and find out why it happened and what mitigations should be put in place to prevent it from happening again.

400

Uh oh. Server won't power on and onsite contact confirmed that power is good on that circuit. 

What should you do? Walk us through the steps and troubleshooting you might do.

1. Have onsite contact check that power cables are seated and UPS is powered on.

2. Dispatch onsite ASAP

3. While onsite, double check tests and contact vendor for warranty support.

4. Spin up BDR

500

Uh oh. Meraki is showing as online but all computers and servers are offline.

Wut could it be? Name 2 likely causes [SERIOUS]

¯\_(ツ)_/¯

DNS, L2 switching related, L3 (bad firewall rule, NAT, etc)

500

When assessing a LOB app outage, what should you do on initial contact with the user who reported it?

  1. Log onto an affected user's machine to verify that the report is indeed an outage.
    1. Check when they first noticed issues, and find out the extent of the issue (e.g. is it just some users, some programs, etc.)
    2. Document in the ticket what the symptoms are and how to recreate the issue

500

Phew. Security incident averted and incident report filed.

What is the last thing to do and who should do it?

Have Scott / Alfred contact the client with the incident report details.

500

Who can never be trusted?

The user

500

Server has died and is out of warranty. We've told the client multiple times already to replace the thing and they've turned us down.

After you've chuckled at their misfortune, what are a few things you can do?

Send to sales and get a quote out for a replacement.

Get a hot spare replacement server in place with approval from Andy, Scott or Alfred.

Fire up BDR. Chances are if they didn't listen to our warnings about the server they didn't want to buy a BDR.