In IT, this is defined as an unplanned interruption to an IT service or a reduction in the quality of an IT service.
What is an incident?
Any addition, modification, or removal of anything that could have an effect on IT services is known as this.
What is a change?
Problem management reduces these by identifying and fixing their root causes.
Incidents
Tool used to monitor network health
ThousandEyes
First President of India
Dr. Rajendra Prasad
The primary objective of this ITIL process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations.
What is incident management?
A primary goal of effective change management is to minimize the risk of these negative events occurring after a modification to an IT system.
What are incidents or outages?
Root cause category that involves procedures that were not followed
People/Process Failure
One of the three pillars of observability, these are timestamped records of events occurring within a system, often used for debugging and auditing.
What are logs?
First president of the United States
George Washington
When an incident cannot be resolved at the first point of contact, it is passed to a higher level of support, often following a defined set of rules, in this process.
What is escalation?
Before a significant change is implemented, this is often required: the change is tested in a non-production environment to ensure it works as expected and doesn't introduce new issues.
What is the testing (or staging) phase?
This type of problem management takes place after incidents have occurred, while its proactive counterpart aims to prevent incidents before they arise.
Reactive problem management
Another key pillar of observability, these are numerical data points collected over time, often used to track system performance and health, such as CPU utilization or request rates.
What are metrics?
Which iconic white marble mausoleum was built by a Mughal emperor?
Taj Mahal
This formal agreement between a service provider and a customer defines the expected level of service, including metrics like incident response times and resolution targets.
What is a Service Level Agreement (SLA)?
This formal group or committee reviews and approves significant changes, ensuring they are properly assessed, planned, and authorized before implementation.
What is the Change Advisory Board (CAB)?
While incident management focuses on quickly restoring service, problem management aims to find and eliminate this, which is the underlying reason for recurring issues.
What is the root cause?
This metric measures how quickly a system processes a request and returns a result; for user-facing applications, it is directly related to user experience.
Response Time
This city in the United States has the largest population.
New York
During a major IT outage, this specific role is responsible for coordinating all resolution efforts, managing communication to stakeholders, and ensuring the incident is resolved within agreed-upon service level targets.
What is the Major Incident Manager (or Incident Commander)?
This type of change is pre-authorized, low-risk, and commonly performed, often following a well-defined procedure.
What is a standard change?
This is a temporary solution or a way to reduce the impact of a problem, often used while the permanent fix is being developed or implemented.
What is a workaround?
The process of identifying and associating related events across multiple datasets or data types. This association makes it possible to understand each event in the context of the overall system.
Event Correlation
What city is considered the financial capital of India?
Mumbai