This type of analysis summarizes the characteristics or patterns already present in the data to answer the fundamental question, "What happened?"
Descriptive Analysis
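A minimal sketch of a descriptive summary, using only Python's standard library. The `daily_sales` figures are invented for illustration; the point is that the output only describes what is already in the data.

```python
import statistics

# Hypothetical daily sales figures, invented for illustration.
daily_sales = [120, 135, 150, 110, 135]

# Descriptive analysis: summarize what the data already contains.
summary = {
    "count": len(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
}
print(summary)
```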
In this initial step, you determine what you want to achieve so that you can decide which technique is most appropriate for your data
Define your Data Analysis Goal
In this learning category, the model is trained on labeled data, meaning the input data is paired with the correct corresponding output labels
Supervised Learning
This algorithm uses a tree-like structure where internal nodes represent feature-based decisions and leaf nodes represent final class labels
Decision Tree
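The tree structure described above can be sketched as nested dictionaries. This tree is written by hand purely for illustration (the feature names, thresholds, and labels are assumptions); a real decision tree would learn its splits from labeled data.

```python
# Internal nodes test one feature against a threshold; leaf nodes hold a class label.
tree = {
    "feature": "humidity",
    "threshold": 70,
    "left": {"label": "play"},            # taken when humidity <= 70
    "right": {                            # taken when humidity > 70
        "feature": "wind_speed",
        "threshold": 20,
        "left": {"label": "play"},        # wind_speed <= 20
        "right": {"label": "stay home"},  # wind_speed > 20
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, making one feature-based decision per internal node."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(predict(tree, {"humidity": 85, "wind_speed": 30}))  # stay home
```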
This common metric represents the proportion of correctly classified instances out of the total number of instances
Accuracy
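As a rough sketch, accuracy can be computed by comparing predicted labels with true labels one by one. The function name and the sample labels are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 3 of 5 correct -> 0.6
```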
Taking patterns from descriptive analytics a step further, this process uses techniques like root cause analysis to discover "Why did this happen?"
Diagnostic Analysis
This step is vital because your choice here determines the specific features and limitations you will face during data gathering and processing
Choose your Data Analysis Tool(s)
The goal of this learning type is to discover hidden structures or patterns, such as clusters, within data that has no associated target values
Unsupervised Learning
This probabilistic classifier is based on Bayes' Theorem and operates under the assumption that features are conditionally independent given the class
Naive Bayes
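The conditional independence assumption is what makes the classifier tractable. As a sketch in standard notation (the symbols C for the class and x_1, ..., x_n for the features are introduced here, not taken from the card), Bayes' Theorem with that assumption gives:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the one with the largest product of the class prior and the per-feature likelihoods.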
Also known as a Type I error, this occurs when the model incorrectly predicts a positive outcome for an actually negative case
False Positive
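A false positive is one of four possible outcomes of a binary prediction. The sketch below counts all four so the false positives can be seen in context; the function name and sample labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:      # predicted positive, actually negative (Type I error)
            fp += 1
        elif t == positive:      # predicted negative, actually positive (Type II error)
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # {'tp': 2, 'fp': 2, 'tn': 1, 'fn': 1}
```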
This analysis type uses historical and current data to forecast what might happen in the future, often utilizing regression models or time series analysis
Predictive Analysis
To gain a comprehensive understanding of patterns, you should ideally perform this step from several different perspectives
Analyze your Data
This supervised learning technique applies when the output is a continuous value, such as a house price or a stock price
Regression
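One common form of regression is a straight-line fit by ordinary least squares. The sketch below is a minimal version of that, not the only way to do regression; the house sizes and prices are invented, and were chosen so that price is exactly 3 times size.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: size in square meters, price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # continuous prediction for a 100 m^2 house
print(slope, intercept, predicted)
```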
This non-parametric method classifies a sample based on the majority class of its "K" closest neighbors within the feature space
K-Nearest Neighbors (KNN)
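The majority vote among the K closest neighbors can be sketched with the standard library alone. This version assumes Euclidean distance and a small invented training set; real uses would typically scale features first and pick K by validation.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data with two classes.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]
print(knn_predict(train, (1.2, 1.5)))  # A
print(knn_predict(train, (6.4, 6.4)))  # B
```

An odd K is a common choice for two-class problems because it avoids tied votes.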
This metric, also called sensitivity, measures the proportion of actual positive instances that the model correctly identified
Recall
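A short sketch of the calculation: recall is true positives divided by all actual positives (true positives plus false negatives). The function name and sample labels are illustrative assumptions.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model correctly identified."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp + fn == 0:
        return 0.0  # no actual positives; returning 0.0 is a convention chosen here
    return tp / (tp + fn)

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(recall(y_true, y_pred))  # 3 of 4 actual positives found -> 0.75
```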
Often serving as an "expert opinion," this analysis suggests actionable takeaways to support decisions on what a business should do next
Prescriptive Analysis
This specific process utilizes specialized techniques to enhance the overall quality and reliability of the data you have gathered
Preprocess your Data
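The card does not name specific techniques, so the following is only an assumed example of two common ones: filling a missing value with the mean, then rescaling to the range 0 to 1 (min-max scaling). The `raw` values are invented.

```python
# Hypothetical raw measurements; None marks a missing value.
raw = [4.0, None, 10.0, 7.0]

# Step 1: impute missing values with the mean of the observed ones.
present = [v for v in raw if v is not None]
mean = sum(present) / len(present)
filled = [mean if v is None else v for v in raw]

# Step 2: min-max scale so every value lies between 0 and 1.
lo, hi = min(filled), max(filled)
scaled = [(v - lo) / (hi - lo) for v in filled]

print(filled)  # [4.0, 7.0, 10.0, 7.0]
print(scaled)  # [0.0, 0.5, 1.0, 0.5]
```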
This unsupervised learning technique involves reducing the number of features in a dataset while carefully retaining the most important information
Dimensionality Reduction (e.g., PCA)
This supervised model works by finding the "optimal hyperplane" to best separate different classes in a feature space
Support Vector Machine (SVM)
Known as a Type II error, this occurs when the model predicts a negative outcome for an instance that was actually positive
False Negative
Recommending specific games to a user based on their unique budget, hardware, and skill level is a complex application of this analysis category
Prescriptive Analysis
If your goals are found to be unmet during this final stage, you may be required to change your tools and restart the entire analysis process
Assess your Results
This specific unsupervised goal involves grouping similar data points together based on behavior, such as segmenting customers without predefined categories
Clustering
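One widely used clustering method is k-means. The sketch below is a simplified one-dimensional version with fixed starting centroids so the result is repeatable; the monthly spend values and the starting centroids are assumptions made for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Simplified 1-D k-means: assign each value to the nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer; no labels are provided.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, centroids=[0, 50])
print(centroids)  # [11.0, 100.0]
print(clusters)   # [[10, 12, 11], [95, 100, 105]]
```

The two groups emerge from the data alone, which is the sense in which no predefined categories are needed.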
Despite its name, this is a statistical method used specifically for modeling the probability of binary outcomes
Logistic Regression
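The core of the method is the logistic (sigmoid) function, which maps a linear score to a probability between 0 and 1. In this sketch the `weight` and `bias` values are hypothetical; in practice they would be learned from training data.

```python
import math

def predict_proba(x, weight, bias):
    """Probability of the positive class for one feature value x."""
    z = weight * x + bias              # linear score
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Hypothetical parameters, chosen only for illustration.
weight, bias = 1.5, -3.0

print(predict_proba(2.0, weight, bias))  # score is 0, so probability is 0.5
print(predict_proba(4.0, weight, bias))  # positive score, probability above 0.5
```

A binary prediction then follows from a threshold, commonly 0.5.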
This metric is the harmonic mean of precision and recall; it is particularly useful on imbalanced datasets, where accuracy alone can be misleading
F1 Score
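The harmonic mean can be sketched in a few lines. The function name is an assumption; the inputs are precision and recall values that have already been computed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # both are zero; returning 0.0 is a convention chosen here
    return 2 * precision * recall / (precision + recall)

# The harmonic mean is pulled toward the lower of the two values.
print(f1_score(0.5, 1.0))  # about 0.667, below the arithmetic mean of 0.75
```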