Fill in the missing words:
Data analysts specialize in gathering _____ data and deriving ____ from it.
raw ; insights
What is the output:
number1 = 20
number2 = 35
answer = number1 + number2
print(answer)
55
What is a pie chart and what does it show?
A circular statistical graphic divided into slices to illustrate numerical proportions
What are the two categories for data?
Bonus: Name the two parts of each category. +50 pts.
1. Categorical: Ordinal or Nominal
2. Numerical: Continuous or Discrete
Solve for x:
x + 3 = 9
x = 6
What are the 3 C's of Data?
Correctness, Completeness, and Clarity
How does indentation work in Python?
Indentation is mandatory in Python and is used to define a block of code. Python raises an IndentationError if a programmer forgets to indent properly.
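As a small illustration (the function name `describe` is hypothetical and not part of the quiz), the sketch below shows how indented lines form the body of a function and of each `if`/`else` branch:

```python
def describe(number):
    # Everything indented under "def" belongs to the function body.
    if number % 2 == 0:
        # This line runs only when the condition above is true.
        return "even"
    else:
        # This line runs only when the condition above is false.
        return "odd"


print(describe(20))  # even
print(describe(35))  # odd
```

Removing the indentation from any of the `return` lines would make Python reject the file with an IndentationError before running it.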
What is the difference between a histogram and a bar chart?
Histograms display the distribution of continuous numerical data using connected bars, representing frequency across intervals (bins).
Bar graphs compare distinct, categorical data using separated bars to show differences between groups
What are the mean, median, and mode?
Mean: Average of the dataset (sum of the values divided by the number of values).
Median: Middle value of ordered observations.
Mode: Value that occurs most frequently.
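A minimal sketch of the three measures using Python's standard `statistics` module. The sample values in `data` are made up for illustration:

```python
import statistics

data = [2, 3, 3, 5, 7]  # hypothetical sample values

mean_value = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7) / 5
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```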
What does SOH CAH TOA stand for/mean?
Sine = Opposite/Hypotenuse
Cosine = Adjacent/Hypotenuse
Tangent = Opposite/Adjacent
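To check the three ratios numerically, here is a short sketch on a 3-4-5 right triangle. The side lengths are an assumed example, not taken from the quiz:

```python
import math

# Assumed example: a right triangle with legs 3 and 4.
opposite = 3.0
adjacent = 4.0
hypotenuse = math.hypot(opposite, adjacent)  # 5.0

sine = opposite / hypotenuse    # SOH
cosine = adjacent / hypotenuse  # CAH
tangent = opposite / adjacent   # TOA

# The same ratios come out of the math functions for the matching angle.
angle = math.atan2(opposite, adjacent)
print(math.isclose(math.sin(angle), sine))     # True
print(math.isclose(math.cos(angle), cosine))   # True
print(math.isclose(math.tan(angle), tangent))  # True
```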
What are the 4 pillars of Data Analysis?
Descriptive, Diagnostic, Predictive, and Prescriptive.
What did the programmer do to make the change in outputs?
Output 1:
[1, 1, 2, 2, 3, 3, 4, 4, 6]
Output 2:
[1, 2, 3, 4, 6]
The programmer converted the initial list (output 1) into a set, which removes the duplicate values (output 2).
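The quiz does not show the programmer's code, so the following is only one possible way to get from Output 1 to Output 2. The variable names are hypothetical, and `sorted` is used because a set has no guaranteed order:

```python
output_1 = [1, 1, 2, 2, 3, 3, 4, 4, 6]

# A set keeps only one copy of each value; sorted() turns it back
# into an ordered list for display.
output_2 = sorted(set(output_1))

print(output_2)  # [1, 2, 3, 4, 6]
```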
Which visualization will be the best to compare Guest and Subscriber based on number of trips?
User Type: Number of Trips:
Guest 23
Subscriber 102
Guest 24
Subscriber 77
... ...
A line chart.
X-axis: numbers (number of trips)
Y-axis: guest or subscriber
Line 1: guest
Line 2: subscriber
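What is the formula for calculating a percentile's value by hand?
Position = (n + 1) * P / 100, where n is the number of ordered observations and P is the percentile. The percentile's value is the observation at that position.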
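A short sketch of the formula in use. The function name `percentile_position` and the sample data are hypothetical, and the example is chosen so the position lands on a whole number:

```python
def percentile_position(n, percent):
    """Return the 1-based position of a percentile among n ordered values."""
    return (n + 1) * percent / 100


data = sorted([12, 15, 18, 20, 22, 25, 27, 30, 34])  # made-up observations

position = percentile_position(len(data), 50)  # (9 + 1) * 50 / 100
value = data[int(position) - 1]                # convert 1-based position to index

print(position)  # 5.0
print(value)     # 22
```

When the position is not a whole number, the value is usually found by interpolating between the two neighboring observations.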
Divide:
20 / (10/8) = ?
16
Handling missing data, outliers, duplicates, and errors are all parts of what process?
Data Cleaning
What is the __init__() method in Python?
It is a class's initializer, and Python runs it automatically whenever an object of the class is created. It is used to assign values to object properties or to perform other setup the new object needs.
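A minimal sketch of `__init__` at work. The `Rider` class and its attributes are hypothetical, invented only for this example:

```python
class Rider:
    def __init__(self, user_type, trips):
        # Runs automatically when Rider(...) is called.
        self.user_type = user_type
        self.trips = trips


rider = Rider("Subscriber", 102)  # __init__ is invoked here
print(rider.user_type, rider.trips)  # Subscriber 102
```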
Name 2 out of the 4 types of visualizations.
1. Conceptual-Declarative (idea illustration)
2. Conceptual-Exploratory (idea generation)
3. Data-Driven-Declarative (everyday data viz like charts)
4. Data-Driven-Exploratory (visual discovery)
Describe the difference between the union of two events and the intersection of two events.
The union is the set of outcomes that are in A, in B, or in both. In other words, every outcome that belongs to at least one of the events.
The intersection is the set of outcomes that are in both A and B.
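Python sets mirror these two operations directly. The die-roll events below are an assumed example:

```python
# Assumed example: one roll of a six-sided die.
event_a = {2, 4, 6}  # the roll is even
event_b = {4, 5, 6}  # the roll is greater than 3

union = event_a | event_b         # outcomes in A, in B, or in both
intersection = event_a & event_b  # outcomes in both A and B

print(sorted(union))         # [2, 4, 5, 6]
print(sorted(intersection))  # [4, 6]
```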
If a is 110% greater than b, and b is 90% less than 47, what is the value of a?
a = 9.87
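The arithmetic behind this answer can be sketched step by step: "90% less than 47" leaves 10% of 47, and "110% greater than b" means b plus 110% of b.

```python
b = 47 * (1 - 0.90)  # 10% of 47, about 4.7
a = b * (1 + 1.10)   # b plus 110% of b, i.e. 2.1 * b

print(round(a, 2))  # 9.87
```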
Name at least 4 of the 7 Data Analysis Methods.
1. Time Series Analysis
2. Monte Carlo Simulation
3. Cohort Analysis
4. Factor Analysis
5. Dispersion Analysis
6. Decision Trees
7. Cluster Analysis
What is the output of the following code?
def call_by_val(x):
    x = x * 2
    return x

def call_by_ref(b):
    b.append("D")
    return b

a = ["E"]
num = 6
# Call functions
updated_num = call_by_val(num)
updated_list = call_by_ref(a)
# Print after function calls
print("Updated value after call_by_val:", updated_num)
print("Updated list after call_by_ref:", updated_list)
Updated value after call_by_val: 12
Updated list after call_by_ref: ['E', 'D']
Name all 5 C's of Data Visualization.
Clarity, Conciseness, Context, Consistency, and Compelling.
Which two rules and/or laws are used in Bayes' Theorem and how?
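The Multiplication Rule and the Law of Total Probability.
Bayes' Theorem divides one by the other: the Multiplication Rule gives the numerator, P(B|A) * P(A), and the Law of Total Probability gives the denominator, P(B).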
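To make the numerator and denominator concrete, here is a minimal sketch for two complementary cases (A and not A). The function name `bayes` and the probabilities passed to it are hypothetical values chosen for illustration:

```python
def bayes(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) for an event A and its complement."""
    # Multiplication Rule: P(A and B) = P(B|A) * P(A)
    numerator = p_b_given_a * prior_a
    # Law of Total Probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = numerator + p_b_given_not_a * (1 - prior_a)
    return numerator / evidence


# Assumed inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
posterior = bayes(0.01, 0.95, 0.05)
print(round(posterior, 3))  # 0.161
```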
Find the area under the curve of y = 6x^4 + 3x^2+21 from x = 0 to x = 6.
Area = 9673.2
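The area can be verified by evaluating the antiderivative, (6/5)x^5 + x^3 + 21x, at both limits. The function name `antiderivative` is just a label for this sketch:

```python
def antiderivative(x):
    # Integral of 6x^4 + 3x^2 + 21 with respect to x (constant omitted).
    return (6 / 5) * x**5 + x**3 + 21 * x


area = antiderivative(6) - antiderivative(0)
print(round(area, 1))  # 9673.2
```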