What does this code do?
pandas.read_csv('some_data.csv')
Reads the CSV file 'some_data.csv' and returns its contents as a pandas DataFrame.
What does this represent?
Σ ((observed - expected)² / expected)
--------------------------------------------------------
Sum, over all categories, of the squared difference between the observed and expected values, divided by the expected value
The Chi-Squared test statistic
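Worked example: a minimal sketch of the formula above in plain Python. The die-roll counts are made up for illustration and are not part of the original card.

```python
# Hypothetical counts from 60 rolls of a six-sided die (illustrative data only).
observed = [8, 9, 12, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]  # a fair die: 60 rolls / 6 faces

# Sum of (observed - expected)^2 / expected over all categories.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_squared)
```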
How do you get the mean of a population without using a method from an imported package (such as .mean())?
hint - use Python functions
sum(population) / len(population)
How do you check the datatypes within a pandas dataframe?
pandas.DataFrame.dtypes
Measure of the spread of a population or sample of numbers.
Symbols - σ (population) or s (sample)
What is Standard Deviation?
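Worked example: a small sketch using Python's built-in statistics module to show the difference between σ (population) and s (sample). The values are made up for illustration.

```python
import statistics

# Made-up values, used for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(values)  # population standard deviation (divides by N)
s = statistics.stdev(values)       # sample standard deviation (divides by n - 1)

print(sigma, s)
```

The sample version divides by n - 1 instead of N, so it comes out slightly larger for the same numbers.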
Name the type of variable here:
Boolean values (True/False; Yes/No; 1/0)
Discrete
What does this represent?
x̄ ± t(s/√n)
------------------------------------------------------------
sample mean +/- t critical value * (sample standard deviation / square root of the sample size)
A Confidence Interval
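Worked example: a minimal sketch of a 95% confidence interval for a mean. The sample is made up, and the critical value 2.262 is an assumption taken from a t-table (two-sided 95%, 9 degrees of freedom); with Scipy available it could be looked up instead of hard-coded.

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # made-up data
n = len(sample)
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 in the denominator)

# Assumption: two-sided 95% critical value for n - 1 = 9 degrees of freedom, from a t-table.
t_critical = 2.262

margin = t_critical * (s / math.sqrt(n))
lower, upper = x_bar - margin, x_bar + margin
print(lower, upper)
```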
What is the code to perform a 1-sample t-test using Scipy?
- include common parameters
scipy.stats.ttest_1samp(array, expected_mean, axis=0, nan_policy='propagate')
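Usage example: a short sketch of calling the function above. The measurements and the hypothesized mean of 5.0 are made up for illustration.

```python
from scipy import stats

# Made-up measurements; the null hypothesis is that the population mean is 5.0.
sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
expected_mean = 5.0

result = stats.ttest_1samp(sample, expected_mean, nan_policy='propagate')
t_statistic, p_value = result.statistic, result.pvalue
print(t_statistic, p_value)
```

A positive t statistic here means the sample mean is above the hypothesized mean; the p-value says how surprising that gap would be if the null hypothesis were true.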
How do you check for the number of null values in a dataframe?
pandas.DataFrame.isna().sum()
It's a table that displays the frequency distribution of two or more categorical variables.
What is a Contingency Table?
------------------------------------
What is a Cross Tabulation?
Name the type of variable here:
All real numbers
Continuous
What does this represent?
P(A|B) = P(B|A)P(A) / P(B)
-------------------------------------------------------------
Probability of (A, given B) = (Probability of (B, given A) * Probability of (A)) / (Probability of (B))
Bayes' Theorem
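Worked example: a sketch of the formula with made-up numbers for a medical test. The prevalence and error rates are assumptions chosen only to illustrate the arithmetic; P(B) is expanded over the two cases "A" and "not A".

```python
# A = "has the condition", B = "tests positive". All numbers are hypothetical.
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.95      # P(B|A): probability of a positive test given the condition
p_b_given_not_a = 0.05  # P(B|not A): false positive rate

# P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)
```

With these numbers the posterior is roughly 16%, far lower than the 95% test sensitivity, because the condition is rare.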
What is the code for a 2-sample t-test using Scipy?
- include common parameters (not defaults)
scipy.stats.ttest_ind(array_1, array_2, axis=0, equal_var=True, nan_policy='propagate')
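Usage example: a short sketch of a 2-sample t-test on two made-up groups. It sets equal_var=False (Welch's t-test) only to show a non-default parameter; whether that is appropriate depends on the data.

```python
from scipy import stats

# Made-up measurements for two independent groups.
group_a = [20.1, 19.8, 21.0, 20.5, 19.9]
group_b = [22.3, 21.9, 23.1, 22.8, 22.0]

# equal_var=False does not assume the two groups share the same variance.
result = stats.ttest_ind(group_a, group_b, equal_var=False, nan_policy='propagate')
t_statistic, p_value = result.statistic, result.pvalue
print(t_statistic, p_value)
```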
What package and constant do you use in this scenario to replace placeholder values (such as '?') with NaN?
pandas.DataFrame.replace('?', _______)
numpy.nan (the numpy.NaN alias was removed in NumPy 2.0)
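Usage example: a small sketch of swapping a '?' placeholder for NaN and then counting the missing values. The dataframe and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data where missing entries were recorded as '?'.
df = pd.DataFrame({
    'age': ['25', '?', '31'],
    'city': ['Oslo', 'Lima', '?'],
})

cleaned = df.replace('?', np.nan)   # '?' becomes a real missing value
null_counts = cleaned.isna().sum()  # number of nulls per column
print(null_counts)
```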
It's a number that represents the probability of obtaining a test result at least as extreme as the result actually observed, assuming the null hypothesis is true.
What is a P-value?
What does this code return?
df[df['year']==2017]
Returns a dataframe containing only the rows where the value in the df['year'] column is equal to 2017
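Usage example: a minimal sketch of the same filter on a made-up dataframe, with the boolean mask pulled out into its own variable so the two steps are visible.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    'year': [2016, 2017, 2017, 2018],
    'sales': [10, 12, 15, 9],
})

mask = df['year'] == 2017  # boolean Series: True where year is 2017
subset = df[mask]          # keeps only the rows where the mask is True
print(subset)
```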
What does this particular equation represent?
σ² = (Σ (x - μ)²) / N
-------------------------------------------
(sigma)² = (Sum of (observation - mean of all observations)²) / (Total number of observations)
Population Variance
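Worked example: a sketch of the formula in plain Python, checked against the built-in statistics.pvariance. The values are made up for illustration.

```python
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mu = sum(population) / len(population)  # population mean
variance = sum((x - mu) ** 2 for x in population) / len(population)

# The standard library computes the same quantity.
check = statistics.pvariance(population)
print(variance, check)
```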
When making a contingency table, what two lines of code do I need to grab the vertical and horizontal 'All' values assuming the table is 7 x 7 (excluding variable names)?
hint - start with:
contingency_table. ........
row_sums = contingency_table.iloc[0:6, 6]
grabs the 0th, 1st, 2nd, 3rd, 4th, and 5th row values from the 6th column (starting from 0)
col_sums = contingency_table.iloc[6, 0:6]
grabs the 0th, 1st, 2nd, 3rd, 4th, and 5th column values from the 6th row (starting from 0)
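Usage example: a sketch of the same idea on a smaller made-up table. It uses negative indices instead of hard-coded 0:6 and 6, which is an alternative to the card's answer that works for any table size as long as the 'All' row and column come last.

```python
import pandas as pd

# Made-up categorical data for illustration.
df = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'blue', 'green'],
    'size': ['S', 'L', 'S', 'S', 'L', 'L'],
})
contingency_table = pd.crosstab(df['color'], df['size'], margins=True)

row_sums = contingency_table.iloc[:-1, -1]  # 'All' column, excluding the grand total
col_sums = contingency_table.iloc[-1, :-1]  # 'All' row, excluding the grand total
total = contingency_table.iloc[-1, -1]      # grand total in the bottom-right corner
print(row_sums, col_sums, total)
```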
You notice when using pandas.read_csv() that there are no column headers. Which parameter do you use to insert column headers?
pandas.read_csv('.csv', names=['list'])
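Usage example: a small sketch that reads header-less CSV text from memory so it runs without a file on disk. The data and column names are made up; header=None is added here to make it explicit that the first line is data.

```python
import io
import pandas as pd

# Made-up CSV content with no header row.
raw = io.StringIO("1,Ada,36\n2,Linus,28\n")

df = pd.read_csv(raw, header=None, names=['id', 'name', 'age'])
print(df)
```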
In the case of deciding whether a six-sided die is fair, it's what you do if an obtained p-value is lower than the previously set significance level.
What is 'Reject the Null Hypothesis that it is a fair six-sided die and suggest the alternative that it is an unfair die'
What does this code show you?
df.describe(exclude='number')
Summary statistics (count, unique, top, freq) for the non-numeric columns of a dataset
What does this represent?
P(A|B) = P(A∩B) / P(B)
-----------------------------------------------------------
Probability of (A, given B) = Probability of (both A and B occurring) / Probability of (B)
Conditional Probability
What code is used to perform a Chi-Squared test using Scipy?
- Include Parameters
scipy.stats.chisquare(observed, expected)
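Usage example: a sketch of the fair-die scenario with made-up counts from 60 rolls. The 0.05 significance level is an assumption chosen for illustration.

```python
from scipy import stats

# Hypothetical counts from 60 rolls of a six-sided die.
observed = [8, 9, 12, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]  # what a fair die would give on average

result = stats.chisquare(observed, expected)
chi_squared, p_value = result.statistic, result.pvalue

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(chi_squared, p_value, reject_null)
```

The observed and expected counts must add up to the same total, otherwise the comparison is not meaningful.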
What code do you use to create a contingency table using two categorical variables?
hint - use Pandas
pandas.crosstab(df['column_1'], df['column_2'], margins=True)
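Usage example: a minimal sketch of the call above on a made-up dataframe. The column names 'pet' and 'home' are hypothetical stand-ins for 'column_1' and 'column_2'.

```python
import pandas as pd

# Made-up categorical data for illustration.
df = pd.DataFrame({
    'pet': ['cat', 'dog', 'cat', 'dog', 'cat'],
    'home': ['flat', 'house', 'house', 'house', 'flat'],
})

# margins=True adds an 'All' row and column holding the totals.
table = pd.crosstab(df['pet'], df['home'], margins=True)
print(table)
```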
You use it to determine whether there is a statistically significant difference between the expected and observed frequencies in one or more categories.
What is a Chi-Squared test?