Acronyms
Functions
BOOTSTRAPPP
Machine learning
Mystery
100

What does the p in P-value stand for?

Probability

100

What does the .split function do?

It splits data into pieces. On a string it returns a list of substrings; on a table it can also split the rows into a training set and a test set, given a number of rows.
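A minimal sketch of the string case, using Python's built-in str.split. The example strings are made up for illustration.

```python
# Splitting a string into a list of substrings with str.split
sentence = "root mean square error"
words = sentence.split()        # no argument: split on runs of whitespace
fields = "a,b,c".split(",")     # split on a chosen separator

print(words)
print(fields)
```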

100

What is Bootstrapping?

A resampling technique used in statistics to estimate the sampling distribution of a statistic
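One way this might look in code is sketched below with only the standard library. The data values, the function name bootstrap_means, and the choice of the mean as the statistic are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_means(sample, num_resamples=1000, seed=0):
    """Return the mean of each of num_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(num_resamples)]

# Hypothetical data, made up for illustration only
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
means = sorted(bootstrap_means(sample))

# Middle 95% of the resampled means (an approximate percentile interval)
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(lower, upper)
```

The sorted list of resampled means stands in for the sampling distribution of the mean.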

100

What does the .split function do in this instance, and which set gets 80% of the data and which gets the other 20%?

trainf, testf = ROH_data.split(int(0.8 * ROH_data.num_rows))

print(trainf.num_rows, 'training and', testf.num_rows, 'test instances.')

It splits the data 80/20. The training set (trainf) gets 80% of the rows, and the other 20% (testf) is held out to test and validate on.
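The same idea can be sketched without any table library. This is a plain-Python stand-in, not the table's own .split implementation; split_rows and the 50-row list are hypothetical, and the sketch assumes the rows are shuffled before being cut.

```python
import random

def split_rows(rows, num_train, seed=0):
    """Shuffle a copy of rows, then return (first num_train rows, the rest)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:]

rows = list(range(50))  # hypothetical stand-in for 50 rows of data
train, test = split_rows(rows, int(0.8 * len(rows)))
print(len(train), 'training and', len(test), 'test instances.')
```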

100

What is positive correlation?

It means that two variables move in the same direction: when one increases the other tends to increase, and when one decreases the other tends to decrease.
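A small sketch of how the direction shows up as a number, using the Pearson correlation coefficient r (positive r means positive correlation). The function name and the hours/scores values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: both variables rise together
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]
r = pearson_r(hours, scores)
print(r)
```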

200

What does RMSE stand for?

Root Mean Square Error

200

What does .loc do?

In pandas, it selects rows and columns of a DataFrame by label, so you can pull out a column by its name.


200

Why is it better to use more samples in bootstrapping?

More resamples make the estimate more numerically stable and precise, because the random variation from resampling shrinks.


200

What is the difference between a train and test set?

The training data is used to build the model and help it identify patterns, while the test data is used to check the accuracy of the model on unseen values.

200

Name the 4 TAs/CAs

Mohammad, Chloe, Tyler, Max

300

What does the k stand for in k nearest neighbors?



k is the number of neighbors to consider: once the training rows/observations are sorted by distance to the row being classified, the k closest ones are used.

300

Explain the difference between df[0] and df[7].

Slide 1

df[0] is the name of the molecule, while df[7] is the structure of the molecule named in df[0].

300

What types of datasets is bootstrapping best used for?

Datasets with small numbers of samples/datapoints, non-normal datasets, and/or when theoretical formulas are complex

300

Describe what RMSE does

It measures the typical magnitude of the difference between the observed and predicted values. It helps show how close the model you made is to the actual data.
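The calculation can be sketched in a few lines: square each error, average them, then take the square root. The function name and the observed/predicted values are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length lists."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: the errors are 1, 0, and -2
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(rmse(observed, predicted))
```

A perfect model gives an RMSE of 0, which is why lower is better.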

300

In what country outside the U.S.A. has Dr. Smith spent the most time?

Norway

400

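What is R^2?

The coefficient of determination: a goodness-of-fit measure giving the proportion of the variation in the data that the model explains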
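As a rough sketch, R^2 can be computed as 1 minus the ratio of the model's squared error to the total squared variation around the mean. The function name and the values are made up for illustration.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical values
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(r_squared(observed, predicted))
```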

400

What are the best functions for data cleaning?

np.isnan, .where, np.unique

400

What types of data can throw off bootstrapping? 

Extreme values/outliers

400

Is it better to have a higher or lower RMSE?

Lower

400

What is NAN/Null data in a dataset?

Data that is missing from the dataset
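A minimal sketch of finding and dropping missing values, using the standard library's math.isnan on a made-up list. np.isnan from NumPy does the same check element by element on an array.

```python
import math

# Hypothetical column with two missing entries stored as NaN
values = [1.5, float("nan"), 3.0, float("nan"), 4.5]

missing = [math.isnan(v) for v in values]            # True where data is missing
cleaned = [v for v in values if not math.isnan(v)]   # keep only real values

print(missing)
print(cleaned)
```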


500

What does CSV stand for?

Comma separated values.

500

What does .append do?

It attaches another instance to the end of a dataset or list; for a dataset, that can be a single row or a whole other dataset.

500

How many samples are enough for bootstrapping?

1,000 - 10,000

500

What does nearest neighbor mean in the context of a training data set?



The rows/observations which are closest in terms of values of attributes
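A small sketch of the idea, assuming straight-line (Euclidean) distance between attribute values and a majority vote among the k closest training rows. The function name, the points, and the labels are made up for illustration.

```python
import math
from collections import Counter

def classify(train_rows, train_labels, point, k):
    """Label point by majority vote among its k nearest training rows."""
    # Pair each training row's distance to the point with that row's label
    distances = sorted(
        (math.dist(row, point), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two attributes per row
train_rows = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (1.2, 0.8)]
train_labels = ["A", "A", "B", "B", "A"]

print(classify(train_rows, train_labels, (1.1, 1.0), k=3))
```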

500

What does a p-value measure?

It measures the strength of evidence against a null hypothesis: the probability of obtaining results at least as extreme as those observed if the null hypothesis is true
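One way to see this definition in action is to simulate the null hypothesis. The sketch below assumes a made-up example: the null hypothesis is a fair coin, and the observed result is 60 heads in 100 flips. The function name and all numbers are hypothetical.

```python
import random

def simulated_p_value(observed_heads, num_flips, num_simulations=2000, seed=0):
    """Share of simulated fair-coin experiments with at least observed_heads heads."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(num_simulations):
        heads = sum(rng.random() < 0.5 for _ in range(num_flips))
        if heads >= observed_heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_simulations

p = simulated_p_value(observed_heads=60, num_flips=100)
print(p)
```

A small p means results this extreme would be rare if the null hypothesis were true.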
