1.1-analyzing one variable data!
1.1 the science and art of data
displaying categorical data
1.6 measuring center
box plots and outliers.
100

What is the mean of the dataset and what does it tell us?

the numbers are 10, 12, 15, 18, 20, 25, 30  

Mean = (10 + 12 + 15 + 18 + 20 + 25 + 30) / 7 = 130 / 7 ≈ 18.57. It tells us the average, or typical, value of the dataset.
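The arithmetic above can be checked with a short Python sketch. The variable names are only illustrative; `statistics.mean` is part of the standard library.

```python
from statistics import mean

data = [10, 12, 15, 18, 20, 25, 30]

# Sum the values, then divide by how many there are.
total = sum(data)
manual_mean = total / len(data)

# The standard library computes the same quantity.
library_mean = mean(data)

print(total, round(manual_mean, 2), round(library_mean, 2))
```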

100

How does the method of data collection affect the reliability and validity of the data?

The validity and reliability of a measurement depend on how the data is collected: poor data collection methods can lead to unreliable or biased data.

100

What is the most common way to display categorical data, and why?

Bar charts are the most common way; they're the easiest and simplest to read.

100

A student receives the following scores on five math quizzes: 85, 92, 78, 95, and 88. What is the mean score for these quizzes?

Add all the values together, then divide by the number of quizzes: 438 divided by 5 equals 87.6.

100

What is a boxplot, and what does it represent?

It's a graphical representation of the distribution of a dataset, built from its five-number summary (minimum, Q1, median, Q3, maximum).

200

What is the median of the dataset and how does it compare to the mean?

10, 12, 15, 18, 20, 25, 30

18. There are 7 numbers, and the 4th is the median. It is slightly lower than the mean of about 18.57.

200

How does data visualization act as both a science and an art?

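It's a science because the visualization has to represent the data accurately, and an art because the design has to be clear and pleasing to look at. The two work as one.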

200

How do pie charts differ from bar charts in displaying categorical data, and when might you prefer one over the other?

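Pie charts show each category as a proportion of the whole, while bar charts make it easier to compare individual categories. Prefer a pie chart when part-to-whole shares matter and there are only a few categories; otherwise use a bar chart.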

200

Find the median of the following data set, which shows the number of books read by a book club's members over one year: 5, 12, 6, 18, 9, 11, 15, 10.

Arrange the numbers in order: 5, 6, 9, 10, 11, 12, 15, 18. There is an even number of values, so the middle numbers are 10 and 11. Their average is 10.5, so the median is 10.5.
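A minimal sketch of the same steps in Python, using the book club numbers from the question. The variable names are illustrative only.

```python
from statistics import median

books = [5, 12, 6, 18, 9, 11, 15, 10]

# Step 1: put the values in order.
ordered = sorted(books)
n = len(ordered)

# Step 2: with an even count, take the two middle values.
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])

# Step 3: the median is the average of those two values.
manual_median = sum(middle_pair) / 2

print(ordered, middle_pair, manual_median, median(books))
```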

200

What are the quartiles in a boxplot, and how are they calculated?

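Quartiles divide the ordered data into 4 equal parts. Q2 is the median, Q1 is the median of the lower half, and Q3 is the median of the upper half.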
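One way to compute quartiles is the "median of each half" approach, sketched below on the dataset used elsewhere in this game. The helper name `quartiles` is hypothetical, and the choice to leave the median out of both halves when the count is odd is an assumption; software packages use several slightly different conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([10, 12, 15, 18, 20, 25, 30])
print(q1, q2, q3)
```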

300

What is the standard deviation, and what does it tell us about the data? 

10, 12, 15, 18, 20, 25, 30

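First find the mean (about 18.57), then the squared differences from the mean, which add up to about 303.71. Dividing by n - 1 = 6 gives a sample variance of about 50.62, and the standard deviation is its square root, about 7.11. (Dividing by n = 7 instead gives a population standard deviation of about 6.59.) It tells us how far the values typically fall from the mean.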
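The steps for the standard deviation can be traced in a few lines of Python. This sketch assumes the sample formula (dividing by n - 1) and compares it with the standard library's `stdev`; `pstdev` is the population version, which divides by n.

```python
from statistics import mean, pstdev, stdev

data = [10, 12, 15, 18, 20, 25, 30]

# Step 1: the mean.
m = mean(data)

# Step 2: squared differences from the mean.
squared_diffs = [(x - m) ** 2 for x in data]

# Step 3: sample variance divides by n - 1.
sample_variance = sum(squared_diffs) / (len(data) - 1)

# Step 4: the standard deviation is the square root of the variance.
sample_sd = sample_variance ** 0.5

print(round(sample_variance, 2), round(sample_sd, 2), round(pstdev(data), 2))
```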

300

Why is data cleaning considered the most important step in the data analysis process?

Errors, missing values, and outliers carry through to every later result. It’s also an art to know when to impute missing values (fill them in) and when to remove certain outliers that may skew your analysis. Each decision requires careful judgment based on context!

300

What is a contingency table, and how does it help in understanding relationships between categorical variables? 

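A contingency table (or cross-tabulation) shows the frequency distribution of two categorical variables at once, so you can see how the categories of one relate to the categories of the other.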
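As a rough illustration, a contingency table can be built by counting pairs of categories. The survey responses below are made up for this sketch and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical responses (invented for illustration): (grade level, favorite chart).
responses = [
    ("junior", "bar"), ("junior", "pie"), ("senior", "bar"),
    ("senior", "bar"), ("junior", "bar"), ("senior", "pie"),
]

# Count how often each (row, column) combination occurs.
counts = Counter(responses)
rows = sorted({grade for grade, _ in responses})
cols = sorted({chart for _, chart in responses})

# Print the table: one row per grade level, one column per chart type.
print("grade".ljust(8) + "".join(c.ljust(6) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(counts[(r, c)]).ljust(6) for c in cols))
```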

300

How is mean calculated?

Mean = sum of values divided by number of values.

300

What defines an outlier in a boxplot?

Anything outside the "whiskers", meaning a value that is far from the rest of the data. A common rule flags values more than 1.5 × IQR below Q1 or above Q3.

400

Are there any outliers in the dataset? 

10, 12, 15, 18, 20, 25, 30

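There are no outliers! Taking Q1 = 12 and Q3 = 25 (the medians of the lower and upper halves), the IQR is 13, so the fences are 12 - 19.5 = -7.5 and 25 + 19.5 = 44.5, and every value falls between them.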
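The outlier check can be sketched in Python as follows, assuming the common 1.5 × IQR rule and the median-of-halves convention for quartiles. Other quartile conventions can give slightly different fences.

```python
from statistics import median

data = [10, 12, 15, 18, 20, 25, 30]
ordered = sorted(data)
half = len(ordered) // 2

# Quartiles as medians of the lower and upper halves (middle value excluded for odd counts).
q1 = median(ordered[:half])
q3 = median(ordered[half + 1:] if len(ordered) % 2 else ordered[half:])

# Fences sit 1.5 IQRs beyond the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)
```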

400

How do data scientists balance the objective nature of data with the subjective interpretation of that data?

The data itself is just numbers, but interpreting it means bringing in context and other factors. The balance comes from keeping the methods objective and being open about any subjective judgment calls.

400

How can you use a stacked bar chart to display categorical data with multiple subgroups?

A stacked bar chart displays categorical data along with subcategories by stacking sections within each bar.

400

What is the median, and when is it preferred over the mean?

The median is the middle value of the ordered data. It is preferred over the mean when the data has outliers or is skewed.

400

How do you interpret the whiskers in a boxplot?

They extend from the box to the minimum and maximum values (excluding any outliers), showing the spread of the data outside the quartiles.

500

What is the skewness of the dataset? Is it symmetric, positively, or negatively skewed? 

10, 12, 15, 18, 20, 25, 30

The mean (about 18.57) is greater than the median (18), so it's slightly skewed to the right (positively skewed).

500

How can data models be both accurate and relevant, especially when working with real-world phenomena that are dynamic and unpredictable?

Data models are simplifications of real-world situations. To stay both accurate and relevant, they have to be checked against new data and updated as conditions change.

500

  What are the two primary types of categorical data, and how does the choice of display depend on the type?

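The two main types of categorical data are nominal (no natural order) and ordinal (ordered categories). For ordinal data, the display should keep the categories in their natural order; nominal categories can be arranged in any order.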

500

What is the mode, and can a dataset have more than one mode?

The mode is the value that appears most frequently in a dataset. Yes, a dataset can have more than one mode if several values tie for the highest frequency.
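A quick sketch of finding every mode with Python's standard library (`statistics.multimode`, available in Python 3.8 and later). The values are made up for illustration.

```python
from statistics import multimode

# Hypothetical values: 3 and 7 each appear twice, more than any other value.
values = [2, 3, 3, 5, 7, 7, 9]

# multimode returns every value that ties for the highest frequency.
modes = multimode(values)
print(modes)
```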

500

How do boxplots help in identifying skewness in a dataset?

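By looking at the position of the median within the box and the distribution of the quartiles and whiskers: a longer upper side suggests right skew, and a longer lower side suggests left skew.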
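The same comparison can be made numerically. This sketch uses the dataset from the other questions and assumes the median-of-halves convention for quartiles; it compares the two halves of the box and the two whiskers.

```python
from statistics import median

data = [10, 12, 15, 18, 20, 25, 30]
ordered = sorted(data)
half = len(ordered) // 2

q1 = median(ordered[:half])
q2 = median(ordered)
q3 = median(ordered[half + 1:] if len(ordered) % 2 else ordered[half:])

# Widths of the two halves of the box.
lower_box = q2 - q1
upper_box = q3 - q2

# Lengths of the two whiskers (no outliers in this dataset).
lower_whisker = q1 - ordered[0]
upper_whisker = ordered[-1] - q3

# A longer upper side points to right (positive) skew.
print(lower_box, upper_box, lower_whisker, upper_whisker)
```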
