What is the mean of the dataset and what does it tell us?
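The numbers are 10, 12, 15, 18, 20, 25, 30.
Mean = (10 + 12 + 15 + 18 + 20 + 25 + 30) / 7 = 130 / 7 ≈ 18.57. The mean is the average of the values, so it tells us that a typical value in this dataset is about 18.57.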
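As a quick check of the arithmetic, here is a minimal Python sketch that computes the mean two ways: by hand (sum divided by count) and with the standard library's `statistics.mean`. The variable names are only illustrative.

```python
from statistics import mean

data = [10, 12, 15, 18, 20, 25, 30]

total = sum(data)            # add all the values together
count = len(data)            # how many values there are
manual_mean = total / count  # sum divided by count

# The library function should agree with the manual calculation
library_mean = mean(data)

print(total, count, round(manual_mean, 2))
```

Both approaches should give the same answer, rounded to 18.57.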
How does the method of data collection affect the reliability and validity of the data?
Validity (measuring what you intend to measure) and reliability (getting consistent results) both depend on how the data is collected. Poor data collection methods can lead to unreliable or biased data.
What is the most common way to display categorical data, and why?
Bar charts are the most common way because they are the simplest and easiest to read: the height of each bar shows the count for each category.
A student receives the following scores on five math quizzes: 85, 92, 78, 95, and 88. What is the mean score for these quizzes?
Add all the values together, then divide by the number of values: 438 divided by 5 equals 87.6.
What is a boxplot, and what does it represent?
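It is a graphical representation of the distribution of a dataset. It represents the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.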
What is the median of the dataset and how does it compare to the mean?
10, 12, 15, 18, 20, 25, 30
The median is 18: there are 7 numbers and the 4th is the middle value. It is slightly lower than the mean of about 18.57.
How does data visualization act as both a science and an art?
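Data visualization is a science because the chart has to represent the data accurately, and an art because design choices such as color and layout make it clear and pleasing to look at. Both work together as one.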
How do pie charts differ from bar charts in displaying categorical data, and when might you prefer one over the other?
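Pie charts show each category as a proportion of the whole, while bar charts make it easier to compare the sizes of the categories. A pie chart may be preferred when there are only a few categories and the goal is to show parts of a whole; a bar chart is better when there are many categories or the values are close together.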
Find the median of the following data set, which shows the number of books read by a book club's members over one year: 5, 12, 6, 18, 9, 11, 15, 10.
Arrange the numbers in order: 5, 6, 9, 10, 11, 12, 15, 18. There is an even number of values, so the two middle numbers are 10 and 11. Their average is 10.5, so the median is 10.5.
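The steps for finding a median (sort, find the middle, average the two middle values when the count is even) can be sketched in Python. The `median` function below is an illustrative helper written for this example, not a library function.

```python
def median(values):
    """Return the middle value of the sorted data (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

books = [5, 12, 6, 18, 9, 11, 15, 10]
print(sorted(books))
print(median(books))
```

The same helper handles the odd-count case, such as the 7-value dataset used in the other questions.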
What are the quartiles in a boxplot, and how are they calculated?
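Quartiles divide the sorted data into 4 equal parts. Q1 is the median of the lower half, Q2 is the median of the whole dataset, and Q3 is the median of the upper half.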
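Textbooks and software do not all calculate quartiles the same way, so the values can differ slightly. As a rough sketch, Python's `statistics.quantiles` supports two conventions, shown here on the 7-value dataset from the other questions.

```python
from statistics import quantiles

data = [10, 12, 15, 18, 20, 25, 30]

# "exclusive" is the default method; for this dataset it matches
# taking the median of the lower half and of the upper half
q_exclusive = quantiles(data, n=4, method="exclusive")

# "inclusive" interpolates between data points, treating the
# minimum and maximum as the 0th and 100th percentiles
q_inclusive = quantiles(data, n=4, method="inclusive")

print(q_exclusive)
print(q_inclusive)
```

Either way the middle quartile (Q2) is the median, 18; only Q1 and Q3 depend on the convention.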
What is the standard deviation, and what does it tell us about the data?
10, 12, 15, 18, 20, 25, 30
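First find the mean (130 / 7 ≈ 18.57), then the squared differences from the mean, which add up to about 303.71. Dividing by 7 gives a population variance of about 43.39, and the standard deviation is the square root of the variance: √43.39 ≈ 6.59. (Dividing by n − 1 = 6 instead gives a sample variance of about 50.62 and a sample standard deviation of about 7.11.) The standard deviation tells us how far the values typically are from the mean.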
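The steps (mean, squared differences, variance, square root) can be checked with a short Python sketch. It works through the calculation by hand and then compares with the standard library, which offers both the population version (divide by n) and the sample version (divide by n − 1). Which one a course expects is an assumption to confirm.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [10, 12, 15, 18, 20, 25, 30]
n = len(data)
mean = sum(data) / n

# Step by step: squared differences from the mean, then their sum
squared_diffs = [(x - mean) ** 2 for x in data]
sum_sq = sum(squared_diffs)

population_variance = sum_sq / n        # divide by n
sample_variance = sum_sq / (n - 1)      # divide by n - 1

print(round(population_variance, 2), round(sqrt(population_variance), 2))
print(round(sample_variance, 2), round(sqrt(sample_variance), 2))

# The library functions should agree with the manual steps
print(round(pvariance(data), 2), round(pstdev(data), 2))
print(round(variance(data), 2), round(stdev(data), 2))
```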
Why is data cleaning considered the most important step in the data analysis process?
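Raw data often contains errors, duplicates, and missing values, and those problems carry through to every later step of the analysis. It is also an art to know when to impute missing values (fill them in) and when to remove outliers that may skew your analysis. Each decision requires careful judgment based on context.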
What is a contingency table, and how does it help in understanding relationships between categorical variables?
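A contingency table (or cross-tabulation) shows the frequency distribution of two categorical variables, with one variable in the rows and the other in the columns. Comparing the counts across rows and columns shows whether the two variables appear to be related.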
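To make the idea concrete, here is a small Python sketch that builds a contingency table by counting pairs of categories. The survey responses are made up purely for illustration; they are not from any real dataset.

```python
from collections import Counter

# Hypothetical responses: (class year, preferred chart type)
responses = [
    ("first-year", "bar"), ("first-year", "pie"), ("first-year", "bar"),
    ("second-year", "pie"), ("second-year", "bar"), ("second-year", "pie"),
    ("second-year", "pie"),
]

# Count how often each (row, column) combination occurs
counts = Counter(responses)
rows = sorted({year for year, _ in responses})
cols = sorted({chart for _, chart in responses})

# Print the table: one row per class year, one column per chart type
print(f"{'':12}" + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:12}" + "".join(f"{counts[(r, c)]:>6}" for c in cols))
```

Each cell is the number of responses that fall into both that row's category and that column's category.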
How is mean calculated?
Mean = sum of the values divided by the number of values.
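Written as a formula, with the symbols x̄ for the mean, n for the number of values, and x_i for each value (standard notation, though not used elsewhere in these notes), and checked against the quiz scores from the earlier question:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{85 + 92 + 78 + 95 + 88}{5} = \frac{438}{5} = 87.6
```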
What defines an outlier in a boxplot?
Anything outside the whiskers, which usually means a value more than 1.5 times the interquartile range (IQR) below Q1 or above Q3. These are values that are very different from the rest.
Are there any outliers in the dataset?
10, 12, 15, 18, 20, 25, 30
There are no outliers: every value is within 1.5 × IQR of the quartiles.
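One common way to check for outliers is the 1.5 × IQR rule used by most boxplots. The sketch below assumes that rule and Python's default quartile method; other quartile conventions give slightly different fences but the same conclusion for this dataset.

```python
from statistics import quantiles

data = [10, 12, 15, 18, 20, 25, 30]

# Quartiles with the default "exclusive" method
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences: values beyond these are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

The smallest value (10) and the largest (30) both fall inside the fences, so the list of outliers is empty.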
How do data scientists balance the objective nature of data with the subjective interpretation of that data?
Data is just numbers, but interpreting it involves judgment. Data scientists balance the two by using consistent, documented methods and by being clear about the context and assumptions behind their interpretation.
How can you use a stacked bar chart to display categorical data with multiple subgroups?
A stacked bar chart displays categorical data along with subcategories by stacking sections within each bar.
What is the median, and when is it preferred over the mean?
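The median is the middle value of the sorted data. It is preferred over the mean when the data is skewed or has outliers, because extreme values do not pull it as much.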
How do you interpret the whiskers in a boxplot?
The whiskers extend from the box to the minimum and maximum values (not counting outliers), so they show the spread of the data outside the middle 50%.
What is the skewness of the dataset? Is it symmetric, positively, or negatively skewed?
10, 12, 15, 18, 20, 25, 30
The mean (about 18.57) is greater than the median (18), so the data has a slight skew to the right (positive skew).
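Comparing the mean and median is a quick rule of thumb. A more direct check is a skewness coefficient; the sketch below uses the moment-based version (average cubed deviation divided by the cube of the population standard deviation), which is one of several definitions in use.

```python
from statistics import mean, median, pstdev

data = [10, 12, 15, 18, 20, 25, 30]

m = mean(data)
med = median(data)
sd = pstdev(data)  # population standard deviation

# Moment-based skewness: mean cubed deviation divided by sd cubed
skewness = sum((x - m) ** 3 for x in data) / len(data) / sd ** 3

if skewness > 0:
    direction = "positively (right) skewed"
elif skewness < 0:
    direction = "negatively (left) skewed"
else:
    direction = "symmetric"

print(round(m, 2), med, direction)
```

A positive coefficient agrees with the mean being above the median: the larger values stretch the right-hand tail.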
How can data models be both accurate and relevant, especially when working with real-world phenomena that are dynamic and unpredictable?
Data models are simplifications of real-world situations, so they need both accuracy and accountability: they should be checked against new data and updated as conditions change.
What are the two primary types of categorical data, and how does the choice of display depend on the type?
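The two main types of categorical data are nominal (no natural order, such as colors) and ordinal (a natural order, such as ratings). For ordinal data the display should keep the categories in order; for nominal data the bars can be arranged in any order, such as by frequency.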
What is the mode, and can a dataset have more than one mode?
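The mode is the value that appears most frequently in a dataset. A dataset can have more than one mode (bimodal or multimodal) if several values tie for the highest frequency.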
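A short Python sketch shows how ties work, using `statistics.multimode`, which returns every value that shares the highest frequency. The first list is a made-up example; the second is the 7-value dataset from the other questions.

```python
from statistics import multimode

# Hypothetical example: 2 and 3 each appear twice, so there are two modes
two_modes = multimode([1, 2, 2, 3, 3, 4])

# In the 7-value dataset every value appears once, so all of them tie
data = [10, 12, 15, 18, 20, 25, 30]
all_tied = multimode(data)

print(two_modes)
print(all_tied)
```

When every value appears exactly once, as in the second list, many textbooks simply say the dataset has no mode.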
How do boxplots help in identifying skewness in a dataset?
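By looking at the position of the median inside the box and the distribution of the quartiles and whiskers. If the median is closer to Q1 and the upper whisker is longer, the data is skewed right; the reverse suggests a left skew.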