# StatConcepts: A Visual Tour of Statistical Ideas

H. Joseph Newton and Jane L. Harvill

Published April 1997 by Duxbury Press

Henrik Schmiediche's Java version of the random sampling lab in StatConcepts is also available.

### From the Preface:

Most introductory statistics courses consist of three parts: 1) descriptive statistics: using numbers and graphs to summarize the information in a data set; 2) inferential statistics: drawing conclusions about numerical characteristics of entire populations of objects from those of samples from the populations; and 3) statistical concepts: the basic logical and mathematical ideas underpinning descriptive and inferential statistics.

There is a wide variety of computer programs that make it easy for students to accomplish what is required for the first two of these parts, but there is very little software for illustrating statistical concepts. That is why we wrote StatConcepts, a set of "laboratories" for illustrating ideas.

StatConcepts is actually a collection of programs written in the language of StataQuest, a student version of Stata, which is a program designed to do descriptive and inferential statistics.

StatConcepts is not intended as a text, but as a supplement to the many introductory statistics texts that exist. Its main focus is on correct interpretation and understanding of statistical concepts, terminology, and results, not on computation for a given problem, although some labs do allow students to compute results.

In many ways, the computer is the laboratory for the science of statistics. Most of the ideas of statistics start out with the phrase "If we did this procedure over and over again, then this is what we would see." The only way to realistically do things over and over again is on a computer. In these labs we have tried to use graphics to show what in fact we would see if we did various things over and over again.

We assume that instructors will not incorporate all of the labs in the StatConcepts collection (there are 28 of them!) into a course, but rather pick and choose those they feel would be most useful in the course (and that they have time to cover in their already cramped schedule).

We would hope that instructors can show the labs to the students using some kind of projection, but each chapter of this book contains a "guided tour" through each lab that a student could read while at a computer. These guided tours cannot totally replace an instructor, but they can certainly help instructors use the labs as a supplement to their course.

While the labs and this book are intended primarily for introductory courses, we have found them very valuable in courses at all levels. The level has been kept as nontechnical as possible, but more advanced students will be able to relate to the graphs and descriptions at a more mathematical level.

#### Overview of the Labs

There are 28 labs in the complete collection, although there are fewer items on the Labs menu because some items have submenus containing more than one lab. There is a chapter in this book for each item on the Labs menu, including:
1. Introduction to Concept Labs: This lab is actually just a greeting and an invitation to look at a help file giving an overview of the entire collection of labs. It also allows the user to specify their own random number generator seed (see Chapter 1).
2. Random Sampling Lab: This lab repeatedly shows random sampling without replacement from a population of 100 boxes. It also previews the ideas of sampling distributions and the central limit theorem. A Java applet version of this lab was written by Henrik Schmiediche of the Department of Statistics of Texas A&M University.
3. Relative Frequency and Probability: This lab again illustrates random sampling without replacement using the example of a lottery game. It also illustrates the relative frequency interpretation of probability by repeatedly drawing six winning numbers from the numbers 1 through 50 and keeping track of the number of draws containing at least two consecutive numbers. Deriving the formula for the probability of this event is beyond the scope of most courses.
4. How are Populations Distributed?: This lab shows students that distributions come in all shapes and sizes and come in parametric families. It graphs densities from 14 different families and also generates random samples from one member of each family and superimposes the density on the histogram of the sample, thus illustrating variability from one sample to another.
5. Sampling from 0-1 Populations: This item actually leads to four different labs:
    1. Sampling With and Without Replacement: The binomial and hypergeometric distributions are illustrated by having the user specify the number of elements in a 0-1 population, the proportion of 1's, and the size of a sample, and then superimposing the probability plots of the number of 1's in the sample under sampling with and without replacement.
    2. The Negative Binomial Distribution: This lab graphs the negative binomial distribution for user-specified values of the parameters.
    3. Poisson Approximation to Binomial: This lab superimposes the binomial distribution and its Poisson approximation for user-specified values of the parameters. It makes it easy to see when the Poisson approximation works well and when it doesn't.
    4. Normal Approximation to Binomial: This lab superimposes the binomial distribution and its normal approximation for user-specified values of the parameters. It makes it easy to see when the normal approximation works well and when it doesn't.
6. Bivariate Descriptive Statistics: This item leads to three different labs:
    1. Scatterplots I: This lab shows scatterplots of random samples from a bivariate normal population for 20 different values of the correlation coefficient ranging from -0.9 to 0.9.
    2. Scatterplots II: This lab allows the user to generate scatterplots for any sample size and any population correlation coefficient.
    3. Least Squares: This lab allows the user to generate a wide variety of different scatterplots and then see the true line, the least squares line, and the vertical errors that go into the residual sum of squares.
7. Central Limit Theorem: This lab illustrates sample means for repeated sampling from a user-specified choice of four parent populations (normal, exponential, uniform, and 0-1) and is actually two labs in one:
    1. One-at-a-time: One sample at a time, boxes corresponding to sample means are placed above an axis until the tallest column of boxes fills the graph.
    2. 500 Samples: The histogram of the sample means for 500 samples is drawn with the approximating normal curve superimposed.
8. Z, t, Chi-square, F: This item leads to six labs:
    1. Critical Values: This lab graphs rejection regions for one- and two-tailed tests for any of Z, t, Chi-square, or F for a user-specified significance level and, if necessary, degrees of freedom.
    2. Normal Curves: This lab starts by drawing the standard normal curve; the user can then repeatedly change the mean and/or variance, and each time the lab draws the new normal curve on the same axes.
    3. Chi-square Curves: This lab starts by drawing the Chi-square curve with 10 degrees of freedom; the user can then repeatedly change the degrees of freedom, and each time the lab draws the new Chi-square curve on the same axes.
    4. F Curves: This lab starts by drawing the F curve with 10 and 10 degrees of freedom; the user can then repeatedly change the degrees of freedom, and each time the lab draws the new F curve on the same axes.
    5. t Converging to Z: This lab allows the user to superimpose any part of the Z curve and the same part of the t curve for increasing degrees of freedom.
    6. Normal Approximation to Binomial: This is the same lab as the one under the Sampling from 0-1 Populations item.
9. Sampling Distributions: This lab allows the user to generate 500 samples (or pairs of samples) of user-specified size from one of three parent populations (normal, uniform, and exponential), calculate the Z, one- or two-sample t, Chi-square, or F statistics, and then superimpose the histogram of the 500 statistics and the theoretical normal theory curve. It also displays the percentiles of the 500 statistics and of the theoretical curve to show the agreement (disagreement) of the two if assumptions are (are not) met.
10. Minimum Variance Estimation: This lab allows the user to generate 500 samples of user-specified size from one of four parent populations (N(0,1), U(-0.5,0.5), t with three degrees of freedom, and Laplace), each symmetric about zero, and then draw the histograms of the 500 sample means and 500 sample medians. The mean and standard deviation of the 500 means and of the 500 medians are also displayed. The lab shows that the sample mean is not always the best estimator.
11. Calculating Confidence Intervals: This lab allows the user to calculate confidence intervals for the 11 different one- and two-sample inference situations for means, variances, and proportions usually covered in an introductory course.
12. Interpreting Confidence Intervals: This lab allows the user to generate 50, 100, or 150 samples (or pairs of samples) of user-specified size from one of four parent populations (normal, uniform, exponential, and 0-1) and draw horizontal lines for the confidence intervals (for a user-specified significance level) for a user-specified parameter, as well as a vertical line representing the true value of the parameter. This allows the user to see the effect of changing significance level and sample size on the width of intervals, as well as the effect of violation of assumptions on the confidence interval coverage probability.
13. Calculating Tests of Hypotheses: This lab allows the user to calculate test statistics and p-values for the same 11 situations as in the Calculating Confidence Intervals Lab. It draws a graph with the test statistic marked and the tail areas corresponding to the p-value shaded in.
14. Tests of Significance: This lab draws a graph of a user-specified Z, t, Chi-square, or F curve and shades in the area corresponding to the p-value for either a user-specified or lab-generated value of a test statistic.
15. Level of Significance of a Test: This lab allows the user to generate 500 samples (or pairs of samples) of user-specified size from one of three parent populations (normal, uniform, and exponential), calculate the Z, one- or two-sample t, Chi-square, or F statistics, and then superimpose the histogram of the 500 statistics and the theoretical normal theory curve. It shades in the area under the curve for the rejection region (for the user-specified significance level and one- or two-tailed test) and displays the proportion of times the null hypothesis is rejected, thus showing the agreement (disagreement) with the significance level if assumptions are (are not) met.
16. Power of a Test: This lab is the same as the Level of Significance of a Test Lab except that now the user can specify the degree to which the null hypothesis is actually false (including actually being true). This allows the user to see the effect of sample size and degree of falseness on the power of the test.
17. Between and Within Variation: This lab starts by generating and graphing two sets of four samples, each of user specified size. The first set of four samples are from normal populations having differing means, while in the second set the population means are all the same. All populations have the same variance. Then the lab allows the user to repeatedly change the population variance, each time redrawing the plot and displaying the p-value of the One Way ANOVA test of equality of means. This allows the user to see how the test is comparing the between sample variability to the within sample variability.
18. Calculating One-Way ANOVA: This lab allows the user to enter sample sizes, means, and variances (or standard deviations) for a specified number of samples and then displays the ANOVA table.
19. Chi-square Goodness of Fit Test: This lab illustrates the Chi-square goodness of fit test by generating a user-specified number of points in a square, placing a grid of a user-specified number of boxes on the points, counting how many points fall in each box, and then displaying the p-value of the resulting Chi-square test. This allows users to see how nonuniform the placement of points can look even though the points are in fact being placed uniformly.
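
The idea behind the Relative Frequency and Probability lab can be tried outside StatConcepts as well. The following is a minimal Python sketch, not the StataQuest code of the lab: the function names, the seed, and the numbers of draws are all assumptions made for illustration. It repeatedly draws six numbers from 1 through 50 without replacement and tracks how often a draw contains at least two consecutive numbers. For comparison it uses the standard counting result that C(45, 6) of the C(50, 6) possible draws contain no consecutive pair.

```python
import random
from math import comb


def has_consecutive(numbers):
    """Return True if any two of the drawn numbers differ by exactly 1."""
    ordered = sorted(numbers)
    return any(b - a == 1 for a, b in zip(ordered, ordered[1:]))


def relative_frequency(n_draws, seed=0):
    """Fraction of simulated draws (6 of 1..50, no replacement) with a consecutive pair."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(range(1, 51), 6)  # sampling without replacement
        if has_consecutive(draw):
            hits += 1
    return hits / n_draws


# Exact value via the complement: C(45, 6) of the C(50, 6) draws have no consecutive pair.
exact_probability = 1 - comb(45, 6) / comb(50, 6)

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, round(relative_frequency(n), 4), round(exact_probability, 4))
```

As the number of draws grows, the relative frequency should settle near the exact probability, which is the relative frequency interpretation the lab is meant to convey.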
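
The comparison made in the Sampling With and Without Replacement lab can be sketched numerically. The code below is an illustration under assumptions (the function names and the example population are hypothetical, and the lab itself draws plots rather than printing tables). It computes the probability of each possible number of 1's in the sample under the binomial model (with replacement) and the hypergeometric model (without replacement).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k ones in n draws) when sampling WITH replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, population_size, ones, n):
    """P(k ones in n draws) when sampling WITHOUT replacement."""
    zeros = population_size - ones
    # math.comb returns 0 when more items are requested than are available.
    return comb(ones, k) * comb(zeros, n - k) / comb(population_size, n)


def compare(population_size, proportion_ones, n):
    """Rows of (k, binomial probability, hypergeometric probability) for k = 0..n."""
    ones = round(population_size * proportion_ones)
    p = ones / population_size
    return [
        (k, binomial_pmf(k, n, p), hypergeometric_pmf(k, population_size, ones, n))
        for k in range(n + 1)
    ]


if __name__ == "__main__":
    # Hypothetical example: population of 20 with 40% ones, sample of size 5.
    for k, with_repl, without_repl in compare(20, 0.4, 5):
        print(k, round(with_repl, 4), round(without_repl, 4))
```

With a small population the two columns differ noticeably; as the population grows relative to the sample, the hypergeometric probabilities approach the binomial ones.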
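
The "500 Samples" part of the Central Limit Theorem lab can be imitated with a short simulation. This sketch assumes an exponential parent population with mean 1 and uses hypothetical function names and sample sizes; it summarizes the 500 sample means numerically instead of drawing the histogram the lab shows.

```python
import random
import statistics


def sample_means(n, n_samples=500, seed=0):
    """Means of n_samples samples of size n from an exponential(1) parent population."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    for n in (1, 5, 30):
        means = sample_means(n)
        # Theory: the sample means have mean 1 and standard deviation 1 / sqrt(n).
        print(
            n,
            round(statistics.fmean(means), 3),
            round(statistics.stdev(means), 3),
            round(1 / n**0.5, 3),
        )
```

The standard deviation of the sample means should shrink roughly like one over the square root of the sample size, while a histogram of the means (not drawn here) would look increasingly normal.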
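
The point of the Minimum Variance Estimation lab, that the sample mean is not always the best estimator, can also be checked by simulation. The sketch below is an assumption-laden illustration rather than the lab's own code: it covers only two of the four parent populations (standard normal and a Laplace distribution with scale 1), and all names and sample sizes are hypothetical.

```python
import random
import statistics


def draw_normal(rng):
    """One observation from N(0, 1)."""
    return rng.gauss(0.0, 1.0)


def draw_laplace(rng):
    """One observation from Laplace(0, 1): the difference of two exponential(1) variables."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def spread_of_estimators(draw, n, n_samples=500, seed=0):
    """Standard deviations of the sample means and sample medians over repeated samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)


if __name__ == "__main__":
    for name, draw in (("normal", draw_normal), ("Laplace", draw_laplace)):
        sd_mean, sd_median = spread_of_estimators(draw, n=51)
        print(name, round(sd_mean, 4), round(sd_median, 4))
```

For the normal parent the means should vary less than the medians, while for the Laplace parent the ordering is expected to reverse.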
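
The Interpreting Confidence Intervals lab draws many intervals against the true parameter value. A numerical version of that idea is sketched below under simplifying assumptions: the parent population is normal, the population standard deviation is treated as known so that a Z interval for the mean applies, and the function names and defaults are hypothetical. Instead of drawing the intervals, it counts the fraction that cover the true mean.

```python
import random
import statistics
from statistics import NormalDist


def z_interval(sample, sigma, confidence):
    """Z confidence interval for the mean, assuming the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    center = statistics.fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return center - half_width, center + half_width


def coverage(n, confidence=0.95, n_intervals=100, mu=0.0, sigma=1.0, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    covered = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        if low <= mu <= high:
            covered += 1
    return covered / n_intervals


if __name__ == "__main__":
    for confidence in (0.80, 0.95, 0.99):
        print(confidence, coverage(n=20, confidence=confidence, n_intervals=150))
```

When the assumptions hold, the observed fraction should be close to the stated confidence level; raising the confidence level or lowering the sample size widens the intervals.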