21 Pages Posted: 5 Apr 2010
This note explains the basics of sampling. It defines and discusses the concepts of random sampling, the law of averages, and the central limit theorem. It covers the sampling of both continuous uncertain quantities (where the sample is summarized by the sample average and sample standard deviation) and categorical variables (where the sample is summarized by the sample proportion). The note carefully explains that the results of random sampling from an infinite population are equivalent to repeated and independent outcomes of an underlying probability distribution.
Rev. Mar. 14, 2017
The word sampling probably brings to mind a large collection of items from which a small number of items will be selected and measured. We inspect units from yesterday's production and grade their quality. We poll potential voters in an upcoming election and find out how they plan to vote. We capture fish from a lake and measure their length. We study a subset of companies in an industry and summarize their financial performance. We survey customers from our universe of customers and monitor their satisfaction. In the language of sampling, the large collection of items is called the population and the smaller number of items actually selected and measured is called the sample. Because the number of items in the population can be very large, and the costs of sampling nontrivial, a complete sampling of the population (a census) is usually not economical. The challenges become how to select a useful sample and how to interpret and use the information contained in the sample, recognizing that it provides an imperfect picture of the population.
This note explains how samples behave so that we can accurately interpret the results of a sample. Our interpretation of a sample begins with an understanding of the method used to collect the sample. For the sample to reflect the population from which it was drawn, the sample must be chosen in a certain way. The most common method for collecting a sample that will accurately reflect the population is called random sampling. A random sample is one in which each item in the population has an equal chance of being included in the sample. For yesterday's production, randomness requires that we take our sample at randomly chosen times throughout the day. For the fish in the lake example, it will be very difficult to collect a random sample unless every size of fish is equally likely to be caught (a highly unlikely assumption). If the sampling is not done randomly, it is difficult if not impossible to interpret the sample results. If large fish are wiser and less likely to be caught, the fish we catch will not be a random sample of the population of fish. The lengths of the fish in our sample will thus be biased: The average length of the fish in the sample will tend to understate the average length of the fish in the lake.
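The fish example can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from the note: the lake population, its length distribution, and the rule that a fish's chance of being caught is proportional to 1/length are all hypothetical assumptions chosen only to make the bias visible.

```python
import random
import statistics


def simulate_fish_sampling(pop_size=20_000, sample_size=1_000, seed=0):
    """Compare a random sample with a size-biased sample of fish lengths.

    All numbers here are illustrative assumptions, not data from the note.
    Returns (population mean, random-sample mean, biased-sample mean).
    """
    rng = random.Random(seed)

    # Hypothetical lake: lengths in cm, roughly normal around 30 cm,
    # floored at 5 cm so that no length is zero or negative.
    lengths = [max(5.0, rng.gauss(30.0, 8.0)) for _ in range(pop_size)]

    # Random sample: every fish has the same chance of being included.
    random_sample = rng.sample(lengths, sample_size)

    # Biased sample: assume larger fish are harder to catch, with a catch
    # weight proportional to 1 / length. rng.choices draws with
    # replacement, which is harmless here because the sample is small
    # relative to the population.
    weights = [1.0 / length for length in lengths]
    biased_sample = rng.choices(lengths, weights=weights, k=sample_size)

    return (
        statistics.mean(lengths),
        statistics.mean(random_sample),
        statistics.mean(biased_sample),
    )


if __name__ == "__main__":
    pop_mean, random_mean, biased_mean = simulate_fish_sampling()
    print(f"population mean:    {pop_mean:.2f} cm")
    print(f"random-sample mean: {random_mean:.2f} cm")
    print(f"biased-sample mean: {biased_mean:.2f} cm")
```

Under these assumptions, the random-sample average should land close to the population average, while the biased-sample average should fall below it, which is the understatement described above.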
In addition to samples collected randomly, this note considers samples drawn from very large or infinite populations. As long as the sample is small relative to the population, the size of the population has essentially no bearing on how the sample results are interpreted. Population size matters only when the sampling is done without replacement and the sample accounts for a notable portion of the population.
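One way to see why population size rarely matters is the standard finite population correction, a textbook result that this excerpt does not state explicitly. When sampling without replacement, the standard error of the sample average is multiplied by sqrt((N - n) / (N - 1)), where N is the population size and n is the sample size. The sketch below computes that factor; the function name and the example sizes are illustrative assumptions.

```python
import math


def finite_population_correction(pop_size, sample_size):
    """Return sqrt((N - n) / (N - 1)) for sampling without replacement.

    A factor near 1 means the population size has almost no effect on
    the standard error of the sample average.
    """
    if pop_size < 2 or not 1 <= sample_size <= pop_size:
        raise ValueError("require pop_size >= 2 and 1 <= sample_size <= pop_size")
    return math.sqrt((pop_size - sample_size) / (pop_size - 1))


if __name__ == "__main__":
    # Hypothetical (population size, sample size) pairs.
    for pop_size, sample_size in [(1_000_000, 100), (10_000, 100), (1_000, 500)]:
        factor = finite_population_correction(pop_size, sample_size)
        print(f"N={pop_size:>9,}  n={sample_size:>4}  correction={factor:.4f}")
```

For a sample of 100 from a population of a million, the factor is indistinguishable from 1 for practical purposes. It departs noticeably from 1 only when the sample is a large share of the population, as in the last pair.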
. . .
This is a Darden A Case paper.
File name: UVA-QA-0513.pdf