# Handbook of Descriptive Statistics/Introduction

A distribution is a collection of measurements of a single phenomenon. If the number of measurements is small, simply listing them all may be an adequate description, and no summarization or data reduction is needed. However, if the number of measurements is large, a full list may fail as a communication or analysis tool.

Happily, distributions can be summarized. Some summaries are very brief and provide only a little description. For instance, the mean is a one-number summary that captures just one aspect of a set of numbers. There are six aspects of a distribution that may warrant consideration when summarizing.

## Sample size

The number of subjects or items measured is often of fundamental interest. For instance, we might be examining a distribution of the heights of college students containing measurements on 1,234 subjects.

## Scale and precision

The type of data you are dealing with (continuous, categorical, ordinal, etc.) influences many of the choices you must make about how to describe and analyze your data. The units of measure (inches, kilos, %, mmol, drachmas per acre of corn, etc.) should be noted. For our example, the data were recorded in inches to the nearest 0.1 inch. In other words, measurements were rounded to the nearest tenth of an inch before recording. For categorical data, the "scale" is just the names of the categories. If we also recorded the gender of the students, we might have three categories: "Male," "Female" and "Unknown."
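As a rough illustration of precision and categorical scale, the sketch below rounds a few measurements to the nearest tenth of an inch and tallies a categorical variable. The values in `raw_heights` and `genders` are invented for the example and are not taken from any real data set.

```python
from collections import Counter

# Hypothetical raw measurements in inches, before recording (made-up values).
raw_heights = [67.84, 70.26, 64.449, 68.03]

# Record each measurement to the nearest 0.1 inch.
recorded = [round(h, 1) for h in raw_heights]
print(recorded)

# Hypothetical categorical variable: its "scale" is the set of category names.
genders = ["Male", "Female", "Female", "Unknown"]
counts = Counter(genders)
print(sorted(counts))   # the category names
print(counts)           # how many observations fall in each category
```

Noting the recording precision alongside the data matters because any summary computed later cannot be more precise than the recorded values.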

## Central tendency

Along the scale, about where do the data lie? Theoretically, adult human height is measured on a scale with no upper limit. However, most of the measurements we will observe center around a value of 68 inches (5'-8"). There is a wide variety of ways to describe central tendency. For continuous data, the mean (or average) is often calculated. But the mean has limitations (it is sensitive to extreme values, for example), and other measures of central tendency are also useful: median, geometric mean, mode, etc.
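The measures named above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `heights` list is a small made-up sample, not the 1,234-student data set described earlier, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Hypothetical heights in inches, recorded to the nearest 0.1 inch (made-up values).
heights = [64.2, 66.5, 67.8, 68.0, 68.0, 69.1, 70.4, 72.3]

n = len(heights)                                     # sample size
mean = statistics.mean(heights)                      # arithmetic mean
median = statistics.median(heights)                  # middle value of the sorted data
geometric_mean = statistics.geometric_mean(heights)  # nth root of the product
mode = statistics.mode(heights)                      # most frequent value

print(f"n              = {n}")
print(f"mean           = {mean:.2f}")
print(f"median         = {median:.2f}")
print(f"geometric mean = {geometric_mean:.2f}")
print(f"mode           = {mode:.2f}")
```

For a roughly symmetric sample like this one the measures land close together; they tend to diverge when the data are skewed or contain extreme values, which is one reason to report more than the mean.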