Social Statistics: Introduction

Social statistics is the application of statistical methods to social science data. It uses the same mathematical tools as any other form of statistical analysis, but it applies them in ways that take into account the distinct features of social science data. Nearly all data used in the social sciences are observational, not experimental, which means that social scientists must use statistical methods to control for outside influences on the relationships of interest, since experimental controls are not available. The observational character of social science data also makes it difficult in most circumstances to perform formal hypothesis tests, since hypotheses are almost always conditioned on prior knowledge of the data to be used in testing them. These and other challenges make the practice of social statistics different from the practice of statistics in the physical, biological, and psychological sciences.

This book presents and explains the use of conventional statistical techniques in the social sciences. Instead of following the standard approach used in social statistics textbooks, this book focuses on statistical concepts and techniques as they are actually used in the social sciences. Thus, for example, the very commonly used linear regression model is introduced early, while the mathematically simpler but less commonly used mean and standard deviation are introduced later as logical derivatives of regression. A conceptual approach is taken throughout the book, and the use of mathematics is kept to a minimum. Mathematical definitions for all of the techniques used in this book are readily available on the Internet, including on Wikipedia.

Chapters 1–4 of this book lay out the basic model that underlies nearly all contemporary social statistics: the linear regression model. The linear regression model expresses a dependent variable (Y) as a linear function of an independent variable (X) plus error. Linear regression is especially important in the social sciences because nearly all advanced models used in the social science literature are elaborations of the simple linear regression model. Even univariate descriptive statistics like the mean and standard deviation can be understood as deriving from the special degenerate case of linear regression with no independent variables. Though regression is difficult to explain mathematically to students who have no knowledge of matrix algebra, it is very easy to explain graphically. Thus this book takes a graphical approach to understanding linear regression and, by extension, univariate statistics.
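The link between regression and univariate statistics can be illustrated with a short Python sketch. It is not taken from the book: the function names `fit_line` and `fit_constant` and the data values are hypothetical, chosen only for illustration. The first function fits Y as a linear function of X by least squares; the second fits the degenerate model with no independent variable, where the fitted constant is the mean of Y and the spread of the errors is its standard deviation.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Least-squares intercept and slope for the model y = a + b*x + error."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def fit_constant(y):
    """Least-squares fit of y = a + error (no independent variable)."""
    # The constant that minimizes the sum of squared errors is the mean of y.
    a = sum(y) / len(y)
    errors = [yi - a for yi in y]
    # Spread of the errors, using n - 1 degrees of freedom.
    error_sd = (sum(e ** 2 for e in errors) / (len(y) - 1)) ** 0.5
    return a, error_sd

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(x, y)
constant, error_sd = fit_constant(y)

print("regression line:", intercept, slope)
print("constant-only model:", constant, error_sd)
print("mean and standard deviation of y:", mean(y), stdev(y))
```

The last two printed lines should agree up to rounding, which is the sense in which the mean and standard deviation are a special case of regression.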

Chapters 5–8 of this book expand the simple linear regression model with a single independent variable into the multiple linear regression model with multiple independent variables. Multiple linear regression is introduced as a method for including statistical controls in social science settings where it is impossible to impose experimental controls. Thus multiple regression is tied to the problem of inference in statistical models: the desire to infer something about the world outside the data being analyzed. A key tool for statistical inference is the t statistic, to which an entire chapter is devoted. The t statistic is used in determining the statistical significance of a given regression coefficient. In the multiple regression framework t statistics can also be used to compare the relative significance of different independent variables. Another tool for these comparisons, the standardized regression coefficient, is also given its own chapter.
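As a rough sketch of the two tools named above, the following Python code computes a t statistic and a standardized coefficient for a regression slope. To stay short it uses a single independent variable rather than the multiple regression setting of these chapters, and it uses the conventional textbook formulas (the coefficient divided by its standard error, and the coefficient rescaled by the standard deviations of X and Y). The function name `slope_t_and_beta` and the data are hypothetical.

```python
from statistics import mean, stdev

def slope_t_and_beta(x, y):
    """Slope, standard error, t statistic, and standardized slope
    for the single-variable model y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Error variance uses n - 2 degrees of freedom (two estimated parameters).
    errors = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    error_var = sum(e ** 2 for e in errors) / (n - 2)

    std_error = (error_var / sxx) ** 0.5
    t_stat = slope / std_error

    # Standardized coefficient: the slope in standard-deviation units.
    beta = slope * stdev(x) / stdev(y)
    return slope, std_error, t_stat, beta

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

slope, std_error, t_stat, beta = slope_t_and_beta(x, y)
print("slope:", slope)
print("standard error:", std_error)
print("t statistic:", t_stat)
print("standardized coefficient:", beta)
```

With one independent variable the standardized coefficient equals the correlation between X and Y; with several independent variables it generally does not, which is why the comparison is only meaningful within a given model.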

Chapters 9–11 of this book focus on different model configurations that can be estimated using the same basic tool of multiple linear regression. These chapters apply the statistics introduced in Chapters 1–8 to specific, very commonly used types of regression models, including ANOVA and interaction models. Future chapters of this book may introduce other model types as well.
