Social Statistics: Key Terms

For convenience, this is a list of the key terms from each chapter.

Chapter 1

Conceptualization is the process of developing a theory about some aspect of the social world.
Cases are the individuals or entities about which data have been collected.
Databases are arrangements of data into variables and cases.
Dependent variables are variables that are thought to depend on other variables in a model.
Generalization is the act of turning theories about specific situations into theories that apply to many situations.
Independent variables are variables that are thought to cause the dependent variables in a model.
Metadata are additional attributes of cases that are not meant to be included in analyses.
Operationalization is the process of turning a social theory into specific hypotheses about real data.
Scatter plots are very simple statistical models that depict data on a graph.
Statistical models are mathematical simplifications of the real world.
Variables are analytically meaningful attributes of cases.

Chapter 2

Expected values are the values that a dependent variable would be expected to have based solely on values of the independent variable.
Linear regression models are statistical models in which expected values of the dependent variable are thought to rise or fall in a straight line according to values of the independent variable.
Outliers are data points in a statistical model that are far away from most of the other data points.
Regression error is the degree to which an expected value of a dependent variable in a linear regression model differs from its actual value.
Robustness is the extent to which statistical models give similar results despite changes in operationalization.
Slope is the change in the expected value of the dependent variable divided by the change in the value of the independent variable.
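
The Chapter 2 terms can be illustrated with a minimal Python sketch. The data are made up, the function name fit_line is hypothetical, and ordinary least squares is assumed as the fitting method, since this list does not say how the line is chosen.

```python
# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how the two variables vary together, divided by the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(x, y)

# Expected values: what the line says y should be for each observed x.
expected = [intercept + slope * xi for xi in x]

# Regression error: actual value minus expected value, case by case.
errors = [yi - ei for yi, ei in zip(y, expected)]
```

With these numbers the slope is 0.6, so the expected value of y rises by 0.6 for every one-unit increase in x. A case with an unusually large error would be a candidate outlier.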

Chapter 3

Extrapolation is the process of using a regression model to compute predicted values outside the range of the observed data.
Intercepts are the places where regression lines cross the dependent variable axis in a scatter plot.
Interpolation is the process of using a regression model to compute predicted values inside the range of the observed data.
Predicted values are expected values of a dependent variable that correspond to selected values of the independent variable.
Regression coefficients are the slopes and intercepts that define regression lines.
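
As a rough sketch of the Chapter 3 terms, the Python below computes predicted values from a pair of regression coefficients and labels each prediction by whether the chosen value of the independent variable falls inside the observed range. The coefficients, the observed values, and the function names predict and kind_of_prediction are all hypothetical.

```python
# Hypothetical regression coefficients and observed data (assumed for illustration).
intercept = 2.2
slope = 0.6
observed_x = [1.0, 2.0, 3.0, 4.0, 5.0]

def predict(x_value):
    """Predicted value of the dependent variable for a selected x."""
    return intercept + slope * x_value

def kind_of_prediction(x_value):
    """Label a prediction by whether x lies within the observed range."""
    if min(observed_x) <= x_value <= max(observed_x):
        return "interpolation"
    return "extrapolation"

for x_value in (3.5, 8.0):
    print(x_value, predict(x_value), kind_of_prediction(x_value))
```

Note that predict(0.0) returns the intercept itself, which is the point where the regression line crosses the dependent variable axis.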

Chapter 4

Conditional means are the expected values of dependent variables for specific groups of cases.
Degrees of freedom are the number of errors in a model that are actually free to vary.
Mean models are very simple statistical models in which a variable has just one expected value, its mean.
Means are the expected values of variables.
Parameters are the figures associated with statistical models, like means and regression coefficients.
Regression error standard deviation is a measure of the amount of spread in the error in a regression model.
Standard deviation is a measure of the amount of spread in a variable, which is the same thing as the amount of spread in the error in a mean model.
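
The link between mean models, errors, and standard deviation can be sketched in a few lines of Python. The data are made up, and dividing by n − 1 is the usual sample convention, assumed here because this list does not give a formula.

```python
import statistics

# Hypothetical values of a single variable.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean model: every case has the same expected value, the mean.
mean = statistics.mean(values)
errors = [v - mean for v in values]

# The errors always sum to zero, so once n - 1 of them are known the last
# one is fixed. Only n - 1 errors are free to vary.
degrees_of_freedom = len(values) - 1

# Standard deviation as the spread of the errors in the mean model
# (assumed convention: divide by the degrees of freedom).
standard_deviation = (sum(e ** 2 for e in errors) / degrees_of_freedom) ** 0.5

print(mean, degrees_of_freedom, standard_deviation)
```

Under that assumption the result matches statistics.stdev(values), the sample standard deviation in Python's standard library.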

Chapter 5

Case-specific error is error resulting from any of the millions of influences and experiences that may cause a specific case to have a value that is different from its expected value.
Descriptive statistics is the use of statistics to describe the data we actually have in hand.
Inferential statistics is the use of statistics to make conclusions about characteristics of the real world underlying our data.
Measurement error is error resulting from accidents, mistakes, or misunderstandings in the measurement of a variable.
Observed parameters are the actually observed values of parameters like means, intercepts, and slopes based on the data we actually have in hand.
Sampling error is error resulting from the random chance of which research subjects are included in a sample.
Standard error is a measure of the amount of error associated with an observed parameter.
True parameters are the true values of parameters like means, intercepts, and slopes based on the real (but unobserved) characteristics of the world.
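
Sampling error and standard error can be illustrated with a small simulation. This is only a sketch: the population of 100 values stands in for the "real world", the sample size and number of samples are arbitrary choices, and the formula used for comparison is the conventional standard error of a mean, which this list does not state.

```python
import random
import statistics

# Hypothetical population; its mean plays the role of the true parameter.
population = list(range(1, 101))
true_mean = statistics.mean(population)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 25

# Draw many random samples and record the observed mean of each one.
observed_means = []
for _ in range(2000):
    sample = rng.choices(population, k=sample_size)  # sampling with replacement
    observed_means.append(statistics.mean(sample))

# Spread of the observed means across samples: an empirical standard error.
empirical_standard_error = statistics.pstdev(observed_means)

# Conventional formula: population standard deviation / sqrt(sample size).
formula_standard_error = statistics.pstdev(population) / sample_size ** 0.5

print(true_mean, empirical_standard_error, formula_standard_error)
```

Each observed mean differs from the true mean only because of which cases happened to be drawn, which is sampling error. The two standard error figures should come out close to each other.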

Chapter 6

Paired samples are databases in which each case represents two linked observations.
Statistical significance is when a statistical result is so large that it is unlikely to have occurred just by chance.
Substantive significance is when a statistical result is large enough to be meaningful in the view of the researcher and society at large.
t statistics are measures based on observed parameters that are used to make specific inferences about the probabilities of true parameters.
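
One common way to compute a t statistic for paired samples is sketched below. The before and after scores are made up, and the formula (mean difference divided by its standard error) is the conventional one, assumed here because this list does not spell out a formula.

```python
import statistics

# Hypothetical paired sample: each case has two linked observations.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 14.0, 10.0, 13.0, 14.0]

# Work with the difference within each pair.
differences = [a - b for a, b in zip(after, before)]

# Observed parameter: the mean difference.
mean_difference = statistics.mean(differences)

# Standard error of the mean difference (assumed conventional formula).
standard_error = statistics.stdev(differences) / len(differences) ** 0.5

# t statistic: how many standard errors the observed mean lies from zero.
t_statistic = mean_difference / standard_error

print(mean_difference, standard_error, t_statistic)
```

Judging statistical significance would then require comparing the t statistic against a t distribution with n − 1 degrees of freedom, a step not shown here. Whether a mean difference of 1.6 is substantively significant is a separate question that the calculation cannot answer.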

Chapter 7

Complementary controls are control variables that complement an independent variable of interest by unmasking its explanatory power in a multiple regression model.
Competing controls are control variables that compete with an independent variable of interest by splitting its explanatory power in a multiple regression model.
Control variables are variables that are "held constant" in a multiple regression analysis in order to highlight the effect of a particular independent variable of interest.
Multicausal models are statistical models that have one dependent variable but two or more independent variables.
Multiple linear regression models are statistical models in which expected values of the dependent variable are thought to rise or fall in straight lines according to values of two or more independent variables.
Predictors are the independent variables in regression models.

Chapter 8

Correlation (r) is a measure of the strength of the relationship between two variables that runs from r = −1 (perfect negative correlation) through r = 0 (no correlation) to r = +1 (perfect positive correlation).
R² (R squared) is a measure of the proportion of the total variability in the dependent variable that is explained by a regression model.
Standardized coefficients are the coefficients of regression models that have been estimated using standardized variables.
Standardized variables are variables that have been transformed by subtracting the mean from every observed value and then dividing by the standard deviation.
Unstandardized coefficients are the coefficients of regression models that have been estimated using original unstandardized variables.
Unstandardized variables are variables that are expressed in their original units.
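
The Chapter 8 terms fit together as in the following Python sketch. The data and the function name standardize are hypothetical, and the sample standard deviation (dividing by n − 1) is an assumed convention.

```python
import statistics

# Hypothetical unstandardized variables, in their original units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def standardize(values):
    """Subtract the mean from every value, then divide by the standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

zx = standardize(x)
zy = standardize(y)

# Correlation: the average product of the standardized values
# (divided by n - 1 to match the sample standard deviation).
n = len(x)
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# With a single independent variable, R squared is the square of r.
r_squared = r ** 2

print(r, r_squared)
```

For these numbers r is about 0.77 and R² is 0.6. In a regression with one independent variable, the standardized slope coefficient equals r.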

Chapter 9

Base models are initial models that include all of the background independent variables in an analysis that are not of particular theoretical interest for a regression analysis.
Confounding variables are variables that might affect both the dependent variable and an independent variable of interest.
Explanatory models are regression models that are primarily intended to be used for evaluating different theories for explaining the differences between cases in their values of the dependent variable.
Parsimony is the virtue of using simple models that are easy to understand and interpret.
Predictive models are regression models that are primarily intended to be used for making predictions about dependent variables as outcomes.
Saturated models are final models that include all of the variables used in a series of models in an analysis.

Chapter 10

Analysis of variance (ANOVA) is a type of regression model that focuses on the proportion of the total variability in a dependent variable that is explained by a categorical variable.
ANOVA variables are the numerical variables in a regression model that together describe the effects of categorical group memberships.
Categorical variables are variables that divide cases into two or more groups.
Mixed models are regression models that include both ANOVA components and ordinary independent variables.
Numerical variables are variables that take numerical values that represent meaningful orderings of the cases from lower numbers to higher numbers.
Reference groups are the groups that are set aside in ANOVA variables and not explicitly included as variables in ANOVA models.
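
A minimal sketch of how a categorical variable might be turned into ANOVA variables follows. The region data and the function name anova_variables are hypothetical, and 0/1 coding is assumed as the numerical representation of group membership.

```python
# Hypothetical categorical variable: the region of each case.
regions = ["North", "South", "West", "South"]

def anova_variables(groups, reference):
    """Build one 0/1 variable per group, leaving out the reference group."""
    levels = sorted(set(groups) - {reference})
    return {
        level: [1 if g == level else 0 for g in groups]
        for level in levels
    }

variables = anova_variables(regions, reference="North")
print(variables)
```

Cases in the reference group ("North" here) score 0 on every ANOVA variable, which is why that group needs no variable of its own.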

Chapter 11

Interaction effects are the coefficients of the interaction variables in an interaction model.
Interaction models are regression models that allow the slopes of some variables to differ for different categorical groups.
Interaction variables are variables created by multiplying an ANOVA variable by an independent variable of interest.
Intercept effects are the coefficients of the ANOVA variables in an interaction model.
Main effects are the coefficients of the independent variable of interest in an interaction model for the reference group.
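
The pieces of an interaction model can be sketched as follows, assuming one ANOVA variable (coded 0 for the reference group and 1 for the other group) and one independent variable of interest. All coefficient values and the function name expected_value are hypothetical.

```python
# Hypothetical coefficients of an interaction model.
intercept = 10.0
main_effect = 2.0          # slope of x for the reference group
intercept_effect = 3.0     # coefficient of the ANOVA variable
interaction_effect = -0.5  # coefficient of the interaction variable

def expected_value(x, in_group):
    """Expected value of the dependent variable for one case."""
    d = 1 if in_group else 0
    interaction_variable = d * x  # ANOVA variable times independent variable
    return (intercept
            + main_effect * x
            + intercept_effect * d
            + interaction_effect * interaction_variable)

# Slope of x within each group: change in expected value per one-unit change in x.
slope_reference = expected_value(1.0, False) - expected_value(0.0, False)
slope_group = expected_value(1.0, True) - expected_value(0.0, True)

print(slope_reference, slope_group)
```

The slope for the reference group equals the main effect, while the slope for the other group equals the main effect plus the interaction effect. This is how the model allows slopes to differ between groups.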
