Social Research Methods/Statistical Analysis

Introduction

• Statistics is the applied branch of mathematics especially appropriate for a variety of research analyses.

Descriptive Statistics

• Descriptive statistics are used to summarize data under study. Some descriptive statistics summarize the distribution of attributes on a single variable; others summarize the associations between variables.

• Descriptive statistics summarizing the relationships between variables are called measures of association.

• Many measures of association are based on a proportionate reduction of error (PRE) model. This model compares 1. the number of errors we would make in guessing the attribute of a given variable for each case under study if we knew nothing but the distribution of attributes on that variable, and 2. the number of errors we would make if we knew the joint distribution overall and were told, for each case, the attribute of one variable each time we were asked to guess the attribute of the other. These measures include lambda (λ), which is appropriate for the analysis of two nominal variables; gamma (γ), which is appropriate for the analysis of two ordinal variables; and Pearson's product-moment correlation (r), which is appropriate for the analysis of two interval or ratio variables (strictly, its square, r², is the PRE measure).

• Regression analysis represents the relationships between variables in the form of equations, which can be used to predict the values of a dependent variable on the basis of values of one or more independent variables.

• Regression equations are computed on the basis of a regression line: the geometric line representing, with the least amount of discrepancy, the actual location of points in a scattergram.

• Types of regression analysis include linear regression analysis, multiple regression analysis, partial regression analysis, and curvilinear regression analysis.
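The PRE logic and the regression line described above can be made concrete with a short calculation. The sketch below uses Python with only the standard library (an assumption; the summary names no software), and the function names and the small data sets are hypothetical, invented for illustration. Lambda is computed in one direction, predicting the second variable from the first, as (E1 - E2) / E1, where E1 is the number of errors made by guessing the overall modal attribute for every case and E2 is the number made by guessing the modal attribute within each category of the other variable. The regression line is the ordinary least-squares line Y = a + bX.

```python
from collections import Counter


def goodman_kruskal_lambda(pairs):
    """PRE measure for two nominal variables: how much knowing x (first item
    of each pair) reduces errors in guessing y (second item)."""
    # E1: errors made by always guessing the overall modal attribute of y.
    y_counts = Counter(y for _, y in pairs)
    e1 = len(pairs) - max(y_counts.values())
    if e1 == 0:
        raise ValueError("y has only one attribute; lambda is undefined")
    # E2: errors made by guessing the modal attribute of y within each attribute of x.
    y_given_x = {}
    for x, y in pairs:
        y_given_x.setdefault(x, Counter())[y] += 1
    e2 = sum(sum(c.values()) - max(c.values()) for c in y_given_x.values())
    # Proportion of the original errors eliminated by knowing x.
    return (e1 - e2) / e1


def least_squares_line(xs, ys):
    """Intercept a and slope b of the regression line Y = a + bX that
    minimizes the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two interval or ratio variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical table: 100 respondents, sex by opinion.
pairs = ([("men", "approve")] * 10 + [("men", "disapprove")] * 40
         + [("women", "approve")] * 35 + [("women", "disapprove")] * 15)
print(round(goodman_kruskal_lambda(pairs), 3))  # 0.444

# Hypothetical interval-level scores for five cases.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
r = pearson_r(xs, ys)
print(round(a, 2), round(b, 2), round(r ** 2, 2))  # 2.2 0.6 0.6
```

With the hypothetical table, knowing sex reduces errors in guessing opinion from 45 to 25, a 44 percent reduction. For the five hypothetical cases, r² = 0.6 means that knowing X reduces the squared errors in predicting Y by 60 percent.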

Inferential Statistics

• Inferential statistics are used to estimate how far findings arrived at through the analysis of a sample can be generalized to the larger population from which the sample was selected. Some inferential statistics estimate the single-variable characteristics of the population; others (tests of statistical significance) estimate the relationships between variables in the population.

• Inferences about some characteristic of a population must indicate a confidence interval and a confidence level. Computations of confidence levels and intervals are based on probability theory and assume that conventional probability-sampling techniques have been employed in the study.

• Inferences about the generalizability, to a population, of the associations discovered between variables in a sample involve tests of statistical significance, which estimate the likelihood that an association as large as the observed one could result from normal sampling error if no such association exists between the variables in the larger population. Tests of statistical significance are also based on probability theory and assume that conventional probability-sampling techniques have been employed in the study.

• The level of significance of an observed association is reported in the form of the probability that the association could have been produced merely by sampling error. To say that an association is significant at the .05 level is to say that an association as large as the observed one could not be expected to result from sampling error more than 5 times out of 100.

• Social researchers tend to use a particular set of levels of significance in connection with tests of statistical significance: .05, .01 and .001. This is merely a convention, however.

• A frequently used test of statistical significance in tabular data is chi-square.

• The t-test is a frequently used test of statistical significance for comparing means.

• Statistical significance must not be confused with substantive significance, the latter meaning that an observed association is strong, important, meaningful, or worth writing home to your mother about.

• Tests of statistical significance, strictly speaking, make assumptions about data and methods that are almost never satisfied completely by real social research. Despite this, the tests can serve a useful function in the analysis and interpretation of data.
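A minimal sketch of two of the inferential calculations above, assuming simple random sampling: a 95 percent confidence interval for a population proportion (using the normal approximation, with z = 1.96), and a chi-square test on a contingency table. The function names and data are hypothetical. The cutoff 3.841 is the standard tabled chi-square value for one degree of freedom at the .05 level; in real work a statistics package would report the exact probability instead.

```python
import math

# Critical chi-square value for df = 1 at the .05 level (standard tables).
CHI2_CRITICAL_DF1_05 = 3.841


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a population proportion.
    z = 1.96 corresponds to a 95 percent confidence level."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of
    observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were unrelated in the population.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical: 55 of 100 sampled respondents approve.
low, high = proportion_confidence_interval(55, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")

# Hypothetical sex-by-opinion table (rows: men, women; columns: approve, disapprove).
observed = [[10, 40], [35, 15]]
stat, df = chi_square(observed)
print(f"chi-square = {stat:.2f}, df = {df}")
print("significant at .05:", stat > CHI2_CRITICAL_DF1_05)
```

Here the chi-square value of about 25.25 far exceeds 3.841, so an association this large would be expected from sampling error far fewer than 5 times in 100. That says nothing about whether the association is substantively important.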

Other Multivariate Techniques

• Path analysis is a method of presenting graphically the networks of causal relationships among several variables. It illustrates the primary "paths" through which independent variables cause dependent ones. Path coefficients represent the partial relationships between variables.

• Time-series analysis is an analysis of changes in a variable (such as crime rates) over time.

• Factor analysis, feasible only with a computer, is an analytic method of discovering the general dimensions represented by a collection of actual variables. These general dimensions, or factors, are calculated hypothetical dimensions that are not perfectly represented by any of the empirical variables under study but are highly associated with groups of empirical variables. A factor loading indicates the degree of association between a given empirical variable and a given factor.

• Analysis of variance (ANOVA) is based on comparing variations between and within groups and determining whether between-group differences could reasonably have occurred in simple random sampling or whether they likely represent a genuine relationship between the variables involved.

• Discriminant analysis seeks to account for variation in some dependent variable by finding a hypothetical dimension that separates the attributes of that dependent variable. It results in an equation that scores people on the basis of that hypothetical dimension and allows us to predict their values on the dependent variable.

• Log-linear models offer a method for analyzing complex relationships among several nominal variables having more than two attributes each.

• Geographic Information Systems (GIS) map quantitative data that describe geographic units for a graphic display.
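The ANOVA comparison of between-group and within-group variation can be sketched as a one-way F ratio. This is a minimal illustration in standard-library Python; the function name and the three groups of scores are hypothetical. A large F means the group means differ more than the scatter within groups would lead us to expect from sampling alone; the F value would then be compared with the F distribution (normally via a statistics package) to obtain a significance level.

```python
def one_way_anova(groups):
    """Return the F ratio and its degrees of freedom for a list of groups,
    each a list of scores on the dependent variable."""
    scores = [v for g in groups for v in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: how far each score lies from its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three groups defined by an independent variable.
groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
f_ratio, df_b, df_w = one_way_anova(groups)
print(f_ratio, df_b, df_w)  # 27.0 2 6
```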

Key Terms for Understanding Statistical Analysis

• Analysis of variance (ANOVA): Method of analysis in which cases under study are combined into groups representing an independent variable, and the extent to which the groups differ from one another is analyzed in terms of some dependent variable. Then, the extent to which the groups differ is compared with the standard of random distribution.

• Curvilinear regression analysis: A form of regression analysis that allows relationships among variables to be expressed with curved geometric lines instead of straight ones.

• Descriptive statistics: Statistical computations describing either the characteristics of a sample or the relationships among variables in a sample. Descriptive statistics merely summarize a set of sample observations, whereas inferential statistics move beyond the description of specific observations to make inferences about the larger population from which the sample observations were drawn.

• Discriminant analysis: Method of analysis similar to multiple regression, except that the dependent variable can be nominal.

• Factor analysis: A complex algebraic method for determining the general dimensions or factors that exist within a set of concrete observations.

• Geographic Information Systems (GIS): Analytic technique in which researchers map quantitative data that describe geographic units in a graphic display.

• Inferential statistics: The body of statistical computations relevant to making inferences from findings based on sample observations to some larger population.

• Level of significance: In the context of tests of statistical significance, the degree of likelihood that an observed, empirical relationship could be attributable to sampling error. A relationship is significant at the .05 level if the likelihood of its being only a function of sampling error is no greater than 5 out of 100.

• Linear regression analysis: A form of statistical analysis that seeks the equation for the straight line that best describes the relationship between two ratio variables.

• Log-linear analysis: Data-analysis technique based on specifying models that describe the interrelationships among variables and then comparing expected and observed table-cell frequencies.

• Multiple regression analysis: A form of statistical analysis that seeks the equation representing the impact of two or more independent variables on a single dependent variable.

• Nonsampling error: Those imperfections of data quality that are a result of factors other than sampling error. Examples include misunderstandings of questions by respondents, erroneous recordings by interviewers and coders, and keypunch errors.

• Partial regression analysis: A form of regression analysis in which the effects of one or more variables are held constant, similar to the logic of the elaboration model.

• Path analysis: A form of multivariate analysis in which the causal relationships among variables are presented in a graphic format.

• Proportionate reduction of error (PRE): A logical model for assessing the strength of a relationship by asking how much knowing values on one variable would reduce our errors in guessing values on the other. For example, if we know how much education people have, we can improve our ability to estimate how much they earn, thus indicating there is a relationship between the two variables.

• Regression analysis: A method of data analysis in which the relationships among variables are represented in the form of an equation, called a regression equation.

• Statistical significance: A general term referring to the likelihood that relationships observed in a sample could be attributed to sampling error alone.

• Tests of statistical significance: A class of statistical computations that indicate the likelihood that the relationship observed between variables in a sample can be attributed to sampling error only.

• Time-series analysis: An analysis of changes in a variable (such as crime rates) over time.
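Multiple regression, as defined above, seeks coefficients for Y = a + b1X1 + b2X2 + ... Each b is the effect of its variable with the other independent variables held constant. The sketch below is one way to compute them, solving the least-squares normal equations (X'X)b = X'y by Gaussian elimination in standard-library Python. The function names and the data (constructed so that Y = 1 + 2X1 + 3X2 exactly) are hypothetical, and the hand-rolled solver is chosen only to keep the example dependency-free. Real analyses would use a statistics package, which also reports standard errors and significance tests.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def multiple_regression(cases, ys):
    """Least-squares coefficients [a, b1, b2, ...] for Y = a + b1*X1 + b2*X2 + ...
    `cases` holds one list of independent-variable values per case."""
    design = [[1.0] + list(case) for case in cases]  # leading 1 for the intercept a
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data built so that Y = 1 + 2*X1 + 3*X2 exactly.
cases = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
coefficients = multiple_regression(cases, ys)
print([round(c, 6) for c in coefficients])  # [1.0, 2.0, 3.0]
```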