Statistical Methods for Detecting and Correcting Sample Selection Bias

Gary Cuddeback
College of Social Work
The University of Tennessee
128 Henson Hall
Knoxville TN 37996-3332
(865) 974-1707
FAX: (865) 974-1662
gcuddeba@utk.edu
 
Beth Wilson
College of Social Work
The University of Tennessee
128 Henson Hall
Knoxville TN 37996-3332
(865) 974-1707
FAX: (865) 974-1662
bethwilson@aol.com
 
John G. Orme
College of Social Work
The University of Tennessee
128 Henson Hall
Knoxville TN 37996-3332
(865) 974-1707
FAX: (865) 974-1662
jorme@utk.edu
 
Terri Combs-Orme
College of Social Work
The University of Tennessee
128 Henson Hall
Knoxville TN 37996-3332
(865) 974-1707
FAX: (865) 974-1662
tcombs-orme@utk.edu
Purpose: Researchers seldom realize 100% participation for any research study. If participants and non-participants are systematically different, substantive results may be biased in unknown ways, and external or internal validity may be compromised. Typically, social work researchers use bivariate tests to detect selection bias (e.g., χ² to compare the race of participants and non-participants). Occasionally, multiple regression methods are used (e.g., logistic regression with participation/non-participation as the dependent variable). Neither of these methods can be used to correct substantive results for selection bias. Rather, subjective judgments are made about the possible effects of selection bias on substantive results.
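
As a minimal sketch of the bivariate detection approach described above, the following computes a Pearson chi-square test on a 2x2 table of participation status by a two-category demographic variable. The counts are hypothetical and invented purely for illustration; they are not data from the study, and the function name `chi_square_2x2` is an assumption of this sketch. No continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, for illustration only.
# Rows: participants, non-participants; columns: two categories of a demographic variable.
counts = [[60, 40], [20, 30]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

A small p-value here would only signal that participants and non-participants differ on the variable tested; as noted above, it offers no way to adjust the substantive results.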
 
Methods: Sample selection models are a well-developed class of econometric models that can be used to detect and correct for selection bias, but they are rarely used in social work research. Data available for both participants and non-participants (e.g., demographic variables) are used to model participation/non-participation (typically using binary probit regression). Simultaneously, a substantive model (no different from the one that would otherwise be tested) is estimated and corrected for selection bias. A wide variety of statistical methods are available to estimate these substantive models (e.g., linear regression, binary and multinomial logistic regression), and so sample selection models can be used to analyze almost all types of dependent variables.
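
One common formulation of such a model, for a continuous outcome with a probit participation equation, is the Heckman-type specification sketched below. The notation is an assumption of this sketch and does not appear in the abstract: z_i denotes variables observed for participants and non-participants alike, x_i the predictors in the substantive model, and s_i the participation indicator.

```latex
\begin{align*}
s_i^{*} &= z_i'\gamma + u_i, \qquad s_i = \mathbf{1}[s_i^{*} > 0]
  && \text{(participation equation, probit)} \\
y_i &= x_i'\beta + \varepsilon_i, \qquad y_i \text{ observed only if } s_i = 1
  && \text{(substantive equation)} \\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix} \right) \\
E[\, y_i \mid x_i, z_i, s_i = 1 \,] &= x_i'\beta + \rho\sigma\,\lambda(z_i'\gamma),
  \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{align*}
```

Under these assumptions, the extra term in the last line is what a model fitted to participants alone omits. When ρ = 0 the term vanishes and the uncorrected substantive model is unbiased, so a test of ρ = 0 serves to detect selection bias, while estimating the model with the term included corrects for it.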
 
Results: This presentation will: (1) give an overview of sample selection models; (2) illustrate selected models using data from a study of 230 foster families in which there was 70% participation; (3) compare substantive results with and without the use of sample selection models; (4) discuss computer software for estimating sample selection models; and (5) direct conference participants to additional literature in this area.

Implications: Sample selection models can help further social work research by providing researchers with methods of detecting and correcting sample selection bias.