Search Results


Results below are based on the criterion 'missing data'
Total number of records returned: 16

1
Paper
Not Asked and Not Answered: Multiple Imputation for Multiple Surveys
Gelman, Andrew
King, Gary
Liu, Chuanhai

Uploaded 10-27-1997
Keywords Bayesian inference
cluster sampling
diagnostics
hierarchical models
ignorable nonresponse
missing data
political science
sample surveys
stratified sampling
multiple imputation
Abstract We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked, or different sampling methods used, in different strata or clusters. Our method involves multiply-imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights were based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed to non-imputed data. We illustrate with the example that motivated this project --- a study of pre-election public opinion polls, in which not all the questions of interest are asked in all the surveys, so that it is infeasible to impute each survey separately.

2
Paper
Heterogeneity and Bias in Models of Vote Choice
Berinsky, Adam

Uploaded 04-21-1997
Keywords voting models
selection bias
heteroskedasticity
missing data
Abstract Voters in the United States do not behave in a homogeneous manner. Voting models typically account for such heterogeneity by seeking to decompose the process of vote choice into a number of distinct components. By examining voting choice data in this way, researchers are able to ascertain reasonable estimates of the average effect of various socio-economic and political variables on the candidate selection process. Models of this sort, while plausible, may not properly reflect the true heterogeneity of the American voter. At their core, simple models assume that voters use a common and uniform decision rule when deciding between candidates. But it is possible, if not likely, that different groups and classes of citizens use differently structured processes to determine their choice of candidates. Researchers have attempted to account for this heterogeneity in a variety of ways. Rivers (1988) and Jackson (1992), for example, have accounted for differences in the voting behavior of individuals by allowing the mean effect of theoretically important variables to vary across individuals. While these approaches are extremely promising, in this paper I will take a different approach and examine three more subtle forms of heterogeneity in the vote choice process: (1) heterogeneity induced by non-random selection from the full population of citizens into the vote choice model sample; (2) heterogeneity due to the interaction of selection bias and non-constant variance; and (3) heterogeneity in the patterns of missing data across groups of the respondents. While much of the discussion in the paper is focused on the first two forms of heterogeneity, it is the third form of heterogeneity - one not typically addressed in the political science literature - that is the most important determinant of the degree of bias in vote choice models. Thus, heterogeneity within the sample of respondents affects the vote choice model estimates, just not in the way I originally envisioned.
It is not just heterogeneity in the variance term, or in the selection into the vote choice process that poses a threat to accurate estimates of the power of the predictors in our vote choice models. Rather, it is the failure to preserve or account for the heterogeneity of the paths by which people answer survey questions that is the real bogeyman of vote choice models.

3
Paper
Attributing Effects to A Cluster Randomized Get-Out-The-Vote Campaign: An Application of Randomization Inference Using Full Matching
Bowers, Jake
Hansen, Ben

Uploaded 07-18-2005
Keywords causal inference
randomization inference
attributable effects
full matching
instrumental variables
missing data
field experiments
clustering
Abstract Statistical analysis requires a probability model: commonly, a model for the dependence of outcomes $Y$ on confounders $X$ and a potentially causal variable $Z$. When the goal of the analysis is to infer $Z$'s effects on $Y$, this requirement introduces an element of circularity: in order to decide how $Z$ affects $Y$, the analyst first determines, speculatively, the manner of $Y$'s dependence on $Z$ and other variables. This paper takes a statistical perspective that avoids such circles, permitting analysis of $Z$'s effects on $Y$ even as the statistician remains entirely agnostic about the conditional distribution of $Y$ given $X$ and $Z$, or perhaps even denies that such a distribution exists. Our assumptions instead pertain to the conditional distribution $Z \vert X$, and the role of speculation in settling them is reduced by the existence of random assignment of $Z$ in a field experiment as well as by poststratification, testing for overt bias before accepting a poststratification, and optimal full matching. Such beginnings pave the way for ``randomization inference'', an approach which, despite a long history in the analysis of designed experiments, is relatively new to political science and to other fields in which experimental data are rarely available. The approach applies to both experiments and observational studies. We illustrate this by applying it to analyze A. Gerber and D. Green's New Haven Vote 98 campaign. Conceived as both a get-out-the-vote campaign and a field experiment in political participation, the study assigned households to treatment and desired to estimate the effect of treatment on the individuals nested within the households.
We estimate the number of voters who would not have voted had the campaign not prompted them to --- that is, the total number of votes attributable to the interventions of the campaigners --- while taking into account the non-independence of observations within households, non-random compliance, and missing responses. Both our statistical inferences about these attributable effects and the stratification and matching that precede them rely on quite recent developments from statistics; our matching, in particular, has novel features of potentially wide applicability. Our broad findings resemble those of the original analysis by Gerber and Green (2000).

4
Paper
Diagnostics for multivariate imputation
Abayomi, Kobi
Gelman, Andrew
Levy, Marc

Uploaded 08-16-2005
Keywords missing data
multiple imputation
regression diagnostics
Abstract We consider three sorts of diagnostics for random imputations: (a) displays of the completed data, intended to reveal unusual patterns that might suggest problems with the imputations, (b) comparisons of the distributions of observed and imputed data values, and (c) checks of the fit of observed data to the model used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation [Van Buuren and Oudshoorn 2000, and Raghunathan, Van Hoewyk, and Solenberger 2001], an iterative procedure in which the missing values of each variable are randomly imputed conditional on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 Environmental Sustainability Index (ESI), a linear aggregation of 68 environmental variables on 142 countries, with 22% missing values.

5
Paper
What to do About Missing Values in Time Series Cross-Section Data
King, Gary
Honaker, James

Uploaded 07-14-2006
Keywords Missing data
multiple imputation
EM
IP
EMis
time series
cross-section
Abstract Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in these fields have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we build nonignorable missingness models by enabling analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, since these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments made it possible for us to implement our methods in new open source software which, unlike all existing multiple imputation packages, virtually never crashes.

6
Paper
Matching for Causal Inference Without Balance Checking
Iacus, Stefano
King, Gary
Porro, Giuseppe

Uploaded 06-26-2008
Keywords Matching
causal inference
observational data
missing data

Abstract We address a major discrepancy in matching methods for causal inference in observational data. Since these data are typically plentiful, the goal of matching is to reduce bias and only secondarily to keep variance low. However, most matching methods seem designed for the opposite problem, guaranteeing sample size ex ante but limiting bias by controlling for covariates through reductions in the imbalance between treated and control groups only ex post and only sometimes. (The resulting practical difficulty may explain why many published applications do not check whether imbalance was reduced and so may not even be decreasing bias.) We introduce a new class of "Monotonic Imbalance Bounding" (MIB) matching methods that enables one to choose a fixed level of maximum imbalance, or to reduce maximum imbalance for one variable without changing it for the others. We then discuss a specific MIB method called "Coarsened Exact Matching" (CEM) which, unlike most existing approaches, also explicitly bounds through ex ante user choice both the degree of model dependence and the causal effect estimation error, eliminates the need for a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, works well with modern methods of imputation for missing data, is computationally efficient even with massive data sets, and is easy to understand and use. This method can improve causal inferences in a wide range of applications, and may be preferred for simplicity of use even when it is possible to design superior methods for particular problems. We also make available open source software which implements all our suggestions.

7
Paper
Multiple Overimputation: A Unified Approach to Measurement Error and Missing Data
Blackwell, Matthew
Honaker, James
King, Gary

Uploaded 07-19-2010
Keywords measurement error
missing data
multiple imputation
EM
overimputation
Abstract Social scientists typically devote considerable effort to reducing measurement error during data collection and then ignore the issue during data analysis. Although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative that generalizes the popular multiple imputation (MI) framework by treating missing data problems as a special case of extreme measurement error and correcting for both. Like MI, the proposed "multiple overimputation" (MO) framework is a simple two-step procedure. First, multiple (around 5) completed copies of the data set are created where cells measured without error are held constant, those missing are imputed from the distribution of predicted values, and cells (or entire variables) with measurement error are ``overimputed,'' that is imputed from a predictive distribution with observation-level priors defined by the mismeasured values and available external information, if any. In the second step, analysts can then run whatever statistical method they would have run on each of the overimputed data sets as if there had been no missingness or measurement error; the results are then combined via a simple procedure. We also offer open source software that implements all the methods described herein.
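
Editorial note: the abstract above says only that results from the completed data sets are "combined via a simple procedure" and does not spell the procedure out. The sketch below assumes the standard multiple-imputation combining rules (Rubin's rules), in which the pooled estimate is the mean of the per-data-set estimates and the pooled variance is the average within-imputation variance plus an inflated between-imputation variance. The function name `combine_estimates` and the example numbers are hypothetical and are not taken from the paper or its software.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Pool one parameter across m completed data sets (Rubin's rules, assumed).

    estimates: point estimates, one per completed data set.
    variances: squared standard errors, one per completed data set.
    Returns (pooled_estimate, pooled_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and one variance per estimate")
    pooled = mean(estimates)          # average of the m point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from five completed data sets.
pooled, total = combine_estimates(
    [0.42, 0.47, 0.39, 0.45, 0.44],
    [0.010, 0.011, 0.009, 0.010, 0.012],
)
print(pooled, total ** 0.5)  # pooled estimate and its standard error
```

The same call would be repeated for each coefficient of whatever model the analyst fits to the completed data sets.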

8
Paper
We Have to Be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data
Cranmer, Skyler
Gill, Jeff

Uploaded 04-30-2012
Keywords missing data
categorical
hot-decking
MCAR
multiple imputation
MAR
GLM
regression
missingness
Abstract Missing values are a frequent problem in empirical political science research. Surprisingly, there has been little attention to the match between the measurement of the missing values and the correcting algorithms used. While multiple imputation is a vast improvement over the deletion of cases with missing values, it is often ill suited for imputing highly non-granular discrete data. We develop a simple technique for imputing missing values in such situations, which is a variant of hot deck imputation, drawing from the conditional distribution of the variable with missing values to preserve the discrete measure of the variable. This method is tested against existing techniques using Monte Carlo analysis and then applied to real data on democratisation and modernisation theory. We provide software for our imputation technique in a free and easy-to-use package for the R statistical environment.
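
Editorial note: the following is a minimal sketch of a generic hot deck imputation, not the authors' variant or their R package. It assumes that "drawing from the conditional distribution" can be illustrated by sampling a donor value from observed cases that share the same values on a set of conditioning variables, so every imputed value is one of the discrete values actually observed. The function name `hot_deck_impute`, the `None` marker for missing cells, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(rows, target, by, rng=None):
    """Fill missing `target` values by sampling from observed donors.

    rows:   list of dicts; a missing cell is represented by None (assumption).
    target: name of the variable to impute.
    by:     names of fully observed conditioning variables that define donor cells.
    Returns new row dicts; the input rows are not modified.
    """
    rng = rng or random.Random(0)
    donors = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            donors[tuple(row[k] for k in by)].append(row[target])

    completed = []
    for row in rows:
        new_row = dict(row)
        if new_row[target] is None:
            pool = donors.get(tuple(row[k] for k in by))
            if pool:  # leave the cell missing when no donor shares the cell
                new_row[target] = rng.choice(pool)
        completed.append(new_row)
    return completed


# Hypothetical survey records with a discrete outcome.
records = [
    {"region": "north", "vote": "A"},
    {"region": "north", "vote": None},
    {"region": "south", "vote": "B"},
    {"region": "south", "vote": None},
]
print(hot_deck_impute(records, "vote", ["region"]))
```

Running the function several times with different random generators would yield several completed data sets, which is one way such a draw could be embedded in a multiple-imputation workflow.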

9
Paper
Parameterization and Bayesian Modeling
Gelman, Andrew

Uploaded 06-15-2004
Keywords censored data
data augmentation
Gibbs sampler
hierarchical model
missing data imputation
parameter expansion
prior distribution
truncated data
Abstract Progress in statistical computation often leads to advances in statistical modeling. For example, it is surprisingly common that an existing model is reparameterized, solely for computational purposes, but then this new configuration motivates a new family of models that is useful in applied statistics. One reason this phenomenon may not have been noticed in statistics is that reparameterizations do not change the likelihood. In a Bayesian framework, however, a transformation of parameters typically suggests a new family of prior distributions. We discuss examples in censored and truncated data, mixture modeling, multivariate imputation, stochastic processes, and multilevel models.

10
Paper
Is There a Gender Gap in Fiscal Political Preferences?
Alvarez, R. Michael
McCaffery, Edward J.

Uploaded 08-12-2000
Keywords Gender gap
fiscal politics
taxation
budget surplus
multinomial logit
missing data
imputation
framing
survey experiments
Abstract This paper examines the relationship between attitudes on potential uses of the budget surplus and gender. Survey results show relatively weak support overall for using a projected surplus to reduce taxes, with respondents much likelier to prefer increased social spending on education or social security. There is a significant gender gap with men being far more likely than women to support tax cuts or paying down the national debt. Given a menu of particular types of tax cuts, women are marginally more likely to favor child-care relief or working poor tax credits whereas men are marginally more likely to favor capital gains reduction or tax rate cuts. When primed that the tax laws are biased against two-worker families, men significantly change their preferences, moving from support for general tax rate cuts to support for working poor tax relief, but not to child-care relief. One of the strongest results to emerge is that women are far more likely than men not to express an opinion or to confess ignorance about fiscal matters. Both genders increase their ``no opinion'' answer in the face of priming, but men more so than women. Further research will explore this no opinion/uncertainty aspect.

11
Paper
The Political Economy of Non-Tariff Trade Barriers: A Test of the Veto Players Theory of Policy Change
Kotin, Daniel

Uploaded 07-13-1999
Keywords cross-sectional time-series
missing data
Abstract This paper tests George Tsebelis's (1995) veto players model of policy stability, as applied to international trade policy. The veto players model argues that policy change is more difficult with the number of political actors that can veto such change, their ideological polarization, and (for collective veto players) their cohesion. I test this model's ability to predict the variation in non-tariff barriers (NTBs) to international trade for 16 industrial democracies, over the period 1981-94. Such barriers may be becoming increasingly attractive to states seeking to maintain trade protection in the face of secular declines in tariff rates. In a regression model controlling for economic factors and other domestic political influences on NTBs, such as politicians' trade policy preferences, minority government, and constituency pressures, I find support for Tsebelis's theory: Governments in the sample that are more polarized on the trade policy dimension are less able to change NTB policy. This finding holds despite the presence of a significant amount of missing data on the dependent variable, which consists of first differences taken across missing years, according to an alternative model in which the missing NTB levels are imputed via interpolation, and from which the first differences are then computed. Although more NTB data is needed to verify them, these preliminary results add to a growing body of literature finding empirical support for the veto players theory.
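
Editorial note: the abstract above mentions an alternative model in which missing NTB levels are imputed via interpolation before first differences are computed, but it does not say which interpolation is used. The sketch below assumes simple linear interpolation between the nearest observed years and leaves leading or trailing gaps missing. The function names and the example series are hypothetical.

```python
def interpolate_linear(values):
    """Fill interior gaps (None) by linear interpolation between observed neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def first_differences(values):
    """Year-on-year changes; a difference is None if either endpoint is missing."""
    return [
        None if a is None or b is None else b - a
        for a, b in zip(values, values[1:])
    ]


# Hypothetical annual series of levels with two missing years.
levels = [10.0, None, None, 16.0, 15.0]
filled = interpolate_linear(levels)
print(filled)
print(first_differences(filled))
```

Note that interpolated years produce equal first differences within each gap, which is one reason such a model serves as a robustness check rather than a substitute for observed data.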

12
Paper
Listwise Deletion is Evil: What to Do About Missing Data in Political Science
King, Gary
Honaker, James
Joseph, Anne
Scheve, Kenneth

Uploaded 07-13-1998
Keywords missing data
imputation
IP
EM
EMs
EMis
data augmentation
MCMC
importance sampling
item nonresponse
Abstract We address a substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of ``multiple imputation,'' but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. This discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (indeed, even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 20 and 100 times faster than the leading method recommended in the statistics literature and is very easy to use. We also quantify the considerable risks of current political science missing data practices, illustrate how to use the new procedure, and demonstrate the advantages of our approach to multiple imputation through simulated data as well as via replications of existing research.

13
Paper
Listwise Deletion is Evil: What to Do About Missing Data in Political Science (revised)
King, Gary
Honaker, James
Joseph, Anne
Scheve, Kenneth

Uploaded 08-19-1998
Keywords missing data
imputation
IP
EM
EMs
EMis
data augmentation
MCMC
importance sampling
item nonresponse
Abstract We propose a remedy to the substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of ``multiple imputation,'' but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. As it turns out, this discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 65 and 726 times faster than the leading method recommended in the statistics literature and is very easy to use. We also quantify the considerable risks of current political science missing data practices, illustrate how to use the new procedure, and demonstrate the advantages of our approach to multiple imputation through simulated data as well as via replications of existing research. We also offer easy-to-use public domain software that implements our approach.

14
Poster
Weighted Estimation for Analyses with Missing Data
Samii, Cyrus

Uploaded 07-21-2010
Keywords missing data
doubly robust
inverse probability weighting
semi-parametric
post-treatment
regression
sample selection
Abstract Missing data plague data analyses in political science. The recent applied statistics literature reflects renewed interest in weighting methods for missing data problems. Three properties are stressed in this literature: (i) robustness, (ii) the ability to use post-treatment information in causal analysis, and (iii) methods to gain efficiency. I present these results, hoping to show the potential in using refashioned weighting methods for political science research.

15
Poster
Effects of Interviewer Gender and Hijab on Gender-Related Survey Responses: Findings from a Nationally-Representative Field Experiment in Morocco
Benstead, Lindsay

Uploaded 11-07-2010
Keywords Gender of Interviewer Effects
Response Effects
Interviewer Religious Dress
Hijab
Morocco
Middle East
Missing Data
North Africa
Women's Rights
Muslim World
Abstract Despite the recent expansion of surveying in the Muslim world, few published studies have addressed methodological questions, including how observable interviewer characteristics affect responses and data quality. Although there are a limited number of studies on interviewer dress effects, none examine interviewer gender. This study asks whether and why gender and religious dress affect responses to gender-related questions. Drawing upon original data from a nationally-representative, partially-randomized survey of 800 Moroccans conducted in 2007, the study finds strong evidence that gender and dress affect responses and item non-response. The paper argues that because hijab implies multiple personal, religious, and political dimensions of identity nested within gender identity, interviewer gender and dress must be considered as intersecting categories. For questions pertaining to women’s role in the public sphere, responses were affected by interviewer dress; respondents reported more progressive attitudes and were more likely to refuse to respond to female interviewers not wearing hijab than to veiled female interviewers and male interviewers. For support for gender equality in family law, results were affected by interviewer gender; respondents reported more liberal views and were more likely to fail to respond to female interviewers with both dress styles than male interviewers. Interviewer characteristics affected responses to more than half of the 174 questions included in the survey, including support for democracy and religiosity. Researchers conducting surveys should code and control for interviewer characteristics in order to reduce total survey error and better understand the social processes which generate public opinion in this crucial region.

16
Poster
Bounds for Logistic Regression Coefficients with Nonignorable Missing Outcomes
Kenkel, Brenton

Uploaded 07-27-2011
Keywords partial identification
bounds
missing data
measurement error
Abstract I develop a new method to estimate logistic regression coefficients when there is nonignorable missingness or measurement error in the outcome variable. The estimator finds the set of all coefficient vectors that could be obtained under any assumption about the missing outcomes.

