
Search Results


Results below are based on the search criterion 'data augmentation'
Total number of records returned: 7

1
Paper
A Bayesian analysis of the multinomial probit model using marginal data augmentation
Imai, Kosuke
van Dyk, David A.

Uploaded 08-21-2002
Keywords Bayesian analysis
Data augmentation
Prior distributions
Probit models
Rate of convergence
Abstract We introduce a set of new Markov chain Monte Carlo algorithms for Bayesian analysis of the multinomial probit model. Our Bayesian representation of the model places a new, and possibly improper, prior distribution directly on the identifiable parameters and thus is relatively easy to interpret and use. Our algorithms, which are based on the method of marginal data augmentation, involve only draws from standard distributions and dominate other available Bayesian methods in that they are as quick to converge as the fastest methods but with a more attractive prior specification.
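A minimal sketch of the data-augmentation idea behind this paper. This is the classic Albert-Chib Gibbs sampler for a *binary* probit, a simpler cousin of the paper's marginal data augmentation for multinomial probit; latent Gaussian utilities are drawn so that both conditional draws are standard. All names and settings are hypothetical.

```python
# Illustrative only: binary-probit Gibbs via data augmentation,
# not the paper's marginal data augmentation for multinomial probit.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Simulated data: n observations, p covariates, known true beta
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -1.0, 0.25])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)   # posterior covariance under a flat prior
beta = np.zeros(p)
draws = []
for it in range(2000):
    # Step 1: draw latent utilities z_i ~ N(x_i' beta, 1), truncated to
    # (0, inf) when y_i = 1 and (-inf, 0] when y_i = 0.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)   # standardized truncation bounds
    hi = np.where(y == 1, np.inf, -mu)
    z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    # Step 2: draw beta | z from its normal full conditional (flat prior).
    beta_hat = XtX_inv @ (X.T @ z)
    beta = rng.multivariate_normal(beta_hat, XtX_inv)
    draws.append(beta)

print(np.mean(draws[500:], axis=0))  # posterior means, roughly beta_true
```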

2
Paper
Modeling Structural Changes: Bayesian Estimation of Multiple Changepoint Models and State Space Models
Park, Jong Hee

Uploaded 07-17-2006
Keywords Multiple changepoint model
State space model
Markov chain Monte Carlo methods
Bayes factors
Data augmentation
Abstract While theoretical models in political science are inspired by structural changes in politics, most empirical methods assume stable patterns of causal relationships. Static models with constant parameters do not properly capture dynamic changes in the data and lead to incorrect parameter estimates. In this paper, I introduce two Bayesian time series models, which can detect and estimate possible structural changes in temporal data: multiple changepoint models and state space models. To emphasize the utility of the models to political scientists, I show some examples in the context of discrete dependent variables. Then, I apply these models to an important debate in international politics over U.S. use of force abroad. The findings of the multiple changepoint and state space models demonstrate that the predictors of presidential use of force have shifted dramatically.
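The paper develops multiple changepoint and state space models for discrete data; as a toy illustration of the underlying MCMC logic only, here is a Gibbs sampler for a *single* changepoint in Poisson counts. Priors and settings are hypothetical.

```python
# Minimal single-changepoint Gibbs sampler for Poisson counts.
# The paper's models handle multiple changepoints and discrete
# dependent variables; this sketch shows only the basic mechanics.
import numpy as np

rng = np.random.default_rng(1)

# Simulated count series with a rate shift at t = 60
y = np.concatenate([rng.poisson(2.0, 60), rng.poisson(5.0, 40)])
n = len(y)
a, b = 1.0, 1.0                 # Gamma(a, b) prior on each regime's rate
cs = np.cumsum(y)
ks = np.arange(1, n)            # candidate changepoints (first k obs in regime 1)

k, samples = n // 2, []
for it in range(3000):
    # Conjugate Gamma draws for the two regime rates
    lam1 = rng.gamma(a + cs[k - 1], 1.0 / (b + k))
    lam2 = rng.gamma(a + cs[-1] - cs[k - 1], 1.0 / (b + (n - k)))
    # Discrete full conditional of the changepoint (log scale for stability)
    logp = (cs[ks - 1] * np.log(lam1) - ks * lam1
            + (cs[-1] - cs[ks - 1]) * np.log(lam2) - (n - ks) * lam2)
    p = np.exp(logp - logp.max())
    k = rng.choice(ks, p=p / p.sum())
    samples.append(k)

print(np.bincount(samples[1000:]).argmax())  # posterior mode, near 60
```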

3
Paper
Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach
Imai, Kosuke
Lu, Ying
Strauss, Aaron

Uploaded 12-16-2006
Keywords Coarse data
Contextual effects
Data augmentation
EM algorithm
Missing information principle
Nonparametric Bayesian modeling
Abstract Ecological inference is a statistical problem where aggregate-level data are used to make inferences about individual-level behavior. Recent years have witnessed resurgent interest in ecological inference among political methodologists and statisticians. In this paper, we conduct a theoretical and empirical study of Bayesian and likelihood inference for 2 x 2 ecological tables by applying the general statistical framework of incomplete data. We first show that the ecological inference problem can be decomposed into three factors: distributional effects, which address the possible misspecification of parametric modeling assumptions about the unknown distribution of missing data; contextual effects, which represent the possible correlation between missing data and observed variables; and aggregation effects, which are directly related to the loss of information caused by data aggregation. We then examine how these three factors affect inference and offer new statistical methods to address each of them. To deal with distributional effects, we propose a nonparametric Bayesian model based on a Dirichlet process prior, which relaxes common parametric assumptions. We also specify the statistical adjustments necessary to account for contextual effects. Finally, while little can be done to cope with aggregation effects, we offer a method to quantify the magnitude of such effects in order to formally assess their severity. We use simulated and real data sets to empirically investigate the consequences of these three factors and to evaluate the performance of our proposed methods. C code, along with an easy-to-use R interface, is publicly available for implementing our proposed methods.
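An illustrative sketch of the aggregation effect at the heart of this framework, with hypothetical names and distributions: simulate individual-level rates, aggregate them, and compute the deterministic (Duncan-Davis) bounds that the observed margins alone imply for the missing cell rates.

```python
# The 2 x 2 ecological table as an incomplete-data problem: in each
# areal unit i we observe only X_i (share in group 1) and T_i (overall
# outcome rate); the within-group rates beta1_i, beta2_i are missing.
import numpy as np

rng = np.random.default_rng(2)
m = 200                                 # areal units
X = rng.uniform(0.1, 0.9, m)            # observed group-1 share
beta1 = rng.beta(4, 2, m)               # missing: outcome rate, group 1
beta2 = rng.beta(2, 4, m)               # missing: outcome rate, group 2
T = X * beta1 + (1 - X) * beta2         # observed aggregate rate

# The accounting identity above is all the data reveal; it implies
# deterministic bounds on each missing beta1_i:
lower = np.clip((T - (1 - X)) / X, 0, 1)
upper = np.clip(T / X, 0, 1)
print("bounds always hold:", np.all((lower <= beta1) & (beta1 <= upper)))
print("mean bound width:", np.mean(upper - lower))  # info lost to aggregation
```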

4
Paper
Parameterization and Bayesian Modeling
Gelman, Andrew

Uploaded 06-15-2004
Keywords censored data
data augmentation
Gibbs sampler
hierarchical model
missing data imputation
parameter expansion
prior distribution
truncated data
Abstract Progress in statistical computation often leads to advances in statistical modeling. For example, it is surprisingly common that an existing model is reparameterized, solely for computational purposes, but then this new configuration motivates a new family of models that is useful in applied statistics. One reason this phenomenon may not have been noticed in statistics is that reparameterizations do not change the likelihood. In a Bayesian framework, however, a transformation of parameters typically suggests a new family of prior distributions. We discuss examples in censored and truncated data, mixture modeling, multivariate imputation, stochastic processes, and multilevel models.
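The reparameterization theme is easy to see in code. Below is one standard example from this literature, the "centered" versus "non-centered" forms of a hierarchical normal model; the two are equivalent as likelihoods, but priors and samplers built on the auxiliary parameters can behave very differently. Settings are illustrative.

```python
# Two parameterizations of the same hierarchical normal model.
import numpy as np

rng = np.random.default_rng(3)
mu, tau, J = 1.0, 2.0, 100_000

# Centered: theta_j ~ N(mu, tau^2) directly.
theta_centered = rng.normal(mu, tau, J)

# Non-centered: theta_j = mu + tau * eta_j with eta_j ~ N(0, 1).
# Same marginal distribution for theta, but a new parameter eta that
# can be given its own prior family, which is the paper's point.
eta = rng.normal(0.0, 1.0, J)
theta_noncentered = mu + tau * eta

print(theta_centered.mean(), theta_noncentered.mean())   # both near 1.0
print(theta_centered.std(), theta_noncentered.std())     # both near 2.0
```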

5
Paper
Listwise Deletion is Evil: What to Do About Missing Data in Political Science
King, Gary
Honaker, James
Joseph, Anne
Scheve, Kenneth

Uploaded 07-13-1998
Keywords missing data
imputation
IP
EM
EMs
EMis
data augmentation
MCMC
importance sampling
item nonresponse
Abstract We address a substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of "multiple imputation," but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. This discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (indeed, even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 20 and 100 times faster than the leading method recommended in the statistics literature and is very easy to use. We also quantify the considerable risks of current political science missing data practices, illustrate how to use the new procedure, and demonstrate the advantages of our approach to multiple imputation through simulated data as well as via replications of existing research.
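The final stage of the multiple-imputation workflow the paper advocates, combining the analyses of the completed datasets, is simple enough to sketch. This is Rubin's combining rule; the inputs below are hypothetical.

```python
# Rubin's rules: pool m completed-data analyses into one estimate
# and one variance that reflects imputation uncertainty.
import numpy as np

def combine(estimates, variances):
    """Pool point estimates and variances from m imputed datasets."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                    # pooled point estimate
    w = u.mean()                        # within-imputation variance
    b = q.var(ddof=1)                   # between-imputation variance
    t = w + (1 + 1 / m) * b             # total variance
    return q_bar, t

# e.g., five imputed datasets, each analyzed by the same regression:
est, var = combine([0.42, 0.38, 0.45, 0.40, 0.44], [0.010] * 5)
print(est, var ** 0.5)                  # pooled estimate and std. error
```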

6
Paper
Listwise Deletion is Evil: What to Do About Missing Data in Political Science (revised)
King, Gary
Honaker, James
Joseph, Anne
Scheve, Kenneth

Uploaded 08-19-1998
Keywords missing data
imputation
IP
EM
EMs
EMis
data augmentation
MCMC
importance sampling
item nonresponse
Abstract We propose a remedy to the substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of "multiple imputation," but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. As it turns out, this discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 65 and 726 times faster than the leading method recommended in the statistics literature and is very easy to use. We also quantify the considerable risks of current political science missing data practices, illustrate how to use the new procedure, and demonstrate the advantages of our approach to multiple imputation through simulated data as well as via replications of existing research. We also offer easy-to-use public domain software that implements our approach.
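This revision's keyword list highlights importance sampling, the "is" in EMis. A generic sketch of that ingredient follows; it is not the paper's EMis algorithm, just the weighting idea of drawing from a tractable approximation and reweighting by the target-to-proposal density ratio. Target and proposal here are hypothetical.

```python
# Generic importance sampling: estimate an expectation under an
# awkward target using draws from an easy proposal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def log_target(x):
    # Skewed "posterior" known only up to a constant: Gamma(3, 1),
    # i.e. density proportional to x^2 * exp(-x) for x > 0.
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 2 * np.log(np.maximum(x, 1e-300)) - x, -np.inf)

# Tractable proposal: a normal approximation near the target's mode (x = 2)
proposal = stats.norm(loc=2.0, scale=1.5)
x = proposal.rvs(size=50_000, random_state=rng)

# Importance weights: target over proposal density, then normalized
logw = log_target(x) - proposal.logpdf(x)
w = np.exp(logw - logw.max())
w /= w.sum()

print(np.sum(w * x))  # approximates E[x] under the target; exactly 3 here
```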

7
Paper
Parametric and Nonparametric Bayesian Models for Ecological Inference in 2 x 2 Tables
Imai, Kosuke
Lu, Ying

Uploaded 07-21-2004
Keywords Aggregate data
Data augmentation
Density estimation
Dirichlet process prior
Normal mixtures
Racial voting
Abstract The ecological inference problem arises when making inferences about individual behavior from aggregate data. Such a situation is frequently encountered in the social sciences and epidemiology. In this article, we propose a Bayesian approach based on data augmentation. We formulate ecological inference in 2 x 2 tables as a missing data problem where only the weighted average of two unknown variables is observed. This framework directly incorporates the deterministic bounds, which contain all information available from the data, and allows researchers to incorporate individual-level data whenever available. Within this general framework, we first develop a parametric model. We show that through the use of an EM algorithm, the model can formally quantify the effect of missing information on parameter estimation. This is an important diagnostic for evaluating the degree of aggregation effects. Next, we introduce a nonparametric Bayesian model using a Dirichlet process prior to relax the distributional assumption of the parametric model. Through simulations and an empirical application, we evaluate the relative performance of our models and other existing methods. We show that in many realistic scenarios, aggregation effects are so severe that more than half of the information is lost, yielding estimates with little precision. We also find that our nonparametric model generally outperforms parametric models. C code, along with an R interface, is publicly available for implementing our Markov chain Monte Carlo algorithms to fit the proposed models.
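A hedged sketch of the nonparametric ingredient named in the abstract: a truncated stick-breaking draw from a Dirichlet process mixture of normals. The paper's actual samplers are more involved; hyperparameters and the base measure below are hypothetical.

```python
# One random mixture distribution drawn from a (truncated) Dirichlet
# process prior, then data drawn from that mixture of normals.
import numpy as np

rng = np.random.default_rng(5)
alpha, K = 2.0, 50                      # concentration; truncation level

# Stick-breaking weights: pi_k = v_k * prod_{j<k} (1 - v_j), v_k ~ Beta(1, alpha)
v = rng.beta(1.0, alpha, K)
pi = v * np.concatenate([[1.0], np.cumprod(1 - v[:-1])])

# Atoms from the base measure G0 (here: means ~ N(0, 4), common sd 0.5)
mu = rng.normal(0.0, 2.0, K)

# Draw data from the resulting random mixture of normals
z = rng.choice(K, size=1000, p=pi / pi.sum())   # renormalize the truncated weights
x = rng.normal(mu[z], 0.5)
print(x.mean(), x.std())
```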

