
Search Results


Results below are based on the search criterion 'sampling'
Total number of records returned: 20

1
Paper
Estimation in Dirichlet Random Effects Models
Kyung, Minjung
Gill, Jeff
Casella, George

Uploaded 04-28-2009
Keywords generalized linear mixed model
Dirichlet process random effects model
precision parameter likelihood
Gibbs sampling
importance sampling
probit mixed Dirichlet random effects model
Abstract We develop a new Gibbs sampler for a linear mixed model with a Dirichlet process random effect term, which is easily extended to a generalized linear mixed model with a probit link function. Our Gibbs sampler exploits the properties of the multinomial and Dirichlet distributions, and is shown to be an improvement, in terms of operator norm and efficiency, over other commonly used MCMC algorithms. We also investigate methods for the estimation of the precision parameter of the Dirichlet process, finding that maximum likelihood may not be desirable, but a posterior mode is a reasonable approach. Examples are given to show how these models perform on real data. Our results complement both the theoretical basis of the Dirichlet process nonparametric prior and the computational work that has been done to date. Forthcoming: Annals of Statistics.
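
As a rough sketch of the model class described here (notation is ours, not necessarily the paper's), a linear mixed model with a Dirichlet process random effect can be written as

\[
y_i = \mathbf{x}_i^\top \boldsymbol\beta + \psi_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{N}(0, \sigma^2), \qquad \psi_1, \ldots, \psi_n \sim \mathrm{DP}\big(m,\ \mathrm{N}(0, \tau^2)\big),
\]

where m is the precision parameter whose estimation the abstract discusses; the probit extension replaces the Gaussian observation equation with a latent-variable link.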

2
Paper
What to do When Your Hessian is Not Invertible: Alternatives to Model Respecification in Nonlinear Estimation
Gill, Jeff
King, Gary

Uploaded 05-14-2002
Keywords Hessian
Cholesky
generalized inverse
maximum likelihood
statistical computing
importance sampling
pseudo-variance
generalized linear model
singular normal
Abstract What should a researcher do when statistical analysis software terminates before completion with a message that the Hessian is not invertible? The standard textbook advice is to respecify the model, but this is another way of saying that the researcher should change the question being asked. Obviously, however, computer programs should not be in the business of deciding what questions are worthy of study. Although noninvertible Hessians are sometimes signals of poorly posed questions, nonsensical models, or inappropriate estimators, they also frequently occur when information about the quantities of interest does exist in the data, through the likelihood function. We explain the problem in some detail and lay out two preliminary proposals for ways of dealing with noninvertible Hessians without changing the question asked.
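
A minimal illustration of the generalized-inverse idea sketched above, assuming a numpy-based workflow; this is not the authors' full procedure, only one ingredient of it:

    import numpy as np

    def pseudo_variance(hessian):
        # For a log-likelihood maximized at theta-hat, the usual variance
        # estimate is inv(-H). When H is singular, the Moore-Penrose
        # generalized inverse still returns a well-defined pseudo-variance.
        return np.linalg.pinv(-hessian)

    # Hypothetical rank-deficient Hessian that np.linalg.inv would reject:
    H = np.array([[-2.0,  1.0,  1.0],
                  [ 1.0, -2.0,  1.0],
                  [ 1.0,  1.0, -2.0]])
    V = pseudo_variance(H)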

3
Paper
Spike and Slab Prior Distributions for Simultaneous Bayesian Hypothesis Testing, Model Selection, and Prediction of Nonlinear Outcomes
Pang, Xun
Gill, Jeff

Uploaded 07-13-2009
Keywords Spike and Slab Prior
Hypothesis Testing
Bayesian Model Selection
Bayesian Model Averaging
Adaptive Rejection Sampling
Generalized Linear Model
Abstract A small body of literature has used the spike and slab prior specification for model selection with strictly linear outcomes. In this setup a two-component mixture distribution is stipulated for coefficients of interest, with one part centered at zero with very high precision (the spike) and the other a distribution diffusely centered at the research hypothesis (the slab). Through this selective shrinkage, the setup incorporates the zero-coefficient contingency directly into the modeling process to produce posterior probabilities for hypothesized outcomes. We extend the model to qualitative responses by designing a hierarchy of forms over both the parameter and model spaces to achieve variable selection, model averaging, and individual coefficient hypothesis testing. To overcome the technical challenges in estimating the marginal posterior distributions, possibly with a dramatic ratio of density heights of the spike to the slab, we develop a hybrid Gibbs sampling algorithm using an adaptive rejection approach for various discrete outcome models, including dichotomous, polychotomous, and count responses. The performance of the models and methods is assessed with both Monte Carlo experiments and empirical applications in political science.
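
A generic two-component form of such a prior (notation ours): each coefficient β_j gets an inclusion indicator γ_j ∈ {0, 1} with

\[
\beta_j \mid \gamma_j \sim (1 - \gamma_j)\, \mathrm{N}(0, \tau_0^2) + \gamma_j\, \mathrm{N}(\mu_j, \tau_1^2), \qquad \tau_0^2 \ll \tau_1^2,
\]

where the tight N(0, τ_0²) component is the spike, the diffuse N(μ_j, τ_1²) component centered at the research hypothesis is the slab, and the posterior probability Pr(γ_j = 1 | y) supplies the hypothesis test.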

4
Paper
Alternative Models of Dynamics in Binary Time-Series–Cross-Section Models: The Example of State Failure
Beck, Nathaniel
Jackman, Simon
Epstein, David
O'Halloran, Sharyn

Uploaded 07-14-2001
Keywords dynamic probit
btscs
state failure
Gibbs sampling
MCMC
transitional models
discrete data
ROC
correlated binary data
generalized residuals
Abstract This paper investigates a variety of dynamic probit models for time-series–cross-section data in the context of explaining state failure. It shows that ordinary probit, which ignores dynamics, is misleading. Alternatives that seem to produce sensible results are the transition model and a model which includes a lagged latent dependent variable. It is argued that the use of a lagged latent variable is often superior to the use of a lagged realized dependent variable. It is also shown that the latter is a special case of the transition model. The relationship between the transition model and event history methods is also considered: the transition model estimates an event history model for both values of the dependent variable, yielding estimates that are identical to those produced by the two event history models. Furthermore, one can incorporate the insights gleaned from the event history models into the transition analysis, so that researchers do not have to assume duration independence. The conclusion notes that investigations of the various models have been limited to data sets which contain long sequences of zeros; models may perform differently in data sets with shorter bursts of zeros and ones.
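
For concreteness, the lagged latent-variable specification the abstract favors can be sketched (notation ours) as

\[
y_{it} = \mathbf{1}\{y_{it}^* > 0\}, \qquad y_{it}^* = \mathbf{x}_{it}^\top \boldsymbol\beta + \gamma\, y_{i,t-1}^* + \varepsilon_{it},
\]

whereas the lagged realized-variable alternative replaces the latent y*_{i,t-1} with the observed y_{i,t-1}.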

5
Paper
Penalized Regression, Standard Errors, and Bayesian Lassos
Kyung, Minjung
Gill, Jeff
Ghosh, Malay
Casella, George

Uploaded 02-23-2010
Keywords model selection
lassos
Bayesian hierarchical models
LARS algorithm
EM/Gibbs sampler
Geometric Ergodicity
Gibbs Sampling
Abstract Penalized regression methods for simultaneous variable selection and coefficient estimation, especially those based on the lasso of Tibshirani (1996), have received a great deal of attention in recent years, mostly through frequentist models. Properties such as consistency have been studied, and are achieved by different lasso variations. Here we look at a fully Bayesian formulation of the problem, which is flexible enough to encompass most versions of the lasso that have been previously considered. The advantages of the hierarchical Bayesian formulations are many. In addition to the usual ease-of-interpretation of hierarchical models, the Bayesian formulation produces valid standard errors (which can be problematic for the frequentist lasso), and is based on a geometrically ergodic Markov chain. We compare the performance of the Bayesian lassos to their frequentist counterparts using simulations and data sets that previous lasso papers have used, and see that in terms of prediction mean squared error, the Bayesian lasso performance is similar to and, in some cases, better than, the frequentist lasso.
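
One standard hierarchical representation of the Bayesian lasso (the scale-mixture-of-normals construction; notation ours) is

\[
\boldsymbol\beta \mid \sigma^2, \tau_1^2, \ldots, \tau_p^2 \sim \mathrm{N}\big(\mathbf{0},\ \sigma^2 \operatorname{diag}(\tau_1^2, \ldots, \tau_p^2)\big), \qquad \tau_j^2 \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda^2/2),
\]

which, after integrating out the τ_j², gives each β_j an independent Laplace (double-exponential) prior, the Bayesian analogue of the lasso penalty, and is what makes the Gibbs sampling mentioned in the keywords tractable.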

6
Paper
Estimation of Electoral Disproportionality and Thresholds via MCMC
Kalandrakis, Anastassios

Uploaded 11-03-1999
Keywords Electoral Disproportionality
Electoral Thresholds
Gibbs Sampling
MCMC
Metropolis Algorithm
Abstract For statistical as well as political reasons – some already identified in the literature – measures of both electoral disproportionality and electoral thresholds are essential and must be combined in numerical summaries of electoral institutions. With few exceptions, none of these quantities can be reliably inferred directly from the provisions of the electoral law, thus impairing "large scale" comparative studies. Through the use of sampling-based Bayes methods I am able to simultaneously estimate these two quantities from electoral returns. I apply the proposed procedure to 45 electoral systems in use over 216 elections to the national parliaments in the 15 countries of the European Union in the period 1945-1996. The resultant two-dimensional summary of electoral systems has several attractive properties in comparison to indices of disproportionality currently used in comparative politics.

7
Paper
Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique
Lin, Winston

Uploaded 09-02-2011
Keywords Covariate adjustment
Randomization inference
Neyman's repeated sampling approach
Sandwich estimator
Social experiments
Abstract Freedman [Adv. in Appl. Math. 40 (2008a) 180–193; Ann. Appl. Stat. 2 (2008b) 176–196] critiqued OLS regression adjustment of estimated treatment effects in randomized experiments, using Neyman’s model for randomization inference. This paper argues that in sufficiently large samples, the statistical problems he raised are either minor or easily fixed. OLS adjustment improves or does not hurt asymptotic precision when the regression includes a full set of treatment-covariate interactions. Asymptotically valid confidence intervals can be constructed with the Huber-White sandwich standard error estimator. Even the traditional OLS adjustment has benign large-sample properties when subjects are randomly assigned to two groups of equal size. The strongest reasons to support Freedman’s preference for unadjusted estimates are transparency and the dangers of specification search.
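
A minimal sketch of the adjustment the abstract endorses (fully interacted OLS with sandwich standard errors), assuming simulated data and the statsmodels package; variable names are hypothetical:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)                  # pre-treatment covariate
    t = rng.integers(0, 2, size=n)          # random assignment to two groups
    y = 1.0 + 0.5 * t + 0.8 * x + rng.normal(size=n)

    xc = x - x.mean()                       # center before interacting
    X = sm.add_constant(np.column_stack([t, xc, t * xc]))  # full interactions
    fit = sm.OLS(y, X).fit(cov_type="HC2")  # Huber-White sandwich variant
    print(fit.params[1], fit.bse[1])        # adjusted effect estimate and SE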

8
Paper
Listwise Deletion is Evil: What to Do About Missing Data in Political Science
King, Gary
Honaker, James
Joseph, Anne
Scheve, Kenneth

Uploaded 07-13-1998
Keywords missing data
imputation
IP
EM
EMs
EMis
data augmentation
MCMC
importance sampling
item nonresponse
Abstract We address a substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of “multiple imputation,” but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. This discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (indeed, even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 20 and 100 times faster than the leading method recommended in the statistics literature and is very easy to use. We also quantify the considerable risks of current political science missing data practices, illustrate how to use the new procedure, and demonstrate the advantages of our approach to multiple imputation through simulated data as well as via replications of existing research.

9
Paper
Listwise Deletion is Evil: What to Do About Missing Data in Political Science (revised)
King, Gary
Honaker, James
Joseph, Anne
Scheve, Kenneth

Uploaded 08-19-1998
Keywords missing data
imputation
IP
EM
EMs
EMis
data augmentation
MCMC
importance sampling
item nonresponse
Abstract We propose a remedy to the substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of “multiple imputation,” but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. As it turns out, this discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 65 and 726 times faster than the leading method recommended in the statistics literature and is very easy to use. We also quantify the considerable risks of current political science missing data practices, illustrate how to use the new procedure, and demonstrate the advantages of our approach to multiple imputation through simulated data as well as via replications of existing research. We also offer easy-to-use public domain software that implements our approach.
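
For reference, after generating m imputed data sets the analysis estimates are pooled with the standard multiple-imputation combining rules (Rubin's rules, which apply to this approach as to any other):

\[
\bar{q} = \frac{1}{m} \sum_{i=1}^{m} \hat{q}_i, \qquad T = \bar{U} + \Big(1 + \frac{1}{m}\Big) B, \qquad B = \frac{1}{m-1} \sum_{i=1}^{m} (\hat{q}_i - \bar{q})^2,
\]

where q̂_i and U_i are the point estimate and its variance from imputed data set i, and \bar{U} is the average of the U_i.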

10
Paper
The Robustness of Normal-theory LISREL Models: Tests Using a New Optimizer, the Bootstrap, and Sampling Experiments, with Applications
Mebane, Walter R.
Sekhon, Jasjeet
Wells, Martin T.

Uploaded 01-01-1995
Keywords statistics
estimation
covariance structures
linear structural relations
LISREL
bootstrap
confidence intervals
BCa
specification tests
goodness-of-fit
hypothesis tests
optimization
evolutionary programming
genetic algorithms
monte carlo
sampling experiment
Abstract Asymptotic results from theoretical statistics show that the linear structural relations (LISREL) covariance structure model is robust to many kinds of departures from multivariate normality in the observed data. But close examination of the statistical theory suggests that the kinds of hypotheses about alternative models that are most often of interest in political science research are not covered by the nice robustness results. The typical size of political science data samples also raises questions about the applicability of the asymptotic normal theory. We present results from a Monte Carlo sampling experiment and from analysis of two real data sets both to illustrate the robustness results and to demonstrate how it is unwise to rely on them in substantive political science research. We propose new methods using the bootstrap to assess more accurately the distributions of parameter estimates and test statistics for the LISREL model. To implement the bootstrap we use optimization software two of us have developed, incorporating the quasi-Newton BFGS method in an evolutionary programming algorithm. We describe methods for drawing inferences about LISREL models that are much more reliable than the asymptotic normal-theory techniques. The methods we propose are implemented using the new software we have developed. Our bootstrap and optimization methods allow model assessment and model selection to use well understood statistical principles such as classical hypothesis testing.

11
Paper
Senate Voting on NAFTA: The Power and Limitations of MCMC Methods for Studying Voting across Bills and across States
Smith, Alastair
McGillivray, Fiona

Uploaded 07-09-1996
Keywords NAFTA
MCMC
Gibbs sampling
bivariate probit
Senate
Abstract We examine similarities in Senate voting within states and across two Senate bills: the 1991 fast track authorization bill and the 1993 NAFTA implementation bill. A series of bivariate probit models are estimated by Markov chain Monte Carlo simulation. We discuss the power of MCMC techniques and how the output of these sampling procedures can be used for Bayesian model comparisons. Having separately explored the similarities in votes across bills and within states, we develop a 4-variate probit model to explain voting on NAFTA. The power of MCMC techniques to estimate this complicated model is demonstrated with two different MCMC procedures. We conclude by discussing the data requirements for these techniques.

12
Paper
A Bayesian Method for the Analysis of Dyadic Crisis Data
Smith, Alastair

Uploaded 11-04-1996
Keywords Bayesian model testing
Censored data
Crisis data
Gibbs sampling
Markov chain Monte Carlo
Ordered discrete choice model
Strategic choice
Abstract This paper examines the level of force that nations use during disputes. Suppose that two nations, A and B, are involved in a dispute. Each nation chooses the level of violence that it is prepared to use in order to achieve its objectives. Since there are two opponents making decisions, the outcome of the crisis is determined by a bivariate rather than univariate process. I propose a bivariate ordered discrete choice model to examine the relationship between nation A's decision to use force, nation B's decision to use force, and a series of explanatory variables. The model is estimated in the Bayesian context using a Markov chain Monte Carlo simulation technique. I analyze Bueno de Mesquita and Lalman's (1992) dyadically coded version of the Militarized Interstate Dispute data (Gochman and Maoz 1984). Various models are compared using Bayes Factors. The results indicate that nation A's and nation B's decisions to use force cannot be regarded as independent. Bayesian model comparisons show that variables derived from Bueno de Mesquita's expected utility theory (1982, 1985; Bueno de Mesquita and Lalman 1986, 1992) provide the best explanatory variables for decision making in crises.
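
A generic skeleton of such a bivariate ordered probit (notation ours, not the paper's exact parameterization): latent levels of force

\[
y_A^* = \mathbf{x}^\top \boldsymbol\beta_A + \varepsilon_A, \qquad y_B^* = \mathbf{x}^\top \boldsymbol\beta_B + \varepsilon_B, \qquad (\varepsilon_A, \varepsilon_B)^\top \sim \mathrm{N}(\mathbf{0}, \Sigma),
\]

with each nation's observed category determined by which cutpoint interval its latent variable falls into; the off-diagonal element of Σ captures the dependence between the two nations' decisions that the abstract reports.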

13
Paper
Not Asked and Not Answered: Multiple Imputation for Multiple Surveys
Gelman, Andrew
King, Gary
Liu, Chuanhai

Uploaded 10-27-1997
Keywords Bayesian inference
cluster sampling
diagnostics
hierarchical models
ignorable nonresponse
missing data
political science
sample surveys
stratified sampling
multiple imputation
Abstract We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked, or different sampling methods used, in different strata or clusters. Our method involves multiply-imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights were based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed to non-imputed data. We illustrate with the example that motivated this project: a study of pre-election public opinion polls, in which not all the questions of interest are asked in all the surveys, so that it is infeasible to impute each survey separately.

14
Paper
Sampling people or people in places? The BES as an election study
Johnston, Ron
Harris, Rich
Jones, Kelvyn

Uploaded 08-15-2005
Keywords British Election Study
representativeness
sampling
Abstract UK general elections involve a number of separate, though complexly inter-linked, contests for support among the parties. Two of these are reflected in the main types of model of voting behaviour used by political scientists, whereas the third involves the separate contests that take place in most cases among the main political parties in the (now) 646 constituencies which send representatives to the House of Commons. Ideally, electoral surveys should take account of all three. In this note, we explore the extent to which that is the case with the 2005 British Election Study (with coverage restricted to England and Wales only, for technical reasons) and explore the implications of our findings for future electoral studies.

15
Paper
How many people do you know in prison? Using overdispersion in count data to estimate social structure in networks
Zheng, Tian
Salganik, Matt
Gelman, Andrew

Uploaded 10-12-2005
Keywords negative binomial distribution
overdispersion
sampling
social networks
social structure
Abstract Networks – sets of objects connected by relationships – are important in a number of fields. The study of networks has long been central to sociology, where researchers have attempted to understand the causes and consequences of the structure of relationships in large groups of people. Using insight from previous network research, Killworth et al. (1998a,b) and McCarty et al. (2001) developed and evaluated a method for estimating the sizes of hard-to-count populations using network data collected from a simple random sample of Americans. In this paper we show how, using a multilevel overdispersed Poisson regression model, these data can also be used to estimate aspects of social structure in the population. Our work goes beyond most previous research on networks by using variation, as well as average responses, as a source of information. We apply our method to the McCarty et al. data and find that Americans vary greatly in their number of acquaintances. Further, Americans show great variation in propensity to form ties to people in some groups (e.g., males in prison, the homeless, and American Indians), but little variation for other groups (e.g., twins, people named Michael or Nicole). We also explore other features of these data and consider ways in which survey data can be used to estimate network structure.
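
A sketch of the multilevel overdispersed Poisson model mentioned above, in the notation usually used for this model class:

\[
y_{ik} \sim \text{Neg-Binom}\big(\mu_{ik} = e^{\alpha_i + \beta_k},\ \omega_k\big),
\]

where y_{ik} is the number of people respondent i knows in group k, α_i captures the respondent's gregariousness, β_k the group's relative size, and the overdispersion ω_k the variation in propensity to form ties to group k.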

16
Paper
Struggles with survey weighting and regression modeling
Gelman, Andrew

Uploaded 10-12-2005
Keywords multilevel modeling
poststratification
sampling weights
shrinkage
Abstract The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of post-stratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss these issues in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.
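
The poststratification step referred to here combines cell-level model estimates with population counts; in its generic form (notation ours),

\[
\hat{\theta} = \frac{\sum_j N_j\, \hat{\theta}_j}{\sum_j N_j},
\]

where j indexes the poststratification cells, N_j is the population count in cell j, and θ̂_j is the multilevel-model estimate within that cell.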

17
Paper
The Spatial Probit Model of Interdependent Binary Outcomes: Estimation, Interpretation, and Presentation
Franzese, Robert
Hays, Jude

Uploaded 07-20-2007
Keywords Spatial Probit
Bayesian Gibbs-Sampler Estimator
Recursive Importance-Sampling Estimator
Interdependence
Diffusion
Contagion
Emulation
Abstract We have argued and shown elsewhere the ubiquity and prominence of spatial interdependence in political science research and noted that much previous practice has neglected this interdependence or treated it solely as a nuisance, to the serious detriment of sound inference. Previously, we considered only linear-regression models of spatial and/or spatio-temporal interdependence. In this paper, we turn to binary-outcome models. We start by stressing the ubiquity and centrality of interdependence in binary outcomes of interest to political and social scientists and note that, again, this interdependence has been ignored in most contexts where it likely arises and that, in the few contexts where it has been acknowledged, the endogeneity of the spatial lag has not been recognized. Next, we explain some of the severe challenges for empirical analysis posed by spatial interdependence in binary-outcome models, and then we follow recent advances in the spatial-econometric literature to suggest Bayesian or recursive-importance-sampling (RIS) approaches for tackling estimation. In brief and in general, the estimation complications arise because among the RHS variables is an endogenous weighted spatial lag of the unobserved latent outcome, y*, in the other units; Bayesian or RIS techniques facilitate the complicated nested optimization exercise that follows from that fact. We also advance that literature by showing how to calculate estimated spatial effects (as opposed to parameter estimates) in such models, how to construct confidence regions for those (adopting a simulation strategy for the purpose), and how to present such estimates effectively.
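
The estimation difficulty the abstract describes can be seen from a generic spatial-lag latent-variable skeleton (notation ours):

\[
\mathbf{y}^* = \rho \mathbf{W} \mathbf{y}^* + \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon, \qquad y_i = \mathbf{1}\{y_i^* > 0\}, \qquad \text{so that} \quad \mathbf{y}^* = (\mathbf{I} - \rho \mathbf{W})^{-1}(\mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon),
\]

so each latent outcome depends on the errors of every unit, making the spatial lag endogenous and the implied likelihood a high-dimensional integral of the kind the Bayesian and RIS approaches are designed to handle.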

18
Paper
A Spatial Model of Electoral Platforms
Elff, Martin

Uploaded 07-01-2008
Keywords Parties
party families
electoral platforms
party manifestos
spatial models
unobserved data
latent trait models
EM algorithm
Monte Carlo integration
Monte Carlo EM
importance sampling
SIR algorithm
ideological dimensions
Abstract The reconstruction of political positions of parties, candidates and governments has made considerable headway during the last decades, not least due to the efforts of the Manifesto Research Group and the Comparative Manifestos Project, which compiled and published a data set on the electoral platforms of political parties from most major democracies for most of the post-war era. A central assumption underlying the coding of electoral platforms into quantitative data as done by the MRG/CMP is that parties take positions by selective emphases of policy objectives, which put their accomplishments in a most positive light (Budge 2001) or are representative of their current political/ideological positions. Consequently, the MRG/CMP data consist of percentages of the respective manifesto texts that refer to various policy objectives. As a consequence both of this underlying assumption and of the structure of the CMP data, methods of classical multivariate analysis are not well suited to these data, due to the requirements these methods place on the data for their appropriate application (van der Brug 2001; Elff 2002). The paper offers an alternative method for reconstructing positions in political spaces based on latent trait modelling, which reflects both the assumptions underlying the coding of the texts and the peculiar structure of the data. Finally, the validity of the proposed method is demonstrated with respect to the average position of party families within reconstructed policy spaces. It turns out that communist, socialist, and social democrat parties differ clearly from “bourgeois” parties with regard to their positions on an economic left/right dimension, while British and Scandinavian conservative parties can be distinguished from Christian democratic parties by their respective positions on a libertarian/authoritarian and a traditionalist/modernist dimension. Similarly, the typical political positions of green (or “New Politics”) parties can be distinguished from the positions of other party families.

19
Paper
Binary and Ordinal Time Series with AR(p) Errors: Bayesian Model Determination for Latent High-Order Markovian Processes
Pang, Xun

Uploaded 07-06-2008
Keywords Autoregressive Errors
Auxiliary Particle Filter
Fixed-lag Smoothing
Markov Chain Monte Carlo (MCMC)
Political Science
Sampling Importance Resampling (SIR)
Abstract To directly and adequately correct serial correlation in binary and ordinal response data, this paper proposes a probit model with errors following a pth-order autoregressive process, and develops simulation-based methods in the Bayesian context to handle computational challenges of posterior estimation, model comparison, and lag order determination. Compared to the extant methods, such as quasi-ML, GCM, and simulation-based ML estimators, the current method does not rely on the properties of the big variance-covariance matrix or the shape of the likelihood function. In addition, the present model efficiently handles high-order autocorrelated errors that pose computationally formidable difficulties for the conventional methods. By applying a mixed sampler of the Gibbs and Metropolis-Hastings algorithms, the posterior distributions of the parameters do not depend on initial observations. The auxiliary particle filter, complemented by fixed-lag smoothing, is extended to approximate Bayes Factors for models with latent high-order Markov processes. Computational methods are tested with empirical data. Energy cooperation policies of the International Energy Agency are analyzed in terms of their effects on global oil-supply security. The current model with different lag orders, together with other competitive models, is estimated and compared.
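
The proposed specification can be sketched (notation ours) as a probit with latent AR(p) errors:

\[
y_t = \mathbf{1}\{y_t^* > 0\}, \qquad y_t^* = \mathbf{x}_t^\top \boldsymbol\beta + \varepsilon_t, \qquad \varepsilon_t = \sum_{j=1}^{p} \phi_j\, \varepsilon_{t-j} + u_t, \quad u_t \sim \mathrm{N}(0, 1),
\]

with the lag order p itself determined by the Bayes Factor machinery described above.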

20
Poster
Blossom: An evolutionary strategy optimizer with applications to matching, scaling, networks, and sampling
Beauchamp, Nick

Uploaded 07-24-2013
Keywords maximum likelihood
genetic algorithms
matching
multidimensional scaling
social networks
clustering
sampling methods
markov chain monte carlo
evolutionary algorithms
estimation of distribution algorithms
Abstract This paper introduces a new maximization and importance sampling algorithm, "Blossom," along with an associated R script, which is especially well suited to rugged, discontinuous, and multimodal functions where even approximate gradient methods are infeasible and MCMC approaches work poorly. The Blossom algorithm employs an evolutionary optimization strategy related to the Estimation of Multivariate Normal Algorithm (EMNA) or Covariance Matrix Adaptation (CMA), within the general family of Estimation of Distribution Algorithms (EDA). It works by successive iterations of sampling, selecting the highest-scoring subsample, and using the variance-covariance matrix of that subsample to generate a new sample, with various self-adapting parameters. Compared against a benchmark suite of challenging functions introduced in Yao, Liu, and Lin (1999), it finds maxima equal to or better than those found by the genetic algorithm Genoud introduced in Mebane and Sekhon (2011). The algorithm is then tested in four challenging domains from political science: (1) estimation of nonlinear and multimodal spatial metrics; (2) maximizing balance for matching; (3) ideological scaling of judges with discontinuous objective functions; (4) community detection in social networks. In all of these cases, Blossom outperforms most existing nonlinear optimizers in R. Finally, the samples gathered during the optimization process can be efficiently used for importance sampling using approximate Voronoi cells around sample points, equalling the performance of MCMC Metropolis samplers in some circumstances, and are also of use for generating efficient proposal distributions. Even in an increasingly MCMC world, there remain important roles for effective general-purpose optimizers, and Blossom is especially effective for rough terrains where most other methods fail.
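
A minimal sketch of the EMNA-style iteration the abstract describes (sample, keep the highest-scoring subsample, refit the mean and variance-covariance matrix); this is not the Blossom algorithm itself, which adds self-adapting parameters, and all names here are hypothetical:

    import numpy as np

    def emna_maximize(f, mean, cov, n_samples=200, top_frac=0.2, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        k = max(1, int(top_frac * n_samples))
        for _ in range(iters):
            sample = rng.multivariate_normal(mean, cov, size=n_samples)
            scores = np.apply_along_axis(f, 1, sample)
            elite = sample[np.argsort(scores)[-k:]]   # highest-scoring subsample
            mean = elite.mean(axis=0)                 # refit the sampling distribution
            cov = np.cov(elite, rowvar=False) + 1e-8 * np.eye(len(mean))
        return mean

    # Example on a simple multimodal surface:
    f = lambda x: -np.sum(x**2) + np.sum(np.cos(3 * x))
    xhat = emna_maximize(f, mean=np.zeros(2), cov=np.eye(2))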

