
Search Results

The results below are based on the search criterion 'clustering'
Total number of records returned: 6

Analyzing the US Senate in 2003: Similarities, Networks, Clusters and Blocs
Jakulin, Aleks

Uploaded 10-27-2004
Keywords roll call analysis
latent variable models
information theory
Abstract To analyze the roll calls in the US Senate in the year 2003, we have employed methods already used throughout the scientific community for the analysis of genes, surveys and text. With information-theoretic measures we assess the association between pairs of senators based on the votes they cast. Furthermore, we can evaluate the influence of a voter by postulating a Shannon information channel between the outcome and a voter. The matrix of associations can be summarized using hierarchical clustering, multi-dimensional scaling and link analysis. With a discrete latent variable model we identify blocs of cohesive voters within the Senate, and contrast this model with continuous ideal point methods. Under the bloc-voting model, the Senate can be interpreted as a weighted vote system, and we were able to estimate the empirical voting power of individual blocs through what-if analysis.
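The pairwise association step the abstract describes can be illustrated with a toy computation: the empirical mutual information (in bits) between two senators' vote sequences. This is a minimal sketch of the general technique, not the authors' code; the vote encodings and data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(votes_a, votes_b):
    """Empirical mutual information (in bits) between two vote
    sequences, e.g. lists of 'y'/'n' strings of equal length."""
    n = len(votes_a)
    joint = Counter(zip(votes_a, votes_b))   # joint vote counts
    pa = Counter(votes_a)                    # marginal counts, senator A
    pb = Counter(votes_b)                    # marginal counts, senator B
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab * log2( p_ab / (p_a * p_b) ), with counts rescaled by n
        mi += p_ab * log2(p_ab * n * n / (pa[a] * pb[b]))
    return mi

# Hypothetical vote records: identical voting shares maximal information.
a = ['y', 'y', 'n', 'n', 'y', 'n']
b = ['y', 'y', 'n', 'n', 'y', 'n']
print(round(mutual_information(a, b), 3))  # prints 1.0
```

A matrix of such pairwise values is what hierarchical clustering or multi-dimensional scaling would then summarize.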

Attributing Effects to A Cluster Randomized Get-Out-The-Vote Campaign: An Application of Randomization Inference Using Full Matching
Bowers, Jake
Hansen, Ben

Uploaded 07-18-2005
Keywords causal inference
randomization inference
attributable effects
full matching
instrumental variables
missing data
field experiments
Abstract Statistical analysis requires a probability model: commonly, a model for the dependence of outcomes $Y$ on confounders $X$ and a potentially causal variable $Z$. When the goal of the analysis is to infer $Z$'s effects on $Y$, this requirement introduces an element of circularity: in order to decide how $Z$ affects $Y$, the analyst first determines, speculatively, the manner of $Y$'s dependence on $Z$ and other variables. This paper takes a statistical perspective that avoids such circles, permitting analysis of $Z$'s effects on $Y$ even as the statistician remains entirely agnostic about the conditional distribution of $Y$ given $X$ and $Z$, or perhaps even denies that such a distribution exists. Our assumptions instead pertain to the conditional distribution $Z \vert X$, and the role of speculation in settling them is reduced by the existence of random assignment of $Z$ in a field experiment as well as by poststratification, testing for overt bias before accepting a poststratification, and optimal full matching. Such beginnings pave the way for "randomization inference", an approach which, despite a long history in the analysis of designed experiments, is relatively new to political science and to other fields in which experimental data are rarely available. The approach applies to both experiments and observational studies. We illustrate this by applying it to analyze A. Gerber and D. Green's New Haven Vote 98 campaign. Conceived as both a get-out-the-vote campaign and a field experiment in political participation, the study assigned households to treatment and desired to estimate the effect of treatment on the individuals nested within the households.
We estimate the number of voters who would not have voted had the campaign not prompted them to (that is, the total number of votes attributable to the interventions of the campaigners), while taking into account the non-independence of observations within households, non-random compliance, and missing responses. Both our statistical inferences about these attributable effects and the stratification and matching that precede them rely on quite recent developments from statistics; our matching, in particular, has novel features of potentially wide applicability. Our broad findings resemble those of the original analysis by Gerber and Green (2000).
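Stripped of the paper's matching and missing-data machinery, the core randomization-inference idea can be sketched as a household-level permutation test: re-randomize treatment across households and ask how often the observed turnout difference recurs under the null of no effect. This is a generic sketch with hypothetical data, not the authors' implementation.

```python
import random

def cluster_permutation_pvalue(households, n_perm=2000, seed=0):
    """One-sided randomization-inference p-value for a turnout
    difference, re-randomizing treatment at the household level.
    `households` is a list of (treated, votes) pairs, where `votes`
    holds 0/1 turnout indicators for the individuals in a household."""
    rng = random.Random(seed)

    def stat(assign):
        # Difference in mean turnout, treated minus control individuals.
        t = [v for z, (_, vs) in zip(assign, households) for v in vs if z]
        c = [v for z, (_, vs) in zip(assign, households) for v in vs if not z]
        return sum(t) / len(t) - sum(c) / len(c)

    labels = [z for z, _ in households]
    observed = stat(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(labels)           # re-randomize at household level
        if stat(labels) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one correction

# Hypothetical data: 10 treated households where everyone voted,
# 10 control households where no one did.
households = [(True, [1, 1])] * 10 + [(False, [0, 0])] * 10
p = cluster_permutation_pvalue(households)
```

Permuting whole households rather than individuals is what respects the non-independence of observations within households that the abstract emphasizes.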

Cosponsorship Coalitions in the U.S. House of Representatives
Grant, J. Tobin
Pellegrini, Pasquale (Pat) A.

Uploaded 04-22-1998
Keywords clustering
duration models
hazard models
spatial models
Abstract Current theories and methods for studying cosponsorship assume that the decision to cosponsor is identical to the decision to vote. In this paper we develop a new theory of cosponsorship that identifies where along the ideological spectrum cosponsors of a bill are more likely to be. Moreover, we predict that members with organizational ties to the sponsor are more likely to cosponsor than other members. To test this theory, we employ a spatial duration model. This method has recently been used by geographers to estimate areas that are more likely to experience an "event." Using this technique permits a statistical test that supports our substantive hypotheses that cosponsorship coalitions are shaped by the characteristics of the location of the bill, the shared ties to the sponsor, and the policy area. In addition, more active sponsors are associated with wider and less clustered coalitions. These findings demonstrate that theories of the voting decision are not applicable to cosponsorship.

Predicting oil nationalization: Comparing estimates from a Bayesian Hierarchical Model to a Mixture Model
Mahdavi, Paasha

Uploaded 07-17-2013
Keywords Bayesian mixture models
Bayesian longitudinal models
Model-based clustering
Resource curse
Abstract Recent expropriations in the oil sectors of Argentina, Bolivia, and Venezuela have renewed interest in the study of nationalizations in the oil industry. Employing Bayesian estimation methods, this study seeks to answer two substantive questions and two methodological questions. First, what is the probability of oil nationalization in a given country, in a given year? Second, what international and country-level factors influence this probability? Third, are there differences in predictive results when using the Bayesian hierarchical approach vs. the Bayesian mixture modeling approach? Fourth, does the assumption that countries can be clustered into groups change the model estimates? The results indicate a 1.1% probability of nationalization in a given country in a given year with the strongest empirical predictors of nationalization being global oil prices, a country's oil production history, and the diffusion of previous nationalizations. Methodologically, the findings here suggest little if any difference between using the hierarchical model framework compared to the mixture model framework. Overall, adding the assumption of clustering by country only slightly improves predictive accuracy while maintaining similar fixed effects model estimates.

Blossom: An evolutionary strategy optimizer with applications to matching, scaling, networks, and sampling
Beauchamp, Nick

Uploaded 07-24-2013
Keywords maximum likelihood
genetic algorithms
multidimensional scaling
social networks
sampling methods
markov chain monte carlo
evolutionary algorithms
estimation of distribution algorithms
Abstract This paper introduces a new maximization and importance sampling algorithm, "Blossom," along with an associated R script, which is especially well suited to rugged, discontinuous, and multimodal functions where even approximate gradient methods are infeasible, and MCMC approaches work poorly. The Blossom algorithm employs an evolutionary optimization strategy related to the Estimation of Multivariate Normal Algorithm (EMNA) or Covariance Matrix Adaptation (CMA), within the general family of Estimation of Distribution Algorithms (EDA). It works by successive iterations of sampling, selecting the highest-scoring subsample, and using the variance-covariance matrix of that subsample to generate a new sample, with various self-adapting parameters. Compared against a benchmark suite of challenging functions introduced in Yao, Liu, and Lin (1999), it finds maxima equal to or better than those found by the genetic algorithm Genoud introduced in Mebane and Sekhon (2011). The algorithm is then tested in four challenging domains from political science: (1) estimation of nonlinear and multimodal spatial metrics; (2) maximizing balance for matching; (3) ideological scaling of judges with discontinuous objective functions; (4) community detection in social networks. In all of these cases, Blossom outperforms most existing nonlinear optimizers in R. Finally, the samples gathered during the optimization process can be efficiently used for importance sampling using approximate Voronoi cells around sample points, equaling the performance of MCMC Metropolis samplers in some circumstances, and also of use for generating efficient proposal distributions. Even in an increasingly MCMC world, there remain important roles for effective general-purpose optimizers, and Blossom is especially effective for rough terrains where most other methods fail.
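The sample-select-refit loop the abstract describes can be sketched in a few lines. The sketch below is a generic estimation-of-distribution optimizer with a diagonal-covariance Gaussian for simplicity, whereas Blossom adapts a full variance-covariance matrix and additional self-adapting parameters; the test function and settings are illustrative only.

```python
import random
from math import cos, pi
from statistics import mean, stdev

def eda_maximize(f, dim, n_sample=60, n_keep=15, iters=40, seed=1):
    """Minimal estimation-of-distribution optimizer: sample from a
    Gaussian, keep the highest-scoring subsample, refit the Gaussian
    to that elite, and repeat.  (Diagonal covariance only; Blossom
    itself adapts a full variance-covariance matrix.)"""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [2.0] * dim
    best = None
    for _ in range(iters):
        pop = [[rng.gauss(mu[d], sigma[d]) for d in range(dim)]
               for _ in range(n_sample)]
        pop.sort(key=f, reverse=True)       # select the best-scoring points
        elite = pop[:n_keep]
        if best is None or f(elite[0]) > f(best):
            best = elite[0]
        for d in range(dim):                # refit the sampling distribution
            col = [x[d] for x in elite]
            mu[d] = mean(col)
            sigma[d] = max(stdev(col), 1e-6)
    return best

# Rastrigin-style multimodal test function (maximum of 0 at the origin),
# the kind of rugged surface where gradient methods stall.
def neg_rastrigin(x):
    return -sum(xi * xi - 10 * cos(2 * pi * xi) + 10 for xi in x)

x_star = eda_maximize(neg_rastrigin, dim=2)
```

No gradients are needed, which is why this family of methods tolerates the discontinuous objective functions mentioned in the abstract.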

The Impact of Sampling Procedures on Statistical Inference with Clustered Data
Jin, Shuai

Uploaded 07-20-2015
Keywords Sampling
Monte Carlo
Abstract This study explores the performance of multiple methods for handling clustering under different sampling procedures. Many disciplines have proposed various methods to correct the downward bias in the OLS variance estimates with clustering. However, these methods do not take into account the effects of sampling procedures. The sampling procedures affect how clustering in the population enters into the samples, and therefore affect the performance of the methods for analyzing clustered data. This study compares eight methods of estimating variances of linear regression coefficients with clustered data under three different sampling procedures. Monte Carlo simulation results show that sampling procedures affect the variance estimates of both the group-level and the individual-level independent variables. Simple random sampling produces stable and small standard errors. Jackknife cluster standard errors generally perform well. This study analyzes a national Chinese survey dataset in the application section. The results from the real data confirm the conclusions from the Monte Carlo simulations.
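For intuition about the corrections being compared, a cluster-robust ("sandwich", CR0-style) standard error for a one-regressor OLS slope can be computed by summing score contributions within clusters before squaring. This is a generic textbook construction, not the paper's code, and the data are hypothetical.

```python
def cluster_robust_se(x, y, cluster):
    """CR0 cluster-robust standard error for the slope of a simple
    one-regressor OLS fit.  `cluster` gives each observation's group."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Sum the score (x_i - xbar) * e_i within each cluster, then square:
    # correlated errors within a cluster inflate these cluster totals,
    # which is exactly what the naive OLS variance estimate misses.
    scores = {}
    for xi, ei, g in zip(x, resid, cluster):
        scores[g] = scores.get(g, 0.0) + (xi - xbar) * ei
    meat = sum(s * s for s in scores.values())
    return (meat / sxx ** 2) ** 0.5

# With a perfect linear fit the residuals, and hence the SE, are zero.
print(cluster_robust_se([1, 2, 3, 4], [2, 4, 6, 8], [1, 1, 2, 2]))  # prints 0.0
```

How well such estimators perform under different sampling designs is precisely what the paper's Monte Carlo simulations investigate.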
