
1 
Paper

Bayesian Measures of Explained Variance and Pooling in Multilevel (Hierarchical) Models
Gelman, Andrew
Pardoe, Iain

Uploaded 
04-16-2004

Keywords 
adjusted R-squared Bayesian inference hierarchical model multilevel regression partial pooling shrinkage

Abstract 
Explained variance (R^2) is a familiar summary of the fit of a linear
regression and has been generalized in various ways to multilevel
(hierarchical) models. The multilevel models we consider in this paper
are characterized by hierarchical data structures in which individuals
are grouped into units (which themselves might be further grouped into
larger units), and there are variables measured on individuals and each
grouping unit. The models are based on regression relationships at
different levels, with the first level corresponding to the individual
data, and subsequent levels corresponding to between-group regressions
of individual predictor effects on grouping unit variables. We present
an approach to defining R^2 at each level of the multilevel model, rather
than attempting to create a single summary measure of fit. Our method is
based on comparing variances in a single fitted model rather than
comparing to a null model. In simple regression, our measure generalizes
the classical adjusted R^2. We also discuss a related variance comparison
to summarize the degree to which estimates at each level of the model
are pooled together based on the level-specific regression relationship,
rather than estimated separately. This pooling factor is related to the
concept of shrinkage in simple hierarchical models. We illustrate the
methods on a dataset of radon in houses within counties using a series
of models ranging from a simple linear regression model to a multilevel
varying-intercept, varying-slope model.


3 
Paper

Estimating incumbency advantage and its variation, as an example of a before/after study
Gelman, Andrew
Huang, Zaiying

Uploaded 
02-07-2003

Keywords 
Bayesian inference before-after study Congressional elections Gibbs

Abstract 
Incumbency advantage is one of the most studied features in American
legislative elections. In this paper, we construct and implement an
estimate that allows incumbency advantage to vary between individual
incumbents. This model predicts that open-seat elections will be less
variable than those with incumbents running, an observed empirical
pattern that is not explained by previous models. We apply our method
to the U.S. House of Representatives in the twentieth century: our
estimate of the overall pattern of incumbency advantage over time is
similar to previous estimates (although slightly lower), and we also
find a pattern of increasing variation. In addition to the
application to incumbency advantage, our approach represents a new
method, using multilevel modeling, for estimating effects in
before/after studies. 

5 
Paper

State-Level Opinions from National Surveys: Poststratification using Hierarchical Logistic Regression
Park, David K.
Gelman, Andrew
Bafumi, Joseph

Uploaded 
07-12-2002

Keywords 
Bayesian Inference Hierarchical Logit Poststratification Public Opinion States Elections

Abstract 
Previous researchers have pooled national surveys in order to construct
state-level opinions. However, in order to overcome the small n problem
for less populous states, they have aggregated a decade or more of
national surveys to construct their measures. For example, Erikson,
Wright and McIver (1993) pooled 122 national surveys conducted over 13
years to produce state-level partisan and ideology estimates. Brace,
Sims-Butler, Arceneaux, and Johnson (2002) pooled 22 surveys over a
25-year period to produce state-level opinions on a number of specific
issues. We construct a hierarchical logistic regression model for the
mean of a binary response variable conditional on poststratification
cells. This approach combines the modeling approach often used in
small-area estimation with the population information used in
poststratification (see Gelman and Little 1997). We produce state-level
estimates pooling seven national surveys conducted over a nine-day
period. We first apply the method to a set of U.S. pre-election polls,
poststratified by state, region, and the usual demographic
variables, and evaluate the model by comparing it to state-level election
outcomes. We then produce state-level partisan and ideology estimates and
compare them to Erikson, Wright and McIver's estimates.
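
The poststratification step described in this abstract can be sketched as a population-weighted average of cell-level estimates. This is only an illustration under stated assumptions: the function name `poststratify`, the cell tuples, and every number below are hypothetical, and the cell probabilities stand in for the output of a fitted hierarchical logistic regression, which is not shown.

```python
from collections import defaultdict


def poststratify(cells):
    """Combine cell-level estimates into state-level estimates.

    cells: iterable of (state, population_count, estimated_probability),
    one entry per poststratification cell (hypothetical layout).
    """
    weighted = defaultdict(float)
    totals = defaultdict(float)
    for state, count, prob in cells:
        weighted[state] += count * prob  # cell estimate weighted by its population
        totals[state] += count
    return {state: weighted[state] / totals[state] for state in totals}


# Made-up cells: two demographic cells in each of two states.
cells = [
    ("A", 1000, 0.60),
    ("A", 3000, 0.40),
    ("B", 500, 0.70),
    ("B", 500, 0.50),
]
state_estimates = poststratify(cells)
```

In this sketch the population counts would come from census-type information, while the probabilities would come from the model, so sparsely sampled cells can still contribute to a state's estimate.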

6 
Paper

Flexible Prior Specifications for Factor Analytic Models with an Application to the Measurement of American Political Ideology
Quinn, Kevin M.

Uploaded 
04-20-2000

Keywords 
factor analysis intrinsic autoregression hierarchical modeling Bayesian inference political ideology

Abstract 
Factor analytic measurement models are widely used in the social
sciences to measure latent variables and functions thereof. Examples
include the measurement of: political preferences, liberal democracy,
latent determinants of exchange rates, and latent factors in arbitrage
pricing theory models and the corresponding pricing deviations.
Oftentimes, the results of these measurement models are sensitive to
distributional assumptions that are made regarding the latent factors.
In this paper I demonstrate how prior distributions commonly used in
image processing and spatial statistics provide a flexible means to
model dependencies among the latent factor scores that cannot be
easily captured with standard prior distributions that treat the
factor scores as (conditionally) exchangeable. Markov chain Monte
Carlo techniques are used to fit the resulting models. These modeling
techniques are illustrated with a simulated data example and an
analysis of American political attitudes drawn from the 1996 American
National Election Study. 

8 
Paper

Operationalizing and Testing Spatial Theories of Voting
Quinn, Kevin M.
Martin, Andrew D.

Uploaded 
04-15-1998

Keywords 
spatial voting factor analysis multinomial probit multinomial logit Bayesian inference model comparison Bayes factors MCMC Dutch politics Danish politics

Abstract 
Spatial models of voting behavior provide the foundation for a
substantial number of theoretical results. Nonetheless, empirical
work involving the spatial model faces a number of potential
difficulties. First, measures of the latent voter and candidate issue
positions must be obtained. Second, evaluating the fit of competing
statistical models of voter choice is often more complicated than
previously realized. In this paper, we discuss precisely these
issues. We argue that confirmatory factor analysis applied to
mass-level issue preference questions is an attractive means of
measuring voter ideal points. We also show how party issue positions
can be recovered using a variation of this strategy. We go on to
discuss the problems of assessing the fit of competing statistical
models (multinomial logit vs. multinomial probit) and competing
explanations (those based on spatial theory vs. those derived from
other theories of voting such as sociological theories). We
demonstrate how the Bayesian perspective not only provides
computational advantages in the case of fitting the multinomial probit
model, but also how it facilitates both types of comparison mentioned
above. Results from the Netherlands and Denmark suggest that even
when the computational cost of multinomial probit is disregarded, the
decision whether to use multinomial probit (MNP) or multinomial logit
(MNL) is not clear-cut.

9 
Paper

Not Asked and Not Answered: Multiple Imputation for Multiple Surveys
Gelman, Andrew
King, Gary
Liu, Chuanhai

Uploaded 
10-27-1997

Keywords 
Bayesian inference cluster sampling diagnostics hierarchical models ignorable nonresponse missing data political science sample surveys stratified sampling multiple imputation

Abstract 
We present a method of analyzing a series of independent
cross-sectional surveys in which some questions are not answered in
some surveys and some respondents do not answer some of the questions
posed. The method is also applicable to a single survey in which
different questions are asked, or different sampling methods used, in
different strata or clusters. Our method involves multiply imputing
the missing items and questions by adding to existing methods of
imputation designed for single surveys a hierarchical regression model
that allows covariates at the individual and survey levels.
Information from survey weights is exploited by including in the
analysis the variables on which the weights were based, and then
reweighting individual responses (observed and imputed) to estimate
population quantities. We also develop diagnostics for checking the
fit of the imputation model based on comparing imputed to nonimputed
data. We illustrate with the example that motivated this project:
a study of pre-election public opinion polls, in which not all the
questions of interest are asked in all the surveys, so that it is
infeasible to impute each survey separately. 

10 
Paper

Multilevel (hierarchical) modeling: what it can and can't do
Gelman, Andrew

Uploaded 
01-26-2005

Keywords 
Bayesian inference hierarchical model multilevel regression

Abstract 
Multilevel (hierarchical) modeling is a generalization of linear and generalized linear modeling in which regression coefficients are themselves given a model, whose parameters are also estimated from data. We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. The multilevel model is highly effective for predictions at both levels of the model but could easily be misinterpreted for causal inference.

12 
Paper

Designing and Analyzing Randomized Experiments
Horiuchi, Yusaku
Imai, Kosuke
Taniguchi, Naoko

Uploaded 
07-05-2005

Keywords 
Bayesian inference causal inference noncompliance nonresponse randomized block design

Abstract 
In this paper, we demonstrate how to effectively design and analyze randomized experiments, which are becoming increasingly common in political science research. Randomized experiments provide researchers with an opportunity to obtain unbiased estimates of causal effects because the randomization of treatment guarantees that the treatment and control groups are on average equal in both observed and unobserved characteristics. Even in randomized experiments, however, complications can arise. In political science experiments, researchers often cannot force subjects to comply with treatment assignment or to provide the information necessary for the estimation of causal effects. Building on the recent statistical literature, we show how to make statistical adjustments for these noncompliance and nonresponse problems when analyzing randomized experiments. We also demonstrate how to design randomized experiments so that the potential impact of such complications is minimized. 

13 
Paper

Modeling Foreign Direct Investment as a Longitudinal Social Network
Jensen, Nathan
Martin, Andrew
Westveld, Anton

Uploaded 
07-13-2007

Keywords 
foreign direct investment social network data longitudinal data hierarchical modeling mixture modeling Bayesian inference.

Abstract 
An extensive literature in international and comparative political economy has focused on how the mobility of capital affects the ability of governments to tax and regulate firms. The conventional wisdom holds that governments are in competition with each other to attract foreign direct investment (FDI). Nation-states observe the fiscal and regulatory decisions of competitor governments, and are forced to either respond with policy changes or risk losing foreign direct investment, along with the politically salient jobs that come with these investments. The political economy of FDI suggests a network of investments with complicated dependencies.
We propose an empirical strategy for modeling investment patterns in 24 advanced industrialized countries from 1985 to 2000. Using bilateral FDI data we estimate how increases in flows of FDI affect the flows of FDI in other countries. Our statistical model is based on the methodology developed by Westveld & Hoff (2007). The model allows the temporal examination of each nation's activity level in investing, attractiveness to investors, and reciprocity between pairs of nations. We extend the model by treating the reported inflow and outflow data as independent replicates of the true value and allowing for a mixture model for the fixed effects portion of the network model. Using a fully Bayesian approach, we also impute missing data within the MCMC algorithm used to fit the model.

15 
Paper

A default prior distribution for logistic and other regression models
Gelman, Andrew
Jakulin, Aleks
Pittau, Maria Grazia
Su, Yu-Sung

Uploaded 
08-03-2007

Keywords 
Bayesian inference generalized linear model least squares hierarchical model linear regression logistic regression multilevel model noninformative prior distribution

Abstract 
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-$t$ prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. We implement a procedure to fit generalized linear models in R with this prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several examples, including a series of logistic regressions predicting voting preferences, an imputation model for a public health data set, and a hierarchical logistic regression in epidemiology.
We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small) and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.
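
The two steps named in this abstract (rescaling nonbinary inputs to mean 0 and standard deviation 0.5, then placing independent Cauchy priors with center 0 and scale 2.5 on the coefficients) can be illustrated with a small sketch. It is not the authors' procedure: it finds a posterior mode by plain gradient ascent rather than an approximate EM step inside iteratively weighted least squares, it applies the same scale 2.5 to the intercept, and it uses the population standard deviation for rescaling. The function names and the six-point data set are hypothetical. The data are completely separated, the case in which maximum likelihood has no finite solution.

```python
import math
import statistics


def rescale(values):
    """Shift and scale a nonbinary input to mean 0 and standard deviation 0.5."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population sd
    return [0.5 * (v - mean) / sd for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_map(x, y, scale=2.5, step=0.1, iterations=5000):
    """Posterior mode of intercept a and slope b under Cauchy(0, scale) priors."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the Bernoulli log-likelihood.
        grad_a = sum(yi - sigmoid(a + b * xi) for xi, yi in zip(x, y))
        grad_b = sum((yi - sigmoid(a + b * xi)) * xi for xi, yi in zip(x, y))
        # Gradient of the log Cauchy density: -2 * beta / (scale^2 + beta^2).
        grad_a -= 2.0 * a / (scale**2 + a**2)
        grad_b -= 2.0 * b / (scale**2 + b**2)
        a += step * grad_a
        b += step * grad_b
    return a, b


# Made-up, completely separated data: y is 0 below 3.5 and 1 above it.
raw_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
x = rescale(raw_x)
intercept, slope = fit_map(x, y)
```

Without the prior term, the slope in this example would grow without bound as the iterations continue; the prior's pull toward zero is what keeps the estimate finite.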

16 
Paper

Why we (usually) don't have to worry about multiple comparisons
Gelman, Andrew
Hill, Jennifer
Yajima, Masanao

Uploaded 
06-01-2008

Keywords 
Bayesian inference hierarchical modeling multiple comparisons type S error statistical significance

Abstract 
The problem of multiple comparisons can disappear when viewed from a Bayesian perspective. We propose building multilevel models in the settings where multiple comparisons arise. These address the multiple comparisons problem and also yield more efficient estimates, especially in settings with low group-level variation, which is where multiple comparisons are a particular concern.
Multilevel models perform partial pooling (shifting estimates toward each other), whereas classical procedures typically keep the centers of intervals stationary, adjusting for multiple comparisons by making the intervals wider (or, equivalently, adjusting the p-values corresponding to intervals of fixed width). Multilevel estimates make comparisons more conservative, in the sense that intervals for comparisons are more likely to include zero; as a result, those comparisons that are made with confidence are more likely to be valid.
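
The contrast drawn here between widening intervals and shifting estimates can be sketched numerically. The example below is a simplified illustration and not the authors' analysis: the classical side uses a Bonferroni adjustment as one common choice, and the multilevel side uses the textbook normal-normal formula with the group-level mean `mu` and standard deviation `tau` treated as known, whereas a real multilevel model would estimate them from the data. All names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def bonferroni_intervals(estimates, std_errors, alpha=0.05):
    """Classical adjustment: centers stay put, intervals widen with the number of comparisons."""
    z = NormalDist().inv_cdf(1.0 - alpha / (2 * len(estimates)))
    return [(y - z * se, y + z * se) for y, se in zip(estimates, std_errors)]


def partial_pool(estimates, std_errors, mu, tau):
    """Normal-normal posterior means and sds, with mu and tau assumed known."""
    pooled = []
    for y, se in zip(estimates, std_errors):
        precision = 1.0 / se**2 + 1.0 / tau**2
        mean = (y / se**2 + mu / tau**2) / precision  # precision-weighted average
        pooled.append((mean, math.sqrt(1.0 / precision)))
    return pooled


# Made-up group estimates with equal standard errors.
estimates = [-2.0, 0.5, 3.0]
std_errors = [1.0, 1.0, 1.0]
classical = bonferroni_intervals(estimates, std_errors)
pooled = partial_pool(estimates, std_errors, mu=0.5, tau=1.0)
```

In this sketch the classical intervals remain centered on the raw estimates, while each pooled mean lies between its raw estimate and `mu`. The smaller `tau` is relative to the standard errors, the stronger that pull, which matches the abstract's point about settings with low group-level variation.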

17 
Paper

Nonparametric Priors For Ordinal Bayesian Social Science Models: Specification and Estimation
Gill, Jeff
Casella, George

Uploaded 
08-21-2008

Keywords 
generalized linear mixed model ordered probit Bayesian approaches nonparametric priors Dirichlet process mixture models nonparametric Bayesian inference

Abstract 
A generalized linear mixed model, ordered probit, is used to estimate levels of stress in presidential political appointees as a means of understanding their surprisingly short tenures. A Bayesian approach is developed, where the random effects are modeled with a Dirichlet process mixture prior, allowing for useful incorporation of prior information, but retaining some vagueness in the form of the prior. Applications of Bayesian models in the social sciences are typically done with "noninformative" priors, although some use of informed versions exists. There has been disagreement over this, and our approach may be a step in the direction of satisfying both camps. We give a detailed description of the data, show how to implement the model, and describe some interesting conclusions. The model utilizing a nonparametric prior fits better and reveals more information in the data than standard approaches.

