
Search Results


The results below are based on the search criterion 'Regression'
Total number of records returned: 47

1
Paper
Bootstrap Methods for Non-nested Hypothesis Tests
Mebane, Walter R.
Sekhon, Jasjeet

Uploaded 07-20-1996
Keywords Cox Test
Bootstrap
LISREL
Endogenous Switching Regression
Tobit-Style Censoring
Abstract Cox (1961; 1962) proposed a fairly general method that can be used to construct powerful tests of alternative hypotheses from separate statistical families. We prove that non-parametric bootstrap methods can produce consistent and second-order correct approximations to the distribution of the Cox statistic for non-nested LISREL-style covariance structure models. We use the method to investigate a question about the specification of a LISREL model used by Kinder, Adams and Gronke (1989). In a second application---a pair of non-nested endogenous switching regression models with tobit-style censoring, applied to real data---we illustrate how bootstrap calibration can be used to correct the size of the test when the test distribution is being estimated by Monte Carlo simulation due to concern about nonregularity.
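The nonparametric bootstrap that underlies this approach can be sketched in a few lines. This is an illustrative Python sketch of the generic resampling device, not the authors' implementation; the function name, defaults, and example data are ours.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, reps=1000, seed=0):
    """Approximate the sampling distribution of `statistic` by
    resampling the data with replacement and recomputing it."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(reps):
        resample = [rng.choice(data) for _ in range(n)]
        draws.append(statistic(resample))
    return draws

# Example: bootstrap distribution of the sample mean.
dist = bootstrap_distribution([2.1, 3.4, 2.8, 3.0, 2.5], statistics.mean, reps=200)
```

In the paper's setting, `statistic` would be the Cox test statistic computed on each resample rather than a simple mean.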

2
Paper
When Mayors Matter: Estimating the Impact of Mayoral Partisanship on City Policy
Gerber, Elisabeth
Hopkins, Daniel

Uploaded 09-18-2009
Keywords Regression discontinuity design
partisanship
urban fiscal policy
Abstract U.S. cities are limited in their ability to set policy. Can these constraints mute the impact of mayors' partisanship on policy outcomes? We hypothesize that mayoral discretion--and thus partisanship's influence--will be more pronounced in policy areas where there is less shared authority between local, state, and federal governments. To test this hypothesis, we create a novel data set combining U.S. mayoral election returns from 1990 to 2006 with urban fiscal data. Using a regression discontinuity design, we find that cities that elect a Democratic mayor spend less on public safety, a policy area where local discretion is high, than otherwise similar cities that elect a Republican or Independent. We find no differences on tax policy, social policy, and other areas that are characterized by significant overlapping authority. These results have important implications for political accountability: mayors may not be able to influence the full range of policies that are nominally local responsibilities.

3
Paper
The Exposure Theory of Access: Why Some Firms Seek More Access to Incumbents Than Others
Fouirnaies, Alexander
Hall, Andrew B.

Uploaded 04-07-2014
Keywords interest groups
regulation
campaign finance
incumbency advantage
regression discontinuity design
rdd
text analysis
pca
Abstract Access-oriented interest groups are responsible for a large part of incumbents' financial advantage in U.S. legislative campaigns, but there is considerable variation among access-oriented groups in their contribution behavior. Why do some interest groups seek more access to incumbents than others? In this paper we demonstrate that firms with higher levels of exposure to regulation seek more access to incumbents in both U.S. state legislatures and the U.S. House. To do so, we construct a measure of firm-level exposure to regulation using the text of SEC filings, and we employ an electoral regression discontinuity design to estimate how firms' sensitivity to incumbency varies with exposure. The results indicate that the most regulated firms are more than twice as sensitive to incumbency as the least regulated firms, on average. The findings point to one potential value of political access: the chance to influence regulatory policy.

4
Paper
Who Needs Ecological Regression? Measuring the Constitutionality of Majority-Minority Districts
Epstein, David
O'Halloran, Sharyn

Uploaded 04-09-1997
Keywords voting rights
districting
ecological regression
Abstract According to the Supreme Court's interpretation of the 1965 Voting Rights Act, minority voters must have an equal opportunity to elect the representative of their choice. Yet the key term "candidate of choice" has never been fully defined; the elections analyzed in voting rights cases are usually for different offices than the ones being challenged; and, worst of all, current methods for determining the point of equal opportunity rely heavily on ecological regression. In fact, the ecological fallacy is especially pernicious in voting rights cases, for it can be triggered by exactly the phenomena that the VRA sought to promote. This paper offers an alternative measure of equal opportunity that circumvents ecological regression in favor of probit or logit analyses of electoral results. It also provides a way to compare the likely substantive policy impacts of competing districting schemes. These techniques are then applied to the analysis of elections to the South Carolina State Senate.

5
Paper
Language Access and Initiative Outcomes: Did the Voting Rights Act Influence Support for Bilingual Education?

Uploaded 12-17-2009
Keywords regression discontinuity design
multilevel modeling
immigrant political incorporation
language access
elections
Voting Rights Act
Abstract This paper investigates one tool designed to enfranchise immigrants: foreign-language election materials. Specifically, it estimates the impact of Spanish-language assistance provided under Section 203 of the Voting Rights Act. Focusing on a California initiative on bilingual education, it tests how Spanish-language materials influenced turnout and election outcomes in Latino neighborhoods. It also considers the possibility of an anti-Spanish backlash in non-Hispanic white neighborhoods. Empirically, the analysis couples a regression discontinuity design with multilevel modeling to isolate the impact of Section 203. The analysis finds that Spanish-language assistance increased turnout and reduced support for ending bilingual education in Latino neighborhoods with many Spanish speakers. It finds hints of backlash among non-Hispanic white precincts, but not with the same certainty. The turnout finding gains additional support from multilevel regression discontinuity analyses of 2004 Latino voter turnout nationwide. For Latino citizens who speak little English, the availability of Spanish ballots increases turnout and influences election outcomes as well.

6
Paper
Getting the Mean Right is a Good Thing: Generalized Additive Models
Beck, Nathaniel
Jackman, Simon

Uploaded 01-30-1997
Keywords non-parametric regression
scatterplot smoothing
local fitting
splines
non-linearity
Perot
incumbency
cabinet duration
democratic peace
Abstract This is a substantial revision of the paper submitted as beck96. A shorter version of this paper is under consideration at a political science journal of note. Theory: Social scientists almost always use statistical models positing the dependent variable as a linear function of X, despite suspicions that the social and political world is not so parsimonious. Generalized additive models (GAMs) permit each independent variable to be modelled non-parametrically while requiring that the independent variables combine additively, striking a sensible balance between the flexibility of non-parametric techniques and the ease of interpretation and familiarity of linear regression. GAMs thus offer social scientists a practical methodology for improving on the extant practice of "linearity by default". Method: We present the statistical concepts and tools underlying GAMs (e.g., scatterplot smoothing, non-parametrics more generally, and accompanying graphical methods), and summarize issues pertaining to estimation, inference, and the statistical properties of GAMs. Monte Carlo experiments assess the validity of tests of linearity accompanying GAMs. Re-analysis of published work in American politics, comparative politics, and international relations demonstrates the usefulness of GAMs in social science settings. Results: Our re-analyses of published work show that GAMs can extract substantive mileage beyond that yielded by linear regression, offering novel insights, particularly in terms of modelling interactions. The Monte Carlo experiments show there is little danger of GAMs spuriously finding non-linear structures. All data analysis, Monte Carlo experiments, and statistical graphs were generated using S-PLUS, Version 3.3. The routines and data are available at ftp://weber.uscd.edu/pub/nbeck/gam.

7
Paper
Multilevel (hierarchical) modeling: what it can and can't do
Gelman, Andrew

Uploaded 01-26-2005
Keywords Bayesian inference
hierarchical model
multilevel regression
Abstract Multilevel (hierarchical) modeling is a generalization of linear and generalized linear modeling in which regression coefficients are themselves given a model, whose parameters are also estimated from data. We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. The multilevel model is highly effective for predictions at both levels of the model but could easily be misinterpreted for causal inference.
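The partial pooling that makes multilevel models effective for prediction can be illustrated with the textbook precision-weighted formula. This is a sketch of the standard shrinkage calculation; the symbols and numeric values are ours, not taken from the paper.

```python
def partial_pool(group_mean, n_j, sigma2_y, grand_mean, sigma2_alpha):
    """Multilevel estimate for group j: a precision-weighted average of
    the group's own mean and the grand mean. Small groups are shrunk
    toward the grand mean; large groups largely keep their own mean."""
    w_group = n_j / sigma2_y        # precision of the group's sample mean
    w_prior = 1.0 / sigma2_alpha    # precision of the group-level model
    return (w_group * group_mean + w_prior * grand_mean) / (w_group + w_prior)

# A county with few radon measurements is shrunk heavily toward the grand mean...
small = partial_pool(group_mean=2.0, n_j=2, sigma2_y=1.0, grand_mean=1.0, sigma2_alpha=0.25)
# ...while a county with many measurements is barely shrunk at all.
large = partial_pool(group_mean=2.0, n_j=200, sigma2_y=1.0, grand_mean=1.0, sigma2_alpha=0.25)
```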

8
Paper
Data Mining for Theorists
Kenkel, Brenton
Signorino, Curtis

Uploaded 07-26-2011
Keywords empirical implications of theoretical models
basis regression
adaptive lasso
bootstrap
functional form misspecification
Abstract Among those interested in statistically testing formal models, two approaches dominate. The structural estimation approach derives a structural probability model based on the formal model and then estimates parameters associated with that model. The reduced-form approach generally applies off-the-shelf techniques---such as OLS, logit, or probit---to test whether the independent variables are related to a decision variable according to the comparative statics predictions. We provide a new statistical method for the comparative statics approach. The decision variable of interest is modeled as a polynomial function of the available covariates, which allows for the nonmonotonic and interactive relationships commonly found in strategic choice data. We use the adaptive lasso to reduce the number of parameters and prevent overfitting, and we obtain measures of uncertainty via the nonparametric bootstrap. The method is "data mining" because the aim is to discover complex relationships in data without imposing a particular structure, but "for theorists" in that it was developed specifically to deal with the peculiar features of data on strategic choice. Using a Monte Carlo simulation, we show that the method handily outperforms other non-structural techniques in estimating a nonmonotonic relationship from strategic choice data.

9
Paper
Bayesian Measures of Explained Variance and Pooling in Multilevel (Hierarchical) Models
Gelman, Andrew
Pardoe, Iain

Uploaded 04-16-2004
Keywords adjusted R-squared
Bayesian inference
hierarchical model
multilevel regression
partial pooling
shrinkage
Abstract Explained variance (R2) is a familiar summary of the fit of a linear regression and has been generalized in various ways to multilevel (hierarchical) models. The multilevel models we consider in this paper are characterized by hierarchical data structures in which individuals are grouped into units (which themselves might be further grouped into larger units), and there are variables measured on individuals and each grouping unit. The models are based on regression relationships at different levels, with the first level corresponding to the individual data, and subsequent levels corresponding to between-group regressions of individual predictor effects on grouping unit variables. We present an approach to defining R2 at each level of the multilevel model, rather than attempting to create a single summary measure of fit. Our method is based on comparing variances in a single fitted model rather than comparing to a null model. In simple regression, our measure generalizes the classical adjusted R2. We also discuss a related variance comparison to summarize the degree to which estimates at each level of the model are pooled together based on the level-specific regression relationship, rather than estimated separately. This pooling factor is related to the concept of shrinkage in simple hierarchical models. We illustrate the methods on a dataset of radon in houses within counties using a series of models ranging from a simple linear regression model to a multilevel varying-intercept, varying-slope model.

10
Paper
Strange Bedfellows or the Usual Suspects? Spatial Models of Ideology and Interest Group Coalitions
Almeida, Richard

Uploaded 04-01-2005
Keywords Interest groups
coalitions
spatial theory
poisson regression
ideology
Abstract Entering into coalitions has become a standard tactic for interest groups trying to maximize success while minimizing cost. The strategic conditions underlying decisions to form or join coalitions are beginning to be explored in the political science literature, yet very little is known about the process and criteria through which interest groups select coalition partners. In this paper, I explore the partner selection process by applying spatial theories of ideology and coalition formation to interest group participation on amicus curiae briefs. Previous work demonstrates that the lobbying efforts of groups can be used to generate a general measure of ideology for any group. These captured ideology scores are used in statistical models of interest group coalition partner selection on amicus curiae briefs from 1954-1985. This research demonstrates that the ideology scores captured for each group are powerful predictors of interest group coalition partner selection, even when controls for resources, group type, and other potential predictors are included.

11
Paper
Practical Issues in Implementing and Understanding Bayesian Ideal Point Estimation
Bafumi, Joseph
Gelman, Andrew
Park, David K.
Kaplan, Noah

Uploaded 06-11-2004
Keywords Ideal points
Bayesian
Logistic regression
Rasch model
Abstract In recent years, logistic regression (Rasch) models have been used in political science for estimating ideal points of legislators and Supreme Court justices. These models present estimation and identifiability challenges, such as improper variance estimates, scale and translation invariance, reflection invariance, and issues with outliers. We resolve these issues using Bayesian hierarchical modeling, linear transformations, informative regression predictors, and explicit modeling for outliers. In addition, we explore new ways to usefully display inferences and check model fit.

12
Paper
Diagnostics for multivariate imputation
Abayomi, Kobi
Gelman, Andrew
Levy, Marc

Uploaded 08-16-2005
Keywords missing data
multiple imputation
regression diagnostics
Abstract We consider three sorts of diagnostics for random imputations: (a) displays of the completed data, intended to reveal unusual patterns that might suggest problems with the imputations, (b) comparisons of the distributions of observed and imputed data values, and (c) checks of the fit of observed data to the model used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation [Van Buuren and Oudshoom 2000, and Raghunathan, Van Hoewyk, and Solenberger 2001], an iterative procedure in which the missing values of each variable are randomly imputed conditional on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 Environmental Sustainability Index (ESI), a linear aggregation of 68 environmental variables on 142 countries, with 22% missing values.

13
Paper
Kernel Regularized Least Squares: Moving Beyond Linearity and Additivity Without Sacrificing Interpretability
Hainmueller, Jens
Hazlett, Chad

Uploaded 04-25-2012
Keywords regression
classification
prediction
Abstract We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best fitting surface in this space by minimizing a complexity-penalized least squares problem. We provide an accessible explanation of the method and argue that it is well suited for social science inquiry because it avoids strong parametric assumptions and still allows for simple interpretation in ways analogous to OLS or other members of the GLM family. We also extend the method in several directions to make it more effective for social inquiry. In particular, we (1) derive new estimators for the pointwise marginal effects and their variances, (2) establish unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) develop an automated approach to choose smoothing parameters, and (4) provide companion software. We illustrate the use of the methods through several simulations and a real-data example.
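The core fitting step the abstract describes--a kernel matrix plus a ridge penalty--can be sketched in pure Python. This is an illustrative toy, not the authors' companion software; the Gaussian kernel bandwidth and penalty values are arbitrary choices of ours.

```python
import math

def gauss_kernel(a, b, sigma2=1.0):
    """Gaussian radial basis kernel between two feature vectors."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / sigma2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][-1] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def krls_fit(X, y, lam=0.1, sigma2=1.0):
    """Find coefficients c solving (K + lam*I) c = y, where K is the
    kernel matrix over the training points."""
    n = len(X)
    K = [[gauss_kernel(X[i], X[j], sigma2) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve(K, y)

def krls_predict(X, c, x_new, sigma2=1.0):
    """The fitted surface: a kernel-weighted sum over training points."""
    return sum(ci * gauss_kernel(xi, x_new, sigma2) for ci, xi in zip(c, X))

# Fit a clearly nonlinear target without specifying its functional form.
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [x[0] ** 2 for x in X]
c = krls_fit(X, y, lam=1e-8)
```

With a tiny penalty the fitted surface nearly interpolates the training data; larger penalties trade fit for smoothness.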

14
Paper
Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data, with an Application to the Elian Effect in Florida
Mebane, Walter R.
Sekhon, Jasjeet

Uploaded 07-12-2002
Keywords robust estimation
overdispersed multinomial regression
Abstract We develop a robust estimation method for regression models for vectors of counts (overdispersed multinomial models). The method requires only that the model is good for most---not all---of the observed data, and it identifies outliers. A Monte Carlo sampling experiment shows that the robust method can produce consistent parameter estimates and correct statistical inferences even when ten percent of the data are generated by a significantly different process, where nonrobust maximum likelihood estimation fails. We analyze Florida county vote data from the 2000 presidential election, considering votes for five categories of presidential candidates (Buchanan, Nader, Gore, Bush and "other"), focusing on Cuban-Americans' reactions to the Elian Gonzalez affair. We replicate results regarding Buchanan's vote in Palm Beach County. We use Census tract data within Miami-Dade County to confirm the need to take the Cuban-American population explicitly into account. The analysis illustrates how the robust method can support triangulation to verify whether a regression specification is adequate.

15
Paper
Scaling regression inputs by dividing by two standard deviations
Gelman, Andrew

Uploaded 06-10-2006
Keywords regression
standardization
z-score
Abstract Interpretation of regression coefficients is sensitive to the scale of the inputs. One method often used to place input variables on a common scale is to divide each variable by its standard deviation. Here we propose dividing each variable by *two* standard deviations, so that the generic comparison is with inputs equal to the mean ±1 standard deviation. The resulting coefficients are then directly comparable for untransformed binary predictors. We have implemented the procedure as a function in R. We illustrate the method with a simple public-opinion analysis that is typical of regressions in social science.
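The paper provides an R function, but the rescaling itself is one line per variable. Here is a Python sketch, under the paper's convention that binary inputs are left untransformed; the function name is ours.

```python
import statistics

def scale_by_two_sd(values):
    """Center a numeric input and divide by two standard deviations, so
    its coefficient is directly comparable to that of an untransformed
    binary predictor."""
    vals = list(values)
    if set(vals) <= {0, 1}:        # binary inputs are left as-is
        return vals
    mean = statistics.mean(vals)
    sd = statistics.stdev(vals)    # sample standard deviation
    return [(v - mean) / (2 * sd) for v in vals]
```

After rescaling, a continuous input has standard deviation 0.5, so a one-unit change spans the mean ±1 standard deviation, matching the spread of a roughly balanced 0/1 predictor.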

16
Paper
We Have to Be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data
Cranmer, Skyler
Gill, Jeff

Uploaded 04-30-2012
Keywords missing data
categorical
hot-decking
MCAR
multiple imputation
MAR
GLM
regression
missingness
Abstract Missing values are a frequent problem in empirical political science research. Surprisingly, there has been little attention to the match between the measurement of the missing values and the correcting algorithms used. While multiple imputation is a vast improvement over the deletion of cases with missing values, it is often ill-suited for imputing highly non-granular discrete data. We develop a simple technique for imputing missing values in such situations, which is a variant of hot deck imputation, drawing from the conditional distribution of the variable with missing values to preserve the discrete measure of the variable. This method is tested against existing techniques using Monte Carlo analysis and then applied to real data on democratisation and modernisation theory. We provide software for our imputation technique in a free and easy-to-use package for the R statistical environment.
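The variant of hot-deck imputation described above--drawing a replacement from the observed conditional distribution so the variable stays on its discrete scale--can be sketched as follows. This is a minimal illustration with a single conditioning variable; the authors' package is for R, and the record layout here is hypothetical.

```python
import random

def conditional_hot_deck(records, target, cond, seed=0):
    """Replace missing (None) values of `target` with a random draw
    from observed values among records sharing the same value of the
    conditioning variable `cond`, preserving the discrete measure."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(r[cond], []).append(r[target])
    completed = []
    for r in records:
        r = dict(r)
        if r[target] is None:
            r[target] = rng.choice(donors[r[cond]])
        completed.append(r)
    return completed

data = [
    {"region": "a", "vote": 1}, {"region": "a", "vote": 1},
    {"region": "b", "vote": 0}, {"region": "a", "vote": None},
]
filled = conditional_hot_deck(data, target="vote", cond="region")
```

Because each draw is an actually observed value from the same covariate cell, imputations can never fall between categories the way a continuous imputation model's would.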

17
Paper
Individual Choice and Ecological Analysis
McCue, Kenneth F.

Uploaded 12-02-2001
Keywords ecological regression
voter transitions
multivariate multinomial
split-ticket voting
aggregation bias
linear probability model
Abstract The use of the linear probability model in aggregate voting analysis has now received widespread attention in political science. This article shows that when the linear probability model is assumed to be consistent for the choice of the individual, it is actually a member of a general class of models for estimating individual responses from aggregate data. This class has the useful property that it defines the aggregate analysis problem as a function of the individual choice decisions, and allows the placement of most aggregate voting models into a common probabilistic framework. This framework allows the solution of such problems as inference of individual responses from aggregate data, estimation of the transition model, and the joint estimation and inference from individual and aggregate data. Examples with actual data are provided for these techniques with excellent results.

18
Paper
Using Graphs Instead of Tables to Improve the Presentation of Empirical Results in Political Science
Kastellec, Jonathan
Leoni, Eduardo

Uploaded 11-15-2006
Keywords statistical graphics
tables
presentation
descriptive statistics
regression results
Abstract When political scientists present empirical results, they are much more likely to use tables rather than graphs, despite the fact that the latter greatly increase the clarity of presentation and make it easier for a reader or listener to draw clear and correct inferences. Using a sample of leading journals, we document this tendency and suggest reasons why researchers prefer tables. We argue that the extra work required in producing graphs is rewarded by greatly enhanced presentation and communication of empirical results. We illustrate their benefits by turning several published tables into graphs, including tables that present descriptive data and regression results. We show that regression graphs properly emphasize point estimates and confidence intervals rather than null hypothesis significance testing, and that they can successfully present the results of multiple regression models. A move away from tables and towards graphs would increase the quality of the discipline's communicative output and make empirical findings more accessible to every type of audience.

19
Paper
Enhancing a Geographic Regression Discontinuity Design Through Matching to Estimate the Effect of Ballot Initiatives on Voter Turnout
Keele, Luke
Titiunik, Rocio
Zubizarreta, Jose

Uploaded 07-13-2012
Keywords matching
causal inference
geography
regression discontinuity
Abstract Of late there has been a renewed interest in natural experiments as a method for drawing causal inferences from observational data. One form of natural experiment exploits variation in geography where units in one geographic area receive a treatment while units in another area do not. In this kind of geographic natural experiment, the hope is that assignment to treatment via geographic location creates as-if random variation in treatment assignment. When this happens, adjustment for baseline covariates is unnecessary. In many applications, however, some adjustment for baseline covariates may be necessary due to strategic sorting around the border between treatment and control areas. As such, analysts may wish to combine identification strategies--using both spatial proximity and covariates--for more plausible inferences. Here we explore how to utilize spatial proximity as well as covariates in the analysis of geographic natural experiments. We contend that standard statistical tools are ill-equipped to exploit covariates as well as variation in treatment assignment that is a function of spatial proximity. We use a mixed integer programming matching algorithm to flexibly incorporate information about both the discontinuity and observed covariates, which allows us to minimize spatial distance while preserving balance on observed covariates. We argue that combining information about both the covariates and the discontinuity creates a method of estimation that can be informally thought of as doubly robust. We demonstrate the method with data on ballot initiatives and turnout in Milwaukee, WI.

20
Paper
Logistic Regression in Rare Events Data (revised)
King, Gary
Zeng, Langche

Uploaded 07-09-1999
Keywords rare events
logit
logistic regression
binary dependent variables
bias correction
case-control
choice-based
endogenous selection
selection bias
Abstract This paper is for the methods conference; it is a revised version of a paper that was previously sent to the paper server.

21
Paper
Incumbency as a Source of Contamination in Mixed Electoral Systems
Hainmueller, Jens
Kern, Holger Lutz

Uploaded 03-10-2006
Keywords contamination
mixed electoral systems
causal inference
regression-discontinuity design
treatment effects
incumbency
Abstract In this paper we demonstrate empirically that incumbency is a source of contamination in Germany's mixed electoral system. Using a quasi-experimental research design that allows for causal inference under a weaker set of assumptions than the regression models commonly used in the electoral systems literature, we find that incumbency causes a gain of 1.4 to 1.7 percentage points in PR vote shares. We also present simulations of Bundestag seat distributions to demonstrate that contamination effects caused by incumbency are sufficiently large to trigger significant shifts in parliamentary majorities.

22
Paper
Using Regression Discontinuity to Uncover the Personal Incumbency Advantage
Erikson, Robert S.
Titiunik, Rocio

Uploaded 07-17-2012
Keywords regression discontinuity
incumbency advantage
Abstract We study the conditions under which estimating the incumbency advantage using a regression discontinuity (RD) design recovers the personal incumbency advantage in a two-party system. Lee (2008) introduced RD as a method for estimating the party incumbency advantage. We develop a simple model that expands the interpretation of the RD design and leads to unbiased estimates of the personal incumbency advantage. Our model yields the surprising result that the RD design double counts the personal incumbency advantage. We estimate the incumbency advantage using our model with data from U.S. House elections between 1968 and 2008. We also explore the estimation of the incumbency advantage beyond the limited RD conditions where knife-edge electoral shifts create the leverage for causal inference.

23
Paper
Logistic Regression in Rare Events Data
King, Gary
Zeng, Langche

Uploaded 05-20-1999
Keywords rare events
logit
logistic regression
binary dependent variables
bias correction
case-control
choice-based
endogenous selection
selection bias
Abstract Rare events are binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, rare events have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter million dyads, only a few of which are at war. As it turns out, easy procedures exist for making valid inferences when sampling all available events (e.g., wars) and a tiny fraction of non-events (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
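One widely cited correction in this line of work--adjusting a logit intercept estimated on an event-enriched (case-control) sample back to the known population event fraction--is a single formula. The sketch below states that prior correction; the variable names are ours, and slope coefficients need no adjustment under this sampling scheme.

```python
import math

def prior_corrected_intercept(beta0_hat, ybar, tau):
    """Correct a logit intercept estimated from a sample with event
    fraction `ybar` back to a population with event fraction `tau`:
    beta0 - ln[((1 - tau)/tau) * (ybar/(1 - ybar))]."""
    return beta0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Sampling all wars plus an equal number of peace dyads (ybar = 0.5)
# from a population where wars are 1 in 100 (tau = 0.01):
corrected = prior_corrected_intercept(beta0_hat=0.0, ybar=0.5, tau=0.01)
```

When the sample event fraction already equals the population fraction, the correction term is zero and the intercept is unchanged.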

24
Paper
Splitting a predictor at the upper quarter or third and the lower quarter or third
Gelman, Andrew
Park, David

Uploaded 07-06-2007
Keywords discretization
linear regression
statistical communication
trichotomizing
Abstract A linear regression of $y$ on $x$ can be approximated by a simple difference: the average values of $y$ corresponding to the highest quarter or third of $x$, minus the average values of $y$ corresponding to the lowest quarter or third of $x$. A simple theoretical analysis shows this comparison performs reasonably well, with 80-90% efficiency compared to the linear regression if the predictor is uniformly or normally distributed. Discretizing $x$ into three categories claws back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor. We illustrate with the example that motivated this research: an analysis of income and voting which we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
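The comparison the abstract describes is simple enough to state directly as code. This is an illustrative sketch; the exact cut at thirds and the handling of ties are our choices, not the paper's.

```python
def thirds_comparison(x, y):
    """Mean of y over the upper third of x minus the mean of y over
    the lower third -- the simple difference that approximates a
    linear regression of y on x."""
    pairs = sorted(zip(x, y))           # sort observations by x
    k = max(1, len(pairs) // 3)         # size of each extreme third
    lower = [yy for _, yy in pairs[:k]]
    upper = [yy for _, yy in pairs[-k:]]
    return sum(upper) / k - sum(lower) / k

# For y = 2x on x = 0..8, the upper third of x averages 7 and the
# lower third averages 1, so the comparison should recover 2*(7-1) = 12.
diff = thirds_comparison(list(range(9)), [2 * i for i in range(9)])
```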

25
Paper
An Alternative Solution to the Heckman Selection Problem: Selection Bias as Functional Form Misspecification
Kenkel, Brenton
Signorino, Curtis

Uploaded 07-18-2012
Keywords selection models
functional form misspecification
nonparametric models
polynomial regression
Abstract The "selection problem" is typically seen as a form of omitted variable bias. We recast the problem as one of functional form misspecification and examine two situations in which flexible or nonparametric estimation techniques may be used as a complement or alternative to traditional selection models. First, we show that such techniques can allow a researcher to recover the conditional relationship between covariates and the expected outcome, even if data on the probability of selection into the subsample is unavailable. We demonstrate the validity of this approach analytically and using Monte Carlo simulations. Second, we show that flexible methods can be used to validate or improve a linear selection model specification when a researcher does possess the prior-stage data. We illustrate this process with an application to data from Mroz (1987) on women's wages.

26
Paper
Inference from Response-Based Samples with Limited Auxiliary Information
King, Gary
Zeng, Langche

Uploaded 07-09-1999
Keywords rare events
logit
logistic regression
binary dependent variables
bias correction
case-control
choice-based
endogenous selection
selection bias
epidemiology
Abstract This paper is for the methods conference; it is related to "Logistic Regression in Rare Events Data," also by us; the conference presentation will be based on both papers. We address a disagreement between epidemiologists and econometricians about inference in response-based (a.k.a. case-control, choice-based, retrospective, etc.) samples. Epidemiologists typically make the rare event assumption (that the probability of disease is arbitrarily small), which makes the relative risk easy to estimate via the odds ratio. Econometricians do not like this assumption since it is false and implies that attributable risk (a.k.a. a first difference) is zero, and they have developed methods that require no auxiliary information. These methods produce bounds on the quantities of interest that, unfortunately, are often fairly wide and always encompass a conclusion of no treatment effect (relative risks of 1 or attributable risks of 0) no matter how strong the true effect is. We simplify the existing bounds for attributable risk, making it much easier to estimate, and then suggest one possible resolution of the disagreement by providing a method that allows researchers to include easily available information (such as that the fraction of the population with the disease falls within at most [.001,.05]); this method considerably narrows the bounds on the quantities of interest. We also offer software to implement the methods suggested. We would very much appreciate any comments you might have!

27
Paper
A default prior distribution for logistic and other regression models
Gelman, Andrew
Jakulin, Aleks
Pittau, Maria Grazia
Su, Yu-Sung

Uploaded 08-03-2007
Keywords Bayesian inference
generalized linear model
least squares
hierarchical model
linear regression
logistic regression
multilevel model
noninformative prior distribution
Abstract We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. We implement a procedure to fit generalized linear models in R with this prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several examples, including a series of logistic regressions predicting voting preferences, an imputation model for a public health data set, and a hierarchical logistic regression in epidemiology. We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small) and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.
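The scaling-plus-prior recipe in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' R implementation; the function names are invented, and the constants (standard deviation 0.5, Cauchy scale 2.5) are the defaults the abstract recommends.

```python
import math

def rescale(x):
    """Center a nonbinary predictor at 0 and scale it to sd 0.5,
    as recommended before assigning the default prior."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / (2 * sd) for v in x]

def cauchy_logpdf(beta, center=0.0, scale=2.5):
    """Log-density of the recommended default Cauchy(0, 2.5) prior
    on a coefficient of a rescaled predictor."""
    z = (beta - center) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))
```

Rescaling continuous predictors to sd 0.5 puts them on roughly the same footing as binary ones, which is what lets a single default scale of 2.5 serve as a weakly informative prior for every coefficient.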

28
Paper
On the Validity of the Regression Discontinuity Design for Estimating Electoral Effects: New Evidence from Over 40,000 Close Races
Eggers, Andrew
Folke, Olle
Fowler, Anthony
Hainmueller, Jens
Hall, Andrew B.

Uploaded 05-15-2013
Keywords regression discontinuity
elections
Abstract Many papers use regression discontinuity (RD) designs that exploit "close" election outcomes in order to identify the effects of election results on various political and economic outcomes of interest. Several recent papers critique the use of RD designs based on close elections because of the potential for imbalance near the threshold that distinguishes winners from losers. In particular, for U.S. House elections during the post-war period, lagged variables such as incumbency status and previous vote share are significantly correlated with victory even in very close elections. This type of sorting naturally raises doubts about the key RD assumption that the assignment of treatment around the threshold is quasi-random. In this paper, we examine whether similar sorting occurs in other electoral settings, including the U.S. House in other time periods; statewide, state legislative, and mayoral races in the U.S.; and national and/or local elections in a variety of other countries, including the U.K., Canada, Germany, France, Australia, India, and Brazil. No other case exhibits sorting. Evidently, the U.S. House during the post-war period is an anomaly.

29
Paper
Regression Analysis and the Philosophy of Social Sciences -- a Critical Realist View
Ron, Amit

Uploaded 12-20-1999
Keywords Regression analysis
empiricism
critical realism
philosophy of social science
Abstract This paper challenges the connection conventionally made between regression analysis and the empiricist philosophy of science and offers an alternative explication of the way regression analysis is practiced. The alternative explication is based on critical realism, a competing approach to empiricism in the philosophy of science. The paper argues that critical realism can better explicate the way in which scientists ‘play’ with the data as part of the process of inquiry. On the critical realist explication, the practice of regression analysis is understood as a post hoc attempt to identify a restricted closed system. The gist of successful regression analysis is not offering a law-like statement but bringing forth evidence of an otherwise hidden mechanism. Through the study of methodological debates regarding regression analysis, it is argued that critical realism can offer conceptual tools for better understanding the core issues at stake in these debates.

30
Paper
MPs for Sale? Estimating Returns to Office in Post-War British Politics
Eggers, Andrew
Hainmueller, Jens

Uploaded 03-22-2008
Keywords regression discontinuity design
RDD
matching
UK
Britain
political economy
Abstract While the role of money in policymaking is a central question in political economy research, surprisingly little attention has been given to the rents politicians actually derive from politics. We use both matching and a regression discontinuity design to analyze an original dataset on the estates of recently deceased British politicians. We find that serving in Parliament roughly doubled the wealth at death of Conservative MPs but had no discernible effect on the wealth of Labour MPs. We argue that Conservative MPs profited from office in a lax regulatory environment by using their political positions to obtain outside work as directors, consultants, and lobbyists, both while in office and after retirement. Our results are consistent with anecdotal evidence on MPs' outside financial dealings but suggest that the magnitude of Conservatives' financial gains from office was larger than has been appreciated.

31
Paper
The Perils of Failed Randomization: Investigating Regression Adjustment of Regionally Confounded Cross-National Data
Paine, Jack

Uploaded 07-18-2013
Keywords Natural experiment
Regression
Causal Inference
Political Regimes
Abstract Many important papers studying cross-national outcomes such as political regime type or economic development exploit treatment variables generated by geological or pre-modern historical processes. A general and major problem with these treatments, however, is their heavy regional concentration. Although variables such as oil or settler mortality are claimed to be exogenous, in the sense that they are not caused by other variables that independently affect the dependent variable, geological and historical accidents leave them highly correlated with potential confounders, which impedes causal inference. With the goal of eliminating bias by controlling for observables, many papers studying such variables use parametric procedures to control for regional dummies. While estimation techniques such as ordinary least squares (OLS) provide a seemingly straightforward methodological fix, OLS also obscures particular shortcomings of the data and imposes strong assumptions in order to combine information across regions. This paper takes a closer look at these assumptions and provides examples from top political science and economics journals to show how disaggregating the data can either support or severely qualify existing results.

32
Paper
Time Series Models for Discrete Data: solutions to a problem with quantitative studies of international conflict
Jackman, Simon

Uploaded 07-21-1998
Keywords categorical time series
dependent binary data
Markov regression models
latent autoregressive process
Markov Chain Monte Carlo
international conflict
democratic peace
Abstract Discrete dependent variables with a time series structure occupy something of a statistical limbo for even well-trained political scientists, prompting awkward methodological compromises and dubious substantive conclusions. An important example is the use of binary response models in the analysis of longitudinal data on international conflict: researchers understand that the data are not independent, but lack any way to model serial dependence in the data. Here I survey methods for modeling categorical data with a serial structure. I consider a number of simple models that enjoy frequent use outside of political science (originating in biostatistics), as well as a logit model with an autoregressive error structure (the latter model is fit via Bayesian simulation using Markov chain Monte Carlo methods). I illustrate these models in the context of international conflict data. Like other re-analyses of these data addressing the issue of serial dependence (e.g., Beck, Katz, and Tucker's binary time-series cross-section approach), I find economic interdependence does not lessen the chances of international conflict. Other findings include a number of interesting asymmetries in the effects of covariates on transitions from peace to war (and vice versa). Any reasonable model of international conflict should take into account the high levels of persistence in the data; the models I present here suggest a number of methods for doing so.

33
Paper
Model Specification in Instrumental-Variables Regression
Dunning, Thad

Uploaded 07-03-2008
Keywords Instrumental-Variables Least Squares (IVLS) regression
model specification
specification error
homogeneous partial effects
Abstract In many applications of instrumental-variables regression, researchers seek to defend the plausibility of a key assumption: the instrumental variable is independent of the error term in a linear regression model. Although fulfilling this exogeneity criterion is necessary for a valid application of the instrumental variables approach, it is not sufficient. In the regression context, the identification of causal effects depends not just on the exogeneity of the instrument but also on the validity of the underlying model. In this paper, I focus on one feature of such models: the assumption that variation in the endogenous regressor that is related to the instrumental variable has the same effect as variation that is unrelated to the instrument. In many applications, this assumption may be quite strong, but relaxing it can limit our ability to estimate parameters of interest. After discussing two substantive examples, I develop analytic results (simulations are reported elsewhere). I also present a specification test that may be useful for determining the relevance of these issues in a given application.

34
Paper
krls: A Stata Package for Kernel-Based Regularized Least Squares
Ferwerda, Jeremy
Hainmueller, Jens
Hazlett, Chad

Uploaded 09-13-2013
Keywords machine learning
regression
classification
prediction
Stata
Abstract The Stata package krls implements Kernel-Based Regularized Least Squares (KRLS), a machine learning method described in Hainmueller and Hazlett (2013) that allows users to solve regression and classification problems without manual specification search and strong functional form assumptions. The flexible KRLS estimator learns the functional form from the data and thereby protects inferences against misspecification bias. Yet, it nevertheless allows for interpretability and inference in ways similar to ordinary regression models. In particular, KRLS provides closed-form estimates for the predicted values, variances, and the pointwise partial derivatives that characterize the marginal effects of each independent variable at each data point in the covariate space. The method is thus a convenient and powerful alternative to OLS and other GLMs for regression-based analyses.
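The closed-form estimator behind KRLS can be sketched independently of the Stata package: kernel-based regularized least squares with a Gaussian kernel, where the coefficients solve (K + λI)c = y. This is an illustrative stdlib-Python sketch, not the krls code itself, and the hyperparameter values (`lam`, `sigma2`) are placeholders rather than the package's defaults.

```python
import math

def gauss_kernel(xi, xj, sigma2):
    """Gaussian kernel k(xi, xj) = exp(-||xi - xj||^2 / sigma2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / sigma2)

def solve(A, b):
    """Solve the linear system A c = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c

def krls_fit(X, y, lam=0.1, sigma2=1.0):
    """Closed-form coefficients: c = (K + lam*I)^(-1) y."""
    n = len(X)
    K = [[gauss_kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def krls_predict(X, c, x_new, sigma2=1.0):
    """Fitted value at x_new: a kernel-weighted sum over the sample."""
    return sum(ci * gauss_kernel(xi, x_new, sigma2) for ci, xi in zip(c, X))
```

Because the fit is closed-form, fitted values and pointwise derivatives follow by differentiating the kernel sum, which is the route by which the package reports marginal effects.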

35
Paper
The Number of Parties: New Evidence from Local Elections
Benoit, Kenneth

Uploaded 08-11-1998
Keywords electoral systems
regression analysis
Duverger
political parties
Hungary
Abstract Theory: Duverger's "Law" concerning the structural and psychological consequences of electoral rules has been much studied in both single cases and in multinational samples, but these suffer from several common theoretical and empirical shortcomings that make their estimates suspect. Besides resort to experimental data, another solution is to select a carefully controlled election dataset where the precise nature of the processes generating the data is understood. Local elections provide a means to control social cleavages as well as to provide a potentially large number of observations. Hypotheses: The size of electoral districts, as well as the type of electoral formula, will influence the number of parties that compete, the concentration of support for these parties, and the number of parties that win seats, even when the elections are confined to one country at the subnational level. In addition, the greater number of observations should provide very precise estimates of these effects. Methods: Regression analysis of district magnitude with an interactive term characterizing rules as proportional or plurality. The data come from 8,377 Hungarian local elected bodies consisting of municipal councils, county councils, town councils, and mayors. Results: The results extend previous research on Duverger's effects, providing more precise estimates that may be compared directly to previous results. In addition, the analysis of rare multi-member plurality elections reveals a counter-intuitive result about candidate and party entry in response to these rules, suggesting several directions for future investigation of MMP rules.

36
Paper
The Persuasive Effects of Direct Mail: A Regression Discontinuity Approach
Meredith, Marc
Kessler, Daniel
Gerber, Alan

Uploaded 07-21-2008
Keywords regression discontinuity
direct mail
persuasion
turnout
Abstract During the contest for Kansas attorney general in 2006, an organization sent out 6 pieces of mail criticizing the incumbent's conduct in office. We exploit a discontinuity in the rule used to select which households received the mailings to identify the causal effect of mail on vote choice and voter turnout. We find these mailings had both a statistically and politically significant effect on the challenger's vote share. Our estimates suggest that a ten percentage point increase in the amount of mail sent to a precinct increased the challenger's vote share by approximately three percentage points. Furthermore, our results suggest that the mechanism for this increase was persuasion rather than mobilization.

37
Paper
What Happens When Extremists Win Primaries?
Hall, Andrew B.

Uploaded 12-17-2013
Keywords rdd
regression discontinuity
scaling
primaries
polarization
congress
elections
Abstract I study how the nomination of an extremist changes general-election outcomes and legislative behavior in the U.S. House, 1980-2010, using a regression discontinuity design in primary elections. When an extremist (as measured by primary-election campaign receipt patterns) wins a "coin-flip" election over a moderate, the party's general-election vote share decreases by approximately 11-14 percentage points, and the probability that the party wins the seat decreases by 40-48 percentage points. These negative effects persist, without diminishing, for eight years (the entire redistricting period). This electoral penalty is so large that randomly nominating the more extreme primary candidate causes the district's subsequent roll-call representation to reverse, becoming more liberal when an extreme Republican is nominated and more conservative when an extreme Democrat is nominated. In safe districts, however, extremists win the general election often enough to cancel out these roll-call reversals. The tradeoff that primary voters face between supporting "electable" candidates and supporting more ideological candidates thus varies across district types.

38
Paper
Measuring the Electoral and Policy Impact of Majority-Minority Voting Districts: Candidates of Choice, Equal Opportunity, and Representation
Epstein, David
O'Halloran, Sharyn

Uploaded 09-15-1998
Keywords voting rights act
ecological regression
Abstract The Voting Rights Act guarantees minority voters an "equal opportunity to elect the candidate of their choice." Yet the implementation of this requirement is beset with technical difficulties: first, current case law provides no clear definition as to who qualifies as a candidate of choice of the minority community; second, traditional techniques for estimating equal opportunity rely heavily on ecological regression, which is prone to statistical bias; and third, no attempt is made to systematically evaluate the impact of alternative districting strategies on the substantive representation of minority interests, rather than just descriptive representation. This paper offers an alternative approach to majority-minority districting that 1) explicitly defines the term "candidate of choice;" 2) determines the point of equal opportunity without relying on ecological regression; and 3) estimates the expected impact of competing districting schemes on substantive representation. It then applies this technique to a set of alternative districting plans for the South Carolina State Senate.

39
Paper
Can October Surprise? A Natural Experiment Assessing Late Campaign Effects
Meredith, Marc
Malhotra, Neil

Uploaded 10-14-2008
Keywords Vote by mail
natural experiment
campaign effects
momentum
convenience voting
regression discontinuity
Abstract One consequence of the proliferation of vote-by-mail (VBM) in certain areas of the United States is the opportunity for voters to cast ballots weeks before Election Day. Understanding the ensuing effects of VBM on late campaign information loss has important implications for both the study of campaign dynamics and public policy debates on the expansion of convenience voting. Unfortunately, the self-selection of voters into VBM makes it difficult to causally identify the effect of VBM on election outcomes. We overcome this identification problem by exploiting a natural experiment, in which some precincts are assigned to be VBM-only based on an arbitrary threshold of the number of registered voters. We assess the effects of VBM on candidate performance in the 2008 California presidential primary via a regression discontinuity design. We show that VBM both increases the probability of selecting candidates who withdrew from the race in the interval after the distribution of ballots but before Election Day and affects the relative performance of candidates remaining in the race. Thus, we find evidence of late campaign information loss, pointing to the influence of campaign events and momentum in American politics, as well as the unintended consequences of convenience voting.

40
Paper
The fault in our stars: Measuring and correcting significance bias in Political Science
Esarey, Justin
Wu, Ahra

Uploaded 01-16-2014
Keywords significance
hypothesis test
regression
Abstract Prior research finds that statistically significant results are overrepresented in scientific publications. If significant results are consistently favored in the review process, published results will systematically overstate the magnitude of their findings. Worse yet, the typical two-tailed statistical significance test with α = 0.05 does little to prevent the proliferation of false positives in the literature. In this paper, we systematically measure the impact of these two forms of significance bias on published research in quantitative political science. We estimate that 35% or more of published results exaggerate their substantive significance to a meaningful degree, with an average upward bias of 9%-20%. Additionally, 15%-35% of published results are at elevated risk of being false positives. Most importantly, we evaluate a variety of new and existing methodological strategies to correct both forms of significance bias. We conclude that a smaller α threshold combined with conservative Bayesian priors is an effective remedy.
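The core mechanism this abstract describes, that filtering on statistical significance inflates published effect magnitudes, is easy to demonstrate by simulation. The numbers below (true effect, standard error, study count) are invented for illustration and are not the authors' calibration.

```python
import random

def simulate_publication_filter(true_effect=0.2, se=0.15,
                                n_studies=20000, z_crit=1.96, seed=1):
    """Draw unbiased study estimates ~ Normal(true_effect, se), keep only
    those passing a two-tailed significance test (|estimate/se| > z_crit),
    and return the mean of the 'published' estimates."""
    rng = random.Random(seed)
    published = [est for est in
                 (rng.gauss(true_effect, se) for _ in range(n_studies))
                 if abs(est / se) > z_crit]
    return sum(published) / len(published)
```

Every draw is an unbiased estimate of the true effect, yet conditioning on significance discards the small estimates, so the mean of the surviving estimates overstates the truth; this is the exaggeration the paper sets out to measure and correct.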

41
Paper
Getting the Mean Right: Generalized Additive Models
Beck, Nathaniel
Jackman, Simon

Uploaded 00-00-0000
Keywords non-parametric regression
smoothing
loess
non-linear regression
Monte Carlo analysis
interaction effects
incumbency
cabinet duration
violence
Abstract We examine the utility of the generalized additive model as an alternative to the common linear model. Generalized additive models are flexible in that they allow the effect of each independent variable to be modelled non-parametrically while requiring that the effects of the independent variables combine additively. GAMs are common in the statistics literature but are conspicuously absent in political science. The paper presents the basic features of the generalized additive model. Through Monte Carlo experimentation we show that there is little danger of the generalized additive model finding spurious structure. We use GAMs to reanalyze several political science data sets. These applications show that generalized additive models can be used to improve standard analyses by guiding researchers as to the parametric shape of response functions. The technique also provides interesting insights about data, particularly in terms of modelling interactions.

42
Paper
Regression Adjustments to Experimental Data: Do David Freedman's Concerns Apply to Political Science?
Green, Donald

Uploaded 07-15-2009
Keywords Experiments
Regression
Covariates
Analysis of Covariance
Abstract One of David Freedman's important legacies was to raise awareness of the assumptions that underlie everyday statistical practice, such as regression analysis. His recent papers (Freedman 2008a, 2008b) offer stern warnings to those who offer regression analysis as an appropriate way to analyze experimental results. In particular, Freedman demonstrates that including pre-treatment covariates as controls leads to bias in finite samples and inaccurate standard errors. Freedman advises researchers against using regression adjustments for experiments involving fewer than 500 observations (2008a, p. 191), a recommendation that has gained increasing attention and acceptance among social scientists. This paper argues that the ever-cautious Freedman was probably too cautious in his recommendations. After explicating the special features of Freedman's model, I use a combination of simulated and actual examples to show that as a practical matter the biases that Freedman pointed out tend to be negligible for N > 20. Pathological cases that could generate biases for larger experiments involve extreme outliers that would be readily detected through visual inspection.

43
Paper
How can soccer improve statistical learning?
Filho, Dalson
Rocha, Enivaldo
Paranhos, Ranulfo
Júnior, José

Uploaded 03-19-2014
Keywords quantitative methods
linear regression
soccer
Abstract This paper presents an active classroom exercise focusing on the interpretation of ordinary least squares regression coefficients. Methodologically, students analyze data on Brazilian soccer matches and formulate and test the classical hypothesis of home-team advantage. Technically, our framework is easily adapted to other sports and has no implementation cost. In addition, the exercise is easy for the instructor to conduct and highly enjoyable for the students. The intuitive approach also facilitates understanding of the practical application of linear regression.

44
Poster
The Dynamic Battle for Pieces of Pie: Modeling Party Support in Multi-Party Nations
Philips, Andrew
Rutherford, Amanda

Uploaded 07-10-2014
Keywords Error Correction model (ECM)
Compositional
Seemingly Unrelated Regression (SUR)
Party Support
Dynamic Modeling
Multi-party
Abstract Current compositional models fail to adequately show tradeoffs between more than three categories. In addition, they inadequately address both short- and long-run dynamics. We propose a modeling approach for estimating and interpreting the causal relationships that shape tradeoffs in party support as they evolve over time. This builds on the work of Katz and King (1999) and Tomz, Tucker, and Wittenberg (2002) by developing dynamic models of compositional dependent variables. Using a set of existing tools (Aitchison's (1986) logratio transformation, error correction models (ECMs), and seemingly unrelated regressions (SUR)), we aim to produce a user-friendly statistical package that will help researchers estimate their models and produce graphical interpretations of the scenarios they specify.
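The logratio step this poster relies on can be sketched directly: Aitchison's additive logratio (alr) maps a composition of party shares (positive, summing to one) to unconstrained coordinates that ECM/SUR machinery can model, and its inverse maps fitted values back to the simplex. A minimal stdlib-Python sketch with illustrative function names:

```python
import math

def alr(shares):
    """Additive logratio transform: log(s_i / s_D) for i = 1..D-1,
    using the last part of the composition as the base."""
    base = shares[-1]
    return [math.log(s / base) for s in shares[:-1]]

def alr_inverse(z):
    """Map logratio coordinates back to shares on the simplex."""
    exps = [math.exp(v) for v in z] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, alr([0.5, 0.3, 0.2]) gives [log 2.5, log 1.5], and alr_inverse recovers the original shares, so dynamics modeled in the unconstrained space always translate back into valid vote shares.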

45
Poster
Weighted Estimation for Analyses with Missing Data
Samii, Cyrus

Uploaded 07-21-2010
Keywords missing data
doubly robust
inverse probability weighting
semi-parametric
post-treatment
regression
sample selection
Abstract Missing data plague data analyses in political science. The recent applied statistics literature reflects renewed interest in weighting methods for missing data problems. Three properties are stressed in this literature: (i) robustness, (ii) the ability to use post-treatment information in causal analysis, and (iii) methods to gain efficiency. I present these results, hoping to show the potential in using refashioned weighting methods for political science research.

46
Poster
Embracing Methodological Pluralism in Comparative Politics: Game Theory, Data Inspection, and Case Studies
Paine, Jack

Uploaded 07-17-2014
Keywords Comparative Politics
Methodology
Multiple Regression
Game Theory
Case Studies
Civil Wars
Oil
Abstract Inferring causal relationships from cross-national data poses inherent difficulties, an unsolvable problem. But the staple method of multiple regression obscures as much as it illuminates, and we can do better with the data we have to generate more reliable statistical findings. This poster examines how game theory, simple data inspection, and case studies can provide additional support for well-substantiated arguments and expose concerns with problematic regression results. I draw examples from my substantive research, focused mainly on civil wars and authoritarian regimes; the poster thus also summarizes methodological themes from my dissertation.

47
Poster
When the STARs Align: What IOs Are More Conducive to Democratization
Chyzh, Olga

Uploaded 07-31-2011
Keywords democracy
spatial dependence
diffusion
international organization
spatial regression
m-STAR
Abstract Scholars of democracy have long noted the tendency of democratic states to cluster in time and space. While most theoretical explanations of this phenomenon posit causal mechanisms related to spatial interdependence (e.g., diffusion, socialization), very few studies have conducted adequate empirical tests of these theories. This methodological oversight is due both to the scarcity of statistical techniques that allow for testing these types of effects and to the methodological sophistication of the existing techniques. Yet the value of empirical inferences depends largely on correct model specification. I develop several hypotheses linking state democracy level to membership in international organizations (IOs) that vary in scope, institutional capacity, and centralization. I test these hypotheses using several alternative approaches that allow one to correct for or explicitly model spatial and temporal dependence. I start with more common approaches, such as the use of a lagged dependent variable, fixed effects, and panel-corrected standard errors, and then re-estimate the results using a multi-parametric spatio-temporal autocorrelation model (m-STAR). In this final model, I test my hypotheses using overlapping memberships in different types of IOs, as well as geographic contiguity, as the spatial weights. I argue that while the lagged dependent variable, fixed effects, and panel-corrected standard errors show more desirable qualities than a naïve model, the m-STAR provides the most adequate test, from both a methodological and a theoretical perspective. Unlike the former three techniques, which treat spatial and temporal dependence as a nuisance, the m-STAR allows for explicit modeling and estimation of contemporaneous spatial effects. Its ability to estimate spatial effects occurring within the same time period as the unit-level effects makes this model particularly useful for evaluating the hypotheses posited in this paper, as well as phenomena such as diffusion and socialization more broadly.

