
1 
Paper

Automated Production of High-Volume, Near-Real-Time Political Event Data
Schrodt, Philip

Uploaded 
08-30-2010

Keywords 
event data ICEWS DARPA natural language processing open source forecasting prediction conflict

Abstract 
This paper summarizes the current state of the art for generating high-volume, near-real-time event data using automated coding methods, based on recent efforts for the DARPA Integrated Crisis Early Warning System (ICEWS) and NSF-funded research. The ICEWS work expanded previous automated coding efforts by more than two orders of magnitude, coding about 26 million sentences generated from 8 million stories condensed from around 30 gigabytes of text. The actual coding took six minutes. The paper is largely a general ``how-to'' guide to the pragmatic challenges and solutions to various elements of the process of generating event data using automated techniques. It also discusses a number of ways that this could be augmented with existing open-source natural language processing software to generate a third-generation event data coding system.

3 
Paper

Dynamic Bayesian Forecasting of Presidential Elections in the States
Linzer, Drew

Uploaded 
07-16-2012

Keywords 
President Forecasting Public Opinion Elections

Abstract 
I present a dynamic Bayesian forecasting model that enables early and accurate prediction of U.S. presidential election outcomes at the state level. The method systematically combines information from historical forecasting models in real time with results from the large number of state-level opinion surveys that are released publicly during the campaign. The result is a set of forecasts that are initially as good as the historical model, then gradually increase in accuracy as Election Day nears. I employ a hierarchical specification to overcome the limitation that not every state is polled on every day, allowing the model to borrow strength both across states and, through the use of random-walk priors, across time. The model also filters away day-to-day variation in the polls due to sampling error and national campaign effects, which enables daily tracking of voter preferences towards the presidential candidates at the state and national levels. Simulation techniques are used to estimate the candidates' probability of winning each state and, consequently, a majority of votes in the Electoral College. I apply the model to pre-election polls from the 2008 presidential campaign and demonstrate that the victory of Barack Obama was never realistically in doubt. The model is currently ready to be deployed for forecasting the outcome of the 2012 presidential election.
Project website: votamatic.org 
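
The simulation step mentioned in the abstract above (turning state-level win probabilities into a probability of an Electoral College majority) can be illustrated with a minimal Monte Carlo sketch. This is not the author's model: the function name, the inputs, and the three-state toy data are hypothetical, and the sketch assumes states are independent, whereas the paper's model explicitly accounts for shared national campaign effects.

```python
import random


def electoral_win_probability(state_probs, electoral_votes, needed,
                              n_sims=10000, seed=0):
    """Estimate the chance of winning at least `needed` electoral votes.

    state_probs     -- dict: state -> probability the candidate wins it
    electoral_votes -- dict: state -> electoral votes for that state
    ASSUMPTION: states are simulated independently (a simplification).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    wins = 0
    for _ in range(n_sims):
        # Draw a winner in every state and add up the votes won.
        total = sum(ev for state, ev in electoral_votes.items()
                    if rng.random() < state_probs[state])
        if total >= needed:
            wins += 1
    return wins / n_sims


# Hypothetical three-state example, for illustration only.
toy_votes = {"A": 10, "B": 6, "C": 4}
toy_probs = {"A": 0.6, "B": 0.5, "C": 0.3}
toy_estimate = electoral_win_probability(toy_probs, toy_votes, needed=11)
```

With real inputs, `state_probs` would come from the posterior of the forecasting model, and correlated draws would replace the independent ones used here.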

4 
Paper

Moving Mountains: Bayesian Forecasting As Policy Evaluation
Brandt, Patrick T.
Freeman, John R.

Uploaded 
04-24-2002

Keywords 
Bayesian vector autoregression VAR policy evaluation conditional forecasting

Abstract 
Many policy analysts fail to appreciate the dynamic, complex causal
nature of political processes. We advocate a vector autoregression
(VAR) based approach to policy analysis that accounts for various
multivariate and dynamic elements in policy formulation and
for both dynamic and specification uncertainty of parameters. The
model we present is based on recent developments in Bayesian
VAR modeling and forecasting. We present an example based on work in
Goldstein et al. (2001) that illustrates how a full accounting of the
dynamics and uncertainty in multivariate data can lead to more
precise and instructive results about international mediation in
Middle Eastern conflict. 

5 
Paper

How Factual is your Counterfactual?
King, Gary
Zeng, Langche

Uploaded 
07-12-2001

Keywords 
counterfactual causality forecasting democracy

Abstract 
Inferences about counterfactuals are essential for prediction,
answering ``what if'' questions, and estimating causal effects.
However, when the counterfactuals posed are too far from the data at
hand, conclusions drawn from well-specified statistical analyses
become based on speculation and convenient but indefensible model
assumptions rather than empirical evidence. Yet, standard model
outputs do not reveal the degree of model-dependence, and so this
problem can be hard to detect, regardless of its severity. We
develop easy-to-apply methods to evaluate counterfactuals that do
not require sensitivity testing over specified classes of models.
One analysis with these methods applies to the class of all models,
for any smooth conditional expectation function, and to the set of
all possible dependent variables, given only the choice of a set of
explanatory variables. We illustrate by studying the scholarly
literatures that try to assess the effects of changes in the degree
of democracy in a country (on any dependent variable); we find
widespread evidence that scholars are inadvertently drawing
conclusions based more on their hypotheses than on their empirical
evidence. 

6 
Paper

Forecasting Conflict in the Balkans using Hidden Markov Models
Schrodt, Philip A.

Uploaded 
08-24-2000

Keywords 
forecasting event data hidden Markov models conflict Balkans Yugoslavia

Abstract 
This study uses hidden Markov models (HMM) to forecast conflict in the former
Yugoslavia for the period January 1991 through January 1999. The political and
military events reported in the lead sentences of Reuters news service stories
were coded into the World Events Interaction Survey (WEIS) event data scheme.
The forecasting scheme involved randomly selecting eight 100-event "templates"
taken at a 1-, 3- or 6-month forecasting lag for high-conflict and low-conflict
weeks. A separate HMM is developed for the high-conflict-week sequences and
the low-conflict-week sequences. Forecasting is done by determining whether a
sequence of observed events fits the high-conflict or low-conflict model with
higher probability.
Models were selected to maximize the difference between correct and incorrect
predictions, evaluated by week. Three weighting schemes were used: unweighted
(U), penalize false positives (P) and penalize false negatives (N). There is a
relatively high level of convergence in the estimates: the best and worst models
of a given type vary in accuracy by only about 15% to 20%. In full-sample
tests, the U and P models produce an overall accuracy of around 80%. However,
these models correctly forecast only about 25% of the high-conflict weeks,
although about 60% of the cases where a high-conflict week has been forecast
turn out to have high conflict. In contrast, the N model has an overall
accuracy of only about 50% in full-sample tests, but it correctly forecasts
high-conflict weeks with 85% accuracy in the 3- and 6-month horizons and 92%
accuracy in the 1-month horizon. However, this is achieved by excessive
predictions of high-conflict weeks: only about 30% of the cases where a
high-conflict week has been forecast are high-conflict. Models that use
templates from only the previous year usually do about as well as models based
on the entire sample.
The models are remarkably insensitive to the length of the forecasting
horizon: the drop-off in accuracy at longer forecasting horizons is very small,
typically around 2%-4%. There is also no clear difference in the estimated
coefficients for the 1-month and 6-month models. An extensive analysis was
done of the coefficient estimates in the full-sample model to determine what
the model was "looking at" in order to make predictions. While a number of
statistically significant differences exist between the high and low conflict
models, these do not fall into any neat patterns. This is probably due to a
combination of the large number of parameters being estimated, the multiple
local maxima in the estimation surface, and the complications introduced by the
presence of a number of very low probability event categories. Some
experiments with simplified models indicate that it is possible to use models
with substantially fewer parameters without markedly decreasing the accuracy of
the predictions; in fact predictions of the high conflict periods actually
increase in accuracy quite substantially. 
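
The classification rule described in this abstract (fit one HMM to high-conflict sequences and one to low-conflict sequences, then assign a new sequence to whichever model gives it the higher probability) can be sketched as follows. This is a minimal illustration, not the study's code: the two-state models, the three event codes, and all parameter values are hypothetical, and the likelihood is computed with a standard scaled forward algorithm rather than any procedure confirmed by the source.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM).

    obs   -- list of integer event codes
    start -- start[i]    = P(first hidden state is i)
    trans -- trans[i][j] = P(next state is j | current state is i)
    emit  -- emit[i][k]  = P(event code k | hidden state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the hidden states, then emit.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n))
                     * emit[j][obs[t]] for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(obs, high_model, low_model):
    """Label a sequence by the model under which it is more probable."""
    high = log_likelihood(obs, *high_model)
    low = log_likelihood(obs, *low_model)
    return "high" if high > low else "low"


# Hypothetical toy models; codes: 0 = cooperative, 1 = neutral, 2 = conflictual.
TRANS = [[0.9, 0.1], [0.1, 0.9]]
HIGH = ([0.5, 0.5], TRANS, [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]])
LOW = ([0.5, 0.5], TRANS, [[0.7, 0.2, 0.1], [0.6, 0.2, 0.2]])
```

In the study the model parameters are estimated from the coded event templates; here they are fixed by hand only so that the comparison step can be run.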

7 
Paper

The Problem with Quantitative Studies of International Conflict
Beck, Nathaniel
King, Gary
Zeng, Langche

Uploaded 
07-15-1998

Keywords 
Conflict logit neural networks forecasting Bayesian analysis

Abstract 
Despite immense data collections, prestigious journals, and
sophisticated analyses, empirical findings in the literature on
international conflict are frequently unsatisfying. Statistical
results appear to change from article to article and specification
to specification. Very few relationships hold up to replication
with even minor respecification. Accurate forecasts are
nonexistent. We provide a simple conjecture about what accounts for
this problem, and offer a statistical framework that better matches
the substantive issues and types of data in this field. Our model,
a version of a ``neural network'' model, forecasts substantially
better than any previous effort, and appears to uncover some
structural features of international conflict. 

8 
Paper

Forecasting Parliamentary Outcomes in Multiparty Elections: Hungary 1998
Benoit, Kenneth

Uploaded 
08-16-1998

Keywords 
computer simulation resampling election forecasting electoral systems Hungary

Abstract 
Forecasting seat outcomes in legislative elections in countries with
stable, two-party systems is sufficiently challenging as to have
proven elusive for much of democratic experience. Forecasting an
election in a relatively new democracy with a fluid multiparty
system, therefore, would seem on its face to be a hopeless
objective. In this paper I attempt to demonstrate that election
forecasting in such an environment is in fact quite feasible, using
data from previous elections, opinion poll research, and computer
simulation models to predict the outcome of the Hungarian
parliamentary elections which took place in May 1998. First, I
discuss the general problems with election forecasting, and then
outline a strategy for dealing with each. I outline a forecasting
method in detail, which I apply to Hungary's case to generate a
prediction published in December 1997. The remainder of the paper
compares the actual results of the election to the author's
forecasts published before the election, identifying areas for
improvement in the basic forecasting model but also proving that
accurate forecasting of final outcomes in multiparty elections is
possible in practice. 

9 
Paper

Estimating the Probability of Events That have Never Occurred: When Does Your Vote Matter?
Gelman, Andrew
King, Gary
Boscardin, John

Uploaded 
10-27-1997

Keywords 
conditional probability decision analysis elections electoral campaigning forecasting political science presidential elections rare events rational choice subjective probability voting power

Abstract 
Researchers sometimes argue that statisticians have little to
contribute when few realizations of the process being estimated are
observed. We show that this argument is incorrect even in the
extreme situation of estimating the probabilities of events so rare
that they have never occurred. We show how statistical forecasting
models allow us to use empirical data to improve inferences about
the probabilities of these events.
Our application is estimating the probability that your vote will be
decisive in a U.S. presidential election, a problem that has been
studied by political scientists for more than two decades. The
exact value of this probability is of only minor interest, but the
number has important implications for understanding the optimal
allocation of campaign resources, whether states and voter groups
receive their fair share of attention from prospective presidents,
and how formal ``rational choice'' models of voter behavior might be
able to explain why people vote at all.
We show how the probability of a decisive vote can be estimated
empirically from state-level forecasts of the presidential election
and illustrate with the example of 1992. Based on generalizations
of standard political science forecasting models, we estimate the
(prospective) probability of a single vote being decisive as about 1
in 10 million for close national elections such as 1992, varying by
about a factor of 10 among states.
Our results support the argument that subjective probabilities of
many types are best obtained via empirically based statistical
prediction models rather than solely mathematical reasoning. We
discuss the implications of our findings for the types of decision
analyses that are used in public choice studies. 
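
One way to see how a forecast can yield the probability of an event that has never occurred is the following minimal sketch. It is an illustration under stated assumptions, not the authors' procedure as given in the abstract: it assumes the forecast of a candidate's two-party vote share in a state is normal with a given mean and standard deviation, and approximates the chance of an exact tie as the forecast density at 0.5 divided by the number of voters. The function names and the numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def state_tie_probability(mean_share, sd_share, n_voters):
    """Approximate P(one vote decides the state).

    ASSUMPTION: the forecast vote share is normal(mean_share, sd_share);
    the tie probability is approximated by density(0.5) / n_voters.
    """
    return normal_pdf(0.5, mean_share, sd_share) / n_voters


# Hypothetical state: forecast share 0.52 +/- 0.05, two million voters.
toy_p = state_tie_probability(0.52, 0.05, 2_000_000)
```

This covers only the state-level piece. A national figure would also need the probability that the state's electoral votes are themselves decisive, which this sketch does not attempt.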

10 
Paper

Estimating the Probability of Events That have Never Occurred: When Does Your Vote Matter?
Gelman, Andrew
King, Gary
Boscardin, John

Uploaded 
02-14-1997

Keywords 
conditional probability decision analysis elections electoral campaigning forecasting political science presidential elections rare events rational choice subjective probability voting power

Abstract 
Researchers sometimes argue that statisticians have little to
contribute when few realizations of the process being estimated are
observed. We show that this argument is incorrect even in the extreme
situation of estimating the probabilities of events so rare that they
have never occurred. We show how statistical forecasting models allow
us to use empirical data to improve inferences about the probabilities
of these events.
Our application is estimating the probability that your vote will be
decisive in a U.S. presidential election, a problem that has been
studied by researchers in political science for more than two decades.
The exact value of this probability is of only minor interest, but the
number has important implications for understanding the optimal
allocation of campaign resources, whether states and voter groups
receive their fair share of attention from prospective presidents, and
how formal ``rational choice'' models of voter behavior might be able
to explain why people vote at all.
We show how the probability of a decisive vote can be estimated
empirically from state-level forecasts of the presidential election
and illustrate with the example of 1992. Based on generalizations of
standard political science forecasting models, we estimate the
(prospective) probability of a single vote being decisive as about 1
in 10 million for close national elections such as 1992, varying by
about a factor of 10 among states.
Our results support the argument that subjective probabilities of many
types are best obtained via empirically based statistical prediction
models rather than solely mathematical reasoning. We discuss the
implications of our findings for the types of decision analyses that
are used in public choice studies. 

11 
Paper

Demographic Forecasting
Girosi, Federico
King, Gary

Uploaded 
07-10-2003

Keywords 
forecasting

Abstract 
We introduce a new framework for forecasting age-sex-country-cause-specific mortality rates that incorporates considerably more information, and thus has the potential to forecast much better, than any existing approach. Mortality forecasts are used in a wide variety of academic fields, and for global and national health policy making, medical and pharmaceutical research, and social security and retirement planning.
As it turns out, the tools we developed in pursuit of this goal also have broader statistical implications, in addition to their use for forecasting mortality or other variables with similar statistical properties. First, our methods make it possible to include different explanatory variables in a time series regression for each cross-section, while still borrowing strength from one regression to improve the estimation of all. Second, we show that many existing Bayesian (hierarchical and spatial) models with explanatory variables use prior densities that incorrectly formalize prior knowledge. Many demographers and public health researchers have fortuitously avoided this problem so prevalent in other fields by using prior knowledge only as an ex post check on empirical results, but this approach excludes considerable information from their models. We show how to incorporate this demographic knowledge into a model in a statistically appropriate way. Finally, we develop a set of tools useful for developing models with Bayesian priors in the presence of partial prior ignorance. This approach also provides many of the attractive features claimed by the empirical Bayes approach, but fully within the standard Bayesian theory of inference. The latest version of this manuscript is available at http://gking.harvard.edu.

