Automated Production of High-Volume, Near-Real-Time Political Event Data
This paper summarizes the current state-of-the-art for generating high-volume, near-real-time event data using automated coding methods, based on recent efforts for the DARPA Integrated Crisis Early Warning System (ICEWS) and NSF-funded research. The ICEWS work expanded by more than two orders of magnitude previous automated coding efforts, coding of about 26-million sentences generated from 8-million stories condensed from around 30 gigabytes of text. The actual coding took six minutes. The paper is largely a general ``how-to'' guide to the pragmatic challenges and solutions to various elements of the process of generating event data using automated techniques. It also discusses a number of ways that this could be augmented with existing open-source natural language processing software to generate a third-generation event data coding system.
natural language processing
Document ID Number