Document Detail


WORKING PAPER
Splitting a predictor at the upper quarter or third and the lower quarter or third
Gelman, Andrew
Park, David

Abstract
A linear regression of $y$ on $x$ can be approximated by a simple difference: the average value of $y$ corresponding to the highest quarter or third of $x$, minus the average value of $y$ corresponding to the lowest quarter or third of $x$. A simple theoretical analysis shows that this comparison performs reasonably well, with 80%--90% efficiency compared to the linear regression if the predictor is uniformly or normally distributed. Discretizing $x$ into three categories claws back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor. We illustrate with the example that motivated this research: an analysis of income and voting, which we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
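
The comparison described in the abstract can be tried out in a small simulation. The following is a minimal sketch, not the authors' code: the function names, the sample size, the true intercept and slope, and the unit-variance normal noise are all assumptions chosen for illustration. To make the upper-minus-lower difference comparable to a regression slope, the sketch also divides it by the corresponding difference in average $x$, which is likewise an assumption of this example.

```python
import numpy as np

def split_slope(x, y, frac=1/3):
    """Slope estimate from comparing the top and bottom `frac` of x.

    Hypothetical helper: the difference in mean y between the two groups,
    divided by the difference in mean x between the same groups.
    """
    order = np.argsort(x)
    k = int(round(frac * len(x)))      # number of points in each tail group
    lo, hi = order[:k], order[-k:]
    return (y[hi].mean() - y[lo].mean()) / (x[hi].mean() - x[lo].mean())

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

def relative_efficiency(x, frac=1/3, reps=2000, seed=0):
    """Monte Carlo ratio var(OLS slope) / var(split slope) for fixed x."""
    rng = np.random.default_rng(seed)
    split, ols = [], []
    for _ in range(reps):
        # Assumed data-generating model: y = 1 + 2x + standard normal noise.
        y = 1.0 + 2.0 * x + rng.normal(size=len(x))
        split.append(split_slope(x, y, frac))
        ols.append(ols_slope(x, y))
    return np.var(ols) / np.var(split)

# A normally distributed predictor, held fixed across replications.
x = np.random.default_rng(1).normal(size=300)

eff_thirds = relative_efficiency(x, frac=1/3)   # upper third vs. lower third
eff_halves = relative_efficiency(x, frac=1/2)   # dichotomized predictor

print(f"thirds: {eff_thirds:.2f}   halves: {eff_halves:.2f}")
```

Under these assumptions, the efficiency for thirds should land in the neighborhood of the range quoted in the abstract, and noticeably above the value for the dichotomized predictor. The exact numbers depend on the simulated sample. Replacing the normal draw for `x` with a uniform one gives the other case mentioned in the abstract.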

Keywords
discretization
linear regression
statistical communication
trichotomizing


File
thirds3.pdf


Uploaded
07-06-2007

Document ID Number
697


   