WORKING PAPER
Splitting a predictor at the upper quarter or third and the lower quarter or third
Gelman, Andrew
Park, David
Abstract
A linear regression of $y$ on $x$ can be approximated by a simple difference: the average value of $y$ corresponding to the highest quarter or third of $x$, minus the average value of $y$ corresponding to the lowest quarter or third of $x$. A simple theoretical analysis shows that this comparison performs reasonably well, with 80%--90% efficiency relative to the linear regression if the predictor is uniformly or normally distributed. Discretizing $x$ into three categories claws back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor.
We illustrate with the example that motivated this research: an analysis of income and voting which we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
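The comparison described in the abstract can be sketched in a short simulation. This is a minimal illustration and not the authors' code. The function names, the sample size, the noise level, and the uniform predictor are all assumptions chosen for the example. To put the upper-minus-lower difference on the same scale as a regression slope, the sketch also assumes it is divided by the corresponding difference in the average of $x$.

```python
import numpy as np


def split_slope(x, y, frac=1 / 3):
    """Mean of y in the top `frac` of x minus mean of y in the bottom `frac`,
    divided by the same difference in x (an assumed rescaling to a slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo_cut, hi_cut = np.quantile(x, [frac, 1 - frac])
    lower = x <= lo_cut
    upper = x >= hi_cut
    return (y[upper].mean() - y[lower].mean()) / (x[upper].mean() - x[lower].mean())


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)


def simulate(n=300, reps=2000, slope=2.0, seed=0):
    """Repeatedly draw a uniform predictor with linear signal plus normal noise
    and record both slope estimates (all settings are illustrative)."""
    rng = np.random.default_rng(seed)
    ols = np.empty(reps)
    split = np.empty(reps)
    for i in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        y = 1.0 + slope * x + rng.normal(0.0, 1.0, n)
        ols[i] = ols_slope(x, y)
        split[i] = split_slope(x, y)
    return ols, split


ols, split = simulate()
# Relative efficiency: sampling variance of the regression slope divided by
# the sampling variance of the upper-third-minus-lower-third estimate.
efficiency = ols.var() / split.var()
print(f"mean regression slope: {ols.mean():.3f}")
print(f"mean split-based slope: {split.mean():.3f}")
print(f"estimated relative efficiency: {efficiency:.2f}")
```

Passing `frac=0.25` compares the upper and lower quarters instead of thirds, and swapping `rng.uniform` for `rng.normal` gives the normally distributed case mentioned in the abstract.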
Keywords
discretization, linear regression, statistical communication, trichotomizing
File
Uploaded
07-06-2007
Document ID Number
697