Results below are based on the search criterion 'linear regression'
Total number of records returned: 2
Splitting a predictor at the upper quarter or third and the lower quarter or third
A linear regression of $y$ on $x$ can be approximated by a simple difference: the average values of $y$ corresponding to the highest quarter or third of $x$, minus the average values of $y$ corresponding to the lowest quarter or third of $x$. A simple theoretical analysis shows this comparison performs reasonably well, with 80%--90% efficiency compared to the linear regression if the predictor is uniformly or normally distributed. Discretizing $x$ into three categories claws back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor. We illustrate with the example that motivated this research: an analysis of income and voting which we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
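The efficiency figures quoted above can be checked with a short calculation. The sketch below is not taken from the paper; it assumes homoscedastic errors, a large sample, and group means of $x$ treated as fixed, in which case the efficiency of comparing the top and bottom fraction $f$ of the data, relative to the regression slope, is $f(\bar{x}_{upper} - \bar{x}_{lower})^2 / (2\,\mathrm{var}(x))$. The function names are hypothetical.

```python
from statistics import NormalDist


def efficiency_uniform(f):
    """Relative efficiency when x ~ Uniform(0, 1) and each tail holds a fraction f."""
    # The bottom fraction f has mean f/2 and the top fraction has mean 1 - f/2,
    # so the gap between the two group means is 1 - f. Var(x) = 1/12.
    gap = 1.0 - f
    return f * gap ** 2 / (2.0 * (1.0 / 12.0))


def efficiency_normal(f):
    """Relative efficiency when x ~ Normal(0, 1) and each tail holds a fraction f."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - f)        # cut point for the upper tail
    upper_mean = nd.pdf(z) / f     # mean of a standard normal truncated below at z
    gap = 2.0 * upper_mean         # the lower tail mirrors the upper tail
    return f * gap ** 2 / 2.0      # Var(x) = 1


if __name__ == "__main__":
    for label, f in (("halves", 1 / 2), ("thirds", 1 / 3), ("quarters", 1 / 4)):
        print(f"{label:9s} uniform={efficiency_uniform(f):.3f} "
              f"normal={efficiency_normal(f):.3f}")
```

Setting f = 1/2 corresponds to dichotomizing the predictor at the median, which gives a baseline against which the thirds and quarters can be compared.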
A default prior distribution for logistic and other regression models
Pittau, Maria Grazia
Keywords: generalized linear model, noninformative prior distribution
We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-$t$ prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. We implement a procedure to fit generalized linear models in R with this prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several examples, including a series of logistic regressions predicting voting preferences, an imputation model for a public health data set, and a hierarchical logistic regression in epidemiology. We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small) and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.
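To make the rescaling step and the behavior under complete separation concrete, here is a minimal sketch in Python. It is not the authors' R implementation: it finds a posterior mode by plain gradient ascent rather than the approximate EM algorithm inside iteratively weighted least squares, it applies the same Cauchy(0, 2.5) prior to the intercept for simplicity, and it uses the sample standard deviation when rescaling. The function names `rescale` and `fit_logistic` and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev


def rescale(values):
    """Center a nonbinary input at 0 and scale it to standard deviation 0.5."""
    m, s = mean(values), stdev(values)   # assumption: sample standard deviation
    return [(v - m) / (2.0 * s) for v in values]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, prior_scale=2.5, steps=2000, lr=0.5):
    """Return (intercept, slope) after gradient ascent on the log posterior.

    With prior_scale=None the prior is dropped and the objective is the
    plain log-likelihood.
    """
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)   # gradient of the log-likelihood
            g0 += resid
            g1 += resid * x
        if prior_scale is not None:
            # Gradient of the log Cauchy(0, scale) density: -2b / (scale^2 + b^2)
            g0 -= 2.0 * b0 / (prior_scale ** 2 + b0 ** 2)
            g1 -= 2.0 * b1 / (prior_scale ** 2 + b1 ** 2)
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Toy data with complete separation: y is 1 exactly when x is positive.
x = rescale([-3, -2, -1, 1, 2, 3])
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print("Cauchy(0, 2.5) prior:", fit_logistic(x, y))
    print("no prior:            ", fit_logistic(x, y, prior_scale=None))
```

On separated data like this, the likelihood alone has no finite maximum, so the run without a prior keeps growing with the number of steps, while the run with the prior settles at a finite slope.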