The article by Joseph Price and Justin Wolfers, "Racial Discrimination Among NBA Referees," shows that the coefficient on (%White Referees)×(Black dummy) in a regression predicting fouls per game is positive and significant. However, in Table 2 they note that the mean fouls per 48 minutes played was 4.330 for Black players and 4.970 for White players, a difference significant at the 1% level. The Black-White foul rate differential is 0.827 for all-white referee crews and 0.574 for all-black referee crews. In the multivariate regressions, the coefficient on the Black dummy variable is always negative, meaning that controlling for everything else they find most important and measurable, blacks receive fewer fouls than whites. So perhaps white refs aren't biased against blacks, but rather are merely less biased against white players? After all, if it were a simple story of calling more fouls on blacks, why should the coefficient on the Black dummy alone be negative and significant?
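The point can be made concrete with a stylized version of the regression. The coefficients below are hypothetical, chosen only to illustrate the logic (they are not the paper's estimates; only the 4.97 baseline comes from Table 2): a negative main effect on the Black dummy can dominate a positive interaction with %white refs, so black players still draw fewer predicted fouls even under an all-white crew.

```python
# Stylized model: fouls_per_48 = b0 + b_black*Black + b_inter*(Black * pct_white_refs)
# All coefficients below are hypothetical, for illustration only.
b0 = 4.97        # white-player baseline per 48 minutes (Table 2 mean)
b_black = -0.90  # hypothetical negative main effect on the Black dummy
b_inter = 0.35   # hypothetical positive interaction with %white refs

def predicted_fouls(black, pct_white_refs):
    """Predicted fouls per 48 minutes under the stylized model."""
    return b0 + b_black * black + b_inter * black * pct_white_refs

# Even with an all-white crew, the black player's predicted rate
# remains below the white player's:
print(predicted_fouls(1, 1.0))  # 4.42
print(predicted_fouls(0, 1.0))  # 4.97
```

Under these illustrative numbers, the gap narrows as crews get whiter (the paper's headline result) while blacks still receive fewer fouls overall, which is exactly the "less biased against whites" reading suggested above.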
They explain this away by showing a highly selective regression in footnote 3, one that is contradicted by the coefficient on the Black player dummy in Table 4, highlighting that when a statistician finds what he wants, he stops. With tens of 'control' variables available, any statistical presentation inherently draws from a large set of permutations known only to the researcher: choosing any subset of 16 candidate variables generates 2^16 = 65,536 unique combinations. If the controls in footnote 3 were good enough to explain why the Black dummy is essentially zero, why weren't those same controls used when proving that the coefficient on (Black dummy)×(%White refs) is positive in Table 4? After all, it's the tables, not the footnotes, where you put your best regressions.
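The specification-search arithmetic is easy to check: every subset of 16 candidate controls is a distinct regression, and summing the binomial counts over all subset sizes recovers 2^16.

```python
from math import comb

# Count all distinct subsets of 16 candidate control variables,
# i.e., all possible regression specifications the researcher could run.
n_controls = 16
n_specs = sum(comb(n_controls, k) for k in range(n_controls + 1))
print(n_specs)  # 65536, i.e. 2**16
```

This is why "over 65,000 unique combinations" understates nothing: each reported table is one draw from that space, and only the researcher knows which others were tried.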
There's a whole lot of multicollinearity going on, which makes inference tenuous, and footnote 3 suggests the authors were very cavalier about the alternative hypothesis: an overall bias against whites that white refs mitigate but barely dent.