Multiple Imputation

Multiple imputation is a statistical technique for filling in (imputing) missing data. That is, it provides a principled best guess at what the missing values should be. Older techniques substitute a single value for each missing entry. Mean substitution, for example, replaces every missing value with the mean of the observed values of that variable. The problem with mean substitution is that the standard error comes out lower than it should: the filled-in values do not deviate from the mean at all, so they add nothing to the estimated variance, yet they still count toward the sample size and degrees of freedom. Multiple imputation instead draws several plausible values for each missing datum from a distribution of possible values, producing several completed data sets whose differences reflect the uncertainty about what is missing. The resulting standard errors are therefore not artificially small. The Social Science Statistics Blog has an excellent entry on the question of whether multiple imputation is making up data:
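To make the mean substitution problem concrete, here is a minimal sketch using only the Python standard library. The data are simulated, and the sample sizes, seed, and the helper name standard_error are assumptions chosen for illustration. It compares the standard error of the mean computed from the observed values alone with the one computed after mean substitution.

```python
import random
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5


random.seed(0)

# Hypothetical variable: 200 cases, of which the last 60 are treated as missing.
complete = [random.gauss(50, 10) for _ in range(200)]
observed = complete[:140]
n_missing = len(complete) - len(observed)

# Mean substitution: every missing value is replaced by the observed mean.
observed_mean = statistics.mean(observed)
mean_filled = observed + [observed_mean] * n_missing

se_observed = standard_error(observed)
se_mean_filled = standard_error(mean_filled)

print(f"SE from observed cases only:  {se_observed:.3f}")
print(f"SE after mean substitution:   {se_mean_filled:.3f}")
```

The filled-in values sit exactly on the mean, so the sum of squared deviations does not change while the sample size grows from 140 to 200. The standard error after mean substitution is therefore always the smaller of the two, even though no new information was added.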

The fact is that the vast majority of our statistical techniques require rectangular data sets, and so data that look like swiss cheese make it really hard to do anything sensible with directly. Listwise deletion, where you excise horizontal slices out of the cheese wherever you see holes, discards a lot of cheese! What MI does instead is to fill in the holes in the data using all available information from the rest of the data set (thus moving some information around) and adding uncertainty to these imputations in the form of variation in the values across the different imputed data sets (thus taking back assertions of knowledge from the imputations when it is not predictable from the rest of the data and from duplication of the same information in different places in the data). If done properly, MI merely puts the data in a convenient rectangular format and enables the user (with some simple combining rules) to apply statistical techniques to data acting as if it were fully observed. MI standard errors then are not too small, which would be the case if data were being made up.
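The "simple combining rules" mentioned in the quoted passage are commonly Rubin's rules: average the estimates from the imputed data sets, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m), where m is the number of imputed data sets. The sketch below is an illustration under stated assumptions, not a production method. The data are simulated, the function name pool_rubin is hypothetical, and the imputation step simply draws from a normal distribution fitted to the observed values. A proper procedure would also draw the mean and variance themselves from their posterior, so this simplified version still understates the uncertainty somewhat.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Combine per-data-set estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


random.seed(1)

# Hypothetical variable: 140 observed cases and 60 missing ones.
observed = [random.gauss(50, 10) for _ in range(140)]
n_missing = 60
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

estimates, variances = [], []
for _ in range(20):  # m = 20 imputed data sets (an arbitrary choice)
    # Simplified imputation: random draws, not a single fixed value.
    imputed = [random.gauss(mu, sigma) for _ in range(n_missing)]
    completed = observed + imputed
    estimates.append(statistics.mean(completed))
    variances.append(statistics.variance(completed) / len(completed))

pooled_mean, pooled_se = pool_rubin(estimates, variances)
print(f"Pooled mean: {pooled_mean:.3f}, pooled SE: {pooled_se:.3f}")
```

Because the imputed values differ from one data set to the next, the between-imputation term is positive and the pooled standard error is larger than the one any single completed data set would report. This is the sense in which MI takes back assertions of knowledge that the imputations alone would otherwise make.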
