On the obsession with being normal

In statistics, one of the first distributions one learns about is usually the normal distribution, not only because it’s pretty but also because it’s ubiquitous.

In addition, the normal distribution is often the reference used when discussing other distributions: a right-skewed distribution is skewed to the right compared to the normal distribution; a leptokurtic distribution is relatively spiky compared to the normal distribution; and unimodality is considered the norm, too.

There exist quantitative representations of skewness, kurtosis, and modality (the dip test), and each of these can be tested against a null hypothesis, where the null hypothesis is (almost) always that the skewness, kurtosis, or dip test value of the distribution is equal to that of a normal distribution.
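To make those quantitative representations a bit more concrete, here is a minimal sketch of the moment-based skewness and excess kurtosis, using only the Python standard library. The function names, the sample size, and the seed are my own choices for illustration; the exponential distribution is just a convenient example of a right-skewed, leptokurtic distribution. Both statistics are defined so that a normal distribution scores (approximately) zero.

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based skewness m3 / m2**1.5; about 0 for normal data."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


def sample_excess_kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2 minus 3; about 0 for normal data."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


rng = random.Random(42)  # fixed seed (arbitrary) so the run is reproducible
normal_data = [rng.gauss(0, 1) for _ in range(20000)]
skewed_data = [rng.expovariate(1.0) for _ in range(20000)]

for label, data in [("normal", normal_data), ("exponential", skewed_data)]:
    print(label,
          "skewness:", round(sample_skewness(data), 2),
          "excess kurtosis:", round(sample_excess_kurtosis(data), 2))
```

The normal sample should land close to zero on both measures, while the exponential sample should be clearly positive on both. A formal test would compare these values against their sampling error under the null hypothesis; this sketch only computes the descriptive statistics.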

In addition, some statistical tests require that the sampling distribution of the relevant statistic is approximately normal (e.g. the t-test), and some require an even more elusive assumption called multivariate normality.
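Note that this requirement concerns the sampling distribution of the statistic, not the raw data. As a rough illustration, the simulation below draws many samples from a clearly skewed population and looks at the distribution of the sample means. The exponential population, the sample size of 100, the number of replications, and the helper name `skewness` are all assumptions I picked for this sketch.

```python
import random
import statistics


def skewness(xs):
    """Moment-based skewness m3 / m2**1.5; about 0 for a symmetric distribution."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(2015)  # arbitrary fixed seed for reproducibility

# Raw data from a strongly right-skewed population (exponential, mean 1).
raw = [rng.expovariate(1.0) for _ in range(10000)]

# Sampling distribution of the mean: 3000 samples of 100 observations each.
sample_size = 100
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(3000)
]

print("skewness of raw data:    ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means), 2))
```

The raw data should come out heavily skewed, while the sample means should be much closer to symmetric, even though every single sample was drawn from the skewed population. How large a sample is needed for this to work depends on how far the population departs from normality, so the numbers here are not a general rule.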

Perhaps all these bits of knowledge mesh together in people’s minds, or perhaps there’s another explanation; but for some reason, many researchers and almost all students operate on the assumption that their data have to be normally distributed. If they are not, they often resort to, for example, converting their data into categorical variables or transforming the data.


The importance of matching: a case study

Earlier (ok, in the only previous, first post on this blog) I discussed the recent study of Zachary Horne et al. (2015), where they concluded that threatening communication may be an effective approach to counter anti-vaccination attitudes. One of the problems with this study was that the manipulation was not valid: the conditions differed on many variables, any of which may explain the results they found.

After I deliberated for a while whether to inform the authors of the blog post, I decided to do so in the spirit of academic debate, transparency, and learning from each other. Zachary swiftly replied, and one of the things he did was correct my assumption that they did not share their data. They did actually share their data! I think that’s very commendable; I strongly believe that all researchers should Fully Disclose. Zachary posted the data at the excellent (and free) Open Science Framework repository, specifically at http://osf.io/nx364. After having downloaded the data, I decided to write a brief follow-up post about matching of conditions and validity of manipulations.