Appropriate humility: choosing sides in the alpha wars based on psychology rather than methodology and statistics

[ Note: this is a first draft, a preprint of a blog post so to speak 🙂 ]

A recent 72-author preprint proposed to recalibrate when we award the qualitative label ‘significant’ in research in psychology (and other fields) such that more evidence is required before that label is used. In other words, the paper proposes that researchers have to be a bit more certain of their case before proclaiming that they have found a new effect.

The paper met with resistance, and although any proposal for change usually does, what’s interesting is that in this case, the resistance came in part from researchers involved in Open Science (the umbrella term for the movement to mature science through openness, collaboration and accountability). Since these researchers often fight for improved research practices ‘at all costs’, this resistance seems odd.

Thus ensued the Alpha Wars.

When wishful thinking kills: the tragic consequences of misplaced faith in introspection

[Image by Silver Blue]

[These are some thoughts that I’ll eventually work into a paper, so it may be a bit rough/drafty]

Psychology is characterized by an interesting paradox. On the one hand, it’s a very popular topic. After all, everybody’s a person, and the most important influences in most people’s worlds are other people. Who doesn’t love learning about oneself, one’s loved ones, one’s boss, and the leaders of one’s country? People are endlessly complex, so psychology and psychological research provide a veritable fount of knowledge.

On the other hand, that complexity of human psychology is tenaciously denied. It is almost as if that complexity is seen rather like a spiritual entity, safe to invoke whenever it’s convenient to stare in wonder at the awesome quirks of nature and never-ending weirdness of people, but blissfully disregarded whenever it is threatening or gets in the way of day-to-day activities.

How to select junior (or other) researchers, and why not to use Impact Factors

[ UPDATE: a commentary based on this blog post has now been published in the Journal of Informetrics. ]

Recently a preprint was posted on arXiv to explore the question “Can the Journal Impact Factor Be Used as a Criterion for the Selection of Junior Researchers?”. The abstract concludes as follows:

The results of the study indicate that the JIF (in its normalized variant) is able to discriminate between researchers who published papers later on with a citation impact above or below average in a field and publication year – not only in the short term, but also in the long term. However, the low to medium effect sizes of the results also indicate that the JIF (in its normalized variant) should not be used as the sole criterion for identifying later success: other criteria, such as the novelty and significance of the specific research, academic distinctions, and the reputation of previous institutions, should also be considered.

In this post, I aim to explain why this is wrong (and, moreover, how following this recommendation may retard scientific progress), and I have a go at establishing a common-sense framework for researcher selection that might work.

Theories versus logic models

This post responds to a question by Matti Heino, partly phrased in this Facebook post, and partly in this presentation.

Wow, good question and points!!!

I’d say, in response to slide 28: yes, they are. A logic model is not a theory. I define a logic model in this context as a model that is built from theories and empirical evidence to try and explain one very specific, bounded scenario. I define a theory as a generic constellation of constructs and (e.g. causal) relationships between those constructs. (PN, e.g., is not a theory).
The goal of theory is to derive abstract laws about reality. Their level of abstraction grants them value; gravity works in general, not only in Padova. “Attitudes predict human behavior” is a theoretical statement. “Attitude predicts physical activity in my specific subgroup” is no longer a theoretical statement: whether it’s true or not tells us little about reality in general.
So, the logic model you construe for an intervention, which you base on theory (but where you deliberately omit variables that are irrelevant in your specific situation, even though you know they can be important predictors of behavior), and which you ‘fill in’ using empirical evidence regarding the beliefs (‘change objectives’ in Intervention Mapping lingo), is not a theory. It’s also not something to evaluate in your intervention evaluation.
It’s something to study BEFORE intervention development (step 2 of Intervention Mapping).
Then, once you have your logic model of change (as IM calls it), you move forward and start matching the relevant determinants to theory. If you don’t know in advance which determinants (and which sub-determinants or beliefs) you should target with your behavior change methods, your chances of success are already diminished before you have even started.
So, this is not a matter of testing theory. Intervention evaluation is not fundamental/basic science. It’s application of science. You’re under no obligation to contribute to theory – in fact, you have the wrong design for contributing to theory. Your presentation clearly shows why this is the case.
If you want to test theory, design a study to test theory.
(Similarly, if you’re curious about mediation, design a study to test mediation – i.e. a factorial experiment with multiple measurement moments – and since I haven’t checked that paper (“what’s the mechanism”) recently, you might need even more.)
People commonly respond to this by expressing exasperation that it all has to be so complicated. I sympathize, but believe that nobody’s served by conducting invalid science because that keeps things fun and easy.
Only learning one or two things from a study, even one with a huge dataset, is fine. Knowledge is valuable, so it’s ok to have to work for it 🙂

Why one randomization does not a successful experiment make

Following a PsyArXiv preprint with the admittedly slightly provocative title “Why most experiments in psychology failed: sample sizes required for randomization to generate equivalent groups as a partial solution to the replication crisis”, a modest debate erupted on Facebook (see here; you need to be in the PsychMAP group to access the link, though) and Twitter (see here, here, and here) regarding randomization.

John Myles White was nice enough to produce a blog post with an example of why Covariate-Based Diagnostics for Randomized Experiments are Often Misleading (check out his blog; he has other nice entries, e.g. about why you should always report confidence intervals over point estimates).

I completely agree with the example he provides (except that where he says ‘large, finite population of N people’ I assume he means ‘large, finite sample of N people drawn from an infinite population’). This is what puzzled me about the whole discussion. I agreed with (almost) all arguments provided, but only a minority of the arguments seemed to concern the paper. So either I’m still missing something, or, as Matt Moehr ventured, we’re talking about different things.
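To make concrete what the preprint’s title is about, here is a minimal simulation sketch. It is not the analysis from the preprint or from John’s post: the function name `imbalance_rate`, the single standard-normal nuisance variable, and the 0.2 standard deviation threshold for calling two groups ‘non-equivalent’ are all assumptions chosen purely for illustration.

```python
import random
import statistics


def imbalance_rate(n_per_group, threshold=0.2, n_sims=1000, seed=1):
    """Share of simulated randomizations in which the two groups differ by
    more than `threshold` population standard deviations on one nuisance
    variable (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    unbalanced = 0
    for _ in range(n_sims):
        # Draw a sample from a standard normal population (SD = 1) ...
        sample = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
        # ... and randomize participants to two equally sized groups.
        rng.shuffle(sample)
        group_a = sample[:n_per_group]
        group_b = sample[n_per_group:]
        diff = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
        if diff > threshold:
            unbalanced += 1
    return unbalanced / n_sims


for n in (20, 100, 500):
    print(f"n per group = {n}: share of unbalanced randomizations = {imbalance_rate(n)}")
```

Under these assumptions, the share of randomizations that leave the groups noticeably different on the nuisance variable shrinks as the groups get larger, which is the sense in which a single randomization offers little guarantee in a small sample.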

So, hoping to get to the bottom of this, I’ll also provide an example. It probably won’t be as fancy as John’s example, but I have to work with what I have 🙂

On the obsession with being normal

In statistics, one of the first distributions that one learns about is usually the normal distribution. Not only because it’s pretty, but also because it’s ubiquitous.

In addition, the normal distribution is often the reference that is used when discussing other distributions: right skewed is skewed to the right compared to the normal distribution; when looking at kurtosis, a leptokurtic distribution is relatively spiky compared to the normal distribution; and unimodality is considered the norm, too.

There exist quantitative representations of skewness, kurtosis, and modality (the dip test), and each of these can be tested against a null hypothesis, where the null hypothesis is (almost) always that the skewness, kurtosis, or dip test value of the distribution is equal to that of a normal distribution.

In addition, some statistical tests require that the sampling distribution of the relevant statistic is approximately normal (e.g. the t-test), and some require an even more elusive assumption called multivariate normality.

Perhaps all these bits of knowledge mesh together in people’s minds, or perhaps there’s another explanation; but for some reason, many researchers and almost all students operate on the assumption that their data have to be normally distributed. If they are not, they often resort to, for example, converting their data into categorical variables or transforming the data.
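The difference between normally distributed data and a normally distributed sampling distribution can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the sample size of 50, and the helper name `skewness` are illustrative choices, not something taken from the post itself.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


rng = random.Random(42)

# Raw data: 5000 draws from a strongly right-skewed (exponential) population.
raw_data = [rng.expovariate(1.0) for _ in range(5000)]

# Sampling distribution of the mean: 5000 means, each based on 50 draws
# from that same skewed population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(5000)
]

print("skewness of the raw data:    ", round(skewness(raw_data), 2))
print("skewness of the sample means:", round(skewness(sample_means), 2))
```

In this sketch the raw data are clearly skewed, while the distribution of the sample means is much closer to symmetric, even though every mean was computed from skewed data.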

Fear is a bad counsellor

[ primary audience: behavior change intervention developers ]

Threatening communication is a popular behavior change method, used on tobacco packaging, to promote seatbelt use, and to discourage substance use. However, much research also suggests that it is not the best weapon of choice when the goal is to really change behavior, or even when the goal is to raise awareness or educate people.

How is that paradox possible? This blog post will answer that question.

Why one-sided tests in psychology are practically indefensible

This post is a response to “One-sided tests: Efficient and Underused”, a post by Daniel Lakens, whom I greatly respect and, apparently up until now, always vehemently agreed with. So this post is partly an opportunity for him and others to explain where I’m wrong; dear reader, if you would take the time to point that out, I would be most grateful. Alternatively, telling me I’m right is also very much appreciated of course 🙂 In any case, if you haven’t done so yet, please read Daniel’s post first (also, see below this post for an update with more links and the origin of this discussion).

Health communication on tobacco packaging: fear is a bad counsellor

In this short post I want to explain what should be put on packs of cigarettes. I briefly explain why I am fiercely opposed to frightening images and texts; why they are so popular; and what I think should be put on packs of cigarettes instead. (In a hurry? Go straight to the bottom line.)

A non-representative sample says yes to MDMA

[ This post was originally written in Dutch, as it concerns a “study” by a Dutch TV channel, BNN ]

On Tuesday 22 September 2015, some alarming reports emerged:

Shocking fact number 1: 35% of those young people say they use more drugs because the alcohol age limit was raised! Damn.

[quote from Spuiten en Slikken]

Almost a third of young people use drugs every week, and a third use drugs monthly.

[quote]

These seem to be serious signals. Fortunately, closer inspection shows that the study on which these conclusions are based is unsuitable for drawing this kind of conclusion. There are six serious problems with this study: