Why estimations of determinant relevance should not be based on regression analysis

This is a draft, written as a contribution to a discussion (itself a response to an earlier discussion) in the Psychological Methods Discussion Group on Facebook.

The reason regression analyses aren’t a useful tool to determine the relative relevance of each behavioral determinant has three components.

First, the Health Psychology theories we use in determinant studies (e.g. the Reasoned Action Approach) usually explicitly state that the determinants of behavior are correlated. This is true more widely: behavioral determinants are generally associated with each other. This isn’t problematic in itself, but it is something to take into account.

Second, when developing behavior change interventions, we almost always develop multi-component interventions. This is an extremely applied field: there’s very little experimental pre-testing of intervention components. People usually just figure out which determinants to focus on, match these to behavior change methods, and then combine applications of those methods into one behavior change intervention, which is then administered as a whole. These are usually one-shot endeavours in which you want to make the most of limited resources, so people try to carefully select the determinants to focus on, never being sure in advance whether changing all (or any) of them will be successful.

Third, the regression coefficients are based only on those parts of each predictor that are unique. This means that overlapping explanations are removed. Given that determinants are associated (they describe overlapping parts of human psychology), the regression coefficients do not pertain to the original operationalisation of the construct, but rather to a ‘sub-construct’ that differs from the original construct in some way (some unknown way, unless you deliberately explore this further).
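In econometrics, the statement that a regression coefficient depends only on the unique part of its predictor is known as the Frisch-Waugh-Lovell theorem. The minimal Python sketch below illustrates it with simulated, entirely hypothetical data: two determinants X and Y that share a common source, and a behavior that both predict. It shows that the multiple regression weight of X equals the slope you get when you first strip away everything X shares with Y and then regress behavior on what is left, and that this weight is clearly smaller than the bivariate slope. The helper names (`ols_two_predictors`, `residualise`) are made up for this example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ x1 + x2 (with intercept), solved from the covariances."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def residualise(x, z):
    """What is left of x after regressing it on z: its 'unique' part."""
    slope = cov(x, z) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    return [xi - (intercept + slope * zi) for xi, zi in zip(x, z)]

# Hypothetical data: two correlated determinants that share a common source
random.seed(1)
n = 5000
shared = [random.gauss(0, 1) for _ in range(n)]
det_x = [s + random.gauss(0, 1) for s in shared]
det_y = [s + random.gauss(0, 1) for s in shared]
behavior = [0.4 * x + 0.4 * y + random.gauss(0, 1) for x, y in zip(det_x, det_y)]

b_x, b_y = ols_two_predictors(det_x, det_y, behavior)
x_unique = residualise(det_x, det_y)
b_x_unique = cov(x_unique, behavior) / cov(x_unique, x_unique)
b_x_bivariate = cov(det_x, behavior) / cov(det_x, det_x)

print(f"bivariate slope of X:            {b_x_bivariate:.4f}")
print(f"multiple regression weight of X: {b_x:.4f}")
print(f"slope on the unique part of X:   {b_x_unique:.4f}")  # equals b_x up to rounding
```

The last two numbers coincide because the regression weight is, by construction, the slope on the residualised predictor; whatever X shares with Y simply does not enter into it.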

For example, let’s say we have four items, A, B, B’ and C, of which B and B’ are strongly correlated. A and B together form the operationalisation of X, and B’ and C form the operationalisation of Y. If we entered X and Y as predictors in a regression analysis predicting behavior (or, as is often done, the intention to perform a behavior), the regression coefficient of X would not reflect the association of X with behavior, but essentially that of A with behavior, and the regression coefficient of Y would essentially reflect the association of C with behavior. The overlap between X and Y in their prediction of behavior (B and B’) would be removed from these terms.
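A small simulation can make this four-item example tangible. In this hypothetical sketch, B and B’ correlate about .99, A and C are independent of everything else, and behavior is simply the sum of all four items plus noise. It compares the bivariate correlation of X with behavior to the standardised regression weight of X, and checks which items the ‘unique’ part of X (X with Y partialled out) still resembles.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

# Hypothetical items: B and B' share a common source, A and C are independent
random.seed(42)
n = 20000
common = [random.gauss(0, 1) for _ in range(n)]
A = [random.gauss(0, 1) for _ in range(n)]
C = [random.gauss(0, 1) for _ in range(n)]
B = [c + random.gauss(0, 0.1) for c in common]
B_prime = [c + random.gauss(0, 0.1) for c in common]

X = [a + b for a, b in zip(A, B)]           # operationalisation of X
Y = [bp + c for bp, c in zip(B_prime, C)]   # operationalisation of Y
behavior = [a + b + bp + c + random.gauss(0, 1)
            for a, b, bp, c in zip(A, B, B_prime, C)]

r_x, r_y, r_xy = corr(X, behavior), corr(Y, behavior), corr(X, Y)
# Standardised weight of X in behavior ~ X + Y (closed form for two predictors)
beta_x = (r_x - r_xy * r_y) / (1 - r_xy ** 2)

# Unique part of X: X minus its linear prediction from Y
# (intercept omitted; it does not affect correlations)
slope = cov(X, Y) / cov(Y, Y)
X_unique = [x - slope * y for x, y in zip(X, Y)]

print(f"r(X, behavior) = {r_x:.2f}, standardised weight of X = {beta_x:.2f}")
print(f"X:        r with A = {corr(X, A):.2f}, r with B = {corr(X, B):.2f}")
print(f"X unique: r with A = {corr(X_unique, A):.2f}, r with B = {corr(X_unique, B):.2f}")
```

In the original X, A and B contribute about equally; in the unique part of X, A clearly dominates. The unique part is not purely A, though: in this simulation it still carries roughly half of B (and some of C, with a negative sign), which is exactly why it is hard to say what the resulting ‘sub-construct’ is.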

In terms of commonly used determinants of behavior, this could mean, for example, that those aspects of self-efficacy that are associated with some attitudinal beliefs are ‘removed’ from the self-efficacy construct (and those aspects of attitude that are strongly associated with those same aspects of self-efficacy in their explanation of behavior would also be ‘removed’ from attitude, at least from the ‘attitude’ construct to which the regression coefficient of attitude would pertain).

To make this even more concrete, take the following example. Imagine we’re developing an intervention to promote the use of hearing protection, specifically earplugs, in Dutch nightlife settings. Further, imagine that earplugs are generally quite uncomfortable; diminish the experience of the music somewhat; are easy to carry; and don’t hinder conversation. People who have experience with using earplugs are aware of this, but people who don’t use earplugs have their own ideas, which are sometimes sensible but often not.

An intervention developer does a determinant study and in the questionnaire measures the following two attitudinal beliefs:

  • “When I use earplugs, I experience an uncomfortable feeling in my ears.”
  • “When I use earplugs, my experience of the music is diminished.”

And the following two self-efficacy beliefs:

  • “Earplugs are easy to carry.”
  • “Earplugs hinder conversation.” (reverse coded)

The first two are aggregated with two other beliefs into the Attitude measure, and the latter two are aggregated with two other beliefs into the Self-Efficacy measure. The intervention developer then conducts analyses to determine which of the eight determinants (of which Attitude and Self-Efficacy are two) she should target in her intervention (as always, she has limited resources and so cannot target all determinants).

Correlation analyses show that Attitude correlates .5 with behavior, and Self-Efficacy correlates .7 with behavior. Other determinants have correlations between -.1 and .4. Let’s assume the sample size is so large that even the 99.99% confidence intervals have margins of error of only .1. In that case, if the researcher were to pick only one determinant, Self-Efficacy would make the most sense: it is the strongest predictor, so successfully changing Self-Efficacy would have the largest impact on behavior.
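For readers who want to check what such margins of error imply for the sample size, here is a minimal sketch. It assumes the common Fisher z-transformation for confidence intervals around a Pearson correlation (the scenario does not specify a method) and uses only the Python standard library; `correlation_ci` and `n_for_margin` are hypothetical helper names.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.9999):
    """Approximate confidence interval for a Pearson correlation (Fisher z method)."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)  # back to the r scale

def n_for_margin(r, margin, level=0.9999):
    """Smallest n for which neither side of the interval around r exceeds `margin`."""
    n = 4
    while True:
        lo, hi = correlation_ci(r, n, level)
        if max(r - lo, hi - r) <= margin:
            return n
        n += 1

for r in (0.5, 0.7):
    print(f"r = {r}: n needed for a 99.99% margin of .1 is {n_for_margin(r, 0.1)}")
```

Because the interval is symmetric on the z scale but not on the correlation scale, the required sample size depends on r: weaker correlations need larger samples for the same margin, so `n_for_margin(0.7, 0.1)` comes out well below `n_for_margin(0.5, 0.1)`.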

However, the four beliefs listed above are correlated with each other: people with more experience using earplugs will hold all four beliefs, whereas people with no experience will not necessarily hold these beliefs. Because these beliefs are correlated, Attitude and Self-Efficacy are correlated, and part of this shared variance also overlaps in the prediction of behavior. Therefore, in a regression analysis, the regression coefficient of Attitude will only reflect the association between the other two attitudinal beliefs and behavior, and the regression coefficient of Self-Efficacy will only reflect the association between the other two self-efficacy beliefs and behavior. As a consequence, the regression coefficients of Attitude and Self-Efficacy don’t represent the association of each construct with behavior.

The regression coefficients reflect the associations of constructs other than the ones that were originally operationalised. A part of the variance of those original constructs has been removed, and thereby, the construct as reflected by the regression coefficients is altered.
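How much of each determinant is removed can at least be quantified, even if what remains cannot be named. A minimal sketch, assuming you have the correlation matrix of your determinants: for each determinant, the R² from regressing it on all the other determinants is the share of its variance that a multiple regression sets aside (it equals 1 - 1/VIF, the variance inflation factor). The correlations below are invented for illustration and do not come from any study; `invert` and `shared_variance` are hypothetical helpers.

```python
def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def shared_variance(corr_matrix):
    """R^2 of each predictor regressed on all other predictors (= 1 - 1/VIF)."""
    inv = invert(corr_matrix)
    return [1 - 1 / inv[i][i] for i in range(len(corr_matrix))]

# Hypothetical correlations between three determinants (not from any real study)
names = ["Attitude", "Self-Efficacy", "Perceived Norm"]
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
for name, share in zip(names, shared_variance(R)):
    print(f"{name}: {share:.0%} of its variance is set aside by the regression")
```

With only two determinants this reduces to their squared correlation; with more, it shows per determinant how much of the original construct is missing from the ‘sub-construct’ its regression coefficient pertains to.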

Of course, ‘in real life’, intervention developers won’t know all this; we only know that our determinants are correlated (and are supposed to be, theoretically), and that regression analysis doesn’t know to which determinant overlapping explained variance ‘belongs’, and therefore removes the overlapping variance from the equation (literally, hehe :-)). The result is that you end up with other constructs, and you don’t know exactly which ‘sub-constructs’ you end up with.

This makes it hard to identify methods of behavior change that can effectively target those determinants. After all, in the example above, what exactly is self-efficacy once you’ve removed the influence of past experience?

Therefore, ranking of determinants in terms of relevance for a target behavior should be based on bivariate associations.
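A ranking based on bivariate associations is straightforward to compute. The sketch below is one possible way to do it, not a prescribed procedure: it correlates each determinant with behavior, adds Fisher z confidence intervals so you can see whether adjacent ranks are actually distinguishable, and sorts by the absolute size of the correlation (an assumption, since a strong negative association can be just as relevant). The data are simulated and purely hypothetical, and `rank_determinants` is an illustrative name.

```python
import math
import random
from statistics import NormalDist

def corr(a, b):
    """Pearson correlation of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def rank_determinants(scores, behavior, level=0.95):
    """Rank determinants by the absolute size of their bivariate correlation with behavior."""
    n = len(behavior)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ranking = []
    for name, values in scores.items():
        r = corr(values, behavior)
        z, se = math.atanh(r), 1 / math.sqrt(n - 3)
        ranking.append((name, r, math.tanh(z - crit * se), math.tanh(z + crit * se)))
    return sorted(ranking, key=lambda row: abs(row[1]), reverse=True)

# Hypothetical data: only the correlations matter here, not the causal direction
random.seed(7)
n = 1000
behavior = [random.gauss(0, 1) for _ in range(n)]
scores = {
    "Self-Efficacy": [0.7 * b + random.gauss(0, 0.7) for b in behavior],
    "Attitude": [0.5 * b + random.gauss(0, 0.85) for b in behavior],
    "Perceived Norm": [0.1 * b + random.gauss(0, 1) for b in behavior],
}
for name, r, lo, hi in rank_determinants(scores, behavior):
    print(f"{name:<15} r = {r:.2f}  [{lo:.2f}; {hi:.2f}]")
```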

Author: Gjalt-Jorn Peters

Gjalt-Jorn Peters works at the Dutch Open University, where he teaches methodology and statistics, and does research into health psychology, specifically behavior change. He currently works on Party Panel, a Dutch study into party behavior, Smoking Synthesis, a literature study to map what we know about reasons people start or stop smoking, his company, Greater Good, and annoying everybody around him by trying to get them to use R, partly by working on R package userfriendlyscience. He lives in Maastricht with his girlfriend and five guinea pigs (for now), and is allergic to cats. And some people.
