Rethinking my Science
How will social and personality psychologists look back on 2011? With pride at having continued the hard work of unraveling the mysteries of human behavior, or with concern that the only thing that is unraveling is their discipline?
This question was addressed in a recent post by Sanjay Srivastava (2011a), which highlighted two potentially troubling events relating to social psychology that occurred in 2011:
- The paper on precognition by Daryl Bem in the Journal of Personality and Social Psychology (Bem, 2011). Experiments reported in this paper suggested that events that occurred in the future could cause events that occurred in the past.
- The scientific fraud case of social psychologist Diederik Stapel.
These events have created a substantial amount of controversy, and Srivastava makes a compelling case for their importance. But although these two events are significant and certainly deserve our attention, they are flukes rather than game-changers.
Bem’s precognition paper was probably a freak event, and it was demystified a bit by a panel of experts at an important symposium at the Society of Experimental Social Psychology meeting in October. The panel members reported a variety of studies that did not find evidence for precognition or other forms of ESP, and questioned the statistical methods used by Bem and others. Overall, far more empirical and theoretical work must be conducted before these findings can be accepted as valid.
The behavior of Diederik Stapel is reprehensible and reminds us all of what we simply cannot do. His actions have harmed a lot of us and will likely cause even further difficulties going forward. In response, many of us have spent time discussing the pressures of academia with our students and colleagues. But Stapel’s behavior is also a fluke. Some of us probably do fabricate data, but I imagine the numbers are relatively few.
As shocking as they are, neither of these events creates real problems for social psychologists; say, for a committee that is evaluating a social psychology candidate for promotion or tenure (unless, of course, he or she is claiming to have found precognition or has falsified his or her data!).
But three other papers published over the past two years must completely change how we think about our field and how we must conduct our research within it. And each is particularly important for me, personally, because each has challenged a fundamental assumption that was part of my training as a social psychologist.
First is the paper by Henrich, Heine, and Norenzayan (2010) on the use of WEIRD samples. These authors argue that our usual college student samples (Western, Educated, Industrialized, Rich, and Democratic) are just terrible – that we cannot expect them to generalize very far, and that we desperately need to expand them.
I was trained, and have always trained my students, that “since we can’t get a representative sample of the population of interest (everyone), then we might as well study college students as they are convenient.” This belief was so dear to me that I fully expected to find that the many respected scientists who responded to the target paper would buy into this logic – it just seems so reasonable! But with some exceptions (e.g., Gaertner, Sedikides, Cai, & Brown, 2010), they did not. It appears that many – even most – scientists agree that the samples used by social psychologists are flawed and that our conclusions are therefore invalid.
Although it has received little of our attention, to me this is an article that alters our entire approach. Unless we take the question of generalization across participants more seriously, we are indeed facing a crisis of confidence in our field. I dread the day when my faculty asks me to support the promotion of a social psychologist whose work is based entirely on college student samples. What can I possibly say in his or her defense? What do you plan to say?
Second, and equally troubling to me, is a finding, excellently summarized in a New Yorker article by Jonah Lehrer (2011), regarding the instability of observed effects. The basic phenomenon is that observed findings in the social and biological sciences weaken with time. Effects that are easily replicable at first become less so every day. Drugs stop working over time in the same way that social psychological phenomena become more and more elusive. The “decline effect,” or “the truth wears off” effect, is not easy to dismiss, although perhaps the strength of the decline effect will itself decline over time.
Frankly, I have difficulty getting my head around this idea (I’m guessing others do too), but it is nevertheless exceedingly troubling. I know that I need to replicate my effects, but I am often unable to do so. And perhaps this is part of the reason. Given the difficulty of replication, will we even continue to bother? And what becomes of our research if we do even less replicating than we do now? This is indeed a problem that does not seem likely to go away soon, and it represents a major challenge for us.
The third paper that has fundamentally changed my thinking, and one that is also noted by Srivastava, is the Psychological Science paper by Simmons, Nelson, and Simonsohn (2011). Simmons et al. have argued that scientists frequently engage in research practices that lead to false positives – that is, reporting findings that may not hold up when and if others attempt to replicate the studies.
This work has received substantial attention within social psychology (Kraus, 2011) in relation to the Stapel case, the nature of its recommendations for testing the legitimacy of data, and potential limitations in its survey methodology (Discussion Group, 2011). But its fundamental assertions are deep and long-lasting, and they have substantially affected me.
Although there are many ways that I take the comments to heart, perhaps most important to me is the realization that some of the basic techniques that I have long used to collect and analyze data – techniques that were taught to me by my mentors and which I have shared with my students – are simply wrong.
I don’t know about you, but I’ve frequently “looked early” at my data, and I think my students do too. And I certainly bury studies that don’t work, to say nothing of failing to report dependent variables that have been uncooperative. And I have always argued that the researcher has the obligation to write the best story possible, even if it may mean substantially “rewriting the research hypothesis.” Over the years my students have asked me about these practices (“What do you recommend, Herr Professor?”) and I have routinely, but potentially wrongly, reassured them that in the end, truth will win out.
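Why “looking early” matters can be made concrete with a small simulation (my own illustration, not taken from Simmons et al.): if you test after every batch of participants and stop as soon as the result is significant, the false-positive rate under a true null climbs well above the nominal 5%. A minimal sketch, assuming a one-sample z-test on pure-noise data:

```python
import random

def simulate_peeking(n_max=100, peek_every=10, n_sims=2000, seed=1):
    """Estimate the false-positive rate when a researcher tests the data
    repeatedly as participants accumulate and stops at the first
    'significant' result. The data are pure noise (the null is true),
    so the nominal rate should be .05."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        xs = []
        for i in range(1, n_max + 1):
            xs.append(rng.gauss(0, 1))  # one new "participant"
            if i % peek_every == 0:
                # One-sample z-test of the mean against 0 (sigma = 1 known)
                z = (sum(xs) / i) * (i ** 0.5)
                if abs(z) > 1.96:  # two-sided test at alpha = .05
                    false_positives += 1
                    break  # stop collecting: "it worked!"
    return false_positives / n_sims
```

Running this with a single test at the final sample size yields roughly the nominal .05 rate, while peeking every 10 participants inflates it several-fold – precisely the undisclosed flexibility that Simmons et al. warn about.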
In short, this important paper will – must – completely change the field. It has shined a light on the elephant in the room: we are publishing too many Type I errors, and we all know it. We need to step up to the plate and figure out what to do about it before more damage is done. Fortunately, recent information suggests that the editors at Psychological Science are discussing this issue carefully (Srivastava, 2011b).
Whew! What a year 2011 was – let’s hope that we come back with some good answers to these troubling issues in 2012.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.
Gaertner, L., Sedikides, C., Cai, H., & Brown, J. D. (2010). It’s not WEIRD, it’s WRONG: When Researchers Overlook Underlying Genotypes, they will not detect universal processes. Behavioral and Brain Sciences, 33(2-3), 93–94.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61–83.
Kraus, M. W. (2011). Friday fun: One researcher’s p-curve analysis [Weblog post]. Retrieved from http://psych-your-mind.blogspot.com/2012/02/friday-fun-one-researchers-p-curve.html
Lehrer, J. (2011). The truth wears off: Is there something wrong with the scientific method? The New Yorker. Retrieved from http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Srivastava, S. (2011a). Groundbreaking or Definitive: Journals need to pick one [Weblog comment]. Retrieved from http://spsptalks.wordpress.com/2011/12/31/groundbreaking-or-definitive-journals-need-to-pick-one/
Srivastava, S. (2011b). An editorial board discusses fMRI analysis and “false-positive psychology.” [Weblog comment]. Retrieved from http://hardsci.wordpress.com/2012/01/02/an-editorial-board-discusses-fmri-analysis-and-false-positive-psychology/