Openness with data: The time has come
Like it or not, the time is rapidly approaching when social psychologists will be expected or required to make their data freely available to other scientists. The very idea of being required to share their data strikes fear and outrage in the hearts of many researchers.
Two concerns typically arise when the topic is broached. First, psychologists fear that others will discover inadvertent errors or other problems in their data analyses, leading to public embarrassment or humiliation. This fear is reinforced by the current ethical principles of the American Psychological Association, which require that psychologists share their data for the purposes of verifying substantive claims through reanalysis. Those who obtain data for this purpose may not use it for any other purpose unless they obtain prior written agreement. In other words, by current ethical standards, the only reason we must share our data is so that others can check to see if we have done anything wrong. No wonder people drag their feet when they receive requests for their data.
Second, psychologists often fear that other scientists will use their data to write articles that they had intended to write themselves. It’s bad enough to be scooped by independent research; it’s terrible to think of being scooped with one’s own data. Some data are expensive and time-consuming to collect, or involve samples of research participants that are hard to access. Longitudinal data can take years, or decades, to collect. Why should another researcher get credit for studies based on my data, obtained through my own efforts, and often the result of grant proposals I wrote? Publication of original research is the coin of the realm. To be forced to give away one’s data before one has completely milked it for publications seems downright unjust.
On the other hand, there are some very good reasons why social psychologists should, and will, share their data in the future. First, many of us may soon have no choice. Since 2003, NIH has had a data sharing policy for grants exceeding $500,000 (http://grants.nih.gov/grants/policy/data_sharing/). Word from people in the know indicates that NIH plans to extend this policy to all NIH grants. Since 2011, NSF has required a full data management plan for all proposals (http://www.nsf.gov/fga/dias/policy/dmp.jsp). NIMH has convened scientific working groups to consider how electronic sharing of data could improve research and practice. These developments have clear implications for psychologists.
More important than changes in funding agency policies is the potential benefit to our science that results from sharing data. I recently chaired a task force on data sharing for APA’s Publication and Communication Board. I must admit, I tried to get out of doing this, and began the task with the same fears and sense of injustice about the idea of being required to share my data as many others. However, the discussion among the task force members changed my mind.
Sharing data with other scientists can have tremendous benefits for our science. When data sets are available to other scientists, they can easily be used to test new hypotheses, including by graduate students and researchers at smaller institutions who lack the infrastructure to collect their own data. Data can more easily be synthesized for meta-analysis. The generalizability of particular findings across labs and samples can more easily be explored. When data are archived in a repository, they can be analyzed later with new, more powerful or integrative techniques than were available at the time of data collection. Finally, sharing data encourages a culture of openness and accountability in scientific research.
This last point is not something we should take lightly. The recent interim report by the Levelt Committee indicated that Diederik Stapel fabricated data for dozens of studies over about 17 years, with untold costs for the careers of young scientists, for our science, and for public trust in science. The fact that this fraud continued in our best journals for so many years suggests that something is not working in our field.
Openness and accountability achieved through sharing data will not completely solve the problem of data fabrication, but they can help. Recent criticisms of our science, such as the article by Ben Carey in the New York Times, suggest that we must get in front of this problem, leading through example rather than dragging our heels as funding agencies and federal laws force us to change.
It seems to me that one shift in our culture that would encourage data sharing is recognition that collecting data is an important contribution to science. If data sets were considered citable contributions, then researchers could get credit in the form of a citation each time their data were used in a secondary analysis. Some researchers might find that their data sets are cited more than their articles. To be sure, it will take time to convince tenure and promotion committees to consider citations of data to be significant indicators of the impact of a scientist’s work, but I believe this culture change can, and will, happen.
The APA P&C task force on data sharing developed a draft set of principles that could guide the move toward more data sharing. The task force recognizes the questions that the draft policy raises: where will data be deposited, will it be permanent, will it be interpretable, what about human subjects protections, who will have access, and how can people get credit for their data when it's used by others? Implementation will surely be complicated, but I think we must begin to develop answers to these questions.
This is the moment for social psychology to take the lead on this issue. Doing so would both advance our science and help re-establish our credibility as scientists. I hope all the social and personality psychology societies (SPSP, SESP, EASP, ARP) and social psychology journals, including JPSP, agree that we need to move in this direction, and begin thinking about how to implement it.
Draft Principles proposed by the APA P&C Board task force on data sharing, with commentary
- APA believes that sharing data promotes science.
- APA journals policy requires that, for articles published in APA journals, authors share the data on which the article is based.
- It is the responsibility of the author to find and deposit data on a hosting site in usable, interpretable form.
- The original author and the secondary user of the data both are responsible for protecting individual participants’ privacy and confidentiality of the data.
- The secondary user of data must acknowledge the original source of the data and may not transfer those data to any other individuals.
There are many compelling scientific reasons for sharing data. Sharing data within the larger scientific enterprise promotes hypothesis generation and testing, programmatic decision-making, and determining the generalizability of particular findings; opens up the data for analysis with new, more powerful or integrative techniques than were available at the time of collection; allows aggregation for the purposes of knowledge synthesis; and encourages a culture of openness and accountability in scientific research.
All authors of articles published in APA journals should participate in data sharing activities as long as sharing and linking data do not violate the privacy rights or confidentiality of data on identifiable research participants. The responsibility for protecting the confidentiality of the data and the rights of research participants lies both with the original author and with any subsequent scientist using the data for new purposes (secondary user). Data that have more potential to reveal subject identity need additional security. Both the original author and the secondary user of the data are responsible for ensuring that an appropriate level of security protecting participants' rights is in place. Sharing of data must comply with federal and institutional guidelines.
If an author knows prior to the publication of an article that it will not be possible to share the data on which the article is based, that situation should be disclosed to the journal editor prior to the publication of the article.
It is the responsibility of the author(s) to make data published in APA journals available to the scientific and academic community in usable, interpretable form. APA expects that authors preserve their data in a permanent archive so that their data can be available to scientists indefinitely.
Data should be archived at least at the level of detail used for analyses reported in the article. The archive should include metadata such as, but not limited to, code books, user manuals, and analysis procedures. The data archive should include the first data transformation: for example, cortisol scores rather than the biological samples themselves, and diagnostic scores rather than interview transcripts or biological samples.
Data sharing arrangements must comply with copyright restrictions, consent provided by participants, requirements of funding agencies, and rules promulgated by the employer of the holder of the data.
Secondary users must acknowledge the original source of the data and may not transfer the data to any other individuals. Sharing of data does not entitle the original author to authorship on articles generated by secondary users, nor should it preclude the possibility of authorship. Authorship of the original data must be cited in the methods section and in the reference list, with appropriate DOIs or URIs.