Should you share your neuroimaging data?
Increasingly, journals are incentivizing authors to share their data – and in some cases, it is becoming required for publication. Given this, it is important to understand and evaluate the pros and cons of sharing your data.
The pros are:
- Increased citations. Some researchers might publish using your data and cite you. Even if they do not publish using your data, they might run a rudimentary analysis to compare with their own results and cite you. On average, there is a 25% citation boost for linking your data to your publications.
- Create new collaborations. Some researchers might contact you to assist them in reanalyzing your data for a specific question that might not have been relevant to you at the time of publication – or they might want to submit a grant application with you to do so.
- You might get another publication out of publishing your data. For example, Scientific Data, a Nature Group journal, focuses on the publication of data only: you describe your data and how it is publicly released, and provide a detailed description and basic quality metrics.
- It will be easier for you or your collaborators to reuse that data. Admittedly, the person most likely to reanalyze that data in the future is probably you. If you ever have to reanalyze it, having it properly formatted and readily available online could save you dozens of hours of headaches: digging through old hard drives, CDs, or even archival tapes, re-contacting the student who acquired the data and has since moved to another job, and so on.
- You might increase your chances of getting funding. This is especially true for agencies such as the NIH (National Institutes of Health in the USA) that promote such practices. A track record of releasing your data will show that you mean business!
- You are benefiting science. There is probably a reason you became an academic scientist that goes beyond personal gain. Perhaps it was to inspire new generations or to advance human knowledge. There is little doubt that sharing your data is more aligned with these goals than not sharing it.
The cons are:
- Preparation time. It takes time to format your data so it can be shared online. Admittedly, some researchers simply dump their files with no documentation on NITRC, figshare, OSF, or another repository that does not impose any data formatting. However, for neuroimaging data, we would advocate the use of repositories such as OpenNeuro that enforce formats such as BIDS. There are now many software tools that format your data as BIDS in just a few clicks. For example, for EEG, which is my area of research, there is an EEGLAB plugin that will automatically convert raw imported data files to BIDS. Formatting your data as BIDS will ensure that it contains all the documentation necessary for reuse and that it is compatible with standard processing pipelines.
- Data, especially when you have a lot of it, is power. We all know researchers who have acquired a couple of dozen neuroimaging datasets and restrict access to a handful of collaborators who might publish with them, so that they can be co-authors. Releasing the data publicly would mean losing potential publications and collaborations. If you are one of these researchers and it works for you, then you should probably continue to do that. However, a large number of researchers cling to their data just in case such an opportunity comes up. If you are one of those who retain access to their data just in case, we would argue that the potential benefits of publicly releasing your data outweigh the potential loss. First, you are probably more likely to spur collaborations with your data online. Second, if you do not format and release your data once you are finished with the experiment, doing so a couple of years down the road will be much harder.
I hope I have convinced you to share your data, if not for the public good, then for your own benefit.
Sharing data is just the first step. The second step is to share your analysis pipeline, and hopefully I will get to write a future blog post on this topic.
- Arnaud Delorme, Ph.D. – shared his first study containing raw EEG datasets from 16 participants in 2002 on his personal website