In recent years, the scientific tools at our disposal have advanced remarkably. As a result, the rate of scientific discovery and the amount of data produced by molecular biologists and proteomic specialists have been astounding. Projects such as The Cancer Genome Atlas and the ENCODE Project have generated billions of data points, giving both the original researchers and other investigators the opportunity to use these results in their own work to advance our knowledge of biology and biomedicine. This data explosion has challenged scientists and funding agencies to come up with new models for handling this massive amount of data as efficiently as possible.
To tackle this challenge, the National Institutes of Health (NIH) has created the Big Data to Knowledge (BD2K) initiative to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement. So far this year, the NIH has invested $32 million in BD2K, with an additional $624 million expected to be injected into the project by 2020.
According to NIH director Francis S. Collins:
Mammoth data sets are emerging at an accelerated pace in today’s biomedical research and these funds will help us overcome the obstacles to maximizing their utility. The potential of these data, when used effectively, is quite astounding.
Note Dr. Collins’ use of the words “when used effectively.” Effective use and analysis of massive data sets require open collaboration between scientists across disciplines and nationalities. Governments play a critical role in facilitating such collaboration, but science-friendly collaborative policies are not always forthcoming. Furthermore, the lack of standards for many types of data, and the low adoption of existing standards across the research community, have proven to be significant obstacles to the efficient use of Big Data. In addition, many scientists do not have the opportunity or facilities to use Big Data and have not been trained in the computational skills needed to access and analyze large data sets.
Let’s hope that the recent grants awarded by the NIH strengthen the effective use of Big Data so that the time and effort spent in creating this data does not go to waste.