Mining New Insights from Unstructured Scientific Data

April 24, 2017

Drowning in Data

Technology is revolutionizing science and medicine. We see, hear, and experience the results of this revolution every day. Google has used search data to estimate flu activity faster than official sources. Physicians already use technology to diagnose and treat disease more precisely, down to the level of the individual patient.[1] And throughout the global healthcare system, the ability to analyze patient data has the potential to improve the quality of healthcare delivery and significantly reduce cost.[2]

There is no question that technology is advancing medicine in ways only dreamed of a decade ago. But there is an elephant in the room in the world's research and clinical labs.

“Science has data coming out of its ears,” Dr. Timo Hannay, Managing Director of Digital Science, noted in an article that appeared in WIRED. “Yet in this age of big data, science has a big problem,” Dr. Hannay said. “It is not doing nearly enough to encourage and enable the sharing, analysis, and interpretation of the vast swathes of data that researchers are collecting.”[3]

One reason is that there is simply too much information out there. In addition to what already exists, we create more than 2.5 quintillion bytes of new data every day.[4] At 32 gigabytes apiece, that is enough to fill roughly 78 million iPads daily.

Buried in all that data, however, are any number of potential breakthroughs. The challenge is how best to mine the critical data hidden away in countless clouds, hard drives, clinical reports, lab notes, and published papers, which keep growing at a mind-numbing pace and in formats that do not lend themselves easily to analytics.

Finding the Right Information

Scientists yearn for ways to easily search for the exact data they need and quickly understand the relationships between sources of information. In effect, users need to be able to find the precise data sources for their analysis just as easily as they can find the right product on Amazon.

This requires an intuitive search experience “that allows for faster identification of relevant datasets across disparate systems to create a rich visual data and information continuum meant to improve scientific insights and the quality of decisions,” explains Dr. Mark Demesmaeker, Vice President, Scientific Analytics, at PerkinElmer.

Considering that nearly 90% of information assets in a typical enterprise currently go untapped, this semantic search experience can lead to powerful findings in the scientific, pharmaceutical, and clinical worlds.[5]

PerkinElmer is the exclusive distributor of the Attivio® platform, which allows researchers to quickly identify and unify the relevant data sources for their analysis. These include structured, semi-structured, and unstructured content, and span both proprietary and public domain sources.

The platform samples critical content from the disparate sources to understand their implicit relationships, and it can suggest connections between related datasets even when those datasets do not directly reference each other. The resulting virtual “datamarts” can then be sent to the TIBCO Spotfire® visual analytics software to generate visually intuitive, interactive displays that support further collaboration and decision making.
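As a rough illustration of how sampled content can hint at a link between datasets that never reference each other, the sketch below compares sampled column values using Jaccard overlap. This is not Attivio's method, which the article does not describe. The function names, the threshold, and the two sample tables are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of values (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_links(left, right, threshold=0.5):
    """Return (left_col, right_col, score) for column pairs whose values overlap."""
    suggestions = []
    for lname, lvals in left.items():
        for rname, rvals in right.items():
            score = jaccard(lvals, rvals)
            if score >= threshold:
                suggestions.append((lname, rname, round(score, 2)))
    # Strongest candidate links first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical sampled columns from two sources with different column names.
assay_results = {
    "compound": ["CPD-001", "CPD-002", "CPD-003", "CPD-004"],
    "ic50_nm": [12.5, 240.0, 3.1, 88.0],
}
inventory = {
    "item_code": ["CPD-002", "CPD-003", "CPD-004", "CPD-005"],
    "shelf": ["A1", "A2", "B7", "C3"],
}

links = suggest_links(assay_results, inventory)
print(links)  # [('compound', 'item_code', 0.6)]
```

Here the two tables share no column names, yet the overlap in sampled values suggests that "compound" and "item_code" describe the same things and could be used to join them.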

Most of the information being created today is unstructured: it is captured in free text, PDFs, emails, journal articles, and other forms that were not easily used for analytics, until now. The Attivio platform incorporates advanced text analytics to fully unlock unstructured data. “Using text mining with scientifically focused ontologies, unstructured data can now be structured into tabular models which are easily utilized with leading analytics platforms like TIBCO Spotfire,” Demesmaeker says.
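To make the idea of turning free text into a tabular model concrete, here is a minimal sketch of dictionary-based tagging. It is an illustration under stated assumptions, not the Attivio or TERMite implementation: the tiny ontology, the `to_rows` function, and the sample sentence are hypothetical, and a real system would use far larger vocabularies and handle word boundaries and ambiguity.

```python
import re

# Hypothetical mini-ontology: lowercase surface form -> (entity type, preferred label).
ONTOLOGY = {
    "ca19-9": ("biomarker", "CA19-9"),
    "ca 19-9": ("biomarker", "CA19-9"),
    "pancreatic cancer": ("disease", "Pancreatic cancer"),
    "pancreatic adenocarcinoma": ("disease", "Pancreatic cancer"),
}

# Try longer surface forms first so longer synonyms win over shorter ones.
_PATTERN = re.compile(
    "|".join(re.escape(term) for term in sorted(ONTOLOGY, key=len, reverse=True)),
    re.IGNORECASE,
)


def to_rows(doc_id, text):
    """Return one table row (a dict) per ontology term found in the text."""
    rows = []
    for match in _PATTERN.finditer(text):
        entity_type, label = ONTOLOGY[match.group(0).lower()]
        rows.append(
            {"doc_id": doc_id, "type": entity_type, "label": label, "start": match.start()}
        )
    return rows


rows = to_rows("note-1", "Elevated CA 19-9 was observed in pancreatic adenocarcinoma.")
for row in rows:
    print(row)
```

Each row maps a synonym in the text to one preferred label, so "CA 19-9" and "CA19-9" end up in the same column value and can be counted or filtered in an analytics tool.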

The SciBite Connection

The latest addition to PerkinElmer’s solutions for big data is SciBite’s TERMite text analytics platform. It enriches text mining with dictionaries and ontologies containing millions of scientifically and medically relevant terms, greatly improving the quality of the text analytics output.

“Basically, it uses a comprehensive scientific understanding to extract the right information from unstructured data sources such as PubMed, empowering researchers with a complete view of all relevant information rather than a partial picture.

“Imagine asking the question, ‘How do the 84,000+ PubMed articles on pancreatic cancer cite CA19-9 as a suitable diagnostic biomarker for a specific sub-form?’ and trying to answer that by reading the top search results. The only way to really understand the full body of literature is for a computer to read all of the articles and provide text mining output for analytical and statistical analysis,” Demesmaeker says. “That is the real strength of the Attivio and SciBite expanded platform.”
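The kind of corpus-wide question described here can be approximated, very crudely, by counting how often two concepts are mentioned together across every document rather than only the top search results. The sketch below assumes the abstracts are already available as plain strings. The function name, the term lists, and the four sample sentences are invented placeholders, not real PubMed content, and simple substring matching stands in for proper text mining.

```python
from collections import Counter


def cooccurrence_summary(abstracts, disease_terms, marker_terms):
    """Count abstracts mentioning the disease, the marker, and both together."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        has_disease = any(term in lowered for term in disease_terms)
        has_marker = any(term in lowered for term in marker_terms)
        counts["total"] += 1
        counts["disease"] += int(has_disease)
        counts["marker"] += int(has_marker)
        counts["both"] += int(has_disease and has_marker)
    return counts


# Invented placeholder sentences standing in for real abstracts.
abstracts = [
    "CA19-9 levels were measured in patients with pancreatic cancer.",
    "Imaging findings in pancreatic cancer staging.",
    "CA19-9 in biliary disease.",
    "A study of unrelated cardiac markers.",
]

summary = cooccurrence_summary(abstracts, ["pancreatic cancer"], ["ca19-9", "ca 19-9"])
share = summary["both"] / summary["disease"]
print(dict(summary), share)
```

Counts like these are the sort of structured output that can then feed statistical analysis. They say nothing yet about whether an article endorses the marker, which is where richer text analytics would come in.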

References

  1. Eric Schadt, “The Role of Big Data in Medicine,” McKinsey Insights, November 2015, accessed February 7, 2017.
  2. Wullianallur Raghupathi, Viju Raghupathi, “Big Data Analytics in Healthcare: Promise and Potential,” Health Information Science and Systems, December 2014, ISSN 2047-2501, accessed March 3, 2017.
  3. Timo Hannay, “Science’s Big Data Problem,” WIRED, 2014, accessed February 7, 2017.
  4. Cory Vander Jagt, “Can You Find the Needle in the Haystack? Let’s Talk BI Data Discovery,” GoodData, January 28, 2015, accessed February 7, 2017.
  5. Douglas Laney, “Information Innovation Key Initiative Overview,” Gartner, April 22, 2014, accessed February 7, 2017.
