AstraZeneca researcher describes how SureChem addresses the problem of searching text for chemical structures
Jan 27, 2006
Reel Two's SureChem is highlighted in a January 2006 Drug Discovery Today article about the importance of mining chemical structures from drug literature. Author Debra L. Banville of AstraZeneca noted how an early alpha version of SureChem already addressed a number of challenges faced by researchers trying to find specific chemical compound information.
Such tools for mining structures from text have "significant business value" for AstraZeneca and other firms involved in drug discovery, Banville writes. "Because a typical project at AstraZeneca can involve 50-100 key patents, covering thousands of chemical structures, having structural information in a relational database significantly reduces the information into a manageable form."
The article describes the particular hurdles of correctly identifying chemical compounds in text, including non-standard naming conventions, mapping of generic or common names to structures, and linking index and reference numbers to chemical names. Banville also notes that the "selective (versus comprehensive) indexing policies of bibliographic chemical databases" mean researchers need to look further.
"In many instances, only full-text searching can provide the information required to build a chemical reaction database or map structure-activity relationships," she writes. "The challenge lies in the ability to take text written without any prescribed set of rules and develop an automated process capable of extracting meaningful information for evaluation by a knowledgeable end user."
The full article, entitled "Mining chemical structure information from the drug literature," can be found in Drug Discovery Today, Volume 11, Number 1/2, January 2006, pages 35-42.