CORE and partner cooperate to build AI Chemist

CORE and its project partner are extremely pleased to announce a new research collaboration funded by the Norwegian Research Council.

Discovering scientific insights about a specific topic is challenging, particularly in a field like chemistry, which is one of the five most published fields, with over 11 million publications and 307,000 patents. The team at CORE's project partner has spent the last five years building an award-winning AI engine for scientific text understanding. Its patented algorithms for identifying text similarity, extracting tabular data and creating domain-specific entity representations make the team a world leader in this domain.

The AI Chemist project is a collaboration between CORE's project partner and The Open University, Oxford University, Trinity College Dublin and University College London. CORE is a not-for-profit platform delivered by The Open University in cooperation with Jisc that hosts the world’s largest collection of open access scientific articles. As of February 2022, the CORE dataset provides metadata (title, author, abstract, publication year, etc.) for approximately 210 million articles and the full text of 29.5 million articles.

Working in partnership with CORE developers and researchers, CORE's project partner will now draw on the vast quantity of research papers available in the CORE dataset. The dataset will be used to improve the quality of text extraction from scientific literature in chemistry-focused domains. The output of this phase will support the partner and the AI Chemist in understanding reasoning and inference across research papers.

Currently, the state of the art in the chemical domain is a combination of direct manual evaluation of text documents, social networks and curated but incomplete databases. The manual nature of these approaches makes the discovery of novel application areas immensely time-consuming. The goal is to develop a set of algorithms that can machine-read vast amounts of scientific literature and data, detect mentions of entities of interest (such as chemical products, compounds, properties, processes and applications) and the relations between them, and connect these pieces of information to build an increasingly complex knowledge graph.
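To make the idea concrete, the following is a minimal sketch of the detect-and-connect step, not the project's actual method. It assumes a tiny hand-written lexicon (`LEXICON`), invented example sentences, and sentence-level co-occurrence as a stand-in for real relation extraction; all names in it are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon: surface form -> entity type.
# A real system would learn entity representations rather than use a fixed list.
LEXICON = {
    "graphene": "compound",
    "titanium dioxide": "compound",
    "conductivity": "property",
    "photocatalysis": "process",
    "solar cells": "application",
}


def find_mentions(sentence):
    """Return sorted (term, type) pairs for lexicon terms found in the sentence."""
    text = sentence.lower()
    found = []
    for term, entity_type in LEXICON.items():
        # Whole-word match so that short terms do not fire inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append((term, entity_type))
    return sorted(found)


def build_graph(sentences):
    """Link entities that co-occur in a sentence; edge weight counts co-occurrences."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        terms = [term for term, _ in find_mentions(sentence)]
        for a, b in combinations(terms, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


# Invented sentences, used only to exercise the sketch.
sentences = [
    "Graphene improves conductivity in thin films.",
    "Titanium dioxide is widely used for photocatalysis.",
    "Graphene electrodes raise the conductivity of solar cells.",
]

graph = build_graph(sentences)
for node in sorted(graph):
    print(node, dict(graph[node]))
```

Each new sentence either adds nodes or strengthens existing edges, which is the sense in which such a graph grows "increasingly complex" as more literature is read.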

Dr Ronin Wu, Research Lead and Head of Research Collaboration at CORE's project partner, said: “[We] are extremely pleased to be partnering with CORE on the AI Chemist project and we’re looking forward to seeing some exciting new developments with our AI models.”

Dr Petr Knoth, Head of CORE and Senior Research Fellow in Text and Data Mining, said: “This cooperative research project will put CORE at the forefront of the global effort to create open scholarly knowledge graphs. As part of this project we will use state-of-the-art machine learning approaches to address problems including topic/theme extraction, affiliation extraction, deduplication and citation function detection. With the demise of Microsoft Academic Graph at the end of 2021, we see on a daily basis how much this is in demand among CORE users.”

