Scientific knowledge graphs that cover the three important currencies flowing in the scientific world, i.e., publications (literature), datasets and code, can boost research-oriented search and recommendation systems.

I am Na Li, a PhD student in computer science at the University of Amsterdam, currently exploring notebook search for scientific users. In this post I will introduce knowledge graphs in the scientific world.


Currently I am working on notebook search, and one thing that can benefit a search system is a knowledge graph (KG). According to Wikipedia, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to integrate data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the semantics underlying the used terminology.
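
To make the definition concrete, here is a minimal sketch of a knowledge graph stored as (subject, predicate, object) triples. All entity and relation names (paper:P1, uses_dataset, and so on) are hypothetical and chosen only for illustration; they do not come from any of the graphs discussed in this post.

```python
from collections import defaultdict

# Hypothetical toy triples linking a publication, a dataset and a code repository.
TRIPLES = [
    ("paper:P1", "uses_dataset", "dataset:D1"),
    ("paper:P1", "has_code", "code:C1"),
    ("code:C1", "reads", "dataset:D1"),
    ("paper:P1", "has_topic", "topic:knowledge_graph"),
]


def build_index(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def neighbours(index, entity, predicate=None):
    """Return objects linked from `entity`, optionally filtered by predicate."""
    return [
        obj
        for pred, obj in index.get(entity, [])
        if predicate is None or pred == predicate
    ]


index = build_index(TRIPLES)
print(neighbours(index, "paper:P1"))                  # everything linked from the paper
print(neighbours(index, "paper:P1", "uses_dataset"))  # only the datasets it uses
```

Real knowledge graphs are usually stored in RDF or in a graph database rather than in a Python list, but the idea is the same: entities are nodes, and typed relations are the edges between them.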

In our case, we consider three important currencies flowing in the scientific world, especially in the data science domain: publications (literature), datasets and code. For each of these elements, there are knowledge graphs that describe it.

We will introduce five knowledge graphs, each of which covers at least one of the aforementioned elements.

Microsoft Academic Graph (MAG) [1] is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. Unfortunately, it stopped updating at the end of 2021.

Open Research Knowledge Graph (ORKG) [2] aims to describe research papers in a structured manner. It is an infrastructure that is still at an early stage.

PubMed knowledge graph (PKG) [3] targets the biomedical domain and contains bio-entities. It is constructed by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting the affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil.

Computer Science Ontology (CSO) [4] is a large-scale ontology of research areas that was automatically generated using the Klink-2 algorithm on the Rexplore dataset, which consists of about 16 million publications, mainly in the field of Computer Science.

Artificial Intelligence Knowledge Graph (AI-KG) [5] is a large-scale, automatically generated knowledge graph containing 14M RDF triples and 1.2M statements extracted from 333K research publications in the field of AI. It describes 857,658 research entities of 5 types (i.e., tasks, methods, metrics, materials, others) linked by 27 relations.

With the scientific knowledge graphs above, it is possible to leverage external information to enhance a search system.
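
One common way to use such external information is query expansion: terms related to the user's query in a knowledge graph are added to the query before searching. The sketch below illustrates the idea under simple assumptions. The RELATED mapping and the expand_query function are hypothetical; in practice the related terms would be looked up in a graph such as CSO or AI-KG, and this is not necessarily how our notebook search system works.

```python
# Hypothetical toy graph: each term maps to related terms.
RELATED = {
    "knowledge graph": ["ontology", "linked data"],
    "notebook": ["jupyter notebook"],
}


def expand_query(query, related, max_extra=3):
    """Return the original query followed by related terms for known phrases in it."""
    lowered = query.lower()
    extra = []
    for term, neighbours in related.items():
        if term in lowered:
            for neighbour in neighbours:
                # Skip terms that the query already contains or that were added before.
                if neighbour not in lowered and neighbour not in extra:
                    extra.append(neighbour)
    return [query] + extra[:max_extra]


print(expand_query("knowledge graph notebook search", RELATED))
```

The max_extra cap is a design choice that keeps a short query from being drowned out by its expansions; a real system would typically also weight the added terms lower than the original ones.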



The following photo shows the fabulous view from the window of my office at Science Park 904, University of Amsterdam, 1098 XH Amsterdam, Netherlands.




[3] Xu, J., Kim, S., Song, M. et al. Building a PubMed knowledge graph. Sci Data 7, 205 (2020).



Na Li – ESR1