Notebook search requires an understanding of users’ intents and requirements, which may depend on the knowledge domain.

I am Na Li, a PhD student in computer science at the University of Amsterdam, currently exploring notebook search for scientific users. In this post I will describe my experience at the CVBLab at UPV and share some thoughts on notebook search.

As part of the training program of the CLARIFY project, I started my secondment at the CVBLab on 1 October. This on-site secondment gives me a great opportunity to closely observe the work of AI experts in the CLARIFY project. CVBLab specializes in signal, image and video processing, as well as data analysis and the creation of automatic predictive models; its expertise has been applied to diagnostic aid, human behaviour analysis and applications in the industrial sector. The atmosphere here is diligent but relaxed. PhD students and supervisors work closely together, and communication happens immediately whenever it is needed.

A two-week stay has already given me valuable insight into their work. We sit in the same office, which already gives me some hints about their daily routine. Small talk during coffee breaks and lunchtime is beneficial as well. It is so easy to exchange ideas when we are face-to-face every day. Interestingly, their experience in deep learning also helped me set up computing platforms when I began to explore deep learning solutions for notebook search.

 

Notebook search refers to retrieving notebooks from a database in response to natural language queries. The most commonly used repository for notebooks is GitHub, which is exactly where I started. Currently I am indexing and retrieving notebooks using a dataset of approximately 6000 notebooks extracted from GitHub. In the first phase, only the text in the notebooks is indexed, and results are ranked with the BM25 method. Notebook search requires an understanding of users’ intents and requirements, which may depend on the knowledge domain. In the next step, I will consider domain-specific (medical or, more specifically, pathology-related) ranking algorithms, which will benefit from cooperation with AI researchers in the medical domain. By observing and talking to some of them, I can understand their search intents and requirements and identify the most important features for ranking.
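To make the text-only first phase more concrete, the sketch below ranks a few toy notebooks against a natural language query with Okapi BM25. It is a minimal illustration and not the actual indexing pipeline: the class name `BM25Index`, the notebook names and texts, the simple tokenizer and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index over notebook text."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed value)
        self.b = b    # document-length normalization (assumed value)
        self.doc_ids = list(docs)
        self.term_freqs = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_lens) / len(self.doc_lens)
        # Number of documents that contain each term.
        self.doc_freq = Counter()
        for tf in self.term_freqs:
            self.doc_freq.update(tf.keys())

    def idf(self, term):
        """Smoothed inverse document frequency (always non-negative)."""
        n = self.doc_freq.get(term, 0)
        total_docs = len(self.doc_ids)
        return math.log(1 + (total_docs - n + 0.5) / (n + 0.5))

    def score(self, query, i):
        """BM25 score of document i for the given query string."""
        tf = self.term_freqs[i]
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=3):
        """Return up to top_k (notebook, score) pairs, best match first."""
        scored = [(d, self.score(query, i)) for i, d in enumerate(self.doc_ids)]
        scored = [(d, s) for d, s in scored if s > 0]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy stand-ins for text extracted from notebooks (made-up examples).
notebooks = {
    "histology_segmentation.ipynb": "Segmentation of tumor regions in histopathology whole slide images",
    "sales_forecast.ipynb": "Time series forecasting of monthly sales with ARIMA",
    "mnist_cnn.ipynb": "Train a convolutional network to classify handwritten digits",
}

index = BM25Index(notebooks)
results = index.search("tumor segmentation histopathology")
for name, score in results:
    print(f"{name}\t{score:.3f}")
```

A purely lexical ranker like this only matches the words that literally appear in the query, which is one reason domain-specific features may help in a later step.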

The photo shows a casual discussion with colleagues at the CVBLab, Universitat Politècnica de València, Spain.

Na Li – ESR1