My research interests concern information extraction, information retrieval, and data quality.
My research focuses on bridging the semantic gap in Information Retrieval (IR), particularly in the medical domain, by combining lexical and semantic signals for more effective query-document matching. Key areas include self-supervised, multi-task learning to combine implicit and explicit representations for IR, as well as knowledge-based, weighted query reformulations.
In Information Extraction, my research focuses on weakly/distantly supervised Entity Linking (EL) and Relation Extraction (RE) methods to construct Knowledge Graphs (KGs) with limited manual annotations, also empowering image classification and manual annotation tools.
Recently, I have also been focusing on developing efficient approaches to evaluate the quality of large-scale KGs, leveraging sampling and estimation techniques that provide strong statistical guarantees.
My research has received support from the HEREDITARY Horizon Europe project and the EXAMODE and BRAINTEASER H2020 EU projects. More information below.
HEREDITARY aims to significantly transform the way we approach disease detection, prepare treatment response, and explore medical knowledge by building a robust, interoperable, trustworthy and secure framework that integrates multimodal health data (including genetic data) while ensuring compliance with cross-national privacy-preserving policies. The HEREDITARY framework comprises five interconnected layers, from federated data processing and semantic data integration to visual interaction.
By utilizing advanced federated analytics and learning workflows, we aim to identify new risk factors and treatment responses, focusing, as exploratory use cases, on neurodegenerative and gut microbiome-related disorders. HEREDITARY is harmonizing and linking various sources of clinical, genomic, and environmental data on a large scale. This enables clinicians, researchers, and policymakers to understand these diseases better and develop more effective treatment strategies. HEREDITARY adheres to the citizen science paradigm to ensure that patients and the public have a primary role in guiding scientific and medical research while maintaining full control of their data. Our goal is to change the way we approach healthcare by unlocking insights that were previously impossible to obtain.
Role: UNIPD representative responsible for the “Communication and Dissemination” committee; contact person for Task 4.6 “Evidence-based knowledge graph creation and exploration”.
Project No: 101137074
Call: HORIZON-HLTH-2023-TOOL-05
Topic: Tools and technologies for a healthy society
Funding (UNIPD): €1,138,046
The manual and automatic annotations used to estimate the accuracy of DBpedia can be found here. [paper]
The SPARQL endpoint to access the CORE KB is available here. [paper]
The gene expression-cancer KB generated by the Collaborative Oriented Relation Extraction (CORE) system can be found here. [paper]
The TBGA dataset for gene-disease association extraction can be found here. [paper]
The runs, pools, plots, and analyses to reproduce the Semantic-Aware Neural Framework for IR (SAFIR) results are available here. [paper]
The runs used to perform experiments on Precision Medicine (PM) query reformulations can be found here. [paper]
The methods used to estimate KG veracity for entity-oriented search can be found here. [paper]
The source code used to aggregate labels and estimate the accuracy of DBpedia is available here. [paper]
The methods to estimate KG accuracy in an efficient and reliable manner are available here. [paper]
The CoreKB platform for searching reliable facts over gene expression-cancer associations is available here. [paper]
The source code and info about the Collaborative Oriented Relation Extraction (CORE) system are available here. [paper]
The source code and info about the Semantic Knowledge Extractor Tool (SKET) are available here. [paper]
The source code and info about Biomedical Relation Extraction (BioRE) methods are available here. [paper]
The source code and info about the Semantic-Aware Neural Framework for IR (SAFIR) can be found here. [paper]