Active Projects

  • HEREDITARY

    2024 - 2027

    Hereditary: HetERogeneous sEmantic Data integratIon for the guT-bRain interplaY

    HEREDITARY aims to significantly transform the way we approach disease detection, prepare treatment response, and explore medical knowledge by building a robust, interoperable, trustworthy and secure framework that integrates multimodal health data (including genetic data) while ensuring compliance with cross-national privacy-preserving policies. The HEREDITARY framework comprises five interconnected layers, from federated data processing and semantic data integration to visual interaction.

    By utilizing advanced federated analytics and learning workflows, we aim to identify new risk factors and treatment responses, focusing on neurodegenerative and gut-microbiome-related disorders as exploratory use cases. HEREDITARY is harmonizing and linking various sources of clinical, genomic, and environmental data on a large scale. This enables clinicians, researchers, and policymakers to understand these diseases better and develop more effective treatment strategies. HEREDITARY adheres to the citizen science paradigm to ensure that patients and the public have a primary role in guiding scientific and medical research while maintaining full control of their data. Our goal is to change the way we approach healthcare by unlocking insights that were previously impossible to obtain.

    Role: UNIPD responsible for the “Communication and Dissemination” committee; contact person for Task 4.6 “Evidence-based knowledge graph creation and exploration”.

    Project No: 101137074
    Call: HORIZON-HLTH-2023-TOOL-05
    Topic: Tools and technologies for a healthy society
    Funding (UNIPD): 1.138.046€

  • BRAINTEASER

    2021 - 2024

    Brainteaser: BRinging Artificial INTelligencE home for a better cAre of amyotrophic lateral sclerosis and multiple SclERosis

    Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS) are chronic diseases characterized by progressive or alternate impairment of neurological functions (motor, sensory, visual, cognitive). Artificial Intelligence is key to meeting the following needs: i) better describing disease mechanisms; ii) stratifying patients according to their phenotype, assessed throughout the disease evolution; iii) predicting disease progression in a probabilistic, time-dependent fashion; iv) investigating the role of the environment; v) suggesting interventions that can delay the progression of the disease. BRAINTEASER will integrate large clinical datasets with novel personal and environmental data collected using low-cost sensors and apps.

    We lead the "Open Science and FAIR Data" work package (WP). The main goals of the WP are:
    • Design of open ontologies to represent the data of the project and create knowledge bases to enrich and augment the value of the data.
    • Design and implementation of methods for evaluating the FAIRification of the data and metadata produced, by applying and reviewing the FAIR principles of the European Open Science Cloud (EOSC). Integration and sharing of research data with EOSC services.
    • Design and implementation of the methods to expose the data as Linked Open Data and the services to favour their exploration and re-use.
    • Organisation of three annual open evaluation challenges and sharing of the produced experimental data as open data.
    Role: Participant.

    Project No: 101017598
    Call: H2020-SC1-DTH-2020-1
    Topic: Personalised early risk prediction, prevention and intervention based on Artificial Intelligence and Big Data technologies
    Funding (UNIPD): 732.250€

Past Projects

  • EXAMODE

    2019 - 2022

    ExaMode: Extreme-scale Analytics via Multimodal Ontology Discovery & Enhancement

    Exascale volumes of diverse data from distributed sources are continuously produced. Healthcare data stand out for their volume (production is expected to be over 2000 exabytes in 2020), heterogeneity (many media and acquisition methods), embedded knowledge (e.g. diagnoses) and commercial value. The supervised nature of deep learning models requires large amounts of labeled, annotated data, which prevents models from extracting knowledge and value. ExaMode addresses this by enabling easy and fast, weakly supervised knowledge discovery over exascale heterogeneous data, limiting human interaction.

    We lead the "Semantic knowledge discovery and visualisation" WP. The main goals of the WP are:
    • Develop relation extraction methods to automatically extract semantic relationships between authoritative concepts within un/semi-structured text.
    • Leverage entity linking methods in conjunction with developed relation extraction techniques to create report-level semantic networks out of extracted concepts and relationships.
    • Model report-level semantic networks through conceptual descriptive frameworks to empower data management and exploitation.
    • Develop information retrieval methods to semantically connect and discover semantic networks associated with relevant medical reports.
    • Develop information visualization and visual analytics methods for interacting with deep learning algorithms and improving their understandability.
    Role: Task leader for Task 2.1 “Semantic knowledge extractor prototype” and Task 2.3 “Automatic knowledge discovery system prototype and user study outcome”.

    Project No: 825292
    Call: H2020-ICT-2018-2
    Topic: Big Data technologies and extreme-scale analytics
    Funding (UNIPD): 516.000€

Data and Software

Data

The SPARQL endpoint to access the CORE KB is available here.
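
As a rough illustration of how a SPARQL endpoint such as this one is typically queried, the sketch below builds a GET request URL following the SPARQL 1.1 Protocol, where the query text travels in a `query` parameter. The endpoint URL is a hypothetical placeholder (the real address is the one linked above), and the query is deliberately generic because the CORE KB schema is not described here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder, NOT the real CORE KB endpoint address.
ENDPOINT = "https://example.org/sparql"

# Generic query that lists a few triples; it assumes nothing about the KB schema.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a SPARQL 1.1 Protocol GET URL carrying the query as a parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

Sending the resulting URL with an `Accept: application/sparql-results+json` header normally returns the bindings as JSON, assuming the server supports that result format.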

The gene expression-cancer KB generated by the Collaborative Oriented Relation Extraction (CORE) system can be found here.

The TBGA dataset for gene-disease association extraction can be found here.

The runs, pools, plots, and analyses to reproduce the Semantic-Aware neural Framework for IR (SAFIR) results are available here.

The runs used to perform experiments on Precision Medicine (PM) query reformulations can be found here.

Software

The methods to estimate KG accuracy in an efficient and reliable manner are available here.
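
For readers unfamiliar with the task, KG accuracy is commonly estimated by annotating a random sample of triples and reporting the sample proportion with a confidence interval. The sketch below shows that textbook approach (simple random sampling with a Wilson interval); it is not the released methods, and the function names, the `is_correct` annotator callback, and the default 95% level are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_accuracy(
    triples: Sequence,
    is_correct: Callable[[object], bool],  # hypothetical stand-in for a human annotator
    sample_size: int,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Estimate accuracy from a simple random sample drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(triples, sample_size)
    correct = sum(1 for t in sample if is_correct(t))
    return correct / sample_size, wilson_interval(correct, sample_size)


if __name__ == "__main__":
    # Toy data: each "triple" is just a flag saying whether it is correct.
    toy_kg = [True] * 900 + [False] * 100
    print(estimate_accuracy(toy_kg, lambda t: t, sample_size=100))
```

Annotation cost grows with the sample size, so the design trade-off is between a tighter interval and fewer triples to label.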

The CoreKB platform for searching reliable facts over gene expression-cancer associations is available here.

The source code and info about the CORE system are available here.

The source code and info about the Semantic Knowledge Extractor Tool (SKET) are available here.

The source code and info about Biomedical Relation Extraction (BioRE) methods are available here.

The source code and info about SAFIR can be found here.