Traditional Information Retrieval (IR) effectiveness metrics assume that a relevant document satisfies the information need as a whole. Nevertheless, if the information need is faceted or contains subtopics, this notion of relevance cannot model documents relevant only to one or a few subtopics. Furthermore, faceted documents in a ranked list may focus on the same subtopics, and their content may overlap while neglecting other subtopics. Hence, a search result where top-ranked documents deal with different subtopics should be preferred over one where documents are thematically limited and provide overlapping information. The Multi-Dimensional Cumulated Utility (MDCU) metric, recently formulated theoretically by Järvelin and Sormunen, extends the evaluation of novelty and diversity by considering content overlap among documents. While Järvelin and Sormunen described the theory of MDCU and illustrated its application on a toy example, they did not investigate its empirical use. In this paper, we show the practical feasibility and validity of MDCU by applying it to publicly available TREC test collections. Furthermore, we analyse its relationship with the well-established α-nDCG, and finally, we provide a Python implementation of MDCU, fostering its adoption as an evaluation framework. Our results indicate a positive correlation between α-nDCG and MDCU, suggesting that both measures identify similar trends when evaluating IR systems. Finally, compared to α-nDCG, MDCU exhibits greater statistical power and identifies up to 9 times more statistically significantly different pairs of systems.
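For reference, the following is a minimal sketch of the standard α-nDCG computation against which MDCU is compared; the function names, the cut-off depth, and the greedy approximation of the ideal ranking are illustrative assumptions and do not reproduce the Python implementation released with the paper.

```python
import math

def alpha_dcg(ranking, subtopic_qrels, alpha=0.5, depth=20):
    """alpha-DCG@depth for one query.

    ranking        : list of document ids in ranked order
    subtopic_qrels : dict doc_id -> set of subtopic ids the document is relevant to
    """
    seen = {}  # subtopic -> number of higher-ranked documents already covering it
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        subs = subtopic_qrels.get(doc, set())
        gain = sum((1 - alpha) ** seen.get(s, 0) for s in subs)
        for s in subs:
            seen[s] = seen.get(s, 0) + 1
        score += gain / math.log2(rank + 1)
    return score

def alpha_ndcg(ranking, subtopic_qrels, alpha=0.5, depth=20):
    """Normalise by a greedily built ideal ranking (computing the exact ideal is NP-hard)."""
    remaining = [d for d, subs in subtopic_qrels.items() if subs]
    seen, ideal = {}, []
    while remaining and len(ideal) < depth:
        best = max(remaining,
                   key=lambda d: sum((1 - alpha) ** seen.get(s, 0) for s in subtopic_qrels[d]))
        for s in subtopic_qrels[best]:
            seen[s] = seen.get(s, 0) + 1
        ideal.append(best)
        remaining.remove(best)
    ideal_dcg = alpha_dcg(ideal, subtopic_qrels, alpha, depth)
    return alpha_dcg(ranking, subtopic_qrels, alpha, depth) / ideal_dcg if ideal_dcg else 0.0
```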
When interacting with an Information Retrieval (IR) system, users might disclose personal information, such as medical details, through their queries. Thus, assessing the level of privacy granted to users when querying an IR system is essential to determine the confidentiality of submitted sensitive data. Query obfuscation protocols have traditionally been employed to obscure a user's real information need when retrieving documents. In these protocols, the query is modified employing ε-Differential Privacy (DP) obfuscation mechanisms, which alter query terms according to a predefined privacy budget ε. While this budget ensures formal mathematical guarantees, it provides only limited insight into the privacy experienced by the user and calls for empirical privacy evaluation to be carried out. Such privacy assessments employ lexical and semantic similarity measures between the original and obfuscated queries. In this study, we explore the role of Large Language Models (LLMs) in privacy evaluation, simulating a scenario where users employ such models to determine whether their input has been effectively privatized. Our primary research objective is to determine whether LLMs provide a novel perspective on privacy estimation and whether their assessments serve as a proxy for traditional similarity metrics, such as the Jaccard similarity and the cosine similarity derived from Transformer-based sentence embeddings. Our findings reveal a positive correlation between LLM-generated privacy scores and cosine similarity computed using different Transformer architectures. This suggests that LLM assessments act as a proxy for similarity-based measures.
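As an illustration, the sketch below computes the two traditional baselines mentioned above, Jaccard similarity over query terms and cosine similarity over Transformer-based sentence embeddings, for an original/obfuscated query pair; the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint are assumptions, since the abstract does not name a specific encoder.

```python
from sentence_transformers import SentenceTransformer, util

def jaccard(original: str, obfuscated: str) -> float:
    """Lexical overlap between the term sets of the two queries."""
    a, b = set(original.lower().split()), set(obfuscated.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice of encoder

def embedding_cosine(original: str, obfuscated: str) -> float:
    """Semantic similarity between the sentence embeddings of the two queries."""
    emb = model.encode([original, obfuscated], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(jaccard("flu symptoms treatment", "cold remedies treatment"))
print(embedding_cosine("flu symptoms treatment", "cold remedies treatment"))
```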
When Information Retrieval (IR) models are applied to and trained on sensitive and personal information, users’ privacy is at risk. While mechanisms have been presented to safeguard user privacy, the effectiveness of these privacy protections is generally evaluated by studying the relations between performance on a downstream task and the parameters of the mechanisms, e.g., the privacy budget ε in Differential Privacy (DP). This often results in only a partial understanding of the relationship between formal privacy and the privacy experienced by the user, the actual privacy. In this paper, we discuss the Query Inference for Privacy and Utility (QuIPU) framework, a novel evaluation methodology designed to assess actual privacy based on the risk that an “honest-but-curious” IR system may correctly guess the original query from the obfuscated queries received. The QuIPU framework constitutes the first endeavour to quantify actual privacy for IR tasks, extending beyond the partial comparison of formal privacy parameters. Our findings show that formal privacy parameters do not necessarily correspond to actual privacy, resulting in cases where, despite identical privacy parameters, two systems exhibit differing actual privacy levels.
Privacy is an essential aspect to consider when processing sensitive textual information in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Private medical records, queries, online posts and reviews can contain sensitive information that can endanger the confidentiality of users’ data. To address this privacy issue, the gold-standard framework employed to protect such sensitive information when dealing with textual sentences is ε-Differential Privacy (DP) obfuscation. However, implementing, developing and testing state-of-the-art mechanisms requires a unified framework in which new obfuscation mechanisms can be integrated and compared. pyPANTERA is designed as a modular, extensible library that enriches DP techniques, enabling the integration of new DP mechanisms and allowing reproducible comparison of the current ones. The effectiveness of the pyPANTERA package is measured by applying it to sentiment analysis and query obfuscation protocols. The library’s source code is available in the public repository at https://github.com/Kekkodf/pypantera.
Data represents one of the most valuable assets of today’s digital age. Privacy-preserving strategies play a crucial role in safeguarding the confidentiality of sensitive user data throughout the processing pipeline in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. This paper presents an overview of obfuscation strategies and evaluation metrics employed to process users’ textual information privately when interacting with IR systems, framing these solutions within the formal framework of ε-Differential Privacy (DP). The methodologies and findings presented in this paper describe the author’s preliminary studies within his ongoing PhD activity.
Preserving privacy in Information Retrieval (IR) remains a significant issue for users when interacting with Information Retrieval Systems (IRSs). Conducting a private search when an IRS does not cooperate towards the protection of user privacy can lead to unwanted information disclosure through the analysis of the queries sent to the system. Recent investigations in NLP and IR have adopted ε-Differential Privacy (DP) to obfuscate the real information need contained in the user queries. Although privacy is protected from a formal point of view, such methods do not consider that the obfuscation can be ineffective if the lexical or semantic meaning of the obfuscated terms remains essentially unchanged with respect to the real user text. This paper outlines the author's PhD research in designing new techniques based on ε-DP for protecting the real user information need when interacting with IRSs that may attempt to infer private information.
Privacy is a fundamental right that could be threatened by IR models when applied to and trained on sensitive data and personal user information. Although mechanisms have been proposed to protect user privacy, the effectiveness of the privacy protections is typically assessed by studying the relations between the performance and the parameters of the mechanisms, such as the privacy budget in DP. This often causes a disconnection between formal privacy and the privacy experienced by the user, the actual privacy. In this paper, we present the QuIPU framework, a novel evaluation paradigm to assess actual privacy based on the risk that an "honest-but-curious" IR system can infer the original query from the obfuscated queries received. QuIPU represents the first attempt at measuring actual privacy for IR tasks beyond the comparison of formal privacy parameters. Our analysis shows that formal privacy parameters do not imply actual privacy, leading to scenarios where, for the same privacy parameter values, two systems provide not only different utility but also different actual privacy. Therefore, a proper way of assessing this risk is needed, and QuIPU provides it.
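To make the underlying idea concrete, the following is a purely hypothetical sketch of how such an "honest-but-curious" re-identification risk could be estimated: an adversary ranks a pool of candidate queries by embedding similarity to each obfuscated query, and the fraction of original queries recovered in the top positions is reported. The encoder, the function names, and the recovery-rate statistic are illustrative and do not describe the actual QuIPU procedure.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder

def recovery_rate(pairs, candidate_pool, top_k=1):
    """pairs: list of (original_query, obfuscated_query) tuples;
    candidate_pool: list of candidate original queries known to the adversary."""
    cand_emb = model.encode(candidate_pool, convert_to_tensor=True)
    hits = 0
    for original, obfuscated in pairs:
        obf_emb = model.encode(obfuscated, convert_to_tensor=True)
        scores = util.cos_sim(obf_emb, cand_emb)[0]
        top = scores.topk(top_k).indices.tolist()
        hits += any(candidate_pool[i] == original for i in top)
    return hits / len(pairs)  # a higher recovery rate means lower actual privacy
```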
Privacy is critical when dealing with user-generated text, as is common in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Documents, queries, posts, and reviews might pose a risk of inadvertently disclosing sensitive information. Such exposure of private data is a significant threat to user privacy, as it may reveal information that users prefer to keep confidential. The leading framework to protect user privacy when handling textual information is ε-Differential Privacy (DP). However, the research community lacks a unified framework for comparing different DP mechanisms. This study introduces pyPANTERA, an open-source Python package developed for text obfuscation. The package is designed to incorporate state-of-the-art DP mechanisms within a unified framework for obfuscating data. pyPANTERA is designed not only as a modular and extensible library for enriching DP techniques, thereby enabling the integration of new DP mechanisms in future research, but also to allow reproducible comparison of the current state-of-the-art mechanisms. Through extensive evaluation, we demonstrate the effectiveness of pyPANTERA, making it an essential resource for privacy researchers and practitioners. The source code of the library and of the experiments is available at: https://github.com/Kekkodf/pypantera
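For intuition, below is a minimal sketch of the kind of word-level ε-DP obfuscation mechanism that such a library unifies: noise calibrated to the privacy budget is added to each word embedding and the noisy vector is mapped back to the nearest vocabulary word. This is not the pyPANTERA API; all names and parameters are illustrative.

```python
import numpy as np

def obfuscate(text, vocab, embeddings, epsilon, rng=np.random.default_rng()):
    """vocab: list of words; embeddings: (len(vocab), d) array where row i embeds vocab[i]."""
    word2row = {w: i for i, w in enumerate(vocab)}
    d = embeddings.shape[1]
    obfuscated = []
    for word in text.split():
        vec = embeddings[word2row.get(word, 0)]  # fall back to row 0 for unknown words
        # Sample noise with density proportional to exp(-epsilon * ||z||):
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
        noisy = vec + magnitude * direction
        # Replace the word with the vocabulary entry closest to the noisy vector.
        nearest = int(np.argmin(np.linalg.norm(embeddings - noisy, axis=1)))
        obfuscated.append(vocab[nearest])
    return " ".join(obfuscated)
```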
Evaluating the privacy provided by obfuscation mechanisms remains an open problem in the research community. Especially for textual data, in Natural Language Processing (NLP) and Information Retrieval (IR) tasks, privacy guarantees are measured by analyzing the hyper-parameters of a mechanism, e.g., the privacy budget ε in Differential Privacy (DP), and their impact on performance. However, considering only the privacy parameters is not enough to understand the actual level of privacy achieved by a mechanism from a real user perspective. We analyse the requirements and the features needed to actually evaluate the privacy of obfuscated texts beyond the formal privacy provided by the analysis of the mechanisms' parameters, and suggest some research directions to devise new evaluation measures for this purpose.
The deterioration of the performance of Information Retrieval Systems (IRSs) over time remains an open issue in the Information Retrieval (IR) community. With this study for Task 1 of the Longitudinal Evaluation of Model Performance Lab (LongEval) at the Conference and Labs of the Evaluation Forum (CLEF) 2024, we propose and analyze the performance of an IRS able to handle changes in the data over time. In addition, the model uses different Large Language Models (LLMs) to enhance the effectiveness of the retrieval process by rephrasing the queries used for the search and by reranking the retrieved documents. Through an in-depth analysis of the performance of the MOUSE group Retrieval System, using the datasets provided by the organisers of CLEF, the proposed model reached a Mean Average Precision (MAP) of 0.22 and a Normalized Discounted Cumulated Gain (nDCG) of 0.40 on the English collection, with performance on the original French collection rising to 0.31 (MAP) and 0.50 (nDCG).
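As a pointer, per-query MAP and nDCG of the kind reported above can be computed with the pytrec_eval package, as in the minimal sketch below; the query and document identifiers are illustrative.

```python
import pytrec_eval

qrels = {"q1": {"d1": 1, "d2": 0, "d3": 2}}        # graded relevance judgements
run   = {"q1": {"d1": 1.2, "d2": 0.9, "d3": 0.4}}  # retrieval scores of the system

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg"})
results = evaluator.evaluate(run)
print(results["q1"]["map"], results["q1"]["ndcg"])
```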
Survival Analyses (SAs), a key statistical tool used to predict event occurrence over time, often involve sensitive information, necessitating robust privacy safeguards. This work demonstrates how the Revised Randomized Response (RRR) can be adapted to ensure Differential Privacy (DP) while performing SAs. This methodology seeks to safeguard the privacy of individuals’ data without significantly changing the utility, represented by the statistical properties of the computed survival rates. Our findings show that integrating DP through RRR into SAs is both practical and effective, providing a significant step forward in the privacy-preserving analysis of sensitive time-to-event data. This study contributes to the field by offering a new method for comparison against the current state of the art used for SAs in medical research.
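For context, the sketch below shows classical binary randomized response, which satisfies ε-DP when the truth-telling probability is e^ε / (e^ε + 1), together with the standard de-biasing of the observed proportion; it is a textbook baseline rather than the specific RRR variant adapted in the paper.

```python
import numpy as np

def randomized_response(true_values, epsilon, rng=np.random.default_rng()):
    """Perturb binary event indicators (e.g., event observed at a given time point)."""
    true_values = np.asarray(true_values)
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1)  # probability of answering truthfully
    keep = rng.random(len(true_values)) < p_truth
    noisy = np.where(keep, true_values, 1 - true_values)
    return noisy, p_truth

def debias_proportion(noisy, p_truth):
    """Unbiased estimate of the true proportion from the perturbed responses."""
    return (np.mean(noisy) - (1 - p_truth)) / (2 * p_truth - 1)
```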
The necessity of storing and manipulating electronic health data (eHealth data) in the Oncology research field has introduced two main challenges. Firstly, research centers need to manage eHealth data with an appropriate and secure Data Management Infrastructure; secondly, they must preserve the privacy of patients' data. This work consists of a study on the feasibility and privacy analysis of a Data Management Infrastructure in the Oncology Research Domain. The project studies potential strengths and weaknesses in developing a Digital Clinical Data Repository (DCDR) in a practical case study at “Centro di Riferimento Oncologico” in Aviano, Italy. The study considers HL7 FHIR, an international standard for healthcare data retrieval and exchange, within two possible scenarios: a monolithic application and a fragmented one. Potential privacy-related aspects are examined within the General Data Protection Regulation (GDPR) framework and studied through a utility evaluation after applying Differentially Private mechanisms.
In February 2022, Russia launched a full-scale invasion of Ukraine. This event had global repercussions, especially on the political decisions of European countries. As expected, the role of Italy in the conflict became a major campaign issue for the Italian General Election held on 25 September 2022. Politicians frequently use Twitter to communicate during political campaigns, but bots often interfere and attempt to manipulate elections. Hence, understanding whether bots influenced public opinion regarding the conflict and, therefore, the elections is essential. In this work, we investigate how Italian politics responded to the Russo-Ukrainian conflict on Twitter and whether bots manipulated public opinion before the 2022 general election. We first analyze 39,611 tweets from six major Italian political parties to understand how they discussed the war during the period February-December 2022. Then, we focus on the 360,823 comments under the last month’s posts before the elections, discovering that around 12% of the commenters are bots. By examining their activities, it becomes clear that they both distorted how war-related topics were treated and influenced real users during the last month before the elections.