pyPANTERA: A Python PAckage for Natural language obfuscaTion Enforcing pRivacy & Anonymization

Francesco Luigi De Faveri, Guglielmo Faggioli, and Nicola Ferro
Conference Paper, Best Resource Paper Award. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024), Boise, ID, USA, October 21-25, 2024

Abstract

Privacy is critical when dealing with user-generated text, as common in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Documents, queries, posts, and reviews might pose a risk of inadvertently disclosing sensitive information. Such exposure of private data is a significant threat to user privacy, as it may reveal information that users prefer to keep confidential. The leading framework to protect user privacy when handling textual information is ε-Differential Privacy (DP). However, the research community lacks a unified framework for comparing different DP mechanisms. This study introduces pyPANTERA, an open-source Python package developed for text obfuscation. The package is designed to incorporate State-of-the-Art DP mechanisms within a unified framework for obfuscating data. pyPANTERA is designed not only as a modular and extensible library for enriching DP techniques, thereby enabling the integration of new DP mechanisms in future research, but also as a tool for reproducible comparison of the current State-of-the-Art mechanisms. Through extensive evaluation, we demonstrate the effectiveness of pyPANTERA, making it an essential resource for privacy researchers and practitioners. The source code of the library and of the experiments is available at: https://github.com/Kekkodf/pypantera

Beyond the Parameters: Measuring Actual Privacy in Obfuscated Texts

Francesco Luigi De Faveri, Guglielmo Faggioli, and Nicola Ferro
Workshop Paper In: Proceedings of the 14th Italian Information Retrieval Workshop, Udine, Italy, September 5-6, 2024

Abstract

Evaluating the privacy provided by obfuscation mechanisms remains an open problem in the research community. Especially for textual data, in Natural Language Processing (NLP) and Information Retrieval (IR) tasks, privacy guarantees are measured by analyzing the hyper-parameters of a mechanism, e.g., the privacy budget ε in Differential Privacy (DP), and their impact on performance. However, considering only the privacy parameters is not enough to understand the actual level of privacy achieved by a mechanism from a real user perspective. We analyse the requirements and the features needed to actually evaluate the privacy of obfuscated texts beyond the formal privacy provided by the analysis of the mechanisms' parameters, and suggest some research directions to devise new evaluation measures for this purpose.

SEUPD@CLEF: Team MOUSE on Enhancing Search Engines Effectiveness with Large Language Models

Lorenzo Cazzador, Francesco Luigi De Faveri, Filippo Franceschini, Lorenzo Pamio, Samuel Piron, and Nicola Ferro
Conference Paper In: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024

Abstract

The deterioration of the performance of Information Retrieval Systems (IRSs) over time remains an open issue in the Information Retrieval (IR) community. With this study for Task 1 of the Longitudinal Evaluation of Model Performance LAB (LongEval) at Conference and Labs of the Evaluation Forum (CLEF) 2024, we aim to propose and analyze the performance of an IRS that is able to handle changes over time in the data. In addition, the model uses different Large Language Models (LLMs) to enhance the effectiveness of the retrieval process by rephrasing the queries for the search and the reranking of the retrieved documents. With an in-depth analysis of the performance of the MOUSE group Retrieval System, using the datasets provided by the organisers of CLEF, the proposed model was able to reach a Mean Average Precision (MAP) of 0.22 and a Normalized Discounted Cumulated Gain (nDCG) of 0.40 for the English collection, increasing the performance for the original French collection up to 0.31 and 0.50, for MAP and nDCG respectively.
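For readers unfamiliar with the two measures reported above, the following is a minimal Python sketch of Average Precision (averaged over queries to obtain MAP) and nDCG. It is illustrative only and not taken from the paper: the function names are hypothetical, and the DCG variant shown (linear gain, log2 rank discount) is an assumption, since evaluation tools differ in the exact formulation and cutoff they use.

```python
import math


def average_precision(ranked_relevance, total_relevant):
    """Average Precision for one query.

    ranked_relevance: 0/1 relevance labels in ranked order.
    total_relevant: number of relevant documents known for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / total_relevant if total_relevant else 0.0


def mean_average_precision(ap_values):
    """MAP is the arithmetic mean of per-query Average Precision."""
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


def dcg(gains):
    """Discounted cumulated gain (assumed variant: linear gain, log2 discount)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, judged_gains):
    """nDCG: DCG of the ranking divided by the DCG of the ideal ordering.

    judged_gains: gains of all judged documents for the query (any order).
    """
    ideal = dcg(sorted(judged_gains, reverse=True)[: len(ranked_gains)])
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0
```

Both measures lie in [0, 1]; a ranking that places every relevant document first scores 1.0 on each.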

“Dead or Alive, we can deny it”. A Differentially Private Approach to Survival Analysis.

Francesco Luigi De Faveri, Guglielmo Faggioli, Nicola Ferro, and Riccardo Spizzo
Conference Paper In: Proceedings of the 32nd Symposium of Advanced Database Systems, Villasimius, Italy, June 23rd to 26th, 2024

Abstract

Survival Analyses (SAs), a key statistical tool used to predict event occurrence over time, often involve sensitive information, necessitating robust privacy safeguards. This work demonstrates how the Revised Randomized Response (RRR) can be adapted to ensure Differential Privacy (DP) while performing SAs. This methodology seeks to safeguard the privacy of individuals’ data without significantly changing the utility, represented by the statistical properties of the survival rates computed. Our findings show that integrating DP through RRR into SAs is both practical and effective, providing a significant step forward in the privacy-preserving analysis of sensitive time-to-event data. This study contributes to the field by offering a new comparison method to the current state-of-the-art used for SAs in medical research.
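As background for the abstract above, the sketch below shows the classical binary randomized response mechanism, which satisfies ε-Differential Privacy for a single bit such as an event indicator. It is not the Revised Randomized Response (RRR) used in the paper, whose details are not given here; the function names and the debiasing estimator are illustrative assumptions.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit so that the mechanism is epsilon-DP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit


def estimate_rate(reports, epsilon):
    """Unbiased estimate of the true proportion of 1s from the noisy reports.

    If pi is the true rate and p the truth probability, the expected observed
    rate is p * pi + (1 - p) * (1 - pi); solving for pi gives the formula below.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    true_bits = [1] * 300 + [0] * 700  # hypothetical event indicators
    noisy = [randomized_response(b, 1.0, rng) for b in true_bits]
    print(estimate_rate(noisy, 1.0))
```

Smaller values of ε push the truth probability towards 0.5, which strengthens deniability for each individual but increases the variance of the aggregate estimate.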

A Feasibility Study and Privacy Analysis of a Data Management Infrastructure in the Oncology Research Domain.

Francesco Luigi De Faveri
Miscellaneous In: Padua Thesis and Dissertation Archive, University of Padua, Italy, 2023

Abstract

The necessity of storing and manipulating electronic data (eHealth data) in the Oncology research field has introduced two main challenges. First, research centers need to manage eHealth data with an appropriate and secure Data Management Infrastructure; second, they need to preserve the privacy of patients' data. This work consists of a study on the feasibility and privacy analysis of a Data Management Infrastructure in the Oncology Research Domain. The project studies potential strengths and weaknesses in developing a Digital Clinical Data Repository (DCDR) in a practical case study at “Centro di Riferimento Oncologico” in Aviano, Italy. The study considers HL7 FHIR, an international standard for healthcare data retrieval and exchange, within two possible scenarios, a monolithic application and a fragmented one. The potential privacy-related aspects are examined within the General Data Protection Regulation (GDPR) framework and studied through a utility evaluation after applying Differentially Private mechanisms.

Twitter Bots Influence on the Russo-Ukrainian War During the 2022 Italian General Elections

Francesco Luigi De Faveri, Luca Cosuti, Pier Paolo Tricomi, and Mauro Conti
Conference Paper In: Arief, B., Monreale, A., Sirivianos, M., Li, S. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2023. Lecture Notes in Computer Science, vol 14097. Springer, Singapore. (SocialSec 2023), pp. 38-57, 2023.

Abstract

In February 2022, Russia launched a full-scale invasion of Ukraine. This event had global repercussions, especially on the political decisions of European countries. As expected, the role of Italy in the conflict became a major campaign issue for the Italian General Election held on 25 September 2022. Politicians frequently use Twitter to communicate during political campaigns, but bots often interfere and attempt to manipulate elections. Hence, understanding whether bots influenced public opinion regarding the conflict and, therefore, the elections is essential. In this work, we investigate how Italian politics responded to the Russo-Ukrainian conflict on Twitter and whether bots manipulated public opinion before the 2022 general election. We first analyze 39,611 tweets from six major Italian political parties to understand how they discussed the war during the period February-December 2022. Then, we focus on the 360,823 comments under the last month’s posts before the elections, discovering that around 12% of the commenters are bots. By examining their activities, it becomes clear that they both distorted how war topics were treated and influenced real users during the last month before the elections.