Fighting Uphill Battles: Improvements in Personal Data Privacy
Abstract
With the rise of modern information technology and the Internet, worldwide interconnectivity has led to a massive collection and evaluation of potentially sensitive data, often beyond the control of those affected. The increasing impact of this data stream and the potential for its abuse raise concerns, calling for protection against emerging exploitation and fear-driven self-censorship. The ability of individuals or groups to limit this flow and to express themselves selectively is commonly subsumed under the umbrella term "privacy". This thesis tackles the digital generation, processing, and control of personal information, so-called individual data privacy, from multiple angles. First, it introduces the concept of passive participation, which enables users to access information over the Internet while hiding in cover traffic passively generated by regular visitors of frequently visited websites. This solves the bootstrapping problem for mid- and high-latency anonymous communication networks, where an adversary might collect thousands of traffic observations. Next, we analyze the statistical privacy leakage of many such sequential adversarial observations in the information-theoretic framework of differential privacy, which aims to limit and blur the impact of individuals. There, we propose the privacy loss distribution, which unifies several other commonly used differential privacy notions, and show that it converges to a Gaussian shape under independent sequential composition of observations, allowing differentially private mechanisms to be classified into privacy loss classes defined by the parameters of that Gaussian. However, more blurring means less accurate results: the inherent privacy-utility trade-off. We apply a gradient-descent optimizer to learn truncated noise patterns that minimize utility loss for differentially private mechanisms, which blur the impact of individuals by adding the learned noise to sensitivity-bounded outputs. Our results suggest that additive Gaussian noise is close to optimal, especially under sequential composition. Finally, we tackle the trust problem in the truthful execution of deletion requests for personal data and provide a framework for the probabilistic verification of such requests, demonstrating its feasibility for the case of machine learning.
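To make the privacy loss distribution concrete, here is a minimal numerical sketch (not the thesis' implementation; the noise scale and number of observations are illustrative). For the Gaussian mechanism with sensitivity 1 and noise scale sigma, the privacy loss L(o) = log P(o)/Q(o) between the output distributions with and without an individual is itself Gaussian with mean mu = 1/(2*sigma^2) and variance 2*mu, and under independent sequential composition the per-observation losses simply add:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0              # noise scale (illustrative)
mu = 1 / (2 * sigma**2)  # mean privacy loss of one observation

# One Gaussian-mechanism observation: P = N(1, sigma^2) (individual present),
# Q = N(0, sigma^2) (individual absent). The privacy loss at outcome o is
# L(o) = log P(o)/Q(o) = (2*o - 1) / (2*sigma^2), sampled under P.
o = rng.normal(1.0, sigma, size=(5000, 20))
loss = (2 * o - 1) / (2 * sigma**2)

# Single observation: the privacy loss distribution is N(mu, 2*mu).
print(loss.mean(), loss.var())          # ~ mu, ~ 2*mu

# k = 20 independent observations: losses add, so the composed privacy loss
# distribution is N(k*mu, 2*k*mu) -- the mechanism's "privacy loss class".
composed = loss.sum(axis=1)
print(composed.mean(), composed.var())  # ~ 20*mu, ~ 40*mu
```

The Gaussian shape here is exact; for other mechanisms it emerges only in the limit of many composed observations, which is the convergence result the abstract refers to.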
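A privacy loss class characterized by a Gaussian N(mu, 2*mu) can be translated back into the familiar (epsilon, delta) guarantee of differential privacy. The sketch below uses the closed-form tight delta(epsilon) for a Gaussian privacy loss; the parameter values are illustrative, chosen to match the sketch's Gaussian mechanism with sigma = 2:

```python
from math import erf, exp, sqrt

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1 + erf(x / sqrt(2)))

def delta(eps, mu):
    """Tight delta(eps) for a Gaussian privacy loss class N(mu, 2*mu)."""
    s = sqrt(2 * mu)
    return Phi((mu - eps) / s) - exp(eps) * Phi((-mu - eps) / s)

# sigma = 2 gives mu = 1/8 per observation; after 20 observations mu_k = 2.5.
print(delta(1.0, 0.125))  # single observation: small delta
print(delta(1.0, 2.5))    # composed: more leakage, much larger delta
```

For a single Gaussian-mechanism observation this reduces to the well-known analytic expression Phi(1/(2*sigma) - eps*sigma) - e^eps * Phi(-1/(2*sigma) - eps*sigma), which is one way to sanity-check the formula.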
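The probabilistic verification of deletion requests can be framed as a hypothesis test. The following sketch loosely follows the planted-probe idea: a user embeds distinctive probe samples in their data before sharing it, and after a deletion request queries the model with the probe triggers; many probe hits make truthful deletion statistically implausible. The success probability under deletion and the query counts are illustrative assumptions, not values from the thesis:

```python
from math import comb

def p_value(hits, trials, p_if_deleted=0.1):
    # Probability of observing at least `hits` probe successes under the
    # null hypothesis that the user's data (and its probes) was deleted.
    return sum(comb(trials, k) * p_if_deleted**k * (1 - p_if_deleted)**(trials - k)
               for k in range(hits, trials + 1))

# 25 of 30 probe queries trigger: deletion is statistically implausible.
print(p_value(25, 30))
# Only 2 of 30 trigger: the observation is consistent with deletion.
print(p_value(2, 30))
```

The "probabilistic" in probabilistic verification shows up here directly: the verifier never proves deletion, it only bounds the probability that the observed probe behavior could have arisen if deletion had truly happened.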
BibTeX
@PHDTHESIS{sommer2021fighting,
copyright = {In Copyright - Non-Commercial Use Permitted},
year = {2021},
type = {Doctoral Thesis},
author = {Sommer, David Marco},
size = {262 p.},
abstract = {With the rise of modern information technology and the Internet, worldwide interconnectivity has led to a massive collection and evaluation of potentially sensitive data, often beyond the control of those affected. The increasing impact of this data stream and the potential for its abuse raise concerns, calling for protection against emerging exploitation and fear-driven self-censorship. The ability of individuals or groups to limit this flow and to express themselves selectively is commonly subsumed under the umbrella term \textit{privacy}. This thesis tackles the digital generation, processing, and control of personal information, so-called individual data privacy, from multiple angles. First, it introduces the concept of passive participation, which enables users to access information over the Internet while hiding in cover traffic passively generated by regular visitors of frequently visited websites. This solves the bootstrapping problem for mid- and high-latency anonymous communication networks, where an adversary might collect thousands of traffic observations. Next, we analyze the statistical privacy leakage of many such sequential adversarial observations in the information-theoretic framework of differential privacy, which aims to limit and blur the impact of individuals. There, we propose the privacy loss distribution, which unifies several other commonly used differential privacy notions, and show that it converges to a Gaussian shape under independent sequential composition of observations, allowing differentially private mechanisms to be classified into privacy loss classes defined by the parameters of that Gaussian. However, more blurring means less accurate results: the inherent privacy-utility trade-off. We apply a gradient-descent optimizer to learn truncated noise patterns that minimize utility loss for differentially private mechanisms, which blur the impact of individuals by adding the learned noise to sensitivity-bounded outputs. Our results suggest that additive Gaussian noise is close to optimal, especially under sequential composition. Finally, we tackle the trust problem in the truthful execution of deletion requests for personal data and provide a framework for the probabilistic verification of such requests, demonstrating its feasibility for the case of machine learning.},
keywords = {Differential privacy; Machine learning; Anonymous communication},
language = {en},
address = {Zurich},
publisher = {ETH Zurich},
DOI = {10.3929/ethz-b-000508911},
title = {Fighting Uphill Battles: Improvements in Personal Data Privacy},
school = {ETH Zurich}
}
Research Collection: 20.500.11850/508911