Privacy is often at stake because of the accumulation of sensitive personal information that a user conveys over an extended time period on a variety of platforms (social networks, discussion forums, review sites, etc.). An additional risk is that information shared only with friends may be transitively made visible to a broader community. Finally, data that a user provides to commercial or public services (for shopping, registering for events, etc.) sometimes becomes widely visible, although the user assumed that it would be kept confidential. Possible reasons include software bugs, careless system administration, or the service going out of business or being acquired, after which its privacy policies may be neglected or changed.
Even if each individual leak of this kind is relatively harmless, the major risk is that someone could compile all of this information and draw conclusions about the user from it as a whole.
For example, a job recruiter or an insurance company could reject someone's application based on digital traces collected over years, many of which the user has long forgotten. Such situations cannot be prevented completely, because the criticality builds up gradually over a long period. What users need, however, are tools and guidance to determine which information is visible beyond its originally intended scope and to assess the resulting privacy risk.
The goal of this project was to develop models, methods, and scalable tools for precisely this purpose: improving the user's awareness of her long-term traces in the digital world and supporting her in understanding the potential criticality of the personal information she has disclosed. This entails a number of sub-goals: (1) Find and retrieve all personal information that the user has disclosed on the Internet over an extended time period. (2) Determine which piece of information was visible to whom. (3) Analyze the provenance of each piece, i.e., how it became visible (e.g., through other users copying, citing, or forwarding it). (4) Continuously monitor the user's actions that leave digital traces (such as posts, but also clicking "like" buttons, rating other users' posts or products, connecting with new friends, and so on) and match these actions in real time against a database of past actions.
Role Within the Collaborative Research Center