Over the course of the 2020-21 academic year, this Bass Connections project focuses on the “privacy implications of COVID-19 contact tracing apps.” Students on the team investigate multiple aspects of the issue, including policies at the federal, state, local, and private levels, as well as the specifics of how contact tracing apps function. The tech subgroup focuses on the latter.
Contact tracing is a method used by health authorities to determine who a person infected with COVID-19 has come into contact with. Officials can then notify those people of their possible exposure so that they can be tested for the virus. This is a labor-intensive process usually carried out by human contact tracers, but digital alternatives have been proposed and implemented. At its most basic, digital contact tracing uses a device that people carry with them, such as a smartphone, to record whom they have been near, using Bluetooth signals to detect proximity to other devices.
In order for people to be notified that they’ve come into contact with someone who has tested positive, some information must be shared between users. The simplest option would be to publicly host all users’ tracking information for anyone to peruse, but given the sensitive nature of health and location data, this is an obvious violation of privacy. As a result, centralized and decentralized approaches to contact tracing have been developed; both aim to preserve users’ privacy while still providing the functions needed for contact tracing. The official contact tracing apps of countries like Singapore, Australia, and France use the centralized approach, while those of countries like Switzerland, Canada, and Ireland use the decentralized approach.
Contact tracing apps generally follow either the centralized or the decentralized approach, and each handles user information differently across individual phones and the central server. The centralized approach trusts a central server with user information in order to keep track of infected individuals and notify those who have come into contact with them. In contrast, the decentralized approach trusts individual users’ phones to store user information and to notify their users if they have come into contact with an infected individual. What follows is the tech team’s understanding of each approach.
In both approaches, a user’s phone holds a list of temporary IDs to broadcast at fixed time intervals and maintains a list of the temporary IDs it has received from other phones. When a user tests positive for the virus, uploading that information allows the server or the individual phones (depending on the approach) to alert that user’s contacts that they’ve come into contact with a COVID-19 positive individual.
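The shared bookkeeping described above can be sketched in Python. This is a minimal, hypothetical model: the `Phone` class, its field names, the `encounter` helper, and the use of random 16-byte values as temporary IDs are all illustrative assumptions, and real apps exchange IDs over Bluetooth rather than through direct function calls.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Phone:
    """Hypothetical model of the per-phone state shared by both approaches."""

    to_send: list = field(default_factory=list)  # temporary IDs queued for broadcast
    seen: set = field(default_factory=set)  # temporary IDs heard from nearby phones

    def current_id(self) -> bytes:
        # The ID at the front of the queue is the one broadcast in this interval.
        return self.to_send[0]

    def end_interval(self) -> bytes:
        # At the end of each fixed interval, rotate to the next temporary ID.
        # Whether the old ID is kept afterwards depends on the approach.
        return self.to_send.pop(0)

    def hear(self, temp_id: bytes) -> None:
        # Record an ID received from another phone.
        self.seen.add(temp_id)


def encounter(a: Phone, b: Phone) -> None:
    """Two phones within Bluetooth range record each other's current ID."""
    a.hear(b.current_id())
    b.hear(a.current_id())


# Illustration only: each phone gets four random 16-byte IDs.
alice = Phone(to_send=[secrets.token_bytes(16) for _ in range(4)])
bob = Phone(to_send=[secrets.token_bytes(16) for _ in range(4)])
encounter(alice, bob)
```

Neither phone learns anything about the other beyond an opaque random value; who can turn that value back into a person is exactly what separates the two approaches.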
When a user begins using the app, the central server assigns the app a long-term identifier. The server then provides the app with a list of temporary IDs to send out, each of which the server can map back to the device’s long-term identifier. The app stores the temporary IDs it will send out, as well as the temporary IDs of other apps it has seen. Once an ID has been sent out for the last time, the app erases it from its list. If a person is infected with COVID-19 and decides to upload that information, the temporary IDs of all the contacts seen within a certain time frame are shared with the server. The server then has a list of the temporary IDs seen by infected users, which it can map to the associated long-term identifiers. Using those identifiers, the server can alert the affected users that they’ve come into contact with someone who has tested positive.
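A minimal sketch of the centralized server’s role, assuming the server keeps a simple in-memory table from temporary ID to long-term identifier. The `CentralServer` class, its methods, and the notification string are hypothetical; a real server might derive or encrypt its temporary IDs instead of storing a lookup table.

```python
import secrets
from collections import defaultdict


class CentralServer:
    """Hypothetical sketch of the centralized server described above."""

    def __init__(self):
        # temporary ID -> long-term identifier (only the server holds this mapping)
        self.owner_of = {}
        # long-term identifier -> pending notifications
        self.alerts = defaultdict(list)

    def register(self, n_ids: int = 4):
        """Assign a long-term identifier and a batch of temporary IDs to a new app."""
        long_term_id = secrets.token_hex(8)
        temp_ids = [secrets.token_bytes(16) for _ in range(n_ids)]
        for t in temp_ids:
            self.owner_of[t] = long_term_id
        return long_term_id, temp_ids

    def report_positive(self, contacts_seen):
        """An infected user uploads the temporary IDs their app has seen."""
        for t in contacts_seen:
            owner = self.owner_of.get(t)
            if owner is not None:
                self.alerts[owner].append("exposed to a COVID-19 positive user")


server = CentralServer()
alice_lt, alice_ids = server.register()
bob_lt, bob_ids = server.register()
carol_lt, carol_ids = server.register()

# Alice's app heard one of Bob's temporary IDs, then Alice tests positive.
server.report_positive([bob_ids[0]])
```

The uploaded list contains IDs that Alice’s phone heard, not IDs it sent; that is what lets the server, rather than the phones, decide whom to notify.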
The centralized and decentralized approaches have very similar implementations, except that the server in the decentralized approach knows the temporary IDs broadcast by infected individuals rather than the temporary IDs that infected individuals have come into contact with. As a result, the server cannot notify those who came into contact with infected individuals, and the phones themselves must perform that check.
In the decentralized approach, when a user begins using the app, the app uses a long-term identifier stored on the phone to generate the list of temporary IDs it sends out. The app also stores the list of temporary IDs it has seen, and entries are deleted from that list after a sufficient length of time (based on how long a person with COVID-19 can be infectious). If a person is infected and decides to upload that information, either the long-term identifier or the temporary IDs the app has sent out within a certain time frame are shared with the server. The Google/Apple Exposure Notification framework takes an approach close to the latter: it uploads the daily keys from which the broadcast temporary IDs are derived. The server then stores and publishes all the identifiers sent out by infected users, and each app checks the temporary IDs it has seen against the server’s list, alerting its user if there is a match.
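The decentralized flow can be sketched the same way. The key derivation here (HMAC-SHA256 over a counter, truncated to 16 bytes) is an assumption chosen for illustration, not the scheme used by the Google/Apple framework or by any particular national app, and the expiry of old entries in `seen` is omitted. `temp_ids_from_key` and `DecentralizedApp` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def temp_ids_from_key(long_term_key: bytes, n: int) -> list:
    """Derive n temporary IDs from a secret key (assumed HMAC-SHA256 construction)."""
    return [
        hmac.new(long_term_key, i.to_bytes(4, "big"), hashlib.sha256).digest()[:16]
        for i in range(n)
    ]


class DecentralizedApp:
    def __init__(self, n_ids: int = 4):
        self.key = secrets.token_bytes(32)  # long-term identifier, stays on the phone
        self.sent = temp_ids_from_key(self.key, n_ids)  # kept in case the user tests positive
        self.seen = set()  # IDs heard from others (expiry omitted in this sketch)

    def ids_to_upload(self) -> list:
        # The option described above: upload the temporary IDs this app broadcast.
        return list(self.sent)

    def check_exposure(self, published: set) -> bool:
        # Matching happens on the phone; the server never learns who matched.
        return not self.seen.isdisjoint(published)


published = set()  # server's public list of IDs broadcast by users who tested positive
alice, bob, carol = DecentralizedApp(), DecentralizedApp(), DecentralizedApp()
bob.seen.add(alice.sent[0])  # Bob's phone heard one of Alice's broadcasts
published.update(alice.ids_to_upload())  # Alice tests positive and uploads
```

Here the server only ever handles Alice’s own broadcast IDs; whether Bob or Carol was exposed is decided entirely on their phones.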
WHAT’S VULNERABLE (WHO’S BEEN INFECTED)?
In the centralized approach, all potentially identifying information is stored on the server, while individual apps hold little information about the user (temporary IDs are deleted once they have been sent out). If the server were compromised, or if the body overseeing the server used the information maliciously, individual phones and their owners could be identified and associated with specific temporary IDs. With that knowledge, privacy harms such as identifying infected individuals and mapping out a user’s social circle may occur. However, a malicious actor who accesses the information stored in an individual app would learn little about its user, since the temporary IDs the app holds are meaningless without the server’s mapping.
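To make the centralized risk concrete, here is a hypothetical sketch of what a compromised or malicious server could infer from a single upload. It assumes the server can tell which long-term identifier made the upload; the table contents and the `infer_contacts` function are illustrative only.

```python
# Hypothetical server table: temporary ID -> long-term identifier of the broadcasting phone.
owner_of = {
    b"t1": "phone-A",
    b"t2": "phone-A",
    b"t3": "phone-B",
    b"t4": "phone-C",
}


def infer_contacts(table: dict, uploader: str, uploaded_seen: list) -> set:
    """Map the temporary IDs an infected user uploaded back to the phones that sent them."""
    return {table[t] for t in uploaded_seen if t in table} - {uploader}


# phone-A tests positive and uploads the IDs its app heard (b"t9" is unknown to the server).
circle = infer_contacts(owner_of, "phone-A", [b"t3", b"t4", b"t9"])

# Someone holding only the phone's data, without the server's table, learns no identities.
phone_only = infer_contacts({}, "phone-A", [b"t3", b"t4", b"t9"])
```

With the table, one upload reveals both that phone-A is infected and that it was near phone-B and phone-C; without the table, the same IDs map to no one.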
In the decentralized approach, the apps store a cache of the temporary IDs they have sent out, while the server holds only the temporary IDs of users who have tested positive. By design, infected users give up some privacy, since the server publishes their temporary IDs for all apps to check. However, the server holds no information about uninfected users that could be used maliciously. Meanwhile, because each phone must keep the temporary IDs it has recently sent out in case its user later tests positive and uploads them, a hacker who breaks into an individual phone would be able to learn the temporary IDs associated with that user.
Each approach, if implemented correctly, can perform the functions necessary for contact tracing, but each treats a different source of privacy risk as the most serious. The centralized approach assumes that the biggest risk is individual user data leaking through the app, while the decentralized approach assumes that the biggest risk is the compromise of all user data held in one location. Therefore, a user who is worried about being personally targeted for their information (such as a high-profile celebrity) may prefer the centralized approach, while less well-known users may prefer the decentralized approach. In addition, governments may prefer the centralized approach, as it gives them greater control over the accuracy of the information.
In the future, the tech team would like to analyze the two approaches further to determine their impacts on privacy and correctness. But regardless of which approach is chosen, a user who understands how each one works can better judge what their own app, their public health agency, and other users’ apps know about them.
A very helpful source for this post and for the entire Tech team is Serge Vaudenay’s paper “Centralized or Decentralized? The Contact Tracing Dilemma.”