Data brokers and data breaches

September 27, 2022

Justin Sherman


Data brokers—companies that collect, infer, and buy people’s information and then sell, license, and otherwise share or monetize it—have given people’s information, without their full knowledge and consent, to a range of buyers. Banks, health insurance companies, prospective employers, predatory lenders, law enforcement agencies like Immigration and Customs Enforcement (ICE) and the Federal Bureau of Investigation (FBI), and criminal scammers are just some of the parties who have purchased, licensed, or otherwise acquired individuals’ data from companies engaged in data brokerage.

Much less attention has been paid, however, to another risk associated with companies gathering large volumes of data and then prepackaging it: data breaches.

Companies that broker data possess, by definition, data about individuals. For some companies, gathering and monetizing that information is their entire business model. That monetization could range from selling the information outright to enabling third parties to run advertisements that draw on the information—such as delivering a Nike ad to someone in the market for new sneakers or, in an extremely exploitative and harmful case, enabling anti-abortion groups to target women literally sitting in clinic waiting rooms. Other companies broker data on the side. For example, many of the United States’ biggest internet service providers (ISPs) monetize their customers’ information through selling ad access, sharing it, and other means.

All of these companies may be subject to data breaches, and several data brokers have already compromised people’s information through poor security practices.

In 2018, a researcher found that data broker Exactis was exposing nearly 340 million people’s information to the public internet through an unsecured server. The same year, data broker Apollo was hacked, and billions of data points on people were exposed, including people’s email addresses. In 2019, San Francisco-based data broker LimeLeads, which licensed access to a database of people’s information, simply did not set up a password for its internal server, enabling anyone on the internet to access it. The broker’s data on 49 million people then showed up on a criminal hacking forum. In 2020, data broker Social Data, which scraped millions of records from Instagram, TikTok, and YouTube and then sold them (against platform terms of service), exposed nearly 235 million profiles on a server with no password or any other kind of authentication. The list goes on.

Equifax is a large data broker that collects information on hundreds of millions of workers (among other kinds of data) and then sells or otherwise monetizes it. Infamously, Equifax disclosed in September 2017 that it had been hacked, exposing the information of 147 million people, and it settled with the Federal Trade Commission (FTC), Consumer Financial Protection Bureau (CFPB), and states in July 2019, agreeing to pay at least $575 million. The Department of Justice later charged four Chinese “military-backed” hackers with carrying out the breach. Information stolen ranged from names and addresses to Social Security Numbers, driver’s license numbers, and more, and plenty (if not all) of that information was gathered by Equifax for the purposes of brokering it. This was not lost on lawmakers. Senator Richard Blumenthal said shortly after the hack that “the Equifax scandal is conclusive evidence that consumers need and deserve [privacy and security] protections” and that “third party data brokers profiting off the sale of personal consumer information is a shameful violation of the privacy and security of millions of Americans.”

Hacks of data brokers’ clients have also compromised broker-held information.

In August 2020, cybersecurity journalist Brian Krebs published an investigation revealing that a data broker’s data had ended up in the hands of criminals, because some of the broker’s customers were hacked. Krebs pieced the threads together: Interactive Data, a Florida-based data broker, gathered what Krebs called an “extraordinary” amount of information on individuals, including their full Social Security Number, date of birth, all current and previous known physical addresses, all known email addresses, vehicle registrations, available lines of credit, and IP addresses. The broker then sold that information to a range of clients, from law enforcement to debt recovery companies. However, it was then discovered that criminals were using that information to steal millions of dollars from the federal government in phony loan applications—and make fraudulent unemployment insurance claims against numerous states. Interactive Data wouldn’t provide many specifics about how that data ended up in criminals’ hands, but its CEO did indicate a hack may have been responsible: “We identified a handful of legitimate businesses who are customers that may have experienced a breach,” he told Krebs.

In 2019, Lily Hay Newman at WIRED reported that an unsecured web server had been exposing over 1.2 billion records to the internet, including the profiles of “hundreds of millions of people that include home and cell phone numbers, associated social media profiles like Facebook, Twitter, LinkedIn, and GitHub, work histories seemingly scraped from LinkedIn, almost 50 million unique phone numbers, and 622 million unique email addresses.” It was unclear who managed the server, and it was soon removed. But three of the four datasets that made up the 1.2 billion records appeared to come from data broker People Data Labs. The fourth appeared to have come from another data broker, Oxydata. People Data Labs said it performs free security audits for its customers, and Oxydata said it had not been breached, forbids its customers from selling data, and requires its customers to implement “appropriate security measures.” It nonetheless appears, based on Newman’s reporting, that the leaked data was a result of data broker customers poorly securing information.

While distinct from the legally registered companies that broker data, there is even a dark web “data broker” ecosystem made up of actors who broker the information stolen in data breaches. As a Bleeping Computer article describes it, “when threat actors and hacking groups breach a company and steal their user databases, they commonly work with data breach brokers who market and sell the data for them. Brokers will then create posts on hacker forums and dark web marketplaces to market the stolen data.” And there has been at least one case where a dark web data broker intersected with hacks of legally registered data brokers: Brian Krebs reported in 2013 that the operators of a dark web forum selling stolen information about people had broken into the networks of data brokers LexisNexis, Dun & Bradstreet, and Kroll Background America to siphon data.

Allowing companies to collect, infer, buy, aggregate, and sell, license, or otherwise share people’s information at such scale, with virtually no regulation, only increases the risk that highly sensitive information is acquired by criminal scammers, foreign government agencies, and other actors. The data “attack surface,” in a sense, grows. Anouk Ruhaak, a Mozilla senior fellow, described it this way in 2019: “Every time a data broker makes a sale, more data is released into the wild—and, consequently, the risk of future data breaches goes up. It’s critical, therefore, that data brokers understand who they sell to and guarantee that their buyers can adequately safeguard sensitive information.” Data brokers that do not adequately invest in their own security—and that do not ensure their clients have adequate security—are not just putting individuals’ highly sensitive information at risk. They are also, in many cases, putting prepackaged datasets about those individuals at risk.

Ruhaak’s last point is especially important. Our data brokerage team’s research, conducted at Duke’s Sanford School, has attempted to better understand the controls that data brokers do or do not put in place around the sale of people’s information. Introducing controls on the sale of highly sensitive information that was secretly collected in the first place does not eliminate all harms. Nonetheless, a lack of controls might perpetuate harm even further, such as when data brokers provide information to law enforcement carte blanche or when a company is approached by a criminal scammer, blocks the sale, and then is convinced by marketing employees to sell the data anyway. The question of whether a broker vets its potential customers, for example, is vital to understanding a company’s due diligence and how further harms might be inflicted on people.

Security controls are an essential component of understanding the data brokerage landscape. The security controls that data brokers should implement might be universal, or they might vary depending on the exact sensitivity of the data the brokers hold and the nature of their data sharing. Data brokers that don’t literally provide their customers with data, such as by enabling a marketer to run advertisements through an Application Programming Interface (API) without sending over data, would have to ask security questions about and implement controls around API access, authenticating those trying to access an API, and so on. Their focus might lie more on the security of the ad interface. But data brokers that give data to customers—whether that data consists of people’s names and emails, medical conditions and drug prescriptions, or aggregated GPS locations—should have to put in place detective controls concerning their customers’ security. If those customers do not have sufficient mechanisms in place to protect that information, the broker would be further exposing people to harm by not conducting a due diligence review of that potential customer’s security practices.
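To make the API-access point concrete, here is a minimal Python sketch of one such control: checking a customer's API key before serving a request. It is an illustration under stated assumptions, not a description of any broker's actual system. The in-memory store and the function names (`issue_key`, `is_authorized`) are hypothetical, and a real deployment would also need key rotation, rate limiting, and audit logging.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store mapping a customer id to the SHA-256 hash of
# that customer's API key. Storing only the hash means that a leak of this
# table does not directly reveal usable keys.
_KEY_HASHES: dict[str, str] = {}


def _hash_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def issue_key(customer_id: str) -> str:
    """Generate a random API key for a customer and record only its hash."""
    api_key = secrets.token_urlsafe(32)
    _KEY_HASHES[customer_id] = _hash_key(api_key)
    return api_key


def is_authorized(customer_id: str, presented_key: str) -> bool:
    """Return True only if the presented key matches the stored hash."""
    stored = _KEY_HASHES.get(customer_id)
    if stored is None:
        # Unknown customers are rejected outright.
        return False
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the digest matched.
    return hmac.compare_digest(stored, _hash_key(presented_key))
```

The breaches described earlier, in which servers had no password or authentication at all, are the failure this kind of check is meant to rule out: every request is denied unless it carries a credential the broker itself issued.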

Again, introducing controls on the sale of data does not wipe away the harms of collecting it in the first place. Nor does it wipe away the potential harms of selling the data. For example, a data broker could determine that law enforcement agencies have sufficient cybersecurity controls around data handling and still allow police to exploit loopholes in the Fourth Amendment and related protections to buy data on people without warrants, public disclosure, or robust oversight. But not having proper security controls puts individuals at further risk. With data breaches on the rise and posing a growing threat to everything from people’s financial well-being to national security, the intersection of data brokers and data breach risks demands more attention.

Justin Sherman (@jshermcyber) is a senior fellow at Duke University’s Sanford School of Public Policy, where he leads its data brokerage research project.