Last week, I had a creepy feeling — one familiar to almost anyone on the internet. My personal information had been exposed online, again.
My name, email address and information about my work history had turned up in an online database for anyone to find — my details along with those of millions of other people.
I received a notification about the incident from Have I Been Pwned?, an online service that lets me know when my email address is part of a data breach.
This time, the alert suggested my data had been lost by an industry I’d never heard of — the booming business of “data enrichment”.
How to find a data breach
When cyber security researcher Bob Diachenko found the database, he was immediately struck by its scale.
It didn’t have passwords or credit card details, but it was still a “dangerous” trove that could have serious impact in the wrong hands — for targeting people with phishing attacks, for example, based on their background and profile.
He and fellow researcher Vinny Troia attempted to find the data’s owner, but with no luck.
They did determine, however, that some of the data appears to have originated with data brokers called Oxydata and People Data Labs (PDL), which offer “data enrichment” services.
PDL’s cofounder Sean Thorne told Wired his company doesn’t own the server in question, but a third party could have taken its data and left it exposed.
An Oxydata spokesperson also told me that while clients are “contractually obligated” to protect what they’ve purchased, there is little the company can do once the data is out of its hands.
No matter who was responsible, Mr Diachenko suggested the main question remained: “How did they get my data?”
The world of data enrichment
Data enrichment services take personally identifying information — email addresses or phone numbers, for example — and match it with broader profiles of individuals.
That could be information about your education level, your religion or your interests.
It’s all calculated so the organisations purchasing these services can know more about you, all the better to sell you things.
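As a rough illustration of the matching step described above, here is a minimal Python sketch. It assumes every record carries an email address that serves as the shared key. The `Profile` class, the `enrich` function, the field names and the sample records are all hypothetical, invented for this example; nothing here reflects how PDL, Oxydata or any other broker actually builds its records.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A hypothetical enriched profile, keyed by a normalised email address."""
    email: str
    attributes: dict = field(default_factory=dict)


def normalise(email: str) -> str:
    # Trim whitespace and lowercase so trivially different spellings match.
    return email.strip().lower()


def enrich(sources: list[list[dict]]) -> dict[str, Profile]:
    """Merge records from several sources into one profile per email."""
    profiles: dict[str, Profile] = {}
    for source in sources:
        for record in source:
            key = normalise(record["email"])
            profile = profiles.setdefault(key, Profile(email=key))
            for name, value in record.items():
                if name != "email":
                    # The first source to supply a field wins; later ones do not overwrite it.
                    profile.attributes.setdefault(name, value)
    return profiles


# Made-up example data: one scraped public profile and one survey response.
scraped_profiles = [{"email": "Jane@Example.com ", "job_title": "Reporter"}]
survey_answers = [{"email": "jane@example.com", "interests": "cycling"}]

for profile in enrich([scraped_profiles, survey_answers]).values():
    print(profile.email, profile.attributes)
```

The point of the sketch is that a single identifier is enough to stitch together details you shared in separate places for separate purposes.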
On its website, PDL claims to have data about more than 1.5 billion people, including “resume, contact, social, and demographic information”.
Troy Hunt, who runs Have I Been Pwned, said he’s uncomfortable with the world of data brokers and data enrichment.
In his view, many operate on “different levels of shadiness”, especially when it comes to how they build individual records.
Some firms may purchase datasets from other companies that got permission to collect it — online retailers, for example.
Personal details could be gathered from internet surveys that only disclose they share data in the fine print.
Then, at what he called the more “dodgy” end, information could be scraped from websites against their terms and conditions — from university websites that detail staff contact emails, for example.
But there’s another potential source — other data breaches that have already spilled volumes of sensitive data.
“Think about Ashley Madison, and hey, it’s available to download for free,” Mr Hunt said.
Plenty of data brokers and credit agencies have had data breaches in recent years, most notably the 2017 Equifax incident, in which the Social Security numbers, birth dates and addresses of around 143 million Americans were stolen.
“They’ve got all of this data,” Mr Hunt said of the industry.
“They very often don’t protect it very well. Or they sell it to customers and then the customers don’t protect it very well.”
What can be done about it?
Much of my exposed data appears to have been taken from LinkedIn, but that company’s terms stipulate that it’s forbidden to scrape data from its website.
LinkedIn confirmed it has no relationship with PDL, nor other brokers potentially caught up in this exposed database.
According to a spokesperson, the company’s investigation indicates that a third-party company has exposed a set of data aggregated from several websites, including information copied from LinkedIn public profiles.
“This was not a breach of LinkedIn,” he said.
“When anyone tries to take member data and use it for purposes LinkedIn and its members haven’t agreed to, we take action to stop them.”
You can take a few steps to try to protect yourself from such data collection, said Suranga Seneviratne, a computer security lecturer at the University of Sydney.
He suggested not displaying personally identifiable information like phone numbers or email addresses on public-facing websites, for example.
It’s also important to check the privacy settings on the sites you use, as often data about you is simply sold rather than scraped.
Ultimately, though, Dr Seneviratne argued it’s not something an individual can solve.
After all, this is data you may have willingly shared for one purpose, now being used for another without your knowledge. Until it turns up on Have I Been Pwned.
“It’s quite difficult to identify these service providers, and also quantify and measure what they are doing,” he said.
Rather, we need legal and infrastructural change to protect our data at the industry level. Or stop them collecting it at all.
Some argue it should be no surprise that data which is already publicly available on services like LinkedIn is aggregated and exposed.
But Mr Hunt disagrees — his own information was caught in this leak.
“The average person is proceeding on the assumption that when they give their personal information to an organisation, it will be respected, it will be secured properly,” he said.