Genetic testing has helped plenty of people gain insight into their ancestry, and some services even help users find their long-lost relatives. But a new study published this week in Science suggests that the information uploaded to these services can be used to figure out your identity, regardless of whether you volunteered your DNA in the first place.
The researchers behind the study were inspired by the recent case of the alleged Golden State Killer.
Earlier this year, Sacramento police arrested 72-year-old Joseph James DeAngelo in connection with a wave of rapes and murders committed in the 1970s and 1980s. Investigators said they had identified DeAngelo with the help of genealogy databases.
Traditional forensic investigation relies on matching certain snippets of DNA, called short tandem repeats, to a potential suspect. But these snippets only let police identify a person, or that person's close relatives, if they already appear in a heavily regulated law-enforcement database. Thanks to newer technology, investigators in the Golden State Killer case took DNA the suspected killer had left behind at a crime scene and extracted from it the same kind of genetic information that consumer genetic testing companies now collect. Then they searched for matches in public genealogy databases.
This information, combined with other historical records such as newspaper obituaries, helped investigators build a family tree of the suspect's ancestors and other relatives. After zeroing in on potential suspects, including DeAngelo, the investigators collected a fresh DNA sample from DeAngelo, and it matched the crime scene DNA perfectly.
But while the detective work used to uncover DeAngelo's alleged crimes was certainly clever, some experts in genetic privacy have been worried about the broader implications of the method. They include Yaniv Erlich, a computer engineer at Columbia University and chief science officer at MyHeritage, an Israel-based ancestry and consumer genetic testing service.
Erlich and his team wanted to know how easily the method could be used, in general, to identify someone from the DNA of distant and possibly unknown relatives. So they analyzed more than 1.2 million anonymous people who had been tested by MyHeritage, deliberately excluding anyone with immediate family members in the database. The goal was to find out whether a stranger's DNA could really be used to reveal your identity.
They found that more than half of these people had distant relatives (third cousins or more distant) who turned up in the searches. For people of European descent, who made up 75 percent of the sample, the hit rate was closer to 60 percent. And for about 15 percent of the total sample, the authors were also able to find a second cousin.
Much like the Golden State Killer investigators, the team found they could work out a person's identity with relative ease by combining these distant relatives with broad, not overly specific demographic details, such as the target's age or likely state of residence.
According to the researchers, once only about 2 percent of an adult population has had its DNA profiled in a database, it becomes theoretically possible to trace almost any person's distant relatives from a sample of unknown DNA, and therefore to uncover their identity. And we're getting ever closer to that tipping point.
“Once we reach 2 percent, nearly everyone will have a third cousin match, and a substantial amount will have a second cousin match,” Erlich explained. “My prediction is that for people of European descent, we’ll reach that threshold within two or three years.”
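Why such a small fraction is enough can be illustrated with a deliberately simple back-of-the-envelope model. This is not the study's own model: it assumes each of a person's relatives is in the database independently with probability equal to the coverage fraction, and the cousin count below is a hypothetical figure chosen only for illustration. Because most people have many third cousins, even a low per-relative chance adds up quickly.

```python
def prob_at_least_one_match(num_relatives: int, coverage: float, detect_rate: float = 1.0) -> float:
    """Chance that at least one of `num_relatives` is in the database and
    shares enough DNA to be detected.

    Simplifying assumption (not from the study): each relative is in the
    database independently with probability `coverage`, and is detectable
    with probability `detect_rate`.
    """
    p_hit = coverage * detect_rate          # chance a single relative yields a match
    return 1.0 - (1.0 - p_hit) ** num_relatives  # complement of "no relative matches"


# Hypothetical number of third cousins, for illustration only.
HYPOTHETICAL_COUSINS = 200

for coverage in (0.005, 0.01, 0.02):
    p = prob_at_least_one_match(HYPOTHETICAL_COUSINS, coverage)
    print(f"{coverage:.1%} coverage -> {p:.0%} chance of at least one match")
```

Under these toy assumptions, the chance of at least one match climbs steeply as coverage rises from half a percent to 2 percent, which is the intuition behind a tipping point.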
What this means for you: If you want to protect your genetic privacy, the best thing you can do is lobby for stronger legal protections and regulations, because whether or not you've ever submitted your DNA for testing, someone, somewhere, is likely to be able to pick up your genetic trail.