November 9, 2016
Where Traditional DNA Testing Fails, Algorithms Take Over
By Lauren Kirchner, ProPublica
Late on a hot August night in 2014, Syracuse, New York, police tried to pull over a car driving without headlights. The driver and passenger fled into a darkened park. As the officers chased them on foot, they said they heard a gunshot. The cops never caught the suspects, but recovered a loaded handgun.
The police connected the abandoned car to its owner, and arrested him, but could not tie him to the handgun without a DNA match. The mixture of DNA on the handgun was too complicated for forensic scientists to analyze with conventional methods, a representative from the Onondaga County crime lab later testified. There were at least four people’s DNA present and possibly five or six.
So the District Attorney’s office outsourced the analysis to Cybergenetics, a private company that makes TrueAllele, a “probabilistic genotyping” software program. Where traditional DNA analysis involves manually and visually interpreting DNA markers, TrueAllele runs DNA data through complex statistical algorithms to calculate the likelihood that a particular person’s DNA is present in a mixture, compared to a random person’s DNA.
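The comparison described above is commonly expressed as a likelihood ratio: the probability of the DNA data if the person of interest contributed to the mixture, divided by the probability of the same data if an unrelated random person did instead. The short Python sketch below illustrates only that ratio. It is not TrueAllele's algorithm, which is proprietary; the function names and the per-marker probabilities are made up for illustration, and treating markers as independent is a simplifying assumption.

```python
import math

def likelihood_ratio(p_data_if_suspect, p_data_if_random):
    """LR = P(data | suspect contributed) / P(data | unrelated person contributed)."""
    if p_data_if_random <= 0:
        raise ValueError("denominator probability must be positive")
    return p_data_if_suspect / p_data_if_random

def combined_log10_lr(per_marker_probs):
    """Add up log10 likelihood ratios across DNA markers.

    Assumes the markers are statistically independent, so the individual
    ratios multiply (and their logarithms add).
    """
    return sum(math.log10(likelihood_ratio(p_s, p_r)) for p_s, p_r in per_marker_probs)

# Hypothetical, made-up values for three markers:
# (probability of data if suspect contributed, probability if a random person did)
toy_markers = [(0.20, 0.02), (0.10, 0.01), (0.30, 0.03)]

log10_lr = combined_log10_lr(toy_markers)
print(f"combined log10 likelihood ratio: {log10_lr:.2f}")
```

In this toy case each marker favors the suspect hypothesis by a factor of about 10, so the three together give a combined ratio of roughly 1,000. Real casework involves many more markers and far more elaborate modeling of the mixture itself, which is one way ratios can climb into the billions or trillions.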
Developers of tools like TrueAllele say that they remove human bias from the equation, delivering accurate, consistent results with the exactitude and cold remove of a calculator.
But critics worry that they undermine an important aspect of due process. The accused, defense attorneys, judges and jurors typically don’t have access to the tools’ often proprietary inner workings and, thus, the ability to question their conclusions. As one attorney wrote in a brief arguing that TrueAllele’s developer should have to reveal and explain its source code, “The Petitioner cannot cross-examine a computer.”
In the Syracuse case, TrueAllele indicated that the DNA on the gun was a likely match to Frank Thomas, the 19-year-old who owned the car. Prosecutors had previously offered Thomas a deal if he pleaded guilty to a gun possession charge, but Thomas had maintained his innocence.
The TrueAllele analysis was the only physical evidence presented at trial connecting Thomas to the gun. Dr. Mark Perlin, TrueAllele’s developer, testified that a match between the DNA on the gun and Thomas’ DNA was “1.78 trillion times more probable than a coincidental match to an unrelated African American person” and “892 billion times more probable than a coincidental match to an unrelated Hispanic person.” The attorney for Thomas, who is black and Hispanic, pressed Perlin to share the tool’s source code so that his results could be independently verified. Perlin argued this was unnecessary and irrelevant.
In March, Thomas was found guilty of criminal possession of a weapon, reckless endangerment and menacing a police officer. He was sent to prison for 15 and a half years. He’s appealing his conviction.
For the past year, ProPublica has been investigating and reverse-engineering various algorithms as part of a series called “Machine Bias.” We’ve found that these complex pieces of software are helping to guide decisions in an ever-growing number of realms, including criminal justice, in ways that are often little understood and sometimes unfair.
DNA evidence is the gold standard of forensic science. Even as other techniques, from bite-mark analyses to fire patterns, have come under question, DNA has remained the most unassailable and most objective form of proof that someone did, or did not, commit a crime.
The emergence of algorithmic analysis programs, however, is creating a new frontier of DNA science. The tools are so new and expensive that only a handful of local crime labs use them regularly. But as law enforcement looks to DNA more and more frequently to solve even minor crimes, that seems almost certain to change.
Perlin says that, while he resists turning over code, he takes pains to demonstrate how TrueAllele works when it’s used in a criminal trial, giving attorneys and judges access to test the software themselves. “‘Here’s the car, here’s the keys—drive it,’” he said he tells them.
Perlin started building TrueAllele for casework in 1999, a few years after working on the Human Genome Project. He has a bachelor’s degree in chemistry, PhDs in math and computer science, and a medical degree. In the early 2000s, his company helped clear the backlog of DNA samples waiting to be interpreted for the government databank in the UK, and later used TrueAllele to help identify victims’ remains at the World Trade Center site after September 11. TrueAllele was used for the first time in a criminal case in 2009 and now encompasses some 170,000 lines of computer code.
Cybergenetics offers police, prosecutors, and defenders an appealing business model: It takes on their most difficult DNA cases and provides preliminary results for free. If the results indicate a likely statistical match, customers pay only when they want Cybergenetics to run a complete analysis and write a report about the results that can be used at trial. Cybergenetics also licenses its software for crime labs to use themselves. Labs in the Commonwealth of Virginia, Baltimore, Kern County in California, and Beaufort and Richland counties in South Carolina all license TrueAllele.
“Our laboratory does a lot of property crime, which involves a lot of weak samples and mixtures,” said John Barron, senior forensic scientist at the Richland County Sheriff’s Department. “It’s a more complete analysis of the mixture versus manually using [conventional DNA] thresholds, so it’s fairer to both the prosecution and the defense. We use it quite a bit.”
Since TrueAllele came on the scene, other companies have developed software to compete with it. The U.S. Army and the FBI use STRmix, developed by a New Zealand-Australia collaborative and sold in the U.S. by Nichevision, as do several public crime labs across the nation. New York City’s Office of the Chief Medical Examiner recently announced that it will switch to STRmix in 2017.
In recent years, these powerful tools have enabled prosecutors to make cases with evidence that would have otherwise been difficult or impossible to interpret. TrueAllele solved a string of armed robberies from “touch” DNA swabbed from a store counter. STRmix solved another robbery by analyzing the sweat inside a sneaker.
The software isn’t only a tool for prosecutors: The Indiana Innocence Project used TrueAllele to help free a man who had been in prison since 1991 for a violent rape that DNA proved he did not commit.
Still, probabilistic genotyping remains on the outer edge of scientific acceptance. The White House released a report in September by the President’s Council of Advisors on Science and Technology (PCAST) that called probabilistic genotyping an improvement over traditional methods of analyzing complex mixtures of DNA, but concluded the tools “still require scientific scrutiny.”
Studies have established the validity of the available software only in certain circumstances (such as a DNA mixture of three contributors), but not in others, the report asserted. The authors cite a case in upstate New York in which TrueAllele and STRmix were used to analyze the same DNA data and came to different conclusions. (The judge in that case ultimately did not admit the DNA evidence at trial.)
The PCAST report also noted that independent research is especially needed. Most of the studies published on TrueAllele and STRmix in peer-reviewed journals have been done by the developers of the tools.
“Appropriate evaluation of the proposed methods should consist of studies by multiple groups, not associated with the software developers, that investigate the performance and define the limitations of programs by testing them on a wide range of mixtures with different properties,” the PCAST report says.
Perlin, TrueAllele’s creator, and John Buckleton, one of the creators of STRmix, both objected. “Your Report cannot unilaterally impose a novel notion of ‘independent authorship’ for peer-review,” wrote Perlin in an open letter, explaining that having a developer as part of a team of authors is the norm in scientific publishing. Buckleton wrote that the internal validation studies performed by jurisdictions using STRmix should be proof enough that it works.
Some makers of probabilistic genotyping software allow other programmers to use and modify their code. Among the free, open-source tools available are LRmix, software created by a pair of scientists in the Netherlands; EuroForMix, created by a Norwegian team; and Lab Retriever, a non-commercial program available under a Creative Commons license and uploaded to GitHub.
Beyond offering transparency, this approach can help expose problems. A significant bug was discovered and fixed in LikeLTD, an open-source Australian probabilistic genotyping program, because of outside scrutiny.
But TrueAllele and STRmix remain proprietary. A coding error in STRmix was only discovered in the midst of a criminal trial where prosecutors sought to include its faulty results as evidence. (Its makers say the error was minor and was quickly fixed.)
Defendants’ requests to get access to TrueAllele’s source code have consistently been denied, leading the Electronic Privacy Information Center, an advocacy group, to kick off a FOIA campaign to obtain whatever information is publicly available from the jurisdictions that use it.
Some who advocate for defendants see plenty of upside to probabilistic genotyping tools, even without the benefits of full transparency. Greg Hampikian, a professor of biology at Boise State University who leads the Idaho Innocence Project, said the project has begun using TrueAllele to help exonerate wrongly convicted people.
“Microsoft Excel doesn’t release its code either, but we can test it and see that it works, and that’s what we care about,” Hampikian said.
Judges have endorsed the business interests cited by makers of probabilistic genotyping software in ruling that they do not have to hand over their source code.
The defendant in the first case using a TrueAllele analysis appealed his conviction, based in part on his inability to understand enough about the software to challenge it. Pennsylvania Judge Jack Panella denied the appeal, saying the defendant had no right to the formula behind the software. “TrueAllele is proprietary software; it would not be possible to market TrueAllele if it were available for free,” Panella wrote.
Dr. Dan Krane, a professor of biology at Wright State University and frequent expert witness, said he figured defendants’ right to confront their accusers would outweigh companies’ right to make money.
“I suppose these are both Constitutional principles, but I thought one would trump the other,” Krane said. “And that’s not what’s happening here.”
ProPublica is an independent, non-profit newsroom that produces investigative journalism in the public interest. ProPublica is headquartered in Manhattan.