Every Sunday night in the state attorney general's DNA lab in Richmond, when the offices are silent and the computers idle, an automated matching program switches on and shuffles through more than a million DNA profiles of known felons, and of unidentified crime-scene samples. The next morning, analysts arrive at work to find a list of potential matches - up to 50. In an average week, about nine of those will be found to be definitive matches.
This is how, one summer Sunday night nearly four years ago, a 71-year-old Stockton man in a wheelchair was linked to the 1972 rape and murder of Diana Sylvester, a young nurse in San Francisco.
The investigators called it a cold hit because the suspect, John Puckett, was never implicated until the database-matching software spat out his name. No witnesses, no confessions, no footprints or fingerprints tied Puckett to the crime. (Curiously, in 1972 an eyewitness identified a different suspect, but that suspect was never prosecuted due to a lack of corroborating evidence. He died six years later.) Still, Puckett was no Boy Scout. He had pleaded guilty to two counts of rape and a sexual assault in 1977 and was imprisoned until 1985. Which is why the state had obtained his DNA profile in the first place.
During Puckett's San Francisco trial last January, the prosecution's expert estimated that the chances of a coincidental match between the defendant's DNA and the biological evidence found at the crime scene were 1 in 1.1 million. This, no doubt, gave the jurors a compelling reason to convict Puckett of the killing and send him to prison for at least seven years - if not the rest of his life.
What the jurors didn't know, though, and what the judge didn't think they needed to know, is that there's another way to run the numbers. And according to that math, the odds of a coincidental match in Puckett's case are a whopping 1 in 3.
Now on appeal, Puckett, along with similar cases across the country, is raising new questions and concerns over how prosecutors present DNA evidence to jurors (People v. Puckett, No. A121368, Cal. Ct. App., 1st Dist., May 1, 2008).
When you trawl through immense digitized compendiums of genetic fingerprints, "there is an undue risk of false matches," says former Food and Drug commissioner Donald Kennedy, who contributed to a recently released bombshell of a study from the National Research Council that raised serious questions about how forensic labs are administered. The science, Kennedy adds, "is being shut out of court." And even experts who say the danger of false matches is being exaggerated acknowledge that as offender databases continue to grow, the chances of convicting the innocent will only increase.
So, which is it: 1 in a million, or 1 in 3? Both calculations in the Puckett case are, in fact, accurate. The problem is that they answer different questions. Perhaps the easiest way to explain what's going on here is to describe a classic statistical puzzle known as the "birthday problem."
First ask yourself this: What are the odds that, in any roomful of people, someone chosen at random will share your birthday? Simple answer: 1 in 365. Pretty much everybody gets that one.
Now consider another seemingly simple question: How many people have to occupy the room to make it likely that at least two of them share a birthday? The not-so-obvious answer is 23. That's all you need to create a better-than-even chance of finding a birthday match.
Furthermore, you don't need to squeeze many more people into that room to make a match a near certainty: With just 57, in fact, the odds of a match exceed 99 percent. The same counterintuitive math applies to DNA database searches as well.
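The two birthday figures can be checked with a few lines of arithmetic. What follows is only a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function name `birthday_match_probability` is made up for illustration.

```python
def birthday_match_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday.

    Assumes every day of the year is equally likely (an idealization).
    """
    p_all_different = 1.0
    for k in range(people):
        # The k-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


if __name__ == "__main__":
    for n in (22, 23, 57):
        print(f"{n} people: {birthday_match_probability(n):.4f}")
```

Under those assumptions the result first passes even odds at 23 people and first passes 99 percent at 57, which is where the numbers above come from.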
A couple of decades back, though, before there were big databases to tap into, this wasn't an issue: At the time, DNA evidence was used almost exclusively to verify existing suspicions about known suspects, which made the math a lot less troublesome. And even today this is the most common use of such evidence.
The analysis of such "confirmation cases" begins with a comparison of genetic markers at particular locations on the DNA molecules, referred to as loci. If enough of the loci from the suspect's DNA line up with those from the DNA found at the crime scene (and if none are found to be different), a match is declared. Once that happens, a statistic is generated that shows how rare (or common) the matched genetic profile is in the general population. Statisticians call this the "random match probability" (RMP), and it is often a very small number, which can be very helpful to a prosecutor trying to win a conviction.
In the birthday problem, the RMP is 1 out of 365 (.00274), just as in Puckett's case it's 1 in 1.1 million (about .0000009). And for many DNA matches it's 1 in many billions. These numbers make it seem extremely unlikely that a particular genetic profile could belong to more than one individual. Therefore, everyone agrees that the RMP is the proper statistic to use in a DNA confirmation-match case. It answers the question: How rare is the identified genetic profile in the general population?
But in cold-hit cases, where the hunt is for matches among hundreds of thousands of DNA profiles rather than for a specific suspect, many statisticians say that the RMP is the wrong statistic to use. Instead, they want to know: What is the likelihood that the database will spit out an innocent person's name?
As in the birthday problem, the odds of finding a match in a room - or a database - full of people are much greater than the odds of selecting one person at random off the street who will match a certain birthday - or DNA profile. The odds of a coincidental match - a number called the database match probability - can be much greater, as well.
So far, though, almost every judge in the country who has been presented with this mathematical paradox has dismissed it as a serious problem. As a California court of appeal observed in People v. Johnson (139 Cal. App. 4th 1135 (2006)), "[T]he database is not on trial. Only the defendant is."
This turns the issue into one of legal relevancy rather than science, and as such, judges do not have to go through any lengthy Kelly-Frye hearings about scientific validity to determine where the experts stand on the matter. But the experts themselves found the reasons given in such rulings as Johnson to be highly flawed. In fact, soon after the Johnson decision came out, the California Supreme Court received a letter, signed by 25 leading statisticians, arguing that it was dead wrong.
"The fact that a suspect is first identified by searching a database unquestionably changes the likelihood of the matching being coincidental," the letter asserted. And though various schools of statistical thought differ on the degree of likelihood of false matches, the letter added, "We all agree that the fact that the suspect was first identified in a DNA database search must be taken into account."
After the jurors in Puckett's trial began their deliberations, Superior Court Judge Jerome Benson - who cited the Johnson decision in his initial ruling not to share with jurors the 1-in-3 estimate - apparently had a change of heart and asked both sides if they would stipulate to informing jurors of this alternate estimate. But the judge's request went nowhere, because the lawyers couldn't agree on the wording of the stipulation.
Deputy Public Defender Bicka Barlow, who represented Puckett, happens to be trained as a molecular biologist. "If a full hearing on the science was held, database matches might not be used at all," she contends, "because there is no agreement in the scientific community. ... On this issue, the courts in California are broken."
But the math isn't the only thing that concerns Barlow. She's also troubled by the unfettered access that prosecutors have to offender databases, while the defense side is effectively prevented from using them to evaluate the state's methodology.
"In many ways, it is a unique situation," she says. "There is really no other type of evidence in which the prosecution is permitted to do this, to essentially say: 'Trust us.' "
Barlow's interest was heightened several years ago, when she got wind of a presentation that a technician from Arizona's state DNA crime lab made in 2001 at a forensics conference. The technician had done something no one else had tried before, or at least revealed publicly: She had run DNA comparisons of all the offenders in the state database and was shocked to find approximately 90 coincidental matches of profiles that were identical at nine and even ten loci. At the time, such matches were considered exceedingly rare, yet Arizona's then relatively small database of 60,000 offenders contained scores of them.
In Puckett's case, the DNA match extended to only five and a half loci, an unusually low-grade match but the best that could be developed from the degraded biological samples left over from 1972.
Barlow subpoenaed the Arizona database records, but she was not permitted to introduce them in Puckett or any other case she has worked on that involved DNA evidence. Meanwhile, the FBI, obviously displeased by the Arizona technician's research, began threatening sanctions against crime labs that shared such information with anyone outside of law enforcement. The agency even suggested that crime labs could be barred from accessing the FBI's own national DNA database, thought to be the largest in the world, with nearly 6 million profiles.
When Barlow attempted to gain access to basic information about California's DNA database, state Deputy Attorney General Michael Chamberlain echoed the FBI's concerns. He argued in court that such disclosures would have dire consequences, violating offenders' statutory privacy protections and overloading the state's computers and DNA workforce. Chamberlain has used the same arguments with success to fight requests by other defense lawyers who have tried to follow Barlow's lead.
"The penal code ... unequivocally states that disclosure has to be strictly controlled and should not be disseminated outside the context of a particular criminal case," Chamberlain says. "That disclosure bar is crucial in maintaining the constitutionality of the statute. Big DNA databases bring with them big fears - from a privacy perspective, genetic surveillance fears. ... In response, the statute included strict nondisclosure restrictions ... that the information would not be disclosed except to law enforcement, and only when there is a demonstrated need."
Chamberlain also points out that when the Legislature was authorizing creation of the state's DNA database, it was the defense bar that most strenuously objected, on the ground that it represented a threat to privacy. "Now they've done a 180-degree turn, and they want information on matches not only on individual clients but also on all the profiles in the database," he says.
Barred from studying the actual incidence of coincidental matches in the California database - and with the state showing no inclination to conduct its own Arizona-style study - Barlow has since turned to prominent statisticians and DNA researchers in an effort to estimate the likelihood of a false match. These experts have told her they can predict, even without examining the data, that coincidental matches are much more common than judges and juries seem to realize, due to the sheer size of the DNA databases. They point to a 1996 National Research Council study - requested, and later ignored, by the FBI - concluding that the database match probability, not the random match probability, should be used to explain the significance of a cold-hit DNA match. The report further recommends calculating this number by taking the random match probability for the suspect's profile and multiplying it by the number of offenders in the database searched. This is how the 1-in-3 number was produced in the Puckett case.
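The multiplication rule described in the 1996 report is simple enough to sketch in code. The example below is illustrative only. The article gives the two probabilities (1 in 1.1 million and 1 in 3) but not the number of profiles searched in Puckett's case, so the database size here is back-solved from those two figures and is an assumption, not a reported number. The function name is likewise hypothetical.

```python
def database_match_probability(rmp: float, n_profiles: int) -> float:
    """Random match probability times the number of profiles searched.

    Capped at 1.0, since the product is an approximation and can
    otherwise exceed certainty for very large databases.
    """
    return min(1.0, rmp * n_profiles)


# Figures quoted in the article.
RMP_PUCKETT = 1 / 1_100_000
REPORTED_DMP = 1 / 3

# ASSUMPTION: the database size is not stated in the article; this is
# simply the size that makes the two quoted figures consistent.
implied_profiles = round(REPORTED_DMP / RMP_PUCKETT)

if __name__ == "__main__":
    dmp = database_match_probability(RMP_PUCKETT, implied_profiles)
    print(f"implied database size: about {implied_profiles:,} profiles")
    print(f"database match probability: about 1 in {1 / dmp:.1f}")
```

In other words, a 1-in-1.1-million profile searched against a few hundred thousand offenders is enough to turn a one-in-a-million figure into roughly one in three.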
Should judges have to choose between one method of estimation and another? Edward Blake, for one, doesn't think so. The nationally recognized pioneer in forensic DNA technology, who runs Forensic Science Associates in Richmond, California, argues that both the database match frequency and the random match frequency should be presented to juries "with their separate and distinct meanings." But he does agree with Barlow that excluding the former statistic in Puckett was a "travesty."
The database probabilities are always important in cold-hit cases, Blake notes, because "the defendant is identified not through any investigative lead but simply because his name pops up in a search. Keeping that information from the jury is dishonest. It should be introduced and explained in context."
Why, then, the resistance? Blake blames it on the FBI's obsession with preserving DNA database secrecy, and the willingness of state court judges and prosecutors to go along with what, he says, amounts to an FBI "power grab." "The data should be available to everyone," Blake declares. "It's a simple matter to strip out identifying information to address privacy concerns. So what are they hiding? Nothing good - I can tell you that."
UC Irvine's William C. Thompson, a DNA evidence expert and member of the California Crime Lab Review Task Force, agrees that there is "a disturbing lack of transparency" surrounding the government's DNA databases. Thompson, who was instrumental in exposing serious problems in the now-defunct DNA lab of the Houston Police Department, says more transparency is essential to keeping the system fair and honest, and to preventing innocent people from being sent to prison.
Even Deputy AG Chamberlain, who considers this controversy overblown, concedes that letting juries know there are alternative calculations to consider may be appropriate in some cases. Moreover, the California Supreme Court reached the same conclusion in a brief note last year in another cold-hit case, People v. Nelson (43 Cal. 4th 1242 (2008)).
In the final analysis, the point may not be that difficult even for prosecutors to concede. After all, most DNA cases these days involve matches at 10 loci or more, with 13 considered foolproof, producing RMPs in the range of 1 in tens or hundreds of billions. As Blake observes, multiplying such odds by the number of profiles in a database would dilute the power of the evidence a bit, but the chance of a coincidental match would still be fantastically small. Thus, in most instances, there will be little difference between the two calculations.
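Blake's point about strong matches can be illustrated with rough numbers. This sketch assumes a round database of 1,000,000 profiles (the article says only "more than a million") and two illustrative RMPs from the range mentioned above. It also compares the simple multiplication with the textbook formula 1 - (1 - p)^N for at least one coincidental match among N independent, unrelated profiles; that comparison is an addition for illustration, not something the article attributes to Blake or to the National Research Council.

```python
import math


def approx_dmp(rmp: float, n_profiles: int) -> float:
    """Simple product: random match probability times database size."""
    return rmp * n_profiles


def exact_dmp(rmp: float, n_profiles: int) -> float:
    """1 - (1 - rmp)**n_profiles, computed stably for very small rmp.

    Assumes the profiles are independent and unrelated (an idealization).
    """
    return -math.expm1(n_profiles * math.log1p(-rmp))


# ASSUMPTION: round figure standing in for "more than a million" profiles.
N_PROFILES = 1_000_000

if __name__ == "__main__":
    for one_in in (10e9, 100e9):  # 1 in 10 billion, 1 in 100 billion
        rmp = 1 / one_in
        a = approx_dmp(rmp, N_PROFILES)
        e = exact_dmp(rmp, N_PROFILES)
        print(f"RMP 1 in {one_in:,.0f}: "
              f"product about 1 in {1 / a:,.0f}, "
              f"exact about 1 in {1 / e:,.0f}")
```

For RMPs this small the two calculations are practically indistinguishable, and the chance of a coincidental hit stays in the range of 1 in tens of thousands or less.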
Still, as the Puckett case shows, there are times when the crime-scene evidence is degraded or scant, so that fewer loci can be detected for comparisons. And as the number of loci decreases, the probability of an accurate match decreases dramatically.
In addition, Barlow suspects that once access to the state database information is granted, it will be clear that the statistical risk of coincidental matches is even greater than current estimates suggest. "I'm a scientist before a lawyer," she says. "Good science requires transparency, and we have none. I find this very disturbing."
Barlow is not alone. The National Research Council, in its groundbreaking report critiquing the use of technology to determine the guilt or innocence of suspects, in February recommended taking all forensic laboratories out of the administrative control of both law enforcement and prosecutors.
Meanwhile, in January, California greatly accelerated the growth of its Richmond database when it started to add DNA profiles not only of anyone convicted of a felony but also of anyone arrested for one. This, it is estimated, will soon expand the database by 35,000 new profiles a month, which by 2012 would triple its size. With numbers like these, the case for making the science behind cold-hit DNA matches more transparent to jurors is bound to get stronger.
Edward Humes, based in Southern California, is a Pulitzer Prize-winning journalist and author of ten nonfiction books.