DocketNumber: AP-75,062
Filed Date: 3/1/2006
Status: Precedential
Modified Date: 9/15/2015
Numbers flummox many of us, and as a result, numerical evidence can become confusing and misleading. This is particularly true if the evidence that inserts numbers into the legal equation is new and marginally understood. We are now at that point with evaluation of DNA evidence. The experts who come to court to present DNA evidence frequently come up with probabilities (2) of such great magnitude that they are patently unsupportable to those who understand numbers and very impressive to those who do not. The following discussion assumes that the only evidence linking the defendant to the offense is DNA.
The first probability mistake that experts make is to treat all variables (3) as independent. A variable is independent if, when the numerical value of the variable changes, no other variables necessarily change also. In a rectangle, height and width are independent variables; changing the height does not necessarily change the width. A dependent variable is one that necessarily changes in response to a change in another variable. The area of a rectangle is the product of height and width and is a dependent variable; if the height or width changes, the area necessarily changes.
The probability of two independent variables occurring at the same time is the product of the probabilities for each: if each variable occurs one time in ten, the probability of both variables occurring at the same time is 1/10 x 1/10, or 1/100 = one in one hundred. Probabilities decrease rapidly with the number of variables. With only six independent variables that have individual probabilities of one in ten, the probability of all occurring at once is one in a million. The probability decreases even more rapidly for variables that occur less often than one in ten times; for variables that occur once in a hundred times, the probability of one in a million requires only three variables.
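The arithmetic in the preceding paragraph can be checked with a short sketch. The function name `joint_probability` is hypothetical and chosen only for illustration, and the product rule it applies holds only if the variables really are independent.

```python
from fractions import Fraction
from math import prod


def joint_probability(probabilities):
    """Multiply individual probabilities (valid only for independent variables)."""
    return prod(probabilities, start=Fraction(1))


# Six independent variables, each occurring one time in ten.
six_at_one_in_ten = joint_probability([Fraction(1, 10)] * 6)
# Three independent variables, each occurring one time in a hundred.
three_at_one_in_hundred = joint_probability([Fraction(1, 100)] * 3)

print(six_at_one_in_ten)        # 1/1000000
print(three_at_one_in_hundred)  # 1/1000000
```

Exact fractions are used rather than floating-point numbers so that the results read directly as "one in N."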
A problem arises when dependent variables are treated like independent ones. In a California case from 1964, (4) The People v. Collins, an older woman returning from the grocery was accosted from behind and did not see her attacker, who took her purse. She did see a young woman running from the scene and described her as weighing about 145 pounds, wearing "something dark," and having blonde hair that was lighter than Janet Collins's hair was at the time of trial. A man who had been nearby reported seeing a young white woman with a blonde ponytail running from the direction of the robbery, but did not see the offense occur. He described the woman as slightly over 5 feet tall, of ordinary build, wearing a dark blonde ponytail and dark clothing. He also reported that the young woman got into a yellow or partly yellow car driven by a black man who had a beard and moustache.
The defendants, an interracial couple, were arrested and charged because they "sort of" matched the physical descriptions, owned a car that was at least partly yellow, were newly married, jobless, and broke. They denied involvement and provided an alibi. At trial, the state presented a mathematics instructor from a state college as its expert witness on the probability that the defendants were guilty. The witness refused to assign probabilities to the various factors chosen by the prosecutor, so the prosecutor proposed "probabilities" of his own: one in three young women were blonde, one in ten wore a ponytail, one in ten cars was at least partly yellow, one in ten black men had a beard, one in four men had a moustache, and one in a thousand couples was interracial. (5) The prosecutor then multiplied his own "probabilities" together and calculated that the profile would match one in 12 million couples. (6) The Collinses were convicted of the robbery based on the claimed probability that no other couple in California matched the reported description of the perpetrators.
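The prosecutor's multiplication can be reproduced as follows. This is only a sketch of the arithmetic: the dictionary labels are paraphrases, and the calculation treats the six factors as independent, which is the very assumption at issue.

```python
from fractions import Fraction

# The prosecutor's proposed figures, as described above.
prosecutor_figures = {
    "young woman is blonde": Fraction(1, 3),
    "young woman wears a ponytail": Fraction(1, 10),
    "car is at least partly yellow": Fraction(1, 10),
    "black man has a beard": Fraction(1, 10),
    "man has a moustache": Fraction(1, 4),
    "couple is interracial": Fraction(1, 1000),
}

# Multiply all six figures together, as if each were independent of the rest.
claimed_frequency = Fraction(1)
for figure in prosecutor_figures.values():
    claimed_frequency *= figure

print(claimed_frequency)  # 1/12000000
```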
In reversing the conviction in 1968, the California Supreme Court noted that the evidence was presented "[w]ithout presenting any statistical evidence whatsoever in support of the probabilities for the factors selected." Collins, 438 P.2d at 36 n.9. The Court also noted that all of the selected factors were treated as independent and factually true and that there was no adjustment for dependent variables or the possibility of mistake. Id. at 39. Most men with beards also have moustaches, so a correction for the overlap was necessary. Id. at 39 n.15. The Supreme Court also noted that the witness had failed to consider other plausible possibilities, for example, that the young woman was a light-skinned African-American with bleached hair. (7)
Some of the bad guesses increased probability, others decreased it, but the expressed probability itself was not reliable. "Mathematics, a veritable sorcerer in our computerized society, while assisting the trier of fact in the search for truth, must not cast a spell over him." Id. at 33. The Collins Court also noted that, even if one accepted the prosecutor's guesses, appropriate calculations indicated that there was a substantial probability that more than one other couple matched the selected factors. Id. at 43.
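One way to see why more than one matching couple is plausible, even on the prosecutor's own figure, is sketched below. The number of couples who could have committed the offense (`population_size`) is an assumption chosen for illustration and is not drawn from the record; the sketch also assumes that each couple matches independently with the same probability.

```python
import math

# Assumption for illustration only: the number of couples who could have
# committed the offense. This figure is not taken from the record.
population_size = 12_000_000
# The prosecutor's claimed frequency of the profile.
match_probability = 1 / 12_000_000

# Binomial model: each couple matches independently with the same probability.
log_no_match = math.log1p(-match_probability)
p_zero = math.exp(population_size * log_no_match)
p_one = (population_size * match_probability
         * math.exp((population_size - 1) * log_no_match))

# Given that at least one couple matches, how likely is a second match?
p_at_least_one = 1 - p_zero
p_at_least_two = 1 - p_zero - p_one
p_duplicate_given_match = p_at_least_two / p_at_least_one

print(round(p_duplicate_given_match, 3))  # 0.418
```

Under these assumptions, there is roughly a four in ten chance that at least one other couple shares the profile.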
Multiplying the probabilities of all variables together, without regard to dependence, leads
to a probability that is too small, often greatly too small. For example, variable A has a probability
of occurring one time in one thousand, and always occurs with variable B. B always occurs with A.
A and B are dependent variables, and the probability that A and B will occur together is still one in
one thousand, because they never occur separately. If the probabilities of a random match to A and
B are improperly multiplied together, the probability of both A and B occurring together is 1/1000
x 1/1000 = 1/1,000,000, or one in a million, and is one thousand times too small. The numbers soon
get out of hand. One expert testified that a given profile occurred one time in 2.578 sextillion (2,578
followed by 18 zeroes), (8) a number larger than the number of known stars in the universe (estimated
at one sextillion). (9)
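The A-and-B example above can be worked through in a few lines. This is a sketch only; the variable names are hypothetical, and the conditional probability of 1 encodes the stated premise that B always occurs with A.

```python
from fractions import Fraction

p_a = Fraction(1, 1000)      # A occurs one time in one thousand
p_b = Fraction(1, 1000)      # B occurs exactly when A does
p_b_given_a = Fraction(1)    # premise: B always occurs with A

# Correct rule for dependent variables: P(A and B) = P(A) * P(B given A).
true_joint = p_a * p_b_given_a
# Improper shortcut: multiply as if A and B were independent.
naive_joint = p_a * p_b

print(true_joint)                # 1/1000
print(naive_joint)               # 1/1000000
print(true_joint / naive_joint)  # 1000
```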
The population of Earth is about 6.5 billion, so anything in the sextillion range
is more than one hundred billion times larger than the population of Earth. It is no wonder that, faced with
numbers too large to conceive, some juries simply dismiss DNA evidence as not helpful, not
persuasive, or not credible. The other side of the coin is a jury that accepts any claim about
probabilities "because it's DNA." They have all seen "CSI: Crime Scene Investigation" or "NCIS"
and "know" that DNA is infallible.
The reality of the human genome is that some genes are recessive and are therefore dependent
on other genes for expression. For example, blue eyes occur only if both parents pass on the gene
for blue eyes. If one parent passes a gene for brown eyes, the probability is high that the child's eyes
will not be blue. Many genes are intertwined to some degree; blue eyes often accompany blonde
hair, but some Irish have strikingly blue eyes and black or red hair. It is very difficult to determine
the probability of a given characteristic because we do not have a map of how each gene affects
every other gene. It may be that one in ten people have blue eyes and one in twenty is (really)
blonde, but because we know that the probability of blue eyes increases if the person is blonde,
simply multiplying 1/10 x 1/20 will not tell us the true probability of having both blue eyes and
blonde hair; the calculated value will be too low. If you are Japanese, there is close to a one hundred
percent probability that you will have dark hair and brown eyes. When the probability of a person
of Japanese ancestry having both dark hair and brown eyes is calculated, we must take that into
account. All of these characteristics are controlled by DNA, and the same rules that apply to any
probability calculation also apply to calculating the probabilities of a DNA match; if areas A and B
of each DNA sample match, but A and B always occur together, A and B must be treated as one area
of matching, not two.
In this case, the claim at trial was that only 1 in 2,083 persons of Hispanic descent would
match appellant's DNA profile. How was that number calculated? We do not know. An even more
basic question is: what makes one "Hispanic"? Appellant's surname is Wilson, a name not
ordinarily thought to be Hispanic. May we assume that appellant's father was not Hispanic? Part
Hispanic? Part African-American? Part Western European? Eastern European? Asian? Were the
differing probabilities of a non-Hispanic gene pool taken into account in calculating probabilities?
How are probabilities for racial groups calculated in general? How do we calculate reasonably
accurate probabilities for people like that famous self-described "Cablinasian," Tiger Woods? (10) We
do not know.
Secondly, a statement that the DNA profile of the defendant occurs in only one in one million
members of a given racial group means just that: if the reference group is one million individuals,
one person will match; if you have two million individuals in the reference group, two individuals
will match, and so on. Unfortunately, most people translate that statement, that only one person in
a million matches the profile, to mean that there is one chance in a million that the defendant is not
guilty. Statistically, however, a city with ten million members of the reference group will include
ten individuals who match the profile, and thus, there is only a one in ten chance that the defendant
is guilty.
In this case, the offense occurred in Dallas County. Assuming a county population of one
million and an Hispanic population of thirty percent, 300,000 Hispanics live in Dallas County. DNA
is very reliable as to gender, so if we assume equal occurrence of gender, there are 150,000 Hispanic
males in Dallas County. Dividing 150,000 by 2,083, we find that, statistically, 72 men in Dallas
County fit the profile. But do we really know that the perpetrator lived in Dallas County? Dallas
is part of the Metroplex, which has a population of more than 3 million, and we are a mobile society.
In the Metroplex, there are 216 statistical men who fit the profile. Could he be visiting from
Houston (216 men) or Chicago (360 men)? Assuming that Mexico City is close to 100% Hispanic,
5,281 men in that city alone match the profile. How many male residents of Mexico City (or
Guadalajara, Ciudad Juarez, Acapulco, etc.) were in Dallas County at the time of the offense? Even
if we restrict the possibilities to Dallas County, a stated probability that only one Hispanic in 2,083
matches this profile does not mean that there is one chance in 2,083 that he is not guilty; it means
that the probability is one in 72 that he is guilty.
Finally, trial attorneys need to understand how to validate (or repudiate) DNA evidence. They
must begin with the reported match. Prosecutors may leap from a lab report saying that the samples
match to an immediate conclusion that the defendant is guilty, thus the origin of the term
"prosecutor's fallacy." (11) But is it really a match? How many areas of the DNA strands coincide? (12)
How big is the specified error range? Just as for fingerprints, the more areas that match, the more
likely that this is truly a match. (13) If there appears to be a match, advocates then need to discover how
often the laboratory that did the DNA testing produces a false positive. (14) Part of the problem for the
State of California in the O. J. Simpson trial was the revelation that the state's testing laboratory had
a false positive rate of 1 in 200, that is, one match in 200 was not, in fact, a match, thus opening the
door for the defense to argue that the sample really did not match Simpson's DNA. (15)
Once it is established by the state that the two samples do, in fact, match within an
appropriate margin of error, the next question is whether the defendant is the source. The random
probabilities that are routinely used are valid only for unrelated persons. The closer the relative, the
greater the number of areas on the DNA strand that will match. Identical twins have identical DNA.
Parents will share DNA similarities with their children, and siblings will have many commonalities.
Double first cousins will also have many commonalities; first cousins will have fewer
commonalities, yet still a significant number. The only living male in a given family has a high
probability of being the source, but a family with only sons over several generations will present a
greater challenge. If the state shows that the defendant is the source, there is one more hurdle: can
the defendant be placed at the crime scene in the appropriate time frame?
DNA is durable; it does not evaporate or dissipate, and the time at which it was deposited
on a surface cannot be directly determined. If the DNA sample was retrieved from a place where
the defendant lives, works, or visits frequently, it is probably not probative, as one would expect to
find the defendant's DNA in those places. Sex crimes aside, if the sample is from a place where the
defendant should not have been, the DNA, by itself, can confirm only that he was there at some time
and cannot, by itself, prove conclusively that he was there at the time of the crime. By the same
token, DNA cannot prove that the defendant was not there at the time of the crime.
DNA analysis is a powerful tool in determining guilt or innocence, and usually there is other
evidence that links the defendant to the offense, but we must remember that DNA analysis is
performed by humans and is not foolproof, nor are the conclusions drawn from the analysis always
correct. Only if all the prerequisites for reliability (true match, correct source, and presence at the
crime scene in the applicable time frame) are satisfied can society have confidence that the DNA
evidence is, in and of itself, strong enough to support a conviction.
I concur in the judgment of the Court.
En banc
Filed: March 1, 2006
Publish