Abruquah v. State (2023)


Kobina Ebo Abruquah v. State of Maryland, No. 10, September Term, 2022. Opinion by
    Fader, C.J.
    EVIDENCE – EXPERT EVIDENCE
    Firearms identification examiner testifying as an expert witness should not have been
    permitted to offer an unqualified opinion that crime scene bullets and a bullet fragment
    were fired from the petitioner’s gun. The reports, studies, and testimony presented to the
    circuit court demonstrate that the firearms identification methodology employed by the
    examiner in this case can support reliable conclusions that patterns and markings on bullets
    are consistent or inconsistent with those on bullets fired from a particular known firearm.
    Those reports, studies, and testimony do not, however, demonstrate that the methodology
    used can reliably support an unqualified conclusion that such bullets were fired from a
    particular firearm.
    Circuit Court for Prince George’s County
    Case No. CT121375X
    Argued: October 4, 2022
    IN THE SUPREME COURT
    OF MARYLAND*
    No. 10
    September Term, 2022
    ______________________________________
    KOBINA EBO ABRUQUAH
    v.
    STATE OF MARYLAND
    ______________________________________
    Fader, C.J.,
    Watts,
    Hotten,
    Booth,
    Biran,
    Gould,
    Eaves,
    JJ.
    ______________________________________
Opinion by Fader, C.J.
Hotten, Gould, and Eaves, JJ., dissent.
______________________________________
Filed: June 20, 2023

Pursuant to the Maryland Uniform Electronic Legal Materials Act (§§ 10-1601 et seq. of
the State Government Article) this document is authentic.
2023-06-20 16:27-04:00
Gregory Hilton, Clerk
    * At the November 8, 2022 general election, the voters of Maryland ratified a constitutional
    amendment changing the name of the Court of Appeals of Maryland to the Supreme Court
    of Maryland. The name change took effect on December 14, 2022.
    Firearms identification, a subset of toolmark identification, is “the practice of
    investigating whether a bullet, cartridge case or other ammunition component or fragment
can be traced to a particular suspect weapon.” Fleming v. State, 194 Md. App. 76, 100-01
    (2010). The basic idea is that (1) features unique to the interior of any particular firearm
    leave unique, microscopic patterns and marks on bullets and cartridge cases that are fired
    from that firearm, and so (2) by comparing patterns and marks left on bullets and cartridge
    cases found at a crime scene (“unknown samples”) to marks left on bullets and cartridge
    cases fired from a known firearm (“known samples”), firearms examiners can determine
    whether the unknown samples were or were not fired from the known firearm.
    At the trial of the petitioner, Kobina Ebo Abruquah, the Circuit Court for Prince
    George’s County permitted a firearms examiner to testify, without qualification, that
    bullets left at a murder scene were fired from a gun that Mr. Abruquah had acknowledged
    was his. Based on reports, studies, and testimony calling into question the reliability of
    firearms identification analysis, Mr. Abruquah contends that the circuit court abused its
    discretion in permitting the firearms examiner’s testimony. The State, relying on different
    studies and testimony, contends that the examiner’s opinion was properly admitted.
Applying the analysis required by Rochkind v. Stevenson, 471 Md. 1 (2020), we
    conclude that the examiner should not have been permitted to offer an unqualified opinion
    that the crime scene bullets were fired from Mr. Abruquah’s gun. The reports, studies, and
    testimony presented to the circuit court demonstrate that the firearms identification
    methodology employed in this case can support reliable conclusions that patterns and
    markings on bullets are consistent or inconsistent with those on bullets fired from a
    particular firearm. Those reports, studies, and testimony do not, however, demonstrate that
    that methodology can reliably support an unqualified conclusion that such bullets were
    fired from a particular firearm.
    The State also contends that any error in the circuit court’s admission of the
    examiner’s testimony was harmless. Because we are not convinced “beyond a reasonable
doubt, that the error in no way influenced the verdict,” Dionas v. State, 436 Md. 97, 108
(2013) (quoting Dorsey v. State, 276 Md. 638, 659 (1976)), we must reverse and remand
    for a new trial.
    BACKGROUND
    Factual Background
    On August 3, 2012, police responded to three separate calls complaining of
    disturbances at the house that Mr. Abruquah shared with his roommate, Ivan Aguirre-
    Herrera. On the third of these occasions, just before midnight, two officers arrived at the
    house. According to the officers, Mr. Abruquah appeared “agitated,” “very aggressive,”
    and uncooperative. One of the officers testified that Mr. Aguirre-Herrera appeared to be
    terrified of Mr. Abruquah. Before leaving around 12:15 a.m., the officers told the men to
    stay away from each other.
    A neighbor of Messrs. Abruquah and Aguirre-Herrera testified that he heard
    multiple gunshots sometime between 11:30 p.m. on August 3 and 12:30 a.m. on August 4.
    Four days later, officers discovered Mr. Aguirre-Herrera’s body decomposing in his
    bedroom. An autopsy revealed that he had been shot five times, including once in the back
    of the head. The police recovered four bullets and two bullet fragments from the crime
    scene.
    During questioning, Mr. Abruquah told the police that he owned two firearms, both
    hidden in the ceiling of the basement of the residence he shared with Mr. Aguirre-Herrera.
    The police recovered both firearms, a Glock pistol and a Taurus .38 Special revolver.
    A jailhouse informant testified that Mr. Abruquah had said that he had engaged in
    “a heated argument” with Mr. Aguirre-Herrera, “snapped,” and shot him with “a 38” that
    he kept in the ceiling of his basement.1
    Procedural Background
    Mr. Abruquah was convicted by a jury of first-degree murder and related handgun
offenses in December 2013. Abruquah v. State, No. 246, Sept. Term 2014, 2016 WL
7496174, at *1 & n.1 (Md. App. Dec. 20, 2016). In an unreported opinion, the Appellate
    Court of Maryland (then named the Court of Special Appeals)2 reversed the judgment and
    remanded the case for a new trial on grounds that are not relevant to the current appeal. Id.
    at *9.
    On remand, Mr. Abruquah filed a motion in limine to exclude firearms identification
    evidence the State intended to offer through its expert witness, Scott McVeigh, a senior
    firearms examiner with the Firearms Examination Unit of the Prince George’s County
Police Department, Forensic Science Division. The circuit court held a four-day Frye-Reed
hearing3 during which both parties introduced evidence and elicited testimony that we
summarize below.

1 The jailhouse informant testified at Mr. Abruquah’s first trial in 2013. At his second
trial, in 2018, the State read into the record a transcript of that prior testimony.

2 At the November 8, 2022 general election, the voters of Maryland ratified a
constitutional amendment changing the name of the Court of Special Appeals of Maryland
to the Appellate Court of Maryland. The name change took effect on December 14, 2022.
    Following the hearing, the circuit court largely denied, but partially granted, the
    motion. The court concluded that “firearm and toolmark identification is still generally
    accepted and sufficiently reliable under the Frye-Reed standard” and therefore should not
    be “excluded in its entirety.” Nonetheless, the court agreed with Mr. Abruquah that the
    subjective nature of the matching analysis made it inappropriate for an expert to “testify to
    any level of practical certainty/impossibility, ballistic certainty, or scientific certainty that
    a suspect weapon matches certain bullet or casing striations.” The court thus restricted the
    expert to opining whether the bullets and bullet fragment “recovered from the murder scene
    fall into any of” a particular set of five classifications, one of which is “[i]dentification” of
    the unknown bullet as a match to a known bullet.
    At trial, Mr. McVeigh testified about the process by which he eliminated the Glock
    pistol as a source of the unknown crime scene samples, created known samples from the
    Taurus revolver, and compared the microscopic patterns and markings on the two sets of
    samples. Over defense objection, Mr. McVeigh opined that four bullets and one bullet
fragment recovered from the crime scene “at some point had been fired from [the Taurus
revolver].”4

3 Prior to our decision in Rochkind v. Stevenson, 471 Md. 1 (2020), courts in
Maryland determined the admissibility of expert testimony using the Frye-Reed
evidentiary standard, which “turned on the ‘general acceptance’ of such evidence ‘in the
particular field in which it belongs.’” Rochkind, 471 Md. at 4 (discussing Frye v. United
States, 293 F. 1013 (D.C. Cir. 1923) and Reed v. State, 283 Md. 374 (1978)).
    Mr. Abruquah was again convicted of first-degree murder and use of a handgun in
    the commission of a crime. His first appeal from that conviction resulted in a remand to
    the circuit court to consider whether it “would reach a different conclusion concerning the
    admission of firearm and toolmark identification testimony” applying our then-new
decision in Rochkind v. Stevenson, 471 Md. 1, 27 (2020). In that decision, which was
    issued after Mr. Abruquah’s second conviction while his appeal was pending, we
    abandoned the Frye-Reed standard for admissibility of expert testimony in favor of the
standard set forth in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993),
and its progeny. Abruquah v. State, 471 Md. 249, 250 (2020).
    On remand, the circuit court held a hearing in which it once again received evidence
    from both sides, which is discussed further below. The court ultimately issued an opinion
    in which it reviewed each of the ten factors this Court set forth in Rochkind and concluded
    that the testimony remained admissible. The court noted that although Mr. Abruquah
    “ha[d] made a Herculean effort to demonstrate why the evidence should be heavily
    scrutinized, questioned and potentially impeached, the State has met the burden for
    admissibility of this evidence.”   The court therefore sustained Mr. Abruquah’s prior
    conviction.
4 Four bullets and two bullet fragments were recovered from the crime scene but
Mr. McVeigh found that one of the fragments was not suitable for comparison. As a result,
his testimony was limited to the bullets and one of the fragments.
    Mr. Abruquah filed another timely appeal to the intermediate appellate court and,
    while that appeal was pending, he filed a petition for writ of certiorari in this Court. We
    granted that petition to address whether the firearms identification methodology employed
    by Mr. McVeigh is sufficiently reliable to allow a firearms examiner, without any
    qualification, to identify a specific firearm as the source of a questioned bullet or cartridge
case found at a crime scene. See Abruquah v. State, 479 Md. 63 (2022).
    DISCUSSION
    We review a circuit court’s decision to admit expert testimony for an abuse of
    discretion. Rochkind, 471 Md. at 10. Under that standard, we will “not reverse simply
because . . . [we] would not have made the same ruling.” State v. Matthews, 479 Md. 278,
305 (2022) (quoting Devincentz v. State, 460 Md. 518, 550 (2018)). In connection with
    the admission of expert testimony, where circuit courts are to act as gatekeepers in applying
    the factors set out by this Court in Rochkind, a circuit court abuses its discretion by, for
    example, admitting expert evidence where there is an analytical gap between the type of
    evidence the methodology can reliably support and the evidence offered.5 See Rochkind,
    471 Md. at 26-27.
5 This Court has frequently described an abuse of discretion as occurring when “no
reasonable person would take the view adopted by the circuit court” or when a decision is
“well removed from any center mark imagined by the reviewing court and beyond the
fringe of what the court deems minimally acceptable.” Matthews, 479 Md. at 305 (first
quoting Williams v. State, 457 Md. 551, 563 (2018), and next quoting Devincentz v. State,
460 Md. 518, 550 (2018)). In our view, the application of those descriptions to a trial
court’s application of a newly adopted standard, such as that adopted by this Court in
Rochkind as applicable to the admissibility of expert testimony, is somewhat unfair. In this
case, in the absence of additional caselaw from this Court implementing the newly adopted
standard, the circuit court acted deliberately and thoughtfully in approaching, analyzing,
and resolving the question before it. This Court’s majority has come to a different
conclusion concerning the outer bounds of what is acceptable expert evidence in this area.
    Part I of our discussion sets forth the standard for the admissibility of expert
testimony in Maryland following this Court’s decision in Rochkind v. Stevenson, 471 Md.
1 (2020). In Part II, we discuss general background on the firearms identification
    methodology employed by the State’s expert witness, criticisms of that methodology,
    studies of the methodology, the testimony presented to the circuit court, and caselaw from
    other jurisdictions. In Part III, we apply the factors set forth in Rochkind to the evidence
    before the circuit court.
    I.       THE ADMISSIBILITY OF EXPERT TESTIMONY
    The admissibility of expert testimony is governed by Rule 5-702, which provides:
    Expert testimony may be admitted, in the form of an opinion or
    otherwise, if the court determines that the testimony will assist the trier of
    fact to understand the evidence or to determine a fact in issue. In making
    that determination, the court shall determine
    (1) whether the witness is qualified as an expert by knowledge, skill,
    experience, training, or education,
    (2) the appropriateness of the expert testimony on the particular subject,
    and
(3) whether a sufficient factual basis exists to support the expert testimony.
    Trial courts analyzing the admissibility of evidence under Rule 5-702 are to consider
    the following non-exhaustive list of “factors in determining whether the proffered expert
    testimony is sufficiently reliable to be provided to the trier of facts,” Matthews, 479 Md. at
    310:
    (1) whether a theory or technique can be (and has been) tested;
    (2) whether a theory or technique has been subjected to peer review and
    publication;
    (3) whether a particular scientific technique has a known or potential rate of
    error;
    (4) the existence and maintenance of standards and controls; . . .
    (5) whether a theory or technique is generally accepted[;]
    ...
    (6) whether experts are proposing to testify about matters growing naturally
    and directly out of research they have conducted independent of the
    litigation, or whether they have developed their opinions expressly for
    purposes of testifying;
    (7) whether the expert has unjustifiably extrapolated from an accepted
    premise to an unfounded conclusion;
    (8) whether the expert has adequately accounted for obvious alternative
    explanations;
    (9) whether the expert is being as careful as [the expert] would be in [the
    expert’s] regular professional work outside [the expert’s] paid litigation
    consulting; and
    (10) whether the field of expertise claimed by the expert is known to reach
    reliable results for the type of opinion the expert would give.
Rochkind, 471 Md. at 35-36 (first quoting Daubert, 509 U.S. at 593-94 (for factors 1-5)
and next quoting Fed. R. Evid. 702 Advisory Committee Note (cleaned up) (for factors 6-10)).
    In applying these “Daubert-Rochkind factors,” we have observed that the guidance
    provided by the United States Supreme Court in Daubert and its progeny, especially
General Electric Co. v. Joiner, 522 U.S. 136 (1997), and Kumho Tire Co. v. Carmichael,
526 U.S. 137 (1999), “is critical to a trial court’s reliability analysis.” Rochkind, 471 Md.
    at 36. In Matthews, we summarized that guidance in five principles:
    • “[T]he reliability inquiry is ‘a flexible one.’” Matthews, 479 Md. at 311
    (quoting Rochkind, 471 Md. at 36).
    • “[T]he trial court must focus solely on principles and methodology, not on
    the conclusions that they generate. However, conclusions and methodology
    are not entirely distinct from one another. Thus, [a] trial court . . . must
    consider the relationship between the methodology applied and conclusion
    reached.” Id. (internal citations and quotation marks omitted).
    • “[A] trial court need not admit opinion evidence that is connected to existing
    data only by the ipse dixit of the expert; rather, [a] court may conclude that
    there is simply too great an analytical gap between the data and the opinion
    proffered.” Id. (internal quotation marks omitted).
    • “[A]ll of the Daubert factors are relevant to determining the reliability of
    expert testimony, yet no single factor is dispositive in the analysis. A trial
    court may apply some, all, or none of the factors depending on the particular
    expert testimony at issue.” Id. at 37.
    • “Rochkind did not upend [the] trial court’s gatekeeping function. Vigorous
    cross-examination, presentation of contrary evidence, and careful instruction
    on the burden of proof are the traditional and appropriate means of attacking
    shaky but admissible evidence.” Id. at 38 (internal quotation marks omitted).
    The overarching criterion for the admission of relevant expert testimony under
    Rochkind, and the goal to which each of the ten Daubert-Rochkind factors and the five
    principles summarized in Matthews are all addressed, is reliability. The question for a trial
    court is not whether proposed expert testimony is right or wrong, but whether it meets a
    minimum threshold of reliability so that it may be presented to a jury, where it may then
    be questioned, tested, and attacked through means such as cross-examination or the
    submission of opposing expert testimony.
    Because we evaluate a trial court’s decision to admit or exclude expert testimony
    under an abuse of discretion standard, our review is necessarily limited to the information
    that was before the trial court at the time it made the decision. A trial court can hardly
    abuse its discretion in failing to consider evidence that was not before it.6
    II.     FIREARMS IDENTIFICATION EVIDENCE
    Through multiple submissions by the parties and two evidentiary hearings over the
    course of five days, the circuit court ultimately received the testimony of five witnesses
    (one twice); 18 reports or articles discussing firearms identification, describing studies
    testing firearms identification, or criticizing the theory or the results of the studies testing
    it; and a chart identifying dozens of additional or planned studies or reports. In section A
    of this Part II, we discuss firearms identification evidence generally. In sections B and C,
    we review criticisms and studies of the methodology, respectively. In section D, we
    summarize the testimony presented to the circuit court. Finally, in section E, we discuss
    how some other courts have resolved challenges to the admissibility of firearms
    identification evidence.
6 On appeal, the State cited articles presenting the results of studies that were not
    presented to the circuit court and, in some cases, that were not even in existence at the time
    the circuit court ruled. See, e.g., Maddisen Neuman et al., Blind Testing in Firearms:
    Preliminary Results from a Blind Quality Control Program, 67 J. Forensic Scis. 964
    (2022); Eric F. Law & Keith B. Morris, Evaluating Firearm Examiner Conclusion
    Variability Using Cartridge Case Reproductions, 66:5 J. Forensic Scis. 1704 (2021). We
    have not considered those studies in reaching our decision. If any of those studies
    materially alters the analysis applicable to the reliability of the Association of Firearm and
    Tool Mark Examiners theory of firearms identification, they will need to be presented in
    another case.
    A.     Firearms Identification
    1.     The Theory Underlying Firearms Identification Generally
    Firearms identification is a subset of toolmark identification. A toolmark—literally,
    a mark left by a particular tool—is “generated when a hard object (tool) comes into contact
    with a relatively softer object,” such as the marks that result “when the internal parts of a
    firearm make contact with the brass and lead that comprise ammunition.” United States v.
Willock, 696 F. Supp. 2d 536, 555 (D. Md. 2010) (quoting Nat’l Rsch. Council, Nat’l Acad.
of Scis., Strengthening Forensic Science in the United States: A Path Forward 150 (2009)),
aff’d sub nom. United States v. Mouzone, 687 F.3d 207 (4th Cir. 2012). The marks are
    then viewable using a “comparison microscope,” which a firearms examiner uses “to
    compare ammunition test-fired from a recovered gun with spent ammunition from a crime
scene[.]” United States v. Monteiro, 407 F. Supp. 2d 351, 359 (D. Mass. 2006).
    As a forensic technique to identify a particular firearm as the source of a particular
    ammunition component, firearms identification is based on the premise that no two
firearms will make identical marks on a bullet or cartridge case. United States v. Natson,
469 F. Supp. 2d 1253, 1260 (M.D. Ga. 2007). That, the theory goes, is because the method
    of manufacturing firearms results in the interior of each firearm being unique and,
therefore, making unique imprints on ammunition components fired from it. Id.
    As the United States District Court for the District of Massachusetts explained:
    When a firearm is manufactured, the “process of cutting, drilling, grinding,
    hand-filing, and, very occasionally, hand-polishing . . . will leave individual
    characteristics” on the components of the firearm. See Brian J.
    Heard, Handbook of Firearms and Ballistics 127 (1997). Although modern
    manufacturing methods have reduced the amount of handiwork performed
    on an individual gun, the final step in production of most firearm parts
    requires some degree of hand-filing which imparts individual characteristics
    to the firearm part. See id. at 128. This process results in “randomly
    produced patterns of individual stria,” or thin grooves or markings, being left
    on firearm parts. Id. These parts are assembled to compose the final firearm.
    When a round (a single “shot”) of ammunition is fired from a particular
    firearm, the various components of the ammunition come into contact with
    the firearm at very high pressures. As a result, the individual markings on
    the firearm parts are transferred to the ammunition. Id. The ammunition is
    composed primarily of the bullet and the cartridge case. The bullet is the
    missile-like component of the ammunition that is actually projected from the
    firearm, through the barrel, toward the target. . . . The cartridge case is the
    part of the ammunition situated behind the bullet containing the primer and
    propellant, the explosive mixture of chemicals that causes the bullet to be
    projected through the barrel. Id. at 42.
Monteiro, 407 F. Supp. 2d at 359-60.
    The patterns and marks left on bullets and cartridge cases are classified into three
    categories. First, “class characteristics” are common to all bullets and cartridge cases fired
from “weapons of the make and model that fired the ammunition.” Willock, 696 F. Supp.
2d at 557-58. “Examples of class characteristics include the bullet’s weight and caliber;
    number and width of the lands and grooves in the gun’s barrel; and the ‘twist’ (direction
of turn, i.e., clockwise or counterclockwise, of the rifling in the barrel).”7 Id. at 558.
    Second, “subclass characteristics” are common to “a group of guns within a certain
    make or model, such as those manufactured at a particular time and place.” Monteiro, 407
F. Supp. 2d at 360. “An example would include imperfections ‘on a rifling tool that imparts
similar toolmarks on a number of barrels before being modified either through use or
refinishing.’” Willock, 696 F. Supp. 2d at 558 (quoting Ronald G. Nichols, Defending the
Scientific Foundations of the Firearms and Tool Mark Identification Discipline:
Responding to Recent Challenges, 52 J. Forensic Scis. 586, 587 (2007)).

7 “Rifling” refers to “a pattern of channels that run the length of a firearm barrel,
manufactured with a helical pattern, or twist,” which has raised areas called “lands,” and
lowered areas called “grooves.” Ass’n of Firearms & Tool Mark Exam’rs, What Is Firearm
and Tool Mark Identification?, available at https://afte.org/about-us/what-is-afte/what-is-
firearm-and-tool-mark-identification (last accessed June 14, 2023), archived at
https://perma.cc/UYA4-99CS. “The number and width of lands and grooves is determined
by the manufacturer and will be the same for a large group of firearms.” Id.
    Third, “individual characteristics” are those unique to an individual firearm that
therefore “distinguish [the firearm] from all others.” Willock, 696 F. Supp. 2d at 558
(quoting Monteiro, 407 F. Supp. 2d at 360). Individual characteristics include “[r]andom
imperfections produced during manufacture or caused by accidental damage.” Id.
Notably, not all individual characteristics are unique, Willock, 696 F. Supp. 2d at 558, and
    individual characteristics can change over the life of a firearm as a result of, for example,
    wear, polishing, or damage. As will be discussed further below, one dispute between
    proponents of firearms identification and its detractors is the degree to which firearms
    examiners can reliably identify the difference between subclass and individual
    characteristics when performing casework.
    2.     The Association of Firearm and Tool Mark Examiners
    Methodology
    The leading methodology used by firearms examiners, and the methodology
    employed in this case by Mr. McVeigh, is the Association of Firearm and Tool Mark
    Examiners (“AFTE”) “Theory of Identification” (the “AFTE Theory”).8 See Committee
    for the Advancement of the Science of Firearm & Toolmark Identification, Theory of
    Identification as it Relates to Toolmarks: Revised, 43 AFTE J. 287 (2011). Examiners
    employing the AFTE Theory follow a two-step process. At step one, the examiner
    evaluates class characteristics of the unknown and known samples. See AFTE, Summary
    of the Examination Method, available at https://afte.org/resources/swggun-ark/summary-
of-the-examination-method (last accessed June 14, 2023), archived at
https://perma.cc/4D8W-UDW9. If the class characteristics do not match—i.e., if the
samples have different numbers of lands and grooves or a different twist direction—the
firearm that produced the known sample is excluded as the source of the unknown sample.
Id. If the class characteristics match, the second step involves “a comparative examination
. . . utilizing a comparison microscope.” Id. At that step, the examiner engages in “pattern
matching” “to determine: 1) if any marks present are subclass characteristics and/or
individual characteristics, and 2) the level of correspondence of any individual
characteristics.”9 Id.
8 According to its website, the AFTE “is the international professional organization
for practitioners of Firearm and/or Toolmark Identification and has been dedicated to the
exchange of information, methods and best practices, and the furtherance of research since
its creation in 1969.” AFTE, What is AFTE?, available at https://afte.org/about-us/what-
is-afte (last accessed June 14, 2023), archived at https://perma.cc/4VKT-EZW7.
According to AFTE’s bylaws, individuals are eligible to become members if they are,
among other things, “a practicing firearm and/or toolmark examiner,” which is defined to
mean a person who “derives a substantial portion of their livelihood from the examination,
identification, and evaluation of firearms and related materials and/or toolmarks; or an
individual whose present livelihood is a direct result of the knowledge and experience
gained from the examination, identification, and evaluation of firearms and related
materials and/or toolmarks.” AFTE, AFTE Bylaws, Art. III, § 1, available at
https://afte.org/about-us/bylaws (last accessed June 14, 2023), archived at
https://perma.cc/Y2PF-XWUF.

9 An alternative to the AFTE method is the “consecutive matching striae method of
toolmark analysis” (“CMS”). Fleming, 194 Md. App. at 105. “The CMS method . . . calls
for the examiner to consider the number of consecutive matching striae, or ‘scratches’
appearing on a projectile fragment. The theory provides that a positive ‘match’
determination can be made only when a certain, statistically established number of striae
match.” Id. Proponents of the CMS method argue that it has a “greater degree of objective
certainty” than other methods. Id. The CMS method was not used in this case.
    Based on that “pattern matching,” the examiner makes a determination in
    accordance with the “AFTE Range of Conclusions,” which presents the following options:
    1.     “Identification” occurs when there is “[a]greement of a combination
    of individual characteristics and all discernible class characteristics where
    the extent of agreement exceeds that which can occur in the comparison of
    toolmarks made by different tools and is consistent with the agreement
    demonstrated by toolmarks known to have been produced by the same tool.”
    2.    There are three categories of “Inconclusive,” all of which require full
    agreement of “all discernible class characteristics”:
    (a) when there is “[s]ome agreement of individual characteristics . . . but
    insufficient for an identification”;
    (b) when there is neither “agreement [n]or disagreement of individual
    characteristics”; and
    (c) when there is “disagreement of individual characteristics, but
    insufficient for an elimination.”
    3.     “Elimination” occurs when there is “[s]ignificant disagreement of
    discernible class characteristics and/or individual characteristics.”
    AFTE, Range of Conclusions, available at https://afte.org/about-us/what-is-afte/afte-
    range-of-conclusions (last accessed June 14, 2023), archived at https://perma.cc/WKF5-
    M6HD.
    According to the AFTE, a positive “Identification” can be made when there is
    “sufficient agreement” between “two or more sets of surface contour patterns” on samples.
    AFTE, AFTE Theory of Identification as It Relates to Toolmarks, available at
    https://afte.org/about-us/what-is-afte/afte-theory-of-identification (last accessed June 14,
2023), archived at https://perma.cc/E397-U8KM. “[S]ufficient agreement,” in turn:
    (1) occurs when the level of agreement “exceeds the best agreement demonstrated between
    toolmarks known to have been produced by different tools and is consistent with agreement
    demonstrated by toolmarks known to have been produced by the same tool”; and (2) means
    that “the agreement of individual characteristics is of a quantity and quality that the
    likelihood another tool could have made the mark is so remote as to be considered a
    practical impossibility.” Id.
The AFTE acknowledges that “[c]urrently the interpretation of
    individualization/identification is subjective in nature[.]” Id. The AFTE Theory provides
    no objective criteria to determine what constitutes the “best agreement demonstrated”
    between toolmarks produced by different tools or what rises to the level of “quantity and
    quality” of agreement demonstrating a “practical impossibility” of a different tool having
    made the same mark. There are also no established standards for classifying a particular
    pattern or mark as a subclass versus an individual characteristic.
    B.     Critiques of Firearms Identification
    Firearms identification has existed as a field for more than a century.10 Throughout
    most of that time, it has been accepted by law enforcement organizations and courts without
significant challenge. However, the advent of Daubert, work exposing the unreliability of
other previously accepted forensic techniques,11 and recent reports questioning the
foundations underlying firearms identification have led to greater skepticism.

10 The first prominent use of firearms identification in the United States is attributed
to examinations made in the aftermath of the 1906 race-related incident in Brownsville,
Texas, known as the “Brownsville Affair.” There, Army personnel matched 39 out of 45
cartridge cases to two types of rifles “through the use of only magnified photographs of
firing pin impressions[.]” Kathryn E. Carso, Amending the Illinois Postconviction Statute
to Include Ballistics Testing, 56 DePaul L. Rev. 695, 700 n.43 (2007).
    Reports issued since 2008 by two blue-ribbon groups of experts outside of the
    firearms and toolmark identification field have been critical of the AFTE Theory. In 2008,
    the National Research Council of the National Academies of Science (the “NRC”)
    published a report concerning the feasibility of developing a national database of ballistic
    images to aid in criminal investigations. National Research Council, National Academy of
    Sciences, Committee to Assess the Feasibility, Accuracy, and Technical Capability of a
National Ballistics Database, Ballistic Imaging 1-2 (2008), available at
https://nap.nationalacademies.org/read/12162/chapter/1 (last accessed June 14, 2023),
archived at https://perma.cc/X6NG-BNVN. In the report, the committee identified
    challenges that complicate firearms identifications, and ultimately determined that the
    creation of a national ballistic image database was not advisable at the time. Id. at 4-5.
11 For example, comparative bullet lead analysis was initially widely accepted
within the scientific and legal community, and admitted successfully in criminal
prosecutions nationwide, yet its validity was subsequently undermined and such evidence
is now inadmissible. See Chesson v. Montgomery Mut. Ins. Co., 434 Md. 346, 358-59
(2013) (stating that, despite the expert’s “use of th[e] technique for thirty years,”
comparative bullet lead analysis evidence was inadmissible because its “general and
underlying assumption . . . was no longer generally accepted by the relevant scientific
community”); Clemons v. State, 392 Md. 339, 364-72 (2006) (comprehensively discussing
comparative bullet lead analysis and holding that it does not satisfy Frye-Reed); Sissoko v.
State, 236 Md. App. 676, 721-27 (2018) (discussing that the “methodology
underlying [comparative bullet lead analysis], which was developed in the 1960s and
became a widely accepted forensic tool by the 1980s[,] . . . [was] undermined by many in
the relevant scientific community” and was “no longer . . . ‘valid and reliable’” (quoting
Clemons v. State, 392 Md. 339, 359 (2006))).
    Then, in 2009, the NRC published a report in which it addressed “pressing issues”
    within several forensic science disciplines, including firearms identification. National
    Research Council, National Academy of Sciences, Strengthening Forensic Science in the
    United States: A Path Forward 2-5 (2009) (the “2009 NRC Report”), available at
    https://www.ojp.gov/pdffiles1/nij/grants/228091.pdf (last accessed June 14, 2023),
    archived at https://perma.cc/RLT6-49C3.12 The NRC observed that advances in DNA
    evidence had revealed flaws in other forensic science disciplines that “may have
    contributed to wrongful convictions of innocent people,” id. at 4, and pointed especially to
    the relative “dearth of peer-reviewed, published studies establishing the scientific bases
    and validity of many forensic methods,” id. at 8.
    With respect to firearms identification specifically, the NRC criticized the AFTE
    Theory as lacking specificity in its protocols; producing results that are not shown to be
    accurate, repeatable, and reproducible; lacking databases and imaging that could improve
    the method; having deficiencies in proficiency training; and requiring examiners to offer
    opinions based on their own experiences without articulated standards. Id. at 6, 63-64, 155.
    In particular, the lack of knowledge “about the variabilities among individual tools and
    guns” means that there is an inability of examiners “to specify how many points of
    similarity are necessary for a given level of confidence in the result.” Id. at 154. Indeed,
    the NRC noted, the AFTE’s guidance, which is the “best . . . available for the field of
toolmark identification, does not even consider, let alone address, questions regarding
variability, reliability, repeatability, or the number of correlations needed to achieve a given
degree of confidence.” Id. at 155. The NRC concluded that “[t]he validity of the
fundamental assumptions of uniqueness and reproducibility of firearms-related toolmarks
has not yet been fully demonstrated.” Id. at 70, 80-81, 154-55 (citation omitted).

12 The lead NRC “Committee” behind the report was the “Committee on Identifying
the Needs of the Forensic Science Community.” The committee was co-chaired by Judge
Harry T. Edwards of the United States Court of Appeals for the District of Columbia Circuit
and included members from a variety of distinguished academic and scientific programs.
    In 2016, the President’s Council of Advisors on Science and Technology
    (“PCAST”)13 issued a report identifying additional concerns about the scientific validity
    of, among other forensic techniques, firearms identification. See Executive Office of the
    President, President’s Council of Advisors on Science and Technology, REPORT TO THE
    PRESIDENT, Forensic Science in Criminal Courts: Ensuring Scientific Validity of
    Feature-Comparison       Methods     (2016)     (the   “PCAST      Report”),    available    at
    https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_fo
rensic_science_report_final.pdf (last accessed June 14, 2023), archived at
https://perma.cc/3QWJ-2DGR. With respect to all six forensic disciplines addressed in the
report, including firearms identification, PCAST focused on whether there had been a
demonstration of both “foundational validity” and “validity as applied.” Id. at 4-5.
Foundational validity, according to PCAST, requires that the method “be shown, based on
empirical studies, to be repeatable, reproducible, and accurate, at levels that have been
measured and are appropriate to the intended application.” Id. Validity as applied requires
“that the method has been reliably applied in practice.” Id. at 5.

13 The PCAST Report provides the following description of PCAST’s role:
The President’s Council of Advisors on Science and Technology
(PCAST) is an advisory group of the Nation’s leading scientists and
engineers, appointed by the President to augment the science and technology
advice available to him from inside the White House and from cabinet
departments and other Federal agencies. PCAST is consulted about, and
often makes policy recommendations concerning, the full range of issues
where understandings from the domains of science, technology, and
innovation bear potentially on the policy choices before the President.
PCAST Report at iv. Members of PCAST included scholars and senior executives at
institutions and firms including Harvard University; the University of Texas at Austin;
Honeywell; Princeton University; the University of Maryland; the University of Michigan;
the University of California, Berkeley; United Technologies Corporation; Washington
University of St. Louis; Alphabet, Inc.; Northwestern University; and the University of
California, San Diego. Id. at v-vi. PCAST also consulted with “Senior Advisors”
including eight federal appellate and trial court judges, as well as law school and university
professors. Id. at viii-ix.
    With respect to firearms identification specifically, PCAST described the AFTE
    Theory as a “circular” method that lacks “foundational validity” because appropriate
    studies had not confirmed its accuracy, repeatability, and reproducibility. Id. at 60, 104-05.
    PCAST concluded that the studies performed to that date, with one exception, were not
    properly designed, had severely underestimated the false positive and false negative error
    rates, or otherwise “differ[ed] in important ways from the problems faced in casework.”
    Id. at 106. Among other things, PCAST noted design flaws in existing studies, including:
    (1) many were not “black-box” studies,14 id. at 49; and (2) many were closed-set studies,
in which comparisons are dependent upon each other and there is always a “correct” answer
within the set, id. at 106.

14 “A black box study assesses the accuracy of examiners’ conclusions without
considering how the conclusions were reached. The examiner is treated as a ‘black-box’
and the researcher measures how the output of the ‘black-box’ (examiner’s conclusion)
varies depending on the input (the test specimens presented for analysis). To test examiner
accuracy, the ‘ground truth’ regarding the type or source of the test specimens must be
known with certainty.” Organization of Scientific Area Committees for Forensic Science,
OSAC Draft Guidance on Testing the Performance of Forensic Examiners (2018),
available at https://www.nist.gov/document/drafthfcguidancedocument-may8pdf (last
accessed June 14, 2023), archived at https://perma.cc/3LH5-KURT.
    The sole exception to PCAST’s negative critique of study designs was a study
    performed by the United States Department of Energy’s Ames Laboratory (the “Ames I
    Study”), which PCAST called “the first appropriately designed black-box study of firearms
    [identification].” Id. at 11. Nonetheless, PCAST observed that that study, which we
    discuss below, was not published in a scientific journal, had not been subjected to peer
    review, and stood alone. Id. PCAST therefore concluded that “firearms analysis currently
    falls short of the criteria for foundational validity” and called for additional testing. Id. at
    111-14.
    C.      Recent Studies of the AFTE Theory
    Numerous studies of the AFTE Theory have been performed over the course of
    several decades. The State contends that many of those studies are scientifically valid,
    reflect extremely low false positive error rates, and therefore support the reliability of the
    methodology. Mr. Abruquah argues that the studies on which the State relies are flawed
    and were properly discounted by the NRC and PCAST, that even the best studies present
    artificially low error rates by treating inconclusive findings as correct, and that the most
    recent and authoritative study reveals “shockingly” low rates of repeatability and
    reproducibility.
    The State is correct that numerous studies have purported to validate the AFTE
    Theory, including by identifying relatively low false positive error rates. One of the State’s
    expert witnesses, Dr. James E. Hamby, is the lead author on one such study, in which 697
    examiners inspected “over 240 test sets consisting of bullets fired through 10 consecutively
    rifled RUGER P-85 pistol barrels.” James E. Hamby et al., A Worldwide Study of Bullets
    Fired from 10 Consecutively Rifled 9MM Ruger Pistol Barrels—Analysis of Examiner
    Error Rate, 64:2 J. Forensic Scis. 551, 551 (Mar. 2019) (the “Hamby Study”). In that
    closed-set study, of 10,455 unknown bullets examined, 10,447 “were correctly identified
    by participants to the provided ‘known’ bullets,” examiners could not reach a definitive
    conclusion on eight bullets, and none were misidentified.15 Id. at 556. The error rate,
excluding inconclusive results, was thus 0.0%. See id.

15 Of the eight, the authors point out that three examiners “reported insufficient
individual characteristics for two of the test bullets and two trainees could not associate
five of the test bullets to their known counterpart bullets.” Hamby Study, at 556.
    Examples of other studies on which the State relies, all of which identify relatively
    low error rates based on the study method employed, include: (1) Jamie A. Smith, Beretta
    barrel fired bullet validation study, 66 J. Forensic Scis. 547 (2021) (comparison testing of
    30 consecutively manufactured pistol barrels, producing a 0.55% error rate); and (2) Tasha
    P. Smith et al., A Validation Study of Bullet and Cartridge Case Comparisons Using
    Samples Representative of Actual Casework, 61 J. Forensic Scis. 939 (2016) (within-set
    study of 31 examiners matching bullets and cartridge cases, yielding a 0.0% false-positive
    rate for bullet comparisons and a 0.14% false-positive error rate for cartridge cases).
    The NRC and PCAST both are critical of closed-set studies like the Hamby Study
    and others that provide examiners with multiple “unknown” bullets or cartridge cases and
    a corresponding number of “known” bullets or cartridge cases that the examiners are asked
    to match. The NRC and PCAST criticize such studies as not being representative of
    casework because, among other reasons: (1) examiners are aware they are being tested;
    (2) a correct match exists within the set for every sample, which the examiners also know;
    and (3) the use of consecutively manufactured firearms (or barrels) in a closed-set study
    has the effect of eliminating any confusion concerning whether particular patterns or marks
    constitute subclass or individual characteristics. PCAST Report, at 32-33, 52-59, 107-09;
    2009 NRC Report, at 154-55.
    The Ames I Study, which PCAST had identified as the only one that had been
    “appropriately designed” to that point, PCAST Report, at 111, was a 2014 open-set, black-
    box study designed to measure error rates in the comparison of “known” and “unknown”
    cartridge cases (the Ames I Study did not involve bullets). See David P. Baldwin et al., A
    Study of False-Positive and False-Negative Error Rate in Cartridge Case Comparisons,
    Defense Biometrics & Forensics Office, U.S. Dep’t of Energy (Apr. 2014). In the Ames I
    Study, 15 sets of four cartridge cases fired from 25 new, same-model handguns using the
    same type of ammunition were sent to 218 examiners. Ames I Study, at 3. Each set
    included one unknown sample and three known samples fired from the same known gun,
    which might or might not have been the source of the unknown sample. Id. at 4. Even
    though there was a known correct answer of either an identification or an elimination for
    every set, examiners were permitted to make “inconclusive” responses, which were “not
    counted as an error or as a non-answer[.]” Id. at 6. Of the 1,090 comparisons where the
    “known” and “unknown” cartridge cases were fired from the same source firearm, the
    examiners incorrectly excluded only four cartridge cases, yielding a false-negative rate of
    0.367%. Id. at 15. Of the 2,180 comparisons where the “known” and “unknown” cartridge
    cases were fired from different firearms, the examiners incorrectly matched 22 cartridge
    cases, yielding a false-positive rate of 1.01%.16 Id. at 16. However, of the non-matching
    comparison sets, 735, or 33.7%, were classified as inconclusive, id., a significantly higher
    percentage than in any closed-set study.
    The Ames Laboratory later conducted a second open-set, black-box study that was
    completed in 2020, in between the Frye-Reed and Daubert-Rochkind hearings in this case.
    See Stanley J. Bajic et al., Report: Validation Study of the Accuracy, Repeatability, and
    Reproducibility of Firearm Comparisons, U.S. Dep’t of Energy 1-2 (2020) (the “Ames II
    Study”). The Ames II Study, which was undertaken in direct response to PCAST’s call for
    further studies to demonstrate the foundational validity of firearms identification, id. at 12,
    enrolled 173 examiners for a three-phase study to test for all three elements PCAST had
    identified as necessary to support foundational validity: accuracy (in Phase I), repeatability
    (in Phase II), and reproducibility (in Phase III). In each of three phases, each participating
    examiner received 15 comparison sets of known and unknown cartridge cases and 15
    comparison sets of known and unknown bullets. Id. at 23. The firearms used for the bullet
    comparisons were either Beretta or Ruger handguns and the firearms used for the cartridge
    case comparisons were either Beretta or Jimenez handguns. Id. Only the researchers knew
    the “ground truth” for each packet; that is, which “unknown” cartridges and bullets
    matched or did not match the included “known” cartridges and bullets. Id. As with the
    Ames I Study, although there was a “ground truth” correct answer for each sample set,
    examiners were permitted to pick from among the full array of the AFTE Range of
    Conclusions—identification, elimination, or one of the three levels of “inconclusive.” Id.
    at 12-13.
    The first phase of testing was designed to assess accuracy of identification, “defined
    as the ability of an examiner to correctly identify a known match or eliminate a known
    nonmatch.” Id. at 33. In the second phase, each examiner was given the same test set
    examined in phase one, without being told it was the same, to test repeatability, “defined
    as the ability of an examiner, when confronted with the exact same comparison once again,
    to reach the same conclusion as when first examined.” Id. In the third phase, each
    examiner was given a test set that had previously been examined by one of the other
    examiners, to test reproducibility, “defined as the ability of a second examiner to evaluate
    a comparison set previously viewed by a different examiner and reach the same
    conclusion.” Id.
    In the first phase, the results, shown in percentages, were:
[Table omitted.]
    Id. at 35. Treating inconclusive results as appropriate answers, the authors identified a
    false negative rate for bullets and cartridge cases of 2.92% and 1.76%, respectively, and a
    false positive rate for each of 0.7% and 0.92%, respectively. Id. Examiners selected one
    of the three categories of inconclusive for 20.5% of matching bullet sets and 65.3% of non-
    matching bullet sets. Id. As reflected in the following table, the results overall varied
    based on the type of handgun that produced the bullet/cartridge, with examiners’ results
    reflecting much greater certainty and correctness in classifying bullets and cartridge cases
    fired from the Beretta handguns than from the Ruger (for bullets) and Jimenez (for
    cartridge cases) handguns:17
[Table omitted.] Id. at 53.

17 “Of the 27 Beretta handguns used in the study, 23 were from a single recent
manufacturing run, and four were guns produced in separate earlier manufacturing runs.”
Ames II Study, at 56. The Ames II Study does not identify similar information for the
Ruger or Jimenez handguns.
    Comparing the results from the second phase of testing against the results from the
    first phase, intended to test repeatability, the outcomes, shown in percentages, were:
[Table omitted.]
Id. at 39. Thus, an examiner classifying the same matching bullet or cartridge case set a
    second time classified it in the same AFTE category 79% and 75.6% of the time,
    respectively, and an examiner classifying the same non-matching bullet or cartridge case
    set a second time did so 64.7% and 62.2% of the time, respectively. Id. The authors viewed
    these percentages favorably, concluding that this level of “observed agreement” exceeded
    the level of their “expected agreement.”18 Id. at 39-41. They did so, however, based on an
    expected level of agreement reflecting the overall pattern of results from the first phase of
    18
    The study authors also produced alternate calculations in which they merged
    either (1) all inconclusive results together or (2) positive identifications with “Inconclusive
    A” results and eliminations with “Inconclusive B” results. Ames II Study, at 40. As
    expected, those results produced greater agreement, although still ranging only from 71.3%
    agreement to 85.5% agreement. Id. at 42.
    27
    testing. Id. at 39-40. In other words, the metric against which the authors gauged
    repeatability was, in essence, random chance.
    Comparing the results from the third phase of testing against the results of the first
    phase, intended to test reproducibility, the outcomes, shown in percentages, were:
[Table omitted.]
Id. at 47. Thus, an examiner classifying a matching bullet or cartridge case set previously
    classified by a different examiner classified it in the same AFTE category 67.8% and 63.6%
    of the time, respectively, and an examiner classifying a nonmatching bullet or cartridge
    case set previously classified by a different examiner classified it in the same AFTE
    category 30.9% and 40.3% of the time, respectively. Id. The authors again viewed these
    percentages largely favorably. Id. at 47-49. Again, however, that conclusion was based
    on a level of expected agreement that was essentially random based on the overall results
    from the first phase of testing. Id. at 48-49.
    The State claims support from the Ames I and Ames II Studies based on what it
    calls their relatively low overall false positive rates. The State contends that those results
    confirm the low false positive rates produced in every other study of firearms identification,
    which are worthy of consideration even if they were not as robust in design as the Ames
    studies. By contrast, Mr. Abruquah claims that the high rates of inconclusive responses in
    both studies and the low rates of repeatability and reproducibility in the Ames II Study
    further support the concerns raised by NRC and PCAST about the lack of demonstrated
    foundational validity of firearms identification.
    D.     Witness Testimony
    1.     The Frye-Reed Hearing
    Five witnesses testified at the two hearings conducted by the circuit court. In the
    Frye-Reed hearing, Mr. Abruquah called William Tobin, a 27-year veteran of the Federal
    Bureau of Investigation with 24 years’ experience at the FBI Laboratory and an expert in
    forensic metallurgy. Mr. Tobin’s testimony was broadly critical of firearms identification
    generally and the AFTE Theory specifically. Citing support from multiple sources, he
    opined that: (1) firearms identification is “not a science,” does not follow the scientific
    method, and is circular; (2) the AFTE Theory is wholly subjective and lacks any guidance
    for examiners to determine the number of similarities needed to achieve an identification;
    (3) in the absence of standards, examiners ignore or “rationalize away” dissimilarities in
    samples; (4) examiners are incapable of distinguishing between subclass characteristics
    and individual characteristics—a phenomenon referred to as “subclass carryover”—thus
    undermining a fundamental premise of the AFTE Theory; (5) the studies on which the State
    had relied are flawed, do not reflect actual casework, and underestimate error rates; and
    (6) the AFTE Theory had not been subject to any “valid hypothesis testing” because the
    studies cited as support for it “lack any indicia of scientific reliability.” Mr. Tobin opined
    that, in the absence of a pool of samples from all other possible firearms that might have
    fired the bullets at issue, the most a firearms examiner could accurately testify to in reliance
    on the AFTE Theory is whether it was possible that the recovered bullets were fired from
    Mr. Abruquah’s revolver.
    The State presented three witnesses. It first presented Dr. James Hamby, an AFTE
    firearms examiner with a Ph.D. in forensic science who had been Chief of the Firearms
    Division for the United States Army Lab, authored dozens of articles and studies in the
    firearms examination field, trained firearms examiners domestically and internationally,
    and who, over the course of nearly 50 years in the field, managed his own forensic
    laboratory and two others. Dr. Hamby testified generally about the AFTE Theory, which
    he asserted had been accepted by the relevant scientific community and by courts, and
    proven by numerous studies, for more than a century. Dr. Hamby agreed with PCAST that
    to have foundational validity, a methodology dependent on subjective analysis must be
    subjected to empirical testing by multiple groups, be repeatable and reproducible, and
    provide valid estimates of the method’s accuracy. He opined that studies of firearms
    identification proved that the AFTE Theory meets all those criteria and has consistently
    low error rates. Dr. Hamby acknowledged that false positives can result when similarities
    in subclass characteristics are mistaken for individual characteristics, but testified that
    trained examiners would not make that mistake.
    Dr. Hamby also discussed the controls and standards governing the work of firearms
    identification examiners, including internal laboratory procedures, the AFTE training
    manual, and periodic proficiency training required of every examiner. He testified that one
    way forensic labs guard against the possibility of false positive results is by having a second
    examiner review all matches to ensure the correctness of the first examiner’s decision. In
    his decades of experience, Dr. Hamby was not personally aware of a second examiner ever
    having reached a different conclusion than the first in actual casework, which he seemed
    to view as a positive reflection on the reliability of the methodology.
    The State’s second witness was Torin Suber, a forensic scientist manager with the
    Maryland State Police. Like Dr. Hamby, Mr. Suber testified about the low false-positive
    error rates identified in the Ames I and other studies. Mr. Suber agreed that some
    examiners could potentially mistake subclass characteristics for individual characteristics,
    but testified that such errors would be limited to novice examiners who “don’t actually
    have that eye or knack for identification yet.”
    The final witness presented at the Frye-Reed hearing was the State’s testifying
    expert, Mr. McVeigh, whom the court accepted as an expert in firearms and toolmark
    examinations generally, as well as “the specifics of the examination conducted in this
    matter.” Mr. McVeigh testified that 100% of his work is in firearms examinations and that
    firearms identification is generally accepted as reliable in the relevant scientific
    community. Mr. McVeigh acknowledged the subjective standards and procedures used in
    the AFTE methodology but claimed that it is “a forensic discipline with a fairly strict
    methodology and a lot of rules and accreditation standards to follow.” He also relied
    heavily on what he described as low error rates revealed by the Ames I Study and a separate
    study out of Miami-Dade County.19 Although acknowledging the concern that examiners
    might mistake subclass characteristics for individual characteristics, Mr. McVeigh testified
    that possibility is “the number one thing[] that firearm examiners guard against.” He said
    that the “current thinking in the field” is that a trained examiner can overcome that concern.
    With respect to the examination he conducted in Mr. Abruquah’s case,
    Mr. McVeigh testified that he received for analysis two firearms, a Glock pistol and a
    Taurus revolver, along with “six fired bullet items,” one of which was unsuitable for
    comparison. Based on class characteristics, he first eliminated the Glock pistol. He then
    fired two rounds from the Taurus revolver and compared markings on those bullets against
    the crime scene bullets using the comparative microscope. In doing so, he focused on the
    “land impressions,” rather than the “groove impressions[, which] are the most likely place
    where the subclass [characteristics] would occur[.]”         Mr. McVeigh opined, without
    qualification, that, based on his analysis, “at some point each one of those five projectiles
    had been fired from the Taurus revolver.” He testified that his conclusion had been
    confirmed by another examiner in his lab.
    19
    Mr. McVeigh referred to the Miami-Dade Study as an open-set study. Although
    neither party introduced a report of the Miami-Dade Study, PCAST described it as a “partly
    open” study. PCAST Report, at 109. According to PCAST, examiners were provided 15
    questioned samples, 13 of which matched samples that were provided and two of which
    did not. Id. Of the 330 non-matching samples that were provided, the examiners
    eliminated 188 of them, reached an inconclusive determination for 138 more, and made
    four false classifications. Id. The inconclusive rate for the non-matching samples was thus
    41.8% with a false positive rate of 2.1%. Id. PCAST observed that even in that “partly
    open” study, the inconclusive rate was “200-fold higher” and the false positive rate was
    “100-fold higher” than in closed set studies. Id.
    On cross-examination, Mr. McVeigh admitted that he did not know how Taurus
    manufactured its .38 Special revolver, how many such revolvers had been consecutively
    manufactured and shipped to the Prince George’s County area, or how many in the area
    might show similar subclass characteristics. He also admitted that the proficiency testing
    he had undergone during his career is not blind testing and is “straight forward.” Indeed,
    to his knowledge, no one in his lab had ever failed a proficiency test. Mr. McVeigh asserted
    that bias is not a concern in firearms examinations because the examiners are not provided
    any details from the police investigation before conducting an examination.
    2.      The Daubert-Rochkind Hearing
    At the Daubert-Rochkind hearing, each party presented only one witness to
    supplement the record that had been created at the Frye-Reed hearing. The State began
    with Dr. Hamby. In addition to reviewing many of the same points from his original
    testimony, Dr. Hamby testified that the AFTE Theory had been tested since 1907 and peer
    reviewed hundreds of times. He highlighted the low error rates produced in studies,
    including those in which examiners matched bullets fired from consecutively manufactured
    barrels. He was also asked about the more recent Ames II Study, but seemed to have
    limited familiarity with it.
    Mr. Abruquah presented testimony and an extensive affidavit from David Faigman,
    Dean of the University of California Hastings College of Law, whom the court accepted as
    an expert in statistical and methodological bases for scientific evidence, including research
    design, scientific research, and methodology. Dean Faigman discussed several concerns
    with the validity of the AFTE Theory, which were principally premised on the subjective
    nature of the methodology, including: (1) the difference in error rates between closed- and
    open-set tests; (2) potential biases in testing that might skew the results in studies, including
    (a) the “Hawthorne effect,” which theorizes that participants in a test who know they are
    being observed will try harder; and (b) a bias toward selecting “inconclusive” responses in
    testing when examiners know it will not be counted against them, but that an incorrect
    “ground truth” response will; (3) an absence of pre-testing and control groups; (4) the
    “prior probability problem,” in which examiners expect a certain result and so are more
    likely to find it; and (5) the lack of repeatability and reproducibility effects.
    Dean Faigman agreed with PCAST that the Ames I Study “generally . . . was the
    right approach to studying the subject.” He observed, however, that if inconclusives were
    counted as errors, the error rate from that study would “balloon[]” to over 30%. In
    discussing the Ames II Study, he similarly opined that inconclusive responses should be
    counted as errors. By not doing so, he contended, the researchers had artificially reduced
    their error rates and allowed test participants to boost their scores. By his calculation, when
    accounting for inconclusive answers, the overall error rate of the Ames II Study was 53%
    for bullet comparisons and 44% for cartridge case comparisons—essentially the same as
    “flipping a coin.” Regarding the other two phases of the Ames II Study, Dean Faigman
    found the rates of repeatability and reproducibility “shockingly low.”
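The disagreement over how inconclusive responses should figure into an error rate is, at bottom, arithmetic. The short Python sketch below uses hypothetical counts—not the Ames II data underlying Dean Faigman’s 53% and 44% figures—to show how the same set of responses yields sharply different error rates depending on whether inconclusives are treated as errors.

    # Illustrative only: how counting "inconclusive" responses as errors changes
    # a reported error rate. All counts are hypothetical.
    def error_rate(correct, incorrect, inconclusive, inconclusives_are_errors):
        total = correct + incorrect + inconclusive
        errors = incorrect + (inconclusive if inconclusives_are_errors else 0)
        return errors / total

    correct, incorrect, inconclusive = 600, 10, 390  # hypothetical open-set results

    print(f"inconclusives not counted as errors: {error_rate(correct, incorrect, inconclusive, False):.1%}")
    print(f"inconclusives counted as errors:     {error_rate(correct, incorrect, inconclusive, True):.1%}")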
    E.     The Evolving Caselaw
    Until the 2008 NRC Report, most courts seem to have accepted expert testimony on
    firearms identification without incident. See David H. Kaye, Firearm-Mark Evidence:
    Looking Back and Looking Ahead, 68 Case Western Reserve L. Rev. 723, 723-26 (2018);
see also, e.g., United States v. Davis, 103 F.3d 660, 672 (8th Cir. 1996); United States v.
Natson, 469 F. Supp. 2d 1253, 1261 (M.D. Ga. 2007) (permitting an expert to testify “to a
100% degree of certainty”); United States v. Foster, 300 F. Supp. 2d 375, 376 n.1, 377
(D. Md. 2004) (stating that “numerous cases have confirmed the reliability” of firearms
and toolmark identification); United States v. Santiago, 199 F. Supp. 2d 101, 111 (S.D.N.Y.
2002); State v. Mack, 653 N.E.2d 329, 337 (Ohio 1995); Commonwealth v. Moore, 340
A.2d 447, 451 (Pa. 1975).
    However, “[a]fter the NRC Report issued, some jurisdictions began to limit the
    scope of a ballistics expert’s testimony.” Gardner v. United States, 
    140 A.3d 1172
    , 1183
    (D.C. 2016); see also Commonwealth v. Pytou Heang, 
    942 N.E.2d 927
    , 938 (Mass. 2011)
    (“Concerns about both the lack of a firm scientific basis for evaluating the reliability of
    forensic ballistics evidence and the subjective nature of forensic ballistics comparisons
    have prompted many courts to reexamine the admissibility of such evidence.”). Initially,
    those limitations consisted primarily of precluding experts from testifying that their
    opinions were offered with something approaching absolute certainty. In United States v.
    Willock, for example, Judge William D. Quarles, Jr. of the United States District Court for
    the District of Maryland, in adopting a report and recommendation by then-Chief
    Magistrate Judge, later Judge, Paul W. Grimm of that court, permitted an examiner to
    testify as to a “match” between a crime scene cartridge case and a particular firearm, but
“without any characterization as to degree of certainty.” 696 F. Supp. 2d at 572, 574; see
also United States v. Ashburn, 88 F. Supp. 3d 239, 250 (E.D.N.Y. 2015) (limiting an
expert’s conclusions to those within a “reasonable degree of certainty in the ballistics field”
or a “reasonable degree of ballistics certainty”); Monteiro, 407 F. Supp. 2d at 372 (stating
that the proper standard is a “reasonable degree of ballistic certainty”); United States v.
Taylor, 663 F. Supp. 2d 1170, 1180 (D.N.M. 2009) (“[The expert] will be permitted to
give . . . his expert opinion that there is a match . . . . [He] will not be permitted to testify
that his methodology allows him to reach this conclusion as a matter of scientific
certainty.”); United States v. Glynn, 578 F. Supp. 2d 567, 574-75 (S.D.N.Y. 2008)
(allowing expert testimony that it was “more likely than not” that certain bullets or casings
came from the same gun, “but nothing more”).
    Following issuance of the PCAST Report, some courts have imposed yet more
    stringent limitations on testimony. One example of that evolution—notable because it
    involved the same judicial officer as Willock, Judge Grimm, as well as the same examiner
    as here, Mr. McVeigh—is in United States v. Medley, No. PWG-17-242 (D. Md. Apr. 24,
    2018), ECF No. 111. In Medley, Judge Grimm thoroughly reviewed the state of knowledge
    at that time concerning firearms identification, including developments since his report and
    recommendation in Willock. Judge Grimm restricted Mr. McVeigh to testifying only “that
    the marks that were produced by the . . . cartridges are consistent with the marks that were
    found on the” recovered firearm, and precluded him from offering any opinion that the
    cartridges “were fired by the same gun” or expressing “any confidence level” in his
    opinion. Id. at 119.
    Some other courts, although still a minority overall, have recently imposed similar
or even more restrictive limitations. See United States v. Shipp, 422 F. Supp. 3d 762, 783
(E.D.N.Y. 2019) (limiting expert’s testimony to opining that “the recovered firearm cannot
be excluded as the source of the recovered bullet fragment and shell casing”); Williams v.
United States, 210 A.3d 734, 744 (D.C. 2019) (“[I]t is plainly error to allow a firearms and
toolmark examiner to unqualifiedly opine, based on pattern matching, that a specific bullet
was fired by a specific gun.”); United States v. Adams, 444 F. Supp. 3d 1248, 1256, 1261,
1267 (D. Or. 2020) (precluding expert from offering testimony of a match but permitting
testimony about “limited observational evidence”).20
    III.   ANALYSIS
    In granting in part Mr. Abruquah’s motion in limine to exclude firearms
    identification evidence, the circuit court ruled that Mr. McVeigh could not testify “to any
    level of practical certainty/impossibility, ballistic certainty, or scientific certainty that a
    suspect weapon matches certain bullet or casing striations.” However, the court ruled that
Mr. McVeigh could opine whether the bullets and fragment “recovered from the murder scene fall
    into any of the AFTE Range of Conclusions[,]” i.e., identification, any of the three levels
    of inconclusive, or elimination. Accordingly, at trial, after explaining how he analyzed the
    samples and compared their features, Mr. McVeigh testified, over objection and separately
    with respect to each of the four bullets and the bullet fragment, that each “at some point”
    “had been fired” from or through “the Taurus revolver.” He testified neither that his
    20
In United States v. Davis, citing Judge Grimm’s reasoning in Medley with
approval, a federal district court judge in Virginia also precluded Mr. McVeigh and
other examiners from testifying that marks on a cartridge case indicated a “match” with a
particular firearm, while permitting the examiners to testify that marks on the cartridges
were “similar and consistent with each other.” 2019 WL 4306971, at *7, Case No.
4:18-cr-00011 (W.D. Va. 2019).
    opinion was offered to any particular level of certainty nor that it was subject to any
    qualifications or caveats.
In his appeal, Mr. Abruquah does not challenge all of Mr. McVeigh’s testimony, nor does
he dispute that firearms identification is sufficiently reliable to be admitted for some purposes.
    Instead, he contends that the methodology is insufficiently reliable to support testimony
    “identify[ing] a specific firearm as the source of a questioned bullet,” and argues that an
    examiner should be limited to opining, “at most, that a firearm cannot be excluded as the
    source of the questioned projectile[.]”     In response, the State argues that firearms
    identification evidence has been accepted by courts applying the Daubert standard as
    reliable, has repeatedly been proven reliable in studies demonstrating very low false-
    positive rates, and that, “[a]t best, [Mr. Abruquah] has demonstrated that there are ongoing
    debates regarding how to assess the AFTE methodology[,]” not whether it is admissible.
    In light of the scope of Mr. Abruquah’s challenge, our task is to assess, based on the
    information presented to the circuit court, whether the AFTE Theory can reliably support
    an unqualified opinion that a particular firearm is the source of one or more particular
    bullets. Our analysis of the Daubert-Rochkind factors is thus tailored specifically to that
    issue, not to the reliability of the methodology more generally.
    Before turning to the specific Daubert-Rochkind factors, we offer two preliminary
    observations. First, our analysis is not dependent on whether firearms identification is a
    “science.” “Daubert’s general holding,” adopted by this Court in Rochkind, “applies not
    only to testimony based on ‘scientific’ knowledge, but also to testimony based on
    ‘technical’ and ‘other specialized’ knowledge.” Rochkind, 471 Md. at 36 (quoting Kumho
Tire Co., 526 U.S. at 141). Second, it is also not dispositive that firearms identification is
a subjective endeavor. See, e.g., United States v. Romero-Lobato, 379 F. Supp. 3d 1111,
1120 (D. Nev. 2019) (“The mere fact that an expert’s opinion is derived from subjective
methodology does not render it unreliable.”); Ashburn, 88 F. Supp. 3d at 246-47 (stating
that “the subjectivity of a methodology is not fatal under [Federal] Rule 702 and Daubert”).
    The absence of objective criteria is a factor that we consider in our analysis of reliability,
    but it is not dispositive.
    We now turn to consider each of the ten Daubert-Rochkind factors. Of course, those
    factors “are neither exhaustive nor mandatory,” Matthews, 479 Md. at 314, but they provide
    a helpful framework for our analysis in this case.
    A.      Testability
    Although significant dispute surrounds many of the studies conducted on firearms
    identification to date, and especially their applicability to actual casework, it is undisputed
    that firearms identification can be tested. Indeed, the bottom-line recommendation of the
    most significant critics of firearms identification to date, the authors of the 2009 NRC and
    PCAST Reports, was to call for more and better testing, not to question whether such
    testing is possible.
    B.      Peer Review and Publication
    The second Daubert-Rochkind factor considers whether a methodology has been
    submitted “to the scrutiny of the scientific community,” under the belief that doing so
“increases the likelihood that substantive flaws in methodology will be detected.” Daubert,
509 U.S. at 593. The circuit court concluded that the State satisfied its burden to show that
    the firearms and toolmark identification methodology has been peer reviewed and
    published. We think the evidence is more mixed.
    The two most robust studies of firearms identification—Ames I and II—have not
    been peer reviewed or published in a journal. The record does not disclose why. Some of
    the articles on which the State and its witnesses rely have been published in the AFTE
    Journal, a publication of the primary trade group dedicated to advancing firearms
    identification. The required steps in the AFTE Journal’s peer review process involve a
    review by “a member of [AFTE’s] Editorial Review Panel” for “grammatical and technical
    correctness” and review by an AFTE “Assistant Editor[]” for “grammar and technical
content.” See AFTE, Peer Review Process, available at
https://afte.org/afte-journal/afte-journal-peer-review-process (last accessed June 14, 2023),
archived at https://perma.cc/822Y-C7G8. That process appears designed primarily to review articles
    and studies to determine their adherence to the AFTE Theory, not to test the methodology.
    Although a handful of other firearms identification studies have been published in
    other forensic journals, the record is devoid of any information about the extent or quality
    of peer review as concerns the validity of the methodology. Nonetheless, NRC’s and
    PCAST’s critiques of some of those same studies, and of the AFTE Theory more generally,
    have served many of the same purposes that might have been served by a robust peer review
    process. See Shipp, 422 F. Supp. 3d at 777 (concluding that the AFTE Theory had been
    adequately subjected to peer review and publication due in large part to “the scrutiny of
    PCAST and the flaws it perceived in the AFTE Theory”).
    C.     Known or Potential Rate of Error
    The circuit court found that the parties did not dispute “that a known or potential
    rate of error has been attributed to firearms identification evidence,” and treated that as
    favoring admission of Mr. McVeigh’s testimony. (Emphasis removed). Neither party
    disputes that there is a potential rate of error for firearms identification or that a number of
    studies have purported to identify such an error rate. However, they do dispute whether
    the studies to date have identified a reliable error rate. On that issue, we glean several
    relevant points from the record.
    First, the reported rates of “ground truth” errors—i.e., “identification” of a non-
    matching sample or “elimination” of a matching sample—from studies in the record are
    relatively low.21 Error rates in most closed-set studies hover close to zero and the overall
    error rates calculated in the Ames I and II Studies were in the low single digits.22 It thus
    21
    Most of the parties’ attention in this case is naturally focused on the “false
    positive” rate. Although false positives create the greatest risk of leading directly to an
    erroneous guilty verdict, an examiner’s erroneous failure to eliminate the possibility of a
    match could also contribute to an erroneous guilty verdict if the correct answer—
    elimination—would have led to an acquittal. To that extent, it is notable that in the first
    round of testing in the Ames II Study, examiners correctly eliminated only 33.8% of non-
    matching bullets and 48.5% of non-matching cartridge cases. See Ames II Study, at 35.
    22
    The Ames I Study identified a false negative rate of 0.367%, with a 95%
    confidence interval of up to 0.94%, a false-negative-plus-inconclusive rate of 1.376%, with
    a 95% confidence interval of up to 2.26%, and a false positive rate of 0.939%, with a 95%
    confidence interval of up to 2.26%. Ames I Study, at 17. The Ames II Study reports its
    results for bullets as having a false positive error probability of 0.656%, with a 95%
    confidence interval of up to 1.42%, and a false negative error probability of 2.87%, with a
    95% confidence interval of up to 4.26%. The Ames II Study results for cartridge cases
    showed a false positive error probability of 0.933%, with a 95% confidence interval of up
    to 1.57% and a false negative error probability of 1.87%, with a 95% confidence interval
    of up to 2.99%. Ames II Study, at 77.
    appears that, at least in studies conducted thus far, it is relatively rare for an examiner in a
    study environment to identify a match between a firearm and a non-matching bullet.
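The error rates and confidence intervals recited in footnote 22 are ordinary proportions with interval estimates. The Python sketch below shows the generic arithmetic using hypothetical counts and a standard normal-approximation interval; the studies’ actual denominators and interval method are not set out in this record, so the sketch is illustrative only.

    # Illustrative only: a false positive rate and an approximate 95% confidence
    # interval for it. The counts are hypothetical.
    import math

    false_positives = 20            # hypothetical
    nonmatching_comparisons = 2000  # hypothetical

    p = false_positives / nonmatching_comparisons
    se = math.sqrt(p * (1 - p) / nonmatching_comparisons)
    lower, upper = max(0.0, p - 1.96 * se), p + 1.96 * se

    print(f"false positive rate: {p:.3%}")
    print(f"approximate 95% confidence interval: [{lower:.3%}, {upper:.3%}]")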
    Second, the low error rates from closed-set, matching studies utilizing bullets or
cartridges fired from consecutively manufactured firearms or barrels offer strong support
    for the propositions that: (1) firearms produce some unique collections of individual
    patterns and markings on bullets and cartridges they fire; and (2) such collections of
    individual patterns and markings can be reliably identified when subclass characteristics
    are removed from the equation.23
    Third, the rate of “inconclusive” responses in closed-set studies is negligible to non-
    existent, see, e.g., Hamby Study, at 555-56 (finding that examiners classified eight out of
    10,445 responses as inconclusive); but the rate of such responses in open-set studies is
    significant, see, e.g., Ames I Study, at 16 (finding that examiners classified 33.7% of “true
    different-source comparisons” as inconclusive); Ames II Study, at 35 (finding that
    examiners classified more than 20% of matching bullet sets and more than 65% of non-
    matching bullet sets as inconclusive), suggesting that examiners choose “inconclusive”
    even when it is not a “correct” response. The State, its witnesses, and the studies on which
    they rely suggest that responses of “inconclusive” are properly treated as appropriate
    23
    The use of bullets and cartridges from consecutively manufactured firearms or
    barrels, although more difficult in the sense that the markings in total can be expected to
    be more similar than those fired from non-consecutively manufactured firearms or barrels,
    also makes it easier to eliminate any confusion concerning whether marks or patterns are
    subclass or individual characteristics. See Tasha P. Smith et al., A Validation Study of
    Bullet and Cartridge Case Comparisons Using Samples Representative of Actual
    Casework, 61 J. Forensic Scis. 939 (2016) (noting that toolmarks on consecutively
    manufactured firearms may be identified “when subclass influence is excused”).
    responses because, as stated in the Ames I Study, if “the examiner is unable to locate
    sufficient corresponding individual characteristics to either include or exclude an exhibit
    as having been fired in a particular firearm,” then “inconclusive” is the only appropriate
    response. Ames I Study, at 6. That answer would be more convincing if rates of
    inconclusive findings were consistent as between closed-set and open-set studies or if the
    Ames II Study had produced higher levels of consistency in the repeatability or
    reproducibility portions of the study. Instead, whether an examiner chooses “inconclusive”
    in a study seems to depend on something other than just the “corresponding individual
    characteristics” themselves.
    Fourth, if at least some inconclusives should be treated as incorrect responses, then
    the rates of error in open-set studies performed to date are unreliable. Notably, if just the
    “Inconclusive-A” responses—those for which the examiner thought there was almost
    enough agreement to identify a match—for non-matching bullets in the Ames II Study
    were counted as incorrect matches, the “false positive” rate would balloon from 0.7% to
    10.13%. That is particularly noteworthy because in all the studies conducted to date, the
    participating examiners knew that (1) they were being studied and (2) an inconclusive
    response would not be counted as incorrect. There is no evidence in the record that
    examiners in a casework environment—when processing presumably less pristine samples
    than those included in studies and that were provided to them by law enforcement officers
    in the context of an investigation—select inconclusive at the same rate they do in an open-
    set testing environment.
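The sensitivity of the reported rate to that classification choice is simple arithmetic. The Python sketch below uses hypothetical counts chosen only to approximate the magnitude of the shift described above (from 0.7% to 10.13%); they are not the actual Ames II counts.

    # Illustrative only: reclassifying "Inconclusive-A" responses to non-matching
    # sets as identifications. All counts are hypothetical.
    nonmatching_total = 2000      # hypothetical non-matching comparisons
    false_identifications = 14    # hypothetical "ground truth" false positives
    inconclusive_a = 190          # hypothetical Inconclusive-A responses

    rate_as_reported = false_identifications / nonmatching_total
    rate_reclassified = (false_identifications + inconclusive_a) / nonmatching_total

    print(f"false positive rate as reported:             {rate_as_reported:.2%}")
    print(f"if Inconclusive-A counted as false positives: {rate_reclassified:.2%}")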
    Fifth, it is notable that the accuracy rate in the Ames II Study varied significantly
    between the two different types of firearms tested. Examiners correctly classified 89.7%
    of matching bullet sets fired from Beretta handguns but only 56.6% of those fired from
    Ruger handguns. Ames II Study, at 53. They also correctly eliminated 38.7% of non-
    matching bullet sets fired from Beretta handguns and only 21.7% of those fired from Ruger
    handguns. Id. Given that variability, it is significant that the record provides scant
    information about where Taurus revolvers might fall on the error rate spectrum.24
    Finally, we observe that even if the studies reflecting potential error rates of up to
    2.6% reflected error rates in actual casework—a proposition for which this record provides
    no support—that rate must be assessed in the context of the evidence at issue. Not all
    expert witness testimony is created the same.         Unlike testimony that results in a
    determination that the perpetrator of a crime was of a certain height range, see Matthews,
    479 Md. at 285, a conclusion that a bullet found in a victim’s body was fired from the
    defendant’s gun is likely to lead much more directly to a conviction. That effect is
    compounded by the fact that a defendant is almost certain to lack access to the best evidence
    that could potentially contradict (or, of course, confirm) such testimony, which would be
    bullets fired from other firearms from the same production run.
    24
    During the Frye-Reed hearing, Dr. Hamby testified, using Glock as an example,
    that high-quality firearms would produce bullets and cartridge cases with very consistent
    patterns and markings, even across 10,000 cartridges, because the process of firing has
    little effect on the firearm. He also testified that, by contrast, an examiner might not be
    able to tell the difference between cartridge cases from rounds fired even consecutively
    from a low-quality firearm, because each bullet “just eats up the barrel.” Asked where a
    Taurus .38 revolver falls on the spectrum between a “cheap gun versus the most
    expensive,” Dr. Hamby offered that “it’s mid-level.”
    The relatively low rate of “false positive” responses in studies conducted to date is
    by far the most persuasive piece of evidence in favor of admissibility of firearms
    identification evidence. On balance, however, the record does not demonstrate that that
    rate is reliable, especially when it comes to actual casework.
    D.     Existence and Maintenance of Standards and Controls
    The circuit court found the evidence with respect to the existence and maintenance
    of standards and controls to be “muddled” and so to weigh against admission. We mostly
    agree. On the one hand, to the extent that this factor encompasses operating procedures
    designed to ensure a consistency in process, see, e.g., Adams, 444 F. Supp. 3d at 1266
    (discussing annual proficiency testing, second reviewer verification, technical review, and
    training as relevant to the analysis of standards and quality control), the State presented
    evidence of such standards and controls. That evidence includes the AFTE training
    manual, laboratory standard operating procedures, and laboratory accreditation standards.
    Together, those sources provide standards and controls applicable to: (1) the training and
    certification of firearms examiners; (2) proficiency testing of firearms examiners; and
    (3) the mechanics of how examiners treat evidence and conduct examinations. Accord
Willock, 696 F. Supp. 2d at 571-72 (finding the existence of “standards governing the
    methodology of firearms-related toolmark examination”).
    Notably, however, the record also contains evidence that severely undermines the
    value of some of those same standards and controls. For example, one control touted by
    advocates of firearms identification is a requirement that a second reviewer confirm every
identification classification. See Taylor, 663 F. Supp. 2d at 1176 (noting an expert’s
testimony that “industry standards require confirmation by at least one other examiner
    when the first examiner reaches an identification”). Indeed, Dr. Hamby testified that he
    believes error rates identified in firearms identification studies are overstated because those
    studies do not permit confirmatory review by a second examiner. However, Dr. Hamby
    also testified that the confirmatory review process is not blind, meaning that the second
    reviewer knows the conclusion reached by the first. Even more significantly, Dr. Hamby
    testified that in his decades of experience in firearms identification in multiple laboratories
    in multiple states, he was not aware of a single occasion in which a second reviewer had
    reached a different conclusion than the first. In light of the findings in the reproducibility
    phase of the Ames II Study concerning how frequently examiners in the study environment
    come to different conclusions, Dr. Hamby’s testimony strongly suggests that study results
    do not, in fact, reliably represent what occurs in actual casework.
    As a second example, although advocates of firearms identification tout periodic
    proficiency testing by Collaborative Testing Services Inc. (“CTS”) as a method of ensuring
    the quality of firearms identification, the record contains no evidence supporting efficacy
    of that testing. To the contrary, the evidence suggests that examiners rarely, if ever, fail
    CTS proficiency tests. Dr. Hamby confirmed that the industry’s mandate to CTS with
    respect to proficiency tests “was to try to make them [as] inexpensive as possible.”
    To the extent that “standards and controls” encompasses standards applicable to the
    analysis itself, see, e.g., Shipp, 422 F. Supp. 3d at 779-81 (discussing the “circular and
    subjective” nature of the sufficient agreement standard and the inability of examiners “to
    protect against false positives” as an absence of “standards controlling the technique’s
operation” (quoting Daubert, 509 U.S. at 594)), firearms identification faces an even
    greater challenge.   As noted, “sufficient agreement,” the threshold for reaching an
    “identification” classification, lacks any guiding standard other than the examiner’s own
    subjective judgment. The AFTE Theory states that:
    “sufficient agreement” is related to the significant duplication of random
    toolmarks as evidenced by the correspondence of a pattern or combination
    of patterns of surface contours.
    The theory then observes that:
    [a]greement is significant when the agreement in individual characteristics
    exceeds the best agreement demonstrated between toolmarks known to have
    been produced by different tools and is consistent with agreement
    demonstrated by toolmarks known to have been produced by the same tool.
    AFTE Theory (emphasis removed). The theory offers no guidance as to the quality or
    quantity of shared individual characteristics—even assuming it is possible to reliably
    differentiate these from subclass characteristics—that should cause an examiner to
    determine that two bullets were fired from the same firearm or the quality or quantity of
    different individual characteristics that should cause an examiner to reach the opposite
    conclusion.25 See William A. Tobin & Peter J. Blau, Hypothesis Testing of the Critical
Underlying Premise of Discernible Uniqueness in Firearms-Toolmarks Forensic Practice,
53 Jurimetrics J. 121, 125 (2013); 2009 NRC Report, at 153-54; see also Itiel E. Dror,
    Commentary, The Error in “Error Rate”: Why Error Rates Are So Needed, Yet So Elusive,
    25
    On cross-examination, Mr. McVeigh answered that he could not identify the
    “least number of matching individual characteristics” that he had “ever used to make an
    identification[,]” declining to say even whether it may have been as low as two shared
    characteristics.
    65 J. Forensic Scis. 1034, 1037 (2020) (stating that “forensic laboratories vary widely in
    what decisions are verified”).
    As explained in the findings of the authors of the Ames II Study, in defending the
    decision not to treat inconclusive results as errors:
    When confronted with a myriad of markings to be compared, a decision has
    to be made about whether the variations noted rise above a threshold level
    the examiner has unconsciously assigned for each examination.
Ames II Study, at 75; see also id. (“[A]ll examiners must establish for themselves a
    threshold value for evaluation[.]”). A “standard” for evaluation that is dependent on each
    individual examiner “unconsciously assign[ing]” a threshold level “for each examination”
    may not undermine the reliability of the methodology to support generalized testimony
    about the consistency of patterns and marks on ammunition fired from a particular firearm
    and crime scene bullets. It does not, however, support the reliability of the methodology
    to identify, without qualification, a particular crime scene bullet as having been fired from
    a particular firearm.
    On this issue, we find the results of phases two and three of the Ames II Study
    particularly enlightening. The PCAST Report identified accuracy, repeatability, and
    reproducibility as the key components of the foundational validity of any forensic
    technique. PCAST Report, at 5. Dr. Hamby testified at the Frye-Reed hearing that he
    agreed with that as a general proposition. The Ames II Study, which was not available at
    the time of the Frye-Reed hearing, was designed specifically to test the repeatability and
    reproducibility of the AFTE Theory methodology.           For purposes of reviewing the
    reliability of firearms identification to support the admissibility of expert testimony of a
    “match,” the level of inconsistency identified through that study is troublesome.
    Notably, at the Frye-Reed hearing, Mr. McVeigh rejected the notion that a firearms
    examiner looking at a bullet multiple times might come to different conclusions, stating
    that he believed that firearms identification’s “repeatability is not in question.” By the time
    of the Daubert hearing, however, the Ames II Study had been released, with data revealing
    that an examiner reviewing the same bullet set a second time classified it in the same AFTE
category only 79% of the time for matching sets and 65% of the time for non-matching sets.
    Ames II Study, at 39. In light of the black-box nature of the study, there is no explanation
    of this lack of consistency or of the lack of reproducibility shown in the same study.26
    Nonetheless, it highlights both (1) the absence of any standards or controls to guide the
    analysis of examiners and (2) the importance of testing unverified (though undoubtedly
    genuinely held) claims about reliability.
    The lack of standards and controls is perhaps most acute in discerning whether a
    particular characteristic is a subclass or an individual characteristic. As noted, subclass
    characteristics are those shared by a group of firearms made using the same tools, such as
    those made in the same production run at a facility. Individual characteristics are those
    26
    As noted above, the Ames II Study also found that an examiner reviewing a bullet
    set previously classified by a different examiner classified it in the same AFTE category
    68% of the time for matching sets and 31% of the time for non-matching sets. Ames II
    Study, at 47. Even when the authors of the Ames II Study paired Identifications with
    Inconclusive-A responses and Eliminations with Inconclusive-C responses, second
    examiners still reached the same results as the first examiners looking at the same set of
    matching bullet sets only 77.4% of the time, and did so when looking at the same set of
    non-matching bullet sets only 49.0% of the time. Ames II Study, at 49.
    specific to a particular firearm. Both can result from aspects of the manufacturing process;
    individual characteristics can also result from later events, such as ordinary wear and
    cleaning and polishing. Currently, there are no published standards or controls to guide
    examiners in identifying whether any particular pattern or mark is a subclass or an
    individual characteristic. Mr. McVeigh testified that examiners attempt to “guard against”
    this “subclass carryover,” and that it is possible for a “trained examiner” to do so.27
    However, neither he nor any other witness identified any industry standards or controls
    addressing that topic.
    On balance, consideration of the existence and maintenance of standards and
    controls weighs against admission of testimony of a “match” between a particular firearm
    and a particular crime scene bullet. Accord Shipp, 422 F. Supp. 3d at 782 (“[T]he court
    finds that the subjective and circular nature of AFTE Theory weighs against finding that a
    firearms examiner can reliably identify when two bullets or shell casings were fired from
    the same gun.”).
    27
    Mr. Abruquah relies on a 2007 study published in the AFTE Journal that was
    designed to test the possibility that cartridge cases fired from two pistols that had been
    shipped to the same retailer on the same date would show similarities in subclass
    characteristics. See Gene C. Rivera, Subclass Characteristics in Smith & Wesson SW40VE
    Sigma Pistols, 39 AFTE J. 247 (2007) (the “Rivera Study”). The Rivera Study found
    “alarming similarities” among the marks from the two different pistols, which, the author
    concluded, “should raise further concern for the firearm and tool mark examiner who may
    rely only on one particular type of mark for identification purposes.” Id. at 250. The Rivera
    Study suggested that the AFTE Theory’s “currently accepted standard for an identification”
    may need to be reconsidered as a result of very “significant” agreement between the two
    different pistols. Id. The AFTE seems to have responded by clarifying in the statement of
    its theory that an examiner’s decision should be based on individual characteristics, but it
    has not provided standards for distinguishing those from subclass characteristics.
    E.      General Acceptance
    Whether the AFTE Theory of firearms identification is generally accepted by the
    relevant community is largely dependent on what the relevant community is. Based on
    materials included in the record, as well as caselaw, the community of firearms
    identification examiners appears to be overwhelmingly accepting of the AFTE Theory.
See, e.g., Romero-Lobato, 379 F. Supp. 3d at 1122 (stating that “[t]he AFTE method
certainly satisfies th[e general acceptance] element”); United States v. Otero, 849 F. Supp.
2d 425, 435 (D.N.J. 2012) (stating that the AFTE Theory is “widely accepted in the forensic
community and, specifically, in the community of firearm and toolmark examiners”);
Willock, 696 F. Supp. 2d at 571 (“[D]espite its inherent subjectivity, the AFTE
theory . . . has been generally accepted within the field of toolmark examiners[.]”);
Monteiro, 407 F. Supp. 2d at 372 (“[T]he community of toolmark examiners seems
virtually united in their acceptance of the current technique.”).
    On the other hand, groups of eminent scientists and other academics have been
    critical of the absence of studies demonstrating the validity of firearms identification
    generally and the AFTE Theory specifically. See, e.g., 2009 NRC Report, at 155; PCAST
    Report, at 111. Indeed, the record does not divulge evidence of general acceptance of the
    methodology by any group outside of firearms identification examiners and law
    enforcement.
    We conclude that the relevant community for the purpose of determining general
    acceptance consists of both firearms examiners and the broader scientific community that
    has weighed in on the reliability of the methodology. The widespread acceptance of the
    methodology among those who have vast experience with it, study it, and devote their
    careers to it is of great significance. However, we would be remiss were we to rely
    exclusively on a community that, by definition, is dependent for its livelihood on the
    continued viability of a methodology to sustain it, while ignoring the relevant and
    persuasive input of a different, well-qualified, and disinterested segment of professionals.28
    We consider this factor to be neutral.
    F.     Whether Opinions Emerged Independently or Were Developed
    for Litigation
    The circuit court found that Mr. McVeigh’s testimony grew naturally out of research
    independent of the litigation because “the ultimate purpose of th[e firearms and toolmark]
    evidence is investigation [into the victim’s death], not litigation.”          We disagree.
    “Historically, forensic science has been used primarily in two phases of the criminal-justice
    process: (1) investigation, which seeks to identify the likely perpetrator of a crime, and
    (2) prosecution, which seeks to prove the guilt of a defendant beyond a reasonable
    doubt.”29 See PCAST Report, at 4. The use of firearms identification in a criminal
    prosecution is not independent of its investigative use. Nonetheless, the purpose of this
    factor is to determine whether there is reason for skepticism that the opinion reached might
    28
    In his dissent, Justice Gould takes Mr. Abruquah to task for not retaining his own
    firearms examiner to provide a different analysis of the bullets at issue. Dissenting Op. of
    Gould, J. at 42. In doing so, Justice Gould assumes that there are firearms examiners whose
    services were readily available to Mr. Abruquah, i.e., who are willing and able to take on
    work for criminal defendants in such cases. The record contains no support for that
    proposition.
    29
    Here, for example, it appears that Mr. Abruquah was already identified as the
    likely perpetrator of the murder before Mr. McVeigh began his analysis of the Taurus
    revolver and the crime scene bullets.
    be tailored to the preferred result for the litigation, rather than the expert’s considered,
    independent conclusion. Here, the circuit court lauded Mr. McVeigh’s integrity and
    forthrightness, and we have no reason to second-guess that view.30 Crediting the court’s
    findings about Mr. McVeigh’s testimony, we are confident that the court would not weigh
    this factor against admissibility and so we will not either.
    G.     Unjustified Extrapolation from Accepted Premise
    Citing Mr. Abruquah’s “voluminous data indicating that firearms identification
    evidence is unjustifiably extrapolated from the toolmarks” and the State’s “credible and
    persuasive evidence that all extrapolations are justifiably calculated and well-reasoned[,]”
    the circuit court found “this factor to be in equipoise” and so to weigh against admission.
    In Rochkind, we explained that this factor invokes the concept of an analytical gap, as
    “[t]rained experts commonly extrapolate from existing data[,]” but a circuit court is not
    required “to admit opinion evidence that is connected to existing data only by the ipse dixit
of the expert.” 471 Md. at 36 (quoting Joiner, 522 U.S. at 146). “An ‘analytical gap’
    typically occurs as a result of ‘the failure by the expert witness to bridge the gap between
    [the expert’s] opinion and the empirical foundation on which the opinion was derived.’”
Matthews, 479 Md. at 317 (quoting Savage v. State, 455 Md. 138, 163 (2017)).
    30
    We observe that another seasoned trial judge, even while limiting Mr. McVeigh’s
    testimony more than we do here, was equally profuse in his laudatory comments about
    Mr. McVeigh’s integrity. See United States v. Medley, No. PWG-17-242 (D. Md. April
    24, 2019), ECF No. 111, at 14 (“Mr. McVeigh, who was, for an expert witness,
    . . . remarkably forthcoming in his testimony and credible.”); id. at 53-54 (“I’ve seldom
    seen an expert who is as sincere and straightforward and no baloney and genuine in what
    he did as Mr. McVeigh.”). Nothing about our opinion or our conclusion in this case should
    be understood as contradicting that sentiment.
    Although we do not preclude the possibility that the gap may be closed in the future,
    for the reasons already discussed, this case presents just such an analytical gap. That gap
    should have foreclosed Mr. McVeigh’s unqualified testimony that the crime scene bullets
    and bullet fragment were fired from Mr. Abruquah’s Taurus revolver. Although the court
    precluded Mr. McVeigh from testifying to his opinions to a “certainty,” an unqualified
    statement that the bullets were fired from Mr. Abruquah’s revolver is still more definitive
    than can be supported by the record. To be sure, the AFTE Theory is intended to allow
    firearms examiners to reach conclusions linking particular firearms to particular unknown
    bullets.    Mr. McVeigh’s testimony was thus not an unjustified departure from the
    methodology employed by those practicing in his field. We conclude, however, for reasons
    discussed above, that although the studies and other information in the record support the
    use of the AFTE Theory to reliably identify whether patterns and lines on bullets of
    unknown origin are consistent with those known to have been fired from a particular
    firearm, they do not support the use of that methodology to reliably opine without
    qualification that the bullets of unknown origin were fired from the particular firearm.
    H.      Accounting for Obvious Alternative Explanations
    The court found this factor “definitively weighs in favor of admission” because
    Mr. McVeigh and Dr. Hamby “clearly and concisely addressed how alternative
    interpretations of toolmarks are generally accounted for in the field of firearms
    identification,” and Mr. Abruquah’s “counters in this area were ineffective.” We disagree.
    For reasons already addressed, without the ability to examine other bullets fired from other
    firearms in the same production run as the firearm under examination, the record simply
    does not support that firearms identification can reliably eliminate all alternative sources
    so as to permit unqualified testimony of a match between a particular firearm and a
    particular crime scene bullet.
    I.     Level of Care
    Mr. McVeigh’s testimony here was given as part of his regular professional work,
    rendering this factor technically inapplicable. Nonetheless, to the extent this factor can be
    re-cast as a general inquiry into the level of care he exhibited, we have no qualms about
    accepting the circuit court’s determination that Mr. McVeigh is a “consummate
    professional in his field” and demonstrated a “level of care in this case” that was not
    “assailed in any convincing manner.”
    J.     Relationship Between Reliability of Methodology and Opinion to
    Be Offered
    Based on the State’s evidence concerning the reliability of firearms examinations
    and “a dearth of real-life examples of erroneous examinations,” the circuit court concluded
    that “firearm and toolmark evidence is known to reach reliable results” and, therefore, that
    this final factor favors admission of the evidence. We do not question that firearms
    identification is generally reliable, and can be helpful to a jury, in identifying whether
    patterns and markings on “unknown” bullets or cartridges are consistent or inconsistent
    with those on bullets or cartridges known to have been fired from a particular firearm. For
    that reason, to the extent Mr. Abruquah suggests that testimony about the consistency of
    such patterns and markings should be excluded, we disagree.31 It is also possible that
    experts who are asked the right questions or have the benefit of additional studies and data
    may be able to offer opinions that drill down further on the level of consistency exhibited
    by samples or the likelihood that two bullets or cartridges fired from different firearms
    might exhibit such consistency. However, based on the record here, and particularly the
    lack of evidence that study results are reflective of actual casework, firearms identification
    has not been shown to reach reliable results linking a particular unknown bullet to a
    particular known firearm.
    For those reasons, we conclude that the methodology of firearms identification
    presented to the circuit court did not provide a reliable basis for Mr. McVeigh’s unqualified
    opinion that four bullets and one bullet fragment found at the crime scene in this case were
    fired from Mr. Abruquah’s Taurus revolver. In effect, there was an analytical gap between
    the type of opinion firearms identification can reliably support and the opinion
    Mr. McVeigh offered.32 Accordingly, the circuit court abused its discretion in permitting
    Mr. McVeigh to offer that opinion.
    31
    As noted, Mr. Abruquah argues that the testimony of a firearms identification
    examiner should be limited to opining, “at most, that a firearm cannot be excluded as the
    source of the questioned projectile[.]” It is not entirely clear to us whether Mr. Abruquah
    believes that testimony about the consistency of patterns and markings on bullets would be
    permissible—and, indeed, necessary to establish the basis for an opinion that a firearm
    cannot be excluded—or whether he believes that testimony about the consistency of such
    patterns and markings goes too far and should be excluded. If the latter, we disagree for
    the reasons identified.
    32
    Both dissenting opinions contend that we have been insufficiently deferential to
    the trial court’s determination. Although they observe, quite correctly, that we do not ask
trial judges to play the role of “amateur scientists,” Dissenting Op. of Hotten, J. at 4
(quoting Rochkind, 471 Md. at 33-34); Dissenting Op. of Gould, J. at 1, 50, we also do not
provide increased deference simply because the subject matter of the expert testimony is
scientific. The forensic technique under review was, until relatively recently, accepted
almost entirely without critical analysis. See discussion above at 16-17. Daubert and
Rochkind demand more than adherence to an orthodoxy simply because it has long been
accepted or because of the number of impressive-sounding statistics generated by studies
that do not establish the reliability of the specific testimony offered. They require that the
party proffering such evidence, whatever type of evidence it is, establish that it meets a
minimum threshold of reliability.
    IV.    HARMLESS ERROR
    The State argues in the alternative that any error in admitting Mr. McVeigh’s
    testimony was harmless. We disagree.
    “The harmless error doctrine is grounded in the notion that a defendant has the right
to a fair trial, but not a perfect one.” State v. Jordan, 480 Md. 490, 505 (2022). The
doctrine is strictly limited to “error[s] in the trial process itself” that may warrant
reversal. Id. at 506 (quoting Weaver v. Massachusetts, 137 S. Ct. 1899, 1907 (2017)). For
an appellate court to conclude that the admission of expert testimony was harmless, the
State must show “beyond a reasonable doubt, that the error in no way influenced the
verdict.” Dionas, 436 Md. at 108 (quoting Dorsey, 276 Md. at 659).
    Upon our review of the record, we are not convinced beyond a reasonable doubt
    that the expert testimony in no way contributed to the guilty verdict. The firearm and
    toolmark identification evidence was the only direct evidence before the jury linking
    Mr. Abruquah’s gun to the crime. Absent that evidence, the guilty verdict rested upon
    circumstantial evidence of a dispute between the men, a witness who heard gunfire around
the time of the dispute, a firearm recovered from the residence, and testimony of a jailhouse informant. To be sure, that evidence is strong. But the burden of showing that an error
    was harmless is high and we cannot say, beyond a reasonable doubt, that the admission of
    the particular expert testimony at issue did not influence or contribute to the jury’s decision
to convict Mr. Abruquah. See Clemons v. State, 392 Md. 339, 372 (2006) (stating that "[l]ay jurors tend to give considerable weight to 'scientific' evidence when presented by 'experts' with impressive credentials" (quoting Reed v. State, 283 Md. 374, 386 (1978))).
    CONCLUSION
    Based on the evidence presented at the hearings, we hold that the circuit court did
    not abuse its discretion in ruling that Mr. McVeigh could testify about firearms
    identification generally, his examination of the bullets and bullet fragments found at the
    crime scene, his comparison of that evidence to bullets known to have been fired from
    Mr. Abruquah’s Taurus revolver, and whether the patterns and markings on the crime scene
    bullets are consistent or inconsistent with the patterns and markings on the known bullets.
    However, the circuit court should not have permitted the State’s expert witness to opine
    without qualification that the crime scene bullets were fired from Mr. Abruquah’s firearm.
    Because the court’s error was not harmless beyond a reasonable doubt, we will therefore
    reverse the circuit court’s ruling on Mr. Abruquah’s motion in limine, vacate
    Mr. Abruquah’s convictions, and remand for a new trial.
    RULING ON MOTION IN LIMINE
    CONCERNING EXPERT TESTIMONY
    REVERSED; JUDGMENT OF THE
CIRCUIT COURT FOR PRINCE
    GEORGE’S COUNTY VACATED; CASE
    REMANDED FOR A NEW TRIAL. COSTS
    TO BE PAID BY PRINCE GEORGE’S
    COUNTY.
    Circuit Court for Prince George’s County
    Case No. CT121375X
    Argued: October 4, 2022                             IN THE SUPREME COURT
    OF MARYLAND*
    No. 10
    September Term, 2022
    __________________________________
    KOBINA EBO ABRUQUAH
    v.
    STATE OF MARYLAND
    __________________________________
    Fader, C.J.,
    Watts,
    Hotten,
    Booth,
    Biran,
    Gould,
    Eaves,
    JJ.
    __________________________________
    Dissenting Opinion by Hotten, J., which
    Eaves, J., joins.
    __________________________________
    Filed: June 20, 2023
    *During the November 8, 2022 general election, the voters of Maryland ratified a
    constitutional amendment changing the name of the Court of Appeals to the Supreme Court
    of Maryland. The name change took effect on December 14, 2022.
    Respectfully, I dissent. I would hold that the Circuit Court for Prince George’s
    County did not abuse its discretion in admitting the State’s expert firearm and toolmark
    identification testimony and evidence, following its analysis and consideration of the
factors outlined in Rochkind v. Stevenson, 471 Md. 1, 236 A.3d 630 (2020). "When the basis of an expert's opinion is challenged pursuant to Maryland Rule 5-702, the review is abuse of discretion." Id. at 10, 236 A.3d at 636 (citation omitted); State v. Matthews, 479 Md. 278, 305, 277 A.3d 991, 1007 (2022) (citation omitted). We have declared it "the rare
    case in which a Maryland trial court’s exercise of discretion to admit or deny expert
    testimony will be overturned.” Matthews, 479 Md. at 286, 306, 277 A.3d at 996, 1008.
    This should not be one of those instances.
    The Circuit Court Did Not Abuse Its Discretion in Admitting the State’s Firearm
    Toolmark Identification Testimony Under Rochkind.
    In Rochkind, this Court abandoned the Frye-Reed standard in favor of the more
    “flexible” analysis set forth in Daubert v. Merrell Dow Pharmaceuticals, Inc., 
    509 U.S. 579
    , 
    113 S. Ct. 2786 (1993)
    , concerning the admissibility of expert testimony. Rochkind,
    471 Md. at 29, 34, 236 A.3d at 646, 650. Rochkind prescribes ten factors for trial judges
    to consider when applying Maryland Rule 5-702.1 See id. at 35, 236 A.3d at 650 (emphasis
    added). First, the trial court must consider the original five Daubert factors:
    (1) whether a theory or technique can be (and has been) tested;
    (2) whether a theory or technique has been subjected to peer review and
    publication;
    (3) whether a particular scientific technique has a known or potential rate of
    error;
    (4) the existence and maintenance of standards and controls; and
    (5) whether a theory or technique is generally accepted.
Id., 236 A.3d at 650 (quoting Daubert, 509 U.S. at 593–94, 113 S. Ct. 2786). Next, "courts
    have developed additional factors for determining whether expert testimony is sufficiently
    reliable[,]” including:
    (6) whether experts are proposing to testify about matters growing naturally
    and directly out of research they have conducted independent of the
    litigation, or whether they have developed their opinions expressly for
    purposes of testifying;
    (7) whether the expert has unjustifiably extrapolated from an accepted
    premise to an unfounded conclusion;
    (8) whether the expert has adequately accounted for obvious alternative
    explanations;
    (9) whether the expert is being as careful as he [or she] would be in his [or
    her] regular professional work outside his [or her] paid litigation consulting;
    and
    (10) whether the field of expertise claimed by the expert is known to reach
    reliable results for the type of opinion the expert would give.
Id. at 35–36, 236 A.3d at 650 (quoting Fed. R. Evid. 702 Advisory Committee Note).
1 Rule 5-702 pertains to the admissibility of expert testimony and provides, in full:
Expert testimony may be admitted, in the form of an opinion or otherwise, if the court determines that the testimony will assist the trier of fact to understand the evidence or to determine a fact in issue. In making that determination, the court shall determine[:] (1) whether the witness is qualified as an expert by knowledge, skill, experience, training, or education, (2) the appropriateness of the expert testimony on the particular subject, and (3) whether a sufficient factual basis exists to support the expert testimony.
"[S]ufficient factual basis" includes two subfactors: "(1) an adequate supply of data; and (2) a reliable methodology." Rochkind, 471 Md. at 22, 236 A.3d at 642 (citation omitted). Without either, the expert's testimony is considered to be mere "speculation or conjecture." Id., 236 A.3d at 642 (internal quotations and citation omitted).
    We adopted Rochkind to “refine” and “streamline the evaluation of scientific expert
    testimony under [Md.] Rule 5-702.” Id. at 30, 35, 236 A.3d at 647, 650. As a threshold
    matter, scientific testimony must be relevant and reliable. Id. at 14, 236 A.3d at 638
    (citation omitted). Rochkind provided more flexibility for the gatekeeping mechanism of
    ascertaining whether the expert evidence should be admitted in its analytical shift to a
    “reliability” standard (Daubert), as opposed to “general acceptance” (Frye-Reed). The
    Rochkind elements “provide guidance on how to determine if scientific reasoning is,
    indeed, sound, or a scientific theory adequately justifies an expert’s conclusion.” Id. at 33,
    236 A.3d at 649. “[A]ll of the Daubert factors are relevant to determining the reliability
    of expert testimony, yet no single factor is dispositive in the analysis. A trial court may
    apply some, all, or none of the factors depending on the particular expert testimony at
    issue.” Id. at 37, 236 A.3d at 651 (emphasis added) (citation omitted). As the U.S.
    Supreme Court recognized, “Daubert’s list of specific factors neither necessarily nor
    exclusively applies to all experts or in every case. Rather, the law grants a [trial] court the
    same broad latitude when it decides how to determine reliability as it enjoys in respect to
its ultimate reliability determination." Kumho Tire Co. v. Carmichael, 526 U.S. 137, 141–42, 119 S. Ct. 1167, 1171 (1999) (emphasis added) (citation omitted); Matthews, 479 Md. at 314, 277 A.3d at 1012 (quoting Kumho Tire, 526 U.S. at 141–42, 119 S. Ct. at 1171); Rochkind, 471 Md. at 37, 236 A.3d at 651 (quoting Kumho Tire, 526 U.S. at 141–42, 119 S. Ct. at 1171); Savage v. State, 455 Md. 138, 178, 166 A.3d 183, 206 (2017) ("[A] trial court is not required to consider any or all of the Daubert factors in making its reliability determination—they were 'meant to be helpful, not determinative.'" (Adkins, J., concurring) (quoting Kumho Tire, 526 U.S. at 151, 119 S. Ct. at 1175)). Trial judges,
    therefore, assume the critical role as “gatekeepers” against unreliable scientific evidence.
    Rochkind, 471 Md. at 38, 236 A.3d at 652; Matthews, 479 Md. at 322, 277 A.3d at 1017
(citation omitted); Daubert, 509 U.S. at 597, 113 S. Ct. at 2798; Gen. Elec. Co. v. Joiner, 522 U.S. 136, 142, 118 S. Ct. 512, 517 (1997); Kumho Tire, 526 U.S. at 147, 119 S. Ct. at 1174 (noting that Daubert's gatekeeping function also applies to expert testimony); Victor
    E. Schwartz, Expert Testimony Needs Judges to Act As “Gatekeepers”: The Maryland
    Court of Appeals Teaches Why, 13 J. Tort L. 229, 231 (2020).
    Trial judges were provided these factors to assist in the evidence-based management
of their judicial gatekeeping function in criminal, civil, and equitable causes. The
    gatekeeping function is significant, particularly for the ability of the finders of fact to
    evaluate the scientific evidence and testimony and determine whether it should be accepted
    or rejected in their ultimate determination. This Court has rejected the argument that judges
    are “amateur scientists[:]”
    [T]rial judges are not required to make a determination of the ultimate
    scientific validity of any scientific propositions. Instead, they need only
    make a much more limited inquiry: whether sufficient indicia of legitimacy
    exist to support the conclusion that evidence derived from the principle may
    be profitably considered by a fact finder at trial. We are confident that trial
    judges are duly capable of undertaking the reliability analysis absent
    scientific training.
    Rochkind, 471 Md. at 34, 236 A.3d at 649 (emphasis added) (internal quotations and
    citations omitted). “Applying these standards, we determine that the [circuit court’s]
decision in this case . . . was within its discretion and therefore lawful." Kumho Tire, 526 U.S. at 142, 119 S. Ct. at 1171.
    In light of the newly adopted Rochkind standard, the circuit court reconsidered the
    admissibility of the firearm and toolmark identification expert evidence, known as the
    AFTE methodology.2 The court examined the Rochkind factors “by way of the pleadings,
    testimony and evidence presented during the 5-day hearing conducted prior to the second
    jury trial, coupled with the supplemental hearing and pleadings conducted after the most
    recent remand.” The court found that the State’s evidence for factors one, two, three, five,
    six, eight, nine, and ten weighed in favor of admission. The evidence for factors four and
    seven weighed against admission. Based on the Rochkind factors and the “totality of the
    evidence and arguments presented,” the circuit court admitted the State’s firearms and
    toolmark examination evidence.
    As the majority notes, Mr. Abruquah argues on appeal that the State’s expert, Mr.
    McVeigh, should have been “limited to opining, ‘at most, that a firearm cannot be excluded
    as the source of the questioned projectile’” because the methodology is “insufficiently
    reliable” to support Mr. McVeigh’s testimony. Maj. Op. at 38. To determine this “tailored”
    issue of “whether the AFTE Theory can reliably support an unqualified opinion that a
    particular firearm is the source of one or more particular bullets[,]” the majority conducts
its own Rochkind analysis. Id. at 38–39.
2 The State's expert, Mr. Scott McVeigh, uses the "AFTE method" to "compar[e] microscopic markings on a bullet or cartridge case to make an 'identification,' i.e., to opine that a specific firearm is the source of a fired ammunition component."
The majority holds that factors one, six, and nine weigh in favor of admission. Id. at 39, 52–53, 55. Factors two, three, four, seven, eight, and ten, the majority concludes, weigh against admission. Id. at 40, 45, 50, 54–56. The majority notes that factor five is
    “neutral.” Id. at 52. Upon consideration of the factors, the majority determines that the
    record, “on balance,” does not support Mr. McVeigh’s “unqualified testimony that the
    crime scene bullets and bullet fragments were fired from Mr. Abruquah’s Taurus revolver.”
Id. at 45, 54. According to the majority, "the studies and other information in the record . . . do not support the use of [the AFTE Theory] to reliably opine without qualification that the bullets of unknown origin were fired from the particular firearm." Id. at 54. Specifically, "firearms identification has not been shown to reach reliable results linking a particular unknown bullet to a particular known firearm." Id. at 56. The majority,
    therefore, holds that “there was an analytical gap between the type of opinion firearms
    identification can reliably support and the opinion Mr. McVeigh offered.” Id. (footnote
    omitted). To the majority, this “gap should have foreclosed” Mr. McVeigh’s unqualified
    testimony. Id. at 54.
    I disagree, finding no error with the circuit court’s analysis. The concept of the
    “analytical gap” originated in Joiner, 
    522 U.S. 136
    , 
    118 S. Ct. 512
    . Rochkind, 471 Md. at
    43, 236 A.3d at 654 (Watts, J., dissenting). Over the years, it has become a “critical”
    component in Maryland’s evidentiary analysis. Id. at 14, 236 A.3d at 638. As we’ve
    explained, the role of the expert is to “connect[] the dots” or “provide[] a causal link”
    between the data and/or science used by the expert and the expert’s ultimate conclusions.
    See id. at 14–19, 236 A.3d at 638–40 (internal quotation marks and citations omitted). In
    essence, an “analytical gap” results when the expert fails to “‘bridge’” the gap between the
    expert’s opinion and “‘the empirical foundation on which the opinion was derived.’”
Matthews, 479 Md. at 317, 277 A.3d at 1014 (quoting Savage, 455 Md. at 163, 166 A.3d at 198). In determining reliability, the trial judge "must also consider the relationship
    between the methodology applied and conclusion reached.” Rochkind, 471 Md. at 36, 236
    A.3d at 651. However, neither Daubert nor the Federal Rules of Evidence require trial
    judges “‘to admit opinion evidence that is connected to existing data only by the ipse dixit
    of the expert.’” Id., 236 A.3d at 651 (quoting Joiner, 
    522 U.S. at 144
    , 
    118 S. Ct. at 519
    ).
    “[T]he question then becomes: is this specific causation case . . . , where the analytical gap
    was too vast, or a [] case where the analytical gap was sufficiently bridged?” Id. at 25, 236
    A.3d at 644.
    The circuit court thoroughly followed the Rochkind factors as prescribed by the U.S.
    Supreme Court and this Court. Under the previous Frye-Reed standard, trial courts
    determined whether an expert’s methodology was “generally accepted in the scientific
    community.” Id. at 12–13, 236 A.3d at 637; Matthews, 479 Md. at 307, 277 A.3d at 1008
    (“[P]rior to . . . Daubert [], the Frye ‘general acceptance’ test was the dominant standard
    that courts used to determine the admissibility of novel scientific evidence.”). “Under
    Daubert, judges are charged with gauging only the threshold reliability—not the ultimate
    validity—of a particular methodology or theory.” Rochkind, 471 Md. at 33, 236 A.3d at
    649 (emphasis added). In conducting the “reliability assessment,” courts are to consider
    the non-exhaustive list of factors provided in Daubert. Matthews, 479 Md. at 307, 277
A.3d at 1008 (citing Daubert, 509 U.S. at 593–94, 113 S. Ct. at 2797). Trial courts are
    granted “broad latitude to determine[]” “reliability in a particular case[.]” Kumho Tire,
    
    526 U.S. at 153
    , 
    119 S. Ct. at 1176
    ; Rochkind, 471 Md. at 37, 236 A.3d at 651 (quoting
    Kumho Tire, 
    526 U.S. at
    141–42, 
    119 S. Ct. at 1171
    ). Thus, if “a trial court is satisfied that
    an expert has applied a reliable methodology to an adequate supply of data, the court should
    not exclude the expert’s testimony merely because the court is concerned that the expert’s
    particular conclusions may be inaccurate.” Matthews, 479 Md. at 316, 277 A.3d at 1013.
    While Rochkind requires trial judges to conduct an analysis under the Rochkind-
    Daubert factors, this Court does not require trial judges to be “scientists” and arrive at a
    conclusion with some measure of mathematical certainty. See Rochkind, 471 Md. at 33,
236 A.3d at 649; Daubert, 509 U.S. at 597, 113 S. Ct. at 2799 ("[T]he Rules of Evidence . . . do assign to the trial judge the task of ensuring that an expert's testimony both rests on
    a reliable foundation and is relevant to the task at hand. Pertinent evidence based on
    scientifically valid principles will satisfy those demands.”). As the U.S. Supreme Court
    observed, “there are no certainties in science.” Daubert, 
    509 U.S. at 590
    , 
    113 S. Ct. at 2795
     (citation omitted). Accordingly, a trial judge’s gatekeeping role isn’t to determine
    whether the expert is “right” or “wrong;” rather, the judge’s role is to determine whether
    the expert’s testimony is “adequately grounded in reliable and sound science, and that there
    is not ‘too great an analytical gap’ between the expert’s methodology and conclusions.”
    Schwartz, supra, at 233 (quoting Rochkind, 471 Md. at 36, 236 A.3d at 651); Maj. Op. at
    9. This is exactly what the circuit court did when it recognized Mr. Abruquah’s “Herculean
    effort” in demonstrating “why the evidence should be heavily scrutinized, questioned and
    potentially impeached[.]”      While the court did not, however, expressly address an
    “analytical gap,” it observed that “the crux” of Mr. Abruquah’s arguments “address
    impeachment rather than admissibility.”
    In Rochkind, we rejected the argument that the Rochkind-Daubert standard
    “enable[d] judges to . . . ‘usurp[] the role of juries.’” Rochkind, 471 Md. at 33, 236 A.3d
    at 649. The power to weigh the validity of the evidence still sits with the jury or fact finder.
    Id., 236 A.3d at 649. While I recognize the importance of “published standards,” it is for
    the jury or the factfinder to determine the validity of the methodology of the firearm
    identification testimony presented—not the circuit court or this Court. Maj. Op. at 50;
    Rochkind, 471 Md. at 33, 236 A.3d at 649. We reaffirmed this principle a year later in
    State v. Matthews, acknowledging that “[t]he unknown degree of uncertainty concerning
    the accuracy of [the expert testimony] went to the weight the jury should give to the expert
    testimony, not to its admissibility.” 479 Md. at 313, 277 A.3d at 1012 (footnote omitted).
    Here, the circuit court continuously reaffirmed this notion, stating that Mr. Abruquah’s
    critiques of the firearm identification evidence “are more suited to the weight such evidence
    should be given at trial.” (Emphasis added). In doing so, it fulfilled its obligation under
    Rochkind.
    The majority notes that Mr. McVeigh’s testimony “was the only direct evidence
    before the jury linking Mr. Abruquah’s gun to the crime.” Maj. Op. at 57. “Absent that
    evidence,” the majority observes that Mr. Abruquah’s “guilty verdict rested upon
    circumstantial evidence[.]” Id. at 57. Accordingly, the majority concludes that the
    admission of such testimony was not harmless because it “cannot say, beyond a reasonable
    doubt, that the admission of the particular expert testimony at issue did not influence or
    contribute to the jury’s decision to convict Mr. Abruquah.” Id. at 58. This is especially
    so, the majority recognizes, because “[l]ay jurors tend to give considerable weight to
    ‘scientific evidence’ when presented by ‘experts’ with impressive credentials[.]” Id.
(quoting Clemons v. State, 392 Md. 339, 372, 896 A.2d 1059, 1078 (2006) (internal quotations and citation omitted)).
    Assuming, arguendo, that the circuit court erred in admitting Mr. McVeigh’s
    testimony, such error was harmless considering the overwhelming circumstantial and
    direct evidence of guilt tying Mr. Abruquah to the shooting. As the majority notes, the
    responding officers left Mr. Abruquah and Mr. Aguirre-Herrera after their third response
    to the men’s shared residence around 12:15 a.m. According to the officers, Mr. Aguirre-
    Herrera appeared to be terrified of Mr. Abruquah. A nearby witness testified that he heard
    gunshots between 11:30 p.m. and 12:30 a.m. During questioning, Mr. Abruquah told
    police where to find his firearms in the men’s shared residence, including Mr. Abruquah’s
    Taurus .38 Special revolver. The State also introduced into evidence the transcript of the
    testimony of Cecil Muhammed, Mr. Abruquah’s jail cellmate. Mr. Muhammed testified
    that, while they were incarcerated together, Mr. Abruquah confessed to shooting Mr.
Aguirre-Herrera with his Taurus .38 on the night in question. According to Mr.
    Muhammed, Mr. Abruquah and Mr. Aguirre-Herrera were in a relationship, but Mr.
    Aguirre-Herrera engaged in prostitution through Craigslist and conducted such business in
    the men’s shared residence. Mr. Muhammed testified that this allegedly enraged Mr.
    Abruquah and made him jealous. We recognize this evidence is circumstantial; yet, as the
majority itself observes, it is "strong." Maj. Op. at 58.
    Trial judges provide jury instructions “to aid the jury in clearly understanding the
    case, to provide guidance for the jury’s deliberations, and to help the jury arrive at a correct
    verdict.” Stabb v. State, 
    423 Md. 454
    , 464, 
    31 A.3d 922
    , 928 (2011) (internal quotation
    marks and citation omitted). As such, trial judges instruct juries that “[a] conviction may
    rest on circumstantial evidence alone, on direct evidence alone, or on a combination of
    circumstantial and direct evidence.” Taylor v. State, 
    473 Md. 205
    , 218 n.8, 
    249 A.3d 810
    ,
    818 n.8 (2021) (emphasis added) (internal quotation marks and citation omitted). “The law
    makes no distinction between the weight to be given to either direct or circumstantial
    evidence.” Maryland Criminal Pattern Jury Instructions (“MPJI-Cr”) 3:01 (Maryland State
    Bar Association 2d ed. 2022). “‘Circumstantial evidence may support a conviction if the
    circumstances, taken together, do not require the trier of fact to resort to speculation or
    conjecture . . . . It must afford the basis for an inference of guilt beyond a reasonable
    doubt.’” Beckwitt v. State, 
    477 Md. 398
    , 429, 
    270 A.3d 307
    , 325 (2022) (quoting Smith v.
    State, 
    415 Md. 174
    , 185, 
    999 A.2d 986
    , 992 (2010)). Accordingly, even in a case relying
    solely on circumstantial evidence, “the finder of fact has the ‘ability to choose among
    differing inferences that might possibly be made from a factual situation[.]’” Smith, 
    415 Md. at 183
    , 
    999 A.2d at 991
     (quoting State v. Smith, 
    374 Md. 527
    , 534, 
    823 A.2d 664
    , 668
    (2003)).
    “[A] fundamental principle underlying trial by jury is that the credibility of a witness
    and the weight to be accorded the witness’ testimony are solely within the province of the
    jury.” Fallin v. State, 
    460 Md. 130
    , 154, 
    188 A.3d 988
    , 1002 (2018) (internal quotation
    marks and citation omitted); see also MPJI-Cr 3:10. MPJI-Cr 3:14 provides, in part, that
    jurors should:
    [C]onsider an expert’s testimony together with all the other evidence. . . .
    You should give expert testimony the weight and value you believe it should
    have. You are not required to accept an expert’s testimony, even if it is
    uncontradicted. As with any other witness, you may believe all, part, or none
    of the testimony of any expert.
    (Emphasis added). In adopting Daubert, we reiterated this notion, affirming that “juries
    will continue to weigh competing, but still reliable, testimony.” Rochkind, 471 Md. at 33,
    236 A.3d at 649. We, therefore, cannot hold that the firearm evidence did or “did not
    influence or contribute to the jury’s decision[.]” Maj. Op. at 58; Stokes v. State, 
    379 Md. 618
    , 638, 
    843 A.2d 64
    , 75 (2004) (“Jury deliberations are private and are to be conducted
    in secret.” (citation omitted)). “We need not decide whether the jury could have drawn
    other inferences from the evidence, refused to draw inferences, or whether we would have
    drawn different inferences from the evidence.” Smith, 
    415 Md. at 184
    , 
    999 A.2d at 991
    (citation omitted). The jury could have based its verdict upon a weighing of all the
    evidence, including the scientific and circumstantial evidence. See MPJI-Cr 3:14; see also
Howling v. State, 478 Md. 472, 507, 274 A.3d 1124, 1144 (2022) ("[O]ur concern is only whether the verdict was supported by sufficient evidence, direct or circumstantial, which could fairly convince a trier of fact of the defendant's guilt of the offenses charged beyond a reasonable doubt." (internal quotation marks and citations omitted)); State v. Manion, 442 Md. 419, 437, 112 A.3d 506, 517 (2015) (recognizing that "a rational trier of fact could conclude, beyond a reasonable doubt, that" the defendant intended to commit a crime based on circumstantial evidence). Concluding otherwise, as the majority does here, minimizes the
    importance of both the role of the jury and jury instructions if we expect juries to believe
    that direct evidence, especially if it’s scientific evidence, provides any more persuasive
value than circumstantial evidence. See MPJI-Cr 3:01; see also Taylor, 473 Md. at 218 n.8, 249 A.3d at 818 n.8 ("No greater degree of certainty is required when the evidence is
    circumstantial than when it is direct.” (internal quotation marks and citation omitted)).
    The circuit court followed Rochkind within the letter of the law as prescribed. See
Matthews, 479 Md. at 305, 277 A.3d at 1007 (citing Jenkins v. State, 375 Md. 284, 296, 825 A.2d 1008, 1015 (2003)). Based upon the extensive hearings, pleadings, testimony,
    and evidence presented, the circuit court was satisfied that the State met its burden to admit
    the firearm identification expert evidence for consideration by the jury.3 Its decision was
    neither “well removed from any center mark” nor “beyond the fringe of what [this] [C]ourt
    deems minimally acceptable.” 
    Id.,
     277 A.3d at 1007 (internal quotation marks and citation
    omitted). The majority considers this “standard” to be “somewhat unfair.” Maj. Op. at 6
    n.5. While it observes that “the circuit court acted deliberately and thoughtfully in
    approaching, analyzing, and resolving the question before it[,]” the majority nonetheless
    “c[a]me to a different conclusion concerning the outer bounds of what is acceptable expert
    evidence in this area[]” and provides no guidance for the trial court in terms of what
    standard applies. Id. at 6–7 n.5. This Court does not “reverse simply because [we] would
    not have made the same ruling.” Devincentz v. State, 
    460 Md. 518
    , 550, 
    191 A.3d 373
    ,
    391 (2018) (emphasis added) (internal quotations and citation omitted). A reasonable
    person would take the view adopted by the circuit court here. Williams v. State, 
    457 Md. 551
    , 563, 
    179 A.3d 1006
    , 1013 (2018). It was, therefore, within the realm of the jury, as
    the triers of fact, to resolve the firearm toolmark analysis and opinion, along with the other
evidence presented, in rendering its verdict. See MPJI-Cr 3:14.
3 The majority's analysis is largely predicated on a consideration of the Daubert factors. Maj. Op. at 39-56. While the majority recognizes that the factors "are neither exhaustive nor mandatory," its rationale seems to suggest otherwise. Id. at 39 (internal quotation marks and citation omitted). As we expressed, the list of Daubert factors is not exhaustive or mandatory. Kumho Tire, 526 U.S. at 141, 119 S. Ct. at 1171; Rochkind, 471 Md. at 36–37, 236 A.3d at 651 (internal quotation marks and citation omitted). The circuit court had "broad latitude" to consider how to determine the reliability of the State's firearm and toolmark identification expert evidence. Kumho Tire, 526 U.S. at 141, 119 S. Ct. at 1171. In addition to its Daubert analysis, the circuit court considered other aspects of the case expressed herein in making its ultimate reliability determination. It was within the court's discretion and capacity to do so as the gatekeeper. Id., 119 S. Ct. at 1171; Rochkind, 471 Md. at 36–37, 236 A.3d at 651 (internal quotation marks and citation omitted). "For these [] reasons taken together, it concluded that [the] testimony was" admissible. Id. at 156, 119 S. Ct. at 1178.
    The majority’s holding blurs the role of the trial judge, allowing judges to “exclude
    . . . legitimate opinions of experts[] that [] are for a jury to weigh credibility.” Rochkind,
471 Md. at 33, 236 A.3d at 649. The majority appears to conflate the role of the trial judge as gatekeeper with the evaluation of the science or the expert opinion that is presented for
    consideration of its admissibility by the judge. That is not what Rochkind required. At the
    time of Rochkind, we did not “foresee th[is] gloomy outlook.” Id., 236 A.3d at 649.
    However, the majority’s decision does exactly that.
    CONCLUSION
    For these reasons, I respectfully dissent and would affirm the judgment of the
    Circuit Court for Prince George’s County. Justice Eaves has authorized me to state that
    she joins in this opinion.
    Circuit Court for Prince George’s County
    Case No. CT121375X
    Argued: October 4, 2022                             IN THE SUPREME COURT
    OF MARYLAND*
    No. 10
    September Term, 2022
    __________________________________
    KOBINA EBO ABRUQUAH
    v.
    STATE OF MARYLAND
    __________________________________
    Fader, C.J.,
    Watts,
    Hotten,
    Booth,
    Biran,
    Gould,
    Eaves,
    JJ.
    __________________________________
    Dissenting Opinion by Gould, J.
    __________________________________
    Filed: June 20, 2023
    *During the November 8, 2022 general election, the voters of Maryland ratified a
    constitutional amendment changing the name of the Court of Appeals to the Supreme Court
    of Maryland. The name change took effect on December 14, 2022.
In Rochkind v. Stevenson, 471 Md. 1 (2020), this Court adopted the Daubert1
    framework for the admission of expert testimony and embraced certain important
    principles. Justice Hotten highlights these principles in her dissent, some of which I
    reiterate here for context. Dissenting Op. of Justice Hotten 1-4.
    First, the Daubert-Rochkind factors (“Daubert factors”) provide trial courts with a
    flexible guide—not a mandatory scoresheet—for serving as gatekeepers with respect to
scientific or technical evidence. See, e.g., Rochkind, 471 Md. at 36-37; Daubert, 509 U.S. at 589, 596.
    Second, trial courts are not tasked with determining “the ultimate scientific validity
    of any scientific propositions.” Rochkind, 471 Md. at 34 (quoting State v. Porter, 
    698 A.2d 739
    , 757 (Conn. 1997)). Instead, the trial court’s duty is far more modest—to determine
    only “whether sufficient indicia of legitimacy exist to support the conclusion that evidence
    derived from the principle may be profitably considered by a fact finder at trial.” 
    Id.
     As
    Justice Hotten reminds us, this Court emphasized that “[w]e are confident that trial judges
    are duly capable of undertaking the reliability analysis absent scientific training.”
    Dissenting Op. of Justice Hotten 4-5 (quoting Rochkind, 471 Md. at 34). We do not ask
    judges to be “amateur scientists.” Id. at 4 (quoting Rochkind, 471 Md. at 33).
    Third, we apply an abuse of discretion standard of review to the trial court’s
    admission or exclusion of expert testimony. Rochkind, 471 Md. at 37. This deferential
    posture is inextricably linked to our expectation that trial judges need only serve as
1 Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993).
    gatekeepers: that is, we expect trial judges to make reasonable decisions based on the
    evidence presented, not to become experts in their own right. If a trial court’s decision is
    supported by competent evidence and untainted by a mistake of law, we defer to its
determination even if we would have reached a different conclusion. See State v. Matthews, 479 Md. 278, 305 (2022) (quoting Devincentz v. State, 460 Md. 518, 550 (2018) (we do "not reverse simply because . . . [we] would not have made the same ruling")); Id.
    at 306 (“[I]t is still the rare case in which a Maryland trial court’s exercise of discretion to
    admit or deny expert testimony will be overturned.”).
    This Court has articulated the abuse of discretion standard in several ways. We have
    held that an abuse of discretion occurs “where no reasonable person would take the view
    adopted by the circuit court,” Williams v. State, 
    457 Md. 551
    , 563 (2018); “when a trial
    judge exercises discretion in an arbitrary or capricious manner or when he or she acts
    beyond the letter or reason of the law[,]” Jenkins v. State, 
    375 Md. 284
    , 295-96 (2003); and
    when “the trial court’s decision [is] ‘well removed from any center mark imagined by the
    reviewing court and beyond the fringe of what that court deems minimally acceptable[,]’”
    Devincentz, 
    460 Md. at 550
     (quotation omitted).           We reiterated these standards in
    Matthews. 479 Md. at 305-06.
    Although the Majority acknowledges the abuse of discretion standard, it suggests
    that its application here is unfair “in the absence of additional caselaw from this Court
    implementing the newly adopted standard[.]” Maj. Op. 6 n.5. The Majority thus sidesteps
    the deferential standard of review by recasting its decision as establishing the “outer bounds
    of what is acceptable expert evidence in this area.” See Maj. Op. 7 n.5.
    This misses the mark. First, the Majority does not in practice establish any
    boundaries for the admission of forensic firearms evidence. Second, though this Court has
    not yet evaluated this type of evidence under Daubert, the Majority’s disagreement with
    the trial court does not arise from a lack of judicial guidance. To the contrary, there is no
    shortage of federal cases applying the Daubert factors to determine the reliability of the
    Association of Firearm and Toolmark Examiners (“AFTE”) Theory of Identification (the
    “AFTE Theory”),2 see Maj. Op. 14 n.8, some of which expressly considered the 2016
    report issued by the President’s Council of Advisors on Science and Technology on the
    scientific validity of forensic techniques, including firearms identification.3 At bottom, the
    Majority simply disagrees with the trial court’s application of the Daubert factors and its
interpretation of the evidence—a classic de novo review.
2 See, e.g., United States v. Brown, 973 F.3d 667, 702-04 (7th Cir. 2020) (affirming admission of expert testimony that cartridge cases found in different locations matched and acknowledging PCAST findings); United States v. Johnson, 875 F.3d 1265, 1280-81 (9th Cir. 2017) (affirming admission of expert testimony that matched a bullet recovered from crime scene to defendant's pistol and acknowledging 2009 NAS report's criticisms of the AFTE Theory).
3 See EXECUTIVE OFFICE OF THE PRESIDENT, PRESIDENT'S COUNCIL OF ADVISORS ON SCIENCE AND TECHNOLOGY, REPORT TO THE PRESIDENT, FORENSIC SCIENCE IN CRIMINAL COURTS: ENSURING SCIENTIFIC VALIDITY OF FEATURE-COMPARISON METHODS (Sept. 2016) ("PCAST" or the "PCAST Report"), available at https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf (last accessed June 12, 2023), archived at https://perma.cc/3QWJ-2DGR.
That the Majority thinks the abuse of discretion standard is unfair in this context does not
    justify setting it aside and applying what is, in practice if not in name, a de novo standard
    of review. If the abuse of discretion standard is not appropriate here, then we should
    reconsider whether that standard is appropriate for reviewing Daubert decisions. But we
    do not serve well the parties and trial judges who apply our decisions if we inconsistently
    apply the standards of review to a trial court’s discretionary ruling.
    Usually, when we hold that a trial court abuses its discretion, we identify what it did
wrong and explain how to do it properly going forward. See, e.g., State v. Robertson, 463 Md. 342, 365 (2019) (explaining the trial court's error underlying abuse of discretion holding and correcting the mistake for future cases); State v. Heath, 464 Md. 445, 462-65 (2019) (holding that the trial court abused its discretion, and explaining how the abuse
    occurred and how it could be avoided in future cases). Not so today. Though the Majority
    cabins its analysis to the record here and acknowledges that trial courts may consider other
    studies in future cases, Maj. Op. 9-10 & 10 n.6, the Majority fails to instruct trial courts
    how to determine the levels at which the accuracy, repeatability, reproducibility, and
    inconclusive determination rates of firearm identification would be sufficiently reliable for
    the evidence to be “profitably considered by a fact finder at trial,” Rochkind, 471 Md. at
    34. From the Majority’s opinion today, trial courts can only glean that these metrics, based
    on the studies discussed by the Majority, fail to establish reliability. The Majority’s
    opinion leaves trial courts rudderless at sea in evaluating this type of evidence henceforth.
    As discussed below, the focus of our inquiry should not be the reliability of the
    AFTE Theory in general, but rather the reliability of conclusive determinations produced
    when the AFTE Theory is applied. Of course, an examiner applying the AFTE Theory
    might be unable to declare a match (“identification”) or a non-match (“elimination”),
    resulting in an inconclusive determination. But that’s not our concern. Rather, our concern
    is this: when the examiner does declare an identification or elimination, we want to know
    how reliable that determination is. The record shows that conclusive determinations of
    either kind (identification or elimination) are highly reliable. So, given the record before
    it, the trial court here made a ruling well within the bounds of its considerable discretion.
    I join Justice Hotten’s dissent but write separately to explain how the evidence at
    the center of the Majority’s analysis was sufficient to support the trial court’s admission of
    Scott McVeigh’s unqualified opinion that bullets recovered from the murder scene were
    fired from Mr. Abruquah’s Taurus revolver.4 In so doing, I assume familiarity with the
defined terms and discussion of the various studies (Ames I and II,5 in particular) in the Majority's opinion.
4 I join Justice Hotten's dissent on the issue of whether the trial court abused its discretion in admitting Mr. McVeigh's unqualified opinion, but not as to harmless error. In that regard, I agree with the Majority that, to the extent that Mr. McVeigh's opinion was inadmissible, the error would not be harmless.
5 See generally David P. Baldwin, et al., A Study of False-Positive and False Negative Error Rates in Cartridge Case Comparisons, U.S. DEP'T OF ENERGY (2014) ("Ames I"); Stanley J. Bajic, et al., Report: Validation Study of the Accuracy, Repeatability, and Reproducibility of Firearm Comparisons, U.S. DEP'T OF ENERGY 1-2 (2020) ("Ames II").
    I.
    PCAST
    Before delving into the results of Ames I and Ames II, the two studies that garnered
    the lion’s share of the Majority’s attention, we should recognize that the trial court was
    presented with at least three other studies also supporting the conclusion that the AFTE
Theory could reliably link bullets to specific guns.6
6 Tasha P. Smith, et al., A Validation Study of Bullet and Cartridge Case Comparisons Using Samples Representative of Actual Casework, 61 J. FORENSIC SCIS. 939 (May 2016) ("Validation Study"); James E. Hamby, et al., A Worldwide Study of Bullets Fired From 10 Consecutively Rifled 9MM RUGER Pistol Barrels – Analysis of Examiner Error Rate, 64 J. FORENSIC SCIS. 551 (Mar. 2019) ("Worldwide Study"); Jamie A. Smith, Beretta barrel fired bullet Validation Study, 66 J. FORENSIC SCIS. 547 (Oct. 2020) ("Bullet Validation Study").
    The Majority, however, finds limited value in all studies but Ames I and II. In
    discounting these other studies, the Majority relies heavily on criticisms made by the
    PCAST Report.7 PCAST concluded that the foundational validity, and thus reliability, of
    subjective forensic feature-comparison methods such as the AFTE Theory “can only be
    established through multiple independent black box studies[.]” PCAST Report at 106. At
    that time, according to PCAST, the only appropriately designed black box study of firearms
    examination was Ames I.8 Id. at 111. PCAST concluded that, though Ames I supported
    the reliability of the AFTE Theory, the available evidence at the time “[fell] short of the
    scientific criteria for foundational validity.” Id. at 111. According to PCAST, more studies
were needed. Id.
7 These "set-based" studies can be described variously as "within-set," "closed-set," or "set-to-set comparison" studies. The distinguishing characteristic of such studies is that determinations are not independent of each other; an examiner's determination for a sample changes the likelihood of a correct response for a subsequent sample. For this reason, PCAST discounts these studies. PCAST Report at 106-109.
8 Black box studies "measure the accuracy outcomes absent information on how they are reached." Lucas Zarwell and Gregory Dutton, The History and Legacy of the Latent Fingerprint Black Box Study, NAT'L INST. OF JUST. 1 (Dec. 2022), available at https://nij.ojp.gov/topics/articles/history-and-legacy-latent-fingerprint-black-box-study (last accessed June 12, 2023), archived at https://perma.cc/MMS5-3S4P. Accordingly, black box studies are often used to assess the accuracy of subjective methods. Here, a black box study can measure the accuracy of the AFTE Theory without investigating how examiners arrive at conclusions, instead measuring only whether the method produces accurate outcomes.
    “Foundational validity,” as defined by PCAST, however, is not the legal standard
    by which we evaluate expert testimony under Daubert. PCAST itself acknowledges this
    distinction. PCAST Report at 4 (“Judges’ decisions about the admissibility of scientific
    evidence rest solely on legal standards; they are exclusively the province of the courts and
    PCAST does not opine on them.”). Moreover, PCAST apparently created the term
    “foundational validity” as used in this context; the term began to appear in court opinions
    only after PCAST was published.9 And the requirements for foundational validity were
    developed by PCAST.
    The trial judge was not required to credit PCAST’s notion of foundational validity
    at all, let alone apply it strictly. What’s more, the trial judge was presented with evidence
    expressly challenging positions asserted by PCAST. Specifically, the record included a
    statement by the United States Department of Justice (“DOJ”) sharply disagreeing with
    PCAST in important respects.10 United States Department of Justice Statement on the
    PCAST Report: Forensic Science in Criminal Courts: Ensuring Scientific Validity of
Feature-Comparison Methods (Jan. 13, 2021), available at https://www.justice.gov/olp/page/file/1352496/download ("DOJ Statement") (last accessed June 12, 2023).
9 A search of Westlaw for "Daubert" and "foundational validity" returns no cases from before October 2016.
10 The record also included a response to the DOJ Statement from the Innocence Project.
    Among other things, the DOJ Statement forcefully disagreed with PCAST’s
    conclusion that only “appropriately designed” black box studies could be used to validate
    a scientific method. Id. at 10-12. Although the DOJ did not object to the individual criteria
    that PCAST deemed necessary for such a study, the DOJ disagreed with the rigidity of
    PCAST’s approach to establishing scientific validity. Id. at 11. The DOJ observed that
    “PCAST failed to cite a single authority that supports its sweeping claim that the collective
    and non-severable application of all of these experimental design requirements in multiple
    black box studies is the sine qua non for establishing the scientific validity of forensic
    ‘feature comparison’ methods.” Id. The DOJ also observed that the authorities relied upon
    by PCAST instead supported the proposition “that no single experimental design is either
    essential or required.” Id.; see also 1 DAVID L. FAIGMAN ET AL., MODERN SCIENTIFIC
    EVIDENCE: THE LAW AND SCIENCE OF EXPERT TESTIMONY 66 (2018-2019 ed. 2018)
    (“There is no single way to conduct research to answer a particular question and research
    programs rarely answer factual questions definitively. Thus, there are no ‘perfect’ studies
    or ‘final’ answers in science.”).
    In treating PCAST as near gospel, the Majority applies non-legal and overly
    demanding requirements to what should be, at its core, a “screening” exercise. See
Daubert, 509 U.S. at 589, 596 (describing a "screening role" for the trial judge). In doing
    so, the Majority discounts to the point of irrelevance a substantial body of useful, if
    imperfect or incomplete, information from which the trial court reasonably concluded that
    the method used by Mr. McVeigh was reliable.
    II.
    A.
    A Hypothetical
    To set the stage for showing that the Ames I and Ames II studies can be understood
    to support the trial court’s ruling, consider the following thought experiment. Suppose you
    know nothing about firearms. You know nothing about the basic principles of forensic
    pattern comparison methods in general or the AFTE Theory in particular. You have never
    touched a gun or bullet, let alone examined one. You have never been in a crime lab. Now,
    you are tested on your ability to determine whether a particular bullet was fired from a
    particular gun.
    The test administrator fires two bullets from each of 10 consecutively manufactured
    handguns. The administrator then gives you two sets of 10 bullets each. One set consists
    of 10 “unknown” bullets—where the source of the bullet is unknown to the examiner—
    and the other set consists of 10 “known” bullets—where the source of the bullet is known.
    You are given unfettered access to a sophisticated crime lab, with the tools, supplies, and
    equipment necessary to conduct a forensic examination. And, like the vocabulary tests
    from grade school requiring you to match words with pictures, you must match each of the
    10 unknown bullets to the 10 known bullets.
    Even though you know that each of the unknowns can be matched with exactly one
    of the knowns, you probably wouldn’t know where to begin. If you had to resort to
    guessing, your odds of correctly matching the 10 unknown bullets to the 10 knowns would
    be one out of 3,628,800.11 Even if you correctly matched five unknown bullets to five
    known bullets and guessed on the remaining five unknowns, your odds of matching the
remaining unknowns correctly would be one out of 120.12 Not very promising.
11 With 10 unknown bullets and 10 known bullets, the odds of guessing the first pair correctly are one out of 10. And if you get the first right, the odds of getting the second right are one out of nine. If you get the first two right, the odds of getting the third right are one out of eight, and so on. Thus, the odds of matching each unknown bullet to the correct known is represented by the following calculation: (1/10) x (1/9) x (1/8) x (1/7) x (1/6) x (1/5) x (1/4) x (1/3) x (1/2) x (1/1).
12 (1/5) x (1/4) x (1/3) x (1/2) x (1/1).
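The probabilities in notes 11 and 12 follow directly from the factorial calculations described there. The following short Python sketch is purely illustrative and not part of the record; it simply reproduces those two figures.

```python
from math import factorial
from fractions import Fraction

# Note 11: guessing all 10 unknown-to-known pairings correctly is
# (1/10) x (1/9) x ... x (1/1), i.e., one over 10 factorial.
odds_all_ten = Fraction(1, factorial(10))
print(odds_all_ten)      # 1/3628800

# Note 12: guessing the last five pairings after five correct matches
# is (1/5) x (1/4) x (1/3) x (1/2) x (1/1), i.e., one over 5 factorial.
odds_last_five = Fraction(1, factorial(5))
print(odds_last_five)    # 1/120
```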
    The closed-set and semi-closed-set studies before the trial court—the studies which
    PCAST discounted—show that if you were to properly apply the AFTE Theory, you would
    be very likely to match correctly each of the 10 unknowns to the corresponding knowns.
    See Validation Study; Worldwide Study; Bullet Validation Study.
    Your odds would thus improve from virtually zero (one in 3,628,800) to 100
    percent. Yet according to PCAST, those studies provide no support for the scientific
    validity of the AFTE Theory. PCAST reasons that, in set-based studies, examiners can
    rely on the process of elimination, aided by deductive reasoning. Thus, by affording
    examiners a decisional crutch, PCAST reasons, such studies likely underestimate error
    rates in actual casework.
    Now let’s assume you take a different type of test, one designed in the image of
    Ames I and Ames II. This time, the administrator gives you 30 sets of bullets, with three
    bullets in each set. Within each set of three bullets, two are identified as having been fired
    from the same gun. Your task is to determine whether the third bullet, the unknown bullet,
    was also fired from the same gun. The administrator, of course, knows the correct answer
    (the “ground truth”). In contrast to the set-based studies, however, you know nothing about
    the source(s) of bullets and the relationship between knowns and unknowns. Thus, your
    answers for each set are independent of each other.
    Assume again that you know nothing about guns or the AFTE Theory. You might
    as well guess or flip a coin to determine if there is a match between the unknown and two
    knowns, which means that you can expect to answer, on average, 15 out of 30 sets correctly.
    But now assume that you are properly trained in the AFTE Theory. You examine
    each of the 30 sets. Suppose you determine that 10 sets lack sufficient information to make
    a conclusive determination of identification or elimination, so you mark 10 sets as
    inconclusive,13 but you reach conclusive determinations for the remaining 20 sets. The
    results of Ames I and II indicate a high likelihood that all 20 of those determinations would
be correct.
13 "[A] finding of inconclusive is an appropriate answer" if "the examiner does not find sufficient matching detail to uniquely identify a common source for the known and questioned samples, and there are no class characteristics such as caliber that would preclude the cases as having been fired from the same-source firearm[.]" Ames I at 6.
    B.
    The Treatment of Inconclusive Determinations
    But let’s suppose you made one error—a false positive identification—and correctly
    determined the remaining 19 sets. The question then becomes how your error rate should
    be calculated, which turns on how your 10 inconclusive determinations are treated. This
    issue was heavily debated in the trial court and looms large in the Majority’s analysis.
    The parties have focused on two ways to account for inconclusive determinations.
    The State argues that an inconclusive should be counted neither as a correct response,
    because the examiner failed to obtain the ground truth, nor as an error, because the
    examiner did not make an incorrect conclusive determination. Accordingly, the State
    advocates calculating error rates according to the method used in Ames I and Ames II: to
    include inconclusive determinations (“inconclusives”) in the denominator14 but exclude
    them from the numerator. Applying this method to the example above, the error rate would
    be 1/30, or 3.33 percent.
    Mr. Abruquah’s expert, Professor David Faigman, did not mince words about this
    method, declaring that “in the annals of scientific research or of proficiency testing, it
    would be difficult to find a more risible manner of measuring error.” To Mr. Faigman, the
    issue was simple: in Ames I and II, the ground truth was known, thus “there are really only
    two answers to the test, like a true or false exam[ple].” Mr. Faigman explained that “the
    common sense of it is if you know the answer is either A or B and the person says I don’t
    know, in any testing that I’ve ever seen that’s a wrong answer.” He argued, therefore, that
    inconclusives should be counted as errors. In the above example, under that approach, the
    error rate would be 11 out of 30, or 36.7 percent.
14 Under this method, the denominator thus represents the total number of responses.
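The competing conventions can be checked against the hypothetical figures used above (30 sets, 10 inconclusives, one false positive, 19 correct conclusive determinations). The short Python sketch below is illustrative only; the counts and variable names are assumptions drawn from that hypothetical, not from Ames I or Ames II themselves.

```python
from fractions import Fraction

total_sets = 30           # comparisons in the hypothetical open-set test
inconclusives = 10        # sets marked inconclusive
false_positives = 1       # the single erroneous identification
correct_conclusives = 19  # remaining conclusive determinations, all correct

# State's method (as used in Ames I and II): inconclusives stay in the
# denominator but are not counted as errors.
state_rate = Fraction(false_positives, total_sets)
print(f"{float(state_rate):.2%}")      # 3.33%

# Professor Faigman's approach: every inconclusive is scored as an error.
faigman_rate = Fraction(false_positives + inconclusives, total_sets)
print(f"{float(faigman_rate):.2%}")    # 36.67%
```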
    The Majority doesn’t expressly choose between the competing views, but its
    analysis favors Mr. Faigman’s approach. As the Majority sees it, an inconclusive should
    be deemed an error if there was sufficient information to make a conclusive
    determination.15 Maj. Op. 44 (“[I]f at least some inconclusives should be treated as
    incorrect responses, then the rates of error in open-set studies performed to date are
    unreliable.”).
    The Majority is skeptical that the inconclusive rates in the studies mirror the rates
    of inconclusive determinations in real field work. Maj. Op. 43. The Majority points to the
    disparity observed in the inconclusive rates between closed-set and open-set studies. Maj.
    Op. 42-43. This concern echoes that of Mr. Faigman, who testified that the disparity
    suggests that “something crazy is going on here” and that inconclusives should thus be
    deemed errors unless “you can demonstrate that they are somehow right.”
    But there is no mystery here. The examiner in closed-set studies knows that the
    unknown bullets match one of the known bullets. Thus, the examiner trained in the AFTE
    Theory can use the process of elimination, aided by deductive reasoning, to make a
    conclusive determination from what would otherwise be an inconclusive finding. In other
    words, by its nature, the closed-set design reduces the rate of inconclusives. In contrast,
    open-set studies do not permit the examiner to use the process of elimination to make a
    conclusive determination, resulting in higher rates of inconclusives. The Majority’s
    15
    In making this argument, the Majority implicitly acknowledges the validity of the
    foundational assumption of the AFTE Theory—that, at least sometimes, bullets and
    cartridges display markings sufficient to match them to a specific source gun.
    concern about the disparity in inconclusive rates between closed-set and open-set tests is
    thus explained away by this fundamental difference in test design.
    The Majority, however, infers from the disparity and what it considers unimpressive
    repeatability and reproducibility results that “whether an examiner chooses ‘inconclusive’
    in a study seems to depend on something other than just the ‘corresponding individual
    characteristics’ themselves.” Maj. Op. 43. The Majority implies that, because the ground
    truth is known in a test environment, the examiner, who makes a living performing these
    examinations, changes his behavior, consciously or otherwise, to minimize error rates by
over-reporting inconclusives. See id. (concluding that the rates of inconclusives reported
    in Ames II “suggest[] that examiners choose ‘inconclusive’ even when it is not a ‘correct’
    response”). This view mirrors Mr. Faigman’s testimony that, in the face of any ambiguity,
    the examiners who participated in the studies “default[ed] to inconclusive because [they]
    know that [they’re] in the business that a false positive has the worst optics associated with
    it.”
    Based on the premise that at least some inconclusives should be treated as errors,
    the Majority declares that the resulting “rates of error in open-set studies performed to date
    are unreliable.” Maj. Op. 43. As an example, the Majority observes that if Inconclusive-
    A responses for non-matching bullets in Ames II were counted as false positives, then the
false positive rate "would balloon from 0.7% to 10.13%." Id. An "Inconclusive-A"
    determination, you might recall, is the inconclusive level closest to a positive identification.
    So, the Majority’s reasoning goes, if we know the ground truth is a non-match and we treat
    the “almost a match” determination as a match (a false positive), the false positive rate
    increases substantially.
    The logic behind that view escapes me. If an examiner makes an Inconclusive-A
    determination, that means the examiner affirmatively chose not to make a positive
    identification. The examiner’s determination of Inconclusive-A does not necessarily mean
    that he almost made a false positive, as the Majority’s exercise presumes. Rather, Ames II
    instructed examiners to determine a result to be Inconclusive-A when they observed
    “[s]ome agreement of individual characteristics and all discernible class characteristics, but
    insufficient for an identification.” Ames II Report at 100. What’s more, the record lacks
    any evidence that trained and ethical examiners, which undisputedly describes Mr.
    McVeigh, are less concerned about making a false positive in actual field work than in a
    study setting.
    The Majority supposes that “in all the studies conducted to date,” examiners deemed
    some samples inconclusive that they would have instead deemed matches in actual
    casework, reasoning that “the participating examiners knew that (1) they were being
    studied and (2) an inconclusive response would not be counted as incorrect.” Maj. Op. 43.
    The record, however, does not establish either assertion.
    First, the examiners themselves were not the primary object of these studies. Rather,
    the AFTE Theory was. In Ames II, the examiners were told that the study would “assess[]
    the accuracy, repeatability, and reproducibility of decisions involving forensic
    comparisons,” that “[r]eported results and findings [would] be completely anonymized,”
    and that “[i]ndividual results [would] not be disclosed to the subjects or their employers.”
    Thus, in contrast to proficiency testing, which evaluates the performance of individual
    examiners, examiners took no personal reputational or professional risk by participating
    and providing responses that faithfully reflected their determinations. So even though
    examiners knew they were participating in a study, they did not think “they” were the object
    of study.16
    Second, the written instructions provided to examiners in Ames II did not indicate
    how inconclusives would be accounted for in the study results. As a result, there’s no basis
    to conclude, as the Majority does, that the examiners knew how the authors of Ames II
    would account for inconclusives in the final analysis.
    C.
    How to Calculate Error Rates
    This brings us to a different way of looking at error rates, one that received no
consideration by the Majority but that should mitigate many of its concerns about the reliability
    of the error rates in Ames I and Ames II. I am referring to calculating error by excluding
    inconclusives from both the numerator and the denominator. This measure calculates the
    rate of error of false positives and/or false negatives against only conclusive
    determinations. Under this measure, in our example above, the error rate would be one out
    16
    Relatedly, the examiners would not likely have been concerned about the viability
    of their profession when they participated in Ames II. Even if the results of the study did
    not support the admissibility of unqualified conclusive opinions at trial, there is no basis to
    believe that participating examiners would have perceived a threat to their livelihood of
    assisting criminal investigations, particularly in light of the numerous studies supporting,
    at the very least, the viability of the AFTE Theory as an investigative tool.
    of 20, or five percent. Thus, by excluding inconclusive determinations altogether, the error
    rate in our example increases from 3.33 percent to five percent.
    Before explaining the merits of this calculation, I point out that Mr. Faigman, as he
    so often did when disagreeing with an opposing view, chastised this way of calculating
    error rates, saying, “that's completely crazy from any testing regime that I have ever heard
    of and from just common sense.” But, contrary to Mr. Faigman’s unsupported criticism,
    excluding inconclusives from the numerator and denominator accords with both common
    sense and accepted statistical methodologies. It is also supported by competent evidence
    in the record.
    As a matter of common sense, the measure used to calculate error rates should align
    with the specific purpose of our inquiry. The Majority notes that “Mr. Abruquah does not
    challenge all of Mr. McVeigh’s testimony or that firearms identification is sufficiently
    reliable to be admitted for some purposes.” Maj. Op. 38. The Majority correctly defines
    the issue in narrow terms: our “task is to assess, based on the information presented to the
    circuit court, whether the AFTE Theory can reliably support an unqualified opinion that a
    particular firearm is the source of one or more particular bullets.” Id. at 38. Put another
    way, here, we are not concerned with the likelihood that application of the AFTE Theory
    will, in actual field work, conclusively tell us whether or not a specific bullet was fired
    from a specific gun. Rather, we are concerned with the likelihood that when application of
    the AFTE Theory yields a conclusive determination—here, a positive identification—the
    result is correct.17 PCAST framed the issue similarly, albeit in statistical language,
    explaining that “[f]orensic feature-comparison methods typically aim to determine how
    likely it is that two samples came from the same source,” PCAST Report at 151, and that
    false positives are “especially important because [they] can lead directly to wrongful
    convictions,” Id. at 50 (footnote omitted).
    PCAST identified two accepted measures of accuracy: sensitivity and false positive
    rates. Id. at 50. PCAST defined sensitivity as “the probability that the method declares a
    proposed identification between samples that actually come from the same source” and the
    false positive rate as the “probability that the method declares a proposed identification
    between samples that actually come from different sources.” Id.
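Expressed computationally, the two PCAST measures reduce to simple proportions (an illustrative restatement of the definitions quoted above; the function names are mine):

    # Sensitivity and false positive rate, each of which can be computed over
    # all examinations or over conclusive examinations only.
    def sensitivity(identifications, same_source_comparisons):
        # Probability that the method declares an identification when the
        # samples truly come from the same source.
        return identifications / same_source_comparisons

    def false_positive_rate(false_identifications, different_source_comparisons):
        # Probability that the method declares an identification when the
        # samples truly come from different sources.
        return false_identifications / different_source_comparisons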
    Of critical importance here, PCAST explained that the sensitivity and false positive
    rates can be calculated “based on the conclusive examinations or on all examinations.” Id.
    at 153. PCAST even went a step further and contended that even though both measures
    “are of interest,” false positive rates should be based only on conclusive examinations
    “because evidence used against a defendant will typically be based on conclusive, rather
    17
    That’s not to say that false eliminations should not concern us. A false elimination
    could also lead to the conviction of an innocent person. See PCAST Report at 44 n.94.
    That could happen if, for example, an individual commits a homicide with a firearm and is
    not charged due to a false elimination, and instead another person is wrongly convicted.
    The implications of this observation, in my view, support the admissibility of conclusive
    determinations of both kinds, i.e., identifications and eliminations. As explained below,
    the rates of false eliminations are low, and repeatability and reproducibility data show that
    when a false elimination is made, the error is typically corrected on a second examination.
    So, in this hypothetical, the innocent defendant could hire his own firearms examiner, who
    may determine that the other suspect’s gun was the murder weapon. Under the Majority’s
    ruling, that examiner would not be allowed to testify that the other suspect’s gun was the
    murder weapon.
    than inconclusive, determinations.” Id. (“The quantity of most interest in a criminal trial
    is . . . the probability that the samples are from the same source given that a match has been
    declared.”) (cleaned up).18
    So, far from being “crazy,” as Mr. Faigman argued, excluding inconclusives from
    error rate calculations when assessing the reliability of a positive identification is not only
    an acceptable approach, but the preferred one, at least according to PCAST. Moreover,
    from a mathematical standpoint, excluding inconclusives from the denominator actually
    penalizes the examiner because errors accounted for in the numerator are measured against
    a smaller denominator, i.e., a smaller sample size. That’s why the error rate in the above
    example increased from 3.33 percent to five percent.
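Returning to the 30-set hypothetical, the conclusives-only convention can be sketched the same way (again purely illustrative arithmetic on the assumed counts, not data from the studies):

    # Same hypothetical: 30 sets, 10 inconclusives, 1 false positive.
    errors = 1
    conclusive_calls = 30 - 10       # 20 conclusive determinations

    # PCAST-preferred convention: exclude inconclusives from both numerator
    # and denominator.
    conclusives_only_rate = errors / conclusive_calls   # 1/20, or 5 percent

    # Ames convention for comparison: inconclusives stay in the denominator.
    ames_convention_rate = errors / 30                  # 1/30, about 3.33 percent

    # The smaller denominator is why the conclusives-only rate is the higher figure.
    print(f"{ames_convention_rate:.2%}  {conclusives_only_rate:.2%}")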
    This brings us back to Mr. Faigman’s and the Majority’s speculation that the
    examiners in Ames II were biased toward inconclusives. Ames II was conducted in
    response to PCAST, Ames II at 12, and the firearms examiner community was acutely
    aware of PCAST when it was published. Thus, although Mr. Faigman and the Majority
    assume that the participating examiners “knew” that inconclusives would not be counted
    against them and consequently over-relied on them, it is just as likely that examiners
    assumed that inconclusives would be accounted for in the manner advocated by PCAST,
    18
    The authors of Ames II write that “[a]lthough some might propose an inconclusive
    decision as an unsuccessful outcome, or failure (‘error’) to identify a [known match], such
    a decision rightly represents judgment that the comparison presents insufficient
    information (quality and/or quantity of individual characteristics) for a definitive statement
    (minimization of false positive being paramount[).]” Keith L. Monson, et al., Planning,
    design and logistics of a decision analysis study: The FBI/Ames study involving forensic
    firearms examiners, 4 FORENSIC SCI. INT’L.: SYNERGY 1, 5 (Feb. 19, 2022) (footnotes
    omitted). The authors of Ames I agree. Ames I at 6.
    with every inconclusive driving up the error rate. Perhaps that’s why Mr. McVeigh
    rejected the premise that examiners were not penalized for making an inconclusive
    determination. Because Mr. Abruquah and the Majority rely heavily on PCAST, we should
    at least consider how PCAST’s preferred measurement of error rate affects the results of
    Ames I and Ames II. I take up that task next.
    III.
    ACCURACY
    Before turning to the specific error rates reported in Ames I and Ames II, let’s first
    address where to draw the line between an acceptable and unacceptable error rate to
    establish a minimum threshold of reliability. I turn again to PCAST, which posits that “[t]o
    be considered reliable, the [false positive rate] should certainly be less than 5 percent and
    it may be appropriate that it be considerably lower, depending on the intended application.”
    PCAST Report at 152.
    PCAST is not definitive on any topic, let alone the maximum false positive rate for
    a reliability determination. But given the Majority’s reliance on PCAST, PCAST’s
    standard provides a helpful benchmark when assessing whether the trial court appropriately
    exercised its discretion. At bottom, however, trial courts should be left to their own
    discretion to make such a judgment call.
    A.
    Ames I
    Let’s start with Ames I. With respect to matching bullet sets, the number of
    inconclusives was so low that whether inconclusives are included in the denominator
    makes little difference to error rates. Of the 1,090 matching sets, only 11, or 1.01 percent,
    were inconclusives. Of the conclusive determinations, 1,075 were correctly identified as a
    match (“identifications”) and four were incorrectly eliminated (“eliminations”). The four
    false eliminations were committed by three examiners; 215 of the 218 examiners did not
    report any false eliminations. Measured against the total number of matching sets (1,090),
    the false elimination rate was 0.36 percent. Against only the conclusive determinations
    (1,079), the false elimination rate was 0.37 percent.
    The error rates for non-matching bullets vary more significantly if inconclusive
    determinations are excluded from the denominator.             Of 2,178 non-matching sets,
    examiners reported 735 inconclusives for an inconclusive rate of 33.7 percent, 1,421 sets
    as correct eliminations, and 22 sets as incorrect identifications (false positives). The false
    positives were concentrated among a few examiners: 20 of the 22 false positives were made
    by the same five examiners. As a percentage of the total 2,178 non-matching sets, the false
    positive rate was 1.01 percent. As a percentage of the 1,443 conclusive determinations,
    however, the false positive rate was 1.52 percent. Either way, the results show that the risk
    of a false positive is very low, particularly when measured against the five percent
    benchmark recommended by PCAST.
    Combining the results of the matching and non-matching sets is also instructive. Of
    the total number of sets (3,268), 746 were inconclusives, for an inconclusive rate of 22.83
    percent, and 26 were either erroneous eliminations or identifications. Measured against
    the total number of sets, the overall error rate was 0.79 percent. Measured against only
    conclusive determinations, the error rate was 1.03 percent.
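The Ames I percentages recited above follow directly from the reported counts; a brief illustrative computation, using only the figures stated in this opinion, runs as follows:

    # Ames I bullet comparisons.
    matching_sets, matching_inconclusives, false_eliminations = 1090, 11, 4
    nonmatching_sets, nonmatching_inconclusives, false_positives = 2178, 735, 22

    # False elimination rate, with and without inconclusives in the denominator.
    fe_all = false_eliminations / matching_sets                                    # 4/1090
    fe_conclusive = false_eliminations / (matching_sets - matching_inconclusives)  # 4/1079

    # False positive rate, with and without inconclusives in the denominator.
    fp_all = false_positives / nonmatching_sets                                       # 22/2178, about 1.01 percent
    fp_conclusive = false_positives / (nonmatching_sets - nonmatching_inconclusives)  # 22/1443, about 1.52 percent

    # Combined error rate across all 3,268 sets.
    total_errors = false_eliminations + false_positives               # 26
    overall_all = total_errors / (matching_sets + nonmatching_sets)   # 26/3268
    overall_conclusive = total_errors / (1079 + 1443)                 # 26/2522, about 1.03 percent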
    In sum, the results of Ames I show that, with inconclusives either included or
    excluded in the denominator in the error calculation, identifications and eliminations boast
    extremely low error rates.
    B.
    Ames II
    Although PCAST found Ames I to be an appropriately designed black box study,
    PCAST concluded that one such study was not enough to establish the scientific validity
    of the AFTE Theory. PCAST Report at 113. Eric Lander, a co-chair of PCAST and
    President of the Broad Institute of MIT and Harvard when the PCAST Report was
    published, wrote: “With only a single well-designed study estimating accuracy, PCAST
    judged that firearms analysis fell just short of the criteria for scientific validity, which
    requires reproducibility. A second study would solve this problem.” Eric S. Lander, Fixing
    Rule 702: The PCAST Report and Steps to Ensure the Reliability of Forensic Feature-
    Comparison Methods in the Criminal Courts, 86 FORDHAM L. REV. 1661, 1672 (2018).
    Ames II was that second study.
    Matching Bullet Sets
    Because Mr. McVeigh’s testimony linked bullets, not cartridges, to Mr. Abruquah’s
    gun, I will focus on the results of the bullet examinations in Ames II. There were 1,405
    recorded results for matching sets of bullets. Of those, 288 were placed in any one of the
    three inconclusive categories, Inconclusive-A, Inconclusive-B, and Inconclusive-C, for an
    inconclusive rate of 20.50 percent. Of the 1,117 conclusive determinations, 1,076 were
    correct identifications. Measured against the total number of recorded results (1,405), the
    identification rate (sensitivity) was 76.6 percent, which the Majority perceives as low. But,
    when measured against the total conclusive determinations (1,117), the identification rate
    jumps to 96.3 percent, indicating far greater reliability of identifications.
    There were 41 false eliminations. As a percentage of the 1,405 recorded results, the
    false elimination rate was 2.9 percent. As a percentage of only the conclusive results, the
    false elimination rate increased to 3.7 percent—still below PCAST’s recommended five
    percent threshold.
    Non-Matching Bullet Sets
There were 2,842 recorded results for non-matching sets: 1,861 were inconclusives,
    for an inconclusive rate of 65.48 percent, and 961 were correct eliminations. Measured
    against the total number of recorded results (2,842), the correct elimination rate was only
    33.8 percent. But measured against only the total number of conclusive determinations
    (981), the correct elimination rate jumps to 97.9 percent—another indication that
    conclusive determinations under the AFTE Theory are reliable.
    Of course, we are most concerned about the risk of false positives—that is, incorrect
    identifications. There were 20 false positives. Measured against the total number of
    recorded results (2,842), the false positive rate was 0.7 percent. Measured against only the
    conclusive determinations, however, the false positive rate increases to 2.04 percent.
    Under either measure, the false positive rate was well below PCAST’s recommended
    threshold of five percent.
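The Ames II bullet figures discussed above can be reproduced the same way (an illustrative computation limited to the counts stated in this opinion):

    # Ames II bullet comparisons.
    match_results, match_inconclusives, correct_ids, false_elims = 1405, 288, 1076, 41
    nonmatch_results, nonmatch_inconclusives, false_ids = 2842, 1861, 20

    match_conclusives = match_results - match_inconclusives            # 1117
    nonmatch_conclusives = nonmatch_results - nonmatch_inconclusives   # 981

    sensitivity_all = correct_ids / match_results              # 1076/1405, about 76.6 percent
    sensitivity_conclusive = correct_ids / match_conclusives   # 1076/1117, about 96.3 percent

    false_elim_all = false_elims / match_results               # 41/1405, about 2.9 percent
    false_elim_conclusive = false_elims / match_conclusives    # 41/1117, about 3.7 percent

    false_positive_all = false_ids / nonmatch_results              # 20/2842, about 0.7 percent
    false_positive_conclusive = false_ids / nonmatch_conclusives   # 20/981, about 2.04 percent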
    In sum, using PCAST’s preferred method of calculating error rates and its five
    percent threshold for an acceptable error rate, the error rates observed in Ames II show that
    the trial court’s determination of reliability was reasonable.19
    IV.
    REPEATABILITY AND REPRODUCIBILITY
    The Majority focuses on what it perceives to be unimpressive showings of
    repeatability and reproducibility in Ames II. Maj. Op. 43, 48-49. To the Majority, the
“inconsistent” results in these respects are “troublesome” and undermine the reliability of
    the AFTE Theory. Maj. Op. 45, 48-49.
    Before proceeding, I offer an observation about repeatability and reproducibility:
    consistent results from separate examinations of the same sample, by either the same or a
    different examiner, are not necessarily desirable. Certainly, consistency is good if the
    initial determination is correct. But consistency is undesirable if the initial determination
    is wrong, in which case we would prefer disagreement. That is, we would prefer that the
    same examiner or another examiner get it right the second time rather than repeat the
    mistake. Disagreement with an incorrect determination would increase our confidence that
    a peer review process would catch and correct mistakes, particularly false positives, and
    that the traditional tools for challenging “shaky” evidence—cross-examination, opposing
    19
    I recognize that PCAST acknowledged that an appropriate error rate threshold
    could be lower than five percent, depending on the purpose for which the evidence would
    be used. PCAST Report at 152. But how much lower or higher should be a matter for the
    trial judge to determine.
expert testimony, and presentation of contrary evidence—would expose errors. Matthews,
479 Md. at 312 (quoting Daubert, 509 U.S. at 596).
    So, as to repeatability and reproducibility rates: (1) the higher the better for initial
    correct identifications and correct eliminations and (2) the lower the better for initial false
    eliminations and false positive identifications. And, because our primary concern is the
    reliability of an identification, we are less concerned whether the initial and subsequent
    examination, by the same or different examiner, yielded the same particular level of
    inconclusives. Thus, the repeatability and reproducibility figures relied upon by the
    Majority, which include all categories (identification, elimination, and three levels of
    inconclusive), do not align well with the specific nature of our inquiry.
    A.
    Repeatability
    Repeatability is the likelihood that the same examiner will make the same
    determination for a particular sample on a subsequent examination. Ames II refers to an
    examiner’s initial examination as “Round One” of the study and that examiner’s
    subsequent examination as “Round Two.”
    Matching Bullet Sets
    As noted by the Majority, the overall repeatability rate was 79.0 percent for
    matching bullets and 64.7 percent for non-matching bullets. The Majority is not impressed
    by these results but doesn’t tell us what levels would, in its view, support reliability. In my
view, reasonable minds can differ. As for the matching sets, given the wide range of
possible responses (identification, elimination, Inconclusive-A, Inconclusive-B, and
Inconclusive-C), one might reasonably be impressed that, on independent examinations of the same sets
    months apart, examiners reached the same result nearly 80 percent of the time.
    But there is more to glean from the results. The following table reproduces the data
    from Table IX of the Ames II report, with percentages, to show how the Round One results
    for the matching bullet sets were distributed in Round Two. Ames II Report at 38.
Table IX: Matching Sets (Bullets)20
Classification on                      Classification on Second Evaluation
First Evaluation      ID              Inc. A          Inc. B          Inc. C          Elimination     Unsuitable21    Total
ID                    665 (89.62%)    27 (3.64%)      26 (3.50%)      14 (1.89%)      8 (1.08%)       2 (0.27%)       742
Inc. A                31 (39.24%)     28 (35.44%)     12 (15.19%)     6 (7.59%)       2 (2.53%)       0 (0.00%)       79
Inc. B                13 (16.05%)     14 (17.28%)     45 (55.56%)     5 (6.17%)       2 (2.47%)       2 (2.47%)       81
Inc. C                2 (12.50%)      3 (18.75%)      3 (18.75%)      5 (31.25%)      3 (18.75%)      0 (0.00%)       16
Elimination           8 (24.24%)      7 (21.21%)      3 (9.09%)       2 (6.06%)       13 (39.39%)     0 (0.00%)       33
Unsuitable            1 (11.11%)      3 (33.33%)      3 (33.33%)      0 (0.00%)       0 (0.00%)       2 (22.22%)      9
Total                 720             82              92              32              28              6               960
    This table shows that the repeatability rate of a correct identification, which is a
    focus of our inquiry, was 89.62 percent (665/742). Given the subjective nature of the AFTE
    20
For ease of reference, the numbers and titles of the tables in this dissent
match the numbers and titles of the corresponding tables in the Ames II Report.
21
A determination of “unsuitable” is appropriate when “a comparison can not be
made due to [the] quality of the provided samples.” Ames II Report at 11.
    Theory, this repeatability rate for correct identifications could reasonably be viewed as an
    indicator of reliability.
    The table also shows what happened to the 77 correct identifications from Round
    One (742 - 665 = 77) that were not repeated in Round Two. Ames II refers to different
    determinations for the same sample as a “paired disagreement.” Sixty-seven of those 77
    (87.0 percent) paired disagreements were placed in an inconclusive category: 27 in
    Inconclusive-A, 26 in Inconclusive-B, and 14 in Inconclusive-C. So, while the change
    from a correct identification reduces the repeatability rate, the different determination
    suggests that examiners exercised caution in making an identification, the determination
    typically most inculpatory to a defendant.        Examiners changed only eight correct
    identifications from Round One to false eliminations in Round Two, which weighs against
    consistency but again suggests the examiners’ tendency to err on the side of caution.
    The table also sheds light on the inconclusives. Of the 960 matching bullet sets
    examined in Round Two, 176 were in one of the inconclusive levels in Round One, 121 of
    which were again in an inconclusive level in Round Two. Individual repeatability rates of
    Inconclusive-A, Inconclusive-B, and Inconclusive-C were 35.44 percent, 55.56 percent,
    and 31.25 percent, respectively. Those rates are a drag on the overall repeatability rate.
    But, if we return to the primary focus of our inquiry—the reliability of a conclusive
    determination—we should be less concerned with movement within the inconclusive
    categories. The version of Table IX below presents the same repeatability results as the
previous table, but with the inconclusive determinations pooled; that is, we eliminate the
    distinctions between the three inconclusive categories.
Table IX: Matching Sets (Bullets)
Classification on                       Classification on Second Evaluation
First Evaluation        ID              Inconclusive (pooled)   Elimination     Unsuitable      Total
ID                      665 (89.62%)    67 (9.03%)              8 (1.08%)       2 (0.27%)       742
Inconclusive (pooled)   46 (26.14%)     121 (68.75%)            7 (3.98%)       2 (1.14%)       176
Elimination             8 (24.24%)      12 (36.36%)             13 (39.39%)     0 (0.00%)       33
Unsuitable              1 (11.11%)      6 (66.67%)              0 (0.00%)       2 (22.22%)      9
Total                   720             206                     28              6               960
    When the inconclusives are pooled, the overall repeatability rate increases from 79.0
    percent to 83.4 percent.22 That is because the repeatability rate for pooled inconclusives,
    68.75 percent (121/176), is higher than the individual repeatability rates for the three
    inconclusive categories.23
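The pooled repeatability figure is simply the sum of the diagonal of the pooled table divided by the number of re-examined sets; a short illustrative check (the counts are those shown in the pooled table above):

    # Pooled Table IX, matching bullet sets: Round One category -> Round Two counts.
    pooled = {
        "ID":           {"ID": 665, "Inconclusive": 67,  "Elimination": 8,  "Unsuitable": 2},
        "Inconclusive": {"ID": 46,  "Inconclusive": 121, "Elimination": 7,  "Unsuitable": 2},
        "Elimination":  {"ID": 8,   "Inconclusive": 12,  "Elimination": 13, "Unsuitable": 0},
        "Unsuitable":   {"ID": 1,   "Inconclusive": 6,   "Elimination": 0,  "Unsuitable": 2},
    }

    total = sum(sum(row.values()) for row in pooled.values())   # 960 re-examined sets
    agreements = sum(pooled[cat][cat] for cat in pooled)        # 665 + 121 + 13 + 2 = 801

    print(f"pooled repeatability: {agreements / total:.1%}")    # 83.4 percent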
    An examination of what happened to inconclusives that were not repeated in the
    second round supports reliability. Of the 176 inconclusives in Round One, examiners
    placed 55 (176 - 121 = 55) into a different category in Round Two. Of those 55, 46 were
    correct identifications. So, though the movement from the inconclusive category reduced
    the overall repeatability rate, the vast majority (46/55, or 83.6 percent) of that movement
    resulted in a determination of the ground truth.         Conversely, a comparatively low
    22
    Overall repeatability here is calculated as: (665 paired agreement identifications
    + 121 paired agreement inconclusives + 13 paired agreement eliminations + 2 paired
    agreement unsuitables)/960 = 83.4 percent.
    23
    The results when inconclusives are pooled were available to the trial court, as the
    Ames II Report presented results under various pooling scenarios. Ames II Report passim.
    proportion (7 out of 55) moved into the elimination column. That strong trend toward
    accuracy—the movement from inconclusive to ground truth on an examiner’s second
    attempt—supports the reliability of the AFTE Theory.
    Finally, let’s look at the repeatability of a false elimination. Of the 33 false
    eliminations from the first round, 13 were likewise eliminations in Round Two, a
    repeatability rate of 39.39 percent. Though this reduces the overall repeatability rate, we
    can take solace that examiners did not repeat most of their mistakes, a trend that reflects
    well on the methodology. Drilling down even further, of the 20 false eliminations which
    were not repeated, eight became correct identifications in Round Two, which also speaks
    well of the methodology. And 12 of the false eliminations were judged inconclusive in the
    second round, another shift in the direction of ground truth.
    Non-Matching Bullet Sets
    Without pooling the inconclusive results, the overall repeatability rate for non-
    matching bullets was 64.7 percent. The Majority highlights the disparity between this rate
and the repeatability rate of 79.0 percent for matching sets. Maj. Op. 49.
    The following table reproduces the data from Table IX of the Ames II report, with
percentages, to show how the Round One results for the non-matching bullet sets were
    distributed in Round Two. Ames II Report at 38.
Table IX: Non-Matching Sets (Bullets)
Classification on                    Classification on Second Evaluation
First Evaluation      ID            Inc. A          Inc. B          Inc. C          Elimination     Unsuitable      Total
ID                    2 (10.53%)    3 (15.79%)      6 (31.58%)      2 (10.53%)      6 (31.58%)      0 (0.00%)       19
Inc. A                0 (0.00%)     52 (32.91%)     37 (23.42%)     42 (26.58%)     27 (17.09%)     0 (0.00%)       158
Inc. B                5 (0.95%)     31 (5.88%)      341 (64.71%)    98 (18.60%)     45 (8.54%)      7 (1.33%)       527
Inc. C                1 (0.21%)     32 (6.67%)      109 (22.71%)    284 (59.17%)    53 (11.04%)     1 (0.21%)       480
Elimination           1 (0.16%)     20 (3.13%)      35 (5.47%)      66 (10.31%)     514 (80.31%)    4 (0.63%)       640
Unsuitable            0 (0.00%)     0 (0.00%)       13 (41.94%)     6 (19.35%)      4 (12.90%)      8 (25.81%)      31
Total                 9             138             541             498             649             20              1855
    With a focus on the primary inquiry here—the reliability of conclusive
    determinations—we can make several observations. The repeatability rate of a correct
    elimination was 80.31 percent (514/640), significantly higher than the overall repeatability
    rate of 64.7 percent and a stronger indicator of reliability.
    Of the 126 correct eliminations from Round One (640 - 514 = 126) that were not
    repeated in Round Two, 121 of those 126 (96.0 percent) were placed in an inconclusive
    category. This movement shows a caution in making conclusive eliminations that does not
    undermine the reliability of a correct identification. Only one set went from a correct
    elimination to a false positive, showing that the risk of such a flip-flop is low: one out of
    126, or 0.79 percent.
    Now let’s look at repeatability rates for Round One inconclusives. Repeatability
    rates of Inconclusive-A, Inconclusive-B, and Inconclusive-C were 32.91 percent, 64.71
    percent, and 59.17 percent, respectively. Those low repeatability rates drag down the
    overall repeatability rate. But, again, should we really be concerned with the repeatability
    of a particular level of inconclusive, given that the heart of the inquiry here is the reliability
    of a positive identification?
    Let’s see what happens when the three levels of inconclusive determinations are
    pooled, again using the data from Table IX of the Ames II report, with percentage
    calculations added:
Table IX: Non-Matching Sets (Bullets)
Classification on                     Classification on Second Evaluation
First Evaluation        ID            Inconclusive (pooled)   Elimination     Unsuitable      Total
ID                      2 (10.53%)    11 (57.89%)             6 (31.58%)      0 (0.00%)       19
Inconclusive (pooled)   6 (0.52%)     1026 (88.07%)           125 (10.73%)    8 (0.69%)       1165
Elimination             1 (0.16%)     121 (18.91%)            514 (80.31%)    4 (0.63%)       640
Unsuitable              0 (0.00%)     19 (61.29%)             4 (12.90%)      8 (25.81%)      31
Total                   9             1177                    649             20              1855
    For starters, the repeatability rate for inconclusives for non-matching bullets
    improves to 88.07 percent (1026/1165). More importantly, by pooling the inconclusives,
    the overall repeatability rate of all determinations, both inconclusive and conclusive,
    increases from 64.74 percent to 83.56 percent.24 Recall that the Majority noted the
    disparity between the overall repeatability rates of matching bullet sets (79.0 percent) and
    24
    Calculated as: (2 paired agreement identifications + 1026 paired agreement
    inconclusives + 514 paired agreement eliminations + 8 paired agreement unsuitables)/1855
    = 83.56 percent.
    non-matching bullet sets (64.7 percent). When inconclusive results are pooled, however,
the disparity all but disappears—the repeatability rates for matching sets and non-matching
    sets converge at 83.4 percent and 83.6 percent, respectively. Put differently, the Majority’s
    concern about the disparity between the repeatability rates of matching and non-matching
    bullets can be entirely explained by changes within the three levels of inconclusive
    determinations, which do not compromise the reliability of a conclusive determination.
    Now let’s examine what happened to inconclusives from Round One that were not
    judged inconclusive in Round Two. Of those 139 sets, 125 were correctly determined to
    be an elimination. So, although the movement out of inconclusive reduced the repeatability
    rate, nearly all of that movement (125/139 = 89.9 percent) was to the ground truth. Only
    six of the 139 sets turned into false positives. These shifts indicate reliability.
    Finally, let’s look at the repeatability rate of false identifications or false positives.
    Of the 19 false identifications from Round One, only two remained in that category in
    Round Two (10.5 percent). Thus, examiners were highly unlikely to repeat the most
    prejudicial type of mistake. Of the 17 false positives from Round One that were not
    repeated in Round Two, six were judged correct eliminations and 11 inconclusive.
    B.
    Reproducibility
    Reproducibility is the likelihood that, for a particular sample, a different examiner
    will make the same determination as the initial examiner. Ames II refers to the second
    examiner’s evaluation as “Round Three” of the study.
    As the Majority notes, the overall reproducibility rate was 68.0 percent for matching
    bullets and 31.0 percent for non-matching bullets. Maj. Op. 49 n.26. The Majority is again
    unimpressed by these results. Maj. Op. 49. But, again, if we focus on the reliability of
    conclusive determinations, the data tell a different story, one more supportive of the
    reliability of the AFTE Theory.
    Matching Bullet Sets
    Let’s start with the matching bullet sets. The following table reproduces the data
    from Table XIV of the Ames II report, with percentages, to show how the Round One
    results for the matching bullet sets were distributed in Round Three when examined by
    different examiners. Ames II Report at 46.
Table XIV: Matching Sets (Bullets)
Classification by                      Classification by Third Round Examiner
First Round Examiner  ID              Inc. A          Inc. B          Inc. C          Elimination     Unsuitable      Total
ID                    601 (84.77%)    38 (5.36%)      39 (5.50%)      14 (1.97%)      12 (1.69%)      5 (0.71%)       709
Inc. A                42 (53.16%)     18 (22.78%)     7 (8.86%)       6 (7.59%)       6 (7.59%)       0 (0.00%)       79
Inc. B                34 (41.98%)     15 (18.52%)     22 (27.16%)     4 (4.94%)       6 (7.41%)       0 (0.00%)       81
Inc. C                9 (31.03%)      7 (24.14%)      5 (17.24%)      2 (6.90%)       6 (20.69%)      0 (0.00%)       29
Elimination           13 (31.71%)     5 (12.20%)      14 (34.15%)     6 (14.63%)      3 (7.32%)       0 (0.00%)       41
Unsuitable            3 (17.65%)      2 (11.76%)      8 (47.06%)      1 (5.88%)       1 (5.88%)       2 (11.76%)      17
Total                 702             85              95              33              34              7               956
    According to this table, the reproducibility rate of correct identifications—the
    primary focus of our inquiry—was 84.77 percent (601/709). Given the subjectivity of the
    AFTE Theory, that result can be reasonably viewed as an indicator of reliability. At a
    minimum, it renders far less concerning the 68 percent overall reproducibility rate on which
    the Majority focuses. Moreover, there were 108 correct identifications from Round One
    (709 - 601 = 108) that were judged differently in Round Three, 91 (84.3 percent) of which
    went into an inconclusive category.      Meanwhile, only 12 of the 108 became false
    eliminations. Thus, although the movement from a correct identification reduces the
    overall reproducibility rate, the difference indicates the examiners exercised caution, even
    at the expense of making a correct identification. That is another reason we can have
    confidence in conclusive determinations resulting from application of the AFTE Theory.
Of the 956 bullet sets examined in Round Three, 189 were in one of the inconclusive
levels in Round One. Individual reproducibility rates of Inconclusive-A,
    Inconclusive-B, and Inconclusive-C were 22.78 percent, 27.16 percent, and 6.90 percent,
    respectively. Those low rates reduced the overall reproducibility rate for matching bullets.
    But the following table, also drawn from Table XIV of the Ames II report, shows
    what happens to the reproducibility results presented above when the three levels of
    inconclusive are pooled:
Table XIV: Matching Sets (Bullets)
Classification by                       Classification by Third Round Examiner
First Round Examiner    ID              Inconclusive (pooled)   Elimination     Unsuitable      Total
ID                      601 (84.8%)     91 (12.8%)              12 (1.7%)       5 (0.7%)        709
Inconclusive (pooled)   85 (45.0%)      86 (45.5%)              18 (9.5%)       0 (0.0%)        189
Elimination             13 (31.7%)      25 (61.0%)              3 (7.3%)        0 (0.0%)        41
Unsuitable              3 (17.6%)       11 (64.7%)              1 (5.9%)        2 (11.8%)       17
Total                   702             213                     34              7               956
    When inconclusive results are pooled, the reproducibility rate of inconclusives
    improves to 45.50 percent (86/189). And more importantly, the reproducibility rate of all
    determinations, both inconclusive and conclusive, increases from 67.8 percent to 72.4
    percent.25
    Let’s examine what happened to the inconclusives from Round One that were not
    inconclusive in Round Three. Of the 189 inconclusives from Round One, subsequent
    examiners placed 103 into a different category in Round Three. Of those 103, 85 became
    a correct identification. So, although that change dragged down the reproducibility rate,
    most of that movement (85/103 = 82.5 percent) produced the ground truth. Conversely, 18
    of the 103 were incorrectly judged eliminations.
This table also shows that subsequent examiners reproduced only three of the 41 false
eliminations from Round One, or 7.32 percent. Here, the failure to reproduce a result
    25
    Calculated as: (601 paired agreement identifications + 86 paired agreement
    inconclusives + 3 paired agreement eliminations + 2 paired agreement unsuitables)/956 =
    72.4 percent.
    is welcome; that subsequent examiners were unlikely to reproduce the mistake of the first
    examiner should be viewed favorably. Moreover, it shows that most of the time, a false
    elimination is discernible, which means that a rigorous peer review process and the
    traditional tools for challenging expert testimony, cross-examination and opposing experts,
    are likely to be effective.
    Non-Matching Bullet Sets
    The following table reproduces the data from Table XIV of the Ames II report, with
    percentages, to show how the Round One results for the non-matching bullet sets were
    distributed by different examiners in Round Three. Ames II Report at 46.
Table XIV: Non-Matching Sets (Bullets)
Classification by                    Classification by Third Round Examiner
First Round Examiner  ID            Inc. A          Inc. B          Inc. C          Elimination     Unsuitable      Total
ID                    0 (0.00%)     5 (26.32%)      8 (42.11%)      5 (26.32%)      1 (5.26%)       0 (0.00%)       19
Inc. A                1 (0.60%)     15 (8.98%)      58 (34.73%)     33 (19.76%)     60 (35.93%)     0 (0.00%)       167
Inc. B                5 (0.93%)     61 (11.30%)     180 (33.33%)    125 (23.15%)    159 (29.44%)    10 (1.85%)      540
Inc. C                2 (0.46%)     35 (8.12%)      134 (31.09%)    114 (26.45%)    142 (32.95%)    4 (0.93%)       431
Elimination           1 (0.14%)     71 (10.13%)     162 (23.11%)    193 (27.53%)    274 (39.09%)    0 (0.00%)       701
Unsuitable            0 (0.00%)     1 (3.57%)       13 (46.43%)     5 (17.86%)      9 (32.14%)      0 (0.00%)       28
Total                 9             188             555             475             645             14              1886
    Of the 2,842 recorded results for non-matching bullets, 1,886 sets were examined
    by a different examiner in Round Three, including 19 of the 20 false identifications from
Round One. Of the 701 correct eliminations from Round One, different examiners again
    correctly eliminated 274 in Round Three, for a reproducibility rate of 39.09 percent
    (274/701), while placing the remaining 427 into another category.
    Let’s examine the 427 correct eliminations that were not again eliminated by the
    second examiner. Of that total, 426 (99.8 percent) were judged inconclusive by the
    subsequent examiner, which again indicates that examiners were generally cautious about
    making conclusive determinations.
    Only one set moved from a correct elimination to a false identification, indicating
    that it is very unlikely that different examiners, when independently examining non-
    matching sets, would reach opposite conclusive determinations. This finding supports the
    notion that through cross-examination and opposing experts, a rare false positive by the
    State’s expert can be neutralized.
    Now let’s look at the reproducibility rate of inconclusives. Of the 1,886 sets
    examined in Round Three, 1,138 were placed in one of the three inconclusive levels in
    Round One. Of those, 755 were again judged inconclusive in Round Three by a different
    examiner. Individual reproducibility rates of Inconclusive-A, Inconclusive-B, and
Inconclusive-C were 8.98 percent, 33.33 percent, and 26.45 percent, respectively. Those
    rates drag down the overall reproducibility rate for non-matching bullets.
    The following table illustrates what happens to the results if we pool the three
    inconclusive levels:
Table XIV: Non-Matching Sets (Bullets)
Classification by                     Classification by Third Round Examiner
First Round Examiner    ID            Inconclusive (pooled)   Elimination     Unsuitable      Total
ID                      0 (0.00%)     18 (94.74%)             1 (5.26%)       0 (0.00%)       19
Inconclusive (pooled)   8 (0.70%)     755 (66.34%)            361 (31.72%)    14 (1.23%)      1138
Elimination             1 (0.14%)     426 (60.77%)            274 (39.09%)    0 (0.00%)       701
Unsuitable              0 (0.00%)     19 (67.86%)             9 (32.14%)      0 (0.00%)       28
Total                   9             1218                    645             14              1886
    The overall reproducibility rate of the pooled inconclusive determinations for non-
    matching bullets is 66.34 percent (755/1138), a dramatic increase from the reproducibility
    rates of the individual levels of inconclusive.       And the reproducibility rate of all
    determinations, both inconclusive and conclusive, increases from 30.9 percent to 54.6
    percent.26
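The same paired-agreement arithmetic reproduces the reproducibility figures just described for the non-matching sets (an illustrative check using the counts in the two versions of Table XIV above):

    # Non-matching bullet sets, Round One versus Round Three (1,886 paired results).
    total_sets = 1886

    # Diagonal of the unpooled table: ID, Inc. A, Inc. B, Inc. C, Elimination, Unsuitable.
    unpooled_agreements = 0 + 15 + 180 + 114 + 274 + 0          # 583
    # Diagonal of the pooled table: ID, Inconclusive (pooled), Elimination, Unsuitable.
    pooled_agreements = 0 + 755 + 274 + 0                       # 1029

    print(f"unpooled: {unpooled_agreements / total_sets:.1%}")  # about 30.9 percent
    print(f"pooled:   {pooled_agreements / total_sets:.1%}")    # about 54.6 percent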
    The results evidence greater reliability if we examine the 383 inconclusives from
    Round One that were not deemed inconclusive in Round Three. Of the 383 inconclusives,
    361 moved from the inconclusive column in Round One to elimination in Round Three.
    So, though the migration out of inconclusive reduced the reproducibility rate for the non-
    matching bullet sets, nearly all (361/383 = 94.2 percent) moved in favor of the ground truth
    of elimination—the best possible directional change for non-matching bullets. Conversely,
    only eight sets of the 383 were incorrectly moved to the identification column. These shifts
    26
    Calculated as: (0 paired agreement identifications + 755 paired agreement
    inconclusives + 274 paired agreement eliminations + 0 paired agreement unsuitables)/1886
    = 54.6 percent.
between rounds reduced the overall reproducibility rate but increase confidence that the
    traditional tools for contesting expert testimony would be effective.
    Finally, and most importantly for this case, let’s look at the reproducibility rate of a
    false identification. There were 19 false identifications from Round One that were
    reviewed by a subsequent examiner, and not a single examiner reproduced the initial
    examiner’s mistake. Of those 19 sets, one was correctly placed in the elimination column
    and the other 18 were deemed inconclusive.
    In sum, the accuracy rates from Ames I and Ames II show that the risk of a false
    positive is both low and concentrated among a small number of examiners.                  The
    reproducibility results indicate that a subsequent examiner will very likely catch the rare
    false positive. Of the 1,886 sets of non-matching bullets that were reviewed by two
    different examiners, none were twice judged false positives.
    C.
    Recap
    Let’s recap the foregoing analysis.         By focusing on the repeatability and
    reproducibility rates of the primary issue before us, the reliability of conclusive
    determinations for bullets, and by pooling the three levels of inconclusive results, we can
    make the following observations:
    1. Repeatability rates of correct conclusive determinations were substantially higher
    than overall repeatability rates. While the overall repeatability rate for matching
    bullets was 79.0 percent and 64.7 percent for non-matching bullets, it was 89.6
    percent for correct identifications (true positives) and 80.3 percent for correct
    eliminations (true negatives).
    2. Repeatability rates of incorrect conclusive determinations were much lower: 10.5
    percent for false identifications (false positives) and 39.4 percent for false
    eliminations (false negatives). Low rates of repeatability of incorrect conclusive
    results are, of course, preferred, because they indicate that mistakes are likely to be
    caught upon review.
    3. When inconclusives are pooled, the overall repeatability rate for matching bullets
    improves from 79.0 percent to 83.4 percent. For non-matching bullets, the rate
    improves from 64.7 percent to 83.6 percent. Thus, by pooling inconclusives, the
    disparity noted by the Majority in repeatability between matching and non-matching
    bullets disappears.
    4. Even when examiners were inconsistent with themselves between rounds, their
    responses were not far apart. Of the correct identifications from Round One that
    were not again judged to be identifications, 87.0 percent were judged inconclusive
    in Round Two. Of the correct eliminations from Round One that were not again
    judged to be eliminations, 96.0 percent were judged inconclusive in Round Two.
    This indicates that examiners exercised caution in making conclusive
    determinations.
    5. Where examiners made an inconclusive determination in Round One but a
    conclusive determination in Round Two, they trended strongly toward ground truth,
    an indicator of reliability:
    i.   Of the matching bullets that examiners initially judged inconclusive in
    Round One but judged differently in Round Two, examiners made a
    correct identification 83.6 percent of the time in Round Two.
    ii.   Of the non-matching bullets that examiners initially judged inconclusive
    in Round One but judged differently in Round Two, examiners made a
    correct elimination 89.9 percent of the time in Round Two.
    6. Reproducibility rates of correct conclusive determinations were substantially higher
    than overall reproducibility rates. While the overall reproducibility rate for
    matching bullets was 67.8 percent and 30.9 percent for non-matching bullets, it was
    84.8 percent for correct identifications (true positives) and 39.1 percent for correct
    eliminations (true negatives).
7. Reproducibility rates of incorrect conclusive determinations were much lower: zero
percent for false identifications (false positives), none of which were reproduced, and 7.3
percent for false eliminations (false negatives). Low rates of reproducibility are preferred here
    because they indicate that mistakes are likely to be caught upon review.
    8. When inconclusives are pooled, the overall reproducibility rate for matching bullets
    improves from 67.8 percent to 72.4 percent. For non-matching bullets, the rate
    improves from 30.9 percent to 54.6 percent. Thus, by pooling inconclusives, the
    disparity in reproducibility rates between matching and non-matching bullets
    decreases substantially.
    9. Even when examiners were inconsistent with each other, their responses were not
    far apart. Of the correct identifications from Round One that were not again judged
    identifications, 84.3 percent were judged inconclusive in Round Three, another
    indication that the examiners exercised caution in making conclusive
    determinations. Of the correct eliminations from Round One that were not again
judged eliminations, 99.8 percent were judged inconclusive in Round Three, again
    indicating caution.
    10. When an examiner made an inconclusive determination in Round One but a
    subsequent examiner made a conclusive determination in Round Three, the
    subsequent examination trended strongly toward ground truth, an indicator of
    reliability:
i. Of the matching bullets that examiners initially judged inconclusive in
Round One but judged differently in Round Three, examiners made a
correct identification 82.5 percent of the time in Round Three.
ii. Of the non-matching bullets that examiners initially judged inconclusive
in Round One but judged differently in Round Three, examiners made a
correct elimination 94.2 percent of the time in Round Three.
    These findings and conclusions support a few takeaways: (1) examiners are not
    perfectly consistent, either with themselves or others, which is neither surprising nor
    disqualifying for a subjective pattern-matching discipline; (2) inconsistencies typically
    appear at the margins between two adjacent categories, showing that even where there is
    some “madness,” there is also “method”; (3) the vast majority of inconsistencies do not
    prejudice the defendant but instead reflect caution by examiners in making conclusive
    determinations; (4) subsequent review by the same examiner, and especially by a different
    examiner, is likely to catch errors and steer toward ground truth; and (5) independent
    examinations by two examiners almost never both produce false positives.
    As we said in State v. Matthews, “[v]igorous cross-examination, presentation of
    contrary evidence, and careful instruction on the burden of proof are the traditional and
    appropriate means of attacking shaky but admissible evidence.”            479 Md. at 312
    (quotations omitted).      Here, Mr. Abruquah could have called a competing firearms
    examiner to challenge Mr. McVeigh’s opinions. Instead, he called two experts to opine
    solely on the reliability of the methodology, not Mr. McVeigh’s analysis and conclusions.
Similarly, when cross-examining Mr. McVeigh at trial, defense counsel did not challenge Mr.
    McVeigh’s analysis of the specimens recovered from the crime scene but focused instead
    on the reliability of the methodology generally. If Ames II tells us anything, it’s that if a
    false positive is made, another trained examiner will be able to, at a minimum, poke
    substantial holes in the initial examiner’s analysis. That Mr. Abruquah made no such effort
    at trial is, in my view, telling.27
    27
    This is not, as the Majority asserts, Maj. Op. 52 n.28, a criticism of Mr. Abruquah,
    but rather an observation. The Majority’s analysis hinges on the proposition that when
    making a positive identification, examiners show greater caution in studies than in field
    work. If this were a case in which the examiner was less cautious than was warranted by
    the facts and made a positive identification based on ambiguous or insufficient markings,
    one could reasonably expect that Mr. Abruquah would have attempted to expose such
    weaknesses in the examiner’s analysis through cross-examination or his own expert. I am
    not criticizing Mr. Abruquah for not doing so, but rather inviting the reader to draw
    reasonable inferences from the fact that he did not. Further, the Majority states that the
record “contains no support for th[e] proposition” that “there are firearms examiners
    whose services were readily available to Mr. Abruquah, i.e., who are willing and able to
    take on work for criminal defendants in such cases.” Id. However, the website for The
Association of Firearm and Tool Mark Examiners—the same website to which the
Majority cites several times—has an “Expert Referral List” for “individuals requesting the
assistance of a firearms/tool marks examiner in private casework.” Expert Referral, AFTE,
https://afte.org/resources/expert-referral (last visited June 13, 2023).
    D.
    Additional Evidence
    Not only does the Majority, in my view, fail to appreciate that Ames II has shown
    the AFTE Theory to be generally reliable, but the Majority also discounts specific
    standards and controls employed here by the Firearms Examination Unit of the Prince
    George’s County Police Department Forensic Science Division (“FEU”). Those standards
    and controls, which were presented at trial, reduce the risk of error. Two central elements
    of those protocols are examiner independence and peer review, which I discuss briefly
    here.
    At the Frye-Reed hearing, Mr. McVeigh identified protocols that the FEU follows
    to ensure examiners are independent and unbiased: examiners do not participate in
    investigations, read narratives of crime reports, or discuss cases with detectives. Mr.
    McVeigh affirmed that those protocols were followed in this case. He received unknown
    bullets and the Taurus revolver, knowing they were collected as part of the same case, and
    was asked to determine whether the firearm fired the bullets. He also received a report that
    included two paragraphs stating only the date, time, and location of the incident, and that
    officers “located the victim unresponsive in the residence suffering from an apparent
    gunshot wound.”
    The FEU also requires peer review, which includes technical and administrative
    review of all cases. Technical review consists of a second examiner reviewing “all forensic
    conclusions,” including “all bench notes, data, and other information that the examiner
    employs to form an opinion[.]” To be sure, technical review is not a blind second opinion,
    but it is nonetheless a form of peer review. In administrative review, the FEU manager or
    designee reviews all of the examiner’s forensic conclusions.
    Here, Mr. McVeigh’s identification work was reviewed by another examiner, who
    approved of his conclusions. Is that a perfect check against the danger of a false positive?
No. But it is a check, and the efficacy of that safeguard is not a function of the reliability
of the underlying methodology—the focus of our inquiry—but rather of the competence and skill
    of individual examiners. The Majority dismisses these procedural safeguards.28
    VI.
    EXAMINERS’ ABILITY TO DISTINGUISH INDIVIDUAL CHARACTERISTICS
    The Majority acknowledges, without fully embracing, the underlying premise of the
    AFTE Theory—that firearms produce distinctive markings on bullets and cartridges
    (“individual characteristics”), and that examiners can identify those markings. Maj. Op.
    42 (finding “strong support for the propositions that: (1) firearms produce some unique
    collections of individual patterns and markings on bullets and cartridges they fire; and
(2) such collections of individual patterns and markings can be reliably identified [under
certain conditions]”) (footnote omitted).
28
Citing testimony from Dr. James E. Hamby, the Majority implies that the peer
review process is a pro forma rubber stamp of the initial determination. Maj. Op. 46. The
Majority may draw its own conclusions from Dr. Hamby’s testimony, but having reviewed
that same testimony carefully, I believe a trial court could reasonably have drawn different
conclusions.
    The Majority, however, raises the prospect that examiners can reliably identify
    individual characteristics only “when subclass characteristics are removed from the
    equation.” 29 Id. The Majority expresses particular concern with the apparent absence of
    published standards or controls guiding examiners on how to distinguish individual from
    subclass characteristics. Maj. Op. 49 (“The lack of standards and controls is perhaps most
    acute in discerning whether a particular characteristic is a subclass or an individual
    characteristic.”).
    The Majority, however, discounts studies showing that examiners can indeed make
    reliable determinations despite the risk of subclass carryover. Though published standards
    and controls would certainly be helpful, we should not ignore evidence that examiners
    make correct determinations in practice. Indeed, the concept of a black box study is
    premised on the assumption that when a process is not itself testable, we should study the
    accuracy of outcomes.
    Before discussing these studies, I must clarify a point relating to study design. A
    first category of studies has controlled for subclass characteristics, i.e., eliminated the risk
    of subclass carryover, to determine only (1) whether firearms produce individual
    characteristics and (2) whether examiners can reliably identify those individual
characteristics. See, e.g., Bullet Validation Study at 3.30 As the Majority recognizes, these
studies show that when the risk of subclass carryover is controlled for or eliminated,
examiners can reliably identify individual characteristics. Maj. Op. 42 & 42 n.23.
29
Subclass characteristics “are those shared by a group of firearms made using the
same tools, such as those made in the same production run at a facility.” Maj. Op. 49.
    A second category of studies assumes that examiners can identify individual
    characteristics and instead assesses whether examiners can reliably do so when there is a
    risk of subclass carryover. In these studies, similar subclass characteristics are likely
    present, but examiners do not know anything about the weapons used. Accordingly,
    examiners cannot assume that certain shared characteristics are subclass and thereby
    disregard them for purposes of individual determinations.
    30
    In the Bullet Validation Study, Jamie Smith of the Firearms Examination Unit of
    Prince George’s County Police Department Forensic Science Division sought to replicate
    Dr. Hamby’s consecutive gun study while also introducing elements of open-set design.
    Here, examiners received 15 known samples fired from consecutively manufactured
    Beretta pistol barrels and 20 unknown samples. All samples used the same type of
ammunition. The test administrators verified through inspection that the weapons
produced no subclass characteristics, a fact of which participants were made aware.
    Of the unknown samples, some were also fired from the known Beretta barrels,
    while others were fired from other pistols of similar characteristics. This intentional
    mismatch between the firearms used for known and unknown samples introduced the
    possibility that unknown samples would not match any of the known samples and that,
    consequently, examiners could not count on using the process of elimination. And, because
    these tests were designed to possibly include multiple unknowns from the same source, the
    study abandoned the one-to-one relationship between known and unknown samples, which
    was characteristic of many closed studies.
    Though this study was not fully open-set and had other design limitations, it is hard
    to ignore that only seven false identifications were reported, six of which the test
    administrators reported as resulting from typos. If those alleged typos were indeed typos,
    then the false positive rate was just 0.07 percent. Even if those alleged typos were treated
    as false positives, the false positive rate was just 0.47 percent.
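By way of rough arithmetic, and assuming the reported rates were computed as false
identifications divided by the number of comparisons involving non-matching samples (an
inference from the reported percentages rather than a figure stated in the study as described
here), the two rates together imply a pool on the order of 1,480 such comparisons, since
1 ÷ 1,480 ≈ 0.07 percent and 7 ÷ 1,480 ≈ 0.47 percent.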
    At least two studies of this variety were introduced at trial. Both found that subclass
    characteristics did not undermine examiners’ ability to reliably identify individual
    characteristics. First, Dr. Hamby tested examiners’ ability to identify bullets fired from ten
    consecutively manufactured pistol barrels, which were expected to share subclass
    characteristics. See generally Worldwide Study.31 Examiners were not provided any
    information about the barrels. Id. Dr. Hamby observed that “[e]rrors due to subclass
    characteristics, which one could conjecture would be a significant issue when
    consecutively rifled barrels are involved, have not been a problem for the examiners,”
    concluding that “there are identifiable features on the surface of bullets that may link them
    to the barrel that fired them.” Id. at 556.
    31
Dr. Hamby worked with others to develop an ongoing study that tested
examiners’ ability to identify bullets fired from ten consecutively manufactured Ruger P-
85 9mm pistol barrels. A total of 697 examiners from 32 countries participated. Each test
set included 20 known bullets, two fired from each of the ten barrels, and 15
unknown bullets, consisting of one to three bullets from each of the barrels.
    To be sure, the modified closed-set design of this study limits its value, despite Dr.
    Hamby’s introduction of greater uncertainty by abandoning the one-to-one known-to-
    unknown relationship of past studies (the so-called “Sudoku style” test). Regardless, the
results cannot be ignored: examiners matched all but eight of the 10,455 unknown
bullets to the correct known samples. Examiners reported inconclusive determinations on
the remaining eight and made no misidentifications.
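In rough numbers, and assuming each of the 697 participants examined all 15 unknown
bullets in a test set, that works out to 697 × 15 = 10,455 examinations, of which 10,447
yielded correct identifications, eight yielded inconclusive results, and none yielded a
misidentification, for a correct-identification rate of approximately 99.92 percent.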
    The authors concluded that “there are identifiable features on the surface of bullets
    that may link them to the barrel that fired them” and that shared subclass characteristics
did not confound examiners’ ability to draw accurate conclusions. The Majority might
    reasonably disagree with these conclusions, but, by the same token, a trial judge would not
    be unreasonable to place credence in them when determining threshold reliability under
    Daubert.
    The second study, Ames II, compared examiners’ performance on samples with
    similar subclass characteristics against their performance on samples that likely had distinct
subclass characteristics. Each sample used the same type of ammunition and was fired from a
weapon of the same make and model. Examiners were not made aware of the
    characteristics of the weapons used in the study.
    Researchers assessed, among other things, performance with respect to two
    variables: manufacturing run and sequential groups within a single manufacturing run.
    Guns manufactured in the same run, which were produced by the same tool, would
    presumably produce greater shared subclass characteristics than guns manufactured by
    different tools in different runs. Similarly, guns manufactured in the same group within a
    single manufacturing run would presumably produce greater shared subclass
    characteristics than those manufactured in different groups within the same manufacturing
    run.32
    Examiners performed somewhat better overall, with lower rates of false positives,
    for guns from different manufacturing runs. Ames II at 56-67. The same was observed for
    guns from different groups within a single manufacturing run. Id. These observations
    suggest that samples featuring different subclass characteristics might be “easier” to
    correctly determine than those with shared subclass characteristics.
    The authors nonetheless concluded that examiners’ responses for bullets did not, as
    a whole, differ in a statistically significant way between same-run and different-run
samples. Id. For cartridges, though, examiners’ responses did meaningfully differ,
attributable mostly to differences in elimination determinations. Id. Responses did not
meaningfully differ between same-group and different-group samples within the same
manufacturing run for bullets or cartridges. Id.
32
Researchers performed this analysis for only the Beretta group of guns.
    Notably, the false positive rates within any of the categories—cartridges included—
    ranged from 0.38 percent to 1.14 percent.33 Id. Thus, even though examiners may have
reported false positives more frequently for certain categories of guns, the fact that the
highest false positive rate was just 1.14 percent paints a picture of a reliable discipline.
    From these studies, the trial court could have reasonably concluded that, despite the
    risk of subclass carryover, the AFTE Theory is sufficiently reliable to admit Mr.
    McVeigh’s proffered testimony. Additionally, the trial court could have credited testimony
    by the State’s experts acknowledging the risk of subclass carryover but emphasizing the
    caution examiners exercise to protect against it.
    CONCLUSION
The Majority, “misunderstand[ing] Daubert to demand unassailable expert
testimony,” United States v. Mooney, 315 F.3d 54, 63 (1st Cir. 2002), misses the forest for
the trees. The trial court’s task is not to ensure that an expert’s conclusion is “correct,” but
only “that the expert’s conclusion has been arrived at in a scientifically sound and
methodologically reliable fashion.” Id. (quoting Ruiz–Troche v. Pepsi Cola of P.R.
Bottling Co., 161 F.3d 77, 85 (1st Cir. 1998)).
    33
    This data is taken from Table XXIII and Table XXIV in Ames II.
    Although I recognize the substantial deference that jurors may accord to experts,
    particularly on forensic, technical, and scientific matters, we are dealing with the
    admissibility of expert opinion, and opinion, by definition, carries a risk of error or
    inaccuracy. This Court’s job is not to inoculate the jury from all risk of inaccurate expert
    opinion; to do so would be “overly pessimistic about the capabilities of the jury and of the
adversary system generally.” Daubert, 509 U.S. at 596. On this basis, courts have admitted
expert opinions on imperfect subjective methods, such as handwriting analysis and coin-
grading. See Mooney, 315 F.3d at 61-63 (allowing handwriting expert to opine that
defendant authored specific letters, despite evidence that handwriting examiners had a
potential rate of error of 6.5 percent); United States v. Romano, 794 F.3d 317, 330-33 (2d
Cir. 2015) (allowing testimony on grades of coins).
    Indeed, the sort of extensive statistical investigation that the Majority and I engage
    in here is precisely what Daubert and Rochkind told courts not to do. Contrary to the
    admonitions of those Courts, this Court unwisely assumes the role of “amateur scientist,”
    see Rochkind, 471 Md. at 33, in our “exhaustive search for cosmic understanding[,]”
Daubert, 509 U.S. at 597.
    None of the foregoing is to suggest that the Majority’s reasoning is irrational or
    unreasonable, or that admitting the testimony was the only correct decision. Rather, I
    contend that the trial court made a reasonable decision supported by the evidence, and the
    fact that others may disagree merely signifies to me that on this difficult topic, reasonable
    minds can and do differ. That, in my view, is not the stuff of abuse of discretion.
    Respectfully, I therefore dissent.