Stupp Corporation v. United States (2021)


    Case: 20-1857    Document: 61     Page: 1   Filed: 07/15/2021
    United States Court of Appeals
    for the Federal Circuit
    ______________________
    STUPP CORPORATION, A DIVISION OF STUPP
    BROS., INC., WELSPUN TUBULAR LLC USA,
    IPSCO TUBULARS, INC., MAVERICK TUBE
    CORPORATION,
    Plaintiffs
    v.
    UNITED STATES,
    Defendant-Appellee
    HYUNDAI STEEL COMPANY,
    Defendant
    SEAH STEEL CORP.,
    Defendant-Appellant
    ______________________
    2020-1857
    ______________________
    Appeal from the United States Court of International
    Trade in Nos. 1:15-cv-00334-CRK, 1:15-cv-00336-CRK,
    1:15-cv-00337-CRK, Judge Claire R. Kelly.
    ______________________
    Decided: July 15, 2021
    ______________________
    ROBERT R. KIEPURA, Commercial Litigation Branch,
    Civil Division, United States Department of Justice, Wash-
    ington, DC, argued for defendant-appellee. Also
    represented by CLAUDIA BURKE, JEFFREY B. CLARK, JEANNE
    DAVIDSON; REZA KARAMLOO, Office of the Chief Counsel for
    Trade Enforcement & Compliance, United States Depart-
    ment of Commerce, Washington, DC.
    JEFFREY M. WINTON, Winton & Chapman PLLC, Wash-
    ington, DC, argued for defendant-appellant.
    ______________________
    Before TARANTO, BRYSON, and CHEN, Circuit Judges.
    BRYSON, Circuit Judge.
    Appellant SeAH Steel Corporation appeals from a de-
    cision of the Court of International Trade (“the Trade
    Court”) affirming a final determination of the United
    States Department of Commerce in an antidumping duty
    investigation. In that investigation, Commerce assessed
    SeAH a weighted average dumping margin above the de
    minimis threshold, which subjected SeAH to antidumping
    duties. SeAH challenges Commerce’s rejection of portions
    of SeAH’s case brief and various aspects of the analysis
    Commerce used to derive the dumping margin. We affirm
    with respect to the case brief issue and with respect to most
    of SeAH’s challenges to Commerce’s analysis. We vacate
    and remand, however, on the issue of whether it was rea-
    sonable for Commerce to apply a portion of its analysis—
    specifically, the “Cohen’s d test”—to sales data that may
    have been of insufficient size, not normally distributed, and
    lacking roughly equal variances.
    I
    In late 2014, Commerce initiated a less-than-fair-value
    investigation into the importation of welded line pipe from
    the Republic of Korea. See Welded Line Pipe from the Re-
    public of Korea: Preliminary Determination, 80 Fed. Reg.
    29,620 (Dep’t of Commerce May 22, 2015). The investiga-
    tion covered the period from October 1, 2013, through
    September 30, 2014, and focused on sales by two Korea-
    based respondents, SeAH and Hyundai HYSCO.
    Commerce issued a preliminary determination on May
    14, 2015, that SeAH was, or likely was, selling welded line
    pipe in the United States at less than fair value during the
    relevant period. SeAH filed a case brief challenging Com-
    merce’s statistical analysis and citing academic literature
    in support of that challenge. Commerce rejected SeAH’s
    case brief because Commerce found that it violated proce-
    dural regulations governing the filing of new factual infor-
    mation. J.A. 9698–99.
    Commerce issued a final determination on October 13,
    2015. Welded Line Pipe from the Republic of Korea: Final
    Determination, 80 Fed. Reg. 61,366, and accompanying Is-
    sues and Decision Memorandum (Dep’t of Commerce Oct.
    5, 2015) (“Final Memo”), available at
    https://enforcement.trade.gov/frn/summary/korea-south/2015-25980-1.pdf.
    In that final determination, Commerce found that
    SeAH had dumped welded line pipe in the United States,
    calculating SeAH’s weighted average dumping margin to
    be above the de minimis threshold for less-than-fair-value
    investigations. Final Determination, 80 Fed. Reg. at
    61,367.
    When calculating a weighted average dumping margin,
    Commerce typically uses the average-to-average compari-
    son method. 19 C.F.R. § 351.414(c)(1); see also 19 U.S.C.
    § 1677f-1(d)(1). That method compares the weighted aver-
    age of the respondent’s sales prices in its home country dur-
    ing the investigation period to the weighted average of the
    respondent’s sales prices in the United States during the
    same period. 19 C.F.R. § 351.414(b)(1). The average-to-
    average method, however, sometimes fails to detect “tar-
    geted” or “masked” dumping, because a respondent’s “sales
    of low-priced ‘dumped’ merchandise would be averaged
    with (and offset by) sales of higher-priced ‘masking’ mer-
    chandise, giving the impression that no dumping was
    taking place.” Apex Frozen Foods Priv. Ltd. v. United
    States, 862 F.3d 1337, 1341 (Fed. Cir. 2017) (“Apex II”).
    To address the problem of targeted dumping, Congress
    created an exception to the use of the average-to-average
    method. Congress provided that when “(i) there is a pat-
    tern of export prices 1 (or constructed export prices) for com-
    parable merchandise that differ significantly among
    purchasers, regions, or periods of time, and (ii) [Commerce]
    explains why such differences cannot be taken into account
    using [the average-to-average method],” Commerce may
    compare the weighted average of the respondent’s sales
    prices in the home country to the respondent’s individual
    sales prices in the United States. 19 U.S.C. § 1677f-
    1(d)(1)(B). The rationale behind that statutory exception
    is that targeted dumping is more likely to be occurring
    when export prices fit a pricing model that differs signifi-
    cantly among different periods of time, different purchas-
    ers, or different regions of the United States. Apex II, 862
    F.3d at 1347. Commerce refers to the alternative method
    of calculating a weighted average dumping margin as the
    “average-to-transaction” method. See 19 C.F.R.
    § 351.414(b)(3).
    1   An “export” price means the price of a transaction
    in the United States; a “normal” price means the price of a
    transaction in the respondent’s home country.
    Congress has not delineated exactly how Commerce is
    to assess whether there is a “‘pattern of export prices . . .
    differ[ing] significantly among purchasers, regions, or pe-
    riods of time,’” or how Commerce is to “‘explain[] why such
    differences cannot be taken into account’ using the aver-
    age-to-average or transaction-to-transaction methods.”
    Dillinger France S.A. v. United States, 981 F.3d 1318,
    1324–25 n.5 (Fed. Cir. 2020) (quoting section 1677f-
    1(d)(1)(B)); see also Apex II, 862 F.3d at 1346. Commerce
    has therefore devised a means for implementing Congress’s
    directive. Until 2014, Commerce applied the “Nails test”
    to detect targeted dumping. See JBF RAK LLC v. United
    States, 790 F.3d 1358, 1367 n.5 (Fed. Cir. 2015). From 2013
    to 2014, Commerce refined its methodology and began ap-
    plying what it now calls “differential pricing analysis.” See
    Differential Pricing Analysis; Request for Comments, 79
    Fed. Reg. 26,720, 26,722 (Dep’t of Commerce May 9, 2014);
    Xanthan Gum from the People’s Republic of China, 78 Fed.
    Reg. 33,351 (Dep’t of Commerce June 4, 2013).
    We have summarized the methodology behind Com-
    merce’s differential pricing analysis in prior decisions. See,
    e.g., Apex II, 862 F.3d at 1343 n.2. Because the issues in
    this case concern specific aspects of that methodology, we
    provide a more thorough description below.
    Before Commerce can conduct its differential pricing
    analysis, it must first collect data regarding the respond-
    ent’s export sales and home sales. See Final Memo at 1. If
    those sales span multiple distinct products, Commerce seg-
    ments the sales into sets based on comparable product
    groups. See Differential Pricing Analysis, 79 Fed. Reg. at
    26,722.
    To begin the differential pricing analysis, Commerce
    further segments the respondent’s export sales for each
    product group into subsets based on the region of the
    United States in which those sales took place. Id. Com-
    merce similarly constructs subsets based on the purchasers
    involved in the sales (i.e., the purchaser category) and also
    based on the time periods in which the sales took place (i.e.,
    the time-period category). Id. A particular export sale will
    be present in multiple subsets across the regional, pur-
    chaser, and time-period categories. See id.
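As a rough illustration of the segmentation just described, the sketch below groups export sales by region, purchaser, and time period within each product group. The `Sale` record, its field names, and the `segment` helper are hypothetical, chosen only for this example; the opinion does not describe how Commerce stores or groups its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    # Hypothetical fields, for illustration only.
    product_group: str
    region: str
    purchaser: str
    period: str      # e.g., a quarter label such as "2014Q1"
    price: float
    quantity: float


CATEGORIES = ("region", "purchaser", "period")


def segment(sales):
    """Return {(product_group, category): {subset_key: [Sale, ...]}}.

    Each sale lands in exactly one subset per category, so it appears
    in three subsets in total: one regional, one purchaser, one period.
    """
    subsets = defaultdict(lambda: defaultdict(list))
    for sale in sales:
        for category in CATEGORIES:
            key = getattr(sale, category)
            subsets[(sale.product_group, category)][key].append(sale)
    return subsets


# Hypothetical sales, not drawn from the record.
sales = [
    Sale("pipe-A", "East", "P1", "2014Q1", 100.0, 10),
    Sale("pipe-A", "West", "P2", "2014Q1", 90.0, 5),
    Sale("pipe-A", "East", "P2", "2014Q2", 95.0, 8),
]
groups = segment(sales)
```

Here the sale to P2 in the East in 2014Q2 shows up once under "East", once under "P2", and once under "2014Q2".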
    For each subset within a category, Commerce makes
    that subset the “test group” and aggregates the remaining
    subsets in that category into the “comparison group.” Id.
    If both groups have at least two observations (i.e., sales
    prices), and if the sum of the comparison group is at least
    five percent of the total amount of export sales, Commerce
    applies the “Cohen’s d test,” named after statistician Jacob
    Cohen, to evaluate whether the test group differs signifi-
    cantly from the comparison group. Id. The formula for cal-
    culating the Cohen’s d value is as follows:
    d = \frac{|M_c - M_t|}{\sigma_p};
    see Large Residential Washers from the Republic of Korea,
    2016 WL 5854390 (Dep’t of Commerce Sept. 6, 2016) (noting
    that Commerce applies the “two-tailed” version of the
    Cohen’s d test, which uses the absolute-value operator to
    “focus[] on both lower and higher prices”). In the formula
    used by Commerce, Mc is the mean of the comparison
    group, Mt is the mean of the test group, and σp is the simple
    average of the two groups’ standard deviations. See Mid
    Continent Steel & Wire, Inc. v. United States, 495 F. Supp.
    3d 1298, 1304 (Ct. Int’l Trade 2021) (appeal docketed).
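Purely as a worked illustration with hypothetical numbers (not drawn from the record): if the comparison group's mean price is 100, the test group's mean price is 90, and σp is 10, the formula gives

```latex
d = \frac{|M_c - M_t|}{\sigma_p} = \frac{|100 - 90|}{10} = 1.0
```

Because of the absolute value, a test group priced 10 above the comparison group would produce the same value of d, which is what makes the test "two-tailed."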
    If the Cohen’s d value is equal to or greater than 0.8 for
    any test group, the observations within that group are said
    to have “passed” the Cohen’s d test, i.e., Commerce deems
    the sales prices in the test group to be significantly differ-
    ent from the sales prices in the comparison group. Id. at
    1302–04. Commerce applies the Cohen’s d test to each test
    group within the regional, purchaser, and time-period cat-
    egories. See Differential Pricing Analysis, 79 Fed. Reg. at
    26,722–23.
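The test-group procedure described in the preceding paragraphs can be sketched as follows, for one product group and one category. This is a simplified sketch under stated assumptions, not Commerce's implementation: means are unweighted, standard deviations are sample standard deviations, σp is computed literally as the opinion describes it (a simple average of the two standard deviations; other definitions of a pooled standard deviation exist and would give different values), and the five-percent "amount" is measured by sales value. The function names are hypothetical.

```python
import statistics


def cohens_d(test_prices, comparison_prices):
    """|Mc - Mt| / sigma_p, with sigma_p as the simple average of the two SDs."""
    mean_t = statistics.mean(test_prices)
    mean_c = statistics.mean(comparison_prices)
    sigma_p = (statistics.stdev(test_prices) + statistics.stdev(comparison_prices)) / 2
    if sigma_p == 0:
        # Degenerate case not addressed in the opinion; handling is an assumption.
        return float("inf") if mean_t != mean_c else 0.0
    return abs(mean_c - mean_t) / sigma_p


def passing_subsets(subsets, threshold=0.8, min_share=0.05):
    """Return the keys of subsets whose sales 'pass' the Cohen's d test.

    `subsets` maps a subset key (e.g., a region) to a list of
    (price, quantity) tuples for one product group and one category.
    """
    total_value = sum(p * q for sales in subsets.values() for p, q in sales)
    passed = set()
    for key, test in subsets.items():
        # All other subsets in the category form the comparison group.
        comparison = [s for k, sales in subsets.items() if k != key for s in sales]
        comparison_value = sum(p * q for p, q in comparison)
        # Eligibility: two or more observations in each group, and a
        # comparison group worth at least 5% of total export sales.
        if len(test) < 2 or len(comparison) < 2:
            continue
        if comparison_value < min_share * total_value:
            continue
        d = cohens_d([p for p, _ in test], [p for p, _ in comparison])
        if d >= threshold:
            passed.add(key)
    return passed


# Hypothetical regional subsets: East prices cluster near 100, West near 80.
regions = {
    "East": [(100.0, 10), (102.0, 10), (98.0, 10)],
    "West": [(80.0, 10), (82.0, 10), (78.0, 10)],
}
print(passing_subsets(regions))
```

With these made-up prices each group has a standard deviation of 2 and the means differ by 20, so d is 10 for both regions and both pass the 0.8 cutoff.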
    Commerce counts the number of observations within
    each product group that were tagged as “passing,” and ap-
    plies what it calls a “ratio test” to the results: If the total
    percentage of passing transactions is 33% or less, Com-
    merce uses the default average-to-average method to cal-
    culate the weighted average dumping margin. If the total
    percentage is 66% or more, Commerce tentatively selects
    the alternative average-to-transaction method as the
    method it will use to calculate the weighted average dump-
    ing margin. If the total percentage is between 33% and
    66%, Commerce tentatively selects a hybrid approach in
    which it applies the alternative average-to-transaction
    method to those transactions passing the Cohen’s d test
    and the average-to-average method to the remainder of the
    transactions. Id.
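The ratio test, as the paragraph above describes it, maps the share of passing observations to a tentative comparison method. The sketch below follows that description by counting observations; the function name and string labels are hypothetical, and Commerce's own implementation may measure the share differently (for instance, by sales value rather than by count).

```python
def ratio_test(passing_count, total_count):
    """Return the tentatively selected method for one product group.

    33% or less passing     -> average-to-average (the default)
    66% or more passing     -> average-to-transaction
    strictly in between     -> hybrid (A-T for passing sales, A-A for the rest)
    """
    share = 100.0 * passing_count / total_count
    if share <= 33:
        return "average-to-average"
    if share >= 66:
        return "average-to-transaction"
    return "hybrid"


print(ratio_test(50, 100))  # hybrid
```

Note that one passing observation out of three (about 33.3%) already falls in the hybrid band under this literal reading.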
    If Commerce tentatively selects an alternative compar-
    ison method, it confirms its selection by applying the
    “meaningful difference” test to determine whether using
    the default average-to-average method can account for the
    disparate pricing patterns that were discovered by the Co-
    hen’s d test and the ratio test. Id. at 26,723 (implementing
    19 U.S.C. § 1677f-1(d)(1)(B)(ii)). The first step of the mean-
    ingful difference test is to calculate the weighted average
    dumping margin using the average-to-average method.
    The second step is to calculate the weighted average dump-
    ing margin with the tentatively selected method. The third
    step is to compare the results: If the margin for the aver-
    age-to-average method is below the de minimis threshold 2
    and the margin for the tentatively selected method is above
    that threshold, or if both are above that threshold and the
    margin for the tentatively selected method is 25% greater
    than the average-to-average margin, then Commerce con-
    siders there to be a meaningful difference, and it selects the
    alternative approach. Id. If that comparison leads Com-
    merce to conclude that there is not a meaningful difference,
    Commerce applies the average-to-average method across
    the board.
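The comparison step of the meaningful difference test can be sketched as a small predicate over the two margins, expressed in percent. The sketch assumes the 2% investigation threshold stated in the opinion's footnote 2, treats a margin below 2% as de minimis, and reads "25% greater" as "at least 1.25 times"; those boundary choices and the function names are assumptions.

```python
DE_MINIMIS_PCT = 2.0  # investigation threshold stated in footnote 2


def is_de_minimis(margin_pct):
    # Assumption: margins strictly below 2% are de minimis.
    return margin_pct < DE_MINIMIS_PCT


def meaningful_difference(aa_margin_pct, alt_margin_pct):
    """True if the tentatively selected method should be confirmed."""
    # Case 1: A-A margin is de minimis, alternative margin is not.
    if is_de_minimis(aa_margin_pct) and not is_de_minimis(alt_margin_pct):
        return True
    # Case 2: both margins exceed the threshold and the alternative
    # margin is at least 25% greater than the A-A margin.
    if not is_de_minimis(aa_margin_pct) and not is_de_minimis(alt_margin_pct):
        return alt_margin_pct >= 1.25 * aa_margin_pct
    # Otherwise Commerce falls back to A-A across the board.
    return False


print(meaningful_difference(1.2, 2.5))  # True (hypothetical margins)
```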
    2   The de minimis threshold for less-than-fair-value
    investigations is 2%. 19 U.S.C. § 1673d(a)(4) (incorporat-
    ing the 2% value provided in section 1673b(b)(3)).
    As alluded to above, the average-to-average compari-
    son method involves subtracting the weighted average of
    the export prices for a particular product group from the
    weighted average of the home market prices for that prod-
    uct group and multiplying the result by the total number
    of export units sold for that product group. 3 See 19 C.F.R.
    § 351.414(b)(1) and (d)(1).
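A minimal sketch of the average-to-average arithmetic for a single product group follows. It reads footnote 3's "weighted average" as the usual quantity-weighted mean (total value divided by total units); sales are hypothetical (price, quantity) pairs and the function names are illustrative only.

```python
def weighted_average_price(sales):
    """Quantity-weighted mean price: total value divided by total units."""
    total_units = sum(q for _, q in sales)
    return sum(p * q for p, q in sales) / total_units


def average_to_average_amount(home_sales, export_sales):
    """(weighted avg home price - weighted avg export price) x export units.

    A negative result means the export average exceeds the home average.
    """
    diff = weighted_average_price(home_sales) - weighted_average_price(export_sales)
    return diff * sum(q for _, q in export_sales)


home = [(100.0, 5), (110.0, 5)]   # hypothetical home-market sales
export = [(95.0, 4)]              # hypothetical export sales
print(average_to_average_amount(home, export))  # 40.0
```

The home weighted average is 105 and the export weighted average is 95, so the difference of 10 is multiplied by the 4 export units.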
    The average-to-transaction method involves subtract-
    ing each individual export price for a particular product
    group from the weighted average of the home market prices
    for that product group in an iterative fashion, and sum-
    ming the results. See id. § 351.414(b)(3). Notably, when
    applying the average-to-transaction method, Commerce
    “zeroes out” iterations that produce a negative dumping
    margin (i.e., when the weighted average home market price
    is less than an individual export price), a practice known
    as “zeroing.” Mid Continent Steel & Wire, Inc. v. United
    States, 940 F.3d 662, 671–72 (Fed. Cir. 2019).
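The average-to-transaction comparison with zeroing can be sketched as below. The opinion does not say how each transaction is weighted; this sketch assumes each per-transaction difference is scaled by that transaction's units, paralleling the average-to-average description. The `zeroing` flag and function names are hypothetical and exist only to show the contrast.

```python
def weighted_average_price(sales):
    """Quantity-weighted mean price: total value divided by total units."""
    total_units = sum(q for _, q in sales)
    return sum(p * q for p, q in sales) / total_units


def average_to_transaction_amount(home_sales, export_sales, zeroing=True):
    """Sum of per-transaction dumping amounts for one product group.

    Each export sale is compared with the weighted average home price.
    With zeroing, a negative per-transaction result counts as zero.
    """
    home_avg = weighted_average_price(home_sales)
    total = 0.0
    for price, units in export_sales:
        amount = (home_avg - price) * units  # assumption: scaled by units
        if zeroing:
            amount = max(amount, 0.0)
        total += amount
    return total


home = [(100.0, 20)]                  # hypothetical home-market sales
export = [(80.0, 10), (120.0, 10)]    # one low-priced, one high-priced sale
print(average_to_transaction_amount(home, export))                 # 200.0
print(average_to_transaction_amount(home, export, zeroing=False))  # 0.0
```

With these hypothetical sales the export prices average exactly to the home price, so an average-to-average comparison would show no dumping; comparing transaction by transaction and zeroing the high-priced sale exposes the low-priced one. This is the "masking" effect described in Apex II.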
    Both methods result in dumping margins that Com-
    merce then aggregates across the product groups. See 19
    U.S.C. § 1677(35)(A) and (B) (defining “[d]umping margin”
    and “[w]eighted average dumping margin”). Finally, Com-
    merce divides the aggregate dumping margin by the total
    value of the export sales, yielding the weighted average
    dumping margin. See id. If the weighted average dumping
    margin is greater than the de minimis threshold, Com-
    merce makes a final determination that the respondent is
    selling goods in the United States at less than fair value,
    which can lead to the entry of an antidumping duty order.
    See id. §§ 1673d, 1673e.
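The final aggregation step can be sketched as a single division: the per-group dumping amounts are summed and divided by the total value of export sales. The numbers below are hypothetical, the function name is illustrative, and the sketch simply sums whatever per-group amounts it receives.

```python
def weighted_average_margin_pct(group_amounts, group_export_values):
    """Aggregate dumping amounts across product groups, then divide by
    the total value of export sales; returns a percentage."""
    total_amount = sum(group_amounts)
    total_value = sum(group_export_values)
    return 100.0 * total_amount / total_value


amounts = [200.0, 50.0]      # hypothetical per-group dumping amounts
values = [2000.0, 8000.0]    # hypothetical per-group export sales values
margin = weighted_average_margin_pct(amounts, values)
print(margin)          # 2.5
print(margin >= 2.0)   # True: not de minimis under the 2% threshold
```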
    3   Calculating the “weighted average” of a group of
    sales prices simply requires multiplying each sales price by
    the number of units sold at that price and computing the
    average of the resulting values.
    In this case, Commerce applied its differential pricing
    analysis to SeAH’s sales of welded line pipe and selected
    the hybrid approach for calculating SeAH’s weighted aver-
    age dumping margin. J.A. 10451; see also Final Memo at
    4. That approach resulted in a weighted average dumping
    margin of 2.53%, which is above the de minimis threshold.
    Final Determination, 80 Fed. Reg. at 61,367.
    SeAH appealed to the Trade Court. Among other is-
    sues, SeAH challenged specific aspects of Commerce’s dif-
    ferential pricing analysis and Commerce’s rejection of
    SeAH’s case brief. Stupp Corp. v. United States, 359 F.
    Supp. 3d 1293, 1297 (Ct. Int’l Trade 2019) (“Stupp I”).
    The Trade Court affirmed. Id. 4
    4    The Trade Court subsequently denied SeAH’s mo-
    tion for reconsideration. Stupp Corp. v. United States, 365
    F. Supp. 3d 1373 (Ct. Int’l Trade 2019) (“Stupp II”). The
    court later issued two additional decisions in this case that
    are not pertinent to this appeal.
    II
    A
    SeAH contends on appeal that Commerce acted unlaw-
    fully when it rejected SeAH’s case brief. SeAH submitted
    its case brief on September 1, 2015, more than three
    months after Commerce issued its preliminary determina-
    tion on May 14, 2015. In that case brief, SeAH cited for the
    first time certain academic articles in support of its argu-
    ment that Commerce was misusing the Cohen’s d test. See
    J.A. 9582–92. SeAH also presented results from a statisti-
    cal analysis showing that its U.S. sales data were not nor-
    mally distributed. J.A. 9586–87. Additionally, SeAH
    presented the results from its own application of Com-
    merce’s differential pricing analysis to ten hypothetical da-
    tasets that it generated based on the sales data in this case.
    J.A. 9582. The results identified disparate pricing patterns
    in five of those randomly generated datasets. According to
    SeAH, those results demonstrated that Commerce’s differ-
    ential pricing analysis produces false positives.
    Commerce rejected those portions of SeAH’s case brief
    because of several procedural violations. J.A. 9698. Com-
    merce first noted that those portions of SeAH’s case brief
    contained “factual information” and that such information
    likely fell under either subparagraph (iv) or (v) of 19 C.F.R.
    § 351.102(b)(21). 5 According to Commerce, SeAH failed to
    identify the subparagraph of section 351.102(b)(21) under
    which that factual information was being submitted, as re-
    quired by 19 C.F.R. § 351.301(b). Commerce added that if
    that factual information fell within the catch-all provision
    of subparagraph (v), SeAH failed to satisfy section
    351.301(b)(1), which required SeAH to explain why that
    factual information did not fall within subparagraphs (i)
    through (iv). Finally, Commerce found that SeAH’s sub-
    mission of that factual information was untimely under the
    deadlines set out in 19 C.F.R. § 351.301(c). 6 The Trade
    Court upheld Commerce’s rejection of SeAH’s case brief.
    Stupp I, 359 F. Supp. 3d at 1299–1302. We review Com-
    merce’s rejection of SeAH’s case brief for an abuse of dis-
    cretion. See Micron Tech., Inc. v. United States, 117 F.3d
    1386, 1396 (Fed. Cir. 1997).
    5  As relevant here, subparagraph (iv) covers evi-
    dence submitted by a party to rebut, clarify, or correct cer-
    tain evidence placed on the record by Commerce.
    Subparagraph (v) covers all evidence not covered by sub-
    paragraphs (i) through (iv) as well as evidence submitted
    by a party to rebut, clarify, or correct such evidence.
    6  Commerce reasoned that if SeAH’s factual infor-
    mation fell within the catch-all provision of subparagraph
    (v), then section 351.301(c)(5) required SeAH to submit
    that information at least 30 days before Commerce’s pre-
    liminary determination. SeAH failed to meet that deadline
    because it submitted that information more than three
    months after the preliminary determination. J.A. 9698.
    Although Commerce did not separately analyze the timing
    requirement for factual information submitted under sub-
    paragraph (iv) of section 351.102(b)(21), SeAH does not
    contend on appeal that its submission would have been
    timely under that requirement.
    SeAH argues that Commerce’s rejection of the case
    brief was contrary to the position Commerce took in Anti-
    dumping Duties; Countervailing Duties, 62 Fed. Reg.
    27,296 (Dep’t of Commerce May 19, 1997) (notice of final
    rule), where Commerce stated:
    Parties are free to comment on verification reports
    and to make arguments concerning information in
    the reports up to and including the filing of case
    and rebuttal briefs . . . . In making their argu-
    ments, parties may use factual information already
    on the record or may draw on information in the
    public realm to highlight any perceived inaccura-
    cies in a report.
    Id. at 27,332. SeAH contends that the academic articles it
    cited in its case brief are in the “public realm” and that its
    statistical analyses are derived from data “already on the
    record.” According to SeAH, Commerce’s decision directing
    SeAH to remove those materials from its case brief was
    therefore inconsistent with Commerce’s publicly an-
    nounced policy, and requires reversal.
    SeAH misunderstands Commerce’s statements in the
    1997 notice of final rule. In that notice, Commerce ex-
    plained that the exception to section 351.301(c) allowing
    parties to reference factual information already on the rec-
    ord or in the public realm pertains only to a party’s use of
    factual information to highlight perceived inaccuracies “in
    a report.” Id. The context of the exception makes clear that
    “report” means a “verification report[].” Id.
    Commerce may issue a verification report before issu-
    ing a final determination to “verify relevant factual infor-
    mation” that it previously gathered pursuant to its
    investigation or review. 19 C.F.R. § 351.307(a). Commerce
    issued a verification report in this case pertaining to
    SeAH’s sales data. See Stupp I, 359 F. Supp. 3d at 1308.
    However, SeAH’s references to academic articles and sta-
    tistical analyses in its case brief were not directed at cor-
    recting perceived inaccuracies in Commerce’s verification
    report. Instead, SeAH used those materials to support its
    challenge to Commerce’s differential pricing analysis, and
    in particular its challenge to the manner in which Com-
    merce applied the Cohen’s d test in the preliminary deter-
    mination. See J.A. 9582–92; see also Appellant’s Opening
    Br. 49 (“SeAH’s case brief to Commerce included discus-
    sions . . . concerning statistical practices and the meaning
    of and requirements for using Cohen’s d.”). Because SeAH
    was not rebutting factual conclusions in Commerce’s veri-
    fication report, SeAH’s submission of factual information
    did not fall within the exception to the requirements of 19
    C.F.R. § 351.301(c) described in the 1997 notice of final
    rule. 7 SeAH’s submission was thus untimely and failed to
    satisfy other procedural requirements set forth in section
    351.301 of Commerce’s regulations.
    7   Commerce has interpreted the exception set forth
    in the 1997 notice of final rule in the same manner. See,
    e.g., Issues and Decision Memorandum for Antidumping
    Duty Administrative Review of Hot-Rolled Carbon Steel
    from India, 69 ITADOC 36,060 (Dep’t of Commerce June
    28, 2004), available at
    https://enforcement.trade.gov/frn/summary/india/04-14620-1.pdf
    (permitting a party to submit financial statements in a
    January 2004 case brief when those statements were in the
    “public realm” and addressed conclusions in Commerce’s
    December 2003 verification report).
    More broadly, SeAH argues that Commerce’s rejection
    of SeAH’s case brief was contrary to the underlying pur-
    pose of section 351.301(c). SeAH reasons that none of the
    submitted factual information required verification by
    Commerce, and that allowing that information into the rec-
    ord would not have delayed the investigation. Relatedly,
    SeAH argues that Commerce has permitted post-deadline
    submissions of similar factual information in other in-
    stances, contrary to Commerce’s interpretation of its regu-
    lations.
    Commerce is entitled to broad discretion regarding the
    manner in which it develops the record in an antidumping
    investigation. See PSC VSMPO-Avisma Corp. v. United
    States, 688 F.3d 751, 760 (Fed. Cir. 2012) (“[C]ourts will
    defer to the judgment of an agency regarding the develop-
    ment of the agency record.”); Micron Tech., 117 F.3d at 1396
    (“Congress has implicitly delegated to Commerce the
    latitude to derive verification procedures ad hoc.”); Am. Al-
    loys, Inc. v. United States, 30 F.3d 1469, 1475 (Fed. Cir.
    1994) (“[T]he statute gives Commerce wide latitude in its
    verification procedures.”). Mindful of that standard, we
    will not second-guess Commerce’s application of the proce-
    dural requirements governing the submission of factual in-
    formation in case briefs.
    As for SeAH’s contention that Commerce has permitted
    other parties to make untimely submissions of factual in-
    formation in the past, the Supreme Court has explained
    that an agency is “entitled to a measure of discretion in ad-
    ministering its own procedural rules,” and that as a gen-
    eral principle, it is within the discretion of an
    administrative agency “to relax or modify its procedural
    rules adopted for the orderly transaction of business before
    it when in a given case the ends of justice require it.” Am.
    Farm Lines v. Black Ball Freight Serv., 397 U.S. 532, 538–39
    (1970). Short of a showing that Commerce’s enforcement of
    its procedural rules is so haphazard or unreasonable as to
    be arbitrary or capricious—which SeAH has not
    shown to be the case—Commerce’s failure to apply those
    rules with Procrustean consistency in every case does not
    deprive it of the authority to enforce those rules in any
    case. We conclude, therefore, that Commerce’s rejection of
    SeAH’s case brief was not an abuse of discretion.
    B
    With respect to the standard for reviewing Commerce’s
    selection of the statistical tests and numerical cutoffs used
    in this case, SeAH contends that “substantial evidence” is
    the appropriate standard. SeAH points out that Commerce
    did not adopt its differential pricing analysis with the ben-
    efit of notice-and-comment rulemaking. 8 SeAH asserts
    that Commerce’s public announcements regarding its dif-
    ferential pricing analysis amount to mere policy state-
    ments. Such policy statements, SeAH argues, “are not
    legally binding,” and the agency may not rely on them to
    justify applying differential pricing analysis in every case.
    Appellant’s Opening Br. 33–35. Pointing to our decision in
    Washington Red Raspberry Commission v. United States,
    859 F.2d 898 (Fed. Cir. 1988), SeAH argues that the proper
    standard for reviewing Commerce’s choice of methodology
    is whether “the record contains substantial evidence sup-
    porting [Commerce’s] basis for its application of [certain
    statistical principles].” Appellant’s Opening Br. 36 (quoting
    Red Raspberry, 859 F.2d at 903).
    8  Commerce issued a “Request for Comments” an-
    nouncing its “Differential Pricing Analysis” methodology
    before it instituted the investigation in this case. See 79
    Fed. Reg. 26,720. However, Commerce has not issued a
    formal rule adopting that methodology.
    The Trade Court rejected SeAH’s arguments on this is-
    sue, reasoning that the substantial evidence standard ap-
    plies to “the outputs” of Commerce’s statistical analysis,
    not to Commerce’s “interpretation of a statute.” Stupp II,
    365 F. Supp. 3d at 1378. SeAH’s labeling of the differential
    pricing analysis as a “general policy statement” was inac-
    curate, according to the court. Id. The differential pricing
    analysis was instead “the result of Commerce interpreting
    19 U.S.C. § 1677f-1(d)(1)(B) and devising a methodology to
    effectuate that interpretation.” Stupp II, 365 F. Supp. 3d
    at 1378–79. For that reason, the court held that the stand-
    ard for reviewing Commerce’s choice of methodology was
    whether that methodology “reasonably implements a given
    statutory directive.” Id. at 1378.
    We agree with the Trade Court. Contrary to SeAH’s
    suggestion, Commerce’s differential pricing analysis is an
    interpretive rule, not a general statement of policy. A pol-
    icy statement “advise[s] the public prospectively of the
    manner in which the agency proposes to exercise a discre-
    tionary power.” Lincoln v. Vigil, 508 U.S. 182, 197 (1993)
    (quoting Chrysler Corp. v. Brown, 441 U.S. 281, 302 n.31
    (1979)). As illustrated in the Lincoln case, an example of
    an agency’s exercise of a discretionary power is the decision
    of the Department of Health and Human Services to cease
    allocating funds to a particular program when the funds
    had originally been appropriated to the Department as a
    lump sum without statutory restrictions. Id.
    In this case, while Commerce’s decision to consider ap-
    plying the average-to-transaction method is within its dis-
    cretionary power, 9 its determination of whether the
    average-to-transaction method is appropriate in a particu-
    lar case is not solely within its discretion, because that de-
    termination is confined by the statutory language of 19
    U.S.C. § 1677f-1(d)(1)(B): (i) there must be a “pattern of ex-
    port prices . . . that differ significantly among purchasers,
    regions, or periods of time,” and (ii) Commerce must “ex-
    plain[] why such differences cannot be taken into account”
    using the average-to-average method. Commerce’s differ-
    ential pricing analysis is an interpretation of that statutory
    language and thus constitutes an interpretive rule. See Pe-
    rez v. Mortg. Bankers Ass’n, 575 U.S. 92, 97 (2015) (stating
    that interpretive rules are “issued by an agency to advise
    the public of the agency’s construction of the statutes and
    rules which it administers” (quoting Shalala v. Guernsey
    Mem’l Hosp., 514 U.S. 87, 99 (1995))).
    9  The statute defines an optional “[e]xception” to the
    general rule that Commerce use the average-to-average
    method (or transaction-to-transaction method): “The ad-
    ministering authority may determine whether the subject
    merchandise is being sold in the United States at less than
    fair value [using the average-to-transaction method] . . . .”
    19 U.S.C. § 1677f-1(d)(1)(B) (emphasis added).
    In the alternative, and somewhat contradictorily,
    SeAH argues that Commerce’s adoption of its differential
    pricing analysis constitutes a legislative rule that could be
    adopted only by notice-and-comment rulemaking. SeAH
    contends that it is “doubtful” that Commerce’s differential
    pricing analysis is merely an interpretive rule, because
    Commerce’s decision to apply that analysis resulted in
    SeAH’s weighted average dumping margin crossing the de
    minimis threshold. Appellant’s Opening Br. 32–35.
    SeAH misunderstands the distinction between inter-
    pretive and legislative rules. Legislative rules alter the
    landscape of individual rights and obligations, binding par-
    ties with the force and effect of law; interpretive rules, on
    the other hand, merely clarify existing duties for affected
    parties. Kisor v. Wilkie, 139 S. Ct. 2400, 2420 (2019);
    Splane v. West, 216 F.3d 1058, 1063 (Fed. Cir. 2000).
    Hence, the relevant distinction is not whether a newly
    adopted rule changes the outcome of a particular case; the
    relevant distinction is whether the rule is “an attempt to
    make new law or modify existing law,” as opposed to
    merely “represent[ing] the agency’s reading of [existing]
    statutes.” Id.; see also Am. Postal Workers Union, AFL-CIO
    v. U.S. Postal Serv., 
    707 F.2d 548
    , 560 (D.C. Cir. 1983)
    (“[T]he impact of a rule has no bearing on whether it is leg-
    islative or interpretative; interpretative rules may have a
    substantial impact on the rights of individuals.” (citing 2
    K. Davis, Administrative Law Treatise § 7:8, at 39 (2d ed.
    1979))).
    Commerce’s differential pricing analysis does not make
    new law or modify existing law—it interprets the statutory
    provision that applies to patterns of significantly differing
    export prices by providing a mechanism for identifying
    such patterns. See Guernsey Mem’l Hosp., 514 U.S. at
    97–100 (agency’s rule requiring amortization of reimbursable
    defeasance losses was an interpretive rule implementing
    the statutory mandate that Medicare reimburse only the
    “necessary costs of efficiently delivering covered services to
    individuals covered”); POET Biorefining, LLC v. EPA, 970
    F.3d 392, 408 (D.C. Cir. 2020) (“If an agency’s interpreta-
    tion were a legislative rule simply because it drew ‘crisper
    and more detailed lines than the authority being inter-
    preted,’ then ‘no rule could pass as an interpretation of a
    legislative rule unless it were confined to parroting the rule
    or replacing the original vagueness with another’—a re-
    gime we have squarely rejected. . . . Rules that are fairly
    drawn from underlying statutes or regulations may articu-
    late even relatively detailed legal obligations without
    thereby becoming legislative rules subject to notice and
    comment.” (quoting Am. Mining Cong. v. Mine Safety &
    Health Admin., 995 F.2d 1106, 1112 (D.C. Cir. 1993))).
    Our precedents make clear that the relevant standard
    for reviewing Commerce’s selection of statistical tests and
    numerical cutoffs is reasonableness, not substantial evi-
    dence. See, e.g., Mid Continent, 940 F.3d at 667 (“In carry-
    ing out its statutorily assigned tasks, Commerce has
    discretion to make reasonable choices within statutory con-
    straints.” (collecting cases)); Apex II, 862 F.3d at 1346
    (holding Commerce’s “meaningful difference” test to be
    “reasonable”); JBF, 790 F.3d at 1363, 1367 (holding that
    Commerce’s interpretation of 19 U.S.C. § 1677f-1(d)(1)(B)(i)
    was reasonable and that “[b]ecause Congress
    did not provide for a direct methodology, Commerce
    properly filled that gap” (cleaned up)).
    Our decision in Red Raspberry is not to the contrary.
    In that case, we applied the substantial evidence standard
    to review Commerce’s determination that a particular re-
    spondent’s dumping margin was de minimis and that the
    respondent should therefore be excluded from the anti-
    dumping duty order. 859 F.2d at 903. At the time of Com-
    merce’s 1985 final determination in that case, there was no
    statute defining a de minimis threshold or expressly au-
    thorizing a de minimis rule, and Commerce had not
    adopted or announced any rule defining and supporting a
    de minimis threshold. 10 Further, Commerce did not adopt
    a general definition of de minimis dumping in the Red
    Raspberry case, but simply determined that the particular
    dumping margin before it in that case was de minimis and
    insufficient to support an antidumping duty order. 11
    Hence, unlike in this case, Commerce made factual deter-
    minations in Red Raspberry without previously announc-
    ing a rule governing those determinations and without
    interpreting statutory language expressly authorizing
    those determinations to be made. It was thus appropriate
    for us to ask whether Commerce’s decision that a particu-
    lar dumping margin was de minimis was supported by sub-
    stantial evidence in the context of the particular
    investigation under review. See Red Raspberry, 859 F.2d
    at 903.
    10 See Red Raspberry, 859 F.2d at 902 (“Congress has
    not expressly authorized the ITA to ignore de minimis or
    negligible dumping margins.”). The current statute defin-
    ing the de minimis threshold, 19 U.S.C. § 1673b(3), was not
    enacted until December 8, 1994. See Uruguay Round
    Agreements Act, Pub. L. No. 103-465, 108 Stat. 4809. Com-
    merce did not publish its rule establishing the de minimis
    threshold until 1987. See Carlisle Tire & Rubber Co. v.
    United States, 634 F. Supp. 419, 422–23 (Ct. Int’l Trade
    1986) (cited with approval in Red Raspberry, 859 F.2d at
    903) (“So far as the Court is aware, Commerce has never
    proposed a rule, or even claimed, that a .5 percent test ap-
    plies in all cases. . . . Even though there is no ‘rule’ that
    margins less than .5 percent are de minimis, Commerce
    may find that margins of approximately .45 percent are de
    minimis in this investigation. To do this Commerce must
    explain the basis for its decision.”).
    11  Red Raspberries from Canada; Final Determina-
    tion, 50 Fed. Reg. 19,768, 19,772 (Dep’t of Commerce May
    10, 1985); see also Red Raspberries from Canada; Prelimi-
    nary Determination, 49 Fed. Reg. 49,129-01, 49,131 (Dep’t
    of Commerce Dec. 18, 1984).
    In this case, by contrast, Commerce applied its differ-
    ential pricing analysis, a general approach that Commerce
    defined in a prior publication, see 79 Fed. Reg. 26,720, as a
    methodology for implementing the statutory directive in
    section 1677f-1(d)(1)(B). The appropriate standard for re-
    viewing Commerce’s differential pricing analysis and the
    specific components of that methodology is therefore rea-
    sonableness. See Mid Continent, 940 F.3d at 667; JBF, 790
    F.3d at 1363–64.
    C
    Turning to the merits of Commerce’s differential pric-
    ing analysis, SeAH contends that Commerce provided no
    substantive justification for its ratio test, and that the ratio
    test is otherwise not supported by evidence. Specifically,
    SeAH argues that Commerce has provided no justification,
    whether derived from general statistical principles or
    based on the facts of this case, for using the 33% and 66%
    cutoffs employed in that test. According to SeAH, Com-
    merce’s explanation of those cutoffs simply “repeat[s] [the]
    unsupported assertion that the cut-offs achieve the pur-
    poses for which Commerce wants to use them.” Appellant’s
    Opening Br. 45. SeAH argues that Commerce was re-
    quired “to explain why the particular cut-offs it had chosen
    were appropriate in the specific circumstances of this case.
    And, it was also required to point to substantial evidence
    that supported those explanations.” Id. at 45–46. We dis-
    agree.
    As a preliminary matter, Commerce has explained that
    the ratio test is not the ultimate determinant of masked
    dumping. See Issues and Decision Memorandum for Anti-
    dumping Duty Administrative Review of Polyethylene Ter-
    ephthalate Film from India, 80 ITADOC 11,160 (Dep’t of
    Commerce Mar. 2, 2015), available at
    https://enforcement.trade.gov/frn/summary/india/2015-04273-1.pdf (“A
    determination that there exists a pattern of prices that dif-
    fer significantly in no way indicates that dumping is being
    masked in a meaningful way.”). Rather, the ratio test is a
    preliminary step “aggregat[ing] the results of the compari-
    sons of the means between the test and comparison groups
    to gauge the extent of the significant differences in prices,”
    i.e., the “effect size[s].” Id.
    More importantly, there is no statutory language tell-
    ing Commerce how to detect patterns of significantly dif-
    fering export prices, much less how to aggregate and
    quantify pricing comparisons across product groups in or-
    der to select a statutorily defined comparison method. See
    19 U.S.C. § 1677f-1(d)(1)(A)–(B). Commerce therefore has
    discretion to determine a reasonable methodology to imple-
    ment the statutory directive. See JTEKT Corp. v. United
    States, 642 F.3d 1378, 1383 (Fed. Cir. 2011). At the highest
    level of abstraction, Commerce is using a conventional
    method for quantifying comparisons across discrete
    groups: counting the number of divergent sales prices, as
    identified by an effect-size test, and calculating the popu-
    lation percentage of those divergent sales prices. We hold
    that general approach to be reasonable.
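A minimal Python sketch of the counting step described above. It assumes each U.S. sale has already been flagged as passing or not passing the effect-size test, with a sale treated as passing when its purchaser, region, or time-period group passed. The optional `sale_values` weighting is an assumption: it reflects the way Commerce's published descriptions frame the ratio in terms of sales value. With equal values, the result reduces to the simple population percentage the court describes. The function name is hypothetical.

```python
from typing import Sequence


def ratio_test_share(passes: Sequence[bool], sale_values: Sequence[float]) -> float:
    """Return the share of total U.S. sales value whose prices 'pass' Cohen's d.

    passes[i] is True when sale i belongs to a test group (purchaser, region,
    or time period) whose effect size met the cutoff. With equal sale values,
    the result is simply the percentage of divergent sales.
    """
    if len(passes) != len(sale_values):
        raise ValueError("passes and sale_values must have the same length")
    total = sum(sale_values)
    if total <= 0:
        raise ValueError("total sales value must be positive")
    passing = sum(value for flag, value in zip(passes, sale_values) if flag)
    return passing / total


# Hypothetical data: four equally valued sales, two of which passed.
print(ratio_test_share([True, False, True, False], [100.0] * 4))  # 0.5
```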
    Commerce has justified its more specific selection of
    the 33% and 66% cutoffs. Regarding the 33% cutoff, Com-
    merce explained that “when a third or less of a respondent’s
    U.S. sales are not at prices that differ significantly, then
    these significantly different prices are not extensive
    enough to satisfy the first requirement of the statute.” Is-
    sues and Decision Memorandum for Administrative Review
    of the Antidumping Duty Order on Certain Steel Nails from
    the Republic of Korea, 84 ITADOC 56,424 (Dep’t of Com-
    merce Oct. 16, 2019), available at
    https://enforcement.trade.gov/frn/summary/korea-south/2019-22992-1.pdf.
    Likewise, “given its growing experience of applying
    section 777A(d)(1)(B) of the Act and the application of the
    [average-to-transaction] method as an alternative to the
    [average-to-average] method,” Commerce has found that
    “when two thirds or more of a respondent’s sales are at
    prices that differ significantly, then the extent of these
    sales is so pervasive that it would not permit [Commerce]
    to separate the effect of the sales where prices differ signif-
    icantly from those where prices do not differ significantly.”
    Id. Finally, “when [Commerce] finds that between one
    third and two thirds of U.S. sales are at prices that differ
    significantly, then there exists a pattern of prices that dif-
    fer significantly, and . . . the effect of this pattern can rea-
    sonably be separated from the sales whose prices do not
    differ significantly.” Id. In the latter two situations, Com-
    merce will merely “consider[]” applying the average-to-
    transaction method, a decision that is ultimately dictated
    by the meaningful difference test. See id.
    Commerce’s selection of the 33% and 66% cutoffs is a
    reasonable choice. An alternative approach might be, for
    example, to use a single cutoff at 50%. That approach
    would undoubtedly favor some respondents—the more fre-
    quent application of the average-to-average method would
    result in more de minimis dumping margins—but it would
    disfavor other respondents. For example, respondents hav-
    ing slightly more than 50% of their sales passing the Co-
    hen’s d test would have the average-to-transaction method
    applied to all of their sales. Commerce’s approach is less
    rigid, providing a middle ground between 33% and 66%, in
    which the average-to-transaction method is only partially
    applied. That approach provides a better fit, minimizing
    both the assessment of antidumping duties that are too
    high and the assessment of duties that are too low. We
    conclude that Commerce’s cutoffs are reasonable in light of
    the alternatives.
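The sketch below makes the comparison concrete. For a given share of passing sales, it contrasts the fraction of sales for which the average-to-transaction method would be considered under the three-tier approach with the fraction under a hypothetical single 50% cutoff. The cutoffs are written as 0.33 and 0.66, as the court states them. A share exactly at a cutoff is placed in the outer tier, following the "a third or less" and "two thirds or more" language quoted above. The functions are illustrative only, not Commerce's code, and each result is merely a candidate for the meaningful difference test.

```python
def tiered_coverage(share: float, lower: float = 0.33, upper: float = 0.66) -> float:
    """Fraction of U.S. sales for which A-to-T would be *considered* under the
    three-tier ratio test (the meaningful difference test decides the rest)."""
    if share <= lower:
        return 0.0    # pattern not extensive enough: A-to-A for all sales
    if share >= upper:
        return 1.0    # too pervasive to separate: consider A-to-T for all sales
    return share      # middle band: consider A-to-T only for the passing sales


def single_cutoff_coverage(share: float, cutoff: float = 0.5) -> float:
    """Hypothetical all-or-nothing alternative with one cutoff at 50%."""
    return 1.0 if share > cutoff else 0.0


for share in (0.30, 0.49, 0.51, 0.70):
    print(f"share={share:.2f}  tiered={tiered_coverage(share):.2f}  "
          f"single={single_cutoff_coverage(share):.2f}")
```

Near 50% the two rules diverge sharply. At a share of 0.49, the single cutoff applies A-to-T to nothing, while the tiered rule considers it for the 49% of sales that passed. At 0.51, the single cutoff sweeps in every sale.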
    SeAH is mistaken when it asserts that Commerce must
    demonstrate the propriety of the ratio test with respect to
    the particular facts of this case. As discussed above, Com-
    merce’s burden in selecting a methodology for detecting
    patterns of significantly differing export prices is reasona-
    bleness as a matter of law, not substantial evidence on the
    factual record. SeAH was free to make factual arguments
    regarding why it was inappropriate to apply the ratio test
    in this case, but it chose not to do so. Instead, SeAH has
    challenged the appropriateness of the ratio test in the ab-
    stract (e.g., by contending that the test and its cutoffs are
    “arbitrary”) and wrongly attempts to place the burden on
    Commerce to justify the use of that test as a matter of sub-
    stantial evidence in light of the facts of this case.
    For those reasons, we hold that Commerce’s ratio test
    reasonably implements the statutory requirement that
    Commerce determine whether there is “a pattern of export
    prices” “differ[ing] significantly among purchasers, re-
    gions, or periods of time” before selecting the average-to-
    transaction method. 19 U.S.C. § 1677f-1(d)(1)(B)(i).
    D
    SeAH next challenges Commerce’s “meaningful differ-
    ence” test. SeAH argues that Commerce’s use of that test
    fails to satisfy the statutory requirement that Commerce
    “explain[]” why significantly differing export prices among
    different purchasers, regions, or time periods “‘cannot be
    taken into account using’ [the] average-to-average
    [method].” Appellant’s Opening Br. 54–55 (quoting 19
    U.S.C. § 1677f-1(d)(1)(B)). According to SeAH, Commerce
    must show that the average-to-transaction method is more
    “accurate” than the average-to-average method in order to
    satisfy that statutory requirement. Id. at 56. SeAH fur-
    ther contends that the meaningful difference test identifies
    disparities between the results of the two methods only be-
    cause the average-to-transaction method includes zeroing,
    while the average-to-average method does not.
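The zeroing point can be illustrated with a simplified sketch that uses one averaging group, a single normal value, and hypothetical sales. Under offsetting (the average-to-average style), a sale priced above normal value cancels one priced below it. Under zeroing (the average-to-transaction style), negative comparison results are set to zero. The meaningful-difference criteria coded here are taken from Commerce's general descriptions of the test, not from this opinion, and should be treated as assumptions: a 25% relative change with both margins at or above de minimis, or a move across the de minimis threshold, using 2% as the de minimis rate in an investigation.

```python
from typing import Sequence, Tuple

Sale = Tuple[float, float]  # (net U.S. price per unit, quantity)


def margin_offsetting(normal_value: float, sales: Sequence[Sale]) -> float:
    """A-to-A style: compare normal value with the weighted-average U.S. price,
    so sales above and below normal value offset each other."""
    quantity = sum(q for _, q in sales)
    value = sum(p * q for p, q in sales)
    average_price = value / quantity
    dumping = (normal_value - average_price) * quantity
    return max(dumping, 0.0) / value


def margin_zeroing(normal_value: float, sales: Sequence[Sale]) -> float:
    """A-to-T style with zeroing: compare each sale on its own and treat
    any negative comparison result as zero."""
    value = sum(p * q for p, q in sales)
    dumping = sum(max(normal_value - p, 0.0) * q for p, q in sales)
    return dumping / value


def meaningful_difference(a2a: float, alt: float, de_minimis: float = 0.02) -> bool:
    """Assumed criteria: the margin crosses the de minimis threshold, or both
    margins are at or above it and differ by at least 25% relative to A-to-A."""
    if a2a < de_minimis <= alt:
        return True
    if a2a >= de_minimis and alt >= de_minimis:
        return abs(alt - a2a) / a2a >= 0.25
    return False


# Hypothetical: normal value of 100 and four U.S. sales of 10 units each.
sales = [(90.0, 10.0), (110.0, 10.0), (95.0, 10.0), (105.0, 10.0)]
a2a = margin_offsetting(100.0, sales)
a2t = margin_zeroing(100.0, sales)
print(a2a, a2t, meaningful_difference(a2a, a2t))
```

In this toy case, the entire gap between the two margins comes from zeroing. With a single normal value, summing each sale's unzeroed comparison gives exactly the same total as the averaged comparison.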
    Our prior decision in Apex II disposes of SeAH’s chal-
    lenges to the “meaningful difference” test. In that case, we
    addressed and rejected the argument that “Commerce’s
    meaningful difference test is unreasonable because it is in-
    consistent with the statute’s text.” 862 F.3d at 1347. The
    appellant in that case argued that the meaningful differ-
    ence test improperly conflated the ultimate margin calcu-
    lation with the task of explaining why the average-to-
    average method could not account for differences in prices.
    Id. We rejected that argument, and we also rejected the
    argument that the meaningful difference test was flawed
    because it simply measured differences in dumping mar-
    gins caused by zeroing. Id. at 1348–49.
    Seeking to distinguish Apex II, SeAH argues that we
    did not hold in that case that comparisons of the margin
    calculations from the average-to-average and average-to-
    transaction methods “are always sufficient in and of them-
    selves.” Appellant’s Opening Br. 58–59. SeAH is mis-
    taken; our holding in that case had two parts: (1)
    Commerce’s meaningful difference test is a reasonable
    response to the statutory directive to explain why the av-
    erage-to-average method is inadequate in certain cases,
    and (2) the meaningful difference test is sufficient to satisfy
    that directive. See 862 F.3d at 1348–49 (“Commerce’s
    methodology compares the [average-to-average] and [aver-
    age-to-transaction] methodologies, as they are applied in
    practice, and in a manner this court has expressly con-
    doned. . . . Commerce’s chosen methodology reasonably
    achieves the overarching statutory aim of addressing tar-
    geted or masked dumping.”). Accordingly, we affirm Com-
    merce’s use of the meaningful difference test.
    E
    SeAH next challenges Commerce’s use of the 0.8 cutoff
    for determining whether particular results “pass” the Co-
    hen’s d test. SeAH has two arguments: First, SeAH argues
    that Commerce’s selection of the 0.8 cutoff was arbitrary.
    Second, SeAH argues that Commerce’s application of the
    0.8 cutoff in this case was unsupported by evidence because
    Professor Cohen’s suggestion that “0.8 could be considered
    a ‘large’ effect size” was limited to comparisons involving
    data that met certain restrictive conditions—“in particu-
    lar, that the datasets being compared had roughly the
    same number of data points, were drawn from normal dis-
    tributions, and had approximately equal variances.” Ap-
    pellant’s Opening Br. 27–28. According to SeAH, none of
    those conditions were satisfied in this case. Id.
    We addressed the crux of SeAH’s first argument in our
    decision in Mid Continent: “[Appellant] next challenges
    Commerce’s reliance on a d ratio of at least 0.8 as a rigid
    measure of significance of the difference measured by the
    Cohen’s d test. . . . This is a challenge to the reasonable-
    ness of Commerce’s choice of one part of the overall analy-
    sis of differential pricing . . . .” 940 F.3d at 673. We held
    that “the 0.8 standard is ‘widely adopted’ as part of a ‘com-
    monly used measure’ of the difference relative to such over-
    all price dispersion . . . . [I]t is reasonable to adopt that
    measure where there is no better, objective measure of ef-
    fect size.” Id. (citation omitted).
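A minimal sketch of how a Cohen's d result might be compared with the 0.8 cutoff for a single test group. Two choices here are assumptions, not statements of Commerce's exact computation. First, the denominator is the square root of the simple average of the two groups' population variances. Second, any weighting of prices by sales quantity is ignored. A group is treated as passing when the magnitude of d is at least 0.8.

```python
import statistics
from typing import Sequence

LARGE_EFFECT = 0.8  # Cohen's conventional "large" effect-size benchmark


def cohens_d(test: Sequence[float], comparison: Sequence[float]) -> float:
    """Difference in mean prices divided by an assumed simple-average pooled SD:
    the square root of the unweighted mean of the two population variances."""
    mean_diff = statistics.fmean(test) - statistics.fmean(comparison)
    var_test = statistics.pvariance(test)
    var_comparison = statistics.pvariance(comparison)
    pooled_sd = ((var_test + var_comparison) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("d is undefined when both groups have zero variance")
    return mean_diff / pooled_sd


def passes_cohens_d(test: Sequence[float], comparison: Sequence[float],
                    cutoff: float = LARGE_EFFECT) -> bool:
    """A test group 'passes' when |d| meets or exceeds the cutoff."""
    return abs(cohens_d(test, comparison)) >= cutoff


comparison = [10.0, 10.0, 12.0, 12.0]                           # hypothetical prices
print(cohens_d([11.0, 11.0, 13.0, 13.0], comparison))           # 1.0
print(passes_cohens_d([10.5, 10.5, 12.5, 12.5], comparison))    # False (d = 0.5)
```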
    We did not, however, address SeAH’s second argument
    in Mid Continent. We construe that argument as part of
    SeAH’s challenge to Commerce’s use of the Cohen’s d test,
    which we address next.
    F
    SeAH’s final contention is that Commerce misused the
    Cohen’s d test in its differential pricing analysis. SeAH ar-
    gues that the data in this case did not satisfy the conditions
    required to achieve meaningful results from the Cohen’s d
    test: in particular, the requirements that the test groups
    and the comparison groups be normally distributed, of suf-
    ficient size, and of roughly equal variances. 12 SeAH further
    argues that even if Commerce merely needed to provide
    some reasonable basis for adopting the Cohen’s d test,
    Commerce’s only support for using that test was the gen-
    eral view in the academic literature that Cohen’s d is a re-
    liable measure of effect size. According to SeAH, the
    literature ceases to provide reasonable support when Com-
    merce applies the test to data that do not satisfy the condi-
    tions assumed by that literature.
    12  SeAH contends that Commerce “compared groups
    containing as few as 2 data points,” “compared groups with
    vastly dissimilar numbers of data points,” “compared
    groups that were not normally distributed,” and “compared
    groups with greatly dissimilar variances (as measured by
    the standard deviation).” Appellant’s Opening Br. 41–42.
    Commerce does not dispute those contentions.
    We agree that there are significant concerns relating to
    Commerce’s application of the Cohen’s d test in this case
    and, more generally, in adjudications in which the data
    groups being compared are small, are not normally distrib-
    uted, and have disparate variances. Our concerns raise
    questions about the reasonableness of Commerce’s use of
    the Cohen’s d test in less-than-fair-value adjudications,
    warranting further supporting explanation from the De-
    partment. See Mid Continent, 940 F.3d at 667 (“Commerce
    must provide an explanation that is adequate to enable the
    court to determine whether the choices are in fact reason-
    able, including as to calculation methodologies.”).
    Our first concern is a general one: Commerce’s appli-
    cation of the Cohen’s d test to data that do not satisfy the
    assumptions on which the test is based may undermine the
    usefulness of the interpretive cutoffs. In developing those
    cutoffs, including the 0.8 cutoff, Professor Cohen noted that
    “we maintain the assumption that the populations being
    compared are normal and with equal variability, and con-
    ceive them further as equally numerous.” Jacob Cohen,
    Statistical Power Analysis for the Behavioral Sciences 21
    (2d ed. 1988); see also id. at 25–26 (discussing “small effect
    size” 0.2, “medium effect size” 0.5, and “large effect size”
    0.8 “[i]n terms of measures of nonoverlap . . . of the com-
    bined area covered by two normal equal-sized equally var-
    ying populations”). Other literature confirms those
    assumptions. See, e.g., Robert J. Grissom & John J. Kim,
    Effect Sizes for Research: Univariate and Multivariate 66
    (2d ed. 2012) (“When the distribution of scores of a compar-
    ison population is not normal, the usual interpretation of a
    dG or d in terms of estimating the percentile standing of the
    average-scoring members of another group with respect to
    the supposed normal distribution of the comparison group’s
    scores would be invalid. Also, because standard deviations
    can be very sensitive to a distribution’s shape, . . . nonnor-
    mality can greatly influence the value of a standardized-
    mean-difference effect size and its estimate.”); id. at 68
    (noting that “Cohen’s d” is appropriate “if the two popula-
    tions that are being compared are assumed to have equal
    variances.”).
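Cohen's benchmarks can be translated into his nonoverlap measures, which shows how much they depend on the normal, equal-variance, equal-size assumptions quoted above. The sketch below computes two of those measures from the standard normal distribution. U1 is the share of the combined area of two such populations that does not overlap. U3 is the share of the higher population lying above the lower population's mean. For d of 0.2, 0.5, and 0.8, U1 comes to roughly 15%, 33%, and 47% nonoverlap. The formulas hold only under the stated assumptions, which is the point at issue.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cohen_u1(d: float) -> float:
    """Share of the combined area of two equal-size, equal-variance normal
    populations, d standard deviations apart, that does not overlap."""
    u2 = normal_cdf(abs(d) / 2.0)
    return (2.0 * u2 - 1.0) / u2


def cohen_u3(d: float) -> float:
    """Share of the higher population lying above the lower population's mean."""
    return normal_cdf(abs(d))


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: U1 = {cohen_u1(d):.3f}, U3 = {cohen_u3(d):.3f}")
```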
    There is extensive literature describing the problems
    associated with applying the Cohen’s d test to data that are
    not normally distributed or that are lacking equal vari-
    ances. See, e.g., Robert Coe, It’s the Effect Size, Stupid:
    What effect size is and why it is important, presented at the
    Annual Conference of the British Educational Research As-
    sociation (Sept. 2002) (“It has been shown that the inter-
    pretation of the ‘standardised mean difference’ measure of
    effect size [(e.g., Cohen’s d)] is very sensitive to violations
    of the assumption of normality.”); 13 David M. Lane et al.,
    Introduction to Statistics, Online Edition, 645 (“When the
    effect size is measured in standard deviation units as it is
    for Hedges’ g and Cohen’s d, it is important to recognize
    that the variability in the subjects has a large influence on
    the effect size measure.”).
    In 2005, James Algina and his collaborators inspected
    the robustness of Cohen’s d as an effect-size parameter,
    seeking to determine “if a small change in the population
    distribution can strongly affect the parameter.” James Al-
    gina et al., An Alternative to Cohen’s Standardized Mean
    Difference Effect Size: A Robust Parameter and Confidence
    Interval in the Two Independent Groups Case, 10 Psycho-
    logical Methods 317, 318 (2005). After simulating Cohen’s
    d on various data that followed a mixed-normal distribu-
    tion, e.g., a heavy-tailed distribution, they concluded that
    Cohen’s d was not robust to mixed-normal distributions,
    and that applying Cohen’s d to such data caused serious
    flaws in interpreting the resulting parameter. Id. at 318–
    319.
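The sensitivity that Algina and his collaborators describe can be seen without a simulation. In the sketch below, two groups share the same contaminated-normal shape and differ only by a mean shift. Each is mostly standard normal, with a small share drawn from a much wider normal. The 10% contamination rate and the wider component's standard deviation of 10 are illustrative choices, not parameters taken from the Algina study. Because the rare wide component inflates the standard deviation, the population d for a shift of 0.8 falls to about 0.24, even though 90% of each distribution is unchanged.

```python
import math


def mixed_normal_sd(eps: float, k: float) -> float:
    """SD of a contaminated normal: N(0, 1) with probability 1 - eps and
    N(0, k**2) with probability eps (both components centered at zero)."""
    return math.sqrt((1.0 - eps) + eps * k * k)


def population_d(mean_shift: float, sd: float) -> float:
    """Population Cohen's d for two groups with a common SD."""
    return mean_shift / sd


shift = 0.8
print(population_d(shift, 1.0))                            # pure normal groups
print(population_d(shift, mixed_normal_sd(0.1, 10.0)))     # contaminated groups
```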
    In a subsequent simulation study, Johnson Ching-
    Hong Li investigated the robustness of several effect-size
    tests, including Cohen’s d. Johnson Ching-Hong Li, Effect
    size measures in a two-independent-samples case with
    nonnormal and nonhomogeneous data, 48 Behavioral Re-
    search 1560 (2015). Li concluded that Cohen’s d “was
    found to be inaccurate when the normality and homogene-
    ity-of-variances assumptions were violated in this study,
    thereby severely affecting the accuracy of d in evaluating
    the true [effect size] in the research literature.” Id. at 1571.
    13 Professor Coe’s paper is available at
    https://www.cem.org/attachments/ebe/ESguide.pdf. Cohen’s d is a
    measure of “standardized mean difference.” Paul D. Ellis,
    The Essential Guide to Effect Sizes: Statistical Power,
    Meta-Analysis, and the Interpretation of Research Results
    13 (2010).
    The use of Cohen’s d with test groups consisting of very
    few observations may be particularly problematic. Con-
    sider, for example, a situation in which there are eight ex-
    port sales, two occurring in each of the four regions of the
    United States. Under the differential pricing analysis, as
    Commerce describes it, Commerce would apply Cohen’s d
    to analyze the pricing differences between each region’s
    two sales (i.e., the test group) and the other regions’ six
    sales (i.e., the comparison group) even though each test
    group contains only two observations and each would po-
    tentially lack normality. The literature concludes that us-
    ing Cohen’s d in such a situation may produce an upward
    bias in the calculated effect size. See Grissom et al. at 70
    (“Both Cohen’s d and Glass’s dG have some positive bias
    (i.e., tending to overestimate their respective parameters),
    the more so the smaller the sample sizes and the larger the
    effect size in the population.”). An upward bias might pro-
    duce more “passing” results under the Cohen’s d test,
    which would tend to exaggerate dumping margins.
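The size of the small-sample bias can be gauged with the correction factor associated with Hedges' g. The sketch below computes the exact factor and its common approximation 1 - 3/(4·df - 1) for the court's example of a two-sale test group compared with six other sales (df = 2 + 6 - 2 = 6). That correction was derived for the conventional pooled-standard-deviation d under normality and equal variances. Applying it to Commerce's simple-average denominator is therefore only indicative, an assumption here and not a description of Commerce's method.

```python
import math


def hedges_j_exact(df: int) -> float:
    """Exact small-sample correction factor J(df) for the pooled-SD d."""
    if df < 2:
        raise ValueError("df must be at least 2")
    log_ratio = math.lgamma(df / 2) - math.lgamma((df - 1) / 2)
    return math.exp(log_ratio) / math.sqrt(df / 2)


def hedges_j_approx(df: int) -> float:
    """Widely used approximation J = 1 - 3 / (4 * df - 1)."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


n_test, n_comparison = 2, 6       # the court's two-sale test group vs. six other sales
df = n_test + n_comparison - 2    # degrees of freedom for the pooled variance
observed_d = 0.8
print(hedges_j_exact(df), hedges_j_approx(df))
print(observed_d * hedges_j_exact(df))   # bias-adjusted counterpart of d = 0.8
```

With df = 6, the factor is about 0.87. On that conventional correction, an observed d of exactly 0.8 would correspond to a bias-adjusted value of roughly 0.69, below the cutoff.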
    Another source of concern arises from test groups con-
    taining sales prices that hover around the same value.
    Consider, for example, ten purchasers of a product, each of
    which purchases five units. Assume that the per-unit sales
    prices for a particular purchaser are not normally distrib-
    uted and are all the same, or nearly the same (e.g., $100.01,
    $100.01, $100.01, $100.01, and $99.99). Assume further
    that the per-unit sales prices across the entire set of pur-
    chasers are also very similar, falling within a relatively
    small range (such as between $99.92 and $101.01).
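The denominator effect discussed for this hypothetical can be computed directly. The sketch below uses the court's five test-group prices and an invented comparison group of 45 prices: nine purchasers buying five units each, all within the $99.92 to $101.01 range. The comparison prices are hypothetical. The sketch compares three denominators, using population standard deviations throughout:

- the arithmetic mean of the two standard deviations, as footnote 14 describes Commerce's simple average;
- the square root of the mean of the two variances, another common reading of a simple-average pooled standard deviation (included as an assumption);
- a size-weighted pooled standard deviation.

```python
import math
import statistics
from typing import Dict, Sequence


def denominators(test: Sequence[float], comp: Sequence[float]) -> Dict[str, float]:
    """Candidate Cohen's d denominators, using population SDs throughout."""
    s_t, s_c = statistics.pstdev(test), statistics.pstdev(comp)
    n_t, n_c = len(test), len(comp)
    return {
        # Footnote 14's description: the arithmetic mean of the two SDs.
        "mean_of_sds": (s_t + s_c) / 2,
        # Assumed alternative reading: root of the mean of the two variances.
        "root_mean_var": math.sqrt((s_t ** 2 + s_c ** 2) / 2),
        # Size-weighted pooling: each variance weighted by its group's size.
        "weighted": math.sqrt((n_t * s_t ** 2 + n_c * s_c ** 2) / (n_t + n_c)),
    }


test = [100.01, 100.01, 100.01, 100.01, 99.99]            # the court's test group
comparison = [99.92, 100.20, 100.50, 100.80, 101.01] * 9  # invented: 9 buyers x 5 units
diff = statistics.fmean(test) - statistics.fmean(comparison)

for name, denom in denominators(test, comparison).items():
    print(f"{name:>13}: denominator = {denom:.4f}, d = {diff / denom:.2f}")
```

Both simple averages yield a smaller denominator, and so a larger d, than the size-weighted version. With a zero-variance test group, the mean-of-SDs denominator is exactly half the comparison group's standard deviation, matching the arithmetic in footnote 14.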
    Applying Cohen’s d to that hypothetical data seems
    problematic: As the variance within each test group ap-
    proaches zero, the denominator in the Cohen’s d equation
    is greatly reduced and, in fact, approaches half of the val-
    ues of the standard deviations of the larger comparison
    groups. 14 That is because Commerce uses the simple aver-
    age pooled standard deviation instead of the weighted av-
    erage pooled standard deviation; the former averages the
    standard deviations of the test and comparison groups
    without accounting for the number of observations in each
    group. 15 As the denominator is reduced, the resulting ef-
    fect-size parameter is increased, tending to artificially in-
    flate the dumping margins for a set of export sales prices
    that has minimal variance. An objective examiner inspect-
    ing those export sales prices would be unlikely to conclude
    that they embody a “pattern” of prices that “differ signifi-
    cantly.” 19 U.S.C. § 1677f-1(d)(1)(B)(i). Although the prob-
    lem in that situation is a function of Commerce’s use of the
    simple average pooled standard deviation, our concern is
    also related to the number of observations being compared
    and the distribution of those observations—requiring
    larger test groups tends to decrease the likelihood that a
    test group would have sales prices with near-zero variance,
    and requiring normality also tends to decrease that likeli-
    hood as the number of observations increases.
    14   For each iteration of the Cohen’s d test, with rotat-
    ing test groups and comparison groups, the denominator is
    simply the average of two numbers—the standard devia-
    tion of the test group and the standard deviation of the
    comparison group. When the test group’s standard devia-
    tion is zero, the denominator is equal to half of the compar-
    ison group’s standard deviation (the simple average of zero
    and any number is half of that number).
    15   In Mid Continent, we remanded so that Commerce
    could provide “more thorough consideration” and justifica-
    tion for using the simple average pooled standard devia-
    tion. 940 F.3d at 674–75. Commerce defended its position
    on remand, and the Trade Court found Commerce’s defense
    reasonable. See Mid Continent, 495 F. Supp. 3d at 1303.
    An appeal of the Trade Court’s decision is pending before
    this Court. See Mid Continent Steel & Wire, Inc. v. United
    States, No. 21-1747 (Fed. Cir. filed Mar. 17, 2021).
    Commerce makes only two relevant arguments in re-
    sponse. First, Commerce argues that the concern over the
    assumption of normality is misplaced because “normal dis-
    tribution is a concept of probability and statistical signifi-
    cance, which are not relevant to Commerce’s differential
    pricing analysis.” Appellee’s Br. 25. Put differently, Com-
    merce argues that it does not need to worry about normal-
    ity, because it is not sampling data but instead possesses
    the entire universe of data. See id. at 25–26; see also Final
    Memo at 21–22 (making similar arguments). While Com-
    merce is correct that it does not “sample” data, that obser-
    vation does not address the fact that Professor Cohen
    derived his interpretive cutoffs under the assumption of
    normality. Nor does it address SeAH’s representation that
    Commerce’s analysis in this case violated Professor Co-
    hen’s other assumptions, homogeneity-of-variances and
    the number of observations being compared.
    Commerce’s second argument is that its approach is
    reasonable because it uses the larger, more conservative
    0.8 cutoff for identifying effect sizes that pass the Cohen’s
    d test. That argument, too, fails to address the fact that
    Professor Cohen derived his interpretive cutoffs under cer-
    tain assumptions. Violating those assumptions can sub-
    vert the usefulness of the interpretive cutoffs, transforming
    what might be a conservative cutoff into a meaningless
    comparator. See VirnetX, Inc. v. Cisco Sys., Inc., 767 F.3d 1308, 1332 (Fed. Cir. 2014) (“The Nash theorem arrives at
    a result that follows from a certain set of premises. It itself
    asserts nothing about what situations in the real world fit
    those premises. Anyone seeking to invoke the theorem as
    applicable to a particular situation must establish that fit,
    because the 50/50 profit-split result is proven by the theo-
    rem only on those premises. Weinstein did not do so. This
    was an essential failing in invoking the Solution.”).
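The point about premises can also be shown numerically. Suppose both groups are normally distributed with a common standard deviation. A coefficient d then implies that a randomly drawn test-group price exceeds a randomly drawn comparison-group price with probability Φ(d/√2), which is about 0.71 at d = 0.8. The hypothetical Python sketch below again assumes simple-average pooling of sample variances and uses invented, skewed prices. It compares that normal-theory figure with the actual share of price pairs favoring the test group. The function names are illustrative only.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(test, comparison):
    # Assumed pooling: square root of the simple average of the sample variances.
    pooled_sd = math.sqrt((variance(test) + variance(comparison)) / 2)
    return (mean(test) - mean(comparison)) / pooled_sd

def normal_theory_superiority(d):
    # P(test price > comparison price) implied by d when both groups are
    # normal with a common standard deviation: Phi(d / sqrt(2)).
    return NormalDist().cdf(d / math.sqrt(2))

def empirical_superiority(test, comparison):
    # Share of all (test, comparison) price pairs in which the test price is
    # higher, counting ties as half.
    score = sum(1.0 if t > c else 0.5 if t == c else 0.0
                for t in test for c in comparison)
    return score / (len(test) * len(comparison))

# Invented prices: a uniform test group and a right-skewed comparison group.
test_prices = [20.0] * 10
comparison_prices = [10.0] * 8 + [50.0] * 2

d = cohens_d(test_prices, comparison_prices)
print(f"d = {d:.2f}")
print(f"normal-theory P(test > comparison) = {normal_theory_superiority(d):.2f}")
print(f"observed share of pairs favoring test = "
      f"{empirical_superiority(test_prices, comparison_prices):.2f}")
```

With these invented figures, d falls below even Cohen's 0.2 "small" benchmark. Yet 80% of price pairs favor the test group, while the normal-theory translation gives only about 0.55. The benchmark's meaning depends on the distributional premises behind it.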
    In sum, the evidence and arguments before us call into
    question whether Commerce’s application of the Cohen’s d
    test to the data in this case violated the assumptions of nor-
    mality, sufficient observation size, and roughly equal vari-
    ances associated with that test. It seems likely that
    Commerce’s application of the Cohen’s d test had a mate-
    rial impact on the results of the less-than-fair-value inves-
    tigation in this case, particularly given that the dumping
    margin assigned to SeAH (2.53%) was only slightly above
    the de minimis threshold, below which no antidumping du-
    ties would be assessed. We therefore remand to give Com-
    merce an opportunity to explain whether the limits on the
    use of the Cohen’s d test prescribed by Professor Cohen and
    other authorities were satisfied in this case or whether
    those limits need not be observed when Commerce uses the
    Cohen’s d test in less-than-fair-value adjudications. In
    that regard, we invite Commerce to clarify its argument
    that having the entire universe of data rather than a sam-
    ple makes it permissible to disregard the otherwise-appli-
    cable limitations on the use of the Cohen’s d test.
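The three limits discussed above (normality, sufficient observations, and roughly equal variances) can be written as explicit screens on each group before the coefficient is compared with the 0.8 cutoff. The Python sketch below is purely illustrative. The function names and all three thresholds are hypothetical: they were chosen here and are not taken from the opinion, from Professor Cohen, or from Commerce. The skewness screen is also only a crude stand-in for a formal normality test such as Shapiro-Wilk.

```python
import math
from statistics import mean, pstdev, variance

# Hypothetical thresholds for illustration only; not drawn from the opinion,
# Professor Cohen, or Commerce.
MIN_OBSERVATIONS = 20       # minimum sales per group
MAX_VARIANCE_RATIO = 4.0    # larger group variance / smaller group variance
MAX_ABS_SKEWNESS = 1.0      # crude screen standing in for a normality test

def skewness(xs):
    # Moment-based skewness (population form); 0.0 for a constant group.
    s = pstdev(xs)
    if s == 0:
        return 0.0
    m = mean(xs)
    return mean([((x - m) / s) ** 3 for x in xs])

def check_limits(test, comparison):
    smaller, larger = sorted((variance(test), variance(comparison)))
    # A zero-variance group makes the ratio infinite, so it fails the screen.
    ratio = math.inf if smaller == 0 else larger / smaller
    return {
        "sufficient_observations": min(len(test), len(comparison)) >= MIN_OBSERVATIONS,
        "roughly_equal_variances": ratio <= MAX_VARIANCE_RATIO,
        "roughly_symmetric": all(abs(skewness(g)) <= MAX_ABS_SKEWNESS
                                 for g in (test, comparison)),
    }

result = check_limits([100.0, 100.0, 100.0], [99.0, 99.0, 99.5, 99.0])
print(result)
```

A three-sale test group at a single price fails all three screens. Larger, symmetric groups with similar spreads, such as thirty sales each spread evenly around two different means, pass all three.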
    AFFIRMED IN PART, VACATED AND REMANDED
    IN PART
    COSTS
    Each party will bear its own costs for this appeal.