Ricci v DeStefano Brief Amici Curiae
Nos. 07-1428 & 08-328
In The
Supreme Court of the United States
Frank Ricci, et al.,
Petitioners,
v.
John DeStefano, et al.,
Respondents.
On Writs of Certiorari
to the United States Court of Appeals
for the Second Circuit
BRIEF OF INDUSTRIAL-ORGANIZATIONAL
PSYCHOLOGISTS AS AMICI CURIAE
IN SUPPORT OF RESPONDENTS
David C. Frederick
  Counsel of Record
Derek T. Ho
Barrett C. Hester
Jennifer L. Peresie
Kellogg, Huber, Hansen,
  Todd, Evans & Figel, P.L.L.C.
1615 M Street, N.W., Suite 400
Washington, D.C. 20036
(202) 326-7900
Counsel for Industrial-Organizational Psychologists
March 25, 2009
TABLE OF CONTENTS

TABLE OF AUTHORITIES
INTEREST OF AMICI CURIAE
SUMMARY OF ARGUMENT
ARGUMENT
I. THE 2003 EXAMINATIONS CONTAINED FATAL FLAWS THAT UNDERMINED THEIR ABILITY TO SELECT THE MOST QUALIFIED CANDIDATES
   A. Proper Validation Of Employment Tests According To Established Standards Is Essential To Ensuring Fair, Merit-Based Selection
   B. Contrary To Petitioners’ Premise, The 2003 NHFD Examinations Could Not Have Been Validated Under Established Principles
      1. The Test Designer Conceded That the Exams Did Not Attempt To Measure Command Presence, a Critical Job Attribute
      2. The Weighting of the Multiple-Choice and Oral Interview Portions of the Exams Was Arbitrary and Could Not Be Validated
      3. Flaws in the Exam-Development Process Contributed to the Lack of Validity Evidence of the NHFD Tests
      4. The NHFD Tests Could Not Have Been Validated for Strict Rank-Ordering of Candidates
II. THE FLAWS IN THE NHFD PROMOTIONAL EXAMS EXACERBATED THEIR ADVERSE IMPACT ON MINORITY CANDIDATES
   A. Overweighting Of The Written, Multiple-Choice Portion Of The Exams Increased The Adverse Impact On Minority Candidates
   B. Selection Of Candidates In Strict Rank Order Also Contributed To Adverse Impact
III. CURRENT I/O PSYCHOLOGY RESEARCH SUPPORTS THE USE OF PROMOTIONAL ASSESSMENT CENTERS AS A VALID AND LESS DISCRIMINATORY ALTERNATIVE TO TRADITIONAL TESTING METHODS
   A. History Of The Assessment Center Model
   B. Assessment Centers Have Demonstrated Validity In The Context Of Firefighter Promotion
   C. Assessment Centers Have Been Proven To Reduce Adverse Impact On Minorities
CONCLUSION
TABLE OF AUTHORITIES

CASES

Biondo v. City of Chicago, 382 F.3d 680 (7th Cir. 2004)
Bridgeport Guardians, Inc. v. City of Bridgeport, 933 F.2d 1140 (2d Cir. 1991)
Chicago Firefighters Local 2 v. City of Chicago, 249 F.3d 649 (7th Cir. 2001)
Chisholm v. United States Postal Serv., 516 F. Supp. 810 (W.D.N.C. 1980), aff’d in part, 665 F.2d 482 (4th Cir. 1981)
Ensley Branch, NAACP v. Seibels, 31 F.3d 1548 (11th Cir. 1994)
Firefighters Inst. for Racial Equality v. City of St. Louis: 549 F.2d 506 (8th Cir. 1977); 616 F.2d 350 (8th Cir. 1980)
Griggs v. Duke Power Co., 401 U.S. 424 (1971)
Guardians Ass’n of New York City Police Dep’t v. Civil Serv. Comm’n, 630 F.2d 79 (2d Cir. 1980)
Isabel v. City of Memphis: No. 01-2533 ML/BRE, 2003 WL 23849732 (W.D. Tenn. Feb. 21, 2003), aff’d, 404 F.3d 404 (6th Cir. 2005); 404 F.3d 404 (6th Cir. 2005)
Jones v. New York City Human Res. Admin., 391 F. Supp. 1064 (S.D.N.Y. 1975), aff’d, 528 F.2d 696 (2d Cir. 1976)
Kelly v. City of New Haven, 881 A.2d 978 (Conn. 2005)
Nash v. Consolidated City of Jacksonville, 837 F.2d 1534 (11th Cir. 1988), vacated and remanded, 490 U.S. 1103 (1989), opinion reinstated on remand, 905 F.2d 355 (11th Cir. 1990)
Pina v. City of East Providence, 492 F. Supp. 1240 (D.R.I. 1980)

STATUTES, REGULATIONS, AND RULES

Civil Rights Act of 1964, Tit. VII, 42 U.S.C. § 2000e et seq.
29 C.F.R. pt. 1607 (Uniform Guidelines on Employee Selection Procedures); §§ 1607.1(A), (B); 1607.3(A); 1607.5(B), (C), (G), (H); 1607.9; 1607.14(B), (B)-(D), (B)(3), (C)(4), (C)(9)
Sup. Ct. R. 37.6

ADMINISTRATIVE MATERIALS

Adoption of Questions and Answers To Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures, 44 Fed. Reg. 11,996 (1979)

OTHER MATERIALS

Herman Aguinis & Erika Harden, Will Banding Benefit My Organization? An Application of Multi-Attribute Utility Analysis, in Test-Score Banding in Human Resource Selection 193 (Herman Aguinis ed., 2004)
American Psychological Association, Standards for Educational and Psychological Tests (1999)
Winfred Arthur Jr. et al., A Meta-Analysis of the Criterion-Related Validity of Assessment Center Dimensions, 56 Personnel Psychol. 125 (2003)
Winfred Arthur Jr. et al., Multiple-Choice and Constructed Response Tests of Ability, 55 Personnel Psychol. 985 (2002)
Walter C. Borman et al., Personnel Selection, 48 Ann. Rev. Psychol. 299 (1997)
David L. Bullins, Leading in the Gray Area, Fire Chief (Aug. 10, 2006), at http://firechief.com/management/bullins_gray08102006/index.html
Wayne F. Cascio et al., Social and Technical Issues in Staffing Decisions, in Test-Score Banding in Human Resource Selection 7 (Herman Aguinis ed., 2004)
Wayne F. Cascio & Herman Aguinis: Applied Psychology in Human Resource Management (6th ed. 2005); Test Development and Use: New Twists on Old Questions, 44 Human Res. Mgmt. 219 (2005)
John F. Coleman, Incident Management for the Street-Smart Fire Officer (2d ed. 2008)
Vincent Dunn, Command and Control of Fires and Emergencies (1999)
Robert D. Gatewood & Hubert S. Feild, Human Resource Selection (5th ed. 2001)
Barbara B. Gaugler et al., Meta-Analysis of Assessment Center Validity, 72 J. Applied Psychol. 493 (1987)
Gary M. Gebhart et al., Fire Service Testing in a Litigious Environment: A Case History, 27 Pub. Personnel Mgmt. 447 (1998)
Irwin L. Goldstein et al., An Exploration of the Job Analysis-Content Validity Process, in Personnel Selection in Organizations 3 (Neal Schmitt & Walter C. Borman eds., 1993)
Charles D. Hale, The Assessment Center Handbook for Police and Fire Personnel (2d ed. 2004)
Chaitra M. Hardison & Paul R. Sackett, Assessment Center Criterion Related Validity: A Meta-Analytic Update (2004)
James R. Huck & Douglas W. Bray, Management Assessment Center Evaluations and Subsequent Job Performance of White and Black Females, 29 Personnel Psychol. 13 (1976)
Int’l Ass’n of Fire Chiefs et al.: Fire Officer: Principles and Practice (2006); Fundamentals of Fire Fighter Skills (2004)
Anthony Kastros, Mastering the Fire Service Assessment Center (2006)
Richard Kolomay & Robert Hoff, Firefighter Rescue & Survival (2003)
Diana E. Krause et al., Incremental Validity of Assessment Center Ratings Over Cognitive Ability Tests: A Study at the Executive Management Level, 14 Int’l J. Selection & Assessment 360 (2006)
Phillip E. Lowry, A Survey of the Assessment Center Process in the Public Sector, 25 Pub. Personnel Mgmt. 307 (1996)
Lynn Lyons Morris et al., How to Measure Performance and Use Tests (1987)
Matthew Murtagh, Fire Department Promotional Tests (1993)
James L. Outtz & Daniel A. Newman, A Theory of Adverse Impact (manuscript on file with author) (forthcoming in Adverse Impact: Implications for Organizational Staffing and High Stakes Selection, 2009)
Miguel Roig & Maryellen Reardon, A Performance Standard for Promotions, 141 Fire Engineering 49 (1988)
Philip L. Roth et al., Ethnic Group Differences in Cognitive Ability in Employment and Educational Settings: A Meta-Analysis, 54 Personnel Psychol. 297 (2001)
Chase Sargent, From Buddy to Boss: Effective Fire Service Leadership (2006)
Society for Industrial and Organizational Psychology, Principles for the Validation and Use of Personnel Selection Procedures (4th ed. 2003), at http://www.siop.org/_Principles/principles.pdf
Task Force on Assessment Center Guidelines, Guidelines and Ethical Considerations for Assessment Center Operations, 18 Pub. Personnel Mgmt. 457 (1989)
Ian Taylor, A Practical Guide to Assessment Centres and Selection Methods (2007)
Michael A. Terpak, Assessment Center: Strategy and Tactics (2008)
George C. Thornton & Deborah E. Rupp, Assessment Centers in Human Resource Management (2006)
Carl F. Weaver, Can Assessment Centers Eliminate Challenges to the Promotional Process? (July 2000), at http://www.usfa.dhs.gov/pdf/efop/efo24862.pdf
Samuel J. Yeager, Use of Assessment Centers by Metropolitan Fire Departments in North America, 15 Pub. Personnel Mgmt. 51 (1986)
INTEREST OF AMICI CURIAE1

Amici are experts in the field of industrial-organizational psychology and are elected fellows of the Society for Industrial and Organizational Psychology (“SIOP”), the division of the American Psychological Association that is responsible for the establishment of scientific findings and generally accepted professional practices in the field of personnel selection. Amici also have extensive experience in the design and validation of promotional tests for emergency services departments, including fire and police departments across the country. Amici have an interest in ensuring the scientifically appropriate choice, development, evaluation, and use of personnel selection procedures.

1 Pursuant to Supreme Court Rule 37.6, counsel for amici represents that it authored this brief in its entirety and that none of the parties or their counsel, nor any other person or entity other than amici or their counsel, made a monetary contribution intended to fund the preparation or submission of this brief. Counsel for amici also represents that all parties have consented to the filing of this brief in the form of blanket consent letters filed with the Clerk.

Professor Herman Aguinis is the Mehalchin Term Professor of Management at the University of Colorado Denver Business School. In addition to his extensive scholarship and research on personnel selection, Professor Aguinis served on the Advisory Panel for the most recent revision of the Principles for the Validation and Use of Personnel Selection Procedures (4th ed. 2003) (“Principles”), at http://www.siop.org/_Principles/principles.pdf.

Professor Wayne Cascio holds the Robert H. Reynolds Chair in Global Leadership at the University of Colorado Denver Business School. He has published and testified extensively on issues relating to firefighter promotion. He served as President of SIOP from 1992 to 1993.

Professor Irwin Goldstein is currently Senior Vice Chancellor for Academic Affairs in the University System of Maryland. From 1991 to 2004, he served as Professor and Dean of the College of Behavioral and Social Sciences at the University of Maryland College Park. He has published extensively on validation, job selection, and training. He also served as President of SIOP from 1985 to 1986.

Dr. James Outtz has more than 30 years’ experience in the design and validation of personnel selection procedures. In 2002-2003, Dr. Outtz served on the Ad Hoc Committee that oversaw the 2003 revision of the Principles, the official policy statement of SIOP. Dr. Outtz also has served as a consultant for the City of Bridgeport, Connecticut, in designing the city’s promotional examinations for fire lieutenant and captain positions.

Professor Sheldon Zedeck is Professor of Psychology at the University of California at Berkeley and Vice Provost for Academic Affairs and Faculty Welfare. Professor Zedeck’s research and writing focus on employment selection and validation models. Professor Zedeck served on the Ad Hoc Committee on the 2003 revision of the Principles and as President of SIOP from 1986 to 1987.
SUMMARY OF ARGUMENT

The City of New Haven (“City”), acting through its Civil Service Board (“Board”), reasonably declined to certify the results of the 2003 New Haven Fire Department (“NHFD”) promotional examinations for captain and lieutenant because the validity of the tests could not have been substantiated under accepted scientific principles in the field of industrial-organizational (“I/O”) psychology and applicable legal standards. Based on their expertise in the field of I/O psychology and their experience in employment test design, amici have identified at least four serious flaws in the tests that undermined their validity: (1) their admitted failure to measure critical qualifications for the job of a fire company officer; (2) the arbitrary, scientifically unsubstantiated weighting of the multiple-choice and oral components of the test battery; (3) the lack of input from local subject-matter experts regarding whether the tests matched the content of the jobs; and (4) use of strict rank-ordering without sufficient justification. Members of the Board thus reasonably concluded that it was unlikely, if not impossible, that the tests could be demonstrated to be valid.

Petitioners’ claim that the decision not to certify the NHFD test results constituted a deviation from merit-based selection is inaccurate because of these clear and serious flaws in the design of the tests and the proposed use of the test scores. To the contrary, due to those flaws, which are apparent from the record below, there is no basis to conclude that certification of the test results would have led to the promotion of the most qualified candidates.

Moreover, several of the tests’ flaws - namely, the unsubstantiated weighting of the test components and use of strict rank-ordering - contributed to their adverse impact on racial subgroups, specifically African-American and Hispanic candidates. Thus, the tests not only failed to support an inference of superior job qualification from higher scores, but also simultaneously introduced a likely source of bias against minority candidates. In a predominantly minority city such as New Haven, bias against minority promotion exacerbates the public-safety risks of flawed tests by undermining the perception of fairness and cohesiveness among firefighters and by impairing the overall public effectiveness of the department. Thus, although the City had already approved and administered the test, the City appropriately concluded that costs to the NHFD and the local community would outweigh any potential benefit gained from certifying the tests.

The City could (and properly should) have adopted an alternative method of promotional selection to reduce the tests’ adverse impact. At the very least, the City should have used scientifically substantiated weighting for the test components, which would likely have led to a reduced emphasis on the written component. Also, it could have discarded rank-ordering in favor of a “banding” approach, which treats candidates as equally qualified if their scores lie within a certain range reflecting the test’s error of measurement. Rather than focus on the tests themselves, banding focuses on how test scores are used to make hiring decisions. Banding has been demonstrated, in some circumstances, to produce modest reductions in adverse impact without compromising the validity of the testing procedures in question. Moreover, the City could have adopted other options such as an “assessment center” that included behavioral simulations of critical job components as part of the exams. Over the last 30 years, I/O psychology research has robustly confirmed that a properly validated assessment center can substantially reduce the adverse impact against minority candidates in the context of jobs such as firefighting.

In sum, given the flaws in the NHFD exams, which exacerbated the adverse impact on minority candidates, and given the availability of proven alternative selection methods, the City had reasonable, race-neutral grounds for deciding against certifying the results of the flawed tests. Indeed, under Title VII of the Civil Rights Act of 1964, it had no choice but to decline to certify the results. Petitioners’ attempt to turn a decision compelled by Title VII into a violation of Title VII on the basis of mere insinuations about the Board’s supposed racial biases turns the statute on its head and should be rejected.
ARGUMENT
I. THE 2003 EXAMINATIONS CONTAINED FATAL FLAWS THAT UNDERMINED THEIR ABILITY TO SELECT THE MOST QUALIFIED CANDIDATES

A critical and oft-repeated premise of petitioners’ brief is that the 2003 NHFD examinations were “composed and validated” based on the Uniform Guidelines on Employee Selection Procedures (“Uniform Guidelines”), see 29 C.F.R. pt. 1607 (EEOC), and thus (1) actually “served their purpose of screening out the unqualified and identifying the most qualified,” and (2) would have withstood scrutiny in a disparate-impact lawsuit brought by minority officer candidates. Pet. Br. 7, 35; see also id. at i (assuming in the question presented that the exams were “content-valid”). However, petitioners’ premise lacks scientific foundation. Under applicable I/O psychology principles and legal standards, there was no reasonable likelihood that the City could have demonstrated that the NHFD promotional examinations were valid.
A. Proper Validation Of Employment Tests According To Established Standards Is Essential To Ensuring Fair, Merit-Based Selection

A central objective of the field of I/O psychology is to develop generally accepted professional standards for the design and use of personnel-selection procedures, including employment tests, based on scientific data and analysis. The federal government has issued the Uniform Guidelines, which establish a federal standard for employment testing, see 29 C.F.R. § 1607.1(A), and “are intended to be consistent with generally accepted professional standards for evaluating standardized tests and other selection procedures,” including the American Psychological Association’s Standards for Educational and Psychological Tests (“APA Standards”), id. § 1607.5(C); see Griggs v. Duke Power Co., 401 U.S. 424, 433-34 (1971) (holding that the Uniform Guidelines are “entitled to great deference”). The Principles are designed to be consistent with the APA Standards and represent the official policy statement of the Society for Industrial and Organizational Psychology (“SIOP”), a division of the APA, regarding current I/O psychology standards for personnel selection. See Principles at ii, 1.

“Validity is the most important consideration in developing and evaluating selection procedures.” Id. at 4. “Validation” is the process of confirming that an employment test is “predictive of or significantly correlated with important elements of job performance.” 29 C.F.R. § 1607.5(B).2 In this case, for the reasons set forth below, at least four aspects of the NHFD promotional tests were flawed or arbitrary, and thus made it all but impossible for the City to show that the tests were valid.

2 The Principles and the Uniform Guidelines identify three types of evidence that support an inference of validity: “content,” “criterion,” and “construct.” See Principles at 4-5; 29 C.F.R. § 1607.14(B)-(D). The three are not mutually exclusive categories, but rather refer to evidence supporting the inference that a test identifies those who are qualified to do the job. See Principles at 4. Content validity supports such an inference by showing that the test’s content matches the essential content of the job, while criterion validity supports the inference by showing that test results successfully predict job performance. Construct validity is more abstract and is shown by evidence that the test measures the degree to which candidates have characteristics, or traits, that have been determined to lead to successful job performance. The flaws described in this brief undermine the content validity of the NHFD exams, which is the only evidence of validity asserted by petitioners and the only feasible type of validation in these circumstances.

The lack of evidence supporting the validity of the NHFD tests undermines their value as a selection tool. Proper validation of an employment test is critical to merit-based personnel selection because it ensures that there is a scientific basis for inferring that a higher test score corresponds to superior job skills or performance. See Principles at 4 (defining validity as “the degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test”); 29 C.F.R. § 1607.1(B). Moreover, proper validation promotes fairness and equal opportunity by ensuring that any disparate impact on subgroups is traceable to job requirements rather than contamination or bias in testing methodology. See 29 C.F.R. § 1607.3(A) (providing that a procedure that has an adverse impact “will be considered to be discriminatory . . . unless the procedure has been validated in accordance with these guidelines”); Principles at 7.

Validation is especially critical in the context of promotional exams for important public-safety leadership positions, such as fire company officers. Ensuring the selection of the most qualified fire officers saves lives. Accordingly, all state and local governments have a strong, race-neutral interest in declining to use promotional tests for fire officers that are shown to lack validity. Indeed, a legal regime in which state and local governments are hamstrung into implementing the results of such tests threatens the lives of the citizens they are committed to protect.3 In this case, the City acted reasonably by declining to certify the NHFD promotional test results because the tests were fatally flawed.

3 While critical, the validity of a test is only one of several issues that may legitimately be taken into account in making a decision on whether and how to use a test. The decision to use a particular test, or to use a test in a particular way, is made within a broader social and organizational context and appropriately takes into account possible “side effects” that could be costly or detrimental for the organization and the people served by the organization. See Herman Aguinis & Erika Harden, Will Banding Benefit My Organization? An Application of Multi-Attribute Utility Analysis, in Test-Score Banding in Human Resource Selection 193, 196-211 (Herman Aguinis ed., 2004).
B. Contrary To Petitioners’ Premise, The 2003 NHFD Examinations Could Not Have Been Validated Under Established Principles

Petitioners’ premise that the NHFD tests were properly validated rests on insufficient factual evidence, consisting merely of the fact that the test designer, I/O Solutions (“IOS”), conducted what has been described as a job analysis, using “questionnaires, interviews, and ride-along exercises with incumbents to identify the importance and frequency of essential job tasks,” and then had only one individual, a fire battalion chief in Georgia, review the tests. Pet. Br. 7; see also id. at 52. Petitioners also claim that the test designer provided “oral assurance of validity” and that the NHFD’s Chief and Assistant Chief “thought the exams were fair and valid.” Id. at 35.

Contrary to petitioners’ assertions, conducting a job analysis - while in most cases necessary to a test’s validity - is not alone sufficient to demonstrate validity. See, e.g., 29 C.F.R. § 1607.14(B). Moreover, the Uniform Guidelines specifically reject the use of “casual reports of [a test’s] validity,” such as “testimonial statements and credentials of” IOS, and “non-empirical or anecdotal accounts” such as the comments of the NHFD’s Chief and Assistant Chief. Id. § 1607.9. What the Uniform Guidelines and the Principles require is a rigorous analysis of the design and proposed use of the exam according to accepted principles of I/O psychology.4

4 The fact that IOS was purportedly prepared to issue a validation study does not prove the tests were valid because the test-design process that the study would have described was fatally flawed.

Judged against the proper standards, there was no reasonable likelihood that the examinations administered by the NHFD could have been demonstrated to be valid, after the fact, according to generally accepted strategies for validation.5

5 Petitioners’ contention that the City avoided a full-fledged analysis of the validity of the examinations because it would have been required to certify the test results is thus unsupportable. See U.S. Br. 19 (“An employer acts reasonably in not incurring [the burdens of a validity study] when it has significant questions concerning a test’s job-relatedness or reasonably believes that better alternatives to the test may exist.”).
1. The Test Designer Conceded That the Exams Did Not Attempt To Measure Command Presence, a Critical Job Attribute

It is a fundamental precept of personnel selection that an employment test should be constructed to measure important knowledge, skills, abilities, and other personal characteristics (“KSAOs”) needed for the job. See 29 C.F.R. §§ 1607.14(B)(3), 1607.14(C)(4). The omission from the testing domain of a KSAO that is an important job prerequisite - known in I/O psychology as “criterion deficiency” - vitiates the entire justification for the employment test, which is to select individuals accurately based on their capacity to perform the job in question. A test that makes no attempt to measure one or more critical KSAOs cannot be validated under established standards. See, e.g., Principles at 23; see also Firefighters Inst. for Racial Equality v. City of St. Louis, 549 F.2d 506, 512 (8th Cir. 1977) (“FIRE I”) (validity “requires that an important and distinguishing attribute be tested in some manner to find the best qualified applicants”).

As the City’s then-corporate counsel, Thomas Ude, recognized, the distinguishing feature of the job of a fire officer, as opposed to an entry-level firefighter, is responsibility for supervising and leading other firefighters in the line of duty. See JA138-39; see also Matthew Murtagh, Fire Department Promotional Tests 152 (1993) (“[C]ompany officers, lieutenants and captains, are the primary supervisors.”); Anthony Kastros, Mastering the Fire Service Assessment Center 45 (2006). Leadership in emergency-response crises requires expertise in fire-management techniques and sound judgment about life-and-death decisions. Moreover, critically, it also requires a steady “presence of command” so that the unit will follow orders and respond correctly to fire conditions. See, e.g., Richard Kolomay & Robert Hoff, Firefighter Rescue & Survival 5 (2003). Command presence requires an officer on the scene of a fire to act decisively, to communicate orders clearly and thoroughly to personnel on the scene, and to maintain a sense of confidence and calm even in the midst of intense anxiety, confusion, and panic. See id. at 5-13. Command presence generates respect for the officer among subordinates and is thus essential to order and discipline within the unit. See Murtagh at 152.

Simply put, command presence is a hallmark of a successful fire officer. See, e.g., Chase Sargent, From Buddy to Boss: Effective Fire Service Leadership 21 (2006) (“No individual leader ever forgets the first time that their command presence was put to the test.”). Virtually all studies of fire management emphasize that command presence is vital to the safety of firefighters at the scene and to the successful accomplishment of the firefighting mission and the safety of the public. See, e.g., John F. Coleman, Incident Management for the Street-Smart Fire Officer 21-26 (2d ed. 2008); Kastros at 45; Vincent Dunn, Command and Control of Fires and Emergencies 1-6 (1999).
Here, the developer of the NHFD promotional exams, IOS, admitted that those exams were not designed to measure “command presence.” Pet. App. 738a (Legel Dep.) (“Command presence usually doesn’t come up as one of the skills and abilities we even try to assess.”).6 A high test score thus could not support an inference that the candidate would be a good commander in the line of duty; conversely, those candidates with strong command attributes were never given an opportunity to demonstrate them. Given the importance of command presence to the job of a fire officer, as the City recognized, this failure alone rendered the tests deficient. See JA139 (testimony of Mr. Ude that the “goal of the test is to decide who is going to be a good supervisor ultimately, not who is going to be a good test-taker”).

6 However, IOS’s representative, Chad Legel, conceded that “command presence” could be measured through the use of an assessment center. See infra Part III.B.

In FIRE I, the Eighth Circuit recognized, in a similar situation, that the St. Louis fire captain’s exam contained the “fatal flaw” of failing to test for “supervisory ability.” 549 F.2d at 511. Because “supervisory ability” was a central requirement to the job of a fire captain, that failure precluded validation of the tests under the Uniform Guidelines. Id. The admitted failure of IOS to test command presence, a key attribute for any supervisory fire officer, would have led to the same result in this case.
2. The Weighting of the Multiple-Choice and Oral Interview Portions of the Exams Was Arbitrary and Could Not Be Validated

Even putting aside the failure of the NHFD exams to measure a critical aspect of the fire officer’s responsibilities, the tests were seriously flawed for a second reason, stemming from the imposition of a predetermined 60/40 weighting for the written and oral interview components of the tests with no evidence that those weights matched the content of the jobs. Scientific principles of employment-test validation require not only that tests reflect the important KSAOs of the job, but also that the results from those tests be weighted in proportion to their job importance. Indeed, even assuming that a test measured the full range of important KSAOs, which the NHFD tests did not, see supra Part I.B.1, a test that gives inappropriate weight to some KSAOs over others could not have been shown accurately to select the candidates who are the most qualified for the job.7 Numerous federal courts have likewise held that tests that measure relevant job skills without appropriate consideration for, and weighting of, their relative importance cannot properly be validated.8
7 See, e.g., Principles at 23-25 (explaining that selection procedures should adequately cover the requisite KSAOs and that the sample of KSAOs tested should be rationally related to the KSAOs needed to perform the work); Lynn Lyons Morris et al., How to Measure Performance and Use Tests 99 (1987) (explaining that a content-valid test should include a representative sample of categories of KSAOs and “give emphasis to each category according to its importance”); Robert D. Gatewood & Hubert S. Feild, Human Resource Selection 178-80 (5th ed. 2001) (“For our measures to have content validity, . . . their content [must] representatively sample[] the content of [the relevant performance] domains.”).

8 See, e.g., Isabel v. City of Memphis, No. 01-2533 ML/BRE, 2003 WL 23849732, at *5 (W.D. Tenn. Feb. 21, 2003) (“[T]he test developer must demonstrate that those tests utilized in the selection system appropriately weigh the [KSAOs] to the same extent they are required on the job.”), aff’d, 404 F.3d 404 (6th Cir. 2005); Pina v. City of East Providence, 492 F. Supp. 1240, 1246 (D.R.I. 1980) (invalidating a ranking system for firefighters that gave equal weight to written and physical components of the exam, on the ground that the physical component “appear[ed] to have a greater relationship to job performance as a firefighter”); Jones v. New York City Human Res. Admin., 391 F. Supp. 1064, 1079 n.15, 1081 (S.D.N.Y. 1975) (calling it a “serious defect” not to examine “how or why the skills in the test plan were weighted as they were,” resulting in a test that could not be validated; for example, numerous questions related to a skill, supervision, that only 60%-65% of employees in the position actually performed), aff’d, 528 F.2d 696 (2d Cir. 1976).

The NHFD exams in this case failed that basic principle because the predetermined, arbitrary 60/40 weighting used to calculate the candidates’ combined scores was in no way linked to the relative importance of “work behaviors, activities, and/or worker KSAOs,” as required for validation. Principles at 25. The 60/40 weighting was determined in advance by the City’s collective bargaining agreement with the local firefighters’ union. See Pet. App. 606a (Legel Dep.). While it is not uncommon for municipal governments such as the City to enter into labor or other agreements that provide for a specified weighting of test components, such provisions undermine the validity of the resulting tests unless measures are taken by the test designer to account for the preset weighting.

Here, IOS concededly made no effort to establish that the 60/40 weighting was appropriate for the tests it designed. IOS should have used established methods to calculate whether, in light of the mandatory 60/40 weighting, the test components measured the job-relevant KSAOs in proportion to their relative importance. See, e.g., Murtagh at 161; Gatewood & Feild at 178. IOS apparently did not do so; instead, it merely assessed whether the test questions were related to relevant aspects of the job, with no regard to whether the items included on the test proportionally measured the critical aspects of the overall job. See Pet. App. 634a (Legel Dep.). IOS’s failure to take that step resulted in tests that, absent sheer luck, could not have resulted in adequate validity evidence under the Principles or the Uniform Guidelines.
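The kind of calculation the brief describes can be sketched in miniature. The importance ratings, KSAO labels, and component assignments below are invented for illustration only; they are not the NHFD’s data, and the brief does not specify the rating scale IOS used. The sketch shows one accepted approach: weight each exam component in proportion to the summed job-analysis importance of the KSAOs it measures, then compare the result to a preset split.

```python
# Illustrative sketch only: all numbers and labels are hypothetical.
# One accepted approach weights each exam component in proportion to
# the summed job-analysis importance of the KSAOs it measures.

ksao_importance = {            # e.g., mean incumbent ratings, 1-5 scale
    "technical knowledge": 4.0,
    "reading/memorization": 3.0,
    "oral communication": 4.5,
    "judgment under pressure": 5.0,
}

measured_by = {                # which component measures which KSAOs
    "written": ["technical knowledge", "reading/memorization"],
    "oral": ["oral communication", "judgment under pressure"],
}

total = sum(ksao_importance.values())
derived = {
    component: sum(ksao_importance[k] for k in ksaos) / total
    for component, ksaos in measured_by.items()
}

preset = {"written": 0.60, "oral": 0.40}   # a mandated 60/40 split
for component in preset:
    print(f"{component}: preset {preset[component]:.0%}, "
          f"derived {derived[component]:.1%}")
# On these invented numbers the job-analysis-derived written weight is
# roughly 42%, well short of the preset 60%: the direction of mismatch
# the brief describes, though the real gap would depend on actual data.
```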
Moreover, there is no indication that the 60/40 weighting at issue in this case, which gave predominance to the multiple-choice component of the exams, was appropriate for the relevant job. It is well recognized by I/O psychologists and firefighters alike that written, pencil-and-paper tests, while able to measure certain cognitive abilities (e.g., reading and memorization) and factual knowledge, do not measure other skills and abilities critical to being an effective fire officer as well as alternative methods of testing do. See, e.g., Michael A. Terpak, Assessment Center: Strategy and Tactics 1 (2008) (multiple-choice exams are “known to be poor at measuring the knowledge and abilities of the candidate, most notably that of a fire officer”); Int’l Ass’n of Fire Chiefs et al., Fire Officer: Principles and Practice 28 (2006) (describing the criticism of written tests as producing firefighters who are “[b]ook smart, street dumb”); see also David L. Bullins, Leading in the Gray Area, Fire Chief (Aug. 10, 2006) (“Good leadership is not a matter of decisions made in black and white; it is a matter of the decisions that must be made in shades of gray.”), at http://firechief.com/management/bullins_gray08102006/index.html. Although a written component often properly comprises part of the overall assessment procedure for fire officers, a weighting of 60% is significantly above what would be expected given the requirements of the positions. See Phillip E. Lowry, A Survey of the Assessment Center Process in the Public Sector, 25 Pub. Personnel Mgmt. 307, 309 (1996) (survey finding that the median weight given to the written portion of tests for fire and police departments was 30%); infra Part II.A (describing weights used by neighboring Bridgeport).
The Uniform Guidelines and the federal courts have similarly recognized that written tests do not correspond well to the skills and abilities actually required for the job of a fire officer and are thus poor predictors of which candidates will make successful fire lieutenants and captains. The EEOC’s interpretive guidance on the Uniform Guidelines9 states that “[p]aper-and-pencil tests of . . . ability to function properly under danger (e.g., firefighters) generally are not close enough approximations of work behaviors to show content validity.” Questions and Answers No. 78, 44 Fed. Reg. at 12,007.

9 The EEOC’s interpretive “questions and answers” were adopted by the four agencies that promulgated the Uniform Guidelines in order to “interpret and clarify, but not to modify,” those Guidelines. Adoption of Questions and Answers To Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures, 44 Fed. Reg. 11,996, 11,996 (1979) (“Questions and Answers”). Like the Uniform Guidelines themselves, the agency interpretations have been given great deference by the courts. See, e.g., Firefighters Inst. for Racial Equality v. City of St. Louis, 616 F.2d 350, 358 n.15 (8th Cir. 1980) (“FIRE II”).

The Eighth and Eleventh Circuits have reached the same conclusion. In FIRE II, the Eighth Circuit rejected the validity of a multiple-choice test for promotion to fire captain, on the ground that “[t]he captain’s job does not depend on the efficient exercise of extensive reading or writing skills, the comprehension of the peculiar logic of multiple choice questions, or excellence in any of the other skills associated with outstanding performance on a written multiple choice test.” 616 F.2d at 357. “‘Where the content and context of the selection procedures are unlike those of the job, as, for example, in many paper-and-pencil job knowledge tests, it is difficult to infer an association between levels of performance on the procedure and on the job.’” Id. at 358 (quoting Questions and Answers No. 62, 44 Fed. Reg. at 12,005). Accordingly, “[b]ecause of the dissimilarity between the work situation and the multiple choice procedure,” the court found that “greater evidence of validity [wa]s required.” Id. at 357.
In Nash v. Consolidated City of Jacksonville, 837 F.2d 1534 (11th Cir. 1988), vacated and remanded, 490 U.S. 1103 (1989), opinion reinstated on remand, 905 F.2d 355 (11th Cir. 1990), the Eleventh Circuit likewise rejected the use of a written test to determine eligibility for promotion to the position of fire lieutenant. The court rejected the use of the test even though the test questions “never made their way into evidence” and even though the expert who was challenging the use of the test on behalf of the firefighter had never seen the questions. Id. at 1536. As the court explained, “[a]n officer’s job in a fire department involves ‘complex behaviors, good interpersonal skills, the ability to make decisions under tremendous pressure, and a host of other abilities - none of which is easily measured by a written, multiple choice test.’” Id. at 1538 (quoting FIRE II, 616 F.2d at 359).

IOS exacerbated the problem of imbalance in its response to another predetermined feature of the NHFD exams - the 70% cutoff score mandated by the City’s civil service rules. Like the 60/40 weighting, the 70% cutoff score was arbitrary and not scientifically validated. See Pet. App. 697a-698a (concession by Mr. Legel that IOS was unable to validate the 70% cutoff score).10 IOS not only “went ahead and used [the] seventy percent,” but also decided to make the written component of the test “more difficult” in an effort to screen out “a fair amount more number [sic] of people . . . than what other tests have done in the past.” Id. at 698a-699a (Legel Dep.). Not only did this admittedly worsen the adverse impact of the tests on minority candidates, see infra Part II.A, but it also skewed the focus of the test even more heavily in the direction of the limited and more attenuated set of knowledge and abilities that are measured by a multiple-choice test, by giving that component unjustifiably greater weight in the composite scores. That, in turn, further reduced the likelihood that the exams could have been shown to be valid.11

10 Arbitrary cutoff scores alone can undermine a test’s validity. See, e.g., Isabel v. City of Memphis, 404 F.3d 404, 413 (6th Cir. 2005) (stating that, “[t]o validate a cutoff score, the inference must be drawn that the cutoff score measures minimal qualifications”); accord Chisholm v. United States Postal Serv., 516 F. Supp. 810, 832, 838 (W.D.N.C. 1980), aff’d in relevant part, 665 F.2d 482 (4th Cir. 1981). The Uniform Guidelines and the Principles clearly require cutoff scores, if they are used, to be based on scientifically accepted principles. See 29 C.F.R. § 1607.5(H) (“Where cutoff scores are used, they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.”); Principles at 47 (explaining that “[p]rofessional judgment is necessary in setting any cutoff score” in light of factors including “the [KSAOs] required by the work”); see also Wayne F. Cascio & Herman Aguinis, Test Development and Use: New Twists on Old Questions, 44 Human Res. Mgmt. 219, 227 (2005) (discussing the appropriate process for calibrating a cutoff score to minimum proficiency for the job).

11 In fact, IOS acknowledged that even the oral portion of the test was designed, at least “to a small degree,” to test factual knowledge, thus further skewing the balance of the test. Pet. App. 709a (Legel Dep.).
Under established principles in the field of I/O psychology and longstanding legal authorities, the NHFD exams were deficient because of IOS’s failure to substantiate the predetermined 60/40 weighting before administering the test and because of the resulting overemphasis given to the written, multiple-choice component of the exams, which has been demonstrated to be a relatively poor method for measuring whether a candidate has the KSAOs needed to be a fire officer.
3. Flaws in the Exam-Development Process Contributed to the Lack of Validity Evidence of the NHFD Tests

The process used to develop and finalize the tests further undermined the tests’ validity as a method for identifying the individuals best suited for promotion. IOS personnel wrote the test questions based on the information developed from job analysis questionnaires given to incumbent New Haven fire officers and “national texts” on firefighting. C.A. App. 478 (Legel Dep.). However, IOS personnel were not themselves subject-matter experts on the job of a fire company officer, nor were the “national texts” they used tailored to the NHFD’s specific practices or local conditions in New Haven. See id. (“So depending on the way that those [New Haven] City employees are trained to do their specific job, it may not always jibe with the way the textbook says to do it.”); see also Pet. App. 520a-521a (“Fire fighting is different on the East Coast than it is on [sic] West Coast or in the Midwest.”).

Accordingly, as IOS acknowledged, “[s]tandard practice” in the field required that the tests be reviewed by “a panel of subject matter experts internal to New Haven, for instance, incumbent lieutenants, captains, battalion chiefs, [assistant] chiefs, and the like to actually gain [sic] their opinion about how relevant the items were and whether or not they were consistent with best practice in New Haven.” Id. at 635a (Legel Dep.) (emphasis added). Review by multiple persons with specific expertise about the NHFD was, as IOS recognized, important to verify that the questions accurately reflected important KSAOs of the job and, especially, local differences between NHFD’s practices and procedures and national firefighting standards. See Wayne F. Cascio & Herman Aguinis, Applied Psychology in Human Resource Management 158-59 (6th ed. 2005) (documenting the need for subject-matter experts to “confirm[] the fairness of sampling and scoring procedures” and to evaluate “overlap between the test and the job performance domain”); Irwin L. Goldstein et al., An Exploration of the Job Analysis-Content Validity Process, in Personnel Selection in Organizations 3, 20-21 (Neal Schmitt & Walter C. Borman eds., 1993); Int’l Ass’n of Fire Chiefs et al., Fundamentals of Fire Fighter Skills 103, 431, 663 (2004) (emphasizing that firefighters need to become “intimately familiar” with local procedures and local differences affecting firefighting such as architectural styles).

Rather than follow this admittedly standard procedure, IOS hired a single individual, a battalion chief in a fire department in Georgia, to review the tests for the job-relatedness of their content. See Pet. App. 635a-636a (Legel Dep.). Unsurprisingly, due to the failure to conduct a proper review by multiple subject-matter experts on local practice, IOS admitted that some of the items on the tests were “irrelevant for the City because you’re testing them on a knowledge base that while supported by a national textbook, wouldn’t be supported by their own standard operating procedures.” C.A. App. 482 (Legel Dep.). For example, the lieutenants’ test included a question from a New York-based textbook about whether fire equipment should be parked uptown, downtown, or underground when arriving at a fire. JA48. The question was meaningless because New Haven has no “uptown” or “downtown.”

By IOS’s admission, and under applicable I/O psychology standards, review of the test items by local subject-matter experts was critical to ensuring that the test components corresponded to the important job KSAOs. The failure to do so further undermined the validity of the NHFD exams as indicators of which candidates would have made successful NHFD fire lieutenants or captains.
4. The NHFD Tests Could Not Have Been Validated for Strict Rank-Ordering of Candidates

Under accepted standards, not only must an exam’s content be properly validated, but the use of the scores also must be scientifically justified. As the Uniform Guidelines state, “the use of a selection procedure on a pass/fail (screening) basis may be insufficient to support the use of the same procedure on a ranking basis under these guidelines.” 29 C.F.R. § 1607.5(G). Under the Uniform Guidelines, a strict rank-ordering system such as the one imposed by the City - i.e., treating a candidate as “better qualified” based on even a slight incremental difference in score - is only appropriate upon a scientific showing “that a higher score on a content valid selection procedure is likely to result in better job performance.” Id. § 1607.14(C)(9). As the Second Circuit held in Guardians Association of New York City Police Department v. Civil Service Commission, 630 F.2d 79 (2d Cir. 1980), “[p]ermissible use of rank-ordering requires a demonstration of such substantial test validity that it is reasonable to expect one- or two-point differences in scores to reflect differences in job performance.” Id. at 100-01 (rejecting the validity of rank-ordering); see also FIRE II, 616 F.2d at 358.
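The psychometric arithmetic behind that holding is straightforward. Using illustrative values assumed here rather than drawn from the record - a score standard deviation of 10 and a test reliability of .85 - the standard error of measurement and the standard error of the difference between two scores are:

$$\text{SEM} = SD\sqrt{1 - r_{xx}} = 10\sqrt{0.15} \approx 3.9, \qquad \text{SED} = \text{SEM}\sqrt{2} \approx 5.5.$$

On those assumptions, two candidates’ scores would need to differ by roughly 1.96 × 5.5 ≈ 11 points before the gap exceeds ordinary measurement error at a 95% confidence level; a one- or two-point difference lies far inside that margin.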
In this case, the NHFD tests could not have supported the use of a strict rank-ordering procedure for promotional selection. Indeed, the tests were designed and administered at a time when New Haven’s “Rule of Three” had been interpreted to permit rounding of scores to the nearest integer, rather than strict rank-ordering based on differences of fractions of a point. See C.A. App. 1701; Kelly v. City of New Haven, 881 A.2d 978, 993-94 (Conn. 2005). Use of strict rank-ordering for a test absent evidence demonstrating that it was valid for that purpose cannot be justified. See Pina, 492 F. Supp. at 1246 (invalidating a test where “[t]here [wa]s no evidence which even remotely suggest[ed] that the order of ranking establishe[d] that any applicant [wa]s better qualified than any other”).
Moreover, as explained above, the serious flaws in the NHFD tests severely undermined the overall validity of the exams and certainly foreclosed any conclusion that the exams were of such “substantial . . . validity” as to justify the additional step of making promotional decisions strictly based on small score differences. Guardians Ass’n, 630 F.2d at 100-01. Making fine judgments based on small differences on fundamentally flawed tests is scientifically unsupportable. See, e.g., Aguinis & Harden at 193. “[U]se of an exam to rank applicants, when the exam cannot predict applicants’ relative merits, offers nothing but a false sense of assurance based on a misplaced belief that some criterion - no matter how arbitrary - is better than none.” Ensley Branch, NAACP v. Seibels, 31 F.3d 1548, 1574 (11th Cir. 1994).

Tests that transform differences that are as likely to be a product of measurement error or flawed test design as they are a reflection of superior qualifications create nothing but the illusion of meritocracy. That illusion creates not only a false sense of individual entitlement to jobs and promotions, but also a real public danger in the context of positions such as fire and police officers. When the safety and lives of citizens are at stake, it is particularly critical for public employers to have the leeway to ensure that the tests they deploy accurately identify those candidates who are most qualified for these important jobs.
II. THE FLAWS IN THE NHFD PROMOTIONAL EXAMS EXACERBATED THEIR ADVERSE IMPACT ON MINORITY CANDIDATES

Unjustified exclusion of minority candidates through scientifically flawed testing procedures has significant social costs. Especially in a city like New Haven, racial diversity has significant benefits to the ability of the public sector to provide needed services to the community and to protect the public safety. See, e.g., Wayne F. Cascio et al., Social and Technical Issues in Staffing Decisions, in Test-Score Banding in Human Resource Selection 7, 9 (Herman Aguinis ed., 2004). An all-white officer corps in the NHFD will be less effective than one that is more racially diverse. See id.; see also Sargent at 188 (noting that having a Hispanic firefighter fluent in Spanish “can be a life saver”).

In this case, the flaws in the NHFD promotional exams not only undermined their validity, but also unjustifiably increased their adverse impact on minority candidates. In particular, two features of the tests contributed to the conceded adverse impact on African-American and Hispanic examinees. Tests that eliminated these features were available to the City as “less discriminatory alternatives” under Title VII.
A. Overweighting Of The Written, Multiple-
Choice Portion Of The Exams Increased
The Adverse Impact On Minority Candi
dates
It is well-established that minority candidates
fare less well than their Caucasian counterparts on
standardized written examinations, and especially
multiple-choice (as opposed to “write-in”) tests. See,
e.g., Winfred Arthur Jr. et al., Multiple-Choice and
Constructed Response Tests of Ability, 55 Personnel
Psychol. 985, 986 (2002); Philip L. Roth et al., Ethnic
Group Differences in Cognitive Ability in Employ
ment and Educational Settings: A Meta-Analysis, 54
Personnel Psychol. 297 (2001). Although the causes
for that widely recognized discrepancy are not fully
understood, certain features of the multiple-choice
format have been recognized to contribute to adverse
impact.
First, “[t]o the extent that [the exam’s] reading
demands are not concomitant with job demands
and/or performance, then any variance associated
with reading demands and comprehension is consid
ered to be error variance.” Arthur et al., 55 Person
nel Psychol, at 991. Some studies suggest disparities
among racial subgroups in reading comprehension,
such that using written questions and answers as
25
the sole or predominant medium for testing increases
adverse impact. See id.; James L. Outtz & Daniel
A. Newman, A Theory of Adverse Impact 12-13,
68 (manuscript on file with author) (forthcoming
in Adverse Impact: Implications for Organizational
Staffing and High Stakes Selection, 2009). Moreover,
studies suggest that racial minorities are less “test
wise” than white test-takers, and it is “widely recog
nized that performance on multiple-choice tests is
susceptible to specific test-taking strategies or test
wiseness.” Arthur et al., 55 Personnel Psychol, at
991-92. Finally, studies have found that a test-
taker’s unfavorable view of a test’s validity nega
tively influences performance, and some evidence
indicates that minority test-takers generally have a
less favorable view of traditional written tests. See
id. at 992.
Regardless of the exact cause of the disparity, it
is clear that the use of written, multiple-choice
tests beyond what is justified by the demands of
a particular job has the effect of disproportionately
excluding minority candidates without any corresponding increase in job performance. See, e.g., Outtz & Newman at 33. As set forth above, the NHFD’s 60/40 weighting was arbitrary and put more emphasis on the written, multiple-choice examination than science and experience have shown to be warranted for the job of a fire officer. Likewise, the response of IOS to the 70% cutoff score contributed to the adverse impact of the exams. By IOS’s own admission, arbitrarily making the written portion of the tests “more difficult” further exaggerated the importance of the written component and thereby contributed to the exclusion of African-American and Hispanic candidates from the promotional ranks. Pet. App. 698a-699a (Legel Dep.).
Changing the weighting of the exams to more accurately reflect the content of the job almost certainly would have reduced their adverse impact by reducing the weight of the written component, and thus constituted a “less discriminatory alternative” that the City would have been obligated to use under Title VII. Had the City given a 30% weighting to the written component of the examination, more in line with the nationwide norm, see supra pp. 15-16, the tests would have had a significantly lower adverse impact on minority candidates. See Resp. Br. 33 (“[I]f the tests were weighted 70%/30% oral/written, then two African-Americans would have been considered for lieutenant positions and one for a captain position.”). Indeed, 20 miles down the coast from New Haven, the fire department in Bridgeport, Connecticut, has administered tests with less weight given to the written component (25% for lieutenants and 33% for captains) and achieved a significant reduction in adverse impact relative to the NHFD exam results. See JA64-66.¹²

¹² In 2005, the Bridgeport lieutenant’s exam consisted of written and oral components, weighted 25% and 75%, respectively; the weights on the captain’s exam were 33% for the written component, 62% for the oral component, and 5% for seniority.
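
To make concrete how much the written/oral weighting alone can matter, the sketch below computes composite scores for two candidates under the NHFD’s 60/40 weighting and under a 30/70 alternative. The candidate scores are invented for illustration and are not drawn from the record; the point is only that the weighting choice, by itself, can determine who rises to the top of the list.

```python
# Hypothetical candidate scores, for illustration only; the point is that the
# written/oral weighting alone can determine who rises to the top of the list.

def composite(written: float, oral: float, w_written: float) -> float:
    """Weighted composite of written and oral scores (weights sum to 1)."""
    return w_written * written + (1 - w_written) * oral

candidates = {
    "A": (92.0, 74.0),   # strong on the written, weaker in the oral exercise
    "B": (78.0, 90.0),   # weaker on the written, strong in the oral exercise
}

for w in (0.60, 0.30):   # NHFD's 60/40 weighting vs. a 30/70 alternative
    ranked = sorted(candidates,
                    key=lambda c: composite(*candidates[c], w),
                    reverse=True)
    scores = {c: round(composite(*candidates[c], w), 1) for c in candidates}
    print(f"written weight {w:.0%}: ranking {ranked}, scores {scores}")

# written weight 60%: A (84.8) outranks B (82.8)
# written weight 30%: B (86.4) outranks A (79.4)
```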
B. Selection Of Candidates In Strict Rank
Order Also Contributed To Adverse
Impact
As discussed above, the NHFD tests were improperly weighted toward the written component, which tested certain KSAOs (e.g., reading, memorization, and factual knowledge) in disproportion to their importance relative to other important skills and abilities, including “command presence,” which was not measured at all. Moreover, the tests unjustifiably employed a strict rank-ordering system that differentiated among candidates based on small score differences that had not been scientifically demonstrated to be meaningful. The combination of imbalanced weighting toward KSAOs that disproportionately disfavor minority candidates and the selection of candidates based strictly on rank order cemented the disproportionate rejection of minority candidates for promotion.
An alternative to strict rank-ordering would have been a “banding” scoring system. In brief, banding uses a statistical analysis of the amount of error in the test scores to create “bands” of scores, the lowest of which is considered to be sufficiently similar to the highest to warrant equal consideration within that band. Cascio et al. at 10; see also Principles at 48 (bands “take into account the imprecision of selection procedure scores and their inferences”). After the width of the band is established, based on a statistical analysis of the reliability of measurement, the user can either establish “fixed” bands, in which the test user considers everyone within the top band before considering anyone from the next band, or “sliding” bands, which allow the band to “slide” down the list once higher scorers are either chosen or rejected. See Cascio et al. at 10-11.
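
A minimal sketch of the fixed-band approach described above appears below. It uses one common method of setting the band width, the standard error of difference; the standard deviation, reliability, and scores are hypothetical, and an actual banding procedure would be designed and validated by testing professionals.

```python
import math

# Minimal sketch of "fixed" score banding as described above: scores within
# one standard error of difference (SED) of the top score are treated as
# statistically indistinguishable. Reliability and scores are hypothetical.

def band_width(sd: float, reliability: float, z: float = 1.96) -> float:
    """SED-based band width: z * SD * sqrt(2 * (1 - reliability))."""
    return z * sd * math.sqrt(2 * (1 - reliability))

def top_band(scores: dict[str, float], sd: float, reliability: float) -> list[str]:
    """Candidates whose scores fall within one band width of the top score."""
    width = band_width(sd, reliability)
    cutoff = max(scores.values()) - width
    return [name for name, s in scores.items() if s >= cutoff]

scores = {"A": 91.0, "B": 89.5, "C": 87.0, "D": 80.0}

# With SD = 6 and reliability = .90, the band width is about 5.3 points,
# so A, B, and C receive equal consideration; D falls in a lower band.
print(top_band(scores, sd=6.0, reliability=0.90))   # ['A', 'B', 'C']
```

A “sliding” band would instead recompute the cutoff from the highest remaining score each time a top scorer is selected or removed from the list.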
The federal courts have recognized banding as
“a universal and normally an unquestioned method
of simplifying scoring by eliminating meaningless
gradations” between candidates whose scores differ by
less than the degree of measurement error. Chicago Firefighters Local 2 v. City of Chicago, 249 F.3d 649, 656 (7th Cir. 2001); see also, e.g., Biondo v. City of Chicago, 382 F.3d 680, 684 (7th Cir. 2004) (banding
“respect[s] the limits of [an] exam’s accuracy”). In
amici’s view, a banding approach would have been
a viable method to reduce the adverse impact of the
NHFD tests. However, given that the rankings
themselves were a result of flawed tests, banding
alone would not have been sufficient to achieve the
objective of selecting the most qualified individuals
for the job. See Bridgeport Guardians, Inc. v. City of
Bridgeport, 933 F.2d 1140, 1147 (2d Cir. 1991) (“The
ranking of the candidates was itself the result of the
disparate impact of the examination.”).
III. CURRENT I/O PSYCHOLOGY RESEARCH
SUPPORTS THE USE OF PROMOTIONAL
ASSESSMENT CENTERS AS A VALID AND
LESS DISCRIMINATORY ALTERNATIVE
TO TRADITIONAL TESTING METHODS
The evidence in the record clearly demonstrates that the NHFD exams suffered from fatal design defects that undermined their validity and unjustifiably excluded a disproportionate number of minority candidates. That alone left the City no choice but to decline to certify the exams. In addition, the City reasonably concluded that certification of the tests could not be justified given the existence of alternative methods of selection. One alternative before the City was the assessment center, which, if designed properly, would measure a broader range of KSAOs and also be less discriminatory. See, e.g., JA96 (statement of Dr. Christopher Hornick to the Board that assessment centers are “much more valid in terms of identifying the best potential supervisors”); Pet. App. 739a (Legel Dep.).
A. History Of The Assessment Center Model
From the 1950s to the 1980s, multiple-choice tests were generally the only procedure used for promotional selection in U.S. fire departments. See Int’l Ass’n of Fire Chiefs et al., Fire Officer: Principles and Practice at 28. Such tests were prevalent because they were easy and inexpensive to administer, and seemingly “objective.” However, for the reasons discussed above, such tests had the side effect of excluding a disproportionate number of minority candidates
from consideration. Beginning in the 1970s, spurred
in part by the passage of Title VII (1964), the development of the Uniform Guidelines (1978), and this Court’s decision in Griggs (1971), employers increasingly began using an alternative selection method known as the assessment center. See James R. Huck & Douglas W. Bray, Management Assessment Center Evaluations and Subsequent Job Performance of White and Black Females, 29 Personnel Psychol. 13, 13-14 (1976).
An assessment center is a form of standardized evaluation that seeks to test multiple dimensions of job qualification through observation of job-related exercises and other assessment techniques. See generally Task Force on Assessment Center Guidelines, Guidelines and Ethical Considerations for Assessment Center Operations, 18 Pub. Personnel Mgmt. 457, 460-64 (1989) (defining an assessment center). Unlike multiple-choice exams, which evaluate KSAOs through a single, written medium, assessment centers employ multiple methods, including, prominently, job simulations, all of which are designed to permit more direct assessment of ability to do the job. See id. at 461-62. Candidates’ performance on the simulation exercises is rated by multiple subject-matter experts. See id. at 462. By observing how a participant handles the problems and challenges of the target job (as simulated in the exercises), assessors develop a valid picture of how that person would perform in the target position. See Charles D. Hale, The Assessment Center Handbook for Police and Fire Personnel 16-52 (2d ed. 2004) (describing typical exercises).
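
As a schematic of the aggregation just described, the sketch below pools several assessors’ independent ratings across simulation exercises into per-dimension scores. The exercises, dimensions, and ratings are purely illustrative and do not reflect any particular assessment center’s protocol.

```python
from statistics import mean

# Schematic aggregation of assessment-center ratings: several assessors rate a
# candidate on each job dimension within each simulation exercise, and the
# ratings are pooled into per-dimension scores. Exercises, dimensions, and
# numbers are purely illustrative.

# ratings[exercise][dimension] -> list of independent assessor ratings (1-5)
ratings = {
    "oral presentation": {"command presence": [4, 5, 4], "communication": [4, 4, 5]},
    "tactical simulation": {"command presence": [5, 4, 4], "problem solving": [3, 4, 4]},
    "in-basket exercise": {"problem solving": [4, 4, 5], "communication": [3, 4, 4]},
}

def dimension_scores(ratings: dict) -> dict[str, float]:
    """Average each dimension's ratings across all assessors and exercises."""
    pooled: dict[str, list[int]] = {}
    for exercise in ratings.values():
        for dim, rs in exercise.items():
            pooled.setdefault(dim, []).extend(rs)
    return {dim: round(mean(rs), 2) for dim, rs in pooled.items()}

print(dimension_scores(ratings))
# {'command presence': 4.33, 'communication': 4.0, 'problem solving': 4.0}
```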
B. Assessment Centers Have Demonstrated
Validity In The Context Of Firefighter
Promotion
Since the 1970s, the use of assessment centers for employee selection has increased rapidly, both in the United States and elsewhere, and in firefighter promotion in particular. By 1986, 44% of fire departments surveyed used assessment centers in making promotion decisions.¹³ More recent surveys indicate a usage rate of between 60% and 70%.¹⁴

¹³ See Samuel J. Yeager, Use of Assessment Centers by Metropolitan Fire Departments in North America, 15 Pub. Personnel Mgmt. 51, 52-53 (1986); accord Miguel Roig & Maryellen Reardon, A Performance Standard for Promotions, 141 Fire Engineering 49, 49 (1988).

¹⁴ See Lowry, 25 Pub. Personnel Mgmt. at 310; Carl F. Weaver, Can Assessment Centers Eliminate Challenges to the Promotional Process? 13 (July 2000) (unpublished monograph), at http://www.usfa.dhs.gov/pdf/efop/efo24862.pdf.
Like any testing method, an assessment center must be properly constructed so that, for example, it measures important KSAOs of the relevant job. After more than 30 years of use and research, however, substantial agreement exists among I/O psychologists that properly designed assessment centers are better predictors of job performance than other forms of promotional testing.¹⁵ Today, because of numerous studies supporting the conclusion, “the predictive validity of [assessment centers] is now largely assumed.” Walter C. Borman et al., Personnel Selection, 48 Ann. Rev. Psychol. 299, 313 (1997). Properly designed assessment centers have incremental predictive validity over cognitive tests “because occupational success is not only a function of a person’s cognitive abilities, but also the manifestation of those abilities in concrete observable behavior.” Diana E. Krause et al., Incremental Validity of Assessment Center Ratings Over Cognitive Ability Tests: A Study at the Executive Management Level, 14 Int’l J. Selection & Assessment 360, 362 (2006).

¹⁵ See, e.g., Chaitra M. Hardison & Paul R. Sackett, Assessment Center Criterion Related Validity: A Meta-Analytic Update 14-20 (2004) (unpublished manuscript); Winfred Arthur Jr. et al., A Meta-Analysis of the Criterion-Related Validity of Assessment Center Dimensions, 56 Personnel Psychol. 125, 145-46 (2003); Barbara B. Gaugler et al., Meta-Analysis of Assessment Center Validity, 72 J. Applied Psychol. 493, 503 (1987).
As reflected by their widespread usage by fire departments across the country, assessment centers are especially appropriate in the context of firefighter promotion. Because they use multiple methods of assessment, assessment centers are able to measure a wider range of skills, including critical skills such as leadership capacity, problem-solving, and “command presence.” IOS’s representative, Chad Legel, admitted that the NHFD tests failed to test for “command presence,” and he further acknowledged that the City “would probably be better off with an assessment center if you cared to measure that.” Pet. App. 738a (Legel Dep.); see also Krause et al., 14 Int’l J. Selection & Assessment at 362 (agreeing that leadership ability is likely better assessed through an assessment center than an oral interview); Gaugler et al., 72 J. Applied Psychol. at 493 (“assessment centers are most frequently used for assessing managers”).
In short, the “state of the art” in the field of promotional testing for firefighters and the “state of the science” in I/O psychology have evolved beyond the outdated methods of testing used by the NHFD. Instead, as the City was told by Dr. Hornick, see JA96, there is now substantial agreement that a professionally validated assessment center represents a more effective method of selecting the most qualified fire officers.
C. Assessment Centers Have Been Proven To
Reduce Adverse Impact On Minorities
It is equally well-recognized in the research literature that assessment centers reduce adverse impact on racial minorities as compared to traditional standardized tests. See, e.g., George C. Thornton & Deborah E. Rupp, Assessment Centers in Human Resource Management 231 (2006). “Additional research has demonstrated that adverse impact is less of a problem in an assessment center as compared to an aptitude test designed to assess cognitive abilities that are important for the successful performance of work behaviors in professional occupations.” Cascio & Aguinis, Applied Psychology in Human Resource Management at 372-73.
Those scientific studies also have been borne out by experience. An analysis of fire-personnel selection in St. Louis in the 15 years after the FIRE II decision found that the institution of an assessment center selection method “achieved considerable success at minimizing adverse impact against black candidates.” Gary M. Gebhart et al., Fire Service Testing in a Litigious Environment: A Case History, 27 Pub. Personnel Mgmt. 447, 453 (1998).
In sum, assessment centers are now a prevalent feature of firefighter promotional tests across the nation. Numerous resources exist for employers wishing to incorporate assessment centers into their selection procedures in accordance with accepted scientific principles.¹⁶ The availability of the assessment center as an equally valid, less discriminatory alternative provides yet another justification for the City’s decision not to certify the results of the NHFD promotional exams. Indeed, under Title VII, it compelled that decision.

¹⁶ See generally, e.g., Ian Taylor, A Practical Guide to Assessment Centres and Selection Methods (2007); Hale, supra.
* * * * *
To place this case in overall perspective, petitioners’ lawsuit seeks to compel the City to certify the results of tests that suffered from glaring flaws
undermining their validity, had an admitted adverse
impact on racial minorities, and could have been
replaced by readily available, equally or more valid,
and less discriminatory alternatives. From the
standpoint of accepted I/O psychology principles, there
is no justification for certifying the results of such
tests because there is no evidence they selected the
most qualified candidates, and they systematically
excluded minority candidates. Under established
legal principles, moreover, certification would have
resulted in a violation of Title VII, and the City’s
decision was thus compelled by law. Petitioners’
challenge to the City’s decision must therefore fail.
CONCLUSION
The judgment of the court of appeals should be
affirmed.
March 25, 2009

Respectfully submitted,

David C. Frederick
   Counsel of Record
Derek T. Ho
Barrett C. Hester
Jennifer L. Peresie
Kellogg, Huber, Hansen,
   Todd, Evans & Figel, P.L.L.C.
1615 M Street, N.W., Suite 400
Washington, D.C. 20036
(202) 326-7900

Counsel for Industrial-
Organizational Psychologists