Ricci v DeStefano Brief Amici Curiae

Public Court Documents
March 25, 2009


    Nos. 07-1428 & 08-328

In The

Supreme Court of the United States

Frank Ricci, et al.,
Petitioners,

v.

John DeStefano, et al.,
Respondents.

On Writs of Certiorari 
to the United States Court of Appeals 

for the Second Circuit

BRIEF OF INDUSTRIAL-ORGANIZATIONAL 
PSYCHOLOGISTS AS AMICI CURIAE 

IN SUPPORT OF RESPONDENTS

David C. Frederick
Counsel of Record

Derek T. Ho
Barrett C. Hester
Jennifer L. Peresie
Kellogg, Huber, Hansen,
Todd, Evans & Figel, P.L.L.C.
1615 M Street, N.W., Suite 400
Washington, D.C. 20036
(202) 326-7900

Counsel for Industrial-
Organizational Psychologists

March 25, 2009



TABLE OF CONTENTS
Page

TABLE OF AUTHORITIES....................................... iii
INTEREST OF AMICI CURIAE................................ 1
SUMMARY OF ARGUMENT...................................... 3
ARGUMENT.................................................................. 5

I. THE 2003 EXAMINATIONS CONTAINED FATAL FLAWS THAT UNDERMINED THEIR ABILITY TO SELECT THE MOST QUALIFIED CANDIDATES........................................... 5

A. Proper Validation Of Employment Tests According To Established Standards Is Essential To Ensuring Fair, Merit-Based Selection.................................. 6

B. Contrary To Petitioners’ Premise, The 2003 NHFD Examinations Could Not Have Been Validated Under Established Principles.............................................. 8

1. The Test Designer Conceded That the Exams Did Not Attempt To Measure Command Presence, a Critical Job Attribute........................... 10

2. The Weighting of the Multiple-Choice and Oral Interview Portions of the Exams Was Arbitrary and Could Not Be Validated....................... 12

3. Flaws in the Exam-Development Process Contributed to the Lack of Validity Evidence of the NHFD Tests....................................................... 19

4. The NHFD Tests Could Not Have Been Validated for Strict Rank-Ordering of Candidates........................ 21

II. THE FLAWS IN THE NHFD PROMOTIONAL EXAMS EXACERBATED THEIR ADVERSE IMPACT ON MINORITY CANDIDATES.......................................... 23

A. Overweighting Of The Written, Multiple-Choice Portion Of The Exams Increased The Adverse Impact On Minority Candidates...................................24

B. Selection Of Candidates In Strict Rank Order Also Contributed To Adverse Impact................................ 26

III. CURRENT I/O PSYCHOLOGY RESEARCH SUPPORTS THE USE OF PROMOTIONAL ASSESSMENT CENTERS AS A VALID AND LESS DISCRIMINATORY ALTERNATIVE TO TRADITIONAL TESTING METHODS.........28

A. History Of The Assessment Center Model............................................................29

B. Assessment Centers Have Demonstrated Validity In The Context Of Firefighter Promotion.................................30

C. Assessment Centers Have Been Proven To Reduce Adverse Impact On Minorities.................................................... 32

CONCLUSION.............................................................34




TABLE OF AUTHORITIES
Page

CASES
Biondo v. City of Chicago, 382 F.3d 680

(7th Cir. 2004).......................................................  28
Bridgeport Guardians, Inc. v. City of Bridge­

port, 933 F.2d 1140 (2d Cir. 1991)......................28
Chicago Firefighters Local 2 v. City of

Chicago, 249 F.3d 649 (7th Cir. 2001)..........  27-28
Chisholm v. United States Postal Serv., 516 

F. Supp. 810 (W.D.N.C. 1980), aff’d in 
part, 665 F.2d 482 (4th Cir. 1981)...................... 18

Ensley Branch, NAACP v. Seibels, 31 F.3d
1548 (11th Cir. 1994).............................................23

Firefighters Inst. for Racial Equality v. City
of St. Louis:
549 F.2d 506 (8th Cir. 1977)........................... 10, 12
616 F.2d 350 (8th Cir. 1980)............. 16, 17, 22, 32

Griggs v. Duke Power Co., 401 U.S. 424
(1971).................................................................. 6, 29

Guardians Ass’n of New York City Police 
Dep’t v. Civil Serv. Comm’n, 630 F.2d 79
(2d Cir. 1980)....................................................21, 22

Isabel v. City of Memphis:
No. 01-2533 ML/BRE, 2003 WL 23849732 
(W.D. Tenn. Feb. 21, 2003), aff’d, 404 
F.3d 404 (6th Cir. 2005)......................................  13
404 F.3d 404 (6th Cir. 2005)...............................  18




Jones v. New York City Human Res. Admin.,
391 F. Supp. 1064 (S.D.N.Y. 1975), aff’d,
528 F.2d 696 (2d Cir. 1976).................................  14

Kelly v. City of New Haven, 881 A.2d 978
(Conn. 2005)..........................................................  22

Nash v. Consolidated City of Jacksonville, 
837 F.2d 1534 (11th Cir. 1988), vacated 
and remanded, 490 U.S. 1103 (1989), 
opinion reinstated on remand, 905 F.2d
355 (11th Cir. 1990).............................................. 17

Pina v. City of East Providence, 492 F. Supp.
1240 (D.R.I. 1980)........................................... 13, 22

STATUTES, REGULATIONS, AND RULES
Civil Rights Act of 1964, Tit. VII, 42 U.S.C.

§ 2000e et seq......................................... 5, 26, 29, 33
29 C.F.R. pt. 1607 (Uniform Guidelines on

Employee Selection Procedures)......... 5, 6, 7, 9, 12,
15, 16, 18, 21, 29

§ 1607.1(A)................................................................6
§ 1607.1(B)................................................................7
§ 1607.3(A)................................................................7
§ 1607.5(B)................................................................7
§ 1607.5(C)................................................................6
§ 1607.5(G)..............................................................21
§ 1607.5(H)............................................................  18
§ 1607.9.....................................................................9
§ 1607.14(B)..............................................................9
§ 1607.14(B)-(D)........................................................7




§ 1607.14(B)(3)..................................................... 10
§ 1607.14(C)(4)..................................................... 10
§ 1607.14(C)(9)...................................................... 21

Sup. Ct. R. 37.6...........................................................  1

ADMINISTRATIVE MATERIALS
Adoption of Questions and Answers To Clar­

ify and Provide a Common Interpretation 
of the Uniform Guidelines on Employee 
Selection Procedures, 44 Fed. Reg. 11,996 
(1979)............................................................... 16, 17

OTHER MATERIALS
Herman Aguinis & Erika Harden, Will 

Banding Benefit My Organization? An 
Application of Multi-Attribute Utility 
Analysis, in Test-Score Banding in 
Human Resource Selection 193 (Herman
Aguinis ed., 2004)............................................. 8, 22

American Psychological Association, Stan­
dards for Educational and Psychological 
Tests (1999)...............................................................6

Winfred Arthur Jr. et al., A Meta-Analysis of 
the Criterion-Related Validity of Assess­
ment Center Dimensions, 56 Personnel 
Psychol. 125 (2003)............................................... 31

Winfred Arthur Jr. et al., Multiple-Choice 
and Constructed Response Tests of Abil­
ity, 55 Personnel Psychol. 985 (2002)........... 24, 25

Walter C. Borman et al., Personnel Selection,
48 Ann. Rev. Psychol. 299 (1997)....................... 31




David L. Bullins, Leading in the Gray Area,
Fire Chief (Aug. 10, 2006), at http:// 
firechief.com/management/bullins_gray 
08102006/index.html...........................................  15

Wayne F. Cascio et al., Social and Technical 
Issues in Staffing Decisions, in Test Score 
Banding in Human Resource Selection 7
(Herman Aguinis ed., 2004)...........................23, 27

Wayne F. Cascio & Herman Aguinis:
Applied Psychology in Human Resource
Management (6th ed. 2005)............................20, 32
Test Development and Use: New Twists
on Old Questions, 44 Human Res. Mgmt.
219 (2005).............................................................  18

John F. Coleman, Incident Management for
the Street-Smart Fire Officer (2d ed. 2008).......  11

Vincent Dunn, Command and Control of
Fires and Emergencies (1999).............................  11

Robert D. Gatewood & Hubert S. Feild,
Human Resource Selection (5th ed. 2001)....13, 14

Barbara B. Gaugler et al., Meta-Analysis of 
Assessment Center Validity, 72 J. Applied 
Psychol. 493 (1987)......................................... 31, 32

Gary M. Gebhart et al., Fire Service Testing 
in a Litigious Environment: A Case His­
tory, 27 Pub. Personnel Mgmt. 447 (1998) .... 32-33

Irwin L. Goldstein et al., An Exploration of 
the Job Analysis-Content Validity Proc­
ess, in Personnel Selection in Organiza­
tions 3 (Neil Schmitt & Walter C. Borman 
eds., 1993).............................................................  20




Charles D. Hale, The Assessment Center 
Handbook for Police and Fire Personnel 
(2d ed. 2004).....................................................30, 33

Chaitra M. Hardison & Paul R. Sackett, 
Assessment Center Criterion Related Valid­
ity: A Meta-Analytic Update (2004).................... 31

James R. Huck & Douglas W. Bray, Man­
agement Assessment Center Evaluations 
and Subsequent Job Performance of White 
and Black Females, 29 Personnel Psychol.
13 (1976)...............................................................  29

Int’l Ass’n of Fire Chiefs et al.:
Fire Officer: Principles and Practice (2006) ....15, 29 
Fundamentals of Fire Fighter Skills (2004).......  20

Anthony Kastros, Mastering the Fire Service
Assessment Center (2006)....................................  11

Richard Kolomay & Robert Hoff, Firefighter
Rescue & Survival (2003)....................................  11

Diana E. Krause et al., Incremental Validity 
of Assessment Center Ratings Over Cogni­
tive Ability Tests: A Study at the Execu­
tive Management Level, 14 Int’l J. Selec­
tion & Assessment 360 (2006).............................  31

Phillip E. Lowry, A Survey of the Assessment 
Center Process in the Public Sector, 25 
Pub. Personnel Mgmt. 307 (1996)..................15, 30

Lynn Lyons Morris et al., How to Measure
Performance and Use Tests (1987)...................... 13

Matthew Murtagh, Fire Department Promo­
tional Tests (1993).................................... 10, 11, 14




James L. Outtz & Daniel A. Newman, A 
Theory of Adverse Impact (manuscript on 
file with author) (forthcoming in Adverse 
Impact: Implications for Organizational 
Staffing and High Stakes Selection, 2009)........  25

Miguel Roig & Maryellen Reardon, A 
Performance Standard for Promotions,
141 Fire Engineering 49 (1988)..........................  30

Philip L. Roth et al., Ethnic Group Differ­
ences in Cognitive Ability in Employment 
and Educational Settings: A Meta-Analysis,
54 Personnel Psychol. 297 (2001)....................... 24

Chase Sargent, From Buddy to Boss: Effec­
tive Fire Service Leadership (2006)............... 11, 23

Society for Industrial and Organizational 
Psychology, Principles for the Validation
and Use of Personnel Selection Procedures
(4th ed. 2003), at http://www.siop.org/_
Principles/principles.pdf....................1, 2, 6, 7, 8, 9,

10, 13, 14, 15, 18, 27
Task Force on Assessment Center Guide­

lines, Guidelines and Ethical Considera­
tions for Assessment Center Operations,
18 Pub. Personnel Mgmt. 457 (1989)............ 29, 30

Ian Taylor, A Practical Guide to Assessment
Centres and Selection Methods (2007)................ 33

Michael A. Terpak, Assessment Center:
Strategy and Tactics (2008)................................  15

George C. Thornton & Deborah E. Rupp,
Assessment Centers in Human Resource 
Management (2006).............................................. 32




Carl F. Weaver, Can Assessment Centers 
Eliminate Challenges to the Promotional 
Process? (July 2000), at http://www.usfa. 
dhs.gov/pdf/efop/efo24862.pdf............................. 30

Samuel J. Yeager, Use of Assessment Centers 
by Metropolitan Fire Departments in 
North America, 15 Pub. Personnel Mgmt.
51 (1986)................................................................ 30



INTEREST OF AMICI CURIAE1
Amici are experts in the field of industrial- 

organizational psychology and are elected fellows 
of the Society for Industrial and Organizational 
Psychology (“SIOP”), the division of the American 
Psychological Association that is responsible for the 
establishment of scientific findings and generally ac­
cepted professional practices in the field of personnel 
selection. Amici also have extensive experience in 
the design and validation of promotional tests for 
emergency services departments, including fire and 
police departments across the country. Amici have 
an interest in ensuring the scientifically appropriate 
choice, development, evaluation, and use of personnel 
selection procedures.

Professor Herman Aguinis is the Mehalchin Term 
Professor of Management at the University of Colo­
rado Denver Business School. In addition to his 
extensive scholarship and research on personnel 
selection, Professor Aguinis served on the Advisory 
Panel on the most recent revision of the Principles 
for the Validation and Use of Personnel Selection 
Procedures (4th ed. 2003) (“Principles”), at http:// 
www.siop.org/_Principles/principles.pdf.

Professor Wayne Cascio holds the Robert H. Rey­
nolds Chair in Global Leadership at the University of 
Colorado Denver Business School. He has published

1 Pursuant to Supreme Court Rule 37.6, counsel for amici 
represents that it authored this brief in its entirety and that 
none of the parties or their counsel, nor any other person or 
entity other than amici or their counsel, made a monetary 
contribution intended to fund the preparation or submission of 
this brief. Counsel for amici also represents that all parties 
have consented to the filing of this brief in the form of blanket 
consent letters filed with the Clerk.




and testified extensively on issues relating to fire­
fighter promotion. He served as President of SIOP 
from 1992 to 1993.

Professor Irwin Goldstein is currently Senior Vice 
Chancellor for Academic Affairs in the University 
System of Maryland. From 1991 to 2004, he served 
as Professor and Dean of the College of Behavioral 
and Social Sciences at the University of Maryland 
College Park. He has published extensively on vali­
dation, job selection, and training. He also served as 
President of SIOP from 1985 to 1986.

Dr. James Outtz, Ph.D., has more than 30 years’ 
experience in the design and validation of personnel 
selection procedures. In 2002-2003, Dr. Outtz served 
on the Ad Hoc Committee that oversaw the 2003 
revision of the Principles, the official policy statement 
of SIOP. Dr. Outtz also has served as a consultant 
for the City of Bridgeport, Connecticut, in designing 
the city’s promotional examinations for fire lieuten­
ant and captain positions.

Professor Sheldon Zedeck is Professor of Psychol­
ogy at the University of California at Berkeley and 
Vice Provost for Academic Affairs and Faculty Wel­
fare. Professor Zedeck’s research and writing focuses 
on employment selection and validation models. Pro­
fessor Zedeck served on the Ad Hoc Committee on 
the 2003 revision of the Principles and as President 
of SIOP from 1986 to 1987.




SUMMARY OF ARGUMENT
The City of New Haven (“City”), acting through 

its Civil Service Board (“Board”), reasonably declined 
to certify the results of the 2003 New Haven Fire 
Department (“NHFD”) promotional examinations for 
captain and lieutenant because the validity of the 
tests could not have been substantiated under ac­
cepted scientific principles in the field of industrial- 
organizational (“I/O”) psychology and applicable legal 
standards. Based on their expertise in the field of 
I/O psychology and their experience in employment 
test design, amici have identified at least four serious 
flaws in the tests that undermined their validity: 
(1) their admitted failure to measure critical qualifi­
cations for the job of a fire company officer; (2) the 
arbitrary, scientifically unsubstantiated weighting of 
the multiple-choice and oral components of the test 
battery; (3) the lack of input from local subject- 
matter experts regarding whether the tests matched 
the content of the jobs; and (4) use of strict rank­
ordering without sufficient justification. Members of 
the Board thus reasonably concluded that it was 
unlikely, if not impossible, that the tests could be 
demonstrated to be valid.

Petitioners’ claim that the decision not to certify 
the NHFD test results constituted a deviation from 
merit-based selection is inaccurate because of these 
clear and serious flaws in the design of the tests and 
the proposed use of the test scores. To the contrary, 
due to those flaws, which are apparent from the 
record below, there is no basis to conclude that certi­
fication of the test results would have led to the 
promotion of the most qualified candidates.

Moreover, several of the tests’ flaws -  namely, the 
unsubstantiated weighting of the test components




and use of strict rank-ordering -  contributed to their 
adverse impact on racial subgroups, specifically 
African-American and Hispanic candidates. Thus, 
the tests not only failed to support an inference of 
superior job qualification from higher scores, but also 
simultaneously introduced a likely source of bias 
against minority candidates. In a predominantly 
minority city such as New Haven, bias against 
minority promotion exacerbates the public-safety 
risks of flawed tests by undermining the perception 
of fairness and cohesiveness among firefighters and 
by impairing the overall public effectiveness of the 
department. Thus, although the City had already 
approved and administered the test, the City appro­
priately concluded that costs to the NHFD and the 
local community would outweigh any potential bene­
fit gained from certifying the tests.

The City could (and properly should) have adopted 
an alternative method of promotional selection to 
reduce the tests’ adverse impact. At the very least, 
the City should have used scientifically substanti­
ated weighting for the test components, which would 
likely have led to a reduced emphasis on the written 
component. Also, it could have discarded rank­
ordering in favor of a “banding” approach, which 
treats candidates as equally qualified if their scores 
lie within a certain range reflecting the test’s error of 
measurement. Rather than focus on the tests them­
selves, banding focuses on how test scores are used 
to make hiring decisions. Banding has been demon­
strated, in some circumstances, to produce modest 
reductions in adverse impact without compromising 
the validity of the testing procedures in question. 
Moreover, the City could have adopted other options 
such as an “assessment center” that included behav­




ioral simulations of critical job components as part 
of the exams. Over the last 30 years, I/O psychology 
research has robustly confirmed that a properly vali­
dated assessment center can substantially reduce the 
adverse impact against minority candidates in the 
context of jobs such as firefighting.
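
[Illustrative note: the banding approach described above rests on standard psychometric formulas. The following minimal sketch, in Python, uses hypothetical scores and an assumed reliability coefficient (none of these figures come from the record) to show how a band based on the standard error of the difference between scores is commonly computed in the I/O literature.]

    import math

    # Hypothetical inputs (illustrative only; not from the record):
    scores = [92.4, 91.8, 90.9, 88.2, 87.5, 84.1]  # composite exam scores
    sd = 6.0            # standard deviation of the scores (assumed)
    reliability = 0.80  # reliability coefficient of the test (assumed)

    # Standard error of measurement (SEM), and standard error of the
    # difference between two scores (SED), per standard formulas.
    sem = sd * math.sqrt(1 - reliability)
    sed = sem * math.sqrt(2)

    # A common banding rule treats scores within 1.96 * SED of the
    # top score as statistically indistinguishable from the top score.
    band_width = 1.96 * sed
    top = max(scores)
    band = [s for s in scores if top - s <= band_width]

    print(f"SEM = {sem:.2f}; band width = {band_width:.2f}")
    print("Candidates treated as equally qualified:", band)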

In sum, given the flaws in the NHFD exams, 
which exacerbated the adverse impact on minority 
candidates, and given the availability of proven al­
ternative selection methods, the City had reasonable, 
race-neutral grounds for deciding against certifying 
the results of the flawed tests. Indeed, under Title 
VII of the Civil Rights Act of 1964, it had no choice 
but to decline to certify the results. Petitioners’ 
attempt to turn a decision compelled by Title VII into 
a violation of Title VII on the basis of mere insinua­
tions about the Board’s supposed racial biases turns 
the statute on its head and should be rejected.

ARGUMENT
I. THE 2003 EXAMINATIONS CONTAINED 

FATAL FLAWS THAT UNDERMINED 
THEIR ABILITY TO SELECT THE MOST 
QUALIFIED CANDIDATES

A critical and oft-repeated premise of petitioners’ 
brief is that the 2003 NHFD examinations were 
“composed and validated” based on the Uniform Guide­
lines on Employee Selection Procedures (“Uniform
Guidelines”), see 29 C.F.R. pt. 1607 (EEOC), and thus
(1) actually “served their purpose of screening out the 
unqualified and identifying the most qualified,” and
(2) would have withstood scrutiny in a disparate- 
impact lawsuit brought by minority officer candi­
dates. Pet. Br. 7, 35; see also id. at i (assuming in 
the question presented that the exams were “content- 
valid”). However, petitioners’ premise lacks scientific




foundation. Under applicable I/O psychology princi­
ples and legal standards, there was no reasonable 
likelihood that the City could have demonstrated that 
the NHFD promotional examinations were valid.

A. Proper Validation Of Employment Tests 
According To Established Standards Is 
Essential To Ensuring Fair, Merit-Based 
Selection

A central objective of the field of I/O psychology is 
to develop generally accepted professional standards 
for the design and use of personnel-selection proce­
dures, including employment tests, based on scien­
tific data and analysis. The federal government 
has issued the Uniform Guidelines, which establish 
a federal standard for employment testing, see 29 
C.F.R. § 1607.1(A), and “are intended to be consistent 
with generally accepted professional standards for 
evaluating standardized tests and other selection 
procedures,” including the American Psychological 
Association’s Standards for Educational and Psycho­
logical Tests (“APA Standards”), id. § 1607.5(C); 
see Griggs v. Duke Power Co., 401 U.S. 424, 433- 
34 (1971) (holding that the Uniform Guidelines are 
“entitled to great deference”). The Principles are 
designed to be consistent with the APA Standards 
and represent the official policy statement of the 
Society for Industrial and Organizational Psychology 
(“SIOP”), a division of the APA, regarding current I/O 
psychology standards for personnel selection. See 
Principles at ii, 1.

“Validity is the most important consideration in 
developing and evaluating selection procedures.” Id. 
at 4. “Validation” is the process of confirming that 
an employment test is “predictive of or significantly 
correlated with important elements of job perform­




ance.” 29 C.F.R. § 1607.5(B).2 In this case, for the 
reasons set forth below, at least four aspects of the 
NHFD promotional tests were flawed or arbitrary, 
and thus made it all but impossible for the City to 
show that the tests were valid.

The lack of evidence supporting the validity of the 
NHFD tests undermines their value as a selection
tool. Proper validation of an employment test is criti­
cal to merit-based personnel selection because it en­
sures that there is a scientific basis for inferring that 
a higher test score corresponds to superior job skills 
or performance. See Principles at 4 (defining validity 
as “the degree to which accumulated evidence and 
theory support specific interpretations of test scores 
entailed by proposed uses of a test”); 29 C.F.R. 
§ 1607.1(B). Moreover, proper validation promotes 
fairness and equal opportunity by ensuring that 
any disparate impact on subgroups is traceable to 
job requirements rather than contamination or bias 
in testing methodology. See 29 C.F.R. § 1607.3(A)

2 The Principles and the Uniform Guidelines identify three 
types of evidence that support an inference of validity: 
“content,” “criterion,” and “construct.” See Principles at 4-5; 29 
C.F.R. § 1607.14(B)-(D). The three are not mutually exclusive 
categories, but rather refer to evidence supporting the inference 
that a test identifies those who are qualified to do the job. See 
Principles at 4. Content validity supports such an inference by 
showing that the test’s content matches the essential content 
of the job, while criterion validity supports the inference by 
showing that test results successfully predict job performance. 
Construct validity is more abstract and is shown by evidence 
that the test measures the degree to which candidates have 
characteristics, or traits, that have been determined to lead to 
successful job performance. The flaws described in this brief 
undermine the content validity of the NHFD exams, which is 
the only evidence of validity asserted by petitioners and the 
only feasible type of validation in these circumstances.




(providing that a procedure that has an adverse 
impact “will be considered to be discriminatory . . . 
unless the procedure has been validated in accor­
dance with these guidelines”); Principles at 7.

Validation is especially critical in the context 
of promotional exams for important public-safety 
leadership positions, such as fire company officers. 
Ensuring the selection of the most qualified fire 
officers saves lives. Accordingly, all state and local 
governments have a strong, race-neutral interest in 
declining to use promotional tests for fire officers 
that are shown to lack validity. Indeed, a legal re­
gime in which state and local governments are ham­
strung into implementing the results of such tests 
threatens the lives of the citizens they are committed 
to protect.3 In this case, the City acted reasonably 
by declining to certify the NHFD promotional test 
results because the tests were fatally flawed.

B. Contrary To Petitioners’ Premise, The 
2003 NHFD Examinations Could Not 
Have Been Validated Under Established 
Principles

Petitioners’ premise that the NHFD tests were 
properly validated rests on insufficient factual 
evidence, consisting merely of the fact that the test

3 While critical, the validity of a test is only one of several 
issues that may legitimately be taken into account in making a 
decision on whether and how to use a test. The decision to use 
a particular test, or to use a test in a particular way, is made 
within a broader social and organizational context and appro­
priately takes into account possible “side effects” that could be 
costly or detrimental for the organization and the people served 
by the organization. See Herman Aguinis & Erika Harden, Will 
Banding Benefit My Organization? An Application of Multi- 
Attribute Utility Analysis, in Test-Score Banding in Human 
Resource Selection 193, 196-211 (Herman Aguinis ed., 2004).




designer, I/O Solutions (“IOS”), conducted what has 
been described as a job analysis, using “question­
naires, interviews, and ride-along exercises with in­
cumbents to identify the importance and frequency of 
essential job tasks,” and then had only one individ­
ual, a fire battalion chief in Georgia, review the tests. 
Pet. Br. 7; see also id. at 52. Petitioners also claim 
that the test designer provided “oral assurance of 
validity” and that the NHFD’s Chief and Assistant 
Chief “thought the exams were fair and valid.” Id. at 
35.

Contrary to petitioners’ assertions, conducting a 
job analysis -  while in most cases necessary to a 
test’s validity -  is not alone sufficient to demonstrate 
validity. See, e.g., 29 C.F.R. § 1607.14(B). Moreover, 
the Uniform Guidelines specifically reject the use 
of “casual reports of [a test’s] validity,” such as
“testimonial statements and credentials of” IOS, and 
“non-empirical or anecdotal accounts” such as the 
comments of the NHFD’s Chief and Assistant Chief. 
Id. § 1607.9. What the Uniform Guidelines and the 
Principles require is a rigorous analysis of the design 
and proposed use of the exam according to accepted 
principles of I/O psychology.4

Judged against the proper standards, there was no 
reasonable likelihood that the examinations adminis­
tered by the NHFD could have been demonstrated 
to be valid, after the fact, according to generally 
accepted strategies for validation.5

4 The fact that IOS was purportedly prepared to issue a 
validation study does not prove the tests were valid because the 
test-design process that the study would have described was 
fatally flawed.

5 Petitioners’ contention that the City avoided a full-fledged 
analysis of the validity of the examinations because it would




1. The Test Designer Conceded That the 
Exams Did Not Attempt To Measure 
Command Presence, a Critical Job 
Attribute

It is a fundamental precept of personnel selection 
that an employment test should be constructed to 
measure important knowledge, skills, abilities, and 
other personal characteristics (“KSAOs”) needed for 
the job. See 29 C.F.R. §§ 1607.14(B)(3), 1607.14(C)(4). 
The omission from the testing domain of a KSAO 
that is an important job prerequisite -  known in I/O 
psychology as “criterion deficiency” -  vitiates the en­
tire justification for the employment test, which is to 
select individuals accurately based on their capacity 
to perform the job in question. A test that makes 
no attempt to measure one or more critical KSAOs 
cannot be validated under established standards. 
See, e.g., Principles at 23; see also Firefighters Inst.
for Racial Equality v. City of St. Louis, 549 F.2d 506,
512 (8th Cir. 1977) (“FIRE I”) (validity “requires that
an important and distinguishing attribute be tested 
in some manner to find the best qualified appli­
cants”).

As the City’s then-corporate counsel, Thomas Ude, 
recognized, the distinguishing feature of the job of a 
fire officer, as opposed to an entry-level firefighter, 
is responsibility for supervising and leading other 
firefighters in the line of duty. See JA138-39; see 
also Matthew Murtagh, Fire Department Promotional 
Tests 152 (1993) (“[C]ompany officers, lieutenants and

have been required to certify the test results is thus unsupport- 
able. See U.S. Br. 19 (“An employer acts reasonably in not in­
curring [the burdens of a validity study] when it has significant 
questions concerning a test’s job-relatedness or reasonably be­
lieves that better alternatives to the test may exist.”).




captains, are the primary supervisors.”); Anthony 
Kastros, Mastering the Fire Service Assessment 
Center 45 (2006). Leadership in emergency-response 
crises requires expertise in fire-management tech­
niques and sound judgment about life-and-death 
decisions. Moreover, critically, it also requires a 
steady “presence of command” so that the unit will 
follow orders and respond correctly to fire conditions. 
See, e.g., Richard Kolomay & Robert Hoff, Firefighter 
Rescue & Survival 5 (2003). Command presence 
requires an officer on the scene of a fire to act deci­
sively, to communicate orders clearly and thoroughly 
to personnel on the scene, and to maintain a sense of 
confidence and calm even in the midst of intense 
anxiety, confusion, and panic. See id. at 5-13. Com­
mand presence generates respect for the officer among 
subordinates and is thus essential to order and disci­
pline within the unit. See Murtagh at 152.

Simply put, command presence is a hallmark of a 
successful fire officer. See, e.g., Chase Sargent, From
Buddy to Boss: Effective Fire Service Leadership 21 
(2006) (“No individual leader ever forgets the first 
time that their command presence was put to the 
test.”). Virtually all studies of fire management em­
phasize that command presence is vital to the safety 
of firefighters at the scene and to the successful ac­
complishment of the firefighting mission and the 
safety of the public. See, e.g., John F. Coleman, Inci­
dent Management for the Street-Smart Fire Officer 
21-26 (2d ed. 2008); Kastros at 45; Vincent Dunn, 
Command and Control of Fires and Emergencies 1-6 
(1999).

Here, the developer of the NHFD promotional 
exams, IOS, admitted that those exams were not 
designed to measure “command presence.” Pet. App.




738a (Legel Dep.) (“Command presence usually 
doesn’t come up as one of the skills and abilities we 
even try to assess.”).6 A high test score thus could 
not support an inference that the candidate would be 
a good commander in the line of duty; conversely, 
those candidates with strong command attributes 
were never given an opportunity to demonstrate 
them. Given the importance of command presence to 
the job of a fire officer, as the City recognized, this 
failure alone rendered the tests deficient. See JA139 
(testimony of Mr. Ude that the “goal of the test is 
to decide who is going to be a good supervisor ulti­
mately, not who is going to be a good test-taker”).

In FIRE I, the Eighth Circuit recognized, in a simi­
lar situation, that the St. Louis fire captain’s exam 
contained the “fatal flaw” of failing to test for “super­
visory ability.” 549 F.2d at 511. Because “super­
visory ability” was a central requirement to the job of 
a fire captain, that failure precluded validation of the 
tests under the Uniform Guidelines. Id. The admit­
ted failure of IOS to test command presence, a key 
attribute for any supervisory fire officer, would have 
led to the same result in this case.

2. The Weighting of the Multiple-Choice 
and Oral Interview Portions of the 
Exams Was Arbitrary and Could Not 
Be Validated

Even putting aside the failure of the NHFD exams 
to measure a critical aspect of the fire officer’s 
responsibilities, the tests were seriously flawed for a 
second reason, stemming from the imposition of a

6 However, IOS’s representative, Chad Legel, conceded that 
“command presence” could be measured through the use of an 
assessment center. See infra p. 31.




predetermined 60/40 weighting for the written and 
oral interview components of the tests with no evi­
dence that those weights matched the content of the 
jobs. Scientific principles of employment-test valida­
tion require not only that tests reflect the important 
KSAOs of the job, but also that the results from 
those tests be weighted in proportion to their job 
importance. Indeed, even assuming that a test meas­
ured the full range of important KSAOs, which the 
NHFD tests did not, see supra Part I.B.l, a test that 
gives inappropriate weight to some KSAOs over 
others could not have been shown accurately to 
select the candidates who are the most qualified for 
the job.7 Numerous federal courts have likewise held 
that tests that measure relevant job skills without 
appropriate consideration for, and weighting of, their 
relative importance cannot properly be validated.8

7 See, e.g., Principles at 23-25 (explaining that selection pro­
cedures should adequately cover the requisite KSAOs and that 
the sample of KSAOs tested should be rationally related to the 
KSAOs needed to perform the work); Lynn Lyons Morris et al., 
How to Measure Performance and Use Tests 99 (1987) (explain­
ing that a content-valid test should include a representative 
sample of categories of KSAOs and “give emphasis to each 
category according to its importance”); Robert D. Gatewood & 
Hubert S. Feild, Human Resource Selection 178-80 (5th ed. 
2001) (“For our measures to have content validity, . . . their 
content [must] representatively sample[] the content of [the 
relevant performance] domains.”).

8 See, e.g., Isabel v. City of Memphis, No. 01-2533 ML/BRE, 
2003 WL 23849732, at *5 (W.D. Tenn. Feb. 21, 2003) (“[T]he 
test developer must demonstrate that those tests utilized in the 
selection system appropriately weigh the [KSAOs] to the same 
extent they are required on the job.”), a ff’d, 404 F.3d 404 
(6th Cir. 2005); Pina v. City of East Providence, 492 F. Supp. 
1240, 1246 (D.R.I. 1980) (invalidating a ranking system for fire­
fighters that gave equal weight to written and physical compo­
nents of the exam, on the ground that the physical component




The NHFD exams in this case failed that basic 
principle because the predetermined, arbitrary 60/40 
weighting used to calculate the candidates’ combined 
scores was in no way linked to the relative impor­
tance of “work behaviors, activities, and/or worker 
KSAOs,” as required for validation. Principles at 25. 
The 60/40 weighting was determined in advance by 
the City’s collective bargaining agreement with the 
local firefighters’ union. See Pet. App. 606a (Legel 
Dep.). While it is not uncommon for municipal gov­
ernments such as the City to enter into labor or other 
agreements that provide for a specified weighting 
of test components, such provisions undermine the 
validity of the resulting tests unless measures are 
taken by the test designer to account for the preset 
weighting.

Here, IOS concededly made no effort to establish 
that the 60/40 weighting was appropriate for the 
tests it designed. IOS should have used established 
methods to calculate whether, in light of the manda­
tory 60/40 weighting, the test components measured 
the job-relevant KSAOs in proportion to their rela­
tive importance. See, e.g., Murtagh at 161; Gatewood 
& Feild at 178. IOS apparently did not do so; instead, 
it merely assessed whether the test questions were 
related to relevant aspects of the job, with no regard 
to whether the items included on the test proportion­
ally measured the critical aspects of the overall job.

“appear[ed] to have a greater relationship to job performance 
as a firefighter”); Jones v. New York City Human Res. Admin., 
391 F. Supp. 1064, 1079 n.15, 1081 (S.D.N.Y. 1975) (calling it a 
“serious defect” not to examine “how or why the skills in the test 
plan were weighted as they were,” resulting in a test that could 
not be validated; for example, numerous questions related to a 
skill, supervision, that only 60%-65% of employees in the posi­
tion actually performed), aff’d, 528 F.2d 696 (2d Cir. 1976).




See Pet. App. 634a (Legel Dep.). IOS’s failure to take 
that step resulted in tests that, absent sheer luck, 
could not have resulted in adequate validity evidence 
under the Principles or the Uniform Guidelines.
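
[Illustrative note: the proportionality check that IOS omitted can be stated concretely. The Python sketch below uses hypothetical job-analysis importance ratings and an assumed mapping of KSAO domains to test components; every figure is invented for illustration and none comes from the record.]

    # Hypothetical job-analysis importance ratings for KSAO domains
    # (illustrative only; not from the record).
    ksao_importance = {
        "factual knowledge and procedures": 30,
        "oral communication": 25,
        "judgment under pressure": 25,
        "supervision and command presence": 20,
    }

    # Assumed mapping of each domain to the component best suited
    # to measure it.
    measured_by = {
        "factual knowledge and procedures": "written",
        "oral communication": "oral",
        "judgment under pressure": "oral",
        "supervision and command presence": "oral",
    }

    # Derive component weights in proportion to job importance.
    total = sum(ksao_importance.values())
    weights = {"written": 0.0, "oral": 0.0}
    for domain, importance in ksao_importance.items():
        weights[measured_by[domain]] += importance / total

    print(weights)  # here: {'written': 0.3, 'oral': 0.7}
    # A preset 60/40 written/oral split would overweight the written
    # component relative to these (hypothetical) importance ratings.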

Moreover, there is no indication that the 60/40 
weighting at issue in this case, which gave predomi­
nance to the multiple-choice component of the exams, 
was appropriate for the relevant job. It is well- 
recognized by I/O psychologists and firefighters alike 
that written, pencil-and-paper tests, while able to 
measure certain cognitive abilities (e.g., reading and 
memorization) and factual knowledge, do not meas­
ure other skills and abilities critical to being an effec­
tive fire officer as well as alternative methods of test­
ing do. See, e.g., Michael A. Terpak, Assessment Center: 
Strategy and Tactics 1 (2008) (multiple-choice exams 
are “known to be poor at measuring the knowledge 
and abilities of the candidate, most notably that of 
a fire officer”); Int’l Ass’n of Fire Chiefs et al., Fire 
Officer: Principles and Practice 28 (2006) (describing 
the criticism of written tests as producing firefighters 
who are “[b]ook smart, street dumb”); see also David 
L. Bullins, Leading in the Gray Area, Fire Chief 
(Aug. 10, 2006) (“Good leadership is not a matter of 
decisions made in black and white; it is a matter of 
the decisions that must be made in shades of gray.”), 
at http://firechief.com/management/bullins_gray08 
102006/index.html. Although a written component 
often properly comprises part of the overall assess­
ment procedure for fire officers, a weighting of 60% 
is significantly above what would be expected given 
the requirements of the positions. See Phillip E. 
Lowry, A Survey of the Assessment Center Process in 
the Public Sector, 25 Pub. Personnel Mgmt. 307, 309 
(1996) (survey finding that the median weight given




to written portion of test for fire and police depart­
ments was 30%); infra p. 26 (describing weights used 
by neighboring Bridgeport).

The Uniform Guidelines and the federal courts 
have similarly recognized that written tests do not 
correspond well to the skills and abilities actually 
required for the job of a fire officer and are thus poor 
predictors of which candidates will make successful 
fire lieutenants and captains. The EEOC’s interpre­
tive guidance on the Uniform Guidelines9 states that 
“[p]aper-and-pencil tests of . . . ability to function
properly under danger (e.g., firefighters) generally 
are not close enough approximations of work behaviors 
to show content validity.” Questions and Answers 
No. 78, 44 Fed. Reg. at 12,007.

The Eighth and Eleventh Circuits have reached the 
same conclusion. In FIRE II, the Eighth Circuit 
rejected the validity of a multiple-choice test for 
promotion to fire captain, on the ground that “[t]he
captain’s job does not depend on the efficient exercise 
of extensive reading or writing skills, the comprehen­
sion of the peculiar logic of multiple choice questions, 
or excellence in any of the other skills associated 
with outstanding performance on a written multiple 
choice test.” 616 F.2d at 357. “‘Where the content

9 The EEOC’s interpretive “questions and answers” were 
adopted by the four agencies that promulgated the Uniform 
Guidelines in order to “interpret and clarify, but not to modify,” 
those Guidelines. Adoption of Questions and Answers To Clar­
ify and Provide a Common Interpretation of the Uniform Guide­
lines on Employee Selection Procedures, 44 Fed. Reg. 11,996, 
11,996 (1979) (“Questions and Answers”). Like the Uniform 
Guidelines themselves, the agency interpretations have been 
given great deference by the courts. See, e.g., Firefighters Inst, 
for Racial Equality v. City of St. Louis, 616 F.2d 350, 358 n.15 
(8th Cir. 1980) (“FIRE II”).




and context of the selection procedures are unlike 
those of the job, as, for example, in many paper-and- 
pencil job knowledge tests, it is difficult to infer an 
association between levels of performance on the pro­
cedure and on the job.’ ” Id. at 358 (quoting Ques­
tions and Answers No. 62, 44 Fed. Reg. at 12,005). 
Accordingly, “[b]ecause of the dissimilarity between
the work situation and the multiple choice proce­
dure,” the court found that “greater evidence of valid­
ity [wajs required.” Id. at 357.

In Nash v. Consolidated City of Jacksonville, 837 
F.2d 1534 (11th Cir. 1988), vacated and remanded, 
490 U.S. 1103 (1989), opinion reinstated on remand, 
905 F.2d 355 (11th Cir. 1990), the Eleventh Circuit 
likewise rejected the use of a written test to deter­
mine eligibility for promotion to the position of fire 
lieutenant. The court rejected the use of the test 
even though the test questions “never made their 
way into evidence” and even though the expert who 
was challenging the use of the test on behalf of the 
firefighter had never seen the questions. Id. at 1536. 
As the court explained, “[a]n officer’s job in a fire 
department involves ‘complex behaviors, good inter­
personal skills, the ability to make decisions under 
tremendous pressure, and a host of other abilities — 
none of which is easily measured by a written, multi­
ple choice test.’ ” Id. at 1538 (quoting FIRE II, 616 
F.2d at 359).

IOS exacerbated the problem of imbalance in its 
response to another predetermined feature of the 
NHFD exams -  the 70% cutoff score mandated by the 
City’s civil service rules. Like the 60/40 weighting, 
the 70% cutoff score was arbitrary and not scientifi­
cally validated. See Pet. App. 697a-698a (concession 
by Mr. Legel that IOS was unable to validate the




70% cutoff score).10 IOS not only “went ahead and 
used [the] seventy percent,” but also decided to make 
the written component of the test “more difficult” in 
an effort to screen out “a fair amount more number 
[sic] of people . . . than what other tests have done 
in the past.” Id. at 698a-699a (Legel Dep.). Not only 
did this admittedly worsen the adverse impact of the 
tests on minority candidates, see infra Part II.A, but 
it also skewed the focus of the test even more heavily 
in the direction of the limited and more attenuated 
set of knowledge and abilities that are measured by 
a multiple-choice test, by giving that component un­
justifiably greater weight in the composite scores. 
That, in turn, further reduced the likelihood that the 
exams could have been shown to be valid.11

10 Arbitrary cutoff scores alone can undermine a test’s validity.
See, e.g., Isabel v. City of Memphis, 404 F.3d 404, 413 (6th Cir. 
2005) (stating that, "[t]o validate a cutoff score, the inference 
must be drawn that the cutoff score measures minimal qualifi­
cations”); accord Chisholm v. United States Postal Serv., 516 F.
Supp. 810, 832, 838 (W.D.N.C. 1980), aff’d in relevant part, 665
F.2d 482 (4th Cir. 1981). The Uniform Guidelines and the Prin­
ciples clearly require cutoff scores, if they are used, to be based 
on scientifically accepted principles. See 29 C.F.R. § 1607.5(H) 
(“Where cutoff scores are used, they should normally be set so 
as to be reasonable and consistent with normal expectations of 
acceptable proficiency within the work force.”); Principles at 47 
(explaining that “[p]rofessional judgment is necessary in setting
any cutoff score” in light of factors including “the [KSAOs] 
required by the work”); see also Wayne F. Cascio & Herman 
Aguinis, Test Development and Use: New Twists on Old
Questions, 44 Human Res. Mgmt. 219, 227 (2005) (discussing 
appropriate process for calibrating cutoff score to minimum 
proficiency for the job).

11 In fact, IOS acknowledged that even the oral portion of the 
test was designed, at least “to a small degree,” to test factual 
knowledge, thus further skewing the balance of the test. Pet. 
App. 709a (Legel Dep.).




Under established principles in the field of I/O 
psychology and longstanding legal authorities, the 
NHFD exams were deficient because of IOS’s failure 
to substantiate the predetermined 60/40 weighting 
before administering the test and because of the re­
sulting overemphasis given to the written, multiple- 
choice component of the exams, which has been dem­
onstrated to be a relatively poor method for measur­
ing whether a candidate has the KSAOs needed to be 
a fire officer.

3. Flaws in the Exam-Development Proc­
ess Contributed to the Lack of Validity 
Evidence of the NHFD Tests

The process used to develop and finalize the tests 
further undermined the tests’ validity as a method 
for identifying the individuals best suited for promo­
tion. IOS personnel wrote the test questions based 
on the information developed from job analysis ques­
tionnaires given to incumbent New Haven fire offi­
cers and “national texts” on firefighting. C.A. App. 
478 (Legel Dep.). However, IOS personnel were not 
themselves subject-matter experts on the job of a fire 
company officer, nor were the “national texts” they 
used tailored to the NHFD’s specific practices or local 
conditions in New Haven. See id. (“So depending on 
the way that those [New Haven] City employees are 
trained to do their specific job, it may not always jibe 
with the way the textbook says to do it.”); see also 
Pet. App. 520a-521a (“Fire fighting is different on the 
East Coast than it is on [sic] West Coast or in the 
Midwest.”).

Accordingly, as IOS acknowledged, “ [standard 
practice” in the field required that the tests be re­
viewed by “a panel of subject matter experts internal 
to New Haven, for instance, incumbent lieutenants,




captains, battalion chiefs, [assistant] chiefs, and the 
like to actually gain [sicj their opinion about how 
relevant the items were and whether or not they 
were consistent with best practice in New Haven.” 
Id. at 635a (Legel Dep.) (emphasis added). Review 
by multiple persons with specific expertise about the 
NHFD was, as IOS recognized, important to verify 
that the questions accurately reflected important 
KSAOs of the job and, especially, local differences be­
tween NHFD’s practices and procedures and national 
firefighting standards. See Wayne F. Cascio & Her­
man Aguinis, Applied Psychology in Human Resource 
Management 158-59 (6th ed. 2005) (documenting the 
need for subject-matter experts to “confirm[] the 
fairness of sampling and scoring procedures” and 
to evaluate “overlap between the test and the job 
performance domain”); Irwin L. Goldstein et al., 
An Exploration of the Job Analysis-Content Validity 
Process, in Personnel Selection in Organizations 3, 
20-21 (Neil Schmitt & Walter C. Borman eds., 1993); 
Int’l Ass’n of Fire Chiefs et al., Fundamentals of Fire 
Fighter Skills 103, 431, 663 (2004) (emphasizing that
firefighters need to become “intimately familiar” with 
local procedures and local differences affecting fire­
fighting such as architectural styles).

Rather than follow this admittedly standard proce­
dure, IOS hired a single individual, a battalion chief 
in a fire department in Georgia, to review the tests 
for the job-relatedness of their content. See Pet. App. 
635a-636a (Legel Dep.). Unsurprisingly, due to the 
failure to conduct a proper review by multiple subject- 
matter experts on local practice, IOS admitted that 
some of the items on the tests were “irrelevant for 
the City because you’re testing them on a knowledge 
base that while supported by a national textbook,




wouldn’t be supported by their own standard operat­
ing procedures.” C.A. App. 482 (Legel Dep.). For 
example, the lieutenants’ test included a question 
from a New York-based textbook about whether fire 
equipment should be parked uptown, downtown, or 
underground when arriving at a fire. JA48. The 
question was meaningless because New Haven has 
no “uptown” or “downtown.”

By IOS’s admission, and under applicable I/O psy­
chology standards, review of the test items by local 
subject-matter experts was critical to ensuring that 
the test components corresponded to the important 
job KSAOs. The failure to do so further undermined 
the validity of the NHFD exams as indicators of 
which candidates would have made successful NHFD 
fire lieutenants or captains.

4. The NHFD Tests Could Not Have Been 
Validated for Strict Rank-Ordering of 
Candidates

Under accepted standards, not only must an 
exam’s content be properly validated, but the use of 
the scores also must be scientifically justified. As 
the Uniform Guidelines state, “the use of a selection 
procedure on a pass/fail (screening) basis may be in­
sufficient to support the use of the same procedure on 
a ranking basis under these guidelines.” 29 C.F.R. 
§ 1607.5(G). Under the Uniform Guidelines, a strict 
rank-ordering system such as the one imposed by the 
City -  i.e., treating a candidate as “better qualified” 
based on even a slight incremental difference in score 
-  is only appropriate upon a scientific showing “that 
a higher score on a content valid selection procedure 
is likely to result in better job performance.” Id. 
§ 1607.14(C)(9). As the Second Circuit held in Guar­
dians Association of New York City Police Department



v. Civil Service Commission, 630 F.2d 79 (2d Cir. 
1980), “[p]ermissible use of rank-ordering requires a
demonstration of such substantial test validity that 
it is reasonable to expect one- or two-point differ­
ences in scores to reflect differences in job perform­
ance.” Id. at 100-01 (rejecting the validity of rank­
ordering); see also FIRE II, 616 F.2d at 358.

In this case, the NHFD tests could not have 
supported the use of a strict rank-ordering procedure 
for promotional selection. Indeed, the tests were 
designed and administered at a time when New 
Haven’s “Rule of Three” had been interpreted to per­
mit rounding of scores to the nearest integer, rather 
than strict rank-ordering based on differences of 
fractions of a point. See C.A. App. 1701; Kelly v. City 
of New Haven, 881 A.2d 978, 993-94 (Conn. 2005). 
Use of strict rank-ordering for a test absent evidence 
demonstrating that it was valid for that purpose 
cannot be justified. See Pina, 492 F. Supp. at 1246 
(invalidating test where “[t]here [wa]s no evidence
which even remotely suggest[ed] that the order of
ranking establishe[d] that any applicant [wa]s better
qualified than any other”).
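
[Illustrative note: the difference between strict rank-ordering and the rounding practice just described can be shown in a few lines of Python. The scores are hypothetical and chosen only to illustrate how rounding collapses fractional differences that strict rank-ordering would treat as meaningful.]

    # Hypothetical composite scores (illustrative only).
    scores = {"D": 88.6, "E": 88.4, "F": 87.9}

    # Strict rank-ordering treats every fractional difference as real.
    strict_order = sorted(scores, key=scores.get, reverse=True)
    print("Strict rank order:", strict_order)  # ['D', 'E', 'F']

    # Rounding to the nearest integer, as New Haven's "Rule of Three"
    # had been interpreted to permit, leaves E and F tied here.
    rounded = {name: round(score) for name, score in scores.items()}
    print("Rounded scores:", rounded)  # {'D': 89, 'E': 88, 'F': 88}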

Moreover, as explained above, the serious flaws 
in the NHFD tests severely undermined the overall 
validity of the exams and certainly foreclosed any 
conclusion that the exams were of such “substantial 
. . . validity” as to justify the additional step of 
making promotional decisions strictly based on small 
score differences. Guardians Ass’n, 630 F.2d at 100- 
01. Making fine judgments based on small differ­
ences on fundamentally flawed tests is scientifically 
unsupportable. See, e.g., Aguinis & Harden at 193. 
“[U]se of an exam to rank applicants, when the 
exam cannot predict applicants’ relative merits, offers




nothing but a false sense of assurance based on a 
misplaced belief that some criterion -  no matter how 
arbitrary -  is better than none.” Ensley Branch, 
NAACP v. Seibels, 31 F.3d 1548, 1574 (11th Cir. 1994).

Tests that rank candidates on differences that are as
likely to be a product of measurement error or flawed
test design as they are a reflection of superior qualifica­
tions create nothing but the illusion of meritocracy.
That illusion creates not only a false sense of indi­
vidual entitlement to jobs and promotions, but also a 
real public danger in the context of positions such as 
fire and police officers. When the safety and lives of 
citizens are at stake, it is particularly critical for 
public employers to have the leeway to ensure that 
the tests they deploy accurately identify those candi­
dates who are most qualified for these important 
jobs.
II. THE FLAWS IN THE NHFD PROMO­

TIONAL EXAMS EXACERBATED THEIR 
ADVERSE IMPACT ON MINORITY CAN­
DIDATES

Unjustified exclusion of minority candidates 
through scientifically flawed testing procedures has 
significant social costs. Especially in a city like New 
Haven, racial diversity has significant benefits to the 
ability of the public sector to provide needed services 
to the community and to protect the public safety. 
See, e.g., Wayne F. Cascio et al., Social and Technical 
Issues in Staffing Decisions, in Test Score Banding
in Human Resource Selection 7, 9 (Herman Aguinis 
ed., 2004). An all-white officer corps in the NHFD 
will be less effective than one that is more racially 
diverse. See id.; see also Sargent at 188 (noting that 
having a Hispanic firefighter fluent in Spanish “can 
be a life saver”).




In this case, the flaws in the NHFD promotional 
exams not only undermined their validity, but also 
unjustifiably increased their adverse impact on mi­
nority candidates. In particular, two features of the 
tests contributed to the conceded adverse impact on 
African-American and Hispanic examinees. Tests 
that eliminated these features were available to the 
City as “less discriminatory alternatives” under Title 
VII.
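
[Illustrative note: adverse impact under the Uniform Guidelines is conventionally screened with the “four-fifths” rule of thumb, 29 C.F.R. § 1607.4(D). The sketch below, in Python, shows the computation using hypothetical pass counts, not the actual NHFD figures.]

    # Four-fifths rule screen for adverse impact, 29 C.F.R. § 1607.4(D).
    # The pass counts below are hypothetical, not the NHFD results.
    def impact_ratio(minority_passed, minority_total,
                     majority_passed, majority_total):
        minority_rate = minority_passed / minority_total
        majority_rate = majority_passed / majority_total
        return minority_rate / majority_rate

    ratio = impact_ratio(minority_passed=6, minority_total=27,
                         majority_passed=25, majority_total=50)
    print(f"Impact ratio: {ratio:.2f}")  # 0.44
    if ratio < 0.8:
        print("Below four-fifths: evidence of adverse impact.")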

A. Overweighting Of The Written, Multiple- 
Choice Portion Of The Exams Increased 
The Adverse Impact On Minority Candi­
dates

It is well-established that minority candidates 
fare less well than their Caucasian counterparts on 
standardized written examinations, and especially 
multiple-choice (as opposed to “write-in”) tests. See, 
e.g., Winfred Arthur Jr. et al., Multiple-Choice and 
Constructed Response Tests of Ability, 55 Personnel 
Psychol. 985, 986 (2002); Philip L. Roth et al., Ethnic 
Group Differences in Cognitive Ability in Employ­
ment and Educational Settings: A Meta-Analysis, 54 
Personnel Psychol. 297 (2001). Although the causes 
for that widely recognized discrepancy are not fully 
understood, certain features of the multiple-choice 
format have been recognized to contribute to adverse 
impact.

First, “[t]o the extent that [the exam’s] reading 
demands are not concomitant with job demands 
and/or performance, then any variance associated 
with reading demands and comprehension is consid­
ered to be error variance.” Arthur et al., 55 Person­
nel Psychol, at 991. Some studies suggest disparities 
among racial subgroups in reading comprehension, 
such that using written questions and answers as




the sole or predominant medium for testing increases 
adverse impact. See id.; James L. Outtz & Daniel 
A. Newman, A Theory of Adverse Impact 12-13, 
68 (manuscript on file with author) (forthcoming 
in Adverse Impact: Implications for Organizational 
Staffing and High Stakes Selection, 2009). Moreover, 
studies suggest that racial minorities are less “test 
wise” than white test-takers, and it is “widely recog­
nized that performance on multiple-choice tests is 
susceptible to specific test-taking strategies or test­
wiseness.” Arthur et al., 55 Personnel Psychol. at 
991-92. Finally, studies have found that a test- 
taker’s unfavorable view of a test’s validity nega­
tively influences performance, and some evidence 
indicates that minority test-takers generally have a 
less favorable view of traditional written tests. See 
id. at 992.

Regardless of the exact cause of the disparity, it 
is clear that the use of written, multiple-choice 
tests beyond what is justified by the demands of 
a particular job has the effect of disproportionately 
excluding minority candidates without any correspond­
ing increase in job performance. See, e.g., Outtz & 
Newman at 33. As set forth above, the NHFD’s 60/40 
weighting was arbitrary and put more emphasis on 
the written, multiple-choice examination than science 
and experience have shown to be warranted for the 
job of a fire officer. Likewise, the response of IOS 
to the 70% cutoff score contributed to the adverse 
impact of the exams. By IOS’s own admission, arbi­
trarily making the written portion of the tests “more 
difficult” further exaggerated the importance of the 
written component and thereby contributed to the 
exclusion of African-American and Hispanic candi­
dates from the promotional ranks. Pet. App. 698a- 
699a (Legel Dep.).

Changing the weighting of the exams to more accu­
rately reflect the content of the job almost certainly 
would have reduced their adverse impact by reducing 
the weight of the written component, and thus 
constituted a “less discriminatory alternative” that 
the City would have been obligated to use under Title 
VII. Had the City given a 30% weighting to the 
written component of the examination, more in line 
with the nationwide norm, see supra pp. 15-16, the 
tests would have had a significantly lower adverse 
impact on minority candidates. See Resp. Br. 33 
(“[I]f the tests were weighted 70%/30% oral/written, 
then two African-Americans would have been consid­
ered for lieutenant positions and one for a captain 
position.”). Indeed, 20 miles down the coast from 
New Haven, the fire department in Bridgeport, 
Connecticut, has administered tests with less weight 
given to the written component (25% for lieutenants 
and 33% for captains) and achieved a significant 
reduction in adverse impact relative to the NHFD 
exam results. See JA64-66.12

B. Selection Of Candidates In Strict Rank 
Order Also Contributed To Adverse 
Impact

As discussed above, the NHFD tests were improp­
erly weighted toward the written component, which 
tested certain KSAOs (e.g., reading, memorization, 
and factual knowledge) in disproportion to their im­

12 In 2005, the Bridgeport lieutenant’s exam consisted of 
written and oral components, weighted 25% and 75%, respec­
tively; the weights on the captain’s exam were 33% for the 
written component, 62% for the oral component, and 5% for 
seniority.

portance relative to other important skills and abili­
ties, including “command presence,” which was not 
measured at all. Moreover, the tests unjustifiably 
employed a strict rank-ordering system that differen­
tiated among candidates based on small score differ­
ences that had not been scientifically demonstrated 
to be meaningful. The combination of imbalanced 
weighting toward KSAOs that disproportionately 
disfavor minority candidates and the selection of 
candidates based strictly on rank order cemented the 
disproportionate rejection of minority candidates for 
promotion.

An alternative to strict rank-ordering would have 
been a “banding” scoring system. In brief, banding 
involves use of statistical analysis of the amount of 
error in the test scores to create “bands” of scores, 
the lowest of which is considered to be sufficiently 
similar to the highest to warrant equal consideration 
within that band. Cascio et al. at 10; see also Princi­
ples at 48 (bands “take into account the imprecision 
of selection procedure scores and their inferences”). 
After the width of the band is established, based on a 
statistical analysis of the reliability of measurement, 
the user can either establish “fixed” bands, in which 
the test user considers everyone within the top band 
before considering anyone from the next band, or 
“sliding” bands, which allows the band to “slide” 
down the list once higher scorers are either chosen or 
rejected. See Cascio et al. at 10-11.

The federal courts have recognized banding as 
“a universal and normally an unquestioned method 
of simplifying scoring by eliminating meaningless 
gradations” between candidates whose scores differ by 
less than the degree of measurement error. Chicago 
Firefighters Local 2 v. City of Chicago, 249 F.3d 649, 
656 (7th Cir. 2001); see also, e.g., Biondo v. City of 
Chicago, 382 F.3d 680, 684 (7th Cir. 2004) (banding 
“respect[s] the limits of [an] exam’s accuracy”). In 
amici’s view, a banding approach would have been 
a viable method to reduce the adverse impact of the 
NHFD tests. However, given that the rankings 
themselves were a result of flawed tests, banding 
alone would not have been sufficient to achieve the 
objective of selecting the most qualified individuals 
for the job. See Bridgeport Guardians, Inc. v. City of 
Bridgeport, 933 F.2d 1140, 1147 (2d Cir. 1991) (“The 
ranking of the candidates was itself the result of the 
disparate impact of the examination.”).
III. CURRENT I/O PSYCHOLOGY RESEARCH 
SUPPORTS THE USE OF PROMOTIONAL 
ASSESSMENT CENTERS AS A VALID AND 
LESS DISCRIMINATORY ALTERNATIVE 
TO TRADITIONAL TESTING METHODS 

The evidence in the record clearly demonstrates 
that the NHFD exams suffered from fatal design de­
fects that undermined their validity and unjustifia­
bly excluded a disproportionate number of minority 
candidates. That alone left the City no choice but to 
decline to certify the exams. In addition, the City 
reasonably concluded that certification of the tests 
could not be justified given the existence of alterna­
tive methods of selection. One alternative before the 
City was the assessment center, which, if designed 
properly, would measure a broader range of KSAOs 
and also be less discriminatory. See, e.g., JA96 
(statement of Dr. Christopher Hornick to the Board 
that assessment centers are “much more valid in 
terms of identifying the best potential supervisors”); 
Pet. App. 739a (Legel Dep.).

A. History Of The Assessment Center Model

From the 1950s to the 1980s, multiple-choice tests 
were generally the only procedure used for promo­
tional selection in U.S. fire departments. See Int’l 
Ass’n of Fire Chiefs et al., Fire Officer: Principles and 
Practice at 28. Such tests were prevalent because 
they were easy and inexpensive to administer, and 
seemingly “objective.” However, for the reasons dis­
cussed above, such tests had the side-effect of exclud­
ing a disproportionate number of minority candidates 
from consideration. Beginning in the 1970s, spurred 
in part by the passage of Title VII (1964), the devel­
opment of the Uniform Guidelines (1978), and this 
Court’s decision in Griggs (1971), employers increas­
ingly began using an alternative selection method 
known as the assessment center. See James R. Huck 
& Douglas W. Bray, Management Assessment Center 
Evaluations and Subsequent Job Performance of 
White and Black Females, 29 Personnel Psychol. 13, 
13-14 (1976).

An assessment center is a form of standardized 
evaluation that seeks to test multiple dimensions of 
job qualification through observation of job-related 
exercises and other assessment techniques. See gen­
erally Task Force on Assessment Center Guidelines, 
Guidelines and Ethical Considerations for Assess­
ment Center Operations, 18 Pub. Personnel Mgmt. 
457, 460-64 (1989) (defining an assessment center). 
Unlike multiple-choice exams, which evaluate KSAOs 
through a single, written medium, assessment cen­
ters employ multiple methods, including, prominently, 
job simulations, all of which are designed to permit 
more direct assessment of ability to do the job. See 
id. at 461-62. Candidates’ performance on the simu­
lation exercises is rated by multiple subject-matter 
experts. See id. at 462. By observing how a par­
ticipant handles the problems and challenges of the 
target job (as simulated in the exercises), assessors 
develop a valid picture of how that person would 
perform in the target position. See Charles D. Hale, 
The Assessment Center Handbook for Police and Fire 
Personnel 16-52 (2d ed. 2004) (describing typical 
exercises).
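
In schematic terms, each assessor independently rates each exercise on the job dimensions it is designed to elicit, and the ratings are then pooled by dimension across exercises and assessors. The sketch below (in Python; the exercise names, dimensions, and ratings are hypothetical) illustrates that aggregation step.

# Minimal sketch (hypothetical structure): pooling multiple assessors'
# ratings across assessment-center exercises into per-dimension scores.

from statistics import mean

# ratings[exercise][dimension] = independent assessor ratings on a 1-5 scale
ratings = {
    "emergency_simulation": {"command_presence": [4, 5, 4], "problem_solving": [4, 4, 3]},
    "subordinate_counseling": {"command_presence": [3, 4, 4], "oral_communication": [5, 4, 4]},
    "written_incident_report": {"problem_solving": [4, 3, 4], "oral_communication": [3, 3, 4]},
}

def dimension_scores(ratings):
    """Pool every assessor rating of a dimension across all exercises."""
    pooled = {}
    for exercise in ratings.values():
        for dim, rs in exercise.items():
            pooled.setdefault(dim, []).extend(rs)
    return {dim: round(mean(rs), 2) for dim, rs in pooled.items()}

print(dimension_scores(ratings))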

B. Assessment Centers Have Demonstrated 
Validity In The Context Of Firefighter 
Promotion

Since the 1970s, the use of assessment centers for 
employee selection has increased rapidly, both in the 
United States and elsewhere, and in firefighter pro­
motion in particular. By 1986, 44% of fire depart­
ments surveyed used assessment centers in making 
promotion decisions.13 More recent surveys indicate 
a usage rate of between 60% and 70%.14

Like any testing method, an assessment center 
must be properly constructed so that, for example, 
it measures important KSAOs of the relevant job. 
After more than 30 years of use and research, how­
ever, substantial agreement exists among I/O psy­
chologists that properly designed assessment centers 
are better predictors of job performance than other

13 See Samuel J. Yeager, Use of Assessment Centers by Met­
ropolitan Fire Departments in North America, 15 Pub. Person­
nel Mgmt. 51, 52-53 (1986); accord Miguel Roig & Maryellen 
Reardon, A Performance Standard for Promotions, 141 Fire 
Engineering 49, 49 (1988).

14 See Lowry, 25 Pub. Personnel Mgmt. at 310; Carl F. 
Weaver, Can Assessment Centers Eliminate Challenges to the 
Promotional Process? 13 (July 2000) (unpublished monograph), 
at http://www.usfa.dhs.gov/pdf/efop/efo24862.pdf.

forms of promotional testing.15 Today, because of 
numerous studies supporting the conclusion, “the 
predictive validity of [assessment centers] is now 
largely assumed.” Walter C. Borman et al., Person­
nel Selection, 48 Ann. Rev. Psychol. 299, 313 (1997). 
Properly designed assessment centers have incremental 
predictive validity over cognitive tests “because occu­
pational success is not only a function of a person’s 
cognitive abilities, but also the manifestation of those 
abilities in concrete observable behavior.” Diana E. 
Krause et al., Incremental Validity of Assessment 
Center Ratings Over Cognitive Ability Tests: A Study 
at the Executive Management Level, 14 Int’l J. Selec­
tion & Assessment 360, 362 (2006).
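
In statistical terms, incremental validity is the gain in explained performance variance (delta R-squared) when assessment-center ratings are added to a regression model that already contains the cognitive-test score. A small simulated demonstration (in Python; the data are synthetic, not drawn from any cited study):

import numpy as np

# Synthetic-data sketch of "incremental validity": how much
# predicted-performance variance an assessment-center rating adds
# over a cognitive test alone (delta R^2 in hierarchical regression).

rng = np.random.default_rng(0)
n = 200
cognitive = rng.normal(size=n)
ac_rating = 0.4 * cognitive + rng.normal(size=n)    # correlated, not redundant
performance = 0.5 * cognitive + 0.4 * ac_rating + rng.normal(size=n)

def r_squared(X, y):
    X = np.column_stack([np.ones(len(y)), X])       # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_cog = r_squared(cognitive[:, None], performance)
r2_both = r_squared(np.column_stack([cognitive, ac_rating]), performance)
print(f"R^2 cognitive only: {r2_cog:.3f}")
print(f"R^2 + AC rating:    {r2_both:.3f}  (delta R^2 = {r2_both - r2_cog:.3f})")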

As reflected by their widespread usage by fire 
departments across the country, assessment centers 
are especially appropriate in the context of firefighter 
promotion. Because they use multiple methods of 
assessment, assessment centers are able to measure 
a wider range of skills, including critical skills such 
as leadership capacity, problem-solving, and “com­
mand presence.” IOS’s representative, Chad Legel, 
admitted that the NHFD tests failed to test for 
“command presence,” and he further acknowledged 
that the City “would probably be better off with an 
assessment center if you cared to measure that.” 
Pet. App. 738a (Legel Dep.); see also Krause et al., 14 
Int’l J. Selection & Assessment at 362 (agreeing that 
leadership ability is likely better assessed through an

15 See, e.g., Chaitra M. Hardison & Paul R. Sackett, Assess­
ment Center Criterion Related Validity: A Meta-Analytic Update 
14-20 (2004) (unpublished manuscript); Winfred Arthur Jr. et 
al., A Meta-Analysis of the Criterion-Related Validity of Assess­
ment Center Dimensions, 56 Personnel Psychol. 125, 145-46 
(2003); Barbara B. Gaugler et al., Meta-Analysis of Assessment 
Center Validity, 72 J. Applied Psychol. 493, 503 (1987).

assessment center than an oral interview); Gaugler 
et al., 72 J. Applied Psychol. at 493 (“assessment 
centers are most frequently used for assessing 
managers”).

In short, the “state of the art” in the field of promo­
tional testing for firefighters and the “state of the 
science” in I/O psychology have evolved beyond the 
outdated methods of testing used by the NHFD. 
Instead, as the City was told by Dr. Hornick, see 
JA96, there is now substantial agreement that a pro­
fessionally validated assessment center represents a 
more effective method of selecting the most qualified 
fire officers.

C. Assessment Centers Have Been Proven To 
Reduce Adverse Impact On Minorities

It is equally well-recognized in the research litera­
ture that assessment centers reduce adverse impact 
on racial minorities as compared to traditional stan­
dardized tests. See, e.g., George C. Thornton & 
Deborah E. Rupp, Assessment Centers in Human Re­
source Management 231 (2006). “Additional research 
has demonstrated that adverse impact is less of a 
problem in an assessment center as compared to an 
aptitude test designed to assess cognitive abilities 
that are important for the successful performance of 
work behaviors in professional occupations.” Cascio 
& Aguinis, Applied Psychology in Human Resource 
Management at 372-73.

Those scientific studies also have been borne out by 
experience. An analysis of fire-personnel selection in 
St. Louis in the 15 years after the FIRE II decision 
found that the institution of an assessment center 
selection method “achieved considerable success at 
minimizing adverse impact against black candi­
dates.” Gary M. Gebhart et al., Fire Service Testing 
in a Litigious Environment: A Case History, 27 Pub. 
Personnel Mgmt. 447, 453 (1998).

In sum, assessment centers are now a prevalent 
feature of firefighter promotional tests across the 
nation. Numerous resources exist for employers 
wishing to incorporate assessment centers into their 
selection procedures in accordance with accepted 
scientific principles.16 The availability of the assess­
ment center as an equally valid, less discriminatory 
alternative provides yet another justification for the 
City’s decision not to certify the results of the NHFD 
promotional exams. Indeed, under Title VII, it com­
pelled that decision.

*  *  *  *  *

To place this case in overall perspective, petition­
ers’ lawsuit seeks to compel the City to certify the 
results of tests that suffered from glaring flaws 
undermining their validity, had an admitted adverse 
impact on racial minorities, and could have been 
replaced by readily available, equally or more valid, 
and less discriminatory alternatives. From the 
standpoint of accepted I/O psychology principles, there 
is no justification for certifying the results of such 
tests because there is no evidence they selected the 
most qualified candidates, and they systematically 
excluded minority candidates. Under established 
legal principles, moreover, certification would have 
resulted in a violation of Title VII, and the City’s 
decision was thus compelled by law. Petitioners’ 
challenge to the City’s decision must therefore fail.

16 See generally, e.g., Ian Taylor, A Practical Guide to 
Assessment Centres and Selection Methods (2007); Hale, supra.

CONCLUSION

The judgment of the court of appeals should be 
affirmed.

March 25, 2009

Respectfully submitted,

David C. Frederick 
  Counsel of Record 
Derek T. Ho 
Barrett C. Hester 
Jennifer L. Peresie 
Kellogg, Huber, Hansen, 
  Todd, Evans & Figel, P.L.L.C. 
1615 M Street, N.W., Suite 400 
Washington, D.C. 20036 
(202) 326-7900

Counsel for Industrial- 
Organizational Psychologists
