Watson v. Fort Worth Bank and Trust Brief Amicus Curiae in Support of Petitioner

Public Court Documents
September 14, 1987


Brief submitted by the American Psychological Association


    No. 86-6139

In The
Supreme Court of the United States
October Term, 1987

Clara Watson,
Petitioner,

v.

Fort Worth Bank & Trust,
Respondent.

On Writ of Certiorari to the United States 
Court of Appeals for the Fifth Circuit

BRIEF FOR AMICUS CURIAE 
AMERICAN PSYCHOLOGICAL ASSOCIATION 

IN SUPPORT OF PETITIONER

Donald N. Bersoff 
(Counsel of Record)

Laurel Pyke Malson 
Donald B. Verrilli, Jr.
Ennis Friedman & Bersoff
1200 - 17th Street, N.W., Suite 400 
Washington, D.C. 20036 
(202) 775-8100 
Attorneys for Amicus Curiae

American Psychological Association
September 14, 1987

Wilson-Epes Printing Co., Inc. - 789-0096 - Washington, D.C. 20001



TABLE OF CONTENTS

TABLE OF AUTHORITIES

INTEREST OF AMICUS CURIAE

INTRODUCTION AND SUMMARY OF ARGUMENT

ARGUMENT

I. BECAUSE SUBJECTIVE ASSESSMENT DEVICES CAN, AND SHOULD, BE SCIENTIFICALLY VALIDATED, THE USE OF SUBJECTIVE SELECTION CRITERIA AND PROCEDURES BY EMPLOYERS SHOULD NOT PRECLUDE REVIEW UNDER ANY TITLE VII THEORY

  A. Professional Standards Concerning the Technical Adequacy of Selection Devices are Applicable to the Subjective Methods Used by Respondent

  B. There are Generally Accepted Strategies for Establishing the Validity of Subjective Methods of Employee Selection

  C. To Reduce Sources of Bias the Validity of Each of the Selection Devices Used by Respondent Must and Can Be Established by Generally Accepted and Accessible Validation Strategies

    1. The interview
    2. Rating scales and other performance appraisals
    3. Experience requirements

II. THE SUBJECTIVE SELECTION PROCEDURES USED BY RESPONDENT FAIL TO MEET GENERALLY ACCEPTED STANDARDS AND APPEAR TO HAVE BEEN APPLIED WITHOUT ANY EVIDENCE THAT THEY ARE VALID FOR THE INFERENCES DRAWN FROM THEM

III. THE FAILURE TO REQUIRE THAT SUBJECTIVE SELECTION DEVICES HAVE DEMONSTRABLE VALIDITY WOULD UNDERMINE THE PURPOSES OF TITLE VII

CONCLUSION

TABLE OF AUTHORITIES

CASES:

Albemarle Paper Co. v. Moody, 422 U.S. 405 (1975)
Brito v. Zia, 478 F.2d 1200 (10th Cir. 1973)
Debra P. v. Turlington, 644 F.2d 397 (5th Cir. 1981)
Douglas v. Hampton, 512 F.2d 976 (D.C. Cir. 1975)
Griggs v. Duke Power Co., 401 U.S. 424 (1971)
Harless v. Duck, 14 FEP Cases 1616 (N.D. Ohio 1977)
McDonnell Douglas Corp. v. Green, 411 U.S. 792 (1973)
Texas Dep’t of Comm. Affairs v. Burdine, 450 U.S. 248 (1981)
Washington v. Davis, 426 U.S. 229 (1976)
Watson v. Fort Worth Bank & Trust, 798 F.2d 791 (5th Cir. 1986)

STATUTES & REGULATIONS:

42 U.S.C. § 2000e et seq.
29 C.F.R. § 1607 et seq.
H.R. Rep. No. 914, 88th Cong., 2d Sess., reprinted in 1964 U.S. Code Cong. & Ad. News 2391

MISCELLANEOUS:

AERA, APA, NCME, Standards for Educational and Psychological Testing (1985)
AERA, APA, NCME, Standards for Educational and Psychological Tests (1974)
A. Anastasi, Psychological Testing (5th ed. 1982)
APA, Standards for Educational and Psychological Tests and Manuals (1966)
APA, Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954)
Arvey, Unfair Discrimination in the Employment Interview: Legal and Psychological Aspects, 86 Psychological Bull. 736 (1979)
Arvey & Campion, The Employment Interview: A Summary and Review of Recent Research, 35 Personnel Psychology 281 (1982)
Bernardin & Pence, Effects of Rater Training, 65 J. Applied Psychology 60 (1980)
Bersoff, Testing and the Law, 36 Am. Psychologist 1047 (1981)
W. Bingham, B. Moore & J. Gustad, How to Interview (4th ed. 1959)
Borman, Format and Training Effects on Rating Accuracy and Rating Errors, 64 J. Applied Psychology 410 (1979)
Brush & Owens, Implementation and Evaluation of an Assessment Classification Model for Manpower Utilization, 32 Personnel Psychology 369 (1979)
Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211 (1981)
L. Cronbach, Essentials of Psychological Testing (4th ed. 1984)
Distefano, Pryer & Erffmeyer, Application of Content Validity Methods to the Development of a Job-Related Performance Rating Criterion, 36 Personnel Psychology 621 (1983)
G. Dreher & P. Sackett, Perspectives on Staffing and Selection (1983)
Dunnette & Borman, Personnel Selection and Classification Systems, 30 Ann. Rev. Psychology 477 (1979)
R. Fear, The Evaluation Interview (2d ed. 1973)
Feild & Holley, The Relationship of Performance Appraisal System Characteristics to Verdicts in Selected Employment Discrimination Cases, 25 Acad. Mgmt. J. 392 (1982)
Friedman & Williams, Current Use of Tests for Employment, in 2 Ability Testing (A. Wigdor & W. Garner eds. 1982)
S. Gael, Job Analysis: A Guide to Assessing Work Activities (1983)
I. Goldstein, Training in Organizations (2d ed. 1986)
Grant & Bray, Contributions of the Interview to Assessment of Management Personnel, 53 J. Applied Psychology 24 (1969)
Guion, On Trinitarian Doctrines of Validity, 11 Prof. Psychology 385 (1980)
Hakel, Employment Interviewing, in Personnel Management (K. Rowland & G. Ferris eds. 1982)
H. Henneman, D. Schwab, J. Fossum & L. Dyer, Personnel/Human Resource Management (1980)
Ivancevich, Longitudinal Study of the Effects of Rater Training on Psychometric Errors in Ratings, 64 J. Applied Psychology 502 (1979)
Kleiman & Durham, Performance Appraisal, Promotion and the Courts: A Critical Review, 34 Personnel Psychology 103 (1981)
Korman, The Prediction of Managerial Performance: A Review, 21 Personnel Psychology 295 (1968)
Kraiger & Ford, A Meta-analysis of Ratee Race Effects in Performance Ratings, 69 J. Applied Psychology 56 (1985)
Landy & Farr, Performance Rating, 87 Psychological Bull. 72 (1980)
A. Larson & L. Larson, Employment Discrimination § 15-87 (1986)
Latham, Saari, Pursell & Campion, The Situational Interview, 65 J. Applied Psychology 422 (1980)
Latham, Wexley & Pursell, Training Managers to Minimize Rating Errors in the Observation of Behavior, 60 J. Applied Psychology 550 (1975)
E. Levine, Everything You Ever Wanted to Know about Job Analysis (1983)
Locher & Teel, Performance Appraisal—A Survey of Current Practices, 56 Personnel J. 245 (1977)
J. Matarazzo & A. Wiens, The Interview: Research on Its Anatomy and Structure (1972)
E. McCormick, Job Analysis: Methods and Applications (1979)
Messick, Test Validity and the Ethics of Assessment, 35 Am. Psychologist 1012 (1980)
Owens, Background Data, in Handbook of Industrial and Organizational Psychology (M. Dunnette ed. 1976)
Owens & Schoenfeldt, Toward a Classification of Persons, 46 J. Applied Psychology 329 (1979)
Pace & Schoenfeldt, Legal Concerns in the Use of Weighted Applications, 30 Personnel Psychology 159 (1977)
Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982)
Rice, Spotlight on Employee Performance, 9 US Air 53 (August 1987)
Schmidt & Johnson, Effect of Race on Peer Ratings in an Industrial Setting, 57 J. Applied Psychology 237 (1973)
Schmitt, Social and Situational Determinants of Interview Decisions: Implications for the Employment Interview, 29 Personnel Psychology 79 (1976)
B. Schneider & N. Schmitt, Staffing Organizations (2d ed. 1986)
Schoenfeldt, Utilization of Manpower: Development and Evaluation of an Assessment-Classification Model for Matching Individuals with Jobs, 59 J. Applied Psychology 583 (1974)
Society for Industrial and Organizational Psychology, Principles for the Validation and Use of Personnel Selection Procedures (1987)
Tenopyr, Content-Construct Confusion, 30 Personnel Psychology 47 (1977)
Thompson & Thompson, Court Standards for Job Analysis in Test Validation, 35 Personnel Psychology 865 (1982)
Ulrich & Trumbo, The Selection Interview Since 1949, 63 Psychological Bull. 100 (1965)
Waintroob, The Developing Law of Equal Employment Opportunity at the White Collar and Professional Level, 21 Wm. & Mary L. Rev. 45 (1979)



BRIEF FOR AMICUS CURIAE 
AMERICAN PSYCHOLOGICAL ASSOCIATION 

IN SUPPORT OF PETITIONER

INTEREST OF AMICUS CURIAE
The American Psychological Association (“APA”) is
a nonprofit, scientific, and professional organization with
more than 65,000 members. It has been the major asso­
ciation of psychologists since 1892, and includes the vast 
majority of psychologists holding doctoral degrees from 
accredited universities in this country. Among APA’s 
major functions are the promotion of psychological re­
search, the dissemination of information regarding hu­
man psychological behavior, the promulgation of stand­
ards governing scientific and professional practice, in­
cluding assessment, and, as reflected in its Bylaws, the 
“advance[ment] of psychology as a science and profes­
sion.” A substantial number of APA’s members are con­
cerned with the development and validation of assessment 
devices for personnel selection in the employment context, 
including the more than 2500 members who belong to the 
APA’s Division of Industrial and Organizational Psy­
chology and the 1500 members who belong to its Division 
of Evaluation and Measurement.

The APA has participated as amicus in many cases in 
this Court involving social science issues, including Ken­
tucky v. Stincer, 107 S. Ct. 2658 (1987) (effects on child 
victims of sex abuse of testifying in the presence of their 
alleged abusers); Colorado v. Connelly, 107 S. Ct. 515 
(1986) (behavioral effects of command hallucinations); 
and Lockhart v. McCree, 106 S. Ct. 1758 (1986) (“con-
viction-proneness” of “death-qualified” juries). APA con­
tributes amicus briefs only where it has special knowl­
edge to share with the Court. APA regards this as one 
of those cases. In this instance, APA wishes to inform 
this Court of the state of current scientific thought re­
garding validation of personnel assessment devices, in­
cluding subjective selection criteria and procedures such 
as those used by respondent in this case.



Petitioner and respondent have consented to the filing 

of this amicus brief. Their letters of consent are on file 
with the Clerk of the Court.

INTRODUCTION AND SUMMARY OF ARGUMENT
APA addresses in this brief an essential and inherent
issue in this case. Amicus will leave it to the parties to
argue whether disparate impact analysis under Title VII 
of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq.,
is properly applied to review the legality of subjective 
assessment devices for the hiring and promotion of em­
ployees. However, insofar as a negative answer is 
grounded in the assumption that subjective assessment 
devices are not amenable to psychometric scrutiny in the 
same way that ability tests are, such an assumption is 
contrary to fundamental and generally accepted scientific 
principles of measurement. The most frequently articu­
lated reason for limiting disparate impact analysis to ob­
jective criteria and procedures—that only objective cri­
teria and procedures yield sufficient statistical data to 
permit scientific validation—is not supported by the 
relevant social science literature. Indeed, the view that 
only objective selection criteria and procedures can be 
clearly identified, applied equally to all applicants, and 
statistically evaluated has been discredited by the ex­
tensive work of industrial psychologists and other assess­
ment specialists. Subjective selection devices can be scien­
tifically validated for the assessment of individuals for 
hiring, promotion, or other selection decisions in the em­
ployment context. The choice of analyses under Title 
VII, therefore, should not turn on whether the challenged 
employment practices are based on objective or subjec­
tive evaluations of applicants.

The APA’s Standards for Educational and Psycho­
logical Testing (1985) [hereinafter Standards] pro­
vide a framework for the evaluation and validation of 
testing and other assessment devices, including such sub­
jective devices as interviews, behavioral observations, and 
rating scales. The Standards are consistent with the 
Principles for the Validation and Use of Personnel




Selection Procedures (1987) [hereinafter Principles]
published by the Society for Industrial and Organiza­
tional Psychology.1 Furthermore, the Uniform Guidelines
on Employee Selection Procedures, 29 C.F.R. § 1607.1
et seq. [hereinafter Uniform Guidelines] were explicitly
intended to be consistent with the Standards. Id. at
§ 1607.5(C). Such technical standards were clearly con­
templated by the drafters of Title VII when they referred 
to the use of “professionally developed” assessment de­
vices by employers. 42 U.S.C. § 2000e-2(h). When ex-
amined in light of the Standards, Principles, and Uni­
form Guidelines, it is clear that the procedures respond­
ent used in this case to evaluate petitioner for promotion 
were not shown to be scientifically valid, i.e., appropriate, 
meaningful, or useful for the inferences drawn from 
them. See Part I(B). More importantly, however, the
Standards and Principles provide ample guidelines for
how those procedures, notwithstanding their subjective 
nature, could have been validated.

The recognition that subjective assessment procedures 
may be validated is critical to the effectuation of the 
underlying goals of Title VII. Subjective procedures, like 
those of a more objective nature, should be required to be 
validated for specific jobs in any case where employers 
use those procedures as a defense to a prima facie case of 
discrimination, whether the claim is analyzed under a dis­
parate treatment or a disparate effect theory. Only un­
der such a rule will employers be inhibited from making 
personnel decisions based on unlawful and irrelevant fac­
tors. Should the Court in this case implicitly approve the 
use of unvalidated selection procedures as a defense to 
any Title VII claim, employers will have greater incen­
tive to resort more readily to subjective assessment de-

1 The Society is an integral component of the amicus and is also 
known as Division 14 of the APA (Division of Industrial and 
Organizational Psychology). Until adopted by amicus as a whole, 
the Principles are the formal policy only of Division 14. However, 
they are “intended to represent the consensus of professional knowl­
edge and thought as it exists today. . . .” Id. at 3.




vices, which would facilitate covert and discriminatory 
decisionmaking and severely undermine the right of equal 
employment opportunities for those classes of persons 
otherwise protected by Title VII.

ARGUMENT
I. BECAUSE SUBJECTIVE ASSESSMENT DEVICES
CAN, AND SHOULD, BE SCIENTIFICALLY VALI­
DATED, THE USE OF SUBJECTIVE SELECTION 
CRITERIA AND PROCEDURES BY EMPLOYERS 
SHOULD NOT PRECLUDE REVIEW UNDER ANY 
TITLE VII THEORY.

Amicus relies on petitioner and other supporting amici
to establish the applicability of disparate impact analysis 
under Title VII to subjective selection criteria for hiring 
and promotions. Amicus wishes to share with the Court 
its unique knowledge of testing and other assessment de­
vices, and the validation of such devices, so that the Court’s 
decision regarding the proper standard of analysis will 
be informed by relevant social science data. However, 
should the Court ultimately determine that disparate 
treatment analysis is the appropriate standard of review 
for subjective assessment devices in the employment con­
text, amicus believes that such a holding in no way obvi­
ates the principle that such devices can and should be 
validated for the particular job in question.

A. Professional Standards Concerning the Technical 
Adequacy of Selection Devices are Applicable to the 
Subjective Methods Used by Respondent.

At issue in this case are three selection devices by 
which petitioner was evaluated for promotion by respond­
ent’s agents—interviews, supervisor’s ratings, and ex­
perience requirements. Respondent characterizes these as 
subjective assessment procedures, in contrast to such 
procedures as multiple choice standardized paper-and- 
pencil tests which are typically classified as objective 
measures. Although in the context of personnel assess­
ment procedures the term “subjective” is not easily de­
fined, the concept has been used to refer variously to




procedures in which “judgment or discretion [is exer­
cised] on the part of the evaluator” 2 or which lack any
“neutral” factors,3 to assessment devices of a “non­
mechanical, operator-dependent” nature,4 or to appraisals
not based on “ ‘hard data,’ such as production records
[or] attendance.” 5 Most simply, “Measures that require
the statement of opinion, beliefs, or judgments are con­
sidered subjective.” 6 Whatever the particular definition,
all are consistent with the widely-held view that sub­
jective devices used by employers for hiring or promo­
tion purposes are inherently less scientific, less quantifi­
able, less reliable, and less facially neutral than their
objective counterparts.7 For this reason, it has been as-

2 Waintroob, The Developing Law of Equal Employment Oppor­
tunity at the White Collar and Professional Level, 21 Wm. & Mary
L. Rev. 45, 48 (1979).

3 3 A. Larson & L. Larson, Employment Discrimination
§ 15-87 (1986).

4 Brief of United States as Amicus Curiae on Petition for Writ
of Certiorari at 13 [hereinafter Brief of United States].

5 Kleiman & Durham, Performance Appraisal, Promotion and the
Courts: A Critical Review, 34 Personnel Psychology 103, 114
(1981) [hereinafter Kleiman & Durham].

6 I. Goldstein, Training in Organizations 136 (2d ed. 1986)
[hereinafter Goldstein]. “For example, rating scales are subjective
measures, while measures of absenteeism are more objective.
(However, supervisors’ ratings of the absenteeism level of em­
ployees could turn that measure into a subjective criterion.)” Id.

7 “[S]ubjective measures are affected by the difficulties that one
individual has in rating another without bias.” Id. at 136-137.
These difficulties, however, are not due to any inherent charac­
teristics of subjective measures but to the failure of employers to
apply standard principles of test construction to subjective meas­
ures. It is not intrinsic to such devices to be unquantifiable. When
properly developed, they are amenable to scoring and objective
analysis. For example:

rating scales have been the most commonly employed measures 
in applied settings. . . . [One] reason is that it is simple to 
throw together a rating scale with a few traits . . . and 
delude yourself into believing that you have a useful measure 
of performance. Professionals . . . know that the steps in




serted that subjective selection methods and criteria are 
not susceptible to scientific “validation” or any other 
psychometric scrutiny. See, e.g., Brief of United States, 
supra note 4, at 15, and references therein.

This view is fundamentally at odds with the universal 
judgment of those most experienced and knowledgeable 
in techniques of measurement and evaluation generally, 
and in the appraisal of employee performance specifically. 
The most authoritative source for the standards to be 
applied to determine the technical adequacy of assess­
ment devices, the appropriateness of specific applications 
of these devices, and the reasonableness of inferences 
based on the results of these devices is the Standards
for Educational and Psychological Testing (1985),
a joint publication of the amicus APA, the Ameri­
can Educational Research Association (“AERA” ), and 
the National Council on Measurement in Education 
(“NCME”).8

Consistent with the Standards, the Division 14 Prin­
ciples for the Validation and Use of Personnel Se-

the process are very similar for objective and subjective 
measures and that shortcuts do not work in either case.

Id. at 137 (emphasis added).
8 The 1985 Standards represent the most modern expression of
professional and scientific thinking concerning technical advances
in psychological assessment. Its predecessor documents are APA,
Technical Recommendations for Psychological Tests and Diagnostic
Techniques (1954); APA, Standards for Educational and Psycho­
logical Tests and Manuals (1966); and AERA, APA, NCME,
Standards for Educational and Psychological Tests (1974).

The 1966 and 1974 forerunners of the current Standards have
been cited with approval by this Court and in a variety of lower
federal cases. See, e.g., Washington v. Davis, 426 U.S. 229, 247
n.13 (1976); Debra P. v. Turlington, 644 F.2d 397, 405 n.10 (5th
Cir. 1981); Douglas v. Hampton, 512 F.2d 976, 984-986 (D.C. Cir.
1975) (see especially id. at 984 n.59, where the court called the
1966 edition “the universally recognized professional authority”);
Harless v. Duck, 14 FEP Cases 1616, 1624 n.5 (N.D. Ohio 1977)
(stating that the “courts have almost unanimously agreed that the



lection Procedures (1987) apply the more general
guidelines of the Standards to the specific problems of
making decisions in the context of employee selection,
placement, and promotion and provide, inter alia, “prin­
ciples for the application and use of valid selection pro­
cedures, and information that may be helpful to person­
nel managers and others responsible for authorizing or
implementing validation efforts.” Principles at 2.9

Of relevance as well are the Uniform Guidelines. “The 
provisions of these guidelines . . . are intended to be 
consistent with generally accepted professional standards 
for evaluating standardized tests and other selection pro­
cedures, such as those described in the Standards . . . 
prepared by a joint committee of the” APA, AERA, and 
NCME. 29 C.F.R. § 1607.5(C). They “incorporate a 
single set of principles which are designed to assist em­
ployers . . .  to comply with requirements of Federal law 
prohibiting employment practices which discriminate on 
the grounds of race. . . .” Id. at § 1607.1(B).

The Standards, P rinciples, and Uniform Guidelines 
apply to a broad range of selection procedures, not merely 
to the traditional objective measures commonly denomi­
nated as tests. Although the full title of the Standards 
uses the word “testing,” the term is generic and refers 
to “standardized ability . . . instruments, diagnostic and 
evaluative devices, interest inventories, personality inven­
tories, and projective instruments.” Standards at 3. 
The term also includes samples of observable behavior 
“relevant to . . . employment decisionmaking.” Id. at 4. 
Amicus, AERA, and NCME unequivocally view the

[1974] APA guidelines provide persuasive standards for evaluat­
ing claims of job relatedness”).

9 The 1987 Principles are a revision of the original 1980 version. 
“The purposes of the revision are to bring the Principles up to 
date scientifically, to make them consistent with the Standards, 
and to reduce possible ambiguities regarding good practice in the 
use of selection procedures in making employment decisions.” 
Principles at 1.



Standards as useful and applicable “to the entire range 
of assessment techniques.” Id.10

Similarly, the Division 14 Principles are not appli­
cable solely to standardized paper-and-pencil tests. They 
are explicitly intended to aid employers who make hiring 
and promotion decisions to choose, select, develop, eval­
uate, and use all personnel selection devices, including 
“performance tests, . . . personality [and] interest in­
ventories, . . . biographical data forms or scored appli­
cation blanks, interviews, . . . experience requirements, 
. . . appraisals of job performance, . . . [and] esti­
mates of advancement potential.” Principles at 1.

Finally, the Uniform Guidelines “provide a framework 
for determining” not only “the proper use of tests” but 
“other selection procedures” as well. 29 C.F.R. § 1607.1 
(B). They plainly “apply to tests and other selection 
procedures which are used as a basis for any employment 
decision[,]” including “hiring and promotion.” Id. at
§ 1607.2(B).11

In sum, then, it is the universal professional judgment 
that the assessment devices used by respondent in decid­
ing not to promote the petitioner can properly be scrutin­
ized under the applicable scientific and professional 
standards, principles, and guidelines concerning the 
evaluation of the psychometric soundness of such de­
vices.12 In the present case, those devices were not sub­

10 Such “instruments . . . are called tests here to indicate that 
the standards also apply to these instruments.” Id. at 4-5.

11 See also id. at § 1607.15(A)(1) (referring to selection proce­
dures “either standardized or not standardized”).

12 See, e.g., B. Schneider & N. Schmitt, Staffing Organiza­
tions 14 (2d ed. 1986) [hereinafter Schneider & Schmitt]:

Typically we think of a test as an examination of some kind 
responded to with paper and pencil. . . .  In fact, industrial 
psychologists and the Uniform Guidelines on Employee Selec­
tion Procedures (1978) have defined the word “test” in much 
broader terms and the courts have adopted this definition. 
In brief, a test is defined as any form of collecting information




jected to any such scrutiny to determine if they were 
valid for the purpose for which they were used.

B. There are Generally Accepted Strategies for Estab­
lishing the Validity of Subjective Methods of Em­
ployee Selection.

There are many elements involved in the construction, 
development, and evaluation of an assessment instru­
ment. Industrial psychologists and others who create a 
test or other selection procedure must choose the domain 
to be assessed, construct the items to which test takers
will respond or select the behaviors to be observed, de­
velop scoring scales and norms so that results can be 
interpreted, prepare manuals, and most importantly, 
ensure that the instrument is psychometrically sound. See 
Standards at 9-37. The psychometric soundness of an
instrument depends primarily on its being reliable 13 and 
valid. Although both qualities are essential, there is no 
doubt that “validity is the most important consideration 
in test evaluation.” Standards at 9.14

when that information is used as a basis for making an em­
ployment decision. So, interviews are tests, as are application
blanks, . . . performance appraisals used as a basis for mak­
ing promotions (which, obviously are selection decisions), and
any other kind of information used for making employment
decisions.

Accord Friedman & Williams, Current Use of Tests for Employ­
ment in 2 Ability Testing 99-100 (A. Wigdor & W. Garner eds. 
1982) published by The National Academy of Sciences (acknowledg­
ing that the definition of selection procedures has extended to the
full range of assessment devices, including interviews, which are to
be scrutinized according to the same guidelines used to evaluate
standardized tests). See also Brito v. Zia, 478 F.2d 1200 (10th Cir. 
1973) (holding that subjective observations by employer of minor­
ity employees had to be supported by empirical evidence of 
validity).

13 “Reliability refers to the degree to which test scores are free 
from errors in measurement.” Standards at 19. The more reli­
able a test score the more consistent, dependable, or repeatable it 
will be. See Principles at 39.

14 “Undoubtedly the most important question to be asked about 
any psychological test concerns its validity . . . .” A. Anastasi,



Validation refers to the process by which psychologists
ascertain the degree to which certain inferences from a
particular assessment device are appropriate, meaningful, 
or useful. See Standards at 9; Principles at 4. A test 
is valid if the proposed interpretation of scores proves to 
be sound and relevant. See Cronbach, supra note 14, at 
125. Validity “concerns what the test measures and how
well it does so. It tells us what can be inferred from test 
scores.” Anastasi, supra note 14, at 131. When a selec­
tion procedure or device is said to be “validated,” psy­
chologists understand that the predictions inferred from 
the scored result of the procedure or device have a high 
rate of accuracy. For validation to be meaningful, it 
must predict performance of a particular task or set of 
tasks or other job relevant behavior of particular concern 
to the employer. Thus, selection procedures or devices are 
validated for a particular job when they have been dem­
onstrated scientifically to make reliable and meaningful 
distinctions between individuals on the basis of their abil­
ity to perform particular tasks with competence or to
function successfully in a particular job.15 “The simple
psychometric fact that test validity must be ascertained
for specific uses of the test has long been familiar. . . .
An invalid test or one that includes elements not related
to the job under consideration may unfairly exclude
minority group members who could have performed the
job satisfactorily.” Anastasi, supra note 14, at 432.

Validity “is a unitary concept.” Standards at 9. 
However, within the unifying theme of determining the

Psychological Testing 27 (5th ed. 1982) [hereinafter Anastasi].
“Obviously, no aspect of a test is more important than valid­
ity . . . .” L. Cronbach, Essentials of Psychological Testing
1-5 (4th ed. 1984) [hereinafter Cronbach]. The texts by Anastasi
and Cronbach are considered the most authoritative basic treatises
on the topic of psychological measurement.

15 “Only as abbreviation is it legitimate to speak of ‘the validity
of a test’; a test relevant to one decision may have no value for
another. So users must ask, ‘How valid is this test for the decision
made?’ or ‘How valid are the interpretations I am making of the
scores?’ ” Cronbach, supra note 14, at 125.



relevance or interpretability of scored results, “ [t]he 
validity of any inference can be determined in a variety 
of ways.” Principles at 4. “ [T]he various means of 
accumulating validity evidence have been grouped into 
categories called content-related, criterion-related, and 
construct-related evidence of validity.” Standards at 9.16
In the context of personnel selection procedures, content- 
validation involves a determination that the assessment 
instrument accurately reflects a representative sample of 
important aspects of job performance or job-required 
knowledge. See Principles at 19.17 Criterion-related 
validation involves a determination that the assessment 
instrument is predictive of, or significantly correlated 
with, important elements of job performance or work 
behavior. See Principles at 6.18 Construct validation

16 “ [I]nsofar as the courts have interpreted the test standards 
and . . . the Uniform Guidelines . . .  to mean that content, cri­
terion, and construct validity are distinct forms of validation, those 
interpretations are oversimplified, if not erroneous.” Bersoff, Test­
ing and the Law, 36 Am. Psychologist 1047, 1051 (1981). These 
three approaches should be viewed as subsets within a unifying 
and common framework. See Guion, On Trinitarian Doctrines of Va­
lidity, 11 Prof. Psychology 385, 386 (1980); Messick, Test Va­
lidity and the Ethics of Assessment, 35 Am. Psychologist 1012 
(1980) ; Tenopyr, Content-Construct Confusion, 30 Personnel 
Psychology 47 (1977). The three approaches may be discussed 
“separately only to acknowledge traditional presentations and avoid 
an abrupt departure from tradition.” Principles at 4.

17 “In general, content-related evidence demonstrates the degree 
to which the sample of items, tasks, or questions on a test are 
representative of some defined universe or domain of content.” 
Standards at 10.

18 “Criterion-related evidence demonstrates that test scores are 
systematically related to one or more outcome criteria . . . .” 
Standards at 11.

Two designs for obtaining criterion-related evidence—predic­
tive and concurrent—can be distinguished. A predictive study 
obtains information about the accuracy with which early test 
data can be used to estimate criterion scores that will be ob­
tained in the future. A concurrent study serves the same pur­
poses, but obtains prediction and criterion information simul­
taneously.

Id.



involves a determination that the assessment instrument 
accurately measures the degree to which individuals pos­
sess identifiable characteristics which have been deter­
mined to be important for successful job performance. 
See Principles at 25.19

Each of the validation strategies requires the employer 
to engage in an essential prerequisite activity. Through 
a process known as job analysis, the employer must 
clearly identify the most important components of suc­
cessful job performance. “Job analysis is essential to 
the development of a content-oriented procedure or to the 
justification of a construct important to job behavior.” 
Principles at 5. “In some situations, the major purpose 
of job analysis may be to provide information from 
which criterion measures may be developed.” Id. at 6. 
Satisfying this essential prerequisite requires an analysis 
of the job in question, and a clear articulation of the 
knowledge, skills and abilities (“KSAs”), or other per­
sonal characteristics or behaviors the exhibition of which
determines proficiency at that job.20 These data can be
secured through judgments of job incumbents, their su­
pervisors, personnel specialists, the professional judg­
ment of job experts, and through training manuals,

19 “The evidence classed in the construct-related category focuses 
primarily on the test score as a measure of the psychological cate­
gory of interest . . . .  Such characteristics are referred to as con­
structs because they are theoretical constructions about the nature 
of human behavior.” Standards at 9.

This Court has summarized these approaches in Washington v. 
Davis, 426 U.S. at 247 n.13, as it understood them to be described 
in the 1966 version of the Standards.

20 Although a job analysis is crucial to all validation strategies,
it has been emphasized more clearly in the context of content- 
oriented validity. See, e.g., “Content validation should be based on 
a thorough and explicit definition of the content domain of interest. 
For job selection, classification, and promotion, the characterization 
of the domain should be based on job analysis.” Standard 10.4, 
Standards at 60-61. But it is required for criterion-related va­
lidity, see Standard 10.1, Standards at 60, and construct-related
validity, see Standard 10.8, Standards at 61, as well.




job descriptions, and other written information. See 
Principles at 19-20; Schneider & Schmitt, supra note 12, 
at 47; Thompson & Thompson, Court Standards for Job 
Analysis in Test Validation, 35 Personnel Psychology 
865 (1982). In addition, the relative importance of these 
KSAs must be determined. Finally, a close link between 
the assessment device and the identified job content or 
behavioral characteristic (construct) must be established. 
Standard 10.5, Standards at 61; see also Standard 10.8,
id.21

In sum, then, the use of particular selection procedures 
by an employer reflects his or her implicit assumption 
that some important aspect of behavior on the job can be 
predicted from an individual’s scores or performance on 
the chosen selection procedure. The critical factor under­
lying this assumption is the accumulation of evidence or 
data to support an inference of the chosen procedure’s 
job-relatedness. This can be accomplished only through 
the various strategies of validation.

Validation is no less applicable to subjective assess­
ment devices than to objective ones. In both cases, ac­
curate predictors of job performance are essential to as­
sist employers in selecting or promoting individuals who 
will best serve their needs, as well as to provide a method 
of personnel selection that inhibits consideration of non­
job-related factors such as an individual’s race. Indeed,
several commentators have noted the emphasis placed by
many courts and employers on objectivity, at the expense
of validity, i.e., requiring the use of certain “neutral ap­
pearing” objective tests to measure job performance,
even where the validity of those criteria is clearly ques­
tionable.22 Because of the role that validation plays in

21 Several books summarize job analysis procedures and discuss
their relative utility in various situations. See, e.g., S. Gael, Job
Analysis: A Guide to Assessing Work Activities (1983); E.
Levine, Everything You Ever Wanted to Know about Job
Analysis (1983); E. McCormick, Job Analysis: Methods and
Applications (1979).

22 See, e.g., Kleiman & Durham, supra note 5, at 117-118.




enhancing the quality of selection procedures and reduc­
ing the potential for discrimination, an employer should 
be required to provide evidence of the validity of those 
procedures whether the employer chooses to use ones 
labeled as subjective or objective.

C. To Reduce Sources of Bias the Validity of Each of 
the Selection Devices Used by Respondent Must 
and Can Be Established by Generally Accepted and 
Accessible Validation Strategies.

Industrial psychologists routinely use the three strat­
egies for validating assessment devices described in Part 
I(B) to validate both objective devices such as standard­
ized ability tests and interest inventories, and purely sub­
jective or multi-component devices such as interviews, 
performance appraisal ratings, constructed performance 
tasks, nonscored experience and biographical data intake 
sheets, and structured behavioral sample tests. See gen­
erally, e.g., Goldstein, supra note 6; Schneider & Schmitt, 
supra note 12.

1. The interview.
The employment interview, the technique most heavily
relied upon by respondent in this case, is probably more
widely used than any other selection tool. See H. Henne­
man, D. Schwab, J. Fossum & L. Dyer, Personnel/
Human Resource Management (1980); Ulrich &
Trumbo, The Selection Interview Since 1949, 63 Psycho­
logical Bull. 100 (1965). However, because most em­
ployers are unaware of research which has determined 
which variables are reliably, validly, and uniquely as­
sessed in the selection interview, see Schmitt, Social and 
Situational Determinants of Interview Decisions: Impli­
cations for the Employment Interview, 29 Personnel 
Psychology 79, 97 (1976), the employment interview is
typically subject to interview bias of various types. See
Arvey, Unfair Discrimination in the Employment Inter­
view: Legal and Psychological Aspects, 86 Psychological
Bull. 736 (1979). The most commonly known bias is
“halo effect,” where an interviewer may be unduly influ­




enced by a single trait which colors his/her judgment 
of the employee’s other traits. See Anastasi, supra note 
14, at 612. A related problem is “stereotyping” in 
which an employee is judged “based on his or her 
group membership [e.g., race] rather than on the basis 
of his or her unique characteristics.” Schneider & Schmitt,
supra note 12, at 388-389. Another concern is the “simi- 
lar-to-me phenomenon” in which the interviewer adopts 
the attitude “I am wonderful and I have the following 
attitudes and opinions, so if candidates I interview have 
the same attitudes and opinions, they must also be won­
derful.” Id. at 389. “When combined with stereotyping, 
the similar-to-me phenomenon can be a potent deter­
minant of interviewer decision-making.” Id.

Interviews can afford an opportunity for direct ob­
servation of samples of behavior, albeit limited, mani­
fested during the interview and serve to evoke life-history 
data, both of which can be important predictors of fu­
ture performance, if interviews are developed and con­
ducted using generally accepted standards. See Anastasi, 
supra note 14, at 610.23 Several recent studies, reviewed 
by noted scholars, show that interview judgments can be 
valid indicators of subsequent job performance. See

23 A variety of available sources discuss methods, applications,
and effectiveness of interviewing, and research on the inter­
viewing process. See, e.g., W. Bingham, B. Moore & J. Gustad,
How to Interview (4th ed. 1959); R. Fear, The Evaluation
Interview (2d ed. 1973); J. Matarazzo & A. Wiens, The Inter­
view: Research on Its Anatomy and Structure (1972); Arvey,
Unfair Discrimination in the Employment Interview: Legal and
Psychological Aspects, 86 Psychological Bull. 736 (1979); Dunnette
& Borman, Personnel Selection and Classification Systems, 30 Ann.
Rev. Psychology 477 (1979); Grant & Bray, Contributions of the
Interview to Assessment of Management Personnel, 53 J. Applied
Psychology 24 (1969); Schmitt, Social and Situational Deter­
minants of Interview Decisions: Implications for the Employment
Interview, 29 Personnel Psychology 79 (1976); Ulrich &
Trumbo, The Selection Interview Since 1949, 63 Psychological
Bull. 100 (1965).




Arvey & Campion, The Employment Interview: A Sum­
mary and Review of Recent Research, 35 Personnel
Psychology 281 (1982). These sources show that in­
terviews can be created that are valid and nondiscrimi­
natory if interview questions are carefully linked to job
analysis and performance criterion data.24

Interview validity is not alone sufficient, however. The 
validity of the interviewer must also be established. An 
“interview requires skill in data gathering and in data 
interpreting. An interview may lead to wrong decisions 
because important data were not elicited or because given 
data were inadequately or incorrectly interpreted.”
Anastasi, supra note 14, at 610-611. In this regard, in­
terviewer training is important. A structured interview 
guide will improve interviewer reliability and assist in
removing any bias, especially if the training occurs with
applicants of a gender and/or race different from that of
the interviewer. With important interviews used to de­
termine hiring or promotion, it is very helpful if appli­
cants are seen by more than one interviewer, although if 
records are kept, it is possible to identify those interview­
ers whose decisions are most reliable and valid and rely
on those interviewers singly to make judgments. See 
Schneider & Schmitt, supra note 12, at 390-394; Hakel, 
Employment Interviewing in P ersonnel Management 
(K. Rowland & G. Ferris eds. 1982).

24 As one example, researchers have described and tested an
innovative but relatively simple and valid employment interview.
Critical incidents, i.e., reports by job incumbents or supervisors
of situations in which especially effective or ineffective behavior is
displayed, were converted into situational interview questions. The
interviewer posed these situations to job applicants and asked them
how they would behave. Each answer was rated independently by
two or more interviewers on a five-point scale with end points on
the scale provided by job experts to facilitate objective scoring.
The process validly predicted future job performance for both
women and blacks. See Latham, Saari, Pursell & Campion, The
Situational Interview, 65 J. Applied Psychology 422 (1980).



In sum, the use of employment interviews should be
“preceded by a thorough analysis of the target job, the
development of a structured set of questions based on the 
job analysis, and the development of behaviorally specific 
rating instruments by which to evaluate applicants.” 
Schneider & Schmitt, supra note 12, at 395. The assess­
ment of employees “should be maximally dependent on 
their personal characteristics and minimally dependent 
on who made the assessment . . . . Where non-test pre­
dictors like interviewer judgments are used, the [em­
ployer] should develop procedures that will minimize
error resulting from differences between judges.” Prin­
ciples at 12.

2. Rating scales and other performance appraisals.
Performance appraisal devices such as rating scales
are widely used in employment settings, and were used
by respondent in this case.25 “Rating scales differ from 
naturalistic observations in that data are accumulated 
casually and informally; they also involve interpretation 
and judgment, rather than simple recording of observa­
tions.” Anastasi, supra note 14, at 611. In contrast to
interviews, however, “they typically cover a longer obser­
vation period and the information is obtained under 
more realistic conditions.” Id.

Like interviews, rating scales are subject to a variety 
of sources of contamination or bias, including: (1) op­
portunity bias, which occurs if raters do not have the
opportunity to observe the employee in situations in 
which the behavior to be rated could be manifested, but 
have the opportunity to do so with a competing employee; 
(2) halo effect, a tendency on the part of raters to be 
unduly influenced by a single favorable or unfavorable

25 In one survey, 89% of the companies studied reported using
performance appraisals on a regular basis. Locher & Teel, Per­
formance Appraisal—A Survey of Current Practices, 56 Person­
nel J. 245 (1977). Of all performance appraisal techniques, the
rating scale is by far the most ubiquitous. See Landy & Farr,
Performance Rating, 87 Psychological Bull. 72, 73 (1980)
[hereinafter Performance Rating].




trait, which colors their judgment on the individual’s 
other traits; (3) error of central tendency, or the ten­
dency to place persons in the middle of the scale and to 
avoid extremes; and (4) leniency error, or the reluc­
tance of many raters to assign unfavorable ratings. 
The latter two errors reduce the effective width of the rating
scale and make it less useful in distinguishing among
individuals. See Anastasi, supra note 14, at 611-612;
Cronbach, supra note 14, at 509-511; Schneider &
Schmitt, supra note 12, at 90-92; Goldstein, supra note 6, 
at 255.

Most troubling in the context of this case is that scores 
on rating scales may be affected by the race of the rater 
and ratee. White raters have been found to assign sig­
nificantly higher ratings to white ratees than black 
ratees. These findings were noted in a comprehensive 
review of 74 studies involving 17,159 ratees in which 
the rater was white and 14 studies involving 2,420 
ratees in which the rater was black. Race effects were 
more pronounced in real-life settings than in labo­
ratory settings and more likely when, as in this case, the 
proportion of blacks in the workforce was small. See 
Kraiger & Ford, A Meta-analysis of Ratee Race Effects 
in Performance Ratings, 69 J. A pplied Psychology 56 
(1985).26

Psychologists have published a great deal of accessible
literature describing rating scale formats which are use­
ful in rating employees and which have been critiqued in the

26 These findings are not universal nor do they imply that per­
formance appraisal systems are inherently discriminatory. One
study, for example, found no race-of-rater effect in an indus­
trial setting which was highly racially integrated and where par­
ticipants in the study had been exposed to human relations train­
ing. See Schmidt & Johnson, Effect of Race on Peer Ratings in an
Industrial Setting, 57 J. Applied Psychology 237 (1973). These
mitigating factors were not present with regard to respondent. For a
comprehensive review of the effects of rater and ratee characteris­
tics and the interaction of the two, see Performance Rating, supra
note 25, at 74-82.




context of Title VII requirements.27 Many of these for­
mats would be significant improvements over the system
used by respondent.28 Regardless of format, what does
produce a superior scale is that it is the “result of psy­
chometric rigor in development and of some level of par­
ticipation of individuals representative of those who will
eventually use the scales to make ratings . . . .” Perform­
ance Rating, supra note 25, at 85.

An essential aspect of the requirement for psychometric 
rigor is the job analysis. “The development of rating 
procedures should ordinarily be guided by job analyses

27 E.g., Cascio & Bernardin, Implications of Performance Ap­
praisal Litigation for Personnel Decisions, 34 Personnel Psy­
chology 211 (1981); Distefano, Pryer & Erffmeyer, Application
of Content Validity Methods to the Development of a Job-Related
Performance Rating Criterion, 36 Personnel Psychology 621
(1983); Feild & Holley, The Relationship of Performance Ap­
praisal System Characteristics to Verdicts in Selected Employment
Discrimination Cases, 25 Acad. Mgmt. J. 392 (1982); Kleiman &
Durham, supra note 5; Performance Rating, supra note 25. The
dissemination of this information is so widespread that it now
appears in popular literature designed for lay readers. See Rice,
Spotlight on Employee Performance, 9 US Air 53 (August 1987).

28 Two of the most popular rating formats are the graphic rating
scale and the behaviorally anchored rating scale (“BARS”).
In the graphic rating scale’s most usual format, several dimensions
to be rated are listed vertically and raters are then asked to make
rating decisions along a horizontal 5 to 9 point scale. For example,
if the dimension to be rated is “accuracy,” the scale may use a
numerical or one-word verbal rating, e.g., 1 to 5 or “high” to “low”;
preferably, it may use a range of descriptions, e.g., at one end of
the scale would be “makes too many errors”; at the other end,
“almost never makes mistakes.” BARS uses dimensions derived by
raters who would actually use the scale, with different points on
each dimension anchored by statements describing actual job be­
havior which would illustrate specific levels of performance, e.g.,
using “accuracy” in rating a bank teller, the statements could in­
clude a range from “makes frequent errors in totalling accounts
at end of day” to “errors in totalling accounts are consistently
rare.” See generally Schneider & Schmitt, supra note 12, at 101-
106. Neither scale format seems more useful than the other in
practice. See Performance Rating, supra note 25, at 85.



if, for example, raters are expected to evaluate several
different aspects of performance,” as in this case. Prin­
ciples at 10. Job analyses are also important if, as here,
appraisal of past performance is used as a predictor of
future performance. The use of ratings of past perform­
ance in one job to make promotion decisions for another
position is permissible only if the ratings of past per­
formance are valid and are related to future performance.
The latter requires a job analysis indicating the extent
to which the two jobs overlap. See Cascio & Bernardin,
Implications of Performance Appraisal Litigation for
Personnel Decisions, 34 Personnel Psychology 211, 217
(1981).29

The usefulness of a rating scale is also highly depend­
ent on the skill of the rater. “Valid ratings cannot be 
made by someone who is either unfamiliar with the work 
of the ratee or lacks the skills necessary to accurately 
observe or rate the job behavior.” Kleiman & Durham, 
supra note 5, at 113. In these respects, “those in imme­
diate contact with the subject give superior information,” 
Cronbach, supra note 14, at 512 (in the case of employ­
ment settings, first-level supervisors), and those raters
who have undergone training show increased “reliability
and validity of ratings” and decreased errors in judg­
ment. Anastasi, supra note 14, at 612.30

29 See also Albemarle Paper Co. v. Moody, 422 U.S. 405, 431-433
(1975), where this Court, invoking the Uniform Guidelines (and
crediting APA’s Standards) condemned as “materially defective” 
the employer’s validation study because its “subjective supervisorial 
rankings” used standards which were vague and ambiguous and 
failed to follow the Uniform Guidelines’ requirement for job 
analyses.

30 See also Standard 1.13, Standards at 16: “When criteria are
composed of rater judgments, the degree of knowledge that raters
have concerning ratee performance should be reported. If possible,
the training and experience of the raters should be described”;
Principles at 10: “It may . . . be necessary to train raters in the
observation and evaluation of performance. Further, supervisors 
should be expected to be familiar enough with the demands of the 
job to evaluate overall performance.” The utility of rater training



In sum, rating scales will conform to legal and psycho­
metric requirements if the appraisal system is based on
a job analysis, contains clearly defined dimensions of job 
performance rather than vague, global measures or ab­
stract trait names, is behaviorally based so that all rat­
ings can be supported by objective, observable evidence, 
and if the raters are in the position to observe the be­
haviors to be rated and are trained to reduce sources of 
bias, contamination, or other rating errors.

3. Experience requirements.
The use of past experience to make judgments about
future performance, as in this case, is one aspect of a
recognized selection device called biographical inventory 
technique or, more commonly, “biodata.” See Owens, 
Background Data, in Handbook of Industrial and Or­
ganizational Psychology (M. Dunnette ed. 1976). When
used properly, biographical inventories which include 
prior experience are “especially appropriate for assessing 
the qualifications of women and minority groups.” 
Anastasi, supra note 14, at 616. To serve this purpose, 
however, the biodata inventory must focus on “specific, 
job-relevant past achievements, rather than on the pas­
sive exposure implied by the customary education and 
experience records.” Id. Respondent’s use of an experi­
ence criterion falls far short of this standard.

In fact, use of biodata is relatively likely to produce adverse impact if biodata items are not chosen carefully. See G. Dreher & P. Sackett, Perspectives on Staffing and Selection (1983).



A number of comprehensive validity studies have been conducted on the use of biodata as a selection device, concluding that it has inconsistent validity even in a form more comprehensive than that used by respondent. See, e.g., Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982).31

However, as with all selection devices, the most reasonable and justifiable approach in using biodata is to base the choice of items on a well-done job analysis, matching the items to the knowledge, skill, and ability requirements of the job description. See Pace & Schoenfeldt, Legal Concerns in the Use of Weighted Applications, 30 Personnel Psychology 159 (1977). Past experience alone, without the careful selection of both logically and empirically justified life-history questions, is unacceptable as a selection device.32

31 Compare Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982) (satisfactory reliability) with Korman, The Prediction of Managerial Performance: A Review, 21 Personnel Psychology 295 (1968) (finding biodata to have lower validity than other predictors for predicting managerial performance).

32 Sophisticated research on the use of valid biodata is available to inform the interested employer. See, e.g., Brush & Owens, Implementation and Evaluation for an Assessment Classification Model for Manpower Utilization, 32 Personnel Psychology 369 (1979); Schoenfeldt, Utilization of Manpower: Development and Evaluation of an Assessment-Classification Model for Matching Individuals with Jobs, 59 J. Applied Psychology 583 (1974); Owens & Schoenfeldt, Toward a Classification of Persons, 46 J. Applied Psychology 329 (1979). See generally Schneider & Schmitt, supra note 12, at 378-382.

II. THE SUBJECTIVE SELECTION PROCEDURES USED BY RESPONDENT FAIL TO MEET GENERALLY ACCEPTED STANDARDS AND APPEAR TO HAVE BEEN APPLIED WITHOUT ANY EVIDENCE THAT THEY ARE VALID FOR THE INFERENCES DRAWN FROM THEM.






When reviewed in the context of the principles and studies described in Part I, it is clear that the assessment devices used by respondent in this case to select employees, including petitioner, for promotion to the position of teller supervisor are distressingly inadequate. They have been used to deny promotion to a member of a protected class without any evidence that they were developed, used, and applied in a way consistent with generally accepted professional standards. There is absolutely no evidence that the procedures were subjected to any of the validation strategies available to respondent, even in rudimentary form.

It is not unlawful per se for employers to use so-called “subjective” selection procedures. See Part I(A). But, assuming that the use of subjective criteria was appropriate for the position of supervisor of tellers, respondent failed to perform a job analysis for the position to identify more accurately the knowledges, skills, and abilities which are desirable for successful performance of the job.33 There is no evidence that respondent secured accurate and thorough information about the job from job incumbents, their supervisors, personnel specialists, training manuals, job descriptions, or actual observation by trained observers. See Principles at 5-6, 19-24; Schneider & Schmitt, supra note 12, at 47-50.

The selection procedures themselves suffer from crucial infirmities. With regard to the interview, there is no evidence that the interviewer, in this case a white male, had any training in conducting interviews, a requirement that is especially important when the interviewee is of a different race and gender than the interviewer.34

33 As noted in Part I, job analysis is a critical first step in establishing the usefulness of any selection procedure. For a review of 26 employment discrimination cases yielding a helpful summary of requirements for judicially-approved job analyses, see Thompson & Thompson, Court Standards for Job Analysis in Test Validation, 35 Personnel Psychology 865 (1982).

34 “The training of interviewers especially with possible applicants of different race or sex may increase ‘their ability to relate’ . . . .” Schneider & Schmitt, supra note 12, at 386; see Schmitt, Social and Situational Determinants of Interview Decisions: Implications for the Employment Interview, 29 Personnel Psychology 79, 97 (1976). See also Principles at 33: “All persons within the organizations who have responsibilities related to the use of employment tests and related predictors should be qualified through appropriate training to carry out their responsibilities.”




Nor was more than one interviewer systematically involved in decisionmaking. There is also no evidence that the interviews were structured so as to improve reliability and reduce biasing errors,35 nor is there any evidence that the nature of the interview or the questions asked had any empirical, logical, or theoretical connection with the position for which petitioner was considered.36

With regard to the rating scales, it was especially important for respondent to have offered some evidence for the validity of its supervisor performance appraisal, as the use of numerical values gives it facial validity and the appearance of objectivity. But, when judged according to the criteria discussed in Part I(B) & (C)(2), the rating scale is deplorable.

The qualities measured on the rating scale are not unambiguously defined, nor is there any demonstrable correlation between many of the criteria listed in the scale and successful job performance.37


35 “Use of a structured interview guide will improve interviewer 
reliability.” Schneider & Schmitt, supra note 12, at 386.

36 “Predictor variables should be chosen for which there is an empirical, logical, or theoretical foundation.” Principles at 11.

37 The qualities rated were accuracy of work, alertness, personal appearance, supervisor-coworker relations, quantity of work, physical fitness, attendance, dependability, stability (“the ability to withstand pressure and remain calm in most situations”), drive (“ambition”), friendliness and courtesy, and job knowledge. The qualities were variously rated on a scale from 0 to 7-10. See Watson v. Fort Worth Bank & Trust, 798 F.2d 791, 812 n.26 (5th Cir. 1986) (Goldberg, J., dissenting). “Few of these categories have much objective content. For example, ‘personal appearance,’ ‘drive,’ and ‘friendliness and courtesy’ are clearly subjective on their face. . . . The rating system is also subjective: [e.g.,] 0-1, ‘does not meet minimum requirement’ . . . ; 7-8, ‘superior work production record.’ This type of subjective measurement lends itself to discriminatory bias, be it conscious or unconscious.” Id.




For example, all but two of the rating criteria used by respondent are totally undefined, and the two that have purported definitions (stability and drive) are only vaguely defined; none of the qualities assessed have anchoring or endpoint definitions, as is customary in generally accepted graphic or BARS scales.38 The scale thus failed to use clearly defined individual components or dimensions of job performance, in contrast to undefined global measures, e.g., “neat and clean in appearance” vs. “personal appearance.”39 Similarly, it did not use behaviorally based performance dimensions that could be verified by objective, observable evidence, e.g., “knows how to check account balance” vs. “job knowledge.”40

The failure to conduct even a rudimentary job analysis further undermines the rating scale’s validity. It is far from clear how criteria such as “physical fitness” measure an individual’s ability to supervise tellers. The constructs or behavior traits identified by respondent, such as “drive” or “dependability,” could be validated for use in promoting individuals to supervisory teller positions if demonstrated to be job-related and assessed reliably from the performance appraisal. Although the use of such “abstract trait names” is not advised unless the traits can be defined in terms of observable behaviors,41 it may be necessary to measure such personality constructs for certain jobs. If certain traits or constructs are deemed important enough to influence personnel selection, however, they are important enough to measure validly:

Knowing whether a construct is measured validly requires, if not a theory, at least some fairly well articulated ideas about what is being measured, what a measure of the construct should reasonably be expected to be related to and, perhaps more importantly, what it should not be related to. . . . This view of constructs and construct validity implies two aspects of a construct-related strategy for developing evidence to judge the job relatedness of a selection procedure. The first is evidence that the construct is indeed important for job performance. . . . Ordinarily, a job analysis can provide a part of the basis for identifying and defining constructs which are important to job performance. Clarity of the articulation of the meaning and the nature of the construct, and well-informed expert judgment that a logical relationship exists between the nature of the construct and identifiable demands of the job is essential.

The second is evidence that the instrument used as a selection procedure is a valid measure of the construct and not of other constructs.

Principles at 25; see also Standards 9-10.

38 See supra note 28.

39 Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211, 212 (1981) [hereinafter Cascio & Bernardin].

40 Id.

41 Cascio & Bernardin, supra note 39, at 212.




Finally, there is no indication that the supervisors who performed the ratings had any training in assessing responses or in avoiding the variety of sources of contamination or bias, especially when there is significant evidence that scores on rating scales may be affected by the race of rater and ratee. See Part I(C)(2).

With regard to experience requirements, respondent failed to show that the lack of prior experience upon which it based its failure to promote petitioner in her first three attempts was related to the position for which she applied. It may well be that her experience as a teller would predict her success as a supervisor of tellers, but without an analysis of both positions, that assumption could be faulty. In addition, respondent idiosyncratically employed the criterion, using her alleged lack of experience to justify denying her promotion in three instances and then ignoring the criterion in a fourth instance, when it promoted a competing applicant with less experience.




Perhaps most importantly, respondent failed to use prior experience as one part of a carefully selected and logically and empirically justified comprehensive biographical data inventory.

Although respondent may have fewer resources to devote to assessment device development or validation than larger organizations, that fact does not excuse the absence of any attempt to support its use of the selection devices employed in this case:

Where resources or sample sizes are limited, the criterion-related evidence of validity and content-related validation judgments obtained on similar jobs in other settings and the strength of the construct-related evidence of validity already generated by the [particular instrument] become particularly important. Employers should not be precluded from using a [particular instrument] if it can be demonstrated that th[at instrument] has generated a significant record of validity in similar job settings for highly similar people[,] or that it is otherwise appropriate to generalize from other applications.

Standards at 59.42 Respondent neglected to conduct even this inexpensive inquiry. There is absolutely no legitimate reason why respondent failed to conduct even crude validity studies of the selection devices it used to evaluate petitioner or, at the very least, to investigate the availability of existing sources of valid selection devices. These failures are all the more disturbing considering its poor record in hiring and promoting minority employees.43

42 “Researchers and employers are encouraged to conduct cooperative studies when adequate data . . . are not available.” Principles at: The Uniform Guidelines also clearly permit exceptions to the general requirement that employers validate their own procedures. See 29 C.F.R. §§ 1607.6-.8.

43 “[P]laintiff presented ‘significant proof’ that the bank operated under a ‘general policy of discrimination.’” Watson v. Fort Worth Bank & Trust, 798 F.2d 791, 807 (5th Cir. 1986) (Goldberg, J., dissenting). See id. at 810-814 for supporting statistical data.



III. THE FAILURE TO REQUIRE THAT SUBJECTIVE SELECTION DEVICES HAVE DEMONSTRABLE VALIDITY WOULD UNDERMINE THE PURPOSES OF TITLE VII.

The underlying goal of Title VII of the Civil Rights Act of 1964 was the “eliminat[ion] . . . [of] discrimination in employment” based on race, color, religion, sex or national origin in all of its forms. H.R. Rep. No. 914, 88th Cong., 2d Sess., reprinted in 1964 U.S. Code Cong. & Ad. News 2391, 2401. Consistent with that goal, Title VII prohibits employers from discriminating in employment decisions based on such impermissible classifications. See 42 U.S.C. § 2000e-2. However, because “Congress did not intend by Title VII . . . to guarantee a job to every person regardless of his qualifications,” Griggs v. Duke Power Co., 401 U.S. 424, 430 (1971), it authorized employers to distinguish among individuals for selection or promotion purposes based “upon the results of any professionally developed ability test provided that such test, its administration or action upon the results is not designed, intended or used to discriminate because of race, color, religion, sex or national origin.” 42 U.S.C. § 2000e-2(h).44

In permitting the appropriate use of “professionally developed ability tests” as a basis for selecting and promoting individuals, the drafters of the ultimate language of Title VII stressed the importance of demonstrating the relationship and relevance of the selection procedure to job qualifications. See 110 Cong. Rec. 7247 (1964).

44 “[T]he Act does not command that any person be hired simply 
because he was formerly the subject of discrimination, or because 
he is a member of a minority group . . . .  What is required by 
Congress is the removal of artificial, arbitrary, and unnecessary 
barriers to employment when the barriers operate invidiously to 
discriminate on the basis of racial or other impermissible factors.” 
Griggs, 401 U.S. at 431.




Recognizing the “u[tility]” of “testing or measuring procedures,” this Court has also stressed Congress’ intent to “forbid[] . . . giving these devices and mechanisms controlling force unless they are demonstrably a reasonable measure of job performance.” Griggs, 401 U.S. at 436. “What Congress . . . commanded is that any tests used must measure the person for the job and not the person in the abstract.” Id.

Proponents of Title VII opposed authorizing the use of “professionally developed ability tests” without regard to their ability to predict performance of the particular job in question. 110 Cong. Rec. at 13504. The use of tests which, although “professionally developed,” bear no relation to the job for which they are being used to assess individuals was clearly recognized as a potential means of covert “[d]iscrimination[,] . . . under the guise of compliance with the statute.” Id. Assessment devices which have not been shown to be job-related, or otherwise predictive of performance of a particular job, cannot justify discriminatory employment practices. The crucial public policy goals of Title VII would be thwarted if employers could rebut claims of discrimination simply by pointing to the results of unvalidated assessment devices, whether subjective or objective. Indeed, such unvalidated results may well reflect precisely the discrimination Congress sought to eliminate in Title VII.

In light of the “nondiscrimination” objectives of Title VII and the demonstrated ability of professionals to validate subjective assessment devices, there is no principled reason to treat objective and subjective devices differently in imposing a validation requirement, regardless of whether a plaintiff proceeds under a disparate impact or a disparate treatment claim.45 Indeed, permitting the use of unvalidated subjective assessment devices while requiring objective devices to be validated provides a ready mechanism for covert discrimination for employers seeking to avoid the constraints of Title VII. Validation does require the expenditure of both time and money by an employer.

45 Texas Dep’t of Comm. Affairs v. Burdine, 450 U.S. 248 (1981); McDonnell Douglas Corp. v. Green, 411 U.S. 792 (1973).



there are a number of readily available techniques for 
developing, adopting, and validating both objective and 
subjective devices, and both professional and legal stand­
ards allow the use of already developed selection devices. 
Thus, when one balances the relative costs of validation 
to the employer against the costs of eroding the protec­
tions provided by Title VII and the damage to society of 
perpetuating the vestiges of discrimination, the outcome 
clearly favors the requirement that employers use pyscho- 
metrically sound and job-relevant selection devices.

CONCLUSION
For the foregoing reasons, amicus respectfully requests that this Court reverse the decision of the Court of Appeals for the Fifth Circuit insofar as it releases employers from their obligation to show that the selection devices they use to make employment decisions are valid.

Respectfully submitted,


Donald N. Bersoff
(Counsel of Record)

Laurel Pyke Malson 
Donald B. Verrilli, Jr.
Ennis Friedman & Bersoff
1200 - 17th Street, N.W., Suite 400 
Washington, D.C. 20036 
(202) 775-8100 
Attorneys for Amicus Curiae

American Psychological Association

September 14, 1987
