Watson v. Fort Worth Bank and Trust Brief Amicus Curiae in Support of Petitioner
Public Court Documents
September 14, 1987
Brief Collection, LDF Court Filings. Watson v. Fort Worth Bank and Trust Brief Amicus Curiae in Support of Petitioner, 1987. d0a6e2b5-c89a-ee11-be36-6045bdeb8873. LDF Archives, Thurgood Marshall Institute. https://ldfrecollection.org/archives/archives-search/archives-item/77e92b64-598d-4b9b-85b4-5757b385c243/watson-v-fort-worth-bank-and-trust-brief-amicus-curiae-in-support-of-petitioner. Accessed November 23, 2025.
No. 86-6139
In The
Supreme Court of the United States
October Term, 1987
Clara Watson,
Petitioner,
v.
Fort Worth Bank & Trust,
Respondent.
On Writ of Certiorari to the United States
Court of Appeals for the Fifth Circuit
BRIEF FOR AMICUS CURIAE
AMERICAN PSYCHOLOGICAL ASSOCIATION
IN SUPPORT OF PETITIONER
Donald N. Bersoff
(Counsel of Record)
Laurel Pyke Malson
Donald B. Verrilli, Jr.
Ennis Friedman & Bersoff
1200 - 17th Street, N.W., Suite 400
Washington, D.C. 20036
(202) 775-8100
Attorneys for Amicus Curiae
American Psychological Association
September 14, 1987
Wilson - Epes Printing Co., Inc. - 789-0096 - Washington, D.C. 20001
TABLE OF CONTENTS
TABLE OF AUTHORITIES
INTEREST OF AMICUS CURIAE
INTRODUCTION AND SUMMARY OF ARGUMENT
ARGUMENT
I. BECAUSE SUBJECTIVE ASSESSMENT DEVICES CAN, AND SHOULD, BE SCIENTIFICALLY VALIDATED, THE USE OF SUBJECTIVE SELECTION CRITERIA AND PROCEDURES BY EMPLOYERS SHOULD NOT PRECLUDE REVIEW UNDER ANY TITLE VII THEORY
    A. Professional Standards Concerning the Technical Adequacy of Selection Devices are Applicable to the Subjective Methods Used by Respondent
    B. There are Generally Accepted Strategies for Establishing the Validity of Subjective Methods of Employee Selection
    C. To Reduce Sources of Bias the Validity of Each of the Selection Devices Used by Respondent Must and Can Be Established by Generally Accepted And Accessible Validation Strategies
        1. The interview
        2. Rating scales and other performance appraisals
        3. Experience requirements
II. THE SUBJECTIVE SELECTION PROCEDURES USED BY RESPONDENT FAIL TO MEET GENERALLY ACCEPTED STANDARDS AND APPEAR TO HAVE BEEN APPLIED WITHOUT ANY EVIDENCE THAT THEY ARE VALID FOR THE INFERENCES DRAWN FROM THEM
III. THE FAILURE TO REQUIRE THAT SUBJECTIVE SELECTION DEVICES HAVE DEMONSTRABLE VALIDITY WOULD UNDERMINE THE PURPOSES OF TITLE VII
CONCLUSION
TABLE OF AUTHORITIES
CASES: Page
Albemarle Paper Co. v. Moody, 422 U.S. 405 (1975) ____ 20
Brito v. Zia, 478 F.2d 1200 (10th Cir. 1973) ____ 9
Debra P. v. Turlington, 644 F.2d 397 (5th Cir. 1981) ____ 6
Douglas v. Hampton, 512 F.2d 976 (D.C. Cir. 1975) ____ 6
Griggs v. Duke Power Co., 401 U.S. 424 (1971) ____ 28, 29
Harless v. Duck, 14 FEP Cases 1616 (N.D. Ohio 1977) ____ 6
McDonnell Douglas Corp. v. Green, 411 U.S. 792 (1973) ____ 29
Texas Dep’t of Comm. Affairs v. Burdine, 450 U.S. 248 (1981) ____ 29
Washington v. Davis, 426 U.S. 227 (1976) ____ 6, 12
Watson v. Fort Worth Bank & Trust, 798 F.2d 791 (5th Cir. 1986) ____ 24, 27
STATUTES & REGULATIONS:
42 U.S.C. § 2000e et seq. ____ 2, 3, 28
29 C.F.R. § 1607 et seq. ____ 3, 7, 8, 27
H.R. Rep. No. 914, 88 Cong., 2d Sess., reprinted in 1964 U.S. Code Cong. & Ad. News 2391 ____ 28
MISCELLANEOUS:
AERA, APA, NCME, Standards for Educational and Psychological Testing (1985) ____ passim
AERA, APA, NCME, Standards for Educational and Psychological Tests (1974) ____ 6
A. Anastasi, Psychological Testing (5th ed. 1982) ____ 10, 15, 16, 17, 18, 20, 21
APA, Standards for Educational and Psychological Tests and Manuals (1966) ____ 6
APA, Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954) ____ 6
Arvey, Unfair Discrimination in the Employment Interview: Legal and Psychological Aspects, 86 Psychology Bull. 736 (1982) ____ 14, 15
Arvey & Campion, The Employment Interview: A Summary and Review of Recent Research, 35 Personnel Psychology 281 (1982) ____ 16
Bernardin & Pence, Effects of Rater Training, 65 J. Applied Psychology 60 (1980) ____ 21
Bersoff, Testing and the Law, 36 Am. Psychologist 1047 (1981) ____ 11
W. Bingham, B. Moore & J. Gustad, How to Interview (4th ed. 1959) ____ 15
Borman, Format and Training Effects on Rating Accuracy and Rating Errors, 64 J. Applied Psychology 410 (1979) ____ 21
Brush & Owens, Implementation and Evaluation for an Assessment Classification Model for Manpower Utilization, 32 Personnel Psychology 369 (1979) ____ 22
Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211 (1981) ____ 19, 20, 25
L. Cronbach, Essentials of Psychological Testing (4th ed. 1984) ____ 10, 18, 20
Distefano, Pryer, & Erffmeyer, Application of Content Validity Methods to the Development of a Job-Related Performance Rating Criterion, 36 Personnel Psychology 621 (1983) ____ 19
G. Dreher & P. Sackett, Perspectives on Staffing and Selection (1983) ____ 21
Dunnette & Borman, Personnel Selection and Classification Systems, 40 Ann. Rev. Psychology 477 (1979) ____ 15
R. Fear, The Evaluation Interview (2d ed. 1973) ____ 15
Feild & Holley, The Relationship of Performance Appraisal System Characteristics to Verdicts in Selected Employment Discrimination Cases, 25 Acad. Mgmt J. 392 (1982) ____ 19
Friedman & Williams, Current Use of Tests for Employment in 2 Ability Testing (A. Wigdor & W. Garner eds. 1982) ____ 9
S. Gael, Job Analysis: A Guide to Assessing Work Activities (1983) ____ 13
I. Goldstein, Training in Organizations (2d ed. 1986) ____ 5, 14, 18, 21
Grant & Bray, Contributions of the Interview to Assessment of Management Personnel, 53 J. Applied Psychology 24 (1969) ____ 15
Guion, On Trinitarian Doctrines of Validity, 11 Prof. Psychology 385 (1980) ____ 11
Hakel, Employment Interviewing in Personnel Management (K. Rowland & G. Ferris eds. 1982) ____ 16
H. Henneman, D. Schwab, J. Fossum, & L. Dyer, Personnel/Human Resource Management (1980) ____ 14
Ivancevich, Longitudinal Study of the Effects of Rater Training on Psychometric Errors in Ratings, 64 J. Applied Psychology 502 (1979) ____ 21
Kleiman & Durham, Performance Appraisal, Promotion and the Courts: A Critical Review, 34 Personnel Psychology 103 (1981) ____ 5, 13, 19, 20
Korman, The Prediction of Managerial Performance: A Review, 21 Personnel Psychology 295 (1968) ____ 22
Kraiger & Ford, A Meta-analysis of Ratee Race Effects in Performance Ratings, 69 J. Applied Psychology 56 (1985) ____ 18
Landy & Farr, Performance Rating, 87 Psychological Bull. 72 (1980) ____ 17, 18, 19
A. Larson & L. Larson, Employment Discrimination § 15-87 (1986) ____ 5
Latham, Saari, Pursell & Campion, The Situational Interview, 65 J. Applied Psychology 422 (1980) ____ 21
Latham, Wexley & Pursell, Training Managers to Minimize Rating Errors in the Observation of Behavior, 60 J. Applied Psychology 550 (1975) ____ 16
E. Levine, Everything You Ever Wanted to Know about Job Analysis (1983) ____ 13
Locher & Teel, Performance Appraisal—A Survey of Current Practices, 56 Personnel J. 245 (1977) ____ 17
J. Matarazzo & A. Weins, The Interview: Research on its Anatomy and Structures (1972) ____ 15
E. McCormick, Job Analysis: Methods and Applications (1979) ____ 13
Messick, Test Validity and the Ethics of Assessment, 35 Am. Psychologist 1012 (1980) ____ 11
Owens, Background Data in Handbook of Industrial and Organizational Psychology (M. Dunnette ed. 1976) ____ 21
Owens & Schoenfeldt, Toward a Classification of Persons, 46 J. Applied Psychology 329 (1979) ____ 22
Pace & Schoenfeldt, Legal Concerns in the Use of Weighted Applications, 30 Personnel Psychology 159 (1977) ____ 22
Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982) ____ 22
Rice, Spotlight on Employee Performance, 9 US Air 53 (August 1987) ____ 19
Schmidt & Johnson, Effect of Race on Peer Ratings in an Industrial Setting, 57 J. Applied Psychology 237 (1973) ____ 18
Schmitt, Social and Situational Determinants of Interview Decisions: Implications for the Employment Interview, 29 Personnel Psychology 79 (1976) ____ 14, 15, 23
B. Schneider & N. Schmitt, Staffing Organizations (2d ed. 1986) ____
Schoenfeldt, Utilization of Manpower: Development and Evaluation of Assessment-Classification Model for Matching Individuals with Jobs, 59 J. Applied Psychology 583 (1974) ____ 22
Society for Industrial and Organizational Psychology, Principles for the Validation and Use of Personnel Selection Procedures (1987) ____ passim
Tenopyr, Content-Construct Confusion, 30 Personnel Psychology 47 (1977) ____ 11
Thompson & Thompson, Court Standards for Job Analysis in Test Validation, 35 Personnel Psychology 865 (1982) ____ 13, 23
Ulrich & Trumbo, The Selection Interview Since 1949, 63 Psychology Bull. 100 (1965) ____ 14, 15
Waintroob, The Developing Law of Equal Employment Opportunity at the White Collar and Professional Level, 21 Wm. & Mary L. Rev. 45 (1979) ____ 5
BRIEF FOR AMICUS CURIAE
AMERICAN PSYCHOLOGICAL ASSOCIATION
IN SUPPORT OF PETITIONER
INTEREST OF AMICUS CURIAE
The American Psychological Association (“APA”) is a nonprofit, scientific, and professional organization with more than 65,000 members. It has been the major association of psychologists since 1892, and includes the vast majority of psychologists holding doctoral degrees from accredited universities in this country. Among APA’s major functions are the promotion of psychological research, the dissemination of information regarding human psychological behavior, the promulgation of standards governing scientific and professional practice, including assessment, and, as reflected in its Bylaws, the “advance[ment] of psychology as a science and profession.” A substantial number of APA’s members are concerned with the development and validation of assessment devices for personnel selection in the employment context, including the more than 2500 members who belong to the APA’s Division of Industrial and Organizational Psychology and the 1500 members who belong to its Division of Evaluation and Measurement.

The APA has participated as amicus in many cases in this Court involving social science issues, including Kentucky v. Stincer, 107 S. Ct. 2658 (1987) (effects on child victims of sex abuse of testifying in the presence of their alleged abusers); Colorado v. Connelly, 107 S. Ct. 515 (1986) (behavioral effects of command hallucinations); and Lockhart v. McCree, 106 S. Ct. 1758 (1986) (“conviction-proneness” of “death-qualified” juries). APA contributes amicus briefs only where it has special knowledge to share with the Court. APA regards this as one of those cases. In this instance, APA wishes to inform this Court of the state of current scientific thought regarding validation of personnel assessment devices, including subjective selection criteria and procedures such as those used by respondent in this case.

Petitioner and respondent have consented to the filing of this amicus brief. Their letters of consent are on file with the Clerk of the Court.
INTRODUCTION AND SUMMARY OF ARGUMENT
APA addresses in this brief an essential and inherent issue in this case. Amicus will leave it to the parties to argue whether disparate impact analysis under Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., is properly applied to review the legality of subjective assessment devices for the hiring and promotion of employees. However, insofar as a negative answer is grounded in the assumption that subjective assessment devices are not amenable to psychometric scrutiny in the same way that ability tests are, such an assumption is contrary to fundamental and generally accepted scientific principles of measurement. The most frequently articulated reason for limiting disparate impact analysis to objective criteria and procedures—that only objective criteria and procedures yield sufficient statistical data to permit scientific validation—is not supported by the relevant social science literature. Indeed, the view that only objective selection criteria and procedures can be clearly identified, applied equally to all applicants, and statistically evaluated has been discredited by the extensive work of industrial psychologists and other assessment specialists. Subjective selection devices can be scientifically validated for the assessment of individuals for hiring, promotion, or other selection decisions in the employment context. The choice of analyses under Title VII, therefore, should not turn on whether the challenged employment practices are based on objective or subjective evaluations of applicants.

The APA’s Standards for Educational and Psychological Testing (1985) [hereinafter Standards] provide a framework for the evaluation and validation of testing and other assessment devices, including such subjective devices as interviews, behavioral observations, and rating scales. The Standards are consistent with the Principles for the Validation and Use of Personnel Selection Procedures (1987) [hereinafter Principles] published by the Society for Industrial and Organizational Psychology.1 Furthermore, the Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. §§ 1607.1, et seq. [hereinafter Uniform Guidelines] were explicitly intended to be consistent with the Standards. Id. at § 1607.5(C). Such technical standards were clearly contemplated by the drafters of Title VII when they referred to the use of “professionally developed” assessment devices by employers. 42 U.S.C. § 2000e-2(h). When examined in light of the Standards, Principles, and Uniform Guidelines, it is clear that the procedures respondent used in this case to evaluate petitioner for promotion were not shown to be scientifically valid, i.e., appropriate, meaningful, or useful for the inferences drawn from them. See Part I(B). More importantly, however, the Standards and Principles provide ample guidelines for how those procedures, notwithstanding their subjective nature, could have been validated.

The recognition that subjective assessment procedures may be validated is critical to the effectuation of the underlying goals of Title VII. Subjective procedures, like those of a more objective nature, should be required to be validated for specific jobs in any case where employers use those procedures as a defense to a prima facie case of discrimination, whether the claim is analyzed under a disparate treatment or a disparate effect theory. Only under such a rule will employers be inhibited from making personnel decisions based on unlawful and irrelevant factors. Should the Court in this case implicitly approve the use of unvalidated selection procedures as a defense to any Title VII claim, employers will have greater incentive to resort more readily to subjective assessment devices, which would facilitate covert and discriminatory decisionmaking and severely undermine the right of equal employment opportunities for those classes of persons otherwise protected by Title VII.

1 The Society is an integral component of the amicus and is also known as Division 14 of the APA (Division of Industrial and Organizational Psychology). Until adopted by amicus as a whole, the Principles are the formal policy only of Division 14. However, they are “intended to represent the consensus of professional knowledge and thought as it exists today. . . .” Id. at 3.
ARGUMENT
I. BECAUSE SUBJECTIVE ASSESSMENT DEVICES
CAN, AND SHOULD, BE SCIENTIFICALLY VALIDATED,
THE USE OF SUBJECTIVE SELECTION
CRITERIA AND PROCEDURES BY EMPLOYERS
SHOULD NOT PRECLUDE REVIEW UNDER ANY
TITLE VII THEORY.
Amicus relies on petitioner and other supporting amici to establish the applicability of disparate impact analysis under Title VII to subjective selection criteria for hiring and promotions. Amicus wishes to share with the Court its unique knowledge of testing and other assessment devices, and the validation of such devices, so that the Court’s decision regarding the proper standard of analysis will be informed by relevant social science data. However, should the Court ultimately determine that disparate treatment analysis is the appropriate standard of review for subjective assessment devices in the employment context, amicus believes that such a holding in no way obviates the principle that such devices can and should be validated for the particular job in question.
A. Professional Standards Concerning the Technical
Adequacy of Selection Devices are Applicable to the
Subjective Methods Used by Respondent.
At issue in this case are three selection devices by which petitioner was evaluated for promotion by respondent’s agents—interviews, supervisor’s ratings, and experience requirements. Respondent characterizes these as subjective assessment procedures, in contrast to such procedures as multiple choice standardized paper-and-pencil tests which are typically classified as objective measures. Although in the context of personnel assessment procedures the term “subjective” is not easily defined, the concept has been used to refer variously to procedures in which “judgment or discretion [is exercised] on the part of the evaluator” 2 or which lack any “neutral” factors,3 to assessment devices of a “nonmechanical, operator-dependent” nature,4 or to appraisals not based on “ ‘hard data,’ such as production records [or] attendance.” 5 Most simply, “Measures that require the statement of opinion, beliefs, or judgments are considered subjective.” 6 Regardless of the particular definition, they are consistent with the widely-held view that subjective devices used by employers for hiring or promotion purposes are inherently less scientific, less quantifiable, less reliable, and less facially neutral than their objective counterparts.7 For this reason, it has been asserted that subjective selection methods and criteria are not susceptible to scientific “validation” or any other psychometric scrutiny. See, e.g., Brief of United States, supra note 4, at 15, and references therein.

2 Waintroob, The Developing Law of Equal Employment Opportunity at the White Collar and Professional Level, 21 Wm. & Mary L. Rev. 45, 48 (1979).

3 3 A. Larson & L. Larson, Employment Discrimination § 15-87 (1986).

4 Brief of United States as Amicus Curiae on Petition for Writ of Certiorari at 13 [hereinafter Brief of United States].

5 Kleiman & Durham, Performance Appraisal, Promotion and the Courts: A Critical Review, 34 Personnel Psychology 103, 114 (1981) [hereinafter Kleiman & Durham].

6 I. Goldstein, Training in Organizations 136 (2d ed. 1986) [hereinafter Goldstein]. “For example, rating scales are subjective measures, while measures of absenteeism are more objective. (However, supervisors’ ratings of the absenteeism level of employees could turn that measure into a subjective criterion).” Id.

7 “[S]ubjective measures are affected by the difficulties that one individual has in rating another without bias.” Id. at 136-137. These difficulties, however, are not due to any inherent characteristics of subjective measures but to the failure of employers to apply standard principles of test construction to subjective measures. It is not intrinsic to such devices to be unquantifiable. When properly developed, they are amenable to scoring and objective analysis. For example:
    rating scales have been the most commonly employed measures in applied settings. . . . [One] reason is that it is simple to throw together a rating scale with a few traits . . . and delude yourself into believing that you have a useful measure of performance. Professionals . . . know that the steps in the process are very similar for objective and subjective measures and that shortcuts do not work in either case.
Id. at 137 (emphasis added).

This view is fundamentally at odds with the universal judgment of those most experienced and knowledgeable in techniques of measurement and evaluation generally, and in the appraisal of employee performance specifically. The most authoritative source for the standards to be applied to determine the technical adequacy of assessment devices, the appropriateness of specific applications of these devices, and the reasonableness of inferences based on the results of these devices are the Standards for Educational and Psychological Testing (1985), a joint publication of the amicus APA, the American Educational Research Association (“AERA”), and the National Council on Measurement in Education (“NCME”).8

8 The 1985 Standards represent the most modern expression of professional and scientific thinking concerning technical advances in psychological assessment. Its predecessor documents are, APA, Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954); APA, Standards for Educational and Psychological Tests and Manuals (1966); and AERA, APA, NCME Standards for Educational and Psychological Tests (1974).
The 1966 and 1974 forerunners of the current Standards have been cited with approval by this Court and in a variety of lower federal cases. See, e.g., Washington v. Davis, 426 U.S. 227, 247 n.13 (1976); Debra P. v. Turlington, 644 F.2d 397, 405 n.10 (5th Cir. 1981); Douglas v. Hampton, 512 F.2d 976, 984-986 (D.C. Cir. 1975) (see especially id. at 984 n.59 where the court called the 1966 edition “the universally recognized professional authority”); Harless v. Duck, 14 FEP Cases 1616, 1624 n.5 (N.D. Ohio 1977) (stating that the “courts have almost unanimously agreed that the [1974] APA guidelines provide persuasive standards for evaluating claims of job relatedness”).

Consistent with the Standards, the Division 14 Principles for the Validation and Use of Personnel Selection Procedures (1987) apply the more general guidelines of the Standards to the specific problems of making decisions in the context of employee selection, placement, and promotion and provide, inter alia, “principles for the application and use of valid selection procedures, and information that may be helpful to personnel managers and others responsible for authorizing or implementing validation efforts.” Principles at 2.9

9 The 1987 Principles are a revision of the original 1980 version. “The purposes of the revision are to bring the Principles up to date scientifically, to make them consistent with the Standards, and to reduce possible ambiguities regarding good practice in the use of selection procedures in making employment decisions.” Principles at 1.

Of relevance as well are the Uniform Guidelines. “The provisions of these guidelines . . . are intended to be consistent with generally accepted professional standards for evaluating standardized tests and other selection procedures, such as those described in the Standards . . . prepared by a joint committee of the” APA, AERA, and NCME. 29 C.F.R. § 1607.5(C). They “incorporate a single set of principles which are designed to assist employers . . . to comply with requirements of Federal law prohibiting employment practices which discriminate on the grounds of race. . . .” Id. at § 1607.1(B).

The Standards, Principles, and Uniform Guidelines apply to a broad range of selection procedures, not merely to the traditional objective measures commonly denominated as tests. Although the full title of the Standards uses the word “testing,” the term is generic and refers to “standardized ability . . . instruments, diagnostic and evaluative devices, interest inventories, personality inventories, and projective instruments.” Standards at 3. The term also includes samples of observable behavior “relevant to . . . employment decisionmaking.” Id. at 4. Amicus, AERA, and NCME unequivocally view the Standards as useful and applicable “to the entire range of assessment techniques.” Id.10

Similarly, the Division 14 Principles are not applicable solely to standardized paper-and-pencil tests. They are explicitly intended to aid employers who make hiring and promotion decisions to choose, select, develop, evaluate, and use all personnel selection devices, including “performance tests, . . . personality [and] interest inventories, . . . biographical data forms or scored application blanks, interviews, . . . experience requirements, . . . appraisals of job performance, . . . [and] estimates of advancement potential.” Principles at 1.

Finally, the Uniform Guidelines “provide a framework for determining” not only “the proper use of tests” but “other selection procedures” as well. 29 C.F.R. § 1607.1(B). They plainly “apply to tests and other selection procedures which are used as a basis for any employment decision[,]” including “hiring and promotion.” Id. at § 1607.2(B).11

In sum, then, it is the universal professional judgment that the assessment devices used by respondent in deciding not to promote the petitioner can properly be scrutinized under the applicable scientific and professional standards, principles, and guidelines concerning the evaluation of the psychometric soundness of such devices.12 In the present case, those devices were not subjected to any such scrutiny to determine if they were valid for the purpose for which they were used.

10 Such “instruments . . . are called tests here to indicate that the standards also apply to these instruments.” Id. at 4-5.

11 See also, id., at § 1607.15(A)(1) (referring to selection procedures “either standardized or not standardized”).

12 See, e.g., B. Schneider & N. Schmitt, Staffing Organizations 14 (2d ed. 1986) [hereinafter Schneider & Schmitt]:
    Typically we think of a test as an examination of some kind responded to with paper and pencil. . . . In fact, industrial psychologists and the Uniform Guidelines on Employee Selection Procedures (1978) have defined the word “test” in much broader terms and the courts have adopted this definition. In brief, a test is defined as any form of collecting information when that information is used as a basis for making an employment decision. So, interviews are tests, as are application blanks, . . . performance appraisals used as a basis for making promotions (which, obviously are selection decisions), and any other kind of information used for making employment decisions.
Accord Friedman & Williams, Current Use of Tests for Employment in 2 Ability Testing 99-100 (A. Wigdor & W. Garner eds. 1982) published by The National Academy of Sciences (acknowledging that the definition of selection procedures has extended to the full range of assessment devices, including interviews and are to be scrutinized according to the same guidelines used to evaluate standardized tests). See also Brito v. Zia, 478 F.2d 1200 (10th Cir. 1973) (holding that subjective observations by employer of minority employees had to be supported by empirical evidence of validity).

B. There are Generally Accepted Strategies for Establishing
the Validity of Subjective Methods of Employee Selection.

There are many elements involved in the construction, development, and evaluation of an assessment instrument. Industrial psychologists and others who create a test or other selection procedure must choose the domain to be assessed, construct the items to which test takers will respond or select the behaviors to be observed, develop scoring scales and norms so that results can be interpreted, prepare manuals, and most importantly, ensure that the instrument is psychometrically sound. See Standards at 9-37. The psychometric soundness of an instrument depends primarily on its being reliable 13 and valid. Although both qualities are essential, there is no doubt that “validity is the most important consideration in test evaluation.” Standards at 9.14
13 “Reliability refers to the degree to which test scores are free
from errors in measurement.” Standards at 19. The more reli
able a test score the more consistent, dependable, or repeatable it
will be. See Principles at 39.
14 “Undoubtedly the most important question to be asked about
any psychological test concerns its validity . . . .” A. Anastasi,
10
Validation refers to the process by which psychologists
ascertain the degree to which certain inferences from a
particular assessment device are appropriate, meaningful,
or useful. See Standards at 9; Principles at 4. A test
is valid if the proposed interpretation of scores proves to
be sound and relevant. See Cronbach, supra note 14, at
125. Validity “concerns what the test measures and how-
well it does so. It tells us what can be inferred from test
scores.” Anastasi, supra note 14, at 131. When a selec
tion procedure or device is said to be “validated,” psy
chologists understand that the predictions inferred from
the scored result of the procedure or device have a high
rate of accuracy. For validation to be meaningful, it
must predict performance of a particular task or set of
tasks or other job relevant behavior of particular concern
to the employer. Thus, selection procedures or devices are
validated for a particular job when they have been dem
onstrated scientifically to make reliable and meaningful
distinctions between individuals on the basis of their abil
ity to perform particular tasks w-ith competence or to
function successfully in a particular job.™ “The simple
psychometi ic fact that test validity must be ascertained
for specific uses of the test has long been familiar
An invalid test or one that includes elements not related
to the job under consideration may unfairly exclude
minority group members who could have performed the
job satisfactorily.” Anastasi, supra note 14, at 432.
Validity “is a unitary concept.” Standards at 9.
However, within the unifying theme of determining the
P sychological Testing 27 (5th ed. 1982) [hereinafter Anastasi]
Obviously, no aspect of a test is more important than valid
ity . . . . L. Cronbach. Essentials of Psychological Testing
1-5 (4th ed. 1984) [hereinafter Cronbach], The texts by Anastasi
and Cronbach are considered the most authoritative basic treatises
on the topic of psychological measurement.
15 “Only as abbreviation is it legitimate to speak of ‘the validity
of a test’; a test relevant to one decision may have no value for
another. So users must ask, ‘How valid is this test for the decision
made?’ or ‘How valid are the interpretations I am making of the
scores?’” Cronbach, supra note 14, at 125.
relevance or interpretability of scored results, “[t]he
validity of any inference can be determined in a variety
of ways.” Principles at 4. “[T]he various means of
accumulating validity evidence have been grouped into
categories called content-related, criterion-related, and
construct-related evidence of validity.” Standards at 9.16
In the context of personnel selection procedures, content
validation involves a determination that the assessment
instrument accurately reflects a representative sample of
important aspects of job performance or job-required
knowledge. See Principles at 19.17 Criterion-related
validation involves a determination that the assessment
instrument is predictive of, or significantly correlated
with, important elements of job performance or work
behavior. See Principles at 6.18 Construct validation
16 “[I]nsofar as the courts have interpreted the test standards
and . . . the Uniform Guidelines . . . to mean that content, criterion,
and construct validity are distinct forms of validation, those
interpretations are oversimplified, if not erroneous.” Bersoff, Testing
and the Law, 36 Am. Psychologist 1047, 1051 (1981). These
three approaches should be viewed as subsets within a unifying
and common framework. See Guion, On Trinitarian Doctrines of Validity,
11 Prof. Psychology 385, 386 (1980); Messick, Test Validity
and the Ethics of Assessment, 35 Am. Psychologist 1012
(1980); Tenopyr, Content-Construct Confusion, 30 Personnel
Psychology 47 (1977). The three approaches may be discussed
“separately only to acknowledge traditional presentations and avoid
an abrupt departure from tradition.” Principles at 4.
17 “In general, content-related evidence demonstrates the degree
to which the sample of items, tasks, or questions on a test are
representative of some defined universe or domain of content.”
Standards at 10.
18 “Criterion-related evidence demonstrates that test scores are
systematically related to one or more outcome criteria . . . .”
Standards at 11.
Two designs for obtaining criteria-related evidence—predic
tive and concurrent—can be distinguished. A predictive study
obtains information about the accuracy with which early test
data can be used to estimate criterion scores that will be obtained
in the future. A concurrent study serves the same purposes,
but obtains prediction and criterion information simultaneously.
Id.
involves a determination that the assessment instrument
accurately measures the degree to which individuals possess
identifiable characteristics which have been determined
to be important for successful job performance.
See Principles at 25.19
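By way of illustration only, the criterion-related strategy can be sketched as a correlation between scores on a selection device and a later measure of job performance. The sketch below is not drawn from the Standards or the Principles; the function name, the variable names, and the data are hypothetical, and a real validation study would involve far larger samples and professional judgment about the criterion measure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data (assumption): scores on a selection device at hiring and
# supervisor performance ratings collected later, as in a predictive design.
selection_scores = [2, 3, 3, 4, 5, 5]
later_performance = [1, 3, 2, 4, 4, 5]
validity_coefficient = pearson_r(selection_scores, later_performance)
```

A coefficient near zero would suggest that the device tells the employer little about the criterion; a larger coefficient is only one piece of validity evidence and does not by itself establish job-relatedness.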
Each of the validation strategies requires the employer
to engage in an essential prerequisite activity. Through
a process known as job analysis, the employer must
clearly identify the most important components of successful
job performance. “Job analysis is essential to
the development of a content-oriented procedure or to the
justification of a construct important to job behavior.”
Principles at 5. “In some situations, the major purpose
of job analysis may be to provide information from
which criterion measures may be developed.” Id. at 6.
Satisfying this essential prerequisite requires an analysis
of the job in question, and a clear articulation of the
knowledge, skills and abilities (“KSAs”), or other personal
characteristics or behaviors the exhibition of which
determine proficiency at that job.20 These data can be
secured through judgments of job incumbents, their supervisors,
personnel specialists, the professional judgment
of job experts, and through training manuals,
19 “The evidence classed in the construct-related category focuses
primarily on the test score as a measure of the psychological category
of interest . . . . Such characteristics are referred to as constructs
because they are theoretical constructions about the nature
of human behavior.” Standards at 9.
This Court has summarized these approaches in Washington v.
Davis, 426 U.S. at 247 n.13, as it understood them to be described
in the 1966 version of the Standards.
20 Although a job analysis is crucial to all validation strategies
it has been emphasized more clearly in the context of content-oriented
validity. See, e.g., “Content validation should be based on
a thorough and explicit definition of the content domain of interest.
For job selection, classification, and promotion, the characterization
of the domain should be based on job analysis.” Standard 10.4,
Standards at 60-61. But it is required for criterion-related validity,
see Standard 10.1, Standards at 60, and construct-related
validity, see Standard 10.8, Standards at 61, as well.
job descriptions, and other written information. See
Principles at 19-20; Schneider & Schmitt, supra note 12,
at 47; Thompson & Thompson, Court Standards for Job
Analysis in Test Validation, 35 Personnel Psychology
865 (1982). In addition, the relative importance of these
KSAs must be determined. Finally, a close link between
the assessment device and the identified job content or
behavioral characteristic (construct) must be established.
Standard 10.5, Standards at 61; see also Standard 10.8.
Id.21
In sum, then, the use of particular selection procedures
by an employer reflects his or her implicit assumption
that some important aspect of behavior on the job can be
predicted from an individual’s scores or performance on
the chosen selection procedure. The critical factor underlying
this assumption is the accumulation of evidence or
data to support an inference of the chosen procedure’s
job-relatedness. This can be accomplished only through
the various strategies of validation.
Validation is no less applicable to subjective assessment
devices than to objective ones. In both cases, accurate
predictors of job performance are essential to assist
employers in selecting or promoting individuals who
will best serve their needs, as well as to provide a method
of personnel selection that inhibits consideration of
non-job-related factors such as an individual’s race. Indeed,
several commentators have noted the emphasis placed by
many courts and employers on objectivity, at the expense
of validity, i.e., requiring the use of certain “neutral
appearing” objective tests to measure job performance,
even where the validity of those criteria is clearly
questionable.22 Because of the role that validation plays in
21 Several books summarize job analysis procedures and discuss
their relative utility in various situations. See, e.g., S. Gael, Job
Analysis: A Guide to Assessing Work Activities (1983); E.
Levine, Everything You Ever Wanted to Know about Job
Analysis (1983); E. McCormick, Job Analysis: Methods and
Applications (1979).
22 See, e.g., Kleiman & Durham, supra note 5, at 117-118.
enhancing the quality of selection procedures and reducing
the potential for discrimination, an employer should
be required to provide evidence of the validity of those
procedures whether the employer chooses to use ones
labeled as subjective or objective.
C. To Reduce Sources of Bias the Validity of Each of
the Selection Devices Used by Respondent Must
and Can Be Established by Generally Accepted and
Accessible Validation Strategies.
Industrial psychologists routinely use the three strategies
for validating assessment devices described in Part
I(B) to validate both objective devices such as standardized
ability tests and interest inventories, and purely subjective
or multi-component devices such as interviews,
performance appraisal ratings, constructed performance
tasks, nonscored experience and biographical data intake
sheets, and structured behavioral sample tests. See generally,
e.g., Goldstein, supra note 6; Schneider & Schmitt,
supra note 12.
1. The interview.
The employment interview, the technique most heavily
relied upon by respondent in this case, is probably more
widely used than any other selection tool. See H. Heneman,
D. Schwab, J. Fossum, & L. Dyer, Personnel/Human
Resource Management (1980); Ulrich &
Trumbo, The Selection Interview Since 1949, 63 Psychology
Bull. 100 (1965). However, because most employers
are unaware of research which has determined
which variables are reliably, validly, and uniquely assessed
in the selection interview, see Schmitt, Social and
Situational Determinants of Interview Decisions: Implications
for the Employment Interview, 29 Personnel
Psychology 79, 97 (1976), the employment interview is
typically subject to interview bias of various types. See
Arvey, Unfair Discrimination in the Employment Interview:
Legal and Psychological Aspects, 86 Psychology
Bull. 736 (1979). The most commonly known bias is
“halo effect,” where an interviewer may be unduly influenced
by a single trait which colors his/her judgment
of the employee’s other traits. See Anastasi, supra note
14, at 612. A related problem is “stereotyping” in
which an employee is judged “based on his or her
group membership [e.g., race] rather than on the basis
of his or her unique characteristics.” Schneider & Schmitt,
supra note 12, at 388-389. Another concern is the “similar-to-me
phenomenon” in which the interviewer adopts
the attitude “I am wonderful and I have the following
attitudes and opinions, so if candidates I interview have
the same attitudes and opinions, they must also be wonderful.”
Id. at 389. “When combined with stereotyping,
the similar-to-me phenomenon can be a potent determinant
of interviewer decision-making.” Id.
Interviews can afford an opportunity for direct observation
of samples of behavior, albeit limited, manifested
during the interview and serve to evoke life-history
data, both of which can be important predictors of future
performance, if interviews are developed and conducted
using generally accepted standards. See Anastasi,
supra note 14, at 610.23 Several recent studies, reviewed
by noted scholars, show that interview judgments can be
valid indicators of subsequent job performance. See
23 A variety of available sources discuss methods, applications,
and effectiveness of interviewing, and research on the interviewing
process. See, e.g., W. Bingham, B. Moore & J. Gustad,
How to Interview (4th ed. 1959); R. Fear, The Evaluation
Interview (2d ed. 1973); J. Matarazzo & A. Weins, The Interview:
Research on its Anatomy and Structure (1972); Arvey,
Unfair Discrimination in the Employment Interview: Legal and
Psychological Aspects, 86 Psychology Bull. 736 (1979); Dunnette
& Borman, Personnel Selection and Classification Systems, 40 Ann.
Rev. Psychology 477 (1979); Grant & Bray, Contributions of the
Interview to Assessment of Management Personnel, 53 J. Applied
Psychology 24 (1969); Schmitt, Social and Situational Determinants
of Interview Decisions: Implications for the Employment
Interview, 29 Personnel Psychology 79 (1976); Ulrich &
Trumbo, The Selection Interview Since 1949, 63 Psychology Bull.
100 (1965).
Arvey & Campion, The Employment Interview: A Summary
and Review of Recent Research, 35 Personnel
Psychology 281 (1982). These sources show that interviews
can be created that are valid and nondiscriminatory
if interview questions are carefully linked to job
analysis and performance criterion data.
Interview validity is not alone sufficient, however. The
validity of the interviewer must also be established. An
“interview requires skill in data gathering and in data
interpreting. An interview may lead to wrong decisions
because important data were not elicited or because given
data were inadequately or incorrectly interpreted.”
Anastasi, supra note 14, at 610-611. In this regard, interviewer
training is important. A structured interview
guide will improve interviewer reliability and assist in
removing any bias especially if the training occurs with
applicants of gender and/or race different than that of
the interviewer. With important interviews used to determine
hiring or promotion, it is very helpful if applicants
are seen by more than one interviewer, although if
records are kept it is possible to identify those interviewers
whose decisions are most reliable and valid and rely
on those interviewers singly to make judgments. See
Schneider & Schmitt, supra note 12, at 390-394; Hakel,
Employment Interviewing, in Personnel Management
(K. Rowland & G. Ferris eds. 1982).24
24 As one example, researchers have described and tested an
innovative but relatively simple and valid employment interview.
Critical incidents, i.e., reports by job incumbents or supervisors
of situations in which especially effective or ineffective behavior is
displayed, were converted into situational interview questions. The
interviewer posed these situations to job applicants and asked them
how they would behave. Each answer was rated independently by
two or more interviewers on a five-point scale with end points on
the scale provided by job experts to facilitate objective scoring.
The process validly predicted future job performance for both
women and blacks. See Latham, Saari, Pursell & Campion, The
Situational Interview, 65 J. Applied Psychology 422 (1980).
In sum, the use of employment interviews should be
“preceded by a thorough analysis of the target job, the
development of a structured set of questions based on the
job analysis, and the development of behaviorally specific
rating instruments by which to evaluate applicants.”
Schneider & Schmitt, supra note 12, at 395. The assessment
of employees “should be maximally dependent on
their personal characteristics and minimally dependent
on who made the assessment . . . . Where non-test predictors
like interviewer judgments are used, the [employer]
should develop procedures that will minimize
error resulting from differences between judges.”
Principles at 12.
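One way an employer might begin to examine “differences between judges” is to compare the ratings two interviewers give the same applicants. The sketch below computes Cohen's kappa, a standard chance-corrected agreement index; it is offered only as an illustration, it treats the scale points as unordered categories (weighted variants exist for ordered scales), and the ratings shown are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters scoring the same applicants."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of applicants on whom the two raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's own score frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return (observed - expected) / (1 - expected)

# Hypothetical five-point ratings (assumption) given by two interviewers
# to the same six applicants.
interviewer_1 = [1, 2, 3, 3, 4, 5]
interviewer_2 = [1, 2, 3, 4, 4, 5]
agreement = cohens_kappa(interviewer_1, interviewer_2)
```

Low agreement would indicate that an applicant's score depends heavily on who made the assessment, which is the condition the Principles caution against.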
2. Rating scales and other performance appraisals.
Performance appraisal devices such as rating scales
are widely used in employment settings, and were used
by respondent in this case.25 “Rating scales differ from
naturalistic observations in that data are accumulated
casually and informally; they also involve interpretation
and judgment, rather than simple recording of observations.”
Anastasi, supra note 14, at 611. In contrast to
interviews, however, “they typically cover a longer observation
period and the information is obtained under
more realistic conditions.” Id.
Like interviews, rating scales are subject to a variety
of sources of contamination or bias, including: (1) opportunity
bias, which occurs if raters do not have the
opportunity to observe the employee in situations in
which the behavior to be rated could be manifested, but
have the opportunity to do so with a competing employee;
(2) halo effect, a tendency on the part of raters to be
unduly influenced by a single favorable or unfavorable
25 In one survey, 89% of the companies studied reported using
performance appraisals on a regular basis. Locher & Teel, Performance
Appraisal—A Survey of Current Practices, 56 Personnel
J. 245 (1977). Of all performance appraisal techniques, the
rating scale is by far the most ubiquitous. See Landy & Farr,
Performance Rating, 87 Psychological Bull. 72, 73 (1980)
[hereinafter Performance Rating].
trait, which colors their judgment on the individual’s
other traits; (3) error of central tendency, or the tendency
to place persons in the middle of the scale and to
avoid extremes; and (4) leniency error, or the reluctance
of many raters to assign unfavorable ratings.
Both latter errors reduce the effective width of the rating
scale and make them less useful in distinguishing among
individuals. See Anastasi, supra note 14, at 611-612;
Cronbach, supra note 14, at 509-511; Schneider &
Schmitt, supra note 12, at 90-92; Goldstein, supra note 6,
at 255.
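The loss of “effective width” caused by leniency and central tendency can be made concrete with a simple summary of how a rater actually uses the scale. The following sketch is illustrative only; the function, its field names, and the ratings are hypothetical and are not taken from the cited authorities.

```python
from statistics import mean, pstdev

def rating_profile(ratings, scale_min, scale_max):
    """Summarize how much of a rating scale one rater actually uses."""
    midpoint = (scale_min + scale_max) / 2
    return {
        # Well above zero suggests leniency; well below suggests severity.
        "mean_minus_midpoint": mean(ratings) - midpoint,
        # A small spread around the midpoint suggests central tendency.
        "spread": pstdev(ratings),
        # Fraction of the scale's full width covered by the ratings given.
        "range_used": (max(ratings) - min(ratings)) / (scale_max - scale_min),
    }

# Hypothetical ratings (assumption) on a 1-to-5 scale from two supervisors.
lenient_rater = rating_profile([4, 5, 5, 4, 5, 5], 1, 5)
central_rater = rating_profile([3, 3, 3, 3, 2, 3], 1, 5)
```

In this hypothetical, both supervisors use only a quarter of the scale, so neither set of ratings does much to distinguish among the individuals rated.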
Most troubling in the context of this case is that scores
on rating scales may be affected by the race of the rater
and ratee. White raters have been found to assign significantly
higher ratings to white ratees than black
ratees. These findings were noted in a comprehensive
review of 74 studies involving 17,159 ratees in which
the rater was white and 14 studies involving 2,420
ratees in which the rater was black. Race effects were
more pronounced in real-life settings than in laboratory
settings and more likely when, as in this case, the
proportion of blacks in the workforce was small. See
Kraiger & Ford, A Meta-analysis of Ratee Race Effects
in Performance Ratings, 69 J. Applied Psychology 56
(1985).26
Psychologists have published a great deal of accessible
literature describing rating scale formats which are useful
in rating employees and have been critiqued in the
26 These findings are not universal nor do they imply that performance
appraisal systems are inherently discriminatory. One
study, for example, found no race of rater effect in an industrial
setting which was highly racially integrated and where participants
in the study had been exposed to human relations training.
See Schmidt & Johnson, Effect of Race on Peer Ratings in an
Industrial Setting, 57 J. Applied Psychology 237 (1973). These
mitigating factors are not true with regard to respondent. For a
comprehensive review of the effects of rater and ratee characteristics
and the interaction of the two see Performance Rating, supra
note 25, at 74-82.
context of Title VII requirements.27 Many of these formats
would be significant improvements over the system
used by respondent.28 Regardless of format, what does
produce a superior scale is that it is the “result of psychometric
rigor in development and of some level of participation
of individuals representative of those who will
eventually use the scales to make ratings . . . .” Performance
Rating, supra note 25, at 85.
An essential aspect of the requirement for psychometric
rigor is the job analysis. “The development of rating
procedures should ordinarily be guided by job analyses
27 E.g., Cascio & Bernardin, Implications of Performance Appraisal
Litigation for Personnel Decisions, 34 Personnel Psychology
211 (1981); Distefano, Pryer, & Erffmeyer, Application
of Content Validity Methods to the Development of a Job-Related
Performance Rating Criterion, 36 Personnel Psychology 621
(1983); Feild & Holley, The Relationship of Performance Appraisal
System Characteristics to Verdicts in Select Employment
Discrimination Cases, 25 Acad. Mgmt J. 392 (1982); Kleiman &
Durham, supra note 5; Performance Rating, supra note 25. The
dissemination of this information is so widespread that it now
appears in popular literature designed for lay readers. See Rice,
Spotlight on Employee Performance, 9 US Air 53 (August 1987).
28 Two of the most popular rating formats are the graphic rating
scale and the behaviorally anchored rating scale (“BARS”).
In the graphic rating scale’s most usual format, several dimensions
to be rated are listed vertically and raters are then asked to make
rating decisions along a horizontal 5 to 9 point scale. For example,
if the dimension to be rated is “accuracy,” the scale may use a
numerical or one-word verbal rating, e.g., 1 to 5 or “high” to “low”;
preferably, it may use a range of descriptions, e.g., at one end of
the scale, would be “makes too many errors”; at the other end,
“almost never makes mistakes.” BARS uses dimensions derived by
raters who would actually use the scale with different points on
each dimension anchored by statements describing actual job behavior
which would illustrate specific levels of performance, e.g.,
using “accuracy” in rating a bank teller, the statements could include
a range from “makes frequent errors in totalling accounts
at end of day” to “errors in totalling accounts are consistently
rare.” See generally Schneider & Schmitt, supra note 12, at 101-106.
Neither scale format seems more useful than the other in
practice. See Performance Rating, supra note 25, at 85.
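The difference between an anchored dimension and an undefined one can be sketched as a small data structure. The class and its method below are hypothetical illustrations, not a published instrument; the anchor statements for “accuracy” are the bank teller examples given above, and the five-point range is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class AnchoredDimension:
    """One rating dimension with behavioral statements tied to scale points."""
    name: str
    scale_min: int
    scale_max: int
    anchors: dict = field(default_factory=dict)  # scale point -> behavior statement

    def has_endpoint_anchors(self):
        # A dimension is minimally anchored if both ends of the scale are defined.
        return self.scale_min in self.anchors and self.scale_max in self.anchors

# Anchored dimension using the teller statements quoted in the text.
accuracy = AnchoredDimension(
    name="accuracy",
    scale_min=1,
    scale_max=5,
    anchors={
        1: "makes frequent errors in totalling accounts at end of day",
        5: "errors in totalling accounts are consistently rare",
    },
)

# A dimension listed by name only, with no behavioral anchors at all.
unanchored = AnchoredDimension(name="personal appearance", scale_min=1, scale_max=5)
```

Here `accuracy.has_endpoint_anchors()` is true and `unanchored.has_endpoint_anchors()` is false, which mirrors the distinction between a scale with defined end points and one that leaves each rater to supply a private meaning.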
if, for example, raters are expected to evaluate several
different aspects of performance,” as in this case.
Principles at 10. Job analyses are also important if, as here,
appraisal of past performance is used as a predictor of
future performance. The use of ratings of past performance
in one job to make promotion decisions for another
position is only permissible if the ratings of past performance
are valid and the ratings of past performance
are related to future performance. The latter requires
a job analysis indicating the extent to which the two
jobs overlap. See Cascio & Bernardin, Implications of
Performance Appraisal Litigation for Personnel Decisions,
34 Personnel Psychology 211, 217 (1981).29
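As a rough illustration of what “the extent to which the two jobs overlap” might mean once job analyses exist for both positions, the sketch below compares two lists of required KSAs. The overlap index used (shared KSAs divided by all KSAs named for either job) is only one simple choice, assumed here for illustration, and the KSA lists are hypothetical rather than drawn from the record.

```python
def ksa_overlap(job_a, job_b):
    """Share of all KSAs named for either job that are required by both."""
    a, b = set(job_a), set(job_b)
    if not a and not b:
        raise ValueError("at least one job must list a KSA")
    return len(a & b) / len(a | b)

# Hypothetical job-analysis results (assumption) for the two positions.
teller = {"cash handling", "account balancing", "customer courtesy"}
teller_supervisor = {
    "account balancing",
    "customer courtesy",
    "scheduling staff",
    "training staff",
}
overlap = ksa_overlap(teller, teller_supervisor)
```

In this hypothetical the overlap is 0.4, so ratings of performance as a teller would speak to fewer than half of the KSAs identified across the two jobs, and would say nothing about the supervisory KSAs that appear only in the second list.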
The usefulness of a rating scale is also highly dependent
on the skill of the rater. “Valid ratings cannot be
made by someone who is either unfamiliar with the work
of the ratee or lacks the skills necessary to accurately
observe or rate the job behavior.” Kleiman & Durham,
supra note 5, at 113. In these respects, “those in immediate
contact with the subject give superior information,”
Cronbach, supra note 14, at 512 (in the case of employment
settings, first level supervisors), and those raters
who have undergone training show increased “reliability
and validity of ratings” and decreased errors in judgment.
Anastasi, supra note 14, at 612.30
29 See also Albemarle Paper Co. v. Moody, 422 U.S. 405, 431-433
(1975) where this Court, invoking the Uniform Guidelines (and
crediting APA’s Standards) condemned as “materially defective”
the employer’s validation study because its “subjective supervisorial
rankings” used standards which were vague and ambiguous and
failed to follow the Uniform Guidelines’ requirement for job
analyses.
30 See also Standard 1.13, Standards at 16: “When criteria are
composed of rater judgments, the degree of knowledge that raters
have concerning ratee performance should be reported. If possible,
the training and experience of the raters should be described”;
Principles at 10: “It may . . . be necessary to train raters in the
observation and evaluation of performance. Further, supervisors
should be expected to be familiar enough with the demands of the
job to evaluate overall performance.” The utility of rater training
In sum, rating scales will conform to legal and psychometric
requirements if the appraisal system is based on
a job analysis, contains clearly defined dimensions of job
performance rather than vague, global measures or abstract
trait names, is behaviorally based so that all ratings
can be supported by objective, observable evidence,
and if the raters are in the position to observe the behaviors
to be rated and are trained to reduce sources of
bias, contamination, or other rating errors.
3. Experience requirements.
The use of past experience to make judgments about
future performance, as in this case, is one aspect of a
recognized selection device called biographical inventory
technique or, more commonly, “biodata.” See Owens,
Background Data, in Handbook of Industrial and Organizational
Psychology (M. Dunnette ed. 1976). When
used properly, biographical inventories which include
prior experience are “especially appropriate for assessing
the qualifications of women and minority groups.”
Anastasi, supra note 14, at 616. To serve this purpose,
however, the biodata inventory must focus on “specific,
job-relevant past achievements, rather than on the passive
exposure implied by the customary education and
experience records.” Id. Respondent’s use of an experience
criterion falls far short of this standard.
In fact, use of biodata is relatively likely to produce
adverse impact if biodata items are not chosen carefully.
See G. Dreher & P. Sackett, Perspectives on Staffing
and Selection (1983). A number of comprehensive
in reducing rating errors and minimizing bias, as well as providing
employers with useful techniques for doing so, is demonstrated in,
e.g., Bernardin & Pence, Effects of Rater Training, 65 J. Applied
Psychology 60 (1980); Borman, Format and Training Effects on
Rating Accuracy and Rating Errors, 61 J. Applied Psychology
410 (1979); Ivancevich, Longitudinal Study of the Effects of Rater
Training on Psychometric Errors in Ratings, 64 J. Applied Psychology
502 (1979); Latham, Wexley & Pursell, Training Managers
to Minimize Rating Errors in the Observation of Behavior,
60 J. Applied Psychology 550 (1975). See generally Goldstein,
supra note 6, at 254-259.
validity studies have been conducted on the use of biodata
as a selection device, concluding that it has inconsistent
validity even in a form more comprehensive than
used by respondent. See, e.g., Reilly & Chao, Validity
and Fairness of Some Alternative Employee Selection
Procedures, 35 Personnel Psychology 1 (1982).31
However, as with all selection devices, the most reasonable
and justifiable approach in using biodata is to base
the choice of items on a well done job analysis, matching
the items to the knowledge, skill, and ability requirements
of the job description. See Pace & Schoenfeldt,
Legal Concerns in the Use of Weighted Applications, 30
Personnel Psychology 159 (1977). Past experience
alone, without the careful selection of both logically and
empirically justified life-history questions, is unacceptable
as a selection device.32
II. THE SUBJECTIVE SELECTION PROCEDURES
USED BY RESPONDENT FAIL TO MEET GENERALLY
ACCEPTED STANDARDS AND APPEAR
TO HAVE BEEN APPLIED WITHOUT ANY EVIDENCE
THAT THEY ARE VALID FOR THE INFERENCES
DRAWN FROM THEM.
When reviewed in the context of the principles and
studies described in Part I, it is clear that the assessment
31 Compare Reilly & Chao, Validity and Fairness of Some Alternative
Employee Selection Procedures, 35 Personnel Psychology
1 (1982) (satisfactory reliability) with Korman, The Prediction
of Managerial Performance: A Review, 21 Personnel Psychology
295 (1968) (finding biodata to have lower validity than other
predictors for predicting managerial performance).
32 Sophisticated research on the use of valid biodata is available
to inform the interested employer. See, e.g., Brush & Owens,
Implementation and Evaluation for an Assessment Classification
Model for Manpower Utilization, 32 Personnel Psychology 369
(1979); Schoenfeldt, Utilization of Manpower: Development and
Evaluation of Assessment-Classification Model for Matching Individuals
with Jobs, 59 J. Applied Psychology 583 (1974); Owens
& Schoenfeldt, Toward a Classification of Persons, 46 J. Applied
Psychology 329 (1979). See generally Schneider & Schmitt, supra
note 12, at 378-382.
devices used by respondent in this case to select employees,
including petitioner, for promotion to the position
of teller supervisor are distressingly inadequate.
They have been used to deny promotion to a member of
a protected class without any evidence that they were
developed, used, and applied in a way consistent with
generally accepted professional standards. There is
absolutely no evidence that the procedures were subjected
to any of the validation strategies available to respondent,
even in rudimentary form.
It is not unlawful per se for employers to use so-called
“subjective” selection procedures. See Part I(A).
But, assuming that the use of subjective criteria was appropriate
for the position of supervisor of tellers, respondent
failed to perform a job analysis for the position
to identify more accurately the knowledges, skills,
and abilities, which are desirable for successful performance
of the job.33 There is no evidence that respondent
secured accurate and thorough information about
the job from job incumbents, their supervisors, personnel
specialists, training manuals, job descriptions, or actual
observation by trained observers. See Principles at 5-6,
19-24; Schneider & Schmitt, supra note 12, at 47-50.
With regard to the selection procedures themselves,
there are crucial infirmities in each of them. With regard
to the interview, there is no evidence that the interviewer,
in this case a white male, had any training in
conducting interviews, a requirement that is especially
important when the interviewee is of a different race and
gender than the interviewer.34 Nor was more than one
33 As noted in Part I, job analysis is a critical first step in establishing
the usefulness of any selection procedure. For a review of
26 employment discrimination cases yielding a helpful summary of
requirements for judicially-approved job analyses, see Thompson &
Thompson, Court Standards for Job Analysis in Test Validation, 35
Personnel Psychology 865 (1982).
34 “The training of interviewers especially with possible applicants
of different race or sex may increase ‘their ability to relate’
. . . .” Schneider & Schmitt, supra note 12, at 386; see Schmitt,
Social and Situational Determinants of Interview Decisions: Impli
interviewer systematically involved in decisionmaking.
There is also no evidence that the interviews were structured
so as to improve reliability and reduce biasing errors,35
nor is there any evidence that the nature of the
interview or the questions asked had any empirical, logical,
or theoretical connection with the position for which
petitioner was considered.36
With regard to the rating scales, it was especially important
for respondent to have offered some evidence for
the validity of its supervisor performance appraisal, as
the use of numerical values gives it facial validity and
the appearance of objectivity. But, when judged according
to the criteria discussed in Part I(B) & (C)(2) the
rating scale is deplorable.
The qualities measured on the rating scale are neither
unambiguously defined nor is there any demonstrable
correlation between many of the criteria listed in the
scale and successful job performance.37 For example, all
cations for the Employment Interview, 29 Personnel Psychology
79, 97 (1976). See also Principles at 33: “All persons within
the organizations who have responsibilities related to the use of
employment tests and related predictors should be qualified through
appropriate training to carry out their responsibilities.”
35 “Use of a structured interview guide will improve interviewer
reliability.” Schneider & Schmitt, supra note 12, at 386.
36 “Predictor variables should be chosen for which there is an
empirical, logical, or theoretical foundation.” Principles at 11.
37 The qualities rated were accuracy of work, alertness, personal
appearance, supervisor-coworker relations, quantity of work,
physical fitness, attendance, dependability, stability (“the ability to
withstand pressure and remain calm in most situations”), drive
(“ambition”), friendliness and courtesy, and job knowledge. The
qualities were variously rated on a scale from 0 to 7-10. See Watson
v. Fort Worth Bank & Trust, 798 F.2d 791, 812 n.26 (5th Cir. 1986)
(Goldberg, J., dissenting). “Few of these categories have much
objective content. For example, ‘personal appearance,’ ‘drive,’ and
‘friendliness and courtesy’ are clearly subjective on their face. . . .
The rating system is also subjective: [e.g.,] 0-1 ‘does not meet
minimum requirement . . .’; 7-8, ‘superior work production record.’
This type of subjective measurement lends itself to discriminatory
bias, be it conscious or unconscious.” Id.
but two of the rating criteria used by respondent are
totally undefined and the two that have purported definitions
(stability and drive) are only vaguely defined; none
of the qualities assessed have anchoring or endpoint definitions,
as is customary in generally accepted graphic or
BARS scales.38 The scale thus failed to use clearly defined
individual components or dimensions of job performance,
in contrast to undefined global measures, e.g.,
“neat and clean in appearance” vs. “personal appearance.”39
Similarly, it did not use behaviorally based
performance dimensions that could be verified by objective,
observable evidence, e.g., “knows how to check account
balance” vs. “job knowledge.”40
The failure to conduct even a rudimentary job analysis
further undermines the rating scale’s validity. It is far
from clear how criteria such as “physical fitness” measure
an individual’s ability to supervise tellers. The
constructs or behavior traits identified by respondent,
such as “drive” or “dependability,” could be validated for
use in promoting individuals to supervisory teller positions
if demonstrated to be job-related and assessed reliably
from the performance appraisal. Although the use
of such “abstract trait names” is not advised unless the
traits can be defined in terms of observable behaviors,41
it may be necessary to measure such personality constructs
for certain jobs. If certain traits or constructs
are deemed important enough to influence personnel selection,
however, they are important enough to measure
validly:
Knowing whether a construct is measured validly requires,
if not a theory, at least some fairly well articulated
ideas about what is being measured, what a
38 See supra note 28.
39 Cascio & Bernardin, Implications of Performance Appraisal
Litigation for Personnel Decisions, 34 Personnel Psychology 211,
212 (1981) [hereinafter Cascio & Bernardin].
40 Id.
41 Cascio & Bernardin, supra note 39, at 212.
measure of the construct should reasonably be expected
to be related to and, perhaps more importantly,
what it should not be related to. . . . This
view of constructs and construct validity implies two
aspects of a construct-related strategy for developing
evidence to judge the job relatedness of a selection
procedure. The first is evidence that the construct
is indeed important for job performance. . . . Ordinarily,
a job analysis can provide a part of the basis
for identifying and defining constructs which are important
to job performance. Clarity of the articulation
of the meaning and the nature of the construct,
and well-informed expert judgment that a logical
relationship exists between the nature of the construct
and identifiable demands of the job is essential.
The second is evidence that the instrument used
as a selection procedure is a valid measure of the
construct and not of other constructs.
Principles at 25; see also Standards 9-10.
Finally, there is no indication that the supervisors who
performed the ratings had any training in assessing responses
or in avoiding the variety of sources of contamination
or bias. That omission is especially troubling given the
significant evidence that scores on rating scales may be
affected by the race of rater and ratee. See Part I(C)(2).
With regard to experience requirements, respondent
failed to show that the lack of prior experience upon
which it based its failure to promote petitioner in her
first three attempts was related to the position for which
she applied. It could very well be that her experience as
a teller would be predictive of her success as a supervisor
of tellers, but without an analysis of both positions, that
assumption could very well be faulty. In addition, respondent
idiosyncratically employed the criterion, using
her alleged lack of experience to justify denying her
promotion in three instances and then ignoring the criterion
to deny her promotion in a fourth instance, when
it promoted a competing applicant with less experience.
Perhaps most importantly, respondent failed to use prior
experience as one part of a carefully selected and logically
and empirically justified comprehensive biographical
data inventory.
Although respondent may have fewer resources to devote
to assessment device development or validation than
larger organizations, that fact does not excuse the absence
of any attempt to support its use of the selection devices
employed in this case:
Where resources or sample sizes are limited, the
criterion-related evidence of validity and content-related
validation judgments obtained on similar jobs
in other settings and the strength of the construct-related
evidence of validity already generated by the
[particular instrument] become particularly important.
Employers should not be precluded from using
a [particular instrument] if it can be demonstrated
that th[at instrument] has generated a significant
record of validity in similar job settings for highly
similar people[,] or that it is otherwise appropriate
to generalize from other applications.
Standards at 59.42 Respondent neglected to conduct even
this inexpensive inquiry. There is absolutely no legitimate
reason why respondent failed to conduct even crude
validity studies of the selection devices it used to evaluate
petitioner or, at the very least, to investigate the availability
of existing sources of valid selection devices. These
failures are all the more disturbing considering its poor
record in hiring and promoting minority employees.43
42 “Researchers and employers are encouraged to conduct cooperative
studies when adequate data . . . are not available.” Principles
at: The Uniform Guidelines also clearly permit exceptions to the
general requirement that employers validate their own procedures.
See 29 C.F.R. §§ 1607.6-.8.
43 “[P]laintiff presented ‘significant proof’ that the bank operated
under a ‘general policy of discrimination’.” Watson v. Fort
Worth Bank & Trust, 798 F.2d 791, 807 (5th Cir. 1986) (Goldberg,
J., dissenting). See id. at 810-814 for supporting statistical data.
III. THE FAILURE TO REQUIRE THAT SUBJECTIVE
SELECTION DEVICES HAVE DEMONSTRABLE
VALIDITY WOULD UNDERMINE THE PURPOSES
OF TITLE VII.
The underlying goal of Title VII of the Civil Rights
Act of 1964 was the “eliminat[ion] . . . [of] discrimination
in employment” based on race, color, religion, sex
or national origin in all of its forms. H.R. Rep. No. 914,
88th Cong., 2d Sess., reprinted in 1964 U.S. Code Cong. &
Ad. News 2391, 2401. Consistent with that goal, Title
VII prohibits employers from discriminating in employment
decisions based on such impermissible classifications.
See 42 U.S.C. § 2000e-2. However, because “Congress did
not intend by Title VII . . . to guarantee a job to every
person regardless of his qualifications,” Griggs v. Duke
Power Co., 401 U.S. 424, 430 (1971), it authorized employers
to distinguish among individuals for selection or
promotion purposes based “upon the results of any professionally
developed ability test provided that such test,
its administration or action upon the results is not designed,
intended or used to discriminate because of race,
color, religion, sex or national origin.” 42 U.S.C. § 2000e-2(h).44
In permitting the appropriate use of “professionally
developed ability tests” as a basis for selecting and promoting
individuals, the drafters of the ultimate language
of Title VII stressed the importance of demonstrating
the relationship and relevance of the selection procedure
to job qualifications. See 110 Cong. Rec. 7247 (1964).
Recognizing the “u[tility]” of “testing or measuring
procedures,” this Court has also stressed Congress’ intent
to “forbid[] . . . giving these devices and mechanisms
44 “[T]he Act does not command that any person be hired simply
because he was formerly the subject of discrimination, or because
he is a member of a minority group . . . . What is required by
Congress is the removal of artificial, arbitrary, and unnecessary
barriers to employment when the barriers operate invidiously to
discriminate on the basis of racial or other impermissible factors.”
Griggs, 401 U.S. at 431.
controlling force unless they are demonstrably a reasonable
measure of job performance.” Griggs, 401 U.S. at
436. “What Congress . . . commanded is that any tests
used must measure the person for the job and not the
person in the abstract.” Id.
Proponents of Title VII opposed authorizing the use
of “professionally developed ability tests” without regard
to their ability to predict performance of the particular
job in question. 110 Cong. Rec. at 13504. The use of
tests which, although “professionally developed,” bear no
relation to the job for which they are being used to assess
individuals, was clearly recognized as a potential means
of covert “[d]iscrimination[,] . . . under the guise of
compliance with the statute.” Id. Assessment devices
which have not been shown to be job-related, or otherwise
predictive of performance of a particular job, cannot
justify discriminatory employment practices. The crucial
public policy goals of Title VII would be thwarted if employers
could rebut claims of discrimination simply by
pointing to the results of unvalidated assessment devices,
whether subjective or objective. Indeed, such unvalidated
results may well reflect precisely the discrimination
Congress sought to eliminate in Title VII.
In light of the “nondiscrimination” objectives of Title
VII and the demonstrated ability of professionals to validate
subjective assessment devices, however, there is no
principled reason to treat objective and subjective devices
differently in imposing a validation requirement, regardless
of whether a plaintiff proceeds under a disparate impact
or disparate treatment claim.45 Indeed, permitting
the use of unvalidated subjective assessment devices
while requiring objective devices to be validated provides
a ready mechanism for covert discrimination for employers
seeking to avoid the constraints of Title VII.
Validation does require the expenditure of both time and
money by an employer. But, as amicus has demonstrated,
45 Texas Dep’t of Comm. Affairs v. Burdine, 450 U.S. 248 (1981);
McDonnell Douglas Corp. v. Green, 411 U.S. 792 (1973).
there are a number of readily available techniques for
developing, adopting, and validating both objective and
subjective devices, and both professional and legal standards
allow the use of already developed selection devices.
Thus, when one balances the relative costs of validation
to the employer against the costs of eroding the protections
provided by Title VII and the damage to society of
perpetuating the vestiges of discrimination, the outcome
clearly favors the requirement that employers use psychometrically
sound and job-relevant selection devices.
CONCLUSION
For the foregoing reasons, amicus respectfully requests
that this Court reverse the decision of the Court of Appeals
for the Fifth Circuit insofar as it releases employers
from their obligation to show that the selection
devices they use to make employment decisions are valid.
Respectfully submitted,
Donald N. Bersoff
(Counsel of Record)
Laurel Pyke Malson
Donald B. Verrilli, Jr.
Ennis Friedman & Bersoff
1200 - 17th Street, N.W., Suite 400
Washington, D.C. 20036
(202) 775-8100
Attorneys for Amicus Curiae
American Psychological Association
September 14, 1987