Watson v. Fort Worth Bank and Trust Brief Amicus Curiae in Support of Petitioner
Public Court Documents
September 14, 1987

No. 86-6139

IN THE SUPREME COURT OF THE UNITED STATES
October Term, 1987

CLARA WATSON, Petitioner,
v.
FORT WORTH BANK & TRUST, Respondent.

On Writ of Certiorari to the United States Court of Appeals for the Fifth Circuit

BRIEF FOR AMICUS CURIAE AMERICAN PSYCHOLOGICAL ASSOCIATION IN SUPPORT OF PETITIONER

Donald N. Bersoff (Counsel of Record)
Laurel Pyke Malson
Donald B. Verrilli, Jr.
Ennis Friedman & Bersoff
1200 - 17th Street, N.W., Suite 400
Washington, D.C. 20036
(202) 775-8100

Attorneys for Amicus Curiae American Psychological Association

September 14, 1987

Wilson-Epes Printing Co., Inc. - 789-0096 - Washington, D.C. 20001

TABLE OF CONTENTS

TABLE OF AUTHORITIES .... iii
INTEREST OF AMICUS CURIAE .... 1
INTRODUCTION AND SUMMARY OF ARGUMENT .... 2
ARGUMENT .... 4
I. BECAUSE SUBJECTIVE ASSESSMENT DEVICES CAN, AND SHOULD, BE SCIENTIFICALLY VALIDATED, THE USE OF SUBJECTIVE SELECTION CRITERIA AND PROCEDURES BY EMPLOYERS SHOULD NOT PRECLUDE REVIEW UNDER ANY TITLE VII THEORY .... 4
  A. Professional Standards Concerning the Technical Adequacy of Selection Devices are Applicable to the Subjective Methods Used by Respondent .... 4
  B. There are Generally Accepted Strategies for Establishing the Validity of Subjective Methods of Employee Selection .... 9
  C. To Reduce Sources of Bias the Validity of Each of the Selection Devices Used by Respondent Must and Can Be Established by Generally Accepted and Accessible Validation Strategies .... 14
    1. The interview .... 14
    2. Rating scales and other performance appraisals .... 17
    3. Experience requirements .... 21
II. THE SUBJECTIVE SELECTION PROCEDURES USED BY RESPONDENT FAIL TO MEET GENERALLY ACCEPTED STANDARDS AND APPEAR TO HAVE BEEN APPLIED WITHOUT ANY EVIDENCE THAT THEY ARE VALID FOR THE INFERENCES DRAWN FROM THEM .... 22
III. THE FAILURE TO REQUIRE THAT SUBJECTIVE SELECTION DEVICES HAVE DEMONSTRABLE VALIDITY WOULD UNDERMINE THE PURPOSES OF TITLE VII .... 28
CONCLUSION .... 30

TABLE OF AUTHORITIES

CASES:
Albemarle Paper Co. v. Moody, 422 U.S. 405 (1975) .... 20
Brito v. Zia, 478 F.2d 1200 (10th Cir. 1973) .... 9
Debra P. v. Turlington, 644 F.2d 397 (5th Cir. 1981) .... 6
Douglas v. Hampton, 512 F.2d 976 (D.C. Cir. 1975) .... 6
Griggs v. Duke Power Co., 401 U.S. 424 (1971) .... 28, 29
Harless v. Duck, 14 FEP Cases 1616 (N.D. Ohio 1977) .... 6
McDonnell Douglas Corp. v. Green, 411 U.S. 792 (1973) .... 29
Texas Dep't of Comm. Affairs v. Burdine, 450 U.S. 248 (1981) .... 29
Washington v. Davis, 426 U.S. 229 (1976) .... 6, 12
Watson v. Fort Worth Bank & Trust, 798 F.2d 791 (5th Cir. 1986) .... 24, 27

STATUTES & REGULATIONS:
42 U.S.C. § 2000e et seq. .... 2, 3, 28
29 C.F.R. § 1607 et seq. .... 3, 7, 8, 27
H.R. Rep. No. 914, 88th Cong., 2d Sess., reprinted in 1964 U.S. Code Cong. & Ad. News 2391 .... 28
MISCELLANEOUS:
AERA, APA, NCME, Standards for Educational and Psychological Testing (1985) .... passim
AERA, APA, NCME, Standards for Educational and Psychological Tests (1974) .... 6
A. Anastasi, Psychological Testing (5th ed. 1982) .... 10, 15, 16, 17, 18, 20, 21
APA, Standards for Educational and Psychological Tests and Manuals (1966) .... 6
APA, Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954) .... 6
Arvey, Unfair Discrimination in the Employment Interview: Legal and Psychological Aspects, 86 Psychology Bull. 736 (1979) .... 14, 15
Arvey & Campion, The Employment Interview: A Summary and Review of Recent Research, 35 Personnel Psychology 281 (1982) .... 16
Bernardin & Pence, Effects of Rater Training, 65 J. Applied Psychology 60 (1980) .... 21
Bersoff, Testing and the Law, 36 Am. Psychologist 1047 (1981) .... 11
W. Bingham, B. Moore & J. Gustad, How to Interview (4th ed. 1959) .... 15
Borman, Format and Training Effects on Rating Accuracy and Rating Errors, 64 J. Applied Psychology 410 (1979) .... 21
Brush & Owens, Implementation and Evaluation for an Assessment Classification Model for Manpower Utilization, 32 Personnel Psychology 369 (1979) .... 22
Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211 (1981) .... 19, 20, 25
L. Cronbach, Essentials of Psychological Testing (4th ed. 1984) .... 10, 18, 20
Distefano, Pryer, & Erffmeyer, Application of Content Validity Methods to the Development of a Job-Related Performance Rating Criterion, 36 Personnel Psychology 621 (1983) .... 19
G. Dreher & P. Sackett, Perspectives on Staffing and Selection (1983) .... 21
Dunnette & Borman, Personnel Selection and Classification Systems, 40 Ann. Rev. Psychology 477 (1979) .... 15
R. Fear, The Evaluation Interview (2d ed. 1973) .... 15
Feild & Holley, The Relationship of Performance Appraisal System Characteristics to Verdicts in Selected Employment Discrimination Cases, 25 Acad. Mgmt J. 392 (1982) .... 19
Friedman & Williams, Current Use of Tests for Employment, in 2 Ability Testing (A. Wigdor & W. Garner eds. 1982) .... 9
S. Gael, Job Analysis: A Guide to Assessing Work Activities (1983) .... 13
I. Goldstein, Training in Organizations (2d ed. 1986) .... 5, 14, 18, 21
Grant & Bray, Contributions of the Interview to Assessment of Management Personnel, 53 J. Applied Psychology 24 (1969) .... 15
Guion, On Trinitarian Doctrines of Validity, 11 Prof. Psychology 385 (1980) .... 11
Hakel, Employment Interviewing, in Personnel Management (K. Rowland & G. Ferris eds. 1982) .... 16
H. Henneman, D. Schwab, J. Fossum, & L. Dyer, Personnel/Human Resource Management (1980) .... 14
Ivancevich, Longitudinal Study of the Effects of Rater Training on Psychometric Errors in Ratings, 64 J. Applied Psychology 502 (1979) .... 21
Kleiman & Durham, Performance Appraisal, Promotion and the Courts: A Critical Review, 34 Personnel Psychology 103 (1981) .... 5, 13, 19, 20
Korman, The Prediction of Managerial Performance: A Review, 21 Personnel Psychology 295 (1968) .... 22
Kraiger & Ford, A Meta-analysis of Ratee Race Effects in Performance Ratings, 69 J. Applied Psychology 56 (1985) .... 18
Landy & Farr, Performance Rating, 87 Psychological Bull. 72 (1980) .... 17, 18, 19
A. Larson & L. Larson, Employment Discrimination § 15-87 (1986) .... 5
Latham, Saari, Pursell & Campion, The Situational Interview, 65 J. Applied Psychology 422 (1980) .... 21
Latham, Wexley & Pursell, Training Managers to Minimize Rating Errors in the Observation of Behavior, 60 J. Applied Psychology 550 (1975) .... 16
E. Levine, Everything You Ever Wanted to Know about Job Analysis (1983) .... 13
Locher & Teel, Performance Appraisal—A Survey of Current Practices, 56 Personnel J. 245 (1977) .... 17
J. Matarazzo & A. Weins, The Interview: Research on its Anatomy and Structure (1972) .... 15
E. McCormick, Job Analysis: Methods and Applications (1979) .... 13
Messick, Test Validity and the Ethics of Assessment, 35 Am. Psychologist 1012 (1980) .... 11
Owens, Background Data, in Handbook of Industrial and Organizational Psychology (M. Dunnette ed. 1976) .... 21
Owens & Schoenfeldt, Toward a Classification of Persons, 46 J. Applied Psychology 329 (1979) .... 22
Pace & Schoenfeldt, Legal Concerns in the Use of Weighted Applications, 30 Personnel Psychology 159 (1977) .... 22
Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982) .... 22
Rice, Spotlight on Employee Performance, 9 US Air 53 (August 1987) .... 19
Schmidt & Johnson, Effect of Race on Peer Ratings in an Industrial Setting, 57 J. Applied Psychology 237 (1973) .... 18
Schmitt, Social and Situational Determinants of Interview Decisions: Implications for the Employment Interview, 29 Personnel Psychology 79 (1976) .... 14, 15, 23
B. Schneider & N. Schmitt, Staffing Organizations (2d ed. 1986) .... passim
Schoenfeldt, Utilization of Manpower: Development and Evaluation of Assessment-Classification Model for Matching Individuals with Jobs, 59 J. Applied Psychology 583 (1974) .... 22
Society for Industrial and Organizational Psychology, Principles for the Validation and Use of Personnel Selection Procedures (1987) .... passim
Tenopyr, Content-Construct Confusion, 30 Personnel Psychology 47 (1977) .... 11
Thompson & Thompson, Court Standards for Job Analysis in Test Validation, 35 Personnel Psychology 865 (1982) .... 13, 23
Ulrich & Trumbo, The Selection Interview Since 1949, 63 Psychology Bull. 100 (1965) .... 14, 15
Waintroob, The Developing Law of Equal Employment Opportunity at the White Collar and Professional Level, 21 Wm. & Mary L. Rev.
45 (1979) .... 5

BRIEF FOR AMICUS CURIAE AMERICAN PSYCHOLOGICAL ASSOCIATION IN SUPPORT OF PETITIONER

INTEREST OF AMICUS CURIAE

The American Psychological Association ("APA") is a nonprofit, scientific, and professional organization with more than 65,000 members. It has been the major association of psychologists since 1892, and includes the vast majority of psychologists holding doctoral degrees from accredited universities in this country. Among APA's major functions are the promotion of psychological research, the dissemination of information regarding human psychological behavior, the promulgation of standards governing scientific and professional practice, including assessment, and, as reflected in its Bylaws, the "advance[ment] of psychology as a science and profession." A substantial number of APA's members are concerned with the development and validation of assessment devices for personnel selection in the employment context, including the more than 2500 members who belong to the APA's Division of Industrial and Organizational Psychology and the 1500 members who belong to its Division of Evaluation and Measurement.

The APA has participated as amicus in many cases in this Court involving social science issues, including Kentucky v. Stincer, 107 S. Ct. 2658 (1987) (effects on child victims of sex abuse of testifying in the presence of their alleged abusers); Colorado v. Connelly, 107 S. Ct. 515 (1986) (behavioral effects of command hallucinations); and Lockhart v. McCree, 106 S. Ct. 1758 (1986) ("conviction-proneness" of "death-qualified" juries). APA contributes amicus briefs only where it has special knowledge to share with the Court. APA regards this as one of those cases. In this instance, APA wishes to inform this Court of the state of current scientific thought regarding validation of personnel assessment devices, including subjective selection criteria and procedures such as those used by respondent in this case.

Petitioner and respondent have consented to the filing of this amicus brief. Their letters of consent are on file with the Clerk of the Court.

INTRODUCTION AND SUMMARY OF ARGUMENT

APA addresses in this brief an essential and inherent issue in this case. Amicus will leave it to the parties to argue whether disparate impact analysis under Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., is properly applied to review the legality of subjective assessment devices for the hiring and promotion of employees. However, insofar as a negative answer is grounded in the assumption that subjective assessment devices are not amenable to psychometric scrutiny in the same way that ability tests are, such an assumption is contrary to fundamental and generally accepted scientific principles of measurement. The most frequently articulated reason for limiting disparate impact analysis to objective criteria and procedures—that only objective criteria and procedures yield sufficient statistical data to permit scientific validation—is not supported by the relevant social science literature. Indeed, the view that only objective selection criteria and procedures can be clearly identified, applied equally to all applicants, and statistically evaluated has been discredited by the extensive work of industrial psychologists and other assessment specialists.
Subjective selection devices can be scientifically validated for the assessment of individuals for hiring, promotion, or other selection decisions in the employment context. The choice of analyses under Title VII, therefore, should not turn on whether the challenged employment practices are based on objective or subjective evaluations of applicants.

The APA's Standards for Educational and Psychological Testing (1985) [hereinafter Standards] provide a framework for the evaluation and validation of testing and other assessment devices, including such subjective devices as interviews, behavioral observations, and rating scales. The Standards are consistent with the Principles for the Validation and Use of Personnel Selection Procedures (1987) [hereinafter Principles] published by the Society for Industrial and Organizational Psychology.[1] Furthermore, the Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. §§ 1607.1 et seq. [hereinafter Uniform Guidelines], were explicitly intended to be consistent with the Standards. Id. at § 1607.5(C). Such technical standards were clearly contemplated by the drafters of Title VII when they referred to the use of "professionally developed" assessment devices by employers. 42 U.S.C. § 2000e-2(h). When examined in light of the Standards, Principles, and Uniform Guidelines, it is clear that the procedures respondent used in this case to evaluate petitioner for promotion were not shown to be scientifically valid, i.e., appropriate, meaningful, or useful for the inferences drawn from them. See Part I(B). More importantly, however, the Standards and Principles provide ample guidelines for how those procedures, notwithstanding their subjective nature, could have been validated.

[Footnote 1: The Society is an integral component of the amicus and is also known as Division 14 of the APA (Division of Industrial and Organizational Psychology). Until adopted by amicus as a whole, the Principles are the formal policy only of Division 14. However, they are "intended to represent the consensus of professional knowledge and thought as it exists today. . . ." Id. at 3.]

The recognition that subjective assessment procedures may be validated is critical to the effectuation of the underlying goals of Title VII. Subjective procedures, like those of a more objective nature, should be required to be validated for specific jobs in any case where employers use those procedures as a defense to a prima facie case of discrimination, whether the claim is analyzed under a disparate treatment or a disparate effect theory. Only under such a rule will employers be inhibited from making personnel decisions based on unlawful and irrelevant factors. Should the Court in this case implicitly approve the use of unvalidated selection procedures as a defense to any Title VII claim, employers will have greater incentive to resort more readily to subjective assessment devices, which would facilitate covert and discriminatory decisionmaking and severely undermine the right of equal employment opportunities for those classes of persons otherwise protected by Title VII.

ARGUMENT

I. BECAUSE SUBJECTIVE ASSESSMENT DEVICES CAN, AND SHOULD, BE SCIENTIFICALLY VALIDATED, THE USE OF SUBJECTIVE SELECTION CRITERIA AND PROCEDURES BY EMPLOYERS SHOULD NOT PRECLUDE REVIEW UNDER ANY TITLE VII THEORY.

Amicus relies on petitioner and other supporting amici to establish the applicability of disparate impact analysis under Title VII to subjective selection criteria for hiring and promotions.
Amicus wishes to share with the Court its unique knowledge of testing and other assessment devices, and the validation of such devices, so that the Court's decision regarding the proper standard of analysis will be informed by relevant social science data. However, should the Court ultimately determine that disparate treatment analysis is the appropriate standard of review for subjective assessment devices in the employment context, amicus believes that such a holding in no way obviates the principle that such devices can and should be validated for the particular job in question.

A. Professional Standards Concerning the Technical Adequacy of Selection Devices are Applicable to the Subjective Methods Used by Respondent.

At issue in this case are three selection devices by which petitioner was evaluated for promotion by respondent's agents—interviews, supervisor's ratings, and experience requirements. Respondent characterizes these as subjective assessment procedures, in contrast to such procedures as multiple choice standardized paper-and-pencil tests, which are typically classified as objective measures. Although in the context of personnel assessment procedures the term "subjective" is not easily defined, the concept has been used to refer variously to procedures in which "judgment or discretion [is exercised] on the part of the evaluator"[2] or which lack any "neutral" factors,[3] to assessment devices of a "nonmechanical, operator-dependent" nature,[4] or to appraisals not based on "'hard data,' such as production records [or] attendance."[5] Most simply, "Measures that require the statement of opinion, beliefs, or judgments are considered subjective."[6] Regardless of the particular definition, these formulations are consistent with the widely-held view that subjective devices used by employers for hiring or promotion purposes are inherently less scientific, less quantifiable, less reliable, and less facially neutral than their objective counterparts.[7]

[Footnote 2: Waintroob, The Developing Law of Equal Employment Opportunity at the White Collar and Professional Level, 21 Wm. & Mary L. Rev. 45, 48 (1979).]

[Footnote 3: 3 A. Larson & L. Larson, Employment Discrimination § 15-87 (1986).]

[Footnote 4: Brief of United States as Amicus Curiae on Petition for Writ of Certiorari at 13 [hereinafter Brief of United States].]

[Footnote 5: Kleiman & Durham, Performance Appraisal, Promotion and the Courts: A Critical Review, 34 Personnel Psychology 103, 114 (1981) [hereinafter Kleiman & Durham].]

[Footnote 6: I. Goldstein, Training in Organizations 136 (2d ed. 1986) [hereinafter Goldstein]. "For example, rating scales are subjective measures, while measures of absenteeism are more objective. (However, supervisors' ratings of the absenteeism level of employees could turn that measure into a subjective criterion.)" Id.]

[Footnote 7: "[S]ubjective measures are affected by the difficulties that one individual has in rating another without bias." Id. at 136-137. These difficulties, however, are not due to any inherent characteristics of subjective measures but to the failure of employers to apply standard principles of test construction to subjective measures. It is not intrinsic to such devices to be unquantifiable. When properly developed, they are amenable to scoring and objective analysis. For example: "rating scales have been the most commonly employed measures in applied settings. . . . [One] reason is that it is simple to throw together a rating scale with a few traits . . .
and delude yourself into believing that you have a useful measure of performance. Professionals . . . know that the steps in the process are very similar for objective and subjective measures and that shortcuts do not work in either case." Id. at 137 (emphasis added).]

For this reason, it has been asserted that subjective selection methods and criteria are not susceptible to scientific "validation" or any other psychometric scrutiny. See, e.g., Brief of United States, supra note 4, at 15, and references therein.

This view is fundamentally at odds with the universal judgment of those most experienced and knowledgeable in techniques of measurement and evaluation generally, and in the appraisal of employee performance specifically. The most authoritative source for the standards to be applied to determine the technical adequacy of assessment devices, the appropriateness of specific applications of these devices, and the reasonableness of inferences based on the results of these devices is the Standards for Educational and Psychological Testing (1985), a joint publication of the amicus APA, the American Educational Research Association ("AERA"), and the National Council on Measurement in Education ("NCME").[8]

[Footnote 8: The 1985 Standards represent the most modern expression of professional and scientific thinking concerning technical advances in psychological assessment. Its predecessor documents are APA, Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954); APA, Standards for Educational and Psychological Tests and Manuals (1966); and AERA, APA, NCME, Standards for Educational and Psychological Tests (1974). The 1966 and 1974 forerunners of the current Standards have been cited with approval by this Court and in a variety of lower federal cases. See, e.g., Washington v. Davis, 426 U.S. 229, 247 n.13 (1976); Debra P. v. Turlington, 644 F.2d 397, 405 n.10 (5th Cir. 1981); Douglas v. Hampton, 512 F.2d 976, 984-986 (D.C. Cir. 1975) (see especially id. at 984 n.59, where the court called the 1966 edition "the universally recognized professional authority"); Harless v. Duck, 14 FEP Cases 1616, 1624 n.5 (N.D. Ohio 1977) (stating that the "courts have almost unanimously agreed that the [1974] APA guidelines provide persuasive standards for evaluating claims of job relatedness").]

Consistent with the Standards, the Division 14 Principles for the Validation and Use of Personnel Selection Procedures (1987) apply the more general guidelines of the Standards to the specific problems of making decisions in the context of employee selection, placement, and promotion and provide, inter alia, "principles for the application and use of valid selection procedures, and information that may be helpful to personnel managers and others responsible for authorizing or implementing validation efforts." Principles at 2.[9]

[Footnote 9: The 1987 Principles are a revision of the original 1980 version. "The purposes of the revision are to bring the Principles up to date scientifically, to make them consistent with the Standards, and to reduce possible ambiguities regarding good practice in the use of selection procedures in making employment decisions." Principles at 1.]

Of relevance as well are the Uniform Guidelines. "The provisions of these guidelines . . . are intended to be consistent with generally accepted professional standards for evaluating standardized tests and other selection procedures, such as those described in the Standards . . . prepared by a joint committee of the" APA, AERA, and NCME. 29 C.F.R. § 1607.5(C). They "incorporate a single set of principles which are designed to assist employers . . . to comply with requirements of Federal law prohibiting employment practices which discriminate on the grounds of race. . . ." Id. at § 1607.1(B).

The Standards, Principles, and Uniform Guidelines apply to a broad range of selection procedures, not merely to the traditional objective measures commonly denominated as tests.
Although the full title of the Standards uses the word "testing," the term is generic and refers to "standardized ability . . . instruments, diagnostic and evaluative devices, interest inventories, personality inventories, and projective instruments." Standards at 3. The term also includes samples of observable behavior "relevant to . . . employment decisionmaking." Id. at 4. Amicus, AERA, and NCME unequivocally view the Standards as useful and applicable "to the entire range of assessment techniques." Id.[10]

[Footnote 10: Such "instruments . . . are called tests here to indicate that the standards also apply to these instruments." Id. at 4-5.]

Similarly, the Division 14 Principles are not applicable solely to standardized paper-and-pencil tests. They are explicitly intended to aid employers who make hiring and promotion decisions to choose, select, develop, evaluate, and use all personnel selection devices, including "performance tests, . . . personality [and] interest inventories, . . . biographical data forms or scored application blanks, interviews, . . . experience requirements, . . . appraisals of job performance, . . . [and] estimates of advancement potential." Principles at 1.

Finally, the Uniform Guidelines "provide a framework for determining" not only "the proper use of tests" but "other selection procedures" as well. 29 C.F.R. § 1607.1(B). They plainly "apply to tests and other selection procedures which are used as a basis for any employment decision[,]" including "hiring and promotion." Id. at § 1607.2(B).[11]

[Footnote 11: See also id. at § 1607.15(A)(1) (referring to selection procedures "either standardized or not standardized").]

In sum, then, it is the universal professional judgment that the assessment devices used by respondent in deciding not to promote the petitioner can properly be scrutinized under the applicable scientific and professional standards, principles, and guidelines concerning the evaluation of the psychometric soundness of such devices.[12] In the present case, those devices were not subjected to any such scrutiny to determine if they were valid for the purpose for which they were used.

[Footnote 12: See, e.g., B. Schneider & N. Schmitt, Staffing Organizations 14 (2d ed. 1986) [hereinafter Schneider & Schmitt]: "Typically we think of a test as an examination of some kind responded to with paper and pencil. . . . In fact, industrial psychologists and the Uniform Guidelines on Employee Selection Procedures (1978) have defined the word 'test' in much broader terms and the courts have adopted this definition. In brief, a test is defined as any form of collecting information when that information is used as a basis for making an employment decision. So, interviews are tests, as are application blanks, . . . performance appraisals used as a basis for making promotions (which, obviously, are selection decisions), and any other kind of information used for making employment decisions." Accord Friedman & Williams, Current Use of Tests for Employment, in 2 Ability Testing 99-100 (A. Wigdor & W. Garner eds. 1982), published by The National Academy of Sciences (acknowledging that the definition of selection procedures has extended to the full range of assessment devices, including interviews, which are to be scrutinized according to the same guidelines used to evaluate standardized tests). See also Brito v. Zia, 478 F.2d 1200 (10th Cir. 1973) (holding that subjective observations by employer of minority employees had to be supported by empirical evidence of validity).]

B. There are Generally Accepted Strategies for Establishing the Validity of Subjective Methods of Employee Selection.

There are many elements involved in the construction, development, and evaluation of an assessment instrument.
Industrial psychologists and others who create a test or other selection procedure must choose the domain to be assessed, construct the items to which test takers will respond or select the behaviors to be observed, develop scoring scales and norms so that results can be interpreted, prepare manuals, and, most importantly, ensure that the instrument is psychometrically sound. See Standards at 9-37. The psychometric soundness of an instrument depends primarily on its being reliable[13] and valid. Although both qualities are essential, there is no doubt that "validity is the most important consideration in test evaluation." Standards at 9.[14]

[Footnote 13: "Reliability refers to the degree to which test scores are free from errors in measurement." Standards at 19. The more reliable a test score, the more consistent, dependable, or repeatable it will be. See Principles at 39.]

[Footnote 14: "Undoubtedly the most important question to be asked about any psychological test concerns its validity . . . ." A. Anastasi, Psychological Testing 27 (5th ed. 1982) [hereinafter Anastasi]. "Obviously, no aspect of a test is more important than validity . . . ." L. Cronbach, Essentials of Psychological Testing 1-5 (4th ed. 1984) [hereinafter Cronbach]. The texts by Anastasi and Cronbach are considered the most authoritative basic treatises on the topic of psychological measurement.]

Validation refers to the process by which psychologists ascertain the degree to which certain inferences from a particular assessment device are appropriate, meaningful, or useful. See Standards at 9; Principles at 4. A test is valid if the proposed interpretation of scores proves to be sound and relevant. See Cronbach, supra note 14, at 125. Validity "concerns what the test measures and how well it does so. It tells us what can be inferred from test scores." Anastasi, supra note 14, at 131. When a selection procedure or device is said to be "validated," psychologists understand that the predictions inferred from the scored result of the procedure or device have a high rate of accuracy. For validation to be meaningful, it must predict performance of a particular task or set of tasks or other job relevant behavior of particular concern to the employer. Thus, selection procedures or devices are validated for a particular job when they have been demonstrated scientifically to make reliable and meaningful distinctions between individuals on the basis of their ability to perform particular tasks with competence or to function successfully in a particular job.[15] "The simple psychometric fact that test validity must be ascertained for specific uses of the test has long been familiar. An invalid test or one that includes elements not related to the job under consideration may unfairly exclude minority group members who could have performed the job satisfactorily." Anastasi, supra note 14, at 432.

[Footnote 15: "Only as abbreviation is it legitimate to speak of 'the validity of a test'; a test relevant to one decision may have no value for another. So users must ask, 'How valid is this test for the decision made?' or 'How valid are the interpretations I am making of the scores?'" Cronbach, supra note 14, at 125.]

Validity "is a unitary concept." Standards at 9.
However, within the unifying theme of determining the relevance or interpretability of scored results, "[t]he validity of any inference can be determined in a variety of ways." Principles at 4. "[T]he various means of accumulating validity evidence have been grouped into categories called content-related, criterion-related, and construct-related evidence of validity." Standards at 9.[16] In the context of personnel selection procedures, content validation involves a determination that the assessment instrument accurately reflects a representative sample of important aspects of job performance or job-required knowledge. See Principles at 19.[17] Criterion-related validation involves a determination that the assessment instrument is predictive of, or significantly correlated with, important elements of job performance or work behavior. See Principles at 6.[18] Construct validation
Through a process known as job analysis, the employer must clearly identify the most important components of suc cessful job performance. “Job analysis is essential to the development of a content-oriented procedure or to the justification of a construct important to job behavior.” Principles at 5. “In some situations, the major purpose of job analysis may be to provide information from which criterion measures may be developed.” Id. at 6. Satisfying this essential prerequisite requires an analysis of the job in question, and a clear articulation of the knowledge, skills and abilities (“KSAs” ), or other per sonal characteristics or behaviors the exhibition of which determine proficiency at that job.19 20 These data can be secured through judgments of job incumbents, their su pervisors, personnel specialists, the professional judg ment of job experts, and through training manuals, 19 “The evidence classed in the construct-related category focuses primarily on the test score as a measure of the psychological cate gory of interest . . . . Such characteristics are referred to as con structs because they are theoretical constructions about the nature of human behavior.” Standards at 9. This Court has summarized these approaches in Washington v. Davis, 426 U.S. at 247 n.13, as it understood them to be described in the 1966 version of the Standards. 20 Although a job analysis is crucial to all validation strategies it has been emphasized more clearly in the context of content- oriented validity. See, e.g., “Content validation should be based on a thorough and explicit definition of the content domain of interest. For job selection, classification, and promotion, the characterization of the domain should be based on job analysis.” Standard 10.4, Standards at 60-61. But it is required for criterion-related va lidity, see Standard 10.1. Standards at 60, and construct-related validity, see Standard 10.8, Standards at 61, as well. 13 job descriptions, and other written information. See Principles at 19-20; Schneider & Schmitt, supra note 12, at 47; Thompson & Thompson, Court Standards for Job Analysis in Test Validation, 35 Personnel Psychology 865 (1982). In addition, the relative importance of these KSAs must be determined. Finally, a close link between the assessment device and the identified job content or behavioral characteristic (construct) must be established. Standard 10.5, Standards at 61; see also Standard 10.8. Id.s In sum, then, the use of particular selection procedures by an employer reflects his or her implicit assumption that some important aspect of behavior on the job can be predicted from an individual’s scores or performance on the chosen selection procedure. The critical factor under lying this assumption is the accumulation of evidence or data to support an inference of the chosen procedure’s job-relatedness. This can be accomplished only through the various strategies of validation. Validation is no less applicable to subjective assess ment devices than to objective ones. In both cases, ac curate predictors of job performance are essential to as sist employers in selecting or promoting individuals who will best serve their needs, as well as to provide a method of personnel selection that inhibits consideration of non job-related factors such as an individual’s race. 
Indeed, several commentators have noted the emphasis placed by many courts and employers on objectivity, at the expense of validity, i.e., requiring the use of certain “neutral ap pearing objective tests to measure job performance, even where the validity of those criteria is clearly ques tionable. -■ Because of the role that validation plays in 21 22 21 Several books summarize job analysis procedures and discuss their relative utility in various situations. See, c.p ., S. Gael, J ob Analysis: A Guide to Assessing Work Activities (1983); E. Levine, Everything You Ever Wanted to Know about J ob Analysis (1983); E McCormick, J ob Analysis: Methods and Applications (1979). 22 See, e.g ., Kleiman & Durham, supra note 5, at 117-118. 14 enhancing the quality of selection procedures and reduc ing the potential for discrimination, an employer should be required to provide evidence of the validity of those procedures whether the employer chooses to use ones labeled as subjective or objective. C. To Reduce Sources of Bias the Validity of Each of the Selection Devices Used by Respondent Must and Can Be Established by Generally Accepted and Accessible Validation Strategies. Industrial psychologists routinely use the three strat egies for validating assessment devices described in Part 1(B) to validate both objective devices such as standard ized ability tests and interest inventories, and purely sub jective or multi-component devices such as interviews, performance appraisal ratings, constructed performance tasks, nonscored experience and biographical data intake sheets, and structured behavioral sample tests. See gen erally, e.g., Goldstein, supra note 6; Schneider & Schmitt, supra note 12. 1. The interview. The employment interview, the technique most heavily relied upon by respondent in this case, is probably more widely used than any other selection tool. See H. Hen- n em a n , D. Schwab, J. F ossum, & L. D yer, Personnel Human Resource Management (1980); Ulrich & Trumbo, The Selection Interview Since 1949, 64 Psy chology Bull. 100 (1965 •. However, because most em ployers are unaware of research which has determined which variables are reliably, validly, and uniquely as sessed in the selection interview, see Schmitt, Social and Situational Determinants of Interview Decisions: Impli cations for the Employment Interview, 29 Personnel Psychology 79, 97 (1976i, the employment interview is typically subject to interview bias of various types. See Arvey, Unfair Discrimination in the Employment Inter view: Legal and Psychological Aspects, 86 Psychology Bull. 736 (19821. The most commonly known bias is “halo effect,” where an interviewer may be unduly influ 15 enced by a single trait which colors his/her judgment of the employee’s other traits. See Anastasi, supra note 14, at 612. A related problem is “stereotyping” in which an employee is judged “based on his or her group membership [e.g., race] rather than on the basis of his or her unique characteristics” Schneider & Schmitt supra note 12, at 388-389. Another concern is the “simi- lar-to-me phenomenon” in which the interviewer adopts the attitude “I am wonderful and I have the following attitudes and opinions, so if candidates I interview have the same attitudes and opinions, they must also be won derful.” Id. at 389. “When combined with stereotyping, the similar-to-me phenomenon can be a potent deter minant of interviewer decision-making.” Id. 
Interviews can afford an opportunity for direct ob servation of samples of behavior, albeit limited, mani fested during the interview and serve to evoke life-history data, both of which can be important predictors of fu ture performance, if interviews are developed and con ducted using generally accepted standards. See Anastasi, supra note 14, at 610.23 Several recent studies, reviewed by noted scholars, show that interview judgments can be valid indicators of subsequent job performance. See -3 A variety of available sources discuss methods, applications, and effectiveness of interviewing, and research on the inter viewing process. Sec, e.g., W. Bingham, B. Moore & J. Gustad, How TO Interview (4th ed. 19591: R. Fear. The Evaluation Interview (2d ed. 1973) ; J. Matarazzo & A. Weins, The Inter view: Research on its Anatomy and Structure (1972); Arvey, Unfair Discrimination in the Employment Interview: Legal and Psychological Aspects, 86 Psychology Bull. 736 (1979) ; Dunnette & Borman, Personnel Selection and Classification Systems, 40 ANN. Rev. P sychology 47< (1979) ; Grant & Bray, Contributions of the Interview to Assessment of Management Personnel, 53 J. Applied Psychology 24 (1969); Schmitt, Social and Situational Deter minants of Interview Decisions: Implications for the Employment Interview, 29 Personnel Psychology 79 ( 1976) ; Ulrich & Trumbo, The Selection Interview Since 1919, 63 Psychology Bull 100 (1965). 16 Arvev & Campion, The Employment Interview: A Sum mary and Review of Recent Research, 35 PERSONNEL P sychology 281 (1982). These sources show that in terviews can be created that are valid and nondiscnmi- natory if interview questions are carefully linked to job analysis and performance criterion data. Interview validity is not alone sufficient, however. The validity of the interviewer must also be established. An “interview requires skill in data gathering and in data interpreting. An interview may lead to wrong decisions because important data were not elicited or because given data were inadequately or incorrectly interpreted. Anastasi, supra note 14, at 610-611. In this regard, in terviewer training is important. A structured interview guide will improve interviewer reliability and assist in removing any bias especially if the training occurs with applicants of gender and/or race different than that of the interviewer. With important interviews used to de termine hiring or promotion, it is very helpful if appli cants are seen by more than one interviewer, although if records are kept it is possible to identify those interview ers w'hose decisions are most reliable and valid and rely on those interviewers singly to make judgments. See Schneider & Schmitt, supra note 12, at 390-394; Hakel, Employment Interviewing in P ersonnel Management (K. Rowland & G. Ferris eds. 19821. jm As one example, researchers have described and tested an innovative but relatively simple and valid employment interview. Critical incidents, i.c.. reports by job incumbents or supervisors of situations in which especially effective or ineffective behavior is displayed, were converted into situational interview questions. The interviewer posed these situations to job applicants and asked them how they would behave. Each answer was rated independently by two or more interviewers on a five-point scale with end points on the scale provided by job experts to facilitate objective scoring. The process validily predicted future job performance for both women and blacks. See Latham. 
Saari, Pursell & Campion, The Situational Interview, 65 J. Applild Psychology 422 (1980;. 17 In sum, the use of employment interviews should be “preceded by a thorough analysis of the target job, the development of a structured set of questions based on the job analysis, and the development of behaviorally specific rating instruments by which to evaluate applicants.” Schneider & Schmitt, supra note 12, at 395. The assess ment of employees “should be maximally dependent on their personal characteristics and minimally dependent on who made the assessment . . . . Where non-test pre dictors like interviewer judgments are used, the fem ployer] should develop procedures that will minimize error resulting from differences between judges.” P rin ciples at 12. 2. Rating scales and other performance appraisals. Performance appraisal devices such as rating scales are widely used in employment settings, and were used by respondent in this case.25 “Rating scales differ from naturalistic observations in that data are accumulated casually and informally; they also involve interpretation and judgment, rather than simple recording of observa tions.” Anastasi, sujrra note 14, at 611. In contrast to interviews, however, “they typically cover a longer obser vation period and the information is obtained under more realistic conditions.” Id. Like interviews, rating scales are subject to a variety of sources of contamination or bias, including: (1) Op portunity bias which occurs if raters do not have the opportunity to observe the employee in situations in which the behavior to be rated could be manifested, but have the opportunity to do so with a competing employee; (2) halo effect, a tendency on the part of raters to be unduly influenced by a single favorable or unfavorable -r' In one survey, 899< of the companies studied reported using performance appraisals on a regular basis. Locher & Teel, Per formance Appraisal—A Surrey of Current Practices. 5G PERSON NEL J. 245 (1977). Of all performance appraisal techniques, the rating scale is by far the most ubiquitous. See Landy & Farr, Performance Rating. 87 P sychological Bull. 72, 73 (1980) [hereinafter Performance Rating]. 18 trait, which colors their judgment on the individual’s other traits; (3) error of central tendency, or the ten dency to place persons in the middle of the scale and to avoid extremes; and (4) leniency error, or the reluc tance of many raters to assign unfavorable ratings. Both latter errors reduce the effective width of the rating scale and make them less useful in distinguishing among individuals. See Anastasi, supra note 14, at 611-612; Cronbach, supra, note 14, at 509-511; Schneider & Schmitt, supra note 12, at 90-92; Goldstein, supra note 6, at 255. Most troubling in the context of this case is that scores on rating scales may be affected by the race of the rater and ratee. White raters have been found to assign sig nificantly higher ratings to white ratees than black ratees. These findings were noted in a comprehensive review of 74 studies involving 17,159 ratees in which the rater was white and 14 studies involving 2,420 ratees in which the rater was black. Race effects were more pronounced in real-life settings than in labo ratory settings and more likely when, as in this case, the proportion of blacks in the workforce was small. See Kraiger & Ford, A Meta-analysis of Ratee Race Effects in Performance Ratings, 69 J. 
[Footnote 26: These findings are not universal, nor do they imply that performance appraisal systems are inherently discriminatory. One study, for example, found no race-of-rater effect in an industrial setting which was highly racially integrated and where participants in the study had been exposed to human relations training. See Schmidt & Johnson, Effect of Race on Peer Ratings in an Industrial Setting, 57 J. Applied Psychology 237 (1973). These mitigating factors are not true with regard to respondent. For a comprehensive review of the effects of rater and ratee characteristics and the interaction of the two, see Performance Rating, supra note 25, at 74-82.]

Psychologists have published a great deal of accessible literature describing rating scale formats which are useful in rating employees and have been critiqued in the context of Title VII requirements.[27] Many of these formats would be significant improvements over the system used by respondent.[28] Regardless of format, what does produce a superior scale is that it is the "result of psychometric rigor in development and of some level of participation of individuals representative of those who will eventually use the scales to make ratings . . . ." Performance Rating, supra note 25, at 85.

[Footnote 27: E.g., Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211 (1981); Distefano, Pryer, & Erffmeyer, Application of Content Validity Methods to the Development of a Job-Related Performance Rating Criterion, 36 Personnel Psychology 621 (1983); Feild & Holley, The Relationship of Performance Appraisal System Characteristics to Verdicts in Selected Employment Discrimination Cases, 25 Acad. Mgmt J. 392 (1982); Kleiman & Durham, supra note 5; Performance Rating, supra note 25. The dissemination of this information is so widespread that it now appears in popular literature designed for lay readers. See Rice, Spotlight on Employee Performance, 9 US Air 53 (August 1987).]

[Footnote 28: Two of the most popular rating formats are the graphic rating scale and the behaviorally anchored rating scale ("BARS"). In the graphic rating scale's most usual format, several dimensions to be rated are listed vertically and raters are then asked to make rating decisions along a horizontal 5 to 9 point scale. For example, if the dimension to be rated is "accuracy," the scale may use a numerical or one-word verbal rating, e.g., 1 to 5 or "high" to "low"; preferably, it may use a range of descriptions, e.g., at one end of the scale would be "makes too many errors"; at the other end, "almost never makes mistakes." BARS uses dimensions derived by raters who would actually use the scale, with different points on each dimension anchored by statements describing actual job behavior which would illustrate specific levels of performance, e.g., using "accuracy" in rating a bank teller, the statements could include a range from "makes frequent errors in totalling accounts at end of day" to "errors in totalling accounts are consistently rare." See generally Schneider & Schmitt, supra note 12, at 101-106. Neither scale format seems more useful than the other in practice. See Performance Rating, supra note 25, at 85.]

An essential aspect of the requirement for psychometric rigor is the job analysis. "The development of rating procedures should ordinarily be guided by job analyses
if, for example, raters are expected to evaluate several different aspects of performance," as in this case. Principles at 10. Job analyses are also important if, as here, appraisal of past performance is used as a predictor of future performance. The use of ratings of past performance in one job to make promotion decisions for another position is only permissible if the ratings of past performance are valid and the ratings of past performance are related to future performance. The latter requires a job analysis indicating the extent to which the two jobs overlap. See Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211, 217 (1981).[29]

[Footnote 29: See also Albemarle Paper Co. v. Moody, 422 U.S. 405, 431-433 (1975), where this Court, invoking the Uniform Guidelines (and crediting APA's Standards), condemned as "materially defective" the employer's validation study because its "subjective supervisorial rankings" used standards which were vague and ambiguous and failed to follow the Uniform Guidelines' requirement for job analyses.]

The usefulness of a rating scale is also highly dependent on the skill of the rater. "Valid ratings cannot be made by someone who is either unfamiliar with the work of the ratee or lacks the skills necessary to accurately observe or rate the job behavior." Kleiman & Durham, supra note 5, at 113. In these respects, "those in immediate contact with the subject give superior information," Cronbach, supra note 14, at 512 (in the case of employment settings, first level supervisors), and those raters who have undergone training show increased "reliability and validity of ratings" and decreased errors in judgment. Anastasi, supra note 14, at 612.[30]

[Footnote 30: See also Standard 1.13, Standards at 16: "When criteria are composed of rater judgments, the degree of knowledge that raters have concerning ratee performance should be reported. If possible, the training and experience of the raters should be described"; Principles at 10: "It may . . . be necessary to train raters in the observation and evaluation of performance. Further, supervisors should be expected to be familiar enough with the demands of the job to evaluate overall performance." The utility of rater training in reducing rating errors and minimizing bias, as well as providing employers with useful techniques for doing so, is demonstrated in, e.g., Bernardin & Pence, Effects of Rater Training, 65 J. Applied Psychology 60 (1980); Borman, Format and Training Effects on Rating Accuracy and Rating Errors, 64 J. Applied Psychology 410 (1979); Ivancevich, Longitudinal Study of the Effects of Rater Training on Psychometric Errors in Ratings, 64 J. Applied Psychology 502 (1979); Latham, Wexley & Pursell, Training Managers to Minimize Rating Errors in the Observation of Behavior, 60 J. Applied Psychology 550 (1975). See generally Goldstein, supra note 6, at 254-259.]

In sum, rating scales will conform to legal and psychometric requirements if the appraisal system is based on a job analysis, contains clearly defined dimensions of job performance rather than vague, global measures or abstract trait names, is behaviorally based so that all ratings can be supported by objective, observable evidence, and if the raters are in the position to observe the behaviors to be rated and are trained to reduce sources of bias, contamination, or other rating errors.

3. Experience requirements.

The use of past experience to make judgments about future performance, as in this case, is one aspect of a recognized selection device called the biographical inventory technique or, more commonly, "biodata." See Owens, Background Data, in Handbook of Industrial and Organizational Psychology (M. Dunnette ed. 1976). When used properly, biographical inventories which include prior experience are "especially appropriate for assessing the qualifications of women and minority groups." Anastasi, supra note 14, at 616.
To serve this purpose, however, the biodata inventory must focus on "specific, job-relevant past achievements, rather than on the passive exposure implied by the customary education and experience records." Id. Respondent's use of an experience criterion falls far short of this standard.

In fact, use of biodata is relatively likely to produce adverse impact if biodata items are not chosen carefully. See G. Dreher & P. Sackett, Perspectives on Staffing and Selection (1983). A number of comprehensive validity studies have been conducted on the use of biodata as a selection device, concluding that it has inconsistent validity even in a form more comprehensive than used by respondent. See, e.g., Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982).[31] However, as with all selection devices, the most reasonable and justifiable approach in using biodata is to base the choice of items on a well done job analysis, matching the items to the knowledge, skill, and ability requirements of the job description. See Pace & Schoenfeldt, Legal Concerns in the Use of Weighted Applications, 30 Personnel Psychology 159 (1977). Past experience alone, without the careful selection of both logically and empirically justified life-history questions, is unacceptable as a selection device.[32]

[Footnote 31: Compare Reilly & Chao, Validity and Fairness of Some Alternative Employee Selection Procedures, 35 Personnel Psychology 1 (1982) (satisfactory reliability) with Korman, The Prediction of Managerial Performance: A Review, 21 Personnel Psychology 295 (1968) (finding biodata to have lower validity than other predictors for predicting managerial performance).]

[Footnote 32: Sophisticated research on the use of valid biodata is available to inform the interested employer. See, e.g., Brush & Owens, Implementation and Evaluation for an Assessment Classification Model for Manpower Utilization, 32 Personnel Psychology 369 (1979); Schoenfeldt, Utilization of Manpower: Development and Evaluation of Assessment-Classification Model for Matching Individuals with Jobs, 59 J. Applied Psychology 583 (1974); Owens & Schoenfeldt, Toward a Classification of Persons, 46 J. Applied Psychology 329 (1979). See generally Schneider & Schmitt, supra note 12, at 378-382.]

II. THE SUBJECTIVE SELECTION PROCEDURES USED BY RESPONDENT FAIL TO MEET GENERALLY ACCEPTED STANDARDS AND APPEAR TO HAVE BEEN APPLIED WITHOUT ANY EVIDENCE THAT THEY ARE VALID FOR THE INFERENCES DRAWN FROM THEM.

When reviewed in the context of the principles and studies described in Part I, it is clear that the assessment devices used by respondent in this case to select employees, including petitioner, for promotion to the position of teller supervisor are distressingly inadequate.
They have been used to deny promotion to a member of a protected class without any evidence that they were developed, used, and applied in a way consistent with generally accepted professional standards. There is absolutely no evidence that the procedures were subjected to any of the validation strategies available to respondent, even in rudimentary form.

It is not unlawful per se for employers to use so-called "subjective" selection procedures. See Part I(A). But, assuming that the use of subjective criteria was appropriate for the position of supervisor of tellers, respondent failed to perform a job analysis for the position to identify more accurately the knowledges, skills, and abilities which are desirable for successful performance of the job.33 There is no evidence that respondent secured accurate and thorough information about the job from job incumbents, their supervisors, personnel specialists, training manuals, job descriptions, or actual observation by trained observers. See Principles at 5-6, 19-24; Schneider & Schmitt, supra note 12, at 47-50.

With regard to the selection procedures themselves, there are crucial infirmities in each of them. With regard to the interview, there is no evidence that the interviewer, in this case a white male, had any training in conducting interviews, a requirement that is especially important when the interviewee is of a different race and gender than the interviewer.34 Nor was more than one interviewer systematically involved in decisionmaking. There is also no evidence that the interviews were structured so as to improve reliability and reduce biasing errors,35 nor is there any evidence that the nature of the interview or the questions asked had any empirical, logical, or theoretical connection with the position for which petitioner was considered.36

With regard to the rating scales, it was especially important for respondent to have offered some evidence for the validity of its supervisor performance appraisal, as the use of numerical values gives it facial validity and the appearance of objectivity. But, when judged according to the criteria discussed in Part I(B) & (C)(2), the rating scale is deplorable.

33 As noted in Part I, job analysis is a critical first step in establishing the usefulness of any selection procedure. For a review of 26 employment discrimination cases yielding a helpful summary of requirements for judicially-approved job analyses, see Thompson & Thompson, Court Standards for Job Analysis in Test Validation, 35 Personnel Psychology 865 (1982).

34 "The training of interviewers especially with possible applicants of different race or sex may increase 'their ability to relate' . . . ." Schneider & Schmitt, supra note 12, at 386; see Schmitt, Social and Situational Determinants of Interview Decisions: Implications for the Employment Interview, 29 Personnel Psychology 79, 97 (1976). See also Principles at 33: "All persons within the organizations who have responsibilities related to the use of employment tests and related predictors should be qualified through appropriate training to carry out their responsibilities."

35 "Use of a structured interview guide will improve interviewer reliability." Schneider & Schmitt, supra note 12, at 386.

36 "Predictor variables should be chosen for which there is an empirical, logical, or theoretical foundation." Principles at 11.

The qualities measured on the rating scale are neither unambiguously defined nor is there any demonstrable correlation between many of the criteria listed in the scale and successful job performance.37 For example, all but two of the rating criteria used by respondent are totally undefined and the two that have purported definitions (stability and drive) are only vaguely defined; none of the qualities assessed have anchoring or endpoint definitions, as is customary in generally accepted graphic or BARS scales.38 The scale thus failed to use clearly defined individual components or dimensions of job performance, in contrast to undefined global measures, e.g., "neat and clean in appearance" vs. "personal appearance."39 Similarly, it did not use behaviorally based performance dimensions that could be verified by objective, observable evidence, e.g., "knows how to check account balance" vs. "job knowledge."40

The failure to conduct even a rudimentary job analysis further undermines the rating scale's validity. It is far from clear how criteria such as "physical fitness" measure an individual's ability to supervise tellers.

37 The qualities rated were accuracy of work, alertness, personal appearance, supervisor-coworker relations, quantity of work, physical fitness, attendance, dependability, stability ("the ability to withstand pressure and remain calm in most situations"), drive ("ambition"), friendliness and courtesy, and job knowledge. The qualities were variously rated on a scale from 0 to 7-10. See Watson v. Fort Worth Bank & Trust, 798 F.2d 791, 812 n.26 (5th Cir. 1986) (Goldberg, J., dissenting). "Few of these categories have much objective content. For example, 'personal appearance,' 'drive,' and 'friendliness and courtesy' are clearly subjective on their face. . . . The rating system is also subjective: [e.g.,] 0-1, 'does not meet minimum requirement' . . . ; 7-8, 'superior work production record.' This type of subjective measurement lends itself to discriminatory bias, be it conscious or unconscious." Id.

38 See supra note 28.

39 Cascio & Bernardin, Implications of Performance Appraisal Litigation for Personnel Decisions, 34 Personnel Psychology 211, 212 (1981) [hereinafter Cascio & Bernardin].

40 Id.

41 Cascio & Bernardin, supra note 39, at 212.

The constructs or behavior traits identified by respondent such as "drive" or "dependability" could be validated for use in promoting individuals to supervisory teller positions if demonstrated to be job-related and assessed reliably from the performance appraisal. Although the use of such "abstract trait names" is not advised unless the traits can be defined in terms of observable behaviors,41 it may be necessary to measure such personality constructs for certain jobs. If certain traits or constructs are deemed important enough to influence personnel selection, however, they are important enough to measure validly:

Knowing whether a construct is measured validly requires, if not a theory, at least some fairly well articulated ideas about what is being measured, what a measure of the construct should reasonably be expected to be related to and, perhaps more importantly, what it should not be related to. . . . This view of constructs and construct validity implies two aspects of a construct-related strategy for developing evidence to judge the job relatedness of a selection procedure. The first is evidence that the construct is indeed important for job performance. . . .
Ordinarily, a job analysis can provide a part of the basis for identifying and defining constructs which are important to job performance. Clarity of the articulation of the meaning and the nature of the construct, and well-informed expert judgment that a logical relationship exists between the nature of the construct and identifiable demands of the job is essential. The second is evidence that the instrument used as a selection procedure is a valid measure of the construct and not of other constructs.

Principles at 25; see also Standards 9-10.

Finally, there is no indication that the supervisors who performed the ratings had any training in assessing responses or avoiding the variety of sources of contamination or bias, especially when there is significant evidence that scores on rating scales may be affected by the race of rater and ratee. See Part I(C)(2).

With regard to experience requirements, respondent failed to show that the lack of prior experience upon which it based its failure to promote petitioner in her first three attempts was related to the position for which she applied. It could very well be that her experience as a teller would be predictive of her success as a supervisor of tellers, but without an analysis of both positions, that assumption could very well be faulty. In addition, respondent idiosyncratically employed the criterion, using her alleged lack of experience to justify its denying her promotion in three instances and then ignoring the criterion to deny her promotion in a fourth instance, when it promoted a competing applicant with less experience. Perhaps most importantly, respondent failed to use prior experience as one part of a carefully selected and logically and empirically justified comprehensive biographical data inventory.

Although respondent may have fewer resources to devote to assessment device development or validation than larger organizations, that fact does not excuse the absence of any attempt to support its use of the selection devices employed in this case:

Where resources or sample sizes are limited, the criterion-related evidence of validity and content-related validation judgments obtained on similar jobs in other settings and the strength of the construct-related evidence of validity already generated by the [particular instrument] become particularly important. Employers should not be precluded from using a [particular instrument] if it can be demonstrated that th[at instrument] has generated a significant record of validity in similar job settings for highly similar people[,] or that it is otherwise appropriate to generalize from other applications.

Standards at 59.42 Respondent neglected to conduct even this inexpensive inquiry. There is absolutely no legitimate reason why respondent failed to conduct even crude validity studies of the selection devices it used to evaluate petitioner or, at the very least, to investigate the availability of existing sources of valid selection devices. These failures are all the more disturbing considering its poor record in hiring and promoting minority employees.43

42 "Researchers and employers are encouraged to conduct cooperative studies when adequate data . . . are not available." Principles at: The Uniform Guidelines also clearly permit exceptions to the general requirement that employers validate their own procedures. See 29 C.F.R. §§ 1607.6-.8.
43 "[P]laintiff presented 'significant proof' that the bank operated under a 'general policy of discrimination'." Watson v. Fort Worth Bank & Trust, 798 F.2d 791, 807 (5th Cir. 1986) (Goldberg, J., dissenting). See id. at 810-814 for supporting statistical data.

III. THE FAILURE TO REQUIRE THAT SUBJECTIVE SELECTION DEVICES HAVE DEMONSTRABLE VALIDITY WOULD UNDERMINE THE PURPOSES OF TITLE VII.

The underlying goal of Title VII of the Civil Rights Act of 1964 was the "eliminat[ion] . . . [of] discrimination in employment" based on race, color, religion, sex or national origin in all of its forms. H.R. Rep. No. 914, 88 Cong., 2d Sess., reprinted in 1964 U.S. Code Cong. & Ad. News 2391, 2401. Consistent with that goal, Title VII prohibits employers from discriminating in employment decisions based on such impermissible classifications. See 42 U.S.C. § 2000e-2. However, because "Congress did not intend by Title VII . . . to guarantee a job to every person regardless of his qualifications," Griggs v. Duke Power Co., 401 U.S. 424, 430 (1971), it authorized employers to distinguish among individuals for selection or promotion purposes based "upon the results of any professionally developed ability test provided that such test, its administration or action upon the results is not designed, intended or used to discriminate because of race, color, religion, sex or national origin." 42 U.S.C. § 2000e-2(h).44

In permitting the appropriate use of "professionally developed ability tests" as a basis for selecting and promoting individuals, the drafters of the ultimate language of Title VII stressed the importance of demonstrating the relationship and relevance of the selection procedure to job qualifications. See 110 Cong. Rec. 7247 (1964). Recognizing the "u[tility]" of "testing or measuring procedures," this Court has also stressed Congress' intent to "forbid[] . . . giving these devices and mechanisms controlling force unless they are demonstrably a reasonable measure of job performance." Griggs, 401 U.S. at 436. "What Congress . . . commanded is that any tests used must measure the person for the job and not the person in the abstract." Id.

44 "[T]he Act does not command that any person be hired simply because he was formerly the subject of discrimination, or because he is a member of a minority group . . . . What is required by Congress is the removal of artificial, arbitrary, and unnecessary barriers to employment when the barriers operate invidiously to discriminate on the basis of racial or other impermissible factors." Griggs, 401 U.S. at 431.

Proponents of Title VII opposed authorizing the use of "professionally developed ability tests" without regard to their ability to predict performance of the particular job in question. 110 Cong. Rec. at 13504. The use of tests which, although "professionally developed," bear no relation to the job for which they are being used to assess individuals, was clearly recognized as a potential means of covert "[d]iscrimination[,] . . . under the guise of compliance with the statute." Id. Assessment devices which have not been shown to be job-related, or otherwise predictive of performance of a particular job, cannot justify discriminatory employment practices. The crucial public policy goals of Title VII would be thwarted if employers could rebut claims of discrimination simply by pointing to the results of unvalidated assessment devices, whether subjective or objective.
Indeed, such unvalidated results may well reflect precisely the discrimination Congress sought to eliminate in Title VII.

In light of the "nondiscrimination" objectives of Title VII and the demonstrated ability of professionals to validate subjective assessment devices, however, there is no principled reason to treat objective and subjective devices differently in imposing a validation requirement, regardless of whether a plaintiff proceeds under a disparate impact or disparate treatment claim.45 Indeed, permitting the use of unvalidated subjective assessment devices while requiring objective devices to be validated provides a ready mechanism for covert discrimination for employers seeking to avoid the constraints of Title VII. Validation does require the expenditure of both time and money by an employer. But, as amicus has demonstrated, there are a number of readily available techniques for developing, adopting, and validating both objective and subjective devices, and both professional and legal standards allow the use of already developed selection devices. Thus, when one balances the relative costs of validation to the employer against the costs of eroding the protections provided by Title VII and the damage to society of perpetuating the vestiges of discrimination, the outcome clearly favors the requirement that employers use psychometrically sound and job-relevant selection devices.

45 Texas Dep't of Comm. Affairs v. Burdine, 450 U.S. 248 (1981); McDonnell Douglas Corp. v. Green, 411 U.S. 792 (1973).

CONCLUSION

For the foregoing reasons, amicus respectfully requests that this Court reverse the decision of the Court of Appeals for the Fifth Circuit insofar as it releases employers from their obligation to show that the selection devices they use to make employment decisions are valid.

Respectfully submitted,

Donald N. Bersoff
(Counsel of Record)
Laurel Pyke Malson
Donald B. Verrilli, Jr.
Ennis Friedman & Bersoff
1200 - 17th Street, N.W., Suite 400
Washington, D.C. 20036
(202) 775-8100

Attorneys for Amicus Curiae
American Psychological Association

September 14, 1987