Ortiz Samuel O, Cehelyk Sarah K
Department of Psychology, St. John's University, Jamaica, NY 11439, USA.
J Intell. 2023 Dec 31;12(1):3. doi: 10.3390/jintelligence12010003.
A fundamental concept in psychological and intelligence testing involves the assumption of comparability in which performance on a test is compared to a normative standard derived from prior testing on individuals who are comparable to the examinee. When evaluating cognitive abilities, the primary variable used for establishing comparability and, in turn, validity is age, given that intellectual abilities develop largely as a function of general physical growth and neuromaturation. When an individual has been raised only in the language of the test, language development is effectively controlled by age. For example, when measuring vocabulary, a 12-year-old will be compared only to other 12-year-olds, all of whom have been learning the language of the test for approximately 12 years; hence, they remain comparable. The same cannot be said when measuring the same or other abilities in a 12-year-old who has been raised only in a different language or raised partly with a different language and partly with the language of the test. In such cases, a 12-year-old may have begun learning the language of the test shortly after birth, or they might have begun learning it only a week ago. Their respective development in the language of the test thus varies considerably, and it can no longer be assumed that they are comparable in this respect to others simply because they are of the same age. Psychologists noted early on that language differences could affect test performance, but it was viewed mostly as an issue regarding basic comprehension. Early efforts were made to address this issue, which typically involved simplification of the instructions or reliance on mostly nonverbal methods of administration and measurement. Other procedures that followed included working around language via test modifications or alterations (e.g., use of an interpreter), testing in the dominant language, or use of tests translated into other languages.
None of these approaches, however, have succeeded in establishing validity and fairness in the testing of multilinguals, primarily because they fail to recognize that language difference is not the same as language development, much like cultural difference is not the same as acquisition of acculturative knowledge. Current research demonstrates that the test performance of multilinguals is moderated primarily by the amount of exposure to and development in the language of the test. Moreover, language development, specifically receptive vocabulary, accounts for more variance in test performance than age or any other variable. There is further evidence that when the influence of differential language development is examined and controlled, historical attributions to race-based performance disappear. Advances in fairness in the testing of multilinguals rest on true peer comparisons that control for differences in language development within and among multilinguals. The BESA and the Ortiz PVAT are the only two examples where norms have been created that control for both age and degree of development in the language(s) of the test. Together, they provide a blueprint for future tests and test construction wherein the creation of true peer norms is possible and, when done correctly, exhibits significant influence in equalizing test performance across diverse groups, irrespective of racial/ethnic background or language development. Current research demonstrates convincingly that with deliberate and careful attention to differences that exist, not only between monolinguals and multilinguals of the same age but also among multilinguals themselves, tests can be developed to support claims of validity and fairness for use with individuals who were in fact not raised exclusively in the language or the culture of the test.