Psychological Testing

Testing Intelligence:

[PURPOSES AND METHODS]

Although, as we have seen, theorists and researchers are still working on a definition of intelligence, psychologists, educators, and others who face a very real need to measure intelligence continue to study existing tests and to explore ways of making them yield more useful information. Why do we need to measure intelligence? Intelligence testing has three primary purposes: predicting academic performance, predicting performance on the job, and assessing general adjustment and health.

The earliest intelligence tests were designed to meet the first of these goals, and most existing intelligence tests, such as the Binet and Wechsler scales, predict performance in school better than they predict anything else. Predicting how well a person will succeed at a job is the second goal of intelligence testing, and according to Gottfredson (1997), such measures are the most powerful predictors of overall work performance. A third use of intelligence testing is in assessing people's general adjustment and health. The Binet and Wechsler tests that we discuss in this section can detect signs of neurological problems, mental retardation, and emotional distress in children as well as adults, and the Bayley scales and other infant tests are useful in assessing developmental progress in infants as well as neurological disorders or mental retardation.

Unfortunately, traditional tests do not make predictions as accurately for some groups in our society as for others. Many critics, for example, have pointed out that these tests often require knowledge that children with fewer advantages are unlikely to have and that traditional tests may therefore unfairly classify some people as less intelligent than they actually are. For some years researchers have been attempting to develop what are known as culture-fair tests, that is, tests that attempt to exclude or minimize culturally biased content that could prejudice test takers' responses. The Raven Progressive Matrices test, which requires people to identify, distinguish, and match patterns of varying complexity, and the Kaufman test we discuss shortly are such tests.

We begin this section with a brief discussion of the Bayley scales and then examine the two most widely used sets of traditional tests, the Stanford-Binet tests and the Wechsler scales. Neither set of tests is culture-fair, and both measure the products of intelligence, that is, actual information. Next we examine the relatively new Kaufman assessment measure, which attempts both to be culture-fair and to measure the processes by which people acquire information and solve problems. We then turn to the methodology of test construction, including the ways in which psychologists develop norms for test scoring and the kinds of procedures they use. Finally, we consider the relative stability of intelligence as well as the factors that may effect changes in intelligence over time.

Measuring infant intelligence

The Bayley Scales of Infant Development (Bayley, 1969, 1993) are probably the best known and most widely used of all infant development tests. Because these tests were designed to be used with the very young, their nonverbal test items were chosen for their ability to measure specific developmental milestones. The Bayley scales are used with infants and children between 1 and 3 years of age, and they are generally used to assess children who are suspected to be at risk for abnormal development. For example, the Bayley mental scale includes such items as looking for a hidden object and naming pictures, whereas the motor scale includes such items as grasping ability and jumping skills. In a third part of the test, the examiner observes the child's behavior, making notes about such things as sociability and displays of fear.

The growing emphasis on the processes of intelligence has led some investigators to explore ways of assessing processing skills in the very young as well as to devise measures that are culture-fair. Box 11-1 discusses one of these newer tests, the Fagan Test of Infant Intelligence, which is capable not only of assessing process but also of eliciting similar performance from infants of different cultures. The test also correlates well with measures of intelligence in older children. Such predictive value not only can tell us something about the relative stability of intelligence, a topic we discuss later in this section, but also can serve a diagnostic purpose, indicating the need for special assistance for a child at risk for less than adequate cognitive development (Fagan, 1992).

The Stanford-Binet test

Widely used by psychologists both in schools and in health settings, the Stanford-Binet test is the modern version of the test devised in the early 1900s by Binet and Simon at the request of the Paris school system. At that time, because of new compulsory education laws, the city's schools were overcrowded, and school administrators wanted to identify children who were unable to learn in traditional classroom settings. Fearing that teachers might judge these children unfairly, school officials proposed giving them the opportunity for special education. Binet and Simon, who believed that intelligence was malleable and that children's academic performance could be improved with special programs, took an innovative approach to the construction of their test (Binet, 1909/1973; Siegler, 1992). Critical of earlier psychologists, who had tried to assess intelligence by measuring simple sensory or motor responses, Binet and Simon asserted that to differentiate among individuals, one had to sample higher mental functions such as

Child psychology in action

The Fagan test of infant intelligence

All mothers and fathers know that their babies are destined to win Nobel prizes or to become great leaders, but is it really possible to test intelligence in very young children? The Fagan Test of Infant Intelligence was built on the notion that infants display their intelligence in their capable use of such processes as encoding the attributes of objects, seeing similarities and differences between objects, forming mental representations, and retrieving those representations (Fagan, 1992). Of course, we cannot see an infant's selective attention to novelty, the primary basis of the Fagan test, but we can infer it from the infant's behavior. Suppose, for example, that we show a 5-month-old a red diamond shape; next we show the infant a green diamond and a green square. Typically, the infant will show a preference for the square, indicating that she has processed the element of form from the original color-form compound and is now interested in the new form. (Remember our discussion in Chapter 5 of habituation and the infant's tendency to attend to what is novel in her environment.)

The Fagan test estimates an infant's intelligence by measuring the amount of time the infant spends looking at a new object compared with the time the infant spends looking at a familiar object (Fagan et al., 1991). Using a set of 20 photographs of human faces, arranged in pairs, the examiner begins by showing a baby one photograph of the first pair for 20 seconds. Then the examiner pairs that photograph with its mate, showing the baby the two photos together for 5 seconds, and then again for another 5 seconds, this time reversing the two photos left to right (to avoid any tendency for the infant to choose one side). The score the infant receives is made up of the total time he spends looking at the novel photograph throughout the presentation of all 10 pairs.
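The scoring idea just described can be illustrated with a minimal sketch. It is not the published scoring procedure: the function name, the data layout, and the looking times are hypothetical, and expressing novel looking time as a percentage of all looking time is an assumption added here so that infants who look for different total amounts of time can be compared.

```python
from typing import Iterable, Tuple


def novelty_scores(trials: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Summarize looking times across all photograph pairs.

    Each trial is (seconds on the novel photo, seconds on the familiar photo).
    Returns (total seconds on novel photos, percent of looking time on novel).
    The percentage is an illustrative assumption, not taken from the text.
    """
    novel_total = 0.0
    familiar_total = 0.0
    for novel_seconds, familiar_seconds in trials:
        if novel_seconds < 0 or familiar_seconds < 0:
            raise ValueError("looking times must be non-negative")
        novel_total += novel_seconds
        familiar_total += familiar_seconds

    looking_total = novel_total + familiar_total
    if looking_total == 0:
        raise ValueError("no looking time was recorded")
    return novel_total, 100.0 * novel_total / looking_total


# Hypothetical infant: in each of the 10 pairs, 6 of the 10 test seconds
# (two 5-second presentations) are spent on the novel photograph.
total_novel, percent_novel = novelty_scores([(6.0, 4.0)] * 10)
print(total_novel, percent_novel)  # 60.0 60.0
```

A percentage of 50 would mean that the infant divided attention evenly between the two photographs, so values above 50 indicate a preference for the novel one.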

To determine whether infants from different cultures would be equally adept at this task, Fagan and his colleagues administered the test to groups of European American and African American babies in the United States, a group of infants in Bahrain (near Saudi Arabia), and a group of infants in Kampala, Uganda. Within each cultural group, infants of four ages (6, 8, 18, and 24 months) were tested. The researchers found that there were practically no differences between the average scores obtained by the nearly 200 infants in these groups:

European American babies scored an average of 58.8, African American babies 59.1, Bahraini babies 59.6, and Ugandan babies 58.4. According to Fagan (1992), the most parsimonious explanation for these findings is that differences in IQ between blacks and whites spring from differences in cultural exposure to information (that) influences scores in the domain of cultural knowledge tested (p. 85).

Interestingly, among these infants, 54 percent of European Americans, 49 percent of African Americans, and 26 percent of Bahrainis were considered to be physically at risk; that is, they had spent five or more days in a neonatal intensive care unit. Moreover, among the Ugandan babies, 62 percent had mothers who were HIV positive. The presence of these potentially negative influences may add significance to the authors' statement that there appeared to be no differences in intelligence among the four groups.

Fagan and his associates (Fagan, 1984) have demonstrated the Fagan test's ability to predict intelligence scores at a later age, a quality that greatly enhances the potential usefulness of the test (see also the text discussion of the predictive value of infant testing). In fact, a primary goal of the test is to differentiate normal from cognitively deficient infants. According to Fagan (1992), the test has been shown to be highly sensitive to later evidence of mental retardation, identifying some 85 percent of children who later obtained low IQ scores.

Binet originated the concept of mental age, which is an index of a child's actual performance level as contrasted with her true age. Thus, if a 6-year-old child gets as many items correct as the average 7-year-old, the 6-year-old's mental age is 7; she performs as well as a child 7 years old. The mental age concept was later superseded by the intelligence quotient (IQ), for which the German psychologist William Stern devised the following formula:

IQ = (MA / CA) × 100

where IQ equals mental age (MA) divided by chronological age (CA), multiplied by 100. Thus, if a child's mental age equaled her chronological age, she would be performing like an average child of her true age, and her IQ would be 100. If her performance were superior to that of other children her age, her IQ would be above 100; if it were inferior, her IQ would be less than 100. Today's Stanford-Binet test, a revision of the Binet-Simon measure, has been designed to include items that tap skills other than language and math, such as putting together jigsaw puzzles and making designs with blocks, but the extent to which academic experience influences the latter types of tasks is not completely clear. As we will see next, the Wechsler scales have placed more emphasis on tests that assess so-called performance skills.
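As a worked illustration of the ratio IQ described above (mental age divided by chronological age, multiplied by 100), the following short sketch applies the calculation to the 6-year-old with a mental age of 7 from the earlier example. The function name and argument names are hypothetical, chosen only for this illustration.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age / chronological_age


# A 6-year-old who performs like the average 7-year-old.
print(round(ratio_iq(7, 6), 1))  # 116.7

# A child whose mental age equals her chronological age.
print(ratio_iq(6, 6))  # 100.0
```

Both ages must be expressed in the same unit (years here) for the ratio to be meaningful.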

The Wechsler Scales

The Wechsler intelligence scales, developed by David Wechsler (1952, 1958), include the Wechsler Adult Intelligence Scale (WAIS), the Wechsler Intelligence Scale for Children (WISC), and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI). Although these tests show the influence of Binet's tests, Wechsler designed them specifically to yield separate verbal and performance IQ scores as well as a combined, full-scale IQ score. The descriptions of the WISC subtests shown in table 11- highlight that performance items are somewhat less likely to be influenced by formal education or cultural factors. As a result, a child who has a specific learning problem such as a difficulty with language may do quite well on these items, even if he performs poorly on the verbal subtests. Children who come from homes that lack some of the advantages other children enjoy may also be more successful on these performance tests.

Rather than use mental age as a basis for estimating intelligence, Wechsler created the deviation IQ, which is a number that reflects the higher, lower, or similar position of the test taker's score in relation to the score obtained by an average child of the test taker's age. How is this different from the concept of mental age? The deviation IQ scoring system, which like the Binet IQ takes 100 as an average score, is based on extensive testing of people of different ages in many parts of the United States and on the statistical computation of mean scores for each age group. In computing these average scores, psychologists use a statistic called the standard deviation to signal the extent to which nonaverage scores deviate from the norm. As a result, an individual's score may be at the mean, or it may be one or more standard deviations above or below the mean.
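The logic of the deviation IQ can be sketched as follows. This is an illustration under stated assumptions rather than the scoring procedure of any actual test: the function name and the norm values are hypothetical, and the scaling of 15 IQ points per standard deviation follows the common Wechsler convention, which the passage above does not itself state.

```python
def deviation_iq(raw_score: float, age_mean: float, age_sd: float,
                 iq_mean: float = 100.0, iq_sd: float = 15.0) -> float:
    """Place a raw score relative to the norms for the test taker's age group.

    age_mean and age_sd are the mean and standard deviation of raw scores
    in the age-group norming sample (hypothetical values in this sketch).
    iq_sd=15 is an assumed convention, not taken from the text.
    """
    if age_sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Number of standard deviations above or below the age-group mean.
    z = (raw_score - age_mean) / age_sd
    return iq_mean + iq_sd * z


# Hypothetical norms for one age group: mean raw score 50, SD 10.
print(deviation_iq(50, 50, 10))  # 100.0, exactly at the mean
print(deviation_iq(60, 50, 10))  # 115.0, one SD above the mean
print(deviation_iq(40, 50, 10))  # 85.0, one SD below the mean
```

Unlike the ratio IQ, this score depends only on how far the test taker's performance lies from the average for her own age group, so no mental age needs to be estimated.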

The Kaufman Assessment Battery for Children

The first test to include a specific focus on processing skills, the Kaufman Assessment Battery for Children (K-ABC), is based directly on cognitive development research (Kaufman & Kaufman, 1983). The test measures several types of information-processing skills grouped into two categories: sequential processing (solving problems in a step-by-step fashion) and simultaneous processing (examining and integrating a wide variety of materials in the solution of a problem). The test also assesses achievement in academic subjects, such as vocabulary and arithmetic, yet its test items (many nonverbal) are intended to be culture-fair. In fact, the designers of this test used a wide and representative sample of many American cultural and socioeconomic groups in establishing norms for the test.

An interesting innovation is that if a child fails early items on a subscale, K-ABC examiners teach the child how to complete these items before they administer the rest of the subtest. According to the designers of the test, this ensures that no child who is capable of learning an unfamiliar task receives a failing score on it. The Kaufman test has been criticized for offering only a limited range of items that tap information-processing functions, such as short-term memory (Conoley, 1990; Sternberg, 1985). Critics have argued that the test should focus more on problem-solving strategies, a central component of the information-processing approach to intelligence. Nevertheless, the Kaufman test is a creative approach to the measurement of intelligence that applies important insights from the information-processing approach to the study of intelligence and intelligence testing.