The measurement of intelligence
Chapter 25
NATURE OF THE STANFORD REVISION AND EXTENSION
Although the Binet scale quickly demonstrated its value as an instrument for the classification of mentally-retarded and otherwise exceptional children, it had, nevertheless, several imperfections which greatly limited its usefulness. There was a dearth of tests at the higher mental levels, the procedure was so inadequately defined that needless disagreement came about in the interpretation of data, and so many of the tests were misplaced as to make the results of an examination more or less misleading, particularly in the case of very young subjects and those near the adult level. It was for the purpose of correcting these and certain other faults that the Stanford investigation was planned.[15]
[15] The writer wishes to acknowledge his very great indebtedness to Miss Grace Lyman, Dr. George Ordahl, Dr. Louise Ellison Ordahl, Miss Neva Galbreath, Mr. Wilford Talbert, Dr. J. Harold Williams, Mr. Herbert E. Knollin, and Miss Irene Cuneo for their coöperation in making the tests on which the Stanford revision is chiefly based. Without their loyal assistance the investigation could not have been carried through.
Grateful acknowledgment is also made to the many public school teachers and principals for their generous and invaluable coöperation in furnishing subjects for the tests, and in supplying, sometimes at considerable cost of labor, the supplementary information which was called for regarding the pupils tested. Their contribution was made in the interest of educational science, and without expectation of personal benefits of any kind. Their professional spirit cannot be too highly commended.
SOURCES OF DATA. Our revision is the result of several years of work, and involved the examination of approximately 2300 subjects, including 1700 normal children, 200 defective and superior children, and more than 400 adults.
Tests of 400 of the 1700 normal children had been made by Childs and Terman in 1910-11, and of 300 children by Trost, Waddle, and Terman in 1911-12. For various reasons, however, the results of these tests did not furnish satisfactory data for a thoroughgoing revision of the scale. Accordingly a new investigation was undertaken, somewhat more extensive than the others, and more carefully planned. Its main features may be described as follows:--
1. The first step was to assemble as nearly as possible all the results which had been secured for each test of the scale by all the workers of all countries. The result was a large sheet of tabulated data for each individual test, including percentages passing the test at various ages, conditions under which the results were secured, method of procedure, etc. After a comparative study of these data, and in the light of results we had ourselves secured, a provisional arrangement of the tests was prepared for try-out.
2. In addition to the tests of the original Binet scale, 40 additional tests were included for try-out. This, it was expected, would make possible the elimination of some of the least satisfactory tests, and at the same time permit the addition of enough new ones to give at least six tests, instead of five, for each age group.
3. A plan was then devised for securing subjects who should be as nearly as possible representative of the several ages. The method was to select a school in a community of average social status, a school attended by all or practically all the children in the district where it was located. In order to get clear pictures of age differences the tests were confined to children who were within two months of a birthday. To avoid accidental selection, _all_ the children within two months of a birthday were tested, in whatever grade enrolled. Tests of foreign-born children, however, were eliminated in the treatment of results. There remained tests of approximately 1000 children, of whom 905 were between 5 and 14 years of age.
4. The children's responses were, for the most part, recorded _verbatim_. This made it possible to re-score the records according to any desired standard, and thus to fit a test more perfectly to the age level assigned it.
5. Much attention was given to securing uniformity of procedure. A half-year was devoted to training the examiners and another half-year to the supervision of the testing. In the further interests of uniformity all the records were scored by one person (the writer).
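The age-sampling rule of step 3 can be sketched in a few lines. This is only an illustration of the rule as stated: the function name `near_birthday`, the recording of age in completed months, and the inclusive treatment of the two-month limit are all assumptions, and the pupil list is hypothetical.

```python
def near_birthday(age_in_months: int, window_months: int = 2) -> bool:
    """Return True if the age lies within `window_months` of a birthday.

    Assumption: age is recorded in completed months, so 71 months is
    one month short of the sixth birthday.
    """
    months_past = age_in_months % 12           # months since the last birthday
    months_until = (12 - months_past) % 12     # months to the next birthday
    return min(months_past, months_until) <= window_months


# Hypothetical enrolment list: (pupil label, age in months).
pupils = [("A", 60), ("B", 65), ("C", 71), ("D", 86), ("E", 90)]

# Every pupil near a birthday is kept, whatever the grade enrolled in.
selected = [name for name, age in pupils if near_birthday(age)]
```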
METHOD OF ARRIVING AT A REVISION. The revision of the scale below the 14-year level was based almost entirely on the tests of the above-mentioned 1000 unselected children. The guiding principle was to secure an arrangement of the tests and a standard of scoring which would cause the median mental age of the unselected children of each age group to coincide with the median chronological age. That is, a correct scale must cause the _average_ child of 5 years to test exactly at 5, the _average_ child of 6 to test exactly at 6, etc. Or, to express the same fact in terms of intelligence quotient,[16] a correct scale must give a median intelligence quotient of unity, or 100 per cent, for unselected children of each age.
[16] The intelligence quotient (often designated as I Q) is the ratio of mental age to chronological age. (See pp. 65 _ff._ and 78 _ff._)
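The ratio defined in the footnote can be written out as a short computation. The function name and the choice of months as the unit are assumptions made for the illustration; only the ratio itself, expressed as a percentage, comes from the text.

```python
def intelligence_quotient(mental_age_months: float,
                          chronological_age_months: float) -> float:
    """Ratio of mental age to chronological age, expressed as a percentage."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# A child of 8 years (96 months) who tests at 10 years (120 months).
example_iq = intelligence_quotient(120, 96)

# A child who tests exactly at age has a quotient of unity, or 100 per cent.
at_age_iq = intelligence_quotient(60, 60)
```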
If the median mental age resulting at any point from the provisional arrangement of tests was too high or too low, it was only necessary to change the location of certain of the tests, or to change the standard of scoring, until an order of arrangement and a standard of passing were found which would throw the median mental age where it belonged. We had already become convinced, for reasons too involved for presentation here, that no satisfactory revision of the Binet scale was possible on any theoretical considerations as to the percentage of passes which an individual test ought to show in a given year in order to be considered standard for that year.
As was to be expected, the first draft of the revision did not prove satisfactory. The scale was still too hard at some points, and too easy at others. In fact, three successive revisions were necessary, involving three separate scorings of the data and as many tabulations of the mental ages, before the desired degree of accuracy was secured. As finally revised, the scale gives a median intelligence quotient closely approximating 100 for the unselected children of each age from 4 to 14.
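The check applied after each provisional arrangement can be sketched as follows. The data are hypothetical and the function name is an assumption. For simplicity each child is treated as standing exactly at a birthday, whereas the children actually tested were within two months of one.

```python
from statistics import median


def calibration_offsets(mental_ages_by_age):
    """Median mental age minus chronological age, in months, for each age group.

    A positive offset suggests the scale is too easy at that point,
    a negative offset that it is too hard.
    """
    return {
        age_years: median(mental_ages) - 12 * age_years
        for age_years, mental_ages in mental_ages_by_age.items()
    }


# Hypothetical mental ages (in months) from one provisional arrangement.
provisional = {
    5: [58, 63, 66, 70, 61],
    6: [70, 72, 74, 69, 75],
    7: [80, 78, 83, 85, 76],
}

offsets = calibration_offsets(provisional)
# Here the 5-year group tests 3 months high and the 7-year group 4 months low,
# so tests near those levels would be moved or re-scored and the data re-tabulated.
```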
Since our school children who were above 14 years and still in the grades were retarded left-overs, it was necessary to base the revision above this level on the tests of adults. These included 30 business men and 150 "migrating" unemployed men tested by Mr. H. E. Knollin, 150 adolescent delinquents tested by Mr. J. Harold Williams, and 50 high-school students tested by the writer.
The extension of the scale in the upper range is such that ordinarily intelligent adults, little educated, test up to what is called the "average adult" level. Adults whose intelligence is known from other sources to be superior are found to test well up toward the "superior adult" level, and this holds whether the subjects in question are well educated or practically unschooled. The almost entirely unschooled business men, in fact, tested fully as well as high-school juniors and seniors.
Figure 1 shows the distribution of mental ages for 62 adults, including the 30 business men and the 32 high-school pupils who were over 16 years of age. It will be noted that the middle section of the graph represents the "mental ages" falling between 15 and 17. This is the range which we have designated as the "average adult" level. Those above 17 are called "superior adults," those between 13 and 15, "inferior adults." Subjects much over 15 years of age who test in the neighborhood of 12 years may ordinarily be considered border-line cases.
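The adult levels named above amount to a simple lookup on mental age. The sketch below is an illustration only: the function name is an assumption, and so is the handling of the exact boundary values 13, 15, and 17, which the text does not settle.

```python
def adult_level(mental_age_years: float) -> str:
    """Label an adult's mental age with the level names used in the text.

    Assumption: 15 and 17 are counted as "average adult" and 13 as
    "inferior adult"; the text leaves the boundary cases open.
    """
    if mental_age_years > 17:
        return "superior adult"
    if mental_age_years >= 15:
        return "average adult"
    if mental_age_years >= 13:
        return "inferior adult"
    # The text treats results in the neighborhood of 12 as ordinarily border-line.
    return "below inferior adult"


levels = [adult_level(age) for age in (17.5, 16, 14, 12)]
```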
The following method was employed for determining the validity of a test. The children of each age level were divided into three groups according to intelligence quotient, those testing below 90, those between 90 and 109, and those with an intelligence quotient of 110 or above. The percentages of passes on each individual test at or near that age level were then ascertained separately for these three groups. If a test fails to show a decidedly higher proportion of passes in the superior I Q group than in the inferior I Q group, it cannot be regarded as a satisfactory test of intelligence. On the other hand, a test which satisfies this criterion must be accepted as valid or the entire scale must be rejected. Henceforth it stands or falls with the scale as a whole.
When tried out by this method, some of the tests which have been most criticized showed a high degree of reliability; certain others which have been considered excellent proved to be so little correlated with intelligence that they had to be discarded.
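The validity check described above can be sketched as a comparison of pass percentages across the three intelligence-quotient groups. The records are hypothetical, and the names are assumptions. In particular the 20-point gap used for "decidedly higher" is an arbitrary placeholder, since the text gives no numerical threshold.

```python
def pass_rates_by_iq_group(records):
    """Percentage passing one test in each of the three I Q groups.

    `records` is an iterable of (iq, passed) pairs for children at or
    near the age level of the test.
    """
    counts = {"below 90": [0, 0], "90-109": [0, 0], "110 or above": [0, 0]}
    for iq, passed in records:
        if iq < 90:
            group = "below 90"
        elif iq < 110:
            group = "90-109"
        else:
            group = "110 or above"
        counts[group][0] += int(passed)   # number passing
        counts[group][1] += 1             # number tested
    return {
        group: (100.0 * passes / tested if tested else None)
        for group, (passes, tested) in counts.items()
    }


def discriminates(rates, min_gap=20.0):
    """True if the superior group passes decidedly more often than the inferior.

    Assumption: `min_gap` (in percentage points) is a hypothetical threshold.
    """
    low, high = rates["below 90"], rates["110 or above"]
    return low is not None and high is not None and high - low >= min_gap


# Hypothetical records for a single test: (I Q, passed?).
records = [
    (82, False), (85, False), (88, True),
    (95, True), (100, False), (104, True), (108, True),
    (112, True), (118, True), (125, True),
]
rates = pass_rates_by_iq_group(records)
keep_test = discriminates(rates)
```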
After making a few necessary eliminations, 90 tests remained, or 36 more than the number included in the Binet 1911 scale. There are 6 at each age level from 3 to 10, 8 at 12, 6 at 14, 6 at "average adult," 6 at "superior adult," and 16 alternative tests. The alternative tests, which are distributed among the different groups, are intended to be used only as substitutes when one or more of the regular tests have been rendered, by coaching or otherwise, undesirable.[17]
[17] See p. 137 _ff._ for explanations regarding the calculation of mental age and the use of alternative tests.
Of the 36 new tests, 27 were added and standardized in the various Stanford investigations. Two tests were borrowed from the Healy-Fernald series, one from Kuhlmann, one was adapted from Bonser, and the remaining five were amplifications or adaptations of some of the earlier Binet tests.
Following is a complete list of the tests of the Stanford revision. Those designated _al._ are alternative tests. The guide for giving and scoring the tests is presented at length in Part II of this volume.
_The Stanford revision and extension_
_Year III._ (_6 tests, 2 months each._)
  1. Points to parts of body. (3 to 4.) Nose; eyes; mouth; hair.
  2. Names familiar objects. (3 to 5.) Key, penny, closed knife, watch, pencil.
  3. Pictures, enumeration or better. (At least 3 objects enumerated in one picture.) (a) Dutch Home; (b) River Scene; (c) Post-Office.
  4. Gives sex.
  5. Gives last name.
  6. Repeats 6 to 7 syllables. (1 to 3.)
  Al. Repeats 3 digits. (1 success in 3 trials. Order correct.)
_Year IV._ (_6 tests, 2 months each._)
  1. Compares lines. (3 trials, no error.)
  2. Discrimination of forms. (Kuhlmann.) (Not over 3 errors.)
  3. Counts 4 pennies. (No error.)
  4. Copies square. (Pencil. 1 to 3.)
  5. Comprehension, 1st degree. (2 to 3.) (Stanford addition.) "What must you do": "When you are sleepy?" "Cold?" "Hungry?"
  6. Repeats 4 digits. (1 to 3. Order correct.) (Stanford addition.)
  Al. Repeats 12 to 13 syllables. (1 to 3 absolutely correct, or 2 with 1 error each.)
_Year V._ (_6 tests, 2 months each._)
  1. Comparison of weights. (2 to 3.) 3-15; 15-3; 3-15.
  2. Colors. (No error.) Red; yellow; blue; green.
  3. Æsthetic comparison. (No error.)
  4. Definitions, use or better. (4 to 6.) Chair; horse; fork; doll; pencil; table.
  5. Patience, or divided rectangle. (2 to 3 trials. 1 minute each.)
  6. Three commissions. (No error. Order correct.)
  Al. Age.
_Year VI._ (_6 tests, 2 months each._)
  1. Right and left. (No error.) Right hand; left ear; right eye.
  2. Mutilated pictures. (3 to 4 correct.)
  3. Counts 13 pennies. (1 to 2 trials, without error.)
  4. Comprehension, 2d degree. (2 to 3.) "What's the thing for you to do": (a) "If it is raining when you start to school?" (b) "If you find that your house is on fire?" (c) "If you are going some place and miss your car?"
  5. Coins. (3 to 4.) Nickel; penny; quarter; dime.
  6. Repeats 16 to 18 syllables. (1 to 3 absolutely correct, or 2 with 1 error each.)
  Al. Morning or afternoon.
_Year VII._ (_6 tests, 2 months each._)
  1. Fingers. (No error.) Right; left; both.
  2. Pictures, description or better. (Over half of performance description:) Dutch Home; River Scene; Post-Office.
  3. Repeats 5 digits. (1 to 3. Order correct.)
  4. Ties bow-knot. (Model shown. 1 minute.) (Stanford addition.)
  5. Gives differences. (2 to 3.) Fly and butterfly; stone and egg; wood and glass.
  6. Copies diamond. (Pen. 2 to 3.)
  Al. 1. Names days of week. (Order correct. 2 to 3 checks correct.)
  Al. 2. Repeats 3 digits backwards. (1 to 3.)
_Year VIII._ (_6 tests, 2 months each._)
  1. Ball and field. (Inferior plan or better.) (Stanford addition.)
  2. Counts 20 to 1. (40 seconds. 1 error allowed.)
  3. Comprehension, 3d degree. (2 to 3.) "What's the thing for you to do": (a) "When you have broken something which belongs to some one else?" (b) "When you are on your way to school and notice that you are in danger of being tardy?" (c) "If a playmate hits you without meaning to do it?"
  4. Gives similarities, two things. (2 to 4.) (Stanford addition.) Wood and coal; apple and peach; iron and silver; ship and automobile.
  5. Definitions superior to use. (2 to 4.) Balloon; tiger; football; soldier.
  6. Vocabulary, 20 words. (Stanford addition. For list of words used, see record booklet.)
  Al. 1. First six coins. (No error.)
  Al. 2. Dictation. ("See the little boy." Easily legible. Pen. 1 minute.)
_Year IX._ (_6 tests, 2 months each._)
  1. Date. (Allow error of 3 days in _c_, no error in _a_, _b_, or _d_.) (a) day of week; (b) month; (c) day of month; (d) year.
  2. Weights. (3, 6, 9, 12, 15. Procedure not illustrated. 2 to 3.)
  3. Makes change. (2 to 3. No coins, paper, or pencil.) 10--4; 15--12; 25--4.
  4. Repeats 4 digits backwards. (1 to 3.) (Stanford addition.)
  5. Three words. (2 to 3. Oral. 1 sentence or not over 2 coördinate clauses.) Boy, river, ball; work, money, men; desert, rivers, lakes.
  6. Rhymes. (3 rhymes for two of three words. 1 minute for each part.) Day; mill; spring.
  Al. 1. Months. (15 seconds and 1 error in naming. 2 checks of 3 correct.)
  Al. 2. Stamps, gives total value. (Second trial if individual values are known.)
_Year X._ (_6 tests, 2 months each._)
  1. Vocabulary, 30 words. (Stanford addition.)
  2. Absurdities. (4 to 5. Warn. Spontaneous correction allowed.) (Four of Binet's, one Stanford.)
  3. Designs. (1 correct, 1 half correct. Expose 10 seconds.)
  4. Reading and report. (8 memories. 35 seconds and 2 mistakes in reading.) (Binet's selection.)
  5. Comprehension, 4th degree. (2 to 3. Question may be repeated.) (a) "What ought you to say when some one asks your opinion about a person you don't know very well?" (b) "What ought you to do before undertaking (beginning) something very important?" (c) "Why should we judge a person more by his actions than by his words?"
  6. Names 60 words. (Illustrate with clouds, dog, chair, happy.)
  Al. 1. Repeats 6 digits. (1 to 2. Order correct.) (Stanford addition.)
  Al. 2. Repeats 20 to 22 syllables. (1 to 3 correct, or 2 with 1 error each.)
  Al. 3. Form board. (Healy-Fernald Puzzle A. 3 times in 5 minutes.)
_Year XII._ (_8 tests, 3 months each._)
  1. Vocabulary, 40 words. (Stanford addition.)
  2. Abstract words. (3 to 5.) Pity; revenge; charity; envy; justice.
  3. Ball and field. (Superior plan.) (Stanford addition.)
  4. Dissected sentences. (2 to 3. 1 minute each.)
  5. Fables. (Score 4; i.e., two correct or the equivalent in half credits.) (Stanford addition.) Hercules and Wagoner; Maid and Eggs; Fox and Crow; Farmer and Stork; Miller, Son, and Donkey.
  6. Repeats 5 digits backwards. (1 to 3.) (Stanford addition.)
  7. Pictures, interpretation. (3 to 4. "Explain this picture.") Dutch Home; River Scene; Post-Office; Colonial Home.
  8. Gives similarities, three things. (3 to 5.) (Stanford addition.) Snake, cow, sparrow; book, teacher, newspaper; wool, cotton, leather; knife-blade, penny, piece of wire; rose, potato, tree.
_Year XIV._ (_6 tests, 4 months each._)
  1. Vocabulary, 50 words. (Stanford addition.)
  2. Induction test. (Gets rule by 6th folding.) (Stanford addition.)
  3. President and king. (Power; accession; tenure. 2 to 3.)
  4. Problems of fact. (2 to 3.) (Binet's two and one Stanford addition.)
  5. Arithmetical reasoning. (1 minute each. 2 to 3.) (Adapted from Bonser.)
  6. Clock. (2 to 3. Error must not exceed 3 or 4 minutes.) 6.22. 8.10. 2.46.
  Al. Repeats 7 digits. (1 to 2. Order correct.)
"AVERAGE ADULT." (_6 tests, 5 months each._) 1. Vocabulary, 65 words. (Stanford addition.) 2. Interpretation of fables. (Score 8.) (Stanford addition.) 3. Difference between abstract words. (3 real contrasts out of 4.) Laziness and idleness; evolution and revolution; poverty and misery; character and reputation. 4. Problem of the enclosed boxes. (3 to 4.) (Stanford addition.) 5. Repeats 6 digits backwards. (1 to 3.) (Stanford addition.) 6. Code, writes "Come quickly." (2 errors. Omission of dot counts half error. Illustrate with "war" and "spy.") (From Healy and Fernald.) Al. 1. Repeats 28 syllables. (1 to 2 absolutely correct.) Al. 2. Comprehension of physical relations. (2 to 3.) (Stanford addition.) Path of cannon ball; weight of fish in water; hitting distant mark.
"SUPERIOR ADULT." (_6 tests, 6 months each._) 1. Vocabulary, 75 words. (Stanford addition.) 2. Binet's paper-cutting test. (Draws, folds, and locates holes.) 3. Repeats 8 digits. (1 to 3. Order correct.) (Stanford addition.) 4. Repeats thought of passage heard. (1 to 2.) (Binet's and Wissler's selections adapted.) 5. Repeats 7 digits backwards. (1 to 3.) (Stanford addition.) 6. Ingenuity test. (2 to 3. 5 minutes each.) (Stanford addition.)
SUMMARY OF CHANGES. A comparison of the above list with either the Binet 1908 or 1911 series will reveal many changes. On the whole, it differs somewhat more from the Binet 1911 scale than from that of 1908. Thus, of the 49 tests below the "adult" group in the 1911 scale, 2 are eliminated and 29 are relocated. Of these, 25 are moved downward and 4 upward. The shifts are as follows:--
  Down 1 year     18
  Down 2 years     4
  Down 3 years     2
  Down 6 years     1
  Up 1 year        3
  Up 2 years       1
Of the adult group in Binet's 1911 series, 1 is eliminated, 2 are moved up to "superior adult," and 1 is moved down to 14. Accordingly, of Binet's entire 54 tests, we have eliminated 3 and relocated 32, leaving only 19 in the positions assigned them by Binet. The 3 eliminated are: repeating 2 digits, resisting suggestion, and "reversed triangle."
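The figures just given can be checked against one another with a short tally. The counts are those stated in the text; the variable names are merely illustrative.

```python
# Shifts among the 49 tests below the "adult" group: (direction, years) -> count.
shifts_below_adult = {
    ("down", 1): 18,
    ("down", 2): 4,
    ("down", 3): 2,
    ("down", 6): 1,
    ("up", 1): 3,
    ("up", 2): 1,
}

moved_down = sum(n for (direction, _), n in shifts_below_adult.items()
                 if direction == "down")          # 25
moved_up = sum(n for (direction, _), n in shifts_below_adult.items()
               if direction == "up")              # 4
relocated_below_adult = moved_down + moved_up     # 29

# Adult group of the 1911 series: 1 eliminated, 3 relocated.
eliminated = 2 + 1                                # 3
relocated_total = relocated_below_adult + 3       # 32

total_binet_tests = 54
unchanged = total_binet_tests - eliminated - relocated_total   # 19
```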
The revision is really more extensive than the above figures would suggest, since minor changes have been made in the scoring of a great many tests in order to make them fit better the locations assigned them. Throughout the scale the procedure and scoring have been worked over and made more definite with the idea of promoting uniformity. This phase of the revision is perhaps more important than the mere relocation of tests. Also, the addition of numerous tests in the upper ranges of the scale affects very considerably the mental ages above the level of 10 or 11 years.
EFFECTS OF THE REVISION ON THE MENTAL AGES SECURED. The most important effect of the revision is to reduce the mental ages secured in the lower ranges of the scale, and to raise considerably the mental ages above 10 or 11 years. This difference also obtains, though to a somewhat smaller extent, between the Stanford revision and those of Goddard and Kuhlmann.
For example, of 104 adult individuals testing by the Stanford revision between 12 and 14 years, and who were therefore somewhat above the level of feeble-mindedness as that term is usually defined, 50 per cent tested below 12 years by the Goddard revision. That the dull and border-line adults are so much more readily distinguished from the feeble-minded by the Stanford revision than by other Binet series is due as much to the addition of tests in the upper groups as to the relocation of existing tests.
On the other hand, the Stanford revision causes young subjects to test lower than any other version of the Binet scale. At 5 or 6 years the mental ages secured by the Stanford revision average from 6 to 10 months lower than other revisions yield.
The above differences are more significant than would at first appear. An error of 10 months in the mental age of a 5-year-old is as serious as an error of 20 months in the case of a 10-year-old. Stating the error in terms of the intelligence quotient makes it more evident. Thus, an error of 10 months in the mental age of a 5-year-old (10 months in 60) means an error of nearly 17 per cent in the intelligence quotient. A scale which is this much too easy would cause the child with a true intelligence quotient of 75 (which ordinarily means feeble-mindedness or border-line intelligence) to test at about 92, or only slightly below normal.
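The comparison of errors at different ages follows directly from the definition of the intelligence quotient as a ratio. A minimal sketch, with an assumed function name, shows why a given error in months weighs twice as heavily at 5 years as at 10:

```python
import math


def iq_error(mental_age_error_months: float, chronological_age_years: float) -> float:
    """Error in the intelligence quotient (in per cent) caused by a given
    error in mental age, for a child of the stated chronological age."""
    return 100.0 * mental_age_error_months / (12 * chronological_age_years)


error_at_5 = iq_error(10, 5)     # 10 months out of 60
error_at_10 = iq_error(20, 10)   # 20 months out of 120

# The two errors are the same fraction of the chronological age.
equally_serious = math.isclose(error_at_5, error_at_10)
```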
Three serious consequences came from the too great ease of the original Binet scale at the lower end, and its too great difficulty at the upper end:--
1. In young subjects the higher grades of mental deficiency were overlooked, because the scale caused such subjects to test only a little below normal.
2. The proportion of feeble-mindedness among adult subjects was greatly overestimated, because subjects who were really of the 12- or 13-year mental level could only earn a mental age of about 11 years.
3. Confusion resulted in efforts to trace the mental growth of either feeble-minded or normal children. For example, by other versions of the Binet scale an average 5-year-old will show an intelligence quotient probably not far from 110 or 115; at 9, an intelligence quotient of about 100; and at 14, an intelligence quotient of about 85 or 90.
By such a scale the true border-line case would test approximately as follows:--
  At age 5, 90 I Q (apparently not far below normal).
  At age 9, 75 I Q (border-line).
  At age 14, 65 I Q (moron deficiency).
On the other hand, re-tests of children by the Stanford revision have been found to yield intelligence quotients almost identical with those secured from two to four years earlier by the same tests. Those who graded feeble-minded in the first test graded feeble-minded in the second test: the dull remained dull, the average remained average, the superior remained superior, and always in approximately the same degree.[18]
[18] See "Some Problems relating to the Detection of Border-line Cases of Mental Deficiency," by Lewis M. Terman and H. E. Knollin, in _Journal of Psycho-Asthenics_, June, 1916.
It is unnecessary to emphasize further the importance of having an intelligence scale which is equally accurate at all points. Absolute perfection in this respect is not claimed for the Stanford revision, but it is believed to be at least free from the more serious errors of other Binet arrangements.