Test Tools

 


Achievement Test Grid *

Ron Dumont
John Willis

Tests with similar purposes and even similar names are not interchangeable.  Differences in norms (who was tested, where, and how long ago), content, format, and administration rules (e.g., time limits, bonus points for speed, rules for determining which items are administered to each examinee) can yield very different scores for the same individual on two tests that superficially seem very similar to each other.  See, for example, Bracken (1988) and Schultz (1988).

Tests that purport to measure the same general ability or skill may sample different component skills. For example, if a student has word-finding difficulties, a reading comprehension test that requires recall of one, specific correct word to complete a sentence with a missing word (cloze technique) might be much more difficult than an otherwise comparable reading comprehension test that offers multiple choices from which to select the correct missing word for the sentence (maze technique) or a test with open-ended comprehension questions.  Similarly, for a student with adequate reading comprehension but weak short-term memory, the difference between answering questions about a reading passage that remains in view and answering questions after the passage has been removed could make a dramatic difference in scores.  The universal question of students taking classroom tests – “Does spelling count?” – certainly applies to interpretation of formal, normed tests of written expression.

Such differences in format make little difference in average scores for large groups of examinees, so achievement-test manuals usually report robust correlations between various achievement tests despite differences in test format and specific skills sampled (see, for example, McGrew, 1999, or the validity section of any test manual).  [These issues also apply to tests of cognitive ability, where they also receive insufficient attention.]

Examiners need to select achievement tests that measure the skills relevant to the referral concerns and to avoid or carefully interpret tests that measure some ability (such as word-finding or memory) that would distort direct measurement of the intended achievement skill.  When selecting or interpreting a test, think about the actual demands imposed on the examinee, the referral concerns and questions, and what you know about your examinee’s skills and weaknesses. 

The tables for selected reading, phonological, and writing tests may be found in the complete Word file below.  They are based on those developed by Sara Brody (2001) and are being corrected over time by Ron Dumont, Laurie Farr Hanks, Melissa Farrall, Sue Morbey, and John Willis.

Achievement Test Grid 3.7.12

They are only rough summaries, but they at least illustrate the issue of test content and format and may possibly help guide test selection.  For extensive information on achievement tests, please see the test manuals, the publishers’ Web pages for the tests, the Buros Institute Test Reviews Online, and the Achievement Test Desk Reference (Flanagan, Ortiz, Alfonso, & Mascolo, 2006).

__________________________________

Bracken, B. A. (1988).  Ten psychometric reasons why similar tests produce dissimilar results.  Journal of School Psychology, 26(2), 155-166.

Brody, S. (Ed.) (2001).  Teaching reading: Language, letters & thought (2nd ed.).  LARC Publishing, P.O. Box 801, Milford, NH 03055 (603-880-7691 http://www.larcpublishing.com).

Buros Institute of Mental Measurements (n.d.).  Test reviews online.  Retrieved from http://buros.unl.edu/buros/jsp/search.jsp.

Flanagan, D. P., Ortiz, S. O., Alfonso, V., & Mascolo, J. T. (2006).  Achievement test desk reference (ATDR-II): A guide to learning disability identification (2nd ed.).  Hoboken, NJ: Wiley.

McGrew, K. S. (1999).  The measurement of reading achievement by different individually administered standardized reading tests: Apples and apples, or apples and oranges?  IAP Research Report No. 1.  Clearwater, MN: Institute for Applied Psychometrics.

Schultz, M. K. (1988).  A comparison of Standard Scores for commonly used tests of early reading.  Communiqué, 17(4), 13.

EVALUATION CHECKLIST *

Ron Dumont
John Willis

I. CONTEXT OF THE EVALUATION

A. Have all assessment results been placed in context, including the cautious, skeptical interpretation of the following?

1. The student’s actual, real‑life performance in school;

2. The student’s educational history;

3. The student’s personal, familial, and medical history;

4. Reports and referral questions from therapists, parents, and the student (actively solicit reports and questions if necessary);

5. The student’s self-reports;

6. Previous evaluations; and

7. Concurrent findings of other evaluators.

8. Have all data been integrated?

B. Have commonalities been given due weight, and have disparities been acknowledged and discussed?

C. Has knowledge of the need for multiple sources of convergent data been shown?

D. Have the final decisions taken into account the above considerations?

E. Is the choice of tests sufficient to sample behaviors for a specific purpose (e.g., psychoeducational testing)?

F. Is the choice of tests appropriate to the test takers?

G. Is the choice of tests as free from bias as possible, considering the standardization sample and the test-taker population?

H. Do the tests used distinguish between skills rather than lumping them together? For example, a combined reading score based on both oral reading and reading comprehension may mask important differences between the two skills.

II. TEST STANDARDIZATION AND NORMS

A. Do not alter standardized tests, especially normed, standardized tests.

1. Were all basal and ceiling rules obeyed, and were they the correct ones, since the rules vary from test to test?

2. Were the test instructions and items read verbatim, with appropriate demonstrations given as instructed?

a. Was there any coaching, helping, or teaching given, except as instructed?

b. Was there any ad libbing during standardized instructions?

c. Did the examiner appear to have practiced the instructions so that they were delivered in a smooth, conversational tone?

e. Was there any unauthorized feedback given?

f. Were all time limits, timed presentations, and timed delays adhered to?

3. Was the test learned and practiced to complete mastery before using it? Was qualified supervision obtained when learning and using a new test?

4. If “testing the limits” has been used, does the report contain clear statements about what was done, why, and how it was done? Has it been made absolutely certain that results of limit‑testing cannot possibly be confused with valid scores?  Was testing the limits done in ways that ensured that it would not influence later items, e.g., by providing extra practice?

5. For students with severe and low‑incidence disabilities, did the examiner adopt appropriate tests rather than adapt inappropriate ones?

6. Were the students tested in their specific native languages (e.g., Puerto Rican vs. Castilian Spanish or American Sign Language vs. Signed English) with tests normed in those languages?

7. Was consideration given to the consequences of taking subtests out of context?

8. Were the scoring keys and test materials kept secure?

9. Were all protocols and record forms original and not photocopies of copyrighted materials?  [This consideration is especially important for response forms used by the examinee because photocopies may alter the difficulty of the task.]

10. Did the evaluator refrain from answering questions from test takers in greater detail than the test manual allows?

B. Was appropriate attention given to test norms?

1. Were all tests used normed on a genuinely representative sample that was:

a. Sufficiently large;

b. Appropriately stratified and randomized for:

i. sexes

ii. geographic regions

iii. racial and ethnic groups

iv. disabilities

v. income and educational levels

vi. other germane variables

vii. interactions of these variables

c. National, international, or appropriate, clearly specified subgroup;

d. Truthfully and completely presented in test manual;

e. Recent; and

f. The appropriate test level for the student?

C. Is the evaluator aware of possible errors in norms tables?

1. Printed norms.

2. Computerized norms.

D. Does the evaluator take into account the risk of significant changes in scores resulting from movement between norms tables (e.g., the overnight change from Fall norms to Spring norms)?

E. Is the evaluator appropriately skeptical of publishers’ claims?

F. Does the evaluator establish rapport with examinees to obtain accurate scores?

III. RELIABILITY

A. Is the evaluator aware of the Standard Error of Measurement (SEm), and has she or he taken it into account?

1. Does the evaluator consistently use 90% (1.65 or 1 2/3 SEm) or 95% (1.96 or 2 SEm) confidence bands?

2. Is the meaning of the confidence band explained clearly in lay terms?

4. Does the evaluator recognize that a test score was obtained once, at a specific time and place?

5. Does the evaluator recognize and explain that the confidence band does not include errors and problems with test administration and conditions?

6. Does the evaluator distinguish clearly between reliability and validity?

7. Does the evaluator distinguish between Standard Error of Measurement (SEm) and Standard Error of Estimate (SEest)?

8. Does the evaluator use the correct confidence band for the appropriate score: raw, ability, W, standard score, percentile rank, etc.?

B. Does the evaluator determine how reliability data were obtained?

1. How large were the samples?

2. Are there reliability data for students similar to the student being tested?

3. Are the time intervals comparable to those we are concerned with?

C. Does the evaluator understand and make appropriate decisions about tests that use rigid “Cut‑off scores” for decision making?
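The arithmetic behind the confidence bands in item III.A.1 can be sketched as follows. This is a minimal illustration, not a procedure taken from any test manual: it assumes the classical formula SEm = SD × sqrt(1 − reliability), it centers the band on the obtained score (some manuals center it on an estimated true score instead), and the reliability of .91 is a hypothetical value. The function names are invented for the example.

```python
import math

# Multipliers quoted in the checklist: 1.65 SEm for 90%, 1.96 SEm for 95%.
Z_FOR_LEVEL = {90: 1.65, 95: 1.96}


def sem(sd: float, reliability: float) -> float:
    """Standard Error of Measurement under the classical formula (assumed here)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score: float, sd: float, reliability: float, level: int = 95):
    """Return (low, high) limits centered on the obtained score."""
    half_width = Z_FOR_LEVEL[level] * sem(sd, reliability)
    return score - half_width, score + half_width


# Hypothetical example: standard score 100, SD 15, reliability .91.
low, high = confidence_band(score=100, sd=15, reliability=0.91, level=95)
print(f"SEm = {sem(15, 0.91):.1f}; 95% band = {low:.0f} to {high:.0f}")
# prints "SEm = 4.5; 95% band = 91 to 109"
```

As item III.A.5 notes, a band computed this way reflects only measurement error captured by the reliability coefficient; it says nothing about problems with administration or testing conditions.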

IV. VALIDITY

A. Does the evaluator appreciate the implications of test validity?

1. Validity for what purposes?

2. Validity for what groups?

3. How were the validity data obtained?

a. How large were the samples?

b. Are there validity data for students similar to the student being tested?

c. What were the criterion measures? Are we using a closed circle of validating tests against other very similar tests?

d. Are the time intervals comparable to those we are concerned with?

B. Does the evaluator understand the concept of construct validity?

C. Does the evaluator understand the relationship between validity and reliability?

F. Does the evaluator understand and take into consideration issues related to “Incremental validity”?

G. Does the evaluator interpret tests only in ways for which validity has been established?

H. Does the evaluator keep a record of all test data for follow-up to establish trends and understand how the test works in the local situation?

I. Does the evaluator demonstrate an appreciation for the limitations of each specific test’s content coverage?

V. SCORING

A. Does the evaluator avoid errors in scoring and recording by:

1. Using two straight‑edges in the scoring tables;

2. If necessary, photocopying the norms table and drawing lines and circling numbers;

3. Checking the accuracy of tables by inspecting adjacent scores;

4. Reading table titles and headings aloud while scoring;

5. Rechecking all scores;

6. Checking them again;

7. Getting someone else to check them;

8. Scoring by both age and grade norms, if available, and comparing the results;

9. Recording the student’s name and the date on all sheets of paper;

10. Checking the student’s birthdate and age with the student and calculating the age correctly by the rules for the particular test;

11. Performing thought experiments with tables, e.g., What if the student had made two lucky or unlucky guesses? What if the student were 6 months older or younger? etc.;

12. Recording all responses verbatim;

13. Keeping raw data for future use;

14. Using consistent notations for correct and incorrect answers, no responses, “I don’t know” responses, and examiner’s questions; making sure the examinee cannot determine from the number or direction of pencil strokes which notations are being made;

15. Using protractors and templates consistently, carefully, and correctly and, if uncertain, having someone check the scoring;

16. Following computer‑scoring instructions slavishly, including the sequence in which the CPU and peripheral equipment are turned on;

17. Checking continued accuracy of computer results by occasionally hand scoring;

18. Being certain to have the latest version of the scoring program, to know of any new errors or bugs in it, and to have the protocols that go with that version;

19. Understanding and clearly explaining differences among standard scores, scaled scores, normal curve equivalents, percentile ranks, and other scores;

20. Using age-equivalent (“mental age”) and grade-equivalent scores sparingly, if at all, explaining them and their myriad limitations clearly, making sure they have some relationship to reality; and bracketing them with 90% or 95% confidence bands, just as with standard scores; and

21. Following scoring instructions exactly as described in the test manual, and, when in doubt, obtaining a second opinion or asking the test publisher for clarification.
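Item 10 above asks that age be calculated by the rules for the particular test. The sketch below shows one possible rule, purely as an assumption for illustration: column subtraction of year, month, and day, borrowing 30 days for every month regardless of its actual length. Whether a given test uses this convention, an exact calendar count, or some rounding rule must be checked in that test's manual; the function name is invented for the example.

```python
from datetime import date


def chronological_age(birth: date, test: date):
    """Return (years, months, days) by column subtraction.

    Assumed convention: a borrowed month always counts as 30 days.
    This is NOT universal; check the manual for the test in use.
    """
    years = test.year - birth.year
    months = test.month - birth.month
    days = test.day - birth.day
    if days < 0:       # borrow one month, counted as 30 days
        days += 30
        months -= 1
    if months < 0:     # borrow one year, counted as 12 months
        months += 12
        years -= 1
    return years, months, days


# Hypothetical dates: born 20 November 2015, tested 12 March 2024.
print(chronological_age(date(2015, 11, 20), date(2024, 3, 12)))
# prints (8, 3, 22)
```

An exact calendar count for the same dates would give 8 years, 3 months, 21 days, because February 2024 has 29 days. A one-day difference can matter when it moves the examinee across a norms-table boundary, which is why the rule has to come from the manual.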

VI. INTERPRETATION OF EVALUATION RESULTS

A. Does the evaluator distinguish clearly between different tests, clusters, factors, subtests, and scores with similar titles?

1. e.g., “Reading Comprehension” is not the same skill on different reading tests.

2. e.g., “Processing Speed” is not the same ability on different intelligence tests.

B. Does the evaluator explain with words and figures all the statistics used in the reports?

C. Does the evaluator explain differences between different statistics for different tests that are included in the reports?

D. Does the evaluator explain the names (e.g., “Below Average”) for the statistics used in the reports?

E. Does the evaluator explain differences between different names for the same scores on various tests that are used in the reports?

F. Does the evaluator distinguish clearly between findings and implications?

G. Do the interpretation and recommendations demonstrate an understanding of the disabilities and the programs, not merely of the tests?

H. Does the identification of a disability demonstrate a reasoned, clinical judgment, rather than simply an exercise in arithmetic?

I. Does the report offer specific, detailed recommendations and give a rationale for each?

J. Does the evaluator eschew boilerplate?

K. Does the evaluator detect and reject unvalidated computer software?

L. Does the evaluator use computer reports to help interpret data and plan reports rather than simply including or retyping the actual printouts in reports?

M. Does the evaluator recognize that students’ skills in related areas may differ dramatically and unexpectedly?

N. Does the evaluator explain the mechanism of the disability?

1. For example, a specific learning disability is a disorder in one or more of the basic psychological processes involved in understanding or in using language, spoken or written, that may manifest itself in an imperfect ability to listen, speak, think, read, write, spell, or do mathematical calculations. Does the evaluator make an attempt to describe the student’s specific disorder(s)?

2. Similarly, a serious emotional disturbance must be based on a psychological disorder. Does the evaluator attempt to specify, define, and explain the disorder, not just the behaviors?

O. Does the evaluator report genuinely germane observations from test sessions, while at the same time being clear that behaviors in a test session may be unique to that test session and may never be seen in any other context?

P. Does the evaluator pay attention to the reported observations?  For example, if an evaluator cites the student’s boredom or fatigue, it would not make sense to declare, “Test results are assumed to be valid.”

Q. Does the evaluator consider practice effects when tests are re-administered? Does the evaluator consider differential practice effects on different subtests?

R. Does the evaluator appraise the entire pattern of the student’s abilities, not merely weaknesses?

S. Does the evaluator revisit the verbatim record of the student’s actual responses before accepting “canned” interpretations from the manual, handbook, or computer printout? For instance, WISC-III Comprehension measures Social Studies achievement as well as “social comprehension,” Picture Completion almost never measures the “ability to distinguish essential from nonessential details,” and young children can earn high scores for “verbal abstract reasoning” with entirely concrete responses to Similarities.

T. Does the evaluator base conclusions and recommendations on multiple sources of convergent data (not just test scores)?

U. Does the evaluator understand and advise about the limitations of norms, especially grade equivalents, for student populations differing markedly from the norm sample?

V. Does the evaluator have a thorough understanding of both standard scores and percentile ranks?

W. Does the evaluator keep up with the field and check interpretations with other professionals?

X. Does the evaluator apply principles of test theory and principles of test interpretation?

Y. Does the evaluator avoid interpretation beyond the limits of the test?

Z. Does the evaluator avoid idiotic interpretations?


Score Conversion Tables for Commonly Used Tests  *

All Wechsler Scales, all Woodcock tests [1], all Kaufman Tests, Most Tests Published by American Guidance Service, Pro-Ed [2], Riverside, the Psychological Corporation, and Many Others

Z-scores (z), Percentile Ranks (PR), Standard Scores (SS) (Mean = 100, s.d. = 15), Scaled Scores (ss) (Mean = 10, s.d. = 3), and Stanines (9)

    z    PR   SS  ss  9 |     z    PR   SS  ss  9 |     z    PR   SS  ss  9
+4.00  99.9  160        | +1.33    91  120  14  8 | -1.40    08   79
+3.93  99.9  159        | +1.27    90  119        | -1.47    07   78
+3.87  99.9  158        | +1.20    88  118        | -1.53    06   77      2
+3.80  99.9  157        | +1.13    87  117        | -1.60    05   76
+3.73  99.9  156        | +1.07    86  116        | -1.67    05   75   5
+3.67  99.9  155        | +1.00    84  115  13  7 | -1.73    04   74
+3.60  99.9  154        | +0.93    82  114        | -1.80    04   73
+3.53  99.9  153        | +0.87    81  113        | -1.87    03   72
+3.47  99.9  152        | +0.80    79  112        | -1.93    03   71
+3.40  99.9  151        | +0.73    77  111        | -2.00    02   70   4
+3.33  99.9  150        | +0.67    75  110  12    | -2.07    02   69
+3.27  99.9  149        | +0.60    73  109        | -2.13    02   68
+3.20  99.9  148        | +0.53    70  108      6 | -2.20    01   67
+3.13  99.9  147        | +0.47    68  107        | -2.27    01   66
+3.07  99.9  146        | +0.40    66  106        | -2.33    01   65   3
+3.00  99.9  145  19    | +0.33    63  105  11    | -2.40    01   64
+2.93  99.8  144        | +0.27    61  104        | -2.47    01   63
+2.87  99.8  143        | +0.20    58  103        | -2.53    01   62
+2.80  99.7  142      9 | +0.13    55  102        | -2.60   0.5   61      1
+2.73  99.7  141        | +0.07    53  101        | -2.67   0.4   60   2
+2.67  99.6  140  18    |  0.00    50  100  10  5 | -2.73   0.3   59
+2.60  99.5  139        | -0.07    47   99        | -2.80   0.3   58
+2.53    99  138        | -0.13    45   98        | -2.87   0.2   57
+2.47    99  137        | -0.20    42   97        | -2.93   0.2   56
+2.40    99  136        | -0.27    39   96        | -3.00   0.1   55   1
+2.33    99  135  17    | -0.33    37   95   9    | -3.07   0.1   54
+2.27    99  134        | -0.40    34   94        | -3.13   0.1   53
+2.20    99  133        | -0.47    32   93      4 | -3.20   0.1   52
+2.13    98  132        | -0.53    30   92        | -3.27   0.1   51
+2.07    98  131        | -0.60    27   91        | -3.33   0.1   50
+2.00    98  130  16    | -0.67    25   90   8    | -3.40   0.1   49
+1.93    97  129        | -0.73    23   89        | -3.47   0.1   48
+1.87    97  128        | -0.80    21   88        | -3.53   0.1   47
+1.80    96  127        | -0.87    19   87        | -3.60   0.1   46
+1.73    96  126        | -0.93    18   86        | -3.67   0.1   45
+1.67    95  125  15    | -1.00    16   85   7  3 | -3.73   0.1   44
+1.60    95  124        | -1.07    14   84        | -3.80   0.1   43
+1.53    94  123      8 | -1.13    13   83        | -3.87   0.1   42
+1.47    93  122        | -1.20    12   82        | -3.93   0.1   41
+1.40    92  121        | -1.27    10   81        | -4.00   0.1   40
                        | -1.33    09   80   6  2 |

                1.  Tests of which Dr. Woodcock is author or co-author separately compute Standard Scores and Percentile Ranks, so these will not be the precise relationships between Standard Scores and Percentile Ranks.

                2.  Tests published by Pro-Ed call these Standard Scores “quotients,” and these Scaled Scores “Standard Scores”  (which they are, although that is not the common usage).
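The relationships tabulated above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under the assumption of a normal distribution: it is not how any publisher builds norms tables, and, as footnote 1 warns, tests that compute Standard Scores and Percentile Ranks separately will not match it exactly. The function names are invented for the example, and the stanine rule used (half-SD-wide bands centered on 5, limited to 1 through 9) is the usual textbook definition.

```python
import math
from statistics import NormalDist


def from_z(z: float, mean: float, sd: float) -> float:
    """Express a z-score on a scale with the given mean and SD."""
    return mean + z * sd


def percentile_rank(z: float) -> float:
    """Percentile rank under a normal curve (an assumption, see lead-in)."""
    return 100.0 * NormalDist().cdf(z)


def stanine(z: float) -> int:
    """Stanine: bands half an SD wide, centered on 5, clamped to 1..9."""
    return max(1, min(9, math.floor(z * 2 + 5.5)))


def describe(standard_score: float) -> dict:
    """Convert a Standard Score (mean 100, SD 15) to the other metrics."""
    z = (standard_score - 100) / 15
    return {
        "z": round(z, 2),
        "PR": round(percentile_rank(z)),
        "ss": round(from_z(z, 10, 3), 1),
        "stanine": stanine(z),
    }


print(describe(115))
# prints {'z': 1.0, 'PR': 84, 'ss': 13.0, 'stanine': 7}
```

The result for 115 agrees with the corresponding row of the table (z +1.00, PR 84, ss 13, stanine 7). Scaled scores come out fractional for most Standard Scores, which is why the table shows a scaled score only on every fifth row.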

Differential Ability Scales and Other Tests Using T scores [1] *

Z-scores (z), Percentile Ranks (PR), Standard Scores (General Conceptual Ability, GCA) (mean = 100, s.d. = 15), T Scores (T) (mean = 50, s.d. = 10), and Stanines (9)

    z    PR  GCA   T  9 |     z    PR  GCA   T  9 |     z    PR  GCA   T  9
+4.00  99.9  160  90    | +1.33    91  120      8 | -1.40    08   79  36
+3.93  99.9  159        | +1.27    90  119        | -1.47    07   78
+3.87  99.9  158        | +1.20    88  118  62    | -1.53    06   77      2
+3.80  99.9  157  88    | +1.13    87  117        | -1.60    05   76  34
+3.73  99.9  156        | +1.07    86  116        | -1.67    05   75
+3.67  99.9  155        | +1.00    84  115  60  7 | -1.73    04   74
+3.60  99.9  154  86    | +0.93    82  114        | -1.80    04   73  32
+3.53  99.9  153        | +0.87    81  113        | -1.87    03   72
+3.47  99.9  152        | +0.80    79  112  58    | -1.93    03   71
+3.40  99.9  151  84    | +0.73    77  111        | -2.00    02   70  30
+3.33  99.9  150        | +0.67    75  110        | -2.07    02   69
+3.27  99.9  149        | +0.60    73  109  56    | -2.13    02   68
+3.20  99.9  148  82    | +0.53    70  108      6 | -2.20    01   67  28
+3.13  99.9  147        | +0.47    68  107        | -2.27    01   66
+3.07  99.9  146        | +0.40    66  106  54    | -2.33    01   65
+3.00  99.9  145  80    | +0.33    63  105        | -2.40    01   64  26
+2.93  99.8  144        | +0.27    61  104        | -2.47    01   63
+2.87  99.8  143        | +0.20    58  103  52    | -2.53    01   62
+2.80  99.7  142  78  9 | +0.13    55  102        | -2.60   0.5   61  24  1
+2.73  99.7  141        | +0.07    53  101        | -2.67   0.4   60
+2.67  99.6  140        |  0.00    50  100  50  5 | -2.73   0.3   59
+2.60  99.5  139  76    | -0.07    47   99        | -2.80   0.3   58  22
+2.53    99  138        | -0.13    45   98        | -2.87   0.2   57
+2.47    99  137        | -0.20    42   97  48    | -2.93   0.2   56
+2.40    99  136  74    | -0.27    39   96        | -3.00   0.1   55  20
+2.33    99  135        | -0.33    37   95        | -3.07   0.1   54
+2.27    99  134        | -0.40    34   94  46    | -3.13   0.1   53
+2.20    99  133  72    | -0.47    32   93      4 | -3.20   0.1   52  18
+2.13    98  132        | -0.53    30   92        | -3.27   0.1   51
+2.07    98  131        | -0.60    27   91  44    | -3.33   0.1   50
+2.00    98  130  70    | -0.67    25   90        | -3.40   0.1   49  16
+1.93    97  129        | -0.73    23   89        | -3.47   0.1   48
+1.87    97  128        | -0.80    21   88  42    | -3.53   0.1   47
+1.80    96  127  68    | -0.87    19   87        | -3.60   0.1   46  14
+1.73    96  126        | -0.93    18   86        | -3.67   0.1   45
+1.67    95  125        | -1.00    16   85  40  3 | -3.73   0.1   44
+1.60    95  124  66    | -1.07    14   84        | -3.80   0.1   43  12
+1.53    94  123      8 | -1.13    13   83        | -3.87   0.1   42
+1.47    93  122        | -1.20    12   82  38    | -3.93   0.1   41
+1.40    92  121  64    | -1.27    10   81        | -4.00   0.1   40  10
                        | -1.33    09   80      2 |

1.  Odd-numbered T scores fall between Standard Scores.  For instance, a T score of 49 is equivalent to a z-score of -0.10 and a standard score of 98.5.
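The footnote's example can be checked with a short calculation. This is a sketch of the linear rescaling only, with invented function names; it assumes nothing beyond the means and standard deviations stated in the table heading.

```python
def t_to_z(t: float) -> float:
    """T scores have mean 50 and SD 10."""
    return (t - 50) / 10


def z_to_standard(z: float, mean: float = 100, sd: float = 15) -> float:
    """Express a z-score as a standard score (mean 100, SD 15 by default)."""
    return mean + z * sd


z = t_to_z(49)
print(round(z, 2), round(z_to_standard(z), 1))
# prints -0.1 98.5
```

Because each T-score point is worth 1.5 standard-score points, only even-numbered T scores land on whole standard scores, which is why the table lists T on every third row.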

Stanford-Binet Intelligence Scale, 4th ed. and Other Tests Using Standard Scores with a Mean of 100 and SD 16

Z-scores (z), Percentile Ranks (PR), Composite and Area Standard Scores (SAS[1]) (Mean = 100, s.d. = 16), Subtest Standard Scores (sas) (Mean = 50, s.d. = 8), and Stanines (9)

    z    PR  SAS  sas  9 |     z    PR  SAS  sas  9 |     z    PR  SAS  sas  9
+4.00  99.9  164   82    | +1.31    91  121   61  8 | -1.38    08   78   39
+3.94  99.9  163   82    | +1.25    89  120   60    | -1.44    08   77   39  2
+3.88  99.9  162   81    | +1.19    88  119   60    | -1.50    07   76   38
+3.81  99.9  161   81    | +1.13    87  118   59    | -1.56    06   75   38
+3.75  99.9  160   80    | +1.06    86  117   59    | -1.63    05   74   37
+3.69  99.9  159   80    | +1.00    84  116   58  7 | -1.69    05   73   37
+3.63  99.9  158   79    | +0.94    83  115   58    | -1.75    04   72   36
+3.56  99.9  157   79    | +0.88    81  114   57    | -1.81    04   71   36
+3.50  99.9  156   78    | +0.81    79  113   57    | -1.88    03   70   35
+3.44  99.9  155   78    | +0.75    77  112   56    | -1.94    03   69   35
+3.38  99.9  154   77    | +0.69    75  111   56    | -2.00    02   68   34
+3.31  99.9  153   77    | +0.63    73  110   55    | -2.06    02   67   34
+3.25  99.9  152   76    | +0.56    71  109   55    | -2.13    02   66   33
+3.19  99.9  151   76    | +0.50    69  108   54  6 | -2.19    01   65   33
+3.13  99.9  150   75    | +0.44    67  107   54    | -2.25    01   64   32
+3.06  99.9  149   75  9 | +0.38    65  106   53    | -2.31    01   63   32
+3.00  99.9  148   74    | +0.31    62  105   53    | -2.38    01   62   31
+2.94  99.8  147   74    | +0.25    60  104   52    | -2.44    01   61   31
+2.88  99.8  146   73    | +0.19    57  103   52    | -2.50    01   60   30
+2.81  99.8  145   73    | +0.13    55  102   51    | -2.56    01   59   30
+2.75  99.7  144   72    | +0.06    52  101   51    | -2.63   0.4   58   29
+2.69  99.6  143   72    |  0.00    50  100   50  5 | -2.69   0.4   57   29
+2.63  99.6  142   71    | -0.06    48   99   50    | -2.75   0.3   56   28
+2.56    99  141   71    | -0.13    45   98   49    | -2.81   0.3   55   28  1
+2.50    99  140   70    | -0.19    43   97   49    | -2.88   0.2   54   27
+2.44    99  139   70    | -0.25    40   96   48    | -2.94   0.2   53   27
+2.38    99  138   69    | -0.31    38   95   48    | -3.00   0.1   52   26
+2.31    99  137   69    | -0.38    35   94   47    | -3.06   0.1   51   26
+2.25    99  136   68    | -0.44    33   93   47  4 | -3.13   0.1   50   25
+2.19    99  135   68    | -0.50    31   92   46    | -3.19   0.1   49   25
+2.13    98  134   67    | -0.56    29   91   46    | -3.25   0.1   48   24
+2.06    98  133   67    | -0.63    27   90   45    | -3.31   0.1   47   24
+2.00    98  132   66    | -0.69    25   89   45    | -3.38   0.1   46   23
+1.94    97  131   66    | -0.75    23   88   44    | -3.44   0.1   45   23
+1.88    97  130   65    | -0.81    21   87   44    | -3.50   0.1   44   22
+1.81    96  129   65    | -0.88    19   86   43    | -3.56   0.1   43   22
+1.75    96  128   64    | -0.94    17   85   43  3 | -3.63   0.1   42   21
+1.69    95  127   64    | -1.00    16   84   42    | -3.69   0.1   41   21
+1.63    95  126   63    | -1.06    14   83   42    | -3.75   0.1   40   20
+1.56    94  125   63    | -1.13    13   82   41    | -3.81   0.1   39   20
+1.50    93  124   62  8 | -1.19    12   81   41    | -3.88   0.1   38   19
+1.44    92  123   62    | -1.25    11   80   40    | -3.94   0.1   37   19
+1.38    92  122   61    | -1.31    09   79   40  2 | -4.00   0.1   36   18


[1] Despite having different means and standard deviations, the Stanford-Binet Intelligence Scale, 4th ed. uses the same term – Standard Age Scores – to refer to the scores obtained on both the individual tests and the area composite scores.  We use capitalized “SAS” to refer to the area composite score and the lower case “sas” to refer to the individual test score.
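Because this scale uses a standard deviation of 16 rather than 15, the same number means different things on the two metrics. The sketch below shows the linear rescaling between any two such metrics. It is arithmetic only: it does not make scores from different tests interchangeable, for all the reasons given at the top of this page, and the function name is invented for the example.

```python
def rescale(score: float, from_mean: float, from_sd: float,
            to_mean: float, to_sd: float) -> float:
    """Re-express a score on another scale by way of its z-score."""
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd


# An area SAS of 132 (mean 100, SD 16) sits at z = +2.00,
# the same position as 130 on a mean-100, SD-15 scale.
print(rescale(132, 100, 16, 100, 15))   # prints 130.0

# A subtest sas of 66 (mean 50, SD 8) is also z = +2.00.
print(rescale(66, 50, 8, 100, 16))      # prints 132.0
```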

See also  Mather, N., & Jaffe, L. E. (in press) Score Equivalents and Classification Labels

Stanford-Binet Fifth Edition (SB5) *

Broad and Narrow Abilities

Dumont-Willis

These classifications are based on Ron Dumont and John Willis’s understanding of the Gf-Gc classification system.  These may or may not be consistent with the SB5 manuals.

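Broad Gf-Gc | Domain | Activity | Narrow Ability | Testlet

Fluid Reasoning | Nonverbal | Object Series | Induction | Routing
Fluid Reasoning | Nonverbal | Matrices | Induction | Routing
Fluid Reasoning | Verbal | Early Reasoning | Induction | 2, 3
Fluid Reasoning | Verbal | Verbal Absurdities | Induction, Language Development | 4
Fluid Reasoning | Verbal | Verbal Analogies | Induction, Language Development | 5, 6

Crystallized | Nonverbal | Procedural Knowledge | Language Development, Listening Ability | 2, 3
Crystallized | Nonverbal | Picture Absurdities | Language Development, General Information | 4, 5, 6
Crystallized | Verbal | Vocabulary | Language Development | Routing

Quantitative Reasoning | Nonverbal | Quantitative Reasoning | Mathematical Achievement | 2, 3, 4
Quantitative Reasoning | Nonverbal | Quantitative Reasoning | Mathematical Achievement, Working Memory | 5
Quantitative Reasoning | Nonverbal | Quantitative Reasoning | Mathematical Achievement | 6
Quantitative Reasoning | Verbal | Quantitative Reasoning | Mathematical Knowledge | 2, 3
Quantitative Reasoning | Verbal | Quantitative Reasoning | Mathematical Achievement | 4
Quantitative Reasoning | Verbal | Quantitative Reasoning | Mathematical Achievement, Working Memory | 5, 6

Visual Spatial Processing | Nonverbal | Form Board | Visualization | 1, 2
Visual Spatial Processing | Nonverbal | Form Pattern | Visualization | 3, 4, 5, 6
Visual Spatial Processing | Verbal | Position and Direction | Spatial Relations | 2, 3, 4
Visual Spatial Processing | Verbal | Position and Direction | Visualization | 5, 6

Short-term Memory | Nonverbal | Delayed Response | Working Memory | 1
Short-term Memory | Nonverbal | Block Span | Memory Span | 2, 3, 4, 5, 6
Short-term Memory | Verbal | Memory for Sentences | Memory Span, Language Development | 2, 3
Short-term Memory | Verbal | Last Word | Working Memory | 4, 5, 6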

Induction: Ability to discover the underlying characteristic that governs a problem or set of materials.
Language Development: General development or the understanding of words, sentences, and paragraphs (not requiring reading) in spoken native language skills.
Listening Ability: Ability to listen and comprehend oral communications.
General Information: Range of general knowledge.
Mathematical Achievement: Measured mathematics achievement.
Mathematical Knowledge: Range of general knowledge about mathematics.
Visualization: Ability to manipulate objects or visual patterns mentally and to “see” how they would appear under altered conditions.
Spatial Relations: Ability to perceive and manipulate visual patterns rapidly or to maintain orientation with respect to objects in space.
Working Memory: Ability to store temporarily and perform a set of cognitive operations on information that requires divided attention and the management of the limited capacity of short-term memory.
Memory Span: Ability to attend to and immediately recall temporally ordered elements in the correct order after a single presentation.

D. P. Flanagan, S. O. Ortiz, V. Alfonso, & J. T. Mascolo (2002) Achievement Test Desk Reference (ATDR): Comprehensive Assessment and Learning Disabilities (Boston: Allyn & Bacon)

K. S. McGrew, & D. P. Flanagan (1998), The Intelligence Test Desk Reference (ITDR): Gf-Gc Cross-Battery Assessment (Boston: Allyn & Bacon)

Testing Templates *

A handy dandy chart by John Willis showing the range of average scores for a variety of common statistics.
Average Range Scores Parody

For all the templates, click on the tabs at the bottom to access the worksheets

CHC Templates for the ATDR
ATDR-IIv1.3-1

Dumont Willis WAIS Interpretive Worksheet
WAISIVv12 pub-2

Dumont Willis WISC IV Interpretative Worksheet
WISCIV2pub-1

Psycholinguistic Analysis of the Woodcock Johnson Understanding Directions Subtest
UDAnalysisSheet-1

Normal Curve
NormalCurve-1

Dumont Willis SB5 Interpretive Worksheet
STANFORDBINETV9.12.03pub-1

 WPPSI  III Interpretive Worksheet
WPPSIIIIpub

DAS II Interpretive Worksheets
DASIIComputerAssistantv1.2pub-2

WJ III Template
WJIII Template

Links to Other Test Tools Pages *

RIAS

DAS-2

WJ 4

Wechsler 10 Subtest Tables

WISC V

KTEA-3

WIAT III

Stanford Binet Fifth Edition

DWEEEB