
Testing Information

DISCLAIMER:

While on this and other pages we provide links where specific tests are offered for sale, that is mainly so readers can obtain additional information regarding those tests.  If anyone desires to actually purchase a test, we recommend doing a Google search and comparing prices from various vendors.  Often savings can be found, and occasionally very substantial ones.  (Editor)

New Resources on Special Education Evaluations from NCLD (Added July 27, 2021)

“Dear friends,

Educators and school leaders are preparing for a new school year and will face challenges like never before. One challenge coming will be an increase in referrals for initial evaluations for special education and new requests to add additional services to children’s existing Individualized Education Programs (IEPs). In addition to a backlog created by out-of-school time and remote education, the dislocation, trauma and loss caused by the pandemic add complicating factors to an already challenging eligibility process. It will be more complex for schools to determine eligibility for special education as they must seek to determine if a disability is the primary cause of a student’s academic, social, emotional, or behavioral challenges.

If not done carefully, districts face a risk of misidentifying students as needing special education when, in fact, they are in need of instruction and support that can mitigate the impact of the pandemic.

To help districts navigate these complex challenges, NCLD has developed three briefs to inform state and district policies and practices. We’ve also developed a primer on parent rights and ways to advocate to help parents and caregivers understand their rights and ways to engage in the special education process.”

Link (with images) to the Three Briefs

PowerPoint Presentation by Jerome Sattler (10/20/2018).  Jerome Sattler has posted on his website a PowerPoint presentation from fall 2018.

Power Point Presentation Based Primarily on Assessment of Children: Cognitive Foundations and Applications, Sixth Edition, by Jerome M. Sattler, Copyright © 2018 by Jerome M. Sattler Publisher, Inc.

Age Norms vs. Grade Norms  (June 24, 2018)

When it is best to use age norms rather than grade norms has long been a matter of concern and discussion in various forums, including Yahoo’s school psychology listserv.  In this response-to-intervention era of SLD identification, when most probes used in curriculum-based measurement are grade normed, it has become an even hotter topic.  In addition to published, nationally normed probes, at least one test author has published achievement tests in reading and math that are grade, not age, normed: the Feifer Assessment of Reading and the Feifer Assessment of Math.  The following slide presentation by John Willis discusses some of the issues.  The basic theme is that it is NEVER appropriate to compare grade-normed test results with an age-normed intelligence test.  The Woodcock-Johnson Tests of Cognitive Abilities, in its various updates, has provided both grade and age standard scores for comparison.  However, as the 2006 regulations pointed out, simply switching to grade norms on the Tests of Cognitive Abilities does not end the discussion or the debate if a child has been retained in grade.  Some have argued that the regulations prohibit identifying a child as having a specific learning disability if the reading or math problems were the result of the child not having had appropriate instruction, so even if a child has been retained, grade norms should be used.  That, however, is a difficult argument to sustain when, in most cases, the reason for the retention was that the child was failing to meet grade-level expectations in the grade in which he or she was retained.  (Obviously, if Johnny was retained because he missed six months of schooling while in a coma or traveling around the world with his parents, that might lead to a different conclusion.)

Age Norms vs. Grade Norms

The federal Part B regulations from 2006 addressed the issue briefly.

From the federal regulations:

The performance of classmates and peers is not an appropriate standard if most children in a class or school are not meeting State-approved standards. Furthermore, using grade-based normative data to make this determination is generally not appropriate for children who have not been permitted to progress to the next academic grade or are otherwise older than their peers. Such a practice may give the illusion of average rates of learning when the child’s rate of learning has been below average, resulting in retention. A focus on expectations relative to abilities or classmates simply dilutes expectations for children with disabilities.  (p. 46652, Part B 2006 Regulations)

In states and school systems using a response to intervention system based on measurements from grade-normed probes, the discussion takes on a slightly different flavor.  In those instances, the question that must be asked is “should our measurements be based on the child’s performance in his/her current grade, or should we be looking at what we would have expected had the child not been retained?”

 

Flynn Effect   (Added 6/17/2018)

Although Wikipedia is generally regarded as a less than reliable source of scientific information, its on-line description of the Flynn effect is one of the most readable I’ve found on the topic.

An excerpt from the Wikipedia review follows:

The Flynn effect is the substantial and long-sustained increase in both fluid and crystallized intelligence test scores measured in many parts of the world over the 20th century.[1] When intelligence quotient (IQ) tests are initially standardized using a sample of test-takers, by convention the average of the test results is set to 100 and their standard deviation is set to 15 or 16 IQ points. When IQ tests are revised, they are again standardized using a new sample of test-takers, usually born more recently than the first. Again, the average result is set to 100. However, when the new test subjects take the older tests, in almost every case their average scores are significantly above 100.

Intelligence test score increases have been continuous and approximately linear from the earliest years of testing to the present. For the Raven’s Progressive Matrices test, a study published in the year 2009 found that British children’s average scores rose by 14 IQ points from 1942 to 2008.[2] Similar gains have been observed in many other countries in which IQ testing has long been widely used, including other Western European countries, Japan, and South Korea.[1]

There are numerous proposed explanations of the Flynn effect, as well as some skepticism about its implications. Similar improvements have been reported for other cognitions such as semantic and episodic memory.[3] Research published in 2018 suggests that the Flynn effect may have ended in at least a few developed nations, including Norway.

And therein lies the rub.  Based on that research, the popular press has been raising a cry of alarm.  (See four links to examples below.)

However, Dr. Kevin McGrew summarized the current situation in an email to the School Psychology Listserv on June 17, 2018: the state of research in this country on the Flynn effect remains substantially unchanged.

“The extant research regarding the Flynn effect is extensive. As summarized by McGrew (2015), “the consensus of the relevant scientific community is that the Flynn effect is real” (p. 158). Recently, two meta-analyses of the Flynn effect research have provided a comprehensive “big picture” perspective regarding the Flynn effect research landscape. Trahan et al. (2014) and Pietschnig and Voracek (2015) are the most comprehensive, statistically sound, and notable Flynn effect research syntheses to date. Pietschnig and Voracek (2015) located (and quantified and synthesized) 219 independent studies that yielded 271 independent samples, comprising approximately 4 million participants (n = 3,987,892). Trahan et al. (2014) located over 4,000 (n = 4,383) research articles (or results in IQ test manuals) that produced 285 studies (reflecting over 14,000 participants) related to the Flynn effect, from which they extracted 378 Flynn effect relevant comparisons. The key findings across these two major meta-analyses are that the Flynn effect exists and that the current rule of thumb (3 IQ points per decade) is a robust finding regarding full scale IQ scores. Although a robust finding, there currently is no agreement on what factors cause the Flynn effect (Trahan et al., 2014)—the cause is likely multivariate in nature.”

So in this country it is still true that when individuals are assessed using tests that are old (but still currently in use) or out of date, their scores, which are based on comparison with the original norming sample, may be inflated.  This is why most standardized assessments are periodically renormed.  While this effect is, for most of us, not a life and death concern, it can suddenly become one when arguing a death penalty case.
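To make the rule of thumb concrete, the sketch below applies the roughly 3-points-per-decade figure from the meta-analyses quoted above to estimate how much an obtained score may be inflated by out-of-date norms.  This is only an illustration of the arithmetic; the function and its interface are invented for this example, and any actual Flynn adjustment offered in a legal proceeding should follow published professional guidance rather than a toy calculation.

    # Illustration only: the ~3-IQ-points-per-decade rule of thumb quoted above.
    def estimated_norm_inflation(obtained_score, year_normed, year_tested,
                                 points_per_decade=3.0):
        """Return (inflation, adjusted_score) for a score obtained on aging norms."""
        years_since_norming = year_tested - year_normed
        inflation = points_per_decade * years_since_norming / 10.0
        return inflation, obtained_score - inflation

    # Example: a full scale score of 75 obtained in 2018 on a test normed in 1998
    # carries roughly 20 x 0.3 = 6 points of norm-obsolescence inflation.
    print(estimated_norm_inflation(75, 1998, 2018))  # (6.0, 69.0)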

One on-line reference on death penalty cases that also extensively reviews research on the Flynn effect is Chapter 10 by Kevin McGrew in a book entitled “The Death Penalty and Intellectual Disability.”  The book, of course, is copyrighted and costs $39.95 from AAIDD, but the chapter download at the preceding link is free.

Another (shorter) reading that also cites recent research:

The Flynn Effect (Human Intelligence)

While the trend being reported elsewhere in the world may come to pass in our country, right now the evidence for that is scanty.  Nevertheless, here are some of the more recent articles in the popular press sounding an alarm.

People are Getting Dumber:  The Flynn Effect Goes into Reverse  (Reason)

Are we getting more stupid? (Daily Mail)

Is Human Intelligence Decreasing (Appalachian Magazine)

It would be Stupid to Ignore a Drop in Human Intellect.  (New Scientist)

A very brief off-line bibliography follows (courtesy of Dr. McGrew).

Pietschnig, J., & Voracek, M. (2015). One century of global IQ gains: A formal meta-analysis of the Flynn effect (1909–2013). Perspectives on Psychological Science, 10(3), 282-306.

Trahan, L. H., Stuebing, K. K., Fletcher, J. M., & Hiscock, M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332.

Dutton, E., van der Linden, D., & Lynn, R. (2016). The negative Flynn Effect: A systematic literature review. Intelligence, 59, 163-169.

Practice Effect  (Updated 4/18/2018)

From time to time, test results from a particular test administration reflect an unexpected change from a previous administration.  This happens more frequently when a student is administered one version of a test and then given a revised version of that test three years later, or when a student is given an assessment appropriate for his or her age (e.g., the WISC-V) and then, several years later, a similar test appropriate for the older age range (e.g., the WAIS-IV).

General rule:  You cannot compare apples to oranges.  Never.   Different tests will give different scores.  Always.

Legal requirements.  Schools are not required to evaluate a student more than once in the same year, even when requested by a teacher or a parent, unless they agree to do so.

Legal Reference:

(a) General. A public agency must ensure that a reevaluation of each child with a disability is conducted in accordance with §§ 300.304 through 300.311 –
(1) If the public agency determines that the educational or related services needs, including improved academic achievement and functional performance, of the child warrant a reevaluation; or
(2) If the child’s parent or teacher requests a reevaluation.
(b) Limitation. A reevaluation conducted under paragraph (a) of this section –
(1) May occur not more than once a year, unless the parent and the public agency agree otherwise; and
(2) Must occur at least once every 3 years, unless the parent and the public agency agree that a reevaluation is unnecessary.
(Authority: 20 U.S.C. 1414(a)(2))

In general, nationally normed, standardized tests should never be administered more than once in the same year because of the impossibility of discerning whether changes are the result of some condition attributable to the student or simply the result of practice effect.
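As a purely numerical illustration of why such a within-year retest gain is uninterpretable, the toy sketch below combines the approximate WAIS retest gain quoted in the McGrew reprint farther down this page with an assumed standard error of measurement.  The decision rule, the variable names, and the SEM value are all invented for illustration; they are not an established procedure.

    # Toy illustration: a short-interval retest gain smaller than the expected
    # practice effect plus measurement error tells us nothing about real change.
    EXPECTED_PRACTICE_GAIN = 8.0   # approx. Performance-scale retest gain, ages 16-54 (quoted below)
    SEM = 3.0                      # assumed standard error of measurement, for illustration only

    def gain_exceeds_practice_and_error(first_score, retest_score):
        """True only if the observed gain clearly exceeds practice effect plus error."""
        return (retest_score - first_score) > EXPECTED_PRACTICE_GAIN + 2 * SEM

    # A 7-point "gain" a few months later proves nothing:
    print(gain_exceeds_practice_and_error(100, 107))  # False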

Of course if there were no questions as to the student’s eligibility or his/her special educational needs or needs for related services, the IEP team would be under no obligation (and have no reason) to have the student re-tested on a nationally normed test in the first place.  (Section 300.305(c))

There are multiple on-line references, from more authoritative sources, that can be accessed through Google.  One of the best this reviewer has found regarding IQ testing practice effects is an on-line article by Kevin McGrew from 2011.  While the specific tests mentioned in the review are now for the most part outdated, the conclusions are as relevant today as they were then.  That article is reprinted in its entirety below.  Readers are also advised, whenever possible, to review the test’s technical manual, wherein information regarding the effects of re-testing over the short term is often provided.

Monday, January 31, 2011

IQ test “practice effects”

The practice effect is a major psychometric issue in many Atkins cases, given that both the state and defense often test the defendant with the same IQ battery (most often a Wechsler), and often within a short test-retest interval. Click here to view all ICDP posts that mention practice effects.

Dr. Alan Kaufman has summarized the majority of the literature on practice effects on the Wechslers. He published an article in The Encyclopedia of Intelligence (1994; Edited by Robert Sternberg) that summarized the research prior to the third editions of the Wechsler scales. That article is available on-line (click here).

The most recent summary of the contemporary Wechsler practice effect research is in Lichtenberger and Kaufman (2009) Essentials of WAIS-IV Assessment (pp. 306-309). The tables and text provide much about WAIS-IV and some about WAIS-III. The best source for WAIS-III is Kaufman and Lichtenberger, Assessing Adolescent and Adult Intelligence (either the 2002 second edition or the 2006 third edition), especially Tables 6.5 and 6.6 (2006 edition). Below are a few excerpts from the associated text from the 2006 edition.

“Practice effects on Wechsler’s scales tend to be profound, particularly on the Performance Scale” (p. 202)

“predictable retest gains in IQs” (p.202)

“On the WAIS-III, tests with largest gains are Picture Completion, Object Assembly, and Picture Arrangement”

“Tests with smallest gains are Matrix Reasoning (most novel Gf test), Vocabulary and Comprehension”

Block Design improvement most likely due to speed variance–“on second exposure subjects may be able to respond more quickly, thereby gaining in their scores” (p. 204)

One year interval results in far less pronounced practice effects (p. 208).

“The impact of retesting on test performance, whether using the WAIS-III, WAIS-R, other Wechsler scales, or similar tests, needs to be internalized by researchers and clinicians alike. Researchers should be aware of the routine and expected gains of about 2 1/2 points in V-IQ for all ages between 16 and 89 years. They should also internalize the relatively large gain on P-IQ for ages 16-54 (about 8 to 8 1/2 points), and the fact that this gain in P-IQ dwindles in size to less than 6 points for ages 55-74 and less than 4 points for ages 75-89” (p. 209).

“Increases in Performance IQ will typically be about twice as large as increases in Verbal IQ for individuals ages 16 to 54” (p. 209)

Finally, the latest AAIDD manual provides professional guidance on the practice effect.

“The practice effect refers to gains in IQ scores on tests of intelligence that result from a person being retested on the same instrument” (p. 38)

“..established clinical practice is to avoid administering the same intelligence test within the same year to the same individual because it will often lead to an overestimate of the examinee’s true intelligence” (p. 38).

– iPost using BlogPress from Kevin McGrew’s iPad

Updated list of tests currently up for revision (11/22/2017)

Pearson is currently working on these revisions:

Bayley 4

Bracken 4

Brown ADD Scales

CELF-5 Spanish

D-KEFS 2

ESI-3

PPVT-5/EVT-3

PPVT-5/EVT-3 Spanish

MACI

WAIS 5

WIAT 4

WMS 5

(Submitted to the School Psychology Listserv by

Robert Walrath, PsyD

Associate Professor in Education and Counseling

Director of Clinical Training

Doctoral Program in Counseling and School Psychology, in November 2017)

PowerPoint Presentation by John Willis and Ron Dumont on Using  Test Scores to Document Progress  (Added on July 28, 2016)

ANNUAL PROGRESS:
MAINTAINING THE SAME RELATIVE POSITION

A Year’s Growth in a Year’s Time?

Track Meet Analogy (Ron’s version)

Testing Information (Very General)

Why are full scale or total scores on tests of intelligence sometimes lower (or higher) than any subscale scores OR Why is the whole not the sum of its parts?

Ron and John offered the following explanation in the paper below.
MNEMONICS FOR FIVE ISSUES IN THE IDENTIFICATION OF LEARNING DISABILITIES

For a more specific discussion of why WJ IV GIA scores sometimes appear higher or lower than the tests that compose them, click on the link below to Kevin McGrew’s website.
Why GIA scores sometimes appear higher or lower
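The psychometric core of that explanation can be sketched with a simplified formula: when several subtest scores all fall on the same side of the mean, a composite built from their sum is more extreme than any single subtest, because the standard deviation of a sum of correlated scores grows more slowly than the sum itself.  The snippet below assumes equally weighted subtests sharing a single average intercorrelation (a value invented for the example); real test composites come from norm tables, not from this formula.

    from math import sqrt

    # Simplified sketch: composite standard score from subtest z-scores, assuming
    # equal weights and a single average intercorrelation (illustrative only).
    def composite_standard_score(subtest_z, avg_r):
        k = len(subtest_z)
        sd_of_sum = sqrt(k + k * (k - 1) * avg_r)   # SD of a sum of correlated z-scores
        composite_z = sum(subtest_z) / sd_of_sum
        return 100 + 15 * composite_z               # express on the IQ metric

    # Four subtests each exactly one SD below the mean (standard score 85),
    # with an assumed average intercorrelation of .50:
    print(round(composite_standard_score([-1, -1, -1, -1], 0.50)))  # 81, lower than any part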

And for a simple explanation about why IQ scores are not good predictors of achievement, see John Willis’s
How Can a Person’s Reading Score be Higher than Their IQ  (7/16/2016)  (This brief article was written in partial refutation of a prosecutor’s unusual argument that if someone could read, he or she could not be intellectually disabled.)

Errors on Cognitive Assessments Administered by Graduate Students and Practicing School Psychologists

Scoring Errors  (Introduction by John Willis).  I recently made minor contributions to an article being submitted for publication: “Wechsler Administration and Scoring Errors Made by Graduate Students and School Psychologists” by Erika Rodger and Ron Dumont.  Dr. Rodger had the opportunity, working as a teaching assistant in graduate assessment courses over several years, to review a whole raft of WISCs and WAISs inflicted on unsuspecting victims by master’s and doctoral candidates, and she managed to collect a bunch of Wechsler scales administered in real life by practicing psychologists.  Her detailed, carefully analyzed, and thoughtfully and clearly discussed findings are not cause for optimism.  It is for us, the evaluators, to be dedicated to the unfinished work of administering and scoring tests accurately.  It is for us to be here dedicated to the great task remaining before us of reading directions and items exactly as written in the manual, of recording all responses verbatim, of using the manual to score items correctly, of recording elapsed times and adhering to time limits, of awarding bonus points correctly, of performing simple arithmetic accurately, of looking up and recording scores accurately, using straightedges as needed, of verifying that we entered raw scores correctly in computerized scoring programs, and of copying scores correctly into our reports.  It is for us, the evaluators, to take increased devotion to the cause of accurate testing and reporting so that our examinees shall not have been tested in vain.  My take on some of Dr. Rodger’s data was that experienced examiners sometimes seem to think that their personal judgment is more valid than the normative procedures.  We must do better.  (Additional bibliography follows.)
Erika Rodger, a student of Ron Dumont, completed a dissertation documenting the kinds of errors made by both experienced school psychologists and graduate students.  In her introduction to the dissertation, Erika writes,

Cognitive assessments are prevalent in U.S. history and policy, and are still very widely used for a variety of purposes. Individuals are trained on the administration and interpretation of these assessments, and upon completion of a program it should be assumed that they are able to complete an assessment without making administrative, scoring, or recording errors. However, an examination of assessment protocols completed by students as well as practicing school psychologists reveals that errors are the norm, not the exception. The purpose of this study was to examine errors committed by both master’s and doctoral-level students on three series of cognitive assessments as well as errors made by practicing school psychologists.

To read her dissertation, click on:
 
Alfonso, V., Johnson, A., Patinella, L., & Rader, D. (1998). Common WISC-III examiner errors: Evidence from graduate students in training. Psychology in the Schools, 35, 119-125.
Belk, M., LoBello, S., Ray, G., & Zachar, P. (2002). WISC-III administration, clerical, and scoring errors made by student examiners. Journal of Psychoeducational Assessment, 20, 290-300.
Brazelton, E., Jackson, R., Buckhalt, J., Shapiro, S., & Byrd, D. (2003). Scoring errors on the WISC-III: A study across levels of education, degree fields, and current professional positions. Professional Educator, 25(2), 1-8.
Conner, R., & Woodall, R. E. (1983). The effects of experience and structured feedback on WISC-R error rates made by student examiners. Psychology in the Schools, 20, 376-379.
Egan, P., McCabe, P., Semenchuk, D., & Butler, J. (2003). Using portfolios to teach test scoring skills: A preliminary investigation. Teaching of Psychology, 30(3), 233-235.
Erdodi, L., Richard, D., & Hopwood, C. (2009). The importance of relying on the manual: Scoring error variance in the WISC-IV vocabulary subtest. Journal of Psychoeducational Assessment, 27, 374-385.
Lee, D., Reynolds, C. R., & Willson, V. L. (2003). Standardized test administration: Why bother? Journal of Forensic Neuropsychology, 3, 55-81. doi:10.1300/J151v03n03_04
Loe, S., Kadlubek, R., & Marks, W. (2007). Administration and scoring errors on the WISC-IV among graduate student examiners. Journal of Psychoeducational Assessment, 25, 237-247.
Patterson, M., Slate, J., Jones, C., & Steger, H. (1995). The effects of practice administrations in learning to administer and score the WAIS-R: A partial replication. Educational and Psychological Measurement, 55, 32-37.
Sherrets, S., Gard, G., & Langner, H. (1979). Frequency of clerical errors on WISC protocols. Psychology in the Schools, 16(4), 495-496.
Slate, J. R., & Chick, D. (1989). WISC-R examiner errors: Cause for concern. Psychology in the Schools, 26, 74-84.
Slate, J. R., & Jones, C. H. (1990). Student error in administering the WISC-R: Identifying problem areas. Measurement & Evaluation in Counseling & Development, 23, 137-140.
Slate, J. R., & Jones, C. H. (1993). Evidence that practitioners err in administering and scoring the WAIS-R. Measurement & Evaluation in Counseling & Development, 20(4), 156-162.
Slate, J. R., Jones, C. H., Coulter, C., & Covert, T. L. (1992). Practitioners’ administration and scoring of the WISC-R: Evidence that we do err. Journal of School Psychology, 30, 77-82.
Slate, J. R., Jones, C. H., Murray, R. A., & Coulter, C. (1993). Evidence that practitioners err in administering and scoring the WAIS-R. Measurement and Evaluation in Counseling and Development, 25, 156-161.
Warren, S. A., & Brown, W. G. (1972). Examiner scoring errors on individual intelligence tests. Psychology in the Schools, 9, 118-122.

Miscellaneous Topics includes previously published articles on a variety of related subjects, including ADHD, child sexual abuse, and other school-related topics such as corporal punishment, grade-level retention, and school psychologist ratios.

Miscellaneous Topics

Wechsler Individual Achievement Test II.  Thirteen, more or less, comparisons, charts, and mini-articles.  Click on the link below for the complete table of contents.

https://www.myschoolpsychology.com/WIAT II.pdf

Reports and Critiques

Reports and Critiques

Test Reviews

Woodcock Johnson III

WJ III

Dumont Willis Extra Easy Evaluation Battery

DWEEEB

Ten Top Problems with Normed Tests for Very Young Children

Ten Top Problems

Test Score Descriptions

TEST SCORE DESCRIPTIONS

Relating Assessment Results to Accommodations.  Republished guidance from the NICHCY website (which shut down on 9/30/2014).

nichcy.org-Assessment_and_Accommodations

Scoring Errors Necessitate Double Checking (John Willis)

Scoring Errors Necessitate Double Checking Protocols

Partial Bibliography of Test Reviews by Ron Dumont, John Willis, Kate Viezel, and Jamie Zibulsky

Encyclopedia of Special Education
Encyclopedia of Special Education 2