High-stakes tests are examinations used to make critical decisions about examinees and those who work with the examinees. The hallmark of a high-stakes test is that the results are associated with consequences for those connected to the assessment, such as graduates of professional programs, students in public schools, and teachers and administrators in public schools. High-stakes tests contribute to making decisions about examinees and institutions in many societies, including the United States, Germany, Japan, and Singapore. The consequences of such tests include benefits and detriments. Professional organizations offer guidance in the implementation of high-stakes testing programs that, if followed, should result in fewer of the negative consequences currently associated with high-stakes tests. This entry describes high-stakes tests, their consequences, and strategies for ensuring their fairness and contribution to quality education.

How Tests Work

Examples of high-stakes tests include examinations for high school graduation, college credit (e.g., Advanced Placement or International Baccalaureate), college admissions (e.g., the SAT; the ACT; and the Graduate Record Examination, or GRE), and licensure (e.g., the U.S. Medical Licensing Examination). In public schools, high-stakes tests have been used by policy makers to hold students and educators accountable for student outcomes. At the postsecondary level, admissions offices use test results to predict which applicants will most likely be successful at their institution. In addition, examinations have been used to award college credit for coursework completed in high school. In the case of licensure examinations, the purpose of the tests is to assure the public of the qualifications of aspiring professionals.

The interpretation of high-stakes test scores may be norm referenced or criterion referenced. Norm-referenced interpretations are based on the comparison of an examinee’s score with the scores of other examinees. The ACT, SAT, and GRE are all examples of high-stakes tests that provide norm-referenced interpretations. For example, an examinee’s score at the 88th percentile on the GRE indicates that her raw score (i.e., number of test items answered correctly) was higher than the scores of 88 percent of the other examinees. In contrast, criterion-referenced tests in state testing programs use a student’s number-correct score to classify his performance as basic, proficient, or advanced. Such criterion-referenced interpretations do not provide information about the student’s performance as compared to other examinees.
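
The two kinds of interpretation can be contrasted in a short sketch. The norm-group scores and the cut scores below are invented for illustration, and the function names are hypothetical; operational testing programs derive percentiles from large norming samples and set cut scores through formal standard-setting procedures.

```python
from bisect import bisect_left


def percentile_rank(score, norm_group_scores):
    """Norm-referenced: percent of norm-group scores strictly below `score`."""
    ordered = sorted(norm_group_scores)
    below = bisect_left(ordered, score)  # count of scores lower than `score`
    return 100.0 * below / len(ordered)


# Hypothetical cut scores, listed from the highest level down.
CUT_SCORES = [(40, "advanced"), (30, "proficient"), (0, "basic")]


def performance_level(score, cut_scores=CUT_SCORES):
    """Criterion-referenced: label a score by the highest cut score it meets."""
    for minimum, label in cut_scores:
        if score >= minimum:
            return label
    raise ValueError("score is below the lowest cut score")


norm_group = [12, 18, 21, 25, 27, 29, 31, 33, 36, 44]  # invented raw scores
print(percentile_rank(34, norm_group))  # depends on how other examinees scored
print(performance_level(34))            # depends only on the cut scores
```

Changing the norm group changes the percentile rank but leaves the performance level untouched, which is the distinction the paragraph above draws.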

In terms of response format, some high-stakes tests use only a multiple-choice format, whereas others use both multiple-choice and constructed-response items. For example, some states administer end-of-course examinations that contribute to a student’s final course grade. In some instances these examinations use only the multiple-choice format. The GRE, however, incorporates both multiple-choice items and an analytical writing component.

The Stakes

The federal No Child Left Behind legislation (NCLB, Public Law 107-110) provides an example of public policy that requires high-stakes testing. NCLB requires states to test all students in Grades 3–8 annually in (a) reading or language arts and (b) mathematics. In addition, testing is required in science at one grade level in each of the grade spans 3–5, 6–9, and 10–12. Also, NCLB requires states to test high school students in one grade level annually. The state tests must describe two levels of high achievement (proficient and advanced) to gauge student mastery of the state content standards and a level of basic achievement to gauge the progress of lower achieving students toward attaining higher achievement levels.

Test results are disaggregated and reported by ethnicity, poverty level, disability, and English language learner (ELL) status. The target is for all students to be at the proficient or higher level in reading or language arts and mathematics by the 2013–2014 school year. Schools that do not meet adequate yearly progress (AYP) targets are required to develop school improvement plans, and the school district must provide students and parents with public school choice. Schools that do not meet AYP for several years potentially face such sanctions as restructuring, dismissal of staff, and external oversight.

The types of stakes associated with a high-stakes testing program vary across constituencies. To continue the school example, in some states, schools with high test scores receive financial awards or public recognition. The stakes are raised in terms of public awareness when states publish school report cards that contain test scores and ratings (A to F, Excellent to Unsatisfactory) based on the school’s test scores. Stakes for schools, and for neighborhoods, increase when realtors provide families with a school’s test scores to sell a house.

The stakes for educators include the award of bonuses or pay increases for teachers if test scores are high. In low-performing schools, possible sanctions for teachers include denial of tenure, dismissal, reassignment, and withholding of salary increases. Administrators in low-performing schools may be dismissed or receive a salary reduction, whereas in high-performing schools, an administrator may receive a bonus.

In the case of students, high-stakes tests have been used to determine grade-level promotion. Tests are also used to track students into classes based on their achievement levels. Tests inform decisions about the qualification of students for special education services (e.g., gifted, learning disability). Scores on high-stakes tests determine whether students meet high school graduation requirements and whether students receive diplomas of distinction. At the end of secondary school, students complete examinations that are used to make college admission decisions, and some students receive scholarships based on the test scores. In concluding their postsecondary education, aspiring professionals, such as students in medicine, law, and teaching, must pass a licensure examination.


Consequences

High-stakes tests are associated with both beneficial and detrimental consequences. For example, the establishment and publication of content standards associated with state-level tests allow teachers and students to understand what students must know and be able to do. Information from the tests can be used to identify problem areas in instruction and to plan changes. However, high-stakes tests typically cannot be used for diagnostic purposes because too few test items assess a specific content area for reliable reporting, and the scores typically are reported in the summer.

Another benefit is associated with the NCLB Act and its requirement that schools report students’ test results by socioeconomic level, disability, ELL status, and race/ethnicity. Such disaggregated reporting of students’ test results allows educators to examine whether all students are learning key content knowledge. Disaggregation allows educators to monitor achievement gaps between examinees in, for example, high and low socioeconomic groups, and to work toward reducing any gap they find.
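
As a rough illustration of disaggregated reporting, the sketch below computes the percentage of students at the proficient level or above within each subgroup and the gap between two groups. The student records, the field names (`ses`, `level`), and the function name are all hypothetical; actual state reporting rules (such as minimum subgroup sizes) are not modeled here.

```python
from collections import defaultdict


def percent_proficient_by_group(records, group_key):
    """Return {group: percent of students at proficient or above}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [proficient count, total]
    for record in records:
        tally = counts[record[group_key]]
        tally[1] += 1
        if record["level"] in ("proficient", "advanced"):
            tally[0] += 1
    return {group: 100.0 * p / n for group, (p, n) in counts.items()}


# Invented records: socioeconomic status (ses) and performance level.
students = [
    {"ses": "low", "level": "basic"},
    {"ses": "low", "level": "proficient"},
    {"ses": "low", "level": "basic"},
    {"ses": "low", "level": "advanced"},
    {"ses": "high", "level": "proficient"},
    {"ses": "high", "level": "advanced"},
    {"ses": "high", "level": "proficient"},
    {"ses": "high", "level": "basic"},
]

rates = percent_proficient_by_group(students, "ses")
gap = rates["high"] - rates["low"]  # achievement gap in percentage points
print(rates, gap)
```

A single schoolwide percentage would hide the difference between the two groups; reporting each group separately is what makes the gap visible.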

Harmful consequences associated with high-stakes testing include educators’ narrowing of the curriculum. One instance of narrowing the curriculum occurs when teachers focus instruction on those subject areas tested, such as reading and mathematics, and attend less to subjects not tested, such as history or art. Another form of narrowing of the curriculum is illustrated by the minimum competency tests of the 1980s. To prepare students for these tests of basic skills, teachers narrowed instruction in terms of depth in order to focus on basics. Thus, at the expense of student mastery of more complex skills, teachers narrowed their instruction to address minimal competencies. Also, narrowing occurs when learning activities are aligned with the test format. For example, teachers use commercial test-preparation materials in their instruction, replacing problem-based learning activities.

A consequence of the narrowing of the curriculum is that test scores may no longer accurately represent student learning. Scores on a test are indicators of student performance in the broader content domain from which the test items were sampled. The usefulness of test information depends on the degree to which the scores on a specific test represent what students can do in the broader content domain of interest. Scores on a high-stakes test become inflated when teaching is based on the content of a specific test, because the scores no longer reflect students’ understanding of the broader content domain.

Although high-stakes tests provide information for improving instruction, in some instances, educators narrowly target student groups for intervention. Such a consequence can be seen when a school decides to focus efforts on students with borderline scores on a high-stakes test and dedicates fewer resources to high- or low-performing student groups.

High-stakes testing policies have focused attention on student groups that may have been ignored in the past; however, the negative consequences of high-stakes testing may disproportionately affect these same groups. African American and Hispanic students have high failure rates on graduation examinations. In addition, states that have scholarship programs to support students in their undergraduate studies have test-score requirements that make it harder for minority students to qualify for the scholarship.

Conditions for High-Stakes Testing

The increasing use of high-stakes tests as instruments of policy has led to publication of position statements by the American Educational Research Association and the American Evaluation Association, both national organizations of professionals who conduct research and evaluation in education. The position statements indicate that high-stakes testing programs in education should meet certain conditions. Included in the conditions are the following:

  • Decisions about students should be based on multiple, high-quality measures, not a single test. Critical decisions about grade-level promotion or high school graduation require that students have multiple opportunities to demonstrate their proficiency. In addition, if evidence indicates that the score from a test does not reflect a student’s actual proficiency, then alternate methods for assessing the student’s proficiency level are required.
  • Validity studies should examine the accuracy of test-score interpretations when using labels of “basic,” “proficient,” or “advanced” to indicate students’ proficiency or “passing” to describe examinees’ performance.
  • Students must be provided the opportunity to learn the content and cognitive skills that are tested prior to implementation of high-stakes policies. This requires evidence that the content has been integrated into both the curriculum and instruction prior to its use in a high-stakes context.
  • Rules designating which students are to be tested and which may be exempted from testing must be established and enforced if test results for schools or districts are to be compared or results compared over time.
  • Appropriate test accommodations should be made for ELL students and students with disabilities. Appropriate accommodations will assure the scores of ELL students and disabled students represent the intended construct, such as social studies or mathematics achievement, and not characteristics external to the construct, such as text reading level.
  • Students who fail a high-stakes test should be provided remediation in the knowledge and skills of the broad content domain that the test represents.
  • Each use of a high-stakes test must be validated. For example, if a test validated for making decisions about individual students is also used for making decisions about teachers or administrators, that second use requires its own validity study.
  • The reliability of scores should be sufficient for each use. For example, the reliability of school means might be sufficient for making decisions about overall student performance; however, the reliability of subgroup means (e.g., for ethnic or socioeconomic groups) might be insufficient for making school improvement decisions.
  • The consequences of a high-stakes test should be evaluated and findings communicated to policy makers, educators, and the public.
  • Tests should align with the whole curriculum, not only the portion that is easiest to assess.
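
The condition in the list above about the reliability of school means versus subgroup means can be made concrete with a simple proxy, the standard error of a mean, which equals the score standard deviation divided by the square root of the group size. The sketch below is an illustration under strong assumptions (independent scores, the same invented standard deviation of 10 points in both groups, hypothetical group sizes); it is not a reliability coefficient and not the procedure any testing program is known to use.

```python
import math


def standard_error_of_mean(score_sd, n_students):
    """Standard error of a group mean, assuming independent scores."""
    return score_sd / math.sqrt(n_students)


SCORE_SD = 10.0  # assumed standard deviation of student scores

# Hypothetical group sizes: a whole school versus one small subgroup.
for label, n in [("whole school", 400), ("subgroup", 16)]:
    se = standard_error_of_mean(SCORE_SD, n)
    print(f"{label}: n={n}, standard error of mean = {se:.2f}")
```

With these invented numbers the subgroup mean is five times less precise than the school mean, so a year-to-year change that is meaningful for the whole school could be within the noise for the subgroup.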


