An achievement test (e.g., a bar exam) is an assessment of one or more persons’ knowledge or skills (typically during a specific period of time). (In the remainder of this entry, “test” refers only to an achievement test.) In addition to evaluating individuals’ competences, tests can assess (a) their learning, (b) teachers’ impact on their learning, and (c) the school or school system’s effects on their learning. Testing can also influence individuals’ behaviors before, during, and after the exam. Designing suitable tests requires writing specific test questions (test items), field-testing them, and analyzing the responses with advanced statistics. Country, family, school system, school, schoolmate, teacher, and individual characteristics all influence achievement test scores. Likewise, achievement test scores are linked to future academic performance, graduation rates, job status, and income.
Beginning with China’s imperial civil service exam (Keju in Chinese), whose roots reach back to the Han dynasty (founded 206 b.c.e.), people have used tests to select among candidates based on their knowledge rather than on favoritism, nepotism, or bribery. As tests typically enable the selection of some candidates with less wealth or social status, they highlight the system’s openness to others beyond a closed elite. Hence, such a test system encourages candidates to view it as fair and based on merit, resulting in its greater legitimacy.
Tests can assess a person’s learning during the time between a pretest and a posttest (posttest score minus pretest score). To estimate the impact of teachers on student learning, evaluators can use the differences in students’ annual test scores in a large, longitudinal data set of many teachers and students across many years in an analysis that controls for the possible effects of the characteristics of students and their families. Similarly, cities and countries can use annual student tests to evaluate the effectiveness of their schools. Without extensive, longitudinal data and controls, however, estimates of the effectiveness of teachers, schools, and school systems can be biased and misleading.
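The two estimates described above can be sketched in a few lines. This is a minimal illustration with hypothetical data, not the longitudinal, multi-control models the entry refers to: (1) a student's learning gain is simply posttest minus pretest, and (2) a crude teacher "value-added" score is the average amount by which a teacher's students exceed the score predicted from their pretests alone.

```python
# Minimal sketch of learning gains and a crude teacher value-added estimate.
# All names and numbers here are hypothetical, for illustration only.
from statistics import mean

# records: (teacher, pretest, posttest)
records = [
    ("A", 50, 62), ("A", 70, 78), ("A", 60, 71),
    ("B", 55, 58), ("B", 65, 69), ("B", 45, 49),
]

# (1) Individual learning gain = posttest - pretest
gains = {i: post - pre for i, (_, pre, post) in enumerate(records)}

# (2) Fit posttest = a + b * pretest by ordinary least squares, then
# average each teacher's residuals (actual - predicted posttest).
pres = [pre for _, pre, _ in records]
posts = [post for _, _, post in records]
mp, mq = mean(pres), mean(posts)
b = sum((x - mp) * (y - mq) for x, y in zip(pres, posts)) / sum(
    (x - mp) ** 2 for x in pres
)
a = mq - b * mp

value_added = {}
for teacher in {t for t, _, _ in records}:
    resid = [post - (a + b * pre) for t, pre, post in records if t == teacher]
    value_added[teacher] = mean(resid)

print(value_added)  # teacher A's students outgain teacher B's predictions
```

A real value-added analysis would add controls for student and family characteristics across many years; without them, as the entry notes, such estimates can be biased.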
Testing can influence individual behaviors before, during, and after the exam. When people are informed in advance about an exam whose result has consequences for them, they are more likely to prepare for the exam (by studying, practicing, etc.) and, thereby, perform better than otherwise. During a test, people who are concerned about its consequences or unfamiliar with its format might feel test anxiety and thus perform worse than otherwise. After an exam, its results can provide useful feedback about a person’s performance and inform a plan for further study or instruction, thereby improving future performance. Thus, testing itself can change people’s behaviors.
Creating tests that accurately assess knowledge or skills requires selecting appropriate content for a target population and purpose, then designing a suitable test whose items can be analyzed statistically. As most knowledge or skills are specific to a domain (e.g., geometry), test designers must select the content that they will assess. Ideally, the targeted content is a coherent, integrated set of knowledge and skills that are intimately related to one another (not just a list of disparate ideas and behaviors). If a test covers coherent content, its score can support meaningful interpretation of a person’s competence in that content area.
Tests are also designed for specific populations and purposes. For example, a high school biology graduation test focuses on basic, general concepts. In contrast, a biology test used to award college scholarships has more test items about advanced ideas and their relationships.
Last, a midterm biology test with open, specific questions about the human respiratory system can help a high school teacher assess students’ understanding and inform her or his teaching of the human circulatory system to them. Hence, a test must suit both the population and the purpose.
Test designers aim to create tests that can be graded fairly and consistently at low cost—typically, analytic tests (rather than holistic tests) with objectively evaluated test items. Holistic tests ask students to address a major problem or question (e.g., What causes climate change?) to assess participants’ executive skills (i.e., their ability to plan, organize, integrate, etc.). However, evaluators might not agree on a single score for a participant’s holistic test, which can raise questions about its legitimacy. Hence, holistic tests are often scored along multiple dimensions (e.g., content, organization), using rubrics with exemplars for each score along each dimension and with explicit boundaries between adjacent scores.
Analytic tests have separate items covering different, specific knowledge (e.g., questions requiring short answers, true/false items, multiple-choice questions, matching answers to questions, etc.). If these analytic test items have clear, objective answers that can be unambiguously evaluated as correct or incorrect, they allow for transparency and let stakeholders view and agree on the fair evaluation of participant responses, thereby enhancing a test’s legitimacy. Short-answer questions, true/false items, and matching have critical weaknesses compared with multiple-choice questions. Evaluators might not know how to score unexpected short answers. Meanwhile, students who do not know the answer to a true/false question have a 50-percent probability of guessing the correct answer but only a 20-percent probability for a multiple-choice question with five possible answers. Hence, each multiple-choice question is more likely than a true/false item to distinguish among students of different competences. Likewise, multiple-choice questions often have several choices that are nearly correct, but matching problems cannot include a comparable number of such answers without severely taxing participants’ short-term memories.
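The guessing argument above compounds across a whole test. A short sketch with hypothetical numbers (the passing threshold and test length are illustrative, not from the entry) shows how much more often a student with no knowledge "passes" by guessing on true/false items than on five-option multiple-choice items:

```python
# Probability that a pure guesser answers at least k of n items correctly,
# using the binomial distribution. Thresholds here are hypothetical.
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """P(at least k successes in n independent trials with success prob p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of "passing" (>= 12 of 20 correct) by guessing alone:
tf = p_at_least(12, 20, 1 / 2)   # true/false items (p = 1/2 per guess)
mc = p_at_least(12, 20, 1 / 5)   # five-option multiple choice (p = 1/5)

print(f"true/false: {tf:.3f}, multiple choice: {mc:.6f}")
```

The true/false guesser passes far more often, which is why multiple-choice items distinguish better between students who do and do not know the material.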
Tests must be inexpensive to design, administer, and evaluate. While holistic tests are easy to design and administer, they are costly to evaluate, requiring extensive time to prepare a rubric, to train evaluators, and for those evaluators to grade the tests. In contrast, short-answer and multiple-choice questions require extensive time to design and somewhat more time and cost to produce and administer, but they can be evaluated quickly and accurately, especially multiple-choice questions scored by computer. For a teacher assessing one classroom of students, a holistic test can yield more information than a test of only short-answer or multiple-choice questions, at low design and administration costs and tolerable evaluation costs. However, tests with short-answer or multiple-choice questions are preferable when evaluating large populations (e.g., for school entrance exams). The remainder of this entry focuses on multiple-choice tests.
Test designers aim to create a bank of multiple-choice test items that cover the target content, range in difficulty, and are of high quality. Each test item evaluates a person’s knowledge of specific target content. Typically, each test item has one correct answer, and the other choices receive no credit (in some tests, some choices can receive partial credit). Furthermore, each test item has a specific level of difficulty. Last, high-quality items distinguish reliably between participants who are above and those who are below a specific level of competence (i.e., those who can vs. those who cannot answer the questions correctly). Ineffective, low-quality test items might be misunderstood, too easy, too hard, misleading, or too easy to guess correctly.
To evaluate the quality of the test items, they are bundled into multiple tests and administered to people, whose responses are assessed. Each pair of tests has common test items (anchors) that allow for scores on all tests to be calibrated to the same scoring scale. The people selected to take these preliminary tests should have competences similar to the target test population’s range of competences. For example, new items on tests like the ACT (originally, American College Testing) or the SAT (Scholastic Aptitude Test) are introduced as experimental sections on tests given to current students.
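One common way to put two test forms on the same scale via shared anchor items is linear equating: match the mean and standard deviation of anchor-item scores across the two groups. This is a minimal sketch with hypothetical scores, one of several equating methods rather than the specific procedure any particular testing program uses:

```python
# Linear equating of form B scores onto form A's scale via shared anchor
# items. Scores below are hypothetical, for illustration only.
from statistics import mean, pstdev

# Anchor-item scores earned by the group that took each form
anchor_on_A = [6, 7, 5, 8, 6, 7]
anchor_on_B = [4, 5, 3, 6, 4, 5]  # same items, scored by form B's group

# Match the anchor's mean and standard deviation across forms (z-score match)
slope = pstdev(anchor_on_A) / pstdev(anchor_on_B)
intercept = mean(anchor_on_A) - slope * mean(anchor_on_B)

def to_form_A_scale(score_on_B: float) -> float:
    """Re-express a form B score on form A's scale."""
    return slope * score_on_B + intercept

print(to_form_A_scale(4.5))  # -> 6.5
```

With the anchors calibrated this way, scores from every form in the item bank can be reported on one common scale, as the paragraph above describes.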
Advanced statistical analyses of test responses estimate the competence of each participant and the attributes of each test item. The competences of the participants indicate whether the test items (or a subset of them) cumulatively serve their function of distinguishing participants from one another along a single scale. For example, a scholarship test that results in high scores for most participants is too easy, so the easy test items should be dropped or redesigned to be more difficult.
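A first pass at flagging overly easy items can use classical item statistics: an item's difficulty index is simply the proportion of participants who answered it correctly. A minimal sketch with a hypothetical response matrix (the 0.9 cutoff is an illustrative choice, not a standard from the entry):

```python
# Classical item difficulty: proportion of participants answering each item
# correctly. Data and cutoff are hypothetical, for illustration only.
# rows = participants, columns = items; 1 = correct, 0 = incorrect
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]

n_participants = len(responses)
n_items = len(responses[0])
difficulty = [
    sum(row[j] for row in responses) / n_participants for j in range(n_items)
]
too_easy = [j for j, p in enumerate(difficulty) if p > 0.9]

print(difficulty, too_easy)  # item 0 is answered correctly by everyone
```

Items flagged this way would be dropped or redesigned to be harder, as described above; full analyses use the model-based methods discussed next.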
The estimated attributes of each test item show its relative alignment with the target content, its difficulty level, its quality, its likelihood of guessing success, and its bias against subsamples of participants (through factor analyses, item response theory [IRT] analysis, and differential item functioning analysis). First, the analysis determines whether the test items reflect one or more underlying target content competences. If most test items align along one competence with a few items aligning along other competences, the latter items likely assess irrelevant competences and are discarded or revised (another possibility is that the target content requires substantial reconsideration). Second, items that are much easier or harder than expected are recategorized, revised, or discarded. Third, high-quality items are retained, and low-quality items are revised or discarded. Fourth, test items with high rates of guessing success by low-competence participants are revised or discarded. Last, among subgroups of participants with similar competence estimates (e.g., males vs. females; Asians vs. Latinos), test items that are much easier for one group than for another are revised, discarded, or tagged for use with only homogeneous samples (e.g., only females).
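The item attributes above map onto the parameters of the three-parameter logistic (3PL) item response model, a standard model in this family (shown here as a sketch, not as the specific model any given testing program uses): discrimination a captures item quality, b captures difficulty, and c captures the guessing floor.

```python
# 3PL item response function: probability that a participant of competence
# theta answers an item correctly. Parameter values below are hypothetical.
from math import exp

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """a = discrimination ("quality"), b = difficulty, c = guessing floor."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

# A discriminating item separates participants around its difficulty level:
low = p_correct(-1.0, a=2.0, b=0.0, c=0.2)
high = p_correct(1.0, a=2.0, b=0.0, c=0.2)

# A highly guessable item (larger c) separates them less:
low_g = p_correct(-1.0, a=2.0, b=0.0, c=0.5)
high_g = p_correct(1.0, a=2.0, b=0.0, c=0.5)

print(high - low, high_g - low_g)  # the first gap is wider
```

This is why items with high guessing success or low discrimination are revised or discarded: they flatten the curve and tell the test little about who knows the content.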
Influences on Test Scores
Country, family, school, and individual characteristics influence test scores. Countries that are richer or more equal have higher test scores. People in countries with higher real gross domestic product per capita (e.g., Japan) often capitalize on their country’s greater resources to learn more and score higher on tests. Furthermore, educational resources show diminishing marginal returns: a poor student likely learns more from an extra book than a rich student would. Thus, in more equal countries (e.g., Norway), poorer students often have more resources and benefit more from them than richer students, resulting in higher achievement and test scores overall in these countries.
Some family members (e.g., parents) provide family resources, but others (e.g., siblings) compete for them. Children in families with more human capital (e.g., education), financial capital (wealth), social capital (social network resources), and cultural capital (knowledge of the dominant culture and society) often use these resources to learn more. When a person has more siblings (especially older ones), they compete for these limited resources, resulting in less use of shared resources, less learning, and lower test scores (resource dilution).
Students from privileged families often attend schools with privileged schoolmates, large budgets, and effective teachers. Privileged schoolmates’ family capital, material resources, diverse experiences, and high academic expectations often help a student learn more and score higher on tests. These schools often have larger budgets, better physical conditions, and more educational materials, which can improve their students’ learning and test scores compared with those of other schools. Students from privileged families often benefit from attending schools with higher teacher-to-student ratios and better-qualified teachers. Superior teachers often maintain better student discipline and better relationships with their students—both of which are linked to higher student achievement.
Studies of school competition show mixed results. Some natural experiments suggest that in schools facing greater competition (there are more schools in some districts because of natural phenomena such as rivers), students have higher test scores. When school closures are anticipated or announced, their students have lower test scores, but surviving schools show higher test scores. Meanwhile, studies of school choice and of traditional versus charter schools show mixed results.
Student genes, cognitive ability, gender, attitudes, motivation, and behaviors also influence test scores. Genes contribute to student cognitive ability and test scores, but studies of separated twins and siblings suggest that genetics account for less than 15 percent of the differences in people’s test scores. Girls outperform boys on school tests at every age level in nearly every subject, in part because girls have better attitudes toward school, feel a greater sense of belonging at school, are more motivated, attend school more regularly, study more, and have fewer behavioral discipline problems, all of which are linked to higher test scores.
Girls also outperform boys on standardized reading tests, but boys score higher on standardized mathematics tests. This latter result stems from school tests’ ceiling effects on boys with high mathematics ability and from girls having greater test anxiety during consequential, standardized tests.
Test scores are linked to future test scores, graduation rates, further study, and better jobs. People with higher test scores tend to score higher on future tests (with some regression to the mean). As graduation and further study largely depend on academic performance, students with high test scores are more likely to graduate from school and more likely to pursue higher degrees. Those who graduate from college or have advanced degrees have higher-status jobs and earn higher incomes. However, these relationships weaken over time. For example, people with high mathematics test scores one year are likely to have high mathematics test scores the next year too; however, they are only somewhat more likely to earn more than others in 10 years’ time.