Standardized testing

Abstract

Standardized Testing has been, and continues to be, a controversial and widely debated subject, from the local school teachers’ lounge to the highest levels of academia and from internet blogs to Congressional chambers. Even among the most bitter rivals who support or oppose Standardized Testing, there is consistent agreement that the outcomes of our national education system must be measured in order to establish real and meaningful educational reform. The ongoing debate, however, surrounds the means by which those assessment standards are developed and applied to achieve the most valid and reliable results. Signed into law by President George W. Bush on January 8, 2002, the No Child Left Behind Act of 2001 (NCLB) is “An act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind.” The NCLB Act emphasizes four key components of education reform: Increased Accountability; More Choices for Parents and Students; Greater Flexibility for States, School Districts, and Schools; and Putting Reading First. As related to the area of Increased Accountability, I will strive to establish a clear understanding of the role of Standardized Testing as a valid and reliable measurement tool for assessing minimum standards of proficiency and the Adequate Yearly Progress (AYP) of local school systems.

Standardized Testing Defined

According to the Educational Measurement Group of Pearson, a test is standardized when it is developed, administered, and scored using established procedures and guidelines that are set to ensure that every student is provided the same opportunity.

[http://www.pearsonedmeasurement.com/research/faq_1a.htm]

The definition of Standardized Testing, as adopted by the U.S. Department of Education and published through the International Affairs Office in February 2008, is:

Standardized tests are scientifically normed and machine-graded instruments administered to students and adults under controlled conditions to assess capabilities, including knowledge, cognitive skills and abilities, and aptitude. They are used extensively in the U.S. education system at all levels to assist with admissions, placement, and counseling decisions. [http://www.ed.gov/international/usnei/edlite-index.html]

By design, states are given the autonomy and flexibility to develop or adopt the assessment tools they feel best reflect their understanding of the Federal NCLB Act. All states have established standards and guidelines to govern the implementation of the assessment process to ensure full compliance with federal law. The Georgia Department of Education has established the following Mission Statement for the state’s assessment program:

“The purposes of the Georgia Student Assessment Program are to measure student achievement of the state mandated curriculum, to identify students failing to achieve mastery of content, to provide teachers with diagnostic information, and to assist school systems in identifying strengths and weaknesses in order to establish priorities in planning educational programs.” [http://www.doe.k12.ga.us/ci_testing.aspx]

Standardized Testing Formats

Standardized tests can be divided into two broad categories: Norm-Referenced tests and Criterion-Referenced tests. Norm-Referenced Tests (NRT) establish a national baseline of general basic skills using the test results of a national sample. Students are then tested, and their scores are ranked against that national sample to determine their academic progress in relation to other students of the same grade level. Criterion-Referenced Tests (CRT) are more specific in scope and assess a student’s individual performance as compared to clearly defined standards and learning objectives.
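The difference between the two categories can be illustrated with a small sketch. It is only an illustration under stated assumptions: the norm sample, the student score, and the cut score below are hypothetical numbers, not values from any actual test, and the percentile-rank rule used here (percentage of the sample scoring strictly below the student) is one common convention among several.

```python
from bisect import bisect_left


def percentile_rank(score, norm_sample):
    """Norm-referenced view: percentage of the norm sample scoring strictly below `score`."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)  # count of sample scores lower than `score`
    return 100.0 * below / len(ordered)


def meets_standard(score, cut_score):
    """Criterion-referenced view: has the student reached the fixed standard?"""
    return score >= cut_score


# Hypothetical data, invented for illustration only.
norm_sample = [52, 58, 61, 64, 67, 70, 73, 76, 81, 88]
student_score = 70
hypothetical_cut_score = 75

nrt_result = percentile_rank(student_score, norm_sample)
crt_result = meets_standard(student_score, hypothetical_cut_score)

print(f"Norm-referenced: scored above {nrt_result:.0f}% of the sample")
print(f"Criterion-referenced: meets standard = {crt_result}")
```

The same raw score produces two different kinds of information: a relative standing that depends entirely on who is in the comparison sample, and a pass or fail judgment that depends only on the fixed standard.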

Norm-Referenced tests are very useful to local school system administrators in establishing programs, allocating resources, and implementing system-wide changes as necessary to ensure that every student is receiving the highest level of education available. For example, if one local school system’s performance ranks very high on a national level, then a lower-performing school system may decide to find out what that system is doing well and adopt similar programs to improve its own system-wide performance.

On the other hand, Criterion-Referenced tests are more useful in determining individual student performance. In Georgia, the CRCT (Criterion-Referenced Competency Test) is defined by the Georgia Department of Education as a test that “is designed to measure how well students acquire the skills and knowledge described in the Georgia Performance Standards (GPS)” and is used “to diagnose individual student strengths and weaknesses as related to the instruction of the GPS, and to gauge the quality of education throughout Georgia.” [http://www.doe.k12.ga.us/ci_testing.aspx?PageReq=CI_TESTING_CRCT]

To provide a framework for the content of this paper, I will break down and discuss the individual components of the Mission Statement of the Georgia Student Assessment Program as related to the various stakeholders in the assessment process.

Measuring student achievement of the state mandated curriculum

It is paramount to the success of any assessment program to establish a clearly defined process to measure that success. It is also critical to ensure that the results of the measurement process are both reliable and valid. A test that is reliable but not valid is useless as a measurement tool. Likewise, a test that is valid but unreliable has no value in the assessment process.

Reliability in statistical analysis deals with the consistency of results, which depends on how the assessment is constructed and administered. In other words, all things related to the test being equal, the test results should vary little between similar test groups. If similar groups of students are taught the same curriculum under similar educational environments by highly qualified teachers, then the test results should also be similar and the test can be deemed reliable.
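One simple way to make "similar results between similar groups" concrete is to compare the average scores of two groups relative to the spread of their scores. The sketch below uses a standardized mean difference (often called Cohen's d) for that purpose. This is a swapped-in illustration, not the method any particular testing program uses, and the scores are hypothetical. Formal reliability studies typically rely on other statistics, such as test-retest correlations or internal-consistency coefficients.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores for two similar groups taught the same curriculum.
group_a = [70, 72, 68, 75, 71]
group_b = [69, 73, 70, 74, 72]

d = standardized_mean_difference(group_a, group_b)
print(f"Standardized mean difference: {d:.2f}")
```

A value close to zero is consistent with the two groups performing alike. A large value would suggest that either the groups were not really similar or the test is not producing consistent results, and that further investigation is needed before the scores are trusted.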

Validity focuses primarily on the connection between intent and result. The question that must be asked of any test is: Does the test actually measure what I am trying to measure? Many tests are unfortunately written in such a way that the actual knowledge and skills measured have nothing to do with the content area in question. Some tests are constructed in such a way that they do not determine the level of student knowledge and comprehension, but reveal the student’s testing skills. If this is the specified intent of the test, then the results could be considered valid. If the intent is to measure how well the student understands 16th Century English Literature, then the test is not valid and should not be used as an assessment tool.

The Georgia Performance Standards (GPS) were established by the Georgia Department of Education in response to declining student performance and to define a clear framework for accountability to a standard curriculum. A “Standards Based Classroom” is defined as “a classroom where teachers and students have a clear understanding of the expectations (standards). They know what they are teaching/learning each day, why the day’s learning is an important thing to know or know how to do, and how to do it. They also know that they are working toward meeting standards throughout the year.” The GPS establishes a solid foundation for determining the reliability and validity of the assessment process for measuring student achievement by creating a benchmark for minimum proficiency.

Identifying students failing to achieve mastery of content

Standardized tests are neither designed nor intended to replace summative or formative assessments that measure consistent progress on a daily, weekly, or unit level. They do, however, provide parents and students with feedback on critical basic skills and concepts as defined by a standard curriculum that measures all students of the same grade level against the same standards. Critics of standardized tests argue that it is impossible to create a single test that accurately measures everything that a student truly knows. While this may seem to be a valid argument on the surface, it should be pointed out that standardized tests are not designed to be a “one size fits all” solution to educational reform. Standardized tests are assessment tools that are only useful when used in the right environment by skilled educators for a specified purpose. Once the outcomes are compiled and the results are published, hours of statistical analysis are required to make sure that the results are reliable and valid based on clearly defined standards.

Providing teachers with diagnostic information

The more information that is available to teachers. Teachers as well as students must consistently strive to expand their knowledge base and develop their skills. Standardized tests provide valuable feedback

Assisting school systems in identifying strengths and weaknesses

According to the National Center for Education Statistics, the Georgia public education system incorporates 180 individual school districts with 2,601 schools responsible for educating the nearly 1.65 million students enrolled. [http://nces.ed.gov/nationsreportcard/states/]

Georgia law requires that a nationally norm-referenced test be given every year to students in grades three, five, and eight. The law also requires that this test include the content areas of math, science, social studies, and reading. Georgia has adopted the Iowa Tests of Basic Skills, Form A (ITBS/A) and uses the results of this test to determine how the performance of Georgia’s students compares with the results of a national sample. The outcomes help both state and district administrators make decisions related to resource management, curriculum development, and
