The Least Wonderful Time of the Year

No, this post is NOT about to be 2,600+ words bashing the Christmas season for being too commercialized or sappy or anything else. I’m actually loving the holiday season this year. Anyone who’s read this blog before probably knows why.

What this post IS about is STAAR Testing – specifically, STAAR End-Of-Course (EOC) Retests for those high school students who are cursed with the requirement of taking them. I will also share a personal story about what I now believe was proverbial “Testing Hades,” and how I helped get my staff (and myself) through it using humor.

I have previously confessed on this blog that the final dozen years of my K-12 education career had me wrestling with self-loathing because, as a so-called “assessment professional,” my biggest role was training, implementation, and support of a system that is deeply flawed at best. There are a host of reasons for these flaws, many of which would involve exposition that would truly be agonizing to read. What it comes down to is primarily inconsistency.

STAAR is fundamentally inconsistent as a matter of course because everything about its construction is constantly changing. The state curriculum of Texas public schools, known as the Texas Essential Knowledge and Skills (TEKS), is regularly revised. Subject area TEKS are reviewed and adjusted at least once every 10 years on a rotating basis. Schools don’t have to contend with full-scale revisions all at once, but teachers of a given subject know that they will have to adjust planning and teaching at least once a decade. While a decade doesn’t seem frequent on its face, the reality in a classroom is that it may take several years after revisions are published to determine the best way to approach certain TEKS standards instructionally. It’s not like anyone can just flip a switch, make a couple of tweaks, and all students magically respond positively to whatever the new TEKS are.

And as you might expect, the process of assessing any new TEKS learning standards can also be messy and flawed over several years. This is why the Texas Education Agency (TEA) randomly selects districts and campuses for specific field tests every year, usually in February. A typical TEA Field Test is similar to an actual STAAR test, and campuses are required to administer it with the same approach and security protocols that they use for the actual STAAR test in the spring, except there’s no payoff for a field test. Whether students do well or poorly, they will never be rewarded or punished. They will never even know how they scored. Reporting of results is minimal, because the point is literally to test the test items themselves and determine if they’re reliable and valid. If you’re thinking that your children might become lab experiments each February so that psychometricians can analyze results, you would be correct.

This process is also why actual STAAR tests in the spring include “field test items” that may or may not actually be scored. And it’s why TEA will rewrite and revise items continually between STAAR testing cycles, which occur annually each April for Grades 3-8, but every April, June, and December for high school students. There’s a continual item analysis process to seek out a reliable and valid test, and as you might expect, some STAAR tests are more reliable and valid than others. Small comfort to a high school freshman who learns he has to retake an EOC even if the test itself was poorly constructed.

The revision process, combined with the desire of schools to “teach to the test” so that scores improve, has created its own cottage industry. A legion of consulting companies has sprung up with the intent of helping schools and their teachers analyze released items from STAAR tests, connect those items to the TEKS standards being assessed, and determine ways to adjust instruction so that teachers may better prepare students for what they might see on the STAAR. Is it good instruction? Sometimes. Your mileage may vary, as they say. But “teach to the test” has become scientific (or pseudo-scientific, in many cases), all in the name of accountability points, property values…and, oh yeah, “for the children.”

Speaking of those children – we all want them to excel, right? We want them to score well since it’s evidence that they’re learning, it’s good for self-esteem, yada, yada, yada. But of course, it’s not always so simple. Remember that this is a test that students will see only once per year. (Theoretically three times with upcoming legislative changes.) And when they see the test next year, it’ll be a bit different from last year because Grade 8 is different from Grade 7, which is different from Grade 6, and so on, even if the subject is still Reading or Math. It is no wonder that most schools see drops in Math scores, for instance, from Grade 4 to Grade 5, year after year, even as students change, because Grade 5 Math is typically a bit more challenging than Grade 4 Math. What’s more, the actual passing standard might be adjusted thanks to the magic of the “cut score.”

When you visit TEA’s website, you might run across the STAAR Performance Standards for Grades 3-8 and EOC. You’ll see austere tables listing the Scale Scores required for a given student to reach the Approaches, Meets, or Masters Grade Level performance categories as handed down by the gods themselves…er, I mean, by TEA officials. These categories and their scale-score thresholds remain fixed from year to year (thanks to the auspices of either Odin or Ra; I never remember which), but the real sausage is made after students have completed the tests, psychometricians have analyzed the results, and TEA constructs what’s known as a Raw Score Conversion Table. Take Grade 5 Math, for instance, which had 42 items that were scored in 2025. TEA takes every possible raw score, from 0 to 42, and links it to a given Scale Score. Those links then determine what raw score a student needed to reach the Approaches / Meets / Masters levels from the Performance Standards. It’s all computed AFTER THE FACT, because TEA looks at the distribution of student raw scores statewide before deciding where, in fact, “passing” will be located. In 2025, a Grade 5 student in Math needed to get 17 of 42 items correct for Approaches, 26 for Meets, and 34 for Masters. In 2026 and beyond, those raw scores could change depending on how every student in Texas fared on that year’s test. It’s a moving target every year. It might not move much, but it can move.

Is it possible that public relations and political concerns can impact where the cut scores fall? You tell me. It was certainly curious that accountability ratings for 2023 and 2024 went to court, and then in 2025 most districts and campuses in the state saw their STAAR scores and ratings increase. I don’t have the time or inclination to lay out a full analysis of the data over those years, but it sure was quite the coincidence.
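Once the cuts are set, the after-the-fact conversion boils down to a simple lookup. Here’s a minimal sketch in Python using the 2025 Grade 5 Math cut scores cited above; the function and dictionary names are my own illustration, not anything TEA publishes:

```python
# Illustrative lookup from raw score to performance category,
# using the 2025 Grade 5 Math cut scores cited in this post.
# TEA sets these cuts AFTER analyzing statewide results, so the
# numbers below can shift from one year to the next.

CUTS_2025_G5_MATH = {
    "Approaches": 17,  # minimum raw score, out of 42 scored items
    "Meets": 26,
    "Masters": 34,
}

def performance_level(raw_score: int, cuts: dict) -> str:
    """Map a raw score to its performance category."""
    if raw_score >= cuts["Masters"]:
        return "Masters Grade Level"
    if raw_score >= cuts["Meets"]:
        return "Meets Grade Level"
    if raw_score >= cuts["Approaches"]:
        return "Approaches Grade Level"
    return "Did Not Meet Grade Level"

print(performance_level(34, CUTS_2025_G5_MATH))  # Masters Grade Level
print(performance_level(16, CUTS_2025_G5_MATH))  # Did Not Meet Grade Level
```

The point of the sketch is that the logic is trivial; all the consequence lives in the dictionary, and that dictionary is rewritten every year after the fact.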

The bottom line is that passing standards can be, and are, adjusted year-to-year as cut scores are linked to scale scores. “Passing the STAAR” is, itself, an exercise in inconsistency. What’s more, “passing” isn’t always the end for students in high school. Current TEA guidelines require students to reach Approaches Grade Level or higher in all 5 EOC-assessed subjects – English I, English II, Algebra I, Biology, and US History. (I won’t open the Individual Graduation Committee or the Substitute Assessment cans of worms; those are other deep-dive posts.) The students currently taking December EOCs are those who have previously not passed one or more tests, or who were Absent or otherwise missed their opportunities in April and June. When a student finally achieves those scores and passes their classes, graduation is on the horizon. Perhaps college? Not so fast. “Approaches Grade Level” is passing for graduation purposes only. TEA has a whole other set of standards for what is called “College, Career, and Military Readiness” (CCMR). And Approaches on STAAR EOC ain’t one of them. In fact, STAAR scores don’t matter AT ALL for CCMR accountability standards, and they won’t grant students access to Texas public colleges and universities…at least, not without some type of remedial learning. So now, high schools in Texas offer the TSIA2, SAT, and ACT at least once to all their students in an attempt to get as many students as possible to meet CCMR requirements. More testing for our high schoolers! Isn’t it grand? In fact, it’s several grand paid each year by the taxpayers, or by the students themselves.

Lest you think we only torture the high schools and their students, there’s a whole set of other accountability measurements that primarily impact elementary, intermediate, and middle schools and their students – the Progress Measure, brought to you by TEA through each year’s STAAR tests. The intent is actually well-meaning and fairly intuitive: students should show growth, also called academic progress, in their year-to-year performance on STAAR Reading and Math tests. Easy, right? Of course not! You might think an intuitive approach to growth would be that a given student should score at or above the previous year’s raw or scale score to show progress. Or perhaps there should be a set of score ranges that might overlap so that students wouldn’t be penalized for missing one more item than last year. But you would be wrong in both cases. Instead, TEA determines progress based on the student’s performance among the Approaches / Meets / Masters standards, which we’ve already established may change thanks to cut scores. Essentially, for a student to “meet progress” officially, that student must match or exceed the performance category from the previous year. If they reached Approaches last year, they must reach Approaches, Meets, or Masters this year. Here’s the problem: because those performance levels are matched directly to a specific raw score, it’s possible for a student to “not meet growth” based on a single test item. Consider the Grade 5 Math scores referenced earlier, and suppose a 5th grader got 34 of 42 items correct in 2025, reaching Masters Grade Level. It just so happens that in 2025, Masters on Grade 6 Math also required a raw score of 34, out of 43 items. BUT, in 2026, IF the Masters level ends up being raised to 35 items after TEA’s psychometric analysis, AND this same student gets the same raw score of 34 on Grade 6 Math, the student will actually DROP to Meets Grade Level. That might seem fine, BUT this student will then be deemed “Did Not Meet” for the progress measure in Math. By a single item. Even though the student passed the test easily, getting 79% of the items correct when it only takes 37% to “pass.” Is this equitable, fair, justifiable, reliable, and/or valid? You tell me.
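The one-item cliff is easy to demonstrate. Here’s a hedged sketch in Python using the scenario above: the 2025 cuts come from this post, the 2026 Masters cut of 35 is the hypothetical from the text, and the non-Masters Grade 6 cuts are placeholders I made up for illustration:

```python
# Sketch of the progress-measure scenario described above (illustrative
# only; the 2026 cut is the hypothetical from the text, not a real TEA
# number, and the Grade 6 Approaches/Meets cuts are assumed).

LEVELS = ["Did Not Meet", "Approaches", "Meets", "Masters"]

def level(raw: int, cuts: dict) -> str:
    """cuts: dict of category -> minimum raw score for that category."""
    earned = "Did Not Meet"
    for name in ("Approaches", "Meets", "Masters"):
        if raw >= cuts[name]:
            earned = name
    return earned

def met_progress(prev_level: str, curr_level: str) -> bool:
    """Official progress = match or exceed last year's category."""
    return LEVELS.index(curr_level) >= LEVELS.index(prev_level)

# 2025 Grade 5 Math cuts (from this post): 34 of 42 reaches Masters.
g5_2025 = {"Approaches": 17, "Meets": 26, "Masters": 34}
# Hypothetical 2026 Grade 6 Math: Masters cut raised to 35.
g6_2026 = {"Approaches": 16, "Meets": 26, "Masters": 35}  # non-Masters cuts assumed

prev = level(34, g5_2025)  # "Masters"
curr = level(34, g6_2026)  # "Meets" -- same raw score, one item below the new cut
print(prev, curr, met_progress(prev, curr))  # Masters Meets False
```

Same student, same raw score, and the progress flag flips to False purely because the cut moved one item.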

Flaws and inequities like these are just some of the reasons it became more and more difficult for me to justify continuing to work as an “assessment professional.” It became increasingly hard to pretend that the system was defensible, let alone worth training teachers to implement appropriately. Of course, when it came to the insanity of the system, Grand Prairie ISD said “hold my beer” and added layers of local assessment to this Least Wonderful Time of Year. So began the creation of Testivus.

Here’s how it happened: The high school calendar in GPISD had students attending classes for roughly 3 weeks after Thanksgiving break. Week 1 was mostly instruction; TEA allowed districts to offer December EOCs that week, but GPISD elected to wait. Week 2 was when GPISD offered December EOCs over four days (Tuesday through Friday because TEA at the time did not allow STAAR tests on Mondays). But GPISD also added four (4!!!) additional days of local assessments – “Q2 Summative Assessments,” they were called – and required “shutdown” testing for EOC-assessed courses. What’s more, the US History Q2 Summative HAD to be given on the Friday of Week 3 because of district policy on semester exams, meaning we had to hold both the final day of EOCs AND a major local assessment on the same day. On a Friday, no less. Self-induced torture. Or should I say, district-induced torture. Somewhere in those 8 days, non-core subjects also had to offer Semester 1 Exams. As you might imagine, the schedule was somewhere between confusing and downright comical. As an administrator, it was a death-defying juggling act just to create a coherent schedule, and then we had to communicate it to the staff and students. So I had a choice: either tear out my hair, elevate my blood pressure, and otherwise stress myself out at the holidays trying to make it work, or have fun with it. I decided to have fun and approach it with humor. So Testivus was born.

Testivus, as any good Seinfeld fan would infer, was a riff on Festivus, the fictional holiday “for the rest of us” in response to the rampant commercialism of Christmas. We needed something bizarre to associate with the madness, and yes, to resist it, because there was no way to comprehend it without admitting it was strange and convoluted. I even created a logo for it that I included on documents I gave to the teachers. I made jokes about it in e-mail communication. I was also brutally honest. “This is what happens when the district tries to shoehorn more than 8 tests into 8 days. You might argue it’s the counterpoint to the miracle of Hanukkah.” Eventually, my humor got me in mild trouble. It just so happened that the husband of the district’s head of data and accountability worked as the chief security officer on our campus, and he received the mass e-mails I sent to the staff. He would forward said e-mails to his wife. But rather than contact me herself, she asked the district testing coordinator (DTC) to call and badger me about my humor. It so happened that the DTC’s personality was quite dramatic, immediately dialing any little issue up to 11 (shout-out to Nigel Tufnel). So I was told, “Senior district administrators are reading your e-mails, and they are not amused. I guess you’re trying to be funny, but to them it sounds like you’re pitting your campus in opposition to the district.” Fine, whatever. I expressed regret to her that she was being asked to deal with it, but the reality was that the district was pitting itself against campuses by inundating us with local assessments literally layered on top of state assessments. I make no apologies for calling that out, nor would I apologize for fostering some empathy and camaraderie with my campus colleagues through humor that they genuinely enjoyed. And you know what?
In Fall 2025, Grand Prairie ISD nixed those plans, removed “Q2 Summative Assessments” from the calendar entirely, and instead created “Fall District Assessments” on a different week in November. Granted, they still managed to make it awkward (not worth outlining how here), but they changed their approach to local assessments. Maybe the actual message behind Testivus, and my humor, resonated with someone, after all.

So if you’re winding your way through a similar situation, navigating the nonsense passed down from someone who doesn’t realize the negative impact on your students and teachers, I wish you a Happy Testivus. Hang in there, do the best you can under the circumstances, and most of all, laugh about the laughable. Go ahead and roll your eyes at the outlandish. Give yourself permission to have fun with it instead of getting upset. And most of all, realize that none of these things…zero, nada, nenhum, nüt…will matter all that much in the grand scheme of things. Treat your students and your colleagues with dignity, empathy, and kindness in this most blessed of holiday seasons, and let the absurd wallow in its absurdity.

The Illusion of Learning: State and Local Assessment

I admit that I have experienced a fair amount of self-loathing for the role I held in education the last 12 years of my career. The first 19 years, when I was teaching, were the most fulfilling. I got to work with some fantastic students. Even the unremarkable students were teenagers, after all, and I like to think that they all figured things out over time and became contributing members of society. I don’t remember having any students who I thought were actually terrible humans. I had some truly gratifying moments in my teaching years, because instruction – the teacher-student rapport that you build over a semester, a year, or even multiple years in some courses – is the literal backbone of education. To me, it’s sacred. As I moved into campus-level assessment, I actually experienced the best of both worlds. For 5 years, I still got to teach one class per day, plus I had the pleasure of supporting my colleagues as we navigated the nonsense of state & local assessment. And make no mistake: it is largely nonsense. These tests provide an illusion that we’re tracking student learning, but mostly we’re just adding a bunch of extraneous activities that intrude upon actual instruction and slowly drive teachers insane. My focal point on the administrative side of things was simple – my job is to help you keep your sanity.

I maintained that same mentality throughout the final 12 years of my career after I moved 100% into the realm of assessment. Whether we’re talking about the old TAAS or TAKS system, the current STAAR, or whatever TEA concocts in the future, the state assessment system in Texas is basically insane. We’re talking about a system where we give your son or daughter a single test each spring that is longer than anything they’ve encountered in an authentic classroom setting (also longer than nationally accepted standardized tests), from which we intend to measure whether that child has “learned” the content based on our arbitrary scale, and from which we also intend to determine whether that child has made adequate “progress” from the previous grade level. In multiple subjects. Then we’ll do it all again a year from now, even though the curriculum for those courses may be vastly different. Wow, for the system to work, that had better be one incredibly sophisticated set of multiple choice questions.

“Oh, but it’s not all multiple-choice anymore.” Yeah, sure. You can add in the choose-all-that-apply items, drag-and-drop, short and long “constructed response” items, but that doesn’t really make the test comprehensive. Those item types are ultimately window-dressing designed to suppress the notion that a given student has a probability of getting 20-25% correct simply by guessing. Any teacher can tell you a truly sophisticated gauge of student progress would track it class-by-class, if possible, on an authentic level based on the content. Instant, regular, consistent feedback is the most reliable. But fewer, infrequent tests with a greater number of items on each are always less valuable in tracking student learning and facilitating better instruction. Always. Regardless of the item types. While nationally accepted standardized tests, like SAT and ACT, are infrequent and extended, they are intended to capture a snapshot of a given student’s academic readiness for college, and schools consider them as part of a broader picture of the student’s profile because they know the tests aren’t perfect. No university in the world puts all of the proverbial student eggs in the testing basket.

And yet, somehow we do exactly this in the United States from grade 3 through high school. The current system has the federal government requiring states to use high-stakes, flawed assessments to answer the question, “Is our children learning?” Agencies like TEA spend millions consulting with testing firms to create these tests, which are then used for all the purposes already mentioned, and whose results then determine the majority of each campus’ and district’s accountability rating for the year (it will take a series of posts or podcasts to deconstruct what a mess the accountability system in Texas is). Now, I realize that certain statisticians or psychometricians may argue for the validity and reliability of STAAR, but this post isn’t arguing those issues, nor am I exploring the notion that the tests are inherently biased against certain demographic groups. For me, the bottom line is that the system as presently constituted is, on its face, detrimental to students and the teachers attempting to educate them because the very notion of an annual test for children simply cannot be considered the end-all-be-all in determining whether they are learning, whether they are making sufficient academic progress, or whether the campus and its teachers are meeting any practical standard of performance.

Yet here we are. And the fun REALLY begins when district administrators get a hold of some data points, develop an addiction to buzzwords like “data-driven instruction” and “rigor,” and decide they want more, More, MORE in the name of determining whether students are performing throughout the school year. Enter Local Assessment, a veritable obsession for many districts (my last district included) as they embark on a quest for – let’s be honest here – some kind of predictor of their accountability for the current year. Sure, student learning is an objective, but the real goal is the score and letter grade we can trumpet in board meetings, news releases, and social media. And it’s created an entire cottage industry: the “benchmark test” that attempts to imitate STAAR in content, format, and difficulty so that students can literally practice…testing. Not necessarily the skills embedded in the coursework, and certainly not skills that might work across different courses. Nope. Testing practice. Texas law currently prohibits districts from administering more than two benchmark tests in a given school year. (And, voila!, TEA developed its own “Interim Assessments” – two per year – whose sole purpose is to predict a student’s STAAR performance for the year. More on that sham another time.) But hey, that’s OK – your local school district will simply purchase and/or develop a series of smaller tests and call them something besides “benchmarks” – curriculum assessments (CAs), curriculum-based assessments (CBAs), quarterly assessments (QAs), insert your own title and initials here – these tests are all specifically designed to circumvent Texas law on benchmark testing. Sometimes these tests are actually quite short; other times they may require “shutdown testing,” as my last district called it, so a good portion of an instructional day (or maybe all of it, depending on what class you teach) is burned away.
Short or long, this testing is administered outside the normal testing that occurs in a classroom, meaning that instructional time is interrupted simply for the sake of district-level testing and data. Teachers and students become pawns for the central office bean-counters.

“That should be acceptable if the data is used to inform teachers about student performance and improve instruction.” Absolutely! As the great Kenan Thompson once said, “I mean…it should be.” And sometimes, it happens. SOMETIMES. When I worked under Dr. Teresa Stegall (see last week’s post), we operated under a mantra where “assessment should inform instruction.” But too often, the data is altogether ignored at the campus level, or worse, it’s used in a punitive fashion. Teachers are punished because of their students’ performance. Principals are called into meetings with lofty names like “Cadence of Accountability,” where they have to present their data on the most recent CBAs, defend their numbers, and lay out a plan of action if those numbers fall below expectations. Often, such meetings are incredibly adversarial, where central administrators are almost hostile toward campus principals. I know this because I used to support these principals as an assessment & accountability coordinator, either preparing them for an upcoming meeting or assisting them in the aftermath. The stories could be brutal and actually changed my perception of certain central administrators. The process often seemed like the old joke about “the beatings will continue until morale improves.” No productive or supportive environment, but plenty of accusations and ridicule to go around.

Do you really think, after suffering through such adversarial garbage, principals then go back to their teachers full of energy and support? Maybe the most noble ones do. But more likely, the message and tone received from central office is passed to teachers through badgering and negativity. Then we wonder why morale is down and teachers look to escape to other districts, or out of the profession entirely. But do we ever consider that “maybe this isn’t working?” Heck, no. “It’s what we’ve always done.” (At least for the past two decades.) This is the cycle that high-stakes assessment has begotten. And even as TEA, or the legislature, or the federal government, promise reforms and simplification, what I like to call the “testing industrial complex” (shout-out to the great President Eisenhower) continues to churn and roll along. And no one will have the actual courage to step up and admit that it’s harmful to students, that more and more testing literally crowds out time for teacher-student rapport, for teacher-teacher collaboration, for…you know, instruction. The politicians at the national, state, and local level would rather point to incremental gains that might be illusory and call them “miracles.” And the companies profiting from the system will be happy with that.

This whole sham lies at the core of whatever self-loathing I’ve experienced for the past dozen years. Yes, I tried to rise above the fray. I used hashtags like #respectinstructionaltime when communicating with teachers. I intentionally used humor to establish rapport with staff, letting them know that, as a so-called assessment professional, I understood how the proverbial “necessary evil” of testing was soul-crushing for them because it was sucking away time from the literal reasons they got into the profession. I even got into minor trouble at times for my humor (yet another story for another time), but I make no apologies, because my job was to help teachers, or principals, or fellow administrators, maintain their sanity. I stick by that. But I also stick by the belief that someone in a position of true authority needs the courage to stop this insanity. In the meantime, I ultimately decided that the grind was a bit too much, and not worth having it crowd out the time I wanted for other things in my life. No apologies there, either. And no apologies for using my voice to call out the system as the illusion that it is now that I am no longer constrained by it.