Stupid Grading Tricks

A bunch of people have been mailing me links to an article from USA Today
about schools and grading systems. I think that most of the people who’ve
been sending it to me want me to flame it as a silly idea; but I’m not going to do that. Instead, I’m going to focus on an issue of presentation. What they’re talking about could be a good idea, or it could be a bad idea – but because the
way that they present it leaves out crucial information, it’s not possible to meaningfully judge the soundness of the concept.

This is very typical of the kind of rubbish we constantly see in
the popular press. They’re so clueless about the math underlying what they’re
talking about that they don’t even know when they’re leaving something crucial out.

The article focuses on how you record failing grades in a percentage-based
system: If 91-100 is an A, 81-90 is a B, etc. – then what should you record for an F? The article talks about a movement in schools to record Fs as 50s, rather than 0s.

What they left out is: what is it that’s actually being recorded?

When you take a test, what’s generally recorded is your actual score on
the test. So if you got 74% on a test, what would be recorded is the “74”, not
“C”. On the other hand, many things are graded not on a percentage basis, but on
a judgement of the appropriate grade on a 5-point letter-grade scale, so that the full information about how you did on an assignment or test is “C”.

In the former case, changing people’s scores is thoroughly unfair. If student
1 failed – but failed by just a hair, with a 50%, and student 2 failed terribly
with an 18% score, then student 1 should have an easier time raising his/her grade by doing well on other work than student 2.

In the latter case – that is, the case where you were given a grade
based on a five-point letter system, where a “D” was recorded as 60 and an F was recorded as 0 – the result is very unfair, because the scale is profoundly unbalanced. What would be a one-percentage-point difference in a percentage-based score is translated into a 60-point difference by the conversion from letter to percentage.

Look at a simple example. Suppose you have a student who turns in five
writing assignments, which are graded on a 5 point scale. The student gets
F, B, B, B, C. If you score those as 50, 90, 90, 90, 80, then the average is
80, which would convert back to a C. If you score them as
0, 90, 90, 90, 80, then the average is 70, which converts back to a D.
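
To make the arithmetic easy to check, here is a minimal Python sketch of that example. The conversion table and the helper names (`average_converted`, `to_letter`) are my own illustration, not anything from the article; B=90 and C=80 come from the example above, while A=100 and D=70 are assumptions that simply continue the pattern.

```python
def average_converted(letters, f_value):
    """Average letter grades after converting each to a percentage.

    B=90 and C=80 follow the example in the text; A=100 and D=70 are
    assumed for illustration. f_value is what an F gets recorded as.
    """
    points = {"A": 100, "B": 90, "C": 80, "D": 70, "F": f_value}
    scores = [points[letter] for letter in letters]
    return sum(scores) / len(scores)


def to_letter(percent):
    """Convert back using 91-100 = A, 81-90 = B, 71-80 = C, 61-70 = D."""
    for letter, lower_bound in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if percent > lower_bound:
            return letter
    return "F"


grades = ["F", "B", "B", "B", "C"]
avg_f_as_50 = average_converted(grades, 50)  # 400 / 5
avg_f_as_0 = average_converted(grades, 0)    # 350 / 5

print(avg_f_as_50, to_letter(avg_f_as_50))
print(avg_f_as_0, to_letter(avg_f_as_0))
```

The only thing that changes between the two runs is the number recorded for the single F, and that alone moves the final letter by a full grade.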

It’s really not fair – because what you’ve done is created a tight cluster of
scores for passing grades, and then a comparatively huge gap between a minimum passing grade, and a maximum failing grade. Expressed that way, you can see what the real problem is. If you use a bad conversion from five-point scoring to percentage, you get unfair results. The real root cause of the problem is that
the way that the grades are produced, and the way that they’re recorded or averaged are very different.

If that’s what people are talking about, then I’m absolutely on the side of
the people who want to change the grading system.

On the other hand, some of the text in the article makes it sound like
they’re not going to distinguish between real percentage grades, and converted
letter-grades. The correct solution is to pick a consistent grading system. If you want to use percentages as the fundamental grading system, then use percentage-based grading – and have the teachers assign the full ranges of scores
when they’re doing a subjective grading, so that a percentage score from a test
graded by right/wrong answers, and a subjective score from an essay test are
equivalent.

Just bumping failing scores on subjective grading to a 50% is a lousy
solution. But it’s a less lousy solution than using 0.

But as I said, the article does a totally lousy job of talking about
this. It keeps talking about the difference in difficulty of raising a
failing grade when it’s scored as a 0 – without ever getting into the real
issue, which is the lousiness of the conversion system. For example, they
include the following from an opponent of the change:

But opponents say the larger gap between D and F exists because passing requires a minimum competency of understanding at least 60% of the material. Handing out more credit than a student has earned is grade inflation, says Ed Fields, founder of HotChalk.com, a site for teachers and parents: “I certainly don’t want to teach my children that no effort is going to get them half the way there.”

If a student really got a 0, then I’d agree with Mr. Fields. It shouldn’t be artificially boosted to 50%. On the other hand, what the story never
addresses is the fact that a student who really got 50% could be
treated by the grading system as if they got 0%.

They do briefly mention the fact that A, B, C, and D are broken
down in 10-point increments from 60 to 100, but F is separated by a much wider gap. But there are two problems with the way that they refer to that. First,
if you’ve got 4 letter grades separated by 10-point increments, they can’t cover the range from 60 to 100. There are 5 10-point
increments in that range: 60, 70, 80, 90, 100. And second, the issue isn’t really
that there’s a wide gap between 60 and 0; the fundamental problem is that the
process of converting from 5-point scores to percentages is broken because it
artificially creates that 59 point gap. The gap is an artifact of the
conversion process – and it can unfairly penalize students.

I think it would be a thoroughly unfair, foolish, even disgraceful exercise in
grade inflation to turn all Fs into 50%s – ignoring the distinction between 10%
quality work and 50% quality work. But I also think it’s an unfair, foolish
idea to round it the other way, and turn all Fs into 0%s, ignoring the distinction
between 10% quality work and 50% quality work. Why is the idea of grading everything the same way – all percentage based – so unacceptable?

0 thoughts on “Stupid Grading Tricks”

  1. Blake Stacey

    From at least seventh grade onwards, every evaluation I had was a percentage score, a fraction of points earned over points possible. I never did figure out the point of converting these numbers to letter grades. My guess is that there is no good reason to do so, and the operation is done out of institutional inertia.

  2. Eric

    To the opponent:
    “I certainly don’t want to teach my children that no effort is going to get them half the way there.”
    But you do want to teach them that if they can only do part of the job, they might as well just sit on their butt and do nothing?
    To Mark:
    There are 5 10-point increments in that range: 60, 70, 80, 90, 100.
    I don’t understand your quibble here, because there’s really four 10-point ranges there, which is how it’s usually divided (60-69,70-79,80-89,90-100). If you wanted to actually make it five 10-point ranges, you’d have to divide it up as 91-100, 81-90, 71-80, 61-70, and 51-60, which extends past 60.

  3. mike

    The way this was presented to our teaching staff is that any assignment would have a minimum grade of 50%. So a score of 50% could mean that a student was able to correctly answer half the questions in an assignment, did very poorly on the assignment, or did not do the assignment at all! In a school that has adopted this type of scale, doing absolutely nothing earns you a 50%. There was enough of an uproar that the idea quietly went away.

  4. Daryl McCullough

    Averaging percentages makes no sense, either. Here’s an extreme example: A true/false test. If you score 50% on a true/false test, then that means you know nothing. A chimpanzee can be expected to do as well.

  5. Drekab

    What does fairness have to do with teaching? It’s not like a kid with a failing grade is gonna take the valedictorian speech away from a more deserving kid or something. There’s a big difference in the way to treat students in the top or bottom of a class. At the top of the class it’s important to have a more precise grading scale to give more accurate feedback. But at the bottom of the class, all you can do is scream silently to yourself and wish the kid would just try a little.
    I fail to see a significant difference between a kid who gets 18% and one who gets 50%. It’s still ‘slap yourself on the forehead, why won’t this kid give a crap’ bad. The only thing you want from someone with that low of a score is some indication that they’re going to try harder on the next assignment. If giving him a 50 helps that goal, who cares how fair it is?

    1. Meme

      I teach an intervention benchmark class, which means at least 90 percent of my students are not reading on-level and not able to write on-level. When we’re working through assignments I cannot modify to their level because it is a standard I am required to cover according to Common Core, these students typically try harder than my 10 percent and often fall below that fifty percent mark. They ask for extra practice, ask questions, participate actively, stay after school, come at lunch, offer to redo subpar assignments.
      Out of fairness (which, by the way, is required by law) for the students that mastered the material, I will not modify their grade for anyone, and if administration asked/told me to change the true grade, I would call into question their ethics; also, assessments and the grades assigned to them are just as much for the teacher as the student, if not more so.
      Where did I fail this student with the 18 percent? They’re not having trouble with one aspect of the assignment, but the whole thing with a grade like that. Contrariwise, a 50 percent indicates at least half of the skills or assignment was understood. These grades mean the difference between a total re-teach and a review of key skills. These assessments and the assigned grades are what we take to administration and special ed when we suspect there is a problem developmentally. An 18 percent presents a much bigger problem than a 50 percent.
      And let the record show, my kids with the 18 percent WANT to know how to improve, to know what they did wrong, and need very precise grading, whereas, I’ve found that my advanced kids check the grade and put it away so long as it meets their standard and only bother to give it attention when it might impact their precious GPA–and of course, I’ve graded it wrong, or so they’ll say.

  6. Daryl McCullough

    In my opinion, it doesn’t really convey much information to give a numerical grade or a letter grade, except in the two extremes: If you have the highest possible grade, then you probably understand the material pretty well. If you have the lowest possible grade, then you probably don’t understand it. For people in the middle, the grade doesn’t really reflect anything concrete.
    What I would much prefer, even though this is probably more work than teachers are willing to do, is to organize the material into tiny little “microtopics” and then for each microtopic, use pass/fail. Pass means that the student has mastered the microtopic, fail means that he or she hasn’t. Mastery means that the student is basically perfect in that microtopic. Yes, it’s unreasonable to say that someone has mastered a broad topic, such as “mathematics” or “history”, but it’s perfectly reasonable to say that he has mastered a tiny topic, such as “multiplying two-digit numbers” or “solving linear equations in one variable”.
    In mathematics especially, it makes no sense to move on to advanced topics when one hasn’t mastered the pre-requisites.

  7. Mark C. Chu-Carroll

    Eric:
    My quibble about the “10-point ranges” is that it’s another demonstration of the fact that the reporter doesn’t understand the problem.
    If you’re converting from letter grades to numbers, then you get to record a single number for each grade. If a D is a 60, then what’s an A? If a D is a 70, then why say
    that there are 59 points between a D and an F?
    If you’re recording percentages – so that you’re not converting from the letter to the number, then why would you record a zero for an F if a student got 58%?
    If the “f=0” thing is a real problem, then either people are discarding information about the student’s real performance – which is unacceptable – or there’s a stupid conversion from letter to number. And if you’re converting letter-to-number, where each letter corresponds to a particular percentage, then 60-100 isn’t four scores separated by 10, but 5.
    The real problem is that there’s a problem in the conversion – but the reporter just doesn’t get that.

  8. Mark C. Chu-Carroll

    Drekab:
    The issue about fairness is about rewarding students for what they do.
    If you have a student who’s really trying, and they get a 58% on a test; and you have a student who doesn’t bother to even write answers to any of the questions, I think you should want to make it easier for the student who’s really trying to bring their average up to a passing grade. If you have a student who, across a term, gets scores like 50%, 55%, 65%, 75%, 80%, that student should pass. But if you’ve got a student whose scores are 0%, 75%, 10%, 75%, 65%, then that student really deserves to fail. If you boost all failing scores to 50%, then the two students are in very similar situations, where they can both likely end up passing; if you drop the Fs to 0s, then both students are likely to fail. But the two students are very different: one is trying and improving, and the other just doesn’t give a shit.
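
The two hypothetical students above can be run through all three recording policies in a few lines of Python. This is only a sketch: the pass mark of 60 is taken from the threshold quoted in the article, and the policy function names are mine.

```python
PASSING = 60  # assumed pass mark, from the 60% threshold quoted in the article


def mean(scores):
    return sum(scores) / len(scores)


def raw(scores):
    """Record exactly what the student earned."""
    return list(scores)


def floor_at_50(scores):
    """Bump every score below 50 up to 50."""
    return [max(score, 50) for score in scores]


def fails_to_zero(scores):
    """Record every failing score as 0."""
    return [0 if score < PASSING else score for score in scores]


improving = [50, 55, 65, 75, 80]  # the student who is trying and improving
erratic = [0, 75, 10, 75, 65]     # the student who skips work entirely

results = {
    policy.__name__: (mean(policy(improving)), mean(policy(erratic)))
    for policy in (raw, floor_at_50, fails_to_zero)
}

for name, (avg_improving, avg_erratic) in results.items():
    print(f"{name:14s} improving={avg_improving:5.1f} erratic={avg_erratic:5.1f}")
```

With the raw scores the two averages are 20 points apart, on opposite sides of the pass mark. Under either rounding policy they land within a couple of points of each other, which is the lost distinction the comment is describing.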

  9. Mark C. Chu-Carroll

    Daryl:
    I disagree with you – I think that grades are quite informative, even when they’re not at the extremes.
    A couple of examples from my own personal experience:
    – When I took my first writing course in college, I was a terrible writer. My first paper in the course got a D – and that may have been generous. During that semester, I worked my butt off trying to improve, and the grades demonstrated that. Moving from a D to a C reflected the fact that my writing had improved; and moving from the C to a B reflected even more improvement. I never got past the B – but my writing was dramatically better than it was at the beginning of the semester.
    – When I’ve taught math, I’ve had students who can look at any problem, and whip up a perfect answer with a clear and elegant proof. I’ve also had students who can get a proof mostly correct, but with a few minor errors. And I’ve had students who just can’t solve a simple problem, much less write a proof. The first student, who can solve and prove their solution in an elegant way, deserves a 100%. The second student, who makes small mistakes, doesn’t deserve the same grade as the student who did a perfect job – but they deserve something good for getting it mostly right – generally in the 85% range. The student who can’t get the right answer, but shows that they at least understand how to attempt to solve it deserves something for their effort – perhaps 60% if they get the structure and approach, but can’t finish it to get to the answer. And the student who has absolutely no clue of what the hell they should do at all deserves less than the student who got the structure and approach but not the answer.

  10. Mark C. Chu-Carroll

    Blake:
    In my experience, both in high school (Bridgewater, NJ) and as an undergrad in college (Rutgers), some classes graded on a percentage scale, and some graded with letters. If you have the mixture, then there’s a problem, because you want to be able to compare academic performance between students with different courses, and so you need to universalize the grading system.
    I agree that it’s a broken system, and there really isn’t any good way to convert between the two. The percentage grading carries more information than the letter grading. So converting percentage grades to universal letter-grading requires losing information; and converting from letter grades to universal percentage requires inserting arbitrary information that isn’t present in the grade.

  11. Scott Simmons

    “Averaging percentages makes no sense, either. Here’s an extreme example: A true/false test. If you score 50% on a true/false test, then that means you know nothing. A chimpanzee can be expected to do as well.”
    Last time I gave a chimp a true/false test, he ate it. 🙂
    “My quibble about the “10-point ranges” is that it’s another demonstration of the fact that the reporter doesn’t understand the problem.”
    I agree that the reporter seems to be very hazy about what’s going on … But from the examples of these policies listed, I think that what’s being done in these districts is the first option you have listed in your last paragraph of the blog entry–the floor % score for F’s is being set at 50%. Not necessarily for individual assignments, but at least at the overall average level. My sons are in Arlington ISD, neighboring the Dallas district that’s implemented this, and I know that on their report cards, they don’t get letter grades, but the numeric average for the class. (There’s a translation to the numeric grade at the bottom, but what’s tracked from grading period to grading period is the number.)
    Here’s the argument in favor of ‘flooring’ the averages Dallas’ way: Student X has to pass math (i.e. have a numeric average for the school year of >= 60%) to get promoted to the next grade. At the mid-year break, his report card shows an average of 15%. Breaking out his calculator, without which he obviously has no chance of solving this hard a problem, he figures out that he needs to average 105% in math for the rest of the year in order to pass. Since this obviously isn’t going to happen, he spends the rest of the year just goofing off–not just in math, but in all of his classes. (This ain’t college–if he doesn’t get promoted, he doesn’t just repeat the class he failed, he repeats the whole year.)
    Meanwhile, his compatriot in Dallas, Student Y, is doing just as poorly in math, but his mid-year report card shows a 50%. If he can just bump his math work up to a C level, he’ll squeak by & go on to the next grade, assuming he can pass the TAKS test.
    Now, Student Y’s chance of being promoted is actually not materially different from Student X’s; it’s roughly equivalent to their chances of winning the state lottery when Y has a ticket and X doesn’t. The difference is, Y has some hope, and might at least try to succeed.
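
The "needs 105%" figure in this comment is easy to reproduce. The sketch below assumes the two halves of the year are weighted equally, which the comment implies but does not state; `required_second_half` is a hypothetical helper name.

```python
def required_second_half(first_half_avg, target=60.0):
    """Second-half average needed to reach `target` for the whole year.

    Assumes both halves count equally, so that
    (first_half_avg + second_half_avg) / 2 == target.
    """
    return 2 * target - first_half_avg


student_x = required_second_half(15)  # report card shows the real 15%
student_y = required_second_half(50)  # report card floored at 50%

print(student_x, student_y)
```

Student X would need more than a perfect score, while Student Y needs a 70, which is the "C level" work the comment mentions.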

  12. Greg Laden

    There seems to be an unquestioned assumption here that I’d like to bring up. That is, an assessment for a student who learned nothing, or would earn nothing (ideally similar concepts) yields a zero as a numerical value in the assessment. However, this is often not the case.
    A five point multiple choice administered to a chimpanzee might yield 20%. In other words, 20% = zero.
    But people know more than ‘random’ and they can make guesses, etc. In other words, a typical human taking a test before taking the class (in an area with which s/he is not familiar) will not get a 20%, but rather, something closer to 37-43%. So, zero is around 40%.
    Even in subjective exams a person is unlikely to get ‘absolute zero.’ for a number of reasons.
    Sure, there are (potentially) objective or subjective assessments that can result in zero. Also, failure to take the test or to turn in an assignment can often be zero (even though maybe it should not be! Maybe it should be the random expected outcome …. the chimpanzee value …!) but in most cases, zero is not zero.

  13. bill r

    This would be a nice basis for a series of posts on the confusion of ordinal, cardinal, and real numbers, leading into semiorders, interval orders and partial orders.
    If the purpose is to order students, why not consider direct measures of dominance Pr(X>Y), or a score that translates into dominance?

  14. Grieve

    @Daryl McCullough:
    I had never thought of a true/false test in those terms before, and seeing it in that light makes me realize what an awful tool they are to test knowledge (at least for someone who doesn’t know any of the material being tested). It seems that an essay test is the way to go, but those are hard to get graded by scantron. 🙂

  15. John Armstrong

    This is exactly why I don’t (didn’t) assign letter grades in the middle of the semester. Letter grades should only attach at the very end of a course, to minimize rounding errors.

  16. Carlie

    And don’t forget about the places that add +/- to the mix, too. Strangely, you can get a D+ at my school but not a D-, and an F is an F. (Sometimes I’d like to have an F- that awards negative GPA points.)
    Isn’t converting from a letter grade to a percentage grade just like trying to add in extra significant digits after the fact? It’s entirely fake data, even at the top end. An 81 IS a different grade than an 89. That said, I do like having letters to convert a percentage grade to at the end, because it does allow for a little rounding in case of a few badly worded questions here and there, losing a bit of time at the end of a test, etc.

  17. Jim Thomrson

    When I took freshman English composition, I was in a special section for high test scoring people. First theme was an F, second a D-. Prof called me in, said I could manage it. If I could make a B on the last theme he would give me a B in the course which would exempt me from the second semester. Grades ran D, D+, C-, C, C+, B-, B. I knew grammar very well, but hated my 6th grade spelling teacher, and wrote illegibly. I figured out how to get help on the two latter problems. Got my B and here we are today.
    It is strange, but in my 32 year career as a professor, I have never had a student miss the next higher final grade by only one point.

  18. Julie Stahlhut

    Converting letter grades back to numerical grades reminds me of the old story about how to weigh a hog: Look at the hog, pick a big stone you think might be about the same weight, and then guess the weight of the stone.

  19. Grackle

    Given those 10-point spreads for A through D, with everything between 0 and 60 an F, then it’s pretty obvious that the testing is mistakenly designed.
    If testing did an adequate job, then we would correctly identify the excellent students, the good ones, the fair ones, the poor ones, and the failing ones.
    If a student who fails to master the material could conceivably get three fifths of the answers correct, then those three fifths do nothing of any use. All of the passing students should get them right, so 60% of the test is wasted making the passing students do busywork. Thus most of the test’s discrimination is to sort out degrees of failing. There is no point having the ‘dumb’ questions on the test in the first place.
    Develop test questions that will distinguish the excellent from the good, the good from the fair, the fair from the poor, and the passing from the failing.
    What we hope to achieve is excellent performance from the entire class. If we are performing badly, we should not corrupt our testing and scoring to conceal our problems.
    (I used to be an instructor at a red-brick university. I had many students compliment me on my exams, telling me they knew they couldn’t bullshit their way through them. Developing the exam questions was a real bear, but the work had to be done.)

  20. Mark C. Chu-Carroll

    Greg:
    You’re absolutely right about the baseline issue.
    I despise multiple choice exams for exactly that reason: they’ve got a hidden baseline. When I taught classes, I always tried to write exams where it was actually possible to get a zero if you really didn’t know the material. I hate hidden baselines.

  21. Stephen

    At my engineering school, most professors attempted to teach and test competence. In one course, the prof passed out a syllabus on the first day which included when the 3 tests would be, what they’d cover, and all homework for the course. The combined homework would count as much as a test. The lowest of the 4 grades would be dropped. You needed a 60% to pass the course – pass/fail. The whole school was pass/fail. No GPA. Solves that one, anyway.
    I went home and discovered that i only had time to do half of the homework. I suppose i could have cheated. But, half of the homework was enough to learn the material. I passed in the first assignment, and got a 50%. This is a failing grade, but at least i’d gotten the representative problems i’d done correct. There were cross checks, so i knew that already. I did half of each of the rest of the assignments, but did not hand them in – they would all be failing marks, which was pointless. The last homework assignment wasn’t covered by the last test. I did half of it, and didn’t hand it in. I passed the course.
    The same prof taught the following course, and on the first day, he handed out an exam. There were four problems, and one was from that last homework assignment. I got a 97%. The next highest mark in the class was 67%. Clearly, of 110 students, i was the only one who did anything on that last homework set.
    The prof was attempting to test for competence. I was attempting to achieve competence. My main complaint was that twice as much homework was assigned as was needed. It was a difficult class. Most students failed it. One way to improve it would be to split the two courses into three.
    This prof didn’t want to fail everyone. He changed the passing grade twice – making it easier each time. He had tons of open office hours. Videos of lectures. Past problems as examples completed. Etc.
    Failure was part of school. At the start, i was told that only one in three would graduate. It turned out to be one in four. And these kids were really smart. Many smarter than me didn’t make it.
    IMO, the only way to test for competence is the Turing Test. If you’re competent, you interview each candidate. Nothing less will do. It’s considered impractical, except for very small classes. Even there, it’s rarely done. But as a parent, i can do it for my kid.

  22. itchy

    Meanwhile, his compatriot in Dallas, Student Y, is doing just as poorly in math, but his mid-year report card shows a 50%. If he can just bump his math work up to a C level, he’ll squeak by & go on to the next grade, assuming he can pass the TAKS test.
    Now, Student Y’s chance of being promoted is actually not materially different from Student X’s; it’s roughly equivalent to their chances of winning the state lottery when Y has a ticket and X doesn’t. The difference is, Y has some hope, and might at least try to succeed.

    So the moral is: Trick Student Y into thinking he has a chance to pass, causing him to put more effort into the second half of the semester — then flunk him anyway.
    Because, seriously, kid, did you really think we were going to grade your horrible first-half work as 50%? That just proves how dumb you are!
    Really, the problem with Scott’s perspective is that, while it may encourage the student who truly only got 25% of his answers correct to continue trying, it encourages a student like me, who might have gotten 60% in the first half, to only bother getting 20% or so, since I know I’m gonna get an artificial bump anyway. I mean, it’s like matching funds in your 401(k). Why not take advantage?

  23. Robert E. Harris

    The parts of teaching (or trying to teach) that I miss least are trying to write exam questions that have relevance to the subject and can be answered by most students, and then grading the papers, and at the end of the semester converting all the numerical grades into letter grades. I never was stuck with any requirement of a 100-90-80-70-60 sort of grading system, and most of the time A grades ran nearer 80% than 90%. Of course, with more effort and better judgment in writing exams I probably could have made my grades come out close to a 100-90-80-70-60 percentage scale. I have been retired for 8 years, and I do not miss any part of the grading problems.
    The problems with trying to force grades into a rigid percentage scheme are well discussed above. But if letter grades are used during a semester, why not average them using a GPA system? Then A(4), B(3), D(1) on three papers or exams gives 8/3 = 2.66… which works out to about B- or say 82% if we have to do this.

  24. miller

    Something about the whole “controversy”, if it truly exists, seems rather silly to me. First of all, I’ve got to agree that USA Today is awfully vague. I could be reading it incorrectly, but I gather that the issue is about how to average letter grades after the percentage-data has been thrown out.
    If that’s the case, then assigning A, B, C, D, and F the numbers 100, 90, 80, 70, and 60 is mathematically equivalent to assigning them the numbers 4, 3, 2, 1, 0 (which is what’s done with GPA). The whole “no effort gets you half way” is a stupid, stupid non-issue.
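
One way to read "mathematically equivalent" here is that the two scales differ only by a linear rescaling, percent = 60 + 10 × GPA points, so averaging on either scale gives the same answer once converted. The sketch below checks that for one set of grades; the table and function names are illustrative, and it says nothing about which pass mark a school then applies to the result.

```python
from math import isclose

GPA_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
PERCENT = {"A": 100, "B": 90, "C": 80, "D": 70, "F": 60}


def mean(values):
    return sum(values) / len(values)


def gpa_to_percent(points):
    """Linear map between the two scales: 0 -> 60, 4 -> 100."""
    return 60 + 10 * points


# The map agrees with the percentage table on every letter.
scales_match = all(
    gpa_to_percent(GPA_POINTS[letter]) == PERCENT[letter] for letter in GPA_POINTS
)

letters = ["F", "B", "B", "B", "C"]
gpa_average = mean([GPA_POINTS[letter] for letter in letters])
percent_average = mean([PERCENT[letter] for letter in letters])

# Averaging first and converting second gives the same number
# as converting first and averaging second.
averages_agree = isclose(gpa_to_percent(gpa_average), percent_average)

print(scales_match, gpa_average, percent_average, averages_agree)
```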
    In any case, there’s no real reason to throw out percentage grades and use letter grades to do averaging, unless you’re comparing different classes which have different grading distributions and scales. If all teachers are forced to use the same grading scale (as seems to be the case here), there’s just no point, except to reduce data quality.

  25. Matthias

    The German scoring system is somewhat different.
    a) in schools we have grades of 1 through 6, where 1 is best and 4 “just passed”. All of these are usually averaged over. The conversion of percentages to grades is somewhat arbitrary at times.
    b) in universities we have grades 1 through 5, where 1 is best and 5 is failure. Normally the failed exams are NOT included in the average, instead you are required to pass a certain number of exams (all of them w/ the new bachelor/master system, but there is always a second try possible).

  26. Matthias

    Addition to last post: that’s the short of it. It’s a little more complicated in detail, as education is state-driven, so we have 16 different schooling systems and (transitionally) 32 sets of university rules, depending on where you study and when you started.

  27. Jérôme ^

    The more I read you, the less I understand the use of the ABCDEF grading system. Here in France we use (a variant of) percentage grading: the students simply have a score out of 20 points, and they need to have at least 10 to pass their exam. (This may vary depending on exams, sometimes the total is not 20, or the required average is higher than 10, but reducing to 20 points is the first thing anybody does).
    For one thing, this makes it ridiculously easy to calculate weighted averages (even a majority of math students understands how to do it! 😉)

  28. jl

    @Daryl McCullough:
    I had never thought of a true/false test in those terms before, and seeing it in that light makes me realize what an awful tool they are to test knowledge

    He is wrong. Guesswork does not automatically get 50% on a true/false test at all. This is basic probability.
    The probability of x successes in n trials is $\binom{n}{x} p^x q^{n-x}$, where p and q are the probabilities of success and failure respectively.
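
The binomial formula quoted above can be evaluated directly with Python's standard library. This sketch assumes a 10-question true/false test answered by pure guessing (p = 0.5); the test length and the function names are chosen purely for illustration.

```python
from math import comb


def guess_pmf(n, x, p=0.5):
    """Probability of exactly x correct answers out of n when guessing."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


def prob_at_least(n, k, p=0.5):
    """Probability of k or more correct answers out of n when guessing."""
    return sum(guess_pmf(n, x, p) for x in range(k, n + 1))


n_questions = 10
exactly_half = guess_pmf(n_questions, 5)         # exactly 50%
better_than_half = prob_at_least(n_questions, 6)  # 60% or more
expected_correct = sum(x * guess_pmf(n_questions, x) for x in range(n_questions + 1))

print(exactly_half, better_than_half, expected_correct)
```

On a test this short, a guesser lands on exactly 50% only about a quarter of the time, although the expected score is still 5 out of 10. So both comments have a point: 50% is the guessing baseline on average, but any single score can sit well above or below it.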

  29. Daithi

    I like the way Matthias described how Germany works. It is my understanding that the goal sought in the USA Today article was to find a means to allow a student to improve his course grade when he has failed an exam (or not completed the exam and received a 0). Germany’s solution of not including failed exams in the student’s average score, but also forcing a student to pass a minimum percentage of exams seems to address this problem. Although I suppose it is then possible for a student to fail an exam and get a better course grade than a student that didn’t fail any exams.

  30. Daryl McCullough

    Mark,
    Perhaps the letter grade gave you some useful feedback, but I’m thinking from the point of view of the next class that you take. How much does the letter grade mean to the teacher of that class? I claim not much. If you know that someone got a C in a mathematics course, it could mean that the student was careless and made a lot of mistakes, or it could mean that there were key concepts that he completely doesn’t understand. How you teach that student is vastly different in those two cases, and the difference is not reflected at all in the grade (whether numeric or letter grade).
    As far as the distinction between a student who can solve (most of) the problems and a student who can also explain the theory and/or proofs involved, to me those are two different “microtopics”. Give the one student a pass on one, and a fail on the other. Give the other student a pass on both.

  31. Ole

    I guess this would be a good time to describe the system that was used in Norway, and the other system which is used now.
    Up until 2003 we used a grading system from 6.0 to 1.0, where 1.0 was best, and anything from 4.0 up to 1.0 was a pass. 4.1 and below was a fail. This means that if you had 40 % right you’d get a 4.0 and just pass the exam. And it was nice to know that if you got a 4.5 you just missed passing, but if you got a 5.5 you were a long way behind. I think most people felt that this was a nice system, easily converted to per cent, and easy to explain to others.
    But then some bright person found out that we should have a grading system that was more international. Since 2003 we have had a system which grades ABCDEF. Yes, we got an E also. The F is pretty big, from 0 % to 39 %. E is about 40 % – 49 %, and I don’t remember the rest. I think A is 91 % – 100 %. Another point with the new grading was that instead of 40 different grades on the old (1.5, 2.6, etc) the new one only has 5 grades. This might make it easier to get an overview of the grades.
    But the new system didn’t make it more international. I don’t think there’s any country that has the same grading that we have. And the inclusion of an E creates confusion when we compare our grades to grades in the USA. For us a C is an average or better grade, and nothing to be ashamed of. In theory our A is actually better than an A in the US. (But I see that both are around 91 % – 100 %.)
    I like the way they do it in France.

  32. Daryl McCullough

    jl writes: Guesswork does not automatically get 50% on a true/false test at all. This is basic probability.
    Yes, it’s basic probability. Try plugging in actual numbers and see what your formulas tell you. The *expected* number correct is 50%.
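    As a minimal sketch of plugging numbers into that formula (the 20-question test and the function names are made up for illustration): the chance of landing on exactly 50% by guessing is only about 18%, but the expected number correct is still half the questions.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x correct out of n when each guess is right with probability p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


def expected_correct(n: int, p: float) -> float:
    """Expected number correct, summed directly over the binomial distribution."""
    return sum(x * binomial_pmf(x, n, p) for x in range(n + 1))


if __name__ == "__main__":
    n, p = 20, 0.5  # hypothetical true/false test answered by pure guessing
    print(expected_correct(n, p))   # expected number of correct answers
    print(binomial_pmf(10, n, p))   # chance of scoring exactly 10 out of 20
```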

  33. Mark C. Chu-Carroll

    Daryl:
    I think this comes down to a disagreement about the purpose of grades.
    I see grades as being fundamentally feedback to the student about how they’re doing. The main purpose of giving a grade isn’t for the current instructor to provide information to the next instructor, but for the instructor to provide information to the student.
    When I’ve taught, I’ve found students’ past grades to be utterly irrelevant. The things that I, as an instructor, am interested in about my students can’t be communicated by a single numeric value or letter grade. It doesn’t matter what grading system you use: as an instructor, past grades provide me with virtually no useful information about the student.
    On the other hand, as a student, seeing my grades on tests and assignments provided me with tons of valuable information.

  34. Daryl McCullough

    Mark writes: When I’ve taught, I’ve found students’ past grades to be utterly irrelevant. The things that I, as an instructor, am interested in about my students can’t be communicated by a single numeric value or letter grade. It doesn’t matter what grading system you use: as an instructor, past grades provide me with virtually no useful information about the student.
    Then I think we are in agreement about the uselessness of grades for this purpose. But past grades are certainly used to determine what courses a student must or is allowed to take next. So I’m suggesting using something else for that purpose.
    As an instructor, you have to assume something about what your students already understand. You can’t start from scratch and reteach counting and addition.

  35. Donalbain

    Me not good at maffs. Me just science teacher!
    Does a multiple choice test with negative marking have a hidden baseline? What are the statistical benefits and problems with such tests?

  36. Mark C. Chu-Carroll

    Donalbain:
    I don’t know what you mean by “multiple choice with negative marking” – do you give positive points for correct answers, and negative points for incorrect? Do different wrong answers get different numbers of negative points?
    The problem with hidden baselines comes from averaging scores. If random guessing produces an expected outcome of, say, 40% correct, then meaningful grades really range from 40-100. If you’re averaging two tests, one of which has a hidden baseline of 20 and one with a hidden baseline of 40, the average score isn’t being computed in a correct way.
    Suppose you’ve got two tests: a true/false exam where the student got a 60%; and a multiple choice test with 5 answers per question, where the student got 60%. In the former case, the student basically got a score 10 points higher than you would expect by random guessing. In the second case, the student got a score 40 points higher than you would expect by random guessing.
    So in the first case, the student got about 1 question in 5 correct above what you would expect from random guessing: he did better than random guessing by a margin of 20%. In the second, the student got 1 question in 2 correct above what you would expect from random guessing – he did better than random guessing by 50%.
    The student in the second case clearly had a *much* better mastery of the material than the student in the first case – but because of the hidden baselines (50% in the true/false, and 20% in the multiple choice), you’ve made very different performances look as if they were the same thing.
    The meaningful grades are what you get if you eliminate the baseline. In the true/false test, the meaningful grade for 60% correct is (arguably) 20% above baseline. In the multiple choice, the meaningful grade for 60% is 50% above baseline.
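    To make the arithmetic concrete, here is a rough sketch in Python. It assumes that random guessing on a test with k equally likely choices per question yields 100/k percent on average; the function names are invented for illustration.

```python
def guessing_baseline(choices_per_question: int) -> float:
    """Expected percentage from pure random guessing (the hidden baseline)."""
    return 100.0 / choices_per_question


def fraction_above_baseline(raw_percent: float, choices_per_question: int) -> float:
    """Share of the range between the baseline and 100% that the student actually earned."""
    baseline = guessing_baseline(choices_per_question)
    return (raw_percent - baseline) / (100.0 - baseline)


if __name__ == "__main__":
    # Same raw score of 60% on two very different tests.
    print(fraction_above_baseline(60, 2))  # true/false: (60 - 50) / (100 - 50)
    print(fraction_above_baseline(60, 5))  # five choices: (60 - 20) / (100 - 20)
```

    The two identical raw scores come out as 20% and 50% of the way from baseline to perfect, which is the gap that averaging the raw numbers hides.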

  37. SiMPel MYnd

    In my high school (in Northern Ohio), they used the A=4, B=3, C=2, D=1, F=0 system for computing final grade point average. Each teacher used his/her own system for defining what constituted an A, B, etc.
    But, here’s the real kicker… They had what they called “Honors” or “AP” classes. And they decided that, since these are honors classes and are harder, for them an A counts as 5, a B counts as 4, etc.–everything was 1 point higher.
    Well, you, the math literate, can see what’s coming. There were a couple of people who had straight As and both took all the honors courses available. But one of them took fewer courses overall. So, because that person took fewer courses, the A=5 courses weighted their grade point average higher–the person with fewer courses had a higher grade point average and would have been the valedictorian. So, picture this… here’s a group of us sitting around with principals and counselors, trying to explain to these bozos why this wasn’t fair. How can you have one student who took more courses and got all As be ranked lower than another student who took fewer courses but also got all As? Isn’t the former performing better than the latter? We argued until we were blue in the face and they couldn’t understand it.
    They copped out by calling anyone with a grade point average over 4 co-valedictorians. We also argued this as unfair because one of those was someone who had a B in a single course, but the A=5 courses raised that GPA over a 4. So, they had a bunch of people with straight A’s ranked “equally” as valedictorian as someone who got a B.
    I still remember the vacuous look on their faces as we tried to explain their errors to them.
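    For anyone who wants to see it in numbers, here is a small sketch. The course counts are invented for illustration (I don’t remember the real ones); the only thing taken from the actual system is that an honors course scores one point higher than a regular one.

```python
from fractions import Fraction

REGULAR_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def weighted_gpa(courses):
    """courses: list of (letter, is_honors) pairs; honors adds 1 point to the letter's value."""
    total = sum(REGULAR_POINTS[letter] + (1 if is_honors else 0)
                for letter, is_honors in courses)
    return Fraction(total, len(courses))


if __name__ == "__main__":
    honors = [("A", True)] * 4                      # hypothetical: 4 honors courses on offer
    more_courses = honors + [("A", False)] * 4      # straight As, 8 courses
    fewer_courses = honors + [("A", False)] * 2     # straight As, 6 courses
    one_b = honors + [("A", False)] * 3 + [("B", False)]

    print(weighted_gpa(more_courses))   # 9/2
    print(weighted_gpa(fewer_courses))  # 14/3
    print(weighted_gpa(one_b))          # 35/8
```

    The student with fewer regular courses comes out ahead even though both have straight As, and the student with a B still lands above 4.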

  38. Jeff

    The whole discussion seems unlikely. In my experience grading is a great deal more nuanced than this. Individual tests and other work are assigned a numerical score, which can be either qualitative or quantitative in nature, depending on the subject. Some classes further weight the score by the importance of that work to the overall subject. When all scores are collected the (possibly weighted) scores are summed and a letter grade is assigned as an indicator either of class standing (“grading on a curve”) or of subject mastery. – And grades are on a 5 point scale where F=0 and A=4, not 0-100.

  39. Wry Mouth

    Do you have any idea how much time it takes for me to inculcate in my students (and their parents) the “good F” versus “bad F” distinction?
    Pointing to a 55%, I might say, “that’s a pretty good F,” or looking over a paper that’s scored 35%, I might say “oops – this is a bad F.”
    Raised eyebrows are the normative reaction. But I persist.
    ;o/

  40. Donalbain

    Mark, yes negative grading was used a couple of times when I was at uni. A correct answer gained you 1 mark. An incorrect answer got you -1. A question left blank got you 0. The aim, I believe, was to prevent guesswork. I just wondered how that might show up in a statistical problem with the grades.

  41. SLC

    Re Greg Laden
    The answer to the problem of non-zero scores on multiple choice tests is to give not a zero for a wrong answer but a negative number, depending on the number of choices. This is how it is done on the College Board exams.

  42. SteveM

    I had a professor in grad school (EE Signal Processing) who was fond of giving true/false tests with the possibility of partial credit. That is, if you showed how you arrived at your answer but made a sign error or some other trivial mistake you could get partial credit. You didn’t need to show your work but wrong answers would be scored as -1, no answer as 0, correct as +1.

  43. Ethan Romero

    Carlie #16:
    At my current university undergrads can get +/- on all letter grades. Unfortunately, one of the students in a course that I was TAing actually got an F-. All 4 of the exams were 30 multiple choice questions each with 5 possible answers. The student got

  44. trrll

    It is not uncommon to adjust for the baseline problem in multiple choice grading by subtracting an additional fraction of the number wrong, corresponding to the average number of questions that a clueless student would get right by guessing. To take a trivial example, on a true-false test, if the student got 5 questions wrong, there are probably another 5 for which he did not know the answer, but picked the correct answer by chance. So you can correct by subtracting an additional 5 points from the grade.
    Because of this, I’ve heard students taking tests of this nature advised, “Don’t guess, because you are penalized extra for wrong answers.” The correct advice is, of course, always guess. It won’t help you or hurt you (on the average) if you genuinely don’t know the answer, but often people know more than they realize, so guessing is likely to be biased toward the correct answer, leaving you with a gain even after subtraction of a penalty based on the assumption of random guessing. And of course, if you can eliminate even one choice on a multiple choice test (and there is usually at least one that is utterly crazy), picking one of the remaining choices at random will gain you a fraction of a point on the average.
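    The correction described at the start of this comment can be sketched as follows. It assumes every wrong answer came from a blind guess among k equally likely choices, so each wrong answer implies 1/(k - 1) lucky right answers; the function name and the sample numbers are illustrative only.

```python
from fractions import Fraction


def corrected_score(num_right: int, num_wrong: int, choices: int) -> Fraction:
    """Subtract the lucky guesses implied by the wrong answers.

    A blind guesser gets (choices - 1) wrong answers for every right one,
    so num_wrong wrong answers imply num_wrong / (choices - 1) lucky hits.
    """
    return num_right - Fraction(num_wrong, choices - 1)


if __name__ == "__main__":
    # True/false, 20 questions: 15 right and 5 wrong.
    print(corrected_score(15, 5, choices=2))    # 10
    # Five-choice test, 100 questions, answered entirely by guessing on average.
    print(corrected_score(20, 80, choices=5))   # 0
```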

  45. socratic_me

    Wow. Not a lot of HS teachers here from the looks of it.
    My district just instituted the “no zero” program and it has been a huge headache. All the talk about tests is really missing the point. Most HS teachers don’t mind bumping grades on tests a bit anyway, so giving a multiple choice test where the baseline is 20% or 25% isn’t a huge concern. Where zeros come into play are on work that isn’t done. It is extraordinarily rare that students receive less than a 50% grade on most all non-exam work turned in. However, not turning homework and such in earns a student a zero. No effort = no grade.
    These zeros add up. By the end of the first quarter, a student who hasn’t done anything will find it very difficult to pass the semester, which administrators find troubling. Therefore, they push all grades to 50%, including the grade received for not doing a single thing in class. The idea seems to be that if the student has some hope, they will magically start working. Inevitably, the fact that the student hasn’t done a single thing in class to that point, which is how they earned the zeros in the first place, is left out of the discussion until teachers start screaming.
    As a solution, students at our district get an extra day to turn any assignment in. Then they are given a day to get a note signed by their parent and turn it in. If it still isn’t in on day two, they receive a lunch detention. And of course, all of the documentation falls onto the teacher and nobody takes account of the fact that in a class with daily HW (like many HS math classes), that student is now half a week behind. Since all this paperwork builds up, what generally happens is that a student gets 50% for doing nothing, gets a few detentions until the teacher gets tired of the whole process and decides it just isn’t worth it. At the end of the semester, the student’s parent is livid because they have actually turned in a few assignments on a rare whim, so they are within 5% or so of passing instead of 50% away from passing and the teacher refuses to help them out.
    That said, if you just read the article, none of the above is even moderately obvious. Not only does the reporter explain the math poorly; they generally fail to explain the process well at all. Not surprising given how little most people outside of education really know about what goes on in the classroom.

  46. HS teacher

    In the school district where I teach, grades are entered on a scale of 50-100, with 65 the lowest passing score (65=D, 70=C, 80=B, 90=A — letters are shown on the report card, but it’s the numbers that matter). Teachers typically keep their own records on a scale of 0-100. Some teachers enter their raw scores into the system, converting the grades that are

  47. HS teacher

    [Previous post was truncated because of less than sign.]
    In the school district where I teach, grades are entered on a scale of 50-100, with 65 the lowest passing score (65=D, 70=C, 80=B, 90=A — letters are shown on the report card, but it’s the numbers that matter). Teachers typically keep their own records on a scale of 0-100. Some teachers enter their raw scores into the system, converting the grades that are below 50 to 50 and leaving the others alone. Some teachers curve all of the grades upward, so as not to give a student whose raw score was 0 and a student whose raw score was 50 identical grades. In my school, students who do absolutely nothing are far from rare. So students who attempt assignments and take tests seriously, even though they do poorly, are appreciated and rewarded. A student with a grade of 70 for each of the first three terms can be absent for the entire fourth term, get a grade of 50, and pass for the year.
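    A quick sketch of that last example, assuming the yearly grade is a plain average of four equally weighted term grades (the weighting is an assumption for illustration; the floor of 50 and the passing score of 65 are the ones described above):

```python
PASSING = 65  # lowest passing score in the district described above


def yearly_average(term_grades, floor=50):
    """Average the term grades after raising anything below the floor up to the floor."""
    recorded = [max(grade, floor) for grade in term_grades]
    return sum(recorded) / len(recorded)


if __name__ == "__main__":
    raw = [70, 70, 70, 0]                  # absent for the entire fourth term
    print(yearly_average(raw))             # 65.0, which passes
    print(yearly_average(raw, floor=0))    # 52.5, which would fail on a 0-100 scale
```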
    #6: “In mathematics especially, it makes no sense to move on to advanced topics when one hasn’t mastered the pre-requisites.” Indeed. But it is not that “[microtopic grading] is probably more work than teachers are willing to do”. It is that there are limits to how much a mandated course of algebra 1, for example, with a mandated textbook and a mandated schedule, can be made to resemble 5th grade for woefully unprepared students. Teachers are already working 50-60 hour weeks, trying to maintain some semblance of order and progression in the midst of chronic absences and apathy.

  48. SLC

    I think that the issue of assigning negative points for a wrong answer in a multiple choice test can be understood by proposing an example.
    1. Suppose we have a multiple choice test with 100 questions, with a choice of 5 responses to each question.
    2. By pure random chance, the expected number of correct answers is 20. Thus, if 1/4 of a point is subtracted for each wrong answer (note: no points are subtracted for not making any choice), the expected score is 0.
    3. Thus, for any question where the taker has no idea as to which choice is correct, the best strategy is to make no choice and leave the question blank.
    4. However, if the test taker has sufficient knowledge of the subject area to definitely eliminate one or more of the choices, then the best strategy is to choose one of the remaining choices. As a pathological example, suppose that the test taker can eliminate 1 choice on each question. Then by random chance, he/she will get 25 correct answers. The 75 incorrect answers yield a negative score of 18 3/4, giving a total score on the test of 6 1/4.
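    The numbers in this example can be checked with a short sketch. It assumes one point per correct answer, a penalty of 1/(choices - 1) per wrong answer, and uniformly random picks among whatever choices remain; the function name is made up for illustration.

```python
from fractions import Fraction


def expected_total(num_questions: int, choices: int, eliminated: int = 0) -> Fraction:
    """Expected score when every question is guessed after ruling out `eliminated` wrong choices."""
    penalty = Fraction(1, choices - 1)            # 1/4 of a point for 5 choices
    p_right = Fraction(1, choices - eliminated)   # chance a guess is correct
    per_question = p_right - (1 - p_right) * penalty
    return num_questions * per_question


if __name__ == "__main__":
    print(expected_total(100, 5))                 # 0: pure guessing breaks even
    print(expected_total(100, 5, eliminated=1))   # 25/4, i.e. 25 - 18 3/4 = 6 1/4
```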

  49. Robert Harrison

    Just throwing in my 2 cents as a cs professor at a decent state university. There is a bit of test psychology involved that doesn’t seem to be discussed. While it would be great to have a test with a mean of 50% and some spread that really measured their comprehension of the material, it isn’t a workable solution. It is actually important to put some easy questions on the test to allow the students to calm down and concentrate on the “real” ones that actually measure their competence. So my “chimpanzee” baseline is closer to 50% than ideal. Still the occasional one goes below that. As a practical matter I keep all intermediate grades in percents and only at the end convert them to letter grades via an appropriate weighted average. I find it useful to have zeros vs. something to penalize the cheaters (as nothing else works and we don’t have a better formal mechanism).

  50. Robert M. Bradford

    To me it sounds like the program in #46 is a good idea especially for math classes. High school math classes always bugged me in that I felt like I was being graded for effort rather than understanding of the material. It always amused me when I received C’s and D’s in math classes that I always scored nearly perfect on the tests. At least my calculus teacher apologized to me for giving me a failing grade only to have me score a 5 on the AP exam.

  51. Blake Stacey

    Mark Chu-Carroll:

    In my experience, both in high school (Bridgewater, NJ) and as an undergrad in college (Rutgers), some classes graded on a percentage scale, and some graded with letters. If you have the mixture, then there’s a problem, because you want to be able to compare academic performance between students with different courses, and so you need to universalize the grading system.

    Granted. We’re all probably looking at our own ends of the education elephant: all of the middle- and high-school classes I can remember graded on some sort of percentage system, with percentage scores being converted to letter grades. No mixing was involved. The English teacher gave you an 82, the algebra teacher gave you an 85, and so forth — but these were all converted to letters. The grades started off “universal”, and were converted to another “universal” system, with needless information loss in between.
    In retrospect, middle- and high-school education had much worse things wrong with them, but that’s a rant for another day. . . .

  52. trrll

    2. By pure random chance, the expected number of correct answers is 20. Thus, if 1/4 of a point is subtracted for each wrong answer (note: no points are subtracted for not making any choice), the expected score is 0.
    3. Thus, for any question where taker has no idea as to which choice is correct, the best strategy is to make no choice and leave the question blank

    This is bad advice. Leaving the question blank is no better, on the average, than taking a wild guess. In practice, students usually have some knowledge that they are not consciously aware of, so the best strategy for multiple choice is: always guess. The worst case–when you really don’t have a clue–is that you will neither gain nor lose by guessing. More likely, you know more than you will realize, and will pick up on average a fraction of a point by guessing.

  53. trrll

    One of the worst pedagogical mistakes in education is grading work that is intended as exercises rather than for evaluation. This can result in the unfair outcome of failing a student who has in fact mastered the material and scored well on the exam, because the student neglected to complete exercises that he in fact had no educational need for, because he had mastered the material by other means.
    It is sometimes objected that grades are assigned for exercises “for the student’s own good,” with the intent of providing a short term incentive to study for students who would otherwise put it off until the last minute, and suffer as a consequence. However, there is no excuse whatsoever for making non-evaluation exercises a major component of a student’s grade. I suggest a ceiling of 10% of the contribution of exercises to the final grade. An even better approach would be to establish a rule that grades on exercises can only increase, and never decrease, a student’s average grade for the course.
    It is also a bad idea to score work that is intended for learning rather than evaluation. Students should have the opportunity to make errors on learning exercises without being penalized, because making mistakes is part of learning. A solution that I reached in a course I taught recently was to award a certain (small) number of points for completing each exercise, regardless of whether they got the questions right or wrong.
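    One way the “exercises can only increase the grade” rule might be written down, combined with the 10% ceiling suggested above. This is only a sketch; the function name and the choice to blend by simple weighting are assumptions for illustration.

```python
def course_grade(exam_avg: float, exercise_avg: float, exercise_weight_pct: int = 10) -> float:
    """Blend exercises into the grade, but never let them pull it below the exam average."""
    blended = ((100 - exercise_weight_pct) * exam_avg
               + exercise_weight_pct * exercise_avg) / 100
    return max(exam_avg, blended)


if __name__ == "__main__":
    # Mastered the material, skipped every exercise: the exam grade stands.
    print(course_grade(exam_avg=90, exercise_avg=0))    # 90
    # Weaker exams, but all exercises completed: a modest boost.
    print(course_grade(exam_avg=70, exercise_avg=100))  # 73.0
```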

  54. Daithi

    SiMPel MYnd,
    Great post, SiMPel MYnd, and thanks for the explanation of how students get GPAs that are over 4.0. I’ve asked my kids a couple of times how you get a GPA over 4.0 and never really got a good answer. They thought you took extra classes, which goes to show you they don’t quite have the concept of averages down yet. Based on your post, it also looks like some school administrators don’t have the concept of averages down either – or at least their ramifications.
    To the HS teachers,
    I feel your pain – actually I was one of the students that caused your pain. I was an awful student that barely passed high school. It wasn’t that I couldn’t do the work. When I actually did the work I usually got As, but quite often I just didn’t do the work at all (or did the work but didn’t turn it in). So I got plenty of 0s and I was lucky if my grade in a class averaged out to a D.

  55. Blake Stacey

    Thinking about the matter for a bit, I suspect that there might have been a purpose to letter grades in college, where education wasn’t so much about memorizing the largest possible fraction of a pre-digested fact set, and instead concerned developing the student’s aptitude as a scientist. Numerical grades were accumulated during the semester, and after finals, the professors had to use their judgment to assign grades. Instead of taking a mapping from numbers to letters as if it had been handed down from Sinai, the divisions were chosen so that each letter grade would have the appropriate meaning. To wit:

    A: Exceptionally good performance, demonstrating a superior understanding of the subject matter, a foundation of extensive knowledge, and a skillful use of concepts and/or materials.
    B: Good performance, demonstrating capacity to use the appropriate concepts, a good understanding of the subject matter, and an ability to handle the problems and materials encountered in the subject.
    C: Adequate performance, demonstrating an adequate understanding of the subject matter, an ability to handle relatively simple problems, and adequate preparation for moving on to more advanced work in the field.
    D: Minimally acceptable performance, demonstrating at least partial familiarity with the subject matter and some capacity to deal with relatively simple problems, but also demonstrating deficiencies serious enough to make it inadvisable to proceed further in the field without additional work.
    F: Failed. This grade also signifies that the student must repeat the subject to receive credit.

    It happened now and then that a professor would write a test that was too hard — say, that a problem worth many points could only be solved if you recognized a particular exceptionally subtle and technical point. If this meant nobody in the class managed to solve the midterm’s problem on degenerate perturbation theory, cumulant expansions or whatever, then that midterm wasn’t a good experiment for measuring the class’s knowledge level. The boundary between A and B might be set at 85% instead of 90% that semester.
    Bottom line: letter grades should be used as short forms of words like “Excellent” or “Adequate”, not poor man’s versions of numbers.

  56. trrll

    It happened now and then that a professor would write a test that was too hard — say, that a problem worth many points could only be solved if you recognized a particular exceptionally subtle and technical point. If this meant nobody in the class managed to solve the midterm’s problem on degenerate perturbation theory, cumulant expansions or whatever, then that midterm wasn’t a good experiment for measuring the class’s knowledge level. The boundary between A and B might be set at 85% instead of 90% that semester.

    The extreme case of this is of course grading “on the curve,” in which the scores are normalized to a predetermined mean and standard deviation regardless of the actual scores. So if a course is B-centered, then a student who scored at the mean on the exam would get a B, even if the mean is 25.
    Grading on the curve has the advantage that it corrects for an exam that is made overly easy or overly difficult, but is often unpopular among students because it sets them even more in competition than they would otherwise be (i.e. if you do exceptionally well on the exam, it hurts your classmates). And while students rarely complain if the curve bumps their grade upward, they tend to be unhappy when it goes the other way. One sometimes hears the rather naive assertion that curving is wrong because everybody should be graded on an absolute scale. It is a naive notion, because the only means an instructor has of judging whether a question is fair is how well the students do. So if students do poorly on a particular question one year, the instructor will figure either that the question was too hard, or that he did not do a good job of conveying that material, and the questions next year will likely be easier, or else the teaching will be modified to provide more instruction on how to answer that kind of question. So the net result is the same as curving, but with a longer time constant.
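    For concreteness, a minimal sketch of normalizing to a predetermined mean and standard deviation. The raw scores, the target mean of 85 (standing in for a “B-centered” course), and the target spread of 8 are all invented for illustration.

```python
from statistics import mean, pstdev


def curve(scores, target_mean, target_sd):
    """Rescale scores so the class has the given mean and (population) standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # Everyone scored the same, so everyone gets the target mean.
        return [float(target_mean)] * len(scores)
    return [target_mean + target_sd * (s - mu) / sigma for s in scores]


if __name__ == "__main__":
    raw = [10, 20, 25, 30, 40]   # a very hard exam: the class mean is 25
    print(curve(raw, target_mean=85, target_sd=8))
    # The student who scored the raw mean of 25 lands exactly on 85.
```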

  57. Зарт

    1. More digits for a grade do not provide more information.
    I’m astonished that only two comments (#16 by Carlie and #56 by Blake Stacey) out of almost sixty even mentioned it.
    If we can classify students only in five categories at best in the first place then it does not make any sense to distinguish 65 and 70 for example. More digits are just an illusion that we have more information about a student.
    2. To begin with what is the purpose of a grade?
    a) Feedback by teacher to student
    b) Formal criteria used in some evaluation process (who is allowed to go on to the next year in school, or in college applications, or even in resumes for a job if there are companies stupid enough to hire based on grades, or to evaluate the teacher/school).
    These purposes are in conflict. This alone makes it hard to expect valid grades. Therefore (considering the many, many other problems in evaluating someone’s expertise at something) the whole process cannot be seriously harmed by the difference between F=0 and F=60, because there are already bigger issues.
    P.S. I’m sorry for poor English. It is not my first language and not even the second. I hope it does not obscure the meaning of what I was trying to say.

  58. Donalbain

    As a high school science teacher, I get around this whole issue by not grading homework at all. I only mark work with comments. A student learns more if you say “Write the title!” on a graph than if you say “7/10”. And, research and my experience both say that if you put both, then only the “7/10” is noticed.

  59. SLC

    Re trrll
    I don’t know what the current thinking is but when I took the College Board exams a million years ago, coaching organizations who prepared students for the exams all recommended against guessing if the test taker could not, with great confidence, eliminate at least one of the choices.

  60. Jonathan Vos Post

    Testing and its abuses have been a subject of concern for me and my associates in several different schools within the past month.
    I have experienced a yawning chasm between the peak of quality at Caltech (my alma mater, at which I attended a 35th annual alumni reunion last weekend) and the abyss of failing local schools where I teach Math, Science, English, Art, History, and the like as a substitute. In these schools (I’ve taught at the highest- and lowest-ranked) the differences amount to re-segregation of public schools by race and income. It is shocking, frightening, and an existential threat to America.
    But, setting aside for the moment the questions about how a teacher grades a set of students, which I agree is crucial (in that good teaching requires strength along 3 axes: instruction, assessment, and management), how can the teacher be graded by the school, and the school be graded by the district, state, and nation?
    “No Child Left Behind” imposes unfunded mandates on states to test their students using bad standardized tests, and de facto forces teachers to teach to the test.
    I liked standardized testing when I grew up in New York City, as my string of 100% scores on the Regents exams guaranteed good grades in classes where I had been wrongly perceived as failing. Ironically, as a matter of law, pushed by powerful teachers’ unions, the State of New York may NOT use student test scores either to grant or to deny tenure to public school teachers. Weird!
    I hate, as a teacher, the standardized tests which I have to proctor. In California we now have two different suites of such tests, a situation exacerbated by “No Child Left Behind.”
    There is the CAHSEE, which students must pass to graduate from high school. Much of my teaching effort in the past year has been directed at students who have failed, or are expected to fail, the Math CAHSEE.
    Then there are the STAR statewide tests. My final 3 weeks of teaching at Rose City High School were blighted by STAR tests, which lasted 2 full weeks and utterly disrupted lesson plans, teaching schedules, student motivation, and attendance.
    Third are the API tests, which rank schools.
    The past two days I taught at the closest school to my home, Eliot Middle School, which ranks at the bottom, according to the following story. I have also taught at Muir High School (bottom ranked), as well as at Washington and Wilson middle schools (bottom third), at Norma Coombs middle school (top in PUSD with 8 of 10) where my son graduated 8th grade immediately before starting college at age 13. I’ve taught at other high schools and middle schools in PUSD (Pasadena Unified School District), which gives me perspective. And, from 4 to 8 p.m., I’ve been taking teacher’s college classes on Urban Schooling, on Lesson Planning, and on Classroom Management. I have too much to say about testing for this blog thread; my current draft Classroom Management Plan, perhaps 2/3 complete, is already 60 pages long, and I handed in about 20 pages of journal entries on my PUSD teaching experiences as they relate to issues of lesson planning, testing, and classroom management. The latest news story on testing follows.
    Nowhere to go but up
    Muir, Altadena rank at bottom of API
    By Caroline An, Staff Writer
    Pasadena Star-News
    Article Launched: 05/21/2008 10:48:38 PM PDT
    http://www.pasadenastarnews.com/ci_9341344
    John Muir High School and Altadena Elementary School received the lowest possible ranking on the Academic Performance Index, according to results released Wednesday for the state’s main method of measuring how individual schools are performing.
    Both Muir and Altadena Elementary received a ranking of 1 on the API, which scores school achievement on a scale of 1 to 10.
    “It was not surprising,” Pasadena Unified School District Superintendent Edwin Diaz said of Muir’s results, considering the academically troubled high school posted an API score of 569 last year, a 32-point drop from its score of 601 in 2006.
    Wednesday’s ranking was based on last year’s API score and other state-mandated testing, Diaz noted.
    News that Muir had fallen below 600 on the API triggered the district in October to launch a complete overhaul of the high school. Earlier this year, the district approved a plan to turn Muir into five small “learning communities” – academies that would serve as schools within the school.
    Muir teachers also were made to re-apply for their jobs.
    In addition, Muir received five additional teachers this school year to staff more intervention classes for students on the verge of failing.
    But the bulk of the changes at Muir, which is in the fourth year of a state monitoring program, are scheduled to be implemented next school year.
    More stability among Muir’s administrators, including the appointment of a new principal this year, should help improve its 2008 API score, due to be released by the state this summer, Diaz said.
    Altadena Elementary also received additional attention this school year, including extra funding to implement class-size reduction in fourth through sixth grades. The school also is implementing a variety of new teaching methods aimed at improving instruction, PUSD officials said.
    Overall, the PUSD had mixed results on the API. While 11 schools saw improvements from last year, three schools stayed the same in their rankings, and six others saw their rankings drop.
    The troubling performance of PUSD’s three middle schools remains a concern, officials said. Eliot, Washington and Wilson middle schools all scored in the bottom third in the statewide rankings.
    District officials plan to discuss middle school reforms in June.
    At other public school districts in the West San Gabriel Valley, most schools improved or maintained their API rankings.
    Every school belonging to districts in the Arcadia, San Marino and South Pasadena received rankings of 10 – the same as last year’s results.
    The API ranks schools on how well they perform compared to other schools around the state. But it also compares schools with similar numbers of low-income and limited-English students, providing a separate ranking for that category.
    South Pasadena Unified School District Superintendent Brian Bristol said he was “thrilled” with the results for his district’s schools. He said the results show schools are making consistent academic strides from year to year.
    “We are honing our skills and reaching our best potential,” Bristol said.
    caroline.an@sgvn.com
    (626) 578-6300, Ext. 4494
    http://www.insidesocal.com/hallwaymonitor

    Reply
  61. Johann Hibschman

    When TAing intro physics, everything was curved. Every component of the grade (homework, lab, exams) was converted into a z-score. These z-scores were then combined by the appropriate weightings (50% homework, etc.), to produce a final course score. The students hated it, but it had a certain appeal.
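The z-score scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 50% homework weight comes from the comment, but the lab and exam weights, the function names, and the use of the population standard deviation are all hypothetical choices made here.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Convert a list of raw scores into z-scores (assumes population std dev)."""
    mu = mean(raw)
    sigma = pstdev(raw)
    if sigma == 0:
        # Everyone scored the same, so nobody is above or below average.
        return [0.0 for _ in raw]
    return [(x - mu) / sigma for x in raw]


def course_scores(components, weights):
    """Combine per-component z-scores into one weighted score per student.

    components: dict mapping a component name to a list of raw scores,
                with students in the same order in every list.
    weights:    dict mapping the same names to weights (hypothetical values).
    """
    n_students = len(next(iter(components.values())))
    totals = [0.0] * n_students
    for name, raw in components.items():
        for i, z in enumerate(z_scores(raw)):
            totals[i] += weights[name] * z
    return totals


# Hypothetical class of three students; only the 50% homework weight
# is taken from the comment, the rest is made up for the example.
example = course_scores(
    {"homework": [60, 80, 100], "lab": [70, 75, 95], "exams": [50, 70, 90]},
    {"homework": 0.5, "lab": 0.2, "exams": 0.3},
)
```

The appeal is that a hard exam and an easy homework set end up on the same scale before they are weighted; the cost is that a student's score depends on how everyone else did.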
    Even back in high school, English and history classes never graded on a percentile system; the grading was too subjective to permit that level of detail. It’s hard enough to decide if a paper is a B+ or a B; 82 vs. 83 seems impossible.
    And, as a final anecdote, I amused myself while teaching an intro astronomy class one year by (privately) calculating the equivalent monkey score for multiple-choice exams, defined as the number of monkeys required to give a 50% chance that one of them would equal or beat a score. I had a 3-monkey student, but no worse.
    And, yes, I’m ashamed for giving a multiple-choice test, but I had 100 students, no TA, and they just weren’t paying me enough to give a real exam.
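The "monkey score" defined above can be computed directly, at least under some assumptions the comment does not spell out: each monkey answers every question uniformly at random, every question has the same number of choices, and the score is a count of correct answers. The function names are hypothetical.

```python
import math


def p_at_least(score, n_questions, n_choices):
    """Probability that one random guesser gets at least `score` answers right."""
    p = 1.0 / n_choices
    return sum(
        math.comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(score, n_questions + 1)
    )


def monkey_score(score, n_questions, n_choices):
    """Smallest number of monkeys m with P(at least one matches or beats score) >= 0.5.

    Solves 1 - (1 - p)**m >= 0.5 for the smallest integer m.
    """
    p = p_at_least(score, n_questions, n_choices)
    if p >= 0.5:
        return 1  # a single monkey already has an even chance
    return math.ceil(math.log(0.5) / math.log1p(-p))
```

As a sanity check, on a one-question test with four choices a perfect score is a 3-monkey result: two monkeys succeed with probability 1 - 0.75^2 = 0.4375, three with 1 - 0.75^3, which is about 0.578.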

    Reply
  62. pat ballew

    A few questions leap to mind. 1) Why do you use arithmetic averages instead of some other measure? Medians are perhaps a better reflection of the student’s overall ability in a class, and geometric means could provide creative approaches. But the bigger question is 2) what does it matter how you average the grade if there is no consistency in what a grade means? There have been multiple posts about the student who knows 50% or 70% of the material, but what you mean is that he could answer 50% or 70% of the questions you chose to ask. And probably in the same class down the hall, across town in another school, or across the country in another state, the questions on the same topic are completely different, are scored on a different basis, and are then applied to a different cut scale to get A-F grades. I love schools that pretend they are tough because their scores are 93-100 for an A instead of 90-100, so all the teachers write tests so that A students get 93% of the questions right instead of 90%. And I know lots of advanced placement teachers who take the square root of the percentage right to give a grade: the student scores 64%, the square root is 80%, you get a B-, and that is probably a fair grade if the questions are really tough.
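The square-root curve mentioned above is easy to write down. A minimal sketch, assuming raw scores are percentages from 0 to 100; the function name is hypothetical, and the second half just contrasts the three kinds of average the comment raises, using made-up scores.

```python
import math
from statistics import geometric_mean, mean, median


def sqrt_curve(raw_percent):
    """Square-root curve: sqrt(raw / 100) * 100, which simplifies to 10 * sqrt(raw)."""
    if not 0 <= raw_percent <= 100:
        raise ValueError("raw_percent must be between 0 and 100")
    return 10 * math.sqrt(raw_percent)


# The example from the comment: a raw 64% is curved to 80%.
curved = sqrt_curve(64)

# Hypothetical scores for one student, with one bad day among four good ones.
scores = [90, 88, 92, 30, 85]
averages = {
    "arithmetic mean": mean(scores),
    "median": median(scores),
    "geometric mean": geometric_mean(scores),
}
```

The curve leaves 0 and 100 where they are and lifts everything in between, with the largest boost going to scores near 25%. In the averages, the median ignores the single bad score entirely, while the geometric mean is pulled down by it harder than the arithmetic mean is.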

    Reply
  63. Hank

    My favorite line is:
    “It’s a classic mathematical dilemma: that the students have a six times greater chance of getting an F.”
    Mark, you should include assuming that every distribution is uniform on your list of silly statistical errors. Or do you think that he committed the greater sin of misrepresenting his data in order to strengthen his point?
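To see how much the "six times" claim leans on a uniform distribution, here is a small sketch. It assumes F covers 0-59 and each other letter covers a 10-point band, and it compares the uniform case with a normal distribution whose mean and standard deviation are purely hypothetical.

```python
from statistics import NormalDist


def uniform_band_ratio(f_width=60, other_width=10):
    """If every score were equally likely, the odds would just be the band widths."""
    return f_width / other_width


def normal_band_probs(mean_score, sd):
    """P(F) and P(D) if scores followed a normal distribution (an assumption)."""
    dist = NormalDist(mean_score, sd)
    p_f = dist.cdf(60)                  # scores below 60
    p_d = dist.cdf(70) - dist.cdf(60)   # scores from 60 up to 70
    return p_f, p_d


ratio = uniform_band_ratio()            # 6.0, the figure in the quote
p_f, p_d = normal_band_probs(75, 10)    # hypothetical class: mean 75, sd 10
```

With those made-up parameters, an F comes out less likely than a D, not six times more likely. The "six times" figure only follows if every score from 0 to 100 is equally probable.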

    Reply
  64. Mark C. Chu-Carroll

    Hank:
    You’re right that assuming uniform distribution is a fundamental error, and that I did leave it out of my post where I listed the major stupid errors – there are so many examples of people making errors based on assuming a uniform distribution! I’m shocked that I didn’t think to include it – that was extremely negligent on my part!
    With respect to the specific quote that you mentioned… I think that the person you quote is so damned innumerate that he doesn’t even have a clue of what a probability distribution is – he doesn’t even know enough to know how incredibly stupid his statement is. The whole issue of grading is based on a deep misunderstanding of what the grade numbers mean.

    Reply
  65. trrll

    I don’t know what the current thinking is, but when I took the College Board exams a million years ago, coaching organizations that prepared students for the exams all recommended against guessing if the test taker could not, with great confidence, eliminate at least one of the choices.

    Which raises the question of why people are getting SAT coaching from guys so statistically innumerate as to make such an obviously incorrect recommendation.
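The arithmetic behind the guessing argument can be checked with exact fractions. The scoring rule is an assumption here, since the thread does not state it: +1 for a right answer, -1/4 for a wrong one, and 0 for a blank on five-choice questions, which is the formula scoring the SAT used for many years. The function name is hypothetical.

```python
from fractions import Fraction


def expected_guess_value(remaining_choices, penalty=Fraction(1, 4)):
    """Expected points from guessing uniformly among the choices not yet eliminated.

    Assumes +1 for a correct answer, -penalty for a wrong one, 0 for a blank.
    """
    p_right = Fraction(1, remaining_choices)
    return p_right - (1 - p_right) * penalty


# Expected value of a guess after narrowing five choices down to 5, 4, 3 or 2.
values = {n: expected_guess_value(n) for n in (5, 4, 3, 2)}
```

Under that rule a completely blind guess among five choices is worth exactly 0 points on average, the same as leaving the question blank. Eliminating a single choice raises the expected value to 1/16 of a point, and it keeps rising from there.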

    Reply
  66. daenku32

    Bring forth the grade letter “E”. Make F 30% or something. That way you at least get the students through the system. They might not have retained a lot, but it’s better than them just dropping out in the second year of HS.
    I did my early schooling in Finland. In primary schools, it is 4 to 10. 4 is failure. Get a 4 as course grade and you failed the course and will head to summer-school. But, they have 6 working grades: 5,6,7,8,9,10. 10 is a real bitch to get. 99% performance or the like.
    In vocational/college level it is 0 to 5. 0 means you failed the test/course. 5 means you got an A+. Very hard to get.
    http://www.utu.fi/en/studying/studies/studyregister/grades.html
    Since coming to the States and enrolling back into college after a 10-year hiatus, I have had plenty of A & B grades. Enough to maintain a 3.7 GPA.

    Reply
  67. Jim Thomerson

    Some disjointed comments. Grade points have been mentioned in passing. Grade point averages are important. Try getting into graduate school with a low grade point average.
    On T/F tests, if the test has less than 80% true answers, students will score lower on it than they should. An excess of false answers lowers the students’ confidence in their knowledge and leads them to make mistakes. (I suspect the mistake is to mark F questions T, but don’t know.)
    So far as guessing, I guessed regardless and scored well enough that I never quit.

    Reply
  68. beet

    The grading system in my country goes from 0 to 100. It’s really hard for someone to get an absolute 0, but I’ve seen it happen. Students who get 23% are really different from the ones who get 49%. Of course you wouldn’t notice it in a true/false test, but I definitely don’t see how those tests can really “test” anything. For maths, asking for numerical values should be enough, I think.
    I don’t get why you use A-B-C-D-E-F. A 0% and a 50% student are waaayyy different. But 0 to 100 can be ridiculously detailed also: you need at least 60% (59.5+) in order to pass, so if you have 59.4%, you fail (by just 0.1%).

    Reply
