When I was a TA at CMU, we used Gradescope (https://www.gradescope.com/) for this. Every exam would be scanned and divided into problems (based on a predefined template with fixed page space for answers). Then, each problem was assigned to a TA. Either there's a predefined rubric, or you create it as you go (-1 point for mistake X, half credit for mistake Y, etc.). There's a pretty slick interface where you just read the answer and use keyboard shortcuts to apply the relevant deductions. It still has the issue that every time you change the rubric, you need to go back and re-do previously graded instances of that problem. But it was way faster and (equally important) less tiring.
Online tools like Gradescope make this a little less painful (but still painful), but sometimes it's what's needed, especially on problems that are a little open-ended.
The LLM can compile verbose prose down to a short summary. If the summaries of each chunk are consistent, then it’s at least structurally well written. Then you grade the summary itself.
Anecdotally, the course I grade has this effect (just looking at the average score). I have been grading this course for the last 5 years (9-10 times). Students with last names in L-Z score slightly higher than those in A-L.
Imagine a question: compare the bubble sort and quicksort algorithms. Some students might mix up the algorithms, some might give the incorrect computational complexity, some might describe them incorrectly in some way. Manually grade some (or all) of the answers, noting the kinds of things students got wrong (e.g. the above criteria). Then, feed into ChatGPT (or your favorite alternative) the answer + the categories of mistakes to expect. Here's a simplified example: https://chat.openai.com/share/bf801e12-51d5-4255-9968-bbf91b...
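To make the "answer + categories of mistakes" step concrete, here is a minimal Python sketch of how such a prompt might be assembled. It only builds the text; it does not call any model. The function name `build_grading_prompt`, the wording of the instructions, and the sample categories are all assumptions for illustration, not what the shared chat actually used.

```python
def build_grading_prompt(question, answer, mistake_categories):
    """Assemble a prompt asking an LLM which known mistake categories apply."""
    # Number the categories so the model can refer to them unambiguously.
    numbered = "\n".join(
        f"{i}. {category}" for i, category in enumerate(mistake_categories, start=1)
    )
    return (
        f"Question: {question}\n\n"
        f"Student answer:\n{answer}\n\n"
        "Known mistake categories:\n"
        f"{numbered}\n\n"
        "List the numbers of every category that applies to this answer, "
        "or reply 'none'. Do not assign a score."
    )


# Hypothetical categories collected during a manual first pass.
categories = [
    "Mixes up the two algorithms",
    "States an incorrect time complexity",
    "Describes an algorithm incorrectly",
]

prompt = build_grading_prompt(
    "Compare the bubble sort and quicksort algorithms.",
    "Bubble sort picks a pivot and runs in O(n log n).",
    categories,
)
print(prompt)
```

Asking only for category numbers, rather than a score, keeps the point values in the grader's hands, so the rubric can change without re-prompting.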
Probably every single health or wellness "Find a Provider" portal lists them A-Z. That's a multi-billion dollar industry. If I was Dr. Zachary Zane, I'd change my name.
My daily standup is run by the order my boss sees the participants in the JIRA board -- My first name starts with W, so I'm last in that list. Makes staying engaged the whole meeting hard...
I wouldn't be surprised. It's very natural. Probably not for that specific use case, but if for some reason you are actually going through the list, then it's natural.
I think the point is that some automated systems like Canvas may hide the names, but they're still presented in alphabetical order. Pseudonyms don't help if you don't shuffle them.
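As a rough illustration of the shuffling point, a grading order could be generated like this in Python. This is only a sketch: `grading_order`, the seed value, and the `anon-…` identifiers are made up here, and nothing about it reflects how Canvas actually works.

```python
import random


def grading_order(submission_ids, seed):
    """Return the submissions in a shuffled but reproducible order."""
    # Sort first so the result depends only on the seed, not on input order.
    order = sorted(submission_ids)
    # A dedicated Random instance avoids touching the global random state.
    random.Random(seed).shuffle(order)
    return order


# Hypothetical pseudonymous IDs, as a system that hides names might show them.
ids = ["anon-001", "anon-002", "anon-003", "anon-004", "anon-005"]

# Using a different seed per assignment gives a different order each time.
week_3_order = grading_order(ids, seed=3)
print(week_3_order)
```

A fixed seed per assignment means the grader can stop and resume in the same order, while a new seed for the next assignment keeps any one student from always landing first or last.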
> One simple fix would be to make random order the default setting.

Fixed in the sense that the bias will be random. Presumably students graded last will still receive lower grades.
Haha, I've always enjoyed being at the end and getting less attention from teachers. If the data merely shows a correlation, it may just as well be explained by those of us at the end being under less pressure.
There's a section of one of the Diary of a Wimpy Kid books that talks about this exact thing. I was reminded of it as soon as I saw the headline. The justification it comes up with is that kids with names at the front of the alphabet sit in the front of the classroom, so they get called on and learn more. It definitely turned some gears in my brain when I first read it as a teen. Here's the relevant page: https://imgur.com/a/6wIx6qg
That 0.6 pt gap over multiple semesters is the difference between graduating “summa cum laude” and “magna cum laude”.
It’s 0.6% so it would only be if you happened to drop a letter grade as a result. Like 90.5 -> 89.9. And that would have to happen multiple times to significantly affect your GPA.
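The arithmetic can be checked with a small Python sketch. The 10-point letter cutoffs below are an assumption for illustration (real grading scales vary), and `changes_letter` is a made-up helper.

```python
def letter_grade(score):
    """Map a 0-100 score to a letter, assuming 10-point cutoffs."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"


def changes_letter(score, penalty=0.6):
    """True if losing `penalty` points moves the score across a cutoff."""
    return letter_grade(score) != letter_grade(score - penalty)


print(changes_letter(90.5))  # 90.5 -> 89.9 crosses the assumed A/B cutoff
print(changes_letter(95.0))  # 95.0 -> 94.4 stays within the same letter
```

Under these assumed cutoffs, only scores sitting within 0.6 points above a boundary are affected, which is a small slice of each 10-point band.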
https://nautil.us/impossibly-hungry-judges-236688/

> we should dismiss this finding, simply because it is impossible. When we interpret how impossibly large the effect size is, anyone with even a modest understanding of psychology should be able to conclude that it is impossible that this data pattern is caused by a psychological mechanism. As psychologists, we shouldn’t teach or cite this finding, nor use it in policy decisions as an example of psychological bias in decision making.
It seems it would take less time for Instructure, Inc. (makers of the mentioned software) to fix this than it took to do this research. Anyone know whether this is happening, and if not, why not?
If we changed our policy on exams from discriminative to evaluative, grading bias would be a trivial issue, but here we are, since we just NEED ways to fit everyone into numbers that we can easily use.
I noticed this in myself last time I was a TA. I'd go back and re-grade the first 15 assignments or so to make sure the rules were being applied consistently.
What other popular systems might lead to different outcomes based on sort order? Dating site matches? Your own contact list? Interesting category of problems...
I had a theory in school that this was the case for presentations too so I always forced myself to go first. No one else to compare me against, and no sitting around getting jittery.
Clearly evidence of anti-Polish bias when all the Zbigniews and Zygmunts and Wojteks get lower grades. (Or just another example of correlation vs. causation in action)
That's thought-provoking - thank you. Essentially, unless it's an old exam where the universe of bad answers is already known, you need two passes - a discovery pass followed by the grading pass.
In my case, I have to make a conscious effort to remain consistently (in)tolerant of lazy writing. It’s hard to keep on reading between the lines and giving the benefit of the doubt.
However, I also grade weekly exercise sheets during the semester, and these are committed into a repository, where each student has a folder that... begins with the first letter of their first name. Everyone I have ever worked with acknowledges that you have to shuffle the order in which you grade these submissions each week, for fairness. Several effects come into play: (1) you are usually less tired at the beginning, (2) your mood gets better during the last 2 sheets because you know you are done soon, (3, and crucially) at the beginning, you have not yet seen all the common errors / developed a "feeling" for them, and you might thus miss them in early submissions, but spot them immediately in later submissions.
Another alphabetic effect: In elementary school, my name was on top of the list of students in my class. I remember that I often had to do some special job simply because I was the first name on this list (for example, carry a group ticket when we visited some museum, keep track of something, be the first at something where nobody wanted to be the first, with everyone watching, be the first to be graded in PE, again with everyone watching, etc.). As a fairly shy kid, this already annoyed me in first grade.