Guest Opinion Essay

Robo-Readers Aren’t as Good as Human Readers — They’re Better

Annie Murphy Paul

From The Hechinger Report:

In April of 2012, Mark D. Shermis, then the dean of the College of Education at the University of Akron, made a striking claim: “Automated essay scoring engines” were capable of evaluating student writing just as well as human readers. Shermis’s research, presented at a meeting of the National Council on Measurement in Education, created a sensation in the world of education — both among those who see such “robo-graders” as the future of assessment and among those who believe they are worse than useless.

The most outspoken member of the second camp is undoubtedly Les Perelman, a former director of writing and a current research affiliate at the Massachusetts Institute of Technology. “Robo-graders do not score by understanding meaning but almost solely by use of gross measures, especially length and the presence of pretentious language,” Perelman charged in an op-ed published in the Boston Globe earlier this year. Test-takers who game the programs’ algorithms by filling pages with lots of text and using big words, Perelman contended, can inflate their scores without actually producing good writing.

Perelman makes a strong case against using robo-graders for assigning grades and test scores. But there’s another role for robo-graders in which, evidence suggests, they may be not only as good as humans but better. In this role, the computer functions not as a grader but as a proofreader and basic writing tutor, providing feedback on drafts, which students then use to revise their papers before handing them in to a human.

Instructors at the New Jersey Institute of Technology have been using a program called E-Rater in this fashion since 2009, and they’ve observed a striking change in student behavior as a result. Andrew Klobucar, associate professor of humanities at NJIT, notes that students almost universally resist going back over material they’ve written. But, Klobucar told Inside Higher Ed reporter Scott Jaschik, his students are willing to revise their essays, even multiple times, when their work is being reviewed by a computer and not by a human teacher. They end up writing nearly three times as many words in the course of revising as students who are not offered the services of E-Rater, and the quality of their writing improves as a result. Crucially, says Klobucar, students who feel that handing in successive drafts to an instructor wielding a red pen is “corrective, even punitive” do not seem to feel rebuked by similar feedback from a computer.

A close look at one of the growing number of independent studies of automated writing feedback provides some clues as to what might be going on among NJIT students. Khaled El Ebyary of Alexandria University in Egypt and Scott Windeatt of Newcastle University in Britain published the study in the International Journal of English Studies; it looks at the effects of a robo-reader program called Criterion on the writing of education students learning to teach English as a foreign language. The students in the study received Criterion’s feedback on two drafts of essays submitted on each of four topics.

The computer program appeared to transform the students’ approach to the process of receiving and acting on feedback, El Ebyary and Windeatt report. Comments and criticism from a human instructor actually had a negative effect on students’ attitudes about revision and on their willingness to write, the researchers note. By contrast, interactions with the computer produced overwhelmingly positive feelings, as well as an actual change in behavior — from “virtually never” revising, to revising and resubmitting at a rate of 100 percent. As a result of engaging in this process, the students’ writing improved; they repeated words less often, used shorter, simpler sentences, and corrected their grammar and spelling. These changes weren’t simply mechanical. Follow-up interviews with the study’s participants suggested that the computer feedback actually stimulated reflectiveness in the students — which, notably, feedback from instructors had not done.

Why would this be? First, the feedback from a computer program like Criterion is immediate and highly individualized — something not usually possible in big classes like those at Alexandria University, the site of the study by El Ebyary and Windeatt. Second, the researchers observed that for many students in the study, the process of improving their writing appeared to take on a game-like quality, boosting their motivation to get better. Third, and most interesting, the students’ reactions to feedback seemed to be influenced by the impersonal, automated nature of the software.

This may seem paradoxical. When critics like Les Perelman of MIT claim that robo-graders can’t be as good as human graders, it’s because robo-graders lack human insight, human nuance, human judgment. But it’s the very non-humanness of a computer that may encourage students to experiment, to explore, to share a messy rough draft without self-consciousness or embarrassment. In return, they get feedback that is individualized, but not personal — not “punitive,” to use the term employed by Andrew Klobucar of NJIT.

Evidence of this peculiar advantage of technology can be found in a field outside education. Public health professionals have long known that people will more readily disclose sensitive information to a computer than to a person. When typing their answers on a keyboard, rather than looking a questioner in the eye, respondents reveal more about their health problems, acknowledge that they’re suffering from more symptoms (especially psychiatric symptoms), admit more HIV risk behaviors and confess more drug use. (Fun fact: Women confess a greater number of sexual partners when asked by a computer, while men drop the macho act and admit fewer.)

The “disinhibition effect” produced by technology, writes Adam Joinson, a professor at the University of the West of England, emerges whenever an individual has reason to feel anxiety, self-consciousness or worries about being evaluated. And anxiety, self-consciousness and worries about evaluation are just the emotions that, sad but true, many people feel around learning. Research has repeatedly shown that many students experience these uncomfortable emotions in relation to writing, as well as to math, science and foreign languages.

Precisely because learning can be so emotionally fraught, a non-judgmental computer may motivate students to try, to fail and to improve more than almost any human. Just don’t let the robo-reader give out grades.

This article originally appeared in The Hechinger Report.
