An interesting use of artificial intelligence in education: to catch the cheats

Lately I have been involved in several behind-the-scenes discussions about artificial intelligence in education. To paraphrase Larry Cuban: the claims made about AI are often oversold, with simple algorithms that contain no machine learning at all being labeled AI. For example, together with some other scientists I’m still waiting for solid evidence from a tech company that claimed in the media that its tool delivered a smashing learning gain of 30%. We have been waiting for months now.

This doesn’t mean that there aren’t any possibilities. Together with colleagues from Arteveldehogeschool and Ghent University, I have been involved in a research project that has tried using machine learning to correct written tasks for a specific group of pupils. We’re still working on it.

This study follows the same line of thinking: it uses technology to determine whether you wrote your assignment or whether a ghostwriter penned it for you, with nearly 90% accuracy. I do think this is indeed possible.

From the press release:

Combining big data with artificial intelligence has allowed University of Copenhagen researchers to determine whether you wrote your assignment or whether a ghostwriter penned it for you – with nearly 90 percent accuracy.

Several studies have shown that cheating on assignments is widespread and becoming increasingly prevalent among high school students. At the University of Copenhagen’s Department of Computer Science, efforts to detect cheating on assignments through writing analysis by way of artificial intelligence have been underway for a few years. Now, based on analyses of 130,000 written Danish assignments, scientists can, with nearly 90 percent accuracy, detect whether a student has written an assignment on their own or had it composed by a ghostwriter.

Danish high schools currently use the Lectio platform to check if a student has handed in plagiarized work that has passages copied directly from a previously submitted assignment. High schools have a harder time discovering if a student has enlisted someone else to write the assignment for them, something that happens to a more or less systematized degree via online services. The case of the SRP, a major written assignment in the final year of Danish high school, is particularly telling. Because the assignment counts for double, students have gone as far as tendering out their writing assignments on the Danish classified website, Den Blå Avis.

“The problem today is that if someone is hired to write an assignment, Lectio won’t spot it. Our program identifies discrepancies in writing styles by comparing recently submitted writing against a student’s previously submitted work. Among other variables, the program looks at: word length, sentence structure and how words are used. For instance, whether ‘for example’ is written as ‘ex.’ or ‘e.g.’,” explains PhD student Stephan Lorenzen of the Department of Computer Science. He, along with the rest of the DIKU-DABAI research group, recently presented their findings at a major European AI conference.
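To make the variables mentioned in the quote a bit more concrete, here is a minimal Python sketch of how such style statistics could be computed by hand. It is only an illustration: the function name `style_features`, the chosen features and the crude sentence splitting are my own assumptions, and the Ghostwriter program itself learns its representation with a neural network rather than from a fixed list like this.

```python
import re
from statistics import mean


def style_features(text: str) -> dict:
    """Compute a few hand-picked style statistics for one text.

    Hypothetical helper for illustration only; not the researchers' code.
    """
    # Runs of letters count as words (digits and punctuation are ignored).
    words = re.findall(r"[^\W\d_]+", text)
    # Very rough sentence split: it will also break after abbreviations.
    sentences = [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]
    lowered = text.lower()
    return {
        "mean_word_length": mean(len(w) for w in words) if words else 0.0,
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Which abbreviation for 'for example' does the writer prefer?
        "uses_eg": "e.g." in lowered,
        "uses_ex": bool(re.search(r"\bex\.", lowered)),
    }
```

Comparing these numbers for a new assignment against the same numbers for a student's older work would be the simplest possible version of the idea; a large shift in several features at once is what would raise a flag.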

Prior to setting the trap, an ethical debate

The program, Ghostwriter, is built around machine learning and neural networks – branches of artificial intelligence that are particularly useful for recognizing patterns in images and texts. MaCom, the company that provides Lectio to Danish high schools, has made a dataset of 130,000 written assignments from 10,000 different high school students available to Ghostwriter project researchers at the Department of Computer Science. For now, it is still a research project.

Stephan Lorenzen doesn’t think that it is unrealistic for the program to find its way into high schools in the not too distant future, as schools must constantly stay apace with technological developments to address ‘authorship verification’.

“I think that it is realistic to expect that high schools will begin using it at some point. But before they do, there needs to be an ethical discussion of how the technology ought to be applied. Any result delivered by the program should never stand on its own, but serve to support and substantiate a suspicion of cheating,” believes Lorenzen.

Police and fake news

Ghostwriter’s technological foundation can be applied elsewhere in society. For example, the program could be used in police work to supplement forged document analysis, a task carried out by forensic document examiners and others.

“It would be fun to collaborate with the police, who currently deploy forensic document examiners to look for qualitative similarities and differences between the texts they are comparing. We can look at large amounts of data and find patterns. I imagine that this combination would benefit police work,” says Lorenzen, who emphasizes that ethical discussions are needed here as well.

The artificial intelligence used by researchers at the Department of Computer Science to detect cheating on assignments has a wide range of applications. It has already been used to analyze Twitter tweets to determine whether they were composed by actual users or penned by paid imposters or robots.


  • The Ghostwriter program uses what is known as a Siamese neural network to distinguish the writing styles of two texts. The network is trained on large amounts of data to learn representations of writing styles, which are then compared.
  • When a student submits an assignment, the network compares it against their previous assignments. For each previous assignment, the network provides a percentage score for writing style similarity against the new assignment.
  • In the end, a weighted average of these scores is computed, taking other factors, such as delivery time, into account. This final score is presented as a percentage and indicates the similarity between the new assignment and the student’s writing style.
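The three steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the researchers' implementation: the trained network is replaced by a fixed stand-in encoder (character trigram counts), and the weighting by age with a `half_life_days` parameter is my own guess at what "taking delivery time into account" could look like. All function names are hypothetical.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Stand-in for the trained network: character trigram counts.

    Both texts go through this same function, which is the 'Siamese' idea.
    """
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def similarity(a: str, b: str) -> float:
    """Cosine similarity of the two encodings, between 0 and 1."""
    va, vb = encode(a), encode(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def authorship_score(new_text, previous, half_life_days=180.0):
    """Weighted average of similarities against earlier assignments.

    `previous` is a list of (text, age_in_days) pairs. Assumption: older
    assignments count less, their weight halving every `half_life_days`.
    """
    weights = [0.5 ** (age / half_life_days) for _, age in previous]
    sims = [similarity(new_text, text) for text, _ in previous]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total if total else 0.0
```

Multiplying the result by 100 gives a percentage like the one described above. In line with Lorenzen's remark, a low score would be a reason to take a closer look, not a verdict.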

Abstract of the study:

Students hiring ghostwriters to write their assignments is an increasing problem in educational institutions all over the world, with companies selling these services as a product. In this work, we develop automatic techniques with special focus on detecting such ghostwriting in high school assignments. This is done by training deep neural networks on an unprecedentedly large amount of data supplied by the Danish company MaCom, which covers 90% of Danish high schools. We achieve an accuracy of 0.875 and an AUC score of 0.947 on an evenly split data set.
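For readers unfamiliar with the two numbers in the abstract, here is a small Python sketch of how accuracy and AUC are generally computed. The function names, the 0.5 threshold and the convention that label 1 means "written by the student" are my assumptions, not details taken from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases where the thresholded score matches the true label."""
    predictions = [s >= threshold for s in scores]
    return sum(p == bool(y) for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Chance that a random positive case scores above a random negative one.

    Ties count as half a win. Requires at least one case of each class.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On an evenly split data set, always guessing the same answer gives an accuracy of 0.5, and random scores give an AUC of about 0.5, so those are the baselines against which 0.875 and 0.947 should be read.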
