Assessment & Marking Teaching Tools

Using Gradescope to Autograde programming assignments


Teaching College spoke to Sean Bechhofer, Reader in Computer Science, about how the Department of Computer Science has been using Gradescope for marking programming assignments.

What teaching challenge were you trying to resolve?

Computer Science has large student cohorts. As part of assessments, students submit code, which then needs to be checked against test data.

Previously, the assignments were graded using some custom code. Assignments were submitted via our git repository, and we then ran some unit tests against those submissions. This worked, but was costly to maintain. It was also quite brittle, and students got little feedback during the development process. In particular, small issues with submissions would often result in the custom code not running at all, resulting in marks of zero.  

Gradescope is a tool that enables electronic submission of programming code and is available to use within University of Manchester course units. Gradescope also facilitates autograding of code. Autograding consists of a setup script, which installs any dependencies needed, along with an executable script that runs appropriate checks against the submitted code and produces results conforming to a specification.
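As a rough illustration of that specification, the executable script ultimately writes a JSON results file with an overall score and a list of per-test outcomes. The test names and scores below are hypothetical, and a real autograder writes to `/autograder/results/results.json` inside the Gradescope container:

```python
import json

# Minimal sketch of the results format an autograder emits.
# Test names and scores here are invented for illustration.
results = {
    "score": 7.5,  # overall mark; can also be derived from per-test scores
    "tests": [
        {"name": "test_basic_io", "score": 5, "max_score": 5,
         "output": "All checks passed."},
        {"name": "test_edge_cases", "score": 2.5, "max_score": 5,
         "output": "2 of 4 edge cases failed."},
    ],
}

# In the Gradescope container this file lives at
# /autograder/results/results.json; we write locally for illustration.
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```

Gradescope reads this file after the run script finishes and displays the per-test output back to the student.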

My hope was that Gradescope would provide a consistent environment for running the submissions, removing some of the issues that we were seeing. Gradescope has some nice infrastructure for managing the tests used for autograding which allows for different levels of visibility. This means that we could provide students with some feedback on the performance of their submissions.  

How did you do it?

We spoke to the eLearning team to find out a bit more about Gradescope's capabilities. The eLearning team were able to help us with the basics of setting up Gradescope within a course. However, because the programming assignment is a specialist area, eLearning put us in touch with Gradescope support directly.

Gradescope provides detailed Autograder documentation, including example autograder code, which helped us to get started.

What did the process look like for students?

Students can submit via upload to Gradescope or through a git repository. The uploaded files can then be run against the autograder code. 

Direct Submission

Zip files can be used to submit a directory structure. With this approach staff need to make sure that the archive unpacks into the appropriate directory structure.  
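One way to catch badly structured archives early is to check the uploaded zip against the expected layout before running any tests. This is a sketch only; the expected paths below are hypothetical and would depend on the assignment:

```python
import io
import zipfile

# Hypothetical expected layout -- the real structure depends on the assignment.
EXPECTED = {"submission/main.py", "submission/tests.py"}

def missing_entries(zip_bytes):
    """Return expected paths that are absent from the uploaded archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return EXPECTED - set(zf.namelist())

# Build a sample archive in memory to demonstrate the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("submission/main.py", "print('hello')\n")

print(missing_entries(buf.getvalue()))  # reports the missing tests.py
```

Reporting missing paths as a zero-weighted test result, rather than failing silently, gives students a clear message about what to fix before the deadline.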

Integration with Git repositories

Gradescope allows integration with git repositories. Integration allows students to identify a repository and branch that will be pulled into the autograder. Out of the box, Gradescope allows integration with GitHub and Bitbucket. Integration with GitLab is also possible but requires some setup in order to support authorisation.

Further information on how we set up integration with Git repositories is available on request from the Faculty eLearning team.

Using Gradescope for COMP15212

The autograder was used for a coursework assessment for COMP15212 Operating Systems. The unit had 278 students registered, with 249 submitting.  

A selection of unit tests was used:

  1. A set of zero-weighted tests provided instant feedback to students about their solutions. This helped to reduce issues with poorly formed submissions – for example, code that does not run due to syntax errors.  
  2. A set of weighted tests with post-publication visibility. These were then used to determine a mark for the work.  
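These two tiers map onto the per-test visibility field in the autograder's results file. A sketch, with hypothetical test names, of how the two kinds of test might be represented:

```python
import json

def make_test(name, score, max_score, visibility):
    """One entry in the autograder results' 'tests' list."""
    # Gradescope visibility values include "visible" (shown immediately)
    # and "after_published" (shown only once marks are released).
    return {"name": name, "score": score, "max_score": max_score,
            "visibility": visibility}

tests = [
    # Zero-weighted sanity check: instant feedback, contributes no marks.
    make_test("submission_runs", 0, 0, "visible"),
    # Weighted marking test: hidden until publication.
    make_test("scheduler_correctness", 8, 10, "after_published"),
]

print(json.dumps({"tests": tests}, indent=2))
```

Keeping the marking tests hidden until publication means students get confidence that their code runs, without being able to tune submissions against the full mark scheme.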

We still have some ongoing questions about Gradescope.  It’s not yet clear whether more sophisticated integration is possible – for example having Gradescope pull from the repository every time code is pushed, to allow a continuous integration style of working.  Gradescope does not clone the repository but gets a copy of the files from the given repository/branch. So, it’s not possible to identify a particular commit for marking or run checks on the git structure, for example looking for commit histories or tags. We may be able to add this as a feature request if it is something that we really want/need.  

What were the benefits of using Gradescope Autograder?

The autograding worked well. Few problems were reported by students. 

The tests were largely generated from templates using a set of test data and a reference implementation. They could have had better discrimination, but this is a question of content rather than any failing in the infrastructure.

Gradescope also provides malpractice detection through MOSS. (All submissions are cross-checked, and a report provides headline similarity scores along with detailed drill-down into similar code and diff-like reports.) The malpractice detection allowed us to identify several cases of malpractice, which were successfully pursued.

What did your students think?

In previous years, we had a lot of queries about the submission process. With Gradescope, there were very few. The use of the visible tests also gives some immediate feedback that can give students confidence that they are on the right track.  


I plan to continue using Gradescope for this kind of exercise. There are improvements that we can make to the content, but as a system for managing the assessment it worked very well. As it is integrated through the LTI specification it will hopefully work just as well with Canvas! 

Further resources