Excursions Into Applying ML for Learning Sciences: Initial Results and Lessons Learned

An important challenge in Learning Sciences is the coding of qualitative data for evidence of students’ engagement in scientific practices. Such evidence may appear as novel ideas, expressions of puzzlement, or idiosyncratic lines of reasoning. Identifying it typically requires time-consuming labor by trained analysts. In this work, we explore whether statistical machine learning (ML) methods can aid learning sciences researchers in qualitative coding. We start with a human-coded set of lab reports from a biology course that were scored using an adapted version of the domain-general Structure of Observed Learning Outcomes (SOLO) taxonomy (Biggs and Collis, 1992). The adapted four-level scheme assigns higher scores to lab reports that exhibit desirable features of scientific writing, specifically: more complex claim structures, use of multiple pieces of evidence, and appropriately qualified conclusions that address or acknowledge uncertainty.

We present and evaluate a novel ML workflow that achieves good performance in scoring the lab reports for evidence of scientific thinking in student writing. This finding is corroborated by a blind re-coding experiment, in which the reports that the ML algorithm misclassified in a majority of the cross-validation steps were re-evaluated by human coders. The ML predictions agreed well with the re-coded scores, suggesting that computational natural language processing (NLP) tools can approach the reliability of human coding and may assist researchers in automatic coding at scale.
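
The selection step of the re-coding experiment, picking out reports that are misclassified in a majority of cross-validation runs, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the work itself: the abstract does not name the classifier, the features, or the number of folds, so the sketch substitutes a nearest-class-mean classifier on a single made-up numeric feature, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_nearest_mean(xs, ys):
    """Stand-in classifier (an assumption): store the mean feature per level."""
    sums, counts = defaultdict(float), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}

def predict_nearest_mean(means, x):
    """Predict the level whose stored mean is closest to the feature x."""
    return min(means, key=lambda y: abs(means[y] - x))

def flag_for_recoding(features, labels, k=8, repeats=20, seed=0):
    """Return indices misclassified in more than half of the repeated CV runs."""
    rng = random.Random(seed)
    n = len(labels)
    errors = Counter()
    for _ in range(repeats):
        # Each report is held out exactly once per repeat.
        for fold in kfold_indices(n, k, rng):
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            model = fit_nearest_mean([features[i] for i in train],
                                     [labels[i] for i in train])
            for i in fold:
                if predict_nearest_mean(model, features[i]) != labels[i]:
                    errors[i] += 1
    return sorted(i for i in range(n) if errors[i] > repeats / 2)

# Toy data (invented): one numeric feature per report, four levels of four reports.
features = [1.0, 1.1, 0.9, 1.2, 2.0, 2.1, 1.9, 2.2,
            3.0, 3.1, 2.9, 3.2, 4.0, 4.1, 3.9, 4.2]
labels = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
labels[5] = 3  # simulate a human coding slip on report 5

flagged = flag_for_recoding(features, labels)
print(flagged)
```

In this toy setup only the deliberately miscoded report is flagged. In a real study the flagged reports would go back to human coders blind to both the original score and the model's prediction, and the re-coded scores would then be compared with the predictions.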

Author: Shuchin Aeron