Automatic Identification of Collaborative Problem-Solving Phases from Oral Peer Dialogue in Classroom
DOI: https://doi.org/10.34190/icair.5.1.4292

Keywords: Spoken Dialogue, Face-to-Face Classroom, Machine Learning (ML), Deep Learning (DL), Multi-label Classification

Abstract
Collaborative problem solving (CPS) is a critical competency in the artificial intelligence (AI) era, requiring the integration of cognitive and social skills through real-time dialogue and coordination. While prior studies have explored CPS behaviours using human-coded text from online platforms, limited research has examined how machine learning (ML) and deep learning (DL) models perform on spoken peer dialogue in face-to-face (F2F) classroom settings. This study investigates the automatic classification of CPS phases using a validated coding framework applied to two classroom tasks: one supported by a GenAI assistant and one not. A total of 7,744 utterances were manually labelled across nine CPS subskills and three broader facets. Six ML and five DL models were evaluated, including lightweight BERT variants combined with various classifiers. Results show that BERT-based models significantly outperform traditional ML approaches: BERT+ANN achieved better overall performance on smaller, imbalanced datasets, while BERT+CNN performed better on larger datasets. Reducing label granularity from nine subskills to three facets consistently improved classification accuracy and F1 scores, and both models achieved AUROC scores around 0.90, indicating strong discriminative capability. Several key insights emerged from the findings:

- Model architecture matters: simpler classifiers such as ANN preserve BERT's semantic representations and offer stable performance, especially on smaller or imbalanced datasets.
- Task context influences CPS behaviour: different tasks elicit distinct CPS skill distributions, with task regulation dominating in technical tasks and communicative participation more prevalent in reflective tasks.
- Label granularity affects performance: reducing the number of classification labels (e.g., from nine subskills to three facets) significantly improves model accuracy and generalizability.
- Lightweight models are viable: even with a reduced-capacity BERT model, competitive performance was achieved, suggesting potential for real-time, resource-efficient deployment in educational settings.

This study contributes to educational AI by introducing a novel oral CPS dataset, benchmarking multiple models, and demonstrating the feasibility of lightweight architectures for real-time deployment. Limitations include the small sample size and single-modality input. Future work should explore multimodal features, larger and more diverse classrooms, and teacher-facing dashboards for actionable feedback. The findings support the development of scalable, ethical, and human-centered learning analytics tools that enhance collaborative learning in AI-enhanced education.
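The label-granularity reduction described in the abstract (nine subskills collapsed into three broader facets) can be sketched as a simple relabelling step applied before training. This is a minimal illustration only: the subskill and facet names below are hypothetical placeholders, not the paper's actual coding framework.

```python
# Hypothetical many-to-one mapping from fine-grained CPS subskill labels
# to three broader facet labels. The names are illustrative placeholders;
# the paper's validated coding framework defines the real categories.
SUBSKILL_TO_FACET = {
    "sharing_ideas": "communication",
    "negotiating": "communication",
    "asking_questions": "communication",
    "planning": "task_regulation",
    "monitoring_progress": "task_regulation",
    "executing_actions": "task_regulation",
    "encouraging": "social_regulation",
    "resolving_conflict": "social_regulation",
    "taking_turns": "social_regulation",
}

def coarsen(labels):
    """Map per-utterance subskill labels to their broader facet labels."""
    return [SUBSKILL_TO_FACET[label] for label in labels]

# Example: four manually coded utterances, relabelled at facet level.
# With only three target classes, each class accumulates more examples,
# which mitigates the class imbalance noted in the abstract.
utterance_labels = ["planning", "sharing_ideas", "encouraging", "planning"]
print(coarsen(utterance_labels))
# ['task_regulation', 'communication', 'social_regulation', 'task_regulation']
```

Classifiers are then trained on the coarsened labels exactly as on the original ones; only the target vocabulary shrinks from nine classes to three.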