Adaptive Selection Algorithm and Standard Error Termination Rule in Comparative Judgement: An Application for Assessing Writing Skills

Sungur Gürel, Murat Doğan Şahin, İbrahim Uysal, Ali İhsan İbileme, Tuba Gündüz

Abstract

This study examines the scoring reliability of comparative judgement under different sample sizes and standard error termination rules. To this end, a Monte Carlo simulation with 9 conditions and 82 iterations was conducted, crossing sample sizes of 250, 500 and 1000 with standard error termination rules of 0.40, 0.35 and 0.30. In addition, an application for assessing writing skills was conducted with a sample of 50 students, using a standard error termination rule of 0.40 and a maximum of 40 comparisons. In the simulation study, scoring reliability was evaluated using true reliability, rank order accuracy and scale separation reliability. In the application, the ability estimates obtained with adaptive comparative judgement were correlated with scores from a holistic rubric and, separately, with scores from an analytic rubric. Scale separation reliability was also calculated for the ability estimates obtained with adaptive comparative judgement. The simulation results showed high reliability in all conditions, regardless of sample size. We conclude that stricter standard error termination rules lead to higher reliability, but require each performance to undergo more pairwise comparisons. The application yielded a high scale separation reliability of .89 and correlations above .70 with the scores from both the holistic and the analytic rubric. Overall, the results suggest that adaptive comparative judgement can be used in both classroom and large-scale assessment. Adaptive comparative judgement is also advantageous because it is easy to administer, does not require changes to the testing process, and places abilities on a continuous scale.

Keywords

Comparative judgement, Holistic assessment, Scale separation reliability, Pairwise comparison


DOI: http://dx.doi.org/10.15390/EB.2025.14123

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.