An experimental analysis of the relationship between the evaluations of artificial intelligence and pre-service teachers

DOI:

https://doi.org/10.21556/edutec.2024.89.3509

Keywords:

Assessment, Artificial Intelligence, ChatGPT, Teacher Training

Abstract

One potential benefit of AI is that it may help optimize teachers' tasks, enabling them to work more efficiently. This study analyzed potential differences between the evaluations given by pre-service teachers and those given by different generative AIs. A total of 507 pre-service teachers participated. They were provided with a rubric to evaluate 12 texts of different types and qualities. The results showed that generative AIs replicated the evaluations of pre-service teachers on written tasks quite accurately. ChatGPT replicated their behavior best, with an accuracy close to 70% relative to the evaluations provided by humans. Differences in the evaluations given by pre-service teachers based on gender and academic year were minimal. Generative AI overestimated the scores of the texts, but this overestimation decreased as the performance of pre-service teachers improved. Thus, assessments by high-performing pre-service teachers were more closely aligned with those of generative AI than assessments by lower-performing students. These results are useful because they highlight how generative AI could serve as a support tool to guide the pedagogical knowledge of pre-service teachers in digital assessment tasks.

Author Biographies

Héctor Galindo-Domínguez, University of the Basque Country (Spain)

Holds a doctorate cum laude in Education from the University of Deusto, a Master's Degree in Innovation and Research in Education from the UNED, a Master's Degree in Prevention and Treatment of School Bullying from the University of San Jorge, and a degree in Primary Education from the UPV/EHU. He has worked as an associate lecturer on several undergraduate and master's programmes at different universities, and currently works as an assistant lecturer in the Department of Didactics and School Organisation at the UPV/EHU. He combines this work with publishing research in high-impact scientific journals. His main lines of research are the analysis of new educational methodologies and technologies, the impact of educational research on teaching practice, the influence of personal and social variables on well-being, and the development of critical thinking. He has also worked for years as a primary school teacher.

Nahia Delgado, University of the Basque Country (Spain)

Holds a doctorate cum laude in Psychodidactics (Psychology of Education and Specific Didactics) from the University of the Basque Country, a degree in Primary Education with a minor in English (UPV/EHU), an Official Master's Degree in Development and Management of Didactic-Methodological Innovation Projects in Educational Institutions (Universidad de Mondragón), and a degree in Graphic Design (Universidad Rey Juan Carlos de Madrid and INESEM). She has work experience as a teacher in the Faculty of Philosophy and Anthropology (UPV/EHU). She participated in the teaching research project "Economy and Taxes" of the Provincial Council of Gipuzkoa, designing digital teaching materials. She taught for 5 years on several teacher training courses within the "Prest_Gara" programme of the Basque Government. She is currently a researcher at the Institute for Democratic Governance (Globernance) and works as a researcher for the "Udal Etorkizuna Eraikiz" project of the Provincial Council of Gipuzkoa.

Martín Sainz de la Maza, University of the Basque Country (Spain)

Graduate in Psychology (UPV/EHU) with a minor in social psychology, and currently a doctoral student in the UPV/EHU's psychology programme. Holds Official Master's Degrees in Individual, Group, Organisation and Culture (UPV/EHU) and in Models and Areas of Research in Social Sciences (UPV/EHU). Has one year of work experience as a social educator and has worked since 2021 as a teacher in the Department of Developmental and Educational Psychology in the Faculty of Education of Vitoria-Gasteiz. External collaborator on the project "Resituating risk practices in the context of social relations. Study on social representations". Has presented work derived from this research at national and international conferences in the educational and social sciences.

Ernesto Expósito, Université de Pau et des Pays de l'Adour (France)

Dr. Expósito is a Full Professor at Université de Pau et des Pays de l'Adour, where he serves as Vice-Rector for International Relations and leads the Computer Science Department. He earned his Habilitation à Diriger des Recherches from the Institut National Polytechnique de Toulouse (INPT) in 2010, with research focusing on methodology, models, and paradigms for designing a next-generation transport layer. He obtained his PhD in Computer Science and Telecommunications from INPT in 2003, specializing in the specification and implementation of quality-of-service-oriented transport protocols for multimedia applications. He completed his DEA in Fundamental Computer Science and Parallelism at Université Paul Sabatier - Toulouse III in 1999, with a thesis on garbage collection modeling for object-oriented multi-databases. He holds a degree in Computer Science Engineering from Universidad Centro Occidental Lisandro Alvarado in Barquisimeto, Venezuela, where he graduated with honors in 1994.

Published

30-09-2024

How to Cite

Galindo-Domínguez, H., Delgado, N., Sainz de la Maza, M., & Expósito, E. (2024). An experimental analysis of the relationship between the evaluations of artificial intelligence and pre-service teachers. Edutec, Revista Electrónica De Tecnología Educativa, (89), 84–104. https://doi.org/10.21556/edutec.2024.89.3509

Section

Special issue: Artificial intelligence in the evaluation and personalization...