An experimental analysis of the relationship between assessments provided by artificial intelligence and those provided by pre-service teachers
DOI:
https://doi.org/10.21556/edutec.2024.89.3509
Keywords:
Assessment, Artificial Intelligence, ChatGPT, Teacher Education
Abstract
This study aimed to analyze possible differences between the assessments made by pre-service teachers and those made by different generative AIs. A total of 507 pre-service teachers participated; they were given a rubric to assess 12 texts of different types and quality levels. The results showed that the generative AIs' performance in assessing written tasks replicated that of the pre-service teachers fairly accurately. ChatGPT was the AI that best replicated the pre-service teachers' behavior, with an accuracy close to 70% relative to the human-provided assessments. Differences in the pre-service teachers' assessments by gender and academic year were likewise minimal. The generative AIs also overestimated the scores awarded to the texts. However, this overestimation decreased as the pre-service teachers' performance improved. Consequently, the assessments made by higher-performing pre-service teachers were more closely aligned with those provided by the generative AIs than the assessments made by lower-performing students.
Copyright 2024 Edutec, Revista Electrónica de Tecnología Educativa

This work is licensed under a Creative Commons Attribution 4.0 International license.
By submitting their work, authors assign publication rights to the journal Edutec. Edutec, in turn, authorizes distribution of the work provided that its content is not altered and its source is indicated. Each article published in Edutec states at the end how it should be cited.
The editors and editorial board of Edutec, Revista Electrónica de Tecnología Educativa, accept no responsibility for the statements and ideas expressed by authors in their work.