Exploitation and Evaluation of an Arabic-English Composite Learner Translator Corpus




Arabic multimodal parallel learner corpus, process and product-oriented translation, triangulation


This paper describes in depth the data collection and exploitation stages in constructing the undergraduate learner translator corpus (ULTC), a 75-million-word, sentence-aligned, bidirectional parallel corpus of Arabic, English, and French, with Arabic as its central language. We focus on the methodological challenges and describe the compilation process and the problems encountered in the first phase of the project. Our aim is to inform future compilers of similar projects that integrate learner corpus research (LCR) and corpus-based translation studies (CBTS). In the first part, we present design considerations, data collection criteria, and the exploitation of the corpus; in the second part, we evaluate the systems we used and consider possible improvements.

Author Biographies

Reem F. Alfuraih, Princess Norah bint Abdul Rahman University, Saudi Arabia

(Lecturer in Applied Linguistics)

Department of Applied Linguistics, College of Languages

Princess Norah bint Abdul Rahman University, Saudi Arabia

Noha M. El-Jasser, Princess Noura bint Abdulrahman University, Saudi Arabia

(Lecturer in Translation)

Department of Translation, College of Languages

Princess Noura bint Abdulrahman University, Saudi Arabia







How to Cite

Alfuraih, R. F., & El-Jasser, N. M. (2023). Exploitation and Evaluation of an Arabic-English Composite Learner Translator Corpus. International Journal of Arabic-English Studies.