
Natural Language Processing Approaches to Text Data Augmentation: A Computational Linguistic Analysis




In Natural Language Processing (NLP) tasks, insufficient or skewed (imbalanced) training data is a frequent problem. One practical solution is to generate additional textual data. Text Data Augmentation (TDA) applies small changes to available text at the character, word, or sentence level to produce synthetic examples, which are then fed into the data loaders used to train the model. Because synthetic data exposes models to a wider range of instances, it can improve their robustness and generalization. Although the NLP community has studied many data augmentation (DA) approaches extensively, recent research suggests that the relationship among the DA techniques currently in use is not fully understood in practice. This study therefore applies and extends advances in TDA, covering a variety of tools across multiple settings and contexts. To implement NLP DA approaches thoroughly in practice, compare their performance, and highlight significant similarities and differences across these scenarios, the work relies on several Easy Data Augmentation (EDA) tools as well as neural-based augmentation. The study suggests that some common DA techniques might not be suitable in certain circumstances or text environments. In particular, initial results indicate that the context and word count of a text may significantly affect the quality of the synthetic data.
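To make the idea of "small changes at the character, word, or sentence level" concrete, here is a minimal sketch of EDA-style augmentation in Python. It is an illustration under stated assumptions, not the authors' tooling: the function names (`random_swap`, `random_deletion`, `char_swap`, `augment`) and the `alpha` edit-rate parameter are hypothetical choices for this example, and neural-based augmentation (for example back-translation or language-model rewriting) is not shown. The sketch also hints at why word count matters: on very short sentences, a single swap or deletion alters a large share of the text.

```python
import random


def random_swap(words, n_swaps, rng):
    """Word-level edit: swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Word-level edit: drop each word with probability p, keeping at least one."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    # Never return an empty sentence: fall back to one random original word.
    return kept if kept else [rng.choice(words)]


def char_swap(word, rng):
    """Character-level edit: swap two adjacent characters inside a word."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Return n_aug synthetic variants of a sentence (hypothetical helper).

    alpha is an assumed edit rate: the fraction of words touched per variant.
    Operations cycle through swap, deletion, and character noise.
    """
    rng = random.Random(seed)  # seeded so the variants are reproducible
    words = sentence.split()
    if not words:
        return []
    n_changes = max(1, int(alpha * len(words)))
    variants = []
    for k in range(n_aug):
        op = k % 3
        if op == 0:
            new_words = random_swap(words, n_changes, rng)
        elif op == 1:
            new_words = random_deletion(words, alpha, rng)
        else:
            new_words = list(words)
            idx = rng.randrange(len(new_words))
            new_words[idx] = char_swap(new_words[idx], rng)
        variants.append(" ".join(new_words))
    return variants


if __name__ == "__main__":
    for v in augment("data augmentation can improve model robustness on small datasets"):
        print(v)
```

Each variant would then be added to the training set alongside the original sentence, typically keeping the original label, which is the step the abstract describes as feeding synthetic data into the data loaders.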

Author Biographies

Hoda Zaiton, Alexandria University in collaboration with Arab Academy for Science, Technology and Maritime Transport (AASTMT), Egypt

MA Candidate, Applied Linguistics

College of Language and Communication (CLC)

The Arab Academy for Science, Technology and Maritime Transport (AASTMT),

in collaboration with the Institute of Applied Linguistics and Translation, Faculty of Arts, Alexandria University, Egypt.

Sameh Al-Ansary, Alexandria University, Egypt

Professor of Computational Linguistics

Phonetics and Phonology Department

Faculty of Arts, Alexandria University, Egypt.


