Human vs. Machine-Assisted Subtitle Translation in MOOCs
A Corpus-Based Case Study
DOI:
https://doi.org/10.47476/jat.v8i2.2025.357
Keywords:
MOOCs, lexical bundles, MT-assisted translation, human translation, audiovisual translation, literate and oral register
Abstract
Massive Open Online Courses (MOOCs) have become important audiovisual resources in higher education, offering accessible, high-quality learning materials to a global audience. While MOOCs have been extensively studied, relatively little attention has been paid to the language used in MOOC lectures, particularly from the perspectives of audiovisual translation (AVT) and machine translation (MT). This study investigates lexical bundles in machine-translated and human-translated corpora from four MOOCs through a case study approach, using AntConc 4.3.1 to extract frequently recurring bundles categorized by structure and function. Findings suggest that MT-generated translations tend to align more closely with a “literate” register, dominated by referential bundles, whereas human-translated subtitles reflect a more “oral” register, with discourse organizers comprising nearly 40% of the total. While these results diverge from Biber and Barbieri’s (2007) conclusions, they are consistent with studies indicating that MT output resembles academic lectures, while human translations show features similar to those found in general TED Talk discourse. Moreover, despite being produced at different times, both corpora include exact matches and comparable lexical bundles across various instructional stages. This study offers preliminary insights into the academic register of MOOC subtitle translation and contributes to the growing body of research in audiovisual translation.
Lay summary
Massive Open Online Courses (MOOCs) have become essential resources in higher education, offering high-quality learning materials to global audiences through online video lectures with subtitles. This study examines how two translation methods, machine translation and human translation, affect the language style of English subtitles in four Chinese-language MOOCs. Using corpus analysis tools, we identified recurring word patterns in both translation types. The findings reveal that machine-translated subtitles tend to use more formal, textbook-like language focused on describing facts and concepts, while human-translated subtitles adopt a more conversational, speech-like tone with phrases that help organize information and guide learners, similar to the style found in TED Talks. Despite these differences, both translation approaches used similar phrases at key instructional moments such as course introductions, topic transitions, and conclusions. These findings have practical implications for educators and translation professionals producing online course subtitles, highlighting the trade-offs between translation efficiency and maintaining an engaging, conversational teaching style. As online education continues to grow internationally, understanding how translation choices shape the learning experience becomes increasingly important for making educational content both accessible and engaging to diverse audiences.
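For readers unfamiliar with the idea of recurring word patterns (lexical bundles), the following is a minimal illustrative sketch in Python. It is not the procedure used in the study, which relied on AntConc 4.3.1. The function names, the bundle length of four words, and the frequency and range thresholds are all assumptions chosen for illustration, and the two sample sentences are invented rather than taken from the MOOC corpora.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return word tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def extract_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return (bundle, frequency, range) for recurring n-word sequences.

    Hypothetical helper: n, min_freq and min_range are illustrative
    assumptions, not the cut-offs reported by the study.
    """
    freq = Counter()                # total occurrences of each n-gram
    text_range = defaultdict(set)   # indices of the texts containing each n-gram
    for idx, text in enumerate(texts):
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(idx)
    bundles = [
        (gram, count, len(text_range[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(text_range[gram]) >= min_range
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(bundles, key=lambda b: (-b[1], b[0]))


# Invented sample subtitles, for demonstration only.
sample_texts = [
    "Today we are going to talk about grammar.",
    "In this unit we are going to talk about syntax.",
]
result = extract_bundles(sample_texts)
for bundle, count, spread in result:
    print(bundle, count, spread)
```

Requiring a bundle to appear in more than one text (the range threshold) is a common way to keep a single lecture's idiosyncratic phrasing from dominating the list. Classifying the extracted bundles by structure and function, as the study does, remains a manual analytic step that this sketch does not attempt.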