TAIPEI (Taiwan News) — TV dramas using Taiwanese (Hokkien) played a significant role in the development of a groundbreaking speech-to-speech translation system by Meta.
The tech giant on Wednesday (Oct. 19) unveiled a Hokkien-English translation service as part of its broader ambition to make the world’s thousands of oral languages understandable via AI-enabled translation.
Led by a Taiwanese engineer, the team behind the project sourced more than 1,500 hours of footage from Taiwanese soap operas for the AI training, said Lee Hung-yi (李宏毅), an associate professor at National Taiwan University’s Department of Electrical Engineering who was involved in the research.
Little annotated data has been collected for Taiwanese, and without a substantial dataset, AI training would have been impossible. The large amounts of data from Taiwanese TV dramas, which came with Chinese subtitles, made paired sentences available to help train the AI model, CNA quoted Lee as saying.
According to Lee, the project employed self-supervised learning, a method of machine learning based on unlabeled sample data. The model then learned to speak Taiwanese from the dialogue pairings. The Taiwanese Across Taiwan speech database was also drawn upon to improve the accuracy of the translations, he added.