This thesis explores using phonological principles of Khmer to build a model that automatically transduces orthographic native Khmer words into a phonemic transcription and a close phonetic transcription. The approach involves two processes: (1) converting orthographic words into a phonemic transcription, which represents careful speech, and (2) converting the phonemic transcription into a phonetic transcription, which represents casual speech. Three datasets were created to manually train and test the model, and two Thrax grammars were written to carry out the two processes.

Dataset 01 is a list of 140 manually selected words that covers most spelling and pronunciation cases in native Khmer words. Dataset 02 is a list of 7,654 words drawn from the official Khmer monolingual dictionary published in 1967. Phonemic and phonetic transcriptions of each word in Datasets 01 and 02 were created manually, based on phonological principles and regularities postulated by previous scholars. Dataset 03 is a list of 6,492 words drawn from the Cambodian-English dictionary, together with their phonemic transcriptions by Robert Headley, published in 1997. Datasets 02 and 03 serve as test datasets. Ruby was used for the data preparation of Datasets 02 and 03 because its simple syntax and regular expressions make the data easier and quicker to manipulate.
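The Ruby scripts used for data preparation are not shown here, so the following is only a minimal sketch of what such a step could look like. The method name `prepare_words`, the constant `KHMER_WORD`, and the specific cleaning steps (stripping whitespace and zero-width spaces, keeping only entries written entirely in Khmer script, dropping duplicates) are assumptions made for illustration, not the procedure actually used for Datasets 02 and 03.

```ruby
# Hypothetical data-preparation helper (not taken from the thesis).
# Matches a string made up entirely of Khmer-script characters.
KHMER_WORD = /\A\p{Khmer}+\z/

# Cleans raw dictionary headwords:
#   1. removes whitespace and zero-width spaces (U+200B),
#   2. keeps only entries written entirely in Khmer script,
#   3. drops duplicates while preserving first-seen order.
def prepare_words(lines)
  lines
    .map { |line| line.gsub(/[\u200B[:space:]]+/, "") }
    .select { |word| word.match?(KHMER_WORD) }
    .uniq
end

raw = ["ខ្មែរ\n", " ភាសា ", "ខ្មែរ", "abc", "", "ភាសា\u200B"]
puts prepare_words(raw)
```

A cleaned list like this could then be passed, one word per line, to whatever tool applies the grammars; that hand-off is likewise an assumption rather than a described step.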