Filtering Spam Text Messages by Using Twitter-LDA Algorithm
Rahmat, Romi Fadillah
Pasha, Muhammad Fermi
Short messaging service (SMS) text messages are increasingly used for product or service promotion, and even for fraud. Mobile phone users in Indonesia experience the same problem. A simple approach to this issue is to create a blacklist of phone numbers or of certain keywords and phrases. However, this approach is inefficient because spammers can change their phone numbers or the content of their messages. Another approach is to apply text classification methods such as Naive Bayes, k-Nearest Neighbor (kNN), and Support Vector Machine (SVM) to recognize patterns in the text messages. This research proposes the Twitter-LDA algorithm to identify spam text messages in Bahasa Indonesia. The dataset contains 985 text messages in total, divided into 774 messages for training and 211 messages for testing. It consists of 860 spam and 125 ham text messages. All text messages are pre-processed before training and testing. Five experiments were conducted, yielding an average F-score of 94.26% and an accuracy of 96.49%. These results indicate that the Twitter-LDA algorithm performs well in identifying spam text messages in Bahasa Indonesia.
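The reported F-score and accuracy can be related to the counts of a confusion matrix. The following is a minimal Python sketch of that arithmetic, not the authors' evaluation code. It assumes spam is treated as the positive class and that the F-score is the balanced F1 measure, neither of which the abstract states. The function name and the example counts are hypothetical; the counts merely sum to 211, the size of the test set, and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts.

    tp: spam correctly flagged as spam
    fp: ham wrongly flagged as spam
    fn: spam wrongly passed as ham
    tn: ham correctly passed as ham
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f1, accuracy


# Hypothetical counts for illustration only (they sum to 211 test messages).
precision, recall, f1, accuracy = classification_metrics(tp=180, fp=4, fn=5, tn=22)
print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```

Because the dataset is imbalanced (860 spam versus 125 ham), accuracy alone can look high even when ham messages are often misclassified, which is one reason to report the F-score alongside it.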