
Volume 8, Issue 6, June – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

SMS Spam Classifier Using Machine Learning


Shaniya Chauhan
Department of Computer Applications

Abstract:- The purpose of this research paper is to examine how machine learning techniques are used to identify whether an SMS is spam or not. The growth in mobile users has led to a dramatic increase in SMS messages. Although information channels are currently seen as clean and reliable in many parts of the world, ongoing data clearly demonstrates that the amount of mobile phone spam is increasing dramatically over time. It is a growing problem, especially in the Middle East and Asia.

Separating SMS spam from legitimate messages is a key task in solving this problem. It shares several concerns and practical fixes with other spam separation tasks, but it also brings up its own unique problems.

Keywords – Machine Learning, Spam Separation.

I. INTRODUCTION

The bulk delivery of unsolicited messages, primarily of a business nature but also containing offensive content, has become a major problem for SMS services, internet service providers, businesses, and individual customers over the last 10 years because of the steady growth of spam. Recent analyses show that more than 60% of all messages are spam. Spam puts excessive strain on the ability of SMS systems to process data quickly and to store data on servers, increasing annual costs for organisations by several billion dollars.

II. LITERATURE REVIEW

The expression 'cybercrime' is a product of the expansion in communications technology, which has accelerated over the last twenty-five years.

Cybercrime is a term that encompasses a variety of offences associated with the use of information and communication technology. The internet was born around the 1960s, when access was limited to a few scientists, researchers, and the defense sector.

The internet user base has grown exponentially. Initially, computer crime was confined to causing physical damage to computers and related infrastructure.

Around the 1980s the trend changed from physically damaging computers to making a computer malfunction using malicious code called a virus. Until then the effect was not so widespread, because the internet was confined to defense setups, large international companies, and research communities.

In 1996, when the internet was launched for the public, it immediately became popular among the masses, and they slowly became dependent on it to the extent that it has changed their lifestyle. As of April 2018, the number of mobile broadband subscribers in India reached 401.41 million. Overall, the number of broadband subscribers in India, including wired, reached 419.79 million by the end of April 2018.

III. METHODOLOGY

The term "spam" refers to unwanted content with questionable information. The data set consists of a randomly chosen selection of plain text emails that have been classified as either SPAM or HAM. The model for categorising emails as ham or spam is developed using the training data. The model created from the training data is then evaluated for accuracy using the test data. The system is shown as a block diagram in Fig. 1.

Fig. 1 System Block Diagram

Data cleaning-Data cleaning is one of the crucial components of machine learning and is essential to the process of creating a model. There are no hidden tricks or secrets to it, and it is not the most glamorous aspect of machine learning.

Exploratory Data Analysis-Data analysis utilizing visual methods is called exploratory data analysis. With the use of statistical summaries and graphical representations, it is used to identify trends and patterns, or to verify assumptions.
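As a rough illustration of the exploratory data analysis step, the sketch below computes simple statistical summaries per class: message counts and mean message length. The sample messages and the summarize function are hypothetical and stand in for the actual dataset, which is not reproduced here; the paper does not describe the exact summaries that were used.

```python
from collections import Counter
from statistics import mean

# Hypothetical labelled messages; these are NOT taken from the paper's dataset.
messages = [
    ("ham", "Are we still meeting for lunch today?"),
    ("spam", "WINNER!! Claim your free prize now, call 0800 123 456"),
    ("ham", "Ok, see you at 5"),
    ("spam", "Free entry in a weekly draw, text WIN to 80082 now"),
    ("ham", "Can you send me the notes from class?"),
]

def summarize(rows):
    """Return per-class message counts and mean length in characters and words."""
    counts = Counter(label for label, _ in rows)
    summary = {}
    for label in counts:
        texts = [text for lab, text in rows if lab == label]
        summary[label] = {
            "count": counts[label],
            "mean_chars": mean(len(t) for t in texts),
            "mean_words": mean(len(t.split()) for t in texts),
        }
    return summary

summary = summarize(messages)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

Summaries of this kind show whether the classes are imbalanced and whether spam and ham differ in length, which can inform later feature choices.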

IJISRT23JUN164 www.ijisrt.com 29
Data pre-processing-The emails in the training data are in plain text format. This plain text must be transformed into features that can represent the emails, so that a learning algorithm can then be applied to the emails using these features. First and foremost, certain pre-processing procedures are carried out.

Model building-The last step is model building, where we finalize everything and build a precise model.

IV. DATASET AND FEATURES

The dataset used was obtained from Kaggle. The dataset was first thoroughly examined, and then data cleaning and exploratory data analysis were performed. The data pre-processing includes the removal of stop words, stemming, tokenization, and the removal of special characters and punctuation marks.
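The pre-processing steps listed above can be illustrated with a minimal sketch. The paper does not state which library was used, so the short stop-word list and the suffix-stripping crude_stem function below are illustrative assumptions, not the authors' implementation; a real stemmer such as Porter's would produce different stems.

```python
import re

# Small illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "for", "of", "and", "in", "you", "your"}

# Suffixes tried in order by the crude stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, remove special characters and punctuation, tokenize, drop stop words, stem."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # special characters and punctuation become spaces
    tokens = text.split()                      # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

tokens = preprocess("Congratulations!!! You are selected for FREE rewards, claiming is easy.")
print(tokens)
```

The resulting token lists can then be converted into word-count features for a classifier.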

V. RESULT

We tested different algorithms on the dataset to find the best precision score and accuracy.

Typically, the performance of an SMS spam classifier is evaluated using metrics such as precision, recall, and F1 score. Precision is the fraction of messages flagged as spam that are actually spam, while recall is the fraction of all spam messages that are correctly identified by the classifier. The F1 score is the harmonic mean of precision and recall, which provides an overall measure of the classifier's performance.
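The three metrics can be computed directly from the counts of true positives, false positives, and false negatives, treating spam as the positive class. The sketch below is a minimal illustration; the function name and the example labels are hypothetical and are not the paper's data.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam flagged as spam
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: two of the four spam messages are caught, and no ham is flagged.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this example precision is perfect because no legitimate message is flagged, yet half of the spam is missed, which is why precision is best read together with recall.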

Multinomial Naive Bayes is a popular machine learning algorithm used for text classification tasks, including SMS spam classification. The algorithm works by modeling the probability distribution of words in a text message and calculating the likelihood that a message is spam or not based on these probabilities.
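The idea can be sketched from scratch in a few lines. The class below estimates per-class word probabilities with Laplace smoothing and picks the class with the highest log-probability. The class name TinyMultinomialNB, the smoothing value alpha=1.0, and the toy training messages are assumptions made for illustration; the paper does not specify its implementation.

```python
import math
from collections import Counter

class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes over token lists (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_counts = Counter(labels)
        # Log prior: share of training messages in each class.
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of P(token | class).
        num = self.word_counts[c][token] + self.alpha
        den = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Tokens never seen in training are ignored.
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(tok, c) for tok in doc if tok in self.vocab
            )
        return max(scores, key=scores.get)

# Hypothetical, already tokenized training messages.
train_docs = [
    ["free", "prize", "call", "now"],
    ["win", "free", "cash", "now"],
    ["see", "you", "at", "lunch"],
    ["call", "me", "after", "class"],
    ["lunch", "after", "class"],
]
train_labels = ["spam", "spam", "ham", "ham", "ham"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["free", "cash", "prize"]))
print(model.predict(["see", "you", "after", "class"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.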

The performance of a Multinomial Naive Bayes SMS spam classifier can be evaluated using metrics such as accuracy, precision, recall, and F1 score. The highest precision achieved was 1.0, with an accuracy of about 97 percent.

VI. CONCLUSION & FUTURE WORK

This project's goals and objectives were established at the very start of the process and were accomplished over its course. The research process involved a detailed analysis of the various filtering algorithms and available anti-spam technologies in order to gather all the necessary information.

The project's work was inspired in part by the large-scale research articles and existing software packages mentioned above. The entire project was broken up into several iterations. Each iteration was completed by walking through the different phases.

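There are still certain areas that can be improved, such as incorporating more filtering methods or altering certain features of the ones that already exist. Changes such as increasing or decreasing the count of interesting words in a message and rearranging the formula for determining the interesting rate can be made at a later time.

We intend to tackle more difficult issues in the future, like the analysis and administration of report data stored in SMS spam filters. Future work will additionally concentrate on finding a solution to this issue.

REFERENCES

[1]. https://www.youtube.com/watch?v=YncZ0WwxyzU
[2]. Swayam
[3]. https://www.researchgate.net/publication/349799157_SMS_Spam_Detection_Using_Machine_Learning
[4]. https://ieeexplore.ieee.org/document/9441783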
