Traffic Classification Using Flow Exporter And Protocol Filters
of Engineering and Technology
Northcap University, Gurgaon
Abstract— Botnets represent one of
the most aggressive cyber security threats faced by organizations as they
provide different platforms for many illegal activities like distributed denial
of service attacks, click fraud, phishing and malware dissemination. A variety
of techniques that use different feature sets have been proposed for effective botnet
traffic classification and analysis, but several challenges remain unaddressed,
such as the effect of the feature set of the network flow exporter. In this paper we
explore an open source network traffic flow exporter (with a set of features)
using different protocol filters. We find that the use of a flow exporter
and protocol filters indeed affects the performance of botnet traffic
classification.
cyber security, flow exporter, protocol filter, traffic classification.
A botnet is a collection of compromised
computers connected over the Internet and remotely controlled by a botmaster. The
individual compromised machines are called bots. Botnets are created to conduct
different malicious activities such as distributed denial of service (DDoS)
attacks, click-fraud scams, spreading spam, stealing victims' personal
information and exploiting users' significant computational resources
through malicious bots [1]. The bots keep updating themselves and are controlled
by the botmaster to carry out malicious instructions for different illegal
activities. Hence, with the rate of reported infections and illegal activities
rising significantly, botnets pose a serious threat
to cyber security.
An important aspect of botnet
architecture is the communication scheme, which has evolved considerably over
the years to enhance botnet functionality and evade detection. The architecture includes the compromised
bots that communicate with a command and control (C&C) server to fetch
instructions from the botmaster. Botnets used the Internet Relay Chat (IRC)
protocol for communication until the early 2000s. However, IRC-based bots are
highly vulnerable because they use a centralized topology: the complete botnet
can be disrupted just by shutting down the IRC server. Also, the messages can easily be revealed by
continuous monitoring of network traffic, and the messages captured
from packets can then be analyzed further. Since 2003, botnets have evolved and started
using more sophisticated techniques that involve decentralized topologies
such as peer-to-peer (P2P) and ubiquitous protocols such
as DNS and HTTP. In the P2P communication scheme, individual bots act
as both client and server, which makes the botnet more resilient because there is no fixed
central point that could be exploited. However, the P2P botnet topology
also has a limitation: higher latency in command
and control transmission, which in turn affects bot synchronization. The
use of techniques such as encryption and fluxing has also helped botnets
to avoid detection.
Therefore, botnet identification and
detection have become highly challenging. Many botnet detection approaches have
been proposed that involve network traffic analysis and classification. Some of the
research in this category aims to build a generalized model for botnet
detection, whereas other work focuses on specific types of botnets. In the early 2000s,
most of the proposed systems specifically targeted botnets using IRC [2].
However, recent research is more focused on P2P and HTTP based botnets [3], [4].
The monitoring and detection techniques used for botnet classification
should be active and continuous, as botnets use automatic update mechanisms.
Continuous monitoring also potentially enables these techniques to learn new patterns and adapt to
changes as botnets evolve. Therefore, machine learning techniques (i.e.,
classification and clustering) are an apt solution that can be
deployed. Clustering and classification enable automatic pattern recognition for a meaningful representation
of network traffic. Hence,
the most significant component of these systems is meaningful feature
(attribute) extraction from network traffic. It is very challenging to extract these
To this end, various botnet
detection and analysis systems have proposed their own feature sets to
represent network traffic, which consists of network packets. A network
packet is divided into two major parts: 1) the packet header, which contains
control information of the protocols used over the network, and 2) the packet
payload, which contains the application data carried over the network.
Some botnet detection and analysis approaches use network packet headers
[4], whereas others use packet payload methods [5]. Flow based feature
extraction methods are commonly used by the approaches that rely on packet
headers [4]. In these approaches, the packets are
aggregated into flows and statistics are then computed over each flow. Flow exporters
are used to generate flows and extract such features. However, many
botnets use encryption to hide their communication and evade detection
systems that analyze the packet payload for embedded communication
information. Flow exporters remain effective in this case because they summarize
the traffic using only network packet headers. Hence, an open source flow
exporter along with a machine learning technique is used for performing
botnet traffic classification.
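The aggregation step described above can be illustrated with a minimal sketch. It is not the exporter used in this paper: the packet dictionaries, their field names (ts, src_ip, dst_ip, src_port, dst_port, proto, length) and the function aggregate_flows are assumptions made for illustration, and a real exporter would also apply flow timeouts and many more features. Only header fields are read, so the approach also works on encrypted traffic.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by the 5-tuple.

    Each packet is a dict of header fields only (hypothetical layout);
    the payload is never inspected.
    """
    groups = defaultdict(list)
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])
        groups[key].append(p)

    flows = {}
    for key, pkts in groups.items():
        times = [p["ts"] for p in pkts]
        total_bytes = sum(p["length"] for p in pkts)
        duration = max(times) - min(times)
        flows[key] = {
            "packets": len(pkts),
            "bytes": total_bytes,
            "duration": duration,
            "avg_bytes_per_packet": total_bytes / len(pkts),
            # A single-packet flow has zero duration; report 0.0 instead of dividing by zero.
            "avg_bytes_per_second": total_bytes / duration if duration > 0 else 0.0,
        }
    return flows


# Illustrative packets: three client-to-server packets and one reply.
packets = [
    {"ts": 0.0, "src_ip": "10.0.0.5", "dst_ip": "192.0.2.1",
     "src_port": 49152, "dst_port": 80, "proto": "TCP", "length": 100},
    {"ts": 1.0, "src_ip": "10.0.0.5", "dst_ip": "192.0.2.1",
     "src_port": 49152, "dst_port": 80, "proto": "TCP", "length": 300},
    {"ts": 2.0, "src_ip": "10.0.0.5", "dst_ip": "192.0.2.1",
     "src_port": 49152, "dst_port": 80, "proto": "TCP", "length": 200},
    {"ts": 0.5, "src_ip": "192.0.2.1", "dst_ip": "10.0.0.5",
     "src_port": 80, "dst_port": 49152, "proto": "TCP", "length": 1500},
]
flows = aggregate_flows(packets)
```

Because the key includes the direction, the reply packet forms its own flow, which matches the behavior of a unidirectional exporter.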
II. BACKGROUND AND RELATED WORK
Bots are vulnerable hosts that
have been infected by self-propagating malware, called a bot program, and are
made to perform various malicious activities. The botmaster controls the
network of infected bots, known as a botnet. The infected bots receive
commands from the botmaster through the C&C channel and perform malicious operations
such as DDoS, phishing, spamming, identity theft and stealing users'
important information [1].
A botnet is created and
maintained in five stages [1]. The first stage is infection, where the
attacker infects the victim by exploiting existing vulnerabilities with different
exploitation techniques. The second stage is secondary injection,
where shell code is executed on the infected machine to fetch the image of the
bot binary. The bot binary then installs itself on the infected machine, which
thereby becomes a bot. The third stage is connection, where the bot
binary establishes the C&C channel that is used by the botmaster. In the
fourth stage, once the connection is established, the malicious activity
starts and the botmaster sends commands to the botnet. The fifth stage
is the updating and maintenance of the bots by the botmaster.
Although a significant amount of
research work has been done on botnet detection, detection techniques
based on network traffic flow analysis have only emerged in the last few
years.
Gu et al. developed BotMiner, which
detects botnets using a group behavior analysis approach. It uses
clustering to find similar C&C communication behavior and form
clusters, and later employs Snort [6]. The data set included non-malicious data
from a campus network and malicious data from running bot binaries in a
sandbox environment. The captured traffic files were converted into flows, and
the flow exporter provided features such as the total number of packets per
flow, the average number of bytes per packet and the average number of bytes per
second. The results showed that BotMiner could detect botnets with detection
rates (DRs) between 75% and 100%.
Strayer et al. proposed an IRC botnet
detection system that used machine learning techniques (classification and
clustering) [2]. First, classification is used to filter
chat-like traffic, and then clustering is used to find
group activities in the filtered traffic. Lastly, an analyzer is applied to
the clusters for botnet detection. The data set was gathered from a
controlled testbed running a bot binary. They evaluated the classifiers against a
multidimensional flow correlation technique which was designed and proposed.
Zeidanloo et al. developed a detection
system that focused on P2P and IRC-based botnets [5]. Using filtering,
classification, and clustering, it aimed to detect botnet group
behavior in a given traffic file. A flow based technique was used to analyze
traffic, and payload inspection was deployed for traffic filtering.
Zhao et al. investigated a botnet
detection system based on flow intervals [3]. The flow features of captured
traffic packets were used with Bayesian network and decision tree
classifiers to detect botnets. They evaluated and analyzed normal and
malicious attack traffic. The results showed DRs over 90% with false
positive rates (FPRs) under 5%.
et al. proposed a botnet detection approach based on botnet traffic analysis
[4]. Normal and malicious traffic was generated by establishing HTTP and DNS
communication with the publicly available domain names of botnet C&C servers
and legitimate web servers. NetFlow with a machine learning algorithm was
proposed to detect the botnets. The results showed a 97% DR and a 3% FPR.
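The detection rate and false positive rate quoted for these systems can be computed from confusion-matrix counts. The sketch below assumes the standard definitions, DR = TP / (TP + FN) and FPR = FP / (FP + TN), with botnet flows as the positive class; the cited papers do not spell out their formulas here, and the function names and example counts are illustrative only.

```python
def detection_rate(tp, fn):
    """Fraction of botnet flows correctly flagged: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no botnet samples in the evaluation set")
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of normal flows wrongly flagged: FP / (FP + TN)."""
    if fp + tn == 0:
        raise ValueError("no normal samples in the evaluation set")
    return fp / (fp + tn)


# Illustrative counts (not taken from any cited experiment):
# 100 botnet flows with 97 detected, 100 normal flows with 3 misclassified.
dr = detection_rate(tp=97, fn=3)
fpr = false_positive_rate(fp=3, tn=97)
```

Both metrics should be reported together, since a classifier that labels every flow as botnet traffic reaches a DR of 100% but also an FPR of 100%.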
Recent literature on botnet
detection focuses more on the P2P and HTTP protocols [4]. This work uses
different data mining or machine learning techniques, such as neural networks,
decision trees, or statistical methods, applied to flow features. In most cases,
normal traffic files are combined with attack traffic files to evaluate the
performance of the proposed botnet detection systems.
Finally, this paper aims to use the
features exported by an open source flow exporter and to analyze the flow
exporter’s effect on the performance of botnet classification.
Earlier botnet traffic analysis
work used network flow information derived from packet headers. Most of
it focused on certain protocols, such as HTTP and DNS, which indicates
the use of protocol filtering in analyzing traffic data. No packet payload related
information is incorporated in this analysis. The possibility of detecting botnets
using only features extracted from the traffic flow is explored.
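A protocol filter of the kind mentioned above can be sketched as a predicate over flow records. The paper does not state how its filters identify a protocol, so the sketch assumes a simple well-known-port heuristic (TCP port 80 for HTTP, port 53 for DNS); the record layout, the PROTOCOL_FILTERS table and apply_protocol_filter are hypothetical names. Traffic on non-standard ports would be missed by this heuristic.

```python
# Hypothetical port-based filters: a flow matches if either endpoint
# uses the protocol's well-known port.
PROTOCOL_FILTERS = {
    "http": lambda f: f["proto"] == "TCP" and 80 in (f["src_port"], f["dst_port"]),
    "dns": lambda f: f["proto"] in ("UDP", "TCP") and 53 in (f["src_port"], f["dst_port"]),
}


def apply_protocol_filter(flows, name):
    """Return only the flow records accepted by the named filter."""
    keep = PROTOCOL_FILTERS[name]
    return [f for f in flows if keep(f)]


# Illustrative flow records (header-derived fields only).
flow_records = [
    {"proto": "TCP", "src_port": 49152, "dst_port": 80, "packets": 12},
    {"proto": "UDP", "src_port": 53124, "dst_port": 53, "packets": 2},
    {"proto": "TCP", "src_port": 50001, "dst_port": 6667, "packets": 40},
]
http_flows = apply_protocol_filter(flow_records, "http")
dns_flows = apply_protocol_filter(flow_records, "dns")
```

Applying a filter before feature extraction changes which flows the classifier sees, which is why the choice of filter can influence classification performance.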
Traffic Data Set
The traffic files used for analysis were obtained from botnets
that use HTTP as the communication protocol, or an HTTP based P2P
topology that looks like normal HTTP traffic. The botnet
traffic files publicly available at the NETRESEC [7] and Snort [8] websites are
employed for carrying out the research. The botnets and domain name
list are as follows:
1) Alexa: Alexa Internet, Inc. [9] ranks websites based on
their page views and unique site users. This ranking is then published as a list of the
most popular websites.
2) Zeus: Zeus is one of the most well known botnets. It collects banking data
using man-in-the-browser keystroke logging and form grabbing, and can be
used for any identity theft attack [10].
3) Citadel: The Citadel botnet is an enhanced version of
Zeus, developed by fixing Zeus bugs and adapting it to new security
platforms [11]. It stole more than $500 million and infected more than 5
million personal computers across different countries.
4) Conficker: In a survey, the Conficker botnet was listed among
Damballa's top 10 botnets of the year. It was responsible for DDoS attacks and
stealing banking credentials using distributed computing resources, and also
infected many medical devices [12].
5) Cutwail: The Pushdo trojan was originally used to
distribute other malware such as Zeus. It has its own spam module,
known as Cutwail, which is responsible for a large portion of the world’s daily
spam traffic [13].
6) Kelihos: This botnet is mainly involved in DDoS attacks and
email spam. It is also capable of stealing Bitcoin
wallets and spreading links over various social networking websites.
Flow generation tools are responsible
for summarizing network packet headers. They collect packets
with shared properties such as IP addresses and port numbers, aggregate
them into flows, and then compute statistics such as the number of packets per
flow.
data, the following three network components should work together: 1) the flow
exporter, which generates the flow data, 2) the flow collector, which collects the
flow data from the exporter, and 3) the flow analyzer, which analyzes the collected
flow data.
Tranalyzer is a lightweight unidirectional
flow exporter and analyzer that uses an extended version of the NetFlow feature
set. It exports in both binary and ASCII formats and hence does not require any
Various machine learning
approaches are widely used for botnet detection, such as the C4.5 algorithm, SVM,
ANN, Bayesian networks and Naïve Bayes.
C4.5: It is a decision tree algorithm that builds a tree-structured graph
in which the internal nodes represent
conditions applied to attributes, the leaf nodes denote the class labels,
and the paths from the root to the leaves represent the classification rules. It aims to
find the smallest decision tree and can then convert the trained tree into
an if-then rule set.
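To make the notion of a condition at an internal node concrete, the following sketch scores a single candidate split on a numeric flow feature using the gain ratio, the split criterion C4.5 is known for. It covers only this scoring step, not tree construction, pruning or rule extraction; the function names, the feature values (packets per flow) and the labels are illustrative assumptions rather than data from this study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split 'value <= threshold' on a numeric attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the threshold does not separate the samples at all
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


# Illustrative data: packets per flow and a class label for six flows.
packets_per_flow = [2, 3, 4, 40, 50, 60]
labels = ["bot", "bot", "bot", "normal", "normal", "normal"]

clean_split = gain_ratio(packets_per_flow, labels, threshold=4)
mixed_split = gain_ratio(packets_per_flow, labels, threshold=3)
```

On this toy data the threshold 4 separates the two classes completely and therefore scores higher than the threshold 3, so the corresponding rule would read: if packets per flow <= 4 then bot, else normal.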
considered as one of the most predominant aggressive threats against cyber
security. Effective botnet detection is very challenging because of the
complexity and the changing technology that botnets now adopt automatically.
Many requirements for effective botnet detection remain largely
unaddressed by most existing detection schemes, including early detection,
novelty detection, and adaptability. Hence, a botnet detection
approach that can adapt to botnet evolution is needed. To solve this
problem, various automatic botnet detection approaches use network traffic
analysis. Different systems employ a particular flow based network traffic feature set
in their analysis of the traffic. The selection of feature set and protocol
filter is very important and can greatly affect the performance of botnet