TY - JOUR
T1 - Session-based classification of internet applications in 3G wireless networks
AU - Lee, Seongjin
AU - Song, Jongwoo
AU - Ahn, Soohan
AU - Won, Youjip
PY - 2011/12/1
Y1 - 2011/12/1
N2 - Accurately classifying and identifying wireless network traffic associated with various applications, such as Web, VoIP, and VoD, is a challenge for both service providers and network operators. Traditional classification schemes that rely on port or payload analysis are becoming ineffective in actual networks as many new applications emerge. This paper presents the classification of HSDPA network traffic applications using Classification and Regression Tree (CART) and Support Vector Machine (SVM), with session information as the basic measure. A session is a bidirectional traffic stream between two hosts, used here as the basic measure and unit of information. We acquired and processed HSDPA traffic from a real 3G network without sanitizing the data. CART and SVM are used to classify six application groups (download, game, upload, VoD, VoIP, and web) with a set of twelve easily retrievable features. These features consist of simple statistics, such as the standard deviation of the packet sizes, the number of packets, and the duration of a session. Compared with the results of flow-based application classification, session-based classification produces 11.07% (CART) and 21.99% (SVM) increases in the true positive rate. The feature set is further reduced to two principal components using Principal Component Regression. This paper also compares CART to K-Means, a clustering scheme used for wired network traffic, and shows that CART is more accurate for classification than K-Means.
AB - Accurately classifying and identifying wireless network traffic associated with various applications, such as Web, VoIP, and VoD, is a challenge for both service providers and network operators. Traditional classification schemes that rely on port or payload analysis are becoming ineffective in actual networks as many new applications emerge. This paper presents the classification of HSDPA network traffic applications using Classification and Regression Tree (CART) and Support Vector Machine (SVM), with session information as the basic measure. A session is a bidirectional traffic stream between two hosts, used here as the basic measure and unit of information. We acquired and processed HSDPA traffic from a real 3G network without sanitizing the data. CART and SVM are used to classify six application groups (download, game, upload, VoD, VoIP, and web) with a set of twelve easily retrievable features. These features consist of simple statistics, such as the standard deviation of the packet sizes, the number of packets, and the duration of a session. Compared with the results of flow-based application classification, session-based classification produces 11.07% (CART) and 21.99% (SVM) increases in the true positive rate. The feature set is further reduced to two principal components using Principal Component Regression. This paper also compares CART to K-Means, a clustering scheme used for wired network traffic, and shows that CART is more accurate for classification than K-Means.
KW - CART
KW - Clustering
KW - HSDPA
KW - SVM
KW - Traffic classification
UR - http://www.scopus.com/inward/record.url?scp=80053351934&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2011.08.010
DO - 10.1016/j.comnet.2011.08.010
M3 - Article
AN - SCOPUS:80053351934
SN - 1389-1286
VL - 55
SP - 3915
EP - 3931
JO - Computer Networks
JF - Computer Networks
IS - 17
ER -