Selection of Online Network Traffic Discriminators for on-the-Fly Traffic Classification
Several techniques exist for selecting a set of traffic features for traffic classification. However, most studies ignore the domain knowledge of the setting where traffic analysis or classification is performed and do not account for the fact that information in networks is continuously in motion. This paper describes a process for selecting online network-traffic discriminators. We obtained 24 traffic features that can be processed on the fly and propose them as a base attribute set for future domain-aware online analysis, processing, or classification. To select the set of traffic discriminators while avoiding the shortcomings mentioned above, we carried out three steps. The first step is a manual selection, based on context knowledge, of traffic features that can be obtained on the fly from the flow. The second step analyzes the quality of the previously selected attributes to ensure that each one is relevant to traffic classification. In the third step, we implemented several incremental learning algorithms to verify the usefulness of these attributes in online traffic classification.
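As an illustration of what "obtained on the fly from the flow" can mean in practice, the following minimal Python sketch updates a few per-flow statistics one packet at a time, without storing past packets. It is not the authors' implementation: the class name `OnlineFlowStats`, the chosen statistics (packet count, byte count, mean and variance of packet size, mean inter-arrival time), and the sample packets are assumptions made for illustration, and they are not claimed to be among the 24 features selected in the paper. The variance uses Welford's incremental update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnlineFlowStats:
    """Hypothetical per-flow statistics, updated packet by packet."""
    packets: int = 0
    total_bytes: int = 0
    mean_size: float = 0.0
    m2_size: float = 0.0               # running sum of squared deviations (Welford)
    last_ts: Optional[float] = None    # timestamp of the previous packet
    iat_count: int = 0                 # number of inter-arrival times seen
    mean_iat: float = 0.0

    def update(self, timestamp: float, size: int) -> None:
        """Fold one packet into the running statistics in O(1) time and memory."""
        self.packets += 1
        self.total_bytes += size

        # Welford's update for the mean and variance of packet size.
        delta = size - self.mean_size
        self.mean_size += delta / self.packets
        self.m2_size += delta * (size - self.mean_size)

        # Incremental mean of inter-arrival times (defined from the 2nd packet on).
        if self.last_ts is not None:
            self.iat_count += 1
            iat = timestamp - self.last_ts
            self.mean_iat += (iat - self.mean_iat) / self.iat_count
        self.last_ts = timestamp

    @property
    def size_variance(self) -> float:
        """Population variance of packet size over the packets seen so far."""
        return self.m2_size / self.packets if self.packets else 0.0

    def features(self) -> dict:
        """Snapshot of the current feature vector; can be read at any time."""
        return {
            "packets": self.packets,
            "total_bytes": self.total_bytes,
            "mean_size": self.mean_size,
            "size_variance": self.size_variance,
            "mean_iat": self.mean_iat,
        }


# Usage with made-up (timestamp in seconds, size in bytes) packets.
stats = OnlineFlowStats()
for ts, size in [(0.0, 100), (0.5, 200), (1.5, 300)]:
    stats.update(ts, size)
snapshot = stats.features()
```

Because every statistic is kept in constant memory, a snapshot such as `snapshot` can be handed to an incremental learner after any packet, which is the property that the first selection step requires of a feature.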
Author Biographies
Angela María Vargas Arcila, Universidad del Cauca
MSc in Telematics Engineering from Universidad del Cauca. Ph.D. student in Telematics Engineering at Universidad del Cauca, and in Computer Science and Technology at Universidad Carlos III de Madrid.
Juan Carlos Corrales Muñoz, Universidad del Cauca
Ph.D. in Computer Science from Université de Versailles Saint-Quentin-en-Yvelines. Full-time Professor and Leader of the Telematics Engineering Group at Universidad del Cauca.
Alvaro Rendon Gallon, Universidad del Cauca
Ph.D. in Telecommunications Engineering from Universidad Politécnica de Madrid. Full-time Professor and Director of the Doctoral Program in Telematics Engineering at Universidad del Cauca.
Araceli Sanchis, Universidad Carlos III de Madrid
Ph.D. in Computer Science from Universidad Politécnica de Madrid. Ph.D. in Physical Chemistry from Universidad Complutense de Madrid. University Associate Professor of Computer Science at Universidad Carlos III de Madrid.
Copyright (c) 2020 Revista Ingenierías Universidad de Medellín
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The total or partial reproduction of the contents of the journal for educational, research, or academic purposes is authorized as long as the source is cited. For reproduction for other purposes, express authorization from the Sello Editorial Universidad de Medellín is required.