Realizing the Full Potential of Machine Learning with Noblis’ Data Labeling Solution

Why Machine Learning?

Tackling cyber-related challenges can seem daunting. From detecting identity fraud to identifying compromised computer networks, cyber analysts have their hands full with ever-evolving threats, tactics, and procedures.

To combat these threats, traditional network security tools such as Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) have historically relied on signatures and deterministic rules. This approach might have been acceptable in the past, but it has proven a weak defense against today’s sophisticated adversaries, who can drastically change their attack vector in as little as an hour.

That’s why Noblis is turning to machine learning. We’re developing approaches that learn normal network behavior in advance, so that we can predict and detect abnormalities and catch threats before they become attacks.

What Makes Machine Learning Hard?

Robust machine learning systems require data, and specifically labeled data. Labeling data, though, is tricky (think: personally identifiable information) and varies in complexity. In some cases, labeling can be done by almost anyone; in others, subject matter expertise is required to annotate relevant datasets. When monitoring a computer network, labeling data segments as malicious or benign is just a starting point, and even that may require a whole team of experts.

The Noblis Solution

To tackle the problem of network data generation and labeling, Noblis developed a capability called CHAPPiE Swarm. This capability automatically generates, captures, and labels network traffic with a flexible network architecture. The system emulates an arbitrary number of users on an enterprise-type network configuration, using what we call “CHAPPiEs,” or base persona emulation units.

This technology compartmentalizes human browsing behaviors (such as reading time, type of preferred content, and propensity for pages with ads) and combines them to formulate a browsing persona. As a result, we generate web traffic data representative of different demographics and produce files from multiple vantage points for future analysis. CHAPPiE allows users to design custom experiments and to run them with ease.
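To make the persona idea concrete, here is a minimal Python sketch of how separate browsing behaviors might be combined into a persona and used to emit labeled events. It is an illustration only, not the CHAPPiE implementation: the names Persona and emulate_session, the field names, the exponential reading-time model, and the "benign" label are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona: a bundle of independent browsing behaviors."""
    name: str
    mean_reading_seconds: float          # average time spent on a page
    preferred_content: Tuple[str, ...]   # content categories this persona visits
    ad_propensity: float                 # probability (0..1) of choosing a page with ads


def emulate_session(persona: Persona, n_pages: int, rng: random.Random) -> List[Dict]:
    """Emit one labeled record per emulated page visit.

    Assumption: because the emulator itself produces the traffic, it already
    knows the ground truth and can attach the label at generation time.
    """
    events = []
    for _ in range(n_pages):
        events.append({
            "persona": persona.name,
            "content": rng.choice(persona.preferred_content),
            # Exponential reading times are a modeling assumption for this sketch.
            "reading_seconds": rng.expovariate(1.0 / persona.mean_reading_seconds),
            "has_ads": rng.random() < persona.ad_propensity,
            "label": "benign",
        })
    return events


# Usage: a seeded generator makes the emulated session reproducible.
news_reader = Persona("news_reader", 45.0, ("news", "sports"), 0.3)
session = emulate_session(news_reader, n_pages=5, rng=random.Random(0))
```

Keeping each behavior as its own field means new personas can be formed by recombining values, which mirrors the compartmentalize-and-combine approach described above.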

The Impact

Our technology captures and labels data automatically, effectively solving data labeling challenges and allowing machine learning to work to its full potential. The custom experiments made possible by CHAPPiE give analysts an advantage over adversaries, enabling them to build the robust classification systems instrumental in detecting cyber attacks and recovering from them.