News/Info Microsoft and Intel turn malware into images to help spot more threats!

emailx45

Veteran
Staff member
Moderator
Microsoft and Intel turn malware into images to help spot more threats
Jon Fingas - Associate Editor - May 11, 2020
[SHOWTOGROUPS=4,20]
Microsoft and Intel have a novel approach to classifying malware: visualizing it.

They’re collaborating on STAMINA (Static Malware-as-Image Network Analysis), a project that turns rogue code into grayscale images so that a deep learning system can study them.

The approach converts the binary form of an input file into a simple stream of pixels, and turns that into a picture with dimensions that vary depending on aspects like file size. A trained neural network then determines what (if anything) has infected the file.

ZDNet noted that the AI is trained on the huge amount of data Microsoft has collected from Windows Defender installations. The technology doesn’t need full-size, pixel-by-pixel recreations of viruses, which makes sense when large malware could easily translate to gigantic pictures.

STAMINA has proven mostly effective so far, with just over 99 percent accuracy in classifying malware and a false positive rate slightly under 2.6 percent. However, it has its limits. It works well with small files, but it struggles with larger ones.

With enough refinement, though, this could be very useful. Most malware detection relies on extracting binary signatures or fingerprints, but the sheer number of signatures makes that impractical.

This could help anti-malware tools effectively keep up and reduce the chances of security threats slipping past defenses.



Microsoft and Intel project converts malware into images before analyzing it
Microsoft and Intel Labs work on STAMINA, a new deep learning approach for detecting and classifying malware.
By Catalin Cimpanu for Zero Day | May 11, 2020 -- 01:40 GMT (18:40 PDT) | Topic: Security

stamina-steps.png


Image: Microsoft

Microsoft and Intel have recently collaborated on a research project that explored a new approach to detecting and classifying malware.

Called STAMINA (STAtic Malware-as-Image Network Analysis), the project relies on a new technique that converts malware samples into grayscale images and then scans the image for textural and structural patterns specific to malware samples.

HOW STAMINA ACTUALLY WORKS
The Intel-Microsoft research team said the entire process followed a few simple steps. The first consisted of taking an input file and converting its binary form into a stream of raw pixel data.

Researchers then took this one-dimensional (1D) pixel stream and reshaped it into a 2D image so that standard image analysis algorithms could process it.

The width of the image was selected based on the input file's size, using the table below. The height was dynamic, and resulted from dividing the length of the raw pixel stream by the chosen width value.

stamina-table.png


Image: Intel, Microsoft
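As a rough illustration of the conversion steps described above, the following Python sketch maps each byte of a file to one grayscale pixel and wraps the stream into rows. The names `WIDTH_TABLE`, `DEFAULT_WIDTH`, `pick_width` and `bytes_to_image` are hypothetical, the size thresholds are placeholder values rather than the ones in the researchers' table, and zero-padding the last row is an assumption the article does not confirm.

```python
import math

# Hypothetical width table: (maximum file size in KiB, image width in pixels).
# These thresholds are illustrative placeholders, not the researchers' values.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256)]
DEFAULT_WIDTH = 512  # assumed width for files larger than the last threshold


def pick_width(size_bytes):
    """Choose an image width from the file size, using the placeholder table."""
    size_kib = size_bytes / 1024
    for max_kib, width in WIDTH_TABLE:
        if size_kib <= max_kib:
            return width
    return DEFAULT_WIDTH


def bytes_to_image(data):
    """Turn raw bytes into rows of grayscale pixels (one byte = one pixel, 0-255)."""
    width = pick_width(len(data))
    # Height follows from the stream length divided by the chosen width.
    height = math.ceil(len(data) / width)
    # Assumption: pad the final partial row with zero (black) pixels.
    padded = data + bytes(width * height - len(data))
    return [list(padded[row * width:(row + 1) * width]) for row in range(height)]


sample = bytes(range(256)) * 4  # 1 KiB of stand-in data, not a real binary
image = bytes_to_image(sample)
print(len(image[0]), len(image))  # width and height of the resulting grid
```

Under the placeholder table, a 1 KiB input falls into the smallest bucket, so the width is 32 and the height works out to 1024 / 32 = 32 rows.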

After assembling the raw pixel stream into a 2D image, the researchers then resized it to smaller dimensions.

The Intel and Microsoft team said that resizing the raw image did not "negatively impact the classification result," and that this was a necessary step to avoid processing images consisting of billions of pixels, which would most likely slow things down.
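The article does not say which resampling method the researchers used. As a minimal sketch of the resizing idea only, the hypothetical `resize_nearest` function below shrinks a 2D grid of grayscale pixels with nearest-neighbour sampling, chosen here purely for simplicity. It assumes a non-empty, rectangular image.

```python
def resize_nearest(image, new_width, new_height):
    """Resize a 2D list of grayscale pixels using nearest-neighbour sampling."""
    old_height = len(image)
    old_width = len(image[0])
    return [
        [
            # Map each target pixel back to a source pixel by integer scaling.
            image[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
small = resize_nearest(grid, 2, 2)
print(small)  # [[0, 2], [8, 10]]
```

Whatever the method, the output has a fixed, small size, which is what lets a single network handle files of very different lengths.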

The resized images were then fed into a pre-trained deep neural network (DNN) that scanned the image (2D representation of the malware strain) and classified it as clean or infected.

Microsoft says it provided a sample of 2.2 million infected PE (Portable Executable) file hashes to serve as a base for the research.

Researchers used 60% of the known malware samples to train the original DNN algorithm, 20% of the files to validate the DNN, and the other 20% for the actual testing process.
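The 60/20/20 split can be sketched as follows. This is an assumption-laden illustration, not the researchers' pipeline: `split_samples` is a hypothetical helper, the shuffle and fixed seed are choices made here for reproducibility, and the sample list is stand-in data.

```python
import random


def split_samples(samples, seed=0):
    """Shuffle samples and split them 60% / 20% / 20% (train / validation / test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, roughly 20%
    return train, validation, test


train, validation, test = split_samples(range(100))
print(len(train), len(validation), len(test))  # 60 20 20
```

Keeping the test portion separate from training and validation is what makes the reported accuracy a measure of performance on files the network has not seen.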

The research team said STAMINA achieved an accuracy of 99.07% in identifying and classifying malware samples, with a false positive rate of 2.58%.
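For readers unfamiliar with the two metrics, the sketch below shows how accuracy and false positive rate are conventionally computed from true labels and predictions. The function name `accuracy_and_fpr`, the encoding (1 for malware, 0 for clean) and the tiny data set are assumptions for illustration. The code does not reproduce the reported figures.

```python
def accuracy_and_fpr(labels, predictions):
    """Return (accuracy, false positive rate) for binary labels: 1 = malware, 0 = clean."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    true_neg = sum(1 for y, p in pairs if y == 0 and p == 0)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    negatives = true_neg + false_pos
    accuracy = (true_pos + true_neg) / len(pairs)
    # False positive rate: share of clean files wrongly flagged as malware.
    # Returns 0.0 when there are no clean files, to avoid dividing by zero.
    fpr = false_pos / negatives if negatives else 0.0
    return accuracy, fpr


labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, fpr = accuracy_and_fpr(labels, predictions)
print(accuracy, fpr)  # 0.75 0.25
```

In this toy example six of eight predictions are correct, and one of the four clean files is flagged by mistake.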

"The results certainly encourage the use of deep transfer learning for the purpose of malware classification," said Jugal Parikh and Marc Marino, the two Microsoft researchers who participated in the research on behalf of the Microsoft Threat Protection Intelligence Team.

MICROSOFT'S INVESTMENT IN MACHINE LEARNING
The research is part of Microsoft's recent efforts to improve malware detection using machine learning techniques.

STAMINA used a technique called deep learning. Deep learning is a subset of machine learning (ML), a branch of artificial intelligence (AI), which refers to intelligent computer networks that are capable of learning on their own from input data that is stored in an unstructured or unlabeled format -- in this case, a random malware binary.

Microsoft said that while STAMINA was accurate and fast when working with smaller files, it faltered with larger ones.

"For bigger size applications, STAMINA becomes less effective due to limitations in converting billions of pixels into JPEG images and then resizing them," Microsoft said in a blog post last week.
However, this most likely doesn't matter, as the project could be used for small files only, with excellent results.

In an interview with ZDNet earlier this month, Tanmay Ganacharya, Director for Security Research of Microsoft Threat Protection, said that Microsoft now relies heavily on machine learning for detecting emerging threats, and that this system uses different machine learning modules deployed on customer systems or Microsoft servers.

Microsoft now uses client-side machine learning model engines, cloud-side machine learning model engines, and machine learning modules for capturing sequences of behaviors or the content of the file itself, Ganacharya said.

Based on the reported results, STAMINA could very well be one of those ML modules that we may soon see implemented at Microsoft as a way to spot malware.

Currently, Microsoft can make this approach work better than other companies primarily because of the sheer volume of data it possesses from the hundreds of millions of Windows Defender installs.

"Anybody can build a model, but the labeled data and the quantity of it and the quality of it, really helps train the machine learning models appropriately and hence defines how effective they are going to be," Ganacharya said.

"And we, at Microsoft, have that as an advantage because we do have sensors that are bringing us lots of interesting signals through email, through identity, through the endpoint, and being able to combine them."

[/SHOWTOGROUPS]
 