{"id":17677,"date":"2020-12-08T16:03:06","date_gmt":"2020-12-08T12:03:06","guid":{"rendered":"https:\/\/me-en.kaspersky.com\/blog\/federated-learning-against-mail-threats\/17677\/"},"modified":"2020-12-08T16:03:06","modified_gmt":"2020-12-08T12:03:06","slug":"federated-learning-against-mail-threats","status":"publish","type":"post","link":"https:\/\/me-en.kaspersky.com\/blog\/federated-learning-against-mail-threats\/17677\/","title":{"rendered":"Federated learning in the fight against e-mail threats"},"content":{"rendered":"<p>What is the easiest way to find a threat (either phishing or spam) in your e-mail? A variety of technical headers and other indirect markers of an unwanted message can point the way, but we shouldn\u2019t forget the most obvious bit \u2014 the message text. One might think it\u2019s the first thing to analyze; after all, the text is what cybercriminals or unscrupulous advertisers use to manipulate recipients. The task isn\u2019t quite that simple, though; whereas signature analysis could cope with the task in the past, it is now necessary to analyze the text using machine-learning algorithms. And if the machine learning model is to be trained to classify messages correctly, it needs to be fed messages in significant quantities \u2014 and that is not always practical, for privacy reasons. We found a solution.<\/p>\n<h2>Why isn\u2019t signature analysis effective anymore?<\/h2>\n<p>Ten years ago, catching a huge proportion of unwanted e-mail based purely on message text was relatively easy because cybercriminals used the same templates \u2014 the text of spam (and phishing) messages hardly changed. Today, cybercriminals continually improve the efficiency of their mailings, and they use millions of hooks: new video games, TV series, or smartphone models; political news; even emergencies (take, for example, the abundance of phishing and spam related to COVID-19). The massive variety of topics complicates the detection process. 
Moreover, attackers can even vary the text within one mailing wave to elude e-mail filters.<\/p>\n<p>Of course, signature-based approaches are still in use, but they succeed only when they encounter text that someone has already classified as unwanted or harmful. They can\u2019t work proactively because spammers can bypass them simply by changing the text of a mailing. The only way to deal with this problem is through machine learning.<\/p>\n<h2>What\u2019s the problem with learning?<\/h2>\n<p>In recent years, machine-learning methods have shown good results in solving many problems. By analyzing a large amount of data, models learn to make decisions and find nontrivial common features in an information stream. We use neural networks trained on technical e-mail headers, together with DMARC, to detect e-mail threats. So, why can\u2019t we just do the same thing with message text?<\/p>\n<p>As mentioned above, models need a huge amount of data. In this case, the data consists of e-mails, and not only malicious ones \u2014 we need legitimate messages as well. Without them, teaching the model to distinguish an attack from legitimate correspondence would be impossible. We have numerous e-mail traps that catch all sorts of unwanted e-mails (we use them to make signatures), but obtaining legitimate messages for training is a more complicated task.<\/p>\n<p>Typically, data is collected on servers for centralized learning. But when we are talking about text, additional difficulties arise: E-mails can contain private data, so storing and processing them in their original form would be unacceptable. So, how can we obtain a large enough collection of legitimate e-mails?<\/p>\n<h2>Federated learning<\/h2>\n<p>We solve that problem by using the federated learning method, neatly eliminating the need to collect legitimate e-mails and instead training models in a decentralized way. 
Model training takes place directly on clients\u2019 mail servers, and the central server receives only the trained weights of the machine-learning models, not message text. At the central server, algorithms combine the weights received from clients into an updated version of the model, and then we send it back to the clients\u2019 solutions, where the model resumes analyzing the stream of e-mails.<\/p>\n<p>That\u2019s a slightly simplified picture: Before the newly trained model is set loose on real messages, it goes through several iterations of additional training. In other words, two models work simultaneously on the e-mail server: one in training mode, the other in active mode. After several trips to the central server, the retrained model replaces the active one.<\/p>\n<p>It\u2019s impossible to recover the text of specific e-mails from the model weights, so the privacy of messages during processing is assured. At the same time, training on real e-mails significantly improves the detection model\u2019s quality.<\/p>\n<p>We are already using this approach for spam classification, in test mode, in <a href=\"https:\/\/www.kaspersky.com\/small-to-medium-business-security\/microsoft-office-365-security?icid=me-en_kdailyplacehold_acq_ona_smm__onl_b2b_kasperskydaily_wpplaceholder____kso365___\" target=\"_blank\" rel=\"noopener nofollow\">Kaspersky Security for Microsoft Office 365<\/a>, and it\u2019s showing outstanding results. 
Soon, it will be applied more widely and used to identify other threats such as phishing, BEC, and more.<\/p>\n<input type=\"hidden\" class=\"category_for_banner\" value=\"kes-cloud\">\n","protected":false},"excerpt":{"rendered":"<p>Our method for training models to filter out spam lets you maintain privacy without losing efficiency.<\/p>\n","protected":false},"author":2629,"featured_media":17678,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1318,1917],"tags":[1815,1415,76,240],"class_list":{"0":"post-17677","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-business","8":"category-smb","9":"tag-e-mail","10":"tag-machine-learning","11":"tag-phishing","12":"tag-spam"},"hreflang":[{"hreflang":"en-ae","url":"https:\/\/me-en.kaspersky.com\/blog\/federated-learning-against-mail-threats\/17677\/"},{"hreflang":"en-in","url":"https:\/\/www.kaspersky.co.in\/blog\/federated-learning-against-mail-threats\/22199\/"},{"hreflang":"en-us","url":"https:\/\/usa.kaspersky.com\/blog\/federated-learning-against-mail-threats\/23846\/"},{"hreflang":"en-gb","url":"https:\/\/www.kaspersky.co.uk\/blog\/federated-learning-against-mail-threats\/21931\/"},{"hreflang":"es-mx","url":"https:\/\/latam.kaspersky.com\/blog\/federated-learning-against-mail-threats\/20758\/"},{"hreflang":"es","url":"https:\/\/www.kaspersky.es\/blog\/federated-learning-against-mail-threats\/24408\/"},{"hreflang":"it","url":"https:\/\/www.kaspersky.it\/blog\/federated-learning-against-mail-threats\/23581\/"},{"hreflang":"ru","url":"https:\/\/www.kaspersky.ru\/blog\/federated-learning-against-mail-threats\/29618\/"},{"hreflang":"tr","url":"https:\/\/www.kaspersky.com.tr\/blog\/federated-learning-against-mail-threats\/9143\/"},{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/federated-learning-against-mail-threats\/37936\/"},{"hreflang":"f
r","url":"https:\/\/www.kaspersky.fr\/blog\/federated-learning-against-mail-threats\/16116\/"},{"hreflang":"pt-br","url":"https:\/\/www.kaspersky.com.br\/blog\/federated-learning-against-mail-threats\/16765\/"},{"hreflang":"pl","url":"https:\/\/plblog.kaspersky.com\/federated-learning-against-mail-threats\/14287\/"},{"hreflang":"de","url":"https:\/\/www.kaspersky.de\/blog\/federated-learning-against-mail-threats\/25901\/"},{"hreflang":"zh","url":"https:\/\/www.kaspersky.com.cn\/blog\/federated-learning-against-mail-threats\/12365\/"},{"hreflang":"ja","url":"https:\/\/blog.kaspersky.co.jp\/federated-learning-against-mail-threats\/29753\/"},{"hreflang":"nl","url":"https:\/\/www.kaspersky.nl\/blog\/federated-learning-against-mail-threats\/26499\/"},{"hreflang":"ru-kz","url":"https:\/\/blog.kaspersky.kz\/federated-learning-against-mail-threats\/23165\/"},{"hreflang":"en-au","url":"https:\/\/www.kaspersky.com.au\/blog\/federated-learning-against-mail-threats\/28496\/"},{"hreflang":"en-za","url":"https:\/\/www.kaspersky.co.za\/blog\/federated-learning-against-mail-threats\/28312\/"}],"acf":[],"banners":"","maintag":{"url":"https:\/\/me-en.kaspersky.com\/blog\/tag\/e-mail\/","name":"e-mail"},"_links":{"self":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/17677","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/users\/2629"}],"replies":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/comments?post=17677"}],"version-history":[{"count":0,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/17677\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/17678"}],"wp:attachment":[{"href":"https:\/\/m
e-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=17677"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/categories?post=17677"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/tags?post=17677"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}