Here’s some phish-AI research: Machine-learning code crafts phishing URLs that dodge auto-detection

An artificially intelligent system has been demonstrated generating URLs for phishing websites that appear to evade detection by security tools.

Essentially, the software can come up with URLs for webpages that masquerade as legit login pages for real websites, when in actual fact the webpages simply collect the entered usernames and passwords so the accounts can later be hijacked.

Blacklists and algorithms – intelligent or otherwise – can be used to automatically identify and block links to phishing pages. Humans should be able to spot that the web links are dodgy, but not everyone is so savvy.

Using the Phishtank database, a group of computer scientists from Cyxtera Technologies, a cybersecurity biz based in Florida, USA, have built DeepPhish, machine-learning software that, allegedly, generates phishing URLs that beat these defense mechanisms.


The team inspected more than a million URLs on Phishtank to identify three different phishing miscreants who had generated webpages to steal people’s credentials. The team fed these web addresses into an AI-based phishing detection algorithm to measure how effective the URLs were at bypassing the system.

The first scumbag of the trio used 1,007 attack URLs across 106 domains, and only seven of them avoided setting off alarms, a success rate of just 0.69 per cent. The second used 102 malicious web addresses across 19 domains. Only five of them bypassed the threat detection algorithm, a success rate of 4.91 per cent.
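The effectiveness figures boil down to the share of attack URLs that slipped past the detector. Here is a minimal Python sketch of that arithmetic using the counts quoted above; the function and variable names are our own, purely for illustration.

```python
def effectiveness(effective: int, total: int) -> float:
    """Percentage of attack URLs that got past the detector."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * effective / total


# (URLs that evaded detection, URLs tried), as reported for each threat actor.
baseline = {
    "threat actor 1": (7, 1007),
    "threat actor 2": (5, 102),
}

for actor, (effective, total) in baseline.items():
    # Two decimal places, matching how the article quotes its percentages.
    print(f"{actor}: {effectiveness(effective, total):.2f}%")
```

Straight division prints 0.70 and 4.90 per cent, within a few hundredths of a point of the quoted figures; the small gap presumably comes from rounding or slightly different counts in the underlying paper.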

Next, they fed this information into a Long Short-Term Memory network (LSTM) to learn the general structure of the malicious URLs and extract features from them – for example, the second threat actor commonly used “tdcanadatrustindex.html” in its addresses.

The text of all the effective URLs was joined into sentences, encoded as vectors and fed into the LSTM, which was trained to predict the next character given the characters before it.
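Training data for a character-level model like this boils down to a sequence of (previous characters, next character) pairs. The Python sketch below shows one way to build them. The end-of-URL marker, the window size, the one-hot encoding and the placeholder URLs are our own assumptions, not details from the DeepPhish work, and the LSTM itself is left out.

```python
from typing import Dict, List, Tuple

END = "\n"  # hypothetical end-of-URL marker so a model can learn where a URL stops


def build_vocab(urls: List[str]) -> Dict[str, int]:
    """Map every character seen in the corpus (plus END) to an integer index."""
    chars = sorted(set("".join(urls)) | {END})
    return {ch: i for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into a list of integer character indices."""
    return [vocab[ch] for ch in text]


def one_hot(index: int, size: int) -> List[float]:
    """One-hot vector for a single character index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def next_char_pairs(urls: List[str], vocab: Dict[str, int],
                    window: int) -> List[Tuple[List[int], int]]:
    """Slide a fixed window over the joined text: (previous chars, next char)."""
    stream = encode(END.join(urls) + END, vocab)
    return [(stream[i:i + window], stream[i + window])
            for i in range(len(stream) - window)]


# Placeholder URLs for illustration only, not samples from Phishtank.
sample = ["http://example.com/login.html", "http://example.org/index.html"]
vocab = build_vocab(sample)
pairs = next_char_pairs(sample, vocab, window=8)
x, y = pairs[0]
features = [one_hot(i, len(vocab)) for i in x]  # input an LSTM would consume
```

Each pair asks the model to guess one character from the eight before it; the window size is an arbitrary choice here.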

Over time, it learns to generate a stream of text that reads like a list of pseudo-URLs similar to the ones used as input. When DeepPhish was trained on data from the first threat actor, it generated 1,007 URLs of its own, and 210 of them evaded detection, bumping up the success rate from 0.69 per cent to 20.90 per cent.
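DeepPhish uses an LSTM. To keep the example dependency-free, the Python sketch below swaps in a much simpler character-level n-gram model. It is not DeepPhish's method, but it shows the same generation loop: predict the next character from the previous ones, sample it, feed it back in and repeat until an end marker appears. The `train` and `generate` names, the context length and the end marker are all hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional

END = "\n"  # hypothetical end-of-URL marker


def train(urls: List[str], order: int = 3) -> Dict[str, Counter]:
    """Count which character follows each `order`-character context."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for url in urls:
        padded = END * order + url + END  # pad so the first characters have a context
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts


def generate(counts: Dict[str, Counter], order: int = 3, max_len: int = 100,
             rng: Optional[random.Random] = None) -> str:
    """Sample one character at a time, feeding each choice back in as context."""
    rng = rng or random.Random()
    context, out = END * order, []
    while len(out) < max_len:
        options = counts.get(context)  # .get does not create entries in a defaultdict
        if not options:
            break
        chars, weights = zip(*options.items())
        ch = rng.choices(chars, weights=weights)[0]  # sample in proportion to counts
        if ch == END:
            break
        out.append(ch)
        context = context[1:] + ch
    return "".join(out)


# Placeholder URLs for illustration only.
model = train(["http://example.com/login.html", "http://example.org/index.html"])
print(generate(model, rng=random.Random(42)))
```

An LSTM can condition on much longer histories than a fixed three-character context, which is presumably why the researchers chose one.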

When it followed the structure of the second threat actor’s URLs, it produced 102 fake URLs, and 37 of them were successful, raising the chance of tricking the existing defense mechanism from 4.91 per cent to 36.28 per cent.

The effectiveness rate isn’t very high, as a lot of what comes out of the LSTM is effectively gibberish, containing strings of forbidden characters.
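One obvious clean-up step, which the article does not say the researchers used, is to throw away generated strings that cannot be URLs at all. The Python sketch below is a hypothetical post-filter. It keeps only characters RFC 3986 allows in a URI and requires an http(s) scheme and a host, as parsed by the standard library's `urlsplit`.

```python
import string
from typing import Iterable, List
from urllib.parse import urlsplit

# Characters RFC 3986 permits in a URI: unreserved, reserved, and '%' for
# percent-encoding. This check does not validate that '%' is followed by two hex digits.
ALLOWED = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")


def looks_like_url(candidate: str) -> bool:
    """Reject empty strings, forbidden characters, and strings without scheme and host."""
    if not candidate or not set(candidate) <= ALLOWED:
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:  # e.g. an unbalanced '[' in the host part
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def keep_plausible(candidates: Iterable[str]) -> List[str]:
    """Keep only the generated strings that pass the syntactic check."""
    return [c for c in candidates if looks_like_url(c)]
```

Passing this check says nothing about whether a URL would fool a detector; it only removes output that is obviously malformed.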

“It is important to automate the process of retraining the AI phishing detection system by incorporating the new synthetic URLs that each threat actor may create,” the researchers warned. ®

Source: Here’s some phish-AI research: Machine-learning code crafts phishing URLs that dodge auto-detection • The Register
