The issue is that ambiguity or discrepancies can be introduced if the machine-learning software ignores certain invisible Unicode characters. What’s seen on screen or printed out, for instance, won’t match up with what the neural network saw and made a decision on. It may be possible abuse this lack of Unicode awareness for nefarious purposes.
As an example, you can get Google Translate’s web interface to turn what looks like the English sentence “Send money to account 4321” into the French “Envoyer de l’argent sur le compte 1234.”
This is done by entering on the English side “Send money to account” and then inserting the invisible Unicode glyph 0x202E, which changes the direction of the next text we type in – “1234” – to “4321.” The translation engine ignores the special Unicode character, so on the French side we see “1234,” while the browser obeys the character, so it displays “4321” on the English side.
It may be possible to exploit an AI assistant or a web app using this method to commit fraud, though we present it here in Google Translate to merely illustrate the effect of hidden Unicode characters. A more practical example would be feeding the sentence…
U+8coward and a fov
…into a comment moderation system, where
U+8is the invisible Unicode character for delete the previous character. The moderation system ignores the backspace characters, sees instead a string of misspelled words, and can’t detect any toxicity – whereas browsers correctly rendering the comment show, “You are a coward and a fool.”
It was academics at the University of Cambridge in England, and the University of Toronto in Canada, who highlighted these issues, laying out their findings in a paper released on arXiv In June this year.
“We find that with a single imperceptible encoding injection – representing one invisible character, homoglyph, reordering, or deletion – an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken,” the paper’s abstract reads.
“Our attacks work against currently deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook and IBM.”