It’s tricky. Computers have to follow what is being said by whom, the context of the conversation and often some real world facts to understand cultural references. Feeding machines single sentences is often ineffective; it’s a difficult task for humans to detect if individual remarks are cheeky too.
The researchers, therefore, built a system designed to inspect individual sentences as well as the ones before and after it. The model is made up of several bidirectional long-short term memory networks (BiLSTMs) stitched together, and was accurate at spotting a sarcastic comment about 70 per cent of the time.
“Typical LSTMs read and encode the data – a sentence – from left to right. BiLSTMs will process the sentence in a left to right and right to left manner,” Reza Ghaeini, coauthor of the research on arXiv and a PhD student at Oregon State University, explained to The Register this week.
“The outcome of the BiLSTM for each position is the concatenation of forward and backward encodings of each position. Therefore, now each position contains information about the whole sentence (what is seen before and what will be seen after).”
So, where’s the best place to learn sarcasm? Reddit’s message boards, of course. The dataset known as SARC – geddit? – contains hundreds of thousands of sarcastic and non-sarcastic comments and responses.
“It is quite difficult for both machines and humans to distinguish sarcasm without context,” Mikhail Khodak, a graduate student at Princeton who helped compile SARC, previously told El Reg.
“One of the advantages of our corpus is that we provide the text preceding each statement as well as the author of the statement, so algorithms can see whether it is sarcastic in the context of the conversation or in the context of the author’s past statements.”