A group of University of Edinburgh boffins have turned CSI: Crime Scene Investigation scripts into a natural language training dataset.

Their aim is to improve how bots understand what’s said to them, a task known as natural language understanding.

Drawing on 39 episodes from the first five seasons of the series, Lea Frermann, Shay Cohen and Mirella Lapata have broken the scripts up as inputs to an LSTM (long short-term memory) model.

The boffins used the show because of its worst flaw: a rigid adherence to formulaic scripts that makes it utterly predictable. Hence the name of their paper: “Whodunnit? Crime Drama as a Case for Natural Language Understanding”.

“Each episode poses the same basic question (i.e., who committed the crime) and naturally provides the answer when the perpetrator is revealed”, the boffins write. In other words, identifying the perpetrator is a straightforward sequence labelling problem.

The researchers wanted their model to follow the kind of reasoning a viewer goes through during an episode: learn about the crime and the cast of characters, then start to guess who the perp is. They also wanted to see whether the model could outperform humans.
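To make the sequence labelling framing above concrete, here is a minimal Python sketch. It is not the researchers’ code. It assumes each line of a script gets a binary label (1 if it mentions the perpetrator, 0 otherwise) and that the labeller sees only the script so far, as a viewer would. The `Sentence` fields, the speakers, the dialogue and the `keyword_baseline` function are all made up for illustration; the baseline merely stands in for a trained model such as an LSTM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sentence:
    speaker: str  # who delivers the line (hypothetical field)
    text: str     # the line itself (toy data, not from a real script)
    label: int    # assumed gold label: 1 if the perpetrator is mentioned


def label_episode(
    sentences: List[Sentence],
    predict: Callable[[List[Sentence]], int],
) -> List[int]:
    """Label lines one at a time, showing the predictor only the script so far."""
    predictions = []
    for i in range(len(sentences)):
        history = sentences[: i + 1]  # no look-ahead past the current line
        predictions.append(predict(history))
    return predictions


def keyword_baseline(history: List[Sentence]) -> int:
    """Placeholder for a trained model: flag lines that name a fixed suspect."""
    return 1 if "Smith" in history[-1].text else 0


def accuracy(gold: List[int], predicted: List[int]) -> float:
    """Fraction of lines whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy episode: invented speakers and dialogue, purely illustrative.
episode = [
    Sentence("Investigator A", "The victim was found at nine.", 0),
    Sentence("Investigator B", "Smith was the last person to see her.", 1),
    Sentence("Investigator C", "The prints belong to Jones.", 0),
    Sentence("Investigator A", "Smith lied about the alibi.", 1),
]

gold = [s.label for s in episode]
predicted = label_episode(episode, keyword_baseline)
print(predicted, accuracy(gold, predicted))  # [0, 1, 0, 1] 1.0
```

Swapping `keyword_baseline` for a learned predictor leaves the loop unchanged, which is the appeal of treating the whodunnit as a labelling task: the model commits to a guess at every line, so its guesses can be compared with a human viewer’s at the same point in the episode.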

Source: 39 episodes of ‘CSI’ used to build AI’s natural language model • The Register