NASA and Google using AI to hunt down potentially habitable planets

Astrobiologists are mostly interested in rocky exoplanets that lie in the habitable zone around their parent stars, where liquid water may exist on their surfaces. NASA’s Kepler spacecraft has spotted a handful of these in the so-called Goldilocks Zone – where it’s neither too cold nor too hot for life.

As such, a second team from Google and NASA’s lab has built a machine-learning-based tool known as INARA that can identify the chemical compounds in a rocky exoplanet’s atmosphere by studying its high-resolution telescope images.

To develop this software, the brainiacs simulated more than three million planets’ spectral signatures – fingerprints of their atmospheres’ chemical makeups – and labelled them as such to train a convolutional neural network (CNN). The CNN can therefore be used to automatically estimate the chemical composition of a planet from images and light curves of its atmosphere taken by NASA’s Kepler spacecraft. Basically, a neural network was trained to link telescope images to chemical compositions: show it a given set of images, and it will spit out the associated chemical components – which can be used to assess whether those would lead to life bursting onto the scene.
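
To make that pipeline concrete, here is a minimal sketch of the kind of supervised setup described above: a small 1D convolutional network trained on simulated spectra labelled with chemical abundances. The array shapes, layer sizes and species count are illustrative assumptions, not details taken from the INARA work, and the random arrays merely stand in for the simulated training data.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative stand-ins: the real training set was ~3 million simulated spectra.
n_bins, n_species = 1000, 10
spectra = np.random.rand(256, n_bins, 1)        # fake spectral signatures
abundances = np.random.rand(256, n_species)     # fake labelled chemical makeups

model = keras.Sequential([
    keras.Input(shape=(n_bins, 1)),
    layers.Conv1D(32, kernel_size=7, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(n_species, activation="sigmoid"),   # estimated abundance per species
])
model.compile(optimizer="adam", loss="mse")
model.fit(spectra, abundances, epochs=2, batch_size=64)

# At inference time a newly observed spectrum goes in and estimated
# chemical abundances come out in a fraction of a second.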

INARA takes seconds to figure out the biological compounds potentially present in a world’s atmosphere. “Given the scale of the datasets produced by the Kepler telescopes, and the even greater volume of data that will return to Earth from the soon-to-be-launched Transiting Exoplanet Survey Satellite (TESS), minimizing analysis time per planet can accelerate this research and ensure we don’t miss any viable candidates,” Mascaro concluded. ®

Source: Finally, a use for AI and good old-fashioned simulations: Hunting down E.T. in outer space • The Register

The US military wants to teach AI some basic common sense

Wherever artificial intelligence is deployed, you will find it has failed in some amusing way. Take the strange errors made by translation algorithms that confuse having someone for dinner with, well, having someone for dinner.

But as AI is used in ever more critical situations, such as driving autonomous cars, making medical diagnoses, or drawing life-or-death conclusions from intelligence information, these failures will no longer be a laughing matter. That’s why DARPA, the research arm of the US military, is addressing AI’s most basic flaw: it has zero common sense.

“Common sense is the dark matter of artificial intelligence,” says Oren Etzioni, CEO of the Allen Institute for AI, a research nonprofit based in Seattle that is exploring the limits of the technology. “It’s a little bit ineffable, but you see its effects on everything.”

DARPA’s new Machine Common Sense (MCS) program will run a competition that asks AI algorithms to make sense of questions like this one:

A student puts two identical plants in the same type and amount of soil. She gives them the same amount of water. She puts one of these plants near a window and the other in a dark room. The plant near the window will produce more (A) oxygen (B) carbon dioxide (C) water.

A computer program needs some understanding of the way photosynthesis works in order to tackle the question. Simply feeding a machine lots of previous questions won’t solve the problem reliably.

These benchmarks will focus on language because it can so easily trip machines up, and because it makes testing relatively straightforward. Etzioni says the questions offer a way to measure progress toward common-sense understanding, which will be crucial.

Tech companies are busy commercializing machine-learning techniques that are powerful but fundamentally limited. Deep learning, for instance, makes it possible to recognize words in speech or objects in images, often with incredible accuracy. But the approach typically relies on feeding large quantities of labeled data—a raw audio signal or the pixels in an image—into a big neural network. The system can learn to pick out important patterns, but it can easily make mistakes because it has no concept of the broader world.

Source: The US military wants to teach AI some basic common sense – MIT Technology Review

Google’s AI Bots Invent New Legs to Scamper Through Obstacle Courses

Using a technique called reinforcement learning, a researcher at Google Brain has shown that virtual robots can redesign their body parts to help them navigate challenging obstacle courses—even if the solutions they come up with are completely bizarre.

Embodied cognition is the idea that an animal’s cognitive abilities are influenced and constrained by its body plan. This means a squirrel’s thought processes and problem-solving strategies will differ somewhat from the cogitations of octopuses, elephants, and seagulls. Each animal has to navigate its world in its own special way using the body it’s been given, which naturally leads to different ways of thinking and learning.

“Evolution plays a vital role in shaping an organism’s body to adapt to its environment,” David Ha, a computer scientist and AI expert at Google Brain, explained in his new study. “The brain and its ability to learn is only one of many body components that is co-evolved together.”

[…]

Using the OpenAI Gym framework, Ha was able to provide an environment for his walkers. This framework looks a lot like an old-school, 2D video game, but it uses sophisticated virtual physics to simulate natural conditions, and it’s capable of randomly generating terrain and other in-game elements.

As for the walker, it was endowed with a pair of legs, each consisting of an upper and lower section. The bipedal bot had to learn how to navigate through its virtual environment and improve its performance over time. Researchers at DeepMind conducted a similar experiment last year, in which virtual bots had to learn how to walk from scratch and navigate through complex parkour courses. The difference here is that Ha’s walkers had the added benefit of being able to redesign their body plan – or at least parts of it. The bots could alter the lengths and widths of their four leg sections to a maximum of 75 percent of the size of the default leg design. The walkers’ pentagon-shaped head could not be altered and served as cargo. Each walker used a digital version of LIDAR to assess the terrain immediately in front of it, which is why (in the videos) they appear to shoot a thin laser beam at regular intervals.

Using reinforcement-learning algorithms, the bots were given around a day or two to devise their new body parts and come up with effective locomotion strategies, which together formed a walker’s “policy,” in the parlance of AI researchers. The learning process is similar to trial-and-error, except the bots, via reinforcement learning, are rewarded when they come up with good strategies, which then leads them toward even better solutions. This is why reinforcement learning is so powerful—it speeds up the learning process as the bots experiment with various solutions, many of which are unconventional and unpredictable by human standards.
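
For readers unfamiliar with the setup, the skeleton of such an experiment in the OpenAI Gym framework looks roughly like the sketch below. The environment name and the random policy are placeholders: Ha’s customised walker additionally exposes its leg dimensions as learnable parameters, which stock Gym does not, and the classic Gym API shown here returns a 4-tuple from step().

import gym

# BipedalWalkerHardcore is the stock environment closest to the terrain described
# above; the exact version id varies between Gym releases.
env = gym.make("BipedalWalkerHardcore-v2")
observation = env.reset()       # includes the LIDAR readings mentioned above

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()                  # placeholder for the learned policy
    observation, reward, done, info = env.step(action)
    total_reward += reward                              # this reward drives the reinforcement learning

env.close()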

Left: An unmodified walker joyfully skips through easy terrain. Right: With training, a self-modified walker chose to hop instead.
GIF: David Ha/Google Brain/Gizmodo

For the first test (above), Ha placed a walker in a basic environment with no obstacles and gently rolling terrain. Using its default body plan, the bot adopted a rather cheerful-looking skipping locomotion strategy. After the learning stage, however, it modified its legs such that they were thinner and longer. With these modified limbs, the walker used its legs as springs, quickly hopping across the terrain.

The walker chose a strange body plan and an unorthodox locomotion strategy for traversing challenging terrain.
GIF: David Ha/Google Brain/Gizmodo

The introduction of more challenging terrain (above), such as having to walk over obstacles, travel up and down hills, and jump over pits, introduced some radical new policies, namely the invention of an elongated rear “tail” with a dramatically thickened end. Armed with this configuration, the walkers hopped successfully around the obstacle course.

By this point in the experiment, Ha could see that reinforcement learning was clearly working. Allowing a walker “to learn a better version of its body obviously enables it to achieve better performance,” he wrote in the study.

Not content to stop there, Ha played around with the idea of motivating the walkers to adopt some design decisions that weren’t necessarily beneficial to their performance. The reason for this, he said, is that “we may want our agent to learn a design that utilizes the least amount of materials while still achieving satisfactory performance on the task.”

The tiny walker adopted a very familiar gait when faced with easy terrain.
GIF: David Ha/Google Brain/Gizmodo

So for the next test, Ha rewarded an agent for developing legs that were smaller in area (above). With the bot motivated to move efficiently across the terrain, and using the tiniest legs possible (it no longer had to adhere to the 75 percent rule), the walker adopted a rather conventional bipedal style while navigating the easy terrain (it needed just 8 percent of the leg area used in the original design).

The walker struggled to come up with an effective body plan and locomotion style when it was rewarded for inventing small leg sizes.
GIF: David Ha/Google Brain/Gizmodo

But the walker really struggled to come up with a sensible policy when having to navigate the challenging terrain. In the example shown above, which was the best strategy it could muster, the walker used 27 percent of the area of its original design. Reinforcement learning is good, but it’s no guarantee that a bot will come up with something brilliant. In some cases, a good solution simply doesn’t exist.

Source: Google’s AI Bots Invent Ridiculous New Legs to Scamper Through Obstacle Courses

Stanford AI bot to negotiate sales for you with Craigslist

Artificially intelligent bots are notoriously bad at communicating with, well, anything. Conversations with the code, whether between bots or with people, often go awry and veer off topic. Grammar goes out the window, and sentences become nonsensical.

[…]

Well, a group of researchers at Stanford University in the US have figured out how to, in theory, prevent that chaos and confusion from happening. In an experiment, they trained neural networks to negotiate when buying stuff in hypothetical situations, mimicking the process of scoring and selling stuff on sites like Craigslist or Gumtree.

Here’s the plan: sellers post adverts trying to get rid of their old possessions. Buyers enquire about the condition of the items, and if a deal is reached, both parties arrange a time and place to exchange the item for cash.

Here’s an example of a conversation between a human, acting as a seller, and a Stanford-built bot, as the buyer:


Example of a bot (A) interacting with a human (B) to buy a Fitbit. Image credit: He et al.

The dialogue is a bit stiff, and the grammar is wrong in places, but it does the job even though no deal is reached. The team documented their work in this paper, here [PDF], which came to our attention this week.

The trick is to keep the machines on topic and stop them from generating gibberish. The researchers used supervised learning and reinforcement learning together with hardcoded rules to force the bots to stay on task.

The system is broadly split into three parts: a parser, a manager and a generator. The parser inspects keywords that signify a specific action that is being taken. Next, the manager stage chooses how the bot should respond. These actions, dubbed “coarse dialogue acts”, guide the bot through the negotiation task so it knows when to inquire, barter a price, agree or disagree. Finally, the generator produces the response to keep the dialogue flowing.
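
A rough sketch of how such a three-stage pipeline fits together is shown below. The keyword rules, function names and price logic are invented for illustration; the actual parser and manager in the paper are learned models, not hand-written rules.

def parse(utterance):
    """Map a raw utterance to a coarse dialogue act, e.g. ('propose', 45.0)."""
    if "deal" in utterance.lower():
        return ("accept", None)
    for token in utterance.split():
        if token.startswith("$"):
            return ("propose", float(token[1:]))
    return ("inquire", None)

def manage(dialogue_act, target_price):
    """The 'manager': choose the bot's next coarse dialogue act."""
    act, price = dialogue_act
    if act == "propose" and price is not None and price <= target_price:
        return ("accept", price)
    if act == "propose":
        return ("counter", round(target_price * 0.9, 2))
    return ("inquire", None)

def generate(dialogue_act):
    """The 'generator': turn a coarse dialogue act back into English."""
    act, price = dialogue_act
    if act == "accept":
        return "Great, it's a deal."
    if act == "counter":
        return f"Would you take ${price}?"
    return "Can you tell me more about its condition?"

# One turn of the negotiation loop:
seller_says = "I can do $45 if you pick it up today"
bot_act = manage(parse(seller_says), target_price=40.0)
print(generate(bot_act))      # -> "Would you take $36.0?"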


Diagram of how the system works. The interaction is split into a series of coarse dialogue acts, the manager chooses what action the bot should take, and a generator spits out words for the dialogue. Image credit: He et al.

In the reinforcement learning method, the bots are encouraged to reach a deal and penalized with a negative reward when they fail to reach an agreement. The researchers trained the bots on 6,682 dialogues collected from humans working on the Amazon Mechanical Turk platform.

They call it the Craigslist Negotiation Dataset since they modeled the scenarios by scraping postings for the items in the six most popular categories on Craigslist. These include items filed under housing, furniture, cars, bikes, phones and electronics.

The conversations are represented as a sequence of actions or coarse dialogue acts. A long short-term memory network (LSTM) encodes the coarse dialogue act and another LSTM decodes it.

The manager part chooses the appropriate response. For example, it can propose a price, argue to go lower or higher, and accept or reject a deal. The generator conveys all these actions in plain English.

During the testing phase, the bots were pitted against real humans. Participants were then asked to rate how human the interaction seemed. The researchers found that their systems were more successful at bargaining for a deal and were more human-like than other bots.

It doesn’t always work out, however. Here’s an example of a conversation where the bot doesn’t make much sense.


A bot (A) trying to buy a Fitbit off a human seller (B). This time, however, it fails to communicate effectively. Image credit: He et al.

If you like the idea of crafting a bot to help you automatically negotiate for things online then you can have a go at making your own. The researchers have posted the data and code on CodaLab. ®

Source: Those Stanford whiz kids have done it again. Now a chatty AI bot to negotiate sales for you with Craigslist riffraff • The Register

AI lifeline to help devs craft smartmobe apps that suck a whole lot less… battery capacity

Artificial intelligence can help developers design mobile phone apps that drain less battery, according to new research.

The system, dubbed DiffProf, will be presented this week at the USENIX Symposium on Operating Systems Design and Implementation in California. It was developed by Charlie Hu and Abhilash Jindal, who have a startup devoted to better battery testing via software.

DiffProf rests on the assumption that apps that carry out the same function perform similar tasks in slightly different ways. For example, messaging apps like WhatsApp, Google Hangouts, or Skype keep old conversations and bring up a keyboard so replies can be typed and sent. Despite this, WhatsApp is about three times more energy efficient than Skype.

“What if a feature of an app needs to consume 70 percent of the phone’s battery? Is there room for improvement, or should that feature be left the way it is?” said Hu, who is also a professor of electrical and computer engineering at Purdue University.

The research paper describing DiffProf is pretty technical. Essentially, it describes a method that uses “differential energy profiling” to create energy profiles for different apps. First, the researchers carry out a series of automated tests on apps by performing identical tasks on each app to work out energy efficiency.

Next, the profile also considers the app’s “call tree”, also known as a call graph. This describes the different routines that are executed in order to perform a broader task.

Apps that have the same function, like playing music or sending emails, should have similar call trees. Slight variances in the code, however, lead to different energy profiles. DiffProf uses an algorithm to compare the call trees and highlight which routines are causing an app to drain more energy.
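
A simplified illustration of diffing two call trees by accumulated energy cost is sketched below. The tree structure and energy numbers are made up, and the real DiffProf works on profiled call stacks from instrumented Android apps rather than toy dictionaries.

# Toy call trees: method name -> (energy in millijoules, children)
app_a = {"sendMessage": (120, {"compress": (20, {}), "networkPost": (90, {})})}
app_b = {"sendMessage": (310, {"compress": (25, {}), "networkPost": (95, {}),
                               "holdWakeLock": (180, {})})}

def flatten(tree, prefix=""):
    """Flatten a call tree into {dotted.path: energy}."""
    flat = {}
    for name, (energy, children) in tree.items():
        path = prefix + name
        flat[path] = energy
        flat.update(flatten(children, prefix=path + "."))
    return flat

def energy_diff(tree_a, tree_b):
    """Report call paths whose energy differs between two apps doing the same task."""
    a, b = flatten(tree_a), flatten(tree_b)
    for path in sorted(set(a) | set(b)):
        delta = b.get(path, 0) - a.get(path, 0)
        if delta:
            print(f"{path}: {delta:+} mJ")

energy_diff(app_a, app_b)   # holdWakeLock only appears in app B, flagging the energy hog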

Developers running the tool receive a list of Java packages, describing the different software features, that appear in both apps being compared. They can then work out which routines in the less energy-efficient app suck up more juice, and whether they can be altered or deleted altogether. The tool is only useful if the source code of the similar apps has significant overlap.

Source: AI lifeline to help devs craft smartmobe apps that suck a whole lot less… battery capacity • The Register

DoNotPay App Lets You ‘Sue Anyone By Pressing a Button’. Success rate: 50%

A new, free app promises to let you “sue anyone by pressing a button” and have an AI-powered lawyer fight your case.

Do Not Pay, a free service that launched in the iOS App store today, uses IBM Watson-powered artificial intelligence to help people win up to $25,000 in small claims court. It’s the latest project from 21-year-old Stanford senior Joshua Browder, whose service previously allowed people to fight parking tickets or sue Equifax; now, the app has streamlined the process. It’s the “first ever service to sue anyone (in all 3,000 counties in 50 states) by pressing a button.”

The crazy part: the robot lawyer actually wins in court. In its beta testing phase, which included releases in the UK and in select numbers across all 50 US states, Do Not Pay has helped its users get back $16 million in disputed parking tickets. In a phone call with Motherboard, Browder said that the success rate of Do Not Pay is about 50 percent, with average winnings of about $7,000.

[…]

The app works by having a bot ask the user a few basic questions about their legal issue. The bot then uses the answers to classify the case into one of 15 different legal areas, such as breach of contract or negligence. After that, Do Not Pay draws up documents specific to that legal area, and fills in the specific details. Just print it out, mail it to the courthouse, and voilà—you’re a plaintiff. And if you have to show up to court in person, Do Not Pay even creates a script for the plaintiff to read out loud in court.

[…]

Browder told Motherboard that data protection is a central part of his service, which is free (users keep 100 percent of what they win in court, Browder says.) Per Do Not Pay’s privacy policy, all user data is protected with 256-bit encryption, and no third parties get access to personal user information such as home address, email address, or information pertaining to a particular case.

[…]

Of all of Do Not Pay’s legal disputes, Browder told Motherboard that he’s most proud of an instance where a woman took Equifax to court and won, twice. After her data was compromised by Equifax last year, she took the $3 billion company to small claims court and won. When Equifax appealed the verdict and sent a company lawyer to fight for an appeal, the woman won again.

Source: DoNotPay App Lets You ‘Sue Anyone By Pressing a Button’

Introducing MLflow: an Open Source Machine Learning Platform for tracking, projects and models

MLflow is inspired by existing ML platforms, but it is designed to be open in two senses:

  1. Open interface: MLflow is designed to work with any ML library, algorithm, deployment tool or language. It’s built around REST APIs and simple data formats (e.g., a model can be viewed as a lambda function) that can be used from a variety of tools, instead of only providing a small set of built-in functionality. This also makes it easy to add MLflow to your existing ML code so you can benefit from it immediately, and to share code using any ML library that others in your organization can run.
  2. Open source: We’re releasing MLflow as an open source project that users and library developers can extend. In addition, MLflow’s open format makes it very easy to share workflow steps and models across organizations if you wish to open source your code.

MLflow is currently in alpha, but we believe that it already offers a useful framework to work with ML code, and we would love to hear your feedback. In this post, we’ll introduce MLflow in detail and explain its components.

MLflow Alpha Release Components

This first, alpha release of MLflow has three components: MLflow Tracking, MLflow Projects, and MLflow Models.

MLflow Tracking

MLflow Tracking is an API and UI for logging parameters, code versions, metrics and output files when running your machine learning code to later visualize them. With a few simple lines of code, you can track parameters, metrics, and artifacts:

import mlflow

# Log parameters (key-value pairs)
mlflow.log_param("num_dimensions", 8)
mlflow.log_param("regularization", 0.1)

# Log a metric; metrics can be updated throughout the run
mlflow.log_metric("accuracy", 0.1)
...
mlflow.log_metric("accuracy", 0.45)

# Log artifacts (output files)
mlflow.log_artifact("roc.png")
mlflow.log_artifact("model.pkl")

You can use MLflow Tracking in any environment (for example, a standalone script or a notebook) to log results to local files or to a server, then compare multiple runs. Using the web UI, you can view and compare the output of multiple runs. Teams can also use the tools to compare results from different users:

MLflow Tracking UI

MLflow Projects

MLflow Projects provide a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file to specify its dependencies and how to run the code. An MLflow Project is defined by a simple YAML file called MLproject, such as the following:

name: My Project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"

Projects can specify their dependencies through a Conda environment. A project may also have multiple entry points for invoking runs, with named parameters. You can run projects using the mlflow run command-line tool, either from local files or from a Git repository:

mlflow run example/project -P alpha=0.5

mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5

MLflow will automatically set up the right environment for the project and run it. In addition, if you use the MLflow Tracking API in a Project, MLflow will remember the project version executed (that is, the Git commit) and any parameters. You can then easily rerun the exact same code.

The project format makes it easy to share reproducible data science code, whether within your company or in the open source community. Coupled with MLflow Tracking, MLflow Projects provides great tools for reproducibility, extensibility, and experimentation.

MLflow Models

MLflow Models is a convention for packaging machine learning models in multiple formats called “flavors”. MLflow offers a variety of tools to help you deploy different flavors of models. Each MLflow Model is saved as a directory containing arbitrary files and an MLmodel descriptor file that lists the flavors it can be used in.

time_created: 2018-02-21T13:21:34.12
flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
    pickled_model: model.pkl

In this example, the model can be used with tools that support either the sklearn or python_function model flavors.

MLflow provides tools to deploy many common model types to diverse platforms. For example, any model supporting the python_function flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models as artifacts using the Tracking API, MLflow will also automatically remember which Project and run they came from.
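
For completeness, here is a minimal sketch of logging a scikit-learn model so that it is saved with both of the flavors shown above. This follows the documented mlflow.sklearn API, though exact behaviour may differ slightly in the alpha release.

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Writes a model directory whose MLmodel file advertises both the
    # sklearn and python_function flavors, as in the example above.
    mlflow.sklearn.log_model(model, "model")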

Getting Started with MLflow

To get started with MLflow, follow the instructions at mlflow.org or check out the alpha release code on GitHub. We are excited to hear your feedback on the concepts and code!

Source: Introducing MLflow: an Open Source Machine Learning Platform – The Databricks Blog

Building your own PC for AI is 10x cheaper than renting out GPUs on cloud, apparently

Jeff Chen, an AI techie and entrepreneur at Stanford University in the US, believes that a suitable machine can be built for about $3,000 (~£2,300) without including tax. At the heart of the beast is an Nvidia GeForce 1080Ti GPU, a 12-core AMD Threadripper processor, 64GB of RAM, and a 1TB SSD for data. Bung in a fan to keep the computer cool, a motherboard, a power supply, wrap the whole thing in a case, and voila.

Here’s the full checklist…


Image credit: Jeff Chen

Unlike renting compute and data storage in the cloud, once your personal rig is built, the only recurring cost to pay for is power. It costs $3 (£2.28) an hour to rent a GPU-accelerated system on AWS, whereas it’s only 20 cents (15p) an hour to run your own computer. Chen has done the sums and, apparently, after two months that works out to be ten times cheaper. The gap decreases slightly over time as the computer hardware depreciates.
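
A quick back-of-the-envelope check of those numbers, using the hourly figures quoted above, looks like this; how quickly you reach the break-even point depends entirely on utilization.

BUILD_COST = 3000.0   # one-off hardware cost, USD, per the parts list above
OWN_RATE = 0.20       # estimated running cost per hour on your own rig, USD
CLOUD_RATE = 3.00     # renting a GPU-accelerated instance on AWS, USD per hour

print(f"Hourly running-cost ratio: {CLOUD_RATE / OWN_RATE:.0f}x")                # 15x
break_even_hours = BUILD_COST / (CLOUD_RATE - OWN_RATE)
print(f"Hardware pays for itself after about {break_even_hours:.0f} GPU-hours")  # ~1,071 hours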

“There are some drawbacks, such as slower download speed to your machine because it’s not on the backbone, static IP is required to access it away from your house, you may want to refresh the GPUs in a couple of years, but the cost savings is so ridiculous it’s still worth it,” he said this week.

Source: Building your own PC for AI is 10x cheaper than renting out GPUs on cloud, apparently • The Register

Google is using AI to predict floods in India and warn users

For years Google has warned users about natural disasters by incorporating alerts from government agencies like FEMA into apps like Maps and Search. Now, the company is making predictions of its own. As part of a partnership with the Central Water Commission of India, Google will now alert users in the country about impending floods. The service is only currently available in the Patna region, with the first alert going out earlier this month.

As Google’s engineering VP Yossi Matias outlines in a blog post, these predictions are being made using a combination of machine learning, rainfall records, and flood simulations.

“A variety of elements — from historical events, to river level readings, to the terrain and elevation of a specific area — feed into our models,” writes Matias. “With this information, we’ve created river flood forecasting models that can more accurately predict not only when and where a flood might occur, but the severity of the event as well.”

Source: Google is using AI to predict floods in India and warn users – The Verge

Hadoop and NoSQL backups timed by AI

Machine learning data management company Imanis Data has introduced an autonomous backup product powered by machine learning.

The firm said users can specify a desired RPO (Recovery Point Objective) and its SmartPolicies tech then sets up the backup schedules. The tech is delivered as an upgrade to the Imanis Data Management Platform (IDMP) product.

SmartPolicies uses metrics including criticality and volume of data to be protected, primary cluster workloads, and daily or seasonal resource utilisation, to determine the most efficient way to achieve the desired RPO.

If it can’t be met because, for example, production systems are too busy, or computing resources are insufficient, then SmartPolicies provides recommendations to make the RPO executable.

Other items in the upgrade include any-point-in-time recovery for multiple NoSQL databases, better ransomware prevention and general data management improvements, such as job tag listing and a browsable catalog for simpler recovery.

[…]

Having backup software set up its own schedules based on input RPO values isn’t a new idea, but having it done with machine learning is. The checking of available resources is a darn good idea too and, when you think about it, absolutely necessary.

Otherwise “backup run failed” messages would start popping up all over the place – not good. We expect other backup suppliers to follow in Imanis’s wake and start sporting “machine learning-driven policy” messages quite quickly.

Source: When should I run backup, robot overlord? Autonomous Hadoop and NoSQL backup is now a thing • The Register

AI’s ‘deep-fake’ vids surge ahead in realism

Researchers from Carnegie Mellon University and Facebook Reality Lab are presenting Recycle-GAN, a generative adversarial system for “unsupervised video retargeting” this week at the European Conference on Computer Vision (ECCV) in Germany.

Unlike most methods, Recycle-GAN doesn’t rely on learning an explicit mapping between the images in a source and target video to perform a face swap. Instead, it’s an unsupervised learning method that begins to line up the frames from both videos based on “spatial and temporal information”.

In other words, the content that is transferred from one video to another relies not only on mapping the space but also on the order of the frames, to make sure both are in sync. The researchers use the comedians Stephen Colbert and John Oliver as an example. Colbert is made to look like he is delivering the same speech as Oliver, as his face is used to mimic the small movements of Oliver’s head nodding or his mouth speaking.

Here’s one where John Oliver is turned into a cartoon character.

It’s not just faces; Recycle-GAN can be used for other scenarios too. Other examples include syncing up different flowers so they appear to bloom and die at the same time.

The researchers also play around with wind conditions, turning what looks like a soft breeze blowing into the trees into a more windy day without changing the background.

“I think there are a lot of stories to be told,” said Aayush Bansal, co-author of the research and a PhD student at CMU. “It’s a tool for the artist that gives them an initial model that they can then improve,” he added.

Recycle-GAN might prove useful in other areas. Simulating various effects for video footage taken from self-driving cars could help them drive under different conditions.

“Such effects might be useful in developing self-driving cars that can navigate at night or in bad weather,” Bansal said. These videos might be difficult to obtain or tedious to label, but it’s something Recycle-GAN might be able to generate automatically.

Source: The eyes don’t have it! AI’s ‘deep-fake’ vids surge ahead in realism • The Register

Wow, great invention: Now AI eggheads teach machines how to be sarcastic using Reddit

It’s tricky. Computers have to follow what is being said by whom, the context of the conversation and often some real world facts to understand cultural references. Feeding machines single sentences is often ineffective; it’s a difficult task for humans to detect if individual remarks are cheeky too.

The researchers, therefore, built a system designed to inspect individual sentences as well as the ones before and after them. The model is made up of several bidirectional long short-term memory networks (BiLSTMs) stitched together, and was accurate at spotting a sarcastic comment about 70 per cent of the time.

“Typical LSTMs read and encode the data – a sentence – from left to right. BiLSTMs will process the sentence in a left to right and right to left manner,” Reza Ghaeini, coauthor of the research on arXiv and a PhD student at Oregon State University, explained to The Register this week.

“The outcome of the BiLSTM for each position is the concatenation of forward and backward encodings of each position. Therefore, now each position contains information about the whole sentence (what is seen before and what will be seen after).”
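
In PyTorch terms, the concatenation Ghaeini describes falls out of the bidirectional flag on a standard LSTM; the dimensions below are arbitrary and this is not the authors’ model.

import torch
import torch.nn as nn

embedding_dim, hidden_dim = 100, 64
bilstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim,
                 batch_first=True, bidirectional=True)

# A batch of one sentence, 12 word positions, random stand-in embeddings.
sentence = torch.randn(1, 12, embedding_dim)
outputs, _ = bilstm(sentence)

# Each position now carries the forward and backward encodings concatenated,
# i.e. information about the words before it and the words after it.
print(outputs.shape)    # torch.Size([1, 12, 128]) -> 2 * hidden_dim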

So, where’s the best place to learn sarcasm? Reddit’s message boards, of course. The dataset known as SARC – geddit? – contains hundreds of thousands of sarcastic and non-sarcastic comments and responses.

“It is quite difficult for both machines and humans to distinguish sarcasm without context,” Mikhail Khodak, a graduate student at Princeton who helped compile SARC, previously told El Reg.

“One of the advantages of our corpus is that we provide the text preceding each statement as well as the author of the statement, so algorithms can see whether it is sarcastic in the context of the conversation or in the context of the author’s past statements.”

Source: Wow, great invention: Now AI eggheads teach machines how to be sarcastic using Reddit • The Register

Facebook creates an AI-based tool to automate bug fixes

SapFix, which is still under development, is designed to generate fixes automatically for specific bugs before sending them to human engineers for approval.

Facebook, which announced the tool today ahead of its @Scale conference in San Jose, California, for developers building large-scale systems and applications, calls SapFix an “AI hybrid tool.” It uses artificial intelligence to automate the creation of fixes for bugs that have been identified by its software testing tool Sapienz, which is already being used in production.

SapFix will eventually be able to operate independently from Sapienz, but for now it’s still a proof-of-concept that relies on the latter tool to pinpoint bugs first of all.

SapFix can fix bugs in a number of ways, depending on how complex they are, Facebook engineers Yue Jia, Ke Mao and Mark Harman wrote in a blog post announcing the tools. For simpler bugs, SapFix creates patches that revert the code submission that introduced them. In the case of more complicated bugs, SapFix uses a collection of “templated fixes” that were created by human engineers based on previous bug fixes.

And in case those human-designed template fixes aren’t up to the job, SapFix will then attempt what’s called a “mutation-based fix,” which works by continually making small modifications to the code that caused the software to crash, until a solution is found.

SapFix goes further by generating multiple potential fixes for each bug, then submits these for human evaluation. It also performs tests on each of these fixes so engineers can see if they might cause other problems, such as compilation errors and other crashes somewhere else.
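
The mutation-based strategy can be caricatured as a loop like the one below. SapFix operates on real abstract syntax trees, Sapienz-generated crash reproducers and Facebook-scale test suites, so treat this purely as a toy illustration of the idea.

import random

def run_tests(source):
    """Stand-in for the crash-reproducing test: passes only if the bad call is gone."""
    return "crashing_call()" not in source

# A hypothetical pool of small code mutations.
MUTATIONS = [
    lambda src: src.replace("crashing_call()", "pass  # offending call removed"),
    lambda src: src.replace("items[i]", "items[i] if i < len(items) else None"),
    lambda src: src.replace("== None", "is None"),
]

def mutation_based_fix(source, max_attempts=50):
    """Keep applying small mutations until the failing test passes."""
    for _ in range(max_attempts):
        candidate = random.choice(MUTATIONS)(source)
        if run_tests(candidate):
            return candidate        # a patch that would be submitted for human review
    return None

buggy = "def handler(items, i):\n    crashing_call()\n    return items[i]\n"
print(mutation_based_fix(buggy))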

Source: Facebook creates an AI-based tool to automate bug fixes – SiliconANGLE

Social Mapper – A Social Media Mapping Tool that correlates profiles via facial recognition

Social Mapper is an Open Source Intelligence Tool that uses facial recognition to correlate social media profiles across different sites on a large scale. It takes an automated approach to searching popular social media sites for targets’ names and pictures to accurately detect and group a person’s presence, outputting the results into a report that a human operator can quickly review.

Social Mapper has a variety of uses in the security industry, for example the automated gathering of large numbers of social media profiles for use in targeted phishing campaigns. Facial recognition aids this process by removing false positives in the search results, so that reviewing this data is quicker for a human operator.

https://github.com/SpiderLabs/social_mapper

 

AI builds wiki entries for people that aren’t on it but should be

Human-generated knowledge bases like Wikipedia have a recall problem. First, there are the articles that should be there but are entirely missing. The unknown unknowns.

Consider Joelle Pineau, the Canadian roboticist bringing scientific rigor to artificial intelligence and who directs Facebook’s new AI Research lab in Montreal. Or Miriam Adelson, an actively publishing addiction treatment researcher who happens to be a billionaire by marriage and a major funder of her own field. Or Evelyn Wang, the new head of MIT’s revered MechE department whose accomplishments include a device that generates drinkable water from sunlight and desert air. When I wrote this a few days ago, none of them had articles on English Wikipedia, though they should by any measure of notability.

(Pineau is up now thanks to my friend and fellow science crusader Jess Wade who created an article just hours after I told her about Pineau’s absence. And if the internet is in a good mood, someone will create articles for the other two soon after this post goes live.)

But I didn’t discover those people on my own. I used a machine learning system we’re building at Primer. It discovered and described them for me. It does this much as a human would, if a human could read 500 million news articles, 39 million scientific papers, all of Wikipedia, and then write 70,000 biographical summaries of scientists.

[…]

We are publicly releasing free-licensed data about scientists that we’ve been generating along the way, starting with 30,000 computer scientists. Only 15% of them are known to Wikipedia. The data set includes 1 million news sentences that quote or describe the scientists, metadata for the source articles, a mapping to their published work in the Semantic Scholar Open Research Corpus, and mappings to their Wikipedia and Wikidata entries. We will revise and add to that data as we go. (Many thanks to Oren Etzioni and AI2 for data and feedback.) Our aim is to help the open data research community build better tools for maintaining Wikipedia and Wikidata, starting with scientific content.

Fluid Knowledge

We trained Quicksilver’s models on 30,000 English Wikipedia articles about scientists, their Wikidata entries, and over 3 million sentences from news documents describing them and their work. Then we fed in the names and affiliations of 200,000 authors of scientific papers.

In the morning we found 40,000 people missing from Wikipedia who have a similar distribution of news coverage as those who do have articles. Quicksilver doubled the number of scientists potentially eligible for a Wikipedia article overnight.

It also revealed the second flavor of the recall problem that plagues human-generated knowledge bases: information decay. For most of those 30,000 scientists who are on English Wikipedia, Quicksilver identified relevant information that was missing from their articles.

Source: Primer | Machine-Generated Knowledge Bases

AI identifies heat-resistant coral reefs in Indonesia

A recent scientific survey off the coast of Sulawesi Island in Indonesia suggests that some shallow water corals may be less vulnerable to global warming than previously thought.

Between 2014 and 2017, the world’s reefs endured the worst coral bleaching event in history, as the cyclical El Niño climate event combined with anthropogenic warming to cause unprecedented increases in water temperature.

But the June survey, funded by Microsoft co-founder Paul Allen’s family foundation, found the Sulawesi reefs were surprisingly healthy.

In fact, the reefs didn’t appear to have declined significantly in condition since they were originally surveyed in 2014 – a surprise for British scientist Dr Emma Kennedy, who led the research team.

A combination of 360-degree imaging tech and Artificial Intelligence (AI) allowed scientists to gather and analyse more than 56,000 images of shallow water reefs. Over the course of a six-week voyage, the team deployed underwater scooters fitted with 360-degree cameras that allowed them to photograph up to 1.5 miles of reef per dive, covering 1,487 square miles in total.

Researchers at the University of Queensland in Australia then used cutting edge AI software to handle the normally laborious process of identifying and cataloguing the reef imagery. Using the latest Deep Learning tech, they ‘taught’ the AI how to detect patterns in the complex contours and textures of the reef imagery and thus recognise different types of coral and other reef invertebrates.

Once the AI had been shown between 400 and 600 images, it was able to process images autonomously. Says Dr Kennedy, “the use of AI to rapidly analyse photographs of coral has vastly improved the efficiency of what we do — what would take a coral reef scientist 10 to 15 minutes now takes the machine a few seconds.”

Source: AI identifies heat-resistant coral reefs in Indonesia | Environment | The Guardian

MS Sketch2Code uses AI to convert a picture of a wireframe to HTML – download and try

Description

Sketch2Code is a solution that uses AI to transform a handwritten user interface design from a picture to a valid HTML markup code.

Process flow

The process this solution implements to transform a handwritten image into HTML is as follows:

  1. The user uploads an image through the website.
  2. A custom vision model predicts what HTML elements are present in the image and their location.
  3. A handwritten text recognition service reads the text inside the predicted elements.
  4. A layout algorithm uses the spatial information from all the bounding boxes of the predicted elements to generate a grid structure that accommodates all.
  5. An HTML generation engine uses all these pieces of information to generate an HTML markup code reflecting the result.
Sketch2Code GitHub: https://github.com/Microsoft/ailab/tree/master/Sketch2Code

AI sucks at stopping online trolls spewing toxic comments

A group of researchers from Aalto University and the University of Padua found this out when they tested seven state-of-the-art models used to detect hate speech. All of them failed to recognize foul language when subtle changes were made, according to a paper [PDF] on arXiv.

Adversarial examples can be created automatically by using algorithms to misspell certain words, swap characters for numbers, add random spaces between words, or attach innocuous words such as ‘love’ to sentences.
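
The perturbations are trivially easy to script, which is part of the problem. A toy version might look like the following; the substitution table and word choices are illustrative, not taken from the paper.

import random

LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}

def misspell(word):
    """Drop one interior character, e.g. 'terrible' -> 'terible'."""
    if len(word) < 4:
        return word
    i = random.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]

def leetify(word):
    """Swap characters for look-alike numbers."""
    return "".join(LEET.get(c, c) for c in word)

def space_out(word):
    """Insert a space to break the word boundary."""
    mid = len(word) // 2
    return word[:mid] + " " + word[mid:] if mid else word

def adversarial(sentence):
    words = [random.choice([misspell, leetify, space_out])(w) for w in sentence.split()]
    words.append("love")    # appending innocuous words also lowers toxicity scores
    return " ".join(words)

print(adversarial("you are a terrible person"))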

The models failed to pick up on the adversarial examples, which successfully evaded detection. These tricks wouldn’t fool humans, but machine learning models are easily blindsided. They can’t readily adapt to new information beyond what’s been spoonfed to them during the training process.

“They perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech,” the paper’s abstract states.

Source: AI sucks at stopping online trolls spewing toxic comments • The Register

​Google just put an AI in charge of keeping its data centers cool

Google is putting an artificial intelligence system in charge of its data center cooling after the system proved it could cut energy use.

Now Google and its AI company DeepMind are taking the project further; instead of recommendations being implemented by human staff, the AI system is directly controlling cooling in the data centers that run services including Google Search, Gmail and YouTube.

“This first-of-its-kind cloud-based control system is now safely delivering energy savings in multiple Google data centers,” Google said.

Data centers use vast amounts of energy, and as the demand for cloud computing rises, even small tweaks to areas like cooling can produce significant time and cost savings. Google’s decision to use its own DeepMind-created system is also a good plug for its AI business.

Every five minutes, the AI pulls a snapshot of the data center cooling system from thousands of sensors. This data is fed into deep neural networks, which predict how different choices will affect future energy consumption.

The AI system then identifies tweaks that could reduce energy consumption, which are then sent back to the data center, checked by the local control system and implemented.
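
Conceptually the loop is simple even if the models behind it are not. A schematic version of one pass, with every callable below a made-up stand-in rather than Google’s or DeepMind’s code, might look like this.

def control_step(read_sensors, propose_actions, predict_energy, apply_action, is_safe):
    """One pass of the five-minute cycle described above."""
    snapshot = read_sensors()                       # thousands of sensor readings
    candidates = propose_actions(snapshot)          # candidate cooling tweaks
    best = min(candidates, key=predict_energy)      # lowest predicted energy use
    if is_safe(best):                               # the local control system still checks it
        apply_action(best)

# Toy demonstration with made-up numbers:
control_step(
    read_sensors=lambda: {"inlet_temp_c": 27.0},
    propose_actions=lambda snapshot: [{"setpoint_c": 26.0}, {"setpoint_c": 27.5}],
    predict_energy=lambda action: abs(action["setpoint_c"] - 27.0),   # pretend model
    apply_action=lambda action: print("applying", action),
    is_safe=lambda action: 18.0 <= action["setpoint_c"] <= 30.0,
)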

Google said giving the AI more responsibility came at the request of its data center operators who said that implementing the recommendations from the AI system required too much effort and supervision.

“We wanted to achieve energy savings with less operator overhead. Automating the system enabled us to implement more granular actions at greater frequency, while making fewer mistakes,” said Google data center operator Dan Fuenffinger.

Source: ​Google just put an AI in charge of keeping its data centers cool | ZDNet

How AI Can Spot Exam Cheats and Raise Standards

AI is being deployed by those who set and mark exams to reduce fraud — which remains overall a small problem — and to create far greater efficiencies in preparation and marking, and to help improve teaching and studying. From a report, which may be paywalled: From traditional paper-based exam and textbook producers such as Pearson, to digital-native companies such as Coursera, online tools and artificial intelligence are being developed to reduce costs and enhance learning. For years, multiple-choice tests have allowed scanners to score results without human intervention. Now technology is coming directly into the exam hall. Coursera has patented a system to take images of students and verify their identity against scanned documents. There are plagiarism detectors that can scan essay answers and search the web — or the work of other students — to identify copying. Webcams can monitor exam locations to spot malpractice. Even when students are working, they provide clues that can be used to clamp down on cheats. They leave electronic “fingerprints” such as keyboard pressure, speed and even writing style. Emily Glassberg Sands, Coursera’s head of data science, says: “We can validate their keystroke signatures. It’s difficult to prepare for someone hell-bent on cheating, but we are trying every way possible.”

Source: How AI Can Spot Exam Cheats and Raise Standards – Slashdot

Oi, clickbait cop bot, jam this in your neural net: Hot new AI threatens to DESTROY web journos

Artificially intelligent software has been trained to detect and flag up clickbait headlines.

And here at El Reg we say thank God Larry Wall for that. What the internet needs right now is software to highlight and expunge dodgy article titles about space alien immigrants, faked moon landings, and the like.

Machine-learning eggheads continue to push the boundaries of natural language processing, and have crafted a model that can, supposedly, detect how clickbait-y a headline really is.

The system uses a convolutional neural network that converts the words in a submitted article title into vectors. These numbers are fed into a long short-term memory network that spits out a score based on the headline’s clickbait strength. About eight times out of ten it agreed with humans on whether a title was clickbaity or not, we’re told.
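
Architecturally that is a small stack. A hedged Keras sketch of the same shape is below; the vocabulary size, sequence length and layer widths are guesses, not the values from the paper.

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, max_headline_len = 20000, 20    # illustrative values

model = keras.Sequential([
    keras.Input(shape=(max_headline_len,)),
    layers.Embedding(vocab_size, 100),                     # words -> vectors
    layers.Conv1D(64, kernel_size=3, activation="relu"),   # convolutional stage
    layers.LSTM(64),                                       # long short-term memory stage
    layers.Dense(1, activation="sigmoid"),                 # clickbait score between 0 and 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()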

The trouble is, what exactly is a clickbait headline? It’s a tough question. The AI’s team – from the International Institute of Information Technology in Hyderabad, the Manipal Institute of Technology, and Birla Institute of Technology, in India – decided to rely on the venerable Merriam-Webster dictionary to define clickbait.

Source: Oi, clickbait cop bot, jam this in your neural net: Hot new AI threatens to DESTROY web journos • The Register

Windows 10 now uses machine learning to stop updates installing when a PC is in use

One of the more frustrating aspects of Windows 10 is the operating system’s ability to start installing updates when you’re in the middle of using it. While Microsoft has tried to address this aggressive approach to updates with features to snooze installation, Windows 10 users continue to complain that updates reboot devices when they’re in use.

Reacting to this feedback, Microsoft says it’s aware of the issues. “We heard you, and to alleviate this pain, if you have an update pending we’ve updated our reboot logic to use a new system that is more adaptive and proactive,” explains Microsoft’s Windows Insider chief Dona Sarkar. Microsoft says it has trained a “predictive model” that will accurately predict when the best time to restart the device is thanks to machine learning. “We will not only check if you are currently using your device before we restart, but we will also try to predict if you had just left the device to grab a cup of coffee and return shortly after,” says Sarkar.

Microsoft has been testing this new model internally, and says it has seen “promising results.”

Source: Windows 10 now uses machine learning to stop updates installing when a PC is in use – The Verge

Yet another great reason to not use Windows 10

AI can untangle the jumble of neurons packed in brain scans

AI can help neurologists automatically map the connections between different neurons in brain scans, a tedious task that can take hundreds of thousands of hours.

In a paper published in Nature Methods, AI researchers from Google collaborated with scientists from the Max Planck Institute of Neurobiology to inspect the brain of a Zebra Finch, a small Australian bird renowned for its singing.

Although the contents of their craniums are small, Zebra Finches aren’t birdbrains: their connectome is densely packed with neurons. To study the connections, scientists examine a slice of the brain using an electron microscope. It requires high resolution to make out all the different neurites, the projections extending from nerve cells.

The neural circuits then have to be reconstructed by tracing out the cells. There are several methods that help neurologists flesh these out, but the error rates are high and it still requires human expertise to look over the maps. It’s a painstaking chore: a cubic millimetre of brain tissue can generate over 1,000 terabytes of data.

“A recent estimate put the amount of human labor needed to reconstruct a 100³ µm³ volume at more than 100,000 h, even with an optimized pipeline,” according to the paper.

Now, AI researchers have developed a new method using a recurrent convolutional neural network known as a “flood-filling network”. It’s essentially an algorithm that finds the edges of a neuron path and fleshes out the space in between to build up a map of the different connections.

Here’s a video showing how they work.

“The algorithm is seeded at a specific pixel location and then iteratively “fills” a region using a recurrent convolutional neural network that predicts which pixels are part of the same object as the seed,” said Viren Jain and Michal Januszewski, co-authors of the paper and AI researchers at Google.
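
Stripped of the neural network, the “fill from a seed” idea is an iterative region-growing loop. The toy version below uses a plain intensity threshold where the real system uses a recurrent CNN’s predictions, so it is only meant to show the shape of the algorithm.

import numpy as np
from collections import deque

def flood_fill_segment(image, seed, threshold=0.5):
    """Grow a segment from seed, accepting neighbours the 'model' approves of.

    Here the acceptance test is a simple threshold; in a flood-filling network
    it is a recurrent CNN predicting whether each pixel belongs to the same
    neurite as the seed."""
    mask = np.zeros(image.shape, dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if mask[y, x] or image[y, x] < threshold:
            continue
        mask[y, x] = True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < image.shape[0] and 0 <= nx < image.shape[1]:
                queue.append((ny, nx))
    return mask

image = np.random.rand(64, 64)
print(flood_fill_segment(image, seed=(32, 32)).sum(), "pixels in the segment")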

The flood-filling network was trained using supervised learning on a small region of a Zebra Finch brain complete with annotations. It’s difficult to measure the accuracy of the network, and instead the researchers use an “expected run length” (ERL) metric that measures how far it can trace out a neuron before making a mistake.

Flood-filling networks have a longer ERL than other deep learning methods that have also been tested on the same dataset. The algorithms were better than humans at identifying dendritic spines, tiny threads jutting off dendrites that help transmit electrical signals to cells. But the level of recall, a property measuring the completeness of the map, was much lower than data collected by a professional neurologist.

Another significant disadvantage of this approach is the high computational cost. “For example, a single pass of the fully convolutional FFN over a full volume is an order of magnitude more computationally expensive than the more traditional 3D convolution-pooling architecture in the baseline approach we used for comparison,” the researchers said.

Source: AI can untangle the jumble of neurons packed in brain scans • The Register

AI plus a chemistry robot finds all the reactions that will work

Lee Cronin, the researcher who organized the work, was kind enough to send along an image of the setup, which looks nothing like our typical conception of a robot (the researchers refer to it as “bespoke”). Most of its parts are dispersed through a fume hood, which ensures safe ventilation of any products that somehow escape the system. The upper right is a collection of tanks containing starting materials and pumps that send them into one of six reaction chambers, which can be operated in parallel.

The robot in question. MS = Mass Spectrometer; IR = Infrared Spectrometer. Image credit: Lee Cronin

The outcomes of these reactions can then be sent on for analysis. Pumps can feed samples into an IR spectrometer, a mass spectrometer, and a compact NMR machine—the latter being the only bit of equipment that didn’t fit in the fume hood. Collectively, these can create a fingerprint of the molecules that occupy a reaction chamber. By comparing this to the fingerprint of the starting materials, it’s possible to determine whether a chemical reaction took place and infer some things about its products.

All of that is a substitute for a chemist’s hands, but it doesn’t replace the brains that evaluate potential reactions. That’s where a machine-learning algorithm comes in. The system was given a set of 72 reactions with known products and used those to generate predictions of the outcomes of further reactions. From there, it started choosing reactions at random from the remaining list of options and determining whether they, too, produced products. By the time the algorithm had sampled 10 percent of the total possible reactions, it was able to predict the outcome of untested reactions with more than 80-percent accuracy.

And, since the earlier reactions it tested were chosen at random, the system wasn’t biased by human expectations of what reactions would or wouldn’t work.

Once it had built a model, the system was set up to evaluate which of the remaining possible reactions was most likely to produce products and prioritize testing those. The system could continue on until it reached a set number of reactions, stop after a certain number of tests no longer produced products, or simply go until it tested every possible reaction.
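
The explore-then-prioritise loop the team describes is a classic active-learning pattern. A schematic sketch with scikit-learn is below; the “fingerprints” are random feature vectors and the reactivity rule is invented, so this only mirrors the shape of the workflow, not the chemistry.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_reactions = 1000
features = rng.random((n_reactions, 16))              # stand-in reaction fingerprints
reactive = features[:, 0] + features[:, 1] > 1.0      # hidden 'ground truth' for the demo

def run_on_robot(idx):
    """Stand-in for physically running a reaction and checking the spectra."""
    return reactive[idx]

# Phase 1: physically test a random 10 percent of the reaction space.
tested = list(rng.choice(n_reactions, size=n_reactions // 10, replace=False))
outcomes = [run_on_robot(i) for i in tested]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(features[tested], outcomes)

# Phase 2: rank untested reactions by predicted probability of producing a product
# and send the most promising ones to the robot next.
tested_set = set(tested)
untested = [i for i in range(n_reactions) if i not in tested_set]
scores = model.predict_proba(features[untested])[:, 1]
print("next reactions to try:", [untested[i] for i in np.argsort(scores)[::-1][:5]])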

Neural networking

Not content with this degree of success, the research team went on to add a neural network that was provided with data from the research literature on the yield of a class of reactions that links two hydrocarbon chains. After training on nearly 3,500 reactions, the system had an error of only 11 percent when predicting the yield on another 1,700 reactions from the literature.

This system was then integrated with the existing test setup and set loose on reactions that hadn’t been reported in the literature. This allowed the system to prioritize not only by whether the reaction was likely to make a product but also how much of the product would be produced by the reaction.

All this, on its own, is pretty impressive. As the authors put it, “by realizing only 10 percent of the total number of reactions, we can predict the outcomes of the remaining 90 percent without needing to carry out the experiments.” But the system also helped them identify a few surprises—cases where the fingerprint of the reaction mix suggested that the product was something more than a simple combination of starting materials. These reactions were explored further by actual human chemists, who identified both ring-breaking and ring-forming reactions this way.

That last aspect really goes a long way toward explaining how this sort of capability will fit into future chemistry labs. People tend to think of robots as replacing humans. But in this context, the robots are simply taking some of the drudgery away from humans. No sane human would ever consider trying every possible combination of reactants to see what they’d do, and humans couldn’t perform the testing 24 hours a day without dangerous levels of caffeine anyway. The robots will also be good at identifying the rare cases where highly trained intuitions turn out to lead us astray about the utility of trying some reactions.

Source: AI plus a chemistry robot finds all the reactions that will work | Ars Technica