ETSI launches specification group on Securing Artificial Intelligence

ETSI is pleased to announce the creation of a new Industry Specification Group on Securing Artificial Intelligence (ISG SAI). The group will develop technical specifications to mitigate threats arising from the deployment of AI throughout multiple ICT-related industries. This includes threats to artificial intelligence systems from both conventional sources and other AIs.

The ETSI Securing Artificial Intelligence group was created in anticipation that autonomous mechanical and computing entities may make decisions that act against the relying parties, either by design or as a result of malicious intent. The conventional cycle of network risk analysis and countermeasure deployment, represented by the Identify-Protect-Detect-Respond cycle, needs to be reassessed when an autonomous machine is involved.

The intent of the ISG SAI is therefore to address three aspects of artificial intelligence in the standards domain:

  • Securing AI from attack, e.g. where AI is a component in a system that needs defending
  • Mitigating against AI, e.g. where AI is the ‘problem’ or is used to improve and enhance other, more conventional attack vectors
  • Using AI to enhance security measures against attack, e.g. where AI is part of the ‘solution’ or is used to improve and enhance more conventional countermeasures.

The purpose of the ETSI ISG SAI is to develop the technical knowledge that acts as a baseline in ensuring that artificial intelligence is secure. Stakeholders impacted by the activity of ETSI’s group include end users, manufacturers, operators and governments.

Source: ETSI – ETSI launches specification group on Securing Artificial Intelligence

AI equal with human experts in medical diagnosis with images, study finds

Artificial intelligence is on a par with human experts when it comes to making medical diagnoses based on images, a review has found.

The potential for artificial intelligence in healthcare has caused excitement, with advocates saying it will ease the strain on resources, free up time for doctor-patient interactions and even aid the development of tailored treatment. Last month the government announced £250m of funding for a new NHS artificial intelligence laboratory.

However, experts have warned the latest findings are based on a small number of studies, since the field is littered with poor-quality research.

One burgeoning application is the use of AI in interpreting medical images – a field that relies on deep learning, a sophisticated form of machine learning in which a series of labelled images are fed into algorithms that pick out features within them and learn how to classify similar images. This approach has shown promise in diagnosis of diseases from cancers to eye conditions.
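For readers curious what “labelled images are fed into algorithms that pick out features” looks like in practice, here is a minimal, hedged sketch: a tiny convolutional classifier in PyTorch trained on made-up data. It is purely illustrative and not one of the diagnostic systems the review covers.

```python
# Minimal sketch of a deep-learning image classifier of the kind described
# above. Illustrative only: a tiny PyTorch CNN on fake grayscale "scans",
# not any of the diagnostic systems evaluated in the review.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Convolutional layers pick out low-level features (edges, textures)
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A linear head maps the pooled features to class scores
        self.head = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)           # (batch, 32, 16, 16) for 64x64 inputs
        return self.head(x.flatten(1))

model = TinyClassifier()
dummy_batch = torch.randn(4, 1, 64, 64)    # four fake 64x64 "scans"
labels = torch.tensor([0, 1, 0, 1])        # fake disease / no-disease labels
loss = nn.CrossEntropyLoss()(model(dummy_batch), labels)
loss.backward()                            # one training step's worth of gradients
```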

However, questions remain about how such deep learning systems measure up to human skills. Now researchers say they have conducted the first comprehensive review of published studies on the issue, and found humans and machines are on a par.

Prof Alastair Denniston, at the University Hospitals Birmingham NHS foundation trust and a co-author of the study, said the results were encouraging but the study was a reality check for some of the hype about AI.

Dr Xiaoxuan Liu, the lead author of the study and from the same NHS trust, agreed. “There are a lot of headlines about AI outperforming humans, but our message is that it can at best be equivalent,” she said.

Writing in the Lancet Digital Health, Denniston, Liu and colleagues reported how they focused on research papers published since 2012 – a pivotal year for deep learning.

An initial search turned up more than 20,000 relevant studies. However, only 14 studies – all based on human disease – reported good quality data, tested the deep learning system with images from a separate dataset to the one used to train it, and showed the same images to human experts.

The team pooled the most promising results from within each of the 14 studies to reveal that deep learning systems correctly detected a disease state 87% of the time – compared with 86% for healthcare professionals – and correctly gave the all-clear 93% of the time, compared with 91% for human experts.
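Those two pairs of figures are, in effect, pooled sensitivity and specificity. As a quick illustration of how such numbers are computed, here is a small sketch with made-up confusion-matrix counts (not the study's data):

```python
# How figures like "correctly detected a disease state 87% of the time"
# (sensitivity) and "correctly gave the all-clear 93% of the time"
# (specificity) are computed. The counts below are made up for illustration.
true_positives, false_negatives = 87, 13   # diseased cases: caught vs missed
true_negatives, false_positives = 93, 7    # healthy cases: cleared vs wrongly flagged

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
print(f"sensitivity={sensitivity:.0%}, specificity={specificity:.0%}")
# sensitivity=87%, specificity=93%
```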

However, the healthcare professionals in these scenarios were not given the additional patient information they would have in the real world, which could steer their diagnosis.

Prof David Spiegelhalter, the chair of the Winton centre for risk and evidence communication at the University of Cambridge, said the field was awash with poor research.

“This excellent review demonstrates that the massive hype over AI in medicine obscures the lamentable quality of almost all evaluation studies,” he said. “Deep learning can be a powerful and impressive technique, but clinicians and commissioners should be asking the crucial question: what does it actually add to clinical practice?”

Source: AI equal with human experts in medical diagnosis, study finds | Technology | The Guardian

No Bones about It: People Recognize Objects by Visualizing Their “Skeletons”

Humans effortlessly know that a tree is a tree and a dog is a dog no matter the size, color or angle at which they’re viewed. In fact, identifying such visual elements is one of the earliest tasks children learn. But researchers have struggled to determine how the brain does this simple evaluation. As deep-learning systems have come to master this ability, scientists have started to ask whether computers analyze data—and particularly images—similarly to the human brain. “The way that the human mind, the human visual system, understands shape is a mystery that has baffled people for many generations, partly because it is so intuitive and yet it’s very difficult to program,” says Jacob Feldman, a psychology professor at Rutgers University.

A paper published in Scientific Reports in June comparing various object recognition models came to the conclusion that people do not evaluate an object like a computer processing pixels, but based on an imagined internal skeleton. In the study, researchers from Emory University, led by associate professor of psychology Stella Lourenco, wanted to know if people judged object similarity based on the objects’ skeletons—an invisible axis below the surface that runs through the middle of the object’s shape. The scientists generated 150 unique three-dimensional shapes built around 30 different skeletons and asked participants to determine whether or not two of the objects were the same. Sure enough, the more similar the skeletons were, the more likely participants were to label the objects as the same. The researchers also compared how well other models, such as neural networks (artificial intelligence–based systems) and pixel-based evaluations of the objects, predicted people’s decisions. While the other models matched performance on the task relatively well, the skeletal model always won.
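For a rough sense of what a “shape skeleton” is, here is a hedged 2D sketch using scikit-image’s skeletonize. The study worked with 3D objects, so this only illustrates the concept of an internal axis running through the middle of a shape.

```python
# Rough 2D analogue of the "shape skeleton" idea: extract a medial-axis
# skeleton from a binary silhouette with scikit-image. The study used 3D
# object skeletons; this sketch only illustrates the concept.
import numpy as np
from skimage.morphology import skeletonize

# A crude binary silhouette: a filled rectangle with a protruding "arm"
shape = np.zeros((60, 60), dtype=bool)
shape[20:40, 10:50] = True     # body
shape[10:20, 25:30] = True     # arm

skeleton = skeletonize(shape)  # True only along the shape's central axis
print(shape.sum(), "silhouette pixels ->", skeleton.sum(), "skeleton pixels")
```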

“There’s a big emphasis on deep neural networks for solving these problems [of object recognition]. These are networks that require lots and lots of training to even learn a single object category, whereas the model that we investigated, a skeletal model, seems to be able to do this without this experience,” says Vladislav Ayzenberg, a doctoral student in Lourenco’s lab. “What our results show is that humans might be able to recognize objects by their internal skeletons, even when you compare skeletal models to these other well-established neural net models of object recognition.”

Next, the researchers pitted the skeletal model against other models of shape recognition, such as ones that focus on the outline. To do so, Ayzenberg and Lourenco manipulated the objects in certain ways, such as shifting the placement of an arm in relation to the rest of the body or changing how skinny, bulging, or wavy the outlines were. People once again judged the objects as being similar based on their skeletons, not their surface qualities.

Source: No Bones about It: People Recognize Objects by Visualizing Their “Skeletons” – Scientific American

New Data Science Cheat Sheet, by Maverick Lin

Below is an extract of a 10-page cheat sheet about data science, compiled by Maverick Lin. The cheat sheet covers basic concepts in probability, statistics, statistical learning, machine learning, deep learning, big data frameworks and SQL. It is loosely based on The Data Science Design Manual by Steven S. Skiena and An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, and was inspired by William Chen’s The Only Probability Cheatsheet You’ll Ever Need, located here.

Full cheat sheet available here as a PDF document. Originally posted here. The screenshot below is an extract.

For related cheat sheets (machine learning, deep learning and so on) follow this link.

Source: New Data Science Cheat Sheet, by Maverick Lin – Data Science Central

Scammer Successfully Deepfaked CEO’s Voice To Fool Underling Into Transferring $243,000

The CEO of an energy firm based in the UK thought he was following his boss’s urgent orders in March when he transferred funds to a third-party. But the request actually came from the AI-assisted voice of a fraudster.

The Wall Street Journal reports that the mark believed he was speaking to the CEO of his business’s parent company, based in Germany. The German-accented caller told him to send €220,000 ($243,000 USD) to a Hungarian supplier within the hour. The firm’s insurance company, Euler Hermes Group SA, shared information about the crime with the WSJ but would not reveal the names of the businesses involved.

Euler Hermes fraud expert Rüdiger Kirsch told WSJ that the victim recognized his superior’s voice because it had a hint of a German accent and the same “melody.” This was reportedly the first time Euler Hermes had dealt with clients being affected by crimes that used AI mimicry.

Source: Scammer Successfully Deepfaked CEO’s Voice To Fool Underling Into Transferring $243,000

IBM open sources Adversarial Robustness 360 Toolbox for AI

This is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attacks and defense methods for machine learning models. ART provides implementations of many state-of-the-art methods for attacking and defending classifiers.
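As a flavour of the kind of evasion attack ART implements, here is a from-scratch fast gradient sign method (FGSM) sketch in PyTorch. It deliberately avoids ART’s own API, so treat it as an illustration of the attack family rather than a usage example of the toolbox.

```python
# The kind of evasion attack ART implements, sketched from scratch rather
# than via ART's own API: the fast gradient sign method (FGSM), which nudges
# an input in the direction that most increases the classifier's loss.
import torch
import torch.nn as nn

def fgsm(model: nn.Module, x: torch.Tensor, label: torch.Tensor, eps: float = 0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x_adv), label)
    loss.backward()
    # Step each pixel by +/- eps along the sign of the loss gradient
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# Toy victim model and input, purely for illustration
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
x_adv = fgsm(model, x, y)
print("max perturbation:", (x_adv - x).abs().max().item())
```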

Documentation for ART: https://adversarial-robustness-toolbox.readthedocs.io

https://github.com/IBM/adversarial-robustness-toolbox

IBM releases AI Fairness 360 tool open source

The AI Fairness 360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
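One of the simplest bias metrics such a toolkit reports is disparate impact: the ratio of favourable-outcome rates between an unprivileged and a privileged group. A hedged, from-scratch sketch with made-up data (this is not the AIF360 API):

```python
# Disparate impact computed from scratch with made-up decisions.
# Values far below 1.0 suggest the unprivileged group is disadvantaged.
import numpy as np

outcomes = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])   # 1 = favourable decision
group    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # 0 = privileged, 1 = unprivileged

rate_privileged   = outcomes[group == 0].mean()
rate_unprivileged = outcomes[group == 1].mean()
disparate_impact = rate_unprivileged / rate_privileged
print(f"disparate impact = {disparate_impact:.2f}")
```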

The AI Fairness 360 interactive experience provides a gentle introduction to the concepts and capabilities. The tutorials and other notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

Because the toolkit offers such a comprehensive set of capabilities, it may be confusing to figure out which metrics and algorithms are most appropriate for a given use case. To help, we have created some guidance material that can be consulted.

https://github.com/IBM/AIF360

IBM releases AI Explainability tools

The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.

The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. The tutorials and example notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

There is no single approach to explainability that works best. There are many ways to explain: data vs. model, directly interpretable vs. post hoc explanation, local vs. global, etc. It may therefore be confusing to figure out which algorithms are most appropriate for a given use case. To help, we have created some guidance material and a chart that can be consulted.
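As a concrete instance of a “post hoc, local” explanation, here is a minimal, hedged sketch that perturbs one feature at a time around a single input and reports how a stand-in model’s prediction moves. It is not the AIX360 API; the model below is a made-up placeholder.

```python
# The simplest instance of a "post hoc, local" explanation: perturb one
# feature at a time and see how a black-box model's output for a single
# input moves. From-scratch illustration, not the AIX360 API.
import numpy as np

def model(x):                      # stand-in black-box model, assumed for illustration
    return 0.4 * x[0] - 0.2 * x[1] + 0.1 * x[2]

x = np.array([1.0, 2.0, 3.0])      # the single instance we want explained
baseline = model(x)
for i in range(len(x)):
    perturbed = x.copy()
    perturbed[i] += 1.0            # bump feature i by one unit
    print(f"feature {i}: prediction moves by {model(perturbed) - baseline:+.2f}")
```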

Github link

Google neural net can spot breast, prostate tumors through microscope

Google Health’s so-called augmented-reality microscope has proven surprisingly accurate at detecting and diagnosing cancerous tumors in real time.

The device is essentially a standard microscope decked out with two extra components: a camera, and a computer running AI software with an Nvidia Titan Xp GPU to accelerate the number crunching. The camera continuously snaps images of body tissue placed under the microscope, and passes these images to a convolutional neural network on the computer to analyze. In return, the neural net spits out, allegedly in real time, a heatmap of the cells in the image, labeling areas that are benign or abnormal on the screen for doctors to inspect.

Google’s eggheads tried using the device to detect the presence of cancer in samples of breast and prostate cells. The algorithms had a performance score of 0.92 when detecting cancerous lymph nodes in breast cancer and 0.93 for prostate cancer, with one being a perfect score, so it’s not too bad for what they describe as a proof of concept.

Details of the microscope system have been described in a paper published in Nature this week. The training data for breast cancer was taken from here, and here for prostate cancer. Some of the training data was reserved for inference testing.

The device is a pretty challenging system to build: it requires a processing pipeline that can handle, on the fly, microscope snaps that are high resolution enough to capture details at the cellular level. The images used in this experiment measure 5,120 × 5,120 pixels. That’s much larger than what’s typically handled by today’s deep learning algorithms, which have millions of parameters and require billions of floating-point operations just to process images only 300 pixels by 300 pixels in size.
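In practice the pipeline boils down to sliding-window patch classification: carve the huge frame into patches, score each patch, and lay the scores out as a heatmap. A hedged sketch of the idea, where classify_patch is a hypothetical stand-in for the CNN:

```python
# Sliding-window heatmap sketch. `classify_patch` is a hypothetical stand-in
# for the convolutional network; a real system would run the CNN per patch.
import numpy as np

def classify_patch(patch: np.ndarray) -> float:
    # Placeholder "tumour probability" for illustration only
    return float(patch.mean())

frame = np.random.rand(5120, 5120)          # one 5,120 x 5,120 microscope frame
patch_size = 320
n = frame.shape[0] // patch_size            # 16 patches per side

heatmap = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        patch = frame[i*patch_size:(i+1)*patch_size, j*patch_size:(j+1)*patch_size]
        heatmap[i, j] = classify_patch(patch)

print("heatmap shape:", heatmap.shape)      # (16, 16) scores to overlay on screen
```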

Source: It’s official – Google AI gives you cancer …diagnosis in real time: Neural net can spot breast, prostate tumors • The Register

Mysterious, Ancient Radio Signals Keep Pelting Earth. Astronomers Designed an AI to Hunt Them Down.

Sudden shrieks of radio waves from deep space keep slamming into radio telescopes on Earth, spattering those instruments’ detectors with confusing data. And now, astronomers are using artificial intelligence to pinpoint the source of the shrieks, in the hope of explaining what’s sending them to Earth from — researchers suspect — billions of light-years across space.

Usually, these weird, unexplained signals are detected only after the fact, when astronomers notice out-of-place spikes in their data — sometimes years after the incident. The signals have complex, mysterious structures, patterns of peaks and valleys in radio waves that play out in just milliseconds. That’s not the sort of signal astronomers expect to come from a simple explosion, or any other one of the standard events known to scatter spikes of electromagnetic energy across space. Astronomers call these strange signals fast radio bursts (FRBs). Ever since the first one was uncovered in 2007, using data recorded in 2001, there’s been an ongoing effort to pin down their source. But FRBs arrive at random times and places, and existing human technology and observation methods aren’t well-primed to spot these signals.

Now, in a paper published July 4 in the journal Monthly Notices of the Royal Astronomical Society, a team of astronomers wrote that they managed to detect five FRBs in real time using a single radio telescope. [The 12 Strangest Objects in the Universe]

Wael Farah, a doctoral student at Swinburne University of Technology in Melbourne, Australia, developed a machine-learning system that recognized the signatures of FRBs as they arrived at the University of Sydney’s Molonglo Radio Observatory, near Canberra. As Live Science has previously reported, many scientific instruments, including radio telescopes, produce more data per second than they can reasonably store. So they don’t record anything in the finest detail except their most interesting observations.

Farah’s system trained the Molonglo telescope to spot FRBs and switch over to its most detailed recording mode, producing the finest records of FRBs yet.

Based on their data, the researchers predicted that between 59 and 157 theoretically detectable FRBs splash across our skies every day. The scientists also used the immediate detections to hunt for related flares in data from X-ray, optical and other radio telescopes — in hopes of finding some visible event linked to the FRBs — but had no luck.

Their research showed, however, that one of the most peculiar (and frustrating, for research purposes) traits of FRBs appears to be real: the signals, once they arrive, never repeat themselves. Each one appears to be a singular event in space that will never happen again.

Source: Mysterious, Ancient Radio Signals Keep Pelting Earth. Astronomers Designed an AI to Hunt Them Down. | Live Science

AI system ‘should be recognised as inventor’

An artificial intelligence system should be recognised as the inventor of two ideas in patents filed on its behalf, a team of academics says.

The AI has designed interlocking food containers that are easy for robots to grasp and a warning light that flashes in a rhythm that is hard to ignore.

Patent offices insist innovations are attributed to humans – to avoid legal complications that would arise if corporate inventorship were recognised.

The academics say this is “outdated”.

And it could see patent offices refusing to assign any intellectual property rights for AI-generated creations.

As a result, two professors from the University of Surrey have teamed up with the Missouri-based inventor of Dabus AI to file patents in the system’s name with the relevant authorities in the UK, Europe and US.

‘Inventive act’

Dabus was previously best known for creating surreal art thanks to the way “noise” is mixed into its neural networks to help generate unusual ideas.

Unlike some machine-learning systems, Dabus has not been trained to solve particular problems.

Instead, it seeks to devise and develop new ideas – “what is traditionally considered the mental part of the inventive act”, according to creator Stephen Thaler.

The first patent describes a food container that uses fractal designs to create pits and bulges in its sides. One benefit is that several containers can be fitted together more tightly to help them be transported safely. Another is that it should be easier for robotic arms to pick them up and grip them.

[Image: diagram showing how a container’s shape could be based on fractals (copyright Ryan Abbott)]

The second describes a lamp designed to flicker in a rhythm mimicking patterns of neural activity that accompany the formation of ideas, making it more difficult to ignore.

Law professor Ryan Abbott told BBC News: “These days, you commonly have AIs writing books and taking pictures – but if you don’t have a traditional author, you cannot get copyright protection in the US.

“So with patents, a patent office might say, ‘If you don’t have someone who traditionally meets human-inventorship criteria, there is nothing you can get a patent on.’

“In which case, if AI is going to be how we’re inventing things in the future, the whole intellectual property system will fail to work.”

Instead, he suggested, an AI should be recognised as being the inventor and whoever the AI belonged to should be the patent’s owner, unless they sold it on.

However, Prof Abbott acknowledged lawmakers might need to get involved to settle the matter and that it could take until the mid-2020s to resolve the issue.

A spokeswoman for the European Patent Office indicated that it would be a complex matter.

“It is a global consensus that an inventor can only be a person who makes a contribution to the invention’s conception in the form of devising an idea or a plan in the mind,” she explained.

“The current state of technological development suggests that, for the foreseeable future, AI is… a tool used by a human inventor.

“Any change… [would] have implications reaching far beyond patent law, ie to authors’ rights under copyright laws, civil liability and data protection.

“The EPO is, of course, aware of discussions in interested circles and the wider public about whether AI could qualify as inventor.”

The UK’s Patents Act 1977 currently requires an inventor to be a person, but the Intellectual Property Office is aware of the issue.

“The government believes that AI technology could increase the UK’s GDP by 10% in the next decade, and the IPO is focused on responding to the challenges that come with this growth,” said a spokeswoman.

Source: AI system ‘should be recognised as inventor’ – BBC News

Humanitarian Data Exchange

The Humanitarian Data Exchange (HDX) is an open platform for sharing data across crises and organisations. Launched in July 2014, the goal of HDX is to make humanitarian data easy to find and use for analysis. Our growing collection of datasets has been accessed by users in over 200 countries and territories. Watch this video to learn more.

HDX is managed by OCHA’s Centre for Humanitarian Data, which is located in The Hague. OCHA is part of the United Nations Secretariat and is responsible for bringing together humanitarian actors to ensure a coherent response to emergencies. The HDX team includes OCHA staff and a number of consultants who are based in North America, Europe and Africa.

[…]

We define humanitarian data as:

  1. data about the context in which a humanitarian crisis is occurring (e.g., baseline/development data, damage assessments, geospatial data)
  2. data about the people affected by the crisis and their needs
  3. data about the response by organisations and people seeking to help those who need assistance.

HDX uses open-source software called CKAN for our technical back-end. You can find all of our code on GitHub.

Source: Welcome – Humanitarian Data Exchange

How Facebook is Using Machine Learning to Map the World Population

When it comes to knowing where humans around the world actually live, resources come in varying degrees of accuracy and sophistication.

Heavily urbanized and mature economies generally produce a wealth of up-to-date information on population density and granular demographic data. In rural Africa or fast-growing regions in the developing world, tracking methods cannot always keep up, or in some cases may be non-existent.

This is where new maps, produced by researchers at Facebook, come in. Building upon CIESIN’s Gridded Population of the World project, Facebook is using machine learning models on high-resolution satellite imagery to paint a definitive picture of human settlement around the world. Let’s zoom in.

Connecting the Dots

With all other details stripped away, human settlement can form some interesting patterns. One of the most compelling examples is Egypt, where 95% of the population lives along the Nile River. Below, we can clearly see where people live, and where they don’t.

View the full-resolution version of this map.


While it is possible to use a tool like Google Earth to view nearly any location on the globe, the problem is analyzing the imagery at scale. This is where machine learning comes into play.

Finding the People in the Petabytes

High-resolution imagery of the entire globe takes up about 1.5 petabytes of storage, making the task of classifying the data extremely daunting. It’s only very recently that technology was up to the task of correctly identifying buildings within all those images.

To get the results we see today, researchers used a process of elimination to discard locations that couldn’t contain a building, then ranked the remaining locations based on the likelihood they could contain a building.


Facebook identified structures at scale using a process called weakly supervised learning. After training the model using large batches of photos, then checking over the results, Facebook was able to reach a 99.6% labeling accuracy for positive examples.
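Putting the two stages together, here is a hedged sketch of the elimination-then-ranking idea with hypothetical helper functions; it is an illustration of the described approach, not Facebook’s pipeline:

```python
# Two-stage idea sketched with hypothetical helpers: a cheap rule discards
# tiles that cannot contain a building, and only the survivors are scored
# by the (expensive) learned classifier. Not Facebook's actual pipeline.
import numpy as np

def could_contain_building(tile: np.ndarray) -> bool:
    # Cheap filter: near-uniform tiles (open water, desert) are discarded
    return tile.std() > 0.05

def building_likelihood(tile: np.ndarray) -> float:
    # Stand-in for the learned model that ranks the remaining tiles
    return float(tile.max())

tiles = [np.random.rand(64, 64) * s for s in (0.01, 0.5, 1.0)]  # fake satellite tiles
candidates = [t for t in tiles if could_contain_building(t)]
ranked = sorted(candidates, key=building_likelihood, reverse=True)
print(f"{len(tiles)} tiles -> {len(candidates)} candidates after elimination")
```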

Why it Matters

An accurate picture of where people live can be a matter of life and death.

For humanitarian agencies working in Africa, effectively distributing aid or vaccinating populations is still a challenge due to the lack of reliable maps and population density information. Researchers hope that these detailed maps will be used to save lives and improve living conditions in developing regions.

For example, Malawi is one of the world’s least urbanized countries, so finding its 19 million citizens is no easy task for people doing humanitarian work there. These maps clearly show where people live and allow organizations to create accurate population density estimates for specific areas.


Visit the project page for a full explanation and to access the full database of country maps.

Source: How Facebook is Using Machine Learning to Map the World Population

Meet the AI robots being used to help solve America’s recycling crisis

The way the robots work is simple. Guided by cameras and computer systems trained to recognize specific objects, the robots’ arms glide over moving conveyor belts until they reach their target. Oversized tongs or fingers with sensors that are attached to the arms snag cans, glass, plastic containers, and other recyclable items out of the rubbish and place them into nearby bins.

The robots — most of which have come online only within the past year — are assisting human workers and can work up to twice as fast. With continued improvements in the bots’ ability to spot and extract specific objects, they could become a formidable new force in the $6.6 billion U.S. industry.

Researchers like Lily Chin, a PhD student at the Distributed Robotics Lab at MIT, are working to develop sensors that improve the robots’ tactile capabilities and sense of touch, so they can distinguish plastic, paper and metal through their fingers. “Right now, robots are mostly reliant on computer vision, but they can get confused and make mistakes,” says Chin. “So now we want to integrate these new tactile capabilities.”

Denver-based AMP Robotics is one of the companies on the leading edge of innovation in the field. It has developed software, its AMP Neuron platform, which uses computer vision and machine learning so robots can recognize different colors, textures, shapes, sizes and patterns, identifying material characteristics in order to sort waste.

The robots are being installed at the Single Stream Recyclers plant in Sarasota, Florida, where they will be able to pick 70 to 80 items a minute, twice as fast as humanly possible and with greater accuracy.

[Image: Bulk Handling Systems Max-AI AQC-C trash-separating robot (CNBC)]

“Using this technology you can increase the quality of the material and in some cases double or triple its resale value,” says AMP Robotics CEO Matanya Horowitz. “Quality standards are getting stricter, which is why companies and researchers are working on high-tech solutions.”

Source: Meet the robots being used to help solve America’s recycling crisis

Intellectual Debt (in AI): With Great Power Comes Great Ignorance

For example, aspirin was discovered in 1897, and an explanation of how it works followed in 1995. That, in turn, has spurred some research leads on making better pain relievers through something other than trial and error.

This kind of discovery — answers first, explanations later — I call “intellectual debt.” We gain insight into what works without knowing why it works. We can put that insight to use immediately, and then tell ourselves we’ll figure out the details later. Sometimes we pay off the debt quickly; sometimes, as with aspirin, it takes a century; and sometimes we never pay it off at all.

Be they of money or ideas, loans can offer great leverage. We can get the benefits of money — including use as investment to produce more wealth — before we’ve actually earned it, and we can deploy new ideas before having to plumb them to bedrock truth.

Indebtedness also carries risks. For intellectual debt, these risks can be quite profound, both because we are borrowing as a society, rather than individually, and because new technologies of artificial intelligence — specifically, machine learning — are bringing the old model of drug discovery to a seemingly unlimited number of new areas of inquiry. Humanity’s intellectual credit line is undergoing an extraordinary, unasked-for bump up in its limit.

[…]

Technical debt arises when systems are tweaked hastily, catering to an immediate need to save money or implement a new feature, while increasing long-term complexity. Anyone who has added a device every so often to a home entertainment system can attest to the way in which a series of seemingly sensible short-term improvements can produce an impenetrable rat’s nest of cables. When something stops working, this technical debt often needs to be paid down as an aggravating lump sum — likely by tearing the components out and rewiring them in a more coherent manner.

[…]

Machine learning has made remarkable strides thanks to theoretical breakthroughs, zippy new hardware, and unprecedented data availability. The distinct promise of machine learning lies in suggesting answers to fuzzy, open-ended questions by identifying patterns and making predictions.

[…]

Researchers have pointed out thorny problems of technical debt afflicting AI systems that make it seem comparatively easy to find a retiree to decipher a bank system’s COBOL. They describe how machine learning models become embedded in larger ones and are then forgotten, even as their original training data goes stale and their accuracy declines.

But machine learning doesn’t merely implicate technical debt. There are some promising approaches to building machine learning systems that in fact can offer some explanations — sometimes at the cost of accuracy — but they are the rare exceptions. Otherwise, machine learning is fundamentally patterned like drug discovery, and it thus incurs intellectual debt. It stands to produce answers that work, without offering any underlying theory. While machine learning systems can surpass humans at pattern recognition and predictions, they generally cannot explain their answers in human-comprehensible terms. They are statistical correlation engines — they traffic in byzantine patterns with predictive utility, not neat articulations of relationships between cause and effect. Marrying power and inscrutability, they embody Arthur C. Clarke’s observation that any sufficiently advanced technology is indistinguishable from magic.

But here there is no David Copperfield or Ricky Jay who knows the secret behind the trick. No one does. Machine learning at its best gives us answers as succinct and impenetrable as those of a Magic 8-Ball — except they appear to be consistently right. When we accept those answers without independently trying to ascertain the theories that might animate them, we accrue intellectual debt.

Source: Intellectual Debt: With Great Power Comes Great Ignorance

Waymo and DeepMind mimic evolution to develop a new, better way to train self-driving AI

The two worked together to bring a training method called Population Based Training (PBT for short) to bear on Waymo’s challenge of building better virtual drivers, and the results were impressive: DeepMind says in a blog post that using PBT decreased false positives by 24% in a network that identifies and places boxes around pedestrians, bicyclists and motorcyclists spotted by a Waymo vehicle’s many sensors. Not only that, but it also resulted in savings in terms of both training time and resources, using about 50% of both compared with the standard methods Waymo was using previously.

[…]

To step back a little, let’s look at what PBT even is. Basically, it’s a method of training that takes its cues from how Darwinian evolution works. Neural nets essentially work by trying something and then measuring those results against some kind of standard to see if their attempt is more “right” or more “wrong” based on the desired outcome.

[…]

But all that comparative training requires a huge amount of resources, and sorting the good from the bad in terms of which are working out relies on either the gut feeling of individual engineers, or massive-scale search with a manual component involved where engineers “weed out” the worst performing neural nets to free up processing capabilities for better ones.

What DeepMind and Waymo did with this experiment was essentially automate that weeding, automatically killing off the “bad” training runs and replacing them with better-performing spin-offs of the best-in-class networks running the task. That’s where evolution comes in, since it’s effectively a process of artificial natural selection.
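A toy sketch of that exploit-and-explore loop (not Waymo or DeepMind’s code; the “training” below is a made-up stand-in):

```python
# Toy population-based training loop: train a population, periodically
# replace the worst performers with perturbed copies of the best (the
# automated "weeding" described above). Illustration only.
import random

population = [{"lr": 10 ** random.uniform(-4, -2), "score": 0.0} for _ in range(8)]

def train_and_evaluate(member):
    # Stand-in for a real training step; pretend mid-range learning rates win
    member["score"] = -abs(member["lr"] - 1e-3) + random.gauss(0, 1e-4)

for step in range(20):
    for member in population:
        train_and_evaluate(member)
    population.sort(key=lambda m: m["score"], reverse=True)
    # Exploit: the worst members copy a top performer. Explore: jitter the copy.
    for loser in population[-2:]:
        winner = random.choice(population[:2])
        loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])

print("best learning rate found:", population[0]["lr"])
```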

Source: Waymo and DeepMind mimic evolution to develop a new, better way to train self-driving AI | TechCrunch

Wow, I hate when people actually write at you to read a sentence again (cut out for your mental wellness).

IBM gives cancer-killing drug AI projects to the open source community

Researchers from IBM’s Computational Systems Biology group in Zurich are working on AI and machine learning (ML) approaches to “help to accelerate our understanding of the leading drivers and molecular mechanisms of these complex diseases,” as well as methods to improve our knowledge of tumor composition.

“Our goal is to deepen our understanding of cancer to equip industries and academia with the knowledge that could potentially one day help fuel new treatments and therapies,” IBM says.

The first project, dubbed PaccMann — not to be confused with the popular Pac-Man computer game — is described as the “Prediction of anticancer compound sensitivity with Multi-modal attention-based neural networks.”

[…]

The ML algorithm exploits data on gene expression as well as the molecular structures of chemical compounds. IBM says that by identifying potential anti-cancer compounds earlier, it can cut the costs associated with drug development.

[…]

The second project is called “Interaction Network infErence from vectoR representATions of words,” otherwise known as INtERAcT. This tool is a particularly interesting one given its automatic extraction of data from valuable scientific papers related to our understanding of cancer.

With roughly 17,000 papers published every year in the field of cancer research, it can be difficult — if not impossible — for researchers to keep up with every small step we make in our understanding.

[…]

INtERAcT aims to make the academic side of research less of a burden by automatically extracting information from these papers. At the moment, the tool is being tested on extracting data related to protein-protein interactions — an area of study which has been marked as a potential cause of the disruption of biological processes in diseases including cancer.

[…]

The third and final project is “pathway-induced multiple kernel learning,” or PIMKL. This algorithm utilizes datasets describing what we currently know when it comes to molecular interactions in order to predict the progression of cancer and potential relapses in patients.

PIMKL uses what is known as multiple kernel learning to identify molecular pathways crucial for categorizing patients, giving healthcare professionals an opportunity to individualize and tailor treatment plans.

The code for PaccMann and INtERAcT has been released and is available on the projects’ websites. PIMKL has been deployed on the IBM Cloud, and its source code has also been released.

Source: IBM gives cancer-killing drug AI project to the open source community | ZDNet

But now the big question: will they maintain it?

Machine learning has been used to automatically translate long-lost languages

Jiaming Luo and Regina Barzilay from MIT, together with Yuan Cao from Google’s AI lab in Mountain View, California, have developed a machine-learning system capable of deciphering lost languages, and they’ve demonstrated it by having it decipher Linear B – the first time this has been done automatically. The approach they used was very different from standard machine translation techniques.

First, some background. The big idea behind machine translation is the understanding that words are related to each other in similar ways, regardless of the language involved.

So the process begins by mapping out these relations for a specific language. This requires huge databases of text. A machine then searches this text to see how often each word appears next to every other word. This pattern of appearances is a unique signature that defines the word in a multidimensional parameter space. Indeed, the word can be thought of as a vector within this space. And this vector acts as a powerful constraint on how the word can appear in any translation the machine comes up with.

These vectors obey some simple mathematical rules. For example: king – man + woman = queen. And a sentence can be thought of as a set of vectors that follow one after the other to form a kind of trajectory through this space.
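That analogy arithmetic is easy to reproduce with pretrained vectors. The sketch below assumes the gensim library and its downloadable “glove-wiki-gigaword-50” vectors are available; model names and download sizes may vary.

```python
# The "king - man + woman = queen" arithmetic with pretrained word vectors.
# Assumes gensim and its downloadable "glove-wiki-gigaword-50" embeddings.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # small pretrained embeddings
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # typically [('queen', ...)]: the closest vector to king - man + woman
```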

The key insight enabling machine translation is that words in different languages occupy the same points in their respective parameter spaces. That makes it possible to map an entire language onto another language with a one-to-one correspondence.

In this way, the process of translating sentences becomes the process of finding similar trajectories through these spaces. The machine never even needs to “know” what the sentences mean.

This process relies crucially on large data sets. But a couple of years ago, a German team of researchers showed how a similar approach with much smaller databases could help translate rarer languages that lack big databases of text. The trick is to find a different way to constrain the machine approach so that it doesn’t rely on a huge database.

Now Luo and co have gone further to show how machine translation can decipher languages that have been lost entirely. The constraint they use has to do with the way languages are known to evolve over time.

The idea is that any language can change in only certain ways—for example, the symbols in related languages appear with similar distributions, related words have the same order of characters, and so on. With these rules constraining the machine, it becomes much easier to decipher a language, provided the progenitor language is known.  

Luo and co put the technique to the test with two lost languages, Linear B and Ugaritic. Linguists know that Linear B encodes an early version of ancient Greek and that Ugaritic, which was discovered in 1929, is an early form of Hebrew.

Given that information and the constraints imposed by linguistic evolution, Luo and co’s machine is able to translate both languages with remarkable accuracy. “We were able to correctly translate 67.3% of Linear B cognates into their Greek equivalents in the decipherment scenario,” they say. “To the best of our knowledge, our experiment is the first attempt of deciphering Linear B automatically.”

That’s impressive work that takes machine translation to a new level. But it also raises the interesting question of other lost languages—particularly those that have never been deciphered, such as Linear A.

In this paper, Linear A is conspicuous by its absence. Luo and co do not even mention it, but it must loom large in their thinking, as it does for all linguists. Yet significant breakthroughs are still needed before this script becomes amenable to machine translation.

For example, nobody knows what language Linear A encodes. Attempts to decipher it into ancient Greek have all failed. And without the progenitor language, the new technique does not work.

But the big advantage of machine-based approaches is that they can test one language after another quickly without becoming fatigued. So it’s quite possible that Luo and co might tackle Linear A with a brute-force approach—simply attempt to decipher it into every language for which machine translation already operates.

 

Source: Machine learning has been used to automatically translate long-lost languages – MIT Technology Review

Good luck deleting someone’s private info from a trained neural network – it’s likely to bork the whole thing

AI systems have weird memories. The machines desperately cling onto the data they’ve been trained on, making it difficult to delete bits of it. In fact, they often have to be completely retrained from scratch with the newer, smaller dataset.

That’s no good in an age where individuals can request their personal data be removed from company databases under the EU’s GDPR rules. How do you remove a person’s data from a machine-learning model that has already been trained? A 2017 research paper by law and policy academics hinted that it may even be impossible.

“Deletion is difficult because most machine learning models are complex black boxes so it is not clear how a data point or a set of data point is really being used,” James Zou, an assistant professor of biomedical data science at Stanford University, told The Register.

In order to leave out specific data, models will often have to be retrained with the newer, smaller dataset. That’s a pain as it costs money and time.

The research, led by Antonio Ginart, a PhD student at Stanford University, studied the problem of trying to delete data from machine learning models and managed to craft two “provably deletion efficient algorithms” that remove data across six different datasets for k-means clustering models, an unsupervised machine-learning method for grouping data. The results have been released in a paper on arXiv this week.

The trick is to assess the impacts of deleting data from a trained model. In some cases, it can lead to a decrease in the system’s performance.

“First, quickly check to see if deleting a data point would have any effect on the machine learning model at all – there are settings where there’s no effect and so we can perform this check very efficiently. Second, see if the data to be deleted only affects some local component of the learning system and just update locally,” Zou explained.
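For k-means specifically, the “local update” step Zou describes can be sketched in a few lines: deleting a point only requires recomputing the centroid of the cluster it belonged to, not retraining from scratch. This is only an illustration of the idea, not the paper’s certified deletion algorithm.

```python
# Minimal sketch of the "local update" idea for k-means: deleting a point
# only affects the centroid of its own cluster. Illustration only, not the
# paper's provably deletion-efficient algorithm.
import numpy as np

points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
assignments = np.array([0, 0, 1, 1])     # which cluster each point belongs to
centroids = np.array([points[assignments == k].mean(axis=0) for k in range(2)])

def delete_point(idx):
    global points, assignments
    k = assignments[idx]                 # only cluster k is affected
    points = np.delete(points, idx, axis=0)
    assignments = np.delete(assignments, idx)
    centroids[k] = points[assignments == k].mean(axis=0)   # local update

delete_point(1)
print("updated centroids:\n", centroids)
```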

It seems to work okay for k-means clustering models under certain circumstances, when the data can be more easily separated. But when it comes to systems that aren’t so deterministic, like modern deep learning models, it’s incredibly difficult to delete data.

Zou said it isn’t entirely impossible, however. “We don’t have tools just yet but we are hoping to develop these deletion tools in the next few months.”

Source: Good luck deleting someone’s private info from a trained neural network – it’s likely to bork the whole thing • The Register

‘Superhuman’ AI Crushes Poker Pros at Six-Player Texas Hold’em

Computer scientists have developed a card-playing bot, called Pluribus, capable of defeating some of the world’s best players at six-person no-limit Texas hold’em poker, in what’s considered an important breakthrough in artificial intelligence.

Two years ago, a research team from Carnegie Mellon University developed a similar poker-playing system, called Libratus, which consistently defeated the world’s best players at one-on-one Heads-Up, No-Limit Texas Hold’em poker. The creators of Libratus, Tuomas Sandholm and Noam Brown, have now upped the stakes, unveiling a new system capable of playing six-player no-limit Texas hold’em poker, a wildly popular version of the game.

In a series of contests, Pluribus handily defeated its professional human opponents, at a level the researchers described as “superhuman.” When pitted against professional human opponents with real money involved, Pluribus managed to collect winnings at an astounding rate of $1,000 per hour. Details of this achievement were published today in Science.

[…]

For the new study, Brown and Sandholm subjected Pluribus to two challenging tests. The first pitted Pluribus against 13 different professional players—all of whom have earned more than $1 million in poker winnings—in the six-player version of the game. The second test involved matches featuring two poker legends, Darren Elias and Chris “Jesus” Ferguson, each of whom was pitted against five identical copies of Pluribus.

The matches with five humans and Pluribus involved 10,000 hands played over 12 days. To incentivize the human players, a total of $50,000 was distributed among the participants, Pluribus included. The games were blind in that none of the human players were told who they were playing, though each player had a consistent alias used throughout the competition. For the tests involving a lone human and five Pluribuses, each player was given $2,000 for participating and a bonus $2,000 for playing better than their human cohort. Elias and Ferguson each played 5,000 separate hands against their machine opponents.

In all scenarios, Pluribus registered wins with “statistical significance,” and to a degree the researchers referred to as “superhuman.”

“We mean superhuman in the sense that it performs better than the best humans,” said Brown, who is completing his Ph.D. as a research scientist at Facebook AI. “The bot won by about five big blinds per hundred hands of poker (bb/100) when playing against five elite human professionals, which professionals consider to be a very high win rate. To beat elite professionals by that margin is considered a decisive win.”

[…]

Before the competition started, Pluribus developed its own “blueprint” strategy, which it did by playing poker with itself for eight straight days.

“Pluribus does not use any human gameplay data to form its strategy,” explained Brown. “Instead, Pluribus first uses self-play, in which it plays against itself over trillions of hands to formulate a basic strategy. It starts by playing completely randomly. As it plays more and more hands against itself, its strategy gradually improves as it learns which actions lead to winning more money. This is all done offline before ever playing against humans.”

Armed with its blueprint strategy, Pluribus was ready for the competitions to begin. After the first bets were placed, Pluribus calculated several possible next moves for each opponent, in a manner similar to how machines play chess and Go. The difference here, however, is that Pluribus was not tasked with calculating the entire game, as that would be “computationally prohibitive,” as noted by the researchers.

“In Pluribus, we used a new way of doing search that doesn’t have to search all the way to the end of the game,” said Brown. “Instead, it can stop after a few moves. This makes the search algorithm much more scalable. In particular, it allows us to reach superhuman performance while only training for the equivalent of less than $150 on a cloud computing service, and playing in real time on just two CPUs.”

[…]

Importantly, Pluribus was also programmed to be unpredictable—a fundamental aspect of good poker gamesmanship. If Pluribus consistently bet tons of money when it figured it had the best hand, for example, its opponents would eventually catch on. To remedy this, the system was programmed to play in a “balanced” manner, employing a set of strategies, like bluffing, that prevented Pluribus’ opponents from picking up on its tendencies and habits.

Source: ‘Superhuman’ AI Crushes Poker Pros at Six-Player Texas Hold’em

AI Trained on Old Scientific Papers Makes Discoveries Humans Missed

In a study published in Nature on July 3, researchers from the Lawrence Berkeley National Laboratory used an algorithm called Word2Vec to sift through scientific papers for connections humans had missed. Their algorithm then spit out predictions for possible thermoelectric materials, which convert heat to energy and are used in many heating and cooling applications.

The algorithm didn’t know the definition of thermoelectric, though. It received no training in materials science. Using only word associations, the algorithm was able to provide candidates for future thermoelectric materials, some of which may be better than those we currently use.

[…]

To train the algorithm, the researchers assessed the language in 3.3 million abstracts related to materials science, ending up with a vocabulary of about 500,000 words. They fed the abstracts to Word2vec, which used machine learning to analyze relationships between words.

“The way that this Word2vec algorithm works is that you train a neural network model to remove each word and predict what the words next to it will be,” Jain said. “By training a neural network on a word, you get representations of words that can actually confer knowledge.”

Using just the words found in scientific abstracts, the algorithm was able to understand concepts such as the periodic table and the chemical structure of molecules. The algorithm linked words that were found close together, creating vectors of related words that helped define concepts. In some cases, words were linked to thermoelectric concepts but had never been written about as thermoelectric in any abstract they surveyed. This gap in knowledge is hard to catch with a human eye, but easy for an algorithm to spot.
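A hedged sketch of the overall recipe: train word2vec on tokenised abstracts with gensim and ask which terms sit closest to a query word in the learned vector space. The toy “abstracts” below are made up; the real study used 3.3 million of them, and gensim parameter names may differ between versions.

```python
# Train word2vec on a few made-up, tokenised abstracts and query the nearest
# neighbours of a term. Illustration of the approach, not the study's code.
from gensim.models import Word2Vec

abstracts = [
    "bi2te3 is a promising thermoelectric material with low thermal conductivity".split(),
    "doping improves the seebeck coefficient of thermoelectric compounds".split(),
    "perovskite solar cells show high photovoltaic efficiency".split(),
]

model = Word2Vec(abstracts, vector_size=50, window=5, min_count=1, epochs=50)
print(model.wv.most_similar("thermoelectric", topn=3))
```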

After showing its capacity to predict future materials, researchers took their work back in time, virtually. They scrapped recent data and tested the algorithm on old papers, seeing if it could predict scientific discoveries before they happened. Once again, the algorithm worked.

In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012.

This new application of machine learning goes beyond materials science. Because it’s not trained on a specific scientific dataset, you could easily apply it to other disciplines, retraining it on literature of whatever subject you wanted. Vahe Tshitoyan, the lead author on the study, says other researchers have already reached out, wanting to learn more.

“This algorithm is unsupervised and it builds its own connections,” Tshitoyan said. “You could use this for things like medical research or drug discovery. The information is out there. We just haven’t made these connections yet because you can’t read every article.”

Source: AI Trained on Old Scientific Papers Makes Discoveries Humans Missed – VICE

That this AI can simulate universes in 30ms is not the scary part. It’s that its creators don’t know why it works so well

The accuracy of the neural network is judged by how similar its outputs are to the ones created by two more traditional N-body simulation systems, FastPM and 2LPT, when all three are given the same inputs. When D3M was tasked with producing 1,000 simulations from 1,000 sets of input data, it had a relative error of 2.8 per cent compared with FastPM, and 9.3 per cent compared with 2LPT, for the same inputs. That’s not too bad, considering it takes the model just 30 milliseconds to crank out a simulation. Not only does that save time, but it’s also cheaper, since less compute power is needed.

To their surprise, the researchers also noticed that D3M seemed to be able to produce simulations of the universe from conditions that weren’t specifically included in the training data. During inference tests, the team tweaked input variables such as the amount of dark matter in the virtual universes, and the model still managed to spit out accurate simulations despite not being specifically trained for these changes.

“It’s like teaching image recognition software with lots of pictures of cats and dogs, but then it’s able to recognize elephants,” said Shirley Ho, first author of the paper and a group leader at the Flatiron Institute. “Nobody knows how it does this, and it’s a great mystery to be solved.

“We can be an interesting playground for a machine learner to use to see why this model extrapolates so well, why it extrapolates to elephants instead of just recognizing cats and dogs. It’s a two-way street between science and deep learning.”

The source code for the neural networks can be found here.

Source: That this AI can simulate universes in 30ms is not the scary part. It’s that its creators don’t know why it works so well • The Register

EU should ban AI-powered citizen scoring and mass surveillance, say experts

A group of policy experts assembled by the EU has recommended that it ban the use of AI for mass surveillance and mass “scoring of individuals”, a practice that potentially involves collecting varied data about citizens — everything from criminal records to their behavior on social media — and then using it to assess their moral or ethical integrity.

The recommendations are part of the EU’s ongoing efforts to establish itself as a leader in so-called “ethical AI.” Earlier this year, it released its first guidelines on the topic, stating that AI in the EU should be deployed in a trustworthy and “human-centric” manner.

The new report offers more specific recommendations. These include identifying areas of AI research that require funding; encouraging the EU to incorporate AI training into schools and universities; and suggesting new methods to monitor the impact of AI. However, the paper is only a set of recommendations at this point, and not a blueprint for legislation.

Notably, the suggestions that the EU should ban AI-enabled mass scoring and limit mass surveillance are some of the report’s relatively few concrete recommendations. (Often, the report’s authors simply suggest that further investigation is needed in this or that area.)

Source: EU should ban AI-powered citizen scoring and mass surveillance, say experts – The Verge