Engineers and researchers from Samsung’s AI Center in Moscow and Skolkovo Institute of Science and Technology have created a model that can generate realistic animated talking heads from images without relying on traditional methods, like 3D modeling.
[…]
“Effectively, the learned model serves as a realistic avatar of a person,” said engineer Egor Zakharov in a video explaining the results.
Such tech could clearly also be used to create deepfakes.
Few-shot learning means the model can begin to animate a face using just a few images of an individual, or even a single image. Meta training with the VoxCeleb2 data set of videos is carried out before the model can animate previously unseen faces.
During the training process, the system creates three neural networks: an embedder network that maps frames to vectors, a generator network that maps facial landmarks into the synthesized video, and a discriminator network that assesses the realism and pose of the generated images.
DARPA, the US military research arm, has launched a program to train fighter jets to engage in aerial battle autonomously with the help of AI algorithms.
The Air Combat Evolution (ACE) program seeks to create military planes that are capable of performing combat maneuvers for dogfighting without the help of human pilots. Vehicles won’t be completely unmanned, however. DARPA is more interested in forging stronger teamwork between humans and machines.
The end goal is to have autonomous jet controls that can handle tasks like dodging out of the way of enemy fire at lightning speeds, while the pilot takes on more difficult problems like executing strategic battle commands and firing off weapons.
“We envision a future in which AI handles the split-second maneuvering during within-visual-range dogfights, keeping pilots safer and more effective as they orchestrate large numbers of unmanned systems into a web of overwhelming combat effects,” said Lieutenant Colonel Dan Javorsek, ACE program manager.
It’s part of DARPA’s larger vision of “mosaic warfare.” The idea here is that combat is fought by a mixture of manned and unmanned systems working together. The hope is these unmanned systems can be rapidly developed, and are easily adaptable through technological upgrades so that they can help the military cope with changing conditions.
“Linking together manned aircraft with significantly cheaper unmanned systems creates a ‘mosaic’ where the individual ‘pieces’ can easily be recomposed to create different effects or quickly replaced if destroyed, resulting in a more resilient warfighting capability,” DARPA said in a statement.
The ACE program will initially focus on teaching AI in much the same way that new pilots are trained. Computer vision algorithms will learn basic battle maneuvers for close one-on-one combat. “Only after human pilots are confident that the AI algorithms are trustworthy in handling bounded, transparent and predictable behaviors will the aerial engagement scenarios increase in difficulty and realism,” Javorsek said.
“Following virtual testing, we plan to demonstrate the dogfighting algorithms on sub-scale aircraft leading ultimately to live, full-scale manned-unmanned team dogfighting with operationally representative aircraft.”
DARPA is welcoming R&D proposals from academics and companies for its program and will fund the effort. Successful candidates will engage in the “AlphaDogfight Trials,” where these AI-crafted fighter planes will test one another in a competition to find the best algorithm.
“Being able to trust autonomy is critical as we move toward a future of warfare involving manned platforms fighting alongside unmanned systems,” said Javorsek.
A new deep learning algorithm can generate high-resolution, photorealistic images of people — faces, hair, outfits, and all — from scratch.
The AI-generated models are the most realistic we’ve encountered, and the tech will soon be licensed out to clothing companies and advertising agencies interested in whipping up photogenic models without paying for lights or a catering budget. At the same time, similar algorithms could be misused to undermine public trust in digital media.
[…]
In a video showing off the tech, the AI morphs and poses model after model as their outfits transform, bomber jackets turning into winter coats and dresses melting into graphic tees.
Specifically, the new algorithm is a Generative Adversarial Network (GAN). That’s the kind of AI typically used to churn out new imitations of something that exists in the real world, whether they be video game levels or images that look like hand-drawn caricatures.
Why did the frog cross the road? Well, a new artificial intelligence (AI) agent that can play the classic arcade game Frogger not only can tell you why it crossed the road, but it can justify its every move in everyday language.
Developed by Georgia Tech, in collaboration with Cornell and the University of Kentucky, the work enables an AI agent to provide a rationale for a mistake or errant behavior, and to explain it in a way that is easy for non-experts to understand.
This, the researchers say, may help robots and other types of AI agents seem more relatable and trustworthy to humans. They also say their findings are an important step toward a more transparent, human-centered AI design that understands people’s preferences and prioritizes people’s needs.
“If the power of AI is to be democratized, it needs to be accessible to anyone regardless of their technical abilities,” said Upol Ehsan, Ph.D. student in the School of Interactive Computing at Georgia Tech and lead researcher.
“As AI pervades all aspects of our lives, there is a distinct need for human-centered AI design that makes black-boxed AI systems explainable to everyday users. Our work takes a formative step toward understanding the role of language-based explanations and how humans perceive them.”
The study was supported by the Office of Naval Research (ONR).
Researchers developed a participant study to determine if their AI agent could offer rationales that mimicked human responses. Spectators watched the AI agent play the videogame Frogger and then ranked three on-screen rationales in order of how well each described the AI’s game move.
Of the three anonymized justifications for each move – a human-generated response, the AI-agent response, and a randomly generated response – the participants preferred the human-generated rationales first, but the AI-generated responses were a close second.
Frogger offered the researchers the chance to train an AI in a “sequential decision-making environment,” which is a significant research challenge because decisions that the agent has already made influence future decisions. Therefore, explaining the chain of reasoning to experts is difficult, and even more so when communicating with non-experts, according to researchers.
[…]
By a 3-to-1 margin, participants favored answers that were classified in the “complete picture” category. Responses showed that people appreciated the AI thinking about future steps rather than just the present moment, since focusing only on the moment might make it more prone to making another mistake. People also wanted to know more so that they might directly help the AI fix the errant behavior.
[…]
The research was presented in March at the Association for Computing Machinery’s Intelligent User Interfaces 2019 Conference. The paper is titled Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions. Ehsan will present a position paper highlighting the design and evaluation challenges of human-centered Explainable AI systems at the upcoming Emerging Perspectives in Human-Centered Machine Learning workshop at the ACM CHI 2019 conference, May 4-9, in Glasgow, Scotland.
Electronic health records store valuable information about hospital patients, but they’re often sparse and unstructured, making them difficult for potentially labor- and time-saving AI systems to parse. Fortunately, researchers at New York University and Princeton have developed a framework that evaluates clinical notes (i.e., descriptions of symptoms, reasons for diagnoses, and radiology results) and autonomously assigns a risk score indicating whether patients will be readmitted within 30 days. They claim that the model, whose code and parameters are publicly available on GitHub, handily outperforms baselines.
“Accurately predicting readmission has clinical significance both in terms of efficiency and reducing the burden on intensive care unit doctors,” the paper’s authors wrote. “One estimate puts the financial burden of readmission at $17.9 billion dollars and the fraction of avoidable admissions at 76 percent.”
OpenAI, a leading machine-learning lab, has launched for-profit spin-off OpenAI LP – so it can put investors’ cash toward the expensive task of building artificial general intelligence.
The San-Francisco-headquartered organisation was founded in late 2015 as a nonprofit, with a mission to build, and encourage the development of, advanced neural network systems that are safe and beneficial to humanity.
It was backed by notable figures including killer-AI-fearing Elon Musk, who has since left the board, and Sam Altman, the former president of Silicon Valley VC firm Y Combinator. Altman stepped down as YC president last week to focus more on OpenAI.
Altman is now CEO of OpenAI LP. Greg Brockman, co-founder and CTO, and Ilya Sutskever, co-founder and chief scientist, are also heading over to the commercial side and keeping their roles in the new organization. OpenAI LP stated clearly that it wants to “raise investment capital and attract employees with startup-like equity.”
There is still a nonprofit wing, imaginatively named OpenAI Nonprofit, though it is a much smaller entity considering most of its hundred or so employees have switched over to the commercial side, OpenAI LP, to reap the benefits of its stock options.
“We’ve experienced firsthand that the most dramatic AI systems use the most computational power in addition to algorithmic innovations, and decided to scale much faster than we’d planned when starting OpenAI,” the lab’s management said in a statement this week. “We’ll need to invest billions of dollars in upcoming years into large-scale cloud compute, attracting and retaining talented people, and building AI supercomputers.”
OpenAI refers to this odd split between OpenAI LP and OpenAI Nonprofit as a “capped-profit” company. The initial round of investors, including LinkedIn cofounder Reid Hoffman and Khosla Ventures, is in line to receive up to 100 times the amount invested from OpenAI LP’s profits, if everything goes to plan. Any excess funds afterwards will be handed over to the non-profit side. In order to pay back these early investors, and then some, OpenAI LP will therefore have to find ways to generate fat profits from its technologies.
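The arithmetic of such a cap can be sketched in a few lines. This is only an illustration: the function name and the dollar figures are hypothetical, the 100x multiple is the one figure taken from the reporting above, and the real terms are presumably more involved.

```python
def split_returns(invested: float, profit_share: float, cap_multiple: float = 100.0):
    """Split an investor's share of profits under a capped-profit rule.

    Returns (paid_to_investor, passed_to_nonprofit).
    """
    cap = invested * cap_multiple                 # most the investor can ever receive
    to_investor = min(profit_share, cap)
    to_nonprofit = max(profit_share - cap, 0.0)   # anything above the cap
    return to_investor, to_nonprofit


# Hypothetical numbers: $10M invested, $1.5B of profit attributable to that stake.
print(split_returns(10e6, 1.5e9))
```

With these made-up numbers the investor's payout stops at $1 billion and the remaining $500 million flows to the nonprofit.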
The “capped-profit” model has raised eyebrows. Several machine-learning experts told The Register they were somewhat disappointed by OpenAI’s decision. It once stood out among other AI orgs for its nonprofit status, its focus on developing machine-learning know-how independent of profit and product incentives, and its dedication to open-source research.
Now, for some, it appears to be just another profit-driven Silicon Valley startup stocked with well-paid engineers and boffins.
Generative AI models have a propensity for learning complex data distributions, which is why they’re great at producing human-like speech and convincing images of burgers and faces. But training these models requires lots of labeled data, and depending on the task at hand, the necessary corpora are sometimes in short supply.
The solution might lie in an approach proposed by researchers at Google and ETH Zurich. In a paper published on the preprint server arXiv.org (“High-Fidelity Image Generation With Fewer Labels”), they describe a “semantic extractor” that can pull out features from training data, along with methods of inferring labels for an entire training set from a small subset of labeled images. These self- and semi-supervised techniques together, they say, can outperform state-of-the-art methods on popular benchmarks like ImageNet.
“In a nutshell, instead of providing hand-annotated ground truth labels for real images to the discriminator, we … provide inferred ones,” the paper’s authors explained.
In one of several unsupervised methods the researchers posit, they first extract a feature representation — a set of techniques for automatically discovering the representations needed for raw data classification — on a target training dataset using the aforementioned feature extractor. They then perform cluster analysis — i.e., grouping the representations in such a way that those in the same group share more in common than those in other groups. And lastly, they train a GAN — a two-part neural network consisting of generators that produce samples and discriminators that attempt to distinguish between the generated samples and real-world samples — by inferring labels.
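The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' code: the two-dimensional "features" are synthetic stand-ins for the extractor's output, plain k-means is chosen here as one common form of cluster analysis, and the function name is hypothetical.

```python
import numpy as np


def kmeans_labels(features: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Cluster feature vectors with plain k-means; return one cluster id per row."""
    # Farthest-first initialisation: deterministic, and spreads the centroids out.
    centroids = [features[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[nearest.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels


# Synthetic 2-D "features" standing in for the extractor's output (hypothetical data).
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 2)),
])

# The cluster ids play the role of inferred labels that would be handed to the
# GAN in place of hand-annotated ones.
pseudo_labels = kmeans_labels(features, k=2)
print(np.bincount(pseudo_labels))
```

The point of the sketch is only that every image ends up with a label without anyone annotating it; training the GAN on those labels is a separate step not shown here.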
Google today introduced TensorFlow Lite 1.0, its framework for developers deploying AI models on mobile and IoT devices. Improvements include selective registration and quantization during and after training for faster, smaller models. Quantization has led to 4 times compression of some models.
“We are going to fully support it. We’re not going to break things and make sure we guarantee its compatibility. I think a lot of people who deploy this on phones want those guarantees,” TensorFlow engineering director Rajat Monga told VentureBeat in a phone interview.
The TensorFlow Lite team at Google also shared its roadmap for the future today, designed to shrink and speed up AI models for edge deployment, including things like model acceleration, especially for Android developers using neural nets, as well as a Keras-based connecting pruning kit and additional quantization enhancements.
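To make the compression figure concrete, here is a generic sketch of 8-bit affine quantization written with NumPy. It is not TensorFlow Lite's implementation, and the function names and sample weights are hypothetical. It only shows why storing 32-bit floats as 8-bit integers gives roughly a fourfold size reduction, at the cost of a small rounding error.

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Affine-quantize float weights to int8. Returns (q, scale, zero_point)."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for a constant tensor
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point


def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 codes back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale


# Hypothetical weights standing in for one layer of a model.
weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

print(weights.nbytes // q.nbytes)                         # storage ratio
print(float(np.abs(weights - restored).max()) <= scale)   # error stays within one step
```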
Other changes on the way:
Support for control flow, which is essential to the operation of models like recurrent neural networks
CPU performance optimization with Lite models, potentially involving partnerships with other companies
Expand coverage of GPU delegate operations and finalize the API to make it generally available
A TensorFlow 2.0 model converter to make Lite models will be made available for developers to better understand how things go wrong in the conversion process and how to fix them.
TensorFlow Lite is deployed on more than two billion devices today, TensorFlow Lite engineer Raziel Alvarez said onstage at the TensorFlow Dev Summit being held at Google offices in Sunnyvale, California.
TensorFlow Lite increasingly makes TensorFlow Mobile obsolete, except for users who want to utilize it for training, but a solution is in the works, Alvarez said.
Wind power has become increasingly popular, but its success is limited by the fact that wind comes and goes as it pleases, making it hard for power grids to count on the renewable energy and less likely to fully embrace it. While we can’t control the wind, Google has an idea for the next best thing: using machine learning to predict it.
Google and DeepMind have started testing machine learning on Google’s own wind turbines, which are part of the company’s renewable energy projects. Beginning last year, they fed weather forecasts and existing turbine data into DeepMind’s machine learning platform, which churned out wind power predictions 36 hours ahead of actual power generation. Google could then make supply commitments to power grids a full day before delivery. That predictability makes it easier and more appealing for energy grids to depend on wind power, and as a result, it boosted the value of Google’s wind energy by roughly 20 percent.
IBM Watson Anywhere is built on top of Kubernetes, the open source orchestration engine that can be deployed in diverse environments. Since the Watson Anywhere platform is built as a set of microservices designed to run on Kubernetes, it is flexible and portable.
[…]
According to IBM, the microservices-based Watson Anywhere delivers two solutions –
Watson OpenScale: IBM’s open AI platform for managing multiple instances of AI, no matter where they were developed – including the ability to explain how AI decisions are being made in real time, for greater transparency and compliance.
Watson Assistant: IBM’s AI tool for building conversational interfaces into applications and devices. More advanced than a traditional chatbot, Watson Assistant intelligently determines when to search for a result, when to ask the user for clarification, and when to offload the user to a human for personal assistance. Also, the Watson Assistant Discovery Extension enables organizations to unlock hidden insights in unstructured data and documents.
IBM Cloud Private for Data is an extension of the hybrid cloud focused on data and analytics. According to IBM, it simplifies and unifies how customers collect, organize and analyze data to accelerate the value of data science and AI. The multi-cloud platform delivers a broad range of core data microservices, with the option to add more from a growing services catalog.
IBM Watson Anywhere is seamlessly integrated with Cloud Private for Data. The combination enables customers to manage end-to-end data workflows to help ensure that data is easily accessible for AI.
At a glance, the images featured on the website This Person Does Not Exist might seem like random high school portraits or vaguely inadvisable LinkedIn headshots. But every single photo on the site has been created by using a special kind of artificial intelligence algorithm called a generative adversarial network (GAN).
Every time the site is refreshed, a shockingly realistic but totally fake picture of a person’s face appears. Uber software engineer Phillip Wang created the page to demonstrate what GANs are capable of, and then posted it to the public Facebook group “Artificial Intelligence & Deep Learning” on Tuesday.
The underlying code that made this possible, titled StyleGAN, was written by Nvidia and featured in a paper that has yet to be peer-reviewed. This exact type of neural network has the potential to revolutionize video game and 3D-modeling technology, but, as with almost any kind of technology, it could also be used for more sinister purposes. Deepfakes, or computer-generated images superimposed on existing pictures or videos, can be used to push fake news narratives or other hoaxes. That’s precisely why Wang chose to create the mesmerizing but also chilling website.
A Beijing-based online education start-up has developed an artificial intelligence-powered maths app that can check children’s arithmetic problems through the simple snap of a photo. Based on the image and its internal database, the app automatically checks whether the answers are right or wrong.
Known as Xiaoyuan Kousuan, the free app, launched by the Tencent Holdings-backed online education firm Yuanfudao, has gained increasing popularity in China since its launch a year ago and claims to have checked an average of 70 million arithmetic problems per day, saving users around 40,000 hours of time in total.
Yuanfudao is also trying to build the country’s biggest education-related database generated from the everyday experiences of real students. Using this, the six-year-old company – which has a long line of big-name investors including Warburg Pincus, IDG Capital and Matrix Partners China – aims to reinvent how children are taught in China.
“By checking nearly 100 million problems every day, we have developed a deep understanding of the kind of mistakes students make when facing certain problems,” said Li Xin, co-founder of Yuanfudao – which means “ape tutor” in Chinese – in a recent interview. “The data gathered through the app can serve as a pillar for us to provide better online education courses.”
Video game publisher Ubisoft is working with Mozilla to develop an artificial intelligence coding assistant called Clever-Commit, head of Ubisoft La Forge Yves Jacquier announced during DICE Summit 2019 on Tuesday.
Clever-Commit reportedly helps programmers evaluate whether or not a code change will introduce a new bug by learning from past bugs and fixes. The prototype, called Commit-Assistant, was tested using data collected during game development, Ubisoft said, and it’s already contributing to some major AAA titles. The publisher is also working on integrating it into other brands.
“Working with Mozilla on Clever-Commit allows us to support other programming languages and increase the overall performances of the technology. Using this tech in our games and Firefox will allow developers to be more productive as they can spend more time creating the next feature rather than fixing bugs. Ultimately, this will allow us to create even better experiences for our gamers and increase the frequency of our game updates,” said Mathieu Nayrolles, technical architect, data scientist, and member of the Technological Group at Ubisoft Montreal.
Mozilla is assisting Ubisoft by providing programming language expertise in Rust, C++, and JavaScript. The technology will also help the company ship more stable versions of its Firefox internet browser.
Imagine using machine learning to ensure that the pieces of an aircraft fit together more precisely, and can be assembled with less testing and time. That is one of the uses behind new technology being developed by researchers at Purdue University and the University of Southern California.
“We’re really taking a giant leap and working on the future of manufacturing,” said Arman Sabbaghi, an assistant professor of statistics in Purdue’s College of Science, who led the research team at Purdue with support from the National Science Foundation. “We have developed automated machine learning technology to help improve additive manufacturing. This kind of innovation is heading on the path to essentially allowing anyone to be a manufacturer.”
The technology addresses a current significant challenge within additive manufacturing: individual parts that are produced need to have a high degree of precision and reproducibility. The technology allows a user to run the software component locally within their current network, exposing an API, or programming interface. The software uses machine learning to analyze the product data and create plans to manufacture the needed pieces with greater accuracy.
“This has applications for many industries, such as aerospace, where exact geometric dimensions are crucial to ensure reliability and safety,” Sabbaghi said. “This has been the first time where I’ve been able to see my statistical work really make a difference and it’s the most incredible feeling in the world.”
The researchers have developed a new model-building algorithm and computer application for geometric accuracy control in additive manufacturing systems. Additive manufacturing, commonly known as 3-D printing, is a growing industry that involves building components in a way that is similar to an inkjet printer where parts are ‘grown’ from the building surface.
Additive manufacturing has progressed from a prototype development tool to one that can now offer numerous competitive advantages. Those advantages include shape complexity, waste reduction and potentially less expensive manufacturing, compared to traditional subtractive manufacturing where the process involves starting with the raw material and chipping away at it to produce a final result.
Wohlers Associates estimates that additive manufacturing is a $7.3 billion industry.
“We use machine learning technology to quickly correct computer-aided design models and produce parts with improved geometric accuracy,” Sabbaghi said. The improved accuracy ensures that the produced parts are within the needed tolerances and that every part produced is consistent and will perform the same way, whether it was created on a different machine or 12 months later.
AdScan offers every advertiser a quick, free pre-test that maps out which elements score better or worse and thereby influence the reception and effect of that specific commercial.
AdScan is a machine learning tool that, based on the content of advertisements, can predict how a panel of one hundred people will rate an ad. To do so, AdScan combines historical panel data, computer patterns and smart algorithms to arrive at an analysis.
The ad-rating tool delivers an advisory report within 20 minutes that can contribute to the success of a campaign. AdScan then determines whether an ad scores lower than, on par with, or higher than the benchmark, and which elements you can adjust to achieve a higher score.
McCormick — the maker of Old Bay and other seasonings, spices and condiments — hopes the technology can help it tantalize taste buds. It worked with IBM Research to build an AI system trained on decades worth of data about spices and flavors to come up with new flavor combinations.
The Baltimore, Maryland-based company plans to bring its first batch of AI-assisted products to market later this year. The line of seasoning mixes, called One, for making one-dish meals, includes flavors such as Tuscan Chicken and Bourbon Pork Tenderloin.
Hamed Faridi, McCormick’s chief science officer, told CNN Business that using AI cuts down product development time, and that the company plans to use the technology to help develop all new products by the end of 2021.
Columbia Engineering researchers have made a major advance in robotics by creating a robot that learns what it is, from scratch, with zero prior knowledge of physics, geometry, or motor dynamics. Initially the robot does not know if it is a spider, a snake, an arm–it has no clue what its shape is. After a brief period of “babbling,” and within about a day of intensive computing, their robot creates a self-simulation. The robot can then use that self-simulator internally to contemplate and adapt to different situations, handling new tasks as well as detecting and repairing damage in its own body. The work is published today in Science Robotics.
To date, robots have operated by having a human explicitly model the robot. “But if we want robots to become independent, to adapt quickly to scenarios unforeseen by their creators, then it’s essential that they learn to simulate themselves,” says Hod Lipson, professor of mechanical engineering, and director of the Creative Machines lab, where the research was done.
A robot built by a team of researchers at MIT in America has two prongs for fingers, sensors in its wrist, and a camera for eyes.
As the AI-powered bot surveys the tower, one of its prongs is told by software to poke a block, which sends feedback to its sensor to work out how movable that particular block is. If it’s too stiff, the robot will try another block, and keep pushing in millimetre increments until it has protruded far enough to be removed and placed on top of the tower.
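The probing behaviour described above can be sketched as a simple loop. This is a hand-coded illustration, not the MIT team's system, which learns from experiments. The function names, the force limit, the step size and the target distance are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_block(
    blocks: Sequence[str],
    push: Callable[[str, float], float],
    max_force: float = 1.0,
    step_mm: float = 1.0,
    target_mm: float = 30.0,
) -> Optional[str]:
    """Return the first block that can be pushed target_mm without exceeding max_force."""
    for block in blocks:
        pushed = 0.0
        while pushed < target_mm:
            force = push(block, step_mm)   # resistance felt during this small push
            if force > max_force:
                break                      # too stiff: give up on this block
            pushed += step_mm
        else:
            return block                   # pushed far enough without hitting the limit
    return None


# Stand-in for the wrist sensor: block "a" is stuck, block "b" slides easily.
resistance = {"a": 5.0, "b": 0.2}
print(pick_block(["a", "b"], lambda block, step: resistance[block]))
```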
Prodding until you find a suitable block to push may seem like cheating, but, well, given the state of 2019 so far, we’ll take a rule-stretching robot any day. Here it is in action…
“Unlike in more purely cognitive tasks or games such as chess or Go, playing the game of Jenga also requires mastery of physical skills such as probing, pushing, pulling, placing, and aligning pieces,” said Alberto Rodriguez, an assistant professor of mechanical engineering at MIT, this week.
“It requires interactive perception and manipulation, where you have to go and touch the tower to learn how and when to move blocks. This is very difficult to simulate, so the robot has to learn in the real world, by interacting with the real Jenga tower. The key challenge is to learn from a relatively small number of experiments by exploiting common sense about objects and physics.”
The Remako HD Graphics Mod is a mod that completely revamps the pre-rendered backgrounds of the classic JRPG Final Fantasy VII. All of the backgrounds now have 4 times the resolution of the original.
Using state of the art AI neural networks, this upscaling tries to emulate the detail the original renders would have had. This helps the new visuals to come as close to a higher resolution re-rendering of the original as possible with current technology.
What does it look like?
Below are two trailers. One is a comparison of the raw images, while the other shows off the mod in action.
To advance the state-of-the-art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram.
Now, we introduce our StarCraft II program AlphaStar, the first Artificial Intelligence to defeat a top professional player. In a series of test matches held on 19 December, AlphaStar decisively beat Team Liquid’s Grzegorz “MaNa” Komincz, one of the world’s strongest professional StarCraft players, 5-0, following a successful benchmark match against his team-mate Dario “TLO” Wünsch. The matches took place under professional match conditions on a competitive ladder map and without any game restrictions.
Although there have been significant successes in video games such as Atari, Mario, Quake III Arena Capture the Flag, and Dota 2, until now, AI techniques have struggled to cope with the complexity of StarCraft. The best results were made possible by hand-crafting major elements of the system, imposing significant restrictions on the game rules, giving systems superhuman capabilities, or by playing on simplified maps. Even with these modifications, no system has come anywhere close to rivalling the skill of professional players. In contrast, AlphaStar plays the full game of StarCraft II, using a deep neural network that is trained directly from raw game data by supervised learning and reinforcement learning.
The Victoria Police are the primary law enforcement agency of Victoria, Australia. With over 16,000 vehicles stolen in Victoria this past year, at a cost of about $170 million, the police department is experimenting with a variety of technology-driven solutions to crack down on car theft. They call this system BlueNet.
To help prevent fraudulent sales of stolen vehicles, there is already a VicRoads web-based service for checking the status of vehicle registrations. The department has also invested in a stationary license plate scanner — a fixed tripod camera which scans passing traffic to automatically identify stolen vehicles.
Don’t ask me why, but one afternoon I had the desire to prototype a vehicle-mounted license plate scanner that would automatically notify you if a vehicle had been stolen or was unregistered. Understanding that these individual components existed, I wondered how difficult it would be to wire them together.
But it was after a bit of googling that I discovered the Victoria Police had recently undergone a trial of a similar device, and the estimated cost of roll out was somewhere in the vicinity of $86,000,000. One astute commenter pointed out that the $86M cost to fit out 220 vehicles comes in at a rather thirsty $390,909 per vehicle.
Surely we can do a bit better than that.
Existing stationary license plate recognition systems
The Success Criteria
Before getting started, I outlined a few key requirements for product design.
Requirement #1: The image processing must be performed locally
Streaming live video to a central processing warehouse seemed the least efficient approach to solving this problem. Besides the whopping bill for data traffic, you’re also introducing network latency into a process which may already be quite slow.
Although a centralized machine learning algorithm is only going to get more accurate over time, I wanted to learn if a local on-device implementation would be “good enough”.
Requirement #2: It must work with low quality images
I don’t have a Raspberry Pi camera or USB webcam, so I’ll be using dashcam footage: it’s readily available and an ideal source of sample data. As an added bonus, dashcam video represents the overall quality of footage you’d expect from vehicle-mounted cameras.
Requirement #3: It needs to be built using open source technology
Relying on proprietary software means you’ll get stung every time you request a change or enhancement, and the stinging will continue for every request made thereafter. Using open source technology is a no-brainer.
My solution
At a high level, my solution takes an image from a dashcam video, pumps it through an open source license plate recognition system installed locally on the device, queries the registration check service, and then returns the results for display.
The data returned to the device installed in the law enforcement vehicle includes the vehicle’s make and model (which it only uses to verify whether the plates have been stolen), the registration status, and any notifications of the vehicle being reported stolen.
If that sounds rather simple, it’s because it really is. For example, the image processing can all be handled by the openalpr library.
This is really all that’s involved to recognize the characters on a license plate:
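As a rough illustration only, here is a minimal Python sketch of that step. It is not the code used in this prototype. It assumes the OpenALPR `alpr` command-line tool is installed and on the PATH, that its `-c` (country) and `-j` (JSON output) options are available, and that an `au` country configuration is present. The function names `parse_alpr_json` and `recognise_plates` are made up for this example.

```python
import json
import subprocess


def parse_alpr_json(raw):
    """Return (plate, confidence) for the top guess of each plate in alpr's JSON output."""
    data = json.loads(raw)
    return [(r["plate"], r["confidence"]) for r in data.get("results", [])]


def recognise_plates(image_path, country="au"):
    """Run the alpr CLI on a single image and return its top guesses.

    Assumption: the `alpr` binary is installed and configured for `country`.
    """
    completed = subprocess.run(
        ["alpr", "-c", country, "-j", image_path],
        capture_output=True,
        text=True,
        check=True,  # raise if alpr exits with an error
    )
    return parse_alpr_json(completed.stdout)
```

Calling `recognise_plates("frame.jpg")` on a frame grabbed from the dashcam video would return a list of plate strings with their confidence ratings, ready to pass to the registration check.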
A Minor Caveat
Public access to the VicRoads APIs is not available, so license plate checks occur via web scraping for this prototype. Scraping is generally frowned upon, but this is a proof of concept and I’m not slamming anyone’s servers.
Here’s what the dirtiness of my proof-of-concept scraping looks like:
Results
I must say I was pleasantly surprised.
I expected the open source license plate recognition to be pretty rubbish. Additionally, the image recognition algorithms are probably not optimised for Australian license plates.
The solution was able to recognise license plates in a wide field of view.
Annotations added for effect. Number plate identified despite reflections and lens distortion.
That said, the solution would occasionally have issues with particular letters.
Incorrect reading of plate, mistook the M for an H
But … the solution would eventually get them correct.
A few frames later, the M is correctly identified and at a higher confidence rating
As you can see in the above two images, processing the image a couple of frames later jumped from a confidence rating of 87% to a hair over 91%.
I’m confident, pardon the pun, that the accuracy could be improved by increasing the sample rate, and then sorting by the highest confidence rating. Alternatively, a threshold could be set that only accepts a confidence of greater than 90% before going on to validate the registration number.
Those are very straightforward code-first fixes, and don’t preclude the training of the license plate recognition software with a local data set.
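Both ideas fit in a few lines. The sketch below is an assumption about how it might look rather than code from the prototype: `best_reading` is a hypothetical helper that takes the (plate, confidence) readings gathered from successive frames, drops anything at or below the threshold, and keeps the highest-confidence reading that remains. The plate strings in the usage note are invented.

```python
def best_reading(readings, min_confidence=90.0):
    """Pick the most confident plate reading above a threshold.

    readings: iterable of (plate, confidence) tuples from successive frames.
    Returns the best (plate, confidence) tuple, or None if nothing qualifies.
    """
    accepted = [r for r in readings if r[1] > min_confidence]
    if not accepted:
        return None  # keep sampling frames before checking the registration
    return max(accepted, key=lambda r: r[1])
```

With two readings like the ones above, say `("1AB2CH", 87.0)` followed by `("1AB2CM", 91.1)`, only the second clears the 90% bar, so that is the plate that would go on to the registration check.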
The $86,000,000 Question
To be fair, I have absolutely no clue what the $86M figure includes — nor can I speak to the accuracy of my open source tool with no localized training vs. the pilot BlueNet system.
I would expect part of that budget includes the replacement of several legacy databases and software applications to support the high frequency, low latency querying of license plates several times per second, per vehicle.
On the other hand, the cost of ~$391k per vehicle seems pretty rich, especially if BlueNet isn’t particularly accurate and there are no large scale IT projects to decommission or upgrade dependent systems.
Future Applications
While it’s easy to get caught up in the Orwellian nature of an “always on” network of license plate snitchers, there are many positive applications of this technology. Imagine a passive system scanning fellow motorists for an abductor’s car that automatically alerts authorities and family members to its current location and direction.
Tesla vehicles are already brimming with cameras and sensors and can receive OTA updates; imagine turning these into a fleet of virtual good samaritans. Uber and Lyft drivers could also be outfitted with these devices to dramatically increase the coverage area.
Using open source technology and existing components, it seems possible to offer a solution that provides a much higher rate of return — for an investment much less than $86M.
TAUS, the language data network, is an independent and neutral industry organization. We develop communities through a program of events and online user groups and by sharing knowledge, metrics and data that help all stakeholders in the translation industry develop a better service. We provide data services to buyers and providers of language and translation services.
The shared knowledge and data help TAUS members decide on effective localization strategies. The metrics support more efficient processes and the normalization of quality evaluation. The data lead to improved translation automation.
TAUS develops APIs that give members access to services like DQF, the DQF Dashboard and the TAUS Data Market through their own translation platforms and tools. TAUS metrics and data are already built in to most of the major translation technologies.
Robots normally need to be programmed in order to get them to perform a particular task, but they can be coaxed into writing the instructions themselves with the help of machine learning, according to research published in Science.
Engineers at Vicarious AI, a robotics startup based in California, USA, have built what they call a “visual cognitive computer” (VCC), a software platform connected to a camera system and a robot gripper. Given a set of visual clues, the VCC writes a short program of instructions to be followed by the robot so it knows how to move its gripper to do simple tasks.
“Humans are good at inferring the concepts conveyed in a pair of images and then applying them in a completely different setting,” the paper states.
“The human-inferred concepts are at a sufficiently high level to be effortlessly applied in situations that look very different, a capacity so natural that it is used by IKEA and LEGO to make language-independent assembly instructions.”
Don’t get your hopes up, however: these robots can’t put your flat-pack table or chair together for you quite yet. But they can do very basic jobs, like moving a block backwards and forwards.
It works like this. First, an input and output image are given to the system. The input image is a jumble of colored objects of various shapes and sizes, and the output image is an ordered arrangement of the objects. For example, the input image could be a number of red blocks and the output image is all the red blocks ordered to form a circle. Think of it a bit like a before and after image.
The VCC works out what commands the robot needs to perform in order to organise the objects in front of it, based on the difference between the ‘before’ and ‘after’ images. The system is trained to learn what action corresponds to what command using supervised learning.
Dileep George, cofounder of Vicarious, explained to The Register, “up to ten pairs [of images are used] for training, and ten pairs for testing. Most concepts are learned with only about five examples.”
Here’s a diagram of how it works:
A: A graph describing the robot’s components. B: The list of commands the VCC can use. Image credit: Vicarious AI
The left hand side is a schematic of all the different parts that control the robot. The visual hierarchy looks at the objects in front of the camera and categorizes them by object shape and colour. The attention controller decides what objects to focus on, whilst the fixation controller directs the robot’s gaze to the objects before the hand controller operates the robot’s arms to move the objects about.
The robot doesn’t need too many training examples to work because there are only 24 commands, listed on the right-hand side of the diagram, for the VCC controller.
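To give a feel for what inferring a program from ‘before’ and ‘after’ pairs means, here is a toy Python sketch. It is not Vicarious’s method: it simply brute-forces short sequences from a tiny, made-up command set (eight shift commands rather than the VCC’s 24) until one explains every example pair. The scene representation and all names here are assumptions for illustration.

```python
from itertools import product

# A scene is a frozenset of (colour, x, y) objects on a grid.
# Hypothetical toy command: shift every object of one colour by one cell.
def make_shift(colour, dx, dy):
    def command(scene):
        return frozenset(
            (c, x + dx, y + dy) if c == colour else (c, x, y)
            for c, x, y in scene
        )
    return command

DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}
COMMANDS = {
    f"move_{colour}_{name}": make_shift(colour, dx, dy)
    for colour in ("red", "green")
    for name, (dx, dy) in DIRECTIONS.items()
}

def run(program, scene):
    """Apply each named command in order and return the resulting scene."""
    for name in program:
        scene = COMMANDS[name](scene)
    return scene

def induce_program(pairs, max_length=3):
    """Return the shortest command sequence consistent with every (before, after) pair."""
    for length in range(max_length + 1):
        for program in product(COMMANDS, repeat=length):
            if all(run(program, before) == after for before, after in pairs):
                return program
    return None  # no program of at most max_length commands fits the examples
```

Given two example pairs in which the red objects end up two cells to the right and everything else stays put, `induce_program` returns `("move_red_right", "move_red_right")`. A small command vocabulary keeps this kind of search space manageable, which is in the spirit of the point above about needing only a handful of examples.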
AI systems excel in pattern recognition, so much so that they can stalk individual zebrafish and fruit flies even when the animals are in groups of up to a hundred.
To demonstrate this, a group of researchers from the Champalimaud Foundation, a private biomedical research lab in Portugal, trained two convolutional neural networks to identify and track individual animals within a group. The aim is not so much to match or exceed humans’ ability to spot and follow stuff, but rather to automate the process of studying the behavior of animals in their communities.
“The ultimate goal of our team is understanding group behavior,” said Gonzalo de Polavieja. “We want to understand how animals in a group decide together and learn together.”
The resulting machine-learning software, known as idtracker.ai, is described as “a species-agnostic system.” It’s “able to track all individuals in both small and large collectives (up to 100 individuals) with high identification accuracy—often greater than 99.9 per cent,” according to a paper published in Nature Methods on Monday.
The idtracker.ai software is split into a crossing-detector network and an identification network. First, it was fed video footage of the animals interacting in their enclosures. For example, in the zebrafish experiment, the system pre-processes the fish as coloured blobs and learns to distinguish individual animals from those that are touching or crossing past one another in groups. The identification network is then used to identify the individual animals during each crossing event.
Surprisingly, it reached an accuracy rate of up to 99.96 per cent for groups of 60 zebrafish, and this increased to 99.99 per cent for 100 zebrafish. Recognizing fruit flies is harder. Idtracker.ai was accurate to 99.99 per cent for 38 fruit flies, but its accuracy decreased slightly to 99.95 per cent for 72 fruit flies.