Nvidia’s flagship Titan V graphics cards may have hardware gremlins causing them to spit out different answers to repeated complex calculations under certain conditions, according to computer scientists.
The Titan V is the Silicon Valley giant’s most powerful GPU board available to date, and is built on Nv’s Volta technology. Gamers and casual users will not notice any errors or issues, however folks running intensive scientific software may encounter occasional glitches.
One engineer told The Register that when he tried to run identical simulations of an interaction between a protein and enzyme on Nvidia’s Titan V cards, the results varied. After repeated tests on four of the top-of-the-line GPUs, he found two gave numerical errors about 10 per cent of the time. These tests should produce the same output values each time again and again. On previous generations of Nvidia hardware, that generally was the case. On the Titan V, not so, we’re told.
We have repeatedly asked Nvidia for an explanation, and spokespeople have declined to comment. With Nvidia kicking off its GPU Technology Conference in San Jose, California, next week, perhaps then we’ll get some answers.
All in all, it is bad news for boffins as reproducibility is essential to scientific research. When running a physics simulation, any changes from one run to another should be down to interactions within the virtual world, not rare glitches in the underlying hardware.
Unlike previous GeForce and Titan GPUs, the Titan V is geared not so much for gamers but for handling intensive parallel computing workloads for data science, modeling, and machine learning.
And at $2,999 (£2,200) a pop, it’s not cheap to waste resources and research time on faulty hardware. Engineers speaking to The Register on condition of anonymity to avoid repercussions from Nvidia said the best solution to these problems is to avoid using Titan V altogether until a software patch has been released to address the mathematical oddities.
This kind of reminds me of when Intel brought out the Pentium. They couldn’t count either.