Nvidia's Titan V Accused of Returning Wrong Answers in Simulations



Nvidia has long held the pole position in GPGPU computing, particularly in scientific and HPC applications. The company's long-term investment in CUDA and high-performance computing has won it a number of spots in the supercomputing TOP500 and fueled the growth of its Tesla product line, including GPUs like the $3,000 Titan V, a Volta-based graphics card that straddles the line between a consumer and a scientific product. But all may not be well with the Titan V: there are reports that the chip can produce different results from run to run.

That's the word from The Register, which writes:

One engineer told The Register that when he tried to run identical simulations of an interaction between a protein and enzyme on Nvidia's Titan V cards, the results varied. After repeated tests on four of the top-of-the-line GPUs, he found two gave numerical errors about 10 per cent of the time. These tests should produce the same output values each time again and again. On previous generations of Nvidia hardware, that generally was the case. On the Titan V, not so, we're told.

The Reg goes on to note that it also spoke to an "industry veteran," who speculated that the problem may be due to issues with the GPU's onboard HBM memory. The same individual noted that Nvidia had encountered this kind of issue before and had been forced to issue patches to address it.


Elsewhere, other communities have noted that the problem could be overblown. Parallel floating-point computation is not necessarily deterministic; it does not automatically yield identical results every single time. Because floating-point addition rounds after every step, it is not associative, so if the order of operations differs from run to run, the final result can differ as well.
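To make the order-of-operations point concrete, here is a minimal Python sketch. It is not taken from the article or from any GPU code. The function names `ordered_sum` and `simulated_parallel_sum` are hypothetical, and a seeded shuffle stands in for the run-to-run variation in reduction order that can occur on real parallel hardware.

```python
import random


def ordered_sum(values):
    """Add values strictly left to right, rounding after every addition,
    the way a single thread would accumulate them."""
    total = 0.0
    for v in values:
        total += v
    return total


def simulated_parallel_sum(values, seed):
    """Hypothetical stand-in for a parallel reduction: the same values are
    accumulated in an order that changes from run to run (here, a seeded
    shuffle), much as thread scheduling can change the order in which
    partial results are combined on real hardware."""
    order = list(values)
    random.Random(seed).shuffle(order)
    return ordered_sum(order)


# Floating-point addition is not associative: (a + b) + c != a + (b + c).
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# A large value can swallow a small one, depending on when each is added.
print(ordered_sum([1e16, 1.0, -1e16]))  # 0.0
print(ordered_sum([1e16, -1e16, 1.0]))  # 1.0

# The same input summed in several "run" orders may yield several totals.
rng = random.Random(0)
data = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(10_000)]
results = {simulated_parallel_sum(data, seed) for seed in range(5)}
print(len(results), "distinct totals from 5 runs")
```

The variation here comes from floating-point arithmetic itself, not from any particular GPU. One common way to get bit-for-bit reproducible output is to fix the reduction order, at some cost in parallel performance.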

It seems unlikely, however, that scientists and researchers would mistake a known issue (non-deterministic output in parallel FP calculations) for a significant hardware problem. The Reg's source indicated the Titan V could give incorrect results about 10 percent of the time, but did not provide details on which applications were affected, whether the frequency of the problem varied from application to application, or whether it could be influenced by changing various GPU settings.

Right now, we have more questions than answers. The problem, if it exists, might be addressable via a driver or code change. It might also reflect a problem with the GPU's memory subsystem, as The Reg speculates. The developers of some HPC applications have updated their websites to say they are aware of the potential issue but have not observed it so far. It's also possible that the issue is limited to a handful of cards and is not indicative of a general problem.

As for Nvidia, the company has told The Reg it is aware of the issue and has invited anyone affected to contact Nvidia directly. The Titan V isn't really positioned as a gaming GPU, but games do not appear to be affected at this time.