Palit GeForce GTX 1660 Super review: testing the newcomer in computing and machine learning
In the fall of 2018, Nvidia released its line of RTX graphics cards based on chips codenamed Turing, with built-in tensor and RT cores. The community of enthusiasts who use ordinary gaming graphics cards for machine learning and artificial intelligence greeted the new products without much enthusiasm: yes, on the one hand, half-precision (FP16) calculations give a serious speed increase, sometimes 40-50% over single-precision (FP32) calculations. But on the other hand, the high cost of the cards pushed many users toward cloud services, which are not always convenient, not always secure, and not always easy to configure.
Fortunately, GPU performance in games and in machine learning tasks often goes hand in hand, and in the gaming hardware market the "greens" face a very strong competitor in AMD. To counter it, Nvidia released the GeForce GTX 1660 series, without tensor cores but with 6 GB of video memory, which until recently comprised two models: the GTX 1660 Ti with 1536 stream processors at 1635 MHz and the regular GTX 1660 with 1408 CUDA cores at 1785 MHz. The new GeForce GTX 1660 Super differs from the regular GeForce GTX 1660 only in memory type, GDDR6 instead of GDDR5, and in this respect it sets the record for the GTX 1660 series, providing the user with 336 GB/s of memory bandwidth. That is 75% higher than the regular GeForce GTX 1660 and 16% higher than the "Ti". GDDR6 memory has a higher bandwidth per pin than GDDR5, so it achieves a faster data transfer rate at a lower clock frequency.
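These bandwidth figures follow directly from the bus width and the effective per-pin data rate. A quick sanity check (the per-pin rates below are the standard published specs for these cards: 8 Gbps GDDR5 and 12/14 Gbps GDDR6 on a 192-bit bus):

```python
# Memory bandwidth = effective per-pin data rate (Gbps) x bus width (bits) / 8.
# Per-pin rates are the standard specs for these cards, not measurements.
def bandwidth_gb_s(gbps_per_pin: float, bus_width_bits: int) -> float:
    return gbps_per_pin * bus_width_bits / 8

gtx_1660       = bandwidth_gb_s(8, 192)   # GDDR5  -> 192.0 GB/s
gtx_1660_ti    = bandwidth_gb_s(12, 192)  # GDDR6  -> 288.0 GB/s
gtx_1660_super = bandwidth_gb_s(14, 192)  # GDDR6  -> 336.0 GB/s

print(gtx_1660_super / gtx_1660 - 1)            # -> 0.75 (75% over the GTX 1660)
print(round(gtx_1660_super / gtx_1660_ti - 1, 3))  # ~17% over the GTX 1660 Ti
```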
For comparison, in terms of memory bandwidth the new product approaches the GTX 1080 (352 GB/s) and matches the RTX 2060 (336 GB/s), but it is still far from the RTX 2080 and RTX 2080 Ti with their record 448 and 616 GB/s. Among datacenter GPUs, the Nvidia Tesla T4 has "only" about 320 GB/s, the Tesla K80, which you can try on Google Colab, 420 GB/s, the Tesla P100, 720 GB/s, and the Tesla V100, 900 GB/s.
According to many experts, it is the video memory bandwidth, and not the computing power, that is of key importance in the construction of neural networks. For example, in face recognition, the volume of high-quality images that must be presented for training is extremely large. It is also necessary to continually validate the results with new datasets to reduce the error rate. Depending on the application, new data can come in very frequently and require ongoing training. When models include many layers and nodes, there is a need for high memory and interface bandwidth to keep the neural network training and inference at peak rates.
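To get a feel for why bandwidth matters, here is a back-of-the-envelope estimate (the model size is a hypothetical figure, not a benchmark): streaming the weights of a 100-million-parameter FP32 model once from video memory takes time inversely proportional to bandwidth, so memory-bound training steps speed up roughly in step with GB/s.

```python
# Idealized estimate of how memory bandwidth bounds one pass over a
# model's weights. Model size and precision are assumed for illustration.
def weight_read_time_ms(n_params: float, bytes_per_param: int, bw_gb_s: float) -> float:
    """Time to stream all weights once from VRAM, in milliseconds."""
    total_bytes = n_params * bytes_per_param
    return total_bytes / (bw_gb_s * 1e9) * 1e3

# A hypothetical 100M-parameter FP32 model (4 bytes per weight):
for name, bw in [("GTX 1660", 192), ("GTX 1660 Super", 336), ("Tesla V100", 900)]:
    print(f"{name}: {weight_read_time_ms(100e6, 4, bw):.2f} ms per full weight read")
```

Real training reads activations and gradients too, so this is a lower bound, but the proportionality to bandwidth is the point.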
Simply put, today the GTX 1660 Super is the only CUDA-compatible solution that, at a price below 17 thousand rubles, will give you 336 GB/s video memory speed.
Our Hero: Palit GeForce GTX 1660 Super StormX
Most video cards based on Nvidia GeForce GTX 1660 chips use a two-fan cooling system, and our test Palit GeForce GTX 1660 Super compares favorably in compactness here: would you like to install it in a QNAP TS-677 NAS? You are welcome! In a short Mini-ITX case for 24x7 operation? Please: there are no length restrictions, and only the height of the board should matter to you.
The cooling system uses a single heatsink not only for the video chip, but also for six memory chips and VRM elements. Thermal grease is used as a heat-conducting interface for the GPU chip, and heat-conducting pads (aka “gum”) are used for memory modules and VRMs.
With such a modest size of the cooler, Palit has managed to use three symmetrical heat-conducting tubes, and due to this move, the video card does not overheat even with such a compact size.
Among the features of the board, I would like to note the metal frame that protects the GPU die from chipping if the heatsink is skewed. If you remember the days of the Athlon XP, you will understand: on Palit video cards you can safely remove the stock cooler, replace the thermal paste, and reinstall it without the risk of chipping a corner off the GPU.
The board itself, judging by the empty pads for VRM power stages and two more memory chips, is designed for a more power-hungry configuration. The regular, non-overclocked version of the card has a 3-phase power supply scheme, which nevertheless allows quite successful overclocking: 1980/4800 MHz (core/memory).
The first part is synthetic tests, and we'll start by evaluating integer and floating-point operations.
Let's continue with Geekbench 5, which evaluates now-commonplace algorithms such as face recognition and applying graphic filters to images.
In the OctaneBench 4.0 test, which uses a real rendering engine, the newcomer shows very good results for a single GPU.
Let's move on to a real test and measure performance in the most popular framework, TensorFlow/Keras.
Let's start with the simple tests included in the Keras package examples, comparing against the Tesla GPUs provided by Google Colab. Google undeniably shares real GPU performance for free, but it is important to understand how a local GPU compares with what you get in the cloud.
Tensorflow Keras time, seconds (lower = better)
This is the very reason why I decided not to run the standard ResNet/CIFAR10 tests. There you have it: on simple workloads, a $280 graphics card performs comparably to professional GPUs costing over $5,000.
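For reproducibility, timings like the ones above come down to a simple wall-clock harness around each workload. Here is a minimal, framework-agnostic sketch (the workload function is a placeholder for an actual Keras example script, which is not included here):

```python
import time

def time_workload(fn, repeats: int = 3) -> float:
    """Run fn() several times and return the best wall-clock time in seconds.

    Taking the minimum of several runs filters out one-off warm-up costs
    such as CUDA kernel compilation and data caching.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # placeholder: e.g. one training epoch of a Keras example
        best = min(best, time.perf_counter() - start)
    return best

# Toy stand-in workload so the harness can be demonstrated without a GPU:
elapsed = time_workload(lambda: sum(i * i for i in range(100_000)))
print(f"best of 3: {elapsed:.3f} s")
```

When benchmarking GPU frameworks specifically, remember that the first epoch usually includes graph construction and memory allocation, which is exactly why the warm-up-tolerant minimum is used here.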
Let's take a real training task: the character-level recurrent network project known as textgenrnn, and run training on a small 2.59 MB text file with batch_size = 256.
TensorFlow, textgenrnn training, time per epoch, seconds (lower = better)
Well, in practice the card's real speed is comparable to what you get with a free Google Colab account. You can talk as long as you like about application optimization and about the larger VRAM that lets you raise the batch_size parameter, but on small models this gives no advantage other than a quicker slide into overfitting.
To finish our testing, here is the card's result mining Ethereum with default settings.
Energy consumption and heat package
A gaming graphics card needs no additional airflow, and of course its cooling system is not designed for 24x7 operation at maximum load. Fortunately, loading the GPU to its absolute limit takes real effort; in a regular ATX case the board showed the following results:
- Idle mode: 17W, 38 degrees Celsius, 1000 RPM
- TensorFlow Keras Textgenrnn: 95W, 60 degrees Celsius, 1753 RPM
- Furmark: 124.8W, 69 degrees Celsius, 2271 RPM
Fan speed control is smooth: the card is clearly in no hurry to spin up its blades, but it is just as reluctant to slow the cooler back down. Overall, in terms of acoustic comfort it is fine.
Before us is an excellent video card for a "micro-cloud" that you can assemble in a Mini-ITX form factor, put on a shelf or in a closet, and periodically load with calculations for your projects through Jupyter. At the time of our review there were no official Linux drivers for the GeForce GTX 1660 Super yet, but their appearance is only a matter of time. A good cooling system with three heat pipes and a compact form factor make it almost ideal for a self-build, where you only need to be sure the board fits the case in height. Today that means any modern case except 3U telecom chassis and some horizontal HTPC models that sit under the TV. Then again, what are we talking about? Low-profile gaming graphics cards do not exist, and the Palit GeForce GTX 1660 Super is no exception here.
Of course, the fact that the Palit GeForce GTX 1660 Super does not stop its fan at idle is a drawback, but the card's noise level at idle is below the ambient background and cannot be measured; as long as the cooler is new, this card will not buzz at you. And the three heat pipes mean the card is ready to work in any conditions, even in compact, poorly ventilated cases.
Mikhail Degtyarev (aka LIKE OFF)