Daz Studio Iray - Rendering Hardware Benchmarking


Comments

  • LenioTG Posts: 2,104

    RTX 2070 Super + RTX 2060

    (with the new stable build of Daz)

    System Configuration
    System/Motherboard: Asus TUF X570
    CPU: Ryzen 5 3600
    GPU: RTX 2070 Super + RTX 2060
    System Memory: 64GB 3600MHz C16
    OS Drive: Sabrent Rocket 250GB
    Asset Drive: Sabrent Rocket Q 1TB
    Operating System: Windows 10 Pro 10.0.19041
    Nvidia Drivers Version: Studio Driver 456.71
    Daz Studio Version: 4.14.0.8

    Benchmark Results
    DAZ_STATS
    2020-11-11 10:57:19.078 Finished Rendering
    2020-11-11 10:57:19.104 Total Rendering Time: 2 minutes 50.87 seconds
    IRAY_STATS
    2020-11-11 10:57:25.654 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:
    2020-11-11 10:57:25.654 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 2060): 697 iterations, 2.551s init, 165.563s render
    2020-11-11 10:57:25.654 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 2070 SUPER): 1103 iterations, 1.930s init, 165.939s render
    Iteration Rate: 10.53 iterations/s
    Loading Time: 4.93 seconds

  • chrislb said:
    Are you going to pick up a Nvlink for these?

     

    Maybe.  Nvidia and EVGA don't have 3 slot spacing NVLink bridges and have no ETA for availability.  Yet most motherboards use 3 slot spacing for PCIEx16 slots.  Also, one of the 3090s is on loan to me for a short time to do some testing.

  • I did one more test.  I didn't have enough room inside my case to fit three graphics cards, so I used a PCIEx16 riser cable from a vertical GPU mount kit to add the 2080 Super graphics card (one of the ones I used in previous benchmarks posted here) alongside both 3090s.  I also used two power supplies (1300 watt and 760 watt) to power all three cards.

    I was able to get the benchmark render time under 1 minute.

    System Configuration:

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.12.2.60 Public Beta

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards and one EVGA RTX 2080 Super card only no CPU rendering

     

    2020-11-11 14:08:20.869 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Maximum number of samples reached.

    2020-11-11 14:08:21.434 Finished Rendering

    2020-11-11 14:08:21.477 Total Rendering Time: 58.96 seconds

    2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 318 iterations, 2.349s init, 53.674s render

    2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 752 iterations, 1.891s init, 53.981s render

    2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 730 iterations, 1.995s init, 53.197s render

     

    Iteration Rate: (1800/53.981) = 33.3450 iterations per second

    Loading Time: ((58.96 seconds) - 53.981) = 4.979 seconds
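
For anyone who wants to reproduce the two figures above without a calculator, here is a minimal Python sketch. It assumes the convention used in the post: iteration rate is total iterations divided by the longest device render time, and loading time is the total rendering time minus that same device time. The helper names (`parse_devices`, `summarize`) are made up for illustration and are not part of Daz Studio or Iray.

```python
import re

# Matches the per-device lines Iray prints under "Device statistics:".
DEVICE_RE = re.compile(
    r"(?P<name>CUDA device \d+ \([^)]*\)|CPU):\s*"
    r"(?P<iters>\d+) iterations, "
    r"(?P<init>[\d.]+)s init, "
    r"(?P<render>[\d.]+)s render"
)


def parse_devices(log_text):
    """Return a list of (device name, iterations, render seconds)."""
    return [
        (m.group("name"), int(m.group("iters")), float(m.group("render")))
        for m in DEVICE_RE.finditer(log_text)
    ]


def summarize(log_text, total_seconds):
    """Compute the two derived benchmark figures from a pasted log."""
    devices = parse_devices(log_text)
    total_iters = sum(iters for _, iters, _ in devices)
    slowest = max(render for _, _, render in devices)
    return {
        "iterations": total_iters,
        "iteration_rate": total_iters / slowest,   # iterations per second
        "loading_time": total_seconds - slowest,   # seconds
    }


# Device lines copied from the three-card run above.
LOG = """\
2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 318 iterations, 2.349s init, 53.674s render
2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 752 iterations, 1.891s init, 53.981s render
2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 730 iterations, 1.995s init, 53.197s render
"""

result = summarize(LOG, total_seconds=58.96)
print(f"{result['iterations']} iterations, "
      f"{result['iteration_rate']:.3f} it/s, "
      f"{result['loading_time']:.3f} s loading")
```

Pasting a different run's device lines into `LOG` and changing `total_seconds` should give the matching figures for that run, including runs where a `CPU:` line is present.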

  • @chrislb That's crazy! You broke the equivalent of the 4 minute mile! Congratulations!

  • RayDAnt Posts: 877
    chrislb said:

    It looks like using the CPU with a pair of 3090s actually increases render time.  The 3090 cards I used were the EVGA RTX 3090 FTW3 Ultra cards.  https://www.evga.com/products/product.aspx?pn=24G-P5-3987-KR

    They have a higher base and boost clock than the Nvidia Founders Edition cards and a power limit of 420 watts per card with the stock BIOS.  Both cards hit 2000+ MHz during the render.

    If you use PX1 or MSI Afterburner, you can increase the stock power limit to 450 watts per card.  EVGA also has a BIOS for the cards which can increase the power limit to 500 watts per card.  I may be able to get the render time for this benchmark under 1 minute if I use the 500 watt BIOS and the software to raise the power limit to 500 watts.  I actually hit the card's 420 watt power limit during the render.

     

    System Configuration

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.12.2.60 Public Beta

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards only no CPU rendering

     

    2020-11-10 20:45:43.570 Finished Rendering

    2020-11-10 20:45:43.619 Total Rendering Time: 1 minutes 8.35 seconds

     

    2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 896 iterations, 1.863s init, 63.361s render

    2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 904 iterations, 1.812s init, 63.570s render

     

    Iteration Rate: (1800/63.570) = 28.315 iterations per second

    Loading Time: ((68.35 seconds) - 63.570) = 4.78 seconds

     

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra and 3950X CPU

     

    2020-11-10 20:56:09.682 Finished Rendering

    2020-11-10 20:56:09.728 Total Rendering Time: 1 minutes 11.95 seconds

     

    2020-11-10 20:56:12.284 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 868 iterations, 2.582s init, 66.549s render

    2020-11-10 20:56:12.284 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 851 iterations, 2.511s init, 65.870s render

    2020-11-10 20:56:12.284 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CPU: 81 iterations, 2.107s init, 66.197s render

     

    Iteration Rate: (1800/ 66.549 ) = 27.047 iterations per second

    Loading Time: ((71.95 seconds) - 66.549) = 5.401 seconds

    Are you going to pick up a Nvlink for these?

    Yeah! Imo you might as well (not that I would expect you to find it very useful - NVLink VRAM pooled rendering in Iray is always going to be slower than just using the cards independently. Therefore the only use case in which it makes sense is if you've got >24GB scenes to render...)

  • @chrislb That's crazy! You broke the equivalent of the 4 minute mile! Congratulations!

    Thanks!

     

    It looks like I hit a point of diminishing returns.  Adding another 2080 Super didn't help much.  I'm not sure what the bottleneck is at this point.  Maybe RAM speed or SSD read speed?

     

    System Configuration:

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.12.2.60 Public Beta

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards and two RTX 2080 Super cards no CPU rendering

     

    2020-11-11 15:22:00.949 Finished Rendering

    2020-11-11 15:22:00.997 Total Rendering Time: 53.45 seconds

    2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 269 iterations, 2.792s init, 47.407s render

    2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 3 (GeForce RTX 2080 SUPER): 270 iterations, 2.394s init, 47.577s render

    2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 643 iterations, 1.926s init, 47.737s render

    2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 618 iterations, 2.108s init, 47.181s render

     

    Iteration Rate: (1800/47.737) = 37.706 iterations per second

    Loading Time: ((53.45 seconds) - 47.737) = 5.713 seconds

  • RayDAnt Posts: 877

    @chrislb mind doing another benchmark run with both 2080 Super's but just a single 3090? That way we have a complete spread of how all the card combos work out for you.

  • outrider42 Posts: 2,907

    It is still scaling pretty well; part of it is just that the 3090s are so much faster that the 2080 Supers don't feel like they add much in comparison. One 3090 is running 14.1 iterations per second. On the bench list, a 2080 Super hit 6.3. The combined total is 40.8 in theory (2 x 14.1 + 2 x 6.3). That you hit 37.7 is not so bad in my book. Iray does not scale perfectly, and you will find scaling more difficult with more cards. I can tell you it is not the SSD, as Iray doesn't touch the SSD. This is purely in the communication between the cards, and maybe the CPU as it connects them, but you have a high end 3950X. So we are talking PCIe and bandwidth. You might only be able to go faster with a workstation class CPU as it provides more lanes, but even that is not likely to gain much performance.
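
The "40.8 in theory" figure above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the post: the solo rates (14.1 and 6.3 it/s) and the measured 37.706 it/s are taken from the thread, and "efficiency" here simply means measured rate divided by the sum of solo rates, which is an assumption about how one might choose to express it.

```python
# Solo iteration rates quoted in the post (iterations per second).
solo_rates = {"RTX 3090": 14.1, "RTX 2080 Super": 6.3}

# Card counts in the four-GPU run.
cards = {"RTX 3090": 2, "RTX 2080 Super": 2}

# Measured combined rate from the four-GPU run.
measured = 37.706

# Perfect scaling would just add up the solo rates.
theoretical = sum(solo_rates[name] * count for name, count in cards.items())
efficiency = measured / theoretical

print(f"theoretical {theoretical:.1f} it/s, "
      f"measured {measured:.1f} it/s, "
      f"scaling efficiency {efficiency:.1%}")
```

By this measure the four-card rig lands at roughly 92% of the ideal sum.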

  • RayDAnt said:

    @chrislb mind doing another benchmark run with both 2080 Super's but just a single 3090? That way we have a complete spread of how all the card combos work out for you.

    I already took everything apart.  It was just a temporary test setup to see what would happen with all that GPU power until I get more parts that I need to fit everything inside the case.  I had to use PCIEx16 extension cables because the thickness of the 3090 heatsinks won't allow them to fit inside the case with the 2080 Supers.  When I can get water blocks for them, the 3090s will only take up one PCIE slot width and I can put in at least one 2080 Super or both 2080 supers and a 3090.  Also two 3090's and two 2080 Supers are near the limit of a 15 amp circuit breaker in the U.S.  If the cards hit full power that's 450-500 watts per 3090 and 330 watts per 2080 super (1560 to 1660 watts) plus the power draw of the rest of the system.

    Here is the test setup and why I didn't leave it up for long.

    I'm wondering if the issue of diminishing returns is not enough bandwidth on the PCI Express bus for 4 GPUs on an x570 chipset board.
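
To put the circuit breaker remark in numbers, here is a rough Python sketch. The per-card wattages are the ones given in the post; the 120 V nominal voltage and the 80% continuous-load rule of thumb are assumptions added for illustration, and the rest of the system's draw is left out because the post gives no figure for it.

```python
CIRCUIT_VOLTS = 120       # assumed nominal U.S. household voltage
BREAKER_AMPS = 15         # breaker size mentioned in the post
CONTINUOUS_DERATE = 0.8   # assumed rule of thumb for continuous loads


def gpu_draw(watts_per_3090, watts_per_2080s=330, n_3090=2, n_2080s=2):
    """Total GPU power draw in watts for the four-card test setup."""
    return n_3090 * watts_per_3090 + n_2080s * watts_per_2080s


circuit_watts = CIRCUIT_VOLTS * BREAKER_AMPS
continuous_watts = circuit_watts * CONTINUOUS_DERATE
low, high = gpu_draw(450), gpu_draw(500)

print(f"GPUs alone: {low} to {high} W")
print(f"Circuit: {circuit_watts} W peak, about {continuous_watts:.0f} W continuous")
```

Under those assumptions the GPUs alone already exceed the continuous figure before the CPU, drives, and power supply losses are counted.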

  • outrider42 Posts: 2,907

    That would be my guess, there just isn't enough bandwidth to share with 4 powerful cards. The big Nvlink switch (not to be confused with our consumer Nvlink) that powers Nvidia's DGX boxes works by bypassing most of it. The switch has its own processor to control the traffic from the 16 GPUs.

    Very cool experiment.

  • That is a terrifying setup.

  • RayDAnt Posts: 877
    edited November 2020
    chrislb said:
    RayDAnt said:

    @chrislb mind doing another benchmark run with both 2080 Super's but just a single 3090? That way we have a complete spread of how all the card combos work out for you.

    I already took everything apart.  It was just a temporary test setup to see what would happen with all that GPU power until I get more parts that I need to fit everything inside the case.  I had to use PCIEx16 extension cables because the thickness of the 3090 heatsinks won't allow them to fit inside the case with the 2080 Supers.  When I can get water blocks for them, the 3090s will only take up one PCIE slot width and I can put in at least one 2080 Super or both 2080 supers and a 3090.  Also two 3090's and two 2080 Supers are near the limit of a 15 amp circuit breaker in the U.S.  If the cards hit full power that's 450-500 watts per 3090 and 330 watts per 2080 super (1560 to 1660 watts) plus the power draw of the rest of the system.

    Here is the test setup and why I didn't leave it up for long.

    I'm wondering if the issue of diminishing returns is not enough bandwidth on the PCI Express bus for 4 GPUs on an x570 chipset board.

    In non-memory-pooled setups, once Iray is finished with the initial loading of assets onto each participating GPU, the only traffic that goes on between each of those GPUs and the rest of the system is occasional low-bandwidth command/control messages (going from Iray's master rendering thread, located on the CPU, to the native Cuda Iray kernel running on each GPU die) and periodic pixel value updates (going from each card to the final rendered image's master framebuffer, also located on CPU.)  Meaning that you are most likely looking at - at most - dozens of megabits of PCI-E traffic (ie. basically nothing) going on once the render truly starts. So it is extremely unlikely that lack of PCI-E bandwidth - even on such an overloaded setup such as this - is the weakest link.

    Imo it's much more likely that the 2080 Supers don't seem to scale too well in the company of 3090s because of the same Iray scheduler behavior previously brought up here that causes CPUs to actually detract from overall rendering performance. Notice that in the last test you ran, each 2080 Super was credited with completing fewer than 300 iterations, whereas each 3090 was credited with more than 600. That's a performance gap of more than 2x between the two cards, which is right where Iray's scheduler is designed to start taking things into its own hands.

  • outrider42 Posts: 2,907

    It is possible, but that doc is from 2017. Iray RTX changes things around a bit. While CUDA almost always scales the same between different GPUs, that is, if GPU A is twice as fast as GPU B in one scene it is almost always twice as fast in every scene, this doesn't hold true for Iray RTX. We have seen that the RT cores can wildly alter the performance depending on the geometry present. That would create a situation where a GPU might be twice as fast in one scene, but in another scene it becomes 3 times faster. Or maybe in another scene it is only 1.5 times faster. For users with performance gaps between GPUs, this could lead to erratic performance scaling, as one card might contribute in some scenes but not in others. Additionally, we have plenty of GPU tests where two GPUs are far more mismatched than a 3090 versus a 2080 Super. Like the 2080ti and 980ti test.

    Looking at the data here, we can examine how the performance scaled. We have a test with 3 GPUs and then the 4 GPUs. I think these two tests show something else is at play.

    So the 3 GPU test, with two 3090s and one 2080 Super.

     

    Iteration Rate: (1800/53.981) = 33.3450 iterations per second

    We established that the 3090 in this rig hits 14.1 iterations per second. So, just do some math. 33.345 - 14.1 - 14.1 = 5.145

    If we were to assume the 3090s were hitting that number, that leaves the 2080 Super with 5.145 iterations per second. This 5.145 is indeed lower than what a 2080 Super should be doing, by over a whole iteration.

    The 4 GPU test.

    Iteration Rate: (1800/47.737) = 37.706 iterations per second

    Again, if we do the math, 37.706 - 14.1 - 14.1 = 9.506.

    Hold up! If you divide that by two, you now only get 4.753 iterations per second. The iteration rate dropped more.

    If what you said about Iray scheduling was correct, then I would assume we should see similar performance drops in the 3 GPU test as well. The 2080 Super, at just over 6 iterations per second by itself, is well below the 50% threshold, and thus should have been hit harder in its pairing.

    I believe if we tested one 3090 with one 2080 Super, we would see both cards run near their solo speeds. I believe what we are seeing here is scaling decreasing with 3+ GPUs in play. I don't think I have seen a situation where 3 or 4 GPUs ran at the same speeds they do when solo or paired in twos. Just look at the chart. We have a test with four 2080tis together. In that test, they ran at a total of 26.699 iterations per second. If you split this by 4, that would give a performance of 6.67 iterations per second for each card. However, a single 2080ti gets 7.4 iterations per second with Iray RTX. Even the Pre-RTX Iray was slightly faster. So we have clear evidence that scaling is indeed an issue, as with 100% scaling they should be getting 29.6 iterations per second with four 2080tis, and they are not. In fact, notice the difference is about 3 iterations per second off the theoretical peak, which is right about the same amount chrislb's setup lost compared to its theoretical peak of 40.8. That does not seem like a coincidence to me. Given all this information, I have to conclude we are not dealing with Iray scheduling. We are hitting some kind of performance limit from using 4 GPUs in one system.
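
As a cross-check on the subtraction approach above, the per-device lines in chrislb's logs can also be turned into per-card rates directly (iterations credited to a device divided by that device's own render time). The Python sketch below does that for the three-GPU and four-GPU runs. Note this is a different method from the one in the post, and it assumes the iteration counts Iray credits to each device are a fair measure of that device's work.

```python
# (model, iterations credited, device render seconds), copied from the logs.
runs = {
    "2x 3090 + 1x 2080 Super": [
        ("RTX 2080 SUPER", 318, 53.674),
        ("RTX 3090", 752, 53.981),
        ("RTX 3090", 730, 53.197),
    ],
    "2x 3090 + 2x 2080 Super": [
        ("RTX 2080 SUPER", 269, 47.407),
        ("RTX 2080 SUPER", 270, 47.577),
        ("RTX 3090", 643, 47.737),
        ("RTX 3090", 618, 47.181),
    ],
}


def mean_rate_by_model(devices):
    """Average iterations per second for each GPU model in one run."""
    per_model = {}
    for model, iters, secs in devices:
        per_model.setdefault(model, []).append(iters / secs)
    return {model: sum(r) / len(r) for model, r in per_model.items()}


rates = {label: mean_rate_by_model(devs) for label, devs in runs.items()}
for label, by_model in rates.items():
    summary = ", ".join(f"{m}: {r:.2f} it/s" for m, r in by_model.items())
    print(f"{label} -> {summary}")
```

Sliced this way, the 3090s themselves come in a little under 14.1 it/s in both runs, so the subtraction method (which holds the 3090s at 14.1) may attribute more of the loss to the 2080 Supers than the logs suggest.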

  • WestKraven Posts: 135
    edited November 2020

    System Configuration (outdated PC with "decentish" GPUs)
    System/Motherboard: Gigabyte Z87X-D3H Intel
    CPU: Intel i7-4770 3.5 GHz
    GPU: EVGA RTX 2080 Ti @ SPEED/stock , GPU2 EVGA RTX 2070 @ SPEED/stock
    System Memory: Corsair Vengeance 32 GB DDR3
    OS Drive: Samsung 870 QVO 4TB SSD
    Asset Drive:  Crucial MX500 2TB SSD
    Operating System: Windows 10 Pro v1909 Build 18363.1139 
    Nvidia Drivers Version: 451.48
    Daz Studio Version: 4.14
    Optix Prime Acceleration: Not Applicable

    Benchmark Results

    2020-11-11 20:16:54.165 Finished Rendering

    2020-11-11 20:16:54.207 Total Rendering Time: 2 minutes 26.49 seconds

     

    2020-11-11 20:16:59.827 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-11 20:16:59.827 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 2080 Ti): 1094 iterations, 3.321s init, 139.617s render

    2020-11-11 20:16:59.827 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 2070): 706 iterations, 3.516s init, 139.654s render

    Rendering Performance: 1800/139.654 = 12.89 iterations/second
    Loading Time: 146.49 - 139.654 = 6.836 seconds

  • I decided to give the last combination a try this afternoon.

     

    System Configuration:

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.12.2.60 Public Beta

     

    Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card and two RTX 2080 Super cards, no CPU rendering

     

    2020-11-12 17:05:55.553 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Maximum number of samples reached.

    2020-11-12 17:05:56.123 Finished Rendering

    2020-11-12 17:05:56.169 Total Rendering Time: 1 minutes 13.2 seconds

     

    2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 2080 SUPER): 410 iterations, 2.203s init, 67.856s render

    2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 976 iterations, 1.939s init, 67.594s render

    2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 414 iterations, 1.865s init, 67.755s render

     

    Iteration Rate: (1800/67.856) = 26.5267 iterations per second

    Loading Time: ((73.2 seconds) - 67.856) = 5.344 seconds

  • RayDAnt said:

     

    In non-memory-pooled setups, once Iray is finished with the initial loading of assets onto each participating GPU, the only traffic that goes on between each of those GPUs and the rest of the system is occasional low-bandwidth command/control messages (going from Iray's master rendering thread, located on the CPU, to the native Cuda Iray kernel running on each GPU die) and periodic pixel value updates (going from each card to the final rendered image's master framebuffer, also located on CPU.)  Meaning that you are most likely looking at - at most - dozens of megabits of PCI-E traffic (ie. basically nothing) going on once the render truly starts. So it is extremely unlikely that lack of PCI-E bandwidth - even on such an overloaded setup such as this - is the weakest link.

    Imo it's much more likely that the 2080 Supers don't seem to scale too well in the company of 3090s because of the same Iray scheduler behavior previously brought up here that causes CPUs to actually detract from overall rendering performance. Notice that in the last test you ran, each 2080 Super was credited with completing fewer than 300 iterations, whereas each 3090 was credited with more than 600. That's a performance gap of more than 2x between the two cards, which is right where Iray's scheduler is designed to start taking things into its own hands.

     

    This afternoon, I did some other testing.  I tried the 3090 along with the 2080 Supers on high quality PCIEx16 extension cables (from Fractal Design's vertical GPU mounts for its computer cases) and also on cheaper PCIEx16 extensions that use USB cables and offer PCIE 3.0 X1 to PCIE 3.0 X4 bandwidth.  The difference in render times was less than 2%.  That's probably within the margin of error.

    Device render times were 67.856 seconds with the higher quality cables and 68.870 seconds with the cheap extensions.  The 2080 Supers did 414 iterations and 412 iterations with the low bandwidth cables vs 410 iterations and 414 iterations with the high bandwidth cables.
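
The "less than 2%" figure can be checked with a short Python sketch. The two times are the ones quoted above; `percent_change` is just an illustrative helper name.

```python
def percent_change(baseline, other):
    """Relative change from baseline to other, in percent."""
    return (other - baseline) / baseline * 100.0


high_bw_seconds = 67.856   # high quality x16 extension cables
low_bw_seconds = 68.870    # cheap USB-cable style extensions

delta = percent_change(high_bw_seconds, low_bw_seconds)
print(f"Low bandwidth cables were {delta:.2f}% slower")
```

That works out to roughly 1.5%, which fits the point that PCIe link width makes very little difference once the render is under way.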

  • chrislb said:

    I decided to give the last combination a try this afternoon.

     


    That's insane!!!

  • outrider42 Posts: 2,907
    edited November 2020

    Whoa, my 1080ti's speed is back! With a vengeance! I downloaded the beta 4.14 and installed the latest Nvidia game drivers. If you have followed me, I have been seeing a reduction in render speed since somewhere around driver 441.66+. Daz 4.14 has a minor Iray update, too, so it is very hard to say with certainty which was the issue without testing previous versions of Daz with this new driver. I still have the general 4.12 release, so I will try that version. That should prove whether this was Iray or Nvidia drivers. But not only did I get my speed back, I rendered the bench faster than I ever did before. Before driver 441, before OptiX 6.0. And the difference is beyond any margin of error.

    Daz 4.14.0.8

    Windows 10 2004

    CPU: i5 4690K

    GPU #1:  EVGA 1080ti SC2

    GPU #2: MSI 1080ti Gaming  <--this is my display, yes, I use GPU 2 for display.

    RAM 32GB HyperX

    OS Drive Samsung 860 EVO 1TB

    Asset Drive: Samsung 860 EVO 1TB and WD 4TB Black HDD

    2020-11-13 00:56:54.938 Total Rendering Time: 3 minutes 36.73 seconds

    2020-11-13 00:57:01.544 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-13 00:57:01.544 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1080 Ti): 906 iterations, 2.587s init, 211.195s render

    2020-11-13 00:57:01.544 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce GTX 1080 Ti): 894 iterations, 3.025s init, 211.058s render

    For reference, ever since driver 441.66 this same benchmark has been coming in right at 4 minutes 30 seconds with every test and every version of Daz that supports that driver. Before this driver, my times were right at 4 minutes, and sometimes just under 4 minutes. But here I blow that time away. My device render times also show this big change. My device times were around the 230 second range, or a few seconds less. But here I scored 211 seconds, shaving a pretty solid 15-20 seconds off those times. Device render times with 441.66+ were in the 267 second range. So just yesterday I was rendering this benchmark nearly a full minute slower than today. By any standard that is a huge swing. I am stoked, LOL.

    One more thing, back to the monster rig that chrislb has tested: bandwidth may not be quite the word I am looking for. I think the real issue is simply down to synchronizing 4 big GPUs together. That and the physical distance they are from each other. There is probably an element of latency in their communication that bandwidth alone cannot overcome. That latency is the performance bottleneck, and that is something that will get worse as more GPUs get added to a given system. You could perhaps place some blame on scheduling, but not Iray's scheduling (at least I don't think so). Perhaps there is a way to improve multiGPU performance with better scheduling in the hardware, but I don't think it is related to the performance difference between the 3090 and the 2080 Super. We have seen combinations of GPUs with far greater performance gaps than that work fine with each other in pairs. We do not have many benchmarks with 4 GPUs, this is certainly rare, so getting this kind of data is not easy. But in the systems that do have 4, we can see from the numbers that the 4 GPUs do not scale as well. I think the lesson is that you will get diminishing returns trying to build super rigs. The DGX-2 Nvlink Switch connecting 16 GPUs is not just a cable, it is powered by its own processor guiding the system so that all the GPUs can talk to each other as fast as possible. Having tech like that is probably the only way to overcome this.

  • outrider42 Posts: 2,907

    So I fired up my trusty 4.12.0.86 and the result is interesting indeed.

    2020-11-13 01:51:22.134 Total Rendering Time: 3 minutes 58.76 seconds

    2020-11-13 01:51:26.440 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-13 01:51:26.440 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1080 Ti): 903 iterations, 6.132s init, 229.027s render

    2020-11-13 01:51:26.445 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce GTX 1080 Ti): 897 iterations, 6.437s init, 228.384s render

    This time here is basically what I used to get before driver 441.66. I ran this time super consistently. So it looks like new drivers have restored my speed. BUT it also looks like the updated Iray is ALSO a little bit faster. Now it is possible this could be just a change in what Iray considers to be an "iteration". But I do not believe so. The resulting images are identical. So this is quite an interesting result. If you lost some performance, then try updating your drivers. And maybe try the latest Daz Iray. I would love to know if others can get results. FYI, if anybody is concerned about updating, the beta can be installed side by side with the general release, so do not worry about it overwriting your current version. Currently the beta is exactly the same as the general release, too, so do not worry about it being a "beta". Each time the general release is updated, the beta is updated to that same version as well.

    This makes me wonder...might chrislb's monster rig score even faster with Daz 4.14? Is it possible???
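
To separate the driver effect from the Iray effect described above, the three device render times can be compared directly. In this Python sketch the 211.195 s and 229.027 s values are the slowest device times from the two logs; the 267 s value is only the approximate "267 second range" mentioned for the older drivers, so the first ratio is a rough estimate at best.

```python
ITERATIONS = 1800

# Slowest device render time per configuration, in seconds.
old_driver_daz_412 = 267.0      # approximate, from "the 267 second range"
new_driver_daz_412 = 229.027    # Daz 4.12.0.86 with the new driver
new_driver_daz_414 = 211.195    # Daz 4.14.0.8 with the new driver

driver_speedup = old_driver_daz_412 / new_driver_daz_412
iray_speedup = new_driver_daz_412 / new_driver_daz_414

for label, secs in [
    ("4.12, old driver (approx.)", old_driver_daz_412),
    ("4.12, new driver", new_driver_daz_412),
    ("4.14, new driver", new_driver_daz_414),
]:
    print(f"{label}: {ITERATIONS / secs:.2f} it/s")

print(f"driver change: about {driver_speedup:.2f}x, "
      f"Daz/Iray update: about {iray_speedup:.2f}x")
```

On these numbers the driver update accounts for the larger share of the gain, with the Daz 4.14 Iray update adding roughly another 8% on top.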

  • nonesuch00 Posts: 15,308
    edited November 2020


    Well, my mouse rig sped up by 3 minutes:

    OLD:

    Specifications

    System/Motherboard: Gigabyte B450M DS3H WiFi
    CPU: AMD Ryzen 7 2700 32GB
    GPU: PNY GeForce GTX 1650 Super 4GB
    System Memory: 32 GB (2x16GB 2666 MHz) Patriot 
    OS Drive: Crucial 2TB Sata III SSD
    Operating System: Windows 10 build 2004 64bit
    Nvidia Drivers Version: Gaming 456.71
    Daz Studio Version: DAZ Studio Pro Public Beta 4.12.2.51
    Optix Prime Acceleration: N/A

    +++++ Benchmark +++++

    2020-10-14 18:27:08.309 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-10-14 18:27:08.309 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1650 SUPER): 1800 iterations, 0.286s init, 908.227s render

    +++++ Benchmark +++++

    So the render took about 15 minutes & 8.227 seconds.

    NEW:

    Specifications

    System/Motherboard: Gigabyte B450M DS3H WiFi
    CPU: AMD Ryzen 7 2700 32GB
    GPU: PNY GeForce GTX 1650 Super 4GB
    System Memory: 32 GB (2x16GB 2666 MHz) Patriot 
    OS Drive: Crucial 2TB Sata III SSD
    Operating System: Windows 10 build 20H2 64bit
    Nvidia Drivers Version: Gaming 457.30
    Daz Studio Version: DAZ Studio Pro Public Beta 4.14.0.8
    Optix Prime Acceleration: N/A

    +++++ Benchmark +++++

    2020-11-13 09:31:31.505 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-13 09:31:31.505 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1650 SUPER): 1485 iterations, 3.399s init, 719.417s render

    2020-11-13 09:31:31.505 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CPU: 315 iterations, 2.117s init, 721.770s render

    +++++ Benchmark +++++

    So the render took about 12 minutes & 1.770 seconds (the CPU rendering ran about 2.4 seconds longer than the GPU rendering).

    Render time dropped by about 20% with DAZ Studio 4.14.0.8 compared to DAZ Studio 4.12.2.51.
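
    The roughly 20% figure can be checked from the two render times in the logs above. This is a minimal sketch, not part of the original post: the function names are made up for illustration, and it assumes both runs stopped at the same 1800 total iterations, as the logs show. Note that the newer run also used the CPU (315 of the 1800 iterations), so the gain is not purely from the software update.

```python
# Render times (seconds) copied from the two GTX 1650 SUPER logs above.
OLD_RENDER_S = 908.227   # Daz Studio 4.12.2.51, GPU only
NEW_RENDER_S = 721.770   # Daz Studio 4.14.0.8, slowest device (CPU) in a GPU+CPU run
TOTAL_ITERATIONS = 1800  # both runs converged at 1800 iterations


def time_reduction(old_s, new_s):
    """Fraction of render time saved going from old_s to new_s."""
    return (old_s - new_s) / old_s


def throughput_gain(old_s, new_s):
    """Fractional increase in iterations per second for a fixed iteration count."""
    return old_s / new_s - 1.0


saved = time_reduction(OLD_RENDER_S, NEW_RENDER_S)
faster = throughput_gain(OLD_RENDER_S, NEW_RENDER_S)

print(f"old rate: {TOTAL_ITERATIONS / OLD_RENDER_S:.3f} iterations/s")
print(f"new rate: {TOTAL_ITERATIONS / NEW_RENDER_S:.3f} iterations/s")
print(f"render time saved: {saved:.1%}")
print(f"throughput gain:   {faster:.1%}")
```

    The two percentages differ because one is measured against the old time and the other against the new time: about 20% less time is the same thing as about 26% more iterations per second.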

  • @chrislb My electrician finally made it around to my house, and I've got two additional 20A circuits, one for my existing rig, and one for the new rig. Because of my poor experience in the past with parts compatibility, I think I'm going to go with something as close to your exact 2 x 3090 setup (assuming I can get the parts) as I can get.

    But with the wisdom that you've gained from actually doing it, is there anything you would have done differently, had you known then what you know now? Any general system building pointers?

    Anything you can suggest would be appreciated.

  • I have a GTX 1050 Ti (which is terrible, I'm just waiting for the 30 series to be available) and I cannot even get it to work. If I choose GPU only for rendering, not even the Iray preview loads. What can I do about that?
     

    A different question: Which CPU and motherboard were used alongside the graphics card for the benchmark tests? Is that important, or do the CPU and motherboard not matter at all?

  • outrider42 Posts: 2,907

    I have a GTX 1050 Ti (which is terrible, I'm just waiting for the 30 series to be available) and I cannot even get it to work. If I choose GPU only for rendering, not even the Iray preview loads. What can I do about that?
     

    A different question: Which CPU and motherboard were used alongside the graphics card for the benchmark tests? Is that important, or do the CPU and motherboard not matter at all?

    Are your drivers up to date? The new Iray needs new drivers; get them from Nvidia, not Windows. The 1050 Ti is supported, so it should work. Besides being slow by modern standards, its biggest issue is VRAM. I am assuming it is 4 GB, and that is hard to work with. It is possible you may not be able to run the Iray viewport simply because of the VRAM.

    RayDAnt has tested his scene on a MS Surface that has a 1050 Ti. The bench scene is designed to use only a small amount of VRAM so that lower capacity cards can still run it, so it should work with that spec. But I don't know if he has used the Iray viewport on it. I would say it is possible, but honestly not advisable, to use the Iray viewport with a 4 GB GPU today.

    With Iray you really don't need to worry about CPU spec much at all. The only concern is that your CPU can handle the Daz application itself, which is independent from Iray. So as long as Daz is running ok for you when you build your scenes, there is no need to upgrade CPU/motherboard for Iray rendering. Just get the best GPU you can. The only other spec to be concerned with is possibly system RAM; it needs to be enough to handle your creations. You probably need around twice as much RAM as you do VRAM if you are going to utilize that VRAM, possibly a bit more. It just depends on what you want to do.

  • I ran a single 3090 with the new release version of Daz and saw an improvement over the beta version.

     

    System Configuration

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.14.0.8

     

    Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card only no CPU rendering

     

    2020-11-14 16:15:40.509 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Maximum number of samples reached.

    2020-11-14 16:15:41.079 Finished Rendering

    2020-11-14 16:15:41.117 Total Rendering Time: 1 minutes 37.16 seconds

    2020-11-14 16:15:48.002 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-14 16:15:48.003 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 1800 iterations, 1.372s init, 92.952s render

     

    Iteration Rate: (1800/92.952) = 19.365 iterations per second

    Loading Time: ((97.16 seconds) - 92.952) = 4.208 seconds
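
    The two derived figures above follow the convention used in this thread: iteration rate is total iterations divided by the longest device render time, and loading time is the total rendering time minus that same render time. A rough sketch of pulling both out of a pasted log is below. The regular expressions are only fitted to the log lines quoted in this thread (they assume renders under an hour), and `summarize` is a made-up helper name, not anything Daz Studio or Iray provides.

```python
import re

# Patterns fitted to the two kinds of log lines quoted in this thread.
TOTAL_RE = re.compile(r"Total Rendering Time:\s*(?:(\d+) minutes? )?([\d.]+) seconds")
DEVICE_RE = re.compile(
    r"(CUDA device \d+ \([^)]*\)|CPU): (\d+) iterations, ([\d.]+)s init, ([\d.]+)s render"
)

# Log lines copied from the single RTX 3090 run above.
SAMPLE_LOG = """\
2020-11-14 16:15:41.117 Total Rendering Time: 1 minutes 37.16 seconds
2020-11-14 16:15:48.003 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 1800 iterations, 1.372s init, 92.952s render
"""


def summarize(log_text):
    """Return (iterations per second, loading time in seconds) for one render."""
    total = TOTAL_RE.search(log_text)
    devices = DEVICE_RE.findall(log_text)
    if total is None or not devices:
        raise ValueError("log is missing the total time or the device statistics")
    total_s = int(total.group(1) or 0) * 60 + float(total.group(2))
    iterations = sum(int(iters) for _name, iters, _init, _render in devices)
    # The render is only done when the slowest device finishes.
    slowest_render_s = max(float(render) for _name, _iters, _init, render in devices)
    return iterations / slowest_render_s, total_s - slowest_render_s


rate, loading = summarize(SAMPLE_LOG)
print(f"Iteration Rate: {rate:.3f} iterations per second")
print(f"Loading Time: {loading:.3f} seconds")
```

    For multi-GPU logs the same function sums the iterations across all device lines, which matches how the 1800 total is used in the posts here.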

  • @chrislb My electrician finally made it around to my house, and I've got two additional 20A circuits, one for my existing rig, and one for the new rig. Because of my poor experience in the past with parts compatibility, I think I'm going to go with something as close to your exact 2 x 3090 setup (assuming I can get the parts) as I can get.

    But with the wisdom that you've gained from actually doing it, is there anything you would have done differently, had you known then what you know now? Any general system building pointers?

    Anything you can suggest would be appreciated.

     

    I'll send you a PM with more info.  It's probably too far off topic for here.

  • chrislb said:

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards only no CPU rendering

     

    2020-11-10 20:45:43.570 Finished Rendering

    2020-11-10 20:45:43.619 Total Rendering Time: 1 minutes 8.35 seconds

     

    2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 896 iterations, 1.863s init, 63.361s render

    2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 904 iterations, 1.812s init, 63.570s render

     

    Iteration Rate: (1800/63.570) = 28.315 iterations per second

    Loading Time: ((68.35 seconds) - 63.570) = 4.78 seconds

     

     

    The new version is faster than the beta version.

     

    System Configuration

     

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.14.0.8

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards only no CPU rendering

     

    2020-11-14 16:56:18.336 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Maximum number of samples reached.

    2020-11-14 16:56:18.985 Total Rendering Time: 56.81 seconds

    2020-11-14 16:56:23.050 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-14 16:56:23.050 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 920 iterations, 1.621s init, 52.099s render

    2020-11-14 16:56:23.051 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 880 iterations, 1.307s init, 52.028s render

     

    Iteration Rate: (1800/52.099) = 34.549 iterations per second

    Loading Time: ((56.81 seconds) - 52.099) = 4.711 seconds

  • outrider42 Posts: 2,907

    Whoa, the hype is real! That is a big jump...going from 28 to 35 iterations per second with two 3090s. That is FASTER than what you did with 3 GPUs when you added the 2080 Super to the mix, and that is only a bit over 2 iterations per second slower than when you had four GPUs together. You are getting performance right now that you needed a fairly large hardware upgrade to do just a few days ago, and all from just a software update. Not to mention the crazy amount of power draw that 4 GPUs took.

    This is really incredible and unexpected. We have not seen this kind of boost in performance from software alone aside from the addition of RTX support. But unlike that change this new speed is coming to everybody, not just RTX owners. I am truly curious as to what changes were made to Iray in this update. Perhaps this is the result of better optimization with OptiX, as it is still rather new for Iray. Or perhaps OptiX itself has improved, and this in turn trickled into Iray. 

    Regardless, I think the evidence is clear: it is time for people to update to 4.14. This performance gain is not trivial and I would expect it to scale up across the board. As we can see from chrislb, this can be a performance gain on par with upgrading the hardware itself, but unlike buying new hardware, this is a free upgrade!

    I suppose one other thing to look at might be VRAM use, if 4.14 is using a different amount of VRAM compared to 4.12, that might be tied to the increase in performance.

  • I tried two more combinations

     

    System Configuration:

    System/Motherboard: MSI MEG X570 ACE

    CPU: AMD R9 3950X @ Stock with PBO +200

    GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) and EVGA RTX 2080 Super @ Stock speed and stock power limits

    System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

    OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

    Asset Drive: XPG SX 8100 NVMe SSD

    Operating System: Windows 10 Pro version 2004 Build 19041.450

    Nvidia Drivers Version: Version 457.30

    Daz Studio Version: 4.14.0.8

     

    Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards and two 2080 Super cards only no CPU rendering

     

    2020-11-14 17:28:14.542 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Maximum number of samples reached.

    2020-11-14 17:28:15.161 Total Rendering Time: 42.8 seconds

    2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 219 iterations, 2.958s init, 35.883s render

    2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 3 (GeForce RTX 2080 SUPER): 219 iterations, 2.797s init, 36.047s render

    2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 677 iterations, 1.627s init, 37.237s render

    2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 3090): 685 iterations, 1.748s init, 36.907s render

     

    Iteration Rate: (1800/37.237) = 48.3390 iterations per second

    Loading Time: ((42.8 seconds) - 37.237) = 5.563 seconds

     

     

    Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card and two 2080 Super cards only no CPU rendering

     

    2020-11-14 17:41:34.711 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Maximum number of samples reached.

    2020-11-14 17:41:35.338 Total Rendering Time: 1 minutes 2.35 seconds

    2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 2080 SUPER): 356 iterations, 3.158s init, 56.002s render

    2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 353 iterations, 2.771s init, 55.775s render

    2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 3090): 1091 iterations, 2.051s init, 56.615s render

     

    Iteration Rate: (1800/56.615) = 31.793 iterations per second

    Loading Time: ((62.35 seconds) - 56.615) = 5.735 seconds
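
    The device statistics also show how the work was split between the cards. The sketch below takes the iteration counts and render times from the one 3090 plus two 2080 Super log above and works out each card's share of the 1800 iterations and its own iterations per second. The helper name `device_breakdown` is made up for illustration.

```python
# (device label, iterations, render seconds) copied from the log above.
DEVICES = [
    ("RTX 2080 SUPER (device 1)", 356, 56.002),
    ("RTX 2080 SUPER (device 2)", 353, 55.775),
    ("RTX 3090 (device 0)", 1091, 56.615),
]


def device_breakdown(devices):
    """Return (label, share of all iterations, iterations per second) per device."""
    total_iterations = sum(iters for _label, iters, _render_s in devices)
    return [
        (label, iters / total_iterations, iters / render_s)
        for label, iters, render_s in devices
    ]


breakdown = device_breakdown(DEVICES)
for label, share, rate in breakdown:
    print(f"{label}: {share:.1%} of iterations, {rate:.3f} iterations/s")
```

    In this run the single 3090 did about 61% of the iterations, roughly three times the rate of each 2080 Super.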

  • outrider42 Posts: 2,907

    Holy crap, it went from 37 to over 48 iterations per second. That is a monstrous increase. The amount of performance you gained is more than a lot of people's total performance. Just think about that. To put that in perspective, a 3080 ran the bench at 12 iterations per second. This performance gain is like adding a 5th GPU...a 3080 class GPU...to your rig. This is seriously impressive, and the 4 GPU setup was already impressive!

    Since you have access, would you mind running a single 2080 Super so we can compare it to its previous time? This would also be helpful for looking at scaling. Since we have a single 3090 on 4.14, we would be able to calculate a theoretical peak, and compare that number to what you actually scored to see how many iterations may be lost. For all we know, the scaling might just have improved as well.
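
    As a rough illustration of the scaling comparison described above, here is a sketch using the only pair of 4.14 numbers already in this thread: the single 3090 run and the two 3090 run. It treats the theoretical peak as the sum of the single-card rates, which is an assumption about how to define the peak, and the function names are made up. A single 2080 Super result on 4.14 would be needed to do the same for the mixed setups.

```python
def iteration_rate(iterations, render_s):
    """Iterations per second for one render."""
    return iterations / render_s


def scaling_efficiency(measured_rate, single_device_rates):
    """Measured multi-GPU rate as a fraction of the summed single-GPU rates."""
    theoretical_peak = sum(single_device_rates)
    return measured_rate / theoretical_peak


# Render times copied from chrislb's 4.14.0.8 posts above.
single_3090 = iteration_rate(1800, 92.952)   # one RTX 3090
dual_3090 = iteration_rate(1800, 52.099)     # two RTX 3090s, slowest device time

peak = 2 * single_3090
efficiency = scaling_efficiency(dual_3090, [single_3090, single_3090])
lost = peak - dual_3090

print(f"theoretical peak: {peak:.3f} iterations/s")
print(f"measured:         {dual_3090:.3f} iterations/s")
print(f"scaling:          {efficiency:.1%} ({lost:.2f} iterations/s lost)")
```

    With these numbers the second 3090 delivers about 89% of the ideal doubling.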

    We are going to have to draw a hard line on the benchmark chart now. I think we should carefully note all benchmarks that come after 4.14, because the gap is so big that it may confuse people who compare a 4.14 bench to a 4.12 bench. They are simply not comparable anymore. Obviously benches from one version to the next are generally not comparable, but in practice they really have been consistent over time, with only minor differences aside from when Iray switched to OptiX and added RTX support.
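
    One way to keep the chart honest is to store the Daz Studio version with every result and refuse to compare across release lines. This is only a sketch of the idea: `BenchResult`, `comparable`, and `speedup` are made-up names, and treating the first two version numbers (for example 4.12 vs 4.14) as the dividing line is an assumption based on the results in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    gpu: str
    daz_version: str          # e.g. "4.14.0.8"
    iterations_per_s: float

    @property
    def release_line(self):
        """First two version components, e.g. (4, 14)."""
        major, minor = self.daz_version.split(".")[:2]
        return int(major), int(minor)


def comparable(a, b):
    """Only results from the same release line should be compared directly."""
    return a.release_line == b.release_line


def speedup(a, b):
    """How many times faster b is than a, if the two are comparable."""
    if not comparable(a, b):
        raise ValueError(
            f"{a.daz_version} and {b.daz_version} results are not comparable"
        )
    return b.iterations_per_s / a.iterations_per_s


# Results taken from posts in this thread.
old_1650 = BenchResult("GTX 1650 SUPER", "4.12.2.51", 1800 / 908.227)
single_3090 = BenchResult("RTX 3090", "4.14.0.8", 19.365)
dual_3090 = BenchResult("2x RTX 3090", "4.14.0.8", 34.549)

print(comparable(old_1650, single_3090))            # different release lines
print(f"{speedup(single_3090, dual_3090):.2f}x")    # both on 4.14
```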

    I am surprised that Daz has not advertised this speed upgrade. Advertising a speed increase in the update would get a lot of people on board the choo choo train.
