RTX Benchmark thread...show me the power

Takeo.Kensei Posts: 1,303
edited August 2019 in The Commons

Hi everybody. Since the capabilities of RTX cards still seem nebulous to many, here is a benchmark that should show what can be expected.

The scene design is simple: if you want to see the power of these cards, put a lot of geometry in front of the camera. That's where the RT cores shine. It doesn't even need to be anything heavy; a dense strand-based hair does the job.

Preliminary Results:

A prototype of this scene was benchmarked with DS 4.12 at around 1 minute for an RTX 2060 vs. 4 minutes for a GTX 1070 (my thanks to timon630 and LenioTG).

Scene Compatibility:

The benchmark should run without problems on DS 4.11 and 4.12. I can load the scene in 4.10, but my RTX card is not compatible with that DS version, so I can't be 100% sure; it should work there too, though, for people who wish to compare different DS versions.

The render time shouldn't be too extreme, even for CPUs. The benchmark is configured for 600 iterations to collect enough data for a reliable mean value, but a 100- or 200-iteration run should also give a good rough estimate (see below).
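Because every iteration in this scene is meant to do the same amount of work, a shorter run can be scaled up to estimate the full 600-iteration time. A minimal Python sketch; `projected_full_run_s` is a hypothetical helper, and the linear scaling is an assumption that ignores any start-up effects:

```python
FULL_RUN_ITERATIONS = 600  # what the benchmark scene is configured for

def projected_full_run_s(iterations: int, render_s: float,
                         target: int = FULL_RUN_ITERATIONS) -> float:
    """Estimate the full-run render time from a shorter run.

    Assumes each iteration costs about the same (no pixel converges early),
    so render time grows linearly with the iteration count.
    """
    return render_s * target / iterations

if __name__ == "__main__":
    # Example: a 200-iteration run that took 63.019 s
    print(f"~{projected_full_run_s(200, 63.019):.1f} s for 600 iterations")
```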

Benchmark Procedure:

Prior to loading the scene, please check the following parameters:

1°/ Edit -> Preferences -> uncheck all three "ignore setting" options for the Backdrop and Render settings

2°/ Load the scene, then hit Render

3°/ Wait for the render to complete, then check the end of the log file through Help -> Troubleshooting -> View log file

The result will look like this; the render time is the last number:

CUDA device 0 (GeForce RTX 2060):      600 iterations, 21.108s init, 63.782s render
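If you'd rather not read the log by eye, a small script can pull the numbers out. This is a minimal Python sketch, assuming the per-device lines keep the `CUDA device N (name): X iterations, Ys init, Zs render` shape shown above (CPU lines appear as `CPU: ...`); the regex and the `parse_device_line` name are my own, not something DAZ Studio or Iray provides:

```python
import re
from typing import NamedTuple, Optional

class DeviceStats(NamedTuple):
    device: str       # e.g. "CUDA device 0 (GeForce RTX 2060)" or "CPU"
    iterations: int   # iterations this device contributed
    init_s: float     # initialization time in seconds
    render_s: float   # render time in seconds (the number to report)

# Assumed log shape, based on the device-statistics lines posted in this thread.
_DEVICE_RE = re.compile(
    r"(?P<device>CUDA device \d+ \([^)]*\)|CPU):\s*"
    r"(?P<iters>\d+) iterations,\s*"
    r"(?P<init>[\d.]+)s init,\s*"
    r"(?P<render>[\d.]+)s render"
)

def parse_device_line(line: str) -> Optional[DeviceStats]:
    """Return the device statistics on one log line, or None if there are none."""
    m = _DEVICE_RE.search(line)
    if m is None:
        return None
    return DeviceStats(
        device=m.group("device"),
        iterations=int(m.group("iters")),
        init_s=float(m.group("init")),
        render_s=float(m.group("render")),
    )

if __name__ == "__main__":
    sample = "CUDA device 0 (GeForce RTX 2060):      600 iterations, 21.108s init, 63.782s render"
    print(parse_device_line(sample))
```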

Results Interpretation:

The scene is 1024x1024 pixels. Render Quality was set to a high value so that no pixel converges and the same number of samples is calculated for every iteration.

Each iteration calculates 1024 x 1024 = 1,048,576 samples, which I count as 1 megasample.

So, for this benchmark, my card's average performance is 600 / 63.782 = 9.4 megasamples/s on DS 4.12.

For comparison's sake, I get around 3.2 megasamples/s with DS 4.11.

These numbers are only representative of this specific scene and this specific camera view. I don't know whether they are useful for anything other than benchmarking.
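The megasample arithmetic above, written out as a small Python sketch. `megasamples_per_second` is a hypothetical helper; by default it follows this post's convention that one 1024x1024 iteration counts as 1 megasample (2^20 samples). Counting decimal megasamples (10^6) instead gives a figure about 4.9% higher, which only matters if you compare against numbers computed the other way:

```python
WIDTH, HEIGHT = 1024, 1024               # benchmark scene resolution
SAMPLES_PER_ITERATION = WIDTH * HEIGHT   # one sample per pixel per iteration: 1,048,576

def megasamples_per_second(iterations: int, render_s: float,
                           binary: bool = True) -> float:
    """Throughput of a run of `iterations` that took `render_s` seconds.

    binary=True treats 1 megasample as 2**20 samples (one iteration == 1 Ms),
    the convention used in this thread; binary=False uses 10**6 samples.
    """
    mega = 2**20 if binary else 10**6
    return iterations * SAMPLES_PER_ITERATION / mega / render_s

if __name__ == "__main__":
    print(f"{megasamples_per_second(600, 63.782):.1f} Ms/s")               # thread convention
    print(f"{megasamples_per_second(600, 63.782, binary=False):.2f} Ms/s")  # decimal megasamples
```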

Reporting rules:

I don't see the need to ask for a lot of details. Here is a format that I think keeps results easy to read:

[Hardware]  [DS Version] [Optix Prime ON/OFF] [Driver version] [Render Time] [Performance]

For my own result, that gives:

[RTX 2060] [DS 4.12.0.33] [435.80_gameready_win10_64bit_international] [63.782s] [9.4 Ms/s]  (NB: the OptiX Prime ON/OFF setting is irrelevant for RTX cards on 4.12)

As I said above, the iteration count can be changed and still give a good estimate. In that case, please mention the number of iterations. Example for DS 4.11 with only 200 iterations:

[RTX 2060] [DS 4.11.0.383] [Optix Prime ON] [435.80_gameready_win10_64bit_international] [200 Ite] [63.019s] [3.17 Ms/s]

So I get roughly a 3x speedup with the RT cores.
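To avoid typos when posting, a small helper can build the results line. A sketch only: `format_report` is a hypothetical name, the field order follows the template and examples above, the OptiX field is dropped when passed as `None` (as for RTX cards on 4.12), the iteration count is added only when it isn't the default 600, and Ms/s uses the 1 iteration = 1 megasample convention from this post:

```python
from typing import Optional

DEFAULT_ITERATIONS = 600  # the benchmark scene's configured iteration count

def format_report(hardware: str, ds_version: str, driver: str,
                  iterations: int, render_s: float,
                  optix: Optional[bool] = None) -> str:
    """Build a results line in the thread's bracketed format."""
    fields = [hardware, f"DS {ds_version}"]
    if optix is not None:                  # omit where the toggle doesn't apply
        fields.append(f"Optix Prime {'ON' if optix else 'OFF'}")
    fields.append(driver)
    if iterations != DEFAULT_ITERATIONS:   # only mention non-default runs
        fields.append(f"{iterations} Ite")
    ms_per_s = iterations / render_s       # 1 iteration = 1 Ms at 1024x1024
    fields += [f"{render_s:.3f}s", f"{ms_per_s:.2f} Ms/s"]
    return " ".join(f"[{f}]" for f in fields)

if __name__ == "__main__":
    print(format_report("RTX 2060", "4.11.0.383",
                        "435.80_gameready_win10_64bit_international",
                        iterations=200, render_s=63.019, optix=True))
```

With the 4.11 numbers above, this reproduces the 200-iteration example line exactly.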

Thread Rule:

To keep this a benchmark thread, please avoid technical discussions.

Enough talk. Download the scene below and run the bench!

https://www.sendspace.com/file/36vr07

Comments

  • Takeo.Kensei Posts: 1,303

    * Reserved

  • Jack Tomalin Posts: 9,495

    I get a ton of missing morphs from that scene; you might want to try and simplify it a bit.

    /data/DAZ 3D/Genesis 2/Female/Morphs/DAZ 3D/Evolution Body/ etc etc

  • Takeo.Kensei Posts: 1,303
    edited August 2019

    OK, thanks. I'll check that, but it shouldn't impact the benchmark numbers.

    * Edit: modifying the scene right now. Download link removed.

    * Edit 2: New scene online. It shouldn't give any more errors.

  • Jack Tomalin Posts: 9,495
    edited August 2019

    [RTX 2080Ti x 3] [DS 4.12.0.47] [430.86_studio_win10_64bit_international] [12.218s] [49.11 Ms/s]

    Hope that's right

  • Takeo.Kensei Posts: 1,303
    edited August 2019

    Pfff. You have some power lol

    *Edit: Could you please bench just one card? I think it would be interesting for potential buyers, and we could also see whether scaling with the number of cards is linear. Thanks

  • Jack Tomalin Posts: 9,495

    [RTX 2080Ti] [DS 4.12.0.47] [430.86_studio_win10_64bit_international] [29.410s] [20.401 Ms/s]

  • Jack Tomalin Posts: 9,495

    Ran them all a few times, and the best time I got for each was:

    3 cards: 11.388s

    2 cards: 14.942s

    1 card: 28.534s

    So make of that what you will.

  • dougj Posts: 60

    [RTX 2060] [DS 4.12.0.47] [Optix Prime ON] [431.70] [63.51s]

  • nicstt Posts: 8,850

    That's impressive. I compared the 2060 to my 980ti and the 980ti was almost 4 times slower

    Interesting results, Jack; a second card offers a huge boost, whereas the third card adds relatively little.

    I wonder if a more taxing and longer render would give similar ratios.

  • Richard Haseltine Posts: 57,760
    edited August 2019
    Adding the second card cuts render time nearly in half (14.942s against an ideal 14.267s); adding the third isn't that bad (11.388s against an ideal 9.511s, or you could argue for 10.43s as the expected time, going by the gain from adding the second card), but it's not as big a gain.
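The comparison against ideal scaling above, written out as a small Python sketch. It assumes the single-card time is the baseline and that perfect scaling would divide it by the number of cards; `scaling` is a hypothetical helper, and "efficiency" here simply means speedup divided by card count:

```python
from typing import Dict, List, Tuple

def scaling(times_by_cards: Dict[int, float]) -> List[Tuple[int, float, float, float]]:
    """For each card count: (cards, ideal time, speedup, efficiency) vs. 1 card."""
    base = times_by_cards[1]
    rows = []
    for n, t in sorted(times_by_cards.items()):
        ideal = base / n       # time if scaling were perfectly linear
        speedup = base / t     # measured speedup over one card
        rows.append((n, ideal, speedup, speedup / n))
    return rows

if __name__ == "__main__":
    best_times = {1: 28.534, 2: 14.942, 3: 11.388}  # Jack's best RTX 2080 Ti runs
    for n, ideal, sp, eff in scaling(best_times):
        print(f"{n} card(s): ideal {ideal:.3f}s, speedup {sp:.2f}x, efficiency {eff:.0%}")
```

With Jack's best times this works out to roughly 1.91x (about 95% efficiency) for two cards and 2.51x (about 84%) for three.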

  • Robinson Posts: 276

    1 x 2070 (Armor) 51.26s.

  • Downloading. At the rate limit of 80kb/s, I suspect the render will go faster.

  • I don't trust this stuff anymore. Jack's computer is probably way more powerful than mine, and yet I am getting this ridiculous result:

    2019-08-07 20:23:13.779 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce RTX 2080 Ti): 291 iterations, 0.426s init, 9.731s render
    2019-08-07 20:23:13.780 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 2080 Ti): 309 iterations, 0.386s init, 9.770s render

    Seriously, my motherboard is 5 years old, I'm running dual x16 slots in x8 mode, and I'm air-cooled. I call BS.

  • RayDAnt Posts: 584
    edited August 2019

    Titan RTX, 4.11.0.383 x64, OptiX Prime ON, 431.60 WDDM, 70.724s = 8.484 megasamples per second

    Titan RTX, 4.12.0.047 x64, OptiX Prime N/A, 431.60 WDDM, 26.797s = 22.391 megasamples per second

    So that would be an apparent speed increase of 2.639 times with RTCore support for this scene.

    I will post/update with results for the Titan RTX with drivers in TCC mode once I get around to it (tbh, TCC mode is quite a hassle to deal with in a single-card system).

    For what it's worth, you can achieve the same effect as setting "Render Quality" to a very high value much more cleanly by setting "Rendering Quality Enable" to false. Doing this is effectively the same as setting "Render Quality = infinity", since it tells Iray to stop tracking image convergence as a completion metric entirely, thereby ensuring that iteration/sample calculations proceed exactly as otherwise specified.

    Also for what it's worth, there is no need to worry about "Render Quality" affecting the number of pixel samples calculated per iteration in an Iray Photoreal render. As part of its legacy as a true unbiased renderer, Iray's Photoreal mode makes no use of any per-pixel variable-rate shading mechanism, meaning that every iteration always consists of exactly the same number of freshly calculated pixel samples (e.g. 1024 * 1024 = 1,048,576), regardless of the current "Render Quality" setting.

  • Jack Tomalin Posts: 9,495

    For some reason it stopped at 309 iterations, instead of 600?

  • 291+309=600
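That sum is the key to reading multi-device logs: each device reports only the iterations it did, the per-device counts add up to the scene's 600, and the devices work in parallel, so the run's render time is roughly the longest per-device time, not the sum. A rough Python sketch under those assumptions (`summarize` and `shares` are hypothetical helpers, and init time is ignored):

```python
from typing import List, Tuple

# One entry per device: (name, iterations, render seconds), copied from the log.
Run = List[Tuple[str, int, float]]

def summarize(run: Run) -> Tuple[int, float, float]:
    """Return (total iterations, approximate render time, Ms/s).

    Ms/s uses the thread's convention of 1 megasample per 1024x1024 iteration.
    """
    total_iters = sum(iters for _, iters, _ in run)
    # Devices render concurrently, so the slowest device bounds the run time.
    render_s = max(t for _, _, t in run)
    return total_iters, render_s, total_iters / render_s

def shares(run: Run) -> List[Tuple[str, float]]:
    """Fraction of all iterations contributed by each device."""
    total = sum(iters for _, iters, _ in run)
    return [(name, iters / total) for name, iters, _ in run]

if __name__ == "__main__":
    two_2080ti = [
        ("CUDA device 1 (GeForce RTX 2080 Ti)", 291, 9.731),
        ("CUDA device 0 (GeForce RTX 2080 Ti)", 309, 9.770),
    ]
    iters, secs, rate = summarize(two_2080ti)
    print(f"{iters} iterations in ~{secs:.3f}s = {rate:.1f} Ms/s")
    for name, frac in shares(two_2080ti):
        print(f"{name}: {frac:.1%}")
```

For the two-2080 Ti log quoted above, this gives 600 iterations in about 9.77s, roughly 61.4 Ms/s in the thread's units. A CPU line is just one more entry, which makes it easy to see how much of a run the CPU actually carried.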

  • Jack Tomalin Posts: 9,495

    Ah, I missed the two cards. Welp, I don't know either. I ran it again and it's 16.275 now. So, yeah... who knows :D

  • RayDAnt Posts: 584
    edited August 2019

    Perhaps you have "Max Path Length" set to something other than zero in your render settings, or one of the Preferences options Takeo.Kensei mentioned up top is set? In my experience, DS tends to be iffy about which options do or don't change/reset themselves when a new scene is loaded. So if you've spent time experimenting with different settings in the past, it might be worth resetting to defaults in the Preferences menu, or even deleting the Beta's AppData folder and doing a fresh install.

  • Doc Acme Posts: 393

    Hope this isn't too verbose:

    2x Quadro RTX 4000, DS 4.11, OptiX OFF, NVIDIA 431.02

    2019-08-08 15:44:23.316 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Maximum number of samples reached.
    2019-08-08 15:44:23.626 Saved image: C:\Users\16267\AppData\Roaming\DAZ 3D\Studio4\temp\render\r.png
    2019-08-08 15:44:23.635 Finished Rendering
    2019-08-08 15:44:23.686 Total Rendering Time: 2 minutes 16.46 seconds
    2019-08-08 15:44:23.709 Loaded image r.png
    2019-08-08 15:44:23.780 Saved image: C:\Users\16267\AppData\Roaming\DAZ 3D\Studio4\temp\RenderAlbumTmp\Render 1.jpg
    2019-08-08 15:44:36.271 Saved image: D:\Rendered Output\Daz Stills\RTXTest.png
    2019-08-08 15:44:36.301 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Device statistics:
    2019-08-08 15:44:36.301 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 1 (Quadro RTX 4000):      209 iterations, 2.692s init, 131.220s render
    2019-08-08 15:44:36.301 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (Quadro RTX 4000):      213 iterations, 2.674s init, 130.524s render
    2019-08-08 15:44:36.311 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CPU:      178 iterations, 2.484s init, 131.168s render

  • outrider42 Posts: 2,194

    2x 1080ti   Daz 4.12   OptiX OFF

    2019-08-08 20:07:52.514 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1080 Ti): 303 iterations, 5.295s init, 62.937s render

    2019-08-08 20:07:52.514 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 1 (GeForce GTX 1080 Ti): 297 iterations, 5.570s init, 63.119s render

    4.12  Optix ON

    2019-08-08 20:27:41.406 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Device statistics:

    2019-08-08 20:27:41.416 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1080 Ti): 306 iterations, 5.456s init, 63.425s render

    2019-08-08 20:27:41.416 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 1 (GeForce GTX 1080 Ti): 294 iterations, 5.825s init, 62.587s render

    The scene does NOT work in 4.10: it loads, but the dForce strand-based hair does not show up at all (she is bald). Strand-based hair only works properly in 4.11.0.366 and above, so no version below that can be used for benchmarking this scene. You may want to note this in the opening post.

    BTW, this is what the image looks like in case anybody is wondering. Is this how it is supposed to look?

  • Doc Acme Posts: 393

    That's what I saw.

  • RayDAnt Posts: 584

    As promised:

    Titan RTX, 4.11.0.383 x64, OptiX Prime ON, 431.60 WDDM, 70.724s = 8.484 megasamples per second
    Titan RTX, 4.12.0.047 x64, OptiX Prime N/A, 431.60 WDDM, 26.797s = 22.391 megasamples per second
    For an apparent speed increase of 2.639 times

    vs

    Titan RTX, 4.11.0.383 x64, OptiX Prime ON, 431.60 TCC, 68.862s = 8.713 megasamples per second
    Titan RTX, 4.12.0.047 x64, OptiX Prime N/A, 431.60 TCC, 25.443s = 23.582 megasamples per second
    For an apparent speed increase of 2.707 times
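These speed-increase figures are just ratios of throughput. Comparing throughput rather than raw render times also keeps the comparison valid when two runs used different iteration counts, as with the first post's 200-iteration DS 4.11 run against a 600-iteration DS 4.12 run. A minimal Python sketch; `speedup` is a hypothetical helper:

```python
def speedup(old_iters: int, old_render_s: float,
            new_iters: int, new_render_s: float) -> float:
    """Ratio of throughputs (iterations per second), new over old.

    Using throughput instead of raw render times keeps the result meaningful
    when the two runs used different iteration counts.
    """
    old_rate = old_iters / old_render_s
    new_rate = new_iters / new_render_s
    return new_rate / old_rate

if __name__ == "__main__":
    # Titan RTX, WDDM: DS 4.11 (OptiX Prime ON) vs DS 4.12, both 600 iterations
    print(f"{speedup(600, 70.724, 600, 26.797):.3f}x")
    # Titan RTX, TCC
    print(f"{speedup(600, 68.862, 600, 25.443):.3f}x")
    # RTX 2060 from the first post: 200 iterations on 4.11 vs 600 on 4.12
    print(f"{speedup(200, 63.019, 600, 63.782):.2f}x")
```

The last case shows why raw times mislead: 63.019s and 63.782s look nearly identical, yet the second run did three times as many iterations, which is the roughly 3x speedup reported in the first post.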

  • EBF2003 Posts: 27

    2x GTX 1070 Ti, OptiX ON, DS 4.12, Studio driver 430.86

    2019-08-10 01:13:53.113 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2019-08-10 01:13:53.113 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1070 Ti): 215 iterations, 3.078s init, 63.189s render

    2019-08-10 01:13:53.113 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce GTX 1070 Ti): 213 iterations, 3.337s init, 63.258s render

    2019-08-10 01:13:53.113 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CPU: 172 iterations, 2.814s init, 63.735s render

  • Robinson Posts: 276
    Interesting.  So 1 x 2070 is twice as fast as 2 x 1070 (1 x 2070 51.26s).

  • outrider42 Posts: 2,194
    No, take notice that the GPUs did not do all 600 iterations. The CPU added 172 iterations during this run, which helped significantly.

    I would like to see EBF run the bench again with only the 1070 Tis.
  • EBF2003 Posts: 27
    2019-08-10 16:23:15.623 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:

    2019-08-10 16:23:15.623 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce GTX 1070 Ti): 305 iterations, 3.121s init, 88.371s render

    2019-08-10 16:23:15.623 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 1 (GeForce GTX 1070 Ti): 295 iterations, 3.720s init, 87.942s render

    2019-08-10 16:23:18.384 Iray [INFO] - IRT:RENDER ::   1.0   IRT    rend info : Resource assignment for host 0 has changed.

  • Robinson Posts: 276
    edited August 2019
    I'm not going to lie. This thread is so confusing. Just render the scene and tell us how long it took and what the setup was. Sheesh.

  • outrider42 Posts: 2,194
    Perhaps they did not know it was on? I don't know. It got my attention because that time is like my 2x 1080 Tis, so I knew something had to be fishy, as there is no way 2x 1070 Ti should match that. Then I saw the line with the CPU. This is actually pretty interesting, because most of the time the CPU doesn't add much to Iray. But in this case, whatever CPU it is did quite a bit of work and made a big impact on the final time.

    Most people would even go so far as to say that buying a top-end CPU is a waste if you are doing Iray with a GPU. This might suggest otherwise.

    So what kind of CPU are you using there, EBF2003? Is this a new AMD Ryzen?

  • EBF2003 Posts: 27
    It's a Xeon E5-2696 v4 I got from eBay a couple of years ago, in a Sabertooth X99 motherboard.

  • RayDAnt Posts: 584
    edited August 2019
    I can't quote chapter and verse on it at the moment, but if you read Iray's official documentation on how it handles load balancing, it has a mechanism where, if a single CUDA device out of multiple active CUDA devices in a Photoreal render takes significantly longer than the others to send its assigned portion of converged pixels back to the central framebuffer, Iray's scheduler assumes something is wrong with that device and automatically reassigns its current workload to the other CUDA devices in the system. What this effectively means is that once the rendering performance gap between your CPU and GPU(s) grows beyond a certain point, rendering WITH your CPU results in WORSE overall performance than without it, since (unbeknownst to you, as there is never any indication of this in the log file) your fast GPUs are constantly being tasked with duplicating work your CPU is already processing. Hence why EBF2003 gets BETTER performance with his Xeon + 1070 Tis, whereas I, with my 8700K + Titan RTX, get WORSE performance with the CPU also enabled.
