RTX Benchmark thread...show me the power

Comments

  • outrider42 Posts: 3,679
    RayDAnt said:
    Robinson said:
    No, take notice they did not do 600 iterations. The CPU added 174 iterations during this run, which significantly helped.

    I'm not going to lie.  This thread is so confusing.  Just render the scene and tell us how long it took and what the setup was.  Sheesh.

    Perhaps they did not know it was on? I don't know. It got my attention because that time is like my 2x 1080tis, so I knew something had to be fishy, as there is no way 2x 1070ti should match that. Then I saw the line with the CPU. This is actually pretty interesting, because most of the time the CPU doesn't add much to Iray. But in this case whatever CPU it is did quite a bit of work and made a big impact on the final time.

    Most people would even go as far as to say buying a top-end CPU is a waste if you are doing Iray with a GPU. This might suggest otherwise.

    So what kind of CPU are you using there, EBF2003? Is this a new AMD Ryzen?

    Can't quote exact gospel/verse on it at the moment, but if you go and read Iray's official documentation on how it handles load balancing, it has a mechanism where if a single Cuda device out of multiple active Cuda devices during a Photoreal render takes significantly longer than the others to transmit its assigned portion of converged pixels back for inclusion in the central framebuffer, Iray's scheduler assumes that something is wrong with that device and automatically RE-assigns its current workload to the other Cuda devices in the system. What this effectively means is that once you get beyond a certain rendering performance difference between your CPU and GPU(s), rendering WITH your CPU results in WORSE overall rendering performance than without, since - unbeknownst to you (there is never any indication of any of this in the log file) - your fast GPUs are constantly being tasked with double-processing data that your CPU is already processing. Hence why EBF2003 gets BETTER performance with his Xeon + 1070s, whereas I, with my 8700K + Titan RTX, get WORSE performance with CPU also enabled.

    Not trying to get too far off here, but the results of the Xeon are pretty interesting.

    Having a slower device dropped makes sense, but then again it doesn't make sense with the different possible GPU combinations out there. There are people who have used the bottom of the barrel GPUs combined with high end GPUs that offered even greater gaps in performance. SY had a GTX 740 with a 980ti and they all played nice together. The 740 is garbage by any standard and there are CPUs that are faster than it, especially today. That's just one example. I had a 1080ti paired with a 970, and one time a 670. I could test that 670 out for kicks to see if it still works with a 1080ti.

    I am surprised how well the 22 core Xeon hangs with the 1070tis. It's not really that much slower than a single 1070ti looking at the iteration counts. 1070tis are pretty solid cards for Iray; they are right there with 1080s in performance. It makes me VERY curious how well the new Ryzens handle Iray, since they are clocked so much higher than any Xeon while still packing a good number of cores, cores that have much better IPC than before.

    I would love to see a Ryzen 12-core 3900X paired with a 2080ti, or any GPU for that matter, and see how they do.

  • RayDAnt Posts: 1,120
    edited August 2019
    RayDAnt said:
    Robinson said:
    No, take notice they did not do 600 iterations. The CPU added 174 iterations during this run, which significantly helped.

    I'm not going to lie.  This thread is so confusing.  Just render the scene and tell us how long it took and what the setup was.  Sheesh.

    Perhaps they did not know it was on? I don't know. It got my attention because that time is like my 2x 1080tis, so I knew something had to be fishy, as there is no way 2x 1070ti should match that. Then I saw the line with the CPU. This is actually pretty interesting, because most of the time the CPU doesn't add much to Iray. But in this case whatever CPU it is did quite a bit of work and made a big impact on the final time.

    Most people would even go as far as to say buying a top-end CPU is a waste if you are doing Iray with a GPU. This might suggest otherwise.

    So what kind of CPU are you using there, EBF2003? Is this a new AMD Ryzen?

    Can't quote exact gospel/verse on it at the moment, but if you go and read Iray's official documentation on how it handles load balancing, it has a mechanism where if a single Cuda device out of multiple active Cuda devices during a Photoreal render takes significantly longer than the others to transmit its assigned portion of converged pixels back for inclusion in the central framebuffer, Iray's scheduler assumes that something is wrong with that device and automatically RE-assigns its current workload to the other Cuda devices in the system. What this effectively means is that once you get beyond a certain rendering performance difference between your CPU and GPU(s), rendering WITH your CPU results in WORSE overall rendering performance than without, since - unbeknownst to you (there is never any indication of any of this in the log file) - your fast GPUs are constantly being tasked with double-processing data that your CPU is already processing. Hence why EBF2003 gets BETTER performance with his Xeon + 1070s, whereas I, with my 8700K + Titan RTX, get WORSE performance with CPU also enabled.

    Not trying to get too far off here, but the results of the Xeon are pretty interesting.

    Having a slower device dropped makes sense, but then again it doesn't make sense with the different possible GPU combinations out there. There are people who have used the bottom of the barrel GPUs combined with high end GPUs that offered even greater gaps in performance. SY had a GTX 740 with a 980ti and they all played nice together. The 740 is garbage by any standard and there are CPUs that are faster than it, especially today. That's just one example.

    Here's the pull quote (this comes from the most recent version of Nvidia's high-level Iray design document The Iray Light Transport Simulation and Rendering System):

    To eliminate these issues in batch scheduling, all devices render asynchronously (see Fig. 13). Each device works on a local copy of the full framebuffer. The low discrepancy sequences are partitioned in iterations of one sample per pixel. Sets of iterations are assigned dynamically to the devices. This way a high per device workload is maintained, scaling to many devices. To prevent congestion, the device framebuffer content is asynchronously merged into the host framebuffer in a best effort fashion. If a device finishes an iteration set and another device is busy merging, the device will postpone merging and continue rendering its next assigned iteration set. This mechanism could lead to starvation, possibly blocking some devices from ever merging, and increase the potential of loss of progress in case of device failure. Iray resolves this issue by also skipping merging if a device has less than half as many unmerged iterations as any other device on that host, allowing the other device to merge first. (p. 23-24)
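
    For the curious, here's a tiny Python sketch of how I read that merge-skip rule. Purely illustrative on my part - this is NOT actual Iray code, and the device names and numbers are made up:

    from dataclasses import dataclass

    @dataclass
    class Device:
        name: str
        unmerged: int  # iterations finished but not yet merged into the host framebuffer

    def should_skip_merge(device: Device, devices: list, merger_busy: bool) -> bool:
        # Rule 1: if another device is currently busy merging, postpone and keep rendering.
        if merger_busy:
            return True
        # Rule 2 (the starvation fix): if this device has less than half as many
        # unmerged iterations as any other device on the host, skip so the more
        # backlogged device gets to merge first.
        others = [d.unmerged for d in devices if d is not device]
        return bool(others) and device.unmerged < 0.5 * max(others)

    # Example: a backlogged fast GPU vs. a slow CPU that just finished a small set.
    gpu = Device("fast GPU", unmerged=12)
    cpu = Device("slow CPU", unmerged=2)
    print(should_skip_merge(cpu, [gpu, cpu], merger_busy=False))  # True - let the GPU merge first
    print(should_skip_merge(gpu, [gpu, cpu], merger_busy=False))  # False - the GPU merges now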

    Assuming you mean Sickleyield's OG benchmarking scores, those were done with two 740s and two 980Ti's in the same system. Meaning that the above described anomalous behavior wouldn't be in effect, since this is something that only happens when a single Cuda device in a system is severely outgunned by all other active render devices in that system. As was indeed found to be the case even for Sickleyield back then when doing those same tests with CPU enabled in that same system (see here). My guess is that any other card pairings which have cropped up over the years to seemingly belie this behavior also have similar subtleties to them.

     

    I had a 1080ti paired with a 970, and one time a 670. I could test that 670 out for kicks to see if it still works with a 1080ti.

    Yeah, if it isn't too much trouble - would love to hear what results you get. Although some quick Googling tells me that a 1080Ti is generally around 3-4x more powerful than a GTX 670, which may not be a big enough difference for this effect to manifest. To put things into perspective, the rendering performance difference between the single slowest Cuda capable device and any others in my system that does get affected by this (i7-8700K paired with a single Titan RTX) is around 16X. I.e. about the same as what you should see with those two cards SQUARED.

    Post edited by RayDAnt on
  • Dim Reaper Posts: 687
    edited August 2019

    I downloaded the benchmark scene from the link on the first page and I'm getting a lot of missing morphs - looks like the same ones that Jack Tomalin mentioned.

    Regardless, since it is unlikely to affect the render time to any great extent when comparing the render in two versions of DS, I ran the scene in DS 4.11 and DS 4.12.047

     

    RTX 2080Ti, 4.11.0.383 x64, Optix Prime ON, 431.60 Game-ready driver, 600/77.636 = 7.7 mega-samples per second

    RTX 2080Ti, 4.12.0.047 x64, OptiX Prime NA, 431.60 Game-ready driver, 600/28.845 = 20.8 mega-samples per second

     

    After running these and posting, I remembered the speed increase between DS 4.11 and 4.12 on all cards, and so decided to run the scene again on a 1080Ti.

    GTX 1080Ti, 4.11.0.383 x64, Optix Prime ON, 431.60 Game-ready driver, 600/133.071 = 4.5 mega-samples per second

    GTX 1080Ti, 4.12.0.047 x64, Optix Prime ON, 431.60 Game-ready driver, 600/122.994 = 4.8 mega-samples per second

    The two results show a big difference with the RTX-enabled version of Iray on an RTX card, compared to a GTX card.

     

    Post edited by Dim Reaper on
  • 3djoji Posts: 1,075
    edited August 2019

    I just found this post, and made the render on my PC - DS 4.12 - with a Quadro RTX 5000 (studio driver), Optix ON, CPU selected. Results: 600 iterations in 38.933s / 15.411 megasamples per second

    Same config but Optix OFF: 38.58s / 15.55 megasamples per second

    I made some more tries: CPU only (Intel Xeon Gold 6136 - 23 CPU cores are used during rendering): 4' 57.73s (297.73s) / 2.015 megasamples per second

    When connecting my eGPU via Thunderbolt 3, a Titan Xp Star Wars edition (studio driver) gets added. In that case, the result is 33.44s / 17.94 megasamples per second

    Same config but Optix OFF: 32.72s / 18.33 megasamples per second

    I was surprised, so I also tried CPU + Titan Xp, and the result is 1' 24.68s (84.68s) / 7.085 megasamples per second

    Not bad at all.

    Post edited by 3djoji on
  • nicstt Posts: 11,714
    nicstt said:

    That's impressive. I compared the 2060 to my 980ti and the 980ti was almost 4 times slower

    Ran them all a few times, and the best time I got for each was..

    3 cards, 11.388s

    2 cards, 14.942s

    1 card, 28.534s

    So make of that what you will.

    Interesting results, Jack; a second card offers a huge boost, whereas the 3rd card relatively little.

    I wonder if a more taxing and longer render would offer similar ratios.

    The first cuts render time to nearly half (14.942s against an ideal 14.267s); the second isn't that bad either (11.388s against an ideal 9.511s, or you could argue for ~10.43s as the expected time comparing with the gain from the second card), but it's not as big a gain.

    Good to know.
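
    If anyone wants to sanity-check the scaling themselves, here's the arithmetic in a few lines of Python (illustrative only, using Jack's best times from above):

    # Quick scaling arithmetic on the three best times above - nothing official.
    times = {1: 28.534, 2: 14.942, 3: 11.388}  # cards -> best render time in seconds

    base = times[1]
    for n, t in sorted(times.items()):
        ideal = base / n            # what perfect linear scaling would give
        efficiency = ideal / t      # fraction of that ideal we actually got
        print(f"{n} card(s): {t:.3f}s actual vs {ideal:.3f}s ideal ({efficiency:.1%} efficiency)")

    # 2 cards land at ~95.5% of perfect scaling; 3 cards at ~83.5%.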

  • JD_Mortal Posts: 758
    edited September 2019

    Titan-Xp Col Edi, 4.11.0.383 x64, Optix OFF, GRD 436.15, WDDM, 401.184sec = 01.496 mega-samples/sec

    Titan-V, 4.11.0.383 x64, Optix OFF, GRD 436.15, WDDM, 236.997sec = 02.532 mega-samples/sec

    Intel i9-7980XE (18 cores/36 threads), 4.12.0.67 x64, Using Embree (2.8.0), NA, NA, 191.820sec = 03.127 mega-samples/sec

    Titan-Xp Col Edi, 4.11.0.383 x64, Optix ON, GRD 436.15, WDDM, 118.952sec = 05.044 mega-samples/sec

    Titan-Xp Col Edi, 4.12.0.67 x64, Optix ON, GRD 436.15, WDDM, 110.376sec = 05.436 mega-samples/sec

    Titan-V, 4.11.0.383 x64, Optix ON, GRD 436.15, WDDM, 76.910sec = 07.801 mega-samples/sec

    Titan-V, 4.12.0.67 x64, Optix ON, GRD 436.15, WDDM, 74.143sec = 08.092 mega-samples/sec

    Same test, using 4.12-beta, Optix ON, and all four cards. (2x Titan-V, 2x Titan-Xp Col Edi)...23.824sec = 25.165 mega-samples/sec

    Same test, using 4.12-beta, Optix ON, and all four cards and my CPU. (2x Titan-V, 2x Titan-Xp Col Edi)...21.744sec = 27.594 mega-samples/sec

    However, I am sure the variation between Optix on/off runs is due to the fact that the "hair" is a collection of "instances" or "clones". Apparently half the hairs, or more, are clones - exactly what Optix is designed to accelerate. I say this because my CPU beat my Titan-V and my Titan-Xp Col Edi cards, by using OptiX, in the 4.12-beta test. It wasn't too far behind the OptiX results for the cards, but normally my CPU is 1/10th the speed of my GPU. Might also be that RTX cards are just not as burdened as the single pipeline of non-RTX cards.

    My CPU has 0 cuda cores and 0 RTX/Volta cores. But mine does have a lot of cores/threads (18/36) to process clones of "lines". This CPU doesn't even have a built-in video-card chip, so it isn't getting any processing power from that. (Like an i3/i5/i7 would, with the built-in Intel video, which has OpenGL accelerators and DX12 acceleration.)

    I removed everything in the scene except the one camera used and the hair... (For the test below)

    Just the hair alone, on 4.12-Beta, Optix ON, using all cards and CPU... 18.409sec = 32.593 mega-samples/sec

    The results are only 3 seconds behind. All that other stuff in the scene was nearly irrelevant. I am sure the model accounts for the majority of the additional 3 seconds of render time. (The eyes are the biggest time killer on Daz models.) This is one of those things I was talking about in another thread that Daz needs to look into, to see why we get results like this, and fix the issue. Yes, RTX is faster doing this, but that illusion is due to the fact that non-RTX cards, but not CPUs, are burdened by trying to render hair like this.

    I would bet that an AMD Threadripper, with 32/64 cores/threads, would finally beat my CPU on this render... I am sure "LAMH" was originally only OpenGL, then converted to Iray, but maybe it broke along the way and nVidia, like they did with "Quake" and other games, ultimately fixed the "bad conversion code", but only for newer cards. (Quake had horrible coding that they couldn't fix, so nVidia fixed the code for them, in the game drivers. That's old news, but a common thing to do, as they did with many other games and still do.)

    Who knows, maybe now, with RTX, "Look at my hair" may finally make some sales and produce better looking hair!

    Eg, make bigger hair, like 1980's bigger hair... Trilobites. Big fuzzy balls, all over!

    Is RTX replacing Optix? Is that why it isn't going to be selectable anymore? (Or why the selection-box seems to be broken?)

    Sad that it looks like it takes all four of my cards to render this close to ONE RTX card... Then I look at the hair and realize that I'm not missing out on much. :P

    RTX is going to produce some awesome IRAY scenery, once people figure out how to create object instances in Daz3D, and have them actually work, like the hair does. Grass, Hair, Trees, Flowers, Clouds, Mountains, Volumetric blisters of fog, Rocks, Walls, Roads, Sidewalks, Fences... All instanced but still unique variations.

    Nice find, for a benchmark test. Next time just make a ball of hair. That low-poly sphere with the reflection and volume-refraction is itching the sour parts of my brain! Not to mention the model project errors. Your test will NOT produce the same results as ours, since you have those textures and morphs and your render times are processing them, but ours are not. Like you said, here, it honestly doesn't matter.

    Anyone want to buy an old Titan-X and two Titan-Xp Collectors Edition cards and a few Titan-V cards? I'll take four Titan-RTX cards as an equal trade! I'll even throw-in a running computer to pack them all into. Yes, they will all fit and run in there. :P

    Anyone have the running wattage of these RTX cards rendering, measured at the wall?

    Post edited by JD_Mortal on
  • RayDAnt Posts: 1,120
    edited September 2019

    So for kicks I decided to up the scene complexity drastically (literally just copy/pasted the Hair prop itself 4 times in place in the scene) to see how the Titan RTX would react, and this is what I got:

    Titan RTX, 4.11.0.383 x64, Optix Prime ON, 436.15 WDDM, 409.003 = 01.467 mega-samples per second
    Titan RTX, 4.12.0.067 x64, OptiX Prime NA, 436.15 WDDM, 036.892 = 16.264 mega-samples per second
    That is a MASSIVE performance increase of roughly 1,009% (over 11x)! So depending on how prevalent hi-polycount things like Strand hair end up being in mainstream DS content... you're gonna want those RTX cores.
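
    Here's the working on that percentage, as throwaway Python using just the two results above:

    # Speedup between the 4.11 and 4.12 runs above.
    before = 600 / 409.003  # 4.11, OptiX Prime: ~1.467 mega-samples/s
    after = 600 / 36.892    # 4.12, RTX path:   ~16.264 mega-samples/s
    speedup = after / before
    print(f"{speedup:.2f}x faster, i.e. a {speedup - 1:.0%} increase")  # 11.09x, 1009%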

    Post edited by RayDAnt on
  • JD_Mortal Posts: 758
    edited September 2019

    This reminds me of the old Intel benchmarks, or Radeon benchmarks, where they publish only specific tests that flatter one individual feature that no one actually uses, as the foundation for a speed comparison.

    If you save this as an OBJ, not the "Hair", then load it as an OBJ, does it still render at the same speed for you?

    I would honestly chalk up this benchmark as proof of a specific bug, or an area that warrants further optimization, not a test of RTX speed, because the Titan-RTX should be more than 10x the speed of my CPU, and so should the Titan-V. This just shows that RTX is not "burdened" but the other cards are, by whatever they did to LAMH, which is what you are actually testing here.

    What you are using: (We use the free plugin, LAMH player, old plugin) SKU:15548 https://www.daz3d.com/look-at-my-hair

    What we need for comparison, but it's not free: SKU:44935 https://www.daz3d.com/lamh-2-iray-catalyzer

    It's not about "triangles", alone... because the hair is only 389K triangels, after conversion. But the hair and all the other stuff, together, are 671K worth of triangels, and an additional 235 textures. Yet, removing everything except the hair, didn't speed-up the render by half. It only shaved-off a few seconds. Just as adding four balls of hair didn't increase render-time by 4x as much. I went from 191sec to 291sec, though the geometry was apparently 4x greater. Going from 389K to 1554K triangles. Adding 4 objects, it still says there is only one object and one instance. I don't trust the logs output. Especally since it says it isn't using optix, because it isn't available, but it is clearly using it on Titan-Xp and Titan-V cards. :P

    When I doubled everything except the hair, which was removed... There were 566K triangles and 470 textures, but it only took 30.918 seconds to render with the Titan-V, which should be nearly equal to the time it took to render the scene with the hair at 671K triangles. Instead, it was less than half the 74.143 seconds of the original render, with the hair and the models.

    Oddly... Double the models = 30 sec + just the hair 18 sec = 48 seconds total... But it took 74 seconds with one set of models and one set of hair, together.

    This is the same 4x hair render, with no other objects.

    i9-7980XE CPU, 4.11.0.383 x64, OptiX Prime ON (5.0.1), NA, 1206.867s = 0.497 mega-samples per second {Scene processed in 5.505s} <- Odd

    i9-7980XE CPU, 4.12.0.067 x64, Using Embree (2.8.0), NA, 298.192s = 2.012 mega-samples per second {Scene processed in 0.176s} <- More Odd

    New Features in Embree 2.8.0

    • Added support for line segment geometry.
    • Added support for quad geometry (replaces triangle-pairs feature).

    P.S. Embree is up to version 3.6.1, which would make this even faster for my CPU. https://www.embree.org/

    Embree also has "clean-up code", that reduces the objects back into poly-lines and quads, if needed, when programs like Daz, feed it pure polygons, which LAMH was converted into, for iray compatibility. Because LAMH was only, originally, designed to work on 3DeLight rendering. (OpenGL ray-tracing)

    Somewhere in the process, whatever IRAY is using for RTX to render is doing the same thing. But it isn't doing that for the other non-RTX cards, yet. (Nor is Daz, since IRAY also handles quads and poly-lines, but Daz only outputs triangles for everything. IRAY isn't doing the clean-up of the scene for non-RTX cards like it does for the RTX cards. Which is why LAMH was converted to output triangles, but only for IRAY.)

    Oddly, it doesn't even, in the logs, indicate that the hair is all "instances". (I assume that is because Daz has converted it all to triangles. Come on, it consumes 1.6GB of memory space and has thousands of "images", but the hair doesn't have one texture map on it at all.)

    Ultimately, this is comparing apples to tires at this point. Yes, it's IRAY doing the output, but unless you limit yourself to 4.11, where we are all rendering with the same sub-system, you may as well be comparing 3Delight to IRAY again. The Beta version is using individual variations that are not exactly "equal", or compatible, except in the output of the rendered image. This specific test uses a non-Daz external plug-in, which produces output that IRAY does not handle the same on every card.

    Post edited by JD_Mortal on
  • RayDAnt Posts: 1,120
    edited September 2019
    JD_Mortal said:

    It's not about "triangles", alone... because the hair is only 389K triangels, after conversion. But the hair and all the other stuff, together, are 671K worth of triangels, and an additional 235 textures. Yet, removing everything except the hair, didn't speed-up the render by half. It only shaved-off a few seconds. Just as adding four balls of hair didn't increase render-time by 4x as much. I went from 191sec to 291sec, though the geometry was apparently 4x greater. Going from 389K to 1554K triangles.  [...] When I doubled everything, except the hair, which was removed... There was 566K triangles and 470 textures, but it only took 30.918 seconds to render with the Titan-V, which should be nearly equal to the speed that it took to render the hair with only 671K triangles. Instead, it was less than half the time of the 74.143 second render of the original render, with the hair and the models.

    Oddly... Double the models = 30 sec + just the hair 18 sec = 48 seconds total... But it took 74 seconds with one set of models and one set of hair, together.

    Yeah, keep in mind that the thing being investigated here is the relative performance of the different software/hardware strategies in use for tracing the paths of light rays AROUND whatever triangles happen to be present in the scene - not necessarily processing those triangles themselves. Meaning that triangle density (insofar as screen space in the rendered viewport is concerned) likely plays more of a role in determining what's most time consuming during the rendering process than the actual number of triangles in the scene itself.

    Ie. the reason why the hair prop takes so much more time to render in the stock scene than the entirety of the rest of the geometry in it, despite the two having roughly the same triangle counts, is because the hair presents a more taxing job due to how close together its visible component parts (triangles) are. Not necessarily because of how many of them there are. Imagine you're in a large but sparsely laid out forest with a bunch of infinitely bouncy balls, and you wanted to record the paths they'd take out of your sight when you throw them at the tree trunks. Now imagine wanting to do the same thing in a different forest with the same number of trees but much more densely packed. You're gonna get LOTS more twists and turns to track.

    Post edited by RayDAnt on
  • RayDAnt Posts: 1,120
    edited September 2019

    Decided to give this benchmark a go on my (puny) GTX 1050-based Surface Book 2 for some additional sanity checking on non-RTX hardware.

    Here's for GPU:
    GTX 1050 2GB, 4.11.0.383 x64, Optix Prime, 436.30 WDDM, 600/663.609 = 0.904 mega-samples per second
    GTX 1050 2GB, 4.12.0.073 x64, OptiX Prime, 436.30 WDDM, 600/641.034 = 0.936 mega-samples per second
    That's a boost of 0.032 mega-samples per second or 3.54%. So basically unchanged.

    Meanwhile for CPU:
    Intel i7-8650U, 4.11.0.383 x64, Optix Prime, NA, 300/1659.505 = 0.181 mega-samples per second
    Intel i7-8650U, 4.12.0.073 x64, Embree, NA, 600/1847.245 = 0.325 mega-samples per second
    That's a boost of 0.144 or 79.56%. So can confirm - definitely a huge boost for this scene on CPU moving from OptiX Prime to Embree for raytracing acceleration.

    But not even remotely in the same territory as what @JD_Mortal saw with the i9-7980XE. Very puzzling indeed. Scratch that - was looking at the modified x4 test numbers. These are all for the stock benchmarking scene.

    Post edited by RayDAnt on
  • outrider42 Posts: 3,679

    It does make sense that Embree, being developed by CPU-focused Intel, would probably be very well optimized as a CPU-based solution. Iray, being developed by Nvidia, has no incentive at all to optimize CPU-only rendering. But it is still surprising to see Embree hold up so well in this type of test. Frankly it makes Iray and OptiX look real bad, LOL. Anyway, JD is using an 18 core 36 thread part, vs a 4 core 8 thread part. Plus that 4 core part is a mobile chip, so it's downclocked from the desktop version. Even though the spec says it boosts to 4.2 GHz, I doubt it stays there for very long during a render. Actually, both chips supposedly boost to 4.2 GHz; however, the i9 has a much higher base of 2.6 GHz while the mobile i7 just has a base of 1.9 GHz. That is a big difference.

    But yeah, the AMD 3950X is releasing very soon, possibly at the end of this month, but no date has been confirmed just yet. This part will be 16 cores 32 threads, with a BASE of 3.5 GHz, and boosts to 4.7 GHz. Oh, and it's going to retail for about $750. I'm sure somebody here is going to buy this thing, and it's going to be fun to see how it benches Iray-Embree. I'm also curious how these things run in more "normal" scenes, not just this benchmark. It could be very interesting stuff.

    It seems like the most exciting part of the new Iray is not just RTX support, but CPU based Embree.

  • RayDAnt Posts: 1,120
    edited September 2019

    And another round on my i7-8700K:
    Intel i7-8700K, 4.11.0.383 x64, Optix Prime, NA, 600/882.179 = 0.680 mega-samples per second
    Intel i7-8700K, 4.12.0.073 x64, Embree, NA, 600/478.244 = 1.255 mega-samples per second
    That's a boost of 0.575 mega-samples per second or 84.56%. So pretty much consistent with the i7-8650U numbers I got.

     

     

    It does make sense that Embree, being developed by CPU-focused Intel, would probably be very well optimized as a CPU-based solution. Iray, being developed by Nvidia, has no incentive at all to optimize CPU-only rendering. But it is still surprising to see Embree hold up so well in this type of test. Frankly it makes Iray and OptiX look real bad, LOL.

    Eh. OptiX Prime acceleration is a 5+ year old solution (it debuted as part of OptiX 3.5 back in 2013-2014) to an ongoing problem, from a time when something like realtime raytracing wasn't even on most developers' radars. And as for Iray itself... at its hard-coded core, it's more than 30 years old. Which isn't to say that either of these things is in itself uselessly outdated or irrelevant. But obviously when it comes to performance gains through tightly coupling software to current hardware innovations, a newly minted version of Embree or Iray RTX is gonna put previous solutions to shame.

    Personally what I'm most interested to see is - if/when Iray's developers get around to implementing it - how raytracing acceleration for GTX cards via the same full OptiX-based programming pipeline as RTX cards (remember - Nvidia hadn't even announced plans for RTX backwards support on GTX hardware until months after Iray RTX was already at the demo stage) compares to the current OptiX Prime one. My prediction is pretty much identical performance. But it could easily be otherwise, and that imo would be a really interesting nut to crack.

    Post edited by RayDAnt on
  • TheKD Posts: 2,674
    edited September 2019

    I thought the benchmark was impressive, but the 2080 Super wows me even more in real world use. Usually when rendering for art purposes, I go with one of the golden ratio settings, with the longest side at 4000 px. With my 1070 and 960, I would generally get around 1000 iterations an hour, and a finished render would take an hour and a half to three hours. I am doing such a render now with the 2080 Super and 1070; I looked at around the 9 minute mark, and it's already past 1000 iterations. So it seems like maybe the larger you like to render, the more performance boost you will see.

    Yeah, it took like 30 mins. So 30 mins instead of 120+ mins seems a lot more impressive than shaving a few minutes off the benchmark, but doing the math it conforms to the ~3x speedup I got on the benchmark.

    Post edited by TheKD on
  • RayDAnt Posts: 1,120
    edited September 2019
    TheKD said:

    So it seems like maybe the larger you like to render, the more performance boost you will see.

    Absolutely. Unfortunately this is a phenomenon that general purpose benchmarks designed to be runnable in as short a time as possible on as many high-end AND low-end GPU models as possible (aka pretty much all of them, including the one I created) are incapable of quantifying. For that you'd need a specialized benchmark scene (most likely built using non-free content) that would be an impractical workload for most people with non-top-tier current gen cards to complete. Meaning that we're pretty much just as well off with people's random anecdotes (like what you just posted) about the positive differences made.

    Post edited by RayDAnt on
  • daveso Posts: 6,413
    edited October 2019

    Was going to try the bench with my new 2070 Super, but there are files missing from the scene:

     

    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmemaciated.dsf

    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmpearfigure.dsf

    data/daz 3d/genesis 2/female/morphs/daz 3d/base/phmearselflong.dsf

    Anyway, it rendered, but I have no idea where these numbers you guys are listing come from. Mine only says at the end...
    Total Rendering Time: 42.9 seconds. I have an RTX 2070 Super; CPU is an AMD 3800X.

    So if I'm following correctly ... 

    RTX 2070 Super/DS4.12.0.86/600/42.9 = 13.986

    Post edited by daveso on
  • RTX 2060 mobile (HP Omen 15 with Core i5-9300H), 4.12.0.086 x64, OptiX Prime NA, Nvidia 431.86 studio driver, 55.3s = 10.9 mega-samples per second

  • Dim Reaper Posts: 687
    edited October 2019
    daveso said:

    Was going to try the bench with my new 2070 Super, but there are files missing from the scene:

     

    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmemaciated.dsf

    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmpearfigure.dsf

    data/daz 3d/genesis 2/female/morphs/daz 3d/base/phmearselflong.dsf

    Anyway, it rendered, but I have no idea where these numbers you guys are listing come from. Mine only says at the end...
    Total Rendering Time: 42.9 seconds. I have an RTX 2070 Super; CPU is an AMD 3800X.

    So if I'm following correctly ... 

    RTX 2070 Super/DS4.12.0.86/600/42.9 = 13.986

     

    Don't use the "Total Rendering Time" figure.  Once the scene has completed, save the image and then go to the log file.  The final line should look something like this:

    2019-10-06 16:09:47.528 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (GeForce RTX 2080 Ti): 600 iterations, 2.958s init, 28.806s render

    Use the final render time (28.806s in the example above) and divide 600 by that to get the mega-samples/sec.
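
    If you end up doing this a lot, here's a quick Python helper that pulls the numbers straight out of that log line. Just a convenience sketch - the regex assumes the exact log format shown above, so adjust it if your log differs:

    import re

    line = ("2019-10-06 16:09:47.528 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   "
            "rend info : CUDA device 0 (GeForce RTX 2080 Ti): 600 iterations, "
            "2.958s init, 28.806s render")

    m = re.search(r"(\d+) iterations, [\d.]+s init, ([\d.]+)s render", line)
    if m:
        iterations, seconds = int(m.group(1)), float(m.group(2))
        print(f"{iterations / seconds:.1f} mega-samples per second")  # -> 20.8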

    I get a long list of missing morph files too. It doesn't seem to make much difference to the final times recorded.

    Post edited by Dim Reaper on
  • outrider42 Posts: 3,679
    The missing morph files are just an artifact of how it was originally saved, and make no impact. What you can do is load the scene and then save it again without making any changes. This will save it without those morphs, so if you load it up again you will no longer get those errors and it will load faster.

  • daveso Posts: 6,413

    Thanks... new details. Interesting that my CPU was used for 65 iterations. Wonder why that happened? This 2070 Super has 8 gigs of RAM.

    rend info : CUDA device 0 (GeForce RTX 2070 SUPER): 535 iterations, 1.955s init, 37.490s render

    2019-10-06 12:50:07.960 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CPU: 65 iterations, 1.857s init, 37.013s render

  • outrider42 Posts: 3,679
    You can still render with the CPU in this Iray. If your CPU is checked as a rendering device, then it will be used. In some cases the CPU will not render alongside the GPU, perhaps if it is too slow.

    The only other way the CPU renders is if the scene exceeds VRAM and gets dumped to CPU. For this bench that should not be possible, unless some error happens or if you are using other applications that hog up VRAM while Iray is running.

    Intel Embree is the CPU render plugin. This is in place because OptiX 6.0 has no CPU option at all, so Iray includes Embree for users who lack an Nvidia GPU. I do find it very interesting how they can both apparently run at the same time.
  • jura Posts: 50

    Hi there

    Here are my results

     

    DAZ3D Studio 4.11 Pro

    RTX 2080Ti Total Rendering Time: 1 minutes 9.28 seconds

    4*GPUs(RTX 2080Ti,GTX 1080Ti,GTX 1080,GTX 1080) Total Rendering Time: 32.38 seconds

    GTX 1080Ti with 2113MHz OC Total Rendering Time: 2 minutes 4.53 seconds

     

    DAZ3D Studio 4.12 Pro 

    RTX 2080Ti Total Rendering Time: 28.85 seconds

    CUDA device 0 (GeForce RTX 2080 Ti): 600 iterations, 0.226s init, 26.937s render

    4*GPUs(RTX 2080Ti,GTX 1080Ti,GTX 1080,GTX 1080) Total Rendering Time: 20.69 seconds

    Hope this helps

    Thanks, Jura

     

  • Wanderer Posts: 956

    Okay, I have a few issues, hopefully they can be resolved easily enough:

    When I open the scene, I get 

    The following files could not be found:
    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmemaciated.dsf
    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmpearfigure.dsf
    data/daz 3d/genesis 2/female/morphs/daz 3d/base/phmearselflong.dsf

    This is a fresh install of everything on a new PC build, so maybe something isn't in its proper place yet.

    But then, when I get to the end of the log, I get this:

    2019-10-19 02:47:03.081 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Received update to 00600 iterations after 28.187s.
    2019-10-19 02:47:03.082 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Maximum number of samples reached.
    2019-10-19 02:47:03.402 Saved image: C:\Users\ed_my\AppData\Roaming\DAZ 3D\Studio4\temp\render\r.png
    2019-10-19 02:47:03.408 Finished Rendering
    2019-10-19 02:47:03.461 Total Rendering Time: 31.71 seconds
    2019-10-19 02:47:03.483 Loaded image r.png
    2019-10-19 02:47:03.523 Saved image: C:\Users\ed_my\AppData\Roaming\DAZ 3D\Studio4\temp\RenderAlbumTmp\Render 1.jpg
    2019-10-19 02:47:08.693 WARNING: ..\..\..\..\..\src\sdksource\cloud\dzcloudtasknotifier.cpp(178): peer performed orderly shutdown errno=0

    No device statistics. Any suggestions on this? If this is inappropriate, I'll take my question over to the tech support section.

  • RayDAnt Posts: 1,120
    edited October 2019
    Wanderer said:

    Okay, I have a few issues, hopefully they can be resolved easily enough:

    When I open the scene, I get 

    The following files could not be found:
    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmemaciated.dsf
    data/daz 3d/genesis 2/female/morphs/daz 3d/base/fbmpearfigure.dsf
    data/daz 3d/genesis 2/female/morphs/daz 3d/base/phmearselflong.dsf

    This is a fresh install of everything on a new PC build, so maybe something isn't in its proper place yet.

    But then, when I get to the end of the log, I get this:

    2019-10-19 02:47:03.081 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Received update to 00600 iterations after 28.187s.
    2019-10-19 02:47:03.082 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Maximum number of samples reached.
    2019-10-19 02:47:03.402 Saved image: C:\Users\ed_my\AppData\Roaming\DAZ 3D\Studio4\temp\render\r.png
    2019-10-19 02:47:03.408 Finished Rendering
    2019-10-19 02:47:03.461 Total Rendering Time: 31.71 seconds
    2019-10-19 02:47:03.483 Loaded image r.png
    2019-10-19 02:47:03.523 Saved image: C:\Users\ed_my\AppData\Roaming\DAZ 3D\Studio4\temp\RenderAlbumTmp\Render 1.jpg
    2019-10-19 02:47:08.693 WARNING: ..\..\..\..\..\src\sdksource\cloud\dzcloudtasknotifier.cpp(178): peer performed orderly shutdown errno=0

    No device statistics. Any suggestions on this? If this is inappropriate, I'll take my question over to the tech support section.

    Close the separate window with the render in it. Render statistics don't appear in the log file until after an Iray render context is closed, which is what normally happens when you close that window.

    Post edited by RayDAnt on
  • Wanderer Posts: 956
    edited October 2019

    @RayDAnt Good to know. Thank you!

     

    Okay: 

    GPU only: [RTX 2080Ti] [DS 4.12.0.86] [436.48_gameready_Win10_64bit_something] [26.923s] [22.285 Ms/s]

    For giggles:

    GPU/CPU:

    [RTX 2080Ti] [DS 4.12.0.86] [436.48_gameready_Win10_64bit_something] [540Ite] [25.401s] [21.259 Ms/s] (infinite cosmic power)

    [I9 9940X] [DS 4.12.0.86] [Not sure] [60Ite] [26.047s] [2.303 Ms/s] (itty-bitty living space)

     

    Post edited by Wanderer on
  • chrislb Posts: 95

    The benchmark file is no longer available for download.  When I try to download it, I get this message:

    "Sorry, the file you requested is not available.
    Possible reasons include:

    - File date limit has expired.
    - File was not successfully uploaded.

    It is not possible to restore the file. Please contact the uploader and ask them to upload the file again."

  • Sevrin Posts: 6,301
    chrislb said:

    The benchmark file is no longer available for download.  When I try to download it, I get this message:

    "Sorry, the file you requested is not available.
    Possible reasons include:

    - File date limit has expired.
    - File was not successfully uploaded.

    It is not possible to restore the file. Please contact the uploader and ask them to upload the file again."

    This thread is deprecated and has been replaced by the one linked in RayDAnt's signature.

  • nicstt Posts: 11,714
    edited October 2020
    RayDAnt said:
    Robinson said:
    No, take notice they did not do 600 iterations. The CPU added 174 iterations during this run, which significantly helped.

    I'm not going to lie.  This thread is so confusing.  Just render the scene and tell us how long it took and what the setup was.  Sheesh.

    Perhaps they did not know it was on? I don't know. It got my attention because that time is like my 2x 1080tis, so I knew something had to be fishy, as there is no way 2x 1070ti should match that. Then I saw the line with the CPU. This is actually pretty interesting, because most of the time the CPU doesn't add much to Iray. But in this case whatever CPU it is did quite a bit of work and made a big impact on the final time.

    Most people would even go as far as to say buying a top-end CPU is a waste if you are doing Iray with a GPU. This might suggest otherwise.

    So what kind of CPU are you using there, EBF2003? Is this a new AMD Ryzen?

    Can't quote exact gospel/verse on it at the moment, but if you go and read Iray's official documentation on how it handles load balancing, it has a mechanism where if a single Cuda device out of multiple active Cuda devices during a Photoreal render takes significantly longer than the others to transmit its assigned portion of converged pixels back for inclusion in the central framebuffer, Iray's scheduler assumes that something is wrong with that device and automatically RE-assigns its current workload to the other Cuda devices in the system. What this effectively means is that once you get beyond a certain rendering performance difference between your CPU and GPU(s), rendering WITH your CPU results in WORSE overall rendering performance than without, since - unbeknownst to you (there is never any indication of any of this in the log file) - your fast GPUs are constantly being tasked with double-processing data that your CPU is already processing. Hence why EBF2003 gets BETTER performance with his Xeon + 1070s, whereas I, with my 8700K + Titan RTX, get WORSE performance with CPU also enabled.

    Not trying to get too far off here, but the results of the Xeon are pretty interesting.

    Having a slower device dropped makes sense, but then again it doesn't make sense with the different possible GPU combinations out there. There are people who have used the bottom of the barrel GPUs combined with high end GPUs that offered even greater gaps in performance. SY had a GTX 740 with a 980ti and they all played nice together. The 740 is garbage by any standard and there are CPUs that are faster than it, especially today. That's just one example. I had a 1080ti paired with a 970, and one time a 670. I could test that 670 out for kicks to see if it still works with a 1080ti.

    I am surprised how well the 22 core Xeon hangs with the 1070tis. It's not really that much slower than a single 1070ti looking at the iteration counts. 1070tis are pretty solid cards for Iray; they are right there with 1080s in performance. It makes me VERY curious how well the new Ryzens handle Iray, since they are clocked so much higher than any Xeon while still packing a good number of cores, cores that have much better IPC than before.

    I would love to see a Ryzen 12-core 3900X paired with a 2080ti, or any GPU for that matter, and see how they do.

    I have a 980ti and a Threadripper 1950X (16/32). In Iray, the 980ti is around 30-40% faster, and on occasion more. In Blender (2.83 and on), the Threadripper is at worst about the same, but often much faster; even 30-40% is not uncommon, and on occasion much more.

    Post edited by nicstt on
  • RayDAnt Posts: 1,120
    edited October 2020
    Sevrin said:
    chrislb said:

    The benchmark file is no longer available for download.  When I try to download it, I get this message:

    "Sorry, the file you requested is not available.
    Possible reasons include:

    - File date limit has expired.
    - File was not successfully uploaded.

    It is not possible to restore the file. Please contact the uploader and ask them to upload the file again."

    This thread is deprecated and has been replaced by the one linked in RayDAnt's signature.

    ...not exactly. The benchmark scene used in this thread was calibrated to stress RTX-specific hardware features, whereas the scene in mine was calibrated to be of more general use (since getting usable stats for pre-RTX cards was more useful at that point).

    I'm pretty sure I have the benchmark scene from this thread backed up somewhere. Assuming no objections from the OP, I'll repost it here as soon as I get a chance.

    Post edited by RayDAnt on
  • Sevrin Posts: 6,301
    RayDAnt said:
    Sevrin said:
    chrislb said:

    The benchmark file is no longer available for download.  When I try to download it, I get this message:

    "Sorry, the file you requested is not available.
    Possible reasons include:

    - File date limit has expired.
    - File was not successfully uploaded.

    It is not possible to restore the file. Please contact the uploader and ask them to upload the file again."

    This thread is deprecated and has been replaced by the one linked in RayDAnt's signature.

    ...not exactly. The benchmark scene used in this thread was calibrated to stress RTX-specific hardware features, whereas the scene in mine was calibrated to be of more general use (since getting usable stats for pre-RTX cards was more useful at that point).

    I'm pretty sure I have the benchmark scene from this thread backed up somewhere. Assuming no objections from the OP, I'll repost it here as soon as I get a chance.

    Well, considering that the previous post before today was a full year ago, it appears to have run its course.

  • Does anyone have a copy of the scene still? The sendspace link is telling me there was an error. Thanks
