General GPU/testing discussion from benchmark thread

Richard Haseltine Posts: 96,712
edited April 2019 in The Commons

This discussion was created from comments split from: Iray Starter Scene: Post Your Benchmarks!.

Post edited by DAZ_ann0314 on

Comments

  • RayDAnt Posts: 1,120
    edited April 2019
    Artini said:

    It looks like Daz Studio can only use one CPU while rendering in iray.

    image

    Interesting. If - while rendering - you right-click over the CPU utilization graph in Task Manager and select change graph to > Logical processors, you should be able to see exactly how many logical cores/threads Iray is actually using while rendering. Since you have a dual Xeon setup you should also be able to select change graph to > NUMA nodes - which should show Iray usage split up by physical CPU layout (as I understand it a dual Xeon Gold setup has 2x2 = 4 NUMA nodes.)

    What you will most likely see there is that Iray is only ever using two of those NUMA nodes (ie. one single whole physical processor.) This is because Windows has internal default settings which prevent any single process from running on NUMA nodes physically located in separate CPU dies, since this severely curtails relative performance gains (due to memory latency issues), at least in the context of divvying out CPU resources to virtual machines (which is what multi-CPU/NUMA-node Xeon setups are generally optimized for.) There is bound to be some advanced Windows setting somewhere that will override this behavior. But all my searches on it are coming up blank (most likely it's just too obscure of a use-case to find discussion about it.) If your goal is to get Iray running on all cores anyway, my best advice is to read up on NUMA nodes.
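
    If you want to see exactly what Windows is reporting, a rough sketch like this (Python calling documented kernel32 functions through ctypes; nothing here is Iray-specific) prints the NUMA node count, processor groups, and logical processor totals that any single process has to work within:

    # Minimal sketch: query NUMA node and processor-group layout on Windows.
    # Uses documented kernel32 calls via ctypes; run it on the dual-Xeon box
    # to see how Windows has carved up the logical processors.
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    ALL_PROCESSOR_GROUPS = 0xFFFF  # per the GetActiveProcessorCount docs

    highest_node = wintypes.ULONG()
    if not kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest_node)):
        raise ctypes.WinError(ctypes.get_last_error())

    groups = kernel32.GetActiveProcessorGroupCount()
    print(f"NUMA nodes reported by Windows: {highest_node.value + 1}")
    print(f"Processor groups: {groups}")
    print(f"Total logical processors: {kernel32.GetActiveProcessorCount(ALL_PROCESSOR_GROUPS)}")
    for g in range(groups):
        print(f"  group {g}: {kernel32.GetActiveProcessorCount(g)} logical processors")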

    Post edited by RayDAnt on
  • outrider42 Posts: 3,679

    Artini just broke the system!

    Let's take a look. The P5000 is actually the Quadro version of the 1080. So performance *should* be in the ballpark of a 1080. Quadros are not clocked as fast as gaming cards, so the 1080 should be just a bit faster than a P5000. I didn't dig up a solo 1080 bench to compare, but that is not what I am interested in right now.

    With the SY scene, Artini only managed to hit 1 minute 14.75 seconds with OptiX on. That is a good 10 seconds slower than the single Titan RTX, and 12 seconds slower than my dual 1080tis. If this is the only test you look at, it would be easy to come away unimpressed, since 3 Quadros + 32-core Xeons are still 19.9% SLOWER. Ouch.

    HOWEVER...things change dramatically with my benchmark.

    Artini's machine ran my bench in 2 minutes 47.26 seconds with OptiX on. Compare that to my time of 4 minutes 47.6 seconds, and RayDant's Titan RTX which was over 5 minutes. This is an incredible 72% FASTER than my rig, and 82.7% faster than RayDant's Titan RTX. It is pretty safe to say that Artini's rig completely blew our rigs away in my bench, and yet in the SY scene the performance was the exact opposite! I do not believe we have seen a case like this where the results were so dramatically different. My scene pushes shading much more, and Artini's Quadros powered right through it. Suddenly Artini's rig starts to look pretty sexy, LOL.

    This test truly demonstrates once and for all why we need more than one benchmark. While Artini's rig is pretty unique, it shows us that different scenes can give wildly different results and that they do NOT scale like people might expect them to.

    Thanks for posting these numbers, Artini.

  • ebergerly Posts: 3,255

    outrider42 said:

    Artini's machine ran my bench in 2 minutes 47.26 seconds with OptiX on. Compare that to my time of 4 minutes 47.6 seconds, and RayDant's Titan RTX which was over 5 minutes. This is an incredible 72% FASTER than my rig, and 82.7% faster than RayDant's Titan RTX.

    I think your math is a bit off. It's more like 42% and 45% improvement, not 72% and 82%.  
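
    For what it's worth, both figures can be derived from the same two posted times (4:47.6 vs 2:47.26) depending on which convention you use; a quick sketch:

    # Two ways of expressing the same gap between the posted render times.
    slow = 4 * 60 + 47.6    # outrider42's dual 1080 Ti time: 4:47.6  -> 287.60 s
    fast = 2 * 60 + 47.26   # Artini's triple P5000 time:     2:47.26 -> 167.26 s

    # Throughput-style: the slower rig took ~72% longer (or the faster rig does
    # ~72% more iterations per second).
    print(f"{(slow / fast - 1) * 100:.1f}% faster")        # ~71.9%

    # Time-saved style: the faster rig needed ~42% less wall-clock time.
    print(f"{(1 - fast / slow) * 100:.1f}% less time")     # ~41.8%

    So the 72% is the "took that much longer" number and the 42% is the "time saved" number; both describe the same two runs.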

  • RayDAnt Posts: 1,120
    edited April 2019

    outrider42 said:

    Artini just broke the system!

    Let's take a look. The P5000 is actually the Quadro version of the 1080. So performance *should* be in the ballpark of a 1080. Quadros are not clocked as fast as gaming cards, so the 1080 should be just a bit faster than a P5000. I didn't dig up a solo 1080 bench to compare, but that is not what I am interested in right now.

    With the SY scene, Artini only managed to hit 1 minute 14.75 seconds with OptiX on. That is a good 10 seconds slower than the single Titan RTX, and 12 seconds slower than my dual 1080tis. If this is the only test you look at, it would be easy to come away unimpressed, since 3 Quadros + 32-core Xeons are still 19.9% SLOWER. Ouch.

    HOWEVER...things change dramatically with my benchmark.

    Artini's machine ran my bench in 2 minutes 47.26 seconds with OptiX on. Compare that to my time of 4 minutes 47.6 seconds, and RayDant's Titan RTX which was over 5 minutes. This is an incredible 72% FASTER than my rig, and 82.7% faster than RayDant's Titan RTX. It is pretty safe to say that Artini's rig completely blew our rigs away in my bench, and yet in the SY scene the performance was the exact opposite! I do not believe we have seen a case like this where the results were so dramatically different. My scene pushes shading much more, and Artini's Quadros powered right through it. Suddenly Artini's rig starts to look pretty sexy, LOL.

    This test truly demonstrates once and for all why we need more than one benchmark. While Artini's rig is pretty unique, it shows us that different scenes can give wildly different results and that they do NOT scale like people might expect them to.

    Thanks for posting these numbers, Artini.

    Fyi you've overlooked the fact that all of Artini's numbers are from Daz Studio 4.10. My Titan RTX numbers are all from 4.11.0.236 (since that is the only Turing compatible release atm) where the same scenes take significantly longer to render due to the more advanced version of Iray.

    ETA: My GTX 1050's best time for the SickleYield scene in DS 4.10 is 9 minutes 48.45 seconds vs 8 minutes 56.77 seconds in DS 4.11.0.236. Which is a difference of 51.68 seconds or a time reduction of about 10% in the older version. Similarly my GTX 1050 times for the benchmark you created are 26 minutes 12.45 seconds in DS 4.10 vs 54 minutes 4.85 seconds in the beta. Or a time reduction of almost 50% in the previous version of Daz Studio.

    Assuming these performance differences between Daz Studio releases were to remain true with Turing cards (in the hypothetical case that rendering with Turing in 4.10 was possible), that would make my Titan RTX's best total rendering times of 1 minute 4.9 seconds for SickleYield's test scene and 5 minutes 16.37 seconds for yours in 4.11.0.236 the equivalent of 58.41 seconds and 2 minutes 38.19 seconds respectively in 4.10. The latter of which would make a single $2500 Titan RTX (or even a single $1200 RTX 2080ti for that matter) a measurably better performer than approximately $10400 worth of previous-generation high-end professional gear (3 Quadro P5000s for around $1800 each + 2 Xeon Gold 6140s for around $5000) if going by Artini's results. And keep in mind that this is still without taking Turing's RTCore capabilities into account.
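
    Spelling that arithmetic out in case anyone wants to check it - it just applies the rounded ~10% and ~50% reductions observed on the GTX 1050 to the Titan RTX's 4.11 times:

    # Rough extrapolation of Titan RTX times from 4.11.0.236 back to a hypothetical
    # 4.10, using the rounded reduction factors observed on the GTX 1050
    # (about 10% for the SickleYield scene, about 50% for outrider42's scene).
    titan_411 = {"SickleYield": 64.9, "outrider42": 5 * 60 + 16.37}   # seconds in 4.11
    reduction = {"SickleYield": 0.10, "outrider42": 0.50}             # GTX 1050-derived

    for scene, t in titan_411.items():
        est = t * (1 - reduction[scene])
        print(f"{scene}: {t:.2f}s in 4.11 -> ~{est:.2f}s estimated in 4.10")
    # SickleYield: 64.90s -> ~58.41s ; outrider42: 316.37s -> ~158.19s (2:38.19)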

    Post edited by RayDAnt on
  • ebergerly Posts: 3,255
    edited April 2019
    RayDAnt said:

    And keep in mind that this is still without taking Turing's RTCore capabilities into account.

     

    Which again raises the question, why are we spending so much time on benchmarks that will be largely irrelevant later this year or next year? And we're quoting render times down to 0.01 second accuracy?? I don't get it.

    And we're using Iray-based scenes?? Didn't NVIDIA basically drop support for Iray last year, in favor of moving to the full Optix? I thought Iray is dead (not that anyone ever really used it besides DAZ and Allegorithmic I think). Wasn't that the big bombshell we were discussing last year, they were moving away from the average 3D rendering consumer towards the professional market? Am I missing something? 

    https://www.nvidia.com/en-us/design-visualization/solutions/rendering/product-updates/  

    So presumably the move for the future will be towards Optix, not Iray. Now I'm certainly no expert on Optix, but I assume that will include new material definitions and a bunch of RTX-based features. And the present Iray scenes, while they'll presumably still work, will be largely irrelevant. 

    So I presume Studio will move from Iray to Optix in the coming months/years, which again seems like a huge deal. Especially if there are new material definitions, which require a new user interface with the new material settings, and some way to make the existing Iray materials work. And so on and so on....

    Post edited by ebergerly on
  • prixat Posts: 1,585
    ebergerly said:
    ...I presume Studio will move from Iray to Optix....

    Optix is a library, not a renderer; you could speculate that Iray will move from 'Optix Prime' to 'Optix'. I think there are discussions of what that requires elsewhere.

  • ebergerly Posts: 3,255
    Optix Prime is a subset API that only handles ray tracing/ray intersection calculations. Iray is a complete rendering engine that works with CUDA, and in the case of Studio it can hand off some of the ray tracing calcs to Prime if selected to speed things up a bit. Optix is also a complete rendering engine. So I assume this is a case of Iray or Optix, with Prime being a fairly irrelevant plug-in.
  • Artini Posts: 8,773
    edited April 2019

    Made a test with Outrider42's test scene:
    https://www.daz3d.com/gallery/#images/526361/
    with Daz Studio 4.11.0.236 Pro Beta
    Optix On
    Using OptiX Prime ray tracing (5.0.1).
    CPU: using 33 cores for rendering
    Rendering with 4 device(s):
    CUDA device 0 (Quadro P5000):      1446 iterations, 1.042s init, 249.291s render
    CUDA device 1 (Quadro P5000):      1421 iterations, 1.025s init, 249.964s render
    CUDA device 2 (Quadro P5000):      1442 iterations, 1.077s init, 249.089s render
    CPU:      691 iterations, 0.316s init, 249.990s render
    4 minutes 11 seconds

    [Attached screenshots: or42pic05scr2.jpg, or42pic05scr.jpg]
    Post edited by Artini on
  • RayDAnt Posts: 1,120
    Artini said:

    image

    So judging by those NUMA node usage graphs on the right I'd venture to say I was correct. Any plans to try getting both CPUs into the Iray rendering game? Still can't say it's definitely possible. But it would certainly be interesting to hear about if you find a way that works.

     

    ebergerly said:

    Which again raises the question, why are we spending so much time on benchmarks that will be largely irrelevant later this year or next year?

    With all due respect, generically criticising people for discussing a topic (benchmarking) in a discussion thread dedicated to that topic doesn't seem like the wisest approach if informative discussion is your goal.

    Professional 3D rendering hardware/software across the entire industry is in major flux right now because of the sudden push towards hardware accelerated ray-tracing. If you wanna see how that actively develops in the context of Daz Studio keep visiting this thread (or the one that comes after it) over the next six months or so. If you just want to see definitive results, avoid it (or the thread that comes after it) for the next six months or so. Easy peasy.

  • outrider42 Posts: 3,679
    RayDAnt said:

    Artini just broke the system!

    Let's take a look. The P5000 is actually the Quadro version of the 1080. So performance *should* be in the ballpark of a 1080. Quadros are not clocked as fast as gaming cards, so the 1080 should be just a bit faster than a P5000. I didn't dig up a solo 1080 bench to compare, but that is not what I am interested in right now.

    With the SY scene, Artini only managed to hit 1 minute 14.75 seconds with OptiX on. That is a good 10 seconds slower than the single Titan RTX, and 12 seconds slower than my dual 1080tis. If this is the only test you look at, it would be easy to come away unimpressed, since 3 Quadros + 32-core Xeons are still 19.9% SLOWER. Ouch.

    HOWEVER...things change dramatically with my benchmark.

    Artini's machine ran my bench in 2 minutes 47.26 seconds with OptiX on. Compare that to my time of 4 minutes 47.6 seconds, and RayDant's Titan RTX which was over 5 minutes. This is an incredible 72% FASTER than my rig, and 82.7% faster than RayDant's Titan RTX. It is pretty safe to say that Artini's rig completely blew our rigs away in my bench, and yet in the SY scene the performance was the exact opposite! I do not believe we have seen a case like this where the results were so dramatically different. My scene pushes shading much more, and Artini's Quadros powered right through it. Suddenly Artini's rig starts to look pretty sexy, LOL.

    This test truly demonstrates once and for all why we need more than one benchmark. While Artini's rig is pretty unique, it shows us that different scenes can give wildly different results and that they do NOT scale like people might expect them to.

    Thanks for posting these numbers, Artini.

    Fyi you've overlooked the fact that all of Artini's numbers are from Daz Studio 4.10. My Titan RTX numbers are all from 4.11.0.236 (since that is the only Turing compatible release atm) where the same scenes take significantly longer to render due to the more advanced version of Iray.

    ETA: My GTX 1050's best time for the SickleYield scene in DS 4.10 is 9 minutes 48.45 seconds vs 8 minutes 56.77 seconds in DS 4.11.0.236. Which is a difference of 51.68 seconds or a time reduction of about 10% in the older version. Similarly my GTX 1050 times for the benchmark you created are 26 minutes 12.45 seconds in DS 4.10 vs 54 minutes 4.85 seconds in the beta. Or a time reduction of almost 50% in the previous version of Daz Studio.

    Assuming these performance differences between Daz Studio releases were to remain true with Turing cards (in the hypothetical case that rendering with Turing in 4.10 was possible), that would make my Titan RTX's best total rendering times of 1 minute 4.9 seconds for SickleYield's test scene and 5 minutes 16.37 seconds for yours in 4.11.0.236 the equivalent of 58.41 seconds and 2 minutes 38.19 seconds respectively in 4.10. The latter of which would make a single $2500 Titan RTX (or even a single $1200 RTX 2080ti for that matter) a measurably better performer than approximately $10400 worth of previous-generation high-end professional gear (3 Quadro P5000s for around $1800 each + 2 Xeon Gold 6140s for around $5000) if going by Artini's results. And keep in mind that this is still without taking Turing's RTCore capabilities into account.

     

    Some people report slower render times in 4.11, some people actually report faster render times.

    Thanks for posting new times with 4.11, Artini.

    And once again, even with 4.11, Artini still broke the system.

    Artini's SY scene was faster with 4.11, which proves not everyone was slowed by 4.11. Moving to 4.11, Artini was only a fraction of a second slower than me, and turned the tables on the Titan RTX by beating it.

    My scene still produced very interesting results. Artini's best time of 4 minutes and 11 seconds is a bit slower than the 4.10 mark, but still quite a bit faster than both RayDant and myself: beating me by over 36 seconds, and beating the Titan RTX by a stunning 54 seconds.

    This was my point, it is not about comparing Artini's rig to mine or the Titan RTX, it is about the difference between the tests themselves. If you only go by the SY scene, you might assume that Artini's rig is about the same speed as mine. But this does not tell the whole story. My scene shows that Artini's rig is indeed faster than mine. It is a totally different story.

    I also went back and tested in 4.10 again. My SY scene recorded 1 minute 2.89 seconds. That is actually slower than my 4.11 score, although by only a fraction of a second. So in this scene, both Artini and I benefit slightly from 4.11. Moving on to my scene, I rendered it in 2 minutes 53.24 seconds. So on my scene, 4.11 is much slower, and both of us record slower times with my scene in 4.11.

    However, Artini still beats me in 4.10 by a wide margin, where in the SY scene, Artini was a little slower. In the end, it didn't matter if it was 4.10 or 4.11, and again, this was the whole point of my post. The SY scene and my scene can indicate very different levels of performance. Our scenes do not scale with each other at all.

  • ebergerly Posts: 3,255
    edited April 2019

    I'm merely encouraging folks to return to the simple, usable and practical nature of the thread, which is to get a general idea of best bang for the buck when considering new GPU purchases. And if the tech enthusiasts want to get into the tiny speculative details then why not start a new thread?

    Best I can determine right now is that an RTX 2060 has a great bang for the buck, the GTXs are way overpriced and basically not worth considering, and I have no clue about the others. Maybe if we could flesh out a bit more simple info about that it would be far more beneficial for most of us.

    Post edited by ebergerly on
  • outrider42 Posts: 3,679

    It's not so simple anymore, though. You really cannot compare bench times people are using today to those of 4 years ago, as comparing our 4.10 to 4.11 proves. 4.8, 4.9, 4.10, 4.11, all of these behave differently. I wonder how many people are even using 4.8 anymore, because for one, Pascal doesn't work on it and so many people have Pascal. The software has changed, hardware has changed, and even different scenes yield different results. You have to have consistency in this kind of data, or there is no point. My recent post shows that you can lose quite a bit of speed with 4.11 with my scene. Some people have different results with the SY scene. This makes all previous testing on older versions of DS very suspect. Any suggestion that all things are equal is disingenuous.

    It would be great if there was a group that benched a ton of setups like you often see with gaming, giving us a nice clear picture. But Iray is just too niche. If I could collect a bunch of GPUs, I would totally do it. That would be my idea of fun because I am weird like that.

    This thread went off the rails pretty quickly. Within the first couple pages, people were altering the scene, and not properly reporting their hardware or scene settings. I can't stress that enough. People were removing 1 or 2 of the balls if their hardware was weak. That is not how benching works! You cannot make the test easier for yourself under any circumstance, that skews everything.

  • ebergerly Posts: 3,255
    edited April 2019

    outrider42 said:

    It's not so simple anymore, though. You really cannot compare bench times people are using today to those of 4 years ago, as comparing our 4.10 to 4.11 proves. 4.8, 4.9, 4.10, 4.11, all of these behave differently.

    I agree. It's complicated. And like I keep saying, this will all probably change in the not-so-distant future, so we all know that benchmarks at this point are of only (very) limited benefit. So why not just accept that and make a very simple process that everyone can participate in, without 6 different scenes with 378 different variables? At least we can get a ballpark that means something to most users until we know more. For example:

    1. Pick a scene.
    2. Pick a D|S version (of course, not a BETA version since those are, ummm, in BETA testing still and you can't necessarily rely on the results)
    3. Ask everyone "hey, everyone, could you please download this scene, and if you have this D|S version, do a render on your GPU and post your time? And we'd especially appreciate input from RTX owners."
    4. Gather the results in a small spreadsheet and tell everyone "hey, this is what we got, and if you're going to buy a GPU here's a ballpark of what you might expect for now, but that will probably change a lot in the near future. And if you buy an RTX card now you'll probably be better off than these results indicate".
    5. Done. 

    In fact, I have a spreadsheet (see below) that's all ready to add RTX 2070, 2080, and 2080ti results for Sickleyield, or I'd be more than happy to change the numbers to reflect a new scene. And I've even included present prices (even though that's useless info). 

    [Attached screenshot: Benchmark New.JPG]
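
    Something like that could even live in a plain CSV rather than a spreadsheet; here's a minimal sketch of the idea (the GPU names, times, and prices below are placeholders, not reported results):

    # Minimal sketch of the "small spreadsheet" idea: collect reported render times
    # in a CSV and sort by time. Entries below are placeholders, not real results.
    import csv

    rows = [
        # (card, reported render time in seconds, approx. price in USD)
        ("GTX 1080 Ti", 123.4, 700),
        ("RTX 2080 Ti", 99.9, 1200),
    ]

    with open("iray_benchmarks.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["GPU", "render_time_s", "price_usd", "usd_per_render_per_hour"])
        for gpu, t, price in sorted(rows, key=lambda r: r[1]):
            # crude bang-for-buck figure: dollars per render-per-hour (lower is better)
            writer.writerow([gpu, t, price, round(price / (3600 / t), 2)])
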
    Post edited by ebergerly on
  • RayDAnt Posts: 1,120
    edited April 2019

    outrider42 said:

    This was my point, it is not about comparing Artini's rig to mine or the Titan RTX, it is about the difference between the tests themselves. If you only go by the SY scene, you might assume that Artini's rig is about the same speed as mine. But this does not tell the whole story. My scene shows that Artini's rig is indeed faster than mine. It is a totally different story.

    Oh absolutely. It goes without saying that scene composition makes a huge difference to render performance across different rendering hardware/software configurations. Eg. SickleYield's scene, with its lack of utilization for more advanced shading techniques (due to them not existing yet when it was made), is going to scale (render performance wise) very differently across different generations of hardware/software than a more recently composed scene - like yours, for instance.

    To further support this, check out these graphs:

    These are Iteration Time analyses (the unbiased renderer equivalent of Frame Time analyses from the game benchmarking world) generated by me from info contained in Daz Studio's log file (more on how to generate these for yourself in a future post) for individual runs of each benchmarking scene currently making the rounds in this thread, as well as the benchmarking scene I am currently in the process of beta-ing. There's a lot of important/useful information to be had in these graphs. However the main thing to be seen here that's relevant to your point is just how diverse each scene's pattern of Iray Iteration Rendering Time is despite ALL variables (including environmental and procedural - Daz Studio was fully closed and freshly reopened before rendering each of the scenes) except which scene was loaded being kept identical. These graphs are an objective quantification of just how much of an effect scene content/composition has on rendering performance - not just in terms of a single simple statistic like Total Rendering Time, but also in terms of what goes on with Iray under the hood on an iteration by iteration basis.

    Ultimately the only way render results from a specific scene will ever be a truly accurate prediction of relative performance for a specific person's use case is if the scene in question is one of their own scenes. If this was gaming performance we were talking about, each and every individual scene is the operational equivalent of its own uniquely developed gaming title (with each version of Daz Studio/Iray being a different version of the same game engine.) And standard practice in gaming performance benchmarking circles is to basically test in as many popular game titles as you can stand before going mental. 

    In terms of designing a scene to function specifically as a benchmark, the only realistic options are either to tailor it around examining a specific aspect of performance (like Aala's Cornell box, which is specifically for testing rendering performance in the absence of texture/materials data - thereby sidestepping the issue of texture/materials data loading times skewing Total Rendering Times away from pure hardware rendering performance), or going for something at least theoretically representative of a typical DS user's typical scene composition (eg. a current generation human figure with clothing/hair in some sort of lit environment - like the benchmarking scene you created). Which, fwiw, is also the tactic I am taking with the RTX-oriented new benchmarking scene I am currently in the process of beta-ing.
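
    For anyone who wants to try this before that future post, the rough idea is just to diff the elapsed times of successive Iray progress lines in the log. The sketch below does that in Python, but treat the log path and especially the regex as placeholders to adjust against your own log file, since the progress-line wording differs between Iray versions:

    # Rough sketch of building an iteration-time series from the Daz Studio log.
    # The progress-line pattern below is an assumption - check a few lines of your
    # own log.txt and adjust the regex to match what your Iray version writes.
    import re

    LOG = r"C:\Users\me\AppData\Roaming\DAZ 3D\Studio4\log.txt"  # typical location, adjust
    pattern = re.compile(r"iteration\D*(\d+).*?(\d+\.\d+)\s*s", re.IGNORECASE)

    samples = []  # (iteration number, elapsed seconds)
    with open(LOG, errors="ignore") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                samples.append((int(m.group(1)), float(m.group(2))))

    # Per-iteration time = elapsed-time delta between successive progress lines.
    for (i0, t0), (i1, t1) in zip(samples, samples[1:]):
        if i1 > i0 and t1 >= t0:
            print(f"iterations {i0}-{i1}: {(t1 - t0) / (i1 - i0):.3f} s/iteration")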

    [Attached iteration-time graphs (GTX 1050 2GB, driver 391.40, Daz Studio 4.11 Beta, OptiX On): SickleYield, outrider42, DAZ_Rawb, Aala 1k, RayDAnt (prototype 12)]
    Post edited by RayDAnt on
  • RayDAnt Posts: 1,120
    edited April 2019
    ebergerly said:

    It's not so simple anymore, though. You really cannot compare bench times people are using today to those of 4 years ago, as comparing our 4.10 to 4.11 proves. 4.8, 4.9, 4.10, 4.11, all of these behave differently.

    I agree. It's complicated. And like I keep saying, this will all probably change in the not-so-distant future, so we all know that benchmarks at this point are of only (very) limited benefit. So why not just accept that and make a very simple process that everyone can participate in, without 6 different scenes with 378 different variables? At least we can get a ballpark that means something to most users until we know more. For example:

    1. Pick a scene.
    2. Pick a D|S version (of course, not a BETA version since those are, ummm, in BETA testing still and you can't necessarily rely on the results)
    3. Ask everyone "hey, everyone, could you please download this scene, and if you have this D|S version, do a render on your GPU and post your time? And we'd especially appreciate input from RTX owners."
    4. Gather the results in a small spreadsheet and tell everyone "hey, this is what we got, and if you're going to buy a GPU here's a ballpark of what you might expect for now, but that will probably change a lot in the near future. And if you buy an RTX card now you'll probably be better off than these results indicate".
    5. Done. 

    In fact, I have a spreadsheet (see below) that's all ready to add RTX 2070, 2080, and 2080ti results for Sickleyield, or I'd be more than happy to change the numbers to reflect a new scene. And I've even included present prices (even though that's useless info). 

    I have the EXACT sort of thread you're talking about (with all of the proper instructions for reporting things based on extensive behind-the-scenes testing) sitting in my drafts section. I'm just procrastinating on posting it because I can't decide on how many iterations to tell people to run the benchmarking scene. Too many and it becomes unusable on Pre-Turing hardware. Too few, and it becomes useless on Turing hardware once RTX features directly applicable to rendering (ie. RTCores) fully come online. The worst thing I could do is publish, and then have to change/republish the benchmarking scene - forcing everyone to have to redo all their tests with an identical looking benchmark. Which is a statistics keeping nightmare in the making.

    Post edited by RayDAnt on
  • LenioTG Posts: 2,118
    RayDAnt said:

    I have the EXACT sort of thread you're talking about (with all of the proper instructions for reporting things based on extensive behind-the-scenes testing) sitting in my drafts section. I'm just procrastinating on posting it because I can't decide on how many iterations to tell people to run the benchmarking scene. Too many and it becomes unusable on Pre-Turing hardware. Too few, and it becomes useless on Turing hardware once RTX features directly applicable to rendering (ie. RTCores) fully come online. The worst thing I could do is publish, and then have to change/republish the benchmarking scene - forcing everyone to have to redo all their tests with an identical looking benchmark. Which is a statistics keeping nightmare in the making.

    Wow RayDAnt, it's so much fun to look at your stats! I love graphs etc. :D

    I trust your judgment, I'll definitely do your benchmarks when they're released! ^^

  • outrider42 Posts: 3,679

    You can't simplify it that much, though. As already stated, one scene doesn't tell the full story. Any gaming bench worth its weight has several tests in it. A gaming oriented bench will have a dozen or more games in it. Luxmark and other render engines have multiple benchmark scenes. To me, a scene for Iray is like a game, and each can tell a different story.

    Even when this thread started, people discussed using OptiX on or off, and Speed vs Memory optimization. There were tons of variables from the very start, as people were not only benching hardware, they were also testing for the fastest setting in Iray. I think a lot of us forget that, this was not simply a hardware bench thread, people were trying to understand Iray in general because it was brand new to them. And every time a new version of Iray came out, people jumped to this thread to compare the speeds to the last version. In this sense, the SY scene still has some merit, but it just cannot be the only thing used.

    That will repeat again when the next version comes along. People will want to compare, whether they have RTX or not, how fast the new Iray is compared to the old one.

    I've never talked about adding tons of variables. All I ask is for 2 or 3 bench scenes, because that is the only way you can get a true understanding. Download them, don't mess with the settings, don't touch anything, just hit render. Bam, 3 benchmarks to compare. OptiX may need to be checked, because oddly enough OptiX being on or off is not saved with the scene file.

    Why is that so complicated?

    And the beta is a viable option. It would be horribly unfair otherwise because Turing owners would have no option at all. While 4.11 may be beta, the Iray plugin in it is NOT a beta. Daz Studio is the beta. 4.11 actually corrects some flaws from 4.10, like how it handles chromatic sss.

    You know what would be really rad? If Daz Studio had its own built in bench scenes! Why not? It has built in tutorials, other render software have them, why not Daz Studio?

  • RayDAnt Posts: 1,120
    edited April 2019

    You know what would be really rad? If Daz Studio had its own built in bench scenes! Why not? It has built in tutorials, other render software have them, why not Daz Studio?

    Fwiw if you install Iray Server (available as a 1 month free trial here) it actually comes with a (very familiar looking...) ready-to-render benchmarking scene - here's a test render I did of it some time ago:

    Obviously knowing this is not much use for Daz Studio purposes. But I did think that was interesting.

    [Attached render: Test-Beauty.png]
    Post edited by RayDAnt on
  • outrider42 Posts: 3,679
    RayDAnt said:

    You know what would be really rad? If Daz Studio had its own built in bench scenes! Why not? It has built in tutorials, other render software have them, why not Daz Studio?

    Fwiw if you install Iray Server (available as a 1 month free trial here) it actually comes with a (very familiar looking...) ready-to-render benchmarking scene - here's a test render I did of it some time ago:

    Obviously knowing this is not much use for Daz Studio purposes. But I did think that was interesting.

    That figures. That scene file is probably very tiny, too. Octane has a dedicated benchmark you can download from their site and even upload your results. These benches serve the obvious purpose of informing someone if their machine can handle Octane. When Iray first released, Daz had a page on the site for Iray recommendations, and listed a few GPUs with some unknown benchmark times. Today the only thing I see is a recommendation of at least 4GB VRAM, which funny enough has always been the recommendation.

  • LenioTG Posts: 2,118

    outrider42 said:

    That figures. That scene file is probably very tiny, too. Octane has a dedicated benchmark you can download from their site and even upload your results. These benches serve the obvious purpose of informing someone if their machine can handle Octane. When Iray first released, Daz had a page on the site for Iray recommendations, and listed a few GPUs with some unknown benchmark times. Today the only thing I see is a recommendation of at least 4GB VRAM, which funny enough has always been the recommendation.

    And funny enough I have 3GB of VRAM, with just 2.3 available for Daz!! xD

  • AD Posts: 396
    edited April 2019

    I intend to equip my computer with NVIDIA GeForce RTX 2080 Ti GPUs. The question is: are 2 cards better, or does Iray work without problems with 1 card with 11 GB?
    Thanks for the information.

    Post edited by AD on
  • LenioTG Posts: 2,118
    AD said:

    I intend to equip my computer with NVIDIA GeForce RTX 2080 Ti GPUs. The question is: are 2 cards better, or does Iray work without problems with 1 card with 11 GB?
    Thanks for the information.

    Hi :D
    I've never used Daz with a high end system like that, so the other guys will give you better answers! ^^

    But until then, I think the VRAM doesn't matter: if the scene needs less than 11GB of VRAM you'll use both GPUs; otherwise you'll use neither of them.
    For example, if you have a GPU with 6GB of VRAM and another with just 2GB, and the scene needs 4GB, you'll simply use the first GPU.

    Don't mix up different generations, but other than that you should be fine!

    RTX has a new technology called NVLink that should improve multi-GPU setup performance, but I don't think it's implemented yet!

    Two RTX 2080 Ti are better than one, but I doubt you'll reach double the performance :)

  • AD Posts: 396

    Many thanks for the detailed information. Could you possibly recommend a GPU if the Asus RTX 2080 TI is not yet implemented? Sorry, I have no technical experience.

  • bk007dragon Posts: 113
    edited April 2019
    AD said:

    Many thanks for the detailed information. Could you possibly recommend a GPU if the Asus RTX 2080 TI is not yet implemented? Sorry, I have no technical experience.

    The new RTX cards are supported only in the beta client atm. NVLink does not help with Daz and is not needed. 1 card will process well, 2 faster. The cards don't add together to support a larger scene: 11GB card + 11GB card = 11GB rendered faster, NOT 22GB. Hope this helps. For perspective, I have a 1080 Ti & 1070 Ti and have rendered both individually and together. No NVLink.
    Post edited by bk007dragon on
  • outrider42 Posts: 3,679
    The 2080ti is currently the fastest single GPU you can buy. It works fine in Daz, but you must install the 4.11 beta in order to use any RTX card. Daz Iray will take everything you throw at it; there is practically no limit to how many GPUs you can use at once, as long as your machine can handle them and as long as the scene you create fits into each GPU's VRAM.

    Rule of thumb...CUDA cores stack, VRAM does not.
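
    To illustrate that rule of thumb (this is only a toy model of the behavior described above, not Iray's actual device-selection logic):

    # Toy illustration of "CUDA cores stack, VRAM does not": a scene only renders on
    # a GPU if it fits in that card's own VRAM; cards that fit all contribute speed.
    gpus = [
        {"name": "RTX 2080 Ti #1", "vram_gb": 11},
        {"name": "RTX 2080 Ti #2", "vram_gb": 11},
    ]

    def usable_devices(scene_vram_gb):
        # Each card is checked on its own - memory is never pooled across cards.
        return [g["name"] for g in gpus if scene_vram_gb <= g["vram_gb"]]

    print(usable_devices(8))   # both cards render -> roughly twice the speed
    print(usable_devices(14))  # neither card fits -> Iray falls back to the CPU
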
  • AD Posts: 396

    Thank you for this information, but unfortunately the 1080 Ti and 1070 Ti GPUs are no longer available for purchase. Are there any other alternatives that have proven their worth in practical DAZ Studio Iray rendering, but are still available for purchase?
    Thank you for the help.
     

     

  • Takeo.Kensei Posts: 1,303
    RayDAnt said:

    OptiX Performance Boost: Based on multiple runs. Whether rendering each scene with OptiX Prime enabled led to a better (shorter) Total Rendering Time than without it for that particular hardware/software configuration. Take special note of the "No" answers for all the post-2017 scenes with the Titan RTX configuration. 

    The fact that the RTX Titan doesn't benefit from Optix Prime for the scenes after 2017, whereas the GTX 1050 does, also raises a few questions:

    1°/ Why? The obvious answer would be that the ray tracing performance is a bit better with the RTX Titan than with the GTX 1050

    2°/ Does it extend to all RTX cards or is it just for the quickest (we'd need somebody who has a 2060 to test to get the beginning of an answer)?

    I guess that the answer will be that it works for all cards and that it is a particularity of the Turing architecture, but I can't confirm without testing

    Fwiw the answer to this question is already evident.

    Not at all. You don't have access to internal code to prove that.

    The illustrations that post is an analysis of came directly from Nvidia and were generated by their developers with the luxury of having access to internal code. Hence why I bothered posting an analysis of it - because it was an authoritative source publishing proprietary information.

     

     

     

    You talk about memory load balancing to prevent CPU fallback and consider that Optix Prime could possibly trigger a CPU fallback? I rather think it will not, and that if there is not enough memory, Optix Prime will run partially or not at all (so is it poorly programmed?). Did anybody report that Optix Prime was making the render fall back to CPU?

    Nvidia's own documentation on how exactly Iray - in conjunction with OptiX Prime - works explicitly states that it leads to increased VRAM usage, to the point that it will cause some borderline too-large scenes to fall back to CPU versus without it. The point of the statistics I compiled was to discover how much of a difference it makes - not to question whether there is a difference or not, because honestly I can see no reason why Iray's developers would lie about that.

    Then you should point me to the doc where you explicitly see these points

    1°/ Iray uses INT operations and they could be done in parallel with some other FP32 operations (aka not blocked by another FP32 operation waiting for the result)

    2°/ Activating Optix Prime can cause a drop to CPU because of the additional memory used

    Again, using a gaming analysis to make hypotheses about Iray is not relevant. I've read enough docs and I've seen none of what you say

  • bluejaunte Posts: 1,861
    AD said:

    Thank you for this information, but unfortunately the 1080 Ti and 1070 Ti GPUs are no longer available for purchase. Are there any other alternatives that have proven their worth in practical DAZ Studio Iray rendering, but are still available for purchase?
    Thank you for the help.
     

     

    Get a used one?

  • ebergerly Posts: 3,255

     

    Takeo.Kensei said:

    Then you should point me to the doc where you explicitly see these points

    1°/ Iray uses INT operations and they could be done in parallel with some other FP32 operations (aka not blocked by another FP32 operation waiting for the result)

    2°/ Activating Optix Prime can cause a drop to CPU because of the additional memory used

    Again, using a gaming analysis to make hypotheses about Iray is not relevant. I've read enough docs and I've seen none of what you say

    Well I suppose you could do a quick test yourself and see what the answer is. 

    • I fired up Studio, loaded one of my heaviest scenes, turned off Optix Prime and hit Render. Checked Task Manager/Details, where it shows the Dedicated GPU Memory usage for the Studio process, and while rendering the total GPU VRAM usage (across my 2 GPUs) was 15.6 GB.
    • Closed Studio, re-opened it, loaded the same scene, no changes except turning Optix Prime on, hit Render, and the same scene used 16.6 GB across my two GPUs.

    So yes, it appears that Optix Prime uses additional VRAM. 
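
    If anyone wants to repeat that comparison without eyeballing Task Manager, a small script polling nvidia-smi works too (assumes nvidia-smi is on the PATH; the 2-second interval is arbitrary):

    # Sample total VRAM usage across all GPUs every couple of seconds during a render
    # (e.g. once with OptiX Prime on and once with it off), then compare the peaks.
    import subprocess, time

    def vram_used_mib():
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
            text=True,
        )
        return sum(int(v) for v in out.split())

    peak = 0
    try:
        while True:
            current = vram_used_mib()
            peak = max(peak, current)
            print(f"current total: {current} MiB, peak: {peak} MiB", end="\r")
            time.sleep(2)
    except KeyboardInterrupt:
        print(f"\npeak VRAM usage during the run: {peak} MiB")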

  • RayDAnt Posts: 1,120
    edited April 2019

    Takeo.Kensei said:

    Then you should point me to the doc where you explicitly see these points

    Okay:

     

    1°/ Iray uses INT operations and they could be done in parallel with some other FP32 operations (aka not blocked by another FP32 operation waiting for the result)

    Turing implements a major revamping of the core execution datapaths. Modern shader workloads typically have a mix of FP arithmetic instructions such as FADD or FMAD with simpler instructions such as integer adds for addressing and fetching data, floating point compare or min/max for processing results, etc. In previous shader architectures, the floating-point math datapath sits idle whenever one of these non-FP-math instructions runs. Turing adds a second parallel execution unit next to every CUDA core that executes these instructions in parallel with floating point math. (Turing GPU Architecture In-Depth: Turing Streaming Multiprocessor (SM) Architecture. NVIDIA Turing GPU Architecture Whitepaper, p. 11. https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf)

    Iray is the definition of a modern shader workload.

     

    2°/ Activating Optix Prime can cause a drop to CPU because of the additional memory used

    bool iray_optix_prime = true

    Switches from the classical built-in Ray Tracing traversal and hierarchy construction code to the external NVIDIA OptiX Prime API. Setting this option to false will lead to slower scene-geometry updates when running on the GPU and (usually) increased overall rendering time, but features less temporary memory overhead during scene-geometry updates and also decreased overall memory usage during rendering. (Iray Photoreal: Rendering Options. Nvidia Iray Programmer's Manual. https://raytracing-docs.nvidia.com/iray/manual/index.html#iray_photoreal_render_mode#rendering-options)

     Photoreal is the Iray mode used in virtually all modern Daz Studio rendering scenarios.

    Post edited by RayDAnt on
This discussion has been closed.