General GPU/testing discussion from benchmark thread
Richard Haseltine
This discussion was created from comments split from: Iray Starter Scene: Post Your Benchmarks!.
This discussion has been closed.
Comments
Interesting. If - while rendering - you right-click the CPU utilization graph in Task Manager and select Change graph to > Logical processors, you should be able to see exactly how many logical cores/threads Iray is actually using while rendering. Since you have a dual Xeon setup, you should also be able to select Change graph to > NUMA nodes - which should show Iray usage split up by physical CPU layout (as I understand it, a dual Xeon Gold setup has 2x2 = 4 NUMA nodes.)
What you will most likely see there is that Iray is only ever using two of those NUMA nodes (ie. one single physical processor.) This is because Windows has internal default settings which prevent any single process from running on NUMA nodes located in separate CPU dies, since doing so severely curtails relative performance gains (due to memory latency issues) - at least in the context of divvying out CPU resources to virtual machines (which is what multi-CPU/NUMA-node Xeon setups are generally optimized for.) There is bound to be some advanced Windows setting somewhere that will override this behavior, but all my searches on it are coming up blank (most likely it's just too obscure a use case to find discussion about.) If your goal is to get Iray running on all cores anyway, my best advice is to read up on NUMA nodes.
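For anyone who'd rather check programmatically than squint at Task Manager, here's a minimal sketch (my own, not anything from Iray or Daz) that asks Windows for its NUMA node count via the documented kernel32 call. It simply returns None on non-Windows systems:

```python
import ctypes
import sys

def numa_node_count():
    """Number of NUMA nodes Windows reports, or None on non-Windows systems.

    GetNumaHighestNodeNumber fills in the highest node number, so the
    node count is that value plus one.
    """
    if not sys.platform.startswith("win"):
        return None
    highest = ctypes.c_ulong(0)
    ok = ctypes.windll.kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest))
    return highest.value + 1 if ok else None

print(numa_node_count())
```

On a dual Xeon Gold box this should print 4 if sub-NUMA clustering is on, 2 if not - a quick way to confirm what the Task Manager graphs are showing.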
Artini just broke the system!
Let's take a look. The P5000 is actually the Quadro version of the 1080, so performance *should* be in the ballpark of a 1080. Quadros are not clocked as fast as gaming cards, so the 1080 should be just a bit faster than a P5000. I didn't dig up a solo 1080 bench to compare, but that is not what I am interested in right now.
With the SY scene, Artini only managed to hit 1 minute 14.75 seconds with OptiX on. That is a good 10 seconds slower than the single Titan RTX, and 12 seconds slower than my dual 1080tis. If this is the only test you look at, it would be easy to come away unimpressed: 3 Quadros + 32-core Xeons, and it's still 19.9% SLOWER. Ouch.
HOWEVER...things change dramatically with my benchmark.
Artini's machine ran my bench in 2 minutes 47.26 seconds with OptiX on. Compare that to my time of 4 minutes 47.6 seconds, and RayDAnt's Titan RTX, which was over 5 minutes. This is an incredible 72% FASTER than my rig, and 82.7% faster than RayDAnt's Titan RTX. It is pretty safe to say that Artini's rig completely blew our rigs away in my bench, and yet in the SY scene the performance was the exact opposite! I do not believe we have seen a case like this where the results were so dramatically different. My scene pushes shading much harder, and Artini's Quadros powered right through it. Suddenly Artini's rig starts to look pretty sexy, LOL.
This test truly demonstrates once and for all why we need more than one benchmark. While Artini's rig is pretty unique, it shows us that different scenes can give wildly different results and that they do NOT scale like people might expect them to.
Thanks for posting these numbers, Artini.
I think your math is a bit off. It's more like 42% and 45% improvement, not 72% and 82%.
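For what it's worth, both sets of numbers can be derived from the same render times - they just answer different questions. A quick sketch, using the times quoted above:

```python
def pct_faster(slow_s, fast_s):
    """Throughput increase: how much more work per second the faster rig does."""
    return (slow_s / fast_s - 1.0) * 100.0

def pct_time_saved(slow_s, fast_s):
    """Time reduction: how much less wall-clock time the faster rig needs."""
    return (1.0 - fast_s / slow_s) * 100.0

slow = 4 * 60 + 47.6    # Outrider42's rig: 4m 47.6s
fast = 2 * 60 + 47.26   # Artini's rig:     2m 47.26s
print(round(pct_faster(slow, fast)))      # 72  (% higher throughput)
print(round(pct_time_saved(slow, fast)))  # 42  (% less render time)
```

So "72% faster" (throughput) and "42% improvement" (time saved) describe the exact same result, and the same split explains the 82.7% vs 45% pair for the Titan RTX comparison.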
FYI, you've overlooked the fact that all of Artini's numbers are from Daz Studio 4.10. My Titan RTX numbers are all from 4.11.0.236 (since that is the only Turing-compatible release atm), where the same scenes take significantly longer to render due to the more advanced version of Iray.
ETA: My GTX 1050's best time for the SickleYield scene in DS 4.10 is 8 minutes 56.77 seconds vs 9 minutes 48.45 seconds in DS 4.11.0.236 - a difference of 51.68 seconds, or a time reduction of about 10% in the older version. Similarly, my GTX 1050 times for the benchmark you created are 26 minutes 12.45 seconds in DS 4.10 vs 54 minutes 4.85 seconds in the beta - a time reduction of almost 50% in the previous version of Daz Studio.
Assuming these performance differences between Daz Studio releases were to remain true with Turing cards (in the hypothetical case that rendering with Turing in 4.10 was possible), that would make my Titan RTX's best total rendering times of 1 minute 4.9 seconds for SickleYield's test scene and 5 minutes 16.37 seconds for yours in 4.11.0.236 the equivalent of 58.41 seconds and 2 minutes 38.19 seconds respectively in 4.10. The latter would make a single $2500 Titan RTX (or even a single $1200 RTX 2080ti, for that matter) a measurably better performer than approximately $10400 worth of previous-generation high-end professional gear (3 Quadro P5000s for around $1800 each + 2 Xeon Gold 6140s for around $5000), if going by Artini's results. And keep in mind that this is still without taking Turing's RTCore capabilities into account.
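The projection here is just the 4.11 time scaled down by the estimated version-to-version reduction. A sketch of the arithmetic, using the ~10% (SickleYield scene) and ~50% (Outrider42 scene) reductions estimated from the GTX 1050 runs:

```python
def project_to_410(seconds_411, reduction):
    """Hypothetical DS 4.10 render time, given a DS 4.11 time and an
    estimated fractional reduction between versions."""
    return seconds_411 * (1.0 - reduction)

sy_411 = 1 * 60 + 4.9      # Titan RTX, SickleYield scene, DS 4.11: 1m 4.9s
out_411 = 5 * 60 + 16.37   # Titan RTX, Outrider42 scene, DS 4.11: 5m 16.37s
print(round(project_to_410(sy_411, 0.10), 1))   # 58.4
print(round(project_to_410(out_411, 0.50), 1))  # 158.2 (about 2m 38s)
```

Obviously the 10%/50% factors are rough estimates from a single card, so treat the projected times as ballpark figures, not measurements.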
Which again raises the question, why are we spending so much time on benchmarks that will be largely irrelevant later this year or next year? And we're quoting render times down to 0.01 second accuracy?? I don't get it.
And we're using Iray-based scenes?? Didn't NVIDIA basically drop support for Iray last year, in favor of moving to the full Optix? I thought Iray is dead (not that anyone ever really used it besides DAZ and Allegorithmic I think). Wasn't that the big bombshell we were discussing last year, they were moving away from the average 3D rendering consumer towards the professional market? Am I missing something?
https://www.nvidia.com/en-us/design-visualization/solutions/rendering/product-updates/
So presumably the move for the future will be towards Optix, not Iray. Now I'm certainly no expert on Optix, but I assume that will include new material definitions and a bunch of RTX-based features. And the present Iray scenes, while they'll presumably still work, will be largely irrelevant.
So I presume Studio will move from Iray to Optix in the coming months/years, which again seems like a huge deal. Especially if there are new material definitions, which require a new user interface with the new material settings, and some way to make the existing Iray materials work. And so on and so on....
OptiX is a library, not a renderer; you could speculate that Iray will move from OptiX Prime to OptiX. I think there are discussions of what that requires elsewhere.
Made a test with Outrider42's test scene:
https://www.daz3d.com/gallery/#images/526361/
with Daz Studio 4.11.0.236 Pro Beta
Optix On
Using OptiX Prime ray tracing (5.0.1).
CPU: using 33 cores for rendering
Rendering with 4 device(s):
CUDA device 0 (Quadro P5000): 1446 iterations, 1.042s init, 249.291s render
CUDA device 1 (Quadro P5000): 1421 iterations, 1.025s init, 249.964s render
CUDA device 2 (Quadro P5000): 1442 iterations, 1.077s init, 249.089s render
CPU: 691 iterations, 0.316s init, 249.990s render
4 minutes 11 seconds
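One way to compare runs like this across machines is to reduce the per-device log lines to a single throughput figure - total iterations over wall render time. A quick sketch using the numbers above:

```python
# Per-device results taken from the log above: name -> (iterations, render seconds)
devices = {
    "Quadro P5000 (0)": (1446, 249.291),
    "Quadro P5000 (1)": (1421, 249.964),
    "Quadro P5000 (2)": (1442, 249.089),
    "CPU":              (691, 249.990),
}

total_iterations = sum(iters for iters, _ in devices.values())
wall_seconds = max(secs for _, secs in devices.values())  # devices render in parallel
print(total_iterations)                           # 5000
print(round(total_iterations / wall_seconds, 1))  # 20.0 iterations/second
```

The 5000 total is the scene's iteration cap, so iterations/second is the figure that actually varies between rigs and makes for a clean cross-machine comparison.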
So judging by those NUMA node usage graphs on the right I'd venture to say I was correct. Any plans to try getting both CPUs into the Iray rendering game? Still can't say it's definitely possible. But it would certainly be interesting to hear about if you find a way that works.
With all due respect, generically criticising people for discussing a topic (benchmarking) in a discussion thread dedicated to that topic doesn't seem like the wisest approach if informative discussion is your goal.
Professional 3D rendering hardware/software across the entire industry is in major flux right now because of the sudden push towards hardware accelerated ray-tracing. If you wanna see how that actively develops in the context of Daz Studio keep visiting this thread (or the one that comes after it) over the next six months or so. If you just want to see definitive results, avoid it (or the thread that comes after it) for the next six months or so. Easy peasy.
Some people report slower render times in 4.11, some people actually report faster render times.
Thanks for posting new times with 4.11, Artini.
And once again, even with 4.11, Artini broke the system.
Artini's SY scene was faster with 4.11, which proves not everyone was slowed by 4.11. Moving to 4.11, Artini was only a fraction of a second slower than me, and turns the tables on the Titan RTX by beating it.
My scene still produced very interesting results. Artini's best time of 4 minutes and 11 seconds is a bit slower than the 4.10 mark, but still quite a bit faster than both RayDAnt and myself - beating me by over 36 seconds, and beating the Titan RTX by a stunning 54 seconds.
This was my point: it is not about comparing Artini's rig to mine or the Titan RTX, it is about the difference between the tests themselves. If you only go by the SY scene, you might assume that Artini's rig is about the same speed as mine. But this does not tell the whole story. My scene shows that Artini's rig is indeed faster than mine. It is a totally different story.
I also went back and tested in 4.10 again. The SY scene recorded 1 minute 2.89 seconds. That is actually slower than my 4.11 score, although by only a fraction of a second. So in this scene, both Artini and I benefit slightly from 4.11. Moving on to my scene, I rendered it in 2 minutes 53.24 seconds. So on my scene, 4.11 is much slower, and both of us record slower times with my scene in 4.11.
However, Artini still beats me in 4.10 by a wide margin, where in the SY scene, Artini was a little slower. In the end, it didn't matter if it was 4.10 or 4.11, and again, this was the whole point of my post. The SY scene and my scene can indicate very different levels of performance. Our scenes do not scale with each other at all.
I'm merely encouraging folks to return to the simple, usable and practical nature of the thread, which is to get a general idea of best bang for the buck when considering new GPU purchases. And if the tech enthusiasts want to get into the tiny speculative details then why not start a new thread?
Best I can determine right now is that an RTX 2060 has a great bang for the buck, the GTXs are way overpriced and basically not worth considering, and I have no clue about the others. Maybe if we could flesh out a bit more simple info about that it would be far more beneficial for most of us.
It's not so simple anymore, though. You really cannot compare bench times people are posting today to those of 4 years ago, as comparing our 4.10 and 4.11 results proves. 4.8, 4.9, 4.10, 4.11 - all of these behave differently. I wonder how many people are even using 4.8 anymore; for one, Pascal doesn't work on it, and so many people have Pascal. The software has changed, hardware has changed, and even different scenes yield different results. You have to have consistency in this kind of data, or there is no point. My recent post shows that you can lose quite a bit of speed with 4.11 with my scene. Some people have different results with the SY scene. This makes all previous testing on older versions of DS very suspect. Any suggestion that all things are equal is disingenuous.
It would be great if there was a group that benched a ton of setups like you often see with gaming, giving us a nice clear picture. But Iray is just too niche. If I could collect a bunch of GPUs, I would totally do it. That would be my idea of fun because I am weird like that.
This thread went off the rails pretty quickly. Within the first couple pages, people were altering the scene, and not properly reporting their hardware or scene settings. I can't stress that enough. People were removing 1 or 2 of the balls if their hardware was weak. That is not how benching works! You cannot make the test easier for yourself under any circumstance, that skews everything.
I agree. It's complicated. And like I keep saying, this will all probably change in the not-so-distant future, so we all know that benchmarks at this point are of only (very) limited benefit. So why not just accept that and make a very simple process that everyone can participate in, without 6 different scenes with 378 different variables? At least we can get a ballpark that means something to most users until we know more. For example:
In fact, I have a spreadsheet (see below) that's all ready to add RTX 2070, 2080, and 2080ti results for Sickleyield, or I'd be more than happy to change the numbers to reflect a new scene. And I've even included present prices (even though that's useless info).
Oh absolutely. It goes without saying that scene composition makes a huge difference to render performance across different rendering hardware/software configurations. E.g. SickleYield's scene, with its lack of utilization of more advanced shading techniques (due to them not existing yet when it was made), is going to scale (render performance wise) very differently across different generations of hardware/software than a more recently composed scene - like yours, for instance.
To further support this, check out these graphs:
These are Iteration Time analyses (the unbiased renderer equivalent of Frame Time analyses from the game benchmarking world) generated by me from info contained in Daz Studio's log file (more on how to generate these for yourself in a future post) for individual runs of each benchmarking scene currently making the rounds in this thread, as well as the benchmarking scene I am currently in the process of beta-ing. There's a lot of important/useful information to be had in these graphs. However the main thing to be seen here that's relevant to your point is just how diverse each scene's pattern of Iray Iteration Rendering Time is despite ALL variables (including environmental and procedural - Daz Studio was fully closed and freshly reopened before rendering each of the scenes) except which scene was loaded being kept identical. These graphs are an objective quantification of just how much of an effect scene content/composition has on rendering performance - not just in terms of a single simple statistic like Total Rendering Time, but also in terms of what goes on with Iray under the hood on an iteration by iteration basis.
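In case anyone wants to reproduce these before I write up the full how-to, the basic idea is to pull Iray's progress updates out of the Daz Studio log and difference them. Here's a rough sketch - note that the regex assumes progress lines shaped like "Received update to N iterations after Ts", which is a guess at the format and may need adjusting for your DS/Iray version:

```python
import re

# Hypothetical progress-line pattern; adjust it to match your actual log file.
UPDATE = re.compile(r"Received update to (\d+) iterations after ([\d.]+)s")

def iteration_times(log_text):
    """Average seconds per iteration for each interval between progress updates."""
    points = [(int(m[1]), float(m[2])) for m in UPDATE.finditer(log_text)]
    times = []
    prev_iter, prev_sec = 0, 0.0
    for iters, sec in points:
        if iters > prev_iter:
            times.append((sec - prev_sec) / (iters - prev_iter))
            prev_iter, prev_sec = iters, sec
    return times
```

Feed the resulting list into any plotting tool and you get the per-scene Iteration Time curves shown in the graphs.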
Ultimately the only way render results from a specific scene will ever be a truly accurate prediction of relative performance for a specific person's use case is if the scene in question is one of their own scenes. If this was gaming performance we were talking about, each and every individual scene is the operational equivalent of its own uniquely developed gaming title (with each version of Daz Studio/Iray being a different version of the same game engine.) And standard practice in gaming performance benchmarking circles is to basically test in as many popular game titles as you can stand before going mental.
In terms of designing a scene to function specifically as a benchmark, the only realistic options are either to tailor it around examining a specific aspect of performance (like Aala's Cornell box, which is specifically for testing rendering performance in the absence of texture/materials data - thereby sidestepping the issue of texture/materials loading times skewing Total Rendering Times away from pure hardware rendering performance), or to go for something at least theoretically representative of a typical DS user's typical scene composition (eg. a current generation human figure with clothing/hair in some sort of lit environment - like the benchmarking scene you created.) Which, fwiw, is also the tactic I am taking with the RTX-oriented new benchmarking scene I am currently in the process of beta-ing.
I have the EXACT sort of thread you're talking about (with all of the proper instructions for reporting things based on extensive behind-the-scenes testing) sitting in my drafts section. I'm just procrastinating on posting it because I can't decide on how many iterations to tell people to run the benchmarking scene. Too many and it becomes unusable on Pre-Turing hardware. Too few, and it becomes useless on Turing hardware once RTX features directly applicable to rendering (ie. RTCores) fully come online. The worst thing I could do is publish, and then have to change/republish the benchmarking scene - forcing everyone to have to redo all their tests with an identical looking benchmark. Which is a statistics keeping nightmare in the making.
Wow RayDAnt, it's so much fun to look at your stats! I love graphs etc. :D
I trust your judgment, I'll definitely do your benchmarks when they'll be released! ^^
You can't simplify it that much, though. As already stated, one scene doesn't tell the full story. Any gaming bench worth its salt has several tests in it. A gaming oriented bench will have a dozen or more games in it. Luxmark and other render engines have multiple benchmark scenes. To me, a scene for Iray is like a game, and each can tell a different story.
Even when this thread started, people discussed using OptiX on or off, and Speed vs Memory optimization. There were tons of variables from the very start, as people were not only benching hardware, they were also testing for the fastest settings in Iray. I think a lot of us forget that this was not simply a hardware bench thread; people were trying to understand Iray in general because it was brand new to them. And every time a new version of Iray came out, people jumped to this thread to compare the speeds to the last version. In this sense, the SY scene still has some merit, but it just cannot be the only thing used.
That will repeat again when the next version comes along. People will want to compare, whether they have RTX or not, how fast the new Iray is compared to the old one.
I've never talked about adding tons of variables. All I ask is for 2 or 3 bench scenes, because it is only this way you can get a true understanding. Download them, don't mess with the settings, don't touch anything, just hit render. Bam, 3 benchmarks to compare. OptiX may need to be checked, because oddly enough OptiX being on or off is not saved with the scene file.
Why is that so complicated?
And the beta is a viable option. It would be horribly unfair otherwise, because Turing owners would have no option at all. While 4.11 may be a beta, the Iray plugin in it is NOT a beta - Daz Studio is the beta. 4.11 actually corrects some flaws from 4.10, like how it handles chromatic SSS.
You know what would be really rad? If Daz Studio had its own built in bench scenes! Why not? It has built in tutorials, other render software have them, why not Daz Studio?
Fwiw if you install Iray Server (available as a 1 month free trial here) it actually comes with a (very familiar looking...) ready-to-render benchmarking scene - here's a test render I did of it some time ago:
Obviously knowing this is not much use for Daz Studio purposes. But I did think that was interesting.
That figures. That scene file is probably very tiny, too. Octane has a dedicated benchmark you can download from their site and even upload your results. These benches serve the obvious purpose of informing someone if their machine can handle Octane. When Iray first released, Daz had a page on the site for Iray recommendations, and listed a few GPUs with some unknown benchmark times. Today the only thing I see is a recommendation of at least 4GB VRAM, which funny enough has always been the recommendation.
And funny enough, I have 3GB of VRAM, with just 2.3GB available for Daz!! xD
I intend to equip my computer with NVIDIA GeForce RTX 2080 Ti GPUs. The question is: are 2 cards better, or does Iray work without problems with 1 card with 11 GB?
Thanks for information.
Hi :D
I've never used Daz with a high end system like that, so the other guys will give you better answers! ^^
But until then: keep in mind that VRAM doesn't pool across cards. If the scene needs less than 11GB of VRAM, you'll use both GPUs; otherwise neither will be used, and the render falls back to the CPU.
For example, if you have a GPU with 6GB of VRAM and another with just 2GB, and the scene needs 4GB, you'll simply use the first GPU.
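That per-card behavior can be written out as a toy model (my own simplification - real Iray also reserves some working memory, so the usable threshold sits a bit below each card's full VRAM):

```python
def participating_gpus(scene_gb, vram_per_gpu_gb):
    """Indices of GPUs whose VRAM can hold the scene.

    An empty list means no GPU fits and the render falls back to the CPU.
    Each card is checked independently - VRAM never pools across cards.
    """
    return [i for i, vram in enumerate(vram_per_gpu_gb) if vram >= scene_gb]

print(participating_gpus(4.0, [6.0, 2.0]))     # [0] -> only the 6GB card renders
print(participating_gpus(12.0, [11.0, 11.0]))  # []  -> CPU fallback
```

So two 11GB cards speed up any scene that fits in 11GB, but they don't let you render a bigger scene than one card could.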
Don't mix up different generations, but other than that you should be fine!
RTX has a new technology called NVLink that should improve multi-GPU setup performance, but I don't think it's implemented yet!
Two RTX 2080 Ti are better than one, but I doubt you'll reach double the performance :)
Many thanks for the detailed information. Could you possibly recommend a GPU if the Asus RTX 2080 Ti is not yet implemented? Sorry, I have no technical experience.
Thank you for this information, but unfortunately the 1080 Ti and 1070 Ti GPUs are no longer available for purchase. Are there any other alternatives that have proven their worth in practical DAZ Studio Iray rendering, but are still available for purchase?
Thank you for the help.
Then you should point me to the doc where you explicitly see these points:
1. Iray uses INT operations, and they can be done in parallel with some other FP32 operations (i.e. not blocked by another FP32 operation waiting for a result)
2. Activating OptiX Prime can cause a drop to CPU rendering because of the additional memory used
Again, using a gaming analysis to make hypotheses about Iray is not relevant. I've read enough docs, and I've seen none of what you say.
Get a used one?
Well I suppose you could do a quick test yourself and see what the answer is.
So yes, it appears that OptiX Prime uses additional VRAM.
Okay:
Iray is the definition of a modern shader workload.
Switches from the classical built-in Ray Tracing traversal and hierarchy construction code to the external NVIDIA OptiX Prime API. Setting this option to false will lead to slower scene-geometry updates when running on the GPU and (usually) increased overall rendering time, but features less temporary memory overhead during scene-geometry updates and also decreased overall memory usage during rendering. (Iray Photoreal: Rendering Options. Nvidia Iray Programmer's Manual. https://raytracing-docs.nvidia.com/iray/manual/index.html#iray_photoreal_render_mode#rendering-options)
Photoreal is the Iray mode used in virtually all modern Daz Studio rendering scenarios.