Iray Starter Scene: Post Your Benchmarks!


Comments

  • nicstt Posts: 11,714

    I've not seen it mentioned that the 20 series needs the beta, and that there is a fix implemented in the beta that slows renders down.

  • LenioTG Posts: 2,118
    edited March 2019
    Ribanok said:
    kameneko said:

    ... Again, if someone has an RTX 2060 or a GTX 1660 Ti, it would be nice to see how they perform xD

    Here you are :)

    My system: i5-3570K @4.2 GHz ; 8 GB RAM ; MSI Ventus XS GeForce RTX 2060 ; Windows 7 64-bit ; DAZ4.11 Beta ; Nvidia driver 419.35

    The benchmarks were done using SickleYield's test scene. I didn't change anything in the scene; each run rendered to 100% (5000 iterations reached).

    Today you are my hero Ribanok, thank you so much! :D

    RayDAnt said:

    Yeah, we need a thread with just results and no discussions!

    Imo discussions are fine so long as the first post in the thread has a regularly updated summary of the benchmarking results posted after it (which is my game plan for the new benchmarking thread I'm working on.)

    That's a good idea!

    RayDAnt said:

    Remember not to make the benchmark scene bigger than 2 GB, otherwise some GPUs couldn't run the test; and I don't know about you, but some of us are still restricted to low- to mid-end GPUs!

    My current rendering systems are a GTX 1050 2GB based Surface Book 2 laptop and a Titan RTX 24GB based Homebrew desktop. And one of the main requirements I've set for the new benchmarking scene I'm currently developing (already well into the testing phase fwiw) is that it needs to be fully renderable on both/anything in between.

    Great! :D

     

    ebergerly said:

    FWIW, here's a quick update to my spreadsheet showing the RTX 2060 and (I think) RTX Titan numbers from the recent pages with the previous GTX numbers, all from the Sickleyield scene (I think). One of the results didn't include info on system configuration, so I had to do some detective work. I also did a quick look at newegg to get prices of GPUs and added those. If you disagree with any of the numbers feel free to generate your own spreadsheet and post it. 

    And like I said, use these numbers at your own risk. The RTX technology still has a ways to go before we really know what it can do with DAZ stuff, and there are also enough variables in the setup of the scene to make the numbers (IMO) only good to within, say, +/- 15 seconds or more. For example, as I recall, if you did a render and then repeated it, the second one would be significantly faster. Not sure if stuff like that was included in everyone's procedure.

    BTW, I'm feeling pretty good that my GTX-1080ti plus GTX-1070 are only about 15 seconds slower than a $2,500 RTX Titan. I think I paid less than half of that for both. laugh

    Thank you, even one sample is more useful than none!

    My GTX 1060 3GB was over 12 min, actually!

    UPDATE: I've taken the test again. I don't know what went wrong, but this time Ryzen 5 1600 (one core and two threads disabled) + GTX 1060 3GB, OptiX Prime Acceleration enabled, 4.11 Public Build, driver 419.35: 4m 9s.

     

    In any case, I think a parameter for the next render test could be GPU only, because the CPU part adds a lot of variability.

    Post edited by LenioTG on
  • outrider42 Posts: 3,679
    edited March 2019
    RayDAnt said:

    And there lies the problem with the original scene, where the top cards are all just a second or two apart. Where a 1080ti+1070 is "only" 15 seconds slower than a Titan RTX. The times are getting so small that the slightest deviation in render time results in a highly skewed result. Is a 1080ti+1070 really just 15 seconds slower than a 2080ti? This chart also shows TWO 1080tis rendering in the same time as the 1080ti+1070 combo? Something is afoot. Though my machine with two 1080tis renders the SY scene in almost exactly 1 minute, give or take a couple of seconds. I hit 58 seconds a few times, 1 minute and 4 seconds other times. There is a 6 second variance, which on this chart results in a sizable performance gap depending on which time I quote.

    That is why I posted my benchmark. A lot of things had changed since 2015. G3 and G8 released, plus Daz Iray gained all-new features, such as dual lobe specularity, which was not present in the 2015 bench. I made use of these new features. Gaming benchmark suites are updated all the time to take advantage of new features, like DirectX 12 or a new game engine. Also, gaming benchmarks vary wildly, to say the least. I just saw a benchmark suite of over 30 games across several GPUs on Hardware Unboxed, and the results are almost never the same between games. One GPU could have a massive 44% lead over a certain GPU in one game, but actually lose by 12% to that same GPU in another. Having a variety of benchmarks to provide balance is usually a good idea.

    Also, the way Iray calculates convergence CAN change between SDKs. This is a fact as provided by Migenius. And it becomes apparent when looking at the data for each SY scene run when users list how many iterations were done. There are cases where people have wondered why their bench took longer or shorter going from 4.8 to 4.9, 4.10, or 4.11. The answer lies in how many iterations were run before the scene stopped. The iteration count can vary quite a bit in some cases.

    Additionally, Iray can indeed run slightly different iteration counts from run to run, even with fresh boots, again skewing results. How does this happen? The render engine only checks the convergence every so often, so it is possible for Iray to overrun the iteration count it would otherwise stop at if it happens to check the convergence at the "wrong" time. However, a hard iteration count will stop Iray at that set number nearly every time.
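
    To make that concrete, here's a toy sketch in Python (the per-iteration progress and the check interval are invented numbers, not anything pulled from Iray) of why a periodic convergence check can overshoot while a hard iteration cap always does the same amount of work:

```python
# Toy model of two stopping rules. Numbers are made up purely for illustration.

def iterations_until_converged(gain_per_iter, target=0.95, check_interval=50):
    """Stop the first time a periodic check sees convergence >= target."""
    converged, iters = 0.0, 0
    while True:
        iters += 1
        converged += gain_per_iter                 # pretend progress is linear
        if iters % check_interval == 0 and converged >= target:
            return iters                           # can overshoot the true crossing point

def iterations_hard_cap(cap=1800):
    """A hard iteration limit always runs exactly the same number of iterations."""
    return cap

# Two runs whose per-iteration progress differs by a hair (say, a different SDK/driver):
print(iterations_until_converged(0.000527))   # 1850
print(iterations_until_converged(0.000530))   # 1800
print(iterations_hard_cap())                  # 1800, every time
```

    A roughly 0.6% difference in per-iteration progress turns into a 50-iteration difference in work done, which is exactly the kind of run-to-run wobble a hard cap avoids.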

    And if Iray runs slower in that last 5% of convergence, isn't that even more evidence that convergence alone is not a great measurement?

    So this is why I feel very strongly that capping the iteration count so that it ends before convergence provides more consistent results.

    Still having tests that can show both can have merit.

    Having different tests is fine because just like gaming, different scenes can task different elements of Iray.  Many other render engines have different benchmark scenes for this reason. Luxmark quickly comes to mind. The recent Radeon VII renders certain benchmark scenes MUCH faster than other cards. However, more challenging scenes greatly even out the field, to the point where some cards actually beat the VII. Why? It could be that some simple scenes allow the VII's insane memory bandwidth to run wild. So this proves that certain scenes can influence results, and that having a variety is a good idea.

    Most people using Daz Studio are using humans in their scenes. Thus having a human character in the test scene is very logical, and human skin can be one of the most challenging things to emulate in any render engine. So I think it is important that most bench scenes have a human in them. It's also the key difference between Daz Iray and nearly every other render engine out there. Most test scenes you find are strictly environments with no humans in them. For example, you will not find any humans in a Lux benchmark.

    I admit I was surprised how people started using Rawb's scene here...he did not even post his scene in this thread and people just adopted it. But it has a big drawback: with multiple Genesis 3 or 8 figures the scene may not work on VRAM-strapped cards at all. Still, it is different from mine or SY's scene in that it is brighter and has a large sheet of glass.

    The newest scene posted has no people, but offers lots of areas for light to bounce around and blend. So that could have merit.

    We know there are multiple elements to every render. The initial light pass, and the shading calculations. Different cards may excel at these differently, especially different generations of cards. So having different tests that can test these different aspects can be a good thing.

    And of course any scene has to use the items that come installed with Daz Studio and any Starter Essentials.

    outrider42, for what it's worth, I think you're probably gonna like the new benchmarking scene I'm currently working on (mostly done with, actually - all I have left to do is calculating the best render limits) since it pretty much directly addresses every single one of your points here (including the one about small render deviations resulting in skewed results for top-end cards/multiple card combinations - more on this once I finish getting my testing methodology fully written up.)

     

    Flipping back a bit, I have talked about the differences between OptiX and OptiX Prime. To repeat the notes from official documents, Prime needs to be recompiled for every new GPU architecture. That is why Pascal did not work when it launched. In fact, Daz would not even run with Pascal at all, it was far more than simply being able to choose OptiX ON or OFF, that wasn't even an option. The same holds true for Turing. Only the Daz beta can utilize Turing, so at some point the beta's plugin had to have been updated for Volta and Turing.

    To be clear, there is no explicit Turing support in any currently publicly available version of Daz Studio or Iray. With that said, rendering with Turing cards does currently work (in a performance-limited fashion) on the current 4.11 Beta. However, this is only because Volta has forward compatibility with core elements of the Turing architecture, and Volta has been explicitly supported by Iray since last summer.

     

    Prime cannot be updated for RTX. It is due to how Prime uses VRAM. Prime uses more VRAM than standard OptiX. The more rays, and the faster they are cast, the more GPU VRAM balloons in Prime. Apparently RTX cards shoot so many rays, so fast, that they would totally overflow just about every GPU's frame buffer. It is not a matter of it not being doable, it is a matter of it being unusable. This is why OptiX Prime acceleration does not help on RTX cards, and why the times are basically the same (or actually slower in most cases). Standard OptiX does not do this.

    While you are correct that OptiX Prime both uses more video memory and sometimes leads to measurable performance decreases on Turing hardware if enabled (depending on how many light sources/how much ray-tracing activity happens in a scene - at least that seems to be the pattern with my Titan RTX), the true reason for the latter phenomenon isn't what you think.

    OptiX Prime is a self-contained headless computer program (aka a DLL) which spits out ray-tracing calculations at the behest of a parent program (in this case Daz Studio's Iray render engine plugin). It does this either by using general purpose "compute" processing power on an Nvidia GPU (via Nvidia's proprietary GPGPU Cuda API) or by using general purpose processing power on an Intel or AMD CPU (via Intel's non-proprietary ray-tracing library Embree). The key detail here is the phrase "general purpose processing power." Although the data which OptiX Prime produces is fundamentally tied to graphics rendering, the actual computational workload of the program itself is the same as that of a run-of-the-mill computer program - not part of a dedicated graphics rendering pipeline (like texture shading, for instance).

    This is significant because, in a first over predecessors like Pascal, Turing-based GPUs feature physical hardware enhancements (concurrent FP & INT execution data paths in a 100:36 performance ratio) tailored towards accelerating specific parts of a dedicated graphics rendering pipeline (like texture shading) rather than general GPGPU compute workloads like OptiX Prime. To put it another way, Turing does a much better job of accelerating texture shading workloads than it does compute workloads. And since OptiX Prime is technically a compute workload, using OptiX Prime on Turing hardware (without functioning RTCores - once those come online, OptiX Prime will be soundly left in the dust) oftentimes actually leads to decreased rendering performance compared to not using it.

    OptiX Prime acceleration was conceived of during a time when pure graphics rendering and pure compute processing on Nvidia GPUs were about equal performance-wise. Turing GPUs mark a major paradigm shift away from that balance with their inclusion of ASIC-like accelerators such as RTCores for ray-tracing and Tensor cores for deep learning. This makes past GPGPU-based solutions for doing the same/similar things (like OptiX Prime vs RTCores) functionally obsolete. OptiX Prime acceleration is currently (and will remain for the foreseeable future) fully operational on Turing GPUs. It just doesn't necessarily improve performance anymore.

     

    However, OptiX Prime is vital for Daz Studio, because obviously not everyone has an Nvidia GPU, and there are people who want to render scenes that do not fit on whatever GPU they do have. OptiX Prime was also important in helping make Iray a decently fast render engine for its time. So does OptiX 6 add a CPU fallback mode?

    FYI there is not now, nor has there ever been, any non-Nvidia GPU support in either OptiX, OptiX Prime, or even Iray itself. However, there has been and continues to be OptiX Prime and Iray support for Intel and AMD CPUs via Embree. And I have yet to see anyone even hint at OptiX Prime acceleration being removed as an option in Daz Studio/Iray at some point, which would make no logical sense since it is still useful on a whole class of Iray-supported devices (CPUs) - it being effectively obsoleted on Turing hardware is an edge case.

    I know there is no Turing support for Iray. The beta came out before Turing did, and it would be impossible for the SDK that the beta has to contain Turing drivers. But the beta did add Volta support.

    And of course Iray is Nvidia only. It is based on CUDA, which is Nvidia's API. Whether or not it can run on anything is not the point. It does not run on everything. Nor does the standard version of OptiX, as it only runs on GPU. OptiX is not OptiX Prime, and they are not even considered to be under the same umbrella.

    "My question is whether OptiX Prime counts under the umbrella of OptiX for this purpose."
    No. We have no plans to support RTX acceleration with OptiX Prime.

    That comes from the Nvidia forum; the people who mod that forum are actual developers on the project.

    Whether something can or cannot be done does not mean it will. The people at Octane tried for years to make CUDA run on AMD. They made this promise I believe about 4 years ago. They are still claiming that they want to make it happen, but after 4 years, I cannot say my hopes are very high. Not to mention, they claimed to be "close" when they made that announcement 4 years ago. So while CUDA may just be general purpose processing...it sure has been tough to crack.

     

    ebergerly said:

    Some things to consider:

    1. Sickleyield posted her benchmark scene 4 years ago. Since then there have been a ton of kind folks who spent a lot of time posting results with their particular cards, which is why we have a nice list of relative performance data with different cards.
    2. Each time a new benchmark scene comes out, that process starts from scratch, and all previous data is effectively erased.
    3. As I recall, the results in the spreadsheet seem to scale up to much longer scenes. For example, if someone found that a 1080ti+1070 rendered Sickleyield's scene 35% faster than a 1080ti alone (1.3 minutes vs 2 minutes), that 35% improvement also happened with a much longer rendering scene. So even with relatively fast render times, they seem to apply to longer scenes. If there's doubt, just render the Sickleyield scene and then a longer scene to see if the differences are the same. 
    4. No matter what scene you use, you honestly can't rely on any render time results that get posted with any accuracy for many reasons:
      1. Maybe someone's system was throttling because it got too hot because they didn't clean the dust bunnies or messed with the BIOS or 20 other reasons.
      2. Maybe someone posted the times for the slower, first render of the scene, not the faster second render.
      3. Maybe they're reading different render times from different sources and not comparing apples to apples.
      4. Maybe they have different render settings, like convergence, etc.
      5. And so on...

    So at the end of the day, all you can do is look at reported render times and get a very general idea of relative (not exact) performance compared to other cards. And if an RTX Titan seems to render a scene only something like 30% faster than a combo of 1080ti and 1070 at more than twice the price, then the important takeaway (for me at least) is that yeah, the RTX series are really overpriced and not yet ready for primetime. Whether it's EXACTLY 15 seconds difference is somewhat irrelevant.

    Because keep in mind that many/most of us try to make our scenes so that they render relatively fast so we don't have to sit there for an hour waiting for a render. So the Sickleyield results may be more appropriate for many users. Especially since cards are so insanely overpriced lately, maybe (hopefully) folks are looking at stuff like scene management and compositing to make more lightweight scenes that render faster and more efficiently. 

    That is quite an assumption there. It is pretty hard to render in less than an hour unless you just render at tiny resolutions or use no backgrounds. Almost everything coming to the store these days is extending render times because of its design. Many new characters are using chromatic SSS, which seriously balloons render times. Simply switching from a non-chromatic skin to chromatic can easily double render times, especially if you want to do a proper close-up.

    Then you have 3D environments. Using one will dramatically increase render times. Toss in just 2 or 3 Genesis 8 characters with any decent 3D environment and you're looking at a render that will take hours even with a 1080ti. Considering that new environments are released pretty much every day, and there are PAs who ONLY make environments, I think it is safe to say many people are rendering for hours.

    Have you stopped to consider that maybe Iray has a speed limit? That perhaps the SY scene can reach a point where it renders so fast that Iray itself cannot go faster? If that were to happen, then the results would certainly be skewed. This happens in gaming. Some older games actually have limits to how fast they can run, and some reach CPU limits; in either case this can cause GPUs to have very similar results even when the GPUs may be quite far apart in raw power. This skews results and can make cards look much closer in ability than they really are. We are reaching render times in the 30 second range when people stack cards. So I have to wonder if the results are reaching a peak that is causing the data to look closer than it should.

    That is why you have different benchmarks. If Iray had never changed, then perhaps you might have a point. But Iray has changed. The version of Iray in Daz today is quite different from what it launched with back in 2015. In fact, Daz 4.8 Iray had a very well known bug with SSS that caused some things to appear redder than they should. New versions have added dual lobe spec and chromatic SSS. There is also dForce...you know, I'd like to see a dForce bench.

    The RTX Titan has 24GB of VRAM. Your 1070 only has 8. The instant you exceed that, the 1070 becomes a brick. And if you exceed the 11 in the 1080ti, it becomes a very expensive paperweight. The Titan has its purpose, and it is also just one GPU. That person could add any other card to their machine and render even faster. It is indeed perhaps too expensive, and the Titan is only a bit more powerful than the 2080ti. But it may not be wise to justify your belief based on one 4 year old benchmark. The Titan has always been a hybrid between the gaming cards and Quadro, and as such the Titan has additional features. One is TCC, Tesla Compute Cluster. This requires having another video card to drive the video, as the Titan will become a dedicated compute card. This has two perks: it makes every ounce of VRAM available, so Windows does not take any VRAM from it, and it can increase performance. The Titan V shows a marked improvement in several rendering apps when using TCC mode. I have never seen anyone test TCC with Iray, so I have no idea if it makes a difference. Perhaps our Titan user can test this. But I have a feeling it can impact performance some.

    If I bought a 2080ti, it would render as fast or faster than my two 1080tis together. I vividly recall you said that was not possible, and yet bluejaunte proved it. So even without being "ready for prime time", the cards are still very fast. And it will be even faster in the future. OptiX 6.0 has been released, and the results from it are spot on with the claims made months ago.

    This comes from here http://boostclock.com/show/000250/gpu-rendering-nv-rtxon-gtx1080ti-rtx2080ti-titanv.html

    Here's some additional testing from a fellow on the Nvidia forums. https://devtalk.nvidia.com/default/topic/1047464/optix/rtx-on-off-benchmark-optix-6/

    Turning RTX on speeds up the render by anywhere from 2.5x to over 9x the speed of rendering with RTX off. And this is before considering the fact that 2080ti is already faster than everything but the Titan with RTX off.

    I don't know if Iray will get OptiX 6.0. It is very hard to say. But even if it does not, there are other render engines taking advantage of RTX, like Octane. The 2019 Octane Preview is something you can download and try today. In the scene below, the 2080ti already enjoys a pretty large lead over the 1080ti, but turning RTX on causes it to just explode.

    Post edited by outrider42 on
  • bluejaunte Posts: 1,861

    Tomorrow Arnold 5.3 for GPU will go to open beta. Apparently they already support RTX so maybe some interesting benchmarks will come out of that. Or I guess we could get a trial of the standalone and see what a 2080 TI has over a 1080 TI.

  • LenioTG Posts: 2,118

    Wow guys...those benchmarks are great! You're basically suggesting I wait for an RTX 2060 rather than a GTX 1070/1660 Ti!

    When do you think RTX support will come to Iray? Because it will come, right?

  • RayDAnt Posts: 1,120
    edited March 2019
    kameneko said:

    Wow guys...those benchmarks are great! You're basically suggesting I wait for an RTX 2060 rather than a GTX 1070/1660 Ti!

    When do you think RTX support will come to Iray? Because it will come, right?

    There was already a fully RTX accelerated demo of Iray more than a month ago. Its release is most likely imminent (see this conversation over at the Iray developers forum).

    I have never seen anyone test TCC with Iray, so I have no idea if it makes a difference. Perhaps our Titan user can test this. But I have a feeling it can impact performance some.

    It's on my to-do list. It goes without saying that 3d rendering isn't the only thing I use my computer for. And using TCC means having to revert back down to CPU integrated graphics for primary display output and acceleration of any non-TCC compatible graphics workloads.

    Also fwiw there are already benchmarks from at least one user in this thread using TCC enabled on multiple Titan X's. Although, if memory serves, they didn't post any sort of with/without it performance comparison.

    UPDATE: found it. It actually does have some relative performance info. But I don't know how useful it is, since it was in the very early days of TCC Titan support and in a mixture of TCC supported/unsupported cards.

    Post edited by RayDAnt on
  • DA Kross Posts: 70
    edited March 2019

    I recently bought an RTX 2080. My previous card was a 980 Ti. I am using the 4.11 beta.

    Zotac Geforce GTX 980 Ti AMP! Edition

    OptiX On: 2:12 (132 seconds)

    OptiX Off: 2:43 (163 seconds)

    Gigabyte Geforce RTX 2080 Gaming OC

    OptiX on 1:15 (75 seconds)

    OptiX off 1:47 (107 seconds)

    The delta (OptiX on versus off) with the 980 Ti is 19.01% and the delta with the 2080 is 29.90%. I have no idea if that is attributable to the RT cores alone or just the newer GPU. The 2080 is 43.18% faster at rendering than the 980 Ti with OptiX on.
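
    For anyone who wants to double-check or reuse the math, here's a minimal sketch of how those percentages fall out of the times above (times in seconds):

```python
def pct_faster(slow, fast):
    """Percent of the slower time that is saved by the faster run."""
    return (slow - fast) / slow * 100

print(f"980 Ti, OptiX on vs off:   {pct_faster(163, 132):.2f}%")  # ~19.0%
print(f"2080, OptiX on vs off:     {pct_faster(107, 75):.2f}%")   # ~29.9%
print(f"2080 vs 980 Ti, OptiX on:  {pct_faster(132, 75):.2f}%")   # ~43.2%
```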

     

    Post edited by DA Kross on
  • fred9803 Posts: 1,559

    Just to clarify D-A Kross. Which test scene are you using? Are you rendering to 100% convergence? Are you starting from the Iray preview mode or from scratch?

    I ask because my 2080 takes more than twice your time to render.

  • stormqq Posts: 76
    kameneko said:

    Wow guys...those benchmarks are great! You're basically suggesting I wait for an RTX 2060 rather than a GTX 1070/1660 Ti!

    When do you think RTX support will come to Iray? Because it will come, right?

    I have the same question. However, Nvidia claims they are going to bring Iray support to RTX cards. According to their post on the site below, it renders faster with RTX cards than GTX ones; Nvidia says that's due to AI and real-time ray tracing, which are only available on RTX cards. All in all, it seems like the RTX 2060 is going to be worth it in the future, but at the moment the GTX 1660 Ti is the better value. We can't predict the future, and we don't know whether Nvidia will bring real-time ray tracing and AI denoising to consumer cards too; the post is about Quadro cards. So I suggest you wait a couple of months to see how it turns out; if you are in a rush to buy, then go ahead and get the GTX 1660 Ti. I don't trust Nvidia; I feel like it will take at least a couple of years to see a significant boost in rendering speed with RTX cards. However, if you use Octane it's a different story.

    https://blogs.nvidia.com/blog/2019/02/10/quadro-rtx-powered-solidworks-visualize/

  • DA Kross Posts: 70
    edited March 2019
    fred9803 said:

    Just to clarify D-A Kross. Which test scene are you using? Are you rendering to 100% convergence? Are you starting from the Iray preview mode or from scratch?

    I ask because my 2080 takes more than twice your time to render.

    SickleYield's starter scene, to 5000 samples and 400 x 520 resolution (the default settings that came with the scene).

    Post edited by DA Kross on
  • LenioTG Posts: 2,118
    RayDAnt said:
    kameneko said:

    Wow guys...those benchmarks are great! You're basically suggesting I wait for an RTX 2060 rather than a GTX 1070/1660 Ti!

    When do you think RTX support will come to Iray? Because it will come, right?

    There was already a fully RTX accelerated demo of Iray more than a month ago. Its release is most likely imminent (see this conversation over at the Iray developers forum).

    Nice to know, thank you! :D

    I'm not in a hurry to buy that RTX 2060 anyway! :)

  • neumi1337 Posts: 18

    It wouldn't hurt if Daz 3D talked about the Nvidia RTX support in their new update (presumably the new Iray version with RT core support / OptiX 6.0).

    https://nvidianews.nvidia.com/news/nvidia-rtx-ray-tracing-accelerated-applications-available-to-millions-of-3d-artists-and-designers-this-year

  • bluejaunte Posts: 1,861

    Yeah so, kudos to NVIDIA. I was one of the guys saying Iray feels like an afterthought to them but it doesn't really look that way now. Eating my words laugh

  • GPUs: Dual Nvidia RTX 2080 TI
    CPU: Intel i9-9900K

    Rendered with CPU on and both cards, Optix on: 34.4 seconds
    Rendered with GPUs only, CPU off, Optix on: 38.6 seconds.

  • RayDAnt Posts: 1,120
    edited March 2019

    To sort of follow on from outrider42's last post, Nvidia recently published the following two graphics as part of its announcement of RTX backwards compatibility on most Pascal 10xx series cards. Which, if you study them carefully, shed a great deal of light on/confirm what people have so far noticed about Iray rendering performance on Turing based GPUs. Particularly as regards the use of OptiX Prime acceleration (that it actually takes longer to render with it enabled than not in scenes featuring more advanced material/lighting interactions.) First the graphics:

    Caveat: both these graphics are illustrations of Frametime performance patterns in a gameplay oriented biased rendering engine (in this case the game engine for Metro Exodus) and not a best-visual-quality oriented unbiased rendering engine like Iray. This means that the same graphics for an Iray iteration would most likely feature several times longer processing times in the post ray-tracing portion of the rendering process (the combination of the parts labeled "INT + FLOAT" and "TENSOR CORE" in the 2nd graphic.) With this said, the main useful takeaways are:

    1. Each frame/iteration workload features 4 distinct stages of processing:
      1. Pre-processing (the area labeled above as "FLOAT")
      2. Ray-tracing (the area labeled above as "RT CORE")
      3. Shading (the area labeled above as "INT + FLOAT")
      4. Post-processing (the area labeled above as "TENSOR CORE")
    • The most computationally complex and consequently time-consuming part of this process (by far) is ray-tracing. 
    • Ray-tracing is much more efficiently done as concurrent mixtures of INT and FLOAT operations than just concurrent FLOAT operations (approximately 2x more efficient.) 
    • Ray-tracing is muuuuuch more efficiently done as dedicated RTCore operations than either concurrent mixtures of INT and FLOAT operations (approximately 8x more efficient) or just concurrent FLOAT operations (approximately 16x more efficient.)

    This explains why:

    Cuda core-only Iray rendering is significantly faster on Turing hardware than equivalent Cuda core count Pascal hardware. Because Turing features concurrent INT and FLOAT operations.

    Cuda core-only Iray rendering using OptiX Prime acceleration sometimes (scene dependent) leads to measurably longer rendering times on Turing hardware. Because OptiX Prime ray-tracing acceleration is based in part on having to process INT operations without the benefit of INT-specific Cuda cores (since INT-specific Cuda cores didn't exist as a thing prior to Turing.) Meaning that rendering scenes needing a significant number of INT operations with OptiX Prime enabled will lead to less efficient rendering times, since the usual performance gains of OptiX Prime are already being achieved at a hardware level - leaving you with just the added overhead of OptiX Prime essentially spinning its wheels for nothing.

    Turing GPUs will be orders of magnitude faster at Iray rendering overall (not just in terms of the ray-tracing part of the process) with the utilization of RTCores than without. The key question of course is by how much. My current best educated guess is 3x more efficient on average, with specific scenes varying widely both down and up from that number depending on scene content.
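
    One way to sanity-check a guess like that is Amdahl's-law-style arithmetic: if only the ray-tracing stage of each iteration gets the ~8x RTCore speedup, the overall gain depends entirely on how big that stage is. The ray-tracing share below is a pure assumption (it will vary a lot by scene), not a measured figure:

```python
def overall_speedup(rt_share, rt_speedup=8.0):
    """Amdahl-style estimate: only the ray-tracing fraction of an iteration is accelerated."""
    return 1.0 / ((1.0 - rt_share) + rt_share / rt_speedup)

for share in (0.5, 0.7, 0.8, 0.9):
    print(f"ray tracing = {share:.0%} of an iteration -> ~{overall_speedup(share):.1f}x overall")
# 50% -> ~1.8x, 70% -> ~2.6x, 80% -> ~3.3x, 90% -> ~4.7x
```

    Under those assumptions, a ~3x average works out to roughly 80% of each iteration being spent on ray tracing.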

    Post edited by RayDAnt on
  • Takeo.Kensei Posts: 1,303
    edited March 2019
    RayDAnt said:
    This explains why:

    Cuda core-only Iray rendering is significantly faster on Turing hardware than equivalent Cuda core count Pascal hardware. Because Turing features concurrent INT and FLOAT operations.

    Cuda core-only Iray rendering using OptiX Prime acceleration sometimes (scene dependent) leads to measurably longer rendering times on Turing hardware. Because OptiX Prime ray-tracing acceleration is based in part on having to process INT operations without the benefit of INT-specific Cuda cores (since INT-specific Cuda cores didn't exist as a thing prior to Turing.) Meaning that rendering scenes needing a significant number of INT operations with OptiX Prime enabled will lead to less efficient rendering times, since the usual performance gains of OptiX Prime are already being achieved at a hardware level - leaving you with just the added overhead of OptiX Prime essentially spinning its wheels for nothing.

    I don't think the INT gain for gaming can be of any use. From what I know, in 3D rendering you do calculations in FP32.

    A potential gain could come from FP16, but even that may not be applicable to 3D rendering, since we tend to need more precision.

    For me there are two factors for acceleration:

    Cuda vs RTCore ray-tracing, which you can see in the RTCore part

    New vs old Cuda cores: the new Cuda cores are bigger than the Pascal ones and allow more operations to be done. I think that for the same Cuda core count you can expect a 30% gain

    That explains why the RTX 2060 beats the GTX 1070 in actual benches without RTCore acceleration

    From that I would venture a conjecture: the GTX 1660 could well be on par with a GTX 1070 for Iray with fewer Cuda cores

    * Edit for precision

    Post edited by Takeo.Kensei on
  • ebergerly Posts: 3,255
    edited March 2019

    I suppose another approach to speculating on the RTX performance, when it finally is ready for prime time, is to look at the prices and assume that NVIDIA understands that its customers will pay somewhat proportionally more for proportional performance increases. Kinda like it was for the GTX series, as I recall. And I'm assuming NVIDIA has a real good idea of how the RTX will actually perform in 6 months or so when all of this is ironed out, which is how they set the pricing for the various models.

    Seemed like with GTX they charged something like 30-40% more for what ended up being something like a 30-40% decrease in Iray render times (e.g., between 1050 to 1060 to 1070, etc.). Unless my rememberer is broken...

    And with RTX the price differences at this point seem to be similar between 2060, 2070, and 2080...like a 40% increase in price for each successive RTX model ($350, $500, $700). And at $1200 the 2080ti is like a 70% increase above the 2080. 

    So right now the 2060 gives like a 25% improvement (i.e., decrease) in render times over a 1070. And maybe when this all gets ironed out it might increase to say a 40-50% decrease in render times.

    So here's my speculation based solely on prices as indicator of future performance:

    • A 20 minute render with a 1070 might become a 10 minute render in a 2060 when this all gets ironed out. 
    • The same render might become a 6 minute render in a 2070 (40% improvement)
    • The same render might become a 3.5 minute render in a 2080 (42% improvement)
    • The same render might become a 1 minute render in a 2080ti (71% improvement)

      Personally, I guess I prefer this approach rather than speculating on future performance based on stuff like INTs and FLOAT's. laugh
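
    For anyone who wants to play with the same back-of-the-envelope numbers, here's the chain of guessed improvements spelled out (the percentages are the speculative ones above, not measurements):

```python
# Start from a hypothetical 20 minute render on a GTX 1070 and apply the guessed cuts.
render_time = 20.0  # minutes on a 1070 (hypothetical scene)
guessed_cut = {"2060": 0.50, "2070": 0.40, "2080": 0.42, "2080 Ti": 0.71}

for card, cut in guessed_cut.items():
    render_time *= (1.0 - cut)
    print(f"RTX {card}: ~{render_time:.1f} min")
# ~10.0, ~6.0, ~3.5, ~1.0 minutes
```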

    Post edited by ebergerly on
  • RayDAnt Posts: 1,120
    RayDAnt said:
    This explains why:

    Cuda core-only Iray rendering is significantly faster on Turing hardware than equivalent Cuda core count Pascal hardware. Because Turing features concurrent INT and FLOAT operations.

    Cuda core-only Iray rendering using OptiX Prime acceleration sometimes (scene dependent) leads to measurably longer rendering times on Turing hardware. Because OptiX Prime ray-tracing acceleration is based in part on having to process INT operations without the benefit of INT-specific Cuda cores (since INT-specific Cuda cores didn't exist as a thing prior to Turing.) Meaning that rendering scenes needing a significant number of INT operations with OptiX Prime enabled will lead to less efficient rendering times, since the usual performance gains of OptiX Prime are already being achieved at a hardware level - leaving you with just the added overhead of OptiX Prime essentially spinning its wheels for nothing.

    I don't think the INT gain for gaming can be of any use. From what I know, in 3D rendering you do calculations in FP32.

    Ray-traced rendering requires a significant percentage of INT operations because intersections are calculated using bounding volume hierarchies, which are a form of discrete math structure ("discrete math" is just a fancy term for mathematical constructs that DON'T use continuous, aka floating point, data.) So for real-time ray-traced gaming graphics, and for not-necessarily-real-time ray-traced 3D production graphics like Iray, having dedicated INT processors would indeed significantly improve performance.
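
    To illustrate the mix in (very simplified) code: the ray-vs-box slab tests are pure float math, while walking the BVH itself is integer work - node indices, child offsets, comparisons, and a stack of ints. This is only a toy sketch of the idea, not how OptiX actually lays out or traverses its acceleration structure:

```python
from dataclasses import dataclass

@dataclass
class Node:
    bbox_min: tuple   # (x, y, z) floats
    bbox_max: tuple   # (x, y, z) floats
    left: int         # child node indices; -1 marks a leaf (pure INT data)
    right: int
    prim: int         # primitive index if this node is a leaf

def hit_box(orig, inv_dir, bmin, bmax):
    """Slab test: purely floating-point arithmetic."""
    tmin, tmax = 0.0, float("inf")
    for o, d, lo, hi in zip(orig, inv_dir, bmin, bmax):
        t0, t1 = (lo - o) * d, (hi - o) * d
        tmin, tmax = max(tmin, min(t0, t1)), min(tmax, max(t0, t1))
    return tmin <= tmax

def traverse(nodes, orig, inv_dir):
    """Tree walk: integer indices, integer comparisons, and a stack of ints."""
    stack, hits = [0], []
    while stack:
        i = stack.pop()                                   # INT bookkeeping
        n = nodes[i]
        if not hit_box(orig, inv_dir, n.bbox_min, n.bbox_max):
            continue                                      # FLOAT test decides the branch
        if n.left < 0:
            hits.append(n.prim)                           # leaf: record candidate primitive
        else:
            stack += [n.left, n.right]                    # push more INT indices
    return hits

# Example: a root box containing two leaf boxes, ray along +x through the lower one.
nodes = [
    Node((0, 0, 0), (2, 2, 2), 1, 2, -1),
    Node((0, 0, 0), (2, 1, 2), -1, -1, 0),
    Node((0, 1, 0), (2, 2, 2), -1, -1, 1),
]
print(traverse(nodes, (-1.0, 0.5, 0.5), (1.0, 1e9, 1e9)))  # -> [0]
```

    On pre-Turing hardware both halves compete for the same execution units; Turing's separate INT path (and eventually RTCores doing the whole walk in fixed-function hardware) is what changes the balance.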

  • ebergerly Posts: 3,255
    edited March 2019
    RayDAnt said:
    Ray-traced rendering requires a significant percentage of INT operations because intersections are calculated using bounding volume hierarchies, which are a form of discrete math structure ("discrete math" is just a fancy term for mathematical constructs that DON'T use continuous, aka floating point, data.) So for real-time ray-traced gaming graphics, and for not-necessarily-real-time ray-traced 3D production graphics like Iray, having dedicated INT processors would indeed significantly improve performance.

    It's been a while, but last year I wrote a super simple ray tracer from scratch in C# (it could use either CPU or GPU for rendering), and I recall that most of it was Vector3s with XYZ values as floats for the rays and hit points, etc. But I did simple spheres, not bounding boxes. And the only INTs were in the description of the image-related stuff in pixels.

    Which raises the question about how you can have float vectors interacting with INT stuff...seems like a mess. 

    Post edited by ebergerly on
  • ebergerly Posts: 3,255
    edited March 2019

    Oh wow, I forgot I even did a Windows Forms version that reads from a CSV file which defines the scene parameters. laugh

    Okay, to get back on track, it will be nice when we finally get all of this RTX stuff ironed out and we can actually put render times to all of this rather than having to rely on vague speculation. 

    Post edited by ebergerly on
  • outrider42 Posts: 3,679
    edited March 2019

    Some thoughts:

    Why is it that we hear confirmation of RTX for Daz 3D from Nvidia before Daz themselves???? And a date on top of that; at least 2019 is something. The secrecy around here...it's ridiculous. Is there a reason why Daz does not wish to inform their customers of what is being planned? Ever. Do you have any idea how much customers have been concerned about the state of Iray, and therefore Daz Studio itself, ever since RTX came out? You see, when you do not inform your customers of what you are doing, your customers will speculate about it. And often in a bad light.

    What's even nuttier is that there is a comment about RTX from Daz in the Nvidia presentation.

    Daz 3D will support RTX in 2019: “Many of the world’s most creative 3D artists rely on Daz Studio for truly amazing photorealistic creations. Adding the speed of NVIDIA RTX to our powerful 3D composition & rendering tools will be a game changer for creators.” STEVE SPENCER, GM & VP of Marketing | Daz 3D.

    Why is this statement not here, somewhere, anywhere on this website?

    But....because nobody wants to talk to us customers, there is still something to speculate. Take a look at the list of software getting RTX in 2019:

    The first software providers debuting acceleration with NVIDIA RTX technology in their 2019 releases include:

    • Adobe Dimension & Substance Designer
    • Autodesk Arnold & VRED
    • Chaos Group V-Ray
    • Dassault Systèmes CATIA Live Rendering & SOLIDWORKS Visualize 2019
    • Daz 3D Daz Studio
    • Enscape Enscape3D
    • Epic Games Unreal Engine 4.22
    • ESI Group IC.IDO 13.0
    • Foundry Modo
    • Isotropix Clarisse 4.0
    • Luxion KeyShot 9
    • OTOY Octane 2019.2
    • Pixar Renderman XPU
    • Redshift Renderer 3.0
    • Siemens NX Ray Traced Studio
    • Unity Technologies Unity (2020)

    Notice something...missing? OK, now this is what I see. I think Daz is the only one on this list using Iray. For example, iClone, which also has an Iray plugin now, is not listed here. And...there is no mention of Iray anywhere in this presentation, nor in Steve's statement. 

    So what does this mean? Will RTX be coming to Iray...or will there be a whole new render plugin for Daz?

    How about somebody from Daz stepping up to the plate and confirming something to its customers here on the forum. The cat is already out of the bag for RTX, so how about telling us where the RTX is coming from, and if Iray will continue to be a part of Daz Studio moving forward if indeed Daz is getting a new render engine. Will there be a Legacy Iray for users? There are a LOT of questions to be had.

    Post edited by outrider42 on
  • dougj Posts: 92

    Asus RTX 2060 / Studio 4.11 Beta

    GPU only OptiX ON: 2min 14sec

    GPU only OptiX OFF: 2min 57sec

     

     

  • LenioTG Posts: 2,118

    Asus RTX 2060 / Studio 4.11 Beta

    GPU only OptiX ON: 2min 14sec

    GPU only OptiX OFF: 2min 57sec

    Thank you for posting your results! :D

    I'm going to buy an RTX 2060...when I have the money, so not any time soon! xD (Especially if those guys at Daz keep putting out great sales xD)

  • outrider42 Posts: 3,679

    Asus RTX 2060 / Studio 4.11 Beta

    GPU only OptiX ON: 2min 14sec

    GPU only OptiX OFF: 2min 57sec

     

     

    You lost a lot of time with OptiX off, which seems not on par with most RTX benches posted. Does that disparity hold up for other scenes, like a different benchmark?

    And why would the 2060 benefit from OptiX while the other RTX cards barely see any meaningful difference? This really makes me wonder if there truly is a "speed limit" for OptiX that the high end cards are hitting.
  • RayDAnt Posts: 1,120

    Asus RTX 2060 / Studio 4.11 Beta

    GPU only OptiX ON: 2min 14sec

    GPU only OptiX OFF: 2min 57sec

     

     

    You lost a lot of time with OptiX off, which seems not on par with most RTX benches posted. Does that disparity hold up for other scenes, like a different benchmark?

    And why would the 2060 benefit from OptiX while the other RTX cards barely see any meaningful difference? This really makes me wonder if there truly is a "speed limit" for OptiX that the high end cards are hitting.

    That's actually about on par with my Titan RTX - 1:08 with OptiX Prime on vs 1:27 with it off. That's a 19 second difference, or roughly 25% worse performance.

  • Takeo.Kensei Posts: 1,303
    edited March 2019

    Asus RTX 2060 / Studio 4.11 Beta

    GPU only OptiX ON: 2min 14sec

    GPU only OptiX OFF: 2min 57sec

     

     

     

    You lost a lot of time with OptiX off, which seems not on par with most RTX benches posted. Does that disparity hold up for other scenes, like a different benchmark?

     

    And why would the 2060 benefit from OptiX while the other RTX cards barely see any meaningful difference? This really makes me wonder if there truly is a "speed limit" for OptiX that the high end cards are hitting.

    Not really what I've seen

    https://www.daz3d.com/forums/post/quote/53771/Comment_4104011

    Robinson said:

    Daz Studio Public Beta 4.11
    Stock scene as downloaded, unmodified. All times are to full completion.

    GPU Only 1 x Geforce RTX 2070  = 2 minutes 21.14 seconds (Optix off)
    GPU Only 1 x Geforce RTX 2070  = 1 minutes 49.11 seconds (Optix on)
    GPU Only 1 x Geforce RTX 2070 + 1 x Geforce GTX 970 = 1 minutes 45.78 seconds (Optix off)
    GPU Only 1 x Geforce RTX 2070 + 1 x Geforce GTX 970 = 1 minutes 19.47 seconds (Optix on)

    All to full completion.  My 2070 is the MSI Armor, i.e. non-binned chip at stock.  When I benchmarked before with just the 970 I didn't get any benefit from Optix.  It made little to no difference.  With the 2070 it appears to.

    https://www.daz3d.com/forums/post/quote/53771/Comment_4234711

    banpei said:

    Intel i7-9700

    Asus Z390-A

    Corsair 2x8GB DDR4-3600

     EVGA - GeForce RTX 2080 8 GB XC ULTRA GAMING Video Card

    Samsung Evo 860 1 TB

    Public Beta 4.11

    Sickleyield's Scene:

    CPU+/GPU+/Optix Prime- :  2:00 mins

    CPU-/GPU+/Optix Prime-:  1:50

    CPU-/GPU+/Optix Prime+: 1:24

    CPU+/GPU+/Optix Prime+: 1:23

     

    https://www.daz3d.com/forums/post/quote/53771/Comment_4272246

    RayDAnt said:
     

    SickleYield's Benchmark (Iray Scene For Time Benchmarks)

    Titan RTX + OptiX Prime ON: 1 minutes 5.59 seconds

    Titan RTX + OptiX Prime OFF: 1 minutes 24.77 seconds

    I think the SickleYield scene may favor Optix ON

    I didn't run any bench, so I don't know the difference between the scenes; I can't form any hypothesis, and there may not be enough precise data for that.

    Post edited by Takeo.Kensei on
  • dougj Posts: 92
    edited March 2019

    Asus RTX 2060 / Studio 4.11 Beta

    GPU only OptiX ON: 2min 14sec

    GPU only OptiX OFF: 2min 57sec

     

     

     

    You lost a lot of time with OptiX off, which seems not on par with most RTX benches posted. Does that disparity hold up for other scenes, like a different benchmark?

     

    And why would the 2060 benefit from OptiX while the other RTX cards barely see any meaningful difference? This really makes me wonder if there truly is a "speed limit" for OptiX that the high end cards are hitting.

    I tested the OptiX setting on a different scene and it didn't show as much variation as the benchmark scene.

    OptiX ON: Total Rendering Time: 1 minutes 47.14 seconds*

    OptiX OFF: Total Rendering Time: 1 minutes 56.79 seconds*

     

    *Note: These times are not for the benchmark scene

    Post edited by dougj on
  • tj_1ca9500b Posts: 2,047
    edited March 2019

    Has anyone benched the Threadripper 2990WX by itself yet?  I can extrapolate from the 16 core Threadripper benches, but the 2990WX has a 'unique' memory config, hence why I'm still curious.

    Yes, it WILL be slower than most GPUs, but it might be handy for really big scenes with lots of characters, if you aren't feeling like rendering different portions of the scene in multiple passes, hence my continued curiosity...

    Post edited by tj_1ca9500b on
  • outrider42 Posts: 3,679

    I wonder why the benchmark scene behaves that way?

    It also validates the reasoning behind using multiple and different benchmarks.

    I'd like to see if anybody has one of those Threadrippers as well. Also, I'd really like to see Threadripper used in multiGPU systems. The old Puget benchmarks claimed that having more CPU cores boosted multiGPU speeds, even when the CPU was not being used to actively render. They were using Xeons, as that test was long before Ryzen launched.

    So I would love to see not only how new Threadrippers perform solo, but how they affect the speeds in multiGPU. If they do, then there is double incentive for buying as many cores as you can.

  • LenioTG Posts: 2,118

    I wonder why the benchmark scene behaves that way?

    It also validates the reasoning behind using multiple and different benchmarks.

    I'd like to see if anybody has one of those Threadrippers as well. Also, I'd really like to see Threadripper used in multiGPU systems. The old Puget benchmarks claimed that having more CPU cores boosted multiGPU speeds, even when the CPU was not being used to actively render. They were using Xeons, as that test was long before Ryzen launched.

    So I would love to see not only how new Threadrippers perform solo, but how they affect the speeds in multiGPU. If they do, then there is double incentive for buying as many cores as you can.

    It's true that it would be more reliable with more benchmark scenes, and that we need just a handful of people willing to do them well...but don't you think we already have a few people willing to take the test in the right way, and that making it even more complex would just kill this benchmark thing? Isn't there any way to develop a new test scene that takes those new variables into account?
