Iray Starter Scene: Post Your Benchmarks!


Comments

  • RayDAnt Posts: 1,120
    edited February 2019

    As for the SDK documentation being wrong because it is a newer version:

    Names for things change and routines get added/deprecated between releases. So if you happen to be attempting to do forensics on a program by searching through already-compiled code for human-readable hints of what does what, being on the right version of documentation for things is kind of important.

     

    Furthermore, if you check the optix_prime.1.dll you will find the string:

    OptiX Version:[5.0.1] Branch:[rel5.0] Build Number:[23995567] CUDA Version:[9.0] 64-bit 2018-04-24

    That's because OptiX Prime is technically a subset API of whichever version of OptiX it was compiled under (optix_prime.1.dll is just a precompiled library of raytracing specialized subroutines written using the full OptiX API.)
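
    For anyone who wants to verify that string themselves, here is a rough Python sketch of a poor man's "strings" scan over the DLL (the install path is an assumption - point it at wherever your copy of DS keeps optix_prime.1.dll):

        # Scan optix_prime.1.dll for printable text containing "OptiX Version".
        # The DLL path below is an assumed default install location - adjust as needed.
        import re

        DLL_PATH = r"C:\Program Files\DAZ 3D\DAZStudio4\libs\optix_prime.1.dll"  # assumption

        with open(DLL_PATH, "rb") as f:
            data = f.read()

        # Pull out runs of printable ASCII at least 8 characters long.
        for chunk in re.findall(rb"[\x20-\x7e]{8,}", data):
            text = chunk.decode("ascii")
            if "OptiX Version" in text:
                print(text)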

    In any case, the reason why the presence of that particular version string in the log file is significant to all this is that if you initiate a full Iray render in a mode where it has been thoroughly documented that OptiX Prime acceleration is indeed the only OptiX-related thing being used (example: Photoreal mode with OptiX Prime Acceleration enabled), Iray reports confirmation of this back to DS like this:

    2019-02-08 15:10:11.787 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Using OptiX Prime ray tracing (5.0.1).

    With no mention of just OptiX being used alone. Whereas if you execute a render in the one mode where it has been (less) thoroughly documented that Iray uses OptiX for the entire rendering process rather than just OptiX Prime's ray-tracing (Interactive mode with at least one Nvidia GPU selected as an active render device), Iray reports confirmation of this back to DS like this:

    2019-02-05 13:04:09.694 Iray INFO - module:category(IRT:RENDER):   1.0   IRT    rend info :   OptiX Version:[5.0.1] Branch:[rel5.0] Build Number:[23995567] CUDA Version:[9.0] 64-bit 2018-04-24 

     With no mention of OptiX Prime being used for any part of the process. The only explanation for this difference in logging behavior is if the latter render case uses full OptiX while the former doesn't.
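
    If you want to check which of those two lines your own renders produce, a quick Python sketch like this will pull the relevant entries out of the Daz Studio log (the log path is the usual Windows location, but treat it as an assumption for your install):

        # Print only the OptiX-related lines from the Daz Studio log so the
        # Photoreal vs. Interactive cases are easy to compare side by side.
        import os

        LOG_PATH = os.path.expandvars(r"%AppData%\DAZ 3D\Studio4\log.txt")  # assumed location

        with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
            for line in log:
                if "OptiX Prime ray tracing" in line or "OptiX Version:" in line:
                    print(line.rstrip())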

    Finally, in order to statically link something you need to have either:

    1. Full source code

    OR

    2. Static library (.lib) file

    The OptiX SDKs (5 and 6) include neither.

    Iray and OptiX (at least up until the most recent version of Iray used in DS Beta) have both been in-house productions at Nvidia. Meaning that Iray's developers clearly have had full access to OptiX's source code (beyond what you or I can get with an SDK) from the get-go.

    Furthermore, if you dig through the annals of Iray-related documentation on places like on-demand-gtc.gputechconf.com, you will find mentions of OptiX being used as a complete rendering alternative to Cuda in Iray Interactive mode going back years (example: this slide from a presentation on Advanced OptiX Programming techniques given at GTC back in 2014):

    Notice how it contrasts Iray running on "OptiX" in Interactive mode vs. CUDA in Photoreal. OptiX Prime can't be used as a substitute for CUDA-based rendering because it (by design) only handles ray-tracing. The only thing called OptiX capable of doing something like that is the full OptiX API.

    Post edited by RayDAnt on
  • RayDAnt Posts: 1,120
    edited February 2019

    >With no mention of OptiX Prime being used for any part of the process.
    >The only explanation for this difference in logging behavior is if the latter render case uses full OptiX while the former doesn't.

    >I have literally shown you that the latter logging string is coming from the Prime library and as such can't be the proof that full OptiX is used.

    I've been doing some more digging, and it seems that we are actually both kind of right.

    You are correct in that the logging string I found does indeed come directly from optix_prime.1.dll after all (despite "Prime" misleadingly appearing nowhere in the string itself...) If you take the optix_prime.1.dll found in DS 4.10 (OptiX Prime Library version 3.9.1.0), replace it with the optix_prime.1.dll found in the current DS 4.11 Beta (OptiX Prime Library version 5.0.1.0) and then render a scene in DS 4.10 using Interactive Rendering Mode on an Nvidia GPU (which renders perfectly fine, by the way), you get this in the log file:

    2019-02-09 22:03:36.101 Iray INFO - module:category(IRT:RENDER):   1.0   IRT    rend info :   OptiX Version:[5.0.1] Branch:[rel5.0] Build Number:[23995567] CUDA Version:[9.0] 64-bit 2018-04-24 

    I.e. the version string has changed to match the newer OptiX Prime DLL. However, you also get this right after it:

    2019-02-09 22:03:36.101 WARNING: dzneuraymgr.cpp(307): Iray WARNING - module:category(IRT:RENDER):   1.0   IRT    rend warn : OptiX Prime versions do not match (library: 50001, internal: 3091), behavior may be undefined

    Notice the use of the word "internal". Meaning that there is version-specific OptiX and/or OptiX Prime code statically linked somewhere else in Iray's runtime files outside of the OptiX Prime Library DLL. Presumably somewhere inside libirt.dll specifically, since that is where actual rendering takes place when Iray's Interactive mode is selected (see this section of the "Iray Programmer's Manual".)
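
    As a side note, the two library versions mentioned above (3.9.1.0 in 4.10, 5.0.1.0 in the 4.11 Beta) can be read straight out of the DLLs' version resources. A hedged sketch using the third-party pywin32 package - both paths are assumptions about default install locations:

        # Read the FileVersion resource of each optix_prime.1.dll with pywin32
        # (pip install pywin32). Paths are assumed default DS install locations.
        import win32api

        def dll_file_version(path):
            info = win32api.GetFileVersionInfo(path, "\\")
            ms, ls = info["FileVersionMS"], info["FileVersionLS"]
            return f"{ms >> 16}.{ms & 0xFFFF}.{ls >> 16}.{ls & 0xFFFF}"

        for path in (
            r"C:\Program Files\DAZ 3D\DAZStudio4\libs\optix_prime.1.dll",               # DS 4.10 (assumed)
            r"C:\Program Files\DAZ 3D\DAZStudio4 Public Build\libs\optix_prime.1.dll",  # DS 4.11 Beta (assumed)
        ):
            print(path, dll_file_version(path))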

    >...attached file, it contains a list of imported functions in Iray libraries.

     

    Searching for signs of inlined code by looking at lists of DLL import functions makes no sense. By definition, imported functions come from sources external to a DLL. The only place where inlined code could logically be is in an export function, since those are the only functions whose code physically resides inside a DLL. Here is a list of all the export functions in libirt.dll:

    Ordinal  Virtual Address  Name
    01       0x3252b0         ?add@ThreadPool@TaskSchedulerTBB@embree@@QEAAXAEBV?$Ref@UTaskSchedulerTBB@embree@@@3@@Z
    02       0x325360         ?addScheduler@TaskSchedulerTBB@embree@@CAXAEBV?$Ref@UTaskSchedulerTBB@embree@@@2@@Z
    03       0x325370         ?allocThreadIndex@TaskSchedulerTBB@embree@@QEAA_JXZ
    04       0x325630         ?execute_local@TaskQueue@TaskSchedulerTBB@embree@@QEAA_NAEAUThread@23@PEAUTask@23@@Z
    05       0x325790         ?instance@TaskSchedulerTBB@embree@@CAPEAU12@XZ
    06       0x3258c0         ?remove@ThreadPool@TaskSchedulerTBB@embree@@QEAAXAEBV?$Ref@UTaskSchedulerTBB@embree@@@3@@Z
    07       0x325970         ?removeScheduler@TaskSchedulerTBB@embree@@CAXAEBV?$Ref@UTaskSchedulerTBB@embree@@@2@@Z
    08       0x325a90         ?run@Task@TaskSchedulerTBB@embree@@QEAAXAEAUThread@23@@Z
    09       0x325d90         ?startThreads@TaskSchedulerTBB@embree@@CAXXZ
    10       0x325db0         ?startThreads@ThreadPool@TaskSchedulerTBB@embree@@QEAAXXZ
    11       0x325f10         ?swapThread@TaskSchedulerTBB@embree@@CAPEAUThread@12@PEAU312@@Z
    12       0x325f40         ?thread@TaskSchedulerTBB@embree@@CAPEAUThread@12@XZ
    13       0x325f60         ?threadCount@TaskSchedulerTBB@embree@@SA_KXZ
    14       0x325f70         ?threadIndex@TaskSchedulerTBB@embree@@SA_KXZ
    15       0x326330         ?wait@TaskSchedulerTBB@embree@@SA_NXZ
    16       0x02a0a0         mi_plugin_factory
    17       0x317110         rtcCommit
    18       0x317200         rtcCommitThread
    19       0x3173d0         rtcDebug
    20       0x3173f0         rtcDeleteDevice
    21       0x3174f0         rtcDeleteGeometry
    22       0x317680         rtcDeleteScene
    23       0x317780         rtcDeviceGetError
    24       0x3177b0         rtcDeviceNewScene
    25       0x3178f0         rtcDeviceSetErrorFunction
    26       0x3179e0         rtcDeviceSetMemoryMonitorFunction
    27       0x317a00         rtcDeviceSetParameter1i
    28       0x317b30         rtcDisable
    29       0x317d20         rtcEnable
    30       0x317f10         rtcExit
    31       0x318030         rtcGetError
    32       0x318060         rtcGetUserData
    33       0x318200         rtcInit
    34       0x318350         rtcInterpolate
    35       0x318560         rtcInterpolateN
    36       0x3187a0         rtcIntersect
    37       0x3187d0         rtcIntersect16
    38       0x3188b0         rtcIntersect4
    39       0x3188e0         rtcIntersect8
    40       0x3189c0         rtcMapBuffer
    41       0x318bc0         rtcNewDevice
    42       0x318c40         rtcNewHairGeometry
    43       0x318d40         rtcNewInstance
    44       0x318f90         rtcNewLineSegments
    45       0x319090         rtcNewQuadMesh
    46       0x319190         rtcNewScene
    47       0x319200         rtcNewSubdivisionMesh
    48       0x319340         rtcNewTriangleMesh
    49       0x319440         rtcNewUserGeometry
    50       0x319530         rtcNewUserGeometry2
    51       0x319620         rtcOccluded
    52       0x319650         rtcOccluded16
    53       0x319730         rtcOccluded4
    54       0x319760         rtcOccluded8
    55       0x319840         rtcSetBoundaryMode
    56       0x319a40         rtcSetBoundsFunction
    57       0x319c40         rtcSetBoundsFunction2
    58       0x319e40         rtcSetBuffer
    59       0x31a060         rtcSetDisplacementFunction
    60       0x31a260         rtcSetErrorFunction
    61       0x31a290         rtcSetIntersectFunction
    62       0x31a490         rtcSetIntersectFunction16
    63       0x31a690         rtcSetIntersectFunction4
    64       0x31a890         rtcSetIntersectFunction8
    65       0x31aa90         rtcSetIntersectionFilterFunction
    66       0x31ac90         rtcSetIntersectionFilterFunction16
    67       0x31ae90         rtcSetIntersectionFilterFunction4
    68       0x31b090         rtcSetIntersectionFilterFunction8
    69       0x31b290         rtcSetMask
    70       0x31b490         rtcSetMemoryMonitorFunction
    71       0x31b4c0         rtcSetOccludedFunction
    72       0x31b6c0         rtcSetOccludedFunction16
    73       0x31b8c0         rtcSetOccludedFunction4
    74       0x31bac0         rtcSetOccludedFunction8
    75       0x31bcc0         rtcSetOcclusionFilterFunction
    76       0x31bec0         rtcSetOcclusionFilterFunction16
    77       0x31c0c0         rtcSetOcclusionFilterFunction4
    78       0x31c2c0         rtcSetOcclusionFilterFunction8
    79       0x31c4c0         rtcSetParameter1i
    80       0x31c530         rtcSetProgressMonitorFunction
    81       0x31c620         rtcSetTransform
    82       0x31cbc0         rtcSetUserData
    83       0x31cdc0         rtcUnmapBuffer
    84       0x31cfc0         rtcUpdate
    85       0x31d1b0         rtcUpdateBuffer

    Considering both that mi_plugin_factory is the only function in this list that's not a part of the Embree API (what evidently powers Interactive Mode rendering on Intel CPUs) and that it seems to be roughly the same size as all the rest of these functions put together (at least judging by VA offsets) I'd venture to guess that the phantom OptiX/OptiX Prime code is in there. But barring having access to libirt.dll's uncompiled source code I doubt there's much more anyone can discover on the matter.
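
    For anyone who wants to poke at this themselves, a listing like the one above can be reproduced with the third-party pefile module (this is not necessarily how the table above was generated - it is just one way to get the same data, and the libirt.dll path is an assumption):

        # Walk libirt.dll's export directory with pefile (pip install pefile).
        import pefile

        pe = pefile.PE(r"C:\Program Files\DAZ 3D\DAZStudio4 Public Build\libs\libirt.dll")  # assumed path

        for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
            name = exp.name.decode() if exp.name else "<unnamed>"
            print(f"{exp.ordinal:02d}  0x{exp.address:06x}  {name}")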

    Ultimately one thing is clear: Iray's premier raytraced rendering solution, Photoreal, has never utilized the full OptiX API (electing instead to rely on its own native CUDA code in order to exploit performance gains on previous-generation hardware.) And getting it integrated enough to take advantage of RTX acceleration features like RTCores is undoubtedly going to take a huge amount of code restructuring. But I have little doubt that those changes will come. Iray is an industry-standard piece of software. Its makers have neither the motivation nor the latitude to leave it unadapted to new tech. But it will take some time.

    Post edited by RayDAnt on
  • LenioTG Posts: 2,118

    Does anyone have an RTX 2060? I'd like to see how it performs in Daz Studio! :D

  • Question for people with a 1070 and above graphics cards: what was the maximum power consumption you hit while rendering?

    I am planning on building a  new PC as an upgrade from my gaming laptop for rendering. Appreciate any inputs from the community. 

  • RayDAnt Posts: 1,120
    rinkuchal said:

    Question for people with a 1070 and above graphics cards: what was the maximum power consumption you hit while rendering?

    I am planning on building a  new PC as an upgrade from my gaming laptop for rendering. Appreciate any inputs from the community. 

    3d rendering in software like Daz Studio is pretty much a power virus. Meaning that the sum of the max TDPs for each and every major system component in your build (plus approximately 100-200 watts for minor components and efficiency headroom) is what you should budget for when selecting a PSU.

    So eg. for a single GPU build with a 1070 (150 watts TDP) and a previous gen i5-8600K processor (95 watts TDP) you should be looking for something in the neighborhood of 400 watts.
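
    If it helps, here is that budgeting rule as a minimal Python sketch (the part names and TDP figures are just the example numbers from above):

        # PSU sizing rule of thumb: sum of component max TDPs plus 100-200 W headroom.
        def psu_budget_watts(component_tdps, headroom=150):
            """Return a suggested PSU wattage."""
            return sum(component_tdps.values()) + headroom

        build = {
            "GTX 1070": 150,   # Nvidia's rated TDP in watts
            "i5-8600K": 95,    # Intel's rated TDP in watts
        }

        print(psu_budget_watts(build))  # 150 + 95 + 150 = 395 -> shop for ~400 W or more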

    Honestly, what with the current state of positive price/quality trends in the PSU market, you'd probably be far better off future-proofing by simply getting a power-efficient 500 watt or 750 watt (in case dual GPU setups ever end up being in your future) PSU for around 100-120 bucks and simply calling it a day. 

  • junk Posts: 1,230
    edited February 2019
    rinkuchal said:

    Question for people with a 1070 and above graphics cards: what was the maximum power consumption you hit while rendering?

    I am planning on building a  new PC as an upgrade from my gaming laptop for rendering. Appreciate any inputs from the community. 

    Heck, I just bought a Corsair HX1200 80 PLUS PLATINUM power supply for $93.69 total on eBay, NEW. I needed it for my three video cards inside my machine, but still, at this price it would give room for you to grow as you possibly add more cards.

    The seller is "2012yoshi46" but that particular auction just ended with still 10 available.  They'll probably relist very soon as I just bought it on Monday of this week.

    Post edited by junk on
  • Newegg has a lot of 1200+ watt power supplies at affordable prices (less than $300.00)

    I tend to go for as big as my budget will allow when I build 

    I would rather have more than I need at the time that way I can expand a little at a time later

  • rinkuchal Posts: 38
    edited February 2019
    RayDAnt said:
    rinkuchal said:

    Question for people with a 1070 and above graphics cards: what was the maximum power consumption you hit while rendering?

    I am planning on building a  new PC as an upgrade from my gaming laptop for rendering. Appreciate any inputs from the community. 

    3d rendering in software like Daz Studio is pretty much a power virus. Meaning that the sum of the max TDPs for each and every major system component in your build (plus approximately 100-200 watts for minor components and efficiency headroom) is what you should budget for when selecting a PSU.

    So eg. for a single GPU build with a 1070 (150 watts TDP) and a previous gen i5-8600K processor (95 watts TDP) you should be looking for something in the neighborhood of 400 watts.

    Honestly, what with the current state of positive price/quality trends in the PSU market, you'd probably be far better off future-proofing by simply getting a power-efficient 500 watt or 750 watt (in case dual GPU setups ever end up being in your future) PSU for around 100-120 bucks and simply calling it a day. 

     

    junk said:
    rinkuchal said:

    Question for people with a 1070 and above graphics cards: what was the maximum power consumption you hit while rendering?

    I am planning on building a  new PC as an upgrade from my gaming laptop for rendering. Appreciate any inputs from the community. 

    Heck, I just bought a Corsair HX1200 80 PLUS PLATINUM power supply for $93.69 total on eBay, NEW. I needed it for my three video cards inside my machine, but still, at this price it would give room for you to grow as you possibly add more cards.

    The seller is "2012yoshi46" but that particular auction just ended with still 10 available.  They'll probably relist very soon as I just bought it on Monday of this week.

     

    Newegg has a lot of 1200+ watt power supplies at affordable prices (less than $300.00)

    I tend to go for as big as my budget will allow when I build 

    I would rather have more than I need at the time that way I can expand a little at a time later

    Thanks for the replies and advice. My situation is a bit more price-sensitive since I'll need to deal with import tariffs and forex rates. Thanks again.

    Post edited by rinkuchal on
  • RayDAnt Posts: 1,120
    rinkuchal said:

    Thanks for the replies and advice. My situation is a bit more price-sensitive since I'll need to deal with import tariffs and forex rates. Thanks again.

    Keep in mind that a relatively high-wattage/quality ATX power supply will routinely last you ten years. And there are no constant tech upgrades in the PSU space (unlike all other parts of the PC building market.) Meaning that a good/powerful PSU is the one component that is pretty much guaranteed to last you into your next build.

    Eg. when the decent quality 450 watt PSU in my first PC build (a Core2Duo-based system from 2008) finally died about three years ago, I decided to replace it with a higher quality 750 watt one  for around 45+ USD despite having no immediate plans to upgrade anything. Fast forward to today and that same 750 watt PSU is now happily chugging along in an i7-8700K/Titan RTX build with loads of headroom to spare.

  • Paradigm Posts: 421
    RayDAnt said:

    Fast forward to today and that same 750 watt PSU is now happily chugging along in an i7-8700K/Titan RTX build with loads of headroom to spare.

    Can confirm, my 750 runs an AMD 2700X, 32 GB RAM, a 2080 Ti and two 4K monitors at once with no problem.

  • RayDAnt Posts: 1,120
    edited February 2019

    I have noticed something interesting when testing with one of my scenes:

    2019-02-22 21:42:41.331 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(304): Iray ERROR - module:category(IRAY:RENDER):   1.0   IRAY   rend error: Architectural sampler has been removed and is no longer supported.

    That means we won't be having Architectural Sampler option in the next version of DAZ Studio.

    Fyi the architectural sampler feature was officially removed from Iray more than 7 months ago (see this post from the official DS 4.11 Beta development thread for details.)

     

    UPDATE:
    Tried some benchmarking scenes (OptiX On) in DS 4.11 Beta both with and without those modded libiray.dll and optix_prime.1.dll DLLs using the Titan RTX.

    SickleYield - Normal DLLs - OptiX Prime On - Total Rendering Time: 1 minutes 4.58 seconds
    SickleYield - Modded DLLs - OptiX Prime On - Total Rendering Time: 1 minutes 4.12 seconds

    outrider42 - Normal DLLs - OptiX Prime On - Total Rendering Time: 5 minutes 10.83 seconds
    outrider42 - Modded DLLs - OptiX Prime On - Total Rendering Time: 5 minutes 7.49 seconds

    DAZ_Rawb - Normal DLLs - OptiX Prime On - Total Rendering Time: 4 minutes 24.27 seconds
    DAZ_Rawb - Modded DLLs - OptiX Prime On - Total Rendering Time: 4 minutes 24.52 seconds

    Aala 1k - Normal DLLs - OptiX Prime On - Total Rendering Time: 1 minutes 30.98 seconds
    Aala 1k - Modded DLLs - OptiX Prime On - Total Rendering Time: 1 minutes 31.24 seconds

    Going by these results, there are no significant gains to be had from using the latest version of the OptiX Prime acceleration library with Iray on RTX Turing hardware. So for those still waiting to see what Turing is fully capable of for rendering, it looks like you're just gonna have to keep waiting until full RTCore support comes around.
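
    For anyone who wants the same comparison as percentages, here is a small Python sketch using the seconds totals copied from the results above (the deltas all come out to roughly 1% or less either way):

        # Compare normal vs. modded DLL render times from the results above.
        results = {  # scene: (normal DLLs, modded DLLs), in seconds
            "SickleYield": (64.58, 64.12),
            "outrider42":  (310.83, 307.49),
            "DAZ_Rawb":    (264.27, 264.52),
            "Aala 1k":     (90.98, 91.24),
        }

        for scene, (normal, modded) in results.items():
            delta = (normal - modded) / normal * 100
            print(f"{scene:12s} {delta:+.2f}% faster with the modded DLLs")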

    Post edited by RayDAnt on
  • LenioTG Posts: 2,118

    Does somebody have a RTX 2060 or a GTX 1660 Ti? I'd like to see how they perform! :D

    I'm saving for them...anything higher is out of my reach! xD

  • bailaowai Posts: 44

    Man, it sure would have been nice if this thread hadn't been hijacked with some "new benchmark" halfway through, and then apparently again just recently. That, together with the massive amount of technical discussion and debate, makes it really tough to get useful information out of it. Someone smarter than me should create a brand-new thread with exactly one benchmark file/process, and encourage people to post ONLY results, so it would become possible to actually go back and compare results between cards. I have a 980 Ti and would like to get a sense of what improvement I'd see by going to a 2080 Ti. I know I've seen 980 Ti results somewhere back in the 37 pages of this thread. There are many 2080 Ti results. I have no idea if they are the same benchmark.

  • ebergerly Posts: 3,255

     

    bailaowai said:

    Man, it sure would have been nice if this thread hadn't been hijacked with some "new benchmark" halfway through, and then apparently again just recently. That, together with the massive amount of technical discussion and debate, makes it really tough to get useful information out of it. Someone smarter than me should create a brand-new thread with exactly one benchmark file/process, and encourage people to post ONLY results, so it would become possible to actually go back and compare results between cards. I have a 980 Ti and would like to get a sense of what improvement I'd see by going to a 2080 Ti. I know I've seen 980 Ti results somewhere back in the 37 pages of this thread. There are many 2080 Ti results. I have no idea if they are the same benchmark.

    Yeah, I hear ya...

    Honestly I pretty much gave up on this whole RTX thing maybe 6 months ago when it became clear that it's such an incredibly complex set of interconnected features that nobody understood, and it would require many months for it to all come to a point where we could figure out exactly what benefit it would have. IMO, it's been all hype and rumor and misunderstanding since then. And a whole lot of internet tech folks who desperately want it to be awesome. 

    Anyway, I'd suggest you come back maybe in 6 months or so, because it's still not ready for prime time IMO. Yeah, it has a lot of potential, but there's like 15 "but it depends..." attached to it all. 

    And if you want a splash of cold water in your face, take a look at the prices of the high end RTX cards right now. That's why I say check back in 6 months. Maybe the prices will get more realistic and the actual data on render time improvement will be available by then. 

     

  • bluejaunte Posts: 1,861

    Price relative to rendering performance in Iray is actually about the same as a 1080 Ti, considering the 2080 Ti costs roughly twice as much at the moment but also renders twice as fast.

  • ebergerly Posts: 3,255

    bluejaunte said:

    Price relative to rendering performance in Iray is actually about the same as a 1080 Ti, considering the 2080 Ti costs roughly twice as much at the moment but also renders twice as fast.

    True, but I'm guessing that most of us wouldn't pay $1,300 for a graphics card even if it rendered our scene before we asked it to. :)

     

  • RayDAnt Posts: 1,120
    edited March 2019
    bailaowai said:

    Man, it sure would have been nice if this thread hadn't been hijacked with some "new benchmark" halfway through, and then apparently again just recently.

    Oh, it's much worse than that. :) If you go all the way through it, this thread is a nearly hopeless hodgepodge of 4 completely different benchmarking scenes. (I actually went post-by-post through the entire thread a couple weeks ago in the hopes that I could create some sort of master chart depicting the relative performance of different pieces of hardware from it - and the reason why I haven't posted parts of my 800+ line spreadsheet yet is that there simply isn't enough contextual data in the vast majority of it - like which version of DS/drivers/even operating system people were benching on at the time, whether OptiX acceleration was enabled, etc. - to understand what's going on.)

    Someone smarter than me should create a brand-new thread with exactly one benchmark file/process, and encourage people to post ONLY results, so it would become possible to actually go back and compare results between cards.

    Not that I'm claiming to be smarter than you, but I'm already most of the way there. I already have the whole testing methodology (much shorter than the one I already posted in this thread earlier) worked out and everything. Pretty much the only thing holding me back at this point is picking a scene to use for the actual benchmark. All the existing ones have issues (primarily regarding render completion limits) that need to be tweaked in order for them to function as expected across differently performing systems/graphics cards. And I'm hesitant to attempt making one from scratch myself since I honestly don't know what people would find most useful to see content-wise in a scene meant to represent a typical DS workload (suggestions, anyone?) Not to mention simply reusing one of the existing ones with tweaks is risky, since it might lead people to mistakenly benchmark with a prior version of the scene because the two look so similar.

    Post edited by RayDAnt on
  • ebergerly Posts: 3,255

    While it's honorable to try to compile all of this (I did a bit of that myself by making a spreadsheet of GTX rendertimes from the old benchmark - which, BTW, was generally ignored AFAIK), I think doing that with RTX is a bit like chasing a ghost :)

    My reason is this:

    RTX comprises a bunch of SEPARATE new technologies, using newly separated hardware and software components which used to be all together:

    • AI-accelerated features (NGX)
    • Asset formats (USD and MDL)
    • Rasterization including advanced shaders
    • Raytracing via OptiX, Microsoft DXR and Vulkan
    • Simulation tools:
      • CUDA 10
      • Flex
      • PhysX

    And those affect different aspects of a scene to be rendered. And they're all at different stages of development, implementation, and integration. And your scene might take advantage of some of those things, but not others, and that will ultimately affect render time. Which is why I suggest that people wait 6 months or so to see how this all shakes out. As well as the exorbitant prices shaking out, at least hopefully...

  • LenioTG Posts: 2,118

    It would be nice to see your spreadsheets!

    Yeah, we need a thread with just results and no discussions!

    Remember not to make the benchmark scene bigger than 2 GB, otherwise some GPUs couldn't run the test, and I don't know about you, but some of us are still restricted to low-to-mid-end GPUs! Again, if someone has an RTX 2060 or a GTX 1660 Ti, it would be nice to see how they perform xD

  • Ribanok Posts: 12
    kameneko said:

    ... Again, if someone has an RTX 2060 or a GTX 1660 Ti, it would be nice to see how they perform xD

    Here you are :)

    My system: i5-3570K @ 4.2 GHz; 8 GB RAM; MSI Ventus XS GeForce RTX 2060; Windows 7 64-bit; DAZ 4.11 Beta; Nvidia driver 419.35

    The benchmarks are done using SickleYield's Test. I didn't change anything in the scene; each time it rendered until 100% (5000 iterations reached).

     

    --- No overclocking ---
    CPU Clock: Default (3.4 GHz)
    GPU Clock: Default (average ~1900 MHz in GPU-Z during benchmark)
    Video Mem: Default (1700 MHz in GPU-Z)

                          OptiX Enabled
    i5-3570K              1 hours 15 minutes 51.53 seconds
    RTX 2060              2 minutes 12.52 seconds

     

    --- Overclocking CPU ---
    CPU Clock: 4.2 GHz
    GPU Clock: Default (average ~1900 MHz in GPU-Z during benchmark)
    Video Mem: Default (1700 MHz in GPU-Z)

                          OptiX Enabled             OptiX Disabled
    i5-3570K              1 hours 11.93 seconds
    RTX 2060              2 minutes 10.44 seconds   2 minutes 48.96 seconds
    i5-3570K + RTX 2060   2 minutes 53.4 seconds


    --- Overclocking CPU and GPU ---
    CPU Clock: 4.2 GHz
    GPU Clock: +80 MHz in MSI Afterburner (average ~1980 MHz in GPU-Z during benchmark)
    Video Mem: Default (1700 MHz in GPU-Z)

                          OptiX Enabled
    RTX 2060              2 minutes 9.51 seconds


    --- Overclocking CPU, GPU and Video Memory ---
    CPU Clock:   4.2 GHz
    GPU Clock:   +80 MHz   (average ~1980 MHz in GPU-Z during benchmark)
    Video Mem: +1200 MHz in MSI Afterburner (2000 MHz in GPU-Z)

                          OptiX Enabled
    RTX 2060              2 minutes 1.44 seconds
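
    And for anyone curious about the CPU-vs-GPU gap in those numbers, here is a small Python sketch that converts the time strings above into seconds and works out the speedup (times copied from the stock-clock results):

        # Parse "X hours Y minutes Z seconds" strings and compute the speedup.
        import re

        def to_seconds(s):
            units = {"hours": 3600, "minutes": 60, "seconds": 1}
            return sum(float(v) * units[u]
                       for v, u in re.findall(r"([\d.]+) (hours|minutes|seconds)", s))

        cpu = to_seconds("1 hours 15 minutes 51.53 seconds")   # i5-3570K, OptiX enabled
        gpu = to_seconds("2 minutes 12.52 seconds")            # RTX 2060, OptiX enabled

        print(f"RTX 2060 is roughly {cpu / gpu:.0f}x faster than the i5-3570K here")  # ~34x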

  • RayDAnt Posts: 1,120
    edited March 2019
    kameneko said:

    It would be nice to see your spreadsheets!

    Will see about putting them out there (probably as a publicly viewable Google spreadsheet) once time permits. I must caution, though, that they REALLY aren't very helpful since the numbers are so inconsistent - mostly stemming from multiple builds of Daz Studio/Iray being tested across multiple OS/driver versions without any documentation of which was used where (the downside of a spur-of-the-moment benchmark thread staying active 5+ years...)

    Yeah, we need a thread with just results and no discussions!

    Imo discussions are fine so long as the first post in the thread has a regularly updated summary of the benchmarking results posted after it (which is my game plan for the new benchmarking thread I'm working on.)

    Remember to not make the benchmark scene bigger than 2Gb, otherwise some GPU couldn't run the test, and I don't know about you, but some of us are still restricted to low-mid end GPUs!

    My current rendering systems are a GTX 1050 2GB based Surface Book 2 laptop and a Titan RTX 24GB based Homebrew desktop. And one of the main requirements I've set for the new benchmarking scene I'm currently developing (already well into the testing phase fwiw) is that it needs to be fully renderable on both/anything in between.

    Post edited by RayDAnt on
  • ebergerly Posts: 3,255
    edited March 2019

    FWIW, here's a quick update to my spreadsheet showing the RTX 2060 and (I think) RTX Titan numbers from the recent pages with the previous GTX numbers, all from the Sickleyield scene (I think). One of the results didn't include info on system configuration, so I had to do some detective work. I also did a quick look at newegg to get prices of GPUs and added those. If you disagree with any of the numbers feel free to generate your own spreadsheet and post it. 

    And like I said, use these numbers at your own risk. The RTX technology still has a ways to go before we really know what it can do with DAZ stuff, and there are also enough variables with the setup of the scene to make the numbers (IMO) only good to within, say, +/- 15 seconds or more. For example, as I recall, if you did a render, then repeated it, the second one would be significantly faster. Not sure if stuff like that was included in everyone's procedure.

    BTW, I'm feeling pretty good that my GTX-1080ti plus GTX-1070 are only about 15 seconds slower than a $2,500 RTX Titan. I think I paid less than half of that for both. :D

    BenchmarkNewestRTXCost.jpg
    Post edited by ebergerly on
  • outrider42 Posts: 3,679

    And there lies the problem with the original scene, where the top cards are all just a second or two apart. Where a 1080ti+1070 is "only" 15 seconds slower than a Titan RTX. The times are getting so small that the slightest deviation in render time results in a highly skewed result. Is a 1080ti+1070 really just 15 seconds slower than a 2080ti? This chart also shows TWO 1080ti's rendering in the same time as the 1080ti+1070 combo? Something is afoot. Though my machine with two 1080ti's renders the SY scene in almost exactly 1 minute, give or take a couple seconds. I hit 58 seconds a few times, 1 minute and 4 seconds other times. There is a 6 second variance, which on this chart results in a sizable performance gap depending on which time I quote.

    That is why I posted my benchmark. A lot of things had changed from 2015. G3 and G8 released, plus Daz Iray had all new features, such as dual lobe specularity which was not present in the 2015 bench. I made use of these new features. Gaming benchmark suites are updated all the time to take advantage of new features, like DirectX 12 or a new game engine. Also, gaming benchmarks vary wildly to say the least. I just saw a bench suite of over 30 games between several GPUs on Hardware Unboxed, and the results are almost never the same between games. One GPU could have a massive 44% lead over a certain GPU in one game, but actually lose by 12% to that same GPU in another. Having a variety of benchmarks to provide balance is usually a good idea.

    Also, the way Iray calculates convergence CAN change between SDKs. This is a fact as provided by Migenius. And this becomes apparent when looking at the data of each SY scene when users list how many iterations were done. There are cases where people have wondered why their bench took longer or shorter going from 4.8 to 4.9 or 4.10 or 4.11. The answer lies in how many iterations were run before the scene stopped. The iteration count can vary quite a bit in some cases.

    Additionally, Iray can indeed run slightly different iteration counts from run to run, even with fresh boots, again skewing results. How does this happen? That is because the render engine only checks the convergence every so often. Thus it is possible for Iray to overrun the iteration count that it might normally do if it happens to check the convergence at the "wrong" time. However, a hard iteration count will stop Iray at that set number nearly every time.
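
    As a toy illustration of that overshoot (this is not Iray code, and the check interval is made up), a renderer that only tests convergence every N iterations can stop at slightly different counts from run to run, while a hard iteration cap always stops at the same number:

        # Toy model: convergence is only checked every `check_interval` iterations,
        # so the stop point can overshoot; a hard cap always stops at the same count.
        def iterations_until_stop(converged_at, check_interval, max_iterations=None):
            i = 0
            while True:
                i += 1
                if max_iterations is not None and i >= max_iterations:
                    return i  # hard cap: identical every run
                if i % check_interval == 0 and i >= converged_at:
                    return i  # convergence check: rounded up to the next check

        print(iterations_until_stop(converged_at=1203, check_interval=50))                       # 1250
        print(iterations_until_stop(converged_at=1203, check_interval=50, max_iterations=1000))  # 1000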

    And if Iray runs slower in that last 5% of convergence, isn't that even more evidence that convergence alone is not a great measurement?

    So this is why I feel very strongly that capping the iteration count so that it ends before convergence provides more consistent results.

    Still, having tests that can show both can have merit.

    Having different tests is fine because just like gaming, different scenes can task different elements of Iray.  Many other render engines have different benchmark scenes for this reason. Luxmark quickly comes to mind. The recent Radeon VII renders certain benchmark scenes MUCH faster than other cards. However, more challenging scenes greatly even out the field, to the point where some cards actually beat the VII. Why? It could be that some simple scenes allow the VII's insane memory bandwidth to run wild. So this proves that certain scenes can influence results, and that having a variety is a good idea.

    Most people using Daz Studio are using humans in their scenes. Thus having a human character in the test scene is very logical, and human skin can be one of the most challenging things to emulate in any render engine. So I think it is important that most bench scenes have a human in them. It's also the key difference between Daz Iray and nearly every other render engine out there. Most test scenes you find are strictly environments with no humans in them. For example, you will not find any humans in a Lux benchmark.

    I admit I was surprised how people started using Rawb's scene here...he did not even post his scene in this thread and people just adopted it. But it has a big drawback, with multiple Genesis 3 or 8 figures the scene may not work on VRAM strapped cards at all. Still, it is different from my or SY's scene in that it is brighter and has a large sheet of glass.

    The newest scene posted has no people, but offers lots of areas for light to bounce around and blend. So that could have merit.

    We know there are multiple elements to every render. The initial light pass, and the shading calculations. Different cards may excel at these differently, especially different generations of cards. So having different tests that can test these different aspects can be a good thing.

    And of course any scene has to use the items that come installed with Daz Studio and any Starter Essentials.

    Flipping back a bit, I have talked about the differences between OptiX and OptiX Prime. To repeat the notes from official documents, Prime needs to be recompiled for every new GPU architecture. That is why Pascal did not work when it launched. In fact, Daz would not even run with Pascal at all; it was far more than simply being able to choose OptiX ON or OFF, as that wasn't even an option. The same holds true for Turing. Only the Daz beta can utilize Turing, so at some point the beta's plugin had to have been updated for Volta and Turing.

    Prime cannot be updated to RTX. It is due to how Prime uses VRAM. Prime uses more VRAM than standard OptiX. The more rays, and the faster they are cast, the more GPU VRAM balloons in Prime. Apparently RTX cards shoot so many rays so fast that they would totally overflow just about every GPU's frame buffer. It is not a matter that it cannot be done, it is a matter that it would be unusable. This is why OptiX Acceleration does not work with RTX cards, and why the times are basically the same (or actually slower in most cases.) Standard OptiX does not do this. However, the big negative against standard OptiX is that it is a PURE GPU renderer. Standard OptiX will NOT fall back to CPU or use CPU in any way. Thus if you run out of VRAM with standard OptiX...you are unable to render at all! I posted this paragraph's info before, and included links to where it came from. But I am not searching through them again. Just know I am not making this information up or speculating in any way. This is how OptiX works.

    So this should be easy to test. All you need to do is create a scene and measure how much VRAM it is using through the help file. Test with OptiX ON and OFF. If there is a difference, we have a result. If you create a scene that is too large for your VRAM with OptiX OFF, does the render fall back to CPU? If it does, then we know for a fact that this is not standard OptiX, even if it uses less VRAM.
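
    One way to do that measurement without eyeballing the help pane is to poll nvidia-smi while the render runs, once with OptiX ON and once with it OFF, and compare the peaks. A hedged sketch (assumes nvidia-smi is on the PATH and you want GPU index 0):

        # Record peak GPU memory use while a render is running.
        import subprocess, time

        def gpu_memory_used_mib(gpu_index=0):
            out = subprocess.check_output([
                "nvidia-smi",
                f"--id={gpu_index}",
                "--query-gpu=memory.used",
                "--format=csv,noheader,nounits",
            ])
            return int(out.decode().strip())

        peak = 0
        print("Polling GPU memory - start the render, press Ctrl+C to stop.")
        try:
            while True:
                peak = max(peak, gpu_memory_used_mib())
                time.sleep(1)
        except KeyboardInterrupt:
            print(f"Peak GPU memory observed: {peak} MiB")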

    This is really interesting because if you have a scene that fails with OptiX ON, then perhaps that scene just might work with OptiX OFF. Food for thought.

    However, OptiX Prime is vital for Daz Studio, because obviously not everyone has a Nvidia GPU, and there are people who desire rendering scenes that do not fit on a GPU they may have. OptiX Prime was also important in helping make Iray a decently fast render engine for its time. So does OptiX 6 add a CPU fallback mode?

  • RayDAnt Posts: 1,120
    edited March 2019

    outrider42 said:

    [outrider42's post, quoted in full above]

    outrider42, for what it's worth, I think you're probably gonna like the new benchmarking scene I'm currently working on (mostly done with, actually - all I have left to do is calculating the best render limits) since it pretty much directly addresses every single one of your points here (including the one about small render deviations resulting in skewed results for top-end cards/multiple card combinations - more on this once I finish getting my testing methodology fully written up.)

     

    Flipping back a bit, I have talked about the differences between OptiX and OptiX Prime. To repeat the notes from official documents, Prime needs to be recompiled for every new GPU architecture. That is why Pascal did not work when it launched. In fact, Daz would not even run with Pascal at all; it was far more than simply being able to choose OptiX ON or OFF, as that wasn't even an option. The same holds true for Turing. Only the Daz beta can utilize Turing, so at some point the beta's plugin had to have been updated for Volta and Turing.

    To be clear, there is no explicit Turing support in any currently publicly available version of Daz Studio or Iray. With that said, rendering with Turing cards does currently work (in a performance-limited fashion) on the current 4.11 Beta. However this is only because Volta has forward compatibility with core elements of the Turing architecture, and Volta has been explicitly supported by Iray since last summer.

     

    Prime cannot be updated to RTX. It is due to how Prime uses VRAM. Prime uses more VRAM than standard OptiX. The more rays, and the faster they are cast, the more GPU VRAM balloons in Prime. Apparently RTX cards shoot so many rays so fast that they would totally overflow just about every GPU's frame buffer. It is not a matter that it cannot be done, it is a matter that it would be unusable. This is why OptiX Acceleration does not work with RTX cards, and why the times are basically the same (or actually slower in most cases.) Standard OptiX does not do this.

    While you are correct that OptiX Prime both uses more video memory and sometimes leads to measurable performance decreases on Turing hardware if enabled (depending on how many light sources/how much ray-tracing activity happens in a scene - at least that seems to be the pattern with my Titan RTX) the true reason for this latter phenomenon isn't what you think.

    OptiX Prime is a self-contained headless computer program (aka a DLL) which spits out ray-tracing calculations at the behest of a parent program (in this case Daz Studio's Iray render engine plugin.) It does this by either using general purpose "compute" processing power on an Nvidia GPU (via Nvidia's proprietary GPGPU Cuda API) or by using general purpose processing power on an Intel or AMD CPU (via Intel's non-proprietary ray-tracing API Embree.) The key detail here is the phrase using general purpose processing power. Although the data which OptiX Prime produces is fundamentally tied to graphics rendering, the actual computational workload of the program itself is the same as that of a general run-of-the-mill computer program - not part of a dedicated graphics rendering pipeline (like texture shading for instance.)

    This is significant because, in a first over predecessors like Pascal, Turing based GPUs feature physical hardware enhancements (concurrent FP & INT execution data paths in a 100:36 performance ratio) tailored towards accelerating specific parts of a dedicated graphics rendering pipeline (like texture shading) rather than general GPGPU compute workloads like OptiX Prime. To put it another way, Turing does a much better job of accelerating texture shading workloads than it does compute workloads. And since OptiX Prime is technically a compute workload, using OptiX Prime on Turing hardware can just as easily lead to decreased overall rendering performance as increased.

    OptiX Prime acceleration was conceived of during a time when pure graphics rendering and pure compute processing on Nvidia GPUs were about equal performance-wise. Turing GPUs mark a major paradigm shift away from that balance with their inclusion of ASIC-like accelerators like RTCores for ray-tracing and Tensor cores for deep learning. This makes past GPGPU-based solutions for doing the same/similar things (like OptiX Prime for RTCores) functionally obsolete. OptiX Prime acceleration is currently (and will remain for the foreseeable future) fully operational on Turing GPUs. It just doesn't necessarily improve performance anymore.

     

    However, OptiX Prime is vital for Daz Studio, because obviously not everyone has a Nvidia GPU, and there are people who desire rendering scenes that do not fit on a GPU they may have. OptiX Prime was also important in helping make Iray a decently fast render engine for its time. So does OptiX 6 add a CPU fallback mode?

    Fyi there is not now, nor has there ever been, any non-Nvidia GPU support in either OptiX, OptiX Prime, or even Iray itself. However there has been and continues to be OptiX Prime and Iray support for Intel and AMD CPUs via Embree. And I have yet to see anyone even hint at OptiX Prime acceleration being removed as an option in Daz Studio/Iray at some point. Which would make no logical sense, since it is still useful on a whole class of Iray-supported devices (CPUs) - it being effectively obsoleted on Turing hardware is an edge case.

    Post edited by RayDAnt on
  • ebergerly Posts: 3,255
    edited March 2019

    Some things to consider:

    1. Sickleyield posted her benchmark scene 4 years ago. Since then there have been a ton of kind folks who spent a lot of time posting results with their particular cards, which is why we have a nice list of relative performance data with different cards.
    2. Each time a new benchmark scene comes out, that process starts from scratch, and all previous data is effectively erased.
    3. As I recall, the results in the spreadsheet seem to scale up to much longer scenes. For example, if someone found that a 1080ti+1070 rendered Sickleyield's scene 35% faster than a 1080ti alone (1.3 minutes vs 2 minutes), that 35% improvement also happened with a much longer rendering scene. So even with relatively fast render times, they seem to apply to longer scenes. If there's doubt, just render the Sickleyield scene and then a longer scene to see if the differences are the same. 
    4. No matter what scene you use, you honestly can't rely on any render time results that get posted with any accuracy for many reasons:
      1. Maybe someone's system was throttling because it got too hot because they didn't clean the dust bunnies or messed with the BIOS or 20 other reasons.
      2. Maybe someone posted the times for the slower, first render of the scene, not the faster second render.
      3. Maybe they're reading different render times from different sources and not comparing apples to apples.
      4. Maybe they have different render settings, like convergence, etc.
      5. And so on...

    So at the end of the day, all you can do is look at reported render times and get a very general idea of relative (not exact) performance compared to other cards. And if an RTX Titan seems to render a scene only something like 30% faster than a combo of 1080ti and 1070 at more than twice the price, then the important takeaway (for me at least) is that yeah, the RTX series are really overpriced and not yet ready for primetime. Whether it's EXACTLY 15 seconds difference is somewhat irrelevant.

    Because keep in mind that many/most of us try to make our scenes so that they render relatively fast so we don't have to sit there for an hour waiting for a render. So the Sickleyield results may be more appropriate for many users. Especially since cards are so insanely overpriced lately, maybe (hopefully) folks are looking at stuff like scene management and compositing to make more lightweight scenes that render faster and more efficiently. 

    Post edited by ebergerly on