Maximum GPU RAM for GPU rendering in Iray

Hi,

This probably has been asked before, but anyway...

I currently have a PC with two Nvidia Titan GPUs with 12 GB of GPU RAM each. Because of the way Iray works, I can "only" render out scenes that fit within that 12 GB limit in one go, before Iray falls back to the CPU to render.

Hence the question: is there any way at all - no matter the cost or the hardware setup - to combine the GPU RAM of multiple GPUs? I ask because there appear to be rather expensive "render boxes" out there with something like 10 GPUs inside, but they only carry about 6 GB of GPU RAM each...

If GPU RAM can't be combined at all, is there perhaps a way (like in Blender) to have Iray render using the GPUs but load everything into system RAM? That would be great too.

 

Thanks a lot,

Me


Comments

  • Richard Haseltine Posts: 107,953

    Well, you could commission your own render engine if money is no object...

    More seriously, though still not cheap, the newer Nvidia GPUs, at least the high-end models, can combine RAM - though there are overheads in doing so, and I'm not sure anyone has tried it with Iray in DS.

  • Thanks!

    And which GPU models would those be? Is there a name for that technology that I should look for in the technical specs? And what do you mean by "overheads"?

  • Got it!!! That technology is called NVLINK!

  • kenshaw011267 Posts: 3,805

    Before you spend a lot of money, be aware that there is no report of anyone ever making it work in Iray.

  • Havos Posts: 5,575

    You might want to look into optimizing your scenes. Unless you are rendering at very high resolution (i.e. 4K x 4K or more), pretty much any scene, no matter how complex, should fit into 12 GB.

  • ebergerly Posts: 3,255

    Also, keep in mind the rarely mentioned impact of more GPU VRAM: the need for more system RAM to support it. A 12 GB scene in GPU VRAM will require 2 to 3 times that in system RAM (i.e. 24-36 GB) just to handle the raw scene data before it's optimized and compressed by Iray for the GPU. In my experience it's usually closer to 3x.
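    As a back-of-the-envelope illustration of the rule of thumb above (the 2-3x multiplier is anecdotal from this thread, not an official Iray figure), the arithmetic works out like this:

    ```shell
    # Rough system-RAM estimate for staging an Iray scene, using the
    # anecdotal 2x-3x multiplier mentioned above (not an official figure).
    vram_gb=12
    echo "low estimate:  $((vram_gb * 2)) GB system RAM"
    echo "high estimate: $((vram_gb * 3)) GB system RAM"
    ```

    So a pair of 12 GB cards pooled into 24 GB of VRAM would, by the same rule, want something in the 48-72 GB system RAM range.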

  • kyoto kid Posts: 41,847
    edited June 2019

    ...if money isn't a factor, there is the Titan RTX with 24 GB of VRAM for $2,500.

    Otoy is working on enabling NVLink to pool GPU memory in the Octane 4 beta; however, this will only work with Turing- or Volta-based GPUs.

    Prices for the Quadro RTX series have also come down from when they were first released.  The 24 GB RTX 6000 now retails for $4,000 (down from $6,300) while the 48 GB RTX 8000 has a list price of $5,500 (down from $10,000).

    https://techgage.com/news/nvidia-quadro-rtx-6000-rtx-8000-price-drops/

    Post edited by kyoto kid on
  • Padone Posts: 4,001
    edited June 2019

    Or just use render farms - they're cheap and easy...

    Post edited by Padone on
  • kyoto kid Posts: 41,847

    ...depends if it's OK with Daz to send assets to a third party.

  • Richard Haseltine Posts: 107,953
    kyoto kid said:

    ...depends if it's OK with Daz to send assets to a third party.

    For a render farm, yes, as long as the assets are not made available to other users of the service in the process. Most properly run render farms should be fine; some amateur set-ups might not be.

  • nicstt Posts: 11,715

    I have a simple test render of a character I'm working on; it is rendering on a 980 Ti. Studio is only using 9.7 GB, whereas Photoshop is using 11.4 GB.

    Studio often takes up 25-29 GB of RAM when it drops to CPU (and has probably taken more that I haven't noticed). I have 64 GB, so I'm not worried; I mention it because there is a cost to everything, which includes optimising, as has been mentioned.

    No matter how much RAM you have, you will eventually run out. :) This goes for both GPU and main system RAM - at least system RAM can be upgraded.

  • Daz Jack Tomalin Posts: 13,813
    kyoto kid said:

    ...depends if it's OK with Daz to send assets to a third party.

    For a render farm, yes, as long as the assets are not made available to other users of the service in the process. Most properly run render farms should be fine; some amateur set-ups might not be.

    Yeah, if you/they use the Iray Server software the content is stored in a single cache file - it's not something you can just open up and fish data out of.

  • RayDAnt Posts: 1,154
    edited July 2019

    Hi,

    This probably has been asked before, but anyway...

    I currently have a PC with two Nvidia Titan GPUs with 12 GB of GPU RAM each. Because of the way Iray works, I can "only" render out scenes that fit within that 12 GB limit in one go, before Iray falls back to the CPU to render.

    Hence the question: is there any way at all - no matter the cost or the hardware setup - to combine the GPU RAM of multiple GPUs? I ask because there appear to be rather expensive "render boxes" out there with something like 10 GPUs inside, but they only carry about 6 GB of GPU RAM each...

    The short answer to your question is yes. The secret is an alternate driver mode on Windows called "TCC", which is available on all Nvidia Quadro, Tesla, and Titan cards (starting with the 6 GB Kepler generation). Meaning that not only is it possible, the cards you already have right now should be perfectly capable of doing it.

    However, there are some complications to using your cards in this mode. The first is that TCC mode disables the associated cards' standard graphics capabilities, like driving displays. Meaning that, in a two-GPU system like yours, you will either need a third, lower-power GPU or an iGPU to drive your monitors. The second is that setting it up in Windows requires messing around with command-line tools - specifically the Nvidia SMI tool, found at:

    C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe

    as part of your standard Nvidia driver install. How to set it up and how it works isn't incredibly well documented ("TCC" mode and everything associated with it is thoroughly lost somewhere in Nvidia's business/server-farm technical support tree) but it should be perfectly doable with some tinkering. See this post for how to test for the base functionality (just ignore all the Turing/2080 Ti-specific talk).
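    For reference, the driver-model switch itself is done with nvidia-smi's -dm flag (the same command Iray Server's own log suggests). A minimal sketch, assuming device index 0 is the card you want dedicated to compute - run from an elevated command prompt, and note a reboot is needed for the change to take effect:

    ```shell
    # List detected GPUs with their index numbers
    nvidia-smi -L

    # Show the current driver model (WDDM or TCC) for each card
    nvidia-smi -q | findstr /C:"Driver Model"

    # Put device 0 into TCC mode (1 = TCC, 0 = WDDM);
    # needs admin rights and a reboot to take effect
    nvidia-smi -i 0 -dm 1
    ```

    Running `nvidia-smi -i 0 -dm 0` switches the card back to the normal WDDM model if you change your mind.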

     

    If GPU RAM can't be combined at all, is there perhaps a way (like in Blender) to have Iray render using the GPUs but load everything into system RAM? That would be great too.

    Unfortunately not - at least not while using Iray's Photoreal rendering mode (the only one worth using if your goal is high-quality renders) - unless all of your graphics cards AND your CPU have NVLink functionality built directly into them (believe it or not, NVLink-capable CPUs do exist...).

    Post edited by RayDAnt on
  • kenshaw011267 Posts: 3,805

    You keep making this claim but have yet to show that it actually works. Simply putting a card into TCC mode does not establish that it can do any NVLink capability. TCC strictly turns off the display outputs to keep Windows from reserving VRAM for use as a video buffer. Based on the Quadro documentation you should be able to run one card in TCC and one not, and still combine VRAM using NVLink.

  • Ivy Posts: 7,165
    edited July 2019

    You must be building huge, non-Iray-optimized scenes, because I render out complex animations all day long with two 1080 Ti 11 GB cards with no problems, and my scenes are pretty expansive. Even though I have two cards, Iray only combines their cores, not their GPU RAM, so I still only benefit from the extra cores, not from combined VRAM. If I have a scene so big that it calls for more GPU RAM than my system has, I use a render farm - it's not as expensive as you would think.

    Post edited by Ivy on
  • RayDAnt Posts: 1,154
    edited July 2019

    You keep making this claim but have yet to show that it actually works. Simply putting a card into TCC mode does not establish that it can do any NVLink capability. TCC strictly turns off the display outputs to keep Windows from reserving VRAM for use as a video buffer. Based on the Quadro documentation you should be able to run one card in TCC and one not, and still combine VRAM using NVLink.

    Um... nothing that I wrote about above has anything to do with NVLink. TCC mode shuts down a card's display output functionality and - more importantly, since this is how the VRAM pooling actually gets implemented - puts it into a peer-to-peer resource-sharing relationship (via PCI-E lanes and/or NVLink connectors, depending on the GPU generations involved) with any other TCC-enabled cards detected in the system. NVLink has no bearing on pre-Turing/Volta GPUs whatsoever, and the OP is dealing with pre-Turing/Volta Titan GPUs, where TCC P2P is implemented exclusively via PCI-E.

    Post edited by RayDAnt on
  • Thanks for all the replies!!! I am doing rather complex scenes with up to four G8 figures at SubD 4 and twelve Genesis 1 figures at SubD 0 as background characters. Before I start tweaking my drivers, could anybody please let me know if this also works with Daz Iray when I put my GPUs in TCC? Also, can I then use my motherboard's ports to hook up my displays?

    Thanks a lot,

    Me

  • nicstt Posts: 11,715

    Thanks for all the replies!!! I am doing rather complex scenes with up to four G8 figures at SubD 4 and twelve Genesis 1 figures at SubD 0 as background characters. Before I start tweaking my drivers, could anybody please let me know if this also works with Daz Iray when I put my GPUs in TCC? Also, can I then use my motherboard's ports to hook up my displays?

    Thanks a lot,

    Me

    Why are you using G1 for background? Use G8 and reduce texture size. G1 has considerably more geometry.

  • I thought G1 had less geometry. I'm using G8 textures on the G1 figures, which I reshade to make them look different. I'm only using one texture map for the male figures and one for the female figures. I find that to be way more GPU RAM efficient than using separate textures (even reduced ones) for my background figures.

  • drzap Posts: 795

    Thanks for all the replies!!! I am doing rather complex scenes with up to four G8 figures at SubD 4 and twelve Genesis 1 figures at SubD 0 as background characters. Before I start tweaking my drivers, could anybody please let me know if this also works with Daz Iray when I put my GPUs in TCC? Also, can I then use my motherboard's ports to hook up my displays?

    Thanks a lot,

    Me

    Unfortunately, you're not likely to get an answer on this one.  Currently the only cards that can go into TCC mode are professional Nvidia cards like Quadros and Titans, and you will need a couple of them along with a bridge to test that out.  You will need a liquid-cooled setup to properly use two RTX Titans in the same case (unless you want massive throttling), and Quadros are very expensive.  Anyone who has such a setup is most probably well beyond Daz Studio for rendering.  This means there is probably nobody around who can test whether memory pooling in DS actually works.

  • kyoto kid Posts: 41,847

    Thanks for all the replies!!! I am doing rather complex scenes with up to four G8 figures at SubD 4 and twelve Genesis 1 figures at SubD 0 as background characters. Before I start tweaking my drivers, could anybody please let me know if this also works with Daz Iray when I put my GPUs in TCC? Also, can I then use my motherboard's ports to hook up my displays?

    Thanks a lot,

    Me

    ...this scene has eight Genesis figures (one is the train driver) along with a lot of other details, including volumetric mist, reflectivity, and a number of emissive sources, at a resolution of 1,600 x 1,200.  When opened in the Daz programme, it takes up about 9.8 GB of system memory. Given what ebergerly mentions, that would translate to at most 4.9 GB of VRAM. This is not my most ambitious scene, as I have others that were left unfinished because they were approaching the system RAM limits of my workstation (fortunately they were saved on backup media), since I was still rendering Iray in CPU mode at the time.  I now have a Titan X dedicated to rendering (a smaller-VRAM GPU is running the displays).


    railway station beta.png
    1600 x 1200 - 3M
  • kenshaw011267 Posts: 3,805
    kyoto kid said:

    Thanks for all the replies!!! I am doing rather complex scenes with up to four G8 figures at SubD 4 and twelve Genesis 1 figures at SubD 0 as background characters. Before I start tweaking my drivers, could anybody please let me know if this also works with Daz Iray when I put my GPUs in TCC? Also, can I then use my motherboard's ports to hook up my displays?

    Thanks a lot,

    Me

    ...this scene has eight Genesis figures (one is the train driver) along with a lot of other details, including volumetric mist, reflectivity, and a number of emissive sources, at a resolution of 1,600 x 1,200.  When opened in the Daz programme, it takes up about 9.8 GB of system memory. Given what ebergerly mentions, that would translate to at most 4.9 GB of VRAM. This is not my most ambitious scene, as I have others that were left unfinished because they were approaching the system RAM limits of my workstation (fortunately they were saved on backup media), since I was still rendering Iray in CPU mode at the time.  I now have a Titan X dedicated to rendering (a smaller-VRAM GPU is running the displays).

     

    That scene takes up far more than 5 GB of VRAM. The amount of system RAM consumed by the PC while manipulating a scene is not directly comparable to how much VRAM will be consumed by the GPU during a render.

     

  • kenshaw011267 Posts: 3,805
    RayDAnt said:

    You keep making this claim but have yet to show that it actually works. Simply putting a card into TCC mode does not establish that it can do any NVLink capability. TCC strictly turns off the display outputs to keep Windows from reserving VRAM for use as a video buffer. Based on the Quadro documentation you should be able to run one card in TCC and one not, and still combine VRAM using NVLink.

    Um... nothing that I wrote about above has anything to do with NVLink. TCC mode shuts down a card's display output functionality and - more importantly, since this is how the VRAM pooling actually gets implemented - puts it into a peer-to-peer resource-sharing relationship (via PCI-E lanes and/or NVLink connectors, depending on the GPU generations involved) with any other TCC-enabled cards detected in the system. NVLink has no bearing on pre-Turing/Volta GPUs whatsoever, and the OP is dealing with pre-Turing/Volta Titan GPUs, where TCC P2P is implemented exclusively via PCI-E.

    No Nvidia card without NVLink functionality can pool VRAM. That is one of the big selling points of NVLink. Before the NVLink bridges came out, the NVLink capability was built into motherboards - very expensive mobos meant for server and workstation CPUs.

  • RayDAnt Posts: 1,154

    Thanks for all the replies!!! I am doing rather complex scenes with up to four G8 figures at SubD 4 and twelve Genesis 1 figures at SubD 0 as background characters.

     

    Before I start tweaking my drivers, could anybody please let me know if this also works with Daz Iray when I put my GPUs in TCC?

    Yes, absolutely. In fact, if you boot up a copy of Iray in its server farm form (available from here as a free 30-day trial), Iray even suggests outright in its startup log that you switch your GPUs over into TCC mode if you have any Quadro or Titan cards in your system. E.g., I just booted Iray Server up on my single Titan RTX-based rendering rig as I was typing this, and this is what I got:

    [Mon, 01 Jul 2019 17:17:44] 3        WORKER_4 |   1.4   IRAY   rend info : CUDA device 0 (TITAN RTX): compute capability 7.5, 24 GiB total, 20.0706 GiB available, display attached
    [Mon, 01 Jul 2019 17:17:44] 3        WORKER_4 |   1.4   IRAY   rend info : CUDA device 0 (TITAN RTX): WDDM driver used, consider switching to TCC driver model if no display needed (via 'nvidia-smi -dm 1'), to increase rendering performance

    Iray itself is optimized to take advantage of GPUs in TCC mode. The only murky area in all this regarding support is whether or not Daz Studio itself introduces bugs into Iray's ability to function as designed. As others have said, there are very few Daz users out there with TCC-capable cards (much less more than one of them, like you...), so there may be issues there. But again - TCC support in Iray itself is explicit.

     

    Also, can I then use my motherboard's ports to hook up my displays?

    Depends on which CPU you have. It needs to be a model that comes with integrated graphics. So if you have a relatively recent Intel processor you should be fine. If you're sporting an AMD chip, you may be in trouble: by default, recent AMD CPUs do not include an integrated GPU (the usual way to tell is to look for a letter 'G' somewhere in the model number). If your motherboard has graphics ports but your CPU lacks an iGPU, then you need either a third dedicated graphics card or a new CPU to make all of this work.

  • Havos Posts: 5,575
    edited July 2019

    I thought G1 had less geometry. I'm using G8 textures on the G1 figures, which I reshade to make them look different. I'm only using one texture map for the male figures and one for the female figures. I find that to be way more GPU RAM efficient than using separate textures (even reduced ones) for my background figures.

    In percentage terms, there is not a lot between the geometry sizes of G1-G8, maybe 20% at the most. In addition, the space used by the geometry is dwarfed by that needed by the textures unless you are using a very high SubD. I believe the geometry size from most to least is G2, G1, G3, G8.

    Post edited by Havos on
  • RayDAnt Posts: 1,154
    edited July 2019

    No Nvidia card without NVLink functionality can pool VRAM. That is one of the big selling points of NVLink. Before the NVLink bridges came out, the NVLink capability was built into motherboards - very expensive mobos meant for server and workstation CPUs.

    TCC mode and P2P-based resource sharing on Nvidia GPUs have been around for longer than NVLink has existed. Prior to NVLink, everything was implemented via PCI-E lanes (the only thing special about the high-end board systems you mention was the number of independent PCI-E lanes in the system). More lanes = better performance. Having NVLink = even better performance. But even a consumer-standard motherboard plus pre-NVLink Quadro/Titan cards will still work - just not as fast.
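    For anyone wanting to check what their own box supports, nvidia-smi can report the interconnect and P2P topology directly. A diagnostic sketch (assumes the standard Nvidia driver is installed; the output is only meaningful on a multi-GPU system):

    ```shell
    # Print the GPU interconnect topology matrix; PIX/PHB/SYS entries mean
    # the path between two GPUs runs over PCI-E, NV# entries mean NVLink
    nvidia-smi topo -m

    # Report peer-to-peer read capability between each pair of GPUs
    nvidia-smi topo -p2p r
    ```

    A PCI-E-only path showing up as P2P-capable is exactly the pre-NVLink case described above.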

    Post edited by RayDAnt on
  • kenshaw011267 Posts: 3,805
    RayDAnt said:

    No Nvidia card without NVLink functionality can pool VRAM. That is one of the big selling points of NVLink. Before the NVLink bridges came out, the NVLink capability was built into motherboards - very expensive mobos meant for server and workstation CPUs.

    TCC mode and P2P-based resource sharing on Nvidia GPUs have been around for longer than NVLink has existed. Prior to NVLink, everything was implemented via PCI-E lanes (the only thing special about the high-end board systems you mention was the number of independent PCI-E lanes in the system). More lanes = better performance. Having NVLink = even better performance. But even a consumer-standard motherboard plus pre-NVLink Quadro/Titan cards will still work - just not as fast.

    Prove it. You've made a whole lot of claims that run counter to everything published by Nvidia and to my experience, and I've been IT manager of a server farm for nearly 5 years and worked with servers for better than a decade before that.

  • RayDAnt Posts: 1,154
    edited July 2019

    You've made a whole lot of claims that run counter to everything published by Nvidia

    Everything I've said here comes directly from multiple official Nvidia documentation sources.

     

    by my experience and I've been IT manager of a server farm for nearly 5 years and worked with servers for better than a decade before that.

    Was any of that time spent operating Nvidia Tesla Compute Clusters? Because if the answer is no, then obviously there's no reason why your past work experience would be relevant to knowing how they function.

    Post edited by RayDAnt on
  • ebergerly Posts: 3,255
    edited July 2019

    Again, I'm trying to understand why there's so much continuing discussion of the nitty-gritty details of P2P and TCC and NVLink when none of that matters unless your software is configured to pool memory. Which, AFAIK, Iray isn't, nor is just about any other software. Has anyone actually seen results of pooled-memory tests on any software? Again, it seems like just another of those new technologies that people want to get excited about but that just isn't ready for primetime. Kind of like what was predicted last year (to a loud chorus of boos...) about the still-not-yet-ready-for-primetime RTX fiasco.

    The only real data I've seen on NVLink and memory pooling is a few papers from Puget Systems from last year, but as I recall they pretty much conclude the same thing: it's all up to the software to implement it, with no solid details on any software that does.

    And, by the way, for this awesome new memory pooling you need a whole bunch more system RAM to support the gobs of linked GPU VRAM. So now we're talking 64-128 GB of system RAM. And you need one or two expensive NVLink connectors along with the crazy expensive RTX cards that support them. And maybe even a third GPU just to run your monitors if you use TCC.

    Personally, I can't even imagine what kind of Iray scene would actually require 24GB+ of VRAM. I have to work hard and kludge together 3 scenes to get even 1/2 of that and overload my 1080ti. And is all of this cost and hassle so much more important than just doing some compositing? Personally, I just don't get it. But that's just me I suppose. 

    Post edited by ebergerly on
  • kenshaw011267 Posts: 3,805
    RayDAnt said:

    You've made a whole lot of claims that run counter to everything published by Nvidia

    Everything I've said here comes directly from multiple official Nvidia documentation sources.

     

    by my experience and I've been IT manager of a server farm for nearly 5 years and worked with servers for better than a decade before that.

    Was any of that time spent operating Nvidia Tesla Compute Clusters? Because if the answer is no, then obviously there's no reason why your past work experience would be relevant to knowing how they function.

    No. There is absolutely nowhere in Nvidia's documentation that says VRAM pooling was ever available, even on Quadros, prior to NVLink.

    TCC is a mode that Quadros can be put in. It's not a kind of server. You may mean simply servers with Tesla cards, but those can go in most any server rig. You do need specific mobos for multiple-Tesla configurations, but newer mobos are NVLink-enabled instead.

    And just to be clear, I've been working with Quadros and Teslas in one way or another for a decade at least.
