Maximum GPU RAM for GPU rendering in Iray

Comments

  • Thanks!

  • kenshaw011267 Posts: 3,805
    RayDAnt said:

    Thanks! But do NVLink connectors work on Titans?

    Unfortunately the only Titans with NVLink support currently are the Titan V and Titan RTX. And NVLink is a physical connector standard - meaning that older generation cards can't be updated to include/be compatible with it.

    ETA: Well, color me shocked. Just learned that the Titan V - despite having remnants of the physical connectors themselves - actually has no NVLink support (or SLI support for that matter.) So for Titans with NVLink you're currently looking at just the Titan RTX.

    There are two versions of the Titan V. The CEO Edition does have NVLink but is also stupidly expensive; you could get good Quadros for the price.

  • Takeo.Kensei Posts: 1,303
    RayDAnt said:
     

    In late 2013, Nvidia debuted what continues to be its premier implementation of VRAM pooling (known officially as "Unified Memory") as part of CUDA 6

    Unified Memory is not VRAM pooling. It's a way to simplify programming: instead of having to manage each GPU's memory plus system memory and doing many data copies back and forth, you just have one object to manipulate. The first diagram in the link you've provided should be clear enough to avoid misunderstanding what it is.
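
    To make the difference concrete, here is a minimal sketch of both styles using the standard CUDA runtime API (the scale kernel and the array size are made up for the example):

        #include <cuda_runtime.h>
        #include <cstdlib>

        __global__ void scale(float *data, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) data[i] *= 2.0f;   // trivial work, just for the example
        }

        int main() {
            const int n = 1 << 20;
            const size_t bytes = n * sizeof(float);

            // Classic style: separate host and device copies, explicit transfers.
            float *host = (float *)malloc(bytes);
            float *dev = nullptr;
            cudaMalloc(&dev, bytes);
            cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
            scale<<<(n + 255) / 256, 256>>>(dev, n);
            cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

            // Unified Memory: one pointer usable from both CPU and GPU; the
            // driver migrates pages on demand. No explicit copies -- but it is
            // still one logical allocation, not two cards' VRAM fused together.
            float *um = nullptr;
            cudaMallocManaged(&um, bytes);
            scale<<<(n + 255) / 256, 256>>>(um, n);
            cudaDeviceSynchronize();   // wait before touching um from the CPU

            cudaFree(dev);
            cudaFree(um);
            free(host);
            return 0;
        }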

    RayDAnt said:

    TCC is a mode that Quadros can be put in. It's not a kind of server.

    TCC (Tesla Compute Cluster) mode is the driver-level implementation of Nvidia's Tesla Accelerated Computing Platform. And its use is required for access to underlying features of the platform like Unified Memory (aka VRAM pooling).

    You insist on TCC without knowing whether it's relevant. Until DX12 with Microsoft's DXR and the Vulkan implementation, the only way to get VRAM pooling was to use Nvidia's API, for which TCC mode was mandatory. The converse is not true: having TCC doesn't mean you have VRAM pooling.

    With new consumer cards having NVLink and no TCC mode, nobody knows what is mandatory or not, and certainly not for an application that isn't out yet.

    I've seen no rendering application with VRAM pooling via PCIe, so putting some old Titan cards in TCC mode will not get you anything more than maybe saving a bit of VRAM.
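
    For what it's worth, you can check from code whether a card is actually running the TCC driver, and whether two GPUs can address each other's memory at all. A rough sketch with the CUDA runtime API (the device indices 0 and 1 are just placeholders):

        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            int count = 0;
            cudaGetDeviceCount(&count);

            for (int d = 0; d < count; ++d) {
                cudaDeviceProp prop;
                cudaGetDeviceProperties(&prop, d);
                // tccDriver is 1 when the card runs the Windows TCC driver model.
                printf("GPU %d (%s): TCC = %d\n", d, prop.name, prop.tccDriver);
            }

            // Peer access only tells you the GPUs *can* read each other's memory
            // (over NVLink or PCIe). It says nothing about whether a renderer
            // actually pools VRAM that way.
            if (count >= 2) {
                int canAccess = 0;
                cudaDeviceCanAccessPeer(&canAccess, 0, 1);
                printf("GPU 0 -> GPU 1 peer access possible: %d\n", canAccess);
            }
            return 0;
        }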

    So the correct answer to the OP is: buy an NVLink-capable card. At this point nothing is confirmed for Iray regarding the minimum spec for VRAM pooling. The safest bet is a Titan RTX or Quadro RTX, but it is still too soon to make any intelligent decision if you want VRAM pooling with Iray. If the OP is not limited to Iray, there is a DAZ-to-Octane plugin, and Octane can do out-of-core rendering.

    Another option: there is also a freebie that can export a DS scene to Blender.

    My advice to the OP: wait until VRAM pooling is confirmed before buying hardware, and meanwhile try optimizing texture use (there is a scene optimizer for DS) or export to a renderer that can do out-of-core rendering.

  • ebergerly Posts: 3,255
    edited July 2019
    RayDAnt said:
     

    In late 2013, Nvidia debuted what continues to be its premier implementation of VRAM pooling (known officially as "Unified Memory") as part of CUDA 6

    Unified Memory is not VRAM pooling. It's a way to simplify programming: instead of having to manage each GPU's memory plus system memory and doing many data copies back and forth, you just have one object to manipulate. The first diagram in the link you've provided should be clear enough to avoid misunderstanding what it is.

    I wish we could "tap the brakes" a bit on trying to prove others wrong on minor, fairly irrelevant technical details. Yes, Unified Memory has been around in CUDA for many years, and I suspect it has therefore become the standard way (using "cudaMallocManaged") for programmers to implement things like memory pooling. I've programmed a raytracer in CUDA, used Unified Memory along the way, and it's a huge improvement. Whether Unified Memory is an exact synonym for memory pooling is a minor and irrelevant technical detail; effectively it's the same thing, since Unified Memory is now the mechanism used for memory pooling.
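
    To make that concrete: on Pascal and later cards (under Linux, as I understand it) managed memory can even be oversubscribed, i.e. you can allocate more than one card's VRAM and let the driver page data in and out on demand. A rough sketch; the 1.5x factor is purely for illustration:

        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void touch(char *p, size_t n) {
            // Grid-stride loop so every page of the allocation gets touched.
            for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
                 i < n; i += (size_t)gridDim.x * blockDim.x)
                p[i] = 1;
        }

        int main() {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, 0);

            // Ask for 1.5x this card's VRAM. On Pascal+ under Linux the
            // allocation can succeed and pages are swapped in on demand;
            // on older cards or OSes it simply fails.
            size_t bytes = prop.totalGlobalMem + prop.totalGlobalMem / 2;
            char *p = nullptr;
            if (cudaMallocManaged(&p, bytes) != cudaSuccess) {
                printf("Oversubscription not supported here.\n");
                return 1;
            }
            touch<<<4096, 256>>>(p, bytes);
            cudaDeviceSynchronize();
            cudaFree(p);
            return 0;
        }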

    Post edited by ebergerly on
  • Takeo.Kensei Posts: 1,303
    ebergerly said:
    RayDAnt said:
     

    In late 2013, Nvidia debuted what continues to be its premier implementation of VRAM pooling (known officially as "Unified Memory") as part of CUDA 6

    Unified Memory is not VRAM pooling. It's a way to simplify programming: instead of having to manage each GPU's memory plus system memory and doing many data copies back and forth, you just have one object to manipulate. The first diagram in the link you've provided should be clear enough to avoid misunderstanding what it is.

    I wish we could "tap the brakes" a bit on trying to prove others wrong on minor, fairly irrelevant technical details. Yes, Unified Memory has been around in CUDA for many years, and I suspect it has therefore become the standard way (using "cudaMallocManaged") for programmers to implement things like memory pooling. I've programmed a raytracer in CUDA, used Unified Memory along the way, and it's a huge improvement. Whether Unified Memory is an exact synonym for memory pooling is a minor and irrelevant technical detail; effectively it's the same thing, since Unified Memory is now the mechanism used for memory pooling.

    Using Unified Memory on any consumer card doesn't give you memory pooling, so I don't see where you get the equivalence between them.

    By your reasoning, any GPU programmatically managed with Unified Memory (which should be 99% of CUDA applications) would have memory pooling?

    So can we go buy lots of cheap old Nvidia GPUs, chain them with PCIe 1x risers, and get a massive 60 GB pooled-VRAM system for a fraction of what Nvidia is selling?

    In my view, these "irrelevant technical details" are important, and if precision in a technical subject is not relevant to you, then you should rather go to non-technical discussions. These little details determine what can and cannot be done with your hardware.

  • ebergerly Posts: 3,255
    edited July 2019

    Yeah, I think this topic was already pounded into the ground last year when I posted about how I wrote some CUDA code that effectively pooled GPU VRAM over the PCI bus, using the Unified Memory tools. The tools to pool memory exist, though there are different forms and implementations of memory pooling. Of that there is little doubt.

    For example, you can take a huge array of many millions of elements, split it in half, send one half to one GPU to multiply each element by 2 and the other half to the other GPU to do the same, then collect their results when they're done. You've effectively filled each GPU's VRAM with half an array, thereby doing a 24 GB matrix calculation with two 12 GB GPUs. But this use case doesn't require any communication between the GPUs, since each calculation is totally independent, so it's an easy version of memory pooling: you use the combined memory of two GPUs to solve a single problem, in half the time it would take with only one GPU.
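
    Roughly like this - a sketch, assuming two visible CUDA devices and using managed memory so both GPUs see one logical array (the kernel name and sizes are invented for the example):

        #include <cuda_runtime.h>

        __global__ void doubleHalf(float *a, size_t begin, size_t end) {
            size_t i = begin + (size_t)blockIdx.x * blockDim.x + threadIdx.x;
            if (i < end) a[i] *= 2.0f;
        }

        int main() {
            const size_t n = 1 << 24;        // stand-in for "millions of elements"
            const size_t half = n / 2;
            const int threads = 256;
            const unsigned blocks = (unsigned)((half + threads - 1) / threads);

            float *a = nullptr;
            cudaMallocManaged(&a, n * sizeof(float));   // one logical array
            for (size_t i = 0; i < n; ++i) a[i] = 1.0f;

            // GPU 0 works on the first half...
            cudaSetDevice(0);
            doubleHalf<<<blocks, threads>>>(a, 0, half);

            // ...GPU 1 on the second half, concurrently. Each card only ever
            // holds its half, so the combined VRAM covers the whole array and
            // the two halves never have to talk to each other.
            cudaSetDevice(1);
            doubleHalf<<<blocks, threads>>>(a, half, n);

            cudaSetDevice(0); cudaDeviceSynchronize();
            cudaSetDevice(1); cudaDeviceSynchronize();

            cudaFree(a);
            return 0;
        }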

    The question is whether that technology has yet been implemented over NVLink in a truly high-speed, bi-directional manner that would let Iray calculate a single scene where every raytracing element may depend on other elements (which might reside on the other GPU), so the GPUs need to communicate extremely fast (over NVLink) in both directions to complete the task (I seem to recall SLI is uni-directional?). I think it's clear that this functionality does not yet exist in Iray, or in most other rendering apps.

    The capability exists, but the implementation does not yet exist. I assume we can all agree on that?  

     

    Post edited by ebergerly on
  • ebergerly Posts: 3,255
    edited July 2019

    Also, when discussing pooling VRAM, don't lose sight of the complexity of the problem...

    For example, let's say you have a scene with 20 objects. In my overly simplified example, with VRAM pooling you'd move half of the scene info (maybe 10 of the object descriptions, including vertices, materials, locations, etc.) onto one GPU and the other half onto the other GPU. So each GPU knows about only its half of the scene.

    So you send a ray for pixel 1,536,224 into the scene, and it hits an object on GPU 1. Cool. Then the ray bounces off and, uh oh, it's heading for an object residing on the other GPU. So you need management info somewhere that keeps track of all the data across both GPUs. And to do all of this very fast, you need to transfer rays from one GPU to the other and return the results as quickly as possible. Which means you need super-fast, bi-directional data transfer and a whole lot of software to manage it all. And someone has to write all of that software. And it has nothing to do with NVLink.
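
    For what it's worth, the raw mechanics of shipping data card-to-card already exist in CUDA; it's all the bookkeeping above them that someone has to write. A sketch - the Ray struct and device indices are hypothetical:

        #include <cuda_runtime.h>

        struct Ray { float origin[3]; float dir[3]; };   // hypothetical payload

        int main() {
            const size_t n = 1 << 16;   // a batch of rays

            // Let each card map the other's memory. This works over NVLink or
            // plain PCIe, whenever the hardware allows peer access at all.
            cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
            cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

            Ray *rays0 = nullptr, *rays1 = nullptr;
            cudaSetDevice(0); cudaMalloc(&rays0, n * sizeof(Ray));
            cudaSetDevice(1); cudaMalloc(&rays1, n * sizeof(Ray));

            // Hand a batch of rays from GPU 0 to GPU 1 directly, no host hop.
            // Fast link or slow link, the call looks the same -- which is why
            // the same code can be dreadfully slow over PCIe.
            cudaMemcpyPeer(rays1, 1, rays0, 0, n * sizeof(Ray));

            cudaSetDevice(0); cudaFree(rays0);
            cudaSetDevice(1); cudaFree(rays1);
            return 0;
        }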

    Of course, all of this could work if you did that communication over a slow link (like PCI), but it's dreadfully slow. So hypothetically you could achieve memory pooling at the expense of much slower renders, depending on how well the rendering software is written and what you're willing to accept in render speeds in exchange for more memory.

    So don't think "oh, it has NVLink, therefore it can pool memory and also speed up my Iray renders." It may be able to pool memory, but it may also be dreadfully slow. And last I recall, the RTX cards use a uni-directional SLI version of NVLink, which means it doesn't help you nearly as much as the full NVLink. This stuff requires a LOT of parts to come together to get a final result.

    Post edited by ebergerly on
  • kyoto kid Posts: 41,847

    ...also, as Nvidia mentioned months ago, it is the rendering software that needs to be modified to support VRAM pooling. So far VRay and the 2019 beta of Octane 4 are the only engines that support it, and even then it is only available with the high-end RTX and Volta cards. The Tesla V is also the only series with an option for an NVLink interface to the motherboard; RTX Quadros and Titans still use PCIe.
