OT: New Nvidia Cards to come in RTX and GTX versions?! RTX Titan first whispers.


Comments

  • ebergerly Posts: 3,255
    edited September 2018

    From the NVIDIA Optix 5.1 Programming Guide:

    "On system configurations without NVLINK support, the board with the smallest VRAM amount will be the limit for on-device resources in the OptiX context. In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    From the NVIDIA docs describing TCC mode:

    "NVIDIA GPUs can exist in one of two modes: TCC or WDDM. TCC mode disables Windows graphics and is used in headless configurations, whereas WDDM mode is required for Windows graphics. NVIDIA GPUs also come in three classes:

    • GeForce — typically defaults to WDDM mode; used in gaming graphics.
    • Quadro — typically defaults to WDDM mode, but often used as TCC compute devices as well.
    • Tesla — typically defaults to TCC mode. Current drivers require a GRID license to enable WDDM on Tesla devices."

    So it sounds closer to what I expected: with the correct NVLINK connector and the correct software/driver configuration, the OptiX rendering tools will automatically do VRAM stacking, but without those (on GeForce, for example), you may have to do some more difficult, low-level programming to get it to work.

    Or something like that.... 
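
    To give a sense of what that "low level programming" might look like, here's a minimal sketch of my own using the public CUDA runtime peer-to-peer calls. This isn't from the OptiX docs, and whether the GeForce driver will actually allow it over NVLink is exactly the open question:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Sketch: ask the driver whether GPU 0 and GPU 1 can map each other's
    // memory (peer-to-peer) and enable it if so. With NVLink + TCC, OptiX
    // does this behind the scenes; without that, an application would have
    // to manage this (and decide which data lives on which board) itself.
    int main()
    {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);
        if (deviceCount < 2) {
            printf("Need at least two GPUs for this sketch.\n");
            return 0;
        }

        int canAccess01 = 0, canAccess10 = 0;
        cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
        cudaDeviceCanAccessPeer(&canAccess10, 1, 0);

        if (canAccess01 && canAccess10) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);  // GPU 0 may now dereference GPU 1 memory
            cudaSetDevice(1);
            cudaDeviceEnablePeerAccess(0, 0);
            printf("Peer-to-peer enabled in both directions.\n");
        } else {
            printf("P2P not available on this driver/hardware combination.\n");
        }
        return 0;
    }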

    Post edited by ebergerly on
  • kyoto kid Posts: 41,845
    edited September 2018
    Taoz said:
    kyoto kid said:

    ...well that's actually for two bridges as required for Volta cards.

    Ok, still pretty expensive though.

    ...true, though as I mentioned they are listing the NVLink connector for the new Turing RTX Quadros at $39 (which was the old SLI bridge price).  Something about that just doesn't seem right, particularly when the bridges for the GeForce cards are twice as expensive. It makes me wonder if memory pooling support isn't a "built in" feature of the new Quadro line (or the Quadro drivers) that just requires two cards to be linked together. They sure seem to make it a major selling point for their top-end line, whereas there's no hype about it at all for the GeForce series.

    Post edited by kyoto kid on
  • Kevin Sanderson Posts: 1,643
    edited September 2018

    KK, the $39 one looks smaller on the NVIDIA site, and they just call it Quadro SLI HB Bridge; the other ones are actually called NVLink. https://www.nvidia.com/en-us/design-visualization/quadro-store/

    I might actually take a break from all this. Neat Video is looking much better to me as a denoise option for $100 (it's CUDA accelerated). It will work with the Vegas program I already own.

    Post edited by Kevin Sanderson on
  • kyoto kid Posts: 41,845
    edited September 2018

    ...so it looks as if they haven't released details on the new NVLink bridge for the Turing Quadros yet, although on this page they show the link configurations for the GP, GV and RTX Quadros.

    https://www.nvidia.com/en-us/design-visualization/nvlink-bridges/

    It also specifies that just a single bridge will be required between RTX 5000, 6000, and 8000 cards instead of a pair as for the Volta and GP series.

    Interesting that the store link at the bottom right of the page goes to the same page that you linked to, which only shows the Pascal and Volta cards/links. Sort of misleading.

    Post edited by kyoto kid on
  • outrider42 Posts: 3,679
    Here is the link to Tom Petersen's comments on NVLink; it takes you straight to that part of the conversation. You can also watch the rest of the video for more general info.

    Now he does mention that performance may drop when two cards pool memory, but you need to stop and think about what he is talking about...video games. Video games need this memory speed because they are constantly swapping data in and out as they render over 60 frames every second. But Iray and other render engines are not like gaming at all. For Iray, the entire scene is loaded onto the GPU. You already have data flowing between cards in mGPU setups. NVLink greatly speeds up this flow of data as it bypasses PCIe entirely.

    So I believe that using NVLink to pool memory would not only increase available memory, it might even increase render speed over a standard mGPU setup.

    And as NVLink bypasses PCIe, I see no reason why anyone would need a new motherboard to support such a feature. Nowhere is there any mention of needing new boards, and moreover, there are no new motherboards coming; we would certainly have preorders for them if this were true.

    So all that is left is for Iray to implement these features. I'm sure it's only a matter of time. I'm feeling pretty good about things in general. I called Tensor cores coming to games, ray tracing coming to games, and NVLink being able to pool memory. Everyone thought I was crazy, LOL. I missed the VRAM count, but I was mostly being optimistic, and the price is higher than expected, but pretty much everyone missed that one. The only prediction left is Iray performance. And I stand by my prediction that it will amaze you. The 5-8x boost OTOY is claiming sounds about right to me. For some scenes, I think Iray might even see higher gains than that, and this is before using any denoiser. I predict if you use the denoiser, you will see even higher gains than that. These are for going from a 1080ti to a 2080ti. I'm going balls out, who wants to bet against me?
  • bluejaunte Posts: 1,990

    I do think 8x is a bit too optimistic after the comments by the Redshift dev, who noted that a frame can need (if I understood correctly) up to 50% of its calculations for shading, where RT cores won't help. My promo scenes are mostly very simple, with just a character and a simple background. So presumably not that much raytracing would be going on, but fairly complex shading from the dual-lobe and SSS skin shader. I wasn't aware previously that raytracing and shading are two completely different things that need to happen each frame.

    But who am I to complain about a 2x performance increase in a worst-case scenario? Or would the worst case be way worse? I guess the worst case would be that Iray doesn't support RT at all. Never know laugh

  • kyoto kid Posts: 41,845

    ...well, still not totally conclusive as to whether memory stacking will, or even can, be implemented on the GeForce side.  Seems we most likely will have to wait until these cards are actually in the hands of users and developers. Going to keep an eye on Otoy, as if anyone could or would enable it, they would be a likely candidate.

    As to "bypassing" PCIe, the cards still need to "communicate" with the rest of the system memory via the PCIe interface, and that will continue to be the "choke point" unless you have an MB with NVLink slots as well.  Such boards do exist but are specifically targeted at data centres and supercomputer arrays.  The only Nvidia card that is fully NVLink enabled (both between cards and to the CPU) is the V100 Tesla series, and the CPUs must also be NVLink enabled (like the IBM Power series).  Even the high-end Quadros (including the forthcoming RTX series) still use PCIe 3.0 as the interface.

     

  • Ghosty12 Posts: 2,080
    edited September 2018

    They could be getting ready for PCIe 4.0, which is supposedly coming soon, with guesstimates putting it sometime in 2019.

    https://www.tomshardware.com/news/pcie-4.0-5.0-pci-sig-adata,37276.html

    Post edited by Ghosty12 on
  • kyoto kid Posts: 41,845
    edited September 2018

    ...wonderful, that means having to build a new system, as I'm still in PCIe 2.0 land (yeah, that's how old my hardware is: X58 MB, DDR3 memory, Nehalem Xeon).

    Hope that Lotto ticket comes through for me tonight.

    Post edited by kyoto kid on
  • ebergerly Posts: 3,255
    edited September 2018
    And as NVLink bypasses PCIe, I see no reason why anyone would need a new motherboard to support such a feature. Nowhere is there any mention of needing new boards, and moreover, there are no new motherboards coming; we would certainly have preorders for them if this were true.

     

    As Kyoto Kid says, that's the whole point of NVLink...to provide not only a new high-speed link between GPUs, but also a new high-speed link between the CPU and the GPUs, because the latest generation of PCIe is relatively slow. Which means you need CPUs and motherboards that support this new NVLink hardware bus and either totally replace or bypass the existing PCIe bus. And it sounds like the only hardware that does that now is the IBM POWER8 and POWER9 server systems.

    Will existing Intel/AMD hardware ever support NVLink? Well, seems to me that since PCIe 5.0 is coming next year, and it sounds like it has similar or greater transfer capabilities to NVLink (32GT/s vs 25GT/s, though who the heck knows if all those specs are apples-to-apples), it's kinda doubtful IMO. There's tons of peripherals out there that use PCIe right now, and I'd be surprised if NVIDIA conquers that world. 

    So yeah, if you really want the whole NVLink package you'd either get a motherboard and CPU that supports it, or see what happens next year with PCIe 5.0 and decide if you want a new computer with PCIe 5.0 hardware and peripherals. Or just wait and see what happens with the GPU-to-GPU NVLink situation.   

    Post edited by ebergerly on
  • ebergerly Posts: 3,255
    edited September 2018

    BTW, regarding VRAM stacking...

    I mentioned a while back, in the discussion on Windows 10 hogging GPU VRAM, that an added wrinkle is the ability of CUDA (since like 2014 or something, about the same time they introduced NVLink?) to lump the VRAM of all the GPUs together with system RAM into one easily addressable chunk, called "Unified Memory". So it's nothing new.

    And CUDA, which handles the low-level communication between the system and the GPU and converts serial CPU code into parallel GPU instructions, has a nice feature called "cudaMallocManaged", which replaces the old "cudaMalloc". It's the call you use to allocate memory. With the "new" call (actually like 4 years old) you can address Unified Memory if it's all configured properly. But as with any of this, the devil is in the details, such as whether your software application uses "cudaMallocManaged", whether your hardware, drivers, NVLink and Iray support it, and so on.

    But clearly NVIDIA has been headed that way for a while; it's just a matter of which hardware and software are configured to support it. Pascal increased the memory addressing capability to 49 bits or something (512 terabytes) so you could access more memory. CUDA 10 isn't yet available to the general public as far as I can tell, but when it is I'll download it and play with it to see how it all ties in with RTX. I'm still using 9.2, and there's no mention of anything RTX.
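
    Just to make the cudaMalloc vs. cudaMallocManaged difference concrete, here's a minimal sketch of my own using the plain CUDA runtime API (nothing RTX- or Iray-specific, and the kernel is just a toy):

    #include <cuda_runtime.h>

    __global__ void scale(float* data, int n, float k)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= k;
    }

    int main()
    {
        const int n = 1 << 20;

        // Old style: explicit device allocation plus explicit copies.
        float* d = nullptr;
        cudaMalloc((void**)&d, n * sizeof(float));
        // ... cudaMemcpy host -> d, launch kernel, cudaMemcpy d -> host ...
        cudaFree(d);

        // Unified Memory: one pointer usable from both CPU and GPU code;
        // the driver migrates pages on demand (given supporting hardware/drivers).
        float* u = nullptr;
        cudaMallocManaged((void**)&u, n * sizeof(float));
        for (int i = 0; i < n; ++i) u[i] = 1.0f;      // touched on the CPU
        scale<<<(n + 255) / 256, 256>>>(u, n, 2.0f);  // same pointer on the GPU
        cudaDeviceSynchronize();
        cudaFree(u);
        return 0;
    }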

    Post edited by ebergerly on
  • Taoz Posts: 10,236
    Here is the link to Tom Petersen's comments on NVLink; it takes you straight to that part of the conversation. You can also watch the rest of the video for more general info.

    Note that links of that type do not work in this forum; the videos always start at 0:00. Maybe there is a way to make them work, but I'm not aware of any.

  • nicstt Posts: 11,715

    I do think 8x is a bit too optimistic after the comments by the Redshift dev, who noted that a frame can need (if I understood correctly) up to 50% of its calculations for shading, where RT cores won't help. My promo scenes are mostly very simple, with just a character and a simple background. So presumably not that much raytracing would be going on, but fairly complex shading from the dual-lobe and SSS skin shader. I wasn't aware previously that raytracing and shading are two completely different things that need to happen each frame.

    But who am I to complain about a 2x performance increase in a worst-case scenario? Or would the worst case be way worse? I guess the worst case would be that Iray doesn't support RT at all. Never know laugh

    It wasn't x8 but 'up x8'.

  • ebergerly Posts: 3,255
    edited September 2018

    I do think 8x is a bit too optimistic after the comments by the Redshift dev, who noted that a frame can need (if I understood correctly) up to 50% of its calculations for shading, where RT cores won't help.

    I agree. As explained elsewhere, the "8x" boost that was mentioned is a "rays per second" speed increase, not a "relative decrease in render times". And there are a lot more steps to the overall rendering process than just the ray tracing calculations. As you said, there's figuring out the color/shading of the surface that the ray hits, there's de-noising, there are the physics calculations, and on and on.

    That being said, those other components have also been separated out to a large extent, as has been mentioned elsewhere, and hardware and software have been developed and updated in RTX (NGX, Optix 5.0, Physx, CUDA 10, etc.) designed to speed up those elements as well. So the big question is when and how well will that all come together to result in significant improvements in render times?

    Clearly there will be improvements, probably resulting in render time improvements greater than the historical 33% decreases we've seen in past generations. And solely based on price, I think the 2080ti will have to cut render times in half compared to a 1080ti for it to be viable for those of us doing renders. Not many would pay $1,200 for a card that can't do what two $600 cards can do. 

    My only prediction is that one of the nicer and most notable effects of a fully implemented RTX is that the Iray preview will be much faster, much like the beautiful realtime previews in Blender's Eevee, which is done with existing hardware. And that probably requires you enable the new AI de-noising if/when it's available.

    As far as bottom line render times, like I say it has to at least cut Iray render times in half compared to a 1080ti, so I won't be surprised at something like a 60-70% cut in render times. Which means a 10 minute render becomes a 3 or 4 minute render. 

    Personally, I'd be far more excited if Iray's realtime preview gets updated to become like Eevee using my existing hardware. I've already got two GPUs and I'm not interested in buying another one, especially at those prices. laugh

        

    Post edited by ebergerly on
  • bluejaunte Posts: 1,990

    Don't think Iray preview is going to profit more from RTX than a normal render. Iray preview isn't realtime rendering; it just shows the result in a progressive way in the viewport and stops rendering when you move the viewport. It has nothing to do with the likes of Eevee, which is more akin to a game engine and rasterization.

  • ebergerly Posts: 3,255
    You sure 'bout that?
  • bluejaunte Posts: 1,990

    Reasonably sure. It's like any other "offline" renderer giving you a progressive preview, starting with a really rough, noisy approximation of the render which then gradually clears up, until you move the viewport again and it starts over. This isn't realtime; if it were, it would render at like 30 to 60 frames per second and the finished render would appear instantly.
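
    That progressive behaviour is basically just sample accumulation. Here's a rough sketch of the idea (my own toy illustration, not Iray code): each pass adds one more noisy sample per pixel to an accumulation buffer, and the displayed image is the running average, which is why it clears up over time instead of running at a fixed frame rate.

    #include <cuda_runtime.h>

    // Stand-in for real per-sample shading (hypothetical), so the sketch compiles.
    __device__ float3 shadeSample(int x, int y, int pass)
    {
        return make_float3(0.5f, 0.5f, 0.5f);   // placeholder "radiance"
    }

    // One progressive pass: add one more noisy sample per pixel to the accumulator.
    __global__ void accumulatePass(float3* accum, int width, int height, int pass)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;
        int i = y * width + x;
        float3 s = shadeSample(x, y, pass);
        accum[i].x += s.x;  accum[i].y += s.y;  accum[i].z += s.z;
    }

    int main()
    {
        const int w = 640, h = 480;
        float3* accum = nullptr;
        cudaMalloc((void**)&accum, w * h * sizeof(float3));
        cudaMemset(accum, 0, w * h * sizeof(float3));

        dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
        for (int pass = 1; pass <= 64; ++pass) {
            accumulatePass<<<grid, block>>>(accum, w, h, pass);
            cudaDeviceSynchronize();
            // Display accum / pass here: the image clears up as pass grows.
            // Moving the viewport would clear accum and reset pass to 1.
        }
        cudaFree(accum);
        return 0;
    }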

  • ebergerly said:

    From the NVIDIA Optix 5.1 Programming Guide:

    "On system configurations without NVLINK support, the board with the smallest VRAM amount will be the limit for on-device resources in the OptiX context. In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    With that it's already clear that you can forget memory stacking with Iray and any Optix based renderer.

    For gamers, NVLink will be of big interest vs SLI. I'm rather skeptical for Iray unless you buy Quadros or Titans.

     

    Now he does mention that performance may drop when two cards pool memory, but you need to stop and think about what he is talking about...video games. Video games need this memory speed because they are constantly swapping data in and out as they render over 60 frames every second. But Iray and other render engines are not like gaming at all. For Iray, the entire scene is loaded onto the GPU. You already have data flowing between cards in mGPU setups. NVLink greatly speeds up this flow of data as it bypasses PCIe entirely.

    That is not the case. PCIe communication is still required. See https://fuse.wikichip.org/news/1224/a-look-at-nvidias-nvlink-interconnect-and-the-nvswitch

    So I believe that using NVLink to pool memory would not only increase available memory, it might even increase render speed over a standard mGPU setup.
    And as NVLink bypasses PCIe, I see no reason why anyone would need a new motherboard to support such a feature. Nowhere is there any mention of needing new boards, and moreover, there are no new motherboards coming; we would certainly have preorders for them if this were true.

    Since NVLink will be a dead end for Iray and Optix based renderers, you can already forget that for consumer cards. So no need for a new motherboard in the end. Unless some motherboards with PCIe 4.0 come out and the real-time raytracing would benefit from the larger bandwidth, I don't really see the need.

    We can always dream that Nvidia would allow TCC on GeForce, but I don't think that will happen.

    Otherwise, the only possibility I see is to use Linux.

     

    ebergerly said:

    I do think 8x is a bit too optimistic after the comments by the Redshift dev, who noted that a frame can need (if I understood correctly) up to 50% of its calculations for shading, where RT cores won't help.

    I agree. As explained elsewhere, the "8x" boost that was mentioned is a "rays per second" speed increase, not a "relative decrease in render times". And there are a lot more steps to the overall rendering process than just the ray tracing calculations. As you said, there's figuring out the color/shading of the surface that the ray hits, there's de-noising, there are the physics calculations, and on and on.

    That being said, those other components have also been separated out to a large extent, as has been mentioned elsewhere, and hardware and software have been developed and updated in RTX (NGX, Optix 5.0, Physx, CUDA 10, etc.) designed to speed up those elements as well. So the big question is when and how well will that all come together to result in significant improvements in render times?

    Clearly there will be improvements, probably resulting in render time improvements greater than the historical 33% decreases we've seen in past generations. And solely based on price, I think the 2080ti will have to cut render times in half compared to a 1080ti for it to be viable for those of us doing renders. Not many would pay $1,200 for a card that can't do what two $600 cards can do. 

    My only prediction is that one of the nicer and most notable effects of a fully implemented RTX is that the Iray preview will be much faster, much like the beautiful realtime previews in Blender's Eevee, which is done with existing hardware. And that probably requires you enable the new AI de-noising if/when it's available.

    As far as bottom line render times, like I say it has to at least cut Iray render times in half compared to a 1080ti, so I won't be surprised at something like a 60-70% cut in render times. Which means a 10 minute render becomes a 3 or 4 minute render. 

    Personally, I'd be far more excited if Iray's realtime preview gets updated to become like Eevee using my existing hardware. I've already got two GPUs and I'm not interested in buying another one, especially at those prices. laugh

    The 5-8x speedup seems reasonable from my POV. I think it should even be possible to achieve higher performance. The introduction of tensor cores should open up some new ways to speed up renders and also bring some new possibilities in what we can do. For me it's just the beginning, as software and techniques should improve after a while, and we'll see more gains after 1-2 years.

     

  • ebergerly Posts: 3,255
    edited September 2018
    ebergerly said:

    From the NVIDIA Optix 5.1 Programming Guide:

    "On system configurations without NVLINK support, the board with the smallest VRAM amount will be the limit for on-device resources in the OptiX context. In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    With that it's already clear that you can forget memory stacking with Iray and any Optix based renderer.

     

    I wouldn't jump to that conclusion. What it said is  "In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    That doesn't mean it can't be done manually with some effort in non-TCC systems (assuming the needed connectors, drivers, etc., are installed). In fact that's what Petersen of NVIDIA seemed to be saying:

     "With Quadro RTX cards, NVLink combines the memory of each card to create a single, larger memory pool. Petersen explained that this would not be the case for GeForce RTX cards. The NVLink interface would allow such a use case, but developers would need to build their software around that function."

    And CUDA has had that capability since 2014 with Unified Memory, so it's a matter of all the other pieces coming together. 

    Post edited by ebergerly on
  • ebergerly said:
    ebergerly said:

    From the NVIDIA Optix 5.1 Programming Guide:

    "On system configurations without NVLINK support, the board with the smallest VRAM amount will be the limit for on-device resources in the OptiX context. In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    With that it's already clear that you can forget memory stacking with Iray and any Optix based renderer.

     

    I wouldn't jump to that conclusion. What it said is  "In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    That doesn't mean it can't be done manually with some effort in non-TCC systems (assuming the needed connectors, drivers, etc., are installed). In fact that's what Petersen of NVIDIA seemed to be saying:

     "With Quadro RTX cards, NVLink combines the memory of each card to create a single, larger memory pool. Petersen explained that this would not be the case for GeForce RTX cards. The NVLink interface would allow such a use case, but developers would need to build their software around that function."

    And CUDA has had that capability since 2014 with Unified Memory, so it's a matter of all the other pieces coming together. 

    Iray is made by Nvidia. I doubt you'll get a special version for DS users.

    The unified memory has only been implemented on non-consumer cards from Nvidia, and it seems that is something that will continue.

    Furthermore, I've been reading a few technical docs from Nvidia, and many interesting features that I think are needed for memory stacking do require TCC. Implementing memory stacking for Iray would require a lot of modifications in these libraries, including CUDA. I don't see why Nvidia would do it, as it's beneficial for them to segment the market.

    I didn't say it's not possible; it's rather the opposite, but you mustn't rely on Nvidia's libraries. So that would mean either a DX12 or Vulkan based renderer. Which means not Iray.

  • ebergerly Posts: 3,255

    Reasonably sure. It's like any other "offline" renderer giving you a progressive preview, starting with a really rough, noisy approximation of the render which then gradually clears up, until you move the viewport again and it starts over. This isn't realtime; if it were, it would render at like 30 to 60 frames per second and the finished render would appear instantly.

    Check this out...

    https://www.youtube.com/watch?time_continue=23&v=l-5NVNgT70U

  • ebergerly Posts: 3,255
    edited September 2018

     

    Iray is made by Nvidia. I doubt you'll get a special version for DS users.

    The unified memory has only been implemented on non-consumer cards from Nvidia, and it seems that is something that will continue.

    Furthermore, I've been reading a few technical docs from Nvidia, and many interesting features that I think are needed for memory stacking do require TCC. Implementing memory stacking for Iray would require a lot of modifications in these libraries, including CUDA. I don't see why Nvidia would do it, as it's beneficial for them to segment the market.

    I didn't say it's not possible; it's rather the opposite, but you mustn't rely on Nvidia's libraries. So that would mean either a DX12 or Vulkan based renderer. Which means not Iray.

    I think there are a few misconceptions going on there. For example, VRAM stacking is more about CUDA and Optix, not Iray. And certainly not the DAZ Studio Iray.

    For example, CUDA is the low-level language that manages the GPUs along with Windows WDDM, and CUDA and Optix work with NVLink. Here's a snip from the NVIDIA docs on the latest Optix:

    • Transparently scales across multiple GPUs
    • Automatically combines GPU memory over NVLink for large scenes

    And I believe the latest CUDA 10 and Optix 5.0 are all set to do VRAM stacking. But at this point it's all just speculation on how/if memory stacking will show up with GeForce RTX, or any other cards, in the future, so I guess we'll just have to wait. I think it's too early to really know for sure. 

    Post edited by ebergerly on
  • ebergerly Posts: 3,255
    edited September 2018

    By the way, I do agree with your point on market segmentation. And I think it's probably most efficient for them to make their hardware as uniform as possible so they don't have to make separate production lines to build stuff. Same with software. It's much easier to make a single version rather than multiple versions you have to update and keep track of. 

    So my hunch is that much of the RTX hardware/cards, and especially the related software, APIs, drivers, etc., have most if not all the functionality of the higher-end stuff, but they enable and disable certain features for the different models to meet the public's expectation that more features means they should pay more for it. So I'm guessing there are some aspects of the lower-end GeForce RTX cards and related software that are disabled, and in the higher end they are enabled to make life much easier for those who spent all that money. That doesn't mean you can't manually perform the same or similar functionality with existing software; they just don't give you access to all the software/drivers/APIs that make it a breeze.

    Just a guess though... 

    Post edited by ebergerly on
  • ebergerly said:

    Reasonably sure. It's like any other "offline" renderer giving you a progressive preview, starting with a really rough, noisy approximation of the render which then gradually clears up, until you move the viewport again and it starts over. This isn't realtime; if it were, it would render at like 30 to 60 frames per second and the finished render would appear instantly.

    Check this out...

    https://www.youtube.com/watch?time_continue=23&v=l-5NVNgT70U

    Wow! That is really nice compared to the ones I saw online with Redshift using AI.

  • nicstt Posts: 11,715
    edited September 2018

    Pseudo code:

    var isRtx = true;

    if (isRtx) {
        DisableSharedRam();
    }

    A simple flag could be used to enable and disable features; the software can easily determine what card it's running on. I'm not saying either way what they'll do; they could enable RAM pooling on RTX cards, but as has been said, they do have different markets. WDDM is responsible for the current situation with missing RAM; Nvidia could disable that 'feature', I understand, but hasn't.
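
    For what it's worth, the per-device information that kind of branching would need is already exposed by the CUDA runtime. Here's a small sketch of my own (the "RTX" name check is purely illustrative, not how Nvidia actually gates features):

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            // Naive product check, for illustration only.
            bool looksLikeRtx = strstr(prop.name, "RTX") != nullptr;
            printf("GPU %d: %s (%zu MB VRAM)%s\n",
                   dev, prop.name, prop.totalGlobalMem >> 20,
                   looksLikeRtx ? " [RTX]" : "");
            // A driver or library could branch on information like this to
            // enable or disable features per product line, as speculated above.
        }
        return 0;
    }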

    Post edited by nicstt on
  • ebergerly Posts: 3,255

    nicstt, that's exactly what I'm thinking. In fact I was imagining some aspect of the expensive Quadro RTX connectors that, when plugged in, signals CUDA/Optix that they can enable all their fancy VRAM stacking code that does it automatically. And if you plug in the cheap GeForce version, it says "oh, well this guy can do all the painful low level stuff to coordinate the complex VRAM stacking process himself". 

  • kyoto kid Posts: 41,845
    ebergerly said:
    ebergerly said:

    From the NVIDIA Optix 5.1 Programming Guide:

    "On system configurations without NVLINK support, the board with the smallest VRAM amount will be the limit for on-device resources in the OptiX context. In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    With that it's already clear that you can forget memory stacking with Iray and any Optix based renderer.

     

    I wouldn't jump to that conclusion. What it said is  "In homogeneous multi-GPU systems with NVLINK bridges and the driver running in the Tesla Compute Cluster (TCC) mode, OptiX will automatically use peer-to-peer access across the NVLINK connections to use the combined VRAM of the individual boards together which allows bigger scene sizes."

    That doesn't mean it can't be done manually with some effort in non-TCC systems (assuming the needed connectors, drivers, etc., are installed). In fact that's what Petersen of NVIDIA seemed to be saying:

     "With Quadro RTX cards, NVLink combines the memory of each card to create a single, larger memory pool. Petersen explained that this would not be the case for GeForce RTX cards. The NVLink interface would allow such a use case, but developers would need to build their software around that function."

    And CUDA has had that capability since 2014 with Unified Memory, so it's a matter of all the other pieces coming together. 

    Iray is made by Nvidia. I doubt you'll get a special version for DS users.

    The unified memory has only been implemented on non-consumer cards from Nvidia, and it seems that is something that will continue.

    Furthermore, I've been reading a few technical docs from Nvidia, and many interesting features that I think are needed for memory stacking do require TCC. Implementing memory stacking for Iray would require a lot of modifications in these libraries, including CUDA. I don't see why Nvidia would do it, as it's beneficial for them to segment the market.

    I didn't say it's not possible; it's rather the opposite, but you mustn't rely on Nvidia's libraries. So that would mean either a DX12 or Vulkan based renderer. Which means not Iray.

    ...I agree.  Again, the Quadros (GP100 and later) also support TCC through NVLink, which with the forthcoming Turing cards is a major Nvidia hype point. Now there is a "better" reason for a serious 3D artist to consider paying more rather than thinking they can get away with paying less for the consumer versions.

    Yeah, $20,000 (+ whatever the new link bridge will cost) is a lot of scratch, but imagine having 96 GB of VRAM (or, for $12,000, 48 GB) at your fingertips.  That's more than many of us have in physical memory on our systems, and it's totally devoted to graphics. Crikey, for $4,600 (two RTX 5000s) + the link, one can have 32 GB, which currently is only available on the $9,000 GV100.  Some people here have invested more in Daz content than that.

  • nicstt Posts: 11,715
    ebergerly said:

    nicstt, that's exactly what I'm thinking. In fact I was imagining some aspect of the expensive Quadro RTX connectors that, when plugged in, signals CUDA/Optix that they can enable all their fancy VRAM stacking code that does it automatically. And if you plug in the cheap GeForce version, it says "oh, well this guy can do all the painful low level stuff to coordinate the complex VRAM stacking process himself"

    Haha, yeh something like that.

  • ebergerly said:

    I think there are a few misconceptions going on there. For example, VRAM stacking is more about CUDA and Optix, not Iray. And certainly not the DAZ Studio Iray.

    No misconception; and you forget about the TCC. Iray is built on CUDA and Optix. No TCC = no mem stacking with Optix = no mem stacking in Iray (it's the same for specific features in CUDA which need TCC).

    Anyway, I don't think we should go on speculating. We just have to wait for the cards to be out and an updated version of DS Iray.

    And to end the debate, here is a snip from the CUDA 9.2 documentation; note in particular the TCC driver mode requirement:

    9.1.4. Unified Virtual Addressing

    Devices of compute capability 2.0 and later support a special addressing mode called Unified Virtual Addressing (UVA) on 64-bit Linux, Mac OS, and Windows XP and on Windows Vista/7 when using TCC driver mode. With UVA, the host memory and the device memories of all installed supported devices share a single virtual address space.

    Prior to UVA, an application had to keep track of which pointers referred to device memory (and for which device) and which referred to host memory as a separate bit of metadata (or as hard-coded information in the program) for each pointer. Using UVA, on the other hand, the physical memory space to which a pointer points can be determined simply by inspecting the value of the pointer using cudaPointerGetAttributes().

    Under UVA, pinned host memory allocated with cudaHostAlloc() will have identical host and device pointers, so it is not necessary to call cudaHostGetDevicePointer() for such allocations. Host memory allocations pinned after-the-fact via cudaHostRegister(), however, will continue to have different device pointers than their host pointers, so cudaHostGetDevicePointer() remains necessary in that case.

    UVA is also a necessary precondition for enabling peer-to-peer (P2P) transfer of data directly across the PCIe bus for supported GPUs in supported configurations, bypassing host memory.

    See the CUDA C Programming Guide for further explanations and software requirements for UVA and P2P.

    P2P is also an important feature for fast communication: https://developer.nvidia.com/gpudirect

    Good luck managing a multi-GPU system's memory without NVIDIA's API (read further in the docs about coherency and other problems).
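
    To make the UVA part concrete, here is a small sketch of my own against the CUDA 9.x runtime: with UVA in effect, the runtime can classify a raw pointer just from its value, exactly as the quoted passage describes (the memoryType field name is from the 9.x API):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        // A pinned host allocation and a plain device allocation.
        float* h = nullptr;
        float* d = nullptr;
        cudaHostAlloc((void**)&h, 1024 * sizeof(float), cudaHostAllocDefault);
        cudaMalloc((void**)&d, 1024 * sizeof(float));

        // Under UVA both pointers live in one virtual address space, so
        // cudaPointerGetAttributes can tell host memory from device memory.
        cudaPointerAttributes ha, da;
        cudaPointerGetAttributes(&ha, h);
        cudaPointerGetAttributes(&da, d);
        printf("pinned host alloc: %s (device %d)\n",
               ha.memoryType == cudaMemoryTypeHost ? "host memory" : "device memory",
               ha.device);
        printf("device alloc:      %s (device %d)\n",
               da.memoryType == cudaMemoryTypeHost ? "host memory" : "device memory",
               da.device);

        cudaFreeHost(h);
        cudaFree(d);
        return 0;
    }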

     

    nicstt said:

    Pseudo code:

    var isRtx = true;

    if (isRtx) {
        DisableSharedRam();
    }

    A simple flag could be used to enable and disable features; the software can easily determine what card it's running on. I'm not saying either way what they'll do; they could enable RAM pooling on RTX cards, but as has been said, they do have different markets. WDDM is responsible for the current situation with missing RAM; Nvidia could disable that 'feature', I understand, but hasn't.

    I'll correct something there: it's not about using an RTX card or not. It's about running in TCC mode or not.

    var tccActive = true;

    if (!tccActive) {
        DisableSharedRam();
    }

    But it's more complex than that. The point is that memory should be stackable even with older GFX cards, given updated software, provided the configuration is correct.
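
    As a side note, the driver mode in question is something an application can already query; cudaDeviceProp carries a tccDriver flag. A minimal sketch of my own:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // check the first GPU
        // tccDriver is 1 when the board runs under the TCC driver, 0 under WDDM.
        printf("GPU 0 (%s): %s mode\n", prop.name, prop.tccDriver ? "TCC" : "WDDM");
        // A renderer could refuse to attempt memory pooling unless every
        // participating device reports TCC here, which is the point being made above.
        return 0;
    }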

  • ebergerly Posts: 3,255
    edited September 2018

     

    kyoto kid said:

    Yeah, $20,000 (+ whatever the new link bridge will cost) is a lot of scratch, but imagine having 96 GB of VRAM (or, for $12,000, 48 GB) at your fingertips.  That's more than many of us have in physical memory on our systems, and it's totally devoted to graphics. Crikey, for $4,600 (two RTX 5000s) + the link, one can have 32 GB, which currently is only available on the $9,000 GV100.  Some people here have invested more in Daz content than that.

    Don't forget the amount of system RAM you'd need in order to fill GPUs to a total of 48 or 96 GB. That's like 100 GB to 300 GB of system RAM to add to your bill. Add another $2,000 to $5,000, plus a new computer to handle all of that.

    I think I'll pass....laugh

    Post edited by ebergerly on