Nvidia Ampere Cards Supposed Specs and More.

Ghosty12 Posts: 2,080
edited May 2020 in The Commons

Watching this video on what could be coming with the next Nvidia cards, they do sound awesome. It is a long video, and much of it is not really relevant for us, but 14:33 to 16:20 is the most interesting part, covering NVCache and Tensor memory compression.

The talk about NVCache is the most exciting of all, as it looks like these cards will make full use of the system's memory and drives on top of the card's VRAM. If all of this comes to pass, it will be interesting to see what Nvidia does with Iray. At this point only time will tell, but it is exciting nonetheless. :)

Post edited by Ghosty12 on

Comments

  • alex86fire Posts: 1,130

    Wow, I didn't check his sources but if what he says in there is true this is an insane boost.

    Can't wait for the cards to show up and Iray and Daz to start utilizing their strengths.

  • Ghosty12 Posts: 2,080

    Yeah, I am hoping that what is there is true, because it would be very cool; if Iray/Daz Studio is optimized for these new features, the worry of running out of VRAM may be a thing of the past.

  • Sevrin Posts: 6,313

    He does seem very excited about the attention he's getting, but if he's gonna go on for half an hour I'll just wait for Thursday.

  • marble Posts: 7,500
    Ghosty12 said:

    Yeah, I am hoping that what is there is true, because it would be very cool; if Iray/Daz Studio is optimized for these new features, the worry of running out of VRAM may be a thing of the past.

    I'm not sure that I understood the vram thing but was he talking about offloading to SSD rather than system RAM?

  • Ghosty12 Posts: 2,080
    marble said:
    Ghosty12 said:

    Yeah, I am hoping that what is there is true, because it would be very cool; if Iray/Daz Studio is optimized for these new features, the worry of running out of VRAM may be a thing of the past.

    I'm not sure that I understood the vram thing but was he talking about offloading to SSD rather than system RAM?

    My understanding of it is that it can use either system RAM or the SSD, or both.

  • kenshaw011267 Posts: 3,805

    The channel has generally had good sources but this seems way OTT. Wait for Thursday and then wait for the benchmarks and reviews.

  • Ghosty12 Posts: 2,080

    The channel has generally had good sources but this seems way OTT. Wait for Thursday and then wait for the benchmarks and reviews.

    It is a little bit, but he seems to be more concise than some of the other hardware channels. But yes, hopefully we get a clearer picture come Thursday.

  • Robinson Posts: 751

    So low-end Ampere is going to blow high-end Turing out of the water in ray tracing. I knew when I bought my 2070 that it was kind of an early-adoption experiment (very good, by the way). The next generation of cards with RT is going to make it obsolete, pretty much.

  • outrider42 Posts: 3,679

    Some of it may be fluff, but I do think there will be more right than wrong.

    Just look logically at how much the 7nm process has changed AMD. You guys need to consider that Turing is on the same fab as Pascal, and look at the performance they got. Just shrinking the fab down alone is going to be a massive gain in performance. With a die shrink, they can add a lot more cores on a single die, while also boosting the clock speeds even higher on top of that.

    Tensor cores are going to be a wild card. Nvidia has been trying to think of just what they can do with them to benefit gamers more. If they can compress VRAM, that would be a huge win, too. But is that compression possible with Iray? I would think yes, but who knows. We'd have to see how it works.

    The access to the SSD is very much what the PS5 is doing already. Some of you might think I am crazy, but the day is coming when an SSD becomes a requirement for certain games, just like a certain level of graphics. But for us Daz users, this brings up a totally different prospect. Could this advancement lead to a form of out-of-core rendering? I think it is very possible.

    While the gaming line will probably not be discussed May 14, you will be able to get an idea of what Ampere is going to be. Will ray tracing be 4 times faster than Turing? Well we will find out the answer Thursday. Will Tensor cores do some kind of VRAM compression? We will find out Thursday.

    So we will find out a lot on Thursday. And as for "well, will gamers get that?" I don't think we need to question that. Back before we even knew what Turing was going to be, I told you guys that gaming would get the ray tracing and Tensor cores that the newly announced Quadros had. Some tried to argue against that, saying that Tensor was just for science. LOL.

    At any rate, stay tuned Thursday for the keynote on youtube.

  • alex86fire Posts: 1,130
    Robinson said:

    So low-end Ampere is going to blow high-end Turing out of the water in ray tracing. I knew when I bought my 2070 that it was kind of an early-adoption experiment (very good, by the way). The next generation of cards with RT is going to make it obsolete, pretty much.

    I don't expect them to work well with Daz and Iray until next year, though, so the wait is worth it for anyone who wants to buy something now; for anyone who already has something (maybe not a 2080 Ti or Titan), it's good that they have a decent card in the meantime. I purchased a 2070 Super in November and couldn't be happier with it. It lets me work with Daz in a decent way until the new RTX series is working properly.

    I will probably get something around 3070 at that time if the price point is similar.

  • rrward Posts: 556

    I am cautiously optimistic. I was able to jump from three 1080 Tis to two 2080 Tis and have faster render times. It might be worth it to go Quadro this time if the performance is high enough (after waiting for Studio and any other software I'm using to get updated to Ampere).

    Why is this hobby so damned expensive?

  • 1gecko Posts: 309
    rrward said:

    I am cautiously optimistic. I was able to jump from three 1080 Tis to two 2080 Tis and have faster render times. It might be worth it to go Quadro this time if the performance is high enough (after waiting for Studio and any other software I'm using to get updated to Ampere).

    Why is this hobby so damned expensive?

    because it's fun!!

    ... and it's art - and if you mix two expensive things into one, you get expensive^2! ;)

  • rrward said:

    I am cautiously optimistic. I was able to jump from three 1080 Tis to two 2080 Tis and have faster render times. It might be worth it to go Quadro this time if the performance is high enough (after waiting for Studio and any other software I'm using to get updated to Ampere).

    Why is this hobby so damned expensive?

    I know, right? I can no longer complain about how much my wife spends on clothes because I dwarf her petty outlays, now...

  • nicstt Posts: 11,715

    Interesting vid.

    Compelling arguments, but as we have no idea on his sources, it's prudent to wait and see.

    Unless others know what his previous reporting is like and how accurate it has been?

  • Asari Posts: 703
    edited May 2020
    rrward said:

    I am cautiously optimistic. I was able to jump from three 1080 Tis to two 2080 Tis and have faster render times. It might be worth it to go Quadro this time if the performance is high enough (after waiting for Studio and any other software I'm using to get updated to Ampere).

    Why is this hobby so damned expensive?

    I know, right? I can no longer complain about how much my wife spends on clothes because I dwarf her petty outlays, now...

    Double win for me: I've bought more clothing assets for my digital people to wear than for myself, and I plan to hold onto this until the end of the year to fund my 3080 Ti card. I'm also not upgrading my phone this year, although I probably should, since my phone is pretty old now. But between the choice of smartphone vs. Nvidia, it is Nvidia. Sorry, phone.
    Post edited by Asari on
  • marble Posts: 7,500

    If they can compress VRAM, that would be a huge win, too. But is that compression possible with Iray? I would think yes, but who knows. We'd have to see how it works.

    The access to the SSD is very much what the PS5 is doing already. Some of you might think I am crazy, but the day is coming when an SSD becomes a requirement for certain games, just like a certain level of graphics. But for us Daz users, this brings up a totally different prospect. Could this advancement lead to a form of out-of-core rendering? I think it is very possible.

     

    This was precisely what I was getting at when I said I don't understand what he was saying about the SSD. In other words, did he mean out-of-core, or is that not a consideration for gaming?

  • SpottedKitty Posts: 7,232
    edited May 2020

    Isn't out-of-core going to be a rendering time bottleneck? Even if an SSD is configured as an internal drive, how does that compare with processing speed using the card's on-board VRAM?

    Edit: And that's a heck of a big card. Would it fit in anything smaller than a mini- or full tower case? And what about power requirements? And...

    Post edited by SpottedKitty on
  • kenshaw011267 Posts: 3,805

    Isn't out-of-core going to be a rendering time bottleneck? Even if an SSD is configured as an internal drive, how does that compare with processing speed using the card's on-board VRAM?

    Edit: And that's a heck of a big card. Would it fit in anything smaller than a mini- or full tower case? And what about power requirements? And...

    There is significant latency when fetching data to VRAM. The request has to be generated on the GPU, sent across PCIe to the CPU, wait for the software's CPU thread's time slice, then the CPU generates the request to the SSD, which goes over either SATA or PCIe to the drive, which then has to find the data and read it. Then that all gets reversed to send the data back to the GPU. That can add up to many milliseconds, assuming small fetches. Compared to the microseconds or less it takes to run an operation on data already in VRAM, that could significantly slow down rendering, which is why Iray isn't out of core. But there have been advances in this sort of coding so that the entire process doesn't block while waiting for the new data, and that could hide much of the latency.
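
    To make the "doesn't block waiting for the data" part concrete, here is a rough CUDA sketch of the general pattern only (nothing to do with Iray's actual code; the chunk sizes and the shadeChunk kernel are made up for illustration). Managed memory can spill past VRAM into system RAM, and the next chunk of scene data is prefetched on one stream while the current chunk is still being processed on another:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void shadeChunk(const float* chunk, size_t n, float* accum) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(accum, chunk[i] * 0.000001f);   // stand-in for real shading work
    }

    int main() {
        const size_t chunkElems = size_t(64) << 20;          // 64M floats = 256 MB per chunk
        const int numChunks = 16;                            // ~4 GB total: can exceed a card's VRAM
        float *data, *accum;
        // Managed memory is paged between system RAM and VRAM by the driver.
        // (Oversubscribing VRAM like this needs a Pascal-or-newer card and behaves best on Linux.)
        cudaMallocManaged((void**)&data, numChunks * chunkElems * sizeof(float));
        cudaMallocManaged((void**)&accum, sizeof(float));
        *accum = 0.0f;
        cudaStream_t compute, copy;
        cudaStreamCreate(&compute);
        cudaStreamCreate(&copy);
        for (int c = 0; c < numChunks; ++c) {
            float* cur = data + size_t(c) * chunkElems;
            // Start migrating the NEXT chunk to GPU 0 while the current one is still being shaded.
            if (c + 1 < numChunks)
                cudaMemPrefetchAsync(cur + chunkElems, chunkElems * sizeof(float), 0, copy);
            shadeChunk<<<(unsigned)((chunkElems + 255) / 256), 256, 0, compute>>>(cur, chunkElems, accum);
        }
        cudaDeviceSynchronize();
        printf("accumulated %f\n", *accum);
        cudaStreamDestroy(compute);
        cudaStreamDestroy(copy);
        cudaFree(data);
        cudaFree(accum);
        return 0;
    }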

  • Robinson Posts: 751

    I don't expect them to work well with Daz and Iray until next year, though, so the wait is worth it for anyone who wants to buy something now; for anyone who already has something (maybe not a 2080 Ti or Titan), it's good that they have a decent card in the meantime. I purchased a 2070 Super in November and couldn't be happier with it. It lets me work with Daz in a decent way until the new RTX series is working properly.

    I will probably get something around 3070 at that time if the price point is similar.

    I don't see why it wouldn't work right away. It's more ASICs, isn't it (tensor and compute), not a new kind of interface for the software to work against.

  • outrider42 Posts: 3,679

    Isn't out-of-core going to be a rendering time bottleneck? Even if an SSD is configured as an internal drive, how does that compare with processing speed using the card's on-board VRAM?

    Edit: And that's a heck of a big card. Would it fit in anything smaller than a mini- or full tower case? And what about power requirements? And...

    There is significant latency when fetching data to VRAM. The request has to be generated on the GPU, sent across PCIe to the CPU, wait for the software's CPU thread's time slice, then the CPU generates the request to the SSD, which goes over either SATA or PCIe to the drive, which then has to find the data and read it. Then that all gets reversed to send the data back to the GPU. That can add up to many milliseconds, assuming small fetches. Compared to the microseconds or less it takes to run an operation on data already in VRAM, that could significantly slow down rendering, which is why Iray isn't out of core. But there have been advances in this sort of coding so that the entire process doesn't block while waiting for the new data, and that could hide much of the latency.

    I would be willing to bet money that nearly 100% of the Daz Iray audience would happily accept a performance penalty in exchange for out-of-core rendering. Even a large performance hit. After all, the current alternative is exactly zero GPU performance and rendering on a CPU at a pitiful fraction of the speed, unless you rock a $4000 Threadripper.

    You keep bringing up latency, but what if that latency was reduced? Besides, I don't think latency would be a big issue for Iray in particular, because Iray does not operate like a video game. Iray is not constantly swapping data. Even out of core, it would not be as time sensitive as a video game or 90% of other software. We have NVLink now, which also shares data between two cards over the NVLink bridge. The performance penalty for NVLink with Vray is quite tiny, like 5-10%. I have not seen anybody post benchmarks with NVLink since Iray started supporting it, but I would wager it's very similar. Even if out-of-core rendering cut rendering speed in half, that would still be WAY better than the alternative of dropping to CPU only.

    Kyotokid would be doing back flips, and he wouldn't be alone. Granted, a lot of Daz users might suffer physical harm from attempting back flips. But once everybody gets out of the hospital they will be rendering with huge Cheshire grins on their faces.

    And somehow, miraculously, another CUDA-based rendering engine has out-of-core rendering available today, without any special hardware requirement for doing so. Octane allows for out-of-core rendering. When enabled, Octane will allow texture data to be stored in system RAM. The newer Octane can even share geometry out of core. Interestingly, NVLink only shares texture data when it is used for Iray. But hey, texture data can add up.

    Now obviously Octane is not using an SSD for out of core. But regardless, faster communication between components like the SSD is going to be pushed more and more. The PS5 has an SSD that operates at speeds comparable to DDR2 memory. So an SSD capable of performing like RAM already exists.
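
    For anyone curious what "texture data stored in system RAM" looks like at the CUDA level, here is a bare-bones sketch; it is definitely not Octane's actual code (Octane layers caching on top of this), just the raw mechanism of pinned, mapped host memory that the GPU reads over PCIe whenever a thread touches it. Every one of those reads crossing the bus is exactly where the out-of-core performance penalty comes from:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstring>

    __global__ void sampleTexels(const unsigned char* texels, size_t n, unsigned int* checksum) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(checksum, (unsigned int)texels[i]);   // stand-in for a texture lookup
    }

    int main() {
        const size_t texBytes = size_t(1) << 30;     // 1 GB of "texture" data that stays in system RAM
        cudaSetDeviceFlags(cudaDeviceMapHost);       // allow the GPU to map host memory
        unsigned char *hostTex, *devView;
        unsigned int *checksum;
        // Pinned + mapped host allocation: the GPU gets a pointer into system RAM.
        cudaHostAlloc((void**)&hostTex, texBytes, cudaHostAllocMapped);
        cudaHostGetDevicePointer((void**)&devView, hostTex, 0);
        memset(hostTex, 1, texBytes);                // pretend we loaded textures here
        cudaMallocManaged((void**)&checksum, sizeof(unsigned int));
        *checksum = 0;
        sampleTexels<<<(unsigned)((texBytes + 255) / 256), 256>>>(devView, texBytes, checksum);
        cudaDeviceSynchronize();
        printf("checksum %u (every byte came across the bus)\n", *checksum);
        cudaFreeHost(hostTex);
        cudaFree(checksum);
        return 0;
    }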

     

     

    Robinson said:

    I don't expect them to work well with Daz and Iray until next year, though, so the wait is worth it for anyone who wants to buy something now; for anyone who already has something (maybe not a 2080 Ti or Titan), it's good that they have a decent card in the meantime. I purchased a 2070 Super in November and couldn't be happier with it. It lets me work with Daz in a decent way until the new RTX series is working properly.

    I will probably get something around 3070 at that time if the price point is similar.

    I don't see why it wouldn't work right away. It's more ASICs, isn't it (tensor and compute), not a new kind of interface for the software to work against.

    I believe Ampere will work out of the box with Iray. Not because Iray supports RT and Tensor cores now, but because OptiX 6.0 in Iray RTX does not need to be recompiled for new architectures. The old OptiX Prime needed to be recompiled for every new architecture, which is why we always had to wait for Iray to support new cards. With OptiX 6.0 that should be a thing of the past.

    Take a look at this information about OptiX Prime:

    OptiX Prime

    Starting from OptiX 3.5.0 a second library called OptiX Prime was added to the bundle which aims to provide a fast low-level API for ray tracing - building the acceleration structure, traversing the acceleration structure, and ray-triangle intersection. Prime also features a CPU fallback when no compatible GPU is found on the system. Unlike OptiX, Prime is not a programmable API, so lacks support for custom, non-triangle primitives and shading. Being non-programmable, OptiX Prime does not encapsulate the entire algorithm of which ray tracing is a part. Thus, Prime cannot recompile the algorithm for new GPUs, refactor the computation for performance, or use a network appliance like the Quadro VCA, etc.

    Iray doesn't use OptiX Prime anymore, kids! It's a moot point! I would expect that a simple driver update would be all that's needed for Ampere to work with Iray.

    Nvidia CEO Jensen Huang posted this to YouTube today. How about rendering Daz Iray with this little toy?

  • kenshaw011267 Posts: 3,805
    edited May 2020

    Isn't out-of-core going to be a rendering time bottleneck? Even if an SSD is configured as an internal drive, how does that compare with processing speed using the card's on-board VRAM?

    Edit: And that's a heck of a big card. Would it fit in anything smaller than a mini- or full tower case? And what about power requirements? And...

    There is significant latency when fetching data to VRAM. The request has to be generated on the GPU, sent across PCIe to the CPU, wait for the software's CPU thread's time slice, then the CPU generates the request to the SSD, which goes over either SATA or PCIe to the drive, which then has to find the data and read it. Then that all gets reversed to send the data back to the GPU. That can add up to many milliseconds, assuming small fetches. Compared to the microseconds or less it takes to run an operation on data already in VRAM, that could significantly slow down rendering, which is why Iray isn't out of core. But there have been advances in this sort of coding so that the entire process doesn't block while waiting for the new data, and that could hide much of the latency.

    I would be willing to bet money that nearly 100% of the Daz Iray audience would happily accept a performance penalty in exchange for out-of-core rendering. Even a large performance hit. After all, the current alternative is exactly zero GPU performance and rendering on a CPU at a pitiful fraction of the speed, unless you rock a $4000 Threadripper.

    You keep bringing up latency, but what if that latency was reduced? Besides, I don't think latency would be a big issue for Iray in particular, because Iray does not operate like a video game. Iray is not constantly swapping data. Even out of core, it would not be as time sensitive as a video game or 90% of other software. We have NVLink now, which also shares data between two cards over the NVLink bridge. The performance penalty for NVLink with Vray is quite tiny, like 5-10%. I have not seen anybody post benchmarks with NVLink since Iray started supporting it, but I would wager it's very similar. Even if out-of-core rendering cut rendering speed in half, that would still be WAY better than the alternative of dropping to CPU only.

    Kyotokid would be doing back flips, and he wouldn't be alone. Granted, a lot of Daz users might suffer physical harm from attempting back flips. But once everybody gets out of the hospital they will be rendering with huge Cheshire grins on their faces.

    And somehow, miraculously, another CUDA-based rendering engine has out-of-core rendering available today, without any special hardware requirement for doing so. Octane allows for out-of-core rendering. When enabled, Octane will allow texture data to be stored in system RAM. The newer Octane can even share geometry out of core. Interestingly, NVLink only shares texture data when it is used for Iray. But hey, texture data can add up.

    Now obviously Octane is not using an SSD for out of core. But regardless, faster communication between components like the SSD is going to be pushed more and more. The PS5 has an SSD that operates at speeds comparable to DDR2 memory. So an SSD capable of performing like RAM already exists.

    Let me post some stats so we're on the same page.

    NVLink transmits data between the two cards at no less than 50 GB/s (different cards support higher bandwidths).

    PCIe gen 3 is 985 MB/s per lane (so an x4 link is about 3.9 GB/s, while an x16 is about 15.7 GB/s).

    PCIe gen 4 is double gen 3, so almost 2 GB/s per lane; an x4 link is about 8 GB/s.

    SATA III is about 600 MB/s (there are slower SATA versions, but they should mostly be gone now).

    Memory bandwidth on a modern CPU is on the order of tens of GB/s (so many factors affect this that I'd need to test a specific set of hardware to give an accurate number).

    So NVLink is like a firehose of data, and latency isn't much of a problem.

    A gen 3 x16 slot is pretty good and isn't going to be the major bottleneck in reading data from any disk.

    But look at those numbers for SATA or x4 PCIe lanes. Even a gen 4 M.2 drive is going to struggle with the speed of data demanded by a GPU.

    Every CUDA "core" is a separate processing pipeline, so you have a couple of thousand processing units all demanding access to the same data (the scene geometry and textures) all the time. Octane must be doing some pretty intensive work to try and keep the relevant data on the card, and as much as possible in the CPU cache. Going all the way to RAM is really going to hurt performance, but if you really want a huge scene rendered on cheap hardware, go for it. I would like to see some benchmarks for in-core versus out-of-core renders, though.

    However, I think the biggest issue is that this is still a software issue. Some feature of Ampere cards might make this possible in games, but that would still require coding to the new API. The same applies to Iray: even if the functionality is in the cards, it would still need to be enabled.
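
    To put those numbers in perspective, here is the back-of-the-envelope arithmetic as a tiny program (theoretical peak rates only, ignoring latency and protocol overhead; the 8 GB scene size is just an example): how long it takes to move a scene's worth of data across each link.

    #include <cstdio>

    int main() {
        const double sceneGB = 8.0;                  // a big Iray scene's worth of data
        struct Link { const char* name; double gbPerSec; };
        const Link links[] = {
            { "NVLink",         50.0 },
            { "PCIe 3.0 x16",   15.7 },
            { "PCIe 4.0 x4",     8.0 },
            { "PCIe 3.0 x4",     3.9 },
            { "SATA III",        0.6 },
        };
        // Transfer time = size / bandwidth; the SATA case takes over 13 seconds.
        for (const Link& l : links)
            printf("%-14s %6.2f s to move %.0f GB\n", l.name, sceneGB / l.gbPerSec, sceneGB);
        return 0;
    }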

    Post edited by kenshaw011267 on
  • Ghosty12 Posts: 2,080

    After reading this article, which is a summation of sorts of the video in my OP (https://www.tweaktown.com/news/72399/nvidia-ampere-rumor-nvcache-speeds-up-load-times-optimize-vram-usage/index.html), and remembering what was in that video: if what is said is true, hopefully we find out Thursday. NVCache is supposed to work similarly to AMD's HBCC (High Bandwidth Cache Controller), in that it borrows some resources from both the system's RAM and the SSD to speed up load times (in games) and optimize VRAM usage as well.

  • kyoto kid Posts: 41,925

    ...hmm maybe when they release an Ampere Titan, the price for the Turing one might drop .  24 GB of VRAM, yeah I could do with that.

  • marble Posts: 7,500
    edited May 2020
    kyoto kid said:

    ...hmm maybe when they release an Ampere Titan, the price for the Turing one might drop .  24 GB of VRAM, yeah I could do with that.

    I got the impression from the video in the OP that we shouldn't expect much in the way of increased VRAM. That's why I'm trying to get a clear idea of whether out-of-core will be available. If I upgrade my 1070 it will probably have to be a 3070 which, from the rumours, looks like it might stay at 8GB or perhaps a slight jump to 10.

    That TweakTown article mentions the 3070 and lists 8/16 GB VRAM but I'm not sure whether that indicates a choice of two versions or a guess that it might be either 8 or 16.

    Post edited by marble on
  • alex86fire Posts: 1,130
    marble said:
    kyoto kid said:

    ...hmm maybe when they release an Ampere Titan, the price for the Turing one might drop .  24 GB of VRAM, yeah I could do with that.

    I got the impression from the video in the OP that we shouldn't expect much in the way of increased VRAM. That's why I'm trying to get a clear idea of whether out-of-core will be available. If I upgrade my 1070 it will probably have to be a 3070 which, from the rumours, looks like it might stay at 8GB or perhaps a slight jump to 10.

    That TweakTown article mentions the 3070 and lists 8/16 GB VRAM but I'm not sure whether that indicates a choice of two versions or a guess that it might be either 8 or 16.

    What I understood from the video when I first saw it is that maybe you could use other memory besides VRAM for some applications. If you can select what uses VRAM and what doesn't, that would be great: you could use everything else in Windows for out-of-core memory storage so you have all the VRAM for Daz.

    Even without this, though, the tensor memory compression, once fully utilized by Iray (I don't know if they have to do anything for that), would already give a huge boost in the effective capacity of the VRAM, for what I understand would be a small penalty in performance.

     

    Robinson said:

    I don't expect them to work well with Daz and Iray until next year, though, so the wait is worth it for anyone who wants to buy something now; for anyone who already has something (maybe not a 2080 Ti or Titan), it's good that they have a decent card in the meantime. I purchased a 2070 Super in November and couldn't be happier with it. It lets me work with Daz in a decent way until the new RTX series is working properly.

    I will probably get something around 3070 at that time if the price point is similar.

    I don't see why it wouldn't work right away. It's more ASICs, isn't it (tensor and compute), not a new kind of interface for the software to work against.

    The same could be said about the RTX cards from what I know, yet it took a long time for them to work properly.

    I think that at least the rendering engine, Iray in our case, has to be optimised to know how to work properly with the cards.

    We will get some sort of performance boost either way, but I'm not sure how much. I am not an expert, though, so I might be wrong.

  • kyoto kid Posts: 41,925
    edited May 2020
    marble said:
    kyoto kid said:

    ...hmm maybe when they release an Ampere Titan, the price for the Turing one might drop .  24 GB of VRAM, yeah I could do with that.

    I got the impression from the video in the OP that we shouldn't expect much in the way of increased VRAM. That's why I'm trying to get a clear idea of whether out-of-core will be available. If I upgrade my 1070 it will probably have to be a 3070 which, from the rumours, looks like it might stay at 8GB or perhaps a slight jump to 10.

    That TweakTown article mentions the 3070 and lists 8/16 GB VRAM but I'm not sure whether that indicates a choice of two versions or a guess that it might be either 8 or 16.

    ...well, the Turing RTX Titan is already 24 GB. If there is an Ampere version with higher core counts and other performance improvements, I still think the price of the older one would drop somewhat.

    Anyway, it seems a lot of the talk about performance improvement concerning Ampere relates to games and frame rate with ray tracing on. As I am not a gamer, VRAM for rendering is all that interests me, and if it means using the Turing Titan I'd be fine with that.

    Post edited by kyoto kid on
  • LenioTG Posts: 2,118

    I am hyped! :D

    If they sell an RTX 3080 with 10GB of VRAM that goes 30% faster than a 2080 Ti for 800€, I'll buy it! :)

  • kenshaw011267 Posts: 3,805

    If NVCache is equivalent to HBCC then it likely won't support anything like out-of-core rendering.

    HBCC tracks usage of things like textures during gameplay and leaves the less-used stuff on disk or in system RAM. That really has no bearing on a render in Iray: the whole scene is being lit, and rays are bouncing around all the time, essentially randomly. At best you might see a reduction in VRAM needed during a render as the cache controller figures out which textures aren't being used because they can't be seen by the camera and are not reflecting off any surface. But as to using less VRAM straight from the start? That's just not what HBCC does.
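
    If it helps to picture what a cache controller like HBCC is doing, here is a toy C++ sketch of the idea only (nothing like AMD's actual hardware; the texture names are made up): keep the most recently touched textures "resident" in VRAM and evict the least recently used ones back to system RAM. With Iray bouncing rays into everything essentially at random, a scheme like this mostly just thrashes, which is my point.

    #include <cstdio>
    #include <initializer_list>
    #include <list>
    #include <string>
    #include <unordered_map>

    class ResidencyCache {
        size_t capacity_;                                  // how many textures fit in "VRAM"
        std::list<std::string> lru_;                       // front = most recently used
        std::unordered_map<std::string, std::list<std::string>::iterator> where_;
    public:
        explicit ResidencyCache(size_t capacity) : capacity_(capacity) {}
        // Called every time a shader touches a texture.
        void touch(const std::string& tex) {
            auto it = where_.find(tex);
            if (it != where_.end()) {                      // already resident: bump to front
                lru_.splice(lru_.begin(), lru_, it->second);
                return;
            }
            if (lru_.size() == capacity_) {                // "VRAM" full: evict the coldest texture
                printf("evict %s to system RAM\n", lru_.back().c_str());
                where_.erase(lru_.back());
                lru_.pop_back();
            }
            printf("page in %s over PCIe\n", tex.c_str());
            lru_.push_front(tex);
            where_[tex] = lru_.begin();
        }
    };

    int main() {
        ResidencyCache vram(2);                            // pretend only two textures fit on the card
        for (const char* t : {"skin", "wall", "skin", "hair", "wall"}) vram.touch(t);
        return 0;
    }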

  • outrider42 Posts: 3,679
    edited May 2020

    Isn't out-of-core going to be a rendering time bottleneck? Even if an SSD is configured as an internal drive, how does that compare with processing speed using the card's on-board VRAM?

    Edit: And that's a heck of a big card. Would it fit in anything smaller than a mini- or full tower case? And what about power requirements? And...

    There is significant latency when fetching data to VRAM. The request has to be generated on the GPU, sent across PCIe to the CPU, wait for the software's CPU thread's time slice, then the CPU generates the request to the SSD, which goes over either SATA or PCIe to the drive, which then has to find the data and read it. Then that all gets reversed to send the data back to the GPU. That can add up to many milliseconds, assuming small fetches. Compared to the microseconds or less it takes to run an operation on data already in VRAM, that could significantly slow down rendering, which is why Iray isn't out of core. But there have been advances in this sort of coding so that the entire process doesn't block while waiting for the new data, and that could hide much of the latency.

    I would be willing to bet money that nearly 100% of the Daz Iray audience would happily accept a performance penalty in exchange for out-of-core rendering. Even a large performance hit. After all, the current alternative is exactly zero GPU performance and rendering on a CPU at a pitiful fraction of the speed, unless you rock a $4000 Threadripper.

    You keep bringing up latency, but what if that latency was reduced? Besides, I don't think latency would be a big issue for Iray in particular, because Iray does not operate like a video game. Iray is not constantly swapping data. Even out of core, it would not be as time sensitive as a video game or 90% of other software. We have NVLink now, which also shares data between two cards over the NVLink bridge. The performance penalty for NVLink with Vray is quite tiny, like 5-10%. I have not seen anybody post benchmarks with NVLink since Iray started supporting it, but I would wager it's very similar. Even if out-of-core rendering cut rendering speed in half, that would still be WAY better than the alternative of dropping to CPU only.

    Kyotokid would be doing back flips, and he wouldn't be alone. Granted, a lot of Daz users might suffer physical harm from attempting back flips. But once everybody gets out of the hospital they will be rendering with huge Cheshire grins on their faces.

    And somehow, miraculously, another CUDA-based rendering engine has out-of-core rendering available today, without any special hardware requirement for doing so. Octane allows for out-of-core rendering. When enabled, Octane will allow texture data to be stored in system RAM. The newer Octane can even share geometry out of core. Interestingly, NVLink only shares texture data when it is used for Iray. But hey, texture data can add up.

    Now obviously Octane is not using an SSD for out of core. But regardless, faster communication between components like the SSD is going to be pushed more and more. The PS5 has an SSD that operates at speeds comparable to DDR2 memory. So an SSD capable of performing like RAM already exists.

    Let me post some stats so we're on the same page.

    NVLink transmits data between the two cards at no less than 50 GB/s (different cards support higher bandwidths).

    PCIe gen 3 is 985 MB/s per lane (so an x4 link is about 3.9 GB/s, while an x16 is about 15.7 GB/s).

    PCIe gen 4 is double gen 3, so almost 2 GB/s per lane; an x4 link is about 8 GB/s.

    SATA III is about 600 MB/s (there are slower SATA versions, but they should mostly be gone now).

    Memory bandwidth on a modern CPU is on the order of tens of GB/s (so many factors affect this that I'd need to test a specific set of hardware to give an accurate number).

    So NVLink is like a firehose of data, and latency isn't much of a problem.

    A gen 3 x16 slot is pretty good and isn't going to be the major bottleneck in reading data from any disk.

    But look at those numbers for SATA or x4 PCIe lanes. Even a gen 4 M.2 drive is going to struggle with the speed of data demanded by a GPU.

    Every CUDA "core" is a separate processing pipeline, so you have a couple of thousand processing units all demanding access to the same data (the scene geometry and textures) all the time. Octane must be doing some pretty intensive work to try and keep the relevant data on the card, and as much as possible in the CPU cache. Going all the way to RAM is really going to hurt performance, but if you really want a huge scene rendered on cheap hardware, go for it. I would like to see some benchmarks for in-core versus out-of-core renders, though.

    However, I think the biggest issue is that this is still a software issue. Some feature of Ampere cards might make this possible in games, but that would still require coding to the new API. The same applies to Iray: even if the functionality is in the cards, it would still need to be enabled.

    And yet... Octane does it. And they've done it for a couple of years now. Octane has proven this is possible, in spite of the lag, in spite of the bus speed.

    Also, Vray is planning on adding out of core as an option. So then you'll have another render engine to add to the growing list.

    Again, does performance take a hit? Of course, it's a big hit. But again, it's better than the alternative of pure CPU rendering.

    Also, AMD ProRender is yet another engine that supports out of core. And more interestingly, since you mentioned it, AMD specifically lists HBCC as being required to run out of core. So HBCC is not just for gaming. Even the card AMD suggests using it with is a pro card. Quote:

    Out of core rendering works by keeping textures in system memory (RAM) and streaming them into GPU memory as and when required. In order to take advantage of this feature, you will need a Vega-based GPU with an HBCC (High-Bandwidth Cache Controller), such as the Radeon Pro WX 9100.

     

    marble said:
    kyoto kid said:

    ...hmm maybe when they release an Ampere Titan, the price for the Turing one might drop .  24 GB of VRAM, yeah I could do with that.

    I got the impression from the video in the OP that we shouldn't expect much in the way of increased VRAM. That's why I'm trying to get a clear idea of whether out-of-core will be available. If I upgrade my 1070 it will probably have to be a 3070 which, from the rumours, looks like it might stay at 8GB or perhaps a slight jump to 10.

    That TweakTown article mentions the 3070 and lists 8/16 GB VRAM but I'm not sure whether that indicates a choice of two versions or a guess that it might be either 8 or 16.

    The 8/16 are possible configurations of the 3070. They likely indicate gamer/pro versions of that card. Many Quadros in the past have simply doubled the VRAM of their gaming counterparts. So sadly it is not likely there would be a 16GB version of the 3070; the 16 is referring to the Quadro equivalent of the 3070. We can dream, though. Some configurations are possibly still undecided at this point; in fact, some final specs may not be set down until just one month before production. They should even have the dies from TSMC already (since they are launching the Quadros and Teslas), but the question is where they make the cuts to cut down the chips for the 3080 and 3070. The yields will certainly be one factor. If the yields are not great, that could lead to lower core counts, because it would be difficult to manufacture the higher core counts. So there is a balance between yield, price, and of course AMD competition that all factors into these specs.

    Some people also forget that Nvidia is not just competing against AMD; they are competing against consoles. If an Xbox can match a 2080 in performance and sells for $500, and Ampere is not drastically better than a 2080 for a decent price... some people may just buy a console and skip over PC. So that is a factor as well. The consoles are a real threat to Nvidia since gaming is a big part of their business. Never forget that. Consoles nearly killed PC gaming for a good while. That is another reason I believe that Nvidia will not hold back. They simply cannot afford to.

    At any rate the keynote is just hours away. So we shall see what Jensen is cooking. Look for the details. While this keynote is about the pro line, there will be clues for the gaming lineup. 

    Post edited by outrider42 on
  • Sevrin Posts: 6,313

    So I went through the 8 GTC videos and... there wasn't any guidance about desktop GPUs to be found.  Did I miss something?
