Iray Confusion.

So I've got an NVIDIA RTX 2070 and I'm using the Daz 4.11 beta.  It's way faster at rendering than my previous GTX 970 (obviously), but I'm a bit confused... is Iray making use of the RT cores on my graphics card or not?  In other words, is this as good as it gets, or will I get the promised 5-6 gigarays with some future update?  If the latter... when?

Comments

  • According to the latest OptiX blog post, the most recent OptiX release from NVIDIA, version 5.1.1, doesn't support Turing GPUs. So on the API side you're out of luck for OptiX optimizations for now. Also, real-time ray tracing would be cool to have in Daz, but I'm guessing it isn't being implemented...

  • Robinson Posts: 41

    It doesn't need to be "real time" (the phrase is misused - real-time just means you have a time-based constraint); I'd just like the silicon in my graphics card to be used as intended, i.e. to speed up ray tracing.

  • prixat Posts: 1,086

    We're in the same boat as everyone else: other than the demos shown by NVIDIA, NOTHING uses RT cores... yet.

  • JD_Mortal Posts: 496
    edited November 9
    Robinson said:

    It doesn't need to be "real time" (the phrase is misused - real-time just means you have a time-based constraint); I'd just like the silicon in my graphics card to be used as intended, i.e. to speed up ray tracing.

    RT was made to bring real-time ray tracing into gaming, not to do anything for single-frame renders, except possibly slow them down (due to the cores being split, and not contiguous). It is a "Volta chip and CUDA chip" split into segments, intended to give "gamers" a taste of a high-end CUDA/Tensor card. Because "Volta" is not a good gaming card, but it is adequate.

    RT will not speed up "rendering a single image". You are already using all the CUDA cores at once; there is nothing to "split" the cores across, nor would splitting make a single output image render faster. When playing a game at 60 FPS, you can get faster "real-time ray-trace rendering" by dedicating x frames per core. Here, in Daz3D rendering, x = 1 output frame. In a game, x = (FPS / core-levels), or the split is used for various depth levels. Each frame renders at a fraction of the quality that all cores together would produce, creating multiple "fast, ugly" images instead of one "slow, detailed" image. (In animations, noise is less of an issue, because you only see each image for 1/60th of a second.)

    If you wanted to preview animations without taking things to full renders, you could potentially have a use for the core-splitting. However, for a final render, all cores would still be used per frame. It is still about the number of CUDA cores and, soon, Tensor cores (for the faster AI work). RT is a novelty that isn't embraced by anyone yet, not even NVIDIA.

    What they are not telling gamers is that real-time ray tracing will come at one or both of two costs: display lag and/or lower FPS. To display 60 FPS with nearly no lag, you would need 6 x 60 FPS = 360 FPS worth of throughput, which is possible for many older games. However, a game displaying 60 FPS now would be delayed, as each frame is sent to one of six cores to be processed: virtually, up to 6 frames (a tenth of a second) "behind input", or "real time". The "DEMO" uses four cards, running in 4-way SLI, to display a simple 1080p resolution with "real-time ray-tracing".
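
    The latency arithmetic above can be sketched in a few lines (a toy model of the post's argument; the six-way split and the 60 FPS target are the post's own assumed numbers, not measured figures):

    ```python
    # Toy model of the pipelining argument above (illustrative numbers only):
    # if each of six units takes six frame-times to trace one frame, round-robin
    # scheduling still delivers one finished frame per frame-time, but every
    # displayed frame is six frame-times old.
    target_fps = 60
    units = 6

    throughput_needed = units * target_fps   # FPS-worth of work for lag-free 60 FPS
    latency_frames = units                   # each displayed frame is this many frames old
    latency_seconds = latency_frames / target_fps

    print(throughput_needed)          # 360
    print(latency_frames)             # 6
    print(round(latency_seconds, 2))  # 0.1
    ```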

    Post edited by JD_Mortal on
  • Robinson Posts: 41
    edited November 9

    RT will not speed it up, for "rendering a single image".

    Of course it will.  The RT cores run the BVH algorithm about ten times faster than CUDA cores do. 

     

    Anyway, I've been digging.  The current implementation of OptiX (5.1) does not support RTX.  The next version, currently in progress (5.2), will.  I'm guessing the DXR thing is a red herring, because Daz's Iray implementation will use the NVIDIA DLL, not DXR.  Presumably that maintains cross-platform compatibility.

    Post edited by Robinson on
  • JD_Mortal Posts: 496
    edited November 9
    Robinson said:

    RT will not speed it up, for "rendering a single image".

    Of course it will.  The RT cores run the BVH algorithm about ten times faster than CUDA cores do. 

    I hope they just get the Tensor-core version up and running. It has been a year so far... The RT version of Iray, if it ever exists, may be a year's wait for you too...

    I can't speak for how well, or what gains, could come from OptiX, but I currently only see gains in the real-time "live view" (the Daz3D preview window) from selecting OptiX. When rendering, I get worse results with OptiX selected. (I thought I read that OptiX was internally disabled in Iray when using the denoiser option.) That is also why I wanted a separate tab just for the "Iray preview" render settings.

    That BVH algorithm, for creating the tree structure, is a one-time calculation, done within a fraction of a second... doing that one thing 10x faster will have no major impact on a single image. It is only actually 6x, because there are 6 core divisions; on paper, or in specific situations, it may be "up to 10x faster". That falls in line with the 10 FPS that current CUDA can do in real time, while RTX does it fast enough for the full 60 FPS in real time. Thus the reason they used 6 divisions. That speed is for DirectX (DXR) "Falcor" ray tracing, not Iray rendering. There is no "quality" in INT4 and INT8 fast-and-dirty calculations, when our renders all use FP16, FP32 and FP64. (2xxx cards have only a few FP64 cores, so they can at least do some kind of FP64 calculation. I don't think the 1xxx cards have any FP64 cores.)
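
    The precision gap being pointed at can be seen directly: the same value stored at lower precision rounds differently. A minimal sketch using Python's struct module (nothing here is specific to Iray or to any GPU; it just illustrates the FP32-vs-FP64 difference):

    ```python
    import struct

    # Round-trip a value through 32-bit float storage and compare it with
    # the 64-bit original. The rounding error grows much larger again at
    # FP16 or INT8 - which is the post's point about render quality.
    x = 0.1  # Python floats are 64-bit doubles
    as_f32 = struct.unpack('f', struct.pack('f', x))[0]

    print(as_f32 == x)      # False: the 32-bit rounding differs
    print(abs(as_f32 - x))  # the (tiny) error introduced by FP32 storage
    ```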

    The denoising process is also done between x iterations, and is already a lightweight process. Tensor cores do most of that work for HD rendering, not RT, which is reserved for high-FPS gaming. When you get to the final stages of rendering, there is little actual "denoising", because all the blurry noise has been replaced by absolute pixel data. (Again, something that makes fast-moving dirty images possible, clearly, not still images.) I also render slightly slower when using the denoiser, but that is, I am sure, because it is using my CUDA cores, which are thus not rendering the actual image in those iterations. (The denoiser doesn't work on any of my Titan V cards at all.)
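
    The denoise-between-iterations behaviour described above can be sketched as a simple loop (a toy example; the iteration count, interval, and function names are invented for illustration, not taken from Iray):

    ```python
    def render_with_denoise(iterations, denoise_interval=8):
        """Toy progressive-render loop: accumulate samples every iteration,
        but only run a (stub) denoiser every `denoise_interval` iterations,
        since denoising costs compute that could otherwise trace rays."""
        denoise_passes = 0
        for i in range(1, iterations + 1):
            pass  # ... trace one sample per pixel and accumulate ...
            if i % denoise_interval == 0:
                denoise_passes += 1  # ... denoise the accumulation buffer ...
        return denoise_passes

    print(render_with_denoise(64))  # 8 denoise passes over 64 iterations
    ```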

    Post edited by JD_Mortal on
  • Robinson Posts: 41
    edited November 9
    JD_Mortal said:
    That BVH algorithm, for creating the tree structure, is a one-time calculation

    It's not building the tree, it's casting the ray through it to find intersections (traversal).  That happens billions of times in a typical render.  It's one of the biggest hits to performance in ray tracing.
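
    The build-once / traverse-per-ray distinction can be sketched with a toy tree (a 1-D stand-in for a BVH; all names and bounds are illustrative, not how any real renderer structures its data):

    ```python
    # Toy 1-D "BVH": each node covers an interval; leaves hold primitives.
    # The build happens once, but traversal runs once per ray - billions of
    # times in a full render - which is the part RT cores accelerate.

    class Node:
        def __init__(self, lo, hi, left=None, right=None, prim=None):
            self.lo, self.hi = lo, hi         # bounds of this node
            self.left, self.right = left, right
            self.prim = prim                  # primitive id, if this is a leaf

    def traverse(node, x, visited):
        """Count primitives whose bounds contain point x, skipping subtrees
        whose bounds don't contain it (the point of having a tree)."""
        if node is None or not (node.lo <= x <= node.hi):
            return 0
        visited.append(node)
        if node.prim is not None:
            return 1
        return traverse(node.left, x, visited) + traverse(node.right, x, visited)

    # Build once (a small fraction of total work)...
    root = Node(0, 10,
                left=Node(0, 5,
                          left=Node(0, 2, prim="A"),
                          right=Node(3, 5, prim="B")),
                right=Node(6, 10, prim="C"))

    # ...then traverse once per ray (the dominant cost at render time).
    visited = []
    hits = traverse(root, 4, visited)
    print(hits)          # 1 (primitive "B")
    print(len(visited))  # 3 nodes touched out of 5 - subtrees get culled
    ```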

     

    Some detail here:  https://www.hardwarezone.com.sg/feature-what-you-need-know-about-ray-tracing-and-nvidias-turing-architecture/rt-cores-and-tensor-cores

    Post edited by Robinson on