dForce - fp16, fp32 or FP64 ?

alofaro · March 2025

Hello, I have a very specific question for which I could not find an answer (and yes, I checked the pinned post "dForce start here"; the only indication there is "float", from which I can guess something, but I could be guessing wrong).
Does anybody know (I hope someone from Daz answers, actually) whether dForce uses FP16 (half precision, though I doubt it), FP32 (single precision) or FP64 (double precision) for its simulations?

The reason is simple: I have more than one GPU, and for Iray that works great, as I can tell Daz Studio to use all of them.
However, for dForce it is only possible to select one. dForce is built on OpenCL, which in theory supports multi-GPU, but I can understand that adding that possibility could be a mess (it is not transparent for the developers, who have to manage the parallelism in their software) when I guess most Daz users have only one GPU.

The GPUs I have are from different generations, which means they differ in performance and in number of cores. Before anybody replies with "just use the latest", it is not that simple.
The newer cards have more CUDA cores and slightly higher FP32 performance, but they have lower FP16 and (a lot lower) FP64 performance, and less memory, so depending on which format dForce uses for its calculations, either the older or the newer card may be faster.
Scientific physics simulations tend to use FP64, hence my doubt (I don't expect Daz Studio to be 100% physically accurate, but one never knows).
I don't know whether changing the collision mode has any impact on this.
Thank you in advance for any useful input.

Richard Haseltine · March 2025

I don't think this has been stated. Have you tried running some tests, just draping a fairly high division plane over another primitive for example?

jbowler · March 2025

It seems to be a WIP for NVidia:

https://forums.developer.nvidia.com/t/half-precision-reciprocals-in-opencl/253577

The precision that DAZ ends up using on NVidia cards for rendering seems to be "float" (fp32, 23-bit worst case precision). I say that because on distant objects a variant of "poke through" seems to occur with geoshells and possibly other stuff; objects that are distant enough for the 23-bit precision to be insufficient for smallish geoshell offsets. FP32 should go up to almost 1km without loss of precision but it's close. The current IEEE 754 FP16 has only 10-bit precision, which is way too little for 3D geometry; 1mm is not reliably representable at just over 1m (1024mm)!
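
The spacing argument above can be checked numerically. The following is a minimal sketch using only Python's standard `struct` module; the helper name `spacing` and the choice of millimetres as the unit are assumptions made for illustration, and nothing here is taken from Daz Studio or Iray.

```python
import struct

# struct format codes: 'e' = IEEE 754 binary16 (FP16), 'f' = binary32 (FP32).
# Unsigned integer codes of the same width, used to step the raw bit pattern.
_INT_CODE = {"e": "H", "f": "I"}


def spacing(fmt: str, x: float) -> float:
    """Gap between x (rounded to the format) and the next larger representable value.

    Only meant for positive, finite x well inside the format's range.
    """
    int_code = _INT_CODE[fmt]
    # Round x to the target format and read back its raw bit pattern.
    bits = struct.unpack("<" + int_code, struct.pack("<" + fmt, x))[0]
    current = struct.unpack("<" + fmt, struct.pack("<" + int_code, bits))[0]
    # For positive floats, adding 1 to the bit pattern gives the next value up.
    following = struct.unpack("<" + fmt, struct.pack("<" + int_code, bits + 1))[0]
    return following - current


if __name__ == "__main__":
    # Hypothetical scene coordinates expressed in millimetres.
    for label, fmt in (("FP16", "e"), ("FP32", "f")):
        for distance_mm in (1024.0, 2048.0):
            print(label, distance_mm, "mm -> step of", spacing(fmt, distance_mm), "mm")
    # One kilometre in millimetres is outside FP16's range, so FP32 only.
    print("FP32", 1_000_000.0, "mm -> step of", spacing("f", 1_000_000.0), "mm")
```

With millimetres as the unit, FP16 can only step in whole millimetres from 1024 and in 2 mm steps from 2048, while FP32 still resolves 0.0625 mm at one kilometre.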

The main application for "half float", "fp16", "_Float16", is for tristimulus colour values particularly in HDR computations and this is what NVidia Iray does. In this context bfloat16 is, in fact, better than IEEE754 but it is a very specific application (hum, might work for audio too :-)

Some of dForce could, maybe, be optimised with _Float16 but it is fundamentally geometry and it occurs between objects in the scene so it has to take into account the dimensions of the scene. I can see possible optimisations but honestly I wouldn't expect them to be implemented. Your mixed cards should be fine if you could find some way of using them and regardless of what OpenCL does or doesn't do with _Float16 I don't expect it to impinge on dForce (unless, of course, someone makes a mistake :-)

alofaro · March 2025

Richard Haseltine said:

I don't think this has been stated. Have you tried running some tests, just draping a fairly high division plane over another primitive for example?

Hi, thanks for replying. Yes, I have, though I was hoping someone from Daz would chip in and give a definitive answer.
Not their first-line support, because I know they don't know the software that well from a technical point of view; they seem to be there more for usage questions. I had some interaction with them, and luckily I am an IT person and was even a DBA, or I would have spent literally weeks doing everything they told me, when the problem I had was simply related to a morph (and the hint in the right direction actually came from a user in the fora here on the web site).
Someone more deeply involved in the inner details, specifications or development could have given the answer directly.

I tried running some scenes through the different cards, and the results are definitely... inconclusive! :-D
There is a slight difference, but in general we are talking about 2.5 minutes over 157 minutes (the other runs gave similar results, in one direction or the other), less than 2%. Even allowing for the usual gap between optimal testing and real use (Daz was not the only thing running on the computer, though the cards I used for the sims were not the one connected to the screen), this does not reflect the difference between the theoretical card performances in any area.
If it were FP16 the older card would be faster; if it were FP64 the older card would be hugely faster (like 8 times the speed); if it were FP32 the newer card would be faster, and much more visibly so (like 30% or so).
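
To make the comparison concrete, here is a small Python sketch that puts the measured gap next to the gaps each hypothesis would predict. The function name is made up for illustration, the timings are the ones quoted in this post, and reading "30% or so" as "30% less time" is an assumption.

```python
def relative_gap(time_a_min: float, time_b_min: float) -> float:
    """Difference between two run times as a fraction of the longer one."""
    longer = max(time_a_min, time_b_min)
    return abs(time_a_min - time_b_min) / longer


# Figures quoted in the post: about 2.5 minutes apart on a 157-minute run.
observed = relative_gap(157.0, 157.0 - 2.5)

# Rough expectations from the post, expressed relative to the slower card.
# "8 times the speed": the slower card needs 8 time units, the faster needs 1.
# "30% or so": read here as 30% less time (an assumption, see above).
expected = {
    "FP64 (older card ~8x faster)": relative_gap(8.0, 1.0),
    "FP32 (newer card ~30% faster)": relative_gap(1.0, 0.7),
}

if __name__ == "__main__":
    print(f"observed gap: {observed:.1%}")
    for hypothesis, gap in expected.items():
        print(f"{hypothesis}: expected gap of about {gap:.0%}")
```

The observed gap is more than ten times smaller than either prediction, which is why the runs do not point to any particular precision.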

Both cards support OpenCL 3.0, which is why both can be used for dForce.
OpenCL is meant to let the same application run on GPUs and CPUs from different makers (Nvidia, AMD, Intel), and it can even parallelise across different GPUs and CPUs in the same computer, even from different makers. It is logical that it was used for Daz Studio, so that dForce works whether there is no GPU acceleration, an Nvidia GPU, an AMD GPU or an Intel GPU.
There are three ways to parallelise OpenCL and so take full advantage of multiple cards and CPUs: 1) the software is written with parallelisation in mind; 2) the software is written with "standard" OpenCL but uses a library that handles the parallelisation (linked at compilation time); 3) the software relies on a tool that parallelises OpenCL at execution time (though that was an academic article; I have to check how it was implemented, and it may not be possible to use it at all, for technical and possibly legal reasons).
But the results, and the fact that only one card can be selected for dForce, mean Daz Studio is using none of these methods.
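
As a rough illustration of the first option, where the application itself manages the parallelism, the sketch below splits a range of work items between devices in proportion to an assumed relative speed. It is plain Python, not OpenCL, and the function name, the device weights and the item count are all hypothetical; nothing here reflects how dForce is actually implemented. A real cloth solver would typically also have to exchange data along the partition boundaries at every step, which is part of why this is not transparent to developers.

```python
def split_work(n_items: int, weights: list[float]) -> list[range]:
    """Split indices 0..n_items-1 into contiguous chunks proportional to weights."""
    if n_items < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need n_items >= 0 and strictly positive weights")
    total = sum(weights)
    chunks = []
    start = 0
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        # The last chunk always ends at n_items so rounding never drops an item.
        if i == len(weights) - 1:
            end = n_items
        else:
            end = round(n_items * cumulative / total)
        chunks.append(range(start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # Hypothetical: three devices, the third assumed twice as fast as the others.
    for device, chunk in enumerate(split_work(100_000, [1.0, 1.0, 2.0])):
        print(f"device {device}: items {chunk.start}..{chunk.stop - 1}")
```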

The difference between the non-parallelised Daz Studio OpenCL and the parallelised Iray shows in the fact that the same scene needed 1h55 to 1h57 for the simulation, while the Iray render takes a couple of minutes, because it uses all three cards at once.
It is true that a physics body simulation is not exactly the same as rendering, but even with the optimisations of the newer cards, raytraced rendering is itself a physics simulation (simulating the paths of rays of light).

I guess I will have to see whether the old trick of starting a second instance still works. That way one instance could run dForce on one scene using one card, while I use another instance to work on another scene and eventually render (excluding the dForce card from Iray rendering, just to be on the safe side). Running them in parallel would also be a way to work faster and take advantage of the CPUs and GPUs in the machine.

alofaro · April 2025

jbowler said:

It seems to be a WIP for NVidia:

https://forums.developer.nvidia.com/t/half-precision-reciprocals-in-opencl/253577

The precision that DAZ ends up using on NVidia cards for rendering seems to be "float" (fp32, 23-bit worst case precision). I say that because on distant objects a variant of "poke through" seems to occur with geoshells and possibly other stuff; objects that are distant enough for the 23-bit precision to be insufficient for smallish geoshell offsets. FP32 should go up to almost 1km without loss of precision but it's close. The current IEEE 754 FP16 has only 10-bit precision, which is way too little for 3D geometry; 1mm is not reliably representable at just over 1m (1024mm)!

The main application for "half float", "fp16", "_Float16", is for tristimulus colour values particularly in HDR computations and this is what NVidia Iray does. In this context bfloat16 is, in fact, better than IEEE754 but it is a very specific application (hum, might work for audio too :-)

Some of dForce could, maybe, be optimised with _Float16 but it is fundamentally geometry and it occurs between objects in the scene so it has to take into account the dimensions of the scene. I can see possible optimisations but honestly I wouldn't expect them to be implemented. Your mixed cards should be fine if you could find some way of using them and regardless of what OpenCL does or doesn't do with _Float16 I don't expect it to impinge on dForce (unless, of course, someone makes a mistake :-)

Thanks. Actually, it was not that I wanted them to use FP16; I was just wondering what is used, and my focus was not on rendering but on dForce.
FP16 was introduced in the cards way back in the Volta generation, with a focus on machine learning, not on graphics or physics at all. Later they even went to 8-bit integers and other formats that are smaller and less precise still, because they realised that in many cases machine learning applications use, for the weights of the nodes, values that are not even floating point.
For rendering, in theory at least, Iray takes care of everything and should even be able to use the specific extra hardware in newer cards to speed things up; plus, it uses all the cards in the computer, so it is less of a problem for me at the moment.

Anyway, I had to replace one of the newer RTX A5000 cards, because with two installed one was overheating, and I got some very interesting results.
I replaced it with a Titan V I also have (same generation as the GV100; it actually came out right before it), and the increase in simulation speed using the Titan V as the simulation card is amazing. The same dForce simulation that took about 1h57 (give or take about 2 minutes) with a GV100 and with the RTX A5000 takes only about 37 minutes with the Titan V, a reduction of 68% in simulation time, and that while I was also using it as the video output for the computer!!
The Titan V was originally made for research, but still, it means it is not even the FP32 performance that matters, or rather, it seems to confirm that performance with OpenCL has nothing to do with raw FP32 or FP64 performance.
Unfortunately, I am afraid that within a couple of years at most CUDA will no longer support these cards, and maybe Iray will not either :-(, but as long as Daz continues to use OpenCL for dForce simulations, it seems to be a good choice (mine is old, so I bought it at full price, but nowadays one can find one for less than an RTX 4090).
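
For reference, the 68% figure follows directly from the two timings quoted above. A minimal Python sketch of the arithmetic, with function names chosen only for illustration:

```python
def time_reduction(before_min: float, after_min: float) -> float:
    """Fraction of the original run time that was saved."""
    return (before_min - after_min) / before_min


def speedup(before_min: float, after_min: float) -> float:
    """How many times faster the new run is compared with the old one."""
    return before_min / after_min


# Timings from the post: about 1h57 before, about 37 minutes with the Titan V.
BEFORE_MIN = 1 * 60 + 57  # 117 minutes
AFTER_MIN = 37

if __name__ == "__main__":
    print(f"reduction: {time_reduction(BEFORE_MIN, AFTER_MIN):.0%}")
    print(f"speedup:   {speedup(BEFORE_MIN, AFTER_MIN):.2f}x")
```

A 68% shorter run is the same thing as a speedup of roughly 3.2 times.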
