Rendering an animation... needs a NASA computer?


Comments

  • Ivy Posts: 7,165
    edited January 2018

    great

    ebergerly said:
    Ivy said:

    Both render engines have their purpose. Iray is nice for photorealistic still renders, but it's not really all that practical at this time for animations.

    Again, why is it not practical for animations? Are you rendering in 4K? Based on what I render, I rarely get over 15 minutes or so per frame, so rendering an animation doesn't seem too painful. That's 4 frames per hour, or 6 hours for 1 second of animation, or 4 seconds per day. Yeah, painful, but that's before doing any scene optimization or post-production work to cut that down considerably.

    I guess I must just be bad at making and setting up 1080 HD animations. Please, let's see some of your results from a two-minute Iray film? :)

    Post edited by Ivy on
  • ebergerly Posts: 3,255
    Ivy said:

    I guess I must just be bad at making and setting up animations. Please, let's see some of your results from a two-minute Iray film? :)

    I'm not criticizing, just trying to understand why I get 18 minutes with four G3s in an interior scene that's pretty dark with emissives (and generally get 10-15 minute renders), why others get much longer render times, and why people think Iray is slower than other engines. I haven't seen any real data that explains it.

     

  • DustRider Posts: 2,880

    I've got to go, but I wanted to add that when I mentioned that Octane was somewhat faster than Iray, I was thinking in terms of stills. For animations, Octane will be significantly faster than Iray, because Iray's scene load time for each frame is ...... um ...... ridiculous?

  • Ivy Posts: 7,165
    edited January 2018
    ebergerly said:
    Ivy said:

    I guess I must just be bad at making and setting up animations. Please, let's see some of your results from a two-minute Iray film? :)

    I'm not criticizing, just trying to understand why I get 18 minutes with four G3s in an interior scene that's pretty dark with emissives (and generally get 10-15 minute renders), why others get much longer render times, and why people think Iray is slower than other engines. I haven't seen any real data that explains it.

     

    I don't get it. I would have to see your two-minute animation to make any kind of educated guess about how good the results were. I use two minutes as a baseline animation length: that's about 10 scenes at about 20 seconds a scene, which shows your consistency of render time. If you're getting render times around 15 minutes per keyframe at 30 fps, you're really not getting all that fast a result.

    I have seen tons of 10- or 15-second test animations of dancing girls with no backgrounds or props, and they mean nothing. Want to impress me? Show me something real, using real-world scenes, rendering backgrounds and props. That puts a load on your VRAM, without a lot of fireflies and graininess.

    I try to render HD films. I have no need for 4K films; I'm not rendering for VR. I render at 2440x1080 raw to get a final 1920x1080 standard HD film. My average Iray render per frame, like this Iray gymnastics animation, took about 45 seconds per keyframe with 300 iterations. The average scene is about 300 keyframes, or 10 seconds long, rendering in about an hour and a half per 10-second scene. After it was done I saved it as an AVI and copied the raw PNGs, as well as the RIB folder located in the temp render folder, for backup and special scene editing. Most of my 3Delight animations look more cartoony (which I actually like) and render around 20-30 seconds per keyframe at 30 fps. But then again, I'm not rendering Pixar animations either. This is DAZ, after all, and where are we posting most of these animations? YouTube?

    Post edited by Ivy on
  • ebergerly Posts: 3,255

    So FWIW, the 1600x900 scene that took 18 minutes to render finally took 1 hour 40 minutes at 4K. Wow. So yeah, 4K takes a LOT of time to render compared to lower resolutions. That's about 5 times as long, which kind of makes sense, because it's about 5 times as many pixels, I suppose.
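
    A quick back-of-the-envelope check of that estimate, as a Python sketch (the resolutions and times are just the ones quoted in this post):

    ```python
    # Pixel counts for the two resolutions mentioned above.
    low_res = 1600 * 900      # 1,440,000 pixels
    uhd_4k = 3840 * 2160      # 8,294,400 pixels
    print(uhd_4k / low_res)   # ~5.76x the pixels

    # Render times from the post: 18 minutes vs 1 hour 40 minutes.
    print((60 + 40) / 18)     # ~5.6x the render time, so roughly in line
    ```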

  • nonesuch00 Posts: 18,722

    Rendering at higher resolutions lets you see shortcomings in a frame more easily.

  • DustRider Posts: 2,880
    ebergerly said:
    DustRider said:

     

    I'm sure they have well over 1,000 multi-core (18 or more cores?) CPUs available.

    Do you know if they use CPUs or GPUs for rendering? Because my Ryzen is an 8-core/16-thread, and rendering the scene that took 18 minutes on my GTX 1070 + GTX 1080 Ti would take about a month on the Ryzen. So even 1,000 multi-core CPUs doesn't seem like a lot compared to 1,000 GPUs.

    Anyway, so far it looks like the 1200x800 scene that took 18 minutes on my two GPUs is going to take maybe an hour at 4K (3840x2160). So maybe 3 times as long. Still nothing close to 29 hours, but now I understand those who render in 4K and take an hour per frame.

     

    They use CPUs for rendering. Also, keep in mind that they need to render each frame three times for final production: one render for the 2D movie, and two renders per frame for the 3D movie.

    Here is a little blurb on RenderMan 21.5 and 22 from SIGGRAPH 2017. RenderMan XPU, which is CPU+GPU rendering (probably more like GPU-assisted), will be available in RenderMan 22; the current release is 21.6. The biggest advancement in 21.5 was the addition of raytraced SSS, which bumps up the realism of SSS materials a lot, and according to everything I've read, PRMan is the only software to do this (at least when it was released).

    The problem with doing the kind of rendering that many of the studios need on the GPU is that many of the effects they need don't lend themselves well to GPU processing. GPUs are great for brute-force processing that needs little logic, but there are still many functions required in certain studio production effects that GPUs are horrible at, so pure GPU rendering hasn't been very popular yet. I'm sure what XPU will do (and it's really targeting the workstation more than the render farm) is use the GPU for the calculations GPUs do well (ray tracing) and let the CPU do the stuff it does well. They are also working toward a more real-time rendering environment for the artist's workstation, where XPU will be a huge improvement (and no doubt their main internal focus).

    So, while in its simplest form the statement "so even 1,000 multi-core CPUs doesn't seem like a lot compared to 1,000 GPUs" may seem, and be, perfectly logical, some things being done on the CPU would be extremely slow on the GPU. Depending on the need, 10 CPUs may run circles around 1,000 GPUs (and that's just considering the logic missing in GPUs; if you add in that some processes aren't easily scaled to the massively parallel processing you need for efficient GPU use, then GPU-only processing can be even more inefficient).

    I'm guessing (hoping) that you're exaggerating a bit when you say that your 18 min. GPU render would take a month using the CPU?? My tests show my CPU being consistently about 4x-6x slower than GPU rendering (and that's on a laptop). Maybe Iray uses Intel-specific optimizations that your Ryzen doesn't have (I would think Nvidia would have made use of special Intel instruction sets to improve Iray CPU performance)? That is something to keep in mind when designing a box to render with: any Intel-specific optimizations in your render engine (or whatever software you use) may not work, or may not work as well, on non-Intel processors.

  • Kevin Sanderson Posts: 1,643
    edited January 2018

    Pixar's RenderMan recently added noise reduction using the GPU (the GPU just does the noise-reduction routine, similar to Neat Video), so you don't have to be like the maniacs here looking for noise-free renders. Blender has something similar to help clean up noisy renders, so you don't have to run as many samples or iterations to get something acceptable.

     

    Post edited by Kevin Sanderson on
  • ebergerly Posts: 3,255
    edited January 2018
    DustRider said:

     

    I'm guessing (hoping) that you're exaggerating a bit when you say that your 18 min. GPU render would take a month using the CPU?? My tests show my CPU being consistently about 4x-6x slower than GPU rendering (and that's on a laptop).

    Actually my Ryzen renders the benchmark scene in about 20 minutes, while my GTX 1080 Ti does it in 2 minutes, the 1070 in 3 minutes, and my 1060 in 4.5 minutes. So no, not a month, but it just feels like it. :)

    As far as GPU vs. CPU goes, from a programming perspective the GPU has thousands of cores (which generally correspond to programming threads where you can do parallel calculations). The Ryzen has 16 threads. There's a lot more to it, but generally the GPU can be assumed to vastly outperform a CPU.

    For example, the 1080 Ti has 3,584 cores/threads, and a 1070 has 1,920. So there's a vast difference even among high-end GPUs, compared to the Ryzen's 16.

    Now I'm just getting started with GPU programming, so I'm not really familiar with what's going on under the hood with CUDA. But it seems clear that if you can take full advantage of the GPU's cores, they're way ahead of any CPU.
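
    For what it's worth, here's a tiny Python sketch of the ratios implied by those benchmark numbers (the times and core counts are just the ones quoted in this post):

    ```python
    # Benchmark times quoted above, in minutes per render.
    cpu_minutes = 20
    gpu_minutes = {"GTX 1080 Ti": 2, "GTX 1070": 3, "GTX 1060": 4.5}

    for card, minutes in gpu_minutes.items():
        print(f"{card}: {cpu_minutes / minutes:.1f}x faster than the Ryzen")

    # Raw thread counts: 3,584 CUDA cores vs the Ryzen's 16 threads.
    print(3584 / 16)   # 224.0
    ```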

    Post edited by ebergerly on
  • ebergerly Posts: 3,255

    BTW, DustRider, I think you're saying that the CPU is better at certain types of calculations, and that's pretty much the conclusion I've come to. I'm guessing the calculations done by a GPU's threads are far simpler than what a CPU can handle, and that the registers in a GPU are smaller, which implies the calculations are pretty simple. It seems to me a GPU would therefore be better at stuff like image calculations, since each pixel is only a few 8-bit numbers (RGB), compared to the big floating-point stuff you can do on a CPU.

  • DustRider Posts: 2,880
    ebergerly said:

    BTW, DustRider, I think you're saying that the CPU is better at certain types of calculations, and that's pretty much the conclusion I've come to. I'm guessing the calculations done by a GPU's threads are far simpler than what a CPU can handle, and that the registers in a GPU are smaller, which implies the calculations are pretty simple. It seems to me a GPU would therefore be better at stuff like image calculations, since each pixel is only a few 8-bit numbers (RGB), compared to the big floating-point stuff you can do on a CPU.

    I think that's a pretty good assessment. Here is a decent article/answer on it. I'm not a serious programmer, but I have known several, some who worked on GPU programming at a rather high level (i.e. not through the SDK) and one who played with render software programming, so I understand the "administrative technical" view but lack the in-depth technical knowledge/experience. Basically, if an operation can be parallelized at a massive scale, where only the result is important (no need to evaluate the result), then GPUs work well. If you have a single stream of instructions (pardon my poor techno jargon here) where the result of one instruction must be evaluated to determine what happens next, then GPU processing is a poor choice. In essence you have all these processors on the GPU waiting to get the next instruction, instead of being busy chewing through calculations. The more biased the rendering process, the more difficult it is to schedule it properly to keep the GPU busy.

    Some of these processes can be programmed to work more efficiently on the GPU, but it's my understanding that the CUDA SDK available from Nvidia is a bit lacking in many areas (at least, IIRC, when the Octane Render group started they decided not to use the Nvidia SDK because it was lacking and inefficient in many areas). If you're familiar with both Octane and Iray, you can see the inefficiencies of Iray: it ties up one CPU core while rendering in GPU-only mode, whereas Octane hardly touches the CPU, even when rendering with out-of-core textures. My guess is that XPU in RenderMan is being implemented to use the GPU for what it does well and the CPU for what it does well, so the best of both worlds. The tough part is scheduling the processes so that neither the CPU nor the GPU is sitting around waiting for the other. Keep in mind that the typical workstation at Pixar (or most CGI studios) where the user needs to render images probably (just guessing here) has a minimum of 16 cores and 32 threads, so XPU would make IPR rendering for pre-visualization very useful, even with huge scenes.
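
    To make that distinction concrete, here's a toy Python/NumPy sketch (my own illustration, not anything from Iray, Octane, or RenderMan): the first operation applies the same independent calculation to every pixel, which is the kind of work that maps well onto thousands of GPU threads; the second is a loop where each step's branch depends on the previous result, so it has to run serially.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    frame = rng.random((1080, 1920, 3), dtype=np.float32)   # a made-up HD frame

    # Massively parallel work: the same independent operation on every pixel.
    # Each pixel could be handed to its own GPU thread.
    brightened = np.clip(frame * 1.2, 0.0, 1.0)

    # Serial, branch-dependent work: each step must look at the running result
    # before deciding what to do next, so it can't be spread across threads.
    total = 0.0
    for value in frame[0, :, 0]:
        if total > 100.0:
            total -= value
        else:
            total += value

    print(brightened.shape, total)
    ```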

  • If you have a lot of time, hard work, and talent, you can do some pretty cool animation in DAZ Studio. The best example was done several years ago by Jesus Orellana, who did all the hard, time-consuming stuff in DAZ Studio and the special VFX in Blender (this was before Iray came to Studio), plus heavy postwork. Here's the short: https://www.blendernation.com/2011/11/22/rosa/ and an interview detailing a little more: https://www.shortoftheweek.com/news/interview-with-jesus-orellana-rosa/ It was started on an iMac and completed on a Mac Pro.

  • DustRider Posts: 2,880

    If you have a lot of time, hard work, and talent, you can do some pretty cool animation in DAZ Studio. The best example was done several years ago by Jesus Orellana, who did all the hard, time-consuming stuff in DAZ Studio and the special VFX in Blender (this was before Iray came to Studio), plus heavy postwork. Here's the short: https://www.blendernation.com/2011/11/22/rosa/ and an interview detailing a little more: https://www.shortoftheweek.com/news/interview-with-jesus-orellana-rosa/ It was started on an iMac and completed on a Mac Pro.

    I'd forgotten about that one. Amazing work he did!

  • Kevin Sanderson Posts: 1,643
    edited January 2018

    Yeah, it was amazing. It helped punch his ticket to Hollywood, but he learned that Hollywood can be a very hard place. He's working on a graphic project and revealing some of his original art on Facebook. https://www.facebook.com/orellanakun/

    Post edited by Kevin Sanderson on
  • DustRider Posts: 2,880

    Thanks for the link!

    Yep, it can be really tough in the world of Hollywood CGI/animation. It hasn't helped that a lot of the work is going elsewhere for really cheap. Plus, there are so many people who want to work for the studios that the competition for positions is really tough. I watched a documentary about Pixar 3 or 4 years ago, and IIRC the person being interviewed said they got over a thousand applications a week.

  • kyoto kid Posts: 41,847

    ..another benefit of CPU rendering is total memory support. The most VRAM available today is 24 GB on the $5,000 Quadro P6000. I wouldn't doubt that the servers in these massive render farms have 64 or even 128 GB of physical memory per machine. Multiply that by 1,000 blades, each with, say, dual 18-core hyperthreading processors, and that is a lot of networked rendering power (72,000 total processor threads and 64 to 128 terabytes of total memory to throw at a project). A major studio may have two or more such facilities available as needed.

    The "upper limit" is only restricted to hoe much memory each MB supports (and there are consumer MBs which can support 512 or even 1 TB of physical memory)  Imagine a 1,000 blade render farm with the total memory resources of 1 Petabyte each running dual 32 core Epyc processors (128,000 threads).

  • WendyLuvsCatz Posts: 40,058

    maybe

  • kyoto kid Posts: 41,847

    ....8 times the cores but still limited to the VRAM of a single card, so if the scene load exceeds that, all of those cores become useless as the process dumps to the CPU.

  • DrNewcenstein Posts: 816
    edited January 2018

    For $50K I would expect Nvidia to have rigged the VCA to be viewed by the computer as one giant card, and not 8 separate cards. I had to remove excess system resources from Device Manager just to get 2 internal and one external GPU running (things like front USB ports, and one HDD port by accident, if I'm not mistaken). The 4-GPU cluster helped a bit, but each card was still viewed as a separate device by Device Manager (and then there was the Titan Z, which is always viewed as 2 cards), so I can't imagine what my machine would look like if it actually wanted to read 8 GPUs separately.

    Then again I also have 7 internal drives....

    However, the VCA is typically loaded with 16GB-24GB cards (M6000 - P6000), so if your scene exceeds 16GB, you've made a terrible mistake somewhere.

     

    As for animations in DS taking several hours, it depends on what you have going on in the scene, just like with stills. All the geometry in the scene is in VRAM and in the rendering calculation, because Iray can't ignore what's not in the active camera frame. It took 6 hours to render a 1920x1080, 7-second sequence for my dForce Terrains tutorial of a figure walking through a water plane. That was rendering to a movie, not an image sequence.
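
    For a rough per-frame figure from that example (a Python sketch; the post doesn't give the frame rate, so 30 fps is an assumption on my part):

    ```python
    frames = 7 * 30            # 7 seconds at an assumed 30 fps = 210 frames
    seconds = 6 * 60 * 60      # the 6-hour render quoted above
    print(seconds / frames)    # ~103 seconds per frame
    ```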

     

    I did see that Rosa video a couple of years ago. It's impressive for sure, but it's far from being commonplace, and while that's good for him, it's bad for DS in the sense that it's not being seen as the animation tool it should be, since it already has access to a gigantic library of ready-to-use (i.e. rigged) figure assets. I mean, it just strikes me as absolutely bizarre that you'd spend so much time crafting 3D figures with a proprietary rig and then focus so hard on making the software that reads that proprietary rigging better at spitting out 2D images than full animations.

    It. Doesn't. Make. Sense.

     

    Post edited by DrNewcenstein on
  • kyoto kid Posts: 41,847

    ...unfortunately GPU memory does not pool for rendering purposes, only compute purposes, and you need NVLink to even do that.

  • FBR Posts: 15

    Disney's Big Hero 6 was rendered on a 55,000-core supercomputer spread across four geographic locations.

    Without such insane tech, home animation will always be a quality compromise, and even then mere seconds of animation can take a day to render. I've been there, and worse: after viewing the result I realised it wasn't right and had to start again.

    At 4K on a home budget? I think not.

    Maybe 720p, and even then it will be sloooooooooow.

    Your best bet is to find an affordable render farm.

  • kyoto kid said:

    ...unfortunately GPU memory does not pool for rendering purposes, only compute purposes, and you need NVLink to even do that.

    Then Nvidia certainly has their work cut out for them.

  • kyoto kid Posts: 41,847

    ...indeed they do. 

  • ebergerly Posts: 3,255
    edited January 2018

     

     

    DustRider said:

     

    Basically, if an operation can be parallelized at a massive scale, where only the result is important (no need to evaluate the result), then GPUs work well. If you have a single stream of instructions (pardon my poor techno jargon here) where the result of one instruction must be evaluated to determine what happens next, then GPU processing is a poor choice. In essence you have all these processors on the GPU waiting to get the next instruction, instead of being busy chewing through calculations. The more biased the rendering process, the more difficult it is to schedule it properly to keep the GPU busy.

    Yeah, I think you're talking about "parallel processing". If you're familiar with a "for" loop in any programming language, in C# and others there's a "Parallel.For" loop. Basically that takes a for loop of many instructions (like serially stepping through each pixel in an image and modifying the RGB values one after the other) and converts it into a parallel process. That means it assigns each pixel's calculations to a separate thread, and all the calculations are done at the same time rather than one after the other. So on my 16-thread Ryzen, 16 pixels would be calculated simultaneously instead of one after the other.

    With my GPU, I'm using a library that has a "gpu.for" loop, which does something similar, but for the thousands of GPU threads. Presumably it assigns the calculation for each pixel to each of the thousands of threads. So instead of only 16 pixels at a time, the GPU is calculating 3,800 pixels at a time.

    Again, that's a gross over-simplification, and I'm not sure the GPU does it exactly that way, but you get the point. The big question is whether the exact same CPU calculation can be done by the GPU, or whether each CPU instruction needs to be broken up into multiple, simpler calculations for the GPU. Maybe the separate R, G, and B values can be calculated in parallel by the CPU but need to be broken up into three serial calculations by the GPU, or something like that.
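
    For anyone curious what that pattern looks like in code, here's a rough Python analogue of the Parallel.For idea (just a sketch using the standard multiprocessing module, not the C# or GPU library ebergerly is describing): each row of a fake image is an independent chunk of work that gets farmed out to a pool of worker processes.

    ```python
    from multiprocessing import Pool

    def brighten_row(row):
        # One independent unit of work: brighten every pixel in this row.
        return [min(value * 1.2, 255) for value in row]

    if __name__ == "__main__":
        # A fake 1080p single-channel image, every pixel at value 100.
        image = [[100] * 1920 for _ in range(1080)]

        with Pool() as pool:                        # one worker per CPU core
            result = pool.map(brighten_row, image)  # rows processed in parallel

        print(len(result), result[0][0])            # 1080 120.0
    ```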

    Post edited by ebergerly on
  • ebergerly Posts: 3,255
    edited January 2018

    By the way, if anyone has rendered in Blender's Cycles (or other renderers), you'll see something like a "tile size" setting for the renderer. With a GPU you want to set that to a big number, but for a CPU you want it small. I think the reason is pretty much what I was discussing about threads: with a GPU there are a lot more simultaneous threads, therefore you want to give it bigger tiles (more pixels at once) to munch on, and as each tile gets calculated, it updates the rendered image preview.

    So as you see the blocky rendered image preview updating in real time, what you're watching is the threads finishing their calculations for each tile. Or something like that.
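
    If you'd rather set it from a script than the UI, this is roughly how the tile size is exposed in Blender 2.7x's Python API (the version current when this thread was written; it only runs inside Blender, and later versions changed how tiles work):

    ```python
    import bpy

    scene = bpy.context.scene
    scene.render.tile_x = 256   # bigger tiles generally suit GPU rendering
    scene.render.tile_y = 256   # small tiles (e.g. 16-32) generally suit CPU
    ```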

    Post edited by ebergerly on
  • Padone Posts: 4,002
    edited January 2018

    If we're talking about Cycles, in my experience a tile size of 64 is fine for both GPUs and CPUs. There's some difference if you vary it from 64, but not too much for the average scene. What makes the big difference compared to Iray is the denoiser filter. Cycles really does a good job with it, and you can render in a fraction of the time. That's good for animation.

    Personally I'd avoid using DAZ Studio and Iray for animation, for more than one reason. But if I had to, then I'd go with the denoiser in post, as suggested by @th3Digit. Also, optimizing the scene can really make a big difference in rendering time.
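
    For reference, the Cycles denoiser mentioned above can also be switched on from a script; in Blender 2.79's Python API it lives on the render layer (property names here are from that release and may differ in later versions):

    ```python
    import bpy

    layer = bpy.context.scene.render.layers.active   # the active render layer
    layer.cycles.use_denoising = True                 # enable the Cycles denoiser
    ```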

    Post edited by Padone on
  • ebergerly Posts: 3,255
    Padone said:

    If we're talking about Cycles, in my experience a tile size of 64 is fine for both GPUs and CPUs.

    You sure about that? I think if you do some tests you'll find a huge difference between GPUs and CPUs. And there are even online videos showing other people's tests.

     

  • ebergerly Posts: 3,255
    edited January 2018

    BTW, if you fast-forward to 11:15 in Andrew's video below, you can see him discussing the effects of tile size on CPU vs. GPU. It's huge.

    https://www.youtube.com/watch?v=8gSyEpt4-60

    He shows that for the BMW benchmark, a tile size of 16 is best for CPU (42:30) and 512 for GPU (13:56). If you use 64 for GPU, it will take almost 41 minutes. Big difference.

    Post edited by ebergerly on
  • Padone Posts: 4,002
    edited January 2018

    Well, it depends on what we consider a "huge difference". In the best case you get a 25% gain on GPU by using 256 over 64. But since the tile size matters for memory and speed when using the denoiser, 64 does just fine for me. That is, with the denoiser you render 10x faster. So that's the big deal.

    http://adaptivesamples.com/2013/11/05/auto-tile-size-addon-updated-again/

    https://docs.blender.org/manual/en/dev/render/cycles/settings/scene/render_layers/denoising.html

    From the link above. "When using GPU rendering, the denoising process may use a significant amount of vRAM. If the GPU runs out of memory but renders fine without denoising, try reducing the tile size."

    Post edited by Padone on
  • ebergerly Posts: 3,255
    Padone said:

    Well it depends on what we consider a "big gain". In the best case you get a 25% gain on GPU by using 256 over 64.

    If you use your 64 tile size on the BMW reference scene, it renders in about 41 minutes. If you crank it up to 512, it renders in about 14 minutes. That's a huge improvement by anybody's measurement, isn't it? It renders in a third of the time.
