GPU crashes after render?

Hey there, first time poster but long time lurker in the forums. I've been using Daz Studio for several years now and haven't faced any major problems that I wasn't able to fix on my own with the help of the Internet, but in the past few days I've been encountering a really annoying issue that I can't seem to fix or even identify the cause. Basically as the title states, my GPU "crashes" (not sure if that's the correct technical term) after I do an Iray render in Daz Studio, I think the same issue sometimes occurs when I do an Iray preview in the viewport as well. I use GPU-Z to track my GPU status and very strangely, the GPU voltage drops to zero and the other GPU sensors show a "--" value (see the attached screenshot). When this happens Daz Studio freezes if I haven't closed it yet, and when I try to re-open Daz Studio, I get a message saying something about an OpenGL error. If I open Nvidia GeForce Experience, it says something like the driver can't be detected. The problem sometimes resolves itself if I restart the computer, or if I disable and re-enable the graphics driver in Device Manager, but this gets quite time-consuming and sometimes I encounter the blue screen of death when I try to re-enable the graphics driver. Here are the specs for my computer (I'm using a laptop):

Laptop model: Dell XPS 15 9570
Operating system: Windows 10 Home, 64 bit
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 16.0 GB RAM
GPU: Nvidia GeForce GTX 1050 Ti with Max-Q Design Driver Version 462.59
Daz Studio Version 4.15.0.2 Pro edition (64 bit)

I've tried to do some troubleshooting myself, but no luck:

  1. I checked the log file, but I can't find anything out of the ordinary (no warning/error messages at the point of the crash).
  2. I suspected something might have gone wrong with graphics driver software as I had recently been fiddling with the settings in the Nvidia Control Panel, so I uninstalled and did a clean installation of the most updated Nvidia Studio driver, but the error still happened. I also tried rolling back the driver to the previous version, again no change.
  3. I used OCCT to do a GPU stress test and it couldn't detect any errors when I ran the test for around half an hour.
  4. I uninstalled and re-installed Daz Studio via DIM, nothing changed.
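
For anyone repeating step 1, a minimal sketch of scanning the log for problem lines instead of reading it by eye. It assumes warnings and errors carry a bracketed tag in the same style as the "[INFO]" tag seen in Daz Studio log lines; the tag names and the function names are assumptions for illustration, not something confirmed by Daz.

```python
import re

# Assumption: severity tags follow the "[INFO]" style, e.g. "[WARNING]" or "[ERROR]".
LEVEL_RE = re.compile(r"\[(?:WARNING|ERROR|FATAL)\]", re.IGNORECASE)


def find_problem_lines(lines):
    """Return (line_number, text) for every line that carries a warning/error tag."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if LEVEL_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Read a log file and return its tagged warning/error lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_problem_lines(handle)
```

Calling `scan_log(...)` on a copy of the log returns an empty list when nothing is tagged, which matches the "nothing out of the ordinary" observation above.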

I'm not much of a tech person so I really can't figure out what's going on, and from what I can tell this problem hasn't been reported before. I'm not sure if it's because I overworked my GPU with overnight renders or if I accidentally changed something I wasn't supposed to...

If anyone has any ideas I would be really appreciative of the help! And thank you for reading this wall of text.

Comments

  • Catherine3678ab Posts: 8,010
    edited July 2021

    Download DIM, download/install the beta. Update the NVidia drivers. Use the beta. Hope that works ... we're waiting for the beta to be polished for release hopefully before I lose all my hair.

    Unless you're already using DIM, after installing the beta, it's okay to uninstall DIM.

  • jbowler Posts: 742

    It looks like you are running out of GPU memory in the render; the 1050 has 4GByte (I believe), the GPU-Z suggests to me that you hit that.  In my experience earlier versions of either Daz (I'm using 4.15.0.14) and/or NVidia (but I'm using 462.59 like you - I haven't upgraded to the June Studio driver yet) would latch up if an Iray operation (render or preview) hit the available memory and not recover until a reboot, i.e. until the driver was restarted.  I have to assume this is a bug in NVidia and my current system (4.15.0.14+462.59) seems much much more robust, but maybe it was a bug that Daz has managed to work round.  For certain closing down DAZ should not leave the NVidia driver broken and if it does, it is for certain an NVidia bug, so it seems that it is not fixed yet.

    You could try the beta, but other than that the only possibility I am aware of is to watch the GPU-Z memory and trim the scene to fit the available memory.  In general I found that I could see an OOM (Out of Memory) report in the DAZ Studio log file; open the log file from DAZ, note the full path name (do a save as from TextEdit to get the path) and then, after a failure, open that path.  Restarting DAZ now destroys the old log file (it used to just append) so to discover the error messages in cases like this you have to know where the log file is (it's in %AppData%/DAZ 3D/Studio4 but I find it easier just to copy it from the TextEdit save as... dialog then pin that path).
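
    Because restarting DAZ overwrites the old log, one option is to copy it somewhere safe right after a failure. A minimal sketch, assuming the folder mentioned above and a file name of "log.txt" (the file name and both function names are assumptions, adjust them to whatever the save-as dialog shows):

```python
import os
import shutil
import time
from pathlib import Path


def default_log_path():
    # Folder taken from the post above; the file name "log.txt" is an assumption.
    return Path(os.environ.get("APPDATA", "")) / "DAZ 3D" / "Studio4" / "log.txt"


def backup_log(log_path, backup_dir):
    """Copy the log into backup_dir under a timestamped name and return the new path."""
    log_path = Path(log_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    shutil.copy2(log_path, target)  # copy2 also keeps the file's modification time
    return target
```

    Running `backup_log(default_log_path(), "some backup folder")` before re-opening DAZ Studio keeps the messages from the failed session available for later.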

  • PerttiA Posts: 9,420

    If your GPU crashes, you must restart the computer before the GPU drivers are back to fully functional state.

  • Thank you for the suggestions! I tried the beta but unfortunately the problem is still present (although now when the GPU crashes, the Daz Studio window automatically closes instead of the whole screen freezing up so I guess that's a slight improvement...?)

    I've also opened the log file just after the crash (without re-opening Daz Studio), but again there are no warning/error messages at the point of the crash.

    It looks like you are running out of GPU memory in the render; the 1050 has 4GByte (I believe), the GPU-Z suggests to me that you hit that.

    I'm a bit doubtful the GPU memory running out is the issue here; I have CPU fallback enabled, so Daz Studio will fall back to rendering with CPU if the scene is too big to fit into the GPU (GPU-Z will show that the GPU load is 0%). Besides, I've tried rendering an older scene which I could previously render without any issues, and the GPU still crashes after the render.

    What I find really puzzling is that this issue only just cropped up in the past week or so, and I can't figure out what has changed to cause this issue to happen. I've tried uninstalling the most recent Windows update (as a friend suggested to me), but that didn't resolve the problem. I'm starting to suspect that either (1) there's a bug inherent in Daz Studio and/or the Nvidia driver that I somehow managed to trigger, or (2) my Nvidia graphics card hardware has become faulty as a result of overworking it. I use my laptop quite a lot daily so it'll be quite inconvenient for me to swap out the graphics card, I hope to exhaust other potential solutions before getting to that.

  • IceCrMn Posts: 2,107
    edited July 2021

    You posted that you were using "...GPU: Nvidia GeForce GTX 1050 Ti with Max-Q Design Driver Version 462.59..."

    Are you certain there isn't an issue with that custom driver?

    Does the official Nvidia driver have the same problem?

  • jbowler Posts: 742

    IceCrMn said:

    You posted that you were using "...GPU: Nvidia GeForce GTX 1050 Ti with Max-Q Design Driver Version 462.59..."

    I think the hardware is a gaming laptop:

    https://www.nvidia.com/en-us/geforce/gaming-laptops/

    So far as I know there is just one "Studio" driver, though the fact that the hardware has the "Max-Q technology" might be relevant, maybe the Game Ready driver would be better?  The problem is that there are a combinatorial number of driver/DAZ version/Windows versions upgrades/downgrades to try, and it might just be flakey hardware...  So it's better to find some evidence of what the problem is from the DAZ logs, or the Windows logs or the GPU-Z log.  In particular GPU-Z can display the total memory used; click on the "Memory Used" triangle drop down and select "Show Highest Reading".  If that is nowhere near 4096 MB then memory isn't a  problem, unless it's a hardware failure in the GPU memory...  The same can be done for other things like the temperatures; anything that looked high (or maybe low) just before the readings started to disappear.
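
    As an alternative to watching GPU-Z by hand, the same "highest reading" idea can be sketched with the `nvidia-smi` tool that ships with the NVidia driver. This is only a sketch: the function names are made up for illustration, it assumes `nvidia-smi` is on the PATH and reports numeric values for this laptop GPU (some fields can come back as "[N/A]", which this code does not handle), and it reads only the first GPU listed.

```python
import subprocess
import time

QUERY = "memory.used,memory.total,temperature.gpu"


def parse_sample(line):
    """Turn one 'used, total, temp' CSV line into a dict of ints."""
    used, total, temp = (int(field.strip()) for field in line.split(","))
    return {"used_mb": used, "total_mb": total, "temp_c": temp}


def read_sample():
    """Ask nvidia-smi for one reading from the first GPU it lists."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_samples(path, interval_s=2.0, count=None):
    """Append readings to path, flushing each line so it survives a crash."""
    peak_used = peak_temp = 0
    taken = 0
    with open(path, "a", encoding="utf-8") as handle:
        while count is None or taken < count:
            sample = read_sample()
            peak_used = max(peak_used, sample["used_mb"])
            peak_temp = max(peak_temp, sample["temp_c"])
            stamp = time.strftime("%H:%M:%S")
            handle.write(f"{stamp} {sample} peak_used={peak_used} peak_temp={peak_temp}\n")
            handle.flush()
            taken += 1
            time.sleep(interval_s)
```

    If the driver stops responding, `read_sample` should raise an error and the loop ends, so the last line in the file is the last reading taken before the readings disappeared.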

    Since the failure is reproducible and the nature of the failure changes with the DAZ Studio beta that suggests it's a problem within DAZ Studio and its NVidia container.  What is more the whole GPU never seemed to have crashed - the GPU-Z screen shot was obtained after the "crash" and I assume the laptop screen is driven by the 1050 so at least that part of the GPU is still working.

    My understanding is that in theory a crash in an app using the GPU should be isolated to that process because the NVidia driver allocates a "container" for each process.  However containers share resources, not just memory (although that is the one resource they all share) but also stuff like the CUDA cores.  So when a DAZ Studio process freezes as a result of an internal crash it can leave memory, CUDA cores and no doubt other resources locked up.  What is more other apps use the GPU for 3D rendering - Google Chrome, the Windows "Start" menu, the Windows Screen Locker.  Chrome is particularly GPU hungry and real graphics apps like Blender and PhotoShop etc even more so.  Chrome updates automagically (as do other browsers using the same underlying tech) and certainly did so "in the past few days."  If there is a resource related problem changes like that could provoke it.

    I'd recommend using the clean system approach - close everything down on the laptop and restart it (not shutdown/start, which just does a sleep by default; "restart".)  Then, without running anything else (including GPU-Z), see if the problem is reproducible.  Assuming it is either post the log file (from the point where the render starts) after examining it for sensitive information or just the end from the line:

    2021-07-02 08:48:23.758 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Updating geometry.

    If the problem doesn't repro then that's a way to continue using DAZ Studio, although hardly ideal, and it allows careful re-introduction of other apps (particularly the web browser) to see if it's an interaction there.
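
    Cutting the log down to the part worth posting can be sketched as below. It keeps everything from the last line containing "Updating geometry." to the end; using the last occurrence (rather than the first) is an assumption about which render is of interest, and the function name is made up for illustration.

```python
MARKER = "Updating geometry."


def tail_from_render_start(lines, marker=MARKER):
    """Return the lines from the last occurrence of marker to the end."""
    lines = list(lines)
    start = None
    for index, line in enumerate(lines):
        if marker in line:
            start = index  # keep overwriting so the last match wins
    return lines[start:] if start is not None else []
```

    An empty result means no render start was logged at all, which would itself be worth mentioning when posting.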

    It's also true that the weather in the US has been particularly hot for the last few days; if there's a hardware problem that would certainly be a factor.  Try testing at night, or cranking the AC up.

  • IceCrMn said:

    You posted that you were using "...GPU: Nvidia GeForce GTX 1050 Ti with Max-Q Design Driver Version 462.59..."

    Are you certain there isn't an issue with that custom driver?

    It isn't a custom driver, I downloaded it straight from the Nvidia website at https://www.nvidia.com/Download/index.aspx?lang=en-us.

    jbowler said:

    In particular GPU-Z can display the total memory used; click on the "Memory Used" triangle drop down and select "Show Highest Reading".  If that is nowhere near 4096 MB then memory isn't a  problem

    Just to confirm that memory isn't the problem here, I rendered a blank scene with only the basic G8M model in it and using the default HDRI lighting. The GPU still crashed after the render was finished (memory used was around 2000 MB if I remember correctly, way below 4096 MB in any case).

    jbowler said:

    I'd recommend using the clean system approach - close everything down on the laptop and restart it (not shutdown/start, which just does a sleep by default; "restart".)  Then, without running anything else (including GPU-Z), see if the problem is reproducible. 

    I think you might be on to something here, I tried this and the GPU didn't seem to crash after a render! (Or at least, it took longer than usual to crash.) Granted, I only tried this once, but I'll do more testing over the next weekend when I have more free time to see if it's just a one-off fluke or if there is indeed an app that's interfering with the GPU rendering in Daz Studio.

  • jbowler Posts: 742
    edited July 2021

    tetrahedrane said:

    I think you might be on to something here, I tried this and the GPU didn't seem to crash after a render! (Or at least, it took longer than usual to crash.) Granted, I only tried this once, but I'll do more testing over the next weekend when I have more free time to see if it's just a one-off fluke or if there is indeed an app that's interfering with the GPU rendering in Daz Studio.

    You've consistently said "after a render"; does that mean that the problem appears after the Iray render window reports that the convergence threshold has been reached?  (The log file will contain, at the end:

    2021-07-03 14:33:28.600 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 05914 iterations after 17065.672s.
    2021-07-03 14:33:28.600 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Convergence threshold reached.
    2021-07-03 14:36:24.096 Iray [INFO] - IMAGE:IO ::   1.0   IMAGE  io   info : Saving image "G:\Production\Daz3D\beta-temp\render\r_canvases\r-Full-Beauty.exr", pixel type "Rgb_fp", 4320x7680x1 pixels, 1 miplevel.
    2021-07-03 14:36:48.979 Saved image: G:\Production\Daz3D\beta-temp\render\r.png
    2021-07-03 14:36:49.012 Finished Rendering
    2021-07-03 14:36:49.101 Total Rendering Time: 4 hours 48 minutes 8.23 seconds

    Without the "IMAGE:IO" line that saves the canvas .exr if you aren't rendering to a canvas, and with different paths for the saves but the same "r" file name.)

    If the problem is happening after rendering is finished or when it is closing down that provides quite a lot of information about where the problem is, particularly if the "r" files don't get written, or are incomplete.  There are also several easy debug possibilities (such as cancelling the render early.)
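
    A quick way to check whether a render got all the way through its shutdown is to look for the end-of-render lines shown above. A minimal sketch, where the list of marker strings is taken from that log excerpt and the function name is an assumption; note that a cancelled render would not be expected to log "Convergence threshold reached.", so that marker being absent is not an error by itself.

```python
EXPECTED = (
    "Convergence threshold reached.",
    "Saved image:",
    "Finished Rendering",
    "Total Rendering Time:",
)


def missing_end_markers(lines, expected=EXPECTED):
    """Return the expected end-of-render markers that never appear in lines."""
    text = "\n".join(lines)
    return [marker for marker in expected if marker not in text]
```

    An empty list means every marker was found; anything left in the list shows where the log stopped.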

  • tetrahedrane Posts: 9
    edited July 2021

    Yes I do mean after the render, the crash happens either after I cancel the render prematurely (e.g. when I just want to do a quick preview of the final render) or after the convergence threshold is reached (like with the G8M test render described above). It doesn't immediately crash though, usually there is a delay of a few minutes before the GPU crashes. Here's the end of the log file for the G8M test render, there are no lines after this as the GPU crashed around a minute after this and I haven't opened Daz Studio since then.

    2021-07-04 18:37:06.417 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 00155 iterations after 11.866s.
    2021-07-04 18:37:06.459 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Convergence threshold reached.
    2021-07-04 18:37:07.378 Saved image: C:\Users\XYZ\AppData\Roaming\DAZ 3D\Studio4 Public Build\temp\render\r.png
    2021-07-04 18:37:07.383 Finished Rendering
    2021-07-04 18:37:07.414 Total Rendering Time: 14.1 seconds
    2021-07-04 18:37:07.420 Loaded image r.png
    2021-07-04 18:37:07.431 Saved image: C:\Users\XYZ\AppData\Roaming\DAZ 3D\Studio4 Public Build\temp\RenderAlbumTmp\Render 1.jpg
    2021-07-04 18:37:24.247 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : Device statistics:
    2021-07-04 18:37:24.247 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend info : CUDA device 0 (NVIDIA GeForce GTX 1050 Ti with Max-Q Design): 129 iterations, 2.912s init, 9.935s render

    If I'm doing an Iray preview, the crash happens a few minutes after I switch back to Shaded Texture view, though sometimes I've noticed that the crash happens in the middle of the preview if I leave it running for long enough. I don't think I've tried a full overnight render since the problem popped up.
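
    To put a number on the delay, the timestamps at the start of each log line can be compared. A rough sketch, assuming every timestamped line starts with the "YYYY-MM-DD HH:MM:SS.mmm" format seen in the log above (the function names are made up for illustration). It only measures how long the log kept going after "Finished Rendering", not the moment of the crash itself, since the crash leaves no line of its own.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
STAMP_LENGTH = len("2021-07-04 18:37:07.383")


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def seconds_after_finish(lines, marker="Finished Rendering"):
    """Seconds between the last marker line and the last timestamped line."""
    finished = last = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # skip blank or untimestamped lines
        last = stamp
        if marker in line:
            finished = stamp
    if finished is None or last is None:
        return None
    return (last - finished).total_seconds()
```

    For the G8M log above this comes out at roughly 17 seconds, i.e. the log goes quiet well before the point where the GPU readings disappear.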

  • jbowler Posts: 742

    tetrahedrane said:

    If I'm doing an Iray preview, the crash happens a few minutes after I switch back to Shaded Texture view, though sometimes I've noticed that the crash happens in the middle of the preview if I leave it running for long enough. I don't think I've tried a full overnight render since the problem popped up.

    How do you know it has crashed?  I.e. what are the symptoms?

  • If I have GPU-Z open, the GPU voltage reading drops to 0 V and the other GPU sensors show a "--" reading, as I mentioned in the first post of this thread.

    Also, the Daz Studio window (and sometimes the entire computer screen) would either freeze up (for the non-beta Daz Studio), or close itself (for the beta).

  • IceCrMn Posts: 2,107

    I got bored and did a duckduckgo search for your laptop.

    https://www.dell.com/community/XPS/Problems-with-Integrated-Graphics-XPS-15-9570/td-p/7564837

    First post on that thread ;

    "... What I noticed while looking at task manager in-game is that the integrated graphics card will sometimes randomly step in driving my 1050 ti to go to 0% while the integrated GPU gets maxed out at 100%..."

    Seems there are a high number of reports of GPU issues with that model.

    Some owners have found solutions, some haven't.

    Dell has released 2 BIOS updates in an effort to fix some of the GPU, CPU, thermal, and other issues to varying success.

    Google search for "dell xps 15 9570 GPU issues"

  • jbowler Posts: 742

    tetrahedrane said:

    If I have GPU-Z open, the GPU voltage reading drops to 0 V and the other GPU sensors show a "--" reading, as I mentioned in the first post of this thread.

    Also, the Daz Studio window (and sometimes the entire computer screen) would either freeze up (for the non-beta Daz Studio), or close itself (for the beta).

    Ok, so GPU-Z (if it is running) *always* shows the unexpected sensor values and DAZ Studio freezes (the drop down menus at the top of the window do not highlight when the mouse cursor crosses them) but sometimes the whole screen will freeze - is the mouse cursor still visible then?  I think you also said that sometimes the whole system will blue screen.  I can't see your original GPU-Z screenshot any longer, but can you check the maximum ("Show Highest Reading") of the "GPU Temperature" sensor value (third from the top in GPU-Z 2.40.0) and post one screen shot of the *bottom* of the "Advanced" tab (i.e. down to the "Adjustment Range" on the "Temperature Limit" pane) and another of the "maximum" sensors values?  (For the latter go into the GPU-Z "Settings" window [top right button], click on the "sensors" tab and select "Highest" for "sensor display mode").

  • tetrahedrane Posts: 9
    edited July 2021

    jbowler said:

    Ok, so GPU-Z (if it is running) *always* shows the unexpected sensor values and DAZ Studio freezes (the drop down menus at the top of the window do not highlight when the mouse cursor crosses them) but sometimes the whole screen will freeze - is the mouse cursor still visible then? 

    If only Daz Studio freezes, I can still move the mouse cursor and force the window to close. But if the entire screen freezes then I can't move the cursor at all and I have to hold down the power button to force the laptop to shut down and restart.

    can you check the maximum ("Show Highest Reading") of the "GPU Temperature" sensor value (third from the top in GPU-Z 2.40.0) and post one screen shot of the *bottom* of the "Advanced" tab (i.e. down to the "Adjustment Range" on the "Temperature Limit" pane) and another of the "maximum" sensors values?

    I've attached a screenshot of the Temperature Limit pane in GPU-Z. I'm a bit busy these few days to run Daz Studio, but I don't recall the highest reading of GPU Temperature hitting above 90 degrees for the G8M test render that crashed. I could check again over the weekend to be sure.

    IceCrMn said:

    I got bored and did a duckduckgo search for your laptop.

    https://www.dell.com/community/XPS/Problems-with-Integrated-Graphics-XPS-15-9570/td-p/7564837

    First post on that thread ;

    "... What I noticed while looking at task manager in-game is that the integrated graphics card will sometimes randomly step in driving my 1050 ti to go to 0% while the integrated GPU gets maxed out at 100%..."

    Seems there are a high number of reports of GPU issues with that model.

    Hmm I've looked at the thread you've linked and it does sound kind of similar to the problem I'm facing. If it's indeed the same problem then I suppose there isn't really much I can do about it. I won't be too surprised if that actually is the problem though, I guess very graphics intensive programs like Daz Studio were never really meant to run on laptops :P

     

    GPU-Z temperature limit.png
    335 x 161 - 6K
  • PerttiA Posts: 9,420

    tetrahedrane said:

    Hmm I've looked at the thread you've linked and it does sound kind of similar to the problem I'm facing. If it's indeed the same problem then I suppose there isn't really much I can do about it. I won't be too surprised if that actually is the problem though, I guess very graphics intensive programs like Daz Studio were never really meant to run on laptops :P

    No, and the reason is non-standard components/configurations and the non-Nvidia integrated GPU, also the temperatures can be a problem if the computer is not cleaned often enough and it's a hot summer... 

  • jbowler Posts: 742

    There are still many possibilities and, indeed, @IceCrMn's link suggests yet another quite possible explanation.

    tetrahedrane said:

    If only Daz Studio freezes, I can still move the mouse cursor and force the window to close.

    The "close" button at the top right is part of Windows, not DAZ Studio.  If you click on it and the Microsoft "end task" window appears then the DAZ Studio UI thread is apparently frozen, otherwise (regardless of whether anything happens) the UI thread is not blocked; this is why I asked about the DAZ Studio menus; if you mouse over those and they don't highlight then the UI thread is blocked (assuming it isn't in a modal dialog, which it apparently isn't in this case.)  So what does "force the window to close" mean, how did you force it?  If the UI thread is blocked there are ways of getting more information (indeed with a debugger it's very easy, but Windows Task Manager can provide help too.)

    But if the entire screen freezes then I can't move the cursor at all and I have to hold down the power button to force the laptop to shut down and restart.

    Ok, but did you try ctrl-alt-del?  I.e. were you able to get the monitor screen (blue screen with Lock/Switch User/Sign Out/Task Manager/Cancel) to appear.  If not that strongly suggests a hardware issue.

    I've attached a screenshot of the Temperature Limit pane in GPU-Z. I'm a bit busy these few days to run Daz Studio, but I don't recall the highest reading of GPU Temperature hitting above 90 degrees for the G8M test render that crashed. I could check again over the weekend to be sure.

    I find the "default" value weird - on my Titan XP (which is, IIRC, pretty much just the prosumer version of your card) the default and current are both 84C but then I haven't altered them since I bought the card.  Your "current" figure is consistent with this post:

    https://www.dell.com/community/XPS/XPS-9570-GPU-temp-issue-how-to-deal-with-it/td-p/7774015

    What I was looking for is a suggestion that you might have, or have had, miner malware on your computer.  This stuff uses the GPU to do arithmetic and aims to do so without you knowing it's there.  I saw reports a few months back of some piece of malware that disabled, or maxed, the temperature limit causing potential damage to the GPU.  I would not be surprised if such malware attempted to hide the sensors from other apps, such as GPU-Z, in the hope that the user of the computer wouldn't notice.

    The other thing is that you said you used the NVidia control panel:

    I suspected something might have gone wrong with graphics driver software as I had recently been fiddling with the settings in the Nvidia Control Panel, so I uninstalled and did a clean installation of the most updated Nvidia Studio driver, but the error still happened.

    I don't think a re-install will alter the temperature settings; I believe they are stored in the card hardware (they can be set from the BIOS before the system boots.)  In any case if you did set the default to 94C at some point and Dell didn't promptly respond by resetting the current to 75C that might damage the card, likewise if some malware temporarily overrode the Dell settings.

    Current malware is the more likely of those two explanations because it might well kick in after a few minutes of GPU idle; i.e. waiting for you to stop using the computer.

     

    IceCrMn said:

    First post on that thread ;

    "... What I noticed while looking at task manager in-game is that the integrated graphics card will sometimes randomly step in driving my 1050 ti to go to 0% while the integrated GPU gets maxed out at 100%..."

    Seems there are a high number of reports of GPU issues with that model.

    Hmm I've looked at the thread you've linked and it does sound kind of similar to the problem I'm facing. If it's indeed the same problem then I suppose there isn't really much I can do about it. I won't be too surprised if that actually is the problem though, I guess very graphics intensive programs like Daz Studio were never really meant to run on laptops :P

    I'd assumed that the laptop just used the NVidia GPU.  Apparently it can swap between the two, presumably with a screen redraw because while I can see that the display port hardware could be shared I can't see how the frame buffer can.  It's possible such a swap could confuse DAZ Studio - it does use a very old version of Qt to handle the screen - and the reports suggest that the laptop might disable the GPU possibly if it starts to overheat.  I guess some change to the temperature profiles on the GPU might end up causing the swap to start happening, but it seems most likely that if it *is* happening it is a response to GPU idle.  Maybe the laptop detects the GPU has (apparently) been idle for a couple of minutes and switches it off...

    But why did it suddenly start happening?   Maybe a Microsoft Windows update, more likely a Dell firmware update.

    DAZ Studio does retain resources in the GPU after the first render/preview - the GPU-Z memory goes up then comes down but it doesn't return to the original value.  If a render window is kept open the GPU memory won't go down (at least with 4.15.0.*); this is how DAZ Studio handles the "resume".  The memory is only released when a new render starts (at this point "resume" of the previous render stops being possible.)  The Iray preview does also go idle eventually; as with the render the preview rendering does have some kind of end condition.

    Anyway, this hypothesis is easy to test - run two instances of DAZ Studio at once, put one into Iray preview (or do a dForce simulation) and see if a second instance now survives a couple of minutes after ending the Iray preview.  A negative result (i.e. no "crash") doesn't completely prove that it is GPU idle that causes the problem but it would be a strong indicator.

  • tetrahedrane Posts: 9
    edited July 2021

    jbowler said:

    The "close" button at the top right is part of Windows, not DAZ Studio.  If you click on it and the Microsoft "end task" window appears then the DAZ Studio UI thread is apparently frozen, otherwise (regardless of whether anything happens) the UI thread is not blocked; this is why I asked about the DAZ Studio menus; if you mouse over those and they don't highlight then the UI thread is blocked (assuming it isn't in a modal dialog, which it apparently isn't in this case.)  So what does "force the window to close" mean, how did you force it?  If the UI thread is blocked there are ways of getting more information (indeed with a debugger it's very easy, but Windows Task Manager can provide help too.)

    Sorry, I misunderstood your initial question. If I remember correctly, the Daz Studio menus were not responsive. I forced the window to close by clicking on the close button at the top right (which brings up a pop-up window that says that Daz Studio is unresponsive) or by using Task Manager to force it to close.

    Ok, but did you try ctrl-alt-del?  I.e. were you able to get the monitor screen (blue screen with Lock/Switch User/Sign Out/Task Manager/Cancel) to appear.  If not that strongly suggests a hardware issue.

    Yes I tried ctrl-alt-del, but there was no response even after waiting for a few minutes, which is why I had to resort to holding down the power button to force shutdown the laptop. (To clarify, I only had to do this if the entire screen freezes. If only the Daz Studio window freezes then I can still use ctrl-alt-del to bring up Task Manager and force the window to close.)

    I find the "default" value weird - on my Titan XP (which is, IIRC, pretty much just the prosumer version of your card) the default and current are both 84C but then I haven't altered them since I bought the card.

    I didn't actually notice that before you pointed it out! But I don't know if it was like that before or if it changed more recently. If there's a malware as you suggested, how do you think I can get rid of it?

  • jbowler Posts: 742

    tetrahedrane said:

    If there's a malware as you suggested, how do you think I can get rid of it?

    Make sure Windows is up-to-date (there was a big security fix in the June update) and that Windows Defender is up-to-date; check Settings/Update and Security/Windows Security, eyeball all the icons on that page for the warning triangle (I just did it and noticed there's a new thing called "Reputation based protection" which could have some relevance to crypto-mining malware, crypto-jacking).  Go into the "Virus Protection" page and double check that it is running (it should have run very recently - the "last scan" should be in the last few days).  If anything does show up there should be enough information to find out how to remove it.

    When you next encounter the problem open the Windows Task Manager, go to Processes, make sure "GPU" and "GPU engine" are checked in the right-click drop-down for the column names at the top and sort by "GPU".  This should show every process that is connected to a GPU and which GPU is in use.  I think "GPU 0" is the Intel 630, "GPU 1" is the NVidia one (that's what I see on my system).  Anything that shows up using the NVidia (probably "GPU 1") is interesting; DAZ Studio should be there.  I believe (but don't know) that Chrome and maybe some of the 3D Microsoft widgets may be using it; they don't on my system because the (only) monitor is connected to "GPU 0" but I don't know what happens with the Dell setup.

    Since the system is basically idle you don't expect to see any GPU activity to speak of, there will be peaks if you move a window or interact with the system (a few %) and the Microsoft "Desktop Window Manager" grabs 5% or so every now and then.  Sort by "GPU Engine" to find every process associated with a GPU, look for anything on GPU 1.  It's worth doing this immediately after a reboot to get a feeling of what is there normally (though, of course, different things may be running for a few minutes after a reboot.)  It's quite possible there will be some Dell specific process doing something; normally a search on the web for the process name will reveal what it is.
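
    The Task Manager check can also be approximated from a script with `nvidia-smi`, which is handy for keeping a record. A minimal sketch under several assumptions: `nvidia-smi` is on the PATH, the function names are made up for illustration, and the query only lists CUDA compute processes (not ordinary graphics use such as a browser drawing its window). On Windows the memory column may well come back as "[N/A]", so it is kept as text.

```python
import subprocess


def parse_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into a list of dicts."""
    apps = []
    for row in csv_text.splitlines():
        if not row.strip():
            continue
        parts = [field.strip() for field in row.split(",")]
        # pid is first, memory is last; anything in between is the process name
        apps.append({
            "pid": int(parts[0]),
            "name": ",".join(parts[1:-1]),
            "memory": parts[-1],
        })
    return apps


def list_compute_apps():
    """Return the compute processes nvidia-smi currently reports."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_apps(out)
```

    Comparing the output right after a reboot with the output a few minutes after a render gives the same "what is normally there" baseline described above; any unfamiliar process name is worth a web search.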

    If something has managed to make the NVidia GPU invisible then the above might show something if the action was malicious but if you are looking at a hardware problem or a simple driver bug it's more likely that nothing will show up or, indeed, that the task manager might hang (I've seen that happen).  ctrl-alt-del then select the task-manager sometimes helps with that.  A quick way of finding out whether the GPU is disabled or hijacked is to wait a little (a couple of minutes), reboot and immediately check the current temperature with GPU-Z - if it is well below 75C (Dell's limit) the GPU was idle, if it is at or above 75C that's suspicious.

    If the task manager is running ok you can swap to the "Details" tab, find DAZStudio.exe and right click then select "Analyze wait chain".  Since the DAZ UI thread is hung it should say something other than "process is running normally".  Do that for DAZ Studio (if possible) and also for the "NVContainer.exe" entries (four or five of them) which (I believe) are the processes used by DAZ for Iray rendering (I don't know this for sure).  Normally the NVContainers are "running normally" though from time to time they will block temporarily.  If you kill DAZ Studio they should still be there, you could try killing the "nvcontainer.exe" instances (two on my system) and that might be a recovery; they will just come back.  Indeed you can kill the nvcontainer.exe processes while DAZ Studio is running without harming DAZ Studio (even if it is doing an Iray preview), so maybe I'm wrong about what they are.

    One problem on Windows, at least for my system, is that Windows apparently doesn't know about the memory, etc, being used by the NVidia card - it reports believable numbers for the Intel GPU but the NVidia numbers are always 0.  GPU-Z is able to report either set of figures.

  • tetrahedrane Posts: 9
    edited July 2021

    Thanks for the really comprehensive replies! I'll definitely go try out your troubleshooting suggestions over the weekend. Hopefully I'll get a bit more clarity about what's going on.

  • tetrahedrane Posts: 9
    edited July 2021

    Finally found the time to do some testing with Daz Studio, so I re-read the older replies and tried out some of the suggestions. Apologies if it's a bit disorganised.

    run two instances of DAZ Studio at once, put one into Iray preview (or do a dForce simulation) and see if a second instance now survives a couple of minutes after ending the Iray preview.  A negative result (i.e. no "crash") doesn't completely prove that it is GPU idle that causes the problem but it would be a strong indicator.

    I opened two instances of Daz Studio and ran an Iray preview of an empty scene with just the basic G8M model in one of the instances, while keeping the other one idle. As expected, the GPU crashed a while after ending the Iray preview and the instance that was running the preview automatically closed itself. When I switched over to the other instance by clicking on the window to bring it to the foreground, it hung for a short while before automatically closing. So it seems the GPU crashing forces both instances to close.

    Incidentally, now when the GPU crashes the Daz Studio window always automatically closes itself, for both the beta and non-beta versions. Previously for the non-beta version, I found that either the Daz Studio window would freeze or the entire screen freezes when the GPU crashes. Somehow this problem has gone away, maybe it's because I decided to re-install all the updates (Windows/Dell/Nvidia) that I previously rolled back while trying to troubleshoot the problem.

    If the task manager is running ok you can swap to the "Details" tab, find DAZStudio.exe and right click then select "Analyze wait chain". 

    I did this just after the GPU crashed, but before I clicked on the Daz Studio window (clicking on the window will cause Daz Studio to hang for a few seconds before it closes itself). The first screenshot is what I saw when I looked at the "Analyse wait chain", but the message was already there when I just opened Daz Studio and the GPU was running normally, so I'm guessing that message is not of much help.

    I can't see your original GPU-Z screenshot any longer, but can you check the maximum ("Show Highest Reading") of the "GPU Temperature" sensor value (third from the top in GPU-Z 2.40.0) and post one screen shot of the *bottom* of the "Advanced" tab (i.e. down to the "Adjustment Range" on the "Temperature Limit" pane) and another of the "maximum" sensors values?

    I already posted a screenshot of the Temperature Limits in a previous reply, so I'm just going to post the screenshot of the sensors tab (it's the second screenshot attached). I took the screenshot after running an Iray preview of an empty scene with just the basic G8M model, before the GPU crashed. The GPU temperature tops out at around 77 degrees, but what I found a bit concerning was that the CPU temperature was running at above 90 degrees for quite a while. I did a quick online search and found that my laptop model has had quite a lot of reported thermal issues -- maybe the CPU temperature being too high somehow causes the laptop to disable the Nvidia GPU in order to prevent overheating? In any case, it's probably about time for me to invest in a laptop cooling pad, I haven't used one since my previous cooling pad broke down a couple of years ago.

    Analyse wait chain.png
    550 x 393 - 17K
    GPU-Z max readings.png
    516 x 663 - 33K
  • jbowler Posts: 742

    tetrahedrane said:

    I already posted a screenshot of the Temperature Limits in a previous reply, so I'm just going to post the screenshot of the sensors tab * * * I took the screenshot after running an Iray preview of an empty scene with just the basic G8M model, before the GPU crashed.

    Well, the GPU thermal limiting is clearly working; a defined limit of 75C and a max of 77C.  This compares with my 85C/87C figures.  The GPU "hot spot" is also comparable to the figure on my system - I get a hot spot of something like 93C.  The "PerfCap Reason" also shows that the performance is being limited by temperature - the purple in the graph corresponds to thermal capping IIRC; you can mouse over the bar to find out what the reason was at any displayed point.

    I opened two instances of Daz Studio and ran an Iray preview of an empty scene with just the basic G8M model in one of the instances, while keeping the other one idle. As expected, the GPU crashed a while after ending the Iray preview and the instance that was running the preview automatically closed itself. When I switched over to the other instance by clicking on the window to bring it to the foreground, it hung for a short while before automatically closing. So it seems the GPU crashing forces both instances to close.

    Incidentally, now when the GPU crashes the Daz Studio window always automatically closes itself, for both the beta and non-beta versions. Previously for the non-beta version, I found that either the Daz Studio window would freeze or the entire screen freezes when the GPU crashes. Somehow this problem has gone away, maybe it's because I decided to re-install all the updates (Windows/Dell/Nvidia) that I previously rolled back while trying to troubleshoot the problem.

    My best guess is that it is nothing to do with malware but is, rather, something to do with the firmware/OS switching the whole GPU off to save power.  So it's possibly thermally related because when the system gets hot (as in 99C on the CPU) it might be trying to switch off the GPU as soon as it goes idle to cool the whole box down.  This is, of course, just a guess; the GPU-Z screenshot apparently shows the GPU completely idle for an extended period of time before the preview render, then idle again (GPU Clock down to the bottom of the scale) for a short period of time after the render load goes away (GPU Load goes to minimum).

    maybe the CPU temperature being too high somehow causes the laptop to disable the Nvidia GPU in order to prevent overheating? 

    Maybe, but given that the GPU seemed to be basically "off" before the Iray preview I suspect Dell simply changed the system to always turn the GPU off if nothing is using it.  Of course DAZ Studio is using it after the preview, but in the background; the "memory used" figure after the preview stays high because DAZ Studio keeps resources on the GPU even though they aren't being used because the preview has converged.

    There is a setting in (Windows) Settings/System/Display/Graphics settings (under "Multiple Displays" for some reason...) called "Graphics performance preference" which might help; choose "Desktop app" in the drop-down, use Browse to navigate to DAZStudio.exe in C:\Program Files, add it, then choose "Options" and it should give a "high performance" mode which selects the NVidia GPU as opposed to "Power Saving" to select the Intel one.  I'm grasping at straws here but clearly something changed recently on the Dell setup which is causing DAZ to fail when the GPU is used.  If Dell is turning the GPU off, given that DAZ retains resources in the GPU, that setting might have an effect on the behavior (DAZ might crash more often, or less...)

    The one other thing I can think of is that now you have an up-to-date system and the problem is isolated in DAZ Studio it would be appropriate to file a report with DAZ technical support; DAZ probably don't want to see every DAZ Studio capable machine stop working...

    The only other thing I can think of is to have a "keep alive" app running continuously on the GPU doing something trivial to keep the GPU on.  Somewhat ironically given what I said before a crippled (e.g. very low priority) crypto-miner might do this.  Slightly safer might be one of the video player programs that has GPU support and playing a video loop continuously so that the GPU load never drops to 0 but never gets high enough to compete with DAZ.
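
    A minimal sketch of that "keep alive" idea in Python, for anyone who wants to experiment.  The loop itself is generic; the GPU job shown assumes the CuPy library is installed and can see the NVidia GPU, and the function names are made up for illustration.  Whether a load this small is enough to stop the GPU being switched off is untested.

```python
import time
from typing import Callable


def keep_alive(do_gpu_work: Callable[[], None],
               interval_s: float,
               iterations: int,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Run a trivial GPU job every interval_s seconds, `iterations` times.

    Returns the number of jobs that were run.
    """
    calls = 0
    for _ in range(iterations):
        do_gpu_work()
        calls += 1
        sleep(interval_s)  # stay mostly idle so the job never competes with a render
    return calls


def tiny_cupy_job() -> None:
    """Hypothetical GPU job: sum a small array on the GPU.

    Assumption: CuPy is installed; it is imported lazily so that the
    loop above has no GPU dependency of its own.
    """
    import cupy
    # .item() copies the result back to the host, which forces the GPU work to finish.
    cupy.ones((64, 64)).sum().item()
```

    Something like keep_alive(tiny_cupy_job, interval_s=1.0, iterations=3600) would then touch the GPU once a second for about an hour; a shorter interval keeps the card busier at the cost of a little more heat.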

  • jbowler Posts: 742

    BTW: the Analyze Wait Chain stuff is because of DAZ Connect; when each instance of DAZ Studio starts it downloads the whole product library manifest from DAZ if "login" is enabled.  Since this is utterly pointless and annoying with multiple instances (I normally have 2 or 3 running) I have switched off the "login on start" option and simply run DIM to update the library for new purchases and product updates.

  • I think I may have found the cause of the crash: GPU-Z!

    I recently realised that you could enter Alt+R to bring up the Nvidia performance overlay which gives the same GPU sensor values as what GPU-Z shows. So for one of my troubleshooting tests I decided to use this instead of GPU-Z, and I found that Daz Studio did not crash after the render!

    To confirm this, I did an Iray preview of an empty scene with just the basic G8M model. I didn't open GPU-Z but used the Nvidia performance overlay to monitor the GPU sensor values. I left the Iray preview running for 1 minute, then switched back to Texture Shaded view for 5 minutes to see if the GPU would crash. It did not, so I repeated this two more times, and not once did the GPU crash. To see if GPU-Z was actually causing the crash, I then opened GPU-Z and ran the preview for the same amount of time. A few seconds after switching back to Texture Shaded, the GPU crashed. And to make sure that what I saw was not just a fluke, I repeated the whole thing (without opening GPU-Z, then with GPU-Z) another time, and got the same results.

    So it seems like GPU-Z somehow causes the GPU to crash. I'm not really sure why this would be the case though, as I always thought GPU-Z just "passively" displays the GPU sensor values and wouldn't have the ability to disable the GPU.

    Interestingly, this bug with GPU-Z seems to be only limited to Daz Studio. I played a game with GPU-Z running in the background, and the GPU didn't crash after I ended the game, even though the game was definitely using the GPU (100% GPU load). Anyway, I'll probably submit a bug report to Daz and maybe to the creators of GPU-Z regarding this incompatibility, and for now I'll just stop using GPU-Z.
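
    With GPU-Z set aside, one other way to log the same basic sensor values is NVIDIA's own command-line tool, nvidia-smi, which is installed with the driver.  A minimal sketch in Python, assuming nvidia-smi is on the PATH; the helper names are made up for illustration, and there is no guarantee that polling this way avoids whatever interaction GPU-Z triggers.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["temperature.gpu", "utilization.gpu", "memory.used"]


def parse_smi_line(line: str) -> dict:
    """Turn one CSV line of nvidia-smi output into a {field: value} dict.

    Values are kept as strings because some laptop GPUs report
    individual fields as "[N/A]".
    """
    values = [v.strip() for v in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def read_gpu_sensors() -> dict:
    """Query the first GPU once via nvidia-smi (assumed to be on the PATH)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_smi_line(result.stdout.splitlines()[0])
```

    Calling read_gpu_sensors() every few seconds and printing the result gives a rough log of temperature (C), load (%) and memory used (MiB) to compare against the Nvidia performance overlay.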

  • jbowler Posts: 742

    tetrahedrane said:

    Interestingly, this bug with GPU-Z seems to be only limited to Daz Studio. I played a game with GPU-Z running in the background, and the GPU didn't crash after I ended the game, even though the game was definitely using the GPU (100% GPU load). Anyway, I'll probably submit a bug report to Daz and maybe to the creators of GPU-Z regarding this incompatibility, and for now I'll just stop using GPU-Z.

    I would definitely report it to TechPowerUp, either directly or via their forum pages.  It must be some interaction with whatever Dell are doing for power management; I have exactly the same version and a broadly similar NVidia card and don't see any problems.  Maybe the 2.40.0 update (which was released on May 28, 2021) introduced the problem?  I'd have assumed that GPU-Z was pretty much passive, but I can see that it might still interfere with power management stuff since it is reading the card sensors.

    Anyway, it's good information; if I start seeing DAZ crashes/mysterious exits I will now try stopping GPU-Z (I have it running pretty much all the time.)
