How to Build a quad-GPU rig for monster Iray performance (and animating)

124 Comments

  • No. Running a CPU at 100% 24/7 will not kill a CPU. It won't even shorten the lifespan. MTBF is MTBF. As to not getting the R7 because you have a crappy Xeon board: a decent B450 is $100 and an R7 2700 is $180. If you paid less than $300 for the refurb workstation, good for you, but you got what you paid for.

    I'm not interested in replacing a server processor with a standard processor. Standard processors are simply not intended for heavy-load tasks like HPC, for various reasons.

  • What simulation SW are you using that falls back to the CPU if the whole sim won't fit in VRAM, and is it CUDA-only? If not, get off Nvidia and get a FirePro or the like, which are far cheaper per GB of VRAM.

    For example, ANSYS HFSS.

  • There is no such thing as a K80m. Nvidia appends m to graphics cards to say it's a mobile card. Passive cooling in a server rack? Not possible. You hopefully mean the Tesla doesn't have a fan of its own, not that it doesn't need one. The server will definitely need fans, and server cases are built for a lot of fans pushing air through the obstructed insides. T640s generally (all the ones I've seen, but who knows, there could be a model my company never used) have/need 8 fans. If you're buying refurb, make sure it comes with the fans or be prepared to buy them separately. The T630 is nearly identical, so I assume it uses 8 as well. We've never bought from HP, but I'd be stunned if they weren't built for 8 fans. That's pretty standard for racks.

    The K80M doesn't exist? Then what is this: http://gpuboss.com/graphics-card/-9223372036822975357 - and you can also find it in https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/PowerEdge-T640-Spec-Sheet.pdf

    Why it is marked as K80M I don't know; I only know the K80. But here https://www.dell.com/learn/us/en/04/campaigns/poweredge-gpu it is listed only as the K80.

    I never said that servers don't need cooling. Passive cooling is the officially used term for GPUs without a fan, unlike the GeForce, Quadro, or Tesla K20c or K40c, which have one.

  • No, what I told you is how m.2's work. We don't have a ton of them yet, but the ones we have have not failed unusually frequently.

    So why do you think Dell does it with a fan? HPE has the same solution for a similar host card. But in any case, I would prefer to use a host card made by Dell for Dell computers (like, for example, the T7810) rather than a 3rd-party product, where there can be problems with compatibility, etc.

  • But if this is for a job, get them to pony up for a Pascal rather than the Maxwell dinosaurs. The P2000 and P4000 are both available on the used market for very reasonable prices.

    I compared the performance of the P2000 and P4000 to the M6000. The Maxwell dinosaur outperforms the P2000 by 60-70%. P4000 vs. M6000 looks very even, but in any case the M6000 has 12GB compared to the P4000's 8GB. The P4000 is slightly cheaper, but because of the VRAM I would prefer the Maxwell dinosaur M6000 over the P4000.
    As a matter of fact, even the very old Kepler K6000 outperforms the P2000 by about 10%. And the bonus of Kepler over the P2000 is not only more than twice the VRAM but also scientific-computing performance in double precision (FP64): the K6000 has about 1.7 TFLOPS and the P2000 only 90 GFLOPS. Generally, Kepler cards obviously have much greater FP64 performance than Maxwell or Pascal cards.
    So the P2000 or P4000 is simply not a solution in any case, whether for rendering or scientific computing. The only potential winner could be, for example, the P5000: it has 16GB and outperforms the M6000 by about 30%. But I'm able to purchase an M6000 for 500-600 USD, and a P5000 costs at least twice that.
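    To put those numbers side by side, here is a tiny sketch using only the figures quoted above (rough used-market prices, so the output is a ballpark comparison, not a benchmark):

```python
# Value check using only the figures quoted in the post above
# (rough used-market prices, not current listings).
m6000_price, m6000_vram = 550, 12      # "500 - 600 USD", 12 GB
p5000_price, p5000_vram = 1100, 16     # "at least twice" the M6000, 16 GB
k6000_fp64, p2000_fp64 = 1.7, 0.09     # TFLOPS double precision, per the post

print(f"M6000: {m6000_price / m6000_vram:.0f} USD per GB of VRAM")
print(f"P5000: {p5000_price / p5000_vram:.0f} USD per GB of VRAM")
print(f"K6000 vs P2000 FP64 ratio: {k6000_fp64 / p2000_fp64:.0f}x")
# -> roughly 46 vs 69 USD/GB, and about a 19x FP64 gap, which is the
#    argument being made here for the older Quadros.
```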

  • This has gotten tedious. Every single server and workstation part is the same tech as some desktop one. They get clocked lower because more cores packed together mean more heat, and running at higher clocks makes that worse. That has nothing to do with CPU lifespan. Every Intel and AMD chip has an MTBF, and that number doesn't change if you run the chip at 100% 24/7 as long as the cooling is adequate.

    ANSYS HFSS actually. Either you're a student and can run your sims at school or you work for someone with the cash to get you a real system. Heck the software itself is ridiculously expensive if you're not a student.

    The K80 is the only Tesla K80 that Nvidia lists. The sites you linked to show pics of K80s, and the specs match.

    Dell put a fan on the card because a lot of people think m.2's need cooling because they don't have heat sinks or anything. Doesn't change the facts.

    I have literally no idea what you're actually doing but go ahead not my problem.

  • ales_iray Posts: 20
    edited September 2019

    Yes, I agree with you that this has gotten tedious. Because, I'm sorry, but you are the one who made it tedious. So I hope for the last time: using a standard processor in place of a server processor for jobs like HPC is irrelevant. Of course, I don't actually know what repeated long-duration heavy loads would do to the lifetime of a standard processor, because I have never tested it and I'm really not interested in testing it. But in any case, there are many other reasons to use server processors instead of standard ones for heavy loads, and many of them can be crucial depending on your task.

    ANSYS was only an example of one of the SW packages I'm using. Some of them are free. Generally, it seems that CUDA computing has better support than AMD. But in any case, DAZ 3D Iray requires CUDA, so at least for that reason it is irrelevant to talk about AMD cards.

    Are you sure that your SSD will be OK without cooling under long-duration heavy load?

  • Yes, I'm sure about everything I wrote. I'm the IT manager at a datacenter; I have to know an awful lot about hardware reliability. It truly doesn't matter what load you put on a CPU as long as it is properly cooled. Simply applying any current to a CPU runs the clock on MTBF. The voltage may vary a small amount between full load and idle, as may the temperature (under proper cooling it won't vary much more than 20 to 30C, which is nearly inconsequential as these things go), but those variations simply aren't enough to substantially affect the lifespan of a CPU.

    There are a lot of CUDA-only products, but if you're on a budget, get the ones that aren't, since Radeon cards have more VRAM. That was your sole argument for wanting the high-VRAM cards, which have terrible performance. If you also want to render in DS, either go with 3Delight or get a decent newer card that can handle the scenes you want. But spending more than $500 on a Maxwell Quadro? You know people like me throw those things away for a reason, right?

    Yes, I'm sure that an m.2 will be fine without active cooling or even a heatsink for years on end. I not only have my own data, I can show you plenty of others who agree with me.

  • No. Running a CPU at 100% 24/7 will not kill a CPU. It won't even shorten the lifespan. MTBF is MTBF.

    MTBF is based on expected consumer use; running a CPU at 100% load 24/7 will absolutely shorten its lifespan compared to running it under a more typical consumer use case.

    What ultimately kills a CPU (or GPU or any processor) is temperature and voltage over time. Most processors reduce voltage at lower clocking speeds to conserve power and prolong lifespan.

    The way they "die" is usually that they become less overclockable until they can't even handle stock clocks. Generally they don't just abruptly stop working entirely unless you've wrecked your VRMs or gone for some crazy OC settings. I had an old video card that was factory OC'd and after a couple of years needed to be underclocked or it would crash under load.

    That being said, by the time your processor actually dies (even with overclocking and heavy use) it is probably time for a new one anyway. With decent cooling they last an incredibly long time.

    ales_iray said:

    Regarding the R7 2700 - as far as I know, Dell and HPE workstations and servers use Intel Xeon processors, so AMD is impossible. Also, I'm not sure, but the R7 2700 is not a server processor. For simulations where your CPU runs at 100% for tens of hours you need a server processor like a Xeon. It will probably kill a normal processor; they are not intended for such heavy loads.

    My understanding is that the server-grade stuff like Quadro and Xeon is sometimes just the same as the consumer-grade stuff, except they are cherry-picked overperformers that, when tested, use less power and perform at higher clock speeds. This is (from what I understand) an excellent predictor of future reliability and lifespan as well as stability. Companies are willing to pay a premium for increased stability on critical systems, but it just isn't a major issue for consumers. How often do you see a BSOD on a fully stock, non-overclocked PC?

    Another (AFAIK very minor) issue for uptime measured in weeks or months is ECC memory. AFAIK AMD supports ECC memory on consumer-grade processors, but Intel only does it on Xeons. Nvidia has it on Quadro, but not their consumer-grade stuff; not even the Titans get it. No idea about Radeon. I have heard that non-server versions of Windows will force a reboot after 30 days, so it probably isn't a significant issue for most consumers anyway.

    Sorry, but I was mostly skim-reading this: is your R7 2700 water-cooled, or does it have decent air cooling? If you're keeping the temperature low and it's stock-clocked, then you can just run it at 100% as long as you like. It will die sooner, but who cares if it dies 3 years from now instead of 5?

  • NO!

    MTBF, when reported (and it is for every server and pro CPU I'm aware of), means how many hours the chip can be on before it fails, on average. It truly doesn't matter, except very marginally, whether the chip is under full load or not. If it is truly under 100% load 24/7, then it won't be boosting and the temp will stabilize, presumably well below the chip's thermal limit. That is perhaps the ideal scenario for chip longevity. One of the reasons servers are almost never powered down, despite the obvious advantages of doing so versus having to have hot-swap fans, drives and PSUs, is that each power-on cycle puts a lot of wear on every component in the system.
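    To put a number on that, here is a minimal sketch assuming the usual constant-failure-rate (exponential) model and a purely illustrative 1,000,000-hour MTBF (not the spec of any particular CPU):

```python
# Illustration of how a quoted MTBF maps to failure odds for 24/7 operation,
# under a constant-failure-rate (exponential) assumption. The MTBF figure
# below is an example, not a spec for any particular chip.
import math

MTBF_HOURS = 1_000_000
HOURS_PER_YEAR = 24 * 365

for years in (1, 3, 5, 10):
    hours = years * HOURS_PER_YEAR
    p_fail = 1 - math.exp(-hours / MTBF_HOURS)
    print(f"{years:>2} years on 24/7: ~{p_fail:.1%} chance of failure")

# ~0.9% after one year and ~8.4% after ten years of nonstop operation -
# load never enters the model, only powered-on hours (given adequate cooling).
```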

    If you OC a chip you are running more voltage through it, which, whether there is more heat or not, still means a shorter lifespan; that's why underclocking makes chips last longer (underclocking is effectively undervolting the chip). But again, that's irrelevant to 24/7 use versus on/off use (except that turning on an OC'd system puts even more strain on the CPU than if it weren't OC'd).

    As to server chips being binned: they are, in a certain way. Every Xeon or Epyc chip made is produced as the top-of-the-line one, but there are inevitably lots that have cores that either flat out don't work or work very badly. So every chip is tested, and the ones with badly performing cores have those cores turned off. They are then sold as lower-core-count Xeons or Epycs (this might be slightly different on Rome chips due to the chiplet design, as the lower-performing chiplets might just go into the Ryzen desktop CPUs, but some will definitely go into the lower-core-count Rome parts as well).

    Server CPUs are clocked lower than desktop CPUs because of the core counts. More cores packed in so close together means more heat; lower the clock and you get less heat. It's already pretty challenging handling the heat from things like 225W Xeons that sometimes come 2, 4 or 8 chips to a motherboard (we actually have some 7U chassis with 8 Xeon 8175s with a TDP of 165W each; that's a lot of heat even in that big box). If they were clocked higher it would get worse (Intel has announced the 9200 Xeon Platinums with TDPs of up to 400W, but they aren't going to be sold except to OEMs, and we mostly build our own so we can customize them to each customer's needs, so I won't have to try to cool a box putting out 3200W just from the CPUs).

  • This was tedious back on page 2.

    Sorry, couldn't resist poking a bit of fun...

    But I say just buy it, don't bother overclocking it, back it up regularly, cool it in whatever way you wish, keep the rats from chewing on the power cords, blow out the cockroaches every couple years, get some work done, get some more work done, and let this thread fall to page 45.

    Oh, and stop tweaking!  Not twerking; I don't care if you do that.  But stop with the tweaks, people!  I can't stand it when people tell me they're still using their Windows ME tricks on Windows 10.  Ug!

    I always have to remind myself: It'll probably run for years. If I don't screw it up!

  • kenshaw011267, I have stopped understanding your thinking.

    Yes, VRAM size is important for me. Overall GPU performance is equally important for me. But what is most important is that the GPU must be usable for me. So, no matter what an AMD card costs, these GPUs without CUDA are simply unusable for me. I have everything based on Iray (purchased items that cost me some money), and you are advising me to throw it away - sorry, that idea is totally out. I'd rather not ask what I would do with the other SW that requires CUDA.

    That Maxwell is crappy is only your opinion. I wrote that the M6000 still outperforms some Pascal GPUs. You are a manager and you simply aren't paying for HW with your own money - I think that is where your curious thinking comes from. I'm paying for HW for my own purposes with my own money, so I will think twice about whether something is crappy or not. In any case, if you are really throwing away Maxwell cards, I can give you my address and you can send that waste to me.

     

    Regarding M.2 temperature: OK, so please write here what the ideal temperature is. I mean the temperature at which the SSD will work at full performance, not the top-limit temperature above which it becomes dangerously hot. Simply the ideal temperature. I can check in the SSD dashboard whether it is too cool, OK, or too hot. I'm not operating my computer in an air-conditioned room, so everything must be fully operative up to the maximum temperature that Dell allows for this computer. But in any case, as I wrote before, this is the Dell card for the T7810 and other Dell computers. I will not replace it with some 3rd-party card and risk system stability. I could replace it with a Supermicro host card, but the question is simply why, for what reason.
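    As an aside, one way to actually watch the drive temperature under load is sketched below. It assumes Linux with smartmontools 7+ installed and an NVMe drive at a hypothetical /dev/nvme0; the JSON field names and the throttling point depend on the specific SSD, so check its datasheet.

```python
# Sketch: poll an NVMe drive's reported temperature via smartctl's JSON output.
# Device path, sampling interval and field layout are assumptions for illustration.
import json
import subprocess
import time

DEVICE = "/dev/nvme0"   # hypothetical device node; adjust for your system

def read_temp_c(device: str) -> int:
    out = subprocess.run(
        ["smartctl", "-j", "-A", device],
        capture_output=True, text=True, check=True,
    ).stdout
    # smartctl reports a generic "temperature" object for NVMe drives
    return json.loads(out)["temperature"]["current"]

if __name__ == "__main__":
    for _ in range(10):                 # sample once a minute for ~10 minutes
        print(f"{DEVICE}: {read_temp_c(DEVICE)} C")
        time.sleep(60)
```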

     

    If you are an IT manager in a data center, I expect you are well familiar with operating computers in a datacenter. But I think that regarding HPC you simply know nothing. HPC is my job; I do it every day.

    So the motherboard in the Dell T7810 is, for you, crappy, and I should replace it with some Gigabyte board and some standard AMD processor. But this crappy board of mine takes 2 CPUs and 256GB of quad-channel RAM, and that is important for me. A board with only 64GB of RAM is unusable for me. The E5-2630 v3 is, for you, terrible, but this processor is able to run at its boosted clocks at 100% utilization for long periods. The E5-2630 v3 is quad-channel and is able to address more than 700GB of RAM. I never found how many GB of RAM the AMD processor you mentioned is able to address, and I also don't know whether it can work in a 2-CPU configuration. In any case, that AMD is only dual-channel. If you are saying that under long-duration full load the clock speed of this processor will go down, then this processor is terrible and unusable for me, no matter what the datasheet clock speed values are. As I wrote multiple times, normal processors are not intended for heavy loads the way server processors are. Of course, there are AMD-based boards for server AMD processors, but the base price will be similar to the Xeons. And, the same as with the SSD host card, the T7810 is an existing computer that cost me some money, and I don't see why I should throw it away.

  • I'm not sure what the back-and-forth is all about.. but I'll throw in some experience from my end. I've run a quad-GPU render server (with dual Xeons and an SSD) for 2 and a bit years now; while not under load 24/7, it's been rock solid (hardware-wise at least). Being fully watercooled there is some maintenance involved, but it means it can run a lot cooler and quieter than it would under air.

    So yea, keep it cool, clean and backed up daily and you're golden.

  • Generally, about server processors vs. consumer or standard processors: as I wrote, this is not only about lifetime. It is also about the ability to work in a multi-CPU configuration, how much RAM the chip can address, how many memory channels it can handle, virtualization features, etc.
    As a matter of fact, I have never used a consumer CPU for HPC, so I have never seen how one behaves. I have consumer CPUs only in notebooks; all the other computers I'm using are Xeons. This is because, if the established practice is to use server processors for this task, then I will do it that way and not experiment. If a consumer processor underperforms because its clock speed drops during long-duration heavy load, then that processor is terrible for a task like HPC and thus unusable for it. The Xeons in the computers I am using are able to run at their boosted clocks for a long time.
    The same goes for GeForce vs. Quadro. Quadro and Tesla cards also have a variety of other features for scientific computing compared with GeForce.

  • I'm not sure what the back-and-forth is all about.. but I'll throw in some experience from my end. I've run a quad-GPU render server (with dual Xeons and an SSD) for 2 and a bit years now; while not under load 24/7, it's been rock solid (hardware-wise at least). Being fully watercooled there is some maintenance involved, but it means it can run a lot cooler and quieter than it would under air.

    So yea, keep it cool, clean and backed up daily and you're golden.

    I also don't know what this is all about. As a matter of fact, I have a question that I wrote in my first post in this topic, from August 24 (the part about building a quad-GPU server based on Tesla K80s and some existing Dell or HPE server). I pointed to this question again on September 22. Instead of that, it is argued again and again that the HW in my current computer, which simply exists for a certain task (and is not intended to be changed, with the exception of adding a powerful GPU), is totally wrong and crappy and has to be changed.

  • I think this argument has run its course and needs to be dropped by all parties.

  • Mistara Posts: 38,675

    i think hexacore is the new standard.

    my day job just rolled out elite desktop hexacores / 16gb. 
    they didn't spend any money on decent gpu
    the mobo video is dual dp out
    grateful they gave us a dvd drive.

  • Deadly Buda Posts: 155

    Hello Philosopher! How would this build be modified to run 4x liquid cooled rtx 3090s?

  • Deadly Buda said:

    Hello Philosopher! How would this build be modified to run 4x liquid cooled rtx 3090s?

    One thing is the heat.. especially from the backplate (even water cooled you really need a fan or two on them). 

    The second would be the PSU requirements.

    I haven't tried putting all 4 together yet, I've just stuck with a pair in two machines.. even a pair are incredibly hot (not so much die temps, but general heat output inside the case).

     

     

  • Deadly Buda Posts: 155

    I seem to notice that Daz doesn't put a massive load on the GPUs when rendering Iray. I'm guessing you could power them down to using only about 275 watts and still render at virtually the same speed. So, I'm guessing that would give enough headroom for the rest of the system using a beefy 1300watt+ power supply. Or am I dreaming here?
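    (For what it's worth, capping the cards is just a software power limit. A minimal sketch follows, assuming a driver and cards that allow it, admin rights, and that 275 W is within the cards' permitted range; 275 is only the figure floated here, not a recommendation.)

```python
# Sketch: cap board power on each GPU with nvidia-smi to test the idea above.
# Requires root/admin and a driver/card combination that permits software
# power limits; GPU indices assume a 4-GPU box.
import subprocess

POWER_LIMIT_W = 275          # per-card cap from the post above
GPU_IDS = [0, 1, 2, 3]

for gpu in GPU_IDS:
    # -i selects the GPU index, -pl sets the power limit in watts
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu), "-pl", str(POWER_LIMIT_W)],
        check=True,
    )

# Then watch the actual draw while Iray renders, e.g.:
#   nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv -l 5
```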

  • Deadly Buda Posts: 155

    Daz Jack Tomalin said:

    One thing is the heat.. especially from the backplate (even water cooled you really need a fan or two on them).

    I thought the water block would take care of that? 

  • Deadly Buda said:

    I seem to notice that Daz doesn't put a massive load on the GPUs when rendering Iray. I'm guessing you could power them down to using only about 275 watts and still render at virtually the same speed. So, I'm guessing that would give enough headroom for the rest of the system using a beefy 1300watt+ power supply. Or am I dreaming here?

    Personally, my gut thinks that would be tight.. I'm planning on trying the same but with a 1600w PSU. I mean in theory, it might work.. (assuming your PSU has enough plugs etc..that's a whole other story).  My Asus cards are 2x8pin, which is ok for me.. but if they're 3x8pin cards then you're gonna struggle with 4 cards.
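    As a rough budget sketch of why it feels tight (every non-GPU number below is a guess for illustration, not a measurement):

```python
# Back-of-the-envelope PSU budget for the 4 x RTX 3090 idea discussed above.
CARDS = 4
GPU_LIMIT_W = 275        # per-card cap suggested earlier in the thread
CPU_W = 250              # dual-Xeon / HEDT package power, rough guess
REST_W = 100             # board, RAM, drives, fans, pump - rough guess
HEADROOM = 0.80          # keep sustained draw at or below ~80% of the rating

total = CARDS * GPU_LIMIT_W + CPU_W + REST_W
for psu in (1300, 1600, 2000):
    within = total <= psu * HEADROOM
    print(f"{psu} W PSU: ~{total} W load ({total / psu:.0%} of rating) "
          f"-> {'within the 80% guideline' if within else 'over the 80% guideline'}")

# With these guesses the load is ~1450 W: more than a 1300 W unit can supply,
# ~91% of a 1600 W unit, and ~73% of a 2000 W unit, which lines up with
# "1300 W is too tight, 1600 W might work, 2000 W is safer".
```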

  • Daz Jack Tomalin Posts: 13,813
    edited April 2021

    Deadly Buda said:

    Daz Jack Tomalin said:

    One thing is the heat.. especially from the backplate (even water cooled you really need a fan or two on them).

    I thought the water block would take care of that? 

    The front of the cards, yes.. but the 3090s have memory on the back too.. which gets roasting hot. There are actually full-cover waterblocks on the way (at least from EK) which also actively cool the back side of the cards too.. but I've yet to see them in machines with multiple GPUs, so can't speak for clearance etc.

  • Sevrin Posts: 6,313

    Time to try this idea.  It will also  make delicious french fries!

    Strip Out The Fans, Add 8 Gallons of Cooking Oil | Tom's Hardware

  • Deadly Buda Posts: 155

    Daz Jack Tomalin said:

    The front of the cards, yes.. but the 3090s have memory on the back too.. which gets roasting hot. There are actually full-cover waterblocks on the way (at least from EK) which also actively cool the back side of the cards too.. but I've yet to see them in machines with multiple GPUs, so can't speak for clearance etc.

    Yes I'm thinking the single slot Asus EKWB rtx 3090s could do the trick. Combined with a 1600 watt PSU you reckon it could work?

  • colcurve Posts: 171

    is there any reference rig for 4x rtx30xx ?

    since bitcoin ruined the prices it might be a bad idea to invest in a gpu right now, but i like making plans still :)

    will have to wait until they find a reasonable way to block hashing

  • Deadly Buda said:

    Daz Jack Tomalin said:

    The front of the cards, yes.. but the 3090s have memory on the back too.. which gets roasting hot. There are actually full-cover waterblocks on the way (at least from EK) which also actively cool the back side of the cards too.. but I've yet to see them in machines with multiple GPUs, so can't speak for clearance etc.

    Yes I'm thinking the single slot Asus EKWB rtx 3090s could do the trick. Combined with a 1600 watt PSU you reckon it could work?

    Yea I think so.. depending on the other components.. isn't there a 2000w PSU about (seasonic?) might be safer 

  • Deadly Buda Posts: 155

    Daz Jack Tomalin said:

    Yea I think so.. depending on the other components.. isn't there a 2000w PSU about (seasonic?) might be safer 

    I'm in the USA, so 120 volt, but maybe someone here knows... can a 2000-watt PSU plug into a USA wall socket if it's on a 20 or 30 amp circuit?

  • FrankTheTank Posts: 1,481
    edited April 2021

     

    I'm in the USA, so 120 volt, but maybe someone here knows... can a 2000-watt PSU plug into a USA wall socket if it's on a 20 or 30 amp circuit?

    You would need a special outlet and heavier-duty wiring put in by an electrician for 220V direct to the circuit box, kind of similar to how they have special outlets for electric dryers.
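    The rough arithmetic behind that (standard US residential rules of thumb, not electrical advice):

```python
# Rough circuit math for the question above (US residential wiring).
VOLTS = 120
CONTINUOUS_FACTOR = 0.80      # continuous loads are held to ~80% of breaker rating

for amps in (15, 20, 30):
    peak = VOLTS * amps
    continuous = peak * CONTINUOUS_FACTOR
    print(f"{amps} A @ {VOLTS} V: {peak:.0f} W peak, ~{continuous:.0f} W continuous")

# 15 A -> ~1440 W continuous, 20 A -> ~1920 W, 30 A -> ~2880 W. A 2000 W PSU
# actually pulling near its rating therefore wants a dedicated 20-30 A circuit
# (and 30 A receptacles aren't ordinary wall outlets), which is why the
# 220/240 V route suggested above is the cleaner option.
```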

  • Deadly Buda Posts: 155

    TGSNT said:

    you would need a special outlet and heavier duty wiring put in by an electrician for 220v direct to the circuit box, kind of similar to how they have special outlets for electric dryers. 

    Hey thanks for the input. This is going to be a lot of dedication to get IRAY render times down. Lol. 
