
November 18, 2008


Comments

Robert Wipfel

Hi Chris, if you're running a modern enterprise Linux distribution as the virtualization platform, you don't have to physically power off servers to gain power savings. By continuously monitoring the capacity of your data center servers (something modern data centers are presumably already doing, or considering, for ITIL-suggested capacity planning and for load balancing of VMs, failover, etc.), it's possible to detect periods of inactivity that would allow workloads to be squeezed onto a smaller number of processing units, and then to use, for example, Linux tickless kernel support and CPU P-states to reduce the power consumption of the unused cores. Powering servers off completely is also possible via a BMC, though that still draws lights-out power. In the hopefully rare case (MTBF eventually TBD by the IHVs) that a server won't actually resume service, the virtualized workloads could simply be resumed elsewhere; it's just a different kind of failover ;)
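For readers unfamiliar with the knobs Robert mentions, here is a minimal sketch (my illustration, not Robert's tooling) of how a management script might use the standard Linux sysfs interfaces for cpufreq governors and CPU hotplug once an external monitor has decided a host is idle. The core-numbering policy is purely hypothetical, and it assumes a kernel with cpufreq and CPU hotplug support, run as root:

    #!/usr/bin/env python3
    """Sketch: once external monitoring decides a host is idle, nudge Linux
    power management through sysfs (cpufreq governors and CPU hotplug)."""
    from pathlib import Path

    CPU_ROOT = Path("/sys/devices/system/cpu")

    def set_governor(cpu: int, governor: str = "powersave") -> None:
        """Ask cpufreq to prefer lower-power P-states on the given core."""
        (CPU_ROOT / f"cpu{cpu}" / "cpufreq" / "scaling_governor").write_text(governor)

    def offline_cpu(cpu: int) -> None:
        """Hot-remove an idle core entirely (cpu0 usually cannot be offlined)."""
        (CPU_ROOT / f"cpu{cpu}" / "online").write_text("0")

    if __name__ == "__main__":
        cpus = sorted(int(p.name[3:]) for p in CPU_ROOT.glob("cpu[0-9]*"))
        for cpu in cpus[1:]:                 # hypothetical policy: leave cpu0 alone,
            set_governor(cpu, "powersave")   # drop the rest to the powersave governor
        for cpu in cpus[len(cpus) // 2:]:    # and offline the upper half of the cores
            offline_cpu(cpu)

Offlined cores can be brought back quickly by writing "1" to the same sysfs file, which is much closer to the power-state management Robert describes than a full server power-off.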

Mike DiPetrillo

Question: do you shut off your desktop or your laptop each night? You at least put them to sleep, right? This is the SAME THING. Yes, it's true that turning on electronics or a car is the hardest thing on the system. However, with modern components this really doesn't hurt as much as you think. I drive a Prius and it turns the motor on and off all the time. Doesn't hurt the longevity. My wife drives a Ford Escape hybrid - also no impact on the engine. My laptop doesn't seem to be dead. The toaster oven is still working. My TV is just fine. These things are meant to be run for a long time. What really kills electronics is heat. Turning things off from time to time helps with this. Actually, the biggest things that fail in servers are usually hard drives and fans. Hard drives die from constant motion, heat, and friction (again, heat). Turning the server off helps with all of that. The EPA actually did a report to Congress on this and showed no impact on MTBF from powering off devices. Lastly, you have to consider that the MTBF for most of these devices is about 3 years longer than the device will ever be on the books and in the datacenter.

So, you can keep your servers on and kill them faster and pay the extra money for power or you can use DPM to lower your power bill and actually lengthen the life of your servers. Your choice. The smart people out there are taking advantage of this and saving an average of 40% by powering off underutilized hosts on nights and weekends. That 40% savings can buy a few extra servers or parts if you're really that paranoid that one of your hosts might not come back up.

Chris Wolf

Hi Mike,

As usual I appreciate your passion. However, to say that the large majority of VMware customers who have stayed away from DPM are not smart is a bit much, in my opinion. Mike, if you're referring to the 2007 EPA study, the final report noted that its disk spin-down/spin-up findings did not apply to higher-speed "server-class" drives. See page 70 (http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf). In fact, all the studies I have seen (even the latest one commissioned by the Uptime Institute) evaluated the MTBF of laptop hard disks. That is not an apples-to-apples comparison with server hard disks, and it is something the EPA correctly recognized. This is one reason your clients are hesitant to implement DPM on production systems. And to be honest, I really don't think mocking them is the best response for you or any vendor representative. Instead, I would suggest you devote some of your passion and energy to getting the IHVs to provide your customers with the MTBF data they are looking for.

Mike DiPetrillo

First, Chris, yes, I'm passionate. That comes out even more when I see customers throwing away money because they're scared for all the wrong reasons (that's the "F" in FUD). I will continue to chastise anyone who argues against doing something that would save their company money when that argument isn't based on any facts. It's high time engineers out there stepped up to the plate, started to think, and actually engineered. Yes, not everyone fits this description, and I apologize to the true engineers who really do understand this stuff. Now go and hit the other 3 people in the room who don't.

I think we're stuck on the point that hard drives are the number 1 thing that keeps servers from coming back alive. I don't have stats one way or the other and can't find any on the web; if you have some, it would be great to share them. Even granting that point, though, you have to remember that most "server grade" drives have MTBF numbers much longer than the life expectancy of a human being. Take a staple of server hard drives, the venerable Seagate Cheetah. Its datasheet (http://tinyurl.com/29drm9) lists an MTBF of 1.2 million hours (about 137 years). The Cheetah isn't even the fastest or longest-lasting drive out there, but it's in a ton of servers and storage arrays. The point is that the AVERAGE time before a failure is 137 years. Let's say shutting it down and powering it back up cuts the life expectancy by 80%. That still puts the death of the drive at nearly 27 years out. Do you really think it will still be in the datacenter at that point?
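For anyone who wants to check that arithmetic, here is the back-of-the-envelope version (my own sanity check, using only the datasheet's 1.2-million-hour figure and the assumed 80% reduction above):

    # Back-of-the-envelope check of the Cheetah MTBF figures quoted above.
    HOURS_PER_YEAR = 24 * 365.25

    mtbf_hours = 1_200_000                      # datasheet MTBF for the Cheetah
    print(mtbf_hours / HOURS_PER_YEAR)          # ~136.9 years

    reduced = mtbf_hours * 0.20                 # assume power cycling cuts it by 80%
    print(reduced / HOURS_PER_YEAR)             # ~27.4 years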

All of that drive talk also applies to drives in constant or near-constant use. In a typical ESX host installation the internal drives barely move, since nothing local is really doing anything; all of the VM storage is out on shared disk. With ESXi there's even less of a reason to have local drives at all: by PXE booting nodes as seen here (http://tinyurl.com/6yfbq8) there are no internal drives, and you even have server vendors like Dell and IBM shipping ESXi on flash in the server with no local drives. Bottom line: you can't rightfully blame the drives for server death.

This brings us back to what is killing the server - heat. DPM reduces heat, gives the electronics a break, and saves you power at the same time. DPM is the solution to the problem (servers dying) - not a catalyst to make them die sooner.

So after all of this, let's kill off the myth that DPM is going to increase server failure rates. I'm pretty sure that myth was created by someone who just didn't think this through, or who was burned by picking the wrong drives in the first place. I have seen that - mass drive failures. However, those failures were caused by bad components and hardware, not by parking the drive heads excessively. DPM doesn't have any impact on that.

I'm sure there will be studies done at some point on the true impact here. Those will take time, since you're talking about running a bunch of load tests for a long time (years) to test MTBF. I'm sure those tests will tell you exactly what I'm saying here - nothing to worry about. In the meantime you can keep hiding behind FUD and wasting company resources, or you can join the hundreds of other companies already using DPM and enjoying massive power savings. The choice is ultimately the customer's. All I ask is that they be allowed to make decisions based on good facts rather than poorly researched and reposted FUD.

Burton Group

Hi Mike,

Thanks for the comments, good to debate the subject with you.

I would agree that none of us in this industry should make decisions based on poor research or FUD. That is exactly why Chris is recommending that more research be conducted on the effect that routine/regular power cycling has on the MTBF of enterprise servers. So far, none of the major IHVs (IBM, HP, Dell) has conducted such a study (nor, AFAIK, plans to). Until they do, Burton Group is not willing to recommend that enterprise customers "just try it" within their data centers "because it works for laptops and desktops". While I respect the EPA, Chris is correct to point out that their research was not conducted on enterprise-class servers, nor was it conducted by those in the know -- the IHVs themselves.

May I suggest that VMware can and should alleviate the legitimate concerns customers have about dynamic power management by partnering with one of the major IHVs to sanction an independent third party to study the effects on enterprise-class servers (and publish the certified results)?

Unfortunately, I suspect VMware will have trouble getting IHVs to sponsor such a test. Why? Because the IHVs don't recommend that customers routinely power cycle servers...not only for technical concerns, but for support and warranty concerns as well (speaking from experience, having worked for Compaq, HP, and Dell for most of my career). Instead, IHVs are headed toward power state management. IOW, rather than power off the entire server, IHVs are exploring technology to lower the power consumed by individual components based on usage.

As far as server component failures go, it doesn't matter which component fails; when it fails, the server is down. So, you're correct to say hard drives aren't the only component that can potentially bring a server down. To understand overall system MTBF, it's important to look at the MTBF of each component, which ones fail most often, and what affects the MTBF of each. The most common component failures are PSUs, fans, HDDs, and memory (in that order, I believe).

So what is MTBF? How is it calculated? I ask this because your comment on MTBF is a bit misleading. MTBF does not mean that your HDD will be operational for the number of hours cited by the manufacturer. It's simply an estimate of the average time to failure using prediction models. MTBF is based on the idea that electronics (like HDDs) fail at a constant rate and follow a predictable distribution. Prediction models, such as MIL-HDBK-217 or Telcordia (Bellcore) SR-332, are normally used in the case of electronics. This site: http://www.t-cubed.com/faq_mtbf.htm gives the MTBF function as MTBF = 1 / (sum of all the part failure rates). Using this model, we can predict the probability that a component survives a given period of time: R(T) = exp(-T/MTBF). So, if the MTBF of a component is 500,000 hours, what are the odds the HDD survives 5 years (43,824 hours)? exp(-43,824/500,000) = 0.916, or roughly a 92% chance that the component will remain operational during that time. Stated another way, there's about an 8% chance that the component will fail in that time.
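Here is a small sketch of that calculation (my illustration, not part of Drue's comment; the part MTBFs in the last line are made-up numbers purely to exercise the cited formula):

    import math

    def system_mtbf(part_mtbfs_hours: list[float]) -> float:
        """MTBF = 1 / (sum of the part failure rates), per the formula cited above."""
        return 1.0 / sum(1.0 / m for m in part_mtbfs_hours)

    def reliability(mtbf_hours: float, hours: float) -> float:
        """Probability of surviving `hours`, assuming a constant failure rate."""
        return math.exp(-hours / mtbf_hours)

    FIVE_YEARS = 43_824  # hours, the figure used above

    r = reliability(500_000, FIVE_YEARS)
    print(f"survives 5 years: {r:.3f}  fails: {1 - r:.3f}")   # ~0.916 vs ~0.084

    # Hypothetical part MTBFs (PSU, fan, HDD, memory) just to show the formula:
    print(f"system MTBF: {system_mtbf([300_000, 400_000, 500_000, 1_000_000]):,.0f} hours")

The same reliability() function also covers the halved-MTBF scenario discussed further down: reliability(250_000, FIVE_YEARS) comes out to about 0.839, i.e. roughly a 16% chance of failure.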

Sounds good at first, doesn't it? But this is where the real analysis comes in. Because MTBF is an estimate, electronics manufacturers use accelerated tests to drive up failure rates so they can obtain an MTBF number without having to test for tens of years.

For example, MTBF for HDDs is an estimate calculated from a battery of environmental tests. These tests include extreme heat, cold, humidity, vibration, I/O, and, you guessed it, power cycling. The HDD vendors run these tests until the drive fails and then extrapolate the MTBF. In other words, the HDD vendors increase the stress on the drive to create high failure rates so they can calculate MTBF. So power cycling obviously has a significant effect on MTBF.

Also notice that the formula isn't linear. If you plot it, the cumulative probability of failure, 1 - exp(-T/MTBF), keeps climbing as time approaches and passes the MTBF. If the MTBF is lowered (the curve is pulled to the left on the graph), the probability of failure at any given point in time goes up, and any given failure probability is reached that much sooner.

And we can't blame this simply on heat. Beyond the other environmental stresses (like vibration), it's the expansion and contraction that electronics go through during events like power cycling that wears on components until they fail mechanically or structurally. In other words, structural damage caused by extreme conditions - too much heat, cold, or power cycling - is at least as much to blame.

So, what if DPM's power cycling has the same effect as lowering the MTBF by 50%? Plugging 250,000 hours into the same formula, reliability drops to about 0.839 - almost a 16% chance of failure over the same five years. In a data center with thousands of servers, each with PSUs, fans, and hard drives, what just happened?

Even better, in a highly consolidated, virtualized data center, how many spare servers should there be lying around? And where's the biggest bang for the buck? In the data center? Or on the desktop, where tens of thousands of machines are left on at times when they aren't needed?

Overall, Mike, Burton Group believes dynamic power management *can* be a good thing, but DPM shouldn't be just about shutting down servers. DPM should be about lowering power states where possible, and only shutting servers down when necessary AND when studies have shown that the effects of doing so are negligible.

Drue Reeves

Mike DiPetrillo

Drue,

Thanks for taking the time to give some good detail behind this. I knew you guys had the data to support a theory. It would have been great if this had been included up front, but now that it's out it will be VERY useful for customers to understand.

I completely agree that there's no way IHVs are going to say, "hey, you should use DPM" outwardly. All the IHVs want to add advanced power features to their servers so they can differentiate themselves and sell more servers. That's completely understandable. I do have 2 problems with that:

1) Power management in the servers alone is not where this intelligence belongs. While a single server can see that it's underutilized at a certain time, it doesn't have the smarts to see that the servers next to it are also underutilized. There's no central control point from the server vendors that can see the apps, all of the servers, all of the power, and all of the heat in a datacenter. Sure, there are monitoring tools out there that capture good portions of this, but within a single server that view just doesn't exist.

2) I haven't seen one single IHV come out and say, "don't use DPM because we won't support you if your hardware dies."

While I can agree that tests need to be run to see what real impact this has on server MTBF, there's also no concrete data out there showing that DPM activities actually decrease MTBF. There are a lot of theories and formulas, but not one study has actually run DPM and measured its impact on MTBF. That means no one can go around saying, "don't use DPM because your servers will die more often". There are always 2 sides to every coin.

So where do we go from here? To start, I've set up an informal test in my basement with an old Dell 2650. This particular server started its life in a customer's datacenter for a few years before showing up on eBay, where I snagged it for a modest fee. The server was shipped to my house in a cardboard box with newspaper stuffed around it for "protection". I ran it for a year in a basement lab before moving twice to where I currently live - moves made without the cardboard or the newspaper. The server now lives in my dusty, unfinished basement running ESX, and all this time the original 146 GB SCSI drives have been churning away.

What I have set up is a torture test. The server is running ESX 3.5 U3 and is plugged into a web-based power switch. I wrote a script to power on the server and let it boot. After 5 minutes the script pings the server, takes a picture of the screen with a web camera I have nearby, and then hard powers off the server. That whole process then repeats itself. I plan to do this 365 times to simulate one DPM operation a day. This should be a great test to see if the server dies. If this old clunker doesn't bite the dust, then my informal test should suggest it's safe for DPM to perform its gentler soft power-offs in a customer's datacenter on newer servers that are built to handle power cycles better. Sound fair?
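For the curious, the loop Mike describes might look something like this (a sketch only, not his actual script; the power-switch and webcam URLs and the host IP are hypothetical placeholders for whatever web-controllable gear you have):

    #!/usr/bin/env python3
    """Sketch of a once-a-day power-cycle torture test: power on, wait, ping,
    snap the console with a webcam, hard power off, repeat."""
    import subprocess
    import time
    import urllib.request

    SERVER_IP = "192.168.1.50"                        # hypothetical ESX host address
    POWER_ON_URL = "http://192.168.1.60/outlet?1=ON"  # hypothetical web power switch
    POWER_OFF_URL = "http://192.168.1.60/outlet?1=OFF"
    WEBCAM_URL = "http://192.168.1.61/snapshot.jpg"   # hypothetical webcam snapshot URL

    def pings(host: str) -> bool:
        """Return True if the host answers a single ICMP ping."""
        result = subprocess.run(["ping", "-c", "1", "-W", "5", host], capture_output=True)
        return result.returncode == 0

    for cycle in range(1, 366):                       # one simulated DPM cycle per "day"
        urllib.request.urlopen(POWER_ON_URL)          # power the server on
        time.sleep(5 * 60)                            # give ESX five minutes to boot

        alive = pings(SERVER_IP)
        with open(f"cycle_{cycle:03d}.jpg", "wb") as shot:
            shot.write(urllib.request.urlopen(WEBCAM_URL).read())  # save the console photo
        print(f"cycle {cycle}: {'up' if alive else 'NO RESPONSE'}")

        urllib.request.urlopen(POWER_OFF_URL)         # hard power off, then repeat
        time.sleep(60)                                # short pause before the next cycle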

I'll make sure to post the results and all the screenshots over on my blog at http://www.mikedipetrillo.com when I'm done. In the meantime, if there are any IHVs out there reading along, feel free to reach out if you'd like to do some more formal testing.

Pete Waterman

You may recall from my presentation at the Catalyst conference earlier in the year that I actually do buy into the "power on demand" idea. We're using diskless blades of course, which gives us a huge advantage - no "moving parts" to cycle when we do this. Considering that our historical failure rate on procs and memory is extremely low, we're not so worried about it.

It's an interesting discussion though, because we always looked at this warily from an entirely different perspective - power instability. We didn't like the potential damage and risk caused by the power spike required to cycle up a bunch of servers at once, and were scared of a situation where 20+ servers cycling up in a rack would cause an overload. Thankfully Dell listened to us and gave us some pretty intelligent powerup control in the M1000e chassis.

I do think the point that no IHV has done testing on this (or is willing to share it at least) is pretty interesting though. We've done some very loose informal testing on it, enough that we're not too worried (a few hundred power cycles), but I suppose it can't hurt to go a bit deeper and see what's up.

Toward that end, like Mike above, I've set up a few systems in our lab to do a full power cycle (power up, boot OS, shut down OS, power off) repeatedly. The entire cycle looks like it takes just over 2 minutes. I set three blades to repeat this, which gives us around 480 cycles a day per blade. I'll run this indefinitely, and if any concerns arise I'll have some solid data to bring to Dell to encourage evaluation of this.

(Realistically, of course, we don't expect to power cycle a typical blade more than once a day, and with a service life of 4-5 years that's fewer than 2,000 power cycles, so I won't worry much if a couple of weeks from now all three blades are still cherry.)

Chris Wolf

Hi Pete,

It's good to hear from you, and thanks for weighing in. Please let us know how your tests turn out and how Dell responds to them. I have personally received a great deal of response from both virtualization vendors and hardware vendors since this post was first published. Thus far, some early feedback from IHVs has indicated that they are more interested in standing behind active power management solutions (e.g. reducing the clock rate or shutting down unneeded CPUs) that do not involve power cycling servers. I think moving forward we're going to see a mix of both approaches, and it's important for IHVs to ease their customers' concerns. A simple statement clarifying what impact large-scale use of DPM would have on server lifetime and on related server maintenance and support contracts would go a long way. Many of our clients would like to go down this path; they just want reassurance from the vendors, and many organizations appear willing to hold off on embracing any form of DPM until they receive it. Keep spreading the word. The more the hardware vendors hear about customer demand for clarity, the quicker we'll see them step up and provide it.

Pete Waterman

It's only been a few days, but I thought I'd share some data from my testing so far. I've had three 1955 blades on a perpetual power cycle for seven days now. To recap, the cycle is as follows:

1. Blade powered on via ipmi
2. RHEL5 fully boots on blade, then init 0's (and powers off)
3. Repeat ~30s later
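A rough sketch of what that cycling driver could look like (not Pete's actual script; the BMC addresses and credentials are placeholders, and it assumes each blade runs init 0 itself at the end of boot, so the driver only has to notice the power-off and power the blade back on):

    #!/usr/bin/env python3
    """Keep a set of blades in a perpetual power cycle via IPMI, counting cycles."""
    import subprocess
    import time

    BLADES = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical BMC/DRAC addresses
    USER, PASSWORD = "root", "changeme"               # placeholder credentials

    def ipmi_power(host: str, *args: str) -> str:
        """Run `ipmitool chassis power ...` against one blade's BMC."""
        cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", USER, "-P", PASSWORD,
               "chassis", "power", *args]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    cycles = {blade: 0 for blade in BLADES}
    while True:
        for blade in BLADES:
            if "off" in ipmi_power(blade, "status").lower():  # blade finished init 0
                time.sleep(30)                                # ~30s pause between cycles
                ipmi_power(blade, "on")                       # kick off the next boot
                cycles[blade] += 1
                print(f"{blade}: {cycles[blade]} power cycles so far")
        time.sleep(10)                                        # poll interval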

So far I've encountered two interesting glitches. First, two of the three blades at one point during the week completely stopped responding to IPMI power-on commands. To get the testing going again I had to power cycle the blade via the DRAC, after which IPMI started responding. (One blade hit this after 616 power cycles, the other after 2,603.)

Second, I have to power cycle the chassis IPKVM module every few days to get video back on these blades - for some reason after a few hundred reboots the IPKVM stops detecting signal from them.

No other problems, however - certainly none fatal. So, in general everything is holding up well so far, with far more power cycle events than I'd expect in a lifetime. I'll keep it running though...


Data Range: Mon Dec 8 @ 13:54EST to Mon Dec 15 @ 13:25EST

Blade 1: 3351 power cycles
Blade 2: 3046 power cycles
Blade 3: 2611 power cycles

(differences in number are due to the length of time the blade was hung with the IPMI problem before I paid attention)

