Anyone have a petabyte-scale storage environment? Maybe hundreds of terabytes? Raise your hands. For those with hands raised (you can put them down now), no doubt your life is occupied replacing failed disks. Keeping such a large storage system running can be like me plugging leaks in my roof during a Texas downpour (everything is bigger in Texas, after all!). Hardware failures, like my roof leaks, are inevitable, continual, and just part of life. The bigger the system, the more likely it is that at any moment a disk rebuild is underway and someone is madly hunting for the failed disk to replace it, like me hunting for the hole in my roof. And woe to the datacenter that lets its temperature rise – disk failures accelerate as ambient temperature goes up.
Let’s look at an example of a Texas gully-washer-sized system: In 2005, IBM built a supercomputer for the government (Lawrence Livermore National Laboratory) called Purple/C. Here are some specs:
- 1,536 nodes, 100 teraflops
- 2 PB GPFS (General Parallel File System)
- 500 RAID controllers, 11,000 disk drives, 4 disks per RAID 5 set
- 126 GB/s parallel I/O to a single file
Now that’s a big system. Lots of disks. IBM tells me that at any given time, up to 20% of the disks are in rebuild mode. Since these are SATA disks, that’s not surprising. A rebuild can be triggered not just by an outright disk failure but also by a single block corruption, so calculating just on the basis of a disk’s MTBF is misleading. One must also account for the data reliability of each disk: the more active a disk is, the more likely a bad sector will be encountered and trigger a RAID rebuild, even with a background disk-scrubbing routine in place.
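To put rough numbers on that, here’s a back-of-envelope sketch of how likely a rebuild of one of those 4-disk RAID 5 sets is to stumble over a bad sector. The unrecoverable-read-error rate and per-disk capacity below are my assumptions (plausible for 2005-era SATA), not Purple/C’s published figures:

```python
# Back-of-envelope: chance of hitting an unrecoverable read error (URE)
# while rebuilding one 4-disk RAID 5 set. The URE rate and capacity are
# assumptions for illustration, not Purple/C's actual specs.

URE_RATE = 1e-14          # assumed SATA error rate: 1 unrecoverable error per 1e14 bits read
DISK_CAPACITY_GB = 250    # assumed per-disk capacity (2005-era SATA)
SURVIVING_DISKS = 3       # a 4-disk RAID 5 set rebuilds by reading the other 3 disks in full

bits_read = SURVIVING_DISKS * DISK_CAPACITY_GB * 8e9   # bits that must all read back cleanly
p_clean_rebuild = (1 - URE_RATE) ** bits_read
p_failed_rebuild = 1 - p_clean_rebuild

print(f"Bits read during one rebuild: {bits_read:.2e}")
print(f"Probability of hitting a URE during that rebuild: {p_failed_rebuild:.1%}")
# With these assumed numbers, roughly a 6% chance per rebuild.
```

Even at a few percent per rebuild, spread across roughly 2,750 RAID sets (11,000 disks at 4 per set) that’s a steady trickle of rebuilds triggered by bad blocks alone, on top of outright disk deaths.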
That’s all well and good (well, not so good), and certainly there are other large supercomputer-like systems dealing with the same laws of physics, but the lesson, for those not in the supercomputer business, is this:
Large enterprises moving to a cloud storage environment based on distributed commodity storage nodes should expect the same: regular hard failures throughout the system. For some admins this may seem alarming. Disk failures are never a good thing. Grouping thousands of disks (SATA or otherwise) into a system, however, ensures the presence of failed disks – there is no avoiding the laws of probability.
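A quick illustration of those laws of probability, counting only outright disk deaths (not the block-level errors discussed above). The disk count comes from the Purple/C specs; the annual failure rate is an assumption for illustration:

```python
# Why thousands of disks guarantee failures: expected whole-disk deaths per year.
# DISK_COUNT matches the Purple/C specs above; the 3% annual failure rate (AFR)
# is an assumed figure, not a measured one.

DISK_COUNT = 11_000
ANNUAL_FAILURE_RATE = 0.03      # assumption: 3% of drives die per year

failures_per_year = DISK_COUNT * ANNUAL_FAILURE_RATE
hours_between_failures = (365 * 24) / failures_per_year

print(f"Expected whole-disk failures per year: {failures_per_year:.0f}")
print(f"Average time between failures: {hours_between_failures:.1f} hours")
# ~330 failures a year -- a dead disk roughly every 26 hours, before you even
# count rebuilds triggered by bad sectors.
```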
Making large implementations feasible requires a resilient distributed file and management system capable of living with constant disk failures. Some startups, such as Parascale and Gluster, are in that space, as are the likes of Symantec with its just-announced FileMover product, IBM with its SoFS system based on GPFS, IBRIX, Isilon, and others.
So if you plan a cloud storage deployment within your IT shop, get used to and plan for disk failures. This means staging boxes of disk drives by the system, having a rational way to identify failed disks, and having someone on hand to replace them.
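As one minimal sketch of “a rational way to identify failed disks,” here’s a script that scans Linux software-RAID status in /proc/mdstat and flags arrays with failed members. This assumes md RAID on Linux hosts; hardware RAID controllers and SMART monitoring (e.g. smartctl) would need their own checks:

```python
# Sketch: flag degraded Linux md RAID arrays by parsing /proc/mdstat.
# Assumes Linux software RAID; vendor RAID controllers need their own tooling.

import re

def find_degraded_arrays(mdstat_path="/proc/mdstat"):
    """Return (array_name, detail) pairs for arrays with failed or missing members."""
    degraded = []
    with open(mdstat_path) as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        # Array header lines look like: "md0 : active raid5 sdb1[1] sda1[0](F) ..."
        m = re.match(r"^(md\d+)\s*:", line)
        if not m:
            continue
        name = m.group(1)
        status_line = lines[i + 1] if i + 1 < len(lines) else ""
        # "(F)" marks a failed member; "_" in the "[UU_]" map marks a missing one.
        if "(F)" in line or re.search(r"\[U*_+U*\]", status_line):
            degraded.append((name, line + " | " + status_line.strip()))
    return degraded

if __name__ == "__main__":
    for name, detail in find_degraded_arrays():
        print(f"DEGRADED: {name}: {detail}")
```

Run it from cron and feed the output to whoever is holding the box of spare drives.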
To simplify maintenance, maybe a robotic tape library can be repurposed to plug and unplug failed disks (tongue-in-cheek). Unfortunately, no such option exists for my roof.
Posted by Gene Ruth