This month has seen several vendors announce or commit to offer thin-provisioning in some form on their storage products. For those not familiar with the term, thin-provisioning is all about breaking the 1:1 relationship between allocated storage and available storage. Today, if you’ve got 1TB of storage available in your array, then no matter how you slice it, you can’t allocate more than 1TB to the hosts attached to the array. With thin-provisioning, you can allocate as much as you like: hosts still see what appear to be conventional volumes, but physical storage is only allocated in small chunks when the host actually writes data to the volume. The benefits of this approach are powerful:
- Efficiency: Since physical storage is only allocated when it’s needed, you end up with a much higher ratio of data to storage. This can have a significant impact on the amount of storage capacity you buy as well as the daily running costs (power, cooling, and management).
- Simplicity: You can allocate all the storage that a host will ever need when it is installed, so there is no more downtime to add storage and expand file systems or databases.
Just so long as you keep adding physical storage as fast as the hosts create data, everything is hunky-dory!
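To put rough numbers on "as fast as the hosts create data", here is a back-of-the-envelope sketch. The function names and the figures (4TB promised against 1TB installed, 10GB of new data a day) are made up for illustration and assume a constant growth rate, which real workloads rarely oblige with.

```python
def oversubscription_ratio(allocated_tb, physical_tb):
    """Capacity promised to hosts divided by capacity actually installed."""
    return allocated_tb / physical_tb


def days_until_full(physical_tb, used_tb, growth_tb_per_day):
    """Days of runway left in the pool, assuming constant growth."""
    if growth_tb_per_day <= 0:
        return float("inf")  # nothing is growing, so the pool never fills
    return (physical_tb - used_tb) / growth_tb_per_day


# Hypothetical pool: 4TB allocated to hosts, 1TB installed, 0.6TB written.
ratio = oversubscription_ratio(allocated_tb=4.0, physical_tb=1.0)
runway = days_until_full(physical_tb=1.0, used_tb=0.6, growth_tb_per_day=0.01)
print(f"over-subscribed {ratio:.0f}:1, roughly {runway:.0f} days of runway")
```

The runway figure is the one to watch: it tells you how long you have to get more disks ordered, delivered, and installed.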
There are some downsides that you need to consider though:
- Array limitations: there are two places where thin-provisioning can be implemented: in the array, or in the storage network above the arrays (network storage virtualization). Array-based approaches run into the hard capacity limit of the array, i.e. you should never allocate more storage than the total array capacity, or you run into the problem of what to do when the array is full. Implementing thin-provisioning as a service in the virtualization layer removes that limit because you can add arrays to the allocation pool.
- Garbage collection: deleting data on a file system located on a thin-provisioned volume doesn’t return capacity to the allocation pool. Because the thin-provisioning is being done at the block level, it has no clue about the file system; once capacity is allocated, it stays allocated. This may cause problems with file systems, such as NTFS, that try to avoid overwriting deleted data, since new writes keep landing on untouched blocks and triggering fresh allocations.
- Rogue processes: there is a danger that some rogue process goes crazy and keeps writing data until there is no more physical capacity available to satisfy other volumes drawing from the same allocation pool, at which point the whole thing goes to hell in a handbasket.
- Performance: the whole point of thin-provisioning is to make more efficient use of resources, and that means fewer drives for a given amount of data. Fewer drives inevitably translate to lower performance. There is also the issue of the allocation and lookup overhead for every write. When a write comes in, the thin-provisioning logic has to check first to see if physical space has already been allocated. If it hasn’t been allocated, additional logic is triggered to map some physical space to the virtual disk.
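The write path described in the last bullet (look up the mapping, allocate on first touch) can be sketched in a few lines. This is a toy model, not any vendor's implementation: `ThinPool`, `ThinVolume`, `PoolExhausted`, and the tiny `CHUNK_SIZE` are all hypothetical names and values chosen to keep the example readable.

```python
CHUNK_SIZE = 4  # blocks per chunk; hypothetical, chosen small for the example


class PoolExhausted(Exception):
    """Raised when the shared pool has no free physical chunks left."""


class ThinPool:
    def __init__(self, physical_chunks):
        self.free = list(range(physical_chunks))  # ids of free physical chunks

    def allocate(self):
        if not self.free:
            raise PoolExhausted("no physical chunks left in the pool")
        return self.free.pop()


class ThinVolume:
    def __init__(self, pool, virtual_chunks):
        self.pool = pool
        self.virtual_chunks = virtual_chunks  # the size the host sees
        self.mapping = {}                     # virtual chunk -> physical chunk

    def write(self, block):
        vchunk = block // CHUNK_SIZE
        if vchunk >= self.virtual_chunks:
            raise IndexError("write beyond the end of the volume")
        # The lookup that every write pays for: is this chunk backed yet?
        if vchunk not in self.mapping:
            # First touch: map some physical space to the virtual disk.
            self.mapping[vchunk] = self.pool.allocate()
        return self.mapping[vchunk]


pool = ThinPool(physical_chunks=3)
a = ThinVolume(pool, virtual_chunks=100)  # each volume promises far more
b = ThinVolume(pool, virtual_chunks=100)  # than the pool can back

a.write(0)
a.write(1)    # same chunk as block 0, so no new allocation
a.write(40)   # new chunk
b.write(8)    # takes the last physical chunk

exhausted = False
try:
    b.write(80)  # needs a fresh chunk, but the pool is empty
except PoolExhausted:
    exhausted = True
```

Note that there is no path here that hands a chunk back to the pool, which is the garbage collection problem in miniature, and that volume `b` is the one that fails even though volume `a` used most of the space, which is the rogue process problem.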
The biggest concern should be how the system handles the case when physical storage is running low:
- What alerts does it give?
- Can you set thresholds for when alerts start happening?
- Can it throttle I/Os from processes that go on an allocation binge?
Ultimately, I would like to see the alerting system integrated with sales at the vendor so that more storage is ordered automatically when a user-defined threshold is exceeded.
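As a rough idea of what the questions above are asking for, here is a minimal sketch of threshold alerting and a throttle check. The threshold values, the per-minute allocation limit, and the function names are all assumptions for illustration; real products will expose these differently, if at all.

```python
def highest_threshold_crossed(used_chunks, total_chunks,
                              thresholds=(0.70, 0.85, 0.95)):
    """Return the highest user-defined threshold crossed, or None."""
    utilization = used_chunks / total_chunks
    crossed = [t for t in sorted(thresholds) if utilization >= t]
    return crossed[-1] if crossed else None


def should_throttle(new_chunks_last_minute, limit_per_minute=100):
    """Flag a volume that is on an allocation binge (hypothetical limit)."""
    return new_chunks_last_minute > limit_per_minute


level = highest_threshold_crossed(used_chunks=880, total_chunks=1000)
if level is not None:
    print(f"pool is past the {level:.0%} threshold, time to order storage")
```

The automatic ordering I'd like to see would simply be another action hung off the highest threshold.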
So, my advice is that thin-provisioning is an important advance (just as virtual memory was for operating systems), but you should look to network approaches such as those offered by Hitachi Data Systems and Network Appliance, or if you prefer software, DataCore. These approaches don’t have the hard capacity limits of the typical array-based approach. I would also advise a lot of testing on non-critical applications before turning the technology loose in production.
post by: Nik Simpson


You may have missed it, but EqualLogic released thin provisioning the same day HDS did. It's a very well thought out implementation that does not have the array limitations you describe, provides multiple levels of alerts and thresholds, throttles performance for rogue applications AND allows the function to be turned on and off.
http://www.equallogic.com/news/release_display.aspx?id=2829
Posted by: MarcFarley | May 24, 2007 at 05:55 PM
Thanks Marc, actually EqualLogic was one of the companies that I was thinking about when I wrote the entry. Good to hear that you can span an allocation pool across more than one array, as that does answer my primary concern about array-based thin-provisioning.
Posted by: Nik Simpson | May 25, 2007 at 05:22 AM