As long as I've been in this industry, vendors have been using (and misusing) benchmarks as a marketing tool (one shaped like a big club for beating competitors over the head). These benchmarks typically fall into three categories:
- Barely useful - these are the application benchmarks, such as those created by the Transaction Processing Performance Council (TPC), which make a genuine effort to model a real workload and to police the whole benchmark process so that vendors can't play games with pricing or take shortcuts on configurations to inflate their numbers or reduce their overall price. The main problem here is that if no benchmark comes close to modeling your workload, then throwing darts at a board is likely to be as effective as reviewing benchmark data when trying to select the best solution for your needs. Workstation buyers are better served here, with real application benchmarks such as SPECapc℠ for Solid Edge V19™.
- Largely academic interest - these are benchmarks of a single aspect of a system, such as the Standard Performance Evaluation Corporation's (SPEC) SPEC CPU2006 benchmark. For the most part these benchmarks are very hard to interpret when trying to decide which system gives the best bang for the buck on your specific application, unless you know a lot about the type of compute load your application generates and where its performance bottlenecks lie.
- Benchmarketing - these are the internal benchmarks done by vendors, where the primary purpose is a macho display of "chest thumping" at the expense of another vendor. You know what I'm talking about here: benchmark reports done by the vendor, or by an "independent body" hired by the vendor (no problem with impartiality there), often using a workload cooked up by the vendor specifically to make their system look good against the competition.
I get particularly annoyed when there is a useful industry standard benchmark that is largely ignored. For example, you can't throw a rock these days without hitting some vendor telling you how energy efficient their blade systems are, yet there is not a single blade system with published results for the SPECpower benchmark. Granted, the benchmark has not been out that long, but even so, IBM, Dell, and HP have taken the time to publish results for some of their rack mount servers, while their blade platforms are all missing in action. So instead of having comparable, independently verified numbers for power consumption, all we get is self-serving proclamations from the vendors about how their secret power saving sauce is more effective than the other guy's stuff; conclusions based on their entirely independent internal benchmarks, of course.
The major benchmarking authorities, TPC and SPEC, deserve some credit for producing benchmarks that have been widely published over the years. For example, SPEC's SFS97 file server benchmark has become the de facto performance benchmark for network attached storage; unfortunately, it only covers NFS performance. Meanwhile, TPC (and the Storage Performance Council) deserve credit for insisting on pricing information as part of the benchmark (a feature sadly lacking in SPEC's benchmarks), which allows customers to make judgements about price/performance.
To be fair to vendors, apart from their use in marketing (which has a questionable return on investment), putting any sort of monetary figure on the value of performing benchmarks is very hard to do. It's rare that customers demand industry standard benchmark results as part of a request for proposal, so not doing them doesn't impact business much, if at all. Given that fact, it's hard to make a case for vendors to spend time and money on benchmarks, especially the bigger, more complex benchmarks, which take real expertise and a boatload of hardware to run. To get an idea of the cost of running a major benchmark, take a look at the top x86 TPC-C results from vendors like HP. Their most recent result used $465K worth of hardware and software! Worse still, each new generation of servers requires more hardware to get the best numbers (i.e. more disks, faster disks, more memory, more network infrastructure etc.), so vendors don't publish very frequently, which makes it hard to compare results.
This situation leads to several questions:
- Is there any way to make benchmarks more relevant to IT organizations trying to make an intelligent hardware purchasing decision?
- Assuming the answer to the first question is "yes", is there any way for IT organizations to pressure vendors into producing useful benchmark results?
- Is there a better way to create benchmarks that reflect real world customer configurations and requirements?
- Are the existing independent benchmark bodies such as SPEC doing a good job, or do they need to change or be replaced entirely?
I have some ideas on how to answer these questions, but they'll have to wait for my next blog entry. Meanwhile, I'd be interested to hear from anybody reading this blog about their thoughts and experiences with benchmarks.
Posted by: Nik Simpson


Maybe let 'software vendors' write the benchmarks for the hardware vendors to compete against.
And then you can know that "SystemX" will run the OracleSPEC at some level, which might be much more relevant to your production environment than some theoretical benchmark.
Posted by: Han Solo | June 08, 2008 at 08:38 AM
If customers made benchmarks more important as part of their purchase criteria, then there might be an incentive to make them more useful. The problem is that benchmarks need to be interpreted against real-world environments, and there will probably never be enough dials on the benchmark machine to model complex customer environments. Even assuming a sufficient number of dials, the cost of running and interpreting benchmarks would be very high.
A meaningful benchmark today would include virtual systems with various workload mixes. Yikes!
Posted by: marc farley | June 09, 2008 at 10:25 AM