13 benchmarking sins

Avoid these benchmarking boners if you want useful data from your system tests

Misleading benchmarks

Misleading benchmark results are common in the industry. They often stem either from unintentionally limited information about what the benchmark actually measures or from deliberately omitted information. In many cases the benchmark result is technically correct but is then misrepresented to the customer.

Consider this hypothetical situation: A vendor achieves a fantastic result by building a custom product that is prohibitively expensive and would never be sold to an actual customer. The price is not disclosed with the benchmark result, which focuses on nonprice/performance metrics. The marketing department liberally shares an ambiguous summary of the result ("We are 2x faster!"), associating it in customers' minds with either the company in general or a product line. This is a case of omitting details in order to favorably misrepresent products. While it may not be cheating -- the numbers are not fake -- it is lying by omission.

Such vendor benchmarks may still be useful for you as upper bounds for performance. They are values that you should not expect to exceed (with an exception for cases of friendly fire).

Consider this different hypothetical situation: A marketing department has a budget to spend on a campaign and wants a good benchmark result to use. They engage several third parties to benchmark their product and pick the best result from the group. These third parties are not picked for their expertise; they are picked to deliver a fast and inexpensive result. In fact, a lack of expertise might be considered advantageous: the further the results deviate from reality, the better. Ideally one of them deviates greatly in a positive direction!

When using vendor results, be careful to check the fine print: what system was tested, what disk types were used and how many, what network interfaces were used and in what configuration, and other factors.

Benchmark specials

A type of sneaky activity -- which in the eyes of some is considered a sin and thus prohibited -- is the development of benchmark specials. This is when the vendor studies a popular or industry benchmark, and then engineers the product so that it scores well, while disregarding actual customer performance. This is also called optimizing for the benchmark.

The notion of benchmark specials became known in 1993 with the TPC-A benchmark, as described on the Transaction Processing Performance Council (TPC) history page:

The Standish Group, a Massachusetts-based consulting firm, charged that Oracle had added a special option (discrete transactions) to its database software, with the sole purpose of inflating Oracle's TPC-A results. The Standish Group claimed that Oracle had "violated the spirit of the TPC" because the discrete transaction option was something a typical customer wouldn't use and was, therefore, a benchmark special. Oracle vehemently rejected the accusation, stating, with some justification, that they had followed the letter of the law in the benchmark specifications. Oracle argued that since benchmark specials, much less the spirit of the TPC, were not addressed in the TPC benchmark specifications, it was unfair to accuse them of violating anything.

TPC added an anti-benchmark special clause:

All benchmark special implementations that improve benchmark results but not real-world performance or pricing, are prohibited.

As TPC is focused on price/performance, another strategy to inflate numbers can be to base them on special pricing -- deep discounts that no customer would actually get. Like special software changes, the result doesn't match reality when a real customer purchases the system. TPC has addressed this in its price requirements:

TPC specifications require that the total price must be within 2% of the price a customer would pay for the configuration.
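To make the price/performance idea concrete, here is a small hypothetical calculation (the numbers are invented for illustration and are not from any published TPC result): suppose a system delivers 500,000 transactions per minute and is disclosed at a total cost of $2,000,000, giving a price/performance of $2,000,000 / 500,000 = $4.00 per transaction per minute. If a real customer would actually pay $2,500,000 for that configuration, the disclosed price is 20% below reality, far outside the 2% tolerance, and the published $4.00 figure overstates the true price/performance, which would be $5.00.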

While these examples may help explain the notion of benchmark specials, TPC addressed them in its specifications many years ago, and you shouldn't necessarily expect them today.

Cheating

The last sin of benchmarking is cheating: sharing fake results. Fortunately, this is either rare or nonexistent; I've not seen a case of purely made-up numbers being shared, even in the most bloodthirsty of benchmarking battles.

Brendan Gregg is lead performance engineer at Joyent and formerly worked as performance lead and kernel engineer at Sun Microsystems and Oracle.

This article is excerpted from the book Systems Performance: Enterprise and the Cloud by Brendan Gregg, published by Prentice Hall Professional, Oct. 2013. Reprinted with permission. Content copyright 2014 Pearson Education, Inc.

Copyright © 2013 IDG Communications, Inc.
