A few months ago Brian Beach, a Distinguished Engineer at cloud backup joint Backblaze, published a set of study-like blog posts relating their experiences with hard drive lifespan in their 25,000-plus-spindle environment.
The blogs garnered quite a bit of interest due to the subject matter and provocative titles like “How Long Do Hard Drives Last?”, “Enterprise Drives: Fact or Fiction?”, and “What Hard Drive Should I Buy?” The blogs raise interesting questions and put forward controversial conclusions.
One of the most contentious claims came from the first blog (Simon Sharwood covers it here) in which Beach asserts that consumer-grade hard drives are actually more reliable than their supposedly industrial strength (and definitely more pricey) enterprise cousins. According to their research, enterprise drives failed at an annual rate of 4.6% vs. 4.2% for the consumer versions.
The bottom line, according to Beach, is that consumer drives are a better choice (even after factoring in the longer enterprise warranty) due to their higher reliability and lower cost.
Even more contentious (or contentiouser?) is the last blog, which showed Backblaze failure rates by drive manufacturer. The results were pretty stark, with an “Annual Failure Rate” chart that showed Hitachi drives at less than 2%; Western Digital spinners at around 3%; and Seagate drives at an astounding 14% for the 1.5TB flavor, ~9% for 3TB, and a still-too-high 3.8% or so for the 4TB version. Yikes! We should stay away from Seagate, then, right?
Not So Fast…
Some digging into the analysis reveals that the foundations underlying the Backblaze conclusions aren’t all that sturdy. Take the data center vs. consumer drive failure rate stat, for example. To compute annual failure rates, Backblaze compares failures per “drive-years of service,” which is the number of each type of drive they have multiplied by years of service – simple, eh?
The problem is that they’re comparing 14,719 drive-years of service on their consumer disks vs. only 368 drive-years of service on data center-grade drives. Overall, the enterprise drives had 17 (4.6%) failures while the consumer drives bricked 613 times (4.2%).
This is a damned small sample on the data center drive side of the equation. The difference between 4.2% and 4.6% annual failure rates on 368 drive-years’ worth of service is only 1.5 spindles. So if only two more enterprise drives had survived, the analysis would have shown data center drives to be more reliable than consumer drives.
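For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the failure counts and drive-years quoted above. The function names are mine, and the interval is a rough normal approximation that treats the failure count as Poisson; that is my assumption, not something Backblaze published, so treat it as a back-of-the-envelope illustration rather than a proper statistical analysis.

```python
import math

def annual_failure_rate(failures, drive_years):
    """Failures per drive-year of service, returned as a fraction."""
    return failures / drive_years

def rough_interval(failures, drive_years, z=1.96):
    """Rough 95% interval on the rate.

    Assumption: failures are treated as a Poisson count and the
    interval uses a simple normal approximation.
    """
    rate = failures / drive_years
    half_width = z * math.sqrt(failures) / drive_years
    return rate - half_width, rate + half_width

# Figures quoted in the article: (failures, drive-years of service)
ENTERPRISE = (17, 368)
CONSUMER = (613, 14719)

enterprise_afr = annual_failure_rate(*ENTERPRISE)
consumer_afr = annual_failure_rate(*CONSUMER)

# How many enterprise failures would exactly match the consumer rate,
# and how far the observed 17 failures sit above that.
break_even_failures = consumer_afr * ENTERPRISE[1]
gap_in_drives = ENTERPRISE[0] - break_even_failures

print(f"enterprise AFR: {enterprise_afr:.1%}")
print(f"consumer AFR:   {consumer_afr:.1%}")
print(f"gap, in drives: {gap_in_drives:.2f}")

for label, sample in (("enterprise", ENTERPRISE), ("consumer", CONSUMER)):
    low, high = rough_interval(*sample)
    print(f"{label} rough 95% interval: {low:.1%} to {high:.1%}")
```

With unrounded rates the gap works out a little larger than the 1.5 spindles you get from the rounded 4.2% and 4.6% figures, but it is still fewer than two drives. Under the rough interval, the enterprise range is wide enough to contain the consumer rate, which is the small-sample problem in a nutshell.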
Moreover, Backblaze has only run the enterprise drives for two years, compared to the more than four years of mileage on their consumer disks. Beach does acknowledge this fact, but doesn’t see any reason to believe that their enterprise drives will become more reliable in the next three years, or to the end of their warranty period.
So What Hard Drive Should I Buy? Tell me, tell me!!
Looking at big, colorful charts tells me that I should avoid Seagate drives like I avoid email from a Nigerian bureaucrat who’s just looking for a bit of help getting some money out of his country.
But the real story is a lot more nuanced and complicated. This article from Instrumental CEO Henry Newman does a great job of digging into the guts of the Backblaze analysis and pointing out the shortcomings in their approach.
Henry Newman is a bit of an institution in the HPC and storage world. He’s not what I’d call “reserved” when it comes to sharing his opinions – not a guy who pulls his punches. But he also backs up his opinions with facts and solid research, making him one of my go-to sources when I don’t understand something (which is pretty damned often).
In his analysis of the analysis, Henry points out that Backblaze’s Seagate results are hugely skewed by two drive models: the 1.5TB Barracuda and Barracuda Green SKUs. Seagate publicly disclosed problems with this drive family back in 2008, so it’s not surprising that these drives have – well – problems, right?
There are also some issues with exactly how they’re evaluating the drives, how much traffic a consumer drive should be expected to handle, and things along those lines. It’s all pretty interesting stuff, and points to the need for more rigorous research and testing when it comes to drives and reliability.
One final point: at the end of the “What Hard Drive Should I Buy” post, Beach discussed what Backblaze is buying today. Right now, their most favored drive is the – wait for it – Seagate 4TB Barracuda, even though it supposedly is less reliable than the WD or Hitachi drives. Huh?
Brian Wilson, CTO and founder of Backblaze (and front man for the Beach Boys) explains it this way:
“Double the reliability is only worth 1/10th of 1 percent cost increase. I posted this in a different forum:
Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary – maybe $5,000 total? The 30,000 drives costs you $4 million.
The $5k/$4million means the Hitachis are worth 1/10th of 1 percent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can.”
So the value of higher reliability – in their unique situation – isn’t nearly as much as one might think. Using Brian’s analysis above, this means that a drive that offered double the reliability of the 4TB Seagates (which currently cost around $160) is only worth an additional $0.16 (yeah, sixteen cents) to Backblaze.
A quick check of 4TB spindles from Western Digital and Hitachi (now owned by WD) reveals that retail prices of these are roughly $30 – $50 more than the Seagate alternative.
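Brian's back-of-the-envelope math is easy to reproduce. The sketch below plugs in only the numbers from his quote plus the roughly $160 Seagate price mentioned above; the variable and function names are mine, and the $5,000 labor figure is his own "maybe" estimate, not a measured cost.

```python
# Figures taken from Brian Wilson's quote
DRIVES = 30_000
MINUTES_PER_SWAP = 15
FLEET_COST_DOLLARS = 4_000_000
LABOR_SAVED_DOLLARS = 5_000      # his "maybe $5,000" for two weeks of salary

# Approximate street price of the 4TB Seagate, per the article
SEAGATE_4TB_PRICE = 160

def swap_hours(drives, failure_percent, minutes_per_swap=MINUTES_PER_SWAP):
    """Hours of labor spent replacing failed drives."""
    failed_drives = drives * failure_percent / 100
    return failed_drives * minutes_per_swap / 60

hours_at_2_percent = swap_hours(DRIVES, 2)
hours_at_1_percent = swap_hours(DRIVES, 1)

# Value of halving the failure rate, as a fraction of fleet cost
premium_fraction = LABOR_SAVED_DOLLARS / FLEET_COST_DOLLARS

# Wilson rounds that fraction down to "1/10th of 1 percent"
rounded_premium_per_drive = SEAGATE_4TB_PRICE * 0.001
unrounded_premium_per_drive = SEAGATE_4TB_PRICE * premium_fraction

print(f"swap labor at 2%: {hours_at_2_percent:.0f} hours")
print(f"swap labor at 1%: {hours_at_1_percent:.0f} hours")
print(f"premium as share of fleet cost: {premium_fraction:.3%}")
print(f"premium per drive (rounded share):   ${rounded_premium_per_drive:.2f}")
print(f"premium per drive (unrounded share): ${unrounded_premium_per_drive:.2f}")
```

Whether you use his rounded 1/10th of 1 percent or the unrounded share, the premium comes to pennies per drive, nowhere near the $30 to $50 retail gap.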
So what can we learn from all of this? You need to carefully evaluate your information sources. While it’s easy to say “Do your own testing,” it’s just not practical in most cases. You’re going to have to rely on third party sources of information to some extent.
When looking at user experiences, reviews, case studies, etc., you have to factor in how they’re using the product and how well that lines up with your unique needs and requirements. And remember, as always: past performance doesn’t necessarily dictate future results. And your mileage will vary. Caveat emptor, y’all…