How long do disk drives last?
How long do disk drives last? The short answer is: we don’t know yet, but it’s longer than you might guess.
Why does a company that keeps more than 25,000 disk drives spinning all the time not know how long they last? Backblaze has been providing reliable and unlimited online backup for over five years. For the past four years, we’ve had enough drives to provide good statistics, but 78% of the drives we buy are living longer than four years. So while 22% of drives fail in their first four years, and we have detailed information about the failure rates of drives in their first four years, we don’t yet know what will happen beyond that. So how long do drives last? Keep reading.
How Drives Are Used At Backblaze
Backblaze uses lots of hard drives for storing data. 45 drives are mounted in each Backblaze Storage Pod, and the Storage Pods are mounted in racks in our data centers. As new customers sign up, we buy more disk drives, test them, and deploy them. We are up to 75 petabytes of cloud storage now.
Before being deployed, each Backblaze Storage Pod is tested, including tests on all of the drives in it. Recently, Andy posted about Poor Stephen, a disk drive that failed this testing. His post describes the process Backblaze uses to set up, load test, and deploy a Storage Pod.
Types Of Hard Drives In The Analysis
Backblaze has standardized on “consumer-grade” hard drives. While hard drive companies say these drives are not designed to work in RAID arrays or the 24×7 workload of a data center environment, Backblaze uses software redundancy to protect data. In a future blog post we will delve into the statistics comparing “consumer” and “enterprise” hard drives.
By far the majority of these hard drives are “raw” or “internal” hard drives. However, because the Thailand Drive Crisis made it nearly impossible to find internal hard drives for sale at reasonable prices, Backblaze started to farm hard drives. Thus, approximately 6 petabytes of the drives in this analysis were originally “external” hard drives that were “shucked” out of their enclosures.
Number of Hard Drives
The chart below shows the age distribution of the drives in the Backblaze data centers. The shape of the chart is mostly a reflection of the growth of the company, and the addition of drives as the customer base grew. Overall, not that many drives fail.
[Chart: age distribution of the drives in the Backblaze data centers]
Failure Rates
Before diving into the data on failure rates, it’s worth spending a little time clarifying what exactly a failure rate means. At first glance, you might think that a failure rate of 100% is the worst possible. Every drive is failing! That’s not the whole story, though.
Imagine you have a disk drive supplier who provides drives that are 100% reliable for six months, but then all fail at that point. What’s the annual failure rate? If you have to keep 100 drives running at all times, you’ll have to replace the drive in every slot twice a year. That means that you’ll have to replace 200 drives each year, which makes your annual failure rate 200%. So, in theory at least, there is no worst possible failure rate. If every drive failed after one hour of use, the annual failure rate would be 876,000%. Fortunately, the drives that Backblaze gets are more reliable than that.
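The arithmetic above can be checked with a few lines of Python. This is only a sketch under the same idealized assumption as the thought experiment: every drive lasts exactly the same number of hours and is replaced the moment it dies. The function name and the 8,760-hour year (leap years ignored) are illustrative choices, not anything Backblaze published.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_failure_rate_pct(lifetime_hours: float) -> float:
    """Annual failure rate in percent, assuming every drive dies after
    exactly `lifetime_hours` and its slot is refilled immediately."""
    replacements_per_slot_per_year = HOURS_PER_YEAR / lifetime_hours
    return 100.0 * replacements_per_slot_per_year


# Drives that all die at six months: two replacements per slot per year.
print(annual_failure_rate_pct(HOURS_PER_YEAR / 2))  # 200.0

# Drives that all die after one hour of use.
print(annual_failure_rate_pct(1))  # 876000.0
```

Because the rate counts replacements per slot rather than the fraction of drives that ever fail, it has no upper bound, which is the point of the example.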
The Bathtub Curve
Reliability engineers use something called the Bathtub Curve to describe expected failure rates. The idea is that defects come from three factors: (1) factory defects, resulting in “infant mortality”, (2) random failures, and (3) parts that wear out, resulting in failures after much use. The chart below (adapted from Wikimedia Commons) shows how these three factors can be expected to produce a bathtub-shaped failure rate curve.
[Chart: the bathtub curve]
The theory matches the reality that Backblaze experiences. The chart below shows the failure rate of drives in each quarter of their life. For the first 18 months, the failure rate hovers around 5%, then it drops for a while, and then goes up substantially at about the 3-year mark. We are not seeing that much “infant mortality”, but it does look like 3 years is the point where drives start wearing out.
[Chart: failure rate of drives in each quarter of their life]
Calculating Life Expectancy
What’s the life expectancy of a hard disk drive? To answer that question, we first need to decide what we mean by “life expectancy”.
When measuring the life expectancy of people, the usual measure is the average number of years remaining at a given age. So when we say that the life expectancy of newborns in the world in 2010 is 67.2 years, we are saying that if we wait until all of those new people have lived out their lives in 120 or 130 years, the average of their lifespans will be 67.2.
For disk drives, it may be that all of them will wear out before they are 10 years old. Or it may be that some of them last 20 or 30 years. If some of them live a long, long time, it makes it hard to compute the average. Also, a few outliers can throw off the average and make it less useful.
The number that we will be able to compute soon, and the one that is more likely to be useful, is the median lifespan of a new drive. In other words, at what age have half of the drives failed? We are starting to get an idea what the answer will be.
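To see why the median becomes computable before the mean does, here is a small Python sketch. The lifespans are made up for illustration, `None` stands for a drive that is still running, and the function assumes every drive was installed at the same time, so any drive still running has already outlived every failed one. None of this is Backblaze's actual data or method.

```python
import math
from typing import Optional, Sequence


def median_lifespan(lifespans: Sequence[Optional[float]]) -> Optional[float]:
    """Age (in years) by which half of the drives have failed.

    `None` marks a drive that is still running. Returns None when fewer
    than half of the drives have failed, because the median is not yet known.
    """
    failed = sorted(age for age in lifespans if age is not None)
    needed = math.ceil(len(lifespans) / 2)  # failures required to reach the halfway point
    if len(failed) < needed:
        return None
    return failed[needed - 1]


# Hypothetical cohort of seven drives, three of them still running.
print(median_lifespan([0.5, 1.0, 3.2, 3.5, None, None, None]))   # 3.5

# Same cohort one failure earlier: the median cannot be stated yet.
print(median_lifespan([0.5, 1.0, 3.2, None, None, None, None]))  # None
```

The mean, by contrast, cannot be computed until the last drive has died, and a single drive that runs for 30 years would pull it upward.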
Disk Drive Survival Rates
On the internet, it’s surprisingly hard to get an answer to the question “How long will a hard drive last?” What you’ll find are mostly anecdotal stories, or perhaps references to Google’s and CMU’s studies, neither of which really answer the question.
The anecdotes you get don’t give you any useful information:
From tomshardware.com: “Hard drives are mechanical and thus will eventually fail. … I’ve had drives arrive DOA, some die after a day, and some that have lasted 10 years. There is just no way to tell how long a drive will live.”
From CNET: “I don’t know about 5 years. My WD died after 2 years.”
Google’s study has some interesting information on failure rates. They found that temperature doesn’t matter as much as you might think, and that the SMART checks of a drive aren’t very good at predicting drive failure.
CMU’s study found that manufacturer’s MTBF (Mean Time Between Failures) ratings are exaggerated. Drives fail a lot more than the MTBF would indicate.
The chart below shows the percentage of drives at Backblaze that are still alive at different ages:
For the first 1.5 years, drives fail at 5.1% per year.
For the next 1.5 years, drives fail LESS, at about 1.4% per year.
After 3 years though, failure rates skyrocket to 11.8% per year.
[Chart: percentage of drives still alive at different ages, in three phases]
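The three failure rates can be turned into a rough survival curve. The sketch below assumes that each annual rate applies to the drives still alive and compounds smoothly within its phase; the post does not say exactly how the chart was computed, so treat the `PHASES` table and the function as one plausible reading rather than Backblaze's method.

```python
# (phase length in years, annual failure rate); None marks the open-ended last phase.
PHASES = [(1.5, 0.051), (1.5, 0.014), (None, 0.118)]


def survival_fraction(age_years: float) -> float:
    """Fraction of drives still alive at `age_years`, assuming each phase's
    annual failure rate applies to the surviving drives and compounds."""
    remaining = age_years
    alive = 1.0
    for length, rate in PHASES:
        span = remaining if length is None else min(remaining, length)
        alive *= (1.0 - rate) ** span
        remaining -= span
        if remaining <= 0:
            break
    return alive


for age in (1.5, 3.0, 4.0):
    print(f"{age} years: {survival_fraction(age):.1%} still alive")
```

Under these assumptions the four-year figure comes out at roughly 80%, in line with the share of drives the post reports as still running.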
Most Drives Are Still Alive
The chart above could be misleading. At a glance, it appears that most of the drives have already died and all are on track to die within the next year. However, if you redraw the chart with the bottom at 0, you can see that nearly 80% of all the drives Backblaze has ever purchased are still operating!
[Chart: drive survival over four years, with the vertical axis starting at 0]
How Long WILL The Hard Drives Last?
What happens to drives when they’re older than 5 years? Neither Google nor the CMU team presented any data on drives older than 5 years, although the CMU paper has a tantalizing comment in its conclusion claiming that failure rates go up after 5 years. No basis for that assertion is provided, though.
At Backblaze, we’ve been up and running for 5 years, and all of the drives we install are new drives, so we also don’t have any data for drives older than that. We are looking forward to finding out what will happen when drives become 5, 6, 7, and 8 years old.
If you extrapolate the line from the previous chart to estimate the point at which half of the drives have died, you get a prediction:
The median lifespan of a drive will be over 6 years.
[Chart: survival line extrapolated to the point where half of the drives have died]
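The post does not spell out how the line was extrapolated, so here is a hedged Python sketch of two plausible readings. Both start from the 78% of drives alive at four years and the 11.8% late-life failure rate quoted in the post; the function names and the choice between a straight line and a compounding rate are assumptions made for illustration.

```python
import math

ALIVE_AT_4_YEARS = 0.78  # share of drives alive at four years, per the post
LATE_RATE = 0.118        # annual failure rate after the 3-year mark, per the post


def median_age_linear(start_age: float = 4.0,
                      alive: float = ALIVE_AT_4_YEARS,
                      rate: float = LATE_RATE) -> float:
    """Straight-line reading: the survival line keeps dropping by `rate`
    (as a share of all drives ever installed) every year."""
    return start_age + (alive - 0.5) / rate


def median_age_compound(start_age: float = 4.0,
                        alive: float = ALIVE_AT_4_YEARS,
                        rate: float = LATE_RATE) -> float:
    """Compounding reading: each year `rate` of the surviving drives fail."""
    return start_age + math.log(0.5 / alive) / math.log(1.0 - rate)


print(f"linear:   {median_age_linear():.1f} years")
print(f"compound: {median_age_compound():.1f} years")
```

Either way the halfway point lands past the six-year mark, provided the late-life failure rate does not get worse as the drives age further.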
When Backblaze started, there were some concerns that consumer-grade disk drives wouldn’t hold up in a data center. If this 6-year median lifespan is true, it means that more than half the drives will last six years, and those concerns were unfounded. We intend to continue to update these statistics quarterly. Thus, over the next couple of years, we’ll have hard data on the median lifespan of hard drives. Stay tuned to the blog to find out the answers.
Nov 14: Update
My bad: Due to a transcription error, the percentages in the second paragraph were wrong, and were more pessimistic than necessary. 78% (not 74%) of drives are still alive after four years. The projection of a six-year median lifespan is not affected by this change. Thanks to sharp-eyed Frédéric for catching the error. – Brian
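How long can a hard drive last? 80% can run continuously for four years
SSD users all worry about lifespan, while owners of mechanical hard drives generally don't think about it unless a drive actually fails. So how long can a hard drive really last?
Backblaze is an online backup service provider that has been in business for more than five years and now has more than 25,000 hard drives running, which gives it enough data to work out hard drive lifespans statistically.
It should be pointed out that these drives all run 24×7 without a break and are built into RAID arrays, yet they are all ordinary consumer-grade products rather than enterprise-grade or surveillance-grade ones, with only software redundancy protecting the data. The statistics in this article therefore describe drives under continuous operation; for drives in ordinary users' hands the lifespans should be stretched at least threefold (and even that assumes 8 hours of use every day).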
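Before going into the details, a word about failure rates. You might think that a 100% failure rate is the worst possible, but that would be quite wrong. Suppose you have 100 hard drives that are all perfectly reliable and then, after half a year, all die at once. What is the annual failure rate? You would have to replace every drive twice a year, which means 200 new drives, so the annual failure rate is 200%.
What if every drive failed after one hour of use? The annual failure rate would be 876,000%!
[Chart: number of hard drives in use at Backblaze over the years]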
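The Bathtub Curve:
Engineers use this term to describe how a product's expected failure rate changes over time.
Generally speaking, product failures come from three sources: 1. factory defects, which kill the product almost immediately; 2. random failures, which occur at a basically steady rate; 3. wear-out failures of components, which become more likely the longer the product is in use.
Taken together, the three produce a bathtub-shaped curve.
[Chart: the bathtub curve]
Theory and reality agree very well, as Backblaze's quarter-by-quarter hard drive failure rates show:
[Chart: Backblaze hard drive failure rate by quarter]
For the first 18 months (six quarters), the failure rate stays at around 5%. It then drops substantially for about a year, and climbs steeply around the three-year mark, reaching 10-15%.
This suggests that a hard drive in continuous use has a high chance of developing problems after three years.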
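Average life expectancy:
People hear the term "average life expectancy" all the time, but may not be clear about what it actually means. If the average life expectancy of babies born worldwide in 2010 is 67.2 years, then once all of those people have died, roughly a century from now, their average age at death will be 67.2. Of course, some will have died shortly after birth, and some will have lived to 130.
The same goes for hard drives.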
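Hard drive survival rates:
From the statistics on its own drives, Backblaze found:
- In the first year and a half, 5.1% of drives die per year.
- Over the following year and a half, the rate drops to only about 1.4% per year.
- After three years, the failure rate jumps to 11.8% per year.
Subtract these casualties, and the share of the total that remains is the drive survival rate.
[Chart: drive survival rate, with the Y axis starting at 70%]
Taken on its own, the chart above is easy to misread: it looks as if all of the drives will soon be unusable. But note that the Y axis for the survival rate starts at 70%. Redrawn with the axis starting at 0, it looks like this:
[Chart: drive survival rate, with the Y axis starting at 0]
In other words, after four years of continuous operation, 80% of the drives are still working normally.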
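So how long can a hard drive really last?
What happens after five years? Data on this is genuinely scarce, and a web search turns up almost nothing useful; Backblaze, too, will have to keep observing. We will only learn more if other similar companies or institutions that have been operating longer are generous enough to share.
But what about a purely theoretical prediction? Extending the curve from the chart above gives this:
[Chart: drive survival curve extrapolated beyond four years]
In other words, if the late-life failure rate stays unchanged, half of the drives will have died after six years. Put another way, half will still be working, or your drive has a 50% chance of running continuously for six years.
Does that conclusion put your mind at ease?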
source with graph @
http://blog.backblaze.com/2013/11/12/how-l...sk-drives-last/
This post has been edited by sotong168: Nov 15 2013, 01:01 PM