[Translation][News] Everything You Know About Disks Is Wrong!


#1

Post by skyx » 2008-02-29 9:08

Everything You Know About Disks Is Wrong


http://storagemojo.com/?p=383


"The Google engineers just published a paper on Failure Trends in a Large Disk Drive Population. Based on a study of 100,000 disk drives over 5 years they find some interesting stuff. To quote from the abstract: 'Our analysis identifies several parameters from the drive's self monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.'"



Google engineers edpin, wolf, and luiz studied 100,000 hard drives over five years and published a paper on what they found. The paper is attached.



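Their conclusions:

For an individual drive, the status parameters SMART reports are not enough to predict failure: several of them correlate strongly with failures across the whole population, yet SMART alone cannot reliably predict or report the health of any one drive.
Even more surprising:
Drive temperature and activity levels have far less to do with drive failure than previously believed.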
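
The gap between "correlates with failure" and "predicts failure for one drive" is easy to see with a bit of arithmetic. Below is a minimal Python sketch. The function name `signal_quality` and every count in it are made up for illustration; none of the numbers come from the Google paper.

```python
def signal_quality(total, failed, flagged, flagged_and_failed):
    """Summarize how well a binary warning flag tracks drive failures."""
    unflagged = total - flagged
    unflagged_and_failed = failed - flagged_and_failed
    p_fail_flagged = flagged_and_failed / flagged        # failure rate among flagged drives
    p_fail_unflagged = unflagged_and_failed / unflagged  # failure rate among the rest
    return {
        "relative_risk": p_fail_flagged / p_fail_unflagged,
        "precision": p_fail_flagged,               # share of flagged drives that fail
        "recall": flagged_and_failed / failed,     # share of failures that were flagged
    }


# Hypothetical fleet: these counts are invented, not measured.
stats = signal_quality(
    total=100_000, failed=5_000, flagged=4_000, flagged_and_failed=2_000
)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, a flagged drive is many times more likely to fail than an unflagged one, yet the flag misses most of the failures and is wrong about half of the drives it flags. That is the sense in which a signal can correlate strongly and still be a poor per-drive predictor.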


A report from another research group reached similar conclusions:
http://www.usenix.org/events/fast07/tec ... index.html




Taken together, the two reports show:

* Expensive 'enterprise' drives don't have notably better reliability than their 'consumer' counterparts (consider this conclusion in the context of my past recommendation of Western Digital 10,000 RPM Raptor SATA HDDs as a credible alternative to other manufacturers' much more costly SAS drives)
* S.M.A.R.T. error reporting only encompasses a fraction of all experience HDD failure mechanisms, and, specifically to this writeup's theme,
* RAID 1 and 5 are less robust than might appear to be the case at first glance...particularly when (as in my case...ahem) all of the drives in the RAID array come from the same manufacturer, and especially when they come from the same manufacturing lot. If one drive fails, the likelihood that a second drive will fail shortly thereafter is uncomfortably...likely.

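In translation:

* Compared with consumer-grade drives, expensive enterprise-grade drives have not shown better reliability. Given that, the Western Digital 10,000 RPM Raptor consumer SATA drive I once recommended can naturally stand in for the supposedly more reliable, costly enterprise SAS (Serial-Attached SCSI) drives.
* S.M.A.R.T. error reporting covers only a small fraction of drive failure mechanisms, particularly in the situation I describe.
* RAID 1 and 5 look all the more fragile, especially when all the drives come from the same manufacturer, and even more so when they come from the same batch: if one drive fails, a second may well fail shortly afterwards, which is unsettling.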




If you know a bit about hard drives, you probably believe all of the following:

* Costly FC and SCSI drives are more reliable than cheap SATA drives.
* RAID 5 is safe because the odds of two drives failing in the same RAID set are so low.
* After infant mortality, drives are highly reliable until they reach the end of their useful life.
* Vendor MTBF are a useful yardstick for comparing drives.


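The disk knowledge we take for granted:

* Expensive Fibre Channel or SCSI drives are more reliable than cheap SATA drives.
* RAID 5 is safe, because the odds of two drives in the same RAID set failing at the same time are so low.
* Once past the initial high-failure period, drives are highly reliable until they reach the end of their useful life.
* The MTBF (mean time between failures) quoted by the vendor is a useful yardstick for comparing drive reliability.

In reality, every one of these beliefs is wrong!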
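
To see what a vendor MTBF figure would imply if it were taken at face value, it can be converted to an annualized failure rate (AFR). The sketch below is only the textbook conversion under a constant-hazard (exponential) model, which is exactly the kind of assumption the list above calls into question. The 1,000,000-hour MTBF and the fleet size are example values, not figures from either paper.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours * 365 days


def annualized_failure_rate(mtbf_hours: float) -> float:
    """AFR implied by an MTBF, assuming a constant failure rate over time."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(n_drives: int, mtbf_hours: float) -> float:
    """Expected number of failed drives per year in a fleet of n_drives."""
    return n_drives * annualized_failure_rate(mtbf_hours)


# Example values only: a datasheet-style MTBF and a hypothetical fleet size.
afr = annualized_failure_rate(1_000_000)
print(f"implied AFR: {afr:.4%}")
print(f"expected failures per 1000 drives: {expected_failures_per_year(1000, 1_000_000):.1f}")
```

Under this model an MTBF of 1,000,000 hours corresponds to an AFR a little under 1%. Comparing that implied figure with the replacement rate actually observed in a fleet is one way to judge how useful the datasheet number is.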
Attachment
disk_failures.zip
The startling paper by Google engineers edpin, wolf, and luiz, who studied 100,000 hard drives over five years
(211.23 KiB) Downloaded 1610 times
Last edited by skyx on 2008-05-28 9:11; edited 2 times in total.
no security measure is worth anything if an attacker has physical access to the machine

#2

Post by xiooli » 2008-02-29 9:14

So how do we stop perpetuating these mistakes?

#3

Post by windwiny » 2008-02-29 9:15

Could someone translate this?

#4

Post by eexpress » 2008-02-29 9:25

The article itself is sensationalist, much like a clickbait headline.
● 鸣学

#5

Post by eexpress » 2008-02-29 9:30

skyx wrote: I've translated all the key points.

#6

Post by skyx » 2008-02-29 9:46

eexpress wrote:
skyx wrote: I've translated all the key points.

* Expensive 'enterprise' drives don't have notably better reliability than their 'consumer' counterparts (consider this conclusion in the context of my past recommendation of Western Digital 10,000 RPM Raptor SATA HDDs as a credible alternative to other manufacturers' much more costly SAS drives)
* S.M.A.R.T. error reporting only encompasses a fraction of all experience HDD failure mechanisms, and, specifically to this writeup's theme,
* RAID 1 and 5 are less robust than might appear to be the case at first glance...particularly when (as in my case...ahem) all of the drives in the RAID array come from the same manufacturer, and especially when they come from the same manufacturing lot. If one drive fails, the likelihood that a second drive will fail shortly thereafter is uncomfortably...likely.

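In translation:

* Compared with consumer-grade drives, expensive enterprise-grade drives have not shown better reliability. Given that, the Western Digital 10,000 RPM Raptor consumer SATA drive I once recommended can naturally stand in for the supposedly more reliable, costly enterprise SAS (Serial-Attached SCSI) drives.
* S.M.A.R.T. error reporting covers only a small fraction of drive failure mechanisms, particularly in the situation I describe.
* RAID 1 and 5 look all the more fragile, especially when all the drives come from the same manufacturer, and even more so when they come from the same batch: if one drive fails, a second may well fail shortly afterwards, which is unsettling.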

#7

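Post by eexpress » 2008-02-29 9:50

I've worked with industrial PCs. The only difference is that the components meet an industrial-grade spec; the design itself is no more advanced.
So an article like this is wasted on people who don't know the subject, and people who do know it don't need to be told.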

#8

Post by skyx » 2008-02-29 10:02

* Costly FC and SCSI drives are more reliable than cheap SATA drives.
* RAID 5 is safe because the odds of two drives failing in the same RAID set are so low.
* After infant mortality, drives are highly reliable until they reach the end of their useful life.
* Vendor MTBF are a useful yardstick for comparing drives.

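The disk knowledge we take for granted:

* Expensive Fibre Channel or SCSI drives are more reliable than cheap SATA drives.
* RAID 5 is safe, because the odds of two drives in the same RAID set failing at the same time are so low.
* Once past the initial high-failure period, drives are highly reliable until they reach the end of their useful life.
* The MTBF (mean time between failures) quoted by the vendor is a useful yardstick for comparing drive reliability.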

#9

Post by skyx » 2008-02-29 10:06

windwiny wrote: Could someone translate this?

That's the translation finished.

#10

Post by skyx » 2008-02-29 17:49

eexpress wrote:
skyx wrote: I've translated all the key points.
The first post has been updated; everything is translated now.

#11

Post by hcym » 2008-02-29 17:56

Damn.

"The status parameters SMART reports are not enough to predict an individual drive's failure"

Shameful.

True.

Confirmed.

#12

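Post by adagio » 2008-02-29 20:40

"Drive temperature and activity levels have far less to do with drive failure than previously believed."
I feel the same way. I've always thought that buying a hard drive is basically a lottery.
Go by brand? IBM is about as big as they come, right? Yet its glass-platter drives ended up cursed by everyone, and the business was handed off to the Japanese. My company still has two of them lying around. I'm really glad I didn't buy one back then.
The Quantum Fireball I bought in '98 died within six months. The Seagate I bought in '05 runs BitTorrent every day, sometimes BitTorrent plus eMule, and it is still spinning away happily.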

#13

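Post by skyx » 2008-02-29 22:00

A while back a bunch of vested interests were furiously attacking Ubuntu over the load cycle issue, and I had been looking for solid evidence to rebut them. Little did I know that the great Google engineers had published a paper on the subject a year earlier. Damn.

Thanks to a certain assistant moderator for marking this thread as featured.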

#14

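Post by majia1hao » 2008-02-29 22:03

I'm no expert, and what I know about hard drives is pitifully little. I've heard of SMART but don't understand it. All I know is that a hard drive is for storing things, that it works by magnetic storage, and that drives are typically 3.5-inch at 7200 RPM, with some 2.5-inch ones at 5400 RPM. Is all of that wrong too?

It isn't easy for ordinary folks like us to learn anything, and then an expert opens his mouth and it's all wrong! So there are quack experts abroad too! A headline this sensationalist is enough on its own to make me doubt the rigor of the content.

I can't rebut the rest, but I do know a little about RAID, and I can't see how RAID is supposedly more fragile.
"RAID 5 is safe, because the odds of two drives in the same RAID set failing at the same time are so low."
Right, that is the principle and design philosophy behind RAID.
But the foreign "experts" say:
"All the drives come from the same manufacturer, and even more so when they come from the same batch: if one drive fails, a second may well fail shortly afterwards, which is unsettling."
Is that the same thing? If you don't swap in a replacement after the first drive fails, can you still call it RAID 5? Is that how RAID is meant to be used? Ignore a fault and keep soldiering on wounded, and sooner or later the array will die. Nothing can withstand being used that way.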
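
Even when the failed drive is replaced promptly, a RAID 5 array stays exposed until the rebuild finishes. A rough way to size that exposure is sketched below in Python. It assumes independent drives with a constant failure rate; `hazard_multiplier` is a hypothetical knob standing in for same-batch correlation, and the drive count, rebuild time, and AFR are example values rather than measurements from either paper.

```python
import math

HOURS_PER_YEAR = 8760


def p_second_failure(n_remaining, rebuild_hours, afr, hazard_multiplier=1.0):
    """Probability that at least one surviving drive fails before the rebuild ends.

    Assumes independent drives with a constant hazard derived from the
    annualized failure rate `afr`. `hazard_multiplier` is a hypothetical
    factor for elevated risk when drives share a manufacturing batch.
    """
    hourly_hazard = -math.log(1.0 - afr) / HOURS_PER_YEAR * hazard_multiplier
    return 1.0 - math.exp(-hourly_hazard * n_remaining * rebuild_hours)


# Example values only: 7 surviving drives, a 24-hour rebuild, 3% AFR.
independent = p_second_failure(7, 24, 0.03)
same_batch = p_second_failure(7, 24, 0.03, hazard_multiplier=10)
print(f"independent drives:        {independent:.5f}")
print(f"10x hazard (hypothetical): {same_batch:.5f}")
```

The point of the sketch is the sensitivity rather than the absolute numbers: the low odds that make RAID 5 look safe depend on the independence assumption, and any correlation between drives scales the rebuild-window risk accordingly.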

#15

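Post by skyx » 2008-02-29 22:11

majia1hao wrote:
"RAID 5 is safe, because the odds of two drives in the same RAID set failing at the same time are so low."
Right, that is the principle and design philosophy behind RAID.
But the foreign "experts" say:
"All the drives come from the same manufacturer, and even more so when they come from the same batch: if one drive fails, a second may well fail shortly afterwards, which is unsettling."
Is that the same thing? If you don't swap in a replacement after the first drive fails, can you still call it RAID 5? Is that how RAID is meant to be used? Ignore a fault and keep soldiering on wounded, and sooner or later the array will die. Nothing can withstand being used that way.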
less robust
will fail shortly thereafter is uncomfortably...likely


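Sorry, the sentences I translated carry some of my own subjective feeling.

My English is poor and my Chinese is worse. Wherever there is ambiguity, defer to the English original, and above all to the two English papers.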