[Share] Two articles on disk fragmentation, with translations, to put your mind at ease about Linux file systems.

JangMunho
Posts: 1347
Joined: 2006-01-18 12:55
Location: Maybe nobody really understands


#1

Post by JangMunho » 2008-02-14 20:43

The first is a HOW-TO article from Linux.org; some of you have already seen it.

Code:

10.4. Some facts about file systems and fragmentation

Disk space is administered by the operating system in units of blocks and fragments of blocks. In ext2, fragments and blocks have to be of the same size, so we can limit our discussion to blocks.

Files come in any size. They don't end on block boundaries. So with every file a part of the last block of every file is wasted. Assuming that file sizes are random, there is approximately a half block of waste for each file on your disk. Tanenbaum calls this "internal fragmentation" in his book "Operating Systems".

You can guess the number of files on your disk by the number of allocated inodes on a disk. On my disk

# df -i
Filesystem           Inodes   IUsed   IFree  %IUsed Mounted on
/dev/hda3              64256   12234   52022    19%  /
/dev/hda5              96000   43058   52942    45%  /var

there are about 12000 files on / and about 44000 files on /var. At a block size of 1 KB, about 6+22 = 28 MB of disk space are lost in the tail blocks of files. Had I chosen a block size of 4 KB, I would have lost 4 times this space.

Data transfer is faster for large contiguous chunks of data, though. That's why ext2 tries to preallocate space in units of 8 contiguous blocks for growing files. Unused preallocation is released when the file is closed, so no space is wasted.

Noncontiguous placement of blocks in a file is bad for performance, since files are often accessed in a sequential manner. It forces the operating system to split a disk access and the disk to move the head. This is called "external fragmentation" or simply "fragmentation" and is a common problem with MS-DOS file systems. In conjunction with the abysmal buffer cache used by MS-DOS, the effects of file fragmentation on performance are very noticeable. DOS users are accustomed to defragging their disks every few weeks and some have even developed some ritualistic beliefs regarding defragmentation.

None of these habits should be carried over to Linux and ext2. Linux native file systems do not need defragmentation under normal use and this includes any condition with at least 5% of free space on a disk. There is a defragmentation tool for ext2 called defrag, but users are cautioned against casual use. A power outage during such an operation can trash your file system. Since you need to back up your data anyway, simply writing back from your copy will do the job.

The MS-DOS file system is also known to lose large amounts of disk space due to internal fragmentation. For partitions larger than 256 MB, DOS block sizes grow so large that they are no longer useful (This has been corrected to some extent with FAT32). Ext2 does not force you to choose large blocks for large file systems, except for very large file systems in the 0.5 TB range (that's terabytes with 1 TB equaling 1024 GB) and above, where small block sizes become inefficient. So unlike DOS there is no need to split up large disks into multiple partitions to keep block size down.

Use a 1 KB block size if you have many small files. For large partitions, 4 KB blocks are fine.
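10.4 Some facts about file systems and disk fragmentation (translation)
The operating system manages disk space in units of blocks and fragments of blocks. In ext2, fragments and blocks must be the same size, so the discussion below deals only with blocks.

Files come in all sizes, and a file rarely ends exactly on a block boundary, so part of the last block of every file is wasted. If file sizes are random, each file on the disk wastes about half a block on average. Tanenbaum calls this "internal fragmentation" in his book "Operating Systems".

You can estimate the number of files on a disk from the number of allocated inodes.
On my disk: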

Code:

# df -i
Filesystem           Inodes   IUsed   IFree  %IUsed Mounted on
/dev/hda3              64256   12234   52022    19%  /
/dev/hda5              96000   43058   52942    45%  /var
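There are about 12000 files under "/" and about 44000 under "/var". With a block size of 1 KB, roughly 6 + 22 = 28 MB of disk space is lost in the tail blocks of files. Had I chosen a 4 KB block size, I would have lost 4 times that much.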
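
The estimate above is simply half a block per file. A rough Python sketch of that arithmetic, using the rounded file counts from the article (the function name is made up for illustration, and 1 MB is taken as 1000 KB, as the article's round numbers imply):

```python
def tail_waste_kb(file_count, block_size_kb):
    """Expected space lost in tail blocks: about half a block per file."""
    return file_count * block_size_kb / 2


# File counts as rounded in the article, not the exact df -i figures.
file_counts = {"/": 12000, "/var": 44000}

for block_size_kb in (1, 4):
    total_kb = sum(tail_waste_kb(n, block_size_kb) for n in file_counts.values())
    print(f"{block_size_kb} KB blocks: about {total_kb / 1000:.0f} MB lost")

# Prints:
# 1 KB blocks: about 28 MB lost
# 4 KB blocks: about 112 MB lost
```

This only holds if file sizes really are spread evenly relative to the block size; a disk full of tiny files loses more than half a block per file.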

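However, data transfers faster in large contiguous chunks. That is why ext2 tries to preallocate space in units of 8 contiguous blocks for files that are growing. Unused preallocated space is released when the file is closed, so no space is wasted.

Storing a file in noncontiguous blocks hurts performance, because files are usually read sequentially. It forces the operating system to split one disk access into several and the disk to keep moving its head. This is called "external fragmentation", or simply "fragmentation", and it is a common problem with MS-DOS file systems. Combined with the very poor buffer cache in MS-DOS, the effect of fragmentation on performance is hard to miss. DOS users are used to defragmenting every few weeks, and some have even turned it into a ritual.

None of these habits needs to be carried over to Linux and ext2. Native Linux file systems do not need defragmenting under normal use, which covers any situation where at least 5% of the disk is free. ext2 does have a defragmentation tool called defrag, but users are warned not to run it casually: a power failure during the operation can wreck your file system. Since you need to back up your data anyway, simply restoring from the backup achieves the same result.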
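
For anyone who wants to check the "at least 5% free" condition on their own machine, here is a small sketch using Python's standard shutil.disk_usage. The function names and the default threshold argument are my own choices for illustration; only the 5% figure comes from the article.

```python
import shutil


def free_percent(path):
    """Percentage of the file system holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        raise ValueError(f"{path!r} reports a total size of zero")
    return 100.0 * usage.free / usage.total


def is_low_on_space(path, min_free_percent=5.0):
    """True if free space has dropped below the threshold from the article."""
    return free_percent(path) < min_free_percent


if __name__ == "__main__":
    for mount_point in ("/",):  # add your own mount points here
        print(mount_point, f"{free_percent(mount_point):.1f}% free",
              "LOW" if is_low_on_space(mount_point) else "ok")
```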

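MS-DOS file systems are also known to waste a lot of disk space through internal fragmentation. On partitions larger than 256 MB, DOS blocks become so large that they are no longer practical (FAT32 fixes this to some extent). ext2 does not force larger blocks on you as the file system grows, except for very large file systems of around 0.5 TB (512 GB) and above, where small blocks become inefficient. So unlike DOS, there is no need to split a large disk into several partitions just to keep the block size down.

If you have many small files, use a 1 KB block size; for large partitions, 4 KB blocks are fine.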
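
To see what the 1 KB versus 4 KB choice would mean for your own files, rather than relying on the half-block average, you can add up the unused part of each file's last block. This is only a sketch under simplifying assumptions: it looks at file sizes alone and ignores sparse files, metadata overhead and anything file-system specific. The function names are hypothetical.

```python
import os
import stat


def tail_waste_bytes(size, block_size):
    """Unused bytes in the last block of a file of the given size."""
    return (-size) % block_size


def tree_tail_waste(root, block_size):
    """Sum the tail-block waste of all regular files below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable, skip it
            if stat.S_ISREG(st.st_mode):
                total += tail_waste_bytes(st.st_size, block_size)
    return total


if __name__ == "__main__":
    for block_size in (1024, 4096):
        wasted = tree_tail_waste(".", block_size)
        print(f"{block_size} byte blocks: {wasted / 1024:.0f} KB in tail blocks")
```

Run it from a directory with lots of small files and the gap between the two block sizes should be easy to see.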

Also, an excerpt from http://www.itworld.com/Comp/3380/nls_unixfrag040929/index.html

Code:

Fragmentation on Unix

Most modern Unix file systems attempt to keep fragmentation at a minimum, though they do this in various ways. The ufs file systems used by Solaris and nearly all BSD variants of Unix attempt to keep fragmentation to a minimum by writing related data blocks within the same cylinder group. This reduces seek time when the files are accessed. And, while a large block size is used to improve throughput, a smaller unit of storage -- referred to as a fragment -- is used to store portions of files that don't require a full block. This reduces the wasted space within the file system and one variety of fragmentation sometimes referred to as "internal fragmentation".

The ext2 and ext3 file systems most often used on Linux systems also attempt to keep fragmentation at a minimum. These file systems keep all blocks in a file close together. How they do this is by preallocating disk data blocks to regular files before they are actually used. Because of this, when a file increases in size, several adjacent blocks are already reserved, reducing file fragmentation. It is, therefore, seldom necessary to analyze the amount of fragmentation on a Linux system, never mind actually run a defragment command. An exception exists for files that are constantly appended to as the reserved blocks will only last so long.
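Fragmentation on Unix (translation):

Most modern Unix file systems try to keep fragmentation to a minimum, though they go about it in different ways. The ufs file system used by Solaris and nearly all BSD variants does so by writing related data blocks within the same cylinder group, which shortens seek time when files are read. In addition, while a large block size is used to improve throughput, a smaller unit of storage, called a fragment, holds the portions of files that do not need a full block. This reduces wasted space within the file system, and with it the kind of fragmentation sometimes called "internal fragmentation".

The ext2 and ext3 file systems most often used on Linux also try to keep fragmentation to a minimum. They keep all the blocks of a file close together by preallocating data blocks to regular files before the blocks are actually used. As a result, when a file grows, several adjacent blocks are already reserved for it, which reduces fragmentation. It is therefore seldom necessary to analyze fragmentation on a Linux system, let alone actually run a defragmentation command. The exception is files that are constantly appended to, since the reserved blocks only last so long.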
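
To make the preallocation idea concrete, here is a toy simulation. It is not how the real ext2 allocator works (no block groups, nothing is released when a file is closed); it only assumes a disk that hands out blocks in order and several files that take turns growing one block at a time. The 8-block reservation size comes from the first article; everything else is made up for illustration.

```python
def count_extents(block_numbers):
    """Count runs of consecutive block numbers (contiguous pieces)."""
    extents = 0
    previous = None
    for block in block_numbers:
        if previous is None or block != previous + 1:
            extents += 1
        previous = block
    return extents


def simulate(num_files, blocks_per_file, prealloc):
    """Return how many contiguous pieces each file ends up in."""
    next_free = 0                                 # next unallocated block
    blocks = [[] for _ in range(num_files)]       # blocks used by each file
    reserved = [[] for _ in range(num_files)]     # reserved, not yet used
    for _ in range(blocks_per_file):
        for f in range(num_files):                # files grow in turns
            if not reserved[f]:
                # Reservation used up: grab the next `prealloc` blocks.
                reserved[f] = list(range(next_free, next_free + prealloc))
                next_free += prealloc
            blocks[f].append(reserved[f].pop(0))
    return [count_extents(b) for b in blocks]


print(simulate(2, 64, prealloc=1))  # [64, 64]: every block is its own piece
print(simulate(2, 64, prealloc=8))  # [8, 8]: eight pieces of 8 blocks each
```

The second run also shows the exception from the article: a file that keeps being appended to still fragments eventually, just in much larger pieces.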
MacBook Pro 15" User
Cocoa Programmer

#2

Post by JangMunho » 2008-02-15 20:56

Nobody replied at all... I did spend ages translating this, so at least point out anything that reads ambiguously.