EXT3 File System
Structural Overview
All file systems include a few basic types of data structures:- bootstrap code to be loaded into memory and executed when the computer is powered on. MVS volumes reserve the entire first track of the first cylinder for the boot strap.
- volume descriptors information describing the size, type, and layout of the file system ... and in particular how to find the other key meta-data descriptors.
- file descriptors information that describes a file (ownership, protection, time of last update, etc.) and points where the actual data is stored on the disk.
- free space descriptors lists of blocks of (currently) unused space that can be allocated to files.
- file name descriptors data structures that associate user-chosen names with each file.
- Divide the volume into fixed sized blocks. The block size (typically somewhere between 512 and 64K bytes) is chosen when the file system is created. Larger block sizes lead to more efficient I/O, but incur higher losses do to internal fragmentation.
- Reserve the first block of each volume for a bootstrap.
- Place a volume descriptor (superblock, described in section 3) in the second block of the volume.
- Use an I-node (described in section 4) to describe each file.
- Use directories (described in section 6) to associate names with files.
Each BSD volume is divided up into a number of cylinder groups (CGs). Each cylinder group is a miniature file system ... with the intent that directory entries in a CG point to I-nodes in the same CG, which point to blocks in the same CG. To the extent this is true, a file can be opened and read with minimal head motion.
Volume Descriptor (Super Block)
All UNIX-derived file systems have a superblock that identifies the file system and defines its key parameters. In the BSD file systems (with cylinder groups) there is actually a superblock at the start of each cylinder group. The additional copies are completely redundant, but are occasionally valuable when attempting to recover data from a severely damaged file system.The primary superblock can be found in the second block (block 1) of the volume. Realize, however, that file systems can support a wide range of bolck sizes ... so the superblock may be 512 bytes into the volume, or it may be 64K bytes into the volume (depending on what the file system block size is). When opening a new file system, the OS may have to probe multiple locations in order to find the superblock and determine the file system's block size.
The superblock contains a few basic types of information:
- file system identification
- file system layout parameters
- key device parameters
- file system tuning parameters
- file system status
3.1 File System Identification information
The superblock includes a magic number which identifies the file system type, and can be used to confirm that we have found the correct block size. There are many variants of the Berkeley file system, and most of them have different magic numbers. In addition to the basic magic number, there is usually also some version number information, indicating what features this particular file system does and does not support.The superblock also includes a file system ID number, which can be used by system managers to uniquely label each file system (and indicate what its contents is).
3.2 File System Layout Parameters
In order to interpret the bits of any UNIX file system, we need to know how the file system is layed out. Because they are divided into cylinder groups, BSD file systems have many more layout parameters than most file systems:- the block size used by this file system (e.g. 512, 8K, 64K)
- the size of a block fragment (discussed in section 4.1).
- total size of the file system, in blocks
- total number of data blocks and I-nodes in the file system
- total number of cylinder groups in the file system.
- the offset into each cylinder group where the key areas (e.g. cylinder group summary, free block list, free I-node list, I-node area, and data blocks) begin.
-
I-nodes 1-100, and blocks 1-3,999 would be in cylinder group 1.
I-nodes 101-200, and blocks 4,000-7,999 would be in cylinder
group 2.
etc.
3.3 Device Parameters
We can greatly improve file access time by allocating consecutive blocks to a file. In order to know what blocks are "consecutive", the operating system must know how long a track is, and how many tracks there are per cylinder. It is also useful to know how fast the disk spins (so we can anticipate and plan for rotational latency). Parameters like these are recorded in the superblock to enable the file system implementation to more efficiently allocate and process files.3.4 File System Tuning Parameters
We are often forced to make trade-offs between how rappidly we can perform various operations, and how efficiently we use space. BSD file systems have per file system tuning paremeters that permit system managers to make different trade-offs on different file systems. A temporary file system, for instance, might be optimized for allocation speed, while a project archival file system might be optimized for space efficiency or contiguous allocation. The superblock includes a collection of tunable parameters that permit system managers to optimize each file system for the way it will be used.3.5 File System Status Information
File system superblocks also contain a small amount of information to reflect how the file system is currently being used (or was last being used).- The name of the mount-point onto which the file system was last mounted.
- The time at which the file system was last updated.
- Whether or not the file system was cleanly unmounted. If the system crashes with a file system still mounted, it will be necessary to audit the file system to make sure that the crash did not result in any corruption.
3.6 Cylinder Group Summaries
In addition to the file system description information in the superblock, there is some additional (per cylinder group) information associated with each cylinder group:- total number of free I-nodes, blocks, and fragments in this cylinder group.
- start-of-free-space guess pointers to optimize free-space searches.
4. File Descriptors (I-nodes)
Each file in a UNIX file system is described by an I-node, and any file is uniquely identified by a (file system ID, I-node number) pair. All I-nodes are the same length (typically 128 bytes). As described previously, the division of a file system into cylinder groups makes the process of finding a particular I-node non-trivial.The UNIX/POSIX file manipulation system calls presume that the same basic administrative information is associated with all files, and consequently, all UNIX-based file systems tend to have exactly the same administrative information in their I-nodes:
- The type of the file. In addition to ordinary files (that contain data stored on disk) an I-node might represent a directory, a device, or an inter-process communications port.
- The protection information for this file. The basic protection information is encoded in nine bits, which denote read, write, and execute access, for the owning user, the owning group, and anyone else.
- The integer user-ID of the user who owns the file.
- The ingeger group-ID of the group who owns the file.
- The time when the file was last accessed.
- The time when the file was last modified.
- The time when the I-node was last updated.
- The length of the file (in bytes).
- The number of currently existing links to this file.
4.1 Fragment Blocks
Larger block sizes improve file I/O throughput by performing a smaller number of larger transfers. Traditionally, this comes at the cost of increased internal fragmentation (since on average, half of the last block is unused). The Berkeley file system came up with an innovative solution to this problem.In Berkeley UFS file systems, blocks can be broken into smaller fragments, which are also of a fixed size, but typically only a fraction (e.g. 1/8) as large as a block. If, after being written to and closed, a file does not fill up its last block, the last block is replaced with a smaller number of contiguous block fragments. This can give us the best of both worlds: e.g. the I/O throughput of 8K blocks, but an average internal fragmentation loss of only 512 bytes per file.
This further complicates the allocation and addressing of blocks. All of the block addresses in a file are actually fragment addresses. All of the pointers except for the last, however, refer to a full block worth of fragments. To figure out how many fragments are in the last block, the system looks at the file size, modulo the block size. This sort of complexity is probably typical of the hoops people jump through to get better throughput and storage efficiency out of a file system.
5. Free Space Descriptors
Each cylinder group contains blocks and I-nodes that can be allocated to files. Because the number of blocks and I-nodes per cylinder group is fixed, the Berkeley file system keeps track of them in a bit-map. A bit-map is a very compact representation for the state of a fixed number of objects. Each bit in the bit-map corresponds to one block (fragment) or I-node in the cylinder group.6. File Naming (directories)
UNIX file systems associate names with I-nodes through links in directories. Each link is described by a directory entry. In the original UNIX file system, directory entries were fixed sized (16 bytes) and pretty simple:UNIX file systems have always supported modified hierarchical directory trees. The directories always exist in a tree (each directory has exactly one parent), but there can be multiple links to each file. By convention the first two entries in each directory were for "." a pointer to the I-node for the current directory, and ".." a pointer to the I-node for the parent directory.
6.1 Long File Names
A fully qualified path name can be very long (if the file is nestled deep in a directory hierarchy), but each element of the file name was limited to be 14 bytes or less. Many people complained about this limitation.When Microsoft confronted this problem, they came up with an auxiliary directory entry scheme that could represent the extended filenames in a relatively upwards-compatible fashion. Berkeley took the opposite approach. They had introduced enough changes to the file system that they decided they were willing to change the format of directories. But, rather than merely changing all the existing UNIX programs to understand the new directory entry format, they created new system calls for walking and searching directories. The new system calls hid the underlying format of the directory entries from the applications, thus making it easier to make other changes in the future. Changing programs not to depend on the details of on-disk file system data structures is a good thing.
The new directory format uses variable length directory entries that contain variable length names. Having a separate length for the directory entry and the file name made it easier to add new information to directory entries in the future.
6.2 Symbolic Links
The other major limitations of traditional UNIX directories involved the I-node number in each directory entry:- The I-node number could only refer to files in the same file system. There was no way to create a directory entry that referred to a file in another file system.
- All hard links are equivalent, and there is no way for the owner of a file to delete the file if other links still exist.
- Since the symbolic link contains a name, rather than an I-node number, the name can include the path to the mount-point of another file system. This means that symbolic links can refer to files on other file systems.
- The symbolic link is not a direct reference to an I-node, but rather the name of another file (which is presumably a real link to some file). If the real hard link is deleted, the symbolic link remains, but no longer refers to an existing file. A symbolic link does not prevent a file from being deleted.
Tidak ada komentar:
Posting Komentar