24 February 2006

Unix File System

(This is a subheading of Unix-3)

The file structure is one of the most basic identifying features of an operating system. Unix organizes files in the familiar tree, with one "root" directory branching into subdirectories, which may branch into further subdirectories to any depth. Unix is, or was, distinguished by the fact that it treats almost everything, including hardware devices, running processes, and network connections, as if it were a file.
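
To make the "everything is a file" idea concrete, here is a minimal sketch in Python that asks the system what kind of object sits at a given path. The function name describe and the sample paths are my own choices for illustration; /dev/null and /etc/passwd are assumed to exist, as they do on most Unix-like systems, and are skipped if they do not.

```python
import os
import stat


def describe(path):
    """Return a short label for the kind of file found at `path`."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    # Sample paths are assumptions; any that are missing are skipped.
    for path in ("/", "/dev/null", "/etc/passwd"):
        if os.path.exists(path):
            print(path, "->", describe(path))
```

The same call works on a directory, a device, and an ordinary file, which is the point: they all live in one tree and answer to the same interface.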

Behind this is a technical format which I shall explain only briefly, because I'm not an expert. One of the early concerns of the Unix developers was getting the system to launch when it was turned on. This was a technical problem likened to "lifting oneself up by one's bootstraps," because the computer had to "know" what to do when it was turned on. One could of course build the complete startup instructions permanently into the hardware, but this would (a) tie up costly storage back when that sort of thing was very scarce, and (b) make it very difficult to improve the OS. The solution was to give the microprocessor the absolute minimum of instructions, and then have those instructions lead to more specific instructions, thence to instructions more specific still, culminating in a stable computing environment ready for use.

In Unix, the strategy was to designate sectors at the start of the disk as boot blocks, which are read first at startup. The boot blocks hold a small UFS reader. Close to them is the superblock, which contains a magic number, a code that identifies the file system as the Unix file system (UFS), plus other basic file system data. After the various hardware components run their self-check, the microprocessor is instructed to run the UFS reader, which can then find and load files from the UFS.
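
As a rough illustration of how a magic number is used, the following Python sketch checks whether a disk image carries one. The offsets and the magic value are assumptions taken from the BSD-derived UFS1 layout (superblock at byte 8192, magic field 1372 bytes into it, value 0x00011954); other UFS variants place things differently, so treat the constants as placeholders. The function names are hypothetical.

```python
import struct

# Assumed constants from the BSD-derived UFS1 layout; other variants differ.
SUPERBLOCK_OFFSET = 8192    # byte offset of the superblock on the disk
MAGIC_FIELD_OFFSET = 1372   # byte offset of the magic field in the superblock
UFS1_MAGIC = 0x00011954


def looks_like_ufs1(image: bytes) -> bool:
    """Return True if `image` has the UFS1 magic number where we expect it."""
    start = SUPERBLOCK_OFFSET + MAGIC_FIELD_OFFSET
    raw = image[start:start + 4]
    if len(raw) < 4:
        return False  # image too short to contain a superblock
    # The byte order depends on the machine that created the file system,
    # so accept either.
    (little,) = struct.unpack("<I", raw)
    (big,) = struct.unpack(">I", raw)
    return UFS1_MAGIC in (little, big)


def make_fake_image() -> bytes:
    """Build an otherwise empty image with the magic number in place."""
    start = SUPERBLOCK_OFFSET + MAGIC_FIELD_OFFSET
    image = bytearray(start + 4)
    struct.pack_into("<I", image, start, UFS1_MAGIC)
    return bytes(image)


if __name__ == "__main__":
    print(looks_like_ufs1(make_fake_image()))
    print(looks_like_ufs1(bytes(16384)))
```

A real boot program does far more than this, but the first step is the same kind of test: read a fixed location and compare it against a known value before trusting anything else on the disk.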

Then, with the UFS readable, the microprocessor launches the secondary boot program. This secondary boot program loads two kernels: a platform-specific kernel (which contains information relevant to that particular microprocessor model) and the Unix kernel. At this point, some important differences emerge in the booting process depending on whether the system is Solaris, HP-UX, Tru64, AIX, IRIX, FreeBSD, OpenBSD, Mac OS X, or Linux. Another crucial distinction is whether the system starts in single-user or multi-user mode; if the former, then user login is disabled and the root password is required to access the shell.

A standard concept in file systems is the idea of the hard disk as a stack of platters (see figure, right). In the figure, the concentric bands on the surface of each disk are called tracks, each of which is divided into 8 sectors. A disk cylinder is the set of matching tracks on all of the platters. The mechanical significance is that each platter has its own read-write head, and each head is attached to a thin armature. The armatures are arranged like the teeth of a comb and cannot move independently of each other, so at any moment all of the read-write heads are positioned over tracks in the same cylinder. That's why the cylinder is such an important concept.
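
The cylinder, head, and sector together act as an address, and the arithmetic for turning that address into a single running sector number can be sketched in a few lines of Python. The function name chs_to_lba and the four-head disk in the example are my own assumptions; the 8 sectors per track follow the figure. By the usual convention, cylinders and heads are counted from 0 and sectors from 1.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track=8):
    """Convert a cylinder/head/sector address to a linear sector number."""
    if cylinder < 0:
        raise ValueError("cylinder must not be negative")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    # All tracks of one cylinder are numbered before moving to the next
    # cylinder, because reaching them needs no arm movement.
    tracks_before = cylinder * heads_per_cylinder + head
    return tracks_before * sectors_per_track + (sector - 1)


if __name__ == "__main__":
    # A hypothetical disk with 4 heads and 8 sectors per track.
    print(chs_to_lba(0, 0, 1, heads_per_cylinder=4))
    print(chs_to_lba(1, 0, 1, heads_per_cylinder=4))
```

Numbering sectors this way keeps consecutive data within one cylinder for as long as possible, which is exactly why the comb-like arm makes the cylinder the natural unit.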

Several adjacent disk cylinders are known as a cylinder group. A UFS volume is divided into several cylinder groups, each with a copy of the superblock, a set of inodes (records holding information about each file, such as its size, owner, and location on disk), and data blocks containing the actual contents of each file.
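
Most of what an inode records can be read back through the stat call. Here is a minimal Python sketch; the function name inode_summary and the choice of fields are mine, and the temporary file exists only for the demonstration.

```python
import os
import tempfile
import time


def inode_summary(path):
    """Collect a few of the fields the file system keeps about `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,            # the inode number
        "size_bytes": st.st_size,      # length of the file's contents
        "links": st.st_nlink,          # how many directory entries point here
        "mode": oct(st.st_mode),       # file type and permission bits
        "modified": time.ctime(st.st_mtime),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example does not depend on any path.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"hello")
    try:
        for field, value in inode_summary(handle.name).items():
            print(field, "=", value)
    finally:
        os.unlink(handle.name)
```

Note that the file's name is not among these fields: names are kept in directories, which map each name to an inode number.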

This was the technical achievement of the UFS, which spread with subtle variations throughout the computing world. Further refinements added levels of abstraction between the mechanical organization of files among the tracks of the disk platters and the UFS itself.

Unix allows a wide variety of file names. On older systems a name could be up to 14 characters long (later versions of UFS raised the limit to 255), and it may contain any character except the null character and the slash (/). In addition, one may include a file extension, although this is not always required. My reference mentions that #, @, ?, $, !, &, *, parentheses, colons and semicolons, pipes (|), quotes, carets (^), <, >, \, and some other punctuation symbols are likely to cause problems if used, since the shell gives many of them a special meaning. UFS distinguishes between upper and lower case letters.
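
The naming rules above can be written down as a small checker. This is a sketch in Python under my own assumptions: the function name check_filename is hypothetical, the length limit is a parameter because it varies between systems, and the list of risky characters is simply the one quoted from my reference.

```python
# Characters my reference flags as likely to cause trouble.
RISKY_CHARS = set("#@?$!&*():;|\"'^<>\\")


def check_filename(name, max_length=14):
    """Return a list of problems with `name`; an empty list means it is fine."""
    problems = []
    if not name:
        problems.append("name is empty")
    if "\0" in name:
        problems.append("contains the null character (never allowed)")
    if "/" in name:
        problems.append("contains '/' (reserved as the path separator)")
    if len(name.encode("utf-8")) > max_length:
        problems.append("longer than %d bytes" % max_length)
    risky = sorted(set(name) & RISKY_CHARS)
    if risky:
        problems.append("risky characters: " + " ".join(risky))
    return problems


if __name__ == "__main__":
    for candidate in ("notes.txt", "what?.txt", "a/b"):
        print(repr(candidate), check_filename(candidate))
```

Only the first three checks are hard rules of the file system. The last two are a historical limit and a matter of convenience, so a stricter or looser checker would be just as defensible.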

C compilers expect source files to have the extension .c, and the troff word processor requires that macros (but not documents!) have extensions .mm or .ms. In some cases, a file may carry two extensions; for example, document.tar.z means that document has been archived with tar and then compressed using pack. The benefit of this is that a single script can read the extensions from right to left, first uncompressing ("unpacking") and then "untarring" the document.
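
How a script might read stacked extensions can be sketched as follows. This is Python, and the table of steps is a hypothetical one of my own covering only the two extensions mentioned here; the step names are descriptive labels, not commands the sketch actually runs.

```python
from pathlib import PurePosixPath

# Hypothetical table: extension -> name of the step that undoes it.
UNDO_STEP = {
    ".z": "unpack",    # reverses compression by pack
    ".tar": "untar",   # extracts the tar archive
}


def undo_steps(filename):
    """List the steps needed to recover the original, outermost layer first."""
    steps = []
    # The last extension was applied last, so it must be undone first.
    for suffix in reversed(PurePosixPath(filename).suffixes):
        step = UNDO_STEP.get(suffix)
        if step is None:
            break  # an unknown extension ends the chain
        steps.append(step)
    return steps


if __name__ == "__main__":
    print(undo_steps("document.tar.z"))
```

Because each tool adds its extension on the right, peeling them off from the right recovers the layers in the correct order without the script needing to inspect the file's contents.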

RESOURCES: Wikipedia, Unix File System, bootstrapping, metadata; "Booting process in Solaris," Adminschoice; "The Structure of Cylinder Groups for UFS File Systems," Sun Microsystems; The Art of Unix Programming, Eric Raymond: esp. "File Attributes and Record Structures," "A Unix File Is Just a Big Bag of Bytes"

ONLINE TUTORIALS: Unix tutorial, Stanford University School of Earth Sciences, USA; "Unix for Web developers," eXtropia tutorials

BOOKS: UNIX: The Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, and Richard Rosinski (Tata McGraw-Hill edition, 2002)
