26 February 2006

Unix Shells

(This is a subheading of Unix-3)

Superscripts on Unix commands link to the online man pages.

Because of the immense power and versatility of the Unix operating system, and because of the relatively austere user interface, there is a need for shells that envelop the kernel to various depths. In this metaphor, the OS is like a stem of wheat, and the kernel is the pulpy part inside, while the shell is the hard outer husk. Unix shells are all command line interfaces but nevertheless vary considerably, and it's not unusual for Unix users to use multiple shells concurrently.

All of these standard shells are available online for free, or else come standard with any installation of Unix. Since one of the purposes of a shell is to interpret system commands, shells are sometimes spoken of as if they were programming languages/compilers. Hence, multiple shells exist for about the same reason that different programming languages do.

Here's a listing of the most common shells:

Bourne Shell (sh)man: original, stripped-down shell.

C Shell (csh)man: includes a command history list and job control. Shells related to this one (e.g., tcsh) have syntax that is incompatible with the ksh family.

Korn Shell (ksh)man: builds on the Bourne shell by adding command history list, job control, and history editing. Also known as the POSIX shell because it was the basis of the POSIX shell standard. This partly explains why bash and zsh borrow so much from ksh (tcsh, by contrast, descends from csh).

Bourne Again Shell (bash)man: updated, sh-compatible shell that incorporates features of ksh and csh, with built-in help command. This is the standard shell on Linux (where sh is usually a link to it).

Remote Shell (rsh)man: originally peculiar to BSD Unix, it has expanded to become a family of protocols: rcpman, rexecpman, and rshdman. The remote shell was designed expressly for remote access to online systems, but possessed a tragic flaw: it does not encrypt data.

Secure Shell (ssh)man: developed in 1995; a protocol and set of standards to facilitate private, secure communications between 2 computers over an IP network. Usually used as an alternative to Telnet. The secure shell can be used to secure not only interactive access but also file transfer and virtual private networks (VPNs).

TENEX C Shell (tcsh)man: a version of csh, with built-in help, named for the TENEX operating system, whose style of command completion it borrowed. Because it is entirely compatible with csh, it could be ported to other Unix variants. It includes spelling correction, programmable word completion, and of course all the features of csh. However, it has been almost entirely replaced by bash.

Z Shell (zsh)man: another shell that builds on ksh, adding features from bash and tcsh. Of all the common Unix shells, quite possibly the most powerful. It is also one of the more recent, having been created in 1990.

Here are a few common features of Unix shells (now standard across OS's):

command history list: a listing of the commands the user has entered during a session. One merely enters the command history. In csh, history remembers only the previous command by default, but one can alter that with the command set history = 8 (or whatever number one likes). Depending on the shell, there are many additional commands. (Many of these will be familiar to DOS users).

history editing: useful for correcting errors. It allows one to edit previous command lines using the editing commands of the vi editor.

job control: Unix will often be running one or more commands in the background. Job control commands allow one to terminate (i.e., to kill) a command that is running; to suspend (i.e., to stop)

ADDITIONAL READING & SOURCES: The superscript man links to the man pages for that item. Korn Shell (page by developer David Korn); "Bourne Shell Overview," from old university course material (please note that sh is quite old and has not changed significantly from 1991); "Introduction to the C Shell," Bill Joy (revised for BSD 4.3); Kimmo Suominen, "Getting Started with SSH"; the Zsh Wiki; the tcsh Wiki; Wikipedia, Bourne Shell, C Shell, Korn Shell, bash, tcsh, zsh, Comparison of computer shells.

ONLINE TUTORIALS: Unix tutorial, Stanford University School of Earth Sciences, USA; "Unix for Web developers," eXtropia tutorials; "UNIX Tutorial for Beginners," University of Surrey, Guildford, UK; "Unix System Administration Independent Learning," esp. "Description of different kinds of shells"—USAIL; "A User's Guide to the Z-Shell," Peter Stephenson (2003)

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


25 February 2006

Unix Utilities

(This is a subheading of Unix-3)

In my post on Unix shells, I mentioned that Unix is an extremely powerful and versatile operating system. The reason why I say this is that Unix has historically been supremely well-endowed with utilities. Mainly this is because all of the various flavors of Unix are used primarily for software development, and hence have especially sensitive files, directories, or data streams to manage.

Many of these are programs listed under the Tools and Utilities category of important Unix shell commands. A large number of these utilities are search tools, such as grep, look, and find. There are the shell utilities, several of which are interpreted programming languages: awk, sed, echo, dc, bc. The point of an interpreted language is that your script is not compiled and run directly on the processor; the interpreter is a program that takes your script as input and carries out its instructions one at a time. When you have an error, the interpreter can usually stop and report where the error is, rather than crashing.

Sed is used for editing text files. Basically it works like this: you invoke sed on the command line, followed by any number of sed commands, such as s (substitute), with the command's parameters (here, "address" is replaced with "10 Downing Street") and the location of the input (e.g., the file name).
           $ sed 's/address/10 Downing Street/' <old >new

(Quoting the sed command is good practice in general; here the quotes are required, because the replacement text contains spaces.) Sed allows you to use regular expressions and to chain multiple commands with -e:
           $ sed -e 's/c/C/' -e 's/y/Y/' <old >new
It's interesting to note that sed is regarded as a utility, and yet it (like all the other shell utilities) is used entirely within the shell where one invokes it. Of course, to a Unix user this is a trivial remark. But to someone accustomed to programs being invoked, and operating in their own shell (or "window"), this is a fairly novel concept. Moreover, sed is likely to be invoked by a script, so that programs one writes seamlessly use sed.
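
As a rough sketch of the two kinds of sed invocation described above, the following pipes a line of made-up text into sed instead of reading it from a file. The sample sentence and the variable names are my own, chosen only for illustration.

```shell
# Sample input (hypothetical text, chosen only for illustration).
line='send mail to my address today'

# One substitution: replace the first "address" on each line.
one=$(printf '%s\n' "$line" | sed 's/address/10 Downing Street/')

# Two substitutions chained with -e; each replaces the first match per line.
two=$(printf '%s\n' "$line" | sed -e 's/m/M/' -e 's/y/Y/')

printf '%s\n%s\n' "$one" "$two"
```

Note that s replaces only the first match on each line unless one appends the g flag, which is why only one "m" is capitalized in the second result.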

Echo is a very simple utility; it writes its arguments to the standard output. The trivial example is
           $ echo good morning
           good morning
Understandably, this might seem a bit silly. But what echo can also do is return the values of parameters:
           $ echo $PATH
This is useful in scripts when echo is combined with "escape sequences" (operands) which modify the standard output (e.g., inserting tabs or backspaces). Echo scripts are used for interactive modules in programs, in which the program asks users for the data required to do its job. Also, instructors frequently use echo combined with some other command to demonstrate what the other command does (example: the author linked types "echo "abc 123" | sed [...]" and the text comes back showing what sed did to "abc 123").
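
Here is a small sketch of both uses of echo mentioned above. The sed expression is my own illustrative choice, not the one from the linked example, and the variable name result is likewise an assumption.

```shell
# echo prints its arguments; the shell expands $HOME before echo ever runs.
echo "Home directory: $HOME"

# Piping echo into another command shows what that command does to the text.
# The substitution below is a made-up example.
result=$(echo "abc 123" | sed 's/abc/xyz/')
echo "$result"
```

How echo treats escape sequences such as \t differs from shell to shell, so scripts that need them portably often use printf instead.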

The utilities dc and bc are both calculators; dc stands for "desk calculator," bc for "basic calculator." The big difference between the two is notation: bc uses conventional infix notation (e.g., "1+2" returns 3), while dc uses Reverse Polish Notation (e.g., "1 2 + p" returns 3). The advantage of RPN is that it makes it easier to enter complex formulae without resorting to parentheses; you know the order in which the processor will perform math operations.
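
To make the difference in notation concrete, this sketch evaluates (3 + 4) * 2 in both calculators. The formula is my own example, and the command -v checks are only there because not every system has dc or bc installed.

```shell
# Evaluate (3 + 4) * 2 in both calculators.
# The command -v checks skip a calculator that is not installed.
if command -v dc >/dev/null 2>&1; then
    rpn=$(echo '3 4 + 2 * p' | dc)      # RPN: no parentheses needed
else
    rpn='dc not installed'
fi

if command -v bc >/dev/null 2>&1; then
    infix=$(echo '(3 + 4) * 2' | bc)    # infix: parentheses force the order
else
    infix='bc not installed'
fi

echo "dc: $rpn"
echo "bc: $infix"
```

In the dc version the order of operations is simply the order in which the operators are typed; the trailing p prints the top of the stack.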

While this list is woefully incomplete, I am reminded that no list would be complete without mentioning make. This is a tool that keeps track of dependencies among the constituent files of a program, so that a lot of the routine steps of program development are automated. First, the programmer must create a makefile that records all of the dependencies of the program's files: sources, object modules, assembler files, and so on. When the make command is issued, only the files whose dependencies have changed since the last build are recompiled. This utility is very flexible, so that (for example) one can define many potential files as SOURCES.
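
The dependency idea can be sketched without a compiler. In the toy example below (file names and the makefile rule are hypothetical), all.txt depends on two text files, and make is asked whether it is up to date before and after one of them is touched.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello\n' > part1.txt
printf 'world\n' > part2.txt

# Hypothetical makefile: all.txt depends on the two parts.
# Recipe lines must begin with a tab, written here as \t.
printf 'all.txt: part1.txt part2.txt\n\tcat part1.txt part2.txt > all.txt\n' > Makefile

if command -v make >/dev/null 2>&1; then
    make -s                       # first run builds all.txt
    # -q builds nothing; its exit status says whether the target is up to date.
    if make -q; then fresh=yes; else fresh=no; fi
    sleep 1                       # make sure the next timestamp is later
    touch part1.txt               # pretend a source file was edited
    if make -q; then stale=no; else stale=yes; fi
    echo "up to date after build: $fresh; out of date after edit: $stale"
else
    echo "make is not installed"
fi
```

The same mechanism is what lets make recompile only the object files whose sources have changed.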

Further automation resulted in makedepend, which generates the dependency lists for makefiles.

RESOURCES: The Art of Unix Programming, Eric Raymond; esp. "make: Automating Your Recipes"; "Sed - An Introduction and Tutorial," Bruce Barnett (2007); "Using Unix 'desk calculator'"; "bc - The shell maestro’s calculator," Pandu Rao (2007) & "GNU bc," CyrekSoft; "Unix Manual Page for awk," "bc," "dc," "look," "sed," "ufsrestore," "vi," & "yacc"; "UNIX Shell Scripting Universe," Richard Reepe.

ONLINE TUTORIALS: Unix tutorial, Stanford University School of Earth Sciences, USA; "Unix for Web developers," eXtropia tutorials; "UNIX Tutorial for Beginners," University of Surrey, Guildford, UK; "Unix System Administration Independent Learning," USAIL;

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


24 February 2006

Unix File System

(This is a subheading of Unix-3)

The file structure is a very basic identifying feature of the operating system. Files are organized in the familiar system of a tree, with one "root" directory branching into many subdirectories, ad infinitum. Unix is, or was, distinguished by the fact that it treats everything, including hardware, running processes, or network connections, as if they were a file.
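
A quick way to see the "everything is a file" idea is to treat a device exactly like an ordinary file. The sketch below uses /dev/null, which exists on any Unix-like system; the variable names are my own.

```shell
# /dev/null is a device, yet it is opened and written like any file.
echo "discarded" > /dev/null

# Reading it yields end-of-file immediately, so the byte count is zero.
bytes=$(wc -c < /dev/null | tr -d ' ')

# test -c reports whether a path is a character-device file.
if [ -c /dev/null ]; then
    kind="character device"
else
    kind="something else"
fi

echo "$bytes bytes read; /dev/null is a $kind"
```

The same redirection operators work whether the target is a disk file, a terminal, or a device like this one.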

Behind this is a technical format which I shall explain briefly because I'm not an expert. One of the early concerns of the Unix developers was getting the system to launch when it was turned on. This was a technical problem likened to "lifting oneself up by one's bootstraps," because the computer had to "know" what to do when it was turned on. One could of course embed instructions on this matter in the microprocessor, but this would (a) tie up costly CPU space back when that sort of thing was very scarce, and (b) make it very difficult to improve the OS. The solution was in fact to give the microprocessor the absolute minimum instructions, and then have those instructions lead to more specific instructions, thence to instructions more specific still, culminating in a stable computing environment ready for use.

In Unix, the strategy was to designate sectors of the disk as boot blocks, which the microprocessor reads first. One of the boot blocks is the superblock, which contains a magic number, or code that communicates that this disk has the Unix file system (UFS), plus the UFS reader and other basic system data. After the various hardware components run their self-check, the microprocessor is instructed to open the UFS reader, which then loads the UFS.

Then, with the UFS enabled, the microprocessor launches the secondary boot program. This secondary boot program loads two kernels, a platform-specific kernel (which contains information relevant to that particular microprocessor model) and the Unix kernel. At this point, some important differences emerge in the booting process depending on if it's Solaris, HP-UX, Tru64, AIX, IRIX, FreeBSD, OpenBSD, Mac OS X, or Linux. Another crucial distinction is whether this is a single-user system or a multi-user system; if the former, then user-login is disabled and a root password is required to access the shell.

A standard concept in file systems is the idea of the hard disk as a stack of platters (see figure, right). In the figure, the concentric bands on the surface of each disk are called tracks, each of which is divided into 8 sectors. The disk cylinder is the set of matching tracks on all of the platters. The mechanical significance is that each platter has its own read-write head, and each head is attached to a thin armature. The armatures are like a comb, and they cannot move independently of each other, so all read-write heads will be reading tracks in the same cylinder. That's why the cylinder is such an important concept.

Several adjacent disk cylinders are known as a cylinder group. The UFS has several cylinder groups, each with copies of the superblock, the inodes (information about the file), and data blocks containing the actual contents of each file.
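
Inodes can be observed from the shell. A file name is just a directory entry pointing at an inode, so two names can share one. The sketch below (scratch file names are hypothetical) makes a hard link and compares the inode numbers reported by ls -i.

```shell
tmp=$(mktemp -d)
printf 'some data\n' > "$tmp/original"

# ln without -s makes a hard link: a second name for the same inode.
ln "$tmp/original" "$tmp/alias"

# ls -i prints the inode number in front of the file name.
inode_a=$(ls -i "$tmp/original" | awk '{ print $1 }')
inode_b=$(ls -i "$tmp/alias" | awk '{ print $1 }')

echo "original: $inode_a, alias: $inode_b"
```

Both names should report the same number, because the data blocks and the file's metadata belong to the inode rather than to either name.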

This was the technical achievement of the UFS that spread with subtle variation throughout the computing world. Further refinement included levels of abstraction between the mechanical organization of files among tracks of the disk platters, and the actual UFS.

Unix allows a wide variety of file names, typically up to 14 characters of any kind except the null character and "/". In addition, one may include a file extension, although this is not always required. My reference mentions that #, @, ?, $, !, &, *, parentheses, colons and semicolons, pipes (|), quotes, carets (^), <, >, \, and some other punctuation symbols are likely to cause problems if used. UFS distinguishes between upper and lower case letters.

Programs written in C require a file extension of .c, and the troff word processor requires that macros (but not documents!) have extensions .mm or .ms. In some cases, applications may require two file extensions; for example, document.tar.z means that document has been archived with tar and compressed using pack. The benefit of this is that a single script can uncompress ("unpack") and then "untar" the document.

RESOURCES: Wikipedia, Unix File System, bootstrapping, metadata; "Booting process in Solaris," Adminschoice; "The Structure of Cylinder Groups for UFS File Systems," Sun Microsystems; The Art of Unix Programming, Eric Raymond: esp. "File Attributes and Record Structures," "A Unix File Is Just a Big Bag of Bytes".

ONLINE TUTORIALS: Unix tutorial, Stanford University School of Earth Sciences, USA; "Unix for Web developers," eXtropia tutorials;

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


23 February 2006

System Commands

(This is a subheading of Unix-3)

The shells exist for system commands (and for some basic coding). While there are differences in the features that the different shells offer users, they all respond to a similar set of system commands. It is my experience that Unix users tend to find ways of constantly alluding to these commands in contexts quite remote from computers, and then being astonished when one doesn't know what they are saying.

Here is a list of some of the more important. I leaned very heavily on UNIX: the Complete Reference (full citation below), but of course that book has 40 pages of commands and extensions, plus the other 1235 pages of instructions on how to use them. Moreover, that book specifies instructions used only in Solaris, instructions not used in Linux, etc. This series of posts has nothing to do with Linux, and I hope I'll get a chance to write about that in the near future.

The boldface command names link to the relevant man pages.

Basic & General:
bash: opens Bourne-Again Shell

bg: resumes suspended job in background

cat (concatenate): entering this command returns the contents of the designated file. It does not return a header, filename, date, or other secondary information. You can also redirect the output to a specified file, which allows one to merge several files into a new one. Cat is typically used in scripts, where it organizes the output of a program.

cd: change directory; same as MS-DOS

cp file1 target: copies file1 into file or directory target

csh: opens C Shell

date: returns current date & time

exit: terminates current user session

fg: resumes suspended job in foreground

find: finds files in path for expression. Technically, the search performed with find is "live"; it walks the directory tree at the time the command is launched, rather than searching a prebuilt index. So find can take a long time on a large directory tree.
-print: prints current pathname during search

-name pattern: finds files matching pattern

-exec command: performs command on files that are found
history: returns previous commands; may be set to return any number of previous commands.

jobs: returns list of jobs running currently

kill pid: terminates process pid permanently; also allows one to send a signal to a process

ksh: opens Korn Shell

ln file1 target: links file1 to target; one may link many files to one target

ls (list): returns directory content listing;

man command: returns manual pages for command

mkdir dir: creates directory dir;

more: displays the contents of files one screenful at a time, depending on the options given:
filenames: returns filenames to be displayed

-c: clears screen and redraws, rather than scroll;

-d: displays errors rather than beeping;

-s: displays multiple blank lines as one blank line;

+linenumber: starts display at linenumber
mv file1 target: move file1 to target; one may move many files to one target

pwd: returns the full path name of the current working directory;

resume %jobid: resumes suspended job jobid;

set: returns value of all variables of the current shell;

sh: opens Bourne Shell

spell file: returns list of incorrectly spelled words in file;
-b: Checks for British spelling
tabs: Sets tabs

zsh: opens Z Shell

who: returns info about users on system
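
A few of the commands above can be tried together in a scratch directory. This is only a sketch; the file names are made up, and the sleep command merely stands in for a long-running job.

```shell
tmp=$(mktemp -d)
printf 'first\n'   > "$tmp/a.txt"
printf 'second\n'  > "$tmp/b.txt"
printf 'ignored\n' > "$tmp/c.log"

# cat: concatenate two files and redirect the result into a third.
cat "$tmp/a.txt" "$tmp/b.txt" > "$tmp/merged.out"

# find: walk the directory tree, match names, run a command on each hit.
found=$(find "$tmp" -name '*.txt' -exec basename {} \; | sort)

# Background job: start one with &, then terminate it by its process ID.
sleep 30 &
pid=$!
kill "$pid"
wait "$pid" 2>/dev/null || status=$?

echo "$found"
echo "background job ended with status ${status:-0}"
```

A job ended by kill reports a non-zero status to wait; on the shells I know of it is 128 plus the signal number.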

Basic Communications:
mail: reads mail sent to user
-user: sends mail to user ID user

-F sysa!user: Forwards mail to user ID user on system sysa

mailx: interactive mail function
-f fname: reads mail from file fname instead of the normal mailbox

-H: Displays the message header summary only

ping host: Sends a request to respond to system host through internet connection.

talk username: Sets up conversation with user username on internet
System Administration:
at t: Directs system to run command at time t; the command follows on the next line.
-f file: specifies file file with multiple commands to run

batch: Allows execution of commands to later time. The command "batch" implies that one doesn't care exactly when the command runs; typically this is understood to be whenever load permits (as on a client-server network).

cron: Begins a daemon process to run routinely scheduled jobs

crontab file: Pulls entries from file into crontab directory
-e: Edits the user's crontab file, creating an empty one if none exists

-l: Displays all of the user's crontab entries

-r: Removes the user's crontab file from the crontab directory

df: Shows number of free disk blocks and files on the system

du: Shows disk block usage of files and directories on the system

fsck: (I was surprised to learn this is a real word!) Runs file system check and repairs errors.

limit: Limits, among other things, file size

passwd name: changes the user's password; extensions allow one to set limits on the duration of the password, etc.

ps: Shows status of running processes
-a: Displays information about the most frequently requested processes

-e: Displays information about all currently running processes

-f: Displays a full listing of all currently running processes

sar: Reports on activities within a system
-o file: sends the report to file in binary format;

-i sec: Sets sampling rate for activities to sec seconds

-A: Reports all levels of process and device activity

shutdown: Shuts down the system (or launches some other system state)
-g grace: Specifies a grace period grace other than 60 seconds before shutdown;

-i state: Specifies new state state that the system is to enter

-y: Runs the shutdown process without any user intervention

tar files: Copies files to and extracts files named files from magnetic tape
-c: Creates new tape

-x: Extracts files from the mounted tape

useradd: Adds a new user login ID to the system
-D: Returns default values for user ID parameters

-d dir: Specifies alternative home directory dir for relevant user

-e date: Specifies date date as termination date for relevant user ID

-g group: Specifies the group ID group for this user

-o: Allows the user ID to be duplicated for other users on the system

-u UID: Specifies the user ID as UID for the relevant user

Text Formatting:
checkdoc file: Checks file for formatting errors

dpost file: Creates PostScript-formatted file from troff output file.

Tools & Utilities:
bc: Performs interactive arithmetic processing and displays results
-l: Allows use of functions contained in the /usr/bin/lib.b library

file: Specifies that interactive mathematical operations are to be performed on values listed in file:

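cmp file1 file2: compares the contents of two files byte for byte; returns nothing if the files are identical, otherwise reports the byte and line number of the first difference

comm file1 file2: compares two sorted files line for line, and reports the lines unique to file1, the lines unique to file2, and the lines common to both;

cut file: Extracts selected columns or fields from each line of file. Despite the name, it is not at all like [control]-X in MS Windows: nothing is removed from the file, and the selection is specified with options (-c for character positions, -f for fields) rather than with a mouse;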

grep: the primary search function. You type in a search string, and grep returns the line with the string it finds. It allows you to search for one target with regular expression elements. Grep may also be used to search for the expression among files in a designated directory. Variants: fgrep (doesn't allow regular expressions, but allows multiple targets), egrep ("extended grep"—takes a richer set of expressions and allows multiple targets), pgrep (searches running processes for the search string). The egrep & fgrep variants have since been mostly rolled into the modern grep.

look "string" file: Searches for occurrence of "string" in file. Returns lines that begin with the string. (Note: grep '^string' file does much the same thing. Redundancy is not a vice in Unixworld.)

paste file1 file2: Combines text of file1 with file2, joining corresponding lines side by side. Not at all like [control]-V in MS Windows; for one thing, paste command does not recall text captured in prior use of cut command

sed ("stream editor"): filters text files. It is analogous to the search-replace feature found in many Windows or Macintosh programs, but can be used from a program itself to modify output.
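
The text tools in this list combine naturally. The sketch below runs grep, cut, comm, paste, and cmp over two small made-up files; the data and variable names are assumptions for illustration only.

```shell
tmp=$(mktemp -d)
printf 'apple:red\nbanana:yellow\ncherry:red\n' > "$tmp/fruit.txt"
printf 'apple\ncherry\ndate\n' > "$tmp/wanted.txt"

# grep: print the lines that match a regular expression.
red=$(grep ':red$' "$tmp/fruit.txt")

# cut: keep only the first colon-delimited field of every line.
cut -d: -f1 "$tmp/fruit.txt" > "$tmp/names.txt"

# comm: compare two sorted files; -12 keeps only the lines common to both.
both=$(comm -12 "$tmp/names.txt" "$tmp/wanted.txt")

# paste: join corresponding lines of two files, here with a comma.
pairs=$(paste -d, "$tmp/names.txt" "$tmp/wanted.txt")

# cmp -s: silent byte-for-byte comparison; the exit status is the answer.
if cmp -s "$tmp/names.txt" "$tmp/wanted.txt"; then same=yes; else same=no; fi

printf '%s\n' "$red" "$both" "$pairs" "identical: $same"
```

Note that comm expects its inputs to be sorted already; both sample files happen to be in alphabetical order.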

My purpose in listing the commands above was to illustrate some of the power embedded in the Unix shell. I have not written about any of the GUIs, such as those of Solaris, X Windows, and Mac OS X, since those represent different topics. It's true that a lot of Unix users now tend to use Unix commands like these through GUI "shells" instead, and this has blurred much of the distinction between Unix and competing development environments. Still, my purpose was to illustrate some of the tools that were developed for Unix, and are now taken for granted by computer users.
daemon: short for user daemon (pron. DAME-on) A user daemon is simply a background process that does useful work for a specific user. Typically daemons are used for operations that do not require execution at a specific time, like permanently deleting older files from a full wastebasket.

file: recall Unix treats everything as a file, including other terminals, other running processes, and so on. So, for example, outputting something to a designated file is a rather powerful instruction.

regular expression (regex): here, an expression using a standardized search syntax; for example, the command % grep '\ban' returns all the lines containing "an" at the beginning of a word (i.e., after a word break, hence the expression \b). Other examples of regex syntax are . and *, which stand for any single character and for zero or more repetitions of the preceding item, respectively.
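
The regex elements in the last entry can be tried on a few sample lines. This is a sketch with made-up input; because \b is an extension rather than part of the POSIX syntax, the word boundary is spelled out here in portable extended-regex form.

```shell
# Sample lines (hypothetical), produced by a small helper function.
words() { printf 'an apple\nbanana\nthe answer\nplain\n'; }

# "an" at the start of a word: at line start, or after a non-word character.
starts=$(words | grep -E '(^|[^[:alnum:]_])an')

# . matches any single character: "b", any one character, then "n".
dot=$(words | grep 'b.n')

# * repeats the preceding item zero or more times: "pl", any run of "a", "i".
star=$(words | grep 'pla*i')

printf '%s\n' "$starts" "$dot" "$star"
```

The "an" inside "banana" is not matched by the first pattern, since it does not begin a word.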

RESOURCES: The Art of Unix Programming, Eric Raymond: esp. "Taxonomy of Unix IPC Methods" and "Application Protocol Design"; Wikipedia, cat, cmp, comm, grep, list of Unix programs, regular expression, sed; "Mac OS X Unix Tutorial | page 2 of 2," Inside Mac Media

ONLINE TUTORIALS: Unix tutorial, Stanford University School of Earth Sciences, USA; "Unix for Web developers," eXtropia tutorials; "UNIX Tutorial for Beginners," University of Surrey, Guildford, UK; "Unix System Administration Independent Learning," USAIL;

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


22 February 2006

Unix Development Environment

(This is a subheading of Unix-3)

In researching this article, I was often struck by the hazy difference between applications and actual components of the Unix environment. Partly, this is because I made the unsatisfactory choice of writing about Unix, rather than (say) a particular subsector of the Unix market like Mac OS X, Linux, FreeBSD, SunOS, or AIX. Each of those subsectors has a pool of applications native to it, and each of these subsectors has a peculiar field of usefulness. Had I been writing about any one of these, I would have had much more information on specific applications and their peculiarities.

Another thing to remember is that people typically use Unix in ways very different from how they might use, say, a Macintosh. Unix is typically used for nonstandard, high-end, specialized purposes. Silicon Graphics' IRIX was developed for exceptionally advanced custom graphics applications; its "applications" could have price tags in the six figures. University students in the computer sciences were likely to be trained in Unix development environments, and develop refinements of their preferred flavor as class projects; later, on their jobs, they might use a flavor of Unix to manage a server or develop applications. Unix development environments can be used to develop applications for non-Unix computers, such as video games. In this sense, therefore, Unix is not strictly analogous to the MS Windows market. MS Windows computers are more like home appliances, nearly always used for the same small number of tasks, and nearly always used by non-programmers for some ultimate non-IT related goal; Unix computers have a complex and ambiguous differentiation between application and operating system, they are highly customizable, and they are usually used for IT-related goals. Often they are servers, and the applications that run on them are intentionally kept out of sight.

New applications may be written as a shell script, or else in language interpreters like awk, perl, and tcl/tk. (Note the lowercase: AWK is a programming language, while awk is a program that interprets the language. The other corresponding languages are Perl and Tcl/Tk, with the first letter capitalized).

AWK (and awk) is an extremely important component of the Unix development environment; it's essentially a structured file query language, which does something when it finds the specified character string. The program syntax allows for extremely terse commands, typically of one line in length. The newer version of this tool is nawk.
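
To give a feel for how terse such one-line programs are, here is a sketch run over some made-up colon-separated records; the field layout (name:colour:count) is an assumption for the example.

```shell
# Hypothetical colon-separated records: name:colour:count
records() { printf 'apple:red:3\nbanana:yellow:5\ncherry:red:7\n'; }

# pattern { action }: for lines whose 2nd field is "red", print the 1st field.
red=$(records | awk -F: '$2 == "red" { print $1 }')

# Accumulate a total across all lines and print it once input ends.
total=$(records | awk -F: '{ sum += $3 } END { print sum }')

printf '%s\n' "$red" "total: $total"
```

Each awk program is a list of pattern and action pairs; a missing pattern means the action applies to every line.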

Perl is famous as a very common language for web-based content management software (CMS). Mostly one finds it used in CGI applications like wiki engines or blogging software. It incorporates much of AWK and the sed command, but with a different syntax.

Tcl is an interpreted language designed to be embedded in and extended by C/C++ programs. The user can create new Tcl commands, which are implemented as C or C++ subroutines. Not surprisingly, Tcl's body of available commands is now extremely large, which explains the enormous range of application for it. Tcl has a graphical toolkit, known as Tk (analogous to the Visual Basic toolbox). Tk allows one to create, edit, and place "widgets" in a graphical interface like X Windows.

(Further information added as needed)
RESOURCES: The Art of Unix Programming, Eric Raymond: esp. "Language Evaluations" and "Minilanguages";

ONLINE TUTORIALS: Unix tutorial, Stanford University School of Earth Sciences, USA; "Unix for Web developers," eXtropia tutorials; "An Awk Primer," Greg Goebel ; "Tcl/Tk Cookbook," Lakshmi Sastry & Venkat VSS Sastry;

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


21 February 2006

The Unix Kernel

(This is a subheading of Unix-3; earlier post on kernels)

Disclaimer: I am a student of this subject, and not an expert. Please check sources listed at the end of this post.

The kernel is the part of the operating system that interacts directly with the hardware. It serves to manage computer memory and processor resources and basic input/output (I/O). A modern kernel is also expected to manage the multithreading abilities of today's microprocessors.

Communications between the kernel and the programs running on the computer are system calls; these include commands such as "open file," "writing to file," "obtain information about file," "terminate process," "change priority of process," and so on. The system calls essentially define standard compliance, and all implementations of Unix System V have compatible system calls. However, the internals (programs that actually execute these system calls on behalf of the kernel) may be quite different from Unix version to version.

One crucial difference between Unix kernels and those of other operating systems like MS-DOS is that the former do not give applications direct access to system resources (printers, memory, etc.) [*]. Instead, applications have access to the system call interface, which is Unix's version of the API.

Another crucial distinction is that, while [the majority of] OS's treat all input/output (I/O) devices as different classes of objects, Unix treats them all as files [*]. To flatten matters out even more, Unix files are all identical in format. Only the content of the file actually specifies what format it's supposed to conform to. This is somewhat interesting because, as I mentioned, all I/O devices (including disk drives) are files. Another point I ought to mention, and will surely mention again, is that Unix is all about terseness.* So the root directory in Unix is written as a slash, "/". Semantically, I suppose it would be more accurate to say that the root is actually nameless; the slash occurs after the file name, which is why those of you with a Unix account will access it under the path /usr/bin/. All of the files under the root directory are referenced in relation to it, even though they might be on different physical drives from each other.

This is significant for the kernel because it simplifies the organization of system calls. It does present a problem, however, for legacy Unix kernels because of the recent explosion in the number of device types a system is expected to connect to. There is also some concern for Unix users that the kernel is itself another file, and is vulnerable to editing errors. That is why people learning how to use Unix are compelled to learn so much about it.

* Terseness in Unix: system commands generally, not to mention file naming conventions, are examples of Unix's legendary laconicism. I've known people who were utterly enraptured by it, and others who hate it with a passion. Generally speaking, those who hate it are older and are people with whom I've discussed it recently; those who love it are younger and told me about this back in the late 1980's.

RESOURCES: The Art of Unix Programming, Eric Raymond: esp. "The Elements of Operating-System Style"; Wikipedia entry; "Rethinking /dev and devices in the UNIX kernel," Poul-Henning Kamp, FreeBSD Project 2002;

ONLINE TUTORIALS: "Unix for Web developers," eXtropia tutorials; "UNIX Tutorial for Beginners," University of Surrey, Guildford, UK; "Unix System Administration Independent Learning," USAIL;

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


20 February 2006

Milestones: Unix (3)

Part 1, 2

Disclaimer: these are notes of a student on the subject, and not an expert. For expert information, please see sources at end of entry.

We've discussed the emergence of Unix as an unusual case of an operating system used on a vast range of differently-sized machines, from servers and supercomputers to entry-level workstations. Today, of course, the most common Unix platform is the Intel-based desktop, running Linux. I'm going to be writing a lot about Linux in the future, but I wanted to discuss the effects of "post-war" Unix.

Single Unix Specification
For a history of the battles over the Unix standard, I recommend this page from the Open Group's website. Hereafter, I am going to refrain from further discussion of the history of Unix and turn to what it has become. Unix can be understood as four distinct things:

  1. a specification (e.g. SVID)
  2. technology (e.g. SVR4)
  3. a product (e.g. UnixWare)
  4. a registered trademark (UNIX)
The specification is now established by IEEE guidelines, and tied to the trademark, thereby regulating what Unix is. This was not merely a design issue, in which the best possible compromise between competing Unix implementations was selected by engineers; rather, it involved distinguishing what was essential to all implementations from what represented competitive differentiation. The one thing the specification could not do is restrict the technology, since of course this is supposed to be moving forward; the example given, SVR4, was in fact a particular technological implementation of the specification by a particular vendor circa 1989, and of course has long since been superseded.

The product was a form of technological implementation, which involved trade-offs and design inputs by the vendor.
As with any operating system, understanding what Unix actually is means listing its minimum components. These include:
  1. the kernel
  2. the shells
  3. the file system
  4. the development environment
  5. the system commands
  6. the utilities
  7. the documentation
(In some cases #3 & 4 are combined).

I'm not going to discuss documentation because I think that's self-evident; I'll just mention in passing that practically every Unix command comes with a man page that explains what the command does. Unix pioneered online documentation of its commands; later, MS-DOS replicated this with the elegant /? switch that one may type after a command to get a description of what it does. GNU-based systems also feature the much more detailed info command, used for explaining much bigger subsystems. Again, all documentation is integral to the actual OS.
RESOURCES: The Open Group: "The Single Unix Specification"; The Open Group Base Specifications Issue 6; IBM, et al., General Programming Concepts: Writing and Debugging Programs (1999)

BOOKS: UNIX: the Complete Reference, by Kenneth Rosen, Douglas Host, James Farber, & Richard Rosinski—Tata McGraw-Hill edition 2002


10 February 2006

Camera Phones—your stalker's wimping out

Two unsurprising reports, amusing in juxtaposition: first, the SF Chron:
(May 16, 2004) The latest fad in cell phones — built-in digital cameras and even camcorders — lets people easily capture snippets of everyday life, but the enormous potential for abuse has businesses and organizations scrambling.

With camera phones, peeping Toms can snap revealing shots in gym locker rooms. Amateur paparazzi can stalk celebrities. Cheating students can peek at test answers. Crooked employees can copy confidential plans. Identity thieves can capture credit card numbers.

Moreover, the ease of beaming photos allows voyeurs to post "up skirt," "down blouse" and other compromising photos on the Internet for all to see.
Of course, if someone is trying to blackmail you, don't fret excessively:
IN-STAT (promotional email): A camera is considered by many users to be one of the most desirable features in wireless handsets, yet, evidence suggests that only a tiny percentage of camera phones are used regularly to transmit pictures or to store for later use, reports In-Stat (http://www.in-stat.com). Less than a third of camera phone owners surveyed by In-Stat indicated that they share picture messages with friends, the high-tech market research firm says.

"People who haven’t yet purchased camera phones are very enthusiastic about all the uses for their images," says David Chamberlain, In-Stat analyst. "However, once they start using their new phones, they are turned off by perceived poor picture quality, slow network speeds, and the difficulty of creating and sending pictures. Our survey found that very few pictures actually make their way out of the handset to be shared with others."

A recent report by In-Stat found the following:

  • Those who now use camera or camcorder phones say that they are less likely to replace their phones in the near future than other users.
  • There will be from 300–850 million mobile users that will send at least one image per month across the carrier network by 2010.
  • Only one in 20 camera phone users prints pictures or stores them on carrier-provided web sites. 28% of current camera phone owners actually share pictures using messaging service, compared with nearly 60% who hoped to before purchasing their camera phones.

I expect the job is inherently too taxing. The technology is surprisingly good for direct uploads, but the TV sitcom scenario of an omnipresent digital polaroid is overblown: transmission is unsatisfactory, in large measure because the phones are too small. And it takes too much effort to actually decide what to do with the staggering proliferation of digital mementos.


01 February 2006

Biofuels (2)

Part 1

The person who referred me to all the glowing articles "refuting" David Pimentel's essay is an anonymous enthusiast who got his articles at a website entitled "Journey to Forever." He says the site "presents both sides of the issue," although it is actually an unabashed supporter of biofuels. However, there is an article by Jürgen Krahl and Axel Munack, "Review: Utilization of Rapeseed Oil...," originally published by the Society of Automotive Engineers (SAE). The authors tested rapeseed oil methyl ester (RME), which is rapeseed oil converted to biodiesel; they also tested unmodified rapeseed oil, which requires a different type of engine.

Now, recall from Pimentel's paper that growing an acre of corn and rendering it into ethanol yields the energy equivalent of 190 gallons (720 L) of gasoline; the process requires 140 gallons (530 L) of gasoline, for a net yield of 50 gallons (190 L). So, consuming 1 gallon of "gasoline" would require a process in which 3.8 gallons of "gasoline" are produced and consumed, 1 of which is by the end user. That, I understand, is a component of the 10-fold increase in farmland requirement Pimentel mentions. So, if ethanol were to phase out gasoline entirely, proponents of biofuels can claim that growing all that maize will absorb the carbon released in combustion. But other byproducts of combustion would actually have to be 74% less than gasoline per gasoline-equivalent burned (or, 84% less per gallon burned). According to the chart here, that is not the case with rapeseed.
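The arithmetic in the paragraph above can be checked with a short sketch. The function names are my own, and the only inputs are the per-acre figures quoted from Pimentel (190 gallons gross, 140 gallons consumed in the process).

```python
def gross_per_net_gallon(gross_yield, process_input):
    """Gallons produced (and burned) for each gallon left over for the end user."""
    net_yield = gross_yield - process_input
    if net_yield <= 0:
        raise ValueError("process consumes at least as much as it yields")
    return gross_yield / net_yield


def required_emission_cut(multiplier):
    """Fractional cut in per-gallon emissions needed to break even with gasoline."""
    return 1 - 1 / multiplier


if __name__ == "__main__":
    multiplier = gross_per_net_gallon(190, 140)
    print(f"gallons burned per net gallon: {multiplier:.1f}")
    print(f"required emission cut: {required_emission_cut(multiplier):.0%}")
```

With 190 gallons gross and 50 net, the multiplier is 3.8, and burning 3.8 gallons for every one delivered only breaks even if each gallon emits roughly 74% less than gasoline.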

I am skeptical that ethanol can be so much better than rapeseed oil in energy emissions. But let us now turn to the favorable reports on energy outputs.

Here's "How Much Energy Does It Take to Make a Gallon of Ethanol?" (Lorenz & Morris, August 1995). They include a detailed breakdown of inputs using different processes. The main difference between them is that some require smaller amounts of fertilizer, which is the main energy input on the farm, and of "process steam." Process steam is used in converting the corn into liquid fuel, and it is the principal variable. According to Lorenz and Morris, the steaming process has massively reduced energy per gallon; where once (in 1980) it required as much as 157% of the ethanol's energy yield, by '95 it averaged 33% of yield. Using "state of the art" technology meant that for every gallon of gasoline (equivalent) delivered, 1.39 would be produced and consumed. That would mean, of course, that instead of requiring 10.7 times the land area currently used for US food consumption, 3.9 times as much land would be required. I reviewed the rest of the article and noticed little to arouse my suspicions, except that they tended to assume industrial efficiency that is rarely achieved in practice. Also, they "award" a lot of energy output to other products besides fuel, such as cogeneration. Should ethanol be embraced on a major scale, the opportunities to exploit cogeneration would decline sharply and we would most likely trend down to Pimentel's estimates of land use.
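The land figure above comes from a simple proportion, sketched below. The assumption (mine, and a crude one) is that land required scales linearly with the number of gallons that must be produced per gallon delivered; the function name is hypothetical.

```python
def scaled_land_multiple(baseline_land, baseline_multiplier, new_multiplier):
    """Rescale a land requirement, assuming land is proportional to gross production."""
    return baseline_land * new_multiplier / baseline_multiplier


if __name__ == "__main__":
    # 10.7x land at 3.8 gallons produced per net gallon (Pimentel's figures),
    # rescaled to 1.39 gallons per net gallon (Lorenz & Morris, state of the art).
    land = scaled_land_multiple(10.7, 3.8, 1.39)
    print(f"land required: {land:.1f} times current food acreage")
```

This ignores differences in yield per acre between the studies, so it is only a rough consistency check on the 3.9 figure.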

I skimmed through the "update" (PDF) by proponents (and USDA fellows) Shapouri, Duffield, and Wang. The article has a table on page 2. I should point out that SD&W show a surprisingly narrow range of estimates from different studies, with most of the difference being in expected BTU/gallon conversions (75K/gallon for Pimentel versus 50K for SD&W; Wang, et al. claimed the figure could be as low as 40K, which is an outlier). The other big variable was nitrogen fertilizer, which SD&W claimed used only 18K BTU/lb, an astonishingly low statistic. Even if one believes that the energy employed in the production of nitrogen has fallen so much, or that use of such energy-intensive fertilizers can be sharply reduced, one is still stuck with a political calamity when the US begins encroaching on the farmland of Third World countries as it scrambles to replace gasoline or diesel with ethanol.

The bottom line is that, while the hardcore enthusiasts and techno-optimists look forward to higher net energy yields with ethanol and rapeseed, the grim fact is that their own incentivized statistics (I expect that had they reported poorer results for ethanol they would have been sacked, and probably unemployable in their fields) still stick us with consuming 3-6 times as much oil-like fluid as we actually use. The extra would be burned to make the net energy that we use. And while massive improvements might conceivably be made in the ecological impact of industrial farming, they would never be enough to offset the 4-10 times as much of it we would hereafter have to do.

BOTTOM LINE: Biofuels are a biohazard.


Biofuels (1)

For several years I've heard about the industrial process of converting crops into fuel. The product is called biofuel, and from the first I've been highly skeptical of the technology. The reason is that agriculture is highly energy intensive, and after burning diesel fuel in order to grow crops such as maize [corn] or oilseed rape [canola], the product has to be chemically processed into a fuel compatible with industrial applications. This processing, or "refining," of maize/oilseed into petroleum substitutes likewise requires more energy inputs.

So what is the balance sheet? Are biofuel programs just a singularly perverse way of wasting tax dollars and petrol, in the name of pretending to be energy efficient?

In a lively comment thread on Daily Kos, an ardent supporter of biofuels posted a list of articles as evidence. For your reading convenience, I'll list them here:

  1. "Estimating the Net Energy Balance of Corn Ethanol," by Shapouri, Duffield, & Graboski (1995); see also "The Energy Balance of Corn Ethanol: An Update" (PDF; 2002), by Shapouri, Duffield, & Wang.
  2. "How Much Energy Does It Take to Make a Gallon of Ethanol?" Lorenz & Morris (1995)
  3. Others listed here

And as a counterpoint, here is David Pimentel's 1998 study on ethanol, "Energy & Dollar Costs of Ethanol Production with Corn" (PDF). Here's an excerpt from Prof. Pimentel's report:
The production of corn in the United States requires significant energy and dollar inputs. Indeed, growing corn is a major energy and dollar cost of producing ethanol ...For example, to produce an average of 120 bushels of corn per acre using conventional production technology requires more than 140 gallons of gasoline equivalents... The major energy inputs in U.S. corn production are oil, natural gas, and/or other high grade fuels. Fertilizer production and fuels for mechanization account for about two-thirds of these energy inputs for corn production...

Once corn is harvested, three additional energy expenditures contribute to the total costs in the conversion process. These include energy to transport the corn material to the ethanol plant, energy expended relating to capital equipment requirements for the plant, and energy expended in the plant operations for the fermentation and distillation processes...

The total energy input to produce one gallon of ethanol is 129,600 BTU. However, one gallon of ethanol has an energy value of only 76,000 BTU. Thus, a net energy loss of 53,600 BTU occurs for each gallon of ethanol produced. Put another way, about 71% more energy is required to produce a gallon of ethanol than the energy that is contained in a gallon of ethanol

If this is true, then ethanol is not a competitor to gasoline and diesel; rather, the process of producing it, including such components as energy consumed in the production of agrichemicals, etc., is another demand for gasoline and diesel, or, weighted properly, coal, PNG, and hydroelectric. Additionally, ethanol is mainly used as a supplement to fuel; in several states it has been chosen as an oxygenate to replace MTBE.
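As a quick check on the figures in Pimentel's passage, here is a small sketch. The function name is mine; the two BTU values are the ones quoted above.

```python
def energy_balance(input_btu, output_btu):
    """Return (net energy, fractional excess of input over output) per gallon."""
    net = output_btu - input_btu
    excess = input_btu / output_btu - 1
    return net, excess


if __name__ == "__main__":
    net, excess = energy_balance(129_600, 76_000)
    print(f"net energy per gallon: {net:,} BTU")
    print(f"extra energy required: {excess:.0%}")
```

The net comes out to a loss of 53,600 BTU per gallon, and the input exceeds the output by about 71%, matching the quoted numbers.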

I could digress on ethanol versus MTBE, but I won't.

If my wife were writing this post, she would long ago have inveighed against inflicting a disastrous new scourge on the planet, in the form of tying up seven times as much land for supplying fuel as is used now for supplying food. The amount of land required to sustain America's peculiar dietary habits, moreover, is vastly greater than that required for other national diets. That's because the US diet is dominated by meat. Beef requires about twenty pounds of grain per pound of food; so if a person replaces 11% (by weight) of her vegan diet with beef, she has doubled the land area required to feed her. Now, imagine that the US population has an energy "hiccough" when oil prices become prohibitive, then switches over to using its strong dollar to tie up the land of famine-stricken Africa for maize.

(Part 2)
