07 June 2006

Databases for Laypeople: Part 4 in a series

Hypertext and the Relational Database
(Table of Contents--Part 3)

It's somewhat unorthodox to introduce the relational database as a form of hypertext.* But consider the parallels: in order to allow a user to access data in a relational database, the RDBMS has designated key fields for each record that are also a referring field someplace else. In each hypertext document, a part of the document is its address (analogous to the key field), which will appear in the referring hypertext document as a part of a hotlink.

The structure of information in a relational database may be understood, for the time being, as if it were a page in Wikipedia. Each record may be compared to a Wikipedia entry, with some of the fields (or words in the article) linking to other records in other tables. This analogy is not perfect; for example, an article about prime ministers of Peru will link to other "records," or articles, about other prime ministers of Peru. In contrast the object of a relation between two different fields is to link two different types of record--records kept in different tables. That's a flaw in the analogy, and in fact in cases where data records are used professionally--like patents or legal rulings--the hypertext online references do employ distinct tables for structurally different forms of data.

Search portal for the US Patent Office
In order to over come this flaw in the analogy, we need to imagine something with a more professionally focused, but still easy to intuit: a hyperlinking, searchable, US Patent Office-style web page. Like a wiki, it is edited constantly, in this case by users/editors who have applied for a patent, or who are responsible for authorizing those patents. Unlike the most famous wiki, viz., Wikipedia, the records are fundamantally different in character and are stored in different "tables" depending on whether they are Utility (i.e., a new technical innovation that fulfills an identified purpose), Design (i.e., just a refinement of applied art), Plant (e.g., a flower), Reissue, Defensive Publication, or something else. Like many multi-user databases, this allows different users different types of permission: everyone may read Wikipedia or the patent data available above, but only a small number may edit the information. Applicants may edit their application; the USPO staff may edit the status of the application, based on office rulings.

Finally, as with all wikis, a database is edited by many people concurrently, who may sometimes have updates that conflict directly with each other. In some cases, this can have disastrous consequences!

In a database, a record with exactly one related record in another table is pretty rare. One example is the office where each employee has one and only one computer, which is associated strictly with that employee. With database design and capabilities, the possible exceptions to this have to be considered: a computer used by potentially several different employees, or an employee who needs access to several workstations in the office, will make a one-to-one relation impossible for that particular pair of tables. On the other hand, if it does exist, then the fields for "computer" are likely to be part of the employee record.

Typically we think of the page that links to many other pages as being the home page, or site-map. For example, imagine a simple web page with a list of links to articles (example). In database design, the relationship might involve contacts for one's vendors: our imaginary event-arranging company might have half a dozen contacts at Aramark, but each contact will be for only one firm. Aramark has many employees, but each employee works only for one firm.

Much of the time, one encounters "promiscuous" relations: again, with our imaginary event-management company, clients and events can have a many-to-many relationship as well. For example, a book show might be held several times a year, each year. Each event features scores of firms selling books. For purposes of billing, the event-management company has to be able to locate information about each client (or each vendor) related to each event, and for planning each event the staff needs to access information about each event related to each client.

A chart showing the relationship between the entities "Clients" and "Events" might look like this:
The problem of relating an unknowable number of clients to a corresponding, but unknowable, number of events, is solved by having each client on the client table linked to a client:event table. In this way, the "real" master table includes the minimal information about each client:event record.

(Part 5)
* Somewhat unorthodox--but not unheard of. See "From Database to Web Site: Transforming a PC Relational Database to a World Wide Web Resource," by Jane A. Keefer, West Chester University:
Once the discographic file name parameters are established, active hypertext linking is available between all performers and titles simply by using the primary key for each record as the value of the HTML Anchor Name tag. Since each primary key is unique, the correct reference is assured and this device illustrates the functionality of the relational file structure as an early form of hypertext technology, much as the card catalog with its main and added entries can now be seen as a crude effort at relational data structuring.
The blog and the wiki are both examples of cgi-administered databases.



Post a Comment

<< Home