27 October 2006

Semantics versus Syntax

Syntax refers to the grammatical structure of a language. For example, the English language includes nouns, verbs, adverbs, and adjectives. A sentence whose words are correctly positioned and conjugated has correct syntax, yet it may still be nonsensical because of its logical content. The canonical example is Noam Chomsky's "Colorless green ideas sleep furiously," which uses valid syntax--[adj] [adj] [noun] [verb] [adverb]--but is nonsense because it violates several rules of semantics.

Semantics, therefore, refers to the logical content of a sentence. As we can see, "Colorless green ideas sleep furiously" could be translated into many languages (Les idées vertes incolores dorment furieusement), but still have the same problems of semantics: green and colorless are mutually contradictory attributes, while it is absurd to speak of ideas sleeping, even in a poetic sense.


23 October 2006

Content Management Software

Content Management Software (CMS) is a broad category of software used to organize and arrange the data files behind media of various kinds, such as online news sites, blogs, wikis, and so on. That's a pretty large subject, and those with a vocational interest in CMS applications are likely to discover that the huge range of CMS products reflects a huge range of CMS markets, with incomparable subdivisions.


CMS applications are usually very closely related to database management software; they have a back end that consists of the content being managed, and a front end that consists of the query (search) feature, display feature, or reporting feature. Hereafter, I'll be referring to the back end as the "content management application" (CMA) and to the front end as the "content delivery application" (CDA).

Not all CMS apps fit into the database format, though; for example, Movable Type and PmWiki are two common file-based applications.1 Movable Type is a very popular blogging package which users install on their personal webhost; PmWiki is a wiki engine that users can install on their host for the creation of stand-alone wikis. In a file-based CMS, the software generates a new static page for each entry, which thereafter has a permanent address of its own.

Both file-based and database-oriented CMS apps have their virtues. The big advantage of the database design is that it is really suited to sites with immense numbers of entries. For example, while I don't know for certain, I strongly suspect that Blogger (this site) has over 10 million individual posts. Often I have links to a particular post, which then appears as a stand-alone page. One can also pull up all my entries for the month of August 2005. In Movable Type, where I have another blog, one may also pull up posts organized by category. I naturally had an interest in including each post in as many topic categories as possible. My host was quickly maxed out because Movable Type would accomplish this by making a separate page for each way of organizing posts, which meant each post had to be stored many times; moreover, a static page has to include all of the junk on the side, the logo at the top, and the Perl scripts for comments. If Blogger worked that way, the site would be impossibly expensive to run. Instead, the blog engine (front end) actually generates a page from fields in a vast table. One of the columns of that table includes the text you are reading right now. Another column includes the date, while another includes settings I selected, like "allow reader comments."
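To make the "page from fields in a vast table" idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. I have no idea what Blogger's real schema looks like; the table name, column names, and sample posts below are all invented for illustration. The point is only that a monthly archive is a query run on demand, not a stored file.

```python
import html
import sqlite3

# Hypothetical schema: one row per post. Column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT, body TEXT, posted TEXT, allow_comments INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (title, body, posted, allow_comments) VALUES (?, ?, ?, ?)",
    [
        ("First post", "Hello.", "2005-08-03", 1),
        ("Second post", "More text.", "2005-08-19", 0),
        ("Later post", "Autumn.", "2005-09-02", 1),
    ],
)


def render_post(row):
    """Build the HTML for one post from its row; nothing is written to disk."""
    title, body, posted, allow_comments = row
    comments = "<p>Comments are open.</p>" if allow_comments else ""
    return (
        f"<h2>{html.escape(title)}</h2>"
        f"<p class='date'>{html.escape(posted)}</p>"
        f"<p>{html.escape(body)}</p>{comments}"
    )


def monthly_archive(conn, year_month):
    """An archive page is just a query: every post dated within YYYY-MM."""
    rows = conn.execute(
        "SELECT title, body, posted, allow_comments FROM posts"
        " WHERE posted LIKE ? ORDER BY posted",
        (year_month + "-%",),
    ).fetchall()
    return "\n".join(render_post(r) for r in rows)


august_page = monthly_archive(conn, "2005-08")
print(august_page)
```

Each post is stored exactly once; the August 2005 page exists only for as long as it takes to send it to the reader.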

File-based software has the advantage that hosting is simpler. If there are relatively few pages in the site (under a thousand), it's not a serious problem if every page the visitor could see is its own permanent file. The hosting software is simpler, and backups or modifications are easier.2
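For comparison, here is a rough sketch of the file-based approach, again in Python. It is not how Movable Type is actually implemented (that is Perl, and far more elaborate); the post data, directory layout, and sidebar string are made up. It only illustrates why every archive view costs another stored copy of each post.

```python
import tempfile
from pathlib import Path

# Hypothetical posts; slugs, categories, and bodies are made up for illustration.
POSTS = [
    {"slug": "syntax", "month": "2006-10",
     "categories": ["language", "logic"], "body": "Syntax versus semantics."},
    {"slug": "cms", "month": "2006-10",
     "categories": ["software"], "body": "Content management."},
]

# Stand-in for the page furniture that every static page must repeat.
SIDEBAR = "<div class='sidebar'>links, logo, comment script</div>"


def build_site(posts, root):
    """Write one file per post and one per archive view; return the relative paths."""
    root = Path(root)
    pages = {}  # relative path -> bodies of the posts shown on that page
    for post in posts:
        pages.setdefault(f"posts/{post['slug']}.html", []).append(post["body"])
        pages.setdefault(f"archive/{post['month']}.html", []).append(post["body"])
        for category in post["categories"]:
            pages.setdefault(f"category/{category}.html", []).append(post["body"])
    for rel, bodies in pages.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        content = SIDEBAR + "".join(f"<p>{b}</p>" for b in bodies)
        path.write_text(content, encoding="utf-8")
    return sorted(pages)


with tempfile.TemporaryDirectory() as tmp:
    written = build_site(POSTS, tmp)
    needle = POSTS[0]["body"]
    # Count how many of the generated files hold a copy of the first post.
    copies = sum(
        needle in (Path(tmp) / rel).read_text(encoding="utf-8") for rel in written
    )

print(len(written), "files written;", copies, "of them contain the first post")
```

A post filed under two categories ends up in four files here: its own page, the monthly archive, and one page per category. With a few hundred posts that is tolerable; with millions it is not.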

Features of a CMS typically include CSS stylesheets and XML-based templates, which determine the appearance of all pages without the user having to laboriously enter the HTML for each page. So, for example, the CSS for Blogger ensures that, unless I specify otherwise in an HTML tag, all text will be in Georgia font, and the background will be a certain color.
___________________________________________________________
1 "Movable Type vs. ExpressionEngine — A comparison," François Nonnenmacher, 2004; WikiMatrix-Compare Them table.

2 WikiMatrix--Compare Them All. I actually think modifications of the file templates are much easier if you've got a database CMS. Movable Type has templates that can be modified with a minimal knowledge of Perl, but editing the contents of each page requires that you individually visit each page and change it. In a database CMS, it's possible to access all page content through the page content table. You can insert fields and have those fields updated en masse whenever the data that belongs in them is updated.

Another important feature that I appreciate about database CMS designs is that archiving is totally flexible. Using Movable Type again--it can organize archived posts by month, week, topic, and individual post; but it has to have a permanent file for each. If you create a post filed under multiple topics, then your host has to store many copies of the exact same HTML. That's absurd. In contrast, Blogger (which is not a stand-alone CMS) can simply create an entirely new archive just by running a search query. Of course, Blogger's busy servers don't save anything--they just send the archive to your browser's temporary files, just like anything else you download.
___________________________________________________________

SOURCES & ADDITIONAL READING: "Content Management System," Whatis.com; "Content Management Software," Wikipedia; Overview of (web) content management systems, CMS Matrix (and links off that list).


22 October 2006

Unified Modeling Language (UML)

In order to program objects, a standard is required for ensuring that the objects will not interfere with each other. A modeling language can be loosely described as a "meta-language," or abstract representation of language for software design purposes. Initially, modeling languages were considered to be part of the programming technique; programming teams might follow basic guidelines so they could communicate with each other. Gradually, the various modeling languages converged into a standard that is universally taught. One of the benefits of this has been the creation of an open-source library of software objects and tools that can be adapted readily to a core program.

In 1997, the Object Management Group (OMG) released the first version of the Unified Modeling Language (UML), which probably contributed to the subsequent popularity of OOP. Initially, the paucity of open-source objects posed a problem; in order to create large numbers of mutually compatible applications based on objects, one needs an immense number of program objects for all the detailed subroutines that a full-fledged application comprises. Object-oriented programming is especially unsuited to the conventional variety of intellectual property rights, since proprietary objects can only be used by the original developer, or else require complex licensing agreements. The UML seems to have had its greatest impact in the rapid and impressive development of online content management software (CMS) since '04, most notably Drupal.

The UML is a set of freely available standards that are roughly analogous to any number of industrial methodologies (TQM, etc.). From these standards have evolved a large number of UML development tools, most notably the UML diagrams.


[Screen capture: Pacestar UML Diagrammer]

The illustration above is a screen capture of a program called Pacestar UML Diagrammer, whose purpose--shockingly enough--is to generate UML diagrams. There are 13 diagram types; the one shown in the active window above is a class diagram, with a use case diagram in the background. All thirteen are listed below, with links to an excellent site explaining their purpose. As one realizes the complexity of the symbolic language that was developed for UML, one begins to understand the sweeping importance UML has had for modern (post-2004) software design. Firstly, UML 2.0 (released that year) was the industry standard for symbolic analysis of the operation, architecture, and functionality of object-oriented software; secondly, new software applications tend to be object-oriented; and thirdly, OOP was becoming much more popular than it had been in the past precisely because UML was increasing the ease of OOP relative to kinds of software for which something like UML did not--or could not--exist.
__________________________________________
Object Constraint Language:

OCL is a formal system of semantics which is used for establishing the correctness of a "statement" in a programming language. As the name implies, it may constrain an object, by excluding certain types of statements.
Pollice: For example, it could help you indicate that, to be assigned a room, a specific course must have at least six students enrolled. With OCL you could annotate the association between the Course and Classroom classes to represent the constraint, as shown in Figure 1. As an alternative to the note shown in this figure, you could use a constraint connector between the Course and Classroom classes.
Pollice's article is very enthusiastic about OCL, which is often necessary for an instructor (I usually find I can learn a concept faster if I believe, or convince myself, that the concept is brilliant). He illustrates the difference between a set of semantic rules, which is what OCL is, and an actual language (which would have an explicit syntax and vocabulary). Human languages have surprisingly universal semantics, something that is entirely untrue in either mathematics or programming languages; there, semantic rules vary depending on the logical relationships being manipulated. According to Pollice, OCL has very mathematically-oriented semantics, which makes it especially powerful since mathematics has evolved a very profound, comprehensive semantic structure (whereas programming languages tend to have very rudimentary rules of syntax that are peculiar to each one).

In contrast, many programming languages have semantic rules that are comparatively closer to formal English (e.g., COBOL); this is actually something of a waste for programming objects, since an object usually performs a very specific mathematical or logical operation that has no use for the arbitrary and alien constraints of human semantics, which are designed to describe tangible reality. OCL rules define what constitutes acceptable language for an object under that object's peculiar conditions and constraints; they ought to be understood and applied by the programmer as a tool, and the programmer ought not to try to master the entire codex of OCL rules. The benefit of expanding one's knowledge of OCL is that one can learn to think formally, thereby expanding one's power to discern appropriate design approaches.
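Pollice's room-assignment example quoted above can be sketched in ordinary code. In OCL itself I would expect the constraint to look something like `context Course inv: room->notEmpty() implies students->size() >= 6`, though the association names there are my guess and not taken from his figure. The Python below is only an analogy: the class and method names are hypothetical, and a real OCL constraint lives in the model rather than in the program.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_ENROLLMENT = 6  # threshold taken from the quoted example


@dataclass
class Classroom:
    name: str


@dataclass
class Course:
    title: str
    students: List[str] = field(default_factory=list)
    room: Optional[Classroom] = None

    def invariant_holds(self) -> bool:
        """The constraint: a course that has a room has at least six students."""
        return self.room is None or len(self.students) >= MIN_ENROLLMENT

    def assign_room(self, room: Classroom) -> None:
        """Refuse any assignment that would violate the constraint."""
        if len(self.students) < MIN_ENROLLMENT:
            raise ValueError(
                f"{self.title} has {len(self.students)} students; "
                f"{MIN_ENROLLMENT} are required before a room is assigned"
            )
        self.room = room


seminar = Course("Latin", students=["Ann", "Bo"])
try:
    seminar.assign_room(Classroom("B12"))
except ValueError as err:
    print("rejected:", err)

lecture = Course("Logic", students=[f"student{i}" for i in range(6)])
lecture.assign_room(Classroom("A1"))
print(lecture.room.name, lecture.invariant_holds())
```

The invariant says what must always be true of the object; the guard in assign_room is just one way a programmer might enforce it.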

(OCL SOURCE: Gary Pollice, "Formally speaking: How to apply OCL")
__________________________________________
CRITICISM OF UML:

Needless to say, for anything as influential as UML has been, there are criticisms; what's surprising is how mild they are. For the most part, these consist of inadequacies and omissions in the language. Scott W. Ambler laments that UML is a long way off from true computer-aided software engineering (CASE), and developers are still obligated to develop proprietary extensions to it in order to generate executable code or derive UML models from existing code.

Another criticism is the proliferation of diagrams; while several new ones have been added for UML 2.0, it seems that the large number reflects the sort of committee-induced compromise between incompatible design approaches: include the tools to do both.

This complaint also arises with the logic. UML authorizes the use of OCL semantics, English (detailed semantics), and its own peculiar set; there's an argument that the varied semantic structures defeat the purpose of having any. It's unclear if this is necessarily a flaw, though, since different objects may require different semantic structures.
__________________________________________
NOTES:
Complex licensing agreements: frequently an application published for any particular market has many features that any one user is unlikely to use. Plug-ins may well be an option, but in cases where they are not, there is a problem of pricing licenses for proprietary software objects when the developer of the main program expects only 10% or so of users to ever use the feature.

Diagram types: these are (1) Class, (2) Component, (3) Composite structure, (4) Deployment, (5) Object, (6) Package, (7) Activity, (8) State Machine, (9) Use case, (10) Communication, (11) Interaction overview (UML 2.0), (12) Sequence, & (13) UML Timing (UML 2.0).


[Image: UML Timing diagram]

__________________________________________
ADDITIONAL READING & SOURCES: Wikipedia entries for Object-oriented programming: Object modeling language, Unified Modeling Language (UML); Executable UML;

Unified Modeling Language (UML) page; OMG; Mandar Chitnis, Pravin Tiwari, & Lakshmi Ananthamurthy, "UML Overview"; Bruce Powel Douglass, "UML 2.0 Incrementally Improves Scalability And Architecture"; Scott W. Ambler, "Be Realistic About the UML: It's Simply Not Sufficient";


10 October 2006

Hacking

The term "hacking" is often used to refer to the act of editing or fixing flawed computer code. Typically a hack is interpreted as a patch or clever work-around. I have also seen the term used to refer to the creation of forks of computer programs; in this latter sense, the programmer (legally or not) edits the source code of a program so its functionality is different. The programmer then circulates this new, edited version under a new name.

In many cases, a computer programmer requires a specialized program to do this; for example, there are a lot of programs that are used to automatically generate HTML, JavaScript applets, and other quasi-programs. Often they have unsatisfactory quirks, and programmers create programs to follow them around and clean up (or hack) the script. So programs and bots can be surrogate hackers too.

However, it is also the case that "a hack" was often slang at MIT for a prank. Usually hacks (in this sense) were very elaborate pranks that required an immense amount of work.

Another sense of the term "hack" derives from the older jargon, crack. The term "safe cracker" is perhaps well-known to avid readers of pulp fiction; it refers to the trick of opening a safe by manipulating the locking mechanism, rather than blowing it up. Likewise, a "code cracker" was someone who specialized in finding patterns in encrypted data, and thereby decoding it. Applied to computer terminology, it naturally referred to the ability of specialists to defeat or cripple a computer system. One obvious motive for doing this would be crime: a cracker could, for example, crack the security of a bank and change his account balance to whatever he thought he could get away with. Or he could vandalize the system of an organization he loathed.

This has unfortunately created a certain confusion of terms. One of the things any hacker could naturally do quite well is create malware, such as spam bots. Spam bots could conceivably be useful; it's just that they aren't. So the term "hacker" came to be associated with negative, destructive use of skills that are intrinsically valuable.

Richard Stallman introduces a third, closely related, sense of the term hack: the introduction of a novel, potentially useful or entertaining idea. His examples include the trick of eating with more than two chopsticks in one hand. While this is not very useful, he mentions that a friend was able to eat with four in his right hand, using them as two pairs. It appeals to a sense of playfulness and appreciation for originality.
