23 October 2006

Content Management Software

Content Management Software (CMS) is a broad category of software used to organize and arrange the data files behind media of various kinds, such as online news sites, blogs, and wikis. That's a large subject, and anyone with a professional interest in CMS applications is likely to discover that the huge range of CMS products reflects a huge range of CMS markets, each split into subdivisions that are hard to compare with one another.


CMS applications are usually very closely related to database management software: a typical CMS has a back end, consisting of the content being managed, and a front end, consisting of the query (search), display, and reporting features. Hereafter, I'll refer to the back end as the "content management application" (CMA) and to the front end as the "content delivery application" (CDA).

Not all CMS apps fit the database model, though; for example, Movable Type and PmWiki are two common file-based applications.1 Movable Type is a very popular blogging application that users install on their own web host; PmWiki is a WikiEngine that users can install on their host to create stand-alone wikis. In a file-based CMS, the software generates a new static page for each entry, and that page thereafter has a permanent address of its own.

Both file-based and database-oriented CMS apps have their virtues. The big advantage of the database design is that it is well suited to sites with immense numbers of entries. For example, while I don't know for certain, I strongly suspect that Blogger (this site) hosts over 10 million individual posts. I often link to a particular post, which then appears as a stand-alone page. One can also pull up all my entries for the month of August 2005. In Movable Type, where I have another blog, one may also pull up posts organized by category, and I naturally wanted to file each post under as many topic categories as possible. My host's storage was quickly maxed out, because Movable Type accomplishes this by making a separate page for each way of organizing posts, which meant each post had to be stored many times; moreover, every static page has to include all of the junk on the side, the logo at the top, and the Perl scripts for comments. If Blogger worked that way, the site would be impossibly expensive to run. Instead, the blog engine (front end) actually generates each page from fields in a vast table. One of the columns of that table holds the text you are reading right now; another holds the date; another holds settings I selected, like "allow reader comments."
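A minimal sketch of the database approach, in Python with the standard sqlite3 module: each post is one row, and pages are built on request from its fields plus a single shared template. The table layout, column names, and markup are assumptions for illustration, not Blogger's actual schema.

```python
import html
import sqlite3
from string import Template

# Hypothetical schema: one row per post, one column per field.
SCHEMA = """
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    posted TEXT NOT NULL,            -- ISO date, e.g. 2005-08-14
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    allow_comments INTEGER NOT NULL  -- 1 means allow reader comments
)
"""

# The logo and sidebar live in ONE shared template, not in every stored page.
PAGE = Template("<html><body><div id='logo'>My Blog</div>$content</body></html>")


def render_post(db, post_id):
    """Build a stand-alone post page on request from the row's fields."""
    row = db.execute(
        "SELECT title, body, allow_comments FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no post with id {post_id}")
    title, body, allow_comments = row
    comments = "<form>Leave a comment</form>" if allow_comments else ""
    content = f"<h2>{html.escape(title)}</h2><p>{html.escape(body)}</p>{comments}"
    return PAGE.substitute(content=content)


def render_month(db, year_month):
    """Build a monthly archive (e.g. '2005-08') with a query, not a stored file."""
    rows = db.execute(
        "SELECT title FROM posts WHERE substr(posted, 1, 7) = ? ORDER BY posted",
        (year_month,),
    ).fetchall()
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    return PAGE.substitute(content=f"<ul>{items}</ul>")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany(
        "INSERT INTO posts (posted, title, body, allow_comments) VALUES (?, ?, ?, ?)",
        [
            ("2005-08-02", "First post", "Hello.", 1),
            ("2005-08-20", "Second post", "More.", 0),
            ("2005-09-01", "Third post", "Later.", 1),
        ],
    )
    print(render_month(db, "2005-08"))
```

Nothing here is a stored page: change the template once, or flip one post's comment setting, and every page built afterward reflects it.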

File-based software has the advantage that hosting is simpler: the host only has to serve files, and backups or modifications are easier.2 If there are relatively few pages in the site (under a thousand), it's not a serious problem if every page a visitor could see is its own permanent file.
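For contrast, here is a minimal sketch of file-based publishing, loosely modeled on the archive types described above (individual, monthly, and per-topic pages). It is not Movable Type's actual code or directory layout; the post fields and file paths are hypothetical. It writes every archive view as a permanent file, then counts how many stored files end up containing one post's text.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def publish_static(posts, out_dir):
    """Write every archive view as its own permanent HTML file.

    `posts` is a list of dicts with hypothetical keys
    'slug', 'month', 'categories', and 'body'. Returns the files written.
    """
    pages = defaultdict(list)  # relative path -> post bodies that page must contain
    for post in posts:
        pages[f"posts/{post['slug']}.html"].append(post["body"])    # individual page
        pages[f"archive/{post['month']}.html"].append(post["body"])  # monthly page
        for cat in post["categories"]:
            pages[f"category/{cat}.html"].append(post["body"])      # one page per topic
    written = []
    for rel, bodies in pages.items():
        path = Path(out_dir) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("<html><body>" + "".join(bodies) + "</body></html>", encoding="utf-8")
        written.append(path)
    return written


def copies_of(text, files):
    """Count how many stored files contain the given post text."""
    return sum(text in f.read_text(encoding="utf-8") for f in files)


if __name__ == "__main__":
    posts = [{"slug": "java-cms", "month": "2006-09",
              "categories": ["java", "cms", "wikis"], "body": "<p>Java and CMS</p>"}]
    with tempfile.TemporaryDirectory() as tmp:
        files = publish_static(posts, tmp)
        print(copies_of("<p>Java and CMS</p>", files))  # 5: post + month + 3 topics
```

One post filed under three topics is stored five times; every extra topic adds another full copy.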

Features of a CMS typically include CSS and XML templates, which determine the appearance of all pages without the user having to laboriously enter the HTML for each page. For example, the CSS for my Blogger template ensures that, unless I specify otherwise in an HTML tag, all text will be in the Georgia font and the background will be a certain color.
___________________________________________________________
1 "Movable Type vs. ExpressionEngine - A comparison," François Nonnenmacher, 2004; WikiMatrix, "Compare Them All" table.

2 WikiMatrix, "Compare Them All." I actually think modifying templates is much easier if you've got a database CMS. Movable Type has templates that can be modified with a minimal knowledge of Perl, but editing the contents of existing pages requires visiting each page individually and changing it. In a database CMS, all page content is accessible through the page content table, so you can insert fields and have them updated en masse whenever the data that belongs in them changes.

Another important feature I appreciate about database CMS designs is that archiving is totally flexible. To use Movable Type again: it can organize archived posts by month, week, topic, and individual post, but it has to keep a permanent file for each. If you file a post under multiple topics, your host has to store many copies of exactly the same HTML. That's absurd. In contrast, Blogger (which is not a stand-alone CMS) can create an entirely new archive just by running a search query. And Blogger's busy servers don't save the result anywhere; they just send the generated archive page to your browser, which keeps it among its temporary files like anything else you download.
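A minimal sketch of this kind of flexible archiving, again with Python's sqlite3 module and a hypothetical two-table layout: each post's text is stored exactly once, filing it under a topic adds only a small row to a link table, and a topic "archive" is just a query run when someone asks for it. This illustrates the general technique, not how Blogger works internally.

```python
import sqlite3


def build_db():
    """Create an in-memory database with hypothetical posts/topics tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
        CREATE TABLE post_topics (post_id INTEGER REFERENCES posts(id), topic TEXT);
    """)
    return db


def add_post(db, title, body, topics):
    """Store the post text once; each topic is just one small extra row."""
    cur = db.execute("INSERT INTO posts (title, body) VALUES (?, ?)", (title, body))
    db.executemany(
        "INSERT INTO post_topics (post_id, topic) VALUES (?, ?)",
        [(cur.lastrowid, topic) for topic in topics],
    )
    return cur.lastrowid


def topic_archive(db, topic):
    """An 'archive' is a query, generated on demand and never saved as a file."""
    rows = db.execute(
        "SELECT p.title FROM posts p JOIN post_topics t ON t.post_id = p.id "
        "WHERE t.topic = ? ORDER BY p.id",
        (topic,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    db = build_db()
    add_post(db, "Java and CMS", "<p>...</p>", ["java", "cms", "wikis"])
    add_post(db, "Content Management Software", "<p>...</p>", ["cms"])
    print(topic_archive(db, "cms"))  # ['Java and CMS', 'Content Management Software']
```

Adding a tenth topic to a post costs one short row, not another copy of the page.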
___________________________________________________________

SOURCES & ADDITIONAL READING: "Content Management System," Whatis.com; "Content Management Software," Wikipedia; "Overview of (web) content management systems," CMS Matrix (and links from that list).


08 September 2006

Java and CMS

I mentioned rather briefly my interest in Java-powered CMS in an earlier post. There are not many wiki engines written in Java, possibly because Java is more demanding. Java is a programming language whose programs run on a Java virtual machine (JVM): in the visitor's browser, in the case of applets, or on the website's server, in the case of a Java wiki engine. My impression, which could be wrong, is that Java-powered interactive webpages are more robust and less prone to unintended results when loading, since the code runs inside a virtual machine designed for it. In contrast, PHP scripts add another layer of interface by prompting the website's host to generate a page.
(I tried to discuss this in the prior post on CMS applications mentioned above. Basically, most CMS applications either generate static pages, which are stored as stand-alone HTML files, or else follow the database format, in which every distinguishing trait of each page in the website is saved as a field in a database record. The latter design is usually more efficient in terms of storage and searching, and is essential for very large sites like Wikipedia. In either case, however, the CMS application that powers the website must produce a file, temporary or permanent, that the browser reads as HTML.)

Another reason why Java-based CMSs might be better is that they do not launch a new server process whenever the user interacts with the application. Suppose it is a WikiEngine accessed by a large number of users. Each time a user wants to preview her new post, a traditional CGI application has to launch a new process, but the Java app only needs to start a new thread (or reuse an idle one) inside a process that is already running.
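A minimal sketch of the difference, written in Python rather than Java and not a real CGI or servlet setup: the first handler starts a fresh interpreter process for every request, as classic CGI does; the second handles every request on a worker thread inside one long-running process, roughly as a servlet container does with its thread pool. The handler names and the "preview" output format are made up for illustration.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical page logic, passed as source code to a brand-new interpreter.
HANDLER_SRC = "import os, sys; print(f'preview:{sys.argv[1]}:pid={os.getpid()}')"


def handle_cgi_style(text):
    """CGI-style: launch a whole new OS process for this one request."""
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SRC, text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def preview(text):
    """The same page logic, run inside the already-running process."""
    return f"preview:{text}:pid={os.getpid()}"


def handle_servlet_style(texts):
    """Servlet-style: one long-lived process; each request runs on a pooled thread."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(preview, texts))


if __name__ == "__main__":
    edits = ["first draft", "second draft"]
    print("server pid:", os.getpid())
    for line in [handle_cgi_style(t) for t in edits]:
        print("CGI-style    ", line)   # a different pid for every request
    for line in handle_servlet_style(edits):
        print("thread-style ", line)   # always the server's own pid
```

The differing PIDs in the CGI-style output are the point: every preview pays the cost of starting a process, while the threaded version reuses the one already running.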

CGI versus Java: not a valid comparison!

It has to be pointed out that the dichotomy between CGI and Java is not valid. CGI (the Common Gateway Interface) is, after all, an open standard for how a web server hands a request to a program; Java is a programming language. One can create a CGI application powered by Java, although this is not common. Generally speaking, Perl or PHP is used for programming CGI applications, while Java applets are used for programs that run in the visitor's web browser.
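CGI itself is language-neutral: the web server passes request details to the program in environment variables such as QUERY_STRING, and reads the program's standard output, headers first, then a blank line, then the body. Here is a minimal sketch of a CGI program in Python; the `page` parameter and the page markup are hypothetical.

```python
import html
import os
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a CGI response from the request variables the web server passes in."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    page = query.get("page", ["FrontPage"])[0]  # hypothetical wiki-page parameter
    body = f"<html><body><h1>{html.escape(page)}</h1></body></html>"
    # A CGI program writes its headers, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # Run by the web server as a separate program: read the environment, write to stdout.
    print(cgi_response(os.environ), end="")
```

A request for `...?page=SandBox` would produce a page headed "SandBox"; the same contract could be met by a program written in Perl, C, or Java.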

However, in researching this essay, it became apparent that Java (unlike Perl or PHP) can replace many of the functions of a CGI application, through Java servlets, while executing those functions in a way that is in some ways preferable to (and logically exclusive of) CGI. Even so, most CMSs in common use are written in Perl or PHP (not Java!), because those languages are easily picked up by people with a casual familiarity with HTML. Also, a costly Java application is often unnecessary when mere HTML with a little JavaScript will do fine.
__________________________________________________________
There are quite a few Java-powered WikiEngines, mostly database-oriented. Courtesy of WikiMatrix, I am aware of Clearspace, Corendal, Ikewiki, JAMWiki, JSPWiki, SnipSnap, and XWiki. Besides these, there are some systems developed for large organizations, such as SamePage, which I have ignored. XWiki seems to be oriented to professional developers, and I don't think it's really feasible for my purposes.

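Ikewiki is a semantic wiki developed in Salzburg, Austria. Semantic wikis (SWs) differ from the usual type in that, besides ordinary text, they store structured, machine-readable annotations about their pages (for example, typed links stating how one page relates to another), which software can then query. So far I have found no implementations.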
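A minimal sketch of the idea in Python: pages carry annotations written in a made-up `[[property::value]]` markup, the engine extracts them as (page, property, value) triples, and a query can then answer questions that plain full-text search cannot. The syntax and property names are hypothetical and are not Ikewiki's own.

```python
import re

# Hypothetical annotation markup: [[property::value]] inside ordinary page text.
ANNOTATION = re.compile(r"\[\[(\w+)::([^\]]+)\]\]")


def extract_triples(page_title, text):
    """Turn each annotation on a page into a (page, property, value) triple."""
    return [(page_title, prop, value.strip()) for prop, value in ANNOTATION.findall(text)]


def query(triples, prop, value):
    """Return every page whose `prop` equals `value`."""
    return sorted(page for page, p, v in triples if p == prop and v == value)


pages = {
    "Ikewiki": "A semantic wiki [[writtenIn::Java]] developed in [[developedIn::Salzburg]].",
    "JSPWiki": "A wiki engine [[writtenIn::Java]].",
    "PmWiki": "A file-based wiki engine [[writtenIn::PHP]].",
}
triples = [t for title, text in pages.items() for t in extract_triples(title, text)]

if __name__ == "__main__":
    print(query(triples, "writtenIn", "Java"))  # ['Ikewiki', 'JSPWiki']
```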

Examining wikis built with these engines has been extremely time-consuming, but let's make some quick notes. Clearspace is a commercial product ($29/user) from Jive Software. It's evidently used on the BBC's website, the TechRepublic forums, the CNET forums, and Amazon.

JAMWiki is an interesting concept: it's a WikiEngine that aims at feature parity with MediaWiki (the most widely deployed engine of all, and the one that runs Wikipedia). So far, the selection of sites using it is very slim indeed. Janne Jalkanen created JSPWiki to develop and advertise coding tricks, but it's spartan and seems geared mainly toward showcasing JSP (JavaServer Pages).


14 June 2005

On a Foray into HTML-3

Sun Microsystems' Java and Netscape's JavaScript are often confused. They're both programming languages commonly associated with the internet, but the similar names are a marketing choice rather than a sign of kinship, and they refer to very different things. In this blog post and others, I'm going to refer to a program and its elements as "code"; you could say programs are written with code. I will also use the term "compiler": a program that reads code written in a high-level language and translates it into a lower-level form, such as machine code, so the computer can do what it's supposed to do.

Java was developed in the early 1990s, at about the same time as Mosaic, the first popular graphical Web browser. Java code is compiled into an intermediate form called bytecode, which is then run by a "Java virtual machine" (JVM) installed on the computer. This allows the same Java program to run in any browser, anywhere, any time, regardless of the computer on which one is browsing the web, as long as a JVM is present. The JVM is common to all browsers, regardless of flavor (this is not STRICTLY true!).
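The compile-then-run arrangement can be illustrated with a toy: a tiny "compiler" that turns an arithmetic expression into a list of stack-machine instructions, and a tiny "virtual machine" that executes them. The JVM is also a stack-based machine, but this Python sketch is only an analogy; the instruction names are made up and bear no relation to real Java bytecode.

```python
import ast
import operator

# Made-up instruction names for the toy machine.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
FUNCS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def compile_expr(source):
    """Toy 'compiler': translate an arithmetic expression into stack instructions."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Operands first, then the operator (postfix order).
            return emit(node.left) + emit(node.right) + [(OPCODES[type(node.op)], None)]
        raise ValueError("unsupported expression")
    return emit(ast.parse(source, mode="eval").body)


def run(bytecode):
    """Toy 'virtual machine': execute the instructions on a stack."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(FUNCS[op](left, right))
    return stack.pop()


if __name__ == "__main__":
    code = compile_expr("2 * (3 + 4)")
    print(code)       # [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('ADD', None), ('MUL', None)]
    print(run(code))  # 14
```

The compiled instruction list is the portable part: any machine with the `run` interpreter can execute it, which is the same bargain Java makes with its bytecode and the JVM.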

An application is any program that you need a computer for, such as word processing or managing a database. A Java program designed to run inside a web browser is called an applet. An applet can do pretty much anything that a conventional application can do; typical applets include calculators, graphers, simulators, and an MP3 player, and chat rooms, email programs, and spam blockers have also been written in Java.
What about JavaScript? JavaScript was created by Netscape as a simple scripting language that all browsers would recognize. Unlike Java, which is a complete programming language designed for standalone applications, JavaScript is interpreted by the browser itself. JavaScript programs, or scripts, are usually embedded directly in HTML files, and a script runs when the user's browser opens the HTML file.

JavaScript allows the person visiting your website to interact with the site. A simple script lets the visitor select the background color of the page. Another can detect the user's operating system and browser type, then give instructions appropriate to that particular computer. A third type evaluates user input, such as checking a form before it is submitted. Drop-down menus and combo boxes are other things you can do with JavaScript.

(To be continued.)


On a Foray into HTML-2

This post has been edited for accuracy

So, to recap: the Web and the Internet are closely related, and it's reasonable for people to use the terms as synonyms. It's just that the Web is what individual computer users have created with HTML, in the medium of the Internet. The Internet is older; it's the foundation and building material of the Web.

Web pages are created with HTML, which is simply a file type that can be read by a browser. Web pages are "made" of HTML (HyperText Markup Language), a markup language that tells the browsers visiting the site how to display the text and images hosted at the website.

In addition to the HTML files that the browser reads, there are elements that the browser is told to display. Web browsers are designed to "read" (recognize the format and display accurately) JPEG images (*.jpg), GIF images (*.gif), PNG images (*.png), and bitmaps (*.bmp); some can also display TIFF images (*.tif). They can also recognize other types of files, which I'll describe in a moment.
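A browser first reads the HTML and then fetches each file the page tells it to display. This sketch uses Python's standard `html.parser` module to list the images a page refers to, the same first step a browser takes before requesting them; the page itself is a made-up example.

```python
from html.parser import HTMLParser


class ImageFinder(HTMLParser):
    """Collect the image files an HTML page tells the browser to display."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


# A hypothetical page: the text is laid out by the browser, the images are separate files.
PAGE = """<html><body>
<h1>My vacation</h1>
<img src="beach.jpg" alt="Beach">
<p>Text the browser lays out itself.</p>
<img src="logo.gif">
</body></html>"""

if __name__ == "__main__":
    finder = ImageFinder()
    finder.feed(PAGE)
    print(finder.images)  # ['beach.jpg', 'logo.gif']
```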

In addition to HTML files, the above-mentioned image files, and Java or JavaScript files, you can post pretty much any type of file you want on your website. However, to read things like an MS Word document or an Acrobat PDF, visitors need the right software installed on their computers; hence the popularity of Adobe Acrobat. The software for reading PDF files is free; people pay for the software that creates *.pdf files. These files will display either in a new browser window or in a window opened by the computer's operating system (e.g., Windows or Mac OS will launch MS Excel if you open an Excel file at a website).

WHAT ARE SOME OTHER FILE TYPES YOU CAN HAVE?
You can have MPEGs, which are files that are either audio, video, or both. MPEG refers to the Moving Picture Experts Group, a standards committee (like you needed to know that!), and this committee keeps issuing new formats. Most *.mpg files use MPEG-1 or MPEG-2. Microsoft's Windows Media Video (*.wmv) format grew out of a variant of the later MPEG-4 standard; Apple QuickTime (*.mov) is yet another format. These file types can be created by different programs, and they can be played back by freely distributed playback software. Like Adobe Acrobat, the player is usually free, and the computer's operating system must spawn the player for the file to be seen. The file formats are mutually incompatible, although some players can play more than one format.

In addition to these, there is Macromedia Flash and its Flash Player. This works like the others, except that Flash lets one create animated graphics by manipulating objects in the Flash software; it's like MS PowerPoint, with the ability to animate the presentation and upload it to the web. Flash files (*.swf) are typically viewed as an animated graphic within the web page, so it's not usually necessary to spawn a new window for playback. As a result, one can combine animated and non-animated elements in a single page. Also, Flash is very easy to use, in my opinion.
COOL STUFF I NOTICED LATER: Here's a blog post about new features available in the latest release of Flash (hat tip to Wikipedia's Flash entry).
(To be continued)


13 June 2005

On a Foray into HTML

Some terms of art for the web:

Some of you are going to find the technical language used here quite intimidating. A case in point is the jargon associated with web pages, the internet, and so on. The fact that many of these terms have multiple meanings doesn't make things easier, but let us hope this post does.

First, many people surfing the internet may be a little confused by the terms "internet" and "web." These are almost, but not quite, synonyms. The internet is a network of networks that was connected (at least initially) through telephone lines. Computers on it exchange data using a common set of rules called TCP/IP. The internet grew out of the ARPANET, which the Advanced Research Projects Agency (ARPA), a branch of the Department of Defense, launched in 1969; TCP/IP itself was developed during the 1970s. Much later, around 1990, Tim Berners-Lee at CERN developed HTML, a language for describing pages, together with HTTP, the protocol for sending them, so that web browsers could turn data arriving over the network into a displayed page. The first browser was written at the same time, and the National Center for Supercomputing Applications (NCSA) later produced Mosaic. It's easy to see why browsers and HTML had to be invented concurrently: a browser had to be able to translate incoming data into a page that could be displayed, and there had to be a standard that let every browser read every server's pages.
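To make the layering concrete: the Web's HTTP is plain text carried over an internet (TCP/IP) connection. This sketch starts a tiny local web server with Python's standard `http.server` module, then fetches a page by opening a raw TCP socket and typing the HTTP request by hand, as a browser does behind the scenes. The handler and page content are hypothetical, and nothing leaves your own machine.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """A hypothetical one-page web server."""

    def do_GET(self):
        body = b"<html><body><h1>Hello, Web</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_raw(host, port, path="/"):
    """Speak HTTP by hand: open a TCP connection and send a plain-text request."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    response = fetch_raw("127.0.0.1", server.server_port)
    print(response.splitlines()[0])           # HTTP/1.0 200 OK
    print(response.split("\r\n\r\n", 1)[1])   # the HTML a browser would render
    server.shutdown()
    server.server_close()
```

Everything above the blank line is HTTP headers; everything below it is the HTML that a browser would turn into a displayed page.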

At first, the internet was mostly the domain of computer terminals connected to mainframes, running arcane software like FTP, Usenet, and Gopher. I recall having a lot of friends who were familiar with these services and talked about them a lot, and finding it inconceivable that they would ever amount to anything but costly nerd toys. In 1993, however, Mosaic emerged as the first widely popular graphical browser, thereby opening up--in a stroke--the world of interconnected hypertext we know as the "Web."

(To be continued.)
