02 August 2007

XML-3: Entities & Elements

Disclaimer: I am not an expert on this topic, but a student. I am hoping that these notes will be of help or interest to others trying to understand what XML is and how it works. My notes are not as well-composed as I would like, and I've been more interested in correcting errors than in fixing poor composition. Moreover, there's a lot of repetition that I've been trying to prune back.

From my previous post on XML:
Markup languages are about more than merely tags; there are also elements, which include the basic building blocks of a document. [...] In XML [...] everything in the page must belong to an element, and there is a hierarchy of elements. An element may be a listing of some kind, or the body of text, or footnote text, or a title, or salutation. An element is opened by a tag, and must always be closed by one, unless it's an empty element.

Entities are named units of storage in XML. Conceivably, the entire document may be an entity, including associated files defining the XML elements. However, this is a trivial example of an entity. Internal entities are defined in the document and may include something as simple as a symbol, or perhaps a footer. External entities include a uniform resource identifier (URI), which identifies precisely where the content of the entity is found. In some cases, the advantage of using an entity is that it may be used as a variable; changing the value in one place changes it everywhere it appears in the resulting document. Also, internal parameter entities can be used in the associated files to change what is a legal element.
An XML document may begin with a prolog (HTML documents usually have a comparable header). The prolog can hold the XML declaration and a document type declaration, and it is the latter that defines the elements allowed in the body. An element is not something that XML has in addition to tags; tags are used to create elements, designate what they are or do, and carry their attributes (e.g., color, size, image location, conditionality). While HTML documents may, or may not, be organized into elements, XML source documents always have everything organized into elements. Moreover, these elements are nested; so, for example, the entire content of the source document sits inside a single root element (much as everything in an HTML page sits inside <HTML>). Every other element is a descendant of the root; so, for example, the title of the document might be a <TITLE> element nested, or contained wholly inside of, the root element.
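
To make the nesting concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The tiny document and its element names (ARTICLE, TITLE, BODY, P, BR, FOOTNOTE) are made up for illustration and don't come from any real DTD; the outline() helper is likewise my own invention.

```python
import xml.etree.ElementTree as ET

# A made-up XML document: every piece of content sits inside some element,
# and every element sits inside the single root element (ARTICLE here).
SOURCE = """<ARTICLE>
  <TITLE>Entities &amp; Elements</TITLE>
  <BODY>
    <P>An element is opened by a tag and closed by one.</P>
    <BR/>
    <FOOTNOTE>An empty element like BR has no content.</FOOTNOTE>
  </BODY>
</ARTICLE>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SOURCE)
tree_rows = outline(root)

if __name__ == "__main__":
    # Print the hierarchy, indenting each level of nesting.
    for depth, tag in tree_rows:
        print("  " * depth + tag)
```

Running it prints the element hierarchy with one level of indentation per level of nesting, which is a quick way to see that TITLE and BODY are children of the root while P, BR and FOOTNOTE are its grandchildren.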

In a valid XML document, all elements need to be declared, or formally listed, in the document type definition (DTD), which the document type declaration (<!DOCTYPE ...>) either contains or points to. I mention this now because the declaration of entities sheds some light on how they work and why. The DTD indicates such things as which elements are children of other elements and specifies what attributes, or descriptive qualities (like height, width, color, typeface), each particular element has.

Entities are a somewhat elusive concept. Internal entities are defined entirely within the DTD; external entities have some or all of their content outside the document. In an external entity, a URI (not a hyperlink in the HTML sense) points to the externally located content. In practice, entities behave like variables: a name is declared once and its content is substituted wherever the name is referenced.
  • XML applications have five predefined general entity references: &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; ("). They stand for characters that have a special meaning in markup, and are most frequently used to type out illustrative XML code on websites. They appear only in the source code of the file and are invoked with an ampersand ("&") and closed with a semicolon (";").
  • Internal general entities include things like headers or copyright data that must appear in many places in the source document. The author may wish to change the effect of the entity everywhere it appears by editing it in the declaration; for example, by changing the year on the copyright, or including her middle initial in each appearance of her name. They appear only in the source code of the file and are invoked with an ampersand ("&").
  • External general entities are entities that refer to something outside of the document. A common example is parts of a document stored in other files, which may contain markup as well as parsed character data (#PCDATA). They appear only in the source code of the file and are invoked with an ampersand ("&").
  • Internal parameter entities are entities that are both declared and used in the DTD; one can actually incorporate an entity into a declaration. They are invoked with a percent sign ("%"). This allows one to have declarations that invoke some variable.
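
The first two kinds of entity in the list above can be tried out with a short Python sketch, again using the standard library's xml.etree.ElementTree. The NOTE document, its element names, and the entity names author and copyright are all hypothetical. External and parameter entities are left out on purpose: as far as I can tell, this parser does not fetch external content, and I have not found a simple way to demonstrate parameter entities with it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The internal DTD subset (between [ and ]) declares
# two internal general entities; the body references them as &name;.
SOURCE = """<!DOCTYPE NOTE [
  <!ENTITY author "Jane Q. Student">
  <!ENTITY copyright "Copyright 2007 &author;">
]>
<NOTE>
  <CODE>&lt;TITLE&gt; &amp; &quot;quotes&quot; &apos;too&apos;</CODE>
  <BYLINE>&author;</BYLINE>
  <FOOTER>&copyright;</FOOTER>
</NOTE>"""

root = ET.fromstring(SOURCE)

# Predefined entities turn back into the literal characters they stand for.
code = root.find("CODE").text
# Internal general entities are replaced by the text given in their declaration.
byline = root.find("BYLINE").text
footer = root.find("FOOTER").text

if __name__ == "__main__":
    print(code)
    print(byline)
    print(footer)
```

Editing the one declaration of author (say, to add a middle name) changes both the byline and the footer the next time the document is parsed, which is the "variable" behaviour described above.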

The purpose of entities is to allow an author or programmer to construct a document from pieces, including pieces of other documents that may well exist in another domain.

source document: the XML code that constitutes the website. Excludes the declarations and stylesheet.

REFERENCE: XML Tutorial - Entities and Other Components

BOOK: Elliotte Rusty Harold, XML 1.1 Bible, 3rd Edition, Wiley Publishing (2004)
