XML & DTD Syntax

Quick Introduction to XML Syntax

To ensure that all readers of the ALFS DTD Book get as much as possible from its contents, it is necessary to provide a quick introduction to the concepts of XML and DTD syntax.

[Note] Note

This introduction provides very few examples. This book is written using an XML DTD called DocBook XML, so for an example of XML, just look at the book's source. Since this book documents an XML DTD, look at the rest of the book's contents for examples of DTD syntax.

To begin, here are some basic rules of XML :

  • XML documents use a self-describing and simple-to-use syntax.

  • All XML elements must have a closing tag. With XML, it is illegal to omit the closing tag. An element with no content may instead use the self-closing form, such as <br/>.

  • XML tags are case sensitive.

  • All XML elements must be properly nested. Improper nesting of tags makes no sense to XML.

  • All XML documents must have a root element. In other words, all XML documents must contain a single tag pair to define a root element.

  • Attribute values must always be quoted. With XML, it is illegal to omit quotation marks around attribute values.

  • XML parsers pass all whitespace in element content through to the application, even whitespace that is considered non-significant.

  • The ampersand [ & ] symbol is reserved. XML uses it to begin an entity reference, so it cannot appear by itself in text.
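The rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the element names (profile, stage) and the is_well_formed helper are made up for illustration and are not taken from the ALFS DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text`, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets, one per rule from the list above.
samples = {
    "closed and nested": '<profile><stage name="one">text</stage></profile>',
    "missing closing tag": "<profile><stage>text</profile>",
    "case mismatch": "<Stage>text</stage>",
    "improper nesting": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<stage name=one/>",
    "bare ampersand": "<a>fish & chips</a>",
    "escaped ampersand": "<a>fish &amp; chips</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label:22} {'accepted' if ok else 'rejected'}")
```

Only the first and last snippets are accepted; each of the others breaks exactly one rule.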

Standard Entities

As mentioned in the last section, the ampersand symbol cannot be used by itself. There is a set of standard character references that stand in for symbols you may want to place inside an XML file. You write them using the character's decimal value on the ASCII chart (more generally, its Unicode code point), and they work without being declared in a DTD. Here is a good list :

  • Less-Than [ < ] : "&#60;"

  • Greater-Than [ > ] : "&#62;"

  • Ampersand [ & ] : "&#38;"

  • Apostrophe [ ' ] : "&#39;"

  • Quote [ " ] : "&#34;"

  • Non Breaking Space (a forced space) : "&#160;" (note that "&#32;" is an ordinary space)

  • Emdash : "&#8212;" (the plain-text stand-in [ -- ] is two hyphens : "&#045;&#045;")

Generally, you can assume that amp, lt, gt, apos, and quot are predefined, because XML itself defines these five. You can always use the numeric references above instead. Also, if you ever need to create your own, you know how to do it now.
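As a quick check of how a parser treats these references, here is a small sketch using Python's standard xml.etree.ElementTree module. The sample element name is arbitrary. It also shows that decimal 32 is an ordinary space while decimal 160 is the non-breaking space, and that two "&#045;" references produce two hyphens rather than a single dash character.

```python
import xml.etree.ElementTree as ET

def decoded(body):
    """Parse <sample>body</sample> and return the decoded text content."""
    return ET.fromstring(f"<sample>{body}</sample>").text

# Numeric character references need no declaration.
numeric = decoded("&#60; &#62; &#38; &#39; &#34;")

# The five entity names that XML itself predefines.
named = decoded("&lt; &gt; &amp; &apos; &quot;")

plain_space = decoded("&#32;")       # ordinary space, U+0020
forced_space = decoded("&#160;")     # non-breaking space, U+00A0
two_hyphens = decoded("&#045;&#045;")
real_dash = decoded("&#8212;")       # the single dash character U+2014

print(repr(numeric), repr(named))
```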

XML Elements and Attributes

XML is designed to hold any kind of information. This information is stored in elements. Elements are the basic building blocks of XML and are represented in an XML document as tag pairs. Attributes provide a mechanism to further define or classify an element.

Elements have relationships with other elements in a document. Some are parents and some are children, and a child element must always appear inside its parent element. As mentioned in the last section, an XML document must have a root element. Think of this as the ultimate parent element: it opens before and closes after every other element, so all elements and sub-elements (children) reside inside it.

An element can have parsed content, mixed content, simple content, or empty content, and it can carry attributes.
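To make the parent and child relationship concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The document and its names (profile, stage, command, and the name and id attributes) are invented for illustration and are not the real ALFS elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <profile> is the root, <stage> is its child,
# and the two <command> elements are children of <stage>.
doc = """<profile name="demo">
  <stage id="one">
    <command>make</command>
    <command>make install</command>
  </stage>
</profile>"""

root = ET.fromstring(doc)
stage = root.find("stage")

# Iterating over an element yields its direct children in document order.
child_tags = [child.tag for child in stage]
commands = [child.text for child in stage]

print(root.tag, root.get("name"))   # root element and one of its attributes
print(stage.get("id"), child_tags, commands)
```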

XML elements must follow these naming rules :

  • Names can contain letters, numbers, and other characters

  • Names must not start with a number or punctuation character (an underscore is allowed)

  • Names must not start with the letters xml (or XML or Xml ...)

  • Names cannot contain spaces
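One way to try these naming rules is to hand candidate names to a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the names are arbitrary examples. Note one assumption worth stating: names beginning with xml are reserved by the XML specification, but this parser does not reject them, so that rule has to be followed by convention.

```python
import xml.etree.ElementTree as ET

def name_accepted(name):
    """Return True if the parser accepts <name/> as a complete document."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

candidates = ["stage_1", "_stage", "stage-one.v2", "1stage", "-stage", "my stage", "xmlstage"]
accepted = {name: name_accepted(name) for name in candidates}

for name, ok in accepted.items():
    print(f"{name!r:16} {'accepted' if ok else 'rejected'}")
```

The name "my stage" is rejected because the parser reads it as an element called my followed by an attribute called stage that has no value.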

Quick Introduction to DTD Syntax

Once an XML document is written, it is generally a good idea to validate the elements used in the document against a known DTD. The Document Type Definition is the mechanism with which one validates the content of a well-formed XML document.

XML DTD files contain :

  • Element declarations and definitions : Elements are declared and defined with their relationships in the DTD file.

  • Attribute declarations and definitions : Element classes or attributes are declared and defined in the DTD file.

  • Entities : Entities work like variables inside a DTD file or XML document. They can hold any kind of data.

  • PCDATA : PCDATA is Parsed Character DATA. PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.

  • CDATA : CDATA is Character DATA. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
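The difference between parsed and unparsed character data can be seen in a short sketch using Python's standard xml.etree.ElementTree module. The element names (note, parsed, literal, em) and the greeting entity are invented for illustration, and a CDATA section is used here to stand in for text that the parser leaves alone.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares one entity, used like a variable.
doc = """<!DOCTYPE note [
  <!ENTITY greeting "Hello">
]>
<note>
  <parsed>&greeting; uses <em>markup</em></parsed>
  <literal><![CDATA[&greeting; uses <em>markup</em>]]></literal>
</note>"""

root = ET.fromstring(doc)
parsed = root.find("parsed")
literal = root.find("literal")

# Parsed text: the entity is expanded and <em> becomes a child element.
print(repr(parsed.text), [child.tag for child in parsed])

# CDATA section: the same characters are kept as plain text, with no children.
print(repr(literal.text), len(literal))
```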

DTD Element Declaration

Elements are declared in the DTD file using a simple but strict syntax. There are five ways to define an element :

  • EMPTY : When an element is declared with the EMPTY keyword, it means that the element will not hold any information. This is generally used for special tags like <br/>.

  • ANY : When an element is declared with the ANY keyword, it means that the element can contain any information that the author wants it to. This is generally a special case.

  • Character Data : When an element is declared with (#PCDATA), it will hold parsed character data as described above. CDATA is not available as an element content type; it is used as an attribute type.

  • With Children : When an element is declared with the names of other elements in it, this defines a parent-child relationship. Look in the DTD for the child element names, which are in turn defined in one of the ways listed here.

  • Mixed : Some combination of the above four. Generally this is character data mixed with children.
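The kinds of declaration listed above can be read back from a DTD with Python's standard xml.parsers.expat module, whose ElementDeclHandler callback receives each declaration's content model as a tuple of (type, quantifier, name, children). The DTD below is a made-up sketch, not the ALFS DTD, and the classification labels are chosen for this example.

```python
from xml.parsers import expat

# Hypothetical internal DTD subset with one declaration of each kind.
DTD_DOC = """<!DOCTYPE demo [
  <!ELEMENT demo  (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para  (#PCDATA | em)*>
  <!ELEMENT em    (#PCDATA)>
  <!ELEMENT br    EMPTY>
  <!ELEMENT misc  ANY>
]>
<demo/>"""

kinds = {}

def on_element_decl(name, model):
    """Classify one <!ELEMENT ...> declaration by its content model type."""
    ctype, _quant, _name, kids = model
    if ctype == expat.model.XML_CTYPE_EMPTY:
        kinds[name] = "EMPTY"
    elif ctype == expat.model.XML_CTYPE_ANY:
        kinds[name] = "ANY"
    elif ctype == expat.model.XML_CTYPE_MIXED:
        # (#PCDATA) alone has no child names; mixed content lists some.
        kinds[name] = "mixed" if kids else "character data"
    else:
        kinds[name] = "children"

parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DTD_DOC, True)

for name, kind in kinds.items():
    print(f"{name:6} {kind}")
```

The parser used here does not validate the document against the DTD; it only reports what the declarations say.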

When an element is declared with children, the declaration also defines how often the children can be used inside an XML document and the order in which they are allowed to appear. There are five ways that child elements can be defined in a DTD file :

  • One Occurrence Only : Example : Element: <search_replace>. The child elements of <search_replace> -- <file>, <find>, and <replace> -- can only be used once. Notice that there are no symbols after any of the child element names. This is the identifier.

  • Minimum of One Occurrence : Example : Element: <permissions>. One of the child elements of <permissions>, <name>, must be used a minimum of once, but can also be used many times. Notice the plus [ + ] symbol after the name. This is the identifier.

  • Zero or More Occurrences : Example : Element: <download>. One of the child elements of <download>, <url>, can be used zero or many times. Notice the asterisk [ * ] symbol after the name. This is the identifier.

  • Zero or One Occurrence : Example : Element: <download>. One of the child elements of <download>, <digest>, can be used zero or one time only. Notice the question mark [ ? ] symbol after the name. This is the identifier.

  • Either / Or Occurrences : Example : Element: <execute>. Only one of the two child elements of <execute> -- <param> or <prefix> -- can be used. Notice the pipe [ | ] symbol in between the two elements. This is the identifier.
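The occurrence symbols can be read back from a DTD with Python's standard xml.parsers.expat module. The declarations below are a simplified sketch built only from the descriptions in the list above; the real ALFS declarations contain more children and may differ. The SYMBOL table and the children dictionary are names chosen for this example.

```python
from xml.parsers import expat

# Hypothetical declarations modelled on the five cases described above.
DTD_DOC = """<!DOCTYPE alfs [
  <!ELEMENT search_replace (file, find, replace)>
  <!ELEMENT permissions    (name+)>
  <!ELEMENT download       (url*, digest?)>
  <!ELEMENT execute        (param | prefix)>
]>
<alfs/>"""

# Map expat's quantifier constants back to the DTD symbols.
SYMBOL = {
    expat.model.XML_CQUANT_NONE: "",    # exactly once
    expat.model.XML_CQUANT_PLUS: "+",   # one or more
    expat.model.XML_CQUANT_REP: "*",    # zero or more
    expat.model.XML_CQUANT_OPT: "?",    # zero or one
}

children = {}

def on_element_decl(name, model):
    """Rebuild a compact child list such as 'url*,digest?' from the model."""
    ctype, _quant, _name, kids = model
    separator = "|" if ctype == expat.model.XML_CTYPE_CHOICE else ","
    children[name] = separator.join(kid[2] + SYMBOL[kid[1]] for kid in kids)

parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DTD_DOC, True)

for name, spec in children.items():
    print(f"{name:15} {spec}")
```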

DTD Attribute Declaration

As mentioned above, attributes can help to define "classes" of elements. Attributes are defined with types and values. The list below gives the ten attribute types defined by XML 1.0, followed by the reserved xml: prefix :

  • CDATA : The value is Character Data.

  • (en1|en2|...) : The value is an enumerated list.

  • ID : The value is a unique id.

  • IDREF : The value is the id of another element.

  • IDREFS : The value is a list of other ids.

  • NMTOKEN : The value is a valid XML name token (like a name, but it may also start with a digit).

  • NMTOKENS : The value is a list of valid XML name tokens.

  • ENTITY : The value is an entity.

  • ENTITIES : The value is a list of entities.

  • NOTATION : The value is a name of a notation.

  • xml: : Not a separate type, but a reserved prefix for attributes that XML itself predefines, such as xml:lang and xml:space.

There are four value options :

  • Value : The default value of the attribute surrounded by quotes [ " " ]. Example : Element : <alfs>.

  • #IMPLIED : The attribute is optional. Example : Element : <alfs>.

  • #REQUIRED : The attribute is required when the element is used. Example : Element: <execute>.

  • #FIXED : A fixed value. Used with the Value option. Example : Element : <alfs>.
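The types and value options can be read back from an ATTLIST declaration with Python's standard xml.parsers.expat module, whose AttlistDeclHandler callback receives the element name, attribute name, type, default value, and a required flag. The attributes below (version, id, mode, schema, command) are invented for illustration and are not the real attributes of <alfs> or <execute>.

```python
from xml.parsers import expat

# Hypothetical attribute declarations, one per value option.
DTD_DOC = """<!DOCTYPE alfs [
  <!ATTLIST alfs
      version  CDATA        "1.0"
      id       ID           #IMPLIED
      mode     (fast|safe)  "safe"
      schema   CDATA        #FIXED "demo">
  <!ATTLIST execute
      command  CDATA        #REQUIRED>
]>
<alfs/>"""

def describe(default, required):
    """Turn expat's (default, required) pair back into the DTD value option."""
    if default is None:
        return "#REQUIRED" if required else "#IMPLIED"
    return f'#FIXED "{default}"' if required else f'"{default}"'

decls = {}

def on_attlist_decl(elname, attname, type_, default, required):
    decls[(elname, attname)] = (type_, describe(default, required))

parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DTD_DOC, True)

for (elname, attname), (type_, option) in decls.items():
    print(f"{elname}/{attname}: type={type_} value={option}")
```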

DOCTYPE and SYSTEM Declarations

The DOCTYPE declaration is used in an XML document to tell the XML parser which DTD should be referenced. This declaration is helpful when you have a separate DTD file outside of the XML document. See Element : <alfs> for an example.

The SYSTEM keyword is used in an XML document to provide a way to split up a file into smaller chunks. Many XML files can be quite large, and having all the information inside one file can be unwieldy. SYSTEM appears inside an ENTITY declaration, where it names an external file whose contents replace each reference to that entity. See Element : <alfs> for an example.
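As a rough illustration of both declarations, the sketch below uses Python's standard xml.parsers.expat module to report what a DOCTYPE line and a SYSTEM entity point at. The file names ALFS.dtd and chapter5.xml and the entity name chapter5 are placeholders, not the names used by the real <alfs> example. The parser only records the file names here; it does not open either file.

```python
from xml.parsers import expat

# Hypothetical document: the DOCTYPE names an external DTD file, and the
# internal subset declares an entity whose content lives in another file.
DOC = """<!DOCTYPE alfs SYSTEM "ALFS.dtd" [
  <!ENTITY chapter5 SYSTEM "chapter5.xml">
]>
<alfs/>"""

seen = {}

def on_doctype(name, system_id, public_id, has_internal_subset):
    # Root element name and the DTD file the document refers to.
    seen["doctype"] = (name, system_id)

def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
    # For an external entity, value is None and system_id holds the file name.
    seen["entity"] = (name, value, system_id)

parser = expat.ParserCreate()
parser.StartDoctypeDeclHandler = on_doctype
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)

print(seen["doctype"])
print(seen["entity"])
```

In the document body, a reference written as &chapter5; would mark the place where the contents of the external file belong.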