XML: What's That Spell?

By Mike Hutchison, Tampa PC Users Group
myankee@ij.net


Extensible Markup Language, or XML, is a simplified descendant of Standard Generalized Markup Language (SGML), which is also the "parent" of HTML. XML lets the people working with it define their own elements. For that reason some people call it a meta-language, a language used to describe other languages or, to hear some tell it, vocabularies. HTML does not have this capability; its set of tags is fixed. SGML does, but its roots go back to the 1970s, and it was not designed with a networked, interactive online environment in mind. The intent is for XML to work alongside HTML and SGML rather than to replace them completely.

XML defines a subset of SGML that is meant for use on the web. One early use of XML was Microsoft's Channel Definition Format (CDF), which described the documents to be delivered over the Internet by "push." Push here refers to technology that sends data to users on a regular schedule. A number of vocabularies have been built on XML: Open Software Description (OSD), CDF, Synchronized Multimedia Integration Language (SMIL), Chemical Markup Language (CML), and Mathematical Markup Language (MathML).

An XML document is a plain text file; like HTML, it can be created in Notepad. The language has few rules, and they relate only to syntax. For example, all attribute values in XML must be in quotation marks.

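HTML is more flexible about quotation marks around attribute values. Another rule is that XML is case-sensitive, so <Data> and <data> are different tags. XML also does not ignore "white space" inside an element. The XML declaration is not an element, and XML 1.0 does not strictly require it, but when it is present it must be the very first thing in the document. It looks something like this: <?xml version="1.0"?> .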
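The syntax rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration and are not from any particular application.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Attribute value in quotation marks: accepted.
quoted = '<?xml version="1.0"?><book color="olive">Text</book>'
# Same document without the quotation marks: rejected.
unquoted = '<?xml version="1.0"?><book color=olive>Text</book>'
# Opening and closing tags differ only in case: rejected.
mixed_case = '<?xml version="1.0"?><Book>Text</book>'

print(is_well_formed(quoted))      # True
print(is_well_formed(unquoted))    # False
print(is_well_formed(mixed_case))  # False

# White space inside an element is handed to the application untouched.
note = ET.fromstring("<note>  two  spaces  </note>")
print(repr(note.text))             # '  two  spaces  '
```

An HTML browser would typically display all three of the first documents without complaint; an XML parser stops at the first syntax error.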

Even though you can write XML with Notepad, you probably don't want to write a custom parser for each application you come up with. It also bears mentioning that Microsoft came out with XML Notepad (a simple editor built with the creation of XML files in mind). When Internet Explorer 5 came out, it carried full support for the W3C XML standard and came with a built-in XML parser component. A check of the W3C web site indicates that they are still on the XML 1.0 specification. MSXML 2.0 was Microsoft's primary XML processor; it shipped with IE 5 and was also freely available as a standalone, redistributable file. The W3C says an XML processor is software that can read XML documents and access their structure and content. A parser, by way of review for the writer's benefit, is a program (often part of a compiler) that takes input of various types and breaks it up into parts that other program components can process. The parser is also frequently used to check for proper syntax.
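To make the W3C's description of a processor concrete, here is a small sketch, again in Python with the standard xml.etree.ElementTree parser rather than MSXML. The library document and the outline helper are invented for this example. The parser reads the text, and the program then gets at both the structure (which elements are nested inside which) and the content (the text between the tags).

```python
import xml.etree.ElementTree as ET

# A made-up document; the element names are the author's own choice.
doc = """<?xml version="1.0"?>
<library>
  <book id="1"><title>XML Basics</title></book>
  <book id="2"><title>SGML History</title></book>
</library>"""


def outline(element, depth=0):
    """Return the element names as an indented list, one line per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(doc)               # parsing: text in, tree of elements out

structure = outline(root)               # the document's structure
titles = [t.text for t in root.iter("title")]   # the document's content

print("\n".join(structure))
print(titles)
```

No custom parsing code was needed; the same few calls would work on any well-formed document, which is the point of using an off-the-shelf parser.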

HTML uses markup to describe the structure of a document, while XML markup describes a document's content. HTML and XML both use tags, which are words enclosed in <angle brackets>, and attributes, e.g., color="olive". HTML spells out what each tag means, but XML uses tags only to delimit pieces of data; that is, to tell where a specific piece of data begins and ends. XML does not require a DTD (Document Type Definition). It can be transformed easily using Extensible Stylesheet Language (XSL). One basic concept of XML is that data should be exchanged in the form of documents: some sort of form or document (invoice, purchase order, etc.) that non-techie business folks are familiar with, replete with the data that makes it a useful entity.
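As an illustration of data traveling as a document, the sketch below uses a hypothetical invoice. The element and attribute names (invoice, customer, item, quantity, price) are invented here; XML itself attaches no meaning to them, and it is up to the receiving program to decide what to do with the data the tags delimit. No DTD is involved.

```python
import xml.etree.ElementTree as ET

# A hypothetical invoice; the tags simply mark where each piece of data
# begins and ends.
invoice_xml = """<invoice number="1001">
  <customer>Acme Hardware</customer>
  <item quantity="3" price="4.50">Widget</item>
  <item quantity="1" price="12.00">Gadget</item>
</invoice>"""


def invoice_total(invoice):
    """Add up quantity times price over every item element."""
    return sum(
        int(item.get("quantity")) * float(item.get("price"))
        for item in invoice.findall("item")
    )


invoice = ET.fromstring(invoice_xml)
customer = invoice.findtext("customer")
total = invoice_total(invoice)

print(customer, total)   # Acme Hardware 25.5
```

A different program could read the same document and print a packing list instead; the markup describes the content, not what to do with it.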

XML "data islands", a Microsoft concept, can be added to an HTML document. There are two ways to do this that meet syntax constraints:

The XML data can exist inline, surrounded by XML open and close tags.

<XML ID="XMLID">
<XMLDATA>
<DATA>TEXT</DATA>
</XMLDATA>
</XML>

The XML element can have a SRC attribute, the value of which is the URL for an XML data source.

<XML SRC="http://localhost/xmlFile.xml"></XML>

The XML element is present in the HTML Document Object Model. It appears in the all collection, and the browser treats it as just a regular node. The XML data within the XML element can then be accessed through the XMLDocument property of the XML element.
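Inside Internet Explorer, a script reaches the island through the XMLDocument property. Outside the browser, roughly the same result can be had by pulling the inline island out of the page text and handing it to an XML parser. The following Python sketch does that for the inline example above; the page wrapper and the extract_island helper are assumptions made for this illustration, not part of Microsoft's implementation, and the simple pattern only handles an island written exactly in the form <XML ID="...">.

```python
import re
import xml.etree.ElementTree as ET

# The inline data island from the article, wrapped in a minimal made-up page.
page = """<HTML>
<BODY>
<XML ID="XMLID">
<XMLDATA>
<DATA>TEXT</DATA>
</XMLDATA>
</XML>
</BODY>
</HTML>"""


def extract_island(html, island_id):
    """Return the parsed root element of the island with the given ID,
    or None if the page has no such island."""
    pattern = r'<XML\s+ID="%s"\s*>(.*?)</XML>' % re.escape(island_id)
    match = re.search(pattern, html, re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    # Everything between the XML open and close tags is the island's data.
    return ET.fromstring(match.group(1).strip())


island = extract_island(page, "XMLID")
print(island.tag, island.findtext("DATA"))   # XMLDATA TEXT
```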

One last note: XML is not just for use in web applications. It is also intended to be a cross-platform medium between different computer systems.

Here are some links:

http://www.webdeveloper.com/xml/
http://www.ucc.ie/xml/#spec
http://www.w3.org/XML/1999/XML-in-10-points
http://www.w3.org/XML/


These last two are from the World Wide Web Consortium (W3C), the people who write the Bible on XML, HTML, DOM, DTD, etc.

For a practical demo, don't forget the XML-Search-Amazon-locally on the TPCUG home page.