Reading XML


This chapter describes how XML files can be read into user-defined structures within the Fire language.

XML is increasingly the de facto standard for transferring data between applications. Fire has an xmlread command which parses an XML file and loads its data into a Fire structure.

General Principles

As with all xml data transfer, the recipient of an xml file, in this case a Fire application, is expected to know the schema of the data. Therefore a Fire structure (or class) must be created before the data is read. This structure will contain members corresponding to the data tags in the xml file. Consider the following trivial xml file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<poi> <name>War Memorial</name> <id>6512</id> <location>(25466,71625)</location> <date>18-Dec-1919</date> </poi>

A Fire structure to receive this data might be as follows:

atable mytab
class ~mytab.poi_t {
    string name
    string date
    string id
    string location
}

and the Fire code to create an object of this type (here called warmem) and load the xml data (from a file called warmem.xml) would be:

# Create an instance of the structure
    ~mytab.poi_t warmem
# Read the xml file contents into it
    xmlread  warmem.xml,warmem

This is a very simple case, where all the xml tag data is assumed to be textual data. However, we can vary the structure member types to enable type-coercion. This is often the requirement, enabling the data to be used more intelligently in subsequent Fire commands. In the case above we could have declared the structure as follows:

atable mytab
class ~mytab.poi_t {
    string name
    time date
    numeric id
    point location
}

Unless an xml file is validated against an xml schema (which we will discuss later), the order of elements within a Fire structure does not have to follow the order of tags within the xml file. Data association is done by name, not by position.
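
The name-based association can be illustrated outside Fire. The following is a minimal Python sketch, not Fire code: load_by_name is a hypothetical helper, and a plain dictionary stands in for the Fire structure. The fields are deliberately declared in a different order from the tags in the file.

```python
import xml.etree.ElementTree as ET

XML = (
    "<poi><name>War Memorial</name><id>6512</id>"
    "<location>(25466,71625)</location><date>18-Dec-1919</date></poi>"
)

def load_by_name(xml_text, record):
    """Copy the text of each child tag into the key of the same name.

    Hypothetical helper: tags with no matching key are skipped, and keys
    with no matching tag keep their current value.
    """
    root = ET.fromstring(xml_text)
    for child in root:
        if child.tag in record:
            record[child.tag] = child.text
    return record

# Field order (name, date, id, location) differs from the tag order in the file
poi = load_by_name(XML, {"name": "", "date": "", "id": "", "location": ""})
```

Because the lookup is by key, reordering either the tags or the fields has no effect on the loaded values.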

As expected, all xml comments, i.e. <!-- ... -->, are ignored during parsing.

In addition, DTD and XSL definitions are currently ignored. In short, only data tags are parsed.

Tags and Attributes

XML tags and attributes are treated in the same way when parsed. The following 2 xml files would be interpreted the same way:

<?xml version="1.0" encoding="ISO-8859-1"?>
<poi> <name>War Memorial</name> <id>6512</id> <location>(25466,71625)</location> <date>18-Dec-1919</date> </poi>

<?xml version="1.0" encoding="ISO-8859-1"?>
<poi id="6512" date="18-Dec-1919"> <name>War Memorial</name> <location>(25466,71625)</location> </poi>

Tag and attribute names will map directly to names in the receiving Fire object, i.e. an xml tag or attribute named id will get loaded into the member id of the Fire object. Case is ignored.
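
As an illustration of this equivalence, the following Python sketch (not Fire code; named_values is a hypothetical helper) gathers attributes and child tags into one case-insensitive set of named values. Under that assumption, the two example files yield identical results.

```python
import xml.etree.ElementTree as ET

AS_TAGS = (
    "<poi><name>War Memorial</name><id>6512</id>"
    "<location>(25466,71625)</location><date>18-Dec-1919</date></poi>"
)
AS_ATTRS = (
    '<poi id="6512" date="18-Dec-1919"><name>War Memorial</name>'
    "<location>(25466,71625)</location></poi>"
)

def named_values(xml_text):
    """Collect attributes and child tags of the root into one dict.

    Hypothetical helper: names are lower-cased so that case is ignored.
    """
    root = ET.fromstring(xml_text)
    values = {name.lower(): value for name, value in root.attrib.items()}
    for child in root:
        values[child.tag.lower()] = child.text
    return values

same = named_values(AS_TAGS) == named_values(AS_ATTRS)
```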

Where there is a conflict of name with possible reserved Fire member names, there are 2 ways of getting round this:

Prefix the Fire member name with m_, e.g. m_id, or

Redirect the load via a -tag switch on the receiving structure member.

To illustrate, using the example in the previous section, some xml tag values are here directed into structure members of different names:

class ~mytab.poi_t {
    string name
    time m_date
    numeric id
    point pt,-tag='location'
}

On a subsequent xmlread command, the tag or attribute value date will be loaded into the structure member m_date, and the tag or attribute value location will be loaded into the structure member pt.

If there is no Fire object member for an xml tag or attribute, the xml tag data is discarded. Similarly if there is no xml data for a Fire object member, the Fire object member retains its current value.

Any XML namespace prefix is discarded during name matching.

You may have noticed that neither the Fire structure name poi_t nor the name of the object being loaded warmem matches the outer XML tag poi. This is because by default there is no matching done on this name, so in effect any xml file can be loaded into any Fire type. Of course if no sub-tag names match Fire structure names, nothing will get loaded.

If a specific outer tag name is required, e.g. as an extra integrity check, a -tag switch can be added to the xmlread command, e.g.

xmlread warmem.xml,warmem,-tag='poi'

The result of this will be that if the xml data outer tag is not <poi> .... </poi>, then the xmlread command will fail.

Sometimes, there is a requirement for testing the validity of an xml file before reading its contents. The xmltest function is available for this purpose and performs 2 duties:

it determines whether the file is a bona-fide xml file and will pass the parsing checks,

it returns the value of the outermost tag (the "root" tag) so you can check whether its data is the type you expect.
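
The two duties can be mirrored in a few lines of Python. This is only a sketch of the idea, not the implementation of xmltest: root_tag_or_none is a hypothetical name, and how xmltest actually reports its result is not specified here.

```python
import xml.etree.ElementTree as ET

def root_tag_or_none(xml_text):
    """Return the root tag name if the text parses as xml, else None."""
    try:
        return ET.fromstring(xml_text).tag
    except ET.ParseError:
        # The text is not well-formed xml
        return None

good = root_tag_or_none("<poi><name>War Memorial</name></poi>")
bad = root_tag_or_none("<poi><name>War Memorial</poi>")  # unclosed <name>
```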

Array Values

Often xml data will contain multiple values for a tag, e.g.

<band>
    <name>Led Zeppelin</name>
    <genre>Heavy Metal</genre>
    <member>John Bonham</member>
    <member>John Paul Jones</member>
    <member>Jimmy Page</member>
    <member>Robert Plant</member>
</band>

There is one value for the name tag, but 4 values for the member tag. The following Fire structure would typically be used:

class ~mytab.rockband_t {
    string name
    string genre
    string member[]
}

As you can see, a placeholder array member has been declared of unspecified length. During the load process this length will get extended appropriately (in the case of Led Zeppelin to 4, although it often sounded like more). If no member tags are found, this array will remain with a length 0 after the load.
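
The same behaviour can be sketched in Python (not Fire code), with a list playing the part of the unspecified-length array. The second document, which has no member tags, is a made-up example to show the zero-length case.

```python
import xml.etree.ElementTree as ET

BAND = """<band>
    <name>Led Zeppelin</name>
    <genre>Heavy Metal</genre>
    <member>John Bonham</member>
    <member>John Paul Jones</member>
    <member>Jimmy Page</member>
    <member>Robert Plant</member>
</band>"""

root = ET.fromstring(BAND)
band = {
    "name": root.findtext("name"),
    "genre": root.findtext("genre"),
    # One list entry per <member> tag, in document order
    "member": [m.text for m in root.findall("member")],
}

# A document with no <member> tags leaves the list empty
solo = ET.fromstring("<band><name>Solo Act</name></band>")
no_members = [m.text for m in solo.findall("member")]
```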

Sub Structures

XML data can get very verbose and convoluted when supplying data for complex schemas. To accommodate such schemas it is usually necessary for Fire structures within structures to be defined.

Consider an xml file using a schema for configuring a Tomcat web-server:   
There are many sub-structures within this data, which means many predefined Fire structures must be declared before the data can be loaded properly:  

You will notice that many Fire sub-structures are declared as arrays of unspecified length. This method is used to detect whether the elements have been included in the xml data.

Typically Fire structures do not have a structure value, although their members do. Parent xml tags can have a text value so a specially marked member can be added to a Fire structure to access this parent value. This value can be coerced from a text value into another Fire data type just like the other members. Consider the following snippet of XML:

<employee dept="Accounts" age="32">Joe Kinnear</employee>

This could be read into an object of the following Fire structure:

class employee_t {
    string tag_value,-ptv
    string dept
    numeric age
}

where the tag_value member will be given the value Joe Kinnear. Note how we have used a -ptv (parent text value) switch to tell the xmlread command that this member needs to be processed differently.

The name tag_value is arbitrary and could be a different name if required, e.g.

class employee_t {
    string full_name,-ptv
    string dept
    numeric age
}

An alternative method of setting a structure value is to define a class with inheritance, e.g.

class employee_t,string {
    string dept
    numeric age
}
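
For comparison, the employee snippet above can be read as follows in Python (a sketch, not Fire code). The element's own text corresponds to the parent text value, and the attributes are coerced separately; the dictionary keys simply reuse the member names from the -ptv example.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<employee dept="Accounts" age="32">Joe Kinnear</employee>')

employee = {
    "full_name": elem.text,       # the parent text value
    "dept": elem.get("dept"),     # attribute kept as a string
    "age": int(elem.get("age")),  # attribute coerced to a number
}
```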

Data Coercion

As we have discussed in an earlier section, even though xml data is primarily textual, by declaring Fire types other than type string, data values can be coerced. The allowed destination Fire types are as follows: numeric, point, time and blob, as well as the default type string.

Before looking at these in detail we should consider the xml string values prior to coercion. Text values for attributes, e.g. <name att="value">, pose few problems because the values are quoted. Within a double-quoted value the double quote must be escaped (via &quot;), as must < and & (via &lt; and &amp;); all other characters can be included raw in the value.

However, tag values, e.g. <att>value</att>, usually contain what is known as parsed character data, which means white space is compressed and some characters (< > ' " &) have a special meaning. To get round this, either these characters are name-escaped (via the sequences &lt; &gt; &apos; &quot; or &amp;) or the whole data can be enclosed by the user-friendly sequence <![CDATA[ and ]]>, in which case all characters are treated as raw data and no character escaping is necessary.

As an example, consider 2 ways in which the string &<"hello there">& might be supplied as a tag value in an XML file:

<att>&amp;&lt;&quot;hello there&quot;&gt;&amp;</att> 
<att><![CDATA[&<"hello there">&]]></att>

In both cases, the input is pretty unsightly, but ours not to reason why. That's XML for you.
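
That the two forms carry the same value can be checked with any conforming parser. The following sketch uses Python's standard xml.etree.ElementTree rather than Fire.

```python
import xml.etree.ElementTree as ET

# The same value supplied with name-escaping and with a CDATA section
escaped = ET.fromstring(
    "<att>&amp;&lt;&quot;hello there&quot;&gt;&amp;</att>"
).text
cdata = ET.fromstring('<att><![CDATA[&<"hello there">&]]></att>').text

expected = '&<"hello there">&'
```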

The coercion of string values into other types behaves generally as one would expect.

Coercion to Numeric

Integer, real and floating-point constants and expressions are all permitted, e.g.

<id>45</id>
<total>98.72</total>
<dvalue>4.5e-17</dvalue>
<calc>4 + sqrt(17)</calc>

Coercion to Point

2-D or 3-D point values are permitted, with comma-separated x,y or z ordinates, within parentheses or without, e.g.

<pt>(10,20)</pt>
<pt3d>55,17.3,912</pt3d>

A -p2d switch is available to indicate that point values are 2-dimensional only (no z).
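
The accepted point formats can be sketched as a small Python parser. This is an illustration only: parse_point is a hypothetical helper, and Fire's own parsing rules may differ in detail.

```python
def parse_point(text):
    """Parse '(x,y)', 'x,y', '(x,y,z)' or 'x,y,z' into a tuple of floats."""
    # Drop surrounding white space and optional parentheses
    parts = text.strip().strip("()").split(",")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 ordinates, got {len(parts)}")
    return tuple(float(p) for p in parts)

pt = parse_point("(10,20)")
pt3d = parse_point("55,17.3,912")
```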

Coercion to Time

The usual Fire date and time constants are permitted, e.g.

<last_accessed>9/30/1989, 15:35</last_accessed>
<birthday>14-Jan-1982</birthday>

Coercion to Blob

Blobs are useful when multi-line string values are supplied. Once a value has been coerced into a blob, the blob can be treated as a file. Consider an example xml file which supplies some Fire code to be executed, e.g.

<mydata>
   <width>400</width>
   <height>500</height>
   <code><![CDATA[
args w=numeric, h=numeric
# Draw a window of supplied dimensions
window gw = wgraphic -dim=<w,h>
box gw,-draw
   ]]></code>
</mydata>

and the Fire code to define a Fire structure, read the data and execute the Fire code within it:

atable test
# Define the Fire structure (class)
structure ~test.thing_t {
   numeric width
   numeric height
   blob code
}
# Read the xml file into an instance of one of these
~test.thing_t  obj
xmlread  mydata.xml,obj
# Execute the blob as though it were a macro
exec obj.code(obj.width,obj.height)

Binary blobs can be read by specifying that incoming data is base-64 encoded or hexadecimal. There are 2 switches (-b64 and -hex) for this purpose:

class ~test.myblobs_t {
   blob base64_blob,-b64
   blob hex_blob,-hex
}
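
For reference, the two text encodings decode to bytes as follows. This is a Python sketch of the encodings themselves, not of the -b64 and -hex switches; the sample strings are made-up values that both encode the five bytes of "Hello".

```python
import base64

b64_text = "SGVsbG8="     # base-64 encoding of b"Hello"
hex_text = "48656c6c6f"   # hexadecimal encoding of b"Hello"

from_b64 = base64.b64decode(b64_text)
from_hex = bytes.fromhex(hex_text)
```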

Data Validation

The above examples enabled us to read xml into Fire objects. The xml was pretty simple and, although well-formed, was not validated. Valid documents are those adhering to an xml schema, which defines the layout of an xml document and can also impose constraints on element values within the data.

To validate an xml document against a known schema, we can again use the xmltest function, but this time add an additional "to-be-validated" parameter:

if (xmltest('services.xml',1)) !Document is valid

Fire's xml parser (which is in fact the Xerces parser from the Apache Software Foundation) will now do a much more thorough examination of the document, validating it against a schema (usually an xsd file pointed to by a url in the document root tag). Any data which is non-compliant with the schema will result in failure of the function.

By default Fire has xml validation turned off. To make the xmlread command validate the data before reading it, append a -v command switch.

You may be asking yourself how we can create Fire structures which conform to an xml schema, since in the earlier sections we discussed reading xml documents into Fire structures which we have created manually. Ideally we would like to create Fire structures from a schema without having to know too much about schema definition, or even create Fire structures from an xml document without a schema. The next section discusses this.

Automatic Class Definition

Let us consider an xml document svcs.xml which conforms to an xml schema. You will see that this document refers to its schema by a schemaLocation xml attribute in its root tag. In our case the url to the schema is http://www.xmarc.net/services.xsd. There are different ways for an xml document to reference the schema to which it conforms; this is just one of them.

The schema itself is also an xml file but conventionally with the file extension .xsd.

It would be nice to create the corresponding Fire structure(s) for this schema without having to wade through the xsd code. Well we can, using the xsdread command like this:

xsdread 'http://www.xmarc.net/services.xsd',-of=services.cmd

The command creates a Fire macro (we have called it services.cmd) from the contents of the xsd file.
Note how we can refer directly to the url of the xml schema. There is no need to download it to a local file before processing it.

You will see that the generated macro code contains commands to create several Fire structures, the names of which have been derived from element names within the xml schema. You will also see how various constructs (e.g. attribute { ... }) and member switches (e.g. -tag and -int) have been automatically generated. These extras have no impact on the Fire structures but have the purpose of imposing special behavior on the data when it is read from or written to xml. A full list of these switches can be found here.

The xsdread command creates a macro to be executed as a post-process rather than creating Fire structures directly because typically there is some customization required to the generated structures. Common post-processing tasks include:

renaming the structure member names,

assigning the structure classes to atables,

casting the elements to different types, e.g. string to blob.

The xsdread command is therefore intended to be used as an application development tool, not as a production command.

There are some xml concepts which have no equivalent in the Fire language world, and some xml schema tags which are not processed. For this reason it is not possible to accommodate every aspect of xml schemas, e.g. the <any> tag is ignored, xpath facilities are not supported, and so on. In addition, no validation of the schema itself is performed; Fire assumes that it is valid.

Our generated macro (perhaps after some minor edits) would be executed so that the new Fire structures get added to an atable (since all structure definitions must be members of an atable). This might be done as follows:

atable svcs
scope = svcs
exec services.cmd,-sc

We now have our Fire structures set up and can start reading an xml data file into an object:

~svcs.ServicesType_t mysvcs
xmlread svcs.xml,mysvcs

Here we have created an object (mysvcs) and populated it from the contents of an xml file svcs.xml.

Although in this example we created the Fire structure classes directly from the xml schema file (services.xsd), we could have created it from the xml data file (svcs.xml) instead, e.g.

xsdread svcs.xml,-of=services.cmd

Here Fire will find the url to the schema automatically and then do exactly the same process. This method just means you don't have to manually extract the url of the schema before doing the xsdread.

Automatic Schema-less Class Definition

Can we generate Fire structures automatically without knowing an xml schema?

The answer is yes, sort of. Without a schema, Fire cannot determine element data-types, value constraints or array limitations, but it can hazard a guess.

To get it to do this you throw an xml file at the xsdread command with the -raw switch. If the xml file has no associated schema, the tags are analysed and structures are created but all elements are assumed to be of type string. Differentiation is made between xml elements and attributes, and informed guesses are made as to which elements are optional and which are obligatory. Invariably you would edit the resulting macro, making your own assumptions about element types etc.

Consider what would happen if the svcs.xml data file used in the previous section had no associated schema:

xsdread svcs.xml,-raw,-of=services_raw.cmd

This would produce a very different macro from that produced from a known schema.
Many of the xml schema elements do not appear, since they do not occur in the data file. Also some elements are assumed to be obligatory since they are always present, but in fact the schema defines them to be optional.

So the answer to the original question is: Yes you can generate structures from raw xml but the results are strictly bare bones.
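
To give a feel for the kind of guesswork involved, here is a Python sketch of one plausible heuristic. It is not the algorithm used by xsdread, which is not described here: guess_structure and the sample document are hypothetical. A child tag is guessed to be an array if it repeats within any one parent, and optional if any instance of the parent lacks it.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def guess_structure(xml_text):
    """Guess, per parent tag, which child tags repeat and which are optional."""
    root = ET.fromstring(xml_text)
    # parent tag -> one Counter of child tags per instance of that parent
    occurrences = defaultdict(list)
    for elem in root.iter():
        occurrences[elem.tag].append(Counter(child.tag for child in elem))

    guesses = {}
    for parent, counters in occurrences.items():
        child_tags = set().union(*counters)
        if not child_tags:
            continue  # leaf element: nothing to describe
        guesses[parent] = {
            tag: {
                "array": any(c[tag] > 1 for c in counters),
                "optional": any(c[tag] == 0 for c in counters),
            }
            for tag in child_tags
        }
    return guesses

# Hypothetical sample: the second <service> has no <port>
SAMPLE = (
    "<services>"
    "<service><name>a</name><port>1</port><port>2</port></service>"
    "<service><name>b</name></service>"
    "</services>"
)
guesses = guess_structure(SAMPLE)
```

With only one data file to go on, such guesses can easily be wrong: an element that happens to appear in every instance looks obligatory even if the schema says otherwise.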

Manual XML Analysis

So far we have read whole XML files directly into Fire objects, but sometimes there is a need to pick isolated values from XML files, or you may wish to interrogate an XML file to inspect its element hierarchy.

We have made available a function xmlobj, which lets us examine an xml element (together with its child elements and attributes) via a component identifier, e.g.

component myxml = xmlobj('myfile.xml')

This creates a component object named myxml from the top-level element tag in the xml file, which we can then interrogate for information, e.g.

tell myxml.attribute_names

This will give us the names of all the attributes in the element.

To get the values of the attributes there is a member function, e.g.

tell myxml.attribute_svalue('title')

Sub-elements may be examined by creating more components from the top-level component, e.g.

component sub = myxml.child('item')

This provides access to the first sub-element named 'item', thereby enabling us to traverse down the hierarchy. There is also a function to traverse back up:

if (sub.parent == myxml) tell 'This is a second level element'

The full element hierarchy is therefore accessible.

A requirement might be to find all elements with a particular name within an xml file, and get a particular attribute value for each.
The following achieves this:

# Get a handle on the top-level element
   component fc=xmlobj('props.xml')
# Get a handle on all elements tagged '<projection>'
   component elems[] = fc.find('projection',1); # 2nd parameter means recurse down the hierarchy
# Get the value of the 'name' attribute for each
   string names[elems.alength]
   for i=1,elems.alength {
      names[i] = elems[i].attribute_svalue('name')
   }
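
For readers more familiar with other xml toolkits, the same search is shown below in Python using xml.etree.ElementTree. This is an analogue for comparison, not Fire code, and the contents of props.xml are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for props.xml
PROPS = """<props>
    <projection name="UTM30"/>
    <layer>
        <projection name="OSGB"/>
    </layer>
</props>"""

root = ET.fromstring(PROPS)

# iter() walks the whole hierarchy, like the recursive find above
names = [elem.get("name") for elem in root.iter("projection")]
```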

The full range of properties and functions available to xml components can be found here.

