This chapter describes how XML files can be read into user-defined structures within the Fire language.
XML is increasingly the de facto standard for transferring data between applications. Fire has an xmlread command, which parses an XML file and loads its data into a Fire structure.
As with all XML data transfer, the recipient of an XML file (in this case a Fire application) is expected to know the schema of the data. Therefore a Fire structure (or class) must be created before the data is read. This structure will contain members corresponding to the data tags in the XML file. Consider the following trivial XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<poi>
  <name>War Memorial</name>
  <id>6512</id>
  <location>(25466,71625)</location>
  <date>18-Dec-1919</date>
</poi>
A Fire structure to receive this data might be as follows:
atable mytab
class ~mytab.poi_t {
  string name
  string date
  string id
  string location
}
and the Fire code to create an object of this type (here called warmem) and load the xml data (from a file called warmem.xml) would be:
# Create an instance of the structure
~mytab.poi_t warmem
# Read the xml file contents into it
xmlread warmem.xml,warmem
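As a rough illustration only, the Python sketch below mimics what xmlread does in this string-only case: each child tag of the root element is copied into the member with the same name. The PoiT class and load_strings helper are hypothetical stand-ins written for this sketch; they are not part of Fire, and Fire's own (Xerces-based) loader is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class PoiT:
    # Hypothetical mirror of ~mytab.poi_t: every member is a string
    name: str = ""
    date: str = ""
    id: str = ""
    location: str = ""


POI_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<poi>
  <name>War Memorial</name>
  <id>6512</id>
  <location>(25466,71625)</location>
  <date>18-Dec-1919</date>
</poi>"""


def load_strings(xml_bytes: bytes, obj) -> None:
    """Copy each child tag's text into the member of the same name."""
    root = ET.fromstring(xml_bytes)
    member_names = {f.name for f in fields(obj)}
    for child in root:
        if child.tag in member_names:
            setattr(obj, child.tag, (child.text or "").strip())


warmem = PoiT()
load_strings(POI_XML, warmem)
print(warmem.name, warmem.date)  # War Memorial 18-Dec-1919
```

Note that the order of tags in the file does not matter here, because matching is by name.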
This is a very simple case, in which all the XML tag data is treated as text. However, we can vary the structure member types to enable type coercion. This is often required, as it allows the data to be used more intelligently in subsequent Fire commands. In the case above we could have declared the structure as follows:
atable mytab
class ~mytab.poi_t {
  string name
  time date
  numeric id
  point location
}
Unless an XML file is validated against an XML schema (which we will discuss later), the order of members within a Fire structure does not have to follow the order of tags within the XML file. Data association is done by name, not by position.
All XML comments, i.e. <!-- ... -->, are ignored during parsing.
In addition, DTD and XSL definitions are currently ignored. In short, only data tags are parsed.
XML tags and attributes are treated in the same way when parsed. The following 2 xml files would be interpreted the same way:
<?xml version="1.0" encoding="ISO-8859-1"?>
<poi>
  <name>War Memorial</name>
  <id>6512</id>
  <location>(25466,71625)</location>
  <date>18-Dec-1919</date>
</poi>

<?xml version="1.0" encoding="ISO-8859-1"?>
<poi id="6512" date="18-Dec-1919">
  <name>War Memorial</name>
  <location>(25466,71625)</location>
</poi>
Tag and attribute names will map directly to names in the receiving Fire object, i.e. an xml tag or attribute named id will get loaded into the member id of the Fire object. Case is ignored.
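The equivalence of tags and attributes, and the case-insensitive matching, can be modelled in Python by merging both into one lookup keyed by lower-cased name. This is a sketch under that assumption, not Fire's implementation; collect_values is a hypothetical helper, and the second document uses a capitalised <Name> tag purely to show that case is ignored.

```python
import xml.etree.ElementTree as ET

TAG_FORM = b"""<poi>
  <name>War Memorial</name>
  <id>6512</id>
  <location>(25466,71625)</location>
  <date>18-Dec-1919</date>
</poi>"""

ATTR_FORM = b"""<poi id="6512" date="18-Dec-1919">
  <Name>War Memorial</Name>
  <location>(25466,71625)</location>
</poi>"""


def collect_values(xml_bytes: bytes) -> dict:
    """Merge attributes and child-tag text into one dict keyed by lower-case name."""
    root = ET.fromstring(xml_bytes)
    values = {name.lower(): value for name, value in root.attrib.items()}
    for child in root:
        values[child.tag.lower()] = (child.text or "").strip()
    return values


print(collect_values(TAG_FORM) == collect_values(ATTR_FORM))  # True
```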
Where a name conflicts with a reserved Fire member name, there are two ways round this:
Prefix the Fire member name with m_, e.g. m_id, or
Redirect the load via a -tag switch on the receiving structure member.
To illustrate, using the example in the previous section, some xml tag values are here directed into structure members of different names:
class ~mytab.poi_t {
  string name
  time m_date
  numeric id
  point pt,-tag='location'
}
On a subsequent xmlread command, the value of the date tag or attribute will be loaded into the structure member m_date, and the value of the location tag or attribute will be loaded into the structure member pt.
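The two redirection rules can be sketched in Python as a name-resolution step run before each value is stored. MEMBERS, TAG_OVERRIDES and resolve_member are hypothetical names; the assumption that a member with a -tag override no longer matches its own name is made for this sketch only.

```python
import xml.etree.ElementTree as ET

# Hypothetical model of ~mytab.poi_t with m_date and pt,-tag='location'
MEMBERS = ["name", "m_date", "id", "pt"]
TAG_OVERRIDES = {"pt": "location"}  # member -> xml name, like -tag='location'


def resolve_member(xml_name: str):
    """Return the member that should receive xml_name, or None to discard it."""
    key = xml_name.lower()
    for member in MEMBERS:
        override = TAG_OVERRIDES.get(member)
        if override is not None:
            if override.lower() == key:   # explicit -tag redirection
                return member
        elif member.lower() in (key, "m_" + key):  # direct or m_-prefixed match
            return member
    return None


POI = b"""<poi id="6512" date="18-Dec-1919">
  <name>War Memorial</name>
  <location>(25466,71625)</location>
</poi>"""

root = ET.fromstring(POI)
pairs = list(root.attrib.items()) + [(c.tag, (c.text or "").strip()) for c in root]
loaded = {}
for xml_name, value in pairs:
    member = resolve_member(xml_name)
    if member is not None:
        loaded[member] = value
print(loaded)
```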
If there is no Fire object member for an xml tag or attribute, the xml tag data is discarded. Similarly if there is no xml data for a Fire object member, the Fire object member retains its current value.
Any XML namespace prefix is discarded during name matching.
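A short Python sketch of these three rules (namespace prefix dropped, unmatched tags discarded, unmatched members left alone). ElementTree reports a prefixed tag such as p:name as "{uri}name", so the sketch strips everything up to the closing brace. The URI urn:example:poi, the extra <colour> tag and the pre-set "unset" value are invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the namespace: ElementTree writes p:tag as '{uri}tag'."""
    return tag.rsplit("}", 1)[-1]


DOC = b"""<p:poi xmlns:p="urn:example:poi">
  <p:name>War Memorial</p:name>
  <p:colour>grey</p:colour>
</p:poi>"""

# Hypothetical object whose members already hold values
current = {"name": "", "id": "unset"}

root = ET.fromstring(DOC)
for child in root:
    key = local_name(child.tag).lower()
    if key in current:  # <colour> has no member, so it is discarded
        current[key] = (child.text or "").strip()

print(current)  # {'name': 'War Memorial', 'id': 'unset'}
```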
You may have noticed that neither the Fire structure name poi_t nor the name of the loaded object warmem matches the outer XML tag poi. This is because, by default, no matching is done on this name, so in effect any XML file can be loaded into any Fire type. Of course, if no sub-tag names match Fire structure member names, nothing will get loaded.
If a specific outer tag name is required, e.g. as an extra integrity check, a -tag switch can be added to the xmlread command, e.g.
xmlread warmem.xml,warmem,-tag='poi'
With this switch, if the outermost tag of the XML data is not <poi> ... </poi>, the xmlread command will fail.
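The outer-tag check is easy to model in Python: parse, compare the root tag, and fail on a mismatch. read_with_root_check is a hypothetical helper; ignoring the namespace and comparing case-insensitively are assumptions carried over from the member-matching rules above, not documented behaviour of -tag.

```python
import xml.etree.ElementTree as ET


def read_with_root_check(xml_bytes: bytes, expected_root: str) -> ET.Element:
    """Parse xml_bytes, failing unless the outermost tag is expected_root."""
    root = ET.fromstring(xml_bytes)
    local = root.tag.rsplit("}", 1)[-1]          # ignore any namespace
    if local.lower() != expected_root.lower():   # assumed case-insensitive
        raise ValueError(f"expected <{expected_root}>, found <{local}>")
    return root


read_with_root_check(b"<poi><id>6512</id></poi>", "poi")  # accepted
try:
    read_with_root_check(b"<band><name>x</name></band>", "poi")
except ValueError as err:
    print(err)  # expected <poi>, found <band>
```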
Sometimes there is a need to test the validity of an XML file before reading its contents. The xmltest function is available for this purpose and performs two duties:
it determines whether the file is a bona-fide xml file and will pass the parsing checks,
it returns the value of the outermost tag (the "root" tag) so you can check whether its data is the type you expect.
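The two duties of xmltest (without the validation parameter) resemble the following Python sketch, which returns the root tag of a well-formed document and None otherwise. xml_root_tag is a hypothetical name, it takes bytes rather than a file name, and it checks well-formedness only, not conformance to a schema.

```python
import xml.etree.ElementTree as ET


def xml_root_tag(xml_bytes: bytes):
    """Return the root tag name if the document parses, else None."""
    try:
        return ET.fromstring(xml_bytes).tag
    except ET.ParseError:
        return None


print(xml_root_tag(b"<poi><id>6512</id></poi>"))  # poi
print(xml_root_tag(b"<poi><id>6512</poi>"))       # None (mismatched tag)
```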
Often XML data will contain multiple values for the same tag, e.g.
<band>
  <name>Led Zeppelin</name>
  <genre>Heavy Metal</genre>
  <member>John Bonham</member>
  <member>John Paul Jones</member>
  <member>Jimmy Page</member>
  <member>Robert Plant</member>
</band>
There is one value for the name tag, but 4 values for the member tag. The following Fire structure would typically be used:
class ~mytab.rockband_t {
  string name
  string genre
  string member[]
}
As you can see, an array member of unspecified length has been declared as a placeholder. During the load process its length is extended as needed (in the case of Led Zeppelin to 4, although it often sounded like more). If no member tags are found, the array will have length 0 after the load.
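In Python, the open-ended array member can be modelled as a list that grows by one entry per matching tag and stays empty when the tag never appears. RockbandT and load_band are hypothetical names used only for this sketch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class RockbandT:
    name: str = ""
    genre: str = ""
    member: list = field(default_factory=list)  # like `string member[]`


BAND = b"""<band>
  <name>Led Zeppelin</name>
  <genre>Heavy Metal</genre>
  <member>John Bonham</member>
  <member>John Paul Jones</member>
  <member>Jimmy Page</member>
  <member>Robert Plant</member>
</band>"""


def load_band(xml_bytes: bytes) -> RockbandT:
    band = RockbandT()
    for child in ET.fromstring(xml_bytes):
        text = (child.text or "").strip()
        current = getattr(band, child.tag, None)
        if isinstance(current, list):
            current.append(text)            # array members grow per occurrence
        elif current is not None:
            setattr(band, child.tag, text)  # scalar members are overwritten
        # tags with no matching member are discarded
    return band


zep = load_band(BAND)
print(len(zep.member))  # 4
```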
XML data can get very verbose and convoluted when supplying data for complex schemas. To accommodate such schemas it is usually necessary to define Fire structures nested within other structures.
You will notice that many Fire sub-structures are declared as arrays of unspecified length. This is used to detect whether the corresponding elements have been included in the XML data: an array that still has length 0 after the load indicates that the element was absent.
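Because the chapter's own nested example is not reproduced here, the following Python sketch uses an invented person/address schema to show the idea: an optional sub-structure is held in a list, so an empty list after loading means the element was missing. PersonT, AddressT and load_person are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class AddressT:
    street: str = ""
    town: str = ""


@dataclass
class PersonT:
    name: str = ""
    # Optional sub-structure as an open-ended array:
    # still empty after loading means <address> was absent
    address: list = field(default_factory=list)


def load_person(xml_bytes: bytes) -> PersonT:
    person = PersonT()
    for child in ET.fromstring(xml_bytes):
        if child.tag == "address":
            addr = AddressT()
            for sub in child:
                if hasattr(addr, sub.tag):
                    setattr(addr, sub.tag, (sub.text or "").strip())
            person.address.append(addr)
        elif child.tag == "name":
            person.name = (child.text or "").strip()
    return person


with_addr = load_person(
    b"<person><name>Ann</name>"
    b"<address><street>1 High St</street><town>Leeds</town></address></person>"
)
without = load_person(b"<person><name>Bob</name></person>")
print(len(with_addr.address), len(without.address))  # 1 0
```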
Typically Fire structures do not have a structure value, although their members do. Parent xml tags can have a text value so a specially marked member can be added to a Fire structure to access this parent value. This value can be coerced from a text value into another Fire data type just like the other members. Consider the following snippet of XML:
<employee dept="Accounts" age="32">Joe Kinnear</employee>
This could be read into an object of the following Fire structure:
class employee_t {
  string tag_value,-ptv
  string dept
  numeric age
}
where the tag_value member will be given the value Joe Kinnear. Note how we have used a -ptv (parent text value) switch to tell the xmlread command that this member needs to be processed differently.
The name tag_value is arbitrary and could be a different name if required, e.g.
class employee_t {
  string full_name,-ptv
  string dept
  numeric age
}
An alternative method of setting a structure value is to define a class with inheritance, e.g.
class employee_t,string {
  string dept
  numeric age
}
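In Python terms (a hypothetical model, not Fire's API), the -ptv member simply receives the element's own text, while the other members are filled from the attributes, with age coerced to a number:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class EmployeeT:
    tag_value: str = ""  # plays the role of the -ptv member
    dept: str = ""
    age: float = 0.0     # numeric member, coerced from text


elem = ET.fromstring(b'<employee dept="Accounts" age="32">Joe Kinnear</employee>')
emp = EmployeeT(
    tag_value=(elem.text or "").strip(),  # the element's own (parent) text
    dept=elem.get("dept", ""),
    age=float(elem.get("age", "0")),
)
print(emp)  # EmployeeT(tag_value='Joe Kinnear', dept='Accounts', age=32.0)
```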
As we have discussed in an earlier section, even though xml data is primarily textual, by declaring Fire types other than type string, data values can be coerced. The allowed destination Fire types are as follows: numeric, point, time and blob, as well as the default type string.
Before looking at these in detail, we should consider the XML string values prior to coercion. Text values for attributes, e.g. <name att="value">, pose few problems because the values are quoted: apart from the double quote, the less-than sign and the ampersand, which must be escaped (as &quot;, &lt; and &amp;), all characters can be included raw in the value.
However, tag values, e.g. <att>value</att>, usually contain what is known as parsed character data, which means white space is compressed and some characters (< > ' " &) have a special meaning. To get round this, either these characters are escaped using the entity references &lt; &gt; &apos; &quot; and &amp;, or the whole value can be enclosed by the sequences <![CDATA[ and ]]>, in which case all characters are treated as raw data and no escaping is necessary.
As an example, consider 2 ways in which the string &<"hello there">& might be supplied as a tag value in an XML file:
<att>&amp;&lt;&quot;hello there&quot;&gt;&amp;</att>
<att><![CDATA[&<"hello there">&]]></att>
In both cases, the input is pretty unsightly, but ours not to reason why. That's XML for you.
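Any conforming XML parser decodes both spellings to the same string. The Python sketch below checks this with the standard library, and also shows that xml.sax.saxutils.escape produces the entity form (the double quote is only escaped when requested via its entities argument).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ENTITY_FORM = b"<att>&amp;&lt;&quot;hello there&quot;&gt;&amp;</att>"
CDATA_FORM = b'<att><![CDATA[&<"hello there">&]]></att>'

# Both spellings parse to the same text value
from_entities = ET.fromstring(ENTITY_FORM).text
from_cdata = ET.fromstring(CDATA_FORM).text
print(from_entities == from_cdata, from_entities)  # True &<"hello there">&

# escape() handles & < > by default; add " explicitly
print(escape('&<"hello there">&', {'"': "&quot;"}))
```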
The coercion of string values into other types generally behaves as one would expect.
For numeric members, integer, real and floating-point constants and expressions are all permitted, e.g.
<id>45</id>
<total>98.72</total>
<dvalue>4.5e-17</dvalue>
<calc>4 + sqrt(17)</calc>
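To show what accepting "constants and expressions" involves, here is a Python sketch of a safe numeric coercion that walks the parsed expression tree instead of calling eval. It supports only + - * /, unary minus and sqrt; that operator set is an assumption for the sketch, and Fire's own expression grammar is certainly richer.

```python
import ast
import math
import operator

# Operators and functions this sketch accepts (an assumed subset)
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_FUNCS = {"sqrt": math.sqrt}


def coerce_numeric(text: str) -> float:
    """Evaluate a numeric constant or simple expression taken from tag text."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and len(node.args) == 1
                and not node.keywords):
            return _FUNCS[node.func.id](walk(node.args[0]))
        raise ValueError(f"unsupported numeric text: {text!r}")

    return walk(ast.parse(text.strip(), mode="eval"))


for raw in ["45", "98.72", "4.5e-17", "4 + sqrt(17)"]:
    print(raw, "->", coerce_numeric(raw))
```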
For point members, 2-D or 3-D values are permitted, with comma-separated x,y or x,y,z ordinates, with or without enclosing parentheses, e.g.
<pt>(10,20)</pt>
<pt3d>55,17.3,912</pt3d>
A -p2d switch is available to indicate that point values are 2-dimensional only (no z).
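A Python sketch of point coercion with optional parentheses. The p2d flag stands in for the -p2d switch; rejecting a third ordinate when it is set is an assumption, since the text does not say whether Fire rejects or ignores a z value.

```python
def coerce_point(text: str, p2d: bool = False) -> tuple:
    """Parse '(x,y)', 'x,y' or 'x,y,z' (parentheses optional) into floats."""
    body = text.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    coords = tuple(float(part) for part in body.split(","))
    allowed = (2,) if p2d else (2, 3)
    if len(coords) not in allowed:
        raise ValueError(f"unexpected number of ordinates in {text!r}")
    return coords


print(coerce_point("(10,20)"))      # (10.0, 20.0)
print(coerce_point("55,17.3,912"))  # (55.0, 17.3, 912.0)
```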
For time members, the usual Fire date and time constants are permitted, e.g.
<last_accessed>9/30/1989, 15:35</last_accessed>
<birthday>14-Jan-1982</birthday>
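Time coercion can be sketched in Python by trying a list of formats in turn. TIME_FORMATS is an assumed subset chosen to match the two examples; it is not Fire's list of accepted constants. Note that an impossible date such as 9/31/1989 is rejected rather than silently adjusted.

```python
from datetime import datetime

# Formats tried in order (an assumed subset of Fire's time constants)
TIME_FORMATS = ["%d-%b-%Y", "%m/%d/%Y, %H:%M", "%d-%b-%Y %H:%M"]


def coerce_time(text: str) -> datetime:
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised time value: {text!r}")


print(coerce_time("14-Jan-1982"))       # 1982-01-14 00:00:00
print(coerce_time("9/30/1989, 15:35"))  # 1989-09-30 15:35:00
```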
Blobs are useful when multi-line string values are supplied. Once a value has been coerced into a blob, the blob can be treated as a file. Consider an example XML file which supplies some Fire code to be executed, e.g.
<mydata>
  <width>400</width>
  <height>500</height>
  <code><![CDATA[
args w=numeric, h=numeric
# Draw a window of supplied dimensions
window gw = wgraphic -dim=<w,h>
box gw,-draw
]]></code>
</mydata>
and the Fire code to define a Fire structure, read the data and execute the Fire code within it:
atable test
# Define the Fire structure (class)
structure ~test.thing_t {
  numeric width
  numeric height
  blob code
}

# Read the xml file into an instance of one of these
~test.thing_t obj
xmlread mydata.xml,obj

# Execute the blob as though it were a macro
exec obj.code(obj.width,obj.height)
Binary blobs can be read by specifying that incoming data is base-64 encoded or hexadecimal. There are 2 switches (-b64 and -hex) for this purpose:
class ~test.myblobs_t {
  blob base64_blob,-b64
  blob hex_blob,-hex
}
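The decoding implied by -b64 and -hex corresponds to standard base-64 and hexadecimal decoding of the tag text. The Python sketch below decodes an invented <myblobs> document in which both tags encode the bytes of "Hello"; it illustrates the encodings, not Fire's loader.

```python
import base64
import xml.etree.ElementTree as ET

DOC = b"""<myblobs>
  <base64_blob>SGVsbG8=</base64_blob>
  <hex_blob>48656c6c6f</hex_blob>
</myblobs>"""

root = ET.fromstring(DOC)
b64_bytes = base64.b64decode(root.findtext("base64_blob").strip())  # like -b64
hex_bytes = bytes.fromhex(root.findtext("hex_blob").strip())        # like -hex
print(b64_bytes, hex_bytes)  # b'Hello' b'Hello'
```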
The above examples enabled us to read XML into Fire objects. The XML was fairly simple and was not validated against a schema. Valid documents are those adhering to an XML schema, which defines the layout of an XML document and can also impose constraints on element values within the data. (A well-formed document, by contrast, merely obeys the syntax rules of XML.)
To validate an xml document against a known schema, we can again use the xmltest function, but this time add an additional "to-be-validated" parameter:
if (xmltest('services.xml',1)) !Document is valid
Fire's xml parser (which is in fact the Xerces parser from the Apache Software Foundation) will now do a much more thorough examination of the document, validating it against a schema (usually an xsd file pointed to by a url in the document root tag). Any data which is non-compliant with the schema will result in failure of the function.
By default Fire has xml validation turned off. The xmlread command can perform validation before reading the data by appending a -v command switch.
You may be asking yourself how we can create Fire structures which conform to an xml schema, since in the earlier sections we discussed reading xml documents into Fire structures which we have created manually. Ideally we would like to create Fire structures from a schema without having to know too much about schema definition, or even create Fire structures from an xml document without a schema. The next section discusses this.
You will see that the generated macro code contains commands to create several Fire structures, whose names have been derived from element names within the XML schema. You will also see how various constructs (e.g. attribute { ... }) and member switches (e.g. -tag and -int) have been automatically generated. These extras do not alter the Fire structures themselves; their purpose is to impose special behavior on the data when it is read from or written to XML. A full list of these switches can be found here.
The xsdread command creates a macro to be executed as a post-process rather than creating Fire structures directly because typically there is some customization required to the generated structures. Common post-processing tasks include:
renaming the structure member names,
assigning the structure classes to atables,
casting the elements to different types, e.g. string to blob.
The xsdread command is therefore intended as an application development tool rather than a production command.
There are some XML concepts which have no equivalent in the Fire language, and some XML schema tags are not processed. For this reason it is not possible to accommodate every aspect of XML schemas: for example, the <any> tag is ignored, XPath facilities are not supported, and so on. In addition, no validation of the schema itself is performed; Fire assumes that it is valid.
Our generated macro (perhaps after some minor edits) would be executed so that the new Fire structures get added to an atable (since all structure definitions must be members of an atable). This might be done as follows:
atable svcs
scope = svcs
exec services.cmd,-sc
We now have our Fire structures set up and can start reading an XML data file into an object:
~svcs.ServicesType_t mysvcs
xmlread svcs.xml,mysvcs
Here we have created an object (mysvcs) and populated it from the contents of an xml file svcs.xml.
Although in this example we created the Fire structure classes directly from the XML schema file (services.xsd), we could have created them from the XML data file (svcs.xml) instead, e.g.
xsdread svcs.xml,-of=services.cmd
Here Fire will find the url to the schema automatically and then do exactly the same process. This method just means you don't have to manually extract the url of the schema before doing the xsdread.
Can we generate Fire structures automatically without knowing an xml schema ?
The answer is yes, sort of. Without a schema, Fire cannot determine element data-types, value constraints or array limitations, but it can hazard a guess.
To get it to do this you throw an xml file at the xsdread command with the -raw switch. If the xml file has no associated schema, the tags are analysed and structures are created but all elements are assumed to be of type string. Differentiation is made between xml elements and attributes, and informed guesses are made as to which elements are optional and which are obligatory. Invariably you would edit the resulting macro, making your own assumptions about element types etc.
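The kind of guessing described here can be sketched in Python. This is an assumed heuristic, not Fire's algorithm: attributes and child tags become string members, a tag that repeats within one instance becomes an array, and a tag missing from some instances is guessed to be optional. The <services> sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def infer_members(instances: list) -> dict:
    """Guess member declarations from sibling elements sharing one tag."""
    present_in = Counter()  # how many instances contain each child tag
    max_repeat = Counter()  # most occurrences of a child tag in one instance
    attributes = set()
    for inst in instances:
        attributes.update(inst.attrib)
        per_instance = Counter(child.tag for child in inst)
        present_in.update(per_instance.keys())
        for tag, n in per_instance.items():
            max_repeat[tag] = max(max_repeat[tag], n)

    members = {name: "string, attribute" for name in sorted(attributes)}
    for tag in present_in:
        kind = "string[]" if max_repeat[tag] > 1 else "string"
        status = "obligatory" if present_in[tag] == len(instances) else "optional"
        members[tag] = f"{kind}, {status}"
    return members


DOC = b"""<services>
  <service id="1"><name>Mail</name><port>25</port><port>587</port></service>
  <service id="2"><name>Web</name></service>
</services>"""

root = ET.fromstring(DOC)
print(infer_members(root.findall("service")))
```

As the text says, the result is only a starting point; the element types in particular would normally be edited by hand.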
So far we have read whole XML files directly into Fire objects, but sometimes there is a need to pick isolated values from XML files, or you may wish to interrogate an XML file to inspect its element hierarchy.
We have made available a function xmlobj, which lets us examine an xml element (together with its child elements and attributes) via a component identifier, e.g.
component myxml = xmlobj('myfile.xml')
This creates a component object named myxml from the top-level element tag in the xml file, which we can then interrogate for information, e.g.
tell myxml.attribute_names
This will give us the names of all the attributes in the element.
To get the values of the attributes there is a member function, e.g.
tell myxml.attribute_svalue('title')
Sub-elements may be examined by creating more components from the top-level component, e.g.
component sub = myxml.child('item')
This provides access to the first sub-element named 'item', thereby enabling us to traverse down the hierarchy. There is also a function to traverse back up:
if (sub.parent == myxml) tell 'This is a second level element'
The full element hierarchy is therefore accessible.
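For readers who know Python, the same kind of traversal with the standard xml.etree.ElementTree module looks like the sketch below. The comments map each step to the xmlobj calls above; the mapping is by analogy only, and the <catalog> document is invented. ElementTree elements keep no parent link, so the sketch builds one.

```python
import xml.etree.ElementTree as ET

DOC = b"""<catalog title="Spring" year="2024">
  <item code="A1"><label>Tent</label></item>
  <item code="B2"><label>Stove</label></item>
</catalog>"""

root = ET.fromstring(DOC)
# Build a child -> parent map, since ElementTree stores no parent pointer
parent_of = {child: parent for parent in root.iter() for child in parent}

print(list(root.attrib))  # ['title', 'year']  ~ myxml.attribute_names
print(root.get("title"))  # Spring             ~ myxml.attribute_svalue('title')
sub = root.find("item")   # first <item>       ~ myxml.child('item')
print(sub.get("code"))    # A1
if parent_of[sub] is root:  # ~ sub.parent == myxml
    print("This is a second level element")
```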
A requirement might be to find all elements with a particular name within an xml file, and get a particular attribute value for each.
The following code achieves this:
# Get a handle on the top-level element
component fc=xmlobj('props.xml')
# Get a handle on all elements tagged '<projection>'
component elems[] = fc.find('projection',1); # 2nd parameter means recurse down the hierarchy
# Get the value of the 'name' attribute for each
string names[elems.alength]
for i=1,elems.alength {
  names[i] = elems[i].attribute_svalue('name')
}
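An equivalent search in Python's ElementTree, shown for comparison with an invented props document: iter() walks the whole subtree (the analogue of the recursive second parameter), while findall() with a plain tag name looks only at direct children. Missing attributes come back as an empty string here by choice of default.

```python
import xml.etree.ElementTree as ET

DOC = b"""<props>
  <projection name="UTM"/>
  <group>
    <projection name="Lambert"/>
    <projection/>
  </group>
</props>"""

root = ET.fromstring(DOC)

# Recursive search, in document order
names = [elem.get("name", "") for elem in root.iter("projection")]
print(names)  # ['UTM', 'Lambert', '']

# Direct children only
print([elem.get("name", "") for elem in root.findall("projection")])  # ['UTM']
```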
The full range of properties and functions available to xml components can be found here.