Yes indeed, I forgot the XML parsers that are part of bigger libraries:
- POCO XML Parser: Has a DOM- and a SAX-like parser and seemingly supports DTD validation.
- Boost Property Tree: Wraps RapidXML (described above), so not really interesting.
Out of your other links, Bare XML is the only real XML parser, and it looks pretty basic. It supports some sort of validation, but it's probably a non-standard format. CodeSynthesis XSD is a code generator, VTD-XML is a language based on XML.
Now while we're still talking about XML parsers and the different features they support, I think it's about time to give you an overview of some of the existing XML technologies. In fact I originally wanted to explain them before discussing XML parsers, but the topic came up in the thread, so we have to catch up on this now.
XML (wikipedia)
Description: XML is a text-based markup language used to exchange structured data between programs and users in a human-readable format. Its best-known relative is probably HTML, whose XML-conformant variant is XHTML. In XML, objects are represented by "elements". Each element can have attributes and sub-elements.
Syntax:
Element: <Element /> or <Element></Element>
Subelement: <Element><Subelement /></Element>
Attribute: <Element attribute="value" />
Processing instruction: <?name instruction ?>
Comment: <!-- comment -->
Usage in Orxonox: We use XML elements to represent our objects and attributes to represent their values. Processing instructions are currently not used (even though our lua tags follow the same syntax, they are processed by our own parser before the XML parser even touches the document). I recommend parsing the lua tags with the XML parser in the future, even though that will require some heavy changes (but lua in general is part of a future post in this thread).
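To make the syntax above a bit more concrete, here is a minimal sketch that parses a small document with Python's standard xml.etree.ElementTree module. The level snippet and all element and attribute names in it are made up for illustration and are not taken from a real Orxonox level. Note that this parser silently skips comments and processing instructions by default, so only elements and attributes show up in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical level snippet: all names below are assumptions for illustration.
LEVEL = """<?xml version="1.0"?>
<Level name="demo">
  <!-- objects are elements, their values are attributes -->
  <?lua print("not evaluated by the XML parser") ?>
  <SpaceShip name="player" position="0,0,0">
    <Weapon type="laser" />
  </SpaceShip>
  <Asteroid position="10,0,5" />
</Level>
"""

root = ET.fromstring(LEVEL)


def describe(element, depth=0):
    """Return one line per element, indented by its nesting depth."""
    lines = ["  " * depth + f"{element.tag} {element.attrib}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


# The comment and the processing instruction are dropped by the default parser.
print("\n".join(describe(root)))
```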
CDATA (wikipedia)
Description: CDATA is short for "character data" and allows arbitrary text or data to be inserted into XML. Because such text usually conflicts with the XML syntax, a CDATA section marks a block in which characters that normally have a meaning in XML (like <, > and &) are treated as literal text, so they don't have to be escaped. The only sequence that can't appear inside is ]]>, because it ends the section.
Syntax: <![CDATA[arbitrary text or code]]>
Usage in Orxonox: Could be used to define lua scripts in XML that should not be parsed while loading the level, but rather be passed to a "script" object. Could also be an alternative way to define templates.
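A small sketch of the "script in XML" idea, again with Python's xml.etree.ElementTree. The Script element and its name attribute are hypothetical. The point is only that the parser hands the CDATA content back as plain text, including the <, > and & characters that would otherwise need escaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Script" element; the Lua line inside is just sample text.
DOC = """<Script name="greeting"><![CDATA[
if a < b and c > d then print("a & b") end
]]></Script>"""

script = ET.fromstring(DOC)

# The CDATA section arrives as ordinary element text, not as child elements.
lua_source = script.text.strip()
print(lua_source)
```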
XML Namespace (wikipedia)
Description: Like in C++, namespaces in XML allow different elements and attributes to share the same name as long as they live in different namespaces, so they remain distinguishable.
Syntax: <Element namespace:attribute="value" />
There are different ways to use a namespace, but the prefix form is probably the most convenient if it is used in different places across the XML file. A prefix has to be declared before use by binding it to a URI with an xmlns:namespace="..." attribute, typically on the root element.
Usage in Orxonox: I see a possible usage for this technique in the editor: we could save meta-data for each object that is only relevant for the editor. For example, you could make some objects invisible in the editor (only in the editor, not in the game), which would be saved with an editor:visible="false" attribute. By using the "editor" namespace, the attribute doesn't collide with the "visible" attribute of BaseObject.
XPath (wikipedia)
Description: XPath is a way to gather information from an XML file. With XPath you can retrieve an element with given attributes, list all elements of some type, get the number of subelements of an element, the value of an attribute of some element, etc.
Syntax: (no guarantee for correct syntax since I neither know XPath nor have a way to test it, but you should get a pretty good impression)
/a/b/text() # returns the text-value of "b" (which is a sub-element of "a")
/a/b[@attribute="value"] # returns all elements of type "b" which are sub-elements of "a" and have "attribute" with value "value"
count(/a/*) # counts all sub-elements of "a"
Usage in Orxonox: Wherever we reference an object in XML, we currently use its name (which is not a completely satisfying solution). This happens when we use a template, connect events and listeners, and probably on some more occasions. XPath could be a convenient and powerful way to reference an object or a set of objects. Its query language allows us to select specific objects, e.g. all objects within a given range of positions.
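The three queries can be tried out with Python's xml.etree.ElementTree, with one caveat: that module only implements a limited subset of XPath. It has no text() step and no count() function, so the sketch below reads .text and uses len() instead. The sample document is made up.

```python
import xml.etree.ElementTree as ET

DOC = """<a>
  <b attribute="value">first</b>
  <b attribute="other">second</b>
  <c />
</a>"""

a = ET.fromstring(DOC)  # "a" is the root, so paths below are relative to it

# /a/b/text(): no text() step in ElementTree, so read .text of the first match
first_text = a.find("b").text

# /a/b[@attribute="value"]: attribute predicates are supported
matches = a.findall("b[@attribute='value']")

# count(/a/*): no count() function, so count the wildcard matches instead
child_count = len(a.findall("*"))

print(first_text, [m.text for m in matches], child_count)
```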
DTD Validation (wikipedia)
Description: Validation in general is used to define how an XML file should be built. It defines the allowed elements and their attributes, as well as sub-elements, valid ranges for values, and more. DTD is the oldest and most widely supported (in XML libraries) way to do this.
Syntax: <!ELEMENT html (head, body)> # defines that the "html" element must contain exactly one "head" followed by one "body" sub-element
Usage in Orxonox: While validation in general may be useful, I prefer XML Schema. Though, since XML parsers that support DTD are way more common, we may also use DTD if necessary. For more details about validation see the next point about XML Schema.
XML Schema Validation (wikipedia)
Description: XML Schema is basically the same as DTD, but defined in XML (DTD is not XML itself). It's the recommended standard of the W3C and supports recent developments of XML like namespaces (DTD does not). Apart from that, XML Schema is as powerful as DTD.
Syntax:
<xs:element name="Country">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="FR" />
      <xs:enumeration value="DE" />
      <xs:enumeration value="UK" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Usage in Orxonox: An XML Schema (or DTD) could be useful in Orxonox because it lets us define constraints, allowed values, default values, etc. I'm not really interested in the validation itself (I assume we only generate valid XML files anyway, either manually or with an editor), but the XML Schema can be used to create XML files in an external XML editor. Through the XML Schema, the editor can list all allowed elements and attributes. Additionally, the definition of default values is very powerful in combination with XPath (see above): If an attribute has a default value, but is not explicitly listed in the XML file, it will not be recognized by XPath. In combination with an XML Schema, however, XPath would also recognize the (implicit) default values.
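As a rough illustration of how an editor could list the allowed values, the sketch below reads the enumeration from the schema fragment shown above. This is not schema validation: Python's standard library has no XML Schema validator, so the code simply treats the schema as an ordinary XML document. The helper name allowed_values is made up, and the fragment is wrapped in an xs:schema root so that it is well-formed on its own.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The fragment from above, wrapped in a schema root so it parses standalone.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="Country">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="FR" />
        <xs:enumeration value="DE" />
        <xs:enumeration value="UK" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def allowed_values(schema_root, element_name):
    """Hypothetical helper: list the enumeration values of a named element."""
    ns = {"xs": XS}
    path = (
        f"xs:element[@name='{element_name}']"
        "/xs:simpleType/xs:restriction/xs:enumeration"
    )
    return [e.get("value") for e in schema_root.findall(path, ns)]


schema_root = ET.fromstring(SCHEMA)
countries = allowed_values(schema_root, "Country")
print(countries)
```

An editor could offer such a list in a drop-down. Real validation and default-value handling would need a schema-aware library.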
XInclude (wikipedia)
Description: Allows inclusion of other files (both XML files and plain text files).
Syntax: <xi:include href="file.xml"/>
Usage in Orxonox: Currently we include files with lua; in the future we could do this with XInclude. A few posts above I explained that I want to remove the need for explicit inclusions, so this may be a bit surprising, but that statement was made regarding templates. We could still include commonly used code, like the weapon settings we include today. XInclude is superior to a lua include because it needs no pre-processing.
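Here is a sketch of how an include of shared weapon settings might be resolved, using Python's xml.etree.ElementInclude module. To keep the example self-contained, the "files" live in a dictionary and are served by a custom loader. The file name, the element names and the loader name are all assumptions. The default loader would read from disk instead.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared file, kept in memory instead of on disk.
FILES = {
    "weapons.xml": '<WeaponSettings><Weapon type="laser" /></WeaponSettings>',
}


def load_from_memory(href, parse, encoding=None):
    """Custom loader: return an element for parse="xml", raw text otherwise."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


LEVEL = """<Level xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="weapons.xml" />
</Level>"""

level = ET.fromstring(LEVEL)

# Replaces each xi:include element in place with the loaded content.
ElementInclude.include(level, loader=load_from_memory)

print(ET.tostring(level, encoding="unicode"))
```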
Some less important XML features that could still be useful:
XML Base (wikipedia) allows the definition of a "root" path; all other paths (e.g. for XInclude) are then relative to this base.
xml:id defines a unique id for each object (or rather ensures that a given id is unique if present)
More techniques that are probably not useful in Orxonox, but just to give you some insight:
XSLT (wikipedia) is a transformation language which transforms an XML document into another XML document following some rules. We will probably never use this, but one possible case could be to define GUIs in an Orxonox-specific XML format and then transform them into CEGUI-conformant files.
XPointer (wikipedia) is a technique to address some parts of an XML document. I'm not entirely sure how this differs from XPath, but it's probably used to reference objects in other files. If at some point we know more about whether and how to use XPath in Orxonox, we can probably also tell if XPointer is of any interest to us.
XLink (wikipedia) is used to link from one XML file to another. This is basically the same as links in HTML documents, but generalized for XML. This is probably only interesting if XML is presented to a user in some sort of a browser (like HTML).
XQuery (wikipedia) is basically a query language like SQL, but instead of retrieving information from a database it does the same with an XML file. Additionally, it's a functional programming language.