XmlPL is a programming language which provides built-in features for XML processing integrated with a C-like syntax. Among these features are XML Path Expressions and XML Statements.
XmlPL grew out of frustration with the inefficiencies and overcomplicated nature of existing methods for XML manipulation such as XSLT, XQuery, and API approaches like DOM and SAX. XmlPL overcomes many of the problems with these approaches by treating XML as a native data Type, integrating Path Expressions with a C-like syntax, and compiling rather than interpreting, to create an intuitive and efficient XML processing language.
This is an informal document describing XmlPL's syntax in detail. It is intended to be a reference for those who wish to develop programs using XmlPL. XmlPL's library Functions are not discussed here. Furthermore, not many examples are provided here. Examples and additional documentation can be found at http://www.xmlpl.org/doc-overview.html.
This document is an initial draft. Therefore, portions will likely change as XmlPL stabilizes. However, the language has been in development and real world use since 2003, so changes to the language syntax should be minor. If you encounter errors in this document please email docs@xmlpl.org.
This section describes the conventions used in this document.
Topics beyond the scope of this document are generally linked to resources on the Internet which explain the topic in question in more detail. Because of the nature of the Internet there is no guarantee that referenced information will be the same or even still exist at the time you are reading this document. If you encounter dead or inappropriate links please inform the author by email to docs@xmlpl.org.
The language grammar is listed throughout this document in boxes such as the one below:
Example grammar box
The grammar is described using the same variant of EBNF notation used in the W3C's XML specification. A description of this notation can be found in the Notation section of the XML specification.
In addition the '.' character is used to match any character in the ASCII character set.
Lexical grammar tokens are written in all capital letters to distinguish them from parser tokens.
The XmlPL compiler is under active development and is currently in the alpha phase. For this reason there are some inconsistencies between the language specification and the actual implementation. As the compiler progresses these differences will disappear and should be completely eradicated by the time XmlPL reaches a 1.0 release. In the meantime, sections of this document which are either in flux or are not yet implemented as written are colored like the text in this section. If you have questions about the actual implementation please email the author at support@xmlpl.org.
All text in this document is copyright (C) 2006, Cauldron Development LLC and available under the terms of the GNU Free Documentation License. The terms of this license can be found online at http://www.gnu.org/copyleft/fdl.html or may be obtained by written request to: Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
The remainder of this document is divided into six major parts.
The section Lexical Structure describes how XmlPL Program text is broken up into tokens which are used to define Program structure. The section Type System describes data Types in XmlPL and how they are used.
The last four sections, Declarations, Statements, Expressions and Native Block, describe the structure of an XmlPL program. Declarations are the topmost structure and may contain Statements which in turn may contain Expressions. The Native Block provides the means to access the underlying language to which XmlPL is translated and may be used as either a Declaration or a Statement.
This section describes the Lexical structure of an XmlPL Program.
Throughout this document whitespace within the grammar is ignored and can be assumed to be allowed between tokens. However, in this section whitespace is named explicitly. Unless otherwise noted you can assume that whitespace is not allowed between lexical tokens.
Names in XmlPL are case sensitive and are used to identify Functions and Variables and in XML Statements and Path Expressions.
There are two types of identifiers or Names in XmlPL: those used for Variable and Function names and those used in XML Element, Attribute, Processing instruction and Path names. Although XmlPL for the most part unifies these types of Names, there are some differences.
To remain compatible with XML, both types of Names allow the '-' character, which is not allowed in languages like C, C++ and Java. This can cause some confusion in XmlPL. An identifier such as x-y does not mean x minus y as it can in other languages. However, accidentally typing x-y when x minus y was the intent will cause a compile time error if the identifier x-y is not found. Using '-' in identifier Names is not recommended.
Additionally, the '.' character is allowed in Names. However, in Function and Variable names it has a special meaning. The '.' character is used as a scope operator. Generally, its use is not necessary, but when a Function or Variable reference is ambiguous, i.e. it occurs in more than one of the imported libraries, then it is necessary to disambiguate the reference by prepending the full library path.
The final difference is that XmlPL Keywords are not valid Function and Variable names; however, they are allowed in XML and Path Expressions. This makes it possible to, for example, create an XML element with the Name "element".
Keywords are identifiers that have a special meaning in XmlPL. They are used to identify XmlPL Statements, Declarations, Built-in types and some Constant values. Keywords are not allowed as Function or Variable names.
Literal decimal, hexadecimal and binary numbers are supported in XmlPL. Decimal numbers can be Integers or Reals. Integers are 32-bit signed numbers. Reals are 64-bit IEEE double precision floating point numbers.
XmlPL does not aim to be a completely general purpose programming language so a large variety of number formats and sizes are not provided. If these features are needed, those parts of the Program should be written in a more appropriate language and linked with XmlPL via a library or Native blocks.
Binary and hexadecimal Integers are supported by prefixing the number with '0b' and '0x' respectively.
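As a rough illustration of the literal forms described above, here is a minimal sketch in Python rather than XmlPL (Python happens to use the same '0b' and '0x' prefixes). The function name `parse_int_literal` is hypothetical, and rejecting values outside the 32-bit signed range is an assumption made for the example, not documented XmlPL behavior.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def parse_int_literal(text: str) -> int:
    """Parse decimal, '0b' binary or '0x' hexadecimal literal text."""
    if text.startswith("0b"):
        value = int(text[2:], 2)
    elif text.startswith("0x"):
        value = int(text[2:], 16)
    else:
        value = int(text, 10)
    # Assumption: values that do not fit a 32-bit signed Integer are rejected.
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{text} does not fit in a 32-bit signed integer")
    return value

ten = parse_int_literal("0b1010")
two_fifty_five = parse_int_literal("0xff")
forty_two = parse_int_literal("42")
```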
Strings in XmlPL are UTF8 encoded. This means that US-ASCII characters are only 8 bits in length and will function just as they do in C, C++ and Java, but Unicode characters can be used where necessary. However, beware that UTF8 characters may be as long as 4 bytes. XmlPL doesn't support a character data Type so this is not a big problem. In the future a byte Type will likely be supported. This leaves it up to the user and library Functions to deal with UTF8 encoding when necessary without overly complicating the language or wasting large amounts of system memory (for the western world at least).
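The variable width of UTF8 can be checked with a short Python sketch (Python, not XmlPL; the helper name `utf8_length` is made up for this example). It encodes four sample characters and records how many bytes each one needs.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "A",                   # US-ASCII letter
    "e-acute": "\u00e9",        # Latin-1 range
    "euro": "\u20ac",           # Basic Multilingual Plane
    "emoji": "\U0001F600",      # outside the Basic Multilingual Plane
}

lengths = {name: utf8_length(ch) for name, ch in samples.items()}
# lengths == {"A": 1, "e-acute": 2, "euro": 3, "emoji": 4}
```

Code that counts bytes rather than characters will therefore see different lengths for strings containing non-ASCII text.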
Literal Strings are formed by placing a number of US-ASCII characters between '"' quotation marks. Special characters, including the quotation mark itself, can be included in literal Strings via the '\' escape character. The table below lists valid escape sequences.
| escape code | description |
|---|---|
| '\\' | single backslash |
| '\n' | new line |
| '\t' | tab |
| '\"' | double quote |
| '\x' HEX BYTE | 8-bit hexadecimal character code |
| '\' OCTAL CODE | 8-bit octal character code |
| '\u' HEX WORD | UTF8 character code |
| '\U' HEX WORD HEX WORD | UTF8 long character code |
XmlPL is statically and strongly typed. XmlPL also supports one dynamic Type and three generic Types.
An exception to XmlPL's static typing is the node Type. A node Variable can point to any of the other XML Types or null. This Type is necessary because an XML element may have Element, text, comment or pi children in any order. Processing element children therefore requires a small amount of dynamic typing in an otherwise statically typed language.
However, the XmlPL compiler does its best to figure out exact Types and avoid the use of node. For example, the result Type of a Path Statement depends on the axis selected. A Path Expression will only have a node Sequence Return Type if the compiler cannot determine at compile time what the result Type will be.
There are several Built-in types in XmlPL which make the language unique. Specifically, XML is a native data Type. The XML Types represent pointers to XML data.
The Boolean Type allows two values: true or false.
Integer and Real Types are numbers as described in the section Numbers.
Strings in XmlPL are pointers to immutable arrays of character data. Concatenating two or more Strings together creates a new String. String values are garbage collected when no more Variables point to them. Literal Strings are described in the section Strings.
The exception Type is used in Throw, Try and Catch Statements.
Prefix and QName are used to form qualified XML Names.
The void Type is used in two specific cases. Functions which do not Return any data should be declared with a void Return Type. Expressions, such as a Function call, which should not append to the current target output stream should be Cast to void.
The built-in XML types make XmlPL unique. Except for the node Type the XML types correspond directly to XML structures. These Types are pointers to XML data which are garbage collected when no longer referenced.
pi is short for Processing instruction.
There is no support for XML entities or cdata sections in XmlPL. However, these are converted to text by the XML input processor and can be manipulated as such.
The node Type is XmlPL's only dynamic Type. Nodes may point to any of the other XML types. The actual Type can be found at runtime via library calls.
XmlPL supports three generic Type modifiers. The '[]' modifier defines a sequence of the specified Type. The '++' modifier defines an iterator over the specified Type. Finally, the '<<' modifier defines an output stream of the given Type.
Any other Type may be a sub-type of the generic Types except the generic Types themselves. This means it is not possible to create a Sequence of Sequences or an iterator over output streams.
These three generic Types form a powerful I/O and memory management abstraction, are closely related to one another and are heavily integrated into the language. The compiler treats these Types specially when used in Function output, Redirect and Foreach Statements and XML Path Expressions. Optimizations of the behavior of these generic Types allow for large performance savings over traditional DOM and XPath based APIs.
In many cases XmlPL performs implicit Type coercion. This means that if the actual Type of a value is not valid for a Function call, Variable Assignment or operator the compiler will attempt to coerce the Type into a more appropriate Type based on a set of built-in Type casting rules. These rules are not always enough and can even result in ambiguities in which case the compiler will report an error. These ambiguities can be resolved through Cast Expressions.
The table XmlPL - Implicit Type Coercion shows XmlPL's built-in casting rules.
Declarations can declare the existence of Functions and Variables in XmlPL.
The Program Declaration may start with a Package Declaration and contains the entire code of an XmlPL Program or library. The Package Declaration can be used to assign a Name and Version to an XmlPL library.
In general, Declarations are the topmost Elements of the language and may contain Statements which in turn may contain Expressions.
Import Declarations are used to make Functions and Variables contained in a library accessible from the current Program or library. Import Declarations have the following format.
Version Expressions can be used to enforce restrictions on which libraries are imported. A specific Version, a range of Versions or a list of specific Versions and/or ranges can be specified.
A library's namespace can be renamed on Import by using the as keyword. This makes it possible to disambiguate references to libraries with similar Names or to Import different Versions of the same library. Two libraries with the same Name and Version cannot be imported into the same Program even indirectly.
Functions are a basic building block of Program structure in XmlPL. Functions are invoked by Function Calls.
Functions with the same name, but different Arguments are allowed, a.k.a. function overloading. However, two Functions with the same Name cannot differ only in Return Type.
The return type, function name, and the types, order and number of arguments taken together are often referred to as the function signature.
As in C, all Functions must be declared before they are referenced. For this reason forward Declarations are supported. Declaring a function's signature and simply terminating it with a semi-colon creates a forward Declaration.
Functions which Return a sequence Type actually emit a stream of data through an implicit output stream of the same Type. If this data is received by an actual Sequence Variable the Sequence will be constructed. Otherwise, even though the Function Return Type is a Sequence Type the output is a stream of events which may be output directly to a file descriptor. This affords a large performance improvement when constructing the XML DOM is not necessary.
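The difference between building a Sequence and streaming can be modeled with a Python generator. This is an analogy in Python, not XmlPL code, and the names `items`, `collected` and `streamed` are invented for the example. The same producer either fills an in-memory list or writes each item straight to an output.

```python
import io
from typing import Iterator

def items() -> Iterator[str]:
    """Stand-in for a Function with a sequence Return Type: emits items one by one."""
    for name in ("a", "b", "c"):
        yield f"<{name}/>"

# Case 1: the result is received by a Sequence Variable, so the Sequence is built in memory.
collected = list(items())

# Case 2: the result is streamed straight to an output; no intermediate Sequence is built.
out = io.StringIO()          # stands in for a file descriptor
for item in items():
    out.write(item)
streamed = out.getvalue()
```

In the second case only one item exists at a time, which is the source of the performance benefit described above.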
Functions may have zero or more Arguments. Argument names must be unique within a Function and may be of any valid Type. Variable length or default Arguments are not supported.
Arguments may be passed by value or by reference. The ampersand symbol is used to indicate a pass by reference. Arguments passed by reference may be modified by the called Function; otherwise Arguments are constant. Some Types, notably the XML types, generic Types and String Type, are always passed by reference via an underlying pointer. In the same way that Java treats object references, the called Function may modify the data pointed to by these references. If these Types are explicitly passed by reference, the called Function may also modify the caller's reference Variable.
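The distinction between modifying the data behind a reference and modifying the caller's reference Variable can be sketched in Python (not XmlPL). Python has no counterpart to the ampersand, so the hypothetical `Ref` holder class below is used to model an Argument passed by reference; all names are illustrative assumptions.

```python
class Ref:
    """Hypothetical holder that models an Argument passed with '&'."""
    def __init__(self, value):
        self.value = value

def mutate_contents(seq: list) -> None:
    # The reference is copied, but it points at the caller's data,
    # so changes to the data are visible to the caller.
    seq.append(4)

def rebind_local(seq: list) -> None:
    # Rebinding the parameter only changes the callee's local reference.
    seq = [0]

def rebind_by_reference(seq_ref: Ref) -> None:
    # With pass by reference the callee can replace the caller's Variable.
    seq_ref.value = [0]

data = [1, 2, 3]
mutate_contents(data)          # data is now [1, 2, 3, 4]
rebind_local(data)             # data is unchanged
holder = Ref(data)
rebind_by_reference(holder)    # holder.value now refers to a new list [0]
```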
As in C, C++, Java and many other languages, programs in XmlPL are started with a call to the special function Main. Unlike other functions, Main may not be overloaded and must be present in all programs. Libraries do not require and should generally not contain a Main function.
There are a few different valid Main function signatures. These are described below.
As the grammar indicates the Main function may return either node or string streams or an integer value. In the case of stream return types the program's return code will be zero unless an error occurs such as a signal or an uncaught exception in which case a non-zero value will be returned. In the future integer or byte streams will likely be supported.
The two possible arguments of type string[] and document are the program arguments and standard input stream respectively.
Command line arguments are put in a sequence of strings much like in Java. The number of arguments can be found by measuring the size of the string sequence.
If the document argument is present the standard input stream will be interpreted as an XML document. Because of the nature of XmlPL's underlying DOM implementation the input stream is not parsed until accessed. The input stream can be accessed using XML Path Expressions.
All Variables must be declared before they can be used. Variables declared Constant cannot be modified and must therefore be initialized with an Assignment Expression. Non-constant Variables may also be initialized. Like Java, but unlike C or C++, XmlPL does not allow uninitialized data and will initialize all Variables to their respective null value if not explicitly initialized by the programmer.
Global Variables are those Variables declared outside of a Function. Global Variables are allocated statically at program load time. Global Variables in libraries can be accessed directly in Programs which Import the library. Global Variables are not thread safe and must therefore be protected in the presence of threads.
Local Variables are those declared inside a Function Declaration. Local Variables are created at execution time on the stack and have a limited life. Local Variables go out of existence when they fall out of scope of the executing process.
Statements are responsible for Program flow control and thus direct the execution path. Function Calls and short circuiting Boolean Expressions can also affect Program flow.
In XmlPL appending to the current target output stream is the default behavior.
Expressions evaluated as standalone Statements are treated as append Statements. Assignment and Iterator Expressions are exceptions to this rule. If a target output stream has not been declared append Statements will cause a compiler warning to be generated.
The target output stream is declared either with a Function with a sequence Return Type or a Redirect Statement. In the presence of a valid target output stream Function Calls which Return a non-void value, but should not append to the target stream, must be Cast to void or have their value captured by an Assignment.
The result Type of an append Statement must be compatible with the target output stream's Type otherwise a compiler error will be generated.
A Block Statement is used to indicate that a Sequence of Statements should be executed together and in order. Block Statements create a new lower scope in which new Variables can be declared and may even, by reusing a previously declared name, shadow Variable Declarations made at a higher scope. Variables declared in the same scope must have unique Names otherwise a compiler error will be generated.
If Statements provide conditional execution. First the Parenthesized Expression is evaluated. The result of this evaluation is Cast to a boolean. If the result is true, the If Statement's child Statement is executed. An optional else Statement is executed if the Expression was false.
Loop Statements make it possible to execute a Statement repeatedly. Loop execution can be affected by Break, Continue, Return and Throw Statements which occur in the loop's child Statement, a.k.a. the loop body.
The While loop is the simplest loop Statement. First its Parenthesized Expression is evaluated. If true, the Statement or body of the While loop is executed. Then the Expression is reevaluated and if still true the Statement is executed again. This continues and will result in an infinite loop unless one of four termination conditions occurs:
Foreach Statements are used to loop over Sequences. First the Parenthesized Expression is evaluated. If Expression evaluation results in a non-sequence Type the result is Cast to a Sequence of that Type. The body of the loop, unless interrupted, is evaluated once for each of the items in the Sequence starting from the beginning. If the resulting Sequence is empty the loop is not evaluated. At each pass the special context variable, represented by the '.' symbol is set to the value of the corresponding item in the Sequence. Execution continues until either the end of the Sequence is reached or termination conditions 2, 3, or 4, as described above in While loop evaluation, are encountered.
For loop evaluation is best explained as a series of steps:
Switch Statements evaluate certain Case Statements depending on the result of evaluating the Parenthesized Expression. Once the Expression is evaluated, if the resulting Type is an integer then the switch type is integer. If the resulting Type is or can be Cast to a string then the switch type is string. Any other Types will cause a compiler error.
The Switch Statement will compare the result value to each of the Case Statements and execute the matching case. If no Case matches the default Case is executed. If no Default case exists and no Cases matched the Expression result then nothing is executed.
In practice the compiler is usually able to do better than comparing the result Expression to each case. The result is the same, but Switch Statements are often more efficient than a series of If Else Statements and certainly require less typing.
Unlike C, C++ and Java, XmlPL supports String switches. These are very useful for switching based on Element or Attribute Name among other things. The compiler can arrange string Cases into a binary search tree. This can greatly reduce the number of string compares in large Switch Statements.
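To show why ordered lookup reduces string comparisons, here is a small Python sketch (not XmlPL, and not the compiler's actual code). It uses binary search over a sorted list rather than a search tree, which gives the same logarithmic number of probes. The function `find_case` and the sample Case names are assumptions for illustration.

```python
from typing import List, Tuple

# Case constants sorted once, as a compiler could do ahead of time.
CASES: List[str] = sorted(["a", "body", "head", "html", "p", "table", "title"])

def find_case(cases: List[str], value: str) -> Tuple[int, int]:
    """Return (index of the matching case or -1, number of probes made)."""
    lo, hi, probes = 0, len(cases), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if cases[mid] == value:
            return mid, probes
        if cases[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    return -1, probes   # no match: the default Case, if any, would run

hit = find_case(CASES, "title")    # found after 3 probes instead of 7 linear compares
miss = find_case(CASES, "div")     # rejected after 3 probes
```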
Case Constants must be unique within a Switch Statement. It is not allowed to mix integer and string Cases. More concisely, Case Constants must match the result Type of the Switch Expression.
If a Case Statement matches the Switch expression's value the first Statement, if any, following the Case is executed. Execution continues until either a Break or Return Statement is encountered, an exception is thrown or the end of the Switch Statement is reached. Execution can pass on from one Case Statement to the next if Switch execution is not otherwise terminated. This makes it possible to group Cases together and execute the same Statements if any one of the Cases is matched. It is also possible to pass from a Case on to the default case.
A default Case is not required but must occur last.
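The fall-through behavior described above can be modeled in a few lines of Python (a model of the semantics, not XmlPL syntax). Each Case is represented as a constant plus a list of statement labels, `None` marks the default Case, and the names `run_switch`, `BREAK` and the sample Cases are invented for this sketch.

```python
from typing import List, Optional, Sequence, Tuple

BREAK = "break"

def run_switch(value: str,
               cases: Sequence[Tuple[Optional[str], List[str]]]) -> List[str]:
    """Return the statements executed for `value`.

    `cases` is an ordered list of (constant, statements); a constant of None
    marks the default Case, which must come last.
    """
    start = None
    for i, (constant, _) in enumerate(cases):
        if constant == value or constant is None:
            start = i
            break
    executed: List[str] = []
    if start is None:
        return executed               # no match and no default: nothing runs
    for _, statements in cases[start:]:
        for stmt in statements:
            if stmt == BREAK:
                return executed       # Break ends the Switch
            executed.append(stmt)
    return executed                   # reached the end of the Switch

CASES = [
    ("ul", []),                       # grouped with "ol": falls through
    ("ol", ["emit list", BREAK]),
    ("p", ["emit paragraph"]),        # no Break: falls through to default
    (None, ["emit unknown"]),
]

grouped = run_switch("ul", CASES)     # ["emit list"]
fell = run_switch("p", CASES)         # ["emit paragraph", "emit unknown"]
other = run_switch("div", CASES)      # ["emit unknown"]
```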
Break, Continue and Return along with Throw Statements affect Program flow by interrupting the execution of their parent Statements.
Break Statements are only allowed within interruptible Statements such as the loop Statements While, Foreach and For and in Switch Statements. It is an error for a Break Statement to occur anywhere else. When Program execution encounters a Break Statement execution of the closest interruptible parent Statement is stopped immediately. Execution proceeds immediately after the interrupted Statement.
A Break Statement applies only to one interruptible parent Statement.
The Continue Statement only applies to the loop Statements While, Foreach and For and cannot occur anywhere else in the Program. In all cases execution of the loop body stops immediately. How execution proceeds varies slightly for the different loop types.
In a While loop the condition Expression is reevaluated and if true the loop body is executed again from the top.
The Foreach loop moves on to the next item in the Sequence it is processing. If the end of the Sequence has been reached execution stops. Otherwise the loop is executed again from the top with the context set to the next value.
In a For loop the condition Expression is reevaluated. If the result is false loop execution is complete. If true the iteration Expression is evaluated and the loop body is executed again from the top.
When a Return Statement is encountered execution of the current Function terminates after possibly evaluating the Return Expression.
The Return Expression must match the Return Type of the current Function. If the Function Return Type is void or a Sequence Type then a Return Expression is not allowed otherwise it is required. The value of the Return Expression if any will be the resulting value of the Function call Expression which called the current Function.
Throw, Try and Catch Statements are used for exception processing.
Throw Statements generate exceptions. First the Expression is evaluated. If the Expression does not result in an exception value it is Cast to an exception if possible. If the result value cannot be Cast to an exception an error will be generated at compile time. Immediately after the Expression is evaluated and possibly Cast the exception is thrown. The Program call stack will begin to unwind until either a parent catch Statement is encountered or the top of the Program is reached in which case the Program is terminated with an error.
Try Statements are used to catch thrown exceptions. When a Try Statement is encountered the try block is executed. If an exception is thrown during execution of the try block the call stack will begin to unwind. If no other Try Statement is encountered first, the unwind will be stopped at this Try Statement and execution will proceed with the catch block. If no exception is thrown in the try block the catch block is never executed.
During catch block execution, the context variable will be set to the thrown exception. The catch block can either handle the exception or rethrow it. In either case once the catch block is finished the Try Statement is complete.
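The control flow of Throw, Try and Catch closely resembles exception handling in other languages. The Python sketch below (Python, not XmlPL) records the order of events in a list. The class `XmlplException` is a stand-in for the single exception Type, and the bound name `exc` plays the role that the context variable plays in a catch block.

```python
class XmlplException(Exception):
    """Stand-in for the single exception Type."""

log = []

def inner():
    log.append("inner start")
    raise XmlplException("bad input")      # models a Throw Statement

def outer():
    try:
        inner()
        log.append("skipped")              # never runs: the try block was interrupted
    except XmlplException as exc:          # the one catch block
        log.append(f"caught: {exc}")
    log.append("after try")                # the Try Statement is complete

outer()
# log == ["inner start", "caught: bad input", "after try"]
```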
Only one catch is allowed or necessary because XmlPL, for simplicity's sake, only supports one exception Type.
Redirect Statements are used to set the current target output stream. When encountered first the Parenthesized Expression is evaluated. If it does not evaluate to an output stream Type and cannot be Cast to one a compile time error will occur. Otherwise, the current target is set to the resulting output stream. The Statement is then executed.
If any append Statements are encountered they will append to the newly set target output stream unless further redirected at a lower level. Append Statements must match the output stream Type or be castable to that Type.
Redirect Statements do not affect Function Return Type.
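One way to picture the current target output stream is as a stack of targets, where a Redirect pushes a new target for the duration of its child Statement. The Python sketch below models this idea; it is not XmlPL code, the names `targets`, `append` and `redirect` are hypothetical, and restoring the previous target once the Redirect ends is an assumption inferred from the description above.

```python
from contextlib import contextmanager

targets = []      # stack of target output streams; the last entry is current

def append(value):
    """Model of an append Statement: write to the current target."""
    targets[-1].append(value)

@contextmanager
def redirect(stream):
    """Model of a Redirect Statement: set the target for the enclosed Statement."""
    targets.append(stream)
    try:
        yield stream
    finally:
        targets.pop()     # assumption: the previous target applies again afterwards

result, side = [], []
with redirect(result):
    append("a")
    with redirect(side):  # further redirected at a lower level
        append("b")
    append("c")
```

After this runs, `result` holds "a" and "c" while `side` holds only "b".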
To quote the compiler, 'Use of the Empty Statement is not necessary or recommended'. Its main purpose is to detect accidental ';' symbols. To create a loop with no body use a Continue Statement instead. All other Statements are pointless without a body. A compiler warning will be generated if the Empty Statement is encountered.
Expressions are evaluated to yield values. Every Expression has a result Type. This is the data Type of the value resulting from evaluating the Expression. The result Type of an Expression is known at compile time. This is known as static typing.
Most Expressions have no side effects when evaluated. In other words their evaluation does not change the value of any Variables or affect any system resources.
Exceptions to this general rule are Assignment and iterator Expressions. Expressions which append to the current target output stream could also be considered exceptions to the rule but these are really Statements. Additionally, Function call Expressions can also indirectly cause side effects.
To understand how Expressions are evaluated it is important to understand precedence and associativity. These will not be explained here. There are many other places to find this information. The table below shows operator precedence and associativity in XmlPL.
| name | operators | associativity |
|---|---|---|
| assignment | '=', '+=', '-=', '*=', '/=', '%=', ',=' | right |
| sequence | ',' | left |
| or | '\|\|' | left |
| and | '&&' | left |
| bitwise or | '\|' | left |
| xor | '^' | left |
| bitwise and | '&' | left |
| equality | '==', '!=' | left |
| relational | '<', '>', '<=', '>=' | left |
| shift | '<<', '>>' | left |
| addition | '+', '-' | left |
| multiplicative | '*', '/', '%' | left |
| not, sign and complement | '!', '-', '+', '~' | unary |
| cast | '(type)' | unary |
| iterator | '++', '--' | unary |
| release | '$' | unary |
| filter | '[]' | unary |
Assignment Expressions are one of the few Expressions that produce a side effect. Unlike most binary Expressions the right-hand Expression is evaluated first then the left. The resulting Expression Type is that of the left-hand Expression.
The left-hand Expression must evaluate to a single non-constant value. Examples of this are non-constant Variable references and Sequence item Assignment via a Filter Expression with integer Predicate. Other results will cause a compile-time error.
The result of evaluating the left-hand Expression is assigned the value of the result of evaluating the right-hand Expression. If the Types are not the same the right-hand Expression is Cast to the left-hand expression's Type if possible. Otherwise a compile-time error is generated.
Since the right-hand Expression is evaluated first the original value of the left-hand Expression can be used in computing the new value.
The '=' is the only true Assignment operator. The others are syntactic sugar for x = x <op> (y) where the original Expression is of the form x <op>= y.
All of these operators occur in C, C++ and Java except ',=' where the Expression x ,= y means, append y to Sequence x. In this special case y is not Cast to the Sequence Type x, but if necessary it is Cast to the Sequence sub-type of x.
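The parentheses around y in the expanded form matter. Python's compound operators follow the same rule, so a short Python sketch (not XmlPL) can show it. Python has no ',=' operator; appending to a list is used here only as the closest model.

```python
x = 10
x *= 2 + 3            # same as x = x * (2 + 3), not (x * 2) + 3
compound = x

y = 10
y = y * (2 + 3)       # the expanded form
expanded = y

wrong = (10 * 2) + 3  # what a naive textual expansion without parentheses would give

seq = [1, 2]
seq.append(3)         # models: seq ,= 3
```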
Sequence Expressions are used to create Sequences. Sequence Expressions are evaluated from left to right. The resulting Expression Type is a Sequence of the left-hand Expression Type. The right-hand Expression is Cast to the left-hand Type if possible.
A Sequence Expression cannot create a Sequence of Sequences. If either or both of the sub-expressions results in a Sequence then the final result is a Sequence which is the concatenation of the two sub-expressions.
If the right-hand Expression results in a Sequence that is not the same Type as the Sequence Expression Type an attempt will be made to Cast the items of the right-hand Sequence to the sub-type of the Sequence Expression. If this is not possible a compile time error will be generated.
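The flattening and casting rules above can be modeled with a small Python function (a model, not XmlPL). The name `sequence` is hypothetical, Python lists stand in for Sequences, and calling the Python constructor of the left-hand item type stands in for an XmlPL Cast. Note that a failed conversion here raises at run time, whereas XmlPL reports it at compile time.

```python
def sequence(left, right):
    """Model of the ',' operator: always yields a flat list."""
    left_items = left if isinstance(left, list) else [left]
    right_items = right if isinstance(right, list) else [right]
    if left_items:
        # Items on the right are cast to the type of the left-hand items.
        item_type = type(left_items[0])
        right_items = [item_type(v) for v in right_items]
    return left_items + right_items

pair = sequence(1, 2)                            # [1, 2]
extended = sequence([1, 2], 3)                   # [1, 2, 3]
joined = sequence(sequence(1, 2), sequence(3, 4))  # [1, 2, 3, 4], never nested
casted = sequence("a", 1)                        # ["a", "1"]
```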
XML Expressions make it possible to inline XML data in XmlPL with some exceptions. Element bodies are treated as a Block of Statements and some characters are treated specially in Comments and Processing Instructions. XML Declarations beginning with a '<!' symbol, cdata sections and entity references are not supported. These constructs are not really necessary and greatly complicate XML processing. A correct implementation of XmlPL will however accept any valid XML as defined by the Extensible Markup Language (XML) 1.0 specification. Unsupported Elements are simply dropped or, in the case of cdata sections and entity references, converted to text.
A consequence of these restrictions is that DTDs are not supported. DTDs are unfortunately still widely used, but obsolete in the presence of technologies such as XML Schema and RelaxNG. DTDs can easily be converted into other formats using free tools such as NekoDTD.
Elements are the most basic building block of XML data. In XmlPL Elements may be simply written explicitly as in an XML document or they may be annotated with Expressions which dynamically create the Element name and Attributes at runtime.
Both Element long and short forms are supported. In the short form the Element has no children and is terminated with a '/>' rather than an end tag.
In the long form an Element contains a Block of Statements. If Statements other than Append Statements are used the Element is a Statement rather than an Expression and cannot occur as a sub-expression. Append Statements within an Element are redirected to the parent element.
A consequence of this is that, unlike in real XML, text strings must be quoted. However, this gives the developer precise control over white-space which otherwise can be problematic. This results in XmlPL Programs producing smaller, but unformatted XML data which is generally preferable because of the space savings, but can be difficult to read. This can easily be solved using tools like tidy, xmllint or by simply creating your own XML pretty printer in XmlPL.
The closing Element tag must match the opening one unless an Element name Expression is used, in which case no closing Name is required.
Attributes can be set in an Element in one of two ways. Either by explicit Attribute construction via the above grammar production or through an append Statement. For this reason Element Expressions are not committed to the output stream until either the first non-attribute child is encountered or the Element is ended.
In the first case both Attribute name and value can be simple Constant values or Expressions which are evaluated at runtime.
In the second case Attribute appends must occur before the first non-attribute append Statement; otherwise they will be Cast to strings and appended as XML text. This allows for programmatic generation of Attributes.
In all cases if an Attribute is set more than once the last value is the one that will stick.
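The buffering behaviour described above can be modelled outside XmlPL. The following Python sketch is not XmlPL code and is not how the compiler is implemented. The class `ElementBuilder` and its method names are hypothetical, and it assumes that an Attribute Cast to a string yields its value. It only illustrates the three rules: the start tag is committed lazily, the last value of a repeated Attribute wins, and a late Attribute append becomes text.

```python
from xml.sax.saxutils import escape, quoteattr


class ElementBuilder:
    """Hypothetical model of an Element whose start tag is committed lazily."""

    def __init__(self, name):
        self.name = name
        self.attrs = {}         # buffered Attributes; later values overwrite earlier ones
        self.committed = False  # True once the start tag has been written
        self.out = []

    def _commit(self):
        # Write the start tag with all buffered Attributes, exactly once.
        if not self.committed:
            attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self.attrs.items())
            self.out.append(f"<{self.name}{attrs}>")
            self.committed = True

    def append_attribute(self, name, value):
        if self.committed:
            # Too late: the Attribute is treated as a string and appended as text.
            self.out.append(escape(value))
        else:
            self.attrs[name] = value

    def append_text(self, text):
        # The first non-attribute child forces the start tag out.
        self._commit()
        self.out.append(escape(text))

    def end(self):
        self._commit()
        self.out.append(f"</{self.name}>")
        return "".join(self.out)


e = ElementBuilder("item")
e.append_attribute("id", "1")
e.append_attribute("id", "2")    # same name: the last value wins
e.append_text("hello")
e.append_attribute("late", "x")  # after the first non-attribute child
result = e.end()
print(result)  # <item id="2">hellox</item>
```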
XML Comments can contain Parenthesized Expressions which are evaluated and Cast to string. The resulting Comment is the concatenation of the explicit text and the result values of the Expressions.
Neither the '((' nor '-->' symbols are allowed in explicit text. '((' is interpreted as a single '(' in explicit Comment text.
Processing Instruction Expressions, like Elements, can contain either an explicit XML Name or an Expression. In the case of an Expression, it must evaluate to a string or be castable to a string.
Processing Instruction data is processed in the same way as Comment text with the exception that explicit text may not contain the '?>' symbol but may contain '-->' symbols.
Boolean Expressions always result in a boolean value. An '||' (or) Expression is true if either of its sub-expressions is true; otherwise it is false. Similarly, an '&&' (and) Expression is true if both sub-expressions are true; otherwise it is false. The '!' (not) Expression is true if its sub-expression is false and is false otherwise.
The Boolean operators are special in that they perform short-circuit evaluation. If the left-hand Expression of an Or is true, the right-hand Expression is not evaluated because the overall result is already known to be true. If the left-hand Expression of an And is false the right-hand Expression is not evaluated.
It is important to be aware of short-circuiting if a right-hand Expression is expected to produce a side-effect such as an Assignment.
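Python's `or` and `and` operators short-circuit in the same way, so the pitfall can be shown with a small Python analogy. This is not XmlPL code; the helper `check` is a hypothetical function whose only purpose is to record which operands were actually evaluated.

```python
calls = []


def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value


r1 = check("a", True) or check("b", True)    # "b" is skipped: result already true
r2 = check("c", False) and check("d", True)  # "d" is skipped: result already false
r3 = check("e", False) or check("f", True)   # both operands are evaluated

print(calls)  # ['a', 'c', 'e', 'f']
```

Any side effect placed in the skipped operands ("b" and "d" here) never happens, which is exactly the situation to watch for when the right-hand Expression contains an Assignment.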
The Bit operators, as the name suggests, perform bitwise operations on integers. The operators are '|' bitwise or, '&' bitwise and, '^' xor, and '~' bit complement. Bit Expressions always result in an integer value.
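Python happens to use the same four operator symbols, so the operations can be illustrated with a short Python analogy. This is not XmlPL code, and the result shown for '~' assumes signed two's-complement integers, which the text above does not state for XmlPL.

```python
a, b = 0b1100, 0b1010

bit_or = a | b    # 0b1110: a bit is set if it is set in either operand
bit_and = a & b   # 0b1000: a bit is set only if it is set in both operands
bit_xor = a ^ b   # 0b0110: a bit is set if it is set in exactly one operand
bit_not = ~a      # every bit inverted; -13 under two's complement

print(bit_or, bit_and, bit_xor, bit_not)  # 14 8 6 -13
```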
Relational Expressions are used to compare two values and always result in a boolean value. The operators are '==' equality, '!=' inequality, '<' less than, '>' greater than, '<=' less than or equal and '>=' greater than or equal.
The sub-expressions of a Relational Expression must be comparable or castable to a comparable Type such as integer, real or string. If the left-hand Type is not the same as the right-hand Type, an attempt is made to Cast the right-hand Type to the left-hand Type. If this is not possible, a compiler error will be generated. Values of Type string are compared lexicographically.
Shift Expressions perform bitwise shifts of integer values. Their sub-expressions must be integers or castable to integers. The result is always an integer. The operators are '<<' (left shift) and '>>' (right shift). Both operators shift the integer result of the left-hand Expression by the number of bits given by the integer result of the right-hand Expression, in the specified direction. Zeros are shifted in and the bits shifted out are lost.
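The "zeros in, shifted-out bits lost" rule can be sketched in Python, whose integers are unbounded and therefore need an explicit mask. This is an illustration only: the 32-bit width is an assumption (the text above does not specify the integer width), and `shl` and `shr` are hypothetical helper names.

```python
WIDTH = 32                # assumed integer width; not specified in this document
MASK = (1 << WIDTH) - 1


def shl(value, count):
    """Left shift: zeros enter on the right, high bits fall off the left."""
    return (value << count) & MASK


def shr(value, count):
    """Right shift with zero fill: zeros enter on the left, low bits fall off."""
    return (value & MASK) >> count


a = shl(0x80000001, 1)  # top bit is lost, a zero enters at bit 0
b = shr(0x80000001, 1)  # bottom bit is lost, a zero enters at bit 31

print(hex(a), hex(b))  # 0x2 0x40000000
```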
Arithmetic Expressions provide '+' addition, '-' subtraction, '*' multiplication, '/' division and '%' modulo operations on integers and reals and have the usual mathematical meanings. Additionally, the '+' operator can apply to strings, in which case it performs string concatenation.
In XmlPL the '-' is also a valid identifier character. x-y should not be confused with x minus y. It is an identifier pronounced x dash y. This is for compatibility with XML Names. If subtraction is the desired effect, the Expression should be written x - y.
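The effect of allowing '-' inside identifiers can be seen with a toy tokenizer. The Python sketch below is not XmlPL's actual lexer; the regular expression is a simplified assumption about identifier characters, and `tokenize` is a hypothetical helper. It only shows why white-space around '-' changes the meaning.

```python
import re

# Simplified assumption: an identifier starts with a letter or '_' and may
# continue with letters, digits, '_' or '-'. Only arithmetic operators are
# recognised besides identifiers.
TOKEN = re.compile(r"\s*(?:(?P<ident>[A-Za-z_][A-Za-z0-9_-]*)|(?P<op>[-+*/%]))")


def tokenize(source):
    """Split source into identifier and operator tokens, longest match first."""
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group("ident") or m.group("op"))
        pos = m.end()
    return tokens


print(tokenize("x-y"))    # ['x-y']          one identifier
print(tokenize("x - y"))  # ['x', '-', 'y']  a subtraction
```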
Sign Expressions apply to integers and reals and result in an integer or real. There are two Sign Expressions '+' positive and '-' negative. The positive Sign has no effect. The negative Sign inverts the Sign of its sub-expression.
Cast Expressions are used to explicitly change the Type of an Expression. Many Type Casts occur automatically. Type Casts are necessary to disambiguate calls to overloaded Functions, when an automatic Cast is not the desired result, when a Cast is possible but requires more than one step, or to avoid appending to the target output stream. Some examples are given below.
Given two Function signatures void f(node x) and void f(string x) and if A is an Attribute, the Function call Expression f(A) is ambiguous because A could be Cast to either a string or a node. Rather than guess the compiler will generate an error. This error can be resolved by casting A to one of the two Types: f((string)A).
If S is a string Variable, the comparison S == true is true if S is not the empty string. However, if comparison to the string value of true is desired, S == (string)true will be true only if S is equal to the string value "true".
Sometimes a Cast from one Type to another is possible but not automatic. This occurs when more than one casting step is required. For example, the Expression @name == "test" will cause a compiler error because it is not possible to directly Cast an Attribute Sequence to a string. However, an Attribute Sequence can be Cast to a single Attribute, and an Attribute can be Cast to a string. (attribute)@name == "test" fixes the problem.
Iterator Expressions apply to integers and Sequence iterators. Their result value depends on whether the operator occurs before or after the Expression. If before, the Iteration is performed first and then the Expression result is evaluated. If after, the Expression result is calculated first and then the Iteration is performed.
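Python has no such operators, so the before/after distinction is modelled here with two hypothetical functions acting on a hypothetical mutable `Cell`. This is an analogy for the integer case only and makes no claim about XmlPL's operator spelling or implementation.

```python
class Cell:
    """Hypothetical mutable integer standing in for an integer Variable."""

    def __init__(self, value):
        self.value = value


def iterate_before(cell):
    """Operator before the Expression: step first, then yield the new value."""
    cell.value += 1
    return cell.value


def iterate_after(cell):
    """Operator after the Expression: yield the old value, then step."""
    old = cell.value
    cell.value += 1
    return old


i = Cell(5)
first = iterate_before(i)   # i becomes 6, result is 6
second = iterate_after(i)   # result is 6, i becomes 7

print(first, second, i.value)  # 6 6 7
```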
Iterator Expressions are special in that they have a side effect and do not become append Statements when executed standalone.
Path Expressions are used in XmlPL to query and select XML data. Those familiar with the w3.org's XPath 1.0 should find XmlPL's Path Expressions very intuitive. However, they are not the same. XmlPL Path Expressions are simpler and more compatible with XmlPL's C-like syntax.
XmlPL Path Expressions are also compiled which makes them much faster than most XPath implementations. Don't be fooled by XPath implementations which claim to compile XPath, but really just build an Expression tree.
Major differences between XmlPL Path Expressions and XPath 1.0 are noted below.
| | XPath 1.0 | XmlPL |
|---|---|---|
| Equality operator | '=' | '==' |
| And operator | 'and' | '&&' |
| Or operator | 'or' | '\|\|' |
| Modulo operator | 'mod' | '%' |
| Divide operator | 'div' | '/' |
| Union operator | '\|' | Not supported directly. The removal of duplicates at each Step is inefficient and usually unnecessary. Use a Function to filter duplicates when necessary. |
| Axis names | supported | supported indirectly via Functions |
| Node Test Or'ing (e.g. root/(x\|y)/*) | only in 2.0 | supported |
| Reference a Variable x | $x | x |
| Implicit context (i.e. root[child]) | allowed | not allowed; use root[./child] |
| Select Processing Instruction by name | root/processing-instruction("name") | root/?name |
A Path Expression consists of an Expression followed by one or more Path steps which select parts of the XML document.
Each Step results in a Sequence of XML nodes. These nodes can be filtered by Node Tests and Predicates. The default XML axis is the child axis. For example, x/child selects all the Elements which are children of Elements in the Variable x which have the Name "child". Other axes, such as the Attribute, parent or Processing instruction axes, can be selected with a Context Step.
The Context Steps are '@' attribute axis, '..' parent axis and '?' processing instruction axis.
In XmlPL there is no guarantee that the parent axis will return a non-null value. Parent pointers are so-called weak pointers. This means a parent pointer is not enough to keep an XML Element in memory. Without a pointer to the Element or one of its ancestors, such as the document root, the Element may be garbage collected.
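Weak parent pointers behave much like Python's `weakref` module, which makes for a convenient analogy. The sketch below is not XmlPL's implementation; the `Node` class is hypothetical. The parent holds strong references to its children, while each child only holds a weak reference back.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node: strong links downward, weak link upward."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None if there is no parent or it has been collected.
        return self._parent() if self._parent is not None else None


root = Node("root")
child = Node("child", root)

before = child.parent.name  # "root": the parent is still referenced by `root`

del root      # drop the only strong reference to the parent
gc.collect()  # make collection explicit on non-refcounting interpreters

after = child.parent        # None: the weak pointer did not keep it alive
print(before, after)
```

The practical consequence is the same as in the text: keep a reference to the document root (or another ancestor) for as long as the parent axis is needed.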
Context Steps are named as such because they can be used with or without specifying a Path context. This makes it possible to perform operations such as root/child[@name == "test"]. Notice it is not necessary to write root/child[./@name == "test"] because the context, '.', is implicit with the '@' Step.
In contrast, to select the "child" Elements which have at least one child named "x", specifying the context is necessary: root/child[./x]. Without the context, root/child[x] means select all the "child" Elements if the Variable "x" is true.
Node Tests filter the nodes of a Step. Name Tests filter nodes by name. A series of Names can be or'ed together with parentheses and the '|' operator. Type Tests test the XML type of a node.
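For readers who want to experiment with the two selections above, they can be approximated using the XPath subset in Python's standard `xml.etree.ElementTree` module. This is a comparison aid, not XmlPL code. Note that ElementTree follows the XPath convention of an implicit context, so its `child[x]` corresponds to XmlPL's root/child[./x], not to XmlPL's root/child[x]. The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document.
root = ET.fromstring(
    "<root>"
    '<child name="test"><x/></child>'
    '<child name="other"/>'
    '<child name="test"/>'
    "</root>"
)

# Roughly XmlPL's root/child[@name == "test"]
by_attr = root.findall("child[@name='test']")

# Roughly XmlPL's root/child[./x]: "child" Elements with a child named "x"
by_child = root.findall("child[x]")

print(len(by_attr), len(by_child))  # 2 1
```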
Name Tests filter Path Steps based on Element, Attribute or Processing instruction name. Names which do not match are removed from the Path result.
Type Tests test XML nodes in Path Expressions for a specific Type. Although they appear to be Functions they are not.
Predicates further filter Path Steps and are executed after Node Tests. The Predicate if any is called once for each node selected by the step.
During Predicate evaluation the context variable '.' is set to the value of the node being tested.
If the Predicate evaluates to false the node is removed from the Path result. In XmlPL there is no guarantee when a Predicate will be called. For example, iterators can call Predicates long after the original Path Expression was executed. In this way Predicates act as a kind of lambda function.
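The deferred calling of Predicates resembles a lazy generator. The Python sketch below is an analogy only, with hypothetical names (`select`, `is_even`) and plain integers standing in for XML nodes. It shows that building the selection does not run the predicate; iterating over it does.

```python
log = []


def select(nodes, predicate):
    """Lazily yield the nodes for which predicate(node) is true."""
    for node in nodes:
        if predicate(node):
            yield node


def is_even(n):
    log.append(n)  # record when the predicate is actually called
    return n % 2 == 0


result = select([1, 2, 3, 4], is_even)
calls_before = list(log)       # []: nothing has been tested yet

first = next(result)           # pulls items until one passes
calls_after_first = list(log)  # [1, 2]: only as many calls as were needed

print(calls_before, first, calls_after_first)
```

Because the predicate runs at iteration time, it sees whatever state exists then, so Predicates with side effects or dependencies on changing Variables deserve extra care.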
The result Type of a Path Expression can be calculated by the compiler to some degree. The table below shows the result Types of various Path Expressions.
| | example | result type |
|---|---|---|
| child axis | x/child | element[] |
| parent axis | x/.. | element[] |
| attribute axis | x/@name | attribute[] |
| processing instruction axis | x/?somename | pi[] |
| type test | x/<type>() | <type>[] |
| attribute axis in predicate | @name in x/*[@name == "y"] | attribute, not attribute[] |
The Release Expression makes it possible to use a Variable value and Release its reference in one step. Unreferenced values can be garbage collected. Release Expressions make it possible to control memory usage.
Filter Expressions work much like Predicates. There are two modes of operation depending on the Type of the right-hand Expression. In either case the left-hand Expression must evaluate to a Sequence Type.
If the right-hand Expression results in an integer then the Filter Expression will be interpreted to mean select the item at position x, where x is the result of evaluating the right-hand Expression. In this case the result Type of the Expression will be the sub-type of the Sequence on the left-hand side.
Otherwise the right-hand Expression will be executed once for each item in the Sequence. If the result is true the item will be added to the result Sequence. The result Type will be the same as the left-hand Expression.
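The two modes can be summarised with a small Python model. It is a sketch under stated assumptions, not XmlPL semantics: `filter_expr` is a hypothetical name, the right-hand Expression is modelled as either an integer or a callable, and positions are assumed to start at 0, which this document does not specify.

```python
def filter_expr(sequence, rhs):
    """Model of a Filter Expression over a Python list.

    Integer rhs  -> select the single item at that position
                    (0-based here, which is an assumption).
    Callable rhs -> keep every item for which rhs(item) is true.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rhs, int) and not isinstance(rhs, bool):
        return sequence[rhs]  # result is an item, not a Sequence
    return [item for item in sequence if rhs(item)]  # result is a Sequence


picked = filter_expr(["a", "b", "c"], 1)
kept = filter_expr([1, 2, 3, 4], lambda n: n > 2)

print(picked, kept)  # b [3, 4]
```

The difference in result Type mirrors the text: the positional mode returns a single item of the Sequence's sub-type, while the test mode returns a Sequence of the same Type as the left-hand side.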
Constant Expressions produ