An interactive SaxonJS/JavaScript workbench processor of Invisible XML - Workbench Version 1.4
TODO:
For the runtime files for the iXML processor and a sample jωiXML application see here.
This workbench runs entirely within the browser client, using SaxonJS as the top-level program and the jwiXML JavaScript library. There is no server-side processing, apart from initial delivery of necessary files.
You can input and edit an iXML grammar from scratch, load one from a file on your computer, or select one of the test-case or sample grammars from the iXML GitHub repository. The input string can likewise be edited or loaded from a file or, for a test-case or sample grammar, selected from a number of provided input strings relevant to that grammar.
Local files can be selected either with a conventional file-selection dialog or by dragging and dropping a file onto the relevant textarea. (Note that on Firefox, security restrictions may prevent drag-and-drop.)
iXML grammars are edited in the upper textarea, where the usual keystrokes are supported, but there is no 'syntax awareness' during input.
The 'format' button above the textarea will, for a valid grammar, 'pretty-print' it by replacing the text with a canonical iXML rendering of the parsed grammar. This form lines up all the rules so that their names are right-aligned and their definitions left-aligned. Strings are enclosed in the quotation characters used in the original (with such characters doubled within the string as necessary). For alternatives, the separator character used is the first separator character (i.e. ';' or '|') encountered while parsing that set of alternatives. If the serialisation of a rule's definition would be longer than 50 characters, its top-level alternatives have a newline and appropriate indentation attached to their separators.
This means that, for instance, an original line of the form:
a: [L] | "s" ; '"', #a; bcdef ; bcdef ,"a", bcdef; bcdef, "b", bcdef. bcdef: [N]|("1"; "2" | "3").
will be formatted to
a: [L]| "s"| '"',#a| bcdef| bcdef,"a",bcdef| bcdef,"b",bcdef. bcdef: [N]| ("1";"2";"3").
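The layout rules above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the workbench's own code: it assumes each rule has already been reduced to a name, a list of serialised top-level alternatives and the separator first met while parsing them. The helpers `quoteString`, `formatRule` and `formatGrammar` are hypothetical, and the exact indentation used after a line break is an assumption.

```javascript
const MAX_INLINE = 50; // longer definitions get one top-level alternative per line

// Quote a string, doubling any occurrence of the quote character inside it.
function quoteString(text, quote = '"') {
  return quote + text.split(quote).join(quote + quote) + quote;
}

// rule = { name, sep, alternatives } (hypothetical model): `alternatives` are
// already-serialised top-level alternatives; `sep` is the first separator
// (';' or '|') met when parsing them. Names are right-aligned to `nameWidth`.
function formatRule(rule, nameWidth) {
  const prefix = rule.name.padStart(nameWidth) + ": ";
  const inline = rule.alternatives.join(rule.sep + " ");
  if (inline.length <= MAX_INLINE) return prefix + inline + ".";
  // Assumed layout: each separator starts a new line, aligned under the ':'.
  const breakBefore = "\n" + " ".repeat(nameWidth) + rule.sep + " ";
  return prefix + rule.alternatives.join(breakBefore) + ".";
}

// Right-align all rule names to the width of the longest one.
function formatGrammar(rules) {
  const nameWidth = Math.max(...rules.map((rule) => rule.name.length));
  return rules.map((rule) => formatRule(rule, nameWidth)).join("\n");
}

console.log(formatGrammar([
  { name: "a", sep: "|", alternatives: ["[L]", '"s"', `'"',#a`] },
  { name: "bcdef", sep: "|", alternatives: ["[N]", '("1";"2";"3")'] },
]));
```

Applied to the rule `a` of the example above, whose joined definition is 57 characters long, this sketch would put each of its six alternatives on its own line.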
Input strings to be parsed by that grammar are edited in the lower textarea.
The size of both of the textareas can be adjusted (at least on Chrome and Firefox) by the resizer 'chevron' at the lower right hand corner.
With the grammar and potential input string edited, clicking on the GO! button causes the following actions:
The grammar defined by the text in the Grammar window is parsed and compiled as an iXML grammar to produce an internal object representing the compiled grammar. Assuming the grammar has valid iXML syntax, this is then displayed in the 'Grammar Details' section (which is normally hidden - just click on the bar to reveal or hide).
Here various projections of the grammar can be displayed: in XML format, as an iXML textual serialisation, or as a colour-highlighted iXML form. Either the original parsed grammar or its compilation (i.e. where the grammar has been reduced to a canonical form) can be shown.
If you want to copy the parsed XML-format grammar, the 'select grammar' button will select the whole of the grammar XML, so a simple 'Copy' keystroke action can get it into the clipboard as text.
Assuming the grammar has compiled and the input text string is not empty (or the 'Allow an empty string as input' option is checked), the text string is then parsed against the grammar, giving results in the Results section.
As with the parsed grammar, if you want to copy the resulting XML, the 'select result' button will select the whole of the result XML (and multiples if ambiguous or record-oriented processing was performed), so a simple 'Copy' keystroke action can get it into the clipboard as text.
There are three main categories of errors detected:
Errors in the iXML grammar presented will be displayed under the 'Grammar Details' section. Such errors can be one of:
Grammar failure G000: Invalid rule syntax. Missing rule terminator character. Expecting character:'.' - given: 'p' (codepoint 112). Near line 9, column 2.
product: term, "×", operand.
 ^
and
Grammar failure S10: A Unicode character category code must match [A-Z][a-z]?. Provided: 'LZ'. Near line 11, column 8.
id: [LZ].
       ^
The grammar supplied, while syntactically correct (i.e. its text parses correctly), is invalid. Such cases include:
Condition | Example |
---|---|
There are references to non-terminals for which no rule definition has been provided. | Grammar failure S02: No production rules for non-terminals: number |
There are multiple rule definitions for a non-terminal. | Grammar failure S03: Adding productions for an already-defined non-terminal: id |
There are non-terminal rule definitions that are unreachable by reference from the starting rule. This is only detected if the 'Prohibit unreachable non-terminals' option has been checked. | Grammar failure S002: Unreachable production rules for non-terminals: b |
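The three conditions in the table amount to simple set and graph checks. The sketch below assumes a hypothetical, simplified grammar model in which each rule records only the names of the non-terminals it references; unreachable rules are found by a depth-first search from the first rule. The function and model are illustrative only. Just the message wording is taken from the examples above.

```javascript
// Hypothetical, simplified grammar model: an array of { name, refs } where
// `refs` lists every non-terminal referenced in that rule's definition.
// The first rule is taken as the start rule.
function checkGrammar(rules, { prohibitUnreachable = false } = {}) {
  const refsByName = new Map(); // name -> references from all its definitions
  const duplicated = new Set();
  for (const { name, refs } of rules) {
    if (refsByName.has(name)) {
      duplicated.add(name);
      refsByName.get(name).push(...refs);
    } else {
      refsByName.set(name, [...refs]);
    }
  }

  // Non-terminals that are referenced but never defined.
  const missing = new Set();
  for (const refs of refsByName.values()) {
    for (const ref of refs) {
      if (!refsByName.has(ref)) missing.add(ref);
    }
  }

  const errors = [];
  if (missing.size > 0) {
    errors.push(`Grammar failure S02: No production rules for non-terminals: ${[...missing].join(", ")}`);
  }
  for (const name of duplicated) {
    errors.push(`Grammar failure S03: Adding productions for an already-defined non-terminal: ${name}`);
  }

  if (prohibitUnreachable && rules.length > 0) {
    // Depth-first search over references, starting from the first rule.
    const reached = new Set();
    const stack = [rules[0].name];
    while (stack.length > 0) {
      const name = stack.pop();
      if (reached.has(name) || !refsByName.has(name)) continue;
      reached.add(name);
      stack.push(...refsByName.get(name));
    }
    const unreachable = [...refsByName.keys()].filter((name) => !reached.has(name));
    if (unreachable.length > 0) {
      errors.push(`Grammar failure S002: Unreachable production rules for non-terminals: ${unreachable.join(", ")}`);
    }
  }
  return errors;
}
```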
The parsing of an input string against a valid iXML grammar can fail for a number of reasons. Such errors are displayed in the 'Result' section of the workbench, currently (in line with the specification) as an XML document with ixml:state="failed" on the topmost element, such as:
<ixml xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed">
Failure at line 1 column 2
Given '/' (codepoint 47).
Expecting one of:
"+" {#8: sum: term, "+", term++"+".},
"×" {#10: product: term, "×", operand.},
[<=>≠≤≥] {#4: compare: ["<=>≠≤≥"].}
Input:
a/b
 ^
</ixml>
where an unexpected (operator) character was encountered. In this case the processor attempts to identify which characters would have been admissible at this point in the parse, and in which rules (identified by line number and shown with their original source) the parse failed. (This is currently not available for internally generated rules, such as those for repetition constructs.)
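Locating a failure for display needs only the failing offset and the input text. The hypothetical helper below (not part of the jωiXML API) converts a 0-based code-point offset into the 1-based line and column numbers and builds the caret excerpt shown after 'Input:'. The list of expected terminals is assumed to be supplied by the parser, and the "end of input" wording is invented.

```javascript
// Hypothetical helper: describe a parse failure at a 0-based code-point offset.
function describeFailure(input, offset, expected) {
  const chars = Array.from(input); // one entry per code point
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (chars[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  const column = offset - lineStart + 1;
  let lineEnd = chars.indexOf("\n", lineStart);
  if (lineEnd === -1) lineEnd = chars.length;

  const given = offset < chars.length
    ? `Given '${chars[offset]}' (codepoint ${chars[offset].codePointAt(0)}).`
    : "Given end of input."; // invented wording for this sketch
  return [
    `Failure at line ${line} column ${column}`,
    given,
    `Expecting one of: ${expected.join(", ")}`,
    "Input:",
    chars.slice(lineStart, lineEnd).join(""), // the line containing the failure
    " ".repeat(column - 1) + "^", // caret under the failing character
  ].join("\n");
}

console.log(describeFailure("a/b", 1, ['"+"', '"×"', "[<=>≠≤≥]"]));
```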
In cases of high potential ambiguity, such as the grammar:
specification: "{", rule*, "}". rule: definition*. definition: id, "=", value. id:[L]. value:[N].
when run with an input that can trigger such ambiguity, such as {} (possible solutions include no rule at all, or a potentially infinite sequence of rules each containing no definitions), the jωiXML processor can get into an infinite loop. Internally there is a limit of 1000 productions triggered while processing a single character of the input string. If this limit is reached, a failure is assumed:
<ixml xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed"> Probable looping processing character '{' @ line 1, column 1</ixml>
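The per-character limit behaves like a budget on an agenda of work items. In the sketch below, `expand` is a hypothetical callback standing in for the parser's prediction and completion steps. Once more than `limit` items would be processed for one character, the loop gives up with a message modelled on the one above. A real Earley parser also deduplicates items; that is omitted here, so this is only an illustration of the guard, not of the jωiXML internals.

```javascript
const PRODUCTION_LIMIT = 1000; // per-character limit quoted in the text above

// Process an agenda of items for one input character. `expand` is a
// hypothetical callback returning the new items that an item triggers.
function processCharacter(initialItems, expand, where, limit = PRODUCTION_LIMIT) {
  const agenda = [...initialItems];
  for (let i = 0; i < agenda.length; i++) {
    if (i >= limit) {
      throw new Error(
        `Probable looping processing character '${where.ch}' @ line ${where.line}, column ${where.column}`
      );
    }
    agenda.push(...expand(agenda[i]));
  }
  return agenda.length; // number of items processed for this character
}

// A finite expansion terminates normally; an endlessly productive one hits the limit.
console.log(processCharacter([0], (n) => (n < 3 ? [n + 1] : []), { ch: "a", line: 1, column: 1 }));
```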
The input string may have parsed correctly, but the conversion (serialisation) to XML can still fail, because the resulting tree would not be a valid XML document. In such cases the error is reported, again in the 'Result' section:
<ixml xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed"> An attribute node may not be the final parse result @input</ixml>
or
<ixml xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed"> Multiple nodes may not be the final parse result:<expression/>,@compare,<expression/></ixml>
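Whether a successful parse can be serialised depends on what remains at the top level once deletion and attribute marks have been applied: a well-formed document needs exactly one element there. The sketch assumes a hypothetical list of top-level nodes and reproduces the two messages shown above. The wording for the remaining cases (no nodes, or a lone text node) is invented for illustration.

```javascript
// Hypothetical model of the top level of a parse tree after marks are applied:
// an array of { kind: "element" | "attribute" | "text", name?, text? }.
// Returns null if the result can be serialised as an XML document, else a message.
function checkDocumentRoot(nodes) {
  if (nodes.length === 1 && nodes[0].kind === "element") return null;
  if (nodes.length === 1 && nodes[0].kind === "attribute") {
    return `An attribute node may not be the final parse result @${nodes[0].name}`;
  }
  if (nodes.length > 1) {
    const shown = nodes.map((node) =>
      node.kind === "element" ? `<${node.name}/>`
        : node.kind === "attribute" ? `@${node.name}`
        : JSON.stringify(node.text)
    );
    return `Multiple nodes may not be the final parse result:${shown.join(",")}`;
  }
  // Invented wording for the remaining cases (no nodes, or a lone text node).
  return "The final parse result is not a single element";
}
```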
Note that when 'Treat as records' is enabled, multiple document trees can be generated and will be serialised in sequence in the result display.
The following options controlling grammar parsing, input string treatment and result display are supported via checkboxes. Where appropriate the corresponding option or invocation in the jwiXML.processor API is described:
Option | Default | Effect | API equivalent |
---|---|---|---|
Show Advanced options | | Make the advanced (and experimental) options visible. Ordinarily these shouldn't be needed: the defaults maximise the conformity of the processor. See the table below for a description of these options. | |
Prohibit unreachable non-terminals | | When checked, all non-terminals in the grammar must be reachable through a reference path from the start (first) rule. | compile() option 'unreachable' |
Allow an empty string as input | | Normally, if the input is an empty string, no attempt is made to parse it; just the grammar is processed and displayed. Checking this allows an empty string to be processed as input, which is probably only needed for certain test cases. | |
Treat as records | | When checked, the input is assumed to be a sequence of records separated by character sequences that match a given regular expression ('\n' by default). The separator can be edited in the text input displayed when this option is selected. The result is a sequence of documents, each serialised in the output result area. For repetitively structured data, where the repetition separators do not appear within the data 'records', this technique can be vastly more efficient than describing the repetition/separation in the iXML grammar itself. | parseRecords() 3rd argument $separator |
Show only one ambiguous solution | | When the parse is ambiguous, with multiple possible solutions, this forces only one to be returned, which will still be marked as ambiguous. | parse() option 'justOne' |
Indent result | | When checked, the results are displayed as a serialisation of the XML tree with indentation applied. This means that whitespace-only text nodes may be altered or in some cases deleted. If your application requires strict whitespace preservation, uncheck this option. | |
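Record-oriented processing can be pictured as splitting the input on the separator expression and parsing each piece on its own. The sketch below is illustrative only: `parseRecord` is a hypothetical stand-in for a real parse call, and neither function reproduces the signature of `parseRecords()`.

```javascript
// Split the input into records on a separator regular expression. The default
// separator is the regex source \n (backslash, n), which matches a newline.
function splitRecords(input, separator = "\\n") {
  return input.split(new RegExp(separator));
}

// Parse each record independently; `parseRecord` is a hypothetical
// stand-in for a real parse call and returns one result per record.
function processRecords(input, parseRecord, separator = "\\n") {
  return splitRecords(input, separator).map((record) => parseRecord(record));
}

// Usage: one "document" per line.
const docs = processRecords("a,1\nb,2", (record) => `<row>${record}</row>`);
console.log(docs);
```

Because each record is parsed separately, the parser's state never has to span the whole input, which is plausibly where the efficiency gain comes from. Note that `String.prototype.split` returns an empty final record when the input ends with a separator, and includes any capturing groups from the separator expression in its result; how the workbench handles these cases is not stated here.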
Advanced Option | Default | Effect | API equivalent |
---|---|---|---|
Permit missing non-terminals | | When checked, non-terminals without rule definitions may be referenced in the grammar (e.g. for experimentation in grammar combination). Using this during input parsing will lead to unpredictable results, usually some sort of crash. | compile() option 'missing' |
Tovey-Walsh rewrites | | When checked, f+ constructs are rewritten as 'f+ => f-plus. f-plus: f, f-plus\| ().' rather than the 'f-plus: f, f*.' rewrite given in the spec. This is currently the default, as it seems to perform significantly faster. | compile() option 'twRewrites' |
Show Parser States | | Displays the internal state transitions of the Earley parser operating on the input. This is NOT recommended for use with large grammars and inputs, as memory overflows can be encountered. | |
Show all processed marks | | If checked, directive marks for deletion ('-'), attribute ('@') or rename aliasing ('>') serialisation of non-terminals, or deletion ('-') or insertion ('+') serialisation of quoted strings, are not honoured but are instead placed on the full parse tree output, either as an @ixml:mark or @ixml:alias attribute or as an ixml:insert or ixml:delete element. This only applies to marks on the original grammar and not to artefactual marks generated in compiling the grammar, such as those used for generated non-terminals implementing optionality or repetition. | parse() option 'suppressMarks' |
Support iXML version 1.1 | | Support additional features of iXML version 1.1 (e.g. renaming). | |
Keep multi-character strings | | Multi-character terminal strings (corresponding to the quoted production of the iXML spec.) are expanded during compilation to a sequence of single (quoted) character strings, to correspond to the character-by-character processing of the Earley parser. If this option is checked, such strings are retained as multiple characters and the Earley parser 'jumps forward' to write a new state several character positions ahead in its state records. | compile() option 'longStrings' |
Use regular expression matches | | Uses regular expression matching for quoted strings, inclusions and exclusions, and in some cases for repetition and optional forms of the same. This will only work properly when the range of characters matched by the following term in the sequence can be guaranteed disjoint from that of the current term. Automatic determination of safe conditions to do this is a type-analysis research issue. | compile() option 'regEx' |
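The 'Keep multi-character strings' and 'Use regular expression matches' options both replace several character-by-character steps with one larger step. The hypothetical helpers below contrast the treatments; they illustrate the idea and are not the compiler's code. Positions in `matchLongString` and `matchRepeatedClass` are JavaScript string indices (UTF-16 code units).

```javascript
// Default treatment: a quoted terminal becomes one single-character terminal
// per code point, each consumed by its own parser step.
function expandString(quoted) {
  return Array.from(quoted);
}

// 'longStrings'-style treatment: test the whole string at once and return the
// position the parser would jump forward to, or -1 if it does not match.
function matchLongString(input, position, quoted) {
  return input.startsWith(quoted, position) ? position + quoted.length : -1;
}

// 'regEx'-style treatment of a repeated class such as [L]+: one sticky,
// greedy regular-expression match instead of one step per character.
function matchRepeatedClass(input, position, category) {
  const re = new RegExp(`\\p{${category}}+`, "uy"); // u: Unicode classes, y: anchor at lastIndex
  re.lastIndex = position;
  const match = re.exec(input);
  return match ? position + match[0].length : -1;
}
```

The greedy `+` in `matchRepeatedClass` shows why the disjointness condition matters: for a sequence such as `[L]+, "s"` applied to `cats`, the class match consumes the final `s`, leaving nothing for the following term.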
Both grammar and input texts can be read from local filestore by using the appropriate 'Choose file' (or 'Browse') button, which reads a file and loads its text into the textarea. The name of the loaded file is displayed next to the file chooser. Files can also be dragged and dropped onto the textarea, though in Firefox the security settings will probably have to be altered (it seems to work fine in Chrome).
Grammars can also be loaded from web repositories, in particular from the InvisibleXML test suites or sample grammars, using the Grammar 'Test/sample' dropdown.
A browsable catalog of the test suite is also available.
When there are sample inputs available for one of these test or sample grammars, the 'Test/Sample' dropdown above the Input textarea will be populated.
Note that some of these test cases provide the iXML grammar in its XML serialisation form. The workbench recognises such a situation and will show and use that form, but editing in the textarea under these circumstances will have no effect on the grammar being used in parsing. Files containing iXML grammars serialised as XML that are loaded by other means (file selection, drag-and-drop) are currently treated as simple text and will therefore fail to be parsed. (The jwlProcessor.xsl library contains a jwl:parseXML() function that will accept XML-serialised grammars.)
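Recognising an XML-serialised grammar can be done by looking at the first element of the text: in the iXML specification's XML form the root element is `ixml`, while an iXML source grammar should not begin with `<`. The function below is a hypothetical heuristic, not the workbench's own test. It skips a byte-order mark, whitespace, an XML declaration, comments and processing instructions before checking the root element name.

```javascript
// Hypothetical heuristic: does this text look like a grammar serialised as XML
// (root element <ixml>) rather than iXML source text?
function looksLikeXmlGrammar(text) {
  let rest = text.replace(/^\uFEFF/, ""); // drop a byte-order mark
  // Strip leading whitespace, comments and processing instructions
  // (the XML declaration <?xml ...?> matches the PI pattern).
  const prolog = /^\s*(?:<!--[\s\S]*?-->|<\?[\s\S]*?\?>)/;
  while (prolog.test(rest)) rest = rest.replace(prolog, "");
  return /^\s*<ixml[\s>\/]/.test(rest);
}
```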
body++(s1,s2)
a/(b|c)/d failed whereas a/(b |c)/d (trailing space) succeeded
jwL: namespace), corresponding to the signature and semantics given in the current draft of XPath and XQuery Functions and Operators 4.0
naming production of Invisible XML Specification - Working Draft where names can have optional aliases to be used in XML tree production. (experimental)