Turtles all the way up
A WebIDL Parser for Javascript
WebIDL is a schema language for APIs that is used (primarily) in W3C specifications to define interfaces. If you've read any recent API specification, you've read WebIDL. It is abstract enough that one could use it to generate interfaces for a great number of programming languages, but given its origin it is only natural that the vast majority of the time it is used to produce Javascript bindings.
As such it was a shame that there was no way to parse it from Javascript (there were fragments of such a parser inside the ReSpec codebase but you really don't want to stare at them too closely). This deficiency has now been addressed, and you can go grab WebIDLParser.js off GitHub. This is an alpha, it's barely been tested, use at your own risk, patches welcome, etc.
If you wish to use it in a Web context, simply include web/WebIDLParser.js. Note that this file has not been minified and is quite large. In my tests it goes down to 44K minified and 6.4K gzipped (being generated code, it is rather repetitive and compresses well). A WebIDLParser object will then become available.
Using it from Node.js or a CommonJS environment is equally simple: just put node/WebIDLParser.js in your library path and require it. It exports a Parser object.
The interface is simple: the only method you need to worry about is parse(str, [start]). It takes a string containing WebIDL and returns an AST expressed as a simple JS structure (utilities to traverse it more easily are forthcoming). An optional second argument specifies which rule of the grammar to start with. This is useful when parsing a small WebIDL fragment that may be incomplete. The names of the rules can be found in lib/grammar.peg. Just so that you don't have to look, the most common start parameters are:
- definitions (the default)
- interface
- Operation
- Attribute
- type
- implements
- typedef
- exception
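To make the idea of a start rule concrete, here is a small, self-contained sketch of a hand-written recursive-descent parser whose entry rule is picked when parse is called. It is not WebIDLParser.js and does not use its grammar: it knows only two made-up rules, named after two of the start parameters above (type, which accepts a single-word type name with optional [] suffixes, and Attribute), and the AST shape it returns is an assumption. Since the toy has no definitions rule, it defaults to type instead.

```javascript
// Hypothetical toy parser illustrating a start rule chosen at call time.
// It supports only two made-up rules and is NOT the WebIDLParser.js grammar.

// Advance past any whitespace.
function skipWs(state) {
  while (state.pos < state.input.length && /\s/.test(state.input[state.pos])) {
    state.pos++;
  }
}

// Consume an identifier or throw a SyntaxError.
function identifier(state) {
  skipWs(state);
  const match = /^[A-Za-z_][A-Za-z0-9_]*/.exec(state.input.slice(state.pos));
  if (!match) throw new SyntaxError(`Identifier expected at offset ${state.pos}`);
  state.pos += match[0].length;
  return match[0];
}

// Consume a punctuation token if present; report whether it was.
function accept(state, token) {
  skipWs(state);
  if (!state.input.startsWith(token, state.pos)) return false;
  state.pos += token.length;
  return true;
}

// Consume a required punctuation token or throw.
function expect(state, token) {
  if (!accept(state, token)) {
    throw new SyntaxError(`"${token}" expected at offset ${state.pos}`);
  }
}

// Consume a keyword only if it is not the prefix of a longer identifier.
function acceptKeyword(state, word) {
  skipWs(state);
  const rest = state.input.slice(state.pos);
  if (!rest.startsWith(word) || /[A-Za-z0-9_]/.test(rest.charAt(word.length))) {
    return false;
  }
  state.pos += word.length;
  return true;
}

const rules = {
  // type := identifier ("[" "]")*
  type(state) {
    const node = { kind: "type", name: identifier(state), array: 0 };
    while (accept(state, "[")) {
      expect(state, "]");
      node.array++;
    }
    return node;
  },
  // Attribute := "readonly"? "attribute" type identifier ";"
  Attribute(state) {
    const readonly = acceptKeyword(state, "readonly");
    if (!acceptKeyword(state, "attribute")) {
      throw new SyntaxError(`"attribute" expected at offset ${state.pos}`);
    }
    const idlType = rules.type(state);
    const name = identifier(state);
    expect(state, ";");
    return { kind: "attribute", readonly, idlType, name };
  },
};

// Parse `input` starting from the rule named `start`, chosen at call time.
function parse(input, start = "type") {
  if (!Object.prototype.hasOwnProperty.call(rules, start)) {
    throw new Error(`Unknown start rule: ${start}`);
  }
  const state = { input, pos: 0 };
  const result = rules[start](state);
  // The chosen rule must consume the whole input.
  skipWs(state);
  if (state.pos !== input.length) {
    throw new SyntaxError(`Unexpected input at offset ${state.pos}`);
  }
  return result;
}

console.log(parse("long[]", "type"));
console.log(parse("readonly attribute DOMString name;", "Attribute"));
```

The design point is simply that every rule is an ordinary function looked up by name, so picking a different starting point costs nothing at runtime, while the final check still rejects input that the chosen rule does not fully consume.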
The grammar isn't exactly the one in the WebIDL draft. In some places it is slightly more permissive (not by that much though), and its handling of extended attributes is simpler. It also adds support for WebIDL arrays, even though they are not described in the current draft's grammar.
If you wish to modify the grammar, go through the following steps:
- read up on PEG.js at http://pegjs.majda.cz/
- create a directory called depends at the root of this repository
- inside depends, run git clone git://github.com/dmajda/pegjs.git
- edit the grammar in lib/grammar.peg
- then run node utils/generate.js to regenerate the JS
How It Was Done
I built it atop the PEG.js parser generator. It's a very nice tool, with decent error reporting, and the resulting parser seems to be rather fast. Be warned, though, that the grammar listed in the "Documentation" section of the site is completely out of date. Instead, read what you can find in its GitHub repository.
I found the CSS 2 grammar example to be all that I needed in order to get started. Overall, after stumbling on a couple of gotchas (e.g. that any * or + specifier will cause an array to be passed, which is logical when you think about it), I found it intuitive and easy to work with.
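Here is what the array gotcha looks like in practice. The rule in the comment below is a hypothetical example, not taken from lib/grammar.peg; the JavaScript function reproduces its action body so the behaviour can be run on its own.

```javascript
// Hypothetical PEG.js rule (grammar syntax shown in a comment only):
//
//   identifier
//     = first:[A-Za-z_] rest:[A-Za-z0-9_]* { return first + rest.join(""); }
//
// `first` is a single-character string, but because of the `*`,
// `rest` is an array of characters such as ["o", "n", "g"].
// This function mirrors the action body above.
function identifierAction(first, rest) {
  return first + rest.join("");
}

// Without the join, "+" would call Array.prototype.toString and insert
// commas: "l" + ["o", "n", "g"] === "lo,n,g".
console.log(identifierAction("l", ["o", "n", "g"])); // prints long
```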
If you look inside utils/generate.js you will see that I needed to work around some PEG.js limitations by patching the generated code in the most brutal fashion. If I get a minute to myself I will probably patch PEG.js itself so that such barbarian treatment is no longer needed, but in the meantime here are the issues.
The first is that I wanted to pre-process the input before parsing it, because I didn't want to deal with comments in the grammar: just removing them upfront is easier. A similar pre-processing hook could be used for a number of other things.
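Pre-processing of this kind can be as simple as a function that blanks out comments before the string reaches the generated parser. The following is a minimal sketch, not the code in utils/generate.js, and the name stripComments is hypothetical. It handles /* ... */ and // ... comments but, as a simplification, does not understand string literals, so a // inside a quoted string would be mangled.

```javascript
// Hypothetical pre-processor: blank out WebIDL comments before parsing.
// Every non-newline character of a comment becomes a space, so the result
// has the same length and line breaks as the input and offsets reported by
// the parser still point at the right place in the original source.
// Limitation: string literals are not recognised.
function stripComments(idl) {
  return idl.replace(/\/\*[\s\S]*?\*\/|\/\/[^\n]*/g, (comment) =>
    comment.replace(/[^\n]/g, " ")
  );
}

const src = "interface Foo { // a comment\n  /* multi\n line */ attribute long bar;\n};";
console.log(stripComments(src));
```

Replacing comments with spaces rather than deleting them also means that something like long/*x*/foo still reads as two separate tokens.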
The other limitation is that I wanted my generated parser to let the caller choose, at runtime, which grammar rule to use as the starting point (PEG.js only has an option to control that when the parser is being built). This is useful when you need to parse a subcomponent of the grammar without its full context.
Before looking at PEG.js I gave Jison a shot. Overall it seemed pretty good, but its LL(1) implementation (the natural fit, since the WebIDL grammar is specified as LL(1)) doesn't support generating actual parsers, and its error messages were less clear. That being said, if you need to work with another family of grammars, it is probably worth a shot.
I couldn't find much of a test suite. If you have good test content, I'll happily take it.
Why It Was Done
Of course, I didn't just do that for the fun of it (though I'll admit that playing with the various Javascript parser generators had been on my mind for a while). The endgame here includes several targets.
First, provide ReSpec v2 with decent WebIDL processing. The code currently used in ReSpec v1 has grown over time from just parsing a few small things into an unwieldy mess that I'm scared to touch lest I break existing content; it's time to change that. Alongside the parser I intend to add a visitor of sorts, which will hopefully make it easier to generate documentation from WebIDL.
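Since that visitor doesn't exist yet, the following is only a guess at the general shape such a helper could take: a depth-first walk over a plain JS AST that calls a handler keyed on each node's type field. The function name visit and the node shape (type, name, members) are assumptions for illustration, not necessarily what WebIDLParser.js emits.

```javascript
// Hypothetical visitor over a plain-object AST (assumed to be a tree, no
// cycles). Each node is assumed to carry a string `type` field; handlers
// are looked up by that value and called with the node and its parent.
function visit(node, handlers, parent = null) {
  if (Array.isArray(node)) {
    // Arrays are transparent: children keep the enclosing node as parent.
    for (const child of node) visit(child, handlers, parent);
    return;
  }
  if (node === null || typeof node !== "object") return;
  if (Object.prototype.hasOwnProperty.call(handlers, node.type)) {
    handlers[node.type](node, parent);
  }
  for (const value of Object.values(node)) {
    if (value !== null && typeof value === "object") visit(value, handlers, node);
  }
}

// Usage: list an interface's members, as a documentation generator might.
const ast = [
  {
    type: "interface",
    name: "Contact",
    members: [
      { type: "attribute", name: "displayName", idlType: "DOMString" },
      { type: "operation", name: "save", idlType: "void" },
    ],
  },
];
const lines = [];
visit(ast, {
  interface: (n) => lines.push(`interface ${n.name}`),
  attribute: (n, parent) => lines.push(`  ${parent.name}.${n.name} (attribute)`),
  operation: (n, parent) => lines.push(`  ${parent.name}.${n.name}() (operation)`),
});
console.log(lines.join("\n"));
```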
Second, using WebIDL in specifications should make it easy to automatically generate a large number of tests that check whether implementations at least support the interface properly (in terms of form, that is; obviously this can't generate behaviour tests). We currently have a tool that does this, but we're not entirely happy with it. Hopefully this parser can help improve it.
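As a rough illustration of what a test of form can mean, the sketch below takes a hand-written description of an interface's members and checks that an object exposes a property for each attribute and a function for each operation. It is not the existing tool, and checkInterfaceForm and the { type, name } member shape are hypothetical; a fuller checker would also need to look at things such as the prototype chain and property descriptors, which this ignores.

```javascript
// Hypothetical form check: does `obj` expose every member listed in
// `members` (assumed shape: { type: "attribute" | "operation", name })?
// Returns a list of human-readable failures; empty means the form matches.
function checkInterfaceForm(obj, members) {
  const failures = [];
  for (const member of members) {
    if (!(member.name in obj)) {
      failures.push(`missing ${member.type} "${member.name}"`);
    } else if (member.type === "operation" && typeof obj[member.name] !== "function") {
      failures.push(`operation "${member.name}" is not a function`);
    }
  }
  return failures;
}

const contactMembers = [
  { type: "attribute", name: "displayName" },
  { type: "operation", name: "save" },
];
const fakeContact = { displayName: "Ada", save() {} };
const brokenContact = { displayName: "Ada", save: 42 };
console.log(checkInterfaceForm(fakeContact, contactMembers)); // []
console.log(checkInterfaceForm(brokenContact, contactMembers));
```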
Finally, I want to write a WebIDL to JSON Schema converter. Why? Because JSON Schema is good at describing REST+JSON services. Such a conversion would make it possible to use WebIDL to describe REST interactions. That means that many of the W3C APIs being worked on now could be exposed over the network. For some of course it might not make sense, but for others (e.g. Contacts, Calendar, File System) it could turn out to be quite interesting. I'll keep you posted.
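The converter doesn't exist yet, but its core idea can be sketched: map each attribute's WebIDL type to the closest JSON Schema type and collect the attributes into an object schema. Everything here is an assumption for illustration: the function names, the input shape, and the deliberately tiny type table. Array types (T[]) become JSON Schema arrays, operations are skipped, and unrecognised types are left unconstrained.

```javascript
// Hypothetical mapping from a few WebIDL type names to JSON Schema.
const IDL_TO_JSON_SCHEMA = {
  DOMString: { type: "string" },
  boolean: { type: "boolean" },
  long: { type: "integer" },
  "unsigned long": { type: "integer", minimum: 0 },
  double: { type: "number" },
};

// Convert one WebIDL type string (e.g. "DOMString" or "long[]") to a schema.
function typeToSchema(idlType) {
  if (idlType.endsWith("[]")) {
    return { type: "array", items: typeToSchema(idlType.slice(0, -2)) };
  }
  if (Object.prototype.hasOwnProperty.call(IDL_TO_JSON_SCHEMA, idlType)) {
    return { ...IDL_TO_JSON_SCHEMA[idlType] }; // copy so the table stays intact
  }
  return {}; // unknown types (e.g. other interfaces) are unconstrained here
}

// Convert an interface description { name, members } into an object schema
// whose properties are the interface's attributes.
function interfaceToSchema(iface) {
  const properties = {};
  for (const member of iface.members) {
    if (member.type === "attribute") {
      properties[member.name] = typeToSchema(member.idlType);
    }
  }
  return { title: iface.name, type: "object", properties };
}

const schema = interfaceToSchema({
  name: "Contact",
  members: [
    { type: "attribute", name: "displayName", idlType: "DOMString" },
    { type: "attribute", name: "emails", idlType: "DOMString[]" },
    { type: "operation", name: "save", idlType: "void" },
  ],
});
console.log(JSON.stringify(schema, null, 2));
```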