digitalmars.D - New XML parser written for D1 and D2. - XMLP_01.zip (0/1)
- Michael Rynn (46/46) Oct 14 2009
I made a validating, or optionally non-validating, XML parser in D. It can read and parse files, external DTDs and entities with different BOMs and encodings. This xmlp (XmlPieceParser class) passes 100% in both validating and non-validating modes on the following test sets: oasis, sun, xmltest and ibm. I have not dared to try any of the XML 1.1 or other tests. The warnings given for not well-formed or non-valid documents, if you choose to intercept them, may not necessarily be illuminating.

My brief try of a modified std.xml against some of these tests led me to chuck it, as I learned more about what the parser is actually supposed to do. This one is all my own mistakes and bad coding habits, written from near scratch after giving up on std.xml, and taking what I could from std.encoding. I have also made a front end, the xmlp.delegator module, that emulates the delegate callback model of std.xml.

To use it, you need a class derived from XmlParserInput, of which there are two, StreamParserInput and StringParserInput, in xmlp.input. These wrap an InputRange interface (empty, front, popFront). Give a new XmlPieceParser the input and an optional base directory path, and call nextPiece() repeatedly to get bits of the rather sparse XmlTree model defined in xmlp.xmldom. Or call the static XmlPieceParser.ReadDocument to get the entire thing at once.

I don't know how the Tango XML parser would cope with the W3C tests. Any resemblance to what is in the Tango XML parser will be pure coincidence, as a brief glance at that code a long time ago left me none the wiser.

I learnt a lot of XML minutiae while getting it to parse the hundreds of W3C test cases. Some validation, such as the ELEMENT content particles validator, still has wet glue and cement, and is not guaranteed to validate all deterministic content models. So if you get a good test case, please send it.

I am sure this release will be considered code bloated at the moment. With all those test cases, some conditional coding became a bit too contrived. After coding it for a while it just got too big, although I do think I got better at it towards the end. There is some scope for shrinkage. Very possibly there is a non-validating parser inside that is a fair bit smaller than this, which could one day be created by conditional compilation or re-coded from it.

The package has a base module name of xmlp. I am not aiming for std.xml as yet.

---------------------
Michael Rynn
Oct 14 2009
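
For readers unfamiliar with the InputRange interface (empty, front, popFront) mentioned in the post, here is a minimal D2 sketch of a type that provides those three primitives over a UTF-8 string. It is only an illustration of the range protocol using Phobos. CodePointRange and countCodePoints are hypothetical names invented for this example and are not part of xmlp, and nothing here is claimed about how StringParserInput is actually written.

```d
import std.range : isInputRange;
import std.utf : decode;

// Hypothetical example type, NOT part of xmlp: walks a UTF-8 string
// one code point at a time through the input range primitives.
struct CodePointRange
{
    private string src; // text that has not been consumed yet

    // True once every code point has been consumed.
    @property bool empty()
    {
        return src.length == 0;
    }

    // Decodes the first code point without consuming it.
    @property dchar front()
    {
        size_t i = 0;
        return decode(src, i);
    }

    // Drops the first code point, however many bytes it occupies.
    void popFront()
    {
        size_t i = 0;
        decode(src, i);
        src = src[i .. $];
    }
}

// Compile-time check that the three primitives form an input range.
static assert(isInputRange!CodePointRange);

// Hypothetical helper: counts code points by draining the range.
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (auto r = CodePointRange(s); !r.empty; r.popFront())
        ++n;
    return n;
}
```

Any consumer written against empty, front and popFront can read from such a type without knowing whether the characters come from a string or a stream, which is presumably why the parser inputs are built around that interface.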