digitalmars.D.announce - std.xml2 candidate
- Michael Rynn (24/24) Dec 11 2010 Availability of Updated xml parser for D2,
- Andrei Alexandrescu (4/6) Dec 11 2010 [snip]
- Andrei Alexandrescu (5/11) Dec 11 2010 One more thing - with XML parsers, I think Tango has definitely set the
- Lutger Blijdestijn (6/20) Dec 12 2010 That is considerable. A quick benchmark suggests that a lot of work is
- so (7/11) Dec 12 2010 There is no reason a D code should perform worse than C++ if you are not...
- Lutger Blijdestijn (6/17) Dec 12 2010 I know, and tango's parser is proof of that. But it can take a lot of wo...
- so (5/8) Dec 12 2010 On that i absolutely agree.
- Eric Desbiens (30/30) Dec 12 2010 Hello,
Availability of Updated xml parser for D2, organised very presumptively as std.xml2

Downloadable with SVN:
svn co http://svn.dsource.org/projects/xmlp/trunk/std (release 20).

It contains:
- A conventional DOM of linked nodes -- std.xmlp.linkdom
- A core parser which emits parsed items -- std.xmlp.coreparse
- A validating parser, including DOCTYPE validation -- std.xmlp.domparse

Performance seems not too bad. There are more lines of code, but it does the same work as std.xml in about 65% of the time. The well-formedness check is done during the parse, so there is no need for a separate check. It takes string inputs or file inputs in various encodings. The DOMErrorHandler DOM interface is included in the validating parser for the linkdom.

The parsers and DOM have a straightforward interface.

There is also a very nearly compatible version of the DOM used in std.xml -- std.xmlp.arraydom. The arraydom DocumentParser is also faster than std.xml, as it uses std.xmlp.coreparse.

It's not complete or final, nor much reviewed. The layout and interfaces seem to be OK. I expect it's already more useful than std.xml.

Michael Rynn.
Dec 11 2010
On 12/11/10 7:23 AM, Michael Rynn wrote:
> Availability of Updated xml parser for D2, organised very presumptively as std.xml2
> [snip]

Great! Do you plan to submit this to Phobos?

Andrei
Dec 11 2010
On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
> On 12/11/10 7:23 AM, Michael Rynn wrote:
>> Availability of Updated xml parser for D2, organised very presumptively as std.xml2
>> [snip]
>
> Great! Do you plan to submit this to Phobos?

One more thing - with XML parsers, I think Tango has definitely set the performance bar where it belongs. Any proposal for Phobos would need to meet it.

Andrei
Dec 11 2010
Andrei Alexandrescu wrote:
> On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
>> On 12/11/10 7:23 AM, Michael Rynn wrote:
>>> Availability of Updated xml parser for D2, organised very presumptively as std.xml2
>>> [snip]
>>
>> Great! Do you plan to submit this to Phobos?
>
> One more thing - with XML parsers, I think Tango has definitely set the performance bar where it belongs. Any proposal for Phobos would need to meet it.
>
> Andrei

That is considerable. A quick benchmark suggests that a lot of work is needed. If you take into account that tango's xml parser does less validation and that it is up to par with the fastest C++ parsers out there, I suggest lowering the bar a little bit at first. For example, outperforming libxml2.
Dec 12 2010
> If you take into account that tango's xml parser does less validation and that it is up to par with the fastest C++ parsers out there, I suggest lowering the bar a little bit at first. For example, outperforming libxml2.

There is no reason D code should perform worse than C++ if you are not using some high-level constructs. When it comes to strings/slicing/templates, you might actually get a performance boost compared to C++. The C++ parser mentioned here (RapidXML) depends heavily on these.

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
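To make the slicing point concrete, here is a minimal sketch, assuming nothing about how Tango, RapidXML or std.xmlp are actually written. The helper name `tagName` is hypothetical. It only shows that a D parser can hand back part of its input as a slice, so no new string is allocated or copied:

```d
import std.stdio : writeln;

/// Hypothetical helper, for illustration only: returns the element name of a
/// start tag such as "<item id='1'>" as a slice of the input string.
string tagName(string tag)
{
    assert(tag.length >= 2 && tag[0] == '<');
    size_t end = 1;
    // Scan until a space, '/' or '>' terminates the name.
    while (end < tag.length && tag[end] != ' ' && tag[end] != '/' && tag[end] != '>')
        ++end;
    return tag[1 .. end]; // a slice: same memory as the input, no copy
}

void main()
{
    string doc = "<item id='1'>text</item>";
    string name = tagName(doc);
    writeln(name); // prints "item"
    // The slice points into the original buffer rather than a new allocation.
    assert(name.ptr == doc.ptr + 1);
}
```

The `assert` on `.ptr` is the whole point: the returned name lives inside the original buffer, which is the kind of zero-copy handling the strings/slicing remark refers to.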
Dec 12 2010
so wrote:
>> If you take into account that tango's xml parser does less validation and that it is up to par with the fastest C++ parsers out there, I suggest lowering the bar a little bit at first. For example, outperforming libxml2.
>
> There is no reason D code should perform worse than C++ if you are not using some high-level constructs. When it comes to strings/slicing/templates, you might actually get a performance boost compared to C++. The C++ parser mentioned here (RapidXML) depends heavily on these.

I know, and tango's parser is proof of that. But it can take a lot of work getting to that level. Right now we have an xml library that a lot of people don't want to use, that has bugs, and that performs 60 times worse than tango's. Imho it's better to include it if performance is merely acceptable and see if it is possible to improve from there on.
Dec 12 2010
> Imho it's better to include it if performance is merely acceptable and see if it is possible to improve from there on.

On that I absolutely agree. People have this misconception that D should perform worse than said languages, so I had to state the obvious :)
Dec 12 2010
Hello,

It's great to see interest in replacing std.xml. I am also working on a replacement for std.xml; maybe we can collaborate on this and not duplicate effort. We should choose one of our codebases and develop a strong alternative from there. I propose my codebase for the following 2 reasons:

1. It performs better and scales better with file size.
Here's a quick benchmark for DOM parsing on my computer. I don't know how well it performs compared to Tango.

=== XMLP ===
XMLP 1Mb   Parsing time: 0.548 s
XMLP 11Mb  Parsing time: 29.570 s

=== My Alternative* ===
Alt 1Mb    Parsing time: 0.134 s
Alt 11Mb   Parsing time: 1.225 s

*This is using the XML 1.1 compliant parser.

2. It is more flexible.
All parsers are templated and you can choose the degree of conformance, whether namespaces are used, the type of entity decoding, and whether parsing document fragments is supported. It also parses any type of range whose element type is some sort of character.

Your library is more complete though. It supports a SAX-like interface, has a validating parser and tries to be compatible with std.xml (which I'm not sure is needed). It also normalizes attributes, which mine does not. On compliance, I think the 2 libraries are on the same level.

Feel free to talk about your code and show where it is better than mine, and say whether you think it would be better to build on your code instead of mine. Probably a mix of both libraries will make a better base. I think that if we collaborate on this, we will make a great library.

Code can be downloaded from: https://github.com/olace/experimental
check exp/xml.d
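As a rough illustration of what "templated, with a selectable degree of conformance, over any range of characters" can look like in D, here is a minimal sketch. It is not taken from exp/xml.d; the names `Conformance` and `countStartTags` are hypothetical, and the "parser" only counts start tags:

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.stdio : writeln;
import std.traits : isSomeChar;

/// Hypothetical conformance switch, chosen at compile time.
enum Conformance { relaxed, strict }

/// Counts '<' characters that open a start tag (not "</", "<?" or "<!").
/// Accepts any input range whose element type is some character type.
size_t countStartTags(Conformance level, R)(R input)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t count = 0;
    while (!input.empty)
    {
        auto c = input.front;
        input.popFront();
        if (c != '<')
            continue;
        static if (level == Conformance.strict)
        {
            // Strict mode only: the input must not end right after '<'.
            if (input.empty)
                throw new Exception("unexpected end of input after '<'");
        }
        if (!input.empty && input.front != '/' && input.front != '?' && input.front != '!')
            ++count;
    }
    return count;
}

void main()
{
    string utf8 = "<a><b/>text</a>";
    wstring utf16 = "<a><b/>text</a>"w;
    writeln(countStartTags!(Conformance.relaxed)(utf8)); // prints 2
    writeln(countStartTags!(Conformance.strict)(utf16)); // prints 2
}
```

The template constraint is what lets the same code accept `string`, `wstring` or any other character range, and the `static if` means the extra strict-mode check costs nothing when the relaxed variant is instantiated.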
Dec 12 2010