
digitalmars.D - std.xml2 (collecting features)

reply "Robert burner Schadek" <rburners gmail.com> writes:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following features 
for it:

- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2

Please post your feature requests, and please keep the posts DRY 
and on topic.
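To make the compile-time-switch (CTS) idea above concrete, here is a purely hypothetical sketch of what such an API could look like. None of these names exist yet; they only illustrate how the listed features could map onto template parameters:

```d
import std.range.primitives : isForwardRange;

// Hypothetical configuration flags, resolved at compile time.
enum LazyAttributes { no, yes }
enum Validate { no, yes }

// One parser instantiation per input type, so ubyte (ASCII) and
// char (UTF-8) ranges each get their own specialized code path.
struct XmlParser(Input, LazyAttributes lazyAttr, Validate validate)
        if (isForwardRange!Input)
{
    Input input;
    // In-situ parsing: events would hand out slices of `input`
    // instead of freshly allocated strings whenever possible.
}

// Convenience factory; Input is deduced, the switches are explicit.
auto xmlParser(LazyAttributes lazyAttr = LazyAttributes.yes,
        Validate validate = Validate.no, Input)(Input input)
{
    return XmlParser!(Input, lazyAttr, validate)(input);
}
```

With this shape, `xmlParser!(LazyAttributes.no, Validate.yes)(doc)` would select a fully eager, validating instantiation at compile time, with no runtime branching.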
May 03 2015
next sibling parent reply "Joakim" <dlang joakim.fea.st> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
 std.xml has been considered not up to spec for nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY 
 and on topic.
My request: just skip it. XML is a horrible waste of space for a standard; better that D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
May 03 2015
next sibling parent "Meta" <jared771 gmail.com> writes:
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
 On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
 wrote:
 std.xml has been considered not up to spec for nearly 3 years 
 now. Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts 
 DRY and on topic.
My request: just skip it. XML is a horrible waste of space for a standard; better that D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
That's not really an option considering the huge amount of XML data there is out there.
May 03 2015
prev sibling next sibling parent reply "w0rp" <devw0rp gmail.com> writes:
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
 On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
 wrote:
 std.xml has been considered not up to spec for nearly 3 years 
 now. Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts 
 DRY and on topic.
My request: just skip it. XML is a horrible waste of space for a standard; better that D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
I agree that JSON is superior through-and-through, but legacy support matters, and XML is in many places. It's good to have a quality XML parsing library.
May 03 2015
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:

 My request: just skip it.  XML is a horrible waste of space for 
 a standard; better that D doesn't support it well, anything to 
 discourage its use.  I'd rather see you spend your time on 
 something worthwhile.  If data formats are your thing, you 
 could help get Ludwig's JSON stuff in, or better yet, enable 
 some nice binary data format.
On Sun, 03 May 2015 18:44:11 +0000, "w0rp" <devw0rp gmail.com> wrote:
 I agree that JSON is superior through-and-through, but legacy 
 support matters, and XML is in many places. It's good to have a 
 quality XML parsing library.
You two are terrible at motivating people. "Better D doesn't support it well" and "JSON is superior through-and-through" is overly dismissive. To me it sounds like someone saying replace C++ with JavaScript, because C++ is a horrible standard and JavaScript is so much superior. Honestly.

Remember that while JSON is simpler, XML is not just a structured container for bool, Number and String data. It comes with many official sidekicks covering a broad range of use cases:

XPath:
 * allows you to use XML files like a textual database
 * complex enough to allow for almost any imaginable query
 * many tools emerged to test XPath expressions against XML documents
 * also powers XSLT (http://www.liquid-technologies.com/xpath-tutorial.aspx)

XSL (Extensible Stylesheet Language) and XSLT (XSL Transformations):
 * written as XML documents
 * standard way to transform XML from one structure into another
 * convert or "compile" data to XHTML or SVG for display in a browser
 * output to XSL-FO

XSL-FO (XSL formatting objects):
 * written as XSL
 * type-setting for XML; an XSL-FO processor is similar to a LaTeX processor
 * reads an XML document (a "Format" document) and outputs to PDF, RTF or a similar format

XML Schema Definition (XSD):
 * written as XML
 * linked in by an XML file
 * defines structure and validates content to some extent
 * can set constraints on how often an element can occur in a list
 * can validate the data type of values (length, regex, positive, etc.)
 * database-like unique IDs and references

I think XML is the most eat-your-own-dog-food language ever and nicely covers a wide range of use cases. In any case there are many XML based file formats that we might want to parse. Amongst them SVG, OpenDocument (Open/LibreOffice), RSS feeds, several MS Office, XMP and other metadata formats. When it comes to which features to support, I personally used XSD more than XPath and the tech using it. But quite frankly both would be expected by users. 
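As a small, self-contained illustration of the XSD point (the schema below is invented for this example):

```xml
<!-- books.xsd: constrains structure, occurrence counts and value types -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <!-- at least one, arbitrarily many <book> elements -->
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <!-- the value must parse as a positive integer -->
              <xs:element name="year" type="xs:positiveInteger"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A document such as `<books><book><title>D</title><year>2015</year></book></books>` passes validation; a non-numeric `<year>` is rejected by any conforming validator, on the sender's and the receiver's side alike.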
Based on XPath, XSL transformations can be added any time then. Anything beyond that doesn't feel quite "core" enough to be in an XML module. -- Marco
May 04 2015
parent reply "Joakim" <dlang joakim.fea.st> writes:
On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
 On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:

 My request: just skip it.  XML is a horrible waste of space 
 for a standard; better that D doesn't support it well, anything to 
 discourage its use.  I'd rather see you spend your time on 
 something worthwhile.  If data formats are your thing, you 
 could help get Ludwig's JSON stuff in, or better yet, enable 
 some nice binary data format.
You two are terrible at motivating people. "Better D doesn't support it well" and "JSON is superior through-and-through" is overly dismissive. To me it sounds like someone saying replace C++ with JavaScript, because C++ is a horrible standard and JavaScript is so much superior. Honestly.
You seem to have missed the point of my post, which was to discourage him from working on an XML module for phobos. As for "motivating" him, I suggested better alternatives. And I never said JSON was great, but it's certainly _much_ more readable than XML, which is one of the basic goals of a text format.
 Remember that while JSON is simpler, XML is not just a
 structured container for bool, Number and String data. It
 comes with many official side kicks covering a broad range of
 use cases:

 XPath:
  * allows you to use XML files like a textual database
  * complex enough to allow for almost any imaginable query
  * many tools emerged to test XPath expressions against XML 
 documents
  * also powers XSLT
    (http://www.liquid-technologies.com/xpath-tutorial.aspx)

 XSL (Extensible Stylesheet Language) and
 XSLT (XSL Transformations):
  * written as XML documents
  * standard way to transform XML from one structure into another
  * convert or "compile" data to XHTML or SVG for display in a 
 browser
  * output to XSL-FO

 XSL-FO (XSL formatting objects):
  * written as XSL
  * type-setting for XML; an XSL-FO processor is similar to a 
 LaTeX processor
  * reads an XML document (a "Format" document) and outputs to a 
 PDF, RTF or similar format

 XML Schema Definition (XSD):
  * written as XML
  * linked in by an XML file
  * defines structure and validates content to some extent
  * can set constraints on how often an element can occur in a 
 list
  * can validate data type of values (length, regex, positive, 
 etc.)
  * database like unique IDs and references
These are all incredibly dumb ideas. I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)
 I think XML is the most eat-your-own-dog-food language ever
 and nicely covers a wide range of use cases.
The problem is you're still eating dog food. ;)
 In any case there
 are many XML based file formats that we might want to parse.
 Amongst them SVG, OpenDocument (Open/LibreOffics), RSS feeds,
 several US Offices, XMP and other meta data formats.
Sure, and if he has any real need for any of those, who are we to stop him? But if he's just looking for some way to contribute, there are better ways.

On Monday, 4 May 2015 at 20:44:42 UTC, Jonathan M Davis wrote:
 Also true. Many of us just don't find enough time to work on D, 
 and we don't seem to do a good job of encouraging larger 
 contributions to Phobos, so newcomers don't tend to contribute 
 like that. And there's so much to do all around that the big 
 stuff just falls by the wayside, and it really shouldn't.
This is why I keep asking Walter and Andrei for a list of "big stuff" on the wiki - they don't have to be big, just important - so that newcomers know where help is most needed. Of course, it doesn't have to be them, it could be any member of the D core team, though whatever the BDFLs push for would have a bit more weight.
May 09 2015
parent reply "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Saturday, 9 May 2015 at 10:28:53 UTC, Joakim wrote:
 On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
 On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
clip
 Remember that while JSON is simpler, XML is not just a
 structured container for bool, Number and String data. It
 comes with many official side kicks covering a broad range of
 use cases:

 XPath:
 * allows you to use XML files like a textual database
 * complex enough to allow for almost any imaginable query
 * many tools emerged to test XPath expressions against XML 
 documents
 * also powers XSLT
   (http://www.liquid-technologies.com/xpath-tutorial.aspx)

 XSL (Extensible Stylesheet Language) and
 XSLT (XSL Transformations):
 * written as XML documents
 * standard way to transform XML from one structure into another
 * convert or "compile" data to XHTML or SVG for display in a 
 browser
 * output to XSL-FO

 XSL-FO (XSL formatting objects):
 * written as XSL
 * type-setting for XML; an XSL-FO processor is similar to a 
 LaTeX processor
 * reads an XML document (a "Format" document) and outputs to a 
 PDF, RTF or similar format

 XML Schema Definition (XSD):
 * written as XML
 * linked in by an XML file
 * defines structure and validates content to some extent
 * can set constraints on how often an element can occur in a 
 list
 * can validate data type of values (length, regex, positive, 
 etc.)
 * database like unique IDs and references
These are all incredibly dumb ideas. I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)
 I think XML is the most eat-your-own-dog-food language ever
 and nicely covers a wide range of use cases.
The problem is you're still eating dog food. ;)
I have to agree with Joakim on this. Having spent much of this past week trying to get XML generated by gSOAP (the project has some legacy code) to work with JAXB (Java) has reinforced my dislike for XML.

I've used things like XPath and XSLT in the past, so I can appreciate their power, but I think the 'jobs' they perform would be better supported elsewhere (i.e. language-specific XML frameworks). In trying to pass data between applications I just want a simple way of packaging up the data, and ideally making serialization/deserialization easy for me. At some point the programmer working on these needs to understand and validate the data anyway. Sure, you can use DTD/XML Schema to handle the validation part, but it is just easier to deal with that within your own code - without having to learn a 'whole new language' that is likely harder to grok than the tools you would have at your disposal in your language of choice.

Having said all that: as much as I share Joakim's sentiment that I wish XML would just go away, there is a lot of it out there, and I think having good support in Phobos is very valuable, so I thank Robert for his efforts.

Craig
May 09 2015
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 09 May 2015 10:28:52 +0000,
"Joakim" <dlang joakim.fea.st> wrote:

 On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:

 You two are terrible at motivating people. "Better D doesn't
 support it well" and "JSON is superior through-and-through" is
 overly dismissive.
 …
You seem to have missed the point of my post, which was to discourage him from working on an XML module for phobos. As for "motivating" him, I suggested better alternatives. And I never said JSON was great, but it's certainly _much_ more readable than XML, which is one of the basic goals of a text format.
Well, I was mostly answering w0rp here. JSON is both readable and easy to parse, no question.
 Remember that while JSON is simpler, XML is not just a
 structured container for bool, Number and String data. It
 comes with many official side kicks covering a broad range of
 use cases:

 XPath:
  …

 XSL and XSLT
  …

 XSL-FO (XSL formatting objects):
  …

 XML Schema Definition (XSD):
  …
These are all incredibly dumb ideas. I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)
:) One can't really answer this one. But with many hundreds of published data exchange formats built on XML, it can't have been too shabby all along. And sometimes small things matter, like being able to add comments along with the "payload". JSON doesn't have that. Or knowing that both sender and receiver will validate the XML the same way through XSD. So if it doesn't blow up on your end, it will pass validation on the other end, too.

On Sat, 09 May 2015 13:04:57 +0000, "Craig Dillabaugh" <craig.dillabaugh gmail.com> wrote:
 I have to agree with Joakim on this.  Having spent much of this
 past week trying to get XML generated by gSOAP (project has some
 legacy code) to work with JAXB (Java) has reinforced my dislike
 for XML.

 I've used things like XPath and XSLT in the past, so I can
 appreciate their power, but think the 'jobs' they perform would
 be better supported elsewhere (i.e. language-specific XML
 frameworks).

 In trying to pass data between applications I just want a simple
 way of packaging up the data and ideally making
 serialization/deserialization easy for me.  At some point the
 programmer working on these needs to understand and validate the
 data anyway.  Sure you can use DTD/XML Schema to handle the
 validation part, but it is just easier to deal with that within
 your own code - without having to learn a 'whole new language',
 that is likely harder to grok than the tools you would have at
 your disposal in your language of choice.
You see, the thing is that XSD is _not_ a whole new language, it is written in XML as well, probably specifically to make it so. Try to switch the perspective: with XSD (if it is sufficient for your validation needs) _one_ person needs to learn and write it, and other programmers (inside or outside the company) just use the XML library of their choice to handle validation via that schema. Once the schema is loaded it is usually no more than doc.validate(); (There are also good GUI tools to assist in writing XSD.) What you propose on the other hand is that everyone involved in the data exchange writes their own validation code in their language of choice, with either no access to existing sources or functionality that doesn't translate to their language!

-- Marco
May 10 2015
next sibling parent reply "Joakim" <dlang joakim.fea.st> writes:
On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:
 Am Sat, 09 May 2015 10:28:52 +0000
 schrieb "Joakim" <dlang joakim.fea.st>:

 On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
 Remember that while JSON is simpler, XML is not just a
 structured container for bool, Number and String data. It
 comes with many official side kicks covering a broad range of
 use cases:

 XPath:
  …

 XSL and XSLT
  …

 XSL-FO (XSL formatting objects):
  …

 XML Schema Definition (XSD):
  …
These are all incredibly dumb ideas. I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)
:) One can't really answer this one. But with many hundreds of published data exchange formats built on XML, it can't have been too shabby all along.
It's worse than shabby, it's a horrible, horrible choice. Not just for data formats, but for _anything_. XML should not be used.
 And sometimes small things matter, like being able to add 
 comments
 along with the "payload". JSON doesn't have that.
 Or knowing that both sender and receiver will validate the XML 
 the
 same way through XSD. So if it doesn't blow up on your end, it 
 will
 pass validation on the other end, too.
One can do all these things with better formats than either XML or JSON. But why do we often end up dealing with these two? Familiarity, that is the only reason. XML seems familiar to anybody who's written some HTML, and JSON became familiar to web developers initially. Starting from those two large niches, they've expanded out to become the two most popular data interchange formats, despite XML being a horrible mess and JSON being too simple for many uses.

I'd like to see a move back to binary formats, which is why I mentioned that to Robert. D would be an ideal language in which to show the superiority of binary to text formats, given its emphasis on efficiency. Many devs have learned the wrong lessons from past closed binary formats, when open binary formats wouldn't have many of those deficiencies. There have been some interesting moves back to open binary formats/protocols in recent years, like Hessian (http://hessian.caucho.com/), Thrift (https://thrift.apache.org/), MessagePack (http://msgpack.org/), and Cap'n Proto (from the protobufs guy after he left Google - https://capnproto.org/). I'd rather see phobos support these, which are the future, rather than flash-in-the-pan text formats like XML or JSON.
May 10 2015
next sibling parent "Laeeth Isharc" <nospamlaeeth nospamlaeeth.com> writes:
On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
 It's worse than shabby, it's a horrible, horrible choice.  Not 
 just for data formats, but for _anything_.  XML should not be 
 used.
I feel the same way about XML, and I also think that having strong aesthetic internal emotional responses is often necessary to achieve excellence in engineering.
 But why do we often end up dealing with these two?  
 Familiarity, that is the only reason.  XML seems familiar to 
 anybody who's written some HTML, and JSON became familiar to 
 web developers initially.  Starting from those two large 
 niches, they've expanded out to become the two most popular 
 data interchange formats, despite XML being a horrible mess and 
 JSON being too simple for many uses.
Sometimes you get to pick, but often not. I can hardly tell the UK Debt Management Office to give up XML and switch to msgpack structs (well, I can, but I am not sure they would listen). So at the moment for some data series I use a python library via PyD to convert xml files to JSON. But it would be nice to do it all in D. I am not sure XML is going away very soon since new protocols keep being created using it. (Most recent one I heard of is one for allowing hedge funds to achieve full transparency of their portfolio to end investors - not necessarily something that will achieve what people think it will, but one in tune with the times). Laeeth.
May 10 2015
prev sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
 One can do all these things with better formats than either XML 
 or JSON.
Hypothetically, yes, though formats better than XML don't exist. I personally find XML perfectly readable.
May 12 2015
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:
 Well, I was mostly answering to w0rp here. JSON is both
 readable and easy to parse, no question.
JSON is just JavaScript literals with some silly constraints. As crappy a format as it gets. Even pure Lisp would have been better. And much more powerful!
 :) One can't really answer this one. But with many hundreds of
 published data exchange formats built on XML, it can't have been
 too shabby all along.
 And sometimes small things matter, like being able to add 
 comments
 along with the "payload".
XML is actually great for what it is: eXtensible. It means you can build forward compatible formats and annotate existing formats with metadata without breaking existing (compliant) applications etc... It also means you can datamine files without knowing the full format.
 Or knowing that both sender and receiver will validate the XML 
 the
 same way through XSD.
Right, or build a database/archival service that is generic. XML is not going away until there is something better, and that won't happen anytime soon. It is also one of the few formats that I actually need library and _good_ DOM support for. (JSON can be done in an afternoon, so I don't care if it is supported or not...)
May 10 2015
parent reply "Alex Parrill" <initrd.gz gmail.com> writes:
Can we please not turn this thread into an XML vs JSON flamewar?

XML is one of the most popular data formats (for better or for 
worse), so a parser would be a good addition to the standard 
library.
May 11 2015
parent "Ola Fosheim Grøstad" writes:
On Monday, 11 May 2015 at 15:20:12 UTC, Alex Parrill wrote:
 Can we please not turn this thread into an XML vs JSON flamewar?
This is not a flamewar. JSON is ad hoc and I use it a lot, but it isn't actually suitable as a file and archival exchange format. It is important that people understand what the point of XML is in order to build something useful. Full XML support and tooling is very valuable for typed GC-backed batch processing. That means namespaces, entities, XQuery equivalents, DOMs, etc. A library-backed tooling pipeline would be a valuable asset for D. The value is not in _reading_ or _writing_ XML. The value is all about providing a framework for structured grammar/namespace based _processing_ and _transforms_.
May 11 2015
prev sibling parent reply Chris <wendlec tcd.ie> writes:
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
 On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
 wrote:
 std.xml has been considered not up to spec for nearly 3 years 
 now. Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts 
 DRY and on topic.
My request: just skip it. XML is a horrible waste of space for a standard; better that D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
Glad to hear that someone is working on XML support. We cannot just "skip it". XML/HTML like mark up comes up all the time, here and there. I recently had to write a mini-parser (nowhere near the stuff Robert is doing, just a quick fix!) to extract data from XML input. This has nothing to do with personal preferences, it's just there [1] and has to be dealt with. [1] https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language
Feb 19 2016
parent reply Joakim <dlang joakim.fea.st> writes:
On Friday, 19 February 2016 at 12:13:53 UTC, Chris wrote:
 On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
 On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
 wrote:
 std.xml has been considered not up to spec for nearly 3 years 
 now. Time to build a successor. I currently plan the 
 following features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance 
 test suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts 
 DRY and on topic.
My request: just skip it. XML is a horrible waste of space for a standard; better that D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
Glad to hear that someone is working on XML support. We cannot just "skip it". XML/HTML like mark up comes up all the time, here and there. I recently had to write a mini-parser (nowhere near the stuff Robert is doing, just a quick fix!) to extract data from XML input. This has nothing to do with personal preferences, it's just there [1] and has to be dealt with. [1] https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language
Then write a good XML extraction-only library and dub it. I see no reason to include this in Phobos, which will encourage those who don't know any better to use it, since it comes with the compiler.

I'll close with a quote from Saint Linus of Torvalds, which I was unaware of till a couple days ago:

"XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist." https://en.wikiquote.org/wiki/Linus_Torvalds#2014
Feb 23 2016
parent reply Dmitry <dmitry indiedev.ru> writes:
On Tuesday, 23 February 2016 at 11:22:23 UTC, Joakim wrote:
 Then write a good XML extraction-only library and dub it. I see 
 no reason to include this in Phobos
You won't be able to sleep if it's in Phobos? I use XML, and I don't like checking tons of third-party libraries to see which will be good for me, which have support (bugfixes), which will still have support in a few years, etc. Lots of systems already use XML, and any serious language _must_ have official support for it.
 If data formats are your thing, you could help get Ludwig's 
 JSON stuff in, or better yet, enable some nice binary data 
 format.
If it's better for you, that doesn't mean it will be better for everyone.
Feb 23 2016
parent Craig Dillabaugh <craig.dillabaugh gmail.com> writes:
On Tuesday, 23 February 2016 at 12:46:38 UTC, Dmitry wrote:
 On Tuesday, 23 February 2016 at 11:22:23 UTC, Joakim wrote:
 Then write a good XML extraction-only library and dub it. I 
 see no reason to include this in Phobos
You won't be able to sleep if it's in Phobos? I use XML, and I don't like checking tons of third-party libraries to see which will be good for me, which have support (bugfixes), which will still have support in a few years, etc. Lots of systems already use XML, and any serious language _must_ have official support for it.
So are you trying to say C/C++ are not serious languages? :o) Having said that, as much as I hate XML, basic support would be a nice feature for the language.
Feb 24 2016
prev sibling next sibling parent "Robert burner Schadek" <rburners gmail.com> writes:
- CTS to disable parsing location (line,column)
May 03 2015
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/3/2015 10:39 AM, Robert burner Schadek wrote:
 Please post your feature requests, and please keep the posts DRY and on topic.
Pipeline range interface, for example: source.xmlparse(configuration).whatever();
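Spelled out a little, such a pipeline could read as follows; `xmlparse`, `Configuration` and the event names are hypothetical, only the range-composition shape is the point:

```d
import std.stdio : File;
import std.algorithm.iteration : filter, joiner, map;

// Each stage is a lazy range; nothing happens until iteration.
auto names = File("data.xml")
    .byChunk(4096)                  // lazy I/O, outside the XML package
    .joiner                         // flatten chunks into one ubyte range
    .xmlparse(Configuration.init)   // hypothetical parser stage
    .filter!(e => e.kind == EntityKind.elementStart)
    .map!(e => e.name);
```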
May 03 2015
prev sibling next sibling parent "wobbles " <grogan.colin gmail.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
wrote:
 std.xml has been considered not up to spec for nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY 
 and on topic.
Could you possibly use Pegged to do it? It may simplify the parsing portion for you, at least.
May 03 2015
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/3/2015 10:39 AM, Robert burner Schadek wrote:
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
Encoding schemes should be handled by adapter algorithms, not in the XML parser itself, which should only handle UTF8.
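Such an adapter can be an ordinary range composition. For instance, an ASCII ubyte stream can be widened into the char range the parser expects (a sketch, not an existing Phobos or std.xml2 API):

```d
import std.algorithm.iteration : map;

// ASCII bytes are already valid UTF-8 code units, so a
// cast-per-element adapter is enough for this case.
auto asciiToUtf8(R)(R bytes)
{
    return bytes.map!(b => cast(char) b);
}

// Latin-1, Windows-1252, UTF-16 and friends would each get their
// own adapter range; the parser itself only ever sees char.
```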
May 03 2015
parent Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 03 May 2015 14:00:11 -0700,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/3/2015 10:39 AM, Robert burner Schadek wrote:
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
Encoding schemes should be handled by adapter algorithms, not in the XML parser itself, which should only handle UTF8.
Unlike JSON, XML actually declares the encoding in the prolog, e.g.: <?xml version="1.0" encoding="Windows-1252"?> -- Marco
May 04 2015
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/3/2015 10:39 AM, Robert burner Schadek wrote:
 Please post your feature requests, and please keep the posts DRY and on topic.
Try to design the interface to it so it does not inherently require the implementation to allocate GC memory.
May 03 2015
prev sibling next sibling parent reply "Ilya Yaroshenko" <ilyayaroshenko gmail.com> writes:
Can it lazily read huge files (files greater than memory)?
May 03 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:
 Can it lazily read huge files (files greater than memory)?
If a range interface is used, it doesn't need to be aware of where the data is coming from. In fact, the xml package should NOT be doing I/O.
May 03 2015
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 3 May 2015 at 22:02:13 UTC, Walter Bright wrote:
 On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:
 Can it lazily read huge files (files greater than memory)?
If a range interface is used, it doesn't need to be aware of where the data is coming from. In fact, the xml package should NOT be doing I/O.
Wouldn't D-ranges make it impossible to use SIMD optimizations when scanning? However, it would make a lot of sense to just convert an existing XML solution with Boost license. I don't know which ones are any good, but RapidXML is at least Boost.
May 04 2015
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, 4 May 2015 at 09:35:55 UTC, Ola Fosheim Grøstad wrote:
 However, it would make a lot of sense to just convert an 
 existing XML solution with Boost license. I don't know which 
 ones are any good, but RapidXML is at least Boost.
Given how D's arrays work, we have the opportunity to have an _extremely_ fast XML parser thanks to slices. It's highly unlikely that any C or C++ solution is going to be able to compete, and if it can, it's likely to be far more complex than necessary. Parsing is an area where we definitely should write our own stuff rather than porting existing code from other languages or use existing libraries in other languages via C bindings. Fast parsing is definitely a killer feature of D and the fact that std.xml botches that so badly is just embarrassing. - Jonathan M Davis
May 04 2015
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/4/15 12:31 PM, Jonathan M Davis wrote:
 On Monday, 4 May 2015 at 09:35:55 UTC, Ola Fosheim Grøstad wrote:
 However, it would make a lot of sense to just convert an existing XML
 solution with Boost license. I don't know which ones are any good, but
 RapidXML is at least Boost.
Given how D's arrays work, we have the opportunity to have an _extremely_ fast XML parser thanks to slices. It's highly unlikely that any C or C++ solution is going to be able to compete, and if it can, it's likely to be far more complex than necessary. Parsing is an area where we definitely should write our own stuff rather than porting existing code from other languages or use existing libraries in other languages via C bindings. Fast parsing is definitely a killer feature of D and the fact that std.xml botches that so badly is just embarrassing.
To be frank, what's more embarrassing is that we managed to do nothing about it for years (aside from endlessly wailing about it in an a cappella ensemble). It's a failure of leadership (that Walter and I need to work on) that very many unimportant and arguably less interesting areas of Phobos get attention at the expense of this one. -- Andrei
May 04 2015
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, 4 May 2015 at 19:45:18 UTC, Andrei Alexandrescu wrote:
 On 5/4/15 12:31 PM, Jonathan M Davis wrote:
 Fast parsing is definitely a killer feature of
 D and the fact that std.xml botches that so badly is just 
 embarrassing.
To be frank what's more embarrassing is that we managed to do nothing about it for years (aside from endlessly wailing about it in an a capella ensemble). It's a failure of leadership (that Walter and I need to work on) that very many unimportant and arguably less interesting areas of Phobos get attention at the expense of this one. -- Andrei
Also true. Many of us just don't find enough time to work on D, and we don't seem to do a good job of encouraging larger contributions to Phobos, so newcomers don't tend to contribute like that. And there's so much to do all around that the big stuff just falls by the wayside, and it really shouldn't. - Jonathan M Davis
May 04 2015
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/4/2015 12:31 PM, Jonathan M Davis wrote:
 Given how D's arrays work, we have the opportunity to have an _extremely_ fast
 XML parser thanks to slices. It's highly unlikely that any C or C++ solution is
 going to be able to compete, and if it can, it's likely to be far more complex
 than necessary. Parsing is an area where we definitely should write our own
 stuff rather than porting existing code from other languages or use existing
 libraries in other languages via C bindings. Fast parsing is definitely a
killer
 feature of D and the fact that std.xml botches that so badly is just
embarrassing.
Tango's XML package was well regarded and the fastest in the business. It used slicing, and almost no memory allocation.
May 04 2015
prev sibling parent "Ola Fosheim Grøstad" writes:
On Monday, 4 May 2015 at 19:31:59 UTC, Jonathan M Davis wrote:
 Given how D's arrays work, we have the opportunity to have an 
 _extremely_ fast XML parser thanks to slices.
Yes, that would be great. XML is a flexible go-to archive, exchange and application format. Things like entities, namespaces and so on make it non-trivial, but being able to conveniently process Inkscape and Open Office files etc. would be very useful. One should probably look at what applications generate XML and create some large test files with existing applications.
May 05 2015
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/4/2015 2:35 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
 Wouldn't D-ranges make it impossible to use SIMD optimizations when scanning?
Not at all. Algorithms can be specialized for various forms of input ranges, including ones where SIMD optimizations can be used. Specialization is one of the very cool things about D algorithms.
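A sketch of what such specialization can look like, with `memchr` standing in for a hand-written SIMD kernel (the `findLt` name is made up): the slice overload exploits contiguous memory, while a generic overload covers arbitrary input ranges.

```d
import core.stdc.string : memchr;
import std.range.primitives;

/// Offset of the first '<', or the input length if none.
/// Slice overload: contiguous memory, so the scan can use memchr
/// (typically vectorized in the C runtime) or real SIMD.
size_t findLt(const(char)[] s)
{
    auto p = memchr(s.ptr, '<', s.length);
    return p is null ? s.length : cast(const(char)*) p - s.ptr;
}

/// Generic fallback for arbitrary input ranges of characters.
size_t findLt(R)(R r)
    if (isInputRange!R && !is(R : const(char)[]))
{
    size_t i;
    for (; !r.empty; r.popFront(), ++i)
        if (r.front == '<') return i;
    return i;
}
```

Overload resolution picks the fast path automatically whenever the caller hands the parser a slice, with no change to the parser's range-based interface.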
May 04 2015
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, 3 May 2015 at 22:02:13 UTC, Walter Bright wrote:
 On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:
 Can it lazily read huge files (files greater than memory)?
If a range interface is used, it doesn't need to be aware of where the data is coming from. In fact, the xml package should NOT be doing I/O.
Indeed. It should operate on ranges without caring where they came from (though it may end up supporting both input ranges and random-access ranges with the idea that it can support reading from a socket with a range in a less efficient manner or operating on a whole file at once via a random-access range for more efficient parsing). But if I/O is a big concern, I'd suggest just using std.mmfile to do the trick, since then you can still operate on the whole file as a single array without having to actually have the whole thing in memory. - Jonathan M Davis
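The std.mmfile trick is a one-liner; a sketch (the `mapDocument` helper is hypothetical, and a real API would manage the mapping's lifetime, since the slice is only valid while the MmFile lives):

```d
import std.mmfile : MmFile;

/// Returns the whole document as a char slice backed by a memory
/// mapping: pages are faulted in on demand, so the file does not
/// have to fit in physical memory at once.
const(char)[] mapDocument(MmFile mmf)
{
    return cast(const(char)[]) mmf[];
}
```

Usage would be something like `auto mmf = new MmFile("huge.xml"); auto text = mapDocument(mmf);` (file name illustrative), after which `text` is an ordinary slice that a slicing parser can consume directly.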
May 04 2015
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2015-05-03 17:39:46 +0000, "Robert burner Schadek" 
<rburners gmail.com> said:

 std.xml has been considered not up to specs nearly 3 years now. Time to 
 build a successor. I currently plan the following features for it:
 
 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance
 
 Not much code yet, I'm currently building the performance test suite 
 https://github.com/burner/std.xml2
 
 Please post your feature requests, and please keep the posts DRY and on topic.
This isn't a feature request (sorry?), but I just want to point out that you should feel free to borrow code from https://github.com/michelf/mfr-xml-d There's probably a lot you can reuse in there. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
May 03 2015
parent "Robert burner Schadek" <rburners gmail.com> writes:
On Sunday, 3 May 2015 at 23:32:28 UTC, Michel Fortin wrote:
 This isn't a feature request (sorry?), but I just want to point 
 out that you should feel free to borrow code from 
 https://github.com/michelf/mfr-xml-d  There's probably a lot 
 you can reuse in there.
nice, thank you
May 04 2015
prev sibling next sibling parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 4/05/2015 5:39 a.m., Robert burner Schadek wrote:
 std.xml has been considered not up to specs nearly 3 years now. Time to
 build a successor. I currently plan the following features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test suite
 https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY and on
 topic.
Preferably the interfaces are made first, 1:1 as the spec requires. Then it's just a matter of building the actual reader/writer code. That way we could theoretically rewrite the reader/writer to support other formats such as HTML5/SVG, independently of Phobos. Also, it would be nice to be CTFE'able!
May 03 2015
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
 std.xml has been considered not up to specs nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY 
 and on topic.
If I were doing it, I'd do three types of parsers: 1. A parser that was pretty much as low level as you can get, where you basically have a range of XML attributes or tags. Exactly how to build that could be a bit entertaining, since it would have to be hierarchical, and ranges aren't, but something like a range of tags where you can get a range of its attributes and sub-tags from it so that the whole document can be processed without actually getting to the level of even a SAX parser. That parser could then be used to build the other parsers, and anyone who needed insanely fast speeds could use it rather than the SAX or DOM parser so long as they were willing to pay the inevitable loss in user-friendliness. 2. SAX parser built on the low level parser. 3. DOM parser built either on the low level parser or the SAX parser (whichever made more sense). I doubt that I'm really explaining the low level parser well enough or have even thought it through enough, but I really think that even a SAX parser is too high level for the base parser and that something slightly higher than a lexer (high enough to actually be processing XML rather than individual tokens but pretty much only as high as is required to do that) would be a far better choice. IIRC, Michel Fortin's work went in that direction, and he linked to his code in another post, so I'd suggest at least looking at that for ideas. Regardless, by building layers of XML parsers rather than just the standard ones, it should be possible to get higher performance while still having the more standard, user-friendly ones for those that don't need the full performance and do need the user-friendliness (though of course, we do want the SAX and DOM parsers to be efficient as well). - Jonathan M Davis
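One possible shape for that lowest layer, just to make the idea concrete (all names here are hypothetical): the parser is an input range of shallow event nodes whose strings are slices of the input, and the SAX and DOM layers are thin consumers of that range.

```d
/// The event kinds the low-level layer would emit.
enum NodeKind { open, close, emptyTag, text, comment, cdata, pi }

/// One event from the low-level parser. `name` and `data` are
/// slices into the original document, so no copying happens.
struct Node
{
    NodeKind kind;
    const(char)[] name; // tag name or PI target; empty for text nodes
    const(char)[] data; // text/comment/cdata payload
}

// A SAX layer is then a dispatcher over an input range of Node,
// and a DOM layer folds the same event stream into a tree:
//   foreach (n; lowLevelParse(doc)) { ... }
```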
May 04 2015
parent Jacob Carlborg <doob me.com> writes:
On 2015-05-04 21:14, Jonathan M Davis wrote:

 If I were doing it, I'd do three types of parsers:

 1. A parser that was pretty much as low level as you can get, where you
 basically have a range of XML attributes or tags. Exactly how to build that
 could be a bit entertaining, since it would have to be hierarchical, and
 ranges aren't, but something like a range of tags where you can get a
 range of its attributes and sub-tags from it so that the whole document
 can be processed without actually getting to the level of even a SAX
 parser. That parser could then be used to build the other parsers, and
 anyone who needed insanely fast speeds could use it rather than the SAX
 or DOM parser so long as they were willing to pay the inevitable loss in
 user-friendliness.

 2. SAX parser built on the low level parser.

 3. DOM parser built either on the low level parser or the SAX parser
 (whichever made more sense).

 I doubt that I'm really explaining the low level parser well enough or
 have even thought it through enough, but I really think that even a SAX
 parser is too high level for the base parser and that something
 slightly higher than a lexer (high enough to actually be processing XML
 rather than individual tokens but pretty much only as high as is
 required to do that) would be a far better choice.

 IIRC, Michel Fortin's work went in that direction, and he linked to his
 code in another post, so I'd suggest at least looking at that for ideas.
This is the way the XML parser is structured in Tango. A pull parser at the lowest level, a SAX parser on top of that, and I think the DOM parser builds on top of the pull parser. The Tango pull parser can give you the following tokens: * start element * attribute * end element * end empty element * data * comment * cdata * doctype * pi -- /Jacob Carlborg
May 04 2015
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-05-03 19:39, Robert burner Schadek wrote:

 Not much code yet, I'm currently building the performance test suite
 https://github.com/burner/std.xml2
I recommend benchmarking against the Tango pull parser. -- /Jacob Carlborg
May 04 2015
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/4/2015 12:28 PM, Jacob Carlborg wrote:
 On 2015-05-03 19:39, Robert burner Schadek wrote:

 Not much code yet, I'm currently building the performance test suite
 https://github.com/burner/std.xml2
I recommend benchmarking against the Tango pull parser.
I agree. The Tango XML parser has set the performance bar. If any new solution can't match that, throw it out and try again.
May 04 2015
prev sibling parent reply "Mario Kröplin" <linkrope github.com> writes:
On Monday, 4 May 2015 at 19:28:25 UTC, Jacob Carlborg wrote:
 On 2015-05-03 19:39, Robert burner Schadek wrote:

 Not much code yet, I'm currently building the performance test 
 suite
 https://github.com/burner/std.xml2
I recommend benchmarking against the Tango pull parser.
Recently, I compared DOM parsers for an XML file of 100 MByte: 15.8 s tango.text.xml (SiegeLord/Tango-D2) 13.4 s ae.utils.xml (CyberShadow/ae) 8.5 s xml.etree (Python) Either the Tango DOM parser is slow compared to the Tango pull parser, or the D2 port ruined the performance.
May 05 2015
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 5 May 2015 at 10:41:37 UTC, Mario Kröplin wrote:
 On Monday, 4 May 2015 at 19:28:25 UTC, Jacob Carlborg wrote:
 On 2015-05-03 19:39, Robert burner Schadek wrote:

 Not much code yet, I'm currently building the performance 
 test suite
 https://github.com/burner/std.xml2
I recommend benchmarking against the Tango pull parser.
 Recently, I compared DOM parsers for an XML file of 100 MByte: 15.8 s tango.text.xml (SiegeLord/Tango-D2) 13.4 s ae.utils.xml (CyberShadow/ae) 8.5 s xml.etree (Python) Either the Tango DOM parser is slow compared to the Tango pull parser, or the D2 port ruined the performance.
As usual: system, compiler, compiler version, compilation flags?
May 05 2015
prev sibling next sibling parent reply Richard Webb <richard.webb boldonjames.com> writes:
On 05/05/2015 11:41, "Mario Kröplin" <linkrope github.com> wrote:
 Recently, I compared DOM parsers for an XML file of 100 MByte:

 15.8 s tango.text.xml (SiegeLord/Tango-D2)
 13.4 s ae.utils.xml (CyberShadow/ae)
   8.5 s xml.etree (Python)

 Either the Tango DOM parser is slow compared to the Tango pull parser,
 or the D2 port ruined the performance.
fwiw I did some tests a couple of years back with https://launchpad.net/d2-xml on 20 odd megabyte files and found it faster than Tango. Unfortunately that would need some work to test now, as xmlp is abandoned and wouldn't build last time I tried it :-( I also had some success with https://github.com/opticron/kxml, though it had some issues with chuffy entity decoding performance. Also, profiling showed a lot of time spent in the GC, and the recent improvements in that area might have changed things by now.
May 05 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/5/2015 4:16 AM, Richard Webb wrote:
 Also, profiling showed a lot of time spent in the GC, and the recent
 improvements in that area might have changed things by now.
I haven't read the Tango source code, but the performance of its XML parser was supposedly because it did not use the GC, it used slices.
May 05 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-05-06 01:38, Walter Bright wrote:

 I haven't read the Tango source code, but the performance of its XML parser
 was supposedly because it did not use the GC, it used slices.
That's only true for the pull parser (not sure about the SAX parser). The DOM parser needs to allocate the nodes, but if I recall correctly those are allocated in a free list. Not sure which parser was used in the test. -- /Jacob Carlborg
May 05 2015
parent Richard Webb <richard.webb boldonjames.com> writes:
On 06/05/2015 07:31, Jacob Carlborg wrote:
 On 2015-05-06 01:38, Walter Bright wrote:

 I haven't read the Tango source code, but the performance of its XML parser
 was supposedly because it did not use the GC, it used slices.
That's only true for the pull parser (not sure about the SAX parser). The DOM parser needs to allocate the nodes, but if I recall correctly those are allocated in a free list. Not sure which parser was used in the test.
The direct comparisons were with the DOM parsers (I was playing with a D port of some C++ code at work at the time, and that is DOM based). xmlp has alternate parsers (event driven etc.) which were faster in some simple tests I did, but I don't recall if I did a direct comparison with Tango there.
May 06 2015
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-05-05 12:41, "Mario Kröplin" <linkrope github.com> wrote:

 Recently, I compared DOM parsers for an XML file of 100 MByte:

 15.8 s tango.text.xml (SiegeLord/Tango-D2)
 13.4 s ae.utils.xml (CyberShadow/ae)
   8.5 s xml.etree (Python)

 Either the Tango DOM parser is slow compared to the Tango pull parser,
Yes, of course it's slower. The DOM parser creates a DOM as well, which the pull parser doesn't. These other libraries, what kind of parsers are those using? I mean, it's not fair to compare a pull parser against a DOM parser. Could you try D1 Tango as well? Or do you have the benchmark available somewhere?
 or the D2 port ruined the performance.
Might be the case as well, see this comment [1]. [1] http://forum.dlang.org/thread/vsbsxfeciryrdsjhhfak forum.dlang.org?page=3#post-mi8hs8:24b0j:241:40digitalmars.com -- /Jacob Carlborg
May 05 2015
parent reply "Ola Fosheim Grøstad" writes:
On Tuesday, 5 May 2015 at 12:10:59 UTC, Jacob Carlborg wrote:
 Yes, of course it's slower. The DOM parser creates a DOM as 
 well, which the pull parser doesn't.

 These other libraries, what kind of parsers are those using? I 
 mean, it's not fair to compare a pull parser against a DOM 
 parser.
I agree. Most applications will use a DOM parser for convenience, so sacrificing some speed initially in favour of ease of use makes a lot of sense. As long as it is possible to improve it later (e.g. use SIMD scanning to find the end of CDATA etc). In my opinion it is rather difficult to build a good API without also using the API in an application in parallel. So it would be a good strategy to build a specific DOM along with writing the XML infrastructure, like SVG/HTML. Also, some parsers, like RapidXML, only support a subset of XML. So they cannot be used for comparisons.
May 05 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-05-05 16:04, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:

 In my opinion it is rather difficult to build a good API without also
 using the API in an application in parallel. So it would be a good
 strategy to build a specific DOM along with writing the XML
 infrastructure, like SVG/HTML.
Agree.
 Also, some parsers, like RapidXML only support a subset of XML. So they
 cannot be used for comparisons.
The Tango parser has some limitations as well. In some places it sacrificed correctness for speed. There's a comment claiming the parser might read past the input if it's not well formed. -- /Jacob Carlborg
May 05 2015
parent Brad Roberts via Digitalmars-d <digitalmars-d puremagic.com> writes:
An old friend of mine who was intimate with the Microsoft XML parsers 
was fond of saying, particularly with respect to XML parsers, that if 
you hadn't finished implementing and testing error handling and negative 
tests (i.e., malformed documents), your positive benchmarks were 
fairly meaningless. A whole lot of work goes into that 'second half' of 
things that can quickly cost performance.

I didn't dive into, or don't recall, the specific details, as this was years ago.

The (over-)generalization from there is an old adage: it's easy to write 
an incorrect program.

On 5/5/2015 11:33 PM, Jacob Carlborg via Digitalmars-d wrote:
 On 2015-05-05 16:04, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:

 In my opinion it is rather difficult to build a good API without also
 using the API in an application in parallel. So it would be a good
 strategy to build a specific DOM along with writing the XML
 infrastructure, like SVG/HTML.
Agree.
 Also, some parsers, like RapidXML only support a subset of XML. So they
 cannot be used for comparisons.
The Tango parser has some limitations as well. In some places it sacrificed correctness for speed. There's a comment claiming the parser might read past the input if it's not well formed.
May 05 2015
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2015-05-03 19:39, Robert burner Schadek wrote:

 Not much code yet, I'm currently building the performance test suite
 https://github.com/burner/std.xml2
There are a couple of interesting comments about the Tango pull parser that can be worth mentioning: * Use -version=whitespace to retain whitespace as data nodes. We see a 25% increase in token count and a 10% throughput drop when parsing "hamlet.xml" with this option enabled (pullparser alone) * The parser is constructed with some tradeoffs relating to document integrity. It is generally optimized for well-formed documents, and currently may read past a document-end for those that are not well formed * Making some tiny unrelated change to the code can cause notable throughput changes. We're not yet clear why these swings are so pronounced (for changes outside the code path) but they seem to be related to the alignment of codegen. It could be a cache-line issue, or something else The last comment might not be relevant anymore since these are all quite old comments. -- /Jacob Carlborg
May 04 2015
prev sibling next sibling parent reply "Liam McSherry" <mcsherry.liam gmail.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
wrote:
 std.xml has been considered not up to specs nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 featues for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY 
 and on topic.
Not a feature, but if `std.data.json` [1] gets accepted into Phobos, it may be worth considering naming this `std.data.xml` (although that might not as effectively differentiate it from `std.xml`). [1]: http://wiki.dlang.org/Review_Queue
May 04 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 5/05/2015 10:45 a.m., Liam McSherry wrote:
 On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
 wrote:
 std.xml has been considered not up to specs nearly 3 years now. Time
 to build a successor. I currently plan the following features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test suite
 https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY and on
 topic.
 Not a feature, but if `std.data.json` [1] gets accepted into Phobos, it may be worth considering naming this `std.data.xml` (although that might not as effectively differentiate it from `std.xml`). [1]: http://wiki.dlang.org/Review_Queue
It really should be std.data.xml, to keep with the new structuring. Plus it'll make transitioning a little easier.
May 04 2015
prev sibling next sibling parent reply "weaselcat" <weaselcat gmail.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
 std.xml has been considered not up to specs nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY 
 and on topic.
maybe off-topic, but it would be nice if the standard json, xml, etc. all had identical interfaces (except for implementation-specific quirks). This might be something worth discussing if it wasn't already agreed upon.
May 04 2015
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 05 May 2015 02:01:50 +0000
schrieb "weaselcat" <weaselcat gmail.com>:

 maybe off-topic, but it would be nice if the standard json, xml, 
 etc. all had identical interfaces (except for 
 implementation-specific quirks). This might be something worth 
 discussing if it wasn't already agreed upon.
I don't think this needs discussion. It is plain impossible to have a sophisticated JSON parser and a sophisticated XML parser share the same API. Established function names, structural differences in the formats, and feature sets differ too much. For example, in XML attributes and child elements are used somewhat interchangeably, whereas in JSON attributes don't exist. So while in JSON "obj.field" makes sense, in XML you would want to select either an attribute or an element with the name "field". -- Marco
May 04 2015
prev sibling next sibling parent reply Alex Vincent <ajvincent gmail.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
 std.xml has been considered not up to specs nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 features for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post your feature requests, and please keep the posts DRY 
 and on topic.
I'm looking for a status update. DUB doesn't seem to have many options posted. I was thinking about starting a SAXParser implementation.
Feb 17 2016
parent reply Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent wrote:
 I'm looking for a status update.  DUB doesn't seem to have many 
 options posted.  I was thinking about starting a SAXParser 
 implementation.
I'm working on it, but recently I had to do some major restructuring of the code. Currently I'm trying to get this merged https://github.com/D-Programming-Language/phobos/pull/3880 because I had some problems with the encoding of test files. XML has a lot of corner cases, it just takes time. If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
Feb 18 2016
next sibling parent reply Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner 
Schadek wrote:
 If you want to work on some XML stuff, please join me. It is 
 probably more productive working together than creating two 
 competing implementations.
also I would like to see this https://github.com/D-Programming-Language/phobos/pull/2995 go in first to be able to accurately measure and compare performance
Feb 18 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 02/18/2016 05:49 AM, Robert burner Schadek wrote:
 On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:
 If you want to work on some XML stuff, please join me. It is probably more
 productive working together than creating two competing implementations.
also I would like to see this https://github.com/D-Programming-Language/phobos/pull/2995 go in first to be able to accurately measure and compare performance
Would the measuring be possible with 2995 as a dub package? -- Andrei
Feb 18 2016
parent reply Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei 
Alexandrescu wrote:
 also I would like to see this
 https://github.com/D-Programming-Language/phobos/pull/2995 go 
 in first
 to be able to accurately measure and compare performance
Would the measuring be possible with 2995 as a dub package? -- Andrei
yes, after having synced the dub package to the PR
Feb 18 2016
parent Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 15:39:01 UTC, Robert burner 
Schadek wrote:
 On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei 
 Alexandrescu wrote:
 Would the measuring be possible with 2995 as a dub package? -- 
 Andrei
 yes, after having synced the dub package to the PR
brought the dub package up to date with the PR (v0.0.6)
Feb 23 2016
prev sibling next sibling parent Alex Vincent <ajvincent gmail.com> writes:
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner 
Schadek wrote:
 If you want to work on some XML stuff, please join me. It is 
 probably more productive working together than creating two 
 competing implementations.
Oh, I absolutely agree, independent implementation is a bad thing. (Someone should rename DRY as "don't repeat yourself or others"... but DRYOO sounds weird.) Where's your repo?
Feb 18 2016
prev sibling parent reply Craig Dillabaugh <craig.dillabaugh gmail.com> writes:
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner 
Schadek wrote:
 On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent 
 wrote:
 I'm looking for a status update.  DUB doesn't seem to have 
 many options posted.  I was thinking about starting a 
 SAXParser implementation.
 I'm working on it, but recently I had to do some major restructuring of the code. Currently I'm trying to get this merged https://github.com/D-Programming-Language/phobos/pull/3880 because I had some problems with the encoding of test files. XML has a lot of corner cases, it just takes time. If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
Would you be interested in mentoring a student for the Google Summer of Code to do work on std.xml?
Feb 18 2016
parent Robert burner Schadek <rburners gmail.com> writes:
On Friday, 19 February 2016 at 04:02:02 UTC, Craig Dillabaugh 
wrote:
 Would you be interested in mentoring a student for the Google 
 Summer of Code to do work on std.xml?
Yes, why not!
Feb 19 2016
prev sibling next sibling parent reply Robert burner Schadek <rburners gmail.com> writes:
While working on a new xml implementation I came across "control 
characters (CC)". [1]
When trying to validate/convert a UTF string these lead to 
exceptions, because they are not valid UTF characters.
Unfortunately, some of these characters are allowed to appear in 
valid xml 1.* documents.

I currently see two option how to go about it:

1. Do not allow CCs that do not work with the existing 
functionality.
1.Pros
   * easy
1.Cons
   * the resulting xml implementation will not be xml 1.* complete

2. Add special cases to the existing functionality to handle CCs 
that are allowed in 1.0.
2.Pros
   * the resulting xml implementation will be xml 1.* complete
2.Cons
   * will make utf de/encoding slower as I would need to add 
additional logic

Any other ideas, feedback?




[1] https://en.wikipedia.org/wiki/C0_and_C1_control_codes
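To make the difference between the two options concrete, here is a small sketch (in Python, for illustration only; not part of the eventual D implementation) of the Char productions from the XML 1.0 and 1.1 specs:

```python
# Sketch of the Char productions from the XML 1.0 and 1.1 specs.
# XML 1.0 allows raw C1 controls (U+0080-U+009F); XML 1.1 restricts
# them to character references (RestrictedChar), except NEL (U+0085).

def legal_in_xml10(cp: int) -> bool:
    # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    #        | [#x10000-#x10FFFF]
    return cp in (0x9, 0xA, 0xD) \
        or 0x20 <= cp <= 0xD7FF \
        or 0xE000 <= cp <= 0xFFFD \
        or 0x10000 <= cp <= 0x10FFFF

def legal_raw_in_xml11(cp: int) -> bool:
    # Char minus RestrictedChar: C0/C1 controls (other than tab, LF,
    # CR and NEL) may only appear as character references in 1.1.
    if cp in (0x9, 0xA, 0xD, 0x85):
        return True
    if cp <= 0x1F or 0x7F <= cp <= 0x9F:
        return False
    return 0x20 <= cp <= 0xD7FF \
        or 0xE000 <= cp <= 0xFFFD \
        or 0x10000 <= cp <= 0x10FFFF

print(legal_in_xml10(0x80))      # True: fine raw in XML 1.0
print(legal_raw_in_xml11(0x80))  # False: restricted in XML 1.1
```

Option 2 would essentially push a check like this into the hot de/encoding path, which is where the slowdown comes from.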
Feb 18 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 18 February 2016 at 15:56:58 UTC, Robert burner 
Schadek wrote:
 When trying to validate/convert a UTF string these lead to 
 exceptions, because they are not valid UTF characters.
That means the user didn't encode them properly... Which one specifically are you thinking of? I'm pretty sure all those control characters have a spot in the Unicode space and can be properly encoded as UTF-8 (though I think even if they are properly encoded, some of them are illegal in XML anyway). If they appear in another form, it is invalid and/or needs a charset conversion, which should be specified in the XML document itself.
Feb 18 2016
parent reply Robert burner Schadek <rburners gmail.com> writes:
for instance, quite often I find <80> in tests that are supposed 
to be valid XML 1.0. They are invalid XML 1.1, though
Feb 18 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner 
Schadek wrote:
 for instance, quite often I find <80> in tests that are 
 supposed to be valid XML 1.0. They are invalid XML 1.1, though
What char encoding does the document declare itself as?
Feb 18 2016
parent reply Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 16:47:35 UTC, Adam D. Ruppe 
wrote:
 On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner 
 Schadek wrote:
 for instance, quite often I find <80> in tests that are 
 supposed to be valid XML 1.0. They are invalid XML 1.1, though
What char encoding does the document declare itself as?
It does not; it has no prolog and therefore no EncodingInfo. unix file says it is a utf8-encoded file, but no BOM is present.
Feb 18 2016
next sibling parent reply Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner 
Schadek wrote:
 unix file says it is a utf8-encoded file, but no BOM is 
 present.
the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Feb 18 2016
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
Schadek wrote:
 unix file says it is a utf8-encoded file, but no BOM is 
 present.
the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Gah, I should have read this before replying... well, that does appear to be valid utf-8.... why is it throwing an exception then? I'm pretty sure that byte stream *is* actually well-formed xml 1.0 and should pass utf validation as well as the XML well-formedness check.
Feb 18 2016
parent reply Alex Vincent <ajvincent gmail.com> writes:
On Thursday, 18 February 2016 at 17:26:30 UTC, Adam D. Ruppe 
wrote:
 On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
 Schadek wrote:
 unix file says it is a utf8-encoded file, but no BOM is 
 present.
the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Gah, I should have read this before replying... well, that does appear to be valid utf-8.... why is it throwing an exception then? I'm pretty sure that byte stream *is* actually well-formed xml 1.0 and should pass utf validation as well as the XML well-formedness check.
Regarding control characters: If you give me a complete sample file, I can run it through Mozilla's UTF stream conversion and/or XML parsing code (via either SAX or DOMParser) to tell you how that reacts as a reference. Mozilla supports XML 1.0, but not 1.1.
Feb 18 2016
parent reply Robert burner Schadek <rburners gmail.com> writes:
On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent wrote:
 Regarding control characters:  If you give me a complete sample 
 file, I can run it through Mozilla's UTF stream conversion 
 and/or XML parsing code (via either SAX or DOMParser) to tell 
 you how that reacts as a reference.  Mozilla supports XML 1.0, 
 but not 1.1.
thank you for making the effort https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml
Feb 18 2016
parent Alex Vincent <ajvincent gmail.com> writes:
On Thursday, 18 February 2016 at 21:53:24 UTC, Robert burner 
Schadek wrote:
 On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent 
 wrote:
 Regarding control characters:  If you give me a complete 
 sample file, I can run it through Mozilla's UTF stream 
 conversion and/or XML parsing code (via either SAX or 
 DOMParser) to tell you how that reacts as a reference.  
 Mozilla supports XML 1.0, but not 1.1.
thank you for making the effort https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml
In this case, Firefox just passes the control characters through 
to the contentHandler.characters method:

Starting runTest
Retrieved source
contentHandler.startDocument()
contentHandler.startElement("", "foo", "foo", {})
contentHandler.characters("\u0080")
contentHandler.endElement("", "foo", "foo")
contentHandler.endDocument()
Done reading
Feb 19 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
Schadek wrote:
 the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
http://dpaste.dzfl.pl/80888ed31958 like this?
Feb 19 2016
parent reply Robert burner Schadek via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 2016-02-19 11:58, Kagamin via Digitalmars-d wrote:
 On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek
 wrote:
 the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
http://dpaste.dzfl.pl/80888ed31958 like this?
No, the program just takes the hex dump as a string. You would need 
to do something like:

import std.conv : to;

ubyte[] arr = [0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80,
               0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E];
string s = cast(string)arr;
dstring ds = to!dstring(s);

and see what happens
Feb 19 2016
parent reply Kagamin <spam here.lot> writes:
On Friday, 19 February 2016 at 12:30:06 UTC, Robert burner 
Schadek wrote:
 ubyte[] arr = [0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80,
                0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E];
 string s = cast(string)arr;
 dstring ds = to!dstring(s);

 and see what happens
http://dpaste.dzfl.pl/2f8a8ff10bde like this?
Feb 19 2016
parent Robert burner Schadek <rburners gmail.com> writes:
On Friday, 19 February 2016 at 12:55:52 UTC, Kagamin wrote:
 http://dpaste.dzfl.pl/2f8a8ff10bde like this?
yes
Feb 19 2016
prev sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner 
Schadek wrote:
 It does not, it has no prolog and therefore no EncodingInfo.
In that case, it needs to be valid UTF-8 or valid UTF-16 and it is a fatal error if there's any invalid bytes: https://www.w3.org/TR/REC-xml/#charencoding == It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding. Specifically, it is a fatal error if an entity encoded in UTF-8 contains any ill-formed code unit sequences, as defined in section 3.9 of Unicode [Unicode]. Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16. ==
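A quick sketch (Python, for illustration) of the distinction the spec draws here: the properly encoded pair 0xC2 0x80 decodes fine, while a bare 0x80 byte is an ill-formed code unit sequence and must be treated as a fatal error:

```python
# Well-formed: C2 80 is the valid UTF-8 encoding of U+0080.
ok  = bytes([0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80,
             0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E])
# Ill-formed: a lone 0x80 continuation byte with no lead byte.
bad = bytes([0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0x80,
             0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E])

ok.decode("utf-8")          # succeeds
try:
    bad.decode("utf-8")     # ill-formed sequence -> decode error
except UnicodeDecodeError:
    print("fatal error per the spec")
```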
Feb 18 2016
prev sibling next sibling parent reply crimaniak <crimaniak gmail.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:

 Please post you feature requests...
- the ability to read documents with missing or incorrectly 
specified encoding
- additional feature: relaxed mode for reading html and broken 
XML documents

Some time ago I worked at Accusoft on document viewing/converting 
software. The main lesson I took away: every theoretically 
possible kind of error in a document turns up for real once the 
application is popular.
Feb 20 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 20 February 2016 at 19:08:25 UTC, crimaniak wrote:
 - the ability to read documents with missing or incorrectly 
 specified encoding
 - additional feature: relaxed mode for reading html and broken 
 XML documents
fyi, my dom.d can do those, I use it for web scraping where there's all kinds of hideous stuff out there. https://github.com/adamdruppe/arsd/blob/master/dom.d
Feb 20 2016
parent reply crimaniak <crimaniak gmail.com> writes:
On Saturday, 20 February 2016 at 19:16:47 UTC, Adam D. Ruppe 
wrote:
 On Saturday, 20 February 2016 at 19:08:25 UTC, crimaniak wrote:
 - the ability to read documents with missing or incorrectly 
 specified encoding
 - additional feature: relaxed mode for reading html and broken 
 XML documents
fyi, my dom.d can do those, I use it for web scraping where there's all kinds of hideous stuff out there. https://github.com/adamdruppe/arsd/blob/master/dom.d
It works, thanks! I will use it in my experiments, but the getElementsBySelector() selector language needs to be improved, I think.
Feb 21 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 21 February 2016 at 23:01:22 UTC, crimaniak wrote:
 I will use it in my experiments, but the getElementsBySelector() 
 selector language needs to be improved, I think.
What, specifically, do you have in mind?
Feb 21 2016
parent reply crimaniak <crimaniak gmail.com> writes:
On Sunday, 21 February 2016 at 23:57:40 UTC, Adam D. Ruppe wrote:
 On Sunday, 21 February 2016 at 23:01:22 UTC, crimaniak wrote:
 I will use it in my experiments, but the getElementsBySelector() 
 selector language needs to be improved, I think.
What, specifically, do you have in mind?
There are only a couple of ad-hoc checks for attribute values. This language is not XPath-compatible, so the easiest way to cover a lot of cases is a regex check for attributes. Something like "script[src/https:.+\\.googleapis\\.com/i]"
Feb 25 2016
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 25 February 2016 at 23:59:04 UTC, crimaniak wrote:
 There are only a couple of ad-hoc checks for attribute values. 
 This language is not XPath-compatible, so the easiest way to 
 cover a lot of cases is a regex check for attributes. Something 
 like "script[src/https:.+\\.googleapis\\.com/i]"
The css3 selector standard offers three substring searches: [attr^=foo] if it begins with foo, [attr$=foo] if it ends with foo, and [attr*=foo] if it includes foo somewhere. dom.d supports all three now. So for your regex, you could probably match [src*=googleapis.com] well enough.
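The three operators boil down to plain substring predicates; a sketch (Python, with a hypothetical helper name, not dom.d's actual implementation):

```python
# Model of the CSS3 attribute substring operators:
# ^= (prefix), $= (suffix), *= (contains).
def attr_matches(op: str, value: str, needle: str) -> bool:
    if op == "^=":
        return value.startswith(needle)
    if op == "$=":
        return value.endswith(needle)
    if op == "*=":
        return needle in value
    raise ValueError("unknown operator: " + op)

src = "https://ajax.googleapis.com/ajax/libs/jquery.js"
print(attr_matches("*=", src, "googleapis.com"))  # True
print(attr_matches("^=", src, "https:"))          # True
```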
Feb 25 2016
prev sibling next sibling parent reply Dejan Lekic <dejan.lekic gmail.com> writes:
If you really want to be serious about the XML package, then I 
humbly believe implementing the commonly-known DOM interfaces is 
a must. Luckily there is IDL available for it: 
https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, 
speaking about DOM, all levels need to be supported!

Also, I would recommend borrowing Tango's XML pull parser, as 
it is blazingly fast.

Finally, integration with a signal/slot module should perhaps 
be considered as well.
Feb 24 2016
next sibling parent reply Alex Vincent <ajvincent gmail.com> writes:
On Wednesday, 24 February 2016 at 10:55:01 UTC, Dejan Lekic wrote:
 If you really want to be serious about the XML package, then I 
 humbly believe implementing the commonly-known DOM interfaces 
 is a must. Luckily there is IDL available for it: 
 https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, 
 speaking about DOM, all levels need to be supported!
I agree, but the Document Object Model (DOM) is a huuuuuuuuge 
project.  It's a project I'd love to take an active hand in 
driving.  Also, DOM "level 4" is a living standard at whatwg.org, 
along with rules for parsing HTML.  (Which naturally means the 
rules are always changing.)  I have a partial implementation of 
DOM in JavaScript, so I am serious when I say it's going to take 
time.

Ideally (imho), we'd have a set of related packages, prefixed 
with std.web:

* html
* xml
* dom
* css
* javascript

(Yes, I'm suggesting a rename of std.xml2 to std.web.xml.)  But 
from what I can see, realistically the community is a long way 
from that.  I'm trying to write the SAX interfaces now.  I only 
have a limited amount of time to devote to this (a common 
complaint, I gather)...
Mar 01 2016
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 2 March 2016 at 02:50:22 UTC, Alex Vincent wrote:
 I agree, but the Document Object Model (DOM) is a huuuuuuuuge 
 project.  It's a project I'd love to take an active hand in 
 driving.
My dom.d implements a fair chunk of it already. https://github.com/adamdruppe/arsd/blob/master/dom.d Yes, indeed, it is quite a lot of code, but easy to use if you are familiar with javascript and css selectors. http://dpldocs.info/experimental-docs/arsd.dom.html
Mar 02 2016
prev sibling parent reply =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Dejan Lekic <dejan.lekic gmail.com> wrote:
 If you really want to be serious about the XML package, then I 
 humbly believe implementing the commonly-known DOM interfaces is 
 a must. Luckily there is IDL available for it: 
 https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, 
 speaking about DOM, all levels need to be supported!
 
 Also, I would recommend borrowing the Tango's XML pull parser as 
 it is blazingly fast.
 
 Finally, perhaps integration with signal/slot module should 
 perhaps be considered as well.
 
What's the use case of DOM outside of browser interoperability/scripting? The API isn't particularly nice, especially in languages with a rich type system.
Mar 01 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 2 March 2016 at 06:59:49 UTC, Tobias Müller wrote:
 What's the use case of DOM outside of browser 
 interoperability/scripting? The API isn't particularly nice, 
 especially in languages with a rich type system.
I find my extended dom to be very nice, especially thanks to D's type system. I use it for a lot of things: using web apis, html scraping, config file stuff, working on my own documents, and even as my web template system. Basically, dom.d made xml cool to me.
Mar 02 2016
parent =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Adam D. Ruppe <destructionator gmail.com> wrote:
 On Wednesday, 2 March 2016 at 06:59:49 UTC, Tobias Müller wrote:
 What's the usecase of DOM outside of browser 
 interoperability/scripting? The API isn't particularly nice, 
 especially in languages with a rich type system.
I find my extended dom to be very nice, especially thanks to D's type system. I use it for a lot of things: using web apis, html scraping, config file stuff, working on my own documents, and even as my web template system. Basically, dom.d made xml cool to me.
Sure, some kind of DOM is certainly useful. But the standard XML-DOM isn't particularly nice. What's the point of a linked list style interface when you have ranges in the language?
Mar 03 2016
prev sibling parent reply Craig Dillabaugh <craig.dillabaugh gmail.com> writes:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
 std.xml has been considered not up to specs nearly 3 years now. 
 Time to build a successor. I currently plan the following 
 featues for it:

 - SAX and DOM parser
 - in-situ / slicing parsing when possible (forward range?)
 - compile time switch (CTS) for lazy attribute parsing
 - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 - CTS for input validating
 - performance

 Not much code yet, I'm currently building the performance test 
 suite https://github.com/burner/std.xml2

 Please post you feature requests, and please keep the posts DRY 
 and on topic.
Robert, we have had some student interest in GSOC for XML. Would you be interested in mentoring a student to work with you on this. Craig
Mar 05 2016
parent reply Robert burner Schadek <rburners gmail.com> writes:
On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh wrote:
 Robert, we have had some student interest in GSOC for XML.  
 Would you be interested in mentoring a student to work with you 
 on this.

 Craig
Of course
Mar 06 2016
next sibling parent Lodovico Giaretta <lodovico giaretart.net> writes:
On Sunday, 6 March 2016 at 11:46:00 UTC, Robert burner Schadek 
wrote:
 On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh 
 wrote:
 Robert, we have had some student interest in GSOC for XML.  
 Would you be interested in mentoring a student to work with 
 you on this.

 Craig
Of course
Hi, I don't know if this is the right spot to join the conversation; I'm a student and I'd really love to work on std.xml for GSoC! I'm just waiting for March 14 to apply.
Mar 07 2016
prev sibling next sibling parent Craig Dillabaugh <craig.dillabaugh gmail.com> writes:
On Sunday, 6 March 2016 at 11:46:00 UTC, Robert burner Schadek 
wrote:
 On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh 
 wrote:
 Robert, we have had some student interest in GSOC for XML.  
 Would you be interested in mentoring a student to work with 
 you on this.

 Craig
Of course
Great. Can you please get in touch by email so I can add you to the mentors list: craig dot dillabaugh at gmail dot com Cheers
Mar 07 2016
prev sibling parent Alex Vincent <ajvincent gmail.com> writes:
For everyone's information, I've posted a pull request to Mr. 
Schadek's github repository, with a proposed Simple API for XML 
(SAX) stub.  I'd really appreciate reviews of the stub's 
interfaces.

https://github.com/burner/std.xml2/pull/5
Mar 12 2016