digitalmars.D - Replacing std.xml
- w0rp (14/14) Aug 29 2013 Hello everybody. I've been wondering, what are the current plans
- Jonathan M Davis (32/48) Aug 29 2013 Someone needs to step forward, write it, and get it through the review
- Jacob Carlborg (5/8) Aug 29 2013 Won't that have the same problem as we talked about in one of the threads
- Jonathan M Davis (13/20) Aug 29 2013 Possibly, but then all you have to do is make it so that it treats strin...
- Jacob Carlborg (10/14) Aug 29 2013 What! I hardly believe that. That might be the case for HTML but I don't...
- Chris (3/18) Aug 29 2013 And while we're at it, what about YAML? It's a subset of JSON
- Jacob Carlborg (5/7) Aug 29 2013 YAML is a superset of JSON, not the other way around. But yes, I would
- Jonathan M Davis (14/30) Aug 29 2013 Well, as I said, I couldn't remember exactly what the XML standard said ...
- Brad Anderson (9/32) Aug 29 2013 You just specify the encoding in the root element.
- Jacob Carlborg (5/10) Aug 29 2013 I don't understand. If you use a range of dchar and call "front" and
- Jonathan M Davis (9/19) Aug 29 2013 Any decent parser is going to special-case strings (especially if it's u...
- Michel Fortin (18/30) Aug 29 2013 The XML standard says that an XML parser MUST support UTF-8 and UTF-16,
- H. S. Teoh (37/49) Aug 29 2013 Take a look here:
- Jacob Carlborg (6/8) Aug 29 2013 Actually, does the encoding really matters (as long as it's compatible
- Brad Anderson (5/57) Aug 29 2013 It doesn't look like adding the rest of the ISO-8859 encodings
- deadalnix (4/26) Aug 31 2013 As this is not the first time I see it used as a reliable source,
- Sean Kelly (11/23) Aug 29 2013 wstring,
- Brad Anderson (3/10) Aug 29 2013 This makes me wonder what kind of optimizations a hypothetical
- H. S. Teoh (8/28) Aug 29 2013 Right, that's why I said the core of std.xml should handle everything as
- Joakim (5/12) Aug 29 2013 I think it's great that there's no std.xml, as it implies that
- maarten van damme (4/14) Aug 29 2013 and imagine someone forced to use xml who reads this answer from the
- Chris (4/7) Aug 29 2013 No way around XML. A must have, as has been said in this thread.
- H. S. Teoh (26/34) Aug 29 2013 While I do agree that in the current state of affairs, XML support is a
- Chris (10/50) Aug 29 2013 I am moving away from XML too. Wanted to use it for a private
- Jason den Dulk (7/10) Aug 29 2013 The main disavantage of JSON vs XML is lack of validation.
- Joakim (10/18) Aug 29 2013 We already have a std.json in Phobos for years now. I think it'd
- w0rp (22/41) Aug 29 2013 JSON is better than XML in every way I can think of. Easier to
- Michel Fortin (21/37) Aug 29 2013 I wrote something like that a while ago.
- Jonathan M Davis (7/46) Aug 29 2013 Cool. I started looking at implementing something like that a while back...
- ilya-stromberg (2/19) Sep 03 2013 Can you push it to the github, please?
- Michel Fortin (9/34) Sep 03 2013 Good idea.
- Johannes Pfau (8/23) Aug 29 2013 I most points here also apply to std.xml:
- Robert Schadek (6/10) Aug 29 2013 I think, this even extends to access to all semi- and structured-data.
- Jacob Carlborg (6/11) Aug 29 2013 So you want serialization :). Which we currently are reviewing.
- Robert Schadek (4/7) Aug 29 2013 well, sort of, but also with partial serialization (think sql update),
- Brad Anderson (10/26) Aug 29 2013 That's a really great point. All of these modules that can't
- Brad Anderson (2/11) Aug 29 2013 (or maybe just improve Variant)
- Tobias Pankrath (4/18) Aug 29 2013 There is http://dsource.org/projects/xmlp, which at some point
- ilya-stromberg (8/11) Aug 31 2013 Also, we have Tango Xml:
- Jacob Carlborg (5/10) Aug 31 2013 Unfortunately the Tango XML package will never end up in Phobos due to
- ilya-stromberg (3/5) Sep 01 2013 Yes, but we can always learn source code and put attention to the
- Jonathan M Davis (6/12) Sep 01 2013 Not really. Looking at the source code effectively taints you. By doing ...
- Michel Fortin (17/27) Aug 31 2013 Someone should benchmark it against the XML implementation I made. It
- Jacob Carlborg (5/8) Aug 31 2013 I guess "Pull" is the key here. That it is the client's responsibility
- qznc (7/31) Sep 02 2013 Recursion means you use the call stack instead of stack object on
- Michel Fortin (6/20) Sep 02 2013 Good point about caring for pathological cases.
- Richard Webb (5/8) Sep 02 2013 Has anyone done any benchmarks recently to see if that is still the case...
- Andrei Alexandrescu (5/18) Aug 29 2013 I don't know much about XML, but I noticed there are a few popular
- Jonathan M Davis (7/11) Aug 29 2013 That works especially well with how Michel and I were thinking it should...
- Walter Bright (8/20) Aug 31 2013 The Tango implementation of XML has been very well received. I haven't l...
- Lionello Lunesu (10/23) Sep 02 2013 Having been the lead programmer on the Microsoft XML team for three
- Peter Williams (13/38) Sep 02 2013 For whoever ends up doing std.xml's replacement, it would be good if
- Andrei Alexandrescu (3/11) Sep 03 2013 This is great info, thanks.
Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.

First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.

Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?
Aug 29 2013
On Thursday, August 29, 2013 09:25:35 w0rp wrote:
> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.

Someone needs to step forward, write it, and get it through the review process. A while back, someone was working on a possible new version of std.xml, but they disappeared. No one has stepped up since. I'd love to do it if I had time, but I don't. There are probably several others around here in the same boat, but until someone who has the time and skill does do it, we won't have a new std.xml.

> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement."

Except that that's really the task of the person creating the new std.xml. Generally what happens is that the person writing the module comes up with an API and then presents it, rather than asking others to come up with ideas to design it for them. Obviously, ideas can be discussed, but design-by-committee is arguably a bad idea. And it just works better to have a concrete design to discuss.

> The general idea in my mind is "something SAX-like, with something a little DOM-like."

What I personally think would be best is to have multiple parsers. First you have something StAX-like (or maybe even lower level; I don't recall exactly what StAX gives you at the moment) that basically tokenizes the XML and returns a range of that. Then SAX and DOM parsers can be built on top of that. That way, you get the fastest parser possible as well as higher-level, more functional parsers. But two of the biggest points of the design are that it's going to have to be range-based, and it's going to need to be able to take full advantage of slices (when used with any strings or random-access ranges) in order to avoid copying any of the data. That's the key design point which will allow a D parser to be extremely fast in comparison to parsers in most other languages.

> I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.

Personally, I would have just said use ranges of dchar and be done with it without worrying about character encodings at all, but I don't remember what all the XML standard does with encodings.

> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?

There are several D XML libraries floating around, but no one has taken the time to get any of them prepared for the Phobos review queue, and I suspect that very few of them are range-based like the Phobos XML solution needs to be, but I don't know.

- Jonathan M Davis
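The layering described above (a low-level pull tokenizer that yields slices of the input, with SAX- and DOM-style parsers stacked on top) can be sketched outside of D. The following is a minimal Python illustration, not a proposed Phobos API: `tokenize`, `sax_parse` and `Token` are hypothetical names, `memoryview` slices stand in for D array slices (so token data is never copied), and attributes, comments, CDATA, entities, self-closing tags and error recovery are all deliberately left out.

```python
from typing import Callable, Dict, Iterator, NamedTuple


class Token(NamedTuple):
    kind: str          # "start", "end", or "text"
    data: memoryview   # zero-copy slice of the input buffer


def tokenize(buf: bytes) -> Iterator[Token]:
    """Pull-style (StAX-like) layer: lazily yields tokens as slices of buf."""
    view = memoryview(buf)
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == ord("<"):
            close = buf.index(b">", i)  # raises ValueError if the tag is unterminated
            if buf[i + 1] == ord("/"):
                yield Token("end", view[i + 2:close])
            else:
                yield Token("start", view[i + 1:close])
            i = close + 1
        else:
            nxt = buf.find(b"<", i)
            if nxt == -1:
                nxt = n
            yield Token("text", view[i:nxt])
            i = nxt


Handler = Callable[[memoryview], None]


def sax_parse(buf: bytes, handlers: Dict[str, Handler]) -> None:
    """Push-style (SAX-like) layer built on top of the pull tokenizer."""
    for tok in tokenize(buf):
        handler = handlers.get(tok.kind)
        if handler is not None:
            handler(tok.data)
```

A DOM builder would simply be a third consumer of the same token stream. Because each layer only iterates over the one below it, a caller who needs maximum speed can use `tokenize` directly and skip the callback machinery.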
Aug 29 2013
On 2013-08-29 09:47, Jonathan M Davis wrote:
> Personally, I would have just said use ranges of dchar and be done with it without worrying about character encodings at all, but I don't remember what all the XML standard does with encodings.

Won't that have the same problem as we talked about in one of the threads about a D lexer? That is, doing unnecessary en/decoding.

--
/Jacob Carlborg
Aug 29 2013
On Thursday, August 29, 2013 11:08:18 Jacob Carlborg wrote:
> On 2013-08-29 09:47, Jonathan M Davis wrote:
>> Personally, I would have just said use ranges of dchar and be done with it without worrying about character encodings at all, but I don't remember what all the XML standard does with encodings.
>
> Won't that have the same problem as we talked about in one of the threads about a D lexer? That is, doing unnecessary en/decoding.

Possibly, but then all you have to do is make it so that it treats strings as ranges of code units (and possibly supports ranges of char and wchar), and you can avoid the unnecessary decoding. But aside from possibly supporting ranges of char or wchar, that would be completely internal to the parser, and the caller wouldn't care. An alternative would be to specifically support ranges of ubyte instead of strings, though given that XML is usually treated as a string, that would arguably be a bit odd. Regardless, as far as strings go, it's easy enough to avoid decoding in the implementation. IIRC, everything in XML is ASCII anyway, with stuff like HTML codes to indicate Unicode characters. And if that's the case, avoiding unnecessary decoding is trivial when operating on strings.

- Jonathan M Davis
Aug 29 2013
On 2013-08-29 11:23, Jonathan M Davis wrote:
> IIRC, everything in XML is ASCII anyway, with stuff like HTML codes to indicate Unicode characters. And if that's the case, avoiding unnecessary decoding is trivial when operating on strings.

What! I hardly believe that. That might be the case for HTML, but I don't think it is for XML. There are many file formats that are based on XML. I don't think all of those use HTML codes. This is what W3Schools says: "XML documents can contain non ASCII characters, like Norwegian æ ø å , or French ê è é. To avoid errors, specify the XML encoding, or save XML files as Unicode."

--
/Jacob Carlborg
Aug 29 2013
On Thursday, 29 August 2013 at 13:20:40 UTC, Jacob Carlborg wrote:
> On 2013-08-29 11:23, Jonathan M Davis wrote:
>> IIRC, everything in XML is ASCII anyway, with stuff like HTML codes to indicate Unicode characters. And if that's the case, avoiding unnecessary decoding is trivial when operating on strings.
>
> What! I hardly believe that. That might be the case for HTML, but I don't think it is for XML. There are many file formats that are based on XML. I don't think all of those use HTML codes. This is what W3Schools says: "XML documents can contain non ASCII characters, like Norwegian æ ø å , or French ê è é. To avoid errors, specify the XML encoding, or save XML files as Unicode."

And while we're at it, what about YAML? It's a subset of JSON, which means the new json.d module will handle it, I suppose.
Aug 29 2013
On 2013-08-29 16:07, Chris wrote:
> And while we're at it, what about YAML? It's a subset of JSON, which means the new json.d module will handle it, I suppose.

YAML is a superset of JSON, not the other way around. But yes, I would like to have YAML support as well.

--
/Jacob Carlborg
Aug 29 2013
On Thursday, 29 August 2013 at 19:26:21 UTC, Jacob Carlborg wrote:
> On 2013-08-29 16:07, Chris wrote:
>> And while we're at it, what about YAML? It's a subset of JSON, which means the new json.d module will handle it, I suppose.
>
> YAML is a superset of JSON, not the other way around. But yes, I would like to have YAML support as well.

Yes, of course, you are right. I found this on the internet. It seems to be abandoned.

https://github.com/kiith-sa/D-YAML
Aug 29 2013
On Thursday, 29 August 2013 at 22:56:36 UTC, Chris wrote:
> On Thursday, 29 August 2013 at 19:26:21 UTC, Jacob Carlborg wrote:
>> On 2013-08-29 16:07, Chris wrote:
>>> And while we're at it, what about YAML? It's a subset of JSON, which means the new json.d module will handle it, I suppose.
>>
>> YAML is a superset of JSON, not the other way around. But yes, I would like to have YAML support as well.
>
> Yes, of course, you are right. I found this on the internet. It seems to be abandoned.
>
> https://github.com/kiith-sa/D-YAML

It's not really abandoned; I keep updating it with compatibility fixes for new DMD releases, as my other projects depend on it. Its API does not fit into Phobos, however (it's not range-based), and it won't unless I find a few weeks/months to work on it exclusively, which is unlikely in the near future. It also only supports YAML 1.1 at the moment, and recursive data structures are not yet supported.
Aug 29 2013
On Thursday, August 29, 2013 15:20:39 Jacob Carlborg wrote:
> On 2013-08-29 11:23, Jonathan M Davis wrote:
>> IIRC, everything in XML is ASCII anyway, with stuff like HTML codes to indicate Unicode characters. And if that's the case, avoiding unnecessary decoding is trivial when operating on strings.
>
> What! I hardly believe that. That might be the case for HTML, but I don't think it is for XML. There are many file formats that are based on XML. I don't think all of those use HTML codes. This is what W3Schools says: "XML documents can contain non ASCII characters, like Norwegian æ ø å , or French ê è é. To avoid errors, specify the XML encoding, or save XML files as Unicode."

Well, as I said, I couldn't remember exactly what the XML standard said about encodings, but if it can contain non-ASCII characters, then my first inclination is to say that it has to be UTF-8, UTF-16, or UTF-32, based on the fact that that's what we support in the language and in Phobos (as I understand it, std.encoding is a bit of a joke that needs to be rethought and replaced, but regardless, it's the only Phobos module supporting any non-Unicode encodings). However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.

- Jonathan M Davis
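The point that the markup delimiters are ASCII even when the content is not can be checked concretely for UTF-8: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a byte equal to `<`, `>` or `&` is always that character. The Python sketch below (hypothetical helper name `find_delimiters`) locates delimiters in UTF-8 input without decoding anything, and then decodes only the one span it needs. The same property holds for ASCII-compatible single-byte encodings such as ISO-8859-1; a byte-level scan like this does not work for UTF-16 or UTF-32, whose code units are wider than a byte.

```python
from typing import List


def find_delimiters(data: bytes, delims: bytes = b"<>&") -> List[int]:
    """Return the offsets of ASCII markup delimiters in UTF-8 input.

    No decoding happens: in UTF-8, the lead and continuation bytes of
    non-ASCII characters are all >= 0x80, so they can never be
    mistaken for one of these ASCII delimiters.
    """
    wanted = set(delims)
    return [i for i, b in enumerate(data) if b in wanted]


doc = "<p>æøå</p>".encode("utf-8")
offsets = find_delimiters(doc)  # [0, 2, 9, 12]
# Decode only the content between the first '>' and the next '<'.
content = doc[offsets[1] + 1:offsets[2]].decode("utf-8")  # "æøå"
```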
Aug 29 2013
On Thursday, 29 August 2013 at 17:38:43 UTC, Jonathan M Davis wrote:
> Well, as I said, I couldn't remember exactly what the XML standard said about encodings, but if it can contain non-ASCII characters, then my first inclination is to say that it has to be UTF-8, UTF-16, or UTF-32, based on the fact that that's what we support in the language and in Phobos (as I understand it, std.encoding is a bit of a joke that needs to be rethought and replaced, but regardless, it's the only Phobos module supporting any non-Unicode encodings). However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.
>
> - Jonathan M Davis

You just specify the encoding in the XML declaration:

<?xml version="1.0" encoding="us-ascii"?>
<?xml version="1.0" encoding="windows-1252"?>
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-16"?>

UTF-8 is the default in lieu of a BOM saying otherwise.
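As a rough illustration of this detection step, the Python sketch below (hypothetical name `sniff_encoding`) checks for a byte order mark, then for an `encoding` pseudo-attribute in the XML declaration, and otherwise falls back to UTF-8. It assumes the declaration itself is in an ASCII-compatible encoding; the non-normative Appendix F of the XML 1.0 specification describes the additional byte-pattern checks needed for BOM-less UTF-16 and other encodings, which this sketch omits. It also does not recognize UTF-32 BOMs.

```python
import codecs
import re

# EncName production from the XML spec: [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding name of an XML document from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16"  # Python's "utf-16" codec consumes the BOM itself
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"  # the default when there is no BOM and no declaration
```

Once the name is known, a caller can either decode up front or, more in the spirit of this thread, pick an encoding-specific code path and keep working on raw bytes.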
Aug 29 2013
On 2013-08-29 19:38, Jonathan M Davis wrote:
> However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.

I don't understand. If you use a range of dchar and call "front" and "popFront", won't it do decoding then?

--
/Jacob Carlborg
Aug 29 2013
On Thursday, August 29, 2013 21:28:09 Jacob Carlborg wrote:
> On 2013-08-29 19:38, Jonathan M Davis wrote:
>> However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.
>
> I don't understand. If you use a range of dchar and call "front" and "popFront", won't it do decoding then?

Any decent parser is going to special-case strings (especially if it's using slicing), in which case it won't call front unless it needs to decode. The only real question is whether generic char and wchar ranges should be supported, because then you could avoid the decoding for ranges that aren't strings, but strings are already covered simply by special-casing. You really can't afford not to special-case strings in algorithms in general if efficiency is a high priority.

- Jonathan M Davis
Aug 29 2013
On 2013-08-29 17:38:23 +0000, "Jonathan M Davis" <jmdavisProg gmx.com> said:
> Well, as I said, I couldn't remember exactly what the XML standard said about encodings, but if it can contain non-ASCII characters, then my first inclination is to say that it has to be UTF-8, UTF-16, or UTF-32, based on the fact that that's what we support in the language and in Phobos (as I understand it, std.encoding is a bit of a joke that needs to be rethought and replaced, but regardless, it's the only Phobos module supporting any non-Unicode encodings).

The XML standard says that an XML parser MUST support UTF-8 and UTF-16, and MAY support other encodings. Supporting non-UTF-8 encodings is a separate problem from parsing XML, and proper code for that would have much broader applications. Keep in mind that the more encodings you support, the more bloat you add to the executable, so there's a tradeoff to be made. In many cases UTF-8 is enough, while in many others it's not.

(My XML implementation has a function that parses the XML prolog and tells you the encoding, so you can take the appropriate code path before feeding the parser. A higher-level API could handle encodings automatically based on that.)

> However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.

Just like my XML implementation does. (I made frontUnit/popFrontUnit functions that I use when decoding code points is unnecessary.)

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
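The frontUnit/popFrontUnit idea (code-unit access for markup, full decoding only where it is needed) might look roughly like the Python sketch below. `Utf8Cursor` and its method names are hypothetical and only loosely mirror the D range primitives mentioned; this is not Michel's implementation. It assumes well-formed UTF-8 and does no validation beyond what `bytes.decode` performs.

```python
class Utf8Cursor:
    """Hypothetical cursor over UTF-8 bytes with two levels of access."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def empty(self) -> bool:
        return self.pos >= len(self.data)

    # Code-unit level: cheap, no decoding. Enough for ASCII markup.
    def front_unit(self) -> int:
        return self.data[self.pos]

    def pop_front_unit(self) -> None:
        self.pos += 1

    # Code-point level: decodes one full UTF-8 sequence.
    def _sequence_length(self) -> int:
        lead = self.data[self.pos]
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        if lead >> 3 == 0b11110:
            return 4
        raise ValueError("invalid UTF-8 lead byte at offset %d" % self.pos)

    def front(self) -> str:
        end = self.pos + self._sequence_length()
        return self.data[self.pos:end].decode("utf-8")

    def pop_front(self) -> None:
        self.pos += self._sequence_length()
```

A parser would call `front_unit` while matching `<`, `>`, `&` and quotes, and switch to `front`/`pop_front` only when it must inspect an actual character, for example to validate a non-ASCII name character.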
Aug 29 2013
On Thu, Aug 29, 2013 at 01:38:23PM -0400, Jonathan M Davis wrote:
[...]
> Well, as I said, I couldn't remember exactly what the XML standard said about encodings, but if it can contain non-ASCII characters, then my first inclination is to say that it has to be UTF-8, UTF-16, or UTF-32, based on the fact that that's what we support in the language and in Phobos

Take a look here:

http://www.w3schools.com/xml/xml_encoding.asp

XML files can have *any* valid encoding, including nastiness like windows-1252 and relics like iso-8859-1. Unfortunately, I don't think we have a way around this, since existing XML files out there probably already have all of these encodings and more, and std.xml is gonna hafta support 'em all. Otherwise we're gonna get irate users complaining "why can't std.xml parse my oddly-encoded-but-standards-compliant XML file?!"

> (as I understand it, std.encoding is a bit of a joke that needs to be rethought and replaced, but regardless, it's the only Phobos module supporting any non-Unicode encodings).

No kidding! I was trying to write a program that navigates a website automatically using std.net.curl, and I'm running into all sorts of silly roadblocks, including std.encoding not supporting iso-8859-* encodings.

The good news is that on Linux, there's a handy utility called 'recode', which comes with a library called 'librecode', that supports converting between a huge number of different encodings (many more than you or I have probably imagined existed), including to/from Unicode. I know we don't like including external libraries in Phobos, but I honestly don't see any justification for reinventing the wheel by writing (and maintaining!) our own equivalent to librecode, unless licensing issues prevent us from including librecode in Phobos, nicely wrapped in a modern range-based D API.

> However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.
[...]

One way is to write the core code of std.xml in such a way that it handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit encodings) so that it's encoding-independent. Then on top of this core, write some convenience wrappers that cast/convert to string, wstring, dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32 if the user asks for string/wstring/dstring, and leave XML in other encodings up to the user to decode manually. This way, at least the user can get the data out of the file.

Later on, once we've gotten our act together with std.encoding, we can hook it up to std.xml to provide autoconversion.


T

--
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen
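The proposal of an encoding-independent core over raw code units, with thin decoding wrappers on top, can be sketched in Python as follows. The names `split_markup`, `decode_span` and `native_utf16` are hypothetical; `bytes` stands in for `ubyte[]` (UTF-8) and `array('H')` for `ushort[]` (UTF-16 in native byte order). The core never decodes: `<` and `>` are single code units in both encodings, and UTF-16 surrogates (0xD800 to 0xDFFF) cannot collide with them. As in the "initial stab" described above, only UTF forms are handled; UTF-32 is left out here because the width of Python's `array('I')` is platform-dependent.

```python
import sys
from array import array
from typing import List, Sequence, Tuple

LT, GT = 0x3C, 0x3E  # '<' and '>' as code-unit values


def split_markup(units: Sequence[int]) -> List[Tuple[str, int, int]]:
    """Core: split code units into ("tag", start, end) / ("text", start, end) spans."""
    spans = []
    i, n = 0, len(units)
    while i < n:
        if units[i] == LT:
            j = i + 1
            while j < n and units[j] != GT:
                j += 1
            spans.append(("tag", i + 1, j))
            i = j + 1
        else:
            j = i
            while j < n and units[j] != LT:
                j += 1
            spans.append(("text", i, j))
            i = j
    return spans


def native_utf16() -> str:
    """Codec name matching the byte order array('H').tobytes() produces."""
    return "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def decode_span(units, start: int, end: int) -> str:
    """Convenience wrapper: decode one span only when the caller asks for text."""
    if isinstance(units, (bytes, bytearray)):
        return bytes(units[start:end]).decode("utf-8")
    if isinstance(units, array) and units.typecode == "H":
        return units[start:end].tobytes().decode(native_utf16())
    raise TypeError("unsupported code-unit container")
```

The same `split_markup` call serves both widths; only `decode_span` knows anything about encodings, which is where later autoconversion could be hooked in.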
Aug 29 2013
On 2013-08-29 20:57, H. S. Teoh wrote:
> XML files can have *any* valid encoding, including nastiness like windows-1252 and relics like iso-8859-1.

Actually, does the encoding really matter (as long as it's compatible with ASCII)? Just use a range of ubytes; the parser will only be looking for characters in the ASCII table anyway.

--
/Jacob Carlborg
Aug 29 2013
On Thursday, 29 August 2013 at 18:58:57 UTC, H. S. Teoh wrote:
> No kidding! I was trying to write a program that navigates a website automatically using std.net.curl, and I'm running into all sorts of silly roadblocks, including std.encoding not supporting iso-8859-* encodings.

It doesn't look like adding the rest of the ISO-8859 encodings would be all that difficult if you used the existing ISO-8859-1 (Latin1) as a base. I don't quite understand where and how transcoding is done, though.

> The good news is that on Linux, there's a handy utility called 'recode', which comes with a library called 'librecode', that supports converting between a huge number of different encodings (many more than you or I have probably imagined existed), including to/from Unicode. I know we don't like including external libraries in Phobos, but I honestly don't see any justification for reinventing the wheel by writing (and maintaining!) our own equivalent to librecode, unless licensing issues prevent us from including librecode in Phobos, nicely wrapped in a modern range-based D API.
>
>> However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So, the performance hit of decoding Unicode characters should mostly be able to be avoided.
> [...]
>
> One way is to write the core code of std.xml in such a way that it handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit encodings) so that it's encoding-independent. Then on top of this core, write some convenience wrappers that cast/convert to string, wstring, dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32 if the user asks for string/wstring/dstring, and leave XML in other encodings up to the user to decode manually. This way, at least the user can get the data out of the file.
>
> Later on, once we've gotten our act together with std.encoding, we can hook it up to std.xml to provide autoconversion.
>
> T
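Building the other ISO-8859 parts on a Latin-1 base is plausible because, as these encodings are commonly mapped to Unicode, every part agrees with Latin-1 below 0xA0 (byte value equals code point) and differs only in the range 0xA0 to 0xFF. A table-driven decoder therefore needs just 96 entries per part. The Python sketch below (hypothetical `make_decoder`) shows the idea with ISO-8859-1 and ISO-8859-15, which differs from Latin-1 in eight positions. It is an illustration of the approach, not std.encoding's actual design; parts with unassigned positions (such as ISO-8859-3) would also need an error or replacement policy, which is not shown.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], str]


def make_decoder(upper_half: str) -> Decoder:
    """Build a decoder for an ISO-8859 part from its 96 characters for 0xA0..0xFF."""
    if len(upper_half) != 96:
        raise ValueError("expected 96 characters for bytes 0xA0..0xFF")

    def decode(data: bytes) -> str:
        # Below 0xA0 every part agrees with Latin-1: byte value == code point.
        return "".join(chr(b) if b < 0xA0 else upper_half[b - 0xA0] for b in data)

    return decode


# ISO-8859-1 (Latin-1): the upper half is the identity mapping as well.
decode_latin1 = make_decoder("".join(chr(c) for c in range(0xA0, 0x100)))

# ISO-8859-15 (Latin-9): Latin-1 with eight positions replaced.
_LATIN9_CHANGES: Dict[int, str] = {
    0xA4: "\u20ac",  # EURO SIGN
    0xA6: "\u0160",  # LATIN CAPITAL LETTER S WITH CARON
    0xA8: "\u0161",  # LATIN SMALL LETTER S WITH CARON
    0xB4: "\u017d",  # LATIN CAPITAL LETTER Z WITH CARON
    0xB8: "\u017e",  # LATIN SMALL LETTER Z WITH CARON
    0xBC: "\u0152",  # LATIN CAPITAL LIGATURE OE
    0xBD: "\u0153",  # LATIN SMALL LIGATURE OE
    0xBE: "\u0178",  # LATIN CAPITAL LETTER Y WITH DIAERESIS
}
decode_latin9 = make_decoder(
    "".join(_LATIN9_CHANGES.get(c, chr(c)) for c in range(0xA0, 0x100))
)
```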
Aug 29 2013
On Thursday, 29 August 2013 at 18:58:57 UTC, H. S. Teoh wrote:
> On Thu, Aug 29, 2013 at 01:38:23PM -0400, Jonathan M Davis wrote:
> [...]
>> Well, as I said, I couldn't remember exactly what the XML standard said about encodings, but if it can contain non-ASCII characters, then my first inclination is to say that it has to be UTF-8, UTF-16, or UTF-32, based on the fact that that's what we support in the language and in Phobos
>
> Take a look here:
>
> http://www.w3schools.com/xml/xml_encoding.asp
>
> XML files can have *any* valid encoding, including nastiness like windows-1252 and relics like iso-8859-1. Unfortunately, I don't think we have a way around this, since existing XML files out there probably already have all of these encodings and more, and std.xml is gonna hafta support 'em all. Otherwise we're gonna get irate users complaining "why can't std.xml parse my oddly-encoded-but-standards-compliant XML file?!"

As this is not the first time I've seen it used as a reliable source: no, w3schools is full of shit. Don't use that website when looking for precise, high-quality information.
Aug 31 2013
On Aug 29, 2013, at 11:57 AM, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> One way is to write the core code of std.xml in such a way that it handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit encodings) so that it's encoding-independent. Then on top of this core, write some convenience wrappers that cast/convert to string, wstring, dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32 if the user asks for string/wstring/dstring, and leave XML in other encodings up to the user to decode manually. This way, at least the user can get the data out of the file.
>
> Later on, once we've gotten our act together with std.encoding, we can hook it up to std.xml to provide autoconversion.

As long as autoconversion is optional. When parsing XML or JSON or whatever, I generally only care about specific strings, and sometimes don't want anything decoded at all. Having decoding done automatically before the event fires is a huge and potentially unnecessary performance hit. Not doing this decoding automatically is what makes the Tango XML parser so fast.
Aug 29 2013
On Thursday, 29 August 2013 at 20:08:10 UTC, Sean Kelly wrote:
> As long as autoconversion is optional. When parsing XML or JSON or whatever, I generally only care about specific strings, and sometimes don't want anything decoded at all. Having decoding done automatically before the event fires is a huge and potentially unnecessary performance hit. Not doing this decoding automatically is what makes the Tango XML parser so fast.

This makes me wonder what kind of optimizations a hypothetical ctXml could perform.
Aug 29 2013
On Thu, Aug 29, 2013 at 12:41:16PM -0700, Sean Kelly wrote:
> On Aug 29, 2013, at 11:57 AM, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
>> One way is to write the core code of std.xml in such a way that it handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit encodings) so that it's encoding-independent. Then on top of this core, write some convenience wrappers that cast/convert to string, wstring, dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32 if the user asks for string/wstring/dstring, and leave XML in other encodings up to the user to decode manually. This way, at least the user can get the data out of the file.
>>
>> Later on, once we've gotten our act together with std.encoding, we can hook it up to std.xml to provide autoconversion.
>
> As long as autoconversion is optional. When parsing XML or JSON or whatever, I generally only care about specific strings, and sometimes don't want anything decoded at all. Having decoding done automatically before the event fires is a huge and potentially unnecessary performance hit. Not doing this decoding automatically is what makes the Tango XML parser so fast.

Right, that's why I said the core of std.xml should handle everything as bytes, only specially treating the ASCII values of <, >, &, and other metacharacters. The tag name and tag body should just be a range over segments of the input.


T

--
What are you when you run out of Monet? Baroque.
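The "only decode what you ask for" behaviour can be illustrated with a small Python sketch: tag names are yielded as raw `memoryview` slices of the input, and a lookup for a particular name encodes the needle once instead of decoding every tag. `start_tag_names` and `count_named` are hypothetical names, the regular expression is a deliberately naive stand-in for a real tokenizer (it ignores CDATA sections, `<` inside comments, and so on), and the comparison assumes the document really is in the encoding passed in.

```python
import re
from typing import Iterator

# '<' followed by a name; excludes end tags ('</'), comments/doctype ('<!') and PIs ('<?').
_START_TAG = re.compile(rb"<([^\s/>!?][^\s/>]*)")


def start_tag_names(buf: bytes) -> Iterator[memoryview]:
    """Yield each start-tag name as a raw, undecoded slice of buf."""
    view = memoryview(buf)
    for m in _START_TAG.finditer(buf):
        yield view[m.start(1):m.end(1)]


def count_named(buf: bytes, name: str, encoding: str = "utf-8") -> int:
    """Count start tags called `name` without decoding any tag names."""
    needle = name.encode(encoding)  # encode the needle once...
    return sum(1 for raw in start_tag_names(buf) if raw == needle)  # ...compare raw bytes
```

Decoding happens only if a caller explicitly converts a slice, for example with `bytes(raw).decode("utf-8")`.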
Aug 29 2013
On Thursday, 29 August 2013 at 07:47:35 UTC, Jonathan M Davis wrote:
> There are several D XML libraries floating around, but no one has taken the time to get any of them prepared for the Phobos review queue, and I suspect that very few of them are range-based like the Phobos XML solution needs to be, but I don't know.

I think it's great that there's no std.xml, as it implies that nobody using D would use a dumb tech like XML. Let's keep it that way. :)
Aug 29 2013
and imagine someone forced to use XML who reads this answer from the community :p
std.xml is a must, no doubt.

2013/8/29 Joakim <joakim airpost.net>
> On Thursday, 29 August 2013 at 07:47:35 UTC, Jonathan M Davis wrote:
>> There are several D XML libraries floating around, but no one has taken the time to get any of them prepared for the Phobos review queue, and I suspect that very few of them are range-based like the Phobos XML solution needs to be, but I don't know.
>
> I think it's great that there's no std.xml, as it implies that nobody using D would use a dumb tech like XML. Let's keep it that way. :)
Aug 29 2013
On Thursday, 29 August 2013 at 09:24:31 UTC, Joakim wrote:
> I think it's great that there's no std.xml, as it implies that nobody using D would use a dumb tech like XML. Let's keep it that way. :)

No way around XML. A must-have, as has been said in this thread. But what would you suggest as a better alternative to XML? It might be worth creating modules for alternatives too (like JSON).
Aug 29 2013
On Thu, Aug 29, 2013 at 01:14:19PM +0200, Chris wrote:
> On Thursday, 29 August 2013 at 09:24:31 UTC, Joakim wrote:
>> I think it's great that there's no std.xml, as it implies that nobody using D would use a dumb tech like XML. Let's keep it that way. :)
>
> No way around XML. A must-have, as has been said in this thread. But what would you suggest as a better alternative to XML? It might be worth creating modules for alternatives too (like JSON).

While I do agree that in the current state of affairs, XML support is a must, I also think that XML is just way overengineered, IMNSHO. It adds too much overhead and therefore requires compression to be efficient, and it is needlessly complex for what it does (tag attributes, all the different cases of CDATA / non-CDATA, etc.). This complexity makes it impractical to edit by hand, relegating it to machine reading/writing only, which then begs the question of why a binary format wasn't chosen instead. And don't get me started on DTDs, which are incredibly convoluted and can't even express certain things that one might want to express in an automatic validation system. Or that 17-headed monster called XSLT, which, thankfully, is fading into the obscurity of time.

JSON is a nicer, simpler alternative, though there may be limitations with it that I don't know about. Word on the street is that many people are abandoning XML for JSON due to lower maintenance overhead (and this includes one of my friends, who was a hardcore XML fanatic; I was frankly quite surprised when he told me he was considering migrating to JSON, since the original reason he chose XML was so that his data would be future-proof... well, so much for *that*).

But all of this is irrelevant... it doesn't alleviate the need for a std.xml replacement, since we have to live in the real world where XML exists and must be supported. :)


T

--
Life would be easier if I had the source code. -- YHL
Aug 29 2013
On Thursday, 29 August 2013 at 15:43:36 UTC, H. S. Teoh wrote:
> While I do agree that in the current state of affairs, XML support is a must, I also think that XML is just way overengineered, IMNSHO. It adds too much overhead and therefore requires compression to be efficient, and it is needlessly complex for what it does (tag attributes, all the different cases of CDATA / non-CDATA, etc.). This complexity makes it impractical to edit by hand, relegating it to machine reading/writing only, which then raises the question of why a binary format wasn't chosen instead. And don't get me started on DTDs, which are incredibly convoluted and can't even express certain things that one might want to express in an automatic validation system. Or that 17-headed monster called XSLT, which, thankfully, is fading into the obscurity of time.
>
> JSON is a nicer, simpler alternative, though there may be limitations with it that I don't know about. Word on the street is that many people are abandoning XML for JSON due to lower maintenance overhead (and this includes one of my friends, who was a hardcore XML fanatic -- I was frankly quite surprised when he told me he was considering migrating to JSON, since the original reason he chose XML was so that his data would be future-proof... well, so much for *that*).
>
> But all of this is irrelevant... it doesn't alleviate the need for a std.xml replacement, since we have to live in the real world where XML exists and must be supported. :)
>
> T

I am moving away from XML too. I wanted to use it for a private project, but I soon realized the madness of it, especially when there are people involved who are not programmers and have no clue whatsoever about markup languages, data storage formats, etc. I think JSON and YAML are good candidates for the private project, which revolves around collecting words and phrases and archiving them. I don't know exactly what I will use, but XML definitely won't get the job. DTD sounds too much like DDT!
Aug 29 2013
On Thursday, 29 August 2013 at 15:43:36 UTC, H. S. Teoh wrote:
> JSON is a nicer, simpler alternative, though there may be limitations with it that I don't know about.

The main disadvantage of JSON vs. XML is the lack of validation. Whenever I write code that works with JSON (or any data format), I have to write extra code to perform validation. If there were a validation add-on for JSON, you could nix XML for good.

Regards
Jason
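For illustration, here is a minimal Python sketch of the kind of hand-written validation code Jason describes. The schema format (a dict mapping field names to expected Python types) and the `validate` function are hypothetical and are not a JSON Schema implementation; only the standard `json` module is real.

```python
import json

# Hypothetical hand-rolled "schema": field name -> expected Python type.
PERSON_SCHEMA = {"name": str, "age": int, "tags": list}


def validate(obj, schema):
    """Return a list of problems found in `obj`; an empty list means valid."""
    if not isinstance(obj, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in schema.items():
        if field not in obj:
            problems.append(f"missing field {field!r}")
            continue
        value = obj[field]
        # bool is a subclass of int in Python, so reject true/false for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(
                f"field {field!r} should be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    # Report keys the schema does not know about, in a stable order.
    for field in sorted(obj.keys() - schema.keys()):
        problems.append(f"unexpected field {field!r}")
    return problems


good = json.loads('{"name": "w0rp", "age": 30, "tags": ["d"]}')
bad = json.loads('{"name": "w0rp", "age": "thirty"}')
print(validate(good, PERSON_SCHEMA))  # []
print(validate(bad, PERSON_SCHEMA))   # ["field 'age' should be int, got str", "missing field 'tags'"]
```

Every project that consumes JSON ends up writing some variant of this, which is the gap a schema/validation add-on would fill.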
Aug 29 2013
On Thursday, 29 August 2013 at 11:14:21 UTC, Chris wrote:
> On Thursday, 29 August 2013 at 09:24:31 UTC, Joakim wrote:
>> I think it's great that there's no std.xml, as it implies that nobody using D would use a dumb tech like XML. Let's keep it that way. :)
>
> No way around XML. It's a must-have, as has been said in this thread. But what would you suggest as a better alternative to XML? It might be worth creating modules for the alternatives too (like JSON).

We've already had a std.json in Phobos for years now. I think it'd be great for Phobos to nudge users in better directions by having a std.json but no std.xml. There'll always be outside libraries to process XML for those who can't go without; perhaps a list of XML libraries can be added to the wiki:

http://wiki.dlang.org/Libraries_and_Frameworks

I see no use for XML, as it's a horrible solution in search of a problem, but those who must use it can always get it outside Phobos. Just a suggestion.
Aug 29 2013
On Thursday, 29 August 2013 at 09:24:31 UTC, Joakim wrote:
> I think it's great that there's no std.xml, as it implies that nobody using D would use a dumb tech like XML. Let's keep it that way. :)

JSON is better than XML in every way I can think of: easier to map to data structures in whichever language you're using, much smaller in size, fewer corner cases, etc. However, just saying XML is dumb isn't a useful policy. You need ways of parsing XML on hand until people stop using it.

On Thursday, 29 August 2013 at 08:15:39 UTC, Robert Schadek wrote:
> On 08/29/2013 09:51 AM, Johannes Pfau wrote:
>> Most points here also apply to std.xml: http://wiki.dlang.org/Wish_list/std.json
>> Those are not strict requirements though; I just summarized what I remembered from old discussions.
>
> I think this even extends to access to all semi-structured and structured data. Think CSV, SQL, NoSQL, you name it. Something which deserves a name like Uniform Access. I don't want to care if data is laid out differently. I want to define my struct or class, mark the members to fill, pass it to somebody's code, and not have to care whether it's XML, SQL or whatever.

I'm really not so sure about that kind of approach. Automatic serialisation, I think, works one of two ways. Either you have control over the data you're pulling in, and you can change it to map more easily to your data structures, or you don't, and you have to make your data structures uglier to fit the data you're pulling in. I prefer just writing functions that take format X and give you in-memory representation Y over automatic serialisation stuff. I know it's boring and easy to write functions like that, but why can't some things just be boring and easy?

This looks like a really popular topic, and it's cool that there seem to be quite a few implementations that are close to being what we want. I think we're probably not far off just lining up a few different implementations and reviewing them all for possible inclusion in Phobos.
Aug 29 2013
On 2013-08-29 07:47:17 +0000, Jonathan M Davis <jmdavisProg gmx.com> said:
> On Thursday, August 29, 2013 09:25:35 w0rp wrote:
>> The general idea in my mind is "something SAX-like, with something a little DOM-like."
>
> What I personally think would be best is to have multiple parsers. First you have something StAX-like (or maybe even lower level - I don't recall exactly what StAX gives you at the moment) that basically tokenizes the XML and returns a range of that. Then SAX and DOM parsers can be built on top of that. That way, you get the fastest parser possible as well as higher-level, more functional parsers.
>
> But two of the biggest points of the design are that it's going to have to be range-based, and it's going to need to be able to take full advantage of slices (when used with any strings or random-access ranges) in order to avoid copying any of the data. That's the key design point which will allow a D parser to be extremely fast in comparison to parsers in most other languages.

I wrote something like that a while ago. It only accepted arrays as input because of the lack of a "buffered range" concept that'd allow lookahead and efficient slicing from any kind of range, but that could be retrofitted in. It implements pretty much all of the XML spec, except for documents having an internal subset (which is something a little arcane). It does not deal with namespaces either; I feel like that should be done a layer above, but I'm not entirely sure.

Lower-level parser: http://michelf.ca/docs/d/mfr/xmltok.html
Higher-level parser built on the first one: http://michelf.ca/docs/d/mfr/xml.html
The code: http://michelf.ca/docs/d/mfr-xml-2010-10-19.zip

That code hasn't been compiled in a while, but it used to work very well for me. Feel free to use it as a starting point.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
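The layering Jonathan and Michel describe (a low-level tokenizer producing a stream of tokens, with SAX or DOM layers built on top) can be sketched roughly as follows. This is a hypothetical Python illustration, not Michel's D code. Python string slices copy, so each token here carries start/end offsets into the input instead; that is the closest Python analogue to handing out D slices without copying.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # "start", "end", "empty" or "text"
    start: int  # offset where the tag name or text begins
    end: int    # offset one past where it ends


def _name_end(doc: str, i: int, stop: int) -> int:
    """Advance past a tag name; names end at whitespace, '/' or the closing '>'."""
    while i < stop and not doc[i].isspace() and doc[i] != "/":
        i += 1
    return i


def tokenize(doc: str) -> Iterator[Token]:
    """Lazily yield tokens as offsets into `doc`; no substrings are created.

    Deliberately tiny: start tags, end tags, empty-element tags and text only.
    Attributes are skipped (and assumed not to contain '>'); comments, CDATA,
    processing instructions and well-formedness checks are left to other layers.
    """
    i, n = 0, len(doc)
    while i < n:
        if doc[i] != "<":
            j = doc.find("<", i)
            j = n if j == -1 else j
            yield Token("text", i, j)
            i = j
            continue
        close = doc.find(">", i)
        if close == -1:
            raise ValueError(f"unterminated tag at offset {i}")
        if doc[i + 1] == "/":
            yield Token("end", i + 2, _name_end(doc, i + 2, close))
        elif doc[close - 1] == "/":
            yield Token("empty", i + 1, _name_end(doc, i + 1, close))
        else:
            yield Token("start", i + 1, _name_end(doc, i + 1, close))
        i = close + 1


doc = "<a><b>hi</b><c/></a>"
for tok in tokenize(doc):
    # A higher layer (SAX callbacks, a DOM builder) slices only what it needs.
    print(tok.kind, doc[tok.start:tok.end])
```

A SAX- or DOM-style layer would consume this token stream and materialize `doc[start:end]` only for the names or text it actually uses.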
Aug 29 2013
On Thursday, August 29, 2013 12:14:28 Michel Fortin wrote:
> On 2013-08-29 07:47:17 +0000, Jonathan M Davis <jmdavisProg gmx.com> said:
>> On Thursday, August 29, 2013 09:25:35 w0rp wrote:
>>> The general idea in my mind is "something SAX-like, with something a little DOM-like."
>>
>> What I personally think would be best is to have multiple parsers. First you have something StAX-like (or maybe even lower level - I don't recall exactly what StAX gives you at the moment) that basically tokenizes the XML and returns a range of that. Then SAX and DOM parsers can be built on top of that. That way, you get the fastest parser possible as well as higher-level, more functional parsers.
>>
>> But two of the biggest points of the design are that it's going to have to be range-based, and it's going to need to be able to take full advantage of slices (when used with any strings or random-access ranges) in order to avoid copying any of the data. That's the key design point which will allow a D parser to be extremely fast in comparison to parsers in most other languages.
>
> I wrote something like that a while ago. It only accepted arrays as input because of the lack of a "buffered range" concept that'd allow lookahead and efficient slicing from any kind of range, but that could be retrofitted in. It implements pretty much all of the XML spec, except for documents having an internal subset (which is something a little arcane). It does not deal with namespaces either; I feel like that should be done a layer above, but I'm not entirely sure.
>
> Lower-level parser: http://michelf.ca/docs/d/mfr/xmltok.html
> Higher-level parser built on the first one: http://michelf.ca/docs/d/mfr/xml.html
> The code: http://michelf.ca/docs/d/mfr-xml-2010-10-19.zip
>
> That code hasn't been compiled in a while, but it used to work very well for me. Feel free to use it as a starting point.

Cool. I started looking at implementing something like that a while back but really didn't have time to get very far. But if we really care about efficiency, I think that that's the basic approach that we need to take. However, the trick as always is someone having the time to do it. Maybe one of us can take what you did and start from there, or at least use it as an example to start from.

- Jonathan M Davis
Aug 29 2013
On Thursday, 29 August 2013 at 16:14:28 UTC, Michel Fortin wrote:
> I wrote something like that a while ago. It only accepted arrays as input because of the lack of a "buffered range" concept that'd allow lookahead and efficient slicing from any kind of range, but that could be retrofitted in. It implements pretty much all of the XML spec, except for documents having an internal subset (which is something a little arcane). It does not deal with namespaces either; I feel like that should be done a layer above, but I'm not entirely sure.
>
> Lower-level parser: http://michelf.ca/docs/d/mfr/xmltok.html
> Higher-level parser built on the first one: http://michelf.ca/docs/d/mfr/xml.html
> The code: http://michelf.ca/docs/d/mfr-xml-2010-10-19.zip
>
> That code hasn't been compiled in a while, but it used to work very well for me. Feel free to use it as a starting point.

Can you push it to GitHub, please?
Sep 03 2013
On 2013-09-03 16:11:37 +0000, "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> said:
> On Thursday, 29 August 2013 at 16:14:28 UTC, Michel Fortin wrote:
>> I wrote something like that a while ago. It only accepted arrays as input because of the lack of a "buffered range" concept that'd allow lookahead and efficient slicing from any kind of range, but that could be retrofitted in. It implements pretty much all of the XML spec, except for documents having an internal subset (which is something a little arcane). It does not deal with namespaces either; I feel like that should be done a layer above, but I'm not entirely sure.
>>
>> Lower-level parser: http://michelf.ca/docs/d/mfr/xmltok.html
>> Higher-level parser built on the first one: http://michelf.ca/docs/d/mfr/xml.html
>> The code: http://michelf.ca/docs/d/mfr-xml-2010-10-19.zip
>>
>> That code hasn't been compiled in a while, but it used to work very well for me. Feel free to use it as a starting point.
>
> Can you push it to GitHub, please?

Good idea.
http://github.com/michelf/mfr-xml-d

Feel free to send pull requests if you want. I should be able to review them.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
Sep 03 2013
On Thu, 29 Aug 2013 09:25:35 +0200, "w0rp" <devw0rp gmail.com> wrote:
> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.
>
> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.

Most points here also apply to std.xml:
http://wiki.dlang.org/Wish_list/std.json
Those are not strict requirements though; I just summarized what I remembered from old discussions.

> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?

There's a std.xml2 in the review queue:
http://wiki.dlang.org/Review_Queue
Aug 29 2013
On 08/29/2013 09:51 AM, Johannes Pfau wrote:
> Most points here also apply to std.xml:
> http://wiki.dlang.org/Wish_list/std.json
> Those are not strict requirements though; I just summarized what I remembered from old discussions.

I think this even extends to access to all semi-structured and structured data. Think CSV, SQL, NoSQL, you name it. Something which deserves a name like Uniform Access. I don't want to care if data is laid out differently. I want to define my struct or class, mark the members to fill, pass it to somebody's code, and not have to care whether it's XML, SQL or whatever.
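A minimal Python sketch of the "Uniform Access" idea: declare a record type once and fill it from whichever format the data arrived in. The `User` type and the `fill` helper are hypothetical; `json`, `csv`, `xml.etree.ElementTree` and `dataclasses` are standard-library modules. It assumes a flat record whose fields map one-to-one onto keys, which sidesteps the harder cases (nesting, partial updates, joins) raised later in the thread.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class User:
    name: str
    age: int


def fill(cls, record):
    """Build a `cls` instance from any mapping of field name -> value.

    Each dataclass field is looked up by name and converted with the
    field's annotated type (this relies on annotations being real types,
    i.e. no `from __future__ import annotations`). A missing key raises
    KeyError. Hypothetical helper, not an existing library API.
    """
    return cls(**{f.name: f.type(record[f.name]) for f in fields(cls)})


# The same record arriving in three different formats.
from_json = json.loads('{"name": "Ada", "age": 36}')
from_xml = ET.fromstring('<user name="Ada" age="36"/>').attrib
from_csv = next(csv.DictReader(io.StringIO("name,age\nAda,36\n")))

for record in (from_json, from_xml, from_csv):
    print(fill(User, record))  # User(name='Ada', age=36) each time
```

The caller only ever sees `User`; the format-specific work is reduced to producing a key/value mapping, which is roughly the separation Robert is asking for.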
Aug 29 2013
On 2013-08-29 10:15, Robert Schadek wrote:
> I think this even extends to access to all semi-structured and structured data. Think CSV, SQL, NoSQL, you name it. Something which deserves a name like Uniform Access. I don't want to care if data is laid out differently. I want to define my struct or class, mark the members to fill, pass it to somebody's code, and not have to care whether it's XML, SQL or whatever.

So you want serialization :). Which we are currently reviewing. Unfortunately, there might be too many changes needed to get it into Phobos this time.

-- 
/Jacob Carlborg
Aug 29 2013
On 08/29/2013 11:09 AM, Jacob Carlborg wrote:
> So you want serialization :). Which we are currently reviewing. Unfortunately, there might be too many changes needed to get it into Phobos this time.

Well, sort of, but also with partial serialization (think SQL UPDATE), a more transparent interface, and I want to define join result types at compile time, and and and...
Aug 29 2013
On Thursday, 29 August 2013 at 08:15:39 UTC, Robert Schadek wrote:
> On 08/29/2013 09:51 AM, Johannes Pfau wrote:
>> Most points here also apply to std.xml:
>> http://wiki.dlang.org/Wish_list/std.json
>> Those are not strict requirements though; I just summarized what I remembered from old discussions.
>
> I think this even extends to access to all semi-structured and structured data. Think CSV, SQL, NoSQL, you name it. Something which deserves a name like Uniform Access. I don't want to care if data is laid out differently. I want to define my struct or class, mark the members to fill, pass it to somebody's code, and not have to care whether it's XML, SQL or whatever.

That's a really great point. All of these modules that can't know the types and structure in advance should probably use the same techniques for handling the situation. Perhaps a new module to unify all this stuff is in order. I seem to recall Adam D. Ruppe's "Is this D or is this Javascript?" thread[1] having some nice tricks to deal with dynamically typed data.

1. http://forum.dlang.org/thread/kuxfkakrgjaofkrdvgmx forum.dlang.org
Aug 29 2013
On Thursday, 29 August 2013 at 19:40:08 UTC, Brad Anderson wrote:
> That's a really great point. All of these modules that can't know the types and structure in advance should probably use the same techniques for handling the situation. Perhaps a new module to unify all this stuff is in order. I seem to recall Adam D. Ruppe's "Is this D or is this Javascript?" thread[1] having some nice tricks to deal with dynamically typed data.
>
> 1. http://forum.dlang.org/thread/kuxfkakrgjaofkrdvgmx forum.dlang.org

(or maybe just improve Variant)
Aug 29 2013
On Thursday, 29 August 2013 at 07:25:36 UTC, w0rp wrote:
> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.
>
> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.
>
> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?

There is http://dsource.org/projects/xmlp, which at some point was proposed for std.xml2. But that has stalled for some time now.
Aug 29 2013
On Thursday, 29 August 2013 at 07:53:46 UTC, Tobias Pankrath wrote:
> There is http://dsource.org/projects/xmlp, which at some point was proposed for std.xml2. But that has stalled for some time now.

Also, we have Tango XML:
https://github.com/SiegeLord/Tango-D2/tree/d2port/tango/text/xml

It's the fastest XML parser in the world, so maybe you can find it useful:
dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/
dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/
Aug 31 2013
On 2013-08-31 17:43, ilya-stromberg wrote:
> Also, we have Tango XML:
> https://github.com/SiegeLord/Tango-D2/tree/d2port/tango/text/xml
>
> It's the fastest XML parser in the world, so maybe you can find it useful:
> dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/
> dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/

Unfortunately the Tango XML package will never end up in Phobos due to licensing issues.

-- 
/Jacob Carlborg
Aug 31 2013
On Saturday, 31 August 2013 at 18:03:10 UTC, Jacob Carlborg wrote:
> Unfortunately the Tango XML package will never end up in Phobos due to licensing issues.

Yes, but we can always study the source code and pay attention to the design decisions.
Sep 01 2013
On Sunday, September 01, 2013 10:02:50 ilya-stromberg wrote:
> On Saturday, 31 August 2013 at 18:03:10 UTC, Jacob Carlborg wrote:
>> Unfortunately the Tango XML package will never end up in Phobos due to licensing issues.
>
> Yes, but we can always study the source code and pay attention to the design decisions.

Not really. Looking at the source code effectively taints you. By doing so, you run the risk of being accused of copying if anything you do is similar enough. It's just safer to never look at source code when the license is going to make it so that you can't use that code.

- Jonathan M Davis
Sep 01 2013
On 2013-08-31 15:43:00 +0000, "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> said:
> On Thursday, 29 August 2013 at 07:53:46 UTC, Tobias Pankrath wrote:
>> There is http://dsource.org/projects/xmlp, which at some point was proposed for std.xml2. But that has stalled for some time now.
>
> Also, we have Tango XML:
> https://github.com/SiegeLord/Tango-D2/tree/d2port/tango/text/xml
>
> It's the fastest XML parser in the world, so maybe you can find it useful:
> dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/
> dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/

Someone should benchmark it against the XML implementation I made. It has many of the same characteristics.

For instance, Tango's SaxParser is based on its PullParser. This design requires the use of a dynamic array to maintain a stack of opened elements. While not a huge performance hit, you don't need that if you use recursion, which you can do with my implementation. You can do that even though you can also use it as a pull tokenizer[^1] when needed (recursion is optional on a token-by-token basis).

[^1]: IMHO, PullParser isn't a really good term for something that does not conform to the requirements of a parser in the XML spec. Tokenizer is a better term.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
Aug 31 2013
On 2013-08-31 20:53, Michel Fortin wrote:
> [^1]: IMHO, PullParser isn't a really good term for something that does not conform to the requirements of a parser in the XML spec. Tokenizer is a better term.

I guess "Pull" is the key here: it is the client's responsibility to fetch the next token, not the other way around.

-- 
/Jacob Carlborg
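To make the pull/push distinction concrete, here is a small Python sketch contrasting the two styles. The token list is hard-coded and the function names are hypothetical; the point is only who drives the loop. In pull style the client calls `next()` when it wants a token, while in push (SAX) style the parser iterates and calls the client's handlers.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (kind, text); stands in for a real tokenizer's output


def pull_tokens() -> Iterator[Event]:
    """Pull style: nothing happens until the client asks for the next token."""
    yield ("start", "a")
    yield ("text", "hi")
    yield ("end", "a")


def push_parse(tokens: Iterable[Event], handlers: Dict[str, Callable[[str], None]]) -> None:
    """Push (SAX) style built on top of a pull source: the parser drives the
    loop and hands each token to the client's callback for that kind."""
    for kind, text in tokens:
        handlers[kind](text)


# Pull: the client controls the pace and may stop after the first token.
reader = pull_tokens()
first = next(reader)

# Push: the client only supplies callbacks.
log = []
push_parse(pull_tokens(), {
    "start": lambda name: log.append(f"<{name}>"),
    "text": log.append,
    "end": lambda name: log.append(f"</{name}>"),
})
print(first)  # ('start', 'a')
print(log)    # ['<a>', 'hi', '</a>']
```

Building the push layer on the pull layer is trivial, as shown; going the other way (turning callbacks back into an on-demand stream) is much harder, which is one argument for making the pull tokenizer the core.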
Aug 31 2013
On Saturday, 31 August 2013 at 18:53:42 UTC, Michel Fortin wrote:
> On 2013-08-31 15:43:00 +0000, "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> said:
>> On Thursday, 29 August 2013 at 07:53:46 UTC, Tobias Pankrath wrote:
>>> There is http://dsource.org/projects/xmlp, which at some point was proposed for std.xml2. But that has stalled for some time now.
>>
>> Also, we have Tango XML:
>> https://github.com/SiegeLord/Tango-D2/tree/d2port/tango/text/xml
>>
>> It's the fastest XML parser in the world, so maybe you can find it useful:
>> dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/
>> dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/
>
> Someone should benchmark it against the XML implementation I made. It has many of the same characteristics.
>
> For instance, Tango's SaxParser is based on its PullParser. This design requires the use of a dynamic array to maintain a stack of opened elements. While not a huge performance hit, you don't need that if you use recursion, which you can do with my implementation. You can do that even though you can also use it as a pull tokenizer when needed (recursion is optional on a token-by-token basis).

Recursion means you use the call stack instead of a stack object on the heap. Be careful about nesting depth. There are XML documents out there with thousands of nested elements, or more. With recursion on a 32-bit machine you might get a stack overflow, but a heap-allocated stack could handle a million nested elements.
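qznc's concern can be illustrated with a small Python sketch that checks tag nesting in two ways: with an explicit stack (a heap-allocated list) and with recursion, one call frame per open element. Tokens are assumed to be `(kind, name)` pairs from some tokenizer, and both functions are hypothetical. Python's default recursion limit (about 1000 frames) plays the role of a small thread stack here, so this is an analogy, not a measurement of a 32-bit D program.

```python
def check_with_stack(tokens):
    """Match start/end tags using an explicit stack on the heap.

    Returns the maximum nesting depth; raises ValueError on mismatched
    or unclosed tags. Depth is limited only by available memory.
    """
    stack = []
    deepest = 0
    for kind, name in tokens:
        if kind == "start":
            stack.append(name)
            deepest = max(deepest, len(stack))
        elif kind == "end":
            if not stack or stack.pop() != name:
                raise ValueError(f"unexpected </{name}>")
    if stack:
        raise ValueError(f"unclosed <{stack[-1]}>")
    return deepest


def check_recursively(tokens):
    """Same check, but each open element occupies one Python stack frame."""
    it = iter(tokens)

    def element(name):
        # Consume tokens from the shared iterator until this element's end tag.
        depth = 1
        for kind, child in it:
            if kind == "start":
                depth = max(depth, 1 + element(child))
            elif kind == "end":
                if child != name:
                    raise ValueError(f"unexpected </{child}>")
                return depth
        raise ValueError(f"unclosed <{name}>")

    deepest = 0
    for kind, name in it:
        if kind == "start":
            deepest = max(deepest, element(name))
        elif kind == "end":
            raise ValueError(f"unexpected </{name}>")
    return deepest


N = 100_000  # far deeper than Python's default recursion limit
deep = [("start", "e")] * N + [("end", "e")] * N
print(check_with_stack(deep))  # 100000
try:
    check_recursively(deep)
except RecursionError:
    print("recursive version ran out of stack")
```

Michel's point still stands for ordinary documents: the recursive version needs no separate allocation. The trade-off only shows up on pathologically deep input.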
Sep 02 2013
On 2013-09-02 13:34:18 +0000, "qznc" <qznc web.de> said:
> On Saturday, 31 August 2013 at 18:53:42 UTC, Michel Fortin wrote:
>> For instance, Tango's SaxParser is based on its PullParser. This design requires the use of a dynamic array to maintain a stack of opened elements. While not a huge performance hit, you don't need that if you use recursion, which you can do with my implementation. You can do that even though you can also use it as a pull tokenizer[^1] when needed (recursion is optional on a token-by-token basis).
>
> Recursion means you use the call stack instead of a stack object on the heap. Be careful about nesting depth. There are XML documents out there with thousands of nested elements, or more. With recursion on a 32-bit machine you might get a stack overflow, but a heap-allocated stack could handle a million nested elements.

Good point about caring for pathological cases.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
Sep 02 2013
On 31/08/2013 16:43, ilya-stromberg wrote:
> It's the fastest XML parser in the world, so maybe you can find it useful:
> dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/
> dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/

Has anyone done any benchmarks recently to see if that is still the case? I did some (admittedly brief) tests last year and found that xmlp was actually faster at building large XML docs into a DOM. There have been lots of changes since then, so I don't know if that is still the case.
Sep 02 2013
On 8/29/13 12:25 AM, w0rp wrote:
> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.
>
> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.
>
> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?

I don't know much about XML, but I noticed there are a few popular libraries and models. I'd expect a replacement for std.xml to choose whichever of these popular models is most appropriate for D.

Andrei
Aug 29 2013
On Thursday, August 29, 2013 14:27:22 H. S. Teoh wrote:
> Right, that's why I said the core of std.xml should handle everything as bytes, only specially treating the ASCII values of <, >, &, and other metacharacters. The tag name and tag body should just be a range over segments of the input.

That works especially well with how Michel and I were thinking it should be split up, with a core that essentially just gives you a range of XML tokens/tags. You then have separate SAX and/or DOM parsers on top of that (which also should minimize decoding, but they actually have to care about decoding in some cases in order to do things like checking matching tags).

- Jonathan M Davis
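Byte-level scanning is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so it can never be confused with an ASCII metacharacter such as `<` or `>`. A minimal Python sketch, assuming UTF-8 (or another ASCII-compatible) input; UTF-16, which the XML spec also requires parsers to accept, does not have this property and would need separate handling. The `split_markup` name is hypothetical, and only `<` and `>` are examined here (`&` would be found the same way when entities are handled).

```python
def split_markup(data: bytes):
    """Split ASCII-compatible bytes (e.g. UTF-8) into markup and text runs.

    Only the bytes for '<' and '>' are examined; everything else passes
    through undecoded. Raises ValueError if a '<' is never closed.
    Python bytes slices are copies; a D version would return slices.
    """
    runs = []
    i = 0
    while i < len(data):
        if data[i:i + 1] == b"<":
            j = data.index(b">", i) + 1
            runs.append(("markup", data[i:j]))
        else:
            j = data.find(b"<", i)
            j = len(data) if j == -1 else j
            runs.append(("text", data[i:j]))
        i = j
    return runs


doc = "<p>naïve café 日本</p>".encode("utf-8")
runs = split_markup(doc)
print(runs[1][1].decode("utf-8"))  # naïve café 日本
# Every byte of a non-ASCII character is >= 0x80, so none can equal b"<" (0x3C).
assert all(b >= 0x80 for b in "ï日".encode("utf-8"))
```

Decoding happens only when the caller asks for it (the `.decode` call above), which matches the idea of leaving decoding to the layers that actually need it.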
Aug 29 2013
On 8/29/2013 12:25 AM, w0rp wrote:
> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.
>
> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.
>
> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?

The Tango implementation of XML has been very well received. I haven't looked at it, but it was designed to do no memory allocation - it just did slices over the input.

I don't believe it should make any attempt at decoding. Decoding entails both performance loss and memory consumption. If the user wants to do decoding, they can layer it on the output.

And lastly, it should of course sport a range interface.
Aug 31 2013
On 8/29/13 15:25, w0rp wrote:
> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.
>
> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.
>
> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?

Having been the lead programmer on the Microsoft XML team for three years, I can easily say that the most popular XML APIs [on the MS stack] are XmlReader and XLinq in .NET. (This has nothing to do with LINQ, by the way.)

I'd be willing to help make D versions of those, but my time is limited. But as usual, I don't think it's the actual coding that will take time. Designing a good interface is the hardest part, and I'd consider that part done.

L.
Sep 02 2013
On 03/09/13 12:40, Lionello Lunesu wrote:
> On 8/29/13 15:25, w0rp wrote:
>> Hello everybody. I've been wondering, what are the current plans to replace std.xml? I'd like to help with the effort to get a final XML library in Phobos. So, I have a few questions.
>>
>> First, and most importantly, what do we expect out of a D XML library? I'd really like to have a discussion of the form, "Here is exactly the interface the structs/classes need to implement, go forth and implement." The general idea in my mind is "something SAX-like, with something a little DOM-like." I'm aware that std.xml has some issues supporting different encodings, so obviously that's included.
>>
>> Second, is there an existing library that has gotten close to meeting whatever we need for the first point? If so, how far away is it from being able to meet all of the requirements and become the standard library version?
>
> Having been the lead programmer on the Microsoft XML team for three years, I can easily say that the most popular XML APIs [on the MS stack] are XmlReader and XLinq in .NET. (This has nothing to do with LINQ, by the way.)
>
> I'd be willing to help make D versions of those, but my time is limited. But as usual, I don't think it's the actual coding that will take time. Designing a good interface is the hardest part, and I'd consider that part done.
>
> L.

For whoever ends up doing std.xml's replacement, it would be good if some of the lower-level interfaces, such as encode/decode (for escaping/unescaping within text), were exposed. I'm finding the ones in std.xml useful for implementing markup in label widgets during my investigation into reimplementing the GTK+ (modified) interface in D. Of course, there's always the chance that the new D XML API provides enough to make my markup code redundant.

It's possible that the current high-level APIs in std.xml also provide enough to make my work redundant, but I decided not to investigate this possibility after I saw the "will be replaced by something different" warning.

Cheers
Peter
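A minimal Python sketch of the kind of escaping helpers Peter would like exposed. The names `encode` and `decode` mirror the std.xml functions he mentions, but this is not their implementation. It handles the five entities predefined by XML plus numeric character references, and leaves unknown named entities untouched (a lenient choice; a strict parser would report an error instead).

```python
import re

# The five entities predefined by the XML spec.
_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}
_NAMED = {v[1:-1]: k for k, v in _ESCAPES.items()}  # "amp" -> "&", "lt" -> "<", ...
_REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);")


def encode(text: str) -> str:
    """Escape characters that may not appear literally in XML text or attributes.

    Works character by character, so an existing '&' is never double-escaped
    by a later replacement.
    """
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def decode(text: str) -> str:
    """Replace predefined entities and numeric character references in one pass."""
    def replace(m):
        ref = m.group(1)
        if ref.startswith("#x"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        return _NAMED.get(ref, m.group(0))  # unknown entity: keep as written

    return _REF.sub(replace, text)


s = 'if a < b && c > "d"'
print(encode(s))  # if a &lt; b &amp;&amp; c &gt; &quot;d&quot;
assert decode(encode(s)) == s
print(decode("&#65;&#x42;&amp;lt;&copy;"))  # AB&lt;&copy;
```

Because `decode` makes a single regex pass, `&amp;lt;` correctly becomes the literal text `&lt;` rather than being unescaped twice into `<`.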
Sep 02 2013
On 9/2/13 7:40 PM, Lionello Lunesu wrote:
> Having been the lead programmer on the Microsoft XML team for three years, I can easily say that the most popular XML APIs [on the MS stack] are XmlReader and XLinq in .NET. (This has nothing to do with LINQ, by the way.)
>
> I'd be willing to help make D versions of those, but my time is limited. But as usual, I don't think it's the actual coding that will take time. Designing a good interface is the hardest part, and I'd consider that part done.

This is great info, thanks.

Andrei
Sep 03 2013