digitalmars.D - Status of std.xml (D2/Phobos)
- Justin Johansson (6/6) Jun 27 2010 May I ask is anybody working on redeveloping std.xml in the D2/Phobos
- Lutger (2/10) Jun 27 2010 Interested, very much so. I think many people are.
- Simen kjaeraas (4/6) Jun 27 2010 Absolutely. It is a necessity.
- Justin Johansson (21/30) Jun 27 2010 Lutger said: Interested, very much so. I think many people are.
- Adam Ruppe (6/6) Jun 27 2010 I'm not terribly interested in it because I already wrote my own
- Justin Johansson (18/25) Jun 27 2010 Thanks Adam for replying. I'm happy to take onboard contra-views
- Adam Ruppe (7/10) Jun 27 2010 Yes, it is very simple, but so is all the XML I've ever actually
- Justin Johansson (24/36) Jun 27 2010 Yeah, I understand where you are coming from; sometimes all you
- Ellery Newcomer (3/23) Jun 27 2010 For the sake of us uninformed spectators, could you give a little taste
- Justin Johansson (31/62) Jun 28 2010 Writing an XML parser in itself is pretty much basic CS101 stuff. The
- Ellery Newcomer (3/14) Jun 29 2010 Sounds ominous. Like you'd need a serious team if you actually wanted to...
- Justin Johansson (8/25) Jun 29 2010 Yep, a serious team armed with a serious programming language that
- Andrei Alexandrescu (23/38) Jun 27 2010 Clearly std.xml can't stay the way it is. I'm even thinking of removing
- Justin Johansson (6/52) Jun 27 2010 Thanks Andrei et. al. I'll get back to the topic after some sleep
- Sean Kelly (2/21) Jun 27 2010 I'd like to cast a vote for a SAX-style parser. A DOM parser can be bui...
- Justin Johansson (22/50) Jun 29 2010 Others in this thread have suggested a preemptive removal of the current
- Lutger (10/32) Jun 27 2010 Well I dare only speak for myself. But perhaps I do can better:
- Justin Johansson (10/46) Jun 27 2010 Yes, understand.
- Lutger (7/14) Jun 27 2010 Don't know really, sorry. From a quick glance though, I would say to sta...
- Jesse Phillips (7/13) Jun 27 2010 XMLP_101327.html#N101327
- Nick Sabalausky (7/27) Jun 27 2010 I'm interested. I've been thinking of porting some of my stuff from D1/T...
- Jacob Carlborg (6/12) Jun 27 2010 I would very much have XML support in Phobos, I think all standard
- jpf (7/16) Jun 27 2010 I would also like to have a better std.xml. I still have an old D1/Tango
- Yao G. (10/17) Jun 27 2010 I did a simple implementation of a pull parser, using this API as
- Steven Schveighoffer (22/29) Jun 28 2010 Did you look at Tango's code in question, or look at their documentation...
- Alix Pexton (35/65) Jun 28 2010 I've not looked at any of the D XML offerings (shame on me?) but I've
- Steven Schveighoffer (14/24) Jun 28 2010 DOM is usually built on top of SAX, so start with the lowest common
- Alix Pexton (6/10) Jun 29 2010 I've been thinking about it, and while I believe you when you say that
- Michel Fortin (24/37) Jun 29 2010 It is closer to the metal, but there's a catch...
- Alix Pexton (18/52) Jun 29 2010 My understanding was that SAX _doesn't_ check those things either and
- BLS (6/16) Jun 28 2010 Hi Steve,
- Bernard Helyer (3/12) Jun 27 2010 std.xml needs to be replaced, but I personally don't much care as kxml
- Joe Hildebrand (6/10) Jun 28 2010 Agree, with one additional requirement: the ability to throw random chun...
- Michel Fortin (21/27) Jun 28 2010 I have made my own parser, comprised of a tokenizer and a mini DOM layer...
- Andrei Alexandrescu (6/36) Jun 28 2010 I think a tokenizer should be a higher-order range that is fed an input
- Michel Fortin (24/35) Jun 28 2010 And I've implemented a tokenizer range just like you describe on top of
- lurker (3/3) Jun 28 2010 I'm very interested.
May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch)

Also what is the level of interest from library users for decent XML support in D2/Phobos?

Cheers
Justin Johansson
Jun 27 2010
Justin Johansson wrote:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> Cheers Justin Johansson

Interested, very much so. I think many people are.
Jun 27 2010
Justin Johansson <no spam.com> wrote:
> Also what is the level of interest from library users for decent XML support in D2/Phobos?

Absolutely. It is a necessity.

-- Simen
Jun 27 2010
Justin Johansson wrote:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> Cheers Justin Johansson

Lutger said: Interested, very much so. I think many people are.
Simen said: Absolutely. It is a necessity.

Thanks for the fast replies from Lutger and Simen. Being an XML/W3C addict myself, I well concur with Simen and Lutger's sentiments. However, Lutger simply saying that he "thinks" *many* people are interested is not good enough for me. Would the *many* people also please add a voice along with Simen et al. to inspire me to contribute to the D2 XML effort (of course under a Walter-endorsed style of licence).

Brief about me: I believe I possess the skills and experience in XML/XSLT & other W3C stuff, together with 20+ years as a C++ developer, to contribute good, peer-reviewable work; it's just that I need inspiration to put this task into action. The only other thing is that any offer on my part is conditional upon obtaining a "sabbatical break", hopefully in the next month or so, to be able to put in the time to make it happen.

Cheers Justin Johansson
Jun 27 2010
I'm not terribly interested in it because I already wrote my own replacement: http://arsdnet.net/dcode/dom.d

Mine is biased toward HTML, doing what I personally find useful, or mimicking what javascript in the browser would do instead of following the standard, but if there's anything in there that is useful to others, you're free to take it.
Jun 27 2010
Adam Ruppe wrote:
> I'm not terribly interested in it because I already wrote my own replacement: http://arsdnet.net/dcode/dom.d
> Mine is biased toward HTML, doing what I personally find useful, or mimicking what javascript in the browser would do instead of following the standard, but if there's anything in there that is useful to others, you're free to take it.

Thanks Adam for replying. I'm happy to take onboard contra-views such as yours as well. Naturally there is no point in putting in an effort wherein there is no interest at large. Still, I'll wait for more replies on this ng before making any decision whether or not to commit myself to a new "D2 XML" effort.

btw. I feel it fair to add conjecture that a DOM implementation is pretty basic stuff and that a complete XML ecosystem is much larger than just this (i.e. an in-memory DOM). There are all sorts of abstractions (Andrei read ranges) and modeling that would form part of what I believe would be a major work, and one possibly even bigger than what one person like myself could ever hope to achieve. Of course, the mammoth effort by Michael Kay in producing the Saxon (Java-based) XSLT processor is a feat that few others will ever overshadow.

Cheers Justin Johansson
Jun 27 2010
On 6/27/10, Justin Johansson <no spam.com> wrote:
> btw. I feel it fair to add conjecture that a DOM implementation is pretty basic stuff and that a complete XML ecosystem is much larger than just this (i.e. an in-memory DOM).

Yes, it is very simple, but so is all the XML I've ever actually encountered. I've seen ugly, convoluted HTML and I've seen name/value pairs in verbose XML format, but very very little in the middle. (Heck, I just used std.string.indexOf("<tagname") for quite a while.)

This is probably due to my observation bias, with all my XML experience coming from working with web services.
Jun 27 2010
Adam Ruppe wrote:
> On 6/27/10, Justin Johansson <no spam.com> wrote:
> > btw. I feel it fair to add conjecture that a DOM implementation is pretty basic stuff and that a complete XML ecosystem is much larger than just this (i.e. an in-memory DOM).
> Yes, it is very simple, but so is all the XML I've ever actually encountered. I've seen ugly, convoluted HTML and I've seen name/value pairs in verbose XML format, but very very little in the middle. (Heck, I just used std.string.indexOf("<tagname") for quite a while.)
> This is probably due to my observation bias, with all my XML experience coming from working with web services.

Yeah, I understand where you are coming from; sometimes all you need is some simple DOM stuff which you can hack out yourself in a few hours.

OTOH, there are some really significant W3C specs that you may or may not be aware of and these are really difficult to implement in regular imperative languages like C/C++ and Java. Java, being all that is the following of Java I guess, has had the most success in implementing these specs.

IMHO, the two most fundamental and significant W3C specs that D libraries could well address are as follows. These form a large amount of the (formal) XML ecosystem.

XML Schema Part 2: Datatypes Second Edition
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

and

XQuery 1.0 and XPath 2.0 Data Model (XDM)
http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/

I can tell you for sure that XPath 2.0, which is the basis for XSLT 2.0 and XQuery 1.0, is truly a challenge to implement in languages like C++ and Java. Others have succeeded with implementations in languages like Eiffel. I would hope though, that D2 would be up to the task (is that wishful thinking?).

Cheers Justin Johansson
Jun 27 2010
On 06/27/2010 10:16 AM, Justin Johansson wrote:
> OTOH, there are some really significant W3C specs that you may or may not be aware of and these are really difficult to implement in regular imperative languages like C/C++ and Java. Java, being all that is the following of Java I guess, has had the most success in implementing these specs.
> IMHO, the two most fundamental and significant W3C specs that D libraries could well address are as follows. These form a large amount of the (formal) XML ecosystem.
> XML Schema Part 2: Datatypes Second Edition http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
> and
> XQuery 1.0 and XPath 2.0 Data Model (XDM) http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/
> I can tell you for sure that XPath 2.0, which is the basis for XSLT 2.0 and XQuery 1.0, is truly a challenge to implement in languages like C++ and Java. Others have succeeded with implementations in languages like Eiffel. I would hope though, that D2 would be up to the task (is that wishful thinking?).
> Cheers Justin Johansson

For the sake of us uninformed spectators, could you give a little taste of the challenges to which you refer?
Jun 27 2010
Ellery Newcomer wrote:
> On 06/27/2010 10:16 AM, Justin Johansson wrote:
> > OTOH, there are some really significant W3C specs that you may or may not be aware of and these are really difficult to implement in regular imperative languages like C/C++ and Java. Java, being all that is the following of Java I guess, has had the most success in implementing these specs.
> > IMHO, the two most fundamental and significant W3C specs that D libraries could well address are as follows. These form a large amount of the (formal) XML ecosystem.
> > XML Schema Part 2: Datatypes Second Edition http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
> > and
> > XQuery 1.0 and XPath 2.0 Data Model (XDM) http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/
> > I can tell you for sure that XPath 2.0, which is the basis for XSLT 2.0 and XQuery 1.0, is truly a challenge to implement in languages like C++ and Java. Others have succeeded with implementations in languages like Eiffel. I would hope though, that D2 would be up to the task (is that wishful thinking?).
> > Cheers Justin Johansson
> For the sake of us uninformed spectators, could you give a little taste of the challenges to which you refer?

Writing an XML parser in itself is pretty much basic CS101 stuff. The tough challenges come with implementing the other W3C specs in the XML ecosystem, such as XSchema and XPath 2.0, for the reason that these are such humongous and complex beasts.

An XSchema implementation forms the basis for writing an XML content validator and that's a pretty important tool to have for a lot of XML processing. An XPath 2.0 implementation forms the core of XSLT 2.0 and XQuery, which are XML transformation languages. Again these are very useful tools.

The most successful implementations of XSchema and XPath 2.0 are written in Java. This is probably mostly due to the widespread popularity of Java and there being very many open source volunteers to do the grunt work.

If you look at any of the Java sources for these XML projects, you will be astounded just how big they are, like the Saxon Java XSLT processor by Michael Kay for example*. Of course you will be secretly thinking to yourself that the size of these works would be considerably smaller if they were written in D :-)

(*Michael Kay has spent the last ten years working on it.)

In the C++ world of Qt, there is the Qt XmlPatterns library, which implements XPath 2.0. It is also quite sizable and currently incomplete (implementing only about 70% of the W3C spec), and there are a whole bunch of (former TrollTech?) people at Nokia working on it, again demonstrating that implementing these W3C specs is no simple feat.

If you are really interested, try downloading a copy of the Qt source from Nokia and take a look at the C++ code in the XmlPatterns library. From that you will surely get more than just a taste of the challenges, you will get a whole mouthful! :-)

http://qt.nokia.com/downloads/

Cheers Justin Johansson
Jun 28 2010
On 06/28/2010 08:13 AM, Justin Johansson wrote:
> If you look at any of the Java sources for these XML projects, you will be astounded just how big they are, like the Saxon Java XSLT processor by Michael Kay for example*. Of course you will be secretly thinking to yourself that the size of these works would be considerably smaller if they were written in D :-)
> (*Michael Kay has spent the last ten years working on it.)
> In the C++ world of Qt, there is the Qt XmlPatterns library which implements XPath 2.0 which is also quite sizable and currently incomplete (implementing only about 70% of the W3C spec) and there are a whole bunch of (former TrollTech?) people at Nokia working on it, again demonstrating that implementing these W3C specs is no simple feat.

Sounds ominous. Like you'd need a serious team if you actually wanted to do any of this stuff.
Jun 29 2010
Ellery Newcomer wrote:
> On 06/28/2010 08:13 AM, Justin Johansson wrote:
> > If you look at any of the Java sources for these XML projects, you will be astounded just how big they are, like the Saxon Java XSLT processor by Michael Kay for example*. Of course you will be secretly thinking to yourself that the size of these works would be considerably smaller if they were written in D :-)
> > (*Michael Kay has spent the last ten years working on it.)
> > In the C++ world of Qt, there is the Qt XmlPatterns library which implements XPath 2.0 which is also quite sizable and currently incomplete (implementing only about 70% of the W3C spec) and there are a whole bunch of (former TrollTech?) people at Nokia working on it, again demonstrating that implementing these W3C specs is no simple feat.
> Sounds ominous. Like you'd need a serious team if you actually wanted to do any of this stuff.

Yep, a serious team armed with a serious programming language that is capable of realizing an event horizon to explode the singularity that is the black hole of the W3C specs and ultimately creating works of sheer beauty et ordo ab chao (and order out of chaos).

D2? :-)

Cheers Justin Johansson
Jun 29 2010
Justin Johansson wrote:
> Adam Ruppe wrote:
> > I'm not terribly interested in it because I already wrote my own replacement: http://arsdnet.net/dcode/dom.d
> > Mine is biased toward HTML, doing what I personally find useful, or mimicking what javascript in the browser would do instead of following the standard, but if there's anything in there that is useful to others, you're free to take it.
> Thanks Adam for replying. I'm happy to take onboard contra-views such as yours as well. Naturally there is no point in putting in an effort wherein there is no interest at large. Still, I'll wait for more replies on this ng before making any decision whether or not to commit myself to a new "D2 XML" effort.

Clearly std.xml can't stay the way it is. I'm even thinking of removing it preemptively in wait for another implementation.

If you want to work on something you enjoy, it seems like std.xml is a good choice. If you want to work on the top most important item, probably networking would come ahead. We badly need http and ftp streaming libraries. I'm thinking libcurl would be a good choice as a backend (not interface). For D integration, it would be great to integrate networking with std.stdio.File - e.g. creating File("http://xyz.org") would just connect to the thing and allow streaming, ranges, everything. Adam Ruppe has a lower-level networking protocol that also hooks into std.stdio.File, which would be very important to have too.

But then it's often better to work on what you like, so don't look for a landslide vote. Ford didn't work on a faster horse etc.

Some things that would be good to have in an xml library:

- should work with input ranges (not only strings)
- use aliases as lambdas if needed (std.xml's use of lambdas is nice, just very slow)
- define templates for char, wchar, and dchar and then define one working with ranges of ubyte that dispatches depending on the encoding tag found.

Andrei
Jun 27 2010
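The last item in Andrei's list, a byte-oriented front end that dispatches on the declared encoding, can be sketched roughly as follows. This is Python rather than D, purely for brevity, and `sniff_encoding` and `decode_document` are hypothetical names, not part of any library. The sketch assumes a byte-order mark or an ASCII-compatible XML declaration; it does not attempt the fuller autodetection described in Appendix F of the XML 1.0 specification (UTF-32 and BOM-less UTF-16 are ignored).

```python
import codecs
import re

# Byte-order marks to check first. Hypothetical helper table; UTF-32 is
# deliberately left out of this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # codec strips the BOM while decoding
    (codecs.BOM_UTF16_LE, "utf-16"),  # codec reads the BOM to pick endianness
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches encoding="..." inside a leading <?xml ... ?> declaration.
_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_encoding(raw: bytes) -> str:
    """Guess the encoding of an XML document held as raw bytes."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    match = _DECL.match(raw)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # the XML default when nothing else is declared


def decode_document(raw: bytes) -> str:
    """Dispatch to the right decoder, yielding text for a character-level parser."""
    return raw.decode(sniff_encoding(raw))
```

In D the same split would presumably be a parser templated on the character type plus a thin ubyte front end; here the "dispatch" is just the choice of codec name.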
Andrei Alexandrescu wrote:
> Justin Johansson wrote:
> > Adam Ruppe wrote:
> > > I'm not terribly interested in it because I already wrote my own replacement: http://arsdnet.net/dcode/dom.d
> > > Mine is biased toward HTML, doing what I personally find useful, or mimicking what javascript in the browser would do instead of following the standard, but if there's anything in there that is useful to others, you're free to take it.
> > Thanks Adam for replying. I'm happy to take onboard contra-views such as yours as well. Naturally there is no point in putting in an effort wherein there is no interest at large. Still, I'll wait for more replies on this ng before making any decision whether or not to commit myself to a new "D2 XML" effort.
> Clearly std.xml can't stay the way it is. I'm even thinking of removing it preemptively in wait for another implementation.
> If you want to work on something you enjoy, it seems like std.xml is a good choice. If you want to work on the top most important item, probably networking would come ahead. We badly need http and ftp streaming libraries. I'm thinking libcurl would be a good choice as a backend (not interface). For D integration, it would be great to integrate networking with std.stdio.File - e.g. creating File("http://xyz.org") would just connect to the thing and allow streaming, ranges, everything. Adam Ruppe has a lower-level networking protocol that also hooks into std.stdio.File, which would be very important to have too.
> But then it's often better to work on what you like, so don't look for a landslide vote. Ford didn't work on a faster horse etc.
> Some things that would be good to have in an xml library:
> - should work with input ranges (not only strings)
> - use aliases as lambdas if needed (std.xml's use of lambdas is nice, just very slow)
> - define templates for char, wchar, and dchar and then define one working with ranges of ubyte that dispatches depending on the encoding tag found.
> Andrei

Thanks Andrei et al. I'll get back to the topic after some sleep and another day at the office tomorrow; it's way after the witching hour now in my neck of the woods.

Cheers, Justin
Jun 27 2010
Andrei Alexandrescu wrote:
> Justin Johansson wrote:
> > Adam Ruppe wrote:
> > > I'm not terribly interested in it because I already wrote my own replacement: http://arsdnet.net/dcode/dom.d
> > > Mine is biased toward HTML, doing what I personally find useful, or mimicking what javascript in the browser would do instead of following the standard, but if there's anything in there that is useful to others, you're free to take it.
> > Thanks Adam for replying. I'm happy to take onboard contra-views such as yours as well. Naturally there is no point in putting in an effort wherein there is no interest at large. Still, I'll wait for more replies on this ng before making any decision whether or not to commit myself to a new "D2 XML" effort.
> Clearly std.xml can't stay the way it is. I'm even thinking of removing it preemptively in wait for another implementation.

I'd like to cast a vote for a SAX-style parser. A DOM parser can be built on top of it, and frankly, a SAX parser is the only kind I'd ever use. I'm either working with large streams where building a tree is impractical, or performance is enough of an issue that again, building a tree is impractical. I have similar feelings about the JSON parser despite it being a pretty solid implementation otherwise.

I'd contribute one if I could, but I did one for work and it just isn't worth the administrative hassle.
Jun 27 2010
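Sean's point that a DOM can be layered on a SAX-style parser is easy to illustrate. The sketch below uses Python's standard `xml.sax` module as the event source, not anything from Phobos; the `TreeBuilder` handler and the dict-shaped nodes are made up for this example.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a nested dict tree purely from SAX callbacks."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # chain of currently open elements

    def startElement(self, name, attrs):
        node = {"tag": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        # Character data may arrive in several pieces, so accumulate it.
        if self._stack:
            self._stack[-1]["text"] += content

    def endElement(self, name):
        self._stack.pop()


def parse_to_tree(data: bytes):
    """Run the event parser and return the tree the handler assembled."""
    handler = TreeBuilder()
    xml.sax.parseString(data, handler)
    return handler.root
```

A caller who only needs the events never pays for the tree; `parse_to_tree(b'<a x="1"><b>hi</b></a>')` is the opt-in convenience layer on top.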
Andrei Alexandrescu wrote:
> Clearly std.xml can't stay the way it is. I'm even thinking of removing it preemptively in wait for another implementation.

Others in this thread have suggested a preemptive removal of the current std.xml incarnation also. Please add my vote to this in agreement that it *must* go. Its current state is well beyond absolutely shocking and only serves to bring D into disrepute. It would be much better to say, "sorry, D does not have a standard XML library yet"*** and ask for help rather than leaving things as they are.

***Reminds me of an old saying, "it's better to keep your mouth shut and appear to be an idiot than to open it and remove all doubt". (Of course, I do confess to opening mine once or twice too often ;-) ) Translated to std.xml, this means better not to have it at all rather than have what we currently have.

> If you want to work on something you enjoy, it seems like std.xml is a good choice. If you want to work on the top most important item, probably networking would come ahead. We badly need http and ftp streaming libraries. I'm thinking libcurl would be a good choice as a backend (not interface). For D integration, it would be great to integrate networking with std.stdio.File - e.g. creating File("http://xyz.org") would just connect to the thing and allow streaming, ranges, everything. Adam Ruppe has a lower-level networking protocol that also hooks into std.stdio.File, which would be very important to have too.

Sure I do enjoy XML ecosystem stuff and by that I mean well beyond just simple parsing. OTOH, streaming libraries for http and ftp are an absolute necessity to underpin industrial-strength support for all things Unicode and XML, so I can see why you put this at the top of the wish list. Robust streaming should have support not only for all popular network protocols but content (character) encodings as well.

Since this thread has promoted a lot of ideas about std.xml, I think we would do well to start a new thread on streaming.

Cheers Justin Johansson

> But then it's often better to work on what you like, so don't look for a landslide vote. Ford didn't work on a faster horse etc.
> Some things that would be good to have in an xml library:
> - should work with input ranges (not only strings)
> - use aliases as lambdas if needed (std.xml's use of lambdas is nice, just very slow)
> - define templates for char, wchar, and dchar and then define one working with ranges of ubyte that dispatches depending on the encoding tag found.
> Andrei
Jun 29 2010
Justin Johansson wrote:
> Justin Johansson wrote:
> > May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> > Cheers Justin Johansson
> Lutger said: Interested, very much so. I think many people are.
> Simen said: Absolutely. It is a necessity.
> Thanks for fast replies from Lutger and Simen. Being an XML/W3C addict myself I well concur with Simen and Lutger's sentiments. However, Lutger simply saying that he "thinks" *many* people are interested is not good enough for me.

Well I dare only speak for myself. But perhaps I can do better:

Time and again complaints about std.xml surface, so that is one indicator. See this google query:
http://www.google.nl/search?q=site%3Awww.digitalmars.com%2Fpnews+std.xml

And this proposal to replace std.xml with kxml:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=109646

Another point to consider is that dsource.org alone contains at least 5 xml projects (that I can see), some long abandoned but some seem to be active.

There is also this code from Adam Ruppe:
http://arsdnet.net/dcode/dom.d
Jun 27 2010
Lutger wrote:
> Justin Johansson wrote:
> > Justin Johansson wrote:
> > > May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> > > Cheers Justin Johansson
> > Lutger said: Interested, very much so. I think many people are.
> > Simen said: Absolutely. It is a necessity.
> > Thanks for fast replies from Lutger and Simen. Being an XML/W3C addict myself I well concur with Simen and Lutger's sentiments. However, Lutger simply saying that he "thinks" *many* people are interested is not good enough for me.
> Well I dare only speak for myself. But perhaps I can do better:
> Time and again complaints about std.xml surface, so that is one indicator. See this google query: http://www.google.nl/search?q=site%3Awww.digitalmars.com%2Fpnews+std.xml
> And this proposal to replace std.xml with kxml: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=109646
> Another point to consider is that dsource.org alone contains at least 5 xml projects (that I can see), some long abandoned but some seem to be active.
> There is also this code from Adam Ruppe: http://arsdnet.net/dcode/dom.d

Yes, understood.

One wonders how many people have been put off D (and perhaps not to return) given such a large volume of hits under that URL, namely
http://www.google.nl/search?q=site%3Awww.digitalmars.com%2Fpnews+std.xml

On your other point about the 5+ XML projects on dsource.org, in your opinion, which of these do you think have the most promise, or at least a good grounding from which to start over? Naturally I don't expect you to waste your time looking into these 5+ projects; just if you happen to know.
Jun 27 2010
Justin Johansson wrote:
> ...
> On your other point about the 5+ XML projects on dsource.org, in your opinion, which of these do you think have the most promise, or at least a good grounding from which to start over? Naturally I don't expect you to waste your time looking into these 5+ projects; just if you happen to know.

Don't know really, sorry. From a quick glance though, I would say to start looking into xmlp and Adam Ruppe's code. xmlp even has conformance tests, perhaps you can work together with the author?

http://www.dsource.org/projects/xmlp
http://www.digitalmars.com/d/archives/digitalmars/D/XMLP_101327.html#N101327
Jun 27 2010
On Sun, 27 Jun 2010 16:55:56 +0200, Lutger wrote:
> Don't know really, sorry. From a quick glance though, I would say to start looking into xmlp and Adam Ruppe's code. xmlp even has conformance tests, perhaps you can work together with the author?
> http://www.dsource.org/projects/xmlp
> http://www.digitalmars.com/d/archives/digitalmars/D/XMLP_101327.html#N101327

I needed a simple library for parsing XML and started using xmlp since I had too many workarounds in std.xml. I emailed the author awhile back about some possible changes (complained about namespaces when reading, I believe, a docx file. So I disabled the check). But I haven't heard back from him.
Jun 27 2010
"Justin Johansson" <no spam.com> wrote in message news:i07jpn$tt$1 digitalmars.com...
> Justin Johansson wrote:
> > May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> > Cheers Justin Johansson
> Lutger said: Interested, very much so. I think many people are.
> Simen said: Absolutely. It is a necessity.
> Thanks for fast replies from Lutger and Simen. Being an XML/W3C addict myself I well concur with Simen and Lutger's sentiments. However, Lutger simply saying that he "thinks" *many* people are interested is not good enough for me. Would the *many* people also please add a voice along with Simen et al. to inspire me to contribute to the D2 XML effort (of course under a Walter-endorsed style of licence).

I'm interested. I've been thinking of porting some of my stuff from D1/Tango to D2/Phobos (*not* because of political reasons or anything against Tango or any Tango team member), so anything that I'm using from Tango that doesn't have a good Phobos equivalent would be a roadblock. XML (reading) is one of those things.
Jun 27 2010
On 2010-06-27 12:34, Justin Johansson wrote:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> Cheers Justin Johansson

I would very much like to have XML support in Phobos; I think all standard libraries should have that. I'm currently working on porting the XML archive of my serialization library to D2, so I need XML support in Phobos.

-- /Jacob Carlborg
Jun 27 2010
On 27.06.2010 12:34, Justin Johansson wrote:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> Cheers Justin Johansson

I would also like to have a better std.xml. I still have an old D1/Tango project I wanted to port to D2/Phobos but as long as there is no good XML library for D2 (and totally unrelated: a stable network api) that wouldn't make sense.

-- Johannes Pfau
Jun 27 2010
I did a simple implementation of a pull parser, using this API as reference: http://xmlpull.org/

But I used an iterator similar to the one used by Steve (from dcollections) to parse the doc. It turns out that Tango did something similar first (using an iterator to parse the document), and seeing the debacle caused by the Date module, I think it would be a bad idea to release it.

Yao G.

On Sun, 27 Jun 2010 05:34:30 -0500, Justin Johansson <no spam.com> wrote:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch) Also what is the level of interest from library users for decent XML support in D2/Phobos?
> Cheers Justin Johansson

-- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Jun 27 2010
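For readers who have not met the pull style Yao mentions: the consumer asks for the next event instead of registering callbacks. A minimal sketch, using Python's standard `xml.etree.ElementTree.XMLPullParser` as a stand-in (this is not Yao's code, nor the xmlpull.org API, and `pull_events` is a hypothetical name):

```python
import xml.etree.ElementTree as ET


def pull_events(chunks):
    """Yield (event, tag) pairs while the input arrives chunk by chunk."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        # Hand over whatever events this chunk completed.
        for event, elem in parser.read_events():
            yield event, elem.tag
    parser.close()
    # Drain anything that only became available at end of input.
    for event, elem in parser.read_events():
        yield event, elem.tag
```

Because `pull_events` is a generator, the caller controls the pace: it can stop after the first interesting element and never pull the rest of the stream.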
On Sun, 27 Jun 2010 14:56:21 -0400, Yao G. <nospamyao gmail.com> wrote:
> I did a simple implementation of a pull parser, using this API as reference: http://xmlpull.org/
> But I used a iterator similar to the one used by Steve (from dcollections) to parse the doc. It turns out that Tango did something similar first (using iterator to parse the document), and seeing the debacle caused by the Date module, I think it would be a bad idea to release it.

Did you look at Tango's code in question, or look at their documentation? If not, then you are fine.

I think any implementation is going to have to at least try to use ranges or show why they are not a good idea for xml, since Andrei is set on using ranges for everything.

BTW, I've not used std.xml or tango's xml, but I agree that an xml library is a very important part of today's standard libraries. Having xml in the standard allows for so much usage of it in many other places (serialization comes to mind immediately). If std.xml is bad (which I've heard from several independent people), then throw it out and make something new.

I myself have tried to think of how xml can be done with ranges, but I believe one of the key elements is it has to parse xml without loading the entire document to be efficient enough for some applications. A DOM style parser which presents a range interface is probably fine, but a lazy interface would be the best. Since XML is a tree style, you need a range which allows moving down the tree. You almost need a stacking range which can move down the tree and also to the next sibling element. Ideally, the library should do as much as possible without allocating anything but buffer space to read data.

-Steve
Jun 28 2010
On 28/06/2010 13:04, Steven Schveighoffer wrote:
> On Sun, 27 Jun 2010 14:56:21 -0400, Yao G. <nospamyao gmail.com> wrote:
>> I did a simple implementation of a pull parser, using this API as reference: http://xmlpull.org/ But I used an iterator similar to the one used by Steve (from dcollections) to parse the doc. It turns out that Tango did something similar first (using an iterator to parse the document), and seeing the debacle caused by the Date module, I think it would be a bad idea to release it.
>
> Did you look at Tango's code in question, or look at their documentation? If not, then you are fine. I think any implementation is going to have to at least try to use ranges or show why they are not a good idea for xml, since Andrei is set on using ranges for everything.
>
> BTW, I've not used std.xml or Tango's xml, but I agree that an xml library is a very important part of today's standard libraries. Having xml in the standard library allows for so much usage of it in many other places (serialization comes to mind immediately). If std.xml is bad (which I've heard from several independent people), then throw it out and make something new.
>
> I myself have tried to think of how xml can be done with ranges, but I believe one of the key elements is that it has to parse xml without loading the entire document, to be efficient enough for some applications. A DOM style parser which presents a range interface is probably fine, but a lazy interface would be the best. Since XML is tree-structured, you need a range which allows moving down the tree. You almost need a stacking range which can move down the tree and also to the next sibling element. Ideally, the library should do as much as possible without allocating anything but buffer space to read data.
>
> -Steve

I've not looked at any of the D XML offerings (shame on me?) but I've been having a bit of a look at the types of API that are available in other languages, and there seem to be 3...

- Event based, a la SAX
- Stream based, a la StAX
- Tree based, a la "the" DOM

The simple conclusion that I have drawn is that there is no one-size-fits-all solution, and that it would therefore be a mistake to put all effort into supporting only one. (However, ranges do seem to match up quite nicely with the way that the Stream based APIs operate.)

It would seem to me most logical to consider the many varied use-cases and build a core API upon which all 3 types of XML processor can be built (or at least specify a core set of types to be used by all 3), rather than focus on implementing one particular style. Interoperability of all 3 styles would then be possible and perhaps facilitate the later implementation of higher abstractions (such as XPath and XQuery).

I think it is also important to remember that there are at least 4 different stages to processing XML (reading, validating, mutating, writing) and that many programming tasks allow one or more of these aspects to be ignored. This can mean that one programmer is blinded to the requirements of another in a different domain because the ways in which they work with XML either overlap only partially or not at all.

I've never used anything like SAX myself, though I have used the DOM quite a lot, and spent most of the time wishing it worked a bit more like StAX (even though I hadn't heard of StAX at the time ^^).

Whatever is done for D, it should allow programmers to work with XML in a way that is familiar to them and compatible with what others do. Memory should be used conservatively, and reprocessing (parsing the same portion of a document multiple times) should be minimised.

Most importantly, the implementation should be D-ey, rather than the abstraction used in any other language's most favoured solution, shoehorned into a D-shaped box.

A... (whose 2 cents are worth no more or no less than anyone else's.)
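To make the "core API under several styles" suggestion concrete, here is a small Python sketch (not D, and not any existing library's design). It assumes a shared stream of `(kind, value)` token tuples, which is a hypothetical format, and shows a stream-style (pull) and an event-style (push) front end sitting on that same stream. The function names `pull` and `push` are made up for illustration.

```python
def pull(tokens):
    """StAX-like: the caller asks for the next token whenever it wants one."""
    return iter(tokens)


def push(tokens, on_open, on_close, on_text):
    """SAX-like: the parser drives the loop and calls back into user code."""
    handlers = {"open": on_open, "close": on_close, "text": on_text}
    for kind, value in tokens:
        handlers[kind](value)


# <a>hi</a> as the shared token stream
tokens = [("open", "a"), ("text", "hi"), ("close", "a")]

# Pull style: the consumer controls the pace.
stream = pull(tokens)
first = next(stream)

# Push style: the same tokens arrive as callbacks.
seen = []
push(tokens,
     on_open=lambda name: seen.append("+" + name),
     on_close=lambda name: seen.append("-" + name),
     on_text=seen.append)
print(first, seen)
```

A tree-style front end would be a third consumer of the same stream, which is the interoperability the post argues for.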
Jun 28 2010
On Mon, 28 Jun 2010 09:59:45 -0400, Alix Pexton <alix.DOT.pexton gmail.dot.com> wrote:
> I've never used anything like SAX myself, though I have used the DOM quite a lot, and spent most of the time wishing it worked a bit more like StAX (even though I hadn't heard of StAX at the time ^^).

DOM is usually built on top of SAX, so start with the lowest common denominator.

> Whatever is done for D, it should allow programmers to work with XML in a way that is familiar to them and compatible with what others do. Memory should be used conservatively, and reprocessing (parsing the same portion of a document multiple times) should be minimised.

Parsing multiple times should be minimized, but more important than that, allocations should be minimal. Nothing kills the performance of a good parsing/input algorithm in D faster than overuse of the GC. Tango goes as far as having you pass in stack buffers to avoid even allocating buffers (not sure about its xml lib, but knowing the rest of the lib, probably). I don't think std.xml has to go that far.

> Most importantly, the implementation should be D-ey, rather than the abstraction used in any other language's most favoured solution, shoehorned into a D-shaped box.

Yes, I don't think the phobos solution needs to mimic exactly the API of SAX or DOM; the author should be free to use D idioms. But starting with a common proven design is probably a good idea.

-Steve
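The claim that a DOM can be layered on SAX can be illustrated with Python's standard `xml.sax` module (used here only because it is a readily runnable SAX implementation; nothing below is D or Phobos code). The `TreeBuilder` class, `build_tree` function, and the dictionary node layout are assumptions made for this sketch.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a nested tree of dicts from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # chain of currently open elements

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()), "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Keep non-blank text as a plain string child.
        if self._stack and content.strip():
            self._stack[-1]["children"].append(content)


def build_tree(xml_bytes):
    """Parse a whole document and return the root node."""
    handler = TreeBuilder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root


tree = build_tree(b'<a x="1"><b>hi</b><c/></a>')
print(tree["name"], [child["name"] for child in tree["children"]])
```

The tree layer needs nothing from the parser beyond the three callbacks, which is why an event interface is a workable base. The cost is the allocation per node that the post warns about.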
Jun 28 2010
On 28/06/2010 15:11, Steven Schveighoffer wrote:
> Yes, I don't think the phobos solution needs to mimic exactly the API of SAX or DOM; the author should be free to use D idioms. But starting with a common proven design is probably a good idea.
> -Steve

I've been thinking about it, and while I believe you when you say that SAX can be used to build the DOM, I'm not convinced that SAX is the lowest common abstraction. Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

A...
Jun 29 2010
On 2010-06-29 04:41:50 -0400, Alix Pexton <alix.DOT.pexton gmail.DOT.com> said:
> On 28/06/2010 15:11, Steven Schveighoffer wrote:
>> Yes, I don't think the phobos solution needs to mimic exactly the API of SAX or DOM; the author should be free to use D idioms. But starting with a common proven design is probably a good idea.
>> -Steve
>
> I've been thinking about it, and while I believe you when you say that SAX can be used to build the DOM, I'm not convinced that SAX is the lowest common abstraction. Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

It is closer to the metal, but there's a catch...

One issue with SAX is that you must allocate an array of strings to pass the attributes of an element, which is probably going to need a dynamic allocation at some point. A lower-level abstraction such as mine (or Tango's pull-parser) just returns each attribute as a separate token as it parses them.

The downside of the tokenizer interface is that it only checks for a subset of well-formedness: for instance, it doesn't check that tags balance each other correctly or that no two attributes of an element share the same name. It's just a "tokenizer" after all; it can't be described as a conformant XML parser by itself. The upper layer parser needs to check for these things. My mini DOM built on this tokenizer does these checks when using the tokenizer, and it's more efficient to do them there because that's where the context information is kept, which is why the tokenizer doesn't do them.

Implementing SAX on top of my tokenizer consists mostly of ensuring proper tag balancing, checking for duplicate attributes, and collecting attributes in an array (or another kind of list) you can then give to the openElement SAX callback.

-- 
Michel Fortin michel.fortin michelf.com http://michelf.com/
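The layering described above (a flat tokenizer below, well-formedness checks above) might look like this in outline. This is a Python sketch, not the mfr.xmltok API: the token tuples `("open", name)`, `("attr", name, value)`, `("close", name)`, `("text", value)` and the function name `sax_events` are hypothetical. The layer balances tags, rejects duplicate attributes, and gathers attributes into one mapping per start tag.

```python
def sax_events(tokens):
    """Turn flat tokens into checked SAX-like events.

    Raises ValueError for unbalanced tags or duplicate attribute names.
    """
    stack = []      # names of the currently open elements
    pending = None  # (name, attrs) of a start tag still collecting attributes

    for tok in tokens:
        kind = tok[0]
        if kind == "attr":
            if pending is None:
                raise ValueError("attribute outside a start tag")
            _, name, value = tok
            if name in pending[1]:
                raise ValueError("duplicate attribute %r" % name)
            pending[1][name] = value
            continue

        # Any non-attribute token ends the attribute list of a start tag.
        if pending is not None:
            yield ("start", pending[0], pending[1])
            pending = None

        if kind == "open":
            stack.append(tok[1])
            pending = (tok[1], {})
        elif kind == "close":
            if not stack or stack[-1] != tok[1]:
                raise ValueError("unbalanced close tag %r" % tok[1])
            stack.pop()
            yield ("end", tok[1])
        elif kind == "text":
            yield ("text", tok[1])
        else:
            raise ValueError("unknown token kind %r" % kind)

    if stack:
        raise ValueError("unclosed element %r" % stack[-1])


tokens = [("open", "a"), ("attr", "x", "1"), ("open", "b"),
          ("text", "hi"), ("close", "b"), ("close", "a")]
for event in sax_events(tokens):
    print(event)
```

The attribute dictionary is the per-element allocation the post mentions; the token layer underneath never needs it.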
Jun 29 2010
On 29/06/2010 13:27, Michel Fortin wrote:
> On 2010-06-29 04:41:50 -0400, Alix Pexton <alix.DOT.pexton gmail.DOT.com> said:
>> On 28/06/2010 15:11, Steven Schveighoffer wrote:
>>> Yes, I don't think the phobos solution needs to mimic exactly the API of SAX or DOM; the author should be free to use D idioms. But starting with a common proven design is probably a good idea.
>>> -Steve
>>
>> I've been thinking about it, and while I believe you when you say that SAX can be used to build the DOM, I'm not convinced that SAX is the lowest common abstraction. Michel Fortin's Tokenizer/Range seems much closer to the metal to me.
>
> It is closer to the metal, but there's a catch...
>
> One issue with SAX is that you must allocate an array of strings to pass the attributes of an element, which is probably going to need a dynamic allocation at some point. A lower-level abstraction such as mine (or Tango's pull-parser) just returns each attribute as a separate token as it parses them.
>
> The downside of the tokenizer interface is that it only checks for a subset of well-formedness: for instance, it doesn't check that tags balance each other correctly or that no two attributes of an element share the same name. It's just a "tokenizer" after all; it can't be described as a conformant XML parser by itself. The upper layer parser needs to check for these things. My mini DOM built on this tokenizer does these checks when using the tokenizer, and it's more efficient to do them there because that's where the context information is kept, which is why the tokenizer doesn't do them.
>
> Implementing SAX on top of my tokenizer consists mostly of ensuring proper tag balancing, checking for duplicate attributes, and collecting attributes in an array (or another kind of list) you can then give to the openElement SAX callback.

My understanding was that SAX _doesn't_ check those things either and that it was up to the code responding to the events to tackle wellformedness. After all, if SAX handled wellformedness, there would be no need for it to pass an argument to closeElement to state what element was being closed.

SAX has its place though: when it comes to doing a single pass filter on a stream of XML that can be assumed to be wellformed, its simplicity is admittedly hard to beat. In other applications, however, there is much room for improvement. SAXplus, with built-in element memoisation, an element stack and a used id list, sounds quite useful to me, as long as they remain optional of course.

Admittedly, my initial disappointment when looking into SAX means that it is something that I have not followed for some time.

Hmm, I suddenly just got nostalgic for the days when XML was all shiny and new and everyone was writing their own APIs or butchering old SGML/HTML tech. Makes me want to go look at my old code ^^

A...
Jun 29 2010
On 28/06/2010 14:04, Steven Schveighoffer wrote:
> I myself have tried to think of how xml can be done with ranges, but I believe one of the key elements is that it has to parse xml without loading the entire document, to be efficient enough for some applications. A DOM style parser which presents a range interface is probably fine, but a lazy interface would be the best. Since XML is tree-structured, you need a range which allows moving down the tree. You almost need a stacking range which can move down the tree and also to the next sibling element. Ideally, the library should do as much as possible without allocating anything but buffer space to read data.
> -Steve

Hi Steve,
Philippe Sigaud has written a very interesting lib called dranges. http://www.dsource.org/projects/dranges/wiki
I think treerange.d and graphrange.d are an excellent source of inspiration.

Bjoern
Jun 28 2010
On Sun, 27 Jun 2010 20:04:30 +0930, Justin Johansson wrote:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch)
> Also what is the level of interest from library users for decent XML support in D2/Phobos?
> Cheers
> Justin Johansson

std.xml needs to be replaced, but I personally don't much care as kxml fits my needs nicely: http://opticron.no-ip.org/svn/branches/kxml
Jun 27 2010
On 6/27/10 8:37 PM, "Sean Kelly" <sean invisibleduck.org> wrote:
> I'd like to cast a vote for a SAX-style parser. A DOM parser can be built on top of it, and frankly, a SAX parser is the only kind I'd ever use. I'm either working with large streams where building a tree is impractical, or performance is enough of an issue that again, building a tree is impractical.

Agree, with one additional requirement: the ability to throw random chunks of bytes at the parser whenever I like, along the lines of Expat or XP. This is a must for dealing with a stream of XML like XMPP.

-- 
Joe Hildebrand
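For readers who have not used Expat, the chunk-feeding requirement looks roughly like this through Python's standard `xml.parsers.expat` binding (shown only as a runnable illustration of the Expat model; it says nothing about what a D API would look like). The `ChunkParser` wrapper class and the sample stanza are made up for the sketch.

```python
import xml.parsers.expat


class ChunkParser:
    """Collects start/end events while bytes arrive in arbitrary pieces."""

    def __init__(self):
        self.events = []
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._on_start
        self._parser.EndElementHandler = self._on_end

    def _on_start(self, name, attrs):
        self.events.append(("start", name, attrs))

    def _on_end(self, name):
        self.events.append(("end", name))

    def feed(self, data):
        # False means more data may follow, so an unfinished document is fine.
        self._parser.Parse(data, False)


parser = ChunkParser()
# The second chunk boundary falls in the middle of a tag name.
parser.feed(b"<stream><mes")
parser.feed(b'sage to="a"/>')
print(parser.events)
```

Events for complete tags are delivered as soon as their bytes have arrived, even though the enclosing `stream` element is never closed here, which is the situation in a long-lived XMPP connection.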
Jun 28 2010
On 2010-06-27 07:04:30 -0400, Justin Johansson <no spam.com> said:
> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch)
> Also what is the level of interest from library users for decent XML support in D2/Phobos?

I have made my own parser, comprised of a tokenizer and a mini DOM layer.

I'm not sure how to qualify the tokenizer: it's mainly based on callbacks like an event parser, but a callback can decide to stop the parsing process and return to the original caller of the tokenizer (which can later restart parsing), it can choose to continue parsing the next token, or to recursively continue to run the parser using a different set of callbacks. From there it's trivial to efficiently implement a pull parser or a SAX parser, but the way callbacks can recursively call the tokenizer allows greater flexibility than those two models.

The mini DOM I've made is based on this tokenizer, but is quite ordinary in comparison.

Here's the generated documentation:
http://michelf.com/docs/d/mfr/xmltok.html
http://michelf.com/docs/d/mfr/xml.html

I'm slowly revamping it to use ranges instead of strings.

-- 
Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 28 2010
Michel Fortin wrote:
> On 2010-06-27 07:04:30 -0400, Justin Johansson <no spam.com> said:
>> May I ask is anybody working on redeveloping std.xml in the D2/Phobos library? (Currently it looks like it needs to be started over from scratch)
>> Also what is the level of interest from library users for decent XML support in D2/Phobos?
>
> I have made my own parser, comprised of a tokenizer and a mini DOM layer.
>
> I'm not sure how to qualify the tokenizer: it's mainly based on callbacks like an event parser, but a callback can decide to stop the parsing process and return to the original caller of the tokenizer (which can later restart parsing), it can choose to continue parsing the next token, or to recursively continue to run the parser using a different set of callbacks. From there it's trivial to efficiently implement a pull parser or a SAX parser, but the way callbacks can recursively call the tokenizer allows greater flexibility than those two models.
>
> The mini DOM I've made is based on this tokenizer, but is quite ordinary in comparison.
>
> Here's the generated documentation:
> http://michelf.com/docs/d/mfr/xmltok.html
> http://michelf.com/docs/d/mfr/xml.html
>
> I'm slowly revamping it to use ranges instead of strings.

I think a tokenizer should be a higher-order range that is fed an input range of ubyte, char, wchar, or dchar (so that would be a type parameter) and is itself a range of Tokens that include the token type, token value etc.

Andrei
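The shape of such a higher-order range can be approximated in Python with a generator that wraps any iterable of characters and lazily yields tokens. This is only an analogy to D ranges, and a deliberately tiny one: the `Token` type and `tokenize` function are hypothetical, and the sketch recognises only start tags, end tags and text (no attributes, comments, entities or CDATA).

```python
from collections import namedtuple

Token = namedtuple("Token", "kind value")


def tokenize(chars):
    """Lazily turn an iterable of characters into Tokens."""
    source = iter(chars)
    text = []
    for ch in source:
        if ch != "<":
            text.append(ch)
            continue
        if text:
            yield Token("text", "".join(text))
            text = []
        # Read the tag name up to the closing '>'.
        tag = []
        for ch in source:
            if ch == ">":
                break
            tag.append(ch)
        else:
            raise ValueError("unterminated tag")
        name = "".join(tag)
        if name.startswith("/"):
            yield Token("close", name[1:])
        else:
            yield Token("open", name)
    if text:
        yield Token("text", "".join(text))


for token in tokenize("<a>hi</a>"):
    print(token)
```

Since the input is consumed one character at a time and only on demand, the same function works over a string, a file read in pieces, or a network stream, which is the appeal of the input-range formulation.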
Jun 28 2010
On 2010-06-28 14:27:13 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:
>> Here's the generated documentation:
>> http://michelf.com/docs/d/mfr/xmltok.html
>> http://michelf.com/docs/d/mfr/xml.html
>>
>> I'm slowly revamping it to use ranges instead of strings.
>
> I think a tokenizer should be a higher-order range that is fed an input range of ubyte, char, wchar, or dchar (so that would be a type parameter) and is itself a range of Tokens that include the token type, token value etc.

And I've implemented a tokenizer range just like you describe on top of my tokenizer function. Look at the documentation for mfr.xmltok.XMLForwardRange. (I should probably rename it to XMLTokenRange.)

Personally, I prefer to use the callback approach which automatically calls the right function according to the token type. But what's nice about my tokenizer is that you can do both callbacks and pull-style tokenization (the latter can be wrapped in a range), and mix these approaches together as needed.

What is missing is taking arbitrary ranges as input (it deals with strings currently). Strings are like the optimized case for tokenization because you don't have to dynamically allocate anything: referencing the original string is enough when making substrings. With arbitrary ranges you have to copy the text and tag names to a string one character at a time, which is less efficient. I don't want to write two separate parsers for this, so I'm trying to abstract things at the right level to maximize code reuse while keeping performance optimized for the string-as-input case, but how to do that is not so obvious.

-- 
Michel Fortin michel.fortin michelf.com http://michelf.com/
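The string-versus-arbitrary-range trade-off has a rough Python analogue, sketched below as an assumption-laden illustration rather than a description of mfr.xmltok. When the whole input is one `bytes` buffer, a `memoryview` slice can refer to a text run without copying it, much like a D string slice. When the input is a one-character-at-a-time iterator, the same run has to be accumulated and copied. The function name `text_slices` is hypothetical, and tags are found by a naive scan for `<` and `>`.

```python
def text_slices(buf):
    """Yield each run of text between tags as a zero-copy view into buf."""
    view = memoryview(buf)
    end = len(buf)
    start = 0
    while start < end:
        lt = buf.find(b"<", start)
        if lt == -1:
            lt = end
        if lt > start:
            yield view[start:lt]  # references buf, no bytes are copied
        gt = buf.find(b">", lt)
        if gt == -1:
            break
        start = gt + 1


data = b"<a>hi</a>there"
for piece in text_slices(data):
    print(bytes(piece))  # bytes() copies here only for printing
```

An input-iterator version of the same loop would have to append each character to a buffer before it could hand out a token value, which is the extra cost described in the post.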
Jun 28 2010
I'm very interested. Tango's XML code was very good and damn fast. Maybe license issues can be worked out for that part at least?
Jun 28 2010