digitalmars.D - std.xml2 (collecting features)
- Robert burner Schadek (13/13) May 03 2015 std.xml has been considered not up to specs nearly 3 years now.
- Joakim (8/21) May 03 2015 My request: just skip it. XML is a horrible waste of space for a
- Meta (3/27) May 03 2015 That's not really an option considering the huge amount of XML
- w0rp (4/28) May 03 2015 I agree that JSON is superior through-and-through, but legacy
- Marco Leise (47/56) May 04 2015 Am Sun, 03 May 2015 18:44:11 +0000
- Joakim (20/76) May 09 2015 You seem to have missed the point of my post, which was to
- Craig Dillabaugh (31/77) May 09 2015 I have to agree with Joakim on this. Having spent much of this
- Marco Leise (31/87) May 10 2015 Well, I was mostly answering to w0rp here. JSON is both
- Joakim (27/61) May 10 2015 It's worse than shabby, it's a horrible, horrible choice. Not
- Laeeth Isharc (17/27) May 10 2015 I feel the same way about XML, and I also think that having
- Kagamin (3/5) May 12 2015 Hypothetically, yes, though formats better than XML don't exist.
- Ola Fosheim Grøstad (15/26) May 10 2015 JSON is just javascript literals with some silly constraints. As
- Alex Parrill (4/4) May 11 2015 Can we please not turn this thread into an XML vs JSON flamewar?
- Ola Fosheim Grøstad (12/13) May 11 2015 This is not a flamewar, JSON is ad hoc and I use it a lot, but it
- Chris (8/32) Feb 19 2016 Glad to hear that someone is working on XML support. We cannot
- Joakim (10/44) Feb 23 2016 Then write a good XML extraction-only library and dub it. I see
- Dmitry (9/14) Feb 23 2016 You won't be able to sleep if it will be in Phobos?
- Craig Dillabaugh (4/13) Feb 24 2016 So are you trying to say C/C++ are not serious languages :o)
- Robert burner Schadek (1/1) May 03 2015 - CTS to disable parsing location (line,column)
- Walter Bright (3/4) May 03 2015 Pipeline range interface, for example:
- wobbles (4/17) May 03 2015 Could possibly use pegged to do it?
- Walter Bright (3/4) May 03 2015 Encoding schemes should be handled by adapter algorithms, not in the XML...
- Marco Leise (6/11) May 04 2015 Unlike JSON, XML actually declares the encoding in the prolog,
- Walter Bright (3/4) May 03 2015 Try to design the interface to it so it does not inherently require the
- Ilya Yaroshenko (1/1) May 03 2015 Can it lazily reads huge files (files greater than memory)?
- Walter Bright (3/4) May 03 2015 If a range interface is used, it doesn't need to be aware of where the d...
- Ola Fosheim Grøstad (6/11) May 04 2015 Wouldn't D-ranges make it impossible to use SIMD optimizations
- Jonathan M Davis (11/14) May 04 2015 Given how D's arrays work, we have the opportunity to have an
- Andrei Alexandrescu (6/18) May 04 2015 To be frank what's more embarrassing is that we managed to do nothing
- Jonathan M Davis (7/17) May 04 2015 Also true. Many of us just don't find enough time to work on D,
- Walter Bright (3/10) May 04 2015 Tango's XML package was well regarded and the fastest in the business. I...
- Ola Fosheim Grøstad (8/10) May 05 2015 Yes, that would be great. XML is a flexible go-to archive,
- Walter Bright (5/6) May 04 2015 Not at all. Algorithms can be specialized for various forms of input ran...
- Jonathan M Davis (12/17) May 04 2015 Indeed. It should operate on ranges without caring where they
- Michel Fortin (10/24) May 03 2015 This isn't a feature request (sorry?), but I just want to point out
- Robert burner Schadek (2/6) May 04 2015 nice, thank you
- Rikki Cattermole (6/18) May 03 2015 Preferably the interfaces are made first 1:1 as the spec requires.
- Jonathan M Davis (34/47) May 04 2015 If I were doing it, I'd do three types of parsers:
- Jacob Carlborg (16/38) May 04 2015 This way the XML parser is structured in Tango. A pull parser at the
- Jacob Carlborg (4/6) May 04 2015 I recommend benchmarking against the Tango pull parser.
- Walter Bright (3/7) May 04 2015 I agree. The Tango XML parser has set the performance bar. If any new so...
- Mario Kröplin <linkrope github.com> (8/13) May 05 2015 Recently, I compared DOM parsers for an XML files of 100 MByte:
- John Colvin (2/17) May 05 2015 As usual: system, compiler, compiler version, compilation flags?
- Richard Webb (11/17) May 05 2015 fwiw I did some tests a couple of years back with
- Walter Bright (3/5) May 05 2015 I haven't read the Tango source code, but the performance of it's xml wa...
- Jacob Carlborg (7/9) May 05 2015 That's only true for the pull parser (not sure about the SAX parser).
- Richard Webb (6/13) May 06 2015 The direct comparisons were with the DOM parsers (I was playing with a D...
- Jacob Carlborg (13/19) May 05 2015 Yes, of course it's slower. The DOM parser creates a DOM as well, which
- Ola Fosheim Grøstad (11/16) May 05 2015 I agree. Most applications will use a DOM parser for convenience,
- Jacob Carlborg (8/14) May 05 2015 Agree.
- Brad Roberts via Digitalmars-d (10/22) May 05 2015 An old friend of mine who was intimate with the microsoft xml parsers
- Jacob Carlborg (18/20) May 04 2015 There are a couple of interesting comments about the Tango pull parser
- Liam McSherry (7/20) May 04 2015 Not a feature, but if `std.data.json` [1] gets accepted in to
- Rikki Cattermole (3/25) May 04 2015 It really should be std.data.xml. To keep with the new structuring. Plus...
- weaselcat (6/19) May 04 2015 maybe off-topic, but it would be nice if the standard json,xml,
- Marco Leise (14/18) May 04 2015 I don't think this needs discussion. It is plain impossible to
- Alex Vincent (5/18) Feb 17 2016 I'm looking for a status update. DUB doesn't seem to have many
- Robert burner Schadek (10/13) Feb 18 2016 I'm working on it, but recently I had to do some major
- Robert burner Schadek (5/8) Feb 18 2016 also I would like to see this
- Andrei Alexandrescu (2/8) Feb 18 2016 Would the measuring be possible with 2995 as a dub package? -- Andrei
- Robert burner Schadek (3/9) Feb 18 2016 yes, after have synced the dub package to the PR
- Robert burner Schadek (3/9) Feb 23 2016 brought the dub package up to date with the PR (v0.0.6)
- Alex Vincent (6/9) Feb 18 2016 Oh, I absolutely agree, independent implementation is a bad
- Craig Dillabaugh (4/18) Feb 18 2016 Would you be interested in mentoring a student for the Google
- Robert burner Schadek (3/5) Feb 19 2016 Yes, why not!
- Robert burner Schadek (22/22) Feb 18 2016 While working on a new xml implementation I came cross "control
- Adam D. Ruppe (10/12) Feb 18 2016 That means the user didn't encode them properly...
- Robert burner Schadek (2/2) Feb 18 2016 for instance, quick often I find <80> in tests that are supposed
- Adam D. Ruppe (3/5) Feb 18 2016 What char encoding does the document declare itself as?
- Robert burner Schadek (4/9) Feb 18 2016 It does not, it has no prolog and therefore no EncodingInfo.
- Robert burner Schadek (3/5) Feb 18 2016 the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
- Adam D. Ruppe (7/10) Feb 18 2016 Gah, I should have read this before replying... well, that does
- Alex Vincent (7/19) Feb 18 2016 Regarding control characters: If you give me a complete sample
- Robert burner Schadek (3/8) Feb 18 2016 thanks you making the effort
- Alex Vincent (12/21) Feb 19 2016 In this case, Firefox just passes the control characters through
- Kagamin (3/4) Feb 19 2016 http://dpaste.dzfl.pl/80888ed31958 like this?
- Robert burner Schadek via Digitalmars-d (8/12) Feb 19 2016 No, The program just takes the hex dump as string.
- Kagamin (3/9) Feb 19 2016 http://dpaste.dzfl.pl/2f8a8ff10bde like this?
- Robert burner Schadek (2/3) Feb 19 2016 yes
- Adam D. Ruppe (16/17) Feb 18 2016 In that case, it needs to be valid UTF-8 or valid UTF-16 and it
- crimaniak (10/11) Feb 20 2016 - the ability to read documents with missing or incorrectly
- Adam D. Ruppe (4/8) Feb 20 2016 fyi, my dom.d can do those, I use it for web scraping where
- crimaniak (5/13) Feb 21 2016 It works, thanks! I will use it in my experiments, but
- Adam D. Ruppe (2/4) Feb 21 2016 What, specifically, do you have in mind?
- crimaniak (5/9) Feb 25 2016 Where is only a couple of ad-hoc checks for attributes values.
- Adam D. Ruppe (7/11) Feb 25 2016 The css3 selector standard offers three substring search:
- Dejan Lekic (9/9) Feb 24 2016 If you really want to be serious about the XML package, then I
- Alex Vincent (20/25) Mar 01 2016 I agree, but the Document Object Model (DOM) is a huuuuuuuuge
- Adam D. Ruppe (6/9) Mar 02 2016 My dom.d implements a fair chunk of it already.
- Tobias Müller (4/16) Mar 01 2016 What's the usecase of DOM outside of browser interoperability/scripting?
- Adam D. Ruppe (6/9) Mar 02 2016 I find my extended dom to be very nice, especially thanks to D's
- Tobias Müller (5/16) Mar 03 2016 Sure, some kind of DOM is certainly useful. But the standard XML-DOM isn...
- Craig Dillabaugh (5/18) Mar 05 2016 Robert, we have had some student interest in GSOC for XML. Would
- Robert burner Schadek (2/6) Mar 06 2016 Of course
- Lodovico Giaretta (6/14) Mar 07 2016 Hi,
- Craig Dillabaugh (6/14) Mar 07 2016 Great. Can you please get in touch by email so I can add you to
- Alex Vincent (5/5) Mar 12 2016 For everyone's information, I've posted a pull request to Mr.
std.xml has been considered not up to specs for nearly 3 years now. Time to build a successor. I currently plan the following features for it:

- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte (ASCII), char (UTF-8), ...)
- CTS for input validation
- performance

Not much code yet; I'm currently building the performance test suite:
https://github.com/burner/std.xml2

Please post your feature requests, and please keep the posts DRY and on topic.
May 03 2015
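[Editorial note: the streaming side of the feature list above (a SAX-style parser that can handle documents larger than memory) can be sketched with Python's stdlib, since the proposed D API does not exist yet. The sample document and element names below are made up for illustration.]

```python
# Streaming ("pull"-style) parsing sketched with Python's stdlib as a
# stand-in for the proposed D parser. Events are emitted one element at
# a time, so a file larger than memory can be processed as long as
# finished subtrees are cleared as they complete.
import io
import xml.etree.ElementTree as ET

doc = io.StringIO("<log><entry id='1'>a</entry><entry id='2'>b</entry></log>")

ids = []
for event, elem in ET.iterparse(doc, events=("end",)):
    if elem.tag == "entry":
        ids.append(elem.get("id"))
        elem.clear()  # drop the subtree to keep memory usage flat

print(ids)  # -> ['1', '2']
```

The same pattern is what a range-based pull parser in D would enable: consume events lazily and never materialize the whole document.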
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:
> std.xml has been considered not up to specs nearly 3 years now. Time
> to build a successor. [...]

My request: just skip it. XML is a horrible waste of space for a standard; better D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
May 03 2015
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
> On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:
>> std.xml has been considered not up to specs nearly 3 years now. Time
>> to build a successor. [...]
> My request: just skip it. XML is a horrible waste of space for a
> standard [...]

That's not really an option considering the huge amount of XML data there is out there.
May 03 2015
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
> On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:
>> std.xml has been considered not up to specs nearly 3 years now. Time
>> to build a successor. [...]
> My request: just skip it. [...]

I agree that JSON is superior through-and-through, but legacy support matters, and XML is in many places. It's good to have a quality XML parsing library.
May 03 2015
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
> My request: just skip it. XML is a horrible waste of space for a
> standard, better D doesn't support it well, anything to discourage
> it's use. I'd rather see you spend your time on something worthwhile.
> If data formats are your thing, you could help get Ludwig's JSON
> stuff in, or better yet, enable some nice binary data format.

Am Sun, 03 May 2015 18:44:11 +0000 schrieb "w0rp" <devw0rp gmail.com>:
> I agree that JSON is superior through-and-through, but legacy support
> matters, and XML is in many places. It's good to have a quality XML
> parsing library.

You two are terrible at motivating people. "Better D doesn't support it well" and "JSON is superior through-and-through" is overly dismissive. To me it sounds like someone saying replace C++ with JavaScript, because C++ is a horrible standard and JavaScript is so much superior. Honestly.

Remember that while JSON is simpler, XML is not just a structured container for bool, Number and String data. It comes with many official sidekicks covering a broad range of use cases:

XPath:
* allows you to use XML files like a textual database
* complex enough to allow for almost any imaginable query
* many tools emerged to test XPath expressions against XML documents
* also powers XSLT
(http://www.liquid-technologies.com/xpath-tutorial.aspx)

XSL (Extensible Stylesheet Language) and XSLT (XSL Transformations):
* written as XML documents
* standard way to transform XML from one structure into another
* convert or "compile" data to XHTML or SVG for display in a browser
* output to XSL-FO

XSL-FO (XSL formatting objects):
* written as XSL
* type-setting for XML; an XSL-FO processor is similar to a LaTeX processor
* reads an XML document (a "Format" document) and outputs to a PDF, RTF or similar format

XML Schema Definition (XSD):
* written as XML
* linked in by an XML file
* defines structure and validates content to some extent
* can set constraints on how often an element can occur in a list
* can validate data type of values (length, regex, positive, etc.)
* database-like unique IDs and references

I think XML is the most eat-your-own-dog-food language ever and nicely covers a wide range of use cases. In any case there are many XML-based file formats that we might want to parse. Amongst them SVG, OpenDocument (Open/LibreOffice), RSS feeds, several MS Office formats, XMP and other metadata formats.

When it comes to which features to support, I personally used XSD more than XPath and the tech using it. But quite frankly both would be expected by users. Based on XPath, XSL transformations can be added any time then. Anything beyond that doesn't feel quite "core" enough to be in an XML module.

--
Marco
May 04 2015
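[Editorial note: for readers who haven't used XPath, the "textual database" claim above is easy to demonstrate. The sketch below uses the limited XPath subset in Python's stdlib purely for illustration (full XPath 1.0 needs a dedicated library such as lxml); the catalog document is invented.]

```python
# XPath-style querying against a small document, using the XPath subset
# supported by Python's stdlib ElementTree.
import xml.etree.ElementTree as ET

catalog = ET.fromstring(
    "<catalog>"
    "<book lang='en'><title>A</title></book>"
    "<book lang='de'><title>B</title></book>"
    "</catalog>"
)

# Titles of all English-language books, selected by attribute predicate:
titles = [t.text for t in catalog.findall("./book[@lang='en']/title")]
print(titles)  # -> ['A']
```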
On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
> You two are terrible at motivating people. "Better D doesn't support
> it well" and "JSON is superior through-and-through" is overly
> dismissive. To me it sounds like someone saying replace C++ with
> JavaScript, because C++ is a horrible standard and JavaScript is so
> much superior. Honestly.

You seem to have missed the point of my post, which was to discourage him from working on an XML module for phobos. As for "motivating" him, I suggested better alternatives. And I never said JSON was great, but it's certainly _much_ more readable than XML, which is one of the basic goals of a text format.

> Remember that while JSON is simpler, XML is not just a structured
> container for bool, Number and String data. It comes with many
> official side kicks covering a broad range of use cases:
> XPath: [...]
> XSL (Extensible Stylesheet Language) and XSLT (XSL Transformations): [...]
> XSL-FO (XSL formatting objects): [...]
> XML Schema Definition (XSD): [...]

These are all incredibly dumb ideas. I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)

> I think XML is the most eat-your-own-dog-food language ever and
> nicely covers a wide range of use cases.

The problem is you're still eating dog food. ;)

> In any case there are many XML based file formats that we might want
> to parse. Amongst them SVG, OpenDocument (Open/LibreOffics), RSS
> feeds, several MS Office formats, XMP and other meta data formats.

Sure, and if he has any real need for any of those, who are we to stop him? But if he's just looking for some way to contribute, there are better ways.

On Monday, 4 May 2015 at 20:44:42 UTC, Jonathan M Davis wrote:
> Also true. Many of us just don't find enough time to work on D, and
> we don't seem to do a good job of encouraging larger contributions to
> Phobos, so newcomers don't tend to contribute like that. And there's
> so much to do all around that the big stuff just falls by the wayside,
> and it really shouldn't.

This is why I keep asking Walter and Andrei for a list of "big stuff" on the wiki- they don't have to be big, just important- so that newcomers know where help is most needed. Of course, it doesn't have to be them, it could be any member of the D core team, though whatever the BDFLs push for would have a bit more weight.
May 09 2015
On Saturday, 9 May 2015 at 10:28:53 UTC, Joakim wrote:
> On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
>> Remember that while JSON is simpler, XML is not just a structured
>> container for bool, Number and String data. It comes with many
>> official side kicks covering a broad range of use cases: [...]
> These are all incredibly dumb ideas. I don't deny that many people
> may use these things, but then people use hammers for all kinds of
> things they shouldn't use them for too. :)
>> I think XML is the most eat-your-own-dog-food language ever and
>> nicely covers a wide range of use cases.
> The problem is you're still eating dog food. ;)

I have to agree with Joakim on this. Having spent much of this past week trying to get XML generated by gSOAP (project has some legacy code) to work with JAXB (Java) has reinforced my dislike for XML.

I've used things like XPath and XSLT in the past, so I can appreciate their power, but think the 'jobs' they perform would be better supported elsewhere (i.e. language-specific XML frameworks).

In trying to pass data between applications I just want a simple way of packaging up the data and ideally making serialization/deserialization easy for me. At some point the programmer working on these needs to understand and validate the data anyway. Sure you can use DTD/XML Schema to handle the validation part, but it is just easier to deal with that within your own code - without having to learn a 'whole new language' that is likely harder to grok than the tools you would have at your disposal in your language of choice.

Having said all that, as much as I share Joakim's sentiment that I wish XML would just go away, there is a lot of it out there, and I think having good support in Phobos is very valuable, so I thank Robert for his efforts.

Craig
It comes with many official side kicks covering a broad range of use cases: XPath: * allows you to use XML files like a textual database * complex enough to allow for almost any imaginable query * many tools emerged to test XPath expressions against XML documents * also powers XSLT (http://www.liquid-technologies.com/xpath-tutorial.aspx) XSL (Extensible Stylesheet Language) and XSLT (XSL Transformations): * written as XML documents * standard way to transform XML from one structure into another * convert or "compile" data to XHTML or SVG for display in a browser * output to XSL-FO XSL-FO (XSL formatting objects): * written as XSL * type-setting for XML; a XSL-FO processor is similar to a LaTex processor * reads an XML document (a "Format" document) and outputs to a PDF, RTF or similar format XML Schema Definition (XSD): * written as XML * linked in by an XML file * defines structure and validates content to some extent * can set constraints on how often an element can occur in a list * can validate data type of values (length, regex, positive, etc.) * database like unique IDs and referencesThese are all incredibly dumb ideas. I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)I think XML is the most eat-your-own-dog-food language ever and nicely covers a wide range of use cases.The problem is you're still eating dog food. ;)
May 09 2015
Am Sat, 09 May 2015 10:28:52 +0000 schrieb "Joakim" <dlang joakim.fea.st>:
> You seem to have missed the point of my post, which was to discourage
> him from working on an XML module for phobos. As for "motivating"
> him, I suggested better alternatives. And I never said JSON was
> great, but it's certainly _much_ more readable than XML, which is one
> of the basic goals of a text format.

Well, I was mostly answering to w0rp here. JSON is both readable and easy to parse, no question.

> These are all incredibly dumb ideas. I don't deny that many people
> may use these things, but then people use hammers for all kinds of
> things they shouldn't use them for too. :)

One can't really answer this one. But with many hundreds of published data exchange formats built on XML, it can't have been too shabby all along. And sometimes small things matter, like being able to add comments along with the "payload". JSON doesn't have that. Or knowing that both sender and receiver will validate the XML the same way through XSD. So if it doesn't blow up on your end, it will pass validation on the other end, too.

Am Sat, 09 May 2015 13:04:57 +0000 schrieb "Craig Dillabaugh" <craig.dillabaugh gmail.com>:
> I have to agree with Joakim on this. [...] Sure you can use DTD/XML
> Schema to handle the validation part, but it is just easier to deal
> with that within you own code - without having to learn a 'whole new
> language', that is likely harder to grok than the tools you would
> have at your disposal in your language of choice.

You see, the thing is that XSD is _not_ a whole new language; it is written in XML as well, probably specifically to make it so. Try to switch the perspective: with XSD (if it is sufficient for your validation needs) _one_ person needs to learn and write it, and other programmers (inside or outside the company) just use the XML library of choice to handle validation via that schema. Once the schema is loaded it is usually no more than doc.validate(); (There are also good GUI tools to assist in writing XSD.)

What you propose on the other hand is that everyone involved in the data exchange writes their own validation code in their language of choice, with either no access to existing sources or functionality that doesn't translate to their language!

--
Marco
May 10 2015
On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:
> One can't really answer this one. But with many hundreds of published
> data exchange formats built on XML, it can't have been too shabby all
> along.

It's worse than shabby, it's a horrible, horrible choice. Not just for data formats, but for _anything_. XML should not be used.

> And sometimes small things matter, like being able to add comments
> along with the "payload". JSON doesn't have that. Or knowing that
> both sender and receiver will validate the XML the same way through
> XSD. So if it doesn't blow up on your end, it will pass validation on
> the other end, too.

One can do all these things with better formats than either XML or JSON. But why do we often end up dealing with these two? Familiarity, that is the only reason. XML seems familiar to anybody who's written some HTML, and JSON became familiar to web developers initially. Starting from those two large niches, they've expanded out to become the two most popular data interchange formats, despite XML being a horrible mess and JSON being too simple for many uses.

I'd like to see a move back to binary formats, which is why I mentioned that to Robert. D would be an ideal language in which to show the superiority of binary to text formats, given its emphasis on efficiency. Many devs have learned the wrong lessons from past closed binary formats, when open binary formats wouldn't have many of those deficiencies.

There have been some interesting moves back to open binary formats/protocols in recent years, like Hessian (http://hessian.caucho.com/), Thrift (https://thrift.apache.org/), MessagePack (http://msgpack.org/), and Cap'n Proto (from the protobufs guy after he left google - https://capnproto.org/). I'd rather see phobos support these, which are the future, rather than flash-in-the-pan text formats like XML or JSON.
May 10 2015
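[Editorial note: the size argument behind the binary-format pitch above is easy to make concrete. The sketch below is a rough illustration using Python's stdlib; real formats like MessagePack or Cap'n Proto add framing and schemas, but the per-value ratio is similar. The sample data is invented.]

```python
# The same 1000 float samples packed as raw IEEE-754 doubles versus
# serialized as JSON text.
import json
import struct

samples = [i * 0.1 for i in range(1000)]

binary = struct.pack("<1000d", *samples)      # exactly 8 bytes per value
text = json.dumps(samples).encode("utf-8")    # decimal digits, commas, ...

print(len(binary))              # -> 8000
print(len(text) > len(binary))  # -> True: JSON spends far more bytes
```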
On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
> It's worse than shabby, it's a horrible, horrible choice. Not just
> for data formats, but for _anything_. XML should not be used.

I feel the same way about XML, and I also think that having strong aesthetic internal emotional responses is often necessary to achieve excellence in engineering.

> But why do we often end up dealing with these two? Familiarity, that
> is the only reason. XML seems familiar to anybody who's written some
> HTML, and JSON became familiar to web developers initially. Starting
> from those two large niches, they've expanded out to become the two
> most popular data interchange formats, despite XML being a horrible
> mess and JSON being too simple for many uses.

Sometimes you get to pick, but often not. I can hardly tell the UK Debt Management Office to give up XML and switch to msgpack structs (well, I can, but I am not sure they would listen). So at the moment for some data series I use a python library via PyD to convert xml files to JSON. But it would be nice to do it all in D.

I am not sure XML is going away very soon since new protocols keep being created using it. (Most recent one I heard of is one for allowing hedge funds to achieve full transparency of their portfolio to end investors - not necessarily something that will achieve what people think it will, but one in tune with the times.)

Laeeth.
May 10 2015
On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
> One can do all these things with better formats than either XML or
> JSON.

Hypothetically, yes, though formats better than XML don't exist. I personally find XML perfectly readable.
May 12 2015
On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:
> Well, I was mostly answering to w0rp here. JSON is both readable and
> easy to parse, no question.

JSON is just javascript literals with some silly constraints. As crappy a format as it gets. Even pure Lisp would have been better. And much more powerful!

> :) One can't really answer this one. But with many hundreds of
> published data exchange formats built on XML, it can't have been too
> shabby all along. And sometimes small things matter, like being able
> to add comments along with the "payload".

XML is actually great for what it is: eXtensible. It means you can build forward-compatible formats and annotate existing formats with metadata without breaking existing (compliant) applications etc... It also means you can datamine files without knowing the full format.

> Or knowing that both sender and receiver will validate the XML the
> same way through XSD.

Right, or build a database/archival service that is generic. XML is not going away until there is something better, and that won't happen anytime soon. It is also one of the few formats that I actually need library and _good_ DOM support for. (JSON can be done in an afternoon, so I don't care if it is supported or not...)
May 10 2015
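[Editorial note: the forward-compatibility point above can be shown in a few lines. The sketch uses Python's stdlib as a stand-in, with an invented namespace and document: a consumer that only understands <item> elements keeps working when a newer producer adds elements in its own namespace.]

```python
# A v1-era consumer reading a v2 document that carries a namespaced
# extension element it has never heard of.
import xml.etree.ElementTree as ET

v2_doc = ET.fromstring(
    "<feed xmlns:x='urn:example:ext'>"
    "<item name='a'/>"
    "<x:rating value='5'/>"   # unknown extension, safely ignored
    "<item name='b'/>"
    "</feed>"
)

# Only the un-namespaced <item> children are selected; the extension
# element does not break anything.
names = [i.get("name") for i in v2_doc.findall("item")]
print(names)  # -> ['a', 'b']
```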
Can we please not turn this thread into an XML vs JSON flamewar? XML is one of the most popular data formats (for better or for worse), so a parser would be a good addition to the standard library.
May 11 2015
On Monday, 11 May 2015 at 15:20:12 UTC, Alex Parrill wrote:
> Can we please not turn this thread into an XML vs JSON flamewar?

This is not a flamewar. JSON is ad hoc and I use it a lot, but it isn't actually suitable as a file and archival exchange format. It is important that people understand what the point of XML is in order to build something useful.

Full XML support and tooling is very valuable for typed GC-backed batch processing. That means namespaces, entities, XQuery equivalents, DOMs etc. A library-backed tooling pipeline would be a valuable asset for D.

The value is not in _reading_ or _writing_ XML. The value is all about providing a framework for structured grammar/namespace-based _processing_ and _transforms_.
May 11 2015
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:Glad to hear that someone is working on XML support. We cannot just "skip it". XML/HTML-like markup comes up all the time, here and there. I recently had to write a mini-parser (nowhere near the stuff Robert is doing, just a quick fix!) to extract data from XML input. This has nothing to do with personal preferences, it's just there [1] and has to be dealt with. [1] https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following features for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post your feature requests, and please keep the posts DRY and on topic.My request: just skip it. XML is a horrible waste of space for a standard, better D doesn't support it well, anything to discourage its use. I'd rather see you spend your time on something worthwhile. If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
Feb 19 2016
On Friday, 19 February 2016 at 12:13:53 UTC, Chris wrote:On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:Then write a good XML extraction-only library and dub it. I see no reason to include this in Phobos, which will encourage those who don't know any better to use it, since it comes with the compiler. I'll close with a quote from Saint Linus of Torvalds, which I was unaware of till a couple days ago: "XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist." https://en.wikiquote.org/wiki/Linus_Torvalds#2014On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:Glad to hear that someone is working on XML support. We cannot just "skip it". XML/HTML like mark up comes up all the time, here and there. I recently had to write a mini-parser (nowhere near the stuff Robert is doing, just a quick fix!) to extract data from XML input. This has nothing to do with personal preferences, it's just there [1] and has to be dealt with. [1] https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Languagestd.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.My request: just skip it. XML is a horrible waste of space for a standard, better D doesn't support it well, anything to discourage it's use. I'd rather see you spend your time on something worthwhile. 
If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
Feb 23 2016
On Tuesday, 23 February 2016 at 11:22:23 UTC, Joakim wrote:Then write a good XML extraction-only library and dub it. I see no reason to include this in PhobosYou won't be able to sleep if it ends up in Phobos? I use XML, and I don't like checking tons of third-party libraries to see which will be good for me, which have support (bugfixes), which will still have support in a few years, etc. Lots of systems already use XML, and any serious language _must_ have official support for it.If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.If it's better for you, that doesn't mean it will be better for everyone.
Feb 23 2016
On Tuesday, 23 February 2016 at 12:46:38 UTC, Dmitry wrote:On Tuesday, 23 February 2016 at 11:22:23 UTC, Joakim wrote:So are you trying to say C/C++ are not serious languages :o) Having said that, as much as I hate XML, basic support would be a nice feature for the language.Then write a good XML extraction-only library and dub it. I see no reason to include this in PhobosYou won't be able to sleep if it ends up in Phobos? I use XML, and I don't like checking tons of third-party libraries to see which will be good for me, which have support (bugfixes), which will still have support in a few years, etc. Lots of systems already use XML, and any serious language _must_ have official support for it.
Feb 24 2016
- CTS to disable parsing location (line,column)
May 03 2015
On 5/3/2015 10:39 AM, Robert burner Schadek wrote:Please post you feature requests, and please keep the posts DRY and on topic.Pipeline range interface, for example: source.xmlparse(configuration).whatever();
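A minimal sketch of what that pipeline shape could look like in practice. Everything named here (`XmlConfig`, `XmlEvent`, `xmlParse`) is hypothetical, invented purely to illustrate the range-based interface, not an actual API:

```d
import std.algorithm.iteration : filter;

// Hypothetical types, for illustration only.
struct XmlConfig { bool keepWhitespace; }
struct XmlEvent  { string name; }

// A parser step: takes a source of characters, returns a range of
// events. The real lexing is elided; this only shows the interface.
XmlEvent[] xmlParse(R)(R source, XmlConfig cfg)
{
    return source.length ? [XmlEvent("root")] : [];
}

unittest
{
    // Pipeline style: source -> parse -> further range algorithms.
    auto opens = "<root/>".xmlParse(XmlConfig(false))
                          .filter!(e => e.name == "root");
    assert(!opens.empty);
}
```

The point of this shape is that `xmlParse` composes with every other range algorithm in Phobos, and the source can be a string, a file chunk range, or anything else.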
May 03 2015
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.Could possibly use pegged to do it? It may simplify the parsing portion of it for you at least.
May 03 2015
On 5/3/2015 10:39 AM, Robert burner Schadek wrote:- CTS for encoding (ubyte(ASCII), char(utf8), ... )Encoding schemes should be handled by adapter algorithms, not in the XML parser itself, which should only handle UTF8.
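A rough sketch of such an adapter (assuming plain Latin-1 for simplicity; real Windows-1252 remaps the 0x80-0x9F range, which this ignores). The decoder sits in front of the parser as just another lazy range, so the parser itself only ever sees UTF-8:

```d
import std.algorithm.iteration : map;
import std.utf : byChar;

// Adapter: lazily decode Latin-1 bytes to code points, then re-encode
// them as UTF-8 code units. The XML parser downstream handles only UTF-8.
auto latin1ToUtf8(R)(R bytes)
{
    return bytes.map!(b => cast(dchar) b).byChar;
}

unittest
{
    import std.array : array;
    ubyte[] raw = [0x68, 0x69, 0xE9];       // "hi" + Latin-1 'é'
    assert(latin1ToUtf8(raw).array == "hié"); // 0xE9 becomes 0xC3 0xA9
}
```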
May 03 2015
Am Sun, 03 May 2015 14:00:11 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:On 5/3/2015 10:39 AM, Robert burner Schadek wrote:Unlike JSON, XML actually declares the encoding in the prolog, e.g.: <?xml version="1.0" encoding="Windows-1252"?> -- Marco- CTS for encoding (ubyte(ASCII), char(utf8), ... )Encoding schemes should be handled by adapter algorithms, not in the XML parser itself, which should only handle UTF8.
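Since the prolog itself is ASCII-compatible in the common single-byte and UTF-8 cases, the encoding name can be sniffed from a small prefix before picking an adapter. A sketch (hypothetical helper; a full implementation would also check BOMs and UTF-16/32 byte patterns first, per Appendix F of the XML spec):

```d
import std.algorithm.searching : findSplitAfter, findSplitBefore;

// Sniff the declared encoding from an XML prolog prefix, falling back
// to XML's default of UTF-8 when no declaration is present.
string sniffEncoding(const(char)[] prefix)
{
    auto afterAttr = prefix.findSplitAfter(`encoding="`);
    if (!afterAttr) return "UTF-8";
    auto name = afterAttr[1].findSplitBefore(`"`);
    return name ? name[0].idup : "UTF-8";
}

unittest
{
    assert(sniffEncoding(`<?xml version="1.0" encoding="Windows-1252"?>`)
           == "Windows-1252");
    assert(sniffEncoding(`<?xml version="1.0"?>`) == "UTF-8");
}
```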
May 04 2015
On 5/3/2015 10:39 AM, Robert burner Schadek wrote:Please post you feature requests, and please keep the posts DRY and on topic.Try to design the interface to it so it does not inherently require the implementation to allocate GC memory.
May 03 2015
Can it lazily read huge files (files greater than memory)?
May 03 2015
On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:Can it lazily reads huge files (files greater than memory)?If a range interface is used, it doesn't need to be aware of where the data is coming from. In fact, the xml package should NOT be doing I/O.
May 03 2015
On Sunday, 3 May 2015 at 22:02:13 UTC, Walter Bright wrote:On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:Wouldn't D-ranges make it impossible to use SIMD optimizations when scanning? However, it would make a lot of sense to just convert an existing XML solution with Boost license. I don't know which ones are any good, but RapidXML is at least Boost.Can it lazily reads huge files (files greater than memory)?If a range interface is used, it doesn't need to be aware of where the data is coming from. In fact, the xml package should NOT be doing I/O.
May 04 2015
On Monday, 4 May 2015 at 09:35:55 UTC, Ola Fosheim Grøstad wrote:However, it would make a lot of sense to just convert an existing XML solution with Boost license. I don't know which ones are any good, but RapidXML is at least Boost.Given how D's arrays work, we have the opportunity to have an _extremely_ fast XML parser thanks to slices. It's highly unlikely that any C or C++ solution is going to be able to compete, and if it can, it's likely to be far more complex than necessary. Parsing is an area where we definitely should write our own stuff rather than porting existing code from other languages or use existing libraries in other languages via C bindings. Fast parsing is definitely a killer feature of D and the fact that std.xml botches that so badly is just embarrassing. - Jonathan M Davis
May 04 2015
On 5/4/15 12:31 PM, Jonathan M Davis wrote:On Monday, 4 May 2015 at 09:35:55 UTC, Ola Fosheim Grøstad wrote:To be frank what's more embarrassing is that we managed to do nothing about it for years (aside from endlessly wailing about it in an a capella ensemble). It's a failure of leadership (that Walter and I need to work on) that very many unimportant and arguably less interesting areas of Phobos get attention at the expense of this one. -- AndreiHowever, it would make a lot of sense to just convert an existing XML solution with Boost license. I don't know which ones are any good, but RapidXML is at least Boost.Given how D's arrays work, we have the opportunity to have an _extremely_ fast XML parser thanks to slices. It's highly unlikely that any C or C++ solution is going to be able to compete, and if it can, it's likely to be far more complex than necessary. Parsing is an area where we definitely should write our own stuff rather than porting existing code from other languages or use existing libraries in other languages via C bindings. Fast parsing is definitely a killer feature of D and the fact that std.xml botches that so badly is just embarrassing.
May 04 2015
On Monday, 4 May 2015 at 19:45:18 UTC, Andrei Alexandrescu wrote:On 5/4/15 12:31 PM, Jonathan M Davis wrote:Also true. Many of us just don't find enough time to work on D, and we don't seem to do a good job of encouraging larger contributions to Phobos, so newcomers don't tend to contribute like that. And there's so much to do all around that the big stuff just falls by the wayside, and it really shouldn't. - Jonathan M DavisFast parsing is definitely a killer feature of D and the fact that std.xml botches that so badly is just embarrassing.To be frank what's more embarrassing is that we managed to do nothing about it for years (aside from endlessly wailing about it in an a capella ensemble). It's a failure of leadership (that Walter and I need to work on) that very many unimportant and arguably less interesting areas of Phobos get attention at the expense of this one. -- Andrei
May 04 2015
On 5/4/2015 12:31 PM, Jonathan M Davis wrote:Given how D's arrays work, we have the opportunity to have an _extremely_ fast XML parser thanks to slices. It's highly unlikely that any C or C++ solution is going to be able to compete, and if it can, it's likely to be far more complex than necessary. Parsing is an area where we definitely should write our own stuff rather than porting existing code from other languages or use existing libraries in other languages via C bindings. Fast parsing is definitely a killer feature of D and the fact that std.xml botches that so badly is just embarrassing.Tango's XML package was well regarded and the fastest in the business. It used slicing, and almost no memory allocation.
May 04 2015
On Monday, 4 May 2015 at 19:31:59 UTC, Jonathan M Davis wrote:Given how D's arrays work, we have the opportunity to have an _extremely_ fast XML parser thanks to slices.Yes, that would be great. XML is a flexible go-to archive, exchange and application format. Things like entities, namespaces and so makes it non-trivial, but being able to conveniently process Inkscape and Open Office files etc would be very useful. One should probably look at what applications generate XML and create some large test files with existing applications.
May 05 2015
On 5/4/2015 2:35 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:Wouldn't D-ranges make it impossible to use SIMD optimizations when scanning?Not at all. Algorithms can be specialized for various forms of input ranges, including ones where SIMD optimizations can be used. Specialization is one of the very cool things about D algorithms.
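For instance (hypothetical helper, just to show the mechanism): the same primitive can be overloaded so that contiguous string input gets a `memchr`-style scan, which is the natural place to drop in SIMD, while generic input ranges fall back to per-element stepping:

```d
import std.range.primitives : isInputRange;
import std.traits : isSomeString;

// Fast path: contiguous memory, scan with memchr (or SIMD).
size_t skipToLt(R)(R haystack)
    if (isSomeString!R)
{
    import core.stdc.string : memchr;
    auto p = cast(immutable(char)*)
        memchr(haystack.ptr, '<', haystack.length);
    return p is null ? haystack.length : p - haystack.ptr;
}

// Generic path: any input range of characters, one element at a time.
size_t skipToLt(R)(R haystack)
    if (isInputRange!R && !isSomeString!R)
{
    size_t n;
    foreach (c; haystack)
    {
        if (c == '<') break;
        ++n;
    }
    return n;
}

unittest
{
    import std.utf : byDchar;
    assert(skipToLt("abc<def") == 3);          // string overload
    assert(skipToLt("abc<def".byDchar) == 3);  // generic overload
}
```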
May 04 2015
On Sunday, 3 May 2015 at 22:02:13 UTC, Walter Bright wrote:On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:Indeed. It should operate on ranges without caring where they came from (though it may end up supporting both input ranges and random-access ranges with the idea that it can support reading of a socket with a range in a less efficient manner or operating on a whole file at once as via a random-access range for more efficient parsing). But if I/O is a big concern, I'd suggest just using std.mmfile to do the trick, since then you can still operate on the whole file as a single array without having to actually have the whole thing in memory. - Jonathan M DavisCan it lazily reads huge files (files greater than memory)?If a range interface is used, it doesn't need to be aware of where the data is coming from. In fact, the xml package should NOT be doing I/O.
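A sketch of the std.mmfile approach (only `MmFile` is a real Phobos type here; the parser call is left hypothetical):

```d
import std.mmfile : MmFile;

// Map a huge document and hand the parser an ordinary slice. The OS
// pages the file in on demand, so it never has to fit in RAM at once.
const(char)[] mapDocument(MmFile mmf)
{
    return cast(const(char)[]) mmf[];   // the whole file as a char slice
}

unittest
{
    import std.file : write, remove;
    write("tiny.xml", "<root/>");
    scope(exit) remove("tiny.xml");

    auto mmf = new MmFile("tiny.xml");  // read-only mapping by default
    auto text = mapDocument(mmf);
    assert(text == "<root/>");
    // hypothetical slicing parser: auto dom = parseXml(text);
}
```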
May 04 2015
On 2015-05-03 17:39:46 +0000, "Robert burner Schadek" <rburners gmail.com> said:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.This isn't a feature request (sorry?), but I just want to point out that you should feel free to borrow code from https://github.com/michelf/mfr-xml-d There's probably a lot you can reuse in there. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
May 03 2015
On Sunday, 3 May 2015 at 23:32:28 UTC, Michel Fortin wrote:This isn't a feature request (sorry?), but I just want to point out that you should feel free to borrow code from https://github.com/michelf/mfr-xml-d There's probably a lot you can reuse in there.nice, thank you
May 04 2015
On 4/05/2015 5:39 a.m., Robert burner Schadek wrote:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following features for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post your feature requests, and please keep the posts DRY and on topic.Preferably the interfaces are made first 1:1 as the spec requires. Then it's just a matter of building the actual reader/writer code. That way we could theoretically rewrite the reader/writer to support other formats such as html5/svg, independently of Phobos. Also, it would be nice to be CTFE'able!
May 03 2015
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following features for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post your feature requests, and please keep the posts DRY and on topic.If I were doing it, I'd do three types of parsers: 1. A parser that was pretty much as low level as you can get, where you basically get a range of XML attributes or tags. Exactly how to build that could be a bit entertaining, since it would have to be hierarchical, and ranges aren't, but something like a range of tags where you can get a range of its attributes and sub-tags from it so that the whole document can be processed without actually getting to the level of even a SAX parser. That parser could then be used to build the other parsers, and anyone who needed insanely fast speeds could use it rather than the SAX or DOM parser so long as they were willing to pay the inevitable loss in user-friendliness. 2. SAX parser built on the low level parser. 3. DOM parser built either on the low level parser or the SAX parser (whichever made more sense). I doubt that I'm really explaining the low level parser well enough or have even thought it through enough, but I really think that even a SAX parser is too high level for the base parser and that something slightly higher than a lexer (high enough to actually be processing XML rather than individual tokens but pretty much only as high as is required to do that) would be a far better choice.
IIRC, Michel Fortin's work went in that direction, and he linked to his code in another post, so I'd suggest at least looking at that for ideas. Regardless, by building layers of XML parsers rather than just the standard ones, it should be possible to get higher performance while still having the more standard, user-friendly ones for those that don't need the full performance and do need the user-friendliness (though of course, we do want the SAX and DOM parsers to be efficient as well). - Jonathan M Davis
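One possible shape for that lowest layer, sketched below. All names are hypothetical; the idea is a lazy range of tokens whose string fields are slices into the original document, so nothing is copied:

```d
// Token kinds roughly at the level described above: higher than a raw
// lexer, lower than SAX.
enum TokenKind { openTag, closeTag, attribute, text, comment, cdata, pi }

struct Token
{
    TokenKind kind;
    const(char)[] name;   // slice into the input, no allocation
    const(char)[] value;
}

// A forward range of tokens. The lexing itself is elided; only the
// range shape and the slicing policy are shown.
struct TokenRange
{
    const(char)[] input;

    bool empty() const { return input.length == 0; }

    Token front() const
    {
        // placeholder: treat the remaining input as one text token
        return Token(TokenKind.text, null, input);
    }

    void popFront() { input = null; /* real lexer advances here */ }

    TokenRange save() const { return this; } // forward-range requirement
}

unittest
{
    auto r = TokenRange("<a/>");
    assert(!r.empty && r.front.kind == TokenKind.text);
    r.popFront();
    assert(r.empty);
}
```

The SAX layer would then consume this range and pair open/close tags; the DOM layer would allocate nodes pointing at the same slices.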
May 04 2015
On 2015-05-04 21:14, Jonathan M Davis wrote:If I were doing it, I'd do three types of parsers: 1. A parser that was pretty much as low level as you can get, where you basically a range of XML atributes or tags. Exactly how to build that could be a bit entertaining, since it would have to be hierarchical, and ranges aren't, but something like a range of tags where you can get a range of its attributes and sub-tags from it so that the whole document can be processed without actually getting to the level of even a SAX parser. That parser could then be used to build the other parsers, and anyone who needed insanely fast speeds could use it rather than the SAX or DOM parser so long as they were willing to pay the inevitable loss in user-friendliness. 2. SAX parser built on the low level parser. 3. DOM parser built either on the low level parser or the SAX parser (whichever made more sense). I doubt that I'm really explaining the low level parser well enough or have even though through it enough, but I really think that even a SAX parser is too high level for the base parser and that something that slightly higher than a lexer (high enough to actually be processing XML rather than individual tokens but pretty much only as high as is required to do that) would be a far better choice. IIRC, Michel Fortin's work went in that direction, and he linked to his code in another post, so I'd suggest at least looking at that for ideas.This way the XML parser is structured in Tango. A pull parser at the lowest level, a SAX parser on top of that and I think the DOM parser builds on top of the pull parser. The Tango pull parser can give you the following tokens: * start element * attribute * end element * end empty element * data * comment * cdata * doctype * pi -- /Jacob Carlborg
May 04 2015
On 2015-05-03 19:39, Robert burner Schadek wrote:Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2I recommend benchmarking against the Tango pull parser. -- /Jacob Carlborg
May 04 2015
On 5/4/2015 12:28 PM, Jacob Carlborg wrote:On 2015-05-03 19:39, Robert burner Schadek wrote:I agree. The Tango XML parser has set the performance bar. If any new solution can't match that, throw it out and try again.Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2I recommend benchmarking against the Tango pull parser.
May 04 2015
On Monday, 4 May 2015 at 19:28:25 UTC, Jacob Carlborg wrote:On 2015-05-03 19:39, Robert burner Schadek wrote:Recently, I compared DOM parsers for an XML file of 100 MByte: 15.8 s tango.text.xml (SiegeLord/Tango-D2) 13.4 s ae.utils.xml (CyberShadow/ae) 8.5 s xml.etree (Python) Either the Tango DOM parser is slow compared to the Tango pull parser, or the D2 port ruined the performance.Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2I recommend benchmarking against the Tango pull parser.
May 05 2015
On Tuesday, 5 May 2015 at 10:41:37 UTC, Mario Kröplin wrote:On Monday, 4 May 2015 at 19:28:25 UTC, Jacob Carlborg wrote:As usual: system, compiler, compiler version, compilation flags?On 2015-05-03 19:39, Robert burner Schadek wrote:Recently, I compared DOM parsers for an XML files of 100 MByte: 15.8 s tango.text.xml (SiegeLord/Tango-D2) 13.4 s ae.utils.xml (CyberShadow/ae) 8.5 s xml.etree (Python) Either the Tango DOM parser is slow compared to the Tango pull parser, or the D2 port ruined the performance.Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2I recommend benchmarking against the Tango pull parser.
May 05 2015
On 05/05/2015 11:41, "Mario =?UTF-8?B?S3LDtnBsaW4i?= <linkrope github.com>" wrote:Recently, I compared DOM parsers for an XML files of 100 MByte: 15.8 s tango.text.xml (SiegeLord/Tango-D2) 13.4 s ae.utils.xml (CyberShadow/ae) 8.5 s xml.etree (Python) Either the Tango DOM parser is slow compared to the Tango pull parser, or the D2 port ruined the performance.fwiw I did some tests a couple of years back with https://launchpad.net/d2-xml on 20 odd megabyte files and found it faster than Tango. Unfortunately that would need some work to test now, as xmlp is abandoned and wouldn't build last time I tried it :-( I also had some success with https://github.com/opticron/kxml, though it had some issues with chuffy entity decoding performance. Also, profiling showed a lot of time spent in the GC, and the recent improvements in that area might have changed things by now.
May 05 2015
On 5/5/2015 4:16 AM, Richard Webb wrote:Also, profiling showed a lot of time spent in the GC, and the recent improvements in that area might have changed things by now.I haven't read the Tango source code, but the performance of its XML was supposedly because it did not use the GC; it used slices.
May 05 2015
On 2015-05-06 01:38, Walter Bright wrote:I haven't read the Tango source code, but the performance of it's xml was supposedly because it did not use the GC, it used slices.That's only true for the pull parser (not sure about the SAX parser). The DOM parser needs to allocate the nodes, but if I recall correctly those are allocated in a free list. Not sure which parser was used in the test. -- /Jacob Carlborg
May 05 2015
On 06/05/2015 07:31, Jacob Carlborg wrote:On 2015-05-06 01:38, Walter Bright wrote:The direct comparisons were with the DOM parsers (I was playing with a D port of some C++ code at work at the time, and that is DOM based). xmlp has alternate parsers (event driven etc) which were faster in some simple tests i did, but I don't recall if I did a direct comparison with Tango there.I haven't read the Tango source code, but the performance of it's xml was supposedly because it did not use the GC, it used slices.That's only true for the pull parser (not sure about the SAX parser). The DOM parser needs to allocate the nodes, but if I recall correctly those are allocated in a free list. Not sure which parser was used in the test.
May 06 2015
On 2015-05-05 12:41, "Mario =?UTF-8?B?S3LDtnBsaW4i?= <linkrope github.com>" wrote:Recently, I compared DOM parsers for an XML files of 100 MByte: 15.8 s tango.text.xml (SiegeLord/Tango-D2) 13.4 s ae.utils.xml (CyberShadow/ae) 8.5 s xml.etree (Python) Either the Tango DOM parser is slow compared to the Tango pull parser,Yes, of course it's slower. The DOM parser creates a DOM as well, which the pull parser doesn't. These other libraries, what kind of parsers are those using? I mean, it's not fair to compare a pull parser against a DOM parser. Could you try D1 Tango as well? Or do you have the benchmark available somewhere?or the D2 port ruined the performance.Might be the case as well, see this comment [1]. [1] http://forum.dlang.org/thread/vsbsxfeciryrdsjhhfak forum.dlang.org?page=3#post-mi8hs8:24b0j:241:40digitalmars.com -- /Jacob Carlborg
May 05 2015
On Tuesday, 5 May 2015 at 12:10:59 UTC, Jacob Carlborg wrote:Yes, of course it's slower. The DOM parser creates a DOM as well, which the pull parser doesn't. These other libraries, what kind of parsers are those using? I mean, it's not fair to compare a pull parser against a DOM parser.I agree. Most applications will use a DOM parser for convenience, so sacrificing some speed initially in favour of ease of use makes a lot of sense. As long as it is possible to improve it later (e.g. use SIMD scanning to find the end of CDATA etc). In my opinion it is rather difficult to build a good API without also using the API in an application in parallel. So it would be a good strategy to build a specific DOM along with writing the XML infrastructure, like SVG/HTML. Also, some parsers, like RapidXML, only support a subset of XML. So they cannot be used for comparisons.
May 05 2015
On 2015-05-05 16:04, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:In my opinion it is rather difficult to build a good API without also using the API in an application in parallel. So it would be a good strategy to build a specific DOM along with writing the XML infrastructure, like SVG/HTML.Agree.Also, some parsers, like RapidXML only support a subset of XML. So they cannot be used for comparisons.The Tango parser has some limitation as well. In some places it sacrificed correctness for speed. There's a comment claiming the parser might read past the input if it's not well formed. -- /Jacob Carlborg
May 05 2015
An old friend of mine who was intimate with the microsoft xml parsers was fond of saying, particularly with respect to xml parsers, that if you hadn't finished implementing and testing error handling and negative tests (ie, malformed documents) that your positive benchmarks were fairly meaningless. A whole lot of work goes into that 'second half' of things that can quickly cost performance. I didn't dive or don't recall specific details as this was years ago. The (over-)generalization from there is an old adage: it's easy to write an incorrect program. On 5/5/2015 11:33 PM, Jacob Carlborg via Digitalmars-d wrote:On 2015-05-05 16:04, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:In my opinion it is rather difficult to build a good API without also using the API in an application in parallel. So it would be a good strategy to build a specific DOM along with writing the XML infrastructure, like SVG/HTML.Agree.Also, some parsers, like RapidXML only support a subset of XML. So they cannot be used for comparisons.The Tango parser has some limitation as well. In some places it sacrificed correctness for speed. There's a comment claiming the parser might read past the input if it's not well formed.
May 05 2015
On 2015-05-03 19:39, Robert burner Schadek wrote:Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2There are a couple of interesting comments about the Tango pull parser that can be worth mentioning: * Use -version=whitespace to retain whitespace as data nodes. We see a 25% increase in token count and 10% throughput drop when parsing "hamlet.xml" with this option enabled (pullparser alone) * The parser is constructed with some tradeoffs relating to document integrity. It is generally optimized for well-formed documents, and currently may read past a document-end for those that are not well formed * Making some tiny unrelated change to the code can cause notable throughput changes. We're not yet clear why these swings are so pronounced (for changes outside the code path) but they seem to be related to the alignment of codegen. It could be a cache-line issue, or something else The last comment might not be relevant anymore since these are all quite old comments. -- /Jacob Carlborg
May 04 2015
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.Not a feature, but if `std.data.json` [1] gets accepted in to Phobos, it may be something to consider naming this `std.data.xml` (although that might not as effectively differentiate it from `std.xml`). [1]: http://wiki.dlang.org/Review_Queue
May 04 2015
On 5/05/2015 10:45 a.m., Liam McSherry wrote:On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:It really should be std.data.xml. To keep with the new structuring. Plus it'll make transitioning a little easier.std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.Not a feature, but if `std.data.json` [1] gets accepted in to Phobos, it may be something to consider naming this `std.data.xml` (although that might not as effectively differentiate it from `std.xml`). [1]: http://wiki.dlang.org/Review_Queue
May 04 2015
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.maybe off-topic, but it would be nice if the standard json,xml, etc etc all had identical interfaces(except for implementation-specific quirks.) This might be something worth discussing if it wasn't already agreed upon.
May 04 2015
Am Tue, 05 May 2015 02:01:50 +0000 schrieb "weaselcat" <weaselcat gmail.com>:maybe off-topic, but it would be nice if the standard json,xml, etc etc all had identical interfaces(except for implementation-specific quirks.) This might be something worth discussing if it wasn't already agreed upon.I don't think this needs discussion. It is plain impossible to have a sophisticated JSON parser and a sophisticated XML parser share the same API. Established function names, structural differences in the formats and feature sets differ to much. For example in XML attributes and child elements are used somewhat interchangeably whereas in JSON attributes don't exist. So while in JSON "obj.field" makes sense in XML you would want to select either an attribute or an element with the name "field". -- Marco
May 04 2015
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:std.xml has been considered not up to specs nearly 3 years now. Time to build a successor. I currently plan the following featues for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post you feature requests, and please keep the posts DRY and on topic.I'm looking for a status update. DUB doesn't seem to have many options posted. I was thinking about starting a SAXParser implementation.
Feb 17 2016
On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent wrote:I'm looking for a status update. DUB doesn't seem to have many options posted. I was thinking about starting a SAXParser implementation.I'm working on it, but recently I had to do some major restructuring of the code. Currently I'm trying to get this merged https://github.com/D-Programming-Language/phobos/pull/3880 because I had some problems with the encoding of test files. XML has a lot of corner cases, it just takes time. If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
Feb 18 2016
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.Also, I would like to see this https://github.com/D-Programming-Language/phobos/pull/2995 go in first, to be able to accurately measure and compare performance
Feb 18 2016
On 02/18/2016 05:49 AM, Robert burner Schadek wrote:On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:Would the measuring be possible with 2995 as a dub package? -- AndreiIf you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.Also, I would like to see this https://github.com/D-Programming-Language/phobos/pull/2995 go in first, to be able to accurately measure and compare performance
Feb 18 2016
On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei Alexandrescu wrote:yes, after having synced the dub package to the PRAlso, I would like to see this https://github.com/D-Programming-Language/phobos/pull/2995 go in first, to be able to accurately measure and compare performanceWould the measuring be possible with 2995 as a dub package? -- Andrei
Feb 18 2016
On Thursday, 18 February 2016 at 15:39:01 UTC, Robert burner Schadek wrote:On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei Alexandrescu wrote:brought the dub package up to date with the PR (v0.0.6)Would the measuring be possible with 2995 as a dub package? -- Andreiyes, after having synced the dub package to the PR
Feb 23 2016
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.Oh, I absolutely agree, independent implementation is a bad thing. (Someone should rename DRY as "don't repeat yourself or others"... but DRYOO sounds weird.) Where's your repo?
Feb 18 2016
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent wrote:Would you be interested in mentoring a student for the Google Summer of Code to do work on std.xml?I'm looking for a status update. DUB doesn't seem to have many options posted. I was thinking about starting a SAXParser implementation.I'm working on it, but recently I had to do some major restructuring of the code. Currently I'm trying to get this merged https://github.com/D-Programming-Language/phobos/pull/3880 because I had some problems with the encoding of test files. XML has a lot of corner cases, it just takes time. If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
Feb 18 2016
On Friday, 19 February 2016 at 04:02:02 UTC, Craig Dillabaugh wrote:Would you be interested in mentoring a student for the Google Summer of Code to do work on std.xml?Yes, why not!
Feb 19 2016
While working on a new xml implementation I came across "control characters (CC)". [1] When trying to validate/convert a UTF string these lead to exceptions, because they are not valid UTF characters. Unfortunately, some of these characters are allowed to appear in valid xml 1.* documents. I currently see two options for how to go about it:
1. Do not allow CCs that do not work with the existing functionality.
Pros: easy.
Cons: the resulting xml implementation will not be xml 1.* complete.
2. Add special cases to the existing functionality to handle CCs that are allowed in 1.0.
Pros: the resulting xml implementation will be xml 1.* complete.
Cons: will make UTF de/encoding slower, as I would need to add additional logic.
Any other ideas, feedback? [1] https://en.wikipedia.org/wiki/C0_and_C1_control_codes
Feb 18 2016
On Thursday, 18 February 2016 at 15:56:58 UTC, Robert burner Schadek wrote:When trying to validate/convert an utf string these lead to exceptions, because they are not valid utf character.That means the user didn't encode them properly... Which one specifically are you thinking of? I'm pretty sure all those control characters have a spot in the Unicode space and can be properly encoded as UTF-8 (though I think even if they are properly encoded, some of them are illegal in XML anyway). If they appear in another form, it is invalid and/or needs a charset conversion, which should be specified in the XML document itself.
Feb 18 2016
For instance, quite often I find <80> in tests that are supposed to be valid xml 1.0. They are invalid xml 1.1 though
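That 1.0-vs-1.1 split is exactly where the two Char grammars diverge. A quick predicate sketch (Python; transcribed from the `Char` production of XML 1.0 and the `Char`/`RestrictedChar` productions of XML 1.1, so treat the ranges as my reading of the specs) shows U+0080 passing 1.0 but not 1.1:

```python
def xml10_char_ok(cp):
    """XML 1.0 Char production: tab/LF/CR plus the three big ranges."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def xml11_literal_ok(cp):
    """XML 1.1 Char minus RestrictedChar: C1 controls like U+0080 may
    only appear as character references, not literally."""
    if not (0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF):
        return False
    restricted = (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC
                  or 0xE <= cp <= 0x1F or 0x7F <= cp <= 0x84
                  or 0x86 <= cp <= 0x9F)
    return not restricted

assert xml10_char_ok(0x80) and not xml11_literal_ok(0x80)  # the <80> case
assert xml10_char_ok(0x9) and xml11_literal_ok(0x9)        # tab is fine in both
assert not xml10_char_ok(0x0) and not xml11_literal_ok(0x0)  # NUL in neither
```

So a 1.*-complete validator needs a per-version character predicate rather than a single shared one.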
Feb 18 2016
On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner Schadek wrote:For instance, quite often I find <80> in tests that are supposed to be valid xml 1.0. They are invalid xml 1.1 thoughWhat char encoding does the document declare itself as?
Feb 18 2016
On Thursday, 18 February 2016 at 16:47:35 UTC, Adam D. Ruppe wrote:On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner Schadek wrote:It does not, it has no prolog and therefore no EncodingInfo. The unix file tool says it is a utf8 encoded file, but no BOM is present.For instance, quite often I find <80> in tests that are supposed to be valid xml 1.0. They are invalid xml 1.1 thoughWhat char encoding does the document declare itself as?
Feb 18 2016
On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner Schadek wrote:The unix file tool says it is a utf8 encoded file, but no BOM is present.the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Feb 18 2016
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek wrote:Gah, I should have read this before replying... well, that does appear to be valid utf-8.... why is it throwing an exception then? I'm pretty sure that byte stream *is* actually well-formed xml 1.0 and should pass utf validation as well as the XML well-formedness check.The unix file tool says it is a utf8 encoded file, but no BOM is present.the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
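That analysis can be checked mechanically. A small sketch in Python, using the stdlib expat-backed ElementTree as a stand-in XML 1.0 parser: the C2 80 pair decodes cleanly as UTF-8, and the document parses.

```python
import xml.etree.ElementTree as ET

# The exact byte stream from the hex dump above:
data = bytes.fromhex("3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E")

text = data.decode("utf-8")   # C2 80 is the well-formed UTF-8 sequence for U+0080
assert text == "<foo>\u0080</foo>"

root = ET.fromstring(text)    # XML 1.0 allows U+0080 in character data
assert root.tag == "foo" and root.text == "\u0080"
```

So the document is both valid UTF-8 and well-formed XML 1.0; any exception here points at the validation code, not the input.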
Feb 18 2016
On Thursday, 18 February 2016 at 17:26:30 UTC, Adam D. Ruppe wrote:On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek wrote:Regarding control characters: If you give me a complete sample file, I can run it through Mozilla's UTF stream conversion and/or XML parsing code (via either SAX or DOMParser) to tell you how that reacts as a reference. Mozilla supports XML 1.0, but not 1.1.Gah, I should have read this before replying... well, that does appear to be valid utf-8.... why is it throwing an exception then? I'm pretty sure that byte stream *is* actually well-formed xml 1.0 and should pass utf validation as well as the XML well-formedness check.unix file says it is a utf8 encoded file, but not BOM is present.the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Feb 18 2016
On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent wrote:Regarding control characters: If you give me a complete sample file, I can run it through Mozilla's UTF stream conversion and/or XML parsing code (via either SAX or DOMParser) to tell you how that reacts as a reference. Mozilla supports XML 1.0, but not 1.1.thank you for making the effort https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml
Feb 18 2016
On Thursday, 18 February 2016 at 21:53:24 UTC, Robert burner Schadek wrote:On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent wrote:In this case, Firefox just passes the control characters through to the contentHandler.characters method: Starting runTest Retrieved source contentHandler.startDocument() contentHandler.startElement("", "foo", "foo", {}) contentHandler.characters("\u0080") contentHandler.endElement("", "foo", "foo") contentHandler.endDocument() Done readingRegarding control characters: If you give me a complete sample file, I can run it through Mozilla's UTF stream conversion and/or XML parsing code (via either SAX or DOMParser) to tell you how that reacts as a reference. Mozilla supports XML 1.0, but not 1.1.thanks you making the effort https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml
Feb 19 2016
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek wrote:the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"http://dpaste.dzfl.pl/80888ed31958 like this?
Feb 19 2016
On 2016-02-19 11:58, Kagamin via Digitalmars-d wrote:On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek wrote:No. The program just takes the hex dump as a string. You would need to do something like: ubyte[] arr = [0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80, 0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E]; string s = cast(string)arr; dstring ds = to!dstring(s); and see what happensthe hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"http://dpaste.dzfl.pl/80888ed31958 like this?
Feb 19 2016
On Friday, 19 February 2016 at 12:30:06 UTC, Robert burner Schadek wrote:ubyte[] arr = [0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80, 0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E]; string s = cast(string)arr; dstring ds = to!dstring(s); and see what happenshttp://dpaste.dzfl.pl/2f8a8ff10bde like this?
Feb 19 2016
On Friday, 19 February 2016 at 12:55:52 UTC, Kagamin wrote:http://dpaste.dzfl.pl/2f8a8ff10bde like this?yes
Feb 19 2016
On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner Schadek wrote:It does not, it has no prolog and therefore no EncodingInfo.In that case, it needs to be valid UTF-8 or valid UTF-16 and it is a fatal error if there's any invalid bytes: https://www.w3.org/TR/REC-xml/#charencoding == It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding. Specifically, it is a fatal error if an entity encoded in UTF-8 contains any ill-formed code unit sequences, as defined in section 3.9 of Unicode [Unicode]. Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16. ==
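The "fatal error" path the spec describes is easy to demonstrate: with no encoding declaration UTF-8 is assumed, and a bare 0x80 byte (as opposed to the two-byte C2 80 sequence discussed above) is an ill-formed code unit sequence. A sketch in Python:

```python
import xml.etree.ElementTree as ET

bad = b"<foo>\x80</foo>"   # bare 0x80: ill-formed as UTF-8

# The byte sequence fails plain UTF-8 validation...
try:
    bad.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
assert not utf8_ok

# ...and an XML parser must treat it as a fatal (well-formedness) error.
try:
    ET.fromstring(bad)
    parsed = True
except ET.ParseError:
    parsed = False
assert not parsed
```

This is the dividing line for the earlier test files: "C2 80" must be accepted by an XML 1.0 parser, while a lone "80" must be rejected.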
Feb 18 2016
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:Please post your feature requests...- the ability to read documents with missing or incorrectly specified encoding - additional feature: relaxed mode for reading html and broken XML documents Some time ago I worked for Accusoft on document viewing/converting software. The main experience I took away: every theoretically possible type of error in a document shows up in real life, once the application is popular.
Feb 20 2016
On Saturday, 20 February 2016 at 19:08:25 UTC, crimaniak wrote:- the ability to read documents with missing or incorrectly specified encoding - additional feature: relaxed mode for reading html and broken XML documentsfyi, my dom.d can do those, I use it for web scraping where there's all kinds of hideous stuff out there. https://github.com/adamdruppe/arsd/blob/master/dom.d
Feb 20 2016
On Saturday, 20 February 2016 at 19:16:47 UTC, Adam D. Ruppe wrote:On Saturday, 20 February 2016 at 19:08:25 UTC, crimaniak wrote:It works, thanks! I will use it in my experiments, but the getElementsBySelector() selector language needs to be improved, I think.- the ability to read documents with missing or incorrectly specified encoding - additional feature: relaxed mode for reading html and broken XML documentsfyi, my dom.d can do those, I use it for web scraping where there's all kinds of hideous stuff out there. https://github.com/adamdruppe/arsd/blob/master/dom.d
Feb 21 2016
On Sunday, 21 February 2016 at 23:01:22 UTC, crimaniak wrote:I will use it in my experiments, but the getElementsBySelector() selector language needs to be improved, I think.What, specifically, do you have in mind?
Feb 21 2016
On Sunday, 21 February 2016 at 23:57:40 UTC, Adam D. Ruppe wrote:On Sunday, 21 February 2016 at 23:01:22 UTC, crimaniak wrote:There are only a couple of ad-hoc checks for attribute values. This language is not XPath-compatible, so the easiest way to cover a lot of cases is a regex check for attributes. Something like "script[src/https:.+\\.googleapis\\.com/i]"I will use it in my experiments, but the getElementsBySelector() selector language needs to be improved, I think.What, specifically, do you have in mind?
Feb 25 2016
On Thursday, 25 February 2016 at 23:59:04 UTC, crimaniak wrote:There are only a couple of ad-hoc checks for attribute values. This language is not XPath-compatible, so the easiest way to cover a lot of cases is a regex check for attributes. Something like "script[src/https:.+\\.googleapis\\.com/i]"The css3 selector standard offers three substring searches: [attr^=foo] if it begins with foo, [attr$=foo] if it ends with foo, and [attr*=foo] if it includes foo somewhere. dom.d supports all three now. So for your regex, you could probably match: [attr*=googleapis.com] well enough.
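The three CSS3 operators map directly onto ordinary string predicates, which is why they cover most of the regex cases. A minimal matcher sketch in Python (the helper name `attr_match` is hypothetical, not part of dom.d):

```python
def attr_match(value, op, needle):
    """CSS3 attribute substring operators:
    ^= prefix match, $= suffix match, *= substring match."""
    ops = {
        "^=": value.startswith,
        "$=": value.endswith,
        "*=": lambda n: n in value,
    }
    return ops[op](needle)

src = "https://maps.googleapis.com/maps/api/js"
assert attr_match(src, "*=", "googleapis.com")  # [src*=googleapis.com]
assert attr_match(src, "^=", "https:")          # [src^=https:]
assert not attr_match(src, "$=", ".png")        # [src$=.png]
```

The trade-off versus full regex support is anchoring and case control (the original example used an /i flag), but the substring form is enough for host matching like the googleapis.com case.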
Feb 25 2016
If you really want to be serious about the XML package, then I humbly believe implementing the commonly-known DOM interfaces is a must. Luckily there is IDL available for it: https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, speaking about DOM, all levels need to be supported! Also, I would recommend borrowing Tango's XML pull parser as it is blazingly fast. Finally, integration with a signal/slot module should perhaps be considered as well.
Feb 24 2016
On Wednesday, 24 February 2016 at 10:55:01 UTC, Dejan Lekic wrote:If you really want to be serious about the XML package, then I humbly believe implementing the commonly-known DOM interfaces is a must. Luckily there is IDL available for it: https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, speaking about DOM, all levels need to be supported!I agree, but the Document Object Model (DOM) is a huuuuuuuuge project. It's a project I'd love to take an active hand in driving. Also, DOM "level 4" is a living standard at whatwg.org, along with rules for parsing HTML. (Which naturally means the rules are always changing.) I have a partial implementation of DOM in JavaScript, so I am serious when I say it's going to take time. Ideally (imho), we'd have a set of related packages, prefixed with std.web: * html * xml * dom * css * javascript (Yes, I'm suggesting a rename of std.xml2 to std.web.xml.) But from what I can see, realistically the community is a long way from that. I'm trying to write the SAX interfaces now. I only have a limited amount of time to devote to this (a common complaint, I gather)...
Mar 01 2016
On Wednesday, 2 March 2016 at 02:50:22 UTC, Alex Vincent wrote:I agree, but the Document Object Model (DOM) is a huuuuuuuuge project. It's a project I'd love to take an active hand in driving.My dom.d implements a fair chunk of it already. https://github.com/adamdruppe/arsd/blob/master/dom.d Yes, indeed, it is quite a lot of code, but easy to use if you are familiar with javascript and css selectors. http://dpldocs.info/experimental-docs/arsd.dom.html
Mar 02 2016
Dejan Lekic <dejan.lekic gmail.com> wrote:If you really want to be serious about the XML package, then I humbly believe implementing the commonly-known DOM interfaces is a must. Luckily there is IDL available for it: https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, speaking about DOM, all levels need to be supported! Also, I would recommend borrowing Tango's XML pull parser as it is blazingly fast. Finally, integration with a signal/slot module should perhaps be considered as well.What's the use case of DOM outside of browser interoperability/scripting? The API isn't particularly nice, especially in languages with a rich type system.
Mar 01 2016
On Wednesday, 2 March 2016 at 06:59:49 UTC, Tobias Müller wrote:What's the usecase of DOM outside of browser interoperability/scripting? The API isn't particularly nice, especially in languages with a rich type system.I find my extended dom to be very nice, especially thanks to D's type system. I use it for a lot of things: using web apis, html scraping, config file stuff, working on my own documents, and even as my web template system. Basically, dom.d made xml cool to me.
Mar 02 2016
Adam D. Ruppe <destructionator gmail.com> wrote:On Wednesday, 2 March 2016 at 06:59:49 UTC, Tobias Müller wrote:Sure, some kind of DOM is certainly useful. But the standard XML-DOM isn't particularly nice. What's the point of a linked list style interface when you have ranges in the language?What's the usecase of DOM outside of browser interoperability/scripting? The API isn't particularly nice, especially in languages with a rich type system.I find my extended dom to be very nice, especially thanks to D's type system. I use it for a lot of things: using web apis, html scraping, config file stuff, working on my own documents, and even as my web template system. Basically, dom.d made xml cool to me.
Mar 03 2016
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:std.xml has been considered not up to spec for nearly 3 years now. Time to build a successor. I currently plan the following features for it: - SAX and DOM parser - in-situ / slicing parsing when possible (forward range?) - compile time switch (CTS) for lazy attribute parsing - CTS for encoding (ubyte(ASCII), char(utf8), ... ) - CTS for input validating - performance Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2 Please post your feature requests, and please keep the posts DRY and on topic.Robert, we have had some student interest in GSOC for XML. Would you be interested in mentoring a student to work with you on this? Craig
Mar 05 2016
On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh wrote:Robert, we have had some student interest in GSOC for XML. Would you be interested in mentoring a student to work with you on this. CraigOf course
Mar 06 2016
On Sunday, 6 March 2016 at 11:46:00 UTC, Robert burner Schadek wrote:On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh wrote:Hi, I don't know if this is the right spot to join the conversation; I'm a student and I'd really love to work on std.xml for GSoC! I'm just waiting for March 14 to apply.Robert, we have had some student interest in GSOC for XML. Would you be interested in mentoring a student to work with you on this? CraigOf course
Mar 07 2016
On Sunday, 6 March 2016 at 11:46:00 UTC, Robert burner Schadek wrote:On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh wrote:Great. Can you please get in touch by email so I can add you to the mentors list: craig dot dillabaugh at gmail dot com CheersRobert, we have had some student interest in GSOC for XML. Would you be interested in mentoring a student to work with you on this. CraigOf course
Mar 07 2016
For everyone's information, I've posted a pull request to Mr. Schadek's github repository, with a proposed Simple API for XML (SAX) stub. I'd really appreciate reviews of the stub's interfaces. https://github.com/burner/std.xml2/pull/5
Mar 12 2016