digitalmars.D.learn - Learning to XML with D
- Derix (21/21) Feb 06 2015 So, I set sails to transform a bunch of HTML files with D. This,
- Chris (16/37) Feb 06 2015 The documentation says:
- Marc Schütz (9/16) Feb 06 2015 Another place to look is http://code.dlang.org/ , which contains
- Adam D. Ruppe (13/20) Feb 06 2015 Yeah, if you're used to DOM work in Javascript, my dom.d works in
- Derix (19/21) Feb 09 2015 Not exactly what I'm doing, but close. I'm in the midst of a
- CraigDillabaugh (4/52) Feb 06 2015 I added XML to the GSOC idea's page (see Phobos section), but it
- CraigDillabaugh (4/32) Feb 06 2015 Just for the record, I hate XML too, but it is VERY widely used,
- Chris (6/41) Feb 06 2015 You're right of course. It is widely (and wildly) used. I for my
- CraigDillabaugh (2/45) Feb 06 2015 Thanks for the tip. I may add a reference there!
- Adam D. Ruppe (10/14) Feb 06 2015 Function parameters in D can be qualified as in or out,
- Derix (11/15) Feb 09 2015 But of course. Actually I kinda found out just a little while
- Arjan (3/24) Feb 07 2015 Maybe, when you're on windows, you could use msxml6 through COM.
So, I set sail to transform a bunch of HTML files with D. This, of course, will happen with the std.xml library. There is this nice example that I have already put to some use; however, some of the basics seem to escape me, especially in lines like

xml.onEndTag["author"] = (in Element e) { book.author = e.text(); };

OK, we're doing some event-based parsing, reacting with a lambda function on encountering such-and-such a tag, à la SAX. (Are we?) What I don't quite grasp is the construct (in Element e), especially the *in* part. Is it *in* as in http://dlang.org/expression.html#InExpression ? In which case I fail to see what associative array we're considering. It's probably more a way to further qualify the argument e we're passing to the λ-function: could someone elaborate on that?

Of course, it is entirely possible that I completely miss the point and that I'm overlooking some fundamentals; if so, have mercy and help me find my way back to the righteous path ;-)

Thxxx
Feb 06 2015
On Friday, 6 February 2015 at 09:15:54 UTC, Derix wrote:
> So, I set sail to transform a bunch of HTML files with D. This, of course, will happen with the std.xml library.

The documentation says:

"Warning: This module is considered out-dated and not up to Phobos' current standards. It will remain until we have a suitable replacement, but be aware that it will not remain long term."

My advice is not to use it. I used it a while back, but it slowed down my system (why, I still don't know), and it is permanently soon-to-be deprecated.

If you wanna use D for XML parsing, see if you can find a solid third-party library in D (have a look at Adam's GitHub page: https://github.com/adamdruppe/, he has some DOM and HTML stuff up there).

There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.
Feb 06 2015
On Friday, 6 February 2015 at 11:39:32 UTC, Chris wrote:
> If you wanna use D for XML parsing, see if you can find a solid third-party library in D (have a look at Adam's GitHub page: https://github.com/adamdruppe/, he has some DOM and HTML stuff up there).

Another place to look is http://code.dlang.org/ , which contains packages usable with DUB. There you can find KXML, for example: http://code.dlang.org/packages/kxml

> There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.

:-P I think the reason is simply that someone has to do the actual work of pushing things forward. And to make matters worse, std.xml2 is marked as abandoned, so it would first have to be brought back into shape before it can even be submitted.
Feb 06 2015
On Friday, 6 February 2015 at 11:39:32 UTC, Chris wrote:
> If you wanna use D for XML parsing, see if you can find a solid third-party library in D (have a look at Adam's GitHub page: https://github.com/adamdruppe/, he has some DOM and HTML stuff up there).

Yeah, if you're used to DOM work in JavaScript, my dom.d works in a familiar way: it offers similar attributes and methods, uses CSS selector syntax if you want, etc. You can download just that one file, then build your program like "dmd yourfile.d dom.d" and it should just work; it has no outside dependencies.

Mine can do almost any XML, but the out-of-the-box experience is focused on HTML. When combined with my characterencodings.d from the same repo, it can handle most web pages too, making it useful for scraping HTML sites.

> There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.

I don't have a problem with XML... but my own lib Works For Me (tm), so I don't personally care much about what is or isn't in Phobos.
Feb 06 2015
> my dom.d works in a familiar way

OK, will check it.

> useful for scraping html sites.

Not exactly what I'm doing, but close. I'm in the midst of a self-training spree, and what I use as test-tube fodder is the following: a collection of 300+ HTML files constituting an electronic version of a technical book. My intent is to generate a clickable table of contents by parsing the files for CSS styles specific to section headers.

The first leg of the journey was to normalize styles across the bunch. That is done, more or less. I already have a proto-TOC, but it is not entirely satisfying: it lacks handles for proper styling, and the way I arrived there is kinda brutish.

One hurdle I haven't overcome yet is that the text content, and the section headers themselves, contain some HTML tags (well, the book /is/ about HTML, among other things). For example, some section headers are rendered as two bold lines, with a fat <br/> in the middle and <b></b> around. So when I parse the payload of the <p> element, I end up with some <br/> in the middle of a sentence. Survivable, but unclean.

So yeah, I'll give it another try with your dom.d
Feb 09 2015
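One way to picture the clean-up step described above is a small helper that flattens a header's inner markup into a single line of plain text. This is only a sketch under stated assumptions: it uses std.regex rather than dom.d or std.xml, the name flattenHeader and the sample header are hypothetical, and regular expressions are a blunt tool for HTML in general (tags that appear as escaped text, such as &lt;b&gt;, are left untouched).

```d
import std.regex : regex, replaceAll;
import std.string : strip;

/// Flattens the inner markup of a section header into one line of text.
/// Hypothetical helper, not part of any library mentioned in the thread.
string flattenHeader(string html)
{
    // Treat <br>, <br/> and <br /> as a word separator.
    auto noBreaks = replaceAll(html, regex(r"<br\s*/?>", "i"), " ");
    // Drop any remaining tags, such as <b> and </b>.
    auto noTags = replaceAll(noBreaks, regex(r"<[^>]+>"), "");
    // Collapse runs of whitespace and trim both ends.
    return replaceAll(noTags, regex(r"\s+"), " ").strip;
}

void main()
{
    import std.stdio : writeln;
    // Sample header in the shape described in the post: two bold lines.
    writeln(flattenHeader("<b>Chapter 1<br/>Getting started</b>"));
}
```

With a DOM library the same effect would normally come from asking the element for its text content instead of its raw inner markup, which is likely the more robust route for 300+ files.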
On Friday, 6 February 2015 at 11:39:32 UTC, Chris wrote:
> There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.

I added XML to the GSOC ideas page (see the Phobos section), but it still needs a mentor. Are you busy this summer?

http://wiki.dlang.org/GSOC_2015_Ideas#Phobos:_D_Standard_Library
Feb 06 2015
On Friday, 6 February 2015 at 14:09:51 UTC, CraigDillabaugh wrote:
> I added XML to the GSOC ideas page (see the Phobos section), but it still needs a mentor.

Just for the record, I hate XML too, but it is VERY widely used, so good XML support is essential ... like it or not!
Feb 06 2015
On Friday, 6 February 2015 at 14:11:19 UTC, CraigDillabaugh wrote:
> Just for the record, I hate XML too, but it is VERY widely used, so good XML support is essential ... like it or not!

You're right, of course. It is widely (and wildly) used. I, for my part, have changed my input files from XML to a simpler custom format.

PS: I am busy this summer. But maybe Adam's dom.d can be used as a basis for a new module; unlike std.xml2, it's not abandoned.
Feb 06 2015
On Friday, 6 February 2015 at 14:15:44 UTC, Chris wrote:
> But maybe Adam's dom.d can be used as a basis for a new module; unlike std.xml2, it's not abandoned.

Thanks for the tip. I may add a reference there!
Feb 06 2015
On Friday, 6 February 2015 at 09:15:54 UTC, Derix wrote:
> OK, we're doing some event-based parsing, reacting with a lambda function on encountering such-and-such a tag, à la SAX. (Are we?)

Yeah.

> What I don't quite grasp is the construct (in Element e), especially the *in* part.

Function parameters in D can optionally be qualified as in or out: http://dlang.org/function.html#parameters

(in Element e) means you are taking an argument of type Element that you only intend to take in and look at. An "in" parameter is const, and you are not supposed to store a reference to it.

So basically, `in` on a function parameter means "look, don't touch".
Feb 06 2015
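To make the "look, don't touch" rule concrete, here is a minimal sketch of a handler with the same shape as the one in the question. The Element class below is a hypothetical stand-in written for this example, not the real std.xml class, and collectAuthor, rename, and the sample name are likewise made up for illustration.

```d
import std.stdio : writeln;

// Hypothetical stand-in for std.xml's Element, only for this sketch.
class Element
{
    private string content;
    this(string content) { this.content = content; }

    // A const method, so it can be called through an `in` parameter.
    string text() const { return content; }

    // A mutating method, which an `in` parameter is not allowed to call.
    void rename(string s) { content = s; }
}

struct Book { string author; }

// Runs a handler of the same shape as the one in the question.
string collectAuthor(Element element)
{
    Book book;
    void delegate(in Element) handler = (in Element e) {
        book.author = e.text();   // reading through `e` is fine
        // e.rename("x");         // rejected by the compiler: `e` is const
    };
    handler(element);
    return book.author;
}

void main()
{
    writeln(collectAuthor(new Element("Jane Doe")));
}
```

Uncommenting the e.rename line should produce a compile error, which is the point: the qualifier is checked by the compiler rather than being a mere naming convention.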
> Function parameters in D can optionally be qualified as in or out:

But of course. Actually, I kinda found out just a little while after posting the question. Asking questions is a great way to figure out the answer, so thank you for reading mine ;-)

Thank you for your answer too, which consolidates my guess and makes me think I still have some thinking to do about the life of a function parameter.

I was a bit puzzled too as to where the "Element e" comes from, and how it is that it's already instantiated and all. Well, I've just found the relevant part of the documentation. To be honest, said documentation is not always easy to navigate or to decipher. I sense some potential for progress here.
Feb 09 2015
On Friday, 6 February 2015 at 09:15:54 UTC, Derix wrote:
> So, I set sail to transform a bunch of HTML files with D. This, of course, will happen with the std.xml library.

Maybe, if you're on Windows, you could use msxml6 through COM. You have DOM, SAX, XPath 1.0 and XSLT at your disposal.
Feb 07 2015