digitalmars.D.learn - XML Parsing
- Chris Pons (38/38) Mar 19 2012 Hey Guys,
- Adam D. Ruppe (66/66) Mar 19 2012 I know very little about std.xml (I looked at it and
- Chris Pons (2/73) Mar 20 2012 Thank you. I'll check it out.
- Iain (34/58) May 18 2012 Hi Adam,
- Adam D. Ruppe (4/6) May 18 2012 You have to link in the modules too on the command line
- Iain (2/8) May 18 2012 Aah thank you! Finally, an XML parser that works in D!!!
- Iain (33/34) May 18 2012 Adam, thanks for this! I guess you don't need much documentation
- Adam D. Ruppe (9/14) May 18 2012 Yeah, that's basically how I feel about it. I started writing
Hey Guys,

I am trying to parse an XML document with std.xml. I've looked over the std.xml reference as well as the example, but I'm still stuck. I've also looked over some example code, but it's a bit confusing and doesn't entirely help explain what I'm doing wrong.

As far as I understand it, I should load a file with read in std.file and save that into a string. From there, I check to make sure the string xmlData is in a proper XML format.

This is where it gets a bit confusing. I followed the example, created a new instance of the DocumentParser class, and then tried to parse an attribute from the start tag "map". The value I'm targeting right now is the width of the map in tiles, which I want to save into an integer. However, the value I get is 0.

Any help would be MUCH appreciated.

Here is a reference to the XML file: http://pastebin.com/tpUU1Wtv

//These two functions are called in my main loop.
void LoadMap(string filename)
{
    enforce( filename != "" , "Filename is invalid!" );

    xmlData = cast(string) read(filename);
    enforce( xmlData != "", "Read file Failed!" );

    debug StopWatch sw = StopWatch(AutoStart.yes);
    check(xmlData);
    debug writeln( "Verified XML in ", sw.peek.msecs, "ms.");
}

void ParseMap()
{
    auto xml = new DocumentParser(xmlData);

    xml.onStartTag["map"] = (ElementParser xml)
    {
        mapWidth = to!int(xml.tag.attr["width"]);
        xml.parse();
    };
    xml.parse();

    writeln("Map Width: ", mapWidth);
}
Mar 19 2012
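One likely reason the handler in the question never runs, offered as a sketch and not a confirmed diagnosis: in std.xml the DocumentParser constructor already consumes the root start tag, so onStartTag handlers only fire for elements nested inside the root. Since map is the root element of the file, its attributes can be read from xml.tag directly. The inline XML string and the readMapSize helper below are hypothetical stand-ins for the pastebin file, and the code assumes a Phobos release that still ships std.xml.

```d
import std.conv : to;
import std.stdio : writeln;
import std.xml;

// Hypothetical helper: returns [width, height] read from the root element.
int[2] readMapSize(string xmlData)
{
    // Throws CheckException if the text is not well-formed XML.
    check(xmlData);

    // The constructor parses up to and including the root start tag,
    // so the root's attributes are already available on xml.tag.
    auto xml = new DocumentParser(xmlData);

    int[2] size = [to!int(xml.tag.attr["width"]),
                   to!int(xml.tag.attr["height"])];
    return size;
}

void main()
{
    // Hypothetical sample standing in for the real map file.
    string xmlData = `<?xml version="1.0"?>` ~
        `<map width="25" height="19"><layer name="ground"></layer></map>`;

    auto size = readMapSize(xmlData);
    writeln("Map size: ", size[0], "x", size[1]);
}
```

Handlers registered through onStartTag remain the way to reach child elements such as layer; they are invoked once xml.parse() is called on the parser.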
I know very little about std.xml (I looked at it and said 'meh' and wrote my own lib), but my lib makes this pretty simple.

https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

grab dom.d and characterencodings.d

This has a bit of an html bias, but it works for xml too.

===
import arsd.dom;
import std.file;
import std.stdio;
import std.conv;

void main() {
    auto document = new Document(readText("test12.xml"), true, true);

    auto map = document.requireSelector("map");

    writeln(to!int(map.width), "x", to!int(map.height));

    foreach(tile; document.getElementsByTagName("tile"))
        writeln(tile.gid);
}
===

$ dmd test12.d dom.d characterencodings.d
$ test12
25x19
<snip tile data>

Let me explain the lines:

auto document = new Document(readText("test12.xml"), true, true);

We use std.file.readText to read the file as a string. Document's constructor is: (string data, bool caseSensitive, bool strictMode).

So, "true, true" means it will act like an XML parser, instead of trying to correct for html tag soup.

Now, document is a DOM, like you see in W3C or web browsers (via javascript), though it is expanded with a lot of convenience and sugar.

auto map = document.requireSelector("map");

querySelector and requireSelector use CSS selector syntax to fetch one element. querySelector may return null, whereas requireSelector will throw an exception if the element is not found.

You can learn more about CSS selector syntax on the web. I tried to cover a good chunk of the standard, including most css2 and some css3.

Here, I'm asking for the first element with tag name "map".

You can also use querySelectorAll to get all the elements that match, returned as an array, which is great for looping.

writeln(to!int(map.width), "x", to!int(map.height));

The attributes on an element are exposed via dot syntax, or you can use element.getAttribute("name") if you prefer.

They are returned as strings. Using std.conv.to, we can easily convert them to integers.

foreach(tile; document.getElementsByTagName("tile"))
    writeln(tile.gid);

And finally, we get all the tile tags in the document and print out their gid attribute.

Note that you can also call the element search functions on individual elements. That will only return that element and its children.

Here, you didn't need it, but you can also use element.innerText to get the text inside a tag, pretty much covering basic data retrieval.

Note: my library is not good at handling huge files; it eats a good chunk of memory and loads the whole document at once. But, it is the easiest way I've seen (I'm biased though) to work with xml files, so I like it.
Mar 19 2012
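The accessors described in the reply above but not shown in its example (querySelector, querySelectorAll, getAttribute and innerText) can be sketched as follows. This assumes dom.d and characterencodings.d are compiled in as described; the inline XML, the note element and the tileGids helper are hypothetical and only serve the illustration.

```d
import arsd.dom;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: collects the gid attribute of every <tile>
// that sits somewhere inside a <layer>, using a CSS descendant selector.
string[] tileGids(Document document)
{
    string[] gids;
    foreach (tile; document.querySelectorAll("layer tile"))
        gids ~= tile.getAttribute("gid");
    return gids;
}

void main()
{
    // Hypothetical sample data, not the file from the question.
    string xml = `<map width="25" height="19">` ~
        `<layer name="ground"><tile gid="1"/><tile gid="2"/></layer>` ~
        `<note>hello</note></map>`;

    // true, true: case sensitive and strict, i.e. behave like an XML parser.
    auto document = new Document(xml, true, true);

    // requireSelector throws if nothing matches.
    auto map = document.requireSelector("map");
    int width = to!int(map.getAttribute("width"));
    int height = to!int(map.getAttribute("height"));
    writeln(width * height, " tiles expected");

    // querySelector returns null instead of throwing.
    if (document.querySelector("objectgroup") is null)
        writeln("no objectgroup element");

    writeln(tileGids(document));

    // innerText yields the text inside a tag.
    writeln(document.requireSelector("note").innerText);
}
```

Choosing between querySelector and requireSelector comes down to whether a missing element is an expected case to branch on or an error worth an exception.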
On Tuesday, 20 March 2012 at 04:32:13 UTC, Adam D. Ruppe wrote:
> I know very little about std.xml (I looked at it and said 'meh' and wrote my own lib), but my lib makes this pretty simple.
> [...]

Thank you. I'll check it out.
Mar 20 2012
On Tuesday, 20 March 2012 at 04:32:13 UTC, Adam D. Ruppe wrote:
> I know very little about std.xml (I looked at it and said 'meh' and wrote my own lib), but my lib makes this pretty simple.
> [...]
> $ dmd test12.d dom.d characterencodings.d
> $ test12
> 25x19
> <snip tile data>

Hi Adam,

I'm also interested in your solution, as the std.xml page is so sparsely documented I can't make head nor tail of it. Also, neither of the examples compile for me, making life that little bit harder!

Sadly, I can't get your code working either! I have downloaded the folder zip from your github link, and extracted it so that all the .d files are living in C:\D\dmd2\src\phobos\arsd\

If I try to compile the code you gave above, I get a pile of linking errors using D 2.059:

C:\D\dmd2\windows\bin\dmd.exe parseSpain -O
OPTLINK (R) for Win32 Release 8.00.12
Copyright (C) Digital Mars 1989-2010 All rights reserved.
http://www.digitalmars.com/ctg/optlink.html
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom12__ModuleInfoZ
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom8__assertFiZv
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom24ElementNotFoundException7__ClassZ
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom24ElementNotFoundException6__ctorMFAyaAyaAyaiZC4arsd3dom24ElementNotFoundException
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom8Document6__ctorMFAyabbZC4arsd3dom8Document
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom8Document7__ClassZ
--- errorlevel 6

Do you have any idea what's going on?!
May 18 2012
On Friday, 18 May 2012 at 23:08:59 UTC, Iain wrote:
> If I try to compile the code you gave above, I get a pile of linking errors using D 2.059:

You have to link in the modules too on the command line:

dmd.exe parseSpain arsd/dom.d arsd/characterencodings.d

(or whatever the full path to the modules is)
May 18 2012
On Friday, 18 May 2012 at 23:16:26 UTC, Adam D. Ruppe wrote:
> On Friday, 18 May 2012 at 23:08:59 UTC, Iain wrote:
>> If I try to compile the code you gave above, I get a pile of linking errors using D 2.059:
>
> You have to link in the modules too on the command line
>
> dmd.exe parseSpain arsd/dom.d arsd/characterencodings.d
>
> (or whatever the full path to the modules is)

Aah thank you! Finally, an XML parser that works in D!!!
May 18 2012
On Friday, 18 May 2012 at 23:31:05 UTC, Iain wrote:
> Aah thank you! Finally, an XML parser that works in D!!!

Adam, thanks for this! I guess you don't need much documentation for your code, as you can just look up the wealth of tutorials that have been written for Javascript's XML parser.

I have re-jigged one of std.xml's examples as follows - and it works!

If there were a vote (and there probably should be) I would suggest your code ought to replace std.xml. How can D be taken seriously when it has major parts of the standard library broken?

/*
 * read all the titles from book.xml
 *
 * uses dom.d and characterencodings.d by Adam D. Ruppe:
 * https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff
 */
import arsd.dom;
import std.file;
import std.stdio;
import std.conv;

void main()
{
    // http://msdn2.microsoft.com/en-us/library/ms762271(VS.85).aspx
    auto document = new Document(readText("book.xml"), true, true);

    auto map = document.requireSelector("catalog");

    foreach (book; document.getElementsByTagName("book"))
    {
        string title = book.getElementsByTagName("title")[0].innerText();
        writeln(title);
    }
}
May 18 2012
On Saturday, 19 May 2012 at 00:00:50 UTC, Iain wrote:
> I guess you don't need much documentation for your code, as you can just look up the wealth of tutorials that have been written for Javascript's XML parser.

Yeah, that's basically how I feel about it. I started writing some documentation but haven't gotten around to finishing it yet.

But, if you know Javascript, you can probably get work done with my thing too.

> If there were a vote (and there probably should be) I would suggest your code ought to replace std.xml.

This has come up before and some people are for it, but my code isn't built for speed or memory efficiency, so it isn't right for everybody.
May 18 2012