digitalmars.D - Writing XML
- Tomek Sowiński (76/76) Feb 06 2011 While I'm circling the problem of parsing, I took a quick look at writin...
- Jacob Carlborg (6/63) Feb 06 2011 This seems to be like the Ruby "builder" library, which is a library I
- Andrei Alexandrescu (9/10) Feb 06 2011 That's great. I won't be able to add much because I haven't worked with
- Rainer Schuetze (11/26) Feb 06 2011 This looks nice and compact Using opDispatch to specify the tag (I guess...
- Jonathan M Davis (11/49) Feb 06 2011 Actually, using opDispatch in that manner would become a big problem onc...
- spir (18/27) Feb 06 2011 Call opDispatch directly ;-)
- Tomek Sowiński (12/24) Feb 06 2011
- Christopher Nicholson-Sauls (9/21) Feb 08 2011 Might I suggest changing the sugar to have a suffix? Ie, instead of
- Russel Winder (15/16) Feb 08 2011 using xml.bookTag would ruin it for me. I'd use something that allowed
- Russel Winder (72/100) Feb 07 2011 I am coming in half way through a thread, apologies if I am saying
- Russel Winder (18/18) Feb 07 2011 A couple of other random thoughts regarding XML:
- spir (9/66) Feb 06 2011 When does one need to write by hand, in source, structured data needing ...
- Ary Manzana (4/9) Feb 08 2011 If you have a website API that exposes its data via XML, you would like
- spir (7/18) Feb 08 2011 No, in his example, the data were hardcoded as plain constant in source ...
While I'm circling the problem of parsing, I took a quick look at writing, so as not to get stuck in analysis-paralysis. Writing XML is pretty independent from parsing and an order of magnitude easier to solve. It was perfect to get myself coding. These are the guidelines I followed:

* Memory minimalism: don't force allocating an intermediate node structure just to push a few tags down the wire.
* Composability: operating on an arbitrary string output range.
* Robustness: tags should not be left open, even if the routine producing tag interior throws.
* Simplicity of syntax: resembling real XML if possible.
* Space efficiency / readability: can write tightly (without indents and newlines) for faster network transfer and, having an easy means for temporary tight writing, for better readability.
* Ease of use:
  - automatic to!string of non-string values,
  - automatic string escaping according to XML standard,
  - handle nulls: close the tags short (<tag/>), don't write attributes with null values at all.
* anything else?

The new writer meets pretty much all of the above. Here's an example to get a feel of it:

auto books = [
    Book([Name("Grębosz", "Jerzy")], "Pasja C++", 1999),
    Book([Name("Navin", "Robert", "N.")], "Mathemetics of Derivatives", 2007),
    Book([Name("Tokarczuk", "Olga")], "Podróż ludzi Księgi", 1996),
    Book([Name("Graham", "Ronald", "L."),
          Name("Knuth", "Donald", "E."),
          Name("Patashnik", "Oren")], "Matematyka Konkretna", 2008)
];

auto outputRange = ... ;
auto xml = xmlWriter(outputRange);

xml.comment(books.length, " favorite books of mine.");
foreach (book; books) {
    xml.book("year", book.year, {
        foreach (author; book.authors) {
            xml.tight.authorName({
                xml.first(author.first);
                xml.middle(author.middle);
                xml.last(author.last);
            });
        }
        xml.tight.title(book.title);
    });
}

--------------------------------- program output ---------------------------------

<!-- 4 favorite books of mine.
-->
<book year="1999">
  <authorName><first>Jerzy</first><middle/><last>Grębosz</last></authorName>
  <title>Pasja C++</title>
</book>
<book year="2007">
  <authorName><first>Robert</first><middle>N.</middle><last>Navin</last></authorName>
  <title>Mathemetics of Derivatives</title>
</book>
<book year="1996">
  <authorName><first>Olga</first><middle/><last>Tokarczuk</last></authorName>
  <title>Podróż ludzi Księgi</title>
</book>
<book year="2008">
  <authorName><first>Ronald</first><middle>L.</middle><last>Graham</last></authorName>
  <authorName><first>Donald</first><middle>E.</middle><last>Knuth</last></authorName>
  <authorName><first>Oren</first><middle/><last>Patashnik</last></authorName>
  <title>Matematyka Konkretna</title>
</book>

Questions and comments?

-- 
Tomek
Feb 06 2011
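Editorial sketch: the guidelines in the post above (no intermediate node tree, writing to an arbitrary output range, tags closed even when the content routine throws, automatic to!string and escaping, short form for empty tags) can be illustrated with a few lines of D. This is not Tomek's implementation, which the thread never shows. The names SketchWriter, escapeXml and renderBook are hypothetical, and attributes, indentation and the "tight" mode are left out.

```d
import std.array : appender;
import std.conv : to;
import std.range.primitives : put;

/// Escapes the five characters that XML treats specially.
string escapeXml(string s)
{
    auto buf = appender!string();
    foreach (char c; s)
    {
        switch (c)
        {
            case '&':  buf.put("&amp;");  break;
            case '<':  buf.put("&lt;");   break;
            case '>':  buf.put("&gt;");   break;
            case '"':  buf.put("&quot;"); break;
            case '\'': buf.put("&apos;"); break;
            default:   buf.put(c);        break;
        }
    }
    return buf.data;
}

/// Hypothetical writer: streams tags straight to an output range,
/// without building a node structure first.
struct SketchWriter(Sink)
{
    Sink* sink;

    /// Writes <name>...</name>, or the short form <name/> without content.
    void tag(string name, void delegate() content = null)
    {
        if (content is null)
        {
            put(*sink, "<" ~ name ~ "/>");
            return;
        }
        put(*sink, "<" ~ name ~ ">");
        // Runs on normal exit and when content() throws,
        // so the tag is closed either way.
        scope (exit) put(*sink, "</" ~ name ~ ">");
        content();
    }

    /// Converts any value to a string and writes it escaped.
    void text(T)(T value)
    {
        put(*sink, escapeXml(to!string(value)));
    }
}

/// Usage example: renders one book into a string.
string renderBook(string title, int year)
{
    auto sink = appender!string();
    auto xml = SketchWriter!(typeof(sink))(&sink);
    xml.tag("book", {
        xml.tag("title", { xml.text(title); });
        xml.tag("year", { xml.text(year); });
        xml.tag("middle"); // no content: short form
    });
    return sink.data;
}
```

The scope (exit) statement is what carries the robustness guideline here: the closing tag is emitted during stack unwinding as well as on normal return. The writer only holds a pointer to the sink, so any type accepted by put() could stand in for the appender.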
On 2011-02-06 15:43, Tomek Sowiński wrote:
> While I'm circling the problem of parsing, I took a quick look at writing not to get stuck in analysis-paralysis.
> [ . . . ]
> Questions and comments?

This seems to be like the Ruby "builder" library, which is a library I like. This is a perfect candidate for the syntax sugar I've proposed for passing delegates to a function after the parameter list.

-- 
/Jacob Carlborg
Feb 06 2011
On 2/6/11 9:43 AM, Tomek Sowiński wrote:
> While I'm circling the problem of parsing, I took a quick look at writing not to get stuck in analysis-paralysis.

That's great. I won't be able to add much because I haven't worked with XML so I don't know what people need. The example looks nice and clean.

Generally, laziness may be a tactic you could use to help with memory use. A good example is split() vs. splitter(). The "er" version offers one element at a time, thus never forcing an allocation. The split() version must do all the work upfront and also allocate a structure for depositing the output.

Andrei
Feb 06 2011
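Editorial sketch: the split() versus splitter() contrast Andrei mentions can be seen in a small D example. The two function names below are made up for illustration. Both count comma-separated fields; the first materialises an array of slices, the second walks a lazy range.

```d
import std.algorithm.iteration : splitter;
import std.array : split;

/// Eager: split() builds the whole array of fields before anything else happens.
size_t countFieldsEager(string line)
{
    string[] fields = line.split(",");
    return fields.length;
}

/// Lazy: splitter() yields one field at a time; no array of fields is allocated.
size_t countFieldsLazy(string line)
{
    size_t n = 0;
    foreach (field; line.splitter(","))
        ++n;
    return n;
}
```

Both return the same count; the difference is only in whether an intermediate array exists, which is the same trade-off as writing XML directly to an output range instead of building a node tree first.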
Tomek Sowiński wrote:
> auto xml = xmlWriter(outputRange);
>
> xml.comment(books.length, " favorite books of mine.");
> foreach (book; books) {
>     xml.book("year", book.year, {
>         foreach (author; book.authors) {
>             xml.tight.authorName({
>                 xml.first(author.first);
>                 xml.middle(author.middle);
>                 xml.last(author.last);
>             });
>         }
>         xml.tight.title(book.title);
>     });
> }

This looks nice and compact.

Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?

How do you write a tag named "tight"? Or a tag calculated at runtime?

Something more conventional would be

xml.tag("book", attr("year", book.year), { ...

but I'm not sure that pairing the attribute name and value adds readability or mere noise.

Rainer
Feb 06 2011
On Sunday 06 February 2011 13:59:19 Rainer Schuetze wrote:
> Tomek Sowiński wrote:
>> auto xml = xmlWriter(outputRange);
>> [ . . . ]
>
> This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?
>
> How do you write a tag named "tight"? Or a tag calculated at runtime?
>
> Something more conventional would be
>
> xml.tag("book", attr("year", book.year), { ...
>
> but I'm not sure that pairing the attribute name and value adds readability or mere noise.

Actually, using opDispatch in that manner would become a big problem once you tried to write an XML tag whose name matches any function that xml itself has. It really doesn't sound like a good idea and really doesn't provide much benefit - if any - as far as I can see. It's so simple to just take the tag name as a string that I see no reason to do otherwise.

- Jonathan M Davis
Feb 06 2011
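Editorial sketch: the clash Jonathan describes follows from how D resolves member access. opDispatch is only consulted when no ordinary member matches, so a tag that shares its name with a writer method can never be reached through the sugar. The ClashWriter type below is hypothetical and does no escaping; it only demonstrates the lookup rule.

```d
struct ClashWriter
{
    string output;

    /// An ordinary member function of the writer.
    void comment(string text)
    {
        output ~= "<!--" ~ text ~ "-->";
    }

    /// Fallback for every name that is not a member.
    void opDispatch(string name)(string text)
    {
        output ~= "<" ~ name ~ ">" ~ text ~ "</" ~ name ~ ">";
    }
}

string demoClash()
{
    ClashWriter w;
    w.title("x");   // no member "title": goes through opDispatch
    w.comment("x"); // the member wins: an XML comment, not a <comment> tag
    return w.output;
}
```

Every method added to the writer later would silently take one more tag name away from the sugar, which is the maintenance problem raised in the post.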
On 02/06/2011 10:59 PM, Rainer Schuetze wrote:
> This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?

About readability, I for one had to think really hard ;-)

> How do you write a tag named "tight"? Or a tag calculated at runtime?

Call opDispatch directly ;-)

> Something more conventional would be
>
> xml.tag("book", attr("year", book.year), { ...

Would prefer that by far. (Even if a few chars more verbose: who cares, the code here is actually data description, by definition done once and for all?)

> but I'm not sure that pairing the attribute name and value adds readability or mere noise.

This raises the same famous issue (repeatedly pointed out) as Lua's tables: since they are both objects and collections (and both arrays and AAs, by the way), there is no way to tell attributes (members) apart from elements (collection data) when needed. t.count and t["count"] could /both/ mean the attribute 'count' or the element whose key is "count". Too bad. non-distinction of arrays and AAs, which prevents development of good builtin functionality for each, because of conflicting requirements.)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com
Feb 06 2011
Rainer Schuetze wrote:
> This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?
>
> How do you write a tag named "tight"? Or a tag calculated at runtime?

xml.tag("tight", attributes..., { make content });

That's the base implementation. opDispatch is just syntax sugar over it.

> Something more conventional would be
>
> xml.tag("book", attr("year", book.year), { ...
>
> but I'm not sure that pairing the attribute name and value adds readability or mere noise.

Putting name and value without a wrapper tuple is just sugar. Having some sort of structure representing an attribute is inevitable once we come to namespaces. In the end it should accept any range of (namespace-)name-value tuples as attributes.

-- 
Tomek
Feb 06 2011
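Editorial sketch: the layering Tomek describes, a string-based tag() as the base with opDispatch as thin sugar on top, might look roughly like this. SugarWriter and demoSugar are hypothetical names, the signature is simplified to a single text argument (no attributes, no delegate content), and nothing is escaped.

```d
struct SugarWriter
{
    string output;

    /// Base implementation: the tag name is an ordinary runtime string.
    void tag(string name, string text)
    {
        output ~= "<" ~ name ~ ">" ~ text ~ "</" ~ name ~ ">";
    }

    /// Sugar: w.book("x") is rewritten to w.opDispatch!"book"("x"),
    /// which simply forwards to tag().
    void opDispatch(string name)(string text)
    {
        tag(name, text);
    }
}

string demoSugar(string runtimeName)
{
    SugarWriter w;
    w.book("Pasja C++");     // compile-time name through the sugar
    w.tag("tight", "x");     // a name the real writer reserves: use tag()
    w.tag(runtimeName, "y"); // a name only known at run time: use tag()
    return w.output;
}
```

Because the sugar adds no behaviour of its own, both of Rainer's questions (reserved names and runtime names) are answered by falling back to the base call.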
On 02/06/11 18:18, Tomek Sowiński wrote:
> Rainer Schuetze wrote:
>> This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?
>>
>> How do you write a tag named "tight"? Or a tag calculated at runtime?
>
> xml.tag("tight", attributes..., { make content });
>
> That's the base implementation. opDispatch is just syntax sugar over it.

Might I suggest changing the sugar to have a suffix? I.e., instead of xml.book(...) as sugar for xml.tag("book",...), make it xml.bookTag(...) (or something similar). Very easy to check for using an if condition, and in the event that some XML application actually has a "tag" tag... well, xml.tagTag() might look funny, but at least it'd work. Could also support "_tag" as an alternate suffix for those with such sensibilities; xml.book_tag(...).

-- Chris N-S
Feb 08 2011
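Editorial sketch: the "if condition" Christopher refers to can be written as a template constraint on opDispatch, so that only names ending in the suffix are treated as tags. SuffixWriter and demoSuffix are hypothetical names; the "_tag" variant is omitted and no escaping is done.

```d
struct SuffixWriter
{
    string output;

    /// Base implementation taking the tag name as a string.
    void tag(string name, string text)
    {
        output ~= "<" ~ name ~ ">" ~ text ~ "</" ~ name ~ ">";
    }

    /// Sugar restricted to names that end in "Tag"; the suffix is
    /// stripped at compile time before forwarding to tag().
    void opDispatch(string name)(string text)
        if (name.length > 3 && name[$ - 3 .. $] == "Tag")
    {
        tag(name[0 .. $ - 3], text);
    }
}

string demoSuffix()
{
    SuffixWriter w;
    w.bookTag("Pasja C++"); // -> <book>...</book>
    w.tagTag("works");      // -> <tag>...</tag>, no clash with tag()
    // Names without the suffix no longer compile as sugar.
    static assert(!__traits(compiles, w.book("x")));
    return w.output;
}
```

The constraint keeps the writer's own method names and the tag namespace apart, at the cost of the extra suffix that the next reply objects to.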
On Wed, 2011-02-09 at 00:16 -0600, Christopher Nicholson-Sauls wrote:
[ . . . ]
> xml.book(...) as sugar for xml.tag("book",...) make it xml.bookTag(...)

Using xml.bookTag would ruin it for me. I'd use something that allowed just book, xml.book is already not enough of a DSL.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Feb 08 2011
I am coming in half way through a thread, apologies if I am saying something that has already been said or is not relevant.

On Sun, 2011-02-06 at 22:59 +0100, Rainer Schuetze wrote:
> Tomek Sowiński wrote:
>> auto xml = xmlWriter(outputRange);
>>
>> xml.comment(books.length, " favorite books of mine.");
>> foreach (book; books) {
>>     xml.book("year", book.year, {
>>         foreach (author; book.authors) {
>>             xml.tight.authorName({
>>                 xml.first(author.first);
>>                 xml.middle(author.middle);
>>                 xml.last(author.last);
>>             });
>>         }
>>         xml.tight.title(book.title);
>>     });
>> }

This looks to be heading down the road Groovy trod 6 years ago with the MarkupBuilder, indeed the whole builders concept. Validation that Groovy's builders framework is a good idea is that Ruby took up the idea wholesale and Python is starting to as well. It seems the idea may fly in D as well even though it is a very different form of meta-object protocol (MOP).

Basically Groovy (and Ruby, and Python) allow you to get rid of the xml. in the above code, and that makes the function calls and closures work much better as a DSL for describing the markup. This relies on a MOP of course, since it relies on the function despatch being redefinable.

> This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?

Experience from Groovy, Ruby and Python is a strong yes. Having the tag name as the name of the function with attributes as keyword parameters, string content as an unnamed parameter and nested tag content in a closure leads to beautiful HTML, XHTML, XML, . . . production. Well-formedness is guaranteed, though you can still generate invalid documents.

Groovy's MarkupBuilder makes for very nice computation of webpages. Here is a real example of generating a part of my website:

def writer = new StringWriter ( )
( new MarkupBuilder ( writer ) ).html {
  head {
    'meta' ( 'http-equiv' : 'Content-Type' , content : 'text/html; charset=UTF-8' )
    title ( 'Dr Russel Winder — A Short Biography' )
    link ( rel : 'stylesheet' , href : 'style.css' , type : 'text/css' )
  }
  body {
    div ( id : 'main' )
    h1 ( 'Concertant Articles by Russel Winder' )
    ul {
      evaluate ( new File ( 'concertantArticles.groovy' ) ).each { item ->
        li ( "${ extractPageTitle ( ( new File ( System.properties.'user.home' + concertantWebpagesSourceDirectory + '/Articles/' + item[0] ) ).text ) }, ${item[2][0..9]}, " ) {
          a ( href : "http://www.concertant.com/Articles/${item[0]}" , item[1] )
        }
      }
    }
    h1 ( 'Articles about SC08 for Concertant by Russel Winder' )
    ul {
      evaluate ( new File ( 'concertantSupercomputing2008Articles.groovy' ) ).each { item ->
        li ( "${ extractPageTitle ( ( new File ( System.properties.'user.home' + concertantWebpagesSourceDirectory + '/Supercomputing2008Articles/' + item[0] ) ).text ) }, ${item[2][0..9]}, " ) {
          a ( href : "http://www.concertant.com/Supercomputing2008Articles/${item[0]}", item[1] )
        }
      }
    }
  }
}

> How do you write a tag named "tight"? Or a tag calculated at runtime?
>
> Something more conventional would be
>
> xml.tag("book", attr("year", book.year), { ...
>
> but I'm not sure that pairing the attribute name and value adds readability or mere noise.

I don't see this being anything like as useful as what Groovy et al. already have via the MarkupBuilder. "Conventional" is not really the way to go for this; you need the full DSL approach. At least in my opinion, which I think is becoming the norm in the dynamic programming language arena.

-- 
Russel.
Feb 07 2011
A couple of other random thoughts regarding XML:

1. Groovy has XmlSlurper, which is a surprisingly fast way of reading XML and processing it as needed. It was developed for fast SAX-underneath, document-based but not-W3C-DOM processing of multi-gigabyte XML documents.

   http://groovy.codehaus.org/api/groovy/util/XmlSlurper.html

2. Python seems to be going down the lxml route. lxml is a Python binding to libxml2 and libxslt.

   http://codespeak.net/lxml/

-- 
Russel.
Feb 07 2011
On 02/06/2011 03:43 PM, Tomek Sowiński wrote:
> While I'm circling the problem of parsing, I took a quick look at writing not to get stuck in analysis-paralysis.
> [ . . . ]
> Questions and comments?

When does one need to write by hand, in source, structured data needing to be serialised into XML (or any other format)? In my (admittedly very limited) experience, such data are always outputs of some processing (if only reading from another file).

denis
Feb 06 2011
On 2/6/11 8:32 PM, spir wrote:
> When does one need to write by hand, in source, structured data needing to be serialised into XML (or any other format)? In my (admittedly very limited) experience, such data are always outputs of some processing (if only reading from another file).
>
> denis

If you have a website API that exposes its data via XML, you would like to generate it like that.

What do you mean "outputs of some processing"? As far as I know, the code that Tomek showed is "some processing". :-P
Feb 08 2011
On 02/08/2011 07:44 PM, Ary Manzana wrote:
> On 2/6/11 8:32 PM, spir wrote:
>> When does one need to write by hand, in source, structured data needing to be serialised into XML (or any other format)? In my (admittedly very limited) experience, such data are always outputs of some processing (if only reading from another file).
>>
>> denis
>
> If you have a website API that exposes its data via XML, you would like to generate it like that.
>
> What do you mean "outputs of some processing"? As far as I know, the code that Tomek showed is "some processing". :-P

No, in his example, the data were hard-coded as plain constants in source code.

Denis
Feb 08 2011