digitalmars.D - XML Benchmarks in D
- Scott Sanders (7/7) Mar 12 2008 I have done some benchmarks of the D xml parsers alongside C/C++/Java pa...
- Sean Kelly (2/2) Mar 12 2008 Nice work!
- Walter Bright (2/7) Mar 12 2008 Reddit link: http://reddit.com/r/programming/info/6bt6n/comments/
- N/A (8/12) Mar 12 2008 impact.
- Sean Kelly (4/6) Mar 12 2008 I believe the suggested approach in this case is to access the input as ...
- N/A (8/14) Mar 12 2008 input?
- Scott Sanders (7/23) Mar 12 2008 Should be able to:
- N/A (3/9) Mar 13 2008 Thanks,
- Scott Sanders (3/15) Mar 13 2008 PullParser is exactly the same, just swap Document!(char) with PullParse...
- BCS (7/18) Mar 13 2008 what might be interesting is to make a version that works with slices of...
- Kris (2/21) Mar 13 2008 It would be interesting, but isn't that kinda what memory-mapped files p...
- BCS (14/29) Mar 13 2008 Not as I understand it (I looked this up about a year ago so I'm a bit r...
- Kris (4/24) Mar 13 2008 Reply to BCS:
- Alexander Panek (6/37) Mar 14 2008 I've got this strange feeling in my stomach that shouts out "WTF?!" when...
- Koroskin Denis (6/43) Mar 14 2008 It sounds strange, but even large companies like Google or Yahoo store
- Alexander Panek (2/15) Mar 14 2008 That does, indeed, sound strange. :X
- Robert Fraser (2/6) Mar 14 2008 It's a shame the "O RLY?" owl died out years ago...
- Jarrett Billingsley (4/10) Mar 14 2008 O RLY?
- Bruno Medeiros (6/13) Mar 23 2008 SRSLY?
- Christopher Wright (2/15) Mar 23 2008 You know Sir Sly?
- Sean Kelly (5/42) Mar 14 2008 It's quite possible that an XML stream could be used as the transport me...
- BCS (6/29) Mar 14 2008 Truth be told, I'm not that far from agreeing with you (on seeing that I...
I have done some benchmarks of the D xml parsers alongside C/C++/Java parsers, and as you can see from the graphs, D is rocking with Tango!

http://dotnot.org/blog/index.php

I wanted to post to let the D community know that good language and library design can really make an impact.

As always, I am open to comments/changes/additions, etc. I will be happy to run any other project code through the benchmark if someone submits a patch to me containing the code.

And Walter, I am trying to use "D Programming Language" everywhere I can :)

Cheers,
Scott Sanders
Mar 12 2008
Scott Sanders wrote:
> I have done some benchmarks of the D xml parsers alongside C/C++/Java parsers, and as you can see from the graphs, D is rocking with Tango!
> http://dotnot.org/blog/index.php

Reddit link: http://reddit.com/r/programming/info/6bt6n/comments/
Mar 12 2008
== Quote from Scott Sanders (scott stonecobra.com)'s article
> I have done some benchmarks of the D xml parsers alongside C/C++/Java parsers, and as you can see from the graphs, D is rocking with Tango!
> http://dotnot.org/blog/index.php
> I wanted to post to let the D community know that good language and library design can really make an impact.
> As always, I am open to comments/changes/additions, etc. I will be happy to run any other project code through the benchmark if someone submits a patch to me containing the code.

The charts look great. I generally handle files that are a few hundred MB to a few gigs, and I noticed that the input is a char[]. Do you also plan on adding file streams as input?

N/A
Mar 12 2008
== Quote from N/A (NA NA.na)'s article
> I generally handle files that are a few hundred MB to a few gigs, and I noticed that the input is a char[]. Do you also plan on adding file streams as input?

I believe the suggested approach in this case is to access the input as a memory-mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.

Sean
Mar 12 2008
== Quote from Sean Kelly (sean invisibleduck.org)'s article
> == Quote from N/A (NA NA.na)'s article
>> I generally handle files that are a few hundred MB to a few gigs, and I noticed that the input is a char[]. Do you also plan on adding file streams as input?
> I believe the suggested approach in this case is to access the input as a memory-mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.
> Sean

Any examples on how to approach this using Tango?

Cheers,
N/A
Mar 12 2008
N/A Wrote:
> == Quote from Sean Kelly (sean invisibleduck.org)'s article
>> == Quote from N/A (NA NA.na)'s article
>>> I generally handle files that are a few hundred MB to a few gigs, and I noticed that the input is a char[]. Do you also plan on adding file streams as input?
>> I believe the suggested approach in this case is to access the input as a memory-mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.
>> Sean
> Any examples on how to approach this using Tango?
> Cheers,
> N/A

Should be able to:

auto fc = new FileConduit ("test.txt");
auto buf = new MappedBuffer(fc);
auto doc = new Document!(char);
doc.parse(buf.getContent());

That should do it.
Mar 12 2008
> Should be able to:
>
> auto fc = new FileConduit ("test.txt");
> auto buf = new MappedBuffer(fc);
> auto doc = new Document!(char);
> doc.parse(buf.getContent());
>
> That should do it.

Thanks. I was wondering how to do it using the PullParser.

Cheers
Mar 13 2008
N/A Wrote:
>> Should be able to:
>>
>> auto fc = new FileConduit ("test.txt");
>> auto buf = new MappedBuffer(fc);
>> auto doc = new Document!(char);
>> doc.parse(buf.getContent());
>>
>> That should do it.
> Thanks. I was wondering how to do it using the PullParser.

PullParser is exactly the same, just swap Document!(char) with PullParser!(char).

Scott
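Spelled out, the pull-parsing version of the example above looks roughly like this. This is only a sketch, not verified against any particular Tango release: the module paths and the next/type/localName/value members are recalled from the Tango API of that era and may not match your version exactly.

import tango.io.FileConduit;
import tango.io.MappedBuffer;
import tango.text.xml.PullParser;

void main ()
{
    // map the file and hand its content to the pull parser,
    // mirroring the Document!(char) example above
    auto fc = new FileConduit ("test.txt");
    auto buf = new MappedBuffer (fc);
    auto parser = new PullParser!(char) (cast(char[]) buf.getContent());

    // step through the tokens; next yields a "done" value at end of input
    while (parser.next)
    {
        // examine parser.type, parser.localName, parser.value as needed
    }
}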
Mar 13 2008
Sean Kelly wrote:
> == Quote from N/A (NA NA.na)'s article
>> I generally handle files that are a few hundred MB to a few gigs, and I noticed that the input is a char[]. Do you also plan on adding file streams as input?
> I believe the suggested approach in this case is to access the input as a memory-mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.
> Sean

What might be interesting is to make a version that works with slices of the file rather than RAM (make the current version into a template specialized on char[] and the new one on some new type?). That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
Mar 13 2008
BCS Wrote:
> Sean Kelly wrote:
>> == Quote from N/A (NA NA.na)'s article
>>> I generally handle files that are a few hundred MB to a few gigs, and I noticed that the input is a char[]. Do you also plan on adding file streams as input?
>> I believe the suggested approach in this case is to access the input as a memory-mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.
>> Sean
> What might be interesting is to make a version that works with slices of the file rather than RAM (make the current version into a template specialized on char[] and the new one on some new type?). That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.

It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Mar 13 2008
Reply to kris,

> BCS Wrote:
>> What might be interesting is to make a version that works with slices of the file rather than RAM (make the current version into a template specialized on char[] and the new one on some new type?). That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
> It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?

Not as I understand it (I looked this up about a year ago, so I'm a bit rusty). On 32 bits you can't map in 4GB, because you need space for the program's code (and on Windows you only get 3GB of address space, as the OS gets that last GB). Also, what about a 10GB file?

My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving strings when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file.

I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
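For concreteness, here is a toy cut at the indexing half of that idea. It is only a sketch: it fakes the mapping with seek/read rather than real OS memory mapping, it leaves out slicing and the GC-ish unmapping, the name WindowedFile is invented, and it is written against present-day Phobos (std.stdio) rather than the Tango used elsewhere in this thread.

import std.stdio;

// Toy file-backed "array": indexing pulls a small window of the file
// into memory on demand, so only the current window (plus whatever
// metadata a parser keeps) lives in RAM at any one time.
class WindowedFile
{
    private File file;
    private ubyte[] window;        // the piece currently held in RAM
    private ulong windowStart;     // file offset of window[0]
    private enum windowSize = 64 * 1024;

    this (string path)
    {
        file = File(path, "rb");
    }

    ulong length ()
    {
        return file.size;
    }

    // Array-style access: remap the window when the index falls outside it.
    ubyte opIndex (ulong i)
    {
        if (window.length == 0 || i < windowStart
            || i >= windowStart + window.length)
        {
            windowStart = (i / windowSize) * windowSize;
            file.seek(windowStart);
            window = file.rawRead(new ubyte[windowSize]); // returns the filled slice
        }
        return window[cast(size_t)(i - windowStart)];
    }
}

void main ()
{
    auto big = new WindowedFile("huge.xml");
    // Touch bytes anywhere in a multi-GB file without loading all of it.
    writefln("%s bytes, first byte: 0x%02x, last byte: 0x%02x",
             big.length, big[0], big[big.length - 1]);
}

A real version would hand out slice objects holding a ref to the parent, as described above, but the on-demand opIndex is the core of it.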
Mar 13 2008
"BCS" <ao pathlink.com> wrote in message news:55391cb32a6178ca5358fd65a320 news.digitalmars.com...
> Reply to kris,
>> It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
> Not as I understand it (I looked this up about a year ago, so I'm a bit rusty). On 32 bits you can't map in 4GB, because you need space for the program's code (and on Windows you only get 3GB of address space, as the OS gets that last GB).

Doh. You're right, of course. Thank goodness for 64bit machines :)
Mar 13 2008
BCS wrote:
> Reply to kris,
>> It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
> Not as I understand it (I looked this up about a year ago, so I'm a bit rusty). On 32 bits you can't map in 4GB, because you need space for the program's code (and on Windows you only get 3GB of address space, as the OS gets that last GB). Also, what about a 10GB file?
> My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving strings when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file.
> I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).

I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "ifs" and "whens", but still; if you find yourself needing such a beast of an XML file, you might want to consider other forms of data structuring (a database, perhaps?).
Mar 14 2008
On Fri, 14 Mar 2008 11:40:20 +0300, Alexander Panek <alexander.panek brainsware.org> wrote:
> BCS wrote:
>> Not as I understand it (I looked this up about a year ago, so I'm a bit rusty). On 32 bits you can't map in 4GB, because you need space for the program's code (and on Windows you only get 3GB of address space, as the OS gets that last GB). Also, what about a 10GB file?
>> My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving strings when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file.
>> I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
> I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "ifs" and "whens", but still; if you find yourself needing such a beast of an XML file, you might want to consider other forms of data structuring (a database, perhaps?).

It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
Mar 14 2008
Koroskin Denis wrote:
> On Fri, 14 Mar 2008 11:40:20 +0300, Alexander Panek <alexander.panek brainsware.org> wrote:
>> I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "ifs" and "whens", but still; if you find yourself needing such a beast of an XML file, you might want to consider other forms of data structuring (a database, perhaps?).
> It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.

That does, indeed, sound strange. :X
Mar 14 2008
Koroskin Denis wrote:
> It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.

It's a shame the "O RLY?" owl died out years ago...
Mar 14 2008
"Robert Fraser" <fraserofthenight gmail.com> wrote in message news:freg27$1m7l$1 digitalmars.com...Koroskin Denis wrote:O RLY? Good internet memes never die, they just go into hibernation ;)It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just ordinary daily index) before they get "repacked" into compacter format.It's a shame the "O RLY?" owl died out years ago...
Mar 14 2008
Robert Fraser wrote:
> Koroskin Denis wrote:
>> It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
> It's a shame the "O RLY?" owl died out years ago...

SRSLY? :P

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Mar 23 2008
Bruno Medeiros wrote:
> Robert Fraser wrote:
>> Koroskin Denis wrote:
>>> It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
>> It's a shame the "O RLY?" owl died out years ago...
> SRSLY? :P

You know Sir Sly?
Mar 23 2008
== Quote from Alexander Panek (alexander.panek brainsware.org)'s article
> BCS wrote:
>> Not as I understand it (I looked this up about a year ago, so I'm a bit rusty). On 32 bits you can't map in 4GB, because you need space for the program's code (and on Windows you only get 3GB of address space, as the OS gets that last GB). Also, what about a 10GB file?
>> My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving strings when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file.
>> I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
> I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "ifs" and "whens", but still; if you find yourself needing such a beast of an XML file, you might want to consider other forms of data structuring (a database, perhaps?).

It's quite possible that an XML stream could be used as the transport mechanism for the result of a database query. In such an instance, I wouldn't be at all surprised if a response were more than 3-4GB. In fact, I've designed such a system, and the proper query would definitely have produced such a dataset.

Sean
Mar 14 2008
Reply to Alexander,

> BCS wrote:
>> Not as I understand it (I looked this up about a year ago, so I'm a bit rusty). On 32 bits you can't map in 4GB, because you need space for the program's code (and on Windows you only get 3GB of address space, as the OS gets that last GB). Also, what about a 10GB file?
>> My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving strings when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file.
>> I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
> I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "ifs" and "whens", but still; if you find yourself needing such a beast of an XML file, you might want to consider other forms of data structuring (a database, perhaps?).

Truth be told, I'm not that far from agreeing with you (on seeing that, I'd think: "WTF?!?!.... Um... OoooK.... well..."). I can't think of a justification for the lib I described if the only thing it would be used for would be an XML parser. It might be used for managing parts of something like... a database table. <G>
Mar 14 2008