digitalmars.D - I wrote a JSON library
- w0rp (12/12) May 07 2013 I wasn't quite satisfied with std.json or the JSON libraries in
- evilrat (5/8) May 07 2013 It looks like you reinvented the wheel. std.json already has
- deadalnix (5/14) May 07 2013 I was always unhappy with the Phobos JSON lib, mostly because of the API.
- Sean Kelly (6/13) May 07 2013 I'm not terribly happy with std.json, so this is welcome. I do wish that ...
- Piotr Szturmaj (6/8) May 07 2013 JSON without allocating.
- Sean Kelly (25/27) May 07 2013 Thanks for the link. Unfortunately, I couldn't get it to
- Jacob Carlborg (4/11) May 07 2013 That's quite a big difference.
- Jonathan M Davis (7/16) May 07 2013 Yeah. For both JSON and XML, it should be quite possible to implement a ...
- w0rp (30/37) May 07 2013 This is very interesting. This jepJson library seems to be pretty
- Sean Kelly (30/52) May 07 2013 Yes, the jep parser does no allocation at all--all callbacks
- w0rp (9/13) May 07 2013 Well this is embarrassing. I do apologise. I appear to have
- w0rp (27/27) May 07 2013 I completely missed something out there. Namely, my reasons why I
- Nick Sabalausky (6/10) May 07 2013 Parsing a simple grammar can indeed be very fun! I did that recently,
- deadalnix (9/14) May 08 2013 The API looks really nice! I'd love to see something similar in
- w0rp (2/10) May 08 2013 I think that's a good point. I'll change them immediately and
- Sönke Ludwig (4/10) May 07 2013 Just for reference, the vibe.d JSON implementation does not depend on
- Jacob Carlborg (5/6) May 07 2013 I never heard of anyone complaining about too many unit tests. I don't
- David (2/16) May 07 2013 And there is https://256.makerslocal.org/wiki/index.php/Libdjson
- Denis Shelomovskij (8/20) May 10 2013 Good luck as my personal attempt to improve std.json to at least this:
- w0rp (41/41) May 14 2013 I have been working on a few improvements to the library. First,
- deadalnix (3/11) May 14 2013 Awesome. I want to try that lib next time I have to do some code
I wasn't quite satisfied with std.json or the JSON libraries in frameworks. The standard library doesn't make it easy enough to create JSON objects, and my primary objection to the framework solutions is that they seem to depend on other parts of the frameworks. (I'd rather not depend on a host of libraries I won't be using just to use one I will.) So, desiring an easy-to-use and atomic library, I took to writing my own from scratch.

https://github.com/w0rp/dson/blob/master/json.d

I would love to hear some comments on my implementation. Criticism is mostly what I am after. It's hard for me to self-criticise. Perhaps the most obvious criticism to me is that I seem to write too damn many unit tests.
May 07 2013
On Tuesday, 7 May 2013 at 07:29:16 UTC, w0rp wrote:
> ...
> https://github.com/w0rp/dson/blob/master/json.d
> ...

It looks like you reinvented the wheel. std.json already has everything we need to read and write JSON, and it is really small. That said, I would like the idea of someone writing std.xml-like facilities for reading and writing that could be pulled into std.json.
May 07 2013
On Tuesday, 7 May 2013 at 07:52:29 UTC, evilrat wrote:
> On Tuesday, 7 May 2013 at 07:29:16 UTC, w0rp wrote:
>> ...
>> https://github.com/w0rp/dson/blob/master/json.d
>> ...
>
> It looks like you reinvented the wheel. std.json already has everything we need to read and write JSON, and it is really small. That said, I would like the idea of someone writing std.xml-like facilities for reading and writing that could be pulled into std.json.

I was always unhappy with the Phobos JSON lib, mostly because of the API. w0rp, can you provide a usage example, to show how the lib is intended to be used? Do you have any benchmarks? Sönke, same thing?
May 07 2013
I'm not terribly happy with std.json, so this is welcome. I do wish that there were a SAX-style parser available too, though, so I could parse JSON without allocating.

On May 7, 2013, at 12:52 AM, "evilrat" <evilrat666 gmail.com> wrote:
> On Tuesday, 7 May 2013 at 07:29:16 UTC, w0rp wrote:
>> ...
>> https://github.com/w0rp/dson/blob/master/json.d
>> ...
>
> It looks like you reinvented the wheel. std.json already has everything we need to read and write JSON, and it is really small. That said, I would like the idea of someone writing std.xml-like facilities for reading and writing that could be pulled into std.json.
May 07 2013
On 07.05.2013 16:53, Sean Kelly wrote:
> I'm not terribly happy with std.json, so this is welcome. I do wish that there were a SAX-style parser available too, though, so I could parse JSON without allocating.

You may find this useful: https://github.com/pszturmaj/json-streaming-parser

Don't be scared by the TODOs; they're not very relevant for normal usage. The only thing you shouldn't do is call the whole() methods more than once.
May 07 2013
On Tuesday, 7 May 2013 at 17:13:18 UTC, Piotr Szturmaj wrote:
> You may find this useful: https://github.com/pszturmaj/json-streaming-parser

Thanks for the link. Unfortunately, I couldn't get it to compile out of the box. I did use your test routine to benchmark std.json and the JSON implementation from the OP of this thread, as well as an event-based JSON parser I implemented for work, on a single parse of this large (189MB) JSON file:

https://github.com/zeMirco/sf-city-lots-json

Here are my results for one parse, where "newJson" is the OP's JSON parser and "jepJson" is mine:

    $ main
    n = 1
    Milliseconds to call stdJson() n times: 73054
    Milliseconds to call newJson() n times: 44022
    Milliseconds to call jepJson() n times: 839
    newJson() is faster than stdJson() 1.66x times
    jepJson() is faster than stdJson() 87.1x times

Now obviously, in many cases convenience is preferable to raw speed, but I think code in Phobos should be an option for both types of uses whenever possible. What I'd really like to see is the variant-type front-end layered on top of an event-based parser, so the user could just use parseJSON as-is to generate a tree of JSON objects or call the event-driven parser directly when performance is desired. I don't think the parser needs to be resumable either, since in most cases JSON is transported in an HTTP message, so a plain old recursive descent parser is fine.
May 07 2013
On 2013-05-07 20:36, Sean Kelly wrote:
> $ main
> n = 1
> Milliseconds to call stdJson() n times: 73054
> Milliseconds to call newJson() n times: 44022
> Milliseconds to call jepJson() n times: 839
> newJson() is faster than stdJson() 1.66x times
> jepJson() is faster than stdJson() 87.1x times

That's quite a big difference.

--
/Jacob Carlborg
May 07 2013
On Tuesday, May 07, 2013 20:36:19 Sean Kelly wrote:
> Now obviously, in many cases convenience is preferable to raw speed, but I think code in Phobos should be an option for both types of uses whenever possible. What I'd really like to see is the variant-type front-end layered on top of an event-based parser, so the user could just use parseJSON as-is to generate a tree of JSON objects or call the event-driven parser directly when performance is desired. I don't think the parser needs to be resumable either, since in most cases JSON is transported in an HTTP message, so a plain old recursive descent parser is fine.

Yeah. For both JSON and XML, it should be quite possible to implement a low-level API which gives you raw speed and then build more convenient APIs on top of it, thereby giving users the choice. And given how slices work, parsers like this should be able to beat the pants off of most parsers in other languages, especially with the low-level API.

- Jonathan M Davis
May 07 2013
On Tuesday, 7 May 2013 at 18:36:20 UTC, Sean Kelly wrote:
> $ main
> n = 1
> Milliseconds to call stdJson() n times: 73054
> Milliseconds to call newJson() n times: 44022
> Milliseconds to call jepJson() n times: 839
> newJson() is faster than stdJson() 1.66x times
> jepJson() is faster than stdJson() 87.1x times

This is very interesting. This jepJson library seems to be pretty fast. I imagine this library works very similarly to SAX, so you can save quite a bit simply by not having to allocate.

Before I read this, I went about creating my own benchmark. Here is a .zip containing the source and some nice-looking bar charts comparing std.json, vibe.d's json library, and my own against various arrays of objects held in memory as a string:

http://www.mediafire.com/download.php?gabsvk8ta711q4u

For those less interested in downloading and looking at the .ods file, here are the results for the largest input size (an array of 100,000 small objects):

    std.json - 2689375370 ms
    vibe.data.json - 2835431576 ms
    dson - 3705095251 ms

where 'dson' is my library. I have done my duty and made my own library look the worst in benchmarks. I think overall these are all linear-time algorithms that do very similar things, and the speed difference is very minor. As always with benchmarks, mileage may vary.

Per the request for examples of my library, I have produced this little snippet:

http://pastebin.com/sU8heFXZ

It's hard to enumerate all of the features I put in there at once, but that's a pretty good start. I also listed a few examples in a doc comment at the top of the json.d source.

The idea presented in this thread of building a nice tagged union reader (like std.json, vibe.d, and my own) on top of a recursive event (SAX-like) parser seems pretty attractive to me now. I can envision rewriting my own library to work on top of such a parser.
May 07 2013
On Tuesday, 7 May 2013 at 20:14:20 UTC, w0rp wrote:
> On Tuesday, 7 May 2013 at 18:36:20 UTC, Sean Kelly wrote:
>> $ main
>> n = 1
>> Milliseconds to call stdJson() n times: 73054
>> Milliseconds to call newJson() n times: 44022
>> Milliseconds to call jepJson() n times: 839
>> newJson() is faster than stdJson() 1.66x times
>> jepJson() is faster than stdJson() 87.1x times
>
> This is very interesting. This jepJson library seems to be pretty fast. I imagine this library works very similar to SAX, so you can save quite a bit on simply not having to allocate.

Yes, the jep parser does no allocation at all: all callbacks simply receive a slice of the value. It does full validation according to the spec, but there's no interpretation of the values beyond that either, so if you want the integer string you were passed converted to an int, for example, you'd do the conversion yourself. The same goes for unescaping of string data, and in practice I often end up unescaping the strings in-place, since I typically never need to re-parse the input buffer.

In practice, it's kind of a pain to use the jep parser for arbitrary processing, so I have some functions layered on top of it that iterate across array values and object keys:

    int foreachArrayElem(char[] buf, scope int delegate(char[] value));
    int foreachObjectField(char[] buf, scope int delegate(char[] name, char[] value));

This works basically the same as opApply, so having the delegate return a nonzero value causes parsing to abort and return that value from the foreach routine. The parser is sufficiently fast that I generally just nest calls to these foreach routines to parse complex types, even though this results in multiple passes across the same data.

The only other thing I was careful to do is design the library in such a way that each parser callback could call a corresponding writer routine to simply pass through the input to an output buffer. This makes auto-reformatting a breeze, because you just set a "format output" flag on the writer and implement a few one-line functions.

> Before I read this, I went about creating my own benchmark. Here is a .zip containing the source and some nice looking bar charts comparing std.json, vibe.d's json library, and my own against various arrays of objects held in memory as a string:
>
> http://www.mediafire.com/download.php?gabsvk8ta711q4u
>
> For those less interested in downloading and looking at the .ods file, here are the results for the largest input size. (Array of 100,000 small objects)
>
> std.json - 2689375370 ms
> vibe.data.json - 2835431576 ms
> dson - 3705095251 ms

These results don't seem correct. Is this really milliseconds?
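Editorial sketch: the opApply-like protocol described above can be illustrated with a small stand-in. This is not the jep parser, whose source is not shown in the thread. It keeps only the foreachArrayElem signature quoted in the post; the body, the -1 return for non-array input, and the assumption that the input is already well-formed JSON (no validation is done) are all hypothetical. Each element is handed to the delegate as a slice of the input buffer, and a nonzero return from the delegate stops the scan.

```d
import std.string : strip;

/// Hypothetical stand-in for the helper described in the post: hands each
/// top-level element of a JSON array to dg as a slice of buf (no copying).
/// Assumes buf is well-formed JSON; this sketch performs no validation.
int foreachArrayElem(char[] buf, scope int delegate(char[] value) dg)
{
    buf = buf.strip;
    if (buf.length < 2 || buf[0] != '[' || buf[$ - 1] != ']')
        return -1; // hypothetical error code: input is not an array

    auto inner = buf[1 .. $ - 1];
    size_t start = 0;      // start of the element currently being scanned
    int depth = 0;         // nesting depth of [] and {} inside the array
    bool inString = false; // true while inside a "..." literal
    bool escaped = false;  // true right after a backslash inside a string

    foreach (i, char c; inner)
    {
        if (inString)
        {
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') inString = false;
            continue;
        }
        switch (c)
        {
            case '"': inString = true; break;
            case '[', '{': ++depth; break;
            case ']', '}': --depth; break;
            case ',':
                if (depth == 0)
                {
                    // opApply convention: nonzero aborts and is passed back.
                    if (auto r = dg(inner[start .. i].strip)) return r;
                    start = i + 1;
                }
                break;
            default: break;
        }
    }

    auto last = inner[start .. $].strip;
    if (last.length)
        if (auto r = dg(last)) return r;
    return 0;
}

unittest
{
    char[] input = `[1, "a,b", [2, 3], {"k": 4}]`.dup;
    string[] seen;
    foreachArrayElem(input, (char[] v) { seen ~= v.idup; return 0; });
    assert(seen == ["1", `"a,b"`, "[2, 3]", `{"k": 4}`]);
}
```

Nesting a second call on the "[2, 3]" slice would walk the inner array, which is the multiple-pass style the post mentions. The block can be exercised with dmd -unittest -main.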
May 07 2013
>> std.json - 2689375370 ms
>> vibe.data.json - 2835431576 ms
>> dson - 3705095251 ms
>
> These results don't seem correct. Is this really milliseconds?

Well, this is embarrassing. I do apologise. I appear to have printed the TickDuration object value itself instead of the milliseconds. I think I spent too much time writing the benchmark and too little looking at the actual results. I ran it again quickly, correcting the error (.msecs), and got much more reasonable-looking results on a size of 1,000:

    std.json : 7370 ms
    vibe.data.json : 6878 ms
    json : 9150 ms
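Editorial sketch: the slip described above (printing a raw tick value instead of milliseconds) is easy to avoid by converting explicitly at one place. The helper below is hypothetical and is not taken from the benchmark in the thread. It also assumes a current Phobos, where std.datetime.stopwatch.StopWatch.peek returns a core.time.Duration, rather than the 2013-era TickDuration with its .msecs property.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical helper: runs work() n times and returns the elapsed
/// wall-clock time in whole milliseconds.
long millisecondsFor(scope void delegate() work, size_t n)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. n)
        work();
    sw.stop();
    // Convert explicitly; printing the duration object itself is how a
    // value in the wrong unit ends up in a results table.
    return sw.peek.total!"msecs";
}

unittest
{
    int calls;
    auto ms = millisecondsFor({ ++calls; }, 1_000);
    assert(calls == 1_000);
    assert(ms >= 0);
}
```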
May 07 2013
I completely missed something out there. Namely, my reasons why I just didn't like the existing implementations enough. Overall, the other libraries are all very similar, so I don't have major complaints, just little ones.

For vibe.d, it's actually pretty close to what I wanted. My big objection is that I don't like the 'Undefined' types. I would rather experience runtime errors in those cases. I also have to pretty much depend on Vibe to use it, rather than just a JSON library. Aside from that, it's not far off from what I'm after.

For Libdjson, it uses classes to represent JSON types. That just seems very awkward to use, and that shouts out "unnecessary garbage creation" to me.

The standard library (std.json) seems to nail the parsing of JSON, but lacks the ability to write a JSON string to an output range, and doesn't really offer any conveniences for working with the JSON data structure itself.

std.json, vibe.d, and my own representation of JSON are all very similar. They are tagged unions implemented with union {} and an enum. What makes vibe.d and my own library nice is all of the operator overloads, properties, and convenience functions. Another issue with std.json is the lack of pretty-printing, which both vibe.d and my own library address. (Mine has toJSON!4 and writeJSON!8 for a string indented by 4 characters and writing to an output range indented by 8 characters, respectively.)

So that's essentially my rationale. Overall, writing the library was mostly done because I found it to be a rather entertaining challenge for myself.
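Editorial sketch: the "tagged union implemented with union {} and an enum" layout mentioned above looks roughly like the following. The type and member names are hypothetical and are not the API of dson, std.json, or vibe.d. Only two payload kinds are modelled to keep it short. The accessors throw on a tag mismatch, which matches the stated preference for runtime errors over an 'Undefined' value.

```d
import std.exception : enforce;

/// Hypothetical minimal JSON-like value: an enum tag plus an overlapping
/// payload. Illustrative names only, not any library's real API.
struct Value
{
    enum Type { null_, integer, text }

    Type tag = Type.null_;
    union
    {
        long integerPayload;
        string textPayload;
    }

    this(long n) { tag = Type.integer; integerPayload = n; }
    this(string s) { tag = Type.text; textPayload = s; }

    /// Checked accessors: reading the wrong kind is a runtime error.
    long integer() const
    {
        enforce(tag == Type.integer, "value is not an integer");
        return integerPayload;
    }

    string text() const
    {
        enforce(tag == Type.text, "value is not a string");
        return textPayload;
    }
}

unittest
{
    import std.exception : assertThrown;

    auto n = Value(42);
    assert(n.tag == Value.Type.integer);
    assert(n.integer == 42);

    auto s = Value("foo");
    assert(s.text == "foo");
    assertThrown(s.integer); // wrong tag: throws instead of returning a default
}
```

Operator overloads and convenience properties of the kind the post praises would be layered on top of a core like this.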
May 07 2013
On Tue, 07 May 2013 23:09:35 +0200 "w0rp" <devw0rp gmail.com> wrote:
> So that's essentially my rationale. Overall, writing the library was mostly done because I found it to be a rather entertaining challenge for myself.

Parsing a simple grammar can indeed be very fun! I did that recently, too (not JSON though), partly to try my hand at LL for a change, and had a blast. Designing and implementing a good API can actually be the hard/tedious part (well, and the unittests can be pretty tedious).
May 07 2013
On Tuesday, 7 May 2013 at 20:14:20 UTC, w0rp wrote:
> Per request for examples of my library, I have produced this little snippet. http://pastebin.com/sU8heFXZ It's hard to enumerate all of the features I put in there at once, but that's a pretty good start. I also listed a few examples in a doc comment at the top of the json.d source.

The API looks really nice! I'd love to see something similar in Phobos, API-wise.

But I don't like the abbreviated names. With arr instead of array you only save 2 chars! That is nothing, and certainly not worth the confusion. The same goes for obj instead of object.

With this kind of practice, everybody comes up with their own set of abbreviations, and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.
May 08 2013
> The API looks really nice! I'd love to see something similar in Phobos, API-wise.
>
> But I don't like the abbreviated names. With arr instead of array you only save 2 chars! That is nothing, and certainly not worth the confusion. The same goes for obj instead of object.
>
> With this kind of practice, everybody comes up with their own set of abbreviations, and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.

I think that's a good point. I'll change them immediately and push to GitHub.
May 08 2013
On Wednesday, 8 May 2013 at 21:05:55 UTC, w0rp wrote:
>> The API looks really nice! I'd love to see something similar in Phobos, API-wise.
>>
>> But I don't like the abbreviated names. With arr instead of array you only save 2 chars! That is nothing, and certainly not worth the confusion. The same goes for obj instead of object.
>>
>> With this kind of practice, everybody comes up with their own set of abbreviations, and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.
>
> I think that's a good point. I'll change them immediately and push to GitHub.

Awesome. Another nice thing you can do is to use alias this on a property to allow for implicit conversion to int.

Overall, the API is super nice! If performance doesn't matter, I definitely recommend using the lib.
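Editorial sketch: the alias this suggestion above works as follows in isolation. The Count type and its asInt property are hypothetical names used only for illustration. A real JSON value would also have to decide what the property does when the value does not hold a number.

```d
/// Hypothetical wrapper showing alias this on a property.
struct Count
{
    int stored;

    this(int n) { stored = n; }

    @property int asInt() const { return stored; }

    // Wherever a Count is used but an int is required, the compiler
    // rewrites the expression to use asInt.
    alias asInt this;
}

unittest
{
    auto c = Count(7);
    int plain = c;      // implicit conversion via alias this
    assert(plain == 7);
    assert(c + 1 == 8); // arithmetic goes through the property too
}
```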
May 08 2013
On Thursday, 9 May 2013 at 01:42:41 UTC, deadalnix wrote:
> Awesome. Another nice thing you can do is to use alias this on a property to allow for implicit conversion to int.
>
> Overall, the API is super nice! If performance doesn't matter, I definitely recommend using the lib.

I'll have to experiment with the alias this idea. There are still a few things I need to work out. I'm missing an overload for opCmp (plus the host of math operators), and the append behaviour is perhaps strange. I had to choose between ~ meaning a JSON array is added to the LHS, [] ~ [1, 2] == [[1, 2]], or an array is concatenated, like the normal D arrays, [] ~ [1, 2] == [1, 2]. I went with the former for now, but I might have made the wrong choice. It all came about because of this:

    auto arr = jsonArray();
    arr ~= 1; // [1]
    arr ~= "foo"; // [1, "foo"]
    arr ~= jsonArray(); // Currently: [1, "foo", []]
    auto another = jsonArray();
    another ~= 3;
    arr.array ~= another.array; // Always: [1, "foo", [], 3]

I swear that I wrote a concat(JSON, JSON) function for this, but it's not there. That would have accomplished this:

    arr.concat(another)
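Editorial sketch: the two meanings of ~ discussed above can be kept apart by giving ~= the "add one element" meaning and putting splicing behind a named method. The Node type below is a hypothetical stand-in, not dson's JSON type, and its concat is only a guess at what the missing concat(JSON, JSON) function might have done.

```d
/// Hypothetical stand-in for a JSON value that is either a number or an array.
struct Node
{
    bool isArray;
    long number;
    Node[] items;

    static Node array() { return Node(true); }

    /// arr ~= x adds x as ONE element, even when x is itself an array.
    ref Node opOpAssign(string op : "~", T)(T rhs)
    {
        static if (is(T == Node))
            items ~= rhs;
        else
            items ~= Node(false, rhs);
        return this;
    }

    /// Splices the elements of another array onto the end of this one.
    ref Node concat(Node other)
    {
        items ~= other.items;
        return this;
    }
}

unittest
{
    auto arr = Node.array();
    arr ~= 1;            // [1]
    arr ~= Node.array(); // nested: [1, []]
    assert(arr.items.length == 2 && arr.items[1].isArray);

    auto another = Node.array();
    another ~= 3;
    arr.concat(another); // spliced: [1, [], 3]
    assert(arr.items.length == 3 && arr.items[2].number == 3);
}
```

With this split, neither reading of ~ has to be guessed at the call site.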
May 09 2013
This entire thread is a really good example of why all new modules should live in exp. for a year after birth before moving to std...

On 9 May 2013 17:21, w0rp <devw0rp gmail.com> wrote:
> On Thursday, 9 May 2013 at 01:42:41 UTC, deadalnix wrote:
>> Awesome. Another nice thing you can do is to use alias this on a property to allow for implicit conversion to int.
>>
>> Overall, the API is super nice! If performance doesn't matter, I definitely recommend using the lib.
>
> I'll have to experiment with the alias this idea. There are still a few things I need to work out. I'm missing an overload for opCmp (plus the host of math operators), and the append behaviour is perhaps strange. I had to choose between ~ meaning a JSON array is added to the LHS, [] ~ [1, 2] == [[1, 2]], or an array is concatenated, like the normal D arrays, [] ~ [1, 2] == [1, 2]. I went with the former for now, but I might have made the wrong choice. It all came about because of this:
>
>     auto arr = jsonArray();
>     arr ~= 1; // [1]
>     arr ~= "foo"; // [1, "foo"]
>     arr ~= jsonArray(); // Currently: [1, "foo", []]
>     auto another = jsonArray();
>     another ~= 3;
>     arr.array ~= another.array; // Always: [1, "foo", [], 3]
>
> I swear that I wrote a concat(JSON, JSON) function for this, but it's not there. That would have accomplished this:
>
>     arr.concat(another)
May 09 2013
On 07.05.2013 09:29, w0rp wrote:
> I wasn't quite satisfied with std.json or the JSON libraries in frameworks. The standard library doesn't make it easy enough to create JSON objects, and my primary objection to the framework solutions is that they seem to depend on other parts of the frameworks. (I'd rather not depend on a host of libraries I won't be using just to use one I will.)

Just for reference, the vibe.d JSON implementation does not depend on other parts of the library: https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/data/json.d
May 07 2013
On 2013-05-07 09:29, w0rp wrote:
> Perhaps the most obvious criticism to me is that I seem to write too damn many unit tests.

I have never heard of anyone complaining about too many unit tests. I don't see it as a problem.

--
/Jacob Carlborg
May 07 2013
On 07.05.2013 09:29, w0rp wrote:
> I wasn't quite satisfied with std.json or the JSON libraries in frameworks. The standard library doesn't make it easy enough to create JSON objects, and my primary objection to the framework solutions is that they seem to depend on other parts of the frameworks. (I'd rather not depend on a host of libraries I won't be using just to use one I will.) So, desiring an easy-to-use and atomic library, I took to writing my own from scratch.
>
> https://github.com/w0rp/dson/blob/master/json.d
>
> I would love to hear some comments on my implementation. Criticism is mostly what I am after. It's hard for me to self-criticise. Perhaps the most obvious criticism to me is that I seem to write too damn many unit tests.

And there is https://256.makerslocal.org/wiki/index.php/Libdjson
May 07 2013
On 07.05.2013 11:29, w0rp wrote:
> I wasn't quite satisfied with std.json or the JSON libraries in frameworks. The standard library doesn't make it easy enough to create JSON objects, and my primary objection to the framework solutions is that they seem to depend on other parts of the frameworks. (I'd rather not depend on a host of libraries I won't be using just to use one I will.) So, desiring an easy-to-use and atomic library, I took to writing my own from scratch.
>
> https://github.com/w0rp/dson/blob/master/json.d
>
> I would love to hear some comments on my implementation. Criticism is mostly what I am after. It's hard for me to self-criticise. Perhaps the most obvious criticism to me is that I seem to write too damn many unit tests.

Good luck, as my personal attempt to improve std.json to at least this:
https://github.com/D-Programming-Language/phobos/pull/1206#issuecomment-14826562
got stuck on even this simple pull:
https://github.com/D-Programming-Language/phobos/pull/1263

--
Денис В. Шеломовский
Denis V. Shelomovskij
May 10 2013
I have been working on a few improvements to the library. First, I made a few performance tweaks. Aside from very small (and therefore hard to describe) tweaks, I made two major improvements.

1. Manual parsing of numbers has been implemented.
2. When the input is a string, the indices and the length are used instead of the array InputRange functions.

The first one is a dangerous idea, but my unit tests show that at least what I have tested works. The reasoning behind it is that before, a string buffer (Appender!string) was created for numbers, and then one of parse!long or parse!real was chosen based upon whether or not the parser figured out it was an integer. Now it will read the input into actual numbers as it goes and then spit out an integer or a floating point number after it hits the end. You need to put a helmet on, but there's less allocation along the way. Perhaps this idea could be better encapsulated at some point with an std.conv function which accepts a range and returns a tagged union.

The second improvement is actually pretty nice, because I already wrapped the input range functions in methods anyway, so it was a simple matter of inserting 'static if ... else' to flip the string optimisation on. Perhaps it is better in general to wrap strings in range structs than to rely on std.array's range functions.

The end result is that I can now cheat at my own benchmark.

---
Ran for 100 runs
std.json : 674 ms
vibe.data.json : 604 ms
json : 548 ms
---

I updated the benchmark slightly to match some function renaming (plus to correct my earlier embarrassing omission of .msecs); it is here:

http://pastebin.com/KciFit4b

It's not a complete test of speed, and as always with benchmarks, mileage will vary.

In addition to these things, I made a few of the property and function names a little nicer, and generally improved on the documentation, which currently looks a little like this:

http://www.mediafire.com/?q5lwtj2cc22s1t0

I apologise for my current lack of hosting. (I plan to correct this at a later date, perhaps with a website written in D!)
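Editorial sketch: the "read the input into actual numbers as it goes" idea from the first improvement might look like the following. This is not the code from json.d. The Number struct and parseNumber are hypothetical names, the input is assumed to be an already-validated JSON number, and there is no overflow checking or careful rounding, which is part of why the post calls the approach dangerous.

```d
import std.ascii : isDigit;
import std.math : pow;

/// Hypothetical result: either an integer or a floating point value.
struct Number
{
    bool isInteger;
    long integer;
    double floating;
}

/// Accumulates digits directly, without building an intermediate string.
/// Assumes s is a well-formed JSON number; no overflow checks are made.
Number parseNumber(const(char)[] s)
{
    size_t i = 0;
    bool negative = false;
    if (i < s.length && s[i] == '-') { negative = true; ++i; }

    long whole = 0;
    while (i < s.length && isDigit(s[i]))
        whole = whole * 10 + (s[i++] - '0');

    // No fraction or exponent followed: the value is an integer.
    if (i == s.length)
        return Number(true, negative ? -whole : whole, 0.0);

    double value = whole;
    if (s[i] == '.')
    {
        ++i;
        double scale = 0.1;
        while (i < s.length && isDigit(s[i]))
        {
            value += (s[i++] - '0') * scale;
            scale /= 10;
        }
    }
    if (i < s.length && (s[i] == 'e' || s[i] == 'E'))
    {
        ++i;
        bool expNegative = false;
        if (i < s.length && (s[i] == '+' || s[i] == '-'))
        {
            expNegative = s[i] == '-';
            ++i;
        }
        int exponent = 0;
        while (i < s.length && isDigit(s[i]))
            exponent = exponent * 10 + (s[i++] - '0');
        value *= pow(10.0, expNegative ? -exponent : exponent);
    }
    return Number(false, 0, negative ? -value : value);
}

unittest
{
    import std.math : fabs;

    auto a = parseNumber("-123");
    assert(a.isInteger && a.integer == -123);

    auto b = parseNumber("1.5e2");
    assert(!b.isInteger && fabs(b.floating - 150.0) < 1e-9);
}
```

The floating point path accumulates rounding error with each fractional digit, so a production version would more likely fall back to std.conv for non-integers.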
May 14 2013
On Tuesday, 14 May 2013 at 20:23:42 UTC, w0rp wrote:
> It's not a complete test of speed, and as always with benchmarks, mileage will vary. In addition to these things, I made a few of the property and function names a little nicer, and generally improved on the documentation, which currently looks a little like this. http://www.mediafire.com/?q5lwtj2cc22s1t0 I apologise for my current lack of hosting. (I plan to correct this at a later date, perhaps with a website written in D!)

Awesome. I want to try that lib next time I have to do some code involving JSON.
May 14 2013