digitalmars.D.learn - Save JSONValue binary in file?
- Chopin (6/6) Oct 12 2012 Hello!
- Piotr Szturmaj (4/10) Oct 12 2012 Try this implementation:
- Chopin (7/7) Oct 12 2012 Thanks! I tried using it:
- Piotr Szturmaj (9/15) Oct 12 2012 If you're sure that content is an array:
- Sean Kelly (50/55) Oct 12 2012 The performance problem is because std.json works like a DOM parser for ...
- Jacob Carlborg (10/26) Oct 13 2012 I tried JSON parser in Tango, using D2, this is the results I got for a
Hello! I have a 109 MB JSON file that I read, and it takes over 32 seconds for parseJSON() to finish it. So I was wondering if there is a way to save it as binary or something like that so I can read it back super fast? Thanks for all suggestions :)
Oct 12 2012
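One common answer to the binary question: parse the JSON once, copy the fields you actually need into plain structs, and dump those to disk with rawWrite; later runs can rawRead them back without touching the JSON at all. A minimal sketch, assuming the records can be flattened into a fixed-size struct (Record and its fields here are placeholders, not anything from the thread):

    import std.stdio;

    // Placeholder for whatever fixed-size fields the JSON actually holds.
    struct Record { int id; double score; }

    // Parse the JSON once, then cache the extracted records as raw bytes.
    void saveCache(Record[] records, string path)
    {
        auto f = File(path, "wb");
        size_t[1] len = [records.length];
        f.rawWrite(len[]);           // length prefix
        if (records.length)
            f.rawWrite(records);     // one contiguous block of structs
    }

    // Later runs skip parseJSON entirely and just slurp the cache back.
    Record[] loadCache(string path)
    {
        auto f = File(path, "rb");
        size_t[1] len;
        f.rawRead(len[]);
        auto records = new Record[len[0]];
        if (records.length)
            f.rawRead(records);
        return records;
    }

Note this only works for POD data (no pointers or GC references inside the struct), and the cache is tied to one machine's endianness and struct layout.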
Chopin wrote:
> Hello! I have a 109 MB JSON file that I read, and it takes over 32
> seconds for parseJSON() to finish it. So I was wondering if there is a
> way to save it as binary or something like that so I can read it back
> super fast? Thanks for all suggestions :)

Try this implementation: https://github.com/pszturmaj/json-streaming-parser
You can parse everything into memory or do streaming-style parsing.
Oct 12 2012
Thanks! I tried using it:

    auto document = parseJSON(content).array; // this works with std.json :)

Using json.d from the link:

    auto j = JSONReader!string(content);
    auto document = j.value.whole.array; // this doesn't... "Error: undefined identifier 'array'"
Oct 12 2012
Chopin wrote:
> Using json.d from the link:
>
>     auto j = JSONReader!string(content);
>     auto document = j.value.whole.array;
>
> this doesn't... "Error: undefined identifier 'array'"

If you're sure that content is an array:

    auto j = JSONReader!string(content);
    auto jv = j.value.whole;
    assert(jv.type == JSONType.array);
    auto jsonArray = jv.as!(JSONValue[]);

Alternatively, you can replace the last line with:

    alias JSONValue[] JSONArray;
    auto jsonArray = jv.as!JSONArray;
Oct 12 2012
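Putting the two replies together, a sketch of the in-memory path. I'm assuming here that each element of the converted array is itself a JSONValue supporting the same type and as! calls shown above; check the library's README for the actual element API:

    auto j = JSONReader!string(content);
    auto jv = j.value.whole;
    assert(jv.type == JSONType.array);

    // Assumption: elements convert through the same as! call used on jv.
    foreach (elem; jv.as!(JSONValue[]))
    {
        // inspect elem.type, convert with elem.as!T, etc.
    }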
On Oct 12, 2012, at 9:40 AM, Chopin <robert.bue gmail.com> wrote:
> I have a 109 MB JSON file that I read, and it takes over 32 seconds for
> parseJSON() to finish it. So I was wondering if there is a way to save
> it as binary or something like that so I can read it back super fast?

The performance problem is because std.json works like a DOM parser for XML--it allocates a node per value in the JSON stream. What we really need is something that works more like a SAX parser, with the DOM version as an optional layer built on top. Just for kicks, I grabbed the fourth (largest) JSON blob from here:

http://www.json.org/example.html

then wrapped it in array tags and duplicated the object until I had a ~350 MB input file, i.e. [ paste, paste, paste, … ]. Then I parsed it via this test app, based on an example in a SAX-style JSON parser I wrote in C:

    import core.stdc.stdlib;
    import core.sys.posix.unistd;
    import core.sys.posix.sys.stat;
    import core.sys.posix.fcntl;
    import std.json;

    void main()
    {
        auto filename = "input.txt\0".dup;

        // Slurp the whole file into a malloc'd buffer via raw POSIX calls.
        stat_t st;
        stat(filename.ptr, &st);
        auto sz = st.st_size;
        auto buf = cast(char*) malloc(sz);
        auto fh = open(filename.ptr, O_RDONLY);
        read(fh, buf, sz);

        // All the time goes here: building the JSONValue tree.
        auto json = parseJSON(buf[0 .. sz]);
    }

Here are my results:

    $ dmd -release -inline -O dtest
    $ ll input.txt
    -rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
    $ time dtest
    real    1m36.462s
    user    1m32.468s
    sys     0m1.102s

Then I ran my SAX-style parser example on the same input file:

    $ make example
    cc example.c -o example lib/release/myparser.a
    $ time example
    real    0m2.191s
    user    0m1.944s
    sys     0m0.241s

So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream. Note that the D app used gigabytes of memory to process this file--I believe the total VM footprint was around 3.5 GB--while my app used a fixed amount roughly equal to the size of the input file. In short, DOM-style parsers are great for small data and terrible for large data.
Oct 12 2012
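For readers wondering what the SAX-style alternative looks like: instead of building a tree, the parser calls back into user code as it scans, so memory use stays flat no matter how large the input is. A hypothetical interface, sketched in D--this is not Sean's actual C API, just the shape of the idea:

    // Hypothetical event sink: the parser invokes these callbacks as it
    // scans, allocating nothing per value. A caller that only needs a
    // few fields can simply ignore the rest of the events.
    interface JsonSink
    {
        void onObjectStart();
        void onObjectEnd();
        void onArrayStart();
        void onArrayEnd();
        void onKey(const(char)[] name);     // slices into the input buffer
        void onString(const(char)[] value);
        void onNumber(double value);
        void onBool(bool value);
        void onNull();
    }

    // A parser built this way would be driven as:
    //   parse(inputSlice, mySink);
    // touching each byte once and never allocating a node tree.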
On 2012-10-13 01:26, Sean Kelly wrote:
> Here are my results:
>
>     $ dmd -release -inline -O dtest
>     $ ll input.txt
>     -rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
>     $ time dtest
>     real    1m36.462s
>     user    1m32.468s
>     sys     0m1.102s
>
> Then I ran my SAX-style parser example on the same input file:
>
>     $ make example
>     cc example.c -o example lib/release/myparser.a
>     $ time example
>     real    0m2.191s
>     user    0m1.944s
>     sys     0m0.241s
>
> So clearly the problem isn't parsing JSON in general but rather
> generating an object tree for a large input stream. Note that the D app
> used gigabytes of memory to process this file--I believe the total VM
> footprint was around 3.5 GB--while my app used a fixed amount roughly
> equal to the size of the input file. In short, DOM-style parsers are
> great for small data and terrible for large data.

I tried the JSON parser in Tango, using D2. These are the results I got for a file just below 360 MB:

    real    1m2.848s
    user    0m58.321s
    sys     0m1.423s

Since the XML parser in Tango is so fast, I expected more from the JSON parser as well. But I have no idea what kind of parser the JSON parser uses.

-- 
/Jacob Carlborg
Oct 13 2012