digitalmars.D - Lazily parse a JSON text file using stdx.data.json?
- David Gileadi (17/17) Dec 16 2017 I'm a longtime fan of dlang, but haven't had a chance to do much
- Jonathan M Davis (22/39) Dec 17 2017 I don't know what problems specifically you were hitting, but a lot of
- Steven Schveighoffer (12/20) Dec 17 2017 There is an even more work-in-progress library built on that, but it's
- Dr.No (4/18) May 22 2018 Does this cause an infinite loop?
- Steven Schveighoffer (13/16) May 22 2018 Possibly. Bug reports are welcome :) I think on this line, it will make
- WebFreak001 (5/15) Dec 17 2017 uh I don't know about stdx.data.json but if you didn't manage to
- David Gileadi (4/10) Dec 17 2017 Thanks, reading the whole file into memory worked fine. However, asdf
- Marco Leise (7/18) Dec 30 2017 There is also the JSON parser from
- David Gileadi (9/12) Jan 01 2018 Nice, I'll take a look.
I'm a longtime fan of dlang, but haven't had a chance to do much in-depth dlang programming, and especially not range programming. Today I thought I'd use stdx.data.json to read from a text file. Since it's a somewhat large file, I thought I'd create a text range from the file and parse it that way. stdx.data.json has a great interface for lazily parsing text into JSON values, so all I had to do was turn a text file into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. (In my best Clarkson voice:) How hard could it be? Several hours later, I've finally given up and am just reading the whole file into a string. There may be a magic incantation I could use to make it work, but I can't find it, and frankly I can't see why I should need an incantation in the first place. It really ought to just be a method of std.stdio.File. Apparently some of the complexity is caused by autodecoding (e.g. joiner returns a range of dchar from char ranges), and some of the fault may be in stdx.data.json, but either way I'm surprised that I couldn't do it. This is the kind of thing I expected to be ground level stuff.
Dec 16 2017
On Saturday, December 16, 2017 21:34:22 David Gileadi via Digitalmars-d wrote:
> I'm a longtime fan of dlang, but haven't had a chance to do much in-depth dlang programming, and especially not range programming. Today I thought I'd use stdx.data.json to read from a text file. Since it's a somewhat large file, I thought I'd create a text range from the file and parse it that way. stdx.data.json has a great interface for lazily parsing text into JSON values, so all I had to do was turn a text file into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. (In my best Clarkson voice:) How hard could it be? Several hours later, I've finally given up and am just reading the whole file into a string. There may be a magic incantation I could use to make it work, but I can't find it, and frankly I can't see why I should need an incantation in the first place. It really ought to just be a method of std.stdio.File. Apparently some of the complexity is caused by autodecoding (e.g. joiner returns a range of dchar from char ranges), and some of the fault may be in stdx.data.json, but either way I'm surprised that I couldn't do it. This is the kind of thing I expected to be ground level stuff.

I don't know what problems specifically you were hitting, but a lot of range-based stuff (especially parsing) requires forward ranges so that there can be some amount of lookahead (having just a basic input range can be incredibly restrictive), and forward ranges and lazily reading from a file don't tend to go together very well, because it tends to require allocating buffers that then have to be copied on save. It gets to be rather difficult to do it efficiently. std.stdio.File does support lazily reading in a file, which works well with foreach, but if you're trying to process the entire file as a range, it's usually just way easier to read in the entire file at once and operate on it as a dynamic array.
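The two approaches contrasted above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it uses Phobos' std.json rather than stdx.data.json (a deliberate swap, since std.json ships with the compiler), and the function names and the file name example.json are made up for the example.

```d
import std.file : getSize, readText, remove, write;
import std.json : JSONValue, parseJSON;
import std.stdio : File;

/// Reads the whole file into memory, then parses it in one go.
/// (Hypothetical helper; std.json stands in for stdx.data.json here.)
JSONValue parseWholeFile(string path)
{
    // readText returns the file contents as a single validated UTF-8 string.
    return parseJSON(readText(path));
}

/// Walks the file lazily, one chunk at a time, using foreach.
/// Only one buffer of chunkSize bytes is alive at any moment.
size_t countBytesLazily(string path, size_t chunkSize = 4096)
{
    size_t total;
    foreach (ubyte[] chunk; File(path).byChunk(chunkSize))
        total += chunk.length;
    return total;
}

void main()
{
    enum path = "example.json"; // hypothetical file name
    write(path, `{"name": "demo", "values": [1, 2, 3]}`);
    scope (exit) remove(path);

    auto doc = parseWholeFile(path);
    assert(doc["name"].str == "demo");
    assert(doc["values"].array.length == 3);

    // The lazy walk sees every byte of the file exactly once.
    assert(countBytesLazily(path) == getSize(path));
}
```

The chunked loop is fine for work that never needs to look backwards, which is exactly the input-range limitation described above; a parser that needs lookahead would have to keep its own buffer on top of it.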
The option halfway in between is to use std.mmfile so that the file gets treated as a dynamic array but the OS is reading it in piecemeal for you. If I were seriously looking at reading in a file lazily as a forward range, I'd look at http://code.dlang.org/packages/iopipe, though as I understand it, it's very much a work in progress. As for auto-decoding, yeah, it sucks. You can work around it with stuff like std.utf.byCodeUnit, but auto-decoding is a problem all around, and it's one that we're likely stuck with, because unfortunately, we haven't found a way to remove it without breaking everything. - Jonathan M Davis
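For illustration, here is a small sketch of the std.mmfile option and the std.utf.byCodeUnit workaround mentioned above, assuming default Phobos settings (auto-decoding enabled). The helper names countViaMmFile and lazyChars and the file name are invented for the example, and the chunked range is only an input range, so it does not address the forward-range requirement.

```d
import std.algorithm.iteration : joiner, map;
import std.algorithm.searching : count;
import std.file : remove, write;
import std.mmfile : MmFile;
import std.range.primitives : ElementType;
import std.stdio : File;
import std.traits : Unqual;
import std.utf : byCodeUnit;

/// Maps the file into memory and treats it as a char array.
/// (Hypothetical helper name.)
size_t countViaMmFile(string path, char needle)
{
    auto mm = new MmFile(path);  // read-only mapping
    scope (exit) destroy(mm);    // unmap promptly instead of waiting for the GC
    auto text = cast(const(char)[]) mm[];
    // byCodeUnit iterates chars directly, with no decoding to dchar.
    return text.byCodeUnit.count(needle);
}

/// Lazy input range of UTF-8 code units read chunk by chunk.
/// (Hypothetical helper name; input range only, so no lookahead via save.)
auto lazyChars(File file, size_t chunkSize = 4096)
{
    return file.byChunk(chunkSize)
               .map!(chunk => (cast(const(char)[]) chunk).byCodeUnit)
               .joiner;
}

void main()
{
    enum path = "example.json"; // hypothetical file name
    write(path, `{"name": "demo", "nested": {"ok": true}}`);
    scope (exit) remove(path);

    // Auto-decoding in action: joining char slices yields dchar elements...
    string[] parts = ["ab", "cd"];
    auto decoded = parts.joiner;
    static assert(is(Unqual!(ElementType!(typeof(decoded))) == dchar));

    // ...while wrapping each slice in byCodeUnit keeps the elements as char.
    auto units = parts.map!(p => p.byCodeUnit).joiner;
    static assert(is(Unqual!(ElementType!(typeof(units))) == char));

    assert(countViaMmFile(path, '{') == 2);
    assert(lazyChars(File(path)).count('{') == 2);
}
```

Whether a given lexer accepts the range returned by lazyChars depends on what that lexer requires of its input; the sketch only shows how to obtain chars instead of dchars.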
Dec 17 2017
On 12/17/17 4:44 AM, Jonathan M Davis wrote:
> If I were seriously looking at reading in a file lazily as a forward range, I'd look at http://code.dlang.org/packages/iopipe, though as I understand it, it's very much a work in progress.

There is an even more work-in-progress library built on that, but it's not yet in dub (this was the library I wrote for my dconf talk this year): https://github.com/schveiguy/jsoniopipe

This kind of demonstrates how to parse json data lazily with pretty high performance. It really depends on what you are trying to do, though.

> As for auto-decoding, yeah, it sucks. You can work around it with stuff like std.utf.byCodeUnit, but auto-decoding is a problem all around, and it's one that we're likely stuck with, because unfortunately, we haven't found a way to remove it without breaking everything.

I think there eventually will have to be a day of reckoning for auto-decoding. But it probably will take a monumental effort to show how it can be done without being too painful for existing code. I still believe it can be done. -Steve
Dec 17 2017
On Sunday, 17 December 2017 at 16:51:21 UTC, Steven Schveighoffer wrote:
> On 12/17/17 4:44 AM, Jonathan M Davis wrote:
>> [...]
> There is an even more work-in-progress library built on that, but it's not yet in dub (this was the library I wrote for my dconf talk this year): https://github.com/schveiguy/jsoniopipe This kind of demonstrates how to parse json data lazily with pretty high performance. It really depends on what you are trying to do, though.
>> [...]
> I think there eventually will have to be a day of reckoning for auto-decoding. But it probably will take a monumental effort to show how it can be done without being too painful for existing code. I still believe it can be done. -Steve

Does this cause an infinite loop? https://github.com/schveiguy/jsoniopipe/blob/master/source/jsoniopipe/dom.d#L134
May 22 2018
On 5/22/18 3:58 PM, Dr.No wrote:
> Does this cause an infinite loop? https://github.com/schveiguy/jsoniopipe/blob/master/source/jsoniopipe/dom.d#L134

Possibly. Bug reports are welcome :) I think on this line, it will make progress: https://github.com/schveiguy/jsoniopipe/blob/master/source/jsoniopipe/dom.d#L148, but I'm not confident enough to say I'm sure of it.

Of course, as you can probably see, I've spent almost no time working on that code base so far. I need to get back to it. The DOM parser has very little real usage; I just got it working with the given unittests and then checked it in. I've changed iopipe a bit since then as well, but I think I got it compiling just before my "lightning talk" at the Munich D meetup during dconf. Didn't have time to demonstrate it though. -Steve
May 22 2018
On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
> I'm a longtime fan of dlang, but haven't had a chance to do much in-depth dlang programming, and especially not range programming. Today I thought I'd use stdx.data.json to read from a text file. Since it's a somewhat large file, I thought I'd create a text range from the file and parse it that way. stdx.data.json has a great interface for lazily parsing text into JSON values, so all I had to do was turn a text file into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. (In my best Clarkson voice:) How hard could it be? [...]

uh I don't know about stdx.data.json but if you didn't manage to succeed yet, I know that asdf[1] works really well with streaming json. There is also an example of how it works.

[1]: http://asdf.dub.pm
Dec 17 2017
On 12/17/17 3:28 AM, WebFreak001 wrote:
> On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
> uh I don't know about stdx.data.json but if you didn't manage to succeed yet, I know that asdf[1] works really well with streaming json. There is also an example of how it works.
> [1]: http://asdf.dub.pm

Thanks, reading the whole file into memory worked fine. However, asdf looks really cool. I'll definitely look into it next time I need to deal with JSON.
Dec 17 2017
On Sun, 17 Dec 2017 10:21:33 -0700, David Gileadi <gileadisNOSPM gmail.com> wrote:
> On 12/17/17 3:28 AM, WebFreak001 wrote:
>> On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
>> uh I don't know about stdx.data.json but if you didn't manage to succeed yet, I know that asdf[1] works really well with streaming json. There is also an example of how it works.
>> [1]: http://asdf.dub.pm
> Thanks, reading the whole file into memory worked fine. However, asdf looks really cool. I'll definitely look into it next time I need to deal with JSON.

There is also the JSON parser from https://github.com/mleise/fast if you need to parse 2x faster than RapidJSON ;)

-- Marco
Dec 30 2017
On 12/30/17 8:16 PM, Marco Leise wrote:
> There is also the JSON parser from https://github.com/mleise/fast if you need to parse 2x faster than RapidJSON ;)

Nice, I'll take a look.

My original post was mainly to express how surprised I was that one of D's front-page features was, for me, impossible to get working in this context. I posted in hopes that more experienced folks might consider making fixes to help smooth future attempts by others. I realize that compile-time ranges are not runtime interfaces like many languages provide for iteration, but right now ranges seem too hard to get right when it feels like they should just work.
Jan 01 2018