digitalmars.D - RFC: std.json successor
- Sönke Ludwig (37/37) Aug 21 2014 Following up on the recent "std.jgrandson" thread [1], I've picked up
- Brian Schott (16/17) Aug 21 2014 source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has
- Justin Whear (2/2) Aug 21 2014 Someone needs to make a "showbrianmycode" bot: mention a D github repo
- Idan Arye (3/6) Aug 21 2014 Why bother with mentioning a GitHub repo? Just make the bot
- Brian Schott (2/4) Aug 21 2014 It's kind of picky. http://i.imgur.com/SHNAWnH.png
- Sönke Ludwig (5/18) Aug 22 2014 Fixed all of them (neither was causing harm, but it's still nicer that
- Ary Borenszweig (12/21) Aug 21 2014 Say I have a class Person with name (string) and age (int) with a
- Sönke Ludwig (13/24) Aug 21 2014 Without a serialization framework it would in theory work like this:
- Ary Borenszweig (4/30) Aug 22 2014 But does this parse the whole json into JSONValue? I want to create a
- Sönke Ludwig (20/36) Aug 22 2014 That would be done by the serialization framework. Instead of using
- Ary Borenszweig (2/39) Aug 22 2014 Cool, that looks good :-)
- Colden Cullen (5/5) Aug 21 2014 I notice in the docs there are several references to a
- Sönke Ludwig (3/8) Aug 21 2014 Seems like I forgot to replace a few mentions. They are called
- Sönke Ludwig (4/12) Aug 22 2014 https://github.com/D-Programming-Language/phobos/pull/2452
- matovitch (5/5) Aug 22 2014 Very nice ! I had started (and dropped) a json module based on
- Sönke Ludwig (13/17) Aug 22 2014 Exactly, that's the syntax you'd use for JSONValue. But my favorite way
- matovitch (5/27) Aug 22 2014 Completely agree, I am waiting for a serializer too. I would love
- Sönke Ludwig (3/5) Aug 22 2014 I see, so you just have to write your own number/string parsing routines...
- matovitch (10/18) Aug 22 2014 It's kind of "low level" indeed...I don't know what kind of back
- Jacob Carlborg (15/24) Aug 22 2014 * Opening braces should be put on their own line to follow Phobos style
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (5/10) Aug 22 2014 Hmm... my initial reaction was "not as default - it should throw
- =?ISO-8859-15?Q?S=F6nke_Ludwig?= (12/35) Aug 22 2014 There are actually no invalid tokens at all, the "invalid" enum value is...
- =?ISO-8859-15?Q?S=F6nke_Ludwig?= (4/29) Aug 22 2014 and an additional "error" kind has been added, which implements the
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (16/16) Aug 22 2014 Some thoughts about the API:
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (18/31) Aug 22 2014 For those functions it may be acceptable, although I really dislike that...
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (15/42) Aug 22 2014 I'm not really concerned about the amount of typing, it just
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (9/24) Aug 22 2014 That would be nice, but then it should also work together with std.conv,...
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (11/35) Aug 22 2014 The easiest and cleanest way would be to add a function in
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (2/32) Aug 22 2014 Okay, for parse that may work, but what about to!()?
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (2/16) Aug 22 2014 What's the problem with to!()?
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (4/20) Aug 23 2014 to!() definitely doesn't have a template constraint that excludes
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (6/30) Aug 23 2014 For converting a JSONValue to a different type, JSONValue can
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (5/34) Aug 23 2014 That would just introduce the said dependency cycle between JSONValue,
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (4/52) Aug 23 2014 That's what I expect it to do anyway. For parsing, there are
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (3/43) Aug 23 2014 Probably, but then to!() is inconsistent with parse!(). Usually they are...
- Christian Manning (9/9) Aug 22 2014 It would be nice to have integers treated separately to doubles.
- Sönke Ludwig (8/16) Aug 22 2014 That's how I've done it for vibe.data.json, too. For the new
- "Marc Schütz" <schuetzm@gmx.net> (3/14) Aug 22 2014 It should automatically fall back to double on overflow. Maybe
- Sönke Ludwig (8/20) Aug 22 2014 I guess BigInt + exponent would be the only lossless way to represent
- "Marc Schütz" <schuetzm@gmx.net> (4/31) Aug 22 2014 As the functions will be templatized anyway, it should include a
- Sönke Ludwig (4/33) Aug 22 2014 I'm actually in the process of converting the "track_location" parameter...
- Christian Manning (5/32) Aug 22 2014 You could check for a decimal point and a 0 at the front
- Sönke Ludwig (7/36) Aug 22 2014 Yes, no decimal point + no exponent would work without overhead to
- John Colvin (12/64) Aug 22 2014 It might be the right choice anyway (seeing as json/js do
- Christian Manning (18/24) Aug 22 2014 Ah I see.
- Walter Bright (17/18) Aug 22 2014 Thanks for taking this on! This is valuable work. On to destruction!
- Sönke Ludwig (12/29) Aug 22 2014 The latest version now features a LexOptions.noThrow option which causes...
- Walter Bright (9/31) Aug 22 2014 Having a nothrow option may prevent the functions from being attributed ...
- Walter Bright (5/14) Aug 22 2014 Another possibility is to have the user pass in a resizeable buffer whic...
- Ola Fosheim Grøstad (3/9) Aug 22 2014 Does this mean that D is getting resizable stack allocations in
- Walter Bright (2/7) Aug 22 2014 scopebuffer does not require resizeable stack allocations.
- Ola Fosheim Grøstad (16/21) Aug 22 2014 So you cannot use the stack for resizable allocations.
- Walter Bright (2/9) Aug 22 2014 Please, take a look at how scopebuffer works.
- Ola Fosheim Grøstad (12/24) Aug 22 2014 I have? It requires an upperbound to stay on the stack, that
- Walter Bright (5/26) Aug 22 2014 Scopebuffer is extensively used in Warp, and works very well. The "hole"...
- Ola Fosheim Grøstad (9/13) Aug 22 2014 Well, on a webserver you don't want to push out the caches for no
- Sönke Ludwig (3/17) Aug 23 2014 It's a compile time option, so that shouldn't be an issue. There is also...
- Sönke Ludwig (11/19) Aug 23 2014 I've added two new types now to abstract away how strings and numbers
- Walter Bright (2/3) Aug 23 2014 Why the immutable(ubyte)[] ?
- Sönke Ludwig (5/8) Aug 23 2014 I've adopted that basically from Andrei's module. The idea is to allow
- Walter Bright (3/12) Aug 23 2014 I feel that non-UTF encodings should be handled by adapter algorithms, n...
- Brad Roberts via Digitalmars-d (9/24) Aug 23 2014 For performance purposes, determining encoding during lexing is useful.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (5/7) Aug 23 2014 I am not so sure when it comes to SIMD lexing. I think the
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/3) Aug 23 2014 Some baselines for performance:
- Walter Bright (2/6) Aug 23 2014 I'm not convinced that using an adapter algorithm won't be just as fast.
- Brad Roberts via Digitalmars-d (7/14) Aug 23 2014 Consider your own talks on optimizing the existing dmd lexer. In those
- Walter Bright (5/11) Aug 25 2014 On the other hand, deadalnix demonstrated that the ldc optimizer was abl...
- simendsjo (7/23) Aug 25 2014 I just happened to write a very small script yesterday and tested with
- Walter Bright (2/8) Aug 25 2014 Speed optimizations are different.
- Jacob Carlborg (5/6) Aug 26 2014 It's because the latest release of LDC has the --gc-sections flag
- Entusiastic user (4/9) Aug 26 2014 I tried using "-disable-linker-strip-dead", but it had no effect.
- Andrei Alexandrescu (8/23) Aug 23 2014 I think accepting ubyte it's a good idea. It means "got this stream of
- Walter Bright (9/14) Aug 23 2014 Using an adapter still makes sense, because:
- Andrei Alexandrescu (6/25) Aug 23 2014 An adapter would solve the wrong problem here. There's nothing to adapt
- Walter Bright (6/10) Aug 25 2014 The adaptation is to take arbitrary byte input in an unknown encoding an...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (14/16) Aug 25 2014 I agree.
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (4/18) Aug 25 2014 BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (10/13) Aug 25 2014 The lexer cannot assume valid UTF since the client might be a
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (13/25) Aug 25 2014 But why should UTF validation be the job of the lexer in the first
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (21/32) Aug 25 2014 Because you want to save time, it is faster to integrate
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (17/21) Aug 25 2014 I think it is doable and worth it…
- Kiith-Sa (23/44) Aug 25 2014 D:YAML uses a similar approach, but with 8 bytes (plain ulong -
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/10) Aug 25 2014 Cool!
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (11/40) Aug 26 2014 I guess it depends on if you look at the grammar as productions or
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (12/24) Aug 26 2014 I think you should validate JSON-strings to be UTF-8 encoded even
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (22/43) Aug 26 2014 I think this is a misunderstanding. What I mean is that if the input
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/5) Aug 26 2014 Yes, so this will be supported? Because this is what is most
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (3/7) Aug 26 2014 If nobody plays a veto card, I'll implement it that way.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (11/11) Aug 25 2014 Btw, maybe it would be a good idea to take a look on the JSON
- Walter Bright (2/4) Aug 25 2014 I think that settles it.
- Andrej Mitrovic via Digitalmars-d (11/12) Aug 22 2014 This confused me for a solid minute:
- Sönke Ludwig (4/16) Aug 22 2014 Hmmm, but it *is* a string. Isn't the problem more the use of with in
- Andrej Mitrovic via Digitalmars-d (3/5) Aug 23 2014 Yeah, maybe so. I thought for a second it was a tuple, but then I saw
- deadalnix (17/17) Aug 22 2014 First thank you for your work. std.json is horrible to use right
- ketmar via Digitalmars-d (3/8) Aug 22 2014 jsvar using opDispatch, and Sönke wrote:
- Sönke Ludwig (17/31) Aug 23 2014 Setting the issue of opDispatch aside, one of the goals was to use
- w0rp (8/17) Aug 23 2014 I have seen similar issues to these with simplexml in PHP. Using
- Sönke Ludwig (4/7) Aug 23 2014 It's split into two separate functions now. Having to type out a full
- deadalnix (3/12) Aug 23 2014 Yes, I don't mind missing that one. It look like a false good
- Sönke Ludwig (18/18) Aug 25 2014 I've added support (compile time option [1]) for long and BigInt in the
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (12/15) Aug 25 2014 It can be very useful to have a base 10 exponent representation
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (6/20) Aug 25 2014 In fact, I've already prepared the code for that, but commented it out
- Don (14/61) Aug 25 2014 One missing feature (which is also missing from the existing
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (15/21) Aug 25 2014 I believe you are allowed to use very high exponents, though.
- Walter Bright (5/21) Aug 25 2014 Infinity. Mapping to max value would be a horrible bug.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (8/14) Aug 25 2014 Yes… but then you are reading an illegal value that JSON does not
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (4/5) Aug 25 2014 It is defined in C++11:
- Walter Bright (5/9) Aug 25 2014 I didn't know that. But recall I did implement it in DMC++, and it turne...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/6) Aug 25 2014 Well, one should initialize with signaling NaN. Then you get an
- Walter Bright (3/9) Aug 25 2014 That's the theory. The practice doesn't work out so well.
- Don (6/19) Aug 26 2014 To be more concrete:
- Ola Fosheim Grøstad (3/7) Aug 26 2014 I disagree. AFAIK signaling NaN was standardized in IEEE 754-2008.
- Don (20/29) Aug 26 2014 It was always in IEEE754. The decision in 754-2008 was simply to
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/14) Aug 26 2014 It was implementation defined before. I think they specified the
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (4/7) Aug 26 2014 …
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (34/34) Aug 26 2014 With the danger of being noisy, these instructions are subject to
- Don (6/16) Aug 26 2014 No, it's more subtle. On the original x87, signalling NaNs are
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (10/14) Aug 26 2014 You are right, but it happens for loads from the FP-stack too:
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (8/10) Aug 26 2014 Sorry for being off-topic, but MOVSS and VMOVSS on AMD don't
- Walter Bright (7/24) Aug 27 2014 The other issues were just when the snan => qnan conversion took place. ...
- Don (16/50) Aug 28 2014 I think the way to think of it is, to the programmer, there is
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (11/19) Aug 28 2014 I disagree with this view.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (7/7) Aug 28 2014 Or to be more explicit:
- Don (9/16) Aug 28 2014 No. Once you load an SNAN, it isn't an SNAN any more! It is a
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (16/24) Aug 28 2014 By which definition? It is only if you consume the SNAN with an
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/9) Aug 28 2014 Let me try again:
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (17/17) Aug 28 2014 Kahan states this in a 1997 paper:
- Daniel Murphy (5/7) Aug 28 2014 So should we get rid of them from the language completely? Using them a...
- Sönke Ludwig (5/61) Aug 25 2014 This would probably best added as another (CT) optional feature. I think...
- Sönke Ludwig (6/72) Aug 25 2014 By default, floating-point special values are now output as 'null',
- "Ola Fosheim Grøstad" (8/11) Aug 25 2014 ECMAScript presumes double. I think one should base Phobos on
- "Ola Fosheim Grøstad" (34/38) Aug 25 2014 Let me expand a bit on the difference between web clients and
- Sönke Ludwig (8/10) Aug 25 2014 Like... node.js?
- Sönke Ludwig (16/26) Aug 25 2014 Well, of course it's based on that RFC, did you seriously think
- Sönke Ludwig (3/5) Aug 25 2014 Sorry, to be precise, it has no suggestion of how to *handle* infinity
- "Ola Fosheim Grøstad" (26/36) Aug 25 2014 I made no assumptions, just responded to what you wrote :-). It
- Don (13/104) Aug 26 2014 Yes, it should be optional, but not a compile-time option.
- Sönke Ludwig (13/32) Aug 26 2014 Why not a compile time option?
- Don (15/60) Aug 26 2014 Please note, I've been talking about the lexer. I'm choosing my
- Sönke Ludwig (8/61) Aug 26 2014 I've been talking about the lexer, too. Sorry for the confusing use of
- Sönke Ludwig (4/13) Aug 26 2014 One argument against supporting it in the parser is that the parser
- "Ola Fosheim Grøstad" (8/13) Aug 26 2014 I don't care either way, but JSON.stringify() has the following
- Entusiastic user (43/43) Aug 25 2014 Hi!
- Entusiastic user (1/4) Aug 25 2014 I meant Ubuntu 13.10 :D
- Sönke Ludwig (2/7) Aug 26 2014 I've fixed all errors on DMD 2.065 now. Hopefully that should also fix L...
- David Soria Parra (4/14) Aug 26 2014 Do we have any benchmarks for this yet. Note that the main
- Atila Neves (5/52) Sep 08 2014 Been using it for a bit now, I think the only thing I have to say
- Andrei Alexandrescu (54/54) Oct 12 2014 Here's my destruction of std.data.json.
- Sean Kelly (6/18) Oct 12 2014 I'd like to see unescapeStringLiteral() made public. Then I can
- Sean Kelly (2/2) Oct 12 2014 Oh, it looks like you aren't checking for 0x7F (DEL) as a control
- Sönke Ludwig (3/5) Oct 13 2014 It doesn't get mentioned in the JSON spec, so I left it out. But I guess...
- Sönke Ludwig (2/6) Oct 13 2014 Will do. Same for the inverse functions.
- Sönke Ludwig (67/120) Oct 13 2014 This is actually more or less done in unescapeStringLiteral() - if it
- Jacob Carlborg (4/7) Oct 13 2014 64k?
- Sönke Ludwig (5/11) Oct 13 2014 Oh, I've read "both line and column into a single uint", because of
- Daniel Murphy (2/6) Oct 13 2014 I suppose a 4GB single-line json file is still possible.
- Sönke Ludwig (5/11) Oct 13 2014 If we make that assumption, we'd have to change it from size_t to ulong,...
- Kiith-Sa (4/22) Oct 13 2014 What are you using the location structs for?
- Sönke Ludwig (4/23) Oct 13 2014 Within the package itself they are also only used for error information....
- Andrei Alexandrescu (2/15) Oct 13 2014 Agreed. -- Andrei
- Andrei Alexandrescu (2/16) Oct 13 2014 Yah, one uint for each. -- Andrei
- Jacob Carlborg (4/11) Oct 13 2014 JSONToken.Kind and JSONParserNode.Kind could be "ubyte" to save space.
- Sönke Ludwig (4/14) Oct 13 2014 But it won't save space in practice, at least on x86, due to alignment,
- Andrei Alexandrescu (2/18) Oct 13 2014 Correct. -- Andrei
- Ary Borenszweig (4/15) Oct 17 2014 Once its done you can compare its performance against other languages
- Sean Kelly (24/27) Oct 18 2014 Wow, the C++Rapid parser is really impressive. I threw together
- Sean Kelly (29/50) Oct 18 2014 I just commented out the sscanf() call that was parsing the float
- Ary Borenszweig (4/33) Oct 19 2014 Yes, C++ rapid seems to be really, really fast. It has some sse2/see4
- David Soria Parra (4/9) Oct 20 2014 I assume this is the standard json module? I am wondering how
- Jakob Ovrum (4/5) Feb 05 2015 Added to the review queue as a work in progress with relevant
- Andrei Alexandrescu (2/6) Feb 05 2015 Yay! -- Andrei
- Sönke Ludwig (3/7) Feb 05 2015 Thanks! I(t) should be ready for an official review in one or two weeks
Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. I'm quite pleased by the results so far, although without a serialization framework it still misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json

The new code contains:
- Lazy lexer in the form of a token input range (using slices of the input if possible)
- Lazy streaming parser (StAX style) in the form of a node input range
- Eager DOM style parser returning a JSONValue
- Range based JSON string generator taking either a token range, a node range, or a JSONValue
- Opt-out location tracking (line/column) for tokens, nodes and values
- No opDispatch() for JSONValue - this has shown to do more harm than good in vibe.data.json

The DOM style JSONValue type is based on std.variant.Algebraic. This currently has a few usability issues that can be solved by upgrading/fixing Algebraic:
- Operator overloading only works sporadically
- No "tag" enum is supported, so that switch()ing on the type of a value doesn't work and an if-else cascade is required
- Operations and conversions between different Algebraic types are not conveniently supported, which gets important when other similar formats get supported (e.g. BSON)

Assuming that those points are solved, I'd like to get some early feedback before going for an official review.

One open issue is how to handle unescaping of string literals. Currently it always unescapes immediately, which is more efficient for general input ranges when the unescaped result is needed, but less efficient for string inputs when the unescaped result is not needed. Maybe a flag could be used to conditionally switch behavior depending on the input range type.

Destroy away! ;)

[1]: http://forum.dlang.org/thread/lrknjl$co7$1@digitalmars.com
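For orientation, here is a minimal sketch of how the four pieces fit together. The function names (lexJSON, parseJSONStream, toJSONValue, toJSONString) are the ones used later in this thread; the exact signatures may differ from the current docs:

    import stdx.data.json;

    void overview()
    {
        string text = `{"a": [1, 2, 3]}`;

        // Token level: lazy input range of tokens, slicing the input where possible
        auto tokens = lexJSON(text);

        // Node level: StAX-style streaming parser, also an input range
        auto nodes = parseJSONStream(text);

        // DOM level: eager parsing into an Algebraic-based JSONValue
        JSONValue value = toJSONValue(text);

        // Generator: a JSONValue (or a token/node range) back to a JSON string
        string s = toJSONString(value);
    }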
Aug 21 2014
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
> Destroy away! ;)

source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has method 'opEquals', but not 'toHash'.
source/stdx/data/json/lexer.d(499:65)[warn]: Use parenthesis to clarify this expression.
source/stdx/data/json/parser.d(516:8)[warn]: 'JSONParserNode' has method 'opEquals', but not 'toHash'.
source/stdx/data/json/value.d(95:10)[warn]: Variable c is never used.
source/stdx/data/json/value.d(99:10)[warn]: Variable d is never used.
source/stdx/data/json/package.d(942:14)[warn]: Variable val is never used.

It's likely that you can ignore these, but I thought I'd post them anyways. (The last three are in unittest blocks, for example.)
Aug 21 2014
Someone needs to make a "showbrianmycode" bot: mention a D github repo and it runs static analysis for you.
Aug 21 2014
On Thursday, 21 August 2014 at 23:27:28 UTC, Justin Whear wrote:
> Someone needs to make a "showbrianmycode" bot: mention a D github repo and it runs static analysis for you.

Why bother with mentioning a GitHub repo? Just make the bot periodically scan the DUB registry.
Aug 21 2014
On Thursday, 21 August 2014 at 23:33:35 UTC, Idan Arye wrote:
> Why bother with mentioning a GitHub repo? Just make the bot periodically scan the DUB registry.

It's kind of picky. http://i.imgur.com/SHNAWnH.png
Aug 21 2014
Am 22.08.2014 00:48, schrieb Brian Schott:
> On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
>> Destroy away! ;)
> source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has method 'opEquals', but not 'toHash'.
> source/stdx/data/json/lexer.d(499:65)[warn]: Use parenthesis to clarify this expression.
> source/stdx/data/json/parser.d(516:8)[warn]: 'JSONParserNode' has method 'opEquals', but not 'toHash'.
> source/stdx/data/json/value.d(95:10)[warn]: Variable c is never used.
> source/stdx/data/json/value.d(99:10)[warn]: Variable d is never used.
> source/stdx/data/json/package.d(942:14)[warn]: Variable val is never used.
> It's likely that you can ignore these, but I thought I'd post them anyways. (The last three are in unittest blocks, for example.)

Fixed all of them (neither was causing harm, but it's still nicer that way). Also added @safe and nothrow where possible.

BTW, anyone knows what's holding back formattedWrite() from being @safe for simple types?
Aug 22 2014
On 8/21/14, 7:35 PM, Sönke Ludwig wrote:
> Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. I'm quite pleased by the results so far, although without a serialization framework it still misses a very important building block.
>
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> DUB: http://code.dlang.org/packages/std_data_json

Say I have a class Person with name (string) and age (int) with a constructor that receives both. How would I create an instance of a Person from a json with the json stream?

Suppose the json is this:

{"age": 10, "name": "John"}

And the class is this:

class Person {
    this(string name, int age) {
        // ...
    }
}
Aug 21 2014
Am 22.08.2014 02:42, schrieb Ary Borenszweig:
> Say I have a class Person with name (string) and age (int) with a constructor that receives both. How would I create an instance of a Person from a json with the json stream?
> Suppose the json is this:
> {"age": 10, "name": "John"}
> And the class is this:
> class Person {
>     this(string name, int age) {
>         // ...
>     }
> }

Without a serialization framework it would in theory work like this:

    JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
    auto p = new Person(v["name"].get!string, v["age"].get!int);

unfortunately the operator overloading doesn't work like this currently, so this is needed:

    JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
    auto p = new Person(
        v.get!(Json[string])["name"].get!string,
        v.get!(Json[string])["age"].get!int);

That should be solved together with the new module (it could of course also easily be added to JSONValue itself instead of Algebraic, but the value of having it in Algebraic would be much higher).
Aug 21 2014
On 8/22/14, 3:33 AM, Sönke Ludwig wrote:
> Without a serialization framework it would in theory work like this:
>
>     JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
>     auto p = new Person(v["name"].get!string, v["age"].get!int);
>
> unfortunately the operator overloading doesn't work like this currently, so this is needed:
>
>     JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
>     auto p = new Person(
>         v.get!(Json[string])["name"].get!string,
>         v.get!(Json[string])["age"].get!int);

But does this parse the whole json into JSONValue? I want to create a Person without creating an intermediate JSONValue for the whole json. Can this be done?
Aug 22 2014
Am 22.08.2014 16:53, schrieb Ary Borenszweig:
> But does this parse the whole json into JSONValue? I want to create a Person without creating an intermediate JSONValue for the whole json. Can this be done?

That would be done by the serialization framework. Instead of using parseJSON(), it could use parseJSONStream() to populate the Person instance on the fly, without putting the whole JSON into memory. But I'd like to leave that for a later addition, because we'd otherwise end up with duplicate functionality once std.serialization gets finalized.

Manually it would work similar to this:

    auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`);
    with (JSONParserNode.Kind) {
        enforce(nodes.front == objectStart);
        nodes.popFront();
        while (nodes.front != objectEnd) {
            auto key = nodes.front.key;
            nodes.popFront();
            if (key == "name") person.name = nodes.front.literal.string;
            else if (key == "age") person.age = nodes.front.literal.number;
            nodes.popFront(); // consume the value node before reading the next key
        }
    }
Aug 22 2014
On 8/22/14, 1:24 PM, Sönke Ludwig wrote:
> That would be done by the serialization framework. Instead of using parseJSON(), it could use parseJSONStream() to populate the Person instance on the fly, without putting the whole JSON into memory. But I'd like to leave that for a later addition, because we'd otherwise end up with duplicate functionality once std.serialization gets finalized.

Cool, that looks good :-)
Aug 22 2014
I notice in the docs there are several references to a `parseJSON` and `parseJson`, but I can't seem to find where either of these are defined. Is this just a typo? Hope this helps: https://github.com/s-ludwig/std_data_json/search?q=parseJson&type=Code
Aug 21 2014
Am 22.08.2014 04:35, schrieb Colden Cullen:
> I notice in the docs there are several references to a `parseJSON` and `parseJson`, but I can't seem to find where either of these are defined. Is this just a typo?
> Hope this helps: https://github.com/s-ludwig/std_data_json/search?q=parseJson&type=Code

Seems like I forgot to replace a few mentions. They are called parseJSONValue and toJSONValue now for clarity.
Aug 21 2014
Am 22.08.2014 00:35, schrieb Sönke Ludwig:
> The DOM style JSONValue type is based on std.variant.Algebraic. This currently has a few usability issues that can be solved by upgrading/fixing Algebraic:
> - Operator overloading only works sporadically
> - (...)
> - Operations and conversions between different Algebraic types are not conveniently supported, which gets important when other similar formats get supported (e.g. BSON)

https://github.com/D-Programming-Language/phobos/pull/2452
https://github.com/D-Programming-Language/phobos/pull/2453

Those fix the most important operators, index access and binary arithmetic.
Aug 22 2014
Very nice! I had started (and dropped) a json module based on Algebraic too. So without opDispatch you plan to use a syntax like jPerson["age"] = 10? You didn't use stdx.d.lexer. Any reason why? (I am asking even if I never used this module. (Never coded much in D, in fact.))
Aug 22 2014
Am 22.08.2014 14:17, schrieb matovitch:
> Very nice! I had started (and dropped) a json module based on Algebraic too. So without opDispatch you plan to use a syntax like jPerson["age"] = 10? You didn't use stdx.d.lexer. Any reason why?

Exactly, that's the syntax you'd use for JSONValue. But my favorite way to work with most JSON data is actually to directly read the JSON string into a D struct using a serialization framework and then access the struct in a strongly typed way. This has both less syntactic and less runtime overhead, and also greatly reduces the chance for field name/type related bugs.

The module is written against current Phobos, which is why stdx.d.lexer wasn't really an option. I'm also unsure if std.lexer would be able to handle the parsing required for JSON numbers and strings. But it would certainly be nice already if at least the token structure could be reused. However, it should also be possible to find a painless migration path later, when std.lexer is actually part of Phobos.
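To illustrate the struct-based approach, a minimal sketch using vibe.data.json's deserializeJson (the new module would presumably gain an equivalent once a std.serialization framework exists):

    import vibe.data.json : deserializeJson;

    struct Person
    {
        string name;
        int age;
    }

    void main()
    {
        // Fields map directly onto struct members; no manual JSONValue traversal.
        auto p = deserializeJson!Person(`{"age": 10, "name": "John"}`);
        assert(p.name == "John" && p.age == 10);
    }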
Aug 22 2014
On Friday, 22 August 2014 at 12:39:08 UTC, Sönke Ludwig wrote:
> Exactly, that's the syntax you'd use for JSONValue. But my favorite way to work with most JSON data is actually to directly read the JSON string into a D struct using a serialization framework and then access the struct in a strongly typed way. This has both less syntactic and less runtime overhead, and also greatly reduces the chance for field name/type related bugs.

Completely agree, I am waiting for a serializer too. I would love to see something like cap'n proto in D.

> The module is written against current Phobos, which is why stdx.d.lexer wasn't really an option. I'm also unsure if std.lexer would be able to handle the parsing required for JSON numbers and strings. But it would certainly be nice already if at least the token structure could be reused. However, it should also be possible to find a painless migration path later, when std.lexer is actually part of Phobos.

Ok. I think I remember there was a stdx.d.lexer's Json parser provided as sample.
Aug 22 2014
Am 22.08.2014 14:47, schrieb matovitch:
> Ok. I think I remember there was a stdx.d.lexer's Json parser provided as sample.

I see, so you just have to write your own number/string parsing routines: https://github.com/Hackerpilot/lexer-demo/blob/master/jsonlexer.d
Aug 22 2014
On Friday, 22 August 2014 at 13:00:19 UTC, Sönke Ludwig wrote:
> I see, so you just have to write your own number/string parsing routines: https://github.com/Hackerpilot/lexer-demo/blob/master/jsonlexer.d

It's kind of "low level" indeed... I don't know what kind of black magic all these template mixins are doing, but the code looks quite clean. Confusing:

    // Therefore, this always returns false.
    bool isSeparating(size_t offset) pure nothrow @safe { return true; }
Aug 22 2014
On 2014-08-22 00:35, Sönke Ludwig wrote:
> Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. I'm quite pleased by the results so far, although without a serialization framework it still misses a very important building block.
>
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> DUB: http://code.dlang.org/packages/std_data_json

* Opening braces should be put on their own line to follow the Phobos style guide

* I'm wondering about the assert in lexer.d, line 160. What happens if two invalid tokens occur after each other?

* I think we have talked about this before, when reviewing D lexers. I'm thinking of how to handle invalid data. Is it the best solution to throw an exception? Would it be possible to return an error token and have the client decide what to do about it? Shouldn't it be possible to build a JSON validator on this?

* The lexer seems to always convert JSON types to their native D types, is that wise to do? That's unnecessary if you're implementing syntax highlighting

-- 
/Jacob Carlborg
Aug 22 2014
On Friday, 22 August 2014 at 15:47:51 UTC, Jacob Carlborg wrote:
> * I think we have talked about this before, when reviewing D lexers. I'm thinking of how to handle invalid data. Is it the best solution to throw an exception? Would it be possible to return an error token and have the client decide what to do about it?

Hmm... my initial reaction was "not as default - it should throw on error, otherwise no one will check for errors". But if it's returning an error token, maybe it would be sufficient if that token throws when its value is accessed?
Aug 22 2014
Am 22.08.2014 17:47, schrieb Jacob Carlborg:
> * Opening braces should be put on their own line to follow the Phobos style guide

Will do.

> * I'm wondering about the assert in lexer.d, line 160. What happens if two invalid tokens occur after each other?

There are actually no invalid tokens at all, the "invalid" enum value is only used to denote that no token is currently stored in _front. If readToken() doesn't throw, there will always be a valid token.

> * I think we have talked about this before, when reviewing D lexers. I'm thinking of how to handle invalid data. Is it the best solution to throw an exception? Would it be possible to return an error token and have the client decide what to do about it? Shouldn't it be possible to build a JSON validator on this?

That would indeed be a possibility, it's how I used to handle it in my private version of std.lexer, too. It could also be made a compile time option.

> * The lexer seems to always convert JSON types to their native D types, is that wise to do? That's unnecessary if you're implementing syntax highlighting

It's basically the same trade-off as for unescaping string literals. For "string" inputs, it would be more efficient to just store a slice, but for generic input ranges it avoids the otherwise needed allocation. The proposed flag could make an improvement here, too.
Aug 22 2014
Am 22.08.2014 18:13, schrieb Sönke Ludwig:
>> * I'm wondering about the assert in lexer.d, line 160. What happens if two invalid tokens occur after each other?
> There are actually no invalid tokens at all, the "invalid" enum value is only used to denote that no token is currently stored in _front. If readToken() doesn't throw, there will always be a valid token.

Renamed from "invalid" to "none" now to avoid confusion ->

>> * I think we have talked about this before, when reviewing D lexers. I'm thinking of how to handle invalid data. Is it the best solution to throw an exception? Would it be possible to return an error token and have the client decide what to do about it? Shouldn't it be possible to build a JSON validator on this?
> That would indeed be a possibility, it's how I used to handle it in my private version of std.lexer, too. It could also be made a compile time option.

and an additional "error" kind has been added, which implements the above. Enabled using LexOptions.noThrow.
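A sketch of how client code might consume such an error token (the kind names follow this thread; the exact lexJSON signature is an assumption):

    import stdx.data.json;

    void lexTolerantly(string input)
    {
        auto tokens = lexJSON!(LexOptions.noThrow)(input);
        foreach (token; tokens)
        {
            if (token.kind == JSONToken.Kind.error)
            {
                // With noThrow, invalid input surfaces as an error token
                // instead of a thrown exception; the client decides how to react.
                break;
            }
            // ... process the valid token ...
        }
    }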
Aug 22 2014
Some thoughts about the API:

1) Instead of `parseJSONValue` and `lexJSON`, how about static methods `JSON.parse` and `JSON.lex`, or even module level functions `std.data.json.parse` etc.? The "JSON" part of the name is redundant.

2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to have different names. They can be distinguished by their parameter types.

3) `toJSONString` shouldn't just take a boolean as flag for pretty-printing. It should either use something like `Pretty.YES`, or the function should be called `toPrettyJSONString` (I believe I have seen this latter convention elsewhere). We should also think about whether we can just call the functions `toString` and `toPrettyString`. Alternatively, `toJSON` and `toPrettyJSON` should be considered.
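Regarding point 3, Phobos already has an idiom for self-documenting boolean parameters; a sketch of what the signature could look like with std.typecons.Flag (the body here is a stand-in, not the real serializer):

    import std.typecons : Flag, No, Yes;

    // Hypothetical signature: the flag documents itself at the call site.
    string toJSONString(T)(T value, Flag!"pretty" pretty = No.pretty)
    {
        import std.conv : to;
        // Stand-in body; a real implementation would indent when pretty is set.
        return value.to!string;
    }

    // Call sites then read naturally:
    //   auto s = value.toJSONString(Yes.pretty);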
Aug 22 2014
Am 22.08.2014 18:15, schrieb "Marc Schütz" <schuetzm gmx.net>":Some thoughts about the API: 1) Instead of `parseJSONValue` and `lexJSON`, how about static methods `JSON.parse` and `JSON.lex`, or even a module level functions `std.data.json.parse` etc.? The "JSON" part of the name is redundant.For those functions it may be acceptable, although I really dislike that style, because it makes the code harder to read (what exactly does this parse?) and the functions are rarely used, so that that typing that additional "JSON" should be no issue at all. On the other hand, if you always type "JSON.lex" it's more to type than just "lexJSON". But for "[JSON]Value" it gets ugly really quick, because "Value"s are such a common thing and quickly occur in multiple kinds in the same source file.2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to have different names. They can be distinguished by their parameter types.Actually they take exactly the same parameters and just differ in their return value. It would be more descriptive to name them parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? The current naming is somewhat modeled after std.conv's "to!T" and "parse!T".3) `toJSONString` shouldn't just take a boolean as flag for pretty-printing. It should either use something like `Pretty.YES`, or the function should be called `toPrettyJSONString` (I believe I have seen this latter convention elsewhere). We should also think about whether we can just call the functions `toString` and `toPrettyString`. Alternatively, `toJSON` and `toPrettyJSON` should be considered.Agreed, a boolean isn't good for a public interface, renaming the current writeAsString to private writeAsStringImpl and then adding "(writeAs/to)[Pretty]String" sounds reasonable. Actually I've done it that way for vibe.data.json.
Aug 22 2014
On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:
> For those functions it may be acceptable, although I really dislike that style, because it makes the code harder to read (what exactly does this parse?) and the functions are rarely used, so that typing that additional "JSON" should be no issue at all. On the other hand, if you always type "JSON.lex" it's more to type than just "lexJSON".

I'm not really concerned about the amount of typing, it just seemed a bit odd to have the redundant JSON in there, as we have module names for namespacing. Your argument about readability is true nevertheless. But...

> But for "[JSON]Value" it gets ugly really quick, because "Value"s are such a common thing and quickly occur in multiple kinds in the same source file.

... why not use exactly the same convention then? => `parse!JSONValue`

Would be nice to have a "pluggable" API where you just need to specify the type in a factory method to choose the input format. Then there could be `parse!BSON`, `parse!YAML`, with the same style as `parse!(int[])`. I know this sounds a bit like bike-shedding, but the API shouldn't stand by itself, but fit into the "big picture", especially as there will probably be other parsers (you already named the module std._data_.json).
Aug 22 2014
Am 22.08.2014 19:24, schrieb "Marc Schütz" <schuetzm gmx.net>":On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time. Is there a way to make std.conv work properly with JSONValue? I guess the only theoretical way would be to put something in JSONValue, but that would result in a slightly ugly cyclic dependency between parser.d and value.d.Actually they take exactly the same parameters and just differ in their return value. It would be more descriptive to name them parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? The current naming is somewhat modeled after std.conv's "to!T" and "parse!T".... why not use exactly the same convention then? => `parse!JSONValue` Would be nice to have a "pluggable" API where you just need to specify the type in a factory method to choose the input format. Then there could be `parse!BSON`, `parse!YAML`, with the same style as `parse!(int[])`. I know this sound a bit like bike-shedding, but the API shouldn't stand by itself, but fit into the "big picture", especially as there will probably be other parsers (you already named the module std._data_.json).
Aug 22 2014
On Friday, 22 August 2014 at 17:35:20 UTC, Sönke Ludwig wrote:
> That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time. Is there a way to make std.conv work properly with JSONValue?

The easiest and cleanest way would be to add a function in std.data.json:

    auto parse(Target, Source)(Source input)
        if (is(Target == JSONValue))
    {
        return ...;
    }

The various overloads of `std.conv.parse` already have mutually exclusive template constraints, they will not collide with our function.
Aug 22 2014
Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":On Friday, 22 August 2014 at 17:35:20 UTC, Sönke Ludwig wrote:Okay, for parse that may work, but what about to!()?The easiest and cleanest way would be to add a function in std.data.json: auto parse(Target, Source)(Source input) if(is(Target == JSONValue)) { return ...; } The various overloads of `std.conv.parse` already have mutually exclusive template constraints, they will not collide with our function.... why not use exactly the same convention then? => `parse!JSONValue` Would be nice to have a "pluggable" API where you just need to specify the type in a factory method to choose the input format. Then there could be `parse!BSON`, `parse!YAML`, with the same style as `parse!(int[])`. I know this sound a bit like bike-shedding, but the API shouldn't stand by itself, but fit into the "big picture", especially as there will probably be other parsers (you already named the module std._data_.json).That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time. Is there a way to make std.conv work properly with JSONValue? I guess the only theoretical way would be to put something in JSONValue, but that would result in a slightly ugly cyclic dependency between parser.d and value.d.
Aug 22 2014
On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
> Okay, for parse that may work, but what about to!()?

What's the problem with to!()?
Aug 22 2014
Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":What's the problem with to!()?The easiest and cleanest way would be to add a function in std.data.json: auto parse(Target, Source)(Source input) if(is(Target == JSONValue)) { return ...; } The various overloads of `std.conv.parse` already have mutually exclusive template constraints, they will not collide with our function.Okay, for parse that may work, but what about to!()?
Aug 23 2014
On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:
> to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.

For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.
Aug 23 2014
Am 23.08.2014 19:25, schrieb "Marc Schütz" <schuetzm gmx.net>":On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:That would just introduce the said dependency cycle between JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, using the JSONValue constructor to parse an input string would contradict the intuitive behavior to just store the string value.Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":What's the problem with to!()?The easiest and cleanest way would be to add a function in std.data.json: auto parse(Target, Source)(Source input) if(is(Target == JSONValue)) { return ...; } The various overloads of `std.conv.parse` already have mutually exclusive template constraints, they will not collide with our function.Okay, for parse that may work, but what about to!()?
Aug 23 2014
On Saturday, 23 August 2014 at 17:32:01 UTC, Sönke Ludwig wrote:
> That would just introduce the said dependency cycle between JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, using the JSONValue constructor to parse an input string would contradict the intuitive behavior to just store the string value.

That's what I expect it to do anyway. For parsing, there are already other functions. "mystring".to!JSONValue should just wrap "mystring".
Aug 23 2014
Am 23.08.2014 20:31, schrieb "Marc Schütz" <schuetzm gmx.net>":On Saturday, 23 August 2014 at 17:32:01 UTC, Sönke Ludwig wrote:Probably, but then to!() is inconsistent with parse!(). Usually they are both the same apart from how the tail of the input string is handled.Am 23.08.2014 19:25, schrieb "Marc Schütz" <schuetzm gmx.net>":That's what I expect it to do anyway. For parsing, there are already other functions. "mystring".to!JSONValue should just wrap "mystring".On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:That would just introduce the said dependency cycle between JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, using the JSONValue constructor to parse an input string would contradict the intuitive behavior to just store the string value.Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":What's the problem with to!()?The easiest and cleanest way would be to add a function in std.data.json: auto parse(Target, Source)(Source input) if(is(Target == JSONValue)) { return ...; } The various overloads of `std.conv.parse` already have mutually exclusive template constraints, they will not collide with our function.Okay, for parse that may work, but what about to!()?
Aug 23 2014
It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type. I'd also like to see some benchmarks, particularly against some of the high performance C++ parsers, i.e. rapidjson, gason, sajson. Or even some of the "not bad" performance parsers with better APIs, i.e. QJsonDocument, jsoncpp and jsoncons (slow but perhaps comparable interface to this proposal?).
Aug 22 2014
Am 22.08.2014 18:31, schrieb Christian Manning:
> It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.

That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?

> I'd also like to see some benchmarks, particularly against some of the high performance C++ parsers, i.e. rapidjson, gason, sajson. Or even some of the "not bad" performance parsers with better APIs, i.e. QJsonDocument, jsoncpp and jsoncons (slow but perhaps comparable interface to this proposal?).

That would indeed be nice to have, but I'm not sure if I can manage to squeeze that in besides finishing the module itself. My time frame for working on this is quite limited.
Aug 22 2014
On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
> That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?

It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
Aug 22 2014
Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).Am 22.08.2014 18:31, schrieb Christian Manning:It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
Aug 22 2014
On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":As the functions will be templatized anyway, it should include a flags parameter. These and possible future extensions can then be selected by the user.On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).Am 22.08.2014 18:31, schrieb Christian Manning:It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
Aug 22 2014
Am 22.08.2014 20:01, schrieb "Marc Schütz" <schuetzm gmx.net>":On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:I'm actually in the process of converting the "track_location" parameter to a flags enum and to add support for an error token, so this would fit right in.Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":As the functions will be templatized anyway, it should include a flags parameter. These and possible future extensions can then be selected by the user.On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).Am 22.08.2014 18:31, schrieb Christian Manning:It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
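For illustration, such a flags enum could look roughly like this (member names other than noThrow and the location flag are made up, not the final API):

enum LexOptions {
    none          = 0,
    trackLocation = 1 << 0, // keep line/column information per token
    noThrow       = 1 << 1, // emit an error token instead of throwing
    useLong       = 1 << 2, // parse integral numbers as long
    useBigInt     = 1 << 3, // fall back to BigInt on long overflow
}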
Aug 22 2014
On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":You could check for a decimal point and a 0 at the front (excluding possible - sign), either would indicate a double, making the reasonable assumption that anything else will fit in a long.On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).Am 22.08.2014 18:31, schrieb Christian Manning:It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
Aug 22 2014
Am 22.08.2014 21:48, schrieb Christian Manning:On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:Yes, no decimal point + no exponent would work without overhead to detect integers, but that wouldn't solve the proposed automatic long->double overflow, which is what I meant. My current idea is to default to double and optionally support any of long, BigInt and "Decimal" (BigInt+exponent), where integer overflow only works for long->BigInt.Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":You could check for a decimal point and a 0 at the front (excluding possible - sign), either would indicate a double, making the reasonable assumption that anything else will fit in a long.On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).Am 22.08.2014 18:31, schrieb Christian Manning:It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
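The detection itself is essentially free; a sketch of the check (helper name is mine):

import std.string : indexOfAny;

// A JSON number without '.' and without an exponent marker is integral
// and can be parsed as long (modulo the overflow question above).
bool isIntegralLiteral(const(char)[] num)
{
    return num.indexOfAny(".eE") < 0;
}

unittest {
    assert(isIntegralLiteral("-42"));
    assert(!isIntegralLiteral("4.2"));
    assert(!isIntegralLiteral("4e10"));
}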
Aug 22 2014
On Friday, 22 August 2014 at 20:02:41 UTC, Sönke Ludwig wrote:Am 22.08.2014 21:48, schrieb Christian Manning:It might be the right choice anyway (seeing as json/js do overflow to double), but fwiw it's still atrocious.

import std.algorithm, std.range;
double a = long.max;
assert(iota(1, 1000000).map!(d => (a+d)-a).until!"a != 0".walkLength == 1024);

Yuk. Floating point numbers and integers are so completely different in behaviour that it's just dishonest to transparently switch between the two. This is especially the case for overflow from long -> double, where by definition you're 10 bits past being able to reliably accurately represent the integer in question.On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:Yes, no decimal point + no exponent would work without overhead to detect integers, but that wouldn't solve the proposed automatic long->double overflow, which is what I meant. My current idea is to default to double and optionally support any of long, BigInt and "Decimal" (BigInt+exponent), where integer overflow only works for long->BigInt.Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":You could check for a decimal point and a 0 at the front (excluding possible - sign), either would indicate a double, making the reasonable assumption that anything else will fit in a long.On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want to set up the trade-off here (or if there is another way that is overhead-free).Am 22.08.2014 18:31, schrieb Christian Manning:It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
Aug 22 2014
Yes, no decimal point + no exponent would work without overhead to detect integers, but that wouldn't solve the proposed automatic long->double overflow, which is what I meant. My current idea is to default to double and optionally support any of long, BigInt and "Decimal" (BigInt+exponent), where integer overflow only works for long->BigInt.Ah I see. I have to say, if you are going to treat integers and floating point numbers differently, then you should store them differently. long should be used to store integers, double for floating point numbers. 64 bit signed integer (long) is a totally reasonable limitation for integers, but even that would lose precision stored as a double as you are proposing (if I'm understanding right). I don't think BigInt needs to be brought into this at all really. In the case of integers met in the parser which are too large/small to fit in long, give an error IMO. Such integers should be (and are by other libs IIRC) serialised in the form "1.234e-123" to force double parsing, perhaps losing precision at that stage rather than invisibly inside the library. Size of JSON numbers is implementation defined and the whole thing shouldn't be degraded in both performance and usability to cover JSON serialisers who go beyond common native number types. Of course, you are free to do whatever you like :)
Aug 22 2014
On 8/21/2014 3:35 PM, Sönke Ludwig wrote:Destroy away! ;)Thanks for taking this on! This is valuable work. On to destruction! I'm looking at: http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I anticipate this will be used a LOT and in very high speed demanding applications. With that in mind, 1. There's no mention of what will happen if it is passed malformed JSON strings. I presume an exception is thrown. Exceptions are both slow and consume GC memory. I suggest an alternative would be to emit an "Error" token instead; this would be much like how the UTF decoding algorithms emit a "replacement char" for invalid UTF sequences. 2. The escape sequenced strings presumably consume GC memory. This will be a problem for high performance code. I suggest either leaving them undecoded in the token stream, and letting higher level code decide what to do about them, or provide a hook that the user can override with his own allocation scheme. If we don't make it possible to use std.json without invoking the GC, I believe the module will fail in the long term.
Aug 22 2014
Am 22.08.2014 20:08, schrieb Walter Bright:On 8/21/2014 3:35 PM, Sönke Ludwig wrote:The latest version now features a LexOptions.noThrow option which causes an error token to be emitted instead. After popping the error token, the range is always empty.Destroy away! ;)Thanks for taking this on! This is valuable work. On to destruction! I'm looking at: http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I anticipate this will be used a LOT and in very high speed demanding applications. With that in mind, 1. There's no mention of what will happen if it is passed malformed JSON strings. I presume an exception is thrown. Exceptions are both slow and consume GC memory. I suggest an alternative would be to emit an "Error" token instead; this would be much like how the UTF decoding algorithms emit a "replacement char" for invalid UTF sequences.2. The escape sequenced strings presumably consume GC memory. This will be a problem for high performance code. I suggest either leaving them undecoded in the token stream, and letting higher level code decide what to do about them, or provide a hook that the user can override with his own allocation scheme.The problem is that it really depends on the use case and on the type of input stream which approach is more efficient (storing the escaped version of a string might require *two* allocations if the input range cannot be sliced and if the decoded string is then requested by the parser). My current idea therefore is to simply make this configurable, too. Enabling the use of custom allocators should be easily possible as an add-on functionality later on. At least my suggestion would be to wait with this until we have a finished std.allocator module.
Aug 22 2014
On 8/22/2014 2:27 PM, Sönke Ludwig wrote:Am 22.08.2014 20:08, schrieb Walter Bright:Having a nothrow option may prevent the functions from being attributed as "nothrow". But in any case, to worship at the Altar Of Composability, the error token could always be emitted, and then provide another algorithm which passes through all non-error tokens, and throws if it sees an error token.1. There's no mention of what will happen if it is passed malformed JSON strings. I presume an exception is thrown. Exceptions are both slow and consume GC memory. I suggest an alternative would be to emit an "Error" token instead; this would be much like how the UTF decoding algorithms emit a "replacement char" for invalid UTF sequences.The latest version now features a LexOptions.noThrow option which causes an error token to be emitted instead. After popping the error token, the range is always empty.I'm worried that std.allocator is stalled and we'll be digging ourselves deeper into needing to revise things later to remove GC usage. I'd really like to find a way to abstract the allocation away from the algorithm.2. The escape sequenced strings presumably consume GC memory. This will be a problem for high performance code. I suggest either leaving them undecoded in the token stream, and letting higher level code decide what to do about them, or provide a hook that the user can override with his own allocation scheme.The problem is that it really depends on the use case and on the type of input stream which approach is more efficient (storing the escaped version of a string might require *two* allocations if the input range cannot be sliced and if the decoded string is then requested by the parser). My current idea therefore is to simply make this configurable, too. Enabling the use of custom allocators should be easily possible as an add-on functionality later on. At least my suggestion would be to wait with this until we have a finished std.allocator module.
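A sketch of that pass-through-or-throw algorithm, kept generic since the concrete token type is still in flux:

import std.algorithm : map;
import std.exception : enforce;

// Passes all non-error tokens through unchanged and throws as soon as
// an error token is seen; isError abstracts over the token kind test.
auto throwOnErrorToken(alias isError, R)(R tokens)
{
    return tokens.map!((t) {
        enforce(!isError(t), "JSON lexing error");
        return t;
    });
}

// usage, with a hypothetical kind name:
// auto checked = tokens.throwOnErrorToken!(t => t.kind == JSONToken.Kind.error);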
Aug 22 2014
On 8/22/2014 6:05 PM, Walter Bright wrote:Another possibility is to have the user pass in a resizeable buffer which then will be used to store the strings in as necessary. One example is std.internal.scopebuffer. The nice thing about that is the user can use the stack for the storage, which works out to be very, very fast.The problem is that it really depends on the use case and on the type of input stream which approach is more efficient (storing the escaped version of a string might require *two* allocations if the input range cannot be sliced and if the decoded string is then requested by the parser). My current idea therefore is to simply make this configurable, too. Enabling the use of custom allocators should be easily possible as an add-on functionality later on. At least my suggestion would be to wait with this until we have a finished std.allocator module.
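Usage roughly follows the documented std.internal.scopebuffer example (it is an internal module, so the details may shift):

import std.internal.scopebuffer;

void unescapeDemo()
{
    char[128] store = void;             // storage starts out on the stack
    auto buf = ScopeBuffer!char(store);
    scope (exit) buf.free();            // frees heap memory if it ever grew
    buf.put("unescaped string data");
    const(char)[] result = buf[];       // slice of the accumulated data
}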
Aug 22 2014
On Saturday, 23 August 2014 at 02:30:23 UTC, Walter Bright wrote:Another possibility is to have the user pass in a resizeable buffer which then will be used to store the strings in as necessary. One example is std.internal.scopebuffer. The nice thing about that is the user can use the stack for the storage, which works out to be very, very fast.Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.
Aug 22 2014
On 8/22/2014 9:01 PM, Ola Fosheim Grøstad wrote:On Saturday, 23 August 2014 at 02:30:23 UTC, Walter Bright wrote:scopebuffer does not require resizeable stack allocations.One example is std.internal.scopebuffer. The nice thing about that is the user can use the stack for the storage, which works out to be very, very fast.Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.
Aug 22 2014
On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:On 8/22/2014 9:01 PM, Ola Fosheim Grøstad wrote:So you cannot use the stack for resizable allocations. That would however be a nice optimization. Iff an algorithm only has one alloca, can be inlined in a way which does not extend the stack and use a resizable buffer that grows downwards in memory then you can have a resizable buffer on the stack:
HIMEM
...
Algorithm stack frame vars
Inlined vars
Buffer head/book keeping vars
Buffer end
Buffer front
...add to front here...
End of stack
LOMEM
Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.scopebuffer does not require resizeable stack allocations.
Aug 22 2014
On 8/22/2014 9:48 PM, Ola Fosheim Grøstad wrote:On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:Please, take a look at how scopebuffer works.On 8/22/2014 9:01 PM, Ola Fosheim Grøstad wrote:So you cannot use the stack for resizable allocations.Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.scopebuffer does not require resizeable stack allocations.
Aug 22 2014
On Saturday, 23 August 2014 at 05:28:55 UTC, Walter Bright wrote:On 8/22/2014 9:48 PM, Ola Fosheim Grøstad wrote:I have? It requires an upperbound to stay on the stack, that creates a big hole in the stack. I don't think wasting the stack or moving to the heap is a nice predictable solution. It would be better to just have a couple of regions that do "reverse" stack allocations, but the most efficient solution is the one I outlined. With json you might be able to create an upperbound of say 4-8 times the size of the source iff you know the file size. You don't if you are streaming. (scopebuffer is too unpredictable for real time, a pure stack solution is predictable)On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:Please, take a look at how scopebuffer works.On 8/22/2014 9:01 PM, Ola Fosheim Grøstad wrote:So you cannot use the stack for resizable allocations.Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.scopebuffer does not require resizeable stack allocations.
Aug 22 2014
On 8/22/2014 11:25 PM, Ola Fosheim Grøstad wrote:On Saturday, 23 August 2014 at 05:28:55 UTC, Walter Bright wrote:Scopebuffer is extensively used in Warp, and works very well. The "hole" in the stack is not a significant problem.On 8/22/2014 9:48 PM, Ola Fosheim Grøstad wrote:I have? It requires an upperbound to stay on the stack, that creates a big hole in the stack. I don't think wasting the stack or moving to the heap is a nice predictable solution. It would be better to just have a couple of regions that do "reverse" stack allocations, but the most efficient solution is the one I outlined.On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:Please, take a look at how scopebuffer works.On 8/22/2014 9:01 PM, Ola Fosheim Grøstad wrote:So you cannot use the stack for resizable allocations.Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.scopebuffer does not require resizeable stack allocations.With json you might be able to create an upperbound of say 4-8 times the size of the source iff you know the file size. You don't if you are streaming. (scopebuffer is too unpredictable for real time, a pure stack solution is predictable)You can always implement your own buffering system and pass it in - that's the point, it's under user control.
Aug 22 2014
On Saturday, 23 August 2014 at 06:41:11 UTC, Walter Bright wrote:Scopebuffer is extensively used in Warp, and works very well. The "hole" in the stack is not a significant problem.Well, on a webserver you don't want to push out the caches for no good reason.You can always implement your own buffering system and pass it in - that's the point, it's under user control.My point is that you need compiler support to get good buffering options on the stack. Something like an alloca_inline: auto buffer = alloca_inline getstuff(); process(buffer); I think all memory allocation should be under compiler control, the library solutions are bound to be suboptimal, i.e. slower.
Aug 22 2014
Am 23.08.2014 03:05, schrieb Walter Bright:On 8/22/2014 2:27 PM, Sönke Ludwig wrote:It's a compile time option, so that shouldn't be an issue. There is also just a single "throw" statement in the source, so it's easy to isolate.Am 22.08.2014 20:08, schrieb Walter Bright:Having a nothrow option may prevent the functions from being attributed as "nothrow".1. There's no mention of what will happen if it is passed malformed JSON strings. I presume an exception is thrown. Exceptions are both slow and consume GC memory. I suggest an alternative would be to emit an "Error" token instead; this would be much like how the UTF decoding algorithms emit a "replacement char" for invalid UTF sequences.The latest version now features a LexOptions.noThrow option which causes an error token to be emitted instead. After popping the error token, the range is always empty.
Aug 23 2014
Am 22.08.2014 20:08, schrieb Walter Bright:(...) 2. The escape sequenced strings presumably consume GC memory. This will be a problem for high performance code. I suggest either leaving them undecoded in the token stream, and letting higher level code decide what to do about them, or provide a hook that the user can override with his own allocation scheme. If we don't make it possible to use std.json without invoking the GC, I believe the module will fail in the long term.I've added two new types now to abstract away how strings and numbers are represented in memory. For string literals this means that for input types "string" and "immutable(ubyte)[]" they will always be stored as slices to the input buffer. JSONValue has a .rawValue property to access them, as well as an "alias this"ed .value property that transparently unescapes. At that place it would also be easy to provide a method that takes an arbitrary output range to unescape without allocations. Documentation and code are both updated (also added a note about exception behavior).
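A reduced sketch of the lazily unescaping string idea (the type name and the toy unescaper are mine; the real implementation handles all JSON escapes):

import std.array : appender;

string unescapeSimple(string raw)
{
    // toy version: only strips the backslash of \" and \\
    auto app = appender!string();
    for (size_t i = 0; i < raw.length; i++) {
        if (raw[i] == '\\' && i + 1 < raw.length) i++;
        app.put(raw[i]);
    }
    return app.data;
}

struct LazyString {
    string rawValue;  // slice of the input buffer, escapes intact
    @property string value() { return unescapeSimple(rawValue); }
    alias value this; // transparently usable as a string
}

unittest {
    auto s = LazyString(`say \"hi\"`);
    string t = s;     // decoding happens here, on access
    assert(t == `say "hi"`);
}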
Aug 23 2014
On 8/23/2014 9:36 AM, Sönke Ludwig wrote:input types "string" and "immutable(ubyte)[]"Why the immutable(ubyte)[] ?
Aug 23 2014
Am 23.08.2014 19:38, schrieb Walter Bright:On 8/23/2014 9:36 AM, Sönke Ludwig wrote:I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...input types "string" and "immutable(ubyte)[]"Why the immutable(ubyte)[] ?
Aug 23 2014
On 8/23/2014 10:42 AM, Sönke Ludwig wrote:Am 23.08.2014 19:38, schrieb Walter Bright:I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.On 8/23/2014 9:36 AM, Sönke Ludwig wrote:I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...input types "string" and "immutable(ubyte)[]"Why the immutable(ubyte)[] ?
Aug 23 2014
On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:On 8/23/2014 10:42 AM, Sönke Ludwig wrote:For performance purposes, determining encoding during lexing is useful. You can avoid any conversion costs when you know that the original string is ascii or utf-8 or other. The cost during lexing is essentially zero. The cost of storing that state might be a concern, or it might be free in otherwise unused padding space. The cost of re-scanning strings that can be avoided is non-trivial. My past experience with this was in an http parser, where there's even more complex logic than json parsing, but the concepts still apply.Am 23.08.2014 19:38, schrieb Walter Bright:I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.On 8/23/2014 9:36 AM, Sönke Ludwig wrote:I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...input types "string" and "immutable(ubyte)[]"Why the immutable(ubyte)[] ?
Aug 23 2014
On Saturday, 23 August 2014 at 19:01:13 UTC, Brad Roberts via Digitalmars-d wrote:original string is ascii or utf-8 or other. The cost during lexing is essentially zero.I am not so sure when it comes to SIMD lexing. I think the specified behaviour should be done in a way which encourage later optimizations.
Aug 23 2014
Some baselines for performance: https://github.com/mloskot/json_benchmark http://chadaustin.me/2013/01/json-parser-benchmarking/
Aug 23 2014
On 8/23/2014 12:00 PM, Brad Roberts via Digitalmars-d wrote:On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:I'm not convinced that using an adapter algorithm won't be just as fast.I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.For performance purposes, determining encoding during lexing is useful.
Aug 23 2014
On 8/23/2014 3:20 PM, Walter Bright via Digitalmars-d wrote:On 8/23/2014 12:00 PM, Brad Roberts via Digitalmars-d wrote:Consider your own talks on optimizing the existing dmd lexer. In those talks you've talked about the evils of additional processing on every byte. That's what you're talking about here. While it's possible that the inliner and other optimizer steps might be able to integrate the two phases and remove some overhead, I'll believe it when I see the resulting assembly code.On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:I'm not convinced that using an adapter algorithm won't be just as fast.I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.For performance purposes, determining encoding during lexing is useful.
Aug 23 2014
On 8/23/2014 6:32 PM, Brad Roberts via Digitalmars-d wrote:On the other hand, deadalnix demonstrated that the ldc optimizer was able to remove the extra code. I have a reasonable faith that optimization can be improved where necessary to cover this.I'm not convinced that using an adapter algorithm won't be just as fast.Consider your own talks on optimizing the existing dmd lexer. In those talks you've talked about the evils of additional processing on every byte. That's what you're talking about here. While it's possible that the inliner and other optimizer steps might be able to integrate the two phases and remove some overhead, I'll believe it when I see the resulting assembly code.
Aug 25 2014
On 08/25/2014 09:35 PM, Walter Bright wrote:On 8/23/2014 6:32 PM, Brad Roberts via Digitalmars-d wrote:I just happened to write a very small script yesterday and tested with the compilers (with dub --build=release).
dmd: 2.8 MB
gdc: 3.3 MB
ldc: 0.5 MB
So ldc can remove quite a substantial amount of code in some cases.On the other hand, deadalnix demonstrated that the ldc optimizer was able to remove the extra code. I have a reasonable faith that optimization can be improved where necessary to cover this.I'm not convinced that using an adapter algorithm won't be just as fast.Consider your own talks on optimizing the existing dmd lexer. In those talks you've talked about the evils of additional processing on every byte. That's what you're talking about here. While it's possible that the inliner and other optimizer steps might be able to integrate the two phases and remove some overhead, I'll believe it when I see the resulting assembly code.
Aug 25 2014
On 8/25/2014 12:49 PM, simendsjo wrote:I just happened to write a very small script yesterday and tested with the compilers (with dub --build=release). dmd: 2.8 mb gdc: 3.3 mb ldc 0.5 mb So ldc can remove quite a substantial amount of code in some cases.Speed optimizations are different.
Aug 25 2014
On 25/08/14 21:49, simendsjo wrote:So ldc can remove quite a substantial amount of code in some cases.It's because the latest release of LDC has the --gc-sections flag enabled by default. -- /Jacob Carlborg
Aug 26 2014
I tried using "-disable-linker-strip-dead", but it had no effect. From the error messages it seems the problem is compile-time and not link-time... On Tuesday, 26 August 2014 at 07:01:09 UTC, Jacob Carlborg wrote:On 25/08/14 21:49, simendsjo wrote:So ldc can remove quite a substantial amount of code in some cases.It's because the latest release of LDC has the --gc-sections falg enabled by default.
Aug 26 2014
On 8/23/14, 10:46 AM, Walter Bright wrote:On 8/23/2014 10:42 AM, Sönke Ludwig wrote:I think accepting ubyte is a good idea. It means "got this stream of bytes off of the wire and it hasn't been validated as a UTF string". It also means (which is true) that the lexer does enough validation to constrain arbitrary bytes into text, and saves the caller from either a check (expensive) or a cast (unpleasant). Reality is the JSON lexer takes ubytes and produces tokens. AndreiAm 23.08.2014 19:38, schrieb Walter Bright:I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.On 8/23/2014 9:36 AM, Sönke Ludwig wrote:I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...input types "string" and "immutable(ubyte)[]"Why the immutable(ubyte)[] ?
Aug 23 2014
On 8/23/2014 2:36 PM, Andrei Alexandrescu wrote:I think accepting ubyte it's a good idea. It means "got this stream of bytes off of the wire and it hasn't been validated as a UTF string". It also means (which is true) that the lexer does enough validation to constrain arbitrary bytes into text, and saves caller from either a check (expensive) or a cast (unpleasant). Reality is the JSON lexer takes ubytes and produces tokens.Using an adapter still makes sense, because: 1. The adapter should be just as fast as wiring it in internally 2. The adapter then becomes a general purpose tool that can be used elsewhere where the encoding is unknown or suspect 3. The scope of the adapter is small, so it is easier to get it right, and being reusable means every user benefits from it 4. If we can't make adapters efficient, we've failed at the ranges+algorithms model, and I'm very unwilling to fail at that
Aug 23 2014
On 8/23/14, 3:24 PM, Walter Bright wrote:On 8/23/2014 2:36 PM, Andrei Alexandrescu wrote:An adapter would solve the wrong problem here. There's nothing to adapt from and to. An adapter would be good if e.g. the stream uses UTF-16 or some Windows encoding. Bytes are the natural input for a json parser. AndreiI think accepting ubyte it's a good idea. It means "got this stream of bytes off of the wire and it hasn't been validated as a UTF string". It also means (which is true) that the lexer does enough validation to constrain arbitrary bytes into text, and saves caller from either a check (expensive) or a cast (unpleasant). Reality is the JSON lexer takes ubytes and produces tokens.Using an adapter still makes sense, because: 1. The adapter should be just as fast as wiring it in internally 2. The adapter then becomes a general purpose tool that can be used elsewhere where the encoding is unknown or suspect 3. The scope of the adapter is small, so it is easier to get it right, and being reusable means every user benefits from it 4. If we can't make adapters efficient, we've failed at the ranges+algorithms model, and I'm very unwilling to fail at that
Aug 23 2014
On 8/23/2014 3:51 PM, Andrei Alexandrescu wrote:An adapter would solve the wrong problem here. There's nothing to adapt from and to. An adapter would be good if e.g. the stream uses UTF-16 or some Windows encoding. Bytes are the natural input for a json parser.The adaptation is to take arbitrary byte input in an unknown encoding and produce valid UTF. Note that many html readers scan the bytes to see if it is ASCII, UTF, some code page encoding, Shift-JIS, etc., and translate accordingly. I do not see why that is less costly to put inside the JSON lexer than as an adapter.
Aug 25 2014
On Monday, 25 August 2014 at 19:38:05 UTC, Walter Bright wrote:The adaptation is to take arbitrary byte input in an unknown encoding and produce valid UTF.I agree. For a restful http service the encoding should be specified in the http header and the input rejected if it isn't UTF compatible. For that use scenario you only want validation, not conversion. However some validation is free, like if you only accept numbers you could just turn off parsing of strings in the template… If files are read from storage then you can reread the file if it fails validation on the first pass. I wonder, in which use scenario it is that both of these conditions fail?
1. unspecified character-set and cannot assume UTF for JSON
2. unable to re-parse
Aug 25 2014
Am 25.08.2014 21:50, schrieb "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>":On Monday, 25 August 2014 at 19:38:05 UTC, Walter Bright wrote:BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which is another argument for just letting the lexer assume valid UTF.The adaptation is to take arbitrary byte input in an unknown encoding and produce valid UTF.I agree. For a restful http service the encoding should be specified in the http header and the input rejected if it isn't UTF compatible. For that use scenario you only want validation, not conversion. However some validation is free, like if you only accept numbers you could just turn off parsing of strings in the template… If files are read from storage then you can reread the file if it fails validation on the first pass. I wonder, in which use scenario it is that both of these conditions fail? 1. unspecified character-set and cannot assume UTF for JSON 3. unable to re-parse
Aug 25 2014
On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which is another argument for just letting the lexer assume valid UTF.The lexer cannot assume valid UTF since the client might be a rogue, but it can just bail out if the lookahead isn't JSON? So UTF-validation is limited to strings. You have to parse the strings because of the \uXXXX escapes of course, so some basic validation is unavoidable? But I guess full validation of string content could be another useful option along with "ignore escapes" for the case where you want to avoid decode-encode scenarios. (like for a proxy, or if you store pre-escaped unicode in a database)
Aug 25 2014
Am 25.08.2014 22:51, schrieb "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>":On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:But why should UTF validation be the job of the lexer in the first place? D's "string" type is also defined to be UTF-8, so given that, it would of course be free to assume valid UTF-8. I agree with Walter there that validation/conversion should be added as a separate proxy range. But if we end up going for validating in the lexer, it would indeed be enough to validate inside strings, because the rest of the grammar assumes a subset of ASCII.BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which is another argument for just letting the lexer assume valid UTF.The lexer cannot assume valid UTF since the client might be a rogue, but it can just bail out if the lookahead isn't jSON? So UTF-validation is limited to strings.You have to parse the strings because of the \uXXXX escapes of course, so some basic validation is unavoidable?At least no UTF validation is needed. Since all non-ASCII characters will always be composed of bytes >0x7F, a sequence \uXXXX can be assumed to be valid wherever in the string it occurs, and all other bytes that don't belong to an escape sequence are just passed through as-is.But I guess full validation of string content could be another useful option along with "ignore escapes" for the case where you want to avoid decode-encode scenarios. (like for a proxy, or if you store pre-escaped unicode in a database)
Aug 25 2014
On Monday, 25 August 2014 at 21:27:42 UTC, Sönke Ludwig wrote:But why should UTF validation be the job of the lexer in the first place?Because you want to save time, it is faster to integrate validation? The most likely use scenario is to receive REST data over HTTP that needs validation. Well, so then I agree with Andrei… array of bytes it is. ;-)added as a separate proxy range. But if we end up going for validating in the lexer, it would indeed be enough to validate inside strings, because the rest of the grammar assumes a subset of ASCII.Not assumes, but defines! :-) If you have to validate UTF before lexing then you will end up needlessly scanning lots of ascii if the file contains lots of non-strings or is from a encoder that only sends pure ascii. If you want to have "plugin" validation of strings then you also need to differentiate strings so that the user can select which data should be just ascii, utf8, numbers, ids etc. Otherwise the user will end up doing double validation (you have to bypass >7F followed by string-end anyway). The advantage of integrated validation is that you can use 16 bytes SIMD registers on the buffer. I presume you can load 16 bytes and do BITWISE-AND on the MSB, then match against string-end and carefully use this to boost performance of simultanous UTF validation, escape-scanning, and string-end scan. A bit tricky, of course.At least no UTF validation is needed. Since all non-ASCII characters will always be composed of bytes >0x7F, a sequence \uXXXX can be assumed to be valid wherever in the string it occurs, and all other bytes that don't belong to an escape sequence are just passed through as-is.You cannot assume \u… to be valid if you convert it.
Aug 25 2014
On Monday, 25 August 2014 at 21:53:50 UTC, Ola Fosheim Grøstad wrote:I presume you can load 16 bytes and do BITWISE-AND on the MSB, then match against string-end and carefully use this to boost performance of simultaneous UTF validation, escape-scanning, and string-end scan. A bit tricky, of course.I think it is doable and worth it… https://software.intel.com/sites/landingpage/IntrinsicsGuide/ e.g.:
__mmask16 _mm_cmpeq_epu8_mask (__m128i a, __m128i b)
__mmask32 _mm256_cmpeq_epu8_mask (__m256i a, __m256i b)
__mmask64 _mm512_cmpeq_epu8_mask (__m512i a, __m512i b)
__mmask16 _mm_test_epi8_mask (__m128i a, __m128i b)
etc. So you can:
1. preload registers with "\\\\\\\\…", "\"\"…" and "\0\0\0…"
2. then compare signed/unsigned/equal whatever
3. then load 16, 32 or 64 bytes of data and stream until the masks trigger
4. test masks
5. resolve any potential issues, goto 3
Aug 25 2014
On Monday, 25 August 2014 at 22:40:00 UTC, Ola Fosheim Grøstad wrote:On Monday, 25 August 2014 at 21:53:50 UTC, Ola Fosheim Grøstad wrote:D:YAML uses a similar approach, but with 8 bytes (plain ulong - portable) to detect how many ASCII chars are there before the first non-ASCII UTF-8 sequence, and it significantly improves performance (didn't keep any numbers unfortunately, but it decreases decoding overhead to a fraction for most inputs, since YAML (and JSON) files tend to be mostly-ASCII with non-ASCII from time to time in strings; if we know that we have e.g. 100 chars incoming that are plain ASCII, we can use a fast path for them and only consider decoding after that). See the countASCII() function in https://github.com/kiith-sa/D-YAML/blob/master/source/dyaml/reader.d However, this approach is useful only if you decode the whole buffer at once, not if you do something like foreach(dchar ch; "asdsššdfáľäô") {}, which is the most obvious way to decode in D. FWIW, decoding _was_ a significant overhead in D:YAML (again, didn't keep numbers, but at a time it was around 10% in the profiler), and I didn't like the fact that it prevented making my code @nogc - I ended up copying chunks of std.utf and making them @nogc nothrow (D:YAML as a whole is not @nogc but I use @nogc in some parts basically as "@noalloc" to ensure I don't allocate anything)I presume you can load 16 bytes and do BITWISE-AND on the MSB, then match against string-end and carefully use this to boost performance of simultaneous UTF validation, escape-scanning, and string-end scan. A bit tricky, of course.I think it is doable and worth it… https://software.intel.com/sites/landingpage/IntrinsicsGuide/ e.g.: __mmask16 _mm_cmpeq_epu8_mask (__m128i a, __m128i b) __mmask32 _mm256_cmpeq_epu8_mask (__m256i a, __m256i b) __mmask64 _mm512_cmpeq_epu8_mask (__m512i a, __m512i b) __mmask16 _mm_test_epi8_mask (__m128i a, __m128i b) etc. So you can: 1. preload registers with "\\\\\\\\…", "\"\"…" and "\0\0\0…" 2. then compare signed/unsigned/equal whatever. 3. then load 16,32 or 64 bytes of data and stream until the masks trigger 4. test masks 5. resolve any potential issues, goto 3
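The 8-bytes-at-a-time test described here is tiny; a simplified countASCII-style sketch (alignment of the unaligned load and tail handling kept minimal):

// Number of leading ASCII bytes, scanning 8 bytes per step where possible.
size_t countASCII(const(ubyte)[] data)
{
    enum ulong mask = 0x8080_8080_8080_8080;
    size_t i = 0;
    for (; i + 8 <= data.length; i += 8)
    {
        ulong chunk = *cast(const(ulong)*)(data.ptr + i);
        if (chunk & mask) break;   // a high bit is set somewhere in here
    }
    while (i < data.length && data[i] < 0x80) i++;
    return i;
}

unittest {
    assert(countASCII(cast(immutable(ubyte)[])"abcdefghš") == 8);
}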
Aug 25 2014
On Monday, 25 August 2014 at 23:24:43 UTC, Kiith-Sa wrote:D:YAML uses a similar approach, but with 8 bytes (plain ulong - portable) to detect how many ASCII chars are there before the first non-ASCII UTF-8 sequence, and it significantly improves performance (didn't keep any numbers unfortunately, but itCool! I think often you will have an array of numbers so you could subtract "000000000…", then parse offset-bytes and convert the mantissa/exponent using shuffles and simd. Somehow…
Aug 25 2014
Am 25.08.2014 23:53, schrieb "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>":On Monday, 25 August 2014 at 21:27:42 UTC, Sönke Ludwig wrote:I guess it depends on if you look at the grammar as productions or comprehensions(right term?) ;)But why should UTF validation be the job of the lexer in the first place?Because you want to save time, it is faster to integrate validation? The most likely use scenario is to receive REST data over HTTP that needs validation. Well, so then I agree with Andrei… array of bytes it is. ;-)added as a separate proxy range. But if we end up going for validating in the lexer, it would indeed be enough to validate inside strings, because the rest of the grammar assumes a subset of ASCII.Not assumes, but defines! :-)If you have to validate UTF before lexing then you will end up needlessly scanning lots of ascii if the file contains lots of non-strings or is from a encoder that only sends pure ascii.That's true. So the ideal solution would be to *assume* UTF-8 when the input is char based and to *validate* if the input is "numeric".If you want to have "plugin" validation of strings then you also need to differentiate strings so that the user can select which data should be just ascii, utf8, numbers, ids etc. Otherwise the user will end up doing double validation (you have to bypass >7F followed by string-end anyway). The advantage of integrated validation is that you can use 16 bytes SIMD registers on the buffer. I presume you can load 16 bytes and do BITWISE-AND on the MSB, then match against string-end and carefully use this to boost performance of simultanous UTF validation, escape-scanning, and string-end scan. A bit tricky, of course.Well, that's something that's definitely out of the scope of this proposal. Definitely an interesting direction to pursue, though.I meant "X" to stand for a hex digit. The point was just that you don't have to worry about interacting in a bad way with UTF sequences when you find "\uXXXX".At least no UTF validation is needed. Since all non-ASCII characters will always be composed of bytes >0x7F, a sequence \uXXXX can be assumed to be valid wherever in the string it occurs, and all other bytes that don't belong to an escape sequence are just passed through as-is.You cannot assume \u… to be valid if you convert it.
Aug 26 2014
On Tuesday, 26 August 2014 at 07:51:04 UTC, Sönke Ludwig wrote:That's true. So the ideal solution would be to *assume* UTF-8 when the input is char based and to *validate* if the input is "numeric".I think you should validate JSON-strings to be UTF-8 encoded even if you allow illegal unicode values. Basically ensuring that a byte >0x7f has the right number of bytes after it, so you don't get a byte >0x7f as the last byte in a string etc.Maybe the interface/code structure is or could be designed so that the implementation could later be version()'ed to SIMD where possible.When you convert "\uXXXX" to UTF-8 bytes, is it then validated as a legal code point? I guess it is not necessary. Btw, I believe rapidJSON achieves high speed by converting strings in situ, so that if the prefix is escape free it just converts in place when it hits the first escape. Thus avoiding some moving.Well, that's something that's definitely out of the scope of this proposal. Definitely an interesting direction to pursue, though.You cannot assume \u… to be valid if you convert it.I meant "X" to stand for a hex digit. The point was just that you don't have to worry about interacting in a bad way with UTF sequences when you find "\uXXXX".
Aug 26 2014
Am 26.08.2014 10:24, schrieb "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>":On Tuesday, 26 August 2014 at 07:51:04 UTC, Sönke Ludwig wrote:I think this is a misunderstanding. What I mean is that if the input range passed to the lexer is char/wchar/dchar based, the lexer should assume that the input is well formed UTF. After all this is how D strings are defined. When on the other hand a ubyte/ushort/uint range is used, the lexer should validate all string literals.That's true. So the ideal solution would be to *assume* UTF-8 when the input is char based and to *validate* if the input is "numeric".I think you should validate JSON-strings to be UTF-8 encoded even if you allow illegal unicode values. Basically ensuring that >0x7f has the right number of bytes after it, so you don't get >0x7f as the last byte in a string etc.I guess that shouldn't be an issue. From the outside it's just a generic range that is passed in and internally it's always possible to add special cases for array inputs. If someone else wants to play around with this idea, we could of course also integrate it right away, it's just that I personally don't have the time to go to the extreme here.Well, that's something that's definitely out of the scope of this proposal. Definitely an interesting direction to pursue, though.Maybe the interface/code structure is or could be designed so that the implementation could later be version()'ed to SIMD where possible.What is validated is that it forms valid UTF-16 surrogate pairs, and those are converted to a single dchar instead (if applicable). This is necessary, because otherwise the lexer would produce invalid UTF-8 for valid inputs. Apart from that, the value is used verbatim as a dchar.When you convert "\uXXXX" to UTF-8 bytes, is it then validated as a legal code point? I guess it is not necessary.You cannot assume \u… to be valid if you convert it.I meant "X" to stand for a hex digit. The point was just that you don't have to worry about interacting in a bad way with UTF sequences when you find "\uXXXX".Btw, I believe rapidJSON achieves high speed by converting strings in situ, so that if the prefix is escape free it just converts in place when it hits the first escape. Thus avoiding some moving.The same is true for this lexer, at least for array inputs. It actually currently just stores a slice of the string literal in all cases and lazily decodes on the first access. While doing that, it first skips any escape sequence free prefix and returns a slice if the whole string is escape sequence free.
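For reference, the surrogate pair combination step looks like this (a sketch; the real lexer additionally validates the pair ordering in the input):

// Combine a high/low UTF-16 surrogate pair, as produced by two
// consecutive \uXXXX escapes, into a single code point.
dchar combineSurrogates(uint hi, uint lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF, "first escape must be a high surrogate");
    assert(lo >= 0xDC00 && lo <= 0xDFFF, "second escape must be a low surrogate");
    return cast(dchar)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}

unittest {
    assert(combineSurrogates(0xD83D, 0xDE00) == '\U0001F600');
}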
Aug 26 2014
On Tuesday, 26 August 2014 at 09:05:05 UTC, Sönke Ludwig wrote:When on the other hand a ubyte/ushort/uint range is used, the lexer should validate all string literals.Yes, so this will be supported? Because this is what is most useful.
Aug 26 2014
Am 26.08.2014 11:11, schrieb "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>":On Tuesday, 26 August 2014 at 09:05:05 UTC, Sönke Ludwig wrote:If nobody plays a veto card, I'll implement it that way.When on the other hand a ubyte/ushort/uint range is used, the lexer should validate all string literals.Yes, so this will be supported? Because this is what is most useful.
Aug 26 2014
Btw, maybe it would be a good idea to take a look at the JSON that various browsers generate to see if there are any differences? Then one could tune optimizations to what is the most common coding, like this:
1. start parsing assuming "browser style restricted JSON" grammar.
2. on failure jump to the slower "generic JSON"
Chrome does not seem to generate whitespace in JSON.stringify(). And I would not be surprised if the encoding of double is similar across browsers. Ola.
Aug 25 2014
On 8/25/2014 1:35 PM, Sönke Ludwig wrote:BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which is another argument for just letting the lexer assume valid UTF.I think that settles it.
Aug 25 2014
On 8/22/14, Sönke Ludwig <digitalmars-d puremagic.com> wrote:Docs: http://s-ludwig.github.io/std_data_json/This confused me for a solid minute:

// Lex a JSON string into a lazy range of tokens
auto tokens = lexJSON(`{"name": "Peter", "age": 42}`);
with (JSONToken.Kind) {
    assert(tokens.map!(t => t.kind).equal(
        [objectStart, string, colon, string, comma,
         string, colon, number, objectEnd]));
}

Generally I'd avoid using de-facto reserved names as enum member names (e.g. string).
Aug 22 2014
Am 22.08.2014 21:15, schrieb Andrej Mitrovic via Digitalmars-d:On 8/22/14, Sönke Ludwig <digitalmars-d puremagic.com> wrote:Hmmm, but it *is* a string. Isn't the problem more the use of with in this case? Maybe the example should just use with(JSONToken) and then Kind.string?Docs: http://s-ludwig.github.io/std_data_json/This confused me for a solid minute: // Lex a JSON string into a lazy range of tokens auto tokens = lexJSON(`{"name": "Peter", "age": 42}`); with (JSONToken.Kind) { assert(tokens.map!(t => t.kind).equal( [objectStart, string, colon, string, comma, string, colon, number, objectEnd])); } Generally I'd avoid using de-facto reserved names as enum member names (e.g. string).
Aug 22 2014
On 8/22/14, Sönke Ludwig <digitalmars-d puremagic.com> wrote:Hmmm, but it *is* a string. Isn't the problem more the use of with in this case?Yeah, maybe so. I thought for a second it was a tuple, but then I saw the square brackets and was left scratching my head. :)
Aug 23 2014
First, thank you for your work. std.json is horrible to use right now, so a replacement is more than welcome. I haven't played with your code yet, so I may be asking for something that already exists, but did you have a look at jsvar by Adam? You can find it here: https://github.com/adamdruppe/arsd/blob/master/jsvar.d One of the big pains when one works with formats like JSON is that you go from the untyped world to the typed world (the same problem occurs with XML and various config formats as well). I think Adam got the right balance in jsvar. It behaves closely enough to JavaScript so it is convenient to manipulate, while removing the most dangerous behavior (concatenation is still done using ~ and not + as in JS). If that is not already the case, I'd love that the element I get out of my JSON behaves that way. If you can do that, you have a user.
Aug 22 2014
On Sat, 23 Aug 2014 02:23:25 +0000 deadalnix via Digitalmars-d <digitalmars-d puremagic.com> wrote:I haven't played with your code yet, so I may be asking for something that already exists, but did you have a look at jsvar by Adam?jsvar uses opDispatch, and Sönke wrote:- No opDispatch() for JSONValue - this has shown to do more harm than good in vibe.data.json
Aug 22 2014
Am 23.08.2014 04:23, schrieb deadalnix:First thank you for your work. std.json is horrible to use right now, so a replacement is more than welcome. I haven't played with your code yet, so I may be asking for somethign that already exists, but did you had a look to jsvar by Adam ? You can find it here: https://github.com/adamdruppe/arsd/blob/master/jsvar.d One of the big pain when one work with format like JSON is that you go from the untyped world to the typed world (the same problem occurs with XML and various config format as well). I think Adam got the right balance in jsvar. It behave closely enough to javascript so it is convenient to manipulate, while removing the most dangerous behavior (concatenation is still done using ~and not + as in JS). If that is not already the case, I'd love that the element I get out of my JSON behave that way. If you can do that, you have a user.Setting the issue of opDispatch aside, one of the goals was to use Algebraic to store values. It is probably not completely as flexible as jsvar, but still transparently enables a lot of operations (with those pull requests merged at least). But it has another big advantage, which is that we can later define other types based on Algebraic, such as BSONValue, and those can be transparently runtime converted between each other in a generic way. A special case type on the other hand produces nasty dependencies between the formats. Main issues of using opDispatch: - Prone to bugs where a normal field/method of the JSONValue struct is accessed instead of a JSON field - On top of that the var.field syntax gives the wrong impression that you are working with static typing, while var["field"] makes it clear that runtime indexing is going on - Every interface change of JSONValue would be a silent breaking change, because the whole string domain is used up for opDispatch
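The first point is easy to demonstrate with a toy type (this is not jsvar itself, just an illustration of the shadowing problem):

struct Var {
    string[string] fields;
    // forwards unknown names to the JSON object...
    string opDispatch(string name)() { return fields[name]; }
    // ...but real members always win:
    size_t length() { return fields.length; }
}

unittest {
    Var v;
    v.fields["age"] = "42";
    assert(v.age == "42");       // opDispatch works for non-clashing names
    v.fields["length"] = "180";
    assert(v.length == 2);       // the real member wins here, silently!
    assert(v.fields["length"] == "180"); // explicit lookup is unambiguous
}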
Aug 23 2014
On Saturday, 23 August 2014 at 09:22:01 UTC, Sönke Ludwig wrote:Main issues of using opDispatch: - Prone to bugs where a normal field/method of the JSONValue struct is accessed instead of a JSON field - On top of that the var.field syntax gives the wrong impression that you are working with static typing, while var["field"] makes it clear that runtime indexing is going on - Every interface change of JSONValue would be a silent breaking change, because the whole string domain is used up for opDispatchI have seen similar issues to these with simplexml in PHP. Using opDispatch to match all possible names except a few doesn't work so well. I'm not sure if you've changed it already, but I agree with the earlier comment about changing the flag for pretty printing from a boolean to an enum value. Booleans in interfaces are one of my pet peeves.
Aug 23 2014
Am 23.08.2014 14:19, schrieb w0rp:I'm not sure if you've changed it already, but I agree with the earlier comment about changing the flag for pretty printing from a boolean to an enum value. Booleans in interfaces is one of my pet peeves.It's split into two separate functions now. Having to type out a full enum value I guess would be too distracting in this case, since they will be pretty frequently used.
Aug 23 2014
On Saturday, 23 August 2014 at 09:22:01 UTC, Sönke Ludwig wrote:Main issues of using opDispatch: - Prone to bugs where a normal field/method of the JSONValue struct is accessed instead of a JSON field - On top of that the var.field syntax gives the wrong impression that you are working with static typing, while var["field"] makes it clear that runtime indexing is going on - Every interface change of JSONValue would be a silent breaking change, because the whole string domain is used up for opDispatchYes, I don't mind missing that one. It looks like a false good idea.
Aug 23 2014
I've added support (compile time option [1]) for long and BigInt in the lexer (and parser), see [2]. JSONValue currently still only stores double for numbers. There are two options for extending JSONValue: 1. Add long and BigInt to the set of supported types for JSONValue. This preserves all features of Algebraic and would later still allow transparent conversion to other similar value types (e.g. BSONValue). On the other hand it would be necessary to always check the actual type before accessing a number, or the Algebraic would throw. 2. Instead of double, store a JSONNumber in the Algebraic. This enables all the transparent conversions of JSONNumber and would thus be more convenient, but blocks the way for possible automatic conversions in the future. I'm leaning towards 1, because allowing generic conversion between different JSONValue-like types was one of my prime goals for the new module. [1]: http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/LexOptions.html [2]: http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/JSONNumber.html
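With option 1, client code would dispatch on the stored type before reading, e.g. (the Number alias is illustrative, not the module's API):

import std.bigint : BigInt;
import std.variant : Algebraic;

alias Number = Algebraic!(long, double, BigInt);

double asDouble(Number n)
{
    if (auto p = n.peek!long)   return *p;
    if (auto p = n.peek!double) return *p;
    if (auto p = n.peek!BigInt) return cast(double) p.toLong(); // lossy, sketch only
    assert(false, "uninitialized Number");
}

unittest {
    assert(asDouble(Number(42L)) == 42);
    assert(asDouble(Number(1.5)) == 1.5);
}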
Aug 25 2014
On Monday, 25 August 2014 at 11:30:15 UTC, Sönke Ludwig wrote:I've added support (compile time option [1]) for long and BigInt in the lexer (and parser), see [2]. JSONValue currently still only stores double for numbers.It can be very useful to have a base-10 exponent representation in certain situations where you need to have the exact same results in two systems (like a third-party ERP server versus a client-side application). Base-2 exponents are tricky (incorrect) when you read ASCII. E.g. I have resorted to using Decimal in Python just to avoid the weird round-off issues when calculating prices where the price is given in fractions of the order unit. Perhaps a marginal problem, but it could be important for some serious application areas where you need to integrate D with existing systems (for which you don't have the source code).
Aug 25 2014
On 25.08.2014 14:12, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:On Monday, 25 August 2014 at 11:30:15 UTC, Sönke Ludwig wrote:In fact, I've already prepared the code for that, but commented it out for now, because I wanted an efficient algorithm for converting double to Decimal, and because we should probably first add a Decimal type to Phobos instead of adding it to the JSON module.I've added support (compile time option [1]) for long and BigInt in the lexer (and parser), see [2]. JSONValue currently still only stores double for numbers.It can be very useful to have a base-10 exponent representation in certain situations where you need to have the exact same results in two systems (like a third-party ERP server versus a client-side application). Base-2 exponents are tricky (incorrect) when you read ASCII. E.g. I have resorted to using Decimal in Python just to avoid the weird round-off issues when calculating prices where the price is given in fractions of the order unit. Perhaps a marginal problem, but it could be important for some serious application areas where you need to integrate D with existing systems (for which you don't have the source code).
Aug 25 2014
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:[…]One missing feature (which is also missing from the existing std.json) is support for NaN and Infinity as JSON values. Although they are not part of the formal JSON spec (which is a ridiculous omission, the argument given for excluding them is fallacious), they do get generated if you use Javascript's toString to create the JSON. Many JSON libraries (eg Google's) also generate them, so they are frequently encountered in practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity} You should also put tests in for what happens when you pass NaN or infinity to toJSON. It shouldn't silently generate invalid JSON.
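A sketch of the kind of regression test Don asks for, assuming the proposed module's JSONValue and toJSON are in scope; the exact policy (throwing or substituting a conforming value) is deliberately left open here:

---
import std.algorithm : canFind;

unittest
{
    auto v = JSONValue(double.nan);   // e.g. the result of 0.0/0.0
    auto s = toJSON(v);
    // Whatever policy is chosen (throw, or emit a conforming substitute
    // such as 'null'), a bare NaN token must never appear in the output,
    // because it is not valid JSON.
    assert(!s.canFind("NaN"));
}
---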
Aug 25 2014
On Monday, 25 August 2014 at 13:07:08 UTC, Don wrote:practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity} You should also put tests in for what happens when you pass NaN or infinity to toJSON. It shouldn't silently generate invalid JSON.I believe you are allowed to use very high exponents, though. Like 1E999. So you need to decide if those should be mapped to +Infinity or to the max value… NaN also comes in two forms with differing semantics: signalling (NaNs) and quiet (NaN). NaN is used for 0/0 and sqrt(-1), but NaNs is used for illegal values and failure. For some reason D does not seem to support this aspect of IEEE754? I cannot find ".nans" listed on the page http://dlang.org/property.html The distinction is important when you do conditional branching. With NaNs you might not be able to figure out which branch to take, since you might have missed out on a real value; with NaN you got the value (which is known to be not real) and you might be able to branch.
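As an aside, D exposes only the quiet NaN (float.nan, which is also float.init); a signaling NaN can still be constructed by hand from its IEEE 754 bit pattern. A sketch for binary32 (the exact quiet-NaN bit pattern printed is an assumption about the implementation):

---
import std.stdio : writefln;

void main()
{
    // Quiet NaN: exponent all ones, mantissa MSB (the "quiet" bit) set.
    float q = float.nan;
    writefln("quiet NaN bits:     %08X", *cast(uint*) &q); // typically 7FC00000

    // Signaling NaN: exponent all ones, quiet bit clear, payload != 0.
    uint sbits = 0x7F800001;
    float s = *cast(float*) &sbits;
    writefln("signaling NaN bits: %08X", *cast(uint*) &s);
    // Caveat: merely copying s may already quieten it on some hardware,
    // which is exactly the portability problem discussed in this thread.
}
---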
Aug 25 2014
On 8/25/2014 6:23 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>" wrote:On Monday, 25 August 2014 at 13:07:08 UTC, Don wrote:Infinity. Mapping to max value would be a horrible bug.practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity} You should also put tests in for what happens when you pass NaN or infinity to toJSON. It shouldn't silently generate invalid JSON.I believe you are allowed to use very high exponents, though. Like: 1E999 . So you need to decide if those should be mapped to +Infinity or to the max value…NaN also come in two forms with differing semantics: signalling(NaNs) and quiet (NaN). NaN is used for 0/0 and sqrt(-1), but NaNs is used for illegal values and failure. For some reason D does not seem to support this aspect of IEEE754? I cannot find ".nans" listed on the page http://dlang.org/property.htmlBecause I tried supporting them in C++. It doesn't work for various reasons. Nobody else supports them, either.
Aug 25 2014
On Monday, 25 August 2014 at 19:42:03 UTC, Walter Bright wrote:Infinity. Mapping to max value would be a horrible bug.Yes… but then you are reading an illegal value that JSON does not support…I haven't tested, but Python is supposed to throw on NaNs. gcc has support for nans in their documentation: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html IBM Fortran supports it… I think supporting signaling NaN is important for correctness.For some reason D does not seem to support this aspect of IEEE754? I cannot find ".nans" listed on the page http://dlang.org/property.htmlBecause I tried supporting them in C++. It doesn't work for various reasons. Nobody else supports them, either.
Aug 25 2014
On Monday, 25 August 2014 at 20:04:10 UTC, Ola Fosheim Grøstad wrote:I think supporting signaling NaN is important for correctness.It is defined in C++11: http://en.cppreference.com/w/cpp/types/numeric_limits/signaling_NaN
Aug 25 2014
On 8/25/2014 1:21 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>" wrote:On Monday, 25 August 2014 at 20:04:10 UTC, Ola Fosheim Grøstad wrote:I didn't know that. But recall I did implement it in DMC++, and it turned out to simply not be useful. I'd be surprised if the new C++ support for it does anything worthwhile.I think supporting signaling NaN is important for correctness.It is defined in C++11: http://en.cppreference.com/w/cpp/types/numeric_limits/signaling_NaN
Aug 25 2014
On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:I didn't know that. But recall I did implement it in DMC++, and it turned out to simply not be useful. I'd be surprised if the new C++ support for it does anything worthwhile.Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.
Aug 25 2014
On 8/25/2014 4:15 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:I didn't know that. But recall I did implement it in DMC++, and it turned out to simply not be useful. I'd be surprised if the new C++ support for it does anything worthwhile.Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.That's the theory. The practice doesn't work out so well.
Aug 25 2014
On Monday, 25 August 2014 at 23:29:21 UTC, Walter Bright wrote:On 8/25/2014 4:15 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:I didn't know that. But recall I did implement it in DMC++, and it turned out to simply not be useful. I'd be surprised if the new C++ support for it does anything worthwhile.Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.That's the theory. The practice doesn't work out so well.To be more concrete: Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is even worse on most other architectures. It's a lost cause, I think.
Aug 26 2014
On Tuesday, 26 August 2014 at 07:24:19 UTC, Don wrote:Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is worst on most other architectures. It's a lost cause, I think.I disagree. AFAIK signaling NaN was standardized in IEEE 754-2008. So it receives attention.
Aug 26 2014
On Tuesday, 26 August 2014 at 07:34:05 UTC, Ola Fosheim Grøstad wrote:On Tuesday, 26 August 2014 at 07:24:19 UTC, Don wrote:Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is even worse on most other architectures. It's a lost cause, I think.I disagree. AFAIK signaling NaN was standardized in IEEE 754-2008. So it receives attention.It was always in IEEE754. The decision in 754-2008 was simply to not remove it from the spec (a lot of people wanted to remove it). I don't think anything has changed. The point is, existing hardware does not support it consistently. It's not possible at reasonable cost.

---
real uninitialized_var = real.snan;

void foo()
{
    real other_var = void;
    asm
    {
        fld uninitialized_var;
        fstp other_var;
    }
}
---

will signal on AMD, but not Intel. I'd love for this to work, but the hardware is fighting against us. I think it's useful only for debugging.
Aug 26 2014
On Tuesday, 26 August 2014 at 10:55:20 UTC, Don wrote:It was always in IEEE754. The decision in 754-2008 was simply to not remove it from the spec (a lot of people wanted to remove it). I don't think anything has changed.It was implementation defined before. I think they specified the bit in 2008.fld uninitialized_var; fstp other_var;This is not SSE, but I guess MOVSS does not create exceptions either. AVX is quite complicated, but searching for "signaling" gives some hints about the semantics you can rely on. https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf Ola.
Aug 26 2014
On Tuesday, 26 August 2014 at 12:37:58 UTC, Ola Fosheim Grøstad wrote:either. AVX is quite complicated, but searching for "signaling" gives some hints about the semantics you can rely on.…https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf(Actually, searching for "SNAN" is better…)
Aug 26 2014
With the danger of being noisy, these instructions are subject to floating point exceptions according to my (perhaps sloppy) reading of Intel Architecture Instruction Set Extensions Programming Reference (2012): (V)ADDPD, (V)ADDPS, (V)ADDSUBPD, (V)ADDSUBPS, (V)CMPPD, (V)CMPPS, (V)CVTDQ2PS, (V)CVTPD2DQ, (V)CVTPD2PS, (V)CVTPS2DQ, (V)CVTTPD2DQ, (V)CVTTPS2DQ, (V)DIVPD, (V)DIVPS, (V)DPPD*, (V)DPPS*, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, (V)HADDPD, (V)HADDPS, (V)HSUBPD, (V)HSUBPS, (V)MAXPD, (V)MAXPS, (V)MINPD, (V)MINPS, (V)MULPD, (V)MULPS, (V)ROUNDPS, (V)ROUNDPS, (V)SQRTPD, (V)SQRTPS, (V)SUBPD, (V)SUBPS (V)ADDSD, (V)ADDSS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)CVTPS2PD, (V)CVTSD2SI, (V)CVTSD2SS, (V)CVTSI2SD, (V)CVTSI2SS, (V)CVTSS2SD, (V)CVTSS2SI, (V)CVTTSD2SI, (V)CVTTSS2SI, (V)DIVSD, (V)DIVSS, VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADD132SS, VFMADD213SS, VFMADD231SS, VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUB132SS, VFMSUB213SS, VFMSUB231SS, VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADD132SS, VFNMADD213SS, VFNMADD231SS, VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS, (V)MAXSD, (V)MAXSS, (V)MINSD, (V)MINSS, (V)MULSD, (V)MULSS, (V)ROUNDSD, (V)ROUNDSS, (V)SQRTSD, (V)SQRTSS, (V)SUBSD, (V)SUBSS, (V)UCOMISD, (V)UCOMISS VCVTPH2PS, VCVTPS2PH So I guess Intel floating point exceptions trigger on computations, but not on moves? Ola.
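Related, for anyone who wants to observe this from D: std.math's FloatingPointControl can unmask the invalid-operation exception, so the computational instructions above trap while plain moves stay silent. A minimal sketch (trap delivery, e.g. SIGFPE, is platform-dependent):

---
import std.math : FloatingPointControl;
import std.stdio : writeln;

void main()
{
    FloatingPointControl fpctrl;
    // Unmask the invalid-operation exception. From here on, the
    // arithmetic instructions listed above trap on SNaN inputs (and on
    // 0.0/0.0 etc.), while plain moves still pass NaNs through quietly.
    fpctrl.enableExceptions(FloatingPointControl.invalidException);

    double x = 0.0;
    auto y = x / x; // now typically traps instead of yielding a quiet NaN
    writeln(y);     // not reached if the trap fires
}
---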
Aug 26 2014
On Tuesday, 26 August 2014 at 12:37:58 UTC, Ola Fosheim Grøstad wrote:On Tuesday, 26 August 2014 at 10:55:20 UTC, Don wrote:It was always in IEEE754. The decision in 754-2008 was simply to not remove it from the spec (a lot of people wanted to remove it). I don't think anything has changed.It was implementation defined before. I think they specified the bit in 2008.fld uninitialized_var; fstp other_var;This is not SSE, but I guess MOVSS does not create exceptions either.No, it's more subtle. On the original x87, signalling NaNs are triggered for 64-bit loads, but not for 80-bit loads. You have to read the fine print to discover this. I don't think the behaviour was intentional.
Aug 26 2014
On Tuesday, 26 August 2014 at 13:24:11 UTC, Don wrote:No, it's more subtle. On the original x87, signalling NaNs are triggered for 64-bit loads, but not for 80-bit loads. You have to read the fine print to discover this.You are right, but it happens for loads from the FP-stack too: «Source operand is an SNaN. Does not occur if the source operand is in double extended-precision floating-point format (FLD m80fp or FLD ST(i)).»I don't think the behaviour was intentional.It seems reasonable: you need to load/save NaNs without exceptions if you do a context switch. I don't think the extended format was meant for "end users". Anyway, the x87 FP stack is history; even MOVSS is considered legacy by Intel…
Aug 26 2014
On Tuesday, 26 August 2014 at 13:43:56 UTC, Ola Fosheim Grøstad wrote:Anyway, the x87 FP stack is history; even MOVSS is considered legacy by Intel…Sorry for being off-topic, but MOVSS and VMOVSS on AMD don't throw FP exceptions either, but calculations do. So it seems like AMD and Intel are sufficiently close for D to support NaNs, IMHO. Forget the legacy… http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/26568_APM_v41.pdf Ola.
Aug 26 2014
On 8/26/2014 12:24 AM, Don wrote:[…]To be more concrete: Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is even worse on most other architectures. It's a lost cause, I think.The other issues were just when the snan => qnan conversion took place. This is quite unclear given the extensive constant folding, CTFE, etc., that D does. It was also affected by how dmd generates code. Some code gen on floating point doesn't need the FPU, such as toggling the sign bit. But then what happens with snan => qnan? The whole thing is an undefined, unmanageable mess.
Aug 27 2014
On Wednesday, 27 August 2014 at 23:51:54 UTC, Walter Bright wrote:[…]The other issues were just when the snan => qnan conversion took place. This is quite unclear given the extensive constant folding, CTFE, etc., that D does. It was also affected by how dmd generates code. Some code gen on floating point doesn't need the FPU, such as toggling the sign bit. But then what happens with snan => qnan? The whole thing is an undefined, unmanageable mess.I think the way to think of it is: to the programmer, there is *no such thing* as an snan value. It's an implementation detail that should be invisible. Semantically, a signalling nan is a qnan value with a hardware breakpoint on it. An SNAN should never enter the CPU. The CPU always converts them to QNAN if you try. You're kind of not supposed to know that SNAN exists. Because of this, I think SNAN only ever makes sense for static variables. Setting local variables to snan doesn't make sense, since the snan has to enter the CPU. Making that work without triggering the snan is very painful. Making it trigger the snan on all forms of access is even worse. If float.init exists, it cannot be an snan, since you are allowed to use float.init.
Aug 28 2014
On Thursday, 28 August 2014 at 11:09:16 UTC, Don wrote:I think the way to think of it is: to the programmer, there is *no such thing* as an snan value. It's an implementation detail that should be invisible. Semantically, a signalling nan is a qnan value with a hardware breakpoint on it.I disagree with this view. QNAN: there is a value, but it is not a real number. SNAN: the value is missing for an unspecified reason. AFAIK some x86 ops such as ROUNDPD allow you to treat SNAN as QNAN or throw an exception, so there is a built-in test if needed. Other ops such as reciprocals don't throw any FP exceptions and will treat SNAN as QNAN.An SNAN should never enter the CPU. The CPU always converts them to QNAN if you try. You're kind of not supposed to know that SNAN exists.I'm not sure how you reached this interpretation? The solution should be to emit a test for SNAN, explicitly or implicitly, if you cannot prove that SNAN is impossible.
Aug 28 2014
Or to be more explicit: If you have an SNAN then there is no point in trying to recompute the expression using a different algorithm. If you have a QNAN then you might want to recompute the expression using a different algorithm (e.g. complex numbers or analytically). ?
Aug 28 2014
On Thursday, 28 August 2014 at 12:10:58 UTC, Ola Fosheim Grøstad wrote:Or to be more explicit: If you have an SNAN then there is no point in trying to recompute the expression using a different algorithm. If you have a QNAN then you might want to recompute the expression using a different algorithm (e.g. complex numbers or analytically). ?No. Once you load an SNAN, it isn't an SNAN any more! It is a QNAN. You cannot have an SNAN in a floating-point register (unless you do a nasty hack to pass it in). It gets converted during loading.

---
const float x = snan;
x = x; // x is now a qnan.
---
Aug 28 2014
On Thursday, 28 August 2014 at 14:43:30 UTC, Don wrote:No. Once you load an SNAN, it isn't an SNAN any more! It is a QNAN.By which definition? It is only if you consume the SNAN with an fp-exception-free arithmetic op that it should be turned into a QNAN. If you compute with an op that throws then it should throw an exception. MOV should not be viewed as a computation… It also makes sense to save SNAN to file when converting corrupted data-files. SNAN could then mean "corrupted" and QNAN could mean "absent". You should not get an exception for loading a file. You should get an exception if you start computing on the SNAN in the file.You cannot have an SNAN in a floating-point register (unless you do a nasty hack to pass it in). It gets converted during loading.I don't understand this position. If you cannot load SNAN then why does SSE handle SNAN in arithmetic ops and compares?const float x = snan; x = x; // x is now a qnan.I disagree (and why const?) Assignment does nothing, it should not consume the SNAN. Assignment is just "naming". It is not "computing".
Aug 28 2014
Let me try again:

SNAN => unfortunately absent
QNAN => deliberately absent

So you can have:

compute(SNAN) => handle(exception) {
    if (can turn unfortunate situation into deliberate)
        then compute(QNAN)
    else
        throw
}
Aug 28 2014
Kahan states this in a 1997 paper: «[…]An SNaN may be moved ( copied ) without incident, but any other arithmetic operation upon an SNaN is an INVALID operation ( and so is loading one onto the ix87's stack ) that must trap or else produce a new nonsignaling NaN. ( Another way to turn an SNaN into a NaN is to turn 0xxx...xxx into 1xxx...xxx with a logical OR.) Intended for, among other things, data missing from statistical collections, and for uninitialized variables[…]» ( http://www.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF) x87 is legacy; it predates IEEE754 by 5 years and should be forgotten. Note also that the string representation for a signalling nan is "NANS", so it is reasonable to save it to a file if you need to represent missing data. "NAN" represents 0/0, sqrt(-1), not missing data. I'm not really sure how it can be interpreted differently? Ola.
Aug 28 2014
"Don" wrote in message news:fvxmsrbicgpqkkiufdyv forum.dlang.org...If float.init exists, it cannot be an snan, since you are allowed to use float.init.So should we get rid of them from the language completely? Using them as template parameters does even respect the sign of the NaN last time I checked, let alone the s/q or payload. If we change float.init to be a qnan then it won't be possible to make one at compile time.
Aug 28 2014
On 25.08.2014 15:07, Don wrote:On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:[…]One missing feature (which is also missing from the existing std.json) is support for NaN and Infinity as JSON values. Although they are not part of the formal JSON spec (which is a ridiculous omission, the argument given for excluding them is fallacious), they do get generated if you use Javascript's toString to create the JSON. Many JSON libraries (eg Google's) also generate them, so they are frequently encountered in practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity}This would probably be best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.You should also put tests in for what happens when you pass NaN or infinity to toJSON. It shouldn't silently generate invalid JSON.Good point. The current solution of just using formattedWrite("%.16g") is also not ideal.
Aug 25 2014
On 25.08.2014 16:04, Sönke Ludwig wrote:On 25.08.2014 15:07, Don wrote:[…]So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity}This would probably be best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/LexOptions.specialFloatLiterals.htmlYou should also put tests in for what happens when you pass NaN or infinity to toJSON. It shouldn't silently generate invalid JSON.Good point. The current solution of just using formattedWrite("%.16g") is also not ideal.By default, floating-point special values are now output as 'null', according to the ECMA-script standard. Optionally, they will be emitted as 'NaN' and 'Infinity': http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.specialFloatLiterals.html
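Putting the two options together, usage would look roughly like this; the option names come from the linked documentation, but the exact lexJSON/toJSON call signatures are assumptions:

---
import stdx.data.json;

void main()
{
    // Opt in to the non-standard literals when lexing...
    auto tokens = lexJSON!(LexOptions.specialFloatLiterals)(
        `{"foo": NaN, "bar": Infinity, "baz": -Infinity}`);

    // ...and, when generating, emit NaN/Infinity again instead of the
    // standard-conforming default of 'null'.
    auto s = toJSON!(GeneratorOptions.specialFloatLiterals)(
        JSONValue(double.infinity));
}
---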
Aug 25 2014
On Monday, 25 August 2014 at 15:34:29 UTC, Sönke Ludwig wrote:By default, floating-point special values are now output as 'null', according to the ECMA-script standard. Optionally, they will be emitted as 'NaN' and 'Infinity':ECMAScript presumes double. I think one should base Phobos on language-independent standards. I suggest: http://tools.ietf.org/html/rfc7159 For a web server it would be most useful to get an exception, since you risk ending up with web clients that don't work and nothing logged. It is better to have an exception and log an error so the problem can be fixed.
Aug 25 2014
On Monday, 25 August 2014 at 15:46:12 UTC, Ola Fosheim Grøstad wrote:For a web server it would be most useful to get an exception, since you risk ending up with web clients that don't work and nothing logged. It is better to have an exception and log an error so the problem can be fixed.Let me expand a bit on the difference between web clients and servers, assuming D is used on the server:

* Web servers have to check all input and log illegal activity. It is either a bug or an attack.
* Web clients don't have to check input from the server (at most a crypto check) and should not do double work if servers validate anyway.
* Web servers detect errors and send the error as a response to the client, which displays it as a warning to the user. This is the uncommon case, so you don't want to burden the client with it.

From this we can infer:

- It makes more sense for ECMAScript to turn illegal values into null, since it runs on the client.
- The server needs efficient validation of input so that it can respond faster.
- The more validation of typedness you can integrate into the parser, the better.

Thus it would be an advantage to be able to configure the validation done in the parser (through template mechanisms):

1. On write: throw an exception on all illegal values or values that cannot be represented in the format. If the values are illegal, the client should not receive them. It could cause legal problems (like wrong prices).

2. On read: add the ability to configure the validation of typedness on many parameters:
- no nulls, no dicts, only nesting arrays etc.
- predetermined key-values and automatic mapping to structs on exact match.
- require all leaf arrays to be uniform (array of strings, array of numbers)
- match a predefined grammar etc.
Aug 25 2014
- It makes more sense for ECMAScript to turn illegal values into null since it runs on the client.Like... node.js? Sorry, just kidding. I don't think it makes sense for clients to be less strict about such things, but I do agree with your assessment about being as strict as possible on the server. I also do think that exceptions are a perfect tool especially for server applications and that instead of avoiding them because they are slow, they should better be made fast enough to not be an issue.
Aug 25 2014
On 25.08.2014 17:46, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:On Monday, 25 August 2014 at 15:34:29 UTC, Sönke Ludwig wrote:By default, floating-point special values are now output as 'null', according to the ECMA-script standard. Optionally, they will be emitted as 'NaN' and 'Infinity':ECMAScript presumes double. I think one should base Phobos on language-independent standards. I suggest: http://tools.ietf.org/html/rfc7159Well, of course it's based on that RFC, did you seriously think otherwise? However, that standard has no mention of infinity or NaN, and since JSON is designed to be a subset of ECMAScript, that's basically the only thing that comes close.For a web server it would be most useful to get an exception, since you risk ending up with web clients that don't work and nothing logged. It is better to have an exception and log an error so the problem can be fixed.Although you have a point there of course, it's also highly unlikely that those clients would work correctly if we presumed that JSON supported infinity/NaN. So it would really just be coincidence to detect a bug like that. But I generally agree; it's just that the anti-exception voices are pretty loud these days (including Walter's), so I opted for a non-throwing solution instead. I guess it wouldn't hurt though to default to throwing an exception, while still providing the GeneratorOptions.specialFloatLiterals option to handle those values without exception overhead, but in a non-standard-conforming way.
Aug 25 2014
On 25.08.2014 22:21, Sönke Ludwig wrote:that standard has no mention of infinity or NaNSorry, to be precise: it has no suggestion of how to *handle* infinity or NaN.
Aug 25 2014
On Monday, 25 August 2014 at 20:21:01 UTC, Sönke Ludwig wrote:Well, of course it's based on that RFC, did you seriously think otherwise?I made no assumptions, just responded to what you wrote :-). It would be reasonable in the context of vibe.d to assume the ECMAScript spec.But I generally agree; it's just that the anti-exception voices are pretty loud these days (including Walter's), so I opted for a non-throwing solution instead.Yes, the minimum requirement is to just get "did not validate" directly as a single value. One can create a wrapper to get exceptions.I guess it wouldn't hurt though to default to throwing an exception, while still providing the GeneratorOptions.specialFloatLiterals option to handle those values without exception overhead, but in a non-standard-conforming way.What I care most about is getting all the free validation that can be added with no extra cost. That will make writing web services easier. Like if you can define constraints such as:

- root is array, values are strings
- root is array, second level only arrays, third level is numbers
- root is dict, all arrays contain only numbers

What is a bit annoying about generic libs is that you have no idea what you are getting, so you have to spend time writing dull validation code. But maybe StructuredJSON should be a separate library. It would be useful for REST services to specify the grammar and auto-generate both JavaScript and D structures to hold it, along with validation code. However, just turning off parsing of "true", "false", "null", "[", "{" etc. seems like a cheap addition that could also improve parsing speed, if the compiler can make do with two if statements instead of a switch. Ola.
Aug 25 2014
On Monday, 25 August 2014 at 14:04:12 UTC, Sönke Ludwig wrote:On 25.08.2014 15:07, Don wrote:[…]So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity}This would probably be best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.Yes, it should be optional, but not a compile-time option. I think it should parse it and, based on a runtime flag, throw an error (perhaps an OutOfRange error or something, and use the same thing for values that exceed the representable range). An app may accept these non-standard values under certain circumstances and not others. In real-world code, you see a *lot* of these guys. Part of the reason these are important is that NaN or Infinity generally means some Javascript code just has an uninitialized variable. Any other kind of invalid JSON typically means something very nasty has happened. It's important to distinguish these.
Aug 26 2014
On 26.08.2014 15:43, Don wrote:On Monday, 25 August 2014 at 14:04:12 UTC, Sönke Ludwig wrote:On 25.08.2014 15:07, Don wrote:Yes, it should be optional, but not a compile-time option. I think it should parse it and, based on a runtime flag, throw an error (perhaps an OutOfRange error or something, and use the same thing for values that exceed the representable range). An app may accept these non-standard values under certain circumstances and not others. In real-world code, you see a *lot* of these guys.Why not a compile-time option? That sounds to me like such an app should simply enable parsing those values and manually test for NaN at the places where it matters. For all other (the majority of) applications, encountering NaN/Infinity will simply mean that there is a bug, so it makes sense not to accept those at all by default. Apart from that, I don't think it's a good idea for the lexer in general to accept non-standard input by default.Part of the reason these are important is that NaN or Infinity generally means some Javascript code just has an uninitialized variable. Any other kind of invalid JSON typically means something very nasty has happened. It's important to distinguish these.As far as I understood, JavaScript will output those special values as null (at least when not using external JSON libraries). But even if not, an uninitialized variable can also be very nasty, so it's hard to see why that kind of bug should be silently supported (by default).
Aug 26 2014
On Tuesday, 26 August 2014 at 14:06:42 UTC, Sönke Ludwig wrote:[…]Why not a compile-time option? That sounds to me like such an app should simply enable parsing those values and manually test for NaN at the places where it matters. For all other (the majority of) applications, encountering NaN/Infinity will simply mean that there is a bug, so it makes sense not to accept those at all by default. Apart from that, I don't think it's a good idea for the lexer in general to accept non-standard input by default.Please note, I've been talking about the lexer. I'm choosing my words very carefully.As far as I understood, JavaScript will output those special values as null (at least when not using external JSON libraries).No. Javascript generates them directly. Naive JS code generates these guys. That's why they're so important.But even if not, an uninitialized variable can also be very nasty, so it's hard to see why that kind of bug should be silently supported (by default).I never said it should be accepted by default. I said it is a situation which should be *lexed*. Ideally, by default it should give a different error from simply 'invalid JSON'. I believe it should ALWAYS be lexed, even if an error is ultimately generated. This is the difference: if you get NaN or Infinity, there's probably a straightforward bug in the Javascript code, but your D code is fine. Any other kind of JSON parsing error means you've got a garbage string that isn't JSON at all. They are very different errors. It's a diagnostics issue.
Aug 26 2014
On 26.08.2014 16:40, Don wrote:On Tuesday, 26 August 2014 at 14:06:42 UTC, Sönke Ludwig wrote:[…]Please note, I've been talking about the lexer. I'm choosing my words very carefully.I've been talking about the lexer, too. Sorry for the confusing use of the term "parsing" (after all, the lexer is also a parser, but anyway).No. Javascript generates them directly. Naive JS code generates these guys. That's why they're so important.JSON.stringify(0/0) == "null" holds for all browsers that I've tested.I never said it should be accepted by default. I said it is a situation which should be *lexed*. Ideally, by default it should give a different error from simply 'invalid JSON'. I believe it should ALWAYS be lexed, even if an error is ultimately generated. This is the difference: if you get NaN or Infinity, there's probably a straightforward bug in the Javascript code, but your D code is fine. Any other kind of JSON parsing error means you've got a garbage string that isn't JSON at all. They are very different errors. It's a diagnostics issue.The error will be more like "filename(line:column): Invalid token" - possibly the text following the line/column could also be displayed. Wouldn't that be sufficient?
Aug 26 2014
On 26.08.2014 16:51, Sönke Ludwig wrote:On 26.08.2014 16:40, Don wrote:This is the difference: if you get NaN or Infinity, there's probably a straightforward bug in the Javascript code, but your D code is fine. Any other kind of JSON parsing error means you've got a garbage string that isn't JSON at all. They are very different errors. It's a diagnostics issue.The error will be more like "filename(line:column): Invalid token" - possibly the text following the line/column could also be displayed. Wouldn't that be sufficient?One argument against supporting it in the parser is that the parser currently works without any configuration, but with this added the user would then have to specify two sets of configuration options.
Aug 26 2014
On Tuesday, 26 August 2014 at 14:40:02 UTC, Don wrote:This is the difference: if you get NaN or Infinity, there's probably a straightforward bug in the Javascript code, but your D code is fine. Any other kind of JSON parsing error means you've got a garbage string that isn't JSON at all. They are very different errors.I don't care either way, but JSON.stringify() has the following support:

IE8 and up
Firefox 3.5 and up
Safari 4 and up
Chrome

So not using it is very much legacy…
Aug 26 2014
Hi! Thanks for the effort you've put in this. I am having problems with building with LDC 0.14.0. DMD 2.066.0 seems to work fine (all unit tests pass). Do you have any ideas why? I am using Ubuntu 3.10 (Linux 3.11.0-15-generic x86_64). Master was at 6a9f8e62e456c3601fe8ff2e1fbb640f38793d08.

$ dub fetch std_data_json --version=~master
$ cd std_data_json-master/
$ dub test --compiler=ldc2
Generating test runner configuration '__test__library__' for 'library' (library).
Building std_data_json ~master configuration "__test__library__", build type unittest.
Running ldc2...
source/stdx/data/json/parser.d(77): Error: @safe function 'stdx.data.json.parser.__unittestL68_22' cannot call @system function 'object.AssociativeArray!(string, JSONValue).AssociativeArray.length'
source/stdx/data/json/parser.d(124): Error: @safe function 'stdx.data.json.parser.__unittestL116_24' cannot call @system function 'object.AssociativeArray!(string, JSONValue).AssociativeArray.length'
source/stdx/data/json/parser.d(341): Error: function stdx.data.json.parser.JSONParserRange!(JSONLexerRange!string).JSONParserRange.opAssign is not callable because it is annotated with @disable
source/stdx/data/json/parser.d(341): Error: @safe function 'stdx.data.json.parser.__unittestL318_32' cannot call @system function 'stdx.data.json.parser.JSONParserRange!(JSONLexerRange!string).JSONParserRange.opAssign'
source/stdx/data/json/parser.d(633): Error: function stdx.data.json.lexer.JSONToken.opAssign is not callable because it is annotated with @disable
source/stdx/data/json/parser.d(633): Error: 'stdx.data.json.lexer.JSONToken.opAssign' is not nothrow
source/stdx/data/json/parser.d(630): Error: function 'stdx.data.json.parser.JSONParserNode.literal' is nothrow yet may throw
FAIL .dub/build/__test__library__-unittest-linux.posix-x86_64-ldc2-0F620B217010475A5A4E545A57CDD09A/ __test__library__ executable
Error executing command test: ldc2 failed with exit code 1.

Thanks
Aug 25 2014
... I am using Ubuntu 3.10 (Linux 3.11.0-15-generic x86_64). ...I meant Ubuntu 13.10 :D
Aug 25 2014
On 26.08.2014 03:31, Entusiastic user wrote:Hi! Thanks for the effort you've put in this. I am having problems with building with LDC 0.14.0. DMD 2.066.0 seems to work fine (all unit tests pass). Do you have any ideas why?I've fixed all errors on DMD 2.065 now. Hopefully that should also fix LDC.
Aug 26 2014
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. I'm quite pleased by the results so far, although without a serialization framework it still misses a very important building block. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ DUB: http://code.dlang.org/packages/std_data_jsonDo we have any benchmarks for this yet? Note that the main motivation for a new json parser was that std.json is remarkably slow in comparison to python's json or ujson.
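No numbers appear in the thread yet; for reference, a minimal harness along these lines would do (the sample.json file and the parseJSONValue entry-point name are assumptions):

---
import std.datetime : StopWatch;
import std.file : readText;
import std.json : parseJSON;   // current std.json, for the baseline
import std.stdio : writefln;

void main()
{
    auto text = readText("sample.json");  // any representative document

    StopWatch sw;
    sw.start();
    foreach (i; 0 .. 100)
    {
        auto v = parseJSON(text);
    }
    sw.stop();
    writefln("std.json: %s ms for 100 parses", sw.peek().msecs);

    // For the candidate module, repeat the loop with (name assumed):
    //   auto v = parseJSONValue(text);   // from stdx.data.json
}
---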
Aug 26 2014
Been using it for a bit now. I think the only thing I have to say is that having to insert all of those `JSONValue`s everywhere is tiresome, and I never know when I have to do it. Atila On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:[…]
Sep 08 2014
Here's my destruction of std.data.json.

* lexer.d:
** Beautifully done. From what I understand, if the input is string or immutable(ubyte)[] then the strings are carved out as slices of the input, as opposed to newly allocated. Awesome.
** The string after lexing is correctly scanned and stored in raw format (escapes are not rewritten) and decoded on demand. Problem with decoding is that it may allocate memory, and it would be great (and not difficult) to make the lexer 100% lazy/non-allocating. To achieve that, lexer.d should define TWO "Kind"s of strings at the lexer level: regular string and undecoded string. The former is lexer.d's way of saying "I got lucky" in the sense that it didn't detect any '\\', so the raw and decoded strings are identical. No need for anyone to do any further processing in the majority of cases => win. The latter means the lexer lexed the string, saw at least one '\\', and leaves it to the caller to do the actual decoding.
** After moving the decoding business out of lexer.d, a way to take this further would be to qualify lexer methods as @nogc if the input is string/immutable(ubyte)[]. I wonder how to implement a conditional attribute. We'll probably need a language enhancement for that.
** The implementation uses manually-defined tagged unions for work. Could we use Algebraic instead - dogfooding and all that? I recall there was a comment in Sönke's original work that Algebraic has a specific issue (was it false pointers?) - so the question arises, should we fix Algebraic and use it, thus helping other uses as well?
** I see the "boolean" kind; should we instead have the "true_" and "false_" kinds?
** Long story short, I couldn't find any major issue with this module, and I looked! I do think the decoding logic should be moved outside of lexer.d, or at least outside of JSONLexerRange.

* generator.d: looking good, no special comments. Like the consistent use of structs filled with options as template parameters.

* foundation.d:
** At four words per token, Location seems pretty bulky. How about reducing line and column to uint?
** Could JSONException create the message string in toString (i.e. when/if used) as opposed to in the constructor?

* parser.d:
** How about using .init instead of .defaults for options?
** I'm a bit surprised by JSONParserNode.Kind. E.g. the objectStart/End markers shouldn't appear as nodes. There should be an "object" node only. I guess that's needed for laziness.
** It's unclear where memory is being allocated in the parser. @nogc annotations wherever appropriate would be great.

* value.d:
** Looks like this is/may be the only place where memory is being managed, at least if the input is string/immutable(ubyte)[]. Right?
** Algebraic ftw.

============================

Overall: This is very close to everything I hoped! A bit more care for @nogc would be awesome, especially with the upcoming focus on memory management going forward. After one more pass it would be great to move forward for review.

Andrei
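A sketch of the two-"Kind" scheme suggested for lexer.d; all names are illustrative and error handling is elided:

---
enum Kind { string_, undecodedString }

struct Token
{
    Kind kind;
    const(char)[] raw; // always a slice of the input, never a copy
}

// Scan a string literal; input starts at the opening quote.
Token lexString(ref const(char)[] input)
{
    size_t i = 1;              // skip the opening '"'
    bool sawEscape = false;
    while (i < input.length && input[i] != '"')
    {
        if (input[i] == '\\')
        {
            sawEscape = true;
            i++;               // skip the escaped character
        }
        i++;
    }
    // No '\\' seen: raw slice == decoded string, zero further work.
    // Otherwise: hand the raw slice to the caller to decode on demand.
    auto tok = Token(sawEscape ? Kind.undecodedString : Kind.string_,
                     input[1 .. i]);
    input = input[i + 1 .. $]; // consume the closing '"'
    return tok;
}
---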
Oct 12 2014
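To make the two-"Kind"s suggestion concrete, here is a minimal sketch of the idea; the enum members, field names, and helper functions are invented for illustration and do not reflect the module's actual API. The lexer tags each string token with whether it saw a backslash, so the common case can skip decoding entirely:

/// Hypothetical token kinds; the actual lexer uses different names.
enum Kind
{
    // ... other token kinds ...
    string_,          // no '\\' seen: raw slice == decoded value
    escapedString,    // at least one '\\': caller must decode on demand
}

struct Token
{
    Kind kind;
    const(char)[] raw; // always a slice of the input, never allocated
}

/// The caller decodes only when the lexer couldn't take the fast path.
string decoded(in Token t)
{
    import std.conv : to;
    return t.kind == Kind.string_
        ? t.raw.to!string           // no unescaping work needed
        : unescapeCopy(t.raw);      // hypothetical decode helper
}

/// Stand-in for a real unescaping routine (handles only \" and \\ here).
string unescapeCopy(const(char)[] raw)
{
    import std.array : appender;
    auto app = appender!string();
    for (size_t i = 0; i < raw.length; i++)
    {
        if (raw[i] == '\\' && i + 1 < raw.length)
            i++; // skip the backslash, keep the escaped character
        app.put(raw[i]);
    }
    return app.data;
}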
On Sunday, 12 October 2014 at 18:17:29 UTC, Andrei Alexandrescu wrote:
> ** The string after lexing is correctly scanned and stored in raw format (escapes are not rewritten) and decoded on demand. The problem with decoding is that it may allocate memory, and it would be great (and not difficult) to make the lexer 100% lazy/non-allocating. [...] The latter means the lexer lexed the string, saw at least one '\\', and leaves it to the caller to do the actual decoding.

I'd like to see unescapeStringLiteral() made public. Then I can unescape multiple strings to the same preallocated destination, or even unescape in place (guaranteed to work since the result will always be smaller than the input).
Oct 12 2014
Oh, it looks like you aren't checking for 0x7F (DEL) as a control character.
Oct 12 2014
On 12.10.2014 23:52, Sean Kelly wrote:
> Oh, it looks like you aren't checking for 0x7F (DEL) as a control character.

It doesn't get mentioned in the JSON spec, so I left it out. But I guess nothing speaks against adding it anyway.
Oct 13 2014
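For reference, the JSON grammar only forbids unescaped C0 control characters (U+0000 through U+001F) inside strings, so rejecting DEL as well would be a deliberately stricter check than the spec requires. A minimal sketch of such an extended predicate; the function name is invented:

/// Hypothetical validity check for an unescaped character inside a JSON
/// string literal. The spec only outlaws c < 0x20; the `|| c == 0x7F`
/// part is the stricter DEL check discussed above.
bool isForbiddenInString(dchar c)
{
    return c < 0x20 || c == 0x7F;
}

unittest
{
    assert(isForbiddenInString('\n'));   // must be escaped as \n
    assert(isForbiddenInString('\x7F')); // DEL, rejected by the stricter rule
    assert(!isForbiddenInString('a'));
}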
On 12.10.2014 21:04, Sean Kelly wrote:
> I'd like to see unescapeStringLiteral() made public. Then I can unescape multiple strings to the same preallocated destination, or even unescape in place (guaranteed to work since the result will always be smaller than the input).

Will do. Same for the inverse functions.
Oct 13 2014
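Only the name unescapeStringLiteral() comes from the actual module; the simplified stand-in below is a sketch of the behavior described in the thread, including the no-copy fast path when the literal contains no backslash (it handles only \\ and \" to stay short):

/// Simplified stand-in for unescapeStringLiteral(): if the literal
/// contains no '\\', the input slice is returned as-is (no allocation);
/// otherwise a decoded copy is built.
const(char)[] unescapeSketch(const(char)[] raw)
{
    import std.algorithm.searching : canFind;
    import std.array : appender;

    if (!raw.canFind('\\'))
        return raw; // fast path: raw and decoded are identical

    auto app = appender!(char[])();
    for (size_t i = 0; i < raw.length; i++)
    {
        if (raw[i] == '\\' && i + 1 < raw.length)
            i++; // skip the backslash, keep the escaped character
        app.put(raw[i]);
    }
    return app.data;
}

unittest
{
    auto plain = "hello";
    assert(unescapeSketch(plain) is plain); // same slice, no copy
    assert(unescapeSketch(`a\"b`) == `a"b`);
    // In-place decoding is possible too, since the result is never
    // longer than the input - which is why a public API is useful.
}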
On 12.10.2014 20:17, Andrei Alexandrescu wrote:
> ** The string after lexing is correctly scanned and stored in raw format (escapes are not rewritten) and decoded on demand. [...] The latter means the lexer lexed the string, saw at least one '\\', and leaves it to the caller to do the actual decoding.

This is actually more or less done in unescapeStringLiteral() - if it doesn't find any '\\', it just returns the original string. Also, JSONString allows accessing its .rawValue without doing any decoding/allocations.

https://github.com/s-ludwig/std_data_json/blob/master/source/stdx/data/json/lexer.d#L1421

Unfortunately .rawValue can't be @nogc, because the "raw" value might have to be constructed first when the input is not a "string" (in this case unescaping is done on-the-fly for efficiency reasons).

> ** After moving the decoding business out of lexer.d, a way to take this further would be to qualify lexer methods as @nogc if the input is string/immutable(ubyte)[]. I wonder how to implement a conditional attribute. We'll probably need a language enhancement for that.

Isn't @nogc inferred? Everything is templated, so that should be possible. Or does attribute inference only work for template functions and not for methods of templated types? Should it?

> ** The implementation uses manually-defined tagged unions for its work. Could we use Algebraic instead - dogfooding and all that? [...] should we fix Algebraic and use it, thus helping other uses as well?

I had started on an implementation of a type and ID safe TaggedAlgebraic that uses Algebraic for its internal storage. If we can get that in first, it should be no problem to use it instead (with no or minimal API breakage). However, it uses a struct instead of an enum to define the "Kind" (which is the only nice way I could conceive to safely couple enum value and type at compile time), so it's not as nice in the generated documentation.

> ** I see the "boolean" kind; should we instead have the "true_" and "false_" kinds?

I always found it cumbersome and awkward to work like that. What would be the reason to go that route?

> ** At four words per token, Location seems pretty bulky. How about reducing line and column to uint?

Single-line JSON files >64k (or line counts >64k) are no exception, so that would only work in a limited way. My thought about this was that it is quite unusual to actually store the tokens for most purposes (especially when directly serializing to a native D type), so it should have minimal impact on performance or memory consumption.

> ** Could JSONException create the message string in toString (i.e. when/if used) as opposed to in the constructor?

That could of course be done, but then you'd not get the full error message using ex.msg, only with ex.toString(), which usually prints a call trace instead. Alternatively, it's also possible to completely avoid using exceptions with LexOptions.noThrow.

> ** How about using .init instead of .defaults for options?

I'd slightly tend to prefer the more explicit "defaults", especially because "init" could mean either "defaults" or "none" (currently it means "none"). But another idea would be to invert the option values so that defaults==none... any objections?

> ** I'm a bit surprised by JSONParserNode.Kind. E.g. the objectStart/End markers shouldn't appear as nodes. There should be an "object" node only. I guess that's needed for laziness.

While you could infer the end of an object in the parser range by looking for the first entry that doesn't start with a "key" node, the same would not be possible for arrays, so in general the end marker *is* required. Note that the parser range is a StAX style parser, which is still very close to the lexical structure of the document. I was also wondering if there might be a better name than "JSONParserNode". It's not really embedded into a tree or graph structure, which the name tends to suggest.

> ** It's unclear where memory is being allocated in the parser. @nogc annotations wherever appropriate would be great.

The problem is that the parser accesses the lexer, which in turn accesses the underlying input range, which in turn could allocate. Depending on the options passed to the lexer, it could also throw, and thus allocate, an exception. In the end, only JSONParserRange.empty could generally be made @nogc. However, attribute inference should be possible here in theory (the noThrow option is compile-time).

> ** Looks like this is/may be the only place where memory is being managed, at least if the input is string/immutable(ubyte)[]. Right?

Yes, at least when setting aside optional exceptions and lazy allocations.

> Overall: this is very close to everything I hoped! A bit more care for @nogc would be awesome, especially with the upcoming focus on memory management going forward.

I've tried to use @nogc (as well as nothrow) in more places, but mostly due to not knowing if the underlying input range allocates, it hasn't really been possible. Even on lower levels (private functions), almost any Phobos function that is called is currently not @nogc for reasons that are not always obvious, so I gave up on that for now.

> After one more pass it would be great to move forward for review.

There is also still one pending change that I didn't finish yet: the optional UTF input validation (never validate "string" inputs, but do validate "ubyte[]" inputs). Oh, and there is the open issue of how to allocate in case of non-array inputs. Initially I wanted to wait with this until we have an allocators module, but Walter would like to have a way to do manual memory management in the initial version. However, the ideal design is still unclear to me - it would either simply resemble a general allocator interface, or could use something like a callback that returns an output range, which would probably be quite cumbersome to work with. Any ideas in this direction would be welcome.

Sönke
Oct 13 2014
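The struct-instead-of-enum "Kind" remark deserves a sketch: a struct constant can carry its associated payload type at compile time, which a plain enum member cannot. This miniature is invented for illustration and is not TaggedAlgebraic's actual implementation:

/// Each "enum value" is a compile-time constant that also remembers
/// which D type it selects.
struct Kind(T, int id)
{
    alias Type = T;  // the payload type coupled to this kind
    enum value = id; // the runtime tag stored alongside the union
}

struct Kinds
{
    enum number = Kind!(double, 0)();
    enum text   = Kind!(string, 1)();
}

/// With that coupling, a setter can check tag/type agreement statically.
struct Tagged
{
    int tag;
    union { double number; string text; }

    void set(alias kind)(kind.Type value)
    {
        tag = kind.value;
        static if (is(kind.Type == double)) number = value;
        else                                text   = value;
    }
}

unittest
{
    Tagged t;
    t.set!(Kinds.number)(3.14);
    assert(t.tag == Kinds.number.value);
    // t.set!(Kinds.number)("oops"); // would fail to compile: type mismatch
}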
On 13/10/14 09:39, Sönke Ludwig wrote:
>> ** At four words per token, Location seems pretty bulky. How about reducing line and column to uint?
> Single-line JSON files >64k (or line counts >64k) are no exception

64k?

--
/Jacob Carlborg
Oct 13 2014
On 13.10.2014 13:33, Jacob Carlborg wrote:
> On 13/10/14 09:39, Sönke Ludwig wrote:
>>> ** At four words per token, Location seems pretty bulky. How about reducing line and column to uint?
>> Single-line JSON files >64k (or line counts >64k) are no exception
> 64k?

Oh, I read that as "both line and column into a single uint", because of "four words per token" - considering that "word == 16bit", but Andrei obviously meant "word == (void*).sizeof". If simply using uint instead of size_t is meant, then that's of course a different thing.
Oct 13 2014
"Sönke Ludwig" wrote in message news:m1ge08$10ub$1 digitalmars.com...Oh, I've read "both line and column into a single uint", because of "four words per token" - considering that "word == 16bit", but Andrei obviously meant "word == (void*).sizeof". If simply using uint instead of size_t is meant, then that's of course a different thing.I suppose a 4GB single-line json file is still possible.
Oct 13 2014
On 13.10.2014 16:36, Daniel Murphy wrote:
> I suppose a 4GB single-line json file is still possible.

If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (a format error at >4GB, and a human trying to look at that place in an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.
Oct 13 2014
On Monday, 13 October 2014 at 17:21:44 UTC, Sönke Ludwig wrote:
> If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (a format error at >4GB, and a human trying to look at that place in an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.

What are you using the location structs for? In D:YAML they're only used for info about errors, so I use ushorts, and ushort.max means "65535 or more".
Oct 13 2014
On 13.10.2014 19:40, Kiith-Sa wrote:
> What are you using the location structs for? In D:YAML they're only used for info about errors, so I use ushorts, and ushort.max means "65535 or more".

Within the package itself they are also only used for error information. But they are also generally available with each token/node/value, so people could do very different things with them.
Oct 13 2014
On 10/13/14, 10:21 AM, Sönke Ludwig wrote:
> If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (a format error at >4GB, and a human trying to look at that place in an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.

Agreed. -- Andrei
Oct 13 2014
On 10/13/14, 4:45 AM, Sönke Ludwig wrote:
> Oh, I read that as "both line and column into a single uint", because of "four words per token" - considering that "word == 16bit", but Andrei obviously meant "word == (void*).sizeof". If simply using uint instead of size_t is meant, then that's of course a different thing.

Yah, one uint for each. -- Andrei
Oct 13 2014
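The consensus, then, is one uint each for line and column. A sketch of the resulting layout; the field names are assumptions, not necessarily the module's actual Location definition:

/// Compact layout under discussion: 8 bytes instead of the 16 that two
/// size_t fields cost on 64-bit targets.
struct Location
{
    uint line;   // 1-based; only overflows beyond ~4 billion lines
    uint column; // 1-based; only overflows beyond ~4 billion columns
}

static assert(Location.sizeof == 8);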
On 22/08/14 00:35, Sönke Ludwig wrote:
> Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. [...]
>
> Code: https://github.com/s-ludwig/std_data_json

JSONToken.Kind and JSONParserNode.Kind could be "ubyte" to save space.

--
/Jacob Carlborg
Oct 13 2014
On 13.10.2014 13:37, Jacob Carlborg wrote:
> JSONToken.Kind and JSONParserNode.Kind could be "ubyte" to save space.

But it won't save space in practice, at least on x86, due to alignment, and depending on what the compiler assumes, the access can also be slower that way.
Oct 13 2014
On 10/13/14, 4:48 AM, Sönke Ludwig wrote:
> But it won't save space in practice, at least on x86, due to alignment, and depending on what the compiler assumes, the access can also be slower that way.

Correct. -- Andrei
Oct 13 2014
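The alignment point is easy to verify: as soon as the token also carries a pointer-sized member, padding eats whatever the smaller tag saves. A self-contained check, with token layouts invented purely for illustration (neither matches the real JSONToken):

enum KindInt { null_, boolean, number, string_ }           // 4-byte tag
enum KindUByte : ubyte { null_, boolean, number, string_ } // 1-byte tag

struct TokenInt   { KindInt kind;   const(char)[] text; }
struct TokenUByte { KindUByte kind; const(char)[] text; }

// On 64-bit targets both are 24 bytes: the slice member must be
// pointer-aligned, so the smaller tag is simply followed by padding.
static assert(TokenInt.sizeof == TokenUByte.sizeof);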
On 8/21/14, 7:35 PM, Sönke Ludwig wrote:
> Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. [...]
>
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> DUB: http://code.dlang.org/packages/std_data_json
>
> Destroy away! ;)
>
> [1]: http://forum.dlang.org/thread/lrknjl$co7$1@digitalmars.com

Once it's done, you can compare its performance against other languages with this benchmark:

https://github.com/kostya/benchmarks/tree/master/json
Oct 17 2014
On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig wrote:
> Once it's done, you can compare its performance against other languages with this benchmark:
> https://github.com/kostya/benchmarks/tree/master/json

Wow, the C++ Rapid parser is really impressive. I threw together a test with my own parser for comparison, and Rapid still beat it. It's the first parser I've encountered that's faster.

Ruby:       0.4995479721139979 0.49977992077421846 0.49981146157805545 - 7.53s, 2330.9Mb
Python:     0.499547972114 0.499779920774 0.499811461578 - 12.01s, 1355.1Mb
C++ Rapid:  0.499548 0.49978 0.499811 - 1.75s, 1009.0Mb
JEP (mine): 0.49954797 0.49977992 0.49981146 - 2.38s, 203.4Mb
Oct 18 2014
On Saturday, 18 October 2014 at 19:53:23 UTC, Sean Kelly wrote:
> Wow, the C++ Rapid parser is really impressive. I threw together a test with my own parser for comparison, and Rapid still beat it. It's the first parser I've encountered that's faster.
>
> C++ Rapid:  0.499548 0.49978 0.499811 - 1.75s, 1009.0Mb
> JEP (mine): 0.49954797 0.49977992 0.49981146 - 2.38s, 203.4Mb

I just commented out the sscanf() call that was parsing the float and re-ran the test to see what the difference would be. Here's the new timing:

JEP (mine): 0.00000000 0.00000000 0.00000000 - 1.23s, 203.1Mb

So nearly half of the total execution time was spent simply parsing floats. For this reason, I'm starting to think that this isn't the best benchmark of JSON parser performance.

The other issue with my parser is that it's written in C, and so all of the user-defined bits are called via a bank of function pointers. If it were converted to C++ or D, where this could be done via templates, it would be much faster. Just as a test, I nulled out the function pointers I'd set to see what the cost of indirection was, and here's the result:

JEP (mine): nan nan nan - 0.57s, 109.4Mb

The memory difference is interesting, and I can't entirely explain it, other than to say that it's probably an artifact of my mapping in the file as virtual memory rather than reading it into an allocated buffer. Either way, roughly 0.60s can be attributed to indirect function calls and the bit of logic on the other side, which seems like a good candidate for optimization.
Oct 18 2014
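A minimal sketch of the dispatch difference Sean describes (this is not the JEP parser; all names are invented): the C-style version pays an indirect call per event, while the templated D version lets the compiler inline the handler at the call site.

import std.stdio;

// C style: events dispatched through function pointers; the compiler
// cannot inline across this indirection.
struct Callbacks
{
    void function(long) onNumber;
}

void scanIndirect(const(long)[] input, Callbacks cb)
{
    foreach (n; input)
        cb.onNumber(n); // indirect call per event
}

// D style: the handler is a template parameter, so onNumber can be
// inlined and optimized at the call site.
void scanTemplated(Handler)(const(long)[] input, ref Handler h)
{
    foreach (n; input)
        h.onNumber(n); // direct, inlinable call
}

long total;
void add(long n) { total += n; }

struct Summer
{
    long total;
    void onNumber(long n) { total += n; }
}

void main()
{
    auto data = [1L, 2, 3];
    scanIndirect(data, Callbacks(&add));
    Summer s;
    scanTemplated(data, s);
    writeln(total, " ", s.total); // 6 6
}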
On 10/18/14, 4:53 PM, Sean Kelly wrote:
> Wow, the C++ Rapid parser is really impressive. I threw together a test with my own parser for comparison, and Rapid still beat it. It's the first parser I've encountered that's faster.

Yes, C++ Rapid seems to be really, really fast. It has some sse2/sse4 specific optimizations and I guess a lot more. I have to investigate more in order to do something similar :-)
Oct 19 2014
On Saturday, 18 October 2014 at 19:53:23 UTC, Sean Kelly wrote:
> Python: 0.499547972114 0.499779920774 0.499811461578 - 12.01s, 1355.1Mb

I assume this is the standard json module? I am wondering how ujson is performing, which is considered the fastest Python module.
Oct 20 2014
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
> ...

Added to the review queue as a work in progress with relevant links: http://wiki.dlang.org/Review_Queue
Feb 05 2015
On 2/5/15 1:07 AM, Jakob Ovrum wrote:
> Added to the review queue as a work in progress with relevant links: http://wiki.dlang.org/Review_Queue

Yay! -- Andrei
Feb 05 2015
On 05.02.2015 10:07, Jakob Ovrum wrote:
> Added to the review queue as a work in progress with relevant links: http://wiki.dlang.org/Review_Queue

Thanks! I(t) should be ready for an official review in one or two weeks, when my schedule relaxes a little bit.
Feb 05 2015