digitalmars.D - std.data.json formal review
- Atila Neves (4/4) Jul 28 2015 Start of the two week process, folks.
- Rikki Cattermole (8/12) Jul 28 2015 Right now, my view is no.
- Etienne Cimon (8/24) Jul 28 2015 I totally agree with that, but shouldn't it be consistent in
- Brad Anderson (6/22) Jul 28 2015 Just a reminder that this is the review thread, not the vote
- Etienne Cimon (5/32) Jul 28 2015 From what I see from std.allocator, there's no Allocator
- Rikki Cattermole (6/36) Jul 28 2015 There is one. IAllocator.
- Mathias Lang via Digitalmars-d (11/22) Jul 28 2015 Allocator is definitely a separate issue. It's a moving target, it's not
- Rikki Cattermole (5/25) Jul 28 2015 Right now we just need a plan, and we're all good for std.data.json.
- Sönke Ludwig (15/29) Jul 28 2015 If you pass a string or byte array as input, then there will be no
- Rikki Cattermole (6/38) Jul 28 2015 It was after 3am when I did my initial look. But I saw the appender
- Sönke Ludwig (3/5) Jul 28 2015 You should still have a closer look, as it isn't very similar to the
- Rikki Cattermole (3/8) Jul 28 2015 Again after 3am when I first looked. I'll take a closer look and create
- Etienne Cimon (8/12) Jul 28 2015 This is cool:
- Sönke Ludwig (9/23) Jul 28 2015 An idea might be to support something like this:
- Brad Anderson (6/15) Jul 28 2015 +1
- Etienne Cimon (6/36) Jul 28 2015 I like it quite well. No, actually, a lot. Thinking about it some
- Sönke Ludwig (3/7) Jul 28 2015 Thanks for making it happen! Can you also post a quick link to this
- Walter Bright (37/38) Jul 28 2015 Thank you very much, Sönke, for taking this on. Thank you, Atila, for t...
- H. S. Teoh via Digitalmars-d (18/50) Jul 28 2015 +1. The API should be as simple as possible.
- Walter Bright (8/28) Jul 28 2015 That is a good point.
- H. S. Teoh via Digitalmars-d (11/42) Jul 28 2015 I'm pretty sure std.conv has interfaces that allow you to keep
- Walter Bright (3/7) Jul 28 2015 Not range friendly.
- Jacob Carlborg (5/6) Jul 29 2015 But in most cases I think there will be one root node, of type object.
- Sönke Ludwig (6/10) Jul 29 2015 I think a better approach that to add such a special case is to add a
- Walter Bright (3/7) Jul 29 2015 I don't understand the question.
- Jacob Carlborg (10/16) Jul 29 2015 I guess I'm finding it difficult to picture a JSON structure as a range....
- Walter Bright (7/14) Jul 29 2015 It if was returned as a range of nodes, it would be:
- Jacob Carlborg (5/8) Jul 30 2015 Ah, that make sense. Never though of an "end" mark like that, pretty
- Walter Bright (14/18) Jul 28 2015 So it appears that JSON can be in one of 3 useful states:
- H. S. Teoh via Digitalmars-d (6/18) Jul 28 2015 [...]
- Walter Bright (3/4) Jul 29 2015 You'd need to add a special node type, 'end'. So an array [1,true] would...
- Sönke Ludwig (26/45) Jul 29 2015 There are actually even four levels:
- Walter Bright (5/25) Jul 29 2015 What's the need for users to see a token stream? I don't know what the D...
- Sönke Ludwig (7/36) Jul 29 2015 Hm, I misread "container of JSON values" as "range of JSON values". I
- Walter Bright (3/6) Jul 29 2015 Ok, I see your point. The question then becomes does the node stream rea...
- Sönke Ludwig (11/20) Jul 30 2015 I agree that in case of JSON their difference can be a bit subtle.
- Sönke Ludwig (11/29) Jul 29 2015 We could maybe do that if we keep the current JSONValue as a struct
- Piotr Szturmaj (5/23) Jul 29 2015 Here's mine range based parser, you can parse 1 TB json file without a
- Sönke Ludwig (56/97) Jul 29 2015 This is actually one of my pet peeves. Having a *readable* API that
- Walter Bright (41/140) Jul 29 2015 I agree with your goal of readability. And if someone wants to write cod...
- Suliman (2/2) Jul 30 2015 If this implementation will be merged with phobos will vibed
- Sönke Ludwig (5/7) Jul 30 2015 I'll then make the vibe.d JSON module compatible using "alias this"
- Brad Anderson (17/26) Jul 30 2015 Is there any reason why D doesn't allow json.parseStream() in
- Walter Bright (12/16) Jul 30 2015 I would think it unlikely to be parsing two different formats in one fil...
- H. S. Teoh via Digitalmars-d (9/31) Jul 30 2015 Yeah, local imports are fast becoming my preferred D coding style,
- Walter Bright (3/8) Jul 30 2015 Funny how my preferred D style of writing code is steadily diverging fro...
- H. S. Teoh via Digitalmars-d (6/15) Jul 30 2015 One would hope so, otherwise why are we here instead of in the C++
- Suliman (6/6) Jul 31 2015 is the current build is ready for production? I am getting error:
- Sönke Ludwig (4/10) Jul 31 2015 2.068 "fixed" possible safety issues with VariantN by marking the
- Suliman (11/25) Jul 31 2015 Wat revision are usable? I checked some and all have issue like:
- Sönke Ludwig (6/27) Aug 01 2015 parseJSONValue takes a reference to an input range, so that it can
- Suliman (20/25) Aug 01 2015 Yes please, because it's hard to understand difference. Maybe
- Suliman (5/5) Aug 01 2015 Look like it's Variant type. So I tried to use method get! do
- Sönke Ludwig (2/6) Aug 01 2015 The correct syntax is: response["code"].get!int
- Jacob Carlborg (14/22) Jul 30 2015 I kind of agree with that, but at the same time, if one always need to
- Nick Sabalausky (6/18) Aug 21 2015 It also fucks up UFCS, and I'm a huge fan of UFCS.
- David Nadlinger (4/5) Aug 21 2015 Are you saying that "import json : parseJSON = parse;
- Nick Sabalausky (32/36) Aug 22 2015 Ok, fair point, although I was referring more to fully-qualified name
- Don (17/55) Jul 29 2015 Related to this: it should not be importing std.bigint. Note that
- H. S. Teoh via Digitalmars-d (17/47) Jul 29 2015 [...]
- Laeeth Isharc (6/18) Jul 29 2015 Some JSON files can be quite large...
- sigod (4/11) Jul 29 2015 I think in your case it wouldn't matter. Comments are text,
- Sönke Ludwig (6/51) Jul 29 2015 That means a performance hit, because the string has to be parsed twice
- matovitch (13/13) Jul 29 2015 Hi Sonke,
- Sönke Ludwig (5/17) Jul 29 2015 Hm, that example is outdated, I'll fix it ASAP. Currently it uses toJSON...
- Sönke Ludwig (15/29) Jul 29 2015 BigInt is opt-in, at least as far as the lexer goes. But why would such
- Dmitry Olshansky (14/31) Aug 02 2015 Actually JSON is defined as subset of EMCASCript-262 spec hence
- Sönke Ludwig (2/24) Aug 03 2015
- Dmitry Olshansky (44/86) Aug 03 2015 Hm about 5 solid pages and indeed it leaves everything unspecified for
- Marco Leise (21/37) Sep 27 2015 I would take RapidJSON with a grain of salt, its main goal is
- Dmitry Olshansky (13/49) Sep 27 2015 Agreed. Still keep in mind the whole reason that Ruby supports it is
- Walter Bright (3/3) Jul 28 2015 A speed optimization, since JSON parsing speed is critical:
- Brad Anderson (3/7) Jul 28 2015 That's what it does (depending on which parser you use). The StAX
- Walter Bright (2/10) Jul 28 2015 Great!
- Andrea Fontana (40/44) Jul 29 2015 Why don't do a shortcut like:
- Sönke Ludwig (12/54) Jul 29 2015 That would be another possibility. What do you think about the
- Andrea Fontana (25/74) Jul 29 2015 I implemented it too, but I removed.
- Sönke Ludwig (28/57) Jul 30 2015 In this case, since it would be a separate type, there are no static
- deadalnix (13/17) Aug 03 2015 Looked in the doc (
- Sönke Ludwig (17/37) Aug 04 2015 The documentation is lacking, I'll improve that. JSONValue includes an
- deadalnix (10/17) Aug 04 2015 That is not going to cut it. I've been working with these for
- Sönke Ludwig (5/20) Aug 11 2015 I just said that jsvar should be supported (even in its full glory), so
- deadalnix (8/30) Aug 11 2015 Ok, then maybe there was a misunderstanding on my part.
- Sönke Ludwig (12/35) Aug 12 2015 But take into account that Algebraic already behaves much like jsvar (at...
- Meta (3/9) Aug 12 2015 In relation to that, you may find this thread interesting:
- Atila Neves (6/10) Aug 11 2015 I forgot to give warnings that the two week period was about to
- deadalnix (13/25) Aug 11 2015 Ok some actionable items.
- Dmitry Olshansky (6/31) Aug 11 2015 +1 Also most JS engines use nan-boxing to fit type tag along with the
- Sönke Ludwig (14/23) Aug 11 2015 But the array field already needs 16 bytes on 64-bit systems anyway. We
- Dmitry Olshansky (18/42) Aug 11 2015 Pointer to array should work for all fields > 8 bytes. Depending on the
- Sönke Ludwig (26/68) Aug 12 2015 The trouble begins with long vs. ulong, even if we'd leave larger
- Walter Bright (2/6) Aug 13 2015 Make the type for storing a Number be a template parameter.
- Sönke Ludwig (9/18) Aug 14 2015 Then we'd lose the ability to distinguish between integers and floating
- "Ola Fosheim Grøstad" (3/11) Aug 14 2015 Why can't you specify many types? You should be able to query the
- Walter Bright (5/11) Aug 14 2015 Two other solutions:
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/4) Aug 14 2015 Except for the lowest negative value…
- Walter Bright (3/7) Aug 14 2015 You can always use T for that.
- Matthias Bentrup (4/8) Aug 14 2015 actually the x87 format has 64 mantissa bits, although the bit 63
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (5/15) Aug 14 2015 Yes, Walter was right.
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (37/48) Aug 11 2015 See
- deadalnix (31/72) Aug 11 2015 Urg. Looks like BigInt should steal a bit somewhere instead of
- Sönke Ludwig (32/102) Aug 12 2015 Agreed, this was what I also thought. Considering that BigInt is heavy
- Sönke Ludwig (4/9) Aug 12 2015 First proof of concept:
- Andrei Alexandrescu (10/20) Aug 14 2015 struct TaggedAlgebraic(U) if (is(U == union)) { ... }
- Timon Gehr (4/23) Aug 14 2015 The tag is an implementation detail. Algebraic types are actually more
- Andrei Alexandrescu (18/43) Aug 17 2015 Ping on this. My working hypothesis:
- Dmitry Olshansky (39/88) Aug 17 2015 Actually one can combine the two:
- Andrei Alexandrescu (2/5) Aug 17 2015 But a pointer tag can do everything that an integer tag does. -- Andrei
- Dmitry Olshansky (4/10) Aug 17 2015 albeit quite a deal slooower.
- Andrei Alexandrescu (3/12) Aug 18 2015 I think there's a misunderstanding. Pointers _are_ 64-bit integers and
- Dmitry Olshansky (9/22) Aug 18 2015 Integer in a small range is faster to switch on. Plus comparing to zero
- Andrei Alexandrescu (6/28) Aug 18 2015 But I'm talking about using pointers for indirect calls IN ADDITION to
- Dmitry Olshansky (11/40) Aug 18 2015 If common type fast path with 0 is not relevant then the only gain of
- deadalnix (10/67) Aug 17 2015 From the compiler perspective, the tag is much nicer. Compiler
- Andrei Alexandrescu (4/11) Aug 17 2015 Point taken. Question is if this is worth it.
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (8/23) Aug 18 2015 Not really, because it most likely doesn't point to where you
- Johannes Pfau (24/37) Aug 18 2015 Here's an example with an enum tag, showing what compilers can do:
- Andrei Alexandrescu (3/27) Aug 18 2015 That's a language issue - switch does not work with any pointers. I just...
- Johannes Pfau (8/44) Aug 18 2015 Yes, if we enable switch for pointers we get nicer D code.
- Andrei Alexandrescu (5/10) Aug 18 2015 I agree there's a margin here in favor of integers, but it's getting
- deadalnix (7/21) Aug 18 2015 No, enum can also be cramed inline in the code for cheap, they
- deadalnix (3/6) Aug 18 2015 No it is not. Is the set of values is not compact, no jump table.
- Andrei Alexandrescu (3/11) Aug 18 2015 No, in std.variant it points to a dispatcher function. -- Andrei
- Sönke Ludwig (18/51) Aug 17 2015 (reposting to NG, accidentally replied by e-mail)
- Johannes Pfau (13/81) Aug 17 2015 I think Andrei's point is that a pointer tag can do most things a
- Suliman (6/6) Aug 17 2015 Why not working:
- Sönke Ludwig (3/9) Aug 17 2015 toJSONValue() is the right function in this case. I've update the
- Suliman (5/16) Aug 17 2015 I think that I miss understanding conception of ranges. I reread
- Sönke Ludwig (5/12) Aug 17 2015 String is a valid range, but parseJSONValue takes a *reference* to a
- Suliman (11/18) Aug 17 2015 Yeas, I understood, but maybe it's better to rename it (or add
- Sönke Ludwig (6/16) Aug 17 2015 I agree that the naming can be a bit confusing at first, but I chose
- Suliman (4/4) Aug 17 2015 Also I can't build last build from git. I am getting error:
- Sönke Ludwig (4/7) Aug 17 2015 Do you use DUB to build? It should automatically download the
- Suliman (3/3) Aug 17 2015 Also could you look at theme
- Andrei Alexandrescu (9/23) Aug 17 2015 Sounds tenuous.
- Sönke Ludwig (23/46) Aug 18 2015 It's more convenient/readable in cases where a complex type is used
- Andrei Alexandrescu (6/61) Aug 21 2015 Well I guess I would, but no matter. It's something where reasonable
- Sönke Ludwig (10/39) Aug 22 2015 It depends on the perspective/use case, so it's surely not unreasonable
- deadalnix (5/17) Aug 12 2015 Thing is, the schema is not always known perfectly? Typical case
- Sönke Ludwig (13/28) Aug 12 2015 For example in the serialization framework of vibe.d you can have
- Walter Bright (2/5) Aug 12 2015 Hah, I'd like to replace dmd.conf with a .json file.
- CraigDillabaugh (4/11) Aug 13 2015 Not .json!
- Walter Bright (2/3) Aug 13 2015 [ "comment" : "and you thought it couldn't have comments!" ]
- Craig Dillabaugh (7/11) Aug 13 2015 You are cheating :o)
- Andrei Alexandrescu (2/6) Aug 14 2015 There can't be two comments with the same key though. -- Andrei
- Steven Schveighoffer (13/20) Aug 14 2015 This is invalid (though probably unintentionally). An array cannot have
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (27/39) Aug 14 2015 http://tools.ietf.org/html/rfc7159
- Andrei Alexandrescu (4/23) Aug 14 2015 You're right. Good convo:
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (10/41) Aug 14 2015 No, he is wrong, and even if he was right, he would still be
- Steven Schveighoffer (10/46) Aug 14 2015 Yes, that's what I checked first :)
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (5/10) Aug 14 2015 One should have a config file format for which there are standard
- Steven Schveighoffer (5/14) Aug 14 2015 And that would be possible here. JSON file format says nothing about how...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (14/17) Aug 14 2015 It isn't important since JSON is not too good as a config file
- deadalnix (4/22) Aug 14 2015 It doesn't matter what you think of JSON.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (4/5) Aug 14 2015 The discussion was about suitability as a standard config file
- Steven Schveighoffer (4/6) Aug 14 2015 I think you are missing that this sub-discussion is about using json to
- rsw0x (3/10) Aug 14 2015 dub uses sdlang, why not dmd?
- Walter Bright (4/10) Aug 14 2015 When going for portability, it is not a good idea to emit duplicate keys...
- Walter Bright (3/10) Aug 14 2015 The Json spec doesn't say that - it doesn't specify any semantic meaning...
- Walter Bright (3/13) Aug 14 2015 That is, the ECMA 404 spec. There seems to be more than one JSON spec.
- Nick Sabalausky (2/4) Aug 21 2015 Amusingly, that "ECMA-404" link results in an actual HTTP 404.
- Adam D. Ruppe (3/4) Aug 13 2015 There's an awful lot of people out there replacing json with more
- Brad Anderson (3/8) Aug 13 2015 Referring to TOML?
- Walter Bright (3/7) Aug 13 2015 We've currently invented our own, rather stupid and limited, format. The...
- Dmitry Olshansky (5/14) Aug 13 2015 YAML is (plus/minus braces) the same but supports comments and is
- Walter Bright (2/15) Aug 14 2015 Yes, but we (will) have a .json parser in Phobos.
- Jacob Carlborg (4/5) Aug 14 2015 Time to add a YAML parser ;)
- Rikki Cattermole (2/5) Aug 14 2015 Heyyy Sonke ;)
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/6) Aug 14 2015 I think kiith-sa has started on that:
- Walter Bright (4/7) Aug 14 2015 That's a good idea, but since dmd already emits json and requires incorp...
- suliman (4/14) Aug 14 2015 Walter, and what I should to do for commenting stringin config
- Walter Bright (4/9) Aug 14 2015 json is a format that everybody understands, and dmd has json code alrea...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (4/14) Aug 15 2015 And you end up with each D tool having their own config format…
- Nick Sabalausky (6/13) Aug 21 2015 I'll take an "invented our own, rather stupid and limited, format" over
- Dmitry Olshansky (6/23) Aug 14 2015 We actually have YAML parser in DUB repository plus so that can be
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/13) Aug 12 2015 Maybe it is better to just focus on having a top-of-the-line
- =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= (28/39) Aug 13 2015 I think we really need to have an informal pre-vote about the BigInt and...
- Walter Bright (9/11) Aug 13 2015 1. What about the issue of having the API be a composable range interfac...
- Sönke Ludwig (11/24) Aug 13 2015 In this case, the lexer will perform on-the-fly UTF validation of the
- Walter Bright (10/38) Aug 14 2015 Ok, my mistake. I didn't look at the others.
- Sönke Ludwig (19/63) Aug 15 2015 I'll rename it to isCharInputRange. We don't have something like that in...
- Suliman (4/4) Aug 15 2015 I talked with few people and they said that they are prefer
- Laeeth Isharc (2/6) Aug 15 2015 New stream parser is fast! (See prior thread on benchmarks).
- Walter Bright (25/49) Aug 15 2015 That's right, there isn't one. But I use:
- Dmitry Olshansky (13/34) Aug 15 2015 Actually there are next to none. `validate` that throws on failed
- Walter Bright (4/7) Aug 16 2015 Perhaps, but I wouldn't be convinced without benchmarks to prove it on a...
- Dmitry Olshansky (9/18) Aug 16 2015 About x2 faster then decode + check-if-alphabetic on my stuff:
- Walter Bright (2/7) Aug 16 2015 Thank you.
- Sönke Ludwig (33/94) Aug 16 2015 Good, I'll use `if (isInputRange!R &&
- Jacob Carlborg (5/11) Aug 16 2015 I agree. Signatures like this are what's making std.algorithm look more
- Walter Bright (14/53) Aug 16 2015 Except that there is no reason to support wchar, dchar, int, ubyte, or a...
- Sönke Ludwig (28/93) Aug 22 2015 But you have seen ubyte[] when reading something from a file or from a
- Walter Bright (8/25) Aug 24 2015 Not if the illuminating example in the Json API description does it that...
- Sönke Ludwig (25/58) Aug 24 2015 That's true, but then they will possibly have to understand the inner
- Sebastiaan Koppe (2/5) Aug 25 2015 One can also say the problem is that you have a string variable.
- Sönke Ludwig (16/21) Aug 25 2015 But ranges are not always the right solution:
- Jay Norwood (8/20) Aug 15 2015 I like this #3. If I understand it correctly, this would provide
- Andrei Alexandrescu (5/9) Aug 17 2015 I'll submit a review in short order, but thought this might be of use in...
- Sönke Ludwig (11/11) Aug 17 2015 I've added some changes in the latest version (docs updated):
- Andrei Alexandrescu (134/138) Aug 17 2015 I'll preface my review with a general comment. This API comes at an
- Jacob Carlborg (6/11) Aug 17 2015 I don't think this is excessive. We should strive to have small modules....
- Andrei Alexandrescu (2/11) Aug 18 2015 How about a module with 20? -- Andrei
- Jacob Carlborg (4/5) Aug 18 2015 If it's used in several other modules, I don't see a problem with it.
- Andrei Alexandrescu (2/5) Aug 18 2015 Me neither if internal. I do see a problem if it's public. -- Andrei
- Jacob Carlborg (5/6) Aug 18 2015 If it's public and those 20 lines are useful on its own, I don't see a
- Andrei Alexandrescu (4/8) Aug 18 2015 In this case at least they aren't. There is no need to import the JSON
- Dmitry Olshansky (6/16) Aug 19 2015 To catch it? Generally I agree - just merge things sensibly, there could...
- Sönke Ludwig (4/14) Aug 19 2015 The only other module where it would fit would be lexer.d, but that
- Andrei Alexandrescu (3/19) Aug 21 2015 I'm sure there are a number of better options to package things nicely.
- Sönke Ludwig (2/22) Aug 22 2015 I'm all ears ;)
- Nick Sabalausky (3/9) Aug 21 2015 Module boundaries should be determined by organizational grouping, not
- David Nadlinger (5/7) Aug 21 2015 By organizational grouping as well as encapsulation concerns.
- Andrei Alexandrescu (3/14) Aug 21 2015 Rather by usefulness. As I mentioned, nobody would ever need only JSON's...
- Jacob Carlborg (6/8) Aug 23 2015 Well, but it depends on how you decide what should be in a group. Size
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (22/92) Aug 18 2015 For iterating tree-like structures, a callback-based seems nicer,
- Marco Leise (10/12) Sep 28 2015 You mean the user should write a JSON number parsing routine
- Marc Schütz (10/20) Sep 29 2015 No, the JSON type should just store the raw unparsed token and
- Laeeth Isharc (13/35) Sep 29 2015 I was just speaking to Sonke about another aspect of this. It's
- Marco Leise (16/28) Sep 30 2015 Ah, the duck typing approach of accepting any numeric type
- Sönke Ludwig (103/236) Aug 18 2015 That would mean a single module that is >5k lines long. Spreading out
- Andrei Alexandrescu (69/278) Aug 21 2015 That would help. My point is it's good design to make the response
- Andrei Alexandrescu (3/8) Aug 21 2015 I should add that in parseJSONStream, "stream" refers to the input,
- tired_eyes (3/7) Aug 21 2015 Wow. Just wow.
- Andrei Alexandrescu (2/10) Aug 21 2015 By "it" there I mean "the brake" :o). -- Andrei
- H. S. Teoh via Digitalmars-d (7/19) Aug 21 2015 Wait, wait. So you're saying the GC is a brake, and we should remove the
- Andrei Alexandrescu (3/19) Aug 21 2015 Nothing new here. We want to make it a pleasant experience to use D
- H. S. Teoh via Digitalmars-d (6/27) Aug 21 2015 Making it pleasant to use without a GC is not the same thing as removing
- Steven Schveighoffer (8/29) Aug 21 2015 Allow me to (possibly) clarify.
- Sönke Ludwig (85/242) Aug 22 2015 Most lines are needed for tests and documentation. Surely dropping some
- Martin Nowak (8/18) Aug 24 2015 Also see "utf/unicode should only be validated once"
- Sönke Ludwig (9/28) Aug 25 2015 The performance benefit comes from the fact that almost all of JSON is a...
- Martin Nowak (3/9) Aug 25 2015 I see, then we should indeed exploit this fact and offer lexing of
- Timon Gehr (2/6) Aug 19 2015 What about the comma tokens?
- Andrei Alexandrescu (4/11) Aug 19 2015 Forgot about those. The invariant is that byToken should return a
- Jacob Carlborg (4/6) Aug 19 2015 That should be possible without the comma tokens in this case?
- Andrei Alexandrescu (3/7) Aug 19 2015 That is correct, but would do little else than confusing folks. FWIW the...
- Martin Nowak (3/5) Aug 24 2015 Though stdx (or better std.x) would have been a prettier and more
- Timon Gehr (3/8) Aug 25 2015 The great thing about the experimental package is that we are actually
- Steven Schveighoffer (6/16) Aug 25 2015 I strongly oppose renaming it. I don't want Phobos to fall into the trap...
- Martin Nowak (3/3) Aug 25 2015 Will try to convert a piece of code I wrote a few days ago.
- tired_eyes (3/3) Sep 24 2015 So, what is the current status of std.data.json? This topic is
- Atila Neves (4/7) Sep 24 2015 I probably should have posted here. Soenke is working on all the
- Marco Leise (16/22) Oct 02 2015 There is one thing I noticed today that I personally feel
- Alex (20/20) Oct 06 2015 JSON is a particular file format useful for serialising
- Sönke Ludwig (12/31) Oct 06 2015 A generic serialization framework is definitely needed! Jacob Carlborg
- Sebastiaan Koppe (15/18) Oct 06 2015 I think there are too many particulars making an abstract
- Atila Neves (7/10) Oct 06 2015 The binary one is the one I care about, so that's the one I wrote:
- Márcio Martins (4/8) Oct 09 2018 Sorry for the late ping, but it's been 3 years - what has
- Nicholas Wilson (3/13) Oct 09 2018 I presume it became vibe.data.json, there is also asdf if you're
- Jonathan M Davis (41/56) Oct 09 2018 As I understand it, it was originally part of vibe.d (though I think tha...
- Basile B. (6/16) Oct 09 2018 It's been moved here
Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ Atila
Jul 28 2015
On 29/07/2015 2:07 a.m., Atila Neves wrote:
> Start of the two week process, folks.
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> Atila

Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from the vibe.d days, so it's not an issue of how well it works, nor nitpicking. Just: can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request than, you know, rely on the GC.
Jul 28 2015
On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:On 29/07/2015 2:07 a.m., Atila Neves wrote:I totally agree with that, but shouldn't it be consistent in Phobos? I don't think it's possible to make an interface for custom allocators right now, because that question simply hasn't been ironed out along with std.allocator. So, anything related to allocators belongs in another thread imo, and the review process here would be about the actual json interfaceStart of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaRight now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
Jul 28 2015
On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:On 29/07/2015 2:07 a.m., Atila Neves wrote:Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaRight now, my view is no.Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Jul 28 2015
On Tuesday, 28 July 2015 at 15:55:04 UTC, Brad Anderson wrote:On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:From what I see from std.allocator, there's no Allocator interface? I think this would require changing the type to `struct JSONValue(Allocator)`, unless we see an actual interface implemented in phobos.On 29/07/2015 2:07 a.m., Atila Neves wrote:Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaRight now, my view is no.Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Jul 28 2015
On 29/07/2015 4:23 a.m., Etienne Cimon wrote:On Tuesday, 28 July 2015 at 15:55:04 UTC, Brad Anderson wrote:There is one. IAllocator. I use it throughout std.experimental.image. Unfortunately site is down atm so can't link docs *grumbles*. Btw even if an allocator is a struct, there is a type to wrap it up in a class.On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:From what I see from std.allocator, there's no Allocator interface? I think this would require changing the type to `struct JSONValue(Allocator)`, unless we see an actual interface implemented in phobos.On 29/07/2015 2:07 a.m., Atila Neves wrote:Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaRight now, my view is no.Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Jul 28 2015
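For readers who haven't seen it, the interface Rikki refers to comes from std.experimental.allocator. A rough usage sketch follows; it is unrelated to the reviewed JSON code and only shows the class-based allocator interface in action.

import std.experimental.allocator : theAllocator, makeArray, dispose;

void example()
{
    // theAllocator exposes the class-based allocator interface Rikki refers to;
    // allocatorObject() can wrap any struct allocator in the same interface.
    auto buf = theAllocator.makeArray!char(1024);   // e.g. a parser scratch buffer
    scope (exit) theAllocator.dispose(buf);
}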
2015-07-28 17:55 GMT+02:00 Brad Anderson via Digitalmars-d < digitalmars-d puremagic.com>:Unless there is some sort of proof that it will work with allocators.Allocator is definitely a separate issue. It's a moving target, it's not yet part of a release, and consequently barely field-tested. We will find bugs, we might find design mistakes, we might head in a direction which will turn out to be an anti-pattern (just like `opDispatch` for JSONValue ;) ) It's not to say the quality of the module isn't good - that would mean our release process is broken -, but making a module inclusion to experimental dependent on another module in experimental will not improve the quality of the reviewed module.I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Jul 28 2015
On 29/07/2015 4:25 a.m., Mathias Lang via Digitalmars-d wrote:2015-07-28 17:55 GMT+02:00 Brad Anderson via Digitalmars-d <digitalmars-d puremagic.com <mailto:digitalmars-d puremagic.com>>: Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC. That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos. Allocator is definitely a separate issue. It's a moving target, it's not yet part of a release, and consequently barely field-tested. We will find bugs, we might find design mistakes, we might head in a direction which will turn out to be an anti-pattern (just like `opDispatch` for JSONValue ;) ) It's not to say the quality of the module isn't good - that would mean our release process is broken -, but making a module inclusion to experimental dependent on another module in experimental will not improve the quality of the reviewed module.Right now we just need a plan, and we're all good for std.data.json. Doesn't need to implemented right now, but I'd rather we had a plan going forward to add allocators to it, then you know find out a year down the track that it would need a whole rewrite.
Jul 28 2015
Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:On 29/07/2015 2:07 a.m., Atila Neves wrote:If you pass a string or byte array as input, then there will be no allocations at all (the interface is nogc). For other cases it supports custom allocation through an appender factory [1][2], since there is no standard allocator interface, yet. But since that's the only place where memory is allocated (apart from lower level code, such as BigInt), as soon as Appender supports custom allocators, or you write your own appender, the JSON parser will, too. Only if you use the DOM parser, there will be some inevitable GC allocations, because the DOM representation uses dynamic and associative arrays. 1: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/lexer.d#L66 2: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/parser.d#L286Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaRight now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
Jul 28 2015
On 29/07/2015 4:41 a.m., Sönke Ludwig wrote:Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:It was after 3am when I did my initial look. But I saw the appender usage. I'm ok with this. The DOM parser on the other hand.. ugh this is where we do need IAllocator being used. Although by the sounds of it, we would need a map collection which supports allocators before it can be done.On 29/07/2015 2:07 a.m., Atila Neves wrote:If you pass a string or byte array as input, then there will be no allocations at all (the interface is nogc). For other cases it supports custom allocation through an appender factory [1][2], since there is no standard allocator interface, yet. But since that's the only place where memory is allocated (apart from lower level code, such as BigInt), as soon as Appender supports custom allocators, or you write your own appender, the JSON parser will, too. Only if you use the DOM parser, there will be some inevitable GC allocations, because the DOM representation uses dynamic and associative arrays. 1: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/lexer.d#L66 2: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/parser.d#L286Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaRight now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
Jul 28 2015
On 28.07.2015 at 17:07, Rikki Cattermole wrote:
> I have used the code from the vibe.d days, so it's not an issue of how well it works, nor nitpicking.

You should still have a closer look, as it isn't very similar to the vibe.d code at all, but a rather radical evolution.
Jul 28 2015
On 29/07/2015 4:43 a.m., Sönke Ludwig wrote:
> On 28.07.2015 at 17:07, Rikki Cattermole wrote:
> [...]

Again, it was after 3am when I first looked. I'll take a closer look and create a new thread on this post about anything I find.
Jul 28 2015
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaThis is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
Jul 28 2015
Am 28.07.2015 um 17:19 schrieb Etienne Cimon:On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:An idea might be to support something like this: json_value.opt.foo.bar[2].baz or opt(json_value).foo.bar[2].baz opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaThis is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
Jul 28 2015
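To make the opt idea sketched above concrete, here is a minimal sketch written against Phobos' std.json purely for illustration; Opt, opt and exists are hypothetical names, and the reviewed stdx.data.json API may differ.

import std.json;

struct Opt
{
    private JSONValue value;
    private bool missing;

    // Field access: propagate "missing" instead of throwing on absent keys.
    Opt opDispatch(string name)()
    {
        if (missing || value.type != JSONType.object || name !in value.object)
            return Opt(JSONValue.init, true);
        return Opt(value.object[name], false);
    }

    // Array indexing with the same graceful propagation.
    Opt opIndex(size_t idx)
    {
        if (missing || value.type != JSONType.array || idx >= value.array.length)
            return Opt(JSONValue.init, true);
        return Opt(value.array[idx], false);
    }

    bool exists() { return !missing; }
}

Opt opt(JSONValue v) { return Opt(v, false); }

unittest
{
    auto j = parseJSON(`{"foo": {"bar": [1, 2, {"baz": 42}]}}`);
    assert(opt(j).foo.bar[2].baz.exists);
    assert(!opt(j).foo.nope[5].exists);
}

A full implementation would also track the path that was traversed, so that dereferencing a missing value can report exactly where the lookup failed, as suggested above.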
On Tuesday, 28 July 2015 at 18:45:51 UTC, Sönke Ludwig wrote:An idea might be to support something like this: json_value.opt.foo.bar[2].baz or opt(json_value).foo.bar[2].baz opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.+1 This would solve the cumbersome access of something deeply nested that I've had to deal with when using stdx.data.json. Combine that with the Algebraic improvements you've mentioned before and it'll be just about as pleasant as it could be to use.
Jul 28 2015
On Tuesday, 28 July 2015 at 18:45:51 UTC, Sönke Ludwig wrote:Am 28.07.2015 um 17:19 schrieb Etienne Cimon:I like it quite well. No, actually, a lot. Thinking about it some more... this could end up being the most convenient feature ever known to mankind and would likely push it towards a new age of grand discoveries, infinite fusion power and space colonization. Lets do itOn Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:An idea might be to support something like this: json_value.opt.foo.bar[2].baz or opt(json_value).foo.bar[2].baz opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaThis is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
Jul 28 2015
On 28.07.2015 at 16:07, Atila Neves wrote:
> Start of the two week process, folks.
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> Atila

Thanks for making it happen! Can you also post a quick link to this thread in D.announce?
Jul 28 2015
On 7/28/2015 7:07 AM, Atila Neves wrote:Start of the two week process, folks.Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager. Just looking at the documentation only, some general notes: 1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs. 2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API. 3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r); So, we'll need: toJSON toChars JSONException The possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types. There is a decision needed about whether toJSON() allocates data or returns slices into its inputrange. This can be 'static if' tested by: if inputrange can return immutable slices. toChars() can take a compile time argument to determine if it is 'pretty' or not.
Jul 28 2015
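For reference, a tagged union along the lines Walter describes can already be spelled with std.variant's Algebraic and its This placeholder. This is only a sketch of the idea, not the API under review, and the alias name JSON is made up here.

import std.variant;

// One possible shape of a JSON value as a tagged union over built-in types.
alias JSON = Algebraic!(typeof(null), bool, double, string, This[], This[string]);

unittest
{
    JSON num = 3.14;
    JSON arr = [JSON(1.0), JSON(true)];     // a JSON array is just JSON[]

    assert(num.peek!double !is null);       // query which kind of value is held
    assert(arr.peek!(JSON[]) !is null);
}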
On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote: [...]3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary.+1. The API should be as simple as possible. Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible). OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r); So, we'll need: toJSON toCharsShouldn't it just be toString()? [...]The possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types. There is a decision needed about whether toJSON() allocates data or returns slices into its inputrange. This can be 'static if' tested by: if inputrange can return immutable slices. toChars() can take a compile time argument to determine if it is 'pretty' or not.Whether or not toJSON() allocates *data*, it will have to allocate container nodes of some sort. At the minimum, it will need to use AA's, so it cannot be nogc. T -- Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
Jul 28 2015
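For what it's worth, std.conv.to already picks up user-defined conversions without std.conv having to know about the type. A tiny illustration; Wrapped is a made-up placeholder, not the proposed JSON type.

import std.conv : to;

struct Wrapped
{
    string raw;
    this(string s) { raw = s; }   // to!Wrapped(someString) finds this constructor
}

unittest
{
    auto w = `{"a": 1}`.to!Wrapped;
    assert(w.raw == `{"a": 1}`);
}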
On 7/28/2015 3:37 PM, H. S. Teoh via Digitalmars-d wrote:On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote: Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible).Well, I wouldn't want std.conv to be importing std.json.OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once).That is a good point.I'm not sure what a good API for that would be, though.Probably simply returning an InputRange of JSON values.No. toString() returns a string, which has to be allocated. toChars() (an upcoming convention) would return an InputRange instead, side-stepping allocation.To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r); So, we'll need: toJSON toCharsShouldn't it just be toString()?Whether or not toJSON() allocates *data*, it will have to allocate container nodes of some sort. At the minimum, it will need to use AA's, so it cannot be nogc.That's right. At some point the API will need to add a parameter for Andrei's allocator system.
Jul 28 2015
On Tue, Jul 28, 2015 at 03:55:22PM -0700, Walter Bright via Digitalmars-d wrote:On 7/28/2015 3:37 PM, H. S. Teoh via Digitalmars-d wrote:I'm pretty sure std.conv has interfaces that allow you to keep JSON-specific stuff in std.json, so that you don't get the JSON conversion capability until you actually import std.json.On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote: Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible).Well, I wouldn't want std.conv to be importing std.json.But how would you capture the nesting substructures?OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once).That is a good point.I'm not sure what a good API for that would be, though.Probably simply returning an InputRange of JSON values.[...] ??! Surely you have heard of the non-allocating overload of toString? void toString(scope void delegate(const(char)[]) dg); T -- When solving a problem, take care that you do not become part of the problem.No. toString() returns a string, which has to be allocated. toChars() (an upcoming convention) would return an InputRange instead, side-stepping allocation.To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r); So, we'll need: toJSON toCharsShouldn't it just be toString()?
Jul 28 2015
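The non-allocating convention Teoh refers to looks roughly like this in general; the toy type below is shown only to illustrate the sink-based overload and has nothing to do with the reviewed JSON types.

import std.format : formattedWrite;

struct Point
{
    int x, y;

    // Sink-based toString: the caller supplies the output, so nothing is allocated here.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink.formattedWrite("(%s, %s)", x, y);
    }
}

unittest
{
    import std.conv : to;
    assert(Point(1, 2).to!string == "(1, 2)");
}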
On 7/28/2015 5:15 PM, H. S. Teoh via Digitalmars-d wrote:
>> Probably simply returning an InputRange of JSON values.
> But how would you capture the nesting substructures?

A JSON value is a tagged union of the various types.

> ??! Surely you have heard of the non-allocating overload of toString? void toString(scope void delegate(const(char)[]) dg);

Not range friendly.
Jul 28 2015
On 2015-07-29 06:57, Walter Bright wrote:
> A JSON value is a tagged union of the various types.

But in most cases I think there will be one root node, of type object. In that case it would be a range with only one element? How does that help?

-- /Jacob Carlborg
Jul 29 2015
Am 29.07.2015 um 12:10 schrieb Jacob Carlborg:On 2015-07-29 06:57, Walter Bright wrote:I think a better approach that to add such a special case is to add a readValue function that takes a range of parser nodes and reads into a single JSONValue. That way one can use the pull parser to jump between array or object entries and then extract individual values, or maybe even use nodes.map!readValue to get a range of values...A JSON value is a tagged union of the various types.But in most cases I think there will be one root node, of type object. In that case it would be range with only one element? How does that help?
Jul 29 2015
On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
> On 2015-07-29 06:57, Walter Bright wrote:
>> A JSON value is a tagged union of the various types.
> But in most cases I think there will be one root node, of type object.

An object is a collection of other Values.

> In that case it would be a range with only one element? How does that help?

I don't understand the question.
Jul 29 2015
On 2015-07-29 20:33, Walter Bright wrote:On 7/29/2015 3:10 AM, Jacob Carlborg wrote:I guess I'm finding it difficult to picture a JSON structure as a range. How would the following JSON be returned as a range? { "a": 1, "b": [2, 3], "c": { "d": 4 } } -- /Jacob CarlborgBut in most cases I think there will be one root node, of type object.An object is a collection of other Values. > In that case it would be range with only one element? How does that help? I don't understand the question.
Jul 29 2015
On 7/29/2015 11:51 AM, Jacob Carlborg wrote:
> I guess I'm finding it difficult to picture a JSON structure as a range. How would the following JSON be returned as a range?
>
> { "a": 1, "b": [2, 3], "c": { "d": 4 } }

If it was returned as a range of nodes, it would be:

Object, string, number, string, array, number, number, end, string, object, string, number, end, end

If it was returned as a Value, then you could ask the value to return a range of nodes. A container is not a range, although it may offer a way to get a range that iterates over its contents.
Jul 29 2015
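Spelled out, such a flat node stream could be modeled like this; the NodeKind names are purely illustrative and not part of any proposed API.

// Hypothetical node kinds; "end" closes the innermost object or array.
enum NodeKind { object, array, str, number, boolean, nil, end }

// { "a": 1, "b": [2, 3], "c": { "d": 4 } } flattened to a linear range:
immutable NodeKind[] nodes = [
    NodeKind.object,
        NodeKind.str, NodeKind.number,              // "a": 1
        NodeKind.str, NodeKind.array,               // "b": [
            NodeKind.number, NodeKind.number,       //     2, 3
        NodeKind.end,                               // ]
        NodeKind.str, NodeKind.object,              // "c": {
            NodeKind.str, NodeKind.number,          //     "d": 4
        NodeKind.end,                               // }
    NodeKind.end,                                   // }
];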
On 2015-07-30 01:34, Walter Bright wrote:
> If it was returned as a range of nodes, it would be:
> Object, string, number, string, array, number, number, end, string, object, string, number, end, end

Ah, that makes sense. Never thought of an "end" mark like that, pretty clever.

-- /Jacob Carlborg
Jul 30 2015
On 7/28/2015 3:55 PM, Walter Bright wrote:So it appears that JSON can be in one of 3 useful states: 1. a range of characters (rc) 2. a range of nodes (rn) 3. a container of JSON values (values) What's necessary is simply the ability to convert between these states: (names are just for illustration) rn = rc.toNodes(); values = rn.toValues(); rn = values.toNodes(); rc = rn.toChars(); So, if I wanted to simply pretty print a JSON string s: s.toNodes.toChars(); I.e. it's all composable.OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once).That is a good point.
Jul 28 2015
On Tue, Jul 28, 2015 at 10:43:20PM -0700, Walter Bright via Digitalmars-d wrote:On 7/28/2015 3:55 PM, Walter Bright wrote:[...] How does a linear range of nodes convey a nested structure? T -- Let's call it an accidental feature. -- Larry WallSo it appears that JSON can be in one of 3 useful states: 1. a range of characters (rc) 2. a range of nodes (rn) 3. a container of JSON values (values)OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once).That is a good point.
Jul 28 2015
On 7/28/2015 10:49 PM, H. S. Teoh via Digitalmars-d wrote:
> How does a linear range of nodes convey a nested structure?

You'd need to add a special node type, 'end'. So an array [1,true] would look like:

array number true end
Jul 29 2015
Am 29.07.2015 um 07:43 schrieb Walter Bright:On 7/28/2015 3:55 PM, Walter Bright wrote:There are actually even four levels: 1. Range of characters 2. Range of tokens 3. Range of nodes 4. DOM value Having a special case for range of DOM values may or may not be a worthwhile thing to optimize for handling big JSON arrays of values. But there is always the pull parser for that kind of data processing. Currently not all, but most, conversions between the levels are implemented, and sometimes a level is skipped for efficiency. The question is if it would be worth the effort and the API complexity to implement all of them. lexJSON: character range -> token range parseJSONStream: character range -> node range parseJSONStream: token range -> node range parseJSONValue: character range -> DOM value parseJSONValue: token range -> DOM value (same for toJSONValue) writeJSON: token range -> character range (output range) writeJSON: node range -> character range (output range) writeJSON: DOM value -> character range (output range) writeJSON: to -> character range (output range) (same for toJSON with string output) Adding an InputStream based version of writeJSON would be an option, but the question is how performant that would be and how to go about implementing the number->InputRange functionality.So it appears that JSON can be in one of 3 useful states: 1. a range of characters (rc) 2. a range of nodes (rn) 3. a container of JSON values (values) What's necessary is simply the ability to convert between these states: (names are just for illustration) rn = rc.toNodes(); values = rn.toValues(); rn = values.toNodes(); rc = rn.toChars(); So, if I wanted to simply pretty print a JSON string s: s.toNodes.toChars(); I.e. it's all composable.OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once).That is a good point.
Jul 29 2015
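Assuming the module path stdx.data.json from the dub package, the conversions listed above would compose roughly like this. Only the function names are taken from the list above; the exact signatures should be checked against the documentation.

import stdx.data.json;   // dub package std_data_json (assumed module path)

void example()
{
    auto text = `{"a": 1, "b": [2, 3]}`;

    auto tokens = lexJSON(text);            // character range -> token range
    auto nodes  = parseJSONStream(tokens);  // token range -> node range
    auto value  = toJSONValue(text);        // character range -> DOM value

    auto str = toJSON(value);               // DOM value -> string
}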
On 7/29/2015 1:37 AM, Sönke Ludwig wrote:There are actually even four levels: 1. Range of characters 2. Range of tokens 3. Range of nodes 4. DOM valueWhat's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?Having a special case for range of DOM values may or may not be a worthwhile thing to optimize for handling big JSON arrays of values.I see no point for that.Currently not all, but most, conversions between the levels are implemented, and sometimes a level is skipped for efficiency. The question is if it would be worth the effort and the API complexity to implement all of them. lexJSON: character range -> token range parseJSONStream: character range -> node range parseJSONStream: token range -> node range parseJSONValue: character range -> DOM value parseJSONValue: token range -> DOM value (same for toJSONValue) writeJSON: token range -> character range (output range) writeJSON: node range -> character range (output range) writeJSON: DOM value -> character range (output range) writeJSON: to -> character range (output range) (same for toJSON with string output)I don't see why there are more than the 3 I mentioned.
Jul 29 2015
Am 29.07.2015 um 20:44 schrieb Walter Bright:On 7/29/2015 1:37 AM, Sönke Ludwig wrote:Yes.There are actually even four levels: 1. Range of characters 2. Range of tokens 3. Range of nodes 4. DOM valueWhat's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?Hm, I misread "container of JSON values" as "range of JSON values". I guess you just meant JSONValue, so my comment doesn't apply.Having a special case for range of DOM values may or may not be a worthwhile thing to optimize for handling big JSON arrays of values.I see no point for that.The token level is useful for reasoning about the text representation. It could be used for example to implement syntax highlighting, or for using the location information to mark errors in the source code.Currently not all, but most, conversions between the levels are implemented, and sometimes a level is skipped for efficiency. The question is if it would be worth the effort and the API complexity to implement all of them. lexJSON: character range -> token range parseJSONStream: character range -> node range parseJSONStream: token range -> node range parseJSONValue: character range -> DOM value parseJSONValue: token range -> DOM value (same for toJSONValue) writeJSON: token range -> character range (output range) writeJSON: node range -> character range (output range) writeJSON: DOM value -> character range (output range) writeJSON: to -> character range (output range) (same for toJSON with string output)I don't see why there are more than the 3 I mentioned.
Jul 29 2015
On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
> The token level is useful for reasoning about the text representation. It could be used for example to implement syntax highlighting, or for using the location information to mark errors in the source code.

Ok, I see your point. The question then becomes does the node stream really add enough value to justify its existence, as it greatly overlaps the token stream.
Jul 29 2015
Am 30.07.2015 um 05:25 schrieb Walter Bright:On 7/29/2015 1:41 PM, Sönke Ludwig wrote:I agree that in case of JSON their difference can be a bit subtle. Basically the node stream adds knowledge about the nesting of elements, as well as adding semantic meaning to special token sequences that the library users would otherwise have to parse themselves. Finally, it also guarantees a valid JSON structure, while a token range could have tokens in any order. Especially the knowledge about nesting is also a requirement for the high-level pull parser functions (skipToKey, readArray, readString etc.) that make working with that kind of pull parser interface actually bearable, outside of mechanic code like a generic serialization framework.The token level is useful for reasoning about the text representation. It could be used for example to implement syntax highlighting, or for using the location information to mark errors in the source code.Ok, I see your point. The question then becomes does the node stream really add enough value to justify its existence, as it greatly overlaps the token stream.
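A small sketch of the difference: iterating the node range exposes the nesting as explicit begin/end events, which a raw token range does not guarantee (the kind discriminator on the node type is an assumption):

    import stdx.data.json;
    import std.stdio;

    void main()
    {
        auto text = `{"a": [1, true, null]}`;
        auto nodes = parseJSONStream(text);

        // Each element is a parser node: object/array begin and end markers,
        // keys and literal values, in document order and with valid nesting.
        foreach (n; nodes)
            writeln(n.kind);
    }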
Jul 30 2015
Am 29.07.2015 um 00:37 schrieb H. S. Teoh via Digitalmars-d:On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote: [...]http://s-ludwig.github.io/std_data_json/stdx/data/json/parser/toJSONValue.html3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary.+1. The API should be as simple as possible.Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible).We could maybe do that if we keep the current JSONValue as a struct wrapper around Algebraic. But I guess that this will create an ambiguity between JSONValue("...") parsing a JSON string, or being constructed as a JSON string value. Or does to! hook up to something other than the constructor?OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.See http://s-ludwig.github.io/std_data_json/stdx/data/json/parser/parseJSONStream.html and the various UFCS "read" and "skip" functions in http://s-ludwig.github.io/std_data_json/stdx/data/json/parser.html
Jul 29 2015
W dniu 2015-07-29 o 00:37, H. S. Teoh via Digitalmars-d pisze:On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote: [...]Here's my range-based parser, you can parse a 1 TB JSON file without a single allocation. It needs heavy polishing, but I didn't have time/need to do it. Basically a WIP, but maybe someone will find it useful. https://github.com/pszturmaj/json-streaming-parser3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary.+1. The API should be as simple as possible. Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible). OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.
Jul 29 2015
Am 29.07.2015 um 00:29 schrieb Walter Bright:On 7/28/2015 7:07 AM, Atila Neves wrote:This is actually one of my pet peeves. Having a *readable* API that tells the reader immediately what happens is IMO one of the most important aspects (far more important than an API that allows quick typing). A number of times I've seen D code that omits part of what it actually does in its name and the result was that it was constantly necessary to scroll up to see where a particular name might come from. So I have a strong preference to keep "JSON", because it's an integral part of the semantics.Start of the two week process, folks.Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager. Just looking at the documentation only, some general notes: 1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.The whole thing provides a stream parser with high level helpers to make it convenient to use, a DOM module, a separate lexer and a generator module that operates in various different modes (maybe two additional modes still to come!). Every single function provides real and frequently useful benefits. So if anything, there are still some little things missing. All in all, even if JSON may be a simple format, the source code is already almost 5k LOC (includes unit tests of course). But apart from maintainability they have mainly been separated to minimize the amount of code that needs to be dragged in for a particular functionality (not only other JSON modules, but also from different parts of Phobos).3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary.We can drop the "Value" part of the name of course, if we expect that function to be used a lot, but there is still the parseJSONStream function which is arguably not less important. BTW, you just mentioned the DOM part so far, but for any code that where performance is a priority, the stream based pull parser is basically the way to go. This would also be the natural entry point for any serialization library. And my prediction is, if we do it right, that working with JSON will in most cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct that gets populated with the deserialized JSON data. Where that doesn't fit, performance oriented code would use the pull parser. So the DOM part of the system, which is the only thing the current JSON module has, will only be left as a niche functionality.To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r);Do we have an InputRange version of the various number-to-string conversions? It would be quite inconvenient to reinvent those (double, long, BigInt) in the JSON package. 
Of course, using to!string internally would be an option, but it would obviously destroy all nogc opportunities and performance benefits.So, we'll need: toJSON toChars JSONException The possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types.The idea is to have JSONValue be a simple alias to Algebraic!(...), just that there are currently still some workarounds for DMD < 2.067.0 on top, which means that JSONValue is a struct that "alias this" inherits from Algebraic for the time being. Those workarounds will be removed when the code is actually put into Phobos. But a simple union would obviously not be enough, it still needs a type tag of some form and needs to provide a safe interface on top of it. Algebraic is the only thing that comes close right now, but I'd really prefer to have a fully statically typed version of Algebraic that uses an enum as the type tag instead of working with delegates/typeinfo.There is a decision needed about whether toJSON() allocates data or returns slices into its inputrange. This can be 'static if' tested by: if inputrange can return immutable slices.The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but slicing is done in those cases and the non-DOM parser interface is even nogc as long as exceptions are disabled.toChars() can take a compile time argument to determine if it is 'pretty' or not.As long as JSON DOM values are stored in a generic Algebraic (which is a huge win in terms of interoperability!), toChars won't suffice as a name. It would have to be toJSON(Chars) (as it basically is now). I've gave the "pretty" version a separate name simply because it's more convenient to use and pretty printing will probably be by far the most frequently used option when converting to a string.
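For illustration, the "simple union plus enum type tag" idea mentioned above (a fully statically typed alternative to Algebraic) could look roughly like this; it is purely a sketch of the concept, not the reviewed design:

    struct TaggedJSONValue
    {
        enum Kind { null_, boolean, number, str, array, object }

        Kind kind;  // statically known set of types, no TypeInfo/delegates needed
        union
        {
            bool boolean;
            double number;
            string str;
            TaggedJSONValue[] array;
            TaggedJSONValue[string] object;
        }
    }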
Jul 29 2015
On 7/29/2015 1:18 AM, Sönke Ludwig wrote:Am 29.07.2015 um 00:29 schrieb Walter Bright:I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.This is actually one of my pet peeves. Having a *readable* API that tells the reader immediately what happens is IMO one of the most important aspects (far more important than an API that allows quick typing). A number of times I've seen D code that omits part of what it actually does in its name and the result was that it was constantly necessary to scroll up to see where a particular name might come from. So I have a strong preference to keep "JSON", because it's an integral part of the semantics.I understand there is a purpose to each of those things, but there's also considerable value in a simpler API.2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.The whole thing provides a stream parser with high level helpers to make it convenient to use, a DOM module, a separate lexer and a generator module that operates in various different modes (maybe two additional modes still to come!). Every single function provides real and frequently useful benefits. So if anything, there are still some little things missing.All in all, even if JSON may be a simple format, the source code is already almost 5k LOC (includes unit tests of course).I don't count unit tests as LOC :-)But apart from maintainability they have mainly been separated to minimize the amount of code that needs to be dragged in for a particular functionality (not only other JSON modules, but also from different parts of Phobos).They are so strongly related I don't see this as a big issue. Also, if they are templates, they don't get compiled in if not used.Agreed elsewhere. But still, I am not seeing a range interface on the functions. The lexer, for example, does not accept an input range of characters. Having a range interface is absolutely critical, and is the thing I am the most adamant about with all new Phobos additions. Any function that accepts arbitrarily long data should accept an input range instead, any function that generates arbitrary data should present that as an input range. Any function that builds a container should accept an input range to fill that container with. Any function that builds a container should also be an output range.3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. 
Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary.We can drop the "Value" part of the name of course, if we expect that function to be used a lot, but there is still the parseJSONStream function which is arguably not less important. BTW, you just mentioned the DOM part so far, but for any code that where performance is a priority, the stream based pull parser is basically the way to go. This would also be the natural entry point for any serialization library.And my prediction is, if we do it right, that working with JSON will in most cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct that gets populated with the deserialized JSON data.json_input must be a input range of characters.Where that doesn't fit, performance oriented code would use the pull parser.I am not sure what you mean by 'pull parser'. Do you mean the parser presents an input range as its output, and incrementally parses only as the next value is requested?So the DOM part of the system, which is the only thing the current JSON module has, will only be left as a niche functionality.That's ok. Is it normal practice to call the JSON data structure a Document Object Model?We do now, at least for integers. I plan to do ones for floating point.To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r);Do we have an InputRange version of the various number-to-string conversions?It would be quite inconvenient to reinvent those (double, long, BigInt) in the JSON package.Right. It's been reinvented multiple times in Phobos, which is absurd. If you're reinventing them in std.data.json, then we're doing something wrong again.Of course, using to!string internally would be an option, but it would obviously destroy all nogc opportunities and performance benefits.That's exactly why the range versions were done.Agreed.So, we'll need: toJSON toChars JSONException The possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types.The idea is to have JSONValue be a simple alias to Algebraic!(...), just that there are currently still some workarounds for DMD < 2.067.0 on top, which means that JSONValue is a struct that "alias this" inherits from Algebraic for the time being. Those workarounds will be removed when the code is actually put into Phobos. But a simple union would obviously not be enough, it still needs a type tag of some form and needs to provide a safe interface on top of it.Algebraic is the only thing that comes close right now, but I'd really prefer to have a fully statically typed version of Algebraic that uses an enum as the type tag instead of working with delegates/typeinfo.If Algebraic is not good enough for this, it is a failure and must be fixed.With a range interface, you can test for 1) hasSlicing and 2) if ElementEncodingType is immutable. Why is ubyte being accepted? The ECMA-404 spec sez: "Conforming JSON text is a sequence of Unicode code points".There is a decision needed about whether toJSON() allocates data or returns slices into its inputrange. 
This can be 'static if' tested by: if inputrange can return immutable slices.The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but slicing is done in those cases and the non-DOM parser interface is even nogc as long as exceptions are disabled.Why not?toChars() can take a compile time argument to determine if it is 'pretty' or not.As long as JSON DOM values are stored in a generic Algebraic (which is a huge win in terms of interoperability!), toChars won't suffice as a name.It would have to be toJSON(Chars) (as it basically is now). I've gave the "pretty" version a separate name simply because it's more convenient to use and pretty printing will probably be by far the most frequently used option when converting to a string.So make pretty printing the default. In fact, I'm skeptical that a non-pretty printed version is worth while. Note that an adapter algorithm can strip redundant whitespace.
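A minimal sketch of that compile-time distinction; note that narrow strings don't satisfy hasSlicing because of auto-decoding, so the test below simply checks for an array of immutable elements instead (names are illustrative only):

    // Slice the input when it can hand out immutable slices, copy otherwise.
    enum canReturnImmutableSlices(R) = is(R : immutable(E)[], E);

    static assert( canReturnImmutableSlices!string);
    static assert( canReturnImmutableSlices!(immutable(ubyte)[]));
    static assert(!canReturnImmutableSlices!(char[]));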
Jul 29 2015
If this implementation is merged into Phobos, will vibe.d migrate to it, or will there be two similar libs?
Jul 30 2015
Am 30.07.2015 um 09:27 schrieb Suliman:If this implementation will be merged with phobos will vibed migrate to it, or it would two similar libs?I'll then make the vibe.d JSON module compatible using "alias this" implicit conversions and then deprecate it over a longer period of time before it gets removed. And of course the serialization framework will be adjusted to work with the new JSON module.
Jul 30 2015
On Thursday, 30 July 2015 at 04:41:51 UTC, Walter Bright wrote:I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.Is there any reason why D doesn't allow json.parseStream() in this case? I remember the requirement of having the full module path being my first head scratcher while learning D. The first example in TDPL had some source code that called split() (if memory serves) and phobos had changed since the book was written and you needed to disambiguate. I found it very odd that you have type the whole thing when just the next level up would suffice to disambiguate it. The trend seems to be toward more deeply nested modules in Phobos so having to type the full path will increasingly be a wart of D's. If we can't have the minimal necessary module paths then I'm completely in favor of parseJSONStream over the more general parseStream. I want that "json" in there one way or another (preferably by the method which makes it optional while maintaining brevity).
Jul 30 2015
On 7/30/2015 9:58 AM, Brad Anderson wrote:If we can't have the minimal necessary module paths then I'm completely in favor of parseJSONStream over the more general parseStream. I want that "json" in there one way or another (preferably by the method which makes it optional while maintaining brevity).I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this: import std.data.json : parseJSON = parse; Or put the import in a scope: void doNaughtyThingsWithJson() { import std.data.json; ... x.parse(); } The latter seems to be becoming the preferred D style.
Jul 30 2015
On Thu, Jul 30, 2015 at 12:43:40PM -0700, Walter Bright via Digitalmars-d wrote:On 7/30/2015 9:58 AM, Brad Anderson wrote:Yeah, local imports are fast becoming my preferred D coding style, because it makes code portable -- if you move a function to a new module, you don't have to untangle its import dependencies if all imports are local. It's one of those little, overlooked things about D that contribute toward making it an awesome language. T -- Written on the window of a clothing store: No shirt, no shoes, no service.If we can't have the minimal necessary module paths then I'm completely in favor of parseJSONStream over the more general parseStream. I want that "json" in there one way or another (preferably by the method which makes it optional while maintaining brevity).I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this: import std.data.json : parseJSON = parse; Or put the import in a scope: void doNaughtyThingsWithJson() { import std.data.json; ... x.parse(); } The latter seems to be becoming the preferred D style.
Jul 30 2015
On 7/30/2015 12:57 PM, H. S. Teoh via Digitalmars-d wrote:Yeah, local imports are fast becoming my preferred D coding style, because it makes code portable -- if you move a function to a new module, you don't have to untangle its import dependencies if all imports are local. It's one of those little, overlooked things about D that contribute toward making it an awesome language.Funny how my preferred D style of writing code is steadily diverging from C++ style :-)
Jul 30 2015
On Thu, Jul 30, 2015 at 01:26:17PM -0700, Walter Bright via Digitalmars-d wrote:On 7/30/2015 12:57 PM, H. S. Teoh via Digitalmars-d wrote:One would hope so, otherwise why are we here instead of in the C++ world? ;-) T -- This is not a sentence.Yeah, local imports are fast becoming my preferred D coding style, because it makes code portable -- if you move a function to a new module, you don't have to untangle its import dependencies if all imports are local. It's one of those little, overlooked things about D that contribute toward making it an awesome language.Funny how my preferred D style of writing code is steadily diverging from C++ style :-)
Jul 30 2015
Is the current build ready for production? I am getting this error: source\stdx\data\json\value.d(81): Error: safe function 'stdx.data.json.value.JSONValue.this' cannot call system function 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt, string, JSONValue[], JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
Jul 31 2015
Am 31.07.2015 um 10:13 schrieb Suliman:is the current build is ready for production? I am getting error: source\stdx\data\json\value.d(81): Error: safe function 'stdx.data.json.value.JSONValue.this' cannot call system function 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt, string, JSONValue[], JSONValue[string]).VariantN.__ctor!(typeof(null)).this'2.068 "fixed" possible safety issues with VariantN by marking the interface @system instead of @trusted. Unfortunately that broke any @safe code using Variant/Algebraic.
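For illustration, the breakage looks roughly like this in user code, together with one possible stop-gap until the VariantN interface is re-audited (a simplified sketch, not a recommendation from the thread):

    import std.variant : Algebraic;

    @safe void useAlgebraic()
    {
        Algebraic!(int, string) v = 42;  // under 2.068 this fails: the constructor is @system
    }

    // Stop-gap: confine the Variant calls to an explicitly @trusted helper.
    @trusted void useAlgebraicTrusted()
    {
        Algebraic!(int, string) v = 42;
    }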
Jul 31 2015
On Friday, 31 July 2015 at 12:16:02 UTC, Sönke Ludwig wrote:Am 31.07.2015 um 10:13 schrieb Suliman:What revisions are usable? I checked some and all have issues like: source\App.d(5,34): Error: template stdx.data.json.parser.parseJSONValue cannot deduce function from argument types !()(string), candidates are: source\stdx\data\json\parser.d(105,11): stdx.data.json.parser.parseJSONValue(LexOptions options = LexOptions.init, Input)(ref Input input, string filename = "") if (isStringInputRange!Input || isIntegralInputRange!Input)is the current build is ready for production? I am getting error: source\stdx\data\json\value.d(81): Error: safe function 'stdx.data.json.value.JSONValue.this' cannot call system function 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt, string, JSONValue[], JSONValue[string]).VariantN.__ctor!(typeof(null)).this'2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
Jul 31 2015
Am 31.07.2015 um 22:15 schrieb Suliman:On Friday, 31 July 2015 at 12:16:02 UTC, Sönke Ludwig wrote:parseJSONValue takes a reference to an input range, so that it can consume the input and leave any trailing text after the JSON value in the range. For just converting a string to a JSONValue, use toJSONValue instead. I'll make this more clear in the documentation.Am 31.07.2015 um 10:13 schrieb Suliman:Wat revision are usable? I checked some and all have issue like: source\App.d(5,34): Error: template stdx.data.json.parser.parseJSONValue cannot deduce function from argument types !()(string), candidates are: source\stdx\data\json\parser.d(105,11): stdx.data.json.parser.parseJSONVa lue(LexOptions options = LexOptions.init, Input)(ref Input input, string filenam e = "") if (isStringInputRange!Input || isIntegralInputRange!Input)is the current build is ready for production? I am getting error: source\stdx\data\json\value.d(81): Error: safe function 'stdx.data.json.value.JSONValue.this' cannot call system function 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt, string, JSONValue[], JSONValue[string]).VariantN.__ctor!(typeof(null)).this'2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
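To illustrate the difference (function names as in the module; the exact consumption behavior shown is an assumption based on the description above):

    import stdx.data.json;

    void main()
    {
        // toJSONValue: convert a complete string in one call.
        auto whole = toJSONValue(`{"code": 200}`);

        // parseJSONValue: takes the range by ref, consumes exactly one JSON
        // value and leaves the rest of the input in the range.
        string input = `{"code": 200} trailing data`;
        auto first = parseJSONValue(input);
        // `input` now holds only the text after the parsed value.
    }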
Aug 01 2015
parseJSONValue takes a reference to an input range, so that it can consume the input and leave any trailing text after the JSON value in the range. For just converting a string to a JSONValue, use toJSONValue instead. I'll make this more clear in the documentation.Yes please, because it's hard to understand the difference. Maybe it's possible to simplify it more? Also I'm having trouble extracting a value: response = toJSONValue(res.bodyReader.readAllUTF8()); writeln(to!int(response["code"])); C:\D\dmd2\windows\bin\..\..\src\phobos\std\conv.d(295,24): Error: template std.conv.toImpl cannot deduce function from argument types !(int)(VariantN!20u), candidates are: C:\D\dmd2\windows\bin\..\..\src\phobos\std\conv.d(361,3): std.conv.toImpl (T, S)(S value) if (isImplicitlyConvertible!(S, T) && !isEnumStrToStr!(S, T) && !isNullToStr!(S, T)) If I simply do: writeln(response["code"]); the code produces the right result (for example 200). What kind of value is stored under the key "code"? It doesn't look like a simple "200". How can I convert it to a string or an int?
Aug 01 2015
Looks like it's a Variant type. So I tried to use the get! method to extract the value from it: writeln(get!(response["code"])); But I get an error: Error: variable response cannot be read at compile time
Aug 01 2015
Am 01.08.2015 um 16:15 schrieb Suliman:Look like it's Variant type. So I tried to use method get! do extract value from it writeln(get!(response["code"])); But I get error: Error: variable response cannot be read at compile timeThe correct syntax is: response["code"].get!int
Aug 01 2015
On Saturday, 1 August 2015 at 14:52:55 UTC, Sönke Ludwig wrote:Am 01.08.2015 um 16:15 schrieb Suliman:Thanks! But how do I get access to the elements that are in "result": {}, for example "name":"_system", in: {"result":{"name":"_system","id":"76067","path":"database-6067","isSystem":true},"error":false,"code":200}? Could you also extend the docs with a code example?Look like it's Variant type. So I tried to use method get! do extract value from it writeln(get!(response["code"])); But I get error: Error: variable response cannot be read at compile timeThe correct syntax is: response["code"].get!int
Aug 01 2015
On Saturday, 1 August 2015 at 14:52:55 UTC, Sönke Ludwig wrote:Am 01.08.2015 um 16:15 schrieb Suliman:connectInfo.statusCode = response["code"].get!int; std.variant.VariantException std\variant.d(1445): Variant: attempting to use incompatible types stdx.data.json.value.JSONValue and intLook like it's Variant type. So I tried to use method get! do extract value from it writeln(get!(response["code"])); But I get error: Error: variable response cannot be read at compile timeThe correct syntax is: response["code"].get!int
Aug 01 2015
On 2015-07-30 06:41, Walter Bright wrote:I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.I kind of agree with that, but at the same time, if one always needs to use the fully qualified name (or an alias) because there's a conflict then that's quite annoying. A perfect example of that is the Path module in Tango. It has functions such as "split" and "join". Every time I use it I alias the import: import Path = tango.io.Path; Because otherwise it will conflict with the string manipulating functions with the same names. In Phobos the names in the path module are different compared to the string functions. For example, I think "Value" and "parse" are too generic to not include "JSON" in their name. -- /Jacob Carlborg
Jul 30 2015
On 07/30/2015 02:40 PM, Jacob Carlborg wrote:On 2015-07-30 06:41, Walter Bright wrote:It also fucks up UFCS, and I'm a huge fan of UFCS. I do agree that D's module system is awesome here and worth taking advantage of to avoid C++-style naming conventions, but I still think balance is needed. Sometimes, just because we can use a shorter potentially-conflicting name doesn't mean we necessarily should.I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.I kind of agree with that, but at the same time, if one always need to use the fully qualified name (or an alias) because there's a conflict then that's quite annoying.
Aug 21 2015
On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:It also fucks up UFCS, and I'm a huge fan of UFCS.Are you saying that "import json : parseJSON = parse; foo.parseJSON.bar;" does not work? – David
Aug 21 2015
On 08/21/2015 12:29 PM, David Nadlinger wrote:On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:Ok, fair point, although I was referring more to fully-qualified name lookups, as in the snippet I quoted from Jacob. Ie, this doesn't work: someJsonCode.std.json.parse(); I do think though, generally speaking, if there is much need to do a renamed import, the symbol in question probably didn't have the best name in the first place. Renamed importing is a great feature to have, but when you see it used it raises the question "*Why* is this being renamed? Why not just use it's real name?" For the most part, I see two main reasons: 1. "Just because. I like this bikeshed color better." But this is merely a code smell, not a legitimate reason to even bother. or 2. The symbol has a questionable name in the first place. If there's reason to even bring up renamed imports as a solution, then it's probably falling into the "questionably named" category. Just because we CAN use D's module system and renamed imports and such to clear up ambiguities, doesn't mean we should let ourselves take things TOO far to the opposite extreme when avoiding C/C++'s "big long ugly names as a substitute for modules". Like Walter, I do very much dislike C/C++'s super-long, super-unambiguous names. But IMO, preferring parseStream over parseJSONStream isn't a genuine case of avoiding C/C++-style naming, it's just being overrun by fear of C/C++-style naming and thus taking things too far to the opposite extreme. We can strike a better balance than choosing between "brief and unclear-at-a-glance" and "C++-level verbosity". Yea, we CAN do "import std.json : parseJSONStream = parseStream;", but if there's even any motivation to do so in the first place, we may as well just use the better name right from the start. Besides, those who prefer ultra-brevity are free to paint their bikesheds with renamed imports, too ;)It also fucks up UFCS, and I'm a huge fan of UFCS.Are you saying that "import json : parseJSON = parse; foo.parseJSON.bar;" does not work?
Aug 22 2015
On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:On 7/28/2015 7:07 AM, Atila Neves wrote:Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.Start of the two week process, folks.Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager. Just looking at the documentation only, some general notes: 1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs. 2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API. 3. Stepping back a bit, when I think of parsing JSON data, I think: auto ast = inputrange.toJSON(); where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output: auto r = ast.toChars(); // r is an InputRange of characters writeln(r); So, we'll need: toJSON toChars JSONException The possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types.
Jul 29 2015
On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:[...][...] Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.) T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.The possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types.Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
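A small sketch of that idea; parseNumber is a hypothetical helper, and the assumed convention is that any type constructible from a digit string (such as BigInt) can be plugged in:

    import std.bigint : BigInt;
    import std.conv : to;

    T parseNumber(T = int)(string digits)
    {
        static if (is(typeof(T(digits))))
            return T(digits);    // e.g. BigInt("12345678901234567890")
        else
            return digits.to!T;  // int, long, double, ...
    }

    unittest
    {
        assert(parseNumber("42") == 42);
        assert(parseNumber!BigInt("12345678901234567890") == BigInt("12345678901234567890"));
    }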
Jul 29 2015
Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.)Some JSON files can be quite large... For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money. Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
Jul 29 2015
On Wednesday, 29 July 2015 at 17:04:33 UTC, Laeeth Isharc wrote:I think in your case it wouldn't matter. Comments are text, mostly. There's probably just one or two fields with "number" type.[...]Some JSON files can be quite large... For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money. Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
Jul 29 2015
Am 29.07.2015 um 18:47 schrieb H. S. Teoh via Digitalmars-d:On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:That means a performance hit, because the string has to be parsed twice - once for validation and once for conversion. And it means that for non-string inputs the lexer has to allocate for each number. It also doesn't know the length of the number in advance, so it can't allocate in a generally efficient way.On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:[...][...] Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.) TThe possible JSON values are: string number object (associative arrays) array true false null Since these are D builtin types, they can actually be a simple union of D builtin types.Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
Jul 29 2015
Hi Sonke, Great to see your module moving towards Phobos inclusion (I have not been following the latest progress of D sadly :() ! Just a small remark from the documentation example. Maybe it would be better to replace: value.toJSONString!true() by value.toJSONString!prettify() using a well-named enum instead of a boolean, which could seem obscure. I know the Eigen C++ lib uses a similar thing for static vs dynamic matrices. Thanks for the read. Regards, matovitch
Jul 29 2015
Am 29.07.2015 um 20:21 schrieb matovitch:Hi Sonke, Great to see your module moving towards phobos inclusion (I have not been following the latest progress of D sadly :() ! Just a small remark from the documentation example. Maybe it would be better to replace : value.toJSONString!true() by value.toJSONString!prettify() using a well-named enum instead of a boolean which could seem obscure I now Eigen C++ lib use a similar thing for static vs dynamic matrix. Thanks for the read. Regards, matovitchHm, that example is outdated, I'll fix it ASAP. Currently it uses toJSON and a separate toPrettyJSON function. An obvious alternative would be to add an entry GeneratorOptions.prettify, because toJSON already takes that as a template argument: toJSON!(GeneratorOptions.prettify)
Jul 29 2015
Am 29.07.2015 um 17:22 schrieb Don:Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in.BigInt is opt-in, at least as far as the lexer goes. But why would such a number be rejected? Any of the usual floating point parsers would simply parse the number and just lose precision if it can't be represented exactly. And after all, it's still valid JSON. But note that I've only added this due to multiple requests, it doesn't seem to be that uncommon. We *could* in theory make the JSONNumber type a template and make the bigint fields optional. That would be the only thing missing to making the import optional, too.And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.If we'd have a Decimal type in Phobos, I would have integrated that, too. The string representation may be an alternative, but since the weight of the import is the main argument, I'd rather choose the more comfortable/logical option - or probably rather try to avoid std.bigint being such a heavy import (such as local imports to defer secondary imports).
Jul 29 2015
On Wednesday, 29 July 2015 at 15:22:06 UTC, Don wrote:On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:[snip]Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.Actually, JSON is defined as a subset of the ECMAScript-262 spec, hence it may not contain anything other than 64-bit IEEE-754 numbers, period. See: http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type Anything else is e-hm an "extension" (or simply put - violation of spec), I've certainly seen 64-bit integers in the wild - how often are true big ints found out there? If no one can present some run-of-the-mill REST JSON API breaking the rules, I'd suggest demoting BigInt handling to an optional feature.
Aug 02 2015
Am 02.08.2015 um 19:14 schrieb Dmitry Olshansky:Actually JSON is defined as subset of EMCASCript-262 spec hence it may not ciontain anything other 64-bit5 IEEE-754 numbers period. See: http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type Anything else is e-hm an "extension" (or simply put - violation of spec), I've certainly seen 64-bit integers in the wild - how often true big ints are found out there? If no one can present some run of the mill REST JSON API breaking the rules I'd suggest demoting BigInt handling to optional feature.This is not true. Quoting from ECMA-404:JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition [1]. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages. JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.
Aug 03 2015
On 03-Aug-2015 10:56, Sönke Ludwig wrote:Am 02.08.2015 um 19:14 schrieb Dmitry Olshansky:Hm about 5 solid pages and indeed it leaves everything unspecified for extensibility so I stand corrected. Still I'm more inclined to put my trust in RFCs, such as the new one: http://www.ietf.org/rfc/rfc7159.txt Which states: This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available. Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values. And it implies setting limits on everything: 9. Parsers A JSON parser transforms a JSON text into another representation. A JSON parser MUST accept all texts that conform to the JSON grammar. A JSON parser MAY accept non-JSON forms or extensions. An implementation may set limits on the size of texts that it accepts. An implementation may set limits on the maximum depth of nesting. An implementation may set limits on the range and precision of numbers. An implementation may set limits on the length and character contents of strings. Now back to our land let's look at say rapidJSON. It MAY seem to handle big integers: https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h But it's used only to parse doubles: https://github.com/miloyip/rapidjson/pull/137 Anyhow the API says it all - only integers up to 64bit and doubles: http://rapidjson.org/md_doc_sax.html#Handler Pretty much what I expect by default. And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it causes epic code bloat as Don already pointed out. -- Dmitry OlshanskyActually JSON is defined as subset of EMCASCript-262 spec hence it may not ciontain anything other 64-bit5 IEEE-754 numbers period. See: http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type Anything else is e-hm an "extension" (or simply put - violation of spec), I've certainly seen 64-bit integers in the wild - how often true big ints are found out there? If no one can present some run of the mill REST JSON API breaking the rules I'd suggest demoting BigInt handling to optional feature.This is not true. Quoting from ECMA-404:JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition [1]. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. 
Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages. JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.
Aug 03 2015
Am Mon, 03 Aug 2015 12:11:14 +0300 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:[...] Now back to our land let's look at say rapidJSON. It MAY seem to handle big integers: https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h But it's used only to parse doubles: https://github.com/miloyip/rapidjson/pull/137 Anyhow the API says it all - only integers up to 64bit and doubles: http://rapidjson.org/md_doc_sax.html#Handler Pretty much what I expect by default. And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it causes epic code bloat as Don already pointed out.I would take RapidJSON with a grain of salt, its main goal is to be the fastest JSON parser. Nothing wrong with that, but BigInt and fast doesn't naturally match and the C standard library also doesn't come with a BigInt type that could conveniently be plugged in. Please compare again with JSON parsers in languages that provide BigInts, e.g. Ruby: http://ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON/Ext/Generator/GeneratorMethods/Bignum.html Optional ok, but no support at all would be so 90s. My impression is that the standard wants to allow JSON being used in environments that cannot provide BigInt support, but a modern language for PCs with a BigInt module should totally support reading long integers and be able to do proper rounding of double values. I thought about reading two BigInts: one for the significand and one for the base-10 exponent, so you don't need a BigFloat but have the full accuracy from the textual string still as x*10^y. -- Marco
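For illustration, that exact-decimal representation could be as simple as the following sketch (not a proposed API):

    import std.bigint : BigInt;

    // Keep the textual number exactly as significand * 10^exponent instead of
    // forcing it into a lossy double or a BigFloat.
    struct ExactDecimal
    {
        BigInt significand;
        BigInt exponent;  // base-10
    }

    // "3.141592653589793238462643383279" -> significand 3141592653589793238462643383279,
    // exponent -30, with no information lost.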
Sep 27 2015
On 27-Sep-2015 20:43, Marco Leise wrote:Am Mon, 03 Aug 2015 12:11:14 +0300 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:Yes, yet support should be optional.[...] Now back to our land let's look at say rapidJSON. It MAY seem to handle big integers: https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h But it's used only to parse doubles: https://github.com/miloyip/rapidjson/pull/137 Anyhow the API says it all - only integers up to 64bit and doubles: http://rapidjson.org/md_doc_sax.html#Handler Pretty much what I expect by default. And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it causes epic code bloat as Don already pointed out.I would take RapidJSON with a grain of salt, its main goal is to be the fastest JSON parser. Nothing wrong with that, but BigInt and fast doesn't naturally match and the C standard library also doesn't come with a BigInt type that could conveniently be plugged in.Please compare again with JSON parsers in languages that provide BigInts, e.g. Ruby: http://ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON/Ext/Generator/GeneratorMethods/Bignum.html Optional ok, but no support at all would be so 90s.Agreed. Still keep in mind the whole reason that Ruby supports it is because its "integer" type is multi-precision by default. So if your native integer type is multi-precision than indeed why add a special case for fixnums.My impression is that the standard wants to allow JSON being used in environments that cannot provide BigInt support, but a modern language for PCs with a BigInt module should totally support reading long integers and be able to do proper rounding of double values. I thought about reading two BigInts: one for the significand and one for the base-10 exponent, so you don't need a BigFloat but have the full accuracy from the textual string still as x*10^y.All of that is sensible ... in the slow code path. The common path must be simple and lean, bigints are certainly an exception rather then the rule. Therefore support for big int should not come at the expense for other use cases. Also - pluggability should allow me to e.g. use my own "big" decimal floating point. -- Dmitry Olshansky
Sep 27 2015
A speed optimization, since JSON parsing speed is critical: If the parser is able to use slices of its input, store numbers as slices. Only convert them to numbers lazily, as the numeric conversion can take significant time.
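A sketch of that optimization in isolation (illustrative types only, not the reviewed implementation):

    import std.conv : to;

    // Keep the raw digit slice from the input; convert only when asked.
    struct LazyNumber
    {
        string slice;  // view into the JSON input, no copy, no eager conversion

        double asDouble() const { return slice.to!double; }
        long   asLong()   const { return slice.to!long; }
    }

    // A consumer that never reads the value never pays for the conversion:
    // auto n = LazyNumber("123.25"); ... n.asDouble();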
Jul 28 2015
On Tuesday, 28 July 2015 at 23:16:34 UTC, Walter Bright wrote:A speed optimization, since JSON parsing speed is critical: If the parser is able to use slices of its input, store numbers as slices. Only convert them to numbers lazily, as the numeric conversion can take significant time.That's what it does (depending on which parser you use). The StAX style parser included is lazy and non-allocating.
Jul 28 2015
On 7/28/2015 4:24 PM, Brad Anderson wrote:On Tuesday, 28 July 2015 at 23:16:34 UTC, Walter Bright wrote:Great!A speed optimization, since JSON parsing speed is critical: If the parser is able to use slices of its input, store numbers as slices. Only convert them to numbers lazily, as the numeric conversion can take significant time.That's what it does (depending on which parser you use). The StAX style parser included is lazy and non-allocating.
Jul 28 2015
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaWhy not do a shortcut like: jv.opt("/this/is/a/path") ? I use it in my json/bson binding. Anyway, opt(...).isNull returns true if that sub-object doesn't exist. How can I check instead if that sub-object is actually null? Something like: { "a" : { "b" : null} } ? It would be nice to have a way to get a default if it doesn't exist. In my library, which behaves in a different way, I write: Object is : { address : { number: 15 } } // as!xxx try to get a value of that type, if it can't it tries to convert it using .to!xxx if it fails again it returns default // Converted as string assert(obj["/address/number"].as!string == "15"); // This doesn't exists assert(obj["/address/asdasd"].as!int == int.init); // A default value is specified assert(obj["/address/asdasd"].as!int(50) == 50); // A default value is specified (but value exists) assert(obj["/address/number"].as!int(50) == 15); // This doesn't exists assert(!obj["address"]["number"]["this"].exists); My library has a get!xxx string too (that throws an exception if the value is not xxx) and to!xxx that throws an exception if the value can't be converted to xxx. Other feature: // This field doesn't exists return default value auto tmpField = obj["/address/asdasd"].as!int(50); assert(tmpField.error == true); // Value is defaulted ... assert(tmpField.exists == false); // ... because it doesn't exists assert(tmpField == 50); // This field exists, but can't be converted to int. Return default value. tmpField = obj["/tags/0"].as!int(50); assert(tmpField.error == true); // Value is defaulted ... assert(tmpField.exists == true); // ... but a field is actually here assert(tmpField == 50);
Jul 29 2015
Am 29.07.2015 um 09:46 schrieb Andrea Fontana:On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaWhy don't do a shortcut like: jv.opt("/this/is/a/path") ? I use it in my json/bson binding.Anyway, opt(...).isNull return true if that sub-obj doesn't exists. How can I check instead if that sub-object is actually null? Something like: { "a" : { "b" : null} } ?opt(...) == nullIt would be nice to have a way to get a default if it doesn't exists. On my library that behave in a different way i write: Object is : { address : { number: 15 } } // as!xxx try to get a value of that type, if it can't it tries to convert it using .to!xxx if it fails again it returns default // Converted as string assert(obj["/address/number"].as!string == "15"); // This doesn't exists assert(obj["/address/asdasd"].as!int == int.init); // A default value is specified assert(obj["/address/asdasd"].as!int(50) == 50); // A default value is specified (but value exists) assert(obj["/address/number"].as!int(50) == 15); // This doesn't exists assert(!obj["address"]["number"]["this"].exists); My library has a get!xxx string too (that throws an exception if value is not xxx) and to!xxx that throws an exception if value can't converted to xxx.I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce. The other possible approach, which would be more convenient to use, would be add a "default value" overload to "opt", for example: jv.opt("defval").foo.barOther feature: // This field doesn't exists return default value auto tmpField = obj["/address/asdasd"].as!int(50); assert(tmpField.error == true); // Value is defaulted ... assert(tmpField.exists == false); // ... because it doesn't exists assert(tmpField == 50); // This field exists, but can't be converted to int. Return default value. tmpField = obj["/tags/0"].as!int(50); assert(tmpField.error == true); // Value is defaulted ... assert(tmpField.exists == true); // ... but a field is actually here assert(tmpField == 50);
Jul 29 2015
On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).I implemented it too, but I removed. Many times fields name are functions name or similar and it breaks the code. In my implementation it creates a lot of temporary objects (one for each subobj) using the string instead, i just create the last one. It's not easy for me to use assignments with that syntax. Something like: obj.with.a.new.field = 3; It's difficult to implement. It's much easier to implement: obj["/field/doesnt/exists"] = 3 It's much easier to write formatted-string paths. It allows future implementation of something like xpath/jquery style If your json contains keys with "/" inside, you can still use old plain syntax... String parsing it's quite easy (at compile time too) of course. If a part of path doesn't exists it works like a part of opt("a", "b", "c") doesn't. It's just syntax sugar. :)Does it works? Anyway it seems ambiguous: opt(...) == null => false opt(...).isNull => trueAnyway, opt(...).isNull return true if that sub-obj doesn't exists. How can I check instead if that sub-object is actually null? Something like: { "a" : { "b" : null} } ?opt(...) == nullIsn't jv.opt("defval") taking the value of ("defval") rather than setting a default value?It would be nice to have a way to get a default if it doesn't exists. On my library that behave in a different way i write: Object is : { address : { number: 15 } } // as!xxx try to get a value of that type, if it can't it tries to convert it using .to!xxx if it fails again it returns default // Converted as string assert(obj["/address/number"].as!string == "15"); // This doesn't exists assert(obj["/address/asdasd"].as!int == int.init); // A default value is specified assert(obj["/address/asdasd"].as!int(50) == 50); // A default value is specified (but value exists) assert(obj["/address/number"].as!int(50) == 15); // This doesn't exists assert(!obj["address"]["number"]["this"].exists); My library has a get!xxx string too (that throws an exception if value is not xxx) and to!xxx that throws an exception if value can't converted to xxx.I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce. The other possible approach, which would be more convenient to use, would be add a "default value" overload to "opt", for example: jv.opt("defval").foo.bar
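For illustration, a toy sketch of such a slash-separated path lookup (hypothetical types and names, matching neither Andrea's library nor std_data_json):

    import std.algorithm.iteration : splitter;

    // Toy stand-in for a DOM value: nested string-keyed maps plus a leaf string.
    struct Node
    {
        Node[string] fields;
        string leaf;
    }

    // Walks a slash-separated path; returns null if any component is missing.
    const(Node)* lookup(const(Node)* root, string path)
    {
        auto cur = root;
        foreach (key; path.splitter('/'))
        {
            if (key.length == 0) continue;    // tolerate a leading '/'
            auto next = key in cur.fields;
            if (next is null) return null;    // missing part of the path
            cur = next;
        }
        return cur;
    }

    unittest
    {
        Node doc;
        doc.fields["address"] = Node(["number": Node(null, "15")]);
        assert(lookup(&doc, "/address/number").leaf == "15");
        assert(lookup(&doc, "/address/missing") is null);
    }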
Jul 29 2015
Am 29.07.2015 um 11:58 schrieb Andrea Fontana:On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:In this case, since it would be a separate type, there are no static members apart from the automatically generated ones and maybe something like opIndex/opAssign. It can of course also overload opIndex with a string argument, so that there is a generic alternative in case of conflicts or runtime key names.That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).I implemented it too, but I removed. Many times fields name are functions name or similar and it breaks the code.In my implementation it creates a lot of temporary objects (one for each subobj) using the string instead, i just create the last one.If the temporary objects are cheap, I don't see an issue there. Without keeping track of the path, a simple pointer to a JSONValue should be sufficient (the temporary objects have to be made non-copyable).It's not easy for me to use assignments with that syntax. Something like: obj.with.a.new.field = 3; It's difficult to implement. It's much easier to implement: obj["/field/doesnt/exists"] = 3Maybe more difficult, but certainly possible. If the complexity doesn't explode, I'd say that shouldn't be a primary concern, since this is all still pretty simple.It's much easier to write formatted-string paths. It allows future implementation of something like xpath/jquery styleAdvanced path queries could indeed be interesting, possibly even more interesting if applied to the pull parser.If your json contains keys with "/" inside, you can still use old plain syntax...A possible alternative would be to support some kind of escape syntax.String parsing it's quite easy (at compile time too) of course. If a part of path doesn't exists it works like a part of opt("a", "b", "c") doesn't. It's just syntax sugar. :)Granted, it's not really much in this case, but you do get less static checking, which means that some things will only be caught at run time. Also, you'll get an ambiguity if you want to support array indices, too. Finally, it may even be security relevant, because an attacker might try to sneak in a key that contains slash characters to access/overwrite fields that would normally not be possible. So every user input that may end up in a path query will have to be validated first now.Does it works? Anyway it seems ambiguous: opt(...) == null => false opt(...).isNull => trueThe former gets forwarded to Algebraic, while the latter is a method of the enclosing Nullable. I've tested it and it works. But I also agree it isn't particularly pretty in this case, but that's what we have in D as basic building blocks (or do we have an Optional type somewhere, yet).It would be an opt with different semantics, just a theoretical alternative. This behavior would be mutually exclusive to the current opt.The other possible approach, which would be more convenient to use, would be add a "default value" overload to "opt", for example: jv.opt("defval").foo.barIsn't jv.opt("defval") taking the value of ("defval") rather than setting a default value?
Jul 30 2015
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaLooked in the doc ( http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html ). I wanted to know how JSONValue can be manipulated. That is not very explicit. First, it doesn't look like the value can embed null as a value. null is a valid json value. Secondly, it seems that it accepts bigint. As per the JSON spec, the only kind of numeric value you can have in there is a num, which doesn't even distinguish between floating point and integer (!) and has 53 bits of precision. By having double and long in there, we are already way over spec, so I'm not sure why we'd want to put bigint in there. Finally, I'd love to see JSONValue exhibit an API similar to jsvar.
Aug 03 2015
Am 03.08.2015 um 23:15 schrieb deadalnix:On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:The documentation is lacking, I'll improve that. JSONValue includes an alias this to an Algebraic, which provides the actual data API. Its type list includes typeof(null).Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaLooked in the doc ( http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html ). I wanted to know how JSONValue can be manipulated. That is not very explicit. First, it doesn't looks like the value can embed null as a value. null is a valid json value.Secondly, it seems that it accept bigint. As per JSON spec, the only kind of numeric value you can have in there is a num, which doesn't even make the difference between floating point and integer (!) and with 53 bits of precision. By having double and long in there, we are already way over spec, so I'm not sure why we'd want to put bigint in there.See also my reply a few posts back. JSON does not specify anything WRT the precision or length of numbers. In the ECMA standard it is mentioned explicitly that this was done so that applications are not limited in what kind of numbers can be transferred. The only thing explicitly mentioned is that implementations *may* choose to support only 64-bit floats. But large integer numbers are used in practice, so we should be able to handle those, too (one way or another).Finally, I'd love to see that JSONValue to exhibit a similar API than jsvar.This is how it used to be in the vibe.data.json module. I consider that to be a mistake now for multiple reasons, at least on this abstraction level. My proposal would be to have a clean, "strongly typed" JSONValue and a generic jsvar like struct on top of that, which is defined independently, and could for example work on a BSONValue, too. The usage would simply be "var value = parseJSONValue(...);".
Aug 04 2015
On Tuesday, 4 August 2015 at 13:10:11 UTC, Sönke Ludwig wrote:This is how it used to be in the vibe.data.json module. I consider that to be a mistake now for multiple reasons, at least on this abstraction level. My proposal would be to have a clean, "strongly typed" JSONValue and a generic jsvar like struct on top of that, which is defined independently, and could for example work on a BSONValue, too. The usage would simply be "var value = parseJSONValue(...);".That is not going to cut it. I've been working with these for ages. This is the very kind of scenario where dynamically typed languages are way more convenient. I've used both quite extensively and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including in Java, for instance. The jsvar interface removes the problematic parts of JS (use ~ instead of + to concatenate strings, and do not implement the opDispatch part of the API).
Aug 04 2015
Am 04.08.2015 um 19:14 schrieb deadalnix:On Tuesday, 4 August 2015 at 13:10:11 UTC, Sönke Ludwig wrote:I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).This is how it used to be in the vibe.data.json module. I consider that to be a mistake now for multiple reasons, at least on this abstraction level. My proposal would be to have a clean, "strongly typed" JSONValue and a generic jsvar like struct on top of that, which is defined independently, and could for example work on a BSONValue, too. The usage would simply be "var value = parseJSONValue(...);".That is not going to cut it. I've been working with these for ages. This is the very kind of scenarios where dynamically typed languages are way more convenient. I've used both quite extensively and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including in java for instance. jsvar interface remove the problematic parts of JS (use ~ instead of + for concat strings and do not implement the opDispatch part of the API).
Aug 11 2015
On Tuesday, 11 August 2015 at 21:27:48 UTC, Sönke Ludwig wrote:Ok, then maybe there was a misunderstanding on my part. My understanding was that there was a Node coming from the parser, and that the node could be wrapped in some facility providing a jsvar like API. My position is that it is preferable to have whatever DOM node be jsvar like out of the box rather than having to wrap it into something to get that.That is not going to cut it. I've been working with these for ages. This is the very kind of scenarios where dynamically typed languages are way more convenient. I've used both quite extensively and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including in java for instance. jsvar interface remove the problematic parts of JS (use ~ instead of + for concat strings and do not implement the opDispatch part of the API).I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Aug 11 2015
Am 11.08.2015 um 23:52 schrieb deadalnix:On Tuesday, 11 August 2015 at 21:27:48 UTC, Sönke Ludwig wrote:Okay, no that's correct.Ok, then maybe there was a misunderstanding on my part. My understanding was that there was a Node coming from the parser, and that the node could be wrapped in some facility providing a jsvar like API.That is not going to cut it. I've been working with these for ages. This is the very kind of scenarios where dynamically typed languages are way more convenient. I've used both quite extensively and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including in java for instance. jsvar interface remove the problematic parts of JS (use ~ instead of + for concat strings and do not implement the opDispatch part of the API).I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).My position is that it is preferable to have whatever DOM node be jsvar like out of the box rather than having to wrap it into something to get that.But take into account that Algebraic already behaves much like jsvar (at least ideally), just without opDispatch and JavaScript operator emulation (which I'm strongly opposed to as a *default*). So the jsvar wrapper would really just be needed for the cases where really concise code is desired when operating on JSON objects. We also discussed an alternative approach similar to opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates a wrapper that enables safe navigation within the DOM, propagating any missing/mismatched fields to the final result instead of throwing. This could also be combined with a final type query: opt!string(n).foo.bar
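A minimal sketch of how such an opt() wrapper could look for field navigation, using opDispatch over a toy value type (names and types are made up for illustration; this is not the proposed std_data_json API and leaves out indexing and the final type query):

    // Toy DOM value for illustration.
    struct Value { Value[string] obj; string text; }

    struct Opt
    {
        private const(Value)* node;   // null means "missing somewhere along the path"

        // opt(v).foo.bar resolves fields, propagating "missing" instead of throwing.
        Opt opDispatch(string name)() const
        {
            if (node is null) return Opt(null);
            return Opt(name in node.obj);
        }

        bool exists() const { return node !is null; }
    }

    Opt opt(ref const Value v) { return Opt(&v); }

    unittest
    {
        Value doc;
        doc.obj["foo"] = Value(["bar": Value(null, "hi")]);
        assert(opt(doc).foo.bar.exists);
        assert(!opt(doc).foo.baz.exists);   // no exception, just "missing"
    }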
Aug 12 2015
On Wednesday, 12 August 2015 at 07:19:05 UTC, Sönke Ludwig wrote:We also discussed an alternative approach similar to opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates a wrapper that enables safe navigation within the DOM, propagating any missing/mismatched fields to the final result instead of throwing. This could also be combined with a final type query: opt!string(n).foo.barIn relation to that, you may find this thread interesting: http://forum.dlang.org/post/lnsc0c$1sip$1 digitalmars.com
Aug 12 2015
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaI forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
Aug 11 2015
On Tuesday, 11 August 2015 at 17:08:39 UTC, Atila Neves wrote:On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:Ok, some actionable items. 1/ How big is a JSON struct ? What is the biggest element in the union ? Is that element really needed ? Recurse. 2/ As far as I can see, the elements are discriminated using typeid. An enum is preferable as the compiler would know the values ahead of time and optimize based on this. It also allows use of things like final switch. 3/ Going from the untyped world to the typed world and providing an API to get back to the untyped world is a loser strategy. That sounds true intuitively, but also from my experience manipulating JSON in various languages. The Nodes produced by this lib need to be "manipulatable" as the unstructured values they represent.Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaI forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
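A hand-rolled sketch of what 2/ is asking for — an explicit enum tag instead of typeid, so final switch works and the compiler can emit a jump table (illustrative only, not the std_data_json layout):

    enum Kind { null_, boolean, integer, floating, text, array, object }

    struct DomValue
    {
        Kind kind;
        union
        {
            bool   boolean;
            long   integer;
            double floating;
            string text;
            // array/object members omitted in this sketch
        }
    }

    string describe(in DomValue v)
    {
        final switch (v.kind)   // compile error if a Kind member is not handled
        {
            case Kind.null_:    return "null";
            case Kind.boolean:  return v.boolean ? "true" : "false";
            case Kind.integer:  return "integer";
            case Kind.floating: return "floating point";
            case Kind.text:     return "string";
            case Kind.array:    return "array";
            case Kind.object:   return "object";
        }
    }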
Aug 11 2015
On 11-Aug-2015 20:30, deadalnix wrote:On Tuesday, 11 August 2015 at 17:08:39 UTC, Atila Neves wrote:+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:Ok some actionable items. 1/ How big is a JSON struct ? What is the biggest element in the union ? Is that element really needed ? Recurse.Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaI forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila2/ As far as I can see, the element are discriminated using typeid. An enum is preferable as the compiler would know values ahead of time and optimize based on this. It also allow use of things like final switch.3/ Going from the untyped world to the typed world and provide an API to get back to the untyped word is a loser strategy. That sounds true intuitively, but also from my experience manipulating JSON in various languages. The Nodes produced by this lib need to be "manipulatable" as the unstructured values they represent.-- Dmitry Olshansky
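For reference, the gist of NaN-boxing in a rough, simplified sketch (only doubles and ints, made-up bit layout; a real implementation needs more tag kinds and care with pointers): everything lives in one 8-byte word, with non-double values hidden in otherwise unused quiet-NaN bit patterns.

    struct NanBoxed
    {
        private ulong bits;

        enum ulong QNAN    = 0x7ff8_0000_0000_0000;
        enum ulong INT_TAG = QNAN | (1UL << 48);   // one otherwise unused NaN payload bit

        static NanBoxed fromDouble(double d)
        {
            import std.math : isNaN;
            // Canonicalize incoming NaNs so they can't collide with tagged payloads.
            double canonical = isNaN(d) ? double.nan : d;
            NanBoxed v;
            v.bits = *cast(ulong*) &canonical;
            return v;
        }

        static NanBoxed fromInt(int i)
        {
            NanBoxed v;
            v.bits = INT_TAG | cast(uint) i;
            return v;
        }

        bool   isInt()    const { return (bits & INT_TAG) == INT_TAG; }
        double asDouble() const { return *cast(double*) &bits; }
        int    asInt()    const { return cast(int)(bits & 0xffff_ffff); }
    }

    unittest
    {
        auto a = NanBoxed.fromDouble(3.14);
        auto b = NanBoxed.fromInt(-7);
        assert(!a.isInt && a.asDouble == 3.14);
        assert(b.isInt && b.asInt == -7);
    }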
Aug 11 2015
Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:On 11-Aug-2015 20:30, deadalnix wrote:But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support. Maybe we should first have a vote about whether BigInt/Decimal should be supported or not, because that would at least solve some of the controversial tradeoffs. I didn't have a use for those personally, but at least we had the real-world issue in vibe.d's implementation that a ulong wasn't exactly representable. My view generally still is that the DOM representation is something for convenient manipulation of small chunks of JSON, so that performance is not a priority, but feature completeness is.Ok some actionable items. 1/ How big is a JSON struct ? What is the biggest element in the union ? Is that element really needed ? Recurse.+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
Aug 11 2015
On 12-Aug-2015 00:21, Sönke Ludwig wrote:Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:Pointer to array should work for all fields > 8 bytes. Depending on the ratio frequency of value vs frequency of array (which is at least an ~5-10 in any practical scenario) it would make things both more compact and faster.On 11-Aug-2015 20:30, deadalnix wrote:But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support.Ok some actionable items. 1/ How big is a JSON struct ? What is the biggest element in the union ? Is that element really needed ? Recurse.+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.Maybe we should first have a vote about whether BigInt/Decimal should be supported or not, because that would at least solve some of the controversial tradeoffs. I didn't have a use for those personally, but at least we had the real-world issue in vibe.d's implementation that a ulong wasn't exactly representable.Well I've stated why I think BigInt should be optional. The reason is C++ parsers don't even bother with anything beyond ULong/double, nor would any e.g. Node.js stuff bother with things beyond double. Lastly we don't have BigFloat so supporting BigInt but not BigFloat is kinda half-way. So please make it an option. And again add an extra indirection (that is BigInt*) for BigInt field in a union because they are extremely rare.My view generally still is that the DOM representation is something for convenient manipulation of small chunks of JSON, so that performance is not a priority, but feature completeness is.I'm confused - there must be some struct that represents a useful value. And more importantly - is JSONValue going to be converted to jsvar? If not - I'm fine. Otherwise whatever inefficiency present in JSONValue would be accumulated by this conversion process. -- Dmitry Olshansky
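A sketch of that extra indirection (illustrative only, not the std_data_json payload): the union member is a BigInt*, so the rare big-integer case costs one allocation but the union itself stays pointer-sized.

    import std.bigint : BigInt;

    enum NumKind { integer, floating, big }

    struct Number
    {
        NumKind kind;
        union
        {
            long    i;
            double  d;
            BigInt* big;   // extra indirection keeps the union at one word
        }
    }

    static assert(Number.sizeof <= 16);   // two words on 64-bit

    unittest
    {
        Number n;
        n.kind = NumKind.big;
        n.big  = new BigInt("123456789012345678901234567890");
        assert(*n.big > BigInt(ulong.max));
    }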
Aug 11 2015
Am 12.08.2015 um 08:28 schrieb Dmitry Olshansky:On 12-Aug-2015 00:21, Sönke Ludwig wrote:The trouble begins with long vs. ulong, even if we'd leave larger numbers aside. We'd really have to support both, but choosing between the two is ambiguous, which isn't very pretty overall.Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:Pointer to array should work for all fields > 8 bytes. Depending on the ratio frequency of value vs frequency of array (which is at least an ~5-10 in any practical scenario) it would make things both more compact and faster.On 11-Aug-2015 20:30, deadalnix wrote:But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support.Ok some actionable items. 1/ How big is a JSON struct ? What is the biggest element in the union ? Is that element really needed ? Recurse.+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.Maybe we should first have a vote about whether BigInt/Decimal should be supported or not, because that would at least solve some of the controversial tradeoffs. I didn't have a use for those personally, but at least we had the real-world issue in vibe.d's implementation that a ulong wasn't exactly representable.Well I've stated why I think BigInt should be optional. The reason is C++ parsers don't even bother with anything beyond ULong/double, nor would any e.g. Node.js stuff bother with things beyond double.Lastly we don't have BigFloat so supporting BigInt but not BigFloat is kinda half-way.That's where Decimal would come in. There is some code for that commented out, but I really didn't want to add it without a standard Phobos implementation. But I wouldn't say that this is really an argument against BigInt, maybe more one for implementing a Decimal type.So please make it an option. And again add an extra indirection (that is BigInt*) for BigInt field in a union because they are extremely rare.Good idea, didn't think about that.There is also the lower level JSONParserNode that represents data of a single bit of the JSON document. But since that struct is just part of a range, its size doesn't matter for speed or memory consumption (they are not allocated or copied while parsing).My view generally still is that the DOM representation is something for convenient manipulation of small chunks of JSON, so that performance is not a priority, but feature completeness is.I'm confused - there must be some struct that represents a useful value.And more importantly - is JSONValue going to be converted to jsvar? If not - I'm fine. Otherwise whatever inefficiency present in JSONValue would be accumulated by this conversion process.By default and currently it isn't, but it might be an idea for the future. The jsvar struct could possibly be implemented as a wrapper around JSONValue as a whole, so that it doesn't have to perform an actual conversion of the whole document. Generally, working with JSONValue is already rather inefficient due to all of the dynamic allocations to populate dynamic and associative arrays. Changing that would require switching to completely different underlying container types, which would at least make the API a lot less intuitive. 
We could of course also simply provide an alternative value representation that is not based on Algebraic (or an enum tag based alternative) and is not augmented with location information, but optimized solely for speed and low memory consumption.
Aug 12 2015
On 8/12/2015 12:44 AM, Sönke Ludwig wrote:That's where Decimal would come in. There is some code for that commented out, but I really didn't want to add it without a standard Phobos implementation. But I wouldn't say that this is really an argument against BigInt, maybe more one for implementing a Decimal type.Make the type for storing a Number be a template parameter.
Aug 13 2015
Am 14.08.2015 um 07:11 schrieb Walter Bright:On 8/12/2015 12:44 AM, Sönke Ludwig wrote:Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow. But the use of BigInt is already controlled by a template parameter, only the std.bigint import is currently there unconditionally. Hm, another idea would be to store a void* (to a BigInt) instead of a BigInt and only import std.bigint locally in the accessor functions.That's where Decimal would come in. There is some code for that commented out, but I really didn't want to add it without a standard Phobos implementation. But I wouldn't say that this is really an argument against BigInt, maybe more one for implementing a Decimal type.Make the type for storing a Number be a template parameter.
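A sketch of that idea (hypothetical names, not the actual lexer code): the stored field is an untyped pointer, and std.bigint is only imported inside templated accessors, so the module is not pulled in unless big integers are actually used.

    struct JSONNumber
    {
        private void* storage;   // actually points to a BigInt, kept untyped here

        // Templated, so std.bigint is only imported when this is instantiated.
        auto ref bigIntValue()()
        {
            import std.bigint : BigInt;
            return *cast(BigInt*) storage;
        }

        static JSONNumber fromBigInt()(string digits)
        {
            import std.bigint : BigInt;
            JSONNumber n;
            n.storage = cast(void*) new BigInt(digits);
            return n;
        }
    }

    unittest
    {
        import std.bigint : BigInt;
        auto n = JSONNumber.fromBigInt("18446744073709551616");   // 2^64, too large for ulong
        assert(n.bigIntValue() == BigInt("18446744073709551616"));
    }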
Aug 14 2015
On Friday, 14 August 2015 at 07:14:34 UTC, Sönke Ludwig wrote:Am 14.08.2015 um 07:11 schrieb Walter Bright:Why can't you specify many types? You should be able to query the range/precision of each type?Make the type for storing a Number be a template parameter.Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow.
Aug 14 2015
On 8/14/2015 12:14 AM, Sönke Ludwig wrote:Am 14.08.2015 um 07:11 schrieb Walter Bright:Two other solutions: 1. 'real' has enough precision to hold 64 bit integers. 2. You can use a union of 'long' and a template type T. Use the 'long' if it fits, and T if it doesn't.Make the type for storing a Number be a template parameter.Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow.
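A sketch of that union idea (illustrative only; T could be BigInt, a future Decimal, or simply the raw input slice):

    struct IntValue(T)
    {
        bool fitsInLong;        // true -> the value is stored in 'small'
        union
        {
            long small;
            T    large;         // fallback representation when it doesn't fit
        }

        this(long v) { fitsInLong = true;  small = v; }
        this(T v)    { fitsInLong = false; large = v; }
    }

    unittest
    {
        alias Num = IntValue!string;            // here the fallback is the raw slice
        auto a = Num(42L);
        auto b = Num("18446744073709551616");   // doesn't fit into a long
        assert(a.fitsInLong && a.small == 42);
        assert(!b.fitsInLong && b.large == "18446744073709551616");
    }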
Aug 14 2015
On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:1. 'real' has enough precision to hold 64 bit integers.Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Aug 14 2015
On 8/14/2015 2:20 AM, Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com> wrote:On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:You can always use T for that.1. 'real' has enough precision to hold 64 bit integers.Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Aug 14 2015
On Friday, 14 August 2015 at 09:20:14 UTC, Ola Fosheim Grøstad wrote:On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:actually the x87 format has 64 mantissa bits, although the bit 63 is always '1' for normalized numbers.1. 'real' has enough precision to hold 64 bit integers.Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Aug 14 2015
On Friday, 14 August 2015 at 11:44:35 UTC, Matthias Bentrup wrote:On Friday, 14 August 2015 at 09:20:14 UTC, Ola Fosheim Grøstad wrote:Yes, Walter was right. The most negative number can be represented since it is a -(2^63) , so you only need the exponent to represent it (you only need 1 bit from the mantissa).On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:actually the x87 format has 64 mantissa bits, although the bit 63 is always '1' for normalized numbers.1. 'real' has enough precision to hold 64 bit integers.Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Aug 14 2015
Am 11.08.2015 um 19:30 schrieb deadalnix:Ok some actionable items. 1/ How big is a JSON struct ? What is the biggest element in the union ? Is that element really needed ? Recurse.See http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html The question whether each field is "really" needed obviously depends on the application. However, the biggest type is BigInt that, form a quick look, contains a dynamic array + a bool field, so it's not as compact as it could be, but also not really large. There is also an additional Location field that may sometimes be important for good error messages and the like and sometimes may be totally unneeded. However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or using a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).2/ As far as I can see, the element are discriminated using typeid. An enum is preferable as the compiler would know values ahead of time and optimize based on this. It also allow use of things like final switch.Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...). Now Phobos unfortunately only has Algebraic, which not only doesn't have a type enum, but is currently also really bad at keeping static type information when forwarding function calls or operators. The only options were basically to resort to Algebraic for now, but have something that works, or to first implement an alternative algebraic type and get it accepted into Phobos, which would delay the whole process nearly indefinitely.3/ Going from the untyped world to the typed world and provide an API to get back to the untyped word is a loser strategy. That sounds true intuitively, but also from my experience manipulating JSON in various languages. The Nodes produced by this lib need to be "manipulatable" as the unstructured values they represent.It isn't really clear to me what you mean by this. What exactly about JSONValue can't be manipulated like the "unstructured values [it] represent[s]"? Or do you perhaps mean the JSON -> deserialize -> manipulate -> serialize -> JSON approach? That definitely is not a "loser strategy"*, but yes, it is limited to applications where you have a partially fixed schema. However, arguably most applications fall into that category. * OT: My personal observation is that sadly the overall tone in the community has generally become a lot less friendly over the last months. I'm a bit worried about where this may lead in the long term.
Aug 11 2015
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:See http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html The question whether each field is "really" needed obviously depends on the application. However, the biggest type is BigInt that, form a quick look, contains a dynamic array + a bool field, so it's not as compact as it could be, but also not really large. There is also an additional Location field that may sometimes be important for good error messages and the like and sometimes may be totally unneeded.Urg. Looks like BigInt should steal a bit somewhere instead of having a bool like this. That is not really your lib's fault, but that's quite an heavy cost. Consider this, if the struct fit into 2 registers, it will be passed around as such rather than in memory. That is a significant difference. For BigInt itself, and, by proxy, for the JSON library. Putting the BigInt thing aside, it seems like the biggest field in there is an array of JSONValues or a string. For the string, you can artificially limit the length by 3 bits to stick a tag. That still give absurdly large strings. For the JSONValue case, the alignment on the pointer is such as you can steal 3 bits from there. Or as for string, the length can be used. It seems very realizable to me to have the JSONValue struct fit into 2 registers, granted the tag fit in 3 bits (8 different types). I can help with that if you want to.However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or using a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).Indeed. Still, JSON nodes should be as lightweight as possible.That is a great point that I haven't considered. I'd go the other way around about it: providing a compatible typeid based struct from the enum tagged one for compatibility. It can even be alias this so the transition is transparent. The transformation is not bijective, so that'd be great to get the most restrictive form (the enum) and fallback on the least restrictive one (alias this) when wanted.2/ As far as I can see, the element are discriminated using typeid. An enum is preferable as the compiler would know values ahead of time and optimize based on this. It also allow use of things like final switch.Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).Now Phobos unfortunately only has Algebraic, which not only doesn't have a type enum, but is currently also really bad at keeping static type information when forwarding function calls or operators. The only options were basically to resort to Algebraic for now, but have something that works, or to first implement an alternative algebraic type and get it accepted into Phobos, which would delay the whole process nearly indefinitely.That's fine. Done is better than perfect. 
Still API changes tend to be problematic, so we need to nail that part at least, and an enum with fallback on typeid based solution seems like the best option.Or do you perhaps mean the JSON -> deserialize -> manipulate -> serialize -> JSON approach? That definitely is not a "loser strategy"*, but yes, it is limited to applications where you have a partially fixed schema. However, arguably most applications fall into that category.Yes.
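To illustrate the bit-stealing idea mentioned above (a rough sketch, not a proposed implementation): with 8-byte-aligned payloads the low 3 bits of a pointer are always zero and can carry the type tag, so tag plus payload fit in one machine word.

    enum Tag : ubyte { null_, boolean, integer, floating, text, array, object }

    struct Packed
    {
        private size_t bits;   // pointer with the tag stored in the low 3 bits

        this(void* payload, Tag tag)
        {
            assert((cast(size_t) payload & 7) == 0);   // needs >= 8-byte alignment
            bits = cast(size_t) payload | tag;
        }

        Tag   tag()     const { return cast(Tag)(bits & 7); }
        void* payload() const { return cast(void*)(bits & ~cast(size_t) 7); }
    }

    unittest
    {
        auto p = new long;   // GC allocations are suitably aligned
        *p = 123;
        auto v = Packed(p, Tag.integer);
        assert(v.tag == Tag.integer);
        assert(*cast(long*) v.payload == 123);
    }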
Aug 11 2015
Am 12.08.2015 um 00:21 schrieb deadalnix:On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:Agreed, this was what I also thought. Considering that BigInt is heavy anyway, Dimitry's suggestion to store a "BigInt*" sounds like a good idea to sidestep that issue, though.See http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html The question whether each field is "really" needed obviously depends on the application. However, the biggest type is BigInt that, form a quick look, contains a dynamic array + a bool field, so it's not as compact as it could be, but also not really large. There is also an additional Location field that may sometimes be important for good error messages and the like and sometimes may be totally unneeded.Urg. Looks like BigInt should steal a bit somewhere instead of having a bool like this. That is not really your lib's fault, but that's quite an heavy cost. Consider this, if the struct fit into 2 registers, it will be passed around as such rather than in memory. That is a significant difference. For BigInt itself, and, by proxy, for the JSON library.Putting the BigInt thing aside, it seems like the biggest field in there is an array of JSONValues or a string. For the string, you can artificially limit the length by 3 bits to stick a tag. That still give absurdly large strings. For the JSONValue case, the alignment on the pointer is such as you can steal 3 bits from there. Or as for string, the length can be used. It seems very realizable to me to have the JSONValue struct fit into 2 registers, granted the tag fit in 3 bits (8 different types). I can help with that if you want to.The question is mainly just, should we decide on a single way to represent values (either speed, or features), or let the library user decide by either making JSONValue a template, or by providing two separate structs optimized for each case. In the latter case, we could really optimize on all fronts and for example use custom containers that use less allocations and are more cache friendly than the built-in ones.As long as the set of types is fixed, it would even be bijective. Anyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary. The optimization to store the type enum in the length field of dynamic arrays could also be built into the generic type.However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or using a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).Indeed. Still, JSON nodes should be as lightweight as possible.That is a great point that I haven't considered. I'd go the other way around about it: providing a compatible typeid based struct from the enum tagged one for compatibility. It can even be alias this so the transition is transparent. The transformation is not bijective, so that'd be great to get the most restrictive form (the enum) and fallback on the least restrictive one (alias this) when wanted.2/ As far as I can see, the element are discriminated using typeid. 
An enum is preferable as the compiler would know values ahead of time and optimize based on this. It also allow use of things like final switch.Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).Yeah, the transition is indeed problematic. Sadly the "alias this" idea wouldn't work for for that either, because operators and methods of the enum based algebraic type usually have different return types.Now Phobos unfortunately only has Algebraic, which not only doesn't have a type enum, but is currently also really bad at keeping static type information when forwarding function calls or operators. The only options were basically to resort to Algebraic for now, but have something that works, or to first implement an alternative algebraic type and get it accepted into Phobos, which would delay the whole process nearly indefinitely.That's fine. Done is better than perfect. Still API changes tend to be problematic, so we need to nail that part at least, and an enum with fallback on typeid based solution seems like the best option.Just to state explicitly what I mean: This strategy has the most efficient in-memory storage format and profits from all the static type checking niceties of the compiler. It also means that there is a documented schema in the code that be used for reference by the developers and that will automatically be verified by the serializer, resulting in less and better checked code. So where applicable I claim that this is the best strategy to work with such data. For maximum efficiency, it can also be transparently combined with the pull parser. The pull parser can for example be used to jump between array entries and the serializer then reads each single array entry.Or do you perhaps mean the JSON -> deserialize -> manipulate -> serialize -> JSON approach? That definitely is not a "loser strategy"*, but yes, it is limited to applications where you have a partially fixed schema. However, arguably most applications fall into that category.Yes.
Aug 12 2015
Anyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary.First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
Aug 12 2015
On 8/12/15 5:43 AM, Sönke Ludwig wrote:struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, AndreiAnyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary.First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
Aug 14 2015
On 08/14/2015 01:40 PM, Andrei Alexandrescu wrote:On 8/12/15 5:43 AM, Sönke Ludwig wrote:No, it isn't. I believe the word you might want is "pleonasm". :o)struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoronAnyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary.First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.as there's no untagged algebraic type).The tag is an implementation detail. Algebraic types are actually more naturally expressed as polymorphic higher-order functions.
Aug 14 2015
On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:On 8/12/15 5:43 AM, Sönke Ludwig wrote:Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception. Andreistruct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, AndreiAnyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary.First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
Aug 17 2015
On 17-Aug-2015 21:12, Andrei Alexandrescu wrote:On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:Actually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is not In code: union NiftyTaggedUnion { // pointer must be at least 4-byte aligned // To discern int tag must have the LSB == 1 // this assumes little-endian though, big-endian is doable too property bool isIntTag(){ return common.head & 1; } IntTagged intTagged; PtrTagged ptrTagged; CommonUnion common; } struct CommonUnion { ubyte[size_of_max_builtin] store; // this is where the type-tag starts - pointer or int uint head; } union IntTagged // int-tagged { union{ // builtins go here int ival; double dval; // .... } uint tag; } union PtrTagged // ptr to typeinfo scheme { ubyte[size_of_max_builtin] payload; TypeInfo* pinfo; } It's going to be challenging but I think I can pull off even nan-boxing with this scheme. -- Dmitry OlshanskyOn 8/12/15 5:43 AM, Sönke Ludwig wrote:Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, AndreiAnyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary.First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
Aug 17 2015
On 8/17/15 2:47 PM, Dmitry Olshansky wrote:Actually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is notBut a pointer tag can do everything that an integer tag does. -- Andrei
Aug 17 2015
On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:On 8/17/15 2:47 PM, Dmitry Olshansky wrote:albeit quite a deal slooower. -- Dmitry OlshanskyActually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is notBut a pointer tag can do everything that an integer tag does. -- Andrei
Aug 17 2015
On 8/18/15 2:55 AM, Dmitry Olshansky wrote:On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- AndreiOn 8/17/15 2:47 PM, Dmitry Olshansky wrote:albeit quite a deal slooower.Actually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is notBut a pointer tag can do everything that an integer tag does. -- Andrei
Aug 18 2015
On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:On 8/18/15 2:55 AM, Dmitry Olshansky wrote:Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain. Strictly speaking pointer with vtbl is about as fast as switch but when we have to switch on 2 types the vtbl dispatch needs to be based on 2 types instead of one. So ideally we need vtbl per pair of type to support e.g. fast binary operators on TaggedAlgebraic. -- Dmitry OlshanskyOn 18-Aug-2015 01:33, Andrei Alexandrescu wrote:I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- AndreiOn 8/17/15 2:47 PM, Dmitry Olshansky wrote:albeit quite a deal slooower.Actually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is notBut a pointer tag can do everything that an integer tag does. -- Andrei
Aug 18 2015
On 8/18/15 12:31 PM, Dmitry Olshansky wrote:On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:Agreed. These are small gains though unless tight loops are concerned.On 8/18/15 2:55 AM, Dmitry Olshansky wrote:Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain.On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- AndreiOn 8/17/15 2:47 PM, Dmitry Olshansky wrote:albeit quite a deal slooower.Actually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is notBut a pointer tag can do everything that an integer tag does. -- AndreiStrictly speaking pointer with vtbl is about as fast as switch but when we have to switch on 2 types the vtbl dispatch needs to be based on 2 types instead of one. So ideally we need vtbl per pair of type to support e.g. fast binary operators on TaggedAlgebraic.But I'm talking about using pointers for indirect calls IN ADDITION to using pointers for simple integral comparison. So the comparison is not appropriate. It's better to have both options instead of just one. Andrei
Aug 18 2015
On 18-Aug-2015 19:35, Andrei Alexandrescu wrote:On 8/18/15 12:31 PM, Dmitry Olshansky wrote:If common type fast path with 0 is not relevant then the only gain of integer is being able to fit it in a couple of bytes or even reuse some vacant bits. Another thing is that function addresses are rather sparse so switch statement should do some special preprocessing to make it more dense: - subtract start of the code segment (maybe, but this won't work with DLLs though) - shift right by 2(4?) as functions are usually aligned -- Dmitry OlshanskyOn 18-Aug-2015 16:19, Andrei Alexandrescu wrote:Agreed. These are small gains though unless tight loops are concerned.On 8/18/15 2:55 AM, Dmitry Olshansky wrote:Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain.On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- AndreiOn 8/17/15 2:47 PM, Dmitry Olshansky wrote:albeit quite a deal slooower.Actually one can combine the two: - use integer type tag for everything built-in - use pointer tag for what is notBut a pointer tag can do everything that an integer tag does. -- AndreiStrictly speaking pointer with vtbl is about as fast as switch but when we have to switch on 2 types the vtbl dispatch needs to be based on 2 types instead of one. So ideally we need vtbl per pair of type to support e.g. fast binary operators on TaggedAlgebraic.But I'm talking about using pointers for indirect calls IN ADDITION to using pointers for simple integral comparison. So the comparison is not appropriate. It's better to have both options instead of just one.
Aug 18 2015
On Monday, 17 August 2015 at 18:12:02 UTC, Andrei Alexandrescu wrote:On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:From the compiler perspective, the tag is much nicer. Compiler can use jump table for instance. It is not a good solution for Variant (which needs to be able to represent arbitrary types) but if the amount of types is finite, tag is almost always a win. In the case of JSON, using a tag and packing trick, it is possible to pack everything in a 2 pointers sized struct without much trouble.On 8/12/15 5:43 AM, Sönke Ludwig wrote:Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception. Andreistruct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, AndreiAnyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth to delay the JSON module for that if necessary.First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
Aug 17 2015
On 8/17/15 2:51 PM, deadalnix wrote:From the compiler perspective, the tag is much nicer. Compiler can use jump table for instance.The pointer is a more direct conduit to a jump table.It is not a good solution for Variant (which needs to be able to represent arbitrary types) but if the amount of types is finite, tag is almost always a win. In the case of JSON, using a tag and packing trick, it is possible to pack everything in a 2 pointers sized struct without much trouble.Point taken. Question is if this is worth it. Andrei
Aug 17 2015
On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:On 8/17/15 2:51 PM, deadalnix wrote:Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.From the compiler perspective, the tag is much nicer. Compiler can use jump table for instance.The pointer is a more direct conduit to a jump table.Anything that makes it fit in two registers instead of three (= 2 regs + memory, in practice) is most likely worth it.It is not a good solution for Variant (which needs to be able to represent arbitrary types) but if the amount of types is finite, tag is almost always a win. In the case of JSON, using a tag and packing trick, it is possible to pack everything in a 2 pointers sized struct without much trouble.Point taken. Question is if this is worth it.
Aug 18 2015
Am Tue, 18 Aug 2015 09:10:25 +0000 schrieb "Marc Schütz" <schuetzm gmx.net>:On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo ARM ASM is easier to read for me. Feel free to switch to X86. necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead. If you use a pointer: http://goo.gl/9kb0vQ No jump table optimization. Cache should be OK as well. No call overhead. Note how both examples can also combine the code for uint/int. If you use a function pointer instead you'll call a different function. Calling a function through a pointer: http://goo.gl/zTU3sA You have one indirect call. Probably hard for the branch prediction, although I don't really know. Probably also worse regarding cache. I also cheated by using one pointer only for add. In reality you'll need to store one pointer per operation or use a switch inside the called function. I think it's reasonable to expect the enum version to be faster. To be really sure we'd need some benchmarks.On 8/17/15 2:51 PM, deadalnix wrote: Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.From the compiler perspective, the tag is much nicer. Compiler can use jump table for instance.The pointer is a more direct conduit to a jump table.
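To make the third variant concrete, here is a small self-contained D sketch of dispatching through a stored function pointer. The struct layout and names are purely illustrative; this is not code from taggedalgebraic or std.variant.

    // Illustrative only: per-value function pointer dispatch, the approach
    // compared against the enum switch in the examples above.
    struct Dispatched
    {
        // in practice one pointer per operation (or a dispatcher that
        // switches internally), which is the overhead pointed out above
        double function(ref Dispatched, double) addFn;
        union { int i; double d; }
    }

    double addInt(ref Dispatched v, double rhs)    { return v.i + rhs; }
    double addDouble(ref Dispatched v, double rhs) { return v.d + rhs; }

    double add(ref Dispatched v, double rhs)
    {
        return v.addFn(v, rhs);   // one indirect call, no branch on a tag
    }

    void main()
    {
        Dispatched v;
        v.addFn = &addInt;
        v.i = 41;
        assert(add(v, 1.0) == 42.0);
    }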
Aug 18 2015
On 8/18/15 7:02 AM, Johannes Pfau wrote:Am Tue, 18 Aug 2015 09:10:25 +0000 schrieb "Marc Schütz" <schuetzm gmx.net>:That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- AndreiOn Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo ARM ASM is easier to read for me. Feel free to switch to X86. necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead. If you use a pointer: http://goo.gl/9kb0vQOn 8/17/15 2:51 PM, deadalnix wrote:Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.From the compiler perspective, the tag is much nicer. Compiler can use jump table for instance.The pointer is a more direct conduit to a jump table.
Aug 18 2015
Am Tue, 18 Aug 2015 10:58:17 -0400 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:On 8/18/15 7:02 AM, Johannes Pfau wrote:Yes, if we enable switch for pointers we get nicer D code. No, this won't improve the ASM much: Enum values start at 0 and are consecutive. With a final switch they're also bounded. All these points do not apply to pointers. They don't start at 0, are not guaranteed to be consecutive and likely can't be used with final switch. Because of that a switch on pointers can never use jump tables.Am Tue, 18 Aug 2015 09:10:25 +0000 schrieb "Marc Schütz" <schuetzm gmx.net>: That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo ARM ASM is easier to read for me. Feel free to switch to X86. be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead. If you use a pointer: http://goo.gl/9kb0vQOn 8/17/15 2:51 PM, deadalnix wrote:Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct insteadFrom the compiler perspective, the tag is much nicer. Compiler can use jump table for instance.The pointer is a more direct conduit to a jump table.
Aug 18 2015
On 8/18/15 11:39 AM, Johannes Pfau wrote:No, this won't improve the ASM much: Enum values start at 0 and are consecutive. With a final switch they're also bounded. All these points do not apply to pointers. They don't start at 0, are not guaranteed to be consecutive and likely can't be used with final switch. Because of that a switch on pointers can never use jump tables.I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei
Aug 18 2015
On Tuesday, 18 August 2015 at 16:22:20 UTC, Andrei Alexandrescu wrote:On 8/18/15 11:39 AM, Johannes Pfau wrote:No, enums can also be crammed inline in the code for cheap, they can be inserted into an existing structure for cheap using bit manipulation most of the time, and the compiler can check that all cases are handled in an exhaustive manner. It is not getting thinner.No, this won't improve the ASM much: Enum values start at 0 and are consecutive. With a final switch they're also bounded. All these points do not apply to pointers. They don't start at 0, are not guaranteed to be consecutive and likely can't be used with final switch. Because of that a switch on pointers can never use jump tables.I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei
Aug 18 2015
On Tuesday, 18 August 2015 at 14:58:08 UTC, Andrei Alexandrescu wrote:That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- AndreiNo it is not. If the set of values is not compact, there is no jump table.
Aug 18 2015
On 8/18/15 5:10 AM, "Marc Schütz" <schuetzm gmx.net> wrote:On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:No, in std.variant it points to a dispatcher function. -- AndreiOn 8/17/15 2:51 PM, deadalnix wrote:Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct insteadFrom the compiler perspective, the tag is much nicer. Compiler can use jump table for instance.The pointer is a more direct conduit to a jump table.
Aug 18 2015
Am 17.08.2015 um 20:12 schrieb Andrei Alexandrescu:On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:(reposting to NG, accidentally replied by e-mail) Some more points come to mind: - The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code. - It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade. - A hypothesis is that it is faster, because there is no function call indirection involved. - It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops. - The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful. They both have their place, but IMO where the pointer approach really shines is for unbounded Variant types.struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, AndreiPing on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception. Andrei
Aug 17 2015
Am Mon, 17 Aug 2015 20:56:18 +0200 schrieb Sönke Ludwig <sludwig outerproduct.org>:Am 17.08.2015 um 20:12 schrieb Andrei Alexandrescu:I think Andrei's point is that a pointer tag can do most things an integral tag could as you don't have to dereference the pointer: void* tag; if (tag == &someFunc!A) So the only benefit is that the compiler knows that the _enum_ (not simply an integral) tag is bounded. So we gain: * easier debugging (readable type tag) * potentially better codegen (jump tables fit perfectly: ordered values, 0-x, no gaps) * final switch In some cases enum tags might also be smaller than a pointer.On 8/14/15 7:40 AM, Andrei Alexandrescu wrote: (reposting to NG, accidentally replied by e-mail) Some more points come to mind: - The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code. - It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade. - A hypothesis is that it is faster, because there is no function call indirection involved. - It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops. - The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful. They both have their place, but IMO where the pointer approach really shines is for unbounded Variant types.struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, AndreiPing on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception. Andrei
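As a self-contained D sketch of the two schemes compared above (types and names are invented for illustration; this is neither the taggedalgebraic nor the std.variant implementation):

    enum Kind { integer, boolean }

    struct EnumTagged
    {
        Kind kind;                      // 0-based, consecutive, final-switchable
        union { long i; bool b; }
    }

    long asLong(EnumTagged v)
    {
        long result;
        final switch (v.kind)           // compiler may emit a dense jump table
        {
            case Kind.integer: result = v.i; break;
            case Kind.boolean: result = v.b ? 1 : 0; break;
        }
        return result;
    }

    void handler(T)() {}                // one unique address per type

    struct PtrTagged
    {
        void function() tag;            // pointer tag, e.g. &handler!long
        union { long i; bool b; }
    }

    long asLong(PtrTagged v)
    {
        // the pointer compares like an integral tag, no dereference needed,
        // but it cannot drive a final switch / jump table
        if (v.tag == &handler!long) return v.i;
        if (v.tag == &handler!bool) return v.b ? 1 : 0;
        assert(0);
    }

    void main()
    {
        EnumTagged e;
        e.kind = Kind.boolean;
        e.b = true;
        assert(asLong(e) == 1);

        PtrTagged p;
        p.tag = &handler!long;
        p.i = 42;
        assert(asLong(p) == 42);
    }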
Aug 17 2015
Why doesn't this work: JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`); but this: string str = `{"a": true, "b": "test"}`; JSONValue x = parseJSONValue(str); works fine?
Aug 17 2015
Am 17.08.2015 um 21:32 schrieb Suliman:Why doesn't this work: JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`); but this: string str = `{"a": true, "b": "test"}`; JSONValue x = parseJSONValue(str); works fine?toJSONValue() is the right function in this case. I've updated the docs/examples to make that clearer.
Aug 17 2015
On Monday, 17 August 2015 at 20:07:24 UTC, Sönke Ludwig wrote:Am 17.08.2015 um 21:32 schrieb Suliman:I think I'm misunderstanding the concept of ranges. I reread the docs but can't understand what I am missing. Ranges are a way to access sequences, but why can't I take input from a string? Isn't a string a range?Why doesn't this work: JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`); but this: string str = `{"a": true, "b": "test"}`; JSONValue x = parseJSONValue(str); works fine?toJSONValue() is the right function in this case. I've updated the docs/examples to make that clearer.
Aug 17 2015
Am 17.08.2015 um 22:23 schrieb Suliman:On Monday, 17 August 2015 at 20:07:24 UTC, Sönke Ludwig wrote:String is a valid range, but parseJSONValue takes a *reference* to a range, because it directly consumes the range and leaves anything that appears after the JSON value in the range. toJSONValue() on the other hand assumes that the JSON value occupies the whole input range.toJSONValue() is the right function in this case. I've updated the docs/examples to make that clearer.I think I'm misunderstanding the concept of ranges. I reread the docs but can't understand what I am missing. Ranges are a way to access sequences, but why can't I take input from a string? Isn't a string a range?
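A short sketch of that difference, using the function names from the proposal (the exact import path and the behaviour on the trailing input are assumptions on my part):

    import stdx.data.json : JSONValue, parseJSONValue, toJSONValue;

    void example()
    {
        // toJSONValue: the argument must contain exactly one JSON document
        JSONValue whole = toJSONValue(`{"a": true, "b": "test"}`);

        // parseJSONValue: takes the range by reference, consumes one JSON
        // value and leaves whatever follows in the range
        string input = `{"a": true}{"b": false}`;
        JSONValue first  = parseJSONValue(input);  // input now holds `{"b": false}`
        JSONValue second = parseJSONValue(input);  // input is now empty
    }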
Aug 17 2015
String is a valid range, but parseJSONValue takes a *reference* to a range, because it directly consumes the range and leaves anything that appears after the JSON value in the range. toJSONValue() on the other hand assumes that the JSON value occupies the whole input range.Yes, I understood, but maybe it's better to rename it (or add a note in the docs; I've seen your changes, but I think you should extend them more, to prevent people from making the mistake that I did), because I think it would be hard to understand for people who come from other languages. I have been writing D for a long time, but some things still confuse me...Failed to download http://code.dlang.org/packages/vibe-d/0.7.24.zip: 500 Internal Server Error Possibly it was an issue with my provider. I will check it later. The error above occurred while trying to download the new version of vibe.d.Do you use DUB to build? It should automatically download the dependency.
Aug 17 2015
Am 17.08.2015 um 22:58 schrieb Suliman:I agree that the naming can be a bit confusing at first, but I chose those names to be consistent with std.conv (to!T and parse!T). I've also just noticed that the parser module example erroneously uses parseJSONValue(). With proper examples, this should hopefully not be that big of a deal.String is a valid range, but parseJSONValue takes a *reference* to a range, because it directly consumes the range and leaves anything that appears after the JSON value in the range. toJSON() on the other hand assumes that the JSON value occupies the whole input range.Yeas, I understood, but maybe it's better to rename it (or add attention in docs, I seen your changes, but I think that you should extend it more, to prevent people doing mistake that I did) , because I think that it would be hard to understand it for people who come from other languages. I am writing in D for a long time, but still some things make me confuse...
Aug 17 2015
Also I can't build last build from git. I am getting error: source\stdx\data\json\value.d(25,8): Error: module taggedalgebraic is in file 'taggedalgebraic.d' which cannot be read
Aug 17 2015
Am 17.08.2015 um 22:31 schrieb Suliman:Also I can't build last build from git. I am getting error: source\stdx\data\json\value.d(25,8): Error: module taggedalgebraic is in file 'taggedalgebraic.d' which cannot be readDo you use DUB to build? It should automatically download the dependency. Alternatively, it's located here: https://github.com/s-ludwig/taggedalgebraic/blob/master/source/taggedalgebraic.d
Aug 17 2015
Also, could you look at this thread: http://stackoverflow.com/questions/32033817/how-to-insert-date-to-arangodb and suggest your variant or approve one of the existing ones.
Aug 17 2015
On 8/17/15 2:56 PM, Sönke Ludwig wrote:- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.OK.- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.Sounds tenuous.- A hypothesis is that it is faster, because there is no function call indirection involved.Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.I'm unclear on that. Could you please point me to the actual file and lines?- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.Unclear on this. Andrei
Aug 17 2015
Am 18.08.2015 um 00:37 schrieb Andrei Alexandrescu:On 8/17/15 2:56 PM, Sönke Ludwig wrote:It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.OK.- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.Sounds tenuous.Got that.- A hypothesis is that it is faster, because there is no function call indirection involved.Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.I'm unclear on that. Could you please point me to the actual file and lines?I'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily. [1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145 [2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.Unclear on this.
Aug 18 2015
On 8/18/15 1:21 PM, Sönke Ludwig wrote:Am 18.08.2015 um 00:37 schrieb Andrei Alexandrescu:Well I guess I would, but no matter. It's something where reasonable people may disagree.On 8/17/15 2:56 PM, Sönke Ludwig wrote:It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.OK.- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.Sounds tenuous.Classic code factoring can be done to avoid duplication.Got that.- A hypothesis is that it is faster, because there is no function call indirection involved.Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.I'm unclear on that. Could you please point me to the actual file and lines?Thanks. AndreiI'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily. [1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145 [2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.Unclear on this.
Aug 21 2015
Am 21.08.2015 um 18:56 schrieb Andrei Alexandrescu:On 8/18/15 1:21 PM, Sönke Ludwig wrote:It depends on the perspective/use case, so it's surely not unreasonable to disagree here. But I'm especially not happy with the "final switch" argument getting dismissed so easily. By the same logic, we could also question the existence of "final switch", or even "switch", as a feature in the first place. Performance benefits are certainly nice, too, but that's really just an implementation detail. The important trait is that the types get a name and that they form an enumerable set. This is quite similar to comparing a struct with named members to an anonymous Tuple!(T...).Am 18.08.2015 um 00:37 schrieb Andrei Alexandrescu:Well I guess I would, but no matter. It's something where reasonable people may disagree.On 8/17/15 2:56 PM, Sönke Ludwig wrote:It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.OK.- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.Sounds tenuous.
Aug 22 2015
On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:Just to state explicitly what I mean: This strategy has the most efficient in-memory storage format and profits from all the static type checking niceties of the compiler. It also means that there is a documented schema in the code that can be used for reference by the developers and that will automatically be verified by the serializer, resulting in less and better checked code. So where applicable I claim that this is the best strategy to work with such data. For maximum efficiency, it can also be transparently combined with the pull parser. The pull parser can for example be used to jump between array entries and the serializer then reads each single array entry.Thing is, the schema is not always known perfectly? Typical case is JSON used for configuration, and diverse versions of the software adding new configuration capabilities, or ignoring old ones.
Aug 12 2015
Am 12.08.2015 um 19:10 schrieb deadalnix:On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:For example in the serialization framework of vibe.d you can have optional or Nullable fields, you can choose to ignore or error out on unknown fields, and you can have fields of type "Json" or associative arrays to match arbitrary structures. This usually gives enough flexibility, assuming that the program is just interested in fields that it knows about. Of course there are situations where you really just want to access the raw JSON structure, possibly because you are just interested in a small subset of the data. Both, the DOM or the pull parser based approaches, fit in there, based on convenience vs. performance considerations. But things like storing data as JSON in a database or implementing a JSON based protocol usually fit the schema based approach perfectly.Just to state explicitly what I mean: This strategy has the most efficient in-memory storage format and profits from all the static type checking niceties of the compiler. It also means that there is a documented schema in the code that be used for reference by the developers and that will automatically be verified by the serializer, resulting in less and better checked code. So where applicable I claim that this is the best strategy to work with such data. For maximum efficiency, it can also be transparently combined with the pull parser. The pull parser can for example be used to jump between array entries and the serializer then reads each single array entry.Thing is, the schema is not always known perfectly? Typical case is JSON used for configuration, and diverse version of the software adding new configurations capabilities, or ignoring old ones.
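For reference, a sketch of that schema-based style using vibe.d's serializer; the attribute and function names are given as I recall them from vibe.data.json / vibe.data.serialization, so treat the details as assumptions:

    import vibe.data.json : Json, deserializeJson;
    import vibe.data.serialization : optional;

    struct ServerConfig
    {
        string host;
        @optional ushort port = 8080;   // absent in the JSON -> default is kept
        @optional Json extra;           // arbitrary, schema-less part of the data
    }

    void example()
    {
        auto cfg = deserializeJson!ServerConfig(`{"host": "example.org"}`);
        assert(cfg.host == "example.org" && cfg.port == 8080);
    }

Unknown fields can then either be ignored or turned into errors, depending on the chosen serialization policy, as described above.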
Aug 12 2015
On 8/12/2015 10:10 AM, deadalnix wrote:Thing is, the schema is not always known perfectly? Typical case is JSON used for configuration, and diverse version of the software adding new configurations capabilities, or ignoring old ones.Hah, I'd like to replace dmd.conf with a .json file.
Aug 12 2015
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:On 8/12/2015 10:10 AM, deadalnix wrote:Not .json! No configuration file should be in a format that doesn't support comments.Thing is, the schema is not always known perfectly? Typical case is JSON used for configuration, and diverse version of the software adding new configurations capabilities, or ignoring old ones.Hah, I'd like to replace dmd.conf with a .json file.
Aug 13 2015
On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]
Aug 13 2015
On Friday, 14 August 2015 at 00:16:47 UTC, Walter Bright wrote:On 8/13/2015 5:22 AM, CraigDillabaugh wrote:You are cheating :o) There do seem to be some ways to comment JSON files, but they all feel, and look like hacks. I think something like YAML or SDLang even would be better. Anyway, at least you aren't proposing XML, so I won't complain too loudly.No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]
Aug 13 2015
On 8/13/15 8:16 PM, Walter Bright wrote:On 8/13/2015 5:22 AM, CraigDillabaugh wrote:There can't be two comments with the same key though. -- AndreiNo configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]
Aug 14 2015
On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:On 8/13/15 8:16 PM, Walter Bright wrote:This is invalid (though probably unintentionally). An array cannot have names for elements.On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiWhy not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this. -Steve
Aug 14 2015
On Friday, 14 August 2015 at 13:10:53 UTC, Steven Schveighoffer wrote:On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:http://tools.ietf.org/html/rfc7159 «The names within an object SHOULD be unique.» «An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates. JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.»On 8/13/15 8:16 PM, Walter Bright wrote:This is invalid (though probably unintentionally). An array cannot have names for elements.On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiWhy not? I believe this is valid json:
Aug 14 2015
On 8/14/15 9:10 AM, Steven Schveighoffer wrote:On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:On 8/13/15 8:16 PM, Walter Bright wrote:On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiWhy not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
Aug 14 2015
On Friday, 14 August 2015 at 13:30:44 UTC, Andrei Alexandrescu wrote:On 8/14/15 9:10 AM, Steven Schveighoffer wrote:No, he is wrong, and even if he was right, he would still be wrong. JSON objects are unordered so if you read then write you can get: { "comment" : "this is the second value", "value1" : 42, "value2" : 101 }On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplica e-keys-in-an-object -- AndreiOn 8/13/15 8:16 PM, Walter Bright wrote:This is invalid (though probably unintentionally). An array cannot have names for elements.On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiWhy not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
Aug 14 2015
On 8/14/15 9:37 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:On Friday, 14 August 2015 at 13:30:44 UTC, Andrei Alexandrescu wrote:Yes, that's what I checked first :)On 8/14/15 9:10 AM, Steven Schveighoffer wrote:On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-objectOn 8/13/15 8:16 PM, Walter Bright wrote:This is invalid (though probably unintentionally). An array cannot have names for elements.On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiWhy not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.No, he is wrong, and even if he was right, he would still be wrong. JSON objects are unordered so if you read then write you can get: { "comment" : "this is the second value", "value1" : 42, "value2" : 101 }Sure, but: a) we aren't writing b) comments are for the human reader, not for the program. Dmd should ignore the comments, and it doesn't matter the order. c) it's not important, I think we all agree a format that has specific allowances for comments is better than json. -Steve
Aug 14 2015
On Friday, 14 August 2015 at 14:09:25 UTC, Steven Schveighoffer wrote:a) we aren't writing b) comments are for the human reader, not for the program. Dmd should ignore the comments, and it doesn't matter the order. c) it's not important, I think we all agree a format that has specific allowances for comments is better than json.One should have a config file format for which there are standard libraries that preserves structure and comments. It is quite common to have tools that read and write config files.
Aug 14 2015
On 8/14/15 10:44 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:On Friday, 14 August 2015 at 14:09:25 UTC, Steven Schveighoffer wrote:And that would be possible here. JSON file format says nothing about how the data is stored in your library. But again, not important. -Stevea) we aren't writing b) comments are for the human reader, not for the program. Dmd should ignore the comments, and it doesn't matter the order. c) it's not important, I think we all agree a format that has specific allowances for comments is better than json.One should have a config file format for which there are standard libraries that preserves structure and comments. It is quite common to have tools that read and write config files.
Aug 14 2015
On Friday, 14 August 2015 at 15:11:41 UTC, Steven Schveighoffer wrote:And that would be possible here. JSON file format says nothing about how the data is stored in your library. But again, not important.It isn't important since JSON is not too good as a config file format, but it is important when considering other formats. When you read a JSON file into Python or Javascript and write it back all dictionary objects will be restructured. For instance, when a tool reads a config file and removes attributes it is desirable that removed attributes are commented out. With JSON you would have to hack around it like this: [ {fieldname1:value1}, {fieldname2:value2} ] Which is ugly. I think it would be nice if all D tooling standardized on YAML and provided a convenient DOM for it. It is used quite a lot and editors have support for it.
Aug 14 2015
On Friday, 14 August 2015 at 15:29:12 UTC, Ola Fosheim Grøstad wrote:On Friday, 14 August 2015 at 15:11:41 UTC, Steven Schveighoffer wrote:It doesn't matter what you think of JSON. JSON is widely used and needed in the standard lib. PERIOD.And that would be possible here. JSON file format says nothing about how the data is stored in your library. But again, not important.It isn't important since JSON is not too good as a config file format, but it is important when considering other formats. When you read a JSON file into Python or Javascript and write it back all dictionary objects will be restructured. For instance, when a tool reads a config file and removes attributes it is desirable that removed attributes are commented out. With JSON you would have to hack around it like this: [ {fieldname1:value1}, {fieldname2:value2} ] Which is ugly. I think it would be nice if all D tooling standardized on YAML and provided a convenient DOM for it. It is used quite a lot and editors have support for it.
Aug 14 2015
On Friday, 14 August 2015 at 17:31:02 UTC, deadalnix wrote:JSON is widely used an needed in the standard lib. PERIOD.The discussion was about suitability as a standard config file format for D not whether it should be in the standard lib. JSON, XML and YAML all belong in a standard lib.
Aug 14 2015
On 8/14/15 1:30 PM, deadalnix wrote:It doesn't matter what you think of JSON. JSON is widely used an needed in the standard lib. PERIOD.I think you are missing that this sub-discussion is about using json to replace dmd configuration file. -Steve
Aug 14 2015
On Friday, 14 August 2015 at 17:40:01 UTC, Steven Schveighoffer wrote:On 8/14/15 1:30 PM, deadalnix wrote:dub uses sdlang, why not dmd?It doesn't matter what you think of JSON. JSON is widely used an needed in the standard lib. PERIOD.I think you are missing that this sub-discussion is about using json to replace dmd configuration file. -Steve
Aug 14 2015
On 8/14/2015 6:30 AM, Andrei Alexandrescu wrote:On 8/14/15 9:10 AM, Steven Schveighoffer wrote:When going for portability, it is not a good idea to emit duplicate keys because many json parsers fail on it. For our own json readers, such as reading a dmd.json file with our own parser, it should be fine.Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
Aug 14 2015
On 8/14/2015 5:51 AM, Andrei Alexandrescu wrote:On 8/13/15 8:16 PM, Walter Bright wrote:Should be { }, not [ ]On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiThe Json spec doesn't say that - it doesn't specify any semantic meaning.
Aug 14 2015
On 8/14/2015 1:30 PM, Walter Bright wrote:On 8/14/2015 5:51 AM, Andrei Alexandrescu wrote:That is, the ECMA 404 spec. There seems to be more than one JSON spec. www.ecma-international.org/.../files/.../ECMA-404.pdfOn 8/13/15 8:16 PM, Walter Bright wrote:Should be { }, not [ ]On 8/13/2015 5:22 AM, CraigDillabaugh wrote:No configuration file should be in a format that doesn't support comments.[ "comment" : "and you thought it couldn't have comments!" ]There can't be two comments with the same key though. -- AndreiThe Json spec doesn't say that - it doesn't specify any semantic meaning.
Aug 14 2015
On 08/14/2015 04:33 PM, Walter Bright wrote:That is, the ECMA 404 spec. There seems to be more than one JSON spec. www.ecma-international.org/.../files/.../ECMA-404.pdfAmusingly, that "ECMA-404" link results in an actual HTTP 404.
Aug 21 2015
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:Hah, I'd like to replace dmd.conf with a .json file.There's an awful lot of people out there replacing json with more ini-like files....
Aug 13 2015
On Friday, 14 August 2015 at 00:18:39 UTC, Adam D. Ruppe wrote:On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:Referring to TOML? https://github.com/toml-lang/tomlHah, I'd like to replace dmd.conf with a .json file.There's an awful lot of people out there replacing json with more ini-like files....
Aug 13 2015
On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.Hah, I'd like to replace dmd.conf with a .json file.There's an awful lot of people out there replacing json with more ini-like files....
Aug 13 2015
On 14-Aug-2015 03:48, Walter Bright wrote:On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files. -- Dmitry OlshanskyOn Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.Hah, I'd like to replace dmd.conf with a .json file.There's an awful lot of people out there replacing json with more ini-like files....
Aug 13 2015
On 8/13/2015 11:54 PM, Dmitry Olshansky wrote:On 14-Aug-2015 03:48, Walter Bright wrote:Yes, but we (will) have a .json parser in Phobos.On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files.On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.Hah, I'd like to replace dmd.conf with a .json file.There's an awful lot of people out there replacing json with more ini-like files....
Aug 14 2015
On 2015-08-14 10:04, Walter Bright wrote:Yes, but we (will) have a .json parser in Phobos.Time to add a YAML parser ;) -- /Jacob Carlborg
Aug 14 2015
On 15/08/2015 12:40 a.m., Jacob Carlborg wrote:On 2015-08-14 10:04, Walter Bright wrote:Heyyy Sonke ;)Yes, but we (will) have a .json parser in Phobos.Time to add a YAML parser ;)
Aug 14 2015
On Friday, 14 August 2015 at 12:40:32 UTC, Jacob Carlborg wrote:On 2015-08-14 10:04, Walter Bright wrote:I think kiith-sa has started on that: https://github.com/kiith-sa/D-YAMLYes, but we (will) have a .json parser in Phobos.Time to add a YAML parser ;)
Aug 14 2015
On 8/14/2015 5:40 AM, Jacob Carlborg wrote:On 2015-08-14 10:04, Walter Bright wrote:That's a good idea, but since dmd already emits json and requires incorporation of the json code, the fewer file formats it has to deal with, the better. Config files will work fine with json format.Yes, but we (will) have a .json parser in Phobos.Time to add a YAML parser ;)
Aug 14 2015
On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:On 8/14/2015 5:40 AM, Jacob Carlborg wrote:Walter, what should I do to comment out a string in the config for test purposes? How can it be done with JSON? I really think that dmd should use the same format as dubOn 2015-08-14 10:04, Walter Bright wrote:That's a good idea, but since dmd already emits json and requires incorporation of the json code, the fewer file formats it has to deal with, the better. Config files will work fine with json format.Yes, but we (will) have a .json parser in Phobos.Time to add a YAML parser ;)
Aug 14 2015
On 8/14/2015 9:58 PM, suliman wrote:On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:{ "comment" : "this is a comment" }Config files will work fine with json format.Walter, and what I should to do for commenting stringin config for test purpose? How it's can be done with json?I really think that dmd should use same format as dubjson is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)
Aug 14 2015
On Saturday, 15 August 2015 at 05:03:52 UTC, Walter Bright wrote:On 8/14/2015 9:58 PM, suliman wrote:And you end up with each D tool having their own config format… :-( http://www.json2yaml.com/On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:{ "comment" : "this is a comment" }Config files will work fine with json format.Walter, and what I should to do for commenting stringin config for test purpose? How it's can be done with json?I really think that dmd should use same format as dubjson is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)
Aug 15 2015
On 08/15/2015 01:03 AM, Walter Bright wrote:On 8/14/2015 9:58 PM, suliman wrote:I'll take an "invented our own, rather stupid and limited, format" over comments that ugly any day. Seriously, with DUB, I've been using json for configuration file a lot lately, and dmd.conf is a way nicer config format. There's very good reason DUB's added an alternate format.On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:{ "comment" : "this is a comment" }Config files will work fine with json format.Walter, and what I should to do for commenting stringin config for test purpose? How it's can be done with json?
Aug 21 2015
On 14-Aug-2015 11:04, Walter Bright wrote:On 8/13/2015 11:54 PM, Dmitry Olshansky wrote:We actually have a YAML parser in the DUB repository, so that can be copied over to the compiler source in the interim. And it doesn't have to be particularly fast; it just has to work reasonably well. -- Dmitry OlshanskyOn 14-Aug-2015 03:48, Walter Bright wrote:Yes, but we (will) have a .json parser in Phobos.On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files.On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.Hah, I'd like to replace dmd.conf with a .json file.There's an awful lot of people out there replacing json with more ini-like files....
Aug 14 2015
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or using a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).Maybe it is better to just focus on having a top-of-the-line parser and then let competing DOM implementations build on top of it. I'm personally only interested in structured JSON, I think most webapps use structured JSON informally.
Aug 12 2015
Am 11.08.2015 um 19:08 schrieb Atila Neves:On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:I think we really need to have an informal pre-vote about the BigInt and DOM efficiency vs. functionality issues. Basically there are four options for each: 1. Keep them: May have an impact on compile time for big DOMs (run time/memory consumption wouldn't be affected if a pointer to BigInt is stored). But provides an out-of-the-box experience for a broad set of applications. 2. Remove them: Results in a slim and clean API that is fast (to run/compile), but also one that will be less useful for certain applications. 3. Make them CT configurable: Best of both worlds in terms of speed, at the cost of a more complex API. 4. Use a string representation instead of BigInt: This has its own set of issues, but would also enable some special use cases [1] [2] ([2] is also solved by BigInt/Decimal support, though). I'd also like to postpone the main vote, if there are no objections, until the question of using a general enum based alternative to Algebraic is answered. I've published an initial candidate for this now [3]. These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way). There is also the topic of avoiding any redundancy in symbol names, which I don't agree with, but I would of course change it if the inclusion depends on that. [1]: https://github.com/rejectedsoftware/vibe.d/issues/431 [2]: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/10098/ [3]: http://code.dlang.org/packages/taggedalgebraic Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaI forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
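As an illustration of option 3, a rough sketch of what a compile-time switch could look like; the policy enum and the JSONValueT name are invented for this example and are not part of the proposal:

    import std.bigint : BigInt;

    enum NumberPolicy { doubleOnly, withBigInt }

    struct JSONValueT(NumberPolicy policy = NumberPolicy.doubleOnly)
    {
        static if (policy == NumberPolicy.withBigInt)
            BigInt* bigNumber;   // pointer keeps the common layout small
        double number;
        // ... string/bool/array/object variants elided ...
    }

    alias JSONValue    = JSONValueT!();                          // slim default
    alias BigJSONValue = JSONValueT!(NumberPolicy.withBigInt);   // opt-in BigInt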
Aug 13 2015
On 8/13/2015 3:51 AM, Sönke Ludwig wrote:These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way).1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last. 2. Why are integers acceptable as lexer input? The spec specifies Unicode. 3. Why are there 4 functions that do the same thing? http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html After all, there already is a http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
Aug 13 2015
Am 14.08.2015 um 02:26 schrieb Walter Bright:On 8/13/2015 3:51 AM, Sönke Ludwig wrote:Hm, it *is* the first function argument, just the last template argument.These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way).1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.2. Why are integers acceptable as lexer input? The spec specifies Unicode.In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.3. Why are there 4 functions that do the same thing? http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html After all, there already is a http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.htmlThere are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string. But you are right that pretty printing should be controlled by GeneratorOptions. I'll fix that. The suggestion to use pretty printing by default also sounds good.
Aug 13 2015
On 8/13/2015 11:52 PM, Sönke Ludwig wrote:Am 14.08.2015 um 02:26 schrieb Walter Bright:Ok, my mistake. I didn't look at the others. I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.On 8/13/2015 3:51 AM, Sönke Ludwig wrote:Hm, it *is* the first function argument, just the last template argument.These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way).1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.2. Why are integers acceptable as lexer input? The spec specifies Unicode.In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .array3. Why are there 4 functions that do the same thing? http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html After all, there already is a http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.htmlThere are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string.But you are right that pretty printing should be controlled by GeneratorOptions. I'll fix that. The suggestion to use pretty printing by default also sounds good.Thanks
Aug 14 2015
Am 14.08.2015 um 10:17 schrieb Walter Bright:On 8/13/2015 11:52 PM, Sönke Ludwig wrote:I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?Am 14.08.2015 um 02:26 schrieb Walter Bright:Ok, my mistake. I didn't look at the others. I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.On 8/13/2015 3:51 AM, Sönke Ludwig wrote:Hm, it *is* the first function argument, just the last template argument.These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way).1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.2. Why are integers acceptable as lexer input? The spec specifies Unicode.In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.Convenience for one. The lack of number to input range conversion functions is another concern. I'm not really keen to implement an input range style floating-point to string conversion routine just for this module. Finally, I'm a little worried about performance. The output range based approach can keep a lot of state implicitly using the program counter register. But an input range would explicitly have to keep track of the current JSON element, as well as the current character/state within that element (and possibly one level deeper, for example for escape sequences). This means that it will require either multiple branches or indirection for each popFront().Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .array3. Why are there 4 functions that do the same thing? http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html After all, there already is a http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.htmlThere are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string.
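A sketch of that convention against the proposed API (whether lexJSON accepts exactly these input types, and the package import path, are assumptions based on this discussion):

    import stdx.data.json : lexJSON;

    void example()
    {
        // char-based input: assumed to already be valid UTF, not re-validated
        string text = `{"a": 1, "b": [true, null]}`;
        auto tokens = lexJSON(text);

        // integer-based input (e.g. raw bytes from a file or socket): the
        // lexer validates the UTF-8 on the fly while scanning
        immutable(ubyte)[] bytes = cast(immutable(ubyte)[]) text;
        auto tokensFromBytes = lexJSON(bytes);
    }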
Aug 15 2015
I talked with a few people and they said that they prefer the current vibe.d JSON implementation. What's wrong with it? Why not stay with the old one? It looks much easier than the new one... IMHO the API of the current proposal is much harder.
Aug 15 2015
On Saturday, 15 August 2015 at 17:07:36 UTC, Suliman wrote:I talked with a few people and they said that they prefer the current vibe.d JSON implementation. What's wrong with it? Why not stay with the old one? It looks much easier than the new one... IMHO the API of the current proposal is much harder.The new stream parser is fast! (See the prior thread on benchmarks).
Aug 15 2015
On 8/15/2015 3:18 AM, Sönke Ludwig wrote:That's right, there isn't one. But I use: if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char)) I'm not a fan of more names for trivia, the deluge of names has its own costs.I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it. There are many validation algorithms in Phobos one can tack on - having two implementations of every algorithm, one with an embedded reinvented validation and one without - is too much. The general idea with algorithms is that they do not combine things, but they enable composition.There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion. The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .arrayConvenience for one.The lack of number to input range conversion functions is another concern. I'm not really keen to implement an input range style floating-point to string conversion routine just for this module.Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.Finally, I'm a little worried about performance. The output range based approach can keep a lot of state implicitly using the program counter register. But an input range would explicitly have to keep track of the current JSON element, as well as the current character/state within that element (and possibly one level deeper, for example for escape sequences). This means that it will require either multiple branches or indirection for each popFront().Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array. I share your concern with performance, and I had very good results with Warp by keeping all the state on the stack in this manner.
Aug 15 2015
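To make the constraint under discussion concrete, here is a minimal, self-contained sketch in D. The helper name isCharInputRange and the lexJSONSketch signature are invented for illustration and are not part of the actual std.data.json API:

    import std.range.primitives : isInputRange, ElementEncodingType;
    import std.traits : Unqual;

    // Named form: accept any input range whose elements are (possibly qualified) char.
    enum isCharInputRange(R) = isInputRange!R
        && is(Unqual!(ElementEncodingType!R) == char);

    // Inline form, spelled out directly in the signature as Walter suggests:
    auto lexJSONSketch(R)(R input)
        if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))
    {
        return input; // placeholder body
    }

    unittest
    {
        static assert( isCharInputRange!string);
        static assert(!isCharInputRange!(int[]));
    }

Whether the named or the inline form is preferable is exactly the trade-off debated here: one extra name versus repeating the longer constraint at every use site.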
On 16-Aug-2015 03:50, Walter Bright wrote:On 8/15/2015 3:18 AM, Sönke Ludwig wrote:Aye.The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.There are many validation algorithms in Phobos one can tack on - having two implementations of every algorithm, one with an embedded reinvented validation and one without - is too much.Actually there are next to none. `validate` that throws on failed validation is a misnomer.The general idea with algorithms is that they do not combine things, but they enable composition.At the lower level such as tokenizers combining a couple of simple steps together makes sense because it makes things run faster. It usually eliminates the need for temporary result that must be digestible by the next range. For instance "combining" decoding and character classification one may side-step generating the codepoint value itself (because now it doesn't have to produce it for the top-level algorithm). -- Dmitry Olshansky
Aug 15 2015
On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:For instance "combining" decoding and character classification one may side-step generating the codepoint value itself (because now it doesn't have to produce it for the top-level algorithm).Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis. But it's moot, as json lexing never needs to decode.
Aug 16 2015
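A minimal sketch of why JSON string lexing never needs UTF decoding, as argued above: the only structurally significant code units inside a JSON string are the ASCII characters '"' and '\', so everything else, including the bytes of multi-byte UTF-8 sequences, can be skipped verbatim. The function name and the simplified escape handling are illustrative only:

    size_t skipJSONString(const(char)[] input, size_t pos)
    {
        assert(input[pos] == '"');
        ++pos; // skip the opening quote
        while (pos < input.length)
        {
            immutable c = input[pos];
            if (c == '"')
                return pos + 1; // index just past the closing quote
            if (c == '\\')
                pos += 2;       // skip the escape sequence (simplified)
            else
                ++pos;          // ASCII or a byte of a multi-byte sequence
        }
        assert(false, "unterminated string");
    }

    unittest
    {
        auto s = `"häl\"lo":42`;
        immutable end = skipJSONString(s, 0);
        assert(s[end] == ':'); // lexing can continue right after the string
    }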
On 16-Aug-2015 11:30, Walter Bright wrote:On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:About x2 faster than decode + check-if-alphabetic on my stuff: https://github.com/DmitryOlshansky/gsoc-bench-2012 I haven't updated it in a while. There are nice bar graphs for decoding versions by David comparing DMD vs LDC vs GDC: Page 15 at http://dconf.org/2013/talks/nadlinger.pdfFor instance "combining" decoding and character classification one may side-step generating the codepoint value itself (because now it doesn't have to produce it for the top-level algorithm).Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis.But it's moot, as json lexing never needs to decode.Agreed. -- Dmitry Olshansky
Aug 16 2015
On 8/16/2015 3:39 AM, Dmitry Olshansky wrote:About x2 faster then decode + check-if-alphabetic on my stuff: https://github.com/DmitryOlshansky/gsoc-bench-2012 I haven't updated it in a while. There are nice bargraphs for decoding versions by David comparing DMD vs LDC vs GDC: Page 15 at http://dconf.org/2013/talks/nadlinger.pdfThank you.
Aug 16 2015
Am 16.08.2015 um 02:50 schrieb Walter Bright:On 8/15/2015 3:18 AM, Sönke Ludwig wrote:Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R))`. It's just used in number of places and quite a bit more verbose (twice as long) and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.That's right, there isn't one. But I use: if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char)) I'm not a fan of more names for trivia, the deluge of names has its own costs.I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.There are many validation algorithms in Phobos one can tack on - having two implementations of every algorithm, one with an embedded reinvented validation and one without - is too much.There is nothing reinvented here. It simply implicitly validates all non-string parts of a JSON document and uses validate() for parts of JSON strings that can contain unicode characters.The general idea with algorithms is that they do not combine things, but they enable composition.It's just that there is no way to achieve the same performance using composition in this case.This may be a factor of two, but not a combinatorial explosion.Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .arrayConvenience for one.The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. 
It's also a stepping stone for language newcomers.There are output range and allocation based float->string conversions available, but no input range based one. But well, using an internal buffer together with formattedWrite would probably be a viable workaround...The lack of number to input range conversion functions is another concern. I'm not really keen to implement an input range style floating-point to string conversion routine just for this module.Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.Just branch misprediction will most probably be problematic. But I think this can be made fast enough anyway by making the input range partially eager and serving chunks of strings at a time. That way, the additional branching only has to happen once per chunk. I'll have a look.Finally, I'm a little worried about performance. The output range based approach can keep a lot of state implicitly using the program counter register. But an input range would explicitly have to keep track of the current JSON element, as well as the current character/state within that element (and possibly one level deeper, for example for escape sequences). This means that it will require either multiple branches or indirection for each popFront().Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array.I share your concern with performance, and I had very good results with Warp by keeping all the state on the stack in this manner.
Aug 16 2015
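On the "internal buffer together with formattedWrite" workaround mentioned above: std.format already offers sformat, which formats into a caller-provided buffer and returns the written slice, so emitting a number needs no per-number heap allocation. A tiny sketch (buffer size and format spec chosen arbitrarily):

    import std.format : sformat;

    unittest
    {
        char[64] buf; // stack buffer, reusable for every number
        const(char)[] text = sformat(buf[], "%s", 3.25);
        assert(text == "3.25");
        assert(text.ptr is buf.ptr); // no heap allocation involved
    }

The same effect can be had with formattedWrite and any output range that writes into such a buffer.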
On 2015-08-16 14:34, Sönke Ludwig wrote:Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and quite a bit more verbose (twice as long) and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.I agree. Signatures like this are what's making std.algorithm look more complicated than it is. -- /Jacob Carlborg
Aug 16 2015
On 8/16/2015 5:34 AM, Sönke Ludwig wrote:Am 16.08.2015 um 02:50 schrieb Walter Bright:Except that there is no reason to support wchar, dchar, int, ubyte, or anything other than char. The idea is not to support something just because you can, but there should be an identifiable, real use case for it first. Has anyone ever seen Json data as ulongs? I haven't either.if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char)) I'm not a fan of more names for trivia, the deluge of names has its own costs.Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R))`. It's just used in number of places and quite a bit more verbose (twice as long) and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.That argument could be used to justify validation in every single algorithm that deals with strings.The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.We're already up to validate or not, to string or not, i.e. 4 combinations.This may be a factor of two, but not a combinatorial explosion.Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .arrayConvenience for one.This has been argued before, and the problem is it applies to EVERY algorithm in Phobos, and winds up with a doubling of the number of functions to deal with it. I do not view this as clean. D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stepping stone for language newcomers.There are output range and allocation based float->string conversions available, but no input range based one. But well, using an internal buffer together with formattedWrite would probably be a viable workaround...I plan to fix that, so using a workaround in the meantime is appropriate.
Aug 16 2015
Am 17.08.2015 um 00:03 schrieb Walter Bright:On 8/16/2015 5:34 AM, Sönke Ludwig wrote:But you have seen ubyte[] when reading something from a file or from a network stream. But since Andrei now also wants to remove it, so be it. I'll answer some of the other points anyway:Am 16.08.2015 um 02:50 schrieb Walter Bright:Except that there is no reason to support wchar, dchar, int, ubyte, or anything other than char. The idea is not to support something just because you can, but there should be an identifiable, real use case for it first. Has anyone ever seen Json data as ulongs? I haven't either.if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char)) I'm not a fan of more names for trivia, the deluge of names has its own costs.Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R))`. It's just used in number of places and quite a bit more verbose (twice as long) and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.Not really for all, but indeed there are more where this could apply in theory. However, JSON is used frequently in situations where parsing speed, or performance in general, is often crucial (e.g. web services), which makes it stand out due to practical concerns. Others, such as an XML parser would apply, too, but probably none of the generic string manipulation functions.That argument could be used to justify validation in every single algorithm that deals with strings.The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.Validation is part of the lexer and not the generator. There is no combinatorial relation between the two. Validation is also just a template parameter, so there are no two combinations in terms of implementation either. There is just a "static if" statement somewhere to decide if validate() should be called or not.We're already up to validate or not, to string or not, i.e. 4 combinations.This may be a factor of two, but not a combinatorial explosion.Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .arrayConvenience for one.It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON). It's also still an additional thing to remember, type and read, making it an additional piece of cognitive load, even for developers that are fluent with this. Have many of such pieces and they add up to a point where productivity goes to its knees. 
I already personally find it quite annoying constantly having to import std.range, std.array and std.algorithm to just use some small piece of functionality in std.algorithm. It's also often not clear in which of the three modules/packages a certain function is. We need to find a better balance here if D is to keep its appeal as a language where you stay in "the zone" (a.k.a flow), which always has been a big thing for me.This has been argued before, and the problem is it applies to EVERY algorithm in Phobos, and winds up with a doubling of the number of functions to deal with it. I do not view this as clean. D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stepping stone for language newcomers.
Aug 22 2015
On 8/22/2015 5:21 AM, Sönke Ludwig wrote:Am 17.08.2015 um 00:03 schrieb Walter Bright:Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON).It's also still an additional thing to remember, type and read, making it an additional piece of cognitive load, even for developers that are fluent with this. Have many of such pieces and they add up to a point where productivity goes to its knees.Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.I already personally find it quite annoying constantly having to import std.range, std.array and std.algorithm to just use some small piece of functionality in std.algorithm. It's also often not clear in which of the three modules/packages a certain function is. We need to find a better balance here if D is to keep its appeal as a language where you stay in "the zone" (a.k.a flow), which always has been a big thing for me.If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
Aug 24 2015
Am 24.08.2015 um 22:25 schrieb Walter Bright:On 8/22/2015 5:21 AM, Sönke Ludwig wrote:That's true, but then they will possibly have to understand the inner workings soon after, for example when something goes wrong and they get cryptic error messages. It makes the learning curve steeper, even if some of that can be mitigated with good documentation/tutorials.Am 17.08.2015 um 00:03 schrieb Walter Bright:Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON).Having to write additional things that are not part of the problem (".array", "import std.array : array;") is cognitive load and having to read such things is cognitive and visual load. Also, having to remember where those additional components reside is cognitive load, at least if they are not used really frequently. This has of course nothing to do with predictable behavior of the components, but with the API/language boundary between ranges and arrays.It's also still an additional thing to remember, type and read, making it an additional piece of cognitive load, even for developers that are fluent with this. Have many of such pieces and they add up to a point where productivity goes to its knees.Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.I'm not arguing against a range based approach! It's just that such an approach ideally shouldn't come at the expense of simplicity and relevance. If I have a string variable and I want to store the upper case version of another string, the direct mental translation is "dst = toUpper(src);" - and not "dst = toUpper(src).array;". It reminds me of the unwrap() calls in Rust code. They can produce a huge amount of visual noise for dealing with errors, whereas an exception based approach lets you focus on the actual problem. Of course exceptions have their own issues, but that's a different topic. Keeping toString in addition to toChars would be enough to avoid the issue here. A possible alternative would be to let the proposed JSON text input range have an "alias this" to "std.array.array(this)". Then it wouldn't even require a rename of toString to toChars to get both worlds.I already personally find it quite annoying constantly having to import std.range, std.array and std.algorithm to just use some small piece of functionality in std.algorithm. It's also often not clear in which of the three modules/packages a certain function is. We need to find a better balance here if D is to keep its appeal as a language where you stay in "the zone" (a.k.a flow), which always has been a big thing for me.If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
Aug 24 2015
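A small illustration of the trade-off being debated, using std.uni.asUpperCase as a stand-in for the hypothetical lazy toUpper (a range-returning JSON generator would look analogous). The lazy range composes without allocating; .array or to!string materializes it only where a string is actually required:

    import std.uni : asUpperCase;
    import std.conv : to;
    import std.array : array;

    unittest
    {
        string src = "hello";

        // Lazy: no allocation yet, just a range wrapping src.
        auto upper = src.asUpperCase;

        // Materialize only where a string is really needed.
        string dst = upper.to!string;
        assert(dst == "HELLO");

        // The extra step being objected to, spelled with .array:
        assert(src.asUpperCase.array.to!string == "HELLO");
    }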
On Tuesday, 25 August 2015 at 06:56:23 UTC, Sönke Ludwig wrote:If I have a string variable and I want to store the upper case version of another string, the direct mental translation is "dst = toUpper(src);" - and not "dst = toUpper(src).array;".One can also say the problem is that you have a string variable.
Aug 25 2015
Am 25.08.2015 um 14:14 schrieb Sebastiaan Koppe:On Tuesday, 25 August 2015 at 06:56:23 UTC, Sönke Ludwig wrote:But ranges are not always the right solution: - For fields or setter properties, the exact type of the range is fixed, which is generally unpractical - If the underlying data of a range is stored on the stack or any other transient storage, it cannot be stored on the heap - If the range is only an input range, it must be copied to an array anyway if it's going to be read multiple times - Ranges cannot be immutable (no safe slicing or passing between threads) - If for some reason template land needs to be left, ranges have trouble following (although there are wrapper classes available) - Most existing APIs are string based - Re-evaluating a computed range each time a variable is read is usually wasteful There are probably a bunch of other problems that simply make ranges not the best answer in every situation.If I have a string variable and I want to store the upper case version of another string, the direct mental translation is "dst = toUpper(src);" - and not "dst = toUpper(src).array;".One can also say the problem is that you have a string variable.
Aug 25 2015
On Thursday, 13 August 2015 at 10:51:47 UTC, Sönke Ludwig wrote:I think we really need to have an informal pre-vote about the BigInt and DOM efficiency vs. functionality issues. Basically there are three options for each: 1. Keep them: May have an impact on compile time for big DOMs (run time/memory consumption wouldn't be affected if a pointer to BigInt is stored). But provides an out-of-the-box experience for a broad set of applications. 2. Remove them: Results in a slim and clean API that is fast (to run/compile), but also one that will be less useful for certain applications. 3. Make them CT configurable: Best of both worlds in terms of speed, at the cost of a more complex API.the template to extend the supported data types, correct? However, I also think that you shouldn't try to make the basic storage format handle everything that might be more appropriately handled by a meta-model. Are the range operations compatible with the std.parallelism library?
Aug 15 2015
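For option 3 above ("make them CT configurable"), a very rough sketch of what a compile-time switch could look like, using std.variant.Algebraic and storing BigInt behind a pointer as in the current proposal. JSONValueSketch and the exact type list are purely illustrative; arrays and objects are omitted:

    import std.variant : Algebraic;
    import std.bigint : BigInt;

    template JSONValueSketch(bool withBigInt)
    {
        static if (withBigInt)
            alias JSONValueSketch = Algebraic!(bool, long, BigInt*, double, string);
        else
            alias JSONValueSketch = Algebraic!(bool, long, double, string);
    }

    unittest
    {
        JSONValueSketch!false v = 3.25;
        assert(v.get!double == 3.25);

        JSONValueSketch!true w = new BigInt("12345678901234567890");
        assert(*w.get!(BigInt*) == BigInt("12345678901234567890"));
    }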
On 7/28/15 10:07 AM, Atila Neves wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaI'll submit a review in short order, but thought this might be of use in performance comparisons: https://www.reddit.com/r/programming/comments/3hbt4w/using_json_in_a_low_latency_environment/ -- Andrei
Aug 17 2015
I've added some changes in the latest version (docs updated): - Switched to TaggedAlgebraic with full static operator forwarding - Removed toPrettyJSON (now the default), added GeneratorOptions.compact - The bigInt field in JSONValue is now stored as a pointer - Removed is(String/Integral)InputRange helper functions - Added opt2() [1] as an alternative candidate to opt() [2] with a more natural syntax The possible optimization to store the type tag in unused parts of the data fields could be implemented later directly in TaggedAlgebraic. [1]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/opt2.html [2]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/opt.html
Aug 17 2015
On 7/28/15 10:07 AM, Atila Neves wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaI'll preface my review with a general comment. This API comes at an interesting juncture; we're striving as much as possible for interfaces that abstract away lifetime management, so they can be used comfortably with GC, or at high performance (and hopefully no or only marginal loss of comfort) with client-chosen lifetime management policies. The JSON API is a great test bed for our emerging recommended "push lifetime up" idioms; it's not too complicated yet it's not trivial either, and has great usefulness. With this, here are some points: * All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged. * On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs. * stdx.data.json.generator: I think the API for converting in-memory JSON values to strings needs to be redone, as follows: - JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values followed by the ']' token. - On top of byToken it's immediate to implement a method (say toJSON or toString) that accepts an output range of characters and formatting options. - On top of the method above with output range, implementing a toString overload that returns a string for convenience is a two-liner. However, it shouldn't return a "string"; Phobos APIs should avoid "hardcoding" the string type. Instead, it should return a user-chosen string type (including reference counting strings). - While at it make prettyfication a flag in the options, not its own part of the function name. * stdx.data.json.lexer: - I assume the idea was to accept ranges of integrals to mean "there's some raw input from a file". This seems to be a bit overdone, e.g. there's no need to accept signed integers or 64-bit integers. I suggest just going with the three character types. - I see tokenization accepts input ranges. This forces the tokenizer to store its own copy of things, which is no doubt the business of appenderFactory. Here the departure of the current approach from what I think should become canonical Phobos APIs deepens for multiple reasons. First, appenderFactory does allow customization of the append operation (nice) but that's not enough to allow the user to customize the lifetime of the created strings, which is usually reflected in the string type itself. So the lexing method should be parameterized by the string type used. (By default string (as is now) should be fine.) Therefore instead of customizing the append method just customize the string type used in the token. - The lexer should internally take optimization opportunities, e.g. if the string type is "string" and the lexed type is also "string", great, just use slices of the input instead of appending them to the tokens. - As a consequence the JSONToken type also needs to be parameterized by the type of its string that holds the payload. I understand this is a complication compared to the current approach, but I don't see an out. 
In the grand scheme of things it seems a necessary evil: tokens may or may not need a means to manage lifetime of their payload, and that's determined by the type of the payload. Hopefully simplifications in other areas of the API would offset this. - At token level there should be no number parsing. Just store the payload with the token and leave it for later. Very often numbers are converted without there being a need, and the process is costly. This also nicely sidesteps the entire matter of bigints, floating point etc. at this level. - Also, at token level strings should be stored with escapes unresolved. If the user wants a string with the escapes resolved, a lazy range does it. - Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicationg validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization. - Litmus test: if the input type is a forward range AND if the string type chosen for tokens is the same as input type, successful tokenization should allocate exactly zero memory. I think this is a simple way to make sure that the tokenization API works well. - If noThrow is a runtime option, some functions can't be nothrow (and consequently nogc). Not sure how important this is. Probably quite a bit because of the current gc implications of exceptions. IMHO: at lexing level a sound design might just emit error tokens (with the culprit as payload) and never throw. Clients may always throw when they see an error token. * stdx.data.json.parser: - Similar considerations regarding string type used apply here as well: everything should be parameterized with it - the use case to keep in mind is someone wants everything with refcounted strings. - The JSON value does its own internal allocation (for e.g. arrays and hashtables), which should be fine as long as it's encapsulated and we can tweak it later (e.g. make it use reference counting inside). - parseJSONStream should parameterize on string type, not on appenderFactory. - Why both parseJSONStream and parseJSONValue? I'm thinking parseJSONValue would be enough because then you trivially parse a stream with repeated calls to parseJSONValue. - FWIW I think the whole thing with accommodating BigInt etc. is an exaggeration. Just stick with long and double. - readArray suddenly introduces a distinct kind of interacting - callbacks. Why? Should be a lazy range lazy range lazy range. An adapter using callbacks is then a two-liner. - Why is readBool even needed? Just readJSONValue and then enforce it as a bool. Same reasoning applies to readDouble and readString. - readObject is with callbacks again - it would be nice if it were a lazy range. - skipXxx are nice to have and useful. * stdx.data.json.value: - The etymology of "opt" is unclear - no word starting with "opt" or obviously abbreviating to it is in the documentation. "opt2" is awkward. How about "path" and "dyn", respectively. - I think Algebraic should be used throughout instead of TaggedAlgebraic, or motivation be given for the latter. 
- JSONValue should be more opaque and not expose representation as much as it does now. In particular, offering a built-in hashtable is bound to be problematic because those are expensive to construct, create garbage, and are not customizable. Instead, the necessary lookup and set APIs should be provided by JSONValue whilst keeping the implementation hidden. The same goes about array - a JSONValue shall not be exposed; instead, indexed access primitives should be exposed. Separate types might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary. The string type should be a type parameter of JSONValue. ============================== So, here we are. I realize a good chunk of this is surprising ("you mean I shouldn't create strings in my APIs?"). My point here is, again, we're at a juncture. We're trying to factor garbage (heh) out of API design in ways that defer the lifetime management to the user of the API. We could pull json into std.experimental and defer the policy decisions for later, but I think it's a great driver for them. (Thanks Sönke for doing all the work, this is a great baseline.) I think we should use the JSON API as a guinea pig for the new era of D API design in which we have a solid set of principles, tools, and guidelines to defer lifetime management. Please advise. Andrei
Aug 17 2015
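A short sketch of the output-range shape of the generator API proposed above, together with the "two-liner" string convenience built on top of it. toJSONSketch only handles a double here and all names are invented for the example; this is not the proposed implementation:

    import std.range.primitives : isOutputRange;
    import std.format : formattedWrite;
    import std.array : appender;

    // Core primitive: write into any char output range chosen by the caller.
    void toJSONSketch(Output)(ref Output sink, double value)
        if (isOutputRange!(Output, char))
    {
        sink.formattedWrite("%s", value);
    }

    // The convenience overload returning a string is then a two-liner.
    string toJSONStringSketch(double value)
    {
        auto app = appender!string();
        toJSONSketch(app, value);
        return app.data;
    }

    unittest
    {
        assert(toJSONStringSketch(3.25) == "3.25");
    }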
On 2015-08-18 00:21, Andrei Alexandrescu wrote:* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough. -- /Jacob Carlborg
Aug 17 2015
On 8/18/15 2:31 AM, Jacob Carlborg wrote:On 2015-08-18 00:21, Andrei Alexandrescu wrote:How about a module with 20? -- Andrei* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough.
Aug 18 2015
On 2015-08-18 15:18, Andrei Alexandrescu wrote:How about a module with 20? -- AndreiIf it's used in several other modules, I don't see a problem with it. -- /Jacob Carlborg
Aug 18 2015
On 8/18/15 9:31 AM, Jacob Carlborg wrote:On 2015-08-18 15:18, Andrei Alexandrescu wrote:Me neither if internal. I do see a problem if it's public. -- AndreiHow about a module with 20? -- AndreiIf it's used in several other modules, I don't see a problem with it.
Aug 18 2015
On 2015-08-18 17:18, Andrei Alexandrescu wrote:Me neither if internal. I do see a problem if it's public. -- AndreiIf it's public and those 20 lines are useful on its own, I don't see a problem with that either. -- /Jacob Carlborg
Aug 18 2015
On 8/18/15 1:24 PM, Jacob Carlborg wrote:On 2015-08-18 17:18, Andrei Alexandrescu wrote:In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- AndreiMe neither if internal. I do see a problem if it's public. -- AndreiIf it's public and those 20 lines are useful on its own, I don't see a problem with that either.
Aug 18 2015
On 19-Aug-2015 04:58, Andrei Alexandrescu wrote:On 8/18/15 1:24 PM, Jacob Carlborg wrote:To catch it? Generally I agree - just merge things sensibly, there could be traits.d/primitives.d module should it define isXYZ constraints and other lightweight interface-only entities. -- Dmitry OlshanskyOn 2015-08-18 17:18, Andrei Alexandrescu wrote:In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- AndreiMe neither if internal. I do see a problem if it's public. -- AndreiIf it's public and those 20 lines are useful on its own, I don't see a problem with that either.
Aug 19 2015
Am 19.08.2015 um 03:58 schrieb Andrei Alexandrescu:On 8/18/15 1:24 PM, Jacob Carlborg wrote:The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.On 2015-08-18 17:18, Andrei Alexandrescu wrote:In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- AndreiMe neither if internal. I do see a problem if it's public. -- AndreiIf it's public and those 20 lines are useful on its own, I don't see a problem with that either.
Aug 19 2015
On 8/19/15 4:55 AM, Sönke Ludwig wrote:Am 19.08.2015 um 03:58 schrieb Andrei Alexandrescu:I'm sure there are a number of better options to package things nicely. -- AndreiOn 8/18/15 1:24 PM, Jacob Carlborg wrote:The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.On 2015-08-18 17:18, Andrei Alexandrescu wrote:In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- AndreiMe neither if internal. I do see a problem if it's public. -- AndreiIf it's public and those 20 lines are useful on its own, I don't see a problem with that either.
Aug 21 2015
Am 21.08.2015 um 18:54 schrieb Andrei Alexandrescu:On 8/19/15 4:55 AM, Sönke Ludwig wrote:I'm all ears ;)Am 19.08.2015 um 03:58 schrieb Andrei Alexandrescu:I'm sure there are a number of better options to package things nicely. -- AndreiOn 8/18/15 1:24 PM, Jacob Carlborg wrote:The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.On 2015-08-18 17:18, Andrei Alexandrescu wrote:In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- AndreiMe neither if internal. I do see a problem if it's public. -- AndreiIf it's public and those 20 lines are useful on its own, I don't see a problem with that either.
Aug 22 2015
On 08/18/2015 09:18 AM, Andrei Alexandrescu wrote:On 8/18/15 2:31 AM, Jacob Carlborg wrote:Module boundaries should be determined by organizational grouping, not by size.I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough.How about a module with 20? -- Andrei
Aug 21 2015
On Friday, 21 August 2015 at 16:25:40 UTC, Nick Sabalausky wrote:Module boundaries should be determined by organizational grouping, not by size.By organizational grouping as well as encapsulation concerns. Modules are the smallest units of encapsulation in D, visibility-wise. — David
Aug 21 2015
On 8/21/15 12:25 PM, Nick Sabalausky wrote:On 08/18/2015 09:18 AM, Andrei Alexandrescu wrote:Rather by usefulness. As I mentioned, nobody would ever need only JSON's exceptions and location. -- AndreiOn 8/18/15 2:31 AM, Jacob Carlborg wrote:Module boundaries should be determined by organizational grouping, not by size.I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough.How about a module with 20? -- Andrei
Aug 21 2015
On 2015-08-21 18:25, Nick Sabalausky wrote:Module boundaries should be determined by organizational grouping, not by size.Well, but it depends on how you decide what should be in a group. Size is usually a part of that decision, although it might not be conscious. You wouldn't put the whole D compiler in one module ;) -- /Jacob Carlborg
Aug 23 2015
On Monday, 17 August 2015 at 22:21:50 UTC, Andrei Alexandrescu wrote:* stdx.data.json.generator: I think the API for converting in-memory JSON values to strings needs to be redone, as follows: - JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values followed by the ']' token.For iterating tree-like structures, a callback-based seems nicer, because it can naturally use the stack for storing its state. (I assume std.concurrency.Generator is too heavy-weight for this case.)- On top of byToken it's immediate to implement a method (say toJSON or toString) that accepts an output range of characters and formatting options.If there really needs to be a range, `joiner` and `copy` should do the job.- On top of the method above with output range, implementing a toString overload that returns a string for convenience is a two-liner. However, it shouldn't return a "string"; Phobos APIs should avoid "hardcoding" the string type. Instead, it should return a user-chosen string type (including reference counting strings).`to!string`, for compatibility with std.conv.- While at it make prettyfication a flag in the options, not its own part of the function name.(That's already done.)* stdx.data.json.lexer: - I assume the idea was to accept ranges of integrals to mean "there's some raw input from a file". This seems to be a bit overdone, e.g. there's no need to accept signed integers or 64-bit integers. I suggest just going with the three character types. - I see tokenization accepts input ranges. This forces the tokenizer to store its own copy of things, which is no doubt the business of appenderFactory. Here the departure of the current approach from what I think should become canonical Phobos APIs deepens for multiple reasons. First, appenderFactory does allow customization of the append operation (nice) but that's not enough to allow the user to customize the lifetime of the created strings, which is usually reflected in the string type itself. So the lexing method should be parameterized by the string type used. (By default string (as is now) should be fine.) Therefore instead of customizing the append method just customize the string type used in the token. - The lexer should internally take optimization opportunities, e.g. if the string type is "string" and the lexed type is also "string", great, just use slices of the input instead of appending them to the tokens. - As a consequence the JSONToken type also needs to be parameterized by the type of its string that holds the payload. I understand this is a complication compared to the current approach, but I don't see an out. In the grand scheme of things it seems a necessary evil: tokens may or may not need a means to manage lifetime of their payload, and that's determined by the type of the payload. Hopefully simplifications in other areas of the API would offset this.I've never seen JSON encoded in anything other than UTF-8. Is it really necessary to complicate everything for such an infrequent niche case?- At token level there should be no number parsing. Just store the payload with the token and leave it for later. Very often numbers are converted without there being a need, and the process is costly. This also nicely sidesteps the entire matter of bigints, floating point etc. at this level. - Also, at token level strings should be stored with escapes unresolved. 
If the user wants a string with the escapes resolved, a lazy range does it.This was already suggested, and it looks like a good idea, though there was an objection because of possible performance costs. The other objection, that it requires an allocation, is no longer valid if sliceable input is used.- Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicationg validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization.Well, in an ideal world, there should be no difference in performance between manually combined tokenization/validation, and composed ranges. We should practice what we preach here.* stdx.data.json.parser: - FWIW I think the whole thing with accommodating BigInt etc. is an exaggeration. Just stick with long and double.Or, as above, leave it to the end user and provide a `to(T)` method that can support built-in types and `BigInt` alike.
Aug 18 2015
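To make the "joiner and copy should do the job" remark above concrete: given a range of already-rendered token strings (the token texts below are made up), flattening them into a char output range needs no JSON-specific output code:

    import std.algorithm.iteration : joiner;
    import std.algorithm.mutation : copy;
    import std.array : appender;

    unittest
    {
        auto tokens = ["[", "1", ",", "2", "]"]; // hypothetical token texts
        auto app = appender!string();
        app = tokens.joiner.copy(app); // copy returns the (advanced) target
        assert(app.data == "[1,2]");
    }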
Am Tue, 18 Aug 2015 09:05:32 +0000 schrieb "Marc Schütz" <schuetzm gmx.net>:Or, as above, leave it to the end user and provide a `to(T)` method that can support built-in types and `BigInt` alike.You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON constraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong; if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T. -- Marco
Sep 28 2015
On Monday, 28 September 2015 at 07:02:35 UTC, Marco Leise wrote:Am Tue, 18 Aug 2015 09:05:32 +0000 schrieb "Marc Schütz" <schuetzm gmx.net>:No, the JSON type should just store the raw unparsed token and implement: struct JSON { T to(T) if(isNumeric!T && is(typeof(T("")))) { return T(this.raw); } } The end user can then call: auto value = json.to!BigInt;Or, as above, leave it to the end user and provide a `to(T)` method that can support built-in types and `BigInt` alike.You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON contraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong, if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T.
Sep 29 2015
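A slightly expanded sketch of the idea in the previous message, with the constraint adjusted so that both built-in numeric types (via std.conv) and string-constructible types such as BigInt actually match. JSONNumberSketch is an invented name and not the proposed JSONValue API:

    import std.traits : isNumeric;
    static import std.conv;

    struct JSONNumberSketch
    {
        string raw; // unparsed JSON number text, e.g. "42" or "12345678901234567890"

        T to(T)() const
            if (isNumeric!T || is(typeof(T(string.init))))
        {
            static if (isNumeric!T)
                return std.conv.to!T(raw); // parse built-in types on demand
            else
                return T(raw);             // e.g. BigInt's string constructor
        }
    }

    unittest
    {
        import std.bigint : BigInt;
        auto n = JSONNumberSketch("42");
        assert(n.to!int == 42);
        assert(n.to!BigInt == BigInt(42));
    }

As a later reply in the thread points out, the lexer still has to scan the digits to find where the number token ends, and the JSON grammar and T's constructor accept different syntaxes, so this only defers conversion; it does not remove the need for JSON-level number validation.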
On Tuesday, 29 September 2015 at 11:06:03 UTC, Marc Schütz wrote:On Monday, 28 September 2015 at 07:02:35 UTC, Marco Leise wrote:I was just speaking to Sonke about another aspect of this. It's not just numbers where this might be the case - dates are also often in a weird format (because the data comes from some ancient mainframe, for example). And similarly for enums where the field is a string but actually ought to fit in a fixed set of categories. I forgot the original context to this long thread, so hopefully this point is relevant. It's more relevant for the layer that will go on top where you want to be able to parse a json array or object as a D array/associative array of structs, as you can do in vibe.d currently. But maybe needs to be considered in lower level - I forget at this point.Am Tue, 18 Aug 2015 09:05:32 +0000 schrieb "Marc Schütz" <schuetzm gmx.net>:No, the JSON type should just store the raw unparsed token and implement: struct JSON { T to(T) if(isNumeric!T && is(typeof(T("")))) { return T(this.raw); } } The end user can then call: auto value = json.to!BigInt;Or, as above, leave it to the end user and provide a `to(T)` method that can support built-in types and `BigInt` alike.You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON contraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong, if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T.
Sep 29 2015
Am Tue, 29 Sep 2015 11:06:01 +0000 schrieb Marc Schütz <schuetzm gmx.net>:No, the JSON type should just store the raw unparsed token and implement: struct JSON { T to(T) if(isNumeric!T && is(typeof(T("")))) { return T(this.raw); } } The end user can then call: auto value = json.to!BigInt;Ah, the duck typing approach of accepting any numeric type constructible from a string. Still: You need to parse the number first to know how long the digit string is that you pass to T's ctor. And then you have two sets of syntaxes for numbers: JSON and T's ctor. T could potentially parse numbers with the system locale's setting for the decimal point which may be ',' while JSON uses '.' or support hexadecimal numbers which are also invalid JSON. On the other hand, a ctor for some integral type may not support the exponential notation "2e10", which could legitimately be used by JSON writers (Ruby's uses the shortest way to store numbers) to save on bandwidth. -- Marco
Sep 30 2015
Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:I'll preface my review with a general comment. This API comes at an interesting juncture; we're striving as much as possible for interfaces that abstract away lifetime management, so they can be used comfortably with GC, or at high performance (and hopefully no or only marginal loss of comfort) with client-chosen lifetime management policies. The JSON API is a great test bed for our emerging recommended "push lifetime up" idioms; it's not too complicated yet it's not trivial either, and has great usefulness. With this, here are some points: * All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.Check.* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into an own module also makes sense to avoid unnecessarily large imports where other parts of the functionality isn't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules? But I also think that grouping symbols by topic is a good thing and makes figuring out the API easier. There is also always package.d if you really want to import everything.* stdx.data.json.generator: I think the API for converting in-memory JSON values to strings needs to be redone, as follows: - JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values followed by the ']' token.An input range style generator is on the TODO list, but would a token range be really useful for anything in practice? I would just go straight for a char range. Another thing I'd like to add is an output range that takes parser nodes and writes to a string output range. This would be the kind of interface that would be most useful for a serialization framework.- On top of byToken it's immediate to implement a method (say toJSON or toString) that accepts an output range of characters and formatting options. - On top of the method above with output range, implementing a toString overload that returns a string for convenience is a two-liner. However, it shouldn't return a "string"; Phobos APIs should avoid "hardcoding" the string type. Instead, it should return a user-chosen string type (including reference counting strings).Without any existing code to test this against, how would this look like? Simply using an `Appender!rcstring`?- While at it make prettyfication a flag in the options, not its own part of the function name.Already done. Pretty printing is now the default and there is GeneratorOptions.compact.* stdx.data.json.lexer: - I assume the idea was to accept ranges of integrals to mean "there's some raw input from a file". This seems to be a bit overdone, e.g. there's no need to accept signed integers or 64-bit integers. I suggest just going with the three character types.It's funny you say that, because this was your own design proposal. 
Regarding the three character types, if we drop everything but those, I think we could also go with Walter's suggestion and just drop everything apart from "char". Putting a conversion range from dchar to char would be trivial and should be fast enough.- I see tokenization accepts input ranges. This forces the tokenizer to store its own copy of things, which is no doubt the business of appenderFactory. Here the departure of the current approach from what I think should become canonical Phobos APIs deepens for multiple reasons. First, appenderFactory does allow customization of the append operation (nice) but that's not enough to allow the user to customize the lifetime of the created strings, which is usually reflected in the string type itself. So the lexing method should be parameterized by the string type used. (By default string (as is now) should be fine.) Therefore instead of customizing the append method just customize the string type used in the token.Okay, sounds reasonable if Appender!rcstring is just going to work.- The lexer should internally take optimization opportunities, e.g. if the string type is "string" and the lexed type is also "string", great, just use slices of the input instead of appending them to the tokens.It does.- As a consequence the JSONToken type also needs to be parameterized by the type of its string that holds the payload. I understand this is a complication compared to the current approach, but I don't see an out. In the grand scheme of things it seems a necessary evil: tokens may or may not need a means to manage lifetime of their payload, and that's determined by the type of the payload. Hopefully simplifications in other areas of the API would offset this.It wouldn't be too bad here, because it's presumably pretty rare to store tokens or parser nodes. Worse is JSONValue.- At token level there should be no number parsing. Just store the payload with the token and leave it for later. Very often numbers are converted without there being a need, and the process is costly. This also nicely sidesteps the entire matter of bigints, floating point etc. at this level.Okay, again, this was your own suggestion. The downside of always storing the string representation is that it requires allocations if no slices are used, and that the string will have to be parsed twice if the number is indeed going to be used. This can have a considerable performance impact.- Also, at token level strings should be stored with escapes unresolved. If the user wants a string with the escapes resolved, a lazy range does it.To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.- Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicationg validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization.It's unfortunate to see this change of mind in face of the work that already went into the implementation. 
I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.- Litmus test: if the input type is a forward range AND if the string type chosen for tokens is the same as input type, successful tokenization should allocate exactly zero memory. I think this is a simple way to make sure that the tokenization API works well.Supporting arbitrary forward ranges doesn't seem to be enough, it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest to keep it to "if is sliceable and input type equals string type", at least for the initial version.- If noThrow is a runtime option, some functions can't be nothrow (and consequently nogc). Not sure how important this is. Probably quite a bit because of the current gc implications of exceptions. IMHO: at lexing level a sound design might just emit error tokens (with the culprit as payload) and never throw. Clients may always throw when they see an error token.noThrow is a compile time option and there are nothrow unit tests to make sure that the API is nothrow at least for string inputs.* stdx.data.json.parser: - Similar considerations regarding string type used apply here as well: everything should be parameterized with it - the use case to keep in mind is someone wants everything with refcounted strings.Okay.- The JSON value does its own internal allocation (for e.g. arrays and hashtables), which should be fine as long as it's encapsulated and we can tweak it later (e.g. make it use reference counting inside).Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFICS this would either require to make the types used parameterized (string, array and AA types). Or to abstract them away completely, i.e. only forward operations but deny direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.- parseJSONStream should parameterize on string type, not on appenderFactory.Okay.- Why both parseJSONStream and parseJSONValue? I'm thinking parseJSONValue would be enough because then you trivially parse a stream with repeated calls to parseJSONValue.parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.- FWIW I think the whole thing with accommodating BigInt etc. is an exaggeration. Just stick with long and double.As mentioned earlier somewhere in this thread, there are practical needs to at least be able to handle ulong, too. Maybe the solution is indeed to just (optionally) store the string representation, so people can convert as they see fit.- readArray suddenly introduces a distinct kind of interacting - callbacks. Why? Should be a lazy range lazy range lazy range. An adapter using callbacks is then a two-liner.It just has a more complicated implementation, but is already on the TODO list.- Why is readBool even needed? Just readJSONValue and then enforce it as a bool. 
Same reasoning applies to readDouble and readString.This is for lower level access, using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non- nogc.- readObject is with callbacks again - it would be nice if it were a lazy range.Okay, is also already on the list.- skipXxx are nice to have and useful. * stdx.data.json.value: - The etymology of "opt" is unclear - no word starting with "opt" or obviously abbreviating to it is in the documentation. "opt2" is awkward. How about "path" and "dyn", respectively.The names are just placeholders currently. I think one of the two should also be enough. I've just implemented both, so that both can be tested/seen in practice. There have also been some more name suggestions in a thread mentioned by Meta with a more general suggestion for normal D member access. I'll see if I can dig those up, too.- I think Algebraic should be used throughout instead of TaggedAlgebraic, or motivation be given for the latter.There have already been quite some arguments that I think are compelling, especially with a lack of counter arguments (maybe their consequences need to be explained better, though). TaggedAlgebraic could also (implicitly) convert to Algebraic. An additional argument is the potential possibility of TaggedAlgebraic to abstract away the underlying type, since it doesn't rely on a has!T and get!T API. But apart from that, algebraic is unfortunately currently quite unsuited for this kind of abstraction, even if that can be solved in theory (with a lot of work). It requires to write things like obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just obj["foo"], because it simply returns Variant from all of its forwarded operators.- JSONValue should be more opaque and not expose representation as much as it does now. In particular, offering a built-in hashtable is bound to be problematic because those are expensive to construct, create garbage, and are not customizable. Instead, the necessary lookup and set APIs should be provided by JSONValue whilst keeping the implementation hidden. The same goes about array - a JSONValue shall not be exposed; instead, indexed access primitives should be exposed. Separate types might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary. The string type should be a type parameter of JSONValue.This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it would opens up the possibility to have interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.============================== So, here we are. I realize a good chunk of this is surprising ("you mean I shouldn't create strings in my APIs?"). My point here is, again, we're at a juncture. We're trying to factor garbage (heh) out of API design in ways that defer the lifetime management to the user of the API.Most suggestions so far sound very reasonable, namely parameterizing parsing/lexing on the string type and using ranges where possible. JSONValue is a different beast that needs some more thought if we really want to keep it generic in terms of allocation/lifetime model. 
In terms of removing "garbage" from the API, I'm just not 100% sure if removing small but frequently used functions, such as a string conversion function (one that returns an allocated string) is really a good idea (what Walter's suggested).We could pull json into std.experimental and defer the policy decisions for later, but I think it's a great driver for them. (Thanks Sönke for doing all the work, this is a great baseline.) I think we should use the JSON API as a guinea pig for the new era of D API design in which we have a solid set of principles, tools, and guidelines to defer lifetime management. Please advise.
Aug 18 2015
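As a rough illustration of the output-range direction discussed in the post above (not the reviewed implementation): the core routine writes into any char output range, and the string-returning overload is then the "two-liner" built on an Appender. The name writeJSONString and the deliberately incomplete escape handling are mine, for the sketch only.

import std.array : appender;
import std.range.primitives : isOutputRange, put;

// Writes a JSON string literal into any char output range.
// Escaping is incomplete (no \u.... handling for control characters).
void writeJSONString(R)(ref R sink, const(char)[] s)
    if (isOutputRange!(R, char))
{
    put(sink, '"');
    foreach (ch; s)
    {
        switch (ch)
        {
            case '"':  put(sink, `\"`); break;
            case '\\': put(sink, `\\`); break;
            case '\n': put(sink, `\n`); break;
            default:   put(sink, ch); break;
        }
    }
    put(sink, '"');
}

// The convenience overload really is a thin wrapper over the range version.
string writeJSONString(const(char)[] s)
{
    auto app = appender!string();
    writeJSONString(app, s);
    return app.data;
}

unittest
{
    assert(writeJSONString(`say "hi"`) == `"say \"hi\""`);
}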
On 8/18/15 12:54 PM, Sönke Ludwig wrote:Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:That would help. My point is it's good design to make the response proportional to the problem. 5K lines is not a lot, but reducing those 5K in the first place would be a noble pursuit. And btw saving parsing time is so C++ :o).* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into an own module also makes sense to avoid unnecessarily large imports where other parts of the functionality isn't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules?But I also think that grouping symbols by topic is a good thing and makes figuring out the API easier. There is also always package.d if you really want to import everything.Figuring out the API easily is a good goal. The best way to achieve that is making the API no larger than necessary.Sounds good.* stdx.data.json.generator: I think the API for converting in-memory JSON values to strings needs to be redone, as follows: - JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values followed by the ']' token.An input range style generator is on the TODO list, but would a token range be really useful for anything in practice? I would just go straight for a char range.Another thing I'd like to add is an output range that takes parser nodes and writes to a string output range. This would be the kind of interface that would be most useful for a serialization framework.Couldn't that be achieved trivially by e.g. using map!(t => t.toString) or similar? This is the nice thing about rangifying everything - suddenly you have a host of tools at your disposal.Yes.- On top of byToken it's immediate to implement a method (say toJSON or toString) that accepts an output range of characters and formatting options. - On top of the method above with output range, implementing a toString overload that returns a string for convenience is a two-liner. However, it shouldn't return a "string"; Phobos APIs should avoid "hardcoding" the string type. Instead, it should return a user-chosen string type (including reference counting strings).Without any existing code to test this against, how would this look like? Simply using an `Appender!rcstring`?Great, thanks.- While at it make prettyfication a flag in the options, not its own part of the function name.Already done. Pretty printing is now the default and there is GeneratorOptions.compact.Ooops...* stdx.data.json.lexer: - I assume the idea was to accept ranges of integrals to mean "there's some raw input from a file". This seems to be a bit overdone, e.g. there's no need to accept signed integers or 64-bit integers. I suggest just going with the three character types.It's funny you say that, because this was your own design proposal.Regarding the three character types, if we drop everything but those, I think we could also go with Walter's suggestion and just drop everything apart from "char". 
Putting a conversion range from dchar to char would be trivial and should be fast enough.That's great, thanks.Awesome, thanks.- I see tokenization accepts input ranges. This forces the tokenizer to store its own copy of things, which is no doubt the business of appenderFactory. Here the departure of the current approach from what I think should become canonical Phobos APIs deepens for multiple reasons. First, appenderFactory does allow customization of the append operation (nice) but that's not enough to allow the user to customize the lifetime of the created strings, which is usually reflected in the string type itself. So the lexing method should be parameterized by the string type used. (By default string (as is now) should be fine.) Therefore instead of customizing the append method just customize the string type used in the token.Okay, sounds reasonable if Appender!rcstring is just going to work.Yay to that.- The lexer should internally take optimization opportunities, e.g. if the string type is "string" and the lexed type is also "string", great, just use slices of the input instead of appending them to the tokens.It does.Hmm, point taken. I'm not too worried about the parsing part but string allocation may be problematic.- At token level there should be no number parsing. Just store the payload with the token and leave it for later. Very often numbers are converted without there being a need, and the process is costly. This also nicely sidesteps the entire matter of bigints, floating point etc. at this level.Okay, again, this was your own suggestion. The downside of always storing the string representation is that it requires allocations if no slices are used, and that the string will have to be parsed twice if the number is indeed going to be used. This can have a considerable performance impact.That seems a good balance, and probably could be applied to numbers as well.- Also, at token level strings should be stored with escapes unresolved. If the user wants a string with the escapes resolved, a lazy range does it.To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.Well if the validation is reused from std.utf, it can't have been very much work. I maintain that separating concerns seems like a good strategy here.- Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicationg validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization.It's unfortunate to see this change of mind in face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.I had "take" in mind. Don't forget that "take" automatically uses slices wherever applicable. So if you just use typeof(take(...)), you get the best of all worlds. 
The more restrictive version seems reasonable for the first release.- Litmus test: if the input type is a forward range AND if the string type chosen for tokens is the same as input type, successful tokenization should allocate exactly zero memory. I think this is a simple way to make sure that the tokenization API works well.Supporting arbitrary forward ranges doesn't seem to be enough, it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest to keep it to "if is sliceable and input type equals string type", at least for the initial version.Awesome.- If noThrow is a runtime option, some functions can't be nothrow (and consequently nogc). Not sure how important this is. Probably quite a bit because of the current gc implications of exceptions. IMHO: at lexing level a sound design might just emit error tokens (with the culprit as payload) and never throw. Clients may always throw when they see an error token.noThrow is a compile time option and there are nothrow unit tests to make sure that the API is nothrow at least for string inputs.Well if you figure the general Algebraic type is better replaced by a type specialized for JSON, fine. What we shouldn't endorse is two nearly identical library types (Algebraic and TaggedAlgebraic) that are only different in subtle matters related to performance in certain use patterns. If integral tags are better for closed type universes, specialize Algebraic to use integral tags where applicable.- The JSON value does its own internal allocation (for e.g. arrays and hashtables), which should be fine as long as it's encapsulated and we can tweak it later (e.g. make it use reference counting inside).Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFICS this would either require to make the types used parameterized (string, array and AA types). Or to abstract them away completely, i.e. only forward operations but deny direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?- Why both parseJSONStream and parseJSONValue? I'm thinking parseJSONValue would be enough because then you trivially parse a stream with repeated calls to parseJSONValue.parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.Great. I trust you'll find the right compromise there. All I'm saying is that BigInt here stands like a sore thumb in the whole affair. Best to just take it out and let folks who need it build on top of the lexer.- FWIW I think the whole thing with accommodating BigInt etc. is an exaggeration. Just stick with long and double.As mentioned earlier somewhere in this thread, there are practical needs to at least be able to handle ulong, too. Maybe the solution is indeed to just (optionally) store the string representation, so people can convert as they see fit.Great. Let me say again that with ranges you get to instantly tap into a wealth of tools. 
I say get rid of the callbacks and let a "tee" take care of it for whomever needs it.- readArray suddenly introduces a distinct kind of interacting - callbacks. Why? Should be a lazy range lazy range lazy range. An adapter using callbacks is then a two-liner.It just has a more complicated implementation, but is already on the TODO list.Meh, fine. But all of this is adding weight to the API in the wrong places.- Why is readBool even needed? Just readJSONValue and then enforce it as a bool. Same reasoning applies to readDouble and readString.This is for lower level access, using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non- nogc.Awes!- readObject is with callbacks again - it would be nice if it were a lazy range.Okay, is also already on the list.Okay.- skipXxx are nice to have and useful. * stdx.data.json.value: - The etymology of "opt" is unclear - no word starting with "opt" or obviously abbreviating to it is in the documentation. "opt2" is awkward. How about "path" and "dyn", respectively.The names are just placeholders currently. I think one of the two should also be enough. I've just implemented both, so that both can be tested/seen in practice. There have also been some more name suggestions in a thread mentioned by Meta with a more general suggestion for normal D member access. I'll see if I can dig those up, too.To reiterate the point I made above: we should not endorse two mostly equivalent types that exhibit subtle performance differences. Feel free to change Algebraic to use integrals for some/most cases when the number of types involved is bounded. Adding new methods to Algebraic should also be fine. Just don't add a new type that's 98% the same.- I think Algebraic should be used throughout instead of TaggedAlgebraic, or motivation be given for the latter.There have already been quite some arguments that I think are compelling, especially with a lack of counter arguments (maybe their consequences need to be explained better, though). TaggedAlgebraic could also (implicitly) convert to Algebraic. An additional argument is the potential possibility of TaggedAlgebraic to abstract away the underlying type, since it doesn't rely on a has!T and get!T API.But apart from that, algebraic is unfortunately currently quite unsuited for this kind of abstraction, even if that can be solved in theory (with a lot of work). It requires to write things like obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just obj["foo"], because it simply returns Variant from all of its forwarded operators.Algebraic does not expose opIndex. We could add it to Algebraic such that obj["foo"] returns the same type a "this". It's easy for anyone to say that what's there is unfit for a particular purpose. It's also easy for many to define a ever-so-slightly-different new artifact that fits a particular purpose. Where you come as a talented hacker is to operate with the understanding of the importance of making things work, and make it work.I think we need to do it. Otherwise we're stuck with "D's JSON API cannot be used without the GC". We want to escape that gravitational pull. I know it's hard. But it's worth it.- JSONValue should be more opaque and not expose representation as much as it does now. In particular, offering a built-in hashtable is bound to be problematic because those are expensive to construct, create garbage, and are not customizable. 
Instead, the necessary lookup and set APIs should be provided by JSONValue whilst keeping the implementation hidden. The same goes about array - a JSONValue shall not be exposed; instead, indexed access primitives should be exposed. Separate types might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary. The string type should be a type parameter of JSONValue.This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it would opens up the possibility to have interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.We must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. Andrei============================== So, here we are. I realize a good chunk of this is surprising ("you mean I shouldn't create strings in my APIs?"). My point here is, again, we're at a juncture. We're trying to factor garbage (heh) out of API design in ways that defer the lifetime management to the user of the API.Most suggestions so far sound very reasonable, namely parameterizing parsing/lexing on the string type and using ranges where possible. JSONValue is a different beast that needs some more thought if we really want to keep it generic in terms of allocation/lifetime model. In terms of removing "garbage" from the API, I'm just not 100% sure if removing small but frequently used functions, such as a string conversion function (one that returns an allocated string) is really a good idea (what Walter's suggested).
Aug 21 2015
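A tiny sketch of the composition alluded to with map!(t => t.toString) above: once tokens come out of a range, turning them into text is plain std.algorithm/std.array plumbing. The Token struct here is a stand-in, not the reviewed token type.

import std.algorithm : map;
import std.array : join;

// Stand-in token type, just enough to demonstrate the range composition.
struct Token
{
    string text;
    string toString() const { return text; }
}

unittest
{
    Token[] tokens = [Token("["), Token("1"), Token(","), Token("2"), Token("]")];
    // map turns the token range into a range of string fragments,
    // join concatenates them - no bespoke generator code needed for this path.
    assert(tokens.map!(t => t.toString).join == "[1,2]");
}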
On 8/21/15 1:30 PM, Andrei Alexandrescu wrote:So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?I should add that in parseJSONStream, "stream" refers to the input, whereas in parseJSONValue, "value" refers to the output. -- Andrei
Aug 21 2015
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:We must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. AndreiWow. Just wow.
Aug 21 2015
On 8/21/15 2:03 PM, tired_eyes wrote:On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:By "it" there I mean "the brake" :o). -- AndreiWe must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. AndreiWow. Just wow.
Aug 21 2015
On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:On 8/21/15 2:03 PM, tired_eyes wrote:Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here. T -- He who sacrifices functionality for ease of use, loses both and deserves neither. -- SlashdotterOn Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:By "it" there I mean "the brake" :o). -- AndreiWe must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. AndreiWow. Just wow.
Aug 21 2015
On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- AndreiOn 8/21/15 2:03 PM, tired_eyes wrote:Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:By "it" there I mean "the brake" :o). -- AndreiWe must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. AndreiWow. Just wow.
Aug 21 2015
On Fri, Aug 21, 2015 at 03:22:25PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:Making it pleasant to use without a GC is not the same thing as removing the GC. Which is it? T -- Try to keep an open mind, but not so open your brain falls out. -- thebozOn Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- AndreiOn 8/21/15 2:03 PM, tired_eyes wrote:Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:By "it" there I mean "the brake" :o). -- AndreiWe must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. AndreiWow. Just wow.
Aug 21 2015
On 8/21/15 3:22 PM, Andrei Alexandrescu wrote:On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:Allow me to (possibly) clarify. What Andrei is saying is that you should be able to use D and phobos *without* the GC, not that we should remove the GC. e.g. what Walter was talking about at dconf2015 that instead of converting an integer to a GC-allocated string, you return a range that does the same thing but doesn't allocate. -SteveOn Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- AndreiOn 8/21/15 2:03 PM, tired_eyes wrote:Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:By "it" there I mean "the brake" :o). -- AndreiWe must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. AndreiWow. Just wow.
Aug 21 2015
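To make the dconf example Steven mentions concrete, here is a minimal lazy digit range, a sketch only: instead of allocating a string for an integer, the caller gets a range of chars it can copy into whatever sink it controls. Negative values and formatting options are deliberately not handled.

struct DecimalDigits
{
    uint value;
    uint divisor = 1;

    this(uint v)
    {
        value = v;
        while (divisor <= value / 10)
            divisor *= 10;
    }

    bool empty() const { return divisor == 0; }
    char front() const { return cast(char)('0' + (value / divisor) % 10); }
    void popFront() { divisor /= 10; }
}

unittest
{
    import std.algorithm : equal;
    assert(DecimalDigits(8080).equal("8080"));
    assert(DecimalDigits(0).equal("0"));
}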
Am 21.08.2015 um 19:30 schrieb Andrei Alexandrescu:On 8/18/15 12:54 PM, Sönke Ludwig wrote:Most lines are needed for tests and documentation. Surely dropping some functionality would make the module smaller, too. But there is not a lot to take away without making severe compromises in terms of actual functionality or usability.Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:That would help. My point is it's good design to make the response proportional to the problem. 5K lines is not a lot, but reducing those 5K in the first place would be a noble pursuit. And btw saving parsing time is so C++ :o).* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into an own module also makes sense to avoid unnecessarily large imports where other parts of the functionality isn't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules?So, what's your suggestion, remove all read*/skip* functions for example? Make them member functions of JSONParserRange instead of UFCS functions? We could of course also just use the pseudo modules that std.algorithm had for example, where we'd create a table in the documentation for each category of functions.But I also think that grouping symbols by topic is a good thing and makes figuring out the API easier. There is also always package.d if you really want to import everything.Figuring out the API easily is a good goal. The best way to achieve that is making the API no larger than necessary.No, the idea is to have an output range like so: Appender!string dst; JSONNodeOutputRange r(&dst); r.put(beginArray); r.put(1); r.put(2); r.put(endArray); This would provide a forward interface for code that has to directly iterate over its input, which is the case for a serializer - it can't provide an input range interface in a sane way. The alternative would be to either let the serializer re-implement all of JSON, or to just provide some primitives (writeJSON() that takes bool, number or string) and to let the serializer implement the rest of JSON (arrays/objects), which includes certain options, such as pretty-printing.Another thing I'd like to add is an output range that takes parser nodes and writes to a string output range. This would be the kind of interface that would be most useful for a serialization framework.Couldn't that be achieved trivially by e.g. using map!(t => t.toString) or similar? This is the nice thing about rangifying everything - suddenly you have a host of tools at your disposal.With the difference that numbers stored as numbers never need to allocate, so for non-slicable inputs the compromise is not the same. What about just offering basically three (CT selectable) modes: - Always parse as double (parse lazily if slicing can be used) (default) - Parse double or long (again, lazily if slicing can be used) - Always store the string representation The question that remains is how to handle this in JSONValue - support just double there? Or something like JSONNumber that abstracts away the differences, but makes writing generic code against JSONValue difficult? 
Or make it also parameterized in what it can store?That seems a good balance, and probably could be applied to numbers as well.- Also, at token level strings should be stored with escapes unresolved. If the user wants a string with the escapes resolved, a lazy range does it.To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.There is more than the actual call to validate(), such as writing tests and making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work. I also still think that this hasn't been a bad idea at all. Because it speeds up the most important use case, parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.Well if the validation is reused from std.utf, it can't have been very much work. I maintain that separating concerns seems like a good strategy here.- Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicationg validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization.It's unfortunate to see this change of mind in face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.Okay.I had "take" in mind. Don't forget that "take" automatically uses slices wherever applicable. So if you just use typeof(take(...)), you get the best of all worlds. The more restrictive version seems reasonable for the first release.- Litmus test: if the input type is a forward range AND if the string type chosen for tokens is the same as input type, successful tokenization should allocate exactly zero memory. I think this is a simple way to make sure that the tokenization API works well.Supporting arbitrary forward ranges doesn't seem to be enough, it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest to keep it to "if is sliceable and input type equals string type", at least for the initial version.TaggedAlgebraic would not be a type specialized for JSON! It's useful for all kinds of applications and just happens to have some advantages here, too. An (imperfect) idea for merging this with the existing Algebraic name: template Algebraic(T) if (is(T == struct) || is(T == union)) { // ... implementation of TaggedAlgebraic ... } To avoid the ambiguity with a single type Algebraic, a UDA could be required for T to get the actual TaggedAgebraic behavior. Everything else would be problematic, because TaggedAlgebraic needs to be supplied with names for the different types, so the Algebraic(T...) 
way of specifying allowed types doesn't really work. And, more importantly, because exploiting static type information in the generated interface means breaking code that currently is built around a Variant return value.Well if you figure the general Algebraic type is better replaced by a type specialized for JSON, fine. What we shouldn't endorse is two nearly identical library types (Algebraic and TaggedAlgebraic) that are only different in subtle matters related to performance in certain use patterns. If integral tags are better for closed type universes, specialize Algebraic to use integral tags where applicable.- The JSON value does its own internal allocation (for e.g. arrays and hashtables), which should be fine as long as it's encapsulated and we can tweak it later (e.g. make it use reference counting inside).Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFICS this would either require to make the types used parameterized (string, array and AA types). Or to abstract them away completely, i.e. only forward operations but deny direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.parseToJSONValue/parseToJSONStream? parseAsX?So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?- Why both parseJSONStream and parseJSONValue? I'm thinking parseJSONValue would be enough because then you trivially parse a stream with repeated calls to parseJSONValue.parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.The callbacks would surely be dropped when ranges get available. foreach() should usually be all that is needed.Great. Let me say again that with ranges you get to instantly tap into a wealth of tools. I say get rid of the callbacks and let a "tee" take care of it for whomever needs it.- readArray suddenly introduces a distinct kind of interacting - callbacks. Why? Should be a lazy range lazy range lazy range. An adapter using callbacks is then a two-liner.It just has a more complicated implementation, but is already on the TODO list.Frankly, I don't think that this is even the wrong place. The pull parser interface is the single most important part of the API when we talk about allocation-less and high-performance operation. It also really has low weight, as it's just a small function that joins the other read* functions quite naturally and doesn't create any additional cognitive load.Meh, fine. But all of this is adding weight to the API in the wrong places.- Why is readBool even needed? Just readJSONValue and then enforce it as a bool. Same reasoning applies to readDouble and readString.This is for lower level access, using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non- nogc.It could return a Tuple!(string, JSONNodeRange). But probably there should also be an opApply for the object field range, so that foreach (key, value; ...) 
becomes possible.Awes!- readObject is with callbacks again - it would be nice if it were a lazy range.Okay, is also already on the list.https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1088 https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1348But apart from that, algebraic is unfortunately currently quite unsuited for this kind of abstraction, even if that can be solved in theory (with a lot of work). It requires to write things like obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just obj["foo"], because it simply returns Variant from all of its forwarded operators.Algebraic does not expose opIndex. We could add it to Algebraic such that obj["foo"] returns the same type a "this".It's easy for anyone to say that what's there is unfit for a particular purpose. It's also easy for many to define a ever-so-slightly-different new artifact that fits a particular purpose. Where you come as a talented hacker is to operate with the understanding of the importance of making things work, and make it work.The problem is that making Algebraic exploit static type information means nothing short of a complete reimplementation, which TaggedAlgebraic is. It also means breaking existing code, if, for example, alg[0] suddenly returns a string instead of just a Variant with a string stored inside.I can't fight the feeling that what Phobos currently has in terms of allocators, containters and reference counting is simply not mature enough to make a good decision here. Restricting JSONValue as much as possible would at least keep the possibility to extend it later, but I think that we can and should do better in the long term.I think we need to do it. Otherwise we're stuck with "D's JSON API cannot be used without the GC". We want to escape that gravitational pull. I know it's hard. But it's worth it.- JSONValue should be more opaque and not expose representation as much as it does now. In particular, offering a built-in hashtable is bound to be problematic because those are expensive to construct, create garbage, and are not customizable. Instead, the necessary lookup and set APIs should be provided by JSONValue whilst keeping the implementation hidden. The same goes about array - a JSONValue shall not be exposed; instead, indexed access primitives should be exposed. Separate types might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary. The string type should be a type parameter of JSONValue.This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it would opens up the possibility to have interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
Aug 22 2015
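For the foreach (key, value; ...) point in the post above, here is a minimal sketch of how an opApply on the object-field range could look. The ObjectFieldRange name and its associative-array backing are placeholders; the real thing would pull fields from the parser on demand.

struct ObjectFieldRange
{
    string[string] fields; // stand-in storage for the sketch

    int opApply(scope int delegate(string key, string value) dg)
    {
        foreach (k, v; fields)
        {
            if (auto r = dg(k, v))
                return r;
        }
        return 0;
    }
}

unittest
{
    auto obj = ObjectFieldRange(["name": "D", "kind": "language"]);
    foreach (key, value; obj)
        assert(key.length && value.length);
}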
On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:There is more than the actual call to validate(), such as writing tests and making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work. I also still think that this hasn't been a bad idea at all. Because it speeds up the most important use case, parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.Also see "utf/unicode should only be validated once" https://issues.dlang.org/show_bug.cgi?id=14919 If combining lexing and validation is faster (why?) then a ubyte consuming interface should be available, though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf. In any case during lexing we should avoid autodecoding of narrow strings for redundant validation.
Aug 24 2015
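For reference, what is available today for the ubyte path (the lazy validating adapter suggested above does not exist in std.utf at the time of writing): validate once eagerly, then reinterpret the bytes as text without copying. The helper name below is made up for the example; std.utf.validate is the real function.

import std.utf : validate;

// Eagerly validates a UTF-8 byte buffer and reinterprets it as text, no copy.
// Throws std.utf.UTFException on malformed input.
string toValidatedText(immutable(ubyte)[] raw)
{
    auto text = cast(string) raw;
    validate(text);
    return text;
}

unittest
{
    auto bytes = cast(immutable(ubyte)[]) `{"k": 1}`;
    assert(toValidatedText(bytes) == `{"k": 1}`);
}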
On 25.08.2015 at 07:55, Martin Nowak wrote:On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:There is more than the actual call to validate(), such as writing tests and making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work. I also still think that this hasn't been a bad idea at all. Because it speeds up the most important use case, parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.Also see "utf/unicode should only be validated once" https://issues.dlang.org/show_bug.cgi?id=14919 If combining lexing and validation is faster (why?) then a ubyte consuming interface should be available, though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf. In any case during lexing we should avoid autodecoding of narrow strings for redundant validation.The performance benefit comes from the fact that almost all of JSON is a subset of ASCII, so that lexing the input will implicitly validate it as correct UTF. The only places where actual UTF sequences can occur are in string literals outside of escape sequences. Depending on the type of document, that can result in far fewer conditionals compared to a full validation of the input. Autodecoding during lexing is avoided; everything happens at the code unit level.
Aug 25 2015
On 08/25/2015 09:03 AM, Sönke Ludwig wrote:The performance benefit comes from the fact that almost all of JSON is a subset of ASCII, so that lexing the input will implicitly validate it as correct UTF. The only places where actual UTF sequences can occur are in string literals outside of escape sequences. Depending on the type of document, that can result in far fewer conditionals compared to a full validation of the input.I see, then we should indeed exploit this fact and offer lexing of ubyte[]-ish ranges.
Aug 25 2015
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:- JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values followed by the ']' token.What about the comma tokens?
Aug 19 2015
On 8/19/15 8:42 AM, Timon Gehr wrote:On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:Forgot about those. The invariant is that byToken should return a sequence of tokens that, when parsed, produces the originating object. -- Andrei- JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values followed by the ']' token.What about the comma tokens?
Aug 19 2015
On 2015-08-19 19:29, Andrei Alexandrescu wrote:Forgot about those. The invariant is that byToken should return a sequence of tokens that, when parsed, produces the originating object.That should be possible without the comma tokens in this case? -- /Jacob Carlborg
Aug 19 2015
On 8/19/15 1:59 PM, Jacob Carlborg wrote:On 2015-08-19 19:29, Andrei Alexandrescu wrote:That is correct, but would do little else than confusing folks. FWIW the distinction is similar to AST vs. CST (C = Concrete). -- AndreiForgot about those. The invariant is that byToken should return a sequence of tokens that, when parsed, produces the originating object.That should be possible without the comma tokens in this case?
Aug 19 2015
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:* All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
Aug 24 2015
On 08/25/2015 08:18 AM, Martin Nowak wrote:On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:The great thing about the experimental package is that we are actually allowed to rename it. :-)* All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
Aug 25 2015
On 8/25/15 11:02 AM, Timon Gehr wrote:On 08/25/2015 08:18 AM, Martin Nowak wrote:I strongly oppose renaming it. I don't want Phobos to fall into the trap of javax, which was supposed to be "experimental" but then became unmovable. std.experimental is much more obvious that you shouldn't expect things to live there forever. -SteveOn 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:The great thing about the experimental package is that we are actually allowed to rename it. :-)* All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
Aug 25 2015
Will try to convert a piece of code I wrote a few days ago. https://github.com/MartinNowak/rabbitmq-munin/blob/48c3e7451dec0dcb2b6dccbb9b4230b224e2e647/src/app.d Right now, working with JSON for trivial stuff is a pain.
Aug 25 2015
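For context on the kind of friction such posts refer to, this is what a trivial lookup looks like with today's std.json: every access goes through explicit property accessors that throw on a type mismatch, and there is no opt-style path traversal. The document shape below is made up for the example; parseJSON and .integer are the actual std.json API.

import std.json : parseJSON;

unittest
{
    auto j = parseJSON(`{"queue": {"messages": 3}}`);
    // .integer throws JSONException if the stored value is not an integer.
    long n = j["queue"]["messages"].integer;
    assert(n == 3);
}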
So, what is the current status of std.data.json? This topic is almost two months old; what is the result of the "two week process"? The wiki page says nothing except "ready for comments".
Sep 24 2015
On Thursday, 24 September 2015 at 20:44:57 UTC, tired_eyes wrote:So, what is the current status of std.data.json? This topic is almost two month old, what is the result of "two week process"? Wiki page tells nothing except of "ready for comments".I probably should have posted here. Soenke is working on all the comments as far as I know. It'll come back. Atila
Sep 24 2015
On Tue, 28 Jul 2015 14:07:18 +0000, "Atila Neves" <atila.neves gmail.com> wrote:Start of the two week process, folks. Code: https://github.com/s-ludwig/std_data_json Docs: http://s-ludwig.github.io/std_data_json/ AtilaThere is one thing I noticed today that I personally feel strongly about: serialized double values are not restored accurately. That is, when I send a double value via JSON and use enough digits to represent it accurately, it may not be decoded to the same value. `std.json` does not have this problem with the random values from [0..1) I tested with. I also tried `LexOptions.useBigInt/.useLong` to no avail. Looking at the unittests, it seems the decision was deliberate, as `approxEqual` is used in parsing tests. The JSON spec doesn't enforce any specific accuracy, but it does say that you can arrange for lossless transmission of the widely supported IEEE double values by using up to 17 significant digits. -- Marco
Oct 02 2015
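For reference, the 17-significant-digit guarantee cited above, as a self-contained check using only std.format and std.conv; the round trip assumes the string-to-double conversion rounds correctly, as std.conv is expected to.

import std.conv : to;
import std.format : format;

unittest
{
    double original = 0.1 + 0.2;                  // not exactly 0.3 in IEEE-754
    string emitted  = format("%.17g", original);  // 17 significant digits
    double restored = emitted.to!double;          // assumes correctly rounded parsing
    assert(restored == original);                 // bit-exact round trip
}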
JSON is a particular file format useful for serialising hierarchical data. Given that D also has an XML module which appears to be deprecated, I wonder if it would be better to write a more abstract serialisation/persistence module that could use JSON, XML, some binary format, or future formats. I would estimate that more than 70% of the time, the JSON data will only be read and written by a single D application, with only occasional inspection by developers etc. In these cases it is undesirable to have code littered with types coming from a particular serialisation file format library. As the software evolves, that file format might become obsolete/slow/unfashionable etc., and it would be much nicer if the format could be changed without a lot of code being touched. The other 30% of uses will genuinely need raw JSON control when reading/writing files written/read by other software, and this needs to be in Phobos to implement the backends. It would be better for most people to not write their code in terms of JSON, but in terms of the more abstract concept of persistence/serialisation (whatever you want to call it).
Oct 06 2015
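A toy sketch of the abstraction suggested above, to make the idea concrete: application code calls a generic serialize(), and the concrete format is a pluggable policy. JsonArchiver, its field()/finish() protocol, and Config are all made up for the example, and it only handles numeric and boolean fields.

import std.conv : to;

// Toy JSON back end; a BinaryArchiver or XmlArchiver could implement the
// same field()/finish() protocol without the application code changing.
struct JsonArchiver
{
    string data;

    void field(T)(string name, T value)
    {
        data ~= (data.length ? "," : "") ~ `"` ~ name ~ `":` ~ value.to!string;
    }

    string finish() { return "{" ~ data ~ "}"; }
}

// Format-agnostic entry point: walks the struct's fields via compile-time reflection.
string serialize(Archiver, T)(T value)
{
    Archiver ar;
    foreach (i, member; value.tupleof)
        ar.field(__traits(identifier, T.tupleof[i]), member);
    return ar.finish();
}

struct Config { int port; bool verbose; }

unittest
{
    assert(serialize!JsonArchiver(Config(8080, true)) == `{"port":8080,"verbose":true}`);
}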
On 06.10.2015 at 12:05, Alex wrote:JSON is a particular file format useful for serialising hierarchical data. Given that D also has an XML module which appears to be deprecated, I wonder if it would be better to write a more abstract serialisation/persistence module that could use JSON, XML, some binary format, or future formats. I would estimate that more than 70% of the time, the JSON data will only be read and written by a single D application, with only occasional inspection by developers etc. In these cases it is undesirable to have code littered with types coming from a particular serialisation file format library. As the software evolves, that file format might become obsolete/slow/unfashionable etc., and it would be much nicer if the format could be changed without a lot of code being touched. The other 30% of uses will genuinely need raw JSON control when reading/writing files written/read by other software, and this needs to be in Phobos to implement the backends. It would be better for most people to not write their code in terms of JSON, but in terms of the more abstract concept of persistence/serialisation (whatever you want to call it).A generic serialization framework is definitely needed! Jacob Carlborg once tried to get the Orange[1] serialization library into Phobos, but the amount of requested changes was quite overwhelming and it hasn't worked out so far. There is also a serialization framework in vibe.d[2], but in contrast to Orange it doesn't handle cross references (for pointers/reference types). But this is definitely outside of the scope of this particular module and will require a separate effort. This module is intended to be well suited for that purpose, though. [1]: https://github.com/jacob-carlborg/orange [2]: http://vibed.org/api/vibe.data.serialization/
Oct 06 2015
On Tuesday, 6 October 2015 at 10:05:46 UTC, Alex wrote:I wonder if it would be better to write a more abstract serialisation/persistance module that could use either json,xml,some binary format and future formats.I think there are too many particulars making an abstract (de)serialization module unworkable. If that wasn't the case it would be easy to transform any format into another, by simply deserializing from format A and serializing to format B. But a little experiment will show you that it requires a lot of complexity for the non-trivial case. And the format's particulars will still show up in your code. At which point it begs the question, why not just write simple primitive (de)serialization modules that only do one format? Probably easier to build, maintain and debug. I am reminded of a binary file format I once wrote which supported referenced objects and had enough meta-data to allow garbage collection. It was a big ugly c++ template monster. Any abstract deserializer is going to stay away from that.
Oct 06 2015
On Tuesday, 6 October 2015 at 15:47:08 UTC, Sebastiaan Koppe wrote:At which point it begs the question, why not just write simple primitive (de)serialization modules that only do one format? Probably easier to build, maintain and debug.The binary one is the one I care about, so that's the one I wrote: https://github.com/atilaneves/cerealed I've thinking of adding other formats. I don't know if it's worth it. Atila
Oct 06 2015
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
> Start of the two week process, folks.
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> Atila

Sorry for the late ping, but it's been 3 years - what has happened to this? Has it been forgotten? Working with JSON in D is still quite painful.
Oct 09 2018
On Tuesday, 9 October 2018 at 18:07:44 UTC, Márcio Martins wrote:
> On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
>> Start of the two week process, folks.
>> Code: https://github.com/s-ludwig/std_data_json
>> Docs: http://s-ludwig.github.io/std_data_json/
>> Atila
>
> Sorry for the late ping, but it's been 3 years - what has happened to this? Has it been forgotten? Working with JSON in D is still quite painful.

I presume it became vibe.data.json; there is also asdf if you're looking for some other library.
Oct 09 2018
On Tuesday, October 9, 2018 5:45:02 PM MDT Nicholas Wilson via Digitalmars-d wrote:
> On Tuesday, 9 October 2018 at 18:07:44 UTC, Márcio Martins wrote:
>> On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
>>> Start of the two week process, folks.
>>> Code: https://github.com/s-ludwig/std_data_json
>>> Docs: http://s-ludwig.github.io/std_data_json/
>>> Atila
>>
>> Sorry for the late ping, but it's been 3 years - what has happened to this? Has it been forgotten? Working with JSON in D is still quite painful.
>
> I presume it became vibe.data.json; there is also asdf if you're looking for some other library.

As I understand it, it was originally part of vibe.d (though I think that it may have been internal-only to begin with) and was put into std_data_json for the attempt to get into Phobos. The version inside of vibe.d has continued to be maintained, while the separate version hasn't really been. Either way, there wasn't enough agreement about the design during the Phobos review process for it to make it into Phobos, and Sönke gave up on it.

I've used std_data_json on a few projects, and it works reasonably well for reading JSON, but I've found it rather frustrating when writing JSON, because you have no control over the order it writes data in. That results in perfectly valid JSON, since the key-value pairs are not ordered, but it's really annoying when you use it for configuration files and the like, where you organize the file the way you'd like and then your program completely reorders it when it needs to make an adjustment to the file. However, I really need to check out the properly maintained version in vibe.d, and as you say, there are other JSON parsers on code.dlang.org such as asdf.

Writing a JSON parser is pretty easy. It's coming up with an API that would get through the Phobos review process that's not necessarily easy. While we would like to replace std.json, someone is going to have to put in the effort to write (or complete) something - and push it through the Phobos review process - in order to replace std.json. And while there's clearly some interest in having certain modules in Phobos replaced (or in some cases, new modules added), there don't seem to be many people willing to push their code through the Phobos review process at this point, even if they've put the time and effort into writing the code. They're far more willing to just put it up on code.dlang.org and leave it at that. I think that the fact that code.dlang.org works as well as it does has to a large extent killed off interest in attempting to put anything through the Phobos review process. It's been quite some time since anyone has made the attempt.

Personally, I don't think that we even need some of the stuff in Phobos that's in there (like JSON or XML parsers) and that having it on code.dlang.org makes more sense, but I do think that having subpar versions of them in Phobos is a problem - arguably even more so when we say that at the top of the documentation and have had for years, as is the case with std.xml. I don't think that std.json is rated as badly, but it's been talked about as needing replacement for years, and std_data_json would have replaced it had it made it through the review process. So, we should probably replace it with something one of these days, but of course, someone has to put in the time and effort, which no one wants to do.

- Jonathan M Davis
Oct 09 2018
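[Editor's note: the ordering complaint above is easy to reproduce with the std.json module that is already in Phobos. A minimal illustration; the exact output order depends on the runtime's associative-array implementation, which is precisely the problem.]

// Uses only std.json from Phobos.
import std.json;
import std.stdio;

void main()
{
    auto j = parseJSON(`{"name": "demo", "port": 8080, "debug": false}`);

    // Touch one field, as a program updating its own config file would.
    j["port"] = JSONValue(9090);

    // The re-emitted text is valid JSON, but the members may come out in a
    // different order than in the input, since JSON objects are stored as an
    // associative array that does not preserve insertion order.
    writeln(j.toPrettyString());
}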
On Tuesday, 9 October 2018 at 18:07:44 UTC, Márcio Martins wrote:
> On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
>> Start of the two week process, folks.
>> Code: https://github.com/s-ludwig/std_data_json
>> Docs: http://s-ludwig.github.io/std_data_json/
>> Atila
>
> Sorry for the late ping, but it's been 3 years - what has happened to this? Has it been forgotten? Working with JSON in D is still quite painful.

It was moved to https://github.com/dlang-community/std_data_json a few weeks ago. Contributions are welcome, and if someone wants to take the lead on this library, then show some willingness and you'll get invited.
Oct 09 2018