www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Code review: JSON unmarshaller

reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
https://gist.github.com/3894337

This is my first non-trivial D code, and I'd eventually like to 
get this into Phobos as part of std.json.

I haven't written the marshaller yet, but that shouldn't be too 
hard. I wanted to get some feedback on whether this code is up to 
the quality standards of Phobos.

I used a lot of templates, so I hope I didn't break any cardinal 
sins, especially in terms of readability. I did my best in 
grokking std.traits, but I may have missed some subtleties about 
what the templates are actually testing.

I used asserts and contracts to validate input, so the following 
would throw an AssertError:

     int x = unmarshalJSON!int(`"5"`);

I wasn't sure if this is bad style, since AssertError is in 
core.exception. If this is considered bad style in D, I can 
create a JSONMarshalException and throw that instead.
Oct 15 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-10-15 21:03, Tyler Jameson Little wrote:
 https://gist.github.com/3894337

 This is my first non-trivial D code, and I'd eventually like to get this
 into Phobos as part of std.json.

 I haven't written the marshaller yet, but that shouldn't be too hard. I
 wanted to get some feedback on whether this code is up to the quality
 standards of Phobos.
I'm not sure what your goal with this marshaller is but I would say it's a lot harder than you think if you want to have a complete serialization library. A couple of things making it harder to create a fully working serialization library: * Pointers * Array slices * Serializing through base class references * const/immutable fields * Any reference type (not really hard but it's more work) Have a look at for a basically fully working serialization library Orange: https://github.com/jacob-carlborg/orange -- /Jacob Carlborg
Oct 15 2012
parent reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
 I'm not sure what your goal with this marshaller is but I would 
 say it's a lot harder than you think if you want to have a 
 complete serialization library. A couple of things making it 
 harder to create a fully working serialization library:
I'm basically trying to reproduce other JSON marshallers, like Go's, but using compile-time reflection. Go uses runtime reflection, which D notably does not support. I like the idea of compile-time reflection better anyway. There are a few things that would make it easier (like a __traits call like allMembers that excludes functions). I use a lot of JSON, so a JSON marshaller/unmarshaller is going to save a lot of time, and make my code a lot cleaner.
 * Pointers
I've done this, but haven't fully tested it. Basic pointers work.
 * Array slices
I think this is handled.
 * Serializing through base class references
Doesn't __traits(allMembers, T) give everything from all super classes?
 * const/immutable fields
Hmm, not sure to handle this. These have to be set in the constructor, right?
 * Any reference type (not really hard but it's more work)
Are you talking about aliases? What other kind of reference types are there in structs/classes? I'm assuming this will have more to do with marshalling as opposed to unmarshalling.
 Have a look at for a basically fully working serialization 
 library Orange:

 https://github.com/jacob-carlborg/orange
Hmm, looks interesting. This looks like it only supports XML, which I don't use, but I'm sure you've already solved a lot of the corner cases. Thanks, I'll take a look!
Oct 15 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-10-15 22:35, Tyler Jameson Little wrote:

 I'm basically trying to reproduce other JSON marshallers, like Go's, but
 using compile-time reflection. Go uses runtime reflection, which D
 notably does not support. I like the idea of compile-time reflection
 better anyway. There are a few things that would make it easier (like a
 __traits call like allMembers that excludes functions).
Most other languages are not as complicated as D, it's basically only C and C++ that are. Implementing a marshaller in Ruby would be dead simple. No pointers, no array slices (in the same way as D), support for full runtime reflection.
 I use a lot of JSON, so a JSON marshaller/unmarshaller is going to save
 a lot of time, and make my code a lot cleaner.
Most of these points are when unmarshalling. I haven't actually looked if your marshaller can handle these cases but looking at the small amount of code I would guess no.
 * Pointers
I've done this, but haven't fully tested it. Basic pointers work.
Are they correctly setup when unmarshaling. Example: int a = 3; // global/TLS class Foo { int b = 4; int* c; int* d; } auto foo = new Foo; foo.c = &a; foo.d = &foo.b; When unmarshaling will "foo.d" point to "foo.b"?
 * Array slices
I think this is handled.
This is basically the same as pointers: class Foo { int[] a; int[] b; } auto foo = new Foo; foo.a = [3, 4, 5, 6]; foo.b = foo.a[1 .. 3]; When unmarshaling will "foo.b" point to "foo.a"?
 * Serializing through base class references
Doesn't __traits(allMembers, T) give everything from all super classes?
__traits only work at compile time. class A { int a; } class B : A { int b; } A b = new B; The static type of "b" is "A" so all information about "B" is lost at compile time. You either need to provide a way to register all subclasses that should be be marshaled through a base class reference or you need to implement proper runtime reflection.
 * const/immutable fields
Hmm, not sure to handle this. These have to be set in the constructor, right?
You shouldn't call the constructor when unmarshaling. That's another problem. Do you want to limit your marshaller to only work with classes that have a default constructor or none. You need to create the class instances without calling the constructor. Then you could provide a method that will be called before/after unmarshaling. Have a look that this post: http://www.digitalmars.com/d/archives/digitalmars/D/Deserializing_const_fields_175774.html
 * Any reference type (not really hard but it's more work)
Are you talking about aliases? What other kind of reference types are there in structs/classes? I'm assuming this will have more to do with marshalling as opposed to unmarshalling.
Yes, you don't want to marshal the same object twice. References types in D are: objects, pointers, associative arrays and arrays. These are the ones I can think of for now.
 Have a look at for a basically fully working serialization library
 Orange:

 https://github.com/jacob-carlborg/orange
Hmm, looks interesting. This looks like it only supports XML, which I don't use, but I'm sure you've already solved a lot of the corner cases. Thanks, I'll take a look!
I have solved a lot of corner cases but there are a few left. I have a branch for handling const/immutable fields but it needs more testing before merging it with the master branch. I'm also not really happy about the deserializing of arrays. It's quite slow. Apparently it's also breaks as soon as you turn on some kind of optimization when compiling. The goal of Orange was to be able serialize basically everything found in D. Also to support multiple archive types, i.e. XML, JSON, binary and so on. -- /Jacob Carlborg
Oct 15 2012
next sibling parent reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
I could make my marshaller/unmarshaller only update objects in 
place. I think this is more useful and would remove the overlap 
between orange and the JSON library. We could then write a JSON 
archiver for orange and include it in std.json as well.

The call to unmarshal would look like:

bool unmarshalJSON(T)(JSONValue val, out T ret);

The following restrictions would apply:

* T must be fully instantiated (all pointers are valid [not null])
* T must not be recursive (results in infinite recursion, and 
hence stack overflow)

And the marshaller:

JSONValue marshalJSON(T)(in T val);

For marshalling, the restrictions are:

* Slices are handled as if they were an array (copy all values)
* Same as unmarshaller, except null pointers will be treated as 
JSON null

I really like Go's JSON marshaller/unmarshaller, so I'm trying to 
model after that one. It allows updating an object in place, 
which was already a goal.

There should probably be some standard D serialization format. In 
working with a structure trained on data (for machine learning, 
natural language processing, etc), a complete serialization 
solution makes sense. But for simple data passing, JSON makes a 
lot of sense.

What do you think, do you think there's a place in Phobos for a 
simple JSON marshaller/unmarshaller?

I'll have some updated code soon, and I'll post back when that's 
done, in case you'd like to have a look.
Oct 17 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-10-17 19:39, Tyler Jameson Little wrote:
 I could make my marshaller/unmarshaller only update objects in place. I
 think this is more useful and would remove the overlap between orange
 and the JSON library. We could then write a JSON archiver for orange and
 include it in std.json as well.

 The call to unmarshal would look like:

 bool unmarshalJSON(T)(JSONValue val, out T ret);
Orange works with the archive at a lower level. For example, the archive doesn't really have to know how to (un)archive an object or struct. The serializer will break down the object into its fields and ask the archive to (un)archive the individual fields. The only thing the archive needs to know is that "here starts an object, from now on all (un)archived values will be part of the object until I say otherwise".
 The following restrictions would apply:

 * T must be fully instantiated (all pointers are valid [not null])
That seems to be an unnecessary restriction.
 * T must not be recursive (results in infinite recursion, and hence
 stack overflow)
I think the serializer in Orange can handle this. That would mean the archive doesn't need to handle this.
 And the marshaller:

 JSONValue marshalJSON(T)(in T val);

 For marshalling, the restrictions are:

 * Slices are handled as if they were an array (copy all values)
So mean: int[] a = [3, 4, 5, 6]; int[] b = [1 .. $ - 1]; That "a" and "b" would be marshaled as two distinct arrays? In Orange, I think the serializer will handle this and the archive doesn't need to care. I tried to but as much of the code in the serializer so the archives doesn't need to bother with these kind of things.
 * Same as unmarshaller, except null pointers will be treated as JSON null
If you can marshal a null pointer, how can you not unmarshal it?
 I really like Go's JSON marshaller/unmarshaller, so I'm trying to model
 after that one. It allows updating an object in place, which was already
 a goal.

 There should probably be some standard D serialization format. In
 working with a structure trained on data (for machine learning, natural
 language processing, etc), a complete serialization solution makes
 sense. But for simple data passing, JSON makes a lot of sense.
Absolutely, there is a need for both, see below.
 What do you think, do you think there's a place in Phobos for a simple
 JSON marshaller/unmarshaller?
Absolutely. I think there is a need for several types and variants of serialization. Sometimes you need to have a fully capable serialization library that can handle all types, custom serialization of third party types and so on. In other cases you don't really care an just want to dump some data to disk or whatever.
 I'll have some updated code soon, and I'll post back when that's done,
 in case you'd like to have a look.
-- /Jacob Carlborg
Oct 17 2012
prev sibling next sibling parent reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
Here's the updated code. It's got a marshaller and unmarshaller:

https://gist.github.com/3894337

It's about 650 lines. If you have time, I'd be very interested in 
getting some feedback (or from anyone else who sees this post of 
course).

The main problem I'm having right now is that classes/structs 
have to be static. I'm not 100% sure why the compiler cannot see 
non-static classes/structs at compile time. Do you happen to know 
why? It seems like a template should work in either case, 
assuming I'm understanding D templates correctly.

I didn't find any clear documentation for static outer classes, 
only static inner classes. It's not the same as static Java 
classes, which cannot be instantiated (if memory serves).
Oct 17 2012
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 17 October 2012 at 19:44:47 UTC, Tyler Jameson 
Little wrote:
 The main problem I'm having right now is that classes/structs 
 have to be static.
That seems weird, I've done something similar with non-static structs before. Maybe it will help if you use __traits(getMember, obj, name) instead of mixin.
Oct 17 2012
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-10-17 21:44, Tyler Jameson Little wrote:
 Here's the updated code. It's got a marshaller and unmarshaller:

 https://gist.github.com/3894337

 It's about 650 lines. If you have time, I'd be very interested in
 getting some feedback (or from anyone else who sees this post of course).

 The main problem I'm having right now is that classes/structs have to be
 static. I'm not 100% sure why the compiler cannot see non-static
 classes/structs at compile time. Do you happen to know why? It seems
 like a template should work in either case, assuming I'm understanding D
 templates correctly.
What do you mean with "static structs/classes"? Are you talking about nested classes and structs?
 I didn't find any clear documentation for static outer classes, only
 static inner classes. It's not the same as static Java classes, which
 cannot be instantiated (if memory serves).
-- /Jacob Carlborg
Oct 17 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-10-17 21:44, Tyler Jameson Little wrote:
 Here's the updated code. It's got a marshaller and unmarshaller:

 https://gist.github.com/3894337

 It's about 650 lines. If you have time, I'd be very interested in
 getting some feedback (or from anyone else who sees this post of course).
I'll try and see if I can find some time to give feedback on this. -- /Jacob Carlborg
Oct 17 2012
prev sibling parent reply "Kagamin" <spam here.lot> writes:
On Tuesday, 16 October 2012 at 06:37:55 UTC, Jacob Carlborg wrote:
 The goal of Orange was to be able serialize basically 
 everything found in D.
Can it serialize Variant?
Oct 17 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-10-17 22:03, Kagamin wrote:

 Can it serialize Variant?
No, but I'm working on it. Actually, it can serialize it, but not deserialize it. -- /Jacob Carlborg
Oct 18 2012
prev sibling parent "Dan" <dbdavidson yahoo.com> writes:
On Monday, 15 October 2012 at 20:35:34 UTC, Tyler Jameson Little 
wrote:
 I'm basically trying to reproduce other JSON marshallers, like 
 Go's, but using compile-time reflection. Go uses runtime 
 reflection, which D notably does not support. I like the idea 
 of compile-time reflection better anyway. There are a few 
 things that would make it easier (like a __traits call like 
 allMembers that excludes functions).

 I use a lot of JSON, so a JSON marshaller/unmarshaller is going 
 to save a lot of time, and make my code a lot cleaner.
I like Go's JSON convenience as well. There is a nice feature where you can add attributes to the members that are then available at runtime and therefore used by the serializer. So you could have: ------ type AcquiredRetired struct { Acquired tvm.Date `bson:"a"` Retired tvm.Date `bson:"r"` } ------ Here it specifies a shortened key for bson, but you can do the same for json. The size benefit can be significant. A design choice they made is to only serialize members that are capitalized which means visible. There is a nice json serialize/deserialize library in vibed. When I throw this struct at your marshalJSON I get compile errors. ---------- import std.stdio; struct X { class D { string b = "B"; } string a = "A"; D d; } void main() { auto c = new X(); auto o = marshalJSON(c); writeln(o); } ---------- Thanks Dan
Oct 18 2012
prev sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 10/15/2012 12:03 PM, Tyler Jameson Little wrote:

I did my best in grokking
 std.traits, but I may have missed some subtleties about what the
 templates are actually testing.
You have mentioned needing an allMembers that excluded functions in one of your other posts. The following thread was exactly about that. I can never remember the solution, but I found it again: :) http://www.digitalmars.com/d/archives/digitalmars/D/learn/Getting_only_the_data_members_of_a_type_34086.html (Although JSON is not mentioned in there, that thread has been inspired by marshalling as well. :))
 I used asserts and contracts to validate input, so the following would
 throw an AssertError:

 int x = unmarshalJSON!int(`"5"`);
std.exception.enforce is the right choice in that case. You don't want the checks to disappear when asserts are turned off.
 I wasn't sure if this is bad style, since AssertError is in
 core.exception. If this is considered bad style in D, I can create a
 JSONMarshalException and throw that instead.
That makes sense too. There is enforceEx() to throw a specific type of exception. Ali
Oct 16 2012
parent reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
 You have mentioned needing an allMembers that excluded 
 functions in one of your other posts. The following thread was 
 exactly about that. I can never remember the solution, but I 
 found it again: :)


 http://www.digitalmars.com/d/archives/digitalmars/D/learn/Getting_only_the_data_members_of_a_type_34086.html
The mentioned solution doesn't account for shared fields from a super class: class A { int a; } class S { int b; } foreach (i, type; typeof(S.tupleof)) { enum name = S.tupleof[i].stringof[4..$]; writef("(%s) %s\n", type.stringof, name); } This will print: (int) b My implementation is ugly, but it works for this case: (ret.b) b (ret.a) a I could use std.traits.BaseClassTuple, but then I'd have to filter out common fields, and that sounds like a lot of work, especially since there's no practical difference.
 I used asserts and contracts to validate input, so the
following would
 throw an AssertError:

 int x = unmarshalJSON!int(`"5"`);
std.exception.enforce is the right choice in that case. You don't want the checks to disappear when asserts are turned off.
 I wasn't sure if this is bad style, since AssertError is in
 core.exception. If this is considered bad style in D, I can
create a
 JSONMarshalException and throw that instead.
That makes sense too. There is enforceEx() to throw a specific type of exception. Ali
Good point. I'll probably make a JSONMarshalException, which is separate from JSONException in std.json so the library clearly indicates which part failed. Thanks for the link, it was an interesting read! Maybe I'll have to dig around in std.traits and maybe add some missing stuff. With mixin() (I'd forgotten about it) I was able to get rid of all __traits calls except for allMembers.
Oct 17 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-10-17 20:36, Tyler Jameson Little wrote:

 The mentioned solution doesn't account for shared fields from a super
 class:

      class A { int a; }
      class S { int b; }

      foreach (i, type; typeof(S.tupleof)) {
          enum name = S.tupleof[i].stringof[4..$];
          writef("(%s) %s\n", type.stringof, name);
      }

 This will print:

      (int) b
Just do something like this: alias BaseTypeOf!(S) BaseType; BaseType t = type; And run the same loop again. -- /Jacob Carlborg
Oct 17 2012