www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Review: std.msgpack

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
There's one larger submission that needs review - Masahiro Nakagawa's 
std.msgpack. I think it would be great if this community established a 
review process. To do so, we should abstract away starting from at least 
one review :o). Here's Masahiro's work:

code: https://dl.dropbox.com/u/374829/std/msgpack.d
doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and 
doc)

Feel free to comment here.


Andrei
Jun 17 2010
next sibling parent reply Graham Fawcett <fawcett uwindsor.ca> writes:
On Thu, 17 Jun 2010 21:29:50 -0700, Andrei Alexandrescu wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's
 std.msgpack. I think it would be great if this community established a
 review process. 
+1!
 To do so, we should abstract away starting from at least
 one review :o). Here's Masahiro's work:
 
 code: https://dl.dropbox.com/u/374829/std/msgpack.d doc: 
 https://dl.dropbox.com/u/374829/std/msgpack.html zip: 
 https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and doc)
 
 Feel free to comment here.
Nice! The docs should link to the MessagePack Format Spec page in the first paragraph. I had not known that MessagePack was an existing binary serialization format. http://redmine.msgpack.org/projects/msgpack/wiki/FormatSpec I was going to complain that this module didn't handle D-specific types like Variants, but now I see that this is out of scope. Just curious, why have MessagePack support in Phobos, instead of Packed Encoding Rules, BERT, BSON, Protocol Buffers, etc.? Not that this isn't a very useful module, but I'm curious about why this particular binary encoding. If it's a 'batteries included' thing ("let's put as many useful modules as possible in the stdlib"), then maybe a dotted name like 'std.serializer.msgpack' might be better, to give a proper home for 'std.serializer.per', etc.? I'm not sure this deserves the same 'top level billing' as, say, std.stdio or std.range; and eventually the toplevel namespace will be very cluttered (c.f. Python's stdlib). In a perfect world, I would see this as an excellent member of a CPAN-like system for D, rather than a stdlib module. Since that doesn't exist yet, +1, but with a nested module name. Best, Graham
Jun 18 2010
parent reply "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Fri, 18 Jun 2010 23:05:45 +0900, Graham Fawcett <fawcett uwindsor.ca>  
wrote:

 On Thu, 17 Jun 2010 21:29:50 -0700, Andrei Alexandrescu wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's
 std.msgpack. I think it would be great if this community established a
 review process.
+1!
 To do so, we should abstract away starting from at least
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d doc:
 https://dl.dropbox.com/u/374829/std/msgpack.html zip:
 https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and doc)

 Feel free to comment here.
Nice! The docs should link to the MessagePack Format Spec page in the first paragraph. I had not known that MessagePack was an existing binary serialization format. http://redmine.msgpack.org/projects/msgpack/wiki/FormatSpec I was going to complain that this module didn't handle D-specific types like Variants, but now I see that this is out of scope.
All right. I moved link to See_Also section in 1st paragraph.
 Just curious, why have MessagePack support in Phobos, instead of
 Packed Encoding Rules, BERT, BSON, Protocol Buffers, etc.? Not that
 this isn't a very useful module, but I'm curious about why this
 particular binary encoding.

 If it's a 'batteries included' thing ("let's put as many useful
 modules as possible in the stdlib"), then maybe a dotted name like
 'std.serializer.msgpack' might be better, to give a proper home for
 'std.serializer.per', etc.? I'm not sure this deserves the same 'top
 level billing' as, say, std.stdio or std.range; and eventually the
 toplevel namespace will be very cluttered (c.f. Python's stdlib).
I like 'batteries included' approach(Python supports a diversity of modules). Of course, quality is the most important thing. I agree std.serializer.msgpack.
 In a perfect world, I would see this as an excellent member of a
 CPAN-like system for D, rather than a stdlib module. Since that
 doesn't exist yet, +1, but with a nested module name.
D doesn't have CPAN-like system :'( Masahiro
Jun 18 2010
parent KennyTM~ <kennytm gmail.com> writes:
On Jun 19, 10 09:36, Masahiro Nakagawa wrote:
 On Fri, 18 Jun 2010 23:05:45 +0900, Graham Fawcett <fawcett uwindsor.ca>
 wrote:

 On Thu, 17 Jun 2010 21:29:50 -0700, Andrei Alexandrescu wrote:
[snip]
 In a perfect world, I would see this as an excellent member of a
 CPAN-like system for D, rather than a stdlib module. Since that
 doesn't exist yet, +1, but with a nested module name.
D doesn't have CPAN-like system :'( Masahiro
DSSS needs to be revived.
Jun 18 2010
prev sibling next sibling parent reply "Rory McGuire" <rmcguire neonova.co.za> writes:
On Fri, 18 Jun 2010 06:29:50 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's  
 std.msgpack. I think it would be great if this community established a  
 review process. To do so, we should abstract away starting from at least  
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and  
 doc)

 Feel free to comment here.


 Andrei
If you're going to have a std.serialize surely it should be called that and surely it should have various serialization drivers? I'm not convinced that it makes sense to just have one binary serialization mechanism when during debugging you might prefer it to use xml or some text based serialization mechanism. I really like this serialization mechanism. It looks like its a simplified copy of Hessian dsource.org/projects/hessiand. I like hessian a lot, I even made my own simple hessian implementation (not hessiand thats by someone else). msgpack seems to rely on a IDL compiler I don't think it should be in the standard library if this is the case, better to have something that automatically generates the code from an interface and then uses a xml, binary, ini, etc... driver to generate the serialized/de-serialized data. Just my two cents. -Rory PS: your book is awesome, really well written.
Jun 18 2010
parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Fri, 18 Jun 2010 23:28:04 +0900, Rory McGuire <rmcguire neonova.co.za>  
wrote:
 msgpack seems to rely on a IDL compiler I don't think it should be in  
 the standard library if this is the case, better to have something that  
 automatically generates the code from an interface and then uses a xml,  
 binary, ini, etc... driver to generate the serialized/de-serialized data.
MessagePack doesn't use IDL. This is one of reason that I select MessagePack. Masahiro
Jun 18 2010
prev sibling next sibling parent reply "Rory McGuire" <rmcguire neonova.co.za> writes:
On Fri, 18 Jun 2010 06:29:50 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's  
 std.msgpack. I think it would be great if this community established a  
 review process. To do so, we should abstract away starting from at least  
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and  
 doc)

 Feel free to comment here.


 Andrei
I would prefer to see something like this as standard: http://www.dsource.org/projects/orange/ -Rory
Jun 18 2010
parent reply "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Fri, 18 Jun 2010 23:36:54 +0900, Rory McGuire <rmcguire neonova.co.za>  
wrote:
 I would prefer to see something like this as standard:
 http://www.dsource.org/projects/orange/
I think Doost.serialization is one of candidates for std.serializer(serialization?). But I don't know current status. In addition, Michel has serialization module. I don't know module detail, but he mentioned at d.announce. http://lists.puremagic.com/pipermail/digitalmars-d-announce/2010-April/015271.html I don't know other general serialization modules. Masahiro
Jun 18 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-18 22:18:20 -0400, "Masahiro Nakagawa" <repeatedly gmail.com> said:

 In addition, Michel has serialization module.
 I don't know module detail, but he mentioned at d.announce.
 http://lists.puremagic.com/pipermail/digitalmars-d-announce/2010-April/015271.html
Indeed,
 
and I'm still improving it, very slowly though. Here are some highlights. The serialization format has a few basic types: integer, floating-point, string, bytestring, array, associative-array, object, but a couple of representation for each for efficient storage. The 'encode' and 'decode' template functions can be used to encode and decode any type to/from an archive: MyStruct m; archiver.encode(m); unarchiver.decode(m); It throws an exception of the archive format is not compatible with your type. If you want to serialize a struct, you must define it like this: struct MyStruct { mixin Archivable!(int, "i"); mixin Archivable!(string, "s"); } or like this: struct MyStruct { int i; string s; void encode(ref KeyArchiver archive) { archive.encode("i", i); archive.encode("s", s); } void decode(ref KeyUnarchiver archive) { foreach (key, value; archive) { switch (key) { case "i": value.decode(i); break; case "s": value.decode(s); break; default: break; } } } } or like this: struct MyStruct { int i; string s; void encode(ref KeyArchiver archive) { archive.encode("i", i); archive.encode("s", s); } void decode(ref KeyUnarchiver archive) { archive.decode("i", i); archive.decode("s", s); } } This last decode function is more convenient since you can read values in a different order than what they're stored, but it but has more overhead than the previous one using a 'switch'. For archiving/unarchiving objects it works the same, but there are two more requirements: the class needs a default constructor and it must implements the KeyArchivable interface (which contains the encode/decode functions above). I'm using a custom serialization format, derived from Hessian but no longer compatible. I started by trying to conform to Hessian but parts of the spec seemed inappropriate for D so I decided to fork. The format can handle cycles and multiple references to the same object, but that's not yet implemented. The archiver and unarchiver are very much based on templates, and as such it's hard to support multiple archive formats when calling the virtual encode/decode functions on an object. But there's nothing standing in the way of having a slower but more generic interface cohabiting with the format-specific one. The archiver would check for the generic interface as an alternative when the format-specific one isn't implemented for a given class. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 19 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-06-19 20:15, Michel Fortin wrote:
 On 2010-06-18 22:18:20 -0400, "Masahiro Nakagawa" <repeatedly gmail.com>
 said:

 In addition, Michel has serialization module.
 I don't know module detail, but he mentioned at d.announce.
 http://lists.puremagic.com/pipermail/digitalmars-d-announce/2010-April/015271.html
Indeed,

 and I'm still improving it, very slowly though. Here are some highlights.

 The serialization format has a few basic types: integer, floating-point,
 string, bytestring, array, associative-array, object, but a couple of
 representation for each for efficient storage. The 'encode' and 'decode'
 template functions can be used to encode and decode any type to/from an
 archive:

 MyStruct m;

 archiver.encode(m);
 unarchiver.decode(m);

 It throws an exception of the archive format is not compatible with your
 type.

 If you want to serialize a struct, you must define it like this:

 struct MyStruct {
 mixin Archivable!(int, "i");
 mixin Archivable!(string, "s");
 }

 or like this:

 struct MyStruct {
 int i;
 string s;

 void encode(ref KeyArchiver archive) {
 archive.encode("i", i);
 archive.encode("s", s);
 }
 void decode(ref KeyUnarchiver archive) {
 foreach (key, value; archive) {
 switch (key) {
 case "i": value.decode(i); break;
 case "s": value.decode(s); break;
 default: break;
 }
 }
 }
 }

 or like this:

 struct MyStruct {
 int i;
 string s;

 void encode(ref KeyArchiver archive) {
 archive.encode("i", i);
 archive.encode("s", s);
 }
 void decode(ref KeyUnarchiver archive) {
 archive.decode("i", i);
 archive.decode("s", s);
 }
 }

 This last decode function is more convenient since you can read values
 in a different order than what they're stored, but it but has more
 overhead than the previous one using a 'switch'.

 For archiving/unarchiving objects it works the same, but there are two
 more requirements: the class needs a default constructor and it must
 implements the KeyArchivable interface (which contains the encode/decode
 functions above).

 I'm using a custom serialization format, derived from Hessian but no
 longer compatible. I started by trying to conform to Hessian but parts
 of the spec seemed inappropriate for D so I decided to fork. The format
 can handle cycles and multiple references to the same object, but that's
 not yet implemented.

 The archiver and unarchiver are very much based on templates, and as
 such it's hard to support multiple archive formats when calling the
 virtual encode/decode functions on an object. But there's nothing
 standing in the way of having a slower but more generic interface
 cohabiting with the format-specific one. The archiver would check for
 the generic interface as an alternative when the format-specific one
 isn't implemented for a given class.
Orange (http://dsource.org/projects/orange/) doesn't have those problems (it probably has other problems). These are some of the features: * It handles both serializing and deserializing * It automatically serializes the base classes * It supports events (before and after (de)serializing) * It supports non-serialized fields (you can say that some fields in a class should not be serialized) * It's licensed under the Boost license * It's fairly std/runtime library independent * You can create new archive types and use them with the existing serializer * Serializes through base class references * Serializes third party types The only thing missing that I want to implement is to somehow support different versions of classes and structs. I'm currently working on porting the XML archive to Phobos. -- /Jacob Carlborg
Jun 19 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-19 15:52:51 -0400, Jacob Carlborg <doob me.com> said:

 Orange (http://dsource.org/projects/orange/) doesn't have those 
 problems (it probably has other problems). These are some of the 
 features:
 
 * It automatically serializes the base classes
  ...
 * Serializes through base class references
 * Serializes third party types
By that, do you mean it serializes the fields of the base classes and other types even when those types weren't designed to be serialized? Personally, I think it's dangerous to break encapsulation like that. You can never guaranty that the base class won't contain data that shouldn't be serialized (a reference count, a file or socket handle, cached data bound to be different at a later time, a reference to the root of a huge hierarchy isn't "part" of the object, etc.). Serialization might work for one version of the base class, and when the implementation changes it breaks. I prefer the reverse design: only types with a known serialization procedure are serializable. That said, I can still appreciate the merit of simplicity when it comes to serializing types like this one: struct Point { float x, y; } But it's easy to make it explicitly serializable too: mixin AllMembersArchivable!(Point);
 * It's licensed under the Boost license
And mine isn't public yet. :-)
 The only thing missing that I want to implement is to somehow support 
 different versions of classes and structs. I'm currently working on 
 porting the XML archive to Phobos.
Versioning is something I generally prefer to handle at the application level. Having class and struct fields serialized with a string key makes that quite simple and allows keeping the serialized data compatible with earlier versions of the same struct/class. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 19 2010
parent Jacob Carlborg <doob me.com> writes:
On 2010-06-20 00:34, Michel Fortin wrote:
 On 2010-06-19 15:52:51 -0400, Jacob Carlborg <doob me.com> said:

 Orange (http://dsource.org/projects/orange/) doesn't have those
 problems (it probably has other problems). These are some of the
 features:

 * It automatically serializes the base classes
 ...
 * Serializes through base class references
 * Serializes third party types
By that, do you mean it serializes the fields of the base classes and other types even when those types weren't designed to be serialized? Personally, I think it's dangerous to break encapsulation like that. You can never guaranty that the base class won't contain data that shouldn't be serialized (a reference count, a file or socket handle, cached data bound to be different at a later time, a reference to the root of a huge hierarchy isn't "part" of the object, etc.). Serialization might work for one version of the base class, and when the implementation changes it breaks.
You always have the option to register a serializer for a specific type if you want to customize the serialization. I'm also thinking of adding the option to register a serializer for a specific field in a specific type.
 I prefer the reverse design: only types with a known serialization
 procedure are serializable. That said, I can still appreciate the merit
 of simplicity when it comes to serializing types like this one:

 struct Point { float x, y; }

 But it's easy to make it explicitly serializable too:

 mixin AllMembersArchivable!(Point);
In Orange you have three options: 1. Let the library do all the work 2. Implement two methods in the class/struct that handle the serialization 3. Register a (de)serializer function/delegate that handles the serialization You can also combine these three options. For example, let the library handle two out of three classes in a class hierarchy (1) and handle the third one by yourself (2 or 3). The third option also allows you to customize third party types.
 * It's licensed under the Boost license
And mine isn't public yet. :-)
 The only thing missing that I want to implement is to somehow support
 different versions of classes and structs. I'm currently working on
 porting the XML archive to Phobos.
Versioning is something I generally prefer to handle at the application level. Having class and struct fields serialized with a string key makes that quite simple and allows keeping the serialized data compatible with earlier versions of the same struct/class.
Every value in Orange is serialized with a key. By default an object will just have a numerical id from a counter and the fields will have their names as the key. -- /Jacob Carlborg
Jun 20 2010
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2010-06-19 04:18, Masahiro Nakagawa wrote:
 On Fri, 18 Jun 2010 23:36:54 +0900, Rory McGuire 
 <rmcguire neonova.co.za> wrote:
 I would prefer to see something like this as standard:
 http://www.dsource.org/projects/orange/
I think Doost.serialization is one of candidates for std.serializer(serialization?). But I don't know current status.
Doost.serialization doesn't support: * Serialization from base class is not possible * Creation of class/struct instance without default/public constructor Serialization from base class was one of my goals when I implemented Orange. But you need to register a function for that, until D2 fixes the "getMembers" method in TypeInfo_Class.
 In addition, Michel has serialization module.
 I don't know module detail, but he mentioned at d.announce.
 http://lists.puremagic.com/pipermail/digitalmars-d-announce/20
0-April/015271.html 
 
 
 I don't know other general serialization modules.
 
 
 Masahiro
-- /Jacob Carlborg
Jun 19 2010
parent reply "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Sun, 20 Jun 2010 04:27:08 +0900, Jacob Carlborg <doob me.com> wrote:
 Serialization from base class was one of my goals when I implemented  
 Orange.
Good.
 But you need to register a function for that, until D2 fixes the
 "getMembers" method in TypeInfo_Class.
You reported this bug? I couldn't find closest bug in bugzilla. Masahiro
Jun 21 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 21 Jun 2010 22:24:42 -0400, Masahiro Nakagawa  
<repeatedly gmail.com> wrote:

 On Sun, 20 Jun 2010 04:27:08 +0900, Jacob Carlborg <doob me.com> wrote:
 Serialization from base class was one of my goals when I implemented  
 Orange.
Good.
 But you need to register a function for that, until D2 fixes the
 "getMembers" method in TypeInfo_Class.
You reported this bug? I couldn't find closest bug in bugzilla.
The root bug is 1348: offTi property of TypeInfo return empty array. offTi or offset typeinfo should contain an array of offsets and their typeinfo, i.e. runtime reflection. There are also a bunch of other typeinfo bugs in bugzilla.
 Masahiro
Jun 21 2010
parent Jacob Carlborg <doob me.com> writes:
On 2010-06-22 04:39, Robert Jacques wrote:
 On Mon, 21 Jun 2010 22:24:42 -0400, Masahiro Nakagawa
 <repeatedly gmail.com> wrote:

 On Sun, 20 Jun 2010 04:27:08 +0900, Jacob Carlborg <doob me.com> wrote:
 Serialization from base class was one of my goals when I implemented
 Orange.
Good.
 But you need to register a function for that, until D2 fixes the
 "getMembers" method in TypeInfo_Class.
You reported this bug? I couldn't find closest bug in bugzilla.
The root bug is 1348: offTi property of TypeInfo return empty array. offTi or offset typeinfo should contain an array of offsets and their typeinfo, i.e. runtime reflection. There are also a bunch of other typeinfo bugs in bugzilla.
Like the missing const declaration on the "name" method in TypeInfo_Member (or what the name of the class is).
 Masahiro
-- /Jacob Carlborg
Jun 22 2010
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 18 Jun 2010 00:29:50 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's  
 std.msgpack. I think it would be great if this community established a  
 review process. To do so, we should abstract away starting from at least  
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and  
 doc)

 Feel free to comment here.


 Andrei
First, +1 to the review process. Second, a lot of the design and examples are very similar to the MsgPack reference implementations, which are all Apache licensed. Since some of these design decisions seem to not be 'D'-ish (more on that later) this raises questions on just how clean room the implementation was. Third, there's a lot of things that shouldn't be in the final module. -Should be removed: SimpleBuffer, VRefBuffer/vRefBuffer, -Redundant with other parts of phobos: BinaryFileWriter => LockingTextWriter, mp_Type/mp_Object/mp_KeyValue => std.variant -Should be moved elsewhere DeflateFilter/deflateFilter => zip/zlib Forth, msgpack was designed to be a simple, wire level encoding of some higher structure. As opposed to JSON or XML, you are not going to be modifying it dynamically. The very concept that it would use buffers internally, particularly the existence of the "zero-copy" buffer, baffles me. (Not to mention it leaves one open to aliasing bugs.) MsgPack has to decorate every single type it stores, so there is nothing "zero-copy" about it. To that end, I think MsgPack/MsgUnPack should wrap an input or output range as appropriate and be able to be used by a general serialization library such as Orange. Other little things, -Why isn't nil mapped to null? -Why isn't there any big warnings about serializing cycles?
Jun 18 2010
parent reply "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Fri, 18 Jun 2010 23:47:38 +0900, Robert Jacques <sandford jhu.edu>  
wrote:
 Second, a lot of the design and examples are very similar to the MsgPack  
 reference implementations, which are all Apache licensed. Since some of  
 these design decisions seem to not be 'D'-ish (more on that later) this  
 raises questions on just how clean room the implementation was.
The author of MessagePack allows this module to apply Boost Software License.
 Third, there's a lot of things that shouldn't be in the final module.
 -Should be removed:
 SimpleBuffer, VRefBuffer/vRefBuffer,
 -Redundant with other parts of phobos:
 BinaryFileWriter => LockingTextWriter, mp_Type/mp_Object/mp_KeyValue =>  
 std.variant
BinaryFileWriter removed. mp_Object is based on std.json.JSONValue. Variant can't have mp_pack method. Hmm..., I check Variant again.
 -Should be moved elsewhere
 DeflateFilter/deflateFilter => zip/zlib
I agree.
 Forth, msgpack was designed to be a simple, wire level encoding of some  
 higher structure. As opposed to JSON or XML, you are not going to be  
 modifying it dynamically. The very concept that it would use buffers  
 internally, particularly the existence of the "zero-copy" buffer,  
 baffles me. (Not to mention it leaves one open to aliasing bugs.)  
 MsgPack has to decorate every single type it stores, so there is nothing  
 "zero-copy" about it. To that end, I think MsgPack/MsgUnPack should wrap  
 an input or output range as appropriate and be able to be used by a  
 general serialization library such as Orange.
If Phobos has std.serializer like Orange(or Doost.Serialization?), I will implement MsgPackArchive. But general serializer usually is low-performance and lacks some features. So when user wants performance, user will use Packer and Unpacker directly.
 Other little things,
 -Why isn't nil mapped to null?
Where?
 -Why isn't there any big warnings about serializing cycles?
Oops. Current implementation can't deal with a circular reference. I added NOTE section to Packer's comment. Masahiro
Jun 18 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 19 Jun 2010 00:02:28 -0400, Masahiro Nakagawa  
<repeatedly gmail.com> wrote:

 On Fri, 18 Jun 2010 23:47:38 +0900, Robert Jacques <sandford jhu.edu>  
 wrote:
 Second, a lot of the design and examples are very similar to the  
 MsgPack reference implementations, which are all Apache licensed. Since  
 some of these design decisions seem to not be 'D'-ish (more on that  
 later) this raises questions on just how clean room the implementation  
 was.
The author of MessagePack allows this module to apply Boost Software License.
 Third, there's a lot of things that shouldn't be in the final module.
 -Should be removed:
 SimpleBuffer, VRefBuffer/vRefBuffer,
 -Redundant with other parts of phobos:
 BinaryFileWriter => LockingTextWriter, mp_Type/mp_Object/mp_KeyValue =>  
 std.variant
BinaryFileWriter removed. mp_Object is based on std.json.JSONValue. Variant can't have mp_pack method. Hmm..., I check Variant again.
Well, std.json is an incomplete, alpha library that's buggy and unreviewed. For example, the design is currently very C-ish with structs, enums, functions and exceptions all being independent top level constructs.
 -Should be moved elsewhere
 DeflateFilter/deflateFilter => zip/zlib
I agree.
 Forth, msgpack was designed to be a simple, wire level encoding of some  
 higher structure. As opposed to JSON or XML, you are not going to be  
 modifying it dynamically. The very concept that it would use buffers  
 internally, particularly the existence of the "zero-copy" buffer,  
 baffles me. (Not to mention it leaves one open to aliasing bugs.)  
 MsgPack has to decorate every single type it stores, so there is  
 nothing "zero-copy" about it. To that end, I think MsgPack/MsgUnPack  
 should wrap an input or output range as appropriate and be able to be  
 used by a general serialization library such as Orange.
If Phobos has std.serializer like Orange(or Doost.Serialization?), I will implement MsgPackArchive. But general serializer usually is low-performance and lacks some features. So when user wants performance, user will use Packer and Unpacker directly.
I'd disagree, with regard to performance: there's no reason why a general (cycle free) serializer would be slower than your direct routines. A serializer's only job is to pack/unpack complex objects correctly: the key advantage is that polymorphic objects/struct only have to support one custom serialization routine/mixin. Also, separating physical-encoding from logical-encoding allows for some very powerful idioms. You could, for example, add cycle detection/support to msgpack by having the serializer write in a micro-format on-top of msgpack. Another, more important example is letting the user decide between an array and a map representation for structs/objects. Using arrays for objects is extremely brittle. If say, a new hire refactors a class from A{ int x; int y;} to A{ int y; int x; } you now have an extremely hard to find and track down logic bug in your program. To say nothing about trying to upgrade an implementation. On the other hand, using maps for objects adds a decent amount of overhead to the total serialized length.
 Other little things,
 -Why isn't nil mapped to null?
Where?
MsgPack has a nil type which I assume is equivalent to null. So why isn't it simply included in the object serializer (and arguably also in the array and map serializer as well)? For that matter, packNil, packTrue, packFalse, etc are all redundant with pack(T). Furthermore, stream, packArray, packMap and packRaw all should be internal routines. (Though, a rawPack(T)(T value) routine might not go amiss) (Returns: this to method chain. => Returns: self. i.e. for method chaining ) Getting back to pack(T), there are 18 of these methods, one of which has incorrect constraints ( pack(Types...) if (Types.length > 1) ) with four different pieces of documentation. Good, clean documentation is extremely important to a library, so this needs to be reduced to one documentation block and a handful of methods at most. I'd recommend using isNumeric and writing an isSerializable template. Oh, as for packing real you'll need to use CustomFloat!(80) for cross platform inter-op. Also, it shouldn't be system: there's nothing memory-unsafe about writing or reading them. Please don't abuse system, etc for meanings other their intended meanings.
 -Why isn't there any big warnings about serializing cycles?
Oops. Current implementation can't deal with a circular reference. I added NOTE section to Packer's comment. Masahiro
Jun 19 2010
parent reply "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Sun, 20 Jun 2010 02:08:07 +0900, Robert Jacques <sandford jhu.edu>  
wrote:

 On Sat, 19 Jun 2010 00:02:28 -0400, Masahiro Nakagawa  
 <repeatedly gmail.com> wrote:

 mp_Object is based on std.json.JSONValue.
 Variant can't have mp_pack method.
 Hmm..., I check Variant again.
Well, std.json is an incomplete, alpha library that's buggy and unreviewed. For example, the design is currently very C-ish with structs, enums, functions and exceptions all being independent top level constructs.
I checked Variant, but Variant[Variant] doesn't work... :(
 If Phobos has std.serializer like Orange(or Doost.Serialization?),
 I will implement MsgPackArchive.
 But general serializer usually is low-performance and lacks some  
 features.
 So when user wants performance, user will use Packer and Unpacker  
 directly.
I'd disagree, with regard to performance: there's no reason why a general (cycle free) serializer would be slower than your direct routines. A serializer's only job is to pack/unpack complex objects correctly: the key advantage is that polymorphic objects/struct only have to support one custom serialization routine/mixin. Also, separating physical-encoding from logical-encoding allows for some very powerful idioms. You could, for example, add cycle detection/support to msgpack by having the serializer write in a micro-format on-top of msgpack. Another, more important example is letting the user decide between an array and a map representation for structs/objects. Using arrays for objects is extremely brittle. If say, a new hire refactors a class from A{ int x; int y;} to A{ int y; int x; } you now have an extremely hard to find and track down logic bug in your program. To say nothing about trying to upgrade an implementation. On the other hand, using maps for objects adds a decent amount of overhead to the total serialized length.
Yeah, It's a implementation-dependent problem. So I wrote "usually". If Phobos supports good serializer, I will remove Packer and direct-conversion Unpacker.
 Other little things,
 -Why isn't nil mapped to null?
Where?
MsgPack has a nil type which I assume is equivalent to null. So why isn't it simply included in the object serializer (and arguably also in the array and map serializer as well)?
Sorry, I can't understand this statement. Nullable types(class, array and associative-array) is serialized to nil if null.
 For that matter, packNil, packTrue, packFalse, etc are all redundant  
 with pack(T).
I removed packTrue, packFalse.
 Furthermore, stream, packArray, packMap and packRaw all should be  
 internal routines. (Though, a rawPack(T)(T value) routine might not go  
 amiss)
I moved packRaw to internal method, but packArray and packMap should be public. MessagePack's array and map can contain any type.
 (Returns: this to method chain. => Returns: self. i.e. for method  
 chaining )
Changed.
 Oh, as for packing real you'll need to use CustomFloat!(80) for cross  
 platform inter-op. Also, it shouldn't be  system: there's nothing  
 memory-unsafe about writing or reading them. Please don't abuse  system,  
 etc for meanings other their intended meanings.
I forgot this. CutstomFloat was already fixed by you and David in trunk. I added 80bit floating point support for non-x86 environment. Masahiro
Jun 20 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 21 Jun 2010 01:22:40 -0400, Masahiro Nakagawa  
<repeatedly gmail.com> wrote:
 On Sun, 20 Jun 2010 02:08:07 +0900, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Sat, 19 Jun 2010 00:02:28 -0400, Masahiro Nakagawa  
 <repeatedly gmail.com> wrote:

 mp_Object is based on std.json.JSONValue.
 Variant can't have mp_pack method.
 Hmm..., I check Variant again.
Well, std.json is an incomplete, alpha library that's buggy and unreviewed. For example, the design is currently very C-ish with structs, enums, functions and exceptions all being independent top level constructs.
I checked Variant, but Variant[Variant] doesn't work... :(
I know. But then again, how would you define a total ordering for Variant? The sadder aspect is that Variant[string] doesn't work either. Anyways, I primarily trying to recommend the API, although fixing Variant is the better long term solution. For what it's worth, this works: Variant[Variant] bar = [ Variant("foo"): Variant(5), Variant(5.5):Variant(6) ]; But this doesn't Variant a = "bar"; Variant b = 5; bar[a] = b;
 If Phobos has std.serializer like Orange(or Doost.Serialization?),
 I will implement MsgPackArchive.
 But general serializer usually is low-performance and lacks some  
 features.
 So when user wants performance, user will use Packer and Unpacker  
 directly.
I'd disagree, with regard to performance: there's no reason why a general (cycle free) serializer would be slower than your direct routines. A serializer's only job is to pack/unpack complex objects correctly: the key advantage is that polymorphic objects/struct only have to support one custom serialization routine/mixin. Also, separating physical-encoding from logical-encoding allows for some very powerful idioms. You could, for example, add cycle detection/support to msgpack by having the serializer write in a micro-format on-top of msgpack. Another, more important example is letting the user decide between an array and a map representation for structs/objects. Using arrays for objects is extremely brittle. If say, a new hire refactors a class from A{ int x; int y;} to A{ int y; int x; } you now have an extremely hard to find and track down logic bug in your program. To say nothing about trying to upgrade an implementation. On the other hand, using maps for objects adds a decent amount of overhead to the total serialized length.
Yeah, It's a implementation-dependent problem. So I wrote "usually". If Phobos supports good serializer, I will remove Packer and direct-conversion Unpacker.
 Other little things,
 -Why isn't nil mapped to null?
Where?
MsgPack has a nil type which I assume is equivalent to null. So why isn't it simply included in the object serializer (and arguably also in the array and map serializer as well)?
Sorry, I can't understand this statement. Nullable types(class, array and associative-array) is serialized to nil if null.
Sorry. The comment was about redundancy. i.e. packNil <=> pack(null).
 For that matter, packNil, packTrue, packFalse, etc are all redundant  
 with pack(T).
I removed packTrue, packFalse.
 Furthermore, stream, packArray, packMap and packRaw all should be  
 internal routines. (Though, a rawPack(T)(T value) routine might not go  
 amiss)
I moved packRaw to internal method, but packArray and packMap should be public. MessagePack's array and map can contain any type.
I can see the usefulness of the functionality, but for an end-user API, it is brittle and error-prone. For example: p.packArray(3).pack(5).pack("six"); produces an incorrect msgPack stream. A better syntax would be: p.pack(5,"six"); or p.packArray(5,"six"); I know you're using pack(Types...) to mean pack(Types[0]).pack(Types[1])... but using it for variable type arrays is worth considering. For packMap, pack( ["key": "value"], [5:"six"] ) could work, but packMap( ["key": "value"], [5:"six"] ) or packMap("key","value",5,"six") would avoid confusion with pack(Types...).
 (Returns: this to method chain. => Returns: self. i.e. for method  
 chaining )
Changed.
 Oh, as for packing real you'll need to use CustomFloat!(80) for cross  
 platform inter-op. Also, it shouldn't be  system: there's nothing  
 memory-unsafe about writing or reading them. Please don't abuse  
  system, etc for meanings other their intended meanings.
I forgot this. CutstomFloat was already fixed by you and David in trunk. I added 80bit floating point support for non-x86 environment.
Don't forget that some x86 platform store 80-bit floats in more than 80-bits. So you do need to check the size of real on x86 platforms as well.
 Masahiro
P.S. Don't forget to apply changes to packer's API to unpacker.
Jun 21 2010
parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
Sorry for the delay.


On Tue, 22 Jun 2010 13:47:19 +0900, Robert Jacques <sandford jhu.edu>  
wrote:

 Other little things,
 -Why isn't nil mapped to null?
Where?
MsgPack has a nil type which I assume is equivalent to null. So why isn't it simply included in the object serializer (and arguably also in the array and map serializer as well)?
Sorry, I can't understand this statement. Nullable types(class, array and associative-array) is serialized to nil if null.
Sorry. The comment was about redundancy. i.e. packNil <=> pack(null).
I added pointer type (de)serialization.
 For that matter, packNil, packTrue, packFalse, etc are all redundant  
 with pack(T).
I removed packTrue, packFalse.
 Furthermore, stream, packArray, packMap and packRaw all should be  
 internal routines. (Though, a rawPack(T)(T value) routine might not go  
 amiss)
I moved packRaw to internal method, but packArray and packMap should be public. MessagePack's array and map can contain any type.
I can see the usefulness of the functionality, but for an end-user API, it is brittle and error-prone. For example: p.packArray(3).pack(5).pack("six"); produces an incorrect msgPack stream. A better syntax would be: p.pack(5,"six"); or p.packArray(5,"six"); I know you're using pack(Types...) to mean pack(Types[0]).pack(Types[1])... but using it for variable type arrays is worth considering. For packMap, pack( ["key": "value"], [5:"six"] ) could work, but packMap( ["key": "value"], [5:"six"] ) or packMap("key","value",5,"six") would avoid confusion with pack(Types...).
I have thought about this point. In the result, I changed (Un)Packer API. It's true that your suggested syntax is better. Old packArray was renamed to beginArray and new packArray serializes container type using variadic arguments. Of course, Map too. ----- /* old */ packer.packArray(2).pack(true, 10); /* new */ packer.packArray(true, 10); // == packer.beginArray(2).pack(true, 10) ----- Thanks for pointing out this. Masahiro
Jun 24 2010
prev sibling parent reply =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 18-06-2010 o 06:29:50 Andrei Alexandrescu  =

<SeeWebsiteForEmail erdani.org> napisa=B3(a):

 There's one larger submission that needs review - Masahiro Nakagawa's =
=
 std.msgpack. I think it would be great if this community established a=
=
 review process. To do so, we should abstract away starting from at lea=
st =
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code a=
nd =
 doc)

 Feel free to comment here.
Doc bug: "VRefBuffer is a zero copy buffer for more efficient" ... for more efficient what?
Jun 18 2010
parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Sat, 19 Jun 2010 04:47:35 +0900, Tomek Sowi=C5=84ski <just ask.me> wr=
ote:

 Dnia 18-06-2010 o 06:29:50 Andrei Alexandrescu  =
 <SeeWebsiteForEmail erdani.org> napisa=C5=82(a):

 There's one larger submission that needs review - Masahiro Nakagawa's=
=
 std.msgpack. I think it would be great if this community established =
a =
 review process. To do so, we should abstract away starting from at  =
 least one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code =
=
 and doc)

 Feel free to comment here.
Doc bug: "VRefBuffer is a zero copy buffer for more efficient" ... for more efficient what?
Oh, I forgot "serialization". Thanks! Masahiro
Jun 18 2010