digitalmars.D - Review: std.msgpack

Andrei Alexandrescu (10/10) Jun 17 2010 There's one larger submission that needs review - Masahiro Nakagawa's

Graham Fawcett (24/35) Jun 18 2010 Nice!

Masahiro Nakagawa (9/42) Jun 18 2010 All right. I moved link to See_Also section in 1st paragraph.

KennyTM~ (3/12) Jun 18 2010 DSSS needs to be revived.

Rory McGuire (19/29) Jun 18 2010 If you're going to have a std.serialize surely it should be called that ...

Masahiro Nakagawa (5/9) Jun 18 2010 MessagePack doesn't use IDL.

Rory McGuire (5/15) Jun 18 2010 I would prefer to see something like this as standard:

Masahiro Nakagawa (10/12) Jun 18 2010 I think Doost.serialization is one of candidates for

Michel Fortin (73/77) Jun 19 2010 and I'm still improving it, very slowly though. Here are some highlights...

Jacob Carlborg (18/92) Jun 19 2010 Orange (http://dsource.org/projects/orange/) doesn't have those problems...

Michel Fortin (25/37) Jun 19 2010 By that, do you mean it serializes the fields of the base classes and

Jacob Carlborg (19/52) Jun 20 2010 You always have the option to register a serializer for a specific type

Jacob Carlborg (9/28) Jun 19 2010 Doost.serialization doesn't support:

Masahiro Nakagawa (4/8) Jun 21 2010 You reported this bug? I couldn't find closest bug in bugzilla.

Robert Jacques (6/15) Jun 21 2010 The root bug is 1348: offTi property of TypeInfo return empty array. off...

Jacob Carlborg (5/24) Jun 22 2010 Like the missing const declaration on the "name" method in

Robert Jacques (27/37) Jun 18 2010 First, +1 to the review process.

Masahiro Nakagawa (17/41) Jun 18 2010 The author of MessagePack allows this module to apply Boost Software

Robert Jacques (37/82) Jun 19 2010 Well, std.json is an incomplete, alpha library that's buggy and

Masahiro Nakagawa (17/65) Jun 20 2010 I checked Variant, but Variant[Variant] doesn't work... :(

Robert Jacques (26/96) Jun 21 2010 I know. But then again, how would you define a total ordering for Varian...

Masahiro Nakagawa (17/51) Jun 24 2010 Sorry for the delay.

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (9/18) Jun 18 2010 =

Masahiro Nakagawa (8/24) Jun 18 2010 =

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

There's one larger submission that needs review - Masahiro Nakagawa's 
std.msgpack. I think it would be great if this community established a 
review process. To do so, we should abstract away starting from at least 
one review :o). Here's Masahiro's work:

code: https://dl.dropbox.com/u/374829/std/msgpack.d
doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and 
doc)

Feel free to comment here.


Andrei

Jun 17 2010

Graham Fawcett <fawcett uwindsor.ca> writes:

On Thu, 17 Jun 2010 21:29:50 -0700, Andrei Alexandrescu wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's
 std.msgpack. I think it would be great if this community established a
 review process. 

+1!

 To do so, we should abstract away starting from at least
 one review :o). Here's Masahiro's work:
 
 code: https://dl.dropbox.com/u/374829/std/msgpack.d doc: 
 https://dl.dropbox.com/u/374829/std/msgpack.html zip: 
 https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and doc)
 
 Feel free to comment here.

Nice!

The docs should link to the MessagePack Format Spec page in the first
paragraph. I had not known that MessagePack was an existing binary
serialization format.

http://redmine.msgpack.org/projects/msgpack/wiki/FormatSpec

I was going to complain that this module didn't handle D-specific
types like Variants, but now I see that this is out of scope.

Just curious, why have MessagePack support in Phobos, instead of
Packed Encoding Rules, BERT, BSON, Protocol Buffers, etc.? Not that
this isn't a very useful module, but I'm curious about why this
particular binary encoding.

If it's a 'batteries included' thing ("let's put as many useful
modules as possible in the stdlib"), then maybe a dotted name like
'std.serializer.msgpack' might be better, to give a proper home for
'std.serializer.per', etc.? I'm not sure this deserves the same 'top
level billing' as, say, std.stdio or std.range; and eventually the
toplevel namespace will be very cluttered (c.f. Python's stdlib).

In a perfect world, I would see this as an excellent member of a
CPAN-like system for D, rather than a stdlib module. Since that
doesn't exist yet, +1, but with a nested module name.

Best, 
Graham

Jun 18 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Fri, 18 Jun 2010 23:05:45 +0900, Graham Fawcett <fawcett uwindsor.ca>  
wrote:

 On Thu, 17 Jun 2010 21:29:50 -0700, Andrei Alexandrescu wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's
 std.msgpack. I think it would be great if this community established a
 review process.

 +1!

 To do so, we should abstract away starting from at least
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d doc:
 https://dl.dropbox.com/u/374829/std/msgpack.html zip:
 https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and doc)

 Feel free to comment here.

 Nice!

 The docs should link to the MessagePack Format Spec page in the first
 paragraph. I had not known that MessagePack was an existing binary
 serialization format.

 http://redmine.msgpack.org/projects/msgpack/wiki/FormatSpec

 I was going to complain that this module didn't handle D-specific
 types like Variants, but now I see that this is out of scope.

All right. I moved link to See_Also section in 1st paragraph.

 Just curious, why have MessagePack support in Phobos, instead of
 Packed Encoding Rules, BERT, BSON, Protocol Buffers, etc.? Not that
 this isn't a very useful module, but I'm curious about why this
 particular binary encoding.

 If it's a 'batteries included' thing ("let's put as many useful
 modules as possible in the stdlib"), then maybe a dotted name like
 'std.serializer.msgpack' might be better, to give a proper home for
 'std.serializer.per', etc.? I'm not sure this deserves the same 'top
 level billing' as, say, std.stdio or std.range; and eventually the
 toplevel namespace will be very cluttered (c.f. Python's stdlib).

I like 'batteries included' approach(Python supports a diversity of  
modules).
Of course, quality is the most important thing.

I agree std.serializer.msgpack.

 In a perfect world, I would see this as an excellent member of a
 CPAN-like system for D, rather than a stdlib module. Since that
 doesn't exist yet, +1, but with a nested module name.

D doesn't have CPAN-like system :'(


Masahiro

Jun 18 2010

KennyTM~ <kennytm gmail.com> writes:

On Jun 19, 10 09:36, Masahiro Nakagawa wrote:
 On Fri, 18 Jun 2010 23:05:45 +0900, Graham Fawcett <fawcett uwindsor.ca>
 wrote:

 On Thu, 17 Jun 2010 21:29:50 -0700, Andrei Alexandrescu wrote:


[snip]

 In a perfect world, I would see this as an excellent member of a
 CPAN-like system for D, rather than a stdlib module. Since that
 doesn't exist yet, +1, but with a nested module name.

 D doesn't have CPAN-like system :'(


 Masahiro

DSSS needs to be revived.

Jun 18 2010

"Rory McGuire" <rmcguire neonova.co.za> writes:

On Fri, 18 Jun 2010 06:29:50 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's  
 std.msgpack. I think it would be great if this community established a  
 review process. To do so, we should abstract away starting from at least  
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and  
 doc)

 Feel free to comment here.


 Andrei


If you're going to have a std.serialize surely it should be called that  
and surely it should have various serialization drivers?

I'm not convinced that it makes sense to just have one binary  
serialization mechanism when during debugging you might prefer it to use  
xml or some text
based serialization mechanism.

I really like this serialization mechanism. It looks like its a simplified  
copy of Hessian dsource.org/projects/hessiand. I like hessian a lot, I  
even made my own simple hessian implementation (not hessiand thats by  
someone else).


msgpack seems to rely on a IDL compiler I don't think it should be in the  
standard library if this is the case, better to have something that  
automatically generates the code from an interface and then uses a xml,  
binary, ini, etc... driver to generate the serialized/de-serialized data.

Just my two cents.

-Rory

PS: your book is awesome, really well written.

Jun 18 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Fri, 18 Jun 2010 23:28:04 +0900, Rory McGuire <rmcguire neonova.co.za>  
wrote:
 msgpack seems to rely on a IDL compiler I don't think it should be in  
 the standard library if this is the case, better to have something that  
 automatically generates the code from an interface and then uses a xml,  
 binary, ini, etc... driver to generate the serialized/de-serialized data.

MessagePack doesn't use IDL.
This is one of reason that I select MessagePack.


Masahiro

Jun 18 2010

"Rory McGuire" <rmcguire neonova.co.za> writes:

On Fri, 18 Jun 2010 06:29:50 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's  
 std.msgpack. I think it would be great if this community established a  
 review process. To do so, we should abstract away starting from at least  
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and  
 doc)

 Feel free to comment here.


 Andrei


I would prefer to see something like this as standard:
http://www.dsource.org/projects/orange/

-Rory

Jun 18 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Fri, 18 Jun 2010 23:36:54 +0900, Rory McGuire <rmcguire neonova.co.za>  
wrote:
 I would prefer to see something like this as standard:
 http://www.dsource.org/projects/orange/

I think Doost.serialization is one of candidates for  
std.serializer(serialization?).
But I don't know current status.

In addition, Michel has serialization module.
I don't know module detail, but he mentioned at d.announce.
http://lists.puremagic.com/pipermail/digitalmars-d-announce/2010-April/015271.html

I don't know other general serialization modules.


Masahiro

Jun 18 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-06-18 22:18:20 -0400, "Masahiro Nakagawa" <repeatedly gmail.com> said:

 In addition, Michel has serialization module.
 I don't know module detail, but he mentioned at d.announce.
 http://lists.puremagic.com/pipermail/digitalmars-d-announce/2010-April/015271.html

Indeed, 
 

and I'm still improving it, very slowly though. Here are some highlights.

The serialization format has a few basic types: integer, 
floating-point, string, bytestring, array, associative-array, object, 
but a couple of representation for each for efficient storage. The 
'encode' and 'decode' template functions can be used to encode and 
decode any type to/from an archive:

	MyStruct m;

	archiver.encode(m);
	unarchiver.decode(m);

It throws an exception of the archive format is not compatible with your type.

If you want to serialize a struct, you must define it like this:

	struct MyStruct {
		mixin Archivable!(int, "i");
		mixin Archivable!(string, "s");
	}

or like this:

	struct MyStruct {
		int i;
		string s;
		
		void encode(ref KeyArchiver archive) {
			archive.encode("i", i);
			archive.encode("s", s);
		}
		void decode(ref KeyUnarchiver archive) {
			foreach (key, value; archive) {
				switch (key) {
					case "i": value.decode(i); break;
					case "s": value.decode(s); break;
					default: break;
				}
			}
		}
	}

or like this:

	struct MyStruct {
		int i;
		string s;
		
		void encode(ref KeyArchiver archive) {
			archive.encode("i", i);
			archive.encode("s", s);
		}
		void decode(ref KeyUnarchiver archive) {
			archive.decode("i", i);
			archive.decode("s", s);
		}
	}

This last decode function is more convenient since you can read values 
in a different order than what they're stored, but it but has more 
overhead than the previous one using a 'switch'.

For archiving/unarchiving objects it works the same, but there are two 
more requirements: the class needs a default constructor and it must 
implements the KeyArchivable interface (which contains the 
encode/decode functions above).

I'm using a custom serialization format, derived from Hessian but no 
longer compatible. I started by trying to conform to Hessian but parts 
of the spec seemed inappropriate for D so I decided to fork. The format 
can handle cycles and multiple references to the same object, but 
that's not yet implemented.

The archiver and unarchiver are very much based on templates, and as 
such it's hard to support multiple archive formats when calling the 
virtual encode/decode functions on an object. But there's nothing 
standing in the way of having a slower but more generic interface 
cohabiting with the format-specific one. The archiver would check for 
the generic interface as an alternative when the format-specific one 
isn't implemented for a given class.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 19 2010

Jacob Carlborg <doob me.com> writes:

On 2010-06-19 20:15, Michel Fortin wrote:
 On 2010-06-18 22:18:20 -0400, "Masahiro Nakagawa" <repeatedly gmail.com>
 said:

 In addition, Michel has serialization module.
 I don't know module detail, but he mentioned at d.announce.
 http://lists.puremagic.com/pipermail/digitalmars-d-announce/2010-April/015271.html

 Indeed,

 and I'm still improving it, very slowly though. Here are some highlights.

 The serialization format has a few basic types: integer, floating-point,
 string, bytestring, array, associative-array, object, but a couple of
 representation for each for efficient storage. The 'encode' and 'decode'
 template functions can be used to encode and decode any type to/from an
 archive:

 MyStruct m;

 archiver.encode(m);
 unarchiver.decode(m);

 It throws an exception of the archive format is not compatible with your
 type.

 If you want to serialize a struct, you must define it like this:

 struct MyStruct {
 mixin Archivable!(int, "i");
 mixin Archivable!(string, "s");
 }

 or like this:

 struct MyStruct {
 int i;
 string s;

 void encode(ref KeyArchiver archive) {
 archive.encode("i", i);
 archive.encode("s", s);
 }
 void decode(ref KeyUnarchiver archive) {
 foreach (key, value; archive) {
 switch (key) {
 case "i": value.decode(i); break;
 case "s": value.decode(s); break;
 default: break;
 }
 }
 }
 }

 or like this:

 struct MyStruct {
 int i;
 string s;

 void encode(ref KeyArchiver archive) {
 archive.encode("i", i);
 archive.encode("s", s);
 }
 void decode(ref KeyUnarchiver archive) {
 archive.decode("i", i);
 archive.decode("s", s);
 }
 }

 This last decode function is more convenient since you can read values
 in a different order than what they're stored, but it but has more
 overhead than the previous one using a 'switch'.

 For archiving/unarchiving objects it works the same, but there are two
 more requirements: the class needs a default constructor and it must
 implements the KeyArchivable interface (which contains the encode/decode
 functions above).

 I'm using a custom serialization format, derived from Hessian but no
 longer compatible. I started by trying to conform to Hessian but parts
 of the spec seemed inappropriate for D so I decided to fork. The format
 can handle cycles and multiple references to the same object, but that's
 not yet implemented.

 The archiver and unarchiver are very much based on templates, and as
 such it's hard to support multiple archive formats when calling the
 virtual encode/decode functions on an object. But there's nothing
 standing in the way of having a slower but more generic interface
 cohabiting with the format-specific one. The archiver would check for
 the generic interface as an alternative when the format-specific one
 isn't implemented for a given class.

Orange (http://dsource.org/projects/orange/) doesn't have those problems 
(it probably has other problems). These are some of the features:

* It handles both serializing and deserializing
* It automatically serializes the base classes
* It supports events (before and after (de)serializing)
* It supports non-serialized fields (you can say that some fields in a 
class should not be serialized)
* It's licensed under the Boost license
* It's fairly std/runtime library independent
* You can create new archive types and use them with the existing serializer
* Serializes through base class references
* Serializes third party types

The only thing missing that I want to implement is to somehow support 
different versions of classes and structs. I'm currently working on 
porting the XML archive to Phobos.

-- 
/Jacob Carlborg

Jun 19 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-06-19 15:52:51 -0400, Jacob Carlborg <doob me.com> said:

 Orange (http://dsource.org/projects/orange/) doesn't have those 
 problems (it probably has other problems). These are some of the 
 features:
 
 * It automatically serializes the base classes
  ...
 * Serializes through base class references
 * Serializes third party types

By that, do you mean it serializes the fields of the base classes and 
other types even when those types weren't designed to be serialized? 
Personally, I think it's dangerous to break encapsulation like that. 
You can never guaranty that the base class won't contain data that 
shouldn't be serialized (a reference count, a file or socket handle, 
cached data bound to be different at a later time, a reference to the 
root of a huge hierarchy isn't "part" of the object, etc.). 
Serialization might work for one version of the base class, and when 
the implementation changes it breaks.

I prefer the reverse design: only types with a known serialization 
procedure are serializable. That said, I can still appreciate the merit 
of simplicity when it comes to serializing types like this one:

	struct Point { float x, y; }

But it's easy to make it explicitly serializable too:

	mixin AllMembersArchivable!(Point);


 * It's licensed under the Boost license

And mine isn't public yet. :-)

 The only thing missing that I want to implement is to somehow support 
 different versions of classes and structs. I'm currently working on 
 porting the XML archive to Phobos.

Versioning is something I generally prefer to handle at the application 
level. Having class and struct fields serialized with a string key 
makes that quite simple and allows keeping the serialized data 
compatible with earlier versions of the same struct/class.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 19 2010

Jacob Carlborg <doob me.com> writes:

On 2010-06-20 00:34, Michel Fortin wrote:
 On 2010-06-19 15:52:51 -0400, Jacob Carlborg <doob me.com> said:

 Orange (http://dsource.org/projects/orange/) doesn't have those
 problems (it probably has other problems). These are some of the
 features:

 * It automatically serializes the base classes
 ...
 * Serializes through base class references
 * Serializes third party types

 By that, do you mean it serializes the fields of the base classes and
 other types even when those types weren't designed to be serialized?
 Personally, I think it's dangerous to break encapsulation like that. You
 can never guaranty that the base class won't contain data that shouldn't
 be serialized (a reference count, a file or socket handle, cached data
 bound to be different at a later time, a reference to the root of a huge
 hierarchy isn't "part" of the object, etc.). Serialization might work
 for one version of the base class, and when the implementation changes
 it breaks.

You always have the option to register a serializer for a specific type 
if you want to customize the serialization. I'm also thinking of adding 
the option to register a serializer for a specific field in a specific 
type.

 I prefer the reverse design: only types with a known serialization
 procedure are serializable. That said, I can still appreciate the merit
 of simplicity when it comes to serializing types like this one:

 struct Point { float x, y; }

 But it's easy to make it explicitly serializable too:

 mixin AllMembersArchivable!(Point);

In Orange you have three options:

1. Let the library do all the work
2. Implement two methods in the class/struct that handle the serialization
3. Register a (de)serializer function/delegate that handles the 
serialization

You can also combine these three options. For example, let the library 
handle two out of three classes in a class hierarchy (1) and handle the 
third one by yourself (2 or 3). The third option also allows you to 
customize third party types.


 * It's licensed under the Boost license

 And mine isn't public yet. :-)

 The only thing missing that I want to implement is to somehow support
 different versions of classes and structs. I'm currently working on
 porting the XML archive to Phobos.

 Versioning is something I generally prefer to handle at the application
 level. Having class and struct fields serialized with a string key makes
 that quite simple and allows keeping the serialized data compatible with
 earlier versions of the same struct/class.

Every value in Orange is serialized with a key. By default an object 
will just have a numerical id from a counter and the fields will have 
their names as the key.

-- 
/Jacob Carlborg

Jun 20 2010

Jacob Carlborg <doob me.com> writes:

On 2010-06-19 04:18, Masahiro Nakagawa wrote:
 On Fri, 18 Jun 2010 23:36:54 +0900, Rory McGuire 
 <rmcguire neonova.co.za> wrote:
 I would prefer to see something like this as standard:
 http://www.dsource.org/projects/orange/

 
 I think Doost.serialization is one of candidates for 
 std.serializer(serialization?).
 But I don't know current status.

Doost.serialization doesn't support:
 * Serialization from base class is not possible
 * Creation of class/struct instance without default/public constructor

Serialization from base class was one of my goals when I implemented
Orange. But you need to register a function for that, until D2 fixes the
"getMembers" method in TypeInfo_Class.

 In addition, Michel has serialization module.
 I don't know module detail, but he mentioned at d.announce.
 http://lists.puremagic.com/pipermail/digitalmars-d-announce/20
0-April/015271.html 
 
 
 I don't know other general serialization modules.
 
 
 Masahiro


-- 
/Jacob Carlborg

Jun 19 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Sun, 20 Jun 2010 04:27:08 +0900, Jacob Carlborg <doob me.com> wrote:
 Serialization from base class was one of my goals when I implemented  
 Orange.

Good.

 But you need to register a function for that, until D2 fixes the
 "getMembers" method in TypeInfo_Class.

You reported this bug? I couldn't find closest bug in bugzilla.


Masahiro

Jun 21 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 21 Jun 2010 22:24:42 -0400, Masahiro Nakagawa  
<repeatedly gmail.com> wrote:

 On Sun, 20 Jun 2010 04:27:08 +0900, Jacob Carlborg <doob me.com> wrote:
 Serialization from base class was one of my goals when I implemented  
 Orange.

 Good.

 But you need to register a function for that, until D2 fixes the
 "getMembers" method in TypeInfo_Class.

 You reported this bug? I couldn't find closest bug in bugzilla.

The root bug is 1348: offTi property of TypeInfo return empty array. offTi  
or offset typeinfo should contain an array of offsets and their typeinfo,  
i.e. runtime reflection. There are also a bunch of other typeinfo bugs in  
bugzilla.

 Masahiro

Jun 21 2010

Jacob Carlborg <doob me.com> writes:

On 2010-06-22 04:39, Robert Jacques wrote:
 On Mon, 21 Jun 2010 22:24:42 -0400, Masahiro Nakagawa
 <repeatedly gmail.com> wrote:

 On Sun, 20 Jun 2010 04:27:08 +0900, Jacob Carlborg <doob me.com> wrote:
 Serialization from base class was one of my goals when I implemented
 Orange.

 Good.

 But you need to register a function for that, until D2 fixes the
 "getMembers" method in TypeInfo_Class.

 You reported this bug? I couldn't find closest bug in bugzilla.

 The root bug is 1348: offTi property of TypeInfo return empty array.
 offTi or offset typeinfo should contain an array of offsets and their
 typeinfo, i.e. runtime reflection. There are also a bunch of other
 typeinfo bugs in bugzilla.

Like the missing const declaration on the "name" method in 
TypeInfo_Member (or what the name of the class is).

 Masahiro



-- 
/Jacob Carlborg

Jun 22 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 18 Jun 2010 00:29:50 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 There's one larger submission that needs review - Masahiro Nakagawa's  
 std.msgpack. I think it would be great if this community established a  
 review process. To do so, we should abstract away starting from at least  
 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code and  
 doc)

 Feel free to comment here.


 Andrei

First, +1 to the review process.

Second, a lot of the design and examples are very similar to the MsgPack  
reference implementations, which are all Apache licensed. Since some of  
these design decisions seem to not be 'D'-ish (more on that later) this  
raises questions on just how clean room the implementation was.

Third, there's a lot of things that shouldn't be in the final module.
-Should be removed:
SimpleBuffer, VRefBuffer/vRefBuffer,
-Redundant with other parts of phobos:
BinaryFileWriter => LockingTextWriter, mp_Type/mp_Object/mp_KeyValue =>  
std.variant
-Should be moved elsewhere
DeflateFilter/deflateFilter => zip/zlib

Forth, msgpack was designed to be a simple, wire level encoding of some  
higher structure. As opposed to JSON or XML, you are not going to be  
modifying it dynamically. The very concept that it would use buffers  
internally, particularly the existence of the "zero-copy" buffer, baffles  
me. (Not to mention it leaves one open to aliasing bugs.) MsgPack has to  
decorate every single type it stores, so there is nothing "zero-copy"  
about it. To that end, I think MsgPack/MsgUnPack should wrap an input or  
output range as appropriate and be able to be used by a general  
serialization library such as Orange.

Other little things,
-Why isn't nil mapped to null?
-Why isn't there any big warnings about serializing cycles?

Jun 18 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Fri, 18 Jun 2010 23:47:38 +0900, Robert Jacques <sandford jhu.edu>  
wrote:
 Second, a lot of the design and examples are very similar to the MsgPack  
 reference implementations, which are all Apache licensed. Since some of  
 these design decisions seem to not be 'D'-ish (more on that later) this  
 raises questions on just how clean room the implementation was.

The author of MessagePack allows this module to apply Boost Software  
License.

 Third, there's a lot of things that shouldn't be in the final module.
 -Should be removed:
 SimpleBuffer, VRefBuffer/vRefBuffer,
 -Redundant with other parts of phobos:
 BinaryFileWriter => LockingTextWriter, mp_Type/mp_Object/mp_KeyValue =>  
 std.variant

BinaryFileWriter removed.

mp_Object is based on std.json.JSONValue.
Variant can't have mp_pack method.
Hmm..., I check Variant again.

 -Should be moved elsewhere
 DeflateFilter/deflateFilter => zip/zlib

I agree.

 Forth, msgpack was designed to be a simple, wire level encoding of some  
 higher structure. As opposed to JSON or XML, you are not going to be  
 modifying it dynamically. The very concept that it would use buffers  
 internally, particularly the existence of the "zero-copy" buffer,  
 baffles me. (Not to mention it leaves one open to aliasing bugs.)  
 MsgPack has to decorate every single type it stores, so there is nothing  
 "zero-copy" about it. To that end, I think MsgPack/MsgUnPack should wrap  
 an input or output range as appropriate and be able to be used by a  
 general serialization library such as Orange.

If Phobos has std.serializer like Orange(or Doost.Serialization?),
I will implement MsgPackArchive.
But general serializer usually is low-performance and lacks some features.
So when user wants performance, user will use Packer and Unpacker directly.

 Other little things,
 -Why isn't nil mapped to null?

Where?

 -Why isn't there any big warnings about serializing cycles?

Oops. Current implementation can't deal with a circular reference.
I added NOTE section to Packer's comment.


Masahiro

Jun 18 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 19 Jun 2010 00:02:28 -0400, Masahiro Nakagawa  
<repeatedly gmail.com> wrote:

 On Fri, 18 Jun 2010 23:47:38 +0900, Robert Jacques <sandford jhu.edu>  
 wrote:
 Second, a lot of the design and examples are very similar to the  
 MsgPack reference implementations, which are all Apache licensed. Since  
 some of these design decisions seem to not be 'D'-ish (more on that  
 later) this raises questions on just how clean room the implementation  
 was.

 The author of MessagePack allows this module to apply Boost Software  
 License.

 Third, there's a lot of things that shouldn't be in the final module.
 -Should be removed:
 SimpleBuffer, VRefBuffer/vRefBuffer,
 -Redundant with other parts of phobos:
 BinaryFileWriter => LockingTextWriter, mp_Type/mp_Object/mp_KeyValue =>  
 std.variant

 BinaryFileWriter removed.

 mp_Object is based on std.json.JSONValue.
 Variant can't have mp_pack method.
 Hmm..., I check Variant again.

Well, std.json is an incomplete, alpha library that's buggy and  
unreviewed. For example, the design is currently very C-ish with structs,  
enums, functions and exceptions all being independent top level constructs.

 -Should be moved elsewhere
 DeflateFilter/deflateFilter => zip/zlib

 I agree.

 Forth, msgpack was designed to be a simple, wire level encoding of some  
 higher structure. As opposed to JSON or XML, you are not going to be  
 modifying it dynamically. The very concept that it would use buffers  
 internally, particularly the existence of the "zero-copy" buffer,  
 baffles me. (Not to mention it leaves one open to aliasing bugs.)  
 MsgPack has to decorate every single type it stores, so there is  
 nothing "zero-copy" about it. To that end, I think MsgPack/MsgUnPack  
 should wrap an input or output range as appropriate and be able to be  
 used by a general serialization library such as Orange.

 If Phobos has std.serializer like Orange(or Doost.Serialization?),
 I will implement MsgPackArchive.
 But general serializer usually is low-performance and lacks some  
 features.
 So when user wants performance, user will use Packer and Unpacker  
 directly.

I'd disagree, with regard to performance: there's no reason why a general  
(cycle free) serializer would be slower than your direct routines. A  
serializer's only job is to pack/unpack complex objects correctly: the key  
advantage is that polymorphic objects/struct only have to support one  
custom serialization routine/mixin. Also, separating physical-encoding  
 from logical-encoding allows for some very powerful idioms. You could, for  
example, add cycle detection/support to msgpack by having the serializer  
write in a micro-format on-top of msgpack. Another, more important example  
is letting the user decide between an array and a map representation for  
structs/objects. Using arrays for objects is extremely brittle. If say, a  
new hire refactors a class from A{ int x; int y;} to A{ int y; int x; }  
you now have an extremely hard to find and track down logic bug in your  
program. To say nothing about trying to upgrade an implementation. On the  
other hand, using maps for objects adds a decent amount of overhead to the  
total serialized length.

 Other little things,
 -Why isn't nil mapped to null?

 Where?

MsgPack has a nil type which I assume is equivalent to null. So why isn't  
it simply included in the object serializer (and arguably also in the  
array and map serializer as well)? For that matter, packNil, packTrue,  
packFalse, etc are all redundant with pack(T). Furthermore, stream,  
packArray, packMap and packRaw all should be internal routines. (Though, a  
rawPack(T)(T value) routine might not go amiss) (Returns: this to method  
chain. => Returns: self. i.e. for method chaining )

Getting back to pack(T), there are 18 of these methods, one of which has  
incorrect constraints ( pack(Types...) if (Types.length > 1) ) with four  
different pieces of documentation. Good, clean documentation is extremely  
important to a library, so this needs to be reduced to one documentation  
block and a handful of methods at most. I'd recommend using isNumeric and  
writing an isSerializable template.

Oh, as for packing real you'll need to use CustomFloat!(80) for cross  
platform inter-op. Also, it shouldn't be  system: there's nothing  
memory-unsafe about writing or reading them. Please don't abuse  system,  
etc for meanings other their intended meanings.

 -Why isn't there any big warnings about serializing cycles?

 Oops. Current implementation can't deal with a circular reference.
 I added NOTE section to Packer's comment.


 Masahiro

Jun 19 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Sun, 20 Jun 2010 02:08:07 +0900, Robert Jacques <sandford jhu.edu>  
wrote:

 On Sat, 19 Jun 2010 00:02:28 -0400, Masahiro Nakagawa  
 <repeatedly gmail.com> wrote:

 mp_Object is based on std.json.JSONValue.
 Variant can't have mp_pack method.
 Hmm..., I check Variant again.

 Well, std.json is an incomplete, alpha library that's buggy and  
 unreviewed. For example, the design is currently very C-ish with  
 structs, enums, functions and exceptions all being independent top level  
 constructs.

I checked Variant, but Variant[Variant] doesn't work... :(

 If Phobos has std.serializer like Orange(or Doost.Serialization?),
 I will implement MsgPackArchive.
 But general serializer usually is low-performance and lacks some  
 features.
 So when user wants performance, user will use Packer and Unpacker  
 directly.

 I'd disagree, with regard to performance: there's no reason why a  
 general (cycle free) serializer would be slower than your direct  
 routines. A serializer's only job is to pack/unpack complex objects  
 correctly: the key advantage is that polymorphic objects/struct only  
 have to support one custom serialization routine/mixin. Also, separating  
 physical-encoding  from logical-encoding allows for some very powerful  
 idioms. You could, for example, add cycle detection/support to msgpack  
 by having the serializer write in a micro-format on-top of msgpack.  
 Another, more important example is letting the user decide between an  
 array and a map representation for structs/objects. Using arrays for  
 objects is extremely brittle. If say, a new hire refactors a class from  
 A{ int x; int y;} to A{ int y; int x; } you now have an extremely hard  
 to find and track down logic bug in your program. To say nothing about  
 trying to upgrade an implementation. On the other hand, using maps for  
 objects adds a decent amount of overhead to the total serialized length.

Yeah, It's a implementation-dependent problem. So I wrote "usually".
If Phobos supports good serializer, I will remove Packer and  
direct-conversion Unpacker.

 Other little things,
 -Why isn't nil mapped to null?

 Where?

 MsgPack has a nil type which I assume is equivalent to null. So why  
 isn't it simply included in the object serializer (and arguably also in  
 the array and map serializer as well)?

Sorry, I can't understand this statement. Nullable types(class, array and  
associative-array)
is serialized to nil if null.

 For that matter, packNil, packTrue, packFalse, etc are all redundant  
 with pack(T).

I removed packTrue, packFalse.

 Furthermore, stream, packArray, packMap and packRaw all should be  
 internal routines. (Though, a rawPack(T)(T value) routine might not go  
 amiss)

I moved packRaw to internal method, but packArray and packMap should be  
public.
MessagePack's array and map can contain any type.

 (Returns: this to method chain. => Returns: self. i.e. for method  
 chaining )

Changed.

 Oh, as for packing real you'll need to use CustomFloat!(80) for cross  
 platform inter-op. Also, it shouldn't be  system: there's nothing  
 memory-unsafe about writing or reading them. Please don't abuse  system,  
 etc for meanings other their intended meanings.

I forgot this. CutstomFloat was already fixed by you and David in trunk.
I added 80bit floating point support for non-x86 environment.


Masahiro

Jun 20 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 21 Jun 2010 01:22:40 -0400, Masahiro Nakagawa  
<repeatedly gmail.com> wrote:
 On Sun, 20 Jun 2010 02:08:07 +0900, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Sat, 19 Jun 2010 00:02:28 -0400, Masahiro Nakagawa  
 <repeatedly gmail.com> wrote:

 mp_Object is based on std.json.JSONValue.
 Variant can't have mp_pack method.
 Hmm..., I check Variant again.

 Well, std.json is an incomplete, alpha library that's buggy and  
 unreviewed. For example, the design is currently very C-ish with  
 structs, enums, functions and exceptions all being independent top  
 level constructs.

 I checked Variant, but Variant[Variant] doesn't work... :(

I know. But then again, how would you define a total ordering for Variant?  
The sadder aspect is that Variant[string] doesn't work either. Anyways, I  
primarily trying to recommend the API, although fixing Variant is the  
better long term solution.

For what it's worth, this works:
   Variant[Variant] bar = [ Variant("foo"): Variant(5),  
Variant(5.5):Variant(6) ];

But this doesn't
   Variant a = "bar";
   Variant b = 5;
   bar[a] = b;


 If Phobos has std.serializer like Orange(or Doost.Serialization?),
 I will implement MsgPackArchive.
 But general serializer usually is low-performance and lacks some  
 features.
 So when user wants performance, user will use Packer and Unpacker  
 directly.

 I'd disagree, with regard to performance: there's no reason why a  
 general (cycle free) serializer would be slower than your direct  
 routines. A serializer's only job is to pack/unpack complex objects  
 correctly: the key advantage is that polymorphic objects/struct only  
 have to support one custom serialization routine/mixin. Also,  
 separating physical-encoding  from logical-encoding allows for some  
 very powerful idioms. You could, for example, add cycle  
 detection/support to msgpack by having the serializer write in a  
 micro-format on-top of msgpack. Another, more important example is  
 letting the user decide between an array and a map representation for  
 structs/objects. Using arrays for objects is extremely brittle. If say,  
 a new hire refactors a class from A{ int x; int y;} to A{ int y; int x;  
 } you now have an extremely hard to find and track down logic bug in  
 your program. To say nothing about trying to upgrade an implementation.  
 On the other hand, using maps for objects adds a decent amount of  
 overhead to the total serialized length.

 Yeah, It's a implementation-dependent problem. So I wrote "usually".
 If Phobos supports good serializer, I will remove Packer and  
 direct-conversion Unpacker.

 Other little things,
 -Why isn't nil mapped to null?

 Where?

 MsgPack has a nil type which I assume is equivalent to null. So why  
 isn't it simply included in the object serializer (and arguably also in  
 the array and map serializer as well)?

 Sorry, I can't understand this statement. Nullable types(class, array  
 and associative-array)
 is serialized to nil if null.

Sorry. The comment was about redundancy. i.e. packNil <=> pack(null).

 For that matter, packNil, packTrue, packFalse, etc are all redundant  
 with pack(T).

 I removed packTrue, packFalse.

 Furthermore, stream, packArray, packMap and packRaw all should be  
 internal routines. (Though, a rawPack(T)(T value) routine might not go  
 amiss)

 I moved packRaw to internal method, but packArray and packMap should be  
 public.
 MessagePack's array and map can contain any type.

I can see the usefulness of the functionality, but for an end-user API, it  
is brittle and error-prone. For example:  
p.packArray(3).pack(5).pack("six"); produces an incorrect msgPack stream.  
A better syntax would be: p.pack(5,"six"); or p.packArray(5,"six"); I know  
you're using pack(Types...) to mean pack(Types[0]).pack(Types[1])... but  
using it for variable type arrays is worth considering. For packMap, pack(  
["key": "value"], [5:"six"] ) could work, but packMap( ["key": "value"],  
[5:"six"] ) or packMap("key","value",5,"six") would avoid confusion with  
pack(Types...).

 (Returns: this to method chain. => Returns: self. i.e. for method  
 chaining )

 Changed.

 Oh, as for packing real you'll need to use CustomFloat!(80) for cross  
 platform inter-op. Also, it shouldn't be  system: there's nothing  
 memory-unsafe about writing or reading them. Please don't abuse  
  system, etc for meanings other their intended meanings.

 I forgot this. CutstomFloat was already fixed by you and David in trunk.
 I added 80bit floating point support for non-x86 environment.

Don't forget that some x86 platform store 80-bit floats in more than  
80-bits. So you do need to check the size of real on x86 platforms as well.

 Masahiro

P.S. Don't forget to apply changes to packer's API to unpacker.

Jun 21 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

Sorry for the delay.


On Tue, 22 Jun 2010 13:47:19 +0900, Robert Jacques <sandford jhu.edu>  
wrote:

 Other little things,
 -Why isn't nil mapped to null?

 Where?

 MsgPack has a nil type which I assume is equivalent to null. So why  
 isn't it simply included in the object serializer (and arguably also  
 in the array and map serializer as well)?

 Sorry, I can't understand this statement. Nullable types(class, array  
 and associative-array)
 is serialized to nil if null.

 Sorry. The comment was about redundancy. i.e. packNil <=> pack(null).

I added pointer type (de)serialization.

 For that matter, packNil, packTrue, packFalse, etc are all redundant  
 with pack(T).

 I removed packTrue, packFalse.

 Furthermore, stream, packArray, packMap and packRaw all should be  
 internal routines. (Though, a rawPack(T)(T value) routine might not go  
 amiss)

 I moved packRaw to internal method, but packArray and packMap should be  
 public.
 MessagePack's array and map can contain any type.

 I can see the usefulness of the functionality, but for an end-user API,  
 it is brittle and error-prone. For example:  
 p.packArray(3).pack(5).pack("six"); produces an incorrect msgPack  
 stream. A better syntax would be: p.pack(5,"six"); or  
 p.packArray(5,"six"); I know you're using pack(Types...) to mean  
 pack(Types[0]).pack(Types[1])... but using it for variable type arrays  
 is worth considering. For packMap, pack( ["key": "value"], [5:"six"] )  
 could work, but packMap( ["key": "value"], [5:"six"] ) or  
 packMap("key","value",5,"six") would avoid confusion with pack(Types...).

I have thought about this point. In the result, I changed (Un)Packer API.
It's true that your suggested syntax is better.
Old packArray was renamed to beginArray and new packArray serializes
container type using variadic arguments. Of course, Map too.

-----
/* old */
packer.packArray(2).pack(true, 10);

/* new */
packer.packArray(true, 10);
// == packer.beginArray(2).pack(true, 10)
-----

Thanks for pointing out this.


Masahiro

Jun 24 2010

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:

Dnia 18-06-2010 o 06:29:50 Andrei Alexandrescu  =

<SeeWebsiteForEmail erdani.org> napisa=B3(a):

 There's one larger submission that needs review - Masahiro Nakagawa's =

 =

 std.msgpack. I think it would be great if this community established a=

  =

 review process. To do so, we should abstract away starting from at lea=

st  =

 one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code a=

nd  =

 doc)

 Feel free to comment here.

Doc bug:
"VRefBuffer is a zero copy buffer for more efficient"

... for more efficient what?

Jun 18 2010

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Sat, 19 Jun 2010 04:47:35 +0900, Tomek Sowi=C5=84ski <just ask.me> wr=
ote:

 Dnia 18-06-2010 o 06:29:50 Andrei Alexandrescu  =

 <SeeWebsiteForEmail erdani.org> napisa=C5=82(a):

 There's one larger submission that needs review - Masahiro Nakagawa's=


  =

 std.msgpack. I think it would be great if this community established =


a  =

 review process. To do so, we should abstract away starting from at  =


 least one review :o). Here's Masahiro's work:

 code: https://dl.dropbox.com/u/374829/std/msgpack.d
 doc:  https://dl.dropbox.com/u/374829/std/msgpack.html
 zip:  https://dl.dropbox.com/u/374829/std_msgpack.zip (archives code =


 =

 and doc)

 Feel free to comment here.

 Doc bug:
 "VRefBuffer is a zero copy buffer for more efficient"

 ... for more efficient what?

Oh, I forgot "serialization".
Thanks!


Masahiro

Jun 18 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Review: std.msgpack