digitalmars.D.announce - serialization library

"Christian Kamm" <kamm nospam.de> writes:

Based on initial work from Tom S and clayasaurus, I've written this
serialization library. I hope something like this doesn't already exist!

http://www.math.tu-berlin.de/~kamm/d/serialization.zip

Currently, it only provides binary file io through the Serializer class.
It can
- write/read almost (hopefully) every type through a call to
Serializer.describe
- track class references and pointers by default
- serialize classes and structs through a templated 'describe' member
function
- write derived classes from base class reference*
- read derived classes into base class reference*
- serialize not default constructible classes*

(* for this to work, the class needs to be registered with the archive
type)

It has far fewer features than boost::serialization but is already in a
very usable state: FreeUniverse, a D game based on the Arc library, uses
it for writing and loading savegames as well as other persistent state
information.

What it does not do/is missing:
- exception safety / multithread safety
- out-of-class/struct serialization methods (is it possible to check
whether a specific overload exists at compile time?)
- static arrays need to be serialized with describe_staticarray (static
arrays can't be inout, so the general-purpose template method doesn't
work... is there a way around the problem?)
- things I forgot right now

Documentation is still rather sparse. This short example shows the basic
usage:

---
struct Foo
{
   int a = 3;

   void describe(T)(T archive)
   {
     archive.describe(a);
   }
}

void main()
{
   real bar = 3.141;
   Foo foo;

   // write data
   Serializer s = new Serializer("testfile", FileMode.Out);
   s.describe(bar);
   s.describe(foo);
   delete s;

   // read data
   s = new Serializer("testfile", FileMode.In);
   s.describe(bar);
   s.describe(foo);
}
---
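The idea that one describe call serves both writing and reading can be sketched outside D as well. The following Python analogue is only an illustration under stated assumptions: the names Archive, Foo and round_trip are hypothetical, the byte layout is invented, and none of it is taken from the library itself. Because Python has no inout parameters, describe returns the value instead of writing through a reference.

```python
import io
import struct


class Archive:
    """Hypothetical archive: the same describe() call writes or reads."""

    def __init__(self, stream, writing):
        self.stream = stream
        self.writing = writing

    def describe(self, value, fmt="<i"):
        if self.writing:
            # Writing: store the value and hand it back unchanged.
            self.stream.write(struct.pack(fmt, value))
            return value
        # Reading: ignore the argument and return the stored value.
        size = struct.calcsize(fmt)
        (stored,) = struct.unpack(fmt, self.stream.read(size))
        return stored


class Foo:
    def __init__(self, a=3):
        self.a = a

    def describe(self, archive):
        # One line covers both directions.
        self.a = archive.describe(self.a)


def round_trip(foo, bar):
    """Write bar and foo to a buffer, then read them back in the same order."""
    buf = io.BytesIO()
    out = Archive(buf, writing=True)
    out.describe(bar, "<d")
    foo.describe(out)

    buf.seek(0)
    inp = Archive(buf, writing=False)
    bar_back = inp.describe(0.0, "<d")
    foo_back = Foo(a=0)
    foo_back.describe(inp)
    return bar_back, foo_back.a
```

The order of describe calls is the file format here, which is why the read side must mirror the write side exactly.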

See the unittests in serializer.d for other details. Most of the logic is
in basicarchive.d. Docs definitely need work.

Since FreeUniverse was its first real user, it is currently maintained in
the FreeUniverse svn. However, if other people are interested, I will
request a separate project for it on dsource.

Comments and improvements are of course welcome.

Best Regards,
Christian Kamm

Nov 08 2006

Walter Bright <newshound digitalmars.com> writes:

Christian Kamm wrote:
 Based on initial work from Tom S and clayasaurus, I've written this 
 serialization library. I hope something like this doesn't already exist!

Great! Can you do some of the suggestions on 
http://www.digitalmars.com/d/howto-promote.html?

Nov 08 2006

"Christian Kamm" <kamm nospam.de> writes:

 Great! Can you do some of the suggestions on  
 http://www.digitalmars.com/d/howto-promote.html?

Sure, I wanted to wait until I got some responses and feedback from the  
community though.

Nov 09 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Christian Kamm wrote:
 Based on initial work from Tom S and clayasaurus, I've written this 
 serialization library. I hope something like this doesn't already exist!

Great!

 
 http://www.math.tu-berlin.de/~kamm/d/serialization.zip
 
 Currently, it only provides binary file io through the Serializer class. 
 It can
 - write/read almost (hopefully) every type through a call to 
 Serializer.describe
 - track class references and pointers by default
 - serialize classes and structs through a templated 'describe' member 
 function
 - write derived classes from base class reference*
 - read derived classes into base class reference*
 - serialize not default constructible classes*
 
 (* for this to work, the class needs to be registered with the archive 
 type)
 
 It has far fewer features than boost::serialization but is already in a 
 very usable state: FreeUniverse, a D game based on the Arc library, uses 
 it for writing and loading savegames as well as other persistent state 
 information.

I'm using Boost::serialization but I'm not at all happy with it.  But 
the things that I don't like mostly have to do with versioning, which it 
looks like you don't support anyway.

 What it does not do/is missing:
 - exception safety / multithread safety
 - out-of-class/struct serialization methods (is it possible to check 
 whether a specific overload exists at compile time?)

I could be mistaken but I think this is that ADL / Koenig Lookup 
territory that Walter doesn't want to go into.

 - static arrays need to be serialized with describe_staticarray (static 
 arrays can't be inout, so the general-purpose template method doesn't 
 work... is there a way around the problem?)
 - things I forgot right now

Endian issues?

 
 Documentation is still rather sparse. This short example shows the basic 
 usage


Just a wish list item, but I'd prefer an actual "file format" library as 
opposed to a serialization library.  Maybe a file format library would 
build on top of the serialization library, but anyway, the key 
difference is that a serialization lib aims to turn *particular* data 
structures into a binary format that can be losslessly loaded back into 
the same data structure later.

But that is not the way people design generic file formats, like say the 
Photoshop file format.  Things like that need to be very extensible and 
shouldn't be tied to particular data structures.  I think that's where 
boost::serialization gets into trouble.  Once you start talking about 
versioning, you're no longer talking about one specific data structure.

For instance Boost::serialization lacks a way to ignore blocks or skip 
chunks of data that are not recognized or obsolete.  You actually have 
to load the obsolete thing into the proper (possibly obsolete) data 
structure and then delete the unnecessary thing you just created.  This 
is not good from the forwards/backwards compatibility view.  Old code 
simply cannot read the file (even if it understands the majority of the 
chunks that matter), and new code is forced to maintain old data 
structures just for the purpose of loading up obsolete data and throwing 
it away.

How do you fix it?  Very simple really.  Just store the file as a series 
of chunks with fixed length headers, and each header contains the length 
of the data in that chunk.  If you get a chunk header with a tag you 
don't understand, just ignore it.  A particular chunk can have 
sub-chunks too.  I think it's similar in many ways to a grammar definition:

   file:
     header chunklist

   chunklist:
     chunk
     chunk chunklist

   header:
     typeIndicator versionNumber DataEndianness

   chunk:
     chunkHeader data

   chunkHeader:
     chunkType DataLength

   data:
     // Here's where you list all the types of data known to you

Or something like that.
I'd like a library that helps me read and write my data in that sort of 
data-structure independent format.
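A minimal sketch of the grammar above, in Python rather than D. Everything concrete in it is an assumption made for illustration: the 7-byte file header layout, the 8-byte chunk header (4-byte tag plus 32-bit length), the use of "<" or ">" as the endianness flag, and the function names write_file and read_file. Sub-chunks are left out.

```python
import io
import struct


def write_file(stream, magic, version, chunks, order="<"):
    """Write header (type indicator, version, endianness flag), then chunks."""
    # The file header itself always uses little-endian so it can be read
    # before the endianness flag is known.
    stream.write(struct.pack("<4sH1s", magic, version, order.encode("ascii")))
    for tag, data in chunks:
        # Fixed-length chunk header: 4-byte tag and 32-bit data length.
        stream.write(struct.pack(order + "4sI", tag, len(data)))
        stream.write(data)


def read_file(stream, handlers):
    """Read all chunks; call handlers for known tags and skip the rest."""
    magic, version, flag = struct.unpack("<4sH1s", stream.read(7))
    order = flag.decode("ascii")
    results = {}
    skipped = []
    while True:
        raw = stream.read(8)
        if len(raw) < 8:
            break  # end of file
        tag, length = struct.unpack(order + "4sI", raw)
        if tag in handlers:
            results[tag] = handlers[tag](stream.read(length))
        else:
            # Unknown tag: the stored length lets us step over the data
            # without understanding it.
            stream.seek(length, io.SEEK_CUR)
            skipped.append(tag)
    return magic, version, results, skipped
```

A reader built this way keeps working when a newer writer adds chunk types, because the length field is all it needs to move past them.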

--bb

Nov 08 2006

Walter Bright <newshound digitalmars.com> writes:

Bill Baxter wrote:
 How do you fix it?  Very simple really.  Just store the file as a series 
 of chunks with fixed length headers, and each header contains the length 
 of the data in that chunk.  If you get a chunk header with a tag you 
 don't understand, just ignore it.  A particular chunk can have 
 sub-chunks too.

Evolution of a file format:

1.0: Just spew the struct contents out into a file using something like 
fwrite().

2.0: Oops! Need to update 1.0 and retain backwards compatibility. 
Solution: 2.0 files put out 'illegal' values into the 1.0 format to 
signal it's a 2.0 file.

3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter 
and have another field with a version number in it.

4.0: Get smart and implement your suggestion, so you can have both 
backwards and *forwards* compatibility.

Think I'm joking? Just look at a few! Everyone learns this the hard way.

Me, if practical, I like file formats to be in ascii so I can examine 
them easily to see if they're working right.

Nov 08 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Walter Bright wrote:
 Bill Baxter wrote:
 How do you fix it?  Very simple really.  Just store the file as a 
 series of chunks with fixed length headers, and each header contains 
 the length of the data in that chunk.  If you get a chunk header with 
 a tag you don't understand, just ignore it.  A particular chunk can 
 have sub-chunks too.

 
 Evolution of a file format:
 
 1.0: Just spew the struct contents out into a file using something like 
 fwrite().
 
 2.0: Oops! Need to update 1.0 and retain backwards compatibility. 
 Solution: 2.0 files put out 'illegal' values into the 1.0 format to 
 signal it's a 2.0 file.
 
 3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter 
 and have another field with a version number in it.
 
 4.0: Get smart and implement your suggestion, so you can have both 
 backwards and *forwards* compatibility.
 
 Think I'm joking? Just look at a few! Everyone learns this the hard way.

I guess I'm no exception.  ;-)  I've been through the 4 step program a 
few times myself.

 Me, if practical, I like file formats to be in ascii so I can examine 
 them easily to see if they're working right.

That is one thing I do like about boost::serialization.  With basically 
one line of code I can switch between xml serialization and binary 
serialization.  Only thing I didn't like was I couldn't figure out how 
to keep some things binary.

--bb

Nov 08 2006

Georg Wrede <georg.wrede nospam.org> writes:

Bill Baxter wrote:
 Walter Bright wrote:
 
 Bill Baxter wrote:

 How do you fix it?  Very simple really.  Just store the file as a 
 series of chunks with fixed length headers, and each header contains 
 the length of the data in that chunk.  If you get a chunk header with 
 a tag you don't understand, just ignore it.  A particular chunk can 
 have sub-chunks too.


 Evolution of a file format:

 1.0: Just spew the struct contents out into a file using something 
 like fwrite().

 2.0: Oops! Need to update 1.0 and retain backwards compatibility. 
 Solution: 2.0 files put out 'illegal' values into the 1.0 format to 
 signal it's a 2.0 file.

 3.0: Doh! Find another set of illegal 2.0 values. This time, get 
 smarter and have another field with a version number in it.

 4.0: Get smart and implement your suggestion, so you can have both 
 backwards and *forwards* compatibility.

 Think I'm joking? Just look at a few! Everyone learns this the hard way.

 
 
 I guess I'm no exception.  ;-)  I've been through the 4 step program a 
 few times myself.
 
 Me, if practical, I like file formats to be in ascii so I can examine 
 them easily to see if they're working right.


Heh, something Microsoft is only now trying to learn. And Unix guys knew 
right from the start. Even most of the communications protocols are in text.

 That is one thing I do like about boost::serialization.  With basically 
 one line of code I can switch between xml serialization and binary 
 serialization.  Only thing I didn't like was I couldn't figure out how 
 to keep some things binary.

With a text file, you can tell what it is, even when the file has got 
misplaced or renamed, but with a binary it's pretty hopeless.

And of course you might look at a few languages ( ~ fileformats) 
especially made for serializing.

YAML looks very clean, and is easily readable by humans (XML is not)
JSON looks like ECMA script

The following page, although only vaguely related, gives an excellent 
intro to the ideology, at the center:

http://mike.teczno.com/json.html

With these, you'll be right where Walter was talking about.

Nov 10 2006

Bill Baxter <wbaxter gmail.com> writes:

Georg Wrede wrote:
 Bill Baxter wrote:
 
 That is one thing I do like about boost::serialization.  With 
 basically one line of code I can switch between xml serialization and 
 binary serialization.  Only thing I didn't like was I couldn't figure 
 out how to keep some things binary.

 
 
 With a text file, you can tell what it is, even when the file has got 
 misplaced or renamed, but with a binary it's pretty hopeless.

Having all the structure in ASCII is great, and maybe everything in 
ASCII while you're debugging, but some things just don't work well as 
ascii -- images, videos, audio files, 3D meshes, etc.  It makes sense to 
have the structure annotated in ascii, but when it comes to storing raw 
image data there's not much to be gained from storing that as a giant 
ASCII string.  With the boost::serialization's XML I wanted to be able 
to store that image as something like

<image width=1024 height=768 format=RGBA type="float">
   [big hunk o raw binary image data]
</image>

But I couldn't find any way to do that.

 And of course you might look at a few languages ( ~ fileformats) 
 especially made for serializing.
 
 YAML looks very clean, and is easily readable by humans (XML is not)

I took a look at that one before.  I agree that it would be nice if a 
more human-friendly alternative to XML caught on.

--bb

Nov 10 2006

Georg Wrede <georg.wrede nospam.org> writes:

Bill Baxter wrote:
 Georg Wrede wrote:
 
 Bill Baxter wrote:

 That is one thing I do like about boost::serialization.  With 
 basically one line of code I can switch between xml serialization and 
 binary serialization.  Only thing I didn't like was I couldn't figure 
 out how to keep some things binary.

 With a text file, you can tell what it is, even when the file has got 
 misplaced or renamed, but with a binary it's pretty hopeless.

 
 Having all the structure in ASCII is great, and maybe everything in 
 ASCII while you're debugging, but some things just don't work well as 
 ascii -- images, videos, audio files, 3D meshes, etc.  It makes sense to 
 have the structure annotated in ascii, but when it comes to storing raw 
 image data there's not much to be gained from storing that as a giant 
 ASCII string.  With the boost::serialization's XML I wanted to be able 
 to store that image as something like
 
 <image width=1024 height=768 format=RGBA type="float">
   [big hunk o raw binary image data]
 </image>
 
 But I couldn't find any way to do that.

Probably just as well, since then a normal parser could not handle it.

Or if you didn't care, you could invent your own tag type, like

<image width=1024 height=768 format=RGBA type="float">
    <binarydata>֤*^%��</binarydata>
</image>

And, as you can see, you'd immediately lose the gains of having the 
thing as a text file because it gets unwieldy in a text viewer.

You could always serialize into a subdirectory and save the binaries 
(pictures, etc) as separate files there. Then the XML would only contain 
their names. (Not my invention. Java uses this, OpenOffice, and others.)

To save space you then zip the whole thing, thus getting your single 
serialization file, as originally wanted.
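The layout described here (an XML file that only names the binaries, the binaries stored as separate entries, the whole thing zipped) can be sketched with Python's standard zipfile module. The entry names manifest.xml and binaries/, the href attribute, and the functions save_bundle and load_bundle are assumptions for illustration, not any existing format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def save_bundle(stream, images):
    """images maps a name to (width, height, raw_bytes)."""
    root = ET.Element("document")
    with zipfile.ZipFile(stream, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, (width, height, raw) in images.items():
            path = "binaries/" + name + ".raw"
            # Raw data goes into its own archive entry.
            zf.writestr(path, raw)
            # The XML keeps only the structure and the entry name.
            ET.SubElement(root, "image", width=str(width),
                          height=str(height), href=path)
        zf.writestr("manifest.xml", ET.tostring(root, encoding="utf-8"))


def load_bundle(stream):
    """Inverse of save_bundle: follow each href back to its raw bytes."""
    images = {}
    with zipfile.ZipFile(stream, "r") as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
        for node in root.iter("image"):
            path = node.get("href")
            name = path.split("/")[-1].rsplit(".", 1)[0]
            images[name] = (int(node.get("width")),
                            int(node.get("height")),
                            zf.read(path))
    return images
```

The manifest stays readable in any text viewer after unzipping, while the pixel data never passes through a text encoding.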

---

This can be very simple in the program, I remember seeing somewhere a 
library that made a zip file on disk look to the program like a 
subdirectory tree. Thus the zipping, creation of directories and other 
chores become transparent to the programmer.

Or you could just use the Phobos zip without a tree.

Nov 11 2006

Sean Kelly <sean f4.ca> writes:

Bill Baxter wrote:
 Georg Wrede wrote:
 Bill Baxter wrote:

 That is one thing I do like about boost::serialization.  With 
 basically one line of code I can switch between xml serialization and 
 binary serialization.  Only thing I didn't like was I couldn't figure 
 out how to keep some things binary.


 With a text file, you can tell what it is, even when the file has got 
 misplaced or renamed, but with a binary it's pretty hopeless.

 
 Having all the structure in ASCII is great, and maybe everything in 
 ASCII while you're debugging, but some things just don't work well as 
 ascii -- images, videos, audio files, 3D meshes, etc.  It makes sense to 
 have the structure annotated in ascii, but when it comes to storing raw 
 image data there's not much to be gained from storing that as a giant 
 ASCII string.  With the boost::serialization's XML I wanted to be able 
 to store that image as something like
 
 <image width=1024 height=768 format=RGBA type="float">
   [big hunk o raw binary image data]
 </image>
 
 But I couldn't find any way to do that.

Base64 encoding :-p


Sean

Nov 11 2006

Robert Ramey <ramey rrsd.com> writes:

boost serialization has an object wrapper for binary data - called (surprise)
binary_object.  On text-based formats, it uses base64 encoding. It's in the
documentation, and there is also a specific test which shows how to use it.
And it's extremely easy to use.
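To show what base64 inside a text format looks like in general, here is a small Python sketch using only the standard library. It does not use or imitate the Boost API; the element and attribute names follow the image example earlier in the thread, and image_to_xml and image_from_xml are hypothetical helpers.

```python
import base64
import xml.etree.ElementTree as ET


def image_to_xml(width, height, raw):
    """Embed raw bytes in an XML element as base64 text."""
    node = ET.Element("image", width=str(width), height=str(height),
                      format="RGBA", encoding="base64")
    node.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(node, encoding="unicode")


def image_from_xml(text):
    """Recover (width, height, raw bytes) from the XML produced above."""
    node = ET.fromstring(text)
    # An empty payload is serialized as an empty element, so text may be None.
    raw = base64.b64decode(node.text or "")
    return int(node.get("width")), int(node.get("height")), raw
```

The file stays valid XML and pure ASCII, at the cost of roughly a third more space than the raw bytes.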

Robert Ramey

Dec 01 2009

"Christian Kamm" <kamm nospam.de> writes:

 What it does not do/is missing:

 Endian issues?

Oh, indeed. It does not take care of them yet. Additionally, classes are  
(if unregistered) identified by their mangled name, which might vary  
between compilers, I think.
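The endian issue can be illustrated with a short Python sketch: the same integer produces different bytes depending on byte order, so a format has to fix one order (or record it) instead of using whatever the writing machine happens to use. The helper names write_u32 and read_u32 are made up for this example.

```python
import struct


def write_u32(value, order="<"):
    """Pack an unsigned 32-bit integer; '<' is little-endian, '>' big-endian."""
    return struct.pack(order + "I", value)


def read_u32(data, order="<"):
    """Unpack an unsigned 32-bit integer using the given byte order."""
    return struct.unpack(order + "I", data)[0]


# Same value, two byte orders.
little = write_u32(1, "<")   # bytes 01 00 00 00
big = write_u32(1, ">")      # bytes 00 00 00 01

# Reading with the wrong order silently yields a different number.
misread = read_u32(little, ">")
```

Here misread is 16777216 (0x01000000) rather than 1, which is the kind of silent corruption that appears when a file moves between machines of different endianness.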

 Just a wish list item, but I'd prefer an actual "file format" library as  
 opposed to a serialization library.  ...

I agree that it is not very well suited for writing/reading user data or  
files with a long life-expectancy. It is very nice for temporarily  
swapping data to disk and similar tasks, where the same process reads back  
the data it wrote to a file earlier.

A full-fledged "file format" library, while being something very useful  
I'd love to see as well, would be a project for another day though.

Christian

Nov 09 2006

"Christian Kamm" <kamm nospam.de> writes:

 How do you fix it?  Very simple really.  Just store the file as a series
 of chunks with fixed length headers, and each header contains the length
 of the data in that chunk.  If you get a chunk header with a tag you
 don't understand, just ignore it.  A particular chunk can have
 sub-chunks too.  I think it's similar in many ways to a grammar
 definition:

Check out
http://www.math.tu-berlin.de/~kamm/d/serializationchunk.zip

chunk.d contains a hackish implementation of your chunk idea: when
reading, it discards any chunk-parts it doesn't understand. Once it gets to
one it can process, it discards any other older-versioned chunks of the
same type. When writing, it is possible to write legacy chunks for older
versions.
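The reading rule described above can be sketched as follows. This is not the code from chunk.d; it is a Python illustration that assumes a 10-byte chunk header (tag, version, payload length), assumes the writer emits the newest version of a chunk first and legacy versions after it, and uses the hypothetical names write_chunk and read_chunks.

```python
import io
import struct

# Assumed chunk header: 4-byte tag, 16-bit version, 32-bit payload length.
CHUNK = struct.Struct("<4sHI")


def write_chunk(stream, tag, version, payload):
    stream.write(CHUNK.pack(tag, version, len(payload)))
    stream.write(payload)


def read_chunks(stream, max_versions):
    """max_versions maps a tag to the highest version this reader understands."""
    accepted = {}
    while True:
        raw = stream.read(CHUNK.size)
        if len(raw) < CHUNK.size:
            break
        tag, version, length = CHUNK.unpack(raw)
        payload = stream.read(length)
        understood = tag in max_versions and version <= max_versions[tag]
        if understood and tag not in accepted:
            # First chunk of this type that the reader can process wins.
            accepted[tag] = (version, payload)
        # Anything else is dropped: unknown tags, versions that are too new,
        # and older duplicates of a chunk that was already accepted.
    return accepted
```

With this scheme a 2.0 writer emits the 2.0 chunk followed by a 1.0 legacy chunk; a 1.0 reader skips the first and takes the second, while a 2.0 reader takes the first and discards the second.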

To test it, you need to compile with -version=V1_SER for the 1.0 version
of the program and with -version=V1_SER -version=V2_SER for the 2.0
version of the program. Try running the v2 version, copy data_out to
data_in and run the v1 version. (sorry for the complicated instructions,
it's just a hack!)

Is this, approximately, what you had in mind? Personally, I'm not sure
about all those classes required and how it would look in a larger
project: maybe writing a version number and then having the user write a
switch statement for it would have been ok too.
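For comparison, the simpler alternative (a version number up front and a user-written switch) might look like the sketch below. The record layout, the field names and the functions save_player and load_player are invented for the example; Python's if/elif stands in for the switch statement.

```python
import struct


def save_player(name, score, level):
    """Write the current (version 2) layout: version, name, score, level."""
    body = name.encode("utf-8")
    return (struct.pack("<HH", 2, len(body)) + body
            + struct.pack("<II", score, level))


def load_player(data):
    """Read any known version and return (name, score, level)."""
    version, name_len = struct.unpack_from("<HH", data, 0)
    name = data[4:4 + name_len].decode("utf-8")
    offset = 4 + name_len
    if version == 1:
        # Version 1 had no level field; supply a default.
        (score,) = struct.unpack_from("<I", data, offset)
        level = 1
    elif version == 2:
        score, level = struct.unpack_from("<II", data, offset)
    else:
        raise ValueError("unsupported version: %d" % version)
    return name, score, level
```

This gives backwards compatibility only: a version 1 program still cannot read version 2 data, which is the gap the chunk approach is meant to close.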

Christian

Nov 09 2006

Fredrik Olsson <peylow gmail.com> writes:

Bill Baxter wrote:
<snip>
 How do you fix it?  Very simple really.  Just store the file as a series 
 of chunks with fixed length headers, and each header contains the length 
 of the data in that chunk.  If you get a chunk header with a tag you 
 don't understand, just ignore it.  A particular chunk can have 
 sub-chunks too.  I think it's similar in many ways to a grammar definition:
 
   file:
     header chunklist
 
   chunklist:
     chunk
     chunk chunklist
 
   header:
     typeIndicator versionNumber DataEndianness
 
   chunk:
     chunkHeader data
 
   chunkHeader:
     chunkType DataLength
 
   data:
     // Here's where you list all the types of data known to you
 
 Or something like that.
 I'd like a library that helps me read and write my data in that sort of 
 data-structure independent format.
 

What you want is a lib for reading and writing EA IFF-85 compatible files?

I actually have some code for D doing this around somewhere. Written to 
be able to read IFF graphics files and Lightwave 3D objects.

I shall dig up the code, clean it up in a presentable shape, and make it 
public.


// Fredrik Olsson

 --bb

Nov 09 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Fredrik Olsson wrote:
 Bill Baxter wrote:
 <snip>


 What you want is a lib for reading and writing EA IFF-85 compatible files?

I've never heard of EA IFF-85, but a brief skim of the description here:
http://www.newtek.com/lightwave/developer/LW80/8lwsdk/docs/filefmts/eaiff85.html
sounds good.

Is it something 3D-graphics specific though?  Electronic Arts created 
the standard, and the website I found above is for a 3D modeling 
package...  But it looks right.

But the truth is I don't know what I want exactly in terms of API.  I 
just want something that makes it easy to take my data structures -> 
extract the data into something generic and ESPECIALLY not intrinsically 
tied to the types in my program -> save it to disk -> load it back into 
whatever data structures I choose later.  It's ok if it's a little more 
painful than   MyData.serialize(archive); MyData.load(archive);  as long 
as it achieves the goal.

With Boost::serialization I've ended up having to write upgrader 
programs a few times over the course of development.  It's always a pain 
because boost::serialization wants to be smart so what I end up doing is 
taking an old version of my data structures header file, wrapping it in 
an "oldversion" namespace, then loading via oldversion::type and saving 
via newversion::type.

 I actually have some code for D doing this around somewhere. Written to 
 be able to read IFF graphics files and Lightwave 3D objects.

Oh, right, lightwave.  So I guess it's not a coincidence google for EA 
IFF-85 turned up NewTek's page.

 I shall dig up the code, clean it up in a presentable shape, and make it 
 public.

Cool, can you explain how the API works a little?  I guess I can imagine 
that loading such a file is not so different from loading an XML file. 
So like XML parsing there are a few ways to do it.


--bb

Nov 09 2006

Fredrik Olsson <peylow gmail.com> writes:

Bill Baxter wrote:
 Fredrik Olsson wrote:
 Bill Baxter wrote:
 <snip>

 

 What you want is a lib for reading and writing EA IFF-85 compatible 
 files?

 
 I've never heard of EA IFF-85, but a brief skim of the description here:
 http://www.newtek.com/lightwave/developer/LW80/8lwsdk/docs/filefmts/eaiff85.html
 
 sounds good.
 
 Is it something 3D-graphics specific though?  Electronic Arts created 
 the standard, and the website I found above is for a 3D modeling 
 package...  But it looks right.
 

No it is not limited to any specific kind of files. EA made it as an 
attempt to make a general file format structure for any use. Basically it 
just takes care of bundling chunks, marking them as required or 
optional. And byte-order-independence! IFF as such has been used for 
images as in IFF, audio AIFF, 3D objects OBJ, and lots more.

When Microsoft created BMP and WAV they more or less ripped the EA IFF 
rationale, but changed the required byte order.

EA IFF is a low level format. How to actually interpret the data that is 
contained is up to each application.


 But the truth is I don't know what I want exactly in terms of API.  I 
 just want something that makes it easy to take my data structures -> 
 extract the data into something generic and ESPECIALLY not intrinsically 
 tied to the types in my program -> save it to disk -> load it back into 
 whatever data structures I choose later.  It's ok if it's a little more 
 painful than   MyData.serialize(archive); MyData.load(archive);  as long 
 as it achieves the goal.
 
 With Boost::serialization I've ended up having to write upgrader 
 programs a few times over the course of development.  It's always a pain 
 because boost::serialization wants to be smart so what I end up doing is 
 taking an old version of my data structures header file, wrapping it in 
 an "oldversion" namespace, then loading via oldversion::type and saving 
 via newversion::type.
 
 I actually have some code for D doing this around somewhere. Written 
 to be able to read IFF graphics files and Lightwave 3D objects.

 
 Oh, right, lightwave.  So I guess it's not a coincidence google for EA 
 IFF-85 turned up NewTek's page.
 

EA IFF 85 was a joint venture of Electronic Arts and Commodore, for 
creating a universal file format for the Amiga. Lightwave 3D is an old 
Amiga application, so them using IFF as a base is kind of natural.

 I shall dig up the code, clean it up in a presentable shape, and make 
 it public.

 
 Cool, can you explain how the API works a little?  I guess I can imagine 
 that loading such a file is not so different from loading an XML file. 
 So like XML parsing there are a few ways to do it.
 

IFF is not quite as flexible as XML, much more flat. So the API is very 
simple.
My current implementation wraps a Stream instance and implements 
simple methods such as:
foo.seekNextChunk("CTAB");
auto bar = foo.readInt();
Etc, just for working with the basics, as defined by EA IFF 85. NewTek, 
the creators of Lightwave 3D, have made some additions that would be 
nice to have as well.
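A rough Python sketch of what such a reader could look like follows. It is not the D code mentioned in this post. It relies on the EA IFF 85 chunk layout (4-byte ID, 32-bit big-endian length, data, and one pad byte when the length is odd) and walks a flat sequence of chunks such as the body of a FORM. The class IffReader, its methods and the helper make_chunk are hypothetical names.

```python
import io
import struct


class IffReader:
    """Walks a flat run of IFF chunks on a seekable stream."""

    def __init__(self, stream):
        self.stream = stream
        # Position just past the current chunk, including any pad byte.
        self.chunk_end = stream.tell()

    def seek_next_chunk(self, wanted):
        """Move to the data of the next chunk with ID `wanted`.

        Returns the chunk's data length, or None if no such chunk follows.
        """
        while True:
            self.stream.seek(self.chunk_end)
            header = self.stream.read(8)
            if len(header) < 8:
                return None
            chunk_id, length = struct.unpack(">4si", header)
            data_start = self.stream.tell()
            # Chunks are padded to an even number of bytes.
            self.chunk_end = data_start + length + (length & 1)
            if chunk_id == wanted:
                return length

    def read_int(self):
        """Read a big-endian signed 32-bit integer from the current position."""
        return struct.unpack(">i", self.stream.read(4))[0]


def make_chunk(chunk_id, data):
    """Build one chunk for testing: ID, length, data, optional pad byte."""
    pad = b"\x00" if len(data) % 2 else b""
    return struct.pack(">4si", chunk_id, len(data)) + data + pad
```

Because the reader remembers where the current chunk ends, a caller that reads only part of a chunk's data still lands on the next header correctly.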

But it was one of my first attempts at D, so I will rewrite it. Just how 
is something I shall think about. Having the chunks as independent 
instances over a seekable stream might be a good idea.

// Fredrik Olsson

 
 --bb

Nov 10 2006

"Paulo Herrera" <pauloh81 yahoo.ca> writes:

On Thu, 09 Nov 2006 02:06:21 +0100, Bill Baxter  
<dnewsgroup billbaxter.com> wrote:

 Christian Kamm wrote:
 Based on initial work from Tom S and clayasaurus, I've written this  
 serialization library. I hope something like this doesn't already  
 exist!

 Great!

  http://www.math.tu-berlin.de/~kamm/d/serialization.zip
  Currently, it only provides binary file io through the Serializer  
 class. It can
 - write/read almost (hopefully) every type through a call to  
 Serializer.describe
 - track class references and pointers by default
 - serialize classes and structs through a templated 'describe' member  
 function
 - write derived classes from base class reference*
 - read derived classes into base class reference*
 - serialize not default constructible classes*
  (* for this to work, the class needs to be registered with the archive  
 type)
  It has far fewer features than boost::serialization but is already in a  
 very usable state: FreeUniverse, a D game based on the Arc library,  
 uses it for writing and loading savegames as well as other persistent  
 state information.

 I'm using Boost::serialization but I'm not at all happy with it.  But  
 the things that I don't like mostly have to do with versioning, which it  
 looks like you don't support anyway.

 What it does not do/is missing:
 - exception safety / multithread safety
 - out-of-class/struct serialization methods (is it possible to check  
 whether a specific overload exists at compile time?)

 I could be mistaken but I think this is that ADL / Koenig Lookup  
 territory that Walter doesn't want to go into.

 - static arrays need to be serialized with describe_staticarray (static  
 arrays can't be inout, so the general-purpose template method doesn't  
 work... is there a way around the problem?)
 - things I forgot right now

 Endian issues?

  Documentation is still rather sparse. This short example shows the  
 basic usage


 Just a wish list item, but I'd prefer an actual "file format" library as  
 opposed to a serialization library.  Maybe a file format library would  
 build on top of the serialization library, but anyway, the key  
 difference is that a serialization lib aims to turn *particular* data  
 structures into a binary format that can be losslessly loaded back into  
 the same data structure later.

 But that is not the way people design generic file formats, like say the  
 Photoshop file format.  Things like that need to be very extensible and  
 shouldn't be tied to particular data structures.  I think that's where  
 boost::serialization gets into trouble.  Once you start talking about  
 versioning, you're no longer talking about one specific data structure.

 For instance Boost::serialization lacks a way to ignore blocks or skip  
 chunks of data that are not recognized or obsolete.  You actually have  
 to load the obsolete thing into the proper (possibly obsolete) data  
 structure and then delete the unnecessary thing you just created.  This  
 is not good from the forwards/backwards compatibility view.  Old code  
 simply cannot read the file (even if it understands the majority of the  
 chunks that matter), and new code is forced to maintain old data  
 structures just for the purpose of loading up obsolete data and throwing  
 it away.

 How do you fix it?  Very simple really.  Just store the file as a series  
 of chunks with fixed length headers, and each header contains the length  
 of the data in that chunk.  If you get a chunk header with a tag you  
 don't understand, just ignore it.  A particular chunk can have  
 sub-chunks too.  I think it's similar in many ways to a grammar  
 definition:

    file:
      header chunklist

    chunklist:
      chunk
      chunk chunklist

    header:
      typeIndicator versionNumber DataEndianness

    chunk:
      chunkHeader data

    chunkHeader:
      chunkType DataLength

    data:
      // Here's where you list all the types of data known to you

 Or something like that.
 I'd like a library that helps me read and write my data in that sort of  
 data-structure independent format.

 --bb

Take a look at the HDF file format that is used to serialize huge amounts  
of scientific data.
It implements a format that is very similar to the one you described.

http://www.hdfgroup.org/

Paulo



Jan 06 2007
