digitalmars.D.announce - serialization library
- Christian Kamm (68/68) Nov 08 2006 Based on initial work from Tom S and clayasaurus, I've written this
- Walter Bright (3/5) Nov 08 2006 Great! Can you do some of the suggestions on
- Christian Kamm (2/4) Nov 09 2006 Sure, I wanted to wait until I got some responses and feedback from the ...
- Bill Baxter (50/84) Nov 08 2006 I'm using Boost::serialization but I'm not at all happy with it. But
- Walter Bright (14/19) Nov 08 2006 Evolution of a file format:
- Bill Baxter (8/33) Nov 08 2006 I guess I'm no exception. ;-) I've been through the 4 step program a
- Georg Wrede (13/51) Nov 10 2006 Heh, something Microsoft is only now trying to learn. And Unix guys knew...
- Bill Baxter (15/29) Nov 10 2006 Having all the structure in ASCII is great, and maybe everything in
- Georg Wrede (20/45) Nov 11 2006 Probably just as well, since then a normal parser could not handle it.
- Sean Kelly (3/28) Nov 11 2006 Base64 encoding :-p
- Robert Ramey (5/5) Dec 01 2009 boost serialization has an object wrapper for binary data - called (surp...
- Christian Kamm (10/14) Nov 09 2006 Oh, indeed. It does not take care of them yet. Additionally, classes are...
- Christian Kamm (26/32) Nov 09 2006 Check out
- Fredrik Olsson (8/38) Nov 09 2006 What you want is a lib for reading and writing EA IFF-85 compatible file...
- Bill Baxter (26/34) Nov 09 2006 I've never heard of EA IFF-85, but a brief skim of the description here:
- Fredrik Olsson (26/73) Nov 10 2006 No it is not limited to any specific kind of files. EA made it as an
- Paulo Herrera (9/90) Jan 06 2007 Take a look at the HDF file format that is used to serialize huge amount...
Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!

http://www.math.tu-berlin.de/~kamm/d/serialization.zip

Currently, it only provides binary file io through the Serializer class. It can
- write/read almost (hopefully) every type through a call to Serializer.describe
- track class references and pointers by default
- serialize classes and structs through a templated 'describe' member function
- write derived classes from base class reference*
- read derived classes into base class reference*
- serialize classes that are not default constructible*
(* for this to work, the class needs to be registered with the archive type)

It has far fewer features than boost::serialization but is already in a very usable state: FreeUniverse, a D game based on the Arc library, uses it for writing and loading savegames as well as other persistent state information.

What it does not do/is missing:
- exception safety / multithread safety
- out-of-class/struct serialization methods (is it possible to check whether a specific overload exists at compile time?)
- static arrays need to be serialized with describe_staticarray (static arrays can't be inout, so the general-purpose template method doesn't work... is there a way around the problem?)
- things I forgot right now

Documentation is still rather sparse. This short example shows the basic usage:
---
struct Foo
{
    int a = 3;

    // the same member function is used for writing and for reading
    void describe(T)(T archive)
    {
        archive.describe(a);
    }
}

void main()
{
    real bar = 3.141;
    Foo foo;

    // write data
    Serializer s = new Serializer("testfile", FileMode.Out);
    s.describe(bar);
    s.describe(foo);
    delete s;

    // read data
    s = new Serializer("testfile", FileMode.In);
    s.describe(bar);
    s.describe(foo);
}
---
See the unittests in serializer.d for other details. Most of the logic is in basicarchive.d. Docs definitely need work.

Since FreeUniverse was its first real user, it is currently maintained in the FreeUniverse svn. However, if other people are interested, I will request a separate project for it on dsource.

Comments and improvements are of course welcome.

Best Regards,
Christian Kamm
Nov 08 2006
Christian Kamm wrote:
> Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!

Great! Can you do some of the suggestions on http://www.digitalmars.com/d/howto-promote.html?
Nov 08 2006
> Great! Can you do some of the suggestions on http://www.digitalmars.com/d/howto-promote.html?

Sure, I wanted to wait until I got some responses and feedback from the community though.
Nov 09 2006
Christian Kamm wrote:
> Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!

Great!

> http://www.math.tu-berlin.de/~kamm/d/serialization.zip
>
> Currently, it only provides binary file io through the Serializer class. It can
> - write/read almost (hopefully) every type through a call to Serializer.describe
> - track class references and pointers by default
> - serialize classes and structs through a templated 'describe' member function
> - write derived classes from base class reference*
> - read derived classes into base class reference*
> - serialize not default constructible classes*
> (* for this to work, the class needs to be registered with the archive type)
>
> It has far less features than boost::serialization but is already in a very usable state: FreeUniverse, a D game based on the Arc library, uses it for writing and loading savegames as well as other persistent state information.

I'm using Boost::serialization but I'm not at all happy with it. But the things that I don't like mostly have to do with versioning, which it looks like you don't support anyway.

> What it does not do/is missing:
> - exception safety / multithread safety
> - out-of-class/struct serialization methods (is it possible to check whether a specific overload exists at compile time?)

I could be mistaken but I think this is that ADL / Koenig Lookup territory that Walter doesn't want to go into.

> - static arrays need to be serialized with describe_staticarray (static arrays can't be inout, so the general-purpose template method doesn't work... is there a way around the problem?)
> - things I forgot right now

Endian issues?

> Documentation is still rather sparse. This short example shows the basic usage

Just a wish list item, but I'd prefer an actual "file format" library as opposed to a serialization library.

Maybe a file format library would build on top of the serialization library, but anyway, the key difference is that a serialization lib aims to turn *particular* data structures into a binary format that can be losslessly loaded back into the same data structure later. But that is not the way people design generic file formats, like say the Photoshop file format. Things like that need to be very extensible and shouldn't be tied to particular data structures.

I think that's where boost::serialization gets into trouble. Once you start talking about versioning, you're no longer talking about one specific data structure. For instance Boost::serialization lacks a way to ignore blocks or skip chunks of data that are not recognized or obsolete. You actually have to load the obsolete thing into the proper (possibly obsolete) data structure and then delete the unnecessary thing you just created. This is not good from the forwards/backwards compatibility view. Old code simply cannot read the file (even if it understands the majority of the chunks that matter), and new code is forced to maintain old data structures just for the purpose of loading up obsolete data and throwing it away.

How do you fix it? Very simple really. Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk. If you get a chunk header with a tag you don't understand, just ignore it. A particular chunk can have sub-chunks too. I think it's similar in many ways to a grammar definition:

file:
    header chunklist
chunklist:
    chunk
    chunk chunklist
header:
    typeIndicator versionNumber DataEndianness
chunk:
    chunkHeader data
chunkHeader:
    chunkType DataLength
data:
    // Here's where you list all the types of data known to you

Or something like that. I'd like a library that helps me read and write my data in that sort of data-structure independent format.

--bb
Nov 08 2006
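The chunk layout Bill Baxter describes above (tag, length, payload, and "skip what you don't understand") can be sketched in D roughly as follows. This is only an illustration under stated assumptions: the four-character tags "NAME", "FUTR" and "SIZE", the little-endian 32-bit length field, and the helper names writeChunk and readKnownChunks are all made up for this example and are not part of any library mentioned in the thread. The file header and sub-chunks from the grammar are left out to keep it short.

```d
import std.algorithm.searching : canFind;
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

/// Appends one chunk: 4-byte tag, 4-byte little-endian length, then the payload.
void writeChunk(ref ubyte[] file, string tag, const(ubyte)[] payload)
{
    assert(tag.length == 4, "tags are exactly four characters in this sketch");
    file ~= cast(const(ubyte)[]) tag;
    ubyte[4] len = nativeToLittleEndian(cast(uint) payload.length);
    file ~= len[];
    file ~= payload;
}

/// Walks the chunk list and returns the payloads of the tags the caller knows.
/// Unknown tags are skipped by jumping over `len` bytes, so an old reader
/// survives chunks written by a newer program.
const(ubyte)[][string] readKnownChunks(const(ubyte)[] file, const(string)[] known)
{
    const(ubyte)[][string] found;
    size_t pos = 0;
    while (pos + 8 <= file.length)
    {
        string tag = cast(string) file[pos .. pos + 4].idup;
        ubyte[4] rawLen = file[pos + 4 .. pos + 8];
        const len = littleEndianToNative!uint(rawLen);
        pos += 8;
        if (pos + len > file.length)
            break; // truncated chunk: stop rather than read past the end
        if (known.canFind(tag))
            found[tag] = file[pos .. pos + len];
        pos += len; // advance whether or not the tag was understood
    }
    return found;
}

void main()
{
    ubyte[] file;
    writeChunk(file, "NAME", cast(const(ubyte)[]) "foo");
    writeChunk(file, "FUTR", [1, 2, 3, 4, 5]); // a chunk this reader does not know
    writeChunk(file, "SIZE", [42]);

    auto chunks = readKnownChunks(file, ["NAME", "SIZE"]);
    assert(cast(string) chunks["NAME"] == "foo");
    assert(chunks["SIZE"].length == 1 && chunks["SIZE"][0] == 42);
    assert("FUTR" !in chunks);
}
```

Because the length sits in a fixed-size header, the reader never needs to know the layout of a payload it is going to ignore, which is the property the post asks for.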
Bill Baxter wrote:
> How do you fix it? Very simple really. Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk. If you get a chunk header with a tag you don't understand, just ignore it. A particular chunk can have sub-chunks too.

Evolution of a file format:

1.0: Just spew the struct contents out into a file using something like fwrite().

2.0: Oops! Need to update 1.0 and retain backwards compatibility. Solution: 2.0 files put out 'illegal' values into the 1.0 format to signal it's a 2.0 file.

3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter and have another field with a version number in it.

4.0: Get smart and implement your suggestion, so you can have both backwards and *forwards* compatibility.

Think I'm joking? Just look at a few! Everyone learns this the hard way.

Me, if practical, I like file formats to be in ascii so I can examine them easily to see if they're working right.
Nov 08 2006
Walter Bright wrote:
> Bill Baxter wrote:
>> How do you fix it? Very simple really. Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk. If you get a chunk header with a tag you don't understand, just ignore it. A particular chunk can have sub-chunks too.
>
> Evolution of a file format:
> 1.0: Just spew the struct contents out into a file using something like fwrite().
> 2.0: Oops! Need to update 1.0 and retain backwards compatibility. Solution: 2.0 files put out 'illegal' values into the 1.0 format to signal it's a 2.0 file.
> 3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter and have another field with a version number in it.
> 4.0: Get smart and implement your suggestion, so you can have both backwards and *forwards* compatibility.
> Think I'm joking? Just look at a few! Everyone learns this the hard way.

I guess I'm no exception. ;-) I've been through the 4 step program a few times myself.

> Me, if practical, I like file formats to be in ascii so I can examine them easily to see if they're working right.

That is one thing I do like about boost::serialization. With basically one line of code I can switch between xml serialization and binary serialization. Only thing I didn't like was I couldn't figure out how to keep some things binary.

--bb
Nov 08 2006
Bill Baxter wrote:
> Walter Bright wrote:
>> <snip>
>> Think I'm joking? Just look at a few! Everyone learns this the hard way.
> I guess I'm no exception. ;-) I've been through the 4 step program a few times myself.
>> Me, if practical, I like file formats to be in ascii so I can examine them easily to see if they're working right.

Heh, something Microsoft is only now trying to learn. And Unix guys knew right from the start. Even most of the communications protocols are in text.

> That is one thing I do like about boost::serialization. With basically one line of code I can switch between xml serialization and binary serialization. Only thing I didn't like was I couldn't figure out how to keep some things binary.

With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.

And of course you might look at a few languages ( ~ fileformats) especially made for serializing.

YAML looks very clean, and is easily readable by humans (XML is not)
JSON looks like ECMA script

The following page, although only vaguely related, gives an excellent intro to the ideology, at the center: http://mike.teczno.com/json.html

With these, you'll be right where Walter was talking about.
Nov 10 2006
Georg Wrede wrote:
> Bill Baxter wrote:
>> That is one thing I do like about boost::serialization. With basically one line of code I can switch between xml serialization and binary serialization. Only thing I didn't like was I couldn't figure out how to keep some things binary.
> With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.

Having all the structure in ASCII is great, and maybe everything in ASCII while you're debugging, but some things just don't work well as ascii -- images, videos, audio files, 3D meshes, etc. It makes sense to have the structure annotated in ascii, but when it comes to storing raw image data there's not much to be gained from storing that as a giant ASCII string.

With the boost::serialization's XML I wanted to be able to store that image as something like

<image width=1024 height=768 format=RGBA type="float">
[big hunk o raw binary image data]
</image>

But I couldn't find any way to do that.

> And of course you might look at a few languages ( ~ fileformats) especially made for serializing.
> YAML looks very clean, and is easily readable by humans (XML is not)

I took a look at that one before. I agree that it would be nice if a more human-friendly alternative to XML caught on.

--bb
Nov 10 2006
Bill Baxter wrote:
> Georg Wrede wrote:
>> With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.
> Having all the structure in ASCII is great, and maybe everything in ASCII while you're debugging, but some things just don't work well as ascii -- images, videos, audio files, 3D meshes, etc. It makes sense to have the structure annotated in ascii, but when it comes to storing raw image data there's not much to be gained from storing that as a giant ASCII string. With the boost::serialization's XML I wanted to be able to store that image as something like
> <image width=1024 height=768 format=RGBA type="float">
> [big hunk o raw binary image data]
> </image>
> But I couldn't find any way to do that.

Probably just as well, since then a normal parser could not handle it. Or if you didn't care, you could invent your own tag type, like

<image width=1024 height=768 format=RGBA type="float">
<binarydata>Ö¤*^%ÄÖ</binarydata>
</image>

And, as you can see, you'd immediately lose the gains of having the thing as a text file because it gets unwieldy in a text viewer.

You could always serialize into a subdirectory and save the binaries (pictures, etc) as separate files there. Then the XML would only contain their names. (Not my invention. Java uses this, OpenOffice, and others.) To save space you then zip the whole thing, thus getting your single serialization file, as originally wanted.

---

This can be very simple in the program, I remember seeing somewhere a library that made a zip file on disk look to the program like a subdirectory tree. Thus the zipping, creation of directories and other chores become transparent to the programmer.

Or you could just use the Phobos zip without a tree.
Nov 11 2006
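Georg Wrede's suggestion above (a text manifest that names separately stored binaries, all zipped into one file) might look something like this with Phobos' std.zip. It is a rough sketch under assumptions, not a recommended layout: the member names "scene.xml" and "images/0.raw", the XML attributes, and the helper name pack are invented for illustration.

```d
import std.zip : ArchiveMember, CompressionMethod, ZipArchive;

/// Packs a text manifest and one raw binary blob into a single zip image in memory.
ubyte[] pack(string manifest, ubyte[] pixels)
{
    auto zip = new ZipArchive();

    auto text = new ArchiveMember();
    text.name = "scene.xml"; // hypothetical member name
    text.expandedData = cast(ubyte[]) manifest.dup;
    text.compressionMethod = CompressionMethod.deflate;
    zip.addMember(text);

    auto blob = new ArchiveMember();
    blob.name = "images/0.raw"; // the manifest refers to the blob by this name
    blob.expandedData = pixels;
    blob.compressionMethod = CompressionMethod.deflate;
    zip.addMember(blob);

    return cast(ubyte[]) zip.build();
}

void main()
{
    ubyte[] pixels = [0, 255, 16, 32];
    auto bytes = pack(`<image width="2" height="2" src="images/0.raw"/>`, pixels);

    // Reading back: look the member up by the name stored in the manifest.
    auto zip = new ZipArchive(bytes);
    auto member = zip.directory["images/0.raw"];
    assert(zip.expand(member) == pixels);
}
```

The manifest stays readable in any text viewer once unzipped, while the binary payload is stored untouched.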
Bill Baxter wrote:
> Georg Wrede wrote:
>> With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.
> Having all the structure in ASCII is great, and maybe everything in ASCII while you're debugging, but some things just don't work well as ascii -- images, videos, audio files, 3D meshes, etc. It makes sense to have the structure annotated in ascii, but when it comes to storing raw image data there's not much to be gained from storing that as a giant ASCII string. With the boost::serialization's XML I wanted to be able to store that image as something like
> <image width=1024 height=768 format=RGBA type="float">
> [big hunk o raw binary image data]
> </image>
> But I couldn't find any way to do that.

Base64 encoding :-p

Sean
Nov 11 2006
boost serialization has an object wrapper for binary data, called (surprise) binary_object. On text based formats, it uses base64 encoding. It's in the documentation, and there is also a specific test which shows how to use it. And it's extremely easy to use.

Robert Ramey
Dec 01 2009
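The base64 approach that Sean Kelly and Robert Ramey mention above can be tried out in D with Phobos' std.base64. The sketch below is only meant to show the round trip; the element name and attributes are taken loosely from the earlier image example, and the four-byte payload is made up.

```d
import std.base64 : Base64;
import std.format : format;

void main()
{
    ubyte[] pixels = [0x00, 0xFF, 0x10, 0x20];

    // Encode the raw bytes so they can sit inside a text element.
    const(char)[] encoded = Base64.encode(pixels);
    string xml = format(`<image width="2" height="2">%s</image>`, encoded);
    assert(xml == `<image width="2" height="2">AP8QIA==</image>`);

    // Decoding gives back exactly the original bytes.
    ubyte[] decoded = Base64.decode(encoded);
    assert(decoded == pixels);
}
```

The price is size: base64 emits four characters for every three input bytes, so a large image grows by about a third compared with storing it raw.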
>> What it does not do/is missing:
> Endian issues?

Oh, indeed. It does not take care of them yet. Additionally, classes are (if unregistered) identified by their mangled name, which might vary between compilers, I think.

> Just a wish list item, but I'd prefer an actual "file format" library as opposed to a serialization library. ...

I agree that it is not very well suited for writing/reading user data or files with a long life-expectancy. It is very nice for temporarily swapping data to disk and similar tasks, where the same process reads back the data it wrote to a file earlier.

A full-fledged "file format" library, while being something very useful I'd love to see as well, would be a project for another day though.

Christian
Nov 09 2006
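On the endian question raised above: one common way to sidestep it is to pick a byte order for the file and convert on every read and write. The following D sketch uses std.bitmanip for that. The helper names putUint and getUint and the choice of little-endian are assumptions for illustration; this is not how the serialization library in this thread works.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

/// Writes a 32-bit value in a fixed (little-endian) byte order.
void putUint(ref ubyte[] buf, uint value)
{
    ubyte[4] raw = nativeToLittleEndian(value);
    buf ~= raw[];
}

/// Reads a 32-bit little-endian value at `pos` and advances `pos` past it.
uint getUint(const(ubyte)[] buf, ref size_t pos)
{
    ubyte[4] raw = buf[pos .. pos + 4];
    pos += 4;
    return littleEndianToNative!uint(raw);
}

void main()
{
    ubyte[] buf;
    putUint(buf, 0x11223344);
    // The same bytes are produced on every host, whatever its native byte order.
    assert(buf == [0x44, 0x33, 0x22, 0x11]);

    size_t pos = 0;
    assert(getUint(buf, pos) == 0x11223344);
    assert(pos == 4);
}
```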
> How do you fix it? Very simple really. Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk. If you get a chunk header with a tag you don't understand, just ignore it. A particular chunk can have sub-chunks too. I think it's similar in many ways to a grammar definition:

Check out http://www.math.tu-berlin.de/~kamm/d/serializationchunk.zip

The chunk.d contains a hackish implementation of your chunk idea: when reading, it discards any chunk-parts it doesn't understand. Once it got to one it can process, it discards any other older-versioned chunks of the same type. When writing, it is possible to write legacy chunks for older versions.

To test it, you need to compile with -version=V1_SER for the 1.0 version of the program and with -version=V1_SER -version=V2_SER for the 2.0 version of the program. Try running the v2 version, copy data_out to data_in and run the v1 version. (sorry for the complicated instructions, it's just a hack!)

Is this, approximately, what you had in mind? Personally, I'm not sure about all those classes required and how it would look in a larger project: maybe writing a version number and then having the user write a switch statement for it would have been ok too.

Christian
Nov 09 2006
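The simpler alternative Christian Kamm mentions at the end of the post above (store a version number, let the user switch on it) could be sketched in D like this. Everything here is hypothetical: the Player struct, its fields, the version numbers, and the save/load helpers exist only for this example and are unrelated to chunk.d.

```d
import std.array : appender;
import std.bitmanip : append, read;

struct Player
{
    int health;
    int armor; // added in format version 2
}

enum ushort currentVersion = 2;

/// Writes the version number first, then the fields of the current layout.
ubyte[] save(Player p)
{
    auto buf = appender!(ubyte[])();
    buf.append!ushort(currentVersion);
    buf.append!int(p.health);
    buf.append!int(p.armor);
    return buf.data;
}

/// Reads the version number and picks the matching layout in a switch.
Player load(ubyte[] buf)
{
    Player p;
    const ver = buf.read!ushort();
    switch (ver)
    {
    case 1:
        p.health = buf.read!int();
        p.armor = 0; // field did not exist yet, so pick a default
        break;
    case 2:
        p.health = buf.read!int();
        p.armor = buf.read!int();
        break;
    default:
        throw new Exception("unsupported format version");
    }
    return p;
}

void main()
{
    auto p = load(save(Player(80, 25)));
    assert(p.health == 80 && p.armor == 25);

    // Data as a hypothetical 1.0 program would have written it: version, then health only.
    auto old = appender!(ubyte[])();
    old.append!ushort(1);
    old.append!int(55);
    auto q = load(old.data);
    assert(q.health == 55 && q.armor == 0);
}
```

This gives backwards compatibility only: a 1.0 reader handed version 2 data has no way to skip what it does not understand, which is the gap the chunk approach is meant to close.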
Bill Baxter wrote:
<snip>
> How do you fix it? Very simple really. Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk. If you get a chunk header with a tag you don't understand, just ignore it. A particular chunk can have sub-chunks too. I think it's similar in many ways to a grammar definition:
>
> file:
>     header chunklist
> chunklist:
>     chunk
>     chunk chunklist
> header:
>     typeIndicator versionNumber DataEndianness
> chunk:
>     chunkHeader data
> chunkHeader:
>     chunkType DataLength
> data:
>     // Here's where you list all the types of data known to you
>
> Or something like that. I'd like a library that helps me read and write my data in that sort of data-structure independent format.

What you want is a lib for reading and writing EA IFF-85 compatible files?

I actually have some code for D doing this around somewhere. Written to be able to read IFF graphics files and Lightwave 3D objects.

I shall dig up the code, clean it up in a presentable shape, and make it public.

// Fredrik Olsson

> --bb
Nov 09 2006
Fredrik Olsson wrote:
> Bill Baxter wrote:
> <snip>
> What you want is a lib for reading and writing EA IFF-85 compatible files?

I've never heard of EA IFF-85, but a brief skim of the description here: http://www.newtek.com/lightwave/developer/LW80/8lwsdk/docs/filefmts/eaiff85.html sounds good. Is it something 3D-graphics specific though? Electronic Arts created the standard, and the website I found above is for a 3D modeling package... But it looks right.

But the truth is I don't know what I want exactly in terms of API. I just want something that makes it easy to take my data structures -> extract the data into something generic and ESPECIALLY not intrinsically tied to the types in my program -> save it to disk -> load it back into whatever data structures I choose later. It's ok if it's a little more painful than

MyData.serialize(archive);
MyData.load(archive);

as long as it achieves the goal.

With Boost::serialization I've ended up having to write upgrader programs a few times over the course of development. It's always a pain because boost::serialization wants to be smart, so what I end up doing is taking an old version of my data structures header file, wrapping it in an "oldversion" namespace, then loading via the oldversion::type and saving via the newversion::type.

> I actually have some code for D doing this around somewhere. Written to be able to read IFF graphics files and Lightwave 3D objects.

Oh, right, lightwave. So I guess it's not a coincidence that googling for EA IFF-85 turned up NewTek's page.

> I shall dig up the code, clean it up in a presentable shape, and make it public.

Cool, can you explain how the API works a little? I guess I can imagine that loading such a file is not so different from loading an XML file. So like XML parsing there are a few ways to do it.

--bb
Nov 09 2006
Bill Baxter wrote:
> Fredrik Olsson wrote:
>> Bill Baxter wrote:
>> <snip>
>> What you want is a lib for reading and writing EA IFF-85 compatible files?
> I've never heard of EA IFF-85, but a brief skim of the description here: http://www.newtek.com/lightwave/developer/LW80/8lwsdk/docs/filefmts/eaiff85.html sounds good. Is it something 3D-graphics specific though? Electronic Arts created the standard, and the website I found above is for a 3D modeling package... But it looks right.

No it is not limited to any specific kind of files. EA made it as an attempt to make a general file format structure for any use. Basically it just takes care of bundling chunks, marking them as required or optional. And byte-order-independence!

IFF as such has been used for images as in IFF, audio AIFF, 3D objects OBJ, and lots more. When Microsoft created BMP and WAV they more or less ripped the EA IFF rationale, but changed the required byte order.

EA IFF is a low level format. How to actually interpret the data that is contained is up to each application.

> But the truth is I don't know what I want exactly in terms of API. I just want something that makes it easy to take my data structures -> extract the data into something generic and ESPECIALLY not intrinsically tied to the types in my program -> save it to disk -> load it back into whatever data structures I choose later. It's ok if it's a little more painful than MyData.serialize(archive); MyData.load(archive); as long as it achieves the goal. With Boost::serialization I've ended up having to write upgrader programs a few times over the course of development. It's always a pain because boost::serialization wants to be smart so what I end up doing is taking an old version of my data structures header file, wrapping it in an "oldversion" namespace then load via the oldversion::type, and save via the newversion::type.
>> I actually have some code for D doing this around somewhere. Written to be able to read IFF graphics files and Lightwave 3D objects.
> Oh, right, lightwave. So I guess it's not a coincidence google for EA IFF-85 turned up NewTek's page.

EA IFF 85 was a joint venture of Electronic Arts and Commodore, for creating a universal file format for the Amiga. Lightwave 3D is an old Amiga application, so them using IFF as a base is kind of natural.

>> I shall dig up the code, clean it up in a presentable shape, and make it public.
> Cool, can you explain how the API works a little? I guess I can imagine that loading such a file is not so different from loading an XML file. So like XML parsing there are a few ways to do it.
> --bb

IFF is not quite as flexible as XML, much more flat. So the API is very simple. My current implementation wraps over a Stream instance, and implements simple methods such as:

foo.seekNextChunk("CTAB");
auto bar = foo.readInt();

Etc, just for working with the basics, as defined by EA IFF 85. NewTek, the creators of Lightwave 3D, have made some additions that would be nice to have as well.

But it was one of my first attempts at D, so I will rewrite it. Just how is something I shall think about. Having the chunks as independent instances over a seekable stream might be a good idea.

// Fredrik Olsson
Nov 10 2006
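Fredrik Olsson's code was not posted in this thread, so the following D sketch is only a guess at what an API with his two method names (seekNextChunk and readInt) might do. It works on an in-memory byte array instead of a Stream, handles only a flat list of chunks (no enclosing FORM), and the ChunkReader name, its fields and the test data are all assumptions. The parts taken from EA IFF 85 itself are the four-character chunk ID, the big-endian 32-bit length, and the pad byte after an odd-length payload.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;

/// Hypothetical reader over an in-memory, flat list of IFF-style chunks.
struct ChunkReader
{
    const(ubyte)[] data;
    size_t pos;      // read cursor inside the current chunk
    size_t chunkEnd; // first byte after the current chunk's payload

    /// Advances to the next chunk with the given ID; returns false if none is left.
    bool seekNextChunk(string id)
    {
        // Skip the rest of the current chunk, plus the pad byte after an odd length.
        size_t next = chunkEnd + (chunkEnd & 1);
        while (next + 8 <= data.length)
        {
            const tag = cast(const(char)[]) data[next .. next + 4];
            ubyte[4] raw = data[next + 4 .. next + 8];
            const len = bigEndianToNative!uint(raw);
            pos = next + 8;
            chunkEnd = pos + len;
            if (tag == id)
                return chunkEnd <= data.length;
            next = chunkEnd + (chunkEnd & 1);
        }
        return false;
    }

    /// Reads a big-endian 32-bit integer from the current chunk.
    int readInt()
    {
        assert(pos + 4 <= chunkEnd, "read past the end of the chunk");
        ubyte[4] raw = data[pos .. pos + 4];
        pos += 4;
        return bigEndianToNative!int(raw);
    }
}

void main()
{
    ubyte[] file;

    // Test helper: ID, big-endian length, payload, pad byte if the length is odd.
    void put(string id, const(ubyte)[] payload)
    {
        file ~= cast(const(ubyte)[]) id;
        ubyte[4] len = nativeToBigEndian(cast(uint) payload.length);
        file ~= len[];
        file ~= payload;
        if (payload.length & 1)
            file ~= cast(ubyte) 0;
    }

    put("NAME", [1, 2, 3]);    // odd length, so one pad byte follows
    put("CTAB", [0, 0, 1, 0]); // big-endian 256

    auto foo = ChunkReader(file);
    assert(foo.seekNextChunk("CTAB")); // skips the NAME chunk it was not asked for
    auto bar = foo.readInt();
    assert(bar == 256);
    assert(!foo.seekNextChunk("CTAB"));
}
```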
On Thu, 09 Nov 2006 02:06:21 +0100, Bill Baxter <dnewsgroup billbaxter.com> wrote:
> Christian Kamm wrote:
>> <snip>
> <snip>
> How do you fix it? Very simple really. Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk. If you get a chunk header with a tag you don't understand, just ignore it. A particular chunk can have sub-chunks too.
> <snip>
> Or something like that. I'd like a library that helps me read and write my data in that sort of data-structure independent format.
> --bb

Take a look at the HDF file format that is used to serialize huge amounts of scientific data. It implements a format that is very similar to the one you described.

http://www.hdfgroup.org/

Paulo
Jan 06 2007