digitalmars.D - Request for pre-review: std.serialization/orange
- Jacob Carlborg (38/38) Sep 29 2011 I would like to have some form of pre-review of my serialization library...
- Tobias Pankrath (7/10) Sep 29 2011 There should be something in Phobos. This should involve two steps.
- Jacob Carlborg (6/16) Sep 29 2011 My unit test framework does two things, provides a some form of context
- Jonathan M Davis (14/21) Sep 29 2011 I very much doubt that it would be accepted. assertPred failed to make i...
- Jacob Carlborg (19/40) Sep 30 2011 That's why I'm asking. BTW, if we had a proper unit test framework it
- Jonathan M Davis (17/68) Sep 30 2011 That can be done easily enough with a version block with a version speci...
- Jacob Carlborg (9/20) Sep 30 2011 Ok, I see.
- Robert Jacques (12/14) Sep 30 2011 [snip]
- Jacob Carlborg (17/42) Sep 30 2011 I don't know why but I thought Type.stringof == typeid(Type).toString
- Robert Jacques (16/38) Sep 30 2011 I agree, which is why I suggested lookup should have some granuality. i....
- Jacob Carlborg (13/34) Oct 01 2011 Aah, now I get it. That's a good idea. The question is what to name the
- Robert Jacques (4/35) Oct 02 2011 How about overrideSerializer or overloadSerializer?
- Jacob Carlborg (10/23) Oct 03 2011 registerSerializer for the static method and
- Robert Jacques (2/18) Oct 03 2011 Yes. Sorry for being unclear. The concept being that at the instance lev...
- Jacob Carlborg (4/25) Oct 03 2011 Yes, thanks, that makes sense.
- Robert Jacques (17/19) Sep 30 2011 [snip]
- Jacob Carlborg (25/57) Oct 01 2011 I was hoping the Archive interface and the Base abstract class would be
- Robert Jacques (28/49) Oct 01 2011 For the pre-review its okay. But you'll need it for the actual review.
- Jacob Carlborg (22/78) Oct 02 2011 Ok, it will be the same documentation as for Archive and Base. Ddoc
- Robert Jacques (7/76) Oct 03 2011 So, in essence, you are saying that by the time archiving occurs, isSlic...
- Jacob Carlborg (42/70) Oct 03 2011 No, I'm not saying that. Example:
- Robert Jacques (57/126) Oct 03 2011 Regarding design, I agree, although I'd go one further and define Array ...
- Jacob Carlborg (26/98) Oct 04 2011 Any suggestion how to fix this, how to properly detect if an array is a
- Robert Jacques (16/55) Oct 04 2011 Well, there are two problems. First, there is an API issue: When you pas...
- Jacob Carlborg (8/36) Oct 04 2011 Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not
- Robert Jacques (2/19) Oct 04 2011 Appending has never mandated reallocation. Maybe you're confusing it wit...
- Jacob Carlborg (4/27) Oct 04 2011 Yeah, maybe you're right.
I would like to have some form of pre-review of my serialization library Orange for later inclusion in Phobos as std.serialization (or similar). The reason for why I would like to have a pre-review is that Orange, in its current state, supports both D1/Tango and D2/Phobos. I don't want to spend a lot of time in interacting Orange with Phobos if it have no chance of getting accepted into Phobos. I'm hoping with can do like a regular review with the assumption that if it gets accepted into Phobos I will remove all D1/Tango specific code, integrate it properly with Phobos and do a second regular review. A couple of notes for the review: * The most important packages are: orange.serialization and orange.serialization.archives * I will not accept having just one module for this library. I'm hoping to have a package called "std.serialization" or similar. The question is if there needs to be a (sub)package for the archives as well. * For the unit tests I've used my own kind of micro unit test framework (that is included). Is that something we want to have in general in Phobos so other modules can take advantage of that? Or should I just rip out the framework? * The unit tests are located in its own package, I'm not very happy about putting the unit tests in the same module as the rest of the code, i.e. the serialization module * I'm using some utility functions located in the "util" and "core" packages, what should we do about those, where to put them? * I'm not using all of the functions/modules in the above mentioned utility packages For usage examples, see the project page: http://dsource.org/projects/orange/wiki/Tutorials For more extended usage examples, see the unit tests: https://github.com/jacob-carlborg/orange/tree/master/tests Sources: https://github.com/jacob-carlborg/orange Project page: http://dsource.org/projects/orange/ Documentation: http://dl.dropbox.com/u/18386187/orange_docs/orange.serialization.Serializer.html (Don't forget clicking the "Package" tab in the top corner to see the documentation for the rest of the modules) -- /Jacob Carlborg
Sep 29 2011
* For the unit tests I've used my own kind of micro unit test framework (that is included). Is that something we want to have in general in Phobos so other modules can take advantage of that?There should be something in Phobos. This should involve two steps. First define a protocol, so that multiple unittest libraries can be used in one project (i.e a library uses A and another B). Than write a library on top of this. My suggestion for a protocol would be something simple as: 1. There should only be one approach to unittest used per module 2. If you replace the test runner, call the old one first.
Sep 29 2011
On 2011-09-29 21:42, Tobias Pankrath wrote:My unit test framework does two things, provides a some form of context for the tests; and it catches all assert exceptions to let other tests continue. It also gives a quite nice report of what tests failed. -- /Jacob Carlborg* For the unit tests I've used my own kind of micro unit test framework (that is included). Is that something we want to have in general in Phobos so other modules can take advantage of that?There should be something in Phobos. This should involve two steps. First define a protocol, so that multiple unittest libraries can be used in one project (i.e a library uses A and another B). Than write a library on top of this. My suggestion for a protocol would be something simple as: 1. There should only be one approach to unittest used per module 2. If you replace the test runner, call the old one first.
Sep 29 2011
On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:* For the unit tests I've used my own kind of micro unit test framework (that is included). Is that something we want to have in general in Phobos so other modules can take advantage of that? Or should I just rip out the framework?I very much doubt that it would be accepted. assertPred failed to make it in spite of the various benefits that it provided, and a number of folks seem to be against more complicated unit testing features making it into Phobos. Not to mention, there's already some discussion of the unit tests taking too long. We aren't going to want anything that adds yet more overhead.* The unit tests are located in its own package, I'm not very happy about putting the unit tests in the same module as the rest of the code, i.e. the serialization moduleWell, that's the way that Phobos does it, and it's essentially how D's unit tests are intended to work (though obviously, you can work around that if you really want to). Phobos' makefiles are set up with the idea that each module's unit tests are included in that module. I'm not sure that it's guaranteed that we wouldn't allow the unit tests to be separate, but that's not how any of the other Phobos unit tests works. And personally, I find that the unit tests are far more maintainable when they're right next to the functions that they test. - Jonathan M Davis
Sep 29 2011
On 2011-09-30 08:39, Jonathan M Davis wrote:On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:That's why I'm asking. BTW, if we had a proper unit test framework it could be possible to run a smaller set of unit tests to keep the running time down to a minimum (not something my framework can do).* For the unit tests I've used my own kind of micro unit test framework (that is included). Is that something we want to have in general in Phobos so other modules can take advantage of that? Or should I just rip out the framework?I very much doubt that it would be accepted. assertPred failed to make it in spite of the various benefits that it provided, and a number of folks seem to be against more complicated unit testing features making it into Phobos. Not to mention, there's already some discussion of the unit tests taking too long. We aren't going to want anything that adds yet more overhead.The thing is that just to do a simple test, like serializing an int, it requires some extra code. It's not like calling a single function. It requires: * Something to serialize * A serializer * An archive * Actually doing the serialization * Performing several checks on the serialized data BTW I don't think it scales to have the unit tests in the same module if you want to perform extensive unit tests or if it's a complicated module. The unit test code should be equally well organized, readable and maintainable as the regular code. I think keeping it in the same module makes that harder. -- /Jacob Carlborg* The unit tests are located in its own package, I'm not very happy about putting the unit tests in the same module as the rest of the code, i.e. the serialization moduleWell, that's the way that Phobos does it, and it's essentially how D's unit tests are intended to work (though obviously, you can work around that if you really want to). Phobos' makefiles are set up with the idea that each module's unit tests are included in that module. I'm not sure that it's guaranteed that we wouldn't allow the unit tests to be separate, but that's not how any of the other Phobos unit tests works. And personally, I find that the unit tests are far more maintainable when they're right next to the functions that they test. - Jonathan M Davis
Sep 30 2011
On Friday, September 30, 2011 16:25:47 Jacob Carlborg wrote:On 2011-09-30 08:39, Jonathan M Davis wrote:That can be done easily enough with a version block with a version specific to running longer or shorter tests. No additional framework is necessary. Don has already brought it up in the Phobos newsgroup, but it's unclear whether we're going to do anything about it. The main problem is the compile time, not the run time, so it's ultimately a compiler issue. Templates and CTFE in particular make the whole thing slower.On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:That's why I'm asking. BTW, if we had a proper unit test framework it could be possible to run a smaller set of unit tests to keep the running time down to a minimum (not something my framework can do).* For the unit tests I've used my own kind of micro unit test framework (that is included). Is that something we want to have in general in Phobos so other modules can take advantage of that? Or should I just rip out the framework?I very much doubt that it would be accepted. assertPred failed to make it in spite of the various benefits that it provided, and a number of folks seem to be against more complicated unit testing features making it into Phobos. Not to mention, there's already some discussion of the unit tests taking too long. We aren't going to want anything that adds yet more overhead.In my experience, it's _much_ easier to have the tests right next to what they're testing. I've found it _very_ annoying when I've had to put unit tests after a templated type, because the unit tests coludn't be _in_ the templated type where they would be instantiated with it. The Interval unit tests in std.datetime are way less manageable because of that. Now, if you have unit testing each function individually doesn't work very well, and if you have to do a bunch of setup for any testing, then the situation is a bit different, but you could probably just stick all of the tests at the end of the file then, where they're then separate from the code. - Jonathan M DavisThe thing is that just to do a simple test, like serializing an int, it requires some extra code. It's not like calling a single function. It requires: * Something to serialize * A serializer * An archive * Actually doing the serialization * Performing several checks on the serialized data BTW I don't think it scales to have the unit tests in the same module if you want to perform extensive unit tests or if it's a complicated module. The unit test code should be equally well organized, readable and maintainable as the regular code. I think keeping it in the same module makes that harder.s* The unit tests are located in its own package, I'm not very happy about putting the unit tests in the same module as the rest of the code, i.e. the serialization moduleWell, that's the way that Phobos does it, and it's essentially how D's unit tests are intended to work (though obviously, you can work around that if you really want to). Phobos' makefiles are set up with the idea that each module's unit tests are included in that module. I'm not sure that it's guaranteed that we wouldn't allow the unit tests to be separate, but that's not how any of the other Phobos unit tests works. And personally, I find that the unit tests are far more maintainable when they're right next to the functions that they test. - Jonathan M Davis
Sep 30 2011
On 2011-09-30 18:09, Jonathan M Davis wrote:That can be done easily enough with a version block with a version specific to running longer or shorter tests. No additional framework is necessary. Don has already brought it up in the Phobos newsgroup, but it's unclear whether we're going to do anything about it. The main problem is the compile time, not the run time, so it's ultimately a compiler issue. Templates and CTFE in particular make the whole thing slower.Ok, I see. Phobos is like Boost, everything is templates, that's what we get. Compiling my unit tests with D1 takes about 0.7 seconds, using D2 it takes about 1.4 seconds.Now, if you have unit testing each function individually doesn't work very well, and if you have to do a bunch of setup for any testing, then the situation is a bit different, but you could probably just stick all of the tests at the end of the file then, where they're then separate from the code. - Jonathan M DavisI had all the unit tests in one module at first, but I thought it was too much to have in one module and now I have several more unit tests. -- /Jacob Carlborg
Sep 30 2011
On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:I would like to have some form of pre-review of my serialization library Orange for later inclusion in Phobos as std.serialization (or similar).[snip] This is a quick note on the API design; I'm planning on doing a deeper review of the code + API later. Re: registerSerializer Type.stringof is not unique and can't be used by your serializer. Furthermore, allowing users to manually set the lookup string is going to be a major source of silent errors / bugs and exposes a large portions of your internals. Doing it in this way prevents you from updating how the back-end looks up types. I'd recommend using: typeid(A).toString internally instead as this is unique and registerSerializer's API to void registerSerializer(Derived,Base)(void delegate(Base) dg) if( is(Derived:Base) ) {} which would be called via: registerSerializer!Foo(dg); The method should also be static: If I'm registering a custom serialization method, I don't want to duplicate that mapping everywhere a serializer is instanced. I don't even want to duplicate it for every type of serializer. I think there needs to be some granularity for this: i.e. instance -> type -> global. Also, repeat the above for registerDeserializer. Re: deserialize some of the example don't look like they're correct.
Sep 30 2011
On 2011-09-30 15:03, Robert Jacques wrote:On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:I don't know why but I thought Type.stringof == typeid(Type).toString for everything that wasn't a class.I would like to have some form of pre-review of my serialization library Orange for later inclusion in Phobos as std.serialization (or similar).[snip] This is a quick note on the API design; I'm planning on doing a deeper review of the code + API later. Re: registerSerializer Type.stringof is not unique and can't be used by your serializer.Furthermore, allowing users to manually set the lookup string is going to be a major source of silent errors / bugs and exposes a large portions of your internals. Doing it in this way prevents you from updating how the back-end looks up types. I'd recommend using: typeid(A).toString internally instead as this is unique and registerSerializer's API to void registerSerializer(Derived,Base)(void delegate(Base) dg) if( is(Derived:Base) ) {} which would be called via: registerSerializer!Foo(dg);Didn't thought of that, thanks.The method should also be static: If I'm registering a custom serialization method, I don't want to duplicate that mapping everywhere a serializer is instanced. I don't even want to duplicate it for every type of serializer. I think there needs to be some granularity for this: i.e. instance -> type -> global."register" is static, "registerSerializer" is not because I'm not entirely sure how I want the API to behave. What if I want to serialize a class in two different places. In one place I want to serialize it by default and in the other I want to do custom serialization? "I don't even want to duplicate it for every type of serializer". I'm not sure I quite understand, there's only one type of serializer.Also, repeat the above for registerDeserializer. Re: deserialize some of the example don't look like they're correct.They look correct to me. Note that two of the three "deserialize" methods should only be called when performing custom deserialization of a class/strut. This method will then, most likely, be called from within the class to manually deserialize the fields. -- /Jacob Carlborg
Sep 30 2011
On Fri, 30 Sep 2011 10:41:48 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-09-30 15:03, Robert Jacques wrote:[snip]On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.The method should also be static: If I'm registering a custom serialization method, I don't want to duplicate that mapping everywhere a serializer is instanced. I don't even want to duplicate it for every type of serializer. I think there needs to be some granularity for this: i.e. instance -> type -> global."register" is static, "registerSerializer" is not because I'm not entirely sure how I want the API to behave. What if I want to serialize a class in two different places. In one place I want to serialize it by default and in the other I want to do custom serialization?"I don't even want to duplicate it for every type of serializer". I'm not sure I quite understand, there's only one type of serializer.I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and somehow thinking that the Serializers would be different for each. (I was thinking that the serializer was templated on the archive for some reason.)Both T deserialize (T)(); T deserialize (T)(string key); have the following example: class Foo { int a; void fromData (Serializer serializer, Serializer.Data key) { a = serializer!(int)("a"); } }Also, repeat the above for registerDeserializer. Re: deserialize some of the example don't look like they're correct.They look correct to me. Note that two of the three "deserialize" methods should only be called when performing custom deserialization of a class/strut. This method will then, most likely, be called from within the class to manually deserialize the fields.
Sep 30 2011
On 2011-10-01 05:00, Robert Jacques wrote:I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and somehow thinking that the Serializers would be different for each. (I was thinking that the serializer was templated on the archive for some reason.)Ok, no problem.Both T deserialize (T)(); T deserialize (T)(string key); have the following example: class Foo { int a; void fromData (Serializer serializer, Serializer.Data key) { a = serializer!(int)("a"); } }No, "T deserialize (T)(string key)" has: a = serializer!(int)("a"); And "T deserialize (T)()" has: a = serializer!(int)(); Both are correct. This is a complete example of using one of these methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d Let me know if anything is confusing. -- /Jacob Carlborg
Oct 01 2011
On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-01 05:00, Robert Jacques wrote:How about overrideSerializer or overloadSerializer? [snip]I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.Umm... example code for the deserialize method should contain 'deserialize' somewhere inside it.Both T deserialize (T)(); T deserialize (T)(string key); have the following example: class Foo { int a; void fromData (Serializer serializer, Serializer.Data key) { a = serializer!(int)("a"); } }No, "T deserialize (T)(string key)" has: a = serializer!(int)("a"); And "T deserialize (T)()" has: a = serializer!(int)(); Both are correct. This is a complete example of using one of these methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d Let me know if anything is confusing.
Oct 02 2011
On 2011-10-03 05:50, Robert Jacques wrote:On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?On 2011-10-01 05:00, Robert Jacques wrote:How about overrideSerializer or overloadSerializer?I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.Umm... example code for the deserialize method should contain 'deserialize' somewhere inside it.You are completley right a = serializer!(int)("a"); Should be a = deserialize!(int)("a"); My bad. -- /Jacob Carlborg
Oct 03 2011
On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-03 05:50, Robert Jacques wrote:Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior.On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?On 2011-10-01 05:00, Robert Jacques wrote:How about overrideSerializer or overloadSerializer?I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.
Oct 03 2011
On 2011-10-03 15:39, Robert Jacques wrote:On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob me.com> wrote:Yes, thanks, that makes sense. -- /Jacob CarlborgOn 2011-10-03 05:50, Robert Jacques wrote:Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior.On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?On 2011-10-01 05:00, Robert Jacques wrote:How about overrideSerializer or overloadSerializer?I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.
Oct 03 2011
On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:I would like to have some form of pre-review of my serialization library Orange for later inclusion in Phobos as std.serialization (or similar).[snip] (1) The first example in class Serializer is: auto serializer = new Serializer; Shouldn't it be auto serializer = new Serializer(archive); (2) orange.serialization.archives.XmlArchive need to be documented. (3) if Archive.Array (which is poorly named btw) "is a type independent representation of an array" then why does it contain an elementSize field? (3a) Also by the time archiving is called, isSliceOf should always return false. That this function exists speaks to design problems both large and small. On the small scale, isSliceOf indicates that you are testing every array against every other array for slicing, which I hope is not the case. On the large scale, all the alias/object/pointer/arrays/etc resolutions should be done by the serializer not the archive. The archive should only be responsible for converting the internal representation of the serializer to JSON/XML/YAML/etc. (3b) Given that Slice has an ID field, why doesn't array. (4) Why have an Archive Interface and a Base class with common functionality? Why not an abstract class? Also, 'Base' isn't an acceptable class name for a Phobos module. Use 'ArchiveBase' or something more unique instead.
Sep 30 2011
On 2011-10-01 06:29, Robert Jacques wrote:On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:You're right, thanks.I would like to have some form of pre-review of my serialization library Orange for later inclusion in Phobos as std.serialization (or similar).[snip] (1) The first example in class Serializer is: auto serializer = new Serializer; Shouldn't it be auto serializer = new Serializer(archive);(2) orange.serialization.archives.XmlArchive need to be documented.I was hoping the Archive interface and the Base abstract class would be enough.(3) if Archive.Array (which is poorly named btw) "is a type independent representation of an array" then why does it contain an elementSize field?Suggestions for other names are welcome. Perhaps it was poorly worded, but what I mean is that this type can represent all array types.(3a) Also by the time archiving is called, isSliceOf should always return false.Why is that?That this function exists speaks to design problems both large and small. On the small scale, isSliceOf indicates that you are testing every array against every other array for slicing, which I hope is not the case.I do. How would I otherwise discover if an array is a slice of another array or not?On the large scale, all the alias/object/pointer/arrays/etc resolutions should be done by the serializer not the archive. The archive should only be responsible for converting the internal representation of the serializer to JSON/XML/YAML/etc.isSliceOf is never called by any existing archive. Perhaps Slice and Array should be moved to the serializer module. I can also remove isSliceOf from Base. I think I was trying to keep the archives independent of the serializer, in the sense that the archives never should have to import the serializer. That's not currently that case so these structs could be moved.(3b) Given that Slice has an ID field, why doesn't array.That's a good question. I don't think it's used at all. archiveSlice takes a Slice and a sliceId. This field should be removed.(4) Why have an Archive Interface and a Base class with common functionality? Why not an abstract class? Also, 'Base' isn't an acceptable class name for a Phobos module. Use 'ArchiveBase' or something more unique instead.As you can see Base is a template class and therefore is dependent on the archive type. My goal was that Serializer should not be a template class. I thought it was unnecessary to repeat the module name in the class name. But in this case it might be a good idea. Or perhaps call it AbstractArchive. -- /Jacob Carlborg
Oct 01 2011
On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-01 06:29, Robert Jacques wrote:[snip]On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:For the pre-review its okay. But you'll need it for the actual review.(2) orange.serialization.archives.XmlArchive need to be documented.I was hoping the Archive interface and the Base abstract class would be enough.If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc. That means that every single archive format must implement an alias resolution algorithm. This results in a lot of copy/paste boiler plate which has to be maintained. It also makes it more difficult to get extra formats supported. Worse if someone forgets to either include this functionality or to apply a patch, silent bugs and/or slowdowns are introduced. All and archiver should be responsible for is taking some decorated data structure and converting it to XML/JSON/YAML/etc and back again. Anything more complex than that should be shared functionality located in the serializer / de-serializer objects.(3) if Archive.Array (which is poorly named btw) "is a type independent representation of an array" then why does it contain an elementSize field?Suggestions for other names are welcome. Perhaps it was poorly worded, but what I mean is that this type can represent all array types.(3a) Also by the time archiving is called, isSliceOf should always return false.Why is that?Okay, first some rational. Consider: assert(!a.isSliceOf(b)); assert(!b.isSliceOf(a)); assert( c.isSliceOf(a)); assert( c.isSliceOf(b)); and class Foo { float x; float[3] point; } void main() { auto foo = new Foo; auto ptr = &foo.x; auto slice = point[0..2]; } In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient. The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance. As an optimization, the user should be able to 'finalize' the serialization by pruning away all memory chunks without aliases. (i.e. a serializeAndClear method)That this function exists speaks to design problems both large and small. On the small scale, isSliceOf indicates that you are testing every array against every other array for slicing, which I hope is not the case.I do. How would I otherwise discover if an array is a slice of another array or not?
Oct 01 2011
On 2011-10-02 00:52, Robert Jacques wrote:On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote: For the pre-review its okay. But you'll need it for the actual review.Ok, it will be the same documentation as for Archive and Base. Ddoc really needs to get better at this, I mean, why can't it just inherit the documentation.No, who says that. You can take this struct and use it outside of this library, it knows nothing about archiving or serialization. If the isSliceOf method should return false when archiving has been called I would need to add logic to detect when serialization and archiving has begun and ended.If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc.Also by the time archiving is called, isSliceOf should always return false.Why is that?That means that every single archive format must implement an alias resolution algorithm.No, the serializer performs this task.This results in a lot of copy/paste boiler plate which has to be maintained. It also makes it more difficult to get extra formats supported. Worse if someone forgets to either include this functionality or to apply a patch, silent bugs and/or slowdowns are introduced. All and archiver should be responsible for is taking some decorated data structure and converting it to XML/JSON/YAML/etc and back again. Anything more complex than that should be shared functionality located in the serializer / de-serializer objects.It's the only thing the archive does.This is how it works: As the first step all arrays are serialized as regular arrays and not slices. After all serialization is done I loop over all arrays and check if they are a slice of some other array. If I found a match I replace the serialized array with a slice instead. These arrays are stored as an associative array with the type Array[Id]. I don't know if there's a better data structure for this.I do. How would I otherwise discover if an array is a slice of another array or not?Okay, first some rational. Consider: assert(!a.isSliceOf(b)); assert(!b.isSliceOf(a)); assert( c.isSliceOf(a)); assert( c.isSliceOf(b)); and class Foo { float x; float[3] point; } void main() { auto foo = new Foo; auto ptr = &foo.x; auto slice = point[0..2]; } In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance.I'm not sure I understand. That would require that the arrays are stored in a continues block of memory? Won't "head" and "tail" always point to start of the array and the end of the array?As an optimization, the user should be able to 'finalize' the serialization by pruning away all memory chunks without aliases. (i.e. a serializeAndClear method)-- /Jacob Carlborg
Oct 02 2011
On Mon, 03 Oct 2011 02:38:22 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-02 00:52, Robert Jacques wrote:[snip]On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API?No, who says that. You can take this struct and use it outside of this library, it knows nothing about archiving or serialization. If the isSliceOf method should return false when archiving has been called I would need to add logic to detect when serialization and archiving has begun and ended.If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc.Also by the time archiving is called, isSliceOf should always return false.Why is that?Okay. [snip]That means that every single archive format must implement an alias resolution algorithm.No, the serializer performs this task.I presented one below.This is how it works: As the first step all arrays are serialized as regular arrays and not slices. After all serialization is done I loop over all arrays and check if they are a slice of some other array. If I found a match I replace the serialized array with a slice instead. These arrays are stored as an associative array with the type Array[Id]. I don't know if there's a better data structure for this.I do. How would I otherwise discover if an array is a slice of another array or not?Okay, first some rational. Consider: assert(!a.isSliceOf(b)); assert(!b.isSliceOf(a)); assert( c.isSliceOf(a)); assert( c.isSliceOf(b)); and class Foo { float x; float[3] point; } void main() { auto foo = new Foo; auto ptr = &foo.x; auto slice = point[0..2]; } In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm. I'm assuming you currently separate object and array alias resolution and don't handle pointers at all.The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance.I'm not sure I understand. That would require that the arrays are stored in a continues block of memory? Won't "head" and "tail" always point to start of the array and the end of the array?
Oct 03 2011
On 2011-10-03 15:57, Robert Jacques wrote:So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API?No, I'm not saying that. Example: struct Array { void* ptr; size_t length; size_t elementSize; bool isSliceOf (Array b) { return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length * b.elementSize; } } void main () { auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. 4]; auto aArr = Array(a.ptr, a.length, 4); auto bArr = Array(b.ptr, b.length, 4); assert(bArr.isSliceOf(aArr)); assert(!aArr.isSliceOf(bArr)); } Both the asserts in the above code passes as expected. See, no serialization or archiving in sight. Is there something I'm missing? Actually it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serailzer and Archive need access to it and the package attribute doesn't work very well.Now I think I start to understand.Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm.The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance.I'm not sure I understand. That would require that the arrays are stored in a continues block of memory? Won't "head" and "tail" always point to start of the array and the end of the array?I'm assuming you currently separate object and array alias resolution and don't handle pointers at all.Yes. Pointers are handled as well. It's handled similar to arrays/slices. First it always serializes what the pointer points to. Then in a post process step, after all serialization is done, it replaces all serialized pointers, that points to a value that has been serialized, with a reference. If a given pointer doesn't point to a value that has be serialized it's left as is. I can see if I this memory chunk approach can be used instead. How will this be used with the balanced tree, could you give a simple example? -- /Jacob Carlborg
Oct 03 2011
On Mon, 03 Oct 2011 14:10:52 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-03 15:57, Robert Jacques wrote:That putting isSliceOf in the public API, implies its usage by the archiver.So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API?No, I'm not saying that. Example: struct Array { void* ptr; size_t length; size_t elementSize; bool isSliceOf (Array b) { return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length * b.elementSize; } } void main () { auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. 4]; auto aArr = Array(a.ptr, a.length, 4); auto bArr = Array(b.ptr, b.length, 4); assert(bArr.isSliceOf(aArr)); assert(!aArr.isSliceOf(bArr)); } Both the asserts in the above code passes as expected. See, no serialization or archiving in sight. Is there something I'm missing?Actually it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serailzer and Archive need access to it and the package attribute doesn't work very well.Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider: auto c = a[1..3]; auto cArr = Array(c.ptr,c.length,4); assert(!cArr.isSliceOf(bArr)); assert(!bArr.isSliceOf(cArr)); // and b ~= 5; bArr = Array(b.ptr, b.length, 4); assert(!bArr.isSliceOf(aArr)); In short, a serializer must be capable of handling overlapping arrays, not just strict slices. The array representation therefore needs to parameterize both the underlying array common to all slices and the actual slice that was serialized for that key.Can a pointer point to the interior of an object? To an element of an array?Now I think I start to understand.Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm.The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance.I'm not sure I understand. That would require that the arrays are stored in a continues block of memory? Won't "head" and "tail" always point to start of the array and the end of the array?I'm assuming you currently separate object and array alias resolution and don't handle pointers at all.Yes. Pointers are handled as well. It's handled similar to arrays/slices. First it always serializes what the pointer points to. Then in a post process step, after all serialization is done, it replaces all serialized pointers, that points to a value that has been serialized, with a reference. If a given pointer doesn't point to a value that has be serialized it's left as is.I can see if I this memory chunk approach can be used instead. How will this be used with the balanced tree, could you give a simple example?Well, balanced trees need a comparison function so: struct Node { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk //... Other meta-data, i.e. ID, int opCmp(const ref Node b) { if( tail < b.head) return -1; if(b.tail < head) return 1; return 0; } } On equality / assignment, one just has to combine the heads and tail with min/max, respectively, and update hasAliases, etc. The difficulty is when a new node 'bridges the gap' between two existing nodes. This has to handled explicitly as part of the tree re-balancing, but you may want to consider making the merging of nodes part of the comparison operator for simplicity/efficiency: head = min(head,b.head); tail = max(tail,b.tail); hasAliases = true; After pruning, updating meta-data, etc, the aliased memory chunk for any given pointer can be found using a separate comparison operator: int opCmp(const ref void* b) { if(tail < b ) return -1; if(b < head) return 1; return 0; } // Which you'd probably use like: if( auto node = arr.ptr in setOfAliases ) { auto offset = arr.ptr - node.head; //... } else {//... So at the pseudo-code level, it would be something like: foreach(obj; [objects,pointers,arrays]) { auto node = Node(obj); setOfAliases.insert(node); // Convert obj into an intermediate form } setOfAliases.pruneUnaliasedNodes; setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc foreach(obj; [objects,pointers,arrays]) { if( auto node = arr.ptr in setOfAliases ) { auto offset = arr.ptr - node.head; // Perform an aliased archival } else { // Perform an unaliased archival } } I hope that helped, though I'm not sure if I really answered your question or not.
Oct 03 2011
On 2011-10-04 07:21, Robert Jacques wrote:That putting isSliceOf in the public API, implies its usage by the archiver.Ok, then I'll remove it.Any suggestion how to fix this, how to properly detect if an array is a slice of some other array?Actually it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serailzer and Archive need access to it and the package attribute doesn't work very well.Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider: auto c = a[1..3]; auto cArr = Array(c.ptr,c.length,4); assert(!cArr.isSliceOf(bArr)); assert(!bArr.isSliceOf(cArr));// and b ~= 5; bArr = Array(b.ptr, b.length, 4); assert(!bArr.isSliceOf(aArr));Appending to "b" will reallocate "b" making it a regular array and not a slice: b ~= 5; b[] = 100; assert(a == [0, 1, 2, 3, 4]); "a" is not modified and the assert passes.Can a pointer point to the interior of an object? To an element of an array?Yes, have a look at: https://github.com/jacob-carlborg/orange/blob/master/tests/Pointer.d Not sure about the array part though, but I think so. Each element of an array is serialized just like all the other types.Yes, I think it at least helped somewhat. The thing is that I'm not very familiar with using trees like these. Would this something similar to: https://github.com/jacob-carlborg/orange/blob/master/orange/serializatio /Serializer.d#L1520 ? What is the advantage with using a tree? Is the advantage that you loop over the elements once in the pseudo-code compared to that I loop over them twice, as in: https://github.com/jacob-carlborg/orange/blob/master/orange/serializatio /Serializer.d#L1495 ? -- /Jacob CarlborgI can see if I this memory chunk approach can be used instead. How will this be used with the balanced tree, could you give a simple example?Well, balanced trees need a comparison function so: struct Node { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there are more than one reference to this chunk //... Other meta-data, i.e. ID, int opCmp(const ref Node b) { if( tail < b.head) return -1; if(b.tail < head) return 1; return 0; } } On equality / assignment, one just has to combine the heads and tail with min/max, respectively, and update hasAliases, etc. The difficulty is when a new node 'bridges the gap' between two existing nodes. This has to handled explicitly as part of the tree re-balancing, but you may want to consider making the merging of nodes part of the comparison operator for simplicity/efficiency: head = min(head,b.head); tail = max(tail,b.tail); hasAliases = true; After pruning, updating meta-data, etc, the aliased memory chunk for any given pointer can be found using a separate comparison operator: int opCmp(const ref void* b) { if(tail < b ) return -1; if(b < head) return 1; return 0; } // Which you'd probably use like: if( auto node = arr.ptr in setOfAliases ) { auto offset = arr.ptr - node.head; //... } else {//... So at the pseudo-code level, it would be something like: foreach(obj; [objects,pointers,arrays]) { auto node = Node(obj); setOfAliases.insert(node); // Convert obj into an intermediate form } setOfAliases.pruneUnaliasedNodes; setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc foreach(obj; [objects,pointers,arrays]) { if( auto node = arr.ptr in setOfAliases ) { auto offset = arr.ptr - node.head; // Perform an aliased archival } else { // Perform an unaliased archival } } I hope that helped, though I'm not sure if I really answered your question or not.
Oct 04 2011
On Tue, 04 Oct 2011 03:22:35 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-04 07:21, Robert Jacques wrote:[snip]Well, there are two problems. First, there is an API issue: When you pass an aliased array to the archiver, you must pass both an array and a slice of that array. Second, is detection. The comparison operator I suggested below covers this issue. Just remember that with a loop based approach, you can't terminate the search early: you have to check all existing arrays for possible matches.Any suggestion how to fix this, how to properly detect if an array is a slice of some other array?Actually it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serailzer and Archive need access to it and the package attribute doesn't work very well.Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider: auto c = a[1..3]; auto cArr = Array(c.ptr,c.length,4); assert(!cArr.isSliceOf(bArr)); assert(!bArr.isSliceOf(cArr));I'm sorry, you're right. In my mind b extended to the end of the a array, for some reason. However, if you do define b to extend to the end of the a array, then it can append without allocating: auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. $]; b ~= 5; assert(b[0]==2); a[2] = 10; assert(b[0]==10); So please don't dismiss this point. [snip]// and b ~= 5; bArr = Array(b.ptr, b.length, 4); assert(!bArr.isSliceOf(aArr));Appending to "b" will reallocate "b" making it a regular array and not a slice: b ~= 5; b[] = 100; assert(a == [0, 1, 2, 3, 4]); "a" is not modified and the assert passes.Would this something similar to: https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520 ?I'm not sure, that routine seems to be filtering pointers between those with aliases and those without aliases, which would be similar in effect to: if( auto node = arr.ptr in setOfAliases ) {} else {}What is the advantage with using a tree? Is the advantage that you loop over the elements once in the pseudo-code compared to that I loop over them twice, as in: https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495 ?Primarily, it's O(N logN) vs O(N^2). Also, it solves the isSliceOf problem we discussed above and puts arrays and objects into the same framework, as objects containing fixed sized arrays can have slices.
Oct 04 2011
On 2011-10-04 17:14, Robert Jacques wrote:I'm sorry, you're right. In my mind b extended to the end of the a array, for some reason. However, if you do define b to extend to the end of the a array, then it can append without allocating: auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. $]; b ~= 5; assert(b[0]==2); a[2] = 10; assert(b[0]==10); So please don't dismiss this point.Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not changed when "b" is changed. Since you append to "b" I though that it would always require a reallocation of "b".[snip]I guess I just have to try this and see how it works out. Anyway, thank you for your review and your patience. -- /Jacob CarlborgWould this something similar to: https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520 ?I'm not sure, that routine seems to be filtering pointers between those with aliases and those without aliases, which would be similar in effect to: if( auto node = arr.ptr in setOfAliases ) {} else {}What is the advantage with using a tree? Is the advantage that you loop over the elements once in the pseudo-code compared to that I loop over them twice, as in: https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495 ?Primarily, it's O(N logN) vs O(N^2). Also, it solves the isSliceOf problem we discussed above and puts arrays and objects into the same framework, as objects containing fixed sized arrays can have slices.
Oct 04 2011
On Tue, 04 Oct 2011 12:54:27 -0400, Jacob Carlborg <doob me.com> wrote:On 2011-10-04 17:14, Robert Jacques wrote:Appending has never mandated reallocation. Maybe you're confusing it with concatenation (i.e. b = b ~ 5;), which does always reallocate. Or the period in time when the anti-array-stomping parts of druntime was overly conservative and would have prevented b from appending in place.I'm sorry, you're right. In my mind b extended to the end of the a array, for some reason. However, if you do define b to extend to the end of the a array, then it can append without allocating: auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. $]; b ~= 5; assert(b[0]==2); a[2] = 10; assert(b[0]==10); So please don't dismiss this point.Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not changed when "b" is changed. Since you append to "b" I though that it would always require a reallocation of "b".
Oct 04 2011
On 2011-10-05 03:49, Robert Jacques wrote:On Tue, 04 Oct 2011 12:54:27 -0400, Jacob Carlborg <doob me.com> wrote:Yeah, maybe you're right. -- /Jacob CarlborgOn 2011-10-04 17:14, Robert Jacques wrote:Appending has never mandated reallocation. Maybe you're confusing it with concatenation (i.e. b = b ~ 5;), which does always reallocate. Or the period in time when the anti-array-stomping parts of druntime was overly conservative and would have prevented b from appending in place.I'm sorry, you're right. In my mind b extended to the end of the a array, for some reason. However, if you do define b to extend to the end of the a array, then it can append without allocating: auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. $]; b ~= 5; assert(b[0]==2); a[2] = 10; assert(b[0]==10); So please don't dismiss this point.Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not changed when "b" is changed. Since you append to "b" I though that it would always require a reallocation of "b".
Oct 04 2011