digitalmars.D - finish function for output ranges
- Andrei Alexandrescu (34/34) Aug 11 2012 N.B. I haven't yet reviewed the proposal.
- Jonathan M Davis (7/13) Aug 11 2012 finish is what I've used for similar functions in the past. It seems lik...
- Russel Winder (18/24) Aug 12 2012
- Daniel (3/9) Aug 12 2012 How about naming "finish", flush? Which is unambiguous...
- Walter Bright (3/4) Aug 12 2012 We discussed that and rejected it, because "flush" has connotations of b...
- Johannes Pfau (26/43) Aug 12 2012 This is a little off topic, but when I implemented the recent changes
- Walter Bright (3/6) Aug 12 2012 I worry about too many parts to an interface.
- Jonathan M Davis (19/27) Aug 12 2012 The big question is whether it's merited in output ranges in general. A
- Dmitry Olshansky (17/43) Aug 12 2012 Agreed. Current appender has .data (a-la peek) just because of
- Jonathan M Davis (6/15) Aug 12 2012 I'm not sure. He was basically told to redo it as ArrayBuilder rather th...
- Dmitry Olshansky (23/42) Aug 12 2012 Would have been nice to have operator '<-' for "place" *LOL* :)
- José Armando García Sancio (15/19) Aug 12 2012 As another data point, std.log has a Logger interface which defines
- Joseph Rushton Wakeling (7/11) Aug 12 2012 What about a start() method? You may recall in the RandomSample revisio...
- Mehrdad (3/4) Aug 15 2012 Wouldn't "~" be a better choice?
- Andrei Alexandrescu (3/6) Aug 15 2012 I think neither would be a good choice.
- Joseph Rushton Wakeling (2/3) Aug 15 2012 Was this a daft question? :-)
N.B. I haven't yet reviewed the proposal.

There's been a lot of discussion about the behavior of hash accumulators, and I've just had a chat with Walter about it. There are two angles in the discussion:

1. One is, the hash accumulator should work as an operand in an accumulation expression. Then the reduce() algorithm can be used as follows:

    HashAccumulator ha;
    reduce!((a, b) => a + b)(ha, [1, 2, 3]);
    writeln(ha.finish());

This assumes the hash overloads operator +.

2. The other is, the hash accumulator is an output range - a sink! - that supports put() for a lot of stuff. Then the code would go:

    HashAccumulator ha;
    copy([1, 2, 3], ha);
    writeln(ha.finish());

I think (2) is a much more fertile view than (1) because the notion of "reduce" emphasizes the accumulation operation (such as "+"), and that is a forced notion for hashes (we're not really adding stuff there). In contrast, the notion that the hash accumulator is a sink is very natural: you just dump a lot of stuff into the accumulator, and then you call finish and you get its digest.

So, where does this leave us? I think we should reify the notion of finish() as an optional method for output ranges. We define in std.range a free finish(r) function that does nothing if r does not define a finish() method, and invokes the method if r does define it. Then people can call r.finish() for all output ranges, no problem.

For files, finish() should close the file (or at least flush it - unclear on that). I also wonder whether there exists a better name than finish(), and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.

Destroy!

Andrei
Aug 11 2012
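A minimal sketch of the free finish(r) idea from the post above, not an actual std.range function. The names SumSink and CountSink are hypothetical stand-ins, and the "digest" here is only a running sum. The sketch tests for the member with __traits(hasMember, ...) so that the free function does not end up calling itself through UFCS.

```d
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): forwards to the member finish()
// when the type has one, and does nothing otherwise.
auto finish(R)(ref R r)
{
    static if (__traits(hasMember, R, "finish"))
        return r.finish();
}

// Toy sink standing in for a hash accumulator; its "digest" is a sum.
struct SumSink
{
    uint total;
    void put(uint x) { total += x; }
    uint finish() { return total; }
}

// Sink that has nothing to finalize.
struct CountSink
{
    size_t count;
    void put(uint x) { ++count; }
}

void main()
{
    SumSink s;
    foreach (x; [1u, 2u, 3u])
        s.put(x);
    writeln(s.finish()); // the member finish() is called directly

    CountSink c;
    c.put(42);
    c.finish();          // UFCS picks the free finish(), which is a no-op
    writeln(c.count);
}
```

With this shape, generic code can call r.finish() on any sink without checking first whether the type defines it.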
On Saturday, August 11, 2012 19:29:53 Andrei Alexandrescu wrote:
> I also wonder whether there exists a better name than finish()

finish is what I've used for similar functions in the past. It seems like a fine name to me.

> and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.

In all of the cases that I've dealt with where anything like finish is required, it's made no sense whatsoever to call finish multiple times.

> Destroy!

Overall, seems like a sensible idea to me.

- Jonathan M Davis
Aug 11 2012
On Sat, 2012-08-11 at 19:29 -0400, Andrei Alexandrescu wrote:
[…]
> I think (2) is a much more fertile view than (1) because the notion of "reduce" emphasizes the accumulation operation (such as "+"), and that is a forced notion for hashes (we're not really adding stuff there). In contrast, the notion that the hash accumulator is a sink is very natural: you just dump a lot of stuff into the accumulator, and then you call finish and you get its digest.

One could also consider the hash generator to be a builder, which would support (2) rather than (1).

--
Russel.
Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Aug 12 2012
On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu wrote:
> N.B. I haven't yet reviewed the proposal.
> For files, finish() should close the file (or at least flush it - unclear on that). I also wonder whether there exists a better name than finish(), and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.

How about naming "finish" flush instead? Which is unambiguous...
Aug 12 2012
On 8/12/2012 1:25 AM, Daniel wrote:
> How about naming "finish", flush? Which is unambiguous...

We discussed that and rejected it, because "flush" has connotations of being an intermediate operation, not a final one.
Aug 12 2012
On Sat, 11 Aug 2012 19:29:53 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> 2. The other is, the hash accumulator is an output range - a sink! - that supports put() for a lot of stuff. Then the code would go:
> HashAccumulator ha; copy([1, 2, 3], ha);

This is a little off topic, but when I implemented the recent changes for std.hash I noticed the above code doesn't work, as ha is passed by value. You currently have to do this:

    ha = copy([1, 2, 3], ha);
    //or
    copy([1, 2, 3], &ha);

> I think we should reify the notion of finish() as an optional method for output ranges. We define in std.range a free finish(r) function that does nothing if r does not define a finish() method, and invokes the method if r does define it. Then people can call r.finish() for all output ranges no problem.

Sounds good.

> For files, finish() should close the file (or at least flush it - unclear on that). I also wonder whether there exists a better name than finish(), and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.

The current behavior in std.hash is to reset the 'HashAccumulator' to its initial state after finish was called, so it can be reused. Finish does some computation which leaves the 'HashAccumulator' in an invalid state and resetting it is cheap, so I thought an implicit reset is convenient.

I'm not sure about files. The data property in Appender is also similar, but it doesn't modify the internal state AFAIK, so it's possible to continue using Appender after accessing data. It's probably more a peek function than a finish function.

Probably we should distinguish between finish functions which destroy the internal state and peek functions which do not modify the internal state. The implicit reset done in the hash finish functions would probably have to be removed then. The downside of this is that it's then possible to have a 'HashAccumulator' with invalid state, so 'put' would have to check for that (at least in debug mode). With the implicit reset it's not possible to get a 'HashAccumulator' with invalid state.
Aug 12 2012
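The by-value pitfall described in the post above can be sketched as follows. SumSink is a hypothetical value-type sink used only for illustration, not the std.hash type under discussion. The sketch assumes std.algorithm's copy takes its target by value and returns it, which is what the two workarounds in the post rely on.

```d
import std.algorithm : copy;

// Hypothetical value-type sink; a real hash accumulator would hold
// digest state instead of a sum.
struct SumSink
{
    uint total;
    void put(uint x) { total += x; }
}

void main()
{
    SumSink a;
    auto returned = copy([1u, 2u, 3u], a); // a is copied into the call
    assert(a.total == 0);                  // the original never saw the data
    assert(returned.total == 6);           // the updated copy comes back

    SumSink b;
    copy([1u, 2u, 3u], &b);                // pointer target: writes reach b
    assert(b.total == 6);
}
```

Assigning the result back (ha = copy(..., ha)) and passing a pointer both work; the pointer form avoids copying the accumulator state twice.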
On 8/12/2012 1:36 AM, Johannes Pfau wrote:
> Probably we should distinguish between finish functions which destroy the internal state and peek functions which do not modify the internal state.

I worry about too many parts to an interface. A peek function needs a really strong use case to justify it.
Aug 12 2012
On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
> On 8/12/2012 1:36 AM, Johannes Pfau wrote:
>> Probably we should distinguish between finish functions which destroy the internal state and peek functions which do not modify the internal state.
>
> I worry about too many parts to an interface. A peek function needs a really strong use case to justify it.

The big question is whether it's merited in output ranges in general. A function could still be worth having on an individual type without making sense for output ranges in general as long as it's not required to use the type. However, I really don't think that peek makes sense for output ranges in general. Most of the time, you're just writing to them and not worrying about what's already been written. It's basically the same as when you write to a stream. I don't think that I've seen a peek function on an output stream.

And I really don't think that peek makes sense in the context of hashes. You don't care what a hash is until it's finished. And once it's finished, it really doesn't make sense to keep adding to it. I don't know why you'd ever want an intermediate hash result.

If you call finish, you're done. And if finish gets called again, it's just like if you call popFront after an input range is empty. The behavior is undefined (though popFront - or finish in this case - probably has an assertion for checking in non-release mode). I really don't think that it's all that big a deal.

- Jonathan M Davis
Aug 12 2012
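A minimal sketch of the "undefined after finish, but asserted in non-release builds" behavior described in the post above. SumSink and its finished flag are hypothetical names chosen for illustration; D's assert is compiled out when building with -release, which matches the analogy with popFront on an empty range.

```d
// Hypothetical sink that treats any use after finish() as a bug.
struct SumSink
{
    private uint total;
    private bool finished;

    void put(uint x)
    {
        assert(!finished, "put() called after finish()");
        total += x;
    }

    uint finish()
    {
        assert(!finished, "finish() called twice");
        finished = true;
        return total;
    }
}

void main()
{
    SumSink s;
    s.put(1);
    s.put(2);
    assert(s.finish() == 3);
    // s.put(4);   // would trip the assertion in a non-release build
    // s.finish(); // likewise
}
```

The flag costs one bool per accumulator; the checks themselves disappear in release builds.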
On 12-Aug-12 14:35, Jonathan M Davis wrote:
> On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
>> On 8/12/2012 1:36 AM, Johannes Pfau wrote:
>>> Probably we should distinguish between finish functions which destroy the internal state and peek functions which do not modify the internal state.
>>
>> I worry about too many parts to an interface. A peek function needs a really strong use case to justify it.
>
> The big question is whether it's merited in output ranges in general. A function could still be worth having on an individual type without making sense for output ranges in general as long as it's not required to use the type. However, I really don't think that peek makes sense for output ranges in general. Most of the time, you're just writing to them and not worrying about what's already been written. It's basically the same as when you write to a stream. I don't think that I've seen a peek function on an output stream.

Agreed. The current Appender has .data (a la peek) just because of implementation details that allow it. In fact there was a pull request for Phobos including a better Appender (that would end up being a Builder, I guess) that doesn't allow peeking at the array under construction until the very end, but provides a much better performance profile.

BTW what happened with that pull? I recall the GitHub nickname sandford, but can't recall whose awesome work that was. I'd love to see it make its way into Phobos.

> And I really don't think that peek makes sense in the context of hashes. You don't care what a hash is until it's finished. And once it's finished, it really doesn't make sense to keep adding to it. I don't know why you'd ever want an intermediate hash result.

Having partial hashes over data is very useful, e.g. for fast binary diff algorithms. However, it requires a specific form of hash function (so that you can look at the result) and/or operating at a specific block granularity.

> If you call finish, you're done. And if finish gets called again, it's just like if you call popFront after an input range is empty. The behavior is undefined (though popFront - or finish in this case - probably has an assertion for checking in non-release mode). I really don't think that it's all that big a deal.

Agreed. finish seems like a good name because it indicates the end of an operation. But consider that a hash accumulator can be reset after finish and reused (unlike, say, a network stream).

--
Dmitry Olshansky
Aug 12 2012
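The "reset after finish and reuse" behavior mentioned in this subthread can be sketched as below. This is an illustrative stand-in, not the std.hash implementation: it uses 32-bit FNV-1a because the algorithm is tiny, and the type name Fnv1a32 is an assumption. FNV-1a's finish step does not actually invalidate the state, so the reset here only demonstrates the interface shape.

```d
// Hypothetical accumulator using 32-bit FNV-1a as a small example hash.
struct Fnv1a32
{
    private enum uint offsetBasis = 2166136261u;
    private enum uint prime = 16777619u;
    private uint state = offsetBasis;

    // Output range primitive: absorb one byte.
    void put(ubyte b)
    {
        state = (state ^ b) * prime; // uint arithmetic wraps modulo 2^32
    }

    // Return the digest and implicitly reset, so the object is reusable
    // and can never be observed in a "finished" state.
    uint finish()
    {
        immutable digest = state;
        state = offsetBasis;
        return digest;
    }
}

void main()
{
    Fnv1a32 h;
    foreach (b; cast(const(ubyte)[]) "abc")
        h.put(b);
    immutable first = h.finish();

    // Same input after the implicit reset gives the same digest.
    foreach (b; cast(const(ubyte)[]) "abc")
        h.put(b);
    assert(h.finish() == first);
}
```

The trade-off raised in the thread still applies: with an implicit reset, calling finish() twice in a row silently returns the digest of empty input rather than signalling a mistake.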
On Sunday, August 12, 2012 14:57:00 Dmitry Olshansky wrote:
> Agreed. Current appender has .data (a-la peek) just because of implementation details that allow it. In fact there was a pull for Phobos including a better Appender (that would end up being a Builder I guess) that doesn't allow to peek at array in creation until the very end but provides much better performance profile. BTW what's happened with that pull? I recall github nickname sandford, but can't recall whose awesome work that was. I'd love to see it make its way into Phobos.

I'm not sure. He was basically told to redo it as ArrayBuilder rather than changing Appender, since changing Appender would break a lot of code, and his changes arguably made it so that it wasn't an appender anymore anyway. But he has yet to post a new pull request with those changes.

- Jonathan M Davis
Aug 12 2012
On 12-Aug-12 03:29, Andrei Alexandrescu wrote:
> N.B. I haven't yet reviewed the proposal. There's been a lot of discussion about the behavior of hash accumulators, and I've just have a chat with Walter about such. There are two angles in the discussion:
> 1. One is, the hash accumulator should work as an operand in an accumulation expression. Then the reduce() algorithm can be used as follows:
> HashAccumulator ha; reduce!((a, b) => a + b)(ha, [1, 2, 3]); writeln(ha.finish());
> This assumes the hash overloads operator +.

Would have been nice to have operator '<-' for "place" *LOL* :)
(OT: I think C++ would be a much better place if it had it for e.g. iostream ...)

> I think we should reify the notion of finish() as an optional method for output ranges. We define in std.range a free finish(r) function that does nothing if r does not define a finish() method, and invokes the method if r does define it. Then people can call r.finish() for all output ranges no problem. For files, finish() should close the file (or at least flush it - unclear on that).

Easy to check:

    {
        File f = File("myfile", "w");
        auto sink = f.lockingTextWriter;
        dumpTo(sink, some_vars);
        dumpTo(sink, some_other_vars);
        sink.finish(); // would be taking on f's job to close file
        return f;      // and now what? clearly f is the one responsible
                       // (with the means to transfer that responsibility)
    }

So IMO ranges should not step down to the topology and origins of data, be it an output range or an input range. This also means that with streams, finish is a flush, and thus I'd expect finish to be callable many times in a row.

> Destroy!

One thing I don't like about it is its by-hand nature. The manual way is good only when you are interested in the result of finish. I half expect to see rule #X of the D coding standard: use RAII or scope(exit) to flush an output range.

--
Dmitry Olshansky
Aug 12 2012
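The scope(exit) rule suggested at the end of the post above might look like the following sketch. BufferedSink, writeAll, and this version of dumpTo are hypothetical, written only to illustrate the pattern; finish() here behaves like a flush, so calling it several times in a row is harmless, as the post expects for streams.

```d
// Hypothetical buffered sink: put() collects, finish() flushes to a target.
struct BufferedSink
{
    int[]* target;  // where flushed data ends up (owned by the caller)
    int[] pending;  // data not yet flushed

    void put(int x) { pending ~= x; }

    // Acts as a flush, so repeated calls are safe.
    void finish()
    {
        *target ~= pending;
        pending = null;
    }
}

// Stand-in for the dumpTo helper named in the post.
void dumpTo(ref BufferedSink sink, int[] values)
{
    foreach (v; values)
        sink.put(v);
}

void writeAll(int[]* dest)
{
    auto sink = BufferedSink(dest);
    scope (exit) sink.finish(); // runs on every exit path, including throws
    dumpTo(sink, [1, 2]);
    dumpTo(sink, [3]);
}

void main()
{
    int[] stored;
    writeAll(&stored);
    assert(stored == [1, 2, 3]);
}
```

The sink never closes or owns the destination, which keeps responsibility for the underlying resource with whoever created it.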
On Sat, Aug 11, 2012 at 4:29 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> For files, finish() should close the file (or at least flush it - unclear on that). I also wonder whether there exists a better name than finish(), and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.

As another data point, std.log has a Logger interface which defines log(...) and flush(...), so this fits perfectly into this output range design. Here are the current signatures:

    shared void log(const ref LogMessage message);
    shared void flush();

finish would be a weird name for 'interface Logger' because it is not a final operation.

For what it is worth, Java's OutputStream has the following:

    void write(...) // overloaded three times
    void flush()    // "Flushes this output stream and forces any buffered output bytes to be written out."
    void close()    // "Closes this output stream and releases any system resources associated with this stream."
Aug 12 2012
On 12/08/12 00:29, Andrei Alexandrescu wrote:
> I think we should reify the notion of finish() as an optional method for output ranges. We define in std.range a free finish(r) function that does nothing if r does not define a finish() method, and invokes the method if r does define it. Then people can call r.finish() for all output ranges no problem.

What about a start() method? You may recall in the RandomSample revisions I had to introduce a tweak to ensure that the first value returned by front() was set only the first time front() was called, and not in the constructor. The idea of the start() method would be to address this requirement, i.e. to do something immediately before front() gets called for the first time and not earlier.
Aug 12 2012
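The deferred initialization described in the post above can be sketched roughly like this. Countdown and its private start() are hypothetical and far simpler than RandomSample; the only point is that the first element is computed on first access rather than in the constructor.

```d
// Hypothetical input range that defers its setup until first use.
struct Countdown
{
    private int n;
    private bool started;
    private int current;

    this(int n) { this.n = n; }

    // Runs once, right before the first element is observed.
    private void start()
    {
        current = n;
        started = true;
    }

    @property bool empty()
    {
        if (!started) start();
        return current <= 0;
    }

    @property int front()
    {
        if (!started) start();
        return current;
    }

    void popFront()
    {
        if (!started) start();
        --current;
    }
}

void main()
{
    auto c = Countdown(3);
    assert(!c.started);   // nothing has been computed yet

    int[] seen;
    foreach (x; c)
        seen ~= x;
    assert(seen == [3, 2, 1]);
}
```

Each primitive pays for one extra branch, which is the cost a dedicated start() hook in the range interface would aim to remove.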
On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu wrote:
> This assumes the hash overloads operator +.

Wouldn't "~" be a better choice?
Aug 15 2012
On 8/15/12 6:27 AM, Mehrdad wrote:
> On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu wrote:
>> This assumes the hash overloads operator +.
>
> Wouldn't "~" be a better choice?

I think neither would be a good choice.

Andrei
Aug 15 2012
On 13/08/12 01:08, Joseph Rushton Wakeling wrote:
> What about a start() method?

Was this a daft question? :-)
Aug 15 2012