digitalmars.D - Formatted read consumes input
- monarch_dodra (44/44) Aug 23 2012 As title implies:
- Dmitry Olshansky (14/51) Aug 24 2012 Yes, both parse family and formattedRead are operating on ref
- monarch_dodra (16/25) Aug 24 2012 I had actually considered that argument. But a lot of algorithms
- Denis Shelomovskij (6/28) Aug 24 2012 It's because `formattedRead` is designed to work with an input range
- monarch_dodra (8/48) Aug 24 2012 You had me ready to throw in the towel on that argument, but
- Tove (3/10) Aug 24 2012 Actually... look up "%n" in sscanf it's wonderful, I use it all
- Dmitry Olshansky (4/13) Aug 24 2012 God... what an awful kludge :)
- Steven Schveighoffer (14/56) Sep 07 2012 I believe it behaves as designed, but could be designed in such a way th...
- monarch_dodra (19/27) Sep 07 2012 If you want *do* ref behavior, I still don't see why you we don't
- Steven Schveighoffer (16/45) Sep 07 2012 This looks ugly. Returning a tuple and having to split the result is
- Jonathan M Davis (8/18) Sep 07 2012 Does it _ever_ make sense for a range to be an input range and not a for...
- Steven Schveighoffer (9/24) Sep 07 2012 No it doesn't. That is case 1.
- monarch_dodra (37/57) Sep 07 2012 True...
- monarch_dodra (3/5) Sep 07 2012 I've made a pull request out of it.
- kenji hara (5/11) Sep 08 2012 I have commented to the pull.
- Steven Schveighoffer (7/66) Sep 07 2012 Well, this does work. But I don't like that the semantics depend on
- monarch_dodra (16/22) Sep 07 2012 Yes, but that is another issue, it is a "copy" vs "save" semantic
- kenji hara (22/24) Sep 08 2012 Why you are afraid to declaring "dummy" variable?
- monarch_dodra (8/41) Sep 08 2012 Hum, I think I see your point, although in my opinion, checking
As the title implies:
----
import std.stdio;
import std.format;

void main()
{
    string s = "42";
    int v;
    formattedRead(s, "%d", &v);
    writefln("[%s] [%s]", s, v);
}
----
[] [42]
----
Is this the "expected" behavior?

Furthermore, it is not possible to "save" s:
----
import std.stdio;
import std.format;
import std.range;

void main()
{
    string s = "42";
    int v;
    formattedRead(s.save, "%d", &v);
    writefln("[%s] [%s]", s, v);
}
----
main.d(9): Error: template std.format.formattedRead does not match any function template declaration
C:\D\dmd.2.060\dmd2\windows\bin\..\..\src\phobos\std\format.d(526): Error: template std.format.formattedRead(R,Char,S...) cannot deduce template function from argument types !()(string,string,int*)
----
The workaround is to use a named backup:
auto ss = s.save;
formattedRead(ss, "%d", &v);

I've traced the root issue to formattedRead's signature, which is:
uint formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args);

Is there a particular reason for this pass by ref? Isn't it inconsistent with the rest of Phobos, or even with C's scanf?

Is this worth filing as a bug report or enhancement request?
Aug 23 2012
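A minimal sketch of the two behaviours described above, written against a current Phobos (where array range primitives such as `save` live in `std.range.primitives`) rather than the 2.060 toolchain in the post. The helper names `readInPlace` and `readKeepingOriginal` are hypothetical; only `formattedRead` and `save` come from Phobos.

```d
import std.format : formattedRead;
import std.range.primitives : save;

// Hypothetical helper: passes the caller's string straight through.
// formattedRead takes its range by ref, so whatever it parses is removed from `s`.
int readInPlace(ref string s)
{
    int v;
    formattedRead(s, "%d", &v);
    return v;
}

// Hypothetical helper showing the workaround from the post: parse from a
// named backup so the caller's `s` survives. (With the 2.060 ref signature,
// `formattedRead(s.save, ...)` did not compile, because a ref parameter
// cannot bind to the rvalue returned by save.)
int readKeepingOriginal(ref string s)
{
    auto ss = s.save;   // independent copy of the slice
    int v;
    formattedRead(ss, "%d", &v);
    return v;           // ss has been consumed; s is unchanged
}
```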
On Thursday, 23 August 2012 at 11:33:19 UTC, monarch_dodra wrote:
> As title implies:
> ----
> import std.stdio;
> import std.format;
> void main()
> {
>     string s = "42";
>     int v;
>     formattedRead(s, "%d", &v);
>     writefln("[%s] [%s]", s, v);
> }
> ----
> [] [42]
> ----
> Is this the "expected" behavior?

Yes, both the parse family and formattedRead operate on a ref argument. That means they modify it in place. Also consider that two consecutive reads should obviously read the first and second values in the string, not the same one.

> Furthermore, it is not possible to try to "save" s:
> ----
> import std.stdio;
> import std.format;
> import std.range;
> void main()
> {
>     string s = "42";
>     int v;
>     formattedRead(s.save, "%d", &v);
>     writefln("[%s] [%s]", s, v);
> }
> ----

Yes, because ref doesn't bind to an rvalue.

> The workaround is to have a named backup:
> auto ss = s.save;
> formattedRead(ss, "%d", &v);
>
> I've traced the root issue to formattedRead's signature, which is:
> uint formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args);

As I explained above, the reason is that the only sane logic for multiple reads is to consume the input, and to do so it needs ref.

> Is there a particular reason for this pass by ref? It is inconsistent with the rest of phobos, or even C's scanf?

C's scanf is a poor argument, as it uses pointers instead of ref (and it can't use ref, as there is no ref in C :) ). Yet it doesn't allow reading things over a couple of calls AFAIK. In C, scanf returns the number of arguments successfully read, not bytes, so there is no way to continue from where it stopped.

BTW, it's not documented what formattedRead returns ... just ouch.
Aug 24 2012
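Dmitry's point about consecutive reads can be sketched like this: because the range is taken by ref, each call removes what it parsed, so a second call on the same variable continues where the first stopped. `readPair` is a hypothetical helper, and the comma separator is an arbitrary choice for the example.

```d
import std.format : formattedRead;

// Hypothetical helper: two consecutive reads from the same lvalue.
// Each call consumes what it parsed, so the second call resumes
// where the first one stopped.
int[2] readPair(string input)
{
    int a, b;
    formattedRead(input, "%d,", &a);   // parses "1" and the literal ','
    formattedRead(input, "%d", &b);    // parses the "2" that is left
    return [a, b];
}
```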
On Friday, 24 August 2012 at 11:18:55 UTC, Dmitry Olshansky wrote:
> On Thursday, 23 August 2012 at 11:33:19 UTC, monarch_dodra wrote:
>> I've traced the root issue to formattedRead's signature, which is:
>> uint formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args);
>
> As I explained above the reason is because the only sane logic of multiple reads is to consume input and to do so it needs ref.

I had actually considered that argument. But a lot of algorithms have the same approach, yet they don't take refs; they *return* the consumed range:
----
R formattedRead(R, Char, S...)(R r, const(Char)[] fmt, S args)

auto s2 = formattedRead(s, "%d", &v);
----
Or arguably:
----
Tuple!(size_t, R) formattedRead(R, Char, S...)(R r, const(Char)[] fmt, S args)
----
"minCount", "boyerMooreFinder" and "levenshteinDistanceAndPath" all take this approach to return a consumed range plus an index/count.
Aug 24 2012
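A minimal sketch of the first signature proposed above: take the range by value and hand the advanced copy back. `formattedReadRet` is a hypothetical name, not a Phobos function; it simply forwards to `formattedRead` on its local copy.

```d
import std.format : formattedRead;

// Hypothetical value-returning variant: the range is copied in,
// advanced by formattedRead, and the remainder is returned.
R formattedReadRet(R, Char, S...)(R r, const(Char)[] fmt, S args)
{
    formattedRead(r, fmt, args);   // `r` is a local lvalue, so it can be passed on
    return r;                      // the caller decides whether to keep the remainder
}
```

With this shape the count returned by `formattedRead` is discarded, which is what the `Tuple!(size_t, R)` variant in the post would preserve.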
24.08.2012 16:16, monarch_dodra wrote:
> On Friday, 24 August 2012 at 11:18:55 UTC, Dmitry Olshansky wrote:
>> [snip]
>> As I explained above the reason is because the only sane logic of multiple reads is to consume input and to do so it needs ref.
>
> I had actually considered that argument. But a lot of algorithms have the same approach, yet they don't take refs, they *return* the consumed front:
> ----
> R formattedRead(R, Char, S...)(R r, const(Char)[] fmt, S args)
> auto s2 = formattedRead(s, "%d", &v);
> ----
> Or arguably:
> ----
> Tuple!(size_t, R) formattedRead(R, Char, S...)(R r, const(Char)[] fmt, S args)
> ----
> "minCount", "boyerMooreFinder" and "levenshteinDistanceAndPath" all take this approach to return a consumed range plus an index/count.

It's because `formattedRead` is designed to work with an input range that isn't a forward range (i.e. not save-able).

--
Денис В. Шеломовский
Denis V. Shelomovskij
Aug 24 2012
On Friday, 24 August 2012 at 13:08:43 UTC, Denis Shelomovskij wrote:
> 24.08.2012 16:16, monarch_dodra wrote:
>> [snip]
>> "minCount", "boyerMooreFinder" and "levenshteinDistanceAndPath" all take this approach to return a consumed range plus an index/count.
>
> It's because `formattedRead` is designed to work with an input range which isn't a forward range (not save-able).

You had me ready to throw in the towel on that argument, but after thinking harder about it, that doesn't really change anything: at the end of formattedRead, the passed range is in a certain state. Whether you give this range back to the caller via "pass by ref" or "return by value" has nothing to do with save-ability.
Aug 24 2012
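The argument that handing the advanced range back by `ref` or by return value is unrelated to save-ability can be checked with an input range that has no `save`. `CharStream`, `skipDigitsByValue` and `skipDigitsByRef` are hypothetical names for illustration; both helpers work on it without ever calling `save`.

```d
import std.ascii : isDigit;
import std.range.primitives : isInputRange, isForwardRange;

// Hypothetical input range over a string with no `save`,
// so the traits classify it as input-only.
struct CharStream
{
    string data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}
static assert(isInputRange!CharStream && !isForwardRange!CharStream);

// Consumes leading digits and returns the advanced range by value...
R skipDigitsByValue(R)(R r)
{
    while (!r.empty && isDigit(r.front))
        r.popFront();
    return r;
}

// ...or advances the caller's range through `ref`.
// Neither version needs `save`.
void skipDigitsByRef(R)(ref R r)
{
    while (!r.empty && isDigit(r.front))
        r.popFront();
}
```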
On Friday, 24 August 2012 at 11:18:55 UTC, Dmitry Olshansky wrote:
> C's scanf is a poor argument as it uses pointers instead of ref (and it can't do ref as there is no ref in C :) ). Yet it doesn't allow to read things in a couple of calls AFAIK. In C scanf returns number of arguments successfully read not bytes so there is no way to continue from where it stopped.
>
> BTW it's not documented what formattedRead returns ... just ouch.

Actually... look up "%n" in sscanf; it's wonderful, and I use it all the time.
Aug 24 2012
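For reference, the C `%n` conversion stores the number of characters consumed so far without counting toward `sscanf`'s return value, which is what makes resuming after a partial parse possible. A minimal sketch calling C's `sscanf` from D through `core.stdc.stdio`; `intAndConsumed` is a hypothetical helper.

```d
import core.stdc.stdio : sscanf;

// Hypothetical helper: parses one int from a C string and returns how many
// characters were consumed, or -1 if no int could be matched.
int intAndConsumed(const(char)* s, out int value)
{
    int consumed = -1;
    // %n writes the count of characters read so far into `consumed`;
    // it does not increase sscanf's return value.
    if (sscanf(s, "%d%n", &value, &consumed) != 1)
        return -1;
    return consumed;
}
```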
On 24-Aug-12 17:43, Tove wrote:
> On Friday, 24 August 2012 at 11:18:55 UTC, Dmitry Olshansky wrote:
>> C's scanf is a poor argument as it uses pointers instead of ref (and it can't do ref as there is no ref in C :) ). Yet it doesn't allow to read things in a couple of calls AFAIK. In C scanf returns number of arguments successfully read not bytes so there is no way to continue from where it stopped.
>>
>> BTW it's not documented what formattedRead returns ... just ouch.
>
> Actually... look up "%n" in sscanf it's wonderful, I use it all the time.

God... what an awful kludge :)

--
Olshansky Dmitry
Aug 24 2012
On Thu, 23 Aug 2012 07:33:13 -0400, monarch_dodra <monarchdodra gmail.com> wrote:
> As title implies:
> [snip]
> I've traced the root issue to formattedRead's signature, which is:
> uint formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args);
>
> Is there a particular reason for this pass by ref? It is inconsistent with the rest of phobos, or even C's scanf?
>
> Is this a file-able bug_report/enhancement_request?

I believe it behaves as designed, but it could be designed in a way that does not need a ref input range. In fact, I think R needing to be ref is actually a bad thing. Consider that if D didn't consider string literals to be lvalues (an invalid assumption IMO), then passing a string literal as the input would not work!

The only issue is, what if you *do* want ref behavior for strings? You would need to wrap the string into a ref'd range. That is not a good proposition. Unfortunately, the way IFTI works, there isn't an opportunity to affect the parameter type IFTI decides to use.

I think a reasonable enhancement would be to add a formattedReadNoref (or a better-named alternative) that does not take a ref argument.

-Steve
Sep 07 2012
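The idea of wrapping a string into a ref'd range, to get ref behaviour from a function that takes its range by value, can be sketched with Phobos's `std.range.refRange`, which holds a pointer to the original range. The by-value function `dropTwo` is hypothetical, and the sketch uses it in place of `formattedRead` to keep the example small.

```d
import std.range : refRange;
import std.range.primitives : popFront;

// Hypothetical function that takes its range by value and pops two elements.
// A plain string is copied, so the caller's string is untouched; a RefRange
// forwards popFront through its pointer, so the caller's string is consumed.
void dropTwo(R)(R r)
{
    r.popFront();
    r.popFront();
}
```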
On Friday, 7 September 2012 at 13:58:43 UTC, Steven Schveighoffer wrote:
> On Thu, 23 Aug 2012 07:33:13 -0400, monarch_dodra
>> [snip]
> The only issue is, what if you *do* want ref behavior for strings? You would need to wrap the string into a ref'd range. That is not a good proposition. Unfortunately, the way IFTI works, there isn't an opportunity to affect the parameter type IFTI decides to use.
> [SNIP]
> -Steve

If you *do* want ref behavior, I still don't see why we don't just do it the algorithm way, and return by value:
----
import std.format;
import std.range;
import std.stdio;
import std.typecons;

Tuple!(uint, R) formattedRead2(R, Char, S...)(R r, const(Char)[] fmt, S args)
{
    auto ret = formattedRead(r, fmt, args);   // r is a local copy, so it can bind to ref
    return Tuple!(uint, R)(ret, r);           // count of values read, plus the remainder
}

void main()
{
    string s = "42 worlds";
    int v;
    s = formattedRead2(s.save, "%d", &v)[1];  // keep the consumed remainder
    writefln("[%s][%s]", v, s);
}
----
Sep 07 2012
On Fri, 07 Sep 2012 10:35:37 -0400, monarch_dodra <monarchdodra gmail.com> wrote:
> On Friday, 7 September 2012 at 13:58:43 UTC, Steven Schveighoffer wrote:
>> [snip]
>
> If you want *do* ref behavior, I still don't see why you we don't just do it the algorithm way of return by value:
> [snip]

This looks ugly. Returning a tuple and having to split the result is horrible; I hated dealing with that in C++ (and I even wrote stuff that returned pairs!). Not only that, but there are possible ranges which may not be reassignable.

I'd rather have a way to wrap a string into a ref-based input range.

We have three situations:

1. The input range is a ref type already (i.e. a class or a pImpl struct): there is no need to pass it by ref; doing so just wastes cycles on a double dereference.
2. The input range is a value type, and you want to preserve the original.
3. The input range is a value type, and you want to update the original.

I'd like to see the library automatically make the right decision for 1, and give you some mechanism to choose between 2 and 3. To preserve existing code, 3 should be the default.

-Steve
Sep 07 2012
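The three situations can be illustrated with a hypothetical reference-type range (`IntStream`) and a plain slice, passed to two hypothetical helpers that differ only in whether they take the range by value or by ref.

```d
import std.range.primitives : front, popFront;

// Hypothetical reference-type range (case 1): copies share one cursor.
final class IntStream
{
    private int[] data;
    this(int[] d) { data = d; }
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Takes the range by value: a value-type range is preserved (case 2),
// but a reference-type range is still advanced (case 1).
int firstByValue(R)(R r)
{
    auto x = r.front;
    r.popFront();
    return x;
}

// Takes the range by ref: the caller's value-type range is updated (case 3).
int firstByRef(R)(ref R r)
{
    auto x = r.front;
    r.popFront();
    return x;
}
```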
On Friday, September 07, 2012 10:52:07 Steven Schveighoffer wrote:
> We have three situations:
> 1. input range is a ref type already (i.e. a class or a pImpl struct), no need to pass this by ref, just wastes cycles doing double dereference.
> 2. input range is a value type, and you want to preserve the original.
> 3. input range is a value type, and you want to update the original.
>
> I'd like to see the library automatically make the right decision for 1, and give you some mechanism to choose between 2 and 3. To preserve existing code, 3 should be the default.

Does it _ever_ make sense for a range to be an input range and not a forward range and _not_ be a reference type? If it were a value type, copying it would implicitly save it, so it would then make sense for it to have save. So, I don't think that input ranges which aren't forward ranges make any sense unless they're reference types, in which case there's no point in taking them by ref, and you _can't_ preserve the original.

- Jonathan M Davis
Sep 07 2012
On Fri, 07 Sep 2012 11:04:36 -0400, Jonathan M Davis <jmdavisProg gmx.com> wrote:
> On Friday, September 07, 2012 10:52:07 Steven Schveighoffer wrote:
>> We have three situations:
>> [snip]
>
> Does it _ever_ make sense for a range to be an input range and not a forward range and _not_ have it be a reference type?

No, it doesn't. That is case 1.

However, it's quite easy to forget to define "save" when your range really is a forward range. I don't really know a good way to fix this. Assuming that an input-but-not-forward range has reference semantics lets inappropriate code compile just fine.

Clearly, though, classes can easily be identified as not needing ref.

-Steve
Sep 07 2012
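The pitfall of a value-type range whose author forgot `save` looks roughly like this. `Countdown` and `peekViaCopy` are hypothetical: the traits classify the struct as input-only, but every copy is still an independent cursor, so passing it by value silently behaves like `save`.

```d
import std.range.primitives : isInputRange, isForwardRange;

// Hypothetical value-type range with no `save`.
struct Countdown
{
    int n;
    @property bool empty() const { return n == 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}
static assert(isInputRange!Countdown);
static assert(!isForwardRange!Countdown);

// Pops one element from a copy. The caller's range is unaffected because
// copying a struct duplicates its state: an implicit "save".
int peekViaCopy(Countdown c)
{
    c.popFront();
    return c.front;
}
```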
On Friday, 7 September 2012 at 14:51:45 UTC, Steven Schveighoffer wrote:
> On Fri, 07 Sep 2012 10:35:37 -0400, monarch_dodra
> [snip]
> I'd like to see the library automatically make the right decision for 1, and give you some mechanism to choose between 2 and 3. To preserve existing code, 3 should be the default.
>
> -Steve

True...

Still, I find it horrible to have to create a named "dummy" variable when I simply want to pass a copy of my range.

I think I found 2 other solutions:
1: auto ref.
2: Kind of like auto ref: just provide a non-ref overload. This creates less executable bloat.

Like this:
--------
//Formatted read for rvalue input ranges.
uint formattedRead(R, Char, S...)(R r, const(Char)[] fmt, S args)
{
    return formattedRead(r, fmt, args);
}

//Standard formatted read (existing implementation, unchanged)
uint formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args)
--------
This allows me to write, as I would expect:
--------
void main()
{
    string s = "x42xT";
    int v;
    formattedRead(s.save, "x%dx", &v); //Passing a copy
    writefln("[%s][%s]", v, s);
    formattedRead(s, "x%dx", &v); //Please consume me
    writefln("[%s][%s]", v, s);
}
--------
[42][x42xT] //My range is unchanged
[42][T] //My range was consumed
--------
I think this is a good solution. Do you see anything I may have failed to see?
Sep 07 2012
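Option 1 from the list above (`auto ref`) could look like the following sketch; `formattedReadAuto` is a hypothetical name, and this is not the code from the pull request. An lvalue argument binds by reference and is consumed, while an rvalue such as `s.save` binds to a copy.

```d
import std.format : formattedRead;
import std.range.primitives : save;

// Hypothetical auto-ref wrapper: lvalue arguments are passed by ref,
// rvalue arguments (e.g. s.save) are passed by value.
uint formattedReadAuto(R, Char, S...)(auto ref R r, const(Char)[] fmt, S args)
{
    // Inside the function `r` is an lvalue either way.
    return formattedRead(r, fmt, args);
}
```

With `auto ref`, the compiler instantiates a ref and a non-ref version on demand, which is the trade-off the post weighs against a hand-written non-ref overload.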
On Friday, 7 September 2012 at 15:34:12 UTC, monarch_dodra wrote:
> I think this is a good solution. Do you see anything I may have failed to see?

I've made a pull request out of it.
https://github.com/D-Programming-Language/phobos/pull/777
Sep 07 2012
I have commented on the pull request. I don't like adding convenience interfaces to the std.format module.
https://github.com/D-Programming-Language/phobos/pull/777#issuecomment-8385551

Kenji Hara

2012/9/8 monarch_dodra <monarchdodra gmail.com>:
> On Friday, 7 September 2012 at 15:34:12 UTC, monarch_dodra wrote:
>> I think this is a good solution. Do you see anything I may have failed to see?
>
> I've made a pull request out of it.
> https://github.com/D-Programming-Language/phobos/pull/777
Sep 08 2012
On Fri, 07 Sep 2012 11:34:28 -0400, monarch_dodra <monarchdodra gmail.com> wrote:
> On Friday, 7 September 2012 at 14:51:45 UTC, Steven Schveighoffer wrote:
>> [snip]
>
> True...
> Still, I find it horrible to have to create a named "dummy" variable just when I simply want to pass a copy of my range.
> [snip]
> Like this:
> --------
> //Formatted read for R-Value input range.
> uint formattedRead(R, Char, S...)(R r, const(Char)[] fmt, S args)
> {
>     return formattedRead(r, fmt, args);
> }
> //Standard formated read
> uint formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args)
> --------
> [snip]
> I think this is a good solution. Do you see anything I may have failed to see?

Well, this does work. But I don't like that the semantics depend on whether the value is an rvalue or not.

Note that even ranges that are true input ranges (i.e. a file) still consume their data, even as rvalues; there is no way around it.

-Steve
Sep 07 2012
On Friday, 7 September 2012 at 18:15:00 UTC, Steven Schveighoffer wrote:
> Well, this does work. But I don't like that the semantics depend on whether the value is an rvalue or not.
>
> Note that even ranges that are true input ranges (i.e. a file) still consume their data, even as rvalues, there is no way around it.
>
> -Steve

Yes, but that is another issue: it is a "copy" vs "save" semantics issue. In theory, one should assume that *even* with pass by value, if you want your range not to be consumed, you have to call "save". Most ranges are value types, so we tend to forget it. As a matter of fact, std.algorithm had a few save-related bugs like that.

But, contrary to what my first post suggested, that is not the actual issue being fixed here. It is merely a fix that lets the call compile with an unnamed argument:
formattedRead(file.save, ...)
And now it compiles fine, AND the range is saved. That's it. Nothing more, nothing less.

... That's *if* file provides "save". I do not know much about file/stream handling in D, but you get my "save" point.
Sep 07 2012
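The copy-versus-save point is easiest to see with a reference-type forward range: passing it by value still consumes the caller's range, and only an explicit `save` produces an independent copy. `Letters` and `drain` are hypothetical names for the sketch.

```d
import std.range.primitives : isForwardRange;

// Hypothetical forward range with reference semantics: copying the class
// reference shares the cursor, so only `save` yields an independent copy.
final class Letters
{
    private string data;
    this(string d) { data = d; }
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property Letters save() { return new Letters(data); }
}
static assert(isForwardRange!Letters);

// Consumes whatever range it is given (taken by value) and counts the elements.
size_t drain(R)(R r)
{
    size_t count;
    for (; !r.empty; r.popFront())
        ++count;
    return count;
}
```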
2012/9/8 monarch_dodra <monarchdodra gmail.com>:
[snip]
> Still, I find it horrible to have to create a named "dummy" variable just when I simply want to pass a copy of my range.

Why are you afraid of declaring a "dummy" variable?

formattedRead is a parser, not an algorithm (as I said in the pull request comment). After calling it, zero or more elements will remain. And in almost all cases, the remainder will be used for some other purpose, or just checked to be empty.

    int n = formattedRead(input_range, fmt, args...);
    next_parsing(input_range);  // reusing input_range
    assert(input_range.empty);  // or just checking that it is empty

If formattedRead could receive an rvalue, calling it would silently ignore the remainder, which can hide bugs.

    int n = formattedRead(r.save, fmt, args...);
    // If the remainder is not empty, it is ignored. Is this expected, or a logic bug?

    auto dummy = r.save;
    int n = formattedRead(dummy, fmt, args...);
    assert(dummy.empty);  // You can assert that the remainder should be empty.

formattedRead returns multiple states (the values that were read, how many values were read, and the remainder of the input), so allowing callers to ignore some of them would invite misuse and bugs.

Kenji Hara
Sep 08 2012
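The "named dummy, then check the remainder" pattern could be packaged like this; `parseWholeInt` is a hypothetical helper. Note that `formattedRead` reports one value read for `"42abc"`, so it is the remainder check, not the return value, that rejects the trailing characters.

```d
import std.exception : enforce;
import std.format : formattedRead;
import std.range.primitives : empty, save;

// Hypothetical helper: parse one int from a named copy of `input`
// and insist that nothing is left over.
int parseWholeInt(string input)
{
    auto dummy = input.save;            // the named "dummy" copy
    int v;
    immutable n = formattedRead(dummy, "%d", &v);
    enforce(n == 1, "no integer was read");
    enforce(dummy.empty, "trailing characters: " ~ dummy);
    return v;
}
```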
On Saturday, 8 September 2012 at 12:10:26 UTC, kenji hara wrote:
> 2012/9/8 monarch_dodra <monarchdodra gmail.com>:
> [snip]
>> Still, I find it horrible to have to create a named "dummy" variable just when I simply want to pass a copy of my range.
>
> Why you are afraid to declaring "dummy" variable?
> [snip]
> formattedRead returns multiple states (the values which are read, how many values are read, and remains of input), so allowing to ignore them would introduce bad usage and possibilities of bugs.
>
> Kenji Hara

Hmm, I think I see your point, although in my opinion, checking the return value is all that is required for generic error checking. Checking the state of the range afterwards is being extra careful for a specific use case, and should not necessarily be forced onto the programmer.

I'll close the pull request in the morning.
Sep 08 2012