digitalmars.D - *Should* char[] be an output range?
- monarch_dodra (19/19) Sep 02 2013 I'm finalizing work on improvements to "put". The main
- Jonathan M Davis (20/44) Sep 03 2013 The main problem has to do with what you do when there's not enough room...
- monarch_dodra (37/71) Sep 04 2013 Thanks for your reply. Yeah, the debate always boils back down to
I'm finalizing work on improvements to "put". The main improvement is that put will now be able to transcode on the fly, allowing it to do things such as putting a dchar into a char sink.

What this means is that "(const(char)[]){}" is now considered an output range for dchar. This wasn't the case before, and it was a source of bugs in functions like formattedWrite. Those bugs didn't get much visibility, since Appender (the sink of choice for tests) can do it natively. But passing a delegate char sink to formattedWrite often ended in an error.

In any case, my main question is: *currently*, "char[]" isn't considered an output range. Shouldn't it be, though? The rationale is that it contains dchar elements, and we don't know how to "put" a dchar in a char[]'s front. But we do now. Was this just an implementation restriction? Or is there a really good reason not to allow it?

In my code, it's a one line tweak to "unlock" char[] as a full fledged output range for char/wchar/dchar/string/wstring/dstring. Should I do it?
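To make the transcoding case concrete, here is a minimal sketch of putting a dchar into a delegate char sink. It assumes a Phobos release that already includes the transcoding "put" described above; collectPut is a made-up helper used only for illustration.

```d
import std.range : put, isOutputRange;

// With a transcoding put, a char sink delegate counts as an output range for dchar.
static assert(isOutputRange!(void delegate(const(char)[]), dchar));

/// Hypothetical helper (not part of Phobos): returns whatever put() hands
/// to a delegate char sink when given a single code point.
string collectPut(dchar c)
{
    char[] collected;
    void delegate(const(char)[]) sink = (const(char)[] s) { collected ~= s; };
    put(sink, c); // put encodes c as UTF-8 before calling the sink
    return collected.idup;
}

void main()
{
    // U+00E9 takes two UTF-8 code units.
    assert(collectPut('\u00E9') == "\u00E9");
    assert(collectPut('\u00E9').length == 2);
}
```

The sink itself never sees a dchar; it only receives slices of UTF-8 code units.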
Sep 02 2013
On Monday, September 02, 2013 10:04:20 monarch_dodra wrote:
> [...]
> In my code, it's a one line tweak to "unlock" char[] as a full fledged output range for char/wchar/dchar/string/wstring/dstring. Should I do it?

The main problem has to do with what you do when there's not enough room to write to it. Even checking the length isn't enough, because it's unknown ahead of time whether a particular code point (or string of characters if calling put with multiple characters) will fit - at least not without converting the character to UTF-8 first, which would incur additional cost, since presumably that would result in it being converted twice (once to check and once when actually putting it).

Of course, it's a problem in general with output ranges that we haven't defined how to check whether put will succeed or what the normal procedure is when it's going to fail. My first suggestion for that would be to make it so that put returned whether it was successful or not, but it's something that probably needs to be discussed. However, with that problem solved, it may be reasonable to make char[] an output range.

But until we sort out some of the remaining details of output ranges (most particularly how to deal with put failing, but there are probably other issues that I'm not thinking of at the moment), I don't think that it's a good idea to change how char[] is treated, since how all of that is sorted out could have an effect on what we do with char[].

- Jonathan M Davis
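As a rough illustration of the "put returns whether it succeeded" idea, here is a sketch for the char[] case. tryPut is a hypothetical name and not a Phobos function; the sketch encodes once into a small scratch buffer and then checks the remaining room, which is only one possible way to avoid converting twice.

```d
import std.utf : encode;

/// Hypothetical helper (not part of Phobos): writes c into sink as UTF-8
/// and reports whether it fit. On failure the sink is left untouched.
bool tryPut(ref char[] sink, dchar c)
{
    char[4] buf;                    // UTF-8 needs at most 4 code units
    immutable len = encode(buf, c); // encode once, into scratch space
    if (len > sink.length)
        return false;               // not enough room left
    sink[0 .. len] = buf[0 .. len]; // copy the encoded code units
    sink = sink[len .. $];          // advance past what was written
    return true;
}

void main()
{
    char[3] storage;
    char[] sink = storage[];

    assert(tryPut(sink, 'a'));      // 1 code unit
    assert(tryPut(sink, '\u00E9')); // 2 code units, fills the buffer
    assert(!tryPut(sink, 'b'));     // full: reported, not asserted
    assert(storage[] == "a\u00E9");
}
```

Whether a bool return is the right general protocol for output ranges is exactly the open question raised above; the sketch only shows that the char[] case can be handled with a single encode.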
Sep 03 2013
On Wednesday, 4 September 2013 at 05:00:58 UTC, Jonathan M Davis wrote:
> The main problem has to do with what you do when there's not enough room to write to it.
> [...]

Thanks for your reply. Yeah, the debate always boils back down to "are arrays/input ranges output ranges, what do we do when they are "full" (e.g. empty), and can we even detect it".

I think that given your answer, I will simply *not* make them output ranges, but I *will* make sure that making them as such is easy.

FYI (but it might need a little bit more work), I think I may be on to something. In my implementation, I defined a private function called "doPut". "doPut" is basically the last "atomic" operation in the "put" functionality.

Given a sink "s" and an element "e", it makes exactly one of these calls, whichever is correct for "s":

s.put(e);
s.front = e;
s(e);

It does *not* iterate over "e" if it is a range. It does *not* attempt to check [e], and it does *not* transcode "e".

What this means is that basically, if you write "doPut(s, e)", then you are putting *exactly* the single element "e" into "s". It means that s is a "native" output range for "e": it doesn't need any help from "put".

From there, I defined the package trait "isNativeOutputRange(S, E)". If a pair S/E satisfies this "Native" variant of isOutputRange, then the user has the *guarantee* that "put(s, e)" will place *exactly* "e" into "s". This is particularly interesting for:

1. InputRanges: if the range is not empty, the *put* is guaranteed to not overflow.
2. Certain format functions, such as "std.encoding.encode(C, S)(dchar c, S sink)", which will transcode c into elements of type C and place them in the output range S. Here, it is *vital* that no transcoding happens. My "doPut"/Native trait helps guarantee this.

--------

Well, right now, it is only used as a private implementation detail, but it works pretty well. It might be worth investigating in more detail if we want this public?
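For readers who want to see the idea in code, below is a rough re-creation of a "native put" and its trait. The names nativePut and isNativeOutputRange are written from the description above and are assumptions for illustration; they are not the actual private Phobos code, and the popFront after the front assignment is likewise an assumption about how an input-range sink advances.

```d
import std.range.primitives : isInputRange, front, popFront;
import std.traits : lvalueOf;

/// Hypothetical sketch: performs exactly one of the three basic calls.
/// No iteration over e, no wrapping in [e], no transcoding.
void nativePut(S, E)(ref S s, E e)
{
    static if (is(typeof(s.put(e))))
        s.put(e);
    else static if (isInputRange!S && is(typeof(s.front = e)))
    {
        s.front = e;  // caller must ensure s is not empty
        s.popFront();
    }
    else static if (is(typeof(s(e))))
        s(e);
    else
        static assert(false, "Cannot natively put " ~ E.stringof ~ " into " ~ S.stringof);
}

/// True only when s accepts e as-is, with no help from put.
enum isNativeOutputRange(S, E) = is(typeof(nativePut(lvalueOf!S, lvalueOf!E)));

alias CharSink = void delegate(const(char)[]);
static assert( isNativeOutputRange!(CharSink, const(char)[]));
static assert(!isNativeOutputRange!(CharSink, dchar)); // would need transcoding
static assert( isNativeOutputRange!(int[], int));
static assert(!isNativeOutputRange!(char[], dchar));   // front is not assignable

void main()
{
    int[2] storage;
    int[] sink = storage[];
    nativePut(sink, 42);
    assert(storage[0] == 42 && sink.length == 1);
}
```

The static asserts show the distinction drawn above: a char sink delegate natively accepts a string slice, but a dchar only gets in through the transcoding layer of "put".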
Sep 04 2013