digitalmars.D - DIP76: Autodecode Should Not Throw
- Walter Bright (1/1) Apr 06 2015 http://wiki.dlang.org/DIP76
- Vladimir Panteleev (3/4) Apr 06 2015 I am against this. It can lead to silent irreversible data
- Vladimir Panteleev (8/12) Apr 06 2015 Instead, I would like to suggest promoting the use of `handle`
- w0rp (18/22) Apr 07 2015 I can see the value in both.
- Vladimir Panteleev (6/9) Apr 07 2015 No no no, terrible idea. This means your program will pass your
- Marc Schütz (11/20) Apr 07 2015 I'd say that invalid UTF8 in `string`s _is_ a logic error,
- John Carter (21/25) Apr 07 2015 Sigh!
- Kagamin (7/8) Apr 07 2015 Deprecation can be reported by checking version:
- Dicebot (9/10) Apr 07 2015 I have doubts about it similar to Vladimir. Main problem is that
- Walter Bright (11/18) Apr 07 2015 With UTF strings, if you care about invalid UTF (a surprisingly large am...
- Vladimir Panteleev (8/28) Apr 07 2015 Yes, but std.conv doesn't return NaN if you try to convert
- bearophile (5/7) Apr 07 2015 I have suggested to add a nothrow function like "maybeTo" that
- Walter Bright (5/20) Apr 07 2015 I know, I read your post. The machinery to allocate, throw, catch, and r...
- Ulrich Küttler (7/21) Apr 07 2015 There was a time when operations on NaNs were painfully slow.
- H. S. Teoh via Digitalmars-d (9/16) Apr 07 2015 How so? There *are* possible options we can consider to migrate away
- w0rp (13/21) Apr 07 2015 I don't think we are stuck with it. I think we can change it. I
- H. S. Teoh via Digitalmars-d (7/27) Apr 07 2015 If somebody were to write a DIP for killing autodecoding, I'd vote in
- Kagamin (3/7) Apr 08 2015 http://forum.dlang.org/post/luonbfghopyrtcoejjsu@forum.dlang.org
- Daniel Kozak via Digitalmars-d (3/28) Apr 07 2015 me too
- H. S. Teoh via Digitalmars-d (7/10) Apr 07 2015 I used to be pro-autodecoding... nowadays, I'm starting to lean towards
- Abdulhaq (7/8) Apr 07 2015 The DIP lists the benefits but does not mention any cons.
- Walter Bright (3/10) Apr 07 2015 On the other hand, if there's any place where people demand the highest
- Jonathan M Davis via Digitalmars-d (15/16) Apr 19 2015 I am fully in favor of this. Most code really doesn't care about invalid
On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
> http://wiki.dlang.org/DIP76

I am against this. It can lead to silent irreversible data corruption.
Apr 06 2015
On Tuesday, 7 April 2015 at 04:05:38 UTC, Vladimir Panteleev wrote:
> On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
>> http://wiki.dlang.org/DIP76
>
> I am against this. It can lead to silent irreversible data corruption.

Instead, I would like to suggest promoting the use of `handle` and the like:

http://dlang.org/phobos/std_exception.html#handle

This way, code that needs to be nothrow can opt in to being nothrow via such composition, which also fits the principle that introducing the risk of silent data corruption should be opt-in.
Apr 06 2015
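A minimal sketch of the opt-in composition suggested above, closely following the example in the `std.exception.handle` documentation. The input string and the choice of U+FFFD as the substitute are illustrative assumptions, not something the post specifies.

```d
import std.algorithm.comparison : equal;
import std.exception : handle, RangePrimitive;
import std.utf : UTFException, replacementDchar;

void main()
{
    // 0xFF can never appear in well-formed UTF-8 (illustrative input).
    string text = "hello\xFFworld";

    // Opt in explicitly: whenever decoding `front` or `back` throws a
    // UTFException, yield U+FFFD instead of propagating the exception.
    auto lenient = text.handle!(UTFException, RangePrimitive.access,
            (e, r) => replacementDchar);

    assert(lenient.equal("hello\uFFFDworld"));
}
```

Code that does not wrap its input this way keeps the throwing behaviour, so the lossy substitution only happens where it was requested.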
On Tuesday, 7 April 2015 at 04:05:38 UTC, Vladimir Panteleev wrote:
> On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
>> http://wiki.dlang.org/DIP76
>
> I am against this. It can lead to silent irreversible data corruption.

I can see the value in both.

With something like Objective C on iOS, basically everything is nothrow. They don't do any cleanup for references when exceptions happen, so they don't generate slower reference counting code. Exceptions in Objective C on iOS are not supposed to be caught ever. So you don't use exceptions and garbage collection, your code runs pretty fast, and your applications are smooth.

On the other hand, not throwing the exceptions leads to silent failures, which can lead to creating garbage data. Objective C in particular is designed to tolerate failure, given that messages run on nil objects simply do nothing and return cast(T) 0 for the message's return type. You're in a world of checking return codes, validating data, etc.

Maybe autodecoding could throw an Error (No 'new' allowed) when debug mode is on, and use replacement characters in release mode. I haven't thought it through, but that's an idea.
Apr 07 2015
On Tuesday, 7 April 2015 at 07:42:02 UTC, w0rp wrote:
> Maybe autodecoding could throw an Error (No 'new' allowed) when debug mode is on, and use replacement characters in release mode. I haven't thought it through, but that's an idea.

No no no, terrible idea. This means your program will pass your test suite in debug mode (which, of course, is never going to test behavior with bad UTF in all the relevant places), but silently corrupt real-world data in release mode.

Errors and asserts are for logic errors, not for validating user input!
Apr 07 2015
On Tuesday, 7 April 2015 at 07:50:40 UTC, Vladimir Panteleev wrote:
> On Tuesday, 7 April 2015 at 07:42:02 UTC, w0rp wrote:
>> Maybe autodecoding could throw an Error (No 'new' allowed) when debug mode is on, and use replacement characters in release mode. I haven't thought it through, but that's an idea.
>
> No no no, terrible idea. This means your program will pass your test suite in debug mode (which, of course, is never going to test behavior with bad UTF in all the relevant places), but silently corrupt real-world data in release mode. Errors and asserts are for logic errors, not for validating user input!

I'd say that invalid UTF8 in `string`s _is_ a logic error, because these are defined to be valid UTF8. If they aren't, someone didn't correctly validate their inputs. Unfortunately, not even the runtime cares about UTF correctness:

    void main(string[] args)
    {
        import std.utf;
        // The runtime hands over the arguments unvalidated, so this
        // throws when args[1] is not valid UTF-8.
        args[1].validate;
    }
Apr 07 2015
On Tuesday, 7 April 2015 at 04:05:38 UTC, Vladimir Panteleev wrote:
> On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
>> http://wiki.dlang.org/DIP76
>
> I am against this. It can lead to silent irreversible data corruption.

Sigh!

99.99% of the time when I'm processing text.... my program didn't create the text. An eclectic mob of text editors driven by a herd of cats each having wildly different concepts of encoding wrote it.

99.999% of the time when I hit one of these cases... the "irreversible data corruption" is _already_ there. Tough. It's there, it's irreversible, I have to live with it and make forward progress.

Sure, on some tasks I want to know it is there.... but by far in most tasks all I can do is shrug, slap it to something sensible, and carry on.

One of the first things I had to do in D was write code to do this.... and it all seemed way harder and slower than it needed to be.

(Oh for The Simple Fun Good Bad Old Days of everything is 7 bit ASCII... except for the funny stuff above 127 which you ignored anyway.)
Apr 07 2015
On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
> http://wiki.dlang.org/DIP76

Deprecation can be reported by checking version:

    version(EnableNothrowAutodecoding)
        alias autodecode = autodecodeImpl;
    else
        deprecated("compile with -version=EnableNothrowAutodecoding")
        alias autodecode = autodecodeImpl;
Apr 07 2015
On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
> http://wiki.dlang.org/DIP76

I have doubts about it similar to Vladimir. Main problem is that I have no idea what actually happens if replacement characters appear in some unicode text my program processes. So far I have that calming feeling that if something goes wrong in this regard, exception will slap me right into my face.

Also it is worrying to see so much effort put into `nothrow` in language which endorses exceptions as its main error reporting mechanism.
Apr 07 2015
On 4/7/2015 1:19 AM, Dicebot wrote:
> I have doubts about it similar to Vladimir. Main problem is that I have no idea what actually happens if replacement characters appear in some unicode text my program processes.

It's much like floating point NaN values, which are 'sticky'.

> So far I have that calming feeling that if something goes wrong in this regard, exception will slap me right into my face.

With UTF strings, if you care about invalid UTF (a surprisingly large amount of operations done on strings simply don't care about invalid UTF) the validation can be done as a separate step. Then, the program logic is divided into operating on "validated" and "unvalidated" data.

> Also it is worrying to see so much effort put into `nothrow` in language which endorses exceptions as its main error reporting mechanism.

There is definitely a tug of war going on there. Exceptions are great, except they aren't free. What I've tried to do is design things so that erroneous input is not possible - that all possible input has straightforward output. In other words, try to define the problem out of existence. Then there are no errors.
Apr 07 2015
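The split into "validated" and "unvalidated" data described above can be sketched roughly as follows. This is only an illustration: `countLines` is a hypothetical helper, and the sample strings are made up.

```d
import std.exception : assertThrown;
import std.utf : validate, UTFException;

// Hypothetical helper: counting newlines only inspects single code units,
// so malformed sequences cannot confuse it and nothing needs decoding.
size_t countLines(const(char)[] unvalidated) nothrow @nogc
{
    size_t n;
    foreach (char c; unvalidated) // iterates code units, no autodecoding
        if (c == '\n')
            ++n;
    return n;
}

void main()
{
    string good = "a\nb\n";
    string bad = "a\n\xFFb\n"; // 0xFF is never valid in UTF-8

    // Validation as an explicit, separate step at the boundary.
    validate(good); // passes silently
    assertThrown!UTFException(validate(bad));

    // Code that does not care about validity works on either kind of data.
    assert(countLines(good) == 2);
    assert(countLines(bad) == 2);
}
```

Only code that actually depends on well-formed input needs to sit behind the `validate` call; everything else can stay `nothrow`.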
On Tuesday, 7 April 2015 at 09:04:09 UTC, Walter Bright wrote:
> On 4/7/2015 1:19 AM, Dicebot wrote:
>> I have doubts about it similar to Vladimir. Main problem is that I have no idea what actually happens if replacement characters appear in some unicode text my program processes.
>
> It's much like floating point NaN values, which are 'sticky'.

Yes, but std.conv doesn't return NaN if you try to convert "banana" to a double.

> With UTF strings, if you care about invalid UTF (a surprisingly large amount of operations done on strings simply don't care about invalid UTF) the validation can be done as a separate step.

So can converting invalid UTF to replacement characters.

>> Also it is worrying to see so much effort put into `nothrow` in language which endorses exceptions as its main error reporting mechanism.
>
> There is definitely a tug of war going on there. Exceptions are great, except they aren't free. What I've tried to do is design things so that erroneous input is not possible - that all possible input has straightforward output. In other words, try to define the problem out of existence. Then there are no errors.

I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.
Apr 07 2015
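As a rough illustration of what "explicit decoding with a caller-chosen policy" can look like, the sketch below uses `std.utf.decode`, whose `useReplacementDchar` template flag exists in current Phobos (it may postdate this thread). The input string is an assumption made for the example.

```d
import std.exception : assertThrown;
import std.typecons : Yes;
import std.utf : decode, replacementDchar, UTFException;

void main()
{
    string bad = "\xFFx"; // starts with a byte that is never valid UTF-8

    // Policy 1 (the default): throw on malformed input.
    size_t i = 0;
    assertThrown!UTFException(decode(bad, i));

    // Policy 2 (explicit opt-in): yield U+FFFD instead of throwing.
    size_t j = 0;
    dchar c = decode!(Yes.useReplacementDchar)(bad, j);
    assert(c == replacementDchar);
}
```

Because the policy is named at the call site, a reader can see which pieces of code tolerate lossy substitution and which do not.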
Vladimir Panteleev:
> std.conv doesn't return NaN if you try to convert "banana" to a double.

I have suggested adding a nothrow function like "maybeTo" that returns a Nullable result.

Bye,
bearophile
Apr 07 2015
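A sketch of what such a function might look like. `maybeTo` is hypothetical: the name comes from the post and, as far as this sketch assumes, it is not part of Phobos. This version simply wraps `std.conv.to`.

```d
import std.conv : to;
import std.typecons : Nullable;

// Hypothetical helper: returns a null Nullable instead of throwing.
Nullable!T maybeTo(T, S)(S value) nothrow
{
    try
        return Nullable!T(value.to!T);
    catch (Exception)
        return Nullable!T.init; // conversion failed: no value
}

void main()
{
    auto ok = maybeTo!double("1.5");
    assert(!ok.isNull && ok.get == 1.5);

    auto bad = maybeTo!double("banana");
    assert(bad.isNull);
}
```

Note that this wrapper still throws and catches internally, so it offers a `nothrow` interface without removing the cost of the exception machinery; a real implementation would have to avoid the throw altogether.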
On 4/7/2015 2:10 AM, Vladimir Panteleev wrote:
> On Tuesday, 7 April 2015 at 09:04:09 UTC, Walter Bright wrote:
>> On 4/7/2015 1:19 AM, Dicebot wrote:
>>> I have doubts about it similar to Vladimir. Main problem is that I have no idea what actually happens if replacement characters appear in some unicode text my program processes.
>>
>> It's much like floating point NaN values, which are 'sticky'.
>
> Yes, but std.conv doesn't return NaN if you try to convert "banana" to a double.

Maybe it should :-)

>> With UTF strings, if you care about invalid UTF (a surprisingly large amount of operations done on strings simply don't care about invalid UTF) the validation can be done as a separate step.
>
> So can converting invalid UTF to replacement characters.

I know, I read your post. The machinery to allocate, throw, catch, and replace is still there.

> I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.

I agree autodecoding is a mistake, but we're stuck with it.
Apr 07 2015
On Tuesday, 7 April 2015 at 09:21:52 UTC, Walter Bright wrote:
> On 4/7/2015 2:10 AM, Vladimir Panteleev wrote:
>> On Tuesday, 7 April 2015 at 09:04:09 UTC, Walter Bright wrote:
>>> On 4/7/2015 1:19 AM, Dicebot wrote:
>>>> I have doubts about it similar to Vladimir. Main problem is that I have no idea what actually happens if replacement characters appear in some unicode text my program processes.
>>>
>>> It's much like floating point NaN values, which are 'sticky'.
>>
>> Yes, but std.conv doesn't return NaN if you try to convert "banana" to a double.
>
> Maybe it should :-)

There was a time when operations on NaNs were painfully slow. Also, since NaNs tend to spread, once a NaN appears, there usually is not much of a result left. Debugging used to be painfully hard if NaNs were enabled. We used to rely on floating point exceptions instead.

This might or might not be relevant.
Apr 07 2015
On Tue, Apr 07, 2015 at 02:21:50AM -0700, Walter Bright via Digitalmars-d wrote:
> On 4/7/2015 2:10 AM, Vladimir Panteleev wrote:
[...]
>> I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.
>
> I agree autodecoding is a mistake, but we're stuck with it.

How so? There *are* possible options we can consider to migrate away from autodecoding. AFAICT the real roadblock here is that some people strongly disagree with this, so it's more a community barrier than a technical one.

T

--
Unix is my IDE. -- Justin Whear
Apr 07 2015
On Tuesday, 7 April 2015 at 09:21:52 UTC, Walter Bright wrote:
> On 4/7/2015 2:10 AM, Vladimir Panteleev wrote:
>> I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.
>
> I agree autodecoding is a mistake, but we're stuck with it.

I don't think we are stuck with it. I think we can change it. I think a lot of the automatic decoding happens inside of Phobos, while people care mostly about the boundaries of the API. If we do get rid of it, then as Vladimir says, you can opt in to whether or not you want a non-throwing conversion, or a throwing one.

I was going to write about how the auto decoding doesn't solve the problem of comparing strings, given that you need to look at ranges of characters, subject to normalisation, unless you're dealing with just ASCII. I think all of that has been said to death, though.

I think it's possible for us to get rid of automatic decoding.
Apr 07 2015
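The string-comparison point above can be illustrated with a small sketch using `std.uni`. The two spellings of "é" are sample data chosen for the example, not taken from the post.

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.uni : byGrapheme, normalize;

void main()
{
    string composed = "\u00E9";    // é as a single code point
    string decomposed = "e\u0301"; // e followed by a combining acute accent

    // Autodecoding iterates code points, so the two spellings differ...
    assert(!equal(composed, decomposed));
    assert(decomposed.walkLength == 2);

    // ...although they are canonically equivalent and form one grapheme.
    // normalize defaults to NFC, the composed form.
    assert(normalize(decomposed) == composed);
    assert(decomposed.byGrapheme.walkLength == 1);
}
```

Decoding to code points is therefore neither the cheapest level (code units) nor the level users perceive (graphemes after normalisation), which is the gap the post alludes to.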
On Tue, Apr 07, 2015 at 06:00:12PM +0000, w0rp via Digitalmars-d wrote:
> On Tuesday, 7 April 2015 at 09:21:52 UTC, Walter Bright wrote:
>> On 4/7/2015 2:10 AM, Vladimir Panteleev wrote:
>>> I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.
>>
>> I agree autodecoding is a mistake, but we're stuck with it.
>
> I don't think we are stuck with it. I think we can change it. I think a lot of the automatic decoding happens inside of Phobos, while people care mostly about the boundaries of the API. If we do get rid of it, then as Vladimir says, you can opt in to whether or not you want a non-throwing conversion, or a throwing one. I was going to write about how the auto decoding doesn't solve the problem of comparing strings, given that you need to look at ranges of characters, subject to normalisation, unless you're dealing with just ASCII. I think all of that has been said to death, though. I think it's possible for us to get rid of automatic decoding.

If somebody were to write a DIP for killing autodecoding, I'd vote in favor.

Getting it past Andrei, OTOH, is a different story. ;-)

T

--
Never trust an operating system you don't have source for! -- Martin Schulze
Apr 07 2015
On Tuesday, 7 April 2015 at 18:18:55 UTC, H. S. Teoh wrote:
> If somebody were to write a DIP for killing autodecoding, I'd vote in favor. Getting it past Andrei, OTOH, is a different story. ;-)

http://forum.dlang.org/post/luonbfghopyrtcoejjsu@forum.dlang.org

But how can a DIP address a non-technical issue?
Apr 08 2015
On Tue, 7 Apr 2015 11:16:16 -0700 "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> wrote:
> On Tue, Apr 07, 2015 at 06:00:12PM +0000, w0rp via Digitalmars-d wrote:
>> On Tuesday, 7 April 2015 at 09:21:52 UTC, Walter Bright wrote:
>>> On 4/7/2015 2:10 AM, Vladimir Panteleev wrote:
>>>> I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.
>>>
>>> I agree autodecoding is a mistake, but we're stuck with it.
>>
>> I don't think we are stuck with it. I think we can change it. I think a lot of the automatic decoding happens inside of Phobos, while people care mostly about the boundaries of the API. If we do get rid of it, then as Vladimir says, you can opt in to whether or not you want a non-throwing conversion, or a throwing one. I was going to write about how the auto decoding doesn't solve the problem of comparing strings, given that you need to look at ranges of characters, subject to normalisation, unless you're dealing with just ASCII. I think all of that has been said to death, though. I think it's possible for us to get rid of automatic decoding.
>
> If somebody were to write a DIP for killing autodecoding, I'd vote in favor.

me too
Apr 07 2015
On Tue, Apr 07, 2015 at 09:10:32AM +0000, Vladimir Panteleev via Digitalmars-d wrote:
[...]
> I think the correct solution to that is to kill auto-decoding :) Then all decoding is explicit, and since it is explicit, it is trivial to allow specifying the desired behavior upon encountering invalid UTF-8.

I used to be pro-autodecoding... nowadays, I'm starting to lean towards killing it. This is another nail in the coffin.

T

--
He who sacrifices functionality for ease of use, loses both and deserves neither. -- Slashdotter
Apr 07 2015
On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
> http://wiki.dlang.org/DIP76

The DIP lists the benefits but does not mention any cons. A con that I can see is that it is violating the 'fail fast' principle. By silently replacing data the developer will be presented with a probably-hard-to-debug problem later down the application lifecycle (probably in an unrelated area), wasting developer time.
Apr 07 2015
On 4/7/2015 5:04 AM, Abdulhaq wrote:
> On Tuesday, 7 April 2015 at 03:17:26 UTC, Walter Bright wrote:
>> http://wiki.dlang.org/DIP76
>
> The DIP lists the benefits but does not mention any cons. A con that I can see is that it is violating the 'fail fast' principle. By silently replacing data the developer will be presented with a probably-hard-to-debug problem later down the application lifecycle (probably in an unrelated area), wasting developer time.

On the other hand, if there's any place where people demand the highest performance, it's string processing.
Apr 07 2015
On Monday, April 06, 2015 20:16:19 Walter Bright via Digitalmars-d wrote:
> http://wiki.dlang.org/DIP76

I am fully in favor of this. Most code really doesn't care about invalid unicode, and if it does, it can check explicitly. Using the replacement character is much cleaner and follows the Unicode standard. And in my experience, if I run into invalid Unicode, I generally have to process it regardless, forcing me to do something like use the replacement character anyway. The fact that std.utf.decode throws just becomes an annoyance.

About the only real downside to this that I can think of is that if you're writing a new string algorithm, and you botch it such that it mangles the Unicode, right now, you'd quickly get exceptions, whereas with this change, you wouldn't. But if you're testing your string-based code with Unicode rather than just ASCII, then that should still get caught. Regardless, I think that this is the way to go.

- Jonathan M Davis
Apr 19 2015