digitalmars.D - char[] annoyance...
- Regan Heath (20/20) Apr 09 2006 Take this code:
- Derek Parnell (22/47) Apr 09 2006 I too have tripped up on this 'bug' and it is very annoying and surprisi...
- Regan Heath (7/53) Apr 09 2006 Thanks Derek, that's exactly what I did.
- Derek Parnell (17/74) Apr 09 2006 Of course; that was obvious. No put-down was implied ;-)
- Regan Heath (38/113) Apr 09 2006 You're right, in that this bug is caused by a side-effect of the languag...
- Bruno Medeiros (9/80) Apr 10 2006 I agree with Regan in that the code is "conceptually" well-expressed, it...
- Georg Wrede (18/79) Apr 10 2006 First, how come you don't use this code?
- Regan Heath (6/88) Apr 10 2006 Because speed wasn't important in this case (once-off console app)
- Lionello Lunesu (7/16) Apr 10 2006 It seems silly to add another 'if', knowing that the compiler could
- S. Chancellor (7/38) Apr 10 2006 if( line[i.. (i+2 > $ ? $ : i +2 ) ] != "||")
- S. Chancellor (7/50) Apr 10 2006 Erps, I should skim your post better. Yeah that's a problem, you can
- Derek Parnell (20/24) Apr 10 2006 No, that is fine and along the same lines as the original code. As is
- Regan Heath (7/31) Apr 10 2006 In my case there were several || on the lines, eg.
- Kevin Bealer (23/44) Apr 10 2006 I see the "gotcha" here, and whether conversions should be done is an
- Regan Heath (18/77) Apr 10 2006 You're probably right.. now, if only I could train my brain to think of ...
- xs0 (20/114) Apr 11 2006 Wouldn't it be somewhat the best, if signed/unsigned comparison actually...
- Kevin Bealer (27/140) Apr 11 2006 True, hadn't thought of that. I guess the correct thing (mathematically...
- Georg Wrede (5/156) Apr 11 2006 I agree.
Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? Regan
Apr 09 2006
On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... } -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 10/04/2006 11:54:09 AM
Apr 09 2006
On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Thanks Derek, that's exactly what I did. The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them. ReganTake this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }
Apr 09 2006
On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:Of course; that was obvious. No put-down was implied ;-)On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Thanks Derek, that's exactly what I did.Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.The point of my reply was that the "solution to the underlying problem behaviour" is to properly express (and thus document) the algorithm rather than rely on side-effects of the implemented language. In this case, as an example of such algorithm documentation, if one explicitly states that the search only applies to lines of two or more characters, it helps the reader of the code understand your intentions. Without that, the code phrase (i < line.length-2) might not trigger the reader's thoughts about handling short lines. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 10/04/2006 12:10:09 PM
Apr 09 2006
On Mon, 10 Apr 2006 12:17:30 +1000, Derek Parnell <derek psych.ward> wrote:On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:and none was taken. :)On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:Of course; that was obvious. No put-down was implied ;-)On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Thanks Derek, that's exactly what I did.Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }You're right, in that this bug is caused by a side-effect of the language. I disagree that the algorithm is/was incompletely/incorrectly expressed (more on this below). WRT the side-effect; My question is, do we attempt to prevent future bugs of this sort or do we ignore it? Off the top of my head, we can: a - avoid the side-effect i.e. make length signed. b - attempt to catch it, and others like it. i.e. add a signed/unsigned warning. c - have a type with the range of an unsigned type and a lower bound of 0 (no underflow)The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.The point of my reply was that the "solution to the underlying problem behaviour" is to properly express (and thus document) the algorithm rather than rely on side-effects of the implemented language.In this case, as an example of such algorithm documentation, if one explicitly states that the search only applies to lines of two or more characters, it helps the reader of the code understand your intentions. Without that, the code phrase (i < line.length-2) might not trigger the reader's thoughts about handling short lines.I agree with the general statement; you should express the algorithm in a clear fashion. In fact that it why I think we need to fix this. This side-effect results in a less clear/concise expression of the actual algorithm. It results in a superfluous check for a non-existant special case. Allow me to elaborate.. From a purely algorithmic perspective there is nothing special about short lines in this case, the same rule applies as to any other line, "the index 'i' starts at 0 and must always be less than the length of the line - 2". I believe this is a complete and correct expression of the algorithm itself. However, a side-effect of the representation (code/language) is creating a special case for short lines which does not logically exist. A special case which is not obvious at the time the code is written, resulting in a bug. A special case which complicates the representation of the algorithm (the code) by requiring we check for short lines for no gain and some loss in terms of performance (negligible) and readability. Of course you can argue that the limitations of the representation (code/language) are always going to affect how we can express any given algorithm, this is true. Ideally however we should be able to express an algorithm as precisely/exactly as possible without a risk of a side-effects causing them to fail. At the very least, in this case, we can make the programmer aware of the side effect where it applies. Regan
Apr 09 2006
Derek Parnell wrote:On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:I agree with Regan in that the code is "conceptually" well-expressed, it is instead a nasty and unexpected language shortcoming that results in the bug. I, however, at the moment see no good way to solve this language shortcoming, and there may not even be one. :-( (Also: "char[] annoyance..." is an unfit thread title.) -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#DOn Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:Of course; that was obvious. No put-down was implied ;-)On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Thanks Derek, that's exactly what I did.Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.The point of my reply was that the "solution to the underlying problem behaviour" is to properly express (and thus document) the algorithm rather than rely on side-effects of the implemented language. In this case, as an example of such algorithm documentation, if one explicitly states that the search only applies to lines of two or more characters, it helps the reader of the code understand your intentions. Without that, the code phrase (i < line.length-2) might not trigger the reader's thoughts about handling short lines.
Apr 10 2006
Regan Heath wrote:On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:First, how come you don't use this code? for (int i=0; i<line.length-2; i++) { if (line[i..i] == "|") if (line[i+1..i+1] == "|") continue; } It should be more than twice as fast. Second, isn't your code just an excellent example of why we have bounds checking in D in the first place? I think the obvious and clear-for-the-reader solution is to wrap that code in an if: if (line.length > 1) { // the above code here } I can't see what would be wrong with that.On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Thanks Derek, that's exactly what I did. The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }
Apr 10 2006
On Mon, 10 Apr 2006 17:42:33 +0300, Georg Wrede <georg.wrede nospam.org> wrote:Regan Heath wrote:Because speed wasn't important in this case (once-off console app)On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:First, how come you don't use this code? for (int i=0; i<line.length-2; i++) { if (line[i..i] == "|") if (line[i+1..i+1] == "|") continue; } It should be more than twice as fast.On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:Thanks Derek, that's exactly what I did. The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this?I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }Second, isn't your code just an excellent example of why we have bounds checking in D in the first place?Bounds checking certainly helps.I think the obvious and clear-for-the-reader solution is to wrap that code in an if: if (line.length > 1) { // the above code here } I can't see what would be wrong with that.See my reply to Derek for why I think it's 'wrong'. Regan
Apr 10 2006
if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }It seems silly to add another 'if', knowing that the compiler could simply use the signed branch instructions instead of the unsigned ones. Does the spec define what happens when comparing a signed against an unsigned type? Perhaps one or the other should be manually cast in this case. Or, force the comparison as signed. That would have prevented the length-2 bug. L.
Apr 10 2006
On 2006-04-09 17:23:06 -0700, "Regan Heath" <regan netwin.co.nz> said:Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? Reganif( line[i.. (i+2 > $ ? $ : i +2 ) ] != "||") { ... } Is that not allowed in the [..] syntax? -S.
Apr 10 2006
On 2006-04-10 00:50:24 -0700, S. Chancellor <dnewsgr mephit.kicks-ass.org> said:On 2006-04-09 17:23:06 -0700, "Regan Heath" <regan netwin.co.nz> said:Erps, I should skim your post better. Yeah that's a problem, you can fix it without the other if though. for(int i = 0; i + 2 < line.length; i++) { Or am I still missing something? Hehe. -STake this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? Reganif( line[i.. (i+2 > $ ? $ : i +2 ) ] != "||") { ... } Is that not allowed in the [..] syntax? -S.
Apr 10 2006
On Mon, 10 Apr 2006 17:54:25 +1000, S. Chancellor <dnewsgr mephit.kicks-ass.org> wrote:Erps, I should skim your post better. Yeah that's a problem, you can fix it without the other if though. for(int i = 0; i + 2 < line.length; i++) { Or am I still missing something? Hehe.No, that is fine and along the same lines as the original code. As is for(int i = 0; i < (cast(int)line.length)-2; i++) { however I still prefer telling the code reader what I'm doing etc... in clear, simple code. Although your code will work, it just isn't as clear at first glance as if (line.length >= 2) { ... } but of course an even better way (IMHO) ... int posn = std.string.find(line, "||"); if (posn >= 0) { // etc .... } -- Derek Parnell Melbourne, Australia
Apr 10 2006
On Mon, 10 Apr 2006 18:51:00 +1000, Derek Parnell <derek psych.ward> wrote:On Mon, 10 Apr 2006 17:54:25 +1000, S. Chancellor <dnewsgr mephit.kicks-ass.org> wrote:In my case there were several || on the lines, eg. ..etc..||<A>_desc||..etc..||<B>_desc||..etc.. and I'm actually rewriting them as: ..etc..||thing(<A>_desc)||..etc..||thing(<B>_desc)||..etc.. but, yeah, it's possible to use find. :) ReganErps, I should skim your post better. Yeah that's a problem, you can fix it without the other if though. for(int i = 0; i + 2 < line.length; i++) { Or am I still missing something? Hehe.No, that is fine and along the same lines as the original code. As is for(int i = 0; i < (cast(int)line.length)-2; i++) { however I still prefer telling the code reader what I'm doing etc... in clear, simple code. Although your code will work, it just isn't as clear at first glance as if (line.length >= 2) { ... } but of course an even better way (IMHO) ... int posn = std.string.find(line, "||"); if (posn >= 0) { // etc .... }
Apr 10 2006
In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? ReganI see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.for(int i = 0; i+2 < line.length; i++) {Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range. More generally, always do the arithmetic with the signed variables in cases like this if there is a possibility of wraparound. But if the automatic conversions were specified to do uint->int that would also be fine with me. One can also imagine a language with the following inequalities: +> unsigned greater-than +< unsigned less than -> signed greater than -< signed less than +>= unsigned greater-or-equal etc This would not replace the current >, <, but would essentially just be shorthand for existing expressions: (A +> B) is the same as (cast(uint)A > cast(uint)B) .. and so on, except that uint, ulong, ushort, etc would be chosen as needed. Is this worth doing? Maybe - it's not a big deal to me, but the signed/unsigned question does represent a common gotcha in certain expressions. Kevin
Apr 10 2006
On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer <Kevin_member pathlink.com> wrote:In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...You're probably right.. now, if only I could train my brain to think of it that way round :)Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? ReganI see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.for(int i = 0; i+2 < line.length; i++) {Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.More generally, always do the arithmetic with the signed variables in cases like this if there is a possibility of wraparound.That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(But if the automatic conversions were specified to do uint->int that would also be fine with me.But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?One can also imagine a language with the following inequalities: +> unsigned greater-than +< unsigned less than -> signed greater than -< signed less than +>= unsigned greater-or-equal etc This would not replace the current >, <, but would essentially just be shorthand for existing expressions: (A +> B) is the same as (cast(uint)A > cast(uint)B) .. and so on, except that uint, ulong, ushort, etc would be chosen as needed. Is this worth doing? Maybe - it's not a big deal to me, but the signed/unsigned question does represent a common gotcha in certain expressions.It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. Regan
Apr 10 2006
Regan Heath wrote:On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer <Kevin_member pathlink.com> wrote:Wouldn't it be somewhat the best, if signed/unsigned comparison actually worked correctly in mathematical sense? int a; uint b; (a<b) becomes (a<0 || cast(uint)a < b) (a==b) becomes (a>=0 && cast(uint)a == b) (a>b) becomes (a>=0 && cast(uint)a > b) Unfortunately, that doesn't solve the original problem if (a < b.length-2) What would solve it is making .length signed. Even though it should puristically be unsigned, and losing one bit could theoretically be a problem (because you couldn't allocate new byte[more than 2GB], the reality is that a 32-bit app can't allocate more than 2GB of RAM anyway (in Windows at least), and for 64-bit apps, losing that one bit is totally insignificant, unless someone expects a computer with 9 exabytes of ram anytime soon :) And the only place really needing change (and a trivial one) would be the array resizing code.. xs0In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...You're probably right.. now, if only I could train my brain to think of it that way round :)Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? ReganI see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.for(int i = 0; i+2 < line.length; i++) {Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.More generally, always do the arithmetic with the signed variables in cases like this if there is a possibility of wraparound.That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(But if the automatic conversions were specified to do uint->int that would also be fine with me.But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?One can also imagine a language with the following inequalities: +> unsigned greater-than +< unsigned less than -> signed greater than -< signed less than +>= unsigned greater-or-equal etc This would not replace the current >, <, but would essentially just be shorthand for existing expressions: (A +> B) is the same as (cast(uint)A > cast(uint)B) .. and so on, except that uint, ulong, ushort, etc would be chosen as needed. Is this worth doing? Maybe - it's not a big deal to me, but the signed/unsigned question does represent a common gotcha in certain expressions.It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. Regan
Apr 11 2006
In article <e1ftsj$1spp$1 digitaldaemon.com>, xs0 says...Regan Heath wrote:True, hadn't thought of that. I guess the correct thing (mathematically) is what is proposed by xs0 below (however, see comments).On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer <Kevin_member pathlink.com> wrote:In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...You're probably right.. now, if only I could train my brain to think of it that way round :)Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? ReganI see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.for(int i = 0; i+2 < line.length; i++) {Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.More generally, always do the arithmetic with the signed variables in cases like this if there is a possibility of wraparound.That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(But if the automatic conversions were specified to do uint->int that would also be fine with me.But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?I like this conceptually - it seems like definitely the correct approach for a language like python, but it has the unfortunate effect of adding a conditional to what is otherwise a one-instruction subtraction (< is a subtract in asm).Wouldn't it be somewhat the best, if signed/unsigned comparison actually worked correctly in mathematical sense? int a; uint b; (a<b) becomes (a<0 || cast(uint)a < b) (a==b) becomes (a>=0 && cast(uint)a == b) (a>b) becomes (a>=0 && cast(uint)a > b)One can also imagine a language with the following inequalities: +> unsigned greater-than +< unsigned less than -> signed greater than -< signed less than +>= unsigned greater-or-equal etc This would not replace the current >, <, but would essentially just be shorthand for existing expressions: (A +> B) is the same as (cast(uint)A > cast(uint)B) .. and so on, except that uint, ulong, ushort, etc would be chosen as needed. Is this worth doing? Maybe - it's not a big deal to me, but the signed/unsigned question does represent a common gotcha in certain expressions.It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. ReganUnfortunately, that doesn't solve the original problem if (a < b.length-2) What would solve it is making .length signed. Even though it should puristically be unsigned, and losing one bit could theoretically be a problem (because you couldn't allocate new byte[more than 2GB], the reality is that a 32-bit app can't allocate more than 2GB of RAM anyway (in Windows at least), and for 64-bit apps, losing that one bit is totally insignificant, unless someone expects a computer with 9 exabytes of ram anytime soon :) And the only place really needing change (and a trivial one) would be the array resizing code.. xs0The unsigned .length would fix this particular issue. But I think this might be one of those things where every solution seems to cause problems in some other area. For instance, in C, char is (usually) signed, which is good for small numbers, but when you use char as an array index, it walks off the front of the array for anything over 128. Lots and lots of C programs have bugs from this. It's actually up to the compiler writer whether char is signed in C... As a result, when writing crypto and hash function code in C, people usually have to cast the types to get reliable output. Such code is often peppered by casts, to force size and signedness to the right kind. Sometimes you see code where about half of these are completely unnecessary, but the author simply decided that *Let's never have to fix this thing again* and did a cast for every math operation. I realize this is all tangential, but I think that there are three or four ways to handle this and all of them probably have a few "gotchas". You'd almost have to write a big program with the signed version of .length to know if you got bit somehow. I checked a hunch - it looks like .length was changed to size_t in dmd 0.77. It's also mentioned in the portability section of the documentation: "The .length, .size, .sizeof, and .alignof properties will be of type size_t." Not sure what it was before .77. Kevin
Apr 11 2006
Kevin Bealer wrote:In article <e1ftsj$1spp$1 digitaldaemon.com>, xs0 says...I agree. This is not my official opinion, but I sometimes feel like the guy who invented Cruise Control, and this customer comes back and complains that what's the use, he still has to use the steering wheel.Regan Heath wrote:True, hadn't thought of that. I guess the correct thing (mathematically) is what is proposed by xs0 below (however, see comments).On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer <Kevin_member pathlink.com> wrote:In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...You're probably right.. now, if only I could train my brain to think of it that way round :)Take this code: void main() { //..open a file, read line for line, on each line: for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } There is a subtle bug. On all lines with a length of 0 or 1 it will give the following error: Error: ArrayBoundsError line_length.d(6) The problem is of course the statement "i < line.length-2". line.length is unsigned, and when you - 2 from an unsigned value.. well lets just say that it's bigger than the actual length of the line - 2. Of course there are plently of other ways to code this, perhaps using foreach, but that's not the point. The point is that this code _can_ be written and on the surface looks fine. Not even -w (warnings) spots the signed/unsigned problem. At the very least can we get a warning for this? ReganI see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.for(int i = 0; i+2 < line.length; i++) {Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.More generally, always do the arithmetic with the signed variables in cases like this if there is a possibility of wraparound.That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(But if the automatic conversions were specified to do uint->int that would also be fine with me.But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?I like this conceptually - it seems like definitely the correct approach for a language like python, but it has the unfortunate effect of adding a conditional to what is otherwise a one-instruction subtraction (< is a subtract in asm).Wouldn't it be somewhat the best, if signed/unsigned comparison actually worked correctly in mathematical sense? int a; uint b; (a<b) becomes (a<0 || cast(uint)a < b) (a==b) becomes (a>=0 && cast(uint)a == b) (a>b) becomes (a>=0 && cast(uint)a > b)One can also imagine a language with the following inequalities: +> unsigned greater-than +< unsigned less than -> signed greater than -< signed less than +>= unsigned greater-or-equal etc This would not replace the current >, <, but would essentially just be shorthand for existing expressions: (A +> B) is the same as (cast(uint)A > cast(uint)B) .. and so on, except that uint, ulong, ushort, etc would be chosen as needed. Is this worth doing? Maybe - it's not a big deal to me, but the signed/unsigned question does represent a common gotcha in certain expressions.It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. ReganUnfortunately, that doesn't solve the original problem if (a < b.length-2) What would solve it is making .length signed. Even though it should puristically be unsigned, and losing one bit could theoretically be a problem (because you couldn't allocate new byte[more than 2GB], the reality is that a 32-bit app can't allocate more than 2GB of RAM anyway (in Windows at least), and for 64-bit apps, losing that one bit is totally insignificant, unless someone expects a computer with 9 exabytes of ram anytime soon :) And the only place really needing change (and a trivial one) would be the array resizing code.. xs0The unsigned .length would fix this particular issue. But I think this might be one of those things where every solution seems to cause problems in some other area. I realize this is all tangential, but I think that there are three or four ways to handle this and all of them probably have a few "gotchas". You'd almost have to write a big program with the signed version of .length to know if you got bit somehow.
Apr 11 2006