
digitalmars.D.learn - pu$€le

strtr <strtr sp.am> writes:
What does this program print?

----
const char[] coins = `$€`;

void main()
{
	writef(`I made `);
	int stash = 0;
	scope(exit) writefln(stash,`.`);
	scope(failure) stash--;

	foreach(coin;coins)
	{
		scope(exit) stash++;
		scope(success) stash++;
		scope(failure) stash--;
		scope(failure) continue;
		writef(coin);
	}
}
----
Jul 17 2010
strtr <strtr sp.am> writes:
That is [dollar sign, euro sign]

I'm posting it because I expected the stash to be 3 lower.
Jul 17 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 18:59:18 strtr wrote:
> That is [dollar sign, euro sign]
>
> The reason I post it is because I expected the stash to be 3 lower.
Well, if I replace writef with write, I get "I made $€8". If I leave in the writef, though, I get this error:

/home/jmdavis/Downloaded_Files/dmd/dmd2/linux/bin/../../src/phobos/std/stdio.d(623): Error: static assert "You must pass a formatting string as the first argument to writef or writefln. If no formatting is needed, you may want to use write or writeln."
/home/jmdavis/Downloaded_Files/dmd/dmd2/linux/bin/../../src/phobos/std/stdio.d(1442): instantiated from here: writef!(const(char))
t.d(18): instantiated from here: writef!(const(char))

I'm not quite sure why you're using writef here, since writef requires a string as its first argument, and you're passing it something other than a string as the first argument.

- Jonathan M Davis
Jul 17 2010
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
> Well, if I replace writef with write, I get I made $€8. [...]
Either you have an awesome D emulator in your brain, or you cheated by actually running the code ;P
Jul 17 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 22:10:07 strtr wrote:
> Either you have an awesome D emulator in your brain, or you cheated by actually running the code ;P
Cheated? I thought that you were trying to figure out why the code wasn't doing what you expected it to be doing. So, of course I ran it. Though, if I had figured this out in my head, it would be more likely that I have an x86 emulator in my brain which can run dmd than that I have a D emulator in my brain, since I gave you the exact error message that dmd does.

- Jonathan M Davis
Jul 17 2010
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
> Though, it's more likely that I have an x86 emulator in my brain which can run dmd than that I have a D emulator in my brain if I figured this out in my head, since I gave you the exact error message that dmd does.
I don't find it more likely that you have an x86 emulator in your brain which then ran dmd to compile some code. I might even think that almost impossible ;P If you knew the compiler well enough, you might be capable of producing that error message with only the extra knowledge of where your files reside and your version and OS info.
Jul 17 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 23:01:28 strtr wrote:
 
> I don't find it more likely that you have an x86 emulator in your brain which then ran dmd to compile some code. I might even think that almost impossible ;P If you knew the compiler well enough, you might be capable of producing that error message with only the extra knowledge of where your files reside and your version and OS info.
Well, since both are pretty much impossible, I think that it's a moot point. I can believe that someone would know the compiler well enough to know what it was going to do in most situations and that they would have some idea as to what the error message would be, but if you want them to be at all precise, that just takes too much detail for anyone to remember. If they could do that, they'd be an insanely good programmer. - Jonathan M Davis
Jul 18 2010
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
> Well, since both are pretty much impossible, I think that it's a moot point. [...] If they could do that, they'd be an insanely good programmer.
The error only needed to be good enough for me to believe it was generated by a Linux compiler ;) I can probably give you satisfying errors for my own program. Sure, that is only a fraction of dmd, but then again, I'm only a mediocre programmer.
Jul 18 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 18:59:18 strtr wrote:
> That is [dollar sign, euro sign]
>
> The reason I post it is because I expected the stash to be 3 lower.
As to why it's not working right, change the foreach loop to this:

foreach(dchar coin; coins)
{
	...
}

Otherwise, instead of looping over each code point, you're looping over each code unit. char[] and string are encoded in UTF-8, so each char is a code unit, and 1 - 4 code units are put together to form a code point, which is what you'd normally think of as a character. The dollar sign takes one code unit in UTF-8, but the euro sign takes 3. So, you're looping 4 times instead of 2. By specifying dchar, you make the compiler decode the code units for you, so that you loop over each code point (i.e. character) rather than each code unit (i.e. char).

You should pretty much never deal with each individual char or wchar in a string or wstring. Do the conversion to dchar or dstring if you want to access individual characters. You can also use std.utf.stride() to skip to the next code unit which starts a code point, but you're still going to have to make sure that you convert it to a dchar to process it properly. Otherwise, only ASCII characters will work right (since they fit in a single code unit). Fortunately, foreach takes care of all this for us if we specify the element type as dchar.

As for why it's 4 rather than 2 in the corrected version (or 8 instead of 4 in the incorrect version), that's because you have both scope(exit) and scope(success) there. Both will be run, so both will increment stash, and you get double the increments that you seem to be expecting.

- Jonathan M Davis
Jul 17 2010
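A minimal D2 sketch of the char/dchar difference described above, assuming a current DMD and Phobos. The helper names `countCodeUnits`, `countCodePoints` and `decodeByStride` are made up for illustration. The sketch counts foreach iterations with and without a dchar element type, and it also walks the string by hand with `std.utf.stride` and `std.utf.decode`:

```d
import std.stdio : writeln;
import std.utf : decode, stride;

/// Iterations of foreach when the element type is left for the compiler to infer.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)          // c is immutable(char): one UTF-8 code unit per pass
        ++n;
    return n;
}

/// Iterations of foreach when the element type is declared as dchar.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)    // the compiler decodes UTF-8: one code point per pass
        ++n;
    return n;
}

/// Manual walk: stride() reports how many code units the code point at i
/// occupies, and decode() returns that code point as a dchar.
dchar[] decodeByStride(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable len = stride(s, i);   // length of the UTF-8 sequence starting at i
        size_t next = i;
        result ~= decode(s, next);      // decode() advances next past that sequence
        assert(next - i == len);
        i = next;
    }
    return result;
}

void main()
{
    string coins = "$€";
    writeln(countCodeUnits(coins), " code units, ",
            countCodePoints(coins), " code points");   // 4 code units, 2 code points
    writeln(decodeByStride(coins));                    // $€
}
```

With `"$€"` this should report 4 code units and 2 code points, because the euro sign is the three-byte sequence E2 82 AC in UTF-8.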
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
> As to why it's not working right, change the foreach loop to this: foreach(dchar coin; coins) { ... } [...]
Wasn't it obvious the puzzle was about exceptions, with half of the lines being scope guards and all? Part of the puzzle is realizing that chars aren't code points but code units. The other part is understanding the order of scope guard execution. I'm not sure whether the Linux or the Windows version of writef is the correct one, but here I get a nice UTF exception. (Or did you maybe use D2? If not, then we have a discrepancy bug.)
Jul 17 2010
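To see the order in which scope guards fire, here is a small D2 sketch; the function `guardOrder` is hypothetical and not from the thread. Guards run in reverse order of declaration when the scope is left, and on a normal exit both `scope(success)` and `scope(exit)` run, which is why each successful iteration of the puzzle loop adds 2 to stash:

```d
import std.stdio : writeln;

/// Logs which scope guards fire, and in what order, when a block is left.
string[] guardOrder(bool fail)
{
    string[] log;
    try
    {
        scope(exit)    log ~= "exit";      // declared first, so it runs last
        scope(success) log ~= "success";   // runs only if no exception escapes
        scope(failure) log ~= "failure";   // runs only if an exception escapes
        if (fail)
            throw new Exception("boom");
        log ~= "body";
    }
    catch (Exception)
    {
        log ~= "caught";
    }
    return log;
}

void main()
{
    writeln(guardOrder(false)); // ["body", "success", "exit"]
    writeln(guardOrder(true));  // ["failure", "exit", "caught"]
}
```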
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 21:48:30 strtr wrote:
> [...] I'm not sure whether the Linux or the Windows version of writef is the correct one, but here I get a nice UTF exception. (Or did you maybe use D2? If not, then we have a discrepancy bug.)
All I ever use is D2. I have no idea what D1 would be doing differently. In D2's writef(), the "f" stands for format or formatted, and you have to pass a "format" string, as you would to printf, in order for it to work. write() is the version which doesn't require a format string. I am using Linux, if that changes anything, but as far as I can tell, you're using writef() incorrectly.

In any case, I obviously don't quite get what you're trying to do, since (at least in D2) I don't believe that you have any functions in that loop which will ever throw an exception. If you were using File's writef() because you were writing to a file, then that would be different. But writef() by itself writes to stdout and won't throw.

Now, as you're using D1, that may change things. But you gave no indication that you were using D1. In future questions, you should probably be more specific about that, since I think that most people around here are using D2, and they will likely assume that you're using D2 unless you say otherwise. I certainly did.

- Jonathan M Davis
Jul 17 2010
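A short D2 sketch of the write/writef distinction Jonathan describes, assuming current Phobos. `madeLine` is a hypothetical helper that builds the same text with `std.format.format` so it can be checked:

```d
import std.format : format;
import std.stdio : write, writef, writeln;

/// Builds the text that writef("I made %s.", stash) would print, so it can be checked.
string madeLine(int stash)
{
    return format("I made %s.", stash);
}

void main()
{
    int stash = 4;
    write("I made ", stash, ".\n");   // write(): every argument is printed as-is
    writef("I made %s.\n", stash);    // writef(): the first argument is the format string
    writeln(madeLine(stash));         // std.format.format builds the same string
}
```

All three lines print the same text. Only writef and format interpret their first argument as a format string.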
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
> [...] Now, as you're using D1, that may change things. But you gave no indication that you were using D1. [...]
I think I'll start subject tagging my posts: [D1/D2]
std.stdio in D1 doesn't mention a write function, and feeding the writef function an illegal UTF string will result in a UTF exception.
With this information, what do you think the output should be?
Jul 17 2010
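D1 is not shown here. However, the reason a per-char writef could fail can be sketched in D2 with `std.utf.validate`, which throws `UTFException` for malformed UTF-8. This assumes, as strtr reports, that D1's writef validates its argument in a similar way. Under that assumption, only '$' survives being printed one code unit at a time. The helpers `isValidUtf8` and `invalidLoneUnits` are hypothetical:

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// True if s is well-formed UTF-8; std.utf.validate throws UTFException otherwise.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// How many code units of s are invalid UTF-8 when taken on their own,
/// which is how a char-typed foreach hands them out.
size_t invalidLoneUnits(string s)
{
    size_t bad = 0;
    foreach (char c; s)
    {
        char[1] single = [c];
        if (!isValidUtf8(single[]))
            ++bad;
    }
    return bad;
}

void main()
{
    writeln(invalidLoneUnits("$€"));   // 3: only '$' is valid on its own
}
```

The count of 3 matches the three code units of the euro sign, each of which Jonathan's analysis assumes throws.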
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 22:52:21 strtr wrote:
> I think I'll start subject tagging my posts: [D1/D2]
> std.stdio in D1 doesn't mention a write function and feeding the writef
> function an illegal UTF string will result in a UTF exception.
> With this information, what do you think the output should be?
Well, I certainly think that throwing an exception for bad UTF-8 values makes sense, though D2's docs for writef say nothing about exceptions, and on my machine, running Linux, they just fail to print anything. Throwing an exception would likely have been better.

In any case, I would have expected it to increment stash by 2 on the first loop, because $ would be valid and would hit both scope(exit) and scope(success). After that... That continue makes me awfully nervous. You'd expect the scope statements to be run in reverse order, with continue and then stash--. However, running that continue statement would skip the other scope statements...

I think that we'll have to lower the body of that foreach loop to have any clue what's going on here. It should come out to something like this, I would think:

----
const char[] coins = `$€`;

void main()
{
	writef(`I made `);
	int stash = 0;
	scope(exit) writefln(stash,`.`);
	scope(failure) stash--;

	foreach(coin;coins)
	{
		try
		{
			try
			{
				try
				{
					try
					{
						writef(coin);
					}
					catch
					{
						continue;
						throw;
					}
				}
				catch
				{
					stash--;
					throw;
				}

				stash++;
			}
			catch
			{
				throw;
			}
		}
		finally
		{
			stash++;
		}
	}
}
----

That being the case, the exception from writef() will always get eaten by the continue, because the throw that rethrows the exception would never occur. Normally, code like that should result in a compilation error, but it might not, given that it's the compiler creating the try-catch block. My guess is that this is a bug in dmd. It makes no sense to me to allow any kind of goto, break, or continue statements in a scope statement's body.

Regardless, that continue would mean that the first stash++ would be skipped, but the second would still happen because it's in a finally block. That means that each of the 3 bad UTF-8 values which make up the euro symbol would increment stash once. So, the overall result would then be 5.
It's possible that I lowered those scope statements incorrectly, but it looks to me like that's what the code should be doing. Regardless, continue in a scope statement should be an error.

- Jonathan M Davis
Jul 18 2010
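As a rough check of the lowering idea, here is a D2 sketch that compares `scope(exit)`/`scope(failure)` with a hand-written equivalent. In this sketch, `scope(exit)` behaves like try/finally, and `scope(failure)` behaves like a catch that runs its statement and rethrows. The sketch deliberately leaves out `scope(success)` and the `continue`, which are the disputed parts. `withGuards` and `lowered` are illustrative names:

```d
import std.stdio : writeln;

/// Version written with scope guards.
int withGuards(bool fail)
{
    int stash = 0;
    try
    {
        scope(exit) stash++;        // runs however the block is left
        scope(failure) stash--;     // runs only when an exception escapes the block
        if (fail)
            throw new Exception("bad UTF-8");
    }
    catch (Exception) {}
    return stash;
}

/// Hand-lowered version: the guard declared first becomes the outermost wrapper.
int lowered(bool fail)
{
    int stash = 0;
    try
    {
        try                          // scope(exit) -> try/finally
        {
            try                      // scope(failure) -> catch, run guard, rethrow
            {
                if (fail)
                    throw new Exception("bad UTF-8");
            }
            catch (Throwable t)
            {
                stash--;
                throw t;             // D has no bare `throw;`, so rethrow the caught object
            }
        }
        finally
        {
            stash++;
        }
    }
    catch (Exception) {}
    return stash;
}

void main()
{
    writeln(withGuards(false), " ", lowered(false)); // 1 1
    writeln(withGuards(true), " ", lowered(true));   // 0 0
}
```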
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 00:38:38 Jonathan M Davis wrote:
> [...] It's possible that I lowered those scope statements incorrectly, but it looks to me like that's what the code should be doing. Regardless, continue in a scope statement should be an error.
Hmm. Well, it seems that throw by itself is not legal D. You have to do something like

catch(Exception e)
{
	throw e;
}

But in any case,

catch(Exception e)
{
	continue;
	throw e;
}

compiles just fine. That seems to me like it shouldn't, though, since then throw e; is an unreachable statement. In any case, I'll file a bug report on this.

- Jonathan M Davis
Jul 18 2010
bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:
> You should pretty much never deal with each individual char or wchar in a
> string or wstring. Do the conversion to dchar or dstring if you want to access
> individual characters. You can also use std.utf.stride() to iterate over to the
> next code unit which starts a code point, but you're still going to have to make
> sure that you convert it to a dchar to process it properly. Otherwise, only
> ASCII characters will work right (since they fit in a single code unit).
> Fortunately, foreach takes care of all this for us if we specify the element
> type as dchar.
I am starting to think that, for safety, foreach on a string should yield dchars by default, and yield chars only on request:

foreach(c; "hello") => dchars
foreach(char c; "hello") => chars

Bye,
bearophile
Jul 18 2010
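For reference, the following D2 sketch shows what foreach actually infers, assuming a modern compiler and Phobos. `byCodeUnit` and `byDchar` from `std.utf` were added to Phobos long after this thread. They make the code-unit/code-point choice explicit at the call site, rather than through the type of the loop variable:

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "h€";

    foreach (c; s)          // no element type: the compiler picks immutable(char)
    {
        static assert(is(typeof(c) == immutable(char)));
    }

    foreach (char c; s)     // explicit char: still one code unit per iteration
    {
        static assert(is(typeof(c) == char));
    }

    foreach (dchar c; s)    // explicit dchar: decoded code points
    {
        static assert(is(typeof(c) == dchar));
    }

    // Present-day Phobos ranges that state the choice explicitly.
    assert(s.byCodeUnit.walkLength == 4);   // 'h' plus the three bytes of '€'
    assert(s.byDchar.walkLength == 2);      // 'h' and '€'
}
```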
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 04:13:03 bearophile wrote:
> I am starting to think that, for safety, foreach on a string should yield dchars by default, and yield chars only on request:
> foreach(c; "hello") => dchars
> foreach(char c; "hello") => chars
That's probably a good idea, though for people to write safe string code in the general case, they're really going to have to understand the differences between char, wchar, and dchar, as well as what that means for their code. It's just way too easy to shoot yourself in the foot once you start trying to manipulate single characters, and I don't think that there's really a way to fix that unless you forced dchar for everything, which definitely isn't the D way to do things (though IIRC, that's essentially what Java did).

Still, this particular case might be better off defaulting to dchar, since dchar is already handled specially in foreach anyhow. My only real problem with that is the fact that while dchar is handled specially, it's done with a conversion, and making foreach over a string default to dchar instead of char breaks how foreach works normally.

It seems to me more like a warning would be a better idea. If they really want char, they can specify char, but the warning would make them aware of the issue so that they specify the correct type (be it char or dchar or whatever) rather than leaving it blank. That way, foreach retains its normal semantics, and the problem is still averted.

- Jonathan M Davis
Jul 18 2010
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
> [...] It seems to me more like a warning would be a better idea. If they really want char, they can specify char, but the warning would warn them so that they'd be aware of the issue and specify the correct type (be it char or dchar or whatever) rather than leaving it blank. [...]
I agree with the warning. A good warning would get people to read up on UTF. And if you really want to have char, you'll need to cast:

foreach(cast(char)c; chars)
Jul 18 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 06:16:09 strtr wrote:
> I agree with the warning. A good warning would get people to read up on
> UTF. And if you really want to have char you'll need to cast:
> foreach(cast(char)c; chars)
Actually, the cast would be totally unnecessary. Putting

foreach(char c; chars)

would be enough. Forcing a cast would change how foreach normally works. I'm not even sure that you can legally put a cast there like that. What we'd want to disallow would be

foreach(c; chars)

As long as the programmer puts the element type, we can assume that they know what they're doing. But warning in cases where they don't put it would catch a large number of errors in iterating over strings and wstrings.

In any case, I filed a bug report for it: http://d.puremagic.com/issues/show_bug.cgi?id=4483
Jul 18 2010
strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
 On Sunday 18 July 2010 06:16:09 strtr wrote:
 I agree with the warning. A good warning would get people to read up on
 UTF. And if you really want to have char you'll need to cast:
 foreach(cast(char)c; chars)
Actually, the cast would be totally unnecessary. Putting foreach(char c; chars) would be enough. Forcing a cast would change how foreach normally works. I'm not even sure that you can legally put a cast there like that. What we'd want to disallow would be foreach(c; chars) As long as the programmer puts the element type, we can assume that they know what they're doing. But warning in cases where they don't put it would catch a large number of errors in iterating over strings and wstrings. In any case, I filed a bug report for it: http://d.puremagic.com/issues/show_bug.cgi?id=4483
As a habit I tend to put types everywhere; only recently have I started using auto. Conceptually, it just looked so obvious that foreach(char c; chars) would iterate over characters. And you can go on programming like that (in English) for quite a while without getting any errors whatsoever. The moment I finally used a single non-ASCII char, I noticed something going wrong and had to go back and fix quite a few bugs. And the worst part is, I wasn't the only one making this mistake. Well, what I wanted to say was that I, at least, won't assume the programmer knows what he's doing only because he adds a type. I totally agree that putting a cast there is probably not really a solution (or legal). Warnings for all non-dchar types. Is there anybody using foreach(c;chars) || foreach(char c;chars) correctly (in a way which couldn't be done with ubytes)?
Jul 18 2010
parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 10:59:21 strtr wrote:
 I totally agree that putting a cast there is probably not really a solution
 (or legal).
 Warnings for all non-dchar types.
 Is there anybody using foreach(c;chars) || foreach(char c;chars) correctly
 (which couldn't be done with ubytes)?
As soon as someone wants to process code units (for whatever reason) instead of code points, using char and wchar makes sense. Now, I suppose that you could use ubyte and ushort in such circumstances, but I'm sure that _someone_ will be looking to do it (and there's a decent chance that Phobos does it), and I don't think that it would go over very well to give them lots of warnings. The issue, of course, is that in the common case anything other than dchar in a foreach over a string type would be a logic error in your code. D does a lot to make things safer, but I don't think that there are very many cases where things like this are special-cased in order to stop errors. The programmer is expected to have some clue as to what they're doing, and the general trend in D, from what I can tell, is to not specify a type unless you have to, so it would be perfectly normal to expect the programmer to have really meant char or wchar if they put it explicitly. I don't know. The truth is that, on the one hand, programmers _need_ to understand how D deals with strings and Unicode, or they _will_ have bugs. There's no getting around that. So, cases which even someone who knows what they're doing is likely to screw up on (like forgetting the type on the foreach) should have warnings associated with them if it's reasonable. However, expecting the compiler to catch each and every instance where a programmer is likely to shoot themself in the foot with Unicode and strings is not particularly reasonable. The compiler can't always save the programmer from their own ignorance or stupidity. If anything, that would indicate that making errors _easier_ to run into in code written by someone who doesn't understand how D deals with Unicode would be a good idea. It should be the case that competent D programmers will be able to use strings easily.
But it's likely better if the ones who don't know what they're doing shoot themselves in the foot sooner rather than later, so that they learn what they need to learn about Unicode and _become_ competent D programmers. A competent D programmer will not put an explicit char in a foreach loop unless that's what they really mean. The only issue there is that char could be a typo for dchar, but that sort of typo would be rather hard to defend against in general. So, certainly on the surface, it would seem overkill to effectively disallow char and wchar in foreach loops and force ubyte and ushort. Still, this is an area which isn't all that hard to screw up on, so I don't know what the best solution is. When it comes down to it, you can't always hold the programmer's hand. They need to be informed and responsible. But on the other hand, you do want to make it harder for them to make stupid mistakes, since even competent programmers do make stupid mistakes at least some of the time. A warning for a foreach loop over strings where the element type is not specified is a start. If you have a solid suggestion which would reduce errors in the common case without unduly restraining folks who really know what they're doing, then create a bug report for it with the severity of enhancement. Walter and company will decide what works best with what they intend for D. Your suggestion may or may not be implemented, but it's worth a try. - Jonathan M Davis
Jul 18 2010
parent reply strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
 On Sunday 18 July 2010 10:59:21 strtr wrote:
 I totally agree that putting a cast there is probably not really a solution
 (or legal).
 Warnings for all non-dchar types.
 Is there anybody using foreach(c;chars) || foreach(char c;chars) correctly
 (which couldn't be done with ubytes)?
As soon as some wants to process code units (for whatever reason) instead of code points, then using char and wchar makes sense. Now, I suppose that you could use ubyte and ushort in such circumstances, but I'm sure that _someone_ will be looking to do it, and (there's a decent chance that phobos does it) I don't think that it would go over very well to give them lots of warnings. The issue, of course, is that the common case is that anything other than dchar in a foreach over string types would be a logic error in your code. D does a lot to make things safer, but I don't think that there are very many cases where things like this are special-cased in order to stop errors. The programmer is expected to have some clue as to what they're doing, and the general trend in D from what I can tell is to not use a type unless you have to, so it would be perfectly normal to expect the programmer to have really meant char or wchar if they put it explicitly. I don't know. The truth is that on the one hand, programmers _need_ to understand how D deals with strings and unicode, or they _will_ have bugs. There's no getting around that. So, cases where someone who knows what they're doing is likely to screw up on (like forgetting the type on the foreach) should have warnings associated with them if it's reasonable. However, expecting the compiler to catch each and every instance that a programmer is likely to shoot themself in the foot with unicode and strings is not particularly reasonable. The compiler can't always save the programmer from their own ignorance or stupidity. If anything, that would indicate that making errors _easier_ in code which someone who doesn't understand how D deals with unicode would write would be a good idea. It should be the case that competent D programmers will be able to use strings easily. 
But it's likely better if the ones who don't know what they're doing shoot themselves in the foot earlier rather than sooner so that they learn what they need to learn about unicode and _become_ competent D programmers.
I actually knew about Unicode, but I mistakenly thought a char to be a code point (thus variable in size). Somehow I missed any documentation telling me otherwise. Now that I look for it, it actually says:

char | unsigned 8 bit UTF-8

Maybe some stronger pointers in the documentation would help.
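To make the "char | unsigned 8 bit UTF-8" entry concrete, here is a small sketch, assuming a current D2 compiler and Phobos (std.utf.count returns the number of code points in a string): a char is one fixed-size UTF-8 code unit, so a single character such as the euro sign occupies several of them.

```d
import std.stdio;
import std.utf : count;

void main()
{
    // char, wchar and dchar are fixed-size code units of UTF-8, UTF-16 and UTF-32.
    static assert(char.sizeof == 1 && wchar.sizeof == 2 && dchar.sizeof == 4);

    string  s = "€";  // UTF-8:  E2 82 AC  -> 3 code units
    wstring w = "€"w; // UTF-16: 20AC      -> 1 code unit
    dstring d = "€"d; // UTF-32: 000020AC  -> 1 code unit

    writeln(s.length, " ", w.length, " ", d.length); // 3 1 1
    writeln(count(s)); // 1: std.utf.count decodes and counts code points
}
```

Note that .length always counts code units, never characters, which is exactly why a char-typed foreach over "€" runs three times.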
 A competent D programmer will not put an explicit char in a foreach loop unless
 that's what they really mean. The only issue there is that char could be a type
 for dchar. But that sort of typo would be rather hard to defend against in
 general. So, certainly on the surface, it would seem overkill to effectively
 disallow char and wchar in foreach loops and force ubyte and ushort.
 Still, this is an area which isn't all that hard to screw up on, so I don't know
 what the best solution is. When it comes down to it, you can't always hold the
 programmers hand. They need to be informed and responsible. But on the other
 hand, you do want to make it harder for them to make stupid mistakes, since even
 competent programmers do make stupid mistakes at least some of the time.
 A warning for a foreach loop over strings where the element type is not specified
 is a start. If you have a solid suggestion which would reduce errors in the
 common case without unduly restraing folks who really know what they're doing,
 then create a bug report for it with the severity of enhancement. Walter and
 company will decide what works best with what they intend for D. Your suggestion
 may or may not be implemented, but it's worth a try.
 - Jonathan M Davis
I agree with your bug-report.
Jul 18 2010
parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 17:15:15 strtr wrote:
 
 I actually knew about unicode, but I mistakenly thought a char to be a code
 point (thus variable in size).
 Somehow I missed any documentation telling me otherwise.
 Now that I look for it it actually says:
 char | 	unsigned 8 bit UTF-8
 
 Maybe some stronger pointers in the documentation would help.
 
The section in TDPL on strings is excellent. A good article on Unicode on D's site would be a good addition, though. While some of the documentation is good, it does tend to be fairly sparse. - Jonathan M Davis
Jul 18 2010
prev sibling parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 00:46:36 Jonathan M Davis wrote:
 I'll file a bug report
 
 - Jonathan M Davis
Wait. That's not the problem. Or at least, that's not the problem that needs to be reported. The problem is that we're not compiling with -w. If you compile with -w, then statements such as scope(failure) continue; are flagged as unreachable statements, and since -w turns warnings into errors, the program fails to compile. So, I filed a bug report on the fact that such warnings aren't reported without -w (though the code would still compile, since they're warnings rather than errors): http://d.puremagic.com/issues/show_bug.cgi?id=4482 Regardless, what you're trying to do is clearly an error, and compiling with -w will show that. - Jonathan M Davis
Jul 18 2010
next sibling parent strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
 On Sunday 18 July 2010 00:46:36 Jonathan M Davis wrote:
 I'll file a bug report

 - Jonathan M Davis
Wait. That's not the problem. Or at least, that's not the problem that needs to be reported. The problem is that we're not compiling with -w. If you compile with -w, then statements such as scope(failure) continue; won't compile due to being unreachable statements. But if you compile with -w, then the compiler flags it as an error, and the program fails to compile. So, I filed a bug report on the fact that such warnins aren't reported without -w (though they would still compile since they're warnings rather than errors): http://d.puremagic.com/issues/show_bug.cgi?id=4482 Regardless, what you're trying to do is clearly an error, and compiling with -w will show that. - Jonathan M Davis
This should be upped to an error, as -w only shows it as unreachable (without a line number :( ). I don't think unreachable code is an error, though. I often have unreachable code when debugging, e.g.:

default:
	assert(false); // temp
	...
	break;
Jul 18 2010
prev sibling parent reply strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
 On Sunday 18 July 2010 00:46:36 Jonathan M Davis wrote:
 I'll file a bug report

 - Jonathan M Davis
Wait. That's not the problem. Or at least, that's not the problem that needs to be reported. The problem is that we're not compiling with -w. If you compile with -w, then statements such as scope(failure) continue; won't compile due to being unreachable statements. But if you compile with -w, then the compiler flags it as an error, and the program fails to compile. So, I filed a bug report on the fact that such warnins aren't reported without -w (though they would still compile since they're warnings rather than errors): http://d.puremagic.com/issues/show_bug.cgi?id=4482 Regardless, what you're trying to do is clearly an error, and compiling with -w will show that. - Jonathan M Davis
I don't agree with this bug report, for two reasons.
1. Warnings are supposed to be warnings, not errors. If you want to see those warnings you'll use -w. What you probably want is for dmd to have a -!w flag instead (warnings by default, disable with flag).
2. In this particular example, the problem is not that the warning isn't shown without -w, but that the warning is incorrect and scope(failure) shouldn't be able to catch the exception.

Here is a smaller example of the same problem [D1]:
----
void main()
{
	for(int i=0;i<10;i++)
	{
		scope(failure){
			writefln("continue");
			continue;
		}
		//scope(failure) writefln("fail");
		writefln(i);
		throw new Exception(format(i));
	}
}
----

Enable warnings and you'll get the same unreachable warning. But which statement is unreachable? When you compile this without -w, it happily prints all ten i's and continues.
Jul 18 2010
parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 17:36:58 strtr wrote:
 
 I don't agree with this bug report because of two reasons.
 1. Warnings are supposed to be warnings, not errors. If you want to see
 those warnings you'll use -w.
 What you probably want is for the dmd to have a -!w flag instead (warnings
 by default, disable with flag)
 2. In this particular example, the problem is not that the warning isn't
 shown without -w, but that the warning is incorrect and scope(failure)
 shouldn't be able to catch the exception.
 
 Here is a smaller example of the same problem[D1]:
 ----
 void main()
 {
 	for(int i=0;i<10;i++)
 	{
 		scope(failure){
 			writefln("continue");
 			continue;
 		}
 		//scope(failure) writefln("fail");
 		writefln(i);
 		throw new Exception(format(i));
 	}
 }
 ----
 
 Enable warnings and you'll get the same unreachable warning, but which
 statement is unreachable as when you compile this without -w it happily
 prints all ten i's and continues.
With any other compiler that I've ever used, it prints warnings normally. It may or may not have a way to make them errors, but it will print them normally and compile with them. dmd won't display warnings without -w, but when you use -w, it instantly makes them errors. There needs to be a middle ground where warnings are reported and not flagged as errors. As for unreachable code being an error, that's debatable. Obviously, dmd doesn't consider it one. Personally, I hate the fact that javac treats it as one in Java. I _want_ that to be a warning. I'd like to be warned about it, and I don't want it to be in production code, but it happens often enough when developing that I don't want to have to fix it to get code to compile. As such, a warning makes perfect sense. However, when you combine that with the fact that dmd doesn't even report warnings unless it treats them as errors, it becomes easy to miss. - Jonathan M Davis
Jul 18 2010
parent reply strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
 On Sunday 18 July 2010 17:36:58 strtr wrote:
 I don't agree with this bug report because of two reasons.
 1. Warnings are supposed to be warnings, not errors. If you want to see
 those warnings you'll use -w.
 What you probably want is for the dmd to have a -!w flag instead (warnings
 by default, disable with flag)
 2. In this particular example, the problem is not that the warning isn't
 shown without -w, but that the warning is incorrect and scope(failure)
 shouldn't be able to catch the exception.

 Here is a smaller example of the same problem[D1]:
 ----
 void main()
 {
 	for(int i=0;i<10;i++)
 	{
 		scope(failure){
 			writefln("continue");
 			continue;
 		}
 		//scope(failure) writefln("fail");
 		writefln(i);
 		throw new Exception(format(i));
 	}
 }
 ----

 Enable warnings and you'll get the same unreachable warning, but which
 statement is unreachable as when you compile this without -w it happily
 prints all ten i's and continues.
With any other compiler that I've ever used, it prints warnings normally. It may or may not have a way to make then errors, but it will print them normally and compile with them. dmd won't display warnings with -w, but when you use -w, it instantly makes them errors. There needs to be a middle ground where warnings are reported and not flagged as errors.
I would use this middle ground by default, if available.
 As for unreachable code being an error, that's debatable. Obviously, dmd doesn't
 consider it one. Personally, I hate the fact that javac does with Java. I _want_
 that to be a warning. I'd like to be warned about it, and I don't want it to be
 in production code, but it happens often enough when developing, that I don't
 want to have to fix it to get code to compile. As such, a warning makes perfect
 sense.
I'm not sure whether you missed my point or are simply thinking out loud about unreachable code being a warning. My point was that the unreachable warning was wrong, as there is no unreachable code.
 However, when you combine that with the fact that dmd doesn't even report
 warnings unless it treats them as errors, it becomes easy to miss.
 - Jonathan M Davis
Jul 18 2010
parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 19:14:11 strtr wrote:
 I'm not sure whether you missed my point or are simple thinking out loud
 about unreachable code being a warning.
 My point was that the unreachable warning was wrong as there is no
 unreachable code.
Except that there _is_. You just can't see it. scope(X) creates a try-catch block. So,

	scope(exit) whatever;
	/* code */

becomes

	try { /* code */ } finally { whatever; }

while

	scope(success) whatever;
	/* code */

becomes

	/* code */
	whatever;

and

	scope(failure) whatever;
	/* code */

becomes

	try { /* code */ } catch(Exception e) { whatever; throw e; }

So, something like

	scope(failure) continue;
	/* code */

becomes

	try { /* code */ } catch(Exception e) { continue; throw e; }

The throw statement is then unreachable. So, the warning is correct. The problem is that it's not clear. Ideally, you would have a warning which specifically mentions the fact that you can't do that sort of thing in a scope statement. Unless the programmer is thinking about what exactly scope() becomes, the unreachable statement warning will be confusing. So, that's a problem. It is, however, correct. It probably merits its own bug report. - Jonathan M Davis
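The rewriting described above can be checked with a small sketch, assuming a current D2 compiler. It compares a function using scope guards with a hand-written try/catch/finally version; the names guarded, lowered, run and events are hypothetical, and the hand-written version is only an approximation of the idea, not necessarily the exact code dmd generates (for instance, which exception type is caught is an implementation detail). Scope guards run in reverse order of declaration, so the failure guard fires before the exit guard.

```d
import std.stdio;

string[] events; // hypothetical module-level log of what ran, in order

// Scope guards: they execute in reverse order of declaration.
void guarded(bool fail)
{
    scope(exit)    events ~= "exit";
    scope(success) events ~= "success";
    scope(failure) events ~= "failure";
    events ~= "body";
    if (fail)
        throw new Exception("boom");
}

// Hand-written approximation: failure -> catch and rethrow,
// success -> code after the body, exit -> finally.
void lowered(bool fail)
{
    try
    {
        try
        {
            events ~= "body";
            if (fail)
                throw new Exception("boom");
        }
        catch (Exception e)
        {
            events ~= "failure";
            throw e; // a continue placed before this line would make it unreachable
        }
        events ~= "success";
    }
    finally
    {
        events ~= "exit";
    }
}

// Runs f, records whether the exception escaped, and returns the log.
string[] run(void function(bool) f, bool fail)
{
    events = null;
    try
    {
        f(fail);
    }
    catch (Exception)
    {
        events ~= "caught";
    }
    return events.dup;
}

void main()
{
    writeln(run(&guarded, false)); // ["body", "success", "exit"]
    writeln(run(&guarded, true));  // ["body", "failure", "exit", "caught"]
    writeln(run(&lowered, true));  // ["body", "failure", "exit", "caught"]
}
```

The comment on the rethrow marks the spot that matters for this thread: in the expansion of scope(failure) continue;, the continue sits in that catch block ahead of the rethrow, so the rethrow can never run.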
Jul 18 2010
parent reply strtr <strtr sp.am> writes:
== Quote from Jonathan M Davis (jmdavisprog gmail.com)'s article
 On Sunday 18 July 2010 19:14:11 strtr wrote:
 I'm not sure whether you missed my point or are simple thinking out loud
 about unreachable code being a warning.
 My point was that the unreachable warning was wrong as there is no
 unreachable code.
Except that there _is_. You just can't see it. scope(X) creates a try-catch block. So, scope(exit) whatever; /* code */ becomes try { */ code */ } finally { whatever; } scope(success) whatever; /* code */ becomes /* code */ whatever; scope(failure) whatever; /* code */ becomes try { /* code */ } catch(Exception e) { whatever; throw e; } So, something like scope(failure) continue; /* code */ becomes try { /* code */ } catch(Exception e) { continue; throw e; } The throw statement is then unreachable. So, the warning is correct. The problem is that it's not clear. Ideally, you would have a warning which specifically mentions the fact that you can't do that sort of thing in a scope statement. Unless the programmer is thinking about what exactly scope() becomes, the unreachable statement warning will be confusing. So, that's a problem. It is, however, correct. It probably merits its own bug report. - Jonathan M Davis
Thanks for the explanation! But what you are talking about is implementation; nowhere in the spec does it say anything like this (or did I just miss it? :). I could find only this about scope(failure): "scope(failure) executes NonEmptyOrScopeBlockStatement when the scope exits due to exception unwinding." So at the very least it is a documentation bug: it should say something about catching the exception and then re-throwing it, or explain that scope guards are sugar for re-throwing try statements.
Jul 18 2010
parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 18 July 2010 19:47:37 strtr wrote:
 Thanks for the explanation!
 But what you are talking about is implementation, nowhere in the spec does
 it say anything like this (or did I just miss it :).
 I could find only this about scope(failure):
 "scope(failure) executes NonEmptyOrScopeBlockStatement  when the scope
 exits due to exception unwinding."
 So at the very least it is a documentation bug:
 It should say something about catching the exception and then re-throwing
 it, or explain that scope guards are sugar for re-throwing try statements
Bug report created: http://d.puremagic.com/issues/show_bug.cgi?id=4484
Jul 18 2010