digitalmars.D - Signed word lengths and indexes
- bearophile (10/11) Jun 14 2010 I have found a Reddit discussion a few days old:
- Byron Heads (3/11) Jun 14 2010 Isn't this why D has foreach and foreach_reverse?
- bearophile (5/6) Jun 14 2010 If you mean the exact problem the original article was talking about, th...
- div0 (49/64) Jun 14 2010 Well for a start, you lose half your addressable memory.
- Walter Bright (3/9) Jun 14 2010 Yes, many times.
- Alex Makhotin (7/15) Jun 15 2010 A link on the discussion or examples of unpractical explicit cast would
- Walter Bright (4/10) Jun 15 2010 I don't have one, the message database of this n.g. is enormous. You can...
- Andrei Alexandrescu (5/18) Jun 15 2010 The discussions about polysemous types should be relevant. We tried to
- Steven E. Harris (5/7) Jun 14 2010 What does "length" represent here? It's not clear to me how "i"
- BCS (7/14) Jun 14 2010 My thought exactly.
- Steven Schveighoffer (7/19) Jun 15 2010 i is unsigned, and therefore can never be less than 0. It's actually a ...
- bearophile (4/6) Jun 15 2010 Clever code is bad. It must be minimized. In some rare situations it bec...
- Steven Schveighoffer (12/19) Jun 15 2010 Clever code is bad? What are you smoking? In my opinion, clever code
- Pelle (6/26) Jun 15 2010 Clever code is bad because you have to think a couple of times more
- Adam Ruppe (6/8) Jun 15 2010 That's wrong rather than clever though.
- Pelle (5/14) Jun 15 2010 Using the length is meaningless, any uint >= length will work just as
- Lars T. Kyllingstad (12/30) Jun 15 2010 As long as you only decrease by one, your trick will work just as well. ...
- Pelle (3/27) Jun 15 2010 The same can be said if you use length, and length happens to be big.
- Adam Ruppe (7/11) Jun 15 2010 The nice thing about length is x < arr.length can be read as "oh, stop
- Steven Schveighoffer (5/20) Jun 15 2010 No, it's not.
- BCS (19/20) Jun 15 2010 shouldn't != doesn't.
- Steven Schveighoffer (14/47) Jun 15 2010 This is a temporary problem. Once you get used to any particular coding...
- BCS (15/47) Jun 15 2010 People cutting you off on the road is a temporary problem, once you tell...
- Steven Schveighoffer (13/55) Jun 15 2010 In fact, I meant the specific you. Once a person gets used to any
- BCS (14/55) Jun 15 2010 Yes, once Pelle (sorry to pick on you) gets used to any particular codin...
- Don (4/71) Jun 15 2010 I would say, if you have trouble understanding that trick, you should
- Simen kjaeraas (8/13) Jun 15 2010 Hardwired? Hardly. However, continuous number systems are ubiquitous,
- BCS (10/40) Jun 15 2010 Clever in my book normally equates to: requires extra thought to creat...
- Steven Schveighoffer (13/42) Jun 15 2010 Clever code does not have to be hard to understand. In this case, it's ...
- Adam Ruppe (9/9) Jun 15 2010 Not referring to anyone in particular, but it just occurred to me:
- Graham Fawcett (11/23) Jun 15 2010 Well, conceptually maybe. :) Python integers don't overflow, they
- BCS (14/51) Jun 15 2010 Yup, and a second is several times too long for code that accomplishes so...
- Justin Johansson (6/14) Jun 15 2010 I agree with bearophile's sentiments.
- BCS (6/32) Jun 15 2010 It's /too/ clever. That's the problem. If you haven't seen it in a while,...
- Steven Schveighoffer (6/32) Jun 15 2010 This is easily solved - put in a comment. I frequently put comments in ...
- BCS (10/46) Jun 15 2010 All else being equal, code that *requires* comments to understand is in...
- Steven Schveighoffer (17/30) Jun 15 2010 Code should *always* have comments. I hate reading code that doesn't ha...
- BCS (21/52) Jun 15 2010 I agree. It should have comments. But if stripping them out would render...
- Steven Schveighoffer (16/37) Jun 15 2010 Because the alternatives are uglier, and it's not as easy to see subtle ...
- BCS (17/48) Jun 15 2010 If *any* user *ever* has to ask a question about how code, that does som...
- Kagamin (2/30) Jun 15 2010 foreach_reverse, lol
- bearophile (15/37) Jun 14 2010 This matters mostly with char/ubyte/byte arrays on 32 bit systems. If yo...
- Walter Bright (15/45) Jun 14 2010 D provides powerful abstractions for iteration; it is becoming less and ...
- Ellery Newcomer (17/75) Jun 14 2010 I think the problem is people don't generally think of fixnums as
- Walter Bright (4/15) Jun 14 2010 Like I said, this didn't appear in Python until quite recently (3.0), so...
- bearophile (4/6) Jun 14 2010 You are wrong, see my other answer.
- bearophile (25/39) Jun 14 2010 Modern languages must understand that there are other forms of safety be...
- bearophile (5/8) Jun 14 2010 OK. Then just removing as many unsigned words as possible from normal co...
- Walter Bright (31/90) Jun 14 2010 D's safe mode, integer overflow *cannot* lead to memory corruption. So w...
- Pelle (3/12) Jun 15 2010 A long in pythonic would be a BigInt in D, so no overflows. Python
- bearophile (18/29) Jun 15 2010 I see. We can drop this, then.
- bearophile (3/4) Jun 15 2010 Sorry, I meant they can use unsigned words.
- Walter Bright (2/4) Jun 15 2010 Why?
- bearophile (8/11) Jun 15 2010 This is partially off-topic to the topic of this thread.
- Don (7/18) Jun 15 2010 Indeed, only a subset of D is useful for low-level development. But D
- Alex Makhotin (7/8) Jun 15 2010 Right. That's why I well respect the point of view of Linus on that
- Walter Bright (32/59) Jun 15 2010 I'd rephrase that as D supports many different styles. One of those styl...
- Andrei Alexandrescu (3/82) Jun 15 2010 Andrei
- Walter Bright (7/9) Jun 15 2010 What bothers me about this discussion is consider D with features 1 2 3 ...
- Stephan (2/5) Jun 16 2010 Why not make such a change in a future release of the official version ?
- Walter Bright (3/9) Jun 16 2010 It's pretty low on the priority list, because the absence of such a swit...
- Walter Bright (4/14) Jun 16 2010 I would move it up in the priority if there was a serious project that n...
- Walter Bright (5/8) Jun 15 2010 It's interesting that D already has most of the gcc extensions:
- bearophile (78/92) Jun 15 2010 A problem is that some of those D features can worsen a kernel code. So ...
- Walter Bright (12/44) Jun 15 2010 The Arduino is an 8 bit machine. D is designed for 32 bit and up machine...
- Adam Ruppe (3/7) Jun 15 2010 Can't you accomplish the same thing with some minor sprinkling of
- Don (4/19) Jun 15 2010 One was fixed in this week's DMD release.
- Jérôme M. Berger (11/28) Jun 15 2010
- Walter Bright (3/6) Jun 15 2010 How so? I thought most 64 bit C compilers were specifically designed to ...
- Jérôme M. Berger (13/20) Jun 16 2010 I can't isolate it to a minimal test case, but at my job, we make
- Jérôme M. Berger (37/53) Jun 16 2010
- Jérôme M. Berger (23/43) Jun 16 2010
- Andrei Alexandrescu (20/42) Jun 16 2010 Whoa! That's indeed unfortunate. Allow me some more whoring for TDPL:
- Jérôme M. Berger (15/61) Jun 16 2010
- Walter Bright (2/21) Jun 16 2010 Easy. offset should be a size_t, not an unsigned.
- Jérôme M. Berger (16/38) Jun 16 2010
- Don (13/32) Jun 17 2010 I agree.
- Justin Spahr-Summers (4/11) Jun 17 2010 127, right? I know at least RISC processors tend to have instructions
- Don (5/17) Jun 17 2010 Surprise! c == -1.
- Kagamin (5/11) Jun 17 2010 :)
- Don (4/17) Jun 17 2010 No. It's a design flaw, not a bug. I think it could only be fixed by
- KennyTM~ (4/21) Jun 17 2010 I disagree. The flaw is whether x should be promoted to
- Don (9/35) Jun 17 2010 The range of typeof(x & y) can never exceed the range of typeof(x), no
- BCS (7/12) Jun 17 2010 However it's not that way for the ternary op, so there is a (somewhat re...
- KennyTM~ (21/56) Jun 17 2010 That's arguable. But (byte & int -> int) is meaningful because (&) is
- Don (8/58) Jun 17 2010 If y is a variable, it actually performs x >>> (y&31);
- KennyTM~ (2/60) Jun 17 2010 Too bad.
- Andrei Alexandrescu (4/38) Jun 17 2010 Wait a minute. D should never allow an implicit narrowing conversion. It...
- Don (9/48) Jun 17 2010 It'll make it illegal, but it won't make it usable.
- Steven Schveighoffer (4/6) Jun 17 2010 Java doesn't have unsigned values, so it's necessary to use regular int'...
- Walter Bright (4/12) Jun 17 2010 The reason D has >>> is to cause an unsigned right shift to be generated...
- Andrei Alexandrescu (9/23) Jun 17 2010 No.
- Walter Bright (6/31) Jun 17 2010 It's not a perfect replacement, as in if T is a custom integer type, you...
- Andrei Alexandrescu (13/48) Jun 17 2010 Let me think when I wanted an unsigned shift against an
- Walter Bright (7/14) Jun 17 2010 Generally the irritation I feel whenever I right shift and have to go ba...
- Andrei Alexandrescu (3/20) Jun 17 2010 I'm sure all linker asm writers will be happy about that feature :o}.
- Don (4/20) Jun 18 2010 I've read the OMF spec, and I know it includes shorts and bytes.
- Walter Bright (3/26) Jun 18 2010 I can send you the source if you like.
- Andrei Alexandrescu (3/21) Jun 18 2010 Then please rule it out of the language.
- Don (5/18) Jun 17 2010 Unfortunately it doesn't work. You still can't do an unsigned right
- BCS (6/10) Jun 18 2010 I still haven't seen anyone address how typeof(a>>>b) == typeof(a) break...
- Simen kjaeraas (5/7) Jun 19 2010 It doesn't, of course. However, it is desirable to have similar
- Andrei Alexandrescu (11/61) Jun 17 2010 Three times. Three times I tried to convince Walter to remove that crap
- bearophile (14/17) Jun 17 2010 I know this is off-topic in this thread. I remember the long thread abou...
- Andrei Alexandrescu (7/23) Jun 17 2010 I agree. But even within the current language, value range propagation
- Kagamin (2/3) Jun 16 2010 I've hit the bug using size_t at the right side of a+=-b (array length)....
- Justin Spahr-Summers (6/12) Jun 16 2010 This sounds more like an issue with file offsets being longs,
- Kagamin (6/11) Jun 17 2010 1. Ironically the issue is not in file offset's signedness. You still hi...
- Justin Spahr-Summers (17/35) Jun 17 2010 How so? Subtracting a size_t from a ulong offset will only cause
- Kagamin (13/44) Jun 17 2010 Maybe you didn't see the testcase.
- Justin Spahr-Summers (28/92) Jun 17 2010 I did see that, but that's erroneous code. Maybe the compiler could warn...
- Kagamin (3/5) Jun 14 2010 CLS bans unsigneds.
- Andrei Alexandrescu (3/10) Jun 14 2010 CLS = ?
- bearophile (5/8) Jun 15 2010 I think he means "Common Language Specification":
- Kagamin (2/10) Jun 15 2010 It seems to be reversed for byte...
- Justin Johansson (3/15) Jun 15 2010 "Clear Screen"
- Kagamin (4/12) Jun 15 2010 Doesn't OS think in terms of pages? And... yes... I have 5 gigs of memor...
- bearophile (5/7) Jun 15 2010 A great programmer writes code as simple as possible (but not simpler).
- Walter Bright (8/13) Jun 15 2010 I've never met a single programmer or engineer who didn't believe and re...
- bearophile (8/11) Jun 15 2010 I don't know if those extra D features are enough. And the C dialect use...
- Walter Bright (17/49) Jun 15 2010 I'll await your reply there.
- Simen kjaeraas (11/28) Jun 15 2010 I believe the point of Linus (and probably bearophile) was not that C++
- Walter Bright (5/34) Jun 15 2010 To some extent, yes. My point was that C++ doesn't have a whole lot beyo...
- Jeff Nowakowski (4/5) Jun 16 2010 I find this hard to believe. I seem to recall that you were personally
- Walter Bright (2/9) Jun 16 2010 Andrei explained transitivity to me and convinced me of its utility.
- Jeff Nowakowski (7/16) Jun 16 2010 Ok, but lots of people have been talking about const correctness for
- Walter Bright (15/34) Jun 16 2010 I've talked with C++ experts for years about const. Not one of them ever...
- Walter Bright (3/7) Jun 16 2010 I wish to add that I've not heard any proposal or discussion of adding
- Jeff Nowakowski (6/13) Jun 16 2010 I know the Javari paper was mentioned here by Bruno. Also, the idea of
- bearophile (7/13) Jun 15 2010 I don't understand this, please explain better.
- Walter Bright (4/23) Jun 15 2010 Changing the sign of size_t from unsigned to signed when going from 32 t...
- bearophile (5/7) Jun 16 2010 If D arrays use signed words as indexes on 32 bit systems then only half...
- Walter Bright (4/11) Jun 16 2010 If we go back in the thread, the argument for the signed size_t argument...
- bearophile (8/11) Jun 16 2010 I don't fully understand what you mean. On 32 bit systems I can accept a...
- dsimcha (20/31) Jun 16 2010 and lists and collections (or a call to malloc) with only 2_147_483_648 ...
- Kagamin (3/23) Jun 16 2010 Yo, dude!
- Jérôme M. Berger (13/37) Jun 17 2010
- Kagamin (2/7) Jun 17 2010 You said it yourself: the compiler can be modified for kernel developmen...
- bearophile (82/89) Jun 18 2010 So D isn't a "better C" because you can't use it in a *large* number of ...
- Michel Fortin (9/13) Jun 18 2010 Bypassing bound checks is as easy as appending ".ptr":
- bearophile (17/23) Jun 18 2010 If you try to compile this:
- Adam Ruppe (45/47) Jun 18 2010 D need be no uglier than C. Here's my implementation:
- bearophile (5/9) Jun 18 2010 Why destroy instead of ~this() ?
- Adam Ruppe (10/14) Jun 18 2010 It allocates and deallocates the memory rather than initialize and
- bearophile (29/31) Jun 18 2010 I see and I'd like to know! :-)
- Adam Ruppe (8/12) Jun 18 2010 Huh, weird. Doesn't make too much of a difference in practice though,
- bearophile (4/6) Jun 18 2010 Probably it can be fixed, but you have to be careful, because the paddin...
I have found a Reddit discussion a few days old:
http://www.reddit.com/r/programming/comments/cdwz5/the_perils_of_unsigned_iteration_in_cc/

It contains this, which I quote (I have no idea if it's true), plus follow-ups:

>At Google using uints of all kinds for anything other than bitmasks or other inherently bit-y, non computable things is strongly discouraged. This includes things like array sizes, and the warnings for conversion of size_t to int are disabled. I think it's a good call.<

I have expressed similar ideas here:
http://d.puremagic.com/issues/show_bug.cgi?id=3843

Unless someone explains to me why I am wrong, I will keep thinking that using unsigned words to represent lengths and indexes, as D does, is wrong, and that using signed words (with enough bits for that purpose) in D is a better design choice. In a language as greatly numerically unsafe as D (silly C-derived conversion rules, fixed-sized numbers used everywhere by default, no runtime integer overflow checks) the usage of unsigned numbers can be justified only inside bit vectors, bitwise operations, and a few other similar situations.

If D wants to be "a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python." it must understand that numerical safety is one of the not-secondary things that make languages such as Ruby and Python more productive.

Bye,
bearophile
Jun 14 2010
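A minimal, runnable D sketch of the bug class the linked article describes (the empty array and the loops below are illustrative, not taken from the article or from any post in this thread):

import std.stdio;

void main()
{
    int[] a = [];  // empty, so a.length == 0

    // a.length has type size_t, which is unsigned.  With length == 0,
    // the expression a.length - 1 wraps around to size_t.max, so the
    // loop below would run: in D the bounds check then throws a
    // RangeError; in C this is the classic buffer overrun.
    // for (size_t i = 0; i < a.length - 1; ++i) { writeln(a[i]); }

    // A safe rewrite moves the subtraction to the other side:
    for (size_t i = 0; i + 1 < a.length; ++i)
        writeln(a[i]);  // never runs for an empty array
}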
On Mon, 14 Jun 2010 16:52:04 -0400, bearophile wrote:

If D wants to be "a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python." it must understand that numerical safety is one of the not-secondary things that make languages such as Ruby and Python more productive. Bye, bearophile

Isn't this why D has foreach and foreach_reverse?

-By
Jun 14 2010
Byron Heads:

>Isn't this why D has foreach and foreach_reverse?<

If you mean the exact problem the original article was talking about, then you are right. But foreach and foreach_reverse are not enough to solve the general safety problem caused by the widespread usage of unsigned words in a language that at the same time has C conversion rules, uses mostly fixed-sized numbers, and lacks run-time integral overflow checks. Four things that, if present at the same time, create an explosive mix. I am happy to see that (if that quote is right) Google's C++ coding standards agree with me about this.

Bye, bearophile
Jun 14 2010
On 14/06/2010 21:52, bearophile wrote:

I have found a Reddit discussion a few days old:
http://www.reddit.com/r/programming/comments/cdwz5/the_perils_of_unsigned_iteration_in_cc/
It contains this, which I quote (I have no idea if it's true), plus follow-ups:

>At Google using uints of all kinds for anything other than bitmasks or other inherently bit-y, non computable things is strongly discouraged. This includes things like array sizes, and the warnings for conversion of size_t to int are disabled. I think it's a good call.<

I have expressed similar ideas here:
http://d.puremagic.com/issues/show_bug.cgi?id=3843
Unless someone explains to me why I am wrong, I will keep thinking that using unsigned words to represent lengths and indexes, as D does, is wrong, and that using signed words (with enough bits for that purpose) in D is a better design choice.

Well for a start, you lose half your addressable memory.

Unsigned numbers are only a problem if you don't understand how they work, but that goes for just about everything else as well. Personally I hate the use of signed numbers as array indices; it's moronic and demonstrates the writer's lack of understanding. It's very rare to actually want to index an array with a negative number. Last time I did that was years ago when writing in assembler, and that was an optimisation hack to squeeze maximum performance out of my code.

c.f.

Item getItem(int indx) {
    if(indx >= 0 && indx < _arr.length)
        return _arr[indx];
    throw new Error(...)
}

vs.

// cleaner no?
Item getItem(uint indx) {
    if(indx < _arr.length)
        return _arr[indx];
    throw new Error(...)
}

and backwards iteration:

for(int i = end - 1; i >= 0; --i)
    ...

vs

for(uint i = end - 1; i < length; --i)
    ...

Ok, about the same, but I find the second more clear; the i < length clearly indicates iteration over the whole array.

And that second wrong bit of code on the blog is wrong with signed numbers as well:

int len = strlen(some_c_str); // say some_c_str is empty so len = 0
int i;
for (i = 0; i < len - 1; ++i) {
    // so len - 1 == -1
    // iterate until i wraps round and becomes -1
}

Using 'int's doesn't magically fix it. Wrong code is just wrong.

I do think that allowing un-casted assignments between signed/unsigned is a problem though; that's where most of the bugs I've come across crop up. I think D should simply disallow implicit mixing of signed-ness. Hasn't that been discussed before? (I'm not referring to the recent post in d.learn.) It seems familiar.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
Jun 14 2010
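div0's getItem pair above, fleshed out into a compilable sketch; the surrounding class, the sample data, and the exception message are illustrative additions (his originals threw an unspecified Error):

class Container
{
    private int[] _arr = [1, 2, 3];

    // Signed index: two comparisons are needed.
    int getItemSigned(int indx)
    {
        if (indx >= 0 && indx < _arr.length)
            return _arr[indx];
        throw new Exception("index out of range");
    }

    // Unsigned index: a negative argument arrives as a huge uint,
    // so the single indx < length test rejects it as well.
    int getItemUnsigned(uint indx)
    {
        if (indx < _arr.length)
            return _arr[indx];
        throw new Exception("index out of range");
    }
}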
div0 wrote:

I do think that allowing un-casted assignments between signed/unsigned is a problem though; that's where most of the bugs I've come across crop up. I think D should simply disallow implicit mixing of signed-ness.

Andrei and I went down that alley for a while. It's not practical.

Hasn't that been discussed before? (I'm not referring to the recent post in d.learn.) It seems familiar.

Yes, many times.
Jun 14 2010
Walter Bright wrote:

div0 wrote:

I do think that allowing un-casted assignments between signed/unsigned is a problem though; that's where most of the bugs I've come across crop up. I think D should simply disallow implicit mixing of signed-ness.

Andrei and I went down that alley for a while. It's not practical.

A link to the discussion, or examples of the impractical explicit casts, would be helpful to me in trying to understand such a decision.

--
Alex Makhotin,
the founder of BITPROX,
http://bitprox.com
Jun 15 2010
Alex Makhotin wrote:

Walter Bright wrote:

Andrei and I went down that alley for a while. It's not practical.

A link to the discussion, or examples of the impractical explicit casts, would be helpful to me in trying to understand such a decision.

I don't have one, the message database of this n.g. is enormous. You can try the search box here:
http://www.digitalmars.com/d/archives/digitalmars/D/index.html
Jun 15 2010
Walter Bright wrote:

Alex Makhotin wrote:

Walter Bright wrote:

Andrei and I went down that alley for a while. It's not practical.

A link to the discussion, or examples of the impractical explicit casts, would be helpful to me in trying to understand such a decision.

I don't have one, the message database of this n.g. is enormous. You can try the search box here:
http://www.digitalmars.com/d/archives/digitalmars/D/index.html

The discussions about polysemous types should be relevant. We tried to fix things quite valiantly. Currently I believe that improving value range propagation is the best way to go.

Andrei
Jun 15 2010
div0 <div0 users.sourceforge.net> writes:

for(uint i = end - 1; i < length; --i)
    ...

What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.

--
Steven E. Harris
Jun 14 2010
Hello Steven,

div0 <div0 users.sourceforge.net> writes:

for(uint i = end - 1; i < length; --i)
    ...

What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.

My thought exactly. If i<j and you --i, I'd assume i<j still holds; if your code depends on the case where that assumption is wrong, don't ask me to do a code review, because I won't sign off on it.

--
... <IXOYE><
Jun 14 2010
On Mon, 14 Jun 2010 21:48:10 -0400, BCS <none anon.com> wrote:

Hello Steven,

div0 <div0 users.sourceforge.net> writes:

for(uint i = end - 1; i < length; --i)
    ...

What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.

My thought exactly. If i<j and you --i, I'd assume i<j still holds; if your code depends on the case where that assumption is wrong, don't ask me to do a code review, because I won't sign off on it.

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of. Read it more like this:

for(uint i = end - 1; i < length && i >= 0; --i)

But the i >= 0 is implicit because i is unsigned.

-Steve
Jun 15 2010
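For the record, here is the idiom as a complete, runnable sketch (array contents made up). When i is 0 and gets decremented it wraps to uint.max, which fails the i < length test and stops the loop; if end were 0, i would start at uint.max and the loop would never run at all:

import std.stdio;

void main()
{
    int[] arr = [10, 20, 30, 40, 50];
    uint length = cast(uint) arr.length;
    uint end = length;

    // Visits indexes 4, 3, 2, 1, 0, then i wraps to uint.max
    // and the condition i < length fails.
    for (uint i = end - 1; i < length; --i)
        writeln(arr[i]);  // prints 50 40 30 20 10
}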
Steven Schveighoffer:

>i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.<

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

Bye, bearophile
Jun 15 2010
On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:

Steven Schveighoffer:

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose. In this particular instance, the code is both clear and concise. The following line of code should generate the exact same code, but is more verbose:

for(uint i = end - 1; i < length && i >= 0; --i)

The compiler will throw away the second check during optimization, because i is always >= 0. I don't see why such code should be preferred.

-Steve
Jun 15 2010
On 06/15/2010 02:10 PM, Steven Schveighoffer wrote:

On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:

Steven Schveighoffer:

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.

Clever code is bad because you have to think a couple of times more every time you see it. Also, it looks wrong.

In this particular instance, the code is both clear and concise. The following line of code should generate the exact same code, but is more verbose:

for(uint i = end - 1; i < length && i >= 0; --i)

The compiler will throw away the second check during optimization, because i is always >= 0. I don't see why such code should be preferred.

-Steve

This will probably generate similar code:

for (uint i = end - 1; i < uint.max; --i)

Same functionality, really clever.
Jun 15 2010
On 6/15/10, Pelle <pelle.mansson gmail.com> wrote:

for (uint i = end - 1; i < uint.max; --i)

Same functionality, really clever.

That's wrong rather than clever though. for(i < length) is saying "continue as long as you are inside the array's bounds", which is exactly what you mean in that loop. The only "tricky" part is the understanding that an array index is never negative, but this shouldn't require extra thought in the first place.
Jun 15 2010
On 06/15/2010 03:25 PM, Adam Ruppe wrote:

On 6/15/10, Pelle <pelle.mansson gmail.com> wrote:

for (uint i = end - 1; i < uint.max; --i)

Same functionality, really clever.

That's wrong rather than clever though. for(i < length) is saying "continue as long as you are inside the array's bounds", which is exactly what you mean in that loop. The only "tricky" part is the understanding that an array index is never negative, but this shouldn't require extra thought in the first place.

Using the length is meaningless, any uint >= length will work just as well. Using the length there is meaningless, since that's really not what you compare against.

Notice why clever tricks are bad? They generate meaningless discussions :)
Jun 15 2010
On Tue, 15 Jun 2010 16:05:08 +0200, Pelle wrote:On 06/15/2010 03:25 PM, Adam Ruppe wrote:As long as you only decrease by one, your trick will work just as well. In a more general case, it won't: for (uint i=end-1; i<uint.max; i--) { if (badTiming) i--; // Oops, we may just have set i = uint.max - 1. }On 6/15/10, Pelle<pelle.mansson gmail.com> wrote:Using the length is meaningless, any uint >= length will work just as well. Using the length there is meaningless, since that's really not what you compare against.for (uint i = end - 1; i< uint.max; --i) Same functionality, really clever.That's wrong rather than clever though. for(i< length) is saying "continue as long as you are inside the array's bounds", which is exactly what you mean in that loop. The only "tricky" is the understanding that an array index is never negative, but this shouldn't require extra thought in the first place.Notice why clever tricks are bad? They generate meaningless discussions :)I don't think the discussion is meaningless. I learned a new trick (or a new abomination, depending on your viewpoint), that I'll keep in mind next time I write a similar loop. ;) -Lars
Jun 15 2010
On 06/15/2010 04:12 PM, Lars T. Kyllingstad wrote:On Tue, 15 Jun 2010 16:05:08 +0200, Pelle wrote:The same can be said if you use length, and length happens to be big. You really should use continue in this case.On 06/15/2010 03:25 PM, Adam Ruppe wrote:As long as you only decrease by one, your trick will work just as well. In a more general case, it won't: for (uint i=end-1; i<uint.max; i--) { if (badTiming) i--; // Oops, we may just have set i = uint.max - 1. }On 6/15/10, Pelle<pelle.mansson gmail.com> wrote:Using the length is meaningless, any uint>= length will work just as well. Using the length there is meaningless, since that's really not what you compare against.for (uint i = end - 1; i< uint.max; --i) Same functionality, really clever.That's wrong rather than clever though. for(i< length) is saying "continue as long as you are inside the array's bounds", which is exactly what you mean in that loop. The only "tricky" is the understanding that an array index is never negative, but this shouldn't require extra thought in the first place.
Jun 15 2010
On 6/15/10, Pelle <pelle.mansson gmail.com> wrote:Using the length is meaningless, any uint >= length will work just as well. Using the length there is meaningless, since that's really not what you compare against.The nice thing about length is x < arr.length can be read as "oh, stop upon going out of bounds". Yes, other numbers would do the same thing here, but they wouldn't read the same way. It is more about what the code is saying to the human reader than to the computer.Notice why clever tricks are bad? They generate meaningless discussions :)Haha, yes!
Jun 15 2010
On Tue, 15 Jun 2010 10:05:08 -0400, Pelle <pelle.mansson gmail.com> wrote:On 06/15/2010 03:25 PM, Adam Ruppe wrote:No, it's not. for(uint i = initialize(); i < length; modify(i)) This construct is valid no matter what initialize or modify does to i. -SteveOn 6/15/10, Pelle<pelle.mansson gmail.com> wrote:Using the length is meaningless, any uint >= length will work just as well. Using the length there is meaningless, since that's really not what you compare against.for (uint i = end - 1; i< uint.max; --i) Same functionality, really clever.That's wrong rather than clever though. for(i< length) is saying "continue as long as you are inside the array's bounds", which is exactly what you mean in that loop. The only "tricky" is the understanding that an array index is never negative, but this shouldn't require extra thought in the first place.
Jun 15 2010
Hello Adam,

but this shouldn't require extra thought in the first place.

shouldn't != doesn't. When I first saw the code, it took me about a second to go from "backwards loop" to "wait, that's wrong" to "Oh, I guess that works". That's two stages and 750ms too long.

How would I write the loop?

foreach_reverse (uint i; 0 .. length) { ... }

or

for(uint i = length; i > 0;) { --i; ... }

or

for(int i = length - 1; i >= 0; --i) { ... }

or

uint i = length - 1;
do { ... } while(i-- > 0);

None of those at first glance seem to be wrong or work differently than they do.

--
... <IXOYE><
Jun 15 2010
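The first of BCS's alternatives, spelled out as runnable D for comparison (array contents made up). With foreach_reverse the compiler manages the descending index, so no wraparound reasoning is needed, and an empty range simply does nothing:

import std.stdio;

void main()
{
    int[] arr = [10, 20, 30, 40, 50];

    // Iterates the indexes length-1 down to 0.
    foreach_reverse (i; 0 .. arr.length)
        writeln(arr[i]);  // prints 50 40 30 20 10

    // Or iterate the array directly, getting index and element.
    foreach_reverse (i, x; arr)
        writeln(i, ": ", x);
}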
On Tue, 15 Jun 2010 08:49:56 -0400, Pelle <pelle.mansson gmail.com> wrote:On 06/15/2010 02:10 PM, Steven Schveighoffer wrote:This is a temporary problem. Once you get used to any particular coding trick, you understand it better.On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:Clever code is bad because you have to think a couple of times more every time you see it.Steven Schveighoffer:Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, that was unable to write not-clever code that does the same things.Also, it looks wrong.Why? i is unsigned, therefore >= 0, and must be < length. That seems reasonable and correct to me.What if end > length? This is no more clever than the original, but allows bugs. It's not clever, it's wrong. In addition, it's purposefully obfuscated, while the original code is quite clear. I can obfuscate even further, but I don't see why you would want such a thing: for(uint i = end - 1; i < -1; --i) "There's a fine line between clever and stupid" --Nigel Tufnel, This is Spinal Tap -SteveIn this particular instance, the code is both clear and concise. The following line of code should generate the exact same code, but is more verbose: for(uint i = end - 1; i < length && i >= 0; --i) The compiler will throw away the second check during optimization, because i is always >= 0. I don't see why such code should be preferred. -SteveThis will probably generate similar code: for (uint i = end - 1; i < uint.max; --i) Same functionality, really clever.
Jun 15 2010
Hello Steven,

On Tue, 15 Jun 2010 08:49:56 -0400, Pelle <pelle.mansson gmail.com> wrote:

On 06/15/2010 02:10 PM, Steven Schveighoffer wrote:

On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:

Steven Schveighoffer:

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.

Clever code is bad because you have to think a couple of times more every time you see it.

This is a temporary problem. Once you get used to any particular coding trick, you understand it better.

People cutting you off on the road is a temporary problem; once you tell everyone off, they will understand better. Your statement might have merit if the "you" in it were the specific "you" rather than the universal "you". If that assumption is made more explicit the statement becomes blatantly silly: "This is a temporary problem. Once everyone gets used to any particular coding trick, everyone understands it better."

Also, it looks wrong.

Why? i is unsigned, therefore >= 0, and must be < length. That seems reasonable and correct to me.

It looks wrong because i only gets smaller. People are hardwired to think about continuous number systems, not modulo number systems (explain that 0 - 1 = -1 to a 6 year old: easy; explain that 0 - 1 = 2^32-1 to them: good luck). Yes, we can be trained to use such a system, but most people still won't think that way reflexively.

--
... <IXOYE><
Jun 15 2010
On Tue, 15 Jun 2010 11:28:43 -0400, BCS <none anon.com> wrote:

Hello Steven,

In fact, I meant the specific you. Once a person gets used to any particular coding trick, that person will understand it better when the trick is encountered again. This is a basic principle of learning.

On Tue, 15 Jun 2010 08:49:56 -0400, Pelle <pelle.mansson gmail.com> wrote:

People cutting you off on the road is a temporary problem, once you tell everyone off, they will understand better. Your statement might have merit if the "you" in it were the specific "you" rather than the universal "you".

On 06/15/2010 02:10 PM, Steven Schveighoffer wrote:

This is a temporary problem. Once you get used to any particular coding trick, you understand it better.

On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:

Clever code is bad because you have to think a couple of times more every time you see it.

Steven Schveighoffer:

Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

It's really easy to explain. Use an odometer as an example. And we don't have to be specific in this case, you can substitute 'some very large number' for '2^32 - 1'. Besides, why does a 6-year old have to understand a for loop? D doesn't cater to people who can't grasp the modulo arithmetic concept.

I think that this discussion is becoming pointless. Let's just accept that we don't have to review code for one another, and we like it that way :)

-Steve
Jun 15 2010
Hello Steven,

On Tue, 15 Jun 2010 11:28:43 -0400, BCS <none anon.com> wrote:

Yes, once Pelle (sorry to pick on you) gets used to any particular coding trick, Pelle will understand it better when the trick is encountered again. But what about everyone else? If Pelle were the only one who was going to read your code, that would be fine. But unless you can, right now, list by name everyone who will ever read your code (and if you can, just go buy a lottery ticket and retire), then anything but the universal "you" makes the statement irrelevant.

Hello Steven,

In fact, I meant the specific you. Once a person gets used to any particular coding trick, that person will understand it better when the trick is encountered again. This is a basic principle of learning.

On Tue, 15 Jun 2010 08:49:56 -0400, Pelle <pelle.mansson gmail.com> wrote:

People cutting you off on the road is a temporary problem, once you tell everyone off, they will understand better. Your statement might have merit if the "you" in it were the specific "you" rather than the universal "you".

Clever code is bad because you have to think a couple of times more every time you see it.

This is a temporary problem. Once you get used to any particular coding trick, you understand it better.

Most 6 year olds will need to have an odometer explained to them first.

It's really easy to explain. Use an odometer as an example. And we don't have to be specific in this case, you can substitute 'some very large number' for '2^32 - 1'.

It looks wrong because i only gets smaller. People are hardwired to think about continuous number systems, not modulo number systems (explain that 0 - 1 = -1 to a 6 year old: easy; explain that 0 - 1 = 2^32-1 to them: good luck). Yes, we can be trained to use such a system, but most people still won't think that way reflexively.

Also, it looks wrong.

Why? i is unsigned, therefore >= 0, and must be < length. That seems reasonable and correct to me.

Besides, why does a 6-year old have to understand a for loop?

I wasn't talking about for loops, but the semantics of int vs. uint near zero. If a 6 year old can understand something, I won't have to think about it to work with it, and I can use the time and cycles I gain for something else.

--
... <IXOYE><
Jun 15 2010
Steven Schveighoffer wrote:On Tue, 15 Jun 2010 11:28:43 -0400, BCS <none anon.com> wrote:I would say, if you have trouble understanding that trick, you should NOT be using unsigned arithmetic EVER. And I agree that most people have trouble with it.Hello Steven,In fact, I meant the specific you. Once a person gets used to any particular coding trick, that person will understand it better when the trick is encountered again. This is a basic principle of learning.On Tue, 15 Jun 2010 08:49:56 -0400, Pelle <pelle.mansson gmail.com> wrote:People cutting you off on the road is a temporary problem, once you tell everyone off, they will understand better. Your statement might have merit if the "you" in it were the specific "you" rather than the universal "you".On 06/15/2010 02:10 PM, Steven Schveighoffer wrote:This is a temporary problem. Once you get used to any particular coding trick, you understand it better.On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:Clever code is bad because you have to think a couple of times more every time you see it.Steven Schveighoffer:Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, that was unable to write not-clever code that does the same things.It's really easy to explain. Use an odometer as an example. And we don't have to be specific in this case, you can substitue 'some very large number' for '2^32 - 1'. Besides, why does a 6-year old have to understand a for loop? D doesn't cater to people who can't grasp the modulo arithmetic concept. I think that this discussion is becoming pointless. Let's just accept that we don't have to review code for one another, and we like it that way :) -SteveIt looks wrong because i only gets smaller. People are hardwired to think about continues number system, not modulo number system (explain that 0 - 1 = -1 to a 6 year old; easy, explain that 0 - 1 = 2^32-1 to them, good luck). Yes we can be trained to use such system, but most people still wont think that way reflexively.Also, it looks wrong.Why? i is unsigned, therefore >= 0, and must be < length. That seems reasonable and correct to me.
Jun 15 2010
BCS <none anon.com> wrote:It looks wrong because i only gets smaller. People are hardwired to think about continues number system, not modulo number system (explain that 0 - 1 = -1 to a 6 year old; easy, explain that 0 - 1 = 2^32-1 to them, good luck). Yes we can be trained to use such system, but most people still wont think that way reflexively.Hardwired? Hardly. However, continuous number systems are ubiquitous, modulo systems are not. As for teaching a 6-year old, give him a wheel with the numbers 0-9 written on each of the ten spokes, and ask him what number you get by going backward one step from 0. -- Simen
Jun 15 2010
Hello Steven,

On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:

Clever in my book normally equates to: requires extra thought to create and read. The exact opposite of clever is not dumb, but simple: with very un-clever code the reader is I/O bound, they can understand as fast as they can read it.

Steven Schveighoffer:

Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

That code might be concise but it is not clear.

The following line of code should generate the exact same code, but is more verbose:

for(uint i = end - 1; i < length && i >= 0; --i)

That code is just as bad IMO, and for exactly the same reason: you are counting on underflow and wrapping to make an i<j test start failing after i decreases.

The compiler will throw away the second check during optimization, because i is always >= 0. I don't see why such code should be preferred.

-Steve

--
... <IXOYE><
Jun 15 2010
On Tue, 15 Jun 2010 10:34:21 -0400, BCS <none anon.com> wrote:Hello Steven,Clever code does not have to be hard to understand. In this case, it's not hard to understand. You admit yourself that you understood it within a second ;)On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:Cleaver in my book normally equates to: requiters extra thought to create and read. The exact opposite of clever is not dumb, but simple: with very un-clever code the reader is I/O bound, they can understand as fast as they can read it.Steven Schveighoffer:Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, that was unable to write not-clever code that does the same things.Since uint is declared inside the loop statement, I'd say it is clear -- it's not open to misinterpretation. If i was defined elsewhere, I'd agree.In this particular instance, the code is both clear and concise.That code might be concise but it is not clear.Well, I guess that's one way to look at it. But what I like a lot about the original example, is there is no mixing of unsigned/signed types. So you are always dealing with unsigned, so you do not have to deal with worrying about integer promotion. Thinking about unsigned arithmetic is sometimes difficult, but if you understand the rules, using underflow to your advantage is fine IMO. -SteveThe following line of code should generate the exact same code, but is more verbose: for(uint i = end - 1; i < length && i >= 0; --i)That code is just as bad IMO and for exactly the same reason: you are counting on underflow and wrapping to make a i<j test start failing after i decrease.
Jun 15 2010
Not referring to anyone in particular, but it just occurred to me: Python's use of -1 to mean length - 1, and -2 to mean length - 2, isn't signed ints... it is using overflow! .... sort of. The thing is that they are overflowing at a dynamic amount (array.length) instead of the fixed size.

Actually, if it is treated as full blown proper overflow, you could get some potentially useful stuff out of it. Given length == 5, -1 > 3. I don't know if Python actually lets you do that, but I doubt it.

But I just had a chuckle about that thought :)
Jun 15 2010
On Tue, 15 Jun 2010 11:26:34 -0400, Adam Ruppe wrote:Not referring to anyone in particular, but it just occurred to me: Python's use of -1 to mean length, and -2 to mean length -1 isn't signed ints... it is using overflow! .... sort of. The thing is that they are overflowing at a dynamic amount (array.length) instead of the fixed size.Well, conceptually maybe. :) Python integers don't overflow, they automatically convert up to bigints. It's more correct to just say that Python specifies that a negative array index means a reference from the right end.Actually, if it is treated as full blown proper overflow, you could get some potentially useful stuff out of it. Given length == 5, -1 > 3. I don't know if Python actually lets you do that, but I doubt it.Only in explicit modular arithmetic, e.g. '-1 % 5 == 4'. You can slice a Python list using both postive and negative indexes: [10,20,30,40,50][3:-1] ==> [40] ...but that doesn't imply overflow or modular arithmetic: it's just the array-indexing contract. GrahamBut I just had a chuckle about that thought :)
Jun 15 2010
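As a D-side footnote to the Python comparison: D has no negative indexes, but $ (the array's length inside an index or slice expression) covers the same "from the end" use cases; a small sketch mirroring Graham's example:

import std.stdio;

void main()
{
    int[] a = [10, 20, 30, 40, 50];

    writeln(a[$ - 1]);       // 50, like Python's a[-1]
    writeln(a[3 .. $ - 1]);  // [40], like Python's a[3:-1]
}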
Hello Steven,

On Tue, 15 Jun 2010 10:34:21 -0400, BCS <none anon.com> wrote:

Yup, and a second is several times too long for code that accomplishes something that simple.

Hello Steven,

Clever code does not have to be hard to understand. In this case, it's not hard to understand. You admit yourself that you understood it within a second ;)

On Tue, 15 Jun 2010 07:30:52 -0400, bearophile <bearophileHUGS lycos.com> wrote:

Clever in my book normally equates to: requires extra thought to create and read. The exact opposite of clever is not dumb, but simple: with very un-clever code the reader is I/O bound, they can understand as fast as they can read it.

Steven Schveighoffer:

Clever code is bad? What are you smoking? In my opinion, clever code that is clear and concise should always be favored over code that is unnecessarily verbose.

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

That i is a uint is clear, but any code that depends on underflow is IMO not clear, as it requires thinking in a (for most people) less than intuitive way.

Since uint is declared inside the loop statement, I'd say it is clear -- it's not open to misinterpretation. If i was defined elsewhere, I'd agree.

In this particular instance, the code is both clear and concise.

That code might be concise but it is not clear.

if you understand the rules

Other forms avoid that requirement. The fewer constraints/requirements you place on the reader of code the better. I think the difference of opinion here stems from you basing your assessment on what is required of the (single) person who writes the code, whereas I'm basing my assessment on what is required of the (open set of) people who read the code.

--
... <IXOYE><
Jun 15 2010
bearophile wrote:

Steven Schveighoffer:

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

Clever code is bad. It must be minimized. In some rare situations it becomes useful, but its usage must be seen as a failure of the programmer, who was unable to write not-clever code that does the same things.

Bye, bearophile

I agree with bearophile's sentiments. To my interpretation this means that sometimes trying to be clever is actually stupid. If I misinterpret those sentiments, please correct me.

Cheers
Justin
Jun 15 2010
Hello Steven,

On Mon, 14 Jun 2010 21:48:10 -0400, BCS <none anon.com> wrote:

It's /too/ clever. That's the problem. If you haven't seen it in a while, it's confusing, and it LOOKS wrong even if you have.

Hello Steven,

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

div0 <div0 users.sourceforge.net> writes:

My thought exactly. If i<j and you --i, I'd assume i<j, if your code depends on the case where the assumption is wrong, don't ask me to do a code review because I won't sign off on it.

for(uint i = end - 1; i < length; --i)
    ...

What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.

Read it more like this:

for(uint i = end - 1; i < length && i >= 0; --i)

But the i >= 0 is implicit because i is unsigned.

I know, that's exactly "the case where the assumption is wrong".

--
... <IXOYE><
Jun 15 2010
On Tue, 15 Jun 2010 10:08:38 -0400, BCS <none anon.com> wrote:Hello Steven,This is easily solved - put in a comment. I frequently put comments in my code because I know I'm going to forget why I did something.On Mon, 14 Jun 2010 21:48:10 -0400, BCS <none anon.com> wrote:It's /to/ clever. That's the problem. If you haven't seen it in a while, it's confusing and it LOOKS wrong even if you have.Hello Steven,i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.div0 <div0 users.sourceforge.net> writes:My thought exactly. If i<j and you --i, I'd assume i<j, if your code depends on the case where the assumption is wrong, don't ask me to do a code review because I won't sign off on it.for(uint i = end - 1; i < length; --i) ...What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.Reading code assuming integer wrapping never occurs is a big mistake. You should learn to assume wrapping is always possible. -SteveRead it more like this: for(uint i = end - 1; i < length && i >= 0; --i) But the i >= 0 is implicit because i is unsigned.I know, that's exactly "the case where the assumption is wrong".
Jun 15 2010
Hello Steven,

On Tue, 15 Jun 2010 10:08:38 -0400, BCS <none anon.com> wrote:

All else being equal, code that *requires* comments to understand is inferior to code that doesn't.

Hello Steven,

This is easily solved - put in a comment. I frequently put comments in my code because I know I'm going to forget why I did something.

On Mon, 14 Jun 2010 21:48:10 -0400, BCS <none anon.com> wrote:

It's /too/ clever. That's the problem. If you haven't seen it in a while, it's confusing and it LOOKS wrong even if you have.

Hello Steven,

i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of.

div0 <div0 users.sourceforge.net> writes:

My thought exactly. If i<j and you --i, I'd assume i<j, if your code depends on the case where the assumption is wrong, don't ask me to do a code review because I won't sign off on it.

for(uint i = end - 1; i < length; --i)
    ...

What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.

You should learn to write code where I and everyone else doesn't /need/ to assume it is possible. (Personally, I find it marginally offensive/greedy when someone's first proposal as to how to fix a problem is for the rest of the world to change, and the second option is for the person to change.)

Reading code assuming integer wrapping never occurs is a big mistake. You should learn to assume wrapping is always possible.

Read it more like this:

for(uint i = end - 1; i < length && i >= 0; --i)

But the i >= 0 is implicit because i is unsigned.

I know, that's exactly "the case where the assumption is wrong".

--
... <IXOYE><
Jun 15 2010
On Tue, 15 Jun 2010 11:47:34 -0400, BCS <none anon.com> wrote:Hello Steven,Code should *always* have comments. I hate reading code that doesn't have comments, it allows you to understand what the person is thinking. That being said, I don't think this construct requires comments, maybe a note like 'uses underflow' or something to let the reader know the writer was aware of the issue and did it on purpose, but a comment is not essential to understanding the code. *That* being said, I don't expect to use this construct often. Typically one iterates forwards through an array, and foreach is much better suited for iteration anyways.This is easily solved - put in a comment. I frequently put comments in my code because I know I'm going to forget why I did something.All else being equal, code that *requiters* comments to understand is inferior to code that doesn't.Why? If you can't understand/spot overflow/underflow problems, then why should I cater to you? It's like lowering academic testing standards for school children so they can pass on to the next grade.Reading code assuming integer wrapping never occurs is a big mistake. You should learn to assume wrapping is always possible.You should learn to write code where I and everyone else doesn't /need/ to assume it is possible.(personably, I find it marginally offensive/greedy when someone's first proposal as to how to fix a problem if for the rest of the world to change and the second option is for the person to change.)Why is it offensive if I expect a code reviewer to take overflow into consideration when reviewing code? It's not some sort of snobbery, I just expect reviewers to be competent. -Steve
Jun 15 2010
Hello Steven,

On Tue, 15 Jun 2010 11:47:34 -0400, BCS <none anon.com> wrote:

I agree. It should have comments. But if stripping them out would render the code unmaintainable, that indicates to me that it's likely the code is too complex. It's a sliding scale: the more difference the comments make, the more of an issue it is. And again, this is an "all else being equal" case; given two options and nothing else to choose between them, I'll pick the one that needs fewer comments.

Hello Steven,

Code should *always* have comments. I hate reading code that doesn't have comments, it allows you to understand what the person is thinking.

This is easily solved - put in a comment. I frequently put comments in my code because I know I'm going to forget why I did something.

All else being equal, code that *requires* comments to understand is inferior to code that doesn't.

The way people's brains are wired, the first thought people will have about that code is wrong. If that can be avoided, why not avoid it?

Why? If you can't understand/spot overflow/underflow problems, then why should I cater to you? It's like lowering academic testing standards for school children so they can pass on to the next grade.

Reading code assuming integer wrapping never occurs is a big mistake. You should learn to assume wrapping is always possible.

You should learn to write code where I and everyone else doesn't /need/ to assume it is possible.

That's /not/ offensive. For one thing, only very few people will ever need to be involved in that. The reason I wouldn't let it pass code review has zero to do with me not understanding it (I do understand, for one thing) but has 100% to do with anyone who ever needs to touch the code needing to understand it. That is an open set (and that is why I find it marginally offensive). The cost of putting something in your code that is harder (note I'm not saying "hard") to understand goes up the more successful the code is, and is effectively unbounded.

(Personally, I find it marginally offensive/greedy when someone's first proposal as to how to fix a problem is for the rest of the world to change, and the second option is for the person to change.)

Why is it offensive if I expect a code reviewer to take overflow into consideration when reviewing code?

It's not some sort of snobbery, I just expect reviewers to be competent.

I expect that too. I also expect people reading my code (for review or what-not) to have better things to do with their time than figure out clever code.

--
... <IXOYE><
Jun 15 2010
On Tue, 15 Jun 2010 16:07:26 -0400, BCS <none anon.com> wrote:Hello Steven,Because the alternatives are uglier, and it's not as easy to see subtle sign problems with them. The code we are discussing has no such subtle problems, since all arithmetic/comparison is done with unsigned values.Why? If you can't understand/spot overflow/underflow problems, then why should I cater to you? It's like lowering academic testing standards for school children so they can pass on to the next grade.The way people's brains are wired, the first thought people will have about that code is wrong. If that can be avoided, why not avoid it?So I have to worry about substandard coders trying to understand my code? If anything, they ask a question, and it is explained to them. There is no trickery or deception or obfuscation. I'd expect a coder who understands bitwise operations to understand this code no problem. I would not, on the other hand, expect a reasonably knowledgeable coder to see subtle sign errors due to comparing/subtracting signed and unsigned integers. Those are much trickier to see, even for experienced coders. In other words, the code looks strange, but is not hiding anything. Code that looks correct but contains a subtle sign bug is worse.Why is it offensive if I expect a code reviewer to take overflow into consideration when reviewing codeThat's /not/ offensive. For one thing, only very few people will ever need to be involved in that. The reason I wouldn't let it pass code review has zero to do with me not understanding it (I do understand it, for one thing) but has 100% to do with anyone who ever needs to touch the code needing to understand it. That is an open set (and that is why I find it marginally offensive). The cost of putting something in your code that is harder (note I'm not saying "hard") to understand goes up the more successful the code is, and is effectively unbounded.I guess I'd say that's a prejudice against learning new code tricks because not everybody knows them. It sounds foolish to me. -SteveIt's not some sort of snobbery, I just expect reviewers to be competent.I expect that too. I also expect people reading my code (for review or what-not) to have better things to do with their time than figure out clever code.
Jun 15 2010
Hello Steven,On Tue, 15 Jun 2010 16:07:26 -0400, BCS <none anon.com> wrote:If *any* user *ever* has to ask how code that does something as simple as looping backwards over an array works, the author has failed. If even a handful of users take long enough to understand it that they even notice they're thinking about it, the author didn't do a good job. I guess I can restate my opinion as: I'm (slightly) offended that you are asking me to think about something that trivial. Would you rather I spend any time thinking about that, or would you rather I spend it thinking about the rest of your code?Hello Steven,So I have to worry about substandard coders trying to understand my code? If anything, they ask a question, and it is explained to them.Why is it offensive if I expect a code reviewer to take overflow into consideration when reviewing codeThat's /not/ offensive. For one thing, only very few people will ever need to be involved in that. The reason I wouldn't let it pass code review has zero to do with me not understanding it (I do understand it, for one thing) but has 100% to do with anyone who ever needs to touch the code needing to understand it. That is an open set (and that is why I find it marginally offensive). The cost of putting something in your code that is harder (note I'm not saying "hard") to understand goes up the more successful the code is, and is effectively unbounded.In other words, the code looks strange, but is not hiding anything. Code that looks correct but contains a subtle sign bug is worse.Looks correct & is correct > looks wrong & is wrong > looks wrong & isn't > looks right & isn't. You might talk me into switching the middle two, but they are darn close.I have no problem with code tricks. I have problems with complex code where simpler, less interesting code does just as well. I guess we aren't likely to agree on this, so I'll just say: may you maintain interesting code. -- ... <IXOYE><I guess I'd say that's a prejudice against learning new code tricks because not everybody knows them. It sounds foolish to me.It's not some sort of snobbery, I just expect reviewers to be competent.I expect that too. I also expect people reading my code (for review or what-not) to have better things to do with their time than figure out clever code.
Jun 15 2010
Steven Schveighoffer Wrote:On Mon, 14 Jun 2010 21:48:10 -0400, BCS <none anon.com> wrote:foreach_reverse, lolHello Steven,i is unsigned, and therefore can never be less than 0. It's actually a clever way to do it that I've never thought of. Read it more like this: for(uint i = end - 1; i < length && i >= 0; --i) But the i >= 0 is implicit because i is unsigned.div0 <div0 users.sourceforge.net> writes:My thought exactly. If i<j and you --i, I'd assume i<j; if your code depends on the case where that assumption is wrong, don't ask me to do a code review, because I won't sign off on it.for(uint i = end - 1; i < length; --i) ...What does "length" represent here? It's not clear to me how "i" descending toward zero is going to break the guard condition.
Jun 15 2010
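A minimal sketch of the trick under discussion, in plain D (names are illustrative). When an unsigned index is decremented past 0 it wraps to its maximum value, which is >= a.length, so the loop terminates; the same comparison also handles empty arrays, where a.length - 1 wraps as well:

import std.stdio;

void printReverse(int[] a)
{
    // Visits a[$ - 1] down to a[0]; stops when i wraps past zero,
    // because the wrapped value is >= a.length.
    for (size_t i = a.length - 1; i < a.length; --i)
        writeln(a[i]);

    // The idiomatic alternative mentioned above:
    foreach_reverse (x; a)
        writeln(x);
}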
div0:Well for a start, you lose half your addressable memory.<This matters mostly with char/ubyte/byte arrays on 32 bit systems. If you have arrays of shorts, ints or pointers/references, or you are on a 64 bit system, this is not so important. And for the extra safety it gives me, that is a price I can pay. And if you don't want to pay that price in addressable indexes, you can use longs on 32 bit systems :-)unsigned numbers are only a problem if you don't understand how they work, but that goes for just about everything else as well.<This can help you understand why you are very wrong: "Array bound errors are a problem only if you don't understand how arrays work." I have understood how unsigned numbers work, but I keep writing some bugs once in a while.Personally I hate the use of signed numbers as array indices; it's moronic and demonstrates the writer's lack of understanding.<It's very rare to actually want to index an array with a negative number.<That's beside the main point. The main problems come from mixing signed and unsigned values.c.f. Item getItem(int indx) { if(indx >= 0 && indx < _arr.length) return _arr[indx]; throw new Error(...) } vs. // cleaner no? Item getItem(uint indx) { if(indx < _arr.length) return _arr[indx]; throw new Error(...) }The second is shorter (and one test fewer can make it a bit faster) but it's not cleaner.Using 'int's doesn't magically fix it. Wrong code is just wrong.<I agree. But ints can avoid some bugs.Hasn't that been discussed before?<Discussions about signed-unsigned-derived troubles have happened before. But this time I have expressed a focused request, to turn indexes and lengths into signed words (as I have written in my enhancement request). I think this was not discussed before in a focused way (or I was not present yet). Bye, bearophile
Jun 14 2010
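For reference, a compilable restatement of the two getItem variants quoted above, with Item replaced by int so the sketch is self-contained (that substitution is the only assumption):

int getItemSigned(int[] arr, int indx)
{
    // Two tests; the indx >= 0 guard also protects the second comparison,
    // where a negative signed indx would otherwise be converted to a huge
    // unsigned value and slip through.
    if (indx >= 0 && indx < arr.length)
        return arr[indx];
    throw new Exception("index out of range");
}

int getItemUnsigned(int[] arr, uint indx)
{
    // One test: a "negative" argument has already wrapped to a huge value,
    // so it simply fails the comparison.
    if (indx < arr.length)
        return arr[indx];
    throw new Exception("index out of range");
}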
bearophile wrote:I have found a Reddit discussion a few days old: http://www.reddit.com/r/programming/comments/cdwz5/the_perils_of_unsigned_iteration_in_cc/ It contains this, that I quote (I have no idea if it's true), plus follow-ups:D provides powerful abstractions for iteration; it is becoming less and less desirable to hand-build loops with for-statements. As for "unsafe", I think you need to clarify this, as D is not memory unsafe despite the existence of integer over/under flows.At Google using uints of all kinds for anything other than bitmasks or other inherently bit-y, non computable things is strongly discouraged. This includes things like array sizes, and the warnings for conversion of size_t to int are disabled. I think it's a good call.<I have expressed similar ideas here: http://d.puremagic.com/issues/show_bug.cgi?id=3843 Unless someone explains to me why I am wrong, I will keep thinking that using unsigned words to represent lengths and indexes, as D does, is wrong, and that signed words are a better design choice.In a language as greatly numerically unsafe as D (silly C-derived conversion rules,Actually, I think they make a lot of sense, and D's improvement on them that only disallows conversions that lose bits based on range propagation is far more sensible.fixed-sized numbers used everywhere on default, no runtime numerical overflows) the usage of unsigned numbers can be justified inside bit vectors, bitwise operations, and few other similar situations only. If D wants to be "a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python." it must understand that numerical safety is one of the not-secondary things that make languages such as Ruby and Python more productive.I have a hard time believing that Python and Ruby are more productive primarily because they do not have an unsigned type. Python did not add overflow protection until 3.0, so it's very hard to say this crippled productivity in early versions. http://www.python.org/dev/peps/pep-0237/ Ruby & Python 3.0 dynamically switch to larger integer types when overflow happens. This is completely impractical in a systems language, and is one reason why Ruby & Python are execrably slow compared to C-style languages.
Jun 14 2010
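A small example of the range-propagation rule Walter mentions, as it behaves in D (the compiler accepts a narrowing conversion only when value-range propagation proves no bits can be lost):

void main()
{
    int x = 1000;
    ubyte lo = x & 0xFF;   // accepted: (x & 0xFF) is provably in 0 .. 255
    // ubyte b = x;        // rejected: an int does not provably fit in a ubyte
}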
On 06/14/2010 05:48 PM, Walter Bright wrote:bearophile wrote:Ooo ooo, can we remove it?I have found a Reddit discussion a few days old: http://www.reddit.com/r/programming/comments/cdwz5/the_perils_of_unsigned_iteration_in_cc/ It contains this, that I quote (I have no idea if it's true), plus follow-ups:D provides powerful abstractions for iteration; it is becoming less and less desirable to hand-build loops with for-statements.At Google using uints of all kinds for anything other than bitmasks or other inherently bit-y, non computable things is strongly discouraged. This includes things like array sizes, and the warnings for conversion of size_t to int are disabled. I think it's a good call.<I have expressed similar ideas here: http://d.puremagic.com/issues/show_bug.cgi?id=3843 Unless someone explains to me why I am wrong, I will keep thinking that using unsigned words to represent lengths and indexes, as D does, is wrong, and that signed words are a better design choice.As for "unsafe", I think you need to clarify this, as D is not memory unsafe despite the existence of integer over/under flows.I think the problem is that people don't generally think of fixnums as fixnums when they use them. Just recently I was reading about some security vulnerabilities in Ruby from a few years ago which were caused when whoever wrote the underlying C didn't take integer overflow into consideration. What I take away from this anecdote is that it's that much harder to write trustworthy code in D. As always, the existence of issue 259 doesn't help matters. And from personal experience, I submit that checking for overflow is very painful to do manually (I tried to write a modular arithmetic lib for fixnums - and gave up rather quickly). I want language support (or library support, I don't care).They're more productive because their built-in number types aren't fixnums. That's a nice large class of errors that don't exist in those languages.In a language as greatly numerically unsafe as D (silly C-derived conversion rules,Actually, I think they make a lot of sense, and D's improvement on them that only disallows conversions that lose bits based on range propagation is far more sensible.fixed-sized numbers used everywhere on default, no runtime numerical overflows) the usage of unsigned numbers can be justified inside bit vectors, bitwise operations, and few other similar situations only. If D wants to be "a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python." it must understand that numerical safety is one of the not-secondary things that make languages such as Ruby and Python more productive.I have a hard time believing that Python and Ruby are more productive primarily because they do not have an unsigned type.Python did not add overflow protection until 3.0, so it's very hard to say this crippled productivity in early versions. http://www.python.org/dev/peps/pep-0237/ Ruby & Python 3.0 dynamically switch to larger integer types when overflow happens. This is completely impractical in a systems language, and is one reason why Ruby & Python are execrably slow compared to C-style languages.
Jun 14 2010
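To illustrate how fiddly the manual checking Ellery mentions can get, here is one hand-written checked addition; checkedAdd is a hypothetical helper, not a library function:

int checkedAdd(int a, int b)
{
    immutable r = a + b;   // two's complement wrap-around
    // Signed overflow happened iff a and b share a sign that r does not,
    // i.e. the sign bit is set in both (a ^ r) and (b ^ r).
    if (((a ^ r) & (b ^ r)) < 0)
        throw new Exception("integer overflow in addition");
    return r;
}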
Ellery Newcomer wrote:On 06/14/2010 05:48 PM, Walter Bright wrote:No :-)D provides powerful abstractions for iteration; it is becoming less and less desirable to hand-build loops with for-statements.Ooo ooo, can we remove it?Like I said, this didn't appear in Python until quite recently (3.0), so that cannot be the primary productivity advantage of Python.I have a hard time believing that Python and Ruby are more productive primarily because they do not have an unsigned type.They're more productive because their built-in number types aren't fixnums. That's a nice large class of errors that don't exist in those languages.
Jun 14 2010
Walter Bright:Like I said, this didn't appear in Python until quite recently (3.0), so that cannot be the primary productivity advantage of Python.You are wrong, see my other answer. Bye, bearophile
Jun 14 2010
Walter Bright:D provides powerful abstractions for iteration; it is becoming less and less desirable to hand-build loops with for-statements.<I agree.As for "unsafe", I think you need to clarify this, as D is not memory unsafe despite the existence of integer over/under flows.<Modern languages must understand that there are other forms of safety beside memory safety. Integer overflows and signed-unsigned conversion-derived bugs can cause disasters as well. In the current D language the usage of unsigned numbers is a safety hazard. So far nothing I have seen written by you or other people has shown that this is false.Actually, I think they make a lot of sense, and D's improvement on them that only disallows conversions that lose bits based on range propagation is far more sensible.1) I'd like D to use signed words to represent lengths and array indexes. We are going to 64 bit systems where 63 bits can be enough for lengths. If arrays of 4 billion items are seen as important on 32 bit systems too, then use a long :-) 2) I don't like D to silently gulp down expressions that mix signed and unsigned integers and spit out wrong results when the integers were negative.I have a hard time believing that Python and Ruby are more productive primarily because they do not have an unsigned type.<Python is very productive (for small or medium sized programs! On large programs Python is less good) because of a quite long list of factors. My experience with D and Python (and several other languages) has shown me that Python not using fixnums is one of the factors that help productivity. It's surely not the only factor, and I agree with you that it's not the most important one, but it's surely one of the significant factors and it can't be ignored. Python integers don't overflow; this at the same time allows you to save brain time and brain power otherwise spent thinking about possible overflows and the code to avoid their risk, and makes coding more relaxed. And if you try to write 50 Project Euler programs in Python and D you will surely see how many bugs Python has saved you from compared to D. Finding and fixing such bugs in D code requires a lot of time that you save in Python. In D there are other bugs derived from mixing signed and unsigned numbers (and you can't avoid them just by avoiding unsigned numbers in your own code, because lengths and indexes and other things use them).Python did not add overflow protection until 3.0, so it's very hard to say this crippled productivity in early versions. http://www.python.org/dev/peps/pep-0237/You are wrong. Python 2.x dynamically switches to larger integer types when overflow happens. This is done transparently, avoids bugs and keeps programs more efficient. This is on Python V.2.6.5, but similar things happen in much older versions of Python:

>>> a = 2
>>> type(a)
<type 'int'>
>>> a += 10 ** 1000
>>> len(str(a))
1001
>>> type(a)
<type 'long'>

Ruby & Python 3.0 dynamically switch to larger integer types when overflow happens.This is wrong. Python 3.0 has just the multi-precision integer type, that is called "int". For small values it can and will probably use under the cover a user-invisible optimization that is essentially the same thing that Python 2.x does. At the moment Python 3 integers are a bit slower than Python 2.x ones because this optimization is not done yet; one of the main design goals of Python is to keep the C interpreter of Python itself really simple, so even non-expert C programmers can hack it and help in the development of Python.
The PEP 237 and its unification of types was done because: 1) there's no need to keep two integer types in the language; you can just keep one and the language can use invisible optimizations where possible. Python is designed to be simple, so removing one type is good. 2) Actually in very uncommon situations the automatic switch to multi-precision integers can't happen. Such situations are very hard to find; they do not come up in normal numerical code, they come up when you use C extensions (or Python standard library code that is written in C). You can program every day for four years in Python 2.x and never find such cases.This is completely impractical in a systems language, and is one reason why Ruby & Python are execrably slow compared to C-style languages.<Lisp languages can be only 1.0-3.0 times slower than C despite using mostly multi-precision numbers. So I don't think well implemented multi-precision numbers are so bad in a very fast language. And where performance really matters fixnums can be used. In the last years I am starting to think that using fixnums everywhere is a premature optimization. But anyway, the purpose of my original post was not to advocate the replacement of fixnums in D with multi-precision numbers; it was about the change of array indexes and lengths from unsigned to signed. Python is slow compared to D, and surely their multi-precision numbers don't help their performance, but the "lack" of Python performance has many causes and the main ones are not the multi-precision numbers. The main cause is that Python is designed to have a simple interpreter that can be modified by not very expert C programmers. This allows lots of people to write and work on it; this was one of the causes of the Python success. The Unladen Swallow project has shown that you can make Python 2-4 times faster just "improving" (messing it up and adding some hairy hacks to it) its interpreter, etc. One of the main causes of the low Python performance is that it's dynamically typed and at the same time it lacks a just-in-time compiler. The Psyco JIT compiler allows me to write Python code that is usually no more than 10 times slower than D. The wonderful JIT compiler of Lua (which lacks multi-precision numbers but has dynamic typing) allows it to run usually at 0.9-2.5 times slower than D compiled with DMD (0.9 means it's faster on some FP-heavy code). Another cause of Python's low performance is just that Python code is often not written with performance in mind. I am often able to write Python programs that are 2-3 times faster than the Python programs I can find around. Bye, bearophile
Jun 14 2010
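The behavioral difference bearophile describes can be reproduced in D itself, assuming Phobos's std.bigint (which trades some speed for unbounded range):

import std.bigint;
import std.stdio;

void main()
{
    int n = int.max;
    writeln(n + 1);     // prints -2147483648: fixnum arithmetic wraps silently

    BigInt b = int.max;
    writeln(b + 1);     // prints 2147483648: grows like a Python integer
}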
bearophile:2) I don't like D to silently gulp down expressions that mix signed and unsigned integers and spit out wrong results when the integers were negative.Walter, answering something similar:Andrei and I went down that alley for a while. It's not practical.OK. Then just removing as many unsigned words as possible from normal code (you can see this as the code you want to write in SafeD) can be an alternative. The indexes and lengths are a common source of unsigned word usage inside SafeD programs. Bye, bearophile
Jun 14 2010
bearophile wrote:Walter Bright:In D's safe mode, integer overflow *cannot* lead to memory corruption. So when you say something is "unsafe", I think it's reasonable to ask what you mean by it. For example, if you define "safe" as "guaranteed to not have bugs", then you're requiring that there be a proof of correctness for all programs in D.As for "unsafe", I think you need to clarify this, as D is not memory unsafe despite the existence of integer over/under flows.<Modern languages must understand that there are other forms of safety beside memory safety. Integer overflows and signed-unsigned conversion-derived bugs can cause disasters as well. In the current D language the usage of unsigned numbers is a safety hazard. So far nothing I have seen written by you or other people has shown that this is false.This would lead to silent breakage of code transferred from C and C++. We've tried very hard to not have such things in D. The idea is that code that looks the same either behaves the same or issues an error. There's no way to make your proposal pass this requirement.Actually, I think they make a lot of sense, and D's improvement on them that only disallows conversions that lose bits based on range propagation is far more sensible.1) I'd like D to use signed words to represent lengths and array indexes.We are going to 64 bit systems where 63 bits can be enough for lengths. If arrays of 4 billion items are seen as important on 32 bit systems too, then use a long :-) 2) I don't like D to silently gulp down expressions that mix signed and unsigned integers and spit out wrong results when the integers were negative.That idea has a lot of merit for 64 bit systems. But there are two problems with it: 1. D source code is supposed to be portable between 32 and 64 bit systems. This would fail miserably if the sign of things silently changed in the process. 2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.We can argue forever about how significant it is; I don't assign nearly as much to it as you do.I have a hard time believing that Python and Ruby are more productive primarily because they do not have an unsigned type.<Python is very productive (for small or medium sized programs! On large programs Python is less good) because of a quite long list of factors. My experience with D and Python (and several other languages) has shown me that Python not using fixnums is one of the factors that help productivity. It's surely not the only factor, and I agree with you that it's not the most important one, but it's surely one of the significant factors and it can't be ignored.Python integers don't overflow; this at the same time allows you to save brain time and brain power otherwise spent thinking about possible overflows and the code to avoid their risk, and makes coding more relaxed. And if you try to write 50 Project Euler programs in Python and D you will surely see how many bugs Python has saved you from compared to D. Finding and fixing such bugs in D code requires a lot of time that you save in Python.This is where we differ. I very rarely have a bug due to overflow or signed/unsigned differences. If you use the D loop abstractions, you should never have these issues with them.Here's what Wikipedia says about it.
"In Python, a number that becomes too large for an integer seamlessly becomes a long.[1] And in Python 3.0, integers and arbitrary sized longs are unified." -- http://en.wikipedia.org/wiki/Integer_overflow (Just switching to long isn't good enough - what happens when long overflows? I generally don't like solution like this because it makes tripping the bug so rare that it can lurk for years. I prefer to flush bugs out in the open early.)Python did not add overflow protection until 3.0, so it's very hard to say this crippled productivity in early versions. http://www.python.org/dev/peps/pep-0237/You are wrong. Python 2.x dynamically switches to larger integer types when overflow happens. This is done transparently and avoids bugs and keeps programs more efficient. This is on Python V.2.6.5 but similar things happen in much older versions of Python:<type 'int'>a = 2 type(a)1001a += 10 ** 1000 len(str(a))<type 'long'>type(a)3x is a BIG deal. If you're running a major site, this means you only need 1/3 of the hardware, and 1/3 of the electric bill. If you're running a program that takes all day, now you can run it 3 times that day.This is completely impractical in a systems language, and is one reason why Ruby & Python are execrably slow compared to C-style languages.Lisp languages can be only a 1.0-3.0 times slower can C despite using mostly multi-precision numbers. So I don't think well implemented multi-precision numbers are so bad in a very fast language.
Jun 14 2010
On 06/15/2010 03:49 AM, Walter Bright wrote:Here's what Wikipedia says about it. "In Python, a number that becomes too large for an integer seamlessly becomes a long.[1] And in Python 3.0, integers and arbitrary sized longs are unified." -- http://en.wikipedia.org/wiki/Integer_overflow (Just switching to long isn't good enough - what happens when long overflows? I generally don't like solutions like this because they make tripping the bug so rare that it can lurk for years. I prefer to flush bugs out in the open early.)A "long" in Python would be a BigInt in D, so no overflows. Python integers don't overflow.
Jun 15 2010
Walter Bright:In D's safe mode, integer overflow *cannot* lead to memory corruption. So when you say something is "unsafe", I think it's reasonable to ask what you mean by it.<I meant "more numerically safe". That is, it helps avoid part of the integral-derived bugs.We've tried very hard to not have such things in D. The idea is that code that looks the same either behaves the same or issues an error. There's no way to make your proposal pass this requirement.<I see. We can drop this, then.We can argue forever about how significant it is; I don't assign nearly as much to it as you do.<I see. If you try solving many Project Euler problems you can see how common those bugs are :-) For other kinds of code they are probably less common.If you use the D loop abstractions, you should never have these issues with them.<In D I am probably using higher loop abstractions than the ones you use normally, but now and then I have those bugs anyway. Taking the length of an array is necessary now and then even if you use loop abstractions (and higher-order functions such as maps, filters, etc).Here's what Wikipedia says about it.This is exactly the same thing I have said :-)"In Python, a number that becomes too large for an integer seamlessly becomes a long.[1] And in Python 3.0, integers and arbitrary sized longs are unified."<<(Just switching to long isn't good enough - what happens when long overflows?<Maybe this is where you didn't understand the situation: Python 2.x "long" means multi-precision integral numbers. In my example the number was 1001 decimal digits long.I generally don't like solutions like this because they make tripping the bug so rare that it can lurk for years. I prefer to flush bugs out in the open early.)<In Python 2.x this causes zero bugs because those "longs" are multi-precision.3x is a BIG deal. If you're running a major site, this means you only need 1/3 of the hardware, and 1/3 of the electric bill. If you're running a program that takes all day, now you can run it 3 times that day.<This point of the discussion is probably too indefinite to say something useful about it. I can answer you that in critical spots of the program it is probably easy enough to replace multi-precision ints with fixnums, and this can make the whole program not significantly slower than C code. And in some places the compiler can infer where fixnums are enough and use them automatically. In the end, regarding this point mine is mostly a gut feeling derived from many years of usage of multi-precision numbers: I think that in a nearly-system language as D well implemented multi-precision numbers (with the option to use fixnums in critical spots) can lead to efficient enough programs. I have programmed in a compiled Common Lisp a bit, and the integer value performance is not so bad. I can of course be wrong, but only an actual test can show it :-) Maybe someday I will try it and do some benchmarks. The current BigInt of D needs the small-number optimization before such a test can be tried (that is, to avoid heap allocation when the big number fits in 32 or 64 bits), and the compiler is not smart enough to replace bigints with ints where bigints are not necessary. Runtime integer overflow checks can be enabled or disabled, and I have seen that the performance with them enabled is only a bit lower, not significantly so (I have seen the same thing in Delphi years ago).That idea has a lot of merit for 64 bit systems. But there are two problems with it: 1. D source code is supposed to be portable between 32 and 64 bit systems.
This would fail miserably if the sign of things silently changed in the process.<Then we can use a signed word on 32 bit systems too. Or if you don't like that, to represent lengths/indexes we can use 64 bit signed values on 32 bit systems too.2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.<I am not expert enough about this to understand well the downsides of signed numbers used here. But I can say that D is already not the best language to develop non-toy operating systems. And even if someone writes a serious operating system in D, this is an uncommon application of the D language, where probably 95% of the other people write other kinds of programs for which unsigned integers everywhere are not the best thing. And the uncommon people that want to write an OS or device driver with D can use signed words. Such uncommon people can even design and use their own arrays with unsigned-word lengths/indexes :-) Designing D to appeal to a very uncommon kind of power-user that needs to write an operating system with D doesn't look like a good design choice. If this whole thread goes nowhere then later I can even close bug 3843, because there's little point in keeping it open. Bye, bearophile
Jun 15 2010
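A hypothetical sketch of the small-number optimization mentioned above: keep the value in a machine word and spill to a heap-allocated BigInt only when an operation overflows. SmallInt and its layout are illustrative only, and the overflow test borrows today's core.checkedint (which did not exist at the time of this thread):

import std.bigint;
import core.checkedint : adds;

struct SmallInt
{
    long small;     // used while the value fits in 64 bits
    BigInt* big;    // non-null once the value has spilled

    void add(long x)
    {
        if (big is null)
        {
            bool overflow;
            immutable r = adds(small, x, overflow);
            if (!overflow) { small = r; return; }
            big = new BigInt(small);   // spill to multi-precision
        }
        *big += x;   // already (or newly) multi-precision
    }
}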
And the uncommon people that want to write an OS or device driver with D can use signed words.Sorry, I meant they can use unsigned words. Bye, bearophile
Jun 15 2010
bearophile wrote:But I can say that D is already not the best language to develop non-toy operating systems.Why?
Jun 15 2010
Walter Bright:This is partially off-topic for this thread. I have not written an OS yet, so I can't be sure. But from what I have read and seen, D seems designed for different purposes, mostly as a high-performance low-level application language that currently is programmed in a style that doesn't assume a very efficient GC. D has many features that are useless or negative if you want to write code close to the metal, as in a kernel, such as classes, virtual functions, the garbage collector, operator overloading, interfaces, exceptions and try-catch-finally blocks, closures, references, delegates, nested functions and structs, array concat, built-in associative arrays, monitors, automatic destructors. When you write code close to the metal you want to know exactly what your code is doing, so all the automatic or higher level things become useless or worse: they keep you from seeing what the hardware is actually doing. On the other hand the current D language (and C and C++) lacks other hard-to-implement features that allow the kernel programmer to give more semantics to the code. So such semantics has to be expressed through normal coding. Future languages maybe will improve on this, but it will be hard work. The ATS language tries to improve a bit on this, but it's far from being good and its syntax is awful. D also lacks a good number of nonstandard C features that are present in the "C" compiled by GCC; such low-level features and compilation flags can be quite useful if you write a kernel. Even LDC has a few of such features. Bye, bearophileBut I can say that D is already not the best language to develop non-toy operating systems.Why?
Jun 15 2010
bearophile wrote:Walter Bright:Indeed, only a subset of D is useful for low-level development. But D has more close-to-the-metal features than C does. (Compare with C++, which didn't improve the machine model it inherited from C.) Of course, the market for kernel development is so small and so dominated by C that it's not really worth worrying about.This is partially off-topic for this thread. I have not written an OS yet, so I can't be sure. But from what I have read and seen, D seems designed for different purposes, mostly as a high-performance low-level application language that currently is programmed in a style that doesn't assume a very efficient GC. D has many features that are useless or negative if you want to write code close to the metal, as in a kernelBut I can say that D is already not the best language to develop non-toy operating systems.Why?
Jun 15 2010
bearophile wrote:When you write code close to the metal you want to know exactly what your code is doing, so all the automatic or higher level things become useless or worse: they keep you from seeing what the hardware is actually doing.Right. That's why I really respect Linus's point of view on that matter. And his latest comments on it look well motivated to me. -- Alex Makhotin, the founder of BITPROX, http://bitprox.com
Jun 15 2010
bearophile wrote:Walter Bright:I'd rephrase that as D supports many different styles. One of those styles is as a "better C".I have not written an OS yet, so I can't be sure. But from what I have read and seen, D seems designed for different purposes, mostly as a high-performance low-level application language that currently is programmed in a style that doesn't assume a very efficient GC.But I can say that D is already not the best language to develop non-toy operating systems.Why?D has many features that are useless or negative if you want to write code close to the metal, as in a kernel, such as classes, virtual functions, the garbage collector, operator overloading, interfaces, exceptions and try-catch-finally blocks, closures, references, delegates, nested functions and structs, array concat, built-in associative arrays, monitors, automatic destructors. When you write code close to the metal you want to know exactly what your code is doing, so all the automatic or higher level things become useless or worse: they keep you from seeing what the hardware is actually doing.I agree on those points. Those features would not be used when using D as a "better C". So, you could ask why not use C++ as a "better C" and eschew the C++ features that cause trouble for kernel dev? The answer is that C++ doesn't offer much over C that does not involve those trouble-causing features. D, on the other hand, offers substantial and valuable features not available in C or C++ that can be highly useful for kernel dev. Read on.On the other hand the current D language (and C and C++) lacks other hard-to-implement features that allow the kernel programmer to give more semantics to the code. So such semantics has to be expressed through normal coding. Future languages maybe will improve on this, but it will be hard work. The ATS language tries to improve a bit on this, but it's far from being good and its syntax is awful.I think you are giving zero weight to the D features that assist kernel programming.D also lacks a good number of nonstandard C features that are present in the "C" compiled by GCC; such low-level features and compilation flags can be quite useful if you write a kernel. Even LDC has a few of such features.A non-standard feature means the language is inadequate. There is nothing at all preventing non-standard features from being added to D for specific tasks. There is no reason to believe it is harder to do that for D than for C. As for standard features D has that make it more suitable for low-level programming than C: 1. inline assembler as a standard feature 2. const/immutable qualifiers 3. identification of shared data with the shared type constructor 4. enforced function purity 5. guaranteed basic type sizes 6. arrays that actually work 7. scope guard (yes, even without exception handling) BTW, you might ask "how do I know my D code doesn't have exception handling or GC calls in it?" There are several ways: 1. Remove the support for it from the library. Then, attempts to use such features will cause the link step to fail. Kernel C programmers use a custom library anyway; no reason why D kernel dev cannot. 2. Compiling code with "nothrow" will check that exceptions are not generated. 3. The compiler could be easily modified to add a switch that prevents such features from being used. This is no different from the customizations done to C compilers for kernel dev.
Jun 15 2010
Walter Bright wrote:bearophile wrote:As for standard features D has that make it more suitable for low-level programming than C: 1. inline assembler as a standard feature 2. const/immutable qualifiers 3. identification of shared data with the shared type constructor 4. enforced function purity 5. guaranteed basic type sizes 6. arrays that actually work6.5. arrays that actually work and don't need garbage collection7. scope guard (yes, even without exception handling)Andrei
Jun 15 2010
Walter Bright wrote:I think you are giving zero weight to the D features that assist kernel programming.What bothers me about this discussion is this: consider D with features 1, 2, 3, and 4, and language X with features 1, 2, and 5. X is determined to be better than D because X has feature 5, but since X does not have features 3 and 4, 3 and 4 are therefore deemed irrelevant. For example, the more I use scope guard statements, the more of a game changer I believe they are in eliminating the usual rat's nest of goto's one finds in C code.
Jun 15 2010
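To make the scope-guard point concrete, a sketch of the pattern in D (file handling chosen arbitrarily; in C the equivalent cleanup is usually a goto to a label at the end of the function):

import std.stdio;

void process(string path)
{
    auto f = File(path, "r");
    scope(exit) f.close();   // runs on every exit path: return, throw, fall-through

    auto line = f.readln();
    if (line.length == 0)
        return;              // no explicit cleanup needed here; the guard still fires

    writeln(line);
}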
On 15.06.2010 19:41, Walter Bright wrote:3. The compiler could be easily modified to add a switch that prevents such features from being used. This is no different from the customizations done to C compilers for kernel dev.Why not make such a change in a future release of the official version?
Jun 16 2010
Stephan wrote:On 15.06.2010 19:41, Walter Bright wrote:It's pretty low on the priority list, because the absence of such a switch would not prevent you from using D as a better C compiler.3. The compiler could be easily modified to add a switch that prevents such features from being used. This is no different from the customizations done to C compilers for kernel dev.Why not make such a change in a future release of the official version ?
Jun 16 2010
Walter Bright wrote:Stephan wrote:I would move it up in priority if there were a serious project that needed it, as opposed to it being a convenient excuse to not use D. One reason that dmd comes with source is so that people can try out things like this.On 15.06.2010 19:41, Walter Bright wrote:It's pretty low on the priority list, because the absence of such a switch would not prevent you from using D as a better C compiler.3. The compiler could be easily modified to add a switch that prevents such features from being used. This is no different from the customizations done to C compilers for kernel dev.Why not make such a change in a future release of the official version?
Jun 16 2010
bearophile wrote:D also lacks a good number of nonstandard C features that are present in the "C" compiled by GCC; such low-level features and compilation flags can be quite useful if you write a kernel. Even LDC has a few of such features.It's interesting that D already has most of the gcc extensions: http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html as standard features, rather than extensions. Being part of the standard language implies D is more suitable for kernel dev than standard C is.
Jun 15 2010
Walter Bright:I'd rephrase that as D supports many different styles. One of those styles is as a "better C".<D can replace many but not all usages of C; think about programming an Arduino (http://en.wikipedia.org/wiki/Arduino ) with a dmd compiler of today.I agree on those points. Those features would not be used when using D as a "better C".<A problem is that some of those D features can worsen kernel code. So for example you have to review code to avoid operator overloading usage :-) There is a lot of D compiler complexity useless for that kind of code. A simpler compiler means fewer bugs and less D manual to read.The answer is that C++ doesn't offer much over C that does not involve those trouble-causing features. D, on the other hand, offers substantial and valuable features not available in C or C++ that can be highly useful for kernel dev. Read on.<I don't know if D offers enough of what a kernel developer needs.A non-standard feature means the language is inadequate.<I agree, standard C is not perfect for that purpose.There is nothing at all preventing non-standard features from being added to D for specific tasks. There is no reason to believe it is harder to do that for D than for C.<I agree. (But note that here we are talking just about low level features. Linus has said that such features are important, but he desires other things absent in C.)As for standard features D has that make it more suitable for low level programming than C is:<I agree.Since it has more than C does, and C is used for kernel dev, then it must be enough.<Kernel C code uses several GCC extensions to the C language. And Linus says he desires higher level features absent from C, from C++, and from those GCC extensions.I'll await your reply there.<I appreciate your trust, but don't expect me to be able to teach you things about C and the kind of code needed to write a kernel; you have way more experience than me :-) --------------------With all due respect to Linus, in 30 years of professionally writing software, I've found that if you solely base improvements on what customers ask for, all you have are incremental improvements. No quantum leaps, no paradigm shifts, no game changers.<You are right in general, but I don't know how much you are right regarding Linus. Linus desires some higher level features but maybe he doesn't exactly know what he desires :-) An example of such a language is Sing# (http://en.wikipedia.org/wiki/Sing_Sharp ), needed to write the experimental Singularity OS. It supports nonnull types, method contracts, object invariants and an ownership type system. Internally, it uses an automatic theorem prover [7] that analyzes the verification conditions to prove the correctness of the program or find errors in it. One of the main innovations of Boogie is a systematic way of handling callbacks and aggregate objects, and it supports both object [4] and static [3] class invariants.<Spec# also has an "assume" statement, similar to the __assume one of C++ (http://msdn.microsoft.com/en-us/library/1b3fsfxw%28VS.80%29.aspx ), to state that some condition is true before some method call that has that thing as a precondition. I have not fully understood the purpose of this, but I think it can be useful for performance (because contracts are enforced in "release mode" too, so the compiler has to try to remove some of them to improve code performance). T! t = new T(); // OK t = null; // not allowed Even if D can't turn all its class references to non-null by default, a syntax to specify references and pointers that can't be null can be added.
The bang symbol can't be used in D for that purpose, it has enough purposes already. - [Pure] Method does not change the existing objects (but it may create and update new objects). - [Confined] Method is pure and reads only this and objects owned by this. - [StateIndependent] Method does not read the heap at all. Add one of the three attributes above to a method to declare it as a pure method. Any called method in a contract has to be pure. http://research.microsoft.com/en-us/projects/specsharp/krml153.pdfSometimes there are even consistency conditions that relate the instance fields of many or all objects of a class; static class invariants describe these relations, too, since they cannot be enforced by any one object in isolation.<This is an example, written in pseudo-D:

class Client {
    int id;
    static int last_used_id = 0;

    static invariant() {
        assert(Client.last_used_id >= 0);
        HashSet!int used_ids;
        foreach (c; all Client instances) {
            assert(c.id < Client.last_used_id);
            assert(c.id !in used_ids);
            used_ids.add(c.id);
        }
    }

    this() {
        this.id = Client.last_used_id;
        Client.last_used_id++;
    }
}

Every object of class Client has an ID. The next available ID is stored in the static field last_used_id. Static class invariants guarantee that last_used_id has not been assigned to a Client object and that all Client objects have different IDs.<In D class/struct invariants can access static fields too. Finding all instances of a class is not immediate in D; I don't think D reflection is enough here, you have to store all such references, for example in a static array of Clients. So in D it can become something like:

class Client {
    int id;
    static int last_used_id = 0;
    static Client[] clients;

    invariant() {
        assert(Client.last_used_id >= 0);
        // assert len(set(c.id for c in clients)) == len(clients)
        // assert all(c.id < Client.last_used_id for c in clients)
        HashSet!int used_ids;
        foreach (c; clients) {
            assert(c.id < Client.last_used_id);
            assert(c.id !in used_ids);
            used_ids.add(c.id);
        }
    }

    this() {
        this.id = Client.last_used_id;
        Client.last_used_id++;
        clients ~= this;
    }
}

--------------------It's interesting that D already has most of the gcc extensions: http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.htmlThere's a lot of stuff on that page, and some of those things are new to me :-) 4.3 Labels as Values: that's computed gotos; they can be useful if you write an interpreter or implement some kind of state machine. In the last two years I have found two situations where this GCC feature was useful to me. I'd like computed gotos in D too (both GDC and LDC can implement them in a simple enough way; if this is hard to implement with the DMD back-end then I'd like this feature to be in the D specs anyway, so other D compilers that want to implement it will implement it with the same standard syntax, improving portability of D code that uses it). I will write about more of those GCC things tomorrow... Bye, bearophile
Jun 15 2010
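For what it's worth, a rough library-level approximation of the requested non-null references is expressible in D without a language change; a hypothetical sketch (run-time checks, unlike Spec#'s compile-time ones):

struct NonNull(T) if (is(T == class))
{
    private T payload;

    @disable this();   // forbid default construction, which would leave payload null

    this(T value)
    {
        assert(value !is null, "NonNull constructed from null");
        payload = value;
    }

    void opAssign(T value)
    {
        assert(value !is null, "null assigned to NonNull");
        payload = value;
    }

    alias payload this;   // usable wherever a T is expected
}

// auto t = NonNull!Object(new Object);  // OK
// t = null;                             // fails the assertion at run time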
bearophile wrote:Walter Bright:The Arduino is an 8 bit machine. D is designed for 32 bit and up machines. Full C++ won't even work on a 16 bit machine, either.I'd rephrase that as D supports many different styles. One of those styles is as a "better C".<D can replace many but not all usages of C; think about programming an Arduino (http://en.wikipedia.org/wiki/Arduino ) with a dmd compiler of today.If you're a kernel dev, the language features should not be a problem for you. BTW, you listed nested functions as disqualifying a language from being a kernel dev language, yet gcc supports nested functions as an extension.I agree on those points. Those features would not be used when using D as a "better C".<A problem is that some of those D features can worsen kernel code. So for example you have to review code to avoid operator overloading usage :-) There is a lot of D compiler complexity useless for that kind of code. A simpler compiler means fewer bugs and less D manual to read.It offers more than what C does, so it must be enough, since C is enough.The answer is that C++ doesn't offer much over C that does not involve those trouble-causing features. D, on the other hand, offers substantial and valuable features not available in C or C++ that can be highly useful for kernel dev. Read on.<I don't know if D offers enough of what a kernel developer needs.As I pointed out, D implements the bulk of those extensions as a standard part of D.Since it has more than C does, and C is used for kernel dev, then it must be enough.<Kernel C code uses several GCC extensions to the C language.Linus may very well be an expert on various languages and their tradeoffs, but maybe not. As far as languages go, he may only be an expert on C. All I know for sure is he is an expert on C and kernel development, and a gifted manager.With all due respect to Linus, in 30 years of professionally writing software, I've found that if you solely base improvements on what customers ask for, all you have are incremental improvements. No quantum leaps, no paradigm shifts, no game changers.<You are right in general, but I don't know how much you are right regarding Linus. Linus desires some higher level features but maybe he doesn't exactly know what he desires :-)4.3 Labels as Values: that's computed gotos; they can be useful if you write an interpreter or implement some kind of state machine.They are useful in some circumstances, but are hardly necessary.
Jun 15 2010
On 6/15/10, Walter Bright <newshound1 digitalmars.com> wrote:Can't you accomplish the same thing with some minor sprinkling of inline assembly anyway?4.3 Labels as Values: that's computed gotos, they can be useful if you write an interpreter or you implement some kind of state machine.They are useful in some circumstances, but are hardly necessary.
Jun 15 2010
Walter Bright wrote:bearophile wrote:One was fixed in this week's DMD release. http://www.dsource.org/projects/dmd/changeset/491 It's interesting to think how this could have been avoided.Python integers don't overflow, this at the same time allows you to safe brain time and brain power thinking about possible overflows and the code to avoid their risk, and makes coding more relaxed. And if you try to write 50 Project Euler programs in Python and D you will surely see how many bugs the Python code has avoided you compared to D. Finding and fixing such bugs in D code requires lot of time that you save in Python.This is where we differ. I very rarely have a bug due to overflow or signed/unsigned differences.
Jun 15 2010
Walter Bright wrote:bearophile wrote:We are going to 64 bit systems where 63 bits can be enough for lengths. If arrays of 4 billion items are seen as important on 32 bit systems too, then use a long :-) 2) I don't like D to silently gulp down expressions that mix signed and unsigned integers and spit out wrong results when the integers were negative.That idea has a lot of merit for 64 bit systems. But there are two problems with it: 1. D source code is supposed to be portable between 32 and 64 bit systems. This would fail miserably if the sign of things silently changed in the process.Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both... Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jun 15 2010
Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.
Jun 15 2010
Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows). Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jun 16 2010
Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Gotcha! See the attached test case. I will post the explanation for the issue as a reply to give everyone a chance to try and spot the error... Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr

Attachment: test.c

#include <assert.h>
#include <stdio.h>

int main (int argc, char** argv)
{
    char* data = argv[0]; /* Just to get a valid pointer */
    unsigned int offset = 3;

    printf ("Original: %p\n", data);
    data += offset;
    printf ("+3 : %p\n", data);
    data += -offset;
    printf ("-3 : %p\n", data);
    assert (data == argv[0]); /* Works on 32-bit systems, fails on 64-bit */
    return 0;
}
Jun 16 2010
Jérôme M. Berger wrote:Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Gotcha! See the attached test case. I will post the explanation for the issue as a reply to give everyone a chance to try and spot the error...The problem comes from the fact that an unsigned int is 32 bits, even on a 64-bit architecture, so what happens is: - Some operation between signed and unsigned ints gives a negative result. Because of the automatic type conversion rules, this is converted to an unsigned 32-bit int; - The result is added to a pointer. On 32-bit systems, the operation simply wraps around and works. On 64-bit systems, the result is extended to 64 bits by adding zeroes (since it is unsigned) and the resulting pointer is wrong. That's reasonably easy to spot in this simple example. It's a lot more difficult in real world code. We had the problem because we were moving a pointer through the image data. As soon as the movement depended on the image dimensions (say: move left by 1/4 the width), the program crashed. Every other kind of move worked just fine... Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jun 16 2010
Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Gotcha! See the attached test case. I will post the explanation for the issue as a reply to give everyone a chance to try and spot the error... Jerome

Whoa! That's indeed unfortunate. Allow me some more whoring for TDPL:

==============
\indexes{surprising behavior!of unary \lstinline{-}}%
One surprising behavior of unary minus is that, when applied to an unsigned value, it still yields an unsigned value (according to the rules in~\S~\vref{sec:typing-of-ops}). For example,\sbs -55u is\sbs 4_294_967_241 , which is\sbs \ccbox{uint.max - 55 + 1}.

\indexes{unsigned type, natural number, two's complement, overflow}%
The fact that unsigned types are not really natural numbers is a fact of life. In\sbs\dee and many other languages, two's complement arithmetic with its simple overflow rules is an inescapable reality that cannot be abstracted away. One way to think \mbox{of} -val for any integral value~ val is to consider it a short form \mbox{of}$\,$ \cc{\~val + 1}; in other words, flip every bit in val and then add 1 to the result. This manipulation does not raise particular questions about the signedness of~ val .
==============

(This heavily adorned text also shows what sausage making looks like...)

Andrei
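The numbers in the excerpt are easy to check mechanically; a quick sketch (D fixes uint at 32 bits, so these hold on every target):

    void main()
    {
        static assert(-55u == 4_294_967_241);      // 2^32 - 55
        static assert(-55u == uint.max - 55 + 1);
        static assert(-55u == ~55u + 1);           // flip every bit, add 1
    }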
Jun 16 2010
Andrei Alexandrescu wrote:Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Gotcha! See the attached test case. I will post the explanation for the issue as a reply to give everyone a chance to try and spot the error... Jerome

Whoa! That's indeed unfortunate. Allow me some more whoring for TDPL:

==============
\indexes{surprising behavior!of unary \lstinline{-}}%
One surprising behavior of unary minus is that, when applied to an unsigned value, it still yields an unsigned value (according to the rules in~\S~\vref{sec:typing-of-ops}). For example,\sbs -55u is\sbs 4_294_967_241 , which is\sbs \ccbox{uint.max - 55 + 1}.

\indexes{unsigned type, natural number, two's complement, overflow}%
The fact that unsigned types are not really natural numbers is a fact of life. In\sbs\dee and many other languages, two's complement arithmetic with its simple overflow rules is an inescapable reality that cannot be abstracted away. One way to think \mbox{of} -val for any integral value~ val is to consider it a short form \mbox{of}$\,$ \cc{\~val + 1}; in other words, flip every bit in val and then add 1 to the result. This manipulation does not raise particular questions about the signedness of~ val .
==============

(This heavily adorned text also shows what sausage making looks like...)

In the original code, the problem didn't come from a unary minus. The rhs expression was quite a bit more complicated than that (not counting the fact that it was hidden in a preprocessor macro...). Note moreover that the problem doesn't come from the unary minus itself, since the code works as expected on 32-bit platforms...

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Jun 16 2010
Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Gotcha! See the attached test case. I will post the explanation for the issue as a reply to give everyone a chance to try and spot the error...

Easy. offset should be a size_t, not an unsigned.
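A sketch of Walter's fix applied to the attached test case: with a size_t offset, the negation happens at pointer width, so the wraparound cancels exactly on 32- and 64-bit targets alike (this mirrors test.c in D rather than being a separate claim about it):

    void main()
    {
        char[16] buf;
        char* data = buf.ptr;
        size_t offset = 3;    // pointer-width on every target

        data += offset;
        data += -offset;      // -offset is already pointer-width wide
        assert(data == buf.ptr);
    }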
Jun 16 2010
Walter Bright wrote:Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Actually, that problem already occurs in C. I've had problems when porting code from x86 to x86_64 because some unsigned operations don't behave the same way on both...How so? I thought most 64 bit C compilers were specifically designed to avoid this problem.I can't isolate it to a minimal test case, but at my job, we make an image processing library. Since negative image dimensions don't make sense, we decided to define width and height as "unsigned int". Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Gotcha! See the attached test case. I will post the explanation for the issue as a reply to give everyone a chance to try and spot the error...Easy. offset should be a size_t, not an unsigned.

And what about image width and height? Sure, in hindsight they could probably be made into size_t too. Much easier and safer to make them into signed ints instead, since we don't manipulate images bigger than 2_147_483_648 on a side anyway... Which is more or less bearophile's point: unless you're *really* sure that you know what you're doing, use signed ints even if negative numbers make no sense in a particular context.

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
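Jérôme's recommendation as a sketch (the Image struct and the numbers are invented): with signed dimensions, the "move left by a quarter of the width" computation from the real bug sign-extends correctly.

    struct Image { int width, height; }  // signed on purpose

    void main()
    {
        auto img = Image(640, 480);
        long pos = 10_000;           // stands in for a pointer offset
        pos += -(img.width / 4);     // stays signed, so it sign-extends
        assert(pos == 9_840);        // moved left by a quarter width
    }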
Jun 16 2010
Jérôme M. Berger wrote:Walter Bright wrote:Jérôme M. Berger wrote:Now, we have code that works fine on 32-bit platforms (x86 and arm) but segfaults on x86_64. Simply adding an (int) cast in front of the image dimensions in a couple of places fixes the issue (tested with various versions of gcc on linux and windows).Easy. offset should be a size_t, not an unsigned.And what about image width and height? Sure, in hindsight they could probably be made into size_t too. Much easier and safer to make them into signed ints instead, since we don't manipulate images bigger than 2_147_483_648 on a side anyway... Which is more or less bearophile's point: unless you're *really* sure that you know what you're doing, use signed ints even if negative numbers make no sense in a particular context.

I agree. Actually the great evil in C is that implicit casts from signed<->unsigned AND sign extension are both permitted in a single expression. I hope that when the integer range checking is fully implemented in D, such two-way implicit casts will be forbidden.

(D has introduced ANOTHER instance of this with the ridiculous >>> operator.

byte b = -1;
byte c = b >>> 1;

Guess what c is! )
Jun 17 2010
On Thu, 17 Jun 2010 10:00:24 +0200, Don <nospam nospam.com> wrote:(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! )127, right? I know at least RISC processors tend to have instructions for both a logical and algebraic right shift. In that context, it makes sense for a systems programming language.
Jun 17 2010
Justin Spahr-Summers wrote:On Thu, 17 Jun 2010 10:00:24 +0200, Don <nospam nospam.com> wrote:Surprise! c == -1. Because 1 is an int, b gets promoted to int before the shift happens. Then the result is 0x7FFF_FFFF which then gets converted to byte, leaving 0xFF == -1.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! )127, right? I know at least RISC processors tend to have instructions for both a logical and algebraic right shift. In that context, it makes sense for a systems programming language.
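Don's promotion chain, restated step by step (a sketch; the intermediate names are invented):

    void main()
    {
        byte b = -1;
        int promoted = b;             // integer promotion: 0xFFFF_FFFF
        int shifted = promoted >>> 1; // logical shift: 0x7FFF_FFFF
        byte c = cast(byte) shifted;  // keeps the low byte: 0xFF
        assert(c == -1);              // the "unsigned" shift vanished
    }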
Jun 17 2010
Hello Don,Surprise! c == -1.No kidding!Because 1 is an int, b gets promoted to int before the shift happens.Why would it ever need to be promoted? Unless all (most?) CPUs have only size_t shifts, all three shifts should never promote the LHS. -- ... <IXOYE><
Jun 17 2010
BCS wrote:Hello Don,It shouldn't NEED to. But C defined that >> and << operate that way.Surprise! c == -1.No kidding!Because 1 is an int, b gets promoted to int before the shift happens.Why would it ever need to be promoted? Unless all (most?) CPUs have only size_t shifts, all three shifts should never promote the LHS.
Jun 17 2010
Hello Don,BCS wrote:Hello Don,Surprise! c == -1.No kidding!Because 1 is an int, b gets promoted to int before the shift happens.Why would it ever need to be promoted? Unless all (most?) CPUs have only size_t shifts, all three shifts should never promote the LHS.It shouldn't NEED to. But C defined that >> and << operate that way.

At least for the >>> can we break that? C doesn't even *have* a >>> operator.

-- 
... <IXOYE><
Jun 17 2010
Don Wrote:(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! )

:) Well, there was an issue. Wasn't it fixed? A more interesting case is

byte c = -1 >>> 1;
Jun 17 2010
Kagamin wrote:Don Wrote:No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was issue. Wasn't it fixed?More interesting case is byte c = -1 >>> 1;
Jun 17 2010
On Jun 17, 10 18:59, Don wrote:Kagamin wrote:I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.Don Wrote:No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was issue. Wasn't it fixed?More interesting case is byte c = -1 >>> 1;
Jun 17 2010
KennyTM~ wrote:On Jun 17, 10 18:59, Don wrote:The range of typeof(x & y) can never exceed the range of typeof(x), no matter what value y is. Yet (byte & int) is promoted to int. Actually, what happens to x>>>y if y is negative? The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?Kagamin wrote:I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.Don Wrote:No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was issue. Wasn't it fixed?More interesting case is byte c = -1 >>> 1;
Jun 17 2010
Hello Don,The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?However it's not that way for the ternary op, so there is a (somewhat related) precedent. Even considering RHS<0, I would NEVER /expect/ a shift to have any type other than typeof(LHS). -- ... <IXOYE><
Jun 17 2010
On Jun 17, 10 21:04, Don wrote:KennyTM~ wrote:On Jun 17, 10 18:59, Don wrote:Kagamin wrote:More interesting case is byte c = -1 >>> 1;No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.The range of typeof(x & y) can never exceed the range of typeof(x), no matter what value y is. Yet (byte & int) is promoted to int.

That's arguable. But (byte & int -> int) is meaningful because (&) is somewhat "symmetric" compared to (>>>). What does (&) do?

(a & b) <=> foreach (bit x, y; zip(a, b)) yield bit(x && y ? 1 : 0);

What does (>>>) do?

(a >>> b) <=> repeat b times { logical right shift (a); } return a;

Algorithmically, (&) needs to iterate over all bits of "a" and "b", but for (>>>) the range of "b" is irrelevant to the result of "a >>> b".

Actually, what happens to x>>>y if y is negative?

x.d(6): Error: shift by -1 is outside the range 0..32

The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?

There's already an extra rule for >>>. ubyte a = 1; writeln(typeof(a >>> a).stringof); // prints "int". Similarly, (^^), (==), etc do not obey this "rule". IMO, for ShiftExpression ((>>), (<<), (>>>)) the return type should be typeof(lhs).
Jun 17 2010
KennyTM~ wrote:On Jun 17, 10 21:04, Don wrote:See below. It's what C does that matters.KennyTM~ wrote:That's arguable. But (byte & int -> int) is meaningful because (&) is some what "symmetric" compared to (>>>).On Jun 17, 10 18:59, Don wrote:The range of typeof(x & y) can never exceed the range of typeof(x), no matter what value y is. Yet (byte & int) is promoted to int.Kagamin wrote:I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.Don Wrote:No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was issue. Wasn't it fixed?If y is a variable, it actually performs x >>> (y&31); So it actually makes no sense for it to cast everything to int.Actually, what happens to x>>>y if y is negative?x.d(6): Error: shift by -1 is outside the range 0..32The logical operators aren't relevant. They all return bool. ^^ obeys the rule: typeof(a^^b) is typeof(a*b), in all cases.The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?There's already an extra rule for >>>. ubyte a = 1; writeln(typeof(a >>> a).stringof); // prints "int". Similarly, (^^), (==), etc do not obey this "rule".IMO, for ShiftExpression ((>>), (<<), (>>>)) the return type should be typeof(lhs).I agree that would be better, but it would be a silent change from the C behaviour. So it's not possible.
Jun 17 2010
On Jun 17, 10 23:50, Don wrote:KennyTM~ wrote:Too bad.On Jun 17, 10 21:04, Don wrote:See below. It's what C does that matters.KennyTM~ wrote:That's arguable. But (byte & int -> int) is meaningful because (&) is some what "symmetric" compared to (>>>).On Jun 17, 10 18:59, Don wrote:The range of typeof(x & y) can never exceed the range of typeof(x), no matter what value y is. Yet (byte & int) is promoted to int.Kagamin wrote:I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.Don Wrote:No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was issue. Wasn't it fixed?If y is a variable, it actually performs x >>> (y&31); So it actually makes no sense for it to cast everything to int.Actually, what happens to x>>>y if y is negative?x.d(6): Error: shift by -1 is outside the range 0..32The logical operators aren't relevant. They all return bool. ^^ obeys the rule: typeof(a^^b) is typeof(a*b), in all cases.The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?There's already an extra rule for >>>. ubyte a = 1; writeln(typeof(a >>> a).stringof); // prints "int". Similarly, (^^), (==), etc do not obey this "rule".IMO, for ShiftExpression ((>>), (<<), (>>>)) the return type should be typeof(lhs).I agree that would be better, but it would be a silent change from the C behaviour. So it's not possible.
Jun 17 2010
Don wrote:KennyTM~ wrote:Wait a minute. D should never allow an implicit narrowing conversion. It doesn't for other cases, so isn't this a simple bug? AndreiOn Jun 17, 10 18:59, Don wrote:The range of typeof(x & y) can never exceed the range of typeof(x), no matter what value y is. Yet (byte & int) is promoted to int. Actually, what happens to x>>>y if y is negative? The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?Kagamin wrote:I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.Don Wrote:No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was issue. Wasn't it fixed?
Jun 17 2010
Andrei Alexandrescu wrote:Don wrote:KennyTM~ wrote:On Jun 17, 10 18:59, Don wrote:Kagamin wrote:Don Wrote:(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was an issue. Wasn't it fixed?No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.I disagree. The flaw is whether x should be promoted to CommonType!(typeof(x), int), given that the range of typeof(x >>> y) should never exceed the range of typeof(x), no matter what value y is.The range of typeof(x & y) can never exceed the range of typeof(x), no matter what value y is. Yet (byte & int) is promoted to int. Actually, what happens to x>>>y if y is negative? The current rule is: x OP y means cast(CommonType!(x,y))x OP cast(CommonType!(x,y))y for any binary operation OP. How can we fix >>> without adding an extra rule?Wait a minute. D should never allow an implicit narrowing conversion. It doesn't for other cases, so isn't this a simple bug?

It'll make it illegal, but it won't make it usable. I think the effect of full range propagation will be that >>> will become illegal for anything other than int and long, unless it is provably identical to >>. Unless you do the hideous b >>> cast(typeof(b))1; I think every D style guide will include the recommendation, "never use >>>".

A question I have though is, Java has >>>. Does Java have these problems too?
Jun 17 2010
On Thu, 17 Jun 2010 15:24:52 -0400, Don <nospam nospam.com> wrote:A question I have though is, Java has >>>. Does Java have these problems too?Java doesn't have unsigned values, so it's necessary to use regular ints as bitmasks, hence the extra operator. -Steve
Jun 17 2010
Steven Schveighoffer wrote:On Thu, 17 Jun 2010 15:24:52 -0400, Don <nospam nospam.com> wrote:The reason D has >>> is to cause an unsigned right shift to be generated without needing to resort to casts as one has to in C. The problem with such casts is they wreck generic code.A question I have though is, Java has >>>. Does Java have these problems too?Java doesn't have unsigned values, so it's necessary to use regular int's as bitmasks, hence the extra operator.
Jun 17 2010
Walter Bright wrote:Steven Schveighoffer wrote:No. http://www.digitalmars.com/d/2.0/phobos/std_traits.html#Unsigned void fun(T)(T num) if (isIntegral!T) { auto x = cast(Unsigned!T) num; ... } AndreiOn Thu, 17 Jun 2010 15:24:52 -0400, Don <nospam nospam.com> wrote:The reason D has >>> is to cause an unsigned right shift to be generated without needing to resort to casts as one has to in C. The problem with such casts is they wreck generic code.A question I have though is, Java has >>>. Does Java have these problems too?Java doesn't have unsigned values, so it's necessary to use regular int's as bitmasks, hence the extra operator.
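Andrei's fragment, extended into a complete helper (a sketch; lshr is an invented name, and it covers only the built-in integral types that std.traits.Unsigned handles):

    import std.traits : isIntegral, Unsigned;

    // Logical ("unsigned") right shift for any built-in integral type,
    // with the result narrowed back to the operand's own type.
    T lshr(T)(T x, uint n) if (isIntegral!T)
    {
        return cast(T)(cast(Unsigned!T) x >> n);
    }

    unittest
    {
        byte b = -1;
        assert(lshr(b, 1) == 127);      // what b >>> 1 arguably should give
        assert(lshr(-1, 1) == int.max); // matches -1 >>> 1 for plain ints
    }

The byte case shows the difference: lshr(b, 1) yields 127 where b >>> 1 silently yields -1 after truncation.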
Jun 17 2010
Andrei Alexandrescu wrote:Walter Bright wrote:Steven Schveighoffer wrote:On Thu, 17 Jun 2010 15:24:52 -0400, Don <nospam nospam.com> wrote:A question I have though is, Java has >>>. Does Java have these problems too?Java doesn't have unsigned values, so it's necessary to use regular ints as bitmasks, hence the extra operator.The reason D has >>> is to cause an unsigned right shift to be generated without needing to resort to casts as one has to in C. The problem with such casts is they wreck generic code.No. http://www.digitalmars.com/d/2.0/phobos/std_traits.html#Unsigned void fun(T)(T num) if (isIntegral!T) { auto x = cast(Unsigned!T) num; ... } Andrei

It's not a perfect replacement, as in, if T is a custom integer type, you have to extend the template to support it. Furthermore, now your BigInt custom type also has to support a cast to unsigned just so it can right shift. Also, T may not be readily identifiable, so you'd have to write:

cast(Unsigned!(typeof(expr))) expr;
Jun 17 2010
Walter Bright wrote:Andrei Alexandrescu wrote:Let me think when I wanted an unsigned shift against an arbitrarily-sized integer. Um... never?Walter Bright wrote:It's not a perfect replacement, as in if T is a custom integer type, you have to extend the template to support it.Steven Schveighoffer wrote:No. http://www.digitalmars.com/d/2.0/phobos/std_traits.html#Unsigned void fun(T)(T num) if (isIntegral!T) { auto x = cast(Unsigned!T) num; ... }On Thu, 17 Jun 2010 15:24:52 -0400, Don <nospam nospam.com> wrote:The reason D has >>> is to cause an unsigned right shift to be generated without needing to resort to casts as one has to in C. The problem with such casts is they wreck generic code.A question I have though is, Java has >>>. Does Java have these problems too?Java doesn't have unsigned values, so it's necessary to use regular int's as bitmasks, hence the extra operator.Furthermore, now your BigInt custom type also has to support a cast to unsigned just so it can right shift.BigInt is a superficial argument. Unless you're willing to flesh it out much better, it can be safely dropped.Also, T may not be readily identifiable, so you'd have to write: cast(Unsigned!(typeof(expr)) expr;It's not like shift occurs often enough to make that an issue. Note that your argument is predicated on using signed types instead of unsigned types in the first place, and tacitly assumes the issue is frequent enough to *add a new operator*. Yet unsigned shifts correlate naturally with unsigned numbers. So what is exactly that is valuable in >>> that makes its presence in the language justifiable? Andrei
Jun 17 2010
Andrei Alexandrescu wrote:Note that your argument is predicated on using signed types instead of unsigned types in the first place, and tacitly assumes the issue is frequent enough to *add a new operator*. Yet unsigned shifts correlate naturally with unsigned numbers. So what is exactly that is valuable in >>> that makes its presence in the language justifiable?Generally the irritation I feel whenever I right shift and have to go back through and either check the type or just cast it to unsigned to be sure there is no latent bug. For example, the optlink asm code does quite a lot of unsigned right shifts. I have to be very careful about the typing to ensure a matching unsigned shift, since I have little idea what the range of values the variable can have.
Jun 17 2010
Walter Bright wrote:Andrei Alexandrescu wrote:I'm sure all linker asm writers will be happy about that feature :o}. AndreiNote that your argument is predicated on using signed types instead of unsigned types in the first place, and tacitly assumes the issue is frequent enough to *add a new operator*. Yet unsigned shifts correlate naturally with unsigned numbers. So what is exactly that is valuable in >>> that makes its presence in the language justifiable?Generally the irritation I feel whenever I right shift and have to go back through and either check the type or just cast it to unsigned to be sure there is no latent bug. For example, the optlink asm code does quite a lot of unsigned right shifts. I have to be very careful about the typing to ensure a matching unsigned shift, since I have little idea what the range of values the variable can have.
Jun 17 2010
Walter Bright wrote:Andrei Alexandrescu wrote:But x >>> 1 doesn't work for shorts and bytes.Note that your argument is predicated on using signed types instead of unsigned types in the first place, and tacitly assumes the issue is frequent enough to *add a new operator*. Yet unsigned shifts correlate naturally with unsigned numbers. So what is exactly that is valuable in >>> that makes its presence in the language justifiable?Generally the irritation I feel whenever I right shift and have to go back through and either check the type or just cast it to unsigned to be sure there is no latent bug.For example, the optlink asm code does quite a lot of unsigned right shifts. I have to be very careful about the typing to ensure a matching unsigned shift, since I have little idea what the range of values the variable can have.I've read the OMF spec, and I know it includes shorts and bytes. So I really don't think >>> solves even this use case.
Jun 18 2010
Don wrote:Walter Bright wrote:I know. That's ill thought out.Andrei Alexandrescu wrote:But x >>> 1 doesn't work for shorts and bytes.Note that your argument is predicated on using signed types instead of unsigned types in the first place, and tacitly assumes the issue is frequent enough to *add a new operator*. Yet unsigned shifts correlate naturally with unsigned numbers. So what is exactly that is valuable in >>> that makes its presence in the language justifiable?Generally the irritation I feel whenever I right shift and have to go back through and either check the type or just cast it to unsigned to be sure there is no latent bug.I can send you the source if you like <g>.For example, the optlink asm code does quite a lot of unsigned right shifts. I have to be very careful about the typing to ensure a matching unsigned shift, since I have little idea what the range of values the variable can have.I've read the OMF spec, and I know it includes shorts and bytes. So I really don't think >>> solves even this use case.
Jun 18 2010
Walter Bright wrote:Don wrote:Walter Bright wrote:Andrei Alexandrescu wrote:Note that your argument is predicated on using signed types instead of unsigned types in the first place, and tacitly assumes the issue is frequent enough to *add a new operator*. Yet unsigned shifts correlate naturally with unsigned numbers. So what is exactly that is valuable in >>> that makes its presence in the language justifiable?Generally the irritation I feel whenever I right shift and have to go back through and either check the type or just cast it to unsigned to be sure there is no latent bug.But x >>> 1 doesn't work for shorts and bytes.I know. That's ill thought out.

Then please rule it out of the language.

Andrei
Jun 18 2010
Walter Bright wrote:Steven Schveighoffer wrote:On Thu, 17 Jun 2010 15:24:52 -0400, Don <nospam nospam.com> wrote:A question I have though is, Java has >>>. Does Java have these problems too?Java doesn't have unsigned values, so it's necessary to use regular ints as bitmasks, hence the extra operator.The reason D has >>> is to cause an unsigned right shift to be generated without needing to resort to casts as one has to in C.

Unfortunately it doesn't work. You still can't do an unsigned right shift of a signed byte by 1, without resorting to a cast.

The problem with such casts is they wreck generic code.

It's C's cavalier approach to implicit conversions that wrecks generic code. And it makes such a pig's breakfast of it that >>> doesn't quite work.
Jun 17 2010
Hello Don,It's C's cavalier approach to implicit conversions that wrecks generic code. And it makes such a pig's breakfast of it that >>> doesn't quite work.

I still haven't seen anyone address how typeof(a>>>b) == typeof(a) breaks C code when a>>>b isn't legal C to begin with. (Note, I'm not saying do the same with >> or << because I see why that can't be done)

-- 
... <IXOYE><
Jun 18 2010
BCS <none anon.com> wrote:I still haven't seen anyone address how typeof(a>>>b) == typeof(a) breaks c code when a>>>b isn't legal c to begin with.It doesn't, of course. However, it is desirable to have similar rules for similar operations, like >> and >>>. -- Simen
Jun 19 2010
Simen kjaeraas wrote:BCS <none anon.com> wrote:Which is why I said that it doesn't seem possible to make >>> work, without making a special-case rule for it.I still haven't seen anyone address how typeof(a>>>b) == typeof(a) breaks c code when a>>>b isn't legal c to begin with.It doesn't, of course. However, it is desirable to have similar rules for similar operations, like >> and >>>.
Jun 20 2010
Hello Don,Simen kjaeraas wrote:BCS <none anon.com> wrote:I still haven't seen anyone address how typeof(a>>>b) == typeof(a) breaks C code when a>>>b isn't legal C to begin with.It doesn't, of course. However, it is desirable to have similar rules for similar operations, like >> and >>>.Which is why I said that it doesn't seem possible to make >>> work, without making a special-case rule for it.

At least for me, I find the current situation more surprising than the alternative. For that matter, if >>> worked differently than >>, I think I would have (the first time I ran across it) thought the >> case was the odd one.

-- 
... <IXOYE><
Jun 20 2010
Don wrote:Andrei Alexandrescu wrote:Wait a minute. D should never allow an implicit narrowing conversion. It doesn't for other cases, so isn't this a simple bug?It'll make it illegal, but it won't make it usable. I think the effect of full range propagation will be that >>> will become illegal for anything other than int and long, unless it is provably identical to >>. Unless you do the hideous b >>> cast(typeof(b))1; I think every D style guide will include the recommendation, "never use >>>".

Three times. Three times I tried to convince Walter to remove that crap from D - one for each '>'. The last time was as the manuscript was going out the door and I was willing to take the flak from the copyeditors for the changes in pagination. Just like with non-null references, Walter has framed the matter in a way that makes convincing extremely difficult. That would be great if he were right.

A question I have though is, Java has >>>. Does Java have these problems too?

Java is much more conservative with implicit conversions, so they wouldn't allow the assignment without a cast. Beyond that, yes, the issues are the same.

Andrei
Jun 17 2010
Andrei Alexandrescu:Just like with non-null references, Walter has framed the matter in a way that makes convincing extremely difficult. That would be great if he were right.

I know this is off-topic in this thread. I remember the long thread about this. Making all D references non-null by default requires a significant change in both the language and the way objects are used in D, so I can understand that Walter has refused this idea; maybe he is right. But something more moderate can be done: keep references nullable by default, and invent a symbol (like ) that can be added as a suffix to a class reference type or pointer type to denote that it is non-null (and the type system can enforce it at the calling point too, etc.; it's part of the function signature or variable type, so it's more than just syntax sugar for a null test inside the function!). I believe this reduced idea can be enough to avoid many null-derived bugs. It's different from the situation of the Java exceptions; it's less viral: if you write a 100 lines long D program, or a long C-style D program, you are probably free to never use this feature.

void foo(int* ptr, Bar b) {...}

void main() {
    int* p = ensureNonull(cast(int*)malloc(int.sizeof));
    Bar b = ensureNonull(new Bar());
    foo(p, b);
}

Something (badly named) like ensureNonull() changes the input type into a notnull type and performs a run-time test of not-null-ty :-) Surely this idea has some holes, but they can probably be fixed.

Bye,
bearophile
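A sketch of how bearophile's ensureNonull() could be approximated as library code (all names are invented; nothing like this is claimed to exist in Phobos):

    import std.traits : isPointer;

    // A wrapper that can only be constructed from a non-null reference.
    struct NonNull(T) if (is(T == class) || isPointer!T)
    {
        private T payload;
        alias payload this;  // usable wherever a T is expected
    }

    NonNull!T ensureNonull(T)(T value)
    {
        assert(value !is null, "null passed where non-null is required");
        return NonNull!T(value);
    }

    class Bar {}

    void foo(NonNull!(int*) ptr, NonNull!Bar b) { /* free to dereference */ }

    void main()
    {
        int x = 42;
        foo(ensureNonull(&x), ensureNonull(new Bar));
    }

The run-time test happens once, at the conversion point, which is the contained kind of virality bearophile describes; real language support could elide the check where the type system already proves non-nullity.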
Jun 17 2010
Don wrote:Kagamin wrote:Don Wrote:(D has introduced ANOTHER instance of this with the ridiculous >>> operator. byte b = -1; byte c = b >>> 1; Guess what c is! ):) Well, there was an issue. Wasn't it fixed?No. It's a design flaw, not a bug. I think it could only be fixed by disallowing that code, or creating a special rule to make that code do what you expect. A better solution would be to drop >>>.

I agree. But even within the current language, value range propagation (VRP) should disallow this case without a problem. There's been a long discussion about computing the bounds of a & b and a | b given the bounds of a and b. The current VRP code for those operations is broken, and I suspect the VRP code for a >>> b is broken too.

Andrei
Jun 17 2010
Walter Bright Wrote:Easy. offset should be a size_t, not an unsigned.

I've hit the bug using a size_t at the right side of a += -b (an array length); it's just that a long was at the left side (a file offset). Such code should actually work on a 64-bit system, and it fails on 32-bit. The MS compiler reports such portability issues with a warning, I believe.
Jun 16 2010
On Thu, 17 Jun 2010 02:46:13 -0400, Kagamin <spam here.lot> wrote:Walter Bright Wrote:Easy. offset should be a size_t, not an unsigned.I've hit the bug using a size_t at the right side of a += -b (an array length); it's just that a long was at the left side (a file offset). Such code should actually work on a 64-bit system, and it fails on 32-bit. The MS compiler reports such portability issues with a warning, I believe.

This sounds more like an issue with file offsets being longs, ironically. Using longs to represent zero-based locations in a file is extremely unsafe. Such usages should really be restricted to short-range offsets from the current file position, and fpos_t used for everything else (which is presumably available in std.c.stdio).
Jun 16 2010
Justin Spahr-Summers Wrote:This sounds more like an issue with file offsets being longs, ironically. Using longs to represent zero-based locations in a file is extremely unsafe. Such usages should really be restricted to short-range offsets from the current file position, and fpos_t used for everything else (which is presumably available in std.c.stdio).

1. Ironically, the issue is not in the file offset's signedness. You still hit the bug with a ulong offset.
2. A signed offset is two times safer than unsigned, as you can detect underflow bugs (and, maybe, overflow). With an unsigned offset you get an exception only if the filesystem doesn't support sparse files, so Linux will keep silent.
3. A signed offset is consistent/type-safe in the case of the seek function, as it doesn't arbitrarily mutate between signed and unsigned.
4. Choosing unsigned for the file offset is not dictated by safety, but by stupidity: "hey, I lose my bit!" I AM an optimization zealot, but unsigned offsets are plain dead freaking stupid.
Jun 17 2010
On Thu, 17 Jun 2010 03:27:59 -0400, Kagamin <spam here.lot> wrote:

1. Ironically, the issue is not in the file offset's signedness. You still hit the bug with a ulong offset.

How so? Subtracting a size_t from a ulong offset will only cause problems if the size_t value is larger than the offset. If that's the case, then the issue remains even with a signed offset.

2. A signed offset is two times safer than unsigned, as you can detect underflow bugs (and, maybe, overflow).

The solution with unsigned values is to make sure that they won't underflow *before* performing the arithmetic - and that's really the proper solution anyways.

With an unsigned offset you get an exception only if the filesystem doesn't support sparse files, so Linux will keep silent.

I'm not sure what this means. Can you explain?

3. A signed offset is consistent/type-safe in the case of the seek function, as it doesn't arbitrarily mutate between signed and unsigned.

My point was about signed values being used to represent zero-based indices. Obviously there are applications for a signed offset *from the current position*. It's seeking to a signed offset *from the start of the file* that's unsafe.

4. Choosing unsigned for the file offset is not dictated by safety, but by stupidity: "hey, I lose my bit!"

You referred to 32-bit systems, correct? I'm sure there are 32-bit systems out there that need to be able to access files larger than two gigabytes.

I AM an optimization zealot, but unsigned offsets are plain dead freaking stupid.

It's not an optimization. Unsigned values logically correspond to disk and memory locations.
Jun 17 2010
Justin Spahr-Summers Wrote:How so? Subtracting a size_t from a ulong offset will only cause problems if the size_t value is larger than the offset. If that's the case, then the issue remains even with a signed offset.

Maybe you didn't see the test case.

ulong a;
ubyte[] b;
a += -b.length; // go a little backwards

or

seek(-b.length, SEEK_CUR, file);

The solution with unsigned values is to make sure that they won't underflow *before* performing the arithmetic - and that's really the proper solution anyways.

If you rely on client code to be correct, you get a security issue. And the client doesn't necessarily use your language or your compiler. Or he can turn off overflow checks for performance. Or he can use the same unsigned variable for both signed and unsigned offsets, so checks for underflow become useless.

I'm not sure what this means. Can you explain?

This means that you have a subtle bug.

My point was about signed values being used to represent zero-based indices. Obviously there are applications for a signed offset *from the current position*. It's seeking to a signed offset *from the start of the file* that's unsafe.

To catch this in the case of a signed offset you need only one check. In the case of unsigned offsets you have to watch underflows in the entire application code even if it's not related to file seeks - just in order to fix an issue that can be fixed separately.

You referred to 32-bit systems, correct? I'm sure there are 32-bit systems out there that need to be able to access files larger than two gigabytes.

I'm talking about 64-bit file offsets, which are 64-bit on 32-bit systems too. As to file size limitations, there's no difference between signed and unsigned lengths. File sizes have no tendency to stick to the 4 gig value. If you need to handle files larger than 2 gigs, you also need to handle files larger than 4 gigs.

It's not an optimization. Unsigned values logically correspond to disk and memory locations.

They don't. Memory locations are a *subset* of the size_t value range. That's why you have bounds checks. And the problem is the usage of these locations: the memory bus doesn't perform computations on the addresses, the application does - it adds, subtracts, mixes signeds with unsigneds, has various type system holes or kludges, library design issues, good practices in use, etc. In other words, it gets a little more complex than just locations.
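Kagamin's test case fleshed out so it runs anywhere (a sketch; a uint stands in for a 32-bit size_t, so the failure can be reproduced even on a 64-bit build):

    void main()
    {
        ulong a = 1000;   // e.g. a 64-bit file offset
        uint len = 3;     // what size_t is on a 32-bit target

        a += -len;        // -len is 0xFFFF_FFFD, zero-extended to 64 bits:
                          // a becomes 1000 + 2^32 - 3, not 997
        assert(a != 997);

        // With a 64-bit size_t the same line wraps at the width of a
        // and lands on 997 as intended:
        ulong a2 = 1000;
        ulong len2 = 3;
        a2 += -len2;
        assert(a2 == 997);
    }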
Jun 17 2010
On Thu, 17 Jun 2010 06:41:33 -0400, Kagamin <spam here.lot> wrote:

Maybe you didn't see the test case. ulong a; ubyte[] b; a += -b.length; // go a little backwards

I did see that, but that's erroneous code. Maybe the compiler could warn about unary minus on an unsigned type, but I find such problems rare as long as everyone working on the code understands signedness.

or seek(-b.length, SEEK_CUR, file);

I wouldn't call it a failure of unsigned types that this causes problems. Like I suggested above, the situation could possibly be alleviated if the compiler just warned about unary minus on unsigned operands. Like a couple others pointed out, this is just a lack of understanding of unsigned types and modular arithmetic. I'd say that any programmer should have such an understanding, regardless of whether their programming language of choice supports unsigned types or not.

If you rely on client code to be correct, you get a security issue. And the client doesn't necessarily use your language or your compiler. Or he can turn off overflow checks for performance. Or he can use the same unsigned variable for both signed and unsigned offsets, so checks for underflow become useless.

What kind of client are we talking about? If you're referring to contract programming, then it's the client's own fault if they fiddle around with the code and end up breaking it or violating its conventions.

This means that you have a subtle bug.

Signed offsets can (truly) underflow as well. I don't see how the issue is any different.

To catch this in the case of a signed offset you need only one check. In the case of unsigned offsets you have to watch underflows in the entire application code even if it's not related to file seeks - just in order to fix an issue that can be fixed separately.

I'm talking about 64-bit file offsets, which are 64-bit on 32-bit systems too.

In D's provided interface, this is true, but fseek() from C uses C's long data type, which is *not* 64-bit on 32-bit systems, and this is (I assume) what std.stdio uses under-the-hood, making it doubly unsafe.

As to file size limitations, there's no difference between signed and unsigned lengths. File sizes have no tendency to stick to the 4 gig value. If you need to handle files larger than 2 gigs, you also need to handle files larger than 4 gigs.

Of course. But why restrict oneself to half the available space unnecessarily?

They don't. Memory locations are a *subset* of the size_t value range. That's why you have bounds checks. And the problem is the usage of these locations: the memory bus doesn't perform computations on the addresses, the application does - it adds, subtracts, mixes signeds with unsigneds, has various type system holes or kludges, library design issues, good practices in use, etc. In other words, it gets a little more complex than just locations.

Bounds checking does alleviate the issue somewhat, I'll grant you that. But as far as address computation, even if your application does none, the operating system still will in order to map logical addresses, which start at 0, to physical addresses, which also start at 0. And the memory bus absolutely requires unsigned values even if it needs to perform no computation itself.
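Justin's "validate before you subtract" rule, spelled out (a sketch; rewind is an invented name):

    // Move an unsigned cursor backwards, checking the operand first so
    // the subtraction can never underflow.
    ulong rewind(ulong offset, ulong amount)
    {
        if (amount > offset)
            throw new Exception("seek before start of file");
        return offset - amount;
    }

    void main()
    {
        assert(rewind(1000, 3) == 997);
        try { rewind(2, 3); assert(false); } catch (Exception) {}
    }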
Jun 17 2010
bearophile Wrote:CLS bans unsigneds. Signed ints FTW!!!
Jun 14 2010
Kagamin wrote:bearophile Wrote:CLS = ? AndreiCLS bans unsigneds. Signed ints FTW!!!
Jun 14 2010
Andrei Alexandrescu:I think he means "Common Language Specification": http://msdn.microsoft.com/en-us/library/12a7a7h3.aspx Bye, bearophileSigned ints FTW!!!CLS = ?
Jun 15 2010
bearophile Wrote:Andrei Alexandrescu:It seems to be reversed for byte...I think he means "Common Language Specification": http://msdn.microsoft.com/en-us/library/12a7a7h3.aspxSigned ints FTW!!!CLS = ?
Jun 15 2010
Andrei Alexandrescu wrote:Kagamin wrote:"Clear Screen" Justinbearophile Wrote:CLS = ? AndreiCLS bans unsigneds. Signed ints FTW!!!
Jun 15 2010
Walter Bright Wrote:1) I'd like D to use signed words to represent lengths and array indexes.This would lead to silent breakage of code transferred from C and C++.

I actually wrote an application in C and used signeds for array indexes, string positions and field offsets.

2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.

Doesn't the OS think in terms of pages? And... yes... I have 5 gigs of memory on my 32bit windows system. How can you make sense of representing it with a uint length? Not to mention that if you really need it, you *can* work around compiler checks by declaring your own getters/ranges.
Jun 15 2010
Justin Johansson:To my interpretation this means that sometimes trying to be clever is actually stupid.

A great programmer writes code as simple as possible (but not simpler). Code that doesn't need comments to be understood is often better than code that needs comments to be understood.

Bye,
bearophile
Jun 15 2010
bearophile wrote:Justin Johansson:I've never met a single programmer or engineer who didn't believe and recite that platitude, and this includes every programmer and engineer who would find very complicated ways to do simple things. I've also never met a programming language advocate that didn't believe their language fulfilled that maxim. To me, it just goes to show that anyone can create a complex solution, but it takes a genius to produce a simple one.To my interpretation this means that at sometimes trying to be clever is actually stupid.A great programmer writes code as simple as possible (but not simpler).
Jun 15 2010
Don:Indeed, only a subset of D is useful for low-level development.<

A problem is that some of those D features (that are often useful in application code) are actively negative for that kind of development.

But D has more close-to-the-metal features than C does.<

I don't know if those extra D features are enough. And the C dialect used for example by Linux is not standard C; it uses many other tricks. I think D doesn't have some of them (I will try to answer this better in a reply to Walter's post).

A recent nice post by Linus, linked here by Walter, has partially answered a question I had asked here: what language features can a kernel developer enjoy that both C and C++ lack? That answer has shown that close-to-the-metal features are useful but they are not enough. I presume it's not even easy to express what those more important things are; Linus writes:

So I agree that describing the data is important, but at the same time, the things that really need the most description are how the data hangs together, what the consistency requirements are, what the locking rules are (and not for a single data object either), etc etc. And my suspicion is that you can't easily really describe those to a compiler. So you end up having to write that code yourself regardless. And hey, maybe it's because I do just low-level programming that I think so. As mentioned, most of the code I work with really deeply cares about the kinds of things that most software projects probably never even think about: stack depth, memory access ordering, fine-grained locking, and direct hardware access.<

D gives a few more ways to give complex semantics to the compiler, but probably other better languages need to be invented for this. I think it is possible to invent such languages, but maybe they will be hard to use (maybe like Coq http://en.wikipedia.org/wiki/Coq ), so they will be niche languages. Such a niche can be so small that maybe the work to invent, implement, and keep such a language updated and debugged is not worth it.

Bye,
bearophile
Jun 15 2010
bearophile wrote:Don:Since it has more than C does, and C is used for kernel dev, then it must be enough.Indeed, only a subset of D is useful for low-level development.<A problem is that some of those D features (that are often useful in application code) are actively negative for that kind of development.But D has more close-to-the-metal features than C does.<I don't know if those extra D features are enough.And the C dialect used for example by Linux is not standard C, it uses many other tricks. I think D doesn't have some of them (I will try to answer this better to a Walter's post).I'll await your reply there.With all due respect to Linus, in 30 years of professionally writing software, I've found that if you solely base improvements on what customers ask for, all you have are incremental improvements. No quantum leaps, no paradigm shifts, no game changers. To get those, you have to look quite a bit beyond what the customer asks for. It also requires understanding that if a customer asks for feature X, it really means he is having problem Y, and there may be a far better solution to X than Y. One example of this is transitive immutability. Nobody asked for it. A lot of people question the need for it. I happen to believe that it offers a quantum improvement in the ability of a programmer to manage the complexity of a large program, which is why I (and Andrei) have invested so much effort in it, and are willing to endure flak over it. The payoff won't be clear for years, but I think it'll be large. Scope guard statements are another example. So are shared types.So I agree that describing the data is important, but at the same time, the things that really need the most description are how the data hangs together, what the consistency requirements are, what the locking rules are (and not for a single data object either), etc etc. And my suspicion is that you can't easily really describe those to a compiler. So you end up having to write that code yourself regardless. And hey, maybe it's because I do just low-level programming that I think so. As mentioned, most of the code I work with really deeply cares about the kinds of things that most software projects probably never even think about: stack depth, memory access ordering, fine-grained locking, and direct hardware access.<D gives few more ways to give complex semantics to the compiler, but probably other better languages need to be invented for this. I think it is possible to invent such languages, but maybe they will be hard to use (maybe as Coq http://en.wikipedia.org/wiki/Coq ), so they will be niche languages. Such niche can be so small that maybe the work to invent and implement and keep updated and debugged such language is not worth it.
Jun 15 2010
Walter Bright <newshound1 digitalmars.com> wrote:bearophile wrote:I believe the point of Linus (and probably bearophile) was not that C++ lacked features, but rather it lets programmers confuse one another by having features that are not as straight-forward as C. D also has these.Don:Since it has more than C does, and C is used for kernel dev, then it must be enough.Indeed, only a subset of D is useful for low-level development.<A problem is that some of those D features (that are often useful in application code) are actively negative for that kind of development.But D has more close-to-the-metal features than C does.<I don't know if those extra D features are enough.One example of this is transitive immutability. Nobody asked for it. A lot of people question the need for it. I happen to believe that it offers a quantum improvement in the ability of a programmer to manage the complexity of a large program, which is why I (and Andrei) have invested so much effort in it, and are willing to endure flak over it. The payoff won't be clear for years, but I think it'll be large.I still have problems understanding how someone could come up with the idea of non-transitive const. I remember the reaction when I read about it being such a great thing on this newsgroup, and going "wtf? Why on earth would it not be transitive? That would be useless!" (yes, I was not a very experienced programmer). -- Simen
Jun 15 2010
Simen kjaeraas wrote:Walter Bright <newshound1 digitalmars.com> wrote:To some extent, yes. My point was that C++ doesn't have a whole lot beyond that to offer, while D does.bearophile wrote:I believe the point of Linus (and probably bearophile) was not that C++ lacked features, but rather it lets programmers confuse one another by having features that are not as straight-forward as C. D also has these.Don:Since it has more than C does, and C is used for kernel dev, then it must be enough.Indeed, only a subset of D is useful for low-level development.<A problem is that some of those D features (that are often useful in application code) are actively negative for that kind of development.But D has more close-to-the-metal features than C does.<I don't know if those extra D features are enough.I don't think the non-transitive const is very useful either, and I think that C++ demonstrates that.One example of this is transitive immutability. Nobody asked for it. A lot of people question the need for it. I happen to believe that it offers a quantum improvement in the ability of a programmer to manage the complexity of a large program, which is why I (and Andrei) have invested so much effort in it, and are willing to endure flak over it. The payoff won't be clear for years, but I think it'll be large.I still have problems understanding how someone could come up with the idea of non-transitive const. I remember the reaction when I read about it being such a great thing on this newsgroup, and going "wtf? Why on earth would it not be transitive? That would be useless!" (yes, I was not a very experienced programmer).
Jun 15 2010
On 06/15/2010 05:43 PM, Walter Bright wrote:One example of this is transitive immutability. Nobody asked for it.I find this hard to believe. I seem to recall that you were personally against const for a very long time. Did none of the people advocating for const suggest a deep const? Should I dig through the archives?
Jun 16 2010
Jeff Nowakowski wrote:On 06/15/2010 05:43 PM, Walter Bright wrote:Andrei explained transitivity to me and convinced me of its utility.One example of this is transitive immutability. Nobody asked for it.I find this hard to believe. I seem to recall that you were personally against const for a very long time. Did none of the people advocating for const suggest a deep const? Should I dig through the archives?
Jun 16 2010
On 06/16/2010 12:33 PM, Walter Bright wrote:Jeff Nowakowski wrote:Ok, but lots of people have been talking about const correctness for years (including yourself), stemming from the known C++ problems, and I don't see how "transitive immutability" (a deep const) is a new idea that nobody asked for. The only thing new here is that you guys came up with an implementation for D, and lots of people were glad to have it, even if many were also against it.On 06/15/2010 05:43 PM, Walter Bright wrote:Andrei explained transitivity to me and convinced me of its utility.One example of this is transitive immutability. Nobody asked for it.I find this hard to believe. I seem to recall that you were personally against const for a very long time. Did none of the people advocating for const suggest a deep const? Should I dig through the archives?
Jun 16 2010
Jeff Nowakowski wrote:On 06/16/2010 12:33 PM, Walter Bright wrote:I've talked with C++ experts for years about const. Not one of them ever mentioned transitivity, let alone asked for it or thought it was a desirable property. After we designed transitive const for D, I presented it to several C++ experts. My first job was to explain what transitive meant - none of them were familiar with the idea. Next, it took a lot of convincing of them that this was a good idea. They all insisted that a const pointer to mutable data was terribly important. While it is true that C++ people have talked about const-correctness since const was introduced to C++, it does not at all imply any concept or understanding of transitivity. Transitivity is an orthogonal idea. The people who do understand transitive const and need no convincing are the functional programming crowd. What's interesting are the languages which claim to offer FP features, as that's the latest bandwagon, but totally miss transitive const.Jeff Nowakowski wrote:Ok, but lots of people have been talking about const correctness for years (including yourself), stemming from the known C++ problems, and I don't see how "transitive immutability" (a deep const) is a new idea that nobody asked for. The only thing new here is that you guys came up with an implementation for D, and lots of people were glad to have it, even if many were also against it.On 06/15/2010 05:43 PM, Walter Bright wrote:Andrei explained transitivity to me and convinced me of its utility.One example of this is transitive immutability. Nobody asked for it.I find this hard to believe. I seem to recall that you were personally against const for a very long time. Did none of the people advocating for const suggest a deep const? Should I dig through the archives?
Jun 16 2010
Walter Bright wrote:The people who do understand transitive const and need no convincing are the functional programming crowd. What's interesting are the languages which claim to offer FP features, as that's the latest bandwagon, but totally miss transitive const.I wish to add that I've not heard any proposal or discussion of adding
Jun 16 2010
On 06/16/2010 04:48 PM, Walter Bright wrote:Walter Bright wrote:I know the Javari paper was mentioned here by Bruno. Also, the idea of deep immutability just isn't rocket science and has occurred to many people, and is why many people have started looking into Haskell given the new focus on concurrency. However, you're right in that as far as I know D is the only language to take the ball and run with it.The people who do understand transitive const and need no convincing are the functional programming crowd. What's interesting are the languages which claim to offer FP features, as that's the latest bandwagon, but totally miss transitive const.I wish to add that I've not heard any proposal or discussion of adding
Jun 16 2010
This thread was not about Linux or Linus or operating systems, it was about my proposal of changing indexes and lengths in D to signed words. So let's go back to the true purpose of this thread!

Walter Bright:
1. D source code is supposed to be portable between 32 and 64 bit systems. This would fail miserably if the sign of things silently changes in the process.<
I don't understand this, please explain it better. If I use a signed word on both 32 and 64 bit systems to represent indexes and lengths, what bad things can this cause?

2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.<
I don't understand how this is related to lengths and indexes, for example array ones.

Bye, bearophile
Jun 15 2010
bearophile wrote:
This thread was not about Linux or Linus or operating systems, it was about my proposal of changing indexes and lengths in D to signed words. So let's go back to the true purpose of this thread!

Walter Bright:
1. D source code is supposed to be portable between 32 and 64 bit systems. This would fail miserably if the sign of things silently changes in the process.<
I don't understand this, please explain it better. If I use a signed word on both 32 and 64 bit systems to represent indexes and lengths, what bad things can this cause?

Changing the sign of size_t from unsigned to signed when going from 32 to 64 bits will cause a difference in behavior.

2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.<
I don't understand how this is related to lengths and indexes, for example array ones.

A memory manager sees the address space as 0..N, not -N/2..0..N/2
Jun 15 2010
Walter Bright:
Changing the sign of size_t from unsigned to signed when going from 32 to 64 bits will cause a difference in behavior.<
I have proposed to use a "signed word" on both 32 and 64 bit systems. So where's the difference in behaviour?

A memory manager sees the address space as 0..N, not -N/2..0..N/2<
If D arrays use signed words as indexes on 32 bit systems then only half of the original length can be used. The numbers in 0..N/2 cover only half of the unsigned number range.

Bye, bearophile
Jun 16 2010
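[A minimal sketch of the bug class that motivates the proposal; this example is added for illustration and is not bearophile's. With an unsigned index, a backwards loop never goes negative, it wraps:

import std.stdio;

void main()
{
    int[] a = [1, 2, 3];

    // for (size_t i = a.length - 1; i >= 0; --i)  // i >= 0 is always true:
    //     writeln(a[i]);                          // i wraps to size_t.max
    //                                             // and the loop never ends

    for (ptrdiff_t i = a.length - 1; i >= 0; --i)  // a signed index behaves
        writeln(a[i]);                             // as intuition expects
}

The cost being weighed in the next posts is the other half of the range: a signed 32-bit index tops out at 2_147_483_647.]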
bearophile wrote:
Walter Bright:
Changing the sign of size_t from unsigned to signed when going from 32 to 64 bits will cause a difference in behavior.<
I have proposed to use a "signed word" on both 32 and 64 bit systems. So where's the difference in behaviour?

If we go back in the thread, the argument for signed size_t was for 64 bit address spaces. With 32 bit address spaces, objects larger than 2^31 bytes are needed.
Jun 16 2010
Walter Bright:
If we go back in the thread, the argument for signed size_t was for 64 bit address spaces.<
I was asking for signed word lengths and indexes on both 32 and 64 bit systems. Sorry for not being more clear from the start.

With 32 bit address spaces, objects larger than 2^31 bytes are needed.<
I don't fully understand what you mean. On 32 bit systems I can accept arrays and lists and collections (or a call to malloc) with at most 2_147_483_648 items/bytes. On 32-bit Windows with 3 GB RAM (and Windows itself set to use 3 GB) DMD allows me to allocate only a part of those. In practice you can't allocate more than 2 GB in a single block. On a 32 bit system I can desire arrays with something like 3_000_000_000 items only when the array items are single bytes (ubyte, char, byte, bool), and such situations are not so common (and probably 32-bit Windows will not allow me to do it).

(I am still writing a comment to another answer of yours, I am not so fast, please be patient :-) )

Bye, bearophile
Jun 16 2010
== Quote from bearophile (bearophileHUGS lycos.com)'s article
Walter Bright:
If we go back in the thread, the argument for signed size_t was for 64 bit address spaces.<
I was asking for signed word lengths and indexes on both 32 and 64 bit systems. Sorry for not being more clear from the start.
With 32 bit address spaces, objects larger than 2^31 bytes are needed.<
I don't fully understand what you mean. On 32 bit systems I can accept arrays and lists and collections (or a call to malloc) with at most 2_147_483_648 items/bytes. On 32-bit Windows with 3 GB RAM (and Windows itself set to use 3 GB) DMD allows me to allocate only a part of those. In practice you can't allocate more than 2 GB in a single block. On a 32 bit system I can desire arrays with something like 3_000_000_000 items only when the array items are single bytes (ubyte, char, byte, bool), and such situations are not so common (and probably 32-bit Windows will not allow me to do it).
(I am still writing a comment to another answer of yours, I am not so fast, please be patient :-) )
Bye, bearophile

That's because Win32 reserves the upper half of the address space for kernel address space. If you use the 3GB switch, then you get 3GB for your program and 1GB for the kernel, but only if the program is large address space aware. If you use Win64, you get 4 GB of address space for your 32-bit programs, but again only if they're large address space aware. Programs need to be explicitly made large address space aware because some legacy programs assumed it would never be possible to have more than 2GB of address space, and thus used the most significant bit of pointers "creatively" or used ints for things that size_t should be used for; such programs would break in unpredictable ways if they could suddenly see more than 2GB of address space. You can make a program large address space aware by using editbin (http://msdn.microsoft.com/en-us/library/d25ddyfc%28v=VS.80%29.aspx).
Jun 16 2010
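[For completeness, the invocation behind that MSDN link is a one-liner, run from a Visual Studio command prompt; the program name here is a placeholder:

editbin /LARGEADDRESSAWARE myprogram.exe]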
Jérôme M. Berger Wrote:

#include <assert.h>
#include <stdio.h>

int main (int argc, char** argv)
{
    char* data = argv[0]; /* Just to get a valid pointer */
    unsigned int offset = 3;
    printf ("Original: %p\n", data);
    data += offset;
    printf ("+3 : %p\n", data);
    data += -offset;
    printf ("-3 : %p\n", data);
    assert (data == argv[0]); /* Works on 32-bit systems, fails on 64-bit */
    return 0;
}

Yo, dude! http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=97545
Jun 16 2010
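[A brief note on why that assert fails only on 64-bit targets: -offset is still an unsigned int, so it evaluates to 0xFFFFFFFD. On a 32-bit pointer the addition wraps modulo 2^32 and lands 3 bytes back; on a 64-bit pointer the value is zero-extended instead, so the pointer jumps roughly 4 GB forward. The same pitfall can be reproduced in D; this sketch is added for illustration, assuming a 64-bit target:

import std.stdio;

void main()
{
    char[8] buf;
    char* p = buf.ptr;
    uint offset = 3;

    p += offset;   // forward 3 bytes: fine on any target
    p += -offset;  // -offset is the uint 0xFFFFFFFD; zero-extended on
                   // 64-bit, so this does NOT step back 3 bytes
    writeln(p == buf.ptr); // true on 32-bit, false on 64-bit
}]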
Kagamin wrote:
Jérôme M. Berger Wrote:
[the C example above]
Yo, dude! http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=97545

Yes, I know. I was pointing out to Walter a real life example of code that works on 32-bit systems but not on 64-bit systems because of signedness issues. That was in answer to Walter saying: "I thought most 64 bit C compilers were specifically designed to avoid this problem."

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Jun 17 2010
Walter Bright Wrote:2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.You said it yourself: the compiler can be modified for kernel development. This makes kernel examples (not even considering their validity) not very valuable.
Jun 17 2010
Sorry for the slow answer. Reading all this stuff and trying to understand some of it takes me time.

Walter Bright:
The Arduino is an 8 bit machine. D is designed for 32 bit and up machines. Full C++ won't even work on a 16 bit machine, either.<
So D isn't a "better C" because you can't use it in a *large* number of situations (for every 32 bit CPU built today, they probably build 10 8/16 bit CPUs) where C is used.

If you're a kernel dev, the language features should not be a problem for you.<
From what I have seen, C++ has a ton of features that are negative for kernel development. So a language that misses them in the first place is surely better, because it's simpler to use, and its compiler is smaller and simpler to debug. About two years ago I read about an unfocused (and dead) proposal to write a C compiler just to compile the Linux kernel, allowing to avoid GCC.

BTW, you listed nested functions as disqualifying a language from being a kernel dev language, yet gcc supports nested functions as an extension.<
Nested functions are useful for my D code, I like them and I use them. But in D (unless they are static!) they create an extra pointer. From what I have read such silent creation of extra data structures is bad if you are writing a kernel. So probably a kernel dev can accept only static nested functions. For a kernel dev the default of non-static is bad, because if he/she forgets to add the "static" attribute then it's probably a bug. This is why I have listed D nested functions as a negative point for a kernel dev. Regarding GCC having nested functions (GCC implements them with a trampoline), I presume kernel devs don't use this GCC extension. GCC is designed for many purposes and surely some of its features are not designed for kernel-writing purposes.

As I pointed out, D implements the bulk of those extensions as a standard part of D.<
I am studying this still. See below.

They are useful in some circumstances, but are hardly necessary.<
For a low-level programmer they can be positively useful, while several other D features are useless or actively negative. I have seen about a 15-20% performance increase using computed gotos in a finite state machine I have written (that processes strings). Recently CPython has introduced them with a 15-20% performance improvement: http://bugs.python.org/issue4753

------------------------------
It's interesting that D already has most of the gcc extensions: http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html
Some more items from that page:

4.5 Constructing Function Calls: this syntax & semantics seem dirty, and I don't fully understand how to use this stuff. In D I miss a good apply() and a good general memoize. Memoize is a quick and easy way to cache computations and to turn recursive functions into efficient dynamic programming algorithms (a sketch follows this post).
-----------
4.13 Arrays of Length Zero: they are available in D, but you get an array bounds error if you try to use them to create variable-length structs. So to use them you have to overload the opIndex and opIndexAssign of the struct...
-----------
4.14 Arrays of Variable Length (allocated on the stack): this is missing in D. Using alloca is a workaround.
-----------
4.21 Case Ranges: D has this, but I am not sure D syntax is better.
-----------
4.22 Cast to a Union Type: this is missing in D, but it can be done anyway by adding a static opCall to the union for each of its fields:

union Foo {
    int i;
    double d;

    static Foo opCall(int ii) {
        Foo f;
        f.i = ii;
        return f;
    }
    static Foo opCall(double dd) {
        Foo f;
        f.d = dd;
        return f;
    }
}

void main() {
    Foo f1 = Foo(10);
    Foo f2 = Foo(10.5);
}

-----------
4.23 Declaring Attributes of Functions
noreturn: missing in D. But I am not sure how useful this is, the page says: >it helps avoid spurious warnings of uninitialized variables.<
format (archetype, string-index, first-to-check) and format_arg (string-index): they are missing in D, and they can be useful for people that want to use std.c.stdio.printf.
no_instrument_function: missing in D. It can be useful to not profile a function.
section ("section-name"): missing in D.
no_check_memory_usage: I don't understand this.
-----------
4.29 Specifying Attributes of Variables
aligned (alignment): I think D doesn't allow specifying an alignment for fixed-sized arrays.
nocommon: I don't understand this.
-----------
4.30 Specifying Attributes of Types
transparent_union: D misses this, but I don't know how useful it is.
-----------
4.34 Variables in Specified Registers: missing in D. 4.34.1, 4.34.2 Recently a Haskell middle-end for LLVM has shown that LLVM can allocate registers better than fixing them in specified registers (so they are in specified registers only outside functions; this frees registers inside the functions and increases performance a bit).
-----------
4.37 Function Names as Strings: I think this is missing in D. __FUNCTION__ can be useful with string mixins.
-----------
4.38 Getting the Return or Frame Address of a Function: missing in D. I don't know when to use this.
-----------
4.39 Other built-in functions provided by GNU CC: __builtin_constant_p: missing in D. It can be useful with static if.
------------------------------
There are other pages of docs about GCC, like this one: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
alloc_size: I don't know how useful this can be in D, probably not much.
artificial: I don't understand this.
error ("message") and warning ("message"): I don't fully understand them.
malloc: missing in D (it's a function attribute).
noinline: missing in D.
noclone: missing in D (cloning happens with LDC).
nonnull (arg-index, ...): ah ah, missing in D :-) I didn't know about this attribute. But a much better syntax can be used in D.
optimize: missing in D, useful. Often used in CLisp.
pcs: missing in D.
hot, cold: missing in D, but not so useful.
regparm (number): I don't fully understand this.
sseregparm: something like this seems needed in D.
force_align_arg_pointer: missing in D, but I don't understand it fully.
signal: I don't know.
syscall_linkage: missing in D.
target: curious, I don't know if this is needed in D (a static if around the versions can be enough, but I don't remember if the CPU type is available at compile-time).
warn_unused_result: missing in D. Can be useful where exceptions can't be used.
------------------------------
I have omitted many attributes and little features useful for specific CPU targets. So it seems there is a good number of features present in GNU C that are missing in D. I don't know how many of them are used for example in the Linux kernel.

Bye, bearophile
Jun 18 2010
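[Since memoize comes up in the post above, here is a minimal single-argument sketch built on nothing but a static associative array; this example is added for illustration and, as far as I know, Phobos had no general memoize at the time of this thread:

import std.stdio;

// Wraps fun with a cache keyed on its argument; a library version
// would be fully generic over the argument list.
template memoize(alias fun, A, R)
{
    R memoize(A arg)
    {
        static R[A] cache;
        if (auto p = arg in cache)
            return *p;          // cache hit: skip the computation
        R result = fun(arg);
        cache[arg] = result;
        return result;
    }
}

ulong fib(ulong n)
{
    if (n < 2) return n;
    // recurse through the memoized wrapper so intermediate results
    // are cached, turning the exponential recursion linear
    return memoize!(fib, ulong, ulong)(n - 1) + memoize!(fib, ulong, ulong)(n - 2);
}

void main()
{
    writeln(memoize!(fib, ulong, ulong)(50)); // fast thanks to caching
}]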
On 2010-06-18 08:11:00 -0400, bearophile <bearophileHUGS lycos.com> said:4.13 Arrays of Length Zero: they are available in D, but you get a array bound error if you try to use them to create variable-length structs. So to use them you have to to overload the opIndex and opIndexAssign of the struct...Bypassing bound checks is as easy as appending ".ptr": staticArray.ptr[10]; // no bound check Make an alias to the static array's ptr property if you prefer not to have to write .ptr all the time. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 18 2010
Michel Fortin:
Bypassing bound checks is as easy as appending ".ptr": staticArray.ptr[10]; // no bound check
Make an alias to the static array's ptr property if you prefer not to have to write .ptr all the time.<

If you try to compile this:

import std.c.stdlib: malloc;

struct Foo {
    int x;
    int[0] a;
}

void main() {
    enum N = 20;
    Foo* f = cast(Foo*)malloc(Foo.sizeof + N * typeof(Foo.a[0]).sizeof);
    f.a.ptr[10] = 5;
}

You receive:

prog.d(9): Error: null dereference in function _Dmain

As I have said, you have to use operator overloading of the struct and some near-ugly code that uses the offsetof. I don't like this a lot.

Bye, bearophile
Jun 18 2010
On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
As I have said, you have to use operator overloading of the struct and some near-ugly code that uses the offsetof. I don't like this a lot.

D need be no uglier than C. Here's my implementation:

/* very_unsafe */
struct TailArray(T) {
    T opIndex(size_t idx) {
        T* tmp = cast(T*) (&this) + idx;
        return *tmp;
    }
    T opIndexAssign(T value, size_t idx) {
        T* tmp = cast(T*) (&this) + idx;
        *tmp = value;
        return value;
    }
}

// And this demonstrates how to use it:

import std.contracts;
import std.c.stdlib;

struct MyString {
    size_t size;
    TailArray!(char) data; // same as char data[0]; in C

    // to show how to construct it
    static MyString* make(size_t size) {
        MyString* item = cast(MyString*) malloc(MyString.sizeof + size);
        enforce(item !is null);
        item.size = size;
        return item;
    }

    static void destroy(MyString* s) {
        free(s);
    }
}

import std.stdio;

void main() {
    MyString* str = MyString.make(5);
    scope(exit) MyString.destroy(str);

    // assigning works same as C
    str.data[0] = 'H';
    str.data[1] = 'e';
    str.data[2] = 'l';
    str.data[3] = 'l';
    str.data[4] = 'o';

    // And so does getting
    for(int a = 0; a < str.size; a++)
        writef("%s", str.data[a]);
    writefln("");
}
Jun 18 2010
Adam Ruppe:
D need be no uglier than C. Here's my implementation:<
That's cute, thank you :-)

static void destroy(MyString* s) { free(s); }<
Why destroy instead of ~this() ?

Bye, bearophile
Jun 18 2010
On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
static void destroy(MyString* s) { free(s); }
Why destroy instead of ~this() ?

It allocates and deallocates the memory rather than initializing and uninitializing the object. I don't think a destructor can free the mem of its own object. If I used gc.malloc or stack allocation, the destroy method shouldn't be necessary at all, since the memory is handled automatically there. Though, the main reason I did it this way is I was just writing in a C style rather than a D style, so it was kinda automatic. Still, I'm pretty sure what I'm saying here about a constructor/destructor not being able to actually free the memory of its own object is true too.
Jun 18 2010
Adam Ruppe:
I don't think a destructor can free the mem of its own object.<

I see and I'd like to know! :-) By the way, this program shows your code is not a replacement for the operator overloading on the variable-length struct itself that I was talking about, because D structs can't have length zero (plus 3 bytes of padding, here):

import std.stdio: writeln, write;

struct TailArray(T) {
    T opIndex(size_t idx) {
        T* tmp = cast(T*)(&this) + idx;
        return *tmp;
    }
    T opIndexAssign(T value, size_t idx) {
        T* tmp = cast(T*)(&this) + idx;
        *tmp = value;
        return value;
    }
}

struct MyString1 {
    size_t size;
    TailArray!char data; // not the same as char data[0]; in C
}

struct MyString2 {
    size_t size;
    char[0] data;
}

void main() {
    writeln(MyString1.sizeof); // 8
    writeln(MyString2.sizeof); // 4
}

Bye, bearophile
Jun 18 2010
On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:By the way, this program shows your code is not a replacement of the operator overloading of the variable length struct itself I was talking about, because D structs can't have length zero (plus 3 bytes of padding, here):Huh, weird. Doesn't make too much of a difference in practice though, since it only changes the malloc line slightly. In C, before the array[0] was allowed (actually, I'm not completely sure it is allowed even now in the standard. C99 added something, but I don't recall if it is the same thing), people would use array[1]. Since it is at the tail of the struct, and you're using pointer magic to the raw memory anyway, it doesn't make much of a difference.
Jun 18 2010
Adam Ruppe:Huh, weird. Doesn't make too much of a difference in practice though, since it only changes the malloc line slightly.Probably it can be fixed, but you have to be careful, because the padding isn't constant, it can change in size according to the CPU word size and the types of the data that come before TailArray :-) Bye, bearophile
Jun 18 2010
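[One way around the padding problem discussed above is to size the allocation from the tail member's offsetof, which already includes whatever padding the compiler inserted before it. A sketch added for illustration, reusing the thread's zero-length-array layout:

import std.c.stdlib: malloc, free;

struct MyString {
    size_t size;
    char[0] data; // zero-length tail, as in the C idiom
}

void main() {
    enum n = 5;
    // data.offsetof tracks the real layout, so this stays correct even
    // if alignment rules or the preceding fields change
    auto s = cast(MyString*) malloc(MyString.data.offsetof + n * char.sizeof);
    s.size = n;
    auto tail = cast(char*) s + MyString.data.offsetof;
    tail[0 .. n] = "Hello"; // raw writes past the struct header
    free(s);
}]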