
digitalmars.D - Dynamic arrays, basic type names, auto

reply bearophile <bearophileHUGS yahoo.com> writes:
Yet another of my lists of silly suggestions, this time shorter :-)

A syntax like this (that is a bit hairy, I agree) may be useful for
time-critical spots in the code:

auto arr = new int[1_000_000] = int.max;
auto arr = new int[1_000_000] = void;
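For comparison, today's D can already express both initializations, just not in a single declaration. A minimal sketch of the current equivalents (the heap `= void` form is the part that doesn't exist yet):

```d
// today's D: allocate, then block-initialize in a second statement
auto arr = new int[1_000_000];
arr[] = int.max;        // array-wise fill with a value

// an uninitialized *static* array is already possible with '= void'
int[1_000] tmp = void;  // contents are garbage, no fill cost
```

So the proposal is essentially about folding the second statement into the `new` expression.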

--------------------------

I would appreciate it if D 2.0 had its basic types named like this:

- int1 int2 int4 int8
- uint1 uint2 uint4 uint8
(Fixed size. The number is their length expressed in bytes (shorter than using
bits). There aren't int16 / uint16 yet)

- int
(built-in multi-precision signed integer. The compiler has heuristics to
replace it with a faster fixed-size integer in a lot of situations. Optimized
(if necessary modifying the GC too) for very short numbers that can fit in
just 4-8 bytes, so even where the compiler isn't able to replace it with a
fixed-size integer the slowdown isn't too big; this is to hopefully make C/C++
programmers less scared of them. People who need to compute operations on very
large integers have to tolerate a bit of constant slowdown, or use GMP).

- word uword
(They default to the native word size of the CPU, 4, 8, 16 or 32 bytes, so
their size isn't fixed. 'word' replaces the current 'size_t')

I don't like this much, but I don't see better alternatives so far (they are
easier to remember than dchar, wchar and char):
- char1, char2, char4
(They are unicode. char1 is often 1 byte, and char2 is often 2 bytes long, but
they may grow to 4, so they aren't fixed size)

- str
(replaces the current 'string'. Strings are used all the time in certain kinds
of code, so an identifier shorter than 'string' may be better)

- bool null true false void
(Java uses 'boolean', but 'bool' is enough for D).

- set
(built-in set type, often useful, see Python3 and Fortress)

- list (or sequence, or seq)
(Haskell uses "2-3 finger trees" to implement general-purpose sequences; they
may fit D too. They aren't meant to replace arrays:
http://en.wikipedia.org/wiki/Finger_tree ).
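Lacking a built-in set type, D's associative arrays can already fake one; a small sketch of the usual idiom (names are illustrative):

```d
// emulate a set of strings with an associative array of dummy values
bool[string] seen;
seen["apple"] = true;
seen["pear"]  = true;

if ("apple" in seen)
{
    // membership test via the 'in' operator
}
seen.remove("pear");
```

A real built-in set could of course pick a representation (hash, tree, bitset) per element type instead of paying for the unused values.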

Plus something similar for the float/double/real and complex types.

-----------------------

D uses 'auto' for automatic typing of variables:
auto s = "hello";


var s = "hello";

'let' is more logical, shorter than 'auto', and it was used in Basic too:
let s = "hello";

'var' can be written on the keyboard with just the left hand, so it may be
written even faster than 'let'. But 'var' in modern D replaces 'inout'. So I
think 'let' may be a good compromise for D.

Bye,
bearophile
Jul 10 2008
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"bearophile" <bearophileHUGS yahoo.com> wrote in message 
news:g5539g$cug$1 digitalmars.com...
 Yet another of my lists of silly suggestions, this time shorter :-)

 A syntax like this (that is a bit hairy, I agree) may be useful for 
 time-critical spots in the code:

 auto arr = new int[1_000_000] = int.max;
 auto arr = new int[1_000_000] = void;

 --------------------------

 I would appreciate it if D 2.0 had its basic types named like this:

 - int1 int2 int4 int8
 - uint1 uint2 uint4 uint8
 (Fixed size. The number is their length expressed in bytes (shorter than 
 using bits). There aren't int16 / uint16 yet)

 - int
 (built-in multi-precision signed integer. The compiler has heuristics to 
 replace it with a faster fixed size integer in a lot of situations. 
 Optimized (if necessary modifying the GC too) for very short numbers that 
 can fit in just 4-8 bytes, so even where the compiler isn't able to 
 replace it with a fixed sized integer the slowdown isn't too big, 
 this is to hopefully let C++/C programmers be less scared of them. People 
 that need to compute operations on very large integers have to tolerate a 
 bit of constant slowdown, or use GMP).

 - word uword
 (They are the default size of the CPU, 4, 8 or 16 or 32 bytes, so their 
 size isn't fixed. 'word' replaces the current 'size_t')

 I don't like this much, but I don't see better alternatives so far (you 
 need less memory to remember them and dchar, wchar, char):
 - char1, char2, char4
 (They are unicode. char1 is often 1 byte, and char2 is often 2 bytes long, 
 but they may grow to 4, so they aren't fixed size)

 - str
 (replaces the current 'string'. Strings are used all the time in certain 
 kinds of code, so an identifier shorter than 'string' may be better)

 - bool null true false void
 (Java uses 'boolean', but 'bool' is enough for D).

 - set
 (built-in set type, often useful, see Python3 and Fortress)

 - list (or sequence, or seq)
 (Haskell uses "2-3 finger trees" to implement general purpose sequences, 
 they may be fit for D too, they aren't meant to replace arrays
 http://en.wikipedia.org/wiki/Finger_tree ).

 Plus something else for float/double/real complex types.
I read, "please make D's type system the same as Python's" ;)

It's fun to dream, but none of this, and I mean _none_ of it, will ever be
remotely considered by W. Some things (like integer types that reflect their
size, built-in native word types) have been suggested time and time again and
W doesn't see the purpose when you have aliases.
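The alias route is indeed open today: anyone who wants size-named integers can define them in a module of their own. A hypothetical sketch, using 2008-era D alias syntax:

```d
// hypothetical module of fixed-size integer names, built on today's types
alias byte  int1;   alias ubyte  uint1;
alias short int2;   alias ushort uint2;
alias int   int4;   alias uint   uint4;
alias long  int8;   alias ulong  uint8;

int4 counter;   // exactly the same type as plain 'int'
```

Which is presumably Walter's point: the names are a one-liner, so they don't need to be in the language.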
 -----------------------

 D used 'auto' for automatic typing of variables:
 auto s = "hello";


 var s = "hello";

 'let' is more logical, shorter than 'auto' and it was used in Basic too:
 let s = "hello";

 'var' can be written on the keyboard with just the left hand, so it may be 
 written even faster than 'let'. But 'var' in modern D replaces 'inout'. So 
 I think 'let' may be a good compromise for D.

 Bye,
 bearophile
I've never had much issue with typing "auto". I wouldn't have thought that
this would be an issue.
Jul 10 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 - set
 (built-in set type, often useful, see Python3 and Fortress)
An alternative syntax that may be more fitting in D: set of strings:

void[str] ss;

----------------

Jarrett Billingsley:
 (like integer types that reflect 
 their size, built-in native word types) have been suggested time and time 
 again and W doesn't see the purpose when you have aliases.
Aliases allow you to use different names, but I don't think that can be used
to justify the choice/presence of (potentially) worse default names.

Bye,
bearophile
Jul 10 2008
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:g55hkm$1ash$1 digitalmars.com...
 bearophile:
 - set
 (built-in set type, often useful, see Python3 and Fortress)
An alternative syntax that may be more fitting in D: set of strings:

void[str] ss;

----------------

Jarrett Billingsley:
 (like integer types that reflect
 their size, built-in native word types) have been suggested time and time
 again and W doesn't see the purpose when you have aliases.
Aliases allow you to use different names, but I don't think that can be used
to justify the choice/presence of (potentially) worse default names.
I agree with you 100%. Just convince Walter and everything'll be great!

(My argument -- current integer names are qualitative, not quantitative, and
it doesn't make sense to assign quantitative values to qualitative names.
"long" is no longer "long" when it's the native word size.)
Jul 10 2008
parent reply Markus Koskimies <markus reaaliaika.net> writes:
On Thu, 10 Jul 2008 14:57:54 -0400, Jarrett Billingsley wrote:

 Aliases allow you to use different names, but I think it can't be used
 to justify the choice/presence of (potentially) worse default names.
I agree with you 100%. Just convince Walter and everything'll be great! (My argument -- current integer names are qualitative, not quantitative, and it doesn't make sense to assign quantitative values to qualitative names. "long" is no longer "long" when it's the native word size.)
I'll give my vote to:

- i8, i16, i32, i64, ..., u8, u16, u32, ... or intXX / uintXX variants, since
if you need to know the width of the number, you think of it in bits. No use
using bytes, it's just confusing.
- int and uint for the default integer size; the compiler can decide what to
use.
- byte, word, dword; these are well defined due to historical reasons. Change
them and you confuse lots of people.
- short, long; these are also well-known types from C, and I think there is no
use redefining them and confusing people.
- ushort, ulong; since short & long are well-defined, why not have unsigned
variants?

---

Hmmh... I would suggest that there not be such an element as "uint" (a default
unsigned integer). Why?

- Programmers tend to heavily use "int" in their code
- Unsigned variants are mostly used by system software, i.e. they are returned
from memXXX, file system calls and such.
- When unsigned types are used, the programmer knows the width (it is
specified by the system). Why not use byte/dword/ulong/u64 as the type?

As the compiler could decide what size to use, it could determine an
assignment of unsigned to "int". Let's assume (32-bit systems):

u32 getFileSize();
int a = getFileSize();

==> the compiler selects a 64-bit integer to circumvent conversion
problems. :)

[Yes I know, it would be hard when passing ints to other functions, since the
functions may be compiled for 32-bit ints only] ;)
Jul 10 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Markus Koskimies" <markus reaaliaika.net> wrote in message 
news:g55see$1h9i$15 digitalmars.com...
 On Thu, 10 Jul 2008 14:57:54 -0400, Jarrett Billingsley wrote:

 - byte, word, dword; these are well defined due to historical reasons.
 Change them and you confuse lots of people.
A "word" is well-defined to be the native data size of a given chip (memory,
cpu, etc). People who have done a lot of PC programming tend to forget that or
be unaware of it and end up with the mistaken impression that it's
well-defined to be "two bytes", which has never been true in the general case.
Jul 10 2008
parent reply Markus Koskimies <markus reaaliaika.net> writes:
On Thu, 10 Jul 2008 19:55:35 -0400, Nick Sabalausky wrote:

 "Markus Koskimies" <markus reaaliaika.net> wrote in message
 news:g55see$1h9i$15 digitalmars.com...
 On Thu, 10 Jul 2008 14:57:54 -0400, Jarrett Billingsley wrote:

 - byte, word, dword; these are well defined due to historical reasons.
 Change them and you confuse lots of people.
A "word" is well-defined to be the native data size of a given chip (memory, cpu, etc). People who have done a lot of PC programming tend to forget that or be unaware of it and end up with the mistaken impression that it's well-defined to be "two bytes", which has never been true in the general-case.
Hmmh, I disagree. "word" might have meant the width of the processor's data
paths historically, but nowadays it is a 16-bit unsigned quantity even in
microcontrollers and DSPs (although DSPs rarely follow a fixed processor word
width, e.g. having a 20-bit data path, 24/48-bit special registers and
accessing memory with 16-bit granularity).
Jul 10 2008
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Markus Koskimies" <markus reaaliaika.net> wrote in message 
news:g56jog$1h9i$24 digitalmars.com...
 On Thu, 10 Jul 2008 19:55:35 -0400, Nick Sabalausky wrote:

 "Markus Koskimies" <markus reaaliaika.net> wrote in message
 news:g55see$1h9i$15 digitalmars.com...
 On Thu, 10 Jul 2008 14:57:54 -0400, Jarrett Billingsley wrote:

 - byte, word, dword; these are well defined due to historical reasons.
 Change them and you confuse lots of people.
A "word" is well-defined to be the native data size of a given chip (memory, cpu, etc). People who have done a lot of PC programming tend to forget that or be unaware of it and end up with the mistaken impression that it's well-defined to be "two bytes", which has never been true in the general-case.
Hmmh, I disagree. "word" might mean in the history the width of the processor data paths, but nowadays it is 16-bit unsigned even in microcontrollers and DSPs (although DSPs rarely follow fixed width of processor words, e.g. having 20-bit data path, 24/48-bit special registers and accessing memory with 16-bit granularity).
I get the impression that people who think that "word == 2 bytes" tend to be long-time Windows programmers. Since that's, well, pretty much the only place where that's true.
Jul 10 2008
next sibling parent Markus Koskimies <markus reaaliaika.net> writes:
On Fri, 11 Jul 2008 00:53:28 -0400, Jarrett Billingsley wrote:

 A "word" is well-defined to be the native data size of a given chip
 (memory, cpu, etc). People who have done a lot of PC programming tend
 to forget that or be unaware of it and end up with the mistaken
 impression that it's well-defined to be "two bytes", which has never
 been true in the general-case.
Hmmh, I disagree. "word" might mean in the history the width of the processor data paths, but nowadays it is 16-bit unsigned even in microcontrollers and DSPs (although DSPs rarely follow fixed width of processor words, e.g. having 20-bit data path, 24/48-bit special registers and accessing memory with 16-bit granularity).
I get the impression that people who think that "word == 2 bytes" tend to be long-time Windows programmers. Since that's, well, pretty much the only place where that's true.
Well, yes, basically the word being 16-bit dates back to the history of the
8086. But anyway, it is nowadays very common in other architectures too, like
ARM. I may remember falsely, but didn't the MC68000 series have the same
definitions?

And like I said, there certainly are processors whose word size is completely
different from 16 bits, but if you are programming them with C (and many times
with assembler also), the short/word is 16 bits. And yes, there are
exceptions.
Jul 10 2008
prev sibling parent Jussi Jumppanen <jussij zeusedit.com> writes:
Jarrett Billingsley Wrote:

 I get the impression that people who think 
 that "word == 2 bytes" tend to be long-time 
 Windows programmers.  Since that's, well, 
 pretty much the only place where that's true. 
As one of those long-time Windows programmers, I know "WORD == 2 bytes" only
because that's how WORD is defined in the Windows SDK headers. I also know it
is half the size of a DWORD (i.e. double word). As to the question of how many
bytes there are in a "word", I wouldn't have a clue ;)
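For reference, the "two bytes" convention here comes straight from C typedefs in the Windows SDK headers; expressed as D aliases they would be roughly:

```d
// approximate D equivalents of the Windows SDK definitions (windef.h)
alias ushort WORD;    // 16 bits, regardless of the CPU's real word size
alias uint   DWORD;   // 32-bit "double word"
alias ulong  QWORD;   // 64-bit "quad word"
```

The names were fixed when the 8086's 16-bit word was the native size, and they were kept frozen for ABI compatibility even as the hardware word grew.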
Jul 10 2008
prev sibling parent Justin Spahr-Summers <Justin.SpahrSummers gmail.com> writes:
On 2008-07-10 22:24:00 -0500, Markus Koskimies <markus reaaliaika.net> said:

 On Thu, 10 Jul 2008 19:55:35 -0400, Nick Sabalausky wrote:
 
 "Markus Koskimies" <markus reaaliaika.net> wrote in message
 news:g55see$1h9i$15 digitalmars.com...
 On Thu, 10 Jul 2008 14:57:54 -0400, Jarrett Billingsley wrote:
 
 - byte, word, dword; these are well defined due to historical reasons.
 Change them and you confuse lots of people.
 
 
A "word" is well-defined to be the native data size of a given chip (memory, cpu, etc). People who have done a lot of PC programming tend to forget that or be unaware of it and end up with the mistaken impression that it's well-defined to be "two bytes", which has never been true in the general-case.
Hmmh, I disagree. "word" might mean in the history the width of the processor data paths, but nowadays it is 16-bit unsigned even in microcontrollers and DSPs (although DSPs rarely follow fixed width of processor words, e.g. having 20-bit data path, 24/48-bit special registers and accessing memory with 16-bit granularity).
I normally just stalk this list, but I had to jump in here.

The PowerPC is one of the most widely-used processors in embedded systems (and
has seen not insignificant usage in the desktop world as well), and all
IBM/Motorola documentation refers to a "word" as 32 bits and a "half-word" as
16 bits. Both definitions apply to both the 32-bit and 64-bit PowerPC
implementations. As an example, one of the assembly mnemonics for loading a
32-bit value from memory is "lwz" (load word and zero).
Jul 13 2008
prev sibling next sibling parent Yigal Chripun <yigal100 gmail.com> writes:
bearophile wrote:
 Yet another of my lists of silly suggestions, this time shorter :-)
 
 A syntax like this (that is a bit hairy, I agree) may be useful for
time-critical spots in the code:
 
 auto arr = new int[1_000_000] = int.max;
 auto arr = new int[1_000_000] = void;
 
 --------------------------
 
 I would appreciate it if D 2.0 had its basic types named like this:
 
 - int1 int2 int4 int8
 - uint1 uint2 uint4 uint8
 (Fixed size. The number is their length expressed in bytes (shorter than using
bits). There aren't int16 / uint16 yet)
 
 - int
 (built-in multi-precision signed integer. The compiler has heuristics to
replace it with a faster fixed size integer in a lot of situations. Optimized (if
necessary modifying the GC too) for very short numbers that can fit in just 4-8
bytes, so even where the compiler isn't able to replace it with a fixed sized
integer the slowdown isn't too big, this is to hopefully let C++/C
programmers be less scared of them. People that need to compute operations on
very large integers have to tolerate a bit of constant slowdown, or use GMP).
 
 - word uword
 (They are the default size of the CPU, 4, 8 or 16 or 32 bytes, so their size
isn't fixed. 'word' replaces the current 'size_t')
 
 I don't like this much, but I don't see better alternatives so far (you need
less memory to remember them and dchar, wchar, char):
 - char1, char2, char4
 (They are unicode. char1 is often 1 byte, and char2 is often 2 bytes long, but
they may grow to 4, so they aren't fixed size)
 
 - str
 (replaces the current 'string'. Strings are used all the time in certain kinds
of code, so an identifier shorter than 'string' may be better)
 
 - bool null true false void
 (Java uses 'boolean', but 'bool' is enough for D).
 
 - set
 (built-in set type, often useful, see Python3 and Fortress)
 
 - list (or sequence, or seq)
 (Haskell uses "2-3 finger trees" to implement general purpose sequences, they
may be fit for D too, they aren't meant to replace arrays
 http://en.wikipedia.org/wiki/Finger_tree ).
 
 Plus something else for float/double/real complex types.
 
 -----------------------
 
 D used 'auto' for automatic typing of variables:
 auto s = "hello";
 

 var s = "hello";
 
 'let' is more logical, shorter than 'auto' and it was used in Basic too:
 let s = "hello";
 
 'var' can be written on the keyboard with just the left hand, so it may be
written even faster than 'let'. But 'var' in modern D replaces 'inout'. So I
think 'let' may be a good compromise for D.
 
 Bye,
 bearophile
Fortran uses int4 and such. Do we really want to go back to that? I personally
dislike all those Fortran suggestions, but I agree that the current C style
can also be improved, so here's my idea: let's generalize the type system.

Scala, for example, has a value type and a reference type that all types
inherit from accordingly. What I mean is that I'd like the basic types to
become OOP objects so that you can use a more uniform syntax.

Currently in D:

auto x = new Class;
int y;
x.toString;
toString(y);

I'd prefer to be able to do:

y.toString;

so that the basic types themselves act like objects. I think this could be
implemented as special structs known by the compiler. The types will get
"constructors" so that the number of reserved keywords can be reduced. For
example:

- int will be an unbounded integer whose internal representation the compiler
can optimize according to its value.
- int(size) for a fixed-size int, so int(4) is the same as the above int4 in
bearophile's suggestion. For signed/unsigned we can have an "unsigned"
keyword, a similar "uint" type or maybe just another type-constructor:
int(4, u) would be an unsigned 4-byte int.
- word/byte or maybe even machine(word)/machine(byte)
- floating point: float(fast)/float(large)/float(size)/etc..
- fixed point: fixed(size) or decimal(size) etc..

That's just a rough draft of an idea..

-- Yigal
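The int(size) type-constructor can at least be approximated in current D with an eponymous template; a rough, hypothetical sketch (names are illustrative, not a proposal for the actual syntax):

```d
// hypothetical Int!(size) template mapping byte counts to today's types
template Int(int size)
{
    static if (size == 1)
        alias byte Int;
    else static if (size == 2)
        alias short Int;
    else static if (size == 4)
        alias int Int;
    else static if (size == 8)
        alias long Int;
    else
        static assert(false, "unsupported integer size");
}

Int!(4) x;   // same type as plain 'int'
```

The real proposal goes further, of course: templates can only select among existing types, while int(size) as a language feature could define new ones.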
Jul 10 2008
prev sibling parent Jason House <jason.james.house gmail.com> writes:
Markus Koskimies Wrote:
 - i8, i16, i32, i64, ..., u8, u16, u32, ... or intXX / uintXX variants, 
 since if you need to know the width of the number, you think it in bits. 
 No use to use bytes, it's just confusing.
I too would prefer to have integer sizes in terms of bits rather than bytes.
As far as what my ideal integer declaration syntax is, I'm still not sure. I
certainly hate tying the hands of the optimizer by requiring a specific size
of integer just because it's part of the language spec. It may be good to
standardize sizes for serialization, even if things are stored in a native
size.

I also wonder about how overflow/underflow should be handled when -release is
not used. Of course, I shouldn't even be thinking about that when there's no
check against dereferencing nulls.
Jul 11 2008