
digitalmars.D - Re: Signed word lengths and indexes

bearophile <bearophileHUGS lycos.com> writes:
Sorry for the slow answer. Reading all this and trying to understand some of it
takes me time.

Walter Bright:

 The Arduino is an 8 bit machine. D is designed for 32 bit and up machines. Full
 C++ won't even work on a 16 bit machine, either.

So D isn't a "better C", because you can't use it in a *large* number of situations where C is used (for every 32-bit CPU built today, they probably build ten 8/16-bit CPUs).
 If you're a kernel dev, the language features should not be a problem for you.

From what I have seen, C++ has a ton of features that are harmful for kernel development. So a language that lacks them in the first place is surely better, because it's simpler to use, and its compiler is smaller and simpler to debug. About two years ago I read about an unfocused (and now dead) proposal to write a C compiler just for building the Linux kernel, so as to avoid GCC.
 BTW, you listed nested functions as disqualifying a language from being a
 kernel dev language, yet gcc supports nested functions as an extension.

Nested functions are useful in my D code; I like them and I use them. But in D (unless they are static!) they carry a hidden context pointer to the enclosing function's frame. From what I have read, such silent creation of extra data structures is bad if you are writing a kernel. So probably a kernel dev can accept only static nested functions. For a kernel dev the non-static default is bad, because if they forget to add the "static" attribute, it's probably a bug. This is why I have listed D nested functions as a negative point for a kernel dev. Regarding GCC having nested functions (GCC implements them with a trampoline), I presume kernel devs don't use this GCC extension. GCC is designed for many purposes, and surely some of its features are not designed for kernel-writing purposes.
 As I pointed out, D implements the bulk of those extensions as a standard part
 of D.
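A minimal sketch of the difference described above, assuming a current D2 compiler; the function names `outer`, `addX` and `twice` are made up for illustration. Taking the address of a non-static nested function yields a delegate (context pointer plus code pointer), while a `static` nested function yields a plain one-word function pointer.

```d
import std.stdio : writeln;

int outer(int x)
{
    // Non-static nested function: it reads `x` from outer's stack frame,
    // so it needs a hidden context pointer and its address is a delegate.
    int addX(int y) { return x + y; }

    // Static nested function: it cannot touch outer's locals, needs no
    // context pointer, and its address is a plain function pointer.
    static int twice(int y) { return 2 * y; }

    // A delegate is two words (context pointer + code pointer);
    // a function pointer is one.
    static assert(typeof(&addX).sizeof == 2 * size_t.sizeof);
    static assert(typeof(&twice).sizeof == size_t.sizeof);

    return addX(twice(1));
}

void main()
{
    writeln(outer(3)); // 5
}
```

Forgetting `static` on `twice` would still compile; it would just silently become a delegate-producing function, which is the kind of mistake the paragraph above worries about.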

I am studying this still. See below.
 They are useful in some circumstances, but are hardly necessary.

For a low-level programmer they can be positively useful, while several other D features are useless or actively harmful. I have seen a 15-20% performance increase using computed gotos in a finite state machine I have written (that processes strings). CPython recently introduced them, with a 15-20% performance improvement: http://bugs.python.org/issue4753
------------------------------
 It's interesting that D already has most of the gcc extensions:
 http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html

Some more items from that page:

4.5 Constructing Function Calls: this syntax and semantics seem dirty, and I don't fully understand how to use this stuff. In D I miss a good apply() and a good general memoize. Memoization is a quick and easy way to cache computations and to turn recursive functions into efficient dynamic programming algorithms.
-----------
4.13 Arrays of Length Zero: they are available in D, but you get an array bounds error if you try to use them to create variable-length structs. So to use them you have to overload opIndex and opIndexAssign of the struct...
-----------
4.14 Arrays of Variable Length (allocated on the stack): this is missing in D. Using alloca is a workaround.
-----------
4.21 Case Ranges: D has this, but I am not sure the D syntax is better.
-----------
4.22 Cast to a Union Type: this is missing in D, but it can be emulated by adding a static opCall to the union for each of its fields:

    union Foo {
        int i;
        double d;
        static Foo opCall(int ii) { Foo f; f.i = ii; return f; }
        static Foo opCall(double dd) { Foo f; f.d = dd; return f; }
    }
    void main() {
        Foo f1 = Foo(10);
        Foo f2 = Foo(10.5);
    }

-----------
4.23 Declaring Attributes of Functions
noreturn: missing in D. But I am not sure how useful this is; the page says: >it helps avoid spurious warnings of uninitialized variables.<
format (archetype, string-index, first-to-check) and format_arg (string-index): missing in D; they can be useful for people who want to use std.c.stdio.printf.
no_instrument_function: missing in D. It can be useful to exclude a function from profiling.
section ("section-name"): missing in D.
no_check_memory_usage: I don't understand this.
-----------
4.29 Specifying Attributes of Variables
aligned (alignment): I think D doesn't allow specifying an alignment for fixed-size arrays.
nocommon: I don't understand this.
-----------
4.30 Specifying Attributes of Types
transparent_union: D lacks this, but I don't know how useful it is.
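For item 4.14, a minimal sketch of the alloca workaround, assuming a current druntime where `alloca` is declared in `core.stdc.stdlib`; `sumOfSquares` is a hypothetical example function. The buffer's length is only known at run time, which is the job GCC's `int buf[n];` does in C.

```d
import core.stdc.stdlib : alloca;
import std.stdio : writeln;

// Sums 0^2 + 1^2 + ... + (n-1)^2 using a stack buffer sized at run time.
long sumOfSquares(size_t n)
{
    // alloca memory lives in this function's stack frame and is released
    // automatically on return, so the slice must not escape the function.
    int* raw = cast(int*) alloca(n * int.sizeof);
    int[] buf = raw[0 .. n];

    foreach (i, ref x; buf)
        x = cast(int)(i * i);

    long total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

void main()
{
    writeln(sumOfSquares(4)); // 14
}
```

Unlike a real variable-length array, the slice carries no guarantee about its lifetime, so returning `buf` would be a dangling reference.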
-----------
4.34 Variables in Specified Registers: missing in D.
4.34.1, 4.34.2: Recently a Haskell middle-end for LLVM has shown that LLVM can use registers better than by pinning variables to specified registers: they are kept in specified registers only outside functions, which frees registers inside the functions and increases performance a bit.
-----------
4.37 Function Names as Strings: I think this is missing in D. __FUNCTION__ can be useful with string mixins.
-----------
4.38 Getting the Return or Frame Address of a Function: missing in D. I don't know when to use this.
-----------
4.39 Other built-in functions provided by GNU CC
__builtin_constant_p: missing in D. It can be useful with static if.
------------------------------
There are other pages of docs about GCC, like this one:
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

alloc_size: I don't know how useful this can be in D, probably not much.
artificial: I don't understand this.
error ("message") and warning ("message"): I don't fully understand them.
malloc: missing in D (it's a function attribute).
noinline: missing in D.
noclone: missing in D (cloning happens with LDC).
nonnull (arg-index, ...): ah ah, missing in D :-) I didn't know about this attribute. But a much better syntax can be used in D.
optimize: missing in D, useful. Often used in CLisp.
pcs: missing in D.
hot, cold: missing in D, but not so useful.
regparm (number): I don't fully understand this.
sseregparm: something like this seems needed in D.
force_align_arg_pointer: missing in D, but I don't understand it fully.
signal: I don't know.
syscall_linkage: missing in D.
target: curious; I don't know if this is needed in D (a static if around the versions may be enough, but I don't remember if the CPU type is available at compile time).
warn_unused_result: missing in D. It can be useful where exceptions can't be used.
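For the nonnull item, one possible D spelling is an `in` contract; this is a sketch of a different mechanism, not an equivalent of GCC's attribute: the contract is a run-time check (compiled out with `-release`), while `nonnull` is a compile-time hint. `countChar` is a hypothetical example function.

```d
import std.stdio : writeln;

// Counts occurrences of `c` in a zero-terminated C string.
// The in-contract rejects a null `s` at run time.
size_t countChar(const(char)* s, char c)
in
{
    assert(s !is null, "countChar: s must not be null");
}
do
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

void main()
{
    // D string literals are zero-terminated, so .ptr is a valid C string here.
    writeln(countChar("banana".ptr, 'a')); // 3
}
```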
------------------------------
I have omitted many attributes and little features useful for specific CPU targets. So it seems there are a good number of features present in GNU C that are missing in D. I don't know how many of them are used, for example, in the Linux kernel.

Bye,
bearophile
Jun 18 2010
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-18 08:11:00 -0400, bearophile <bearophileHUGS lycos.com> said:

 4.13 Arrays of Length Zero: they are available in D, but you get an
 array bounds error if you try to use them to create variable-length
 structs. So to use them you have to overload opIndex and
 opIndexAssign of the struct...

Bypassing bound checks is as easy as appending ".ptr":

    staticArray.ptr[10]; // no bound check

Make an alias to the static array's ptr property if you prefer not to have to write .ptr all the time.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
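A small sketch of the `.ptr` trick on an ordinary static array, assuming a current D2 compiler; `fill` is a hypothetical function. Indexing the array itself is bounds-checked (unless compiled with `-boundscheck=off`), while indexing through `.ptr` is raw pointer arithmetic with no check at all.

```d
import std.stdio : writeln;

int[4] fill()
{
    int[4] buf;
    // buf[i] would be bounds-checked; buf.ptr[i] never is, so the loop
    // bound alone keeps these writes inside the array.
    foreach (i; 0 .. buf.length)
        buf.ptr[i] = cast(int)(i * 10);
    return buf;
}

void main()
{
    writeln(fill()); // [0, 10, 20, 30]
}
```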
Jun 18 2010
bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:
 Bypassing bound checks is as easy as appending ".ptr":
 
 	staticArray.ptr[10]; // no bound check
 
 Make an alias to the static array's ptr property if you prefer not to 
 have to write .ptr all the time.

If you try to compile this:

    import std.c.stdlib: malloc;
    struct Foo {
        int x;
        int[0] a;
    }
    void main() {
        enum N = 20;
        Foo* f = cast(Foo*)malloc(Foo.sizeof + N * typeof(Foo.a[0]).sizeof);
        f.a.ptr[10] = 5;
    }

You receive:

    prog.d(9): Error: null dereference in function _Dmain

As I have said, you have to use operator overloading of the struct and some near-ugly code that uses offsetof. I don't like this a lot.

Bye,
bearophile
Jun 18 2010
bearophile <bearophileHUGS lycos.com> writes:
Adam Ruppe:
 D need be no uglier than C. Here's my implementation:

That's cute, thank you :-)
 	static void destroy(MyString* s) {
 		free(s);
 	}

Why destroy instead of ~this()?

Bye,
bearophile
Jun 18 2010
bearophile <bearophileHUGS lycos.com> writes:
Adam Ruppe:

 I don't think a destructor can free the mem
 of its own object.

I see, and I'd like to know! :-)

By the way, this program shows your code is not a replacement for the operator overloading of the variable-length struct itself that I was talking about, because D structs can't have zero size (an empty struct takes 1 byte, plus 3 bytes of padding here, on a 32-bit target):

    import std.stdio: writeln, write;

    struct TailArray(T) {
        T opIndex(size_t idx) {
            T* tmp = cast(T*)(&this) + idx;
            return *tmp;
        }
        T opIndexAssign(T value, size_t idx) {
            T* tmp = cast(T*)(&this) + idx;
            *tmp = value;
            return value;
        }
    }
    struct MyString1 {
        size_t size;
        TailArray!char data; // not the same as char data[0]; in C
    }
    struct MyString2 {
        size_t size;
        char[0] data;
    }
    void main() {
        writeln(MyString1.sizeof); // 8 (32-bit target)
        writeln(MyString2.sizeof); // 4 (32-bit target)
    }

Bye,
bearophile
Jun 18 2010
bearophile <bearophileHUGS lycos.com> writes:
Adam Ruppe:
 Huh, weird. Doesn't make too much of a difference in practice though,
 since it only changes the malloc line slightly.

Probably it can be fixed, but you have to be careful, because the padding isn't constant: it can change in size according to the CPU word size and the types of the data that come before TailArray :-)

Bye,
bearophile
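One way to sidestep the padding problem, sketched under the assumption of a current D2 compiler: size the allocation from the offset of the tail member instead of from `sizeof`. The offset is exactly where the characters begin, so any trailing padding after the 1-byte empty member never enters the arithmetic. `TailArray` follows Adam's idea; `MyString1.make` is a hypothetical helper.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

// The member's own address marks where the extra elements start.
struct TailArray(T)
{
    T opIndex(size_t idx) { return (cast(T*) &this)[idx]; }
    T opIndexAssign(T value, size_t idx) { return (cast(T*) &this)[idx] = value; }
}

struct MyString1
{
    size_t size;
    TailArray!char data;

    // Allocate from data.offsetof, not MyString1.sizeof, so the padding
    // after the empty member (3 or 7 bytes depending on the target) is
    // neither needed nor counted.
    static MyString1* make(size_t n)
    {
        auto p = cast(MyString1*) malloc(MyString1.data.offsetof + n * char.sizeof);
        if (p !is null)
            p.size = n;
        return p;
    }
}

void main()
{
    MyString1* s = MyString1.make(2);
    if (s is null)
        return;
    scope (exit) free(s);
    s.data[0] = 'h';
    s.data[1] = 'i';
    writeln(s.data[0], s.data[1]); // hi
}
```

The block can be smaller than `MyString1.sizeof`, so the struct must only be used through the pointer, never copied by value.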
Jun 18 2010
Adam Ruppe <destructionator gmail.com> writes:
On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
 As I have said, you have to use operator overloading of the struct and some
 near-ugly code that uses the offsetof. I don't like this a lot.

D need be no uglier than C. Here's my implementation:

    /* very_unsafe */
    struct TailArray(T) {
        T opIndex(size_t idx) {
            T* tmp = cast(T*) (&this) + idx;
            return *tmp;
        }
        T opIndexAssign(T value, size_t idx) {
            T* tmp = cast(T*) (&this) + idx;
            *tmp = value;
            return value;
        }
    }

    // And this demonstrates how to use it:
    import std.contracts;
    import std.c.stdlib;

    struct MyString {
        size_t size;
        TailArray!(char) data; // same as char data[0]; in C

        // to show how to construct it
        static MyString* make(size_t size) {
            MyString* item = cast(MyString*) malloc(MyString.sizeof + size);
            enforce(item !is null);
            item.size = size;
            return item;
        }

        static void destroy(MyString* s) {
            free(s);
        }
    }

    import std.stdio;

    void main() {
        MyString* str = MyString.make(5);
        scope(exit) MyString.destroy(str);

        // assigning works same as C
        str.data[0] = 'H';
        str.data[1] = 'e';
        str.data[2] = 'l';
        str.data[3] = 'l';
        str.data[4] = 'o';

        // And so does getting
        for(int a = 0; a < str.size; a++)
            writef("%s", str.data[a]);
        writefln("");
    }
Jun 18 2010
Adam Ruppe <destructionator gmail.com> writes:
On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
 	static void destroy(MyString* s) {
 		free(s);
 	}

Why destroy instead of ~this() ?

It allocates and deallocates the memory, rather than initializing and uninitializing the object. I don't think a destructor can free the memory of its own object. If I used gc.malloc or stack allocation, the destroy method wouldn't be necessary at all, since the memory is handled automatically there. Though, the main reason I did it this way is that I was just writing in a C style rather than a D style, so it was kind of automatic. Still, I'm pretty sure what I'm saying here about constructors/destructors not being able to actually free the memory of the object is true too.
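A sketch of the GC-allocated variant mentioned above, assuming druntime's `core.memory.GC.malloc` and `GC.BlkAttr.NO_SCAN`; `make` and `chars` are hypothetical helpers, and this version stores the characters directly after the header rather than through a `TailArray` member. Since the garbage collector owns the block, no `destroy`/`free` call is needed.

```d
import core.memory : GC;
import std.stdio : writeln;

// Header followed by `size` chars in one GC-owned block; the collector
// reclaims it once no pointer to it remains.
struct MyString
{
    size_t size;

    static MyString* make(size_t n)
    {
        // NO_SCAN: the block holds no pointers, so the GC need not scan it.
        auto p = cast(MyString*) GC.malloc(MyString.sizeof + n, GC.BlkAttr.NO_SCAN);
        p.size = n;
        return p;
    }

    // The characters start right after the header.
    char[] chars() { return (cast(char*) (&this + 1))[0 .. size]; }
}

void main()
{
    MyString* s = MyString.make(5);
    char[] buf = s.chars();
    buf[] = "Hello";
    writeln(s.chars()); // Hello
}
```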
Jun 18 2010
Adam Ruppe <destructionator gmail.com> writes:
On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
 By the way, this program shows your code is not a replacement of the
 operator overloading of the variable length struct itself I was talking
 about, because D structs can't have length zero (plus 3 bytes of padding,
 here):

Huh, weird. It doesn't make much of a difference in practice, though, since it only changes the malloc line slightly. In C, before array[0] was allowed (actually, I'm not completely sure it is allowed even now in the standard. C99 added something, but I don't recall if it is the same thing), people would use array[1]. Since it is at the tail of the struct, and you're using pointer magic on the raw memory anyway, it doesn't make much of a difference.
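The array[1] idiom carries over to D almost unchanged. A minimal sketch, assuming a current D2 compiler; `make` is a hypothetical helper. The tail is declared as `char[1]` and read past its end through `.ptr`, which skips the bounds check that plain indexing would apply.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.stdio : writeln;

// Pre-C99 "struct hack": declared with one element, used with n.
struct MyString
{
    size_t size;
    char[1] data;
}

MyString* make(const(char)[] text)
{
    // MyString.sizeof already includes one char of `data` (plus padding),
    // so only the remaining length - 1 chars need extra room.
    immutable size_t extra = text.length > 1 ? text.length - 1 : 0;
    auto p = cast(MyString*) malloc(MyString.sizeof + extra);
    if (p is null)
        return null;
    p.size = text.length;
    memcpy(p.data.ptr, text.ptr, text.length);
    return p;
}

void main()
{
    MyString* s = make("Hello");
    if (s is null)
        return;
    scope (exit) free(s);
    writeln(s.data.ptr[0 .. s.size]); // Hello
}
```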
Jun 18 2010