digitalmars.D - Re: Signed word lengths and indexes

bearophile <bearophileHUGS lycos.com> writes:

Sorry for the slow answer. Reading all this stuff and trying to understand some
of it takes me time.

Walter Bright:

>The Arduino is an 8 bit machine. D is designed for 32 bit and up machines. Full
C++ won't even work on a 16 bit machine, either.<


So D isn't a "better C" because you can't use it in a *large* number of
situations (for every 32 bit CPU built today, they probably build 10 8/16 bit
CPUs) where C is used.


>If you're a kernel dev, the language features should not be a problem for you.<


From what I have seen, C++ has a ton of features that are harmful for kernel
development. So a language that lacks them in the first place is surely
better, because it's simpler to use, and its compiler is smaller and simpler to
debug. About two years ago I read about an unfocused (and now dead) proposal
to write a C compiler just for building the Linux kernel, so as to avoid GCC.


>BTW, you listed nested functions as disqualifying a language from being a
kernel dev language, yet gcc supports nested functions as an extension.<


Nested functions are useful for my D code, I like them and I use them. But in D
(unless they are static!) they carry an extra hidden pointer. From what I have
read, such silent creation of extra data structures is bad if you are writing a
kernel. So probably a kernel dev can accept only static nested functions. For a
kernel dev the nonstatic default is bad, because if he or she forgets to
add the "static" attribute then it's probably a bug. This is why I have listed
D nested functions as a negative point for a kernel dev.

Regarding GCC having nested functions (GCC implements them with a trampoline), I
presume kernel devs don't use this GCC extension. GCC is designed for many
purposes and surely some of its features are not designed for kernel-writing
purposes.


>As I pointed out, D implements the bulk of those extensions as a standard part
of D.<


I am studying this still. See below.


>They are useful in some circumstances, but are hardly necessary.<


For a low-level programmer they can be positively useful, while several other D
features are useless or actively harmful.
I have seen a performance increase of about 15-20% from using computed gotos in
a finite state machine I wrote (one that processes strings).
CPython recently introduced them, with a similar 15-20% performance improvement:
http://bugs.python.org/issue4753

------------------------------

 It's interesting that D already has most of the gcc extensions:
 http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html


Some more items from that page:

4.5 Constructing Function Calls: the syntax and semantics seem dirty, and I
don't fully understand how to use this stuff. In D I miss a good apply() and a
good general memoize. A memoize is a quick and easy way to cache computations
and to turn recursive functions into efficient dynamic programming algorithms.
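
To illustrate the memoize idea: current Phobos provides std.functional.memoize, so a minimal sketch of caching a recursive function could look like the following. The function name fib and the tested values are only for illustration.

```d
import std.functional : memoize;

// Naive recursion, but every recursive call goes through the memoize
// cache, so each value of n is computed only once.
ulong fib(ulong n) {
    return n < 2 ? n : memoize!fib(n - 2) + memoize!fib(n - 1);
}

void main() {
    assert(fib(10) == 55);
    assert(fib(50) == 12_586_269_025UL);
}
```

Without the cache the same recursion takes exponential time; with it, the number of distinct computations grows linearly with n.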

-----------

4.13 Arrays of Length Zero: they are available in D, but you get an array bounds
error if you try to use them to create variable-length structs. So to use them
you have to overload the opIndex and opIndexAssign of the struct...

-----------

4.14 Arrays of Variable Length (allocated on the stack): this is missing in D.
Using alloca is a workaround.
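
A minimal sketch of the alloca workaround, assuming the alloca declared in core.stdc.stdlib and a hypothetical helper named sumOfSquares. The buffer lives in the caller's stack frame and disappears when the function returns, so it must not escape.

```d
import core.stdc.stdlib : alloca;

// Sums the squares 0^2 .. (n-1)^2 using a scratch buffer on the stack.
// Assumes n is small enough that the buffer fits in the stack frame.
int sumOfSquares(size_t n) {
    int* raw = cast(int*) alloca(n * int.sizeof);
    int[] buf = raw[0 .. n]; // slicing restores bounds-checked access
    foreach (i, ref x; buf)
        x = cast(int)(i * i);
    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

void main() {
    assert(sumOfSquares(4) == 14); // 0 + 1 + 4 + 9
}
```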

-----------

4.21 Case Ranges: D has this, but I am not sure D syntax is better.
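
For comparison, GNU C writes a range as `case '0' ... '9':`, while D repeats the case keyword on both ends. A small sketch of the D form (classify is a hypothetical helper):

```d
// Classifies an ASCII character using D case ranges.
string classify(char c) {
    switch (c) {
        case '0': .. case '9':
            return "digit";
        case 'a': .. case 'z':
            return "lowercase";
        case 'A': .. case 'Z':
            return "uppercase";
        default:
            return "other";
    }
}

void main() {
    assert(classify('7') == "digit");
    assert(classify('q') == "lowercase");
    assert(classify('Q') == "uppercase");
    assert(classify('_') == "other");
}
```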

-----------

4.22 Cast to a Union Type: this is missing in D, but it can be done anyway by
adding a static opCall to the union for each of its fields:

union Foo {
    int i;
    double d;
    static Foo opCall(int ii) { Foo f; f.i = ii; return f; }
    static Foo opCall(double dd) { Foo f; f.d = dd; return f; }
}
void main() {
    Foo f1 = Foo(10);
    Foo f2 = Foo(10.5);
}

-----------

4.23 Declaring Attributes of Functions 

noreturn: missing in D. But I am not sure how useful this is, the page
says: >it helps avoid spurious warnings of uninitialized variables.<

format (archetype, string-index, first-to-check) and format_arg (string-index):
they are missing in D, and they could be useful for people that want to use
std.c.stdio.printf.

no_instrument_function: missing in D. It can be useful to not profile a
function.

section ("section-name"): missing in D.

no_check_memory_usage: I don't understand this.

-----------

4.29 Specifying Attributes of Variables 

aligned (alignment): I think D doesn't allow specifying an alignment for
fixed-sized arrays.

nocommon: I don't understand this.

-----------

4.30 Specifying Attributes of Types 

transparent_union: D lacks this, but I don't know how useful this is.

-----------

4.34 Variables in Specified Registers: missing in D.
4.34.1, 4.34.2

Recently a Haskell middle-end for LLVM has shown that letting LLVM allocate
registers works better than pinning values to specified registers (they stay in
the specified registers only outside functions, which frees registers inside
the functions and increases performance a bit).

-----------

4.37 Function Names as Strings: I think this is missing in D. __FUNCTION__ can
be useful with string mixins.

-----------

4.38 Getting the Return or Frame Address of a Function: missing in D. I don't
know when to use this.

-----------

4.39 Other built-in functions provided by GNU CC:

__builtin_constant_p: missing in D. It can be useful with static if.

------------------------------

There are other pages of docs about GCC, like this one:
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

alloc_size: I don't know how much this can be useful in D, probably not much.

artificial: I don't understand this.

error ("message") and warning ("message"): I don't fully understand them.

malloc: missing in D (it's a function attribute).

noinline: missing in D.

noclone: missing in D (cloning happens with LDC).

nonnull (arg-index, ...): ah ah, missing in D :-) I didn't know about this
attribute. But a much better syntax can be used in D.

optimize: missing in D, useful. Often used in CLisp.

pcs: missing in D.

hot, cold: missing in D, but not so useful.

regparm (number): I don't fully understand this.

sseregparm: something like this seems needed in D.

force_align_arg_pointer: missing in D, but I don't understand it fully.

signal: I don't know.

syscall_linkage: missing in D.

target: curious, I don't know if this is needed in D (a static if around the
versions can be enough, but I don't remember if the CPU type is available at
compile-time).

warn_unused_result: missing in D. Can be useful where exceptions can't be used.

------------------------------

I have omitted many attributes and little features useful for specific CPU
targets.
So it seems there is a good number of features present in GNU C that are
missing in D. I don't know how many of them are used for example in the Linux
kernel.

Bye,
bearophile

Jun 18 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-06-18 08:11:00 -0400, bearophile <bearophileHUGS lycos.com> said:

 4.13 Arrays of Length Zero: they are available in D, but you get a 
 array bound error if you try to use them to create variable-length 
 structs. So to use them you have to overload the opIndex and 
 opIndexAssign of the struct...


Bypassing bound checks is as easy as appending ".ptr":

	staticArray.ptr[10]; // no bound check

Make an alias to the static array's ptr property if you prefer not to 
have to write .ptr all the time.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 18 2010

bearophile <bearophileHUGS lycos.com> writes:

Michel Fortin:
 Bypassing bound checks is as easy as appending ".ptr":
 
 	staticArray.ptr[10]; // no bound check
 
 Make an alias to the static array's ptr property if you prefer not to 
 have to write .ptr all the time.


If you try to compile this:

import std.c.stdlib: malloc;
struct Foo {
  int x;
  int[0] a;
}
void main() {
  enum N = 20;
  Foo* f = cast(Foo*)malloc(Foo.sizeof + N * typeof(Foo.a[0]).sizeof);
  f.a.ptr[10] = 5;
}

You receive:
prog.d(9): Error: null dereference in function _Dmain

As I have said, you have to use operator overloading of the struct and some
near-ugly code that uses the offsetof. I don't like this a lot.

Bye,
bearophile

Jun 18 2010

bearophile <bearophileHUGS lycos.com> writes:

Adam Ruppe:
 D need be no uglier than C. Here's my implementation:


That's cute, thank you :-)


 	static void destroy(MyString* s) {
 		free(s);
 	}


Why destroy instead of ~this() ?

Bye,
bearophile

Jun 18 2010

bearophile <bearophileHUGS lycos.com> writes:

Adam Ruppe:

 I don't think a destructor can free the mem
 of its own object.


I see and I'd like to know! :-)

By the way, this program shows your code is not a replacement for the operator
overloading of the variable-length struct itself that I was talking about,
because D structs can't have size zero (TailArray takes 1 byte, plus 3 bytes of
padding here):

import std.stdio: writeln, write;

struct TailArray(T) {
    T opIndex(size_t idx) {
        T* tmp = cast(T*)(&this) + idx;
        return *tmp;
    }

    T opIndexAssign(T value, size_t idx) {
        T* tmp = cast(T*)(&this) + idx;
        *tmp = value;
        return value;
    }
}

struct MyString1 {
    size_t size;
    TailArray!char data; // not the same as char data[0]; in C
}

struct MyString2 {
    size_t size;
    char[0] data;
}

void main() {
    writeln(MyString1.sizeof); // 8
    writeln(MyString2.sizeof); // 4
}

Bye,
bearophile

Jun 18 2010

bearophile <bearophileHUGS lycos.com> writes:

Adam Ruppe:
 Huh, weird. Doesn't make too much of a difference in practice though,
 since it only changes the malloc line slightly.


Probably it can be fixed, but you have to be careful, because the padding isn't
constant, it can change in size according to the CPU word size and the types of
the data that come before TailArray :-)
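
One possible way around the varying padding is to size the allocation from the tail member's offset instead of from the struct's sizeof. This is only a sketch under assumptions: allocSize is a hypothetical helper, the TailArray here returns by ref rather than using opIndexAssign, and core.stdc.stdlib is assumed in place of std.c.stdlib.

```d
import core.stdc.stdlib : malloc, free;

// Zero-field tail marker; indexing reads/writes the memory that starts
// at the marker's own address (unsafe: no bounds checking at all).
struct TailArray(T) {
    ref T opIndex(size_t idx) {
        return (cast(T*) &this)[idx];
    }
}

struct MyString {
    size_t size;
    TailArray!char data;

    // Bytes needed for the header plus n tail elements, computed from the
    // tail member's offset so that padding after it is not counted.
    static size_t allocSize(size_t n) {
        return MyString.data.offsetof + n * char.sizeof;
    }
}

void main() {
    enum n = 5;
    auto s = cast(MyString*) malloc(MyString.allocSize(n));
    assert(s !is null);
    scope(exit) free(s);

    s.size = n;
    foreach (i, c; "Hello")
        s.data[i] = c;

    assert(s.data[4] == 'o');
    assert(MyString.allocSize(n) == size_t.sizeof + n);
}
```

Because offsetof is evaluated by the compiler for the actual target, the result follows the CPU word size and the preceding fields without any hand-written padding constant.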

Bye,
bearophile

Jun 18 2010

Adam Ruppe <destructionator gmail.com> writes:

On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
 As I have said, you have to use operator overloading of the struct and some
 near-ugly code that uses the offsetof. I don't like this a lot.


D need be no uglier than C. Here's my implementation:


/*  very_unsafe */ struct TailArray(T) {
	T opIndex(size_t idx) {
		T* tmp = cast(T*) (&this) + idx;
		return *tmp;
	}

	T opIndexAssign(T value, size_t idx) {
		T* tmp = cast(T*) (&this) + idx;
		*tmp = value;
		return value;
	}
}

// And this demonstrates how to use it:

import std.contracts;
import std.c.stdlib;

struct MyString {
	size_t size;
	TailArray!(char) data; // same as char data[0]; in C

	// to show how to construct it
	static MyString* make(size_t size) {
		MyString* item = cast(MyString*) malloc(MyString.sizeof + size);
		enforce(item !is null);
		item.size = size;
		return item;
	}

	static void destroy(MyString* s) {
		free(s);
	}
}

import std.stdio;

void main() {
	MyString* str = MyString.make(5);
	scope(exit) MyString.destroy(str);

	// assigning works same as C
	str.data[0] = 'H';
	str.data[1] = 'e';
	str.data[2] = 'l';
	str.data[3] = 'l';
	str.data[4] = 'o';

	// And so does getting
	for(int a = 0; a < str.size; a++)
		writef("%s", str.data[a]);
	writefln("");
}

Jun 18 2010

Adam Ruppe <destructionator gmail.com> writes:

On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
 	static void destroy(MyString* s) {
 		free(s);
 	}


 Why destroy instead of ~this() ?


It allocates and deallocates the memory rather than initialize and
uninitialize the object. I don't think a destructor can free the mem
of its own object.

If I used gc.malloc or stack allocation, the destroy method shouldn't
be necessary at all, since the memory is handled automatically there.

Though, the main reason I did it this way is I was just writing in a C
style rather than a D style, so it was kinda automatic. Still, I'm
pretty sure what I'm saying here about a constructor/destructor not being
able to actually allocate or free the memory of the object is true too.

Jun 18 2010

Adam Ruppe <destructionator gmail.com> writes:

On 6/18/10, bearophile <bearophileHUGS lycos.com> wrote:
 By the way, this program shows your code is not a replacement of the
 operator overloading of the variable length struct itself I was talking
 about, because D structs can't have length zero (plus 3 bytes of padding,
 here):


Huh, weird. Doesn't make too much of a difference in practice though,
since it only changes the malloc line slightly.

In C, before the array[0] was allowed (actually, I'm not completely
sure it is allowed even now in the standard. C99 added something, but
I don't recall if it is the same thing), people would use array[1].

Since it is at the tail of the struct, and you're using pointer magic
to the raw memory anyway, it doesn't make much of a difference.

Jun 18 2010
