D.gnu - Object file questions

reply "Timo Sintonen" <t.sintonen luukku.com> writes:
I have been looking at object files to see if I can reduce the 
memory usage for minimal systems. There are two things I have 
noticed:

1. In the data segment there is some source code as ASCII text 
from a template in gcc/atomics.d. It is in the actual data 
segment, not in the debug info segments, and it ends up in the data 
segment of the executable. I do not see any code using this data. 
Why is it in the executable, and is it possible to remove it?

2. In the data segment there is also an __init symbol for every type. I 
assume these contain the initial values that are copied when 
a new object of the type is created. Is this data mutable, or 
should it really be in rodata instead of the data segment?
Aug 14 2014
parent reply Johannes Pfau <nospam example.com> writes:
Am Thu, 14 Aug 2014 10:07:04 +0000
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 I have been looking at object files to see if I can reduce the 
 memory usage for minimum systems. There are two things I have 
 noticed:
 
 1. In the data segment there is some source code as ascii text 
 from a template in gcc/atomics.d . This is in the actual data 
 segment and not in debug info segments and goes into the data 
 segment of the executable. I do not see any code using this data. 
 Why is this in the executable and is it possible to remove it?
 
Strange, could you post a testcase?
 2. In the data segment there is also __init for all types. I 
 assume that they contain the initial values that are copied when 
 a new object of this type is created.
Correct, it's for '.init' (there's especially __..._TypeInfo_init which is the initializer for typeinfo. I've implemented -fno-rtti in a private git branch to get rid of typeinfo)
 Is this data mutable and 
 should it really be in data segment and not in rodata?
 
I think it should be in rodata.
Aug 14 2014
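For readers unfamiliar with '.init', here is a minimal sketch of what the per-type initializer data represents. The struct names and helper functions are made up for illustration; `__traits(isZeroInit, T)` and `TypeInfo.initializer()` are taken from current D compilers and druntime and may not exist in the 2014 GDC discussed in this thread.

```d
// Sketch only: ZeroRegs, PresetRegs and the two helpers are hypothetical names.
struct ZeroRegs          // every field defaults to zero
{
    uint control;
    uint[2] data;
}

struct PresetRegs        // at least one field has a non-zero default
{
    uint control = 0x10;
    uint data;
}

/// True when the .init image of T consists only of zero bits.
bool isAllZero(T)()
{
    return __traits(isZeroInit, T);
}

/// Size in bytes of the initializer image described by the TypeInfo of T.
size_t initImageSize(T)()
{
    return typeid(T).initializer().length;
}

unittest
{
    // .init carries the declared defaults.
    assert(PresetRegs.init.control == 0x10);
    assert(PresetRegs.init.data == 0);

    // Only the second struct needs a non-zero image.
    assert(isAllZero!ZeroRegs);
    assert(!isAllZero!PresetRegs);

    // The image covers the whole type.
    assert(initImageSize!PresetRegs == PresetRegs.sizeof);
}
```

Nothing in this data needs to change at run time, which is why read-only placement looks plausible for the initializer image itself.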
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Thursday, 14 August 2014 at 17:13:23 UTC, Johannes Pfau wrote:
Strange, could you post a testcase?
It seems this comes from libdruntime; it exists in object.o and core/atomic.o. The test case is to compile the minlibd library as it currently is in the repo, using the makefile as is. But I think it will be in any object file that imports gcc.atomics and uses the template in there.
 2. In the data segment there is also __init for all types. I 
 assume that they contain the initial values that are copied 
 when a new object of this type is created.
Correct, it's for '.init' (there's especially __..._TypeInfo_init which is the initializer for typeinfo. I've implemented -fno-rtti in a private git branch to get rid of typeinfo)
 Is this data mutable and should it really be in data segment 
 and not in rodata?
 
I think it should be in rodata.
So it is not a bug and not a feature. It is just because it does not matter? Maybe a feature request?
Aug 14 2014
next sibling parent "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/14/14 19:53, Timo Sintonen via D.gnu wrote:
It seems this comes from libdruntime and it exists in object.o and core/atomic.o, Testcase is to compile minlibd library as it is currently in the repo using the makefile as such. But I think it will be in any object file that imports gcc.atomics and uses the template in there.
diff --git a/libphobos/libdruntime/gcc/atomics.d b/libphobos/libdruntime/gcc/atomics.d
index 78e644191e8f..ee1a146b680e 100644
--- a/libphobos/libdruntime/gcc/atomics.d
+++ b/libphobos/libdruntime/gcc/atomics.d
@@ -28,7 +28,7 @@ import gcc.builtins;
  */
 private template __sync_op_and(string op1, string op2)
 {
-    const __sync_op_and = `
+    enum __sync_op_and = `
 T __sync_` ~ op1 ~ `_and_` ~ op2 ~ `(T)(const ref shared T ptr, T value)
 {
     static if (T.sizeof == byte.sizeof)

artur
Aug 14 2014
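The effect of the one-word change in that patch can be sketched with a much smaller template. This is not the gcc.atomics code; `makeAdder` and `addInts` are hypothetical names. An `enum` is a manifest constant that exists only at compile time, so the source text it holds should not need a symbol in the object file, whereas a `const` variable can be emitted as data.

```d
// Hypothetical names; a reduced illustration, not the gcc.atomics template.
template makeAdder(string name)
{
    // `enum` declares a manifest constant: the string is only used
    // while compiling and has no address of its own.
    enum makeAdder = `
        int ` ~ name ~ `(int a, int b)
        {
            return a + b;
        }`;
}

// Paste the generated source into this module at compile time.
mixin(makeAdder!"addInts");

unittest
{
    assert(addInts(2, 3) == 5);
}
```

Whether a given compiler version really omits the string is best checked in the map file or with objdump, as done in this thread.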
prev sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Thu, 14 Aug 2014 17:53:32 +0000
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

It seems this comes from libdruntime and it exists in object.o and core/atomic.o, Testcase is to compile minlibd library as it is currently in the repo using the makefile as such. But I think it will be in any object file that imports gcc.atomics and uses the template in there.
If you're referring to this:
http://dpaste.dzfl.pl/fe75e8c7dfca

This seems to be the const variable in __sync_op_and. Try changing the code to "immutable __sync_op_and = " or "enum __sync_op_and = " and file a bug report.
 2. In the data segment there is also __init for all types. I 
 assume that they contain the initial values that are copied 
 when a new object of this type is created.
Correct, it's for '.init' (there's especially __..._TypeInfo_init which is the initializer for typeinfo. I've implemented -fno-rtti in a private git branch to get rid of typeinfo)
 Is this data mutable and should it really be in data segment 
 and not in rodata?
 
I think it should be in rodata.
So it is not a bug and not a feature. It is just because it does not matter? Maybe a feature request?
Seems to happen only for the TypeInfo init symbols. I can't run the testsuite right now, but try this:

diff --git a/gcc/d/d-decls.cc b/gcc/d/d-decls.cc
index bd6f5f9..45d433a 100644
--- a/gcc/d/d-decls.cc
+++ b/gcc/d/d-decls.cc
@@ -274,6 +274,8 @@ TypeInfoDeclaration::toSymbol (void)
       // given TypeInfo. It is the actual data, not a reference
       gcc_assert (TREE_CODE (TREE_TYPE (csym->Stree)) == REFERENCE_TYPE);
       TREE_TYPE (csym->Stree) = TREE_TYPE (TREE_TYPE (csym->Stree));
+      TREE_CONSTANT (csym->Stree) = true;
+      TREE_READONLY (csym->Stree) = true;
       relayout_decl (csym->Stree);
       TREE_USED (csym->Stree) = 1;
Aug 14 2014
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Thursday, 14 August 2014 at 19:05:46 UTC, Johannes Pfau wrote:
Seems to happen only for the TypeInfo init symbols. I can't run the testsuite right now, but try this:

diff --git a/gcc/d/d-decls.cc b/gcc/d/d-decls.cc
index bd6f5f9..45d433a 100644
--- a/gcc/d/d-decls.cc
+++ b/gcc/d/d-decls.cc
@@ -274,6 +274,8 @@ TypeInfoDeclaration::toSymbol (void)
       // given TypeInfo. It is the actual data, not a reference
       gcc_assert (TREE_CODE (TREE_TYPE (csym->Stree)) == REFERENCE_TYPE);
       TREE_TYPE (csym->Stree) = TREE_TYPE (TREE_TYPE (csym->Stree));
+      TREE_CONSTANT (csym->Stree) = true;
+      TREE_READONLY (csym->Stree) = true;
       relayout_decl (csym->Stree);
       TREE_USED (csym->Stree) = 1;
Looks good. The template code is gone and the init blocks have moved to rodata. My simple test program compiles and runs. There are still some __Class symbols in the data segment, and init values for structs and arrays in the bss segment. Is it possible to move these to rodata too?

In my application there will be several large structs. I never create any instances of these types. Instead I use them as overlays that point to hardware registers, and maybe on top of existing byte arrays such as message buffers. The initial values for these structs still take up memory. Is there any way to omit them?
Aug 16 2014
next sibling parent "ketmar via D.gnu" <d.gnu puremagic.com> writes:
On Sat, 16 Aug 2014 07:06:34 +0000
"Timo Sintonen via D.gnu" <d.gnu puremagic.com> wrote:

 structs wasting memory. Is there any way to omit them?
maybe this will work:

struct A {
  int n = void;
  uint[2] a = void;
  ...and so on for all fields
}
Aug 16 2014
prev sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Aug 2014 07:06:34 +0000
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

Looks good. Template code is gone and init blocks have moved to rodata. My simple test program compiles and runs. There is still some __Class in data segment and init values for structs and arrays in bss segment. Is it possible to move these to rodata too?
Iain recently pushed a commit to put zero initializers into bss, so that's intentional:
http://bugzilla.gdcproject.org/show_bug.cgi?id=139
But I understand your point that it should be in rodata instead; you'll have to discuss this with Iain.

Regarding __Class: Can you post a short example?
 
 In my application there will be several large structs. I never 
 create anything of these types. Instead I use them to point to 
 hardware registers and maybe on top of existing byte arrays like 
 message buffers. There will still be initial values for these 
 structs wasting memory. Is there any way to omit them?
 
See https://github.com/D-Programming-GDC/GDC/pull/82
@attribute("noinit")
Aug 16 2014
next sibling parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Saturday, 16 August 2014 at 07:36:07 UTC, Johannes Pfau wrote:

 Iain recently pushed a commit to put zero initializers into 
 bss, so
 that's intentional:
 http://bugzilla.gdcproject.org/show_bug.cgi?id=139
 But I understand your point that it should be in rodata 
 instead, you'll
 have to discuss this with Iain.
It is true that bss does not take up space in the executable. But on small processors, even though there is nowadays plenty of ROM, there is often not enough RAM. It is also a question of safety: in the long run, the data area may be corrupted by a buggy program or by electrical disturbance, while rodata in ROM cannot be changed. At least in my setup, gold maps bss into the executable anyway, while ld does not.

I noticed your comment in the bug report. I was thinking the same: one big block of zeros that is shared by all types. Another idea is that memset could be used for these types, so that there would be no block of zeros at all. But that would require an extra flag in typeinfo to distinguish these types from the others...

 Regarding __Class: Can you post a short example?
Some lines from the map file. There seems to be one for every type in the program:

 .data          0x0000000020001074      0x720 minlibd/libdruntime/libdruntime.a(object_.o)
                0x0000000020001074                _D9Exception7__ClassZ
                0x00000000200010c0                _D8TypeInfo7__ClassZ
                0x000000002000110c                _D17TypeInfo_Function7__ClassZ
                0x0000000020001158                _D17TypeInfo_Delegate7__ClassZ
                0x00000000200011a4                _D14TypeInfo_Class7__ClassZ
                0x00000000200011f0                _D18TypeInfo_Interface7__ClassZ
                0x000000002000123c                _D15TypeInfo_Struct7__ClassZ
                0x0000000020001288                _D16TypeInfo_Typedef7__ClassZ
 
 In my application there will be several large structs. I never 
 create anything of these types. Instead I use them to point to 
 hardware registers and maybe on top of existing byte arrays 
 like message buffers. There will still be initial values for 
 these structs wasting memory. Is there any way to omit them?
 
See https://github.com/D-Programming-GDC/GDC/pull/82 attribute("noinit")
Yes this will solve the problem.
Aug 16 2014
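The usage pattern described in this post, a struct type that is never instantiated and only laid over existing memory, might look roughly like the sketch below. `MsgHeader` and `viewAsHeader` are hypothetical names, and the layout uses only ubyte fields so that alignment and byte order play no role.

```d
// Hypothetical message layout; ubyte-only, so no alignment or endianness issues.
struct MsgHeader
{
    ubyte type;
    ubyte len;
    ubyte[2] payload;
}

/// Reinterpret the start of a buffer as a MsgHeader without copying anything.
MsgHeader* viewAsHeader(ubyte[] buffer)
{
    assert(buffer.length >= MsgHeader.sizeof, "buffer too small");
    return cast(MsgHeader*) buffer.ptr;
}

unittest
{
    ubyte[4] buf = [0x01, 0x02, 0xAA, 0xBB];
    auto h = viewAsHeader(buf[]);

    // Reads come straight from the buffer.
    assert(h.type == 1 && h.len == 2);

    // Writes go straight to the buffer as well.
    h.payload[0] = 0x55;
    assert(buf[2] == 0x55);
}
```

No `MsgHeader` object is ever constructed here, so the type's initializer data is never read. That is the case the "noinit" attribute from the pull request is meant to address.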
parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Aug 2014 08:39:04 +0000
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 
 Regarding __Class: Can you post a short example?
Some lines from the map file. There seems to be one for every type in the program:

 .data          0x0000000020001074      0x720 minlibd/libdruntime/libdruntime.a(object_.o)
                0x0000000020001074                _D9Exception7__ClassZ
                0x00000000200010c0                _D8TypeInfo7__ClassZ
                0x000000002000110c                _D17TypeInfo_Function7__ClassZ
                0x0000000020001158                _D17TypeInfo_Delegate7__ClassZ
                0x00000000200011a4                _D14TypeInfo_Class7__ClassZ
                0x00000000200011f0                _D18TypeInfo_Interface7__ClassZ
                0x000000002000123c                _D15TypeInfo_Struct7__ClassZ
                0x0000000020001288                _D16TypeInfo_Typedef7__ClassZ
I just had a look at this and ClassInfo has a mutable 'monitor' field, so it can't be placed into read-only data.
Aug 16 2014
parent reply "Mike" <none none.com> writes:
On Saturday, 16 August 2014 at 09:29:14 UTC, Johannes Pfau wrote:

 I just had a look at this and ClassInfo has a mutable 'monitor' 
 field,
 so it can't be placed into read-only data.
This was discussed at DConf 2014:
https://www.youtube.com/watch?v=TNvUIWFy02I#t=1008

There is currently a pull request to remove the monitor field from Object, and therefore from all classes:
https://github.com/D-Programming-Language/druntime/pull/789

Mike
Aug 16 2014
parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Aug 2014 10:36:19 +0000
schrieb "Mike" <none none.com>:

This was discussed at DConf 2014. https://www.youtube.com/watch?v=TNvUIWFy02I#t=1008 There is currently a pull request to remove the monitor from object field from object and therefore all classes: https://github.com/D-Programming-Language/druntime/pull/789. Mike
Great! But I think this pull request addresses a different monitor problem: there's an implicit __monitor field in every class right now, which makes every class _instance_ bigger.

But the monitor in TypeInfo/ClassInfo is different: ClassInfo exists only once per class, no matter how many class instances you've got. AFAIR this monitor is there to support synchronized(ClassType), which synchronizes on the class type, not on an instance.
Aug 17 2014
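A small sketch of locking on the class type rather than on an instance, which is the use case mentioned above. `Registry` and its members are hypothetical; the lock object is the single ClassInfo obtained with `typeid`, and the assumption here is that the runtime stores the lock in that object's monitor field, which would explain why the ClassInfo data has to stay writable.

```d
// Hypothetical class; only the synchronized (typeid(...)) idiom matters here.
class Registry
{
    // Shared by all threads; guarded by the per-class lock below.
    private __gshared int entries;

    static void add()
    {
        // typeid(Registry) is the one ClassInfo object for this class.
        synchronized (typeid(Registry))
        {
            ++entries;
        }
    }

    static int count()
    {
        synchronized (typeid(Registry))
        {
            return entries;
        }
    }
}

unittest
{
    immutable before = Registry.count();
    Registry.add();
    Registry.add();
    assert(Registry.count() == before + 2);
}
```

No `Registry` instance is created at all, so the lock cannot live in any instance.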
parent reply "Mike" <none none.com> writes:
On Sunday, 17 August 2014 at 08:26:40 UTC, Johannes Pfau wrote:
 Great! But I think this pull request addresses a different 
 monitor
 problem: There's an implicit __monitor field in every class 
 right now,
 which makes every class _instance_ bigger.

 But the monitor in TypeInfo/ClassInfo is different: ClassInfo 
 exists
 only once per class, it doesn't matter how many class instances 
 you've
 got. AFAIR this monitor is to support synchronize(ClassType) 
 which
 synchronizes on the class type, not on an instance.
I looked through the source code, and couldn't find any such monitor. Can you please point it out for me? Thanks, Mike
Aug 17 2014
parent Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Aug 2014 10:44:34 +0000
schrieb "Mike" <none none.com>:

I looked through the source code, and couldn't find any such monitor. Can you please point it out for me? Thanks, Mike
In gcc/d/d-objfile.cc, search for:

/* Put out the ClassInfo.
 * The layout is:
 *  void **vptr;
 *  monitor_t monitor;
 *  byte[] initializer;         // static initialisation data

Actually I just realized that this is also true for all TypeInfo, so I'll have to revert the commit which placed TypeInfo into .rodata.

(Thinking more about it, it's more or less the same monitor as the one referred to in the pull request: TypeInfo are classes and for every type there is one instance, and these instances have __monitor fields. But the implementation in the compiler is slightly different.)
Aug 17 2014
prev sibling parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/16/14 09:33, Johannes Pfau via D.gnu wrote:
 https://github.com/D-Programming-GDC/GDC/pull/82
[Only noticed this accidentally; using a mailing list instead of some web forum would increase visibility...]
  enum var = Volatile!(T,addr)(): doesn't allow |= on enum literals, even if
the type implements opAssign as there's no this pointer
   T volatile_load(T)(ref T v) nothrow {
      asm { "" : "+m" v; }
      T res = v;
      asm { "" : "+g" res; }
      return res;
   }

   void volatile_store(T)(ref T v, const T a) nothrow {
      asm { "" : : "m" v; }
      v = a;
      asm { "" : "+m" v; }
   }

   struct Volatile(T, alias /* T* */ A) {
      void opOpAssign(string OP)(const T rhs) nothrow {
         auto v = volatile_load(*A);
         mixin("v " ~ OP ~ "= rhs;");
         volatile_store(*A, v);
      }
   }

   enum TimerB = Volatile!(uint, cast(uint*)0xDEADBEEF)();

   void main() {
      TimerB |= 0b1;
      TimerB += 1;
   }
 not emitting force-inlined functions is a logical optimization for forceinline
(if a function is always inlined, there's no way to call it, so there's no need
to output it).
Taking the address of an always_inline function is allowed. artur
Aug 16 2014
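The point about address-of can be illustrated with the portable `pragma(inline, true)` instead of GDC's "forceinline" attribute; that substitution is an assumption made so the sketch compiles with any current D compiler. The function names are hypothetical.

```d
// Requests inlining at every direct call site.
pragma(inline, true)
int twice(int x)
{
    return 2 * x;
}

/// Direct call: a candidate for inlining.
int callDirect(int x)
{
    return twice(x);
}

/// Indirect call through a function pointer: needs a real, emitted body.
int callIndirect(int function(int) fp, int x)
{
    return fp(x);
}

unittest
{
    assert(callDirect(21) == 42);
    assert(callIndirect(&twice, 21) == 42);
}
```

Because `&twice` is taken, the compiler still has to emit an out-of-line copy of the function. Skipping that copy is therefore only safe when no address is taken anywhere.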
next sibling parent reply "Mike" <none none.com> writes:
On Saturday, 16 August 2014 at 09:59:03 UTC, Artur Skawina via 
D.gnu wrote:
 Taking the address of an always_inline function is allowed.
It may be allowed, but it probably shouldn't be. Always-inlining a function and taking the address of that function is contradictory. But this situation demonstrates why having an intelligent linker is a better solution than decorating with attributes. The linker should know if you took an address of an always-inlined function or not and decide whether or not to remove it from the binary. Mike
Aug 16 2014
parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/16/14 12:41, Mike via D.gnu wrote:
 On Saturday, 16 August 2014 at 09:59:03 UTC, Artur Skawina via D.gnu wrote:
 Taking the address of an always_inline function is allowed.
It may be allowed, but it probably shouldn't be. Always-inlining a function and taking the address of that function is contradictory.
Address-of should work -- disallowing it wouldn't help much, but would create problems for code that needs to call the function both directly and indirectly. This is actually a larger problem for D than for C (where it's allowed) because of generic code, templates and delegates. The alternative would be requiring trivial non-inline wrappers, and compile failures if one is accidentally forgotten. A `@nocode` attribute would be a good idea, yes, but there's no need to make it implicit for `@inline`.
 But this situation demonstrates why having an intelligent linker is a better
solution than decorating with attributes.  The linker should know if you took
an address of an always-inlined function or not and decide whether or not to
remove it from the binary.
It already does. Apparently there are some kind of problems with certain setups, but, instead of addressing those problems, more and more /language/ hacks are proposed... artur
Aug 16 2014
next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Aug 2014 13:15:57 +0200
schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:

 On 08/16/14 12:41, Mike via D.gnu wrote:
 On Saturday, 16 August 2014 at 09:59:03 UTC, Artur Skawina via
 D.gnu wrote:
 Taking the address of an always_inline function is allowed.
It may be allowed, but it probably shouldn't be. Always-inlining a function and taking the address of that function is contradictory.
Address-of should work -- disallowing it wouldn't help much, but would create problems for code that needs to call the function both directly and indirectly. This is actually a larger problem for D than for C (where it's allowed) because of generic code, templates and delegates. The alternative would be requiring trivial non-inline wrappers, and compile failures if one is accidentally forgotten. A `@nocode` attribute would be a good idea, yes, but there's no need to make it implicit for `@inline`.
We can make this explicit. I don't care enough to argue about that.
 But this situation demonstrates why having an intelligent linker is
 a better solution than decorating with attributes.  The linker
 should know if you took an address of an always-inlined function or
 not and decide whether or not to remove it from the binary.
It already does. Apparently there are some kind of problems with certain setups, but, instead of addressing those problems, more and more /language/ hacks are proposed... artur
So as you know all these problems and you know exactly how to fix them, where's your contribution?
Aug 17 2014
parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 10:31, Johannes Pfau via D.gnu wrote:
 Am Sat, 16 Aug 2014 13:15:57 +0200
 schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:
 
 It already does. Apparently there are some kind of problems with
 certain setups, but, instead of addressing those problems, more and
 more /language/ hacks are proposed...
 So as you know all these problems and you know exactly how to fix them,
 where's your contribution?
*I* haven't encountered any problems and have been using functions+ data+gc-sections for years... artur
Aug 17 2014
parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Aug 2014 13:38:36 +0200
schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:

 On 08/17/14 10:31, Johannes Pfau via D.gnu wrote:
 Am Sat, 16 Aug 2014 13:15:57 +0200
 schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:
 
 It already does. Apparently there are some kind of problems with
 certain setups, but, instead of addressing those problems, more and
 more /language/ hacks are proposed...
 So as you know all these problems and you know exactly how to fix
 them, where's your contribution?
*I* haven't encountered any problems and have been using functions+ data+gc-sections for years...
Then I don't understand your statement at all. You said 'instead of addressing those problems' but there are no problems? Also what exactly are 'more /language/ hacks'?
Aug 17 2014
parent "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 13:57, Johannes Pfau via D.gnu wrote:
 Am Sun, 17 Aug 2014 13:38:36 +0200
 schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:
 
 On 08/17/14 10:31, Johannes Pfau via D.gnu wrote:
 Am Sat, 16 Aug 2014 13:15:57 +0200
 schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:

 It already does. Apparently there are some kind of problems with
 certain setups, but, instead of addressing those problems, more and
 more /language/ hacks are proposed...
 So as you know all these problems and you know exactly how to fix
 them, where's your contribution?
*I* haven't encountered any problems and have been using functions+ data+gc-sections for years...
Then I don't understand your statement at all. You said 'instead of addressing those problems' but there are no problems?
I don't know - it wasn't me who proposed:

- @attribute("noinit")
- @attribute("notypeinfo")
- @attribute("nocode")
- pragma(GNU_nomoduleinfo)

etc
 Also what exactly are 'more /language/ hacks'?
The above, the volatile attribute etc. Note that I agree that (some of) those are necessary; it's just that they are all useful for certain very specific cases. They are not a general solution to the codegen bloat problem. A situation where practically every declaration and almost every scope in a D program needs to be annotated with compiler-specific, non-portable annotations is not a good one. It is not even a practical one: it is not reasonable to expect everyone to modify the source of every library they use (!) to match the requirements of every project (some may need RTTI, others may not want it at all, etc).

artur
Aug 17 2014
prev sibling parent "Mike" <none none.com> writes:
On Saturday, 16 August 2014 at 11:16:09 UTC, Artur Skawina via
D.gnu wrote:

 A `@nocode` attribute would be a good idea, yes, but there's no 
 need to make it implicit for `@inline`.

 But this situation demonstrates why having an intelligent 
 linker is a better solution than decorating with attributes.  
 The linker should know if you took an address of an 
 always-inlined function or not and decide whether or not to 
 remove it from the binary.
It already does. Apparently there are some kind of problems with certain setups, but, instead of addressing those problems, more and more /language/ hacks are proposed...
Do you mean the problems with --gc-sections breaking code? Mike
Aug 17 2014
prev sibling next sibling parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Saturday, 16 August 2014 at 09:59:03 UTC, Artur Skawina via 
D.gnu wrote:
 On 08/16/14 09:33, Johannes Pfau via D.gnu wrote:
 https://github.com/D-Programming-GDC/GDC/pull/82
[Only noticed this accidentally; using a mailing list instead of some web forum would increase visibility...]
  enum var = Volatile!(T,addr)(): doesn't allow |= on enum 
 literals, even if the type implements opAssign as there's no 
 this pointer
   T volatile_load(T)(ref T v) nothrow {
      asm { "" : "+m" v; }
      T res = v;
      asm { "" : "+g" res; }
      return res;
   }

   void volatile_store(T)(ref T v, const T a) nothrow {
      asm { "" : : "m" v; }
      v = a;
      asm { "" : "+m" v; }
   }

   struct Volatile(T, alias /* T* */ A) {
      void opOpAssign(string OP)(const T rhs) nothrow {
         auto v = volatile_load(*A);
         mixin("v " ~ OP ~ "= rhs;");
         volatile_store(*A, v);
      }
   }

   enum TimerB = Volatile!(uint, cast(uint*)0xDEADBEEF)();

   void main() {
      TimerB |= 0b1;
      TimerB += 1;
   }
 not emitting force-inlined functions is a logical optimization 
 for forceinline (if a function is always inlined, there's no 
 way to call it, so there's no need to output it).
Taking the address of an always_inline function is allowed. artur
This seems to work. I am not so familiar with these opAssign things, so how can I do a basic assignment such as TimerB = 0x1234? How can I use this with struct members? Is it possible to inline volatile_load and volatile_store?
Aug 16 2014
next sibling parent "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/16/14 18:46, Timo Sintonen via D.gnu wrote:
 
 I am not so familiar with these opAssign things, so how I can do basic
assignment: TimerB = 0x1234  ?
 Is it possible to inline volatile_load and volatile_store ?
   version (GNU) {
      static import gcc.attribute;
      enum inline = gcc.attribute.attribute("forceinline");
   }

   extern int volatile_dummy;

   @inline T volatile_load(T)(ref T v) nothrow {
      asm { "" : "+m" v, "+m" volatile_dummy; }
      T res = v;
      asm { "" : "+g" res, "+m" volatile_dummy; }
      return res;
   }

   @inline void volatile_store(T, A)(ref T v, A a) nothrow {
      asm { "" : "+m" volatile_dummy : "m" v; }
      v = a;
      asm { "" : "+m" v, "+m" volatile_dummy; }
   }

   static struct Volatile(T, alias PTR) {
      static: nothrow: @inline:

      void opOpAssign(string OP)(const T rhs) {
         auto v = volatile_load(*PTR);
         mixin("v " ~ OP ~ "= rhs;");
         volatile_store(*PTR, v);
      }
      void opAssign()(const T rhs) { volatile_store(*PTR, rhs); }
      T opUnary(string OP:"*")() { return volatile_load(*PTR); }
   }

   enum TimerB = Volatile!(uint, cast(uint*)0xDEADBEEF)();

   int main() {
      TimerB |= 0b1;
      TimerB += 1;
      TimerB = 42;
      return *TimerB;
   }
 How can I use this with struct members ?
One possibility would be to declare all members as `Volatile!...`, or even create such a struct at CT. Another solution would be something like http://forum.dlang.org/post/mailman.4237.1405540813.2907.digitalmars-d@puremagic.com .

artur
Aug 16 2014
prev sibling parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/16/14 20:40, Artur Skawina wrote:
 How can I use this with struct members ?
One possibility would be to declare all members as `Volatile!...`, or
I did not like that required dereference in the previous version, and tried a different approach:

    struct Timer {
        Volatile!uint control;
        Volatile!uint data;
    }

    enum timerA = cast(Timer*)0xDEADBEAF;

    int main() {
        timerA.control |= 0b1;
        timerA.control += 1;
        timerA.control = 42;
        int a = timerA.data - timerA.data;
        int b = timerA.control;
        return timerA.control;
    }

    version (GNU) {
        static import gcc.attribute;
        enum inline = gcc.attribute.attribute("forceinline");
    }

    extern int volatile_dummy;

    @inline T volatile_load(T)(ref T v) nothrow {
        asm { "" : "+m" v, "+m" volatile_dummy; }
        T res = v;
        asm { "" : "+g" res, "+m" v, "+m" volatile_dummy; }
        return res;
    }

    @inline void volatile_store(T, A)(ref T v, A a) nothrow {
        asm { "" : "+m" volatile_dummy : "m" v; }
        v = a;
        asm { "" : "+m" v, "+m" volatile_dummy; }
    }

    struct Volatile(T) {
        T raw;
        nothrow: @inline:
        @disable this(this);
        void opAssign(A)(A a) { volatile_store(raw, a); }
        T load() @property { return volatile_load(raw); }
        alias load this;
        void opOpAssign(string OP)(const T rhs) {
            auto v = volatile_load(raw);
            mixin("v " ~ OP ~ "= rhs;");
            volatile_store(raw, v);
        }
    }

artur
Aug 16 2014
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Saturday, 16 August 2014 at 20:01:06 UTC, Artur Skawina via
D.gnu wrote:
 On 08/16/14 20:40, Artur Skawina wrote:
 I did not like that required dereference in the previous version, and tried a different approach: [...]
This seems to work. With inlining the code is quite compact.

Not tested yet, but the code for these constructs looks correct:

    for (f=0;f<50;f++) { regs.txreg = somebuf[f] }
    while (regs.status == 0) {}

What is the purpose of volatile_dummy? Even if it is not used, the address for it is calculated in several places.

The struct members are defined separately. This means the address of every member is stored and fetched separately. The compiler seems to remove some of these and use the pointer, but I am not sure what happens when the structs are bigger.

It seems all loads and stores access the real memory, like volatile should do. It is hard to follow the optimized code, so I am not yet sure that they have not been reordered in any way.

Anyway, this seems an acceptable solution to me.

Johannes, is this a good starting point for you, or is your work with compiler builtins giving us some more?
Aug 17 2014
next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Aug 2014 07:57:15 +0000
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Saturday, 16 August 2014 at 20:01:06 UTC, Artur Skawina via
 D.gnu wrote:
 On 08/16/14 20:40, Artur Skawina wrote:
 I did not like that required dereference in the previous version, and tried a different approach: [...]
 This seems to work. With inlining the code is quite compact.

 Not tested yet, but the code for these constructs looks correct:

     for (f=0;f<50;f++) { regs.txreg = somebuf[f] }
     while (regs.status == 0) {}

 What is the purpose of volatile_dummy? Even if it is not used, the address for it is calculated in several places.

 The struct members are defined separately. This means the address of every member is stored and fetched separately. The compiler seems to remove some of these and use the pointer, but I am not sure what happens when the structs are bigger.

 It seems all loads and stores access the real memory, like volatile should do. It is hard to follow the optimized code, so I am not yet sure that they have not been reordered in any way.

 Anyway, this seems an acceptable solution to me.

 Johannes, is this a good starting point for you, or is your work with compiler builtins giving us some more?
You mean __builtin_volatile_load/store? I'm not sure if compiler barriers and these builtins are 100% equal, I think I managed to produce example code where the barriers didn't work 100% as expected. But these builtins will need to be introduced anyway as core.bitop.volatileLoad or whatever final name the DMD devs decide on. Regarding nocode/typeinfo/noinit/GNU_nomoduleinfo: I think these are useful anyway. The linker can strip these out, but I don't want to rely on the linker and on the user to know all special linker flags only to avoid some binary bloat which can be avoided in the compiler. But overall this approach looks fine.
Aug 17 2014
prev sibling next sibling parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Sunday, 17 August 2014 at 07:57:15 UTC, Timo Sintonen wrote:
 This seems to work.
This does not work with member functions:

    struct uartreg {
        Volatile!int sr;
        Volatile!int dr;
        Volatile!int brr;
        Volatile!int cr1;
        Volatile!int cr2;
        Volatile!int cr3;
        Volatile!int gtpr;

        // send a byte to the uart
        void send(int t) {
          while ((sr&0x80)==0)
          {  }
          dr=t;
        }
    }

In this function the fetch of sr is omitted, but the compare is still made against an invalid register value. Then the address of dr is omitted and the store is made from the wrong register to an invalid address. So the generated code is totally invalid. If I move this function out of the struct then it is ok. I use -O2; I have not tested what it would do without optimization.

Also if I have:

    cr1=cr2=0;

I get: expression this.cr2.opAssign(0) is void and has no value
Aug 17 2014
parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 11:24, Timo Sintonen via D.gnu wrote:
 On Sunday, 17 August 2014 at 07:57:15 UTC, Timo Sintonen wrote:
 This seems to work.
 This does not work with member functions:

     struct uartreg {
         Volatile!int sr;
         Volatile!int dr;
         Volatile!int brr;
         Volatile!int cr1;
         Volatile!int cr2;
         Volatile!int cr3;
         Volatile!int gtpr;

         // send a byte to the uart
         void send(int t) {
           while ((sr&0x80)==0)
           {  }
           dr=t;
         }
     }

 In this function the fetch of sr is omitted, but the compare is still made against an invalid register value. Then the address of dr is omitted and the store is made from the wrong register to an invalid address. So the generated code is totally invalid. If I move this function out of the struct then it is ok. I use -O2; I have not tested what it would do without optimization.
It works for me:

    import volat; // module w/ the last Volatile(T) implementation.

    struct uartreg {
        Volatile!int sr;
        Volatile!int dr;
        Volatile!int brr;
        Volatile!int cr1;
        Volatile!int cr2;
        Volatile!int cr3;
        Volatile!int gtpr;

        // send a byte to the uart
        void send(int t) {
          while ((sr&0x80)==0)
          {  }
          dr=t;
        }
    }

    enum uart = cast(uartreg*)0xDEADBEAF;

    void main() {
       uart.send(42);
    }

=>

0000000000403620 <_Dmain>:
  403620:       b8 af be ad de          mov    $0xdeadbeaf,%eax
  403625:       0f 1f 00                nopl   (%rax)
  403628:       b9 af be ad de          mov    $0xdeadbeaf,%ecx
  40362d:       8b 11                   mov    (%rcx),%edx
  40362f:       81 e2 80 00 00 00       and    $0x80,%edx
  403635:       74 f1                   je     403628 <_Dmain+0x8>
  403637:       bf b3 be ad de          mov    $0xdeadbeb3,%edi
  40363c:       31 c0                   xor    %eax,%eax
  40363e:       c7 07 2a 00 00 00       movl   $0x2a,(%rdi)
  403644:       c3                      retq

Except for some obviously missed optimizations (dead eax load, unnecessary ecx reload), the code seems fine. What platform are you using and what does the emitted code look like?
 Also if I have:
 cr1=cr2=0;
 I get: expression this.cr2.opAssign(0) is void and has no value
That's because the opAssign returns void, which prevents this kind of chaining. This was a deliberate choice, as I /wanted/ to disallow that; it's already a bad idea for normal assignments; for volatile ones, which can require a specific order, it's an even worse one. But it's trivial to "fix", just change

    void opAssign(A)(A a) { volatile_store(raw, a); }

to

    T opAssign(A)(A a) { volatile_store(raw, a); return a; }

artur
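
The effect of the return type on chaining can be tried on a hosted system with a minimal sketch. It assumes a plain field store in place of the real volatile_store; the `Reg` type below is illustrative and not part of the thread's code.

```d
import std.stdio;

// Illustrative stand-in for Volatile!T: a plain store replaces volatile_store.
struct Reg(T)
{
    T raw;

    // Returning the assigned value (instead of void) is what permits chaining.
    T opAssign(A)(A a)
    {
        raw = a;
        return a;
    }
}

void main()
{
    Reg!int cr1, cr2;

    // Evaluated right to left: cr1.opAssign(cr2.opAssign(7))
    cr1 = cr2 = 7;
    writeln(cr1.raw, " ", cr2.raw);
}
```

With a `void` return type the inner call has no value to pass on, which matches the "is void and has no value" error quoted above. Note that the chained form stores to cr2 before cr1, which may or may not be the order the hardware expects.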
Aug 17 2014
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Sunday, 17 August 2014 at 11:35:33 UTC, Artur Skawina via 
D.gnu wrote:

 It works for me:

    import volat; // module w/ the last Volatile(T) 
 implementation.

    struct uartreg {
        Volatile!int sr;
        Volatile!int dr;
        Volatile!int brr;
        Volatile!int cr1;
        Volatile!int cr2;
        Volatile!int cr3;
        Volatile!int gtpr;

        // send a byte to the uart
        void send(int t) {
          while ((sr&0x80)==0)
          {  }
          dr=t;
        }
    }

    enum uart = cast(uartreg*)0xDEADBEAF;

    void main() {
       uart.send(42);
    }

 =>

 0000000000403620 <_Dmain>:
   403620:       b8 af be ad de          mov    $0xdeadbeaf,%eax
   403625:       0f 1f 00                nopl   (%rax)
   403628:       b9 af be ad de          mov    $0xdeadbeaf,%ecx
   40362d:       8b 11                   mov    (%rcx),%edx
   40362f:       81 e2 80 00 00 00       and    $0x80,%edx
   403635:       74 f1                   je     403628 
 <_Dmain+0x8>
   403637:       bf b3 be ad de          mov    $0xdeadbeb3,%edi
   40363c:       31 c0                   xor    %eax,%eax
   40363e:       c7 07 2a 00 00 00       movl   $0x2a,(%rdi)
   403644:       c3                      retq

 Except for some obviously missed optimizations (dead eax load,
 unnecessary ecx reload), the code seems fine. What platform
 are you using and what does the emitted code look like?

 Also if I have:
 cr1=cr2=0;
 I get: expression this.cr2.opAssign(0) is void and has no value
 That's because the opAssign returns void, which prevents this kind of chaining. This was a deliberate choice, as I /wanted/ to disallow that; it's already a bad idea for normal assignments; for volatile ones, which can require a specific order, it's an even worse one. But it's trivial to "fix", just change

     void opAssign(A)(A a) { volatile_store(raw, a); }

 to

     T opAssign(A)(A a) { volatile_store(raw, a); return a; }

 artur
I am compiling for ARM, and I am sorry, I misinterpreted the optimized code. Actually the code is correct but it still does not work.

The problem is that the call to get the TLS pointer for volatile_dummy seems to corrupt the register (r3) where the this pointer is. The call is inside the while loop. After removing that call by hand in the assembly everything works. R3 is usually pushed onto the stack when it is used in a function. I have to check what is wrong in this case.
Aug 17 2014
parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 15:44, Timo Sintonen via D.gnu wrote:

 I am compiling for arm and I am sorry I misinterpreted the optimized code.
Actually the code is correct but it still does not work.
 The problem is that the call to get the tls pointer for volatile_dummy seems
to corrupt the register (r3) where the this pointer is. The call is inside the
while loop.  After removing that call by hand in the assembly everything works.
R3 is usually pushed into stack when it is used in a function. I have to check
what is wrong in this case.
Does declaring it as:

    extern __gshared int volatile_dummy;

help?

artur
Aug 17 2014
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Sunday, 17 August 2014 at 13:59:03 UTC, Artur Skawina via 
D.gnu wrote:
 On 08/17/14 15:44, Timo Sintonen via D.gnu wrote:

 I am compiling for arm and I am sorry I misinterpreted the 
 optimized code. Actually the code is correct but it still does 
 not work.
 The problem is that the call to get the tls pointer for 
 volatile_dummy seems to corrupt the register (r3) where the 
 this pointer is. The call is inside the while loop.  After 
 removing that call by hand in the assembly everything works. R3 
 is usually pushed into stack when it is used in a function. I 
 have to check what is wrong in this case.
Does declaring it as: extern __gshared int volatile_dummy; help? artur
Yes, now it works.

But the register corruption is still an issue. My TLS function clearly uses r3 and does not save it.

Johannes, do you know the ARM calling convention? Is it the caller or the callee that should save r3?

In this case it is my function that has one function inlined that has another function inlined that contains a compiler-generated function call. Could this be a bug in the compiler, in that it does not recognize the innermost call and does not save registers?
Aug 17 2014
parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Aug 2014 14:36:53 +0000
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Sunday, 17 August 2014 at 13:59:03 UTC, Artur Skawina via 
 D.gnu wrote:
 On 08/17/14 15:44, Timo Sintonen via D.gnu wrote:

 I am compiling for arm and I am sorry I misinterpreted the 
 optimized code. Actually the code is correct but it still does 
 not work.
 The problem is that the call to get the tls pointer for 
 volatile_dummy seems to corrupt the register (r3) where the 
 this pointer is. The call is inside the while loop.  After 
 removing that call by hand in the assembly everything works. R3 
 is usually pushed into stack when it is used in a function. I 
 have to check what is wrong in this case.
Does declaring it as: extern __gshared int volatile_dummy; help? artur
Yes, now it works. But the register corruption is still an issue. My tls function clearly uses r3 and does not save it. Johannes, do you know the arm calling system? Is it caller or callee that should save r3? In this case it is my function that has one function inlined that has another function inlined that contains a compiler generated function call. Could this be a bug in the compiler that it does not recognize the innermost call and does not save registers?
r3 is an argument/scratch register, the callee can't rely on its contents after a function call. This could also be caused by the inline ASM.
Aug 17 2014
next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Aug 2014 16:45:15 +0200
schrieb Johannes Pfau <nospam example.com>:

 the callee can't rely on its
caller of course ;-)
Aug 17 2014
prev sibling next sibling parent "Timo Sintonen" <t.sintonen luukku.com> writes:
On Sunday, 17 August 2014 at 14:47:57 UTC, Johannes Pfau wrote:
 Am Sun, 17 Aug 2014 14:36:53 +0000
 schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Sunday, 17 August 2014 at 13:59:03 UTC, Artur Skawina via 
 D.gnu wrote:
 On 08/17/14 15:44, Timo Sintonen via D.gnu wrote:

 I am compiling for arm and I am sorry I misinterpreted the 
 optimized code. Actually the code is correct but it still 
 does not work.
 The problem is that the call to get the tls pointer for 
 volatile_dummy seems to corrupt the register (r3) where the 
 this pointer is. The call is inside the while loop.  After 
 removing that call by hand in the assembly everything works. 
 R3 is usually pushed into stack when it is used in a 
 function. I have to check what is wrong in this case.
Does declaring it as: extern __gshared int volatile_dummy; help? artur
Yes, now it works. But the register corruption is still an issue. My tls function clearly uses r3 and does not save it. Johannes, do you know the arm calling system? Is it caller or callee that should save r3? In this case it is my function that has one function inlined that has another function inlined that contains a compiler generated function call. Could this be a bug in the compiler that it does not recognize the innermost call and does not save registers?
r3 is an argument/scratch register, the callee can't rely on its contents after a function call. This could also be caused by the inline ASM.
So is this a bug or just undefined behavior?
Aug 17 2014
prev sibling parent "Timo Sintonen" <t.sintonen luukku.com> writes:
On Sunday, 17 August 2014 at 14:47:57 UTC, Johannes Pfau wrote:
 Am Sun, 17 Aug 2014 14:36:53 +0000
 schrieb "Timo Sintonen" <t.sintonen luukku.com>:
 
 But the register corruption is still an issue. My tls function 
 clearly uses r3 and does not save it.
 
 Johannes, do you know the arm calling system? Is it caller or 
 callee that should save r3?
 In this case it is my function that has one function inlined 
 that has another function inlined that contains a compiler 
 generated function call. Could this be a bug in the compiler 
 that it does not recognize the innermost call and does not 
 save registers?
r3 is an argument/scratch register, the caller can't rely on its contents after a function call. This could also be caused by the inline ASM.
I have had some weird bugs lately, so I looked at my other object files. I think there is a bug because I found more like this:

This is a class function (actually a constructor) that writes constant values into two variables; one is a static class variable in TLS and the other is an instance variable.

  27 0000 10B5          push {r4, lr}
  28 0002 0346          mov  r3, r0
  29 0004 FFF7FEFF      bl   __aeabi_read_tp   @ load_tp_soft
  30 0008 034A          ldr  r2, .L3
  33 0010 1150          str  r1, [r2, r0]
  35 0014 1846          mov  r0, r3
  36 0016 10BD          pop  {r4, pc}
  37                  .L4:
  38                    .align 2
  39                  .L3:
  40 0018 00000000      .word .LANCHOR0(tpoff)
  41

In line 28 the this pointer is saved to r3, then the call in line 29 returns the TLS start address in r0. __aeabi_read_tp uses r3 to fetch the address, so r3 is corrupted. R3 is used in line 34 to store to address this+8, and then r3 is moved back to r0, returning an incorrect value for this.

Is this a gdc or gcc bug?
Aug 18 2014
prev sibling parent "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 09:57, Timo Sintonen via D.gnu wrote:

 What is the purpose of volatile_dummy? Even if it is not used,
Ensuring ordering, w/o it the compiler could reorder operations on different volatile objects. (Which isn't necessarily a bad thing, but people expect certain semantics of 'volatile', so it would be a bad and dangerous default)
 the address for it is calculated in several places.
It's completely optimized away for me (I'm testing on x86). Can you show the emitted code?
 The struct members are defined saparately. This means the address
 of every member is stored and fetched separately. The compiler
 seems to remove some of these and use the pointer, but I am not
 sure what happens when the structs are bigger.
Yes, the compiler does not generate optimal code, but so far I've only seen dead immediate-constant->register loads; so it's not a huge problem.

artur
Aug 17 2014
prev sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Aug 2014 11:58:49 +0200
schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:

 On 08/16/14 09:33, Johannes Pfau via D.gnu wrote:
 https://github.com/D-Programming-GDC/GDC/pull/82
[Only noticed this accidentally; using a mailing list instead of some web forum would increase visibility...]
  enum var = Volatile!(T,addr)(): doesn't allow |= on enum literals,
 even if the type implements opAssign as there's no this pointer
     T volatile_load(T)(ref T v) nothrow {
         asm { "" : "+m" v; }
         T res = v;
         asm { "" : "+g" res; }
         return res;
     }

     void volatile_store(T)(ref T v, const T a) nothrow {
         asm { "" : : "m" v; }
         v = a;
         asm { "" : "+m" v; }
     }

     struct Volatile(T, alias /* T* */ A) {
         void opOpAssign(string OP)(const T rhs) nothrow {
             auto v = volatile_load(*A);
             mixin("v " ~ OP ~ "= rhs;");
             volatile_store(*A, v);
         }
     }

     enum TimerB = Volatile!(uint, cast(uint*)0xDEADBEEF)();

     void main() {
         TimerB |= 0b1;
         TimerB += 1;
     }
That's a good start. Can you also get unary operators working? e.g TimerB++; Do you think it's possible to combine this with the other solution you posted for struct fields? Or do we need separate Volatile!T and VolatileField!T types?
Aug 17 2014
parent reply "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 10:49, Johannes Pfau via D.gnu wrote:

 That's a good start. Can you also get unary operators working?
 e.g
 TimerB++;
Unary ops are easy. If you mean post-inc and post-dec -- that's a language problem. At least for volatile, they will cause a compile error; for atomic ops the naive `post-op->tmp-load+op+tmp` rewrite can introduce bugs... D would need to make the post-ops overloadable to get rid of these issues.
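
The post-increment issue can be probed with a small hosted sketch. It assumes that the language rewrites `c++` into "copy the old value, pre-increment, yield the copy", so a type with a disabled postblit is expected to reject it; the `Counter` type is illustrative and not from the thread, and the program prints what the compiler actually decides rather than asserting it.

```d
import std.stdio;

// Illustrative stand-in for Volatile!T; `Counter` is not from the thread.
struct Counter
{
    int raw;

    @disable this(this);   // copying is disabled, as in Volatile!T

    // Pre-increment only: operates on the object in place.
    int opUnary(string op : "++")()
    {
        return ++raw;
    }
}

void main()
{
    Counter c;
    ++c;   // calls c.opUnary!"++"()

    // c++ asks the compiler to keep a copy of the old value of c,
    // which the disabled postblit is expected to reject.
    enum postIncCompiles = __traits(compiles, c++);
    writeln("c++ compiles: ", postIncCompiles);
    writeln("raw after ++c: ", c.raw);
}
```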
 Do you think it's possible to combine this with the other solution you
 posted for struct fields? Or do we need separate Volatile!T and
 VolatileField!T types?
Right now, I'd prefer this approach:

--------------------------------------------------------------
    module volat;

    version (GNU) {
        static import gcc.attribute;
        enum inline = gcc.attribute.attribute("forceinline");
    }

    extern int volatile_dummy;

    @inline T volatile_load(T)(ref T v) nothrow {
        asm { "" : "+m" v, "+m" volatile_dummy; }
        T res = v;
        asm { "" : "+g" res, "+m" v, "+m" volatile_dummy; }
        return res;
    }

    @inline void volatile_store(T, A)(ref T v, A a) nothrow {
        asm { "" : "+m" volatile_dummy : "m" v; }
        v = a;
        asm { "" : "+m" v, "+m" volatile_dummy; }
    }

    @inline void volatile_barrier(T)(ref T v) nothrow {
        asm { "" : "+m" v, "+m" volatile_dummy; }
    }

    struct Volatile(T) {
        T raw;
        nothrow: @inline:
        @disable this(this);
        void opAssign(A)(A a) { volatile_store(raw, a); }
        T load() @property { return volatile_load(raw); }
        alias load this;
        void opOpAssign(string OP)(const T b) {
            volatile_barrier(raw);
            mixin("raw " ~ OP ~ "= b;");
            volatile_barrier(raw);
        }
        T opUnary(string OP)() {
            volatile_barrier(raw);
            auto result = mixin(OP ~ "raw");
            volatile_barrier(raw);
            return result;
        }
    }
--------------------------------------------------------------
    import volat;

    struct Timer {
        Volatile!uint control;
        Volatile!uint data;
    }

    enum timerA = cast(Timer*)0xDEADBEAF;

    int main() {
        timerA.control |= 0b1;
        timerA.control += 1;
        timerA.control = 42;
        int a = timerA.data - timerA.data;
        int b = ++timerA.control;
        --timerA.data;
        timerA.control /= 2;
        return b;
    }
--------------------------------------------------------------

compiles to:

--------------------------------------------------------------
0000000000403620 <_Dmain>:
  403620:       ba af be ad de          mov    $0xdeadbeaf,%edx
  403625:       b9 b3 be ad de          mov    $0xdeadbeb3,%ecx
  40362a:       83 0a 01                orl    $0x1,(%rdx)
  40362d:       83 02 01                addl   $0x1,(%rdx)
  403630:       c7 02 2a 00 00 00       movl   $0x2a,(%rdx)
  403636:       8b 42 04                mov    0x4(%rdx),%eax
  403639:       8b 72 04                mov    0x4(%rdx),%esi
  40363c:       8b 02                   mov    (%rdx),%eax
  40363e:       83 c0 01                add    $0x1,%eax
  403641:       89 02                   mov    %eax,(%rdx)
  403643:       83 6a 04 01             subl   $0x1,0x4(%rdx)
  403647:       d1 2a                   shrl   (%rdx)
  403649:       c3                      retq
--------------------------------------------------------------

Do you see any problems with it? (Other than gcc not removing that dead constant load)

[The struct-with-volatile-fields can be built from a "normal" struct at CT. But that's just syntax sugar.]

artur
Aug 17 2014
parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Aug 2014 15:15:12 +0200
schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:

 Do you see any problems with it? (Other than gcc not removing
 that dead constant load)
It's perfect for structs, but when simply declaring a Volatile!uint the pointer dereference must be done manually, right?

----
enum TimerB = cast(Volatile!(uint)*)0xDEADBEEF;
*TimerB |= 0b1;
----

I don't think that's a huge problem though, just a little bit inconvenient.
Aug 17 2014
parent "Artur Skawina via D.gnu" <d.gnu puremagic.com> writes:
On 08/17/14 16:16, Johannes Pfau via D.gnu wrote:
 Am Sun, 17 Aug 2014 15:15:12 +0200
 schrieb "Artur Skawina via D.gnu" <d.gnu puremagic.com>:
 
 Do you see any problems with it? (Other than gcc not removing
 that dead constant load)
It's perfect for structs, but when simply declaring a Volatile!uint the pointer dereference must be done manually, right? ---- enum TimerB = cast(Volatile!(uint)*)0xDEADBEEF; *TimerB |= 0b1; ---- I don't think that a huge problem though, just a little bit inconvenient.
Another D-problem - the language doesn't have /real/ refs. But...

    import volat;

    @inline ref @property timerA() { return *cast(Volatile!uint*)0xDEADBEAF; }

    int main() {
        timerA |= 0b1;
        timerA += 1;
        timerA = 42;
        int a = timerA - timerA;
        int b = ++timerA;
        --timerA;
        timerA /= 2;
        return b;
    }

=>

0000000000403620 <_Dmain>:
  403620:       ba af be ad de          mov    $0xdeadbeaf,%edx
  403625:       83 0a 01                orl    $0x1,(%rdx)
  403628:       83 02 01                addl   $0x1,(%rdx)
  40362b:       c7 02 2a 00 00 00       movl   $0x2a,(%rdx)
  403631:       8b 02                   mov    (%rdx),%eax
  403633:       8b 0a                   mov    (%rdx),%ecx
  403635:       8b 02                   mov    (%rdx),%eax
  403637:       83 c0 01                add    $0x1,%eax
  40363a:       89 02                   mov    %eax,(%rdx)
  40363c:       83 2a 01                subl   $0x1,(%rdx)
  40363f:       d1 2a                   shrl   (%rdx)
  403641:       c3                      retq

artur
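
The ref-returning property trick can be tried without hardware using a minimal sketch. It assumes an ordinary global in place of the memory-mapped address and plain stores in place of the volatile helpers; `Reg` and `fakeTimer` are illustrative names, not from the thread.

```d
// Illustrative register type; `Reg` and `fakeTimer` are not from the thread.
struct Reg
{
    uint raw;

    void opOpAssign(string op)(uint rhs)
    {
        mixin("raw " ~ op ~ "= rhs;");
    }
}

// An ordinary global stands in for the memory-mapped address so the
// sketch can run on a hosted system.
__gshared Reg fakeTimer;

// Returning by ref lets callers write `timerA |= 1` without a dereference.
@property ref Reg timerA()
{
    return fakeTimer;
}

void main()
{
    timerA |= 0b1;
    timerA += 1;
    assert(fakeTimer.raw == 2);
}
```

Because the accessor returns a reference rather than a copy, each operator call acts on the underlying object, which is what makes the explicit `*TimerB` dereference unnecessary.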
Aug 17 2014