
digitalmars.D - ARM bare-metal programming in D (cont) - volatile

reply "Mike" <none none.com> writes:
Hello again,

I'm interested in ARM bare-metal programming with D, and I'm 
trying to get my head wrapped around how to approach this.  I'm 
making progress, but I found something that was surprising to me: 
deprecation of the volatile keyword.

In the bare-metal/hardware/driver world, this keyword is 
important to ensure the optimizer doesn't cache reads to 
memory-mapped IO, as some hardware peripheral may modify the 
value without involving the processor.

I've read a few discussions on the D forums about the volatile 
keyword debate, but no one seemed to reconcile the need for 
volatile in memory-mapped IO.  Was this an oversight?

What's D's answer to this?  If one were to use D to read from 
memory-mapped IO, how would one ensure the compiler doesn't cache 
the value?
Oct 23 2013
next sibling parent "Øivind" <oivind.loe gmail.com> writes:
On Thursday, 24 October 2013 at 00:43:11 UTC, Mike wrote:
 Hello again,

 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get my head wrapped around how to approach this.  I'm 
 making progress, but I found something that was surprising to 
 me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to ensure the optimizer doesn't cache reads to 
 memory-mapped IO, as some hardware peripheral may modify the 
 value without involving the processor.

 I've read a few discussions on the D forums about the volatile 
 keyword debate, but no one seemed to reconcile the need for 
 volatile in memory-mapped IO.  Was this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO, how would one ensure the compiler doesn't 
 cache the value?
+1
Oct 23 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm trying to get my
 head wrapped around how to approach this.  I'm making progress, but I found
 something that was surprising to me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is important to ensure
the
 optimizer doesn't cache reads to memory-mapped IO, as some hardware peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile keyword debate,
 but no one seemed to reconcile the need for volatile in memory-mapped IO.  Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped 
I/O. The correct and guaranteed way to make this work is to write 
two "peek" and "poke" functions to read/write a particular memory 
address:

    int peek(int* p);
    void poke(int* p, int value);

Implement them in the obvious way, and compile them separately so 
the optimizer will not try to inline/optimize them.
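For concreteness, here is a minimal C sketch of the peek/poke idea Walter describes (C rather than D, since the thread measures everything against C's volatile; the bodies below are one reading of "the obvious way", not his exact code). Routing the access through a volatile-qualified pointer means each call performs a real load or store; compiling the translation unit separately, without cross-module inlining, is what keeps the optimizer from folding calls away.

```c
#include <stdint.h>

/* Sketch of peek/poke: the dereference goes through a
 * volatile-qualified pointer, so every call performs an actual
 * memory access rather than reusing a cached value. */
int peek(int *p)
{
    return *(volatile int *)p;
}

void poke(int *p, int value)
{
    *(volatile int *)p = value;
}
```

In a real driver, `p` would be a fixed MMIO address such as `(int *)0x40020000` (hypothetical); an ordinary variable stands in for it when testing on a hosted system.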
Oct 23 2013
next sibling parent reply "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 05:37:49 UTC, Walter Bright wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get my
 head wrapped around how to approach this.  I'm making 
 progress, but I found
 something that was surprising to me: deprecation of the 
 volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some 
 hardware peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile 
 keyword debate,
 but no one seemed to reconcile the need for volatile in 
 memory-mapped IO.  Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
 volatile was never a reliable method for dealing with memory 
 mapped I/O. The correct and guaranteed way to make this work is 
 to write two "peek" and "poke" functions to read/write a 
 particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately 
 so the optimizer will not try to inline/optimize them.
Thanks for the answer, Walter. I think this would be acceptable 
in many (most?) cases, but not where high performance is needed. 
I think these functions add too much overhead if they are not 
inlined and are in a critical path (bit-banging IO, for example). 
After all, a read/write to a volatile address is a single atomic 
instruction, if done properly.

Is there a way to tell D to remove the function overhead, for 
example, like a "naked" attribute, yet still retain the 
"volatile" behavior?
Oct 23 2013
next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 07:19, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 05:37:49 UTC, Walter Bright wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm trying to
 get my
 head wrapped around how to approach this.  I'm making progress, but I
 found
 something that was surprising to me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile keyword
 debate,
 but no one seemed to reconcile the need for volatile in memory-mapped IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
 volatile was never a reliable method for dealing with memory 
 mapped I/O. The correct and guaranteed way to make this work is 
 to write two "peek" and "poke" functions to read/write a 
 particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately 
 so the optimizer will not try to inline/optimize them.
 Thanks for the answer, Walter. I think this would be acceptable 
 in many (most?) cases, but not where high performance is needed. 
 I think these functions add too much overhead if they are not 
 inlined and are in a critical path (bit-banging IO, for 
 example). After all, a read/write to a volatile address is a 
 single atomic instruction, if done properly.
Operations on volatile are *not* atomic. Nor do they establish a 
proper happens-before relationship for threading.  This is why we 
have core.atomic as a portable synchronisation mechanism in D.

Regards
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
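Iain's distinction can be made concrete in C11 terms (using C's `<stdatomic.h>` here as a stand-in for D's core.atomic): a volatile access is just an unoptimized load or store, while an atomic read-modify-write is a single indivisible operation that also participates in happens-before ordering.

```c
#include <stdatomic.h>

volatile int vreg;    /* volatile: real accesses, but no atomicity */
atomic_int acounter;  /* atomic: indivisible read-modify-write */

void unsafe_increment(void)
{
    vreg = vreg + 1;  /* separate load and store; a concurrent
                         writer can slip in between them */
}

void safe_increment(void)
{
    atomic_fetch_add(&acounter, 1);  /* one indivisible RMW that
                                        also orders memory */
}
```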
Oct 23 2013
parent "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 06:40:14 UTC, Iain Buclaw wrote:
 volatile was never a reliable method for dealing with memory 
 mapped I/O.
 The correct and guaranteed way to make this work is to write 
 two "peek" and
 "poke" functions to read/write a particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them 
 separately so the
 optimizer will not try to inline/optimize them.
 Thanks for the answer, Walter. I think this would be acceptable 
 in many (most?) cases, but not where high performance is needed. 
 I think these functions add too much overhead if they are not 
 inlined and are in a critical path (bit-banging IO, for 
 example). After all, a read/write to a volatile address is a 
 single atomic instruction, if done properly.
 Operations on volatile are *not* atomic. Nor do they establish 
 a proper happens-before relationship for threading.  This is 
 why we have core.atomic as a portable synchronisation mechanism 
 in D.

 Regards
I probably shouldn't have used the word "operations". What I 
meant is that reading/writing a volatile, aligned word in memory 
is an atomic operation. At least on my target platform it is; 
that may not be a correct generalization, however.

The point I'm trying to make is that the peek/poke function 
proposal adds function-call overhead compared to the "volatile" 
method in C, and I just want to know if there's a way to 
eliminate or reduce it.
Oct 23 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/23/2013 11:19 PM, Mike wrote:
 Thanks for the answer, Walter. I think this would be acceptable in many (most?)
 cases, but not where high performance is needed I think these functions add too
 much overhead if they are not inlined and in a critical path (bit-banging IO,
 for example). Afterall, a read/write to a volatile address is a single atomic
 instruction, if done properly.

 Is there a way to tell D to remove the function overhead, for example, like a
 "naked" attribute, yet still retain the "volatile" behavior?
You have to give up on volatile. Nobody agrees on what it means. 
What does "don't optimize" mean? And that's not at all the same 
thing as "atomic".

I wouldn't worry about peek/poke being too slow unless you 
actually benchmark it and prove it is. Then, your alternatives 
are:

1. Write it in ordinary D, compile it, check the code generated, 
and if it is what you want, you're golden (at least for that 
compiler & switches).

2. Write it in inline asm. That's what it's for.

3. Write it in an external C function and link it in.
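Option 2 can be sketched as follows, assuming x86-64 and GCC-style extended asm (on ARM the instruction would be LDR rather than MOV, so this is illustrative, not portable). The `volatile` qualifier on the asm statement and the `"memory"` clobber keep the compiler from caching, reordering, or deleting the access, while the function itself remains inlinable, addressing Mike's overhead concern.

```c
/* A "peek" built from two lines of inline asm: the compiler must
 * emit this load every time, yet the wrapper can still be inlined,
 * so there is no function-call overhead. */
static inline int peek_asm(const int *p)
{
    int value;
    __asm__ volatile ("movl (%1), %0"
                      : "=r" (value)   /* output: any register */
                      : "r" (p)        /* input: pointer in a register */
                      : "memory");     /* don't cache across this */
    return value;
}
```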
Oct 23 2013
next sibling parent "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
 Thanks for the answer, Walter. I think this would be 
 acceptable in many (most?)
 cases, but not where high performance is needed I think these 
 functions add too
 much overhead if they are not inlined and in a critical path 
 (bit-banging IO,
 for example). Afterall, a read/write to a volatile address is 
 a single atomic
 instruction, if done properly.

 Is there a way to tell D to remove the function overhead, for 
 example, like a
 "naked" attribute, yet still retain the "volatile" behavior?
 You have to give up on volatile. Nobody agrees on what it 
 means. What does "don't optimize" mean? And that's not at all 
 the same thing as "atomic".

 I wouldn't worry about peek/poke being too slow unless you 
 actually benchmark it and prove it is. Then, your alternatives 
 are:

 1. Write it in ordinary D, compile it, check the code 
 generated, and if it is what you want, you're golden (at least 
 for that compiler & switches).

 2. Write it in inline asm. That's what it's for.

 3. Write it in an external C function and link it in.
Well, I wasn't rooting for volatile, I just wanted a way to 
read/write my IO registers as fast as possible with D. I think 
the last two methods you've given confirm my suspicions and will 
work. But... I had my heart set on doing it all in D :-(

Thanks for the answers.
Oct 23 2013
prev sibling parent reply "eles" <eles eles.com> writes:
On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
 Thanks for the answer, Walter. I think this would be 
 acceptable in many (most?)
 cases, but not where high performance is needed I think these 
 functions add too
 much overhead if they are not inlined and in a critical path 
 (bit-banging IO,
 for example). Afterall, a read/write to a volatile address is 
 a single atomic
 instruction, if done properly.

 Is there a way to tell D to remove the function overhead, for 
 example, like a
 "naked" attribute, yet still retain the "volatile" behavior?
You have to give up on volatile. Nobody agrees on what it means. What does "don't optimize" mean? And that's not at all the same thing as "atomic".
It's not about "atomize me", it's about "really *read* me" or 
"really *write* me" at that memory location: don't fake it, don't 
cache me. And do it now, not 10 seconds later.
Oct 24 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/24/2013 4:18 AM, eles wrote:
 On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
 Thanks for the answer, Walter. I think this would be acceptable in many (most?)
 cases, but not where high performance is needed I think these functions add too
 much overhead if they are not inlined and in a critical path (bit-banging IO,
 for example). Afterall, a read/write to a volatile address is a single atomic
 instruction, if done properly.

 Is there a way to tell D to remove the function overhead, for example, like a
 "naked" attribute, yet still retain the "volatile" behavior?
You have to give up on volatile. Nobody agrees on what it means. What does "don't optimize" mean? And that's not at all the same thing as "atomic".
 It's not about "atomize me", it's about "really *read* me" or 
 "really *write* me" at that memory location: don't fake it, 
 don't cache me. And do it now, not 10 seconds later.
Like I said, nobody (on the standards committees) could agree on exactly what that meant.
Oct 24 2013
parent reply "eles" <eles eles.com> writes:
On Thursday, 24 October 2013 at 17:02:51 UTC, Walter Bright wrote:
 On 10/24/2013 4:18 AM, eles wrote:
 On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright 
 wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
Like I said, nobody (on the standards committees) could agree on exactly what that meant.
The standards committees might not agree, but there is somebody 
out there that knows very accurately what that should mean: that 
somebody is the hardware itself.

Just imagine the best hardware example that you have at hand: the 
microprocessor that you are programming for. It writes on the 
bus, there is a short delay before the signals are guaranteed to 
reach the correct levels, then it reads the memory data and so 
on. You cannot read the data before the delay passes. You cannot 
say "well, I could postpone writing the address on the bus, let's 
read the memory location first", or you would read garbage. Nor 
can you say "well, first I will execute the program without a 
processor, then, when the user is already pissed off, I will 
finally execute all those instructions at once". Too bad that the 
computer is already flying through the window by that time.

You command that processor from the compiler. Now, the thing 
that's needed is a way to do the same (i.e. commanding hardware) 
from the program compiled by the compiler.
Oct 24 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/24/2013 11:33 AM, eles wrote:
 On Thursday, 24 October 2013 at 17:02:51 UTC, Walter Bright wrote:
 On 10/24/2013 4:18 AM, eles wrote:
 On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
Like I said, nobody (on the standards committees) could agree on exactly what that meant.
 The standard committees might not agree, but there is somebody 
 out there that really knows very accurately what that should 
 mean: that somebody is the hardware itself.

 Just imagine the best hardware example that you have at hand: 
 the microprocessor that you are programming for. It writes on 
 the bus, there is a short delay before the signals are 
 guaranteed to reach the correct levels, then reads the memory 
 data and so on. You cannot read the data before the delay 
 passes. You cannot say "well, I could postpone the writing on 
 the address on the bus, let's read the memory location first", 
 or you would read garbage. Or you cannot say: well, first I 
 will execute the program without a processor then, when the 
 user is already pissed off, I would finally execute all those 
 instructions at once. Too bad that the computer is already 
 flying through the window at that time.

 You command that processor from the compiler. Now, the thing 
 that's needed is to give a way to do the same (ie commanding a 
 hardware) from the program compiled by the compiler.
The trouble with that is since the standards people cannot agree on what volatile means, you're working with a compiler that has non-standard behavior. This is not portable and not reliable.
Oct 24 2013
parent reply "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 19:11:03 UTC, Walter Bright wrote:
 On 10/24/2013 11:33 AM, eles wrote:
 On Thursday, 24 October 2013 at 17:02:51 UTC, Walter Bright 
 wrote:
 On 10/24/2013 4:18 AM, eles wrote:
 On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright 
 wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
Like I said, nobody (on the standards committees) could agree on exactly what that meant.
 The standard committees might not agree, but there is somebody 
 out there that really knows very accurately what that should 
 mean: that somebody is the hardware itself.

 Just imagine the best hardware example that you have at hand: 
 the microprocessor that you are programming for. It writes on 
 the bus, there is a short delay before the signals are 
 guaranteed to reach the correct levels, then reads the memory 
 data and so on. You cannot read the data before the delay 
 passes. You cannot say "well, I could postpone the writing on 
 the address on the bus, let's read the memory location first", 
 or you would read garbage. Or you cannot say: well, first I 
 will execute the program without a processor then, when the 
 user is already pissed off, I would finally execute all those 
 instructions at once. Too bad that the computer is already 
 flying through the window at that time.

 You command that processor from the compiler. Now, the thing 
 that's needed is to give a way to do the same (ie commanding a 
 hardware) from the program compiled by the compiler.
The trouble with that is since the standards people cannot agree on what volatile means, you're working with a compiler that has non-standard behavior. This is not portable and not reliable.
There should be some way, in the D language, to tell the compiler 
"Do exactly what I say here, and don't try to be clever about it" 
without introducing unnecessary (and unfortunate) overhead. It 
doesn't have to be /volatile/.

/shared/ may be the solution here, but based on a comment by Iain 
Buclaw 
(http://forum.dlang.org/post/mailman.2454.1382619958.1719.digitalmars-d puremagic.com) 
it seems there could be some disagreement on what this means to 
compiler implementers. I don't see why "shared" could not mean 
not only "shared by more than one thread/cpu", but also "shared 
by external hardware peripherals". Maybe /shared/'s definition 
needs to be further refined to ensure all compilers implement it 
the same way, and be unambiguous enough to provide a solution to 
this /volatile/ debate.

Using peek and poke functions is, well, nah... better methods 
exist. Using inline assembly is a reasonable alternative, as is 
linking to an external C library, but why use D then? Is 
low-level/embedded software development a design goal of the D 
language?
Oct 24 2013
next sibling parent "eles" <eles eles.com> writes:
On Friday, 25 October 2013 at 04:30:37 UTC, Mike wrote:
 On Thursday, 24 October 2013 at 19:11:03 UTC, Walter Bright 
 wrote:
 On 10/24/2013 11:33 AM, eles wrote:
 On Thursday, 24 October 2013 at 17:02:51 UTC, Walter Bright 
 wrote:
 On 10/24/2013 4:18 AM, eles wrote:
 On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright 
 wrote:
 On 10/23/2013 11:19 PM, Mike wrote:
Maybe /shared/'s definition needs to be further defined to ensure all compilers implement it the same way, and be unambiguous enough to provide a solution to this /volatile/ debate.
The problem with a shared variable alone is that the optimizer 
can simply place it at another memory location than the intended 
one, even if all threads see it "as if" at the intended location.
Oct 25 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/24/2013 9:30 PM, Mike wrote:
 Using peek and poke functions is, well, nah... Better methods exist.  Using
 inline assembly is a reasonable alternative, as is linking to an external C
 library, but why use D then?  Is low-level/embedded software development a
 design goal of the D language?
I've written device drivers and embedded systems. The quantity of 
code that deals with memory-mapped I/O is a very, very small part 
of those programs. The subset of that code that needs to exactly 
control the read and write cycles is tinier still. (For example, 
when writing to a memory-mapped video buffer, such control is 
quite unnecessary.) Any of the methods I presented are not a 
significant burden.

Adding two lines of inline assembler to get exactly what you want 
isn't hard, and you can hide it behind a mixin if you like.

And, of course, you'll still need inline assembler to deal with 
the other system-type operations needed for embedded systems 
work. For example, setting up the program stack, setting the 
segment registers, etc. No language provides support for them 
outside of inline assembler or assembler intrinsics.
Oct 25 2013
parent reply Russel Winder <russel winder.org.uk> writes:
On Fri, 2013-10-25 at 13:04 -0700, Walter Bright wrote:
[…]
 I've written device drivers and embedded systems. The quantity of code that 
 deals with memory-mapped I/O is a very, very small part of those programs. The 
 subset of that code that needs to exactly control the read and write cycles is 
 tinier still. (For example, when writing to a memory-mapped video buffer, such 
 control is quite unnecessary.)
 
 Any of the methods I presented are not a significant burden.
 
 Adding two lines of inline assembler to get exactly what you want isn't hard, 
 and you can hide it behind a mixin if you like.
 
 And, of course, you'll still need inline assembler to deal with the other 
 system-type operations needed for embedded systems work. For example, setting
up 
 the program stack, setting the segment registers, etc. No language provides 
 support for them outside of inline assembler or assembler intrinsics.
My experience, admittedly late 1970s, early 1980s, then early 
2000s, concurs with yours that only a small amount of code 
requires this read and write behaviour, but where it is needed it 
is crucial, and in areas where every picosecond matters (*).

I disagree with your point about memory video buffers as a 
general statement; it depends on the buffering and refresh 
strategy of the buffer. Some frame buffers are very picky, and so 
exact read and write behaviour of the code is needed. Less so 
now, though, fortunately.

Using functions is a burden here if it involves a function call; 
only macros are feasible as units of abstraction. Moreover this 
is the classic approach to inline assembler: some form of macro 
so as to create a comprehensible abstraction.

The problem with inline assembler is that you need versions for 
every target architecture, making it a source code and build 
nightmare. OK, there are directory hierarchy idioms and build 
idioms that make it easier (**), but inline assembler should only 
really be an answer in cases where there are hardware 
instructions on a given target that the compiler cannot 
reasonably be expected to generate from the source code. Classics 
here are the elliptic function libraries, and the context switch 
operations.

So the issue is not the approach per se but how that is encoded 
in the source code to make it readable and comprehensible AND 
performant.

Volatile as a variable modifier always worked for me in the past, 
but it got bad press and all compiler writers ignored it as a 
feature till it became useless. Perhaps it is time to reclaim 
volatile for D: give it a memory barrier semantic so that there 
can be no instruction reordering around the read and write 
operations, and make it a tool for those who need it. After all, 
no one is actually using it for anything just now, are they?

(*) OK, a small exaggeration in the late 1970s, where the time 
scale was 18ms, but you get my point.
(**) Actually it is much easier to do with build tools such as 
SCons and Waf than it ever was with Make and the GNU "Auto" tools 
(especially on Windows), and even CMake.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Oct 28 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/28/2013 1:13 AM, Russel Winder wrote:
 My experience, admittedly late 1970s, early 1980s then early 2000s
 concurs with yours that only a small amount of code requires this read
 and write behaviour, but where it is needed it is crucial and in areas
 where every picosecond matters (*). I disagree with your point about
 memory video buffers as a general statement, it depends on the buffering
 and refresh strategy of the buffer. Some frame buffers are very picky
 and so exact read and write behaviour of the code is needed. Less so now
 though fortunately.
I've not only built my own single board computers with video buffers, but I've written code for several graphics boards back in the 80's. None needed exact read/write behavior.
 Using functions is a burden here if it involves a function call, only
 macros are feasible as units of abstraction. Moreover this is the
 classic approach to inline assembler some form of macro so as to create
 a comprehensible abstraction.
If you want every picosecond, you're really best off writing a few lines of inline asm. Then you can craft exactly what you need.
 The problem with inline assembler is that you need versions for every
 target architecture making it a source code and build nightmare.
When you're writing code for memory-mapped I/O, it is NOT going to be portable, pretty much by definition! (Are there any two different target architectures with exactly the same memory-mapped I/O stuff?)
 OK there are directory hierarchy idioms and build idioms that make it
 easier (**), but inline assembler should only really be an answer in
 cases where there are hardware instructions on a given target that it
 cannot reasonable be expected that the compiler can generate from the
 source code. Classics here are the elliptic function libraries, and the
 context switch operations.

 So the issue is not the approach per se but how that is encoded in the
 source code to make it readable and comprehensible AND performant.

 Volatile as a variable modifier always worked for me in the past but it
 got bad press and all compiler writers ignored it as a feature till it
 became useless. Perhaps it is time to reclaim volatile for D give it a
 memory barrier semantic so that there can be no instruction reordering
 around the read and write operations, and make it a tool for those who
 need it. After all no-one is actually using for anything just now are
 they?
Ask any two people, even ones in this thread, what "volatile" 
means, and you'll get two different answers. Note that the issues 
of reordering, caching, cycles, and memory barriers are separate 
and distinct issues. Those issues also vary dramatically from one 
architecture to the next.

(For example, what really happens with a+=1 ? Should it generate 
an INC, or an ADD, or a MOV/ADD/MOV triple for MMIO? Where do the 
barriers go? Do you even need barriers? Should a LOCK prefix be 
emitted? How is the compiler supposed to know just how the MMIO 
works on some particular computer board?)
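Walter's MOV/ADD/MOV point, written out in C: on a volatile location, `+= 1` decomposes into a separate read, modify, and write, and nothing about volatile makes that sequence indivisible. (The register here is hypothetical; a plain variable stands in for it on a hosted system.)

```c
/* What a += 1 on a device register decomposes into.  The three
 * steps are exactly the MOV/ADD/MOV triple: volatile guarantees
 * each access really happens, but not that they happen as one
 * indivisible unit. */
void mmio_increment(volatile int *reg)
{
    int tmp = *reg;  /* MOV: read the register */
    tmp += 1;        /* ADD: modify in a CPU register */
    *reg = tmp;      /* MOV: write it back -- the device or another
                        bus master may have changed *reg meanwhile */
}
```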
Oct 28 2013
parent reply "eles" <eles eles.com> writes:
On Monday, 28 October 2013 at 08:42:12 UTC, Walter Bright wrote:
 On 10/28/2013 1:13 AM, Russel Winder wrote:
 Ask any two people, even ones in this thread, what "volatile" 
 means, and you'll get two different answers. Note that the 
 issues of reordering, caching, cycles, and memory barriers are 
 separate and distinct issues. Those issues also vary 
 dramatically from one architecture to the next.
"volatile" => "fickle"
 (For example, what really happens with a+=1 ? Should it 
 generate an INC, or an ADD, or a MOV/ADD/MOV triple for MMIO? 
 Where do the barriers go? Do you even need barriers? Should a 
 LOCK prefix be emitted? How is the compiler supposed to know 
 just how the MMIO works on some particular computer board?)
read [address] into a register (mov)
register++ (add)
write register to [address] (mov)

You cannot do it otherwise (that is, as a shortcut operator). 
"Shortcut" operators on fickle memory locations shall simply be 
forbidden; the compiler is able to complain about that. Only 
explicit reads and writes shall be possible.

OK, go with peek() and poke() if you feel it's better and easier 
(this avoids the a+=1 problem), at least as a first step. But put 
those into the compiler/phobos; don't force somebody to write ASM 
or C for that. If D sends people back to a C compiler, it will 
never displace C.

Templated peek() and poke() are 5 LOCs. Put those in a 
std.hardware module and, if you prefer, leave it undocumented. 
Since we discuss this matter, it could have been solved 10 times.
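The "5 LOC" generic peek/poke eles asks for can be sketched in C macro form (the proposed std.hardware module does not exist; a D version would use a template parameter where these take the type argument T):

```c
#include <stdint.h>

/* Width-generic peek/poke in the spirit of the proposal.  T is the
 * access width (uint8_t, uint16_t, uint32_t, ...), which matters
 * for MMIO because devices often require exactly-sized bus
 * transactions. */
#define PEEK(T, addr)       (*(volatile T *)(addr))
#define POKE(T, addr, val)  (*(volatile T *)(addr) = (val))
```

Usage would look like `POKE(uint32_t, 0x40020014, 1u << 5)` to set a bit in a (hypothetical) GPIO output register.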
Oct 28 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/28/2013 2:33 AM, eles wrote:
 (For example, what really happens with a+=1 ? Should it generate an INC, or an
 ADD, or a MOV/ADD/MOV triple for MMIO? Where do the barriers go? Do you even
 need barriers? Should a LOCK prefix be emitted? How is the compiler supposed
 to know just how the MMIO works on some particular computer board?)
 read [address] into a register (mov)
 register++ (add)
 write register to [address] (mov)

 You cannot do it otherwise (that is, as a shortcut operator).
That overlooks what happens if another thread changes the memory in between the read and the write. Hence the issues of memory barriers, lock prefixes, etc.
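The race Walter describes, and the standard fix for it, in C11 terms (one possible rendering, not his prescription): a compare-and-swap loop retries until the read and the write are known not to have been interleaved with another writer.

```c
#include <stdatomic.h>

/* Increment that tolerates concurrent modification: if another
 * thread changes the value between our read and our write, the
 * compare-exchange fails, `expected` is reloaded with the current
 * value, and we retry. */
void checked_increment(atomic_int *p)
{
    int expected = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &expected, expected + 1))
        ;  /* expected was refreshed on failure; try again */
}
```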
 Since we discuss this matter, it could have been solved 10 times.
Pull requests are welcome!
Oct 28 2013
parent reply "eles" <eles eles.com> writes:
On Monday, 28 October 2013 at 16:06:48 UTC, Walter Bright wrote:
 On 10/28/2013 2:33 AM, eles wrote:
 That overlooks what happens if another thread changes the 
 memory in between the read and the write. Hence the issues of 
 memory barriers, lock prefixes, etc.
Synchronizing the access to the resource is the job of the 
programmer. He will take a mutex for it.

You do that inside the kernel space, not in the user space. There 
is just one kernel, and it is able to synchronize with itself. 
Put this into perspective.
 Pull requests are welcome!
You pre-approve?
Oct 28 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/28/2013 11:50 AM, eles wrote:
 Pull requests are welcome!
You pre-approve?
It'll be subject to review by the community.
Oct 28 2013
prev sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Thu, 2013-10-24 at 08:19 +0200, Mike wrote:
[…]
     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately 
 so the optimizer will not try to inline/optimize them.
 Thanks for the answer, Walter. I think this would be acceptable 
 in many (most?) cases, but not where high performance is needed. 
 I think these functions add too much overhead if they are not 
 inlined and are in a critical path (bit-banging IO, for 
 example). After all, a read/write to a volatile address is a 
 single atomic instruction, if done properly.

 Is there a way to tell D to remove the function overhead, for 
 example, like a "naked" attribute, yet still retain the 
 "volatile" behavior?
Also this (peek and poke) is not a viable approach if you wanted 
to write an operating system in D. I think it should be an aim to 
have the replacement for Windows, OS X, Linux, etc. written in D 
instead of C/C++.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Oct 24 2013
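[Editor's note: Walter's peek/poke suggestion, and the inlinable variant Mike is asking for, can be sketched in C, where the volatile semantics under discussion are well defined. The function names `reg_read`/`reg_write` are illustrative, not part of any D or C library.]

```c
#include <stdint.h>

/* Walter's suggestion: opaque accessors. When these are compiled in a
 * separate translation unit, the optimizer cannot see through the call
 * and so cannot cache or elide the access. */
int peek(int *p)         { return *p; }
void poke(int *p, int v) { *p = v; }

/* The call overhead Mike worries about can be avoided with the classic
 * C idiom: a volatile-qualified access inside an inlinable helper. It
 * is the volatile access itself that the compiler must not elide, so
 * inlining these is safe. */
static inline uint32_t reg_read(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

static inline void reg_write(uintptr_t addr, uint32_t v)
{
    *(volatile uint32_t *)addr = v;
}
```

In real driver code `addr` would be a fixed memory-mapped register address; for testing, the address of an ordinary variable stands in for it.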
parent "eles" <eles eles.com> writes:
On Thursday, 24 October 2013 at 14:53:18 UTC, Russel Winder wrote:
 On Thu, 2013-10-24 at 08:19 +0200, Mike wrote:
 […]
 I think it should be an aim to have the replacement for 
 Windows, OS X,
 Linux, etc. written in D instead of C/C++.
I pray strongly that W&A believe the same.
Oct 24 2013
prev sibling next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 06:37, Walter Bright <newshound2 digitalmars.com> wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm trying to get
 my
 head wrapped around how to approach this.  I'm making progress, but I
 found
 something that was surprising to me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile keyword
 debate,
 but noone seemed to reconcile the need for volatile in memory-mapped IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O.
Are you talking dmd or in general (it's hard to tell)? In gdc, volatile is the same as in gcc/g++ in behaviour, although in one aspect, when the default storage model was switched to thread-local, that made volatile on its own pointless. As a side note, 'shared' is considered a volatile type in gdc, which differs from the deprecated keyword, which set volatile at a decl/expression level. There is a difference in semantics, but it escapes this author at 6.30am in the morning. :o) In any case, using shared would be my recommended route for you to go down.
 The correct and guaranteed way to make this work is to write two "peek" and
 "poke" functions to read/write a particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately so the
 optimizer will not try to inline/optimize them.
+1. Using an optimiser along with code that talks to hardware can result in bizarre behaviour. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 23 2013
parent reply "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
 On 24 October 2013 06:37, Walter Bright 
 <newshound2 digitalmars.com> wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get
 my
 head wrapped around how to approach this.  I'm making 
 progress, but I
 found
 something that was surprising to me: deprecation of the 
 volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some 
 hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the 
 volatile keyword
 debate,
 but noone seemed to reconcile the need for volatile in 
 memory-mapped IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O.
Are you talking dmd or in general (it's hard to tell). In gdc, volatile is the same as in gcc/g++ in behaviour. Although in one aspect, when the default storage model was switched to thread-local, that made volatile on it's own pointless. As a side note, 'shared' is considered a volatile type in gdc, which differs from the deprecated keyword which set volatile at a decl/expression level. There is a difference in semantics, but it escapes this author at 6.30am in the morning. :o) In any case, using shared would be my recommended route for you to go down.
 The correct and guaranteed way to make this work is to write 
 two "peek" and
 "poke" functions to read/write a particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately 
 so the
 optimizer will not try to inline/optimize them.
+1. Using an optimiser along with code that talks to hardware can result in bizarre behaviour.
Well, I've done some reading about "shared" but I don't quite grasp it yet. I still have some learning to do. That's my problem, but if you feel like explaining how it can be used in place of volatile for hardware register access, that would be awfully nice.
Oct 24 2013
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 08:18, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
 On 24 October 2013 06:37, Walter Bright <newshound2 digitalmars.com>
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm trying to
 get
 my
 head wrapped around how to approach this.  I'm making progress, but I
 found
 something that was surprising to me: deprecation of the volatile
 keyword.

 In the bare-metal/hardware/driver world, this keyword is important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile keyword
 debate,
 but noone seemed to reconcile the need for volatile in memory-mapped IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O.
Are you talking dmd or in general (it's hard to tell). In gdc, volatile is the same as in gcc/g++ in behaviour. Although in one aspect, when the default storage model was switched to thread-local, that made volatile on it's own pointless. As a side note, 'shared' is considered a volatile type in gdc, which differs from the deprecated keyword which set volatile at a decl/expression level. There is a difference in semantics, but it escapes this author at 6.30am in the morning. :o) In any case, using shared would be my recommended route for you to go down.
 The correct and guaranteed way to make this work is to write two "peek"
 and
 "poke" functions to read/write a particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately so the
 optimizer will not try to inline/optimize them.
+1. Using an optimiser along with code that talks to hardware can result in bizarre behaviour.
Well, I've done some reading about "shared" but I don't quite grasp it yet. I still have some learning to do. That's my problem, but if you feel like explaining how it can be used in place of volatile for hardware register access, that would be awfully nice.
'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions, as there may be other threads reading/writing to the variable at the same time. Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
 On 24 October 2013 08:18, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw 
 wrote:
 On 24 October 2013 06:37, Walter Bright 
 <newshound2 digitalmars.com>
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and 
 I'm trying to
 get
 my
 head wrapped around how to approach this.  I'm making 
 progress, but I
 found
 something that was surprising to me: deprecation of the 
 volatile
 keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some 
 hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the 
 volatile keyword
 debate,
 but noone seemed to reconcile the need for volatile in 
 memory-mapped IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read 
 from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O.
Are you talking dmd or in general (it's hard to tell). In gdc, volatile is the same as in gcc/g++ in behaviour. Although in one aspect, when the default storage model was switched to thread-local, that made volatile on it's own pointless. As a side note, 'shared' is considered a volatile type in gdc, which differs from the deprecated keyword which set volatile at a decl/expression level. There is a difference in semantics, but it escapes this author at 6.30am in the morning. :o) In any case, using shared would be my recommended route for you to go down.
 The correct and guaranteed way to make this work is to write 
 two "peek"
 and
 "poke" functions to read/write a particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them 
 separately so the
 optimizer will not try to inline/optimize them.
+1. Using an optimiser along with code that talks to hardware can result in bizarre behaviour.
Well, I've done some reading about "shared" but I don't quite grasp it yet. I still have some learning to do. That's my problem, but if you feel like explaining how it can be used in place of volatile for hardware register access, that would be awfully nice.
'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions, as there may be other threads reading/writing to the variable at the same time. Regards
Is it actually implemented as such in any D compiler? That's a lot of memory barriers; shared would have to come with a massive SLOW! notice on it. Not saying that's a bad choice necessarily, but I was pretty sure this had never been implemented.
Oct 24 2013
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 10:27, John Colvin <john.loughran.colvin gmail.com> wrote:
 On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
 On 24 October 2013 08:18, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
 On 24 October 2013 06:37, Walter Bright <newshound2 digitalmars.com>
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm trying to
 get
 my
 head wrapped around how to approach this.  I'm making progress, but I
 found
 something that was surprising to me: deprecation of the volatile
 keyword.

 In the bare-metal/hardware/driver world, this keyword is important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile keyword
 debate,
 but noone seemed to reconcile the need for volatile in memory-mapped
 IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O.
Are you talking dmd or in general (it's hard to tell). In gdc, volatile is the same as in gcc/g++ in behaviour. Although in one aspect, when the default storage model was switched to thread-local, that made volatile on it's own pointless. As a side note, 'shared' is considered a volatile type in gdc, which differs from the deprecated keyword which set volatile at a decl/expression level. There is a difference in semantics, but it escapes this author at 6.30am in the morning. :o) In any case, using shared would be my recommended route for you to go down.
 The correct and guaranteed way to make this work is to write two "peek"
 and
 "poke" functions to read/write a particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately so the
 optimizer will not try to inline/optimize them.
+1. Using an optimiser along with code that talks to hardware can result in bizarre behaviour.
Well, I've done some reading about "shared" but I don't quite grasp it yet. I still have some learning to do. That's my problem, but if you feel like explaining how it can be used in place of volatile for hardware register access, that would be awfully nice.
'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions, as there may be other threads reading/writing to the variable at the same time. Regards
Is it actually implemented as such in any D compiler? That's a lot of memory barriers, shared would have to come with a massive SLOW! notice on it. Not saying that's a bad choice necessarily, but I was pretty sure this had never been implemented.
If you require memory barriers to access shared data, that is what 'synchronized' and core.atomic are for. There are *no* implicit locks occurring when accessing the data. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 24 October 2013 at 09:43:51 UTC, Iain Buclaw wrote:
 'shared' guarantees that all reads and writes specified in 
 source code
 happen in the exact order specified with no omissions
 If you require memory barriers to access share data, that is 
 what
 'synchronized' and core.atomic is for.  There is *no* implicit 
 locks
 occurring when accessing the data.
If there are no memory barriers, then there is no guarantee* of the ordering of reads or writes. Sure, the compiler can promise not to rearrange them, but the CPU is a different matter. *Dependent on CPU architecture, of course; e.g. IIRC the Intel Atom never reorders anything.
Oct 24 2013
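[Editor's note: the distinction John and Iain are drawing, compiler-level ordering versus CPU-level ordering that needs real barriers, is what C11's explicit memory orders express. A minimal sketch, with illustrative names; this is the C analogue of reaching for core.atomic rather than relying on shared alone.]

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int  payload = 0;
static atomic_bool ready   = false;

void producer(void)
{
    atomic_store_explicit(&payload, 42, memory_order_relaxed);
    /* release: everything written above becomes visible to any
     * thread that observes ready == true with an acquire load */
    atomic_store_explicit(&ready, true, memory_order_release);
}

int consumer(void)
{
    /* acquire: pairs with the release store in producer() */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return atomic_load_explicit(&payload, memory_order_relaxed);
}
```

With plain volatile (or a shared that only constrains the compiler), nothing stops a weakly ordered CPU from making the payload store visible after the flag store; the release/acquire pair is the barrier that rules that out.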
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 12:10, John Colvin <john.loughran.colvin gmail.com> wrote:
 On Thursday, 24 October 2013 at 09:43:51 UTC, Iain Buclaw wrote:
 'shared' guarantees that all reads and writes specified in source code
 happen in the exact order specified with no omissions
 If you require memory barriers to access share data, that is what
 'synchronized' and core.atomic is for.  There is *no* implicit locks
 occurring when accessing the data.
If there are no memory barriers, then there is no guarantee* of ordering of reads or writes. Sure, the compiler can promise not to rearrange them, but the CPU is a different matter. *dependant on CPU architecture of course. e.g. IIRC the intel atom never reorders anything.
I was talking about the compiler, not the CPU. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
parent reply Johannes Pfau <nospam example.com> writes:
Am Thu, 24 Oct 2013 14:04:44 +0100
schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 24 October 2013 12:10, John Colvin
 <john.loughran.colvin gmail.com> wrote:
 On Thursday, 24 October 2013 at 09:43:51 UTC, Iain Buclaw wrote:
 'shared' guarantees that all reads and writes specified in
 source code happen in the exact order specified with no omissions
Does this include writes to non-shared data? For example:

------------------------------------
shared int x;
int y;

void func()
{
    x = 0;
    y = 3; //Can the compiler move this assignment?
    x = 1;
}
------------------------------------

So there's no additional overhead (in code / instructions emitted) when using shared instead of volatile in code like this? And this is valid code with shared (assuming reading/assigning to x is atomic)?

------------------------------------
volatile bool x = false;

void waitForX()
{
    while(!x){}
}

__interrupt(X) void x_interrupt()
{
    x = true;
}
------------------------------------
Oct 24 2013
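[Editor's note: Johannes's waitForX/x_interrupt pattern has a direct C analogue, shown here for comparison. `volatile sig_atomic_t` is the one flag type C guarantees safe between a handler and the main loop; the hand-invoked "interrupt" is purely so the example terminates when run.]

```c
#include <signal.h>

/* Flag set by an interrupt (or signal) handler and polled by the
 * main loop. volatile forces a fresh read of x on every iteration. */
static volatile sig_atomic_t x = 0;

void x_interrupt(void)   /* stand-in for the real ISR */
{
    x = 1;
}

void waitForX(void)
{
    while (!x) {
        /* On real hardware the ISR fires asynchronously; here we
         * invoke it by hand so the loop provably exits. */
        x_interrupt();
    }
}
```

Without the volatile qualifier an optimizer is entitled to hoist the read of x out of the loop and spin forever, which is exactly the caching behaviour the original post asks about.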
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 18:49, Johannes Pfau <nospam example.com> wrote:
 Am Thu, 24 Oct 2013 14:04:44 +0100
 schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 24 October 2013 12:10, John Colvin
 <john.loughran.colvin gmail.com> wrote:
 On Thursday, 24 October 2013 at 09:43:51 UTC, Iain Buclaw wrote:
 'shared' guarantees that all reads and writes specified in
 source code happen in the exact order specified with no omissions
Does this include writes to non-shared data? For example:

------------------------------------
shared int x;
int y;

void func()
{
    x = 0;
    y = 3; //Can the compiler move this assignment?
    x = 1;
}
------------------------------------
Yes, reordering may occur so long as the compiler does not change behaviour with respect to the program's sequence points. (Your example, for instance, cannot possibly be re-ordered.) It is also worth noting that while you may have this guarantee, it does not mean that you can go using __thread data without memory barriers. (For instance, if you have an asynchronous signal handler, it may alter the __thread'ed data at any point in the sequential program.)
 So there's no additional overhead (in code / instructions emitted) when
 using shared instead of volatile in code like this? And this is valid
 code with shared (assuming reading/assigning to x is atomic)?
 ------------------------------------
 volatile bool x = false;

 void waitForX()
 {
     while(!x){}
 }

 __interrupt(X) void x_interrupt()
 {
     x = true;
 }
 ------------------------------------
That is correct. :o) Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
parent Johannes Pfau <nospam example.com> writes:
Am Thu, 24 Oct 2013 21:28:45 +0100
schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 24 October 2013 18:49, Johannes Pfau <nospam example.com> wrote:
 Am Thu, 24 Oct 2013 14:04:44 +0100
 schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 24 October 2013 12:10, John Colvin
 <john.loughran.colvin gmail.com> wrote:
 On Thursday, 24 October 2013 at 09:43:51 UTC, Iain Buclaw wrote:
 'shared' guarantees that all reads and writes specified in
 source code happen in the exact order specified with no
 omissions
Does this include writes to non-shared data? For example:

------------------------------------
shared int x;
int y;

void func()
{
    x = 0;
    y = 3; //Can the compiler move this assignment?
    x = 1;
}
------------------------------------
Yes, reordering may occur so long as the compiler does not change behaviour in respect to the programs sequential points in the application. (Your example, for instance, can not possibly be re-ordered). It is also worth noting while you may have guarantee of this, it does not mean that you can go using __thread data without memory barriers. (For instance, if you have an asynchronous signal handler, it may alter the __thread'ed data at any point in the sequential program).
 So there's no additional overhead (in code / instructions emitted)
 when using shared instead of volatile in code like this? And this
 is valid code with shared (assuming reading/assigning to x is
 atomic)? ------------------------------------
 volatile bool x = false;

 void waitForX()
 {
     while(!x){}
 }

 __interrupt(X) void x_interrupt()
 {
     x = true;
 }
 ------------------------------------
That is correct. :o) Regards
Sounds good. Now this should be the standard-defined behaviour for all compilers. But I guess it'll take some more time till the shared design is really finalized.
Oct 25 2013
prev sibling parent reply "eles" <eles eles.com> writes:
On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
 On 24 October 2013 08:18, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw 
 wrote:
 On 24 October 2013 06:37, Walter Bright 
 <newshound2 digitalmars.com>
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions, as there may be other threads reading/writing to the variable at the same time.
All that's missing is a guarantee that the reading/writing actually occur at the intended address and not in some compiler cache.
Oct 24 2013
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 12:22, eles <eles eles.com> wrote:
 On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
 On 24 October 2013 08:18, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
 On 24 October 2013 06:37, Walter Bright <newshound2 digitalmars.com>
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions, as there may be other threads reading/writing to the variable at the same time.
All that's missing is a guarantee that the reading/writing actually occur at the intended address and not in some compiler cache.
The compiler does not cache shared data (at least in GDC). -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
parent "eles" <eles eles.com> writes:
On Thursday, 24 October 2013 at 13:05:58 UTC, Iain Buclaw wrote:
 On 24 October 2013 12:22, eles <eles eles.com> wrote:
 On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw 
 wrote:
 On 24 October 2013 08:18, Mike <none none.com> wrote:
 On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw 
 wrote:
 On 24 October 2013 06:37, Walter Bright 
 <newshound2 digitalmars.com>
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 The compiler does not cache shared data (at least in GDC).
Well, that should not be a matter of implementation, but of the language standard. Besides not caching, still missing is a guarantee that these read/write operations occur when asked, not later (in-order execution means almost nothing if the compiler defers all those operations to some later time, possibly without taking into account the sleep()s between operations; sometimes the hardware needs, say, 500ms to guarantee a register holds a meaningful value, and so on). So it is about the correct memory location, the immediacy of those operations (which also ensures in-order execution), and about not caching.
Oct 24 2013
prev sibling next sibling parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Thursday, 24 October 2013 at 05:37:49 UTC, Walter Bright wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get my
 head wrapped around how to approach this.  I'm making 
 progress, but I found
 something that was surprising to me: deprecation of the 
 volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some 
 hardware peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile 
 keyword debate,
 but noone seemed to reconcile the need for volatile in 
 memory-mapped IO.  Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O. The correct and guaranteed way to make this work is to write two "peek" and "poke" functions to read/write a particular memory address: int peek(int* p); void poke(int* p, int value); Implement them in the obvious way, and compile them separately so the optimizer will not try to inline/optimize them.
Yes, this is the simplest way to do it, and it works with gdc when compiled in a separate file with no optimizations and no inlining. But today's peripherals may have tens of registers, and they are usually represented as a struct. Using a peripheral often requires several register accesses, and doing it this way will not make the code very readable. As a workaround I have all register access functions in a separate file and compile those files in a separate directory with no optimizations. The amount of code generated is 3-4 times more, and this is a problem because in controllers memory and speed are always too small.
Oct 23 2013
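[Editor's note: the struct-per-peripheral layout Timo describes is usually written in C as a struct of volatile members, so every register access happens exactly where written, with no accessor-call overhead. The peripheral and register names below are invented for illustration; a real driver would fix the struct at the device's documented base address.]

```c
#include <stdint.h>

/* One struct member per hardware register; volatile on each member
 * makes every read/write an observable access. */
typedef struct {
    volatile uint32_t CR;  /* control register */
    volatile uint32_t SR;  /* status register  */
    volatile uint32_t DR;  /* data register    */
} FakeUart;

/* On real hardware this would be something like
 *     #define UART0 ((FakeUart *)0x40011000)
 * Here we overlay a plain buffer so the example can actually run. */
static uint32_t backing[3];
#define UART0 ((FakeUart *)backing)
```

Code using the peripheral then reads naturally, e.g. `UART0->CR = 1; while (!(UART0->SR & 1)) {}`, which is the readability that separate peek/poke calls lose.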
parent "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 06:41:54 UTC, Timo Sintonen wrote:
 On Thursday, 24 October 2013 at 05:37:49 UTC, Walter Bright 
 wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get my
 head wrapped around how to approach this.  I'm making 
 progress, but I found
 something that was surprising to me: deprecation of the 
 volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some 
 hardware peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the 
 volatile keyword debate,
 but noone seemed to reconcile the need for volatile in 
 memory-mapped IO.  Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O. The correct and guaranteed way to make this work is to write two "peek" and "poke" functions to read/write a particular memory address: int peek(int* p); void poke(int* p, int value); Implement them in the obvious way, and compile them separately so the optimizer will not try to inline/optimize them.
Yes, this is a simplest way to do it and works with gdc when compiled in separate file with no optimizations and inlining. But todays peripherals may have tens of registers and they are usually represented as a struct. Using the peripheral often require several register access. Doing it this way will not make code very readable. As a workaround I have all register access functions in a separate file and compile those files in a separate directory with no optimizations. The amount of code generated is 3-4 times more and this is a problem because in controllers memory and speed are always too small.
+1. This is what I feared. I don't think D needs a volatile keyword, but it would be nice to have *some* way to avoid this overhead using language features. I'm beginning to think inline ASM is the only way to avoid this. That's not a deal breaker for me, but it makes me sad.
Oct 24 2013
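[Editor's note: there is a middle ground between opaque peek/poke calls and full inline assembly, shown here with GCC/Clang extended asm: an empty asm statement with a "memory" clobber acts as a compiler barrier, while the access itself stays a single inlinable load or store. The `mmio_read`/`mmio_write` names are illustrative; this relies on a GCC extension, not standard C.]

```c
#include <stdint.h>

static inline uint32_t mmio_read(volatile uint32_t *addr)
{
    uint32_t v = *addr;
    /* compiler barrier: no memory access may be moved across this */
    __asm__ volatile ("" ::: "memory");
    return v;
}

static inline void mmio_write(volatile uint32_t *addr, uint32_t v)
{
    __asm__ volatile ("" ::: "memory");
    *addr = v;
}
```

Because the functions are `static inline`, each access compiles to a single instruction on the fast path, which addresses the bit-banging overhead concern; the barrier only constrains the compiler, so CPU-level ordering (if needed) still requires real fences.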
prev sibling next sibling parent reply "eles" <eles eles.com> writes:
On Thursday, 24 October 2013 at 05:37:49 UTC, Walter Bright wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 volatile was never a reliable method for dealing with memory 
 mapped I/O. The correct and guaranteed way to make this work is 
 to write two "peek" and "poke" functions to read/write a 
 particular memory address:

     int peek(int* p);
     void poke(int* p, int value);

 Implement them in the obvious way, and compile them separately 
 so the optimizer will not try to inline/optimize them.
I raised the problem here: http://forum.dlang.org/thread/selnpobzzvrsuyihnstl forum.dlang.org

Anyway, pokes and peeks are a bit more cumbersome than volatile variables, since they do not cope so well, for example, with arithmetic expressions. Anyway, still better than nothing. *If* they existed. IMHO, embedded and hardware interfacing should get more attention.
Oct 24 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/24/2013 4:13 AM, eles wrote:
 Anyway, poke's and peek's are a bit more cumbersome than volatile variables,
 since they do not cope so well, for example, with arithmetic expressions.
Why wouldn't they work with arithmetic expressions?

    poke(cast(int*)0x888777, peek(cast(int*)0x12345678) + 1);
 Anyway, still better than nothing. *If* they would exist.
T peek(T)(T* addr) { return *addr; }
void poke(T)(T* addr, T value) { *addr = value; }
 IMHO, the embedded and hardware interfacing should get more attention.
D's most excellent support for inline assembler should do nicely.
Oct 25 2013
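[Editor's note: Walter's generic peek/poke templates translate to C as type-generic macros. The volatile qualifier in the casts is this editor's addition, not part of Walter's sketch: once the accessors are inlined, nothing else stops the optimizer from caching or eliding the access. `__typeof__` is a GCC/Clang extension.]

```c
/* Generic accessors: the volatile cast forces a real load/store at
 * the given address on every use, even after inlining. */
#define PEEK(p)    (*(volatile __typeof__(*(p)) *)(p))
#define POKE(p, v) (*(volatile __typeof__(*(p)) *)(p) = (v))
```

Arithmetic composes exactly as in Walter's example: `POKE(dst, PEEK(src) + 1)` performs one forced read, an add, and one forced write.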
parent reply "eles" <eles eles.com> writes:
On Friday, 25 October 2013 at 20:10:13 UTC, Walter Bright wrote:
 On 10/24/2013 4:13 AM, eles wrote:
 Anyway, poke's and peek's are a bit more cumbersome than 
 volatile variables,
 since they do not cope so well, for example, with arithmetic 
 expressions.
Why wouldn't they work with arithmetic expressions? poke(0x888777, peek(0x12345678) + 1);
 Anyway, still better than nothing. *If* they would exist.
T peek(T)(T* addr) { return *addr; } void poke(T)(T* addr, T value) { *addr = value; }
Frankly, if it is not a big deal, why don't you put those in a std.hardware or std.directaccess module?

If I have to compile those in D outside of the main program, why not allow me to disable the optimizer *for a part* of my D program? Or for a variable? And if I have to compile those in C, should I go all the way with it and only let the D program be an "extern int main()"?

On one hand you show me it is not a big deal; on the other hand you make a big deal of it by refusing any support inside the compiler or the standard library. Should I one day define my own "int plus(int a, int b) { return a+b; }"?
Oct 26 2013
next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sat, 2013-10-26 at 16:48 +0200, eles wrote:
[…]
 OOH you show me it is not a big deal, OTOH you make a big deal 
 from it refusing every support inside the compiler or the 
 standard library.
I am assuming that the C++ memory model and its definition of volatile have in some way made the problem go away, so that expressions such as:

    device->csw.ready

can be constructed such that there is no caching of values and the entity is always read. Given the issues of out-of-order execution, compiler optimization and multicore, what is their solution?

(The above is a genuine question rather than a troll. The last time I was writing UNIX device drivers seriously was 30+ years ago, in C, and the last embedded systems work was 10 years ago, using C with specialist compilers: 8051, AVR chips and the like. I would love to be able to work with the GPIO on a Raspberry Pi with D; it would get me back into all that fun stuff. I am staying away as it looks like a return to C is the only viable option just now, unless I learn C++ again.)
 Should I one day define my own "int plus(int a, int b) { return 
 a+b; }"?
Surely, a + b always transforms to a.__add__(b) in all quality languages (*) so that you can redefine the meaning from the default.

(*) which rules out Java.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Oct 27 2013
prev sibling next sibling parent reply Russel Winder <russel winder.org.uk> writes:
The sub-text here is that, D should be one of the main languages on A
Raspberry Pi.

Currently children start with Scratch, move to Python and then (most
likely) to C.

Oracle have made a huge push to ensure Java is mainstream on the
Raspberry Pi.

I would much prefer to have D or Go as the "Don't go to C or Java"
option.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Oct 27 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
27-Oct-2013 13:09, Russel Winder пишет:
 The sub-text here is that, D should be one of the main languages on A
 Raspberry Pi.
s/Raspberry Pi/ARM boards/ After all Rasp Pi is only one of many - a tiny piece of outdated ARM.
 Currently children start with Scratch, move to Python and then (most
 likely) to C.

 Oracle have made a huge push to ensure Java is mainstream on the
 Raspberry Pi.

 I would much prefer to have D or Go as the "Don't go to C or Java"
 option.
+1 -- Dmitry Olshansky
Oct 27 2013
prev sibling parent Russel Winder <russel winder.org.uk> writes:
On Sat, 2013-10-26 at 16:48 +0200, eles wrote:
[…]
 OOH you show me it is not a big deal, OTOH you make a big deal 
 from it refusing every support inside the compiler or the 
 standard library.
I am assuming that the C++ memory model and its definition of volatile has in some way made the problem go away and expressions such as: device->csw.ready can be constructed such that there is no caching of values and the entity is always read. Given the issues of out-of-order execution, compiler optimization and multicore, what is their solution? (The above is a genuine question rather than a troll. The last time I was writing UNIX device drivers seriously was 30+ years ago, in C, and the last embedded systems work was 10 years ago using C with specialist compilers – 8051, AVR chips and the like. I would love to be able to work with the GPIO on a Raspberry Pi with D; it would get me back into all that fun stuff. I am staying away as it looks like a return to C is the only viable option just now, unless I learn C++ again.)
 Should I one day define my own "int plus(int a, int b) { return 
 a+b; }"?
Surely, a + b always transforms to a.__add__(b) in all quality languages (*) so that you can redefine the meaning from the default. (*) which rules out Java. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Oct 27 2013
prev sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 07:36, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 24 October 2013 06:37, Walter Bright <newshound2 digitalmars.com> wrote:
 On 10/23/2013 5:43 PM, Mike wrote:
 I'm interested in ARM bare-metal programming with D, and I'm trying to get
 my
 head wrapped around how to approach this.  I'm making progress, but I
 found
 something that was surprising to me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is important to
 ensure the
 optimizer doesn't cache reads to memory-mapped IO, as some hardware
 peripheral
 may modify the value without involving the processor.

 I've read a few discussions on the D forums about the volatile keyword
 debate,
 but noone seemed to reconcile the need for volatile in memory-mapped IO.
 Was
 this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO,
 how would one ensure the compiler doesn't cache the value?
volatile was never a reliable method for dealing with memory mapped I/O.
Are you talking dmd or in general (it's hard to tell)? In gdc, volatile is the same as in gcc/g++ in behaviour. In one respect, though, when the default storage model was switched to thread-local, that made volatile on its own pointless. As a side note, 'shared' is considered a volatile type in gdc, which differs from the deprecated keyword, which set volatile at a decl/expression level. There is a difference in semantics, but it escapes this author at 6.30 in the morning. :o)
To elaborate (now I am a little more awake :) - 'volatile' on the type means that it's volatile-qualified. volatile on the decl means it's treated as volatile in the 'C' sense. What's the difference? Well for the backend a volatile type only really has an effect on function returns (eg: a function that returns a shared int may not be subject for use in, say, tail-call optimisations). GDC propagates the volatile flag of the type to the decl, so that there is effectively no difference between shared and volatile, except in a semantic sense in the D frontend language implementation. Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
prev sibling next sibling parent reply "Arjan" <arjan ask.me.to> writes:
On Thursday, 24 October 2013 at 00:43:11 UTC, Mike wrote:
 Hello again,

 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get my head wrapped around how to approach this.  I'm 
 making progress, but I found something that was surprising to 
 me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to ensure the optimizer doesn't cache reads to 
 memory-mapped IO, as some hardware peripheral may modify the 
 value without involving the processor.

 I've read a few discussions on the D forums about the volatile 
 keyword debate, but noone seemed to reconcile the need for 
 volatile in memory-mapped IO.  Was this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO, how would one ensure the compiler doesn't 
 cache the value?
This article might also give some insight in the problems with volatile: http://blog.regehr.org/archives/28
Oct 23 2013
parent "Mike" <none none.com> writes:
On Thursday, 24 October 2013 at 06:25:20 UTC, Arjan wrote:
 On Thursday, 24 October 2013 at 00:43:11 UTC, Mike wrote:
 Hello again,

 I'm interested in ARM bare-metal programming with D, and I'm 
 trying to get my head wrapped around how to approach this.  
 I'm making progress, but I found something that was surprising 
 to me: deprecation of the volatile keyword.

 In the bare-metal/hardware/driver world, this keyword is 
 important to ensure the optimizer doesn't cache reads to 
 memory-mapped IO, as some hardware peripheral may modify the 
 value without involving the processor.

 I've read a few discussions on the D forums about the volatile 
 keyword debate, but noone seemed to reconcile the need for 
 volatile in memory-mapped IO.  Was this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO, how would one ensure the compiler doesn't 
 cache the value?
This article might also give some insight in the problems with volatile: http://blog.regehr.org/archives/28
Arjan, Thanks for the information. I basically agree with the premise of that blog, but the author also said "If you are writing code for an in-order embedded processor and have little or no infrastructure besides the C compiler, you may need to lean more heavily on volatile". Well, that's me. My goal is to target bare-metal embedded systems with D, so I'm looking for a volatile-like solution in D. I don't care if the language has the "volatile" keyword or not, I just want to be able to read and write to my IO as fast as possible, and I'm wondering if D has a way to do this in a way that is comparable to what can be achieved in C.
Oct 23 2013
prev sibling parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Mike" <none none.com> wrote in message 
news:bifrvifzrhgocrejepvc forum.dlang.org...
 I've read a few discussions on the D forums about the volatile keyword 
 debate, but noone seemed to reconcile the need for volatile in 
 memory-mapped IO.  Was this an oversight?

 What's D's answer to this?  If one were to use D to read from 
 memory-mapped IO, how would one ensure the compiler doesn't cache the 
 value?
There are a few options: 1. Use shared in place of volatile. I'm not sure this actually works, but otherwise this is pretty good. 2. Use the deprecated volatile statement. D got it right that volatile access is a property of the load/store and not the variable, but missed the point that it's a huge pain to have to remember volatile at use. Could be made better with a wrapper. I think this still works. 3. Use inline assembly. This sucks. 4. Defeat the optimizer with inline assembly. asm { nop; } // Haha, gotcha *my_hardware_register = 999; asm { nop; } This might be harder with gdc/ldc than it is with dmd, but I'm pretty sure there's a way to trick it into thinking an asm block could clobber/read arbitrary memory. 5. Lobby for/implement some nice new volatile_read and volatile_write intrinsics. Old discussion: http://www.digitalmars.com/d/archives/digitalmars/D/volatile_variables in_D...._51984.html
Oct 24 2013
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 October 2013 12:50, Daniel Murphy <yebblies nospamgmail.com> wrote:
 "Mike" <none none.com> wrote in message
 news:bifrvifzrhgocrejepvc forum.dlang.org...
 I've read a few discussions on the D forums about the volatile keyword
 debate, but noone seemed to reconcile the need for volatile in
 memory-mapped IO.  Was this an oversight?

 What's D's answer to this?  If one were to use D to read from
 memory-mapped IO, how would one ensure the compiler doesn't cache the
 value?
There are a few options: 1. Use shared in place of volatile. I'm not sure this actually works, but otherwise this is pretty good. 2. Use the deprecated volatile statement. D got it right that volatile access is a property of the load/store and not the variable, but missed the point that it's a huge pain to have to remember volatile at use. Could be made better with a wrapper. I think this still works. 3. Use inline assembly. This sucks. 4. Defeat the optimizer with inline assembly. asm { nop; } // Haha, gotcha *my_hardware_register = 999; asm { nop; } This might be harder with gdc/ldc than it is with dmd, but I'm pretty sure there's a way to trick it into thinking an asm block could clobber/read arbitrary memory.
In gdc: --- asm {"" ::: "memory";} An asm instruction without any output operands will be treated identically to a volatile asm instruction in gcc, which indicates that the instruction has important side effects. So it creates a point in the code which may not be deleted (unless it is proved to be unreachable). The "memory" clobber will tell the backend to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory. (That does not prevent a CPU from reordering loads and stores with respect to another CPU, though; you need real memory barrier instructions for that.) -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 24 2013
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Thursday, 24 October 2013 at 13:22:50 UTC, Iain Buclaw wrote:

 In gdc:
 ---
 asm {"" ::: "memory";}

 An asm instruction without any output operands will be treated
 identically to a volatile asm instruction in gcc, which 
 indicates that
 the instruction has important side effects.  So it creates a 
 point in
 the code which may not be deleted (unless it is proved to be
 unreachable).

 The "memory" clobber will tell the backend to not keep memory 
 values
 cached in registers across the assembler instruction and not 
 optimize
 stores or loads to that memory.  (That does not prevent a CPU 
 from
 reordering loads and stores with respect to another CPU, 
 though; you
 need real memory barrier instructions for that.)
I have not (yet) had any problems when writing io registers but more with read access. Any operation after a write should read the register back from real memory and not from processor registers. Any repetitive read should always read the real io register in memory. The hardware may change the register value at any time. Now a very common task like while (regs.status==0) ... may be optimized to an endless loop because the memory is read only once before the loop starts. I understood from earlier posts that variables should not be volatile but the operation should. It seems it is possible to guide the compiler like above. So would the right solution be to have a volatile block, similar to synchronized? Inside that block no memory access is optimized. This way no information of volatility is needed outside the block or in variables used there.
Oct 24 2013
parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Timo Sintonen" <t.sintonen luukku.com> wrote in message 
news:qdvhyrzshckafkiekvnw forum.dlang.org...
 On Thursday, 24 October 2013 at 13:22:50 UTC, Iain Buclaw wrote:

 In gdc:
 ---
 asm {"" ::: "memory";}

 An asm instruction without any output operands will be treated
 identically to a volatile asm instruction in gcc, which indicates that
 the instruction has important side effects.  So it creates a point in
 the code which may not be deleted (unless it is proved to be
 unreachable).

 The "memory" clobber will tell the backend to not keep memory values
 cached in registers across the assembler instruction and not optimize
 stores or loads to that memory.  (That does not prevent a CPU from
 reordering loads and stores with respect to another CPU, though; you
 need real memory barrier instructions for that.)
I have not (yet) had any problems when writing io registers but more with read access. Any operation after write should read the register back from real memory and not in processor registers. Any repetitive read should always read the real io register in memory. The hardware may change the register value at any time. Now a very common task like while (regs.status==0) ... may be optimized to an endless loop because the memory is read only once before the loop starts. I understood from earlier posts that variables should not be volatile but the operation should. It seems it is possible to guide the compiler like above. So would the right solution be to have a volatile block, similar to synchronized? Inside that block no memory access is optimized. This way no information of volatility is needed outside the block or in variables used there.
Volatile blocks are already in the language, but they suck. You don't want to have to mark every access as volatile, because all accesses to that hardware register are going to be volatile. You want it to be automatic. I'm really starting to think intrinsics are the way to go. They are safe, clear, and can be inlined. The semantics I imagine would be along the lines of llvm's volatile memory accesses (http://llvm.org/docs/LangRef.html#volatile-memory-accesses)
Oct 25 2013
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Friday, 25 October 2013 at 13:07:56 UTC, Daniel Murphy wrote:
 "Timo Sintonen" <t.sintonen luukku.com> wrote:
 I have not (yet) had any problems when writing io registers 
 but more with read access. Any operation after write should 
 read the register back from real memory and not in processor 
 registers.  Any repetitive read should always read the real io 
 register in memory. The hardware may change the register value 
 at any time.

 Now a very common task like
 while (regs.status==0) ...
 may be optimized to an endless loop because the memory is read 
 only once before the loop starts.

 I understood from earlier posts that variables should not be 
 volatile but the operation should. It seems it is possible to 
 guide the compiler like above. So would the right solution be 
 to have a volatile block, similar to synchronized? Inside that 
 block no memory access is optimized.  This way no information 
 of volatility is needed outside the block or in variables used 
 there.
Volatile blocks are already in the language, but they suck. You don't want to have to mark every access as volatile, because all accesses to that hardware register are going to be volatile. You want it to be automatic. I'm really starting to think intrinsics are the way to go. They are safe, clear, and can be inlined. The semantics I imagine would be along the lines of llvm's volatile memory accesses (http://llvm.org/docs/LangRef.html#volatile-memory-accesses)
It seems that there are two different things here. As far as I understand, sharing means something like 'somebody may change my data' and volatility is something like 'I have to know immediately if the data is changed'. It has become obvious that these two are not easy to fit together into a working model. The original question in this thread was to have a proper way to access hardware registers. So far, even the top people have offered only workarounds. I wonder how long D can be marketed as a system language if it does not have a defined and reliable way to access system hardware. Register access occurs often in time critical places like interrupt routines. A library routine or external function is not a choice. Whatever the feature is, it has to be built into the language. I don't care if it is related to variables, blocks or files as long as I do not have to put these files in a separate directory like I do now. I would like to hear more about what the options would be. Then we could make a decision about the right way to go.
Oct 25 2013
parent reply Johannes Pfau <nospam example.com> writes:
Am Fri, 25 Oct 2013 17:20:23 +0200
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Friday, 25 October 2013 at 13:07:56 UTC, Daniel Murphy wrote:
 "Timo Sintonen" <t.sintonen luukku.com> wrote:
 I have not (yet) had any problems when writing io registers 
 but more with read access. Any operation after write should 
 read the register back from real memory and not in processor 
 registers.  Any repetitive read should always read the real io 
 register in memory. The hardware may change the register value 
 at any time.

 Now a very common task like
 while (regs.status==0) ...
 may be optimized to an endless loop because the memory is read 
 only once before the loop starts.

 I understood from earlier posts that variables should not be 
 volatile but the operation should. It seems it is possible to 
 guide the compiler like above. So would the right solution be 
 to have a volatile block, similar to synchronized? Inside that 
 block no memory access is optimized.  This way no information 
 of volatility is needed outside the block or in variables used 
 there.
Volatile blocks are already in the language, but they suck. You don't want to have to mark every access as volatile, because all accesses to that hardware register are going to be volatile. You want it to be automatic. I'm really starting to think intrinsics are the way to go. They are safe, clear, and can be inlined. The semantics I imagine would be along the lines of llvm's volatile memory accesses (http://llvm.org/docs/LangRef.html#volatile-memory-accesses)
It seems that it is two different things here. As far as I understand, sharing means something like 'somebody may change my data' and volatility is something like 'I have to know immediately if the data is changed'. It has become obvious that these two are not easy to fit together and make a working model. The original question in this thread was to have a proper way to access hardware registers. So far, even the top people have offered only workarounds. I wonder how long D can be marketed as system language if it does not have a defined and reliable way to access system hardware. Register access occurs often in time critical places like interrupt routines. A library routine or external function is not a choice. Whatever the feature is, it has to be built in the language. I don't care if it is related to variables, blocks or files as long as I do not have to put these files in a separate directory like I do now. I would like to hear more what would be the options. Then we could make a decision what is the right way to go.
What's wrong with the solution Iain mentioned, i.e the way shared is implemented in GDC? http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2475.1382646532.1719.digitalmars-d:40puremagic.com http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2480.1382655175.1719.digitalmars-d:40puremagic.com
Oct 25 2013
parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Friday, 25 October 2013 at 18:12:40 UTC, Johannes Pfau wrote:
 Am Fri, 25 Oct 2013 17:20:23 +0200
 schrieb "Timo Sintonen" <t.sintonen luukku.com>:
 It seems that it is two different things here. As far as I 
 understand, sharing means something like 'somebody may change 
 my data' and volatility is something like 'I have to know 
 immediately if the data is changed'. It has become obvious 
 that these two are not easy to fit together and make a working 
 model.
 
 The original question in this thread was to have a proper way 
 to access hardware registers. So far, even the top people have 
 offered only workarounds. I wonder how long D can be marketed 
 as system language if it does not have a defined and reliable 
 way to access system hardware.
 
 Register access occurs often in time critical places like 
 interrupt routines. A library routine or external function is 
 not a choice. Whatever the feature is, it has to be built in 
 the language. I don't care if it is related to variables, 
 blocks or files as long as I do not have to put these files in 
 a separate  directory like I do now.
 
 I would like to hear more what would be the options. Then we 
 could make a decision what is the right way to go.
What's wrong with the solution Iain mentioned, i.e the way shared is implemented in GDC? http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2475.1382646532.1719.digitalmars-d:40puremagic.com http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2480.1382655175.1719.digitalmars-d:40puremagic.com
There is nothing wrong if it works. When I last discussed this with you and Iain, I do not remember whether this was mentioned. I had been under the impression that gdc has no solution. The second thing is, as I mentioned, that register access is such an important feature in a system language that it should be in the language specs. A quick search did not bring up any documentation about shared in general or how the gdc version is different. TDPL mentions only that shared guarantees the order of operations but does not mention anything about volatility. Can anybody point to any documentation?
Oct 25 2013
parent reply Johannes Pfau <nospam example.com> writes:
Am Fri, 25 Oct 2013 21:16:29 +0200
schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Friday, 25 October 2013 at 18:12:40 UTC, Johannes Pfau wrote:
 What's wrong with the solution Iain mentioned, i.e the way 
 shared
 is implemented in GDC?

 http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2475.1382646532.1719.digitalmars-d:40puremagic.com
 http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2480.1382655175.1719.digitalmars-d:40puremagic.com
There is nothing wrong if it works. When I last time discussed about this with you and Iain, I do not remember if this was mentioned. I have been on belief that gdc has no solution.
Yes, this was news to me as well.
 
 The second thing is, as I mentioned, that register access is such 
 an important feature in system language that it should be in 
 language specs.
 
 A quick search did not bring any documentation about shared in 
 general and how gdc version is different. TDPL mentions only that 
 shared guarantees the order of operations but does not mention 
 anything about volatility.
 Can anybody point to any documentation?
Well to be honest I don't think there's any kind of spec related to shared. This is still a very unspecified / fragile part of the language. (I totally agree though that it should be specified)
Oct 26 2013
next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 26 October 2013 12:41, Johannes Pfau <nospam example.com> wrote:
 Am Fri, 25 Oct 2013 21:16:29 +0200
 schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Friday, 25 October 2013 at 18:12:40 UTC, Johannes Pfau wrote:
 What's wrong with the solution Iain mentioned, i.e the way
 shared
 is implemented in GDC?

 http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2475.1382646532.1719.digitalmars-d:40puremagic.com
 http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2480.1382655175.1719.digitalmars-d:40puremagic.com
There is nothing wrong if it works. When I last time discussed about this with you and Iain, I do not remember if this was mentioned. I have been on belief that gdc has no solution.
Yes, this was news to me as well.
Was added about 3 years ago... https://github.com/D-Programming-GDC/GDC/commit/f87a03aa2dc619caf076174f857d4e299ce2bd8d And the type qualifier only got propagated to the declaration just over a year ago. https://github.com/D-Programming-GDC/GDC/commit/ce3e42c7283616e49728dac050f9fb090c94bfd0 -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Oct 26 2013
prev sibling next sibling parent reply "Timo Sintonen" <t.sintonen luukku.com> writes:
On Saturday, 26 October 2013 at 11:43:02 UTC, Johannes Pfau wrote:
 Am Fri, 25 Oct 2013 21:16:29 +0200
 schrieb "Timo Sintonen" <t.sintonen luukku.com>:

 On Friday, 25 October 2013 at 18:12:40 UTC, Johannes Pfau 
 wrote:
 What's wrong with the solution Iain mentioned, i.e the way 
 shared
 is implemented in GDC?

 http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2475.1382646532.1719.digitalmars-d:40puremagic.com
 http://forum.dlang.org/thread/bifrvifzrhgocrejepvc forum.dlang.org?page=4#post-mailman.2480.1382655175.1719.digitalmars-d:40puremagic.com
There is nothing wrong if it works. When I last time discussed about this with you and Iain, I do not remember if this was mentioned. I have been on belief that gdc has no solution.
Yes, this was news to me as well.
 
 The second thing is, as I mentioned, that register access is 
 such an important feature in system language that it should be 
 in language specs.
 
 A quick search did not bring any documentation about shared in 
 general and how gdc version is different. TDPL mentions only 
 that shared guarantees the order of operations but does not 
 mention anything about volatility.
 Can anybody point to any documentation?
Well to be honest I don't think there's any kind of spec related to shared. This is still a very unspecified / fragile part of the language. (I totally agree though that it should be specified)
Seems to work. I can mark every member as shared, or the whole struct. Not yet tested how it works with property functions or when there are tables or structs as members, but now I can move forward in my work. A little bit sad that the honored leader of the language still thinks that the right way to go is what we did with the Commodore 64...
Oct 26 2013
next sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Sat, 2013-10-26 at 14:49 +0200, Timo Sintonen wrote:
[…]
 A little bit sad that the honored leader of the language still 
 thinks that the right way to go is what we did with Commodore 
 64...
Not a good style of argument, since the way of the Commodore 64 might be a good one. It isn't, but it might have been. The core problem with peek and poke for writing device drivers is that hardware controllers do not just use byte structured memory for things, they use bit structures. So for data I/O, device->buffer = value value = device->buffer can be replaced easily with: poke(device->buffer, value) value = peek(device->buffer) but this doesn't work when you are using bitfields, you end up having to do all the ugly bit mask manipulation explicitly. Thus, what the equivalent of: device->csw.enable = 1 status = device->csw.ready is, is left to the imagination. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Oct 27 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/27/2013 1:31 AM, Russel Winder wrote:
 The core problem with peek and poke for writing device drivers is that
 hardware controllers do not just use byte structured memory for things,
 they use bit structures.

 So for data I/O,

 	device->buffer = value
 	value = device->buffer

 can be replaced easily with:

 	poke(device->buffer, value)
 	value = peek(device->buffer)

 but this doesn't work when you are using bitfields, you end up having to
 do all the ugly bit mask manipulation explicitly. Thus, what the
 equivalent of:

 	device->csw.enable = 1
 	status = device->csw.ready

 is, is left to the imagination.
Bitfield code generation for C compilers has generally been rather crappy. If you wanted performant code, you always had to do the masking yourself. I've written device drivers, and have designed, built, and programmed single board computers. I've never found dealing with the oddities of memory mapped I/O and bit flags to be of any difficulty. Do you really find & and | operations to be ugly? I don't find them any uglier than + and *. Maybe that's because of my hardware background.
Oct 27 2013
parent reply Russel Winder <russel winder.org.uk> writes:
On Sun, 2013-10-27 at 02:12 -0700, Walter Bright wrote:
[…]
 Bitfield code generation for C compilers has generally been rather crappy. If 
 you wanted performant code, you always had to do the masking yourself.
Endianism and packing have always been the bête noire of bitfields, since they are not part of the standard but left compiler-specific – essentially unavoidable given the vast difference in targets. Given a single compiler for a given target I never found the generated code poor. Using the UNIX compiler in the early 1980s and the AVR compiler suites we used in the 2000s, the generated code always seemed fine. What's your evidence for hand-crafted code being better than compiler-generated code?
 I've written device drivers, and have designed, built, and programmed single 
 board computers. I've never found dealing with the oddities of memory mapped
I/O 
 and bit flags to be of any difficulty.
But don't you find: *x = (1 << 7) | (1 << 9) to lead directly to the use of macros: SET_SOMETHING_READY(x) to hide the lack of immediacy of comprehension of the purpose of the expression?
 Do you really find & and | operations to be ugly? I don't find them any uglier 
 than + and *. Maybe that's because of my hardware background.
It's not the operations that are the problem, it is the expressions using them that lead to code that is the antithesis of self-documenting. Almost all code using <<, >>, & and | invariably ends up being replaced with macros in C and C++ so as to avoid using functions. The core point here is that this sort of code fails as soon as a function call is involved; functions cannot be used as a tool of abstraction, at least with C and C++. Clearly D has a USP over C and C++ here in that macros can be replaced by CTFE. But how do you guarantee that a function is fully evaluated at compile time and not allowed to generate a function call? Only then can functions be used instead of macros to make such code self-documenting. Much better to have a bitfield system that works, especially on architectures such as AVR where there are areas of bit-addressable memory. Although Intel only has word-accessible memory, not all architectures do. C (and thus C++) hacked a solution that worked fine for the one compiler with the PDP and VAX targets. It was only when there were multiple compilers and multiple targets that the problem arose. There is nothing really wrong with the C bitfield syntax; it was just that different compilers did different things for the same target. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Oct 28 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/28/2013 12:49 AM, Russel Winder wrote:
 On Sun, 2013-10-27 at 02:12 -0700, Walter Bright wrote:
 […]
 Bitfield code generation for C compilers has generally been rather crappy. If
 you wanted performant code, you always had to do the masking yourself.
Endianism and packing have always been the bête noire of bitfields, due to
their not being part of the standard but left compiler-specific – essentially
unavoidable given the vast difference in targets. Given a single compiler for
a given target I never found the generated code poor. Using the UNIX compiler
in the early 1980s and the AVR compiler suites we used in the 2000s, the
generated code always seemed fine. What's your evidence for hand-crafted code
being better than compiler-generated code?
Generally the shifting is unnecessary, but the compiler doesn't know that, as
the spec says the values need to be right-justified. Also, I often
set/reset/test many fields at once - that doesn't work too well with
bitfields.

Endianism should not be an issue if you're dealing with MMIO, since MMIO is
going to be extremely target-specific and hence so is your code to deal
with it.
 I've written device drivers, and have designed, built, and programmed single
 board computers. I've never found dealing with the oddities of memory mapped
 I/O and bit flags to be of any difficulty.
But don't you find:

	*x = (1 << 7) | (1 << 9)

to lead directly to the use of macros:

	SET_SOMETHING_READY(x)

to hide the lack of immediacy of comprehension of the purpose of the
expression?
My bit code usually looks like:

    x |= FLAG_X | FLAG_Y;
    x &= ~(FLAG_Y | FLAG_Z);
    if (x & (FLAG_A | FLAG_B)) ...

You'll find stuff like that all through the dmd source code :-)
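[The idiom Walter describes compiles as-is in D once the flags are declared;
the flag values below are made up purely to make the sketch runnable.]

```d
// Made-up flag values illustrating the mask idiom above.
enum uint FLAG_X = 1u << 0;
enum uint FLAG_Y = 1u << 1;
enum uint FLAG_Z = 1u << 2;
enum uint FLAG_A = 1u << 3;
enum uint FLAG_B = 1u << 4;

void main()
{
    uint x = FLAG_Z | FLAG_A;

    x |= FLAG_X | FLAG_Y;     // set several flags in one store
    x &= ~(FLAG_Y | FLAG_Z);  // clear several flags in one store

    assert(x == (FLAG_X | FLAG_A));
    assert((x & (FLAG_A | FLAG_B)) != 0);  // test any of several flags
}
```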
 Do you really find & and | operations to be ugly? I don't find them any uglier
 than + and *. Maybe that's because of my hardware background.
It's not the operations that are the problem, it is the expressions using
them that lead to code that is the antithesis of self-documenting. Almost
all code using <<, >>, & and | invariably ends up being replaced with macros
in C and C++ so as to avoid using functions. The core point here is that
this sort of code fails as soon as a function call is involved; functions
cannot be used as a tool of abstraction. At least with C and C++.
I thought that with modern inlining, this was no longer an issue.
 Clearly D has a USP over C and C++ here in that macros can be replaced
 by CTFE. But how to guarantee that a function is fully evaluated at
 compile time and not allowed to generate a function call. Only then can
 functions be used instead of macros to make such code self documenting.
    enum X = foo(args);

guarantees that foo(args) is evaluated at compile time. I.e. any context that
requires a value at compile time guarantees that it will be evaluated at
compile time. If the value is not required at compile time, the compiler will
not attempt CTFE on it.
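[A small self-contained illustration of that rule. The bitMask helper is
hypothetical, not a library function; the point is that the enum initializer
forces CTFE while the run-time call does not.]

```d
// An ordinary pure function, usable both at compile time and at run time.
uint bitMask(uint lo, uint hi) pure nothrow @nogc
{
    uint m = 0;
    foreach (b; lo .. hi + 1)
        m |= 1u << b;
    return m;
}

// An enum initializer is a compile-time context, so this is CTFE'd:
// no call to bitMask() survives into the generated code.
enum uint CONTROL_MASK = bitMask(7, 9);

void main()
{
    assert(CONTROL_MASK == 0x380);  // bits 7..9
    uint runtime = bitMask(0, 3);   // run-time context: CTFE not attempted
    assert(runtime == 0xF);
}
```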
Oct 28 2013
prev sibling parent Russel Winder <russel winder.org.uk> writes:
On Sat, 2013-10-26 at 14:49 +0200, Timo Sintonen wrote:
[…]
 A little bit sad that the honored leader of the language still 
 thinks that the right way to go is what we did with Commodore 
 64...
Not a good style of argument, since the way of the Commodore 64 might be
a good one. It isn't, but it might have been.

The core problem with peek and poke for writing device drivers is that
hardware controllers do not just use byte-structured memory for things,
they use bit structures. So for data I/O:

	device->buffer = value
	value = device->buffer

can be replaced easily with:

	poke(device->buffer, value)
	value = peek(device->buffer)

but this doesn't work when you are using bitfields; you end up having to
do all the ugly bit-mask manipulation explicitly. Thus, what the
equivalent of:

	device->csw.enable = 1
	status = device->csw.ready

is, is left to the imagination.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
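[In D, the csw.enable-style accesses can be recovered without bitfields by
pairing named masks with the volatileLoad/volatileStore intrinsics that
druntime later gained. The register layout below is invented, and an ordinary
variable stands in for real MMIO so the sketch can run anywhere.]

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical layout: "enable" at bit 0, "ready" at bit 1 of a 32-bit
// control/status word.
enum uint CSW_ENABLE = 1u << 0;
enum uint CSW_READY  = 1u << 1;

// Named accessors so call sites read like device->csw.enable = 1,
// while every access goes through a volatile load/store.
void setEnable(uint* csw)
{
    volatileStore(csw, volatileLoad(csw) | CSW_ENABLE);
}

bool isReady(uint* csw)
{
    return (volatileLoad(csw) & CSW_READY) != 0;
}

void main()
{
    // Stand-in for a real memory-mapped register, which on hardware
    // would be something like: auto csw = cast(uint*) 0x4000_0000;
    uint fake = CSW_READY;
    setEnable(&fake);
    assert(fake == (CSW_ENABLE | CSW_READY));
    assert(isReady(&fake));
}
```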
Oct 27 2013
prev sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Saturday, 26 October 2013 at 11:43:02 UTC, Johannes Pfau wrote:
 Well to be honest I don't think there's any kind of spec 
 related to
 shared. This is still a very unspecified / fragile part of the 
 language.

 (I totally agree though that it should be specified)
I agree, and thus I think it's dangerous at best and harmful at worst to
make any recommendations to use shared for anything but a mere type tag
(with no intrinsic meaning) at the moment.

LDC certainly does not ascribe any special meaning to shared variables, and
last time I checked, DMD didn't make any of the guarantees discussed here
either.

David
Oct 27 2013