D - Volatile
- Jim Starkey (56/56) Mar 21 2002 Please pardon my ignorance if this has been hashed and re-hashed. I
- Walter (8/47) Mar 21 2002 They'd have to be implemented with mutexes anyway, so might as well just
- Serge K (11/18) Mar 21 2002 "volatile" does not mean "atomic" or even "synchronized".
- Walter (11/29) Mar 21 2002 Volatile
- Stephen Fuld (24/36) Mar 22 2002 just
- Walter (27/46) Mar 22 2002 register
- Stephen Fuld (20/66) Mar 22 2002 is
- Walter (24/34) Mar 22 2002 the
- Stephen Fuld (31/67) Mar 22 2002 that
- Walter (19/55) Mar 26 2002 can
- Stephen Fuld (36/95) Mar 27 2002 Sure.
- Pavel Minayev (5/7) Mar 27 2002 know
- OddesE (11/18) Mar 27 2002 Yeah I loved it!
- Walter (10/20) Mar 26 2002 not
- Russ Lewis (18/24) Mar 26 2002 Not a bad idea, although I don't like the idea that it removes ALL cachi...
- Walter (15/36) Mar 26 2002 caching.
- Richard Krehbiel (29/55) Mar 27 2002 charset="Windows-1252"
- Walter (25/25) Mar 31 2002 charset="Windows-1252"
- Stephen Fuld (27/47) Mar 27 2002 to
- OddesE (13/19) Mar 27 2002 "Stephen Fuld" wrote in message
- Stephen Fuld (10/31) Mar 27 2002 a
- OddesE (15/42) Mar 28 2002 be
- Richard Krehbiel (12/17) Mar 26 2002 going
- Serge K (4/7) Mar 22 2002 You should try Visual C++ for Alpha.
- Walter (8/15) Mar 22 2002 optimize
- Karl Bochert (22/26) Mar 23 2002 Watcom has a form of asm that allows optimization.
- Pavel Minayev (5/10) Mar 23 2002 AFAIK, D chooses calling convention on its own, and might use
- Karl Bochert (21/37) Mar 23 2002 To quote from a message on the Euphoria newsgroup
- Sean L. Palmer (13/19) Mar 25 2002 Watcom did run circles around the competition back in the day. GCC's in...
- Walter (12/31) Mar 26 2002 wrote:
- Jim Starkey (36/45) Mar 22 2002 No, it neither necessary nor desirable to use mutexes. Yes, there are
- Walter (12/52) Mar 22 2002 Writes to bytes and aligned words/dwords are done atomically by the CPU,
Please pardon my ignorance if this has been hashed and re-hashed. I just got a pointer to D from another list, came over for a quick look-see, and liked what I saw. So I thought I'd toss in a few thoughts.

I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning. It isn't necessary in a JVM because those types of optimization can be expressed in byte codes, although it does limit what a JIT compiler can do. D is intended for real compilation, however, and when the instruction set guys give us enough registers, the compiler is going to want to stick intermediates in them. Without volatile, this ain't gonna work.

That said, the C concept of a volatile declaration doesn't go far enough. While it does warn the compiler that an unexpected change in value is fair game, it doesn't tell the compiler when or if to generate multi-process safe instruction sequences.

The obvious response is that data structures should be protected by a mutex or synchronize. The problem is that these are vastly too expensive to use in a tight, fine-grained multi-thread application. Modern multi-processors do a wonderful job of implementing processor interlocked atomic instructions. Modern OSes do a reasonable job of scheduling threads on multi-processors. Modern languages, however, do a rotten job of giving us the primitives to exploit these environments. Yeah, I know I can write an inline "lock xsub decl" yada yada yada. But it's painful and non-portable. And we all know that writing assembler rots the soul.

So, guys, I would like the following:

1. A volatile declaration so the compiler can do smart things while I do fast things.

2. A "volatile volatile" declaration or distinct operator or operator modifier to tell the compiler to use a processor interlocked instruction sequence OR give me a compile time error explaining why it can't.

There are probably smarter ways to do this than a volatile declaration. But something is needed in that niche. Or, alternatively, I could have my head thoroughly wedged. But I'll take on all comers until that is so obvious that I can see it myself.
Mar 21 2002
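An illustrative sketch (not from the thread) of the interlocked use-count decrement Jim describes, written with D's inline assembler. The function, variable name, and 32-bit x86 target are assumptions made for the example, and the exact inline-asm spelling of the lock prefix may differ between compilers:

    // Sketch: atomically decrement a shared use count without taking a mutex.
    // Assumes 32-bit x86 and DMD-style inline asm; details are illustrative only.
    void release(int* count)
    {
        asm
        {
            mov   EAX, count;        // EAX = address of the shared counter
            lock;                    // bus-lock the following instruction
            dec   dword ptr [EAX];   // atomic decrement, no OS call involved
        }
    }

The point of the sketch is only the cost model: the lock prefix adds roughly one extra bus cycle, whereas a mutex acquire/release is far more expensive.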
"Jim Starkey" <jas netfrastructure.com> wrote in message news:3C9A43BC.AFBA03BA netfrastructure.com...I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.That said, the C concept of volatile declaration doesn't go far enough. While it does warn the compiler that an unexpected change is value is fair game, it doesn't tell the compiler when or if to generate multi-process safe instruction sequences.I agree that the C definition of volatile is next to useless.The obvious response is that data structures should be protected by a mutex or synchronize. The problem is that these are vastly too expensive to use in a tight, fine-grained multi-thread application. Modern multi-processors do a wonderful job of implementing processor interlocked atomic instructions. Modern OSes do a reasonable job of scheduling threads on multi-processors. Modern language, however, do a rotten job of giving the primitives to exploit these environments. Yeah, I know I can write an inline "lock xsub decl" yada yada yada. But it's painful and non-portable. And we all know that writing assembler rots the soul. So, guys, I would like the following: 1. A volatile declaration so the compiler can do smart things while I do fast things. 2. A "volatile volatile" declaration or distinct operator or operator modified to tell the compiler to use an processor interlock instruction sequence OR give me a compile time error why it can't. There are probably smarter ways to do this than a volatile declaration. But something is needed in that niche.You're wrong, writing assembler puts one into a State of Grace <g>.
Mar 21 2002
"volatile" does not mean "atomic" or even "synchronized". It's just an indication that some variable in the memory can be changed from "outside". And nobody cares when *exactly* it happens, as long as it happens. For example: by another thread on the same processor. => everything is in the same cache - no problem here. by another processor, or any other hardware (DMA, ...) => any modern processor has support for cache coherency (MESI or better), in fact - it's a "must" thing for any processor with the cache. - no problem there. (..even i486 had it..)I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".I agree that the C definition of volatile is next to useless.Is it?
Mar 21 2002
"Serge K" <skarebo programmer.net> wrote in message news:a7e2kc$17qp$1 digitaldaemon.com...VolatileI notice there is no support for volatile, which perplexes me.It does in Java, which to me makes it more useful than C's notion of "don't put it in a register"."volatile" does not mean "atomic" or even "synchronized".is necessary to warn an optimizer that another thread may change a data item without warning.They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".It's just an indication that some variable in the memory can be changedfrom "outside".And nobody cares when *exactly* it happens, as long as it happens. For example: by another thread on the same processor. => everything is in the same cache - no problem here. by another processor, or any other hardware (DMA, ...) => any modern processor has support for cache coherency (MESI or better), in fact - it's a "must" thing for any processorwith the cache.- no problem there. (..even i486 had it..)If you are writing to, say, a long, the long will be two write cycles. In between those two, another thread could change part of it, resulting in a scrambled write.Since it does not guarantee atomic writes, yes, I believe it is useless.I agree that the C definition of volatile is next to useless.Is it?
Mar 21 2002
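A small sketch of the torn-write hazard Walter describes; the names and values are invented, and present-day D syntax (__gshared) is used only so the variable is genuinely shared between threads:

    // On 32-bit x86, a 64-bit store compiles to two separate 32-bit writes.
    __gshared ulong sharedValue = 0;

    void writerThread()
    {
        sharedValue = 0x11111111_22222222;   // two bus cycles, not atomic
    }

    void readerThread()
    {
        // A concurrent read may observe 0x11111111_00000000 or
        // 0x00000000_22222222 -- a value neither thread ever wrote.
        // "volatile" alone cannot prevent this; only an atomic 64-bit
        // store or a lock around the access can.
        ulong seen = sharedValue;
    }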
"Walter" <walter digitalmars.com> wrote in message news:a7entf$1hik$2 digitaldaemon.com..."Serge K" <skarebo programmer.net> wrote in message news:a7e2kc$17qp$1 digitaldaemon.com...justVolatileI notice there is no support for volatile, which perplexes me.is necessary to warn an optimizer that another thread may change a data item without warning.They'd have to be implemented with mutexes anyway, so might as well"don'tIt does in Java, which to me makes it more useful than C's notion ofwrap them in "synchronized"."volatile" does not mean "atomic" or even "synchronized".put it in a register".This is necessary in many embedded systems, even when they are single threaded and even some operating system applications. For example, it is common in embedded systems to have external hardware be made visible by memory mapping the external hardware registers into the process memory space. This makes it easy to use standard syntax to manipulate the register and is the only way to implement I/O on some processors. However, you can't let the CPU keep the "data" in a CPU register or it won't work. For example, an update to the register has to actually go to the external register to be effective. It doesn't accomplish anything to update the copy in a CPU register without doing the store as the external hardware might not see it for a long time. Similarly, of course, these external registers can change their contents as the state of the external hardware changes (For example, a status register showing the completion of some external operation.) You can't let the data be stay in a register as subsequent reads, in say a polling loop, wouldn't go to the actual hardware, or worse yet, even be "optimized" away altogether. Note that this is a different issue than cache coherence. -- - Stephen Fuld e-mail address disguised to prevent spam
Mar 22 2002
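A rough sketch of the memory-mapped polling loop Stephen has in mind, in D; the register address and bit mask are hypothetical, and -- this being exactly the point under discussion -- nothing in the language stops an optimizer from hoisting the load out of the loop:

    enum size_t STATUS_REG_ADDR = 0xFFFF_F000;  // hypothetical device register address
    enum uint   DONE_BIT        = 0x0000_0001;  // hypothetical "operation complete" bit

    void waitForCompletion()
    {
        uint* status = cast(uint*) STATUS_REG_ADDR;

        // Every iteration must re-read the hardware register; if the compiler
        // keeps *status in a CPU register, the loop spins forever on a stale value.
        while ((*status & DONE_BIT) == 0)
        {
            // spin until the device reports completion
        }
    }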
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7fq36$2uha$1 digitaldaemon.com...registerIt does in Java, which to me makes it more useful than C's notion of"don't put it in a register". This is necessary in many embedded systems, even when they are single threaded and even some operating system applications. For example, it is common in embedded systems to have external hardware be made visible by memory mapping the external hardware registers into the process memory space. This makes it easy to use standard syntax to manipulate theand is the only way to implement I/O on some processors. However, youcan'tlet the CPU keep the "data" in a CPU register or it won't work. For example, an update to the register has to actually go to the external register to be effective. It doesn't accomplish anything to update thecopyin a CPU register without doing the store as the external hardware mightnotsee it for a long time. Similarly, of course, these external registerscanchange their contents as the state of the external hardware changes (For example, a status register showing the completion of some external operation.) You can't let the data be stay in a register as subsequent reads, in say a polling loop, wouldn't go to the actual hardware, or worse yet, even be "optimized" away altogether. Note that this is a different issue than cache coherence.I understand what you mean. It's still problematic how that actually winds up being implemented in the compiler. C doesn't really define how many reads are done to an arbitrary expression in order to implement it, for example: j = i++; How many times is i read? Once or twice? mov eax, i inc i mov j, eax or: mov eax, i mov j, eax inc eax mov i, eax These ambiguities to me mean that if you need precise control over memory read and write cycles, the appropriate thing to use is the inline assembler. Volatile may happen to work, but to my mind is unreliable and may change behavior from compiler to compiler. BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.
Mar 22 2002
"Walter" <walter digitalmars.com> wrote in message news:a7ft6h$1ccq$1 digitaldaemon.com..."Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7fq36$2uha$1 digitaldaemon.com...isIt does in Java, which to me makes it more useful than C's notion of"don't put it in a register". This is necessary in many embedded systems, even when they are single threaded and even some operating system applications. For example, itworsecommon in embedded systems to have external hardware be made visible by memory mapping the external hardware registers into the process memory space. This makes it easy to use standard syntax to manipulate theregisterand is the only way to implement I/O on some processors. However, youcan'tlet the CPU keep the "data" in a CPU register or it won't work. For example, an update to the register has to actually go to the external register to be effective. It doesn't accomplish anything to update thecopyin a CPU register without doing the store as the external hardware mightnotsee it for a long time. Similarly, of course, these external registerscanchange their contents as the state of the external hardware changes (For example, a status register showing the completion of some external operation.) You can't let the data be stay in a register as subsequent reads, in say a polling loop, wouldn't go to the actual hardware, orreadsyet, even be "optimized" away altogether. Note that this is a different issue than cache coherence.I understand what you mean. It's still problematic how that actually winds up being implemented in the compiler. C doesn't really define how manyare done to an arbitrary expression in order to implement it, for example: j = i++; How many times is i read? Once or twice? mov eax, i inc i mov j, eax or: mov eax, i mov j, eax inc eax mov i, eax These ambiguities to me mean that if you need precise control over memory read and write cycles, the appropriate thing to use is the inlineassembler.Volatile may happen to work, but to my mind is unreliable and may change behavior from compiler to compiler. BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can stilloptimizethe surrounding code, unlike any other inline implementation I'm aware of.While I agree that you can use inline asm, and there are ways to code that could cause trouble, in practice, it works pretty well. People don't do things like post increment external registers when reading them. I know the syntax allows it, but programmers, especially embedded programmers learn pretty quickly what things to do and what not to do with the hardware they have. In practice, most uses of stuff like this is to read the whole register and test some bits or extract a field, or to create a word with the desired contents and write it in one piece to the external register. So, while volatile isn't a complete solution, it avoids having to delve into asm for the vast majority of such uses. -- - Stephen Fuld e-mail address disguised to prevent spam
Mar 22 2002
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7g4uv$2q8r$1 digitaldaemon.com...While I agree that you can use inline asm, and there are ways to code that could cause trouble, in practice, it works pretty well. People don't do things like post increment external registers when reading them. I knowthesyntax allows it, but programmers, especially embedded programmers learn pretty quickly what things to do and what not to do with the hardware they have. In practice, most uses of stuff like this is to read the whole register and test some bits or extract a field, or to create a word withthedesired contents and write it in one piece to the external register. So, while volatile isn't a complete solution, it avoids having to delve intoasmfor the vast majority of such uses.Wouldn't it be better to have a more reliable method than trial and error? Trial and error is subject to subtle changes if a new compiler is used. I also wish to point out that volatile permeates the typing system in a C/C++ compiler. There is a great deal of code to keep everything straight in the contexts of overloading, casting, type copying, etc. I don't see why volatile is that necessary for hardware registers. You can still easilly read a hardware register by setting a pointer to it and going *p. The compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem). Any reads through a pointer are not cached across any assignments through a pointer, including any function calls (again, due to the aliasing problem). For example, the second read of *p will not get cached away: x = *p; // first read func(); // call function to prevent caching of pointer results y = *p; // second read func() can simply consist of RET. To do, say, a spin lock on *p: while (*p != value) func();
Mar 22 2002
"Walter" <walter digitalmars.com> wrote in message news:a7gfrs$35e$1 digitaldaemon.com..."Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7g4uv$2q8r$1 digitaldaemon.com...thatWhile I agree that you can use inline asm, and there are ways to codetheycould cause trouble, in practice, it works pretty well. People don't do things like post increment external registers when reading them. I knowthesyntax allows it, but programmers, especially embedded programmers learn pretty quickly what things to do and what not to do with the hardwareSo,have. In practice, most uses of stuff like this is to read the whole register and test some bits or extract a field, or to create a word withthedesired contents and write it in one piece to the external register.Of course! :-)while volatile isn't a complete solution, it avoids having to delve intoasmfor the vast majority of such uses.Wouldn't it be better to have a more reliable method than trial and error?Trial and error is subject to subtle changes if a new compiler is used.Yes.I also wish to point out that volatile permeates the typing system in a C/C++ compiler. There is a great deal of code to keep everything straightinthe contexts of overloading, casting, type copying, etc.I'll take your word for what is required within the compiler. I'm a compiler user, not a designer.I don't see why volatile is that necessary for hardware registers. You can still easilly read a hardware register by setting a pointer to it andgoing*p.Sure. But I am trying, as I thought you were with D, trying to minimize/eliminate the use of pointers in the source code as a major source of error.The compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem).Again, I am not a compiler designer, but "very very hard" implies that it isn't impossible and therefore, some future compiler *could* do it and thus breaking code as you described the problem above. :-(Any reads through a pointer are not cached across any assignments through a pointer, including any function calls (again, due to the aliasing problem). For example, the second read of *p will not get cached away: x = *p; // first read func(); // call function to prevent caching of pointer results y = *p; // second read func() can simply consist of RET. To do, say, a spin lock on *p: while (*p != value) func();Oh, that's intuitive! :-( Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'm not wedded to the "volatile" syntax and certainly not wedded to how C does things. I was just pointing out, for those who have never done embedded programming, a major reason for that syntax. If you can come up with a better solution (I guess I don't count the ones you have proposed so far to be better.) than I am all for it. You have showed such immagination in solving other C/C++ deficiencies that I have reason to hope you can solve this one elegantly. - Sorry to put you on the spot. :-) -- - Stephen Fuld e-mail address disguised to prevent spam
Mar 22 2002
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7gom0$96n$1 digitaldaemon.com..."Walter" <walter digitalmars.com> wrote in message news:a7gfrs$35e$1 digitaldaemon.com...canI don't see why volatile is that necessary for hardware registers. Yousourcestill easilly read a hardware register by setting a pointer to it andgoing*p.Sure. But I am trying, as I thought you were with D, trying to minimize/eliminate the use of pointers in the source code as a majorof error.Pointers are still in D, for the reason that sometimes you just gotta have them. Minimizing them is a design goal, though. Also, to access hardware registers, you're going to need pointers because there is no way to specify absolute addresses for variables.toThe compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, duethusthe aliasing problem).Again, I am not a compiler designer, but "very very hard" implies that it isn't impossible and therefore, some future compiler *could* do it andbreaking code as you described the problem above. :-(To make it impossible just have the pointer set in a function that the compiler doesn't know about.toAny reads through a pointer are not cached across any assignments through a pointer, including any function calls (again, dueresultsthe aliasing problem). For example, the second read of *p will not get cached away: x = *p; // first read func(); // call function to prevent caching of pointernoty = *p; // second read func() can simply consist of RET. To do, say, a spin lock on *p: while (*p != value) func();Oh, that's intuitive! :-( Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'mwedded to the "volatile" syntax and certainly not wedded to how C does things. I was just pointing out, for those who have never done embedded programming, a major reason for that syntax. If you can come up with a better solution (I guess I don't count the ones you have proposed so fartobe better.) than I am all for it.Yeah, I understand it isn't the greatest, but it'll work reliably. I also happen to be fond of inline assembler when dealing with hardware <g>.You have showed such immagination in solving other C/C++ deficiencies that I have reason to hope you can solve this one elegantly.Ahem. I'm on to that tactic!
Mar 26 2002
"Walter" <walter digitalmars.com> wrote in message news:a7qh13$15fb$1 digitaldaemon.com..."Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7gom0$96n$1 digitaldaemon.com...Sure."Walter" <walter digitalmars.com> wrote in message news:a7gfrs$35e$1 digitaldaemon.com...canI don't see why volatile is that necessary for hardware registers. Yousourcestill easilly read a hardware register by setting a pointer to it andgoing*p.Sure. But I am trying, as I thought you were with D, trying to minimize/eliminate the use of pointers in the source code as a majorof error.Pointers are still in D, for the reason that sometimes you just gotta have them.Minimizing them is a design goal, though.And a worthy one.Also, to access hardware registers, you're going to need pointers because there is no way tospecifyabsolute addresses for variables.Well, you could change that and eliminate one more use of pointers. I know of at least one language that allows the specification of absolute addresses for variables. You have to be careful when to allow/implement it, but it seems to work well. Some versions of the compiler (like the one given to students) just ignore the extra specification, but there are versions (you could use options) to support this. Another way to do it is to honor the requests but make the addresses program absolute and rely on the linker and other external things like the loader (or Prom/flash) burne to make them truely absolute. BTW, their syntax is varname type addressvery,The compiler isn't going to skip the write to it through *p (it'sduevery hard for a C optimizer to remove dead stores through pointers,toitthe aliasing problem).Again, I am not a compiler designer, but "very very hard" implies thatYes, but that is another "work around" that just doesn't seem "natural" Adding extra requirements that the programmer needs to know about in order to "trick" the compiler into doing the right thing are IMNSHO, not the right way to go.isn't impossible and therefore, some future compiler *could* do it andthusbreaking code as you described the problem above. :-(To make it impossible just have the pointer set in a function that the compiler doesn't know about.dueAny reads through a pointer are not cached across any assignments through a pointer, including any function calls (again,toAgreed. I am working toward comming up with "the greatest" solution. :-)resultsthe aliasing problem). For example, the second read of *p will not get cached away: x = *p; // first read func(); // call function to prevent caching of pointernoty = *p; // second read func() can simply consist of RET. To do, say, a spin lock on *p: while (*p != value) func();Oh, that's intuitive! :-( Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'mwedded to the "volatile" syntax and certainly not wedded to how C does things. I was just pointing out, for those who have never done embedded programming, a major reason for that syntax. If you can come up with a better solution (I guess I don't count the ones you have proposed so fartobe better.) than I am all for it.Yeah, I understand it isn't the greatest, but it'll work reliably.I also happen to be fond of inline assembler when dealing with hardware <g>.An affliction that I am afraid is chronic, and probably not curable. :-) As I believe that the purpose of a high level language is to minimize the use of assembler, I am not so afflicted. 
You can always drop to assembler, but that is precisely what we are trying to avoid as much as possible.solveYou have showed such immagination in solving other C/C++ deficiencies that I have reason to hope you canBut, based on your next post about "sequential", it seems to have worked :-) (I'll respond to that post there.) That is my goal here. To promote discussion on variious ways of solving the problems in order for the best one to come out. -- - Stephen Fuld e-mail address disguised to prevent spamthis one elegantly.Ahem. I'm on to that tactic!
Mar 27 2002
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7t4c6$2jfd$1 digitaldaemon.com...Well, you could change that and eliminate one more use of pointers. Iknowof at least one language that allows the specification of absoluteaddresses Borland Pascal had it. It was great for low-level programming, indeed.
Mar 27 2002
"Pavel Minayev" <evilone omen.ru> wrote in message news:a7tdf1$2o5m$1 digitaldaemon.com..."Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7t4c6$2jfd$1 digitaldaemon.com...Yeah I loved it! Also great for addressing BIOS vars and VGA memory (in the old DOS days)... :) -- Stijn OddesE_XYZ hotmail.com http://OddesE.cjb.net _________________________________________________ Remove _XYZ from my address when replying by mailWell, you could change that and eliminate one more use of pointers. Iknowof at least one language that allows the specification of absoluteaddresses Borland Pascal had it. It was great for low-level programming, indeed.
Mar 27 2002
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7gom0$96n$1 digitaldaemon.com...notOh, that's intuitive! :-( Add an extra empty function call in order toprevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'mwedded to the "volatile" syntax and certainly not wedded to how C does things. I was just pointing out, for those who have never done embedded programming, a major reason for that syntax. If you can come up with a better solution (I guess I don't count the ones you have proposed so fartobe better.) than I am all for it. You have showed such immagination in solving other C/C++ deficiencies that I have reason to hope you can solve this one elegantly.I did have a thought. How about a keyword "sequence", as in: sequence; // no caching across this keyword x = *p; // *p is always reloaded and: x = *p; sequence; // *p is not cached
Mar 26 2002
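For comparison only: present-day D (which postdates this thread) has no "sequence" keyword, but core.atomic.atomicFence plays a similar role as a point no memory access may be cached or reordered across. A hedged sketch, with invented names:

    import core.atomic : atomicFence;

    void poll(int* p)
    {
        int x1 = *p;      // first read
        atomicFence();    // roughly the proposed "sequence": a full barrier
        int x2 = *p;      // *p is re-read rather than reused from a register
    }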
Walter wrote:
> I did have a thought. How about a keyword "sequence", as in:
>
>     sequence;  // no caching across this keyword
>     x = *p;    // *p is always reloaded
>
> and:
>     x = *p;
>     sequence;  // *p is not cached

Not a bad idea, although I don't like the idea that it removes ALL caching. How about also adding a block syntax, where caching is only disabled on the statements in the block:

    y = *q;
    sequence { x = *p; }  // *p is NOT cached
    func(*q);             // *q is still cached

Pardon me if I'm being anal, but it seems like we should make 'sequence' impact as few lines of code as possible, so you can still mix good optimization into the same code block.

Of course, somebody's going to say (for their hardware registers) that they will have to add 'sequence' to every line that uses the register, and they're going to ask for a 'sequence' type modifier...and we're back to volatile. :(

--
The Villagers are Online! villagersonline.com
.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
Mar 26 2002
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message news:3CA0F98A.265E2A99 deming-os.org...Walter wrote:caching.I did have a thought. How about a keyword "sequence", as in: sequence; // no caching across this keyword x = *p; // *p is always reloaded and: x = *p; sequence; // *p is not cachedNot a bad idea, although I don't like the idea that it removes ALLHow about also adding a block syntax, where caching is only disabled onthestatements in the block: y = *q; sequence { x = *p; }// *p is NOT cached func(*q); // *q is still cached Pardon me if I'm being anal, but it seems like we should make 'sequence'impactas few lines of code as possible, so you can still mix good optimizationintothe same code block.Sequence won't affect enregistering variables, which is the big speed win, not caching. I think it will have a negligible affect on performance. Sequence fits nicely into the optimizer, because a special op is just inserted into the instruction stream that causes a 'kill' in the data flow analysis.Of course, somebody's going to say (for their hardware registers) thattheywill have to add 'sequence' to every line that uses the register, andthey'regoing to ask for a 'sequence' type modifier...and we're back to volatile.:( Nobody's ever happy <g>.
Mar 26 2002
charset="Windows-1252" Content-Transfer-Encoding: quoted-printable (Apology: This message is HTML so a massive link might still be = clickable.) "Walter" <walter digitalmars.com> wrote in message = news:a7qrji$1bnv$1 digitaldaemon.com...=20 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7gom0$96n$1 digitaldaemon.com...order toOh, that's intuitive! :-( Add an extra empty function call in =Uccccch!prevent the compiler from doing some undesirable optimization. =I'mThere has got to be a better way to address the problem than this. =notdoeswedded to the "volatile" syntax and certainly not wedded to how C =embeddedthings. I was just pointing out, for those who have never done =with aprogramming, a major reason for that syntax. If you can come up =farbetter solution (I guess I don't count the ones you have proposed so =toinbe better.) than I am all for it. You have showed such immagination =solvesolving other C/C++ deficiencies that I have reason to hope you can =This reminded me of something, so I did a quick Google search. Go read a Linux Torvalds rant about SMP-safety, volatile, and = "barrier()" (which is the Linux kernel's equivalent of "sequence"). And = much of the thread is interesting, so I'm linking the whole thing (with = this massive link - sorry). http://groups.google.com/groups?hl=3Den&threadm=3Dlinux.kernel.Pine.LNX.4= .33.0107231546430.7916-100000%40penguin.transmeta.com&rnum=3D5&prev=3D/gr= oups%3Fq%3Dtorvalds%2Btransmeta%2Bbarrier%26hl%3Den Boiled down, Torvalds believes that "volatile" as a storage class = modifier is always wrong; if "volatile" semantics (whatever they are) = are needed, then apply them at the moment of access (as with a cast). --=20 Richard Krehbiel, Arlington, VA, USA rich kastle.com (work) or krehbiel3 comcast.net (personal)this one elegantly.=20 I did have a thought. How about a keyword "sequence", as in: =20 sequence; // no caching across this keyword x =3D *p; // *p is always reloaded =20 and: x =3D *p; sequence; // *p is not cached =20 =20
Mar 27 2002
charset="Windows-1252" Content-Transfer-Encoding: quoted-printable That's a great link! Thanks. Interestingly, Linus appears to have come = to the same conclusion about volatile I did: "But the fact is, that when you add "volatile" to the register, it really tells gcc "Be afraid. Be very afraid. This user expects some random behaviour that is not actually covered by any standard, so just don't ever use this variable for any optimizations, = even if they are obviously correct. That way he can't complain". -Linus "Richard Krehbiel" <rich kastle.com> wrote in message = news:a7secs$27fl$1 digitaldaemon.com... (Apology: This message is HTML so a massive link might still be = clickable.) Go read a Linux Torvalds rant about SMP-safety, volatile, and = "barrier()" (which is the Linux kernel's equivalent of "sequence"). And = much of the thread is interesting, so I'm linking the whole thing (with = this massive link - sorry). = http://groups.google.com/groups?hl=3Den&threadm=3Dlinux.kernel.Pine.LNX.4= .33.0107231546430.7916-100000%40penguin.transmeta.com&rnum=3D5&prev=3D/gr= oups%3Fq%3Dtorvalds%2Btransmeta%2Bbarrier%26hl%3Den Boiled down, Torvalds believes that "volatile" as a storage class = modifier is always wrong; if "volatile" semantics (whatever they are) = are needed, then apply them at the moment of access (as with a cast).
Mar 31 2002
"Walter" <walter digitalmars.com> wrote in message news:a7qrji$1bnv$1 digitaldaemon.com..."Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7gom0$96n$1 digitaldaemon.com...toOh, that's intuitive! :-( Add an extra empty function call in ordersolveprevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'mnotwedded to the "volatile" syntax and certainly not wedded to how C does things. I was just pointing out, for those who have never done embedded programming, a major reason for that syntax. If you can come up with a better solution (I guess I don't count the ones you have proposed so fartobe better.) than I am all for it. You have showed such immagination in solving other C/C++ deficiencies that I have reason to hope you canI think the fundamental question is whether the "non registerability" should be a property of the variable (that is, "volatile") or of the particular access to the variable (that is, "sequence"). I guess there are two types of situations where this functionality is required, variables shared among multiple threads and physical hardware registers. For the latter, since we are talking about a direct, one to one relationship between a variable and a particular piece of physical hardware, I think it is clearly a property of the variable itself. For the former, I guess it it could be considered either. But in practical terms, since one thread can't know when another thread is going to access the variable, you probably don't want the variable living in a register for any significant length of time, and probably want a simple locking mechanism as well. So I guess I come down on the side of making it a property of the variable, not the particular access. I think that will reduce source program size, eliminate the class of bugs that might occur for someone "forgetting" to put in the sequence keyword, etc. The lock mechanism is a separate issue, but I do believe there should be a defined access to the low cost locks offerred by atomic instructions in most architectures. -- - Stephen Fuld e-mail address disguised to prevent spamthis one elegantly.I did have a thought. How about a keyword "sequence", as in: sequence; // no caching across this keyword x = *p; // *p is always reloaded and: x = *p; sequence; // *p is not cached
Mar 27 2002
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7t4ca$2jfd$2 digitaldaemon.com... <SNIP>The lock mechanism is a separate issue, but I do believe there should be a defined access to the low cost locks offerred by atomic instructions inmostarchitectures. -- - Stephen Fuld e-mail address disguised to prevent spamIsn't depending on atomic instructions dangerous? What about multi-processor systems, where two atomic instructions might execute simultaneously? -- Stijn OddesE_XYZ hotmail.com http://OddesE.cjb.net __________________________________________ Remove _XYZ from my address when replying by mail
Mar 27 2002
"OddesE" <OddesE_XYZ hotmail.com> wrote in message news:a7tf5d$2p0k$1 digitaldaemon.com..."Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7t4ca$2jfd$2 digitaldaemon.com... <SNIP>aThe lock mechanism is a separate issue, but I do believe there should beThe atomic instructions I was talking about are things like test and set, compare and swap, or atomic fetch-op-store, where the memory is locked for the duration of the instruction. These are safe in multi-processor systems. Sorry if I confused you. -- - Stephen Fuld e-mail address disguised to prevent spamdefined access to the low cost locks offerred by atomic instructions inmostarchitectures. -- - Stephen Fuld e-mail address disguised to prevent spamIsn't depending on atomic instructions dangerous? What about multi-processor systems, where two atomic instructions might execute simultaneously?-- Stijn OddesE_XYZ hotmail.com http://OddesE.cjb.net __________________________________________ Remove _XYZ from my address when replying by mail
Mar 27 2002
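A sketch of the kind of low-cost lock Stephen means, built on compare-and-swap; it uses present-day D's core.atomic (which did not exist when this was written), and the 0/1 lock-word convention is purely illustrative:

    import core.atomic : cas, atomicStore;

    shared int lockWord = 0;   // 0 = free, 1 = held

    void acquire()
    {
        // Atomically set lockWord to 1 only if it is still 0; safe on a
        // multiprocessor because the compare-and-swap is a single interlocked
        // operation, with no OS call on the uncontended path.
        while (!cas(&lockWord, 0, 1))
        {
            // spin: another processor currently holds the lock
        }
    }

    void release()
    {
        atomicStore(lockWord, 0);
    }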
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7tm71$2sjd$2 digitaldaemon.com..."OddesE" <OddesE_XYZ hotmail.com> wrote in message news:a7tf5d$2p0k$1 digitaldaemon.com...be"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message news:a7t4ca$2jfd$2 digitaldaemon.com... <SNIP>The lock mechanism is a separate issue, but I do believe there shouldaindefined access to the low cost locks offerred by atomic instructionssystems.mostThe atomic instructions I was talking about are things like test and set, compare and swap, or atomic fetch-op-store, where the memory is locked for the duration of the instruction. These are safe in multi-processorarchitectures. -- - Stephen Fuld e-mail address disguised to prevent spamIsn't depending on atomic instructions dangerous? What about multi-processor systems, where two atomic instructions might execute simultaneously?Sorry if I confused you. -- - Stephen Fuld e-mail address disguised to prevent spamYou didn't confuse me, the topic just does. Multi threading issues are one of my weaker points when it comes to programming... :( Thanks for clearing it up. -- Stijn OddesE_XYZ hotmail.com http://OddesE.cjb.net _________________________________________________ Remove _XYZ from my address when replying by mail
Mar 28 2002
"Walter" <walter digitalmars.com> wrote in message news:a7gfrs$35e$1 digitaldaemon.com...I don't see why volatile is that necessary for hardware registers. You can still easilly read a hardware register by setting a pointer to it andgoing*p. The compiler isn't going to skip the write to it through *p (it'svery,very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem).The linux crowd had the devil of a time with a new release of GCC. It seems that the standard for C states that acessing the bytes of one object does not necessarily alias the bytes of any other object if their accesses are by different types, unless one is char. This means that in: auto float f; *(volatile long *)&f = 0; ...this need not visibly affect the object f. Yep.
Mar 26 2002
> BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.

You should try Visual C++ for Alpha. It can optimize not only the surrounding code, but inline assembly code as well. I was truly amazed when I've noticed that.
Mar 22 2002
"Serge K" <skarebo programmer.net> wrote in message news:a7gclf$ej$1 digitaldaemon.com...optimizeBTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can stillof.the surrounding code, unlike any other inline implementation I'm awareYou should try Visual C++ for Alpha. It can optimize not only the surrounding code, but inline assembly code as well. I was truly amazed when I've noticed that.D's instruction scheduler (and peephole optimizer) is specifically prevented from operating on the inline assembler blocks. I'm a little surprised that a compiler wouldn't do that. The whole point of inline asm is to wrest control away from the compiler and precisely lay out the instructions.
Mar 22 2002
On Fri, 22 Mar 2002 10:20:54 -0800, "Walter" <walter digitalmars.com> wrote:
> BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.

Watcom has a form of asm that allows optimization.

    #pragma aux setSP =  \
        "mov ESP, eax"   \
        parm [eax]       \
        modify [EAX];

    #pragma aux getSP =  \
        "mov edx, esp"   \
        value [edx]      \
        modify [eax];

Then:

    current_sp = getSP()

is fully optimized. It also has the asm("mov eax, esp") form, which I believe is opaque to the compiler.

Watcom also allows register passing convention in addition to the standard _stdcall and _stddecl. This, and extensive optimization, enables it to produce the fastest C code of any compiler that I am aware of. An excellent back-end for D, someday. free too ;-)

Karl Bochert
Mar 23 2002
"Karl Bochert" <kbochert ix.netcom.com> wrote in message news:1103_1016902791 bose...Watcom also allows register passing convention in addition to the standard _stdcall and _stddecl. This, and extensive optimization, enables it to produce the fastest C code of any compilerAFAIK, D chooses calling convention on its own, and might use fastcall where it seems better.that I am aware of. An excellent back-end for D, someday. free too ;-)Hm? Where can I get it, then?
Mar 23 2002
On Sat, 23 Mar 2002 22:49:09 +0300, "Pavel Minayev" <evilone omen.ru> wrote:"Karl Bochert" <kbochert ix.netcom.com> wrote in message news:1103_1016902791 bose...To quote from a message on the Euphoria newsgroup " OpenWatcom is available as most of you know. The Beta to 11c does compile Euphoria Translated Code and runs much faster than LCC or Borland but you have to know a few tricks to get Watcom to work at all because the libraries and header files arent included in the beta release. I have the solution to this problem! Download Watcom 11c beta Download Masm32 by Hutch " I did this and the only problem I had was that I downloaded the file groups individually and missed one. Also the Watcom resource compiler is missing. The URL's are: http://www.openwatcom.org/ http://www.movsd.com/masm.htm A couple of benchmarks: http://www.byte.com/art/9801/sec12/art7.htm. http://www.geocities.com/SiliconValley/Vista/6552/compila.html. Karl BochertWatcom also allows register passing convention in addition to the standard _stdcall and _stddecl. This, and extensive optimization, enables it to produce the fastest C code of any compilerAFAIK, D chooses calling convention on its own, and might use fastcall where it seems better.that I am aware of. An excellent back-end for D, someday. free too ;-)Hm? Where can I get it, then?
Mar 23 2002
Watcom did run circles around the competition back in the day. GCC's inline asm provides a similar amount of information to the optimizer so in theory it should be able to perform as well as Watcom (but in practice it doesn't, from what I can tell so far) Watcom's inline asm had one main problem, which GCC doesn't: Watcom didn't let your inline asm request an empty register from the compiler... you just used a given register and the asm around the call would be rearranged to make room for the register your inline asm used. For recursive functions that doesn't work so well. For instance if you made a vector add routine where the vectors are pointed to by edx and eax, then edx and eax would become bottleneck registers whilst doing lots of vector adds and would end up getting pushed and popped alot. SeanWatcom also allows register passing convention in addition to the standard _stdcall and _stddecl. This, and extensive optimization, enables it to produce the fastest C code of any compiler that I am aware of. An excellent back-end for D, someday. free too ;-) Karl Bochert
Mar 25 2002
"Karl Bochert" <kbochert ix.netcom.com> wrote in message news:1103_1016902791 bose...On Fri, 22 Mar 2002 10:20:54 -0800, "Walter" <walter digitalmars.com>wrote:optimizeBTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can stillof.the surrounding code, unlike any other inline implementation I'm awareWatcom has a form of asm that allows optimization. #pragma aux setSP = \ "mov ESP, eax" \ parm [eax] \ modify [EAX] ; #pragma aux getSP = \ "mov edx, esp" \ value [edx] modify [eax]; Then: current_sp = getSP() is fully optimized.The Digital Mars optimizer doesn't need those hints to be specified by the user, it just analyzes the instructions.Watcom also allows register passing convention in addition to the standard _stdcall and _stddecl. This, and extensive optimization, enables it to produce the fastest C code of any compiler that I am aware of.My marketing has always been bad. I remember magazine compiler reviews where the reviewer's own numbers showed us to be the fastest compiler, but borland got the writeup as fastest. Where we produced the fastest benchmarks according to the reviewer's own numbers, but watcom got the writeup as fastest. It's all a bit maddening <g>.
Mar 26 2002
Walter wrote in message ...
> > I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
>
> They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.

No, it is neither necessary nor desirable to use mutexes. Yes, there are restrictions on the interlocked instructions, but since volatile is implemented/enforced by the compiler, this should be acceptable. The compiler's responsibility should be to either implement an operation atomically or generate a diagnostic explaining why it can't.

An example of something that can be cheaply handled by enhanced volatile is use counts by objects shared across threads. An atomic interlocked decrement implemented with "lock xsub decl" does the trick correctly with no more cost than an extra bus cycle, where a mutex requires an OS call. The ratio of costs is probably three orders of magnitude or more.

> I agree that the C definition of volatile is next to useless.

I didn't mean to imply that the C definition of volatile is next to useless -- it is, in fact, absolutely critical for all but the most primitive multi-threaded code. Even when used with mutexes, volatile is necessary to warn the optimizer off unwarranted assumptions of invariance.

If D is going to succeed, it is necessary to anticipate where computer architectures are going. Everyone, I hope, understands that memory is cheap and plentiful, larger virtual address spaces are in easy sight, and dirt cheap multi-processors are here. Although we're in a period of rapidly increasing clock rates, we're also approaching physical limits on feature size. In the not distant future it will be cheaper to add more processors than buy/build faster ones. At that point performance will be gated by the degree to which doubling the number of processors doubles the speed of the system.

There is a hierarchy of synchronization primitives -- interlocked instructions, shared/exclusive locks, and mutexes -- with a large variation in cost. Interlocked instructions are almost free, mutexes cost an arm and a leg. Forcing all synchronization to use mutexes is an unnecessary waste of resources. In the absence of volatile, however, it is impossible to implement finer grained synchronization primitives. This doesn't strike me as wise....
Mar 22 2002
"Jim Starkey" <jas netfrastructure.com> wrote in message news:a7fk2p$20rj$1 digitaldaemon.com...Writes to bytes and aligned words/dwords are done atomically by the CPU, misaligned data and multiword data is not.They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit wordboundary,which can happen writing doubles, longs, or even misaligned ints.No, it neither necessary nor desirable to use mutexes. Yes, there are restrictions on the interlocked instructions, but since volatile is implemented/ enforced by the compiler, this should be acceptable. The compiler's responsibility should be to either implement an operation atomically or generate a diagnostic explaining why it can't.An example of something that can be cheaply handled by enhanced volatile is use counts by objects shared across threads. An atomic interlocked decrement implemented with "lock xsub decl" does the trick correctly with no more cost than an extra bus cycle, where a mutex requires an OS call. The ratio of costs are probably three orders of magnitude or more.Synchronizing mutexes do not require an os call most of the time, although they still are slower than a simple lock. None of the modern java vm's do an os call for each synchronize.I'm sorry, I just don't see how. See my other post here about j=i++; and how volatile doesn't help.I agree that the C definition of volatile is next to useless.I didn't mean to imply that the C definition of volatle is next to useless -- it is, in fact, absolutely critical for all but the most primitive multi-threaded code. Even when used with mutexes volatile is necessary to warn the optimizer off unwarranted assumptions of invariance.If D is going to succeed, it is necessary to anticipate where computer architures are going. Everyone, I hope, understands that memory is cheap and plentiful, larger virtual address spaces are in easy sight, and dirt cheap multi-processors are here. Although we're in a period of rapidly increasing clock rates, we're also approaching physical limits on feature size. In the not distant future it will be cheaper to add more processors than buy/build faster ones. At that point performance will be gated by the degree to which doubling the number of processors doubles the speed of the system.I think you're right.There are a hierarchy of synchronization primitives -- interlocked instructions, shared/exclusive locks, and mutexes -- with a large variation in cost. Interlocked instructions are almost free, mutexes cost an arm and a leg. Forcing all synchronization to use mutexes is an unnecessary waste of resources. In the absence of volatile, however, it is impossible to implement finer grained sychronization primitives. This doesn't strike me as wise....I think your points merit further investigation, though I don't see how volatile is the answer.
Mar 22 2002