digitalmars.D - Dual Core Support
- Manfred Nowak (9/9) Jun 16 2005 The shipping of the "AMD Athlon 64 X2" is announced to start at the
- Brad Beveridge (4/17) Jun 16 2005 How do you mean? You can program in a multithreaded manner in D, which
- Lionello Lunesu (10/10) Jun 16 2005 | Will D be outdated before the release of 1.0 because D has no support
- Manfred Nowak (56/58) Jun 17 2005 Thank you both for your responses, Brad and Lionellu.
- xs0 (41/92) Jun 17 2005 AFAIK, multi-core processors are almost exactly the same as having
- Manfred Nowak (27/34) Jun 18 2005 Thanks for your opinions, I have read them carefully several times.
- xs0 (34/67) Jun 18 2005 I don't know what can or can't be done over the internal bus, but as far...
- Manfred Nowak (6/8) Jun 18 2005 Please have a look at
- Sean Kelly (15/40) Jun 18 2005 Yikes. So you're saying you'd have lockless sharing of data between the...
- Manfred Nowak (24/48) Jun 19 2005 Why lockless?
- Sean Kelly (18/42) Jun 19 2005 If multiple cores share a single cache, then there's no need to force ca...
- James Dunne (10/57) Jun 19 2005 It's been said in this thread before, but multi-threading control is a f...
- Manfred Nowak (19/32) Jun 19 2005 True. But have you read why Buhr abandoned his concurrency project
- Sean Kelly (10/25) Jun 19 2005 They should because the way errors are handled depends on system state. ...
- Matthias Becker (6/32) Jun 20 2005 Anyway, this isn't a new problem as real concurrency isn't an invention ...
- Brad Beveridge (21/30) Jun 20 2005 I haven't had time to read the references that you posted, but the above...
- Sean Kelly (19/49) Jun 20 2005 AFAIK, dual core machines are indistuingishable from 'true' SMP machines...
- Sean Kelly (9/9) Jun 17 2005 I need to read up a bit on multi-core systems, but they act the same as ...
- Brad Beveridge (17/30) Jun 17 2005 This I agree with, library support for multi-processor systems is a good...
- Sean Kelly (7/14) Jun 17 2005 Exactly. And that leaves us with cache coherency problems. I think we'...
- Brad Beveridge (12/25) Jun 17 2005 Thinking along these lines, performance programming in D would possibly
- Sean Kelly (16/25) Jun 17 2005 True enough :) And things are changing for x86 architectures in this re...
- Manfred Nowak (7/12) Jun 18 2005 No. I have somewhere seen an argument, that if concurrency is not
- Matthias Becker (3/13) Jun 18 2005 There are some problems with optimizers that can move code around so thi...
- Sean Kelly (10/21) Jun 18 2005 This is an issue with C/C++. Specifically, it relates to the "as if" ru...
- Manfred Nowak (7/15) Jun 19 2005 Are you able to prove, that the argument holds for C++ only, which
- Sean Kelly (4/18) Jun 19 2005 Not at all. I imagine many languages target a single-threaded virtual m...
- Sean Kelly (19/33) Jun 20 2005 Okay, I dug up a copy of Ghostscript for the PC and read the first few p...
- Brad Beveridge (22/32) Jun 20 2005 Does volatile prevent code movement within the block? For example
- Sean Kelly (18/49) Jun 21 2005 The spec just says that "Memory writes occurring before the Statement ar...
- Derek Parnell (11/22) Jun 19 2005 Yes. In the exact same manner that all existing 3+GL languages are.
- Manfred Nowak (24/35) Jun 20 2005 I disagree. All this languages are way beyond version 1.0 whereas D
- Brad Beveridge (29/40) Jun 20 2005 If I have contributed to your discomfort, I am sorry - that was
- Matthias Becker (1/8) Jun 21 2005 You can build mutexes and monitors with synchronized without problems.
- Manfred Nowak (3/5) Jun 21 2005 So why did Buhr implement them?
- Brad Beveridge (17/27) Jun 22 2005 I read the library approaches paper from Buhr that you reference, I
The shipping of the "AMD Athlon 64 X2" is announced to start at the end of this month. A review is available: http://www.amdreview.com/reviews.php?rev=athlonx24200 As the review suggests WinXP and Sandra are prepared to use more than one CPU. Will D be outdated before the release of 1.0 because D has no support for multi core units? -manfred
Jun 16 2005
Manfred Nowak wrote:
> The shipping of the "AMD Athlon 64 X2" is announced to start at the end of this month. A review is available: http://www.amdreview.com/reviews.php?rev=athlonx24200 As the review suggests WinXP and Sandra are prepared to use more than one CPU. Will D be outdated before the release of 1.0 because D has no support for multi core units? -manfred

How do you mean? You can program in a multithreaded manner in D, which should take advantage of multiple cpus/cores. Or am I missing something?

Brad
Jun 16 2005
| Will D be outdated before the release of 1.0 because D has no support
| for multi core units?

There's nothing special about multi-core processors, at least when it comes to the compiler; it's all the same. A PC with a dual-core CPU (or two single-core CPUs, for that matter) can simply run two programs at full speed, at the same time. On a single-core CPU, the operating system lets each running program use the CPU for a fraction of a second, so it seems they are running at the same time, but they never really are.

L.
Jun 16 2005
"Lionello Lunesu" <lio lunesu.removethis.com> wrote:
> There's nothing special about multi-core processors, at least when it
> comes to the compiler, it's all the same.

Thank you both for your responses, Brad and Lionello. In essence both of you seem to want the OS to represent a multicore system as a virtual single-core system to you. In this case you are right: neglecting the fact that you have a multicore system does not raise any need to use its capabilities. On the other hand the OS has to do the work to make the multicore system appear as a virtual single-core system to you.

| If control of Northbridge functions is shared between software
| on both cores, software must ensure that only one core at a time
| is allowed to access the shared MSR.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF (p. 324)

So there is a need to address the specialities of dual-core machines. Please recall that an AMD Athlon64 system can contain up to 8 dual-core units and that one of D's major goals is to

| Provide low level bare metal access as required

http://www.digitalmars.com/d/overview.html

Is this really true when all bare metal access has to use the asm statement? Please look deeper into the D specs: http://www.digitalmars.com/d/statement.html

The throw-statement:

| The Object reference is thrown as an exception.

What will happen if both cores throw an exception at the same clock impulse?

The volatile statement:

| Memory writes occurring before the Statement are performed
| before any reads within or after the Statement. Memory reads
| occurring after the Statement occur after any writes before or
| within Statement are completed.

What does this mean for a multi-core system, which shares the main memory between all activated cores? Algorithmically it is simply not true that a dual-core system is equivalent to a higher-clocked single-core system!
Please recall the simple task of deciding whether a given, fixed value is present in a sufficiently large array. Using a virtual single-core machine you would simply loop through all indices until you find the given value or end up not finding it, then issue the appropriate result. Given a natural number n (n >= 2 && n <= 16) and a machine with n cores, you would divide the array into n equal-sized pieces and assign a core to each piece of the array. In case of not finding the searched value you would in essence end up having cut down the number of clock cycles needed to an n-th of the time of a virtual single-core system. But if you cannot assign a core to a task because the language used does not allow this assignment, you can do nothing more than assign the n parts of the array to n threads and then _hope_ that the OS will execute them in parallel. Would you trust your life to a system that is usually fast but cannot be guaranteed to avoid reaction-time prolongations of more than a factor of ten? You may want to answer with "no", and in this case my initial question on the outdatedness of D is assigned a positive value. -manfred
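[The partitioned search described above can be sketched as ordinary threaded code. This is a hedged illustration in Python rather than D; the function and parameter names are mine, and CPython threads will not actually deliver an n-fold speedup, so the sketch shows the structure of the n-way split, not its performance.]

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_contains(data, value, n=4):
    """Split `data` into up to n slices and search them concurrently.

    Mirrors the scheme in the post: each worker scans one slice, and
    the overall answer is "found" if any slice reports a hit.
    """
    if not data:
        return False
    chunk = (len(data) + n - 1) // n  # ceiling division so slices cover everything
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each worker performs a plain linear scan of its slice.
        results = pool.map(lambda s: value in s, slices)
    return any(results)
```

[Note that nothing here pins a worker to a particular core; the OS scheduler decides, which is exactly the complaint raised in the post above.]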
Jun 17 2005
Manfred Nowak wrote:
> "Lionello Lunesu" <lio lunesu.removethis.com> wrote:
> > There's nothing special about multi-core processors, at least when
> > it comes to the compiler, it's all the same.
>
> Thank you both for your responses, Brad and Lionello. In essence both
> of you seem to want the OS to represent a multicore system as a
> virtual single core system to you. In this case you are right:
> neglecting the fact that you have a multicore system does not raise
> any need to use its capabilities.

AFAIK, multi-core processors are almost exactly the same as having multiple cpus, except they're in a single box and share a single bus to the outside world. So, I'd say that there's nothing much that can be done beyond what is already done (which is basically multi-threading support and synchronization objects). I don't think starting a thread is light-weight enough that the compiler should try to multi-thread code automatically, because in 99.9% of cases there'd be no benefit.

> On the other hand the OS has to do the work to make the multicore
> system appear as a virtual single core system to you.

I think the OS does just the opposite - by scheduling and task-switching, it hides the actual CPUs/cores, and makes the system appear as having any number of them (where the number is the number of threads that are running).

> | If control of Northbridge functions is shared between software
> | on both cores, software must ensure that only one core at a time
> | is allowed to access the shared MSR.
>
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF (p. 324)
>
> So there is a need to address the specialities of dual-core machines.

You should've also mentioned the title of the white paper, which is "BIOS and Kernel Developer's Guide" for [AMD processors].
I disagree that D should be specialized for those types of software, and I think you'd still need assembler anyway; much important kernel code is both speed-critical and extremely specific, so coding it in a high-level language is just not an option realistically.

> Please look deeper into the D specs:
> http://www.digitalmars.com/d/statement.html
>
> The throw-statement:
>
> | The Object reference is thrown as an exception.
>
> What will happen if both cores throw an exception at the same clock
> impulse?

Each thread will unwind its stack, like it does now, until it gets to an exception handler. I don't see the difference when there is more than one core.

> The volatile statement:
>
> | Memory writes occurring before the Statement are performed
> | before any reads within or after the Statement. Memory reads
> | occurring after the Statement occur after any writes before or
> | within Statement are completed.
>
> What does this mean for a multi-core system, which shares the main
> memory between all activated cores?

Again, you skipped an important part: "A volatile statement does not guarantee atomicity." Whenever more than one thread can access the same memory (where at least one is writing to it), the accesses should be synchronized, multi-core or not. Providing synchronization methods is the job of the OS and/or hardware, and using them is already simple in D.

> Algorithmically it is simply not true that a dual-core system is
> equivalent to a higher-clocked single-core system!

Unfortunately, no, it isn't.

> [snip] But if you cannot assign a core to a task because the language
> used does not allow this assignment, you can do nothing more than
> assign the n parts of the array to n threads and then _hope_ that the
> OS will execute them in parallel.

The OS is in charge of both cores anyway; you can't bypass it and somehow take control of the cores, so you hope for the best in any case.
That's another reason why automatically multi-threading doesn't make much sense.

> Would you trust your life to a system that is usually fast but cannot
> be guaranteed to avoid reaction-time prolongations of more than a
> factor of ten?

No, but luckily both the software and the OSs in such systems are usually written with hard guarantees about how much time anything takes.

> You may want to answer with "no", and in this case my initial question
> on the outdatedness of D is assigned a positive value.

Well, I certainly wouldn't like D to be outdated so soon, but I think that as far as performance is concerned, there are several better things that could be done first (any-order loops, array ops, easier MMX/SSE utilization, etc.). I think that only after single-thread optimizations are exhausted should we (or D, or Walter) be moving towards multi-cpu/core stuff.

xs0
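[The point above, that shared-memory accesses must be synchronized regardless of core count, can be illustrated with a rough Python analogue of D's `synchronized` statement. A hedged sketch: the class and names are mine, and Python's `threading.Lock` merely plays the role of the mutex that `synchronized` would supply in D.]

```python
import threading

class Counter:
    """Shared state guarded by a lock -- the rough Python analogue of
    wrapping the increment in a D `synchronized` statement."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # only one thread may execute this block at a time
            self.value += 1

counter = Counter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(10_000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held around each increment, the final count is exactly
# 4 * 10_000 whether the threads ran on one core or several.
```

[The same code without the lock would be a data race; the lock is what makes the outcome deterministic on any number of cores.]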
Jun 17 2005
xs0 <xs0 xs0.com> wrote:
> [...]
> AFAIK, multi-core processors are almost exactly the same as having
> multiple cpus, except they're in a single box and share a single bus
> to the outside world.

Thanks for your opinions, I have read them carefully several times. There is one fundamental difference between dual-cores and dual-cpus: dual-cores can exchange data over the internal bus and do not need any bandwidth on the bus to the "outside world". I.e. if you have a multi dual-core machine and know that two threads have to communicate intensively, you lose performance if you cannot arrange to have both threads running on a single dual-core die.

> So, I'd say that there's nothing much that can be done beyond what is
> already done (which is basically multi-threading support and
> synchronization objects).
> [...]

I do not find the hook in your arguments to the explanation why control of the two points of execution (which are implied by a dual-core machine) is not necessary. In the example of the throw statement you even explicitly say that you are not interested in guiding the machine; instead the machine is allowed to do whatever _randomly_ occurs first.

To explain why this might be wrong, imagine security rules for a train:

- if the pressing of the alive-knob for the driver times out, then stop the train as if you were pulling into a station
- if a fire alarm is issued, then bring the train to a stop as fast as possible, except when you are in a tunnel; then delay the stopping of the train until you have left the tunnel

Now what will your machine do if a fire alarm is issued in a tunnel and the pressing of the alive-knob is timing out as well?

-manfred
Jun 18 2005
Manfred Nowak wrote:
> Thanks for your opinions, I have read them carefully several times.
> There is one fundamental difference between dual-cores and dual-cpus:
> dual-cores can exchange data over the internal bus and do not need any
> bandwidth on the bus to the "outside world". I.e. if you have a multi
> dual-core machine and know that two threads have to communicate
> intensively, you lose performance if you cannot arrange to have both
> threads running on a single dual-core die.

I don't know what can or can't be done over the internal bus, but as far as thread control is concerned, it's not something that can be done by user apps, no matter what you do to the language they were coded in, because it's in the OS domain. If/when the OS supports it, the functionality is available through an OS library, so everything that D needs for multi-core CPU support is already there (access to the OS :)

Again, I think it'd be better to focus on providing constructs that allow optimization in general. When/if it is feasible to optimize them by utilizing multi-core cpus in the way you'd want, the only thing that needs to be done is improve the compiler. In the meantime, they can be optimized for other cases, like by making use of MMX/SSE instructions, which I think are totally underutilized generally, and which could easily provide comparable gains in speed.

Well, writing all this, I realize I'm not sure what you are actually proposing to be done. You seem to want some sort of multi-core support, but what would that be? Can you give an example or two?

> > So, I'd say that there's nothing much that can be done beyond what
> > is already done (which is basically multi-threading support and
> > synchronization objects).
> [...]
>
> I do not find the hook in your arguments to the explanation why
> control of the two points of execution (which are implied by a dual
> core machine) is not necessary.

I'm not saying it's not necessary, I'm just saying it's not something that can be done in the language itself.

> In the example of the throw statement you even explicitly say that you
> are not interested in guiding the machine; instead the machine is
> allowed to do whatever _randomly_ occurs first.

In a general-purpose OS, everything is basically random - at any time, the OS can switch to another task. In a real-time OS, things are different (although, admittedly, I don't know how much), but I guess most software we're writing won't be running on such an OS.

Even regardless of all this - considering the two simultaneous exceptions case: if they can occur simultaneously, it's almost certain that they can also occur within, say, 1 microsecond. If that is so, you must handle both cases of which occurs first anyway; when that is done, it doesn't matter anymore which comes first.

> To explain why this might be wrong, imagine security rules for a
> train:
>
> - if the pressing of the alive-knob for the driver times out, then
>   stop the train as if you were pulling into a station
> - if a fire alarm is issued, then bring the train to a stop as fast as
>   possible, except when you are in a tunnel; then delay the stopping
>   of the train until you have left the tunnel
>
> Now what will your machine do if a fire alarm is issued in a tunnel
> and the pressing of the alive-knob is timing out as well?

Hmm, I'm not sure where you see randomness in all this (hopefully, the software would be coded to handle the case where both things occur), but as for "my machine" - for something this simple (stop if (at_station && !alive) || (on_fire && !inside_tunnel)), I wouldn't use a CPU at all; this can be done far more reliably with a few really big logic gates :)

xs0
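[The closing formula in the post above can be written out as a pure function and, in the spirit of the "really big logic gates" remark, checked over every input combination. A sketch only: the predicate names are assumptions of mine, not from the posts.]

```python
from itertools import product

def must_stop(at_station, alive, on_fire, inside_tunnel):
    """The combined rule from the post: stop if
    (at_station && !alive) || (on_fire && !inside_tunnel)."""
    return (at_station and not alive) or (on_fire and not inside_tunnel)

# Exhaustively enumerate all 16 input combinations -- effectively the
# truth table that the "logic gates" implementation would hard-wire.
table = {inputs: must_stop(*inputs) for inputs in product([False, True], repeat=4)}
```

[Because the rule is a pure boolean function of four inputs, exhaustive checking is trivial; the hard part the thread is debating, which event arrives first and on which core, disappears once the decision is stated this way.]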
Jun 18 2005
xs0 <xs0 xs0.com> wrote:
> You seem to want some sort of multi-core support, but what would that
> be? Can you give an example or two?

Please have a look at http://plg.uwaterloo.ca/~usystem/pub/uSystem/uC++book.pdf

Thanks to "Marco A"'s post 29355 in the old D group for directing me to this reference.

-manfred
Jun 18 2005
In article <d90h6h$134$1 digitaldaemon.com>, Manfred Nowak says...
> xs0 <xs0 xs0.com> wrote:
> > [...]
> > AFAIK, multi-core processors are almost exactly the same as having
> > multiple cpus, except they're in a single box and share a single bus
> > to the outside world.
>
> Thanks for your opinions, I have read them carefully several times.
> There is one fundamental difference between dual-cores and dual-cpus:
> dual-cores can exchange data over the internal bus and do not need any
> bandwidth on the bus to the "outside world". I.e. if you have a multi
> dual-core machine and know that two threads have to communicate
> intensively, you lose performance if you cannot arrange to have both
> threads running on a single dual-core die.

Yikes. So you're saying you'd have lockless sharing of data between the cores and only force a cache sync when communicating between processors? Makes sense, I suppose, but it sounds risky.

> In the example of the throw statement you even explicitly say that you
> are not interested in guiding the machine; instead the machine is
> allowed to do whatever _randomly_ occurs first. To explain why this
> might be wrong, imagine security rules for a train:
>
> - if the pressing of the alive-knob for the driver times out, then
>   stop the train as if you were pulling into a station
> - if a fire alarm is issued, then bring the train to a stop as fast as
>   possible, except when you are in a tunnel; then delay the stopping
>   of the train until you have left the tunnel
>
> Now what will your machine do if a fire alarm is issued in a tunnel
> and the pressing of the alive-knob is timing out as well?

Perhaps I'm missing something, but I don't see why this example requires special assembly-level handling of exceptions. If the button failure exception is thrown before the fire warning is signalled, then the train will begin to slow down. Then when the fire warning is signalled, I assume the train will continue on at its existing speed until it exits the tunnel, then stop?
And if the reverse happens, the train will ignore the stop-button time-out because it's handling a more important directive. Is the issue that you don't want to use traditional synchronization in the error handling mechanism and would rather prioritize at the signalling level? I'll admit I haven't done this sort of programming before.

Sean
Jun 18 2005
Sean Kelly <sean f4.ca> wrote:
> [...]
> Yikes. So you're saying you'd have lockless sharing of data between
> the cores and only force a cache sync when communicating between
> processors? Makes sense, I suppose, but it sounds risky.

Why lockless?

> [...]
> If the button failure exception is thrown before the fire warning [...]
> And if the reverse happens [...]
> Is the issue that you don't want to use traditional synchronization in
> the error handling mechanism and would rather prioritize at the
> signalling level?

I see that you caught the basic principle behind my example. And as you may see above, it is difficult for the human brain to think in concurrency: you serialized the events but do not handle the case when, depending on an unlucky implementation, both cores might independently raise both exceptions, one core the fire exception and the other the alive-knob exception. In this case you have a control leak.

There is one more thing to mention: it is not seldom that specifications are incomplete or even contradictory, and that detection of these specification faults occurs late in the software production process. Depending on the awareness of the implementers, such a fault might carry through into the final product.

Have a look at your two cases: you are handling the case that the alive-knob exception comes first, but you missed that the fire-alarm exception might be thrown when the train has already stopped, but in a tunnel.

-manfred
Jun 19 2005
In article <d94alb$2gld$1 digitaldaemon.com>, Manfred Nowak says...
> Sean Kelly <sean f4.ca> wrote:
> > [...]
> > Yikes. So you're saying you'd have lockless sharing of data between
> > the cores and only force a cache sync when communicating between
> > processors? Makes sense, I suppose, but it sounds risky.
>
> Why lockless?

If multiple cores share a single cache, then there's no need to force cache coherency when sharing data between them. Of course, that assumes there's some way to tell you're running on two cores sharing a cache, which may not be possible. As for why: cache synchs take time. Less time than full locking, but time nevertheless. I don't know how useful this would be for PCs, but for NUMA machines that have clustered cores where inter-cluster ops involve message-passing, this may be a reasonable strategy. Though I'm speculating here, as I've never actually coded for such a machine.

> I see that you caught the basic principle behind my example. And as
> you may see above, it is difficult for the human brain to think in
> concurrency: you serialized the events but do not handle the case
> when, depending on an unlucky implementation, both cores might
> independently raise both exceptions, one core the fire exception and
> the other the alive-knob exception. In this case you have a control
> leak.

Why can't the exception handlers serialize error-handling, though? There ultimately has to be some coordination to resolve potentially conflicting directives. Why should this happen when the exception is thrown as opposed to when it's caught?

> There is one more thing to mention: it is not seldom that
> specifications are incomplete or even contradictory, and that
> detection of these specification faults occurs late in the software
> production process. Depending on the awareness of the implementers,
> such a fault might carry through into the final product.
>
> Have a look at your two cases: you are handling the case that the
> alive-knob exception comes first, but you missed that the fire-alarm
> exception might be thrown when the train has already stopped, but in a
> tunnel.

And what if the train had already stopped because of an engine failure, or because someone pulled the emergency brake? The 'fire' routine would need to know whether it should try to move a stopped train out of a tunnel, etc. How can this be solved by prioritizing exceptions? Or am I missing something?

Sean
Jun 19 2005
It's been said in this thread before, but multi-threading control is a function of the OS and not the language. Is C a dead language because it doesn't have dual-core functionality? Of course not. Although, we're still not clear on what dual-core functionality is being proposed to be added to the language. Regardless, it shouldn't be a concern. Simple multi-threading constructs and locking mechanisms should be enough to guarantee that D will work in dual-core systems.

Regards,
James Dunne

In article <d94hu8$2l7i$1 digitaldaemon.com>, Sean Kelly says...
> In article <d94alb$2gld$1 digitaldaemon.com>, Manfred Nowak says...
> > [...]
>
> If multiple cores share a single cache, then there's no need to force
> cache coherency when sharing data between them. Of course, that
> assumes there's some way to tell you're running on two cores sharing a
> cache, which may not be possible. As for why: cache synchs take time.
> Less time than full locking, but time nevertheless. I don't know how
> useful this would be for PCs, but for NUMA machines that have
> clustered cores where inter-cluster ops involve message-passing, this
> may be a reasonable strategy. Though I'm speculating here, as I've
> never actually coded for such a machine.
>
> Why can't the exception handlers serialize error-handling, though?
> There ultimately has to be some coordination to resolve potentially
> conflicting directives. Why should this happen when the exception is
> thrown as opposed to when it's caught?
>
> And what if the train had already stopped because of an engine
> failure, or because someone pulled the emergency brake? The 'fire'
> routine would need to know whether it should try to move a stopped
> train out of a tunnel, etc. How can this be solved by prioritizing
> exceptions? Or am I missing something?
>
> Sean
Jun 19 2005
James Dunne <james.jdunne gmail.com> wrote:
> Is C a dead language because it doesn't have dual-core functionality?
> Of course not.

True. But have you read why Buhr abandoned his concurrency project in C?

> Simple multi-threading constructs and locking mechanisms should be
> enough to guarantee that D will work in dual-core systems.

Can you prove that?

> [...]
> Why can't the exception handlers serialize error-handling, though?

Why should they? This kind of argument has shown up repeatedly: why should a concurrently working machine be viewed as a serially working machine? In fact the AMD cores are designed to have a programmable lower bound on the priority of the interrupts they will handle: so they will handle interrupts concurrently.

> [...]
> And what if the train had already stopped because of an engine
> failure, or because someone pulled the emergency brake?

You are right that you can extend the security rules and will have more complex scenes to solve. Therefore I limited the example to only three variables.

> The 'fire' routine would need to know whether it should try to move a
> stopped train out of a tunnel, etc. How can this be solved by
> prioritizing exceptions? Or am I missing something?

This truly cannot be done by prioritizing, and therefore I said that you have a control leak: depending on the implementation, it might be necessary to preempt both tasks assigned to the cores and start one adapted to the more complex scene.

-manfred
Jun 19 2005
In article <d94ls5$2o57$1 digitaldaemon.com>, Manfred Nowak says...
> Why should they? This kind of argument has shown up repeatedly: why
> should a concurrently working machine be viewed as a serially working
> machine? In fact the AMD cores are designed to have a programmable
> lower bound on the priority of the interrupts they will handle: so
> they will handle interrupts concurrently.

They should because the way errors are handled depends on system state, and the resources for handling these errors are shared. If two errors are thrown concurrently that both want to do something with the speed of the train, for example, something will need to prioritize those operations. What would the speed control do if it simultaneously received errors to stop and to accelerate?

> This truly cannot be done by prioritizing, and therefore I said that
> you have a control leak: depending on the implementation, it might be
> necessary to preempt both tasks assigned to the cores and start one
> adapted to the more complex scene.

This can all be done in code, though. Do multi-core CPUs actually offer instructions to do this in a way that requires language support beyond what D already has? (I suppose I should go read the references you've been posting.)

Sean
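[The question above about simultaneous "stop" and "accelerate" errors is commonly answered exactly as suggested: funnel all commands through one serialized arbiter and resolve conflicts by a fixed severity ranking. A minimal sketch, with an invented priority table; none of these names come from the thread.]

```python
import queue
import threading

# Fixed severity ranking (an assumption for illustration): lower number wins.
PRIORITY = {"emergency_stop": 0, "stop": 1, "hold_speed": 2, "accelerate": 3}

class SpeedArbiter:
    """Collects commands raised concurrently by several handlers and
    applies only the most severe one -- serialization at the catch
    site, as suggested in the thread."""
    def __init__(self):
        self._commands = queue.Queue()  # thread-safe; handlers never touch state directly

    def raise_command(self, cmd):
        self._commands.put(cmd)

    def resolve(self):
        """Drain all pending commands and return the most severe one."""
        pending = []
        while not self._commands.empty():
            pending.append(self._commands.get())
        return min(pending, key=PRIORITY.__getitem__) if pending else "hold_speed"

arb = SpeedArbiter()
# Two cores "simultaneously" demand conflicting actions:
t1 = threading.Thread(target=arb.raise_command, args=("accelerate",))
t2 = threading.Thread(target=arb.raise_command, args=("stop",))
t1.start(); t2.start(); t1.join(); t2.join()
```

[Whichever thread wins the race into the queue, the arbiter's answer is the same, which is the whole point of serializing at the catch site rather than at the throw site.]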
Jun 19 2005
> > Simple multi-threading constructs and locking mechanisms should be
> > enough to guarantee that D will work in dual-core systems.
>
> Can you prove that?

A dual core isn't that much different from dual CPUs. Give an example of a problem that could arise on a dual core but can't on dual CPUs.

> Why should they? This kind of argument has shown up repeatedly: why
> should a concurrently working machine be viewed as a serially working
> machine? In fact the AMD cores are designed to have a programmable
> lower bound on the priority of the interrupts they will handle: so
> they will handle interrupts concurrently.
> [...]

Anyway, this isn't a new problem, as real concurrency isn't an invention of this year. We have had it for a long time. There are a lot of dual-CPU machines with real concurrency. You haven't described any problem that wouldn't arise on such a machine.
Jun 20 2005
Manfred Nowak wrote:James Dunne <james.jdunne gmail.com> wrote:<SNIP>I haven't had time to read the references that you posted, but the above begs the question - can you prove that existing multi-threaded controls will not work correctly on SMP machines? I've read this thread, and I am sorry to say that I am too thick to see why dual core CPUs are any different to programming multiple CPU machines - or for that matter any different to programming a multi-threaded application. Manfred, you look to be most concerned with concurrency issues - but from a programmer's point of view I cannot see the difference between programming with multiple threads and programming with multiple CPUs/cores. Assuming a general purpose OS (and I think we have to), then your train example has (to my mind) exactly the same problems regardless of what kind of machine it is run on. The only true difference is that on a multiple core machine the instructions can actually run at the same physical time; on a single core machine the threads need to share the CPU, but that means nothing because the CPU could change threads every few operations - i.e. you need to provide the same locks and measures anyhow. BradSimple multi-threading constructs and locking mechanisms should be enough to guarantee that D will work in dual-core systems.Can you prove that?
Jun 20 2005
In article <d96n2l$11lq$1 digitaldaemon.com>, Brad Beveridge says...Manfred Nowak wrote:They will.James Dunne <james.jdunne gmail.com> wrote:<SNIP>I haven't had time to read the references that you posted, but the above begs the question - can you prove that existing multi-threaded controls will not work correctly on SMP machines?Simple multi-threading constructs and locking mechanisms should be enough to guarantee that D will work in dual-core systems.Can you prove that?I've read this thread, and I am sorry to say that I am too thick to see why dual core CPUs are any different to programming multiple CPU machines - or for that matter any different to programming a multi-threaded application.AFAIK, dual core machines are indistinguishable from 'true' SMP machines to all but perhaps an OS programmer. The most obvious example of this is that Windows reports each core of a multi-core machine as a separate CPU.Manfred, you look to be most concerned with concurrency issues - but from a programmer's point of view I cannot see the difference between programming with multiple threads and programming with multiple CPUs/cores.The only difference I can think of is that cache coherency is not an issue with single CPU machines, though you typically have to pretend that it is anyway (since not many applications are written to target a specific hardware configuration). Theoretically, I could see some of what Manfred mentioned being a potential point of optimization for realtime systems, but those would probably be built with a custom compiler and target a specific run environment anyway.Assuming a general purpose OS (and I think we have to), then your train example has (to my mind) exactly the same problems regardless of what kind of machine it is run on. 
The only true difference is that on a multiple core machine the instructions can actually run at the same physical time, on a single core machine the threads need to share the CPU, but that means nothing because the CPU could change threads every few operations - ie you need to provide the same locks and measures anyhow.Exactly. D is no different than any other procedural language in how it deals with concurrency. Though as a point of geek interest I suppose it's worth mentioning that BS' original purpose for C++ was as a concurrent language--it just didn't really stay that way once he'd finished his research. In any case, if there's anything that D lacks, I'd love to hear some concrete examples. It's much easier to address issues when you know specifically what they are, and the discussion has remained pretty abstract up to this point. Sean
Jun 20 2005
I need to read up a bit on multi-core systems, but they act the same as SMP systems, correct? So your concern is having library facilities which allow you to assign tasks to different processors and so on? If so, I think at least some basic functionality is a candidate for 1.0, especially if some motivated person is willing to write it :) I'm currently experimenting with some lockless synch. functionality in Ares, and would be happy to build processor affinity support and such into the Thread class if someone is willing to supply the assembly for it... and I believe Walter would do the same for Phobos. Sean
Jun 17 2005
Sean Kelly wrote:I need to read up a bit on multi-core systems, but they act the same as SMP systems, correct? So your concern is having library facilities which allow you to assign tasks to different processors and so on? If so, I think at least some basic functionality is a candidate for 1.0, especially if some motivated person is willing to write it :) I'm currently experimenting with some lockless synch. functionality in Ares, and would be happy to build processor affinity support and such into the Thread class if someone is willing to supply the assembly for it... and I believe Walter would do the same for Phobos. SeanThis I agree with, library support for multi-processor systems is a good idea. Of course, as far as I am aware, at the application level you don't really get to choose anyhow - you can provide hints to the OS about processor affinity, but that is about it. Writing software for multicore systems is almost the same as writing multithreaded programs - the main difference being that even more subtle bugs can show up due to the fact that threads actually are executing at the same physical time rather than merely being interleaved. As an aside, I don't particularly see the true use for multicore systems in real life applications at the moment. Right now most CPUs, unless you program very carefully, are memory bound - they spend a lot of their time waiting for memory accesses. Having multiple cores just increases the demand on the main memory bus, so the CPUs (unless executing completely out of cache) will still be waiting a lot. But I guess that is why we are seeing larger and larger L1 caches. Brad
Jun 17 2005
In article <d8uq8m$1heq$1 digitaldaemon.com>, Brad Beveridge says...As an aside, I don't particularly see the true use for multicore systems in real life applications at the moment. Right now most CPUs, unless you program very carefully, are memory bound - they spend a lot of their time waiting for memory accesses. Having multiple cores just increases the demand on the main memory bus, so the CPUs (unless executing completely out of cache) will still be waiting a lot. But I guess that is why we are seeing larger and larger L1 caches.Exactly. And that leaves us with cache coherency problems. I think we're getting close to a fundamental change in how applications are designed, but I haven't seen any suggestion for how to handle SMP efficiently and easily as locks and such just don't cut it. It's an interesting time for software design :) Sean
Jun 17 2005
Sean Kelly wrote:In article <d8uq8m$1heq$1 digitaldaemon.com>, Brad Beveridge says...<snip>Exactly. And that leaves us with cache coherency problems. I think we're getting close to a fundamental change in how applications are designed, but I haven't seen any suggestion for how to handle SMP efficiently and easily as locks and such just don't cut it. It's an interesting time for software design :) SeanThinking along these lines, performance programming in D would possibly benefit more from a library that lets you manipulate the cache. Such a library could possibly provide functions to prefill the cache, lock portions of it, etc. Of course, messing with caches is not the kind of thing that you want to do even 1% of the time - there is just too much chance that locking the cache down will negatively impact performance. Especially if the OS wants to do a context switch. Sigh, programming just ain't what it used to be when you could cycle count your assembler instructions & figure out how fast your loop would be :) Brad
Jun 17 2005
In article <d8utov$1khp$1 digitaldaemon.com>, Brad Beveridge says...Thinking along these lines, performance programming in D would possibly benefit more from a library that lets you manipulate the cache. Such a library could possibly provide functions to prefill the cache, lock portions of it, etc. Of course, messing with caches is not the kind of thing that you want to do even 1% of the time - there is just too much chance that locking the cache down will negatively impact performance. Especially if the OS wants to do a context switch. Sigh, programming just ain't what it used to be when you could cycle count your assembler instructions & figure out how fast your loop would be :)True enough :) And things are changing for x86 architectures in this regard. Until recently, x86 machines only had full mfence facilities (with the LOCK instruction), but IIRC acquire/release instructions were added to the Itanium, and I think things are moving towards more fine-grained cache control. But this is something that is sufficiently complex (even for experts) that it really needs to be done right in a library so that the average joe doesn't have to worry about it. Lockless containers are one such feature, and perhaps some other design patterns would be appropriate to support as well. Ben's work is a definite step in the right direction, and it may well be a basis for some of the stuff that ends up in Ares. As for the rest... it's worth keeping an eye on the C++ standardization process as they're facing similar issues for the next release. But D has a lead on C++ at the moment because of the way Walter implemented 'volatile'. It's my hope that D will be well suited for concurrent programming years before the next iteration of the C++ standard is finalized. Sean
Jun 17 2005
Sean Kelly <sean f4.ca> wrote:I need to read up a bit on multi-core systems, but they act the same as SMP systems, correct?Dual-cores _are_ an implementation of SMP.So your concern is having library facilities which allow you to assign tasks to different processors and so on?No. I have seen an argument somewhere that if concurrency is not implemented into the language then no compiler can be guaranteed to deliver correct code under all circumstances---therefore concurrency must be implemented into the language. -manfred
Jun 18 2005
There are some problems with optimizers that can move code around, so things might get called before a library lock directive if the compiler isn't aware that it mustn't move code in front of or behind this function call.I need to read up a bit on multi-core systems, but they act the same as SMP systems, correct?Dual-cores _are_ an implementation of SMP.So your concern is having library facilities which allow you to assign tasks to different processors and so on?No. I have seen an argument somewhere that if concurrency is not implemented into the language then no compiler can be guaranteed to deliver correct code under all circumstances---therefore concurrency must be implemented into the language.
Jun 18 2005
In article <d90hkm$134$2 digitaldaemon.com>, Manfred Nowak says...Sean Kelly <sean f4.ca> wrote:Just making sure I wasn't missing something.I need to read up a bit on multi-core systems, but they act the same as SMP systems, correct?Dual-cores _are_ an implementation of SMP.This is an issue with C/C++. Specifically, it relates to the "as if" rule and the fact that the theoretical virtual machine that optimizers target has no concept of concurrency. So there's no real way to ensure volatile instructions aren't being reordered unless you use a synchronization library. D addresses this particular issue somewhat in its reinterpretation of "volatile," and I'm sure Walter is keeping an eye on the C++ standardization talks about this issue as well. SeanSo your concern is having library facilities which allow you to assign tasks to different processors and so on?No. I have seen an argument somewhere that if concurrency is not implemented into the language then no compiler can be guaranteed to deliver correct code under all circumstances---therefore concurrency must be implemented into the language.
Jun 18 2005
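[The reordering hazard described above can be sketched in 2005-era D. This is an illustrative sketch, not code from the thread: the names `data`, `ready`, and the two producer functions are invented for the example.]

    // Sketch of the "as if" reordering problem, in 2005-era D syntax.
    // Without 'volatile', an optimizer honoring the single-threaded
    // "as if" rule may hoist the store to 'ready' above the store to
    // 'data', so a thread on a second core could observe ready == true
    // while data is still stale.
    int data;
    bool ready;

    void producerUnsafe()
    {
        data = 42;
        ready = true;   // may be reordered before the write to data
    }

    void producerSafe()
    {
        volatile        // D's volatile statement: reads and writes are
        {               // not moved across the statement's boundaries
            data = 42;
            ready = true;
        }
    }

[A consumer thread would read `ready` and then `data` inside its own volatile statement for the same reason.]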
Sean Kelly <sean f4.ca> wrote: [...]This is an issue with C/C++. Specifically, it relates to the "as if" rule and the fact that the theoretical virtual machine that optimizers target has no concept of concurrency. So there's no real way to ensure volatile instructions aren't being reordered unless you use a synchronization library. D addresses this particular issue somewhat in its reinterpretation of "volatile," and I'm sure Walter is keeping an eye on the C++ standardization talks about this issue as well.Are you able to prove that the argument holds for C++ only, which would be a contradiction to a paper accepted by ACM and available here: http://plg.uwaterloo.ca/~usystem/pub/uSystem/LibraryApproach.ps.gz -manfred
Jun 19 2005
In article <d948ms$2feb$1 digitaldaemon.com>, Manfred Nowak says...Sean Kelly <sean f4.ca> wrote: [...]Not at all. I imagine many languages target a single-threaded virtual machine. Java is probably one of the few exceptions. SeanThis is an issue with C/C++. Specifically, it relates to the "as if" rule and the fact that the theoretical virtual machine optimizers target has no concept of concurrency. So there's no real way to ensure volatile instructions aren't being reordered unless you use a synchronization library. D addresses this particular issue somewhat in its reinterpretation of "volatile," and I'm sure Walter is keeping an eye on the C++ standardization talks about this issue as well.Are you able to prove, that the argument holds for C++ only, which would be a contradiction to a paper accepted by ACM and available here: http://plg.uwaterloo.ca/~usystem/pub/uSystem/LibraryApproach.ps.gz
Jun 19 2005
In article <d948ms$2feb$1 digitaldaemon.com>, Manfred Nowak says...Sean Kelly <sean f4.ca> wrote: [...]Okay, I dug up a copy of Ghostscript for the PC and read the first few pages of this paper. I definitely agree with it, but I don't know that it applies to D. For reference, here are the suggested solutions:

1. provide some explicit language facilities to control optimization (eg. pragma, volatile, etc.)
2. provide some concurrency constructs that allow the translator to determine when to disable certain optimizations
3. a combination of approaches one and two

It's worth noting that D already provides both of these proposed solutions in the language. The 'synchronized' keyword could be used to prevent the compiler from optimizing code around these areas (if it isn't already). And 'volatile' provides programmers who need to implement concurrent code outside of synchronization blocks a means of preventing compiler optimization of critical code blocks. More work may still be useful in this area. For example, 'volatile' in D just prevents optimization across a code block, but it might be worthwhile to provide a means for something akin to acquire and release semantics to allow *some* optimization to occur. SeanThis is an issue with C/C++. Specifically, it relates to the "as if" rule and the fact that the theoretical virtual machine that optimizers target has no concept of concurrency. So there's no real way to ensure volatile instructions aren't being reordered unless you use a synchronization library. D addresses this particular issue somewhat in its reinterpretation of "volatile," and I'm sure Walter is keeping an eye on the C++ standardization talks about this issue as well.Are you able to prove that the argument holds for C++ only, which would be a contradiction to a paper accepted by ACM and available here: http://plg.uwaterloo.ca/~usystem/pub/uSystem/LibraryApproach.ps.gz
Jun 20 2005
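[The two in-language facilities discussed above can be illustrated side by side. A sketch in 2005-era D; the `Counter` class and its method names are invented for the example.]

    // Sketch of D's two in-language concurrency facilities.
    class Counter
    {
        private int count;

        // 'synchronized' takes the object's monitor: the block is
        // atomic with respect to other threads synchronizing on
        // 'this', and also a boundary the optimizer must respect.
        void increment()
        {
            synchronized (this)
            {
                count++;
            }
        }

        // 'volatile' is only an optimization barrier: the read is not
        // cached in a register or moved across the statement, but no
        // lock is taken and no hardware fence is implied.
        int read()
        {
            int c;
            volatile c = count;
            return c;
        }
    }

[Note the asymmetry this sketch assumes: synchronized gives atomicity plus ordering, while volatile gives ordering only, which is exactly why the two are discussed together in this thread.]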
Sean Kelly wrote: <Snip>It's worth noting that D already provides both of these proposed solutions in language. The 'synchronized' keyword could be used to prevent the compiler from optimizing code around these areas (if it isn't already). And 'volatile' provides programmers who need to implement concurrent code outside of synchronization blocks a means of preventing compiler optimization of critical code blocks. More work may still be useful in this area. For example, 'volatile' in D just prevents optimization across a code block, but it might be worthwhile to provide a means for something akin to acquire and release semantics to allow *some* optimization to occur.Does volatile prevent code movement within the block? For example:

   ... some optimised code (A) ...
   volatile
   {
      ... some order critical code ...
   }
   ... some optimised code (B) ...

It is obvious from the description of volatile that the 3 sections of code above will have memory barriers, i.e. when the volatile section begins all memory writes from A will have occurred, and when B begins executing all memory writes from the volatile block will have finished. But does code within the volatile block get optimised? It would be nice if code within a volatile statement were strictly ordered, with no opportunity for the compiler to move memory read/write operations. Does anybody know if this is true in practice? Brad
Jun 20 2005
In article <d97jeu$1mcv$1 digitaldaemon.com>, Brad Beveridge says...Sean Kelly wrote: <Snip>The spec just says that "Memory writes occurring before the Statement are performed before any reads within or after the Statement. Memory reads occurring after the Statement occur after any writes before or within Statement are completed." So the compiler is currently free to optimize within the code block, just not across the boundaries. And now that I look at it, it sounds like volatile statements already implement acquire/release semantics. I think the current behavior is actually okay though, as the code within the volatile block could theoretically be thousands of lines long, and I wouldn't want the optimizer to ignore that code completely, just not optimize it beyond the boundaries I've established. Also, the requirements for 'synchronized' say nothing about optimizer behavior, and I think they should--'synchronized' should probably be identical to 'volatile' except that the block is also atomic. I grant that it would be easy enough for a Mutex writer to add volatile blocks to his code, but as a synchronized block is implicitly volatile, it's worth changing simply to improve clarity if nothing else. SeanDoes volatile prevent code movement within the block? For example:

   ... some optimised code (A) ...
   volatile
   {
      ... some order critical code ...
   }
   ... some optimised code (B) ...

It is obvious from the description of volatile that the 3 sections of code above will have memory barriers, i.e. when the volatile section begins all memory writes from A will have occurred, and when B begins executing all memory writes from the volatile block will have finished. But does code within the volatile block get optimised? It would be nice if code within a volatile statement were strictly ordered, with no opportunity for the compiler to move memory read/write operations. Does anybody know if this is true in practice?
Jun 21 2005
On Thu, 16 Jun 2005 16:09:44 +0000 (UTC), Manfred Nowak wrote:The shipping of the "AMD Athlon 64 X2" is announced to start at the end of this month. A review is available: http://www.amdreview.com/reviews.php?rev=athlonx24200 As the review suggests WinXP and Sandra are prepared to use more than one CPU. Will D be outdated before the release of 1.0 because D has no support for multi core units?Yes. In the exact same manner that all existing 3+GL languages are. But maybe you are talking about library support rather than language support? Are you talking about the need for D to have new keywords or new object code generation when the target is a dual/triple/quadruple/quintuple/... core machine? Maybe this thread can be renamed "Duel Core Support" ;-) -- Derek Parnell Melbourne, Australia 20/06/2005 7:35:55 AM
Jun 19 2005
Derek Parnell <derek psych.ward> wrote: [...]I disagree. All these languages are way beyond version 1.0, whereas D isn't.Will D be outdated before the release of 1.0 because D has no support for multi core units?Yes. In the exact same manner that all existing 3+GL languagesBut maybe you are talking about library support rather than language support?If the paper of Buhr, which I have mentioned somewhere above, is right, then it is possible to include all concurrency support into a library, but only if the language follows the rules dictated by the library. And I agree with Buhr that such dictation is the same as having changed the language.Are you talking about the need for D to have new keywords or new object code generation when the target is a dual/triple/quadruple/quintuple/... core machine?According to my statement above, a clear: maybe. And the reason for this is that I do not believe that the only two keywords in D that have something to do with concurrency can be shown to be equivalents of Buhr's "mutex" and "monitor". But I may be wrong.Maybe this thread can be renamed "Duel Core Support" ;-)Thx for this broad hint. In fact I feel thrown into a position which I did not want to be engaged in. All I wanted to know is whether there is a proof that D can handle concurrency in general and, as the title shows, dual cores as a special case. Maybe I should have posted this into the "learn" group. However, I posted here and found myself confronted with opinions that dual cores are not different from single cores, or unfounded claims that D can handle any kind of concurrency. Somehow I feel very uncomfortable. -manfred
Jun 20 2005
Manfred Nowak wrote:Thx for this broad hint. In fact I feel thrown into a position which I did not want to be engaged in. All I wanted to know is whether there is a proof that D can handle concurrency in general and, as the title shows, dual cores as a special case. Maybe I should have posted this into the "learn" group. However, I posted here and found myself confronted with opinions that dual cores are not different from single cores, or unfounded claims that D can handle any kind of concurrency. Somehow I feel very uncomfortable.If I have contributed to your discomfort, I am sorry - that was certainly not my intention. I truly am interested in this topic, but as I've said before I just don't understand the problem. I also have not read the references previously posted as they are not in a format I can easily open (need to get a ps viewer, etc). I think the primary things I don't understand are (all are from a logical/programmer's point of view):

1) Is there any difference between multiple core CPUs and machines with multiple CPUs?
 * I don't believe that there is any significant difference, in which case we perhaps should agree that we are talking about SMP in general.
2) From a programmer's point of view, what _is_ the difference between a program that runs in multiple threads and a program that runs in multiple threads on multiple cores?
 * I understand that physically there are different things happening, but I currently believe that logically there is no difference.
3) Can you please summarise the primitives that are required to program properly on SMP machines?
 * Although I do little multi-threaded programming, I understand that threads need to have atomic operations as a basic synchronizing mechanism; other than that I am not familiar enough to comment.
4) Could you please show a specific case where D is not able to handle an SMP situation, and how it could/should be fixed with additions to the language?
 * I liked the train example, could you perhaps make it pseudo-code & point out the weaknesses?

Thanks Brad
Jun 20 2005
You can build mutexes and monitors with synchronized without problems.Are you talking about the need for D to have new keywords or new object code generation when the target is a dual/triple/quadruple/quintuple/... core machine?According to my statement above a clear: maybe. And the reason for this is that I do not believe that the only two keyowrds in D that something have to do with concurrency can be show as aequivalents to Buhrs "mutex" and "monitor". But I may be wrong.
Jun 21 2005
Matthias Becker <Matthias_member pathlink.com> wrote:You can build mutexes and monitors with synchronized without problems.So why did Buhr implement them? -manfred
Jun 21 2005
Manfred Nowak wrote:Matthias Becker <Matthias_member pathlink.com> wrote:I read the library approaches paper from Buhr that you reference; I don't see that he implemented anything. He made two basic points:

1) Variables cached in registers will not be visible between tasks.
2) Code optimisation can reorder instructions aggressively, which can lead to code that should be inside critical sections being moved outside critical sections.

C addresses point 1 with the volatile keyword: any variable that is "volatile" will be written to memory rather than kept solely in registers. D's meaning of volatile addresses both concerns - code cannot move around a volatile statement, and reads and writes are performed to memory. D also adds "synchronized", but in reality you could build your own locks on top of volatile without the language feature "synchronized". So D as a language meets the criteria for concurrent programming that Buhr laid out. BradYou can build mutexes and monitors with synchronized without problems.So why did Buhr implement them? -manfred
Jun 22 2005
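[Brad's point that locks can be built on top of volatile, without 'synchronized', can be sketched as a minimal spin lock. This sketch assumes x86 (where XCHG with a memory operand is implicitly atomic, and an aligned store is atomic) and DMD-style inline asm; all names are invented for the example, and it is not production code.]

    // Minimal spin lock built from 'volatile' plus one atomic x86
    // instruction, with no use of 'synchronized'. Sketch only.
    int lockWord = 0;               // 0 = free, 1 = held

    bool tryAcquire(int* l)
    {
        int old;
        asm
        {
            mov EAX, l;             // address of the lock word
            mov EDX, 1;
            xchg [EAX], EDX;        // atomically swap 1 in, old value out
            mov old, EDX;
        }
        return old == 0;            // acquired iff the word was free
    }

    void acquire()
    {
        while (!tryAcquire(&lockWord))
        {
            // busy-wait; each retry goes through the atomic XCHG above
        }
    }

    void release()
    {
        volatile lockWord = 0;      // volatile keeps the store from being
    }                               // moved above the critical section

[Whether this really satisfies Buhr's criteria is exactly the open question in the thread: the atomicity here comes from the hardware instruction, while volatile only restrains the optimizer.]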