digitalmars.D - optlink on multicore machines

reply Walter Bright <newshound1 digitalmars.com> writes:
After acquiring a mysterious virus that would randomly hang my Windows 
box at 100% CPU but all the processes showed 0 runtime, it was time to 
reinstall Windows. Since installing Windows is an all-day affair, I 
decided it was time to upgrade my 7 year old hardware to multicore.

Once I was up and running, I decided to run the D test suite. I 
immediately discovered that optlink simply doesn't work on multicore. 
The multithreading code in it was developed for a single core machine, 
and multicore is different.

I was able to fix it by running the command:

    imagecfg -a 0x1 \dm\bin\link.exe

imagecfg.exe is downloadable from the internet. This command patches the 
executable so it only runs on one core.
Jun 30 2009
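The `0x1` in that command is an affinity bitmask: bit n set means the process may run on logical processor n, so `0x1` allows only the first core. A minimal sketch for decoding such masks follows. The helper name `cpus_in_mask` is made up for illustration and is not part of imagecfg.

```python
def cpus_in_mask(mask):
    """Return the zero-based CPU indices whose bits are set in an affinity mask."""
    if mask <= 0:
        raise ValueError("an affinity mask must have at least one bit set")
    # Test each bit position up to the highest set bit.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(cpus_in_mask(0x1))  # the mask from the imagecfg command above
    print(cpus_in_mask(0x5))  # a mask that would allow the first and third cores
```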
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 After acquiring a mysterious virus that would randomly hang my Windows
 box at 100% CPU but all the processes showed 0 runtime, it was time to
 reinstall Windows. Since installing Windows is an all-day affair, I
 decided it was time to upgrade my 7 year old hardware to multicore.
 
 Once I was up and running, I decided to run the D test suite. I
 immediately discovered that optlink simply doesn't work on multicore.
 The multithreading code in it was developed for a single core machine,
 and multicore is different.
 
 I was able to fix it by running the command:
 
    imagecfg -a 0x1 \dm\bin\link.exe
 
 imagecfg.exe is downloadable from the internet. This command patches the
 executable so it only runs on one core.
See, we TOLD YOU! :D Incidentally, I thought someone had already done that and it didn't work... if it DOES work, then brillo-bananas; I'm off to patch me some OPTLINK.
Jun 30 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Daniel Keep wrote:
 See, we TOLD YOU!
I remember asking about this a while back, and I was told it worked fine on multicore machines.
Jun 30 2009
parent reply BCS <none anon.com> writes:
Hello Walter,

 Daniel Keep wrote:
 
 See, we TOLD YOU!
 
 I remember asking about this a while back, and I was told it worked fine on multicore machines.
It IS running fine on 3 or 4 multicore machines around here.
Jun 30 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
That's a mystery, then.
Jun 30 2009
next sibling parent reply Brad Roberts <braddr bellevue.puremagic.com> writes:
On Tue, 30 Jun 2009, Walter Bright wrote:

 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
Still sounds like a standard race condition. Reducing the app to a single core just makes it harder to hit.
Jun 30 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Brad Roberts wrote:
 On Tue, 30 Jun 2009, Walter Bright wrote:
 
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
 Still sounds like a standard race condition. Reducing the app to a single core just makes it harder to hit.
There's more to it than that. Multicore has sequential consistency issues that single core does not.
Jun 30 2009
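A toy model of the sequential consistency point, as a sketch only. It assumes the classic "store buffering" litmus test (two threads, each writing one variable and then reading the other) and models a multicore machine simply by letting a thread's load be performed ahead of its own earlier store to a different address, which x86 hardware permits. Nothing here is taken from optlink; the op encoding and function names are invented for illustration.

```python
from itertools import permutations

# Thread A: x = 1; r1 = y      Thread B: y = 1; r2 = x
THREAD_A = [("A", "store", "x"), ("A", "load", "y")]
THREAD_B = [("B", "store", "y"), ("B", "load", "x")]


def run(schedule):
    """Execute one global ordering of the four ops and return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for thread, kind, var in schedule:
        if kind == "store":
            memory[var] = 1
        else:
            regs[thread] = memory[var]
    return regs["A"], regs["B"]


def outcomes(allow_store_load_reorder):
    """Collect every (r1, r2) result reachable under the chosen model."""
    results = set()
    for schedule in permutations(THREAD_A + THREAD_B):
        in_program_order = all(
            schedule.index(store) < schedule.index(load)
            for store, load in (THREAD_A, THREAD_B)
        )
        # Sequential consistency keeps each thread's ops in program order.
        if in_program_order or allow_store_load_reorder:
            results.add(run(schedule))
    return results


if __name__ == "__main__":
    print("interleaving only:", sorted(outcomes(False)))
    print("with store/load reordering:", sorted(outcomes(True)))
```

Under plain interleaving, at least one thread always sees the other's write. Once loads may pass earlier stores, both threads can read 0, an outcome that code written and tested on a single core never had to handle.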
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
That's the wonderful world of hard-to-catch, hard-to-reproduce multithreading problems. I hope D will help here in the future.
Jun 30 2009
next sibling parent BLS <windevguy hotmail.de> writes:
dennis luehring wrote:
 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
 That's the wonderful world of hard-to-catch, hard-to-reproduce multithreading problems. I hope D will help here in the future.
Only, D itself is not written in D. And now, thanks to the wonderful new zero-problem multi-what-the-heck support, it would make a nice D2 language-in-action and proof-of-product test case. But well, eating your own dog food has never been very en vogue in D toolchain development.
Jun 30 2009
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:

 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
 That's the wonderful world of hard-to-catch, hard-to-reproduce multithreading problems. I hope D will help here in the future.
Ok then ... so optlink is going to be rewritten in D - excellent! And good luck to the brave developer too.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jun 30 2009
parent reply Benji Smith <dlanguage benjismith.net> writes:
Derek Parnell wrote:
 On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:
 
 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
 That's the wonderful world of hard-to-catch, hard-to-reproduce multithreading problems. I hope D will help here in the future.
 Ok then ... so optlink is going to be rewritten in D - excellent! And good luck to the brave developer too.
Just out of curiosity... Why is a linker so hard to write?

A few years ago, I developed a small domain specific language and implemented its compiler, outputting bytecode for a very specialized (and limited purpose) virtual machine. In my case, I decided it was easier to give good error messages if the compiler & linker were a single entity. I've always been annoyed by the discrepancy between compilers and linkers (mostly because build tools have their own special languages, pointlessly different than the development language). So my compiler combined compilation and linking into a single step.

Every time the compiler encountered an "import" statement, it checked to see whether a symbol table existed for the imported module and, if not, it added the module to the parse queue. After processing a new module, it would add the resultant code into a namespace-aware symbol table for the given module. Once the parse queue was empty, I checked for unresolved symbols, cyclic dependency errors, etc. If there were no other referential errors (and if all the other semantic checks passed), then I'd start the code-generation process at the main entry point. The whole program was represented as a DAG, and writing bytecode was as simple as traversing that graph. Since the "linking" behavior was built right into the compiler, it was a piece of cake.

Anyhow... Whenever someone on the NG complains about optlink, the inevitable conclusion is that it would be a huge undertaking to produce a new or improved linker. Why? Seems to me that a new linker implementation would be relatively straightforward. There are really only three steps:

1) Parse object files.
2) Create DAG structures using references in those object files.
3) Walk the graph, copying the code (with rewritten addresses) into the final executable.

Is it really more complex than that? What am I missing?

(Caveat: I don't know much about Windows PE, or any of the many other object file formats. Still, though... it doesn't seem like it could be THAT difficult. The compiler has already done most of the tricky stuff.)

--benji
Jun 30 2009
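The three steps Benji lists can be sketched as a toy in a few lines. This is an illustration under loud assumptions: "object files" are plain dicts with a made-up layout (`symbols`, `size`, `refs`), addresses are just running offsets, and no real object or executable format is involved. A visited set is used because real reference graphs can contain cycles (recursion), so they are not always DAGs.

```python
def link(objects, entry):
    """Toy linker: resolve symbols reachable from `entry` and lay them out."""
    # Step 1: "parse" the objects by merging their definitions into one table.
    symbols = {}
    for obj in objects:
        for name, sym in obj["symbols"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = sym

    # Step 2: walk the reference graph from the entry point (depth-first).
    order, seen, stack = [], set(), [entry]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        if name not in symbols:
            raise KeyError(f"unresolved symbol: {name}")
        seen.add(name)
        order.append(name)
        stack.extend(reversed(symbols[name]["refs"]))

    # Step 3: assign each reachable symbol its final offset in the output.
    addresses, offset = {}, 0
    for name in order:
        addresses[name] = offset
        offset += symbols[name]["size"]
    return addresses


if __name__ == "__main__":
    objects = [
        {"symbols": {"main": {"size": 16, "refs": ["helper", "print"]}}},
        {"symbols": {
            "helper": {"size": 8, "refs": ["print"]},
            "print": {"size": 4, "refs": []},
            "unused": {"size": 32, "refs": []},
        }},
    ]
    print(link(objects, "main"))
```

Symbols that are never referenced from the entry point (`unused` here) are simply left out of the layout.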
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Benji Smith wrote:
 Just out of curiosity... Why is a linker so hard to write?
Linkers are actually rather simple programs. The hard part is all the undocumented, semi-documented, and flat out wrong documentation on esoterica of the various file formats involved. A linker that's been around for a while gets all this "lore" embedded into the code. Discarding it and not using it for a reference means you're in for years of debugging.
 (Caveat: I don't know much about Windows PE, or any of the many other 
 object file formats. Still, though... it doesn't seem like it could be 
 THAT difficult. The compiler has already done most of the tricky stuff.)
Here's a list of the file formats optlink deals with:

    Intel OMF with Pharlap, Microsoft, and Digital Mars (!) extensions
    Codeview (various versions of)
    COM
    EXE
    New EXE
    Portable EXE
    16 bit Windows
    16 bit DLLs
    DOS Overlays
    OS/2 executables
    DOS extender executables
    Stub executables
    16 and 32 bit resource files
    library file format
    module definition file
    map files
    linker command files

For just one example, PE formats are only semi-documented. Write a file dumper for them and you'll see <g>. Granted, these days one can cross about half of that off the list. But there's still a lot left.
Jun 30 2009
parent reply Tim Matthews <tim.matthews7 gmail.com> writes:
Walter Bright wrote:

 
 For just one example, PE formats are only semi-documented. Write a file 
 dumper for them and you'll see <g>.
 
Any particular problem you had? I ask because I actually plan to write a dumper for PE formats.
Jun 30 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Tim Matthews wrote:
 Walter Bright wrote:
 
 For just one example, PE formats are only semi-documented. Write a 
 file dumper for them and you'll see <g>.
 Any particular problem you had? I ask because I actually plan to write a dumper for PE formats.
Here's one: figure out the algorithm for the checksum field computation.
Jun 30 2009
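Locating the checksum field, at least, is covered by the published PE layout: the 32-bit value at offset 0x3C of the DOS header points to the `PE\0\0` signature, a 20-byte COFF header follows, and `CheckSum` sits 64 bytes into the optional header. A minimal dumper sketch under those assumptions follows. It only reads the stored value and makes no attempt at the computation Walter is referring to; the function name `read_pe_checksum` is hypothetical.

```python
import struct


def read_pe_checksum(data):
    """Return (machine, section_count, stored_checksum) from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4
    # COFF header: Machine and NumberOfSections come first,
    # SizeOfOptionalHeader is at offset 16 of the 20-byte header.
    machine, section_count = struct.unpack_from("<HH", data, coff)
    (optional_size,) = struct.unpack_from("<H", data, coff + 16)
    if optional_size < 68:
        raise ValueError("optional header too small to hold CheckSum")
    optional = coff + 20
    (checksum,) = struct.unpack_from("<I", data, optional + 64)
    return machine, section_count, checksum
```

Usage would be along the lines of `read_pe_checksum(open(path, "rb").read())`, where `path` names any PE executable.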
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Benji Smith" <dlanguage benjismith.net> wrote in message 
news:h2ed4e$1ueh$1 digitalmars.com...
 Derek Parnell wrote:
 On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:

 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
 That's the wonderful world of hard-to-catch, hard-to-reproduce multithreading problems. I hope D will help here in the future.
 Ok then ... so optlink is going to be rewritten in D - excellent! And good luck to the brave developer too.
 Just out of curiosity... Why is a linker so hard to write?

 A few years ago, I developed a small domain specific language and implemented its compiler, outputting bytecode for a very specialized (and limited purpose) virtual machine. In my case, I decided it was easier to give good error messages if the compiler & linker were a single entity. I've always been annoyed by the discrepancy between compilers and linkers (mostly because build tools have their own special languages, pointlessly different than the development language). So my compiler combined compilation and linking into a single step.

 Every time the compiler encountered an "import" statement, it checked to see whether a symbol table existed for the imported module and, if not, it added the module to the parse queue. After processing a new module, it would add the resultant code into a namespace-aware symbol table for the given module. Once the parse queue was empty, I checked for unresolved symbols, cyclic dependency errors, etc. If there were no other referential errors (and if all the other semantic checks passed), then I'd start the code-generation process at the main entry point. The whole program was represented as a DAG, and writing bytecode was as simple as traversing that graph. Since the "linking" behavior was built right into the compiler, it was a piece of cake.

 Anyhow... Whenever someone on the NG complains about optlink, the inevitable conclusion is that it would be a huge undertaking to produce a new or improved linker. Why? Seems to me that a new linker implementation would be relatively straightforward. There are really only three steps:

 1) Parse object files.
 2) Create DAG structures using references in those object files.
 3) Walk the graph, copying the code (with rewritten addresses) into the final executable.

 Is it really more complex than that? What am I missing?

 (Caveat: I don't know much about Windows PE, or any of the many other object file formats. Still, though... it doesn't seem like it could be THAT difficult. The compiler has already done most of the tricky stuff.)
I'm not much of an expert on linkers, but maybe the difficulty is in getting Walter to fix/rewrite/release-the-source-for optlink, and/or maybe there's a perceived (or real) difficulty in the idea of getting an alternate linker to be adopted as the official dmd linker? Or maybe there could be a technical difficulty, too, I don't know :) Just speculating...
Jun 30 2009
prev sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Tue, Jun 30, 2009 at 11:01 AM, Walter
Bright<newshound1 digitalmars.com> wrote:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.
 That's a mystery, then.
It works fine for me most of the time, but hangs about 1 out of 20 links or so. Not insurmountable for a 1-link project. But I can see how that ain't going to take you far if you're running a test suite with >> 20 programs that must be linked back-to-back.

--bb
Jun 30 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Bill Baxter wrote:
 It works fine for me most of the time, but hangs about 1 out of 20
 links or so.  Not insurmountable for a 1-link project.  But I can see
 how that ain't going to take you far if you're running a test suite
 with >> 20 programs that must be linked back-to-back.
The test suite does thousands of links. It doesn't get further than a dozen or two into it before it fails. I suppose that could explain why it seems to work.
Jun 30 2009
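A back-of-the-envelope check of those numbers, as a sketch. It assumes Bill Baxter's rough figure of one hang per 20 links and treats links as independent trials; both are assumptions, since the thread gives no measured rate. The function names are made up for illustration.

```python
def survival_probability(hang_rate, links):
    """Chance that `links` consecutive links all finish without a hang."""
    return (1.0 - hang_rate) ** links


def expected_links_until_hang(hang_rate):
    """Mean number of links up to and including the first hang (geometric)."""
    return 1.0 / hang_rate


if __name__ == "__main__":
    rate = 1 / 20  # assumed from the "1 out of 20" estimate above
    for n in (1, 20, 100, 1000):
        print(n, survival_probability(rate, n))
    print("expected links until first hang:", expected_links_until_hang(rate))
```

At that rate a single link nearly always succeeds, a run of 20 survives only about a third of the time, and a suite of thousands of links has essentially no chance of finishing. That fits both reports: occasional hangs for a one-link project, and a test suite that fails a dozen or two links in.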
next sibling parent "David B. Held" <dheld codelogicconsulting.com> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 It works fine for me most of the time, but hangs about 1 out of 20
 links or so.  Not insurmountable for a 1-link project.  But I can see
 how that ain't going to take you far if you're running a test suite
 with >> 20 programs that must be linked back-to-back.
 The test suite does thousands of links. It doesn't get further than a dozen or two into it before it fails. I suppose that could explain why it seems to work.
I did notice that the linker seemed to hang randomly. Glad it isn't just me.

Dave
Jul 05 2009
prev sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 30 Jun 2009 12:29:19 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 After acquiring a mysterious virus that would randomly hang my Windows  
 box at 100% CPU but all the processes showed 0 runtime, it was time to  
 reinstall Windows. Since installing Windows is an all-day affair, I  
 decided it was time to upgrade my 7 year old hardware to multicore.

 Once I was up and running, I decided to run the D test suite. I  
 immediately discovered that optlink simply doesn't work on multicore.  
 The multithreading code in it was developed for a single core machine,  
 and multicore is different.

 I was able to fix it by running the command:

     imagecfg -a 0x1 \dm\bin\link.exe

 imagecfg.exe is downloadable from the internet. This command patches the  
 executable so it only runs on one core.
Great to hear that. Will the linker be updated in the upcoming release, or is everyone expected to read the newsgroups, download imagecfg, and patch optlink manually?
Jun 30 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 Will the linker be updated in the upcoming release, or is everyone
 expected to read the newsgroups, download imagecfg, and patch optlink
 manually?
Certainly I'll patch the linker for the next update.
Jun 30 2009