digitalmars.D - optlink on multicore machines

Walter Bright (12/12) Jun 30 2009 After acquiring a mysterious virus that would randomly hang my Windows

Daniel Keep (6/22) Jun 30 2009 See, we TOLD YOU!

Walter Bright (3/4) Jun 30 2009 I remember asking about this a while back, and I was told it worked fine...

BCS (2/9) Jun 30 2009 It IS running fine on 3 or 4 multicore machines around here.

Walter Bright (2/3) Jun 30 2009 That's a mystery, then.

Brad Roberts (3/7) Jun 30 2009 Still sounds like a standard race condition. Reducing the app to a sing...

Walter Bright (3/11) Jun 30 2009 There's more to it than that. Multicore has sequential consistency

dennis luehring (3/7) Jun 30 2009 That's the wonderful world of hard to catch and reproduce multithreading...

BLS (6/14) Jun 30 2009 Just D is not written in D. And now, thanks to the wonderful new zero pb...
Derek Parnell (7/15) Jun 30 2009 Ok then ... so optlink is going to be rewritten in D - excellent! And go...

Benji Smith (39/51) Jun 30 2009 Just out of curiosity... Why is a linker so hard to write?

Walter Bright (29/33) Jun 30 2009 Linkers are actually rather simple programs. The hard part is all the

Tim Matthews (3/7) Jun 30 2009 Any particular problem you had? I ask because I actually plan to write a

Walter Bright (2/11) Jun 30 2009 Here's one: figure out the algorithm for the checksum field computation.

Nick Sabalausky (8/59) Jun 30 2009 I'm not much of an expert on linkers, but maybe the difficulty is in

Bill Baxter (7/11) Jun 30 2009 It works fine for me most of the time, but hangs about 1 out of 20

Walter Bright (4/8) Jun 30 2009 The test suite does thousands of links. It doesn't get further than a

David B. Held (4/14) Jul 05 2009 I did notice that the linker seemed to hang randomly. Glad it isn't

Denis Koroskin (5/17) Jun 30 2009 Great to hear that.

Walter Bright (2/5) Jun 30 2009 Certainly I'll patch the linker for the next update.

Walter Bright <newshound1 digitalmars.com> writes:

After acquiring a mysterious virus that would randomly hang my Windows 
box at 100% CPU but all the processes showed 0 runtime, it was time to 
reinstall Windows. Since installing Windows is an all-day affair, I 
decided it was time to upgrade my 7 year old hardware to multicore.

Once I was up and running, I decided to run the D test suite. I 
immediately discovered that optlink simply doesn't work on multicore. 
The multithreading code in it was developed for a single core machine, 
and multicore is different.

I was able to fix it by running the command:

    imagecfg -a 0x1 \dm\bin\link.exe

imagecfg.exe is downloadable from the internet. This command patches the 
executable so it only runs on one core.
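
For readers wondering what the 0x1 means: the argument to -a is an affinity bitmask. The sketch below assumes the usual Windows convention that bit i stands for logical processor i; the helper name `cores_from_affinity_mask` is made up for illustration and is not part of imagecfg.

```python
from typing import List


def cores_from_affinity_mask(mask: int) -> List[int]:
    """Return the logical processor indices whose bits are set in mask.

    Assumption: bit i of the mask corresponds to logical processor i.
    """
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    cores = []
    index = 0
    while mask:
        if mask & 1:
            cores.append(index)
        mask >>= 1
        index += 1
    return cores


# 0x1 has only bit 0 set, so the process is confined to one processor.
print(cores_from_affinity_mask(0x1))
# 0x5 has bits 0 and 2 set, which would allow two processors.
print(cores_from_affinity_mask(0x5))
```

With a mask of 0x1 only one thread of the linker can execute at any instant, which is presumably why the single-core threading assumptions hold again.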

Jun 30 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 After acquiring a mysterious virus that would randomly hang my Windows
 box at 100% CPU but all the processes showed 0 runtime, it was time to
 reinstall Windows. Since installing Windows is an all-day affair, I
 decided it was time to upgrade my 7 year old hardware to multicore.
 
 Once I was up and running, I decided to run the D test suite. I
 immediately discovered that optlink simply doesn't work on multicore.
 The multithreading code in it was developed for a single core machine,
 and multicore is different.
 
 I was able to fix it by running the command:
 
    imagecfg -a 0x1 \dm\bin\link.exe
 
 imagecfg.exe is downloadable from the internet. This command patches the
 executable so it only runs on one core.

See, we TOLD YOU!

:D

Incidentally, I thought someone had already done that and it didn't
work...  if it DOES work, then brillo-bananas; I'm off to patch me some
OPTLINK.

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

Daniel Keep wrote:
 See, we TOLD YOU!

I remember asking about this a while back, and I was told it worked fine 
on multicore machines.

Jun 30 2009

BCS <none anon.com> writes:

Hello Walter,

 Daniel Keep wrote:
 
 See, we TOLD YOU!
 

 I remember asking about this a while back, and I was told it worked
 fine on multicore machines.
 

It IS running fine on 3 or 4 multicore machines around here.

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

That's a mystery, then.

Jun 30 2009

Brad Roberts <braddr bellevue.puremagic.com> writes:

On Tue, 30 Jun 2009, Walter Bright wrote:

 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 
 That's a mystery, then.

Still sounds like a standard race condition.  Reducing the app to a single 
core just makes it harder to hit.

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

Brad Roberts wrote:
 On Tue, 30 Jun 2009, Walter Bright wrote:
 
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 That's a mystery, then.

 
 Still sounds like a standard race condition.  Reducing the app to a single 
 core just makes it harder to hit.

There's more to it than that. Multicore has sequential consistency 
issues that single core does not.

Jun 30 2009

dennis luehring <dl.soluz gmx.net> writes:

Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 
 That's a mystery, then.

That's the wonderful world of hard to catch and reproduce multithreading 
problems - hope D will help here in the future

Jun 30 2009

BLS <windevguy hotmail.de> writes:

dennis luehring wrote:
 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 That's a mystery, then.

 
 That's the wonderful world of hard to catch and reproduce multithreading 
 problems - hope D will help here in the future

It's just that D is not written in D. And now, thanks to the wonderful new 
zero-problem multi-what-the-heck support, rewriting it would be a nice D2 
language-in-action and proof-of-product test case.

But well, eating your own dog food has never been very en vogue in D tool 
chain development.

Jun 30 2009

Derek Parnell <derek psych.ward> writes:

On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:

 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 
 That's a mystery, then.

 
 That's the wonderful world of hard to catch and reproduce multithreading 
 problems - hope D will help here in the future

Ok then ... so optlink is going to be rewritten in D - excellent! And good
luck to the brave developer too.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jun 30 2009

Benji Smith <dlanguage benjismith.net> writes:

Derek Parnell wrote:
 On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:
 
 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 That's a mystery, then.

 That's the wonderful world of hard to catch and reproduce multithreading 
 problems - hope D will help here in the future

 
 Ok then ... so optlink is going to be rewritten in D - excellent! And good
 luck to the brave developer too.
 

Just out of curiosity... Why is a linker so hard to write?

A few years ago, I developed a small domain specific language and 
implemented its compiler, outputting bytecode for a very specialized 
(and limited purpose) virtual machine.

In my case, I decided it was easier to give good error messages if the 
compiler & linker were a single entity. I've always been annoyed by the 
discrepancy between compilers and linkers (mostly because build tools 
have their own special languages, pointlessly different than the 
development language). So my compiler combined compilation and linking 
into a single step.

Every time the compiler encountered an "import" statement, it checked to 
see whether a symbol table existed for the imported module and, if not, 
it added the module to the parse queue. After processing a new module, 
it would add the resultant code into a namespace-aware symbol table for 
the given module.

Once the parse queue was empty, I checked for unresolved symbols, cyclic 
dependency errors, etc. If there were no other referential errors (and 
if all the other semantic checks passed), then I'd start the 
code-generation process at the main entry point. The whole program was 
represented as a DAG, and writing bytecode was as simple as traversing 
that graph. Since the "linking" behavior was built right into the 
compiler, it was a piece of cake.

Anyhow...

Whenever someone on the NG complains about optlink, the inevitable 
conclusion is that it would be a huge undertaking to produce a new or 
improved linker.

Why?

Seems to me that a new linker implementation would be relatively 
straightforward. There are really only three steps:

1) Parse object files.
2) Create DAG structures using references in those object files.
3) Walk the graph, copying the code (with rewritten addresses) into the 
final executable.

Is it really more complex than that? What am I missing?

(Caveat: I don't know much about Windows PE, or any of the many other 
object file formats. Still, though... it doesn't seem like it could be 
THAT difficult. The compiler has already done most of the tricky stuff.)

--benji
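
The three steps above can be sketched as a toy. Everything in this sketch is an assumption made for illustration: the `ObjectFile` record stands in for an already-parsed object file (step 1), addresses are single bytes, and none of it reflects how optlink or any real object format works.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ObjectFile:
    """Hypothetical, already-parsed object file (step 1 is assumed done)."""
    name: str
    code: bytes
    defines: Dict[str, int]        # symbol -> offset inside code
    relocs: List[Tuple[int, str]]  # (offset of a 1-byte address slot, symbol)


def link(objects: List[ObjectFile], entry: str) -> bytes:
    # Step 2: record which object defines each symbol.
    owner: Dict[str, ObjectFile] = {}
    for obj in objects:
        for sym in obj.defines:
            if sym in owner:
                raise ValueError("duplicate symbol: " + sym)
            owner[sym] = obj
    if entry not in owner:
        raise ValueError("unresolved symbol: " + entry)

    # Walk the reference graph from the entry point; unreachable
    # objects are simply never visited.
    order: List[ObjectFile] = []
    seen = set()
    stack = [owner[entry]]
    while stack:
        obj = stack.pop()
        if obj.name in seen:
            continue
        seen.add(obj.name)
        order.append(obj)
        for _, sym in obj.relocs:
            if sym not in owner:
                raise ValueError("unresolved symbol: " + sym)
            stack.append(owner[sym])

    # Step 3: assign base addresses, then copy code with patched addresses.
    base: Dict[str, int] = {}
    address = 0
    for obj in order:
        base[obj.name] = address
        address += len(obj.code)

    image = bytearray()
    for obj in order:
        chunk = bytearray(obj.code)
        for offset, sym in obj.relocs:
            target = owner[sym]
            chunk[offset] = base[target.name] + target.defines[sym]
        image += chunk
    return bytes(image)


main_o = ObjectFile("main.o", bytes([0x10, 0x00, 0x20]), {"main": 0}, [(1, "helper")])
helper_o = ObjectFile("helper.o", bytes([0x30, 0x40]), {"helper": 1}, [])
unused_o = ObjectFile("unused.o", bytes([0xFF]), {"unused": 0}, [])
print(link([main_o, helper_o, unused_o], "main").hex())
```

The toy tracks visited objects rather than assuming a strict DAG, since references between real modules can be cyclic.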

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

Benji Smith wrote:
 Just out of curiosity... Why is a linker so hard to write?

Linkers are actually rather simple programs. The hard part is all the 
undocumented, semi-documented, and flat out wrong documentation on 
esoterica of the various file formats involved.

A linker that's been around for a while gets all this "lore" embedded 
into the code. Discarding it and not using it for a reference means 
you're in for years of debugging.

 (Caveat: I don't know much about Windows PE, or any of the many other 
 object file formats. Still, though... it doesn't seem like it could be 
 THAT difficult. The compiler has already done most of the tricky stuff.)

Here's a list of the file formats optlink deals with:

Intel OMF with Pharlap, Microsoft, and Digital Mars (!) extensions
Codeview (various versions of)
COM
EXE
New EXE
Portable EXE
16 bit Windows
16 bit DLLs
DOS Overlays
OS/2 executables
DOS extender executables
Stub executables
16 and 32 bit resource files
library file format
module definition file
map files
linker command files

For just one example, PE formats are only semi-documented. Write a file 
dumper for them and you'll see <g>.

Granted, these days one can cross about half of that off the list. But 
there's still a lot left.

Jun 30 2009

Tim Matthews <tim.matthews7 gmail.com> writes:

Walter Bright wrote:

 
 For just one example, PE formats are only semi-documented. Write a file 
 dumper for them and you'll see <g>.
 

Any particular problem you had? I ask because I actually plan to write a 
dumper for PE formats.

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tim Matthews wrote:
 Walter Bright wrote:
 
 For just one example, PE formats are only semi-documented. Write a 
 file dumper for them and you'll see <g>.

 
 Any particular problem you had? I ask because I actually plan to write a 
 dumper for PE formats.


Here's one: figure out the algorithm for the checksum field computation.
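
For context, here is a minimal sketch of the reconstruction of that checksum that commonly circulates among people who write PE tools. It is an assumption, not something stated in this thread, and it has not been verified here against the Windows implementation: sum the file as little-endian 16-bit words with end-around carry, treat the 4-byte checksum field as zero, then add the file length. The names `pe_checksum` and `checksum_offset` are hypothetical, and the offset is assumed to be even.

```python
import struct


def pe_checksum(data: bytes, checksum_offset: int) -> int:
    """Folded 16-bit word sum plus file length (assumed algorithm).

    checksum_offset is the file offset of the 4-byte checksum field,
    assumed to be even so the field covers exactly two words.
    """
    padded = data + b"\x00" * (len(data) % 2)  # pad odd-sized input
    total = 0
    for pos in range(0, len(padded), 2):
        if checksum_offset <= pos < checksum_offset + 4:
            continue  # the checksum field itself counts as zero
        (word,) = struct.unpack_from("<H", padded, pos)
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (total + len(data)) & 0xFFFFFFFF


# Synthetic buffer, not a real PE image: bytes 4..7 play the checksum field.
sample = bytes([0x01, 0x00, 0x02, 0x00, 0xAA, 0xBB, 0xCC, 0xDD, 0xFF, 0xFF])
print(pe_checksum(sample, 4))
```

Because the field is skipped, the result does not depend on whatever value is currently stored there, which is what lets a tool recompute and rewrite it in place.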

Jun 30 2009

"Nick Sabalausky" <a a.a> writes:

"Benji Smith" <dlanguage benjismith.net> wrote in message 
news:h2ed4e$1ueh$1 digitalmars.com...
 Derek Parnell wrote:
 On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:

 Walter Bright schrieb:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 That's a mystery, then.

 That's the wonderful world of hard to catch and reproduce multithreading 
 problems - hope D will help here in the future

 Ok then ... so optlink is going to be rewritten in D - excellent! And 
 good
 luck to the brave developer too.

 Just out of curiosity... Why is a linker so hard to write?

 A few years ago, I developed a small domain specific language and 
 implemented its compiler, outputting bytecode for a very specialized (and 
 limited purpose) virtual machine.

 In my case, I decided it was easier to give good error messages if the 
 compiler & linker were a single entity. I've always been annoyed by the 
 discrepancy between compilers and linkers (mostly because build tools have 
 their own special languages, pointlessly different than the development 
 language). So my compiler combined compilation and linking into a single 
 step.

 Every time the compiler encountered an "import" statement, it checked to 
 see whether a symbol table existed for the imported module and, if not, it 
 added the module to the parse queue. After processing a new module, it 
 would add the resultant code into a namespace-aware symbol table for the 
 given module.

 Once the parse queue was empty, I checked for unresolved symbols, cyclic 
 dependency errors, etc. If there were no other referential errors (and if 
 all the other semantic checks passed), then I'd start the code-generation 
 process at the main entry point. The whole program was represented as a 
 DAG, and writing bytecode was as simple as traversing that graph. Since 
 the "linking" behavior was built right into the compiler, it was a piece 
 of cake.

 Anyhow...

 Whenever someone on the NG complains about optlink, the inevitable 
 conclusion is that it would be a huge undertaking to produce a new or 
 improved linker.

 Why?

 Seems to me that a new linker implementation would be relatively 
 straightforward. There are really only three steps:

 1) Parse object files.
 2) Create DAG structures using references in those object files.
 3) Walk the graph, copying the code (with rewritten addresses) into the 
 final executable.

 Is it really more complex than that? What am I missing?

 (Caveat: I don't know much about Windows PE, or any of the many other 
 object file formats. Still, though... it doesn't seem like it could be 
 THAT difficult. The compiler has already done most of the tricky stuff.)

I'm not much of an expert on linkers, but maybe the difficulty is in 
getting Walter to fix/rewrite/release-the-source-for optlink, and/or maybe 
there's a perceived (or real) difficulty in the idea of getting an alternate 
linker to be adopted as the official dmd linker? Or maybe there could be a 
technical difficulty, too, I don't know :)

Just speculating...

Jun 30 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Jun 30, 2009 at 11:01 AM, Walter Bright
<newshound1 digitalmars.com> wrote:
 BCS wrote:
 It IS running fine on 3 or 4 multicore machines around here.

 That's a mystery, then.

It works fine for me most of the time, but hangs about 1 out of 20
links or so.  Not insurmountable for a 1-link project.  But I can see
how that ain't going to take you far if you're running a test suite
with >> 20 programs that must be linked back-to-back.

--bb

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

Bill Baxter wrote:
 It works fine for me most of the time, but hangs about 1 out of 20
 links or so.  Not insurmountable for a 1-link project.  But I can see
 how that ain't going to take you far if you're running a test suite
 with >> 20 programs that must be linked back-to-back.


The test suite does thousands of links. It doesn't get further than a 
dozen or two into it before it fails. I suppose that could explain why 
it seems to work.
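
A rough back-of-the-envelope check of those numbers, assuming each link hangs independently with the 1-in-20 rate Bill reported (that independence is an assumption, and the function name is made up for illustration):

```python
def chance_all_links_succeed(links: int, hang_rate: float = 1 / 20) -> float:
    """Probability that `links` back-to-back links all finish without a hang,
    assuming every link hangs independently with probability hang_rate."""
    return (1.0 - hang_rate) ** links


for count in (1, 20, 1000):
    print(count, chance_all_links_succeed(count))

# Average number of links until the first hang under the same assumption.
print(1 / (1 / 20))
```

Under that assumption a single link almost always succeeds, a run of twenty succeeds only about a third of the time, and a run of a thousand essentially never does, which fits both the "works for me" reports and a test suite that dies within the first dozen or two links.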

Jun 30 2009

"David B. Held" <dheld codelogicconsulting.com> writes:

Walter Bright wrote:
 Bill Baxter wrote:
 It works fine for me most of the time, but hangs about 1 out of 20
 links or so.  Not insurmountable for a 1-link project.  But I can see
 how that ain't going to take you far if you're running a test suite
 with >> 20 programs that must be linked back-to-back.

 
 
 The test suite does thousands of links. It doesn't get further than a 
 dozen or two into it before it fails. I suppose that could explain why 
 it seems to work.

I did notice that the linker seemed to hang randomly.  Glad it isn't 
just me.

Dave

Jul 05 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 30 Jun 2009 12:29:19 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 After acquiring a mysterious virus that would randomly hang my Windows  
 box at 100% CPU but all the processes showed 0 runtime, it was time to  
 reinstall Windows. Since installing Windows is an all-day affair, I  
 decided it was time to upgrade my 7 year old hardware to multicore.

 Once I was up and running, I decided to run the D test suite. I  
 immediately discovered that optlink simply doesn't work on multicore.  
 The multithreading code in it was developed for a single core machine,  
 and multicore is different.

 I was able to fix it by running the command:

     imagecfg -a 0x1 \dm\bin\link.exe

 imagecfg.exe is downloadable from the internet. This command patches the  
 executable so it only runs on one core.

Great to hear that.

Will the linker be updated in the upcoming release, or is everyone expected  
to read the newsgroups, download imagecfg, and patch optlink manually?

Jun 30 2009

Walter Bright <newshound1 digitalmars.com> writes:

Denis Koroskin wrote:
 Will the linker be updated in the upcoming release, or is everyone 
 expected to read the newsgroups, download imagecfg, and patch optlink 
 manually?

Certainly I'll patch the linker for the next update.

Jun 30 2009
