digitalmars.D - [challenge] Linker surgery

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
The gist of the matter is that I have a module with some immutable tables. 
Quite a few, I should say (Unicode stuff). This outlines the problem:

immutable Something tableA  = ...; //~4-8Kb
...
immutable Something tableZ  = ...; //~4-8Kb

Now I have a set of functions that make use of each of these tables:

bool funcA(dchar ch){
	//optional ... extra logic
	return tableA[ch];
}
...
bool funcZ(dchar ch){ ... }

Now the challenge is: how do I make it NOT link in the tables if I don't 
call the corresponding functions?

First try fails:

bool funcA(dchar ch){
	//this just allocates!
	immutable tableA = [ ... ];
	//sadly this appears in the map file
	static immutable tableA = [ ... ];
}

For the moment this is what I found to 'work':
template funcA(){
	immutable tableA = [ ... ];
	bool funcA(dchar ch){
		...
	}
}

Otherwise the table is present regardless of whether or not I make use 
of funcA. The above is a neat hack but I think there must be a better 
way to avoid pulling globals into the executable.
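
For readers who want to try the template trick, here is a minimal sketch of it. The name `isMarker`, the table name and the table contents are hypothetical placeholders, not the real Unicode tables. The assumption, per the observation above, is that the table only reaches the binary if the template is instantiated somewhere.

```d
// Sketch only: `isMarker`, `markerTable` and the values are hypothetical.
template isMarker()
{
    // Declared inside the template, so it exists only where the
    // template is actually instantiated.
    immutable bool[4] markerTable = [true, false, true, false];

    bool isMarker(dchar ch)
    {
        // Bounds check first, then the table lookup.
        return ch < markerTable.length && markerTable[ch];
    }
}
```

A call site instantiates the template explicitly, e.g. `isMarker!()('\0')`. A program that never does so should not see `markerTable` in its map file, though that is worth confirming with `-map` for your compiler and linker.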

In the test sample below 'fable' is present in the map file, foo itself 
is correctly absent.

//module
// should be changed somehow so as not to put fable into exe when foo
// is not referenced, with no templates involved
module mod;
immutable byte[] fable = [1, 2, 3, 4, 5];

public byte foo(int ch){
     return fable[ch];
}


//driver
import mod;

immutable byte[] bable = [1, 2, 3, 4, 5];

byte boo(int ch){
     return bable[ch];
}

void main(string[] args){
     boo(0);
}

-- 
Dmitry Olshansky
May 14 2013
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Tue, 14 May 2013 18:59:22 +0400
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 
 Now is the challenge is how do I make it NOT link in tables if I
 don't call the corresponding functions.
 
Don't some linkers do unreferenced symbol removal? Maybe that's all that's needed?
May 14 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
14-May-2013 22:27, Nick Sabalausky пишет:
 On Tue, 14 May 2013 18:59:22 +0400
 Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Now is the challenge is how do I make it NOT link in tables if I
 don't call the corresponding functions.
Don't some linkers do unreferenced symbol removal?
Interestingly it does ... for functions.
 Maybe that's all
 that's needed?
Might be. If that were my personal need I might set off to search for some smart linker. But the thing will end up in the standard library, and there it's surely stuck with ld/optlink.
-- 
Dmitry Olshansky
May 14 2013
parent reply "IgorStepanov" <wazar mail.ru> writes:
Is this table linked in if you remove all the functions that use it?
If not, you can try the following method (another hack).

extern(C) immutable int[] table_ = [1,2,3]; //table_.mangleof == "table_";

int foo()
{
        pragma(mangle, "table_") extern immutable int[] table;
        return table[1];
}

In this hack we are "fooling" the compiler. We tell it that "table_"
isn't used in foo(). Instead, foo() uses another extern array, "table",
with an overridden mangled name (a new feature).

On Tuesday, 14 May 2013 at 18:36:32 UTC, Dmitry Olshansky wrote:
 14-May-2013 22:27, Nick Sabalausky пишет:
 On Tue, 14 May 2013 18:59:22 +0400
 Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Now is the challenge is how do I make it NOT link in tables 
 if I
 don't call the corresponding functions.
Don't some linkers do unreferenced symbol removal?
Interestingly it does ... for functions.
 Maybe that's all
 that's needed?
Might be - if that was my personal need I might set off to search some smart linker. But the thing will end up in the standard library and surely there it's stuck with ld/optlink.
May 14 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use it?
Thanks for trying this, but they DO always get linked in. And I believe this is the key problem: each function goes into a separate object but globals are always pulled in!
Has anyone ever used digits, letters or similar arrays in std.ascii? I doubt it, yet they are almost always there in the binary (nearly everything depends on std.uni, and that depends on std.ascii). These bits do add up, and now with the full spectrum of Unicode...
Hm... now if we go the shared library route I'd only do a disservice by making the tables into templates. But in a statically linked program surely nobody wants e.g. an extra ~20K of normalization tables (and there is way more) in nearly *every* program.
P.S. I'm coming to hate developing std stuff, so little room for maneuver ;)
 It not, you can try the next method (Another hack).

 extern(C) immutable int[] table_ = [1,2,3]; //table_.mangleof ==
 "table_";

 int foo()
 {
         pragma(mangle, "table_") extern immutable int[] table;
         return table[1];
 }

 In this hack we "fooling" the compiler. We say, that "table_"
 doesn't used in foo(). foo() use another extern array "table"
 with overrided mangle (new feature)

 On Tuesday, 14 May 2013 at 18:36:32 UTC, Dmitry Olshansky wrote:
 14-May-2013 22:27, Nick Sabalausky пишет:
 On Tue, 14 May 2013 18:59:22 +0400
 Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Now is the challenge is how do I make it NOT link in tables if I
 don't call the corresponding functions.
Don't some linkers do unreferenced symbol removal?
Interestingly it does ... for functions.
 Maybe that's all
 that's needed?
Might be - if that was my personal need I might set off to search some smart linker. But the thing will end up in the standard library and surely there it's stuck with ld/optlink.
-- Dmitry Olshansky
May 17 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 17.05.2013 14:29, Dmitry Olshansky wrote:
 15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use it?
Thanks for this try, but they DO link in always. And I believe this is a key problem - each function goes into a separate object but globals are always pulled in!
Yes, if you build a library the functions in a module are split into separate object files, but data is always written into the object file of the original module. The linker cannot split these afterwards if any data in the module is referenced (which might happen by just importing the module). A workaround could be to put the data into a different module.
May 17 2013
next sibling parent reply "Igor Stepanov" <wazar.leollone yahoo.com> writes:
On Friday, 17 May 2013 at 17:57:54 UTC, Rainer Schuetze wrote:
 On 17.05.2013 14:29, Dmitry Olshansky wrote:
 15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use 
 it?
Thanks for this try, but they DO link in always. And I believe this is a key problem - each function goes into a separate object but globals are always pulled in!
Yes, if you build a library the functions in a module are split into separate object files, but data is always written into the object file of the original module. The linker cannot split these afterwards if any data in the module is referenced (which might happen by just importing the module). A workaround could be to put the data into a different module.
What happens if you place the table into a separate module? That module will be compiled as an independent object file and (I hope) will not be linked in if no symbols from it are used. Or is my logic broken?
May 17 2013
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 18.05.2013 00:47, Igor Stepanov wrote:
 On Friday, 17 May 2013 at 17:57:54 UTC, Rainer Schuetze wrote:
 On 17.05.2013 14:29, Dmitry Olshansky wrote:
 15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use it?
Thanks for this try, but they DO link in always. And I believe this is a key problem - each function goes into a separate object but globals are always pulled in!
Yes, if you build a library the functions in a module are split into separate object files, but data is always written into the object file of the original module. The linker cannot split these afterwards if any data in the module is referenced (which might happen by just importing the module). A workaround could be to put the data into a different module.
What happens, if you place table into separate module? This module will be compiled as independent object file and (I hope) can be not linked if symbols from it will not be used. Or my logic is broken?
That should work, with the restrictions described in my reply to Dmitry.
May 17 2013
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
17-May-2013 21:57, Rainer Schuetze пишет:
 On 17.05.2013 14:29, Dmitry Olshansky wrote:
 15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use it?
Thanks for this try, but they DO link in always. And I believe this is a key problem - each function goes into a separate object but globals are always pulled in!
Yes, if you build a library the functions in a module are split into separate object files, but data is always written into the object file of the original module. The linker cannot split these afterwards if any data in the module is referenced (which might happen by just importing the module).
And how then would I use these tables if even importing these modules pulls in the data? A local import doesn't help either.
 A workaround could be to put the data into a different module.
Then say I go for X files with tables and import these modules. It still doesn't work: evidently, simply referencing a module pulls in the data. If I compile and link the following 3 modules:

module fmod;
public immutable int[] fable = [1,2,3];

module mod;
import fmod;
int foo(int i)
{
     return fable[i];
}

//driver
import mod;
immutable byte[] bable = [1, 2, 3, 4, 5];
byte boo(int ch){
     return bable[ch];
}
void main(string[] args){
     boo(0);
}

I still have this symbol in the map file:
_D4fmod5fableyAi
That is immutable(int[]) fmod.fable after ddemangle.
-- 
Dmitry Olshansky
May 17 2013
next sibling parent "Igor Stepanov" <wazar.leollone yahoo.com> writes:
 And how then I would use these tables if even importing these modules
 then pulls in the data?
Without an import, with the help of pragma(mangle). But this dark linker magic is not much better than your template solution.
May 17 2013
prev sibling next sibling parent reply "Igor Stepanov" <wazar.leollone yahoo.com> writes:
module fmod;

public immutable int[] fable = [1,2,3];


module mod;

//import fmod;

int foo(int i)
{
        pragma(mangle, "_D4fmod5fableyAi") extern immutable int[] fable;
        return fable[i];
}
May 17 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-05-18 01:10, Igor Stepanov wrote:
 module fmod;

 public immutable int[] fable = [1,2,3];


 module mod;

 //import fmod;

 int foo(int i)
 {
         pragma(mangle, "_D4fmod5fableyAi") extern immutable int[] fable;
         return fable[i];
 }
Or with extern (C) to avoid the pragma.
-- 
/Jacob Carlborg
May 18 2013
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 18.05.2013 00:49, Dmitry Olshansky wrote:
 17-May-2013 21:57, Rainer Schuetze пишет:
 On 17.05.2013 14:29, Dmitry Olshansky wrote:
 15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use it?
Thanks for this try, but they DO link in always. And I believe this is a key problem - each function goes into a separate object but globals are always pulled in!
Yes, if you build a library the functions in a module are split into separate object files, but data is always written into the object file of the original module. The linker cannot split these afterwards if any data in the module is referenced (which might happen by just importing the module).
And how then I would use these tables if even importing these modules then pulls in the data?
It depends. An imported module is referenced if it is part of the dependency chain for static constructors, i.e. if it contains static this() or imports something that contains it. A module with just data in it should be fine.
 Local import doesn't help too.

 A workaround could be to put the data into a different module.
Then say I go for X files with tables and import these modules. It still doesn't work - evidently simply referencing a module pulls in the data. If I compile and link the following 3 modules: module fmod; public immutable int[] fable = [1,2,3]; module mod; import fmod; int foo(int i) { return fable[i]; } //driver import mod; immutable byte[] bable = [1, 2, 3, 4, 5]; byte boo(int ch){ return bable[ch]; } void main(string[] args){ boo(0); } I still have this symbol in map file: _D4fmod5fableyAi That is immutable(int[]) fmod.fable after ddemangle.
If you compile all the files on the command line for the executable, everything gets dragged in. You will have to build the first two modules into a library:

dmd -lib fmod.d mod.d -ofm.lib
dmd -map driver.d m.lib
grep fmod driver.map

Splitting functions into separate object files also only happens when -lib is specified.
May 17 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
18-May-2013 10:40, Rainer Schuetze пишет:
 On 18.05.2013 00:49, Dmitry Olshansky wrote:
 17-May-2013 21:57, Rainer Schuetze пишет:
 On 17.05.2013 14:29, Dmitry Olshansky wrote:
 15-May-2013 04:17, IgorStepanov пишет:
 Do this table linked, if you remove all functions, which use it?
Thanks for this try, but they DO link in always. And I believe this is a key problem - each function goes into a separate object but globals are always pulled in!
Yes, if you build a library the functions in a module are split into separate object files, but data is always written into the object file of the original module. The linker cannot split these afterwards if any data in the module is referenced (which might happen by just importing the module).
And how then I would use these tables if even importing these modules then pulls in the data?
It depends. An imported module is referenced if it is part of the dependency chain for static constructors, i.e if it contains static this() or imports something that contains it. A module with just data in it should be fine.
 Local import doesn't help too.

 A workaround could be to put the data into a different module.
Then say I go for X files with tables and import these modules. It still doesn't work - evidently simply referencing a module pulls in the data. If I compile and link the following 3 modules: module fmod; public immutable int[] fable = [1,2,3]; module mod; import fmod; int foo(int i) { return fable[i]; } //driver import mod; immutable byte[] bable = [1, 2, 3, 4, 5]; byte boo(int ch){ return bable[ch]; } void main(string[] args){ boo(0); } I still have this symbol in map file: _D4fmod5fableyAi That is immutable(int[]) fmod.fable after ddemangle.
If you compile all the files on the command line for the executable everything gets dragged in.
Then the only question would be: why do we need this behavior? It looks painfully clear to me that -lib style should be the default.
You will have to build the first two modules
 into a library:

 dmd -lib fmod.d mod.d -ofm.lib
 dmd -map driver.d m.lib
 grep fmod driver.map

 Splitting functions into separate object files also only happens when
 -lib is specified.
Great, then even this seems to work:

module mod;
int foo(int i)
{
     static immutable int[] fable = [1,2,3]; //table_.mangleof == "table_";
     return fable[i];
}
public void car(){ }

//driver
import mod;
immutable byte[] bable = [1, 2, 3, 4, 5];
byte boo(int ch){
     return bable[ch];
}
void main(string[] args){
     boo(0);
     car();
}
-- 
Dmitry Olshansky
May 18 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 18.05.2013 09:46, Dmitry Olshansky wrote:
 18-May-2013 10:40, Rainer Schuetze пишет:
 If you compile all the files on the command line for the executable
 everything gets dragged in.
Then the only question would be - why we need this behavior? It looks painfully clear to me that -lib style should be the default.
I guess it is the compilation model inherited from C++. Putting everything into libraries has its own share of issues like not linking in modules that are only accessed via the object factory or that register with some other system in a static constructor.
  >You will have to build the first two modules
 into a library:

 dmd -lib fmod.d mod.d -ofm.lib
 dmd -map driver.d m.lib
 grep fmod driver.map

 Splitting functions into separate object files also only happens when
 -lib is specified.
Great then even this seems to work: module mod; int foo(int i) { static immutable int[] fable = [1,2,3]; //table_.mangleof == "table_"; return fable[i]; } public void car(){ } //driver import mod; immutable byte[] bable = [1, 2, 3, 4, 5]; byte boo(int ch){ return bable[ch]; } void main(string[] args){ boo(0); car(); }
I just verified (for Win32) that it also works if you also define data in the car function.
On 17.05.2013 19:57, Rainer Schuetze wrote:
 Yes, if you build a library the functions in a module are split into
 separate object files, but data is always written into the object
 file of the original module. The linker cannot split these afterwards 
 if any data in the module is referenced (which might happen by just
 importing the module).
It seems I was wrong here. Function local data is written into the same object file as the function. I guess I confused it with COMDAT sections.
May 19 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
19-May-2013 11:02, Rainer Schuetze пишет:
 On 18.05.2013 09:46, Dmitry Olshansky wrote:
 18-May-2013 10:40, Rainer Schuetze пишет:
 If you compile all the files on the command line for the executable
 everything gets dragged in.
Then the only question would be - why we need this behavior? It looks painfully clear to me that -lib style should be the default.
I guess it is the compilation model inherited from C++.
Facepalm. :)
 Putting
 everything into libraries has its own share of issues like not linking
 in modules that are only accessed via the object factory or that
 register with some other system in a static constructor.
I thought these are just hooked up in some fake static this constructors (and these get pulled with import alone). Anyway - we then already have the same issues with classes from libraries that are loaded only via object factory. Meaning that it's not a new problem.
 On 17.05.2013 19:57, Rainer Schuetze wrote:
  > Yes, if you build a library the functions in a module are split into
  > separate object files, but data is always written into the object
  > file of the original module. The linker cannot split these afterwards
  > if any data in the module is referenced (which might happen by just
  > importing the module).

 It seems I was wrong here. Function local data is written into the same
 object file as the function.
Yes, which is nice. Your advice was priceless. Now I just need to rearrange some code to make everything local to these functions. Since it goes into Phobos it'll be a library automatically.
-- 
Dmitry Olshansky
May 19 2013
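
The conclusion of the thread can be sketched roughly as follows: each table becomes a function-local static immutable, so it is written into the same object file as its function when the module is built with -lib. The function names and table contents here are hypothetical placeholders, and the behaviour was only reported above for DMD building a library (verified on Win32), so treat this as an illustration rather than a guarantee for every toolchain.

```d
// Sketch only: function names and table values are hypothetical.
bool isFirst(dchar ch)
{
    // Function-local static data goes into the same object file as
    // the function (as reported in this thread for `dmd -lib`).
    static immutable bool[4] table = [true, false, false, true];
    return ch < table.length && table[ch];
}

bool isSecond(dchar ch)
{
    // A separate table, tied to this function only.
    static immutable bool[4] table = [false, true, true, false];
    return ch < table.length && table[ch];
}
```

Following the commands quoted earlier, one could save this as a hypothetical tables.d, build it with `dmd -lib tables.d -oftables.lib`, link a driver that calls only `isFirst` with `dmd -map driver.d tables.lib`, and then grep the map file to check that the table belonging to `isSecond` is absent.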