digitalmars.D - Identifier-name compression.

reply Stefan Koch <uplink.coder googlemail.com> writes:
Hi,

I just had a nice idea.
However, due to my lack of obj-file-format knowledge I don't know 
how feasible it is.
As far as I can see, identifiers are already in a hashed format 
while inside the symbol table of the compiler.
The idea would be to save a hash table from id to clear-text name 
or compressed clear-text name inside the object file, and simply mangle 
the id of the identifier rather than the identifier itself.
May 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 22:40:44 UTC, Stefan Koch wrote:
 Hi,

 I just had a nice idea.
 However, due to my lack of obj-file-format knowledge I don't know 
 how feasible it is.
 As far as I can see, identifiers are already in a hashed format 
 while inside the symbol table of the compiler.
 The idea would be to save a hash table from id to 
 clear-text name or compressed clear-text name inside the object 
 file, and simply mangle the id of the identifier rather than the 
 identifier itself.
I thought about this a bit more, and I am more and more convinced that it can actually work, since the symbol id will be unique per module. So basically the mangled name would look like _modulename_length%modulename_SymbolID. This way processing time will not be affected, and in the best case it will even be reduced.
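A minimal sketch of the scheme, written in Python for brevity. The class `ModuleSymbolTable`, its method names, and the concrete `_<length><module>_<id>` layout are assumptions made for illustration (the format above is only stated informally); none of this reflects how dmd actually stores identifiers.

```python
class ModuleSymbolTable:
    """Hypothetical per-module table mapping numeric ids to clear-text names."""

    def __init__(self, module_name):
        self.module_name = module_name
        self.names = []   # index = symbol id, value = clear-text name
        self.ids = {}     # clear-text name -> symbol id

    def intern(self, name):
        # Ids are handed out in insertion order, so the same source
        # processed in the same order yields the same ids.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]

    def mangle(self, name):
        # One reading of the proposed layout: _<len(module)><module>_<id>
        sym_id = self.intern(name)
        return f"_{len(self.module_name)}{self.module_name}_{sym_id}"

    def lookup(self, sym_id):
        # The id-to-name table recovers the clear-text name.
        return self.names[sym_id]


table = ModuleSymbolTable("mymod")
short = table.mangle("someTemplate!(int, string).veryLongMemberName")
print(short)             # _5mymod_0
print(table.lookup(0))   # someTemplate!(int, string).veryLongMemberName
```

The mangled name stays the same length no matter how long the clear-text name is; the cost is that the table has to travel with the object file.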
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 3:50 PM, Stefan Koch wrote:
 [...]
It won't be reproducible from run to run, and worse, if you use separate compilation, duplicates are inevitable.
May 21 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 22:59:48 UTC, Walter Bright wrote:
 On 5/21/2016 3:50 PM, Stefan Koch wrote:
 [...]
It won't be reproducible from run to run, and worse, if you use separate compilation, duplicates are inevitable.
Please elaborate: why wouldn't it be reproducible from run to run? Aren't symbols always inserted in the same order, so that the same source file will always produce the same mangling? And at link time the id-to-identifier translation table would be consulted.
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 4:02 PM, Stefan Koch wrote:
 On Saturday, 21 May 2016 at 22:59:48 UTC, Walter Bright wrote:
 On 5/21/2016 3:50 PM, Stefan Koch wrote:
 [...]
It won't be reproducible from run to run, and worse, if you use separate compilation, duplicates are inevitable.
Please elaborate: why wouldn't it be reproducible from run to run?
Because it is the address of the symbol, and modern operating systems randomize the addresses of a loaded program from run to run.
 and at link time the id-to-identifier translation-table would be consulted ?
There's no such table.
May 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 23:20:53 UTC, Walter Bright wrote:
 On 5/21/2016 4:02 PM, Stefan Koch wrote:
 and at link time the id-to-identifier translation-table would 
 be consulted ?
There's no such table.
Of course the table would have to be built by the compiler and inserted as data into the object file.
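A rough sketch of what "inserted as data" could mean: the compiler serializes the id-to-name table into a byte blob, and a tool that knows the layout reads it back. The layout used here (a little-endian count followed by length-prefixed UTF-8 strings) and the function names are invented for this example; it is not a real object-file section format, and an ordinary linker would not interpret it.

```python
import struct


def pack_table(names):
    """Serialize names as: u32 count, then (u32 length, UTF-8 bytes) per entry."""
    blob = struct.pack("<I", len(names))
    for name in names:
        data = name.encode("utf-8")
        blob += struct.pack("<I", len(data)) + data
    return blob


def unpack_table(blob):
    """Inverse of pack_table; the list index is the symbol id."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    names = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        names.append(blob[offset:offset + length].decode("utf-8"))
        offset += length
    return names


names = ["foo.bar.Baz", "foo.bar.qux"]   # hypothetical clear-text names
blob = pack_table(names)
print(unpack_table(blob) == names)       # True
```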
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 4:30 PM, Stefan Koch wrote:
 Of course the table would have to be built by the compiler and inserted as data
 into the object file.
You'd have to build your own linker, too.
May 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 23:43:48 UTC, Walter Bright wrote:
 On 5/21/2016 4:30 PM, Stefan Koch wrote:
 Of course the table would have to be built by the compiler and 
 inserted as data
 into the object file.
You'd have to build your own linker, too.
Not if dmd is used to build the executable.
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 4:45 PM, Stefan Koch wrote:
 On Saturday, 21 May 2016 at 23:43:48 UTC, Walter Bright wrote:
 On 5/21/2016 4:30 PM, Stefan Koch wrote:
 Of course the table would have to be built by the compiler and inserted as data
 into the object file.
You'd have to build your own linker, too.
Not if dmd is used to build the executable.
Since such a dmd would have to be able to read .o files created by C/C++, it would be the same thing as building our own linker.
May 21 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 23:52:59 UTC, Walter Bright wrote:
 On 5/21/2016 4:45 PM, Stefan Koch wrote:
 On Saturday, 21 May 2016 at 23:43:48 UTC, Walter Bright wrote:
 On 5/21/2016 4:30 PM, Stefan Koch wrote:
 Of course the table would have to be built by the compiler and 
 inserted as data
 into the object file.
You'd have to build your own linker, too.
Not if dmd is used to build the executable.
Since such a dmd would have to be able to read .o files created by C/C++, it would be the same thing as building our own linker.
If extern(C) or extern(C++) is used, we can't apply our mangling scheme anyway. So any function that is supposed to be called by C or C++ will still be mangled in a compatible way. That way we can get away with using dmd as a pre-linker and doing the rest of the job with the system linker.
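As a sketch of that split, assuming a hypothetical `link_name` helper: symbols with C or C++ linkage keep whatever name the platform ABI dictates, and only D-linkage symbols get the compact id-based name. The `abi_name` parameter stands in for the output of the real C or C++ mangler, which this example does not try to reproduce.

```python
def link_name(module, sym_id, linkage, abi_name):
    """Pick the name emitted into the object file.

    abi_name is a placeholder for whatever the platform's C or C++
    mangler would produce; computing it is out of scope here.
    """
    if linkage in ("C", "C++"):
        # Foreign code must find the symbol under its ABI name, unchanged.
        return abi_name
    # D linkage: compact id-based name, _<len(module)><module>_<id>.
    return f"_{len(module)}{module}_{sym_id}"


print(link_name("mymod", 7, "D", "unused"))       # _5mymod_7
print(link_name("mymod", 8, "C", "my_c_func"))    # my_c_func
```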
May 21 2016
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 22:59:48 UTC, Walter Bright wrote:
 On 5/21/2016 3:50 PM, Stefan Koch wrote:
 [...]
It won't be reproducible from run to run, and worse, if you use separate compilation, duplicates are inevitable.
There will not be duplicates, since you would not compile the same module twice, and if you do, it is trivial to remove them. In fact you would have the same duplicates with every mangling scheme. A symbol can be uniquely identified by the module it is defined in plus a numerical id. If your module names clash you cannot compile anyway... At least I hope so.
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 4:08 PM, Stefan Koch wrote:
 A symbol can be uniquely identified with the module it is defined in and a
 numerical id.
I've used such for temporaries, but they caused problems and people complained.
May 21 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 23:22:22 UTC, Walter Bright wrote:
 On 5/21/2016 4:08 PM, Stefan Koch wrote:
 A symbol can be uniquely identified with the module it is 
 defined in and a
 numerical id.
I've used such for temporaries, but they caused problems and people complained.
I see. But realistically, compression of the symbol name is no different. In fact the hypothetical id-to-name table would just be the external dictionary of a compressor.
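To illustrate the "external dictionary" analogy, here is a small sketch using the preset-dictionary support in Python's `zlib` module (the `zdict` argument). The symbol names and the helper functions are made up for the example; the point is only that a name already present in the shared dictionary compresses to a short back-reference, much as an id is a short reference into a table.

```python
import zlib

# Hypothetical clear-text names known to both compressor and decompressor.
known_names = [
    b"std.algorithm.iteration.map",
    b"std.algorithm.iteration.filter",
    b"std.algorithm.searching.find",
]
dictionary = b"".join(known_names)


def compress(name, zdict=None):
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return c.compress(name) + c.flush()


def decompress(data, zdict=None):
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()


name = b"std.algorithm.iteration.filter"
plain = compress(name)                   # no shared dictionary
with_dict = compress(name, dictionary)   # shared dictionary as "the table"

# Round trip only works if the reader has the same dictionary.
print(decompress(with_dict, dictionary) == name)   # True
print(len(name), len(plain), len(with_dict))
```

As with the id scheme, the short form is meaningless to any tool that does not have the dictionary.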
May 21 2016