digitalmars.D - Converting Optlink from Assembler to C - Reboot
- Walter Bright (2/2) Nov 30 2009 http://www.reddit.com/r/programming/comments/a9mse/walter_bright_from_as...
Walter Bright <newshound1 digitalmars.com> writes:
http://www.reddit.com/r/programming/comments/a9mse/walter_bright_from_assembly_to_c/ Hopefully a working link this time.
Nov 30 2009
bearophile <bearophileHUGS lycos.com> writes:
A nice article, thank you for writing it for us for free (I used to pay to read similar texts).I had the job at the time of converting a huge (and very successful) electronic schematic editor, DASH, from assembler into C. In C it could then be recompiled for 32 bits, and even ported to other platforms like the Sun workstations. The conversion took months, but was a big success.<Sounds like a so much bug-prone job that the probability of having a final working product seem small. I guess you have translated it part-by-part, keeping it functional all the time, as you are doing with optlink.9. Optlink has no unit tests. Writing them would require understanding what the various functions do, and I won't know precisely what they are supposed to do until most of the program is converted.<Translating a largish program from a language to a different language is a good way to convince most programmers that unit tests are a Good Thing. Unit tests make this work quite simpler. And many good programs eventually need to be translated, it's not an uncommon event.Although I have not run any speed tests, I expect the performance of the non-I/O bound code to be about 30% slower. Since a linker tends to be I/O bound, the actual performance loss probably will be about 10%, which I can live with.<We'll see. Modern C compilers, like the Intel one, LLVM or GCC 4.4 sometimes give surprises :-) Modern C compilers also use CPU instructions that were not available in the past (you have to use compilation flags for Pentium4/Core2 or similar CPUs of course).The difference in object code size is primarily due to the assembler having done register assignments that cross over multiple function calls. There's just a lot less pushing and popping of parameters.<In GCC there are annotations (that LLVM doesn't support yet) that allow to nail variables into registers. One Haskell compiler (GHC) uses this feature a lot, it's explained here: http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdfRegistered mode is implemented by both the C and NCG back-ends and involves primarilytwo optimisations which significantly increase the performance of generated code. The first optimisation is known as TABLES_NEXT_TO_CODE and optimises the data layout of the code GHC produces, this is further explained in section 2.5. The second optimisation though is far more significant, it deals with the register allocation for the generated code. As part of the execution model that GHC uses for Haskell, it defines a variety of virtual registers which mirror many of the registers typically found on a CPU, such as a stack pointer register. These are examined in more detail in section 2.4.2 but their purpose is to create a virtual machine architecture which can be explicitly managed instead of using the C stack for example. In unregistered mode these virtual registers are all stored on the heap, needing memory access for any reads and writes. Given the frequency of access of these virtual registers, this creates a significant amount of memory traffic which becomes a limiting factor on the performance of the generated code. In registered mode many of these virtual registers are instead permanently assigned to a specific hardware registers (register pinning), greatly improving performance. The runtime of code compiled in registered mode as opposed to unregistered mode is on average 55% shorter. More details of this can be seen in section 4.< Eventually this GCC feature can be added to D too. D may also use a function annotation that tells a function how to manage certain registers.There are also a lot of functions with multiple entry points. I don't know of any current compiler that is able to enregister across a flow graph of function calls, or is able to tail merge multiple functions.<Probably no common compiler.After a few false starts, I remembered the lessons from converting DASH, which are:<How I translate from a higher level language to another, for example Java to D or Python to C: 1) I make sure to have a certain amount of correct input-output pairs for the original code, and sometimes I even write few unittests. 2) I clean & reformat the original code, because a well formatted code is more readable. I use the formatting style that I am able to read better. I sometimes remove excessive comments, and all the parts of the code that don't need to be translated. I keep testing that the functional tests and unit test keep working. 3) I slowly modify the original code, lowering its style, or just changing it, to make the code look and work as closer as possible to the destination language. Often this is a lowering of semantics, to use more basic constructs that are shared between the two languages. Every change is of course followed by running the tests. This is the intermediate version. 4) And only now I jump to the destination language, fix things, and hope. Sometimes I see problems, so I go back to the intermediate version and fix it so its semantics is closer to the destination language. Sometimes I even use an intermediate language, for example to translate Python to C I first use D with my dlibs that allow me to write very Python-like D code. Then I lower that D version to normal D-style code. And then I slowly lower the D version to C-style code. Running tests every few minutes. At the end I convert it to C, and I try to see if it works. D1 language is closer to C compared to D2 so this is simpler. Some bugs are common here when I translate D to C, like forgetting to initialize C variables, etc. To solve this problem I slowly assign "= void" to *all* variables and structs in the D code, so I can see it crash before going to C. As you, I have seen that doing such things in the most tidy and regular way possible saves me time. Even using an intermediate language (like D1 to translate Python => C) looks like a slow thing, but it allows me to avoid many bugs along the way, so even if it's boring and it looks like a waste of time, in the end I save time. D1 is useful to me to perform such translations too (D2 will be less fit for this purpose, so I'll probably keep using D1 for this). Bye, bearophile
Nov 30 2009
bearophile <bearophileHUGS lycos.com> writes:
I have forgotten something: Walter:Although I have not run any speed tests, I expect the performance of the non-I/O bound code to be about 30% slower. Since a linker tends to be I/O bound, the actual performance loss probably will be about 10%, which I can live with.<To improve this situation, when the translation to C is done, the program can be profiled, and the performance critical parts can be modified from spaghetti code with gotos to structured code that I think C optimizers can digest better. Bye, bearophile
Dec 01 2009