digitalmars.D - Self-Modifying code for user settings optimization
- Jason Jeffory (16/16) Jan 09 2016 Instead of something like
- Rikki Cattermole (20/20) Jan 09 2016 I've been looking into this issue for web routing.
- Jason Jeffory (26/50) Jan 09 2016 Well, I wasn't thinking of interpreted/JIT code but native.
- Rikki Cattermole (7/58) Jan 09 2016 What I think you're wanting is a little to 'magical' for compilers
- John Colvin (4/5) Jan 09 2016 An enum isn't guaranteed to be embedded in the instruction
- Rikki Cattermole (9/13) Jan 09 2016 enum FOO = true;
- John Colvin (4/22) Jan 09 2016 Of course, I just meant that when reading a global or an enum,
- Jason Jeffory (80/171) Jan 09 2016 It might, which is why I asked, seems like it would be something
- Rikki Cattermole (28/28) Jan 09 2016 interface IFoo {
- Jason Jeffory (8/36) Jan 09 2016 I see what you are saying... it should work. Seems like a lot of
- Jay Norwood (13/18) Jan 10 2016 There is debug line info, but good luck with most of that after
Instead of something like

    DoSomething(UserSettings["width"]);

which requires an access to UserSettings that may be slow in time-critical code (but you want to provide some way to configure various behaviors), why not use self-modifying code?

    DoSomething(3);

3 may be the default, but a hack is somehow introduced, such as modifying the "push 3" instruction to "push width". The width is constant for all runs; the instruction's operand changes only when the default value is changed, which could be done at setup. (The push could be a mov or whatever.)

This would avoid pipelining issues and provide the absolute fastest way to have settings? (Of course, we'd have to know where all the "push" instructions are located and all that, since the modification could not occur serially; that would be somewhat pointless.)

Not even sure if CPUs allow SMC anymore?
Jan 09 2016
I've been looking into this issue for web routing. Overall it's definitely more performant. But:

- You need some way to generate code
- ABI compatibility
- Host binary compatibility (not the same as ABI)
- Front end for the "language" to specify what to generate

I'm either going the sljit way or my own. ATM I'm looking at building a C frontend to help with porting of sljit and for future AOT generation of binaries. Most of the work to get x86 done for sljit has been done, about 2-3k left.

https://github.com/rikkimax/sljitd

Regarding whether CPUs allow JIT'ing code: yup, they still allow it. If they didn't, that CPU would be next to useless. However, an OS is not required to expose this. But if you're dealing with Windows and *nix, don't worry about it.

If you're interested in helping to port sljit, please do. Just note that it isn't a very optimized JIT, but it is fairly small and easy to use. Important to me is that it can be fully ported to D without much worry, unlike LLVM, which is a pain to compile anyway.
Jan 09 2016
On Saturday, 9 January 2016 at 10:41:23 UTC, Rikki Cattermole wrote:
> I've been looking into this issue for web routing. [...]

Well, I wasn't thinking of interpreted/JIT code but native. I suppose D could possibly do it with CTFE? (Create the CTFE to keep track of the addresses, if possible, of where the variables are in memory, so they can be updated.) e.g.,

    DoSomething(Settings!"Width"); // Somehow puts in a dummy variable and keeps track of its address
    x = Settings!"Width";          // similar but different behavior

Basically turn a complex call (dictionary lookup or something similar) into a "mov x, const" instruction, etc.

Mainly I'm thinking about switches like

    if (Settings["FastCode"]) { }

but want to remove the lookup. Hence maybe something like

    if (volatile bool x = TRUE) { }

But then somehow capture x's address (not sure if we could accomplish that in D?), so we can easily change its value outside the time-critical code when needed.

Sorry, I can't help with sljit... I have way too many things on my plate; at some point I might if stuff changes. I'll look into it a little though. Thanks
Jan 09 2016
On 10/01/16 12:32 AM, Jason Jeffory wrote:
> Mainly I'm thinking about switches like if (Settings["FastCode"]) { } but want to remove the lookup. [...]

What I think you're wanting is a little too 'magical' for compilers, especially dmd, to do.

I would recommend using enums and static if, or just go ahead and set it to a global variable. Enums are free and global variables may have cache-miss issues, but it will be better than doing an AA lookup every time.
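The global-variable suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: `userSettings`, `refreshSettings`, `fastCode` and `compute` are made-up names, not anything from the thread or a library. The AA is consulted once when settings change, and the hot loop reads a plain global.

```d
import std.stdio : writeln;

// Hypothetical settings store; the thread only calls it Settings/UserSettings.
__gshared bool[string] userSettings;

// Cached copy read by the hot path; refreshed only when settings change.
__gshared bool fastCode;

void refreshSettings()
{
    // One AA lookup here instead of one per loop iteration.
    // `get` returns the supplied default when the key is missing.
    fastCode = userSettings.get("FastCode", false);
}

int compute(int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
    {
        if (fastCode)      // plain load of a global, no hashing
            sum += i;
        else
            sum += 2 * i;
    }
    return sum;
}

void main()
{
    userSettings["FastCode"] = true;
    refreshSettings();
    writeln(compute(4));
}
```

The load of `fastCode` can still miss the cache, as noted in the thread, but it avoids hashing the key on every iteration.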
Jan 09 2016
On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole wrote:
> Enums are free and global variables may have cache misses issue

An enum isn't guaranteed to be embedded in the instruction stream; there are still plenty of opportunities for cache misses.
Jan 09 2016
On 10/01/16 3:50 AM, John Colvin wrote:
> An enum isn't guaranteed to be embedded in the instruction stream, there's still plenty of opportunities for cache misses.

    enum FOO = true;

    static if (FOO) {
        doThis();
    } else {
        doThat();
    }

No need for the enum to be embedded in the instruction stream, because it won't be. The else block just doesn't get compiled in.
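A self-contained version of the enum plus static if idea, as a sketch: `doThis` and `doThat` are given placeholder bodies here, and `run` and `pick` are hypothetical names added only to make the example compile and show both branches.

```d
import std.stdio : writeln;

enum FOO = true;   // compile-time constant, as in the post above

string doThis() { return "this"; }
string doThat() { return "that"; }

string run()
{
    static if (FOO)
        return doThis();
    else
        return doThat();   // this call is not compiled in while FOO is true
}

// Same idea with the flag as a template parameter: each instantiation
// contains only the branch selected at compile time.
string pick(bool flag)()
{
    static if (flag)
        return doThis();
    else
        return doThat();
}

void main()
{
    writeln(run());
    writeln(pick!false());
}
```

Changing `FOO` still needs a recompile, which is the limitation raised later in the thread.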
Jan 09 2016
On Saturday, 9 January 2016 at 14:55:27 UTC, Rikki Cattermole wrote:
> No need for enum to be embedded in the instruction stream. Because it won't be. The else block just doesn't get compiled in.

Of course, I just meant that when reading a global or an enum, the enum isn't necessarily cheaper. static if f.t.w.
Jan 09 2016
On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole wrote:
> What I think you're wanting is a little to 'magical' for compilers especially dmd to do.

It might, which is why I asked. Seems like it would be something trivial to do if the address of the function and the relative address of the "variable" can be gotten at "compile time" (not sure it is possible, but maybe one could write an object parser).

> I would recommend using enum's and static if or just go ahead and set it to a global variable. Enums are free and global variables may have cache misses issue, but it will be better then doing an AA lookup every time.

I don't think either of these work. The globals have the cache issue and pollute the namespace. Enums are static; it requires a recompile to change settings. I'm talking about changing them at run time. Do you follow?

e.g.,

    void Compute()
    {
        ...
        for(;;)
        {
            if (Settings["ComputeEXACT"])
            {
                // Slower but exact
            } else
            {
                // Fast but worse approximation
            }
        }
    }

Obviously in some cases we can rearrange the order to avoid looking up ComputeEXACT in the loop, but assume this is not the case. The AA lookup is too slow and adds too much overhead.

Now suppose we use a local variable:

    void Compute()
    {
        bool ComputeEXACT = false; // We could use Settings["ComputeEXACT"] here, but assume Compute() may be used in other loops through chaining.
        ...
        for(;;)
        {
            if (ComputeEXACT)
            {
                // Slower but exact
            } else
            {
                // Fast but worse approximation
            }
        }
    }

But now ComputeEXACT behaves like the enum and essentially requires recompilation. The only way around that is to modify ComputeEXACT by code. This is very easy to do if we know where it is. We should be able to know, assuming it isn't optimized out (which can be prevented using volatile or whatever). It should be on the stack, right? Since false is a constant, a simple instruction should be generated to push it on there. We can change this!! (It would be platform dependent, but not hard.)

Alternatively, we could use a simple array to hold all the settings, but this requires an indirection.

Even better, just modify the if directly, changing a jne to a je type of thing. This would be the fastest way!!?!

This all requires knowing how to get the addresses of stuff at compile time.

The goals:

1. Avoid any lookup of memory locations except possibly off the stack.
2. Modify specific code-memory locations at runtime.

My guess is that D can't do this out of the box, but maybe it can accomplish it with an object code parser that builds a list of all the addresses that need modifying? (This might require a two-pass compilation, or the object parser could modify the code to "correct" it.) Then the core update routine could be written in D directly (assume all are bool for now):

    void UpdateSettings()
    {
        volatile bool Settings[N];
        for(int i = 0; i < N; i++)
            Modify(Settings[i].address, Settings[i].value);
    }

So, you call UpdateSettings, and it modifies all the addresses with the settings values. The object parser comes in after the code has compiled and fills in the correct info for N and address.

(Of course, this is dangerous, needless to say!)
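The "simple array to hold all the settings" alternative mentioned above could look roughly like this. It is a sketch, not the poster's code: the `Setting` enum, its members, the `settings` array and the arithmetic in `compute` are all assumptions made for illustration. It does not patch any instructions; it only replaces the AA lookup with a fixed-offset load.

```d
import std.stdio : writeln;

// Hypothetical setting names; only ComputeEXACT appears in the thread.
enum Setting { computeExact, fastCode }

// One slot per setting. Indexing with a compile-time constant is a load
// from a fixed address, with no hashing and no string comparison.
__gshared bool[Setting.max + 1] settings;

int compute(int n)
{
    int acc = 0;
    foreach (i; 1 .. n + 1)
    {
        if (settings[Setting.computeExact])
            acc += i * i;   // stands in for the slow, exact branch
        else
            acc += i;       // stands in for the fast approximation
    }
    return acc;
}

void main()
{
    settings[Setting.computeExact] = true;   // changed at run time, no recompile
    writeln(compute(3));
    settings[Setting.computeExact] = false;
    writeln(compute(3));
}
```

This keeps the one indirection the post mentions, so it trades the last bit of speed for staying inside the language.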
Jan 09 2016
    interface IFoo
    {
        void a();
        void b();
    }

    __gshared IFoo a, b;
    __gshared IFoo instance;

    class Foo(bool bar) : IFoo
    {
        void a()
        {
            static if (bar)
            {
                // do something
            }
            else
            {
                // do nothing
            }
        }

        void b() { } // stub so that Foo satisfies IFoo
    }

    shared static this()
    {
        a = new Foo!true;
        b = new Foo!false;
    }

    // Lookup stands for whatever settings type is in use, e.g. bool[string]
    void update(Lookup lookup)
    {
        if (lookup["bar"])
            instance = a;
        else
            instance = b;
    }

There is a small indirection at execution time to find which function to run, but that is the best the language semantics give us, and it only works for booleans.
Jan 09 2016
On Saturday, 9 January 2016 at 23:43:32 UTC, Rikki Cattermole wrote:
> [...]
> Small indirection when executing to find which function to execute but that is the best out of language semantics we have and only works for booleans.

I see what you are saying... it should work. Seems like a lot of bloat for something relatively trivial. Changing the whole context to change a single branch and multiplying the number of types might have some long-term consequences. It's also complicating the code quite a bit... more prone to errors. Maybe with a bit of ingenuity these can be overcome.
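One possible way to cut down the bloat, offered only as a sketch: keep the static if specialization but drop the interface and classes, and swap a plain function pointer instead. The names `computeImpl`, `compute` and `update` and the loop bodies are hypothetical. It is still one indirect call per invocation, not code patching.

```d
import std.stdio : writeln;

// One loop body, compiled twice: once per value of `exact`.
int computeImpl(bool exact)(int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
    {
        static if (exact)
            sum += i * i;   // stands in for the slow, exact branch
        else
            sum += i;       // stands in for the fast approximation
    }
    return sum;
}

// The hot path calls through this pointer; it is swapped outside the loop.
__gshared int function(int) compute = &computeImpl!false;

void update(bool exact)
{
    compute = exact ? &computeImpl!true : &computeImpl!false;
}

void main()
{
    update(true);
    writeln(compute(4));
    update(false);
    writeln(compute(4));
}
```

With several boolean settings the number of instantiations doubles per flag, so this only stays manageable for a handful of switches.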
Jan 09 2016
On Saturday, 9 January 2016 at 21:09:05 UTC, Jason Jeffory wrote:
> It might, which is why I asked, seems like it would be something trivial to do if the address of the function and relative address of the "variable" can be gotten at "compile time"(not sure it is possible by maybe one could write an object parser).

There is debug line info, but good luck with most of that after the optimizer gets through with the code.

This project provides an API for code patching. Maybe it will help, or at least give you ideas.

http://www.dyninst.org/dyninst

I also have some interest in the ability to add arbitrary named markers to code at compile time that could be accessed from symbol info. I'm not interested in modifying the code, but in using the addresses to create windows for code measurement. Our hardware supports performance analysis limited to a specified address range without instrumenting the code, but with optimized code it is difficult to use.
Jan 10 2016