digitalmars.D - Self Optimizing Code
- Mark "J" Twain (53/53) Aug 02 2016 Another new D feature here!
- ketmar (1/1) Aug 03 2016 hello, you just invented JIT compiler.
- Luis (2/3) Aug 03 2016 This is not JIT.
- ketmar (5/8) Aug 03 2016 yes, it is. fairly advanced one, that does dynamic recompilation
- Jack Stouffer (2/3) Aug 03 2016 Yup, 100% a JIT. I'm pretty sure PyPy does this to some degree
- Mark "J" Twain (20/21) Aug 03 2016 Um, no. JIT = Just in time compilation. The code is already
- ketmar (6/6) Aug 03 2016 On Wednesday, 3 August 2016 at 19:22:05 UTC, Mark "J" Twain wrote:
- ikod (3/7) Aug 03 2016 Looks like this:
- Mark "J" Twain (5/15) Aug 03 2016 No, I don't think so. The best I can tell is this is just
- ikod (3/19) Aug 03 2016 This is not just profiling, but "Profile-Guided Optimization
- Mark "J" Twain (8/29) Aug 03 2016 Yes, but that is still profiling. It is static profiling while I
- Chris Wright (17/18) Aug 03 2016 The person is talking about algorithms with tuning parameters (like arra...
- Mark "J" Twain (45/68) Aug 03 2016 Close. I am only talking about adding the features to do such
- Chris Wright (22/22) Aug 03 2016 In your example, you have a size_t or double factor for array growth. If...
- Mark "J" Twain (59/83) Aug 04 2016 Yes, I mentioned that, but there is no ideal way to do this. I
- Chris Wright (17/21) Aug 04 2016 The cost with language changes isn't just implementation. The largest
- crimaniak (40/42) Aug 04 2016 Some time ago I played with self-optimizing cache layer.
- Mark "J" Twain (48/91) Aug 04 2016 I did not think of this in terms of networking. I suppose it can
Another new D feature here! Self-Optimizing Code is a fantastic research area that can lead to real improvements in code, make programs more responsive to individual applications, and requires very little effort.

To demonstrate: in many programs, memory behavior is fixed to a large degree, if not completely. Take the typical dynamic data structure that over-allocates the memory required to store its data by some factor, usually 2. We might have code like

if (length >= capacity) expand(2*length);

Expressions of this type are used throughout a program. They are hard-coded values that are generally somewhat arbitrary, and they are definitely not optimal in all cases. Instead, a better solution would be to use variables:

if (n*length > m*capacity) expand(l*length);

n, m, and l are now variables and can change. How is this useful, you say? Because of mathematics! In a typical program we will have N tuples, where each tuple t_i is a parameter set, such as t_k(n,m,l). This forms a vector space and, thanks to mathematics, we can find optimal values for the different parameter sets by observing the outcome of small perturbations over long periods.

E.g., we initially start with our hard-coded values as defaults: t_i_0(1,1,2) for our example above. A background process has these variables, their valid ranges, and the ability to measure the performance of the app. It then, in the background, perturbs t_i, say to t_i_1(1.01, 1, 2), monitors the performance impact, and stores the results. This can be done efficiently since the algorithm is quite simple, can be done without threading concerns (since the parameters are effectively immutable to the app), and can run in the background just monitoring the app's performance. Using statistics, we can find out whether the change had any significant effect (remember, we are doing this over the long haul, so any deviations will cancel out).
Eventually the program will optimize itself to run faster by choosing the values that best suit how it is used during the optimization process (which can run continually, and use better techniques than I have described). To accomplish this task easily, all one needs are special variables:

usage Perf("Memory_Capacity")
if ($n*length >= $m*capacity) expand($l*length);

The compiler then stores the values (n,m,l) in a list, in this case named, along with the file position, etc. This is the only extra work done to the program. The special variable symbol $ is arbitrary, not necessarily desirable. A background thread is created that then does the optimization part. We might need to specify ranges for the values (the sample space for the algorithm), though that could be done externally.
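To make the idea above concrete, here is a minimal D sketch, assuming hypothetical names throughout (there is no such language feature or library): the hard-coded growth factor becomes a globally visible parameter that the data structure only ever reads, so an external tuner could perturb it without the code changing.

```d
import std.stdio;

// Tuning parameters, initialized to the usual hard-coded defaults.
// A background optimizer could perturb these; the data structure
// itself only ever reads them.
__gshared double n = 1.0; // length weight
__gshared double m = 1.0; // capacity weight
__gshared double l = 2.0; // growth factor

struct TunableArray(T)
{
    T[] data;
    size_t length;

    void push(T value)
    {
        // The tunable version of: if (length >= capacity) expand(2*length);
        if (n * (length + 1) > m * data.length)
        {
            auto newCap = cast(size_t)(l * (length + 1));
            data.length = newCap > data.length ? newCap : data.length + 1;
        }
        data[length++] = value;
    }
}

void main()
{
    TunableArray!int a;
    foreach (i; 0 .. 1000)
        a.push(i);
    writeln(a.length, " items, capacity ", a.data.length);
}
```

The only cost when the optimizer does nothing is that n, m, and l are memory loads instead of immediate constants.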
Aug 02 2016
On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
> hello, you just invented JIT compiler.

This is not JIT.
Aug 03 2016
On Wednesday, 3 August 2016 at 13:45:13 UTC, Luis wrote:
> On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
>> hello, you just invented JIT compiler.
> This is not JIT.

yes, it is. fairly advanced one, that does dynamic recompilation of some code paths, but still JIT. having been in touch with JIT-related things for more than a decade, i can recognize JIT when i smell it.
Aug 03 2016
On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
> hello, you just invented JIT compiler.

Yup, 100% a JIT. I'm pretty sure PyPy does this to some degree
Aug 03 2016
On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
> hello, you just invented JIT compiler.

Um, no. JIT = just-in-time compilation. The code is already compiled and in binary form. The only difference is that hard-coded values (literals in the binary) become variables. There is no recompilation. The variables are simply updated by an optimization routine. They could be manually updated by the user, for example, but the user is not intelligent with regard to these things (although an interface for the user could be provided). Regardless, this is not JIT. Having seen some of your posts, I know you won't admit your ignorance here, but unless you are going to claim that DMD is JIT, then this is not JIT. Why? It can already be done in DMD without using any advanced techniques. The problem is that it requires the programmer to write more boilerplate code than is needed. This is more akin to GC, but instead of managing memory references, it manages variables. Instead of freeing things, it perturbs them. It does so in the background too, but does not need to "stop the world", since as far as the program is concerned, these special variables are like literals (immutable, at least).
Aug 03 2016
On Wednesday, 3 August 2016 at 19:22:05 UTC, Mark "J" Twain wrote: you know what? i'm sorry that i stepped in your thread. yet i fixed this, as i added you to my ignore list, which means that i won't get any more notifications with your name on 'em, and won't see 'em in web frontend. sorry, i have zero tolerance to people like you these days. hope we won't meet again.
Aug 03 2016
On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
> Another new D feature here! Self-Optimizing Code is a fantastic research area that can lead to greater improvements in code, make them more responsive to individual applications, and requires very little effort.

Looks like this: https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
Aug 03 2016
On Wednesday, 3 August 2016 at 18:18:51 UTC, ikod wrote:
> Looks like this: https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation

No, I don't think so. The best I can tell, this is just profiling. I am talking about having the program profile itself and make modifications itself (limited, of course, to these "special variables").
Aug 03 2016
On Wednesday, 3 August 2016 at 19:25:08 UTC, Mark "J" Twain wrote:
> No, I don't think so. The best I can tell is this is just profiling. This would be having the program itself profile [...]

This is not just profiling, but "Profile-Guided Optimization (PGO)".
Aug 03 2016
On Wednesday, 3 August 2016 at 21:09:46 UTC, ikod wrote:
> This is not just profiling, but "Profile-Guided Optimization (PGO)"

Yes, but that is still profiling. It is static profiling, while I am talking about dynamic profiling. It is similar, but it doesn't happen while the program is in use by the user. That is, simply, the programmer runs the profiler, it optimizes the code, and the result ships as final to the user. I am talking about the user running the program and the program optimizing itself. Big difference!
Aug 03 2016
On Wed, 03 Aug 2016 21:09:46 +0000, ikod wrote:
> This is not just profiling, but "Profile-Guided Optimization (PGO)"

The person is talking about algorithms with tuning parameters (like array growth rate when appending), adjusting those parameters based on observed characteristics of the program. For instance, if it's observed that arrays of a given type tend to grow to ~1800 entries or stay under 100 entries, the runtime might automatically extend your array from length 128 to 2048 directly instead of growing to 256, then 512, then 1024, and finally 2048. A compiler would not be allowed to make these changes automatically.

There are significant downsides. You need a lot of bookkeeping to determine what values you need. The values are not preserved across runs. You need to implement it separately for each algorithm. You generally only bother when the potential performance gains are great, both as an absolute measure and relative to the amount of bookkeeping you would have to do. For instance, array appending is a terribly common operation, so you can only use this technique productively there if gathering the data is dirt cheap.
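The bookkeeping described above can be sketched in a few lines of D; all names here are invented for illustration. The idea is to record the final length arrays of a given type reach, then suggest an up-front capacity next time instead of doubling repeatedly:

```d
import std.stdio;

// Record the final lengths arrays reach, and suggest a capacity to
// reserve up front on the next run. Illustrative only.
struct SizeHistogram
{
    size_t[] finals;

    void record(size_t finalLen) { finals ~= finalLen; }

    // A simple heuristic: reserve the maximum seen so far, rounded up
    // to a power of two (a percentile would be more robust).
    size_t suggest()
    {
        import std.algorithm : maxElement;
        if (finals.length == 0) return 16;
        size_t cap = 1;
        while (cap < finals.maxElement) cap *= 2;
        return cap;
    }
}

void main()
{
    SizeHistogram h;
    foreach (len; [1800, 1750, 90, 1820])
        h.record(len);
    writeln(h.suggest()); // 2048: jump straight there instead of 256, 512, ...
}
```

Even this toy version shows the cost concern: every append path now touches the histogram, so the recording itself has to be dirt cheap.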
Aug 03 2016
On Thursday, 4 August 2016 at 01:54:57 UTC, Chris Wright wrote:
> The person is talking about algorithms with tuning parameters (like array growth rate when appending) adjusting those parameters based on observed characteristics of the program.

Close. I am only talking about adding the features to do such things, not having the compiler involved in the analysis. We can already do this sort of thing, but it requires more boilerplate code, and the benefit may be negligible.

What it does for us is allow us to change hard-coded values into special variables that can then be manipulated (if desired, or left alone) while the program is executing. All the compiler does is take these special variables, store them in chunks (tuples), and possibly store some discriminating info about them, like the file and function they were used in. Then it is up to the optimizing algorithm to decide how to approach using them. If it does nothing, the only performance hit is that variables were used instead of hard-coded literals.

A bit more info may be required for the optimizer to make good choices, though, since the data collected could be extremely large (thousands of tuples). This creates a very large parameter space, and knowing which parameter changes created which performance changes is crucial (a blind Monte Carlo walk of the parameter space could work, but would produce erratic behavior).

The optimizations would persist between program executions simply by writing out all the current parameter values; otherwise it would be difficult for the program to ever actually optimize itself. Neural networks could be used to find more optimized conditions. It could be extended to include more dynamic characteristics of a program, like what JIT does, but I think that adds far more complexity, would require much more compiler support, and would probably just end up as a JIT-like environment.

Another simple example: suppose one has a sleep(10) in a thread. The 10 was "arbitrarily" chosen. Instead, if we could do sleep($10), 10 becomes a parameter.
The compiler emits it to a file along with its address (it is a global in the program's address space). The optimizer then reads the file, attempts to modify this value with a slight perturbation, say to 11, checks the performance impact (over many uses, to remove statistical deviations), and if it is more performant, uses that value as the new default. It can then try 12 and repeat the process. While the algorithm is not advanced, it produces a more optimal result than hard-coding the value to 10, which then cannot change for the life of the program. To keep the program stable, such changes would have to occur very slowly. Better algorithms, and better profiling tools (which, say, could do wider sampling and better performance analysis), could home in on good values from the start. Then the program adjusts itself to the individual use case by analysis.
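The perturb-measure-keep loop described above amounts to simple hill climbing. A D sketch, with a synthetic cost function standing in for the real measurements (which would be runtime/memory samples averaged over long periods):

```d
import std.stdio;

// One tunable parameter with bounds and step, as described in the thread.
struct Param { double value, min, max, step; }

// Hill-climb one parameter against a cost function (lower is better):
// try value +/- step, keep whichever candidate lowers the cost.
double tune(ref Param p, double delegate(double) cost, int iterations)
{
    auto best = cost(p.value);
    foreach (i; 0 .. iterations)
    {
        foreach (dir; [+1.0, -1.0])
        {
            auto candidate = p.value + dir * p.step;
            if (candidate < p.min || candidate > p.max)
                continue;
            auto c = cost(candidate);
            if (c < best) { best = c; p.value = candidate; }
        }
    }
    return best;
}

void main()
{
    // Toy cost: pretend the optimum growth factor is 1.5, not the default 2.0.
    auto p = Param(2.0, 1.1, 4.0, 0.1);
    tune(p, (double x) => (x - 1.5) * (x - 1.5), 100);
    writefln("tuned value: %.1f", p.value);
}
```

In a deployed program each "iteration" would span hours or days of use, which is what keeps the perturbations from destabilizing anything.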
Aug 03 2016
In your example, you have a size_t or double factor for array growth. If you set it to -1, you would have an unpleasant time. You need a way to specify the range of valid values. In more complex algorithms, you need a way to evaluate several parameters together to determine if they are valid.

Since you're not always optimizing for the same thing (time vs memory; wasted memory vs number of reallocations; for approximation algorithms, how good an approximation is generated), you need a way to specify that as well.

Putting that in the language would be horribly complex. It would be much simpler for everyone if it were in a library instead. I could probably code up a basic version in a couple hours. Which I'll do for US$150 -- below market rate, especially since I'd continue supporting it.

Furthermore, your idea of changing compile-time constants without recompiling the program is unworkable. Constants are folded and inlined. That "* 2" might turn into "<< 1". A loop based on a constant can be unrolled.

You suggested that we can do a random walk on values to find optimal values. This will find a local minimum, which might not be the globally optimal value. Something like simulated annealing would work better in the general case. Simulated annealing is a fair bit more complex than a random walk; it's not going to be implemented inside druntime.
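For reference, simulated annealing over one bounded parameter is compact enough to sketch; this is purely illustrative (toy cost function, invented names). The acceptance rule exp(-delta/temp) is what lets it escape local minima that a pure downhill walk would get stuck in:

```d
import std.math : exp, sin;
import std.random : Random, uniform;
import std.stdio;

// Anneal one parameter in [lo, hi] against a cost function (lower = better).
double anneal(double start, double lo, double hi,
              double delegate(double) cost, ref Random rng)
{
    auto x = start;
    auto cx = cost(x);
    for (double temp = 1.0; temp > 1e-3; temp *= 0.99)
    {
        auto cand = x + uniform(-0.5, 0.5, rng) * temp * (hi - lo);
        if (cand < lo || cand > hi)
            continue;
        auto cc = cost(cand);
        // Always accept improvements; accept regressions with
        // probability exp(-(cc - cx) / temp), shrinking as temp cools.
        if (cc < cx || uniform(0.0, 1.0, rng) < exp((cx - cc) / temp))
        {
            x = cand;
            cx = cc;
        }
    }
    return x;
}

void main()
{
    auto rng = Random(42); // fixed seed for repeatability
    // Toy multimodal cost: global minimum sits near x = 3.
    auto best = anneal(0.0, 0.0, 6.0,
        (double x) => (x - 3.0) * (x - 3.0) + 0.3 * sin(7.0 * x), rng);
    writefln("best: %.2f", best);
}
```

The extra moving parts (temperature schedule, acceptance probability) are exactly why this belongs in a library rather than druntime.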
Aug 03 2016
On Thursday, 4 August 2016 at 04:41:43 UTC, Chris Wright wrote:
> In your example, you have a size_t or double factor for array growth. If you set it to -1, you would have an unpleasant time. You need a way to specify the range of valid values. In more complex algorithms, you need a way to evaluate several parameters together to determine if they are valid.

Yes, I mentioned that, but there is no ideal way to do it. I think it shouldn't be done in code since it clutters the code. The parameter tuples are the groupings; I think the compiler can infer groupings based on source-code closeness (if 3 parameters show up on the same line, it's obviously safe to assume they go together). Ultimately there is no big deal about grouping, because one can view the parameter space of tuples as flat.

> Since you're not always optimizing for the same thing (time vs memory; wasted memory vs number of reallocations; for approximation algorithms, how good an approximation is generated), you need a way to specify that as well.

Well, it depends. If performance means CPU cycles and memory, then they are relatively easy to measure, although absolute accuracy would be required.

> Putting that in the language would be horribly complex. It would be much simpler for everyone if it were in a library instead. I could probably code up a basic version in a couple hours. Which I'll do for US$150 -- below market rate, especially since I'd continue supporting it.

Or you could pay me $150 and I'd do it. But I have better things to do, and without compiler support, or a preprocessor, I'm not interested in the code complexity a library solution would have. Having to specify every parameter with some long sequence of characters just makes it messy.

I would have to disagree about the language complexity, though. At most the hardest part would be figuring out what syntax to use for the symbols.

$(10,0,100,1)

could be used. Whenever the compiler comes across such a thing it simply 1.
creates an "anonymous" variable with value 10, and 2. emits something like (file, line, default value, min, max, step, address). One could allow for group indexing or "closeness" by code; e.g., $(10,0,100,1,43) is of group 43, and 0x043245 is the address. That would be it on the compiler side. The optimizer then has enough information to work from. While such a method may lack enough information to optimize effectively, in theory it would be enough to find the optimal parameters in the long term.

One could probably get away with two or three lines of code for the variables that do the same:

auto x = Opt_Var(10, 0, 100, 1, 43);

then use x in the code. The hard part might be getting the relative and absolute addresses set up properly for the optimizer. Might work, but still a bit cluttered IMO. If you feel like implementing it for the benefit of humanity, be my guest. Else I'll end up trying in the future when I get around to it.

> Furthermore, your idea of changing compile-time constants without recompiling the program is unworkable. Constants are folded and inlined. That "* 2" might turn into "<< 1". A loop based on a constant can be unrolled.

This is why I said they were variables. I said they were effectively constants; they obviously can't be actual constants. We have to have an address that holds the value so it can be changed by the optimizer.

> You suggested that we can do a random walk on values to find optimal values. This will find a local minimum, which might not be the globally optimal value. Something like simulated annealing would work better in the general case. Simulated annealing is a fair bit more complex than a random walk; it's not going to be implemented inside druntime.

Well, there are many optimization methods; that isn't the point. The optimizer could be exposed for modification and different methods could be used. My main point was to introduce the idea. I have never seen it mentioned in literature before.
The ability for a program to slowly optimize itself over its life cycle seems to be a new and novel concept. The benefit of such a thing? Not sure; it may only improve performance by 1 or 2%. The real benefit? Who knows, but as time goes on it might become genuinely useful, become part of the profiling process, and find better ways to optimize programs that give better results.
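As a thought experiment, the Opt_Var type discussed above might look like this as a library sketch; everything here is hypothetical, including the global registry that a background optimizer thread would walk (the registry stands in for the "emit to a file" step):

```d
import std.stdio;

// Sketch of the Opt_Var idea as a library type (hypothetical API).
struct OptVar
{
    double value, min, max, step;
    int group;
}

// Global registry the background optimizer would walk; the pointers
// give it the addresses it needs to perturb values in place.
__gshared OptVar*[] registry;

OptVar* optVar(double def, double min, double max, double step, int group = 0)
{
    auto v = new OptVar(def, min, max, step, group);
    registry ~= v;
    return v;
}

void main()
{
    auto growth = optVar(10, 0, 100, 1, 43);
    // Application code reads the current value each time it is needed:
    writeln("current: ", growth.value, ", registered: ", registry.length);
    // A background optimizer could now do, e.g.:
    //   registry[0].value += registry[0].step;  (bounds-checked, slowly)
}
```

This is roughly the "couple hours" library version: no compiler support, at the cost of one declaration line per tunable.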
Aug 04 2016
On Thu, 04 Aug 2016 18:39:36 +0000, Mark "J" Twain wrote:
> I would have to disagree about the language complexity though. At most the hardest part would be figuring out what syntax to use for symbols.

The cost with language changes isn't just implementation. The largest part is that people reading code in the language need to be able to recognize what a piece of code is doing. If it's not immediately obvious, that is a cost borne by everyone using the language. Your suggested syntax is terribly inscrutable, and the easiest way to fix that problem is to use a library instead of a language change.

> Having to specify every parameter with some long sequence of characters just makes it messy.

It would be far faster to implement a library solution. It would be much easier to maintain a library solution. A library solution could use functions in phobos and other libraries available in dub, not just druntime. A library solution could release much more often. It would be easier to configure -- for instance, random walk vs simulated annealing. It would be easier to specify and create constraints using a library solution. It would take no political effort to publish a library on dub.

But it would cost you an extra handful of characters, so it would be unusable. Are you trolling?
Aug 04 2016
On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
> Instead, a better solution would be to use variables:
> if (n*length > m*capacity) expand(l*length)

Some time ago I played with a self-optimizing cache layer.

Problem: the time to obtain cache items is unknown and server-dependent. For example, the network can be involved: sometimes it is localhost, sometimes a server on another continent. The time to cache an item is unknown too. For example, if the memcached extension is not installed on a server, the fallback to the disk backend will slow cache performance very much. So we don't know whether caching is even worth it.

Solution: the cache measures its own performance and acts accordingly.

So I did the following:

1. A functional interface instead of a save/load-type interface: cacheLevel.get(functor_to_get_item) allows measuring the time to obtain an item.

2. All measuring/controlling logic lives in a separate class behind an AbstractStatist interface; the CacheLevel class just has hooks for it. By swapping the AbstractStatist implementation I can switch between collecting statistics and using them, or work as a usual cache (EmptyStatist, with all hooks empty). My implementation skips all items not worth caching (calcTime/calcHits*totalHits - totalTime < 0), but smarter things are possible (like caching only the most efficient items that fit within the cache size).

From this experience I can draw some conclusions:

1. Measure mode and control mode need to be separated. You can't have accurate statistics while changing system behavior according to the current statistics.

2. Statistics differ between applications, but for a specific application under specific conditions they can in most cases be approximated as constant.

So for an array-allocation strategy, a more realistic scenario is, I think:

1. The application is compiled in some 'array_debug' mode; a statist trait added to arrays collects usage statistics and writes optimal constants at application exit.

2.
The programmer configures the array allocator in the application according to these constants.

3. The application is built in release mode with the optimal allocation strategy and no statist overhead, and works fast. Users are happy.
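A condensed D sketch of such a measuring cache; the names and the fixed overhead constant are invented here (the AbstractStatist design described above is richer). get() times the fetch functor and caches a key only when refetching it costs more than the assumed caching overhead:

```d
import core.thread : Thread;
import core.time : msecs;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

struct SelfMeasuringCache(K, V)
{
    V[K] store;
    long overheadUsecs = 500; // assumed cost of caching an item (illustrative)

    // Compute-or-return: times the fetch and caches only when worthwhile.
    V get(K key, V delegate() fetch)
    {
        if (auto p = key in store)
            return *p;
        auto sw = StopWatch(AutoStart.yes);
        auto v = fetch();
        sw.stop();
        if (sw.peek.total!"usecs" > overheadUsecs)
            store[key] = v; // expensive to recompute: worth keeping
        return v;
    }
}

void main()
{
    SelfMeasuringCache!(string, int) cache;
    // A slow fetch (simulated network/disk) gets cached...
    cache.get("remote", { Thread.sleep(2.msecs); return 1; });
    // ...while a trivial one is judged not worth the overhead.
    cache.get("local", () => 2);
    writeln("remote" in cache.store ? "cached" : "skipped", " / ",
            "local" in cache.store ? "cached" : "skipped");
}
```

The functor-based interface is what makes the measurement possible at all: a plain save/load interface never sees how long the item took to produce.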
Aug 04 2016
On Thursday, 4 August 2016 at 11:35:41 UTC, crimaniak wrote:
> Some time ago I played with self-optimizing cache layer. Problem: Time to obtain cache items is unknown and server dependant. [...] Solution: cache measures own performance and acts accordingly.

I did not think of this in terms of networking. I suppose it can be used there too but, as you mention, latency could be a factor.

> So I make next things: 1. Functional interface instead of save/load type interface. [...] but it's possible to make more smart things (like caching only most efficient items not exceeding cache size).

If it is more complex than what I described, it would have to be thought out a great deal. My goal was simply to have the program expose optimization points (variables), then allow an optimizer to change those to find better points. The program itself would be virtually unmodified: no code to interact with the optimization process except for using variables instead of constants (which is minimal and necessary).
Exposing an interface for the program itself to guide the optimization process seems like a lot more work, but it is of course ultimately better, as it allows more information to flow into the optimization process. That design is beyond what I'm willing to attempt (doing it right could take months or years), while my method could take just a few hours to code up, and is rather general, although a bit dumb (fire and forget and hope for the best).

> In my experience I can draw some conclusions. 1. It need to separate measure mode and control mode. [...] 2. Statistics can be different for different applications but for specific application in specific conditions for most cases it can be approximated as constant.

Yes, it is tricky to make the algorithm stable. This is why I think, for a simple optimizer, it would need to work over long periods (months of program use). Because there are so many aberrations (other programs, user behavior, etc.), these can only be statistically removed by repeated uses. Essentially, "low-pass the data to remove all the spikes", then compare the average result with the previous one.

> So for array allocating strategy more realistic scenario the next, I think: 1. Application compiled in some 'array_debug' mode [...] 3. Application builds in release mode with optimal allocation strategy and without any statist overhead and works fast. Users are happy.

This is a more static, profiling-type optimization. I am talking about something a bit different. Both methods could be used together for a better result, but mine aims for simplicity. We generally blindly set constants for things that affect performance.
Let's simply turn those constants into variables and let a global blind optimizer try to find better values than what we "blindly" set. There is no guarantee that it would find a better result, and it may even introduce program instability. But all this can be measured to some extent by CPU and memory usage, and given enough parameters, there are probably at least several optimal points the optimizer could find.

Ultimately, for just a little work (setting the variables and specifying their ranges and step, say), we could have most programs, in D at least, optimizing themselves to some degree while they are being used, after they have been shipped. This, I believe, is unheard of. It represents the next level in program optimization. Imagine one day a program optimizing itself to the user's hardware, the user's habits, etc. This method attempts to get that ball rolling and does all those things in a general way (albeit an ignorant one, but maybe just as effective). It's hard to know how effective until it is done, but we do know it can't be any worse than what we already have.
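The "low-pass the data" step mentioned above can be as simple as an exponential moving average, sketched here in D: a one-off spike is damped instead of dominating the comparison between parameter settings.

```d
import std.math : isNaN;
import std.stdio;

// Exponential moving average over noisy performance samples: small
// alpha = heavy smoothing, so transient spikes barely move the value.
struct Ema
{
    double alpha;              // smoothing factor, 0 < alpha <= 1
    double value = double.nan; // nan until the first sample arrives

    double put(double sample)
    {
        value = value.isNaN ? sample : alpha * sample + (1 - alpha) * value;
        return value;
    }
}

void main()
{
    auto ema = Ema(0.1);
    foreach (s; [10.0, 10, 10, 100 /* spike */, 10, 10, 10])
        ema.put(s);
    writefln("%.1f", ema.value); // prints 16.6: the spike to 100 is damped
}
```

The optimizer would compare smoothed averages like this before and after a perturbation, rather than raw per-run measurements.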
Aug 04 2016