digitalmars.D - Feature Request: Hashed Based Assertion
- tcak (40/40) Nov 26 2015 I brought this topic in "Learn" a while ago, but I want to talk
- tcak (3/6) Nov 26 2015 One applicable solution: __traits( hashOf,
- Andrea Fontana (2/9) Nov 26 2015 Can't you calculate hash of involved files at compile time?
- tcak (7/17) Nov 26 2015 One file can consist of many API functions. If there are 50
- Jacob Carlborg (5/9) Nov 26 2015 With a complete D front end working at compile time it would at least be...
- qznc (3/14) Nov 26 2015 This is the job of the type checker, isn't it? What would a hash
- Idan Arye (12/15) Nov 26 2015 So it's not just the function's signature you want to hash, but
- bitwise (14/55) Nov 26 2015 I'm wondering if a diff tool could be somehow combined with a
- deadalnix (11/11) Nov 26 2015 I see many solution here that do not require any language change.
- tcak (11/22) Nov 27 2015 Not one thing in your solutions give any simple solution like:
- Nordlöw (15/17) Nov 27 2015 I've thought about this too in the past and asked on the forums
I brought this topic up in "Learn" a while ago, but I want to talk about it again.

You are in a big team or working with a big code base. APIs are being defined/modified, configuration constants are defined/modified, structures are defined/modified for data. You are coding on the business logic side, relying entirely on the current APIs, configuration, and data structures.

Part of the code has been updated on the API side, but you are not aware of it, or time has passed, and you assume that your code will still work properly. Nobody checks every single part of the business logic line by line. At runtime you get unexpected results, and lose some hair until you find where the problem is. Tracking down unexpected results in a long-running process causes even more trouble.

---

What I currently do: I calculate the hash of the API code (functions, configuration, etc. together) with a hash function, and store it as a constant where the API is defined.

    public enum HASH_OF_THIS_API = 0x1234;

    // Hash is calculated from here
    public void my_api_function(){}
    public enum my_api_constant = 5;
    // till here

Then wherever I use that API, I insert a "static assert( HASH_OF_THIS_API == 0x1234 );".

Whoever modifies the API calculates the hash of the new code after the modification and updates the constant. This lets the compiler warn the business logic programmer about changes to the API code, so the affected parts can be reviewed and changed if required.

---

The feature request part comes here:

It is possible that the API programmer forgets to update the hash value in the code. Also, comments in the code shouldn't affect the hash value. This should be automated at compile time: the compiler calculates the hash value of the code automatically, and the value can be read at compile time. Hence, no constant is required to store the hash value.

What is needed is to be able to bind a hash value to any block with a name.
Nov 26 2015
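The manual workaround described in the post above can be partly automated with features D already has. The following is only a minimal sketch under stated assumptions, not the poster's actual setup: the API source is kept in a token string (`apiSource`, a name made up here) so that the same text is both compiled via `mixin` and hashed during compilation, and the hash function is a hand-written 32-bit FNV-1a rather than whatever hash the poster uses. Unlike the request, comments inside the string still affect the hash.

```d
// Hypothetical sketch: the API text lives in a token string so it can be
// both compiled (mixin) and hashed at compile time (CTFE).
enum apiSource = q{
    public void my_api_function() {}
    public enum my_api_constant = 5;
};

mixin(apiSource);

/// 32-bit FNV-1a, simple enough to be evaluated by CTFE.
uint fnv1a(string s) pure nothrow @safe @nogc
{
    uint h = 2166136261u;          // FNV offset basis
    foreach (char c; s)
    {
        h ^= c;
        h *= 16777619u;            // FNV prime, wraps modulo 2^32
    }
    return h;
}

// Sanity check against the published FNV-1a test vector for "a".
static assert(fnv1a("a") == 0xe40c292c);

/// Computed by the compiler, so nobody can forget to update it.
enum uint HASH_OF_THIS_API = fnv1a(apiSource);

// Prints the current value while compiling, so a client can copy it.
pragma(msg, HASH_OF_THIS_API);

// Client side (value to be pasted from the pragma output above):
// static assert(HASH_OF_THIS_API == /* pinned value */, "API block changed");
```

The module has no `main`; something like `dmd -main -run file.d` is enough to try it. The price of this approach is that the API has to be written inside a string, which many editors and tools handle poorly.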
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again. [...]

One applicable solution: __traits( hashOf, apiFunctionName/structName/variableName/className )
Nov 26 2015
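The `__traits( hashOf, ... )` form proposed above is not an existing trait. As a rough approximation with what the language and Phobos do offer, one can hash a function's declaration at compile time. This sketch assumes a hand-written FNV-1a hash and a made-up template name `signatureHash`; it sees only the name, return type and parameter types, never the body, so it catches less than the proposal asks for. `addUser` and `removeUser` are placeholder functions.

```d
import std.traits : fullyQualifiedName, Parameters, ReturnType;

/// 32-bit FNV-1a, usable in CTFE.
uint fnv1a(string s) pure nothrow @safe @nogc
{
    uint h = 2166136261u;
    foreach (char c; s)
    {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

/// Hypothetical stand-in for the proposed trait: hashes the declaration
/// (qualified name, return type, parameter types) and nothing else.
enum uint signatureHash(alias fn) =
    fnv1a(fullyQualifiedName!fn ~ ":" ~ ReturnType!fn.stringof
          ~ Parameters!fn.stringof);

// Placeholder API functions for illustration only.
int addUser(string name, int age) { return 0; }
int removeUser(string name) { return 0; }

// The hash is stable for one declaration and differs between declarations.
static assert(signatureHash!addUser == signatureHash!addUser);
static assert(signatureHash!addUser != signatureHash!removeUser);

// Shows the value during compilation so that it can be pinned by a client.
pragma(msg, signatureHash!addUser);
```

Because only the signature is hashed, this overlaps with what the type checker already reports; detecting changes to the body would need access to the function's tokens, which is the part that has no direct language support.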
On Thursday, 26 November 2015 at 11:14:54 UTC, tcak wrote:
> On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
>> I brought this topic in "Learn" a while ago, but I want to talk about it again. [...]
>
> One applicable solution: __traits( hashOf, apiFunctionName/structName/variableName/className )

Can't you calculate hash of involved files at compile time?
Nov 26 2015
On Thursday, 26 November 2015 at 11:18:19 UTC, Andrea Fontana wrote:
> On Thursday, 26 November 2015 at 11:14:54 UTC, tcak wrote:
>> One applicable solution: __traits( hashOf, apiFunctionName/structName/variableName/className )
>
> Can't you calculate hash of involved files at compile time?

One file can consist of many API functions. If there are 50 functions in it and only 1 of them has been modified, the whole hash will change. The compiler cannot tell which API has been changed then. The purpose is to take the burden off the programmer and put it onto the compiler.
Nov 26 2015
On 2015-11-26 12:24, tcak wrote:
> One file can consist of many API functions. If there are 50 functions in it, and only 1 of them has been modified, whole hash will change. Compiler cannot tell which API has been changed then. Purpose is to decrease the burden on programmer, and put it onto compiler.

With a complete D front end working at compile time it would at least be possible in theory.

--
/Jacob Carlborg
Nov 26 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again. [...] Nobody would be checking every single part of business logic line by line.

This is the job of the type checker, isn't it? What would a hash provide that a type checker does not?
Nov 26 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again. [...]

So it's not just the function's signature you want to hash, but its code as well? What about functions called from the API function? Or functions that set data that'll later be used by the API functions?

If anything, I would have hashed the unittests of the API function. If the behavior of the API function changes in a fashion that requires a modification of the unittest, then you might need to alert the business logic programmers. Anything less than that is just useless noise that'll hide the actual changes you want to be warned about among the endless clutter created by trivial changes.
Nov 26 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again. [...] What is needed is to be able to bind a hash value to any block with a name.

I'm wondering if a diff tool could somehow be combined with a parser to create a list of functions/symbols which may have experienced behavioural changes between versions of dmd. What I'm suggesting is a diff tool which is aware of a symbol's dependencies, so that even if a function body wasn't changed, its dependent symbols could be checked as well.

If such a tool existed, it could be run against each new release of dmd and produce a comma-separated list of functions that may have experienced behavioural changes. With that list in hand, one could then simply grep for each symbol in their own repository each time they upgrade dmd.

I hereby place this idea in the public domain ;)

Bit
Nov 26 2015
I see many solutions here that do not require any language change.

To start, have a linter yell at the programmer when (s)he submits a diff. Devs commit directly? What the fuck are you doing? Do code review and get a linter.

Alternatively, generate a di file and hash it. You can have a bot do it and commit with a commit hook.

DMD can dump info about the program in JSON format. Hash this and run with it.

You may also change your strategy in terms of source control: https://www.youtube.com/watch?v=W71BTkUbdqE . Unified source code completely alleviates these kinds of issues, to boot.
Nov 26 2015
On Friday, 27 November 2015 at 05:33:52 UTC, deadalnix wrote:
> I see many solution here that do not require any language change. [...]

Nothing in your solutions gives anything as simple as:

    static assert( __traits( hashOf, std.file.read ) == 0x1234, "They have changed implementation again." );
    static assert( __traits( hashOf, facebook.apis.addUser ) == 0x5543, "Check API documentation again for addUser." );

A di file wouldn't work. It doesn't contain implementation code. Also, all APIs are in it. We need a specific hash for each API, so that it doesn't take a long time to find where the problem is.

JSON is the same as di. No difference.

Yours are not helping, making everything more complex.
Nov 27 2015
On Friday, 27 November 2015 at 08:09:27 UTC, tcak wrote:
> Yours are not helping, making everything more complex.

Yes, because to achieve what you're asking for, you NEED a complex solution. The code WILL change with every release... that's the point of a release... so any hashing mechanism like the one you're describing will just trigger every time, making it useless. Even if this was not the case, you still wouldn't know where the changes were.

Bit
Nov 27 2015
On Friday, 27 November 2015 at 16:18:52 UTC, bitwise wrote:
> On Friday, 27 November 2015 at 08:09:27 UTC, tcak wrote:
>> Yours are not helping, making everything more complex.
>
> Yes, because to achieve what you're asking for, you NEED a complex solution. The code WILL change with every release..thats the point of a release.. so any hashing mechanism like you're describing will just trigger every time, making it useless. Even if this was not the case, you still wouldn't know where the changes were.

Let me explain: it is not complex. What makes it complex is that you envision a very detailed thing.

    Hash of a Function = MD5( Token List of Function /* but ignore comments */ );

You do not have to know where the changes are. You need to know what has changed and, briefly, how it behaves now. If the behaviour of the code changes, it is good that you know it. With the above hashing method, a piece of code that hasn't changed would always have the same hash value.

And if you do not like it, don't check the hash value. Just continue writing your code as you wish. But from a business perspective, if the software's consistency is worth millions of dollars, a software engineer would want an error whenever the code changes.

Do we want D to be a child language, or to have more useful features?
Nov 27 2015
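The "MD5 of the token list, ignoring comments" formula from the post above can be tried out on source text at run time. This is a rough sketch only: `roughTokens` and `tokenHash` are made-up names, and the tokenizer is a crude stand-in for a real D lexer that assumes the input has no string or character literals and no nested `/+ +/` comments. The MD5 part uses Phobos' `std.digest.md`.

```d
import std.algorithm.comparison : min;
import std.array : join;
import std.ascii : isAlphaNum, isWhite;
import std.digest.md;   // md5Of, toHexString

/// Crude token splitter (assumption: no string/char literals in src).
string[] roughTokens(string src)
{
    string[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c))
        {
            ++i;
        }
        else if (c == '/' && i + 1 < src.length && src[i + 1] == '/')
        {
            // Line comment: skip to the end of the line.
            while (i < src.length && src[i] != '\n') ++i;
        }
        else if (c == '/' && i + 1 < src.length && src[i + 1] == '*')
        {
            // Block comment: skip past the closing "*/" (or to the end).
            i += 2;
            while (i + 1 < src.length && !(src[i] == '*' && src[i + 1] == '/')) ++i;
            i = min(i + 2, src.length);
        }
        else if (isAlphaNum(c) || c == '_')
        {
            // Identifier, keyword or number.
            immutable start = i;
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
            tokens ~= src[start .. i];
        }
        else
        {
            // Any other character is treated as a one-character token.
            tokens ~= src[i .. i + 1];
            ++i;
        }
    }
    return tokens;
}

/// MD5 over the space-joined token list, as an upper-case hex string.
string tokenHash(string src)
{
    auto hex = toHexString(md5Of(roughTokens(src).join(" ")));
    return hex[].idup;
}

unittest
{
    // Comments and spacing do not change the hash...
    assert(tokenHash("int f(){ return 1; } // v1")
        == tokenHash("int f ( ) { /* same */ return 1 ; }"));
    // ...but any token-level edit does.
    assert(tokenHash("int f(){ return 1; }") != tokenHash("int f(){ return 2; }"));
}
```

Run with something like `dmd -unittest -main -run file.d`. A scheme of this kind reacts to every token-level edit, whether or not behaviour changed, and it knows nothing about other functions that the hashed code calls.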
On Friday, 27 November 2015 at 18:51:54 UTC, tcak wrote:
> Let me explain: It is not complex. What makes it complex is that you envision a very detailed thing. Hash of a Function = MD5( Token List of Function /* but ignore comments */ ); [...]

Your approach is prone to false positives.

    if(1) doSomething();
    if(1) { doSomething(); }

Same behaviour, different code. I hope you have a heck of a coding standard written up ;)

Worse still, consider the following example:

    void foo() { if(bar()) deleteSomeFiles(); }
    int bar() { return 0; }

Your proposed approach would not notify you that foo(), a potentially dangerous function, has changed its behaviour if someone made bar() return 1.

*insert witty comeback to your comment about "business perspective" here*

Bit
Nov 27 2015
On Friday, 27 November 2015 at 20:00:16 UTC, bitwise wrote:
> Worse still, consider the following example: void foo() { if(bar()) deleteSomeFiles(); } int bar() { return 0; } Your proposed approach would not notify you that foo(), a potentially dangerous function, has changed it's behaviour if someone made bar() return 1.

Question: has the behaviour of foo changed? If foo cares about bar's behaviour, foo checks bar's hash value.

--

> if(1) doSomething();
> if(1) { doSomething(); }

You are correct here about the hash calculation, but unless someone touches the code, this never happens, and no hash changes would be seen. If someone does touch it as in your example, checking the documentation about what has happened would be the correct approach. The importance of a behaviour change is a matter of perception; the computer cannot know that.
Nov 27 2015
On Friday, 27 November 2015 at 20:19:40 UTC, tcak wrote:
> if(1) doSomething();
> if(1) { doSomething(); }
>
> You are correct here about hash calculation, but unless someone touches to codes, this never happens, and no hash changes would be seen. If someone is touching it as you exampled, checking the documentation about what has happened would be the correct approach. Importance of behaviour change is perceptional, computer cannot know that already.

If you really want to integrate this into the language, you should consider future improvements. Hashing the tokens is a conservative approximation of "behavior change", as the example above shows. Another example would be variable renames. The specification of the hash algorithm should provide the freedom that both variants above get the same hash, but still be correct in the sense that different behavior always yields different hashes.

Overall, I'm not convinced that this needs to be a language extension or trait. It could simply be a static analysis tool independent of the compiler.
Nov 27 2015
On Friday, 27 November 2015 at 08:09:27 UTC, tcak wrote:
> On Friday, 27 November 2015 at 05:33:52 UTC, deadalnix wrote:
>> I see many solution here that do not require any language change. [...]
>
> Not one thing in your solutions give any simple solution like: [...] Yours are not helping, making everything more complex.

If the API signature changes, the type system will yell at you. All the proposed solutions will work.

If the implementation changes, you can apply the same solution to the binary, tadaaa! If you want fewer hash changes, a good idea can be to dump LLVM IR from ldc and run canonicalization on it using opt.

Also, if you have so much code that relies on implementation details that aren't in the API, to the extent that it is such a problem that you need a language extension to handle it, you are doing something very, very wrong.

Indeed I'm not helping. You think you need a language extension, when it is quite obvious you have some methodology problem on your side and refuse to reconsider.

What about, I know it is crazy, using a unified repository, having tests and continuous integration, and submitting diffs with code review? If one changes an API in a way that breaks the client code, the client will fail and the CI tool will warn the developer that he needs to fix the client code or rework his API change. If the client code was not tested, then the problem is clearly not the API hash.

Not only does this not require a language extension, it also solves way more problems than the one you want to solve here.

Now, don't get me wrong, I know how it is. Companies with a broken work culture won't change anything unless they are on the edge of bankruptcy. I understand. This is how it works. Please understand that, on the other side, it doesn't seem like the right move to export a broken work environment as language features.
Nov 27 2015
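The testing route argued for above needs nothing beyond D's built-in `unittest` blocks. As a small illustration only (the function `discountPercent` and its rule are invented for this example, not taken from the thread), client code can pin exactly the behaviour it relies on, so that a build with `-unittest` in continuous integration fails when an API change breaks that behaviour:

```d
// Hypothetical API function that the business logic depends on.
int discountPercent(int itemCount)
{
    return itemCount >= 10 ? 15 : 0;
}

// Client-side test: states the behaviour the business logic relies on.
// If the API team changes the threshold or the rate, this fails in CI.
unittest
{
    assert(discountPercent(9) == 0);
    assert(discountPercent(10) == 15);
}
```

Unlike a hash, such a test stays quiet for refactorings that keep the behaviour intact and points at the specific expectation that was broken.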
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> What is needed is to be able to bind a hash value to any block with a name.

I've thought about this too in the past and asked on the forums, but I haven't gotten any response.

It is possible. The problem is easier in dynamic languages. See, for instance, the following solution in a specific Python runtime: http://pgbovine.net/incpy.html

`hashOf` is for AAs, not for content digests.

I believe the only realistic solution to this problem is to implement a specific pass in the D compiler that recursively calculates hash digests (hash chains) for all the code and data involved in a function call. It should probably only work for pure functions.

AFAICT, it is possible, but it's far from easy to get 100% correct :)

DMD pull requests would be very welcome, at least by me ;)

See also: https://en.wikipedia.org/wiki/Hash_chain
Nov 27 2015
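The recursive digest idea in the post above can be illustrated outside the compiler with a toy model. Everything here is an assumption made for the sketch: the `Func` struct, the hand-maintained callee lists and the `chainedHash` name are invented, the call graph is assumed to be free of cycles, and `deleteSomeFiles` is left out of the model. A function's digest is the MD5 of its own source text combined with the digests of the functions it calls.

```d
import std.digest.md;   // md5Of, toHexString

/// Toy model of a function: its source text and the names of its callees.
struct Func
{
    string source;
    string[] callees;
}

/// Digest of a function chained with the digests of everything it calls.
/// Assumption: no recursion, i.e. the call graph has no cycles.
string chainedHash(const Func[string] program, string name)
{
    const fn = program[name];
    string material = fn.source;
    foreach (callee; fn.callees)
        material ~= "|" ~ chainedHash(program, callee);
    auto hex = toHexString(md5Of(material));
    return hex[].idup;
}

unittest
{
    Func[string] v1 = [
        "foo": Func("void foo() { if (bar()) deleteSomeFiles(); }", ["bar"]),
        "bar": Func("int bar() { return 0; }")
    ];

    // Second version: only bar() changes, foo()'s own text is untouched.
    auto v2 = v1.dup;
    v2["bar"] = Func("int bar() { return 1; }");

    assert(v1["foo"].source == v2["foo"].source);
    // The chained digest of foo() changes anyway, because bar() changed.
    assert(chainedHash(v1, "foo") != chainedHash(v2, "foo"));
}
```

This covers the foo()/bar() case raised earlier in the thread, but a real implementation would have to derive the callee lists from the compiler's own analysis and deal with recursion, virtual calls and global data, which is where the difficulty mentioned in the post lies.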