digitalmars.D - D code obfuscator
- DigitalDesigns (8/8) Jun 13 2018 Is there an obfuscator for D that at least renames identifiers?
- Shachar Shemesh (18/28) Jun 13 2018 I highly doubt it.
- DigitalDesigns (2/33) Jun 13 2018 Just one question! Are you kidding me?
- Stefan Koch (6/7) Jun 13 2018 He most certainly is not.
- Shachar Shemesh (21/33) Jun 14 2018 First of all, run your program under strace. For a surprising percentage...
- DigitalDesigns (2/39) Jun 14 2018 Wait? Are you sure you are not kidding? Do you want another shot?
- Shachar Shemesh (11/13) Jun 14 2018 No, I'm fine. Thank you. I am not out here to convert anyone. If you
- DigitalDesigns (5/18) Jun 14 2018 That's the best you can do? Do you really expect me to go and
- Ali Çehreli (3/5) Jun 14 2018 That was your third. :/
- Cym13 (33/35) Jun 14 2018 I won't say that obfuscation is entirely useless, if I have to
- Vladimir Panteleev (13/14) Jun 14 2018 I've had some experience on both sides of this... so, I think I
- Vladimir Panteleev (5/8) Jun 13 2018 Yes, DustMite has an obfuscation mode.
- Norm (6/14) Jun 13 2018 I don't know any specifically for D but these C/C++ tools might
Is there an obfuscator for D that at least renames identifiers? This is because sometimes they leak from various processes and could be potential sources of attack. It would be a tool that probably just replaces their values with, say their hash + something else and done pre release build. Ideally it would be able to compile with dmd and all in memory or use temp storage without file issues. It can't modify the code directly because then that would be permanent.
Jun 13 2018
On 14/06/18 03:01, DigitalDesigns wrote:
> Is there an obfuscator for D that at least renames identifiers? This is because sometimes they leak from various processes and could be potential sources of attack. It would be a tool that probably just replaces their values with, say their hash + something else and done pre release build. Ideally it would be able to compile with dmd and all in memory or use temp storage without file issues. It can't modify the code directly because then that would be permanent.

I highly doubt it. You see, with introspection and run-time execution, writing such a tool is equivalent to solving the halting problem. You simply do not know what you're affecting. There are some cases where you might know at x% certainty that it's okay to rename. Someone might do a best-effort based tool. I'm not aware of one.

With that said, what you're trying to achieve is probably not a good idea anyways. With very few exceptions(1), reverse-engineering code to figure out what it does is not considerably more difficult than using the source, even when none of the identifiers leak at all. Certain aspects of creating attacks are even easier with good rev-eng tools than in source form.

Shachar

1- One notable exception is complex algorithmic code. I will point out that those are difficult to figure out from source code too, and it usually takes very good documentation to be able to do so, so even there I'm not sure my original statement doesn't hold.
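To illustrate the introspection point, here is a minimal sketch, not taken from the thread. The `Config` struct and the `lookup` and `fieldNames` helpers are hypothetical names invented for this example. It shows two ways an identifier can be referenced through a string that a textual renamer would not recognise as that identifier.

```d
import std.algorithm.searching : canFind;

// Hypothetical configuration type; the field names are the identifiers
// a renaming tool would want to replace.
struct Config
{
    int speed = 3;
    int retries = 5;
}

// Looks a field up by a name that exists only as a compile-time string.
int lookup(string field)(Config c)
{
    return __traits(getMember, c, field);
}

// Collects the member names of T, turning identifiers into program data.
string[] fieldNames(T)()
{
    string[] names;
    foreach (m; __traits(allMembers, T))
        names ~= m;
    return names;
}

void main()
{
    Config c;
    enum prefix = "sp";
    // The identifier "speed" never appears here as a single token.
    assert(lookup!(prefix ~ "eed")(c) == 3);
    // The names are now ordinary strings that could be printed or serialised.
    assert(fieldNames!Config().canFind("retries"));
}
```

A tool that renamed `speed` inside `Config` without also understanding how `prefix ~ "eed"` is computed would produce a program that no longer compiles, and a program that writes `fieldNames` to a file would change its output format after renaming.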
Jun 13 2018
On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
> On 14/06/18 03:01, DigitalDesigns wrote:
> > Is there an obfuscator for D that at least renames identifiers? This is because sometimes they leak from various processes and could be potential sources of attack. It would be a tool that probably just replaces their values with, say their hash + something else and done pre release build. Ideally it would be able to compile with dmd and all in memory or use temp storage without file issues. It can't modify the code directly because then that would be permanent.
> I highly doubt it. You see, with introspection and run-time execution, writing such a tool is equivalent to solving the halting problem. You simply do not know what you're affecting. There are some cases where you might know at x% certainty that it's okay to rename. Someone might do a best-effort based tool. I'm not aware of one. With that said, what you're trying to achieve is probably not a good idea anyways. With very few exceptions(1), reverse-engineering code to figure out what it does is not considerably more difficult than using the source, even when none of the identifiers leak at all. Certain aspects of creating attacks are even easier with good rev-eng tools than in source form. Shachar 1- One notable exception is complex algorithmic code. I will point out that those are difficult to figure out from source code too, and it usually takes very good documentation to be able to do so, so even there I'm not sure my original statement doesn't hold.

Just one question! Are you kidding me?
Jun 13 2018
On Thursday, 14 June 2018 at 05:21:03 UTC, DigitalDesigns wrote:
> Just one question! Are you kidding me?

He most certainly is not. In fact, I sometimes prefer size-optimized machine code over source, because it is a trustworthy representation of what the program does, rather than a half-truth about what it should do.
Jun 13 2018
On 14/06/18 08:21, DigitalDesigns wrote:
> On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
> > With that said, what you're trying to achieve is probably not a good idea anyways. With very few exceptions(1), reverse-engineering code to figure out what it does is not considerably more difficult than using the source, even when none of the identifiers leak at all. Certain aspects of creating attacks are even easier with good rev-eng tools than in source form. Shachar
> Just one question! Are you kidding me?

First of all, run your program under strace. For a surprising percentage of the programs that should give you a fairly good idea of what the program is doing. ltrace goes further, but it can be easily defeated by statically linking, so probably irrelevant for our current discussion.

Next, try loading your program in Ida Pro (https://www.hex-rays.com/products/ida/index.shtml). You will notice that program flow practically jumps out at you with no further work on your part.

Other tricks require a little more knowledge, but are still exceedingly effective. In a demonstration I saw in 2002, Halvar Flake showed how he uses Ida to graph the branches, and then use a tool he built to place breakpoints on the branch points. Next he started feeding inputs to the program, and colored the graph where the input sent the code. He used that to find the correct input that would bring the code path to the line he thought might be vulnerable.

If I had to do this trick today for *my own* programs, I'd still use Ida and the compiled code.

So, no, I was not kidding. Not even close.

Shachar
Jun 14 2018
On Thursday, 14 June 2018 at 08:54:16 UTC, Shachar Shemesh wrote:
> On 14/06/18 08:21, DigitalDesigns wrote:
> > On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
> > > With that said, what you're trying to achieve is probably not a good idea anyways. With very few exceptions(1), reverse-engineering code to figure out what it does is not considerably more difficult than using the source, even when none of the identifiers leak at all. Certain aspects of creating attacks are even easier with good rev-eng tools than in source form. Shachar
> > Just one question! Are you kidding me?
> First of all, run your program under strace. For a surprising percentage of the programs that should give you a fairly good idea of what the program is doing. ltrace goes further, but it can be easily defeated by statically linking, so probably irrelevant for our current discussion. Next, try loading your program in Ida Pro (https://www.hex-rays.com/products/ida/index.shtml). You will notice that program flow practically jumps out at you with no further work on your part. Other tricks require a little more knowledge, but are still exceedingly effective. In a demonstration I saw in 2002, Halvar Flake showed how he uses Ida to graph the branches, and then use a tool he built to place breakpoints on the branch points. Next he started feeding inputs to the program, and colored the graph where the input sent the code. He used that to find the correct input that would bring the code path to the line he thought might be vulnerable. If I had to do this trick today for *my own* programs, I'd still use Ida and the compiled code. So, no, I was not kidding. Not even close. Shachar

Wait? Are you sure you are not kidding? Do you want another shot?
Jun 14 2018
On 14/06/18 13:39, DigitalDesigns wrote:
> Wait? Are you sure you are not kidding? Do you want another shot?

No, I'm fine. Thank you. I am not out here to convert anyone. If you want to believe the magic of obfuscation, go right ahead.

You can probably even leverage D's CTFE to do it inside the compiler while not making your program too much uglier. Something like replacing definitions with:
    mixin Obfuscate!(int, "variableName");
and use with:
    Deobfuscate!"variableName";

Shouldn't be too difficult to create.

Shachar
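One possible shape of that idea, as a rough sketch only. `Obfuscate` follows the name used in the post. `obfName`, `deobfuscate`, the `Account` struct and the choice of an FNV-1a hash are assumptions made for this example. Access goes through a string mixin rather than a `Deobfuscate!"..."` template, because a template resolves names in its own scope rather than at the point of use.

```d
import std.conv : to;

// Hypothetical helper: maps a readable name to an opaque identifier at
// compile time using FNV-1a (an arbitrary choice for this sketch).
string obfName(string name)
{
    uint h = 2166136261u;            // FNV-1a offset basis
    foreach (char c; name)
    {
        h ^= c;
        h *= 16777619u;              // FNV-1a prime
    }
    return "_" ~ h.to!string;        // leading underscore keeps it a valid identifier
}

// Declares a variable of type T under the hashed name.
mixin template Obfuscate(T, string name)
{
    mixin("T " ~ obfName(name) ~ ";");
}

// Builds the expression `obj.<hashed name>` as source text for mixin().
string deobfuscate(string obj, string name)
{
    return obj ~ "." ~ obfName(name);
}

struct Account
{
    mixin Obfuscate!(int, "balance");
}

void main()
{
    Account a;
    mixin(deobfuscate("a", "balance") ~ " = 42;");
    assert(mixin(deobfuscate("a", "balance")) == 42);
    // The readable name is not a member of the struct.
    static assert(!__traits(hasMember, Account, "balance"));
}
```

Whether the readable string still ends up somewhere in the object file (for example through template instance names) depends on the compiler and is worth checking on the actual binary before relying on this.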
Jun 14 2018
On Thursday, 14 June 2018 at 11:07:17 UTC, Shachar Shemesh wrote:
> On 14/06/18 13:39, DigitalDesigns wrote:
> > Wait? Are you sure you are not kidding? Do you want another shot?
> No, I'm fine. Thank you. I am not out here to convert anyone. If you want to believe the magic of obfuscation, go right ahead.

Dude, don't be an idiot! Please! Of course, here we go...

> You can probably even leverage D's CTFE to do it inside the compiler while not making your program too much uglier. Something like replacing definitions with: mixin Obfuscate!(int, "variableName"); and use with: Deobfuscate!"variableName"; Shouldn't be too difficult to create.

That's the best you can do? Do you really expect me to go and manually obfuscate an entire program? Do you want to try again? 3 strikes and you're out!
Jun 14 2018
On 06/14/2018 04:33 AM, DigitalDesigns wrote:
> 3 strikes and you're out!

That was your third. :/

Ali
Jun 14 2018
On Thursday, 14 June 2018 at 10:39:19 UTC, DigitalDesigns wrote:
> Wait? Are you sure you are not kidding? Do you want another shot?

I won't say that obfuscation is entirely useless. If I have to choose, I'll of course take the version with symbols for reverse engineering, and there are specific cases where symbols carry way too much information for you to want it disclosed (most commonly names of customers, projects, etc.).

But, as someone whose job is to find security issues in software (and other stuff), be it with or without source, I can say with professional certainty that things like changing all identifiers to single-letter ids don't slow me down in the slightest in my assignments. That's just the state of things: reversers deal with stripped stuff all the time, and identifiers are just nice to have.

So instead, here's what would slow a reverse engineer:

- Remove strings. Make sure to remove as many as you can, especially debug statements. Hide the rest by encrypting them in memory. Even if it is possible to decrypt or read them at runtime, it'll be way harder to correlate things together.
- Pack. Have your software decipher itself in memory at runtime, not all at once but only sections at a time, dynamically. Use random keys automatically generated at compile time for that; that'll mess up binary diffs.
- Include binary tricks to mess with disassemblers. There are many constructs that common disassemblers interpret badly.
- Mess with the structure. If you can, remove all conditions and loops. A reverser can often just look at a function's logical graph and know what kind of work it is doing. The movfuscator is a good example.
- Add runtime checks based on time deltas between two points of the code in different functions. Generate other output based on that.
- Be sure to encrypt all communications, of course.

In short, do what good malware does.
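The "hide the rest by encrypting" item can be sketched in D with CTFE. This is a toy illustration under stated assumptions, not a hardened scheme: the `xorWithKey` and `reveal` helpers, the fixed placeholder `key` and the sample message are all invented for the example, and a repeating XOR is trivially reversible once someone finds the key.

```d
// Placeholder key; a real build would presumably generate this per release.
static immutable ubyte[] key = [0x5A, 0xC3, 0x17, 0x88];

// XORs each character of `s` with the repeating key. Runs under CTFE when
// used in a static initializer.
ubyte[] xorWithKey(string s, const(ubyte)[] k) pure
{
    auto result = new ubyte[s.length];
    foreach (i, char c; s)
        result[i] = cast(ubyte)(c ^ k[i % k.length]);
    return result;
}

// Reverses xorWithKey at runtime, just before the text is needed.
string reveal(const(ubyte)[] data, const(ubyte)[] k) pure
{
    auto result = new char[data.length];
    foreach (i, b; data)
        result[i] = cast(char)(b ^ k[i % k.length]);
    return result.idup;
}

// Computed at compile time; the intent is that only these encoded bytes
// are stored in the binary.
static immutable ubyte[] encodedMsg = xorWithKey("license check failed", key);

void main()
{
    import std.stdio : writeln;
    writeln(reveal(encodedMsg, key));
}
```

Whether the plaintext literal is really absent from the executable depends on the compiler and flags, so it is worth confirming by searching the built binary for the message.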
Jun 14 2018
On Thursday, 14 June 2018 at 08:54:16 UTC, Shachar Shemesh wrote:
> So, no, I was not kidding. Not even close.

I've had some experience on both sides of this... so, I think I can say with some certainty that debugging symbols make reverse-engineering MUCH easier (many hunts to find the relevant code can be reduced to a keyword search), so I think it's a valid concern.

That D leaks identifiers and other bits from the source code is a real issue preventing some real-world use cases. E.g., there might be legal obligations in place where leaking source code identifiers could be considered a breach of NDA etc. In one case, we needed to write an RTTI patcher for C++ (MSVC) after updating/reconfiguring the build toolchain, as the compiler would otherwise place the class names of some classes in the binary.
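A small sketch of one way class names can reach a D binary, assuming a hypothetical class `AcmeCorpInvoice` whose name mentions a customer. The runtime type information exposes the qualified class name as an ordinary string, so it has to be stored somewhere in the executable.

```d
import std.algorithm.searching : endsWith;

// Hypothetical class whose name reveals a customer.
class AcmeCorpInvoice {}

void main()
{
    Object o = new AcmeCorpInvoice;
    // typeid on a class reference yields the dynamic type's TypeInfo,
    // whose name is the module-qualified class name.
    string name = typeid(o).name;
    assert(name.endsWith("AcmeCorpInvoice"));
}
```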
Jun 14 2018
On Thursday, 14 June 2018 at 00:01:31 UTC, DigitalDesigns wrote:
> Is there an obfuscator for D that at least renames identifiers? This is because sometimes they leak from various processes and could be potential sources of attack.

Yes, DustMite has an obfuscation mode. You will need to give it a test command which checks if the file is still a working D program. Building the program and running its unit tests is generally sufficient for this purpose.
Jun 13 2018
On Thursday, 14 June 2018 at 00:01:31 UTC, DigitalDesigns wrote:
> Is there an obfuscator for D that at least renames identifiers? This is because sometimes they leak from various processes and could be potential sources of attack. It would be a tool that probably just replaces their values with, say their hash + something else and done pre release build. Ideally it would be able to compile with dmd and all in memory or use temp storage without file issues. It can't modify the code directly because then that would be permanent.

I don't know any specifically for D but these C/C++ tools might help as a starting point.

https://github.com/obfuscator-llvm/obfuscator/wiki
https://github.com/obfuscator-llvm/obfuscator/tree/llvm-4.0
https://sourceforge.net/projects/cshroud/
Jun 13 2018