
digitalmars.D.learn - Parse File at compile time, but not embedded

Pie? <AmericanPie gmail.com> writes:
Is it possible to parse a file at compile time without embedding 
it into the binary?

I have a sort of "configuration" file that defines how to create 
some objects. I'd like to be able to read how to create them but 
not have that config file stick around in the binary.

e.g., (simple contrived example follows)

Config.txt
    x, my1
    y, my1
    z, my2


class my1 { }
class my2 { }

string parseConfig(string fileName)()
{
     ....
}

void main()
{
    // Effectively creates a mixin that mixes in:
    // auto x = new my1; auto y = new my1; auto z = new my2;
    mixin(parseConfig!"Config.txt");
}


If parseConfig uses import("Config.txt"), then Config.txt will end
up in the binary, which I do not want. It would be easier to be
able to use import and strip it out later, if possible. Config.txt
may contain secure information, which is why it doesn't belong in
the binary.
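A minimal sketch of the kind of `parseConfig` the question describes, assuming each line has the form `name, ClassName`. It is an ordinary function that runs through CTFE because `mixin` needs a compile-time string. `trimmed` and `configText` are hypothetical names; `configText` stands in for `import("Config.txt")`, which would need the compiler's `-J` string-import path switch. This only shows the CTFE-plus-mixin mechanics; it does not by itself guarantee that the file's text stays out of the binary.

```d
import std.stdio : writeln;

class my1 {}
class my2 {}

// Hypothetical helper: drops spaces, tabs and '\r' from both ends of `s`.
string trimmed(string s)
{
    size_t b = 0, e = s.length;
    while (b < e && (s[b] == ' ' || s[b] == '\t' || s[b] == '\r'))
        ++b;
    while (e > b && (s[e - 1] == ' ' || s[e - 1] == '\t' || s[e - 1] == '\r'))
        --e;
    return s[b .. e];
}

// Turns lines of the form "name, ClassName" into "auto name = new ClassName;".
// Blank lines and lines without a comma produce nothing.
string parseConfig(string text)
{
    string code;
    size_t start = 0;
    foreach (i; 0 .. text.length + 1)
    {
        if (i < text.length && text[i] != '\n')
            continue; // still inside the current line
        auto line = trimmed(text[start .. i]);
        start = i + 1;
        foreach (j, c; line)
        {
            if (c == ',')
            {
                code ~= "auto " ~ trimmed(line[0 .. j]) ~ " = new "
                    ~ trimmed(line[j + 1 .. $]) ~ ";\n";
                break;
            }
        }
    }
    return code;
}

// Stand-in for import("Config.txt"), which needs the -J switch.
enum configText = "    x, my1\n    y, my1\n    z, my2\n";

void main()
{
    // mixin() requires a compile-time string, so parseConfig runs via CTFE here.
    mixin(parseConfig(configText));
    writeln(typeid(x), " ", typeid(y), " ", typeid(z));
}
```

Design note: the text reaches `parseConfig` as an argument evaluated at compile time instead of being imported inside the function body, so the compiled function itself carries no copy of the file. That is one plausible difference from the `ParseData` variant later in the thread, which calls `import` inside the function; checking the output binary is still the only real confirmation.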
Jun 06 2016
Alex Parrill <initrd.gz gmail.com> writes:
On Monday, 6 June 2016 at 17:31:52 UTC, Pie? wrote:
 Is it possible to parse a file at compile time without 
 embedding it into the binary?
 [...]
Most compilers, I believe, will not embed a string if it is not used anywhere at runtime. DMD might not, though; I'm not sure. But reading sensitive data at compile time strikes me as dangerous, depending on your use case. If you are reading sensitive information at compile time, you are presumably going to include that information in your binary (otherwise why would you read it?), and then your binary is not secure.
Jun 06 2016
Pie? <AmericanPie gmail.com> writes:
On Monday, 6 June 2016 at 21:31:32 UTC, Alex Parrill wrote:
 [...]
 Most compilers, I believe, will not embed a string if it is not used anywhere at runtime. DMD might not, though; I'm not sure.
This doesn't seem to be the case. In a release build, even though I never "use" the string, it is embedded. I guess this is because I'm not using an enum, but enum seems to be much harder to work with, if not impossible.
 But reading sensitive data at compile-time strikes me as 
 dangerous, depending on your use case. If you are reading 
 sensitive information at compile time, you are presumably going 
 to include that information in your binary (otherwise why would 
 you read it?), and your binary is not secure.
Not necessarily; you chased that rabbit quite far! The data you're reading could contain sensitive information that is only used at compile time and not meant to be embedded. For example, the file could contain the login and password for an SQL database that you connect to at compile time to retrieve that information, then disregard the password (it is not needed at run time).
Jun 06 2016
Mithun Hunsur <me philpax.me> writes:
On Monday, 6 June 2016 at 21:57:20 UTC, Pie? wrote:
 On Monday, 6 June 2016 at 21:31:32 UTC, Alex Parrill wrote:
 [...]
 Not necessarily; you chased that rabbit quite far! The data you're reading could contain sensitive information that is only used at compile time and not meant to be embedded. For example, the file could contain the login and password for an SQL database that you connect to at compile time to retrieve that information, then disregard the password (it is not needed at run time).
This is definitely possible, but it can depend on your compiler. If you use an enum, it'll be treated as a compile-time constant, so if you never store it anywhere (e.g. enum File = import("file.txt"); string file = File; is a no-no at global scope), you should be fine. If you do find yourself in the precarious situation of storing the data, then it's up to your compiler to detect that there are no runtime references to the data and elide it. LDC and GDC most likely do this, but I doubt DMD would. For safety, you should try to reformulate your code in terms of enums and local variables; this *should* work with DMD, but it's possible it's not smart enough to realize that the function is never used at run time (and therefore does not need to be included in the executable).
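A minimal sketch of the enum-versus-stored-variable distinction described above, with a string literal standing in for `import("file.txt")`. The names `secretText`, `hasKey` and `dataValue` are hypothetical. The point is that only values derived at compile time are referenced by run-time code; whether a particular compiler then leaves the manifest constant out of the binary is what the rest of the thread tests, so grep the output rather than trusting this.

```d
import std.stdio : writeln;

// Stand-in for: enum secretText = import("file.txt");
// A manifest constant (enum) has no storage of its own; its value is
// substituted wherever it is used.
enum secretText = "XXXyyyZZZ33322211";

// The "no-no" from the post: a module-level variable must hold its value at
// run time, so initializing it from the enum forces the text into the binary.
// string file = secretText;

// Derive only what the program needs, entirely at compile time.
enum bool hasKey = secretText.length >= 3 && secretText[0 .. 3] == "XXX";
enum int dataValue = hasKey ? 3 : 4;

void main()
{
    // Only the derived int is referenced here; secretText never is.
    writeln(dataValue);
}
```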
Jun 06 2016
Pie? <AmericanPie gmail.com> writes:
On Tuesday, 7 June 2016 at 02:04:41 UTC, Mithun Hunsur wrote:
 [...]
Ok, I will assume it will be able to be removed for release. It is an easy check (just search whether the binary contains the file info). I'm sure an easy fix could be to write 0's over the data in the binary if necessary. If I use an enum, dmd does *not* remove it in a release build. I will work on parsing the file using CTFE and hopefully dmd will not try to keep it around; otherwise it can be solved using gdc/ldc or some other method.
Jun 06 2016
Pie? <AmericanPie gmail.com> writes:
If I use an enum, dmd DOES remove it in a release build. But I'm not
sure about the general case yet.
Jun 06 2016
cym13 <cpicard openmailbox.org> writes:
On Tuesday, 7 June 2016 at 04:17:05 UTC, Pie? wrote:
 Ok, I will assume it will be able to be removed for release. It 
 is an easy check(just search if binary contains file info). I'm 
 sure an easy fix could be to write 0's over the data in the 
 binary if necessary.
Binaries aren't magical beings; if your string is there, you can just check for it as you would in any other file:

    grep "mysecret" mybinary
    sed "s/mysecret/XXXXXXXX/g" mybinary > mybinary.patched

Keep the replacement exactly as long as the original string so that nothing in the binary shifts. If your string is very small you may hit a problem, though. I know gcc, for example, sometimes encodes little strings directly as mov instructions carrying the numeric values of the string's chars. So if your string is very short, it may be split into words; just adapt your search from there.
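A minimal sketch of the "write 0's over the data" idea from the post quoted above, in D to match the rest of the thread. `scrubFile` is a hypothetical helper: it reads the whole file, zeroes every byte-for-byte occurrence of the string, and writes it back at the same length. As noted above, it will miss short strings that the compiler has split into immediate operands, and zeroing text the program actually reads at run time will break the program, so treat it as a last resort and re-run the search afterwards.

```d
import std.algorithm.searching : find;
import std.file : read, write;
import std.stdio : stderr, writeln;

// Hypothetical helper: overwrites every occurrence of `secret` in the file at
// `path` with zero bytes and returns how many occurrences were found.
// The file length never changes, so offsets inside the executable stay valid.
size_t scrubFile(string path, string secret)
{
    if (secret.length == 0)
        return 0; // an empty needle would match everywhere
    auto bytes = cast(ubyte[]) read(path);
    auto needle = cast(const(ubyte)[]) secret;
    size_t count = 0;
    // find returns the haystack from the first match onward (empty if none).
    for (auto rest = bytes.find(needle); rest.length != 0;
         rest = rest[needle.length .. $].find(needle))
    {
        rest[0 .. needle.length] = 0; // zero this occurrence in place
        ++count;
    }
    if (count != 0)
        write(path, bytes);
    return count;
}

void main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writeln("usage: scrub <binary> <string>");
        return;
    }
    writeln(scrubFile(args[1], args[2]), " occurrence(s) zeroed");
}
```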
Jun 14 2016
Alex Parrill <initrd.gz gmail.com> writes:
On Monday, 6 June 2016 at 21:57:20 UTC, Pie? wrote:
 On Monday, 6 June 2016 at 21:31:32 UTC, Alex Parrill wrote:
 But reading sensitive data at compile-time strikes me as 
 dangerous, depending on your use case. If you are reading 
 sensitive information at compile time, you are presumably 
 going to include that information in your binary (otherwise 
 why would you read it?), and your binary is not secure.
 Not necessarily; you chased that rabbit quite far! The data you're reading could contain sensitive information that is only used at compile time and not meant to be embedded. For example, the file could contain the login and password for an SQL database that you connect to at compile time to retrieve that information, then disregard the password (it is not needed at run time).
Accessing a SQL server at compile time seems like a huge abuse of CTFE (and I'm pretty sure it's impossible at the moment). Why do I need to install and set up a MySQL database in order to build your software?
Jun 07 2016
cy <dlang verge.info.tm> writes:
On Tuesday, 7 June 2016 at 22:09:58 UTC, Alex Parrill wrote:
 Accessing a SQL server at compile time seems like a huge abuse 
 of CTFE (and I'm pretty sure it's impossible at the moment). 
 Why do I need to install and set up a MySQL database in order 
 to build your software?
Presumably you wouldn't be building it at all, since this seems like a technique for providing obfuscated binaries where people aren't privy to exactly what was used to compile it.
Jun 09 2016
Joerg Joergonson <JJoergonson gmail.com> writes:
On Tuesday, 7 June 2016 at 22:09:58 UTC, Alex Parrill wrote:
 On Monday, 6 June 2016 at 21:57:20 UTC, Pie? wrote:
 On Monday, 6 June 2016 at 21:31:32 UTC, Alex Parrill wrote:
 [...]
 [...]
 Accessing a SQL server at compile time seems like a huge abuse of CTFE (and I'm pretty sure it's impossible at the moment). Why do I need to install and set up a MySQL database in order to build your software?
Lol, who says you have access to my software? You know, the problem with assumptions is that they generally make no sense when you actually think about them.
Jun 09 2016
ketmar <ketmar ketmar.no-ip.org> writes:
On Thursday, 9 June 2016 at 22:02:44 UTC, Joerg Joergonson wrote:
 Lol, who says you have access to my software? You know, the 
 problem with assumptions is that they generally make no sense 
 when you actually think about them.
oh, yeah. it suddenly reminds me about some obscure thing. other people told me that they were able to solve the same problems with something they called "build system"...
Jun 10 2016
Joerg Joergonson <JJoergonson gmail.com> writes:
On Friday, 10 June 2016 at 07:03:21 UTC, ketmar wrote:
 On Thursday, 9 June 2016 at 22:02:44 UTC, Joerg Joergonson 
 wrote:
 Lol, who says you have access to my software? You know, the 
 problem with assumptions is that they generally make no sense 
 when you actually think about them.
 oh, yeah. it suddenly reminds me about some obscure thing. other people told me that they were able to solve the same problems with something they called "build system"...
Mine's not a build system... In any case, LDC does drop the data, so it is ok.

The problem with people is that they are idiots! They make assumptions about other people's stuff without having any clue what is actually going on, rather than addressing the real issue. In fact, the thing I'm doing has nothing to do with SQL, security, etc. It was only an example. I just don't want crap in my EXE that shouldn't be there, simple as that. Also, since I'm the sole designer and the software is simple, I have every right to do it how I want.

What's strange, though, is that my little ole app takes 300 MB and constantly uses 13% of the CPU... even though all it does is display a few images. This is with an LDC release build. Doesn't seem very efficient. I imagine a similar app in C would take about 1% and 20 MB. Hopefully profiling in D isn't as much of a nightmare as setting it up.

BTW, I'm using simpledisplay... I saw that you made a commit or something on GitHub. Are you seeing similar CPU and memory usage?
Jun 11 2016
Alex Parrill <initrd.gz gmail.com> writes:
On Thursday, 9 June 2016 at 22:02:44 UTC, Joerg Joergonson wrote:
 On Tuesday, 7 June 2016 at 22:09:58 UTC, Alex Parrill wrote:
 Accessing a SQL server at compile time seems like a huge abuse 
 of CTFE (and I'm pretty sure it's impossible at the moment). 
 Why do I need to install and set up a MySQL database in order 
 to build your software?
 Lol, who says you have access to my software? You know, the problem with assumptions is that they generally make no sense when you actually think about them.
By "I" I meant "someone new coming into the project", such as a new hire or someone that will be maintaining your program while you work on other things. In any case, this is impossible. D has no such concept as "compile-time-only" values, so any usage of a value risks embedding it into the binary.
Jun 10 2016
Joerg Joergonson <JJoergonson gmail.com> writes:
On Friday, 10 June 2016 at 12:48:43 UTC, Alex Parrill wrote:
 On Thursday, 9 June 2016 at 22:02:44 UTC, Joerg Joergonson 
 wrote:
 On Tuesday, 7 June 2016 at 22:09:58 UTC, Alex Parrill wrote:
 Accessing a SQL server at compile time seems like a huge 
 abuse of CTFE (and I'm pretty sure it's impossible at the 
 moment). Why do I need to install and set up a MySQL database 
 in order to build your software?
 [...]
 By "I" I meant "someone new coming into the project", such as a new hire or someone that will be maintaining your program while you work on other things. In any case, this is impossible. D has no such concept as "compile-time-only" values, so any usage of a value risks embedding it into the binary.
It seems that dmd does not remove the data if it is used in any way. When I started using the code, the data then appeared in the binary. The access to the code is through the following:

auto SetupData(string filename)()
{
    enum d = ParseData!(filename);
    //pragma(msg, d);
    mixin(d);
    return data;
}

The enum d does not have the data in it, as shown by the pragma. ParseData simply determines how to build data depending on external state and uses import(filename) to get the data. Since the code compiles, d is obviously a compile-time constant. But after actually using "data" and doing some work with it, the imported file showed up in the binary. Of course, if I just copy the pragma output and paste it in place of the first 3 lines, the external file isn't added to the binary (since there are then obviously no references to it).

So, at least for DMD, it doesn't do a good job of removing "dangling" references. I haven't tried GDC or LDC.

You could probably use something like

import std.string : splitLines;

string ParseData(string filename)()
{
    auto lines = splitLines(import(filename));
    if (lines[0] == "XXXyyyZZZ33322211")
        return "int data = 3;";
    return "int data = 4;";
}

So the idea is that if the first line of the external file is XXXyyyZZZ33322211, we create an int with value 3, and if not, with 4. The point, though, is that `XXXyyyZZZ33322211` should never be in the binary, since ParseData is never called at run time. At compile time, the compiler executes ParseData via CTFE and generates the mixin string as if I had directly typed "int data = 3;" or "int data = 4;" instead. The only difference between my code and the above is the generated string that is returned. I'm going to assume it's a dmd thing for now until I'm able to check it out with another compiler.
Jun 10 2016
ketmar <ketmar ketmar.no-ip.org> writes:
On Friday, 10 June 2016 at 18:47:59 UTC, Joerg Joergonson wrote:
 In any case, this is impossible. D has no such concept as 
 "compile-time-only" values, so any usage of a value risks 
 embedding it into the binary.
sure, it has.

template ParseData (string text)
{
    private static enum Key = "XXXyyyZZZ33322211\n";
    private static enum TRet = "int data = 3;";
    private static enum FRet = "int data = 4;";

    static if (text.length >= Key.length)
    {
        static if (text[0..Key.length] == Key)
            enum ParseData = TRet;
        else
            enum ParseData = FRet;
    }
    else
    {
        enum ParseData = FRet;
    }
}

void main ()
{
    mixin(ParseData!(import("a")));
}

look, ma, no traces of our secret key in binary! and no traces of `int data` declaration too!
Jun 11 2016
Joerg Joergonson <JJoergonson gmail.com> writes:
On Saturday, 11 June 2016 at 13:03:47 UTC, ketmar wrote:
 On Friday, 10 June 2016 at 18:47:59 UTC, Joerg Joergonson wrote:
 In any case, this is impossible. D has no such concept as 
 "compile-time-only" values, so any usage of a value risks 
 embedding it into the binary.
 sure, it has. [...]
This doesn't seem to be the case, though, in more complex examples ;/ Enums seem to be compile-time only under certain conditions. My code is almost identical to what you have written, except ParseData generates a more complex string and I do reference parts of the "Key" in the generation of the code. It's possible DMD keeps the full code around because of this.
Jun 11 2016
ketmar <ketmar ketmar.no-ip.org> writes:
On Sunday, 12 June 2016 at 01:39:11 UTC, Joerg Joergonson wrote:
 This doesn't seem to be the case though in more complex 
 examples ;/
it is.
 My code is almost identical to what you have written
your code is *completely* different. that's why there are no traces of CTFE values in my sample. it's not that hard to find out that my code has no functions at all, so no code for 'em can be generated.
Jun 13 2016
Adrian Matoga <dlang.spam matoga.info> writes:
On Tuesday, 7 June 2016 at 22:09:58 UTC, Alex Parrill wrote:
 Not necessarily; you chased that rabbit quite far! The data
 you're reading could contain sensitive information that is only
 used at compile time and not meant to be embedded. For example,
 the file could contain the login and password for an SQL
 database that you connect to at compile time to retrieve that
 information, then disregard the password (it is not needed at run time).
 Accessing a SQL server at compile time seems like a huge abuse of CTFE (and I'm pretty sure it's impossible at the moment). Why do I need to install and set up a MySQL database in order to build your software?
Just mount a filesystem that uses an SQL database as storage (query can be encoded in file path) and you have it. Whether it's a good idea is another story.
Jun 10 2016