digitalmars.D - Binary file grammar

"wobbles" <grogan.colin gmail.com> writes:
I have to read a binary file.
I can use std.stdio.File.rawRead to do this (and it's even 
typesafe, awesome!!)

Currently, I'm doing this with a little helper function I've 
called Get:
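     import std.stdio : File;

     /// Reads `num` values of type T from the file into a new buffer.
     /// Special case: for T == char, reads a one-byte length prefix and
     /// then that many bytes as a string (`num` is ignored).
     T[] Get(T)(File f, size_t num = 1) {
         static if (is(T == char)) {
             ubyte[1] strLen;
             f.rawRead(strLen[]);
             if (strLen[0] == 0)
                 return null; // rawRead rejects an empty buffer
             auto strBuf = new char[strLen[0]];
             return f.rawRead(strBuf);
         } else {
             auto buf = new T[num];
             return f.rawRead(buf);
         }
     }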

Then to use it:
File f = File(filename, "rb");

auto versionNumber = f.Get!(ushort); // reads 2 bytes from file into a ushort[]
auto nextByte = f.Get!(byte); // reads 1 byte into a byte[]
auto next5Bytes = f.Get!(byte)(5); // reads 5 bytes and puts them into a byte[]

There are a lot of improvements to be made to the above, mostly 
around returning a Range. Also, reusing buffers so as not to 
allocate so much. That's probably Priority 1 actually.
That's all fine and dandy, will get to that.
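
A minimal sketch of the buffer-reuse idea, assuming a hypothetical helper named getInto (not part of Phobos or of the code above). The caller owns the buffer and passes in the slice to fill, so nothing is allocated per read:

```d
import std.stdio : File;

/// Hypothetical helper: fills a caller-supplied buffer instead of
/// allocating a new one on every call.
/// Returns the part of `buf` that was actually filled, which is shorter
/// than `buf` if the file ends early.
T[] getInto(T)(File f, T[] buf)
{
    // rawRead throws on an empty buffer, so handle that case up front.
    if (buf.length == 0)
        return buf;
    return f.rawRead(buf);
}
```

The trade-off is that each returned slice aliases the shared buffer, so the next read overwrites it. Anything that has to outlive the next call needs a .dup.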

There's some more complicated data in there too, like a string 
which in this binary file is defined as a byte representing the 
number of bytes in this string, followed by that number of bytes. 
(Similar to a char[] in C/D I imagine).

So, to read that, I'd write:
     // Get!(char) reads the length byte itself, then that many bytes
     // into a char array.
     auto myString = f.Get!(char);

While doing this I had the idea of implementing a more general 
approach to this, using CTFE to build a struct / parserFunction 
(Similar to Pegged [1] ).
You describe at compile time how this binary file looks, and then 
the parser handles everything else.
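
To make the idea concrete, here is a rough sketch that uses a plain struct as the compile-time description. The Header layout and the readStruct name are made up for illustration. It assumes the file stores the fields back to back, with no padding, in the machine's native byte order:

```d
import std.stdio : File;
import std.traits : isScalarType;

/// Hypothetical example layout; the field names and sizes are made up.
struct Header
{
    ushort versionNumber; // 2 bytes
    ubyte  flags;         // 1 byte
    uint   recordCount;   // 4 bytes
}

/// Reads each field of T from the file in declaration order.
T readStruct(T)(File f) if (is(T == struct))
{
    T result;
    // .tupleof is expanded at compile time, so the loop body is
    // instantiated once per field with that field's type.
    foreach (ref field; result.tupleof)
    {
        static assert(isScalarType!(typeof(field)),
            "this sketch only handles scalar fields");
        // View the single field as a one-element slice for rawRead.
        auto got = f.rawRead((&field)[0 .. 1]);
        if (got.length != 1)
            throw new Exception("unexpected end of file");
    }
    return result;
}
```

Usage would be something like auto h = readStruct!Header(f);. Length-prefixed strings, endianness, and nested structs would each need their own handling (for example via attributes on the fields), which is where a real grammar or library starts to pay off.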

Does anyone know of a good Binary Description Grammar out in the 
wild? I'd rather not re-invent the wheel on that front, and just 
use something standard. My Googling didn't come up with something 
that could be considered "standard" however.

Any ideas?

Thanks

[1] https://github.com/PhilippeSigaud/Pegged
Aug 10 2015
ketmar <ketmar ketmar.no-ip.org> writes:
it looks like you can use some serialization library instead, like
Orange. structs that those library takes as input can be seen as a kind
of grammar description. ;-)
Aug 10 2015
"wobbles" <grogan.colin gmail.com> writes:
On Monday, 10 August 2015 at 12:38:11 UTC, ketmar wrote:
 it looks like you can use some serialization library instead, 
 like Orange. structs that those library takes as input can be 
 seen as a kind of grammar description. ;-)
The trouble with that is: what if some of the data is in a funny format? Like the string I described above, which starts with a number and continues for that number of bytes. What if it's a string that goes on until you hit a '\0'? A serialisation library like Orange couldn't do that, as it's merely for (de)serialising D objects, I think?
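
For reference, reading the '\0'-terminated case by hand with rawRead could look roughly like this. readCString is a hypothetical helper name, not something from Phobos or Orange:

```d
import std.stdio : File;

/// Hypothetical helper: reads bytes up to (not including) a '\0'
/// terminator. If the file ends before a terminator is seen, it returns
/// whatever was read so far.
char[] readCString(File f)
{
    char[] result;
    char[1] c;
    // rawRead returns an empty slice once the end of the file is reached.
    while (f.rawRead(c[]).length == 1 && c[0] != '\0')
        result ~= c[0];
    return result;
}
```

It reads one byte per call to keep the sketch short; a real parser would probably read in larger chunks.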
Aug 10 2015
Jacob Carlborg <doob me.com> writes:
On 10/08/15 15:12, wobbles wrote:

 Trouble with that is what if there's some funny formats some data is in?
 Like I described above a string which starts with a number and continues
 for that number of bytes.
 What if it's a string that goes on until you hit a '\0' ? A
 serialisation library like Orange couldn't do that, as it's merely for
 (de)serialising D objects, I think?
Orange can handle D strings. If it sees something like char* it will serialize it as a pointer to a char, not a C string. Orange supports custom archivers if you need the data in some special format.

--
/Jacob Carlborg
Aug 10 2015
"Artem Tarasov" <lomereiter gmail.com> writes:
On Monday, 10 August 2015 at 12:29:43 UTC, wobbles wrote:
 While doing this I had the idea of implementing a more general 
 approach to this, using CTFE to build a struct / parserFunction 
 (Similar to Pegged [1] ).
 You describe at compile time how this binary file looks, and 
 then the parser handles everything else.

 Does anyone know of a good Binary Description Grammar out in 
 the wild? I'd rather not re-invent the wheel on that front, and 
 just use something standard. My Googling didn't come up with 
 something that could be considered "standard" however.

 Any ideas?
You should look into Construct (Python) - https://github.com/construct/construct
If you are not afraid of Haskell, also take a look at https://github.com/bos/attoparsec
Aug 10 2015
"Atila Neves" <atila.neves gmail.com> writes:
On Monday, 10 August 2015 at 12:29:43 UTC, wobbles wrote:
 I have to read a binary file.
 I can use std.stdio.File.rawRead to do this (and it's even 
 typesafe, awesome!!)

 [...]
https://github.com/atilaneves/cerealed

If that doesn't do what you need, I've done something wrong.

Atila
Aug 11 2015
ketmar <ketmar ketmar.no-ip.org> writes:
On Tue, 11 Aug 2015 07:51:37 +0000, Atila Neves wrote:

 On Monday, 10 August 2015 at 12:29:43 UTC, wobbles wrote:
 I have to read a binary file.
 I can use std.stdio.File.rawRead to do this (and it's even typesafe,
 awesome!!)

 [...]
 https://github.com/atilaneves/cerealed
 If that doesn't do what you need, I've done something wrong.
a strange thing: i myself used only cerealed, but somehow keep thinking that it was orange, and speaking about orange. i should setup autoreplacement in my nntp client...
Aug 11 2015
"wobbles" <grogan.colin gmail.com> writes:
On Tuesday, 11 August 2015 at 07:51:39 UTC, Atila Neves wrote:
 On Monday, 10 August 2015 at 12:29:43 UTC, wobbles wrote:
 I have to read a binary file.
 I can use std.stdio.File.rawRead to do this (and it's even 
 typesafe, awesome!!)

 [...]
https://github.com/atilaneves/cerealed If that doesn't do what you need, I've done something wrong. Atila
Yep, this seems to do the job! I'll investigate tonight, and pester you later :D
Aug 11 2015
"Atila Neves" <atila.neves gmail.com> writes:
On Tuesday, 11 August 2015 at 13:40:52 UTC, wobbles wrote:
 On Tuesday, 11 August 2015 at 07:51:39 UTC, Atila Neves wrote:
 On Monday, 10 August 2015 at 12:29:43 UTC, wobbles wrote:
 I have to read a binary file.
 I can use std.stdio.File.rawRead to do this (and it's even 
 typesafe, awesome!!)

 [...]
https://github.com/atilaneves/cerealed If that doesn't do what you need, I've done something wrong. Atila
Yep, this seems to do the job! I'll investigate tonight, and pester you later :D
Pester away! That's what I get for putting it out there. :P

Atila
Aug 11 2015