digitalmars.D.learn - Converting from std.file.read's void[]
- Jonathan M Davis (14/14) Sep 21 2010 Okay, it seems that the way to read in a binary file is to use std.file....
- bearophile (7/15) Sep 21 2010 D2 string are immutable(char)[] and not char[].
- bearophile (14/15) Sep 21 2010 I have just tried those a little. Python file object doesn't have a eof(...
- Jonathan M Davis (6/27) Sep 21 2010 I believe that the typical behaviour in C and C++ is that eof() is false...
- Jonathan M Davis (9/27) Sep 21 2010 Well, yes. I was talking about strings in the general sense (though UTF-...
- Kagamin (3/6) Sep 21 2010 You may like the BinaryReader interface
- Steven Schveighoffer (10/35) Sep 22 2010 You can slice void arrays, even though you cannot index them. If you kn...
Okay, it seems that the way to read in a binary file is to use std.file.read(), which reads in the file as a void[]. This immediately raises the question of how to convert the void[] into something useful. It seems to me that casting void[] to ubyte[] is then the appropriate thing to do, because then you can properly index it and grab the bytes that need to be converted into useful values.

However, that still raises the question of how to get anything useful out of the bytes. UTF-8 strings are easy because they're the same size as ubytes. Casting to char[] for the portion of the data that you want as a string seems to work just fine. But what about other types? Is it the correct thing to cast to T[], where T is whatever type the data represents, index into it to get the values that you want of that type, and then cast the next section of the data to U[], where U is the type for the next section of the data, etc.? Or is there a better way to handle this?

- Jonathan M Davis
Sep 21 2010
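The casts described in the question can be sketched as follows. This is only an illustration, not an established recipe: the file name, the 3-byte text field, and the helper names asBytes and textField are made up for the example, and std.utf.validate is assumed to be the "validate function" mentioned later in the thread.

```d
import std.file : read, remove, write;
import std.utf : validate;

/// Reinterprets the raw buffer as bytes so it can be indexed; no copy is made.
ubyte[] asBytes(void[] raw)
{
    return cast(ubyte[]) raw;
}

/// Returns bytes[start .. end] viewed as text, after checking that it
/// really is well-formed UTF-8 (hypothetical helper).
const(char)[] textField(const(ubyte)[] bytes, size_t start, size_t end)
{
    auto text = cast(const(char)[]) bytes[start .. end];
    validate(text); // throws std.utf.UTFException on malformed input
    return text;
}

void main()
{
    enum path = "example.bin"; // hypothetical file name
    write(path, "HDR\x01\x02"); // 3 bytes of text followed by 2 raw bytes
    scope (exit) remove(path);

    ubyte[] bytes = asBytes(read(path));
    assert(bytes.length == 5);
    assert(textField(bytes, 0, 3) == "HDR");
    assert(bytes[3] == 1 && bytes[4] == 2);
}
```

The cast to char[] only changes how the bytes are viewed, so validating before treating them as a string guards against files that do not actually contain UTF-8 at that position.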
Jonathan M Davis:
> UTF-8 strings are easy because they're the same size as ubytes. Casting to char[] for the portion of the data that you want as a string seems to work just fine.

D2 strings are immutable(char)[] and not char[]. Strings are UTF-8, while the raw bytes you read from a file may contain anything, so in some situations you need to use the validate function.

> But what about other types? Is it the correct thing to cast to T[] where T is whatever type the data represents and then index into it to get the values that you want of that type and then cast the next section of the data to U[] where U is the type for the next section of the data, etc.? Or is there a better way to handle this?

It's better to avoid casts when possible, and SafeD may even restrict their usage. Take a look at the rawWrite/rawRead methods of std.stdio.File.

Bye,
bearophile
Sep 21 2010
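As a rough sketch of the rawWrite/rawRead suggestion, a file holding a count followed by that many doubles could be written and read back without casting the buffer by hand. The file layout, the file name, and the helpers writeRecords and readRecords are assumptions made for this example; the data is stored in the host's byte order, so the file is not portable across architectures.

```d
import std.file : remove;
import std.stdio : File;

/// Writes a uint element count followed by the values (hypothetical layout).
void writeRecords(string path, const(double)[] values)
{
    auto f = File(path, "wb");
    uint[1] count = [cast(uint) values.length];
    f.rawWrite(count[]);
    f.rawWrite(values);
    f.close();
}

/// Reads the count, then fills a typed buffer of that size.
double[] readRecords(string path)
{
    auto f = File(path, "rb");
    uint[1] count;
    // rawRead returns the slice of the buffer that was actually filled.
    if (f.rawRead(count[]).length != 1)
        throw new Exception("truncated header");
    auto values = new double[count[0]];
    if (values.length == 0)
        return values; // nothing to read for an empty record list
    if (f.rawRead(values).length != values.length)
        throw new Exception("truncated payload");
    return values;
}

void main()
{
    enum path = "records.raw"; // hypothetical file name
    writeRecords(path, [0.5, 1.5, 2.5]);
    scope (exit) remove(path);
    assert(readRecords(path) == [0.5, 1.5, 2.5]);
}
```

Each rawRead call fills a buffer of the type that the next section of the file is known to hold, which replaces the "cast to T[], then to U[]" sequence from the question.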
> Take a look at the rawWrite/rawRead methods of std.stdio.File.

I have just tried those a little. Python file objects don't have an eof() method. This D2 program shows that eof() is false even when the whole file has been read. Is this correct?

    import std.stdio: File;

    void main() {
        double[3] data = [0.5, 1.5, 2.5];

        // Write the three doubles as raw bytes.
        auto f = File("test.raw", "wb");
        f.rawWrite(data);
        f.close();

        // Read them back; exactly the whole file is consumed.
        f = File("test.raw", "rb");
        assert(!f.eof());
        f.rawRead(data);
        assert(f.eof()); // Assertion failure
    }

Bye,
bearophile
Sep 21 2010
On Tuesday, September 21, 2010 17:34:26 bearophile wrote:
> > Take a look at the rawWrite/rawRead methods of std.stdio.File.
>
> I have just tried those a little. Python file objects don't have an eof() method. This D2 program shows that eof() is false even when the whole file has been read. Is this correct?
>
>     import std.stdio: File;
>
>     void main() {
>         double[3] data = [0.5, 1.5, 2.5];
>         auto f = File("test.raw", "wb");
>         f.rawWrite(data);
>         f.close();
>         f = File("test.raw", "rb");
>         assert(!f.eof());
>         f.rawRead(data);
>         assert(f.eof()); // Assertion failure
>     }
>
> Bye,
> bearophile

I believe that the typical behaviour in C and C++ is that eof() is false until you've tried to read beyond the end of the file. So, you get one more read than you might expect. You do the read and then check eof(), rather than checking eof() and then doing the read if it isn't true.

- Jonathan M Davis
Sep 21 2010
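The "read first, then check" pattern described above might look like the sketch below. It relies on rawRead returning the slice of the buffer it managed to fill, so a short or empty result signals the end of the data; the file name, the chunk size of 2, and the helper readAllDoubles are assumptions for the example.

```d
import std.file : remove;
import std.stdio : File;

/// Reads every double in the file, two at a time (hypothetical helper).
double[] readAllDoubles(string path)
{
    auto f = File(path, "rb");
    double[] result;
    double[2] chunk;
    for (;;)
    {
        // Attempt the read first; do not consult eof() beforehand.
        auto got = f.rawRead(chunk[]);
        if (got.length == 0)
            break; // nothing left: this read ran past the end
        result ~= got;
    }
    // Only now, after a read has hit the end, does eof() report true.
    assert(f.eof());
    return result;
}

void main()
{
    enum path = "test.raw";
    double[3] data = [0.5, 1.5, 2.5];
    auto f = File(path, "wb");
    f.rawWrite(data[]);
    f.close();
    scope (exit) remove(path);

    assert(readAllDoubles(path) == [0.5, 1.5, 2.5]);
}
```

With three doubles and a two-element chunk, the reads return 2, 1, and 0 elements, so the loop ends on the third attempt rather than on an eof() check made before reading.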
On Tuesday, September 21, 2010 16:41:57 bearophile wrote:
> Jonathan M Davis:
> > UTF-8 strings are easy because they're the same size as ubytes. Casting to char[] for the portion of the data that you want as a string seems to work just fine.
>
> D2 string are immutable(char)[] and not char[]. Strings are UTF-8, while the raw bytes you read from a file may contain everything, so in some situations you need to use the validate function.

Well, yes. I was talking about strings in the general sense (though UTF-8 strings), not necessarily the specific type string. The fact that you can cast to char[] makes getting strings easy, while the correct way to deal with types which aren't bytes isn't as obvious.

> > But what about other types? Is it the correct thing to cast to T[] where T is whatever type the data represents and then index into it to get the values that you want of that type and then cast the next section of the data to U[] where U is the type for the next section of the data, etc.? Or is there a better way to handle this?
>
> It's better to avoid casts when possible, and SafeD may even be restrict their usage. Take a look at the rawWrite/rawRead methods of std.stdio.File.

That does look like a better way to handle it. Thanks. Normally, I don't mess with binary files, so I'm not particularly well-versed in the correct ways to read them.

- Jonathan M Davis
Sep 21 2010
Jonathan M Davis Wrote:
> Okay, it seems that the way to read in a binary file is to use std.file.read() which reads in the file as a void[]. This immediately raises the question as to how to convert the void[] into something useful.

You may like the BinaryReader interface:
http://msdn.microsoft.com/en-us/library/system.io.binaryreader_members.aspx
Sep 21 2010
On Tue, 21 Sep 2010 19:06:43 -0400, Jonathan M Davis <jmdavisProg gmx.com> wrote:
> Okay, it seems that the way to read in a binary file is to use std.file.read() which reads in the file as a void[]. This immediately raises the question as to how to convert the void[] into something useful. It seems to me that casting void[] to a ubyte[] is then the appropriate thing to do because then you can properly index it and grab the appropriate bytes that need to be converting into useful values. However, that still raises the question of how to get anything useful out of the bytes. UTF-8 strings are easy because they're the same size as ubytes. Casting to char[] for the portion of the data that you want as a string seems to work just fine. But what about other types? Is it the correct thing to cast to T[] where T is whatever type the data represents and then index into it to get the values that you want of that type and then cast the next section of the data to U[] where U is the type for the next section of the data, etc.? Or is there a better way to handle this?

You can slice void arrays, even though you cannot index them. If you know, for instance, that a struct S starts at byte offset 15, you can do:

    (cast(S[])arr[15 .. 15 + S.sizeof])[0];

or:

    *cast(S*)(arr.ptr + 15);

Note that the slice is bounded to S.sizeof bytes, because casting a void[] to S[] requires the length to be a multiple of S.sizeof, and that the offset is added before the pointer cast, so that it counts bytes rather than S-sized elements.

There are various ways to get the data. Only if you know the data is an *array* of a certain type is it useful to cast the entire array.

-Steve
Sep 22 2010
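The approaches from the last reply can be compared in a small sketch. The struct layout, the offset of 15, and the helper names viaSlice, viaPointer, and viaCopy are assumptions for illustration. The third variant, which copies the bytes into a properly aligned local, is not from the thread; it is included because reading a struct in place at an odd offset is a misaligned access, which x86 tolerates but some other architectures may not.

```d
/// Example payload type (hypothetical); 8 bytes with no padding.
struct S
{
    uint id;
    float scale;
}

/// Slice exactly S.sizeof bytes, view them as a one-element S[], take element 0.
S viaSlice(const(void)[] arr, size_t offset)
{
    return (cast(const(S)[]) arr[offset .. offset + S.sizeof])[0];
}

/// Advance the pointer in bytes first, then reinterpret it as S*.
/// No bounds checking is performed here.
S viaPointer(const(void)[] arr, size_t offset)
{
    return *cast(const(S)*)(arr.ptr + offset);
}

/// Copy the bytes into an aligned local instead of reading in place.
S viaCopy(const(void)[] arr, size_t offset)
{
    S result;
    (cast(ubyte*) &result)[0 .. S.sizeof] =
        cast(const(ubyte)[]) arr[offset .. offset + S.sizeof];
    return result;
}

void main()
{
    // Build a buffer with 15 unrelated bytes followed by one S.
    auto original = S(42, 1.5f);
    auto buf = new ubyte[15 + S.sizeof];
    buf[15 .. $] = (cast(ubyte*) &original)[0 .. S.sizeof];

    assert(viaSlice(buf, 15) == original);
    assert(viaPointer(buf, 15) == original);
    assert(viaCopy(buf, 15) == original);
}
```

The slice-based versions keep D's bounds checking on the range being read, while the pointer version skips it, so the pointer form is only reasonable when the offset has already been validated.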