digitalmars.D.learn - Reading a file eats whole memory

reply "Emil Wojak" <emil wojak.eu> writes:
Hi!

Could someone please explain why this code tries to eat my 1 GB of memory and
gets killed by the kernel afterwards? Eventually it prints "Error: Out of
memory" when I set ulimit on memory prior to launching the program.

The code:
import std.stream;

int main(char [][] args) {
	Stream input=new File(args[0]);

	char[] data;
	input.read(data);
	input.close();
	return 0;
}

My intention was to read the executable itself, which is about 444 kB.
I'm running Linux, compiling with Digital Mars D Compiler v1.022
Oct 21 2007
next sibling parent reply div0 <div0 users.sourceforge.net> writes:
Emil Wojak wrote:
 Hi!
 
 Could someone please explain why this code tries to eat my 1 GB memory 
 and gets killed by the kernel afterwards? Eventually it prints "Error: 
 Out of memory" when I set ulimit on memory prior to launching the program.
 
 The code:
 import std.stream;
 
 int main(char [][] args) {
     Stream input=new File(args[0]);
 
     char[] data;
     input.read(data);
     input.close();
     return 0;
 }
 
 My intention was to read the executable itself, which is about 444 kB.
 I'm running Linux, compiling with Digital Mars D Compiler v1.022
You are trying to read a string, so I guess the routine is interpreting the first four bytes of the file as a string length count. That's how Tango works anyway, IIRC.

-- 
My enormous talent is exceeded only by my outrageous laziness.
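
To make that guess concrete: assuming the file is a Linux ELF executable (whose first four bytes are 0x7F 'E' 'L' 'F') and that the count is read as a 32-bit little-endian integer, the sketch below decodes those four bytes the way a counted-string read would. The variable names are only illustrative.

```d
import std.stdio;

int main()
{
    // The ELF magic number: 0x7F 'E' 'L' 'F'.
    ubyte[4] magic;
    magic[0] = 0x7F;
    magic[1] = 0x45; // 'E'
    magic[2] = 0x4C; // 'L'
    magic[3] = 0x46; // 'F'

    // Assemble a little-endian 32-bit value: magic[0] is the lowest byte.
    uint length = 0;
    for (int i = 3; i >= 0; i--)
        length = (length << 8) | magic[i];

    writefln("%d bytes", length); // prints "1179403647 bytes"
    return 0;
}
```

That is roughly 1.1 GiB, which would match the attempt to allocate about 1 GB described in the original post.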
Oct 21 2007
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"div0" <div0 users.sourceforge.net> wrote in message 
news:fffqid$1csk$1 digitalmars.com...
 You are trying to read a string in, so I guess the routine is using the 
 1st four bytes as a string length count. That's how tango works anyway 
 IIRC.

 -- 
You are precisely right. If you just want to get all the data in a file, just do:

import std.file;
int main(char[][] args)
{
    ubyte[] data = cast(ubyte[])std.file.read(args[0]);
    return 0;
}

Two things:

One, std.file.read returns a void[], which is a bit like D's equivalent of a void*: it can point to anything, but you can't access its elements without casting it first, and it also has a length which indicates the number of bytes in the data.

Two, I'm casting to ubyte[] instead of char[]. Do NOT use char[] for "plain old data" as in C. char is a UTF-8 code unit type, not a generic "one byte" datatype. You'll most likely get errors unless your input file is all plain ASCII or UTF-8 text. D provides the byte and ubyte types for raw byte data.
Oct 21 2007
prev sibling next sibling parent reply Frank Benoit <keinfarbton googlemail.com> writes:
Emil Wojak wrote:
 Hi!
 
 Could someone please explain why this code tries to eat my 1 GB memory
 and gets killed by the kernel afterwards? Eventually it prints "Error:
 Out of memory" when I set ulimit on memory prior to launching the program.
 
 The code:
 import std.stream;
 
 int main(char [][] args) {
     Stream input=new File(args[0]);
 
     char[] data;
     input.read(data);
     input.close();
     return 0;
 }
 
 My intention was to read the executable itself, which is about 444 kB.
 I'm running Linux, compiling with Digital Mars D Compiler v1.022
Others have already commented on the file reading. Using args[0] to access the program's binary is not safe, because if the program is found via the PATH variable, args[0] does not contain the path. On Linux, /proc/self/exe is a link to your executable.
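
A minimal sketch of that hint, assuming Linux (where /proc/self/exe exists) and the std.file.read call suggested elsewhere in this thread:

```d
import std.file;
import std.stdio;

int main()
{
    // /proc/self/exe is a Linux-specific symlink to the running executable,
    // so this does not depend on how the program was started.
    ubyte[] data = cast(ubyte[]) std.file.read("/proc/self/exe");
    writefln("read %d bytes", data.length);
    return 0;
}
```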
Oct 21 2007
parent "Emil Wojak" <emil wojak.eu> writes:
On 21-10-2007 at 17:45:54, Frank Benoit <keinfarbton googlemail.com> wrote:

Thank you, everyone, for your explanations. The test below confirms what you
wrote:

$ echo -en '\x03\x00\x00\x00abcdefgh' > string.dat

A test code:
-----------------
import std.stdio;
import std.stream;

int main(char [][] args) {
	Stream input=new File(args[1], FileMode.In);
	char[] data;
	input.read(data);
	writefln("data.length=", data.length, " data=", data);
	input.close();
	return 0;
}
-----------------
$ dmd test.d
$ ./test ./string.dat
data.length=3 data=abc

So the program reads 7 bytes: the array length (4 bytes) plus 3 bytes of data.
Changing the type of data to ubyte[5] makes the program read exactly 5 bytes
("\x03\x00\x00\x00a").
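
For comparison, a std.stream read that involves no length prefix could look like the sketch below. It is only a sketch, assuming the D1 Phobos std.stream API (Stream.size and Stream.readExact); the file name is taken from args[1] as in the test above.

```d
import std.stdio;
import std.stream;

int main(char[][] args)
{
    Stream input = new File(args[1], FileMode.In);

    // Allocate exactly as many bytes as the stream reports.
    ubyte[] data = new ubyte[cast(size_t) input.size()];

    // readExact fills the whole buffer or throws; it does not
    // interpret the first bytes as a length count.
    input.readExact(data.ptr, data.length);
    input.close();

    writefln("data.length=%d", data.length);
    return 0;
}
```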

 Using args[0] to access the program's binary is not safe, because if it is
 called via the PATH variable it does not contain the path.
 /proc/self/exe is a link to your executable.
Well, args[0] was just a quick and dirty test file; nevertheless, thanks for the hint :)
Oct 21 2007
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jarrett Billingsley:
 import std.file;
 int main(char[][] args)
 {
     ubyte[] data = cast(ubyte[])std.file.read(args[0]);
     return 0;
 }
 
 Two things: one, std.file.read returns a void[], which is a bit like D's 
 equivalent of a void*
I don't understand the design of std.file.read(): why doesn't it return a ubyte[] by default instead of a void[] (which you can cast to something else if you don't need ubytes)?

Bye,
bearophile
Dec 08 2007
parent Dan <murpsoft hotmail.com> writes:
bearophile Wrote:

 Jarrett Billingsley:
 import std.file;
 int main(char[][] args)
 {
     ubyte[] data = cast(ubyte[])std.file.read(args[0]);
     return 0;
 }
 
 Two things: one, std.file.read returns a void[], which is a bit like D's 
 equivalent of a void*
 I don't understand the design of std.file.read(): why doesn't it return a ubyte[] by default instead of a void[] (which you can cast to something else if you don't need ubytes)?
 Bye,
 bearophile

1) It doesn't matter. You're supposed to cast it anyway, based on the type of data in the file.
Dec 08 2007