digitalmars.D.learn - Reading large files, writing large files?
- AEon (33/33) Mar 27 2005 Rethinking the way I normally handle files, since now I am faced with
- Regan Heath (68/100) Mar 27 2005 Try this...
- Ben Hinkle (7/25) Mar 27 2005 [snip]
- Regan Heath (6/20) Mar 27 2005 :)
- AEon (27/88) Mar 28 2005 You seem to be "shadowing" some parent class called Source?
- Ben Hinkle (14/51) Mar 28 2005 The class is templatized. It is a way of subclassing any stream subclass...
- Regan Heath (7/27) Mar 28 2005 Good point, that is probably more correct.
- Derek Parnell (13/16) Mar 28 2005 On Tue, 29 Mar 2005 10:39:49 +1200, Regan Heath wrote:
- Ben Hinkle (10/20) Mar 28 2005 IMO the right way to check if a string is empty is asking if the length ...
- Regan Heath (52/75) Mar 28 2005 No. You cannot tell empty from null with length, eg.
- Ben Hinkle (38/110) Mar 28 2005 uhh - I think we have different definition of the word "empty". I take i...
- Regan Heath (39/51) Mar 29 2005 "empty" - "Holding or containing nothing."
- Derek Parnell (8/76) Mar 29 2005 All of this is well said and presented. I'm in total agreement with this
- Ben Hinkle (12/27) Mar 29 2005 What you describe is ok with me but I don't think it maps well to D's
- Regan Heath (8/42) Mar 29 2005 Exactly my point. It would only take a few small changes to "fix" the
- Ben Hinkle (15/43) Mar 29 2005 Java arrays have the semantics you describe. They distinguish between
- Regan Heath (23/76) Mar 29 2005 Ok.
- Regan Heath (49/126) Mar 28 2005 Ben has done a fairly good job of explaining it. I'll have a go too, the...
- AEon (51/82) Mar 28 2005 Ah... that is one of the things I really hate as a OOP beginner. It is
- Regan Heath (42/93) Mar 29 2005 In this case you can look in dmd\src\phobos\std\stream.d for the class
- AEon (11/60) Mar 29 2005 Have read several examples by now. Is there a complete list of catch
- Regan Heath (6/24) Mar 29 2005 Each "catch keyword" is a class derived from the Exception or Error
Rethinking the way I normally handle files, since I am now faced with possibly very huge (100 MB and more) log files. Ditto, I need to save large log files. So it does not seem to be a good idea to use my so far preferred method:

    // Ensure file exists
    if( ! std.file.exists(cfgPathFile) )
        ...

    // Read complete cfg file into array, removes \r\n via splitlines()
    char[][] cfgText = std.string.splitlines( cast(char[]) std.file.read(cfgPathFile) );

Etc...

I have very much come to like splitlines and read, but with 100 MB log files, loading all that into RAM may turn out ugly?

Let's say I'd ignore the RAM issue for a moment: how would I properly use std.file.write() to write into a file?

The method I fear will need to be applied for such huge files is something like this (posted by Martin in this newsgroup):

    import std.stream;

    void readfile(char[] fn)
    {
        File f = new File();
        char[] l;

        f.open(fn);
        while(!f.eof())
        {
            l = f.readLine();
            printf("line: %.*s\n", l);
        }
        f.close();
    }

That would be pretty much the ANSI C way... ieek :)...

Is there any way to avoid the latter method? And go the nicer D way, as in the first code example?

AEon
Mar 27 2005
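For readers on a current compiler, here is a minimal sketch of the streaming approach the question is after. It assumes present-day D (D2) and `std.stdio.File.byLine`, neither of which is what the thread itself uses; `copyLines` is a hypothetical helper name. Only one line is held in memory at a time, and blank lines are passed through rather than ending the loop.

```d
import std.stdio : File, writefln;

/// Copies `src` to `dst` one line at a time and returns the number of lines.
/// Hypothetical helper for illustration; not part of any library.
size_t copyLines(string src, string dst)
{
    auto input = File(src, "r");
    auto output = File(dst, "w");
    size_t count = 0;

    // byLine reuses an internal buffer, so memory use stays flat
    // no matter how large the input file is.
    foreach (line; input.byLine())
    {
        output.writeln(line); // line arrives without its terminator
        ++count;
    }
    return count;
}

void main(string[] args)
{
    if (args.length < 3)
    {
        writefln("USAGE: %s <infile> <outfile>", args[0]);
        return;
    }
    writefln("copied %s lines", copyLines(args[1], args[2]));
}
```

On the other half of the question: `std.file.write(name, buffer)` takes a file name and a complete buffer, so it fits the "whole file in RAM" style of the first example rather than this line-by-line one.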
On Mon, 28 Mar 2005 05:13:36 +0200, AEon <aeon2001 lycos.de> wrote:
> [snip]
> That would be pretty much the ANSI C way... ieek :)...
> Is there any way to avoid the latter method? And go the nicer D way, as in the first code example?

Try this...

    import std.c.stdlib;
    import std.stream;
    import std.stdio;

    class LineReader(Source) : Source
    {
        // foreach(char[] line; f)
        int opApply(int delegate(inout char[]) dg)
        {
            int result = 0;
            char[] line;

            while(!eof())
            {
                line = readLine();
                if (!line) break;
                result = dg(line);
                if (result) break;
            }

            return result;
        }

        // foreach(size_t lineno, char[] line; f)
        int opApply(int delegate(inout size_t, inout char[]) dg)
        {
            int result = 0;
            size_t lineno;
            char[] line;

            for(lineno = 1; !eof(); lineno++)
            {
                line = readLine();
                if (!line) break;
                result = dg(lineno,line);
                if (result) break;
            }

            return result;
        }
    }

    int main(char[][] args)
    {
        LineReader!(BufferedFile) f;

        if (args.length < 2) usage();

        f = new LineReader!(BufferedFile)();

        f.open(args[1],FileMode.In);
        foreach(char[] line; f)
        {
            writefln("READ[",line,"]");
        }
        f.close();

        f.open(args[1],FileMode.In);
        foreach(size_t lineno, char[] line; f)
        {
            writefln("READ[",lineno,"][",line,"]");
        }
        f.close();

        return 0;
    }

    void usage()
    {
        writefln("USAGE: test29 <file>");
        writefln("");
        exit(1);
    }

Regan
Mar 27 2005
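The same two-overload `opApply` idea can be sketched for a current compiler. This is an adaptation, not Regan's code: it assumes present-day D (D2), swaps the template subclass of `BufferedFile` for a plain struct around `std.stdio.File`, and leans on `File.byLine` to find the end of the file, so no null test on the line is needed.

```d
import std.stdio : File, writefln;

/// Wraps a File so that foreach yields one line at a time.
/// Illustrative type; the name is reused from the post above.
struct LineReader
{
    File file;

    /// Supports: foreach (char[] line; reader)
    int opApply(scope int delegate(ref char[]) dg)
    {
        int result = 0;
        foreach (char[] line; file.byLine())
        {
            result = dg(line);  // run the loop body once
            if (result) break;  // body asked to stop (break/return)
        }
        return result;
    }

    /// Supports: foreach (size_t lineno, char[] line; reader)
    int opApply(scope int delegate(ref size_t, ref char[]) dg)
    {
        int result = 0;
        size_t lineno = 1;
        foreach (char[] line; file.byLine())
        {
            result = dg(lineno, line);
            if (result) break;
            ++lineno;
        }
        return result;
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writefln("USAGE: %s <file>", args[0]);
        return;
    }
    auto reader = LineReader(File(args[1], "r"));
    foreach (size_t lineno, char[] line; reader)
    {
        writefln("READ[%s][%s]", lineno, line);
    }
}
```

The loop variables are given explicit types here so that the choice between the two overloads does not rely on type inference.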
> void readfile(char[] fn)
> {
>     File f = new File();
>     char[] l;
>     f.open(fn);
>     while(!f.eof())
>     {
>         l = f.readLine();
>         printf("line: %.*s\n", l);
>     }
>     f.close();
> }

One tiny improvement would be to combine the new File() with the open(fn) into new File(fn).

>> That would be pretty much the ANSI C way... ieek :)... Is there any way to avoid the latter method? And go the nicer D way, as in the first code example?
> Try this...
> class LineReader(Source) : Source {

[snip]

That's pretty nice. Maybe opApply iterating over lines should be built into Stream. That would resemble the standard Perl style of reading a file line-by-line. I'll poke around with that. It should be pretty easy and it would make line processing with streams much easier.
Mar 27 2005
On Sun, 27 Mar 2005 22:52:52 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:
>>> That would be pretty much the ANSI C way... ieek :)... Is there any way to avoid the latter method? And go the nicer D way, as in the first code example?
>> Try this...
>> class LineReader(Source) : Source {
> [snip] That's pretty nice.

:)

> Maybe opApply iterating over lines should be built into Stream.

That would be nice.

> That would resemble the standard Perl style of reading a file line-by-line. I'll poke around with that. It should be pretty easy and it would make line processing with stream much easier to use.

Agreed.

Regan
Mar 27 2005
Trying to understand what you did, here. There seem to be several concepts I am still missing...

> import std.c.stdlib;
> import std.stream;
> import std.stdio;
> class LineReader(Source) : Source

You seem to be "shadowing" some parent class called Source?

> {
>     int opApply(int delegate(inout char[]) dg)

Alas I still have no idea what "delegate" does, and why it needs to be used?

>     {
>         int result = 0;
>         char[] line;
>         while(!eof())
>         {
>             line = readLine();

How come readLine() knows of the stream?

>             if (!line) break;

"if line == null" then break... no idea what this is good for.

>             result = dg(line);
>             if (result) break;

Don't understand these lines either.

Can it be that you are filling up a "buffer" with all the lines of the stream, until you reach an empty line, to let foreach then scan that "buffer" like it does for any other array? If so that could possibly use up a lot of RAM?!

>         }
>         return result;
>     }
>     int opApply(int delegate(inout size_t, inout char[]) dg)
>     {
>         int result = 0;
>         size_t lineno;

Why did you use size_t for lineno, would int not also work? (I tested this and it works fine to replace all size_t with int.)

>         char[] line;
>         for(lineno = 1; !eof(); lineno++)
>         {
>             line = readLine();
>             if (!line) break;
>             result = dg(lineno,line);
>             if (result) break;
>         }
>         return result;
>     }
> }

AFAICT you defined 2 "structures" that will let the user use foreach on "f.open" streams. One version that will "just" read lines, another that will also let you retrieve the line numbers.

> int main(char[][] args)
> {
>     LineReader!(BufferedFile) f;
>     f = new LineReader!(BufferedFile)();

Can be reduced to:

    LineReader!(BufferedFile) f = new LineReader!(BufferedFile)();

making the equivalence to

    File f = new File();

more obvious. IOW you seem to have defined a new stream?

>     if (args.length < 2) usage();
>     f.open(args[1],FileMode.In);
>     foreach(char[] line; f)

Is this default behavior? I.e. that foreach can parse streams? AFAICT this is the new speciality of your stream, right? Very nice.

>     {
>         writefln("READ[",line,"]");
>     }
>     f.close();
>     f.open(args[1],FileMode.In);
>     foreach(size_t lineno, char[] line; f)

Neat.

>     {
>         writefln("READ[",lineno,"][",line,"]");
>     }
>     f.close();
>     return 0;
> }

I noted when testing this code that it will only read the lines of a stream until an empty line is encountered. Is this indeed intended?

AEon
Mar 28 2005
"AEon" <aeon2001 lycos.de> wrote in message news:d290lj$1ukr$1 digitaldaemon.com...Trying to understand what you did, here. There seem to be several concepts I am still missing...The class is templatized. It is a way of subclassing any stream subclass. I think it would also work to do class LineReader(Source : Stream) : Source to force the class Source to be a Stream or Stream subclass.import std.c.stdlib; import std.stream; import std.stdio; class LineReader(Source) : SourceYou seem to be "shadowing" some parent class called Source?opApply is used to implement 'foreach' in classes. See http://www.digitalmars.com/d/statement.html#foreach Also for info about delegate see http://www.digitalmars.com/d/function.html{ int opApply(int delegate(inout char[]) dg)Alas I still have no idea what "delegate" does, and why it needs to be used?It subclasses Stream.{ int result = 0; char[] line; while(!eof()) { line = readLine();How come readLine() knows of the stream?I think this isn't needed. I think it probably is why blank lines stop the foreach.if (!line) break;"if line == null" then break... no idea what this is good for.This is part of the foreach magic.result = dg(line); if (result) break;Don't understand these lines either.Can it be that you are filling up a "buffer" with all the lines of the stream, until you reach an empty line, to let foreach then scan that "buffer" like it does for any other array? If so that could possibly use up a lot of RAM?!on 32 bit machine size_t is uint. On 64 bit it is ulong.} return result; } int opApply(int delegate(inout size_t, inout char[]) dg) { int result = 0; size_t lineno;Why did you use size_t for lineno, would int now also work? (I tested this and it works fine to replace all size_t with int).
Mar 28 2005
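A small sketch of the constrained form suggested above, `class LineReader(Source : Stream) : Source`, may make the "templatized subclass" idea concrete. It assumes present-day D (D2); `Base`, `Derived` and `Shouting` are hypothetical stand-ins for `Stream`, a stream subclass and `LineReader`.

```d
import std.stdio : writeln;

class Base
{
    string name() { return "base"; }
}

class Derived : Base
{
    override string name() { return "derived"; }
}

/// Inherits from whichever class is passed as Source.
/// The ": Base" specialization rejects anything that is not
/// Base or a subclass of Base.
class Shouting(Source : Base) : Source
{
    string shout() { return name() ~ "!"; } // name() comes from Source
}

void main()
{
    auto a = new Shouting!(Base)();
    auto b = new Shouting!(Derived)();
    writeln(a.shout()); // base!
    writeln(b.shout()); // derived!
    // Shouting!(Object) is rejected at compile time: Object is not a Base.
}
```

The added methods call straight into the parent class, which is also why `readLine()` in the original post needs no explicit stream argument.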
On Mon, 28 Mar 2005 13:43:08 -0500, Ben Hinkle <bhinkle mathworks.com> wrote:
> "AEon" <aeon2001 lycos.de> wrote in message news:d290lj$1ukr$1 digitaldaemon.com...
>>> class LineReader(Source) : Source
>> You seem to be "shadowing" some parent class called Source?
> The class is templatized. It is a way of subclassing any stream subclass. I think it would also work to do class LineReader(Source : Stream) : Source to force the class Source to be a Stream or Stream subclass.

Good point, that is probably more correct.

>>> if (!line) break;
>> "if line == null" then break... no idea what this is good for.
> I think this isn't needed. I think it probably is why blank lines stop the foreach.

I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.

Regan
Mar 28 2005
On Tue, 29 Mar 2005 10:39:49 +1200, Regan Heath wrote:

[snip]

> I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.

I've mentioned this before. D can not guarantee that a coder will always be able to distinguish between an empty line and an uninitialized line. I believe the two are distinct and useful idioms, and I know that it is theoretically possible, but sometimes when you pass a "", it gets received as null; however not in all situations. :-(

--
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build v1.16 released
29/03/2005 9:24:10 AM
Mar 28 2005
>>>> if (!line) break;
>>> "if line == null" then break... no idea what this is good for.
>> I think this isn't needed. I think it probably is why blank lines stop the foreach.
> I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.

IMO the right way to check if a string is empty is asking if the length is 0. Setting an array's length to 0 automatically sets the ptr to null. So relying on any specific behavior of the ptr of a 0 length array is dangerous at best (since it would rely on always slicing to resize). For example the statement

    str.length = str.length;

does nothing if length > 0 and sets the ptr to null if length == 0. One can argue about D's behavior about nulling the ptr but that's the current situation. Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.
Mar 28 2005
On Mon, 28 Mar 2005 19:05:39 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:
>>>> if (!line) break;
>>> I think this isn't needed. I think it probably is why blank lines stop the foreach.
>> I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.
> IMO the right way to check if a string is empty is asking if the length is 0.

No. You cannot tell empty from null with length, eg.

    char[] isnull = null;
    char[] isempty = "";
    assert(isnull.length == 0);
    assert(isempty.length == 0);

compile, run, no asserts.

> Setting an array's length to 0 automatically sets the ptr to null. So relying on any specific behavior of the ptr of a 0 length array is dangerous at best (since it would rely on always slicing to resize).

I agree. I currently use "is" or "===" to tell them apart. eg.

    char[] isnull = null;
    char[] isempty = "";
    assert(isnull === null);
    assert(isempty !== null);

I, at first, suspected the behaviour above to be a side effect of D's behaviour of appending \0 to hard-coded/static strings (thus ptr cannot be null for ""). If this behaviour were removed ptr would have 'nothing' to point at. However...

    char[] isempty;
    char[] test;
    test.length = 3;
    test[0] = 'a';
    test[1] = 'b';
    test[2] = 'c';
    isempty = test[0..0];
    assert(isempty.length == 0);
    assert(isempty !== null);

it appears not, but, as you mention:

> For example the statement str.length = str.length; does nothing if length > 0 and sets the ptr to null if length == 0.

    isempty.length = isempty.length;
    assert(isempty.length == 0);
    assert(isempty !== null);

asserts on the 2nd assert statement as it has set the ptr to null.

> One can argue about D's behavior about nulling the ptr but that's the current situation.

Indeed. Setting length to 0 should IMO create an empty string, not un-assign or free the string. Setting the reference to null should un-assign or free the string.

To be honest I don't really care what it does *so long as* I can tell an empty string (array assigned to something with length 0) apart from one that does not exist (unassigned array, init to null).

The simple fact of the matter being that in some situations these two things need to be treated differently.

In some cases an AA and the "in" operator can be used as a workaround, as "in" checks for existence. I didn't think of this idea immediately (someone else suggested it). It would be nice if the functionality was more immediately apparent.

To clarify, I don't want to make it harder to treat them the same, which you can currently do with "if (length == 0)". I just want a guaranteed method of telling them apart.

> Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.

If the array ptr is null the result will be null, right? I don't see a problem with this.

Regan
Mar 28 2005
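The null-versus-empty checks in this post can be restated for a current compiler. This is a sketch assuming present-day D (D2), where `is null` / `!is null` replace the thread's `===` / `!==`. It covers only the literal and slice cases; it makes no claim about what `str.length = str.length` does on a current compiler.

```d
void main()
{
    string isNull = null;
    string isEmpty = "";

    // By length and by value the two are indistinguishable.
    assert(isNull.length == 0);
    assert(isEmpty.length == 0);
    assert(isNull == isEmpty);

    // Identity tells them apart: the "" literal has a non-null pointer.
    assert(isNull is null);
    assert(isEmpty !is null);

    // A zero-length slice of a live array keeps that array's pointer.
    char[] test = "abc".dup;
    char[] slice = test[0 .. 0];
    assert(slice.length == 0);
    assert(slice !is null);
    assert(slice.ptr is test.ptr);
}
```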
"Regan Heath" <regan netwin.co.nz> wrote in message news:opsodiv9b023k2f5 nrage.netwin.co.nz...On Mon, 28 Mar 2005 19:05:39 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:uhh - I think we have different definition of the word "empty". I take it you define empty to be non-null ptr and 0 length, correct? I take empty to mean anything that compares as equal to "". In D length==0 is equivalent to =="": str.length == 0 iff str == "" That is why I consider testing length to be the simplest/fastest way to test for "empty". For example int main() { char[] x; x = new char[5]; assert(x != ""); assert(x.length != 0); x = x[0..0]; assert(x == ""); assert(x.length == 0); char[] y = ""; assert(y == ""); assert(y.length == 0); char[] z = null; assert(y == ""); assert(y.length == 0); return 0; }No. You cannot tell empty from null with length, eg. char[] isnull = null; char[] isempty = ""; assert(isnull.length == 0); assert(isempty.length == 0); compile, run, no asserts.IMO the right way to check if a string is empty is asking if the length is 0.I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.I think this isn't needed. I think it probably is why blank lines stop the foreach.if (!line) break;"if line == null" then break... no idea what this is good for.It is also true that char[] isempty = ""; char[] isempty2 = test[0..0]; assert( isempty !== isempty2);Setting an array's length to 0 automatically sets the ptr to null. So relying on any specific behavior of the ptr of a 0 length array is dangerous at best (since it would rely on always slicing to resize).I agree. I currently use "is" or "===" to tell them apart. eg. char[] isnull = null; char[] isempty = ""; assert(isnull === null); assert(isempty !== null); I, at first, suspected the behaviour above to be a side effect of D's behaviour of appending \0 to hard-coded/static strings (thus ptr cannot be null for ""). 
If this behaviour were removed ptr would have 'nothing' to point at. However... char[] isempty; char[] test; test.length = 3; test[0] = 'a'; test[1] = 'b'; test[2] = 'c'; isempty = test[0..0]; assert(isempty.length == 0); assert(isempty !== null); it appears not, but, as you mention:ah - here I can see what empty means to you. It is true our definitions of "empty" differ.For example the statement str.length = str.length; does nothing if length > 0 and sets the ptr to null if length == 0.isempty.length = isempty.length; assert(isempty.length == 0); assert(isempty !== null); asserts on the 2nd assert statement as it has set the ptr to null.One can argue about D's behavior about nulling the ptr but that's the current situation.Indeed. Setting length to 0, should IMO create an empty string, not un-assign or free the string. Setting the reference to null should un-assign or free the string. To be honest I don't really care what it does *so long as* I can tell an empty string (array assigned to something with length 0) apart from one that does not exist (unassigned array, init to null).The simple fact of the matter being that in some situations these two things need to be treated differently.That's what "is" and !== are for. But those are rare occasions I would bet.In some cases an AA and the "in" operator can be used as a workaround, as "in" checks for existance. I didn't think of this idea immediately (someone else suggested it). It would be nice if the functionality was more immediately apparent. To clarify I don't want to make it harder to treat them the same, which you can currently do with "if (length == 0)" I just want a guaranteed method of telling them apart.I was suggesting making it illegal so that casually testing !line would be illegal. 
Instead it would have to be !line.ptr which makes it more obvious what is actually being tested (ie - the length is ignored and just the ptr is checked) By the way, when would you like readLine to return a null string as opposed to an non-null-zero-length string?Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.If the array ptr is null the result will be null, right? I don't see a problem with this.
Mar 28 2005
On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:
> I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing."

In my mind something is "empty" if it:
a. contains nothing.
b. exists.

It cannot be "empty" if it contains something.
It cannot be "empty" if it does not exist.

So, my first question. How do I represent "non-existent" in D?

Some abstract ideas/thoughts. A pointer/reference/handle/whatever is a construct which we use to access some data. This construct IMO needs the ability to (1) indicate the (non)existence of the data and (2) give us access to the data.

In C I would use a pointer, eg.

    char *ptr = NULL;
    ptr = NULL; //no value exists
    ptr = "";   //value exists, it is empty.

The humble pointer can indicate that no data exists by pointing at NULL (which is defined to be an invalid address for data). The pointer can indicate the existing data by pointing at its address. The data it points to may be empty if it "contains nothing" (what that means depends on the data itself).

D's char[] is a reference, not a pointer. A reference should be able to represent 1 & 2 above, but its implementation in D blurs the distinction between "non-existent" and "existing but empty" due to its relationship with null and its behaviour when setting length to 0.

In short:
- A char[] should not go from "empty" to "non-existent" without being explicitly assigned to "non-existent" (AKA null).
- "empty" (AKA "") should not compare equal to "non-existent" (AKA null).

It appears to me that the only reliable way in D to indicate "non-existent" is to throw an exception. Perhaps this is acceptable, perhaps it's the D way and I simply have to get used to it.

<snip>

>>> Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.
>> If the array ptr is null the result will be null, right? I don't see a problem with this.
> I was suggesting making it illegal so that casually testing !line would be illegal. Instead it would have to be !line.ptr which makes it more obvious what is actually being tested (ie - the length is ignored and just the ptr is checked)

I don't think this is necessary.

> By the way, when would you like readLine to return a null string as opposed to an non-null-zero-length string?

At the end of file.

readLine() - null means no lines "exist".
readLine() - "" means a line "exists" but is "empty" of chars.

Regan
Mar 29 2005
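The contract proposed here (null at end of input, a non-null zero-length string for a blank line) can be sketched over an in-memory buffer. This is an illustration only, assuming present-day D (D2): `BufferLines` is a hypothetical type, not the std.stream `readLine`, and it treats only '\n' as a line terminator.

```d
/// Hypothetical line reader over an in-memory buffer.
struct BufferLines
{
    string data;
    size_t pos;

    /// Returns null once the input is exhausted. A blank line comes back
    /// as a zero-length slice of `data`, so its pointer is not null.
    string readLine()
    {
        if (pos >= data.length) return null;  // no more lines "exist"

        size_t start = pos;
        while (pos < data.length && data[pos] != '\n') ++pos;

        string line = data[start .. pos];     // may be empty, never null
        if (pos < data.length) ++pos;         // step over the '\n'
        return line;
    }
}

void main()
{
    auto r = BufferLines("first\n\nthird\n");
    size_t lines, blanks;

    // The only ending condition is readLine() returning null.
    for (string l = r.readLine(); l !is null; l = r.readLine())
    {
        ++lines;
        if (l.length == 0) ++blanks;
    }

    assert(lines == 3);
    assert(blanks == 1);
}
```

The loop relies on the identity test `l !is null`; a length test would stop at the blank line, which is the behaviour reported earlier in the thread.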
On Tue, 29 Mar 2005 22:47:53 +1200, Regan Heath wrote:
> On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:
>> I take it you define empty to be non-null ptr and 0 length, correct?
> "empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist.
> [snip]
> readLine() - null means no lines "exist". readLine() - "" means a line "exists" but is "emtpy" of chars.

All of this is well said and presented. I'm in total agreement with this point of view. An empty string is a string that is empty.

--
Derek Parnell
Melbourne, Australia
29/03/2005 9:03:46 PM
Mar 29 2005
"Regan Heath" <regan netwin.co.nz> wrote in message news:opsoeax3jt23k2f5 nrage.netwin.co.nz...On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:What you describe is ok with me but I don't think it maps well to D's arrays. To me I don't really look at existance or non-existance but instead the following two rules 1) all arrays have a well-defined length 2) arrays with non-zero length have a well-defined pointer One can tread carefully to preserve pointers with 0 length arrays but it takes effort.I take it you define empty to be non-null ptr and 0 length, correct?"empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist. So, my first question. How do I represent "non existant" in D?The foreach will stop automatically at eof. It's like a foreach stopping at the end of an array when it has no more elements. It doesn't run once more with null - it just stops.By the way, when would you like readLine to return a null string as opposed to an non-null-zero-length string?At the end of file. readLine() - null means no lines "exist". readLine() - "" means a line "exists" but is "emtpy" of chars.
Mar 29 2005
On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:
> "Regan Heath" <regan netwin.co.nz> wrote in message news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
>> "empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. [snip] So, my first question. How do I represent "non existant" in D?
> What you describe is ok with me but I don't think it maps well to D's arrays.

Exactly my point. It would only take a few small changes to "fix" the problem as I see it.

> To me I don't really look at existance or non-existance but instead the following two rules 1) all arrays have a well-defined length 2) arrays with non-zero length have a well-defined pointer One can tread carefully to preserve pointers with 0 length arrays but it takes effort.

Indeed. So, how do you handle existence/non-existence?

>>> By the way, when would you like readLine to return a null string as opposed to an non-null-zero-length string?
>> At the end of file. readLine() - null means no lines "exist". readLine() - "" means a line "exists" but is "emtpy" of chars.
> The foreach will stop automatically at eof. It's like a foreach stopping at the end of an array when it has no more elements. It doesn't run once more with null - it just stops.

Which foreach? My one? Assume now that I remove the eof() check. What happens now?

Regan
Mar 29 2005
"Regan Heath" <regan netwin.co.nz> wrote in message news:opsoe3wkh323k2f5 nrage.netwin.co.nz...On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:Java arrays have the semantics you describe. They distinguish between null/empty/non-empty and none compare as equal to the others. In fact even trying to compare a null array throws an exception much like trying to call opEquals on a null object reference throws an exception. It's a very reasonable thing to do. The main trouble with Java array semantics is that APIs wind up choosing between null and empty fairly randomly and so many Java array bugs are introduced by guessing some function returns "empty" when it in fact returns null. It's easier to focus instead on only distinguishing empty/non-empty, which is what D does. One can think up APIs where having a third, null, choice would be useful but almost all the time the practical uses of an array are covered by empty/non-empty."Regan Heath" <regan netwin.co.nz> wrote in message news:opsoeax3jt23k2f5 nrage.netwin.co.nz...Exactly my point. It would only take a few small changes to "fix" the problem as I see it.On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:What you describe is ok with me but I don't think it maps well to D's arrays.I take it you define empty to be non-null ptr and 0 length, correct?"empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist. So, my first question. How do I represent "non existant" in D?It would iterate forever just like any loop that doesn't have an ending condition.The foreach will stop automatically at eof. It's like a foreach stopping at the end of an array when it has no more elements. It doesn't run once more with null - it just stops.Which foreach? My one? Assume now that I remove the eof() check. What happens now?
Mar 29 2005
On Tue, 29 Mar 2005 19:17:55 -0500, Ben Hinkle <ben.hinkle gmail.com> wrote:
> Java arrays have the semantics you describe. They distinguish between null/empty/non-empty and none compare as equal to the others. In fact even trying to compare a null array throws an exception much like trying to call opEquals on a null object reference throws an exception. It's a very reasonable thing to do.

Ok.

> The main trouble with Java array semantics is that APIs wind up choosing between null and empty fairly randomly and so many Java array bugs are introduced by guessing some function returns "empty" when it in fact returns null.

I can see how, if the situation does not call for a distinction between "exists but is empty" and "does not exist", the programmer may choose either "" or null to indicate no value. The choice will likely be based on their personal preference and/or "fear of null" (a phenomenon I have encountered before).

I don't see this possibility as being a good reason to limit flexibility in this way.

> It's easier to focus instead on only distinguishing empty/non-empty, which is what D does.

You mean, limit flexibility for the sake of simplicity. I don't like it.

> One can think up APIs where having a third, null, choice would be useful but almost all the time the practical uses of an array are covered by empty/non-empty.

I think it depends on style and the sort of code you write as to whether the situations where a null choice is "required"* are common or not. Personally I come across them often. I also believe that some people just don't see the need for a distinction, i.e. the current readLine implementation.

*(required is perhaps the wrong word; you can probably work around most situations, but the workaround generally is just that, and sub-optimal)

>> Which foreach? My one? Assume now that I remove the eof() check. What happens now?
> It would iterate forever

Not if readLine were implemented the way I assumed it would have been.

> just like any loop that doesn't have an ending condition.

Bollocks. :)
The ending condition is readLine() returning null (indicating no more lines "exist").

Regan
Mar 29 2005
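[Editor's sketch] The null/empty distinction argued over above can be tried out directly. This is a minimal sketch that assumes a current D2 compiler (so `string` rather than the 2005-era `char[]` literals used in the thread); the helper name `isMissing` is hypothetical and not part of Phobos. It shows that `==` compares contents only, while `is null` also looks at the pointer.

```d
import std.stdio;

/// Hypothetical helper: true only when no array exists at all (null ptr),
/// as opposed to "exists but is empty".
bool isMissing(const(char)[] s)
{
    return s is null;
}

void main()
{
    string missing = null; // no array at all: ptr is null, length is 0
    string empty = "";     // a literal: ptr is non-null, length is 0

    writeln("lengths equal:    ", missing.length == empty.length);
    writeln("contents equal:   ", missing == empty); // == ignores the ptr
    writeln("missing is null:  ", isMissing(missing));
    writeln("empty is null:    ", isMissing(empty));
}
```

With length or `==` alone the two cases cannot be told apart, which is the point Regan makes; the identity test is the only one of the three that separates them.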
On Mon, 28 Mar 2005 15:25:57 +0200, AEon <aeon2001 lycos.de> wrote:
> Trying to understand what you did, here. There seem to be several concepts I am still missing...

Ben has done a fairly good job of explaining it. I'll have a go too, the combination of our efforts will hopefully explain "everything". :)

>> import std.c.stdlib;
>> import std.stream;
>> import std.stdio;
>>
>> class LineReader(Source) : Source
>
> You seem to be "shadowing" some parent class called Source?

This technique is called a "Snap-On". I am creating a new template class "LineReader" which is a child class of an unspecified (at this stage) class. Later when I say: "LineReader!(BufferedFile) f;" it specifies that "Source" is "BufferedFile".

>> {
>>     int opApply(int delegate(inout char[]) dg)
>
> Alas I still have no idea what "delegate" does, and why it needs to be used?

A delegate is like a function pointer, except that a delegate points to a (non-static) class member function. So calling it is like calling a class member on a class. In this case the delegate is part of the "magic" that makes foreach work on a custom class like LineReader.

>>     {
>>         int result = 0;
>>         char[] line;
>>
>>         while(!eof()) {
>>             line = readLine();
>
> How come readLine() knows of the stream?

Because LineReader is a child class of BufferedFile, which is a stream. The readLine call above calls the readLine of the parent class BufferedFile.

>>             if (!line) break;
>
> "if line == null" then break... no idea what this is good for.

I was trying to stop at the end of the file, it appears this stops on blank lines. IMO readLine is broken, it is returning null for a blank line, it should return "".

The difference between null and "" in the case of char[] is that null has a null .ptr and "is null" is true, so...

    if (!line.ptr) break;
    if (line is null) break;

statements should only fire when line is null and not "". But it appears readLine does not differentiate between null and "".

>>             result = dg(line);
>>             if (result) break;
>
> Don't understand these lines either.

As Ben said, it's part of the foreach "magic", his links should explain it. If not, let us know how the docs are deficient and hopefully someone can improve them.

> Can it be that you are filling up a "buffer" with all the lines of the stream, until you reach an empty line, to let foreach then scan that "buffer" like it does for any other array? If so that could possibly use up a lot of RAM?!

No. I am reading one line at a time. When I call the delegate I am effectively executing the body of the foreach statement with the line I pass. Then I discard the line and read the next one. So only 1 line is in memory at a time.

>>         }
>>         return result;
>>     }
>>
>>     int opApply(int delegate(inout size_t, inout char[]) dg)
>>     {
>>         int result = 0;
>>         size_t lineno;
>
> Why did you use size_t for lineno, would int now also work? (I tested this and it works fine to replace all size_t with int).

As Ben mentioned, size_t is either a 32 or 64 bit type depending on the underlying OS/processor. I believe the idea is that using it chooses the most "sensible" type for holding "size" values on the current OS/processor.

>>         char[] line;
>>
>>         for(lineno = 1; !eof(); lineno++) {
>>             line = readLine();
>>             if (!line) break;
>>             result = dg(lineno,line);
>>             if (result) break;
>>         }
>>         return result;
>>     }
>> }
>
> AFAICT you defined 2 "structures" that will let the user use foreach on "f.open" streams. One version that will "just" read lines another that will also let you retrieve the line numbers as well.

Not 2 structures in the sense of D structs but 2 methods allowing foreach on my new class LineReader, which extends BufferedFile (by adding the foreach ability).

>> int main(char[][] args)
>> {
>>     LineReader!(BufferedFile) f;
>>     f = new LineReader!(BufferedFile)();
>
> Can be reduced to:
>
>     LineReader!(BufferedFile) f = new LineReader!(BufferedFile)();
>
> making the equivalent coding to
>
>     File f = new File();
>
> more obvious.

You could, I have chosen not to allocate the class till after my error checking, but then I could have moved "LineReader!(BufferedFile) f;" to after the error checking also.. I guess I'm used to C. :)

> IOW you seem to have defined a new stream?

Yes. I have extended/added foreach-ability to any Stream class.

>>     if (args.length < 2) usage();
>>     f.open(args[1],FileMode.In);
>>     foreach(char[] line; f)
>
> Is this default behavior? I.e. that foreach can parse streams? AFAICT this is the new speciality of your stream, right? Very nice.

It's a new speciality of my stream. I think we should add it to Streams though. In addition we could add foreach(char c; f) {} to read characters one at a time.

>>     {
>>         writefln("READ[",line,"]");
>>     }
>>     f.close();
>>     f.open(args[1],FileMode.In);
>>     foreach(size_t lineno, char[] line; f)
>
> Neat.
>
>>     {
>>         writefln("READ[",lineno,"][",line,"]");
>>     }
>>     f.close();
>>     return 0;
>> }
>
> I noted when testing this code, that it will only read the lines of a stream until an empty line is encountered. Is this indeed intended?

No it was not intended. IMO readLine is broken.

Regan
Mar 28 2005
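[Editor's sketch] For readers on a current compiler, here is roughly how the same one-line-at-a-time opApply idea might look today. This is not Regan's code: it assumes D2 and `std.stdio.File` (the `std.stream` module used in the thread is no longer part of Phobos), and the struct name `LineReader` is simply reused for illustration. `File.readln` returns 0 only at end of file, so a blank line (which still yields its newline) does not end the loop.

```d
import std.stdio;
import std.string : chomp;

/// Illustrative wrapper that makes a File usable in foreach, line by line.
struct LineReader
{
    File file;

    /// foreach (char[] line; reader): dg is the body of the foreach.
    int opApply(scope int delegate(ref char[]) dg)
    {
        char[] buf;
        // readln reuses buf, so only one line is held in memory at a time.
        while (file.readln(buf) > 0)
        {
            char[] line = chomp(buf); // strip the trailing line terminator
            if (int result = dg(line))
                return result;        // non-zero: the body executed break/return
        }
        return 0;
    }

    /// foreach (size_t lineno, char[] line; reader): same, with 1-based line numbers.
    int opApply(scope int delegate(ref size_t, ref char[]) dg)
    {
        char[] buf;
        for (size_t lineno = 1; file.readln(buf) > 0; lineno++)
        {
            char[] line = chomp(buf);
            if (int result = dg(lineno, line))
                return result;
        }
        return 0;
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <file>");
        return;
    }
    auto reader = LineReader(File(args[1], "r"));
    foreach (size_t lineno, char[] line; reader)
        writeln("READ[", lineno, "][", line, "]");
}
```

Because the buffer is reused, a loop body that wants to keep a line beyond the current iteration has to copy it (for example with `.idup`).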
Regan Heath wrote (Ben read your feedback as well thanx):
>>> {
>>>     int result = 0;
>>>     char[] line;
>>>
>>>     while(!eof()) {
>>>         line = readLine();
>>
>> How come readLine() knows of the stream?
>
> Because LineReader is a child class of BufferedFile, which is a stream. The readLine call above calls the readLine of the parent class BufferedFile.

Ah... that is one of the things I really hate as an OOP beginner. It is very difficult to check where the heck certain "behavior" comes from. If the programmer is indeed fully aware of the parent classes, that may be clearer, but when I only see the "new" code, I find it very confusing. I am not even sure one *can* look up the original definition of the parent classes?

>>>         result = dg(line);
>>>         if (result) break;
>>
>> Don't understand these lines either.
>
> As Ben said, it's part of the foreach "magic", his links should explain it. If not, let us know how the docs are deficient and hopefully someone can improve them.

Reminds me that I don't actually understand D, and that I only use certain code snippets all over the place so far. :)

>> Why did you use size_t for lineno, would int now also work? (I tested this and it works fine to replace all size_t with int).
>
> As Ben mentioned, size_t is either a 32 or 64 bit type depending on the underlying OS/processor. I believe the idea is that using it chooses the most "sensible" type for holding "size" values on the current OS/processor.

Aha... IIRC there was something like that in ANSI C as well... I never trusted it ;)... so size_t is something like a special optimization case. I.e. when do you decide to use good old int, and when do you feel size_t would be a better choice?

>> IOW you seem to have defined a new stream?
>
> Yes. I have extended/added foreach-ability to any Stream class.

Neat indeed. BTW, I decided to go the simple way:

    File lg = open_Read_Log( glb.log );
    File mg = open_Write_Mlg( metafile );

    File open_Read_Log( char[] logfile )
    {
        char[13] warn = "open_Read_Log";

        if( ! std.file.exists(logfile) )
        {
            Err(warn, "Can't open *read* your log file... '"~logfile~"'",
                "Ensure log file exists and double check path!");
            exit(1);
        }

        // Define/create "handle" for logfile READ
        File lg = new File( logfile, FileMode.In );
        // If logfile open error: "Error: file '...' not found"

        return lg;
    }

    etc.

What surprised me in open_Read_Log(), when comparing it to my ANSI C code:

    if (fgets(line, M2AXCHR, link)==NULL){
        if(ferror(link)!=0){
            puts("Error during log read...");
            exit(1);
        }
        clearerr(link);
        break;}

You can check for file existence. But you do not seem to be able to handle "new File( logfile, FileMode.In )" errors... i.e. if something happens, D will exit with an internal Error message. Presumably one could "catch" such errors to provide own error messages?

Same seems to be the case with

    while( ! lg.eof() )
    {
        line = lg.readLine();
    }

Should a readLine() error occur, then D throws an internal Error message.

I am not sure I *really* want to catch errors, should this be possible in the above 2 cases. But maybe that could be useful?

AEon
Mar 28 2005
On Tue, 29 Mar 2005 05:35:07 +0200, AEon <aeon2001 lycos.de> wrote:
> Regan Heath wrote (Ben read your feedback as well thanx):
>>>> {
>>>>     int result = 0;
>>>>     char[] line;
>>>>
>>>>     while(!eof()) {
>>>>         line = readLine();
>>>
>>> How come readLine() knows of the stream?
>>
>> Because LineReader is a child class of BufferedFile, which is a stream. The readLine call above calls the readLine of the parent class BufferedFile.
>
> Ah... that is one of the things I really hate as an OOP beginner. It is very difficult to check where the heck certain "behavior" comes from. If the programmer is indeed fully aware of the parent classes, that may be clearer, but when I only see the "new" code, I find it very confusing. I am not even sure one *can* look up the original definition of the parent classes?

In this case you can look in dmd\src\phobos\std\stream.d for the class definition of BufferedFile.

You may be interested in an old thread on method name resolution:
http://www.digitalmars.com/d/archives/digitalmars/D/6928.html

It's kinda involved but relevant to your comments above as the method name resolution affects the behaviour of a derived class. The idea being D's method name resolution makes it simpler/explicit WRT the behaviour of classes with overloaded methods.

>>>>         result = dg(line);
>>>>         if (result) break;
>>>
>>> Don't understand these lines either.
>>
>> As Ben said, it's part of the foreach "magic", his links should explain it. If not, let us know how the docs are deficient and hopefully someone can improve them.
>
> Reminds me that I don't actually understand D, and that I only use certain code snippets all over the place so far. :)

I wouldn't worry overmuch. I still find it hard to remember how to code things like opApply, I copy/paste from the docs and then modify each time I do it.

>>> Why did you use size_t for lineno, would int now also work? (I tested this and it works fine to replace all size_t with int).
>>
>> As Ben mentioned, size_t is either a 32 or 64 bit type depending on the underlying OS/processor. I believe the idea is that using it chooses the most "sensible" type for holding "size" values on the current OS/processor.
>
> Aha... IIRC there was something like that in ANSI C as well... I never trusted it ;)... so size_t is something like a special optimization case. I.e. when do you decide to use good old int, and when do you feel size_t would be a better choice?

Good question. I would use 'int' when the size of the type is important, i.e. I need 32 bits. I would use size_t when the size is unimportant, so long as it is "big enough".

> But you do not seem to be able to handle "new File( logfile, FileMode.In )" errors... i.e. if something happens, D will exit with an internal Error message. Presumably one could "catch" such errors to provide own error messages?

Yes.

    try {
        File f = new File(logfile, FileMode.In);
    } catch (OpenException e) {
        writefln("OPEN ERROR - ",e);
    }

> Same seems to be the case with
>
>     while( ! lg.eof() )
>     {
>         line = lg.readLine();
>     }
>
> Should a readLine() error occur, then D throws an internal Error message.

    try {
        while( ! lg.eof() )
        {
            line = lg.readLine();
        }
    } catch (ReadException e) {
        writefln("READ ERROR - ",e);
    }

> I am not sure I *really* want to catch errors, should this be possible in the above 2 cases. But maybe that could be useful?

Exceptions are the recommended error handling mechanism for D. The argument/confusion centers around what is worthy of an exception and what is not.

For example IMO in the code above not being able to open a file is exceptional (you have assumed it exists by opening in FileMode.In), but, reaching the end of the file is not exceptional as it's guaranteed to happen eventually.

Uncaught exceptions are automatically handled by the default handler, for trivial applications allowing it to handle your exceptions (like the failure to open a file) might be exactly what you want. It's your choice.

Regan
Mar 29 2005
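[Editor's sketch] The open-failure handling discussed above, redone for a current compiler. This assumes D2 with `std.stdio.File`, whose constructor throws `std.exception.ErrnoException` when the open fails; the `OpenException` and `ReadException` classes in the thread belong to the old `std.stream` module. The function name `countLines` and the -1 return convention are hypothetical, chosen only to keep the example small.

```d
import std.exception : ErrnoException;
import std.stdio;

/// Returns the number of lines in the file, or -1 if it could not be opened.
long countLines(string path)
{
    try
    {
        auto f = File(path, "r"); // throws ErrnoException if the open fails
        long n = 0;
        foreach (line; f.byLine()) // reads one line at a time, not the whole file
            n++;
        return n;
    }
    catch (ErrnoException e)
    {
        // Our own message instead of the default handler's.
        stderr.writeln("OPEN ERROR - ", e.msg);
        return -1;
    }
}

void main(string[] args)
{
    if (args.length > 1)
        writeln(countLines(args[1]), " lines");
}
```

Only the open failure is caught here. Any other exception, such as a read error part-way through, is left to propagate to the default handler, which is the "it's your choice" trade-off Regan describes.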
Regan Heath wrote:
>> But you do not seem to be able to handle "new File( logfile, FileMode.In )" errors... i.e. if something happens, D will exit with an internal Error message. Presumably one could "catch" such errors to provide own error messages?
>
> Yes.
>
>     try {
>         File f = new File(logfile, FileMode.In);
>     } catch (OpenException e) {
>         writefln("OPEN ERROR - ",e);
>     }

Have read several examples by now. Is there a complete list of catch "keywords"? The D documentation mentions a few, but probably not all? e.g.

    catch (ArrayBoundsError)
    catch (Object o)
    catch (std.asserterror.AssertError ae)

>> Same seems to be the case with
>>
>>     while( ! lg.eof() )
>>     {
>>         line = lg.readLine();
>>     }
>>
>> Should a readLine() error occur, then D throws an internal Error message.
>
>     try {
>         while( ! lg.eof() )
>         {
>             line = lg.readLine();
>         }
>     } catch (ReadException e) {
>         writefln("READ ERROR - ",e);
>     }

Ahh... info like that could be helpful in the official docs.

>> I am not sure I *really* want to catch errors, should this be possible in the above 2 cases. But maybe that could be useful?
>
> Exceptions are the recommended error handling mechanism for D. The argument/confusion centers around what is worthy of an exception and what is not.
>
> For example IMO in the code above not being able to open a file is exceptional (you have assumed it exists by opening in FileMode.In), but, reaching the end of the file is not exceptional as it's guaranteed to happen eventually.
>
> Uncaught exceptions are automatically handled by the default handler, for trivial applications allowing it to handle your exceptions (like the failure to open a file) might be exactly what you want. It's your choice.

Well in the above examples it would basically just give me the chance to write out my own messages. But since these cases are serious, there is nothing much one could save.

AEon
Mar 29 2005
On Tue, 29 Mar 2005 19:33:30 +0200, AEon <aeon2001 lycos.de> wrote:
>>> But you do not seem to be able to handle "new File( logfile, FileMode.In )" errors... i.e. if something happens, D will exit with an internal Error message. Presumably one could "catch" such errors to provide own error messages?
>>
>> Yes.
>>
>>     try {
>>         File f = new File(logfile, FileMode.In);
>>     } catch (OpenException e) {
>>         writefln("OPEN ERROR - ",e);
>>     }
>
> Have read several examples by now. Is there a complete list of catch "keywords"? The D documentation mentions a few, but probably not all? e.g.
>
>     catch (ArrayBoundsError)
>     catch (Object o)
>     catch (std.asserterror.AssertError ae)

Each "catch keyword" is a class derived from the Exception or Error classes. They are defined in the modules that use them. I agree it would be nice to have a complete list. Eventually I can imagine a documentation generator listing all the exceptions that can be thrown by a function.

Regan
Mar 29 2005
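[Editor's sketch] To illustrate that a "catch keyword" is just a class name, here is a small example of a module defining its own exception and a caller catching first the specific class, then the general base. It assumes a current D2 compiler; `LogReadException` and `classify` are hypothetical names invented for this sketch and are not part of Phobos.

```d
import std.stdio;

/// A module-specific exception: an ordinary class derived from Exception.
class LogReadException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

/// Runs action and reports which catch clause, if any, handled it.
string classify(void delegate() action)
{
    try
    {
        action();
        return "ok";
    }
    catch (LogReadException e) // most specific class first
    {
        return "log: " ~ e.msg;
    }
    catch (Exception e)        // then the general base class
    {
        return "other: " ~ e.msg;
    }
}

void main()
{
    writeln(classify(() { throw new LogReadException("bad line"); }));
    writeln(classify(() { throw new Exception("something else"); }));
    writeln(classify(() { }));
}
```

The catch clauses are tried in order, so the derived class has to be listed before its base class for the specific handler to be reachable.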