digitalmars.D - Review of Jesse Phillips's CSV Parser

dsimcha (7/7) Oct 28 2011 Formal review of Jesse Phillips's CSV parser module begins today and

Vladimir Panteleev (10/12) Oct 28 2011 Documentation should have examples of basic usage at the top. It should ...

Jesse Phillips (8/19) Oct 28 2011 Sounds good. What kind of example should be first? Maybe something that ...

Vladimir Panteleev (21/46) Oct 28 2011 That's not a very good example, since it doesn't show anything about how...

Piotr Szturmaj (3/10) Oct 28 2011 Could csvText!Struct(...) be generalized to also support Tuples and

Jacob Carlborg (4/16) Oct 28 2011 Sounds more like a serialization library with a CSV archive type to me.

Piotr Szturmaj (22/37) Oct 28 2011 I suppose you misunderstood my intentions. Here's an example from the do...

deadalnix (2/41) Oct 28 2011 That would be great !

Jesse Phillips (3/16) Oct 28 2011 I restricted to struct since a class had to be new'd. I and then realize...

dsimcha (12/19) Oct 28 2011 I'll kick this off with my review of the documentation (I'll review the

Jesse Phillips (3/17) Oct 28 2011 I should revisit that. It will still need to be random access and possib...
Jesse Phillips (4/6) Oct 28 2011 Because I do not store anymore information then requested. I must keep p...

Jacob Carlborg (14/21) Oct 28 2011 If a header is specified, is it possible to iterate over the content as

Jesse Phillips (4/22) Oct 28 2011 If I implemented it this way then the entire record would be read in whe...

Jacob Carlborg (4/26) Oct 28 2011 That would be acceptable to me, don't know what how other would see it.

Ary Manzana (8/15) Oct 28 2011 I like the idea of reading a CSV into a struct, or treating all fields

Jesse Phillips (2/11) Oct 28 2011 If you are dealing with incorrect data you must deal with it. If you exp...

Ary Manzana (11/22) Oct 28 2011 You are right.

Jesse Phillips (4/14) Oct 28 2011 I'll look into that.

Walter Bright (30/37) Oct 28 2011 First off, mucho thanks to Jesse for writing this. It's an important mod...

bls (8/9) Oct 29 2011 Walter,

Jesse Phillips (8/19) Oct 29 2011 You have to start somewhere.

bls (7/16) Oct 29 2011 There is somebody working on std.lexer (a generic lexer) (as far as I

Jesse Phillips (5/7) Oct 29 2011 CSV is tabular data, so it would make sense that you may have a similar

Walter Bright (3/5) Oct 29 2011 I don't see a similarity.

Jesse Phillips (61/112) Oct 29 2011 Thank you all.

Jonathan M Davis (6/12) Oct 29 2011 CSV parser libraries exist for Java (IIRC Apache has one), but there def...

Jesse Phillips (5/19) Oct 29 2011 That is all I meant by it. A CSV parser does exist for every language,

Somedude (5/9) Nov 05 2011 Yes, especially since CSV files are still widely used and prove in many

Walter Bright (17/68) Oct 29 2011 I saw the criteria listed, but I didn't see how it was connected with it...

Jesse Phillips (9/9) Oct 30 2011 Thank you for the response, only one more follow-up question.

Steve Teale (10/14) Oct 30 2011 If it's any consolation, I'm suffering from similar questions to myself
Walter Bright (3/12) Oct 30 2011 I started using that convention in dmd, i.e. Objects is an array of Obje...

Eric Poggel (JoeCoder) (2/3) Oct 30 2011 As a D1 user planning to migrate, this would be very appreciated.

Vladimir Panteleev (10/12) Oct 31 2011 Checked the new docs today. If I'm reading them right, the top example

Jesse Phillips (15/28) Oct 31 2011 No, not that I know of. Should it be? Should I go with a more

Jonathan M Davis (4/31) Oct 31 2011 As I recall, part of the problem is that Excel's behavior with regards t...
Jacob Carlborg (6/33) Nov 02 2011 Excel + CSV == Pain in the ass

Don (8/46) Nov 11 2011 Far more horrible than that -- in an old Excel from around 2000 the user...

Jacob Carlborg (4/11) Nov 11 2011 Haha that's just insane.
Vladimir Panteleev (6/13) Nov 11 2011 Wow. Reminds me of this:

Vladimir Panteleev (12/26) Nov 02 2011 I guess such things are really subjective, but personally to me it does ...

Jesse Phillips (12/20) Nov 02 2011 I'll make it a pop culture reference... kidding.

Jesse Phillips (14/22) Nov 05 2011 I've done another update this covers:

Jesse Phillips (5/31) Nov 06 2011 And only one pending request, csvFormatter. I am still considering how I...

Robert Jacques (6/9) Nov 07 2011 My primary use of csv files is as a method of saving data to later graph...

Jesse Phillips (38/61) Nov 10 2011 I don't want to do anything with files, I'm thinking an output buffer

Jonathan M Davis (6/18) Nov 11 2011 Actually, I'd argue that Row would be better than Record, since it _is_ ...

Jesse Phillips (3/9) Nov 11 2011 It _is_ a record of data, just as much as it is a row in a table. The RF...

Jonathan M Davis (10/20) Nov 11 2011 Well, then you have a good argument for keeping it as Record. But spread...
Manu (4/30) Nov 11 2011 +1 for row... although I appreciate the RFC says record all over the pla...

Robert Jacques (4/41) Nov 11 2011 Modules, static and named imports are extremely useful tools for making ...

Jesse Phillips (11/81) Nov 15 2011 I like the point on std.all, but isn't std.csv.Rows and std.csv.Row abou...

Robert Jacques (4/40) Nov 15 2011 Yes and no. Yes, from the point of view of reading 'std.csv.Rows' in exi...

Jesse Phillips (7/9) Nov 15 2011 No, thank you. It is good to have someone fighting a different opinion.

dsimcha (4/11) Nov 06 2011 I apologize. I just noticed that I forgot to put the end date in this
Jonathan M Davis (66/75) Nov 10 2011 - Nitpick: You should probably proofread the module documentation again....

Jesse Phillips (35/116) Nov 10 2011 Err, I'll try, use some of the great tips you gave me :)

dsimcha <dsimcha yahoo.com> writes:

Formal review of Jesse Phillips's CSV parser module begins today and 
runs through November .  Following that, a vote will take place for one 
week.  Please post any comments about this library in this thread.

Docs:
http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

Code:
https://github.com/he-the-great/phobos/tree/csv

Oct 28 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

Documentation should have examples of basic usage at the top. It should be  
possible to figure out how to use a CSV parser within 10 seconds of  
looking at the docs.

It looks like this is only a CSV parser, but not a CSV formatter? That's  
pretty weird. It'd be like std.xml or std.json only having reading  
functionality.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Vladimir Panteleev Wrote:

 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:
 
 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 
 Documentation should have examples of basic usage at the top. It should be  
 possible to figure out how to use a CSV parser within 10 seconds of  
 looking at the docs.

Sounds good. What kind of example should be first? Maybe something that just
gives you an itterable:

auto records = csvText(text);
 
 It looks like this is only a CSV parser, but not a CSV formatter? That's  
 pretty weird. It'd be like std.xml or std.json only having reading  
 functionality.

I have done quite a bit of CSV output and none of them have been the same. The
only similarity is surrounding with quotes and escaping quotes.

"\"" ~ text.replace("\"", "\"\"") ~ "\""

The problem is that how your data is stored and where it needs to go is
different for each app. I don't see a reason to provide a output function where
one must convert their data into a range of struct and specify an output range,
when that conversion is the hard part.

I would consider a function that does escaping/quoting only if the element
needs it. But such a function couldn't work on an Input Range.

Oct 28 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Fri, 28 Oct 2011 19:41:35 +0300, Jesse Phillips  
<jessekphillips+D gmail.com> wrote:

 Vladimir Panteleev Wrote:

 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Documentation should have examples of basic usage at the top. It should  
 be
 possible to figure out how to use a CSV parser within 10 seconds of
 looking at the docs.

 Sounds good. What kind of example should be first? Maybe something that  
 just gives you an itterable:

 auto records = csvText(text);

That's not a very good example, since it doesn't show anything about how  
to use the result of the csvText call. If it's an iterable, the example  
should use it in a foreach loop.

IMO the example should be a very simple realistic use case, for example  
printing the data in a user-friendly way:
writefln("%s works as a %s and earns %d per year", ...);

 It looks like this is only a CSV parser, but not a CSV formatter? That's
 pretty weird. It'd be like std.xml or std.json only having reading
 functionality.

 I have done quite a bit of CSV output and none of them have been the  
 same. The only similarity is surrounding with quotes and escaping quotes.

 "\"" ~ text.replace("\"", "\"\"") ~ "\""

 The problem is that how your data is stored and where it needs to go is  
 different for each app. I don't see a reason to provide a output  
 function where one must convert their data into a range of struct and  
 specify an output range, when that conversion is the hard part.

 I would consider a function that does escaping/quoting only if the  
 element needs it. But such a function couldn't work on an Input Range.

I think you're overcomplicating this.

You already have a way of specifying the specific format of the CSV file  
(delimiters, quote characters) for the parser. Use the same idiom for the  
formatter.

The formatter should accept data in the same format (datatypes) as what's  
emitted by the parser. If you use functional idioms (ranges), it ought to  
be possible to pipe it all together neatly.

The specifics don't matter very much, but it should be possible to put  
some data (array of string arrays, array of structs or whatnot) into a  
.csv file in a single line of code. Surely this isn't hard?

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Oct 28 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

Could csvText!Struct(...) be generalized to also support Tuples and 
(eventually) classes?

Oct 28 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-28 15:25, Piotr Szturmaj wrote:
 dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

 Could csvText!Struct(...) be generalized to also support Tuples and
 (eventually) classes?

Sounds more like a serialization library with a CSV archive type to me.

-- 
/Jacob Carlborg

Oct 28 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Jacob Carlborg wrote:
 On 2011-10-28 15:25, Piotr Szturmaj wrote:
 dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

 Could csvText!Struct(...) be generalized to also support Tuples and
 (eventually) classes?

 Sounds more like a serialization library with a CSV archive type to me.

I suppose you misunderstood my intentions. Here's an example from the docs:

string str = "Hello,65,63.63\nWorld,123,3673.562";
struct Layout {
     string name;
     int value;
     double other;
}

auto records = csvText!Layout(str);

foreach(record; records) {
     writeln(record.name);
     writeln(record.value);
     writeln(record.other);
}

I want to occasionally write:

alias Tuple!(string, int, double) Layout; // instead of struct

It should be easy to extend, but it would be nice if we have had general 
construct in Phobos. So instead of is(Contents == struct)   
https://github.com/he-the-great/phobos/blob/csv/std/csv.d#L433 there 
should be isComposite!Contents and FieldTypeTuple should be generalized 
(if it isn't) to structs, classes and tuples. I don't know if Tuple as a 
struct doesn't return valid fields from FieldTypeTuple now.

Oct 28 2011

deadalnix <deadalnix gmail.com> writes:

Le 28/10/2011 16:09, Piotr Szturmaj a �crit :
 Jacob Carlborg wrote:
 On 2011-10-28 15:25, Piotr Szturmaj wrote:
 dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

 Could csvText!Struct(...) be generalized to also support Tuples and
 (eventually) classes?

 Sounds more like a serialization library with a CSV archive type to me.

 I suppose you misunderstood my intentions. Here's an example from the docs:

 string str = "Hello,65,63.63\nWorld,123,3673.562";
 struct Layout {
 string name;
 int value;
 double other;
 }

 auto records = csvText!Layout(str);

 foreach(record; records) {
 writeln(record.name);
 writeln(record.value);
 writeln(record.other);
 }

 I want to occasionally write:

 alias Tuple!(string, int, double) Layout; // instead of struct

 It should be easy to extend, but it would be nice if we have had general
 construct in Phobos. So instead of is(Contents == struct)  
 https://github.com/he-the-great/phobos/blob/csv/std/csv.d#L433 there
 should be isComposite!Contents and FieldTypeTuple should be generalized
 (if it isn't) to structs, classes and tuples. I don't know if Tuple as a
 struct doesn't return valid fields from FieldTypeTuple now.

That would be great !

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Piotr Szturmaj Wrote:

 dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

 
 Could csvText!Struct(...) be generalized to also support Tuples and 
 (eventually) classes?

I restricted to struct since a class had to be new'd. I and then realized most
likely it won't be the final destination. It is better just letting the user
move the items over themselves.

Tuples should just work since they are just struct. Will double check.

Oct 28 2011

dsimcha <dsimcha yahoo.com> writes:

On 10/28/2011 9:18 AM, dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

I'll kick this off with my review of the documentation (I'll review the 
implementation later):

"Header may be provided as first line in file" -> "A header may be 
provided as first line in file"?

csvText:  Shouldn't heading be a generic range of strings instead of a 
string[]?

Why should the order of the heading provided to csvText matter?  csvText 
should simply rearrange its results.

In csvNextToken:  "The expected use of this would be to create a parser. 
And may also be useful when handling errors within a CSV file."  Please 
re-word this sentence so it's grammatically correct.

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

dsimcha Wrote:

 I'll kick this off with my review of the documentation (I'll review the 
 implementation later):
 
 "Header may be provided as first line in file" -> "A header may be 
 provided as first line in file"?
 
 csvText:  Shouldn't heading be a generic range of strings instead of a 
 string[]?

I should revisit that. It will still need to be random access and possible some
others, but doesn't need to be an array.
 
 Why should the order of the heading provided to csvText matter?  csvText 
 should simply rearrange its results.
 
 In csvNextToken:  "The expected use of this would be to create a parser. 
 And may also be useful when handling errors within a CSV file."  Please 
 re-word this sentence so it's grammatically correct.

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Forgot to answer one.

dsimcha Wrote:

 Why should the order of the heading provided to csvText matter?  csvText 
 should simply rearrange its results.

Because I do not store anymore information then requested. I must keep pumping
out values as I receive them, so order can not be rearranged. With a struct,
you must match the header to the order of fields. This is because I can not
match it by type and requiring that your field names match the file heading
limits what can be in a heading (spaces, punctuation).

Providing specialization for associative arrays also looks like a good option
to resolve this.

Oct 28 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-28 15:18, dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

If a header is specified, is it possible to iterate over the content as 
an associative array, something like this:

string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562";
auto records = csvText(str, ["b"]);

foreach (header, value ; records) {}

Or accessing a value by header:

foreach (record ; records)
{
     auto value = record["a"];
     auto val = record.b; // perhaps using opDispatch as well
}

-- 
/Jacob Carlborg

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Jacob Carlborg Wrote:

 If a header is specified, is it possible to iterate over the content as 
 an associative array, something like this:
 
 string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562";
 auto records = csvText(str, ["b"]);
 
 foreach (header, value ; records) {}
 
 Or accessing a value by header:
 
 foreach (record ; records)
 {
      auto value = record["a"];
      auto val = record.b; // perhaps using opDispatch as well
 }
 
 -- 
 /Jacob Carlborg

If I implemented it this way then the entire record would be read in when a
header exists.

auto records = csvText!(string[string])(str, ["b"]);

Would that be acceptable?

Oct 28 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-28 19:07, Jesse Phillips wrote:
 Jacob Carlborg Wrote:

 If a header is specified, is it possible to iterate over the content as
 an associative array, something like this:

 string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562";
 auto records = csvText(str, ["b"]);

 foreach (header, value ; records) {}

 Or accessing a value by header:

 foreach (record ; records)
 {
       auto value = record["a"];
       auto val = record.b; // perhaps using opDispatch as well
 }

 --
 /Jacob Carlborg

 If I implemented it this way then the entire record would be read in when a
header exists.

 auto records = csvText!(string[string])(str, ["b"]);

 Would that be acceptable?

That would be acceptable to me, don't know what how other would see it.

-- 
/Jacob Carlborg

Oct 28 2011

Ary Manzana <ary esperanto.org.ar> writes:

On 10/28/11 10:18 AM, dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

I like the idea of reading a CSV into a struct, or treating all fields 
as ints.

But, in my experience, CSVs are never perfect... because humans fill 
them, and humans aren't perfect or don't know "Oh, of course I must fill 
only numbers in this column, otherwise programs will explode".

So although it's a nice functionality, I think just returning everything 
as strings is more than enough.

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Ary Manzana Wrote:

 I like the idea of reading a CSV into a struct, or treating all fields 
 as ints.
 
 But, in my experience, CSVs are never perfect... because humans fill 
 them, and humans aren't perfect or don't know "Oh, of course I must fill 
 only numbers in this column, otherwise programs will explode".
 
 So although it's a nice functionality, I think just returning everything 
 as strings is more than enough.

If you are dealing with incorrect data you must deal with it. If you expect
everything to be an int, and it isn't what are you going to do? I should check
how easy it is to recover from an error.

Oct 28 2011

Ary Manzana <ary esperanto.org.ar> writes:

On 10/28/11 1:14 PM, Jesse Phillips wrote:
 Ary Manzana Wrote:

 I like the idea of reading a CSV into a struct, or treating all fields
 as ints.

 But, in my experience, CSVs are never perfect... because humans fill
 them, and humans aren't perfect or don't know "Oh, of course I must fill
 only numbers in this column, otherwise programs will explode".

 So although it's a nice functionality, I think just returning everything
 as strings is more than enough.

 If you are dealing with incorrect data you must deal with it. If you expect
everything to be an int, and it isn't what are you going to do? I should check
how easy it is to recover from an error.

You are right.

I see that you throw ConvException, because when some to!... fails it 
throws that. But you don't catch it. So you'd get something like "Can't 
convert 'hello' to an int", but you loose the information of which row 
and column caused the problem. So maybe catching it and throwing a more 
detailed exception...

Also, sometimes people fill out numbers with some comments, like "1 
apple", so maybe a less strict parsing for numbers might be good (as an 
option).

Just some comments :)

Oct 28 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Ary Manzana Wrote:

 You are right.
 
 I see that you throw ConvException, because when some to!... fails it 
 throws that. But you don't catch it. So you'd get something like "Can't 
 convert 'hello' to an int", but you loose the information of which row 
 and column caused the problem. So maybe catching it and throwing a more 
 detailed exception...

I'll look into that.
 
 Also, sometimes people fill out numbers with some comments, like "1 
 apple", so maybe a less strict parsing for numbers might be good (as an 
 option).

Loose rules are hard to setup to make everyone happy. I think this use case
should be left to the user. I just want to get the information out quickly.
Adding a row count shouldn't be too bad.

Oct 28 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/28/2011 6:18 AM, dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and runs
 through November . Following that, a vote will take place for one week. Please
 post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

First off, mucho thanks to Jesse for writing this. It's an important module for 
Phobos.

This is a review of the documentation:

Include this link: http://en.wikipedia.org/wiki/Comma-separated_values so the 
programmer can learn more about CSV.

"This parser will loosely follow the RFC-4180"

How exactly will it differ from RFC-4180?

"No exceptions are thrown due to incorrect CSV."

What happens, then, when the CSV is incorrect?

"csvText"

Be more specific about what csvText() returns.

I'm confused about the heading parameter to csvText().

"the Content of the input"

What is that? There's no parameter named "Content".

"RecordList"

"Range which provides access to CSV Records and Fields."

What is the Range? Do you mean RecordList is a Range? And what does "provides 
access" mean? If RecordList is a Range, why is it labeled a "List"?

"Array of the heading contained in the file."

Now I'm really confused as to what a heading is. I thought it was the first 
line? Now there's an array of them?

No examples given for constructors.

No documentation for front() or empty().

"Record"

"Returned by a RecordList when Contents is a non-struct."

RecordList is a struct, it does not return anything.

No examples.

"csvNextToken"

Example?

Oct 28 2011

bls <bizprac orange.fr> writes:

On 10/28/2011 12:18 PM, Walter Bright wrote:
 Formal review of Jesse Phillips's CSV parser module

Walter,
Asking for high precision documentation is fine.
And since you face it ...
What about having such a high precision for D2 language definition ?

std.csv is pretty cool. No doubt, But shouldn't std.csv be based on 
std.lexer ?

A+

Oct 29 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Sat, 29 Oct 2011 15:27:22 -0700, bls wrote:

 On 10/28/2011 12:18 PM, Walter Bright wrote:
 Formal review of Jesse Phillips's CSV parser module

 
 Walter,
 Asking for high precision documentation is fine.
 And since you face it ...
 What about having such a high precision for D2 language definition ?

You have to start somewhere.
 
 std.csv is pretty cool. No doubt, But shouldn't std.csv be based on
 std.lexer ?
 
 A+

Well, std.lexer doesn't exist. And if it is reasonable to use one, then 
it could be done without changing the API.

I think the database API stuff will have much more influence on what 
needs to be done with std.csv, but I think what is currently implemented 
must remain as is.

Oct 29 2011

bls <bizprac orange.fr> writes:

On 10/29/2011 03:56 PM, Jesse Phillips wrote:
 std.csv is pretty cool. No doubt, But shouldn't std.csv be based on
  std.lexer ?

  A+


 Well, std.lexer doesn't exist. And if it is reasonable to use one, then
 it could be done without changing the API.

 I think the database API stuff will have much more influence on what
 needs to be done with std.csv, but I think what is currently implemented
 must remain as is.

There is somebody working on std.lexer (a generic lexer) (as far as I 
remember the project started as std.range stress test )

However, I have no problem to imagine that f.i. the Tokenizer will be 
part of the std.lexer module. :)

Regarding std.database.. Please elaborate/  ATM I just can imagine that 
std.csv will be used for Import/Export.

Oct 29 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Sat, 29 Oct 2011 17:05:01 -0700, bls wrote:

 Regarding std.database.. Please elaborate/  ATM I just can imagine that
 std.csv will be used for Import/Export.

CSV is tabular data, so it would make sense that you may have a similar 
interface as you would a database.

csv.select("name").where!"a==4"("years")

Or whatever the interface look like.

Oct 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/29/2011 3:27 PM, bls wrote:
 What about having such a high precision for D2 language definition ?

Specific suggestions are welcome. Patches are even more welcome.


 std.csv is pretty cool. No doubt, But shouldn't std.csv be based on std.lexer ?

I don't see a similarity.

Oct 29 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

Thank you all.

I've made my first round of updates to the documentation, will be taking 
another pass and looking at adding to functionality. I'd like some help 
with Walters comments as described below.

On Fri, 28 Oct 2011 12:18:47 -0700, Walter Bright wrote:

 On 10/28/2011 6:18 AM, dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

 
 First off, mucho thanks to Jesse for writing this. It's an important
 module for Phobos.

Thank you. Fun fact, many popular languages don't include a CSV parser, 


 This is a review of the documentation:
 
 Include this link: http://en.wikipedia.org/wiki/Comma-separated_values
 so the programmer can learn more about CSV.

Done, added See Also section.
 
 "This parser will loosely follow the RFC-4180"
 
 How exactly will it differ from RFC-4180?

I do not know how to address this. After making this statement I list the 
criteria. I don't see the need to list where it differs:

* Records terminate with LF, CR and not just CRLF.
* Using Double Quotes or Comma can be customized.

I feel that I summarized well enough that listing yet another would be 
redundant.
 
 "No exceptions are thrown due to incorrect CSV."
 
 What happens, then, when the CSV is incorrect?

Added list.

 "csvText"
 
 Be more specific about what csvText() returns.

You are right. csvText() return a RecordList, but I explain it as though 
it were the Range. I'll think about what detail here, any suggestions are 
welcome.
 
 I'm confused about the heading parameter to csvText().

Well I hope some reorganizing and an added example will help.
 
 "the Content of the input"

 What is that? There's no parameter named "Content".

I've replace all instances of Content with Contents. It is a template 
parameter so hopefully this resolves the confusion.
 
 "RecordList"
 
 "Range which provides access to CSV Records and Fields."
 
 What is the Range? Do you mean RecordList is a Range? And what does
 "provides access" mean? If RecordList is a Range, why is it labeled a
 "List"?

Considering it is the first documentation line of RecordList, I don't see 
where the confusion on "what the range is."

I've re-written the line, it's shorter.

Ok, be prepared for confusion.

RecordList is a range of records, it is not named RecordRange because 
Record is also a range, a range of fields. If you have any suggestions 
for a better name, such as

Document or TabularData

Please let me know what you'd prefer.
 
 "Array of the heading contained in the file."
 
 Now I'm really confused as to what a heading is. I thought it was the
 first line? Now there's an array of them?

The line did not say "headings," but I have change it to:

"Heading from the input in array form."
 
 No examples given for constructors.

I provide an example of using RecordList. Any example for the 
constructors would be the same or similar to the 7 other examples.

Is an example needed for both constructors, and maybe an example for 
front too?
 
 No documentation for front() or empty().

These are range primitives, those programming in D already know "returns 
the current element of the range." I find it more appropriate to describe 
what is returned by front() in the description of the range (many ranges 
have become private and have no documentation at all, not even their 
existence).

I will of course add documentation, but recommendations on non-pointless 
documentation would be welcome. Or how to handle the different types 
which can be assigned to Contents.

 "Record"
 
 "Returned by a RecordList when Contents is a non-struct."
 
 RecordList is a struct, it does not return anything.

It is a range, are we not allowed to talk of them as though they were 
high-level concepts? Do I leave off where you'd obtain a Record (it is 
private). Should I say "Returned by an instance of RecordList when 
calling front?"
 
 No examples.

It is private, removing doc comment for constructor.
 
 "csvNextToken"
 
 Example?

Added.

Hopefully I addressed most of the documentation. Keep them coming.

Oct 29 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, October 29, 2011 22:33:30 Jesse Phillips wrote:
 On 10/28/2011 6:18 AM, dsimcha wrote:
 First off, mucho thanks to Jesse for writing this. It's an important
 module for Phobos.

 
 Thank you. Fun fact, many popular languages don't include a CSV parser,


CSV parser libraries exist for Java (IIRC Apache has one), but there definitely 
isn't one in Java's standard library. CSV parsers are not the sort of thing 
that's generally found in a standard library from what I've seen. So, having 
one will make us abnormal (albeit in a good way).

- Jonathan M Davis

Oct 29 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Sat, 29 Oct 2011 17:19:44 -0700, Jonathan M Davis wrote:

 On Saturday, October 29, 2011 22:33:30 Jesse Phillips wrote:
 On 10/28/2011 6:18 AM, dsimcha wrote:
 First off, mucho thanks to Jesse for writing this. It's an important
 module for Phobos.

 
 Thank you. Fun fact, many popular languages don't include a CSV parser,


 
 CSV parser libraries exist for Java (IIRC Apache has one), but there
 definitely isn't one in Java's standard library. CSV parsers are not the
 sort of thing that's generally found in a standard library from what
 I've seen. So, having one will make us abnormal (albeit in a good way).
 
 - Jonathan M Davis

That is all I meant by it. A CSV parser does exist for every language, 

ini so people would use XML. It really is sad that instead of something 
solid they go the way of "discouraging."

Oct 29 2011

Somedude <lovelydear mailmetrash.com> writes:

Le 30/10/2011 03:09, Jesse Phillips a écrit :
 That is all I meant by it. A CSV parser does exist for every language, 

 ini so people would use XML. It really is sad that instead of something 
 solid they go the way of "discouraging."

Yes, especially since CSV files are still widely used and prove in many
cases far more usable than XML or even JSON. Good luck importing JSON in
Excel.

Dude

Nov 05 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/29/2011 3:33 PM, Jesse Phillips wrote:
 "This parser will loosely follow the RFC-4180"

 How exactly will it differ from RFC-4180?

 I do not know how to address this. After making this statement I list the
 criteria. I don't see the need to list where it differs:

 * Records terminate with LF, CR and not just CRLF.
 * Using Double Quotes or Comma can be customized.

 I feel that I summarized well enough that listing yet another would be
 redundant.

I saw the criteria listed, but I didn't see how it was connected with it being 
different from RFC-4180. Are those criteria the differences? If so, please 
clarify that it is.


 "RecordList"

 "Range which provides access to CSV Records and Fields."

 What is the Range? Do you mean RecordList is a Range? And what does
 "provides access" mean? If RecordList is a Range, why is it labeled a
 "List"?

 Considering it is the first documentation line of RecordList, I don't see
 where the confusion on "what the range is."

 I've re-written the line, it's shorter.

 Ok, be prepared for confusion.

 RecordList is a range of records, it is not named RecordRange because
 Record is also a range, a range of fields. If you have any suggestions
 for a better name, such as

 Document or TabularData

 Please let me know what you'd prefer.

When it says "List" I think of a specific data structure, i.e. linked list. 
Since it is not, it shouldn't be labeled as one.

How about simply "Records" ?


 "Array of the heading contained in the file."

 Now I'm really confused as to what a heading is. I thought it was the
 first line? Now there's an array of them?

 The line did not say "headings," but I have change it to:

 "Heading from the input in array form."

An example would make this much, much clearer.

 Is an example needed for both constructors, and maybe an example for
 front too?

I know that when one is intimately familiar with the code, more examples seem 
redundant. They aren't to the new user!


 No documentation for front() or empty().

 These are range primitives, those programming in D already know "returns
 the current element of the range." I find it more appropriate to describe
 what is returned by front() in the description of the range (many ranges
 have become private and have no documentation at all, not even their
 existence).

 I will of course add documentation, but recommendations on non-pointless
 documentation would be welcome. Or how to handle the different types
 which can be assigned to Contents.

I suggest at a minimum that the docs for these say that they are part of the 
range interface. I want to press Andrei to write up a doc on "What's a Range" 
and then you can just provide a link to the right spot in that.


 "Record"

 "Returned by a RecordList when Contents is a non-struct."

 RecordList is a struct, it does not return anything.

 It is a range, are we not allowed to talk of them as though they were
 high-level concepts? Do I leave off where you'd obtain a Record (it is
 private). Should I say "Returned by an instance of RecordList when
 calling front?"

I think it is very important to be precise in the use of terms in technical 
documentation. A Range doesn't "return" anything either, though there is a 
method of Range which does. A range is a set of elements that can be iterated 
through.

Oct 29 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

Thank you for the response, only one more follow-up question.

Records
Record

Don't these look easy to confuse? And funny if we didn't have type 
inference.

Records records = csvText(str);

foreach(Record record; records) ...

Anyone want to complain about this suggestion? Record is private and can 
only be obtained from Records (is that appropriate phrasing?).

Oct 30 2011

Steve Teale <steve.teale britseyeview.com> writes:

On Sun, 30 Oct 2011 17:02:26 +0000, Jesse Phillips wrote:

 Thank you for the response, only one more follow-up question.
 
 Records
 Record

If it's any consolation, I'm suffering from similar questions to myself 
in my naming of structs and methods for my database interface experiments.

Since it is rather a short step from a generic 'CSV' interface to a 
database interface, you may be determining the terminology that I should 
use. I should really divert for a day and look at what you have done. It 
would be interesting to see if I could write a similar interface over 
your stuff.

But there's so much to do!

Steve

Oct 30 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/30/2011 10:02 AM, Jesse Phillips wrote:
 Thank you for the response, only one more follow-up question.

 Records
 Record

 Don't these look easy to confuse? And funny if we didn't have type
 inference.

 Records records = csvText(str);

 foreach(Record record; records) ...

 Anyone want to complain about this suggestion? Record is private and can
 only be obtained from Records (is that appropriate phrasing?).

I started using that convention in dmd, i.e. Objects is an array of Object, and 
after I got used to it I like it fine.

Oct 30 2011

"Eric Poggel (JoeCoder)" <dnewsgroup2 yage3d.net> writes:

On 10/29/2011 10:03 PM, Walter Bright wrote:
 I want to press Andrei to write up a doc on "What's a Range"

As a D1 user planning to migrate, this would be very appreciated.

Oct 30 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

Checked the new docs today. If I'm reading them right, the top example  
prints:

"Fred works as a Fly and earns $4 per year"

Is this a pop culture reference I'm not catching?

An idea I had the other day was to include convenience presets for the  
more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Oct 31 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:

 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:
 
 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 
 Checked the new docs today. If I'm reading them right, the top example
 prints:
 
 "Fred works as a Fly and earns $4 per year"
 
 Is this a pop culture reference I'm not catching?

No, not that I know of. Should it be? Should I go with a more 
professional like example?
 
 An idea I had the other day was to include convenience presets for the
 more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)

Well if Excel actually conforms to what it claims:

http://office.microsoft.com/en-us/excel-help/excel-formatting-and-
features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx?
CTT=5&origin=HP010099725#BM4

Which I doubt it does, then my parser already defaults to being able to 
read such formats. In my experience Excel sucks at CSV and follows no 
rules.

The implementation I choose as default is the most common and any other 
style is just a butchafication and likely to be unreliable, usually stems 
from "my data has commas, so I'll use colon." Which is great until the 
data also has colon. But maybe that is just my experience.

Oct 31 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, November 01, 2011 01:28:07 Jesse Phillips wrote:
 On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:
 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:
 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 
 Checked the new docs today. If I'm reading them right, the top example
 prints:
 
 "Fred works as a Fly and earns $4 per year"
 
 Is this a pop culture reference I'm not catching?

 
 No, not that I know of. Should it be? Should I go with a more
 professional like example?
 
 An idea I had the other day was to include convenience presets for the
 more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)

 
 Well if Excel actually conforms to what it claims:
 
 http://office.microsoft.com/en-us/excel-help/excel-formatting-and-
 features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx?
 CTT=5&origin=HP010099725#BM4
 
 Which I doubt it does, then my parser already defaults to being able to
 read such formats. In my experience Excel sucks at CSV and follows no
 rules.

As I recall, part of the problem is that Excel's behavior with regards to CSV 
is locale-dependent.

- Jonathan M Davis

Oct 31 2011

Jacob Carlborg <doob me.com> writes:

On 2011-11-01 02:28, Jesse Phillips wrote:
 On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:

 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha<dsimcha yahoo.com>  wrote:

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Checked the new docs today. If I'm reading them right, the top example
 prints:

 "Fred works as a Fly and earns $4 per year"

 Is this a pop culture reference I'm not catching?

 No, not that I know of. Should it be? Should I go with a more
 professional like example?

 An idea I had the other day was to include convenience presets for the
 more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)

 Well if Excel actually conforms to what it claims:

 http://office.microsoft.com/en-us/excel-help/excel-formatting-and-
 features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx?
 CTT=5&origin=HP010099725#BM4

 Which I doubt it does, then my parser already defaults to being able to
 read such formats. In my experience Excel sucks at CSV and follows no
 rules.

 The implementation I choose as default is the most common and any other
 style is just a butchafication and likely to be unreliable, usually stems
 from "my data has commas, so I'll use colon." Which is great until the
 data also has colon. But maybe that is just my experience.

Excel + CSV == Pain in the ass

Excel uses different value delimiters as the default setting depending 
on the locale of the Excel application.

-- 
/Jacob Carlborg

Nov 02 2011

Don <nospam nospam.com> writes:

On 02.11.2011 09:02, Jacob Carlborg wrote:
 On 2011-11-01 02:28, Jesse Phillips wrote:
 On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:

 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha<dsimcha yahoo.com> wrote:

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Checked the new docs today. If I'm reading them right, the top example
 prints:

 "Fred works as a Fly and earns $4 per year"

 Is this a pop culture reference I'm not catching?

 No, not that I know of. Should it be? Should I go with a more
 professional like example?

 An idea I had the other day was to include convenience presets for the
 more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)

 Well if Excel actually conforms to what it claims:

 http://office.microsoft.com/en-us/excel-help/excel-formatting-and-
 features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx?
 CTT=5&origin=HP010099725#BM4

 Which I doubt it does, then my parser already defaults to being able to
 read such formats. In my experience Excel sucks at CSV and follows no
 rules.

 The implementation I choose as default is the most common and any other
 style is just a butchafication and likely to be unreliable, usually stems
 from "my data has commas, so I'll use colon." Which is great until the
 data also has colon. But maybe that is just my experience.

 Excel + CSV == Pain in the ass

 Excel uses different value delimiters as the default setting depending
 on the locale of the Excel application.

Far more horrible than that -- in an old Excel from around 2000 the user 
interface used US settings for reading short dates, if the day of the 
month was 12 or less. Otherwise it uses the locale. But it always uses 
the locale for displaying them.

So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan.

I wrote a macro called "November 9" (9/11) for swapping them back if 
they were in the buggy range.

Nov 11 2011

Jacob Carlborg <doob me.com> writes:

On 2011-11-11 12:50, Don wrote:
 Far more horrible than that -- in an old Excel from around 2000 the user
 interface used US settings for reading short dates, if the day of the
 month was 12 or less. Otherwise it uses the locale. But it always uses
 the locale for displaying them.

 So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan.

 I wrote a macro called "November 9" (9/11) for swapping them back if
 they were in the buggy range.

Haha that's just insane.

-- 
/Jacob Carlborg

Nov 11 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Fri, 11 Nov 2011 13:50:52 +0200, Don <nospam nospam.com> wrote:

 Far more horrible than that -- in an old Excel from around 2000 the user  
 interface used US settings for reading short dates, if the day of the  
 month was 12 or less. Otherwise it uses the locale. But it always uses  
 the locale for displaying them.

 So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan.

 I wrote a macro called "November 9" (9/11) for swapping them back if  
 they were in the buggy range.

Wow. Reminds me of this:  
http://thedailywtf.com/Articles/TransAtlantic-Time-Trap.aspx

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Nov 11 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tue, 01 Nov 2011 03:28:07 +0200, Jesse Phillips  
<jessekphillips+d gmail.com> wrote:

 On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:

 On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Checked the new docs today. If I'm reading them right, the top example
 prints:

 "Fred works as a Fly and earns $4 per year"

 Is this a pop culture reference I'm not catching?

 No, not that I know of. Should it be? Should I go with a more
 professional like example?

I guess such things are really subjective, but personally to me it does  
seem a bit silly, and out-of-place from the documentation of a language  
that aims to be the successor of C++.

Have you decided about whether you'll be adding a formatter? Regardless of  
that, IMO the naming of "csvText" is not ideal. What does the "text" part  
imply - isn't CSV inherently a text format? How about "csvReader", which  
would also disambiguate it from a CSV formatter?

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Nov 02 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Wed, 02 Nov 2011 15:11:24 +0200, Vladimir Panteleev wrote:

 I guess such things are really subjective, but personally to me it does
 seem a bit silly, and out-of-place from the documentation of a language
 that aims to be the successor of C++.

I'll make it a pop culture reference... kidding.
 
 Regardless
 of that, IMO the naming of "csvText" is not ideal. What does the "text"
 part imply - isn't CSV inherently a text format? How about "csvReader",
 which would also disambiguate it from a CSV formatter?

Sounds reasonable to me. CSV could be applied to other things, not that 
it ever is. I started making the implementation very generic and had to 
cut back especially since I couldn't test it all. I had even played with 
the idea of being able to just return slices of a range of things

https://github.com/he-the-great/JPDLibs/blob/separator/csv/csv.d

but stopped its development in favor of something that work with input 
ranges and could get into Phobos.

 Have you decided about whether you'll be adding a formatter? 

Yes I'll give this a try, but I'm going to concentrate on documentation 
of what is there.

Nov 02 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Fri, 28 Oct 2011 09:18:27 -0400, dsimcha wrote:

 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November .  Following that, a vote will take place for one
 week.  Please post any comments about this library in this thread.
 
 Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
 
 Code:
 https://github.com/he-the-great/phobos/tree/csv

I've done another update this covers:

- Documentation more complete
- A provided heading can be any InputRange of string
- csvText => csvReader
- RecordList => Records
- csvReader no longer takes a Malformed enum, and instead provides 
customization to comma and quote.

Tomorrow I will try and complete one or more of the following with this 
priority:

- Have row/column information for thrown exceptions.
- Support [string][string] Contents.
- Write a CSV formatter. Support for range of struct, range of [string]
[string], and range of string[].

Nov 05 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

And only one pending request, csvFormatter. I am still considering how I 
want to approach this. I don't want to rush it in so I don't plan on 
having it complete before voting starts.

It has been fun and I look forward to the verdict.

On Sat, 05 Nov 2011 23:37:43 +0000, Jesse Phillips wrote:

 On Fri, 28 Oct 2011 09:18:27 -0400, dsimcha wrote:
 
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November .  Following that, a vote will take place for one
 week.  Please post any comments about this library in this thread.
 
 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
 
 Code:
 https://github.com/he-the-great/phobos/tree/csv

 
 I've done another update this covers:
 
 - Documentation more complete - A provided heading can be any InputRange
 of string - csvText => csvReader - RecordList => Records - csvReader no
 longer takes a Malformed enum, and instead provides customization to
 comma and quote.
 
 Tomorrow I will try and complete one or more of the following with this
 priority:
 
 - Have row/column information for thrown exceptions.
 - Support [string][string] Contents.
 - Write a CSV formatter. Support for range of struct, range of [string]
 [string], and range of string[].

Nov 06 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips <jessekphillips+d gmail.com>
wrote:
 And only one pending request, csvFormatter. I am still considering how I
 want to approach this. I don't want to rush it in so I don't plan on
 having it complete before voting starts.

My primary use of csv files is as a method of saving data to later graph / view
/ format in Excel. i.e.

csvWrite("filename",["x","y","z"],x,y,z);

where x, y, z are ranges with possibly different lengths and/or types.

I also have some API design concerns. Primarily, most of the type names are
overly broad and unclear. For example: Record, Records, Malformed could just as
easily be part of a database API. And because of this, not only does one
reading the code not know what the object is, but if namespace conflicts can
occur, then code reviewers will probably make the wrong assumption half the
time as to what Record is and not bother looking it up.

Furthermore, Records and Record, are essentially internal ranges/structs
returned by csvReader. In the tradition of std.algorithms these should be
defined inside the csvReader function (preferably) or as CsvReader /
CsvReader.Record.

Nov 07 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:

 On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
 <jessekphillips+d gmail.com> wrote:
 And only one pending request, csvFormatter. I am still considering how
 I want to approach this. I don't want to rush it in so I don't plan on
 having it complete before voting starts.

 
 My primary use of csv files is as a method of saving data to later graph
 / view / format in Excel. i.e.
 
 csvWrite("filename",["x","y","z"],x,y,z);
 
 where x, y, z are ranges with possibly different lengths and/or types.

I don't want to do anything with files, I'm thinking an output buffer 
makes the most sense (should be able to have such a buffer write to a 
file).

The x,y,z seems interesting but what about the poor sole who has a range 
of

struct {
    auto x,y,z;
}

If I'm going to make a csvFormatter than making sure something like

csvFormatter(csvReader(txt));

works is number one priority. But you have added a style to consider when 
building the interface.
 
 I also have some API design concerns. Primarily, most of the type names
 are overly broad and unclear. For example: Record, Records, Malformed
 could just as easily be part of a database API. And because of this, not
 only does one reading the code not know what the object is, but if
 namespace conflicts can occur, then code reviewers will probably make
 the wrong assumption half the time as to what Record is and not bother
 looking it up.

The reason it is name like what you would find in a database, is because 
they are the same thing. It is a row of data. In past discussion about 
existing name clashes it was concluded that was what modules, static, and 
named imports wore for.

Malformed, I just don't have any clue what to actually call it. No 
CSVMalformed doesn't help.

But then you go on to the point below, that you shouldn't even see them.
 
 Furthermore, Records and Record, are essentially internal ranges/structs
 returned by csvReader. In the tradition of std.algorithms these should
 be defined inside the csvReader function (preferably) or as CsvReader /
 CsvReader.Record.

I was going to explain how wrong you are and how I tried having all the 
options part of csvReader but it just muddies up the clean API when 
customizing something. But actually I already found the solution with my 
previous updates and incorrectly chose to move Malformed customization 
out.

Long story short the problems I see:

* The documentation still needs to exist and will be a little harder to 
do without these
* Records contains the heading field, this is abnormal for an InputRange 
and means it won't show as an auto link if it becomes a hidden range.
* Currently you can find exactly which function call result in a specific 
error, if I hide the ranges you don't.

I'm expecting that introducing CSVFormatter will require another review, 
at that time I can try making two forms of documentation to see which is 
preferred.

Thank you! Very useful food for thought.

Nov 10 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, November 11, 2011 03:17:10 Jesse Phillips wrote:
 I also have some API design concerns. Primarily, most of the type names
 are overly broad and unclear. For example: Record, Records, Malformed
 could just as easily be part of a database API. And because of this, not
 only does one reading the code not know what the object is, but if
 namespace conflicts can occur, then code reviewers will probably make
 the wrong assumption half the time as to what Record is and not bother
 looking it up.

 
 The reason it is name like what you would find in a database, is because
 they are the same thing. It is a row of data. In past discussion about
 existing name clashes it was concluded that was what modules, static, and
 named imports wore for.

Actually, I'd argue that Row would be better than Record, since it _is_ a row 
in a table. Personally, I'd find it to be more immediately clear that way. With 
Record, I have to figure out what the heck it is a record of, whereas Row is 
immediately obvious in this context.

- Jonathan M Davis

Nov 11 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:

 Actually, I'd argue that Row would be better than Record, since it _is_
 a row in a table. Personally, I'd find it to be more immediately clear
 that way. With Record, I have to figure out what the heck it is a record
 of, whereas Row is immediately obvious in this context.
 
 - Jonathan M Davis

It _is_ a record of data, just as much as it is a row in a table. The RFC 
I reference continuously refers to records and never even says "row."

Nov 11 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, November 11, 2011 15:25:29 Jesse Phillips wrote:
 On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:
 Actually, I'd argue that Row would be better than Record, since it _is_
 a row in a table. Personally, I'd find it to be more immediately clear
 that way. With Record, I have to figure out what the heck it is a record
 of, whereas Row is immediately obvious in this context.
 
 - Jonathan M Davis

 
 It _is_ a record of data, just as much as it is a row in a table. The RFC
 I reference continuously refers to records and never even says "row."

Well, then you have a good argument for keeping it as Record. But spreadsheets 
are tables of rows and columns, and a CSV file is a spreadsheet. Personally, I 
find the term record overly vague and wouldn't use it for much of anything - 
not to mention that it makes me think of the analog music device rather than 
anything else. So, I don't particularly like the name Record. I wouldn't use 
it in reference to a DB either. I'd use the term row there as well. But others 
may not agree with me, and if the RFC uses the term record, then that's a 
definite argument for using Record instead of Row.

- Jonathan M Davis

Nov 11 2011

Manu <turkeyman gmail.com> writes:

On 11 November 2011 19:07, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Friday, November 11, 2011 15:25:29 Jesse Phillips wrote:
 On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:
 Actually, I'd argue that Row would be better than Record, since it _is_
 a row in a table. Personally, I'd find it to be more immediately clear
 that way. With Record, I have to figure out what the heck it is a


 record
 of, whereas Row is immediately obvious in this context.

 - Jonathan M Davis

 It _is_ a record of data, just as much as it is a row in a table. The RFC
 I reference continuously refers to records and never even says "row."

 Well, then you have a good argument for keeping it as Record. But
 spreadsheets
 are tables of rows and columns, and a CSV file is a spreadsheet.
 Personally, I
 find the term record overly vague and wouldn't use it for much of anything
 -
 not to mention that it makes me think of the analog music device rather
 than
 anything else. So, I don't particularly like the name Record. I wouldn't
 use
 it in reference to a DB either. I'd use the term row there as well. But
 others
 may not agree with me, and if the RFC uses the term record, then that's a
 definite argument for using Record instead of Row.


+1 for row... although I appreciate the RFC says record all over the place.
That said though, upon reading it, I think that might be the worst RFC ever
written ;)

Nov 11 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips <jessekphillips+d gmail.com>
wrote:
 On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
 On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
 <jessekphillips+d gmail.com> wrote:


[snip]
 I also have some API design concerns. Primarily, most of the type names
 are overly broad and unclear. For example: Record, Records, Malformed
 could just as easily be part of a database API. And because of this, not
 only does one reading the code not know what the object is, but if
 namespace conflicts can occur, then code reviewers will probably make
 the wrong assumption half the time as to what Record is and not bother
 looking it up.

 The reason it is name like what you would find in a database, is because
 they are the same thing. It is a row of data. In past discussion about
 existing name clashes it was concluded that was what modules, static, and
 named imports wore for.

 Malformed, I just don't have any clue what to actually call it. No
 CSVMalformed doesn't help.

 But then you go on to the point below, that you shouldn't even see them.

 Furthermore, Records and Record, are essentially internal ranges/structs
 returned by csvReader. In the tradition of std.algorithms these should
 be defined inside the csvReader function (preferably) or as CsvReader /
 CsvReader.Record.

 I was going to explain how wrong you are and how I tried having all the
 options part of csvReader but it just muddies up the clean API when
 customizing something. But actually I already found the solution with my
 previous updates and incorrectly chose to move Malformed customization
 out.

 Long story short the problems I see:

 * The documentation still needs to exist and will be a little harder to
 do without these
 * Records contains the heading field, this is abnormal for an InputRange
 and means it won't show as an auto link if it becomes a hidden range.
 * Currently you can find exactly which function call result in a specific
 error, if I hide the ranges you don't.

 I'm expecting that introducing CSVFormatter will require another review,
 at that time I can try making two forms of documentation to see which is
 preferred.

 Thank you! Very useful food for thought.

Modules, static and named imports are extremely useful tools for making
disparate third party libraries work together. But Phobos is one (big) library,
not a bunch of separate ones. We shouldn't ever need to use these tools to
resolve Phobos name conflicts. In fact one of the (many) goals of D is to scale
down to quick and dirty scripts. rdmd helps with this a lot, but the other
piece of this puzzle is 'import std.all;' which doesn't work today. And while I
don't think we should break existing code just to fix this problem, going
forward, as we update and add libraries to Phobos, we need to strive to prevent
name collisions. And, like std.file and std.stdio, I'd expect std.csv and
std.databases to be used together often. Furthermore, your primary case for
exposing these structs is for documentation purposes. As the user isn't
expected to actually instantiate these structs (or at least not often), naming
them CsvReader and CsvReader.Row is perfectly acceptable. Lastly, I'd like to  
draw your attention to std.algorithm.findSplit. findSplit has to return three
ranges, which it does so using a Tuple. Records is simply a manual tuple of the
headings and the row ranges, so why not leverage Phobos for this instead?

Nov 11 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:

 On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips
 <jessekphillips+d gmail.com> wrote:
 On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
 On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
 <jessekphillips+d gmail.com> wrote:


 [snip]
 I also have some API design concerns. Primarily, most of the type
 names are overly broad and unclear. For example: Record, Records,
 Malformed could just as easily be part of a database API. And because
 of this, not only does one reading the code not know what the object
 is, but if namespace conflicts can occur, then code reviewers will
 probably make the wrong assumption half the time as to what Record is
 and not bother looking it up.

 The reason it is name like what you would find in a database, is
 because they are the same thing. It is a row of data. In past
 discussion about existing name clashes it was concluded that was what
 modules, static, and named imports wore for.

 Malformed, I just don't have any clue what to actually call it. No
 CSVMalformed doesn't help.

 But then you go on to the point below, that you shouldn't even see
 them.

 Furthermore, Records and Record, are essentially internal
 ranges/structs returned by csvReader. In the tradition of
 std.algorithms these should be defined inside the csvReader function
 (preferably) or as CsvReader / CsvReader.Record.

 I was going to explain how wrong you are and how I tried having all the
 options part of csvReader but it just muddies up the clean API when
 customizing something. But actually I already found the solution with
 my previous updates and incorrectly chose to move Malformed
 customization out.

 Long story short the problems I see:

 * The documentation still needs to exist and will be a little harder to
 do without these * Records contains the heading field, this is abnormal
 for an InputRange and means it won't show as an auto link if it becomes
 a hidden range.
 * Currently you can find exactly which function call result in a
 specific error, if I hide the ranges you don't.

 I'm expecting that introducing CSVFormatter will require another
 review, at that time I can try making two forms of documentation to see
 which is preferred.

 Thank you! Very useful food for thought.

 
 Modules, static and named imports are extremely useful tools for making
 disparate third party libraries work together. But Phobos is one (big)
 library, not a bunch of separate ones. We shouldn't ever need to use
 these tools to resolve Phobos name conflicts. In fact one of the (many)
 goals of D is to scale down to quick and dirty scripts. rdmd helps with
 this a lot, but the other piece of this puzzle is 'import std.all;'
 which doesn't work today. And while I don't think we should break
 existing code just to fix this problem, going forward, as we update and
 add libraries to Phobos, we need to strive to prevent name collisions.
 And, like std.file and std.stdio, I'd expect std.csv and std.databases
 to be used together often. Furthermore, your primary case for exposing
 these structs is for documentation purposes. As the user isn't expected
 to actually instantiate these structs (or at least not often), naming
 them CsvReader and CsvReader.Row is perfectly acceptable. 

I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about 
the same as your example names? And if dmd could do partial matches (not 
exactly requesting such a feature) csv.Rows, csv.Row.

 Lastly, I'd
 like to draw your attention to std.algorithm.findSplit. findSplit has to
 return three ranges, which it does so using a Tuple. Records is simply a
 manual tuple of the headings and the row ranges, so why not leverage
 Phobos for this instead?

Records requires access to the header for building the associative array. 
So while I still could return both, the range would still be holding the 
header. If you think it would make an improvement then you'd probably 
want to update your vote to no. Namely what I would see coming from that:


auto records = csvReader(txt);

records.header ...;
foreach(row; records.rows) { ... }

Nov 15 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 15 Nov 2011 22:30:51 -0500, Jesse Phillips <jessekphillips+d gmail.com>
wrote:
 On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:
 On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips
 <jessekphillips+d gmail.com> wrote:
 On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
 On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
 <jessekphillips+d gmail.com> wrote:




[snip]

 Modules, static and named imports are extremely useful tools for making
 disparate third party libraries work together. But Phobos is one (big)
 library, not a bunch of separate ones. We shouldn't ever need to use
 these tools to resolve Phobos name conflicts. In fact one of the (many)
 goals of D is to scale down to quick and dirty scripts. rdmd helps with
 this a lot, but the other piece of this puzzle is 'import std.all;'
 which doesn't work today. And while I don't think we should break
 existing code just to fix this problem, going forward, as we update and
 add libraries to Phobos, we need to strive to prevent name collisions.
 And, like std.file and std.stdio, I'd expect std.csv and std.databases
 to be used together often. Furthermore, your primary case for exposing
 these structs is for documentation purposes. As the user isn't expected
 to actually instantiate these structs (or at least not often), naming
 them CsvReader and CsvReader.Row is perfectly acceptable.

 I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about
 the same as your example names?

Yes and no. Yes, from the point of view of reading 'std.csv.Rows' in existing
code. No, from the point of view writing code. You're much more likely to write
just 'Rows' and then have to go through a quick-compile-fix-compile loop.
Worse, consider the people using the colliding std.databases.Row struct, which
now have to use the fully qualified name. I don't feel that overly succinct
names are important to std.csv, as most users will be using 'auto records =
csvReader(txt);' and will never have to type out std.csv.Rows or csvRows or
whatever.

 And if dmd could do partial matches (not
 exactly requesting such a feature) csv.Rows, csv.Row.
 Lastly, I'd
 like to draw your attention to std.algorithm.findSplit. findSplit has to
 return three ranges, which it does so using a Tuple. Records is simply a
 manual tuple of the headings and the row ranges, so why not leverage
 Phobos for this instead?

 Records requires access to the header for building the associative array.
 So while I still could return both, the range would still be holding the
 header. If you think it would make an improvement then you'd probably
 want to update your vote to no. Namely what I would see coming from that:


 auto records = csvReader(txt);

 records.header ...;
 foreach(row; records.rows) { ... }

Thank you for explaining. I wasn't seeing the implementation efficiency that
comes from the custom struct approach.

Nov 15 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Tue, 15 Nov 2011 23:12:57 -0500, Robert Jacques wrote:

 Thank you for explaining. I wasn't seeing the implementation efficiency
 that comes from the custom struct approach.

No, thank you. It is good to have someone fighting a different opinion. 
It is because of you we can have this bike-shedding and not even have 
shed :D. Had the hiding of both structs not worked so well I would have 
changed it (as I would have thought on it more, realized it was two 
against zero, and wouldn't be distracted with the idea of simplifying the 
docs)

Nov 15 2011

dsimcha <dsimcha yahoo.com> writes:

On 10/28/2011 9:18 AM, dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November . Following that, a vote will take place for one
 week. Please post any comments about this library in this thread.

 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html

 Code:
 https://github.com/he-the-great/phobos/tree/csv

I apologize.  I just noticed that I forgot to put the end date in this 
message.  The review of the CSV parser runs the standard two weeks, 
until November 11.

Nov 06 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, October 28, 2011 09:18:27 dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November .  Following that, a vote will take place for one
 week.  Please post any comments about this library in this thread.
 
 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
 
 Code:
 https://github.com/he-the-great/phobos/tree/csv

- Nitpick: You should probably proofread the module documentation again. It 
has several mispellings and grammar errors.

- The functions and types mentioned in the documentation should use LREF so 
that they're links to their documentation.

- Shouldn't the term "header" be used rather than "heading?" So, you'd have 
HeaderMismatchException instead of HeadingMismatchException. Usually, the term 
heading is only used for stuff like the heading of an article. Usually, the 
term header is used for stuff like files. RFC 4180 uses the term header but 
never heading. So, unless I'm missing something, I really think that all of 
the instances of heading (both in the documentation and in the API) should be 
changed to header.

- I'd argue against Separator in csvReader being char by default. Individual 
characters should generally be dchar, not char. Using char by itself is almost 
always bad practice IMHO.

- I'd say that csvReader should say that it returns a Records struct which is 
an input range rather than saying that it provides std.range.isInputRange. We 
just don't talk that way about ranges normally. We say that something is a 
particular range type, not that it provides the API for a particular range 
type. If you want to make "is an input range" a link to 
std.range.isInputRange, that's fine, but what's currently there is not how 
ranges are typically referred to. I'd also suggest that you on the range 
functions where you say that they're part of the std.range.isInputRange 
interface, you should use the term API instead of interface, since interface 
has a specific meaning in the language which has nothing to do with this 
context.

- Why does csvNextToken take an Appender!(char[])? That seems overly 
restrictive. I'd argue that it should take an output range of dchar (which 
Appender!(char[]) is). Then it's not restricted to Appender, and it's not 
restricted to char[].

- Also, the fact that you have isSomeChar!(ElementType!Range) concerns me 
somewhat. Normally, I'd expect something more like is(ElementType!Range == 
dchar). As it is, you could have a range of char or wchar, which could do 
funny things. Phobos doesn't normally support that, and I'd be concerned about 
what would happen if someone tried. ElementType!Range will be dchar for all 
string types, so they'll work fine either way, but programmers who are overly 
adventurous or ignorant may try and use a char or wchar range, and I'm willing 
to be that there would be problems with that. Just all around, I'd advise 
checking that the ranges are ranges of dchar and using dchar for the 
separators rather than char.

- You should have unit tests testing ranges _other_ than string. The fact that 
neither wstring nor dstring is tested is worrisome, and ideally, you'd do stuff 
like call filter!"true" on a string so that you have an input range which isn't 
a string. Without at least some of that, I'd be very worried that your code 
can't actually handle generic ranges of dchar and really only works with 
string.

- You should have some unit tests with unicode values - preferably some which 
are multiple code units for both char and wchar. For instance, std.string uses 
\U00010143 in some of its tests, because it's 4 code units in UTF-8 and 2 in 
UTF-16. The complete lack of testing with unicode is worrisome in the same way 
that only testing with string is worrisome. It's too easy to screw something 
up such that it happens to work with string and ASCII but not in general. So, 
please add tests which involve unicode - not only in the data but in the 
separators.

- I find it somewhat odd that you would use a pointer to the range in Record 
rather than using a slice, but I'd have to examine it in more detail to see 
whether that's actually necessary or not. But in general, I'd argue that a 
slice should be used rather than a pointer if it can be.

- Jonathan M Davis


P.S.  Just as a note, since you seem to like not using braces if you don't 
need to for if statements and the like, I though that I'd point out that try 
and catch don't require braces in D if htey contain only one statement. You 
have several one line catches which use braces when they don't need to. I'm 
not arguing that you should or shouldn't change it, but I thought that I would 
point out that the braces aren't necessary in case you didn't know (since they 
_are_ necessary in other languages).

Nov 10 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

Thank you, lots of stuff to ponder and comment.

On Thu, 10 Nov 2011 02:10:28 -0800, Jonathan M Davis wrote:

 On Friday, October 28, 2011 09:18:27 dsimcha wrote:
 Formal review of Jesse Phillips's CSV parser module begins today and
 runs through November .  Following that, a vote will take place for one
 week.  Please post any comments about this library in this thread.
 
 Docs:
 http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
 
 Code:
 https://github.com/he-the-great/phobos/tree/csv

 
 - Nitpick: You should probably proofread the module documentation again.
 It has several mispellings and grammar errors.

Err, I'll try, use some of the great tips you gave me :)
 
 - The functions and types mentioned in the documentation should use LREF
 so that they're links to their documentation.

I'll see if I can find them all while proofing.
 
 - Shouldn't the term "header" be used rather than "heading?" So, you'd
 have HeaderMismatchException instead of HeadingMismatchException.
 Usually, the term heading is only used for stuff like the heading of an
 article. Usually, the term header is used for stuff like files. RFC 4180
 uses the term header but never heading. So, unless I'm missing
 something, I really think that all of the instances of heading (both in
 the documentation and in the API) should be changed to header.

Used both in development, then chose one, guess I got it wrong. Changing.
 
 - I'd argue against Separator in csvReader being char by default.
 Individual characters should generally be dchar, not char. Using char by
 itself is almost always bad practice IMHO.

The only reason I put defaults on it was because it wouldn't choose the 
value for it from my default values. Otherwise it works with what you 
give it, be it char, wchar, dchar.
 
 - I'd say that csvReader should say that it returns a Records struct
 which is an input range rather than saying that it provides
 std.range.isInputRange. We just don't talk that way about ranges
 normally. We say that something is a particular range type, not that it
 provides the API for a particular range type. If you want to make "is an
 input range" a link to std.range.isInputRange, that's fine, but what's
 currently there is not how ranges are typically referred to. I'd also
 suggest that you on the range functions where you say that they're part
 of the std.range.isInputRange interface, you should use the term API
 instead of interface, since interface has a specific meaning in the
 language which has nothing to do with this context.

Thanks.
 
 - Why does csvNextToken take an Appender!(char[])? That seems overly
 restrictive. I'd argue that it should take an output range of dchar
 (which Appender!(char[]) is). Then it's not restricted to Appender, and
 it's not restricted to char[].

Appender is used so that the exception thrown can contain the consumed 
data. Thinking again on this, I suppose I can catch the exception add the 
data and rethrow. Sounds pretty good actually.
 
 - Also, the fact that you have isSomeChar!(ElementType!Range) concerns
 me somewhat. Normally, I'd expect something more like
 is(ElementType!Range == dchar). As it is, you could have a range of char
 or wchar, which could do funny things. Phobos doesn't normally support
 that, and I'd be concerned about what would happen if someone tried.
 ElementType!Range will be dchar for all string types, so they'll work
 fine either way, but programmers who are overly adventurous or ignorant
 may try and use a char or wchar range, and I'm willing to be that there
 would be problems with that. Just all around, I'd advise checking that
 the ranges are ranges of dchar and using dchar for the separators rather
 than char.

I was of a different mindset of the time. You are probably right I should 
prevent someone from eating their leg off.
 
 - You should have unit tests testing ranges _other_ than string. The
 fact that neither wstring nor dstring is tested is worrisome, and
 ideally, you'd do stuff like call filter!"true" on a string so that you
 have an input range which isn't a string. Without at least some of that,
 I'd be very worried that your code can't actually handle generic ranges
 of dchar and really only works with string.

Did add a wchar input range test, but now it should just be an dchar 
input range test since wchar ranges won't be allowed.
 
 - You should have some unit tests with unicode values - preferably some
 which are multiple code units for both char and wchar. For instance,
 std.string uses \U00010143 in some of its tests, because it's 4 code
 units in UTF-8 and 2 in UTF-16. The complete lack of testing with
 unicode is worrisome in the same way that only testing with string is
 worrisome. It's too easy to screw something up such that it happens to
 work with string and ASCII but not in general. So, please add tests
 which involve unicode - not only in the data but in the separators.

Didn't create any new tests, just added unicode in to input and as 
separators into old tests.
 
 - I find it somewhat odd that you would use a pointer to the range in
 Record rather than using a slice, but I'd have to examine it in more
 detail to see whether that's actually necessary or not. But in general,
 I'd argue that a slice should be used rather than a pointer if it can
 be.

Record requires modifying the range, for a true input range this is not a 
problem, but strings and some other struct based ranges are implicit 
forward ranges and save upon passing.

Slicing doesn't exist on input ranges, though I'd almost argue [] should 
take the place of save() in a forward range.
 
 - Jonathan M Davis
 
 
 P.S.  Just as a note, since you seem to like not using braces if you
 don't need to for if statements and the like, I though that I'd point
 out that try and catch don't require braces in D if htey contain only
 one statement. You have several one line catches which use braces when
 they don't need to. I'm not arguing that you should or shouldn't change
 it, but I thought that I would point out that the braces aren't
 necessary in case you didn't know (since they _are_ necessary in other
 languages).

Thanks, hopefully I'm consistent in that regard and I'll just leave it. I 
knew of try, but not catch.

Nov 10 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Review of Jesse Phillips's CSV Parser