digitalmars.D - Review of Jesse Phillips's CSV Parser
- dsimcha (7/7) Oct 28 2011 Formal review of Jesse Phillips's CSV parser module begins today and
- Vladimir Panteleev (10/12) Oct 28 2011 Documentation should have examples of basic usage at the top. It should ...
- Jesse Phillips (8/19) Oct 28 2011 Sounds good. What kind of example should be first? Maybe something that ...
- Vladimir Panteleev (21/46) Oct 28 2011 That's not a very good example, since it doesn't show anything about how...
- Piotr Szturmaj (3/10) Oct 28 2011 Could csvText!Struct(...) be generalized to also support Tuples and
- Jacob Carlborg (4/16) Oct 28 2011 Sounds more like a serialization library with a CSV archive type to me.
- Piotr Szturmaj (22/37) Oct 28 2011 I suppose you misunderstood my intentions. Here's an example from the do...
- deadalnix (2/41) Oct 28 2011 That would be great !
- Jesse Phillips (3/16) Oct 28 2011 I restricted to struct since a class had to be new'd. I and then realize...
- dsimcha (12/19) Oct 28 2011 I'll kick this off with my review of the documentation (I'll review the
- Jesse Phillips (3/17) Oct 28 2011 I should revisit that. It will still need to be random access and possib...
- Jesse Phillips (4/6) Oct 28 2011 Because I do not store anymore information then requested. I must keep p...
- Jacob Carlborg (14/21) Oct 28 2011 If a header is specified, is it possible to iterate over the content as
- Jesse Phillips (4/22) Oct 28 2011 If I implemented it this way then the entire record would be read in whe...
- Jacob Carlborg (4/26) Oct 28 2011 That would be acceptable to me, don't know what how other would see it.
- Ary Manzana (8/15) Oct 28 2011 I like the idea of reading a CSV into a struct, or treating all fields
- Jesse Phillips (2/11) Oct 28 2011 If you are dealing with incorrect data you must deal with it. If you exp...
- Ary Manzana (11/22) Oct 28 2011 You are right.
- Jesse Phillips (4/14) Oct 28 2011 I'll look into that.
- Walter Bright (30/37) Oct 28 2011 First off, mucho thanks to Jesse for writing this. It's an important mod...
- bls (8/9) Oct 29 2011 Walter,
- Jesse Phillips (8/19) Oct 29 2011 You have to start somewhere.
- bls (7/16) Oct 29 2011 There is somebody working on std.lexer (a generic lexer) (as far as I
- Jesse Phillips (5/7) Oct 29 2011 CSV is tabular data, so it would make sense that you may have a similar
- Walter Bright (3/5) Oct 29 2011 I don't see a similarity.
- Jesse Phillips (61/112) Oct 29 2011 Thank you all.
- Jonathan M Davis (6/12) Oct 29 2011 CSV parser libraries exist for Java (IIRC Apache has one), but there def...
- Jesse Phillips (5/19) Oct 29 2011 That is all I meant by it. A CSV parser does exist for every language,
- Somedude (5/9) Nov 05 2011 Yes, especially since CSV files are still widely used and prove in many
- Walter Bright (17/68) Oct 29 2011 I saw the criteria listed, but I didn't see how it was connected with it...
- Jesse Phillips (9/9) Oct 30 2011 Thank you for the response, only one more follow-up question.
- Steve Teale (10/14) Oct 30 2011 If it's any consolation, I'm suffering from similar questions to myself
- Walter Bright (3/12) Oct 30 2011 I started using that convention in dmd, i.e. Objects is an array of Obje...
- Eric Poggel (JoeCoder) (2/3) Oct 30 2011 As a D1 user planning to migrate, this would be very appreciated.
- Vladimir Panteleev (10/12) Oct 31 2011 Checked the new docs today. If I'm reading them right, the top example
- Jesse Phillips (15/28) Oct 31 2011 No, not that I know of. Should it be? Should I go with a more
- Jonathan M Davis (4/31) Oct 31 2011 As I recall, part of the problem is that Excel's behavior with regards t...
- Jacob Carlborg (6/33) Nov 02 2011 Excel + CSV == Pain in the ass
- Don (8/46) Nov 11 2011 Far more horrible than that -- in an old Excel from around 2000 the user...
- Jacob Carlborg (4/11) Nov 11 2011 Haha that's just insane.
- Vladimir Panteleev (6/13) Nov 11 2011 Wow. Reminds me of this:
- Vladimir Panteleev (12/26) Nov 02 2011 I guess such things are really subjective, but personally to me it does ...
- Jesse Phillips (12/20) Nov 02 2011 I'll make it a pop culture reference... kidding.
- Jesse Phillips (14/22) Nov 05 2011 I've done another update this covers:
- Jesse Phillips (5/31) Nov 06 2011 And only one pending request, csvFormatter. I am still considering how I...
- Robert Jacques (6/9) Nov 07 2011 My primary use of csv files is as a method of saving data to later graph...
- Jesse Phillips (38/61) Nov 10 2011 I don't want to do anything with files, I'm thinking an output buffer
- Jonathan M Davis (6/18) Nov 11 2011 Actually, I'd argue that Row would be better than Record, since it _is_ ...
- Jesse Phillips (3/9) Nov 11 2011 It _is_ a record of data, just as much as it is a row in a table. The RF...
- Jonathan M Davis (10/20) Nov 11 2011 Well, then you have a good argument for keeping it as Record. But spread...
- Manu (4/30) Nov 11 2011 +1 for row... although I appreciate the RFC says record all over the pla...
- Robert Jacques (4/41) Nov 11 2011 Modules, static and named imports are extremely useful tools for making ...
- Jesse Phillips (11/81) Nov 15 2011 I like the point on std.all, but isn't std.csv.Rows and std.csv.Row abou...
- Robert Jacques (4/40) Nov 15 2011 Yes and no. Yes, from the point of view of reading 'std.csv.Rows' in exi...
- Jesse Phillips (7/9) Nov 15 2011 No, thank you. It is good to have someone fighting a different opinion.
- dsimcha (4/11) Nov 06 2011 I apologize. I just noticed that I forgot to put the end date in this
- Jonathan M Davis (66/75) Nov 10 2011 - Nitpick: You should probably proofread the module documentation again....
- Jesse Phillips (35/116) Nov 10 2011 Err, I'll try, use some of the great tips you gave me :)
Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csv
Oct 28 2011
On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlDocumentation should have examples of basic usage at the top. It should be possible to figure out how to use a CSV parser within 10 seconds of looking at the docs. It looks like this is only a CSV parser, but not a CSV formatter? That's pretty weird. It'd be like std.xml or std.json only having reading functionality. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Oct 28 2011
Vladimir Panteleev Wrote:On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:Sounds good. What kind of example should be first? Maybe something that just gives you an itterable: auto records = csvText(text);Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlDocumentation should have examples of basic usage at the top. It should be possible to figure out how to use a CSV parser within 10 seconds of looking at the docs.It looks like this is only a CSV parser, but not a CSV formatter? That's pretty weird. It'd be like std.xml or std.json only having reading functionality.I have done quite a bit of CSV output and none of them have been the same. The only similarity is surrounding with quotes and escaping quotes. "\"" ~ text.replace("\"", "\"\"") ~ "\"" The problem is that how your data is stored and where it needs to go is different for each app. I don't see a reason to provide a output function where one must convert their data into a range of struct and specify an output range, when that conversion is the hard part. I would consider a function that does escaping/quoting only if the element needs it. But such a function couldn't work on an Input Range.
Oct 28 2011
On Fri, 28 Oct 2011 19:41:35 +0300, Jesse Phillips <jessekphillips+D gmail.com> wrote:Vladimir Panteleev Wrote:That's not a very good example, since it doesn't show anything about how to use the result of the csvText call. If it's an iterable, the example should use it in a foreach loop. IMO the example should be a very simple realistic use case, for example printing the data in a user-friendly way: writefln("%s works as a %s and earns %d per year", ...);On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:Sounds good. What kind of example should be first? Maybe something that just gives you an itterable: auto records = csvText(text);Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlDocumentation should have examples of basic usage at the top. It should be possible to figure out how to use a CSV parser within 10 seconds of looking at the docs.I think you're overcomplicating this. You already have a way of specifying the specific format of the CSV file (delimiters, quote characters) for the parser. Use the same idiom for the formatter. The formatter should accept data in the same format (datatypes) as what's emitted by the parser. If you use functional idioms (ranges), it ought to be possible to pipe it all together neatly. The specifics don't matter very much, but it should be possible to put some data (array of string arrays, array of structs or whatnot) into a .csv file in a single line of code. Surely this isn't hard? -- Best regards, Vladimir mailto:vladimir thecybershadow.netIt looks like this is only a CSV parser, but not a CSV formatter? That's pretty weird. It'd be like std.xml or std.json only having reading functionality.I have done quite a bit of CSV output and none of them have been the same. The only similarity is surrounding with quotes and escaping quotes. "\"" ~ text.replace("\"", "\"\"") ~ "\"" The problem is that how your data is stored and where it needs to go is different for each app. I don't see a reason to provide a output function where one must convert their data into a range of struct and specify an output range, when that conversion is the hard part. I would consider a function that does escaping/quoting only if the element needs it. But such a function couldn't work on an Input Range.
Oct 28 2011
dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvCould csvText!Struct(...) be generalized to also support Tuples and (eventually) classes?
Oct 28 2011
On 2011-10-28 15:25, Piotr Szturmaj wrote:dsimcha wrote:Sounds more like a serialization library with a CSV archive type to me. -- /Jacob CarlborgFormal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvCould csvText!Struct(...) be generalized to also support Tuples and (eventually) classes?
Oct 28 2011
Jacob Carlborg wrote:On 2011-10-28 15:25, Piotr Szturmaj wrote:I suppose you misunderstood my intentions. Here's an example from the docs: string str = "Hello,65,63.63\nWorld,123,3673.562"; struct Layout { string name; int value; double other; } auto records = csvText!Layout(str); foreach(record; records) { writeln(record.name); writeln(record.value); writeln(record.other); } I want to occasionally write: alias Tuple!(string, int, double) Layout; // instead of struct It should be easy to extend, but it would be nice if we have had general construct in Phobos. So instead of is(Contents == struct) https://github.com/he-the-great/phobos/blob/csv/std/csv.d#L433 there should be isComposite!Contents and FieldTypeTuple should be generalized (if it isn't) to structs, classes and tuples. I don't know if Tuple as a struct doesn't return valid fields from FieldTypeTuple now.dsimcha wrote:Sounds more like a serialization library with a CSV archive type to me.Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvCould csvText!Struct(...) be generalized to also support Tuples and (eventually) classes?
Oct 28 2011
Le 28/10/2011 16:09, Piotr Szturmaj a écrit :Jacob Carlborg wrote:That would be great !On 2011-10-28 15:25, Piotr Szturmaj wrote:I suppose you misunderstood my intentions. Here's an example from the docs: string str = "Hello,65,63.63\nWorld,123,3673.562"; struct Layout { string name; int value; double other; } auto records = csvText!Layout(str); foreach(record; records) { writeln(record.name); writeln(record.value); writeln(record.other); } I want to occasionally write: alias Tuple!(string, int, double) Layout; // instead of struct It should be easy to extend, but it would be nice if we have had general construct in Phobos. So instead of is(Contents == struct) https://github.com/he-the-great/phobos/blob/csv/std/csv.d#L433 there should be isComposite!Contents and FieldTypeTuple should be generalized (if it isn't) to structs, classes and tuples. I don't know if Tuple as a struct doesn't return valid fields from FieldTypeTuple now.dsimcha wrote:Sounds more like a serialization library with a CSV archive type to me.Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvCould csvText!Struct(...) be generalized to also support Tuples and (eventually) classes?
Oct 28 2011
Piotr Szturmaj Wrote:dsimcha wrote:I restricted to struct since a class had to be new'd. I and then realized most likely it won't be the final destination. It is better just letting the user move the items over themselves. Tuples should just work since they are just struct. Will double check.Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvCould csvText!Struct(...) be generalized to also support Tuples and (eventually) classes?
Oct 28 2011
On 10/28/2011 9:18 AM, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvI'll kick this off with my review of the documentation (I'll review the implementation later): "Header may be provided as first line in file" -> "A header may be provided as first line in file"? csvText: Shouldn't heading be a generic range of strings instead of a string[]? Why should the order of the heading provided to csvText matter? csvText should simply rearrange its results. In csvNextToken: "The expected use of this would be to create a parser. And may also be useful when handling errors within a CSV file." Please re-word this sentence so it's grammatically correct.
Oct 28 2011
dsimcha Wrote:I'll kick this off with my review of the documentation (I'll review the implementation later): "Header may be provided as first line in file" -> "A header may be provided as first line in file"? csvText: Shouldn't heading be a generic range of strings instead of a string[]?I should revisit that. It will still need to be random access and possible some others, but doesn't need to be an array.Why should the order of the heading provided to csvText matter? csvText should simply rearrange its results. In csvNextToken: "The expected use of this would be to create a parser. And may also be useful when handling errors within a CSV file." Please re-word this sentence so it's grammatically correct.
Oct 28 2011
Forgot to answer one. dsimcha Wrote:Why should the order of the heading provided to csvText matter? csvText should simply rearrange its results.Because I do not store anymore information then requested. I must keep pumping out values as I receive them, so order can not be rearranged. With a struct, you must match the header to the order of fields. This is because I can not match it by type and requiring that your field names match the file heading limits what can be in a heading (spaces, punctuation). Providing specialization for associative arrays also looks like a good option to resolve this.
Oct 28 2011
On 2011-10-28 15:18, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvIf a header is specified, is it possible to iterate over the content as an associative array, something like this: string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562"; auto records = csvText(str, ["b"]); foreach (header, value ; records) {} Or accessing a value by header: foreach (record ; records) { auto value = record["a"]; auto val = record.b; // perhaps using opDispatch as well } -- /Jacob Carlborg
Oct 28 2011
Jacob Carlborg Wrote:If a header is specified, is it possible to iterate over the content as an associative array, something like this: string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562"; auto records = csvText(str, ["b"]); foreach (header, value ; records) {} Or accessing a value by header: foreach (record ; records) { auto value = record["a"]; auto val = record.b; // perhaps using opDispatch as well } -- /Jacob CarlborgIf I implemented it this way then the entire record would be read in when a header exists. auto records = csvText!(string[string])(str, ["b"]); Would that be acceptable?
Oct 28 2011
On 2011-10-28 19:07, Jesse Phillips wrote:Jacob Carlborg Wrote:That would be acceptable to me, don't know what how other would see it. -- /Jacob CarlborgIf a header is specified, is it possible to iterate over the content as an associative array, something like this: string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562"; auto records = csvText(str, ["b"]); foreach (header, value ; records) {} Or accessing a value by header: foreach (record ; records) { auto value = record["a"]; auto val = record.b; // perhaps using opDispatch as well } -- /Jacob CarlborgIf I implemented it this way then the entire record would be read in when a header exists. auto records = csvText!(string[string])(str, ["b"]); Would that be acceptable?
Oct 28 2011
On 10/28/11 10:18 AM, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvI like the idea of reading a CSV into a struct, or treating all fields as ints. But, in my experience, CSVs are never perfect... because humans fill them, and humans aren't perfect or don't know "Oh, of course I must fill only numbers in this column, otherwise programs will explode". So although it's a nice functionality, I think just returning everything as strings is more than enough.
Oct 28 2011
Ary Manzana Wrote:I like the idea of reading a CSV into a struct, or treating all fields as ints. But, in my experience, CSVs are never perfect... because humans fill them, and humans aren't perfect or don't know "Oh, of course I must fill only numbers in this column, otherwise programs will explode". So although it's a nice functionality, I think just returning everything as strings is more than enough.If you are dealing with incorrect data you must deal with it. If you expect everything to be an int, and it isn't what are you going to do? I should check how easy it is to recover from an error.
Oct 28 2011
On 10/28/11 1:14 PM, Jesse Phillips wrote:Ary Manzana Wrote:You are right. I see that you throw ConvException, because when some to!... fails it throws that. But you don't catch it. So you'd get something like "Can't convert 'hello' to an int", but you loose the information of which row and column caused the problem. So maybe catching it and throwing a more detailed exception... Also, sometimes people fill out numbers with some comments, like "1 apple", so maybe a less strict parsing for numbers might be good (as an option). Just some comments :)I like the idea of reading a CSV into a struct, or treating all fields as ints. But, in my experience, CSVs are never perfect... because humans fill them, and humans aren't perfect or don't know "Oh, of course I must fill only numbers in this column, otherwise programs will explode". So although it's a nice functionality, I think just returning everything as strings is more than enough.If you are dealing with incorrect data you must deal with it. If you expect everything to be an int, and it isn't what are you going to do? I should check how easy it is to recover from an error.
Oct 28 2011
Ary Manzana Wrote:You are right. I see that you throw ConvException, because when some to!... fails it throws that. But you don't catch it. So you'd get something like "Can't convert 'hello' to an int", but you loose the information of which row and column caused the problem. So maybe catching it and throwing a more detailed exception...I'll look into that.Also, sometimes people fill out numbers with some comments, like "1 apple", so maybe a less strict parsing for numbers might be good (as an option).Loose rules are hard to setup to make everyone happy. I think this use case should be left to the user. I just want to get the information out quickly. Adding a row count shouldn't be too bad.
Oct 28 2011
On 10/28/2011 6:18 AM, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvFirst off, mucho thanks to Jesse for writing this. It's an important module for Phobos. This is a review of the documentation: Include this link: http://en.wikipedia.org/wiki/Comma-separated_values so the programmer can learn more about CSV. "This parser will loosely follow the RFC-4180" How exactly will it differ from RFC-4180? "No exceptions are thrown due to incorrect CSV." What happens, then, when the CSV is incorrect? "csvText" Be more specific about what csvText() returns. I'm confused about the heading parameter to csvText(). "the Content of the input" What is that? There's no parameter named "Content". "RecordList" "Range which provides access to CSV Records and Fields." What is the Range? Do you mean RecordList is a Range? And what does "provides access" mean? If RecordList is a Range, why is it labeled a "List"? "Array of the heading contained in the file." Now I'm really confused as to what a heading is. I thought it was the first line? Now there's an array of them? No examples given for constructors. No documentation for front() or empty(). "Record" "Returned by a RecordList when Contents is a non-struct." RecordList is a struct, it does not return anything. No examples. "csvNextToken" Example?
Oct 28 2011
On 10/28/2011 12:18 PM, Walter Bright wrote:Formal review of Jesse Phillips's CSV parser moduleWalter, Asking for high precision documentation is fine. And since you face it ... What about having such a high precision for D2 language definition ? std.csv is pretty cool. No doubt, But shouldn't std.csv be based on std.lexer ? A+
Oct 29 2011
On Sat, 29 Oct 2011 15:27:22 -0700, bls wrote:On 10/28/2011 12:18 PM, Walter Bright wrote:You have to start somewhere.Formal review of Jesse Phillips's CSV parser moduleWalter, Asking for high precision documentation is fine. And since you face it ... What about having such a high precision for D2 language definition ?std.csv is pretty cool. No doubt, But shouldn't std.csv be based on std.lexer ? A+Well, std.lexer doesn't exist. And if it is reasonable to use one, then it could be done without changing the API. I think the database API stuff will have much more influence on what needs to be done with std.csv, but I think what is currently implemented must remain as is.
Oct 29 2011
On 10/29/2011 03:56 PM, Jesse Phillips wrote:There is somebody working on std.lexer (a generic lexer) (as far as I remember the project started as std.range stress test ) However, I have no problem to imagine that f.i. the Tokenizer will be part of the std.lexer module. :) Regarding std.database.. Please elaborate/ ATM I just can imagine that std.csv will be used for Import/Export.std.csv is pretty cool. No doubt, But shouldn't std.csv be based onWell, std.lexer doesn't exist. And if it is reasonable to use one, then it could be done without changing the API. I think the database API stuff will have much more influence on what needs to be done with std.csv, but I think what is currently implemented must remain as is.std.lexer ? A+
Oct 29 2011
On Sat, 29 Oct 2011 17:05:01 -0700, bls wrote:Regarding std.database.. Please elaborate/ ATM I just can imagine that std.csv will be used for Import/Export.CSV is tabular data, so it would make sense that you may have a similar interface as you would a database. csv.select("name").where!"a==4"("years") Or whatever the interface look like.
Oct 29 2011
On 10/29/2011 3:27 PM, bls wrote:What about having such a high precision for D2 language definition ?Specific suggestions are welcome. Patches are even more welcome.std.csv is pretty cool. No doubt, But shouldn't std.csv be based on std.lexer ?I don't see a similarity.
Oct 29 2011
Thank you all. I've made my first round of updates to the documentation, will be taking another pass and looking at adding to functionality. I'd like some help with Walters comments as described below. On Fri, 28 Oct 2011 12:18:47 -0700, Walter Bright wrote:On 10/28/2011 6:18 AM, dsimcha wrote:Thank you. Fun fact, many popular languages don't include a CSV parser,Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvFirst off, mucho thanks to Jesse for writing this. It's an important module for Phobos.This is a review of the documentation: Include this link: http://en.wikipedia.org/wiki/Comma-separated_values so the programmer can learn more about CSV.Done, added See Also section."This parser will loosely follow the RFC-4180" How exactly will it differ from RFC-4180?I do not know how to address this. After making this statement I list the criteria. I don't see the need to list where it differs: * Records terminate with LF, CR and not just CRLF. * Using Double Quotes or Comma can be customized. I feel that I summarized well enough that listing yet another would be redundant."No exceptions are thrown due to incorrect CSV." What happens, then, when the CSV is incorrect?Added list."csvText" Be more specific about what csvText() returns.You are right. csvText() return a RecordList, but I explain it as though it were the Range. I'll think about what detail here, any suggestions are welcome.I'm confused about the heading parameter to csvText().Well I hope some reorganizing and an added example will help."the Content of the input" What is that? There's no parameter named "Content".I've replace all instances of Content with Contents. It is a template parameter so hopefully this resolves the confusion."RecordList" "Range which provides access to CSV Records and Fields." What is the Range? Do you mean RecordList is a Range? And what does "provides access" mean? If RecordList is a Range, why is it labeled a "List"?Considering it is the first documentation line of RecordList, I don't see where the confusion on "what the range is." I've re-written the line, it's shorter. Ok, be prepared for confusion. RecordList is a range of records, it is not named RecordRange because Record is also a range, a range of fields. If you have any suggestions for a better name, such as Document or TabularData Please let me know what you'd prefer."Array of the heading contained in the file." Now I'm really confused as to what a heading is. I thought it was the first line? Now there's an array of them?The line did not say "headings," but I have change it to: "Heading from the input in array form."No examples given for constructors.I provide an example of using RecordList. Any example for the constructors would be the same or similar to the 7 other examples. Is an example needed for both constructors, and maybe an example for front too?No documentation for front() or empty().These are range primitives, those programming in D already know "returns the current element of the range." I find it more appropriate to describe what is returned by front() in the description of the range (many ranges have become private and have no documentation at all, not even their existence). I will of course add documentation, but recommendations on non-pointless documentation would be welcome. Or how to handle the different types which can be assigned to Contents."Record" "Returned by a RecordList when Contents is a non-struct." RecordList is a struct, it does not return anything.It is a range, are we not allowed to talk of them as though they were high-level concepts? Do I leave off where you'd obtain a Record (it is private). Should I say "Returned by an instance of RecordList when calling front?"No examples.It is private, removing doc comment for constructor."csvNextToken" Example?Added. Hopefully I addressed most of the documentation. Keep them coming.
Oct 29 2011
On Saturday, October 29, 2011 22:33:30 Jesse Phillips wrote:CSV parser libraries exist for Java (IIRC Apache has one), but there definitely isn't one in Java's standard library. CSV parsers are not the sort of thing that's generally found in a standard library from what I've seen. So, having one will make us abnormal (albeit in a good way). - Jonathan M DavisOn 10/28/2011 6:18 AM, dsimcha wrote: First off, mucho thanks to Jesse for writing this. It's an important module for Phobos.Thank you. Fun fact, many popular languages don't include a CSV parser,
Oct 29 2011
On Sat, 29 Oct 2011 17:19:44 -0700, Jonathan M Davis wrote:On Saturday, October 29, 2011 22:33:30 Jesse Phillips wrote:That is all I meant by it. A CSV parser does exist for every language, ini so people would use XML. It really is sad that instead of something solid they go the way of "discouraging."CSV parser libraries exist for Java (IIRC Apache has one), but there definitely isn't one in Java's standard library. CSV parsers are not the sort of thing that's generally found in a standard library from what I've seen. So, having one will make us abnormal (albeit in a good way). - Jonathan M DavisOn 10/28/2011 6:18 AM, dsimcha wrote: First off, mucho thanks to Jesse for writing this. It's an important module for Phobos.Thank you. Fun fact, many popular languages don't include a CSV parser,
Oct 29 2011
Le 30/10/2011 03:09, Jesse Phillips a écrit :That is all I meant by it. A CSV parser does exist for every language, ini so people would use XML. It really is sad that instead of something solid they go the way of "discouraging."Yes, especially since CSV files are still widely used and prove in many cases far more usable than XML or even JSON. Good luck importing JSON in Excel. Dude
Nov 05 2011
On 10/29/2011 3:33 PM, Jesse Phillips wrote:I saw the criteria listed, but I didn't see how it was connected with it being different from RFC-4180. Are those criteria the differences? If so, please clarify that it is."This parser will loosely follow the RFC-4180" How exactly will it differ from RFC-4180?I do not know how to address this. After making this statement I list the criteria. I don't see the need to list where it differs: * Records terminate with LF, CR and not just CRLF. * Using Double Quotes or Comma can be customized. I feel that I summarized well enough that listing yet another would be redundant.When it says "List" I think of a specific data structure, i.e. linked list. Since it is not, it shouldn't be labeled as one. How about simply "Records" ?"RecordList" "Range which provides access to CSV Records and Fields." What is the Range? Do you mean RecordList is a Range? And what does "provides access" mean? If RecordList is a Range, why is it labeled a "List"?Considering it is the first documentation line of RecordList, I don't see where the confusion on "what the range is." I've re-written the line, it's shorter. Ok, be prepared for confusion. RecordList is a range of records, it is not named RecordRange because Record is also a range, a range of fields. If you have any suggestions for a better name, such as Document or TabularData Please let me know what you'd prefer.An example would make this much, much clearer."Array of the heading contained in the file." Now I'm really confused as to what a heading is. I thought it was the first line? Now there's an array of them?The line did not say "headings," but I have change it to: "Heading from the input in array form."Is an example needed for both constructors, and maybe an example for front too?I know that when one is intimately familiar with the code, more examples seem redundant. They aren't to the new user!I suggest at a minimum that the docs for these say that they are part of the range interface. I want to press Andrei to write up a doc on "What's a Range" and then you can just provide a link to the right spot in that.No documentation for front() or empty().These are range primitives, those programming in D already know "returns the current element of the range." I find it more appropriate to describe what is returned by front() in the description of the range (many ranges have become private and have no documentation at all, not even their existence). I will of course add documentation, but recommendations on non-pointless documentation would be welcome. Or how to handle the different types which can be assigned to Contents.I think it is very important to be precise in the use of terms in technical documentation. A Range doesn't "return" anything either, though there is a method of Range which does. A range is a set of elements that can be iterated through."Record" "Returned by a RecordList when Contents is a non-struct." RecordList is a struct, it does not return anything.It is a range, are we not allowed to talk of them as though they were high-level concepts? Do I leave off where you'd obtain a Record (it is private). Should I say "Returned by an instance of RecordList when calling front?"
Oct 29 2011
Thank you for the response, only one more follow-up question. Records Record Don't these look easy to confuse? And funny if we didn't have type inference. Records records = csvText(str); foreach(Record record; records) ... Anyone want to complain about this suggestion? Record is private and can only be obtained from Records (is that appropriate phrasing?).
Oct 30 2011
On Sun, 30 Oct 2011 17:02:26 +0000, Jesse Phillips wrote:Thank you for the response, only one more follow-up question. Records RecordIf it's any consolation, I'm suffering from similar questions to myself in my naming of structs and methods for my database interface experiments. Since it is rather a short step from a generic 'CSV' interface to a database interface, you may be determining the terminology that I should use. I should really divert for a day and look at what you have done. It would be interesting to see if I could write a similar interface over your stuff. But there's so much to do! Steve
Oct 30 2011
On 10/30/2011 10:02 AM, Jesse Phillips wrote:Thank you for the response, only one more follow-up question. Records Record Don't these look easy to confuse? And funny if we didn't have type inference. Records records = csvText(str); foreach(Record record; records) ... Anyone want to complain about this suggestion? Record is private and can only be obtained from Records (is that appropriate phrasing?).I started using that convention in dmd, i.e. Objects is an array of Object, and after I got used to it I like it fine.
Oct 30 2011
On 10/29/2011 10:03 PM, Walter Bright wrote:I want to press Andrei to write up a doc on "What's a Range"As a D1 user planning to migrate, this would be very appreciated.
Oct 30 2011
On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlChecked the new docs today. If I'm reading them right, the top example prints: "Fred works as a Fly and earns $4 per year" Is this a pop culture reference I'm not catching? An idea I had the other day was to include convenience presets for the more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...) -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Oct 31 2011
On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:No, not that I know of. Should it be? Should I go with a more professional like example?Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlChecked the new docs today. If I'm reading them right, the top example prints: "Fred works as a Fly and earns $4 per year" Is this a pop culture reference I'm not catching?An idea I had the other day was to include convenience presets for the more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)Well if Excel actually conforms to what it claims: http://office.microsoft.com/en-us/excel-help/excel-formatting-and- features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx? CTT=5&origin=HP010099725#BM4 Which I doubt it does, then my parser already defaults to being able to read such formats. In my experience Excel sucks at CSV and follows no rules. The implementation I choose as default is the most common and any other style is just a butchafication and likely to be unreliable, usually stems from "my data has commas, so I'll use colon." Which is great until the data also has colon. But maybe that is just my experience.
Oct 31 2011
On Tuesday, November 01, 2011 01:28:07 Jesse Phillips wrote:On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:As I recall, part of the problem is that Excel's behavior with regards to CSV is locale-dependent. - Jonathan M DavisOn Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:No, not that I know of. Should it be? Should I go with a more professional like example?Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlChecked the new docs today. If I'm reading them right, the top example prints: "Fred works as a Fly and earns $4 per year" Is this a pop culture reference I'm not catching?An idea I had the other day was to include convenience presets for the more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)Well if Excel actually conforms to what it claims: http://office.microsoft.com/en-us/excel-help/excel-formatting-and- features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx? CTT=5&origin=HP010099725#BM4 Which I doubt it does, then my parser already defaults to being able to read such formats. In my experience Excel sucks at CSV and follows no rules.
Oct 31 2011
On 2011-11-01 02:28, Jesse Phillips wrote:On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:Excel + CSV == Pain in the ass Excel uses different value delimiters as the default setting depending on the locale of the Excel application. -- /Jacob CarlborgOn Fri, 28 Oct 2011 16:18:27 +0300, dsimcha<dsimcha yahoo.com> wrote:No, not that I know of. Should it be? Should I go with a more professional like example?Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlChecked the new docs today. If I'm reading them right, the top example prints: "Fred works as a Fly and earns $4 per year" Is this a pop culture reference I'm not catching?An idea I had the other day was to include convenience presets for the more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)Well if Excel actually conforms to what it claims: http://office.microsoft.com/en-us/excel-help/excel-formatting-and- features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx? CTT=5&origin=HP010099725#BM4 Which I doubt it does, then my parser already defaults to being able to read such formats. In my experience Excel sucks at CSV and follows no rules. The implementation I choose as default is the most common and any other style is just a butchafication and likely to be unreliable, usually stems from "my data has commas, so I'll use colon." Which is great until the data also has colon. But maybe that is just my experience.
Nov 02 2011
On 02.11.2011 09:02, Jacob Carlborg wrote:On 2011-11-01 02:28, Jesse Phillips wrote:Far more horrible than that -- in an old Excel from around 2000 the user interface used US settings for reading short dates, if the day of the month was 12 or less. Otherwise it uses the locale. But it always uses the locale for displaying them. So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan. I wrote a macro called "November 9" (9/11) for swapping them back if they were in the buggy range.On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:Excel + CSV == Pain in the ass Excel uses different value delimiters as the default setting depending on the locale of the Excel application.On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha<dsimcha yahoo.com> wrote:No, not that I know of. Should it be? Should I go with a more professional like example?Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlChecked the new docs today. If I'm reading them right, the top example prints: "Fred works as a Fly and earns $4 per year" Is this a pop culture reference I'm not catching?An idea I had the other day was to include convenience presets for the more common flavours of CSV formats, e.g.: csvText!CSVFormat_Excel(...)Well if Excel actually conforms to what it claims: http://office.microsoft.com/en-us/excel-help/excel-formatting-and- features-that-are-not-transferred-to-other-file-formats-HP010014105.aspx? CTT=5&origin=HP010099725#BM4 Which I doubt it does, then my parser already defaults to being able to read such formats. In my experience Excel sucks at CSV and follows no rules. The implementation I choose as default is the most common and any other style is just a butchafication and likely to be unreliable, usually stems from "my data has commas, so I'll use colon." Which is great until the data also has colon. But maybe that is just my experience.
Nov 11 2011
On 2011-11-11 12:50, Don wrote:Far more horrible than that -- in an old Excel from around 2000 the user interface used US settings for reading short dates, if the day of the month was 12 or less. Otherwise it uses the locale. But it always uses the locale for displaying them. So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan. I wrote a macro called "November 9" (9/11) for swapping them back if they were in the buggy range.Haha that's just insane. -- /Jacob Carlborg
Nov 11 2011
On Fri, 11 Nov 2011 13:50:52 +0200, Don <nospam nospam.com> wrote:Far more horrible than that -- in an old Excel from around 2000 the user interface used US settings for reading short dates, if the day of the month was 12 or less. Otherwise it uses the locale. But it always uses the locale for displaying them. So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan. I wrote a macro called "November 9" (9/11) for swapping them back if they were in the buggy range.Wow. Reminds me of this: http://thedailywtf.com/Articles/TransAtlantic-Time-Trap.aspx -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Nov 11 2011
On Tue, 01 Nov 2011 03:28:07 +0200, Jesse Phillips <jessekphillips+d gmail.com> wrote:On Mon, 31 Oct 2011 19:21:02 +0200, Vladimir Panteleev wrote:I guess such things are really subjective, but personally to me it does seem a bit silly, and out-of-place from the documentation of a language that aims to be the successor of C++. Have you decided about whether you'll be adding a formatter? Regardless of that, IMO the naming of "csvText" is not ideal. What does the "text" part imply - isn't CSV inherently a text format? How about "csvReader", which would also disambiguate it from a CSV formatter? -- Best regards, Vladimir mailto:vladimir thecybershadow.netOn Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha yahoo.com> wrote:No, not that I know of. Should it be? Should I go with a more professional like example?Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.htmlChecked the new docs today. If I'm reading them right, the top example prints: "Fred works as a Fly and earns $4 per year" Is this a pop culture reference I'm not catching?
Nov 02 2011
On Wed, 02 Nov 2011 15:11:24 +0200, Vladimir Panteleev wrote:I guess such things are really subjective, but personally to me it does seem a bit silly, and out-of-place from the documentation of a language that aims to be the successor of C++.I'll make it a pop culture reference... kidding.Regardless of that, IMO the naming of "csvText" is not ideal. What does the "text" part imply - isn't CSV inherently a text format? How about "csvReader", which would also disambiguate it from a CSV formatter?Sounds reasonable to me. CSV could be applied to other things, not that it ever is. I started making the implementation very generic and had to cut back especially since I couldn't test it all. I had even played with the idea of being able to just return slices of a range of things https://github.com/he-the-great/JPDLibs/blob/separator/csv/csv.d but stopped its development in favor of something that work with input ranges and could get into Phobos.Have you decided about whether you'll be adding a formatter?Yes I'll give this a try, but I'm going to concentrate on documentation of what is there.
Nov 02 2011
On Fri, 28 Oct 2011 09:18:27 -0400, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvI've done another update this covers: - Documentation more complete - A provided heading can be any InputRange of string - csvText => csvReader - RecordList => Records - csvReader no longer takes a Malformed enum, and instead provides customization to comma and quote. Tomorrow I will try and complete one or more of the following with this priority: - Have row/column information for thrown exceptions. - Support [string][string] Contents. - Write a CSV formatter. Support for range of struct, range of [string] [string], and range of string[].
Nov 05 2011
And only one pending request, csvFormatter. I am still considering how I want to approach this. I don't want to rush it in so I don't plan on having it complete before voting starts. It has been fun and I look forward to the verdict. On Sat, 05 Nov 2011 23:37:43 +0000, Jesse Phillips wrote:On Fri, 28 Oct 2011 09:18:27 -0400, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvI've done another update this covers: - Documentation more complete - A provided heading can be any InputRange of string - csvText => csvReader - RecordList => Records - csvReader no longer takes a Malformed enum, and instead provides customization to comma and quote. Tomorrow I will try and complete one or more of the following with this priority: - Have row/column information for thrown exceptions. - Support [string][string] Contents. - Write a CSV formatter. Support for range of struct, range of [string] [string], and range of string[].
Nov 06 2011
On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:And only one pending request, csvFormatter. I am still considering how I want to approach this. I don't want to rush it in so I don't plan on having it complete before voting starts.My primary use of csv files is as a method of saving data to later graph / view / format in Excel. i.e. csvWrite("filename",["x","y","z"],x,y,z); where x, y, z are ranges with possibly different lengths and/or types. I also have some API design concerns. Primarily, most of the type names are overly broad and unclear. For example: Record, Records, Malformed could just as easily be part of a database API. And because of this, not only does one reading the code not know what the object is, but if namespace conflicts can occur, then code reviewers will probably make the wrong assumption half the time as to what Record is and not bother looking it up. Furthermore, Records and Record, are essentially internal ranges/structs returned by csvReader. In the tradition of std.algorithms these should be defined inside the csvReader function (preferably) or as CsvReader / CsvReader.Record.
Nov 07 2011
On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:I don't want to do anything with files, I'm thinking an output buffer makes the most sense (should be able to have such a buffer write to a file). The x,y,z seems interesting but what about the poor sole who has a range of struct { auto x,y,z; } If I'm going to make a csvFormatter than making sure something like csvFormatter(csvReader(txt)); works is number one priority. But you have added a style to consider when building the interface.And only one pending request, csvFormatter. I am still considering how I want to approach this. I don't want to rush it in so I don't plan on having it complete before voting starts.My primary use of csv files is as a method of saving data to later graph / view / format in Excel. i.e. csvWrite("filename",["x","y","z"],x,y,z); where x, y, z are ranges with possibly different lengths and/or types.I also have some API design concerns. Primarily, most of the type names are overly broad and unclear. For example: Record, Records, Malformed could just as easily be part of a database API. And because of this, not only does one reading the code not know what the object is, but if namespace conflicts can occur, then code reviewers will probably make the wrong assumption half the time as to what Record is and not bother looking it up.The reason it is name like what you would find in a database, is because they are the same thing. It is a row of data. In past discussion about existing name clashes it was concluded that was what modules, static, and named imports wore for. Malformed, I just don't have any clue what to actually call it. No CSVMalformed doesn't help. But then you go on to the point below, that you shouldn't even see them.Furthermore, Records and Record, are essentially internal ranges/structs returned by csvReader. In the tradition of std.algorithms these should be defined inside the csvReader function (preferably) or as CsvReader / CsvReader.Record.I was going to explain how wrong you are and how I tried having all the options part of csvReader but it just muddies up the clean API when customizing something. But actually I already found the solution with my previous updates and incorrectly chose to move Malformed customization out. Long story short the problems I see: * The documentation still needs to exist and will be a little harder to do without these * Records contains the heading field, this is abnormal for an InputRange and means it won't show as an auto link if it becomes a hidden range. * Currently you can find exactly which function call result in a specific error, if I hide the ranges you don't. I'm expecting that introducing CSVFormatter will require another review, at that time I can try making two forms of documentation to see which is preferred. Thank you! Very useful food for thought.
Nov 10 2011
On Friday, November 11, 2011 03:17:10 Jesse Phillips wrote:Actually, I'd argue that Row would be better than Record, since it _is_ a row in a table. Personally, I'd find it to be more immediately clear that way. With Record, I have to figure out what the heck it is a record of, whereas Row is immediately obvious in this context. - Jonathan M DavisI also have some API design concerns. Primarily, most of the type names are overly broad and unclear. For example: Record, Records, Malformed could just as easily be part of a database API. And because of this, not only does one reading the code not know what the object is, but if namespace conflicts can occur, then code reviewers will probably make the wrong assumption half the time as to what Record is and not bother looking it up.The reason it is name like what you would find in a database, is because they are the same thing. It is a row of data. In past discussion about existing name clashes it was concluded that was what modules, static, and named imports wore for.
Nov 11 2011
On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:Actually, I'd argue that Row would be better than Record, since it _is_ a row in a table. Personally, I'd find it to be more immediately clear that way. With Record, I have to figure out what the heck it is a record of, whereas Row is immediately obvious in this context. - Jonathan M DavisIt _is_ a record of data, just as much as it is a row in a table. The RFC I reference continuously refers to records and never even says "row."
Nov 11 2011
On Friday, November 11, 2011 15:25:29 Jesse Phillips wrote:On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:Well, then you have a good argument for keeping it as Record. But spreadsheets are tables of rows and columns, and a CSV file is a spreadsheet. Personally, I find the term record overly vague and wouldn't use it for much of anything - not to mention that it makes me think of the analog music device rather than anything else. So, I don't particularly like the name Record. I wouldn't use it in reference to a DB either. I'd use the term row there as well. But others may not agree with me, and if the RFC uses the term record, then that's a definite argument for using Record instead of Row. - Jonathan M DavisActually, I'd argue that Row would be better than Record, since it _is_ a row in a table. Personally, I'd find it to be more immediately clear that way. With Record, I have to figure out what the heck it is a record of, whereas Row is immediately obvious in this context. - Jonathan M DavisIt _is_ a record of data, just as much as it is a row in a table. The RFC I reference continuously refers to records and never even says "row."
Nov 11 2011
On 11 November 2011 19:07, Jonathan M Davis <jmdavisProg gmx.com> wrote:On Friday, November 11, 2011 15:25:29 Jesse Phillips wrote:+1 for row... although I appreciate the RFC says record all over the place. That said though, upon reading it, I think that might be the worst RFC ever written ;)On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:recordActually, I'd argue that Row would be better than Record, since it _is_ a row in a table. Personally, I'd find it to be more immediately clear that way. With Record, I have to figure out what the heck it is aWell, then you have a good argument for keeping it as Record. But spreadsheets are tables of rows and columns, and a CSV file is a spreadsheet. Personally, I find the term record overly vague and wouldn't use it for much of anything - not to mention that it makes me think of the analog music device rather than anything else. So, I don't particularly like the name Record. I wouldn't use it in reference to a DB either. I'd use the term row there as well. But others may not agree with me, and if the RFC uses the term record, then that's a definite argument for using Record instead of Row.of, whereas Row is immediately obvious in this context. - Jonathan M DavisIt _is_ a record of data, just as much as it is a row in a table. The RFC I reference continuously refers to records and never even says "row."
Nov 11 2011
On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:[snip]On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:Modules, static and named imports are extremely useful tools for making disparate third party libraries work together. But Phobos is one (big) library, not a bunch of separate ones. We shouldn't ever need to use these tools to resolve Phobos name conflicts. In fact one of the (many) goals of D is to scale down to quick and dirty scripts. rdmd helps with this a lot, but the other piece of this puzzle is 'import std.all;' which doesn't work today. And while I don't think we should break existing code just to fix this problem, going forward, as we update and add libraries to Phobos, we need to strive to prevent name collisions. And, like std.file and std.stdio, I'd expect std.csv and std.databases to be used together often. Furthermore, your primary case for exposing these structs is for documentation purposes. As the user isn't expected to actually instantiate these structs (or at least not often), naming them CsvReader and CsvReader.Row is perfectly acceptable. Lastly, I'd like to draw your attention to std.algorithm.findSplit. findSplit has to return three ranges, which it does so using a Tuple. Records is simply a manual tuple of the headings and the row ranges, so why not leverage Phobos for this instead?I also have some API design concerns. Primarily, most of the type names are overly broad and unclear. For example: Record, Records, Malformed could just as easily be part of a database API. And because of this, not only does one reading the code not know what the object is, but if namespace conflicts can occur, then code reviewers will probably make the wrong assumption half the time as to what Record is and not bother looking it up.The reason it is name like what you would find in a database, is because they are the same thing. It is a row of data. In past discussion about existing name clashes it was concluded that was what modules, static, and named imports wore for. Malformed, I just don't have any clue what to actually call it. No CSVMalformed doesn't help. But then you go on to the point below, that you shouldn't even see them.Furthermore, Records and Record, are essentially internal ranges/structs returned by csvReader. In the tradition of std.algorithms these should be defined inside the csvReader function (preferably) or as CsvReader / CsvReader.Record.I was going to explain how wrong you are and how I tried having all the options part of csvReader but it just muddies up the clean API when customizing something. But actually I already found the solution with my previous updates and incorrectly chose to move Malformed customization out. Long story short the problems I see: * The documentation still needs to exist and will be a little harder to do without these * Records contains the heading field, this is abnormal for an InputRange and means it won't show as an auto link if it becomes a hidden range. * Currently you can find exactly which function call result in a specific error, if I hide the ranges you don't. I'm expecting that introducing CSVFormatter will require another review, at that time I can try making two forms of documentation to see which is preferred. Thank you! Very useful food for thought.
Nov 11 2011
On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about the same as your example names? And if dmd could do partial matches (not exactly requesting such a feature) csv.Rows, csv.Row.On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:[snip]On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:Modules, static and named imports are extremely useful tools for making disparate third party libraries work together. But Phobos is one (big) library, not a bunch of separate ones. We shouldn't ever need to use these tools to resolve Phobos name conflicts. In fact one of the (many) goals of D is to scale down to quick and dirty scripts. rdmd helps with this a lot, but the other piece of this puzzle is 'import std.all;' which doesn't work today. And while I don't think we should break existing code just to fix this problem, going forward, as we update and add libraries to Phobos, we need to strive to prevent name collisions. And, like std.file and std.stdio, I'd expect std.csv and std.databases to be used together often. Furthermore, your primary case for exposing these structs is for documentation purposes. As the user isn't expected to actually instantiate these structs (or at least not often), naming them CsvReader and CsvReader.Row is perfectly acceptable.I also have some API design concerns. Primarily, most of the type names are overly broad and unclear. For example: Record, Records, Malformed could just as easily be part of a database API. And because of this, not only does one reading the code not know what the object is, but if namespace conflicts can occur, then code reviewers will probably make the wrong assumption half the time as to what Record is and not bother looking it up.The reason it is name like what you would find in a database, is because they are the same thing. It is a row of data. In past discussion about existing name clashes it was concluded that was what modules, static, and named imports wore for. Malformed, I just don't have any clue what to actually call it. No CSVMalformed doesn't help. But then you go on to the point below, that you shouldn't even see them.Furthermore, Records and Record, are essentially internal ranges/structs returned by csvReader. In the tradition of std.algorithms these should be defined inside the csvReader function (preferably) or as CsvReader / CsvReader.Record.I was going to explain how wrong you are and how I tried having all the options part of csvReader but it just muddies up the clean API when customizing something. But actually I already found the solution with my previous updates and incorrectly chose to move Malformed customization out. Long story short the problems I see: * The documentation still needs to exist and will be a little harder to do without these * Records contains the heading field, this is abnormal for an InputRange and means it won't show as an auto link if it becomes a hidden range. * Currently you can find exactly which function call result in a specific error, if I hide the ranges you don't. I'm expecting that introducing CSVFormatter will require another review, at that time I can try making two forms of documentation to see which is preferred. Thank you! Very useful food for thought.Lastly, I'd like to draw your attention to std.algorithm.findSplit. findSplit has to return three ranges, which it does so using a Tuple. Records is simply a manual tuple of the headings and the row ranges, so why not leverage Phobos for this instead?Records requires access to the header for building the associative array. So while I still could return both, the range would still be holding the header. If you think it would make an improvement then you'd probably want to update your vote to no. Namely what I would see coming from that: auto records = csvReader(txt); records.header ...; foreach(row; records.rows) { ... }
Nov 15 2011
On Tue, 15 Nov 2011 22:30:51 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:[snip]On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips <jessekphillips+d gmail.com> wrote:Yes and no. Yes, from the point of view of reading 'std.csv.Rows' in existing code. No, from the point of view writing code. You're much more likely to write just 'Rows' and then have to go through a quick-compile-fix-compile loop. Worse, consider the people using the colliding std.databases.Row struct, which now have to use the fully qualified name. I don't feel that overly succinct names are important to std.csv, as most users will be using 'auto records = csvReader(txt);' and will never have to type out std.csv.Rows or csvRows or whatever.Modules, static and named imports are extremely useful tools for making disparate third party libraries work together. But Phobos is one (big) library, not a bunch of separate ones. We shouldn't ever need to use these tools to resolve Phobos name conflicts. In fact one of the (many) goals of D is to scale down to quick and dirty scripts. rdmd helps with this a lot, but the other piece of this puzzle is 'import std.all;' which doesn't work today. And while I don't think we should break existing code just to fix this problem, going forward, as we update and add libraries to Phobos, we need to strive to prevent name collisions. And, like std.file and std.stdio, I'd expect std.csv and std.databases to be used together often. Furthermore, your primary case for exposing these structs is for documentation purposes. As the user isn't expected to actually instantiate these structs (or at least not often), naming them CsvReader and CsvReader.Row is perfectly acceptable.I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about the same as your example names?And if dmd could do partial matches (not exactly requesting such a feature) csv.Rows, csv.Row.Thank you for explaining. I wasn't seeing the implementation efficiency that comes from the custom struct approach.Lastly, I'd like to draw your attention to std.algorithm.findSplit. findSplit has to return three ranges, which it does so using a Tuple. Records is simply a manual tuple of the headings and the row ranges, so why not leverage Phobos for this instead?Records requires access to the header for building the associative array. So while I still could return both, the range would still be holding the header. If you think it would make an improvement then you'd probably want to update your vote to no. Namely what I would see coming from that: auto records = csvReader(txt); records.header ...; foreach(row; records.rows) { ... }
Nov 15 2011
On Tue, 15 Nov 2011 23:12:57 -0500, Robert Jacques wrote:Thank you for explaining. I wasn't seeing the implementation efficiency that comes from the custom struct approach.No, thank you. It is good to have someone fighting a different opinion. It is because of you we can have this bike-shedding and not even have shed :D. Had the hiding of both structs not worked so well I would have changed it (as I would have thought on it more, realized it was two against zero, and wouldn't be distracted with the idea of simplifying the docs)
Nov 15 2011
On 10/28/2011 9:18 AM, dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csvI apologize. I just noticed that I forgot to put the end date in this message. The review of the CSV parser runs the standard two weeks, until November 11.
Nov 06 2011
On Friday, October 28, 2011 09:18:27 dsimcha wrote:Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csv- Nitpick: You should probably proofread the module documentation again. It has several mispellings and grammar errors. - The functions and types mentioned in the documentation should use LREF so that they're links to their documentation. - Shouldn't the term "header" be used rather than "heading?" So, you'd have HeaderMismatchException instead of HeadingMismatchException. Usually, the term heading is only used for stuff like the heading of an article. Usually, the term header is used for stuff like files. RFC 4180 uses the term header but never heading. So, unless I'm missing something, I really think that all of the instances of heading (both in the documentation and in the API) should be changed to header. - I'd argue against Separator in csvReader being char by default. Individual characters should generally be dchar, not char. Using char by itself is almost always bad practice IMHO. - I'd say that csvReader should say that it returns a Records struct which is an input range rather than saying that it provides std.range.isInputRange. We just don't talk that way about ranges normally. We say that something is a particular range type, not that it provides the API for a particular range type. If you want to make "is an input range" a link to std.range.isInputRange, that's fine, but what's currently there is not how ranges are typically referred to. I'd also suggest that you on the range functions where you say that they're part of the std.range.isInputRange interface, you should use the term API instead of interface, since interface has a specific meaning in the language which has nothing to do with this context. - Why does csvNextToken take an Appender!(char[])? That seems overly restrictive. I'd argue that it should take an output range of dchar (which Appender!(char[]) is). Then it's not restricted to Appender, and it's not restricted to char[]. - Also, the fact that you have isSomeChar!(ElementType!Range) concerns me somewhat. Normally, I'd expect something more like is(ElementType!Range == dchar). As it is, you could have a range of char or wchar, which could do funny things. Phobos doesn't normally support that, and I'd be concerned about what would happen if someone tried. ElementType!Range will be dchar for all string types, so they'll work fine either way, but programmers who are overly adventurous or ignorant may try and use a char or wchar range, and I'm willing to be that there would be problems with that. Just all around, I'd advise checking that the ranges are ranges of dchar and using dchar for the separators rather than char. - You should have unit tests testing ranges _other_ than string. The fact that neither wstring nor dstring is tested is worrisome, and ideally, you'd do stuff like call filter!"true" on a string so that you have an input range which isn't a string. Without at least some of that, I'd be very worried that your code can't actually handle generic ranges of dchar and really only works with string. - You should have some unit tests with unicode values - preferably some which are multiple code units for both char and wchar. For instance, std.string uses \U00010143 in some of its tests, because it's 4 code units in UTF-8 and 2 in UTF-16. The complete lack of testing with unicode is worrisome in the same way that only testing with string is worrisome. It's too easy to screw something up such that it happens to work with string and ASCII but not in general. So, please add tests which involve unicode - not only in the data but in the separators. - I find it somewhat odd that you would use a pointer to the range in Record rather than using a slice, but I'd have to examine it in more detail to see whether that's actually necessary or not. But in general, I'd argue that a slice should be used rather than a pointer if it can be. - Jonathan M Davis P.S. Just as a note, since you seem to like not using braces if you don't need to for if statements and the like, I though that I'd point out that try and catch don't require braces in D if htey contain only one statement. You have several one line catches which use braces when they don't need to. I'm not arguing that you should or shouldn't change it, but I thought that I would point out that the braces aren't necessary in case you didn't know (since they _are_ necessary in other languages).
Nov 10 2011
Thank you, lots of stuff to ponder and comment. On Thu, 10 Nov 2011 02:10:28 -0800, Jonathan M Davis wrote:On Friday, October 28, 2011 09:18:27 dsimcha wrote:Err, I'll try, use some of the great tips you gave me :)Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread. Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html Code: https://github.com/he-the-great/phobos/tree/csv- Nitpick: You should probably proofread the module documentation again. It has several mispellings and grammar errors.- The functions and types mentioned in the documentation should use LREF so that they're links to their documentation.I'll see if I can find them all while proofing.- Shouldn't the term "header" be used rather than "heading?" So, you'd have HeaderMismatchException instead of HeadingMismatchException. Usually, the term heading is only used for stuff like the heading of an article. Usually, the term header is used for stuff like files. RFC 4180 uses the term header but never heading. So, unless I'm missing something, I really think that all of the instances of heading (both in the documentation and in the API) should be changed to header.Used both in development, then chose one, guess I got it wrong. Changing.- I'd argue against Separator in csvReader being char by default. Individual characters should generally be dchar, not char. Using char by itself is almost always bad practice IMHO.The only reason I put defaults on it was because it wouldn't choose the value for it from my default values. Otherwise it works with what you give it, be it char, wchar, dchar.- I'd say that csvReader should say that it returns a Records struct which is an input range rather than saying that it provides std.range.isInputRange. We just don't talk that way about ranges normally. We say that something is a particular range type, not that it provides the API for a particular range type. If you want to make "is an input range" a link to std.range.isInputRange, that's fine, but what's currently there is not how ranges are typically referred to. I'd also suggest that you on the range functions where you say that they're part of the std.range.isInputRange interface, you should use the term API instead of interface, since interface has a specific meaning in the language which has nothing to do with this context.Thanks.- Why does csvNextToken take an Appender!(char[])? That seems overly restrictive. I'd argue that it should take an output range of dchar (which Appender!(char[]) is). Then it's not restricted to Appender, and it's not restricted to char[].Appender is used so that the exception thrown can contain the consumed data. Thinking again on this, I suppose I can catch the exception add the data and rethrow. Sounds pretty good actually.- Also, the fact that you have isSomeChar!(ElementType!Range) concerns me somewhat. Normally, I'd expect something more like is(ElementType!Range == dchar). As it is, you could have a range of char or wchar, which could do funny things. Phobos doesn't normally support that, and I'd be concerned about what would happen if someone tried. ElementType!Range will be dchar for all string types, so they'll work fine either way, but programmers who are overly adventurous or ignorant may try and use a char or wchar range, and I'm willing to be that there would be problems with that. Just all around, I'd advise checking that the ranges are ranges of dchar and using dchar for the separators rather than char.I was of a different mindset of the time. You are probably right I should prevent someone from eating their leg off.- You should have unit tests testing ranges _other_ than string. The fact that neither wstring nor dstring is tested is worrisome, and ideally, you'd do stuff like call filter!"true" on a string so that you have an input range which isn't a string. Without at least some of that, I'd be very worried that your code can't actually handle generic ranges of dchar and really only works with string.Did add a wchar input range test, but now it should just be an dchar input range test since wchar ranges won't be allowed.- You should have some unit tests with unicode values - preferably some which are multiple code units for both char and wchar. For instance, std.string uses \U00010143 in some of its tests, because it's 4 code units in UTF-8 and 2 in UTF-16. The complete lack of testing with unicode is worrisome in the same way that only testing with string is worrisome. It's too easy to screw something up such that it happens to work with string and ASCII but not in general. So, please add tests which involve unicode - not only in the data but in the separators.Didn't create any new tests, just added unicode in to input and as separators into old tests.- I find it somewhat odd that you would use a pointer to the range in Record rather than using a slice, but I'd have to examine it in more detail to see whether that's actually necessary or not. But in general, I'd argue that a slice should be used rather than a pointer if it can be.Record requires modifying the range, for a true input range this is not a problem, but strings and some other struct based ranges are implicit forward ranges and save upon passing. Slicing doesn't exist on input ranges, though I'd almost argue [] should take the place of save() in a forward range.- Jonathan M Davis P.S. Just as a note, since you seem to like not using braces if you don't need to for if statements and the like, I though that I'd point out that try and catch don't require braces in D if htey contain only one statement. You have several one line catches which use braces when they don't need to. I'm not arguing that you should or shouldn't change it, but I thought that I would point out that the braces aren't necessary in case you didn't know (since they _are_ necessary in other languages).Thanks, hopefully I'm consistent in that regard and I'll just leave it. I knew of try, but not catch.
Nov 10 2011