
digitalmars.D.announce - text based file formats

Robert Schadek <rburners gmail.com> writes:
I complained before that D and Phobos need more stuff.
I can't do it all by myself, but I can ask for help.

So here it goes https://github.com/burner/textbasedfileformats

As it says on the tin, text based file formats is a library of SAX and
DOM parsers for text-based file formats.

I would like to get the following file formats in:

* json (JSON5), there is actually some code in there already
* xml, there is some code already, the old std.experimental.xml 
code
* yaml, maybe there is something in code.dlang.org to be reused
* toml, maybe there is something in code.dlang.org to be reused
   * ini, can likely be parsed by the toml parser
* sdl, I know I know, but D uses it.

There are a few design guidelines I would like to adhere to:
* If it exists in Phobos, use Phobos
* have the DOM parser based on the SAX parser
* no return by ref
* make it @safe and pure if possible (and it's likely possible)
* share the std.sumtype type if possible (yaml, toml should work)
* no @nogc, this should eventually get into Phobos
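The "DOM parser based on the SAX parser" guideline can be illustrated with a minimal D sketch for an INI-like format. It assumes a handler type exposing hypothetical `onSection`/`onKeyValue` callbacks; none of these names come from the textbasedfileformats repository, and a real SAX layer would also report positions and errors.

```d
// Hypothetical sketch: saxParseIni, DomBuilder and parseIniDom are
// illustrative names, not the textbasedfileformats API.
import std.algorithm.iteration : splitter;
import std.string : indexOf, strip;

/// SAX layer: walks INI-style text line by line and reports events to
/// `handler` without building any tree itself.
void saxParseIni(Handler)(string text, ref Handler handler)
{
    foreach (rawLine; text.splitter('\n'))
    {
        auto line = rawLine.strip;
        if (line.length == 0 || line[0] == ';')
            continue; // skip blank lines and comments
        if (line.length >= 2 && line[0] == '[' && line[$ - 1] == ']')
        {
            handler.onSection(line[1 .. $ - 1].strip);
            continue;
        }
        const eq = line.indexOf('=');
        if (eq > 0) // lines without "key =" are ignored in this sketch
            handler.onKeyValue(line[0 .. eq].strip, line[eq + 1 .. $].strip);
    }
}

/// DOM layer: a handler that only records the SAX events into a tree
/// (section -> key -> value). Keys before any section land under "".
struct DomBuilder
{
    string current;
    string[string][string] sections;

    void onSection(string name) { current = name; }
    void onKeyValue(string key, string value) { sections[current][key] = value; }
}

/// The DOM parser is nothing more than the SAX parser plus DomBuilder.
string[string][string] parseIniDom(string text)
{
    DomBuilder builder;
    saxParseIni(text, builder);
    return builder.sections;
}
```

Because the DOM builder is just one handler, other handlers (validators, streaming filters) could reuse the same SAX layer without ever building a tree.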

So stop talking, and start creating PRs.
For the project admin stuff, this will use GitHub. There are
milestones for the five formats, so please start creating the
issues you want to/can work on and start typing.
Dec 18 2022
Adam D Ruppe <destructionator gmail.com> writes:
On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:
 * xml, there is some code already, the old std.experimental.xml 
 code
My dom.d doesn't do the SAX parser part, but it has its own advantages over the other things (including being continually maintained for over a decade, unlike the Phobos things).
Dec 18 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 19/12/2022 4:56 AM, Robert Schadek wrote:
 * xml, there is some code already, the old std.experimental.xml code
I've toyed with std.experimental.xml. I'm not convinced that it is a good code base for inclusion.
 * no return by ref
As a bit of a follow-up to what we were talking about at BeerConf: because these are not data structures, they won't own externally facing memory (that's the GC's job), so these lifetime issues with ref should never be encountered.
 * make it @safe and pure if possible (and it's likely possible)
pure is always a worry for me, but yeah, @safe and ideally nothrow (if they are forgiving, which they absolutely should be, there is no reason to throw an exception until it's time to inspect it).
Dec 18 2022
Adrian Matoga <dlang.spam matoga.info> writes:
On Sunday, 18 December 2022 at 16:12:35 UTC, rikki cattermole 
wrote:
 * make it @safe and pure if possible (and it's likely possible)
pure is always a worry for me, but yeah, @safe and ideally nothrow (if they are forgiving, which they absolutely should be, there is no reason to throw an exception until it's time to inspect it).
I frequently find it useful for a text data file parser to call a diagnostic callback instead of assuming some default behavior (whether that's forgiving, printing warnings, throwing, or something else). With template callback parameters, the parser can throw if the user wants it to, or stay pure nothrow if no action is required.
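A minimal D sketch of the template-callback idea, assuming a hypothetical `parseInts` parser for comma-separated integers (not part of any existing library). Since D infers `@safe`, `pure` and `nothrow` for templates, the same parser is `pure nothrow` when the callback ignores problems, and may throw when the callback throws.

```d
// Hypothetical sketch: parseInts and its (fieldIndex, message) callback
// protocol are illustrative, not an existing library API.

/// Parses comma-separated non-negative integers. Malformed fields are
/// reported to `onDiagnostic(fieldIndex, message)` and skipped.
/// Because this is a template, D infers @safe/pure/nothrow from the callback.
long[] parseInts(alias onDiagnostic)(string input)
{
    long[] result;
    size_t field, start;
    foreach (i; 0 .. input.length + 1)
    {
        if (i < input.length && input[i] != ',')
            continue; // still inside the current field
        const token = input[start .. i];
        long value;
        bool ok = token.length > 0;
        foreach (c; token) // overflow is not checked in this sketch
        {
            if (c < '0' || c > '9') { ok = false; break; }
            value = value * 10 + (c - '0');
        }
        if (ok)
            result ~= value;
        else
            onDiagnostic(field, "not a non-negative integer");
        ++field;
        start = i + 1;
    }
    return result;
}

/// Lenient caller: ignores bad fields, so the whole call stays pure nothrow.
long[] parseLenient(string s) @safe pure nothrow
{
    return parseInts!((field, msg) {})(s);
}

/// Strict caller: the first bad field throws, so this cannot be nothrow.
long[] parseStrict(string s) @safe
{
    return parseInts!((field, msg) { throw new Exception(msg); })(s);
}
```

A caller can also collect diagnostics, e.g. `parseInts!((field, msg) { bad ~= field; })(text)`, without the parser itself knowing anything about the policy.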
Dec 20 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2022 1:51 PM, Adrian Matoga wrote:
 I frequently find it useful for a text data file parser to call a diagnostic 
 callback instead of assuming some default behavior (whether that's forgiving, 
 printing warnings, throwing or something else). With template callback 
 parameters the parser can throw if the user wants it or stay pure nothrow if no
 action is required.
Yes, sometimes I think this might be the right answer.
Dec 21 2022
CM <celestialmachinist proton.me> writes:
On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:
 * sdl, I know I know, but D uses it.
Thank you for remembering it. I feel like I'm one of the few who prefer SDL to YAML, JSON, and the like.
Dec 18 2022
Per Nordlöw <per.nordlow gmail.com> writes:
On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:
 So stop talking, and start creating PRs.
 For the project admin stuff, this will use github. There are 
 milestones for the five formats, so please start creating the 
 issues you want/can work on and start typing.
If I were you, I would join forces with Ilya and work on getting the mir libraries that do text parsing integrated into Phobos.
Dec 19 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 12/18/2022 7:56 AM, Robert Schadek wrote:
 So stop talking, and start creating PRs.
Yup! Curious why CSV isn't in the list. I encounter that a lot at tax time. https://en.wikipedia.org/wiki/Comma-separated_values Maybe just ask OpenAI?
Dec 19 2022
Adam D Ruppe <destructionator gmail.com> writes:
On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
 Curious why CSV isn't in the list.
Maybe std.csv is already good enough?
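For reference, reading typed records with Phobos's `std.csv` looks roughly like this. The two-column (description, amount) layout and the `totalAmount` helper are invented for illustration.

```d
// Assumption: a header-less, two-column file of (description, amount);
// the layout and the totalAmount name are invented for illustration.
import std.csv : csvReader;
import std.typecons : Tuple;

/// Sums the amount column. csvReader converts each record to the tuple's
/// field types and honours double-quoted fields, so "rent, March" stays
/// a single field despite the embedded comma.
long totalAmount(string text)
{
    long total;
    foreach (record; csvReader!(Tuple!(string, long))(text))
        total += record[1];
    return total;
}
```

A record whose amount is not a number makes `csvReader` throw a conversion exception, which is the "throw" policy discussed earlier in the thread.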
Dec 19 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
 On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
 Curious why CSV isn't in the list.
Maybe std.csv is already good enough?
LOL, learn something every day! I've even written my own, but it isn't very good.
Dec 19 2022
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Dec 19, 2022 at 04:16:57PM -0800, Walter Bright via
Digitalmars-d-announce wrote:
 On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
 On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
 Curious why CSV isn't in the list.
Maybe std.csv is already good enough?
LOL, learn something every day! I've even written my own, but it isn't very good.
There's also my little experimental csv parser that was designed to be as fast as possible:

	https://github.com/quickfur/fastcsv

However, it can only handle input that fits in memory (using std.mmfile is one possible workaround), has a static limit on field sizes, and does not do validation.
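The `std.mmfile` workaround amounts to mapping the file and handing the parser a slice of the mapping. A minimal sketch, using a hypothetical `countRows` helper in place of fastcsv itself (whose API is not shown in the thread). The OS pages data in on demand, so the file no longer has to fit in RAM, though it still has to fit in the address space.

```d
// Hypothetical helper (countRows is not part of fastcsv): it only shows
// the std.mmfile workaround of mapping the file instead of reading it
// into a GC-allocated buffer.
import std.algorithm.searching : count;
import std.mmfile : MmFile;

/// Counts newline-terminated rows in the file at `path`.
size_t countRows(string path)
{
    auto mmf = new MmFile(path);   // read-only mapping
    scope (exit) destroy(mmf);     // unmap as soon as we are done
    // View the mapping as raw bytes; the slice is valid only while mmf lives.
    auto bytes = cast(const(ubyte)[]) mmf[];
    return bytes.count(cast(ubyte) '\n');
}
```

A real parser would take the same slice (cast to `const(char)[]`) instead of counting newlines, and must not keep references into it after the mapping is destroyed.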
Dec 19 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
 On Mon, Dec 19, 2022 at 04:16:57PM -0800, Walter Bright via 
 Digitalmars-d-announce wrote:
 On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
 On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright 
 wrote:
 Curious why CSV isn't in the list.
Maybe std.csv is already good enough?
LOL, learn something every day! I've even written my own, but it isn't very good.
There's also my little experimental csv parser that was designed to be as fast as possible: https://github.com/quickfur/fastcsv However it can only handle input that fits in memory (using std.mmfile is one possible workaround), has a static limit on field sizes, and does not do validation.
We use this at work with some light tweaks; it’s done a lot of work 🙂
Dec 20 2022
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Dec 20, 2022 at 07:46:36PM +0000, John Colvin via
Digitalmars-d-announce wrote:
[...]
 There's also my little experimental csv parser that was designed to
 be as fast as possible:
 
 	https://github.com/quickfur/fastcsv
 
 However it can only handle input that fits in memory (using std.mmfile
 is one possible workaround), has a static limit on field sizes, and does
 not do validation.
[...]
 We use this at work with some light tweaks; it’s done a lot of work 🙂
Wow, I never expected it to be actually useful. :-P Good to know it's worth something!
Dec 20 2022
9il <ilyayaroshenko gmail.com> writes:
On Tuesday, 20 December 2022 at 19:46:36 UTC, John Colvin wrote:
 On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
 On Mon, Dec 19, 2022 at 04:16:57PM -0800, Walter Bright via 
 Digitalmars-d-announce wrote:
 On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
 On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright 
 wrote:
 Curious why CSV isn't in the list.
Maybe std.csv is already good enough?
LOL, learn something every day! I've even written my own, but it isn't very good.
There's also my little experimental csv parser that was designed to be as fast as possible: https://github.com/quickfur/fastcsv However, it can only handle input that fits in memory (using std.mmfile is one possible workaround), has a static limit on field sizes, and does not do validation.
We use this at work with some light tweaks; it’s done a lot of work 🙂
It has already been replaced with [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.
Dec 20 2022
Tejas <notrealemail gmail.com> writes:
On Wednesday, 21 December 2022 at 04:19:46 UTC, 9il wrote:
 On Tuesday, 20 December 2022 at 19:46:36 UTC, John Colvin wrote:
 On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
 [...]
We use this at work with some light tweaks; it’s done a lot of work 🙂
It has already been replaced with [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.
Wow, I didn't even know `mir.csv` was a thing. Thank you very much!!!
Dec 21 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 21 December 2022 at 04:19:46 UTC, 9il wrote:
 On Tuesday, 20 December 2022 at 19:46:36 UTC, John Colvin wrote:
 On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
 [...]
We use this at work with some light tweaks; it’s done a lot of work 🙂
It has already been replaced with [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.
Hah, so it has! Well anyway, it did do a lot of hard work for us for a long time, so thanks :)
Dec 21 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2022 8:19 PM, 9il wrote:
 It has already been replaced with 
 [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir 
 is faster, SIMD accelerated, and supports numbers and timestamp recognition.
Propose this for Phobos?
Dec 21 2022
Per Nordlöw <per.nordlow gmail.com> writes:
On Wednesday, 21 December 2022 at 04:19:46 UTC, 9il wrote:
 It has already been replaced with 
 [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir
 is faster, SIMD accelerated, and supports numbers and timestamp recognition.
Great work. Will this module be extracted into a separate package?
Dec 22 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2022 11:46 AM, John Colvin wrote:
 We use this at work with some light tweaks; it’s done a lot of work 🙂
Sweet!
Dec 21 2022
Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 20 December 2022 at 00:16:57 UTC, Walter Bright wrote:
 LOL, learn something every day! I've even written my own, but 
 it isn't very good.
Yeah, I wrote a csv module too back in... I think 2010, before Phobos had one. It is about 90 lines, still works. Nothing special but I actually kinda like it. https://github.com/adamdruppe/arsd/blob/master/csv.d
Dec 21 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 12/21/2022 6:27 AM, Adam D Ruppe wrote:
 On Tuesday, 20 December 2022 at 00:16:57 UTC, Walter Bright wrote:
 LOL, learn something every day! I've even written my own, but it isn't very good.
Yeah, I wrote a csv module too back in... I think 2010, before Phobos had one. It is about 90 lines, still works. Nothing special but I actually kinda like it. https://github.com/adamdruppe/arsd/blob/master/csv.d
What this all means is Phobos could use a better one!
Dec 21 2022
Robert Schadek <rburners gmail.com> writes:
 Curious why CSV isn't in the list. I encounter that a lot at 
 tax time.
As Adam said, std.csv is already there, and, at least from my perspective, it's good enough. That being said, I liked how you quoted me here:
On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
 On 12/18/2022 7:56 AM, Robert Schadek wrote:
 So stop talking, and start creating PRs.
Yup!
and replay: create a PR that puts it on the list ;-)
Dec 19 2022
Robert Schadek <rburners gmail.com> writes:
replay -> reply
Dec 19 2022
bachmeier <no spam.net> writes:
On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:
 I complained before that D and Phobos need more stuff.
 I can't do it all by myself, but I can ask for help.

 So here it goes https://github.com/burner/textbasedfileformats

 As on the tin, text based file formats is a library of SAX and 
 DOM parsers for text based file formats.

 I would like to get the following file formats in.

 * json (JSON5), there is actually some code in there already
 * xml, there is some code already, the old std.experimental.xml 
 code
 * yaml, maybe there is something in code.dlang.org to be reused
 * toml, maybe there is something in code.dlang.org to be reused
   * ini, can likely be parsed by the toml parser
 * sdl, I know I know, but D uses it.
A natural complement to this would be the functionality in tsv-utils (https://github.com/eBay/tsv-utils). I've created versions of the filter and select functions that take a string as input and return a string or string[] as output. It's a performant way to query text files. Most importantly, all the hard work is already done.
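A rough illustration of the string-in, string[]-out style of "select" described above. `selectFields` is a hypothetical name and this is not tsv-utils code; it assumes plain tab-separated input with no escaping and relies on D's built-in array bounds checks for out-of-range columns.

```d
// Hypothetical sketch, not tsv-utils code: selectFields only mimics the
// idea of a string-in / string[]-out "select".
import std.algorithm.iteration : map, splitter;
import std.array : array, join;
import std.string : lineSplitter;

/// Returns, for each line of `tsv`, only the requested zero-based columns,
/// in the requested order and re-joined with tabs.
string[] selectFields(string tsv, const(size_t)[] columns)
{
    string[] output;
    foreach (line; tsv.lineSplitter)
    {
        auto fields = line.splitter('\t').array;
        output ~= columns.map!(c => fields[c]).join("\t");
    }
    return output;
}
```

Returning `string[]` (one entry per line) keeps the function easy to compose with a filter step; joining the result with `"\n"` gives back a single TSV string.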
Dec 19 2022