digitalmars.D - Add duration parsing to core.time?
- Justin Whear (28/28) Aug 20 2013 While working on a configuration file parser, I found myself trying to
- Jonathan M Davis (8/40) Aug 20 2013 If such a function were added, it would be fromString on Duration, and i...
- Jonathan M Davis (52/92) Aug 20 2013 ion
- Brad Anderson (12/26) Aug 21 2013 I agree completely and can speak from experience. We used
While working on a configuration file parser, I found myself trying to decide which units to use for various time variables (e.g. `expireInterval`), which is silly because we have an excellent Duration structure in core.time. I was pleased to discover that Duration has a toString method which prints a nice, human-readable description. Unfortunately, there appears to be no corresponding parse method. It turns out that one is surprisingly easy to write thanks to the existing functionality in std.conv: http://dpaste.dzfl.pl/1500b834

It appears that DPaste stumbles over the Unicode 'μs' in the units enum, so here's a test invocation and output:

    $ dmd -unittest test_duration.d && ./test_duration '12 hours, 30 minutes' '1w2d20m12h5m2s'
    12 hours and 30 minutes
    1 week, 2 days, 12 hours, 25 minutes, and 2 secs

I've made the implementation more flexible than simply parsing the very standard output of Duration.toString by adding more unit synonyms and making whitespace, commas, and 'and' optional. All it really requires is a sequence of digits followed by a unit name, possibly repeating, which leads to the very compact form used in '1w2d20m12h5m2s'. All validation is performed by the two calls to std.conv.parse, so invalid strings should fail (e.g. 'four madeupunits').

One possible improvement is to support written-out numbers such as "seven" and "forty-two", but I suspect this would entail a much more involved implementation.

Thoughts on including something like this in core.time? My thought is that Duration could have a `this(string)` with a non-consuming version of this function for automatic to! support, in addition to providing parse.

Justin
Aug 20 2013
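For readers without access to the paste linked above, the approach described in the post can be sketched roughly as follows. This is not the code from the paste: the names `parseDuration`, `unitToDuration`, and `skipSeparators` are hypothetical, the unit synonyms are a guessed subset, and unit names are matched with a plain switch rather than a second call to std.conv.parse.

```d
import core.time : Duration, dur;
import std.algorithm.searching : startsWith;
import std.ascii : isAlpha;
import std.conv : ConvException, parse;
import std.string : strip;

/// Skips the optional separators: spaces, tabs, and commas.
string skipSeparators(string s)
{
    size_t i = 0;
    while (i < s.length && (s[i] == ' ' || s[i] == '\t' || s[i] == ','))
        ++i;
    return s[i .. $];
}

/// Maps a count and a unit name onto a Duration (hypothetical synonym list).
Duration unitToDuration(long count, string unit)
{
    switch (unit)
    {
        case "w", "week", "weeks":                    return dur!"weeks"(count);
        case "d", "day", "days":                      return dur!"days"(count);
        case "h", "hour", "hours":                    return dur!"hours"(count);
        case "m", "min", "mins", "minute", "minutes": return dur!"minutes"(count);
        case "s", "sec", "secs", "second", "seconds": return dur!"seconds"(count);
        default:
            throw new ConvException("unknown unit: '" ~ unit ~ "'");
    }
}

/// Lenient parser: repeated (digits, unit name) pairs, with optional
/// whitespace, commas, and "and" between them.
Duration parseDuration(string s)
{
    s = s.strip;
    if (s.length == 0)
        throw new ConvException("empty duration string");

    Duration total; // Duration.init is a zero-length duration
    while (s.length)
    {
        // parse consumes the leading number and advances s; it throws
        // ConvException when s does not start with a number.
        immutable count = parse!long(s);
        s = skipSeparators(s);

        // The run of ASCII letters that follows is taken as the unit name.
        size_t n = 0;
        while (n < s.length && isAlpha(s[n]))
            ++n;
        total += unitToDuration(count, s[0 .. n]);
        s = skipSeparators(s[n .. $]);

        // Skip an optional "and" standing on its own between components.
        if (s.startsWith("and") && (s.length == 3 || !isAlpha(s[3])))
            s = skipSeparators(s[3 .. $]);
    }
    return total;
}

unittest
{
    import std.exception : assertThrown;

    assert(parseDuration("12 hours, 30 minutes")
        == dur!"hours"(12) + dur!"minutes"(30));
    assert(parseDuration("1w2d20m12h5m2s")
        == dur!"weeks"(1) + dur!"days"(2) + dur!"hours"(12)
         + dur!"minutes"(25) + dur!"seconds"(2));
    assertThrown!ConvException(parseDuration("four madeupunits"));
}
```

Compiled with `dmd -unittest -main`, the unittest block exercises the two inputs from the post. Note that repeated units simply accumulate, which is why '20m' and '5m' add up to 25 minutes.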
On Tuesday, August 20, 2013 17:57:19 Justin Whear wrote:
> Thoughts on including something like this core.time? My thought is that Duration could have a `this(string)` with a non-consuming version of this function for automatic to! support in addition to providing parse.

If such a function were added, it would be fromString on Duration, and it would accept the exact format that toString uses (and only that format). Anything more complicated would have to be part of a functionality relating to user-defined format strings, which I haven't finished yet. That'll probably end up in std.datetime.format at some point after I've finished splitting std.datetime.

- Jonathan M Davis
Aug 20 2013
On Tuesday, August 20, 2013 15:35:20 Jonathan M Davis wrote:
> If such a function were added, it would be fromString on Duration, and it would accept the exact format that toString uses (and only that format). Anything more complicated would have to be part of a functionality relating to user-defined format strings, which I haven't finished yet. That'll probably end up in std.datetime.format at some point after I've finished splitting std.datetime.

And actually, I really don't like the idea of adding a function for parsing the result of Duration's toString. Duration's toString was intended for human legibility, not for being written out and then read in again. std.datetime has several to*String functions with corresponding from*String functions, but they're all in standard formats, whereas Duration's toString is not. So, if any kind of from*String is going to be added to Duration, then a standard format needs to be used and a corresponding to*String function created. There are several standard formats for dates and times, so I assume that there's one for durations as well, but I'd have to look into it. Preferably something from ISO 8601 would be used if it has a standard string format for durations, since that's the main ISO standard for time-related stuff.

In general, I'm very much opposed to functions which try and parse arbitrary strings, as they're incredibly error-prone and have to guess at what you mean. In pretty much any case where the string was emitted by a computer in the first place rather than a human, that's just plain sloppy, and ideally, a human would be required to put a string in a standard format when inputting it (or input the values separately rather than as a string) in order to avoid interpretation errors.

- Jonathan M Davis
Aug 20 2013
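ISO 8601 does define a string format for durations, of the form PnYnMnDTnHnMnS (for example, PT12H30M for twelve and a half hours). As a minimal sketch of what a strict from*String in that format could look like, the hypothetical function below accepts only the day, hour, minute, and second components with integer values. The name `fromISODuration` is an assumption and nothing like it exists in core.time; years and months are rejected because they have no fixed length and cannot map onto a Duration, and the week form (PnW) is left out for brevity.

```d
import core.time : Duration, dur;
import std.ascii : isDigit;
import std.conv : ConvException, parse;

/// Hypothetical strict parser for a subset of ISO 8601 durations:
/// "P[nD][T[nH][nM][nS]]" with integer components in that fixed order.
Duration fromISODuration(string s)
{
    if (s.length == 0 || s[0] != 'P')
        throw new ConvException("expected leading 'P'");
    s = s[1 .. $];

    Duration total;
    bool inTime = false; // set once the 'T' separator has been consumed
    int lastRank = 0;    // enforces the order D < H < M < S, each at most once

    while (s.length)
    {
        if (s[0] == 'T')
        {
            // 'T' may appear once and must be followed by a time component.
            if (inTime || s.length == 1)
                throw new ConvException("misplaced 'T'");
            inTime = true;
            s = s[1 .. $];
            continue;
        }

        // Requiring a digit first rules out signs and stray characters.
        if (!isDigit(s[0]))
            throw new ConvException("expected a digit");
        immutable n = parse!long(s); // consumes the digits, advances s
        if (s.length == 0)
            throw new ConvException("missing unit designator");
        immutable c = s[0];
        s = s[1 .. $];

        int rank;
        Duration part;
        if (!inTime && c == 'D')     { rank = 1; part = dur!"days"(n); }
        else if (inTime && c == 'H') { rank = 2; part = dur!"hours"(n); }
        else if (inTime && c == 'M') { rank = 3; part = dur!"minutes"(n); }
        else if (inTime && c == 'S') { rank = 4; part = dur!"seconds"(n); }
        else
            throw new ConvException("unsupported designator '" ~ c ~ "'");

        if (rank <= lastRank)
            throw new ConvException("components repeated or out of order");
        lastRank = rank;
        total += part;
    }

    if (lastRank == 0)
        throw new ConvException("no components given");
    return total;
}

unittest
{
    import std.exception : assertThrown;

    assert(fromISODuration("PT12H30M") == dur!"hours"(12) + dur!"minutes"(30));
    assert(fromISODuration("P9DT12H25M2S")
        == dur!"days"(9) + dur!"hours"(12) + dur!"minutes"(25) + dur!"seconds"(2));
    assertThrown!ConvException(fromISODuration("12 hours"));  // not ISO 8601
    assertThrown!ConvException(fromISODuration("P1M"));       // months rejected
    assertThrown!ConvException(fromISODuration("PT30M12H"));  // wrong order
}
```

Unlike a lenient parser, this one never guesses: anything outside the fixed grammar throws, which is the property argued for above. A matching to*String function would still be needed for round-tripping.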
On Wednesday, 21 August 2013 at 06:46:49 UTC, Jonathan M Davis wrote:
> In general, I'm very much opposed to functions which try and parse arbitrary strings, as they're incredibly error-prone and have to guess at what you mean. In pretty much any case where the string was emitted by a computer in the first place rather than a human, that's just plain sloppy, and ideally, a human would be required to put a string in a standard format when inputting it (or input the values separately rather than as a string) in order to avoid interpretation errors.
>
> - Jonathan M Davis

I agree completely and can speak from experience. We used wxWidgets' wxDateTime class for years at work, along with its ParseDateTime, which allows free-format strings. It was a source of never-ending problems for us until we finally stopped using it. The implementation was fine; it's just that dates are not amenable to unstructured reading. Date strings with locale information embedded in them may be doable, but they are basically nonexistent.

Date strings are a lot like string encodings. They are unsafe to use without knowing a definitive format/encoding.
Aug 21 2013