
digitalmars.D.announce - New Article: My Experience Porting Python Dateutil's Date Parser to D

Jack Stouffer <jack jackstouffer.com> writes:
Hello everyone,

I have spent the last two weeks porting the date string parsing 
functionality from the popular Python library, dateutil, to D. I 
have written about my experience here: 
http://jackstouffer.com/blog/porting_dateutil.html

The code and docs can be found here: 
https://github.com/JackStouffer/date-parser

reddit: 
https://www.reddit.com/r/programming/comments/49qdpt/my_experience_porting_python_dateutils_date/

Let me know what you think about the article and the code.

Thanks in advance.
Mar 09 2016
Walter Bright <newshound2 digitalmars.com> writes:
On 3/9/2016 1:55 PM, Jack Stouffer wrote:
 Let me know what you think about the article and the code.

I haven't read the article yet, but you'll get more interest by putting a summary as the first comment on reddit.
Mar 09 2016
"H. S. Teoh via Digitalmars-d-announce" writes:
On Wed, Mar 09, 2016 at 02:12:42PM -0800, Walter Bright via
Digitalmars-d-announce wrote:
 I haven't read the article yet, but you'll get more interest by putting a summary as the first comment on reddit.
I read the article. While I'm no Python expert (I do have a little experience with it, mainly through using SCons as a build system for my personal projects), I can totally sympathize with the annoyances of using a dynamically-typed language, as well as dodgy iterator designs like __next__. (I've not had to deal with __next__ in Python so far, but *have* worked with C/C++ code that basically iterates that way, and it's not pretty.)

Totally agree that if you can convert something to D in about a week's worth of work, it's totally worth it. D is just a much more comfortable language to work in (to me, anyway; this is highly subjective, obviously), and, provided you don't do anything silly, it generally gives you better performance than many of the alternatives out there. Even when it doesn't perform the best without hand-tweaking, I'd still prefer it for general use, because of nice sanity features such as built-in unittests (now that I've gotten used to them, I sorely miss them in every other language!), sane template syntax, etc.

Nice article.

T

--
He who does not appreciate the beauty of language is not worthy to bemoan its flaws.
Mar 09 2016
Ola Fosheim Grøstad writes:
On Wednesday, 9 March 2016 at 22:17:39 UTC, H. S. Teoh wrote:
 system for my personal projects), I can totally sympathize with 
 the annoyances of using a dynamically-typed language, as well 
 as dodgy iterator designs like __next__. (I've not had to deal 
 with __next__ in Python so far, but *have* worked with C/C++ 
 code that basically iterates that way, and it's not pretty.)
What is problematic with __next__ (Py3) and next (Py2)? It's a pretty straightforward standard iterator design and quite different from the table pointers C++ uses.
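For reference, here is a minimal Python 3 sketch of the protocol being discussed (an illustration, not code from the article or the date-parser repo; the `Countdown` class is hypothetical). `__next__` either returns the next value or raises `StopIteration`, and `__iter__` returns the iterator itself:

```python
class Countdown:
    """Yields start, start - 1, ..., 1 using the Python 3 iterator protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterable.
        return self

    def __next__(self):
        # There is no separate "is it empty?" query: exhaustion is
        # signalled by raising StopIteration from __next__ itself.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))  # [3, 2, 1]
```

Note that fetching a value and advancing are a single operation here; there is no way to look at the current element without consuming it.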
Mar 09 2016
Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 9 March 2016 at 23:31:04 UTC, Ola Fosheim Grøstad 
wrote:
 What is problematic with __next__ (Py3) and next (Py2)? It's a pretty straightforward standard iterator design and quite different from the table pointers C++ uses.
I explain my grievances in the article.
Mar 09 2016
Ola Fosheim Grøstad writes:
On Thursday, 10 March 2016 at 00:29:46 UTC, Jack Stouffer wrote:
 It's a pretty straightforward standard iterator design and
 quite different from the table pointers C++ uses.
 I explain my grievances in the article.
They didn't make all that much sense to me, so I wondered what Teoh's issues were. As in: real issues that have empirical significance.

D ranges and Python's iterators are regular iterators, nothing special. The oddballs are C++ "iterators", which come in pairs of pointers. Efficiency and semantic issues in iterator implementation go both ways, depending on the application area. This is nothing new; people have known this for ages, as in decades.

If you want speed, you have to use a "next"-style iterator implementation that writes multiple elements directly into a buffer. This is what you do in signal processing.
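A rough sketch of that buffer-filling style, assuming a hypothetical `BlockSource` with a `next_block(buf)` method (neither is from any library mentioned in the thread). Each call copies up to `len(buf)` items into a caller-supplied buffer and returns the count, with 0 meaning exhausted. In a compiled language the point is to pay the per-call and end-of-data check once per block instead of once per element; in Python the sketch only shows the shape of the interface, not a speedup.

```python
class BlockSource:
    """Hypothetical source that fills a caller-supplied buffer per call."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def next_block(self, buf):
        """Copy up to len(buf) items into buf and return how many were written.

        A return value of 0 means the source is exhausted.
        """
        n = min(len(buf), len(self.data) - self.pos)
        buf[:n] = self.data[self.pos:self.pos + n]
        self.pos += n
        return n


def total(source, block_size=4):
    """Sum all items, pulling them block_size at a time."""
    buf = [0] * block_size
    acc = 0
    while True:
        n = source.next_block(buf)
        if n == 0:
            break
        # One call and one end-of-data check per block, not per element.
        acc += sum(buf[:n])
    return acc


print(total(BlockSource(list(range(10)))))  # 45
```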
Mar 10 2016
Ola Fosheim Grøstad writes:
Just pointing out the obvious:

For the simple iterators/generators that run on a non-changing 
source you can basically break it up into:

1. iterators without lookahead
2. iterators with lookahead

These are basically the same issues you deal with when
implementing a lexer.
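To make the lexer analogy concrete, here is a small Python sketch (the `lex_comparisons` function is hypothetical, written only for illustration). Telling `<` from `<=` requires looking at the next character without consuming it. Over a random-access string that lookahead is free; over a Python-style iterator with no lookahead, the lexer would have to buffer the peeked character itself.

```python
def lex_comparisons(text):
    """Split text into '<', '<=', '>', '>=', '=' tokens and identifiers."""
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "<>":
            # One-character lookahead: inspect text[i + 1] before deciding.
            if i + 1 < len(text) and text[i + 1] == "=":
                tokens.append(c + "=")
                i += 2
            else:
                tokens.append(c)
                i += 1
        elif c == "=":
            tokens.append(c)
            i += 1
        elif c.isalnum():
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            tokens.append(text[start:i])
        else:
            raise ValueError(f"unexpected character {c!r}")
    return tokens


print(lex_comparisons("a <= b > c"))  # ['a', '<=', 'b', '>', 'c']
```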

Python-style iterators/generators are basically the former. That
comes with one set of advantages, but no lookahead. But lookahead
frequently has cost penalties. There are many tradeoffs. And
those tradeoffs become rather clear when you consider factors
like:

1. mutating iterators
2. the size of the object
3. copyable iterators
4. concurrency/thread safety
5. progress with high computational cost
6. high computational cost for the value
7. sources with latency
8. skip functionality
9. non-inlineable situations
10. exceptions
11. complex iterators (e.g. interpolation)
etc.

There are massive tradeoffs even when writing iterators for
really simple data structures like the linked list. It all
depends on what functionality one is looking for.

There is no best solution. It all depends on the application.
Mar 10 2016
Chris Wright <dhasenan gmail.com> writes:
On Thu, 10 Mar 2016 08:22:58 +0000, Ola Fosheim Grøstad wrote:

 On Thursday, 10 March 2016 at 00:29:46 UTC, Jack Stouffer wrote:
 It's a pretty straightforward standard iterator design and quite
 different from the table pointers C++ uses.
 I explain my grievances in the article.
 They didn't make all that much sense to me, so I wondered what Teoh's issues were. As in: real issues that have empirical significance.
It's a little easier to write iterators in the Python style: you don't have to cache the current value, and you don't need a separate check for end-of-iteration. It's a little easier to use them in the D style: you get more flexibility, you can check for emptiness without popping an item, and you can grab the first item several times.

You can convert one to the other, so there's no theoretical difference in what you can accomplish with them. It's mainly annoying. There's also a small efficiency concern, because throwing exceptions is a little slow.

The largest practical difference comes when multiple functions are interested in viewing the first item in the same range. LL(1) parsers need to do this.

Of course, that's just looking at input ranges versus iterators. If you look at other types of ranges, there's a lot there that Python is missing.
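The conversion in both directions can be sketched in Python. `IterRange` and `iter_from_range` are hypothetical names; the `empty`/`front`/`popFront` members only mirror the names of D's input range primitives and are not a real Python library. The wrapper shows the cost mentioned above: it must cache the current element, and it pulls the first element eagerly when constructed.

```python
class IterRange:
    """Wraps a Python iterator in a D-style input range interface."""

    _SENTINEL = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._advance()

    def _advance(self):
        # Cache the current element: extra state that a Python
        # iterator does not need to hold.
        self._current = next(self._it, self._SENTINEL)

    @property
    def empty(self):
        return self._current is self._SENTINEL

    @property
    def front(self):
        if self.empty:
            raise IndexError("front of empty range")
        return self._current

    def popFront(self):
        if self.empty:
            raise IndexError("popFront on empty range")
        self._advance()


def iter_from_range(r):
    """Convert a D-style range back into a Python generator."""
    while not r.empty:
        yield r.front
        r.popFront()


r = IterRange([10, 20, 30])
print(r.front, r.front)          # 10 10 (front can be read repeatedly)
print(list(iter_from_range(r)))  # [10, 20, 30]
```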
Mar 10 2016
Ola Fosheim Grøstad writes:
On Thursday, 10 March 2016 at 17:59:21 UTC, Chris Wright wrote:
 It's a little easier to write iterators in the Python style: 
 you don't have to cache the current value, and you don't have 
 to have a separate check for end-of-iteration. It's a little 
 easier to use them in the D style: you get more flexibility, 
 can check for emptiness without popping an item, and can grab 
 the first item several times.
I don't have any firm opinions on this, but escaping out of the loop with an exception means you don't have to check for emptiness. So I am not sure why D range-iterators should be considered easier.
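Roughly what a Python `for` loop does with the protocol, written out by hand as a sketch (the `for_each` helper is hypothetical; the interpreter implements loops natively, but the observable behaviour is the same). The loop ends when `next()` raises `StopIteration`, so the caller never performs a separate emptiness check:

```python
def for_each(iterable, body):
    """Roughly equivalent to: for x in iterable: body(x)"""
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            # Exhaustion is discovered through the exception,
            # not through an up-front emptiness check.
            break
        body(x)


seen = []
for_each("abc", seen.append)
print(seen)  # ['a', 'b', 'c']
```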
 You can convert one to the other, so there's no theoretical 
 difference in what you can accomplish with them. It's mainly 
 annoying. A small efficiency concern, because throwing 
 exceptions is a little slow.
Efficiency of exceptions in Python is an implementation issue, though. But I agree that the difference isn't all that interesting.
 The largest practical difference comes when multiple functions 
 are interested in viewing the first item in the same range. 
 LL(1) parsers need to do this.
Iterators and generators in Python are mostly for for-loops and comprehensions. In the rare case where you want lookahead you can just write your own or use an adapter.
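One common way to get single-element lookahead without writing a class is to pull the first item and push it back with `itertools.chain`. This `peek` helper is a hypothetical sketch, not a standard-library function:

```python
import itertools


def peek(iterable, default=None):
    """Return (first_item, iterator_that_still_yields_first_item).

    If the iterable is empty, return (default, an empty iterator).
    """
    iterator = iter(iterable)
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return default, iter(())
    # Put the consumed element back in front of the remaining items.
    return first, itertools.chain([first], iterator)


head, rest = peek(iter([1, 2, 3]))
print(head, list(rest))  # 1 [1, 2, 3]
```

Each call wraps the iterator in another `chain`, so code that peeks repeatedly in a loop (as an LL(1) parser would) is probably better served by a small class that caches the current element.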
 Of course, that's just looking at input ranges versus 
 iterators. If you look at other types of ranges, there's a lot 
 there that Python is missing.
Has any work been done on range-iterators and streams?
Mar 10 2016
Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 9 March 2016 at 22:12:42 UTC, Walter Bright wrote:
 I haven't read the article yet, but you'll get more interest by 
 putting a summary as the first comment on reddit.
Thanks for the advice, I think it caused more people to read it. Also, I forgot to mention in the article that the unit tests with coverage reports enabled run in 110ms. I love fast tests :)
Mar 10 2016
cym13 <cpicard openmailbox.org> writes:
On Thursday, 10 March 2016 at 21:25:16 UTC, Jack Stouffer wrote:
 Thanks for the advice, I think it caused more people to read it. Also, I forgot to mention in the article that the unit tests with coverage reports enabled run in 110ms. I love fast tests :)
Did you time the Python tests too? A value by itself doesn't mean much to me.
Mar 11 2016
Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 9 March 2016 at 21:55:23 UTC, Jack Stouffer wrote:
 The code and docs can be found here: 
 https://github.com/JackStouffer/date-parser
Quick update: all dateutil tests are now passing. It can now parse just about any date format you can throw at it :)
Mar 18 2016