digitalmars.D - to!()() & leading/trailing whitespace

reply bearophile <bearophileHUGS lycos.com> writes:
This post is about an enhancement request of mine that David Simcha recently
closed as wontfix:
http://d.puremagic.com/issues/show_bug.cgi?id=4165

This topic is about a small Phobos thing, not about large complex things
like the const system. But it's relevant because I have hundreds of Python
scripts (and many small D1 programs) that essentially load some numbers from
text files, process them, and write the numbers to other text files. My
text files are often small, but I have many of them, and I like to see them
processed quickly and safely, and I like to write those little programs in a
short time. So reading numbers from a text file is an essential operation for
me. And when I read text files it's common to have leading or trailing
whitespace (such as newlines) around the numbers.

David has closed 4165 because:
- It's by design (it's mentioned in the docs of std.conv). But I don't care
about this; I think it's the wrong design.
- There's a trivial workaround: this is true, but you need to remember to use
this workaround, forgetting it may cause bugs (bugs == the program doesn't
work), and I don't see the point in using a workaround very often in my code.
I'd prefer to!() to do that by itself.

In practice I sometimes use printf() in those D scripts to print many numbers
because it's much faster than writeln(). So I could write and use a more
efficient function that converts strings to numbers, but I'd like to need
only Phobos for such a basic and common operation.



Possible disadvantages of a to!int() (and similar to!double(), etc) that
ignores leading and trailing whitespace:

It introduces bugs, because it accepts sloppier input: in my experience
this is not true. In Python, int() and float() ignore leading/trailing
whitespace, and in years of use I don't remember it ever causing me a bug:

>>> int(" -125\n")
-125
>>> float(" 6.3e6\t")
6300000.0

Phobos functions are meant to be the simplest bricks, which you may compose to
perform more complex operations: this is generally true and good, but Python
shows that when two or a few operations are frequently done together, it's
good to put into the std lib something that performs the composed thing in one
go, because it helps chunk the code, makes the code shorter and more readable,
and decreases the chance of bugs.

When I read numbers from files I will need to use to!int(txt.strip()) often.

Bye,
bearophile
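For readers following along, here is a minimal sketch of the behaviour being discussed, assuming a reasonably current Phobos: to!int treats the whole string as the number, so surrounding whitespace makes it throw, and the workaround is to call strip() first. Only std.conv.to, std.string.strip and std.exception.assertThrown are used.

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;
import std.string : strip;

void main()
{
    // to!int expects the entire string to be a number, so the
    // leading space and trailing newline cause a ConvException.
    assertThrown!ConvException(to!int(" -125\n"));

    // The workaround from the bug report: strip whitespace first,
    // then do the exact conversion.
    assert(to!int(" -125\n".strip()) == -125);
}
```

The second call is the to!int(txt.strip()) pattern mentioned above; the first shows why forgetting it turns into a runtime error rather than a silently wrong value.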
Aug 16 2010
next sibling parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Monday, August 16, 2010 15:35:51 bearophile wrote:
 This post is about this enhancement request of mine that recently David
 Simcha has closed as wontfix:
 http://d.puremagic.com/issues/show_bug.cgi?id=4165
 
 This topic is about a small Phobos thing, it's not about large complex
 things as the const system. But it's relevant because I have hundreds of
 Python scripts (and many small D1 programs) that essentially load some
 numbers from textual files, process them, and write the numbers in other
 textual files. My textual files are often small, but I have many of them,
 and I like to see them processed quickly and safely, and I like to write
 those little programs in a short time. So reading numbers from a text file
 is an essential operation for me. And when I read textual files it's
 common to have leading newlines (whitespace) behind numbers.
 
[snip]

A string with whitespace is _not_ a number. I can see why that would be
problematic when you don't care about the whitespace, but you're converting
the whole string, not just the numeric part. Using strip() is an extremely
trivial workaround, and if you don't like that, there's always parse() which
will strip out the whitespace itself. to() is for exact conversions, and
whitespace is non-numeric. If you want to parse a string, then use parse().

I totally agree with David on this one. All the tools that you need are there.

- Jonathan M Davis
Aug 16 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:
 A string with whitespace is _not_ a number.
I will not agree with this (I am talking about leading/trailing whitespace
only). Regarding the other things you say, I have already given answers to
them.

Bye,
bearophile
Aug 16 2010
next sibling parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Monday, August 16, 2010 16:49:45 bearophile wrote:
 Jonathan M Davis:
 A string with whitespace is _not_ a number.
 I will not agree with this (I am talking about leading/trailing whitespace
 only). Regarding the other things you say, I have already given answers to
 them.
 
 Bye,
 bearophile

Well, while it may not be what you want, writing a wrapper which calls strip()
would be easy to do.

- Jonathan M Davis
Aug 16 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:
 Well, while it may not be what you want, writing a wrapper which calls strip() 
 would be easy to do.
It's what I will probably do...

Bye,
bearophile
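A minimal sketch of such a wrapper, for illustration only. The name toStripped is hypothetical and not part of Phobos; it just composes std.string.strip with std.conv.to, so the conversion itself stays as strict as to!T normally is.

```d
import std.conv : to;
import std.string : strip;

/// Hypothetical helper (not in Phobos): removes leading and trailing
/// whitespace, then performs the usual exact conversion with to!T.
T toStripped(T)(const(char)[] text)
{
    return to!T(text.strip());
}

void main()
{
    assert(toStripped!int(" -125\n") == -125);
    assert(toStripped!long("\t42 ") == 42L);

    // Only the ends are trimmed: anything else that to!T would
    // reject (e.g. whitespace inside the digits) still throws.
}
```

Because it is a template over the target type, the same helper should cover to!double and the other numeric conversions mentioned earlier in the thread.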
Aug 16 2010
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jonathan M Davis wrote:
 On Monday, August 16, 2010 16:49:45 bearophile wrote:
 Jonathan M Davis:
 A string with whitespace is _not_ a number.
 I will not agree with this (I am talking about leading/trailing whitespace
 only). Regarding the other things you say, I have already given answers to
 them.
 
 Bye,
 bearophile
 
 Well, while it may not be what you want, writing a wrapper which calls
 strip() would be easy to do.

It's more complicated than that. readf works with arbitrary input streams, so
there would be a need for a skipWhitespace() routine.

Andrei
Aug 16 2010
prev sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Monday 16 August 2010 16:49:45 bearophile wrote:
 Jonathan M Davis:
 A string with whitespace is _not_ a number.
 I will not agree with this (I am talking about leading/trailing whitespace
 only). Regarding the other things you say, I have already given answers to
 them.
 
 Bye,
 bearophile

Why don't you just use parse()? It allows for leading whitespace, and it's
really the function that's intended for turning strings into other types if
it's not an exact conversion.

- Jonathan M Davis
Aug 16 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 This post is about this enhancement request of mine that recently
 David Simcha has closed as wontfix: 
 http://d.puremagic.com/issues/show_bug.cgi?id=4165
[snip]
 Phobos functions are meant as the most simpler bricks, that you may
 compose to perform more complex operations: this is generally true
 and good, but Python shows that when two or few operations are
 frequently done attached to each other, it's good to put inside the
 std lib something the performs the composed thing in one go, because
 it helps chunk the code and makes the code shorter and more readable,
 and decreases the chance for bugs. When I read numbers from files I
 will need to use to!int(txt.strip()) often.
I don't feel very strongly about this (in particular e.g. I do allow leading
whitespace for floating-point parsing). My only problem is that sometimes
people _don't_ want to ignore trailing whitespace, which becomes quite
difficult. But then I guess that's a rare case.

Andrei
Aug 16 2010
parent Norbert Nemec <Norbert Nemec-online.de> writes:
On 17/08/10 02:49, Andrei Alexandrescu wrote:
 bearophile wrote:
 This post is about this enhancement request of mine that recently
 David Simcha has closed as wontfix:
 http://d.puremagic.com/issues/show_bug.cgi?id=4165
[snip]
 Phobos functions are meant as the most simpler bricks, that you may
 compose to perform more complex operations: this is generally true
 and good, but Python shows that when two or few operations are
 frequently done attached to each other, it's good to put inside the
 std lib something the performs the composed thing in one go, because
 it helps chunk the code and makes the code shorter and more readable,
 and decreases the chance for bugs. When I read numbers from files I
 will need to use to!int(txt.strip()) often.
 I don't feel very strongly about this (in particular e.g. I do allow leading
 whitespace for floating-point parsing). My only problem is that sometimes
 people _don't_ want to ignore trailing whitespace, which becomes quite
 difficult. But then I guess that's a rare case.

Honestly: why should someone *want* the conversion to *fail* if there is
additional whitespace? I would agree if to!()() did something useful with the
whitespace, but as it is, the routine simply seems picky for the sake of being
picky.

str<->number conversions are inherently inexact since there are various
representations for a number. It is just a decision about how tolerant the
routine should be. In my experience it makes sense to make the conversion as
tolerant as possible as long as it remains predictable and there is no danger
of misinterpreting input.
Aug 17 2010