
digitalmars.D - Which D features to emphasize for academic review article

reply "TJB" <broughtj gmail.com> writes:
Hello D Users,

The Software Editor for the Journal of Applied Econometrics has 
agreed to let me write a review of the D programming language for 
econometricians (econometrics is where economic theory and 
statistical analysis meet).  I will have only about 6 pages.  I 
have an idea of what I am going to write about, but I thought I 
would ask here what features are most relevant (in your minds) to 
numerical programmers writing codes for statistical inference.

I look forward to your suggestions.

Thanks,

TJB
Aug 09 2012
next sibling parent reply "dsimcha" <dsimcha yahoo.com> writes:
Ok, so IIUC the audience is academic, but consists of people interested 
in using D as a means to an end, not computer scientists?  I use D 
for bioinformatics, which IIUC has similar requirements to 
econometrics.  From my point of view:

I'd emphasize the following:

Native efficiency.  (Important for large datasets and monte carlo 
simulations)

Garbage collection.  (Important because it makes it much easier 
to write non-trivial data structures that don't leak memory, and 
statistical analyses are a lot easier if the data is structured 
well.)

Ranges/std.range/builtin arrays and associative arrays.  (Again, 
these make data handling a pleasure.)

Templates.  (Makes it easier to write algorithms that aren't 
overly specialized to the data structure they operate on.  This 
can also be done with OO containers but requires more boilerplate 
and compromises on efficiency.)

Disclaimer:  These last two are things I'm the primary designer 
and implementer of.  I intentionally put them last so it doesn't 
look like a shameless plug.

std.parallelism  (Important because you can easily parallelize 
your simulation, etc.)

dstats  (https://github.com/dsimcha/dstats  Important because a 
lot of statistical analysis code is already implemented for you.  
It's admittedly very basic compared to e.g. R or Matlab, but it's 
also in many cases better integrated and more efficient.  I'd say 
that it has the 15% of the functionality that covers ~70% of use 
cases.  I welcome contributors to add more stuff to it.  I 
imagine economists would be interested in time series, which is 
currently a big area of missing functionality.)
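
To make the std.parallelism point above concrete, here is a minimal 
sketch (just an illustration, not dstats code) of a toy Monte Carlo 
estimate of pi whose trials are split across cores by taskPool.reduce:

    import std.algorithm, std.parallelism, std.random, std.range, std.stdio;

    void main()
    {
        enum trials = 10_000_000;

        // 1 if a random point lands inside the unit quarter circle, else 0
        static int hit(int i)
        {
            immutable x = uniform(0.0, 1.0);
            immutable y = uniform(0.0, 1.0);
            return x * x + y * y <= 1.0 ? 1 : 0;
        }

        // the map is evaluated in parallel by the default task pool
        immutable hits = taskPool.reduce!"a + b"(map!hit(iota(trials)));

        writeln("pi is roughly ", 4.0 * hits / trials);
    }

Each worker draws from its own thread-local rndGen, so no locking is 
needed around the random number generation.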
Aug 09 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/9/2012 10:40 AM, dsimcha wrote:
 I'd emphasize the following:
I'd like to add to that:

1. Proper support for 80 bit floating point types. Many compilers' 
libraries have inaccurate 80 bit math functions, or don't implement 80 
bit floats at all. 80 bit floats reduce the incidence of creeping 
roundoff error.

2. Support for SIMD vectors as native types.

3. Floating point values are default initialized to NaN.

4. Correct support for NaN and infinity values.

5. Correct support for unordered operations.

6. Array types do not degenerate into pointer types whenever passed to 
a function. In other words, array types know their dimension.

7. Array loop operations, i.e.:

    for (size_t i = 0; i < a.length; i++)
        a[i] = b[i] + c;

can be written as:

    a[] = b[] + c;

8. Global data is thread local by default, lessening the risk of 
unintentional unsynchronized sharing between threads.
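
A small D sketch (my own illustration) showing points 3, 6 and 7 in action:

    import std.stdio;

    void main()
    {
        double x;                      // point 3: default initialized to NaN
        writeln(x);                    // prints "nan"

        double[] b = [1.0, 2.0, 3.0];
        double[] a = new double[b.length];
        double c = 0.5;

        a[] = b[] + c;                 // point 7: array-wise loop operation
        writeln(a);                    // [1.5, 2.5, 3.5]

        writeln(b.length);             // point 6: arrays carry their length
    }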
Aug 09 2012
next sibling parent reply "F i L" <witte2008 gmail.com> writes:
Walter Bright wrote:
 3. Floating point values are default initialized to NaN.
C# handles this more conveniently with just as much 
optimization/debugging benefit (arguably more so, because it catches 
NaN issues at compile-time):

    class Foo
    {
        float x; // defaults to 0.0f

        void bar()
        {
            float y;        // doesn't default
            y ++;           // ERROR: use of unassigned local

            float z = 0.0f;
            z ++;           // OKAY
        }
    }

This is the same behavior for any local variable, so where in D you 
need to explicitly set variables to 'void' to avoid assignment costs, 
C# catches these mistakes before runtime.

Sorry, I'm not trying to derail this thread. I just think D has other, 
much better advertising points than this one.
Aug 10 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/10/2012 1:38 AM, F i L wrote:
 Walter Bright wrote:
 3. Floating point values are default initialized to NaN.
 C# handles this more conveniently with just as much 
 optimization/debugging benefit (arguably more so, because it catches NaN 
 issues at compile-time):

     class Foo
     {
         float x; // defaults to 0.0f

         void bar()
         {
             float y;        // doesn't default
             y ++;           // ERROR: use of unassigned local

             float z = 0.0f;
             z ++;           // OKAY
         }
     }

 This is the same behavior for any local variable,
It catches only a subset of these at compile time. I can craft any 
number of ways of getting it to miss diagnosing it. Consider this one:

    float z;
    if (condition1)
         z = 5;
    ... lotsa code ...
    if (condition2)
         z++;

To diagnose this correctly, the static analyzer would have to determine 
that condition1 produces the same result as condition2, or not. This is 
impossible to prove. So the static analyzer either gives up and lets it 
pass, or issues an incorrect diagnostic. So our intrepid programmer is 
forced to write:

    float z = 0;
    if (condition1)
         z = 5;
    ... lotsa code ...
    if (condition2)
         z++;

Now, as it may turn out, for your algorithm the value "0" is an 
out-of-range, incorrect value. Not a problem as it is a dead 
assignment, right?

But then the maintenance programmer comes along and changes condition1 
so it is not always the same as condition2, and now the z++ sees the 
invalid "0" value sometimes, and a silent bug is introduced.

This bug will not remain undetected with the default NaN initialization.
 so where in D you need to
 explicitly set variables to 'void' to avoid assignment costs,
This is incorrect, as the optimizer is perfectly capable of removing 
dead assignments like:

    f = nan;
    f = 0.0f;

The first assignment is optimized away.
 I just think D's has other, much better advertising points that this one.
Whether you agree with it being a good feature or not, it is a feature unique to D and merits discussion when talking about D's suitability for numerical programming.
Aug 10 2012
next sibling parent reply "F i L" <witte2008 gmail.com> writes:
Walter Bright wrote:
 It catches only a subset of these at compile time. I can craft 
 any number of ways of getting it to miss diagnosing it. 
 Consider this one:

     float z;
     if (condition1)
          z = 5;
     ... lotsa code ...
     if (condition2)
          z++;

 To diagnose this correctly, the static analyzer would have to 
 determine that condition1 produces the same result as 
 condition2, or not. This is impossible to prove. So the static 
 analyzer either gives up and lets it pass, or issues an 
 incorrect diagnostic. So our intrepid programmer is forced to 
 write:

     float z = 0;
     if (condition1)
          z = 5;
     ... lotsa code ...
     if (condition2)
          z++;
Yes, but that's not really an issue since the compiler informs the 
coder of its limitation. You're simply forced to initialize the 
variable in this situation.
 Now, as it may turn out, for your algorithm the value "0" is an 
 out-of-range, incorrect value. Not a problem as it is a dead 
 assignment, right?

 But then the maintenance programmer comes along and changes 
 condition1 so it is not always the same as condition2, and now 
 the z++ sees the invalid "0" value sometimes, and a silent bug 
 is introduced.

 This bug will not remain undetected with the default NaN 
 initialization.
I had a debate on here a few months ago about the merits of 
default-to-NaN and others brought up similar situations, but since we 
can write:

    float z = float.nan;
    ...

explicitly, this could be thought of as a debugging feature available 
to the programmer. The problem I've always had with defaulting to NaN 
is that it's inconsistent with integer types, and while there may be 
merit to the idea of defaulting all types to NaN/null, it's simply 
unavailable for half of the number spectrum. I can only speak for 
myself, but I much prefer consistency over anything else because it 
means there are fewer discrepancies I need to remember when hacking 
things together. It also steepens the learning curve.

More importantly, what we have now is code where bugs -- like the one 
you mentioned above -- are still possible with ints, but also easy to 
miss, since "the other number type" behaves differently and programmers 
may accidentally assume a NaN will propagate where it will not.
 This is incorrect, as the optimizer is perfectly capable of 
 removing dead assignments like:

    f = nan;
    f = 0.0f;

 The first assignment is optimized away.
I thought there was some optimization by avoiding assignment, but IDK enough about memory at that level. Now I'm confused as to the point of 'float x = void' type annotations. :-\
 Whether you agree with it being a good feature or not, it is a 
 feature unique to D and merits discussion when talking about 
 D's suitability for numerical programming.
True, and I misspoke by saying it wasn't a "selling point". I only meant to raise issue with a feature that has been more of an annoyance rather than a boon to me personally. That said, I also agree that this thread was the wrong place to raise issue with it.
Aug 10 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/10/2012 9:01 PM, F i L wrote:
 I had a debate on here a few months ago about the merits of default-to-NaN and
 others brought up similar situations. but since we can write:

      float z = float.nan;
      ...
That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaN's.
 explicitly, then this could be thought of as a debugging feature available to
 the programmer. The problem I've always had with defaulting to NaN is that it's
 inconsistent with integer types, and while there may be merit to the idea of
 defaulting all types to NaN/Null, it's simply unavailable for half of the
number
 spectrum. I can only speak for myself, but I much prefer consistency over
 anything else because it means there's less discrepancies I need to remember
 when hacking things together. It also steepens the learning curve.
It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.
 More importantly, what we have now is code where bugs-- like the one you
 mentioned above --are still possible with Ints, but also easy to miss since
"the
 other number type" behaves differently and programmers may accidentally assume
a
 NaN will propagate where it will not.
Sadly, D has to map onto imperfect hardware :-(

We do have NaN values for chars (0xFF) and pointers (the vilified 
'null'). Think how many bugs the latter has exposed, and then think of 
all the floating point code with no such obvious indicator of bad 
initialization.
 I thought there was some optimization by avoiding assignment, but IDK enough
 about memory at that level. Now I'm confused as to the point of 'float x =
void'
 type annotations. :-\
It would be used where the static analysis is not able to detect that the initializer is dead.
Aug 10 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/10/2012 9:32 PM, Walter Bright wrote:
 On 8/10/2012 9:01 PM, F i L wrote:
 I had a debate on here a few months ago about the merits of default-to-NaN and
 others brought up similar situations. but since we can write:

      float z = float.nan;
      ...
That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaN's.
Let me amend that. I've never seen anyone use float.nan, or whatever NaN is in the language they were using. They always use =0. I doubt that yelling at them will change anything.
Aug 10 2012
prev sibling next sibling parent "F i L" <witte2008 gmail.com> writes:
Walter Bright wrote:
 Sadly, D has to map onto imperfect hardware :-(

 We do have NaN values for chars (0xFF) and pointers (the 
 villified 'null'). Think how many bugs the latter has exposed, 
 and then think of all the floating point code with no such 
 obvious indicator of bad initialization.
Yes, if 'int' had a NaN state it would be great. (Though I remember 
hearing about some hardware that did support it... somewhere.)
Aug 10 2012
prev sibling next sibling parent reply "F i L" <witte2008 gmail.com> writes:
Walter Bright wrote:
 That is a good solution, but in my experience programmers just 
 throw in an =0, as it is simple and fast, and they don't 
 normally think about NaN's.
See! Programmers just want usable default values :-P
 It's too bad that ints don't have a NaN value, but 
 interestingly enough, valgrind does default initialize them to 
 some internal NaN, making it a most excellent bug detector.
I heard somewhere before there's actually an (Intel?) CPU which supports NaN ints... but maybe that's just hearsay.
 Sadly, D has to map onto imperfect hardware :-(

 We do have NaN values for chars (0xFF) and pointers (the 
 villified 'null'). Think how many bugs the latter has exposed, 
 and then think of all the floating point code with no such 
 obvious indicator of bad initialization.
Ya, but I don't think pointers/refs and floats are comparable, because 
one has copy semantics and the other does not. Conceptually, pointers 
are only references to data while numbers are actual data. It makes 
sense that they would default to different things. Though if int did 
have a NaN value, I'm not sure which way I would side on this issue. I 
still think I would prefer having some level of compile-time indication 
of my errors, simply because it saves time when you're making something.
 It would be used where the static analysis is not able to 
 detect that the initializer is dead.
Good to know.
 However, and I've seen this happen, people will satisfy the 
 compiler complaint by initializing the variable to any old 
 value (usually 0), because that value will never get used. 
 Later, after other things change in the code, that value 
 suddenly gets used, even though it may be an incorrect value 
 for the use.
Maybe the perfect solution is to have the compiler initialize the value 
to NaN, but also do a bit of static analysis and give a compiler error 
when it can determine your variable is being used before being 
assigned, for the sake of productivity.

In fact, for consistency, you could always enforce that (compiler 
error) rule on every local variable, so even ints would be required to 
have explicit initialization before use. I still prefer float class 
members to be defaulted to a usable value, for the sake of consistency 
with ints.
Aug 11 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/11/12 3:11 AM, F i L wrote:
 I still prefer float class members to be defaulted to a usable value,
 for the sake of consistency with ints.
Actually there's something that just happened two days ago to me that's 
relevant to this, particularly because it's in a different language 
(SQL) and different domain (Machine Learning).

I was working with an iterative algorithm implemented in SQL, which 
performs some aggregate computation on some 30 billion samples. The 
algorithm is rather intricate, and each iteration takes the previous 
one's result as input. Somehow at the end there were NaNs in the sample 
data I was looking at (there weren't supposed to be). So I started 
investigating; the NaNs could appear only in a rare data corruption 
case. And indeed, before long I found 4 (four) samples out of 30 
billion that were corrupt. After one iteration, there were 300K NaNs. 
After two iterations, a few millions. After four, 800M samples were 
messed up. NaNs did save the day.

Although this case is not about default values but about the result of 
a computation (in this case 0.0/0.0), I think it still reveals the 
usefulness of having a singular value in the floating point realm.


Andrei
Aug 11 2012
parent reply "F i L" <witte2008 gmail.com> writes:
Andrei Alexandrescu wrote:
 [ ... ]

 Although this case is not about default values but about the 
 result of a computation (in this case 0.0/0.0), I think it 
 still reveals the usefulness of having a singular value in the 
 floating point realm.
My argument was never against the usefulness of NaN for debugging... 
only that it should be considered a debugging feature and explicitly 
defined, rather than intruding on convenience and consistency (with 
int) by being the default. I completely agree NaNs are important for 
debugging floating point math; in fact D's default-to-NaN has caught a 
couple of my construction mistakes before.

The problem is that this sort of construction mistake is bigger than 
just floating point and NaN. You can mis-set a variable, float or not, 
or you can fail to set an int when you should have. So the question 
becomes not what benefit NaN is for debugging, but what a person's 
thought process is when creating/debugging code, and herein lies the 
heart of my qualm.

In D we have a bit of a conceptual double standard within the number 
community. I have to remember these rules when I'm creating something, 
not just when I'm debugging it. As often as D may have caught a 
construction mistake specifically related to floats in my code, 10x 
more so it's produced NaN's where I intended a number, because I forgot 
about the double standard when adding a field or creating a variable. A 
C++ guy might not think twice about this because he's used to having to 
default values all the time (IDK, I'm not that guy), but coming from a 
language like C# it's a paper-cut on someone's opinion of the language.
Aug 11 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/11/2012 12:33 PM, F i L wrote:
 In D we have a bit of a conceptual double standard within the
 number community. I have to remember these rules when I'm creating something,
 not just when I'm debugging it. As often as D may have caught a construction
 mistake specifically related to floats in my code, 10x more so it's produced
 NaN's where I intended a number, because I forgot about the double standard
when
 adding a field or creating a variable.
I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in the field.
 A C++ guy might not think twice about this because he's used to having to
 default values all the time (IDK, I'm not that guy),
Only if a default constructor is defined for the type, which it often 
is not; otherwise you'll get garbage for a default initialization.
Aug 11 2012
parent reply "F i L" <witte2008 gmail.com> writes:
Walter Bright wrote:
 I'd rather have a 100 easy to find bugs than 1 unnoticed one 
 that went out in the field.
That's just the thing, bugs are arguably easier to hunt down when 
things default to a consistent, usable value. When variables are 
defaulted to zero, I have a guarantee that any propagated NaN bug is 
_not_ coming from them (directly). With NaN defaults, I only have a 
guarantee that the value _might_ be coming from said variable. Then, I 
also have more to be aware of when searching through code, because my 
ints behave differently than my floats. Arguably, you always have to be 
aware of this, but at least with explicit sets to NaN, I know the 
potential culprits earlier (because they'll have a distinct assignment).

With static analysis warning against local-scope NaN issues, there's 
really only one situation where setting to NaN catches bugs, and that's 
when you want to guarantee that a member variable is specifically 
assigned a value (of some kind) during construction. This is a 
corner-case situation because:

1. It makes no guarantee about what value is actually assigned to the 
variable, only that it's set to something. Which means it's either 
forgotten in favor of an 'if' statement, or used in combination with one.

2. Because of its singular debugging potential, NaN safeguards are most 
often intentionally put in place (or in D's case, left in place). This 
is why I think such situations should require an explicit assignment to 
NaN.

The "100 easy bugs" you mentioned weren't actually "bugs", they were 
times I forgot floats defaulted _differently_. The 10 times where NaN 
caught legitimate bugs, I would have had to hunt down the mistake 
either way, and it was trivial to do regardless of the NaN. Even if it 
wasn't trivial, I could have very easily assigned NaN to questionable 
variables explicitly.
Aug 11 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/11/2012 3:01 PM, F i L wrote:
 Walter Bright wrote:
 I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in
 the field.
That's just the thing, bugs are arguably easier to hunt down when things default to a consistent, usable value.
Many, many programming bugs trace back to assumptions that floating point numbers act like ints. There's just no way to avoid knowing and understanding the differences.
 When variables are defaulted to Zero, I have a
 guarantee that any propagated NaN bug is _not_ coming from them (directly).
With
 NaN defaults, I only have a guarantee that the value _might_ be coming said
 variable.
I don't see why this is a bad thing. The fact is, with NaN you know 
there is a bug. With 0, you may never realize there is a problem. 
Andrei wrote me about the output of a program he is working on having 
billions of result values, and he noticed a few were NaNs, which he 
traced back to a bug. If the bug had set the float value to 0, there's 
no way he would have ever noticed the issue.

It's all about daubing bugs with day-glo orange paint so you know 
there's a problem. Painting them with camo is not the right solution.
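
A small D sketch (with a hypothetical Sample type, purely for 
illustration) of the effect being described: the forgotten field 
surfaces as NaN in the output instead of silently biasing the total 
toward zero.

    import std.math, std.stdio;

    struct Sample
    {
        double weight;        // default-initialized to NaN in D
        double value = 1.0;
    }

    void main()
    {
        Sample[] data = new Sample[4];
        foreach (ref s; data[0 .. 3])
            s.weight = 0.25;  // oops: data[3].weight never gets set

        double total = 0.0;
        foreach (s; data)
            total += s.weight * s.value;

        writeln(total);            // nan -- the bug is visible in the output
        writeln(isNaN(total));     // true
    }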
Aug 11 2012
next sibling parent reply "F i L" <witte2008 gmail.com> writes:
Walter Bright wrote:
 That's just the thing, bugs are arguably easier to hunt down 
 when things default
 to a consistent, usable value.
Many, many programming bugs trace back to assumptions that floating point numbers act like ints. There's just no way to avoid knowing and understanding the differences.
My point was that the majority of the time there wasn't a bug 
introduced. Meaning the code was written and functioned as expected 
after I initialized the value to 0. I was only expecting the value to 
behave like its 'int' relative (in initial value), but received a NaN 
in the output because I forgot to be explicit.
 I don't see why this is a bad thing. The fact is, with NaN you 
 know there is a bug. With 0, you may never realize there is a 
 problem. Andrei wrote me about the output of a program he is 
 working on having billions of result values, and he noticed a 
 few were NaNs, which he traced back to a bug. If the bug had 
 set the float value to 0, there's no way he would have ever 
 noticed the issue.

 It's all about daubing bugs with day-glo orange paint so you 
 know there's a problem. Painting them with camo is not the 
 right solution.
Yes, and this is an excellent argument for using NaN as a debugging 
practice in general, but I don't see anything in favor of defaulting to 
NaN. If you don't do some kind of check against code, especially with 
such large data sets, bugs of various kinds are going to go unchecked 
regardless. A bug where an initial data value was accidentally 
initialized to 0 (by a third party later on, for instance) could be 
just as easy to miss, or easier if you're expecting a NaN to appear. In 
fact, an explicit set to NaN might discourage a third party from 
assigning without first questioning the original intention. In this 
situation I imagine best practice would be to write:

    float dataValue = float.nan; // MUST BE NaN, DO NOT CHANGE!
                                 // set to NaN to ensure it is set.
Aug 11 2012
parent dennis luehring <dl.soluz gmx.net> writes:
Am 12.08.2012 02:43, schrieb F i L:
 Yes, and this is an excellent argument for using NaN as a
 debugging practice in general, but I don't see anything in favor
 of defaulting to NaN. If you don't do some kind of check against
 code, especially with such large data sets, bugs of various kinds
 are going to go unchecked regardless.
It makes absolutely no sense to have different initialization styles in 
debug and release - and according to Andrei's example: there are many 
situations where slow debug code isn't capable of reproducing the error 
in a human timespan - especially when working with million- or 
billion-element datasets (like I also do...)
Aug 11 2012
prev sibling parent reply Don Clugston <dac nospam.com> writes:
On 12/08/12 01:31, Walter Bright wrote:
 On 8/11/2012 3:01 PM, F i L wrote:
 Walter Bright wrote:
 I'd rather have a 100 easy to find bugs than 1 unnoticed one that
 went out in
 the field.
That's just the thing, bugs are arguably easier to hunt down when things default to a consistent, usable value.
Many, many programming bugs trace back to assumptions that floating point numbers act like ints. There's just no way to avoid knowing and understanding the differences.
Exactly. I have come to believe that there are very few algorithms 
originally designed for integers, which also work correctly for 
floating point.

Integer code nearly always assumes things like, x + 1 != x, x == x, 
(x + y) - y == x.

    for (y = x; y < x + 10; y = y + 1) { .... }

How many times does it loop?
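
A quick D illustration (my own example) of how the trip count of that 
loop depends entirely on the magnitude of x:

    import std.stdio;

    void main()
    {
        // With ints this loop would run exactly 10 times; with floats the
        // trip count depends on the magnitude of x.
        float x = 1e20f;
        int count = 0;
        for (float y = x; y < x + 10; y = y + 1)
        {
            ++count;
            if (count > 100) break;   // guard, in case it never terminates
        }
        writeln(count);   // 0 -- x + 10 rounds back to x, so the test fails at once

        // At x = 2.0f ^^ 24 the spacing between adjacent floats is already 2,
        // so y + 1 rounds back to y and the loop would never terminate.
    }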
Aug 13 2012
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 13/08/12 11:11, Don Clugston wrote:
 Exactly. I have come to believe that there are very few algorithms originally
 designed for integers, which also work correctly for floating point.
    ////////
    import std.stdio;

    void main()
    {
        real x = 1.0/9.0;
        writefln("x = %.128g", x);
        writefln("9x = %.128g", 9.0*x);
    }
    ////////

... well, that doesn't work, does it? Looks like some sort of cheat in 
place to make sure that the successive division and multiplication will 
revert to the original number.
 Integer code nearly always assumes things like, x + 1 != x, x == x,
 (x + y) - y == x.
There's always good old "if(x==0)" :-)
Aug 13 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2012 5:38 AM, Joseph Rushton Wakeling wrote:
 Looks like some sort of cheat in place to
 make sure that the successive division and multiplication will revert to the
 original number.
That's called "rounding". But rounding always implies some, small, error that can accumulate into being a very large error.
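
A tiny D illustration (my own example) of that accumulation:

    import std.stdio;

    // Repeatedly adding a value that cannot be represented exactly lets the
    // rounding error accumulate; a single operation does not.
    void main()
    {
        float sum = 0.0f;
        foreach (i; 0 .. 10_000_000)
            sum += 0.1f;

        writefln("accumulated: %.7g", sum);               // noticeably off from 1e6
        writefln("single op:   %.7g", 10_000_000 * 0.1f); // much closer
    }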
Aug 13 2012
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 13/08/12 20:04, Walter Bright wrote:
 That's called "rounding". But rounding always implies some, small, error that
 can accumulate into being a very large error.
Well, yes. I was just remarking on the choice of rounding and the 
motivation behind it. After all, you _could_ round it instead as

    x = 1.0/9.0 == 0.11111111111111 ... 111   [finite number of decimal places]

but then

    9*x == 0.999999999999 ... 9999   [i.e. doesn't multiply back to 1.0]

... and this is probably more likely to result in undesirable error 
than the other rounding scheme. (I think the calculator app on Windows 
used to have this behaviour some years back.)
Aug 13 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Don Clugston:

 I have come to believe that there are very few algorithms 
 originally designed for integers, which also work correctly for 
 floating point.
And JavaScript programs that use integers? Bye, bearophile
Aug 13 2012
prev sibling parent reply "TJB" <broughtj gmail.com> writes:
On Monday, 13 August 2012 at 10:11:06 UTC, Don Clugston wrote:

  ... I have come to believe that there are very few algorithms 
 originally designed for integers, which also work correctly for 
 floating point.

 Integer code nearly always assumes things like, x + 1 != x, x 
 == x,
 (x + y) - y == x.


 for (y = x; y < x + 10; y = y + 1) { .... }

 How many times does it loop?
Don, I would appreciate your thoughts on the issue of re-implementing numeric codes like BLAS and LAPACK in pure D to benefit from the many nice features listed in this discussion. Is it feasible? Worthwhile? Thanks, TJB
Aug 13 2012
parent Don Clugston <dac nospam.com> writes:
On 14/08/12 05:03, TJB wrote:
 On Monday, 13 August 2012 at 10:11:06 UTC, Don Clugston wrote:

  ... I have come to believe that there are very few algorithms
 originally designed for integers, which also work correctly for
 floating point.

 Integer code nearly always assumes things like, x + 1 != x, x == x,
 (x + y) - y == x.


 for (y = x; y < x + 10; y = y + 1) { .... }

 How many times does it loop?
Don, I would appreciate your thoughts on the issue of re-implementing numeric codes like BLAS and LAPACK in pure D to benefit from the many nice features listed in this discussion. Is it feasible? Worthwhile? Thanks, TJB
I found that when converting code for Special Functions from C to D, the code quality improved enormously. Having 'static if' and things like float.epsilon as built-ins makes a surprisingly large difference. It encourages correct code. (For example, it makes any use of magic numbers in the code look really ugly and wrong). Unit tests help too. That probably doesn't apply so much to LAPACK and BLAS, but it would be interesting to see how far we can get with the new SIMD support.
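
As a rough sketch (my own, not code from Phobos) of what 'static if' 
plus the built-in floating point properties buy you in generic numeric 
code:

    import std.math, std.stdio, std.traits;

    // Relative-equality check whose tolerance adapts to the floating point
    // type via the built-in .epsilon and .mant_dig properties.
    bool approxEqualRel(T)(T a, T b) if (isFloatingPoint!T)
    {
        static if (T.mant_dig >= 64)      // 80-bit real
            enum T tol = 8 * T.epsilon;
        else
            enum T tol = 16 * T.epsilon;

        return fabs(a - b) <= tol * fmax(fabs(a), fabs(b));
    }

    unittest
    {
        assert(approxEqualRel(1.0f, 1.0f + float.epsilon));
        assert(!approxEqualRel(1.0, 1.125));
    }

    void main()
    {
        writeln(approxEqualRel(0.1 + 0.2, 0.3));  // true, within tolerance
        writeln(0.1 + 0.2 == 0.3);                // false with exact ==
    }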
Aug 14 2012
prev sibling parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
 It's too bad that ints don't have a NaN value, but 
 interestingly enough, valgrind does default initialize them to 
 some internal NaN, making it a most excellent bug detector.
The compiler could always have flags specifying if variables were used, 
and if they are false they are as good as NaN. The only downside is a 
performance hit unless you mark it as a release binary. It really comes 
down to whether it's worth implementing or considered a big change 
(unless it's a flag you have to specially turn on).

Example:

    int a;
    writeln(a++); //compile-time error, or throws an exception at runtime
                  //(read access before being set)

internally translated as:

    int a;
    bool _a_is_used = false;

    if (!_a_is_used)
        throw new Exception("a not initialized before use!");
    //passing to functions will throw the exception,
    //unless the signature is 'out'
    writeln(a);
    ++a;
    _a_is_used = true;
 Sadly, D has to map onto imperfect hardware :-(
Not so much imperfect hardware, just the imperfect 'human' variable.
 We do have NaN values for chars (0xFF) and pointers (the 
 villified 'null'). Think how many bugs the latter has exposed, 
 and then think of all the floating point code with no such 
 obvious indicator of bad initialization.
Aug 11 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/11/2012 1:30 AM, Era Scarecrow wrote:
 On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
 It's too bad that ints don't have a NaN value, but interestingly enough,
 valgrind does default initialize them to some internal NaN, making it a most
 excellent bug detector.
The compiler could always have flags specifying if variables were used, and if they are false they are as good as NaN. Only downside is a performance hit unless you Mark it as a release binary. It really comes down to if it's worth implementing or considered a big change (unless it's a flag you have to specially turn on)
Not so easy. Suppose you pass a pointer to the variable to another function. Does that function set it?
Aug 11 2012
parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 11 August 2012 at 09:26:42 UTC, Walter Bright wrote:
 On 8/11/2012 1:30 AM, Era Scarecrow wrote:
 The compiler could always have flags specifying if variables 
 were used, and if they are false they are as good as NaN. Only 
 downside is a performance hit unless you Mark it as a release 
 binary. It really comes down to if it's worth implementing or 
 considered a big change (unless it's a flag you have to 
 specially turn on)
Not so easy. Suppose you pass a pointer to the variable to another function. Does that function set it?
I suppose there could be a second hidden pointer/bool as part of calls, 
but then it's completely incompatible with any C calling convention, 
meaning that is probably out of the question. Either a) pointers are 
low-level enough, like casting, that it's all up to the programmer; or 
b) same as before: unless an 'out' parameter is specified, it would 
likely throw an exception at that point (since attempting to read/pass 
the address of an uninitialized variable is the same as accessing it 
directly). After all, having a false positive is better than not being 
involved at all, right?

Of course, with that in mind, specifying a variable to begin as void 
(uninitialized) could be its own form of initialization? (Meaning it 
wouldn't be checking those, even though they hold known garbage.)
Aug 11 2012
prev sibling parent reply "F i L" <witte2008 gmail.com> writes:
F i L wrote:
 Walter Bright wrote:
 It catches only a subset of these at compile time. I can craft 
 any number of ways of getting it to miss diagnosing it. 
 Consider this one:

    float z;
    if (condition1)
         z = 5;
    ... lotsa code ...
    if (condition2)
         z++;
 
 [...]
Yes, but that's not really an issue since the compiler informs the coder of it's limitation. You're simply forced to initialize the variable in this situation.
In C#, fields are defaulted to a usable value, and locals have to be 
explicitly set before they're used... so, expanding on your example 
above:

    float z;
    if (condition1)
        z = 5;
    else
        z = 6; // 'else' required
    ... lotsa code ...
    if (condition2)
        z++;

On the first condition, without an 'else z = ...', or if the condition 
was removed at a later time, then you'll get a compiler error and be 
forced to explicitly assign 'z' somewhere above. C# catches these 
issues at compile-time, whereas in D you need to:

    1. run the program
    2. get bad result
    3. hunt down bug

And you can still opt into NaN debugging where fields are initialized 
in a constructor:

    class Foo
    {
        float f = float.NaN; // Can't use 'f' unless Foo is
                             // properly constructed.
    }
Aug 10 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/10/2012 9:55 PM, F i L wrote:
 On the first condition, without an 'else z = ...', or if the condition was
 removed at a later time, then you'll get a compiler error and be forced to


 whereas in D you need to:

    1. run the program
    2. get bad result
    3. hunt down bug
However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.
Aug 10 2012
parent reply "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 11 August 2012 at 05:41:23 UTC, Walter Bright wrote:
 On 8/10/2012 9:55 PM, F i L wrote:
 On the first condition, without an 'else z = ...', or if the 
 condition was removed at a later time, then you'll get a 
 compiler error and be forced to explicitly assign 'z' 


 compile-time, whereas in D you need to:
   1. run the program
   2. get bad result
   3. hunt down bug
However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.
Note to Walter:

You're obviously correct that you can make an arbitrarily complex 
program to make it too difficult for the compiler to enforce definite 
assignment (except in a few corner cases).

What you seem to be missing is that the issue you're saying is correct 
in theory, but too much of a corner case in practice. C# programmers 
rarely run into the problem you're mentioning, and even when they do, 
they don't have nearly as much of a problem with fixing it as you seem 
to think.

The only reason you run into this sort of problem (assuming you do, and 
it's not just a theoretical discussion) is that you're in the C/C++ 
mindset, and using variables in the C/C++ fashion. In C#, you simply 
_wouldn't_ try to make things so complicated when coding, and you 
simply _wouldn't_ run into these problems the way you /think/ you 
would, as a C++ programmer.

Regardless, it looks to me like you two are arguing for two orthogonal 
issues:

F i L: The compiler should detect uninitialized variables.
Walter: The compiler should choose to initialize variables with NaN.

What I'm failing to understand is, why can't we have both?

1. Compiler _warns_ about "uninitialized variables" (or scalars, at 
least), unless something takes the address of the variable, in which 
case the compiler gives up trying to analyze it.
   Bonus points: Try to detect a couple of common cases (e.g. if/else) 
instead of giving up so easily.

2. In any case, the compiler initializes the variable with whatever 
default value Walter deems useful.

Then you get the best of both worlds:

1. You force the programmer to manually initialize the variable in most 
cases, forcing him to think about the default value. It's almost no 
trouble for the programmer.

2. In the cases where it's not possible, the language helps the 
programmer catch bugs.

Nothing lost, anyway; plenty to be gained.
Aug 14 2012
next sibling parent "Michal Minich" <michal.minich gmail.com> writes:
On Tuesday, 14 August 2012 at 10:31:30 UTC, Mehrdad wrote:
 Note to Walter:

 You're obviously correct that you can make an arbitrarily 
 complex program to make it too difficult for the compiler to 

 cases).

 What you seem to be missing is that the issue you're saying is 
 correct in theory, but too much of a corner case in practice.


 you're mentioning, and even when they do, they don't have 
 nearly as much of a problem with fixing it as you seem to think.
In C# it sometimes takes quite hairy code (nested if/foreach/try) to 
make sure all cases are handled when initializing a variable. 
Compilation errors can be simply dismissed by assigning a 'default' 
value to the variable at the beginning of the function, but that is 
generally sloppy programming and you lose the useful help of the 
compiler.

I would like to see C#'s definite assignment rules applied to D: 
http://msdn.microsoft.com/en-us/library/aa691172%28v=vs.71%29.aspx
Aug 14 2012
prev sibling next sibling parent Don Clugston <dac nospam.com> writes:
On 14/08/12 12:31, Mehrdad wrote:
 On Saturday, 11 August 2012 at 05:41:23 UTC, Walter Bright wrote:
 On 8/10/2012 9:55 PM, F i L wrote:
 On the first condition, without an 'else z = ...', or if the
 condition was removed at a later time, then you'll get a compiler
 error and be forced to explicitly assign 'z' somewhere above using

 catches these issues at compile-time, whereas in D you need to:
   1. run the program
   2. get bad result
   3. hunt down bug
However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.
Note to Walter: You're obviously correct that you can make an arbitrarily complex program to make it too difficult for the compiler to enforce What you seem to be missing is that the issue you're saying is correct in theory, but too much of a corner case in practice. mentioning, and even when they do, they don't have nearly as much of a problem with fixing it as you seem to think. The only reason you run into this sort of problem (assuming you do, and it's not just a theoretical discussion) is that you're in the C/C++ mindset, and using variables in the C/C++ fashion. simply _wouldn't_ try to make things so complicated when coding, and you simply _wouldn't_ run into these problems the way you /think/ you would, as a C++ programmer. Regardless, it looks to me like you two are arguing for two orthogonal issues: F i L: The compiler should detect uninitialized variables. Walter: The compiler should choose initialize variables with NaN. What I'm failing to understand is, why can't we have both? 1. Compiler _warns_ about "uninitialized variables" (or scalars, at address of the variable, in which case the compiler gives up trying to Bonus points: Try to detect a couple of common cases (e.g. if/else) instead of giving up so easily. 2. In any case, the compiler initializes the variable with whatever default value Walter deems useful. Then you get the best of both worlds: 1. You force the programmer to manually initialize the variable in most cases, forcing him to think about the default value. It's almost no trouble for 2. In the cases where it's not possible, the language helps the programmer catch bugs.
DMD detects uninitialized variables if you compile with -O. It's hard to implement the full Monty at the moment, because all that code is in the backend rather than the front-end.


Completely agree. I always thought the intention was that assigning to NaN was simply a way of catching the difficult cases that slip through compile-time checks. Which includes the situation where the compile-time checking isn't yet implemented at all. This is the first time I've heard the suggestion that it might never be implemented. The thing which is really bizarre though, is float.init. I don't know what the semantics of it are.
Aug 14 2012
prev sibling next sibling parent reply "F i L" <witte2008 gmail.com> writes:
Mehrdad wrote:
 Note to Walter:

 You're obviously correct that you can make an arbitrarily 
 complex program to make it too difficult for the compiler to 

 cases).
 
 [ ... ]
I think some here are mis-interpreting Walter's position concerning 
static analysis from our earlier conversation, so I'll share my 
impression of his thoughts.

I can't speak for Walter, of course, but I'm pretty sure that early on 
in our conversation he agreed that having the compiler catch 
local-scope initialization issues was a good idea, or at least, wasn't 
a bad one (again, correct me if I'm wrong). I doubt he would be averse 
to eventually having DMD perform this sort of static analysis to help 
developers, though I doubt it's a high priority for him.

The majority of the conversation after that was concerning struct/class 
field defaults:

    class Foo
    {
        float x; // I think this should be 0.0f
                 // Walter thinks it should be NaN
    }

In this situation static analysis can't help catch issues, and we're 
forced to rely on a default value of some kind. Both Walter and I have 
stated the reasoning behind our opinions previously, so I won't repeat 
it here.
Aug 14 2012
next sibling parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Tue, 14 Aug 2012 16:32:25 +0200, F i L <witte2008 gmail.com> wrote:

    class Foo
    {
        float x; // I think this should be 0.0f
                 // Walter thinks it should be NaN
    }

 In this situation static analysis can't help catch issues, and we're  
 forced to rely on a default value of some kind.
Really? We can catch (or, should be able to) missing initialization of 
stuff with @disable this(), but not floats?

Classes have constructors, which lend themselves perfectly to doing 
exactly this (just pretend the member is a local variable).

Perhaps there are problems with structs without disabled default 
constructors, but even those are trivially solvable by requiring a 
default value at declaration time.

-- 
Simen
Aug 14 2012
parent reply "F i L" <witte2008 gmail.com> writes:
On Tuesday, 14 August 2012 at 14:46:30 UTC, Simen Kjaeraas wrote:
 On Tue, 14 Aug 2012 16:32:25 +0200, F i L <witte2008 gmail.com> 
 wrote:

   class Foo
   {
       float x; // I think this should be 0.0f
                // Walter thinks it should be NaN
   }

 In this situation static analysis can't help catch issues, and 
 we're forced to rely on a default value of some kind.
Really? We can catch (or, should be able to) missing initialization of stuff with disable this(), but not floats? Classes have constructors, which lend themselves perfectly to doing exactly this (just pretend the member is a local variable). Perhaps there are problems with structs without disabled default constructors, but even those are trivially solvable by requiring a default value at declaration time.
You know, I never actually thought about it much, but I think you're right. I guess the same rules could apply to type fields.
Aug 14 2012
next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Tuesday, 14 August 2012 at 15:24:30 UTC, F i L wrote:
 Really? We can catch (or, should be able to) missing 
 initialization of stuff with @disable this(), but not floats?

 Classes have constructors, which lend themselves perfectly to 
 doing exactly this (just pretend the member is a local 
 variable).

 Perhaps there are problems with structs without disabled 
 default constructors, but even those are trivially solvable by 
 requiring a default value at declaration time.
You know, I never actually thought about it much, but I think you're right. I guess the same rules could apply to type fields.
Mmmm... What if you added a command that has a file/local scope? 
Perhaps, following @disable this(), it could be @disable init; or 
@disable .init. This would only work for built-in types, and possibly 
structs with variables that aren't explicitly set with default values. 
It sorta already fits with what's there.

    @disable init; //global scope in file, like @safe.

    struct someCipher
    {
        @disable init; //local scope, in this case the whole struct.
        int[][] tables; //now gives compile-time error unless @disable this() used.
        ubyte[] key = [1,2,3,4]; //explicitly defined as a default
        this(ubyte[] k, int[][] t){key=k;tables=t;}
    }

    void myfun()
    {
        someCipher x; //compile time error since struct fails
                      //(But not at this line unless @disable this() used)
        someCipher y = someCipher([1,2,3,4], [[1,2],[1,2]]); //should work as expected.
    }
Aug 14 2012
prev sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 14 August 2012 at 15:24:30 UTC, F i L wrote:
 On Tuesday, 14 August 2012 at 14:46:30 UTC, Simen Kjaeraas 
 wrote:
 On Tue, 14 Aug 2012 16:32:25 +0200, F i L 
 <witte2008 gmail.com> wrote:

  class Foo
  {
      float x; // I think this should be 0.0f
               // Walter thinks it should be NaN
  }

 In this situation static analysis can't help catch issues, 
 and we're forced to rely on a default value of some kind.
Really? We can catch (or, should be able to) missing initialization of stuff with disable this(), but not floats? Classes have constructors, which lend themselves perfectly to doing exactly this (just pretend the member is a local variable). Perhaps there are problems with structs without disabled default constructors, but even those are trivially solvable by requiring a default value at declaration time.
You know, I never actually thought about it much, but I think you're right. I guess the same rules could apply to type fields.
:) We could do the same for structs and classes... what I said doesn't just apply to local variables.
Aug 14 2012
prev sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 14 August 2012 at 14:32:26 UTC, F i L wrote:
 Mehrdad wrote:
 Note to Walter:

 You're obviously correct that you can make an arbitrarily 
 complex program to make it too difficult for the compiler to 

 cases).
 
 [ ... ]
I think some here are mis-interpreting Walters position concerning static analysis from our earlier conversation, so I'll share my impression of his thoughts. I can't speak for Walter, of course, but I'm pretty sure that early on in our conversation he agreed that having the compiler catch local scope initialization issues was a good idea, or at least, wasn't a bad one (again, correct me if I'm wrong). I doubt he would be adverse to eventually having DMD perform this sort of static analysis to help developers, though I doubt it's a high priority for him.
Ah, well if he's for it, then I misunderstood. I read through the entire thread (but not too carefully, just 1 read) and my impression was that he didn't like the idea because it would fail in some cases (and because D doesn't seem to love emitting compiler warnings in general), but if he likes it, then great. :)
Aug 14 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2012 3:31 AM, Mehrdad wrote:
 Then you get the best of both worlds:

 1. You force the programmer to manually initialize the variable in most cases,
 forcing him to think about the default value. It's almost no trouble for

 2. In the cases where it's not possible, the language helps the programmer
catch
 bugs.



As I've explained before, user defined types have "default constructors". If builtin types do not, then you've got a barrier to writing generic code. Default initialization also applies to static arrays, tuples, structs and dynamic allocation. It seems a large inconsistency to complain about them only for local variables of basic types, and not for any aggregate type or user defined type.


As for the 'rarity' of the error I mentioned, yes, it is unusual. The trouble is when it creeps unexpectedly into otherwise working code that has been working for a long time.
Aug 14 2012
parent reply "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 14 August 2012 at 21:13:01 UTC, Walter Bright wrote:
 On 8/14/2012 3:31 AM, Mehrdad wrote:
 Then you get the best of both worlds:

 1. You force the programmer to manually initialize the 
 variable in most cases,
 forcing him to think about the default value. It's almost no 
 trouble for

 2. In the cases where it's not possible, the language helps 
 the programmer catch
 bugs.



As I've explained before, user defined types have "default constructors". If builtin types do not, then you've got a barrier to writing generic code.
Just because they _have_ a default constructor doesn't mean the compiler should implicitly _call_ them on your behalf.


Huh? I think you completely misread my post... I was talking about "definite assignment", i.e. the _lack_ of automatic initialization.
 As for the 'rarity' of the error I mentioned, yes, it is 
 unusual. The trouble is when it creeps unexpectedly into 
 otherwise working code that has been working for a long time.
It's no "trouble" in practice, that's what I'm trying to say. It only looks like "trouble" if you look at it from the C/C++
Aug 14 2012
next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 14 August 2012 at 21:22:14 UTC, Mehrdad wrote:

Typo, scratch Java, it's N/A for Java.
Aug 14 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2012 2:22 PM, Mehrdad wrote:
 I was talking about "definite assignment", i.e. the _lack_ of automatic
 initialization.
I know. How does that fit in with default construction?
Aug 14 2012
parent reply "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 14 August 2012 at 21:58:20 UTC, Walter Bright wrote:
 On 8/14/2012 2:22 PM, Mehrdad wrote:
 I was talking about "definite assignment", i.e. the _lack_ of 
 automatic initialization.
I know. How does that fit in with default construction?
They aren't called unless the user calls them.

    void Bar<T>(T value) { }

    void Foo<T>() where T : new()  // generic constraint for default constructor
    {
        T uninitialized;
        T initialized = new T();
        Bar(initialized);    // error
        Bar(uninitialized);  // OK
    }

    void Test()
    {
        Foo<int>();
        Foo<Object>();
    }

D could take a similar approach.
Aug 14 2012
next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 14 August 2012 at 22:57:26 UTC, Mehrdad wrote:
 		Bar(initialized);  // error
 		Bar(uninitialized);  // OK
Er, other way around I mean...
Aug 14 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2012 3:57 PM, Mehrdad wrote:
 I know. How does that fit in with default construction?
They aren't called unless the user calls them.
I guess they aren't really default constructors, then <g>. So what happens when you allocate an array of them?
 D could take a similar approach.
It could, but default construction is better (!).
Aug 14 2012
parent "Mehrdad" <wfunction hotmail.com> writes:
On Wednesday, 15 August 2012 at 00:32:43 UTC, Walter Bright wrote:
 On 8/14/2012 3:57 PM, Mehrdad wrote:
 I guess they aren't really default constructors, then <g>.
I say potayto, you say potahto... :P
 So what happens when you allocate an array of them?
For arrays, they're called automatically.

Well, OK, that's a bit of a simplification. It's what happens from the 
user perspective, not the compiler's (or runtime's).

Here's the full story of how C# does it. And please read it carefully, 
since I'm __not__ saying D should copy it:

- You can define a custom default constructor for classes, but not structs.
- Structs _always_ have a zero-initializing default (no-parameter) constructor.
- Therefore, there is no such thing as "copy construction"; it's bitwise-copied.
- Ctors for _structs_ MUST initialize every field (or call the default ctor).
- Ctors for _classes_ don't have this restriction.
- Since initialization is "cheap", the runtime _always_ does it, for _security_.
- The above^ is IRRELEVANT to the compiler!
  * It enforces initialization where it can.
  * It explicitly tells the runtime to auto-initialize when it can't.
  -- You can ONLY take the address of a variable in unsafe{} blocks.
  -- This implies you know what you're doing, so it's not a problem.

What D would do _ideally_, IMO:

1. Keep the ability to define default (no-args) and postblit constructors.
2. _Always_ force the programmer to initialize _all_ variables explicitly.
   * No, this is NOT what C++ does.
   * Yes, it is tested & DOES work well in practice. But NOT in the C++ mindset.
   * If the programmer _needs_ vars to be uninitialized, he can say = void.
   * If the programmer wants NaNs, he can just say = T.init. Bingo.

It should work pretty darn well, if you actually give it a try. (Don't 
believe me? Put it behind a compiler switch, and see how many people 
start using it, and how many of them [don't] complain about it!)
 D could take a similar approach.
It could, but default construction is better (!).
Well, that's so convincing, I'm left speechless!
Aug 14 2012
prev sibling next sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Friday, 10 August 2012 at 22:01:46 UTC, Walter Bright wrote:
 It catches only a subset of these at compile time. I can craft 
 any number of ways of getting it to miss diagnosing it. 
 Consider this one:

     float z;
     if (condition1)
          z = 5;
     ... lotsa code ...
     if (condition2)
          z++;

 To diagnose this correctly, the static analyzer would have to 
 determine that condition1 produces the same result as 
 condition2, or not. This is impossible to prove. So the static 
 analyzer either gives up and lets it pass, or issues an 
 incorrect diagnostic. So our intrepid programmer is forced to 
 write:

     float z = 0;
     if (condition1)
          z = 5;
     ... lotsa code ...
     if (condition2)
          z++;

 Now, as it may turn out, for your algorithm the value "0" is an 
 out-of-range, incorrect value. Not a problem as it is a dead 
 assignment, right?

 But then the maintenance programmer comes along and changes 
 condition1 so it is not always the same as condition2, and now 
 the z++ sees the invalid "0" value sometimes, and a silent bug 
 is introduced.

 This bug will not remain undetected with the default NaN 
 initialization.
C#'s rule doesn't try to prove that the variable is NOT set and then 
emit an error. It tries to prove that the variable IS set, and if it 
can't prove that, it's an error.

It's not an incorrect diagnostic, it does exactly what it's supposed to 
do, and the programmer has to be explicit when one takes on the 
responsibility of initialization. I don't see anything wrong with this; 
the C# programmers I've talked to love it (I much prefer it too).

Leaving a local variable initially uninitialized (or rather, not 
explicitly initialized) is a good way to portray the intention that it 
will be set later; if your program compiles, your variable is 
guaranteed to be initialized later but before use. This is a useful 
guarantee when reading/maintaining code.

In D, on the other hand, it's possible to write D code like:

    for(size_t i; i < length; ++i)
    {
        ...
    }

And I've actually seen this kind of code a lot in the wild. It boggles 
my mind that you think that this code should be legal. I think it's 
lazy - the intention is not clear. Is the default initializer being 
intentionally relied on, or was it unintentional? I've seen both cases. 
The for-loop example is an extreme one for demonstrative purposes, most 
examples are less obvious.

Saying that most programmers will explicitly initialize floating point 
numbers to 0 instead of NaN when taking on initialization 
responsibility is a cop-out - float.init and float.nan are obviously 
the values you should be going for. The benefit is easy for programmers 
to understand, especially if they already understand why float.init is 
NaN. You say yelling at them probably won't help - why not? I 
personally use float.init/double.init etc. in my own code, and I'm sure 
other informed programmers do too. I can understand why people don't do 
it in, say, C, with NaN being less defined there afaik. D promotes NaN 
actively and programmers should be eager to leverage NaN explicitly too.

None of this would change anything for non-local variables - they all 
have a defined default initializer. Note also that the local-variable 
analysis is limited to the scope of a single function body, it does not 
do inter-procedural analysis. I think this would be a great thing for 
D, and I believe that all code this change breaks is actually broken to 
begin with.
Aug 11 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/11/2012 1:57 AM, Jakob Ovrum wrote:

 set and then emits an error. It tries to prove that the variable IS set, and if
 it can't prove that, it's an error.

 It's not an incorrect diagnostic, it does exactly what it's supposed to do
Of course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required. And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.
 In D, on the other hand, it's possible to write D code like:

 for(size_t i; i < length; ++i)
 {
      ...
 }

 And I've actually seen this kind of code a lot in the wild. It boggles my mind
 that you think that this code should be legal. I think it's lazy - the
intention
 is not clear. Is the default initializer being intentionally relied on, or was
 it unintentional? I've seen both cases. The for-loop example is an extreme one
 for demonstrative purposes, most examples are less obvious.
That perhaps is your experience with other languages (that do not default initialize) showing. I don't think that default initialization is so awful. In fact, C++ enables one to specify default initialization for user defined types. Are you against that, too?
 Saying that most programmers will explicitly initialize floating point numbers
 to 0 instead of NaN when taking on initialization responsibility is a cop-out -
You can certainly say it's a copout, but it's what I see them do. I've never seen them initialize to NaN, but I've seen the "just throw in a 0" many times.
 float.init and float.nan are obviously the values you should be going for. The
 benefit is easy for programmers to understand, especially if they already
 understand why float.init is NaN. You say yelling at them probably won't help -
 why not?
Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.
Aug 11 2012
next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
 On 8/11/2012 1:57 AM, Jakob Ovrum wrote:
 Because experience shows that even the yellers tend to do the 
 short, convenient one rather than the longer, correct one. 
 Bruce Eckel wrote an article about this years ago in reference 
 to why Java exception specifications were a failure and 
 actually caused people to write bad code, including those who 
 knew better.
I have to agree here.

I spend my work time between JVM and .NET based languages, and checked exceptions are on my top 5 list of what went wrong with Java. You see lots of

    try {
        ...
    } catch (Exception e) {
        e.printStackTrace();
    }

in enterprise code.

--
Paulo
Aug 11 2012
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
 Of course it is doing what the language requires, but it is an 
 incorrect diagnostic because a dead assignment is required.

 And being a dead assignment, it can lead to errors when the 
 code is later modified, as I explained. I also dislike on 
 aesthetic grounds meaningless code being required.
It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.
 That perhaps is your experience with other languages (that do 
 not default initialize) showing. I don't think that default 
 initialization is so awful. In fact, C++ enables one to specify 
 default initialization for user defined types. Are you against 
 that, too?
No, because user-defined types can have explicitly initialized members. I do think that member fields relying on the default initializer are ambiguous and should be explicit, but flow analysis on aggregate members is not going to work in any current point. even though D is my personal favourite.
 You can certainly say it's a copout, but it's what I see them 
 do. I've never seen them initialize to NaN, but I've seen the 
 "just throw in a 0" many times.
Again, I agree with this - except the examples are not from D, and certainly not from the future D that is being proposed. I don't blame anyone for steering away from NaN in other C-style languages. I do, however, believe that D programmers are perfectly capable of doing the right thing if informed.

And let's face it - there's a lot that relies on education in D, like whether to receive a string parameter as const or immutable, and using scope on a subset of callback parameters. Both of these examples require more typing than the intuitive/straight-forward choice (always receive `string` and no `scope` on delegates), but informed D programmers still choose the more lengthy, correct version.

Consider `pure` member functions - turns out most of them are actually pure, because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.
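To make the `pure` point concrete, here is a tiny illustrative example (the struct and names are made up): a member function that mutates `this` but touches no global mutable state, so it can still carry the `pure` annotation:

struct Accumulator
{
    double total = 0;

    // "Weakly pure": it may mutate the implicit `this` parameter,
    // but it reads/writes no global mutable state, so `pure` is allowed
    // and strongly pure callers can still use it on their own locals.
    void add(double x) pure
    {
        total += x;
    }
}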
 Because experience shows that even the yellers tend to do the 
 short, convenient one rather than the longer, correct one. 
 Bruce Eckel wrote an article about this years ago in reference 
 to why Java exception specifications were a failure and 
 actually caused people to write bad code, including those who 
 knew better.
I don't think the comparison is fair. Compared to Java exception specifications, the difference between '0' and 'float.nan'/'float.init' is negligible, especially in generic functions when the desired initializer would typically be 'T.init'. Java exception specifications have widespread implications for the entire codebase, while the difference between '0' and 'float.nan' is constant and entirely a local improvement.
Aug 11 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/11/2012 7:30 AM, Jakob Ovrum wrote:
 On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
 Of course it is doing what the language requires, but it is an incorrect
 diagnostic because a dead assignment is required.

 And being a dead assignment, it can lead to errors when the code is later
 modified, as I explained. I also dislike on aesthetic grounds meaningless code
 being required.
 It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.
No, it is not easier to understand, because there's no way to determine if the intent is to:

1. initialize to a valid value -or-
2. initialize to get the compiler to stop complaining
 I do, however, believe that D programmers are perfectly capable of doing the
 right thing if informed.
Of course they are capable of it. But experience shows they simply don't.
 Consider `pure` member functions - turns out most of them are actually pure
 because the implicit `this` parameter is allowed to be mutated and it's rare
for
 a member function to mutate global state, yet we all strive to correctly
 decorate our methods `pure` when applicable.
A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.
 Java exception specifications have widespread implications for the entire
 codebase, while the difference between '0' and 'float.nan' is constant and
 entirely a local improvement.
I believe there's a lot more potential for success when you have a design where the easiest way is the correct way, and you've got to make some effort to do it wrong. Much of my attitude on that goes back to my experience at Boeing on designing things (yes, my boring Boeing anecdotes again), and Boeing's long experience with pilots and mechanics and what they actually do vs what they're trained to do. (And not only are these people professionals, not fools, but their lives depend on doing it right.) Over and over and over again, the easy way had better be the correct way. I could bore you even more with the aviation horror stories I heard that justified that attitude.
Aug 12 2012
next sibling parent simendsjo <simendsjo gmail.com> writes:
On Sun, 12 Aug 2012 12:38:47 +0200, Walter Bright  
<newshound2 digitalmars.com> wrote:
 On 8/11/2012 7:30 AM, Jakob Ovrum wrote:
 On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
 Consider `pure` member functions - turns out most of them are actually  
 pure
 because the implicit `this` parameter is allowed to be mutated and it's  
 rare for
 a member function to mutate global state, yet we all strive to correctly
 decorate our methods `pure` when applicable.
 A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.
I have thought that many times. The same with default non-null class references. I keep adding assert(someClass) everywhere.
Aug 12 2012
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 12.08.2012 12:38, schrieb Walter Bright:
 On 8/11/2012 7:30 AM, Jakob Ovrum wrote:
 Consider `pure` member functions - turns out most of them are actually pure
 because the implicit `this` parameter is allowed to be mutated and it's rare
for
 a member function to mutate global state, yet we all strive to correctly
 decorate our methods `pure` when applicable.
 A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.
it's never too late - put it back on the list for D 3 - please (and local variables are immutable by default - or something like that)
Aug 12 2012
parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Sunday, 12 August 2012 at 11:34:20 UTC, dennis luehring wrote:
 Am 12.08.2012 12:38, schrieb Walter Bright:
 A better design would be to have pure be the default and 
 impure would require annotation. The same for const/immutable. 
 Unfortunately, it's too late for that now. My fault.
 it's never too late - put it back on the list for D 3 - please (and local variables are immutable by default - or something like that)
Agreed. If it is only a signature change then it might have been possible to accept such a change; I'm sure it would simplify quite a bit of signatures and only complicate a few. Probably the default signatures to try and include are: pure and @safe (others offhand I can't think of).

Make a list of all the issues/mistakes that can be addressed in D3 (be it ten or fifteen years from now); who knows, maybe the future is just around the corner if there's a big enough reason for it. The largest reason not to make big changes is so people don't get fed up and quit (especially while still trying to write library code); that, and this is supposed to be the 'stable' D2 language right now, with language changes having to be weighed heavily.
Aug 12 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 12 August 2012 at 10:39:01 UTC, Walter Bright wrote:
 No, it is not easier to understand, because there's no way to 
 determine if the intent is to:

 1. initialize to a valid value -or-
 2. initialize to get the compiler to stop complaining
If there is an explicit initializer, it means that the intent is either of those two. The latter case is probably quite rare, and might suggest a problem with the code - if the compiler can't prove your variable to be initialized, then the programmer probably has to spend some time figuring out the real answer. Legitimate cases of the compiler being too conservative can be annotated with a comment to eliminate the ambiguity. The interesting part is that you can be sure that variables *without* initializers are guaranteed to be initialized at a later point, or the program won't compile. Without the guarantee, the default value could be intended as a valid initializer or there could be a bug in the program. The current situation is not bad, I just think the one that allows for catching more errors at compile-time is much, much better.
 Of course they are capable of it. But experience shows they 
 simply don't.
If they do it for contagious attributes like const, immutable and pure, I'm sure they'll do it for a simple fix like using explicit 'float.nan' in the rare case the compiler can't prove initialization before use.
 A better design would be to have pure be the default and impure 
 would require annotation. The same for const/immutable. 
 Unfortunately, it's too late for that now. My fault.
I agree, but on the flip side it was easier to port D1 code to D2 this way, and that might have saved D2 from even further alienation by some D1 users during its early stages. The most common complaints I remember from the IRC channel were complaints about const and immutable which was now forced on D programs to some degree due to string literals. This made some people really apprehensive about moving their code to D2, and I can imagine the fallout would be a lot worse if they had to annotate all their impure functions etc.
 I believe there's a lot more potential for success when you 
 have a design where the easiest way is the correct way, and 
 you've got to make some effort to do it wrong. Much of my 
 attitude on that goes back to my experience at Boeing on 
 designing things (yes, my boring Boeing anecdotes again), and 
 Boeing's long experience with pilots and mechanics and what 
 they actually do vs what they're trained to do. (And not only 
 are these people professionals, not fools, but their lives 
 depend on doing it right.)

 Over and over and over again, the easy way had better be the 
 correct way. I could bore you even more with the aviation 
 horror stories I heard that justified that attitude.
Problem is, we've pointed out the easy way has issues and is not necessarily correct.
Aug 12 2012
prev sibling parent "Adam Wilson" <flyboynw gmail.com> writes:
On Sun, 12 Aug 2012 03:38:47 -0700, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 8/11/2012 7:30 AM, Jakob Ovrum wrote:
 On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
 Of course it is doing what the language requires, but it is an  
 incorrect
 diagnostic because a dead assignment is required.

 And being a dead assignment, it can lead to errors when the code is  
 later
 modified, as I explained. I also dislike on aesthetic grounds  
 meaningless code
 being required.
 It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.
 No, it is not easier to understand, because there's no way to determine
 if the intent is to:

 1. initialize to a valid value -or-
 2. initialize to get the compiler to stop complaining
 I do, however, believe that D programmers are perfectly capable of  
 doing the
 right thing if informed.
 Of course they are capable of it. But experience shows they simply don't.
 Consider `pure` member functions - turns out most of them are actually  
 pure
 because the implicit `this` parameter is allowed to be mutated and it's  
 rare for
 a member function to mutate global state, yet we all strive to correctly
 decorate our methods `pure` when applicable.
 A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.
 Java exception specifications have widespread implications for the  
 entire
 codebase, while the difference between '0' and 'float.nan' is constant  
 and
 entirely a local improvement.
 I believe there's a lot more potential for success when you have a
 design where the easiest way is the correct way, and you've got to make
 some effort to do it wrong. Much of my attitude on that goes back to my
 experience at Boeing on designing things (yes, my boring Boeing
 anecdotes again), and Boeing's long experience with pilots and mechanics
 and what they actually do vs what they're trained to do. (And not only
 are these people professionals, not fools, but their lives depend on
 doing it right.)

 Over and over and over again, the easy way had better be the correct
 way. I could bore you even more with the aviation horror stories I heard
 that justified that attitude.
As a pilot, I completely agree!

--
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/
Aug 12 2012
prev sibling parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
On 08/10/2012 06:01 PM, Walter Bright wrote:
 On 8/10/2012 1:38 AM, F i L wrote:
 Walter Bright wrote:
 3. Floating point values are default initialized to NaN.
 with just as much optimization/debugging benefit (arguably more so,
 because it catches NaN

     class Foo {
         float x; // defaults to 0.0f

         void bar() {
             float y; // doesn't default
             y ++; // ERROR: use of unassigned local

             float z = 0.0f;
             z ++; // OKAY
         }
     }

 This is the same behavior for any local variable,
 It catches only a subset of these at compile time. I can craft any
 number of ways of getting it to miss diagnosing it. Consider this one:

     float z;
     if (condition1)
          z = 5;
     ... lotsa code ...
     if (condition2)
          z++;

 To diagnose this correctly, the static analyzer would have to determine
 that condition1 produces the same result as condition2, or not. This is
 impossible to prove. So the static analyzer either gives up and lets it
 pass, or issues an incorrect diagnostic. So our intrepid programmer is
 forced to write:

     float z = 0;
     if (condition1)
          z = 5;
     ... lotsa code ...
     if (condition2)
          z++;

 Now, as it may turn out, for your algorithm the value "0" is an
 out-of-range, incorrect value. Not a problem as it is a dead assignment,
 right?

 But then the maintenance programmer comes along and changes condition1
 so it is not always the same as condition2, and now the z++ sees the
 invalid "0" value sometimes, and a silent bug is introduced.

 This bug will not remain undetected with the default NaN initialization.
To address the concern of static analysis being too hard: I wish we could have it but limit the amount of static analysis that's done. Something like this: the compiler will test branches of if-else statements and switch-case statements, but it will not drop into function calls with ref parameters nor will it accept initialization in looping constructs (foreach, for, while, etc). A compiler is an incorrect implementation if it implements /too much/ static analysis.

The example code you give can be implemented with such limited static analysis:

void lotsaCode()
{
    ... lotsa code ...
}

float z;
if ( condition1 )
{
    z = 5;
    lotsaCode();
    z++;
}
else
{
    lotsaCode();
}

I will, in advance, concede that this does not prevent people from just writing "float z = 0;".

In my dream-world the compiler recognizes a set of common mistake-inducing patterns like the one you mentioned and then prints helpful error messages suggesting alternative design patterns. That way, bugs are prevented and users become better programmers.
Aug 11 2012
parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 11 August 2012 at 23:49:18 UTC, Chad J wrote:
 On 08/10/2012 06:01 PM, Walter Bright wrote:
 It catches only a subset of these at compile time. I can craft 
 any number of ways of getting it to miss diagnosing it. 
 Consider this one:

 float z;
 if (condition1)
 z = 5;
 ... lotsa code ...
 if (condition2)
 z++;

 To diagnose this correctly, the static analyzer would have to 
 determine that condition1 produces the same result as 
 condition2, or not. This is impossible to prove. So the static 
 analyzer either gives up and lets it pass, or issues an 
 incorrect diagnostic. So our intrepid programmer is forced to 
 write:

 float z = 0;
 if (condition1)
 z = 5;
 ... lotsa code ...
 if (condition2)
 z++;

 Now, as it may turn out, for your algorithm the value "0" is 
 an out-of-range, incorrect value. Not a problem as it is a 
 dead assignment, right?

 But then the maintenance programmer comes along and changes 
 condition1 so it is not always the same as condition2, and now 
 the z++ sees the invalid "0" value sometimes, and a silent bug 
 is introduced.

 This bug will not remain undetected with the default NaN 
 initialization.
Let's keep in mind every one of these truths:

1) Programmers are lazy; if you can get away with not initializing something then you'll avoid it. In C I've failed to initialize variables many times until a bug crops up, and it's difficult to find sometimes, where a NaN or equivalent would have quickly cropped them out before running with any real data.

2) There are a lot of inexperienced programmers. I worked for a company for a short period of time that did minimal training on a language like Java, where I ended up being seen as an utter genius (compared to even the teachers).

3) Bugs in a large environment and/or scenarios are far more difficult if not impossible to debug. I've made a program that handles merging of various dialogs (using double linked-like lists); I can debug them if there are 100 or fewer to work with, but after 100 (and often it's tens of thousands) it can become such a pain, based on its indirection and how the original structure was built, that I refuse based on difficulty vs end results (plus sanity).

We also need to sometimes laugh at our mistakes, and learn from others. I'll recommend everyone read from rinkworks a bit if you have the time and refresh yourselves.

http://www.rinkworks.com/stupid/cs_programming.shtml
Aug 11 2012
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
F i L:

 Walter Bright wrote:
 3. Floating point values are default initialized to NaN.
conveniently
An alternative possibility is to:

1) Default initialize variables just as currently done in D, with 0s, NaNs, etc;
2) Where the compiler is certain a variable is read before any possible initialization, it generates a compile-time error;
3) Warnings for unused variables and unused last assignments.

Where the compiler is not sure, not able to tell, or sees there is one or more paths where the variable is initialized, it gives no errors, and eventually the code will use the default initialized values, as currently done in D.

The D compiler is already doing this a little, if you compile this with -O:

class Foo {
    void bar() {}
}
void main() {
    Foo f;
    f.bar();
}

You get at compile-time:

temp.d(6): Error: null dereference in function _Dmain

A side effect of those rules is that this code doesn't compile, and similarly a lot of current D code:

class Foo {}
void main() {
    Foo f;
    assert(f is null);
}

Bye,
bearophile
Aug 11 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/11/2012 2:41 PM, bearophile wrote:
 2) Where the compiler is certain a variable is read before any possible
 initialization, it generates a compile-time error;
This has been suggested repeatedly, but it is in utter conflict with the whole notion of default initialization, which nobody complains about for user-defined types.
Aug 11 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/11/12 7:33 PM, Walter Bright wrote:
[snip]

Allow me to insert an opinion here. This post illustrates quite well how 
opinionated our community is (for better or worse).

The OP has asked a topical question in a matter that is interesting and 
also may influence the impact of the language to the larger community. 
Before long the thread has evolved into the familiar pattern of a debate 
over a minor issue on which reasonable people may disagree and that's 
unlikely to change. We should instead do our best to give a balanced 
high-level view of what D offers for econometrics.

To the OP - here are a few aspects that may deserve interest:

* Modeling power - from what I understand econometrics is 
modeling-heavy, which is more difficult to address in languages such as 
Fortran, C, C++, Java, Python, or the likes of Matlab.

* Efficiency - D generates native code for floating point operations and 
has control over data layout and allocation. Speed of generated code is 
dependent on the compiler, and the reference compiler (dmd) does a 
poorer job at it than the gnu-based compiler (gdc).

* Convenience - D is designed to "do what you mean" wherever possible 
and simplify common programming tasks, numeric or not. That makes the 
language comfortable to use even by a non-specialist, in particular in 
conjunction with appropriate libraries.

A few minuses I can think of:

- Maturity and availability of numeric and econometrics libraries is an 
obvious issue. There are some libraries (e.g. 
https://github.com/kyllingstad/scid/wiki) maintained and extended 
through volunteer effort.

- The language's superior modeling power and level of control comes at 
an increase in complexity compared to languages such as e.g. Python. So 
the statistician would need a larger upfront investment in order to reap 
the associated benefits.


Andrei
Aug 11 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 - The language's superior modeling power and level of control 
 comes at an increase in complexity compared to languages such 
 as e.g. Python. So the statistician would need a larger upfront 
 investment in order to reap the associated benefits.
Statisticians often use the R language (http://en.wikipedia.org/wiki/R_language). Python contains much more "computer science" and CS complexity compared to R. Not just advanced stuff like coroutines, metaclasses, decorators, Abstract Base Classes, operator overloading, and so on, but even simpler things, like generators, standard library collections like heaps and deques, and so on. For some statisticians I've seen, even several parts of Python are too hard to use or understand. I have rewritten several of their Python scripts.

Bye,
bearophile
Aug 11 2012
parent reply "dsimcha" <dsimcha yahoo.com> writes:
On Sunday, 12 August 2012 at 03:30:24 UTC, bearophile wrote:
 Andrei Alexandrescu:

 - The language's superior modeling power and level of control 
 comes at an increase in complexity compared to languages such 
 as e.g. Python. So the statistician would need a larger 
 upfront investment in order to reap the associated benefits.
Statistician often use the R language (http://en.wikipedia.org/wiki/R_language ). Python contains much more "computer science" and CS complexity compared to R. Not just advanced stuff like coroutines, metaclasses, decorators, Abstract Base Classes, operator overloading, and so on, but even simpler things, like generators, standard library collections like heaps and deques, and so on. For some statisticians I've seen, even several parts of Python are too much hard to use or understand. I have rewritten several of their Python scripts. Bye, bearophile
For people with more advanced CS/programming knowledge, though, this is an advantage of D. I find Matlab and R incredibly frustrating to use for anything but very standard matrix/statistics computations on data that's already structured the way I like it. This is mostly because the standard CS concepts you mention are at best awkward and at worst impossible to express and, being aware of them, I naturally want to take advantage of them. Using Matlab or R feels like being forced to program with half the tools in my toolbox either missing or awkwardly misshapen, so I avoid it whenever practical.

(Actually, languages like C and Java that don't have much modeling power feel the same way to me now that I've primarily used D and to a lesser extent Python for the past few years. Ironically, these are the languages that are easy to integrate with R and Matlab respectively. Do most serious programmers who work in problem domains relevant to Matlab and R feel this way or is it just me?)

This was my motivation for writing Dstats and mentoring Cristi's fork of SciD. D's modeling power is so outstanding that I was able to replace R and Matlab for a lot of use cases with plain old libraries written in D.
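As a small taste of what that looks like, even plain Phobos gets you basic descriptive statistics out of a couple of range expressions (a sketch only - this is not dstats code, and the numbers are made up):

import std.algorithm, std.math, std.stdio;

void main()
{
    auto data = [1.2, 3.4, 2.2, 5.1, 4.8, 2.9];

    immutable n    = cast(double) data.length;
    immutable mean = reduce!"a + b"(0.0, data) / n;
    // Sample variance via a lazy map over squared deviations.
    immutable var  = reduce!"a + b"(0.0, data.map!(x => (x - mean) ^^ 2)) / (n - 1);

    writefln("mean = %s, sd = %s", mean, sqrt(var));
}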
Aug 12 2012
next sibling parent "TJB" <broughtj gmail.com> writes:
On Sunday, 12 August 2012 at 17:22:21 UTC, dsimcha wrote:

 ...  I find Matlab and R incredibly frustrating to use for 
 anything but very standard matrix/statistics computations on 
 data that's already structured the way I like it.
This is exactly how I feel, and why I am turning to D. My data sets are huge (64 TB for just a few years of data), my econometric methods are computationally intensive, and the limitations of Matlab and R almost instantly become constraining.
 Using Matlab or R feels like being forced to program with half 
 the tools in my toolbox either missing or awkwardly misshapen, 
 so I avoid it whenever practical.  Actually, languages like C 
 and Java that don't have much modeling power feel the same way 
 to me ...
Very well put - it expresses my feeling precisely. And C++ is such a complicated beast that I feel caught in between. I'd been dreaming of a language that offers modeling power as well as efficiency.
 ...  Do most serious programmers who work in problem domains 
 relevant to Matlab and R feel this way or is it just me?.
I certainly feel the same. I only use them when I have to or for very simple prototyping.
 This was my motivation for writing Dstats and mentoring 
 Cristi's fork of SciD.  D's modeling power is so outstanding 
 that I was able to replace R and Matlab for a lot of use cases 
 with plain old libraries written in D.
Thanks for your work on these packages! I will for sure be including them in my write up. I think they offer great possibilities for econometrics in D. TJB
Aug 12 2012
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/08/12 18:22, dsimcha wrote:
 For people with more advanced CS/programming knowledge, though, this is an
 advantage of D.  I find Matlab and R incredibly frustrating to use for anything
 but very standard matrix/statistics computations on data that's already
 structured the way I like it.  This is mostly because the standard CS concepts
 you mention are at best awkward and at worst impossible to express and, being
 aware of them, I naturally want to take advantage of them.
The main use-case and advantage of both R and MATLAB/Octave seems to me to be the plotting functionality -- I've seen some exceptionally beautiful stuff done with R in particular, although I've not personally explored its capabilities too far. The annoyance of R in particular is the impenetrable thicket of dependencies that can arise among contributed packages; it feels very much like some are thrown over the wall and then built on without much concern for organization. :-(
Aug 12 2012
parent "dsimcha" <dsimcha yahoo.com> writes:
On Monday, 13 August 2012 at 01:52:28 UTC, Joseph Rushton 
Wakeling wrote:
 The main use-case and advantage of both R and MATLAB/Octave 
 seems to me to be the plotting functionality -- I've seen some 
 exceptionally beautiful stuff done with R in particular, 
 although I've not personally explored its capabilities too far.

 The annoyance of R in particular is the impenetrable thicket of 
 dependencies that can arise among contributed packages; it 
 feels very much like some are thrown over the wall and then 
 built on without much concern for organization. :-(
I've addressed that, too :).

https://github.com/dsimcha/Plot2kill

Obviously this is a one-man project without nearly the same number of features that R and Matlab have, but like Dstats and SciD, it has probably the 20% of functionality that handles 80% of use cases. I've used it for the figures in scientific articles that I've submitted for publication and in my Ph.D. proposal and dissertation.

Unlike SciD and Dstats, Plot2kill doesn't highlight D's modeling capabilities that much, but it does get the job done for simple 2D plots.
Aug 12 2012
prev sibling next sibling parent reply "TJB" <broughtj gmail.com> writes:
On Sunday, 12 August 2012 at 02:28:44 UTC, Andrei Alexandrescu 
wrote:
 On 8/11/12 7:33 PM, Walter Bright wrote:
 [snip]

 Allow me to insert an opinion here. This post illustrates quite 
 well how opinionated our community is (for better or worse).

 The OP has asked a topical question in a matter that is 
 interesting and also may influence the impact of the language 
 to the larger community. Before long the thread has evolved 
 into the familiar pattern of a debate over a minor issue on 
 which reasonable people may disagree and that's unlikely to 
 change. We should instead do our best to give a balanced 
 high-level view of what D offers for econometrics.

 To the OP - here are a few aspects that may deserve interest:

 * Modeling power - from what I understand econometrics is 
 modeling-heavy, which is more difficult to address in languages 
 such as Fortran, C, C++, Java, Python, or the likes of Matlab.

 * Efficiency - D generates native code for floating point 
 operations and has control over data layout and allocation. 
 Speed of generated code is dependent on the compiler, and the 
 reference compiler (dmd) does a poorer job at it than the 
 gnu-based compiler (gdc) compiler.

 * Convenience - D is designed to "do what you mean" wherever 
 possible and simplify common programming tasks, numeric or not. 
 That makes the language comfortable to use even by a 
 non-specialist, in particular in conjunction with appropriate 
 libraries.

 A few minuses I can think of:

 - Maturity and availability of numeric and econometrics library 
 is an obvious issue. There are some libraries (e.g. 
 https://github.com/kyllingstad/scid/wiki) maintained and 
 extended through volunteer effort.

 - The language's superior modeling power and level of control 
 comes at an increase in complexity compared to languages such 
 as e.g. Python. So the statistician would need a larger upfront 
 investment in order to reap the associated benefits.


 Andrei
Andrei,

Thanks for bringing this back to the original topic and for your thoughts.

Indeed, a lot of econometricians are using MATLAB, R, Gauss, Ox and the like. But there are a number of econometricians who need the raw power of a natively compiled language (especially financial econometricians whose data are huge) who typically program in either Fortran or C/C++. It is really this group that I am trying to reach. I think D has a lot to offer this group in terms of programmer productivity and reliability of code. I think this applies to statisticians as well, as I see a lot of them in this latter group too.

I also want to reach the MATLABers because I think they can get a lot more modeling power (I like how you put that) without too much more difficulty (see Ox - nearly as complicated as C++ but without the power). Many MATLAB and R programmers end up recoding a good part of their algorithms in C++ and calling that code from the interpreted language. I have always found this kind of mixed language programming to be messy, time consuming, and error prone. Special tools are cropping up to handle this (see Rcpp). This just proves to me the usefulness of a productive AND powerful language like D for econometricians!

I am sensitive to the drawbacks you mention (especially lack of numeric libraries). I am so sick of wasting my time in C++ though that I have almost decided to just start writing my own econometric library in D. Earlier in this thread there was a discussion of extended precision in D and I mentioned the need to recode things like BLAS and LAPACK in D. Templates in D seem perfect for this problem. As an expert in template meta-programming, what are your thoughts? How is this different than what is being done in SciD? It seems they are mostly concerned about wrapping the old CBLAS and CLAPACK libraries.

Again, thanks for your thoughts and your TDPL book. Probably the best programming book I've ever read!

TJB
Aug 11 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/12/12 12:52 AM, TJB wrote:
 Thanks for bringing this back to the original topic and for your thoughts.

 Indeed, a lot of econometricians are using MATLAB, R, Guass, Ox and the
 like. But there are a number of econometricians who need the raw power
 of a natively compiled language (especially financial econometricians
 whose data are huge) who typically program in either Fortran or C/C++.
 It is really this group that I am trying to reach. I think D has a lot
 to offer this group in terms of programmer productivity and reliability
 of code. I think this applies to statisticians as well, as I see a lot
 of them in this latter group too.

 I also want to reach the MATLABers because I think they can get a lot
 more modeling power (I like how you put that) without too much more
 difficulty (see Ox - nearly as complicated as C++ but without the
 power). Many MATLAB and R programmers end up recoding a good part of
 their algorithms in C++ and calling that code from the interpreted
 language. I have always found this kind of mixed language programming to
 be messy, time consuming, and error prone. Special tools are cropping up
 to handle this (see Rcpp). This just proves to me the usefulness of a
 productive AND powerful language like D for econometricians!
I think this is a great angle. In our lab when I was a grad student in NLP/ML there was also a very annoying trend going on: people would start with Perl for text preprocessing and Matlab for math, and then, after the proof of concept, would need to recode most parts in C++. (I recall hearing complaints about large overheads in Matlab caused by eager copy semantics, is that true?)
 I am sensitive to the drawbacks you mention (especially lack of numeric
 libraries). I am so sick of wasting my time in C++ though that I have
 almost decided to just start writing my own econometric library in D.
 Earlier in this thread there was a discussion of extended precision in D
 and I mentioned the need to recode things like BLAS and LAPACK in D.
 Templates in D seem perfect for this problem. As an expert in template
 meta-programming what are your thoughts? How is this different than what
 is being done in SciD? It seems they are mostly concerned about wrapping
 the old CBLAS and CLAPACK libraries.
There's a large body of experience and many optimizations accumulated in these libraries, which are worth exploiting. The remaining matter is offering a convenient shell. I think Cristi's work on SciD goes that direction. Andrei
Aug 12 2012
parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 (I recall hearing complaints about large overheads in Matlab 
 caused by eager copy semantics, is that true?)
In Matlab there is COW: http://www.matlabtips.com/copy-on-write-in-subfunctions/ Bye, bearophile
Aug 12 2012
prev sibling parent reply "F i L" <witte2008 gmail.com> writes:
Andrei Alexandrescu wrote:
 * Efficiency - D generates native code for floating point 
 operations and has control over data layout and allocation. 
 Speed of generated code is dependent on the compiler, and the 
 reference compiler (dmd) does a poorer job at it than the 
 gnu-based compiler (gdc) compiler.
I'd like to add to this. Right now I'm reworking some libraries to include SIMD support using DMD on Linux 64bit. A simple benchmark between DMD and GCC of 2 million SIMD vector additions/subtractions actually runs faster with my DMD D code than the GCC C code. Only by ~0.8 ms, and that could be due to a difference between D's std.datetime.StopWatch() and C's time.h/clock(), but it's consistently faster nonetheless, which is impressive. That said, it's also much easier to accidentally slow that figure down significantly in DMD, whereas GCC usually always optimizes very well.

Also, and I'm not sure this isn't just me, but I ran a DMD (v2.057 I think) benchmark a while back (vs ~88ms), and a similar test compiled with DMD 2.060 now points to optimization improvements in the internal DMD compiler over the last few versions.
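For readers who haven't used it, the kind of operation being benchmarked looks roughly like this (a sketch, not the benchmark itself; it assumes a target and compiler where core.simd's float4 is available):

import core.simd;
import std.stdio;

void main()
{
    float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 b = [0.5f, 0.5f, 0.5f, 0.5f];

    // Element-wise operations on the whole 4-float lane at once.
    float4 sum  = a + b;
    float4 diff = a - b;

    writeln(sum.array);   // [1.5, 2.5, 3.5, 4.5]
    writeln(diff.array);  // [0.5, 1.5, 2.5, 3.5]
}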
Aug 12 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2012 6:38 PM, F i L wrote:
 Also, and I'm not sure this isn't just me, but I ran a DMD (v2.057 T think)

ms

2.060

optimization
 improvements in the internal DMD compiler over the last few version.
There's a fair amount of low hanging optimization fruit that D makes possible that dmd does not take advantage of. I hope to get to this. One thing is I suspect that D can generate much better SIMD code than C/C++ can without compiler extensions. Another is that D allows values to be moved without needing a copyconstruct/destruct operation.
Aug 13 2012
prev sibling parent reply "TJB" <broughtj gmail.com> writes:
On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:
 On 8/9/2012 10:40 AM, dsimcha wrote:
 I'd emphasize the following:
I'd like to add to that: 1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.
How unique to D is this feature? Does this imply that things like BLAS and LAPACK, random number generators, statistical distribution functions, and other numerical software should be rewritten in pure D rather than calling out to external C or Fortran codes? TJB
Aug 10 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/10/2012 8:31 AM, TJB wrote:
 On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:
 On 8/9/2012 10:40 AM, dsimcha wrote:
 I'd emphasize the following:
I'd like to add to that: 1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.
How unique to D is this feature? Does this imply that things like BLAS and LAPACK, random number generators, statistical distribution functions, and other numerical software should be rewritten in pure D rather than calling out to external C or Fortran codes?
I attended a talk given by a physicist a few months ago where he was using C transcendental functions. I pointed out to him that those functions were unreliable, producing wrong bits in a manner that suggested to me that they were internally truncating to double precision. He expressed astonishment and told me I must be mistaken. What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones. I encourage you to run your own tests, and draw your own conclusions.
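For anyone who wants a quick sanity check of what the extra width buys, the properties below print the precision of each type on the local target (on x86 dmd, real is the 80-bit x87 format):

import std.stdio;

void main()
{
    // More significant decimal digits and a smaller machine epsilon
    // mean less creeping roundoff in long chains of operations.
    writefln("double: %s digits, epsilon = %s", double.dig, double.epsilon);
    writefln("real:   %s digits, epsilon = %s", real.dig,   real.epsilon);
}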
Aug 10 2012
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, August 10, 2012 15:10:47 Walter Bright wrote:
 What can I say? I run across this repeatedly, and that's exactly why Phobos
 (with Don's help) has its own implementations, rather than simply calling
 the corresponding C ones.
I think that it's pretty typical for programmers to think that something like a standard library function is essentially bug-free - especially for an older language like C. And unless you see results that are clearly wrong or someone else points out the problem, I don't know why you'd ever think that there was one. I certainly had no clue that C implementations had issues with floating point arithmetic before it was pointed out here. Regardless though, it's great that D gets it right. - Jonathan M Davis
Aug 10 2012
prev sibling parent "TJB" <broughtj gmail.com> writes:
On Friday, 10 August 2012 at 22:11:23 UTC, Walter Bright wrote:
 On 8/10/2012 8:31 AM, TJB wrote:
 On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright 
 wrote:
 On 8/9/2012 10:40 AM, dsimcha wrote:
 I'd emphasize the following:
I'd like to add to that: 1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.
How unique to D is this feature? Does this imply that things like BLAS and LAPACK, random number generators, statistical distribution functions, and other numerical software should be rewritten in pure D rather than calling out to external C or Fortran codes?
I attended a talk given by a physicist a few months ago where he was using C transcendental functions. I pointed out to him that those functions were unreliable, producing wrong bits in a manner that suggested to me that they were internally truncating to double precision. He expressed astonishment and told me I must be mistaken. What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones. I encourage you to run your own tests, and draw your own conclusions.
Hopefully this will help make the case that D is the best choice for numerical programmers. I want to do my part to convince economists.

Another reason to implement BLAS and LAPACK in pure D is that the old routines like dgemm, cgemm, sgemm, and zgemm (all defined for different types) seem ripe for templatization (see the sketch below).

Almost thou convinceth me ...

TJB
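To make the templatization idea concrete, here is a minimal, unoptimized sketch - not a real BLAS port (production gemm implementations are heavily blocked and vectorized), and the names and row-major layout are just assumptions for illustration:

import std.traits : isFloatingPoint;

// One generic routine where BLAS needs sgemm/dgemm (and, with a relaxed
// constraint, cgemm/zgemm): computes C = alpha*A*B + beta*C for dense,
// row-major matrices A (m x k), B (k x n), C (m x n).
void gemm(T)(size_t m, size_t n, size_t k,
             T alpha, const(T)[] a, const(T)[] b,
             T beta, T[] c) if (isFloatingPoint!T)
{
    assert(a.length == m * k && b.length == k * n && c.length == m * n);

    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
        {
            T acc = 0;
            foreach (p; 0 .. k)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
}

unittest
{
    double[] a = [1, 2, 3, 4];          // 2x2
    double[] b = [5, 6, 7, 8];          // 2x2
    auto c = new double[4];
    c[] = 0;
    gemm(2, 2, 2, 1.0, a, b, 0.0, c);
    assert(c == [19.0, 22.0, 43.0, 50.0]);
}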
Aug 10 2012
prev sibling next sibling parent reply Justin Whear <justin economicmodeling.com> writes:
On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:

 Hello D Users,
 
 The Software Editor for the Journal of Applied Econometrics has agreed
 to let me write a review of the D programming language for
 econometricians (econometrics is where economic theory and statistical
 analysis meet).  I will have only about 6 pages.  I have an idea of what
 I am going to write about, but I thought I would ask here what features
 are most relevant (in your minds) to numerical programmers writing codes
 for statistical inference.
 
 I look forward to your suggestions.
 
 Thanks,
 
 TJB
Lazy ranges are a lifesaver when dealing with big data. E.g. read a large csv file, use filter and map to clean and transform the data, collect stats as you go, then output to a destination file. The lazy nature of most of the ranges in Phobos means that you don't need to have the data in memory, but you can write simple imperative code just as if it was.
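A sketch of such a pipeline, assuming a hypothetical input.csv whose first column is a positive numeric series and whose first row is a header:

import std.algorithm, std.conv, std.range, std.stdio;

void main()
{
    // Everything below is lazy: the file is streamed line by line,
    // never loaded into memory as a whole.
    auto values = File("input.csv")                         // hypothetical file
        .byLine()
        .drop(1)                                            // skip the header row
        .map!(line => line.splitter(',').front.to!double)   // first column
        .filter!(x => x > 0.0);                             // drop bad observations

    double sum = 0;
    size_t n;
    foreach (x; values) { sum += x; ++n; }                  // single pass
    writefln("%s observations, mean = %s", n, sum / n);
}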
Aug 09 2012
parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Thursday, 9 August 2012 at 18:20:08 UTC, Justin Whear wrote:
 On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:

 Hello D Users,
 
 The Software Editor for the Journal of Applied Econometrics 
 has agreed
 to let me write a review of the D programming language for
 econometricians (econometrics is where economic theory and 
 statistical
 analysis meet).  I will have only about 6 pages.  I have an 
 idea of what
 I am going to write about, but I thought I would ask here what 
 features
 are most relevant (in your minds) to numerical programmers 
 writing codes
 for statistical inference.
 
 I look forward to your suggestions.
 
 Thanks,
 
 TJB
Lazy ranges are a lifesaver when dealing with big data. E.g. read a large csv file, use filter and map to clean and transform the data, collect stats as you go, then output to a destination file. The lazy nature of most of the ranges in Phobos means that you don't need to have the data in memory, but you can write simple imperative code just as if it was.
Ah, the beauty of functional programming and streams.
Aug 09 2012
prev sibling parent "Minas Mina" <minas_mina1990 hotmail.co.uk> writes:
1) I think compile-time function execution is a very big plus for 
people doing calculations.

For example:

ulong fibonacci(ulong n) { .... }

static x = fibonacci(50); // calculated at compile time! runtime 
cost = 0 !!!

2) It has support for a BigInt structure in its standard library 
(which is really fast!)
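A complete, compilable version of the fibonacci sketch in point 1, plus a taste of BigInt from point 2 (the factorial is just an arbitrary example):

import std.bigint, std.stdio;

ulong fibonacci(ulong n)
{
    ulong a = 0, b = 1;
    foreach (i; 0 .. n)
    {
        immutable t = a + b;
        a = b;
        b = t;
    }
    return a;
}

// Forced to run at compile time (CTFE): an enum initializer must be a constant,
// so the value is baked into the binary with zero runtime cost.
enum fib50 = fibonacci(50);

void main()
{
    writeln(fib50);        // no runtime computation happens here

    // BigInt from the standard library for results that overflow ulong:
    BigInt fact = 1;
    foreach (i; 1 .. 101)
        fact *= i;         // 100!
    writeln(fact);
}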
Aug 10 2012