
D - D in scientific computing

reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Hi there,

as you might have gathered from my previous posts, I am quite interested in
numerical/scientific computing.

I have no idea how many experts in this area have had a look at the current
state of the D language. From all I can gather, the scientific community
has not really taken much notice of this great language at all.

From what I have read so far, D really has the potential to close the gap
between C++ and Fortran and in this way gain a huge share of the
scientific/high-performance area of computing. Anyway, to have any chance
of getting there, much care has to be taken now.

People unfamiliar with numeric programming often wonder why Fortran still
has such a huge share among scientists. Many scientists still use Fortran77,
and even those who have moved to Fortran95 only use it for its modern
syntax, never touching its advanced concepts. And this is not only
because they don't know better, but also because it is extremely hard to
match the performance of Fortran77! (OK, 99% of the reason might actually
be the laziness to learn a different language and the existing code base,
but people still usually argue based on the superior performance of the
language.)

Actually, programming in Fortran77 is not much different from programming in
plain old C (different syntax, many differences in detail, but a similar
philosophy). Anyway, there are rather subtle details in the language
definition of C that prevent optimizing the code the way good Fortran
compilers can. Therefore the same algorithm is very often somewhat slower in
C. (The difference may be marginal, but big enough for Fortran believers to
defend their language...)

Many people have spent a lot of time on C and C++ to repair at least the
most pressing details (one of the results of that is the "restrict" keyword
introduced recently).

It would really be advisable to get numerics experts (preferably those who
have worked on trying to close the gap between C/C++ and Fortran) to take a
look at D and try to eliminate showstoppers before the language is
standardized. I don't claim to be an expert in these matters, but I have
dug into the subject deep enough to realize how closely one has to look to
understand the issues of true high-performance computing in scientific
applications.

For those interested in more details, the forum at http://www.oonumerics.org
might be a good starting point.

Ciao,
Nobbi
Apr 21 2004
next sibling parent reply "Matthew" <matthew.hat stlsoft.dot.org> writes:
I think what you propose is sensible. Who knows where we can dig up some
scientific programmers?

"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c65fmi$2js$1 digitaldaemon.com...
<snip>
Apr 21 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Matthew wrote:

 I think what you propose is sensible. Who knows where we can did up some
 scientific programmers?
I already contacted the mailing list at oonumerics, which should be perfect
for this purpose. I hope I'll attract people from there and get some
discussion going on the list (which would probably attract even more people
to take a look at D).
Apr 21 2004
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Norbert Nemec wrote:

 Hi there,
 
 as you might have gathered from my previous posts, I am quite interested in
 numerical/scientific computing.
 
 I have no idea how many experts in this area have had a look on the current
 state of the D language. From all I can gather, the scientific community
 has not really taken much notice of this great language at all.
 
 From what I have read so far, D really has the potential to close the gap
 between C++ and Fortran and in this way gain a huge share in the
 scientific/high-performance area of computing. Anyway, to have any chance
 to go there, much care has to be taken now.
<snip>

You've practically taken the words out of my mouth here.

My university department uses Fortran 90, though there is still some F77
legacy code around. Apparently they think C++ is too complex, or at least
that OOP is an unnecessary complexity as far as scientific programming is
concerned.

I don't know if my department could be persuaded to adopt D, considering its
simplicity compared to C++. It probably wouldn't happen in my time,
particularly considering the number of things that have recently been
redesigned and rewritten in F90 (including my code, which was in C++ before).

The features I like in F90 (and which are useful for SP) are built-in vector
arithmetic and aggregate functions. The former is in the D language; it just
needs to be finally put into the compiler. I've briefly suggested the
latter....

http://www.digitalmars.com/drn-bin/wwwnews?D/21671

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
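For reference, the array operations already in the D spec (though not yet
implemented in the compiler at the time) would give elementwise arithmetic
roughly like this; a small sketch, not checked against any particular
compiler version:

double[] a = new double[100];
double[] b = new double[100];
double[] c = new double[100];

c[] = a[] + b[];   // elementwise addition, no explicit loop
c[] = a[] * 2.0;   // scale every element
b[] = a[];         // elementwise copy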
Apr 21 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Stewart Gordon wrote:
 The features I like in F90 (and which are useful for SP) are built-in
 vector arithmetic and aggregate functions.  The former is in the D
 language, it just needs to be finally put into the compiler.  I've
 briefly suggested the latter....
 http://www.digitalmars.com/drn-bin/wwwnews?D/21671
I don't really know how much of that has to be in the language. It is nice
to have arrays as part of the language, but vectors and matrices should
rather be defined in the library. There are plenty of ways to deal with
arrays that can be implemented very efficiently by optimizing (perhaps also
parallelizing) compilers.

Vector arithmetic actually gives arrays special matrix semantics, which is
not what you would want in general. By far not every array is a matrix, so
why should it behave like one?

Vector arithmetic and aggregate functions may just as well be defined in the
library. Just encapsulate arrays in your own class and give it all the
semantics you need, without forcing everyone else to get that semantics when
their arrays are something completely different.
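As a minimal sketch of the kind of library wrapper meant here (the Vector
class and its methods are hypothetical, not an existing library):

// Hypothetical wrapper: a plain array plus whatever semantics you want,
// without changing the behaviour of arrays themselves.
class Vector
{
    double[] data;

    this(size_t n) { data = new double[n]; }

    // elementwise addition: the semantics belong to Vector, not to plain
    // arrays (opAdd being D's overload hook for the + operator)
    Vector opAdd(Vector b)
    {
        assert(data.length == b.data.length);
        Vector c = new Vector(data.length);
        for (size_t i = 0; i < data.length; i++)
            c.data[i] = data[i] + b.data[i];
        return c;
    }

    // an aggregate function supplied by the library, not the language
    double sum()
    {
        double s = 0;
        for (size_t i = 0; i < data.length; i++)
            s += data[i];
        return s;
    }
}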
Apr 21 2004
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Norbert Nemec wrote:

 Stewart Gordon wrote:
 
The features I like in F90 (and which are useful for SP) are built-in
vector arithmetic and aggregate functions.  The former is in the D
language, it just needs to be finally put into the compiler.  I've
briefly suggested the latter....
http://www.digitalmars.com/drn-bin/wwwnews?D/21671
I don't really know how much of that has to be in the language. It is nice to have arrays as part of the language, but vectors and matrices should rather be defined in the library. There are plenty of efficient ways to deal with arrays, which can be implemented in a very efficient way by optimizing (perhaps also parallelizing) compilers.
The same applies to vector arithmetic.
 Vector arithmetic actually gives arrays special matrix semantics which is
 not what you would want in general. By far not every array is a matrix so
 why should it behave like one?
If you don't want to do arithmetic on a certain array, you don't have to.
 vector arithmetic and aggregate functions may just as well be defined in the
 library. Just encapsulate arrays in your own class and give it all the
 semantics you need, without forcing everyone else to get that semantics
 when his arrays are something completely different.
Operating on each element of an array is an intuitive and common concept,
hence arguably a reasonable default, and one that saves having to define a
class for such a simple task. It also saves having to do two memory
allocations for every vector.

It also doesn't prevent someone from encapsulating an array in a class, and
defining operations on it, for more specialised purposes.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
Apr 21 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Sorry, a misunderstanding of terms. Obviously you mean
"vector arithmetic" = "elementwise operations" - there I fully agree with
you. It is good to have that in the language. I was just disturbed by an
association of vectors with linear algebra and matrix arithmetic. And that
definitely should go in the library.

For aggregates, though, I'm still not sure whether sum, product, max, min,
etc. have to be in the language or whether they could just go in the
library. Might be an idea, though, thinking of parallelizing compilers.
Apr 21 2004
prev sibling parent reply Drew McCormack <drewmccormack mac.com> writes:
On 2004-04-21 13:33:46 +0200, Norbert Nemec <Norbert.Nemec gmx.de> said:

 Stewart Gordon wrote:
 The features I like in F90 (and which are useful for SP) are built-in
 vector arithmetic and aggregate functions.  The former is in the D
 language, it just needs to be finally put into the compiler.  I've
 briefly suggested the latter....
 http://www.digitalmars.com/drn-bin/wwwnews?D/21671
I don't really know how much of that has to be in the language. It is nice
to have arrays as part of the language, but vectors and matrices should
rather be defined in the library. <snip>
I agree that matrices, which are basically mathematical tools from linear
algebra, do not belong in the core language. But powerful multidimensional
arrays are important if you are to avoid the mess that C++ is when it comes
to high-performance programming.

To be effective, you must be able to create multidimensional static arrays
at run time on the heap. As far as I can tell, you can only set the size of
dynamic arrays at run time at the moment, but the data in these arrays are
not contiguous in memory, and thus not good for most high-performance
computing. We really need to be able to do this, to get a contiguous 2D
array:

int n = 10, m = 20;
double[][] a = new double[n][m];

Drew McCormack
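For what it's worth, contiguous storage can be faked today with a thin
wrapper over a flat array; a sketch (the Matrix2D type is hypothetical and
purely illustrative of the workaround):

// One flat allocation, row-major indexing: element (i,j) lives at i*cols + j.
struct Matrix2D
{
    double[] data;   // rows*cols elements, contiguous in memory
    size_t rows;
    size_t cols;
}

Matrix2D makeMatrix(size_t rows, size_t cols)
{
    Matrix2D m;
    m.rows = rows;
    m.cols = cols;
    m.data = new double[rows * cols];   // one contiguous block
    return m;
}

double get(Matrix2D m, size_t i, size_t j)
{
    return m.data[i * m.cols + j];
}

void set(Matrix2D m, size_t i, size_t j, double v)
{
    m.data[i * m.cols + j] = v;
}

The flat data member can also be handed directly to C or Fortran routines,
which is part of why contiguity matters here.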
Apr 28 2004
parent reply "Matthew" <matthew.hat stlsoft.dot.org> writes:
"Drew McCormack" <drewmccormack mac.com> wrote in message
news:c6q5h7$2u5k$1 digitaldaemon.com...
<snip>
I agree that matrices, which are basically mathematical tools from linear algebra, do not belong in the core language. But powerful multidimensional arrays are important, if you are to avoid the mess that C++ is when it comes to high-performance programming.
Have you looked at the very efficient (though poorly documented)
multidimensional arrays in STLSoft (http://stlsoft.org/)? There's
fixed_array for fixed-dimensional arrays (up to four) with dimension extents
variable at runtime, and frame_array (soon to be renamed to something more
meaningful), which has a fixed number of dimensions and fixed extents (a
thin veneer for STL compatibility over built-in arrays).

I reckon they're about as close to the bone as C++ will let you get. I'd be
interested in hearing your opinions of the implementations.

Their storage is contiguous.
Apr 29 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Matthew wrote:
 Have you looked at the very efficient (though poorly documented)
 multidimensional arrays in STLSoft (http://stlsoft.org/)? There's
 fixed_array for fixed-dimensional arrays (up to four) with dimension
 extents variable at runtime, and frame_array (soon to be renamed to
 something more meaningful) which have a fixed number of dimensions and
 fixed extents (a thin veneer for STL compatility over built-in arrays).
 
 I reckon they're about as close to the bone as C++ will let you get. I'd
 be interested in hearing your opinions of the implementations.
 
 Their storage is contiguous
Just found them. What I'm missing is slicing. Actually, the internal
representation only has to be generalized slightly to give far more power to
arrays.

The vision I have of rectangular arrays in D is to take the concept of the
current dynamic arrays, where many different array references can point to
different slices of the same physical data block. One array reference does
not need to know anything but the position of its own [0,0] element, and the
range and the stride of each dimension.

The stride for the last dimension would usually be one, but allowing
different values will immediately give you grid slicing at no additional
cost. It will even allow direct access to Fortran arrays, and even reversing
the array without moving anything in memory.

Many array implementations have plain arrays and array slices as different
types. I believe this is not necessary. Assuming that every array has
arbitrary strides will allow much more flexible use of libraries etc.
Optimizing the code for cases where certain strides are one should then be
left to the compiler.
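A minimal sketch of the bookkeeping such an array reference would carry
(hypothetical names, no bounds checking; the point is that slicing and
reversing only touch the descriptor, never the data):

// A view knows its own [0,0] element plus range and stride per dimension.
// Several views can share one physical data block.
struct View2D
{
    double* origin;   // address of this view's [0,0] element
    int rows;         // range of dimension 0
    int cols;         // range of dimension 1
    int rstride;      // elements between consecutive rows
    int cstride;      // elements between consecutive columns (usually 1)
}

double get(View2D v, int i, int j)
{
    return v.origin[i * v.rstride + j * v.cstride];
}

// slice rows r0..r1 and columns c0..c1 of v, without copying any data
View2D slice(View2D v, int r0, int r1, int c0, int c1)
{
    View2D s;
    s.origin  = v.origin + r0 * v.rstride + c0 * v.cstride;
    s.rows    = r1 - r0;
    s.cols    = c1 - c0;
    s.rstride = v.rstride;
    s.cstride = v.cstride;
    return s;
}

// reverse the row order, again without moving anything in memory
View2D reverseRows(View2D v)
{
    View2D r = v;
    r.origin  = v.origin + (v.rows - 1) * v.rstride;
    r.rstride = -v.rstride;
    return r;
}

A column-major (Fortran) array maps onto the same descriptor by simply
swapping which stride is 1, which is the direct access mentioned above.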
Apr 29 2004
parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c6qrhd$119j$1 digitaldaemon.com...
<snip>
 Many array implementations have plain arrays and array slices as different
 types. I believe, this is not necessary. Assuming that every array has
 arbitrary strides will allow much more flexible use of libraries etc.
 Optimizing the code for cases where certain strides are one should then be
 left to the compiler.
I agree that arrays and their slices should be the same type, but rectangular arrays and jagged arrays should be different types.
Apr 29 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Ivan Senji wrote:

<snip>
I agree that arrays and their slices should be the same type, but rectangular arrays and jagged arrays should be different types.
Of course. As I mentioned, I'm working on a detailed proposal, hoping to
have it written by next week.

Jagged arrays would be just what they are now: arrays of references to
arrays of arbitrary size. Newly introduced would be references to truly
rectangular arrays with arbitrary dimension, ranges and strides.
May 05 2004
prev sibling next sibling parent Ben Hinkle <bhinkle4 juno.com> writes:
 From what I have read so far, D really has the potential to close the gap
 between C++ and Fortran and in this way gain a huge share in the
 scientific/high-performance area of computing. Anyway, to have any chance
 to go there, much care has to be taken now.
In general I'd be pleasantly surprised if D is faster than C/C++ for
scientific programming. I was just updating my GMP (www.swox.com/gmp) D
wrapper, and I timed the D object wrapper against the C++ object wrapper
supplied with GMP, evaluating an expression with about 10 terms thousands
and thousands of times; the D version took 2.7 seconds and the C++ version
took 3.2 seconds. The reason is that the D version is reusing temporary
values and the C++ version doesn't. If I don't recycle temporaries, the GC
overhead causes the D version to be slower.

-Ben
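The temporary-recycling pattern referred to, in a sketch (the Big type and
its methods are hypothetical stand-ins for a bignum wrapper, not the actual
GMP binding):

// Hypothetical bignum class, used only to illustrate the pattern.
class Big
{
    void mul(Big a, Big b) { /* store a * b in this object's buffer */ }
    void add(Big a, Big b) { /* store a + b in this object's buffer */ }
}

void accumulate(Big result, Big x, Big y, int iterations)
{
    Big t = new Big;              // one temporary, allocated once
    for (int i = 0; i < iterations; i++)
    {
        t.mul(x, y);              // reuse t instead of allocating a fresh
        result.add(result, t);    // temporary each pass, so the GC has
    }                             // nothing extra to collect
}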
Apr 21 2004
prev sibling next sibling parent reply lacs <lacs_member pathlink.com> writes:
Add primitive-units checking to D and a lot of people who work with numbers
will fall in love with D. What is primitive-units checking? It is something
we miss in C++ or Java (I have never read a line of Fortran :$ ), that is
being taught in physics 101, and that could have avoided a NASA probe crash
on Mars (not the digital one) a few years ago (I don't remember the name of
the probe, sorry; it was a forgotten miles-kilometers conversion factor
problem). Check the following code, written in a fictitious language, and
you will understand what is meant by primitive-units checking:
float<meter>        distance = 100.0;
float<second>       time     =   3.0;
float<meter/second> speed1;
float<second>       speed2;
float<feet/second>  speed3;

speed1 = distance/time; // no error
speed2 = distance/time; // generates a compile-time error
speed3 = distance/time; // ok only if the compiler knows the conversion
                        // factor between meters and feet, because then the
                        // compiler can do the conversion for us automatically

As you might have guessed, the notation float<meter> has nothing to do with
C++ templates and wouldn't interfere with them, since it can only be used
with primitives. Notice that float<meter> is still a primitive. Of course, a
lot of units are way more complex than meter/second in scientific equations.
And that is what would make primitive-units checking a time-saving feature
in D for scientific computing.


newbie11234
Apr 21 2004
next sibling parent C <dont respond.com> writes:
Sounds cool!  I'd like to try some template stuff to accomplish this.

Mars Climate Orbiter was the name of the probe btw :).

Charles

lacs wrote:
<snip>
Apr 21 2004
prev sibling next sibling parent "Ben Hinkle" <bhinkle4 juno.com> writes:
"lacs" <lacs_member pathlink.com> wrote in message
news:c66dls$1mt5$1 digitaldaemon.com...
 Add primitive-units-checking to D and a lot of people who work with
numbers will
 fall in love with D.
That would be neat. I wonder how far D's templates would go without
modifying the language. See
http://www2.inf.ethz.ch/~meyer/publications/OTHERS/scott_meyers/dimensions.pdf
for an approach in C++. There are probably lots and lots of ways to support
"units" depending on what the user wants, and dimension checking would be a
useful start.
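A rough sketch of the template route, carrying the unit exponents as
template value parameters (hypothetical names, and whether D's templates of
the day stretch this far is exactly the open question; take it as an
illustration of the idea):

// Quantity carries exponents for meters (M) and seconds (S) in its type.
struct Quantity(int M, int S)
{
    double value;
}

alias Quantity!(1, 0)  Meters;
alias Quantity!(0, 1)  Seconds;
alias Quantity!(1, -1) MetersPerSecond;

// Dividing quantities subtracts exponents, so unit errors become type errors.
Quantity!(M1 - M2, S1 - S2) divide(int M1, int S1, int M2, int S2)
                                  (Quantity!(M1, S1) a, Quantity!(M2, S2) b)
{
    Quantity!(M1 - M2, S1 - S2) r;
    r.value = a.value / b.value;
    return r;
}

// Meters d;  Seconds t;
// MetersPerSecond v = divide(d, t);   // fine
// Seconds w = divide(d, t);           // rejected at compile time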
Apr 21 2004
prev sibling next sibling parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
It is a neat idea, but I'm pretty sure you can simply implement that in the
library with the language as it is now. A good language should not cover
everything; it should be slim and still powerful enough to allow everything
to be done in the library.

lacs wrote:

<snip>
Apr 21 2004
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
lacs wrote:

 Add primitive-units-checking to D and a lot of people who work with 
 numbers will fall in love with D.
<snip>

If only we had a rational number type built in, implementing this with
templates would be straightforward: just assign each primitive unit a
different prime number.

Otherwise, we could define types to represent units (possibly based on a
rational number implementation) and values. Of course, this would move unit
checking to run time and so wouldn't be good for performance-critical apps.

The other approach is to define a struct for each unit (primitive and in
combination) that the program is going to use. Operations would be defined
to take and return the right types. This would give compile-time checking,
but would require quite some repetitive code to be written.
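The struct-per-unit approach, sketched (illustrative names; the repetition
is the point):

// One type per unit; only the physically meaningful operations are defined.
struct Meters          { double value; }
struct Seconds         { double value; }
struct MetersPerSecond { double value; }

MetersPerSecond over(Meters d, Seconds t)
{
    MetersPerSecond v;
    v.value = d.value / t.value;
    return v;
}

// Every other legal combination (Meters/Meters, MetersPerSecond*Seconds, ...)
// needs its own function or operator overload - the repetitive code mentioned
// above.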
 As you migth have guessed, the notation float<meter> has nothing to 
 do with c++ templates and wouldnt interfere with it since it can only 
 be used with primitives.
<snip>

Not sure about that. Unless we're going to restrict units to a list of
ad-hoc keywords, it'll break the CFG just as well as the C++ template syntax
does by itself.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
Apr 22 2004
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c65fmi$2js$1 digitaldaemon.com...
 From what I have read so far, D really has the potential to close the gap
 between C++ and Fortran and in this way gain a huge share in the
 scientific/high-performance area of computing. Anyway, to have any chance
 to go there, much care has to be taken now.
I agree totally with you, which is why D has several features with a numerics focus. If there are any I missed, I want to know about it.
Apr 21 2004
parent reply Drew McCormack <drewmccormack mac.com> writes:
On 2004-04-21 20:53:10 +0200, "Walter" <walter digitalmars.com> said:

 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c65fmi$2js$1 digitaldaemon.com...
 From what I have read so far, D really has the potential to close the gap
 between C++ and Fortran and in this way gain a huge share in the
 scientific/high-performance area of computing. Anyway, to have any chance
 to go there, much care has to be taken now.
I agree totally with you, which is why D has several features with a numerics focus. If there are any I missed, I want to know about it.
As you would have gathered had you read my previous posts, I am strongly in
favor of more powerful multidimensional arrays. This should not be left to
library writers as in C++, in my view. A library writer can, in theory,
write a powerful and fast class, but in practice it holds back adoption,
because you get problems compiling the library on different platforms, and
performance may sometimes be subpar for a given platform. It also often
leads to ugly syntax. Basically, it leaves too much to chance to be left to
a library.

Here are some more for the wish list:

It is important to be able to create a contiguous multidimensional array on
the heap at run time. There are very few occasions where I know how big an
array is at compile time.

Elementwise operations are important too, but I think these are already on
the list of things to do. Note that they should work with arrays of any
shape, e.g.,

double[10][5] a, b;
double[][] c;
// Initialize a and b here ...
c = a + b;

Something I missed when I was working with D yesterday was array literals.
These don't work:

double[] a = { 1, 2, 3, 4 };
double[4] a = { 1, 2, 3, 4 };

It would be good if they did, and even if array literals were allowed in
other contexts:

funcToDoSomething( someArgument, {1, 2, 3, 4} );

It would be nice if slicing worked for multidimensional arrays:

double[][] c = a[4..5][16..19];

Of course, there are any number of elementwise functions you can think of
(e.g. max, min, sin, etc.), but these would belong in the library. I am more
concerned to get powerful multidimensional arrays. It is possible to build
the other stuff yourself, but if you have to write a multidimensional array
class that is high performance, you end up back in the C++ expression
template morass.

Drew McCormack
Free University, Amsterdam
Apr 28 2004
parent reply "Unknown W. Brackets" <unknown at.simplemachines.dot.org> writes:
Drew McCormack wrote:
 double[] a = { 1, 2, 3, 4 };
 double[4] a = { 1, 2, 3, 4 };
 
 It would be good if they did, and even that array literals were allowed 
 in other contexts:
 
 funcToDoSomething( someArgument, {1, 2, 3, 4} );
 
 Drew McCormack
 Free University, Amsterdam
 
http://digitalmars.com/d/arrays.html#bounds

Scroll down to "Array Initialization"... it seems you use brackets (meaning
the kind in my name - [ and ]).

It doesn't work, of course, for associative arrays though... sadly... I'm
not sure if that's meant to change or not.

-[Unknown]
Apr 28 2004
parent reply Drew McCormack <drewmccormack mac.com> writes:
On 2004-04-29 08:43:37 +0200, "Unknown W. Brackets" 
<unknown at.simplemachines.dot.org> said:

<snip>
http://digitalmars.com/d/arrays.html#bounds Scroll down to "Array Initialization"... it seems you use brackets. (meaning the kind in my name - [ and ].) It doesn't work, of course, for associative arrays though.. sadly... I'm not sure if that's meant to change or not. -[Unknown]
Hmm, you're right. Please excuse my ignorance.

It would be good if you could do this in places other than initialization,
though. I still would like array literals.

Drew
Apr 29 2004
parent "Unknown W. Brackets" <unknown at.simplemachines.dot.org> writes:
Drew McCormack wrote:
 Hmm, you're right. Please excuse my ignorance.
 It would be good if you could do this in places other than 
 initialization though. I still would like array literals.
 
 Drew
 
No, not ignorance... I happened to remember where it was.

I know I would like array literals myself, but it seems this is planned for
the future:

http://digitalmars.com/d/future.html

-[Unknown]
Apr 29 2004
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c65fmi$2js$1 digitaldaemon.com...
 From what I have read so far, D really has the potential to close the gap
 between C++ and Fortran and in this way gain a huge share in the
 scientific/high-performance area of computing. Anyway, to have any chance
 to go there, much care has to be taken now.

 People unfamiliar with numeric programming often wonder why Fortran still
 has such a huge share among scientists. Many scientists still use
Fortran77
 and even those who have moved to Fortran95 only use it for its modern
 syntax, never touching the advanced concepts of it. And this is not only
 because they don't know better, but also because it is extremely hard to
 match the performance of Fortran77! (OK, 99% of the reason might actually
 be the lazyness to learn a different language and the existing code-base,
 but still people usual argue based on the superior performance of the
 language)
What Fortran has over C is the 'noalias' on function parameters, which
allows for aggressive optimization. What I'm thinking of is writing the spec
for D functions so that parameters are always 'noalias' (for extern (C)
functions this would not apply).

What do you think?

For reference: http://www.lysator.liu.se/c/restrict.html
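A hypothetical example of what the assumption buys (not from the spec, just
an illustration):

// y[i] += a * x[i].  With x and y known to be disjoint, each iteration is
// independent and the loop can be vectorized or reordered; with possible
// overlap, the compiler must keep the original, strictly sequential order.
void axpy(double a, double[] x, double[] y)
{
    for (size_t i = 0; i < y.length; i++)
        y[i] = y[i] + a * x[i];
}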
Apr 21 2004
next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Walter wrote:
 What Fortran has over C is the 'noalias' on function parameters which
 allows for aggressive optimization. What I'm thinking of is writing the
 spec for D functions so that parameters are always 'noalias' (for extern
 (C) functions this would not apply).
 
 What do you think?
 
 For reference: http://www.lysator.liu.se/c/restrict.html
I only have some basic knowledge about the "restrict" problem and would not
want to judge any detailed language design decisions.

Anyhow, the problem I see with both the Fortran and the C "restrict"
solution is that in both cases you "demand" that pointers are not aliased,
without being able to check it. Of course, it is not possible to check this
in every case at compile time (if it were possible, C compilers would not
have a problem), but maybe we can at least identify a class of situations
where it is possible? Or, alternatively, enforce non-aliasing parameters via
preconditions, so it is checked in debugging mode?

For plain pointers it is rather hard to check for illegal aliasing, since
you never know how large the area is that is not allowed to overlap, but
pointers are rarely used in D anyway. Object references, on the other hand,
are far easier to check: objects cannot overlap, so two references are
either equal or non-aliased. If you want to allow the compiler to optimize a
routine for non-aliased arguments, just put in a precondition prohibiting
references to identical objects.

For arrays, the matter gets more tricky, but again, arrays have enough
semantics that I believe it should be possible to specify and check whether
two arrays or slices overlap in memory. The tricky question here is: are two
disjoint slices of the same array aliased? The simple way would be to say
that two slices of the same array are always aliased (simple to check,
simple to specify, simple to enforce). The more powerful solution would
allow disjoint slices to be used as non-aliased objects (allowing stuff like
copying internal parts of arrays around with full optimization).

So my uneducated suggestion would be:

* accept that pointers may always be aliased to anything and don't try to
optimize too much there.
* prohibit aliased object references by explicit preconditions/assertions.
* check closely what "aliasing" means for arrays and slices and find a way
to prohibit this kind of aliasing in contracts as well.

The problem certainly is not trivial to solve, but looking at the time that
researchers have spent on working it out in C, it is certainly worth some
more consideration in D...
Apr 21 2004
parent reply "Walter" <walter digitalmars.com> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c66quv$2fe5$1 digitaldaemon.com...
 Anyhow, the problem of both, the Fortran- as well as the
 C-"restrict"-solution, that I see is, that in both cases, you "demand"
that
 pointers are not aliased, without being able to check it.
Correct.
 Of course, it is not possible to check this in every case at compile time
 (if it were possible, C compilers would not have a problem),
Correct.
 but maybe, we
 can at least identify a class of situations where it is possible? Or,
 alternatively enforce nonaliasing parameters via preconditions, so it is
 checked in debugging mode?
This is a possibility.
 For plain pointers it rather hard to check for illegal aliasing, since you
 never know how large the are is that is not allowed to overlap, but
 pointers are rarely used in D anyway.
It might be reasonable to constrain the requirement to be just for arrays and objects, not pointers.
 Object references on the other hand are far easier to check: objects
cannot
 overlap, so two references are either equal or noalias.
This is, unfortunately, not true when you get into interfaces. It's also possible for pointers into class objects, as well as arrays referencing into class objects.
 If you want to
 allow the compiler to optimize a routine for nonaliased arguments, just
put
 in a precondition prohibiting references to identical objects.
Historically, adding in special keywords for such optimizations has not worked out well. That's why I was thinking of making it implicit for D function parameters.
 For arrays, the matter gets more tricky, but again, arrays have enough
 semantics that I believe it should be possible to specify and check
whether
 to arrays or slices overlap in memory. The tricky question here is: are
two
 disjunct slices of the same array aliased? The simple way would be to say,
 that two slices of the same array are always aliased (simple to check,
 simple to specify, simple to enforce) The more powerful solution would
 allow disjunct slices to be used as nonaliased objects. (Allowing stuff
 like copying internal parts of arrays around with full optimization.)
Yes, it should be possible to have disjoint slices treated as non-aliased.
 So my uneducated suggestion would be:

 * accept that pointers may always be aliased to anything and don't try to
 optimize too much there.
This is a good idea, but I'm concerned it may not be sufficient.
 * prohibit aliased object references by explicit preconditions/assertions
Having the compiler insert runtime checks for debug builds is a good idea. Unfortunately, as you pointed out, adding runtime checks for aliased pointers is impossible.
 * check closely what "aliasing" means for arrays and slices and find a way
 to prohibit this kind of aliasing in contracts as well.

 The problem certainly is not trivial to solve, but looking at the time
that
 researchers have spent on working it out in C it certainly is worth some
 more consideration in D...
Solving it is worthwhile, as it removes a major barrier to the Fortran crowd being interested in upgrading to D. C/C++ failed to supplant Fortran largely for this reason. C99's "restrict" keyword is an ugly kludge (and D users all know how much I hate type modifiers <g>).
Apr 21 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Walter wrote:
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 Object references on the other hand are far easier to check: objects
 cannot overlap, so two references are either equal or noalias.
This is, unfortunately, not true when you get into interfaces. It's also possible for pointers into class objects, as well as arrays referencing into class objects.
OK, at that point it really gets messy. Anyway: if an interface reference
points into an object, it should certainly be possible to recover a pointer
to the object itself? This, of course, adds a little overhead to the
checking algorithm, but in debugging mode that should still be acceptable.
 If you want to
 allow the compiler to optimize a routine for nonaliased arguments, just
 put in a precondition prohibiting references to identical objects.
Historically, adding in special keywords for such optimizations has not worked out well. That's why I was thinking of making it implicit for D function parameters.
True, the language would be simpler that way. Anyway:

* You will not only have to think about function arguments but also about
references that are stored in objects. Every time the source code handles
two references, it should be possible to tell the compiler that they are not
aliased. For this, I would suggest a builtin function "bool nonaliased(x,y)"
that takes two references and checks whether they refer to disjoint portions
of memory. Then you just put an "assert(nonaliased(x,y))" before critical
portions of the code and the compiler can happily optimize.

* Even for function arguments: there certainly are plenty of cases where it
makes perfect sense to pass two references to the same object to some
function. I wonder whether it is worth giving up all of these to be able to
optimize in certain cases?

* If you take my proposal and assume that references may be aliased in
general, but provide a powerful means (like the above-mentioned "nonaliased"
builtin) to specify exactly where the compiler is allowed to optimize, then
you don't restrict anyone unnecessarily. And still, authors of time-critical
code can examine and specify exactly what they mean by "nonaliased".
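A user-level sketch of what such a check could look like for array slices
(the name is taken from the proposal above; the implementation is only an
illustration, not a builtin):

// True if the two slices occupy disjoint regions of memory.
bool nonaliased(double[] a, double[] b)
{
    double* aBeg = a.ptr;
    double* aEnd = a.ptr + a.length;
    double* bBeg = b.ptr;
    double* bEnd = b.ptr + b.length;
    return aEnd <= bBeg || bEnd <= aBeg;
}

// Intended use before an optimization-critical section:
//     assert(nonaliased(x, y));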
 * accept that pointers may always be aliased to anything and don't try to
 optimize too much there.
This is a good idea, but I'm concerned it may not be sufficient.
I guess, then, that this is a misunderstanding: by "pointers" I mean raw,
C-like pointers. You already agreed that these should be allowed to alias
anything. Whoever uses pointers just has to accept that they don't get the
utmost optimization.
 * prohibit aliased object references by explicit preconditions/assertions
Having the compiler insert runtime checks for debug builds is a good idea. Unfortunately, as you pointed out, adding runtime checks for aliased pointers is impossible.
Again: forget about pointers. They may alias anything and cannot efficiently be checked. If anyone wants full optimization, they should use arrays, slices and object references that have enough semantics for the compiler to check for aliasing.
Apr 22 2004
parent reply "Walter" <walter digitalmars.com> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c67t93$170u$1 digitaldaemon.com...
 * You will no only have to think about function arguments but also about
 references that are stored in objects. Every time the sourcecode handles
 two references, it should be possible to tell the compiler that they are
 not aliased. And for this, it I would suggest a builtin function "bool
 nonaliased(x,y)" that takes two references and checks whether they refer
to
 disjunct portions of the memory. Then you just put an
 "assert(nonaliased(x,y))" before critical portions of the code and the
 compiler can happily optimize.
If I'm understanding this correctly, it has the same problem that the
"restrict" and "noalias" keywords in C have - it's too confusing for users
to use correctly, as well as being aesthetically not so pleasing.

I think it would be better to have the compiler assume parameters are not
aliased (since that is by far the usual case) and to have to say when they
are aliased. Also, a runtime check that they really are not aliased might be
appropriate in debug mode.

Now, since aliasing is sadly allowed in C functions, I was thinking:

extern (C) int func(int a[], int b[]) // a and b may be aliased
extern (D) int func(int a[], int b[]) // a and b must be disjoint
Apr 22 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Walter wrote:

 I think it would be better to have the compiler assume parameters are not
 aliased (since that is by far the usual case) and to have to say when they
 are aliased. <snip>
Yes, it may be confusing to users, but then, nobody has to use the feature -
only those people trying to squeeze out performance. People writing numeric
libraries etc. will gladly accept the fine-tuning capabilities.

As I said: a simple solution as in Fortran will not buy you much. References
can come not only through function arguments but also from object members.
Saying that function arguments may not be aliased only covers part of the
problem.

The thing that makes Fortran 77 so highly optimizable without much language
overhead is that it doesn't have pointers or references at all. Aliasing
could *only* come through function arguments. So once this is prohibited,
the Fortran compiler can simply assume that *nothing whatsoever* is aliased.
I don't know how Fortran 95 handles this issue, but I guess that as soon as
you use pointers, performance goes down.

In D, since we have references everywhere, any real solution to the aliasing
problem will get a bit more complex.
Apr 22 2004
prev sibling next sibling parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c66irg$216j$1 digitaldaemon.com>, Walter says...

<snip>
What Fortran has over C is the 'noalias' on function parameters which allows for aggressive optimization. What I'm thinking of is writing the spec for D functions so that parameters are always 'noalias' (for extern (C) functions this would not apply). What do you think? For reference: http://www.lysator.liu.se/c/restrict.html
I am a programmer working in a scientific area (bioinformatics). I think it
would be bad to imply no-aliasing, because it trades safety for performance.

A lot of the code here is written by biologists and/or statisticians (some
of whom are quite brilliant, but only a few are trained as programmers).
They are going to go nuts trying to find bugs like this.

If you work on the "heavy lifting" code that really needs performance, you
generally understand about the cache line size etc.; you can be trusted to
know to use the "restrict" keyword. If you are a cytologist writing new
statistics functions, you will not know enough to use the "may-alias"
keyword. The burden of knowing the tradeoffs has to be on the performance
guru, not on the other guy.

Kevin
Apr 28 2004
prev sibling parent Drew McCormack <drewmccormack mac.com> writes:
On 2004-04-21 21:38:49 +0200, "Walter" <walter digitalmars.com> said:

<snip>
What Fortran has over C is the 'noalias' on function parameters which allows for aggressive optimization. What I'm thinking of is writing the spec for D functions so that parameters are always 'noalias' (for extern (C) functions this would not apply). What do you think? For reference: http://www.lysator.liu.se/c/restrict.html
I'm all for it. It's bad programming form anyway, so just forbid it, like
Fortran does. It will solve a lot of headaches.

Drew
Apr 29 2004
prev sibling parent reply Stephan Wienczny <wienczny web.de> writes:
Norbert Nemec wrote:

 Hi there,
 
 as you might have gathered from my previous posts, I am quite interested in
 numerical/scientific computing.
 
 
 Ciao,
 Nobbi
It would be nice to have classes for really big numbers in Phobos.
Apr 21 2004
parent reply "Matthew" <matthew.hat stlsoft.dot.org> writes:
I agree with that.

Want to write them?

"Stephan Wienczny" <wienczny web.de> wrote in message
news:c66tm4$2k9o$1 digitaldaemon.com...
 Norbert Nemec wrote:

 Hi there,

 as you might have gathered from my previous posts, I am quite interested in
 numerical/scientific computing.


 Ciao,
 Nobbi
It would be nice to have classes for really big numbers inside phobos.
Apr 21 2004
parent reply Stephan Wienczny <wienczny web.de> writes:
Matthew wrote:

 I agree with that.
 
 Want to write them?
 
 
I actually started to, months ago, but then wanted to wait until DTL
finishes ;-)
Apr 21 2004
next sibling parent J Anderson <REMOVEanderson badmama.com.au> writes:
Stephan Wienczny wrote:

 Matthew wrote:

 I actually started to months ago, but then wanted to wait until DTL 
 finishes ;-)
You're quick. <g>

-- 
-Anderson: http://badmama.com.au/~anderson/
Apr 21 2004
prev sibling parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
Touche! <G>

All I can say is DTL 0.1 will be out as soon as Walter can find the bandwidth to
give me the lang/comp changes I need. It will contain sequence containers. I'm
hoping that this can be within a week, but it's hard to say at this point.

DTL 0.2 will be out once I get feedback on things from people, and will probably
contain some tree and/or associative containers.

Once the basic lang/comp support is there for what I'm trying to do, I see no
reason why the library cannot evolve quickly, and with the input of other
contributors.


"Stephan Wienczny" <wienczny web.de> wrote in message
news:c66vig$2n7s$1 digitaldaemon.com...
 Matthew wrote:

 I agree with that.

 Want to write them?

 I actually started to months ago, but then wanted to wait until DTL
 finishes ;-)
Apr 21 2004