digitalmars.D - Thoughts: Aliasing

Norbert Nemec (80/80) Jul 02 2004 Hi there,

Ivan Senji (6/20) Jul 02 2004 any

Norbert Nemec (6/8) Jul 02 2004 True. The vector expressions I'm imagining would be a generalization of ...

David Barrett (19/27) Jul 02 2004 Instead of using vector expressions, why not just add an "unordered" blo...

Norbert Nemec (21/40) Jul 03 2004 Interesting idea. Anyhow, in this form it seems a bit over-simplistic to...

Sam McCall (14/20) Jul 03 2004 A problem might be that the index in the for loop is iterative, so in

Sam McCall (13/21) Jul 03 2004 Interesting. I don't think this solves the problem in itself though,

Jonathan Leffler (13/21) Jul 02 2004 When I was programming in Fortran 77, the EQUIVALENCE statement was

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Hi there,

just a few more thoughts. This time on the topic of "aliasing" - i.e. the
question whether different pointers may reference the same memory in
certain context. The topic has been discussed before but was not resolved.

Just a short primer on the topic:

------------------
Fortran (at least Fortran 77) does not have pointers at all. Every variable
has exactly one name. The only point where a new name is given to a certain
location of memory is when parameters are passed to a function. Now, it is
simple to state a rule in the specs of Fortran that says: you may never
call a function with the same variable for two arguments. With this simple
rule, you can be sure to have eliminated every possible source of aliasing.
Two different symbols in the source code always mean two different
locations in memory.

C has pointers, therefore you may have aliasing all over the place.

The result of this is, that Fortran compilers are allowed to do certain
kinds of optimization that C compilers may not do. This leads to a certain
performance gain for Fortran. Especially for numerics, where performace is
of utmost importance, it has been a fact for a long time that C programs
can never quite measure with the performace of Fortran programs.

Now one of the recent developments in C (yes, C does still evolve) was the
introduction of the keyword "restrict" to address this issue. The keyword
just tells the compiler: "This variable is not aliased, feel free to
optimize." - there is no check or semantic meaning connected with it, it is
just a hint for the compiler, a "permission" to do certain optimizations
that it would in generally not be allowed to do.

Anyhow, this "restrict" really is a crude hack which should, by no means, be
repeated in D. There were a few alternative ideas proposed, none of them
really satisfying.
--------------------

Now, to continue with my recent thoughts:

--------------------
What is the kind of optimizations we are talking about? Mostly reordering of
statements, especially loops. Looking only at the local statements, the
compiler can often easily determine whether the order of execution of
consecutive statements does matter. If it does not, it may often find that
swapping independent statements gives a performance gain. This is
especially true for modern processors where the cache has to used
efficiently and the pipeline has to be filled very carefully to really
exploit the speed of the processor (we are talking about factors of 10 and
more in performance here!)

Anyhow, if the code is dealing with pointers that might be aliased, the
order of execution might be important. Just think of copying data between
overlapping blocks of memory. Of course, such situations rarely happen in a
program. In most cases, there will be no aliasing, but still, the mere
chance of it prevents the compiler of doing some important reordering of
code. Therefore the "restrict" keyword - even though being a crude hack -
really made some difference in C.
---------------------

Now, what is my conclusion

---------------------
We can completely forget about the aliasing problem once we have really
powerful vectorization expressions!

What is a vectorized expression? Basically, a loop that does not specify any
order of execution. If there is no order specified, of course the compiler
can choose any one that is efficient or maybe even distribute the code and
execute it in parallel. Of course it is up to the programmer to use this
tool in a way that does not depend on the order of execution - otherwise,
he'll get undefined results.

The problem of C and Fortran is, that there are no vector expressions at
all. You always have to write a good-old for-loop, even if the order of
execution does not matter of all. Now it is up to the compiler to decide
whether order does matter which need some close analysis. Fortran compiler
have better chances to detect loops that can be reordered, but still it is
suboptimal compared to languages with vector expressions that simply tell
the compiler: hey - pick any order you like!

How does this solve the aliasing-problem? The problem is gone completely for
vector expressions! The question of aliasing does only matter for the
decision of whether or not the order has to be preserved. If there is no
order given, there is no decision to make.

Of course, a good compiler should still try to optimize the good-old-loops
as well, but there is little reason to be found to worry about it too much.
If you want fast code, you should just use vector expressions wherever
possible.
-----------------------

Of course, this all is just for the far future. We don't have vector
expressions yet, but I don't think anyone would really worry about aliasing
in D yet, anyway. (Except, of course, looking at the future.)

Ciao,
Nobbi

Jul 02 2004

"Ivan Senji" <ivan.senji public.srce.hr> writes:

"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:cc4htg$kav$1 digitaldaemon.com...
 ...
 What is a vectorized expression? Basically, a loop that does not specify

any
 order of execution. If there is no order specified, of course the compiler
 can choose any one that is efficient or maybe even distribute the code and
 execute it in parallel. Of course it is up to the programmer to use this
 tool in a way that does not depend on the order of execution - otherwise,
 he'll get undefined results.

Here's an answer to my question. So are vectorized expressions basically
future array operations?

...

 -----------------------

 Of course, this all is just for the far future. We don't have vector
 expressions yet, but I don't think anyone would really worry about

aliasing
 in D yet, anyway. (Except, of course, looking at the future.)

 Ciao,
 Nobbi

Jul 02 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Ivan Senji wrote:

 Here's an answer to my question. So are vectorized expressions basically
 future array operations?

True. The vector expressions I'm imagining would be a generalization of the
array expressions we currently have (defined but not implemented).

Array expressions as they are are hard to extend. It is nice to have
elementwise addition, multiplication etc. but its just a toy if you cannot
go beyond.

Jul 02 2004

"David Barrett" <dbarrett quinthar.com> writes:

Instead of using vector expressions, why not just add an "unordered" block:







Just notify the compiler that the contained statements can be executed in
any order.  Obviously, within a given statement order matters (y-- must
complete before sqrt() starts), but the the X and Z calculations can be done
independently.

Not that I'm opposed to vectorized expressions; just suggesting another
angle from which to approach the problem.  Lots of problems can't be
decomposed to a clean vectorized solution, so this might give the compiler
options to find parallel operations (and fill out the pipeline) in those
areas.

-david

"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:cc4k7b$nbo$2 digitaldaemon.com...
 Ivan Senji wrote:

 Here's an answer to my question. So are vectorized expressions basically
 future array operations?

 True. The vector expressions I'm imagining would be a generalization of

the
 array expressions we currently have (defined but not implemented).

 Array expressions as they are are hard to extend. It is nice to have
 elementwise addition, multiplication etc. but its just a toy if you cannot
 go beyond.

Jul 02 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

David Barrett wrote:

 Instead of using vector expressions, why not just add an "unordered"
 block:
 





 
 Just notify the compiler that the contained statements can be executed in
 any order.  Obviously, within a given statement order matters (y-- must
 complete before sqrt() starts), but the the X and Z calculations can be
 done independently.
 
 Not that I'm opposed to vectorized expressions; just suggesting another
 angle from which to approach the problem.  Lots of problems can't be
 decomposed to a clean vectorized solution, so this might give the compiler
 options to find parallel operations (and fill out the pipeline) in those
 areas.

Interesting idea. Anyhow, in this form it seems a bit over-simplistic to me.
In this form, the "unordered" block does not specify at which graining the
reordering may happen (if the unordered block contains other blocks and
loops, what should happen to them?) Furthermore, it smells a lot like one
of these compiler hints that are not needed for the functionality.
Therefore any sane programmer would leave them out first and then, in the
end, go through the code to spice it up with some unordered statements.
(Where most locations could probably be detected by the compiler anyway. If
there is no possible aliasing involved, the compiler certainly is just as
good as any programmer in finding the swappable commands.)

One other, more specific alternative would be parallelizing blocks. I.e.
something like

        parallel for(int i=0;i<10;i++) {
                a[i] += b[i];
        }

meaning that the individual runs of the loops can happen in parallel.
Anyhow, this is a completely new can of worms. There are many more language
elements that could be added in that manner.

This kind of language constructs would also be very interesting to
investigate, but it leads into a completely different direction.

Jul 03 2004

Sam McCall <tunah.d tunah.net> writes:

Norbert Nemec wrote:

 One other, more specific alternative would be parallelizing blocks. I.e.
 something like
 
         parallel for(int i=0;i<10;i++) {
                 a[i] += b[i];
         }

A problem might be that the index in the for loop is iterative, so in 
general you don't know ahead of time what indices will be used.

What about an intrinsic function
vectorise(void delegate(T,...) func,T[] list ...)

which would do
for(int i=0;i<list.length;i++)
	func(list[i]);
except in arbitrary order.
The function pointed to by the delegate would presumably be declared 
either inline or at least local to the function.
Of course there'd have to be a way of ensuring that creating the 
delegate doesn't stop the function being inline.

Sam

Jul 03 2004

Sam McCall <tunah.d tunah.net> writes:

David Barrett wrote:

 Instead of using vector expressions, why not just add an "unordered" block:
 





 

Interesting. I don't think this solves the problem in itself though, 
because in general the statements can't be written without a loop, take 
vector multiplication for example.

Also useful would be
simultaneous {
	x=y;
	y=x;
}
but I'm probably dreaming/been using too many functional languages. It's 
probably not possible in general without saying something like "if any 
statement calls a function whose side effects affect another statement, 
result is undefined" :-\

Jul 03 2004

Jonathan Leffler <jleffler earthlink.net> writes:

Norbert Nemec wrote:
 Fortran (at least Fortran 77) does not have pointers at all. Every variable
 has exactly one name. The only point where a new name is given to a certain
 location of memory is when parameters are passed to a function. Now, it is
 simple to state a rule in the specs of Fortran that says: you may never
 call a function with the same variable for two arguments. With this simple
 rule, you can be sure to have eliminated every possible source of aliasing.
 Two different symbols in the source code always mean two different
 locations in memory.

When I was programming in Fortran 77, the EQUIVALENCE statement was 
still available - not to mention common blocks.  Both allow you to 
dink with a given expanse of memory, calling the same locations by 
different names and different types under different circumstances.

That said, it is pure quibbling - you are right that clean Fortran is 
a lot easier to handle than the majority of languages.  And one of the 
trickier bugs I ever had to deal with in Fortran was where someone had 
casually used the same variable twice in a function calling list.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler earthlink.net, jleffler us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

Jul 02 2004

D Programming

C/C++ Programming

Other

digitalmars.D - Thoughts: Aliasing