digitalmars.D.learn - three little issues

spir (83/83) Feb 06 2011 Hello,

bearophile (8/10) Feb 06 2011 The situation is easy to understand once you know how generally what a s...

spir (67/68) Feb 06 2011 Right, I may be successful to store by value as you say, before the fram...

bearophile (8/18) Feb 06 2011 Accepting the second is a bug in the escape analysis done by the front-e...

spir <denis.spir gmail.com> writes:

Hello,

Here are three little issues I faced while implemented a lexing toolkit (see 
other post).

1. Regex match

Let us say there are three "natures" or "modes" of lexeme:
* SKIP: not even kept, just matched and dropped (eg optional spacing)
* MARK: kept, but slice is irrelevant data (eg all kinds of punctuation)
* DATA: slice is necessary data (eg constant value or symbol)

For the 2 first cases, I still need to get the size ot the matched slice, to 
advance in source by the corresponding offset. Is there a way to get this 
information without fetching the slice by calling hit()?

Also, I would like to know when Regex.hit() copies or slices.


2. reference escape

This is a little enigma I face somewhere in this module. Say S is a struct:
     ...
     auto s = S(data);
     return &s;
This code is obvioulsy wrong and the compiler gently warns me about that. But 
the variant below is allowed and more, seems towork fine:
     return &(S(data);
For me, both versions are synonym. Thus, why does the compiler accept the 
latter and why does it work? Any later use to the returned struct (recorded in 
an array) should miserably fail with segfault. (*)
Or is it that the compiler recognises the idiom and implicitely allocates the 
struct outside the local stack?
Example:

struct S { int i; }

S* newS (int i) {
     if (i < 0)
         return null;
//  auto s = S(i);
//  return &s;  // Error: escaping reference to local s
     return &(S(i));
}

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newS(i);
         if (p) {
             structs ~= *p;      // explicite deref!
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

How can this work?


3. implicite deref

But there is even more mysterious for me: if I first access the struct before 
recording it like in:

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newS(i);
         if (p) {
             write (p.i,' ');    // implicite deref!
             structs ~= *p;      // explicite deref!
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

...then the final assert fails!? But the written i's are correct ("2 1 0").
Worse, if I exchange the two deref lines:

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newS(i);
         if (p) {
             structs ~= *p;      // explicite deref!
             write (p.i,' ');    // implicite deref!
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

...then the assertion passes, but the written integers are wrong (looks like 
either garbage or an address, repeated 3 times, eg: "134518949 134518949 
134518949"; successive runs constantly produce the same value).


Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 06 2011

bearophile <bearophileHUGS lycos.com> writes:

spir:

 2. reference escape
 3. implicite deref

The situation is easy to understand once you know how generally what a stack
frame is and how C functions are called:
http://en.wikipedia.org/wiki/Stack_frame
The D call stack is a contiguous-allocated backwards-single-linked list of
differently-sized records, each record is a stack frame, and the whole data
structure is of course managed as stack :-)

When you have similar doubts I also suggest you to take a look at the asm DMD
generates. Writing asm requires some work, but reading a bit of asm is
something you may learn in few days or even one day.

Before a D function starts, a stack frame is created. It will contain your
stack-allocated struct instance. When the function ends its stack frame is
destroyed virtually by moving a stack pointer, so the struct may be overwritten
by other things, like by a call to writeln that creates many stack frames. If
the stack frame is not overwritten and you save by *value* the stack contents,
you have successfully saved your data in the array of S, but accessing
virtually deleted data in the stack is a bad practice to avoid.

Bye,
bearophile

Feb 06 2011

spir <denis.spir gmail.com> writes:

On 02/06/2011 02:13 PM, bearophile wrote:
 Before a D function starts, a stack frame is created. It will contain your
stack-allocated struct instance. When the function ends its stack frame is
destroyed virtually by moving a stack pointer, so the struct may be overwritten
by other things, like by a call to writeln that creates many stack frames. If
the stack frame is not overwritten and you save by*value*  the stack contents,
you have successfully saved your data in the array of S, but accessing
virtually deleted data in the stack is a bad practice to avoid.

Right, I may be successful to store by value as you say, before the frame is 
overwritten, and so-to-say by chance.

But this does not explain why the compiler refuses:
	// 1
	auto s = S(data);
	return &s;
and accepts:
	// 2
	return &(S(data));
or does it?
What are the supposed differences in semantics or behaviour, if any? For 
(naive) me, these 2 pieces of code are exactly synonym (and I would be happy 
with the compiler suppressing s in 1 or instead creating an intermediate var in 
2, whatever it judges better).

I have a third version, in the case where I need to check something in s before 
returning its address:
	// 3
	auto p = &(S(data));
	if ((*p).check())
	    return null;
	return p;
(This is just a synopsis). I need to write it that way, else it's refused. (I 
mean I cannot first have an s var explicitely, check on it directly, then take 
it's address as return value).

Another use case is where S's are in an array, else both the synopsis and the 
solution are analog to the last code above. Since they are struct values, I use 
a pointer to avoid a useless local copy. What do you think of this idiom? Is it 
common? Is it good at all?
Real code:

     /** AST Node constructed from lexeme of type typeCode, if any,
         at current position in lexeme stream --else null.
         Node's constructor must expect the lexeme's slice as (only) input.
     */
     Node node (Node) (string typeCode) if (is(Node == class)) {
         // Avoid useless local copy of lexeme by using pointer
         // (instead of local struct variable).
         Lexeme* pointer = &(this.lexemes[this.cursor]);
         if ((*pointer).typeCode == typeCode) {
             ++ this.cursor;
             return new Node((*pointer).slice);
         }
         return null;
     }

My "spontaneous" version of this code would indeed be:

     Node node (Node) (string typeCode) if (is(Node == class)) {
         Lexeme lexeme = this.lexemes[this.cursor];
         if (lexeme.typeCode == typeCode) {
             ++ this.cursor;
             return new Node(lexeme.slice);
         }
         return null;
     }

The aim is avoiding copying pieces of the (plain text) source when lexing, 
parsing, constructing the AST. If I'm right in analysing my app as of now, I 
have, thank to D's "view" slices, exactly 0 copy from source text to AST. 
Meaning even AST nodes which hold a piece of the source text (strings, symbols, 
maybe more) actually have a view of the very original source. In any other 
language (or is it in my pr2vious coding style?), I would have copied at the 
very minimum once. Even in a dynamic language (which strings are indeed 
ref'ed), to create the first "slice" (in the sense of substring).
Thank you, Walter!

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 06 2011

bearophile <bearophileHUGS lycos.com> writes:

spir:

 But this does not explain why the compiler refuses:
 	// 1
 	auto s = S(data);
 	return &s;
 and accepts:
 	// 2
 	return &(S(data));
 or does it?

Accepting the second is a bug in the escape analysis done by the front-end, I
think. 
But see also what Walter has invented here:
http://en.wikipedia.org/wiki/Return_value_optimization


 What are the supposed differences in semantics or behaviour, if any?

Regarding what the compiler actually does, take a look at the produced asm.


 (This is just a synopsis). I need to write it that way, else it's refused.

Don't return pointers to memory present in to-be-deleted stack frames.

Bye,
bearophile

Feb 06 2011

D Programming

C/C++ Programming

Other

digitalmars.D.learn - three little issues