
c++ - ##: "concatenation vs. juxtaposition" full dissertation...

dan <dan_member pathlink.com> writes:
(straight from boost email forum; just pasting it below...)


...............................
 Even if 'concatenation' per se is not called for, and is against
 the Standard, could it be that the "." (dot) relieves the
 preprocessor from responsibility for adding a space at the end of
 the preceding string (since the dot already acts as a kind of
 'separator')?
No. The preprocessor does not "insert spaces" *ever*. At this point in translation, the preprocessor is operating on preprocessing tokens, not characters. There is a big difference between a lack of whitespace and concatenation. The first simply has adjacent preprocessing tokens, while the second forms a new preprocessing token. E.g.

    #define ID(x) x
    #define MACRO(a, b) ID(a)b

    MACRO(+,+)

results in two immediately adjacent '+' preprocessing tokens. There is no intervening whitespace. Whether or not whitespace exists is irrelevant for all purposes *except* stringizing and the creation of an <h-char-sequence>.

A preprocessor that does text stream -> text stream must insert whitespace in order to avoid the errant retokenization that would occur when the result gets reprocessed by some other tool (such as a C or C++ compiler). However, that is just a hack to make it work similarly in the presence of retokenization, which does not exist in the phases of translation.
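A small C sketch of the difference described above, assuming a compiler whose preprocessor hands tokens straight to the compiler (GCC and Clang both do). `ID` and `MACRO` are the macros from the post; `CAT`, `juxtaposed` and `pasted` are hypothetical names added for illustration. If `MACRO(+,+)` formed a `++` token, `juxtaposed()` would increment `i`; because the two `+` tokens are merely adjacent, the expression is just two unary pluses.

```c
/* Macros from the post, unchanged. */
#define ID(x) x
#define MACRO(a, b) ID(a)b

/* Genuine token pasting, for contrast (hypothetical helper). */
#define CAT(a, b) a ## b

/* MACRO(+,+) i gives the three tokens  +  +  i  (two unary pluses),
   so i is read but not modified. */
int juxtaposed(void)
{
    int i = 1;
    int j = MACRO(+,+) i;   /* same as: int j = + + i; */
    return i * 10 + j;      /* i == 1, j == 1 -> 11 */
}

/* CAT(+,+) i gives the two tokens  ++  i, a pre-increment. */
int pasted(void)
{
    int i = 1;
    int j = CAT(+,+) i;     /* same as: int j = ++i; */
    return i * 10 + j;      /* i == 2, j == 2 -> 22 */
}
```

The only thing that changes between the two functions is whether a new preprocessing token is formed; neither involves inserting or removing any whitespace.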
 I just find it hilarious how the boost libraries work with so 

 work with DM.  I wouldn't be surprised at all that they'd be 
 all wrong; --won't be the first time that everybody is wrong, 
 but this bug may be just about ready for acceptance by 
 ANSI/ISO/whatever...  ;-)
I wish that arbitrary token-pasting were well-defined. However, the example given doesn't even make sense (per se). The reason is that token-pasting occurs prior to rescanning, so a construction like this:

    #define
    #define B(x) x

The period (.) gets concatenated to the right parenthesis before the expansion of B(x). Even if arbitrary token-pasting were well-defined, the argument 'x' could contain any amount of whitespace and cause the construction to not work properly:

    #define EMPTY()

    A(file EMPTY()) // file .h

In other words, there are only certain points at which whitespace is removed or condensed to a single whitespace character. This is not one of them. As I said before, however, this kind of problem only occurs during stringizing and during the creation of a header-name preprocessing token of the form <h-char-sequence>.

Further, there is only one sure-fire way to guarantee that no whitespace exists, and that is to concatenate to a placemarker preprocessing token à la C99:

    #define NO_LEADING(x) NO_LEADING_I(, x)
    #define

    #define NO_TRAILING(x) NO_TRAILING_I(, x)
    #define

    #define NO_LEADING_AND_TRAILING(x) \
        NO_LEADING(NO_TRAILING(x)) \
        /**/

...but that is not currently well-defined in C++ as it is in C99.
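The bodies of `NO_LEADING_I` and `NO_TRAILING_I` did not survive in the archived post, so the definitions below are an assumption, not the original code: each one pastes the real argument to an empty argument `p`, which C99 turns into a placemarker. This sketch only checks that such pastes are well-formed and leave the argument's tokens intact; it does not try to observe whitespace, which (as noted above) only matters for stringizing and header names. `placemarker_demo` and `multi_token_demo` are hypothetical names. Compile as C99 or later.

```c
/* ASSUMED helper bodies (the originals are missing from the archive):
   p is always passed as an empty argument, so as an operand of ## it
   becomes a placemarker and disappears after the paste. */
#define NO_LEADING_I(p, x) p ## x
#define NO_TRAILING_I(p, x) x ## p

/* Wrappers as given in the post. */
#define NO_LEADING(x) NO_LEADING_I(, x)
#define NO_TRAILING(x) NO_TRAILING_I(, x)
#define NO_LEADING_AND_TRAILING(x) \
    NO_LEADING(NO_TRAILING(x)) \
    /**/

/* Each use below expands to just the identifier `value`. */
int placemarker_demo(void)
{
    int value = 7;
    return NO_LEADING_AND_TRAILING(value) + NO_LEADING(value) + NO_TRAILING(value);
}

/* With several tokens, only the last one (`2`) is pasted to the
   placemarker, so the expression is still  value * 2. */
int multi_token_demo(void)
{
    int value = 7;
    return NO_TRAILING(value * 2);
}
```

In C90 and in the C++ of the time an empty macro argument was not sanctioned, which is the portability gap the post points out.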
 --------------------------------------------------------------------------------------
 > The separator inserted by dmc is to make the preprocessor work right; it
 > isn't easily removed. I don't really understand why boost seems to want to
 > rely on the

 > was added to Standard C specifically to move away from that practice.
Juxtaposition is not concatenation, and a preprocessor that is operating at the character level rather than the preprocessing token level at this point in translation has to jump through hoops to mimic the behavior of the actual phases of translation. This is not a kludge on Boost's side; this is a preprocessor implementation kludge revolving around textual representation at a phase of translation where it doesn't exist.
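The "hoops" a text-stream preprocessor has to jump through can be sketched as the check it makes before writing two adjacent tokens out as text. `needs_space` is a hypothetical helper and deliberately simplified: it looks only at the last character of one token and the first character of the next, and its punctuator table is partial (no digraphs, no `e+` exponents in pp-numbers). A real implementation needs the full C or C++ lexical grammar.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for characters that can continue an identifier or pp-number. */
static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical, simplified avoid-paste check: returns true if writing
   `next` directly after `prev` could be re-lexed as something other
   than these two tokens. Both arguments are token spellings. */
bool needs_space(const char *prev, const char *next)
{
    /* Two-character sequences that start a longer token or a comment
       (partial list). */
    static const char *const joins[] = {
        "++", "--", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=",
        "==", "!=", "<=", ">=", "<<", ">>", "&&", "||", "->", "##",
        "::", "..", ".*", "//", "/*", NULL
    };

    if (prev[0] == '\0' || next[0] == '\0')
        return false;
    char a = prev[strlen(prev) - 1];   /* last character of prev */
    char b = next[0];                  /* first character of next */

    /* "something" "else" would become "somethingelse" */
    if (is_ident_char(a) && is_ident_char(b))
        return true;
    /* pp-numbers: "1" "." -> "1." and "." "5" -> ".5" */
    if (isdigit((unsigned char)prev[0]) && b == '.')
        return true;
    if (a == '.' && isdigit((unsigned char)b))
        return true;
    /* punctuators and comments: "+" "+" -> "++", "/" "*" -> "/*" */
    for (size_t i = 0; joins[i] != NULL; ++i)
        if (joins[i][0] == a && joins[i][1] == b)
            return true;
    return false;
}
```

For `)` followed by `.` this reports that no space is needed, while `+` followed by `+` needs one, or a later pass would read `++`. A token-based preprocessor never has to ask this question, because it never turns tokens back into text.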
 --------------------------------------------------------------------------------------
 Maybe if someone could paste the section of the Standard dealing with this,
 I'd much appreciate it.
 Yours.
 dan
There is no section of the standard that *ever* says whitespace should be inserted. There are only places where it says whitespace should be removed or adjacent whitespace should be condensed.

Regards,
Paul Mensonides
Dec 02 2003
"Walter" <walter digitalmars.com> writes:
I appreciate your doing this. I still think, however, that tokens are

Dec 02 2003
dan <dan_member pathlink.com> writes:
I appreciate your doing this. I still think, however, that tokens are

I'm having a hard time understanding his explanation. I think what he means is that concatenation is not what is intended, though that is not to say that an extra space is. I always thought that the preprocessor did pure text substitution; but he seems to violate the initial tokenization. But with a dot in between, as in 'something.else', the tokens are already well separated; adding a white space does nothing of value. Whereas with 'something else' it needs to preserve the white space, of course. And so in the case you need to violate initial tokenization to concatenate

But in the case of

    #define a(x) x

    a(something).else

turning that into something.else is not concatenation, nor juxtaposition, for that matter, because no tokens are in fact merging. So, at the text level you might call it concatenation, but at the token level it isn't.

But then I'm not sure what happens if the preprocessor encounters

    a(something)else

Then we're in real trouble... ;-)

Dunno what the answer is, Walter. I posted the whole thing in comp.lang.c++ but no replies yet...

Cheers!
dan
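Both of dan's cases can be tried with his `a(x)` macro on a compiler with a token-based preprocessor. The struct, member and function names are hypothetical; since `else` is a keyword, a member named `second` stands in for it, and `a(unsigned)int` stands in for `a(something)else`. In both cases the expansion simply leaves adjacent tokens, so nothing merges.

```c
#define a(x) x   /* dan's macro */

struct pair { int first, second; };

/* a(something).second rescans to  something . second : three tokens,
   an ordinary member access. */
int member_demo(void)
{
    struct pair something = { 1, 2 };
    return a(something).second;
}

/* a(unsigned)int rescans to the two tokens  unsigned int , not to a
   single identifier "unsignedint", so this names the ordinary type. */
int juxtaposed_type_ok(void)
{
    return sizeof(a(unsigned)int) == sizeof(unsigned int);
}
```

If `a(unsigned)int` were concatenated, the result would be the undeclared identifier `unsignedint` and the code would not compile.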
Dec 02 2003
dan <dan_member pathlink.com> writes:
To my question:

.............................
But then I'm not sure what happens if the preprocessor encounters,

a(something)else
.............................

AG replied:

-----------------------------------------------------
16.3.3 [cpp.concat] para 3 (my emphasis):

"For both object-like and function-like macro invocations, before the
replacement list is reexamined for more macro names to replace, each
instance of a ## preprocessing token in the replacement list (not from an
argument) is deleted and the preceding preprocessing token is concatenated
with the following preprocessing token. *If the result is not a valid
preprocessing token, the behavior is undefined*. [...]"

In the case in question, ")." is definitely not a valid preprocessing token
(it's two).
-----------------------------------------------------
there, since it would not result in an invalid token being created. And that if an invalid token were being created, the result is undefined, according to the standard, anyways. Just my take.

Cheers!
dan
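The paragraph AG quotes only requires that each `##` produce a valid preprocessing token. A short C sketch of two pastes that do (the `CAT` macro and the function names are hypothetical):

```c
#define CAT(x, y) x ## y

/* Valid: identifier ## pp-number gives the identifier var1. */
int paste_identifier(void)
{
    int var1 = 10;
    return CAT(var, 1);
}

/* Valid: + ## = gives the single punctuator +=. */
int paste_operator(void)
{
    int v = 1;
    v CAT(+, =) 4;   /* v += 4; */
    return v;
}
```

A definition like `#define BAD(x) (x) ## .h` would instead ask for `)` and `.` to be pasted, which forms no single token; that is the undefined case, and GCC, for one, rejects it with an error when the macro is expanded.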
Dec 03 2003