digitalmars.D.bugs - [Issue 3827] New: automatic joining of adjacent strings is bad
http://d.puremagic.com/issues/show_bug.cgi?id=3827

           Summary: automatic joining of adjacent strings is bad
           Product: D
           Version: 2.040
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc

import std.stdio;
void main() {
    string[] a = ["foo", "bar" "baz", "spam"];
    writeln(a);
}

This code prints:

foo barbaz spam

But the programmer probably meant to create an array of 4 strings.

D already has the ~ concatenation operator, so to prevent this kind of bug it's better to remove the implicit concatenation of string literals separated only by whitespace. Wherever the programmer wants to join strings, the explicit ~ operator can be used:

string s = "this is a very long string that doesn't fit in" ~
           " a line";

The "Python Zen" has a rule that says: Explicit is better than implicit.

The compiler can optimize the concatenation away at compile time.

C code ported to D that lacks the ~ just raises a compile-time error that's easy to understand and fix.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 18 2010
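The same pitfall exists in C and C++, which also join adjacent string literals during translation. As a minimal illustration (in C++ rather than D; the names `kWords` and `kWordCount` are hypothetical), the array from the report silently ends up with three elements instead of four:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Intended: four elements. A comma is missing after "bar", so the
// adjacent literals "bar" "baz" are merged into the single literal "barbaz".
const char* const kWords[] = {"foo", "bar" "baz", "spam"};

// Number of elements actually stored in kWords.
constexpr std::size_t kWordCount = sizeof(kWords) / sizeof(kWords[0]);

// The mistake compiles without any diagnostic; the array just has 3 elements.
static_assert(kWordCount == 3, "adjacent literals were merged into one element");
```

A compiler that rejected juxtaposed literals, as proposed here, would turn this into a compile-time error instead.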
Created an attachment (id=571)
patch for parse.c

Vote++ and patch
Feb 18 2010
> Created an attachment (id=571)
> patch for parse.c
>
> Vote++ and patch

Thank you. But is DMD doing the joining with ~ at compile time? If not, then you can add that optimization to your patch (if you are able to).
Feb 18 2010
> Thank you. But is DMD doing the joining with ~ at compile time? If not, then you can add that optimization to your patch (if you are able to).

And if you think it's needed, you can add the clear error message I was talking about :-)
Feb 18 2010
Alexey Ivanov <aifgi90 gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aifgi90 gmail.com

> Thank you. But is DMD doing the joining with ~ at compile time? If not, then you can add that optimization to your patch (if you are able to).

I think DMD already does the joining at compile time (constfold.c, from line 1387).
Feb 28 2010
The error message for the missing ~ can be something like this (adapted from the "'l' suffix is deprecated, use 'L' instead" error message generated by the usage of a 10l long literal):

adjacent string literals concatenation is deprecated, add ~ between them instead.
Jun 20 2010
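Such a diagnostic only needs to notice one string literal directly following another. The sketch below is a hypothetical standalone checker (`findAdjacentLiterals` is an illustrative name, not part of DMD) that scans raw source text and reports the offset of each double-quoted literal preceded by another literal with only whitespace in between. It is deliberately simplified: comments, character literals and D's other string literal forms (WYSIWYG, delimited and token strings) are not recognised; a real compiler would do this on its token stream.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical checker: returns the byte offset of every double-quoted string
// literal that directly follows another one, separated only by whitespace.
// Simplified on purpose: comments, character literals and D's other string
// literal forms are not recognised.
std::vector<std::size_t> findAdjacentLiterals(const std::string& src) {
    std::vector<std::size_t> hits;
    bool prevWasLiteral = false;  // was the previous token a string literal?
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace alone does not break a run of literals
            continue;
        }
        if (c == '"') {
            if (prevWasLiteral) hits.push_back(i);
            ++i;
            // Skip the literal body, honouring backslash escapes.
            while (i < src.size() && src[i] != '"') {
                if (src[i] == '\\') ++i;
                ++i;
            }
            ++i;  // step past the closing quote
            prevWasLiteral = true;
            continue;
        }
        prevWasLiteral = false;  // any other character (e.g. a comma) breaks the run
        ++i;
    }
    return hits;
}
```

Each reported offset is where the proposed "adjacent string literals concatenation is deprecated" message would point.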
Ellery Newcomer <ellery-newcomer utulsa.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ellery-newcomer utulsa.edu

16:29:07 PDT ---

> The "Python Zen" has a rule that says: Explicit is better than implicit.

The Python compiler has a rule that says: do the exact same thing as what D is doing. Your serve.
Jun 20 2010
I know Python, but I hope D will become better than Python on this syntax detail.
Jun 20 2010
A particularly nice example of why untidy syntax easily leads to bugs (this one comes from two different sources of sloppiness in the D2 language):

enum string[5] data = ["green", "magenta", "blue" "red", "yellow"];
static assert(data[4] == "yellow"); // asserts
void main() {}
Aug 21 2010
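The two sources of sloppiness in this example (implicit joining, plus a fixed-size array accepting fewer initialisers than it declares) have a close C++ counterpart. A hedged sketch with the hypothetical name `kColors`; in C++ the missing fifth element is value-initialised to a null pointer rather than rejected:

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Five slots are declared, but the missing comma after "blue" leaves only four
// initialisers: "blue" "red" becomes "bluered" and "yellow" lands in slot 3.
// The unused slot 4 is value-initialised to nullptr; no error is reported.
constexpr std::array<const char*, 5> kColors = {"green", "magenta", "blue" "red", "yellow"};

static_assert(kColors[4] == nullptr, "slot 4 was silently left empty");
```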
Another bug caused in my code by that anti-feature:

unittest {
    auto tests = [["", "0000"], ["12346", "0000"], ["he", "H000"],
                  ["soundex", "S532"], ["example", "E251"],
                  ["ciondecks", "C532"], ["ekzampul", "E251"],
                  ["resume", "R250"], ["Robert", "R163"],
                  ["Rupert", "R163"], ["Rubin" "R150"],
                  ["Ashcraft", "A226"], ["Ashcroft", "A226"]];
    foreach (pair; tests)
        assert(processit(pair[0]) == pair[1]);
}

That code compiles with no errors with DMD 2.050, and then causes a Range violation at runtime because one of those arrays isn't a pair.
Nov 10 2010
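A C++ sketch of the same failure, with a hypothetical guard (`firstMalformedRow`, an illustrative name) that checks the table's shape before any second element is read, so the bad row is reported by index instead of surfacing as an out-of-range access. Only three rows of the original table are reproduced:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Excerpt of the test table; the "Rubin" row is missing its comma, so it
// holds the single string "RubinR150" instead of a pair.
const Table kSoundexTests = {
    {"Robert", "R163"},
    {"Rubin" "R150"},
    {"Ashcraft", "A226"},
};

// Hypothetical guard: index of the first row that is not a pair, or -1.
// Running it before reading row[1] reports the bad row instead of
// failing later with an out-of-range access.
int firstMalformedRow(const Table& rows) {
    for (std::size_t i = 0; i < rows.size(); ++i)
        if (rows[i].size() != 2) return static_cast<int>(i);
    return -1;
}
```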
that it doesn't perform automatic joining of adjacent strings:

public class Test {
    public static void Main() {
        string s = "hello " "world";
    }
}

prog.cs(3,35): error CS1525: Unexpected symbol `world'
Compilation failed: 1 error(s), 0 warnings
Nov 10 2010
Walter agrees:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=121830
Nov 12 2010
A comment from Andrei Alexandrescu:

> Walter, please don't forget to tweak the associativity rules:
>
> var ~ " literal " ~ " literal "
>
> concatenates literals first.
Nov 12 2010
A comment from Stewart Gordon:

> You mean make ~ right-associative? I think this'll break more code than it fixes.
>
> But implementing a compiler optimisation so that
>
> var ~ ctc ~ ctc
>
> is processed as
>
> var ~ (ctc ~ ctc),
>
> _in those cases where they're equivalent_, would be sensible.
>
> ctc = compile-time constant
Nov 13 2010
19:26:18 PST ---

you don't need to mess with associativity rules, you just need to be able to handle two or three ast cases:

1. (~ str str)           ie str ~ str
2. (~ (~ x str) str)     ie x ~ str ~ str
3. (~ str (~ str x))     ie str ~ (str ~ x)
Nov 13 2010
Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au

> you don't need to mess with associativity rules, you just need to be able to handle two or three ast cases:
>
> 1. (~ str str)           ie str ~ str
> 2. (~ (~ x str) str)     ie x ~ str ~ str
> 3. (~ str (~ str x))     ie str ~ (str ~ x)

Like this (optimize.c, line 1023):

Expression *CatExp::optimize(int result)
{   Expression *e;

    //printf("CatExp::optimize(%d) %s\n", result, toChars());
    e1 = e1->optimize(result);
    e2 = e2->optimize(result);
+   if (e1->op == TOKcat && (e2->op == TOKstring || e2->op == TOKnull)
+       && (((CatExp *)e1)->e2->op == TOKstring || ((CatExp *)e1)->e2->op == TOKnull))
+   {
+       // Convert (e ~ str) ~ str into e ~ (str ~ str)
+       CatExp *ce = ((CatExp *)e1);
+       e1 = ce->e1;
+       ce->e1 = ce->e2;
+       ce->e2 = e2;
+       e2 = ce;
+   }
    e = Cat(type, e1, e2);
    if (e == EXP_CANT_INTERPRET)
        e = this;
    return e;
}
Nov 13 2010
Sorry, missed out a line:

    if (e1->op == TOKcat && (e2->op == TOKstring || e2->op == TOKnull)
        && (((CatExp *)e1)->e2->op == TOKstring || ((CatExp *)e1)->e2->op == TOKnull))
    {
        // Convert (e ~ str) ~ str into e ~ (str ~ str)
        CatExp *ce = ((CatExp *)e1);
        e1 = ce->e1;
        ce->e1 = ce->e2;
        ce->e2 = e2;
        e2 = ce;
+       e2 = e2->optimize(result);
    }
Nov 13 2010
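The reassociation discussed above can be modelled on a toy expression tree. This C++ sketch is hypothetical: `Expr`, `lit`, `var`, `cat`, `optimize` and `show` are illustrative names, not DMD's classes, and every operand is assumed to be a plain string so that `~` is associative (the string[] caveat raised later in the thread does not arise in this model). It folds the three AST shapes listed earlier. Because operands are optimized bottom-up, a chain such as x ~ "a" ~ "b" ~ "c" collapses to x ~ "abc".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy AST: a string literal, a runtime variable, or a concatenation.
// Hypothetical model for illustration; not DMD's Expression hierarchy.
struct Expr {
    enum Kind { Literal, Var, Cat } kind;
    std::string text;            // literal value or variable name
    std::shared_ptr<Expr> lhs;   // left operand when kind == Cat
    std::shared_ptr<Expr> rhs;   // right operand when kind == Cat
};
using ExprP = std::shared_ptr<Expr>;

ExprP lit(std::string s) { return std::make_shared<Expr>(Expr{Expr::Literal, std::move(s), nullptr, nullptr}); }
ExprP var(std::string n) { return std::make_shared<Expr>(Expr{Expr::Var, std::move(n), nullptr, nullptr}); }
ExprP cat(ExprP a, ExprP b) { return std::make_shared<Expr>(Expr{Expr::Cat, "", std::move(a), std::move(b)}); }

// Bottom-up constant folding of ~, valid only because ~ is assumed associative.
ExprP optimize(const ExprP& e) {
    if (e->kind != Expr::Cat) return e;
    ExprP l = optimize(e->lhs);
    ExprP r = optimize(e->rhs);
    // Case 1: str ~ str
    if (l->kind == Expr::Literal && r->kind == Expr::Literal)
        return lit(l->text + r->text);
    // Case 2: (x ~ str) ~ str  ->  x ~ (str ~ str)
    if (l->kind == Expr::Cat && l->rhs->kind == Expr::Literal && r->kind == Expr::Literal)
        return cat(l->lhs, lit(l->rhs->text + r->text));
    // Case 3: str ~ (str ~ x)  ->  (str ~ str) ~ x
    if (l->kind == Expr::Literal && r->kind == Expr::Cat && r->lhs->kind == Expr::Literal)
        return cat(lit(l->text + r->lhs->text), r->rhs);
    return cat(l, r);
}

// Prints an expression in D-like syntax, fully parenthesised.
std::string show(const ExprP& e) {
    switch (e->kind) {
        case Expr::Literal: return "\"" + e->text + "\"";
        case Expr::Var:     return e->text;
        case Expr::Cat:     return "(" + show(e->lhs) + " ~ " + show(e->rhs) + ")";
    }
    return "";
}
```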
Stewart Gordon <smjg iname.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com

> The error message for the missing ~ can be something like this (adapted from the "'l' suffix is deprecated, use 'L' instead" error message generated by the usage of a 10l long literal):
> adjacent string literals concatenation is deprecated, add ~ between them instead.

Better watch out for cases where just adding ~ changes the behaviour. For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that" evaluate to different strings.

Not that there's any real use case for "this" "that" anyway. And those rare use cases it does have in D can be fixed by inserting the ~, though there may be easier-to-miss cases of the above of which to be wary.
Nov 16 2010
> For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that" evaluate to different strings.

Different string arrays even.
Nov 16 2010
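Stewart's point can be modelled in C++ with a hypothetical helper `append` standing in for D's `string[] ~ string` (it appends one element). The juxtaposed literals are joined before `append` sees them, whereas the chained form appends twice; explicitly grouping the string concatenation gives the same array as the juxtaposed form:

```cpp
#include <cassert>
#include <string>
#include <vector>

using StrArray = std::vector<std::string>;

// Hypothetical model of D's `string[] ~ string`: returns a copy of the
// array with one extra element appended.
StrArray append(StrArray arr, const std::string& elem) {
    arr.push_back(elem);
    return arr;
}

const StrArray a = {"x"};

// a ~ "this" "that": the literals are merged first -> one new element.
const StrArray joined = append(a, "this" "that");

// a ~ "this" ~ "that": evaluated left to right -> two new elements.
const StrArray chained = append(append(a, "this"), "that");

// a ~ ("this" ~ "that"): explicit grouping -> same as the joined form.
const StrArray grouped = append(a, std::string("this") + "that");
```

This is why a naive "just insert ~" fix-it could silently change a program's meaning.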
nfxjfg gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg gmail.com

> Not that there's any real use case for "this" "that" anyway. And those rare use cases

I use automatic joining all the time for long string literals. I want them to span multiple source lines without containing line breaks. No, not a rarely used feature.
Nov 16 2010
> > Not that there's any real use case for "this" "that" anyway. And those rare use cases
>
> I use automatic joining all the time for long string literals. I want them to span multiple source lines without containing line breaks. No, not a rarely used feature.

Stewart Gordon was just talking about code like:

a ~ "this" "that"

where a is a string[].

To join multiple lines you may add a ~ at their end:

string text = "I use automatic joining all the time for long string literals. I want them to " ~
              "span multiple source lines without containing line breaks. " ~
              "No, not a rarely used feature.";
Nov 16 2010
Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com

21:33:05 PST ---

> > The error message for the missing ~ can be something like this (adapted from the "'l' suffix is deprecated, use 'L' instead" error message generated by the usage of a 10l long literal):
> > adjacent string literals concatenation is deprecated, add ~ between them instead.
>
> Better watch out for cases where just adding ~ changes the behaviour. For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that" evaluate to different strings.

doesn't this solve that problem?

a ~ ("this" ~ "that")

BTW, I don't expect very many cases like this (in fact, I bet there are none).
Nov 16 2010
> doesn't this solve that problem?
>
> a ~ ("this" ~ "that")

It does. My point was that somebody might accidentally not add the brackets.
Nov 17 2010
12:04:03 PST ---

If constfold can access a's type, it can make the right decision.
Nov 17 2010
A recent note by Walter:

> Andrei's right. This is not about making it right-associative. It is about defining in the language that:
>
> ((a ~ b) ~ c)
>
> is guaranteed to produce the same result as:
>
> (a ~ (b ~ c))
>
> Unfortunately, the language cannot make such a guarantee in the face of operator overloading. But it can do it for cases where operator overloading is not in play.
Nov 22 2010
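Walter's distinction can be illustrated in C++, with `+` standing in for D's `~`. In this hypothetical sketch (`leftFirst`, `rightFirst` and `Traced` are made-up names), `std::string` concatenation gives the same result under either grouping, while a user-defined `operator+` is free not to: `Traced` records how it was grouped, so regrouping it would be observable. That is why such a guarantee can only cover cases where no overload is involved.

```cpp
#include <cassert>
#include <string>

// Built-in-like concatenation: std::string's operator+ is associative,
// so both groupings always produce the same value.
std::string leftFirst(const std::string& a, const std::string& b, const std::string& c) {
    return (a + b) + c;
}
std::string rightFirst(const std::string& a, const std::string& b, const std::string& c) {
    return a + (b + c);
}

// Hypothetical user type whose overloaded + records the grouping.
// Nothing forces an overload to be associative, so a compiler must not
// silently regroup (x + y) + z into x + (y + z) for such types.
struct Traced {
    std::string repr;
};
Traced operator+(const Traced& l, const Traced& r) {
    return Traced{"(" + l.repr + "+" + r.repr + ")"};
}
```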
See also:
http://stackoverflow.com/questions/2504536/why-allow-concatenation-of-string-literals
Mar 20 2011
An example of the problems this avoids:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=22649

Andrej Mitrovic:

> I see you are not the only one who started writing string array literals like this:
>
> enum PEGCode = grammarCode!(
>      "Grammar <- S Definition+ EOI"
>     ,"Definition <- RuleName Arrow Expression"
>     ,"RuleName <- Identifier>(ParamList?)"
>     ,"Expression <- Sequence (OR Sequence)*"
> );
>
> IOW comma on the left side. I know it's not a style preference but actually a (unfortunate but needed) technique for avoiding bugs. :)
Mar 10 2012
Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com

17:56:16 PST ---

> enum PEGCode = grammarCode!(
>      "Grammar <- S Definition+ EOI"
>     ,"Definition <- RuleName Arrow Expression"
>     ,"RuleName <- Identifier>(ParamList?)"
>     ,"Expression <- Sequence (OR Sequence)*"
> );

Note that this is Philippe Sigaud's code. So you can add him, and me, to the list of people affected by this.

I'm doing string processing in D on a day-to-day basis, and whenever I have a list of strings I eventually end up shooting myself in the foot because of a missing comma. It's very easy (at least for clumsy me) to make the mistake. E.g. writing some headers to ignore:

string[] ignoredHeaders = [
    "foo.bar"  // todo: have to fix this later
    "foo.do",  // todo: later
];

When I have comments next to the strings, it's easy to overlook the missing comma, especially if the strings are of different lengths.
Mar 10 2012
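While the feature exists, one defensive habit in C and C++ (alongside the leading-comma style quoted above) is to pin the expected element count, so that a merged entry breaks the build. A hedged C++17 sketch with the hypothetical names `kBuggy` and `kFixed`; the list contents mirror Andrej's example:

```cpp
#include <cassert>
#include <cstring>
#include <iterator>

// As in the report: the comma after "foo.bar" is missing, and the comments
// make that easy to overlook. The two entries merge into one.
const char* const kBuggy[] = {
    "foo.bar"  // todo: have to fix this later
    "foo.do",  // todo: later
};

// Corrected list with its intended size pinned at compile time (C++17
// std::size). Dropping the comma between the entries again would make
// this static_assert fail instead of compiling silently.
constexpr const char* kFixed[] = {
    "foo.bar",  // todo: have to fix this later
    "foo.do",   // todo: later
};
static_assert(std::size(kFixed) == 2, "entry lost: check for a missing comma");
```

The check has to be kept in sync by hand, which is exactly the kind of ceremony the proposed language change would make unnecessary.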