digitalmars.D.announce - dmd 1.060 and 2.045 release

Walter Bright (5/5) May 04 2010 This is to fix the unittest and dwarf screwups in the last release.

Robert Clipsham (7/12) May 04 2010 Thanks a lot for prioritizing this and getting it fixed :) Debugging D

Walter Bright (2/6) May 04 2010 I agree that getting all the gdb issues sorted out will be a nice win.

Robert Clipsham (13/14) May 04 2010 I fixed all the killer ones (for ELF systems at least), you can now set

Walter Bright (5/10) May 04 2010 Basically, I don't think the switches need to change, just:

Robert Clipsham (14/18) May 04 2010 You'd be ok with, for example:

Walter Bright (4/18) May 04 2010 That's the problem with D extensions; unless they get officially adopted...

Robert Clipsham (8/11) May 04 2010 Too late for this, DWARF 4 has introduced conflicts with them already.

Brad Roberts (11/23) May 04 2010 I've lost track of the details of the extensions that are in use. I
Walter Bright (2/14) May 04 2010 Yes, do it!

Robert Clipsham (4/5) May 04 2010 I have submitted a proposal, I'm currently awaiting confirmation that it...
Robert Clipsham (3/4) May 05 2010 http://dwarfstd.org/ShowIssue.php?issue=100504.1

Robert Clipsham (13/17) May 05 2010 "This has been assigned issue #100504.1.

Walter Bright (2/25) May 05 2010

Leandro Lucarella (10/17) May 05 2010 Specially now that GDB will support D natively!

Adam D. Ruppe (2/3) May 04 2010 Yay, it seems to have fixed the weird endless loop I got in the last one...
Arth Lloyd Flores (4/9) May 04 2010 --
Alex Makhotin (26/34) May 05 2010 Hi,

Walter Bright (2/5) May 05 2010 Definitely there's a problem.

Walter Bright (3/9) May 05 2010 The problem is the spell checker is O(n*n) on the number of characters i...

Alex Makhotin (6/16) May 06 2010 Is there a way to disable it?

Walter Bright (2/16) May 05 2010 Currently, no.

Steven Schveighoffer (6/14) May 06 2010 That can't be it. The identifier shown by Alex is only 33 characters. ...

Leandro Lucarella (9/25) May 06 2010 Run a profiler.
Walter Bright (2/5) May 06 2010 I recompiled dmd with the profiler (-gt switch) which confirmed it.

Steven Schveighoffer (11/16) May 06 2010 So a single unknown symbol (from Alex's example) which can be checked

Walter Bright (31/49) May 06 2010 Check out the profiler output. It's clearly the vast number of calls to ...

Walter Bright (2/3) May 06 2010 For those interested, try out changeset 470.

Walter Bright (3/7) May 06 2010 On my timing tests, the time spent is linear with the number of characte...

BCS (6/19) May 06 2010 How about switch algos for long identifiers: you could bucket the knows ...
Michel Fortin (10/20) May 06 2010 That's an algorithm that can't scale then.

Lionello Lunesu (6/28) May 06 2010 and especially this line:

Brad Roberts (5/34) May 06 2010 The source for this is pretty isolated.. anyone want to volunteer take a...

Lionello Lunesu (2/39) May 06 2010 I see, speller.c, I'll have a look..

Lionello Lunesu (9/48) May 08 2010 I'm in the middle of moving from one city to another so don't wait for

Lionello Lunesu (3/3) May 08 2010 That code is in the public domain, by the way.
Steven Schveighoffer (43/50) May 10 2010 Several others have privately brought up this problem to Walter. He doe...

Ary Borenszweig (7/79) May 10 2010 I can't imagine why something as simple as iterating a symbol lookup

Steven Schveighoffer (5/15) May 10 2010 On that point I'm just the messenger, I'm not sure why Walter has that

BCS (5/9) May 10 2010 Is it fundamentally impossible to iterate or is the code just not there ...

Steven Schveighoffer (8/15) May 10 2010 I can only speculate, but I would guess the latter. What I can think of...

Walter Bright (11/13) May 10 2010 It's a good summary of the situation. I've felt that as long as the mess...

Walter Bright <newshound1 digitalmars.com> writes:

This is to fix the unittest and dwarf screwups in the last release.

http://www.digitalmars.com/d/1.0/changelog.html
http://ftp.digitalmars.com/dmd.1.060.zip


http://www.digitalmars.com/d/2.0/changelog.html
http://ftp.digitalmars.com/dmd.2.045.zip

May 04 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 04/05/10 18:30, Walter Bright wrote:
 This is to fix the unittest and dwarf screwups in the last release.

 http://www.digitalmars.com/d/1.0/changelog.html
 http://ftp.digitalmars.com/dmd.1.060.zip


 http://www.digitalmars.com/d/2.0/changelog.html
 http://ftp.digitalmars.com/dmd.2.045.zip

Thanks a lot for prioritizing this and getting it fixed :) Debugging D 
on linux is now bearable, and even a pleasant experience at times =)

Still a long way to go though, various (much!) smaller issues that need 
fixing... If no one else gets to them I'll go on a debug fixing spree at 

(debugging tracker) closed :)

May 04 2010

Walter Bright <newshound1 digitalmars.com> writes:

Robert Clipsham wrote:
 Still a long way to go though, various (much!) smaller issues that need 
 fixing... If no one else gets to them I'll go on a debug fixing spree at 

 (debugging tracker) closed :)

I agree that getting all the gdb issues sorted out will be a nice win.

May 04 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 04/05/10 18:52, Walter Bright wrote:
 I agree that getting all the gdb issues sorted out will be a nice win.

I fixed all the killer ones (for ELF systems at least), you can now set 
breakpoints and get backtraces etc, the remaining issues are to do with 
line numbers being a bit off under certain circumstances, code listing 
not working in gdb (I've never used this feature, hehe) and a few 
enhancements.

This said, debugging on OS X is currently impossible, I should be able 
to take a look at that some time soon.

While you're reading, is there any chance you could take a look at my 
post on the dmd-internals ML and give me some feedback? I noticed you 
agreed with Brad in a bug report, but to what extent? When I get around 
to fixing all those bugs I'd like to know where you stand so I can work 
on some enhancements to debug info as well as just bug fixes :)

May 04 2010

Walter Bright <newshound1 digitalmars.com> writes:

Robert Clipsham wrote:
 While you're reading, is there any chance you could take a look at my 
 post on the dmd-internals ML and give me some feedback? I noticed you 
 agreed with Brad in a bug report, but to what extent? When I get around 
 to fixing all those bugs I'd like to know where you stand so I can work 
 on some enhancements to debug info as well as just bug fixes :)

Basically, I don't think the switches need to change, just:

-g: debugger can handle D data types and extensions

-gc: make it work with whatever the debugger can handle, i.e. for debuggers
that 
don't know about D

May 04 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 04/05/10 19:03, Walter Bright wrote:
 Basically, I don't think the switches need to change, just:

 -g: debugger can handle D data types and extensions

 -gc: make it work with whatever the debugger can handle, i.e. for
 debuggers that don't know about D

You'd be ok with, for example:
   -g             add symbolic debug info
   -gc            add symbolic debug info, pretend to be C++

Instead of C then? Or some other language that debuggers support? I say 
this as C++ supports more of D's features, so we'd be able to give 
better debugging info for debuggers without explicit D support.

There was another point in that post, about the D extensions to DWARF... 
I think it is unlikely that patches to support D's extensions to DWARF 
would be accepted into gdb, particularly as the values for the DW_TAG's 
conflicts with things in the DWARF4 spec. I think there should be a way 
to act like D but without these extensions. Ideally the solution to this 
is to try and get the extensions officially into the DWARF spec, which 
I'd be willing to push for if possible.

May 04 2010

Walter Bright <newshound1 digitalmars.com> writes:

Robert Clipsham wrote:
 You'd be ok with, for example:
   -g             add symbolic debug info
   -gc            add symbolic debug info, pretend to be C++
 
 Instead of C then? Or some other language that debuggers support? I say 
 this as C++ supports more of D's features, so we'd be able to give 
 better debugging info for debuggers without explicit D support.

Yes.

 There was another point in that post, about the D extensions to DWARF... 
 I think it is unlikely that patches to support D's extensions to DWARF 
 would be accepted into gdb, particularly as the values for the DW_TAG's 
 conflicts with things in the DWARF4 spec. I think there should be a way 
 to act like D but without these extensions. Ideally the solution to this 
 is to try and get the extensions officially into the DWARF spec, which 
 I'd be willing to push for if possible.

That's the problem with D extensions; unless they get officially adopted they 
conflict with future changes to the spec. We need to get them officially
adopted.

May 04 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 04/05/10 19:50, Walter Bright wrote:
 That's the problem with D extensions; unless they get officially adopted
 they conflict with future changes to the spec. We need to get them
 officially adopted.

Too late for this, DWARF 4 has introduced conflicts with them already. 
We could try and get them pushed for dwarf 4, but if they get in the 
values of the tags will have to change (not really a problem, as only 
one debugger supports them I think, and to my knowledge it isn't widely 
used for D yet... So we'd be able to talk to the developers and work 
around this). I'd be willing to take action to get them pushed for DWARF 
4 or (if it's too late for that) DWARF 5, if that's OK with you?

May 04 2010

Brad Roberts <braddr slice-2.puremagic.com> writes:

On Tue, 4 May 2010, Robert Clipsham wrote:

 On 04/05/10 19:50, Walter Bright wrote:
 That's the problem with D extensions; unless they get officially adopted
 they conflict with future changes to the spec. We need to get them
 officially adopted.

 
 Too late for this, DWARF 4 has introduced conflicts with them already. We
 could try and get them pushed for dwarf 4, but if they get in the values of
 the tags will have to change (not really a problem, as only one debugger
 supports them I think, and to my knowledge it isn't widely used for D yet...
 So we'd be able to talk to the developers and work around this). I'd be
 willing to take action to get them pushed for DWARF 4 or (if it's too late for
 that) DWARF 5, if that's OK with you?

I've lost track of the details of the extensions that are in use.  I 
assume they're to cover the array and dictionary layouts.  I wouldn't be 
suprised if there wasn't coverage for them in the current dwarf formats, 
or close enough to make them work.

Another potential path, that would obviously be gdb specific, is to 
include some scripts that help display the more complex types.  I saw 
something about gdb/python integration.  A little bit of scripting can go 
a long long way.

Later,
Brad

May 04 2010

Walter Bright <newshound1 digitalmars.com> writes:

Robert Clipsham wrote:
 On 04/05/10 19:50, Walter Bright wrote:
 That's the problem with D extensions; unless they get officially adopted
 they conflict with future changes to the spec. We need to get them
 officially adopted.

 
 Too late for this, DWARF 4 has introduced conflicts with them already. 
 We could try and get them pushed for dwarf 4, but if they get in the 
 values of the tags will have to change (not really a problem, as only 
 one debugger supports them I think, and to my knowledge it isn't widely 
 used for D yet... So we'd be able to talk to the developers and work 
 around this). I'd be willing to take action to get them pushed for DWARF 
 4 or (if it's too late for that) DWARF 5, if that's OK with you?

Yes, do it!

May 04 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 04/05/10 20:43, Walter Bright wrote:
 Yes, do it!

I have submitted a proposal, I'm currently awaiting confirmation that it 
has been received. I'll let you know how/if it progresses and paste 
links if I can so you can track it yourself :)

May 04 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 04/05/10 20:43, Walter Bright wrote:
 Yes, do it!

http://dwarfstd.org/ShowIssue.php?issue=100504.1

Please feel free to comment on it/make corrections :)

May 05 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 05/05/10 20:36, Robert Clipsham wrote:
 On 04/05/10 20:43, Walter Bright wrote:
 Yes, do it!

 http://dwarfstd.org/ShowIssue.php?issue=100504.1

 Please feel free to comment on it/make corrections :)



We are not accepting extension proposals for DWARF Version 4,
but will consider this for the next version."

Looks like we'll have to wait a while for its inclusion. In the mean 
time could I suggest we move the DW_TAG's of dmd's extensions to the 
area of the spec specified for language specific extensions? I'm not 
sure if this was specified in DWARF 2 which dmd uses, or what the range 
is (I don't have the spec to hand), but it would be good not to conflict 
with DWARF 4.

Another good idea could be to decide how else DWARF could be extended to 
help with debug info so we have more chance of getting some useful 
things for debugging into DWARF 5.

May 05 2010

Walter Bright <newshound1 digitalmars.com> writes:

Robert Clipsham wrote:
 On 05/05/10 20:36, Robert Clipsham wrote:
 On 04/05/10 20:43, Walter Bright wrote:
 Yes, do it!

 http://dwarfstd.org/ShowIssue.php?issue=100504.1

 Please feel free to comment on it/make corrections :)

 

 
 We are not accepting extension proposals for DWARF Version 4,
 but will consider this for the next version."
 
 Looks like we'll have to wait a while for its inclusion. In the mean 
 time could I suggest we move the DW_TAG's of dmd's extensions to the 
 area of the spec specified for language specific extensions? I'm not 
 sure if this was specified in DWARF 2 which dmd uses, or what the range 
 is (I don't have the spec to hand), but it would be good not to conflict 
 with DWARF 4.

Good idea.

 
 Another good idea could be to decide how else DWARF could be extended to 
 help with debug info so we have more chance of getting some useful 
 things for debugging into DWARF 5.

May 05 2010

Leandro Lucarella <llucax gmail.com> writes:

Walter Bright, el  4 de mayo a las 10:52 me escribiste:
 Robert Clipsham wrote:
Still a long way to go though, various (much!) smaller issues that
need fixing... If no one else gets to them I'll go on a debug
fixing spree at some point in a couple of months and see if we


 
 I agree that getting all the gdb issues sorted out will be a nice win.

Specially now that GDB will support D natively!

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Los sueños de los niños son especialmente importantes en su etapa de
formación; si un niño no sueña es que será un pelotudo toda la vida.
	-- Ricardo Vaporeso

May 05 2010

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Tue, May 04, 2010 at 10:30:36AM -0700, Walter Bright wrote:
 This is to fix the unittest and dwarf screwups in the last release.

Yay, it seems to have fixed the weird endless loop I got in the last one.

May 04 2010

Arth Lloyd Flores <floresarthlloyd gmail.com> writes:

Yay that was fast :)

On Wed, May 5, 2010 at 1:30 AM, Walter Bright <newshound1 digitalmars.com>wrote:

 This is to fix the unittest and dwarf screwups in the last release.

 http://www.digitalmars.com/d/1.0/changelog.html
 http://ftp.digitalmars.com/dmd.1.060.zip


 http://www.digitalmars.com/d/2.0/changelog.html
 http://ftp.digitalmars.com/dmd.2.045.zip



-- 
-Arth

May 04 2010

Alex Makhotin <alex bitprox.com> writes:

Walter Bright wrote:
 This is to fix the unittest and dwarf screwups in the last release.
 
 http://www.digitalmars.com/d/1.0/changelog.html
 http://ftp.digitalmars.com/dmd.1.060.zip
 
 
 http://www.digitalmars.com/d/2.0/changelog.html
 http://ftp.digitalmars.com/dmd.2.045.zip

Hi,

The fixes are good but when I compile, for example, the following 
simplified code:

import std.stdio;
import core.sync.condition;
import core.sync.mutex;
import std.contracts;
import std.regex;
import std.date;

void foo(int x, int y)
{
	currentPipedProcessWhateverYouLike = new PipedProcessWhateverYouLike();
}

It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 
kernel 2.6.32.4), to get the actual error messages about the undefined 
identifier.

As I said this example is simplified and I need to wait more in the real
project, while compiling the BDE itself, eventually I kill the never 
ending dmd process and go back to the v2.043 which has no such problem.

I guess the problem is a spell checker. It would be much better to make 
this feature optional(if this is the problem), if this cannot be done in 
other way.


Alex Makhotin,
the founder of BITPROX,
http://bitprox.com

May 05 2010

Walter Bright <newshound1 digitalmars.com> writes:

Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 
 kernel 2.6.32.4), to get the actual error messages about the undefined 
 identifier.

Definitely there's a problem.

May 05 2010

Walter Bright <newshound1 digitalmars.com> writes:

Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 
 kernel 2.6.32.4), to get the actual error messages about the undefined 
 identifier.

 
 Definitely there's a problem.

The problem is the spell checker is O(n*n) on the number of characters in the 
undefined identifier.

May 05 2010

Alex Makhotin <alex bitprox.com> writes:

Walter Bright wrote:
 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 
 kernel 2.6.32.4), to get the actual error messages about the 
 undefined identifier.

 Definitely there's a problem.

 
 The problem is the spell checker is O(n*n) on the number of characters 
 in the undefined identifier.

Is there a way to disable it?

-- 
Alex Makhotin,
the founder of BITPROX,
http://bitprox.com

May 06 2010

Walter Bright <newshound1 digitalmars.com> writes:

Alex Makhotin wrote:
 Walter Bright wrote:
 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 
 kernel 2.6.32.4), to get the actual error messages about the 
 undefined identifier.

 Definitely there's a problem.

 The problem is the spell checker is O(n*n) on the number of characters 
 in the undefined identifier.

 
 Is there a way to disable it?
 

Currently, no.

May 05 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 05 May 2010 23:45:50 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3  
 kernel 2.6.32.4), to get the actual error messages about the undefined  
 identifier.

  Definitely there's a problem.

 The problem is the spell checker is O(n*n) on the number of characters  
 in the undefined identifier.

That can't be it.  The identifier shown by Alex is only 33 characters.   
O(n^2) is not that slow, especially for smaller variables.  There must be  
other factors you're not considering...

-Steve

May 06 2010

Leandro Lucarella <llucax gmail.com> writes:

Steven Schveighoffer, el  6 de mayo a las 07:17 me escribiste:
 On Wed, 05 May 2010 23:45:50 -0400, Walter Bright
 <newshound1 digitalmars.com> wrote:
 
Walter Bright wrote:
Alex Makhotin wrote:
It takes ~40 seconds 50% load on the dual core
processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error
messages about the undefined identifier.

 Definitely there's a problem.

The problem is the spell checker is O(n*n) on the number of
characters in the undefined identifier.

 
 That can't be it.  The identifier shown by Alex is only 33
 characters.  O(n^2) is not that slow, especially for smaller
 variables.  There must be other factors you're not considering...

Run a profiler.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
No existiría el sonido del mar si faltara en la vida oreja y caracol.
	-- Ricardo Vaporeso. Cosquín, 1908.

May 06 2010

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 That can't be it.  The identifier shown by Alex is only 33 characters.  
 O(n^2) is not that slow, especially for smaller variables.  There must 
 be other factors you're not considering...

I recompiled dmd with the profiler (-gt switch) which confirmed it.

May 06 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 06 May 2010 17:07:12 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Steven Schveighoffer wrote:
 That can't be it.  The identifier shown by Alex is only 33 characters.   
 O(n^2) is not that slow, especially for smaller variables.  There must  
 be other factors you're not considering...

 I recompiled dmd with the profiler (-gt switch) which confirmed it.

So a single unknown symbol (from Alex's example) which can be checked  
against each existing symbol in O(n^2) time, takes 40 seconds on a decent  
CPU?  How many other symbols are there?  33^2 == 1089, if there are 10000  
symbols, that's 10 million iterations, that shouldn't take 40 seconds to  
run, should it?  Are there more symbols to compare against?  Do you use  
heuristics to prune the search?  For example, if the max distance is 2,  
and the difference in length between two strings is >2, you should be able  
to return immediately.

-Steve

May 06 2010

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 On Thu, 06 May 2010 17:07:12 -0400, Walter Bright 
 <newshound1 digitalmars.com> wrote:
 
 Steven Schveighoffer wrote:
 That can't be it.  The identifier shown by Alex is only 33 
 characters.  O(n^2) is not that slow, especially for smaller 
 variables.  There must be other factors you're not considering...

 I recompiled dmd with the profiler (-gt switch) which confirmed it.

 
 So a single unknown symbol (from Alex's example) which can be checked 
 against each existing symbol in O(n^2) time, takes 40 seconds on a 
 decent CPU?  How many other symbols are there?  33^2 == 1089, if there 
 are 10000 symbols, that's 10 million iterations, that shouldn't take 40 
 seconds to run, should it?  Are there more symbols to compare against?  
 Do you use heuristics to prune the search?  For example, if the max 
 distance is 2, and the difference in length between two strings is >2, 
 you should be able to return immediately.

Check out the profiler output. It's clearly the vast number of calls to the 
symbol lookup, not the time spent in each call.

-----------------------------------------
   Num          Tree        Func        Per
   Calls        Time        Time        Call

50409318   632285778   145858160           2     Dsymbol *syscall 
ScopeDsymbol::search(Loc ,Identifier *,int )
50411264   131394915   106356855           2     void **syscall 
StringTable::search(char const *,unsigned )
50409329   341960075   105532978           2     Dsymbol *syscall 
DsymbolTable::lookup(Identifier *)
50409329   236427096   105037393           2     StringValue *syscall 
StringTable::lookup(char const *,unsigned )
12602340   613890619    67393753           5     Dsymbol *syscall 
Scope::search(Loc ,Identifier *,Dsymbol **)
12602178   693915197    66918360           5     void *cdecl 
scope_search_fp(void *,char const *)
25204505   461352920    52529164           2     Dsymbol *syscall 
Module::search(Loc ,Identifier *,int )
50412137    25038474    25038474           0     unsigned cdecl 
Dchar::calcHash(char const *,unsigned )
    3520  1428323068    20349375        5781     void *cdecl spellerX(char
const 
*,void *cdecl (*)(void *,char const *),void *,char const *,int )
12602664     6811916     6811916           0     syscall 
Identifier::Identifier(char const *,int )
12602178     6299089     6299089           0     void cdecl Module::clearCache()
12602183     6151175     6151175           0     Module *syscall
Module::isModule()
    1600       11329        4261           2     StringValue *syscall 
StringTable::update(char const *,unsigned )

-----------------------------------------

May 06 2010

Walter Bright <newshound1 digitalmars.com> writes:

Walter Bright wrote:
 I recompiled dmd with the profiler (-gt switch) which confirmed it.

For those interested, try out changeset 470.

May 06 2010

Walter Bright <newshound1 digitalmars.com> writes:

Walter Bright wrote:
 Walter Bright wrote:
 I recompiled dmd with the profiler (-gt switch) which confirmed it.

 
 For those interested, try out changeset 470.

On my timing tests, the time spent is linear with the number of characters in 
the identifier. It's still too slow, though.

May 06 2010

BCS <none anon.com> writes:

Hello Walter,

 Walter Bright wrote:
 
 Alex Makhotin wrote:
 
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
 kernel 2.6.32.4), to get the actual error messages about the
 undefined identifier.
 

 Definitely there's a problem.
 

 The problem is the spell checker is O(n*n) on the number of characters
 in the undefined identifier.
 

How about switch algos for long identifiers: you could bucket the knows by 
length and compare histograms on things of similar length. Or maybe just 
turn it off for long names. 

-- 
... <IXOYE><

May 06 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1 digitalmars.com> said:

 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 
 kernel 2.6.32.4), to get the actual error messages about the undefined 
 identifier.

 
 Definitely there's a problem.

 
 The problem is the spell checker is O(n*n) on the number of characters 
 in the undefined identifier.

That's an algorithm that can't scale then.

Checking the Levenshtein distance for each known identifier within a 
small difference in length would be a better idea. (Clang is said to 
use the Levenshtein distance, it probably does something of the sort.)

http://en.wikipedia.org/wiki/Levenshtein_distance

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

May 06 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

On 6-5-2010 22:37, Michel Fortin wrote:
 On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1 digitalmars.com>
 said:
 
 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
 kernel 2.6.32.4), to get the actual error messages about the
 undefined identifier.

 Definitely there's a problem.

 The problem is the spell checker is O(n*n) on the number of characters
 in the undefined identifier.

 
 That's an algorithm that can't scale then.
 
 Checking the Levenshtein distance for each known identifier within a
 small difference in length would be a better idea. (Clang is said to use
 the Levenshtein distance, it probably does something of the sort.)
 
 http://en.wikipedia.org/wiki/Levenshtein_distance
 

and especially this line:


threshold k, then it suffices to compute a diagonal stripe of width 2k+1
in the matrix. In this way, the algorithm can be run in O(kl) time,
where l is the length of the shortest string.

May 06 2010

Brad Roberts <braddr slice-2.puremagic.com> writes:

On Fri, 7 May 2010, Lionello Lunesu wrote:

 On 6-5-2010 22:37, Michel Fortin wrote:
 On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1 digitalmars.com>
 said:
 
 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
 kernel 2.6.32.4), to get the actual error messages about the
 undefined identifier.

 Definitely there's a problem.

 The problem is the spell checker is O(n*n) on the number of characters
 in the undefined identifier.

 
 That's an algorithm that can't scale then.
 
 Checking the Levenshtein distance for each known identifier within a
 small difference in length would be a better idea. (Clang is said to use
 the Levenshtein distance, it probably does something of the sort.)
 
 http://en.wikipedia.org/wiki/Levenshtein_distance
 

 and especially this line:
 

 threshold k, then it suffices to compute a diagonal stripe of width 2k+1
 in the matrix. In this way, the algorithm can be run in O(kl) time,
 where l is the length of the shortest string.

The source for this is pretty isolated.. anyone want to volunteer take a 
shot at improving this part of dmd?

Later,
Brad

May 06 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

On 7-5-2010 9:10, Brad Roberts wrote:
 On Fri, 7 May 2010, Lionello Lunesu wrote:
 
 On 6-5-2010 22:37, Michel Fortin wrote:
 On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1 digitalmars.com>
 said:

 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
 kernel 2.6.32.4), to get the actual error messages about the
 undefined identifier.

 Definitely there's a problem.

 The problem is the spell checker is O(n*n) on the number of characters
 in the undefined identifier.

 That's an algorithm that can't scale then.

 Checking the Levenshtein distance for each known identifier within a
 small difference in length would be a better idea. (Clang is said to use
 the Levenshtein distance, it probably does something of the sort.)

 http://en.wikipedia.org/wiki/Levenshtein_distance

 and especially this line:


 threshold k, then it suffices to compute a diagonal stripe of width 2k+1
 in the matrix. In this way, the algorithm can be run in O(kl) time,
 where l is the length of the shortest string.

 
 The source for this is pretty isolated.. anyone want to volunteer take a 
 shot at improving this part of dmd?
 
 Later,
 Brad

I see, speller.c, I'll have a look..

May 06 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

On 7-5-2010 12:01, Lionello Lunesu wrote:
 On 7-5-2010 9:10, Brad Roberts wrote:
 On Fri, 7 May 2010, Lionello Lunesu wrote:

 On 6-5-2010 22:37, Michel Fortin wrote:
 On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1 digitalmars.com>
 said:

 Walter Bright wrote:
 Alex Makhotin wrote:
 It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
 kernel 2.6.32.4), to get the actual error messages about the
 undefined identifier.

 Definitely there's a problem.

 The problem is the spell checker is O(n*n) on the number of characters
 in the undefined identifier.

 That's an algorithm that can't scale then.

 Checking the Levenshtein distance for each known identifier within a
 small difference in length would be a better idea. (Clang is said to use
 the Levenshtein distance, it probably does something of the sort.)

 http://en.wikipedia.org/wiki/Levenshtein_distance

 and especially this line:


 threshold k, then it suffices to compute a diagonal stripe of width 2k+1
 in the matrix. In this way, the algorithm can be run in O(kl) time,
 where l is the length of the shortest string.

 The source for this is pretty isolated.. anyone want to volunteer take a 
 shot at improving this part of dmd?

 Later,
 Brad

 I see, speller.c, I'll have a look..

I'm in the middle of moving from one city to another so don't wait for
me. I have attached the D version of the code in the wikipedia article
(including the patch for transpositions.)

It's not straightforward to drop this in (apart from it being in D),
since speller.c creates all variations on a string (=inefficient) and
uses a callback function to check if a variation is a valid symbol.

I'll have more time to look at this next week.

L.

May 08 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

That code is in the public domain, by the way.

DMD should require a copyright notice in each source file :)

L.

May 08 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 09 May 2010 02:11:21 -0400, Lionello Lunesu  
<lio lunesu.remove.com> wrote:


 I'm in the middle of moving from one city to another so don't wait for
 me. I have attached the D version of the code in the wikipedia article
 (including the patch for transpositions.)

 It's not straightforward to drop this in (apart from it being in D),
 since speller.c creates all variations on a string (=inefficient) and
 uses a callback function to check if a variation is a valid symbol.

 I'll have more time to look at this next week.

Several others have privately brought up this problem to Walter.  He does  
not want to change how the symbol lookup tables work, and there is no way  
to iterate them.  Therefore, without a way to iterate current symbols,  
this is the only way the algo can be written.

However, according to reports on the latest beta, he's sped up the lookup  
times for symbols significantly enough to perceptually reduce the problem.

Because of the nature of the algorithm, the longer the invalid symbol, the  
slower the algorithm.  It would be interesting to see a comparison between  
the current beta code and code that does a full iteration with very long  
symbols.  I don't know if anyone wants to look at modifying the symbol  
lookup data structures to allow iteration, it may be perceptually  
insignificant, and unimportant for most developers.

A quick test on a long symbol name:


void  
s023456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012
45678901234567891()  
{}
void main()
{
      s123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890();
}



And on the various compilers:

2.043: 0.052s, does not suggest
2.045: 5.4s, suggests the alternative
beta: 0.8s, does not suggest

2.043 only does spell checking with a Levenshtein distance of 1, 2.045  
does 2, but is extremely slow.  The beta does a distance of 2, but only if  
the errors are close together (Walter added this as an optimization to  
remove one factor from the runtime complexity).

So clearly, there is still room for improvement, I think with a proper  
symbol iteration, we could get the time down to be as quick as 2.043 or  
faster *and* provide the ability to check for a complete LD of 2 where the  
errors are not close together.  We might be able to even push the LD to 3  
or 4.

I've thought about the optimization of errors close together being  
checked, and I think the counter case is the case which takes the longest  
-- a long symbol.  Typically people use camelCase to denote symbols, what  
if two of the words in the symbol were misspelled by one character (or a  
capitalization was forgotten)?

It may not be an issue, the spell checker is simply a nice hint, but isn't  
essential to determine errors.

-Steve

May 10 2010

Ary Borenszweig <ary esperanto.org.ar> writes:

Steven Schveighoffer wrote:
On Sun, 09 May 2010 02:11:21 -0400, Lionello Lunesu
<lio lunesu.remove.com> wrote:

I'm in the middle of moving from one city to another so don't wait for
me. I have attached the D version of the code in the wikipedia article
(including the patch for transpositions.)

It's not straightforward to drop this in (apart from it being in D),
since speller.c creates all variations on a string (=inefficient) and
uses a callback function to check if a variation is a valid symbol.

I'll have more time to look at this next week.

Several others have privately brought up this problem to Walter. He
does not want to change how the symbol lookup tables work, and there is
no way to iterate them. Therefore, without a way to iterate current
symbols, this is the only way the algo can be written.

However, according to reports on the latest beta, he's sped up the
lookup times for symbols significantly enough to perceptually reduce the
problem.

Because of the nature of the algorithm, the longer the invalid symbol,
the slower the algorithm. It would be interesting to see a comparison
between the current beta code and code that does a full iteration with
very long symbols. I don't know if anyone wants to look at modifying
the symbol lookup data structures to allow iteration, it may be
perceptually insignificant, and unimportant for most developers.

A quick test on a long symbol name:

void
s023456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012
45678901234567891()
{}
void main()
{

s1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
5678901234567890();

}

And on the various compilers:

2.043: 0.052s, does not suggest
2.045: 5.4s, suggests the alternative
beta: 0.8s, does not suggest

2.043 only does spell checking with a Levenshtein distance of 1, 2.045
does 2, but is extremely slow. The beta does a distance of 2, but only
if the errors are close together (Walter added this as an optimization
to remove one factor from the runtime complexity).

So clearly, there is still room for improvement, I think with a proper
symbol iteration, we could get the time down to be as quick as 2.043 or
faster *and* provide the ability to check for a complete LD of 2 where
the errors are not close together. We might be able to even push the LD
to 3 or 4.

I've thought about the optimization of errors close together being
checked, and I think the counter case is the case which takes the
longest -- a long symbol. Typically people use camelCase to denote
symbols, what if two of the words in the symbol were misspelled by one
character (or a capitalization was forgotten)?

It may not be an issue, the spell checker is simply a nice hint, but
isn't essential to determine errors.

-Steve

I can't imagine why something as simple as iterating a symbol lookup
table can't be implemented.

(in fact I iterate that lookup table in Descent to show
autocompletions... I don't generate every possible string in the world
and see if it's a symbol, and if so, suggest it as an autocompletion...
hmmm...)

May 10 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 10 May 2010 09:22:21 -0400, Ary Borenszweig <ary esperanto.org.ar>  
wrote:

 Steven Schveighoffer wrote:

  Several others have privately brought up this problem to Walter.  He  
 does not want to change how the symbol lookup tables work, and there is  
 no way to iterate them.

 I can't imagine why something as simple as iterating a symbol lookup  
 table can't be implemented.

 (in fact I iterate that lookup table in Descent to show  
 autocompletions... I don't generate every possible string in the world  
 and see if it's a symbol, and if so, suggest it as an autocompletion...  
 hmmm...)

On that point I'm just the messenger, I'm not sure why Walter has that  
position, and I don't know enough about the code to know how to fix it.

-Steve

May 10 2010

BCS <none anon.com> writes:

Hello Steven,

 Several others have privately brought up this problem to Walter. He
 does not want to change how the symbol lookup tables work, and there
 is no way to iterate them.
 

Is it fundamentally impossible to iterate or is the code just not there and/or 
nasty to write?

-- 
... <IXOYE><

May 10 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 10 May 2010 11:41:08 -0400, BCS <none anon.com> wrote:

 Hello Steven,

 Several others have privately brought up this problem to Walter. He
 does not want to change how the symbol lookup tables work, and there
 is no way to iterate them.

 Is it fundamentally impossible to iterate or is the code just not there  
 and/or nasty to write?

I can only speculate, but I would guess the latter.  What I can think of  
is there may be multiple keys for the same symbol (for example alises, or  
fully qualified names).  But even those should be enumerable.

I think the tables themselves are simple hash-maps, but the layers and  
shadowing is where the problems lie.  I don't have a very educated opinion  
on this, so it may be something else entirely.

-Steve

May 10 2010

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 It may not be an issue, the spell checker is simply a nice hint, but 
 isn't essential to determine errors.

It's a good summary of the situation. I've felt that as long as the message
took 
under a second to generate, it was ok. Realistically, I don't see anyone using 
identifiers as long as you tested unless they were machine generated.

In my own code, I've found the spell checker is more than a nice hint. Most of 
the time it nails it and speeds fixing the problem because I don't need to 
myself go and manually look up the correct spelling.

Iterating through the symbol table is not impossible, it's just a lot of work
as 
it is not designed for that. There are also some issues with shadowing. 
Furthermore, I'd like the speller to work with the symbol table in the C++ 
compiler, too, which I don't wish to reimplement.

May 10 2010

D Programming

C/C++ Programming

Other

digitalmars.D.announce - dmd 1.060 and 2.045 release