digitalmars.D.announce - Port of Python's difflib.SequenceMatcher class

Michael Butscher (9/9) Dec 02 2006 Hi,

Walter Bright (3/13) Dec 02 2006 Yes: please put up a web page about it! See
Pragma (11/23) Dec 04 2006 I agree with Walter that you should throw this up on a page somewhere.

Michael Butscher (42/65) Dec 06 2006 At least I have mentioned it on the page

Bill Baxter (26/41) Dec 06 2006 +1. Me too.

Oskar Linde (16/43) Dec 07 2006 And what compiler do you use? The above code works perfectly. :)
Oskar Linde (10/28) Dec 07 2006 Sorry, i missed this part. The compiler is confused by not being able to...

Bill Baxter (6/39) Dec 07 2006 Oh, ok. So I was right, but for the wrong reason. :-) The compiler

Michael Butscher <mbutscher gmx.de> writes:

Hi, 

a D port (version 0.175) of Python's difflib.SequenceMatcher class for 
generating diffs is available at

  http://www.mbutscher.de/snippets/difflib_d20061202.zip

It may still need some cleanup, but the translated doctests pass 
(except one I couldn't get to compile in D, but "in theory" it passes as 
well).

Comments, critique?



Michael

Dec 02 2006

Walter Bright <newshound digitalmars.com> writes:

Michael Butscher wrote:
 a D port (version 0.175) of Python's difflib.SequenceMatcher class for 
 generating diffs is available at
 
   http://www.mbutscher.de/snippets/difflib_d20061202.zip
 
 It may still need some cleanup, but the translated doctests pass 
 (except one I couldn't get to compile in D, but "in theory" it passes as 
 well).
 
 Comments, critique?

Yes: please put up a web page about it! See 
http://www.digitalmars.com/d/howto-promote.html

Dec 02 2006

Pragma <ericanderton yahoo.removeme.com> writes:

Michael Butscher wrote:
 Hi, 
 
 a D port (version 0.175) of Python's difflib.SequenceMatcher class for 
 generating diffs is available at
 
   http://www.mbutscher.de/snippets/difflib_d20061202.zip
 
 It may still need some cleanup, but the translated doctests pass 
 (except one I couldn't get to compile in D, but "in theory" it passes as 
 well).
 
 Comments, critique?

I agree with Walter that you should throw this up on a page somewhere. 
I'm curious, but rarely have time to sift through source code unless I'm 
in need of something specific - I develop using SVN 99% of the time, 
which does .diff output for me already.

But I *am* curious about how the porting went, what the pitfalls were, 
and how you worked around Python idioms and tuple types.  Also, I'm 
wondering if the D version brings any extra perks like better 
performance, or less/clearer code?

-- 
- EricAnderton at yahoo

Dec 04 2006

Michael Butscher <mbutscher gmx.de> writes:

Pragma wrote:
 Michael Butscher wrote:
 Hi, 
 
 a D port (version 0.175) of Python's difflib.SequenceMatcher class for 
 generating diffs is available at
 
   http://www.mbutscher.de/snippets/difflib_d20061202.zip
 
 It may still need some cleanup, but the translated doctests pass 
 (except one I couldn't get to compile in D, but "in theory" it passes as 
 well).
 
 Comments, critique?

 
 I agree with Walter that you should throw this up on a page somewhere. 

At least I have mentioned it on the page

  http://www.mbutscher.de/software.html

as a "snippet" (it isn't much more, I think).



 I'm curious, but rarely have time to sift through source code unless I'm 
 in need of something specific - I develop using SVN 99% of the time, 
 which does .diff output for me already.

I will need it later for a project written in Python (a kind of personal 
wiki that runs without a server), to store different versions of a wiki 
page.

When the time comes, I will add a small C interface for a DLL that 
mainly creates a kind of binary diff of two arbitrary byte blocks and 
can apply that diff to the first block to recreate the second.
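
A minimal sketch of the "apply the diff to the first block" half, assuming the diff is stored as a list of edit operations loosely modelled on SequenceMatcher's opcodes. The names `Tag`, `Edit` and `applyDiff` are hypothetical and not part of the port; unlike Python's opcodes, each edit here carries its own replacement bytes, since the second block is not available when patching.

```d
import std.stdio : writeln;

// Hypothetical edit kinds, loosely modelled on SequenceMatcher's
// 'equal', 'replace', 'delete' and 'insert' opcodes
// ("delete" is a D keyword, hence "remove").
enum Tag { equal, replace, remove, insert }

struct Edit
{
    Tag tag;
    size_t i1, i2;   // range [i1, i2) in the source block
    ubyte[] data;    // new bytes for replace/insert; empty otherwise
}

// Rebuild the target block by walking the edits in order.
ubyte[] applyDiff(const(ubyte)[] source, const(Edit)[] edits)
{
    ubyte[] result;
    foreach (e; edits)
    {
        final switch (e.tag)
        {
            case Tag.equal:
                result ~= source[e.i1 .. e.i2]; // copy unchanged bytes
                break;
            case Tag.replace, Tag.insert:
                result ~= e.data;               // bytes stored in the diff
                break;
            case Tag.remove:
                break;                          // skip source[i1 .. i2]
        }
    }
    return result;
}

void main()
{
    auto source = cast(const(ubyte)[]) "abcdef";
    Edit[] edits = [
        Edit(Tag.equal, 0, 2),                          // keep "ab"
        Edit(Tag.replace, 2, 3, cast(ubyte[]) "Z".dup), // "c" -> "Z"
        Edit(Tag.remove, 3, 4),                         // drop "d"
        Edit(Tag.equal, 4, 6),                          // keep "ef"
        Edit(Tag.insert, 6, 6, cast(ubyte[]) "!".dup),  // append "!"
    ];
    writeln(cast(const(char)[]) applyDiff(source, edits)); // abZef!
}
```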


 But I *am* curious about how the porting went, what the pitfalls were, 
 and how you worked around Python idioms and tuple types.

- The frequently used "self" was simply translated to "this", so the 
code looks a bit odd in D, e.g.:


    void set_seq2(ST b)
    {
        if (b is this.b)
            return;
        this.b = b;
        this.matching_blocks = null;
        this.opcodes = null;
        this.fullbcount = null;
        this.chain_b();
    }
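
For comparison, a sketch (a hypothetical, trimmed-down class, not the port's code) of where `this.` is actually required in D: only where a parameter shadows a field, as `b` does here. Other member accesses can drop the prefix.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the matcher class, reduced to the members
// needed to show when "this." is required.
class Matcher(ST)
{
    ST b;
    size_t[] matching_blocks;

    void set_seq2(ST b)
    {
        if (b is this.b)         // parameter "b" shadows the field,
            return;              // so "this.b" is needed to reach it
        this.b = b;
        matching_blocks = null;  // no shadowing: "this." can be dropped
        chain_b();
    }

    void chain_b() { }           // placeholder for the real work
}

void main()
{
    auto m = new Matcher!string;
    m.set_seq2("abc");
    writeln(m.b); // abc
}
```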


- One thing I really missed in D was the get() method for Python 
dictionaries with a default argument. Therefore I created inner 
functions like

        IndexType j2lenget(IndexType i, IndexType def)
        {
            IndexType* result = i in j2len;
            if (result)
                return *result;
            else
                return def;
        }

Probably this can be done more elegantly, but I personally think that
get() should be a standard method of AAs.
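
Current D2 runtimes provide this for associative arrays: `aa.get(key, defaultValue)` evaluates the default lazily, and `aa.require(key, value)` behaves much like Python's `setdefault()`. A minimal sketch, assuming a modern compiler (neither existed in the D 0.175 era) and using `int` as a stand-in for the port's `IndexType`:

```d
import std.stdio : writeln;

void main()
{
    // int stands in for the port's IndexType here (an assumption).
    int[int] j2len;
    j2len[3] = 7;

    // get(): the stored value, or the lazily evaluated default.
    writeln(j2len.get(3, 0));          // 7
    writeln(j2len.get(4, 0));          // 0
    writeln((4 in j2len) is null);     // true: get() does not insert

    // require(): inserts the value if the key is missing, like setdefault().
    writeln(j2len.require(5, 1));      // 1
    writeln(j2len[5]);                 // 1
}
```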



- The class used only two tuple types, each with a clear purpose, so 
they were translated into structs without much trouble.
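
Assuming one of the two tuple types is the `(a, b, size)` triple that Python's `get_matching_blocks()` returns (the Python module calls it `Match`), a struct translation might look like the sketch below, with a named `std.typecons.Tuple` as a modern alternative. The names `Match` and `MatchT` are illustrative, not taken from the port.

```d
import std.algorithm : sort;
import std.stdio : writeln;
import std.typecons : Tuple;

// Plain struct mirroring Python's Match(a, b, size):
// seqA[a .. a + size] == seqB[b .. b + size].
struct Match
{
    size_t a, b, size;
}

// Alternative: a Tuple with named fields, which also provides a
// lexicographic opCmp, like a Python tuple.
alias MatchT = Tuple!(size_t, "a", size_t, "b", size_t, "size");

void main()
{
    auto m = Match(0, 4, 3);
    writeln(m.a, " ", m.b, " ", m.size); // 0 4 3

    auto blocks = [MatchT(2, 0, 1), MatchT(0, 4, 3)];
    blocks.sort();        // uses Tuple's opCmp; Match would need its own
    writeln(blocks[0].a); // 0
}
```

One difference worth keeping in mind: Python tuples compare lexicographically, and so does `Tuple`, whereas a plain struct needs its own `opCmp` before it can be sorted.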



 Also, I'm 
 wondering if the D version brings any extra perks like better 
 performance, or less/clearer code?

I have not done any benchmarks yet, but I assume that D is much 
faster.


The D code is a bit longer and IMHO a bit less readable than Python, 
but I'm much more used to Python than D.


Michael

Dec 06 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Michael Butscher wrote:

 - One thing I really missed in D was the get() method for Python 
 dictionaries with a default argument. Therefore I created inner 
 functions like
 
         IndexType j2lenget(IndexType i, IndexType def)
         {
             IndexType* result = i in j2len;
             if (result)
                 return *result;
             else
                 return def;
         }
 
 Probably this can be done more elegantly, but I personally think that
 get() should be a standard method of AAs.

+1.  Me too.

If IFTI were smarter, something like this would do the trick:

V get(V,K)(V[K] dict, K key, V def = V.init)
{
     V* ptr = key in dict;
     return ptr? *ptr: def;
}

The array "property" trick (calling a free function as if it were a method of its first argument) works for AAs too, so, taking one instance of that:

char[] get(char[][int] dict, int key, char[] def = null)
{
     char[]* ptr = key in dict;
     return ptr? *ptr: def;
}

you can do:

     char[][int] i2s;
     i2s[1] = "Hello";
     i2s[5] = "There";

     writefln( i2s.get(1, "yeh") );
     writefln( i2s.get(2, "default") );
     writefln( i2s.get(1) );
     writefln( i2s.get(2) );

Too bad the template version doesn't work.
D doesn't seem to be able to pick out the V and K from an associative 
array argument.
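
With a current D2 compiler the template form does deduce `V` and `K` from the associative-array argument, and string literals are typed as `string` (`immutable(char)[]`), so a modern sketch of the example can use the generic version directly. It is renamed to the hypothetical `getOr` to avoid clashing with the `get` that current runtimes already define for AAs, and `string[int]` replaces `char[][int]` because a literal does not convert implicitly to a mutable `char[]`.

```d
import std.stdio : writeln;

// Bill's template, renamed to the hypothetical getOr.
V getOr(V, K)(V[K] dict, K key, V def = V.init)
{
    V* ptr = key in dict;
    return ptr ? *ptr : def;
}

void main()
{
    string[int] i2s;
    i2s[1] = "Hello";
    i2s[5] = "There";

    writeln(i2s.getOr(1, "yeh"));     // Hello
    writeln(i2s.getOr(2, "default")); // default
    writeln(i2s.getOr(1));            // Hello
    writeln(i2s.getOr(2) is null);    // true
}
```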

--bb

Dec 06 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Bill Baxter wrote:
 Michael Butscher wrote:
 
 - One thing I really missed in D was the get() method for Python 
 dictionaries with a default argument. Therefore I created inner 
 functions like

         IndexType j2lenget(IndexType i, IndexType def)
         {
             IndexType* result = i in j2len;
             if (result)
                 return *result;
             else
                 return def;
         }

 Probably this can be done more elegantly, but I personally think that
 get() should be a standard method of AAs.

 
 +1.  Me too.
 
 If IFTI were smarter, something like this would do the trick:
 
 V get(V,K)(V[K] dict, K key, V def = V.init)
 {
     V* ptr = key in dict;
     return ptr? *ptr: def;
 }

And what compiler do you use? The above code works perfectly. :)

The following two get functions have been part of my own standard 
imports for quite a while and I find them very handy.

T get(T,U)(T[U] aa, U key) {
         T* ptr = key in aa;
         return ptr ? *ptr : T.init;
}

bool get(T,U,int dummy=1)(T[U] aa, U key, out T val) {
         T* ptr = key in aa;
         if (!ptr)
                 return false;
         val = *ptr;
         return true;
}
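
A sketch of the same pair for a current D compiler, with usage. The hypothetical names `getOrInit` and `tryGet` are distinct, so no `int dummy=1` parameter is needed to tell them apart, and neither collides with the runtime's built-in `get`. `tryGet` is the one to use when a stored `T.init` (such as 0) must be distinguished from a missing key; note that an `out` parameter is reset to `T.init` on entry.

```d
import std.stdio : writeln;

// Returns the stored value, or T.init when the key is absent.
T getOrInit(T, U)(T[U] aa, U key)
{
    T* ptr = key in aa;
    return ptr ? *ptr : T.init;
}

// Reports whether the key is present and, if so, copies the value out.
bool tryGet(T, U)(T[U] aa, U key, out T val)
{
    T* ptr = key in aa;
    if (!ptr)
        return false;
    val = *ptr;
    return true;
}

void main()
{
    int[string] counts = ["a": 2];

    writeln(counts.getOrInit("a")); // 2
    writeln(counts.getOrInit("z")); // 0

    int n;
    if (counts.tryGet("a", n))
        writeln("found ", n);       // found 2

    bool found = counts.tryGet("z", n);
    writeln(found, " ", n);         // false 0
}
```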

/Oskar

Dec 07 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Bill Baxter wrote:

 V get(V,K)(V[K] dict, K key, V def = V.init)
 {
     V* ptr = key in dict;
     return ptr? *ptr: def;
 }
 

[snip]
     char[][int] i2s;
     i2s[1] = "Hello";
     i2s[5] = "There";
 
     writefln( i2s.get(1, "yeh") );
     writefln( i2s.get(2, "default") );
     writefln( i2s.get(1) );
     writefln( i2s.get(2) );
 
 Too bad the template version doesn't work.
 D doesn't seem to be able to pick out the V and K from an associative 
 array argument.

Sorry, I missed this part. The compiler is confused by not being able to 
tell whether V should be char[] or char[3].

writefln( i2s.get(1, "yeh"[]) );
writefln( i2s.get(2, "default"[]) );

Both work. So you are right. IFTI could perhaps be improved by 
figuring out that both V argument types are implicitly convertible to 
the same type.
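
The unification Oskar describes can be approximated by hand: give the default its own type parameter and constrain it to convert implicitly to the value type. A minimal sketch for current D, with the hypothetical name `getOr`:

```d
import std.stdio : writeln;

// Hypothetical variant: the default gets its own type parameter D, and
// the constraint only requires that D converts implicitly to V.
V getOr(V, K, D)(V[K] dict, K key, D def)
    if (is(D : V))
{
    V* ptr = key in dict;
    return ptr ? *ptr : def;
}

void main()
{
    long[string] sizes;
    sizes["a"] = 10;

    // D is deduced as int, V as long; int converts to long.
    writeln(sizes.getOr("a", 0)); // 10
    writeln(sizes.getOr("b", 7)); // 7

    // D is deduced as string, V as const(char)[].
    const(char)[][int] names;
    names[1] = "one";
    writeln(names.getOr(2, "none")); // none
}
```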

/Oskar

Dec 07 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 Bill Baxter wrote:
 
 V get(V,K)(V[K] dict, K key, V def = V.init)
 {
     V* ptr = key in dict;
     return ptr? *ptr: def;
 }

 [snip]
     char[][int] i2s;
     i2s[1] = "Hello";
     i2s[5] = "There";

     writefln( i2s.get(1, "yeh") );
     writefln( i2s.get(2, "default") );
     writefln( i2s.get(1) );
     writefln( i2s.get(2) );

 Too bad the template version doesn't work.
 D doesn't seem to be able to pick out the V and K from an associative 
 array argument.

 
 Sorry, I missed this part. The compiler is confused by not being able to 
 tell whether V should be char[] or char[3].
 
 writefln( i2s.get(1, "yeh"[]) );
 writefln( i2s.get(2, "default"[]) );
 
 Both work. So you are right. IFTI could perhaps be improved by 
 figuring out that both V argument types are implicitly convertible to 
 the same type.
 
 /Oskar

Oh, ok.  So I was right, but for the wrong reason.  :-)  The compiler 
message wasn't very specific about what it didn't like, just "no match" 
was all it was willing to divulge.

These char[] vs. char[N] conversion issues are rather annoying.


--bb

Dec 07 2006
