
digitalmars.D.learn - converting to/from char[]/string

mark <mark qtrac.eu> writes:
I want to use the Porter stemming algorithm.
There's a D implementation here: 
https://tartarus.org/martin/PorterStemmer/d.txt

The main public function's signature is:

char[] stem(char[] p, int i, int j)

But I work entirely in terms of strings (containing individual 
words), so I want to add another function with this signature:

string stem(string word)

I've tried this without success:

     public string stem(string word) {
         import std.conv: to;

         char[] chars = word.to!char[];
         int end = chars.length.to!int;
         return stem(chars, 0, end).to!string;
     }

Here are just a few of the errors:

src/porterstemmer.d(197,13): Error: cannot implicitly convert 
expression s.length of type ulong to int
src/porterstemmer.d(222,9): Error: cannot implicitly convert 
expression cast(ulong)this.m_j + s.length of type ulong to int
src/porterstemmer.d(259,12): Error: function 
porterstemmer.PorterStemmer.ends(char[] s) is not callable using 
argument types (string)
src/porterstemmer.d(259,12):        cannot pass argument "sses" 
of type string to parameter char[] s
Mar 05 2020
drug <drug2004 bk.ru> writes:
On 3/5/20 2:03 PM, mark wrote:
 I want to use the Porter stemming algorithm.
 There's a D implementation here: 
 https://tartarus.org/martin/PorterStemmer/d.txt
 
 The main public function's signature is:
 
 char[] stem(char[] p, int i, int j)
 
 But I work entirely in terms of strings (containing individual words), 
 so I want to add another function with this signature:
 
 string stem(string word)
 
 I've tried this without success:
 
      public string stem(string word) {
          import std.conv: to;
 
          char[] chars = word.to!char[];
          int end = chars.length.to!int;
          return stem(chars, 0, end).to!string;
      }
 
 Here are just a few of the errors:
 
 src/porterstemmer.d(197,13): Error: cannot implicitly convert expression 
 s.length of type ulong to int
 src/porterstemmer.d(222,9): Error: cannot implicitly convert expression 
 cast(ulong)this.m_j + s.length of type ulong to int
 src/porterstemmer.d(259,12): Error: function 
 porterstemmer.PorterStemmer.ends(char[] s) is not callable using 
 argument types (string)
 src/porterstemmer.d(259,12):        cannot pass argument "sses" of type 
 string to parameter char[] s
 
Your code and errors seem to be not related.
Mar 05 2020
mark <mark qtrac.eu> writes:
On Thursday, 5 March 2020 at 11:12:24 UTC, drug wrote:
 On 3/5/20 2:03 PM, mark wrote:
[snip]
 Your code and errors seem to be not related.
OK, that is probably because the D stemmer is 19 years old! I've now got Martin Porter's own Java version, so I'll have a go at porting that to D myself.
Mar 05 2020
Dennis <dkorpel gmail.com> writes:
On Thursday, 5 March 2020 at 11:31:43 UTC, mark wrote:
 I've now got Martin Porter's own Java version, so I'll have a 
 go at porting that to D myself.
I don't think that's necessary, the errors seem easy to fix.
 src/porterstemmer.d(197,13): Error: cannot implicitly convert 
 expression s.length of type ulong to int
 src/porterstemmer.d(222,9): Error: cannot implicitly convert 
 expression cast(ulong)this.m_j + s.length of type ulong to int
These errors are probably because the code was only compiled on 32-bit targets where .length is of type `uint`, but you are compiling on 64-bit where .length is of type `ulong`. A quick fix is to simply cast the result like `cast(int) s.length` and `cast(int) (this.m_j + s.length)`, though a proper fix would be to change the types of variables to `long`, `size_t`, `auto` or `const` (depending on which is most appropriate).
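The three options can be compared side by side. This is only a minimal sketch, not code from the stemmer: the helper names `lengthQuick`, `lengthChecked` and `lengthNative` are hypothetical and exist purely for illustration.

```d
import std.conv : to;

/// Quick fix: unchecked cast. Compiles everywhere, but silently
/// truncates if the length ever exceeds int.max.
int lengthQuick(const(char)[] s)
{
    return cast(int) s.length;
}

/// Checked conversion: std.conv.to throws a ConvOverflowException
/// instead of truncating when the value does not fit in an int.
int lengthChecked(const(char)[] s)
{
    return s.length.to!int;
}

/// Proper fix: keep the native index type. size_t is uint on 32-bit
/// targets and ulong on 64-bit targets, so no conversion is needed.
size_t lengthNative(const(char)[] s)
{
    return s.length;
}

unittest
{
    // .length always has type size_t, whatever the target.
    static assert(is(typeof("sses".length) == size_t));

    assert(lengthQuick("sses") == 4);
    assert(lengthChecked("sses") == 4);
    assert(lengthNative("sses") == 4);
}
```

Switching to `size_t` means any arithmetic that relied on a signed index going below zero has to be rechecked, since `size_t` is unsigned and wraps around.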
 src/porterstemmer.d(259,12): Error: function 
 porterstemmer.PorterStemmer.ends(char[] s) is not callable 
 using argument types (string)
 src/porterstemmer.d(259,12):        cannot pass argument "sses" 
 of type string to parameter char[] s
These errors are because `string` is `immutable(char)[]`, meaning the characters may not be modified, while the function accepts a `char[]` which is allowed to mutate the characters. I don't think the functions actually do that, so you can simply change `char[]` into `const(char)[]` so a string can be passed to those functions.
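A small sketch of why `const(char)[]` helps: both `string` and `char[]` convert to it implicitly. The function `hasSuffix` below is a hypothetical stand-in that only reads its arguments; it is not the stemmer's actual `ends` method.

```d
/// Hypothetical read-only helper. Because the parameters are
/// const(char)[], callers may pass string literals (immutable)
/// as well as mutable char[] buffers.
bool hasSuffix(const(char)[] word, const(char)[] suffix)
{
    return word.length >= suffix.length
        && word[$ - suffix.length .. $] == suffix;
}

unittest
{
    string literal = "caresses";   // immutable(char)[]
    char[] buffer = "ponies".dup;  // mutable copy of a literal

    assert(hasSuffix(literal, "sses")); // string argument is accepted
    assert(hasSuffix(buffer, "ies"));   // char[] argument is accepted
    assert(!hasSuffix("a", "sses"));    // shorter than the suffix
}
```

This only works for functions that never write through the parameter; any function that does modify the buffer has to keep taking `char[]`.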
Mar 05 2020
mark <mark qtrac.eu> writes:
I changed int to size_t and used const(char[]) etc. as suggested.
It ran but crashed. Each crash was a range violation, so for each 
one I put in a guard so instead of

if ( ... m_b[m_k])

I used

if (m_k < m_b.length && ... m_b[m_k])

I did this kind of fix in three places.

The result is that it does some but not all of the stemming!

Anyway, I'll compare it with the Python version and see if I can 
spot the problem(s).

Thanks.
Mar 05 2020
mark <mark qtrac.eu> writes:
I suspect the problem is using .length rather than some other 
size property.
Mar 05 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
 I want to use the Porter stemming algorithm.
 There's a D implementation here: 
 https://tartarus.org/martin/PorterStemmer/d.txt
I think I (or ketmar and I stole it from him) ported that very same file before:

https://github.com/adamdruppe/adrdox/blob/master/stemmer.d

By just adding `const` where appropriate it becomes compatible with string and you can slice to take care of the size thing.

https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512 is that stem function as a const slice
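To illustrate the idea of a const slice (this is not the code from the linked file), here is a minimal sketch. `stemSlice` is a hypothetical stand-in with a toy rule in place of the real Porter steps, and `stemWord` is an assumed name for the `string`-to-`string` wrapper asked about at the top of the thread.

```d
/// Hypothetical stand-in for a const-correct stemmer. It takes the word
/// as one slice instead of (buffer, start, end) and returns a slice of
/// the input. The single rule here (drop a trailing 's') is a toy, not
/// the Porter algorithm.
const(char)[] stemSlice(const(char)[] word)
{
    if (word.length > 1 && word[$ - 1] == 's')
        return word[0 .. $ - 1];
    return word;
}

/// string-in, string-out wrapper. No int indices are involved, so the
/// 32-bit/64-bit length problem does not arise. idup makes an immutable
/// copy, turning the const(char)[] result back into a string.
string stemWord(string word)
{
    return stemSlice(word).idup;
}

unittest
{
    assert(stemWord("cats") == "cat");
    assert(stemWord("cat") == "cat");
    assert(stemWord("") == "");
}
```

If a mutable buffer is really needed, note that `word.to!char[]` in the original attempt would have to be written `word.to!(char[])`, because `!` without parentheses binds only a single token; `word.dup` does the same job more directly.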
Mar 05 2020
mark <mark qtrac.eu> writes:
On Thursday, 5 March 2020 at 13:31:14 UTC, Adam D. Ruppe wrote:
 On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
 I want to use the Porter stemming algorithm.
 There's a D implementation here: 
 https://tartarus.org/martin/PorterStemmer/d.txt
 I think I (or ketmar and I stole it from him) ported that very same file before: https://github.com/adamdruppe/adrdox/blob/master/stemmer.d
 By just adding `const` where appropriate it becomes compatible with string and you can slice to take care of the size thing.
 https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512 is that stem function as a const slice
I thought the problem was using char[] rather than dchar[], but evidently not. I downloaded yours and it "just works": I didn't have to change anything. (dscanner gives a couple of const/immutable hints which I'll fix, but still.) Might be good to ask to add yours to https://tartarus.org/martin/PorterStemmer/ since it works and the old one doesn't. Thank you!
Mar 05 2020