digitalmars.D.learn - converting to/from char[]/string
- mark (25/25) Mar 05 2020 I want to use the Porter stemming algorithm.
- drug (2/36) Mar 05 2020 Your code and errors seem to be not related.
- mark (5/7) Mar 05 2020 OK, it is probably that the D stemmer is 19 years old!
- Adam D. Ruppe (8/11) Mar 05 2020 I think I (or ketmar and I stole it from him) ported that very
- mark (10/21) Mar 05 2020 I thought the problem was using char[] rather than dchar[], but
I want to use the Porter stemming algorithm. There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt

The main public function's signature is:

    char[] stem(char[] p, int i, int j)

But I work entirely in terms of strings (containing individual words), so I want to add another function with this signature:

    string stem(string word)

I've tried this without success:

    public string stem(string word)
    {
        import std.conv: to;
        char[] chars = word.to!char[];
        int end = chars.length.to!int;
        return stem(chars, 0, end).to!string;
    }

Here are just a few of the errors:

    src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
    src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int
    src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
    src/porterstemmer.d(259,12): cannot pass argument "sses" of type string to parameter char[] s
Mar 05 2020
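A minimal sketch of the string-to-char[] round trip asked about above. It rests on assumptions: `stemBuffer` is a hypothetical stand-in for the stemmer's `char[] stem(char[] p, int i, int j)` and only returns the slice `p[i .. j]`; the thread does not say what `i` and `j` mean in the real stemmer, so the end index used here is a guess. Note also that `word.to!char[]` is parsed as `(word.to!char)[]`, because the `!` shorthand without parentheses takes a single token; the target type would have to be written `to!(char[])`, and `.dup` is the more direct way to get a mutable copy.

```d
import std.conv : to;

// Hypothetical stand-in for the stemmer's stem(char[] p, int i, int j).
// It does no stemming; it just returns the slice p[i .. j].
char[] stemBuffer(char[] p, int i, int j)
{
    return p[i .. j];
}

// Wrapper that accepts and returns `string`.
string stemWord(string word)
{
    char[] chars = word.dup;            // mutable copy of the immutable string
    int end = chars.length.to!int;      // checked size_t -> int conversion
    return stemBuffer(chars, 0, end).idup; // immutable copy of the result
}

void main()
{
    // The stand-in leaves the word unchanged.
    assert(stemWord("caresses") == "caresses");
}
```

`.dup` and `.idup` each allocate a copy, which is the price of moving between `immutable(char)[]` and `char[]` without a cast.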
On 3/5/20 2:03 PM, mark wrote:
> [snip]

Your code and errors seem to be not related.
Mar 05 2020
On Thursday, 5 March 2020 at 11:12:24 UTC, drug wrote:
> On 3/5/20 2:03 PM, mark wrote:
> > [snip]
> Your code and errors seem to be not related.

OK, it is probably that the D stemmer is 19 years old! I've now got Martin Porter's own Java version, so I'll have a go at porting that to D myself.
Mar 05 2020
On Thursday, 5 March 2020 at 11:31:43 UTC, mark wrote:
> I've now got Martin Porter's own Java version, so I'll have a go at porting that to D myself.

I don't think that's necessary, the errors seem easy to fix.

> src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
> src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int

These errors are probably because the code was only compiled on 32-bit targets where .length is of type `uint`, but you are compiling on 64-bit where .length is of type `ulong`. A quick fix is to simply cast the result like `cast(int) s.length` and `cast(int) (this.m_j + s.length)`, though a proper fix would be to change the types of variables to `long`, `size_t`, `auto` or `const` (depending on which is most appropriate).

> src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
> src/porterstemmer.d(259,12): cannot pass argument "sses" of type string to parameter char[] s

These errors are because `string` is `immutable(char)[]`, meaning the characters may not be modified, while the function accepts a `char[]` which is allowed to mutate the characters. I don't think the functions actually do that, so you can simply change `char[]` into `const(char)[]` so a string can be passed to those functions.
Mar 05 2020
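Both fixes suggested in the reply above can be sketched in a few lines. `hasSuffix` is a hypothetical free function written for illustration; it is not the stemmer's own `ends` method, whose body does not appear in the thread.

```d
// Taking const(char)[] lets callers pass string literals as well as
// mutable char[] buffers, since both convert implicitly to const.
bool hasSuffix(const(char)[] word, const(char)[] s)
{
    if (s.length > word.length)
        return false;
    // Quick fix: narrow size_t to int with an explicit cast.
    // (Keeping the variable as size_t would avoid the cast entirely.)
    int len = cast(int) s.length;
    return word[$ - len .. $] == s;
}

void main()
{
    char[] buf = "caresses".dup;
    assert(hasSuffix(buf, "sses"));      // mutable buffer, literal suffix
    assert(hasSuffix("ponies", "ies"));  // immutable string also accepted
    assert(!hasSuffix("s", "sses"));     // suffix longer than the word
}
```

The explicit cast silently truncates if the length ever exceeds `int.max`, which is why switching the variables to `size_t` is the safer long-term change.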
I changed int to size_t and used const(char[]) etc. as suggested. It ran but crashed. Each crash was a range violation, so for each one I put in a guard, so instead of

    if ( ... m_b[m_k])

I used

    if (m_k < m_b.length && ... m_b[m_k])

I did this kind of fix in three places. The result is that it does some but not all of the stemming!

Anyway, I'll compare it with the Python version and see if I can spot the problem(s). Thanks.
Mar 05 2020
I suspect the problem is using .length rather than some other size property.
Mar 05 2020
On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
> I want to use the Porter stemming algorithm. There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt

I think I (or ketmar and I stole it from him) ported that very same file before: https://github.com/adamdruppe/adrdox/blob/master/stemmer.d

By just adding `const` where appropriate it becomes compatible with string and you can slice to take care of the size thing.

https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512 is that stem function as a const slice
Mar 05 2020
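As a rough illustration of the "const slice" idea in the reply above: a function can take a `const(char)[]` and return a sub-slice of it, so no separate `int` indices are passed and no copy is made. `stripPluralS` is a hypothetical toy rule invented for this sketch; it is neither the Porter algorithm nor the code in the linked file.

```d
// Hypothetical toy rule: drop one trailing 's'. Not the Porter algorithm.
// Input and result are slices, so no index arguments are needed.
const(char)[] stripPluralS(const(char)[] word)
{
    if (word.length > 1 && word[$ - 1] == 's')
        return word[0 .. $ - 1];   // a view into the caller's memory
    return word;
}

void main()
{
    string w = "cats";
    const(char)[] r = stripPluralS(w);
    assert(r == "cat");
    assert(r.ptr is w.ptr);        // same memory, nothing was copied

    // Copy only when an independent `string` is required.
    string owned = r.idup;
    assert(owned == "cat");
}
```

Because the result aliases the input, call `.idup` on it if the original buffer may later be modified or reused.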
On Thursday, 5 March 2020 at 13:31:14 UTC, Adam D. Ruppe wrote:
> On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
> > I want to use the Porter stemming algorithm. There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt
> I think I (or ketmar and I stole it from him) ported that very same file before: https://github.com/adamdruppe/adrdox/blob/master/stemmer.d
> By just adding `const` where appropriate it becomes compatible with string and you can slice to take care of the size thing.
> https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512 is that stem function as a const slice

I thought the problem was using char[] rather than dchar[], but evidently not. I downloaded yours and it "just works": I didn't have to change anything. (dscanner gives a couple of const/immutable hints which I'll fix, but still.)

Might be good to ask to add yours to https://tartarus.org/martin/PorterStemmer/ since it works and the old one doesn't.

Thank you!
Mar 05 2020