digitalmars.D - Natural language parsing (NLP) with D
- Eliatto (16/16) Oct 20 2015 Hello! I am rather new to D ecosystem (I am a C++ developer). I
- Rikki Cattermole (12/25) Oct 20 2015 The only thing I could find on code.dlang.org was
- ponce (5/6) Oct 20 2015 We aren't numerous, so there hasn't been anyone to tackle the NLP
- Andrei Alexandrescu (5/18) Oct 20 2015 In my NLP days I remember the common procedure was to run
- Chris (13/14) Oct 20 2015 I work with NLP almost all the time and D is very well suited for
- bachmeier (6/22) Oct 20 2015 It's not my area, but are you thinking of something like Freeling?
- Chris (40/46) Oct 20 2015 Interesting, I heard of it a while ago. In D I have the following:
- Laeeth Isharc (16/65) Oct 20 2015 Hi.
- Chris (2/17) Oct 21 2015 What exactly is sentiment analysis and how do you go about it?
- Henry Gouk (10/11) Oct 21 2015 Determining whether the sentiment of a piece of text is positive,
- Laeeth Isharc (6/8) Oct 23 2015 Chris - please drop me a line. I am sure there are some things
- Eliatto (5/8) Oct 20 2015 I think that in order to make a new wrapper more popular, it
- bachmeier (12/21) Oct 21 2015 The internet doesn't need another discussion about licensing.
Hello! I am rather new to the D ecosystem (I am a C++ developer). I know that there are the code.dlang.org and awesome-D collections of libraries, but I have not found any NLP libraries in D (https://github.com/jogojapan/drulex is not worth mentioning), though there are Go and Rust NLP libraries on GitHub (and those are new languages too). Why is this field unpopular among (D)evelopers? What can be used instead for basic POS tagging and NP chunking of English texts? I mean wrapping some C/C++ library without porting. Which one will cause the least headache when gluing it to D? P.S. I think it would be nice to see a histogram of the libraries in the "awesome-D" list. For example, one bar would show the percentage of 3D engines (the number of such libraries divided by the total number of awesome-D libraries, multiplied by 100), another the percentage of logger libraries, and so on.
Oct 20 2015
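The histogram suggested in the P.S. needs only one formula per category: (libraries in the category / total libraries) multiplied by 100. A minimal D sketch, assuming the per-category counts have already been extracted from the awesome-D list; the category names and counts in `main` are hypothetical, not taken from the real list, and a row of `#` characters stands in for each bar.

```d
import std.algorithm.iteration : sum;
import std.array : replicate;
import std.stdio : writefln;

/// Share of each category: (libraries in the category / total libraries) * 100.
double[string] categoryPercentages(int[string] counts)
{
    double[string] result;
    immutable total = counts.byValue.sum;
    if (total == 0)
        return result; // no libraries listed: avoid dividing by zero
    foreach (category, n; counts)
        result[category] = 100.0 * n / total;
    return result;
}

void main()
{
    // Hypothetical counts for illustration, not the real awesome-D list.
    int[string] counts = ["3D engines": 2, "loggers": 3, "other": 15];

    // One text "bar" per category, one '#' per percentage point.
    foreach (category, pct; categoryPercentages(counts))
        writefln("%-10s %5.1f%% %s", category, pct, "#".replicate(cast(size_t) pct));
}
```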
On 21/10/15 1:01 AM, Eliatto wrote:
> I mean wrapping some C/C++ library without porting.

The only thing I could find on code.dlang.org was https://github.com/Herringway/natcmp, which is not really what you want, I think. In terms of binding C/C++ to D, you should be able to do it almost wholesale. You will need to create shims on the C++ side for certain features such as operator overloads, templates and, of course, object creation. At least, that is as far as I know. There were some serious C++ interop improvements fairly recently (DDMD), so somebody else will need to confirm this. As for which C/C++ library you should base it on? No idea; what would you like to use? Also, this would be better suited to D.learn.
Oct 20 2015
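To show what a hand-written binding looks like, here is a minimal sketch that declares two C standard library functions (`strlen` and `strstr`) with `extern (C)` and wraps them in functions taking D strings. The C standard library only stands in for a real NLP library; the wrapper names `cLength` and `cFind` are hypothetical. Binding a C++ library would additionally involve `extern (C++)` declarations or, for templates and operator overloads, the C++-side shims mentioned in Rikki's reply, none of which is shown here.

```d
import std.stdio : writeln;
import std.string : fromStringz, toStringz;

// Hand-written bindings: each prototype mirrors its C header declaration.
// extern (C) selects the C calling convention and an unmangled symbol name,
// so the linker resolves them against the C runtime D programs already link.
// (druntime ships ready-made versions in core.stdc.string; they are
// re-declared here only to show what a binding looks like.)
extern (C) size_t strlen(const(char)* s) nothrow @nogc;
extern (C) char* strstr(const(char)* haystack, const(char)* needle) nothrow @nogc;

/// D-friendly wrapper: takes a D string and hides the C pointer.
size_t cLength(string s)
{
    return strlen(s.toStringz); // toStringz appends the '\0' C expects
}

/// Returns the part of `text` starting at `word`, or null if absent.
string cFind(string text, string word)
{
    auto p = strstr(text.toStringz, word.toStringz);
    return p is null ? null : p.fromStringz.idup; // copy back into a D string
}

void main()
{
    writeln(cLength("tagger")); // 6
    writeln(cFind("noun phrase chunking", "phrase"));
}
```

The pattern is the same for any C API: declare the prototype, then put a thin D wrapper in front of it so the rest of the code never handles raw `char*`.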
On Tuesday, 20 October 2015 at 12:01:44 UTC, Eliatto wrote:
> Why is this field unpopular among (D)evelopers?

We aren't numerous, so nobody has tackled NLP problems so far (nor many other domains). There is plenty of room to start domain-specific libraries. You could do it :)
Oct 20 2015
On 10/20/2015 08:01 AM, Eliatto wrote:
> I mean wrapping some C/C++ library without porting.

In my NLP days, I remember the common procedure was to run taggers, chunkers, etc. as processes driven by scripts. That said, a library offers more options, and it would be interesting to see one on code.dlang.org.

-- Andrei
Oct 20 2015
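The script-driven approach Andrei describes can be sketched in D with `std.process`: write text to an external tool's standard input and collect its standard output. A minimal sketch, assuming a POSIX system; `tr` is only a stand-in for a real tagger or chunker executable, whose name and command-line options would depend on the tool. The helper name `runTool` is hypothetical.

```d
import std.array : appender;
import std.exception : enforce;
import std.process : pipeProcess, Redirect, wait;
import std.stdio : write;
import std.typecons : Yes;

/// Sends `input` to an external command's stdin and returns its stdout.
/// Throws if the command exits with a non-zero status.
/// Note: all input is written before any output is read. That is fine for
/// modest inputs, but can block if the tool emits a lot of output before
/// it has consumed all of its input.
string runTool(string[] command, string input)
{
    auto pipes = pipeProcess(command, Redirect.stdin | Redirect.stdout);
    pipes.stdin.write(input);
    pipes.stdin.close(); // EOF tells the tool the input is complete

    auto output = appender!string();
    foreach (line; pipes.stdout.byLine(Yes.keepTerminator))
        output.put(line);

    enforce(wait(pipes.pid) == 0, "command failed: " ~ command[0]);
    return output.data;
}

void main()
{
    // Stand-in: `tr` upper-cases the text. A real pipeline would invoke
    // a tagger or chunker executable here instead.
    write(runTool(["tr", "a-z", "A-Z"], "the cat sat on the mat\n"));
}
```

Keeping the tool behind a plain string-in, string-out function makes it easy to replace the external process with a native D library later.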
On Tuesday, 20 October 2015 at 12:01:44 UTC, Eliatto wrote:
> Why is this field unpopular among (D)evelopers?

I work with NLP almost all the time, and D is very well suited for it. It's mainly text-to-speech stuff, but I also have a tiny POS tagger (or rather a POS identifier). D would be well suited for creating higher-level, simpler rule languages that linguists with no programming background could easily use. I've been thinking about this for a while now, and I wish I had the time to come up with something and implement it. I'm thinking of a suite that would cater for the various aspects of NLP, e.g. phonemic transcription, POS tagging, morphological and grammatical analysis, collocations, etc. A one-stop shop for linguists. But, alas, time is scarce. If you have any ideas, please share them.
Oct 20 2015
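To make the idea of a simple rule language for linguists concrete, here is a minimal sketch of one: rules are written as plain `from -> to` lines, and D parses them and applies them in order. The rule syntax, the `Rule` type and the example rules are all hypothetical and are not Chris's implementation; real phonemic transcription would also need context-sensitive rules, which this sketch does not support.

```d
import std.algorithm.searching : findSplit;
import std.array : replace;
import std.exception : enforce;
import std.stdio : writeln;
import std.string : lineSplitter, strip;

/// One rewrite rule such as "ph -> f" (hypothetical rule format).
struct Rule
{
    string from, to;
}

/// Parses one rule per line; blank lines and lines starting with '#' are ignored.
Rule[] parseRules(string text)
{
    Rule[] rules;
    foreach (line; text.lineSplitter)
    {
        auto l = line.strip;
        if (l.length == 0 || l[0] == '#')
            continue; // blank line or comment
        auto parts = l.findSplit("->");
        auto from = parts[0].strip;
        enforce(parts[1].length && from.length, "malformed rule: " ~ l);
        rules ~= Rule(from, parts[2].strip);
    }
    return rules;
}

/// Applies the rules in order, each one to the whole word.
string applyRules(string word, Rule[] rules)
{
    foreach (r; rules)
        word = word.replace(r.from, r.to);
    return word;
}

void main()
{
    auto rules = parseRules("
        # toy spelling-to-sound rules
        ph -> f
        ck -> k
    ");
    writeln(applyRules("phonetick", rules)); // fonetik
}
```

Because every rule is applied to the whole word before the next one runs, rule order matters, which gives rule authors a simple, predictable mental model.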
On Tuesday, 20 October 2015 at 12:01:44 UTC, Eliatto wrote:
> I mean wrapping some C/C++ library without porting.

It's not my area, but are you thinking of something like Freeling? http://nlp.lsi.upc.edu/freeling/ Asking for a friend. I think a C++ expert could get it to work with D with little difficulty, at least by creating C bindings, but I'm not a C++ expert and I failed.
Oct 20 2015
On Tuesday, 20 October 2015 at 15:49:18 UTC, bachmeier wrote:
> It's not my area, but are you thinking of something like Freeling? http://nlp.lsi.upc.edu/freeling/

Interesting, I heard of it a while ago. In D I have the following:

- Text tokenization: Yes.
- Sentence splitting: Yes.
- Morphological analysis: Yes.
- Suffix treatment [, retokenization of clitic pronouns]: Yes.
- Flexible multiword recognition: Yes.
- Contraction splitting: Depends on what they mean, but I can handle contractions like "l'ami".
- Probabilistic prediction of unknown word categories: No.
- Phonetic encoding: Transcription? If so, yes.
- SED-based search for similar words in dictionary: No.
- Named entity detection: No.
- Recognition of dates, numbers, ratios, currency, and physical magnitudes (speed, weight, temperature, density, etc.): Partially implemented.
- PoS tagging: Started.
- Chart-based shallow parsing: No.
- Named entity classification: No.
- WordNet-based sense annotation and disambiguation: No.
- Rule-based dependency parsing: No.
- Nominal coreference resolution: No.

If anyone is interested in starting something like FreeLing in D, please share your thoughts.
Oct 20 2015
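Two items from Chris's list, tokenization and sentence splitting, can be illustrated with a deliberately naive sketch. It is not Chris's code. The tokenizer assumes that an apostrophe directly after a word marks a French-style elision (so "l'ami" becomes `l'` and `ami`), and the sentence splitter assumes that `.`, `!` or `?` followed by whitespace always ends a sentence, an assumption that abbreviations will break.

```d
import std.algorithm.searching : canFind;
import std.array : appender;
import std.conv : to;
import std.stdio : writeln;
import std.string : strip;
import std.uni : isAlphaNum, isWhite;

/// Splits text into word and punctuation tokens. An apostrophe right after a
/// word closes that token, so "l'ami" gives ["l'", "ami"].
/// (A simplification: English "don't" would give "don'" and "t".)
string[] tokenize(string text)
{
    string[] tokens;
    auto word = appender!string();

    void flush()
    {
        if (word.data.length)
        {
            tokens ~= word.data;
            word = appender!string(); // fresh buffer; the stored token is kept
        }
    }

    foreach (dchar c; text)
    {
        if (c.isAlphaNum)
            word.put(c);
        else if ((c == '\'' || c == '\u2019') && word.data.length)
        {
            word.put(c); // keep the apostrophe with the elided word
            flush();
        }
        else
        {
            flush();
            if (!c.isWhite)
                tokens ~= c.to!string; // punctuation becomes its own token
        }
    }
    flush();
    return tokens;
}

/// Naive sentence splitter: a sentence ends at '.', '!' or '?' followed by
/// whitespace or the end of the text, so "e.g. " causes a false split.
string[] splitSentences(string text)
{
    string[] sentences;
    size_t start = 0;
    foreach (i, char c; text)
    {
        immutable atEnd = i + 1 == text.length || " \t\n\r".canFind(text[i + 1]);
        if ("!.?".canFind(c) && atEnd)
        {
            auto s = text[start .. i + 1].strip;
            if (s.length)
                sentences ~= s;
            start = i + 1;
        }
    }
    auto rest = text[start .. $].strip;
    if (rest.length)
        sentences ~= rest;
    return sentences;
}

void main()
{
    foreach (s; splitSentences("Je vois l'ami. Il arrive!"))
        writeln(tokenize(s));
}
```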
On Tuesday, 20 October 2015 at 16:01:41 UTC, Chris wrote:
> If anyone is interested in starting something like FreeLing in D, please share your thoughts.

Hi. I am very interested in this topic (especially sentiment analysis), and slowly I am getting a bit more firepower. I started porting the Python version of the Stanford NLP API (the underlying code is Java) to D. It's not very complicated, but I have too much on my plate, so it goes slowly. I would be interested in working together on this with others, and I don't mind open-sourcing the building blocks (which are really the time-consuming bit). I hope to have some others from the D world helping me, so it should go a bit faster, although the NLP stuff might not be the first project we work on. Feel free to drop me an email: Laeeth At kaleidicassociates.com

Thanks.
Laeeth
Oct 20 2015
On Tuesday, 20 October 2015 at 18:43:54 UTC, Laeeth Isharc wrote:
> I am very interested in this topic (especially sentiment analysis), and slowly I am getting a bit more firepower.

What exactly is sentiment analysis and how do you go about it?
Oct 21 2015
On Wednesday, 21 October 2015 at 09:09:27 UTC, Chris wrote:
> What exactly is sentiment analysis and how do you go about it?

Determining whether the sentiment of a piece of text is positive, neutral, or negative. Twitter is currently a popular source of data in academia, because emoticons can serve as sufficiently accurate proxies for labels. Using pseudo-labelled tweets, one can then come up with a feature representation (e.g. bag of words, tf-idf) and use some sort of classifier (e.g. a linear SVM or softmax regression) to determine the sentiment of novel tweets. This is a pretty simple approach, and probably not hard to improve on.
Oct 21 2015
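The pipeline Henry describes can be sketched end to end in D. This is a minimal illustration under stated assumptions, not a reference implementation: a tweet is pseudo-labelled positive if it contains `:)` and negative if it contains `:(` (tweets with neither or both are skipped), features are a plain bag of words split on spaces, and the classifier is binary logistic regression (softmax regression with two classes) trained by full-batch gradient ascent. The tweets, learning rate and epoch count are made up for illustration, and the names `bagOfWords`, `Model` and `trainOnTweets` are hypothetical.

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : canFind;
import std.math : exp;
import std.stdio : writefln;
import std.uni : toLower;

/// Bag-of-words features: counts of lower-cased, space-separated tokens.
/// Emoticons are dropped so the model has to learn from the words alone.
double[string] bagOfWords(string text)
{
    double[string] counts;
    foreach (tok; text.toLower.splitter(' '))
        if (tok.length && tok != ":)" && tok != ":(")
            counts[tok] = counts.get(tok, 0.0) + 1;
    return counts;
}

/// Binary logistic regression, i.e. softmax regression with two classes.
struct Model
{
    double[string] weights;
    double bias = 0;

    /// Probability that a bag of words is positive.
    double score(double[string] counts)
    {
        double z = bias;
        foreach (word, c; counts)
            z += weights.get(word, 0.0) * c;
        return 1.0 / (1.0 + exp(-z));
    }

    double predict(string text) { return score(bagOfWords(text)); }
}

/// Pseudo-labels tweets by emoticon (":)" = 1, ":(" = 0), then fits the model
/// by full-batch gradient ascent on the log-likelihood.
Model trainOnTweets(string[] tweets, int epochs = 200, double lr = 0.5)
{
    double[string][] xs;
    double[] ys;
    foreach (t; tweets)
    {
        immutable pos = t.canFind(":)");
        immutable neg = t.canFind(":(");
        if (pos == neg)
            continue; // no emoticon, or both: no usable pseudo-label
        xs ~= bagOfWords(t);
        ys ~= pos ? 1.0 : 0.0;
    }

    Model m;
    foreach (epoch; 0 .. epochs)
    {
        double[string] grad;
        double gradBias = 0;
        foreach (i, x; xs)
        {
            immutable err = ys[i] - m.score(x); // y - p
            gradBias += err;
            foreach (word, c; x)
                grad[word] = grad.get(word, 0.0) + err * c;
        }
        foreach (word, g; grad)
            m.weights[word] = m.weights.get(word, 0.0) + lr * g;
        m.bias += lr * gradBias;
    }
    return m;
}

void main()
{
    auto model = trainOnTweets([
        "love this great day :)",
        "great game tonight :)",
        "awful traffic again :(",
        "hate this awful weather :(",
        "no emoticon here", // skipped: no pseudo-label
    ]);
    foreach (t; ["great day", "awful weather"])
        writefln("%-15s P(positive) = %.2f", t, model.predict(t));
}
```

With four training tweets the probabilities mean little; the point is the shape of the pipeline. Switching to tf-idf features would only change `bagOfWords`, and a linear SVM would replace the gradient step with a hinge-loss update.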
On Tuesday, 20 October 2015 at 16:01:41 UTC, Chris wrote:
> If anyone is interested in starting something like FreeLing in D, please share your thoughts.

Chris - please drop me a line. I am sure there are some things we could work together on over time.

auto domain="laeeth.com";
auto user="laeeth";
writefln(user~" "~domain);
Oct 23 2015
On Tuesday, 20 October 2015 at 15:49:18 UTC, bachmeier wrote:
> It's not my area, but are you thinking of something like Freeling? http://nlp.lsi.upc.edu/freeling/

I think that in order to make a new wrapper more popular, it should be released under the LGPL (not the GPL). Freeling is GPL. Is YamCha worth reviving in D? http://chasen.org/~taku/software/yamcha/
Oct 20 2015
On Wednesday, 21 October 2015 at 06:34:44 UTC, Eliatto wrote:
> Freeling is GPL.

The internet doesn't need another discussion about licensing. I'll just say that it depends. However, the most important factors when you currently have nothing to offer are:
- How complete is the library?
- How many people are using it?
- How easy is it to create the bindings?

From my conversations, Freeling does quite well on all counts. Maybe that won't work for you personally because you want to use it in a proprietary project. That's not a compelling reason to ignore it, though, as others might want to use it, and they may be willing to comply with the GPL.
Oct 21 2015