www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - counting words

reply Benjamin Thaut <code benjamin-thaut.de> writes:
I'm currently making a few tests with std.algorithm, std.range, etc

I have a arry of words. Is it possible to count how often each word is 
contained in the array and then sort the array by the count of the 
individual words by chaining ranges? (e.g. without using a foreach loop 
+ hashmap)?

-- 
Kind Regards
Benjamin Thaut
Jun 28 2013
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
 I'm currently making a few tests with std.algorithm, std.range, 
 etc

 I have a arry of words. Is it possible to count how often each 
 word is contained in the array and then sort the array by the 
 count of the individual words by chaining ranges? (e.g. without 
 using a foreach loop + hashmap)?
If you don't mind sorting twice: words.sort() .group() .array() .sort!((a, b)=> a[1] > b[1]) .map!(a => a[0]) .copy(words); You could also do it with a hashmap to keep the count.
Jun 28 2013
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
 On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
 I'm currently making a few tests with std.algorithm, 
 std.range, etc

 I have a arry of words. Is it possible to count how often each 
 word is contained in the array and then sort the array by the 
 count of the individual words by chaining ranges? (e.g. 
 without using a foreach loop + hashmap)?
If you don't mind sorting twice: words.sort() .group() .array() .sort!((a, b)=> a[1] > b[1]) .map!(a => a[0]) .copy(words); You could also do it with a hashmap to keep the count.
Like so: size_t[string] dic; words.map!((w) { ++dic[w.idup]; return w; }) .array // eager (so dic is filled first), sortable .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; }) .uniq .copy(words); It's a bit ugly and abuses side effects with the hash map. The order will differ from the other program when words have identical counts.
Jun 28 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 28.06.2013 18:42, schrieb Brad Anderson:
 On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
 On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
 I'm currently making a few tests with std.algorithm, std.range, etc

 I have a arry of words. Is it possible to count how often each word
 is contained in the array and then sort the array by the count of the
 individual words by chaining ranges? (e.g. without using a foreach
 loop + hashmap)?
If you don't mind sorting twice: words.sort() .group() .array() .sort!((a, b)=> a[1] > b[1]) .map!(a => a[0]) .copy(words); You could also do it with a hashmap to keep the count.
Like so: size_t[string] dic; words.map!((w) { ++dic[w.idup]; return w; }) .array // eager (so dic is filled first), sortable .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; }) .uniq .copy(words); It's a bit ugly and abuses side effects with the hash map. The order will differ from the other program when words have identical counts.
I figured something like this by now too. Thank you. But I don't quite understand what the copy is for at the end? -- Kind Regards Benjamin Thaut
Jun 28 2013
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
 Am 28.06.2013 18:42, schrieb Brad Anderson:
 On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
 On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
 I'm currently making a few tests with std.algorithm, 
 std.range, etc

 I have a arry of words. Is it possible to count how often 
 each word
 is contained in the array and then sort the array by the 
 count of the
 individual words by chaining ranges? (e.g. without using a 
 foreach
 loop + hashmap)?
If you don't mind sorting twice: words.sort() .group() .array() .sort!((a, b)=> a[1] > b[1]) .map!(a => a[0]) .copy(words); You could also do it with a hashmap to keep the count.
Like so: size_t[string] dic; words.map!((w) { ++dic[w.idup]; return w; }) .array // eager (so dic is filled first), sortable .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; }) .uniq .copy(words); It's a bit ugly and abuses side effects with the hash map. The order will differ from the other program when words have identical counts.
I figured something like this by now too. Thank you. But I don't quite understand what the copy is for at the end?
Just replacing your original word list with the sorted list (which I just realized is wrong because it will leave a bunch of words on the end, oops). You could .array it instead to get a new array or just store the range with auto and consume that where needed with no extra array allocation.
Jun 28 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 28.06.2013 19:01, schrieb Brad Anderson:
 On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
 Am 28.06.2013 18:42, schrieb Brad Anderson:
 On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
 On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
 I'm currently making a few tests with std.algorithm, std.range, etc

 I have a arry of words. Is it possible to count how often each word
 is contained in the array and then sort the array by the count of the
 individual words by chaining ranges? (e.g. without using a foreach
 loop + hashmap)?
If you don't mind sorting twice: words.sort() .group() .array() .sort!((a, b)=> a[1] > b[1]) .map!(a => a[0]) .copy(words); You could also do it with a hashmap to keep the count.
Like so: size_t[string] dic; words.map!((w) { ++dic[w.idup]; return w; }) .array // eager (so dic is filled first), sortable .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; }) .uniq .copy(words); It's a bit ugly and abuses side effects with the hash map. The order will differ from the other program when words have identical counts.
I figured something like this by now too. Thank you. But I don't quite understand what the copy is for at the end?
Just replacing your original word list with the sorted list (which I just realized is wrong because it will leave a bunch of words on the end, oops). You could .array it instead to get a new array or just store the range with auto and consume that where needed with no extra array allocation.
Ok, thank you very much -- Kind Regards Benjamin Thaut
Jun 28 2013
prev sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
 On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
 I'm currently making a few tests with std.algorithm, 
 std.range, etc

 I have a arry of words. Is it possible to count how often each 
 word is contained in the array and then sort the array by the 
 count of the individual words by chaining ranges? (e.g. 
 without using a foreach loop + hashmap)?
If you don't mind sorting twice: words.sort() .group() .array() .sort!((a, b)=> a[1] > b[1]) .map!(a => a[0]) .copy(words); You could also do it with a hashmap to keep the count.
It's just missing the construction scheme for "words". I had this: text .splitter(); .reduce!((words, w)=>++words[w], words)(int[string]); But alas... reduce's signature is not ctfe-able :( Well, I've had a PR open for this for about 8 months now...
Jun 28 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 Well, I've had a PR open for this for about 8 months now...
Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours. Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-) Bye, bearophile
Jun 28 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
 monarch_dodra:

 Well, I've had a PR open for this for about 8 months now...
Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours. Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-) Bye, bearophile
The answer is negative though :(
Jun 28 2013
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 28 June 2013 at 18:53:44 UTC, monarch_dodra wrote:
 On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
 monarch_dodra:

 Well, I've had a PR open for this for about 8 months now...
Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours. Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-) Bye, bearophile
The answer is negative though :(
95 merged phobos pull requests (109 total). Long overdue, I'd say. I don't know if there is really a formal process to getting write privileges but if any of the people who decide are reading this I nominate monarch_dodra.
Jun 28 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 28 June 2013 at 19:13:28 UTC, Brad Anderson wrote:
 On Friday, 28 June 2013 at 18:53:44 UTC, monarch_dodra wrote:
 On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
 monarch_dodra:

 Well, I've had a PR open for this for about 8 months now...
Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours. Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-) Bye, bearophile
The answer is negative though :(
95 merged phobos pull requests (109 total). Long overdue, I'd say. I don't know if there is really a formal process to getting write privileges but if any of the people who decide are reading this I nominate monarch_dodra.
How did you come to that number? Total amount of pulls *submitted* ? I count 95 *closed* pull requests, but I think only about 50% of them ever made it through? I'm not sure how to check though (apart from counting 1 by 1...) I'm unsure I'd be totally comfortable pulling inside phobos, but it *is* true that the queue is just getting longer and longer. There are some pulls just sitting there, submitted by trusted people (with pull rights), that are validated and reviewed, but are just waiting for a puller to do the deed. There needs to be a few more reviewers, the ones we have are doing a great job, but are overworked. But I wouldn't want to hijack this thread, nor is learn really the place to talk about that.
Jun 28 2013
parent "Brad Anderson" <eco gnuk.net> writes:
On Friday, 28 June 2013 at 19:31:22 UTC, monarch_dodra wrote:
 On Friday, 28 June 2013 at 19:13:28 UTC, Brad Anderson wrote:
 95 merged phobos pull requests (109 total).  Long overdue, I'd 
 say.  I don't know if there is really a formal process to 
 getting write privileges but if any of the people who decide 
 are reading this I nominate monarch_dodra.
How did you come to that number? Total amount of pulls *submitted* ? I count 95 *closed* pull requests, but I think only about 50% of them ever made it through? I'm not sure how to check though (apart from counting 1 by 1...) I'm unsure I'd be totally comfortable pulling inside phobos, but it *is* true that the queue is just getting longer and longer. There are some pulls just sitting there, submitted by trusted people (with pull rights), that are validated and reviewed, but are just waiting for a puller to do the deed. There needs to be a few more reviewers, the ones we have are doing a great job, but are overworked. But I wouldn't want to hijack this thread, nor is learn really the place to talk about that.
I counted closed (I assumed most were merged). Github makes it hard to glean data from pull requests. Allow me to share Exhibit B though: https://github.com/D-Programming-Language/phobos/graphs/contributors?from=2012-06-21&to=2013-06-21&type=c Second most prolific contributor by commits to phobos over the last year. I am nominating you to do extra bureaucratic work though so I'm kind of being an asshole when you think about it. :P
Jun 28 2013