digitalmars.D.learn - counting words
- Benjamin Thaut (8/8) Jun 28 2013 I'm currently making a few tests with std.algorithm, std.range, etc
- Brad Anderson (9/15) Jun 28 2013 If you don't mind sorting twice:
- Brad Anderson (12/28) Jun 28 2013 Like so:
- Benjamin Thaut (6/35) Jun 28 2013 I figured something like this by now too. Thank you.
- Brad Anderson (6/48) Jun 28 2013 Just replacing your original word list with the sorted list
- Benjamin Thaut (5/47) Jun 28 2013 Ok, thank you very much
- monarch_dodra (7/23) Jun 28 2013 It's just missing the construction scheme for "words". I had this:
- bearophile (8/9) Jun 28 2013 Do you have Phobos commit rights? If the answer is positive then
- monarch_dodra (2/11) Jun 28 2013 The answer is negative though :(
- Brad Anderson (5/19) Jun 28 2013 95 merged phobos pull requests (109 total). Long overdue, I'd
- monarch_dodra (14/34) Jun 28 2013 How did you come to that number? Total amount of pulls
- Brad Anderson (9/28) Jun 28 2013 I counted closed (I assumed most were merged). Github makes it
I'm currently making a few tests with std.algorithm, std.range, etc.
I have an array of words. Is it possible to count how often each word
is contained in the array and then sort the array by the count of the
individual words by chaining ranges (e.g. without using a foreach loop
+ hashmap)?

--
Kind Regards
Benjamin Thaut
Jun 28 2013
On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
> I have an array of words. Is it possible to count how often each word
> is contained in the array and then sort the array by the count of the
> individual words by chaining ranges (e.g. without using a foreach
> loop + hashmap)?

If you don't mind sorting twice:

    words.sort()
         .group()
         .array()
         .sort!((a, b) => a[1] > b[1])
         .map!(a => a[0])
         .copy(words);

You could also do it with a hashmap to keep the count.
Jun 28 2013
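[Editor's note: a minimal, self-contained version of the pipeline above. The sample data and the byFreq name are made up for illustration, and the final .array is used instead of copy(words) so the unique words land in a fresh array.]

    import std.algorithm : group, map, sort;
    import std.array : array;
    import std.stdio : writeln;

    void main()
    {
        string[] words = ["b", "a", "c", "a", "b", "a"];

        auto byFreq = words.sort()                        // group needs equal words adjacent
                           .group()                       // (word, count) tuples
                           .array()                       // buffer so it can be sorted again
                           .sort!((a, b) => a[1] > b[1])  // most frequent first
                           .map!(a => a[0])               // keep just the words
                           .array();

        writeln(byFreq);  // ["a", "b", "c"] -- counts were a:3, b:2, c:1
    }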
On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
> If you don't mind sorting twice:
>
>     words.sort()
>          .group()
>          .array()
>          .sort!((a, b) => a[1] > b[1])
>          .map!(a => a[0])
>          .copy(words);
>
> You could also do it with a hashmap to keep the count.

Like so:

    size_t[string] dic;
    words.map!((w) { ++dic[w.idup]; return w; })
         .array // eager (so dic is filled first), sortable
         .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; })
         .uniq
         .copy(words);

It's a bit ugly and abuses side effects with the hash map. The order
will differ from the other program when words have identical counts.
Jun 28 2013
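[Editor's note: a sketch of the same idea with the tie-break spelled out, so that ties on the count fall back to lexical order and identical words end up adjacent before uniq. The sample data, the byFreq name, and the adjusted predicate are my own; the original post copies the result back over words instead of collecting it into a new array.]

    import std.algorithm : map, sort, uniq;
    import std.array : array;
    import std.stdio : writeln;

    void main()
    {
        string[] words = ["b", "a", "c", "a", "b", "a"];

        size_t[string] dic;
        auto byFreq = words.map!((w) { ++dic[w]; return w; })
                           .array                          // eager, so dic is filled first
                           .sort!((a, b) => dic[a] != dic[b] ? dic[a] > dic[b]
                                                             : a < b)
                           .uniq
                           .array();

        writeln(byFreq);  // ["a", "b", "c"]
        writeln(dic);     // the counts, e.g. ["a":3, "b":2, "c":1] (AA order is unspecified)
    }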
On 28.06.2013 18:42, Brad Anderson wrote:
> Like so:
>
>     size_t[string] dic;
>     words.map!((w) { ++dic[w.idup]; return w; })
>          .array // eager (so dic is filled first), sortable
>          .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; })
>          .uniq
>          .copy(words);

I figured out something like this by now too. Thank you.

But I don't quite understand what the copy is for at the end?

--
Kind Regards
Benjamin Thaut
Jun 28 2013
On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
> I figured out something like this by now too. Thank you.
>
> But I don't quite understand what the copy is for at the end?

Just replacing your original word list with the sorted list (which I
just realized is wrong because it will leave a bunch of words on the
end, oops). You could .array it instead to get a new array, or just
store the range with auto and consume that where needed with no extra
array allocation.
Jun 28 2013
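[Editor's note: a small illustration of the difference described above; the names and data are made up. copy writes into an existing buffer and only overwrites as many elements as the source range yields, while array allocates a fresh array of exactly the right length.]

    import std.algorithm : copy, uniq;
    import std.array : array;
    import std.stdio : writeln;

    void main()
    {
        auto src = ["a", "a", "b", "c", "c"];
        auto dst = ["x", "x", "x", "x", "x"];

        auto fresh = src.uniq.array;  // new allocation, exactly three elements
        src.uniq.copy(dst);           // writes only three elements into dst

        writeln(fresh);  // ["a", "b", "c"]
        writeln(dst);    // ["a", "b", "c", "x", "x"] -- the tail keeps its old values
    }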
On 28.06.2013 19:01, Brad Anderson wrote:
> Just replacing your original word list with the sorted list (which I
> just realized is wrong because it will leave a bunch of words on the
> end, oops). You could .array it instead to get a new array, or just
> store the range with auto and consume that where needed with no extra
> array allocation.

Ok, thank you very much.

--
Kind Regards
Benjamin Thaut
Jun 28 2013
On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
> If you don't mind sorting twice:
>
>     words.sort()
>          .group()
>          .array()
>          .sort!((a, b) => a[1] > b[1])
>          .map!(a => a[0])
>          .copy(words);
>
> You could also do it with a hashmap to keep the count.

It's just missing the construction scheme for "words". I had this:

    text
        .splitter()
        .reduce!((words, w) => ++words[w], words)(int[string]);

But alas... reduce's signature is not ctfe-able :(

Well, I've had a PR open for this for about 8 months now...
Jun 28 2013
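[Editor's note: for what it's worth, here is one way to build the counts without an explicit foreach, using reduce's (seed, range) overload in its plain, non-UFCS form; the sample text and names are made up for illustration. The seed comes before the range in that overload, so it does not chain with UFCS the way the snippet above tries to.]

    import std.algorithm : reduce, splitter;
    import std.stdio : writeln;

    void main()
    {
        auto text = "the quick fox and the lazy dog and the cat";

        // Seed-first overload: reduce!(fun)(seed, range).
        auto counts = reduce!((c, w) { ++c[w]; return c; })
                             ((int[string]).init, text.splitter);

        writeln(counts);  // e.g. ["the":3, "and":2, ...] (AA order is unspecified)
    }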
monarch_dodra:
> Well, I've had a PR open for this for about 8 months now...

Do you have Phobos commit rights? If the answer is positive then
perhaps you can review and commit some Phobos code written by Andrei
and he can do the same with yours. Today it seems the major
development bottleneck is the review queue rather than a shortage of
written patches :-)

Bye,
bearophile
Jun 28 2013
On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
> Do you have Phobos commit rights? If the answer is positive then
> perhaps you can review and commit some Phobos code written by Andrei
> and he can do the same with yours.

The answer is negative though :(
Jun 28 2013
On Friday, 28 June 2013 at 18:53:44 UTC, monarch_dodra wrote:
> The answer is negative though :(

95 merged phobos pull requests (109 total). Long overdue, I'd say. I
don't know if there is really a formal process for getting write
privileges, but if any of the people who decide are reading this, I
nominate monarch_dodra.
Jun 28 2013
On Friday, 28 June 2013 at 19:13:28 UTC, Brad Anderson wrote:
> 95 merged phobos pull requests (109 total). Long overdue, I'd say. I
> don't know if there is really a formal process for getting write
> privileges, but if any of the people who decide are reading this, I
> nominate monarch_dodra.

How did you come to that number? Total amount of pulls *submitted*? I
count 95 *closed* pull requests, but I think only about 50% of them
ever made it through. I'm not sure how to check, though (apart from
counting them 1 by 1...).

I'm unsure I'd be totally comfortable pulling inside phobos, but it
*is* true that the queue is just getting longer and longer. There are
some pulls just sitting there, submitted by trusted people (with pull
rights), that are validated and reviewed, but are just waiting for a
puller to do the deed. There need to be a few more reviewers; the ones
we have are doing a great job, but are overworked.

But I wouldn't want to hijack this thread, nor is learn really the
place to talk about that.
Jun 28 2013
On Friday, 28 June 2013 at 19:31:22 UTC, monarch_dodra wrote:
> How did you come to that number? Total amount of pulls *submitted*? I
> count 95 *closed* pull requests, but I think only about 50% of them
> ever made it through.

I counted closed (I assumed most were merged). Github makes it hard to
glean data from pull requests. Allow me to share Exhibit B though:

https://github.com/D-Programming-Language/phobos/graphs/contributors?from=2012-06-21&to=2013-06-21&type=c

Second most prolific contributor by commits to phobos over the last
year.

I am nominating you to do extra bureaucratic work though, so I'm kind
of being an asshole when you think about it. :P
Jun 28 2013