digitalmars.D - Algorithms in the std lib

bearophile (168/168) Feb 17 2009 Seeing how the Phobos is being worked on, I think it's a good moment to ...

Andrei Alexandrescu (4/6) Feb 17 2009 [snip double pasted post]

bearophile (5/6) Feb 17 2009 Sorry for the double posting.

Daniel de Kok (15/20) Feb 17 2009 How about suffix array construction? It's a very underused useful data

bearophile (5/6) Feb 17 2009 I agree that it's useful to have in a std lib (I have had to implement i...
bearophile (4/5) Feb 17 2009 That code needs much more unittests, that test all corner cases you can ...

Daniel de Kok (2/5) Feb 17 2009 Indeed, thanks for the suggestion!

Andrei Alexandrescu (6/31) Feb 17 2009 Yah; in fact I already have a Trie implementation (prefix tree) hanging

bearophile <bearophileHUGS lycos.com> writes:

Seeing how the Phobos is being worked on, I think it's a good moment to think
about other things that it can eventually contain.
In all the languages that I use I slowly create a lib of algorithms and data
structures and similar stuff.

Every programmer has special needs that require external modules, a standard
lib can't contain everything. But often you can find two levels of needs: a
more general one and a more specialized one. For example in the std lib of
Python you can find lazy generators of permutations and combinators. They are
often useful, but they can't be enough for people that work a lot with with
combinatorics. Such people have to write or download a module with much more
specific stuff.

So I think the std lib of D may enjoy some more algorithms, to fulfill better
the "general needs".
Some of the following algorithms may be already present in Phobos and/or Tango,
in such case please ignore them.

The following stuff is already implemented in my dlibs, and I use it:

bisect:
  Return the index where to insert item x in an array a, assuming a is sorted. 

isInSorted:
  Given a sorted array, an item, and an optional key callable (like the key of
bisect()), used a binary search and answers true if the item is present, false
otherwise.

binomial(n, k)
factorial
combinations/xcombinations
permutations/xpermutations
subsets/xsubsets
xlexicalPermutations
apply (see lisp)
mapApply(map an apply on an iterable)

convexHull2D
polygonCentroid
RGB2HSV
HSBV2RGB
floodFill

all/any
  (all or any of the items of an iterable are true or is true a given predicate
on them)

nextPow2
isPow2
powMod

reverseBits
  Quickly reverse the bits in a ubyte/uint. 


partition/xpartition
  splits an iterable in sub-iterables of fixed length

gcd

xprimes (yields primes lazily at hyper speed)

nrank
median/medianObj

thousands(n)  (1000000 => "1_000_000")
joinArray
xsplit
numSorted (smart sort foo1x comes before foo10x)

allEqual
filterAA (associative arrays)
get a.get(2, "def") ==> "def" (for associative arrays)

rotateLeft/rotateRight (for arrays)
AAintersection

maxs
  Given an iterable 'iter', return a new array of the equally maximum items of
'iter'.
mins

xgroupBy/xgroupBync
  This is really useful: it takes an iterable and an optional predicate. This
iterable object yields consecutive key(item) and group arrays from the given
iterable that have the same result or are all true or all false (it can also
yield just the group arrays, or it can yield the index too).

pop (removes last of an array/AA and returns it)
prepend (for arrays, adds at the start)

There is a refined function to flatten nested iterables.

----------------

The following stuff is present in my Python "bag of tricks", I'll probably want
to translate them for D1 too:

baseConvert (base[1,64] => base[1, 64])
To convert numbers to strings from/to any base.

K-means

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single
external function for simpler usages).

Odict (associative array where items are also ordered according to the
insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful
for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph
algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think
aren't useful enough for lot of people.

Bye,
bearophile

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single
external function for simpler usages).

Odict (associative array where items are also ordered according to the
insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful
for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph
algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think
aren't useful enough for lot of people.Soggetto: Algorithms in the std lib

Seeing how the Phobos is being worked on, I think it's a good moment to think
about other things that it can eventually contain.
In all the languages that I use I slowly create a lib of algorithms and data
structures and similar stuff.

Every programmer has special needs that require external modules, a standard
lib can't contain everything. But often you can find two levels of needs: a
more general one and a more specialized one. For example in the std lib of
Python you can find lazy generators of permutations and combinators. They are
often useful, but they can't be enough for people that work a lot with with
combinatorics. Such people have to write or download a module with much more
specific stuff.

So I think the std lib of D may enjoy some more algorithms, to fulfill better
the "general needs".
Some of the following algorithms may be already present in Phobos and/or Tango,
in such case please ignore them.

The following stuff is already implemented in my dlibs, and I use it:

bisect:
  Return the index where to insert item x in an array a, assuming a is sorted. 

isInSorted:
  Given a sorted array, an item, and an optional key callable (like the key of
bisect()), used a binary search and answers true if the item is present, false
otherwise.

binomial(n, k)
factorial
combinations/xcombinations
permutations/xpermutations
subsets/xsubsets
xlexicalPermutations
apply (see lisp)
mapApply(map an apply on an iterable)

convexHull2D
polygonCentroid
RGB2HSV
HSBV2RGB
floodFill

all/any
  (all or any of the items of an iterable are true or is true a given predicate
on them)

nextPow2
isPow2
powMod

reverseBits
  Quickly reverse the bits in a ubyte/uint. 


partition/xpartition
  splits an iterable in sub-iterables of fixed length

gcd

xprimes (yields primes lazily at hyper speed)

nrank
median/medianObj

thousands(n)  (1000000 => "1_000_000")
joinArray
xsplit
numSorted (smart sort foo1x comes before foo10x)

allEqual
filterAA (associative arrays)
get a.get(2, "def") ==> "def" (for associative arrays)

rotateLeft/rotateRight (for arrays)
AAintersection

maxs
  Given an iterable 'iter', return a new array of the equally maximum items of
'iter'.
mins

xgroupBy/xgroupBync
  This is really useful: it takes an iterable and an optional predicate. This
iterable object yields consecutive key(item) and group arrays from the given
iterable that have the same result or are all true or all false (it can also
yield just the group arrays, or it can yield the index too).

pop (removes last of an array/AA and returns it)
prepend (for arrays, adds at the start)

There is a refined function to flatten nested iterables.

----------------

The following stuff is present in my Python "bag of tricks", I'll write some of
this for D1 too:

baseConvert (base[1,64] => base[1, 64])
To convert numbers to strings from/to any base.

K-means

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single
external function for simpler usages).

Odict (associative array where items are also ordered according to the
insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful
for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph
algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think
aren't useful enough for lot of people.

Bye,
bearophile

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Seeing how the Phobos is being worked on, I think it's a good moment to think
about other things that it can eventually contain.
 In all the languages that I use I slowly create a lib of algorithms and data
structures and similar stuff.

[snip double pasted post]

Ok, I can implement all that... but twice?!? :o)

Andrei

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 Ok, I can implement all that... but twice?!? :o)

Sorry for the double posting.
And regarding the implementation, first of all it can be seen if you and other
people think it's generally useful stuff. Later I'll offer some/most
implementations :-)

Bye,
bearophile

Feb 17 2009

Daniel de Kok <me danieldk.org> writes:

On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrei Alexandrescu:
 Ok, I can implement all that... but twice?!? :o)

 Sorry for the double posting.
 And regarding the implementation, first of all it can be seen if you and other
people think it's generally
 useful stuff. Later I'll offer some/most implementations :-)

How about suffix array construction? It's a very underused useful data
structure where you compose a parallel array to data array that is
sorted by suffix. This is often used in natural language processing to
be able to quickly count the frequency of arbitrary long n-grams
without having to store counts of n-grams in a large hash map or
traversing the data array in linear order.

This is a probably pretty lame implementation that I made as one of my
first tiny D programs (I have only started to learn D in tiny bits):

http://static.danieldk.org/snippets/d/sarr.d

Take care,
Daniel

(P.s. there are some algorithms that can construct suffix arrays
faster than a regular sort, but they usually have a lot of
requirements, like perfect hashing of all the elements.)

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Daniel de Kok:
 How about suffix array construction?

I agree that it's useful to have in a std lib (I have had to implement it some
times, but not in D yet).
(In those dlibs there are other things like Kd-trees and Range Minimum Query
algorithms and data structures, that despite being nice and all are probably
not useful to enough people).

Bye,
bearophile

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Daniel de Kok:
 http://static.danieldk.org/snippets/d/sarr.d

That code needs much more unittests, that test all corner cases you can think
of too.

Bye,
bearophile

Feb 17 2009

Daniel de Kok <me danieldk.org> writes:

On Tue, Feb 17, 2009 at 7:05 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Daniel de Kok:
 http://static.danieldk.org/snippets/d/sarr.d

 That code needs much more unittests, that test all corner cases you can think
of too.

Indeed, thanks for the suggestion!

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Daniel de Kok wrote:
 On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrei Alexandrescu:
 Ok, I can implement all that... but twice?!? :o)

 Sorry for the double posting.
 And regarding the implementation, first of all it can be seen if you and other
people think it's generally
 useful stuff. Later I'll offer some/most implementations :-)

 
 How about suffix array construction? It's a very underused useful data
 structure where you compose a parallel array to data array that is
 sorted by suffix. This is often used in natural language processing to
 be able to quickly count the frequency of arbitrary long n-grams
 without having to store counts of n-grams in a large hash map or
 traversing the data array in linear order.
 
 This is a probably pretty lame implementation that I made as one of my
 first tiny D programs (I have only started to learn D in tiny bits):
 
 http://static.danieldk.org/snippets/d/sarr.d
 
 Take care,
 Daniel
 
 (P.s. there are some algorithms that can construct suffix arrays
 faster than a regular sort, but they usually have a lot of
 requirements, like perfect hashing of all the elements.)

Yah; in fact I already have a Trie implementation (prefix tree) hanging 
on around std. I'm just afraid to bite the bullet in choosing a 
container overall design. I'd need more experience, and currently bugs 
in dmd's copy constructors prevent most experimentation with value types.


Andrei

Feb 17 2009

D Programming

C/C++ Programming

Other

digitalmars.D - Algorithms in the std lib