## digitalmars.D - Algorithms in the std lib

- bearophile (168/168) Feb 17 2009 Seeing how the Phobos is being worked on, I think it's a good moment to ...
- Andrei Alexandrescu (4/6) Feb 17 2009 [snip double pasted post]
- bearophile (5/6) Feb 17 2009 Sorry for the double posting.
- Daniel de Kok (15/20) Feb 17 2009 How about suffix array construction? It's a very underused useful data
- bearophile (5/6) Feb 17 2009 I agree that it's useful to have in a std lib (I have had to implement i...
- bearophile (4/5) Feb 17 2009 That code needs much more unittests, that test all corner cases you can ...
- Daniel de Kok (2/5) Feb 17 2009 Indeed, thanks for the suggestion!
- Andrei Alexandrescu (6/31) Feb 17 2009 Yah; in fact I already have a Trie implementation (prefix tree) hanging

Seeing how the Phobos is being worked on, I think it's a good moment to think about other things that it can eventually contain. In all the languages that I use I slowly create a lib of algorithms and data structures and similar stuff. Every programmer has special needs that require external modules, a standard lib can't contain everything. But often you can find two levels of needs: a more general one and a more specialized one. For example in the std lib of Python you can find lazy generators of permutations and combinators. They are often useful, but they can't be enough for people that work a lot with with combinatorics. Such people have to write or download a module with much more specific stuff. So I think the std lib of D may enjoy some more algorithms, to fulfill better the "general needs". Some of the following algorithms may be already present in Phobos and/or Tango, in such case please ignore them. The following stuff is already implemented in my dlibs, and I use it: bisect: Return the index where to insert item x in an array a, assuming a is sorted. isInSorted: Given a sorted array, an item, and an optional key callable (like the key of bisect()), used a binary search and answers true if the item is present, false otherwise. binomial(n, k) factorial combinations/xcombinations permutations/xpermutations subsets/xsubsets xlexicalPermutations apply (see lisp) mapApply(map an apply on an iterable) convexHull2D polygonCentroid RGB2HSV HSBV2RGB floodFill all/any (all or any of the items of an iterable are true or is true a given predicate on them) nextPow2 isPow2 powMod reverseBits Quickly reverse the bits in a ubyte/uint. partition/xpartition splits an iterable in sub-iterables of fixed length gcd xprimes (yields primes lazily at hyper speed) nrank median/medianObj thousands(n) (1000000 => "1_000_000") joinArray xsplit numSorted (smart sort foo1x comes before foo10x) allEqual filterAA (associative arrays) get a.get(2, "def") ==> "def" (for associative arrays) rotateLeft/rotateRight (for arrays) AAintersection maxs Given an iterable 'iter', return a new array of the equally maximum items of 'iter'. mins xgroupBy/xgroupBync This is really useful: it takes an iterable and an optional predicate. This iterable object yields consecutive key(item) and group arrays from the given iterable that have the same result or are all true or all false (it can also yield just the group arrays, or it can yield the index too). pop (removes last of an array/AA and returns it) prepend (for arrays, adds at the start) There is a refined function to flatten nested iterables. ---------------- The following stuff is present in my Python "bag of tricks", I'll probably want to translate them for D1 too: baseConvert (base[1,64] => base[1, 64]) To convert numbers to strings from/to any base. K-means IntervalMap (map of intervals, a kind of associative data structure) toposort (topological sort, already in Graph below, but also as a single external function for simpler usages). Odict (associative array where items are also ordered according to the insertion order, they are in a double linked list) mean mode stdev digitalLine Brensenham line algorithm, returns an iterator of (x,y) coords (not useful for the screen) digitalCircle segmentSegmentIntersection triangleContains (predicate, is point into triangle?) A directed Graph data structure, the class contains most common graph algorithms. http://sourceforge.net/projects/pynetwork/ (I have already partially implemented this in D1). ----------------- You may want to add some things to this list, or to remove things you think aren't useful enough for lot of people. Bye, bearophile IntervalMap (map of intervals, a kind of associative data structure) toposort (topological sort, already in Graph below, but also as a single external function for simpler usages). Odict (associative array where items are also ordered according to the insertion order, they are in a double linked list) mean mode stdev digitalLine Brensenham line algorithm, returns an iterator of (x,y) coords (not useful for the screen) digitalCircle segmentSegmentIntersection triangleContains (predicate, is point into triangle?) A directed Graph data structure, the class contains most common graph algorithms. http://sourceforge.net/projects/pynetwork/ (I have already partially implemented this in D1). ----------------- You may want to add some things to this list, or to remove things you think aren't useful enough for lot of people.Soggetto: Algorithms in the std lib Seeing how the Phobos is being worked on, I think it's a good moment to think about other things that it can eventually contain. In all the languages that I use I slowly create a lib of algorithms and data structures and similar stuff. Every programmer has special needs that require external modules, a standard lib can't contain everything. But often you can find two levels of needs: a more general one and a more specialized one. For example in the std lib of Python you can find lazy generators of permutations and combinators. They are often useful, but they can't be enough for people that work a lot with with combinatorics. Such people have to write or download a module with much more specific stuff. So I think the std lib of D may enjoy some more algorithms, to fulfill better the "general needs". Some of the following algorithms may be already present in Phobos and/or Tango, in such case please ignore them. The following stuff is already implemented in my dlibs, and I use it: bisect: Return the index where to insert item x in an array a, assuming a is sorted. isInSorted: Given a sorted array, an item, and an optional key callable (like the key of bisect()), used a binary search and answers true if the item is present, false otherwise. binomial(n, k) factorial combinations/xcombinations permutations/xpermutations subsets/xsubsets xlexicalPermutations apply (see lisp) mapApply(map an apply on an iterable) convexHull2D polygonCentroid RGB2HSV HSBV2RGB floodFill all/any (all or any of the items of an iterable are true or is true a given predicate on them) nextPow2 isPow2 powMod reverseBits Quickly reverse the bits in a ubyte/uint. partition/xpartition splits an iterable in sub-iterables of fixed length gcd xprimes (yields primes lazily at hyper speed) nrank median/medianObj thousands(n) (1000000 => "1_000_000") joinArray xsplit numSorted (smart sort foo1x comes before foo10x) allEqual filterAA (associative arrays) get a.get(2, "def") ==> "def" (for associative arrays) rotateLeft/rotateRight (for arrays) AAintersection maxs Given an iterable 'iter', return a new array of the equally maximum items of 'iter'. mins xgroupBy/xgroupBync This is really useful: it takes an iterable and an optional predicate. This iterable object yields consecutive key(item) and group arrays from the given iterable that have the same result or are all true or all false (it can also yield just the group arrays, or it can yield the index too). pop (removes last of an array/AA and returns it) prepend (for arrays, adds at the start) There is a refined function to flatten nested iterables. ---------------- The following stuff is present in my Python "bag of tricks", I'll write some of this for D1 too: baseConvert (base[1,64] => base[1, 64]) To convert numbers to strings from/to any base. K-means IntervalMap (map of intervals, a kind of associative data structure) toposort (topological sort, already in Graph below, but also as a single external function for simpler usages). Odict (associative array where items are also ordered according to the insertion order, they are in a double linked list) mean mode stdev digitalLine Brensenham line algorithm, returns an iterator of (x,y) coords (not useful for the screen) digitalCircle segmentSegmentIntersection triangleContains (predicate, is point into triangle?) A directed Graph data structure, the class contains most common graph algorithms. http://sourceforge.net/projects/pynetwork/ (I have already partially implemented this in D1). ----------------- You may want to add some things to this list, or to remove things you think aren't useful enough for lot of people. Bye, bearophile

Feb 17 2009

bearophile wrote:Seeing how the Phobos is being worked on, I think it's a good moment to think about other things that it can eventually contain. In all the languages that I use I slowly create a lib of algorithms and data structures and similar stuff.[snip double pasted post] Ok, I can implement all that... but twice?!? :o) Andrei

Feb 17 2009

Andrei Alexandrescu:Ok, I can implement all that... but twice?!? :o)Sorry for the double posting. And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-) Bye, bearophile

Feb 17 2009

On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS lycos.com> wrote:Andrei Alexandrescu:How about suffix array construction? It's a very underused useful data structure where you compose a parallel array to data array that is sorted by suffix. This is often used in natural language processing to be able to quickly count the frequency of arbitrary long n-grams without having to store counts of n-grams in a large hash map or traversing the data array in linear order. This is a probably pretty lame implementation that I made as one of my first tiny D programs (I have only started to learn D in tiny bits): http://static.danieldk.org/snippets/d/sarr.d Take care, Daniel (P.s. there are some algorithms that can construct suffix arrays faster than a regular sort, but they usually have a lot of requirements, like perfect hashing of all the elements.)Ok, I can implement all that... but twice?!? :o)Sorry for the double posting. And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-)

Feb 17 2009

Daniel de Kok:How about suffix array construction?I agree that it's useful to have in a std lib (I have had to implement it some times, but not in D yet). (In those dlibs there are other things like Kd-trees and Range Minimum Query algorithms and data structures, that despite being nice and all are probably not useful to enough people). Bye, bearophile

Feb 17 2009

Daniel de Kok:http://static.danieldk.org/snippets/d/sarr.dThat code needs much more unittests, that test all corner cases you can think of too. Bye, bearophile

Feb 17 2009

On Tue, Feb 17, 2009 at 7:05 PM, bearophile <bearophileHUGS lycos.com> wrote:Daniel de Kok:Indeed, thanks for the suggestion!http://static.danieldk.org/snippets/d/sarr.dThat code needs much more unittests, that test all corner cases you can think of too.

Feb 17 2009

Daniel de Kok wrote:On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS lycos.com> wrote:Yah; in fact I already have a Trie implementation (prefix tree) hanging on around std. I'm just afraid to bite the bullet in choosing a container overall design. I'd need more experience, and currently bugs in dmd's copy constructors prevent most experimentation with value types. AndreiAndrei Alexandrescu:How about suffix array construction? It's a very underused useful data structure where you compose a parallel array to data array that is sorted by suffix. This is often used in natural language processing to be able to quickly count the frequency of arbitrary long n-grams without having to store counts of n-grams in a large hash map or traversing the data array in linear order. This is a probably pretty lame implementation that I made as one of my first tiny D programs (I have only started to learn D in tiny bits): http://static.danieldk.org/snippets/d/sarr.d Take care, Daniel (P.s. there are some algorithms that can construct suffix arrays faster than a regular sort, but they usually have a lot of requirements, like perfect hashing of all the elements.)Ok, I can implement all that... but twice?!? :o)Sorry for the double posting. And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-)

Feb 17 2009