digitalmars.D.learn - Implementing a cache
- qznc (16/16) Jul 02 2016 I want to implement some caching for HTTP GET requests. Basically
- Lodovico Giaretta (10/26) Jul 02 2016 If I understand correctly, you want fast access based on the URL,
- qznc (4/7) Jul 03 2016 For now, I settled for a sorted array of cache entries plus an AA
- Lodovico Giaretta (6/14) Jul 03 2016 To avoid the ~= operator and reallocations of the cache array,
- Jack Stouffer (42/45) Jul 03 2016 Here's a simple ring buffer I use, feel free to take it
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (15/17) Jul 03 2016 No, you just need the hash of the key in the object for many
- Steven Schveighoffer (16/29) Jul 05 2016 Yes, it is annoying to have to deal with creating a fake element. In
I want to implement some caching for HTTP GET requests. Basically a map of URL to content. A cache needs some additional meta data (size, age, etc). There seem to be two basic data structures available: Associative array (AA) or red black tree (RBT). With AA cache eviction is inefficient. It requires to sort the keys of the AA with respect to a field in the value. Stack overflow says "use RBT instead" [0]. With RBT retrieving something is inefficient. To get an entry, we need to create an fake entry with the key and get a range of entries 'equal' to the fake one? Or can I exploit some "find on sorted range" algorithm somewhere? Alternatively, any better idea to implement the cache? I guess there is no off-the-shelf/dub solution. [0] https://stackoverflow.com/questions/10060625/how-to-sort-associative-arrays
Jul 02 2016
On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:I want to implement some caching for HTTP GET requests. Basically a map of URL to content. A cache needs some additional meta data (size, age, etc). There seem to be two basic data structures available: Associative array (AA) or red black tree (RBT). With AA cache eviction is inefficient. It requires to sort the keys of the AA with respect to a field in the value. Stack overflow says "use RBT instead" [0]. With RBT retrieving something is inefficient. To get an entry, we need to create an fake entry with the key and get a range of entries 'equal' to the fake one? Or can I exploit some "find on sorted range" algorithm somewhere? Alternatively, any better idea to implement the cache? I guess there is no off-the-shelf/dub solution. [0] https://stackoverflow.com/questions/10060625/how-to-sort-associative-arraysIf I understand correctly, you want fast access based on the URL, but also fast access based on the age (to evict old entries)? If that's the case, you need some very complex structure (I have some ideas in mind based on a pair of indexes, but they require much effort). Instead, if you just need an ordered associative array, like C++'s std::map, it should be possible to create one copying much of the code from the std RedBlackTree (as a side note, having a TreeMap in std.container would be a Good Thing(tm)).
Jul 02 2016
On Saturday, 2 July 2016 at 12:21:14 UTC, Lodovico Giaretta wrote:On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:For now, I settled for a sorted array of cache entries plus an AA to map urls to indices into this array. https://github.com/qznc/d-github/commit/46c013c650d2cf34911ad3e10308ee6a0b5e3690#diff-24abb1e4df591f2d28947bbde1a09220R79Alternatively, any better idea to implement the cache? I guess there is no off-the-shelf/dub solution.
Jul 03 2016
On Sunday, 3 July 2016 at 16:03:51 UTC, qznc wrote:On Saturday, 2 July 2016 at 12:21:14 UTC, Lodovico Giaretta wrote:To avoid the ~= operator and reallocations of the cache array, you could impose a max number of cache entries, preallocate the array and use it as a circular buffer. I think this way should be faster (if imposing a max number of cache entries is feasible, I don't know your use cases).On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:For now, I settled for a sorted array of cache entries plus an AA to map urls to indices into this array. https://github.com/qznc/d-github/commit/46c013c650d2cf34911ad3e10308ee6a0b5e3690#diff-24abb1e4df591f2d28947bbde1a09220R79Alternatively, any better idea to implement the cache? I guess there is no off-the-shelf/dub solution.
Jul 03 2016
On Sunday, 3 July 2016 at 17:15:32 UTC, Lodovico Giaretta wrote:To avoid the ~= operator and reallocations of the cache array, you could impose a max number of cache entries, preallocate the array and use it as a circular buffer.Here's a simple ring buffer I use, feel free to take it struct RingBuffer (T, uint buffer_length = 10) { private T[buffer_length] buffer; private int lastIndex; safe nogc nothrow pure { this(const T initial_value) { this.lastIndex = 0; reset(initial_value); } void reset(const T value) { for (int i = 0; i < this.buffer.length; ++i) { this.buffer[i] = value; } } void put(const T value) { this.buffer[this.lastIndex] = value; // store the new sample // advance the index and wrap it around this.lastIndex += 1; if (this.lastIndex >= buffer_length) { this.lastIndex = 0; } } auto ref opIndex(size_t n) { return buffer[n]; } auto opSlice() { return buffer[]; } auto opAssign(T[] rhs) in { assert(rhs.length <= buffer.length); } body { reset(); foreach (e; rhs) { put(e); } return this; } } }
Jul 03 2016
On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:With AA cache eviction is inefficient. It requires to sort the keys of the AA with respect to a field in the value. StackNo, you just need the hash of the key in the object for many types of hash-tables. There are many solutions, but here are some for Least Recently Used (LRU) schemes: 1. The simplest efficient solution is to use a hash + an embedded double-linked list and an intrusive reference counter. When the refcount > 0 you unlink the object. When the refcount == 0 you put the object at the tail of the list. When you need memory you deallocate the head of the list. 2. Use a hash + partially time-sorted queue for objects that are not in use, and do a partial quick-sort to find LRU lazily (when you need to evict). Not using a hash for lookup seems silly, unless the ID can be efficiently used for a trie like search tree. This might work out ok for URLs.
Jul 03 2016
On 7/2/16 8:10 AM, qznc wrote:I want to implement some caching for HTTP GET requests. Basically a map of URL to content. A cache needs some additional meta data (size, age, etc). There seem to be two basic data structures available: Associative array (AA) or red black tree (RBT). With AA cache eviction is inefficient. It requires to sort the keys of the AA with respect to a field in the value. Stack overflow says "use RBT instead" [0]. With RBT retrieving something is inefficient. To get an entry, we need to create an fake entry with the key> and get a range of entries 'equal' to the fake one?Yes, it is annoying to have to deal with creating a fake element. In dcollections, I have a TreeMap wrapper which does this for you: https://github.com/schveiguy/dcollections/blob/master/dcollections/TreeMap.d#L881 We should probably add support for the searching functions that allows parameters other than full elements if the element supports comparison with that type. In terms of using equalRange, it should be pretty efficient. If you are allowing duplicates, then it does 2 binary searches (one for the beginning node and one for the end). If no duplicates, one binary search, and the equalRange will be 0 or 1 element (depending on if the key is present).Or can I exploit some "find on sorted range" algorithm somewhere?There is find on sorted range support in std.algorithm, but it must be a random-access range. RBTree range is not RA. What's wrong with the binary search support in the tree itself? -Steve
Jul 05 2016