digitalmars.D.learn - Multidimensional slicing
- Joseph Rushton Wakeling (31/31) Oct 24 2012 Hello all,
- Nathan M. Swan (20/57) Oct 25 2012 Short answer: no.
- bearophile (23/38) Oct 26 2012 In Python people that want to use a associative arrays to store a
Hello all, Suppose I have some data in an 2-dimensional associative array (it could be larger dimension, but let's limit it to 2d for now). I'm using an associative array because the underlying data is sparse, i.e. for any given index pair (i, j) there's most likely not an entry. It's trivial to take a slice of this data using the first dimension, i.e. if a[i][j] gives the value for the key pair (i, j), a[i] will give the array of values for all values of j, and a[i].keys will give me all the values of j that are "connected" to i. ... however, I can't do this the opposite way, i.e. to slice it a[][j] -- I get an error message, e.g.: Error: ulong[ulong][ulong] cannot be sliced with [] and similarly there is no way to identify a[][j].keys i.e. all the i indexes associated with j. So, I'm curious: is there any practical way to slice across the 2nd (or higher) dimension in this way, so that I get access to both values and keys? The application here is a large (but sparse) dataset of links forming a bipartite graph of user-object weighted links. The goal would be for the 2d array to store those weights, and the slices would allow you to look at that collection from one of two viewpoints -- i.e. all the objects a given user is linked to, or all the users a given object is linked to. Of course, it's trivial to have two associative arrays, say userLinks and objectLinks, where the keys are respectively object and user IDs and the values are the weights, but that means storing the weight values twice. I'd like to be able to have one entity that stores the links and weights, and which can be sliced either way without needing to copy data and without any added CPU overhead (the planned simulation and data sizes are large, so something that is costly CPU-wise or that requires additional allocation is unlikely to scale). Thanks in advance for any advice, Best wishes, -- Joe
Oct 24 2012
On Wednesday, 24 October 2012 at 14:25:33 UTC, Joseph Rushton Wakeling wrote:Hello all, Suppose I have some data in an 2-dimensional associative array (it could be larger dimension, but let's limit it to 2d for now). I'm using an associative array because the underlying data is sparse, i.e. for any given index pair (i, j) there's most likely not an entry. It's trivial to take a slice of this data using the first dimension, i.e. if a[i][j] gives the value for the key pair (i, j), a[i] will give the array of values for all values of j, and a[i].keys will give me all the values of j that are "connected" to i. ... however, I can't do this the opposite way, i.e. to slice it a[][j] -- I get an error message, e.g.: Error: ulong[ulong][ulong] cannot be sliced with [] and similarly there is no way to identify a[][j].keys i.e. all the i indexes associated with j. So, I'm curious: is there any practical way to slice across the 2nd (or higher) dimension in this way, so that I get access to both values and keys? The application here is a large (but sparse) dataset of links forming a bipartite graph of user-object weighted links. The goal would be for the 2d array to store those weights, and the slices would allow you to look at that collection from one of two viewpoints -- i.e. all the objects a given user is linked to, or all the users a given object is linked to. Of course, it's trivial to have two associative arrays, say userLinks and objectLinks, where the keys are respectively object and user IDs and the values are the weights, but that means storing the weight values twice. I'd like to be able to have one entity that stores the links and weights, and which can be sliced either way without needing to copy data and without any added CPU overhead (the planned simulation and data sizes are large, so something that is costly CPU-wise or that requires additional allocation is unlikely to scale). Thanks in advance for any advice, Best wishes, -- JoeShort answer: no. Long answer: It would be inefficient to create a slice like you ask if your data structure is associative arrays. Here's the code: ulong[] result; foreach(i, js; a) { foreach (j, w; js) { if (j == jYouAskedFor) { result ~= w; } } } // use result... The worst case here is O(n), where n is the number of links. The slicing of a[i] is O(1). The two associative arrays is probably your best bet, but I'm no expert on this. Hope it helps, Nathan M. Swan
Oct 25 2012
Joseph Rushton Wakeling:Suppose I have some data in an 2-dimensional associative array (it could be larger dimension, but let's limit it to 2d for now). I'm using an associative array because the underlying data is sparse, i.e. for any given index pair (i, j) there's most likely not an entry.In Python people that want to use a associative arrays to store a sparse 2D array usually use just one associative array, where the keys are (r,c) pairs. But you can't "slice" this. In low level languages for sparse 2D arrays often people use something different. One popular choice is to use just a (dense) array for the first coordinate, and to put just index-value pairs (with ordered indexes) into such arrays. It's a simple solution, but it's often good enough. If the rows contain many items (more than 50-100) it's better to chunk each row in some way. If you want to slice on both coordinates consider using two data structures, where the second contains pointers to the first, and if you want to add/remove items then maybe each value needs a pointer to the start of the array. See also: http://en.wikipedia.org/wiki/Sparse_matrixIt's trivial to take a slice of this data using the first dimension, i.e. if a[i][j] gives the value for the key pair (i, j), a[i] will give the array of values for all values of j, and a[i].keys will give me all the values of j that are "connected" to i. ... however, I can't do this the opposite way, i.e. to slice it a[][j] -- I get an error message, e.g.: Error: ulong[ulong][ulong] cannot be sliced with [] and similarly there is no way to identify a[][j].keys i.e. all the i indexes associated with j.Maybe you are able to create a wrapper, that wraps the 2D associative array, and uses a little "most recently used" cache stored in a little array to memoize some of the last lookups on the first coordinate. But first it's better to try simpler solutions. Bye, bearophile
Oct 26 2012