digitalmars.D.learn - Making mir.random.ndvariable.multivariateNormalVar create bigger data

reply kerdemdemir <kerdemdemir gmail.com> writes:
I need a classifier in my project.
Since I believe it is the easiest to implement, I am trying to 
implement logistic regression.

I am trying to do the same as the Python example:  
https://beckernick.github.io/logistic-regression-from-scratch/

I need two data sets to test with.

This works (https://run.dlang.io/is/yGa4a0):

	double[2] x1;
	Random* gen = threadLocalPtr!Random;
	
	auto mu = [0.0, 0.0].sliced;
	auto sigma = [1.0, 0.75, 0.75, 1].sliced(2,2);
	auto rv = multivariateNormalVar(mu, sigma);
	rv(gen, x1[]);
	writeln(x1);
	
But when I increase the size of my data set from double[2] to 
double[100], I get an assertion failure:

mir-random-0.4.3/mir-random/source/mir/random/ndvariable.d(378): 
Assertion failure

which is:
assert(result.length == n);

How can I get a result vector with a size of around 5000?

Erdemdem
Feb 27 2018
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 27 February 2018 at 09:23:49 UTC, kerdemdemir wrote:
 [...]
 How can I have a result vector which has size like 5000 
 something?
I haven't made much use of mir.random yet...

The dimension 2 in this case is the dimension of the random variable. What you want to do is simulate multiple times from this 2-dimensional random variable. It looks like the examples on the main Readme page use mir.random.algorithm.range. I tried the code below, but I got errors. I did notice that the MultivariateNormalVariable documentation says that it is still in beta.

	void main()
	{
	    import mir.random : Random, unpredictableSeed;
	    import mir.random.ndvariable : MultivariateNormalVariable;
	    import mir.random.algorithm : range;
	    import mir.ndslice.slice : sliced;
	    import std.range : take;

	    auto mu = [10.0, 0.0].sliced;
	    auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);

	    auto rng = Random(unpredictableSeed);
	    auto sample = range!rng
	                    (MultivariateNormalVariable!double(mu, sigma))
	                    .take(10);
	}

However, doing it manually with a for loop works.

	void main()
	{
	    import mir.random : rne;
	    import mir.random.ndvariable : multivariateNormalVar;
	    import mir.random.algorithm : range;
	    import mir.ndslice.slice : sliced;
	    import std.stdio : writeln;

	    auto mu = [10.0, 0.0].sliced;
	    auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);
	    auto rv = multivariateNormalVar(mu, sigma);

	    double[2][100] x;
	    for (size_t i = 0; i < 100; i++) {
	        rv(rne, x[i][]);
	    }
	    writeln(x);
	}

Nevertheless, it probably can't hurt to file an issue if you can't get something like the first one to work. I would think it should just work.
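
For a data set of around 5000 points, the same loop can write into one flat heap buffer instead of a large static array. The following is only a sketch under assumptions: `drawSamples` is a hypothetical helper name, and the calls are limited to the ones already used in this thread (`multivariateNormalVar`, `threadLocalPtr!Random`, `sliced`).

```d
import mir.ndslice.slice : sliced;
import mir.random : Random, threadLocalPtr;
import mir.random.ndvariable : multivariateNormalVar;

/// Draws `count` samples from a 2-dimensional normal distribution into one
/// flat buffer. `drawSamples` is a hypothetical helper, not part of mir.random.
double[] drawSamples(size_t count)
{
    enum size_t dim = 2; // dimension of the random variable, not the sample count
    auto mu = [0.0, 0.0].sliced;
    auto sigma = [1.0, 0.75, 0.75, 1.0].sliced(dim, dim);
    auto rv = multivariateNormalVar(mu, sigma);

    Random* gen = threadLocalPtr!Random;
    auto buffer = new double[count * dim];
    foreach (i; 0 .. count)
    {
        // Each call must receive a destination of exactly `dim` elements,
        // otherwise the `result.length == n` assertion fires.
        rv(gen, buffer[i * dim .. (i + 1) * dim]);
    }
    return buffer;
}

void main()
{
    import std.stdio : writeln;

    auto data = drawSamples(5000);
    auto matrix = data.sliced(5000, 2); // view as 5000 rows by 2 columns
    writeln(matrix[0 .. 3]);            // first three samples
}
```

Keeping the samples in one contiguous array means the result can be viewed as a matrix with `sliced` without copying.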
Feb 27 2018
parent reply Nathan S. <no.public.email example.com> writes:
On Tuesday, 27 February 2018 at 15:08:42 UTC, jmh530 wrote:
 Nevertheless, it probably can't hurt to file an issue if you 
 can't get something like the first one to work. I would think 
 it should just work.
The problem is that the types in `mir.random.ndvariable` don't satisfy `mir.random.variable.isRandomVariable!T`. ndvariables have a slightly different interface from variables: instead of `rv(gen)` returning a result, `rv(gen, dst)` writes to dst. I agree that the various methods for working with variables should be enhanced to work with ndvariables.
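
To make the difference concrete, here is a small side-by-side sketch. It assumes that `NormalVariable!double(location, scale)` from `mir.random.variable` is available in the version you use; `drawScalar` and `drawPair` are hypothetical names chosen for illustration.

```d
import mir.ndslice.slice : sliced;
import mir.random : Random, threadLocalPtr;
import mir.random.ndvariable : multivariateNormalVar;
import mir.random.variable : NormalVariable;

/// Scalar variable: the call itself returns the sample.
double drawScalar(Random* gen)
{
    auto rv = NormalVariable!double(0.0, 1.0); // location 0, scale 1
    return rv(*gen);
}

/// N-dimensional variable: the call fills a caller-supplied buffer.
double[2] drawPair(Random* gen)
{
    auto mu = [0.0, 0.0].sliced;
    auto sigma = [1.0, 0.75, 0.75, 1.0].sliced(2, 2);
    auto rv = multivariateNormalVar(mu, sigma);
    double[2] dst;
    rv(gen, dst[]); // dst.length must equal the dimension of rv
    return dst;
}

void main()
{
    import std.stdio : writeln;

    Random* gen = threadLocalPtr!Random;
    writeln(drawScalar(gen));
    writeln(drawPair(gen));
}
```

Generic code written against the first form has no destination buffer to pass, which is why it cannot accept the second form unchanged.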
Feb 27 2018
parent reply Nathan S. <no.public.email example.com> writes:
On Tuesday, 27 February 2018 at 16:42:00 UTC, Nathan S. wrote:
 [...] I agree that the various methods for working with variables 
 should be enhanced to work with ndvariables.
So, I see that the interface will have to be slightly different for ndvariable than for variable. With the exception of MultivariateNormalVariable, the same ndvariable instance can be called to fill output of any length "n", so one can't meaningfully create a range based on just the ndvariable without further specification. What would "front" return? For MultivariateNormalVariable "n" is constrained but it is a runtime parameter rather than a compile-time parameter. You'll want to ping 9il / Ilya Yaroshenko to discuss what the API should be like for this.
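
One possible answer to the "front" question, purely as a sketch and not a proposal for the actual mir.random API: let the caller supply "n" at run time and have `front` return a view of an internal buffer. `NdSampleRange` and `ndSampleRange` are hypothetical names.

```d
import mir.ndslice.slice : sliced;
import mir.random : Random, threadLocalPtr;
import mir.random.ndvariable : multivariateNormalVar;

/// Hypothetical infinite input range over an ndvariable.
/// The sample length `n` is given by the caller at run time.
struct NdSampleRange(NdVar)
{
    private NdVar var;
    private Random* gen;
    private double[] buffer;

    this(NdVar var, Random* gen, size_t n)
    {
        this.var = var;
        this.gen = gen;
        this.buffer = new double[n];
        popFront(); // prime the first sample
    }

    enum bool empty = false; // never runs out of samples

    /// The current sample; its contents are overwritten by the next popFront.
    @property const(double)[] front() const { return buffer; }

    void popFront() { var(gen, buffer); }
}

/// Convenience constructor that infers the ndvariable type.
auto ndSampleRange(NdVar)(NdVar var, Random* gen, size_t n)
{
    return NdSampleRange!NdVar(var, gen, n);
}

void main()
{
    import std.range : take;
    import std.stdio : writeln;

    auto mu = [10.0, 0.0].sliced;
    auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2, 2);
    auto rv = multivariateNormalVar(mu, sigma);
    Random* gen = threadLocalPtr!Random;

    foreach (sample; ndSampleRange(rv, gen, 2).take(3))
        writeln(sample);
}
```

Because `front` reuses one buffer, a caller who wants to keep a sample has to `.dup` it. Returning a fresh array per element would avoid that at the cost of one allocation per sample.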
Feb 27 2018
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 27 February 2018 at 17:24:22 UTC, Nathan S. wrote:
 [...] You'll want to ping 9il / Ilya Yaroshenko to discuss what the 
 API should be like for this.
Honestly, I think the post above was my first use of mir.random, so I'm nowhere near familiar enough at this point to add much useful feedback. I'm definitely glad that it is getting worked on and plan on using it in the future.

The only thing I would note is that there are not just N-dimensional random variables, there are also N×N-dimensional random variables (not sure what else there could be, but it would be significantly less popular). A Wishart distribution (used for the distribution of covariance matrices) can be simulated by multiplying the transpose of a multivariate random normal by itself. This produces an N×N matrix. Ideally, the API could handle this type of distribution as well.

Another type of distribution I sometimes see is from Bayesian statistics (less common than typical distributions and could probably be built on top of what is already in mir.random, but I figured it couldn't hurt to bring it to your attention). A normal-inverse-gamma distribution is one example of these types of distributions. Simulating from this distribution would produce a pair of the mean and variance, not just one value. This would contrast with multivariate normal in that you would know it has two dimensions at compile-time.
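
As a rough sketch of the Wishart case (not something mir.random provides in this form, as far as I know): draw k zero-mean multivariate normal vectors and sum their outer products, which gives an N×N result. `wishartSample` is a hypothetical helper, the matrix is returned as a flat row-major array, and sigma is copied on the assumption that the constructor may modify its argument.

```d
import mir.ndslice.slice : sliced;
import mir.random : Random, threadLocalPtr;
import mir.random.ndvariable : multivariateNormalVar;

/// Hypothetical helper: one Wishart-style draw with `k` degrees of freedom.
/// `sigmaFlat` is a row-major dim x dim covariance matrix.
/// Returns a row-major dim x dim matrix: the sum of k outer products x * x^T.
double[] wishartSample(size_t dim, const(double)[] sigmaFlat, size_t k, Random* gen)
{
    auto mu = new double[dim];
    mu[] = 0.0; // the construction needs a zero-mean normal

    // Copy sigma in case the constructor factorises it in place (assumption).
    auto rv = multivariateNormalVar(mu.sliced, sigmaFlat.dup.sliced(dim, dim));

    auto x = new double[dim];
    auto w = new double[dim * dim];
    w[] = 0.0;
    foreach (draw; 0 .. k)
    {
        rv(gen, x); // one N-dimensional normal vector
        foreach (i; 0 .. dim)
            foreach (j; 0 .. dim)
                w[i * dim + j] += x[i] * x[j]; // accumulate the outer product
    }
    return w;
}

void main()
{
    import std.stdio : writeln;

    Random* gen = threadLocalPtr!Random;
    auto w = wishartSample(2, [2.0, -1.5, -1.5, 2.0], 10, gen);
    writeln(w.sliced(2, 2));
}
```

The output shape here depends on `dim`, a run-time value, so a range- or variable-style API would need the same extra dimension information as the N-dimensional case.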
Feb 27 2018
parent reply Nathan S. <no.public.email example.com> writes:
Cross-posting from the github issue 
(https://github.com/libmir/mir-random/issues/77) with a 
workaround (execute it at https://run.dlang.io/is/Swr1xU):
----

I am not sure what the correct interface should be for this in 
the long run, but for now you can use a wrapper function to 
convert an ndvariable to a variable:

```d
/++
Converts an N-dimensional variable to a fixed-dimensional variable.
+/
auto specifyDimension(ReturnType, NDVariable)(NDVariable vr)
     if (__traits(isStaticArray, ReturnType)
         && __traits(compiles, {static assert(NDVariable.isRandomVariable);}))
{
     import mir.random : isSaturatedRandomEngine;
     import mir.random.variable : isRandomVariable;
     static struct V
     {
         enum bool isRandomVariable = true;
         NDVariable vr;
         ReturnType opCall(G)(scope ref G gen)
             if (isSaturatedRandomEngine!G)
         {
             ReturnType ret;
             vr(gen, ret[]);
             return ret;
         }

         ReturnType opCall(G)(scope G* gen)
             if (isSaturatedRandomEngine!G)
         {
             return opCall!(G)(*gen);
         }
     }
     static assert(isRandomVariable!V);
     V v = { vr };
     return v;
}
```

So `main` from your above example becomes:

```d
void main()
{
     import std.stdio;
     import mir.random : Random, threadLocalPtr;
     import mir.random.ndvariable : multivariateNormalVar;
     import mir.random.algorithm : range;
     import mir.ndslice.slice : sliced;
     import std.range : take;

     auto mu = [10.0, 0.0].sliced;
     auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);

     Random* rng = threadLocalPtr!Random;
     auto sample = rng
                 .range(multivariateNormalVar(mu, sigma)
                     .specifyDimension!(double[2]))
                 .take(10);
     writeln(sample);
}
```
Feb 27 2018
parent jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 27 February 2018 at 21:54:34 UTC, Nathan S. wrote:
 Cross-posting from the github issue 
 (https://github.com/libmir/mir-random/issues/77) with a 
 workaround (execute it at https://run.dlang.io/is/Swr1xU):
 ----
 [snip]
Step in the right direction at least.
Feb 27 2018
prev sibling parent 9il <ilyayaroshenko gmail.com> writes:
On Tuesday, 27 February 2018 at 09:23:49 UTC, kerdemdemir wrote:
 I need a classifier in my project.
 Since it is I believe most easy to implement I am trying to 
 implement logistic regression.

 [...]
Mir Random v1.0.0 has new `range` overloads that can work with NdRandomVariable. Example: https://run.dlang.io/is/jte3gx
Sep 10 2018