digitalmars.D.learn - Scala Spark-like RDD for D?
- data pulverizer (17/17) Feb 15 2016 Are there any plans to create a Scala Spark-like RDD class
- data pulverizer (5/20) Feb 15 2016 Perhaps the question is too prescriptive. Another way is: Does D
- Jakob Jenkov (8/11) Feb 16 2016 I cannot speak on behalf of the D community. In my opinion I
- jmh530 (15/23) Feb 16 2016 Good attitude. Nevertheless, I think there is a much larger
- bachmeier (4/6) Feb 16 2016 You can use MPI:
Are there any plans to create a Scala Spark-like RDD (Resilient Distributed Dataset) class for D (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? This is a powerful model that has taken the data science world by storm, and it would be useful to have something like it in the D world. Most algorithms in statistics and data science are iterative in nature, which fits well with this kind of data model.

I read through the "Kind Of Container" thread, which is related to this issue (https://forum.dlang.org/thread/n07rh8$dmb$1@digitalmars.com). It looks like immutability would be the way to go for an RDD data structure, but I am not wedded to any model as long as we can have something that provides the same functionality as the RDD.

As an alternative, are there plans for parallel/cluster computing frameworks for D?

Apologies if I am kicking a hornet's nest. It is not my intention.

Thanks
Feb 15 2016
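To make the RDD idea in the post above a little more concrete, here is a minimal single-machine sketch in D. It is not Spark's API and nothing in it is distributed or fault tolerant. It only illustrates the two properties the post mentions: input data that is immutable once created, and an iterative algorithm that re-reads that same data on every pass through lazy ranges. The names `gradient` and `fitSlope`, the model y = w * x, the iteration count, and the learning rate are all assumptions chosen for illustration.

```d
import std.algorithm : map, sum;
import std.range : iota;
import std.stdio : writeln;

/// Hypothetical helper: mean gradient of the squared error for y = w * x.
/// The slices are immutable, so every call sees exactly the same data.
double gradient(immutable(double)[] xs, immutable(double)[] ys, double w)
{
    // map is lazy: each term is computed only when sum consumes it.
    return iota(xs.length)
        .map!(i => 2.0 * (w * xs[i] - ys[i]) * xs[i])
        .sum / xs.length;
}

/// Hypothetical helper: plain gradient descent, one pass over the data
/// per iteration. Only the scalar w changes between passes.
double fitSlope(immutable(double)[] xs, immutable(double)[] ys,
                size_t iterations = 200, double rate = 0.05)
{
    double w = 0.0;
    foreach (i; 0 .. iterations)
        w -= rate * gradient(xs, ys, w);
    return w;
}

void main()
{
    immutable double[] xs = [1.0, 2.0, 3.0, 4.0];
    immutable double[] ys = [2.0, 4.0, 6.0, 8.0]; // generated from y = 2x
    writeln(fitSlope(xs, ys)); // should be very close to 2
}
```

A real RDD additionally partitions the data across machines and records how each dataset was derived so that lost partitions can be recomputed; neither is attempted here.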
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
> Are there any plans to create a Scala Spark-like RDD class for D? [...]

Perhaps the question is too prescriptive. Another way to put it: does D have a big data strategy? I tried to anchor the question to a currently functioning framework, which is why I suggested RDD.
Feb 15 2016
> Perhaps the question is too prescriptive. Another way is: Does D have a big data strategy? [...]

I cannot speak on behalf of the D community, but in my opinion it is not D that needs a big data strategy. It is the users of D who need that strategy.

I am originally a Java developer. Java devs create all kinds of crazy tools all the time. Lots fail, but some survive and grow big, like Spark. D devs need to do the same. Just jump into it. Make it your hobby project in D, then see where it takes you.
Feb 16 2016
On Tuesday, 16 February 2016 at 15:03:36 UTC, Jakob Jenkov wrote:
> D devs need to do the same. Just jump into it. Have it be your hobby project in D. Then see where it takes you.

Good attitude. Nevertheless, I think there is a much larger population of people who would want to use D for normal data analysis if packages could replicate much of what people do in R/Python.

If the OP really wants to contribute to big data projects in D, he might want to start with things that make it easier for D to interact with existing libraries. For instance, Google's MR4C allows C code to be run in a Hadoop instance. Maybe adding support for D would be doable? http://google-opensource.blogspot.com/2015/02/mapreduce-for-c-run-native-code-in.html

There is likely value in writing bindings to machine learning libraries. I did a quick search of machine learning libraries, and most of them looked like they were written in C++. I don't have much expertise with writing bindings to C++ libraries.
Feb 16 2016
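For readers unfamiliar with what "writing bindings" involves, here is a minimal sketch of the simplest case in D: declaring a function that already exists in a C library and calling it. It uses `strlen` from the C standard library only because every system has it (druntime already ships this declaration in `core.stdc.string`; it is repeated by hand here purely for illustration). The wrapper name `cLength` is hypothetical. Binding a real machine learning library would involve many such declarations plus linking against that library, and C++ libraries are harder than this because of name mangling, templates, and classes.

```d
import std.stdio : writeln;
import std.string : toStringz;

// Hand-written declaration of a C function. extern (C) tells the compiler
// to use C linkage and calling conventions for this symbol.
extern (C) size_t strlen(const(char)* s) @nogc nothrow;

/// Hypothetical wrapper that gives the C function a D-friendly signature.
size_t cLength(string s)
{
    // D strings are not guaranteed to be zero-terminated, so convert first.
    return strlen(s.toStringz);
}

void main()
{
    writeln(cLength("hello")); // prints 5
}
```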
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
> As an alternative are there plans for parallel/cluster computing frameworks for D?

You can use MPI: https://github.com/DlangScience/OpenMPI
Feb 16 2016
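MPI addresses the cluster side of the question. For the parallel side on a single machine, D's standard library includes `std.parallelism`, and the sketch below shows a parallel reduction with it. This is not a cluster framework and is unrelated to the OpenMPI bindings linked above; the function name `parallelSumOfSquares` is hypothetical.

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

/// Hypothetical helper: sum of i * i for i in 0 .. n, computed on the
/// default thread pool.
double parallelSumOfSquares(size_t n)
{
    // Lazy, random-access range of squares; no intermediate array is built.
    auto squares = iota(n).map!(i => cast(double) i * i);

    // The pool splits the range into work units, reduces each unit on a
    // worker thread, then combines the partial results. 0.0 is the seed.
    return taskPool.reduce!"a + b"(0.0, squares);
}

void main()
{
    writeln(parallelSumOfSquares(1000));
}
```

The reduction function must be associative for the parallel result to match a serial one. Floating-point addition is only approximately associative in general, although the small integer-valued terms here sum exactly in any order.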
On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote:
> On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
>> As an alternative are there plans for parallel/cluster computing frameworks for D?
>
> You can use MPI: https://github.com/DlangScience/OpenMPI

FWIW, I'm also interested in the wider topic of incorporating D into data science environments. It sounds as if there are several interesting projects in the area, but so far my understanding of them is limited. Perhaps the forum isn't the best place to discuss this, but if there happen to be any blog posts or other descriptions, it'd be great to get links.

--Jon
Feb 16 2016
On Wednesday, 17 February 2016 at 02:03:40 UTC, Jon D wrote:
> FWIW, I'm interested in the wider topic of incorporating D into data science environments also. [...] Perhaps the forum isn't the best place to discuss, but if there happen to be any blog posts or other descriptions, it'd be great to get links.

You can discuss here, but there is also a Gitter room: https://gitter.im/DlangScience/public

Also, I've got a project that embeds R inside D: http://lancebachmeier.com/rdlang/ It's not quite as good a user experience as others because I have limited time for things not related to work.

I've got an older project to embed D inside R, but it hasn't been updated in a while and it's Linux only: https://bitbucket.org/bachmeil/dmdinline2
Feb 16 2016
On Wednesday, 17 February 2016 at 02:32:01 UTC, bachmeier wrote:
> You can discuss here, but there is also a gitter room https://gitter.im/DlangScience/public Also, I've got a project that embeds R inside D http://lancebachmeier.com/rdlang/ [...]

Excellent, thanks, I'll check these out.

--Jon
Feb 16 2016