
digitalmars.D - Pandas like features

reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
As a researcher in bioinformatics I use Python's numpy, pandas 
and scipy a lot. But I am tired of the slowness of Python, even 
with CPython, thanks to the GIL and unoptimized tail recursion.

So I really think that D could play a big role in this field 
with Mir and dcompute.

1/ What is the state of Magpie, the GSoC 2019 project:
  - Mir Data Analysis and Processing Library

2/ Is scientific computing a field that the D language wants 
to grow in?

Thanks

Best regards
Oct 23 2020
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python even 
 with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
2. Yes!
Oct 23 2020
prev sibling next sibling parent reply mw <mingwu gmail.com> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python even 
 with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?
I think it's definitely the biggest area and opportunity for D to become more popular. The GIL, lack of performance, and huge memory bloat are real pains in Python.

Probably the best way forward is to provide libmir as a NumPy/Pandas *drop-in* replacement. (I've also suggested renaming Mir to NumD from a marketing / promotional perspective.) For the time being, from the language/library user's perspective, we can just use D/libmir to pre-process the data, and maybe save the result as csv/npz for further processing (by ... Python). Building or wrapping something like TensorFlow will, I think, need much more resources than the D community currently has, and I'm not sure it's worth the effort.

And from the language perspective, maybe D should adopt Python/NumPy's array indexing syntax, specifically:

1) use Python's arr[start:end], in addition to D's arr[start..end]

2) also allow negative indices, instead of [$-1]. (This $ is an improvement over Java/C++'s arr[arr.length - 1], but is still less convenient than Python's negative index syntax.)

That Python gained such popularity in scientific computing over the past ~10 years is not an accident; Guido actually made that happen by extending Python's syntax:

https://en.wikipedia.org/wiki/NumPy#History

"""
The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
"""

Maybe Walter should join one such SIG as well :-)
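To make the comparison concrete, here is how those idioms line up today. A minimal sketch using plain built-in D arrays (nothing Mir-specific):

```d
import std.stdio;

void main()
{
    auto arr = [0, 1, 2, 3, 4, 5];

    // D today:                 // Python equivalent:
    writeln(arr[1 .. 4]);       // arr[1:4]  -> [1, 2, 3]
    writeln(arr[$ - 1]);        // arr[-1]   -> 5
    writeln(arr[1 .. $ - 1]);   // arr[1:-1] -> [1, 2, 3, 4]
}
```

So `$` covers the "last element" case, but any index counted from the end has to be written as an explicit subtraction.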
Oct 23 2020
next sibling parent mw <mingwu gmail.com> writes:
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
 And from the language perspective, maybe D should adopt 
 Python/Numpy's array indexing syntax, specifically:

 1) use Python's arr[start:end], in addition to D's 
 arr[start..end]

 2) and also allow negative index, instead of [$-1]. (This $ is 
 an improvement of Java/C++'s arr[arr.length -1], but still is 
 less convenient than Python’s negative index syntax).

 Python gained such popularity in scientific computing in the 
 past ~10 years is not an accident, actually Guido made that 
 happen by extending Python's syntax:

 https://en.wikipedia.org/wiki/NumPy#History

 """
 The Python programming language was not originally designed for 
 numerical computing, but attracted the attention of the 
 scientific and engineering community early on. In 1995 the 
 special interest group (SIG) matrix-sig was founded with the 
 aim of defining an array computing package; among its members 
 was Python designer and maintainer Guido van Rossum, who 
 extended Python's syntax (in particular the indexing syntax) to 
 make array computing easier.[6]
 """

 Maybe Walter should join one of such SIGs as well :-)
Let me further quote from [6]:

"""
During these early years, there was considerable interaction between the standard and scientific Python communities. In fact, Guido van Rossum, Python's Benevolent Dictator For Life (BDFL), was an active member of the matrix-sig. This close interaction resulted in Python gaining new features and syntax specifically needed by the scientific Python community. While there were miscellaneous changes, such as the addition of complex numbers, many changes focused on providing a more succinct and easier to read syntax for array manipulation. For instance, the parenthesis around tuples were made optional so that array elements could be accessed through, for example, a[0,1] instead of a[(0,1)]. The slice syntax gained a step argument—a[::2] instead of just a[:], for example—and an ellipsis operator, which is useful when dealing with multidimensional data structures.
"""

[6] https://www.computer.org/csdl/magazine/cs/2011/02/mcs2011020009/13rRUx0xPMx
Oct 23 2020
prev sibling next sibling parent reply mw <mingwu gmail.com> writes:
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
 1) use Python's arr[start:end], in addition to D's 
 arr[start..end]
BTW, Python also has arr[start:end:step]; how, if at all, can this `step` be done in D now?
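For reference, D's slice syntax itself has no step, but a lazy range adaptor from the standard library gives the same elements. A minimal sketch using std.range.stride:

```d
import std.stdio;
import std.range : stride;

void main()
{
    auto arr = [0, 1, 2, 3, 4, 5];

    // Python's arr[::2] has no slice-syntax equivalent in D,
    // but std.range.stride yields the same elements lazily.
    writeln(arr.stride(2));         // elements 0, 2, 4

    // Python's arr[1::2]: slice first, then stride.
    writeln(arr[1 .. $].stride(2)); // elements 1, 3, 5
}
```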
Oct 23 2020
parent mw <mingwu gmail.com> writes:
On Friday, 23 October 2020 at 22:53:29 UTC, mw wrote:
 On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
 1) use Python's arr[start:end], in addition to D's 
 arr[start..end]
BTW, Python also has arr[start:end:step]; how, if at all, can this `step` be done in D now?
(Today I'm in the mood of a language historian :-) Some of Guido's early discussions of Python array indexing:

Slices
https://mail.python.org/pipermail/matrix-sig/1996-April/000553.html

Pseudo Indices
https://mail.python.org/pipermail/matrix-sig/1996-January/000331.html

Multi-dimensional indexing and other comments
https://mail.python.org/pipermail/matrix-sig/1995-October/000077.html

A problem with slicing
https://mail.python.org/pipermail/matrix-sig/1995-September/000042.html
Oct 23 2020
prev sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
 [snip] Build or wrap something like tensorflow, I think will 
 need much more resource than the D community current have, also 
 I'm not sure if it worth the effort.
https://github.com/ShigekiKarita/tfd

The author of that has some other useful libraries.
Oct 25 2020
prev sibling next sibling parent reply bachmeier <no spam.net> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python even 
 with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
There is some activity in this space:
https://code.dlang.org/?sort=updated&category=library.scientific

This project doesn't seem too active, but it was an earlier attempt:
http://dlangscience.github.io/
Oct 23 2020
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Friday, 23 October 2020 at 22:48:16 UTC, bachmeier wrote:
 On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics 
 wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python 
 even with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
There is some activity in this space: https://code.dlang.org/?sort=updated&category=library.scientific This project doesn't seem too active, but it was an earlier attempt: http://dlangscience.github.io/
To me, a scientific library needs to be HPC oriented, able:

- to perform parallel computation on CPU or GPU
- to use a divide-and-conquer strategy in order to compute across multiple nodes
- to have dataframe features
- to have scipy features

Such a library would be awesome, as Python's slowness matters more and more these days as data grows exponentially year after year.
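On the parallel-CPU point, D's standard library already covers the basics. A minimal sketch with std.parallelism (multi-node and GPU work would need MPI bindings and dcompute, respectively):

```d
import std.math : sqrt;
import std.parallelism : parallel;

void main()
{
    auto data = new double[1_000_000];
    data[] = 2.0;

    // The loop body is distributed over a thread pool
    // sized to the number of CPU cores.
    foreach (ref x; parallel(data))
        x = sqrt(x);
}
```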
Oct 23 2020
parent reply Russel Winder <russel winder.org.uk> writes:
On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote:
[…]
 To me a scientific library need to be HPC oriented, able
 - to perform // computation on CPU or GPU
 - to use divide and conquer strategy in order to compute over
 multinode
 - to have dataframe features
 - to have scipy features
 A such library would be awesome as at these time python slowness
 become more and more important as data grow exponentially year
 after year
Acting somewhat as "Devil's Advocate"…

Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.

I am not sure Chapel has pandas style data frames explicitly but I'll bet something equivalent is already in there.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Oct 24 2020
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Saturday, 24 October 2020 at 09:29:46 UTC, Russel Winder wrote:
 On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via 
 Digitalmars-d wrote: […]
 To me a scientific library need to be HPC oriented, able
 - to perform // computation on CPU or GPU
 - to use divide and conquer strategy in order to compute over
 multinode
 - to have dataframe features
 - to have scipy features
 A such library would be awesome as at these time python 
 slowness
 become more and more important as data grow exponentially year
 after year
Acting somewhat as "Devil's Advocate"… Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff other (invariable sequential, cf. C++, D, Rust) programming language have trouble providing. I am not sure Chapel has pandas style data frames explicitly but I'll bet something equivalent is already in there.
Maybe. Anyway, D has been searching for its killer app for years, and I really think this area is perfect for D. Data/business analysis is so important these days in science, the economy and elsewhere that D could be a good choice.
Oct 24 2020
next sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Sat, 2020-10-24 at 11:05 +0000, bioinfornatics via Digitalmars-d wrote:

[…]

 Maybe, anyway since years D search the killer app. Really I
 thanks thisr area it is perfect for D.
 Data Business analysis is so important in this day in science,
 economy and other D could be a good choice.
I agree that D could be the replacement for Python for many scientific milieux: bioinformatics and astronomy, to name but two obvious ones. The issue though is that the Python language over NumPy and associated communities captured the moment years ago, and many people contributed many extensions to a few libraries and packages.

Traditionally (as it were) bioinformatics and astronomy have emphasised exploration over computation, and often offloaded computation to C or C++ realised frameworks. This has reinforced prioritising code comprehension and evolution over computation speed, thus militating in favour of Python since the packages were there.

Whilst D could replace Python, the question is will it, and the answer is determined by who would write the code. Sadly, history tells us this will lead to a (very) long (divergent) thread and result in no-one actually doing anything. I would like to be proved wrong.

The possible upside is that all the major Python packages started as one or two people creating something that others then joined in with and turned into the de facto standard. Might this finally happen in the D community?

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Oct 24 2020
parent Andre Pany <andre s-e-a-p.de> writes:
On Saturday, 24 October 2020 at 12:08:00 UTC, Russel Winder wrote:
 On Sat, 2020-10-24 at 11:05 +0000, bioinfornatics via 
 Digitalmars-d wrote:

 […]

 Maybe, anyway since years D search the killer app. Really I
 thanks thisr area it is perfect for D.
 Data Business analysis is so important in this day in science,
 economy and other D could be a good choice.
I agree that D could be the replacement for Python for many scientific milieu: bioinformatics, astronomy, to name but two obvious ones. The issue though is that the Python language over NumPy and associated communities captured the moment years ago and many people contributed many extensions to a few libraries and packages. Traditionally (as it were) bioinformatics and astronomy have emphasised exploration over computation, and often offloaded computation to C or C++ realised frameworks. This has reinforced prioritising code comprehension and evolution over computation speed, thus militating in favour of Python since the packages were there. Whilst D could replace Python, the question is will it and the answer is determined by who would write the code. Sadly history tells us this will lead to a (very) long (divergent) thread and result in no-one actually doing anything. I would like to be proved wrong. The possible upside is that all the major Python packages started as one or two people creating something that others then joined in with and turned into the de facto standard. Might this finally happen in the D community?
Just expecting someone else to do the work will very likely not work at this point in time. But you can actually increase the chances it will happen in the future.

My opinion: neither a language feature X nor a specific library Y is missing at the moment; rather, the community needs to do massive advertisement for the D programming language. The bigger the community becomes, the more libraries will be created.

Therefore you can start here, by advocating D and its strengths on every channel which makes sense.

Kind regards
Andre
Oct 24 2020
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Saturday, 24 October 2020 at 11:05:48 UTC, bioinfornatics 
wrote:
 On Saturday, 24 October 2020 at 09:29:46 UTC, Russel Winder 
 wrote:
 On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via 
 Digitalmars-d wrote: […]
 To me a scientific library need to be HPC oriented, able
 - to perform // computation on CPU or GPU
 - to use divide and conquer strategy in order to compute over
 multinode
 - to have dataframe features
 - to have scipy features
 A such library would be awesome as at these time python 
 slowness
 become more and more important as data grow exponentially year
 after year
Acting somewhat as "Devil's Advocate"… Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff other (invariable sequential, cf. C++, D, Rust) programming language have trouble providing. I am not sure Chapel has pandas style data frames explicitly but I'll bet something equivalent is already in there.
Maybe, anyway since years D search the killer app. Really I thanks thisr area it is perfect for D. Data Business analysis is so important in this day in science, economy and other D could be a good choice.
D is already quite late to the party, and C#, for instance, keeps getting better in this domain:

https://docs.microsoft.com/en-us/dotnet/csharp/tutorials/ranges-indexes

while .NET support for working with Spark just went 1.0 this week:

https://dotnet.microsoft.com/apps/data/spark

D would do better to see how to interoperate with existing stuff.
Oct 27 2020
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
 D is already quite late for the party.
Right, but I don't think being CPU centric will work out in that domain anyway. You need to use Vulkan and Metal, aim for future hardware and do it really well. The market is in the future, not here, right now.
Oct 27 2020
prev sibling parent reply mw <mingwu gmail.com> writes:
On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
 D is already quite late for the party.
Is there a saying "better late than never"? :-) I still think D has a chance, if we really want to catch up, to overthrow Python.
Oct 29 2020
parent mw <mingwu gmail.com> writes:
On Friday, 30 October 2020 at 02:03:10 UTC, mw wrote:
 On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
 D is already quite late for the party.
Is there a saying: better late than never :-) I still think D has chance, if we really want to catch-up, to overthrown Python
Of course, this community has to make the effort to make it happen.
Oct 29 2020
prev sibling next sibling parent reply 9il <ilyayaroshenko gmail.com> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python even 
 with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
Magpie was another attempt to create a DataFrame in D using the architecture patterns from Python/R. It has never been part of the Mir infrastructure. A DataFrame in D should be a little different from what people have in scripting languages; otherwise, it will not be good enough.
 2/ does the scientific computing field is something that D 
 language want to grow ?
I would like to say yes. But the reality is that D isn't going to grow. The sci-related answer is that the members of the DLF either ignore my requests or even reject related work for the compiler because they "don't see much of a difference".

The following work is what would really be good for Sci D and especially for DataFrame:

https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1023.md
https://github.com/dlang/dmd/pull/9778
Oct 24 2020
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Saturday, 24 October 2020 at 12:26:21 UTC, 9il wrote:
 [snip]

 I would like to say Yes. But the reality is that the answer is 
 that D isn't going to grow. The sci related answer is that the 
 members of DLF either ignore my requests or even reject related 
 work for the compiler because they "don't see much of a 
 difference"

 The following work is what really will be good for Sci D and 
 especially for DataFrame.

 https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1023.md
 https://github.com/dlang/dmd/pull/9778
I think, unfortunately, it is not always easy to communicate why these changes are important or valuable. But, while you weren't able to convince Atila of the value of the proposed features, I also don't think you were ignored, at least in that case.

That being said, I never understood Atila's argument that this feature is a light version of Rust traits (I'm not aware of how Haskell's typeclasses work). A Rust trait is a list of functions that must be implemented by a type. However, you can also statically dispatch based on the trait (you can dynamically as well, but I imagine you would prefer not to be able to do that). You conceivably would be more interested in the static dispatch part of it. It's not really about a PackedUpperTriangularMatrix requiring specific functions to be a PackedUpperTriangularMatrix rather than a Slice. All it takes is the right iterator type. So it's more about how the specific type is specialized (giving a specific iterator to PackedUpperTriangularMatrix).

There might be a way to create a feature that's not Rust traits, that does what you want, and is a more general feature than this type of template alias deduction.
Oct 25 2020
parent jmh530 <john.michael.hall gmail.com> writes:
On Sunday, 25 October 2020 at 21:30:59 UTC, jmh530 wrote:
 [snip]

 I think, unfortunately, it is not always easy to communicate 
 why these changes are important or valuable. But, while you 
 weren't able to convince Atila of the value of the proposed 
 features, I also don't think you were ignored either, at least 
 in that case.

 That being said, I never understood Atila's argument about this 
 feature as being a light version of Rust traits (I'm not aware 
 of how Haskell's typeclasses work). A Rust trait is a list of 
 functions that must be implemented by a type. However, you can 
 also statically dispatch based on the trait (you can 
 dynamically as well, but I imagine you would prefer not to be 
 able to do that). You conceivably would be more interested in 
 the static dispatch part of it. It's not really about a 
 PackedUpperTriangularMatrix requiring specific functions to be 
 a PackedUpperTriangularMatrix rather than a Slice. All it takes 
 is the right iterator type. So it's more about how the specific 
 type is specialized (giving a specific iterator to 
 PackedUpperTriangularMatrix).

 There might be a way to create a feature that's not Rust traits 
 that does what you want and is a more general feature than this 
 type of template alias deduction.
Adding a little more...

The situation that this issue is trying to address is something like a template T!(U!V) that you want to be able to use like W!V. I can't help but think that concepts could help with this situation. Adapting the C++20 syntax, consider this simplest implementation:

template PackedUpperTriangularMatrix(T)
{
    concept PackedUpperTriangularMatrix = is(T: Slice!(StairsIterator!(U*, "-")), U);
}

Assuming the same functionality as in C++20, you could use this in a function as in

void foo(PackedUpperTriangularMatrix x) {}

However, if you then want to place any constraints on the U above, then you're a bit SOL. To really get the functionality working, you would need a generic kind of concept, where the concept you are defining is itself generic. As far as I can tell, you can't do this with C++20, but I would imagine the syntax adapted from above might be something like

template PackedUpperTriangularMatrix(T)
{
    concept PackedUpperTriangularMatrix(U) = is(T: Slice!(StairsIterator!(U*, "-")));
}

and would enable you to write the function as

void foo(T)(PackedUpperTriangularMatrix!T x) {}

In other words, if D had the ability to define concepts that are also generic themselves, then it would enable the functionality you want as well.
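For comparison, the closest present-day D idiom is the `is()` pattern wrapped in a named constraint. A sketch (assuming Mir's actual `Slice`/`StairsIterator` module layout):

```d
import mir.ndslice.slice : Slice;
import mir.ndslice.iterator : StairsIterator;

// The eponymous enum plays the role of the concept...
enum isPackedUpperTriangularMatrix(T) =
    is(T : Slice!(StairsIterator!(U*, "-")), U);

// ...but every use site has to spell out the constraint explicitly:
void foo(T)(T x)
    if (isPackedUpperTriangularMatrix!T)
{
    // ...
}
```

This works today, but unlike the concept syntax above, the constraint never appears in the parameter list itself, which is part of what DIP 1023 was trying to improve.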
Oct 26 2020
prev sibling next sibling parent data pulverizer <data.pulverizer gmail.com> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python even 
 with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
I think the answer to questions like this is often money. You need to hire someone to write the code that does this. In the world of research this means a grant to fund it.

If you think there is real merit in the D programming language in your field, the best thing to do is to make the case in the form of a grant application to pay a researcher to write the necessary code. This is what languages like R, Python, and Julia do. They are flush with cash because people write grant applications for PhD students and researchers to build libraries for those languages.
Oct 24 2020
prev sibling next sibling parent reply James Blachly <james.blachly gmail.com> writes:
On 10/23/20 3:31 PM, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy pandas and 
 scipy. But I am bored by the slowness of python even with cpython code 
 thanks to the GIL and un-optimized tail recursion.
 
 So I thinks really that D could play a big role in this field with MIR 
 and dcompute.
 
 1/ what is the state of Magpie which was a GSoC 2019:
   - Mir Data Analysis and Processing Library
 
 2/ does the scientific computing field is something that D language want 
 to grow ?
 
 Thanks
 
 Best regards
Aside / self-promotion: we use D extensively in our bioinformatics / computational biology program. Check out https://github.com/blachlylab/dhtslib/

Also just published a HTS/NGS tool written in D:

https://academic.oup.com/nargab/article/2/4/lqaa070/5917298

I should probably do an `announce` forum post.

Currently trying to decide whether to extend Magpie or roll our own (adding only features that are needed). I've also enjoyed using Mir ndslice in a couple of test projects, but as you know that is not really a dataframe.
Oct 24 2020
parent glis-glis <andreas.fueglistaler gmail.com> writes:
On Saturday, 24 October 2020 at 16:43:45 UTC, James Blachly wrote:
 Aside / self-promotion:
 We use D extensively in our bioinformatics / computational 
 biology program.

 Check out https://github.com/blachlylab/dhtslib/

 Also just published a HTS/NGS tool written in D:

 https://academic.oup.com/nargab/article/2/4/lqaa070/5917298

 I should probably do an `announce` forum post.

 Currently trying to decide whether to extend Magpie or roll our 
 own (adding only features that are needed). I've also enjoyed 
 using Mir ndslice in a couple of test projects, but as you know 
 that is not really a dataframe.
Self-promoting as well :-) I also started to use D in the domain of biophysics. Big computations are still done with our C++ code, but I'm translating old Python pre- and post-treatment scripts into D, getting a nice speedup: https://github.com/glis-glis/biophysics

Please note that all I know about D is from tour.dlang.org, and I'm often rather lazy concerning comments (which obviously comes back to bite me rather often when I don't understand my own code 2 months later...).

What is sometimes lacking is information and documentation on what exists for D. Here it states that D can work with GPUs: https://dlang.org/areas-of-d-usage.html#gpu but all you get is a link to a presentation from 2016.

I needed a way to calculate a principal component analysis. Mir can't do it, but I found out Lubeck can. So I tried the Lubeck example of the Dlang tour, which didn't work. I was able to correct it and did a pull request, so the example is now working, but most newcomers would probably just say "OK, it's broken, let's get back to Numpy".
Oct 27 2020
prev sibling next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 [snip]

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
I'm certainly interested in it, but doing it well takes time.
Oct 25 2020
parent reply bachmeier <no spam.net> writes:
On Sunday, 25 October 2020 at 20:30:58 UTC, jmh530 wrote:
 On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics 
 wrote:
 [snip]

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
I'm certainly interested in it, but doing it well takes time.
Over time, I've come to three conclusions on this topic. I don't know that time is the issue.

1. This community seems to have NIH syndrome. A lot of users are averse to reusing the functionality provided by other languages. I find that downright weird given that one of the selling points of D is its ability to easily interoperate with other languages. It makes sense to *extend* existing projects in other languages using D. The idea of rewriting millions of lines of code for no benefit other than just saying it's written in D is obviously pointless, so there's no motivation to do it.

2. Scientific computing is a big field. In terms of things you'd need to be "complete", you'd have to write maybe ten times as much code as you would to have a complete web development offering. It also requires incredible amounts of expertise. Statistics, economics, physics, math, chemistry, biology, and on and on are all areas that individually require a great deal of specialized knowledge in addition to time. For some things, performance is the most important property, including use of the GPU. That's not simple.

3. D's syntax is okay, but it's not flexible enough to express everything you need to work comfortably. A DSL or similar might be necessary.
Oct 25 2020
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Sunday, 25 October 2020 at 23:30:10 UTC, bachmeier wrote:
 [snip]

 1. This community seems to have NIH syndrome. A lot of users 
 are averse to reuising the functionality provided by other 
 languages. I find that downright weird given that one of the 
 selling points of D is its ability to easily interoperate with 
 other languages. It makes sense to *extend* existing projects 
 in other languages using D. The idea of rewriting millions of 
 lines of code for no benefit other than just saying it's 
 written in D is obviously pointless, so there's no motivation 
 to do it.

 2. Scientific computing is a big field. In terms of things 
 you'd need to be "complete", you'd have to write maybe ten 
 times as much code as you would to have a complete web 
 development offering. It also requires incredible amounts of 
 expertise. Statistics, economics, physics, math, chemistry, 
 biology, and on and on are all areas that individually require 
 a great deal of specialized knowledge in addition to time. For 
 some things, performance is the most important property, 
 including use of the GPU. That's not simple.

 3. D's syntax is okay, but it's not flexible enough to express 
 eveything you need to work comfortably. A DSL or similar might 
 be necessary.
I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in to accomplish a lot.
Oct 25 2020
parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 26 October 2020 at 01:42:42 UTC, jmh530 wrote:
 I have no issue with calling libraries from other languages 
 (particularly C) if it's something that is too much work or 
 whatever to do myself. But I think that it's helpful to have a 
 base level of functionality, akin to Numpy/Scipy, that a new 
 person could come in and use to accomplish a lot.
Aren't numpy and scipy themselves largely built on "calling libraries from other languages"? Specifically, C and Fortran.
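They are. For instance, numpy.linalg.solve is documented as a thin wrapper over LAPACK's *gesv routines; only the dispatch happens in Python. A minimal sketch:

```python
import numpy as np

# numpy.linalg.solve hands the actual work to LAPACK (compiled
# Fortran/C); the Python layer only marshals the arguments.
a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(a, b)   # solves a @ x == b via LAPACK's *gesv
assert np.allclose(a @ x, b)
print(x)
```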
Oct 25 2020
parent bachmeier <no spam.net> writes:
On Monday, 26 October 2020 at 01:55:46 UTC, Paul Backus wrote:
 On Monday, 26 October 2020 at 01:42:42 UTC, jmh530 wrote:
 I have no issue with calling libraries from other languages 
 (particularly C) if it's something that is too much work or 
 whatever to do myself. But I think that it's helpful to have a 
 base level of functionality, akin to Numpy/Scipy, that a new 
 person could come in and use to accomplish a lot.
Aren't numpy and scipy themselves largely built on "calling libraries from other languages"? Specifically, C and Fortran.
Exactly. And R (actually S at that time) started as a glue language for libraries in those languages.
Oct 25 2020
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
 As a researcher in BioInformatics I use a lot python numpy 
 pandas and scipy. But I am bored by the slowness of python even 
 with cpython code thanks to the GIL and un-optimized tail 
 recursion.

 So I thinks really that D could play a big role in this field 
 with MIR and dcompute.

 1/ what is the state of Magpie which was a GSoC 2019:
  - Mir Data Analysis and Processing Library

 2/ does the scientific computing field is something that D 
 language want to grow ?

 Thanks

 Best regards
Just ran across a Hacker News thread about pyston that relates to some of the discussion here [1]. It seems there's still a demand for alternatives to python for data science. [1] https://news.ycombinator.com/item?id=24921790
Oct 29 2020
parent reply Russel Winder <russel winder.org.uk> writes:
On Thu, 2020-10-29 at 10:23 +0000, jmh530 via Digitalmars-d wrote:
[…]
 Just ran across a Hacker News thread about pyston that relates to 
 some of the discussion here [1]. It seems there's still a demand 
 for alternatives to python for data science.

 [1] https://news.ycombinator.com/item?id=24921790
I only quickly skimmed the blog page, so this is a first reaction. I shall read the material more carefully tomorrow and send an update.

1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in.

2. Python is slow, and fundamentally single threaded. Attempts to make Python multi-threaded seem to fall by the wayside. The micro-benchmarks seem to indicate Pyston is just a slightly faster Python and thus nothing really to write home about – yes, even a headline figure of 20% is nothing to write home about!

3. If you want computational performance from Python code, you use C, C++ (or D) extensions. In particular you use NumPy. I would guess that almost all bioinformatics, astronomy, machine learning, AI, data science stuff uses NumPy. Python execution performance is irrelevant compared to NumPy code performance.

I am happy to be shown to be wrong, but I suspect not.

-- 
Russel.
===========================================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
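The NumPy point is easy to demonstrate; a small sketch comparing a pure-Python loop with a vectorised call (timings vary by machine, so only the results are checked):

```python
import time
import numpy as np

n = 1_000_000
data = list(range(n))
arr = np.arange(n, dtype=np.int64)

t0 = time.perf_counter()
total_py = sum(x * x for x in data)   # pure-Python loop, runs in the interpreter
t1 = time.perf_counter()
total_np = int(np.dot(arr, arr))      # vectorised, runs in compiled code
t2 = time.perf_counter()

assert total_py == total_np
print(f"python: {t1 - t0:.3f}s  numpy: {t2 - t1:.3f}s")
```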
Oct 29 2020
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 29 October 2020 at 22:52:59 UTC, Russel Winder wrote:
 [snip]

 I only quickly skimmed the blog page, so this is a first 
 reaction. I shall read the material more carefully tomorrow and 
 send an update.

 1. People have been trying to make Python execute faster for 30 
 years. In the end everyone ends up just using CPython with any 
 and all optimisations it can get in.

 2. Python is slow, and fundamentally single threaded. Attempts 
 to make Python multi-threaded seem to fall by the wayside. The 
 micro-benchmarks seem to indicate Pyston is just a slightly 
 faster Python and thus nothing really to write home about – yes 
 even a headline figure of 20% is nothing to write home about!

 [snip]
I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.
Oct 30 2020
parent reply Russel Winder <russel winder.org.uk> writes:
On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d wrote:
[…]
 I think the point on multi-threaded Python came away as a big 
 complaint there. Lots of mentions of the GIL or people being 
 CPU-bound. Pandas was mentioned in this context as well.
<< I haven't properly read the blog entry as yet. Sorry. >>

Guido saw (cf. he and I had a long "discussion" at EuroPython 2010, there were many witnesses) GIL as absolutely fine for CPython in perpetuity, and that if Pypy came up with a GIL-free VM then that would be fine. His mindset was (and I suspect may still be) that Python code was/is not about being CPU bound code; it was/is about sequential and concurrent, not parallel for performance, code.

As long as there is NumPy and other PVM extensions, or use of message passing between processes, that allow for GIL-free parallel, CPU bound processing, it is hard to say Guido was/is wrong. (And in 2010 it was even harder :-) )

Having thought about it on and off for a decade, I am happy with the status quo around Python. Python code is (or should be) highly maintainable code designed for execution on a single threaded VM, easily understood and amended. Anyone trying to do CPU bound code using Python is "doing it wrong". Whether D is the right alternative, or a language such as Chapel is better, is a moot point.

Pandas is built on NumPy and so has the same parallelism properties as any other NumPy realised package.

-- 
Russel.
===========================================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Oct 30 2020
next sibling parent reply Abdulhaq <alynch4047 gmail.com> writes:
On Friday, 30 October 2020 at 12:15:58 UTC, Russel Winder wrote:
 On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d 
 wrote:
 
[…]
 I think the point on multi-threaded Python came away as a big 
 complaint there. Lots of mentions of the GIL or people being 
 CPU-bound. Pandas was mentioned in this context as well.
<< I haven't properly read the blog entry as yet. Sorry. >> Guido saw (cf. he and I had a long "discussion" at EuroPython 2010, there were many witnesses) GIL as absolutely fine for CPython in perpetuity, that if Pypy came up with a GIL-free VM then that would be fine. His mindset was (and I suspect may still be) that Python code was/is not about being CPU bound code, it was/is about sequential and concurrent, not parallel for performance, code. As long as there is NumPy and other PVM extensions, or use of message passing between processes, that allow for GIL-free parallel, CPU bound processing, it is hard to say Guido was/is wrong. (And in 2010 it was even harder :-) ) Pandas is built on NumPy and so has the same parallelism properties as any other NumPy realised package.
I've spent much of the last 5 years writing code for trade studies and other optimisations on top of python, numpy and multiprocessing. Lately I have been working a lot with Pandas for multi-dimensional optimisation and machine learning. The slow performance of python in the glue layer between numpy, multiprocessing etc. is a non-issue. I can easily keep all 8 cores very busy running efficient C++ CFD, machine learning codes etc. using the above combination.

The migration from P2 to P3 was also pretty tame. For people doing real work, it's not a big deal. Sure it was a distraction, but it has its benefits; I'm glad they did it. Boring opinion, and doesn't generate ad income from blog hits, but there you go.

I would like to see D have a numpy equivalent, but realistically you won't duplicate the numpy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that. Core Pandas on its own BTW isn't hard to implement IMO. It turns out it's very expressive and very useful, but not a hard thing to copy.
Oct 30 2020
parent reply bachmeier <no spam.net> writes:
On Friday, 30 October 2020 at 18:23:38 UTC, Abdulhaq wrote:

 I would like to see D have a numpy equivalent but realistically 
 you won't duplicate the numpy ecosystem here, it's too much 
 work. And why do it? Just wrap up the numpy ecosystem from D 
 and use it like that.
I would love to see this. A project to use the functionality of Python, R, and Julia from inside a D program with little effort. William Stein did something like that with SageMath, but from a different angle. I can say the R part is simple. (Not only the parts written in R, but any underlying C, C++, or Fortran code with R bindings as well.) I wouldn't expect it to be much harder for the other languages, but since I don't work with them, I can't say. The advantage of D would be the new functionality you write in D on top of the existing functionality in those languages.
Oct 30 2020
parent reply Laeeth Isharc <laeeth laeeth.com> writes:
On Friday, 30 October 2020 at 20:32:32 UTC, bachmeier wrote:
 On Friday, 30 October 2020 at 18:23:38 UTC, Abdulhaq wrote:

 I would like to see D have a numpy equivalent but 
  realistically you won't duplicate the numpy ecosystem here, 
 it's too much work. And why do it? Just wrap up the numpy 
 ecosystem from D and use it like that.
I would love to see this. A project to use the functionality of Python, R, and Julia from inside a D program with little effort. William Stein did something like that with SageMath, but from a different angle. I can say the R part is simple. (Not only the parts written in R, but any underlying C, C++, or Fortran code with R bindings as well.) I wouldn't expect it to be much harder for the other languages, but since I don't work with them, I can't say. The advantage of D would be the new functionality you write in D on top of the existing functionality in those languages.
We can call C++ libraries from our little language written in D and you can even write C++ inline, compile it at runtime and call it thanks to Cling. Can call python although it's not yet in master. Initially via pyd but people have their own particular versions, installs and setups so instead moving to RPC over named pipes using nanomsg. That should generalise to anything other languages we would want to call too. Serialisation and deserialisation isn't dirt cheap but the idea isn't to write inner loops in python. There's a lot more overhead doing it this way - it's not for free. But it is valuable for internal use for the problems we currently have. I have a little plugin that uses your R wrapper but it's not used by anyone yet. Time taken to a first version matters for us. The first version doesn't usually need to be fast for user code. This should allow us to access libraries without having to combine that with language choices. In time I figure we could use cling to generate declarations and light wrappers for C++ too. Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.
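As a rough sketch of the named-pipe RPC idea (a generic Python illustration, not the actual nanomsg-based setup described above; the pipe path and request shape are invented):

```python
import json
import os
import tempfile
import threading

# A hypothetical one-way "call": serialise the request to JSON, push it
# through a named pipe, deserialise on the other side. This is only meant
# to show where the serialisation/deserialisation cost comes from; the
# real setup is bidirectional and uses nanomsg.
fifo = os.path.join(tempfile.mkdtemp(), "rpc")
os.mkfifo(fifo)
result = {}

def server():
    with open(fifo) as f:          # blocks until a writer connects
        request = json.load(f)
    result["value"] = sum(request["args"])

t = threading.Thread(target=server)
t.start()
with open(fifo, "w") as f:         # the "client" side
    json.dump({"fn": "sum", "args": [1, 2, 3]}, f)
t.join()
print(result["value"])  # 6
```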
Nov 03 2020
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Tuesday, 3 November 2020 at 22:51:14 UTC, Laeeth Isharc wrote:
 Robert Schadek made a beginning on Julia integration work but 
 we haven't had time to do more than that.
If you're just passing arrays and pointers between Julia and D, this is pretty simple, no? Julia's ccall makes that relatively simple. You can even compile D code and call it from Julia - that should be pretty straightforward. Calling Julia from D just needs the Julia C API, which again is pretty straightforward. You'll need to convert what you need from the julia.h header file.
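For illustration, the same C-ABI mechanism is visible from Python's ctypes: declare the return and argument types by hand and call into a shared library, much as Julia's ccall does. A sketch assuming a Linux system with glibc; libm's cos stands in for a D-compiled extern(C) function:

```python
import ctypes

# Load a C-ABI shared library and declare the signature manually,
# the same (return type, argument types) declaration ccall requires.
libm = ctypes.CDLL("libm.so.6")   # assumption: glibc on Linux
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0
```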
Nov 05 2020
parent reply bachmeier <no spam.net> writes:
On Thursday, 5 November 2020 at 13:11:17 UTC, data pulverizer 
wrote:
 On Tuesday, 3 November 2020 at 22:51:14 UTC, Laeeth Isharc 
 wrote:
 Robert Schadek made a beginning on Julia integration work but 
 we haven't had time to do more than that.
If you're just passing arrays and pointers between Julia and D, this is pretty simple no? Julia's ccall makes that relatively simple. You can even compile D code and call it from Julia - that should be pretty straightforward. Calling Julia from D just needs the Julia C API, which again is pretty straightforward. You'll need to convert what you need from julia.h header file.
The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.
Nov 05 2020
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:
 [snip]

 The question for me is if you can work with the same data 
 structures in D, R, Python, and Julia. Can your main program be 
 written in D, but calling out to all three for loading, 
 transforming, and analyzing the data? I'm guessing not, but 
 would be awesome if you could do it.
Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.
Nov 05 2020
parent reply bachmeier <no spam.net> writes:
On Thursday, 5 November 2020 at 19:39:43 UTC, jmh530 wrote:
 On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:
 [snip]

 The question for me is if you can work with the same data 
 structures in D, R, Python, and Julia. Can your main program 
 be written in D, but calling out to all three for loading, 
 transforming, and analyzing the data? I'm guessing not, but 
 would be awesome if you could do it.
Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.
Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.
Nov 05 2020
parent data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 5 November 2020 at 21:57:46 UTC, bachmeier wrote:
 Definitely, but you need to have the functionality first. On 
 the homepage for embedr, I have examples showing most of the 
 functionality: https://embedr.netlify.app/ I started writing up 
 lecture notes but then the pandemic sent my workload through 
 the roof.
Looks cool.
Nov 05 2020
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:
 The question for me is if you can work with the same data 
 structures in D, R, Python, and Julia. Can your main program be 
 written in D, but calling out to all three for loading, 
 transforming, and analyzing the data? I'm guessing not, but 
 would be awesome if you could do it.
It's actually a problem I've been thinking about on and off for a while but haven't got round to actually trying to implement.

1. If I had to do this, I would first decide on a collection of common data structures to share, starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types, with the dimensional information in the form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.

2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security, but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads, so that memory written in one language can be accessed by another.

3. Binary file i/o for those is pretty simple, but necessary to store results and read them in any of the programs afterwards.

4. All the languages have C APIs, so you'd write interfaces in D using these to call from D into the languages. All the languages can call D extern C functions in dlls directly using their versions of ccall.

Another alternative to mmap is using network serialization, which would be more cross-platform and fungible, but this seems like it could be slow to me.
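Point 1 can be sketched in a few lines of Python; this only illustrates the "flat buffer plus dims" layout, not an actual cross-language protocol:

```python
import numpy as np

# A 2x3 double matrix stored as described above: a contiguous buffer of
# 6 doubles plus a separate dims array [2, 3]. Any language that can see
# the raw buffer can reinterpret it the same way.
flat = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
dims = np.array([2, 3], dtype=np.int64)

raw = flat.tobytes()                        # what would cross the language boundary
matrix = np.frombuffer(raw).reshape(dims)   # reconstructed on the other side
assert matrix.shape == (2, 3)
print(matrix[1, 2])  # 6.0
```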
Nov 05 2020
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer 
wrote:
 [snip]

 2. Next I would use memory mapped i/o for storage. Usually 
 memory mapped files are only accessible by one thread for 
 security but I believe that this can be changed. For security 
 you could use cryptographic keys to access the files between 
 threads. So that memory written in one language can be accessed 
 by another.

 [snip]
One thread only? Sounds like GIL...
Nov 05 2020
parent data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 5 November 2020 at 20:30:03 UTC, jmh530 wrote:
 On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer 
 wrote:
 [snip]

 2. Next I would use memory mapped i/o for storage. Usually 
 memory mapped files are only accessible by one thread for 
 security but I believe that this can be changed. For security 
 you could use cryptographic keys to access the files between 
 threads. So that memory written in one language can be accessed 
 by another.

 [snip]
One thread only? Sounds like GIL...
Not necessarily. The cryptographic keys are used to access the file, not to lock it. I believe mmap files can be secured with a password, which should be generated cryptographically as an alternative to being manually entered and stored somewhere. It protects the file from unsanctioned access, even though the file itself will probably only take a single password rather than some synchronized rotating mechanism. However it is done, the memory will need to be protected.

There should be no reason why multiple processes could not read from a file. Only writing would require a lock from other processes, for obvious reasons.

As I said, I haven't even begun to properly plan an implementation yet; it's just something that I think about from time to time.
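Access control aside, the underlying sharing mechanism is ordinary memory-mapped file I/O. A Python sketch, simulating the two sides within one process:

```python
import mmap
import os
import tempfile

# Two independent mappings of the same file stand in for two processes
# (or two language runtimes): bytes written through one mapping are
# visible through the other without an explicit copy.
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)                   # reserve 16 bytes of backing store

writer = open(path, "r+b")
reader = open(path, "r+b")
wmap = mmap.mmap(writer.fileno(), 16)       # default mapping is shared
rmap = mmap.mmap(reader.fileno(), 16)

wmap[0:5] = b"hello"                        # "producer" side writes
print(rmap[0:5])                            # "consumer" side sees the write

wmap.close(); rmap.close(); writer.close(); reader.close()
```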
Nov 05 2020
prev sibling parent reply bachmeier <no spam.net> writes:
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer 
wrote:

 1. If I had to do this, I would first decide on a collection of 
 common data structures to share starting with *compositions* of 
 R/Python/Julia style multi-dimensional arrays - contiguous 
 arrays with basic element types with a dimensional information 
 in form of another array. So a 2x3 double matrix is a double 
 array of length 6 with another long array containing [2, 3]. R 
 has externalptr, Julia can interface with pointers, as can 
 Python.
R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R. It assumes it can do anything it wants with that data. Unless they've changed something (which is possible since I haven't looked into it in years) you'd have to copy any data you send to an R function. But if you're calling R maybe you don't care about that.
Nov 05 2020
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:
 On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer 
 wrote:

 1. If I had to do this, I would first decide on a collection 
 of common data structures to share starting with 
 *compositions* of R/Python/Julia style multi-dimensional 
 arrays - contiguous arrays with basic element types with a 
 dimensional information in form of another array. So a 2x3 
 double matrix is a double array of length 6 with another long 
 array containing [2, 3]. R has externalptr, Julia can 
 interface with pointers, as can Python.
R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.
Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
Nov 05 2020
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer 
wrote:
 On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:
 R has externalptr, but to my knowledge, that's only for 
 transporting around C objects. I don't know of any way to call 
 R API functions with data not allocated by R.
Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.
Nov 05 2020
parent reply bachmeier <no spam.net> writes:
On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer 
wrote:
 On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer 
 wrote:
 On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:
 R has externalptr, but to my knowledge, that's only for 
 transporting around C objects. I don't know of any way to 
 call R API functions with data not allocated by R.
Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.
It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters. But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.
Nov 12 2020
next sibling parent bachmeier <no spam.net> writes:
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:

 I know for web applications or finance or some other areas 
 where the distinction matters.
This should be
 I know for web applications or finance or some other areas 
 performance matters enough that they'll distinguish between 
 interactive and production code, and even write two versions.
Nov 12 2020
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:
 On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer 
 wrote:
 On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer 
 wrote:
 ... I have many years of writing code in R and from my 
 experience, apart from minor instances I would try to avoid 
 writing production libraries or code in it.
... avoiding writing production code in R is just my professional advice.
It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters. But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.
You act as if I'm banning people from writing code in R - I certainly don't have the power to do that. And yes, it varies from situation to situation, as I clearly alluded to.

I've done a lot of projects in R. I'm well aware that sometimes it is unavoidable for the client. What I am saying is that, given the choice, you should probably choose a different tool apart from "some minor instances". I've seen R go spectacularly wrong because of the type of language it is: it makes assumptions about what the programmer means, which can cause epic bugs, and very often it does so silently, and it happens all the time. You can never be sure that *any* piece of R code will work as it should. It's just the nature of the language. People write it because it's easy and has "boilerplate", which is fine if you are proof-of-concepting or doing research and some other things, but use it in mission critical production apps and it may well blow up in your face, and you might not even know.

And that's before we get to performance, and other things blah, blah, blah.
Nov 13 2020
parent reply bachmeier <no spam.net> writes:
On Friday, 13 November 2020 at 21:28:30 UTC, data pulverizer 
wrote:

 You act as if I'm banning people from writing code in R - I 
 certainly don't have the power to do that. And yes, it varies 
  from situation to situation, as I clearly alluded to.
All I'm saying is that any project that interoperates with R or any other language has to accept that the programmers you're targeting are going to write code in the other language. If for no other reason than the fact that they've already written tens of thousands of lines of code that they don't want to throw away.
 I've done a lot of projects in R. I'm well aware that sometimes 
 it is unavoidable for the client. What I am saying is given the 
 choice, you should probably choose a different tool apart from 
 "some minor instances". I've seen R go spectacularly wrong 
 because of the type of language it is, it makes assumptions of 
 that the programmer means which can cause epic bugs, and very 
 often, it does it silently and it happens all the time. You can 
 never be sure that *any* piece of R code will work as it 
 should. It's just the nature of the language.
Well, I don't want to get into a big debate about R, but I don't for the most part agree with this view. R was designed to be used as (1) a functional language, and (2) a specialized tool to quickly solve a specialized set of problems. It's remarkably difficult to write incorrect code if you're using it as it was designed to be used, which includes pure functions and immutable data structures. It originated as a dialect of Scheme and that's the foundation everything is built on.

Where I do agree with you is the type system. Not only is it a dynamic language, it has an extremely weak type system, which most likely has something to do with the fact that it originated down the hall in the same place that gave us C. I nonetheless don't agree with the conclusion that it should never be used.

I've seen loads of R criticism, and it's almost always something like this. Here's one I've probably seen 50 times:

x <- 1:10
j <- 4
x[2:3+j]

The code returns [6 7]! It should obviously return [2 3 4 5 6 7]! R is trash!

That's nonsense. Operators have precedence in every language. The critic would have gotten the "correct" answer with x[2:(3+j)]. No language is going to work if you don't understand operator precedence.

Another is that x[-1] drops the first element. If you come from another language, that might not be what you expect. If you come from a language that forbids negative index values, you might even think this makes R unusable.

Honestly, the vast majority of R critiques are not different from the folks that post here about how D does things wrong because it's different from C++ or Python or whatever language.
Nov 13 2020
parent data pulverizer <data.pulverizer gmail.com> writes:
On Friday, 13 November 2020 at 22:59:23 UTC, bachmeier wrote:
 Well, I don't want to get into a big debate about R ...
You already did so by reading something else into what I was saying
 I nonetheless don't agree with the conclusion that it should 
 never be used.
I never said this. From the beginning I said that there are some instances where R could be used.
 I've seen loads of R criticism, and it's almost always 
 something like this. Here's one I've probably seen 50 times:

 x <- 1:10
 j <- 4
 x[2:3+j]

 The code returns [6 7]! It should obviously return [2 3 4 5 6 
 7]! R is trash! ...
 Another is that x[-1] drops the first element. If you come from 
 another language, that might not be what you expect.
This is not what I'm talking about. There are *many* known issues with how R behaves, but you seem not to have selected any of the well known ones. Here are just a few:

1. R can suddenly decide that your character (string) should become a factor - even if you know about this, it can still be difficult to tell when it occurs. That's why stringsAsFactors exists.

2. R can suddenly decide that your selection in a matrix should be a vector. So if you were selecting mat[, 1:n] and n == 1, you don't get a matrix anymore, and your code will fall over. That's why drop = TRUE/FALSE exists. It can still be a difficult bug to find. I've seen this happen MANY times. The behaviour of Julia's mat[:, 1:n] when n == 1 is the expected one.

3. Recycling elements instead of throwing an error - loads of bugs for this one.

4. sapply will return whatever it wants with the same argument types. One minute a matrix and another time a list and so on. With the SAME ARGUMENT TYPES!

5. Dates will suddenly and unpredictably morph into numbers: cat("Today is: ", Sys.Date()).

6. The flimsy and almost unusable set of OOP tools. S3, S4, "R5" - Reference Classes - and R6: how many languages have that many OOP systems? None of which are particularly effective.

These are just a few of the popular ones, but there are MANY more. When your code base grows, these and many other types of issues start to have a serious impact on the stability of your application. There are places for R, but you have to be VERY careful where you put it.
 R was designed to be used as (1) a functional language, and (2) 
 a specialized tool to quickly solve a specialized set of 
 problems. It's remarkably difficult to write incorrect code if 
 you're using it as it was designed to be used, which includes 
 pure functions and immutable data structures.
Sounds as if you're arguing from authority here. The flow of what you've said is misleading. If you had said "R is *weakly* 'functional-like' and has some convenience as a result", I might reluctantly accept that. But R doesn't have anywhere near enough features from functional programming to even be *used* as a functional language. How can you have so much instability built into a language and call it functional? R is the opposite of the functional ethos! It is obscenely permissive on some issues and irrationally restrictive on others.
 Honestly, the vast majority of R critiques are not different 
 from the folks that post here about how D does things wrong 
 because it's different from C++ or Python or whatever language.
This is not true. I can write code in D and be pretty sure that it does what I think it does - even before thorough testing, you win massively just from static typing. The same goes for C++. Even before Python gained its type-annotation system it was still pretty robust for a dynamic language; now you can fairly well guarantee some things. Julia is in principle all but a static language with dynamic typing tacked on. R occupies a particular space as a programming language, and it's a space I'm wary of; I think others should be careful and cognizant of it, and use it accordingly.
Nov 13 2020
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 30.10.20 13:15, Russel Winder wrote:
 Having thought about it on and off for a decade, I am happy with the status
 quo around Python. Python code is (or should be) highly maintainable code
 designed for execution on a single threaded VM, easily understood and amended.
Weird. If you had given me those requirements as a list and asked me to relate them to Python, I would have told you that you had made a list of Python's weak points.
Nov 13 2020
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Thursday, 29 October 2020 at 22:52:59 UTC, Russel Winder wrote:
 1. People have been trying to make Python execute faster for 30 
 years. In the end everyone ends up just using CPython with any 
 and all optimisations it can get in.
I think where such efforts go wrong is that they try to optimize Python as a whole instead of looking at the usage patterns most programmers actually have. Most Python users never touch the more esoteric features (including concurrency, beyond generators) that Python offers. So you could easily create a simpler language, with libraries implemented at a low level, that behaves closely enough to Python for current Python users to feel at home with it.
Oct 30 2020