digitalmars.D - Pandas like features
- bioinfornatics (11/11) Oct 23 2020 As a researcher in BioInformatics I use a lot python numpy pandas
- Imperatorn (2/14) Oct 23 2020 2. Yes!
- mw (34/44) Oct 23 2020 I think it's definitely the biggest area and opportunities for D
- mw (20/42) Oct 23 2020 Let me further quote from [6]
- mw (3/5) Oct 23 2020 BTW, in Python arr[start:end:step], how / if it's possible for
- mw (11/16) Oct 23 2020 (Today I'm in the mood of a language historian :-)
- jmh530 (3/6) Oct 25 2020 https://github.com/ShigekiKarita/tfd
- bachmeier (6/18) Oct 23 2020 There is some activity in this space:
- bioinfornatics (10/34) Oct 23 2020 To me a scientific library need to be HPC oriented, able
- Russel Winder (17/26) Oct 24 2020 Acting somewhat as "Devil's Advocate"…
- bioinfornatics (5/24) Oct 24 2020 Maybe, anyway since years D search the killer app. Really I
- Russel Winder (31/35) Oct 24 2020 I agree that D could be the replacement for Python for many scientific m...
- Andre Pany (12/40) Oct 24 2020 Just expecting someone else is doing the work will very likely
- Paulo Pinto (10/38) Oct 27 2020 D is already quite late for the party.
- Ola Fosheim Grøstad (5/6) Oct 27 2020 Right, but I don't think being CPU centric will work out in that
- mw (4/5) Oct 29 2020 Is there a saying: better late than never :-)
- mw (3/9) Oct 29 2020 Of course, this community have to make the effort to make it
- 9il (15/29) Oct 24 2020 Magpie was another attempt to create Data Frame in D using the
- jmh530 (21/31) Oct 25 2020 I think, unfortunately, it is not always easy to communicate why
- jmh530 (35/57) Oct 26 2020 Adding a little more...
- data pulverizer (10/22) Oct 24 2020 I think the answer to questions like this is often money. You
- James Blachly (11/27) Oct 24 2020 Aside / self-promotion:
- glis-glis (21/32) Oct 27 2020 Self-promoting as well :-)
- jmh530 (2/7) Oct 25 2020 I'm certainly interested in it, but doing it well takes time.
- bachmeier (23/34) Oct 25 2020 Over time, I've come to three conclusions on this topic. I don't
- jmh530 (6/28) Oct 25 2020 I have no issue with calling libraries from other languages
- Paul Backus (3/8) Oct 25 2020 Aren't numpy and scipy themselves largely built on "calling
- bachmeier (3/12) Oct 25 2020 Exactly. And R (actually S at that time) started as a glue
- jmh530 (5/17) Oct 29 2020 Just ran across a Hacker News thread about pyston that relates to
- Russel Winder (30/36) Oct 29 2020 I only quickly skimmed the blog page, so this is a first reaction. I sha...
- jmh530 (4/17) Oct 30 2020 I think the point on multi-threaded Python came away as a big
- Russel Winder (34/38) Oct 30 2020 << I haven't properly read the blog entry as yet. Sorry. >>
- Abdulhaq (20/40) Oct 30 2020 I've spent much of the last 5 years writing code for trade
- bachmeier (11/15) Oct 30 2020 I would love to see this. A project to use the functionality of
- Laeeth Isharc (23/38) Nov 03 2020 We can call C++ libraries from our little language written in D
- data pulverizer (7/9) Nov 05 2020 If you're just passing arrays and pointers between Julia and D,
- bachmeier (7/18) Nov 05 2020 The question for me is if you can work with the same data
- jmh530 (7/13) Nov 05 2020 Yeah, that would be pretty nice. However, I would emphasize what
- bachmeier (6/20) Nov 05 2020 Definitely, but you need to have the functionality first. On the
- data pulverizer (2/7) Nov 05 2020 Looks cool.
- data pulverizer (24/29) Nov 05 2020 It's actually a problem I've been thinking about on and off for a
- jmh530 (3/11) Nov 05 2020 One thread only? Sounds like GIL...
- data pulverizer (15/28) Nov 05 2020 Not necessarily. The cryptographic keys are used to access the
- bachmeier (9/17) Nov 05 2020 R has externalptr, but to my knowledge, that's only for
- data pulverizer (7/20) Nov 05 2020 Yes but you make C calls in R on the pointed object. Given the
- data pulverizer (7/17) Nov 05 2020 p.s. I'm not saying that data shouldn't be accessible or returned
- bachmeier (15/33) Nov 12 2020 It really depends (which was one of the points of my earlier post
- bachmeier (2/7) Nov 12 2020
- data pulverizer (18/41) Nov 13 2020 You act as if I'm banning people from writing code in R - I
- bachmeier (37/49) Nov 13 2020 All I'm saying is that any project that interoperates with R or
- data pulverizer (51/71) Nov 13 2020 You already did so by reading something else into what I was
- Timon Gehr (4/7) Nov 13 2020 Weird. If you had given me those requirements as a list and asked me to
- Ola Fosheim Grøstad (9/12) Oct 30 2020 I think where such efforts go wrong is that they try to optimize
As a researcher in bioinformatics I use python, numpy, pandas and scipy a lot. But I am bored by the slowness of Python, even with CPython code, thanks to the GIL and unoptimized tail recursion. So I really think that D could play a big role in this field with Mir and dcompute.

1/ What is the state of Magpie, which was a GSoC 2019 project: Mir Data Analysis and Processing Library?

2/ Is scientific computing a field that the D language wants to grow in?

Thanks

Best regards
Oct 23 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

2. Yes!
Oct 23 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

I think it's definitely the biggest area of opportunity for D to become more popular. The GIL, lack of performance, and huge memory bloat are such a pain in Python.

Probably the best way to move forward is to provide libmir as a NumPy/Pandas *drop-in* replacement. (And I've suggested renaming Mir to NumD from a marketing / promotional perspective.) For the time being, from the language/library user's perspective, we can just use D/libmir to pre-process the data, and maybe save the result as csv/npz for further processing (by ... Python). Building or wrapping something like TensorFlow, I think, will need much more resources than the D community currently has, and I'm not sure it's worth the effort.

And from the language perspective, maybe D should adopt Python/NumPy's array indexing syntax, specifically:

1) use Python's arr[start:end], in addition to D's arr[start..end]

2) also allow negative indices, instead of [$-1]. (This $ is an improvement over Java/C++'s arr[arr.length - 1], but it is still less convenient than Python's negative index syntax.)

That Python gained such popularity in scientific computing over the past ~10 years is not an accident; Guido actually made that happen by extending Python's syntax:

https://en.wikipedia.org/wiki/NumPy#History

"""
The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
"""

Maybe Walter should join one of such SIGs as well :-)
Oct 23 2020
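For readers less familiar with the Python side, the indexing being discussed looks like this with plain Python lists (numpy arrays behave the same way for these cases):

```python
arr = [10, 20, 30, 40, 50]

# start:end slice, end-exclusive: the same meaning as D's arr[1 .. 3]
assert arr[1:3] == [20, 30]

# a negative index counts from the end: Python's arr[-1] is D's arr[$ - 1]
assert arr[-1] == 50
assert arr[-2] == 40

# negative indices also work inside slices: arr[1:-1] is D's arr[1 .. $ - 1]
assert arr[1:-1] == [20, 30, 40]
```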
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> [...] Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]

Let me further quote from [6]:

"""
During these early years, there was considerable interaction between the standard and scientific Python communities. In fact, Guido van Rossum, Python's Benevolent Dictator For Life (BDFL), was an active member of the matrix-sig. This close interaction resulted in Python gaining new features and syntax specifically needed by the scientific Python community. While there were miscellaneous changes, such as the addition of complex numbers, many changes focused on providing a more succinct and easier to read syntax for array manipulation. For instance, the parenthesis around tuples were made optional so that array elements could be accessed through, for example, a[0,1] instead of a[(0,1)]. The slice syntax gained a step argument—a[::2] instead of just a[:], for example—and an ellipsis operator, which is useful when dealing with multidimensional data structures.
"""

[6] https://www.computer.org/csdl/magazine/cs/2011/02/mcs2011020009/13rRUx0xPMx
Oct 23 2020
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> 1) use Python's arr[start:end], in addition to D's arr[start..end]

BTW, Python also has arr[start:end:step]; how, if at all, is this `step` possible in D now?
Oct 23 2020
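For reference, the `step` semantics in question look like this (plain Python lists; numpy is the same here). On the D side there is no slice step in the language itself, but Phobos's `std.range.stride` gives a similar effect on ranges, as a rough functional equivalent rather than a syntax-level one:

```python
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# arr[start:end:step] takes every step-th element of the half-open slice
assert arr[1:8:2] == [1, 3, 5, 7]

# with start/end omitted: every second element of the whole list
assert arr[::2] == [0, 2, 4, 6, 8]

# a negative step walks backwards (a common reverse idiom)
assert arr[::-1] == [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```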
On Friday, 23 October 2020 at 22:53:29 UTC, mw wrote:
> BTW, in Python arr[start:end:step], how / if it's possible for this `step` in now D?

(Today I'm in the mood of a language historian :-)

Some of Guido's early discussions of Python array indexing:

Slices
https://mail.python.org/pipermail/matrix-sig/1996-April/000553.html

Pseudo Indices
https://mail.python.org/pipermail/matrix-sig/1996-January/000331.html

Multi-dimensional indexing and other comments
https://mail.python.org/pipermail/matrix-sig/1995-October/000077.html

A problem with slicing
https://mail.python.org/pipermail/matrix-sig/1995-September/000042.html
Oct 23 2020
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> [snip] Build or wrap something like tensorflow, I think will need much more resource than the D community current have, also I'm not sure if it worth the effort.

https://github.com/ShigekiKarita/tfd

The author of that has some other useful libraries.
Oct 25 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

There is some activity in this space:
https://code.dlang.org/?sort=updated&category=library.scientific

This project doesn't seem too active, but it was an earlier attempt:
http://dlangscience.github.io/
Oct 23 2020
On Friday, 23 October 2020 at 22:48:16 UTC, bachmeier wrote:
> There is some activity in this space:
> https://code.dlang.org/?sort=updated&category=library.scientific

To me a scientific library needs to be HPC oriented, able:

- to perform parallel computation on CPU or GPU
- to use a divide and conquer strategy in order to compute over multiple nodes
- to have dataframe features
- to have scipy features

Such a library would be awesome, as these days Python's slowness becomes more and more of a problem as data grows exponentially year after year.
Oct 23 2020
On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote:
> [...] A such library would be awesome as at these time python slowness become more and more important as data grow exponentially year after year

Acting somewhat as "Devil's Advocate"…

Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff that other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.

I am not sure Chapel has pandas style data frames explicitly, but I'll bet something equivalent is already in there.

--
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Oct 24 2020
On Saturday, 24 October 2020 at 09:29:46 UTC, Russel Winder wrote:
> Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts [...]

Maybe; anyway, for years D has been searching for the killer app. I really think this area is perfect for D. Data/business analysis is so important these days, in science, the economy and elsewhere, that D could be a good choice.
Oct 24 2020
On Sat, 2020-10-24 at 11:05 +0000, bioinfornatics via Digitalmars-d wrote:
> Maybe, anyway since years D search the killer app. [...]

I agree that D could be the replacement for Python for many scientific milieux: bioinformatics and astronomy, to name but two obvious ones. The issue though is that the Python language over NumPy and the associated communities captured the moment years ago, and many people contributed many extensions to a few libraries and packages.

Traditionally (as it were) bioinformatics and astronomy have emphasised exploration over computation, and often offloaded computation to C or C++ realised frameworks. This has reinforced prioritising code comprehension and evolution over computation speed, thus militating in favour of Python since the packages were there.

Whilst D could replace Python, the question is will it, and the answer is determined by who would write the code. Sadly history tells us this will lead to a (very) long (divergent) thread and result in no-one actually doing anything. I would like to be proved wrong.

The possible upside is that all the major Python packages started as one or two people creating something that others then joined in with and turned into the de facto standard. Might this finally happen in the D community?
Oct 24 2020
On Saturday, 24 October 2020 at 12:08:00 UTC, Russel Winder wrote:
> Whilst D could replace Python, the question is will it and the answer is determined by who would write the code. [...] Might this finally happen in the D community?

Just expecting someone else to do the work means it very likely will not happen at this point in time. But you can actually increase the chances it will happen in future.

My opinion: neither a language feature X nor a specific library Y is missing at the moment, but the community needs to do massive advertisement for the D Programming Language. The bigger the community becomes, the more libraries will be created. Therefore you can start here by advertising D and its strengths on every channel which makes sense.

Kind regards
Andre
Oct 24 2020
On Saturday, 24 October 2020 at 11:05:48 UTC, bioinfornatics wrote:
> Maybe, anyway since years D search the killer app. [...]

D is already quite late for the party.

[...] getting better in this domain.
https://docs.microsoft.com/en-us/dotnet/csharp/tutorials/ranges-indexes

While support for working with Spark just went 1.0 this week,
https://dotnet.microsoft.com/apps/data/spark

D would do better to see how to interoperate with existing stuff.
Oct 27 2020
On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
> D is already quite late for the party.

Right, but I don't think being CPU centric will work out in that domain anyway. You need to use Vulkan and Metal, aim for future hardware and do it really well. The market is in the future, not here, right now.
Oct 27 2020
On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
> D is already quite late for the party.

Is there a saying: better late than never :-)

I still think D has a chance, if we really want to catch up, to overthrow Python.
Oct 29 2020
On Friday, 30 October 2020 at 02:03:10 UTC, mw wrote:
> Is there a saying: better late than never :-)

Of course, this community has to make the effort to make it happen.
Oct 29 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> 1/ what is the state of Magpie which was a GSoC 2019: - Mir Data Analysis and Processing Library

Magpie was another attempt to create a DataFrame in D using the architecture patterns from Python/R. It has never been part of the Mir infrastructure. A DataFrame in D should be a little different from what people have in scripting languages; otherwise, it will not be good enough.

> 2/ does the scientific computing field is something that D language want to grow ?

I would like to say yes. But the reality is that D isn't going to grow. The sci related answer is that the members of the DLF either ignore my requests or even reject related work for the compiler because they "don't see much of a difference".

The following work is what really will be good for Sci D and especially for DataFrame:
https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1023.md
https://github.com/dlang/dmd/pull/9778
Oct 24 2020
On Saturday, 24 October 2020 at 12:26:21 UTC, 9il wrote:
> [snip] The following work is what really will be good for Sci D and especially for DataFrame.
> https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1023.md
> https://github.com/dlang/dmd/pull/9778

I think, unfortunately, it is not always easy to communicate why these changes are important or valuable. But, while you weren't able to convince Atila of the value of the proposed features, I also don't think you were ignored either, at least in that case.

That being said, I never understood Atila's argument about this feature being a light version of Rust traits (I'm not aware of how Haskell's typeclasses work). A Rust trait is a list of functions that must be implemented by a type. However, you can also statically dispatch based on the trait (you can dynamically as well, but I imagine you would prefer not to be able to do that). You conceivably would be more interested in the static dispatch part of it.

It's not really about a PackedUpperTriangularMatrix requiring specific functions to be a PackedUpperTriangularMatrix rather than a Slice. All it takes is the right iterator type. So it's more about how the specific type is specialized (giving a specific iterator to PackedUpperTriangularMatrix). There might be a way to create a feature that's not Rust traits that does what you want and is more general than this type of template alias deduction.
Oct 25 2020
On Sunday, 25 October 2020 at 21:30:59 UTC, jmh530 wrote:
> [snip] There might be a way to create a feature that's not Rust traits that does what you want and is a more general feature than this type of template alias deduction.

Adding a little more...

The situation that this issue is trying to address is something like a template T!(U!V) that you want to be able to use like W!V. I can't help but think that concepts could help with this situation. Adapting the C++20 syntax, consider this simplest implementation:

template PackedUpperTriangularMatrix(T)
{
    concept PackedUpperTriangularMatrix = is(T: Slice!(StairsIterator!(U*, "-")), U);
}

Assuming the same functionality as in C++20, you could use this in a function as in

void foo(PackedUpperTriangularMatrix x) {}

However, if you then want to place any constraints on the U above, then you're a bit SOL. To really get the functionality working, you would need a generic kind of concept, where the concept you are defining is itself generic. As far as I can tell, you can't do this with C++20, but I would imagine the syntax adapted from above might be something like

template PackedUpperTriangularMatrix(T)
{
    concept PackedUpperTriangularMatrix(U) = is(T: Slice!(StairsIterator!(U*, "-")));
}

and would enable you to write the function as

void foo(T)(PackedUpperTriangularMatrix!T x) {}

In other words, if D had the ability to define concepts that are also generic themselves, then it would enable the functionality you want also.
Oct 26 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

I think the answer to questions like this is often money. You need to hire someone to write the code that does this. In the world of research this means a grant to fund it. If you think there is real merit in the D programming language in your field, the best thing to do is to make the case in the form of a grant application to pay a researcher to write the necessary code.

This is what languages like R, Python, and Julia do. They are flush with cash because people write grant applications for PhD students and researchers to build libraries for those languages.
Oct 24 2020
On 10/23/20 3:31 PM, bioinfornatics wrote:
> 1/ what is the state of Magpie which was a GSoC 2019: - Mir Data Analysis and Processing Library [...]

Aside / self-promotion:

We use D extensively in our bioinformatics / computational biology program. Check out
https://github.com/blachlylab/dhtslib/

Also just published a HTS/NGS tool written in D:
https://academic.oup.com/nargab/article/2/4/lqaa070/5917298

I should probably do an `announce` forum post.

Currently trying to decide whether to extend Magpie or roll our own (adding only features that are needed). I've also enjoyed using Mir ndslice in a couple of test projects, but as you know that is not really a dataframe.
Oct 24 2020
On Saturday, 24 October 2020 at 16:43:45 UTC, James Blachly wrote:
> Aside / self-promotion:
> We use D extensively in our bioinformatics / computational biology program. [...]

Self-promoting as well :-)

I also started to use D in the domain of biophysics. Big computations are still done with our C++ code, but I'm translating old Python pre- and post-treatment scripts into D, getting a nice speedup:
https://github.com/glis-glis/biophysics

Please note that all I know about D is from tour.dlang.org, and I'm often rather lazy concerning comments (which obviously comes back to bite me rather often when I don't understand my own code 2 months later...).

What is sometimes lacking is information and documentation about what exists for D. Here it states that D can work with GPUs:
https://dlang.org/areas-of-d-usage.html#gpu
but all you get is a link to a presentation from 2016.

I needed a way to calculate a principal component analysis. Mir can't do it, but I found out Lubeck can. So I tried the Lubeck example of the dlang tour, which didn't work. I was able to correct it, and made a pull request so the example is now working, but most newcomers would probably just say "Ok, it's broken, let's get back to Numpy".
Oct 27 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [snip] 2/ does the scientific computing field is something that D language want to grow ?

I'm certainly interested in it, but doing it well takes time.
Oct 25 2020
On Sunday, 25 October 2020 at 20:30:58 UTC, jmh530 wrote:
> I'm certainly interested in it, but doing it well takes time.

Over time, I've come to three conclusions on this topic. I don't know that time is the issue.

1. This community seems to have NIH syndrome. A lot of users are averse to reusing the functionality provided by other languages. I find that downright weird, given that one of the selling points of D is its ability to easily interoperate with other languages. It makes sense to *extend* existing projects in other languages using D. The idea of rewriting millions of lines of code for no benefit other than just saying it's written in D is obviously pointless, so there's no motivation to do it.

2. Scientific computing is a big field. In terms of things you'd need to be "complete", you'd have to write maybe ten times as much code as you would to have a complete web development offering. It also requires incredible amounts of expertise. Statistics, economics, physics, math, chemistry, biology, and on and on are all areas that individually require a great deal of specialized knowledge in addition to time. For some things, performance is the most important property, including use of the GPU. That's not simple.

3. D's syntax is okay, but it's not flexible enough to express everything you need to work comfortably. A DSL or similar might be necessary.
Oct 25 2020
On Sunday, 25 October 2020 at 23:30:10 UTC, bachmeier wrote:[snip] 1. This community seems to have NIH syndrome. A lot of users are averse to reusing the functionality provided by other languages. I find that downright weird given that one of the selling points of D is its ability to easily interoperate with other languages. It makes sense to *extend* existing projects in other languages using D. The idea of rewriting millions of lines of code for no benefit other than just saying it's written in D is obviously pointless, so there's no motivation to do it. 2. Scientific computing is a big field. In terms of things you'd need to be "complete", you'd have to write maybe ten times as much code as you would to have a complete web development offering. It also requires incredible amounts of expertise. Statistics, economics, physics, math, chemistry, biology, and on and on are all areas that individually require a great deal of specialized knowledge in addition to time. For some things, performance is the most important property, including use of the GPU. That's not simple. 3. D's syntax is okay, but it's not flexible enough to express everything you need to work comfortably. A DSL or similar might be necessary.I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in and use to accomplish a lot.
Oct 25 2020
On Monday, 26 October 2020 at 01:42:42 UTC, jmh530 wrote:I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in to accomplish a lot.Aren't numpy and scipy themselves largely built on "calling libraries from other languages"? Specifically, C and Fortran.
Oct 25 2020
On Monday, 26 October 2020 at 01:55:46 UTC, Paul Backus wrote:On Monday, 26 October 2020 at 01:42:42 UTC, jmh530 wrote:Exactly. And R (actually S at that time) started as a glue language for libraries in those languages.I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in to accomplish a lot.Aren't numpy and scipy themselves largely built on "calling libraries from other languages"? Specifically, C and Fortran.
Oct 25 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:As a researcher in BioInformatics I use a lot python numpy pandas and scipy. But I am bored by the slowness of python even with cpython code thanks to the GIL and un-optimized tail recursion. So I thinks really that D could play a big role in this field with MIR and dcompute. 1/ what is the state of Magpie which was a GSoC 2019: - Mir Data Analysis and Processing Library 2/ does the scientific computing field is something that D language want to grow ? Thanks Best regardsJust ran across a Hacker News thread about pyston that relates to some of the discussion here [1]. It seems there's still a demand for alternatives to python for data science. [1] https://news.ycombinator.com/item?id=24921790
Oct 29 2020
On Thu, 2020-10-29 at 10:23 +0000, jmh530 via Digitalmars-d wrote:[…]Just ran across a Hacker News thread about pyston that relates to some of the discussion here [1]. It seems there's still a demand for alternatives to python for data science. [1] https://news.ycombinator.com/item?id=24921790I only quickly skimmed the blog page, so this is a first reaction. I shall read the material more carefully tomorrow and send an update. 1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in. 2. Python is slow, and fundamentally single threaded. Attempts to make Python multi-threaded seem to fall by the wayside. The micro-benchmarks seem to indicate Pyston is just a slightly faster Python and thus nothing really to write home about – yes, even a headline figure of 20% is nothing to write home about! 3. If you want computational performance from Python code, you use C, C++ (or D) extensions. In particular you use NumPy. I would guess that almost all bioinformatics, astronomy, machine learning, AI, data science stuff uses NumPy. Python execution performance is irrelevant compared to NumPy code performance. I am happy to be shown to be wrong, but I suspect not. -- Russel. Dr Russel Winder t: +44 20 7585 2200 41 Buckmaster Road m: +44 7770 465 077 London SW11 1EN, UK w: www.russel.org.uk
Oct 29 2020
On Thursday, 29 October 2020 at 22:52:59 UTC, Russel Winder wrote:[snip] I only quickly skimmed the blog page, so this is a first reaction. I shall read the material more carefully tomorrow and send an update. 1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in. 2. Python is slow, and fundamentally single threaded. Attempts to make Python multi-threaded seem to fall by the wayside. The micro-benchmarks seem to indicate Pyston is just a slightly faster Python and thus nothing really to write home about – yes even a headline figure of 20% is nothing to write home about! [snip]I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.
Oct 30 2020
On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d wrote:[…]I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.<< I haven't properly read the blog entry as yet. Sorry. >> Guido saw (cf. he and I had a long "discussion" at EuroPython 2010, there were many witnesses) GIL as absolutely fine for CPython in perpetuity, that if Pypy came up with a GIL-free VM then that would be fine. His mindset was (and I suspect may still be) that Python code was/is not about being CPU bound code, it was/is about sequential and concurrent, not parallel for performance, code. As long as there is NumPy and other PVM extensions, or use of message passing between processes, that allow for GIL-free parallel, CPU bound processing, it is hard to say Guido was/is wrong. (And in 2010 it was even harder :-) ) Having thought about it on and off for a decade, I am happy with the status quo around Python. Python code is (or should be) highly maintainable code designed for execution on a single threaded VM, easily understood and amended. Anyone trying to do CPU bound code using Python is "doing it wrong". Whether D is the right alternative, or a language such as Chapel is better, is a moot point. Pandas is built on NumPy and so has the same parallelism properties as any other NumPy realised package. -- Russel.
Oct 30 2020
On Friday, 30 October 2020 at 12:15:58 UTC, Russel Winder wrote:On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d wrote:I've spent much of the last 5 years writing code for trade studies and other optimisations on top of python, numpy and multiprocessing. Lately I have been working a lot with Pandas for multi-dimensional optimisation and machine learning. The slow performance of python in the glue layer between numpy, multiprocessing etc. is a non-issue. I can easily keep all 8 cores very busy running efficient C++ CFD, machine learning codes etc. using the above combination. The migration from P2 to P3 was also pretty tame. For people doing real work, it's not a big deal. Sure it was a distraction but it has its benefits, I'm glad they did it. Boring opinion, and doesn't generate ad income from blog hits, but there you go. I would like to see D have a numpy equivalent but realistically you won't duplicate the numpy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that. Core Pandas on its own BTW isn't hard to implement IMO. It turns out it's very expressive and very useful, but not a hard thing to copy.[…]I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.<< I haven't properly read the blog entry as yet. Sorry. >> Guido saw (cf. he and I had a long "discussion" at EuroPython 2010, there were many witnesses) GIL as absolutely fine for CPython in perpetuity, that if Pypy came up with a GIL-free VM then that would be fine. His mindset was (and I suspect may still be) that Python code was/is not about being CPU bound code, it was/is about sequential and concurrent, not parallel for performance, code. 
As long as there is NumPy and other PVM extensions, or use of message passing between processes, that allow for GIL-free parallel, CPU bound processing, it is hard to say Guido was/is wrong. (And in 2010 it was even harder :-) ) Pandas is built on NumPy and so has the same parallelism properties as any other NumPy realised package.
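To make the claim above concrete — that core Pandas is very expressive but not hard to copy — here is a toy columnar frame in Python, built on nothing but a dict of equal-length columns (the `Frame` class is invented for illustration, not any real library):

```python
class Frame:
    """Toy columnar data frame: a dict of equal-length columns."""

    def __init__(self, cols):
        lengths = {len(v) for v in cols.values()}
        assert len(lengths) == 1, "columns must have equal length"
        self.cols = cols

    def __getitem__(self, name):
        """Column selection, like df[name] in pandas."""
        return self.cols[name]

    def where(self, mask):
        """Boolean-mask row selection, like df[mask] in pandas."""
        return Frame({k: [x for x, keep in zip(v, mask) if keep]
                      for k, v in self.cols.items()})

    def mean(self, name):
        col = self.cols[name]
        return sum(col) / len(col)

df = Frame({"species": ["a", "b", "a"], "len": [1.0, 2.0, 3.0]})
mask = [s == "a" for s in df["species"]]
print(df.where(mask).mean("len"))  # 2.0
```

The hard part of the real thing is not this core model but the long tail: indexes, joins, missing-data handling, and fast vectorised kernels underneath.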
Oct 30 2020
On Friday, 30 October 2020 at 18:23:38 UTC, Abdulhaq wrote:I would like to see D have a numpy equivalent but realistically you won't duplicate the numy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that.I would love to see this. A project to use the functionality of Python, R, and Julia from inside a D program with little effort. William Stein did something like that with SageMath, but from a different angle. I can say the R part is simple. (Not only the parts written in R, but any underlying C, C++, or Fortran code with R bindings as well.) I wouldn't expect it to be much harder for the other languages, but since I don't work with them, I can't say. The advantage of D would be the new functionality you write in D on top of the existing functionality in those languages.
Oct 30 2020
On Friday, 30 October 2020 at 20:32:32 UTC, bachmeier wrote:On Friday, 30 October 2020 at 18:23:38 UTC, Abdulhaq wrote:We can call C++ libraries from our little language written in D and you can even write C++ inline, compile it at runtime and call it thanks to Cling. Can call python although it's not yet in master. Initially via pyd but people have their own particular versions, installs and setups so instead moving to RPC over named pipes using nanomsg. That should generalise to any other languages we would want to call too. Serialisation and deserialisation isn't dirt cheap but the idea isn't to write inner loops in python. There's a lot more overhead doing it this way - it's not for free. But it is valuable for internal use for the problems we currently have. I have a little plugin that uses your R wrapper but it's not used by anyone yet. Time taken to a first version matters for us. The first version doesn't usually need to be fast for user code. This should allow us to access libraries without having to combine that with language choices. In time I figure we could use Cling to generate declarations and light wrappers for C++ too. Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.I would like to see D have a numpy equivalent but realistically you won't duplicate the numpy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that.I would love to see this. A project to use the functionality of Python, R, and Julia from inside a D program with little effort. William Stein did something like that with SageMath, but from a different angle. I can say the R part is simple. (Not only the parts written in R, but any underlying C, C++, or Fortran code with R bindings as well.) I wouldn't expect it to be much harder for the other languages, but since I don't work with them, I can't say. 
The advantage of D would be the new functionality you write in D on top of the existing functionality in those languages.
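The RPC-over-pipes pattern described above can be sketched with nothing but the standard library. Here JSON lines over a subprocess pipe stand in for nanomsg over named pipes; the worker and its one-request-per-line protocol are made up for illustration:

```python
import json
import subprocess
import sys

# hypothetical worker process: reads one JSON request per line on
# stdin, writes one JSON response per line on stdout
worker_src = """\
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    print(json.dumps({"result": sum(req["args"])}), flush=True)
"""

proc = subprocess.Popen([sys.executable, "-c", worker_src],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        text=True)

# one round trip: serialise the request, read back the response
proc.stdin.write(json.dumps({"fn": "sum", "args": [1, 2, 3]}) + "\n")
proc.stdin.flush()
result = json.loads(proc.stdout.readline())["result"]

proc.stdin.close()
proc.wait()
print(result)  # 6
```

As the post says, the serialisation round trip is not free, so the pattern only pays off when the callee does substantial work per request.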
Nov 03 2020
On Tuesday, 3 November 2020 at 22:51:14 UTC, Laeeth Isharc wrote:Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.If you're just passing arrays and pointers between Julia and D, this is pretty simple no? Julia's ccall makes that relatively simple. You can even compile D code and call it from Julia - that should be pretty straightforward. Calling Julia from D just needs the Julia C API, which again is pretty straightforward. You'll need to convert what you need from julia.h header file.
Nov 05 2020
On Thursday, 5 November 2020 at 13:11:17 UTC, data pulverizer wrote:On Tuesday, 3 November 2020 at 22:51:14 UTC, Laeeth Isharc wrote:The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.If you're just passing arrays and pointers between Julia and D, this is pretty simple no? Julia's ccall makes that relatively simple. You can even compile D code and call it from Julia - that should be pretty straightforward. Calling Julia from D just needs the Julia C API, which again is pretty straightforward. You'll need to convert what you need from julia.h header file.
Nov 05 2020
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:[snip] The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.
Nov 05 2020
On Thursday, 5 November 2020 at 19:39:43 UTC, jmh530 wrote:On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.[snip] The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.
Nov 05 2020
On Thursday, 5 November 2020 at 21:57:46 UTC, bachmeier wrote:Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.Looks cool.
Nov 05 2020
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.It's actually a problem I've been thinking about on and off for a while but haven't got round to actually trying to implement it. 1. If I had to do this, I would first decide on a collection of common data structures to share, starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays of basic element types with dimensional information in the form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python. 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be accessed by another. 3. Binary file i/o for those is pretty simple, but necessary to store results and read them in any of the programs afterwards. 4. All the languages have C APIs so you'd write interfaces in D using these to call from D to the languages. All the languages can call D extern C functions in dlls directly using their versions of ccall. Another alternative to mmap is using network serialization which would be more cross-platform and fungible but this seems like it could be slow to me.
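The proposed common structure — a contiguous buffer plus a dims array — is easy to prototype. A Python sketch, using column-major indexing to match the R/Julia convention mentioned (the helper name and byte layout are illustrative assumptions, not an agreed format):

```python
import struct

def offset(dims, idx):
    """Column-major (R/Julia-style) flat offset for a multi-index."""
    off, stride = 0, 1
    for d, i in zip(dims, idx):
        off += i * stride
        stride *= d
    return off

# a 2x3 double matrix: a flat buffer of 6 doubles plus a dims array
dims = [2, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # column-major storage
print(data[offset(dims, (1, 2))])  # row 1, col 2 (0-based) -> 6.0

# the same representation serialises trivially for point 3 above:
# two int64 dims followed by six float64 values, little-endian
payload = struct.pack("<2q6d", *dims, *data)
print(len(payload))  # 2*8 + 6*8 = 64 bytes
```

Because the layout is just "dims header + raw buffer", any of the four languages can reconstruct the matrix from the same bytes without copying element by element.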
Nov 05 2020
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:[snip] 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be access by another. [snip]One thread only? Sounds like GIL...
Nov 05 2020
On Thursday, 5 November 2020 at 20:30:03 UTC, jmh530 wrote:On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:Not necessarily. The cryptographic keys are used to access the file, not to lock it. I believe mmap files can be secured with a password, which should be generated cryptographically as an alternative to being manually entered and stored somewhere. It protects the file from unsanctioned access. Even though the file itself will probably only take a single password rather than some synchronized rotating mechanism. However it is done, the memory will need to be protected. There should be no reason why multiple processes could not read from a file. Only writing would require a lock from other processes for obvious reasons. As I said, I haven't even begun to properly plan an implementation yet, just something that I think about from time to time.[snip] 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be accessed by another. [snip]One thread only? Sounds like GIL...
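Multiple mappings of the same file are indeed routine; it is one writer plus any number of readers that needs coordination, not mapping itself. A minimal Python sketch of two independent mappings of one file (the access-control/crypto layer discussed above is not shown):

```python
import mmap
import os
import tempfile

# a file-backed buffer that any number of processes could map
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 64)

# writer's view of the file
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 64) as m:
    m[:5] = b"hello"
    m.flush()  # push the change through to the backing file

# reader's view -- in practice this could be another process,
# or another language mapping the same path
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 64, access=mmap.ACCESS_READ) as m:
    print(bytes(m[:5]))  # b'hello'
```

Here the two mappings run in sequence for simplicity; concurrent access would need the write lock the post mentions.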
Nov 05 2020
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:1. If I had to do this, I would first decide on a collection of common data structures to share starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types with a dimensional information in form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R. It assumes it can do anything it wants with that data. Unless they've changed something (which is possible since I haven't looked into it in years) you'd have to copy any data you send to an R function. But if you're calling R maybe you don't care about that.
Nov 05 2020
On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.1. If I had to do this, I would first decide on a collection of common data structures to share starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types with a dimensional information in form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.
Nov 05 2020
On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
Nov 05 2020
On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer wrote:On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters. But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
Nov 12 2020
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:I know for web applications or finance or some other areas where the distinction matters.This should beI know for web applications or finance or some other areas performance matters enough that they'll distinguish between interactive and production code, and even write two versions.
Nov 12 2020
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer wrote:You act as if I'm banning people from writing code in R - I certainly don't have the power to do that. And yes, it varies from situation to situation, as I clearly alluded to. I've done a lot of projects in R. I'm well aware that sometimes it is unavoidable for the client. What I am saying is given the choice, you should probably choose a different tool apart from "some minor instances". I've seen R go spectacularly wrong because of the type of language it is: it makes assumptions about what the programmer means, which can cause epic bugs, and very often it does it silently, and it happens all the time. You can never be sure that *any* piece of R code will work as it should. It's just the nature of the language. People write it because it's easy and has "boilerplate", which is fine if you are proof-of-concepting or doing research and some other things, but you use it in mission critical production apps and it may well blow up in your face, and you might not even know. And that's before we get to performance, and other things blah, blah, blah.On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters. But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.... 
I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.... avoiding writing production code in R is just my professional advice.
Nov 13 2020
On Friday, 13 November 2020 at 21:28:30 UTC, data pulverizer wrote:You act as if I'm banning people from writing code in R - I certainly don't have the power to do that. And yes, it varies from situation to situation, as I clearly eluded to.All I'm saying is that any project that interoperates with R or any other language has to accept that the programmers you're targeting are going to write code in the other language. If for no other reason than the fact that they've already written tens of thousands of lines of code that they don't want to throw away.I've done a lot of projects in R. I'm well aware that sometimes it is unavoidable for the client. What I am saying is given the choice, you should probably choose a different tool apart from "some minor instances". I've seen R go spectacularly wrong because of the type of language it is, it makes assumptions of that the programmer means which can cause epic bugs, and very often, it does it silently and it happens all the time. You can never be sure that *any* piece of R code will work as it should. It's just the nature of the language.Well, I don't want to get into a big debate about R, but I don't for the most part agree with this view. R was designed to be used as (1) a functional language, and (2) a specialized tool to quickly solve a specialized set of problems. It's remarkably difficult to write incorrect code if you're using it as it was designed to be used, which includes pure functions and immutable data structures. It originated as a dialect of Scheme and that's the foundation everything is built on. Where I do agree with you is the type system. Not only is it a dynamic language, it has an extremely weak type system, which most likely has something to do with the fact that it originated down the hall in the same place that gave us C. I nonetheless don't agree with the conclusion that it should never be used. I've seen loads of R criticism, and it's almost always something like this. 
Here's one I've probably seen 50 times:

x <- 1:10
j <- 4
x[2:3+j]

The code returns [6 7]! It should obviously return [2 3 4 5 6 7]! R is trash! That's nonsense. Operators have precedence in every language. The critic would have gotten the "correct" answer with x[2:(3+j)]. No language is going to work if you don't understand operator precedence. Another is that x[-1] drops the first element. If you come from another language, that might not be what you expect. If you come from a language that forbids negative index values, you might even think this makes R unusable. Honestly, the vast majority of R critiques are not different from the folks that post here about how D does things wrong because it's different from C++ or Python or whatever language.
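The precedence point can be replayed mechanically in Python (0-indexed, so every subscript is shifted down by one relative to the R version):

```python
x = list(range(1, 11))  # x <- 1:10
j = 4
# in R, ':' binds tighter than '+', so x[2:3+j] means x[(2:3) + j]
idx = [i + j for i in (2, 3)]       # c(6, 7)
print([x[i - 1] for i in idx])      # [6, 7]  (R is 1-indexed)
# what the critic expected requires explicit parentheses, x[2:(3+j)]:
print(x[2 - 1:3 + j])               # [2, 3, 4, 5, 6, 7]
```

Same data, same precedence rule, same two answers — the surprise is the grammar, not the language.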
Nov 13 2020
On Friday, 13 November 2020 at 22:59:23 UTC, bachmeier wrote:Well, I don't want to get into a big debate about R ...You already did so by reading something else into what I was sayingI nonetheless don't agree with the conclusion that it should never be used.I never said this. From the beginning I said that there are some instances where R could be used.I've seen loads of R criticism, and it's almost always something like this. Here's one I've probably seen 50 times: x <- 1:10 j <- 4 x[2:3+j] The code returns [6 7]! It should obviously return [2 3 4 5 6 7]! R is trash! ... Another is that x[-1] drops the first element. If you come from another language, that might not be what you expect.This is not what I'm talking about, but there are *many* known issues with how R behaves, but you seem not to have selected any of the well known ones. Here are just a few: 1. R can suddenly decide that your character (string) should become a factor - even if you know about this it can still be difficult to tell when this occurs. That's why stringsAsFactors exists. 2. R can suddenly decide that your selection in a matrix should be a vector. So if you were selecting mat[, 1:n] and n == 1 you don't get a matrix anymore, and your code will fall over. That's why drop = TRUE/FALSE exists. Can still be a difficult bug to find. I've seen this happen MANY times. The behaviour in Julia mat[:, 1:n] when n == 1 is the expected one. 3. Recycling elements instead of throwing an error - loads of bugs for this one. 4. sapply will return whatever it wants with the same argument types. One minute a matrix and another time a list and so on. With the SAME ARGUMENT TYPES! 5. Dates will suddenly unpredictably morph into numbers. cat("Today is: ", Sys.Date()). 6. The flimsy and almost unusable set of OOP tools. S3, S4, "R5" - Reference Classes, and R6 - how many languages have that many OOP systems? None of which are particularly effective. 
These are just a few of the popular ones but there are MANY more. When your code base grows, these and many other types of issues start to have a serious impact on the stability of your application. There are places for R, but you have to be VERY careful where you put it.R was designed to be used as (1) a functional language, and (2) a specialized tool to quickly solve a specialized set of problems. It's remarkably difficult to write incorrect code if you're using it as it was designed to be used, which includes pure functions and immutable data structures.Sounds as if you're "quoting from authority" here. The flow of what you've said here is misleading. If you said, "R is *weakly* 'functional-like' and has some convenience as a result", I might reluctantly accept that. But R doesn't have anywhere near enough features from functional programming to be even *used* as a functional language. How can you have so much instability built into a language and call it functional? R is the opposite of the functional ethos! It has obscene permissiveness on some issues and irrational restrictiveness on others.Honestly, the vast majority of R critiques are not different from the folks that post here about how D does things wrong because it's different from C++ or Python or whatever language.This is not true. I can write code in D and be pretty sure that it does what I think it does - even before thorough testing, you win massively just with static typing. C++ too. Even before the new Python function type system it was still pretty robust for a dynamic language, now you can fairly well guarantee some things. Julia in principle is all but a static language with dynamic-typing "tagged on". R occupies a particular space as a programming language, and it's a space I'm wary of, and I think others should be careful, cognizant of it, and use it accordingly.
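Point 3 in the list above, silent recycling, is easy to demonstrate: a small Python function mimicking R's rule shows how a length mismatch "succeeds" instead of raising an error (`recycle_add` is a hypothetical helper written for illustration; base R applies this rule implicitly):

```python
def recycle_add(a, b):
    """R-style vector recycling: the shorter vector's elements are
    reused cyclically to match the longer vector's length. R only
    warns when the lengths aren't multiples; it never errors."""
    n = max(len(a), len(b))
    return [a[i % len(a)] + b[i % len(b)] for i in range(n)]

# lengths 4 and 2: the shorter vector wraps around cleanly
print(recycle_add([1, 2, 3, 4], [10, 20]))  # [11, 22, 13, 24]

# lengths 3 and 2: an off-by-one in data prep still "succeeds",
# producing plausible-looking but wrong numbers
print(recycle_add([1, 2, 3], [10, 20]))     # [11, 22, 13]
```

That second call is the failure mode the post describes: no exception, just quietly recycled data flowing downstream.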
Nov 13 2020
On 30.10.20 13:15, Russel Winder wrote:Having thought about it on and off for a decade, I am happy with the status quo around Python. Python code is (or should be) highly maintainable code designed for execution on a single threaded VM, easily understood and amended.Weird. If you had given me those requirements as a list and asked me to relate them to Python, I would have told you you made a list of weak points of Python.
Nov 13 2020
On Thursday, 29 October 2020 at 22:52:59 UTC, Russel Winder wrote:1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in.I think where such efforts go wrong is that they try to optimize Python instead of looking at the usage pattern that most programmers have. Most Python users never use many of the esoteric features (including concurrency, beyond generators) that Python offers. So you could easily create a simpler language with libraries implemented at a low level that exhibit behaviour close enough to Python for current Python users to feel at home with it.
Oct 30 2020