digitalmars.D - Pandas like features
- bioinfornatics (11/11) Oct 23 2020 As a researcher in BioInformatics I use a lot python numpy pandas
- Imperatorn (2/14) Oct 23 2020 2. Yes!
- mw (34/44) Oct 23 2020 I think it's definitely the biggest area and opportunities for D
- mw (20/42) Oct 23 2020 Let me further quote from [6]
- mw (3/5) Oct 23 2020 BTW, in Python arr[start:end:step], how / if it's possible for
- mw (11/16) Oct 23 2020 (Today I'm in the mood of a language historian :-)
- jmh530 (3/6) Oct 25 2020 https://github.com/ShigekiKarita/tfd
- bachmeier (6/18) Oct 23 2020 There is some activity in this space:
- bioinfornatics (10/34) Oct 23 2020 To me a scientific library need to be HPC oriented, able
- Russel Winder (17/26) Oct 24 2020 Acting somewhat as "Devil's Advocate"…
- bioinfornatics (5/24) Oct 24 2020 Maybe, anyway since years D search the killer app. Really I
- Russel Winder (31/35) Oct 24 2020 I agree that D could be the replacement for Python for many scientific m...
- Andre Pany (12/40) Oct 24 2020 Just expecting someone else is doing the work will very likely
- Paulo Pinto (10/38) Oct 27 2020 D is already quite late for the party.
- Ola Fosheim Grøstad (5/6) Oct 27 2020 Right, but I don't think being CPU centric will work out in that
- mw (4/5) Oct 29 2020 Is there a saying: better late than never :-)
- mw (3/9) Oct 29 2020 Of course, this community have to make the effort to make it
- 9il (15/29) Oct 24 2020 Magpie was another attempt to create Data Frame in D using the
- jmh530 (21/31) Oct 25 2020 I think, unfortunately, it is not always easy to communicate why
- jmh530 (35/57) Oct 26 2020 Adding a little more...
- data pulverizer (10/22) Oct 24 2020 I think the answer to questions like this is often money. You
- James Blachly (11/27) Oct 24 2020 Aside / self-promotion:
- glis-glis (21/32) Oct 27 2020 Self-promoting as well :-)
- jmh530 (2/7) Oct 25 2020 I'm certainly interested in it, but doing it well takes time.
- bachmeier (23/34) Oct 25 2020 Over time, I've come to three conclusions on this topic. I don't
- jmh530 (6/28) Oct 25 2020 I have no issue with calling libraries from other languages
- Paul Backus (3/8) Oct 25 2020 Aren't numpy and scipy themselves largely built on "calling
- bachmeier (3/12) Oct 25 2020 Exactly. And R (actually S at that time) started as a glue
- jmh530 (5/17) Oct 29 2020 Just ran across a Hacker News thread about pyston that relates to
- Russel Winder (30/36) Oct 29 2020 I only quickly skimmed the blog page, so this is a first reaction. I sha...
- jmh530 (4/17) Oct 30 2020 I think the point on multi-threaded Python came away as a big
- Russel Winder (34/38) Oct 30 2020 << I haven't properly read the blog entry as yet. Sorry. >>
- Abdulhaq (20/40) Oct 30 2020 I've spent much of the last 5 years writing code for trade
- bachmeier (11/15) Oct 30 2020 I would love to see this. A project to use the functionality of
- Laeeth Isharc (23/38) Nov 03 2020 We can call C++ libraries from our little language written in D
- data pulverizer (7/9) Nov 05 2020 If you're just passing arrays and pointers between Julia and D,
- bachmeier (7/18) Nov 05 2020 The question for me is if you can work with the same data
- jmh530 (7/13) Nov 05 2020 Yeah, that would be pretty nice. However, I would emphasize what
- bachmeier (6/20) Nov 05 2020 Definitely, but you need to have the functionality first. On the
- data pulverizer (2/7) Nov 05 2020 Looks cool.
- data pulverizer (24/29) Nov 05 2020 It's actually a problem I've been thinking about on and off for a
- jmh530 (3/11) Nov 05 2020 One thread only? Sounds like GIL...
- data pulverizer (15/28) Nov 05 2020 Not necessarily. The cryptographic keys are used to access the
- bachmeier (9/17) Nov 05 2020 R has externalptr, but to my knowledge, that's only for
- data pulverizer (7/20) Nov 05 2020 Yes but you make C calls in R on the pointed object. Given the
- data pulverizer (7/17) Nov 05 2020 p.s. I'm not saying that data shouldn't be accessible or returned
- bachmeier (15/33) Nov 12 2020 It really depends (which was one of the points of my earlier post
- bachmeier (2/7) Nov 12 2020
- data pulverizer (18/41) Nov 13 2020 You act as if I'm banning people from writing code in R - I
- bachmeier (37/49) Nov 13 2020 All I'm saying is that any project that interoperates with R or
- data pulverizer (51/71) Nov 13 2020 You already did so by reading something else into what I was
- Timon Gehr (4/7) Nov 13 2020 Weird. If you had given me those requirements as a list and asked me to
- Ola Fosheim Grøstad (9/12) Oct 30 2020 I think where such efforts go wrong is that they try to optimize
As a researcher in bioinformatics I use python, numpy, pandas and scipy a lot. But I am bored by the slowness of Python, even with CPython code, thanks to the GIL and unoptimized tail recursion. So I really think that D could play a big role in this field with Mir and dcompute.

1/ What is the state of Magpie, which was a GSoC 2019 project: Mir Data Analysis and Processing Library?

2/ Is scientific computing a field that the D language wants to grow in?

Thanks

Best regards
Oct 23 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

2. Yes!
Oct 23 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

I think it's definitely the biggest area of opportunity for D to become more popular. The GIL, lack of performance, and huge memory bloat are such a pain in Python.

Probably the best way to move forward is to provide libmir as a NumPy/Pandas *drop-in* replacement. (And I've suggested renaming Mir to NumD from a marketing / promotional perspective.) For the time being, from the language/library user's perspective, we can just use D/libmir to pre-process the data, and maybe save the result as csv/npz for further processing (by ... Python). Building or wrapping something like TensorFlow, I think, will need much more resources than the D community currently has, and I'm not sure it's worth the effort.

And from the language perspective, maybe D should adopt Python/NumPy's array indexing syntax, specifically:

1) use Python's arr[start:end], in addition to D's arr[start..end]

2) also allow negative indices, instead of [$-1]. (This $ is an improvement over Java/C++'s arr[arr.length - 1], but it is still less convenient than Python's negative index syntax.)

That Python gained such popularity in scientific computing over the past ~10 years is not an accident; Guido actually made that happen by extending Python's syntax:

https://en.wikipedia.org/wiki/NumPy#History

"""
The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
"""

Maybe Walter should join one of such SIGs as well :-)
Oct 23 2020
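For readers less familiar with the Python side, the indexing being discussed looks like this with plain Python lists (numpy arrays behave the same way for these cases):

```python
arr = [10, 20, 30, 40, 50]

# start:end slice, end-exclusive: the same meaning as D's arr[1 .. 3]
assert arr[1:3] == [20, 30]

# a negative index counts from the end: Python's arr[-1] is D's arr[$ - 1]
assert arr[-1] == 50
assert arr[-2] == 40

# negative indices also work inside slices: arr[1:-1] is D's arr[1 .. $ - 1]
assert arr[1:-1] == [20, 30, 40]
```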
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> [...] Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]

Let me further quote from [6]:

"""
During these early years, there was considerable interaction between the standard and scientific Python communities. In fact, Guido van Rossum, Python's Benevolent Dictator For Life (BDFL), was an active member of the matrix-sig. This close interaction resulted in Python gaining new features and syntax specifically needed by the scientific Python community. While there were miscellaneous changes, such as the addition of complex numbers, many changes focused on providing a more succinct and easier to read syntax for array manipulation. For instance, the parenthesis around tuples were made optional so that array elements could be accessed through, for example, a[0,1] instead of a[(0,1)]. The slice syntax gained a step argument—a[::2] instead of just a[:], for example—and an ellipsis operator, which is useful when dealing with multidimensional data structures.
"""

[6] https://www.computer.org/csdl/magazine/cs/2011/02/mcs2011020009/13rRUx0xPMx
Oct 23 2020
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> 1) use Python's arr[start:end], in addition to D's arr[start..end]

BTW, Python also has arr[start:end:step]; how, if at all, is this `step` possible in D now?
Oct 23 2020
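For reference, the `step` semantics in question look like this (plain Python lists; numpy is the same here). On the D side there is no slice step in the language itself, but Phobos's `std.range.stride` gives a similar effect on ranges, as a rough functional equivalent rather than a syntax-level one:

```python
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# arr[start:end:step] takes every step-th element of the half-open slice
assert arr[1:8:2] == [1, 3, 5, 7]

# with start/end omitted: every second element of the whole list
assert arr[::2] == [0, 2, 4, 6, 8]

# a negative step walks backwards (a common reverse idiom)
assert arr[::-1] == [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```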
On Friday, 23 October 2020 at 22:53:29 UTC, mw wrote:
> BTW, in Python arr[start:end:step], how / if it's possible for this `step` in now D?

(Today I'm in the mood of a language historian :-)

Some of Guido's early discussions of Python array indexing:

Slices
https://mail.python.org/pipermail/matrix-sig/1996-April/000553.html

Pseudo Indices
https://mail.python.org/pipermail/matrix-sig/1996-January/000331.html

Multi-dimensional indexing and other comments
https://mail.python.org/pipermail/matrix-sig/1995-October/000077.html

A problem with slicing
https://mail.python.org/pipermail/matrix-sig/1995-September/000042.html
Oct 23 2020
On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> [snip] Build or wrap something like tensorflow, I think will need much more resource than the D community current have, also I'm not sure if it worth the effort.

https://github.com/ShigekiKarita/tfd

The author of that has some other useful libraries.
Oct 25 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

There is some activity in this space:
https://code.dlang.org/?sort=updated&category=library.scientific

This project doesn't seem too active, but it was an earlier attempt:
http://dlangscience.github.io/
Oct 23 2020
On Friday, 23 October 2020 at 22:48:16 UTC, bachmeier wrote:
> There is some activity in this space:
> https://code.dlang.org/?sort=updated&category=library.scientific

To me a scientific library needs to be HPC oriented, able:

- to perform parallel computation on CPU or GPU
- to use a divide and conquer strategy in order to compute over multiple nodes
- to have dataframe features
- to have scipy features

Such a library would be awesome, as these days Python's slowness becomes more and more of a problem as data grows exponentially year after year.
Oct 23 2020
On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote:
> [...] A such library would be awesome as at these time python slowness become more and more important as data grow exponentially year after year

Acting somewhat as "Devil's Advocate"…

Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff that other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.

I am not sure Chapel has pandas style data frames explicitly, but I'll bet something equivalent is already in there.

--
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Oct 24 2020
On Saturday, 24 October 2020 at 09:29:46 UTC, Russel Winder wrote:
> Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts [...]

Maybe; anyway, for years D has been searching for the killer app. I really think this area is perfect for D. Data/business analysis is so important these days, in science, the economy and elsewhere, that D could be a good choice.
Oct 24 2020
On Sat, 2020-10-24 at 11:05 +0000, bioinfornatics via Digitalmars-d wrote:
> Maybe, anyway since years D search the killer app. [...]

I agree that D could be the replacement for Python for many scientific milieux: bioinformatics and astronomy, to name but two obvious ones. The issue though is that the Python language over NumPy and the associated communities captured the moment years ago, and many people contributed many extensions to a few libraries and packages.

Traditionally (as it were) bioinformatics and astronomy have emphasised exploration over computation, and often offloaded computation to C or C++ realised frameworks. This has reinforced prioritising code comprehension and evolution over computation speed, thus militating in favour of Python since the packages were there.

Whilst D could replace Python, the question is will it, and the answer is determined by who would write the code. Sadly history tells us this will lead to a (very) long (divergent) thread and result in no-one actually doing anything. I would like to be proved wrong.

The possible upside is that all the major Python packages started as one or two people creating something that others then joined in with and turned into the de facto standard. Might this finally happen in the D community?
Oct 24 2020
On Saturday, 24 October 2020 at 12:08:00 UTC, Russel Winder wrote:
> Whilst D could replace Python, the question is will it and the answer is determined by who would write the code. [...] Might this finally happen in the D community?

Just expecting someone else to do the work means it very likely will not happen at this point in time. But you can actually increase the chances it will happen in future.

My opinion: neither a language feature X nor a specific library Y is missing at the moment, but the community needs to do massive advertisement for the D Programming Language. The bigger the community becomes, the more libraries will be created. Therefore you can start here by advertising D and its strengths on every channel which makes sense.

Kind regards
Andre
Oct 24 2020
On Saturday, 24 October 2020 at 11:05:48 UTC, bioinfornatics wrote:
> Maybe, anyway since years D search the killer app. [...]

D is already quite late for the party.

[...] getting better in this domain.
https://docs.microsoft.com/en-us/dotnet/csharp/tutorials/ranges-indexes

While support for working with Spark just went 1.0 this week,
https://dotnet.microsoft.com/apps/data/spark

D would do better to see how to interoperate with existing stuff.
Oct 27 2020
On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
> D is already quite late for the party.

Right, but I don't think being CPU centric will work out in that domain anyway. You need to use Vulkan and Metal, aim for future hardware and do it really well. The market is in the future, not here, right now.
Oct 27 2020
On Tuesday, 27 October 2020 at 10:23:52 UTC, Paulo Pinto wrote:
> D is already quite late for the party.

Is there a saying: better late than never :-)

I still think D has a chance, if we really want to catch up, to overthrow Python.
Oct 29 2020
On Friday, 30 October 2020 at 02:03:10 UTC, mw wrote:
> Is there a saying: better late than never :-)

Of course, this community has to make the effort to make it happen.
Oct 29 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> 1/ what is the state of Magpie which was a GSoC 2019: - Mir Data Analysis and Processing Library

Magpie was another attempt to create a DataFrame in D using the architecture patterns from Python/R. It has never been part of the Mir infrastructure. A DataFrame in D should be a little different from what people have in scripting languages; otherwise, it will not be good enough.

> 2/ does the scientific computing field is something that D language want to grow ?

I would like to say yes. But the reality is that D isn't going to grow. The sci related answer is that the members of the DLF either ignore my requests or even reject related work for the compiler because they "don't see much of a difference".

The following work is what really will be good for Sci D and especially for DataFrame:
https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1023.md
https://github.com/dlang/dmd/pull/9778
Oct 24 2020
On Saturday, 24 October 2020 at 12:26:21 UTC, 9il wrote:
> [snip] The following work is what really will be good for Sci D and especially for DataFrame.
> https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1023.md
> https://github.com/dlang/dmd/pull/9778

I think, unfortunately, it is not always easy to communicate why these changes are important or valuable. But, while you weren't able to convince Atila of the value of the proposed features, I also don't think you were ignored either, at least in that case.

That being said, I never understood Atila's argument about this feature being a light version of Rust traits (I'm not aware of how Haskell's typeclasses work). A Rust trait is a list of functions that must be implemented by a type. However, you can also statically dispatch based on the trait (you can dynamically as well, but I imagine you would prefer not to be able to do that). You conceivably would be more interested in the static dispatch part of it.

It's not really about a PackedUpperTriangularMatrix requiring specific functions to be a PackedUpperTriangularMatrix rather than a Slice. All it takes is the right iterator type. So it's more about how the specific type is specialized (giving a specific iterator to PackedUpperTriangularMatrix). There might be a way to create a feature that's not Rust traits that does what you want and is more general than this type of template alias deduction.
Oct 25 2020
On Sunday, 25 October 2020 at 21:30:59 UTC, jmh530 wrote:
> [snip] There might be a way to create a feature that's not Rust traits that does what you want and is a more general feature than this type of template alias deduction.

Adding a little more...

The situation that this issue is trying to address is something like a template T!(U!V) that you want to be able to use like W!V. I can't help but think that concepts could help with this situation. Adapting the C++20 syntax, consider this simplest implementation:

template PackedUpperTriangularMatrix(T)
{
    concept PackedUpperTriangularMatrix = is(T: Slice!(StairsIterator!(U*, "-")), U);
}

Assuming the same functionality as in C++20, you could use this in a function as in

void foo(PackedUpperTriangularMatrix x) {}

However, if you then want to place any constraints on the U above, then you're a bit SOL. To really get the functionality working, you would need a generic kind of concept, where the concept you are defining is itself generic. As far as I can tell, you can't do this with C++20, but I would imagine the syntax adapted from above might be something like

template PackedUpperTriangularMatrix(T)
{
    concept PackedUpperTriangularMatrix(U) = is(T: Slice!(StairsIterator!(U*, "-")));
}

and would enable you to write the function as

void foo(T)(PackedUpperTriangularMatrix!T x) {}

In other words, if D had the ability to define concepts that are also generic themselves, then it would enable the functionality you want also.
Oct 26 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [...] 2/ does the scientific computing field is something that D language want to grow ?

I think the answer to questions like this is often money. You need to hire someone to write the code that does this. In the world of research this means a grant to fund it. If you think there is real merit in the D programming language in your field, the best thing to do is to make the case in the form of a grant application to pay a researcher to write the necessary code.

This is what languages like R, Python, and Julia do. They are flush with cash because people write grant applications for PhD students and researchers to build libraries for those languages.
Oct 24 2020
On 10/23/20 3:31 PM, bioinfornatics wrote:
> 1/ what is the state of Magpie which was a GSoC 2019: - Mir Data Analysis and Processing Library [...]

Aside / self-promotion:

We use D extensively in our bioinformatics / computational biology program. Check out
https://github.com/blachlylab/dhtslib/

Also just published a HTS/NGS tool written in D:
https://academic.oup.com/nargab/article/2/4/lqaa070/5917298

I should probably do an `announce` forum post.

Currently trying to decide whether to extend Magpie or roll our own (adding only features that are needed). I've also enjoyed using Mir ndslice in a couple of test projects, but as you know that is not really a dataframe.
Oct 24 2020
On Saturday, 24 October 2020 at 16:43:45 UTC, James Blachly wrote:
> Aside / self-promotion:
> We use D extensively in our bioinformatics / computational biology program. [...]

Self-promoting as well :-)

I also started to use D in the domain of biophysics. Big computations are still done with our C++ code, but I'm translating old Python pre- and post-treatment scripts into D, getting a nice speedup:
https://github.com/glis-glis/biophysics

Please note that all I know about D is from tour.dlang.org, and I'm often rather lazy concerning comments (which obviously comes back to bite me rather often when I don't understand my own code 2 months later...).

What is sometimes lacking is information and documentation about what exists for D. Here it states that D can work with GPUs:
https://dlang.org/areas-of-d-usage.html#gpu
but all you get is a link to a presentation from 2016.

I needed a way to calculate a principal component analysis. Mir can't do it, but I found out Lubeck can. So I tried the Lubeck example of the dlang tour, which didn't work. I was able to correct it, and made a pull request so the example is now working, but most newcomers would probably just say "Ok, it's broken, let's get back to Numpy".
Oct 27 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> [snip] 2/ does the scientific computing field is something that D language want to grow ?

I'm certainly interested in it, but doing it well takes time.
Oct 25 2020
On Sunday, 25 October 2020 at 20:30:58 UTC, jmh530 wrote:
> I'm certainly interested in it, but doing it well takes time.

Over time, I've come to three conclusions on this topic. I don't know that time is the issue.

1. This community seems to have NIH syndrome. A lot of users are averse to reusing the functionality provided by other languages. I find that downright weird, given that one of the selling points of D is its ability to easily interoperate with other languages. It makes sense to *extend* existing projects in other languages using D. The idea of rewriting millions of lines of code for no benefit other than just saying it's written in D is obviously pointless, so there's no motivation to do it.

2. Scientific computing is a big field. In terms of things you'd need to be "complete", you'd have to write maybe ten times as much code as you would to have a complete web development offering. It also requires incredible amounts of expertise. Statistics, economics, physics, math, chemistry, biology, and on and on are all areas that individually require a great deal of specialized knowledge in addition to time. For some things, performance is the most important property, including use of the GPU. That's not simple.

3. D's syntax is okay, but it's not flexible enough to express everything you need to work comfortably. A DSL or similar might be necessary.
Oct 25 2020
On Sunday, 25 October 2020 at 23:30:10 UTC, bachmeier wrote:[snip] 1. This community seems to have NIH syndrome. A lot of users are averse to reusing the functionality provided by other languages. I find that downright weird given that one of the selling points of D is its ability to easily interoperate with other languages. It makes sense to *extend* existing projects in other languages using D. The idea of rewriting millions of lines of code for no benefit other than just saying it's written in D is obviously pointless, so there's no motivation to do it. 2. Scientific computing is a big field. In terms of things you'd need to be "complete", you'd have to write maybe ten times as much code as you would to have a complete web development offering. It also requires incredible amounts of expertise. Statistics, economics, physics, math, chemistry, biology, and on and on are all areas that individually require a great deal of specialized knowledge in addition to time. For some things, performance is the most important property, including use of the GPU. That's not simple. 3. D's syntax is okay, but it's not flexible enough to express everything you need to work comfortably. A DSL or similar might be necessary.I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in and use to accomplish a lot.
Oct 25 2020
On Monday, 26 October 2020 at 01:42:42 UTC, jmh530 wrote:I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in to accomplish a lot.Aren't numpy and scipy themselves largely built on "calling libraries from other languages"? Specifically, C and Fortran.
Oct 25 2020
On Monday, 26 October 2020 at 01:55:46 UTC, Paul Backus wrote:On Monday, 26 October 2020 at 01:42:42 UTC, jmh530 wrote:Exactly. And R (actually S at that time) started as a glue language for libraries in those languages.I have no issue with calling libraries from other languages (particularly C) if it's something that is too much work or whatever to do myself. But I think that it's helpful to have a base level of functionality, akin to Numpy/Scipy, that a new person could come in to accomplish a lot.Aren't numpy and scipy themselves largely built on "calling libraries from other languages"? Specifically, C and Fortran.
Oct 25 2020
On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:As a researcher in BioInformatics I use a lot python numpy pandas and scipy. But I am bored by the slowness of python even with cpython code thanks to the GIL and un-optimized tail recursion. So I thinks really that D could play a big role in this field with MIR and dcompute. 1/ what is the state of Magpie which was a GSoC 2019: - Mir Data Analysis and Processing Library 2/ does the scientific computing field is something that D language want to grow ? Thanks Best regardsJust ran across a Hacker News thread about pyston that relates to some of the discussion here [1]. It seems there's still a demand for alternatives to python for data science. [1] https://news.ycombinator.com/item?id=24921790
Oct 29 2020
On Thu, 2020-10-29 at 10:23 +0000, jmh530 via Digitalmars-d wrote:[…]Just ran across a Hacker News thread about pyston that relates to some of the discussion here [1]. It seems there's still a demand for alternatives to python for data science. [1] https://news.ycombinator.com/item?id=24921790I only quickly skimmed the blog page, so this is a first reaction. I shall read the material more carefully tomorrow and send an update. 1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in. 2. Python is slow, and fundamentally single threaded. Attempts to make Python multi-threaded seem to fall by the wayside. The micro-benchmarks seem to indicate Pyston is just a slightly faster Python and thus nothing really to write home about – yes, even a headline figure of 20% is nothing to write home about! 3. If you want computational performance from Python code, you use C, C++ (or D) extensions. In particular you use NumPy. I would guess that almost all bioinformatics, astronomy, machine learning, AI, data science stuff uses NumPy. Python execution performance is irrelevant compared to NumPy code performance. I am happy to be shown to be wrong, but I suspect not. -- Russel. Dr Russel Winder t: +44 20 7585 2200 41 Buckmaster Road m: +44 7770 465 077 London SW11 1EN, UK w: www.russel.org.uk
Oct 29 2020
On Thursday, 29 October 2020 at 22:52:59 UTC, Russel Winder wrote:[snip] I only quickly skimmed the blog page, so this is a first reaction. I shall read the material more carefully tomorrow and send an update. 1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in. 2. Python is slow, and fundamentally single threaded. Attempts to make Python multi-threaded seem to fall by the wayside. The micro-benchmarks seem to indicate Pyston is just a slightly faster Python and thus nothing really to write home about – yes even a headline figure of 20% is nothing to write home about! [snip]I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.
Oct 30 2020
On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d wrote:[…]I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.<< I haven't properly read the blog entry as yet. Sorry. >> Guido saw (cf. he and I had a long "discussion" at EuroPython 2010, there were many witnesses) GIL as absolutely fine for CPython in perpetuity, that if Pypy came up with a GIL-free VM then that would be fine. His mindset was (and I suspect may still be) that Python code was/is not about being CPU bound code, it was/is about sequential and concurrent, not parallel for performance, code. As long as there is NumPy and other PVM extensions, or use of message passing between processes, that allow for GIL-free parallel, CPU bound processing, it is hard to say Guido was/is wrong. (And in 2010 it was even harder :-) ) Having thought about it on and off for a decade, I am happy with the status quo around Python. Python code is (or should be) highly maintainable code designed for execution on a single threaded VM, easily understood and amended. Anyone trying to do CPU bound code using Python is "doing it wrong". Whether D is the right alternative, or a language such as Chapel is better, is a moot point. Pandas is built on NumPy and so has the same parallelism properties as any other NumPy realised package. -- Russel.
Oct 30 2020
On Friday, 30 October 2020 at 12:15:58 UTC, Russel Winder wrote:On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d wrote:I've spent much of the last 5 years writing code for trade studies and other optimisations on top of python, numpy and multiprocessing. Lately I have been working a lot with Pandas for multi-dimensional optimisation and machine learning. The slow performance of python in the glue layer between numpy, multiprocessing etc. is a non-issue. I can easily keep all 8 cores very busy running efficient C++ CFD, machine learning codes etc. using the above combination. The migration from P2 to P3 was also pretty tame. For people doing real work, it's not a big deal. Sure it was a distraction but it has its benefits, I'm glad they did it. Boring opinion, and doesn't generate ad income from blog hits, but there you go. I would like to see D have a numpy equivalent but realistically you won't duplicate the numpy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that. Core Pandas on its own BTW isn't hard to implement IMO. It turns out it's very expressive and very useful, but not a hard thing to copy.[…]I think the point on multi-threaded Python came away as a big complaint there. Lots of mentions of the GIL or people being CPU-bound. Pandas was mentioned in this context as well.<< I haven't properly read the blog entry as yet. Sorry. >> Guido saw (cf. he and I had a long "discussion" at EuroPython 2010, there were many witnesses) GIL as absolutely fine for CPython in perpetuity, that if Pypy came up with a GIL-free VM then that would be fine. His mindset was (and I suspect may still be) that Python code was/is not about being CPU bound code, it was/is about sequential and concurrent, not parallel for performance, code. 
As long as there is NumPy and other PVM extensions, or use of message passing between processes, that allow for GIL-free parallel, CPU bound processing, it is hard to say Guido was/is wrong. (And in 2010 it was even harder :-) ) Pandas is built on NumPy and so has the same parallelism properties as any other NumPy realised package.
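To make the claim above concrete — that core Pandas is very expressive but not hard to copy — here is a toy columnar frame in Python, built on nothing but a dict of equal-length columns (the `Frame` class is invented for illustration, not any real library):

```python
class Frame:
    """Toy columnar data frame: a dict of equal-length columns."""

    def __init__(self, cols):
        lengths = {len(v) for v in cols.values()}
        assert len(lengths) == 1, "columns must have equal length"
        self.cols = cols

    def __getitem__(self, name):
        """Column selection, like df[name] in pandas."""
        return self.cols[name]

    def where(self, mask):
        """Boolean-mask row selection, like df[mask] in pandas."""
        return Frame({k: [x for x, keep in zip(v, mask) if keep]
                      for k, v in self.cols.items()})

    def mean(self, name):
        col = self.cols[name]
        return sum(col) / len(col)

df = Frame({"species": ["a", "b", "a"], "len": [1.0, 2.0, 3.0]})
mask = [s == "a" for s in df["species"]]
print(df.where(mask).mean("len"))  # 2.0
```

The hard part of the real thing is not this core model but the long tail: indexes, joins, missing-data handling, and fast vectorised kernels underneath.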
Oct 30 2020
On Friday, 30 October 2020 at 18:23:38 UTC, Abdulhaq wrote:I would like to see D have a numpy equivalent but realistically you won't duplicate the numy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that.I would love to see this. A project to use the functionality of Python, R, and Julia from inside a D program with little effort. William Stein did something like that with SageMath, but from a different angle. I can say the R part is simple. (Not only the parts written in R, but any underlying C, C++, or Fortran code with R bindings as well.) I wouldn't expect it to be much harder for the other languages, but since I don't work with them, I can't say. The advantage of D would be the new functionality you write in D on top of the existing functionality in those languages.
Oct 30 2020
On Friday, 30 October 2020 at 20:32:32 UTC, bachmeier wrote:On Friday, 30 October 2020 at 18:23:38 UTC, Abdulhaq wrote:We can call C++ libraries from our little language written in D and you can even write C++ inline, compile it at runtime and call it thanks to Cling. Can call python although it's not yet in master. Initially via pyd but people have their own particular versions, installs and setups so instead moving to RPC over named pipes using nanomsg. That should generalise to any other languages we would want to call too. Serialisation and deserialisation isn't dirt cheap but the idea isn't to write inner loops in python. There's a lot more overhead doing it this way - it's not for free. But it is valuable for internal use for the problems we currently have. I have a little plugin that uses your R wrapper but it's not used by anyone yet. Time taken to a first version matters for us. The first version doesn't usually need to be fast for user code. This should allow us to access libraries without having to combine that with language choices. In time I figure we could use Cling to generate declarations and light wrappers for C++ too. Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.I would like to see D have a numpy equivalent but realistically you won't duplicate the numpy ecosystem here, it's too much work. And why do it? Just wrap up the numpy ecosystem from D and use it like that.I would love to see this. A project to use the functionality of Python, R, and Julia from inside a D program with little effort. William Stein did something like that with SageMath, but from a different angle. I can say the R part is simple. (Not only the parts written in R, but any underlying C, C++, or Fortran code with R bindings as well.) I wouldn't expect it to be much harder for the other languages, but since I don't work with them, I can't say. 
The advantage of D would be the new functionality you write in D on top of the existing functionality in those languages.
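The RPC-over-pipes pattern described above can be sketched with nothing but the standard library. Here JSON lines over a subprocess pipe stand in for nanomsg over named pipes; the worker and its one-request-per-line protocol are made up for illustration:

```python
import json
import subprocess
import sys

# hypothetical worker process: reads one JSON request per line on
# stdin, writes one JSON response per line on stdout
worker_src = """\
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    print(json.dumps({"result": sum(req["args"])}), flush=True)
"""

proc = subprocess.Popen([sys.executable, "-c", worker_src],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        text=True)

# one round trip: serialise the request, read back the response
proc.stdin.write(json.dumps({"fn": "sum", "args": [1, 2, 3]}) + "\n")
proc.stdin.flush()
result = json.loads(proc.stdout.readline())["result"]

proc.stdin.close()
proc.wait()
print(result)  # 6
```

As the post says, the serialisation round trip is not free, so the pattern only pays off when the callee does substantial work per request.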
Nov 03 2020
On Tuesday, 3 November 2020 at 22:51:14 UTC, Laeeth Isharc wrote:Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.If you're just passing arrays and pointers between Julia and D, this is pretty simple no? Julia's ccall makes that relatively simple. You can even compile D code and call it from Julia - that should be pretty straightforward. Calling Julia from D just needs the Julia C API, which again is pretty straightforward. You'll need to convert what you need from julia.h header file.
Nov 05 2020
On Thursday, 5 November 2020 at 13:11:17 UTC, data pulverizer wrote:On Tuesday, 3 November 2020 at 22:51:14 UTC, Laeeth Isharc wrote:The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.Robert Schadek made a beginning on Julia integration work but we haven't had time to do more than that.If you're just passing arrays and pointers between Julia and D, this is pretty simple no? Julia's ccall makes that relatively simple. You can even compile D code and call it from Julia - that should be pretty straightforward. Calling Julia from D just needs the Julia C API, which again is pretty straightforward. You'll need to convert what you need from julia.h header file.
Nov 05 2020
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:[snip] The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.
Nov 05 2020
On Thursday, 5 November 2020 at 19:39:43 UTC, jmh530 wrote:On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.[snip] The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.
Nov 05 2020
On Thursday, 5 November 2020 at 21:57:46 UTC, bachmeier wrote:Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.Looks cool.
Nov 05 2020
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.It's actually a problem I've been thinking about on and off for a while but haven't got round to actually trying to implement it. 1. If I had to do this, I would first decide on a collection of common data structures to share, starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays of basic element types with dimensional information in the form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python. 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be accessed by another. 3. Binary file i/o for those is pretty simple, but necessary to store results and read them in any of the programs afterwards. 4. All the languages have C APIs so you'd write interfaces in D using these to call from D to the languages. All the languages can call D extern C functions in dlls directly using their versions of ccall. Another alternative to mmap is using network serialization which would be more cross-platform and fungible but this seems like it could be slow to me.
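The proposed common structure — a contiguous buffer plus a dims array — is easy to prototype. A Python sketch, using column-major indexing to match the R/Julia convention mentioned (the helper name and byte layout are illustrative assumptions, not an agreed format):

```python
import struct

def offset(dims, idx):
    """Column-major (R/Julia-style) flat offset for a multi-index."""
    off, stride = 0, 1
    for d, i in zip(dims, idx):
        off += i * stride
        stride *= d
    return off

# a 2x3 double matrix: a flat buffer of 6 doubles plus a dims array
dims = [2, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # column-major storage
print(data[offset(dims, (1, 2))])  # row 1, col 2 (0-based) -> 6.0

# the same representation serialises trivially for point 3 above:
# two int64 dims followed by six float64 values, little-endian
payload = struct.pack("<2q6d", *dims, *data)
print(len(payload))  # 2*8 + 6*8 = 64 bytes
```

Because the layout is just "dims header + raw buffer", any of the four languages can reconstruct the matrix from the same bytes without copying element by element.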
Nov 05 2020
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:[snip] 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be access by another. [snip]One thread only? Sounds like GIL...
Nov 05 2020
On Thursday, 5 November 2020 at 20:30:03 UTC, jmh530 wrote:On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:Not necessarily. The cryptographic keys are used to access the file, not to lock it. I believe mmap files can be secured with a password, which should be generated cryptographically as an alternative to being manually entered and stored somewhere. It protects the file from unsanctioned access. Even though the file itself will probably only take a single password rather than some synchronized rotating mechanism. However it is done, the memory will need to be protected. There should be no reason why multiple processes could not read from a file. Only writing would require a lock from other processes for obvious reasons. As I said, I haven't even begun to properly plan an implementation yet, just something that I think about from time to time.[snip] 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be accessed by another. [snip]One thread only? Sounds like GIL...
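Multiple mappings of the same file are indeed routine; it is one writer plus any number of readers that needs coordination, not mapping itself. A minimal Python sketch of two independent mappings of one file (the access-control/crypto layer discussed above is not shown):

```python
import mmap
import os
import tempfile

# a file-backed buffer that any number of processes could map
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 64)

# writer's view of the file
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 64) as m:
    m[:5] = b"hello"
    m.flush()  # push the change through to the backing file

# reader's view -- in practice this could be another process,
# or another language mapping the same path
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 64, access=mmap.ACCESS_READ) as m:
    print(bytes(m[:5]))  # b'hello'
```

Here the two mappings run in sequence for simplicity; concurrent access would need the write lock the post mentions.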
Nov 05 2020
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:1. If I had to do this, I would first decide on a collection of common data structures to share starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types with a dimensional information in form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R. It assumes it can do anything it wants with that data. Unless they've changed something (which is possible since I haven't looked into it in years) you'd have to copy any data you send to an R function. But if you're calling R maybe you don't care about that.
Nov 05 2020
On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.1. If I had to do this, I would first decide on a collection of common data structures to share starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types with a dimensional information in form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.
Nov 05 2020
On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
Nov 05 2020
On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer wrote:On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters. But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
Nov 12 2020
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:I know for web applications or finance or some other areas where the distinction matters.This should beI know for web applications or finance or some other areas performance matters enough that they'll distinguish between interactive and production code, and even write two versions.
Nov 12 2020
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer wrote:You act as if I'm banning people from writing code in R - I certainly don't have the power to do that. And yes, it varies from situation to situation, as I clearly alluded to. I've done a lot of projects in R. I'm well aware that sometimes it is unavoidable for the client. What I am saying is given the choice, you should probably choose a different tool apart from "some minor instances". I've seen R go spectacularly wrong because of the type of language it is: it makes assumptions about what the programmer means, which can cause epic bugs, and very often it does it silently, and it happens all the time. You can never be sure that *any* piece of R code will work as it should. It's just the nature of the language. People write it because it's easy and has "boilerplate", which is fine if you are proof-of-concepting or doing research and some other things, but you use it in mission critical production apps and it may well blow up in your face, and you might not even know. And that's before we get to performance, and other things blah, blah, blah.On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters. But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.... 
I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.... avoiding writing production code in R is just my professional advice.
Nov 13 2020
On Friday, 13 November 2020 at 21:28:30 UTC, data pulverizer wrote:You act as if I'm banning people from writing code in R - I certainly don't have the power to do that. And yes, it varies from situation to situation, as I clearly eluded to.All I'm saying is that any project that interoperates with R or any other language has to accept that the programmers you're targeting are going to write code in the other language. If for no other reason than the fact that they've already written tens of thousands of lines of code that they don't want to throw away.I've done a lot of projects in R. I'm well aware that sometimes it is unavoidable for the client. What I am saying is given the choice, you should probably choose a different tool apart from "some minor instances". I've seen R go spectacularly wrong because of the type of language it is, it makes assumptions of that the programmer means which can cause epic bugs, and very often, it does it silently and it happens all the time. You can never be sure that *any* piece of R code will work as it should. It's just the nature of the language.Well, I don't want to get into a big debate about R, but I don't for the most part agree with this view. R was designed to be used as (1) a functional language, and (2) a specialized tool to quickly solve a specialized set of problems. It's remarkably difficult to write incorrect code if you're using it as it was designed to be used, which includes pure functions and immutable data structures. It originated as a dialect of Scheme and that's the foundation everything is built on. Where I do agree with you is the type system. Not only is it a dynamic language, it has an extremely weak type system, which most likely has something to do with the fact that it originated down the hall in the same place that gave us C. I nonetheless don't agree with the conclusion that it should never be used. I've seen loads of R criticism, and it's almost always something like this. 
Here's one I've probably seen 50 times:

x <- 1:10
j <- 4
x[2:3+j]

The code returns [6 7]! It should obviously return [2 3 4 5 6 7]! R is trash! That's nonsense. Operators have precedence in every language. The critic would have gotten the "correct" answer with x[2:(3+j)]. No language is going to work if you don't understand operator precedence. Another is that x[-1] drops the first element. If you come from another language, that might not be what you expect. If you come from a language that forbids negative index values, you might even think this makes R unusable. Honestly, the vast majority of R critiques are not different from the folks that post here about how D does things wrong because it's different from C++ or Python or whatever language.
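The precedence point can be replayed mechanically in Python (0-indexed, so every subscript is shifted down by one relative to the R version):

```python
x = list(range(1, 11))  # x <- 1:10
j = 4
# in R, ':' binds tighter than '+', so x[2:3+j] means x[(2:3) + j]
idx = [i + j for i in (2, 3)]       # c(6, 7)
print([x[i - 1] for i in idx])      # [6, 7]  (R is 1-indexed)
# what the critic expected requires explicit parentheses, x[2:(3+j)]:
print(x[2 - 1:3 + j])               # [2, 3, 4, 5, 6, 7]
```

Same data, same precedence rule, same two answers — the surprise is the grammar, not the language.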
Nov 13 2020
On Friday, 13 November 2020 at 22:59:23 UTC, bachmeier wrote:Well, I don't want to get into a big debate about R ...You already did so by reading something else into what I was sayingI nonetheless don't agree with the conclusion that it should never be used.I never said this. From the beginning I said that there are some instances where R could be used.I've seen loads of R criticism, and it's almost always something like this. Here's one I've probably seen 50 times: x <- 1:10 j <- 4 x[2:3+j] The code returns [6 7]! It should obviously return [2 3 4 5 6 7]! R is trash! ... Another is that x[-1] drops the first element. If you come from another language, that might not be what you expect.This is not what I'm talking about, but there are *many* known issues with how R behaves, but you seem not to have selected any of the well known ones. Here are just a few: 1. R can suddenly decide that your character (string) should become a factor - even if you know about this it can still be difficult to tell when this occurs. That's why stringsAsFactors exists. 2. R can suddenly decide that your selection in a matrix should be a vector. So if you were selecting mat[, 1:n] and n == 1 you don't get a matrix anymore, and your code will fall over. That's why drop = TRUE/FALSE exists. Can still be a difficult bug to find. I've seen this happen MANY times. The behaviour in Julia mat[:, 1:n] when n == 1 is the expected one. 3. Recycling elements instead of throwing an error - loads of bugs for this one. 4. sapply will return whatever it wants with the same argument types. One minute a matrix and another time a list and so on. With the SAME ARGUMENT TYPES! 5. Dates will suddenly unpredictably morph into numbers. cat("Today is: ", Sys.Date()). 6. The flimsy and almost unusable set of OOP tools. S3, S4, "R5" - Reference Classes, and R6 - how many languages have that many OOP systems? None of which are particularly effective. 
These are just a few of the popular ones but there are MANY more. When your code base grows, these and many other types of issues start to have a serious impact on the stability of your application. There are places for R, but you have to be VERY careful where you put it.R was designed to be used as (1) a functional language, and (2) a specialized tool to quickly solve a specialized set of problems. It's remarkably difficult to write incorrect code if you're using it as it was designed to be used, which includes pure functions and immutable data structures.Sounds as if you're "quoting from authority" here. The flow of what you've said here is misleading. If you said, "R is *weakly* 'functional-like' and has some convenience as a result", I might reluctantly accept that. But R doesn't have anywhere near enough features from functional programming to be even *used* as a functional language. How can you have so much instability built into a language and call it functional? R is the opposite of the functional ethos! It has obscene permissiveness on some issues and irrational restrictiveness on others.Honestly, the vast majority of R critiques are not different from the folks that post here about how D does things wrong because it's different from C++ or Python or whatever language.This is not true. I can write code in D and be pretty sure that it does what I think it does - even before thorough testing, you win massively just with static typing. C++ too. Even before the new Python function type system it was still pretty robust for a dynamic language, now you can fairly well guarantee some things. Julia in principle is all but a static language with dynamic-typing "tagged on". R occupies a particular space as a programming language, and it's a space I'm wary of, and I think others should be careful, cognizant of it, and use it accordingly.
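Point 3 in the list above, silent recycling, is easy to demonstrate: a small Python function mimicking R's rule shows how a length mismatch "succeeds" instead of raising an error (`recycle_add` is a hypothetical helper written for illustration; base R applies this rule implicitly):

```python
def recycle_add(a, b):
    """R-style vector recycling: the shorter vector's elements are
    reused cyclically to match the longer vector's length. R only
    warns when the lengths aren't multiples; it never errors."""
    n = max(len(a), len(b))
    return [a[i % len(a)] + b[i % len(b)] for i in range(n)]

# lengths 4 and 2: the shorter vector wraps around cleanly
print(recycle_add([1, 2, 3, 4], [10, 20]))  # [11, 22, 13, 24]

# lengths 3 and 2: an off-by-one in data prep still "succeeds",
# producing plausible-looking but wrong numbers
print(recycle_add([1, 2, 3], [10, 20]))     # [11, 22, 13]
```

That second call is the failure mode the post describes: no exception, just quietly recycled data flowing downstream.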
Nov 13 2020
On 30.10.20 13:15, Russel Winder wrote:Having thought about it on and off for a decade, I am happy with the status quo around Python. Python code is (or should be) highly maintainable code designed for execution on a single threaded VM, easily understood and amended.Weird. If you had given me those requirements as a list and asked me to relate them to Python, I would have told you you made a list of weak points of Python.
Nov 13 2020
On Thursday, 29 October 2020 at 22:52:59 UTC, Russel Winder wrote:1. People have been trying to make Python execute faster for 30 years. In the end everyone ends up just using CPython with any and all optimisations it can get in.I think where such efforts go wrong is that they try to optimize Python instead of looking at the usage pattern that most programmers have. Most Python users never use many of the esoteric features (including concurrency, beyond generators) that Python offers. So you could easily create a simpler language with libraries implemented at a low level that exhibit behaviour close enough to Python for current Python users to feel at home with it.
Oct 30 2020