
digitalmars.D - They wrote the fastest parallelized BAM parser in D

reply "george" <georgkam gmail.com> writes:
http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinformatics.btv098.full.pdf+html

and a feature
http://google-opensource.blogspot.nl/2015/03/gsoc-project-sambamba-published-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+GoogleOpenSourceBlog+(Google+Open+Source+Blog)


D may hold a sweet spot in bioinformatics, where you often require 
quick turnaround (productivity), raw speed, and agility.
Mar 29 2015
next sibling parent "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
On Monday, 30 March 2015 at 06:50:19 UTC, george wrote:
 http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinformatics.btv098.full.pdf+html

 and a feature
 http://google-opensource.blogspot.nl/2015/03/gsoc-project-sambamba-published-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+GoogleOpenSourceBlog+(Google+Open+Source+Blog)


 D may hold a sweet spot in bioinformatics where you often 
 require quick turnaround (productivity) , raw speed and agility.
Thanks. Added to the Python section of the wiki here: http://wiki.dlang.org/Coming_From/Python

But we should also create anchors for guides to different use domains for D: finance, bioinformatics, etc. Enterprise users often like to know they are not the first.
Mar 30 2015
prev sibling next sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 30 March 2015 at 06:50:19 UTC, george wrote:
 http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinformatics.btv098.full.pdf+html

 and a feature
 http://google-opensource.blogspot.nl/2015/03/gsoc-project-sambamba-published-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+GoogleOpenSourceBlog+(Google+Open+Source+Blog)


 D may hold a sweet spot in bioinformatics where you often 
 require quick turnaround (productivity) , raw speed and agility.
.NET actually already has a foothold in bioinformatics, especially in user-facing software and steering of reading equipment and robots.

visualization) use cases.

-- Paulo
Mar 30 2015
parent reply "george" <georgkam gmail.com> writes:
 .NET actually already has a foothold in bioinformatics, 
 specially in user facing software and steering of reading 
 equipments and robots.


 visualization) use cases.

 --
 Paulo
Though when it comes to open source bioinformatics projects, Perl and Python have a large foothold among most bioinformaticians. Most utilities that require speed are written in C and C++ (BLAST, HMMER, SAMtools, etc.).

I think D stands a good chance as a language of choice for bioinformatics projects.

George
Mar 30 2015
next sibling parent reply Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d wrote:
 .NET actually already has a foothold in bioinformatics,
 specially in user facing software and steering of reading
 equipments and robots.

 visualization) use cases.

 --
 Paulo
Paulo, Can you send me some pointers to this stuff?

 Though when it comes to open source bioinformatics projects, Perl
 and Python have a large foothold
 among most most bioinformaticians. Most utilities that require
 speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS etc).

 I think D stands a good chance as a language of choice for
 bioinformatics projects.

 George
My "prejudice", based on training people in Python and C++ over the last few years, is that Python and C++ have a very strong position in the bioinformatics community, with the use of IPython (now becoming Jupyter) increasing and solidifying the Python position. D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. D has no real play in this arena and so there is no way (currently) of creating a foothold. Sad, but…
--
Russel.
Dr Russel Winder       t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road     m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK    w: www.russel.org.uk  skype: russel_winder
Mar 30 2015
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/30/15 11:23 AM, Russel Winder via Digitalmars-d wrote:
 On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d wrote:
 .NET actually already has a foothold in bioinformatics,
 specially in user facing software and steering of reading
 equipments and robots.


 visualization) use cases.

 --
 Paulo
Paulo, Can you send me some pointers to this stuff?
 Though when it comes to open source bioinformatics projects, Perl
 and Python have a large foothold
 among most most bioinformaticians. Most utilities that require
 speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS etc).

 I think D stands a good chance as a language of choice for
 bioinformatics projects.

 George
My "prejudice", based on training people in Python and C++ over the last few years, is that Python and C++ have a very strong position in the bioinformatics community, with the use of IPython (now becoming Jupyter) increasing and solidifying the Python position. D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. D has no real play in this arena and so there is no way (currently) of creating a foothold. Sad, but…
... incongruent with the recently-published bioinformatics paper. -- Andrei
Mar 30 2015
prev sibling next sibling parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
 My "prejudice", based on training people in Python and C++ over 
 the last few years, is that Python and C++ have a very strong 
 position in the bioinformatics community, with the use of 
 IPython (now becoming Jupyter) increasing and solidifying the 
 Python position.
It's just possible there is a selection effect ;) Plus the future may not be like the past.
 D's position is quite weak here because one of the important 
 things is visualising data, something SciPy/Matplotlib are very 
 good at. D has no real play in this arena and so there is no 
 way (currently) of
 creating a foothold. Sad, but…
You're right about the lack of visualization being a shame. I have been thinking about porting Bokeh bindings to D. There isn't much to it on the server side - all you need to do is build up the object model and translate it to JSON - but I don't have time right now to do it all myself. https://github.com/bokeh/bokeh

I did port the MathGL C API to D, although I haven't tested it beyond the simplest example yet. The C++ bindings aren't much work to add, although even the C API is not so ugly. http://mathgl.sourceforge.net/doc_en/Main.html
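For the curious, a minimal sketch of that server-side idea, using only std.json from Phobos. The Line type and field names are hypothetical placeholders, not the real Bokeh object model:

import std.json, std.stdio;

struct Line { double[] x, y; string colour = "navy"; }

JSONValue toModel(Line l)
{
    // build up the object model as plain JSON, in whatever shape
    // the browser-side client expects
    JSONValue j = ["type": "line"];
    j["x"]      = l.x;
    j["y"]      = l.y;
    j["colour"] = l.colour;
    return j;
}

void main()
{
    auto line = Line([1.0, 2.0, 3.0], [2.0, 4.0, 8.0]);
    writeln(toModel(line).toString);  // this JSON is what the JS client would consume
}

Most of the binding work would be deciding how to mirror the object hierarchy, not the serialisation itself.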
Mar 30 2015
parent reply "CraigDillabaugh" <craig.dillabaugh gmail.com> writes:
On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc wrote:

clip
 You're right about the lack of visualization being a shame. I 
 have been thinking about porting Bokeh bindings to D.  There 
 isn't much too it on the server side - all you need to do is 
 build up the object model and translate it to JSON - but I have 
 not time right now to do it all myself.
clip

A comment on the visualization thing. Is this really a big issue? Data processing (D's strong point) and visualization are different tasks, and presumably as long as outputs are written to standard file types (i.e. NetCDF, HDF5, or other domain-specific formats) then existing visualization tools should be usable.

I did some image processing work with D and didn't find the lack of specific D tools for visualization a big issue.

There is some advantage to being able to perform visualization tasks in the same language as you do the data processing work, but I wouldn't think this would be a major obstacle.
Mar 30 2015
next sibling parent "george" <georgkam gmail.com> writes:
 I did some image processing work with D and didn't find the 
 lack of specific D tools for visualization a big issue.

 There is some advantage to being able to perform visualization 
 tasks in the same lanaguage as you do the data processing work, 
 but I wouldn't this this would be a major obstacle.
I personally prefer the model where I create a tool that takes some input and provides output in a suitable format that I can load into a proper statistical environment (R or Julia) for visualisation and manipulation. Therefore I would rather write a tool that performs a single task optimally and pipes its output to a different tool for the next task. This way the tools can be combined into flexible pipelines:

raw data -> clean -> QC -> to format Y -> to format X -> tool A -> tool B -> visualize

George
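As a rough sketch of what one such single-purpose stage could look like in D - reading tab-separated records on stdin, keeping those that pass a quality cutoff, and passing them on unchanged to the next tool (the column layout and threshold are hypothetical):

import std.algorithm : splitter;
import std.array : array;
import std.conv : to;
import std.stdio;

void main()
{
    // filter stage in a pipeline: read records from stdin, keep the ones
    // whose (hypothetical) quality column meets the cutoff, write to stdout
    foreach (line; stdin.byLine)
    {
        auto fields = line.splitter('\t').array;
        if (fields.length > 2 && fields[2].to!double >= 30.0)
            writeln(line);
    }
}

Used as something like "zcat raw.tsv.gz | ./qcfilter | ./toformatY > ready.tsv" (tool names hypothetical), each stage stays small and the visualisation step can live in R or Julia.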
Mar 30 2015
prev sibling parent reply "lobo" <swamplobo gmail.com> writes:
On Monday, 30 March 2015 at 20:25:33 UTC, CraigDillabaugh wrote:
 On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc wrote:

 clip
 You're right about the lack of visualization being a shame. I 
 have been thinking about porting Bokeh bindings to D.  There 
 isn't much too it on the server side - all you need to do is 
 build up the object model and translate it to JSON - but I 
 have not time right now to do it all myself.
clip A comment on the visualization thing. Is this really a big issue?
[snip]

Yes, of course. Why do you think Python + SciPy/NumPy has such a foothold in the scientific community? Visualisation is an important part of the data processing pipeline. It's also why Matlab is so useful for those lucky enough to work for a company that can afford it.

bye,
lobo
Mar 30 2015
parent reply "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Monday, 30 March 2015 at 22:55:37 UTC, lobo wrote:
 On Monday, 30 March 2015 at 20:25:33 UTC, CraigDillabaugh wrote:
 On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc wrote:

 clip
 You're right about the lack of visualization being a shame. I 
 have been thinking about porting Bokeh bindings to D.  There 
 isn't much too it on the server side - all you need to do is 
 build up the object model and translate it to JSON - but I 
 have not time right now to do it all myself.
clip A comment on the visualization thing. Is this really a big issue?
[snip] Yes of course, why do you think Pyhton + sciPy/Numpy has such a foothold in the scientific community. Visualisation is an important part of data processing pipeline. It's also why Matlab is so useful for those lucky enough to work for a company that can afford it. bye, lobo
My point wasn't that visualization isn't important; it is that in most scientific computing it is very easy (and sensible) to separate the processing and visualization aspects, so the lack of D visualization tools should not hinder its value as a data processing tool. For example, Hadoop is immensely popular for data processing, but it includes no visualization tools. That is a slightly different domain, I understand, but there are similarities.

In short, nice D visualization tools would certainly be helpful, but I don't think their absence should be a showstopper.
Mar 30 2015
parent reply "Laeeth Isharc" <Laeeth.nospam nospam-laeeth.com> writes:
On Tuesday, 31 March 2015 at 02:31:58 UTC, Craig Dillabaugh wrote:
 On Monday, 30 March 2015 at 22:55:37 UTC, lobo wrote:
 On Monday, 30 March 2015 at 20:25:33 UTC, CraigDillabaugh 
 wrote:
 On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc wrote:

 clip
 You're right about the lack of visualization being a shame. 
 I have been thinking about porting Bokeh bindings to D.  
 There isn't much too it on the server side - all you need to 
 do is build up the object model and translate it to JSON - 
 but I have not time right now to do it all myself.
clip A comment on the visualization thing. Is this really a big issue?
[snip] Yes of course, why do you think Pyhton + sciPy/Numpy has such a foothold in the scientific community. Visualisation is an important part of data processing pipeline. It's also why Matlab is so useful for those lucky enough to work for a company that can afford it. bye, lobo
My point wasn't that visualization isn't important, it is that in most scientific computing it is very easy (and sensible) to separate the processing and visualization aspects. So lack of D visualization tools should not hinder its value as a data processing tool. For example, Hadoop is immensely popular for data processing, but it includes no visualization tools. That is a slightly different domain I understand, but there are similarities. So in short, if there were nice D visualization tools that would certainly be helpful, but I don't think is should be a show stopper.
Yes, I tried to pick my words carefully. It is not a disaster, as someone seemed to imply, but it would be nice to have visualization, particularly for interactive exploration of data. One is back to Walter's quote about the two-language combination being an indicator that something is lacking.
Mar 30 2015
parent reply "Andrew Brown" <aabrown24 hotmail.com> writes:
Visualisation is certainly not behind Python's success in 
bioinformatics, which predates IPython. If you look through 
journals, very few of the figures are done in Python (and none at 
all in Julia). It succeeded because it allows you to hack your 
way through massive text files, and it's not Perl.

One problem with using D instead of C or C++ for projects like 
this is that these projects are a few people developing software 
for many users, who are often working on very old clusters 
where they don't have admin rights. Getting an executable file to 
work for them is not trivial. Programs like samtools solve this 
by expecting people to compile them themselves, knowing they can 
rely on gcc being installed. But none of these clusters have a D 
compiler handy.

At my university, out-of-the-box executables for ldc don't run, 
gdc executables don't link with libc, and dmd sometimes 
shouts that it can't find dmd.conf. And this is a fairly up-to-date 
and well-administered cluster; I know quite a few institutions 
still on CentOS 5. Now, I can work to fix these problems for 
myself, but I can't expect a user to spend 3 hours compiling llvm, 
then ldc and various libraries, to use my software rather than 
just look for the C/C++ equivalent.

Yesterday I was asked if I'd rewrite my code in C++ to solve this 
problem - not really an option, as I don't know C++. I guess this 
is a fairly niche issue; D.learn kindly pointed me in the 
direction of VMs, which I think will solve most of my problems. 
The sambamba authors seem to be sharing Docker images (congrats on 
the paper, by the way!). But I think it is a factor to be considered 
when using D: disseminating software is trickier than with C/C++.

On Tuesday, 31 March 2015 at 03:30:09 UTC, Laeeth Isharc wrote:
 On Tuesday, 31 March 2015 at 02:31:58 UTC, Craig Dillabaugh 
 wrote:
 On Monday, 30 March 2015 at 22:55:37 UTC, lobo wrote:
 On Monday, 30 March 2015 at 20:25:33 UTC, CraigDillabaugh 
 wrote:
 On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc 
 wrote:

 clip
 You're right about the lack of visualization being a shame. 
 I have been thinking about porting Bokeh bindings to D.  
 There isn't much too it on the server side - all you need 
 to do is build up the object model and translate it to JSON 
 - but I have not time right now to do it all myself.
clip A comment on the visualization thing. Is this really a big issue?
[snip] Yes of course, why do you think Pyhton + sciPy/Numpy has such a foothold in the scientific community. Visualisation is an important part of data processing pipeline. It's also why Matlab is so useful for those lucky enough to work for a company that can afford it. bye, lobo
My point wasn't that visualization isn't important, it is that in most scientific computing it is very easy (and sensible) to separate the processing and visualization aspects. So lack of D visualization tools should not hinder its value as a data processing tool. For example, Hadoop is immensely popular for data processing, but it includes no visualization tools. That is a slightly different domain I understand, but there are similarities. So in short, if there were nice D visualization tools that would certainly be helpful, but I don't think is should be a show stopper.
Yes, I tried to pick my words carefully. It is not a disaster, as a someone seemed to imply, but it would be nice to have visualization, particularly for interactive exploration of data. One is back to Walter's quote about the two language combination being an indicator that something is lacking.
Mar 31 2015
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 31 March 2015 at 08:09:00 UTC, Andrew Brown wrote:
 Visualisation is certainly not behind python's success in 
 bioinformatics, which predates ipython. If you look through 
 journals, very few of the figures are done in python (and none 
 at all in julia). It succeeded because it allows you to hack 
 your way through massive text files and it's not perl.

 One problem with using D instead of C or C++ for projects like 
 this, is that these projects are a few people developing 
 software for many users, who are working on frequently very old 
 clusters where they don't have admin rights. Getting an 
 executable file to work for them is not trivial. Programs like 
 samtools solve this by expecting people to compile it 
 themselves, knowing they can rely on gcc to be installed. But 
 none of these clusters have a D compiler handy.

 On my university, out of the box executables for ldc don't run, 
 gdc executable files don't link with libc, and dmd sometimes 
 shouts it can't find dmd.conf. And this is a fairly up to date 
 and well administered cluster, I know quite a few instituions 
 still on centOS 5. Now, I can work to fix these problems for 
 myself, but I can't expect a user spend 3 hours compiling llvm, 
 then ldc and various libraries to use my software, rather than 
 just look for the C/C++ equivalent.

 Yesterday I was asked if I'd rewrite my code in C++ to solve 
 this problem, not really an option as I don't know C++. I guess 
 this is a fairly niche issue, D Learn kindly pointed me in the 
 direction of VMs which I think will solve most of my problems. 
 The sambabamba authors seem to be sharing dockers (congrat on 
 the paper by the way!). But I think it is a factor to be 
 considered when using D: disseminating software is trickier 
 than with C/C++.
Building LDC and its dependencies isn't that difficult, but it was still a pain to have to do that just to compile my code for the cluster. There needs to be some sort of bootstrap script, downloads included, to go from a bare-bones C++ toolchain to a working D compiler. Or even just some executables online compiled against an ancient glibc.
Mar 31 2015
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 30 March 2015 at 18:23:31 UTC, Russel Winder wrote:
 On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d 
 wrote:
 .NET actually already has a foothold in bioinformatics, 
 specially in user facing software and steering of reading 
 equipments and robots.
 

 visualization) use cases.
 
 --
 Paulo
Paulo, Can you send me some pointers to this stuff?
Sure, just sent to your email. -- Paulo
Mar 30 2015
prev sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Monday, 30 March 2015 at 18:23:31 UTC, Russel Winder wrote:
 On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d 
 wrote:
 .NET actually already has a foothold in bioinformatics, 
 specially in user facing software and steering of reading 
 equipments and robots.
 

 visualization) use cases.
 
 --
 Paulo
Paulo, Can you send me some pointers to this stuff?
 
 Though when it comes to open source bioinformatics projects, 
 Perl and Python have a large foothold
 among most most bioinformaticians. Most utilities that require 
 speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS 
 etc).
 
 I think D stands a good chance as a language of choice for 
 bioinformatics projects.
 
 George
My "prejudice", based on training people in Python and C++ over the last few years, is that Python and C++ have a very strong position in the bioinformatics community, with the use of IPython (now becoming Jupyter) increasing and solidifying the Python position. D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. D has no real play in this arena and so there is no way (currently) of creating a foothold. Sad, but…
As Andrew Brown pointed out, visualization is not behind Python's success. Its success lies in the fact that it's a language you can hack away in easily. Almost everybody who has to do some data processing (most researchers do these days) and has limited or no experience with programming will opt for Python: it's easy (at first!), well documented, and everyone else uses it. However, the initial euphoria of being able to automatically rename files and extract value X from file Y soon gives way to frustration when it comes to performance.

The paper shows well that in a world where data processing is of utmost importance, and we're talking about huge sets of data, languages like Python don't cut it anymore. Two things are happening at the moment: on the one hand, people still use Python for various reasons (see above and hundreds of posts on this forum); on the other, there is growing discontent among researchers, scientists and engineers as regards performance, simply because the data sets are becoming bigger every day and the algorithms are getting more and more refined. Sooner or later people will have to find new ways, out of sheer necessity.

Don't forget that "the state of the art" can change very quickly in IT, and the name of the game is anticipating new developments rather than taking snapshots of the current state of the art and framing them. D really has a lot to offer for data processing, and I wouldn't rule out that more and more programmers will turn to it for this task.
Mar 31 2015
parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
 As Andrew Brown pointed out, visualization is not behind 
 Pythons success. Its success lies in the fact that it's a 
 language you can hack away in easily.
Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have at least a passable solution for visualization, and some way of making it interactive (the REPL might be one route). The problem with separating the processes completely and just piping the output from D code that does the heavy lifting to a Python or Julia front end is that it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs.

Finishing MathGL and writing D bindings for Bokeh (take a look - it is pretty cool, particularly being able to use the browser as the client, acknowledging that it is a tradeoff) is not so much work. But some help on Bokeh particularly would be nice, as I fear picking one way of implementing the object structure and later finding it was a mistake.
 the initial euphoria of being able to automatically rename 
 files and extract value X from file Y soon gives way to 
 frustration when it comes to performance.
Yep.
 The paper shows well that in a world where data processing is 
 of utmost importance, and we're talking about huge sets of 
 data, languages like Python don't cut it anymore.
I could not agree more, and I do think the intersection of these two trends creates a tremendous opportunity for D. It's also commonsensical to look at notable successes - and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information-rich in dollar terms), and Moore's Law/memory speed and latency are not keeping pace. This is exactly the kind of change that creeps up on you, because not much changes in a few months (which is the kind of horizon many of us tend to think in).

People ask "what is D's edge?", but my personal perception is "where is the competition for D?" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.
 at the same time there's growing discontent among researchers, 
 scientists and engineers as regards performance, simply because 
 the data sets are becoming bigger and bigger every day and the 
 algorithms are getting more and more refined. Sooner or later 
 people will have to find new ways, out of sheer necessity.
Upvote. I would love to see any references you have on this - not because it's not rather obvious to me, but because it is helpful when talking to other people.
 Don't forget that "the state of the art" can change very 
 quickly in IT and the name of the game is anticipating new 
 developments rather than taking snapshots of the current state 
 of the art and frame them. D really has a lot to offer for data 
 processing and I wouldn't rule it out that more and more 
 programmers will turn to it for this task.
I fully agree. If we started a section on use cases, would you be able to write a page or two on D's advantages in data processing?
Mar 31 2015
next sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
 As Andrew Brown pointed out, visualization is not behind 
 Pythons success. Its success lies in the fact that it's a 
 language you can hack away in easily.
Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have both at least a passable solution for visualization, and some way of making it interactive. (The REPL might be one route). The problem with separating the processes completely and just piping the output from D code that does the heavy lifting to a python or julia front end is it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs. Finishing mathgl and writing D bindings for bokeh (take a look - it is pretty cool, particularly to be able to use the browser as client, acknowledging that it is a tradeoff) is not so much work. But some help on bokeh particularly would be nice, as I fear picking one way of implementing the object structure and later finding it is a mistake.
 the initial euphoria of being able to automatically rename 
 files and extract value X from file Y soon gives way to 
 frustration when it comes to performance.
Yep.
 The paper shows well that in a world where data processing is 
 of utmost importance, and we're talking about huge sets of 
 data, languages like Python don't cut it anymore.
I could not agree more, and I do think the intersection of two trends creates tremendous opportunity for D. It's also commonsensical to look at notable successes - and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information rich in dollar terms), and Moore's Law/memory speed+latency is not keeping pace. This is exactly the kind of change that creeps up on you because not much changes in a few months (which is the kind of horizon many of us tend to think in). People say "what is D's edge", but my personal perception is "where is the competition for D" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.
 at the same time there's growing discontent among researchers, 
 scientists and engineers as regards performance, simply 
 because the data sets are becoming bigger and bigger every day 
 and the algorithms are getting more and more refined. Sooner 
 or later people will have to find new ways, out of sheer 
 necessity.
upvote. I would love to see any references you have on this - not because it's not rather obvious to me, but because it is helpful when talking to other people.
The article that gave rise to this thread is a good reference. I came from a slightly different angle: I looked for alternatives to Python because I needed:

1. fast native execution (real time)
2. easy interfacing to C
3. cross-platform development

(Modern convenience, templates, ranges etc. were bonuses I discovered bit by bit.)

As regards algorithms and data processing, most people in research use Matlab (proprietary) and Python. However, in my field they're useless when it comes to building data-driven systems (fast analysis, retraining of the machine based on (slight) modifications) and putting computationally heavy algorithms into real-world applications. Proof of concept is all it usually amounts to. So D has a real chance here, because of

1. native code
2. modern convenience
3. templates, structs, mixins, ranges, std.algorithm etc.
4. interfacing to C libs (a tiny interop sketch follows below)
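On point 4, a minimal sketch of how low the barrier is - declaring a C function from libc directly and calling it, no binding generator involved:

// calling into a plain C library from D: declare the symbol and use it
extern (C) size_t strlen(const(char)* s);

void main()
{
    import std.stdio : writeln;
    // D string literals are zero-terminated, so they can be passed straight to C
    writeln(strlen("sambamba"));  // prints 8
}

For a whole library one would typically translate the header once (or reuse existing bindings) and link against the C library as usual.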
 Don't forget that "the state of the art" can change very 
 quickly in IT and the name of the game is anticipating new 
 developments rather than taking snapshots of the current state 
 of the art and frame them. D really has a lot to offer for 
 data processing and I wouldn't rule it out that more and more 
 programmers will turn to it for this task.
I fully agree. If we started a section on use cases, would you be able to write a page or two on D's advantages in data processing?
I think that Dicebot et al would have good examples.
Mar 31 2015
parent "Chris" <wendlec tcd.ie> writes:
On Tuesday, 31 March 2015 at 13:31:33 UTC, Chris wrote:
 On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
 As Andrew Brown pointed out, visualization is not behind 
 Pythons success. Its success lies in the fact that it's a 
 language you can hack away in easily.
Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have both at least a passable solution for visualization, and some way of making it interactive. (The REPL might be one route). The problem with separating the processes completely and just piping the output from D code that does the heavy lifting to a python or julia front end is it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs. Finishing mathgl and writing D bindings for bokeh (take a look - it is pretty cool, particularly to be able to use the browser as client, acknowledging that it is a tradeoff) is not so much work. But some help on bokeh particularly would be nice, as I fear picking one way of implementing the object structure and later finding it is a mistake.
 the initial euphoria of being able to automatically rename 
 files and extract value X from file Y soon gives way to 
 frustration when it comes to performance.
Yep.
 The paper shows well that in a world where data processing is 
 of utmost importance, and we're talking about huge sets of 
 data, languages like Python don't cut it anymore.
I could not agree more, and I do think the intersection of two trends creates tremendous opportunity for D. It's also commonsensical to look at notable successes - and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information rich in dollar terms), and Moore's Law/memory speed+latency is not keeping pace. This is exactly the kind of change that creeps up on you because not much changes in a few months (which is the kind of horizon many of us tend to think in). People say "what is D's edge", but my personal perception is "where is the competition for D" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.
 at the same time there's growing discontent among 
 researchers, scientists and engineers as regards performance, 
 simply because the data sets are becoming bigger and bigger 
 every day and the algorithms are getting more and more 
 refined. Sooner or later people will have to find new ways, 
 out of sheer necessity.
upvote. I would love to see any references you have on this - not because it's not rather obvious to me, but because it is helpful when talking to other people.
The article that gave rise to this thread is a good reference. I came from a slightly different angle, I looked for alternatives to Python, because I needed: 1. fast native execution (real time) 2. easy interfacing to C 3. cross-platform development (Modern convenience, templates, ranges etc. were bonuses I discovered bit by bit) As regards algorithms and data processing, most people in research use Matlab (proprietary) and Python. However, in my field they're useless when it comes to building data-driven systems (fast analysis, retraining of machine based on (slight) modifications), and putting computationally heavy algorithms into real world applications. Proof of concept is all it amounts to, usually. So D has a real chance here, because of 1. native code 2. modern convenience 3. templates, structs, mixins, ranges, std.algorithm etcetc. 4. interfacing to C libs
 Don't forget that "the state of the art" can change very 
 quickly in IT and the name of the game is anticipating new 
 developments rather than taking snapshots of the current 
 state of the art and frame them. D really has a lot to offer 
 for data processing and I wouldn't rule it out that more and 
 more programmers will turn to it for this task.
I fully agree. If we started a section on use cases, would you be able to write a page or two on D's advantages in data processing?
I think that Dicebot et al would have good examples.
It'd be nice if we had a dedicated data-analysis section and/or library. I'm almost sure that people working with massive amounts of data would find it by googling "efficient data analysis" or something like that. Facebook probably has a wealth of data analysis examples/techniques, too.
Mar 31 2015
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
 As Andrew Brown pointed out, visualization is not behind 
 Pythons success. Its success lies in the fact that it's a 
 language you can hack away in easily.
Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have both at least a passable solution for visualization, and some way of making it interactive. (The REPL might be one route). The problem with separating the processes completely and just piping the output from D code that does the heavy lifting to a python or julia front end is it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs. Finishing mathgl and writing D bindings for bokeh (take a look - it is pretty cool, particularly to be able to use the browser as client, acknowledging that it is a tradeoff) is not so much work. But some help on bokeh particularly would be nice, as I fear picking one way of implementing the object structure and later finding it is a mistake.
 the initial euphoria of being able to automatically rename 
 files and extract value X from file Y soon gives way to 
 frustration when it comes to performance.
Yep.
 The paper shows well that in a world where data processing is 
 of utmost importance, and we're talking about huge sets of 
 data, languages like Python don't cut it anymore.
I could not agree more, and I do think the intersection of two trends creates tremendous opportunity for D. It's also commonsensical to look at notable successes - and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information rich in dollar terms), and Moore's Law/memory speed+latency is not keeping pace. This is exactly the kind of change that creeps up on you because not much changes in a few months (which is the kind of horizon many of us tend to think in). People say "what is D's edge", but my personal perception is "where is the competition for D" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.
It is in the JVM and .NET ecosystems. Both have AOT compilers available, are able to chew through data on GPGPUs, and offer SIMD libraries. This is why there is such a strong focus on value types and better C interop planned for Java 10, as its use for data analysis has been growing. In HPF, companies prefer to live with JVM workarounds for the current limitations rather than go out and hire a few C++ developers, given the amount of money saved in salaries.

--
Paulo
Mar 31 2015
prev sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 30 March 2015 at 18:04:58 UTC, george wrote:
 .NET actually already has a foothold in bioinformatics, 
 specially in user facing software and steering of reading 
 equipments and robots.


 visualization) use cases.

 --
 Paulo
Though when it comes to open source bioinformatics projects, Perl and Python have a large foothold among most most bioinformaticians. Most utilities that require speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS etc). I think D stands a good chance as a language of choice for bioinformatics projects. George
Yes, on the server side and in UNIX-based research. However, I have learned in recent years that Windows-based systems are also used a lot, especially in controlling robots and doing the first processing steps and visualization. At least in commercial research.

--
Paulo
Mar 30 2015
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 30 March 2015 at 20:28:11 UTC, Paulo Pinto wrote:
 On Monday, 30 March 2015 at 18:04:58 UTC, george wrote:
 .NET actually already has a foothold in bioinformatics, 
 specially in user facing software and steering of reading 
 equipments and robots.


 visualization) use cases.

 --
 Paulo
Though when it comes to open source bioinformatics projects, Perl and Python have a large foothold among most most bioinformaticians. Most utilities that require speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS etc). I think D stands a good chance as a language of choice for bioinformatics projects. George
Yes on the server side and UNIX based research. However, I have learned in the last years that Windows based systems are also used a lot, specially in controlling robots and doing the first processing steps and visualization. At least in commercial research. -- Paulo
Yes, to the benefit of literally no-one. To be fair, it's not a problem of the operating system, just that special-purpose GUI programmes for scientific work always seem to be utterly dreadful.

"Hey, we need to record some time series and show a spectrum on the fly." "OK, great, let's commission a closed-source Windows GUI application with its own proprietary file format; sure, it'll crash once a day and have scientifically important parameters hard-coded and undocumented, but at least you can point and click!"

It seems to be true across the board in government research facilities, pharmaceutical companies, most of academia and so on... Enormous piles of proprietary vomit being propped up by an endless stream of disinterested and semi-incompetent programmers, steadily digging their way to job security.
Mar 31 2015
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/29/15 11:50 PM, george wrote:
 http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinformatics.btv098.full.pdf+html


 and a feature
 http://google-opensource.blogspot.nl/2015/03/gsoc-project-sambamba-published-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+GoogleOpenSourceBlog+(Google+Open+Source+Blog)



 D may hold a sweet spot in bioinformatics where you often require quick
 turnaround (productivity) , raw speed and agility.
Nice! Went to post it on reddit; it was already there: http://www.reddit.com/r/programming/comments/30tvlf/d_in_bioinformatics_gsoc_project_sambamba/

More:
https://news.ycombinator.com/newest
https://twitter.com/D_Programming/status/582603844355424257
https://www.facebook.com/dlang.org/posts/1041963349150679

Andrei
Mar 30 2015