digitalmars.D - They wrote the fastest parallelized BAM parser in D
- george (5/5) Mar 29 2015 http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinf...
- Laeeth Isharc (6/11) Mar 30 2015 Thanks. Added to Python wiki section here:
- Paulo Pinto (8/13) Mar 30 2015 .NET actually already has a foothold in bioinformatics, especially
- george (7/14) Mar 30 2015 Though when it comes to open source bioinformatics projects, Perl
- Russel Winder via Digitalmars-d (21/40) Mar 30 2015 Paulo,
- Andrei Alexandrescu (2/32) Mar 30 2015 ... incongruent with the recently-published bioinformatics paper. -- And...
- Laeeth Isharc (12/22) Mar 30 2015 You're right about the lack of visualization being a shame. I
- CraigDillabaugh (13/18) Mar 30 2015 clip
- george (10/15) Mar 30 2015 I personally prefer the model where I create a tool that takes
- lobo (9/22) Mar 30 2015 [snip]
- Craig Dillabaugh (12/35) Mar 30 2015 My point wasn't that visualization isn't important, it is that in
- Laeeth Isharc (6/45) Mar 30 2015 Yes, I tried to pick my words carefully. It is not a disaster,
- Andrew Brown (29/79) Mar 31 2015 Visualisation is certainly not behind python's success in
- John Colvin (8/37) Mar 31 2015 Building LDC and its dependencies isn't that difficult, but it was
- Paulo Pinto (4/17) Mar 30 2015 Sure, just sent to your email.
- Chris (27/64) Mar 31 2015 As Andrew Brown pointed out, visualization is not behind Pythons
- Laeeth Isharc (36/56) Mar 31 2015 Sounds right. I am not in the camp that says it is a killer for
- Chris (22/79) Mar 31 2015 The article that gave rise to this thread is a good reference.
- Chris (7/96) Mar 31 2015 It'd be nice, if we had a dedicated data-analysis section and/or
- Paulo Pinto (12/51) Mar 31 2015 It is in the JVM and .NET eco-systems. Both have AOT compilers
- Paulo Pinto (8/25) Mar 30 2015 Yes on the server side and UNIX based research.
- John Colvin (15/44) Mar 31 2015 Yes, to the benefit of literally no-one. To be fair, it's not a
- Andrei Alexandrescu (8/13) Mar 30 2015 Nice! Went to post it on reddit, was already there:
http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinformatics.btv098.full.pdf+html

and a feature:
http://google-opensource.blogspot.nl/2015/03/gsoc-project-sambamba-published-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+GoogleOpenSourceBlog+(Google+Open+Source+Blog)

D may hold a sweet spot in bioinformatics, where you often require quick turnaround (productivity), raw speed and agility.
Mar 29 2015
On Monday, 30 March 2015 at 06:50:19 UTC, george wrote:
> D may hold a sweet spot in bioinformatics, where you often require quick turnaround (productivity), raw speed and agility.

Thanks. Added to the Python wiki section here: http://wiki.dlang.org/Coming_From/Python

But we should also create anchors for guides to D by use domain: finance, bioinformatics, etc. Enterprise users often like to know they are not the first.
Mar 30 2015
On Monday, 30 March 2015 at 06:50:19 UTC, george wrote:
> D may hold a sweet spot in bioinformatics, where you often require quick turnaround (productivity), raw speed and agility.

.NET actually already has a foothold in bioinformatics, especially in user-facing software and in steering reading equipment and robots.

[...] visualization) use cases.

-- Paulo
Mar 30 2015
> .NET actually already has a foothold in bioinformatics, especially in user-facing software and steering of reading equipment and robots. [...]

Though when it comes to open-source bioinformatics projects, Perl and Python have a large foothold among most bioinformaticians. Most utilities that require speed are written in C and C++ (BLAST, HMMER, SAMtools, etc.).

I think D stands a good chance as a language of choice for bioinformatics projects.

George
Mar 30 2015
On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d wrote:
> > .NET actually already has a foothold in bioinformatics, especially in user-facing software and steering of reading equipment and robots. [...]
> > -- Paulo

Paulo,

Can you send me some pointers to this stuff?

> Though when it comes to open-source bioinformatics projects, Perl and Python have a large foothold among most bioinformaticians. [...]
> George

My "prejudice", based on training people in Python and C++ over the last few years, is that Python and C++ have a very strong position in the bioinformatics community, with the use of IPython (now becoming Jupyter) increasing and solidifying the Python position.

D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. D has no real play in this arena and so there is no way (currently) of creating a foothold. Sad, but…

-- 
Russel.
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Mar 30 2015
On 3/30/15 11:23 AM, Russel Winder via Digitalmars-d wrote:
> D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. D has no real play in this arena and so there is no way (currently) of creating a foothold. Sad, but…

... incongruent with the recently published bioinformatics paper.

-- Andrei
Mar 30 2015
> My "prejudice", based on training people in Python and C++ over the last few years, is that Python and C++ have a very strong position in the bioinformatics community, with the use of IPython (now becoming Jupyter) increasing and solidifying the Python position.

It's just possible there is a selection effect ;) Plus, the future may not be like the past.

> D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. D has no real play in this arena and so there is no way (currently) of creating a foothold. Sad, but…

You're right about the lack of visualization being a shame. I have been thinking about porting Bokeh bindings to D. There isn't much to it on the server side: all you need to do is build up the object model and translate it to JSON. But I don't have time right now to do it all myself.

https://github.com/bokeh/bokeh

I did port the MathGL C API to D, although I haven't tested it yet beyond the simplest example. The C++ bindings wouldn't be much work to add, although even the C API is not so ugly.

http://mathgl.sourceforge.net/doc_en/Main.html
Mar 30 2015
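Laeeth's description of the server side of a Bokeh binding (build up an object model, then translate it to JSON) can be illustrated with a short sketch. It is written in Python only so that it runs as-is; the node layout (`id`, `type`, `attributes`, a flat `references` list) is an assumed, simplified schema and is not Bokeh's actual document format. A D port would build the same graph with classes or structs and could serialize it with Phobos' `std.json`.

```python
import itertools
import json

# Hypothetical, simplified object model (NOT Bokeh's real schema): every plot
# element is a node with a unique id, a type name and attributes. References
# to other nodes are encoded as {"id": ...}, so the graph flattens into JSON.
_ids = itertools.count(1)


class Model:
    def __init__(self, type_name, **attrs):
        self.id = str(next(_ids))
        self.type_name = type_name
        self.attrs = attrs

    def reachable(self):
        """Yield this model and every model reachable from it, each once."""
        seen = set()
        stack = [self]
        while stack:
            node = stack.pop()
            if node.id in seen:
                continue
            seen.add(node.id)
            yield node
            for value in node.attrs.values():
                items = value if isinstance(value, list) else [value]
                stack.extend(v for v in items if isinstance(v, Model))

    def to_dict(self):
        """Encode this node, replacing nested models with id references."""
        def encode(value):
            if isinstance(value, Model):
                return {"id": value.id}
            if isinstance(value, list):
                return [encode(v) for v in value]
            return value

        return {
            "id": self.id,
            "type": self.type_name,
            "attributes": {k: encode(v) for k, v in self.attrs.items()},
        }


def document_json(root):
    """Serialize a model graph as the root id plus a flat list of nodes."""
    nodes = [m.to_dict() for m in root.reachable()]
    return json.dumps({"root": root.id, "references": nodes}, sort_keys=True)


# Example: one line renderer drawing three points from a data source.
source = Model("DataSource", x=[1, 2, 3], y=[4, 1, 9])
line = Model("Line", source=source, color="navy")
plot = Model("Plot", title="demo", renderers=[line])
doc = document_json(plot)
```

Encoding child objects as `{"id": ...}` rather than nesting them in full means a node shared by several others (one data source feeding several renderers, say) is emitted only once; this is the kind of object-structure decision that is hard to change once bindings are written.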
On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc wrote:
[clip]
> You're right about the lack of visualization being a shame. [...]
[clip]

A comment on the visualization thing. Is this really a big issue? Data processing (D's strong point) and visualization are different tasks, and presumably as long as outputs are written to standard file types (e.g. NetCDF, HDF5 or other domain-specific formats), existing visualization tools should be usable.

I did some image processing work with D and didn't find the lack of D-specific visualization tools a big issue. There is some advantage to being able to perform visualization tasks in the same language you use for the data processing, but I wouldn't think this would be a major obstacle.
Mar 30 2015
> I did some image processing work with D and didn't find the lack of D-specific visualization tools a big issue. There is some advantage to being able to perform visualization tasks in the same language you use for the data processing, but I wouldn't think this would be a major obstacle.

I personally prefer the model where I create a tool that takes some input and provides output in a suitable format that I can load into a proper statistical environment (R or Julia) for visualisation and manipulation. So I would rather write a tool that performs a single task optimally and pipes its output to a different tool for the next task. This way I can reuse the tools and build flexible pipelines:

rawdata -> clean -> QC -> to format Y -> to format X -> tool A -> tool B -> visualize

George
Mar 30 2015
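George's single-task, pipe-friendly tool model can be sketched as one small pipeline stage: read records from standard input, apply one QC step, and write CSV that R (`read.csv`) or Julia can load. The tab-separated `read_id`/`quality` input layout and the default cutoff of 20 are hypothetical, chosen only for illustration; the sketch is in Python for brevity, and a D version would have the same shape.

```python
import csv
import sys


def qc_filter(lines, min_quality=20):
    """Yield (read_id, quality) pairs that parse cleanly and meet the cutoff.

    Assumed (hypothetical) input: one record per line, "read_id<TAB>quality".
    Malformed lines are skipped so one bad record does not stop the pipeline.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue
        read_id, quality = fields
        try:
            q = int(quality)
        except ValueError:
            continue
        if q >= min_quality:
            yield read_id, q


def write_csv(records, out):
    """Write records as CSV with a header row, which R or Julia can read directly."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["read_id", "quality"])
    writer.writerows(records)


def main():
    # One job only: records on stdin -> filtered CSV on stdout,
    # so the tool composes with whatever runs before and after it.
    write_csv(qc_filter(sys.stdin), sys.stdout)
```

In a pipeline this might look like `clean < raw.tsv | python qc_filter.py > reads.csv`, where `clean` is a hypothetical upstream tool and `qc_filter.py` is a script whose entry point calls `main()`. Skipping malformed lines keeps one bad record from stopping a long pipeline, though a production tool would probably also report how many it dropped.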
On Monday, 30 March 2015 at 20:25:33 UTC, CraigDillabaugh wrote:
[snip]
> A comment on the visualization thing. Is this really a big issue?

Yes, of course. Why do you think Python + SciPy/NumPy has such a foothold in the scientific community? Visualisation is an important part of the data processing pipeline. It's also why Matlab is so useful for those lucky enough to work for a company that can afford it.

bye,
lobo
Mar 30 2015
On Monday, 30 March 2015 at 22:55:37 UTC, lobo wrote:
> Yes, of course. Why do you think Python + SciPy/NumPy has such a foothold in the scientific community? Visualisation is an important part of the data processing pipeline. [...]

My point wasn't that visualization isn't important; it is that in most scientific computing it is very easy (and sensible) to separate the processing and visualization aspects. So the lack of D visualization tools should not hinder its value as a data processing tool. For example, Hadoop is immensely popular for data processing, but it includes no visualization tools. That is a slightly different domain, I understand, but there are similarities.

So, in short, nice D visualization tools would certainly be helpful, but I don't think their absence should be a show stopper.
Mar 30 2015
On Tuesday, 31 March 2015 at 02:31:58 UTC, Craig Dillabaugh wrote:
> So in short, if there were nice D visualization tools that would certainly be helpful, but I don't think it should be a show stopper.

Yes, I tried to pick my words carefully. It is not a disaster, as someone seemed to imply, but it would be nice to have visualization, particularly for interactive exploration of data. One is back to Walter's quote about the two-language combination being an indicator that something is lacking.
Mar 30 2015
Visualisation is certainly not behind Python's success in bioinformatics, which predates IPython. If you look through journals, very few of the figures are made with Python (and none at all with Julia). It succeeded because it lets you hack your way through massive text files and it's not Perl.

One problem with using D instead of C or C++ for projects like this is that they are typically a few people developing software for many users, who frequently work on very old clusters where they don't have admin rights. Getting an executable to work for them is not trivial. Programs like samtools solve this by expecting people to compile them themselves, knowing they can rely on gcc being installed. But none of these clusters has a D compiler handy. At my university, out-of-the-box executables for ldc don't run, gdc executables don't link with libc, and dmd sometimes shouts that it can't find dmd.conf. And this is a fairly up-to-date and well-administered cluster; I know quite a few institutions still on CentOS 5.

Now, I can work to fix these problems for myself, but I can't expect a user to spend 3 hours compiling LLVM, then LDC and various libraries, to use my software, rather than just looking for the C/C++ equivalent. Yesterday I was asked if I'd rewrite my code in C++ to solve this problem, which isn't really an option as I don't know C++.

I guess this is a fairly niche issue. D Learn kindly pointed me in the direction of VMs, which I think will solve most of my problems. The sambamba authors seem to be sharing Docker images (congrats on the paper, by the way!). But I think it is a factor to consider when using D: disseminating software is trickier than with C/C++.

On Tuesday, 31 March 2015 at 03:30:09 UTC, Laeeth Isharc wrote:
> Yes, I tried to pick my words carefully. It is not a disaster, as someone seemed to imply, but it would be nice to have visualization, particularly for interactive exploration of data. [...]
Mar 31 2015
On Tuesday, 31 March 2015 at 08:09:00 UTC, Andrew Brown wrote:
> [...] Now, I can work to fix these problems for myself, but I can't expect a user to spend 3 hours compiling llvm, then ldc and various libraries to use my software, rather than just look for the C/C++ equivalent. [...]

Building LDC and its dependencies isn't that difficult, but it was still a pain to have to do that just to compile my code for the cluster. There needs to be some sort of bootstrap script, downloads included, to go from a bare-bones C++ toolchain to a working D compiler. Or even just some executables online compiled against an ancient glibc.
Mar 31 2015
On Monday, 30 March 2015 at 18:23:31 UTC, Russel Winder wrote:
> Paulo,
>
> Can you send me some pointers to this stuff?

Sure, just sent to your email.

-- Paulo
Mar 30 2015
On Monday, 30 March 2015 at 18:23:31 UTC, Russel Winder wrote:
> D's position is quite weak here because one of the important things is visualising data, something SciPy/Matplotlib are very good at. [...]

As Andrew Brown pointed out, visualization is not behind Python's success. Its success lies in the fact that it's a language you can hack away in easily. Almost everybody who has to do some data processing (most researchers do these days) and has limited or no experience with programming will opt for Python: it's easy (at first!), well-documented, and everyone else uses it. However, the initial euphoria of being able to automatically rename files and extract value X from file Y soon gives way to frustration when it comes to performance. The paper shows well that in a world where data processing is of utmost importance, and we're talking about huge sets of data, languages like Python don't cut it anymore.

Two things are happening at the moment: on the one hand, people still use Python for various reasons (see above and hundreds of posts on this forum); at the same time, there's growing discontent among researchers, scientists and engineers as regards performance, simply because the data sets are becoming bigger and bigger every day and the algorithms are getting more and more refined. Sooner or later people will have to find new ways, out of sheer necessity. Don't forget that "the state of the art" can change very quickly in IT, and the name of the game is anticipating new developments rather than taking snapshots of the current state of the art and framing them. D really has a lot to offer for data processing, and I wouldn't rule out that more and more programmers will turn to it for this task.
Mar 31 2015
> As Andrew Brown pointed out, visualization is not behind Python's success. Its success lies in the fact that it's a language you can hack away in easily.

Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have at least a passable solution for visualization, and some way of making it interactive (the REPL might be one route). The problem with separating the processes completely, and just piping the output from the D code that does the heavy lifting to a Python or Julia front end, is that it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs.

Finishing MathGL and writing D bindings for Bokeh (take a look: it is pretty cool, particularly being able to use the browser as the client, acknowledging that it is a tradeoff) is not so much work. But some help on Bokeh in particular would be nice, as I fear picking one way of implementing the object structure and later finding it is a mistake.

> the initial euphoria of being able to automatically rename files and extract value X from file Y soon gives way to frustration when it comes to performance.

Yep.

> The paper shows well that in a world where data processing is of utmost importance, and we're talking about huge sets of data, languages like Python don't cut it anymore.

I could not agree more, and I do think the intersection of two trends creates a tremendous opportunity for D. It's also commonsensical to look at notable successes, and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information-rich in dollar terms), and Moore's Law/memory speed+latency is not keeping pace. This is exactly the kind of change that creeps up on you, because not much changes in a few months (which is the kind of horizon many of us tend to think in). People ask "what is D's edge?", but my personal perception is "where is the competition for D?" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.

> at the same time, there's growing discontent among researchers, scientists and engineers as regards performance, simply because the data sets are becoming bigger and bigger every day and the algorithms are getting more and more refined. Sooner or later people will have to find new ways, out of sheer necessity.

Upvote. I would love to see any references you have on this, not because it's not rather obvious to me, but because it is helpful when talking to other people.

> Don't forget that "the state of the art" can change very quickly in IT, and the name of the game is anticipating new developments rather than taking snapshots of the current state of the art and framing them. D really has a lot to offer for data processing, and I wouldn't rule out that more and more programmers will turn to it for this task.

I fully agree. If we started a section on use cases, would you be able to write a page or two on D's advantages in data processing?
Mar 31 2015
On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
> Upvote. I would love to see any references you have on this, not because it's not rather obvious to me, but because it is helpful when talking to other people.

The article that gave rise to this thread is a good reference. I came from a slightly different angle: I looked for alternatives to Python because I needed

1. fast native execution (real time)
2. easy interfacing to C
3. cross-platform development

(Modern convenience, templates, ranges etc. were bonuses I discovered bit by bit.)

As regards algorithms and data processing, most people in research use Matlab (proprietary) and Python. However, in my field they're useless when it comes to building data-driven systems (fast analysis, retraining of the machine based on (slight) modifications) and putting computationally heavy algorithms into real-world applications. Proof of concept is usually all it amounts to. So D has a real chance here, because of

1. native code
2. modern convenience
3. templates, structs, mixins, ranges, std.algorithm etc.
4. interfacing to C libs

I think that Dicebot et al would have good examples.
Mar 31 2015
On Tuesday, 31 March 2015 at 13:31:33 UTC, Chris wrote:On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:It'd be nice, if we had a dedicated data-analysis section and/or library. I'm almost sure that people working with massive amounts of data would find it by googling "efficient data analysis" or something like that. Facebook probably has a wealth of data analysis examples / techniques, too.The article that gave rise to this thread is a good reference. I came from a slightly different angle, I looked for alternatives to Python, because I needed: 1. fast native execution (real time) 2. easy interfacing to C 3. cross-platform development (Modern convenience, templates, ranges etc. were bonuses I discovered bit by bit) As regards algorithms and data processing, most people in research use Matlab (proprietary) and Python. However, in my field they're useless when it comes to building data-driven systems (fast analysis, retraining of machine based on (slight) modifications), and putting computationally heavy algorithms into real world applications. Proof of concept is all it amounts to, usually. So D has a real chance here, because of 1. native code 2. modern convenience 3. templates, structs, mixins, ranges, std.algorithm etcetc. 4. interfacing to C libsAs Andrew Brown pointed out, visualization is not behind Pythons success. Its success lies in the fact that it's a language you can hack away in easily.Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have both at least a passable solution for visualization, and some way of making it interactive. (The REPL might be one route). The problem with separating the processes completely and just piping the output from D code that does the heavy lifting to a python or julia front end is it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs. 
Finishing mathgl and writing D bindings for bokeh (take a look - it is pretty cool, particularly being able to use the browser as the client, acknowledging that it is a tradeoff) is not so much work. But some help on bokeh particularly would be nice, as I fear picking one way of implementing the object structure and later finding it is a mistake.

The initial euphoria of being able to automatically rename files and extract value X from file Y soon gives way to frustration when it comes to performance.

Yep.

The paper shows well that in a world where data processing is of utmost importance, and we're talking about huge sets of data, languages like Python don't cut it anymore.

I could not agree more, and I do think the intersection of two trends creates tremendous opportunity for D. It's also commonsensical to look at notable successes - and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information-rich in dollar terms), and Moore's Law/memory speed+latency is not keeping pace. This is exactly the kind of change that creeps up on you because not much changes in a few months (which is the kind of horizon many of us tend to think in). People say "what is D's edge", but my personal perception is "where is the competition for D" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.

At the same time there's growing discontent among researchers, scientists and engineers as regards performance, simply because the data sets are becoming bigger and bigger every day and the algorithms are getting more and more refined. Sooner or later people will have to find new ways, out of sheer necessity.

upvote.
I would love to see any references you have on this - not because it's not rather obvious to me, but because it is helpful when talking to other people.

I think that Dicebot et al. would have good examples.

Don't forget that "the state of the art" can change very quickly in IT, and the name of the game is anticipating new developments rather than taking snapshots of the current state of the art and framing them. D really has a lot to offer for data processing, and I wouldn't rule out that more and more programmers will turn to it for this task.

I fully agree. If we started a section on use cases, would you be able to write a page or two on D's advantages in data processing?
Mar 31 2015
On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:

It is in the JVM and .NET ecosystems. Both have AOT compilers available, are able to chew through data on GPGPUs and offer SIMD libraries. This is why there is such a strong focus on value types and better C interop planned for Java 10, as its use for data analysis has been growing. In HPF, companies prefer to live with JVM workarounds for the current limitations rather than go out and hire a few C++ developers, given the amount of money saved in salaries.

-- Paulo

As Andrew Brown pointed out, visualization is not behind Python's success. Its success lies in the fact that it's a language you can hack away in easily.

Sounds right. I am not in the camp that says it is a killer for D. It would just be nice to have both at least a passable solution for visualization and some way of making it interactive. (The REPL might be one route.) The problem with separating the processes completely and just piping the output from D code that does the heavy lifting to a Python or Julia front end is that it may make it more painful to play with and explore the data. My interests are finance more than science, so that may lead to a different set of needs.

Finishing mathgl and writing D bindings for bokeh (take a look - it is pretty cool, particularly being able to use the browser as the client, acknowledging that it is a tradeoff) is not so much work. But some help on bokeh particularly would be nice, as I fear picking one way of implementing the object structure and later finding it is a mistake.

The initial euphoria of being able to automatically rename files and extract value X from file Y soon gives way to frustration when it comes to performance.

Yep.

The paper shows well that in a world where data processing is of utmost importance, and we're talking about huge sets of data, languages like Python don't cut it anymore.

I could not agree more, and I do think the intersection of two trends creates tremendous opportunity for D.
It's also commonsensical to look at notable successes - and I hope it is not just my biases that lead me to think many of these are in just this kind of application. Data sets keep getting larger (but not necessarily more information rich in dollar terms), and Moore's Law/memory speed+latency is not keeping pace. This is exactly the kind of change that creeps up on you because not much changes in a few months (which is the kind of horizon many of us tend to think in). People say "what is D's edge", but my personal perception is "where is the competition for D" in this area. It has to be native code/JIT, and I refuse to learn Java; it also should be plastic and lend itself to rapid iteration.
Mar 31 2015
On Monday, 30 March 2015 at 18:04:58 UTC, george wrote:

Yes on the server side and UNIX-based research. However, I have learned in recent years that Windows-based systems are also used a lot, especially in controlling robots and doing the first processing steps and visualization. At least in commercial research.

-- Paulo

.NET actually already has a foothold in bioinformatics, especially in user-facing software and steering of reading equipment and robots. visualization) use cases.

-- Paulo

Though when it comes to open source bioinformatics projects, Perl and Python have a large foothold among most bioinformaticians. Most utilities that require speed are written in C and C++ (BLAST, HMMER, SAMTOOLS, etc.). I think D stands a good chance as a language of choice for bioinformatics projects.

George
Mar 30 2015
On Monday, 30 March 2015 at 20:28:11 UTC, Paulo Pinto wrote:
On Monday, 30 March 2015 at 18:04:58 UTC, george wrote:

Yes, to the benefit of literally no-one. To be fair, it's not a problem of the operating system, just that special-purpose GUI programmes for scientific work always seem to be utterly dreadful.

"Hey, we need to record some time series and show a spectrum on the fly."

"OK great, let's commission a closed-source Windows GUI application with its own proprietary file format. Sure, it'll crash once a day and have scientifically important parameters hard-coded and undocumented, but at least you can point and click!"

It seems to be true across the board in government research facilities, pharmaceutical companies, most of academia and so on... Enormous piles of proprietary vomit being propped up by an endless stream of disinterested and semi-incompetent programmers, steadily digging their way to job security.

Yes on the server side and UNIX-based research. However, I have learned in recent years that Windows-based systems are also used a lot, especially in controlling robots and doing the first processing steps and visualization. At least in commercial research.

-- Paulo

.NET actually already has a foothold in bioinformatics, especially in user-facing software and steering of reading equipment and robots. visualization) use cases.

-- Paulo

Though when it comes to open source bioinformatics projects, Perl and Python have a large foothold among most bioinformaticians. Most utilities that require speed are written in C and C++ (BLAST, HMMER, SAMTOOLS, etc.). I think D stands a good chance as a language of choice for bioinformatics projects.

George
Mar 31 2015
On 3/29/15 11:50 PM, george wrote:

http://bioinformatics.oxfordjournals.org/content/early/2015/02/18/bioinformatics.btv098.full.pdf+html

and a feature

http://google-opensource.blogspot.nl/2015/03/gsoc-project-sambamba-published-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+GoogleOpenSourceBlog+(Google+Open+Source+Blog)

D may hold a sweet spot in bioinformatics where you often require quick turnaround (productivity), raw speed and agility.

Nice! Went to post it on reddit, was already there:

http://www.reddit.com/r/programming/comments/30tvlf/d_in_bioinformatics_gsoc_project_sambamba/

More:

https://news.ycombinator.com/newest
https://twitter.com/D_Programming/status/582603844355424257
https://www.facebook.com/dlang.org/posts/1041963349150679

Andrei
Mar 30 2015