digitalmars.D.announce - Variant Graph Support to BioD

Njagi Mwaniki <null+dlang njagi.me> writes:
Hello, I’m Njagi Mwaniki.

I am part of the 2019 Google Summer of Code under the Open 
Bioinformatics Foundation, with a project that aims to add variation 
graph support to BioD under mentors George Githinji and Pjotr 
Prins.

What are variation graphs? A variation graph is a sequence graph that 
is used to represent variation in a genome. Let me explain.

A sequence graph, also called an alignment graph, breakpoint graph, or 
adjacency graph, is a bidirected graph in which the vertices 
represent segments of DNA and the edges represent adjacency 
between segments in a genome. (from Wikipedia)

Sequence graphs have long been proposed as a replacement for 
reference genomes, which are linear sequences of bases.

A variation graph is a sequence graph together with a set of 
paths representing possible sequences from a population[1]. 
Although these ideas have been around for a long time, we haven’t yet 
been able to use sequence graphs in real-life bioinformatics 
applications such as sequence alignment or determining homology. 
This adoption is what we hope to speed up.
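
To make the definition concrete, here is a minimal sketch in Python. It is not BioD or VG code: the class and method names are invented for illustration, and strand orientation is ignored for simplicity, so this is a plain directed graph rather than a true bidirected one.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy variation graph: labelled segments, adjacency edges, named paths."""
    segments: dict = field(default_factory=dict)  # node id -> DNA string
    edges: set = field(default_factory=set)       # (from id, to id) pairs
    paths: dict = field(default_factory=dict)     # path name -> list of node ids

    def add_segment(self, node_id, sequence):
        self.segments[node_id] = sequence

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def add_path(self, name, node_ids):
        # Every consecutive pair of nodes on a path must be joined by an edge.
        for src, dst in zip(node_ids, node_ids[1:]):
            if (src, dst) not in self.edges:
                raise ValueError(f"no edge {src} -> {dst}")
        self.paths[name] = list(node_ids)

    def spell(self, name):
        """Concatenate the segments along a path to recover its sequence."""
        return "".join(self.segments[n] for n in self.paths[name])


# A single-base variant (G vs. T) between two shared flanking segments.
graph = VariationGraph()
graph.add_segment(1, "ACGT")
graph.add_segment(2, "G")
graph.add_segment(3, "T")
graph.add_segment(4, "CCA")
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)
graph.add_path("reference", [1, 2, 4])
graph.add_path("sample", [1, 3, 4])

print(graph.spell("reference"))  # ACGTGCCA
print(graph.spell("sample"))     # ACGTTCCA
```

The two paths share segments 1 and 4 and differ only in the middle node, which is how the graph stores the variant once instead of duplicating the whole sequence.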

VG is a set of tools that already implements variation graphs, but 
it is a bit broad in its focus. In this project we are 
building on the existing tools and knowledge from VG and 
looking for ways to improve its lookup performance 
and its applicability to small genomes, specifically viruses 
and smaller mammals such as mice.


[1] Variation graph toolkit improves read mapping by representing 
genetic variation in the reference
May 28 2019
James Blachly <james.blachly gmail.com> writes:
On 5/28/19 5:41 AM, Njagi Mwaniki wrote:
> Hello I’m Njagi Mwaniki,
>
> I am part of the 2019 Google Summer of Code under the Open
> Bioinformatics Foundation with a project aimed to add variation graph
> support to BioD under mentors George Githinji and Pjotr Prins.
> ...
> VG is a set of tools that already implements variation graphs but which
> is a bit broad in its focus. In this project we are building upon the
> existing tools and knowledge from VG and looking for ways to improve its
> performance in terms of lookups and also its application with small
> genomes, specifically viruses and smaller mammals such as mice.
This sounds like a great project. Be aware that the size of the organism (e.g. mouse) has naught to do with the size of its genome.
May 28 2019
Njagi Mwaniki <null+dlang njagi.me> writes:
On Tuesday, 28 May 2019 at 21:24:54 UTC, James Blachly wrote:
> On 5/28/19 5:41 AM, Njagi Mwaniki wrote:
>> Hello I’m Njagi Mwaniki,
>>
>> I am part of the 2019 Google Summer of Code under the Open
>> Bioinformatics Foundation with a project aimed to add
>> variation graph support to BioD under mentors George Githinji
>> and Pjotr Prins.
>> ...
>> VG is a set of tools that already implements variation graphs
>> but which is a bit broad in its focus. In this project we are
>> building upon the existing tools and knowledge from VG and
>> looking for ways to improve its performance in terms of
>> lookups and also its application with small genomes,
>> specifically viruses and smaller mammals such as mice.
> This sounds like a great project. Be aware that the size of the organism (e.g. mouse) has naught to do with the size of its genome.
Thank you. With regard to the complexity of the genome, we're starting with a very small virus dataset and building up from it: https://github.com/urbanslug/GSoC-experiments/tree/master/data/RSV/refererence_and_vcf_file
The mouse genome is really just one possible application area, and a good place to test the robustness of the tool.
May 29 2019
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Tuesday, 28 May 2019 at 09:41:18 UTC, Njagi Mwaniki wrote:
> Hello I’m Njagi Mwaniki,
>
> I am part of the 2019 Google Summer of Code under the Open
> Bioinformatics Foundation with a project aimed to add variation
> graph support to BioD under mentors George Githinji and Pjotr
> Prins.
Awesome! Can you supply some links, please?
May 28 2019
Njagi Mwaniki <null+dlang njagi.me> writes:
On Wednesday, 29 May 2019 at 03:16:19 UTC, Nicholas Wilson wrote:
> On Tuesday, 28 May 2019 at 09:41:18 UTC, Njagi Mwaniki wrote:
>> Hello I’m Njagi Mwaniki,
>>
>> I am part of the 2019 Google Summer of Code under the Open
>> Bioinformatics Foundation with a project aimed to add
>> variation graph support to BioD under mentors George Githinji
>> and Pjotr Prins.
> Awesome! Can you supply some links, please?
I don't have a lot of links, but the D library is here: https://github.com/biod/biod
I'm prototyping some of the data structures in Racket here: https://github.com/urbanslug/GSoC-experiments
I plan to write blog posts soon explaining the graph, its implementation, and my progress as well.
May 29 2019