Rand Stats

BioInfo

github:MattOates

BioInfo Build Status

BioInfo is a reimagining of bioinformatics libraries for Perl6 based on all the new goodies available such as parallel processing through Promises and Channels and well defined and extensible parsing with the use of Grammars. Not to be confused with the bioperl6 project lead by Chris Fields which aims to be a reimplementation of the BioPerl API for Perl6. Checkout Chris' project for any serious business.

The Good

The Bad

The Ugly


An Example

Dealing with sequences from a file:

use BioInfo::Parser::FASTA;
use BioInfo::IO::FileParser;

#Spawn an IO thread that parses the file and creates BioInfo::Seq objects on .get
my $seq_file = BioInfo::IO::FileParser.new(file => 'myseqs.fa', parser => BioInfo::Parser::FASTA);

#Print all the IDs from the sequence file
while (my $seq = $seq_file.get()) {
    say $seq.id;
}

Using the BioInfo slang to seamlessly manipulate sequences in your code. Anything within backticks or grave accents `` will be treated as FASTA sequence data and create an array of sequence objects:

use BioInfo;

my @seqs = `
>ENST00000505673 cds:NOVEL_protein_coding
NGGTGTGGAGCCATAGGCTGTGAAATGTTGAAAAATTTTGCTTTACTTGGTGTTGGCACA
AGCAAAGAGAAAGGAATGATTACAGTTACAGATCCTGACTTGATAGAGAAATCCAACTTA
AATAGACAGTTCCTATTTCGTCCTCATCACATACAGAAACCTAAAAGCTACACTGCTGCT
GATGCTACTCTGAAAATAAATTCTCAAATAAAGATAGATGCACACCTGAACAAAGTATGT
CCAACCACTGAGACCATTTACAATGATGAGTTCTATACTAAACAAGATGTAATTATTACA
GCATTAGATAATGTGGAAGCCAGGAGATACGTAGACAGTCGTTGCTTAGCAAATCTAAGG
CCTCTTTTAGATTCTGGAACAATGGGCACTAAGGGACACACTGAAGTTATTGTACCGCAT
TTGACTGAGTCTTACAATAGTCATCGGGATCCCCCAGAAGAGGAAATACCATTTTGTACT
CTAAAATCCTTTCCAGCTGCTATTGAACATACCATACAGTGGGCAAGAGATAAGTTTGAA
AGTTCCTTTTCCCACAAACCTTCATTGTTTAACAAATTTTGGCAAACCTATTCATCTGCA
GAAGAAGTCTTACAGGCTCTTCAGCTTCTTCACTGTTTCCCTCTGGACATACGATTAAAA
GATGGCAGTTTATTTTGGCAGTCACCAAAGAGGCCACCCTCTCCAATAAAATTTGATTTA
AATGAGCCTTTGCACCTCAGTTTCCTTCAGAATGCTGCAAAACTATATGCTACAGTATAT
TGTATTCCATTTGCAGAAGAGGTAAGATATTCT
>ENST00000505673 peptide: ENSP00000421984 pep:NOVEL_protein_coding
XCGAIGCEMLKNFALLGVGTSKEKGMITVTDPDLIEKSNLNRQFLFRPHHIQKPKSYTAA
DATLKINSQIKIDAHLNKVCPTTETIYNDEFYTKQDVIITALDNVEARRYVDSRCLANLR
PLLDSGTMGTKGHTEVIVPHLTESYNSHRDPPEEEIPFCTLKSFPAAIEHTIQWARDKFE
SSFSHKPSLFNKFWQTYSSAEEVLQALQLLHCFPLDIRLKDGSLFWQSPKRPPSPIKFDL
NEPLHLSFLQNAAKLYATVYCIPFAEEVRYS
`;

#Print out all of the sequences, if the sequence was DNA translate to Amino Acids
for @seqs -> $seq {
    say $seq ~~ BioInfo::Seq::DNA ?? $seq.translate !! $seq;
}

The previous example will output the protein sequence twice since the DNA was the coding sequence for the protein. Notice that what type of sequence was automatically detected, for very short sequences this could get confused. However, it does allow for convenient and quick manipualtion of sequences.


Installation

Installation is quite simple at the moment, but might become more involved once third party software is involved.

With zef (part of the Rakudo* distribution)

zef install BioInfo
perl6 -M BioInfo -e 'say BioInfo::Seq.new(id => "seq1", sequence => "MDADAFA");'

By hand

#Get the code into your library path
cd ~/wherever_you_keep_perl6_libs
git clone 'https://github.com/MattOates/BioInfo'
#Run some tests
cd BioInfo
prove -e 'perl6 -I ./lib' -lrv t/

License

Artistic, for now just so that I don't have to worry about hardcore Perl users being fearful of anything else.