Main functions usable by end users during the assembling process

The orgasm.assembler package

author

Eric Coissac

contact

eric.coissac@inria.fr

The orgasm.assembler Python package provides the Assembler class, which manages the assembling process.

class orgasm.assembler.Assembler

graph

A doc string can go here.

index

A doc string can go here.

readType()

Internal function: given a set of read ids, returns the one that has to be used as the standard id.

The set of ids given to this function corresponds to a set of strictly identical reads.

Parameters

ids (iterable) – an iterable of elements containing read ids.

Returns

a tuple of three elements

  • the standard id to use

  • the length of ids

  • a set containing all unique ids given as parameter

Return type

tuple
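
The shape of the returned tuple can be illustrated with a minimal sketch. This is not the Assembler implementation: the name read_type_sketch is hypothetical, and picking the smallest id as the standard id is an assumption made only for the example.

```python
def read_type_sketch(ids):
    """Hypothetical stand-in for readType(); not the orgasm implementation."""
    ids = list(ids)          # materialise so the iterable can be traversed more than once
    unique = set(ids)        # all distinct ids given as parameter
    standard = min(unique)   # ASSUMPTION: the smallest id serves as the standard id
    return standard, len(ids), unique


# Four ids are passed in, three of them distinct.
result = read_type_sketch([42, 7, 19, 7])
print(result)
```

The three elements map one to one onto the list above: the standard id, the length of ids, and the set of unique ids.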

seeds

A doc string can go here.

The orgasm.tango package

The tango package contains a set of functions useful for managing assembling structures.

author

Eric Coissac

contact

eric.coissac@inria.fr

Warning

The functions of the tango package are intended to be integrated into other packages, either as standalone functions or as class methods.

orgasm.tango.coverageEstimate(self, matches=None, index=None, timeout=60.0)[source]

Estimates the average coverage depth of the sequence.

The algorithm is basic and can be very slow. To avoid unbounded computation time, a timeout limits it to 60 seconds by default.

Three values are returned by the function:

  • The number of bp considered to estimate the coverage

  • The length of the segment used for the estimation

  • The coverage depth

Parameters

timeout – Maximum computation time, in seconds.

Returns

a triplet (int,int,float)
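
The combination of a time budget and a (bp, length, depth) triplet can be sketched as follows. This is only an illustration under stated assumptions: coverage_estimate_sketch, read_lengths and segment_length are hypothetical names, and the real function works on the assembling graph rather than on a plain list of read lengths.

```python
import time


def coverage_estimate_sketch(read_lengths, segment_length, timeout=60.0):
    """Hypothetical illustration of a time-bounded coverage estimate."""
    deadline = time.monotonic() + timeout
    total_bp = 0
    for length in read_lengths:
        if time.monotonic() > deadline:   # stop once the time budget is spent
            break
        total_bp += length                # bp contributed by this read
    # Coverage depth = bp considered / length of the segment used.
    coverage = total_bp / segment_length if segment_length else 0.0
    return total_bp, segment_length, coverage


bp, seglen, depth = coverage_estimate_sketch([100, 100, 100, 100], 200)
print(bp, seglen, depth)
```

When the timeout expires early, the estimate is simply based on the reads seen so far, which is why the number of bp considered is returned alongside the depth.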

orgasm.tango.cutLowCoverage(self, mincov, terminal=True)[source]

Remove sequences in the assembling graph with a coverage below mincov.

In [159]: asm = Assembler(r)

In [160]: s = matchtoseed(m,r)

In [161]: a = tango(asm,s,mincov=1,minread=10,minoverlap=30,maxjump=0,cycle=1)

In [162]: asm.cleanDeadBranches(maxlength=10)

Remaining edges : 424216 node : 423896
Out[162]: 34821
In [162]: cutLowCoverage(asm,10,terminal=False)
Parameters
  • mincov (int) – coverage threshold

  • terminal (bool) – if set to True, only terminal edges are removed from the assembling graph

Returns

the count of deleted nodes

Return type

int

Seealso

cleanDeadBranches()
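
The effect of the terminal flag can be shown on a toy graph. The sketch below is not the tango implementation: cut_low_coverage_sketch and its dictionary-based graph are hypothetical, and treating a node as terminal when it has no predecessor or no successor is an assumption made for the example.

```python
def cut_low_coverage_sketch(coverage, successors, mincov, terminal=True):
    """
    Hypothetical single-pass filter on a toy graph.

    coverage:   dict node -> coverage value
    successors: dict node -> set of successor nodes
    """
    # Build the reverse adjacency so that terminal nodes can be detected.
    predecessors = {node: set() for node in coverage}
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors.setdefault(dst, set()).add(src)

    def is_terminal(node):
        # ASSUMPTION: terminal means no successor or no predecessor.
        return not successors.get(node) or not predecessors.get(node)

    doomed = [node for node, cov in coverage.items()
              if cov < mincov and (not terminal or is_terminal(node))]

    # Remove the selected nodes and every edge pointing to them.
    for node in doomed:
        del coverage[node]
        successors.pop(node, None)
    for dsts in successors.values():
        dsts.difference_update(doomed)
    return len(doomed)


# Linear toy graph a -> b -> c -> d, with b and d below the threshold.
cov = {"a": 20, "b": 3, "c": 25, "d": 2}
succ = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": set()}
removed = cut_low_coverage_sketch(cov, succ, mincov=10, terminal=True)
print(removed, sorted(cov))
```

With terminal=True only the low-coverage tip d is removed and the internal node b survives; with terminal=False both would be deleted.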

orgasm.tango.cutLowSeeds(self, minseeds, seeds, terminal=True)[source]

Remove sequences in the assembling graph with a coverage below mincov.

In [159]: asm = Assembler(r)

In [160]: s = matchtoseed(m,r)

In [161]: a = tango(asm,s,mincov=1,minread=10,minoverlap=30,maxjump=0,cycle=1)

In [162]: asm.cleanDeadBranches(maxlength=10)

Remaining edges : 424216 node : 423896
Out[162]: 34821
In [162]: cutLowCoverage(asm,10,terminal=False)
Parameters
  • mincov (int) – coverage threshold

  • terminal (bool) – if set to True, only terminal edges are removed from the assembling graph

Returns

the count of deleted nodes

Return type

int

Seealso

cleanDeadBranches()

orgasm.tango.fillGaps(self, minlink=5, back=200, kmer=12, smin=40, delta=0, cmincov=5, minread=20, minratio=0.1, emincov=1, maxlength=None, gmincov=1, minoverlap=60, lowfilter=True, adapters5=(), adapters3=(), maxjump=0, snp=False, nodeLimit=1000000, onlyLinking=False, useonce=True, logger=None)[source]
Parameters
  • minlink

  • back

  • kmer

  • smin

  • delta

  • cmincov

  • minread

  • minratio

  • emincov

  • maxlength

  • gmincov

  • minoverlap

  • lowfilter

  • maxjump

  • snp – If set to True (default: False), erases SNP variation by keeping only the most abundant version

orgasm.tango.fillGaps2(self, minlink=5, back=200, kmer=12, smin=40, delta=0, cmincov=5, minread=20, minratio=0.1, emincov=1, maxlength=None, gmincov=1, minoverlap=60, lowfilter=True, adapters5=(), adapters3=(), maxjump=0, snp=False, nodeLimit=1000000, onlyfill=False)[source]
Parameters
  • minlink

  • back

  • kmer

  • smin

  • delta

  • cmincov

  • minread

  • minratio

  • emincov

  • maxlength

  • gmincov

  • minoverlap

  • lowfilter

  • maxjump

  • snp – If set to True (default: False), erases SNP variation by keeping only the most abundant version

orgasm.tango.getPairedRead(self, assgraph, stemid, back, end=True)[source]
Parameters
  • assgraph

  • stemid

  • back

  • end

orgasm.tango.mode(data)[source]

Computes a rough estimate of the mode of a data set.

Parameters

data (a permanent iterable object (list, tuple...)) – The data set to analyse
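
For reference, the most direct way to obtain a mode is to count occurrences. The sketch below does that with the standard library; mode_sketch is a hypothetical name, and the tango function may estimate the mode differently (for instance by binning values), so this is an illustration rather than its implementation.

```python
from collections import Counter


def mode_sketch(data):
    """Return the most frequent value of a non-empty data set (hypothetical helper)."""
    counts = Counter(data)                  # value -> number of occurrences
    value, _ = counts.most_common(1)[0]     # raises IndexError on empty input
    return value


print(mode_sketch([12, 15, 12, 30, 15, 12]))
```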

orgasm.tango.pairEndedConnected(self, assgraph, edge1, edge2, back=250)[source]

Returns how many paired-end reads link two edges in a compact assembling graph.

Parameters
  • assgraph (DiGraphMultiEdge) – The compact assembling graph as produced by the compactAssembling() method

  • edge1 (int) – The stemid of the first edge

  • edge2 (int) – The stemid of the second edge

  • back (int) – How many base pairs must be considered at the end of each edge

Returns

The count of paired-end reads linking the two edges

Return type

int
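
The counting idea can be sketched on plain dictionaries. Everything in this example is an assumption made for illustration: pair_ended_connected_sketch, the read position dictionaries and the mate_of mapping are hypothetical, and restricting the search to the last back bp of the first edge and the first back bp of the second edge is one possible reading of the back parameter.

```python
def pair_ended_connected_sketch(edge1_reads, edge1_length, edge2_reads,
                                mate_of, back=250):
    """
    Hypothetical count of read pairs linking two edges.

    edge1_reads / edge2_reads: dict read id -> 0-based start position on the edge
    mate_of:                   dict read id -> id of its mate
    """
    # Reads starting within the last `back` bp of the first edge.
    tail = {r for r, pos in edge1_reads.items() if pos >= edge1_length - back}
    # Reads starting within the first `back` bp of the second edge.
    head = {r for r, pos in edge2_reads.items() if pos < back}
    # A link is a read in the tail whose mate lies in the head.
    return sum(1 for r in tail if mate_of.get(r) in head)


edge1 = {1: 900, 2: 100, 3: 800}     # edge of length 1000
edge2 = {11: 50, 12: 600, 13: 10}
mates = {1: 11, 2: 13, 3: 12}
links = pair_ended_connected_sketch(edge1, 1000, edge2, mates, back=250)
print(links)
```

Here only the pair (1, 11) counts: read 2 is too far from the end of the first edge, and the mate of read 3 is too far from the start of the second.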

orgasm.tango.path2fasta(self, assgraph, path, identifier='contig', minlink=10, nlength=20, back=200, logger=None, tags=[])[source]

Converts a path in a compact assembling graph into a FASTA-formatted sequence.

Parameters
  • assgraph (DiGraphMultiEdge) – The compact assembling graph as produced by the compactAssembling() method

  • path (an iterable over int) – an iterable providing an ordered list of stemid indicating the path to follow.

  • identifier (bytes) – the identifier used in the header of the FASTA-formatted sequence

  • minlink (int) – the minimum count of paired-end links to consider for asserting the relationship

  • nlength (int) – how many N must be added between two sequence segments connected only by paired-end links

  • back (int) – How many base pairs must be considered at the end of each edge

Returns

a string containing the FASTA-formatted sequence

Return type

bytes

Raises

AssertionError
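
The role of identifier and nlength can be illustrated with a small sketch that joins already extracted segments. It is not the tango implementation: path_to_fasta_sketch, the (sequence, gap_before) pairs and the width parameter are hypothetical, and no graph traversal or link checking is performed.

```python
def path_to_fasta_sketch(segments, identifier=b"contig", nlength=20, width=60):
    """
    Hypothetical FASTA formatter.

    segments: list of (sequence, gap_before) pairs, where sequence is bytes and
              gap_before is True when the segment is joined to the previous one
              only by paired-end links.
    """
    parts = []
    for i, (seq, gap_before) in enumerate(segments):
        if i > 0 and gap_before:
            parts.append(b"N" * nlength)   # pad the unresolved junction with N
        parts.append(seq)
    sequence = b"".join(parts)

    # Header line, then the sequence wrapped at `width` characters per line.
    lines = [b">" + identifier]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return b"\n".join(lines) + b"\n"


fasta = path_to_fasta_sketch([(b"ACGT", False), (b"GGCC", True)], nlength=3)
print(fasta.decode("ascii"), end="")
```

The result is a bytes object, in line with the return type documented above.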

orgasm.tango.scaffold(self, assgraph, minlink=5, back=200, addConnectedLink=False, forcedLink={}, logger=None)[source]

Adds relationships between edges of the assembling graph based on the paired-end links.

Parameters
  • assgraph (DiGraphMultiEdge) – The compact assembling graph as produced by the compactAssembling() method

  • minlink (int) – the minimum count of paired-end links to consider for asserting the relationship

  • back (int) – How many base pairs must be considered at the end of each edge

  • addConnectedLink (bool) – adds green edges to the assembling graph for each directly connected edge pair, representing the paired-end links that support the connection.

orgasm.tango.unfoldAssembling(self, assgraph, constraints=None, seeds=None, threshold=5.0, back=500, minlink=5, limitSize=0, circular=False, force=False, cov1x=None, logger=None)[source]
Parameters
  • assgraph

  • constraints

  • seeds – set of stems to use as seeds for the unfolding algorithm

  • threshold

  • back

  • minlink

  • limitSize – maximum size of the contig in base pairs

  • circular – if True, a circular contig is hoped for

  • force – if True, a circular contig is required

  • logger