Main functions usable by end users during the assembling process
The orgasm.assembler package
- author
Eric Coissac
- contact
The orgasm.assembler Python package provides the Assembler class, which manages the assembling process.
- class orgasm.assembler.Assembler
totolitoto
- graph
A doc string can go here.
- index
A doc string can go here.
- readType()
Internal function: given a set of read ids, return the one that has to be used as the standard id.
The set of ids given to this function corresponds to a set of all strictly identical reads.
- Parameters
ids (iterable) – an iterable containing read ids.
- Returns
a tuple of three elements
the standard id to use
the length of ids
a set containing all unique ids given as parameter
- Return type
tuple
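The return value described above can be pictured with a minimal sketch. The function name `read_type_sketch` is hypothetical, and the rule used to pick the standard id (the smallest one) is an assumption made for illustration; the documentation does not state how the real method chooses it.

```python
def read_type_sketch(ids):
    """Hypothetical stand-in for readType(); not the library's code."""
    ids = list(ids)          # materialise so a generator can be measured
    unique = set(ids)        # all distinct ids given as parameter
    standard = min(unique)   # ASSUMPTION: the smallest id is the standard one
    return standard, len(ids), unique


# Four ids describing strictly identical reads, one of them repeated.
result = read_type_sketch([42, 17, 42, 99])
```

The three elements of `result` follow the documented order: the standard id, the length of `ids`, and the set of unique ids.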
- seeds
A doc string can go here.
-
The orgasm.tango package
The tango package contains a set of functions useful to manage assembling structure.
- author
Eric Coissac
- contact
Warning
The tango package functions aim to be integrated into other packages as standalone functions or class methods.
- orgasm.tango.coverageEstimate(self, matches=None, index=None, timeout=60.0)
Estimates the average coverage depth of the sequence.
The algorithm is basic and can be very slow. To avoid unbounded computation time, a timeout limits it to 60 seconds by default.
Three values are returned by the function:
The number of bp considered to estimate the coverage
The length of the segment used for the estimation
The coverage depth
- Parameters
timeout – Maximum computation time, in seconds.
- Returns
a triplet (int,int,float)
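A time-bounded estimate of this kind could look like the sketch below. It is not the library's algorithm: `coverage_estimate_sketch` and its `segments` input (an iterable of `(segment_length, aligned_bp)` pairs) are hypothetical, and computing the depth as the ratio of the first two returned values is an assumption.

```python
import time


def coverage_estimate_sketch(segments, timeout=60.0):
    """Hypothetical sketch: accumulate (length, bp) pairs until the timeout."""
    deadline = time.monotonic() + timeout
    total_bp = 0
    total_length = 0
    for length, bp in segments:
        if time.monotonic() > deadline:
            break  # stop early and report what was gathered so far
        total_length += length
        total_bp += bp
    # ASSUMPTION: the depth is the number of bp divided by the segment length.
    depth = total_bp / total_length if total_length else 0.0
    return total_bp, total_length, depth


bp, length, depth = coverage_estimate_sketch([(100, 2500), (300, 9500)])
```

The triplet keeps the documented order: number of bp considered, length of the segment used, coverage depth.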
- orgasm.tango.cutLowCoverage(self, mincov, terminal=True)
Remove sequences in the assembling graph with a coverage below mincov.
In [159]: asm = Assembler(r)
In [160]: s = matchtoseed(m,r)
In [161]: a = tango(asm,s,mincov=1,minread=10,minoverlap=30,maxjump=0,cycle=1)
In [162]: asm.cleanDeadBranches(maxlength=10)
Remaining edges : 424216 node : 423896
Out[162]: 34821
In [162]: cutLowCoverage(asm,10,terminal=False)
- Parameters
mincov (int) – coverage threshold
terminal (bool) – if set to True, only terminal edges are removed from the assembling graph
- Returns
the count of deleted nodes
- Return type
int
- Seealso
cleanDeadBranches()
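The effect of the terminal flag can be illustrated on a toy graph. Everything in this sketch is hypothetical: the graph is a plain dict mapping `(source, target)` to a coverage value, the definition of a terminal edge is an assumption, and the function counts removed edges in a single pass, whereas the real function works on the assembling graph and reports deleted nodes.

```python
def cut_low_coverage_sketch(edges, mincov, terminal=True):
    """Hypothetical sketch on a dict {(source, target): coverage}."""
    sources = {u for u, _ in edges}
    targets = {v for _, v in edges}

    def is_terminal(u, v):
        # ASSUMPTION: an edge is terminal when it starts at a node with no
        # incoming edge or ends at a node with no outgoing edge.
        return u not in targets or v not in sources

    doomed = [
        (u, v)
        for (u, v), cov in edges.items()
        if cov < mincov and (not terminal or is_terminal(u, v))
    ]
    for edge in doomed:
        del edges[edge]  # modifies the dict passed in
    return len(doomed)


# A linear path a -> b -> c -> d -> e -> f with three weakly covered edges.
toy = {("a", "b"): 3, ("b", "c"): 50, ("c", "d"): 4, ("d", "e"): 60, ("e", "f"): 2}
removed_terminal_only = cut_low_coverage_sketch(dict(toy), 10, terminal=True)
removed_everywhere = cut_low_coverage_sketch(dict(toy), 10, terminal=False)
```

With `terminal=True` the internal low-coverage edge `("c", "d")` is kept, which avoids splitting the path in the middle.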
- orgasm.tango.cutLowSeeds(self, minseeds, seeds, terminal=True)
Remove sequences in the assembling graph with a coverage below mincov.
In [159]: asm = Assembler(r)
In [160]: s = matchtoseed(m,r)
In [161]: a = tango(asm,s,mincov=1,minread=10,minoverlap=30,maxjump=0,cycle=1)
In [162]: asm.cleanDeadBranches(maxlength=10)
Remaining edges : 424216 node : 423896
Out[162]: 34821
In [162]: cutLowCoverage(asm,10,terminal=False)
- Parameters
mincov (int) – coverage threshold
terminal (bool) – if set to True, only terminal edges are removed from the assembling graph
- Returns
the count of deleted nodes
- Return type
int
- Seealso
cleanDeadBranches()
- orgasm.tango.fillGaps(self, minlink=5, back=200, kmer=12, smin=40, delta=0, cmincov=5, minread=20, minratio=0.1, emincov=1, maxlength=None, gmincov=1, minoverlap=60, lowfilter=True, adapters5=(), adapters3=(), maxjump=0, snp=False, nodeLimit=1000000, onlyLinking=False, useonce=True, logger=None)
- Parameters
minlink –
back –
kmer –
smin –
delta –
cmincov –
minread –
minratio –
emincov –
maxlength –
gmincov –
minoverlap –
lowfilter –
maxjump –
snp – If set to True (default value is False) erase SNP variation by conserving the most abundant version
- orgasm.tango.fillGaps2(self, minlink=5, back=200, kmer=12, smin=40, delta=0, cmincov=5, minread=20, minratio=0.1, emincov=1, maxlength=None, gmincov=1, minoverlap=60, lowfilter=True, adapters5=(), adapters3=(), maxjump=0, snp=False, nodeLimit=1000000, onlyfill=False)
- Parameters
minlink –
back –
kmer –
smin –
delta –
cmincov –
minread –
minratio –
emincov –
maxlength –
gmincov –
minoverlap –
lowfilter –
maxjump –
snp – If set to True (default value is False) erase SNP variation by conserving the most abundant version
- orgasm.tango.getPairedRead(self, assgraph, stemid, back, end=True)
- Parameters
assgraph –
stemid –
back –
end –
- orgasm.tango.mode(data)
Compute a raw estimation of the mode of a data set
- Parameters
data (a permanent iterable object (list, tuple...)) – The data set to analyse
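One simple way to obtain a rough mode is to return the most frequent value, as in the sketch below. This is an assumption about what "raw estimation" means, not the library's method, and `mode_sketch` is a hypothetical name. The call to `len()` shows why a permanent container such as a list or tuple is expected rather than a one-shot generator.

```python
from collections import Counter


def mode_sketch(data):
    """Hypothetical sketch: return the most frequent value of `data`."""
    if len(data) == 0:  # requires a sized container, not a generator
        raise ValueError("empty data set")
    counts = Counter(data)
    value, _ = counts.most_common(1)[0]
    return value


# Coverage-like values with one outlier; the outlier does not affect the mode.
coverages = [12, 15, 15, 14, 15, 90, 12]
estimated = mode_sketch(coverages)
```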
- orgasm.tango.pairEndedConnected(self, assgraph, edge1, edge2, back=250)
Returns how many pair ended reads link two edges in a compact assembling graph
- Parameters
assgraph (DiGraphMultiEdge) – The compact assembling graph as produced by the compactAssembling() method
edge1 (int) – The stemid of the first edge
edge2 (int) – The stemid of the second edge
back (int) – How many base pairs must be considered at the end of each edge
- Returns
The count of pair ended reads linking both the edges
- Return type
int
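The counting step can be sketched independently of the graph structure. The inputs here are assumptions made for illustration: the ids of the reads found within `back` base pairs of the end of each edge are supposed to be already collected, and `mate_of` is a hypothetical dict mapping a read id to the id of its mate.

```python
def pair_ended_connected_sketch(reads_edge1, reads_edge2, mate_of):
    """Hypothetical sketch: count read pairs that span two edges."""
    tail2 = set(reads_edge2)
    # A pair links the edges when one read lies on edge1 and its mate on edge2.
    return sum(1 for read in set(reads_edge1) if mate_of.get(read) in tail2)


# Reads 1..3 sit at the end of the first edge; their mates are 101..103.
mates = {1: 101, 2: 102, 3: 103, 101: 1, 102: 2, 103: 3}
links = pair_ended_connected_sketch([1, 2, 3], [101, 103, 250], mates)
```

A count of this kind is what parameters such as `minlink` are compared against elsewhere in the package.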
- orgasm.tango.path2fasta(self, assgraph, path, identifier='contig', minlink=10, nlength=20, back=200, logger=None, tags=[])
Convert a path in a compact assembling graph into a fasta formatted sequence.
- Parameters
assgraph (DiGraphMultiEdge) – The compact assembling graph as produced by the compactAssembling() method
path (an iterable over int) – an iterable providing an ordered list of stemid indicating the path to follow.
identifier (bytes) – the identifier used in the header of the fasta formatted sequence
minlink (int) – the minimum count of pair ended links to consider for asserting the relationship
nlength (int) – how many N must be added between two segments of sequence only connected by pair ended links
back (int) – How many base pairs must be considered at the end of each edge
- Returns
a string containing the fasta formatted sequence
- Return type
bytes
- Raises
AssertionError
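The role of nlength can be shown with a small sketch of the joining and formatting step. It is a simplification under stated assumptions: the segment sequences and the per-junction flags are supplied directly instead of being read from the graph, directly connected segments are simply concatenated, and the `width` argument is hypothetical rather than a documented option.

```python
def path_to_fasta_sketch(segments, linked_by_pairs_only,
                         identifier=b"contig", nlength=20, width=60):
    """Hypothetical sketch of the N-padding and fasta formatting step.

    segments: list of bytes, one sequence per stem on the path.
    linked_by_pairs_only: one bool per junction; True means the neighbouring
        segments are connected only by pair ended links.
    """
    assert len(linked_by_pairs_only) == len(segments) - 1, "one flag per junction"
    sequence = segments[0]
    for gap, segment in zip(linked_by_pairs_only, segments[1:]):
        if gap:
            sequence += b"N" * nlength  # unknown bases between the segments
        sequence += segment
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return b">" + identifier + b"\n" + b"\n".join(lines)


fasta = path_to_fasta_sketch([b"ACGT", b"TTGA", b"CC"], [False, True], nlength=3)
```

The sketch works on `bytes` throughout, matching the documented return type.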
- orgasm.tango.scaffold(self, assgraph, minlink=5, back=200, addConnectedLink=False, forcedLink={}, logger=None)
Add relationships between edges of the assembling graph related to the pair ended links.
- Parameters
assgraph (DiGraphMultiEdge) – The compact assembling graph as produced by the compactAssembling() method
minlink (int) – the minimum count of pair ended links to consider for asserting the relationship
back (int) – How many base pairs must be considered at the end of each edge
addConnectedLink (bool) – add to the assembling graph green edges for each directly connected edge pair representing the pair ended links asserting the connection.
- orgasm.tango.unfoldAssembling(self, assgraph, constraints=None, seeds=None, threshold=5.0, back=500, minlink=5, limitSize=0, circular=False, force=False, cov1x=None, logger=None)
- Parameters
assgraph –
constraints –
seeds – set of stem to use as seed for the unfolding algorithm
threshold –
back –
minlink –
limitSize – maximum size of the contig in base pair
circular – if TRUE, we hope to get a circular contig
force – if TRUE, we ask for a circular contig
logger –