Alphabet

class Alphabet(motifset, gap='-', moltype=None)

An ordered set of fixed-length strings, e.g. the 61 sense codons.

Ambiguities (e.g. N for any base in DNA) are not considered part of the alphabet itself, although a sequence is valid on the alphabet even if it contains ambiguities that are known to the alphabet. A gap is considered a separate motif and is not part of the alphabet itself.

The typical use is for the Alphabet to hold nucleic acid bases, amino acids, or codons.

The moltype, if supplied, handles ambiguities, coercion of the sequence to the correct data type, and complementation (if appropriate).

Attributes:
moltype

Methods

count(value, /)

Return number of occurrences of value.

from_indices(data)

Returns sequence of elements from sequence of indices.

get_gap_motif()

Returns the motif that self is using as a gap.

get_motif_len()

Returns the length of the items in self, or None if they differ.

get_subset(motif_subset[, excluded])

Returns a new Alphabet object containing a subset of motifs in self.

get_word_alphabet(word_length)

Returns a new Alphabet object with items as word_length strings.

includes_gap_motif()

Returns True if self includes the gap motif, False otherwise.

index(item)

Returns the index of a specified item.

is_valid(seq)

Returns True if seq contains only items in self.

to_indices(data)

Returns sequence of indices from sequence of elements.

to_json()

Returns a JSON-formatted string representation of self.

with_gap_motif()

Returns an Alphabet object resembling self but including the gap.

AlphabetError

to_rich_dict

exception AlphabetError
add_note()

Exception.add_note(note) – add a note to the exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

count(value, /)

Return number of occurrences of value.

from_indices(data)

Returns sequence of elements from sequence of indices.

Specifically, takes as input a sequence of numbers corresponding to elements in the Enumeration (i.e. the numbers must all be < len(self)). Returns a list of the items in the same order as the indices. Inverse of to_indices.

e.g. for the RNA alphabet ('U','C','A','G'), the sequence [1,1,2,0] would produce the result 'CCAU', returning the element corresponding to each index in the input.
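The lookup described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the tuple `rna_motifs` and the free function `from_indices` are stand-ins invented for this example.

```python
# Illustrative stand-in for an alphabet: an ordered tuple of motifs.
# Hypothetical names; this mimics the documented behaviour only.
rna_motifs = ("U", "C", "A", "G")


def from_indices(motifs, indices):
    """Return the motif found at each index, in input order."""
    return [motifs[i] for i in indices]


elements = from_indices(rna_motifs, [1, 1, 2, 0])
# elements is ['C', 'C', 'A', 'U'], which joins to 'CCAU'
joined = "".join(elements)
```

Every index must be smaller than the number of motifs, otherwise the tuple lookup in this sketch raises IndexError.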

get_gap_motif()

Returns the motif that self is using as a gap. Note that this will typically be self.gap repeated to match the motif length.

get_motif_len()

Returns the length of the items in self, or None if they differ.
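As a rough sketch of the motif-length rule, and of how a gap motif could relate to it, consider the plain-Python stand-in below. The helper `motif_len` is hypothetical, and building the gap motif by repeating the gap character once per motif position is an assumption, not confirmed library behaviour.

```python
def motif_len(motifs):
    """Return the shared length of all motifs, or None if they differ."""
    lengths = {len(m) for m in motifs}
    return lengths.pop() if len(lengths) == 1 else None


codons = ("ATG", "TGG", "AAA")
mixed = ("A", "TG")

codon_len = motif_len(codons)   # 3: every codon has length 3
mixed_len = motif_len(mixed)    # None: lengths 1 and 2 differ

# Assumed gap motif for the codon case: the gap character repeated
# once per position in a motif.
codon_gap = "-" * codon_len     # '---'
```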

get_subset(motif_subset, excluded=False)

Returns a new Alphabet object containing a subset of motifs in self.

Raises an exception if any of the items in the subset are not already in self. Always returns a new object.
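The subset rule can be illustrated with a small stand-in function. Several points here are assumptions: that excluded=True keeps the motifs not listed in motif_subset, that the result follows the alphabet's own order, and the use of ValueError in place of whatever exception the library raises.

```python
def get_subset(motifs, motif_subset, excluded=False):
    """Return a new tuple of motifs selected (or excluded) by motif_subset."""
    wanted = set(motif_subset)
    unknown = wanted - set(motifs)
    if unknown:
        # Stand-in for the library's exception type.
        raise ValueError(f"not in alphabet: {sorted(unknown)}")
    # Keep a motif when its membership in `wanted` disagrees with `excluded`.
    return tuple(m for m in motifs if (m in wanted) != excluded)


dna_motifs = ("T", "C", "A", "G")
pyrimidines = get_subset(dna_motifs, ["T", "C"])               # ('T', 'C')
purines = get_subset(dna_motifs, ["T", "C"], excluded=True)    # ('A', 'G')
```

Because a new tuple is built on every call, the sketch mirrors the documented guarantee that a new object is always returned.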

get_word_alphabet(word_length)

Returns a new Alphabet object with items as word_length strings.

Note that the result is not a JointEnumeration object, and cannot unpack its indices. However, the items in the result _are_ all strings.
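One way to picture a word alphabet is as every concatenation of word_length motifs from the original alphabet. The sketch below uses itertools.product; the function name `word_motifs` is hypothetical, and the resulting order (first position varying slowest) is an assumption rather than documented behaviour.

```python
from itertools import product


def word_motifs(motifs, word_length):
    """Return all strings formed by joining word_length motifs."""
    return tuple("".join(word) for word in product(motifs, repeat=word_length))


dinucleotides = word_motifs(("T", "C", "A", "G"), 2)
# 4 ** 2 = 16 items, each a plain string of length 2
n_words = len(dinucleotides)
```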

includes_gap_motif()

Returns True if self includes the gap motif, False otherwise.

index(item)

Returns the index of a specified item.

This goes through an extra object lookup. If you _really_ need speed, you can bind self._obj_to_index.__getitem__ directly, but this is not recommended because the internal implementation may change.

is_valid(seq)

Returns True if seq contains only items in self.
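A minimal membership check in the spirit of this method might look like the following. It is a simplified stand-in that only handles single-character motifs (iterating a string yields one character at a time) and ignores any ambiguity handling the real class may perform.

```python
def is_valid(motifs, seq):
    """Return True if every item of seq is one of the motifs."""
    allowed = set(motifs)
    return all(item in allowed for item in seq)


dna_motifs = ("T", "C", "A", "G")
ok = is_valid(dna_motifs, "TCAG")    # True: all four characters are motifs
bad = is_valid(dna_motifs, "TCAX")   # False: 'X' is not a motif
```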

property moltype
to_indices(data)

Returns sequence of indices from sequence of elements.

Raises KeyError if some of the elements were not found.

Expects data to be a sequence (e.g. list or tuple) of items that are in the Enumeration. Returns a list containing the index of each element in the input, in order.

e.g. for the RNA alphabet ('U','C','A','G'), the sequence 'CCAU' would produce the result [1,1,2,0], returning the index of each element in the input.
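The mapping from elements to indices, including the KeyError on unknown elements, can be sketched with an ordinary dictionary. The names `motif_to_index` and `to_indices` are illustrative stand-ins, not the library's internals.

```python
rna_motifs = ("U", "C", "A", "G")
# Hypothetical lookup table: motif -> position in the alphabet.
motif_to_index = {motif: i for i, motif in enumerate(rna_motifs)}


def to_indices(lookup, data):
    """Return the index of each element of data, in input order."""
    return [lookup[item] for item in data]


indices = to_indices(motif_to_index, "CCAU")   # [1, 1, 2, 0]

try:
    to_indices(motif_to_index, "CCAN")
    missing_raises = False
except KeyError:
    # 'N' is not in the lookup table, so the dict access fails.
    missing_raises = True
```

Feeding `indices` back through a positional lookup on `rna_motifs` recovers the original elements, which is the inverse relationship with from_indices.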

to_json()

Returns a JSON-formatted string representation of self.

to_rich_dict(for_pickle=False)
with_gap_motif()

Returns an Alphabet object resembling self but including the gap.

Repeated calls always return the same object.
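One plausible reading of the same-object guarantee is that the gapped alphabet is built once and cached. The class below is a hypothetical sketch of that pattern; `MotifSet`, its attributes, and the choice to append the gap as the last motif are all assumptions made for illustration.

```python
class MotifSet:
    """Hypothetical stand-in for an alphabet with a cached gapped variant."""

    def __init__(self, motifs, gap="-"):
        self.motifs = tuple(motifs)
        self.gap = gap
        self._gapped = None  # cache for the gapped variant

    def includes_gap_motif(self):
        return self.gap in self.motifs

    def with_gap_motif(self):
        # An already gapped set is its own gapped variant.
        if self.includes_gap_motif():
            return self
        # Build the gapped variant once, then reuse it.
        if self._gapped is None:
            self._gapped = MotifSet(self.motifs + (self.gap,), self.gap)
        return self._gapped


dna = MotifSet("TCAG")
gapped = dna.with_gap_motif()
same_again = dna.with_gap_motif() is gapped   # True: cached object reused
```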