Applying GNC, a non-stationary codon model

See Kaehler et al. for the formal description of this model. Note that we demonstrate hypothesis testing with this model elsewhere.

We apply this to a sample alignment.

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
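GNC is a codon model, so the alignment is analysed as in-frame triplets. The motif probabilities that the fitted model reports are frequencies of the 61 sense codons of the standard genetic code (the stop codons TAA, TAG and TGA do not appear among them). As a rough illustration only, the stdlib sketch below estimates such frequencies by simple counting. The function name codon_frequencies is hypothetical, and we assume the sequences are in frame and that triplets containing gaps or ambiguity codes are skipped. cogent3's own estimator may treat these cases differently.

```python
from collections import Counter
from itertools import product

# Assumption: the standard genetic code, whose stop codons are TAA, TAG, TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons in alphabetical (A, C, G, T) order.
SENSE_CODONS = [
    a + b + c
    for a, b, c in product("ACGT", repeat=3)
    if a + b + c not in STOP_CODONS
]
_SENSE_SET = set(SENSE_CODONS)


def codon_frequencies(seqs):
    """Relative frequencies of sense codons across in-frame sequences.

    Stop codons and triplets containing gaps or ambiguity codes are skipped;
    a trailing partial codon is ignored.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # step through complete codons only
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if codon in _SENSE_SET:
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete sense codons found")
    return {codon: counts[codon] / total for codon in SENSE_CODONS}
```

For example, codon_frequencies(["ATGAAATAA", "ATGAAC---"]) gives ATG a frequency of 0.5 and AAA and AAC 0.25 each; the stop codon and the gap triplet are ignored.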

The model is specified using its abbreviation.

model = get_app("model", "GNC", tree="data/primate_brca1.tree")
result = model(aln)
result
GNC
key    lnL         nfp  DLC   unique_Q
'GNC'  -6713.2733  23   True  True
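The DLC and unique_Q columns are diagnostic flags on the fit. As we understand them, DLC ("diagonal largest in column") records whether, for every substitution probability matrix P of the fitted model, each diagonal element exceeds every other element in its column, a condition associated with the identifiability of the model parameters; unique_Q records whether each fitted rate matrix is the unique generator of its P matrix. A minimal sketch of the DLC check on a single P matrix, given as a list of rows where P[i][j] is the probability of state i becoming state j, might look like this. The name is_dlc is hypothetical (not a cogent3 function), and the strict inequality is our assumption.

```python
def is_dlc(p):
    """Return True if, in every column j of the square matrix p, the diagonal
    entry p[j][j] is strictly greater than every other entry p[i][j]."""
    n = len(p)
    for j in range(n):
        diag = p[j][j]
        # any off-diagonal entry in column j that ties or beats the diagonal
        if any(p[i][j] >= diag for i in range(n) if i != j):
            return False
    return True


# A P matrix close to the identity (a short branch) satisfies DLC ...
print(is_dlc([[0.9, 0.1], [0.05, 0.95]]))  # True
# ... but one where changing state is more likely than staying put does not.
print(is_dlc([[0.4, 0.6], [0.6, 0.4]]))  # False
```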
result.lf

GNC

log-likelihood = -6713.2733

number of free parameters = 23

Global params
A>C     A>G     A>T     C>A     C>G     C>T     G>A     G>C     G>T     T>A     T>C     omega
0.8615  3.5374  0.9792  1.6668  2.2042  6.2567  7.9197  1.2253  0.8015  1.2911  3.0724  0.8204

Edge params
edge        parent  length
Galago      root    0.5232
HowlerMon   root    0.1338
Rhesus      edge.3  0.0640
Orangutan   edge.2  0.0233
Gorilla     edge.1  0.0075
Human       edge.0  0.0182
Chimpanzee  edge.0  0.0085
edge.0      edge.1  0.0000
edge.1      edge.2  0.0100
edge.2      edge.3  0.0366
edge.3      root    0.0238

Motif params
AAA     AAC     AAG     AAT     ACA     ACC     ACG     ACT     AGA     AGC
0.0556  0.0235  0.0344  0.0556  0.0228  0.0046  0.0008  0.0289  0.0231  0.0286
AGG     AGT     ATA     ATC     ATG     ATT     CAA     CAC     CAG     CAT
0.0140  0.0381  0.0186  0.0070  0.0128  0.0192  0.0196  0.0052  0.0238  0.0221
CCA     CCC     CCG     CCT     CGA     CGC     CGG     CGT     CTA     CTC
0.0195  0.0062  0.0006  0.0263  0.0011  0.0009  0.0023  0.0032  0.0137  0.0078
CTG     CTT     GAA     GAC     GAG     GAT     GCA     GCC     GCG     GCT
0.0125  0.0105  0.0755  0.0105  0.0303  0.0315  0.0158  0.0096  0.0014  0.0137
GGA     GGC     GGG     GGT     GTA     GTC     GTG     GTT     TAC     TAT
0.0161  0.0090  0.0067  0.0133  0.0148  0.0070  0.0069  0.0213  0.0023  0.0101
TCA     TCC     TCG     TCT     TGC     TGG     TGT     TTA     TTC     TTG
0.0221  0.0082  0.0015  0.0251  0.0018  0.0040  0.0201  0.0212  0.0078  0.0108
TTT
0.0187
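The reported number of free parameters can be reconciled with the parameter tables: 11 directional nucleotide rates (T>G is not listed, so we assume it is the reference rate fixed at 1), plus omega, plus 11 branch lengths gives 23. This assumes the motif probabilities are set from the alignment rather than optimised, which is consistent with them not being counted. The lnL and nfp values are also what an information criterion such as AIC = 2 nfp - 2 lnL uses when comparing models. A small sketch of both calculations, with values transcribed from the output above (the constant and function names are illustrative):

```python
# Parameter names transcribed from the likelihood function output.
# Assumption: T>G is the reference directional rate, fixed at 1 (not listed).
DIRECTIONAL_RATES = [
    "A>C", "A>G", "A>T", "C>A", "C>G", "C>T",
    "G>A", "G>C", "G>T", "T>A", "T>C",
]
EDGES = [
    "Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human",
    "Chimpanzee", "edge.0", "edge.1", "edge.2", "edge.3",
]
LNL = -6713.2733  # rounded log-likelihood from the output


def free_parameter_count(rates, edges, n_other=1):
    """Count free parameters: directional rates + other global params
    (here just omega) + one length per edge. Motif probabilities are
    assumed not to be optimised, so they are not counted."""
    return len(rates) + n_other + len(edges)


def aic(lnl, nfp):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * nfp - 2 * lnl


nfp = free_parameter_count(DIRECTIONAL_RATES, EDGES)
print(nfp)  # 23
print(round(aic(LNL, nfp), 4))  # 13472.5466
```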

We can obtain the tree with branch lengths as ENS (expected number of substitutions)

If this tree is written to Newick format (using its write() method), the branch lengths will now be ENS rather than the length values shown in the likelihood function above.

tree = result.tree
fig = tree.get_figure()
fig.scale_bar = "top right"
fig.show(width=500, height=500)
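For a continuous-time Markov process with rate matrix Q, the expected number of substitutions along a branch of duration t is the integral over the branch of the instantaneous substitution rate, sum_i pi_i(s) * (-Q_ii), where pi(s) = pi(0) exp(Qs) is the state distribution at time s. Under a stationary model pi(s) stays fixed and ENS reduces to -t * sum_i pi_i Q_ii; under a non-stationary model such as GNC, pi(s) drifts along the branch, so the integral is needed. The sketch below integrates this definition numerically with a fixed-step fourth-order Runge-Kutta scheme in pure Python. It illustrates the definition only and is not how cogent3 computes ENS; the function name ens_by_integration and its steps parameter are our own.

```python
def ens_by_integration(p0, q, t, steps=1000):
    """Expected number of substitutions over time t for a (possibly
    non-stationary) continuous-time Markov process with rate matrix q
    (rows sum to zero), starting from the state distribution p0.

    Integrates d(pi)/ds = pi @ q jointly with
    d(ens)/ds = sum_i pi_i * (-q[i][i]) using fixed-step RK4.
    """
    n = len(p0)

    def deriv(state):
        pi = state[:n]
        # row vector times matrix: how the state distribution changes
        dpi = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]
        # instantaneous rate of leaving the current state
        dens = sum(pi[i] * -q[i][i] for i in range(n))
        return dpi + [dens]

    state = list(p0) + [0.0]  # distribution followed by accumulated ENS
    h = t / steps
    for _ in range(steps):
        k1 = deriv(state)
        k2 = deriv([s + h / 2 * k for s, k in zip(state, k1)])
        k3 = deriv([s + h / 2 * k for s, k in zip(state, k2)])
        k4 = deriv([s + h * k for s, k in zip(state, k3)])
        state = [
            s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        ]
    return state[-1]
```

With a Jukes-Cantor style matrix (off-diagonal rates 1/3, diagonal -1) and a uniform starting distribution, the distribution never changes and the result equals t, which makes a convenient sanity check. Starting a two-state process away from its equilibrium gives a value that differs from the naive -t * sum_i pi_i(0) Q_ii.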