FASTA format for DNA/RNA and amino acid sequences

periodictable.fasta

Biomolecule support.

Molecule lets you define biomolecules with labile hydrogen atoms specified using tritium (T) in the chemical formula. The biomolecule object creates three forms of the molecule: natural isotope ratio, all hydrogen and all deuterium. Density can be provided as natural density or cell volume. A %D2O contrast match value is computed for matching the molecule SLD, taking into account exchange of the labile hydrogens with the solvent. Molecule.D2Osld() computes the neutron SLD for the solvated molecule in a %D2O solvent.

D2Omatch() computes the %D2O contrast match value given the fully hydrogenated and fully deuterated forms.

Sequence lets you read amino acid and DNA/RNA sequences from FASTA files.

Tables for common molecules are provided[1]:

AMINO_ACID_CODES : amino acids indexed by FASTA code

RNA_CODES, DNA_CODES : nucleic bases indexed by FASTA code

RNA_BASES, DNA_BASES : individual nucleic acid bases

NUCLEIC_ACID_COMPONENTS, LIPIDS, CARBOHYDRATE_RESIDUES

Neutron SLD for water at 20 C is also provided as H2O_SLD and D2O_SLD.

For an unmodified protein, add 2*T and O to the formula to account for the chain terminations.

Assumes that proteins were created in an environment with the usual H/D isotope ratio on the non-swappable hydrogens.
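The termination rule above can be sketched with plain atom counts. This is an illustration only, not the library's code: `RESIDUES` and `peptide_composition` are hypothetical names, and the two residue compositions are derived from standard chemistry (amino acid minus H2O, with the backbone amide hydrogen marked as labile T) rather than copied from the module's tables.

```python
from collections import Counter

# Hypothetical residue table: labile hydrogens are written as T.
# Compositions are derived by hand (amino acid minus H2O), not read from the library.
RESIDUES = {
    "G": Counter({"C": 2, "H": 2, "T": 1, "N": 1, "O": 1}),  # glycine residue
    "A": Counter({"C": 3, "H": 4, "T": 1, "N": 1, "O": 1}),  # alanine residue
}


def peptide_composition(codes):
    """Sum the residue compositions, then add 2 T and 1 O for the chain ends."""
    total = Counter()
    for code in codes:
        total.update(RESIDUES[code])
    # The N-terminal H and the C-terminal OH are both exchangeable,
    # so the two extra hydrogens are written as T.
    total.update({"T": 2, "O": 1})
    return dict(total)


# Gly-Ala dipeptide: two residues plus the terminations.
composition = peptide_composition("GA")
```

The result has 6 H and 4 T, which together give the 10 hydrogens of the free dipeptide C5H10N2O3.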

[1] Perkins, Modern Physical Methods in Biochemistry Part B, 143-265 (1988)

AMINO_ACID_CODES:

-: gap
A: alanine
B: aspartic acid/asparagine
C: cysteine
D: aspartic acid
E: glutamic acid
F: phenylalanine
G: glycine
H: histidine
I: isoleucine
J: leucine/isoleucine
K: lysine
L: leucine
M: methionine
N: asparagine
P: proline
Q: glutamine
R: arginine
S: serine
T: threonine
V: valine
W: tryptophan
X: any
Y: tyrosine
Z: glutamic acid/glutamine

NUCLEIC_ACID_COMPONENTS:

adenine: C5H2T2N5
cytosine: C4H2T2N3O
deoxyribose: C5H7O2
guanine: C5HT3N5O
phosphate: NaPO3
ribose: C5H6TO3
thymine: C5H4TN2O2
uracil: C4H2TN2O2

CARBOHYDRATE_RESIDUES:

Fuc (terminal): C6H7T3O4
Gal: C6H7T3O5
GalNAc: C8H10T3NO5
Glc: C6H7T3O5
GlcNAc: C8H10T3NO5
Man: C6H7T3O5
Man (terminal): C6H7T4O5
NeuNac (terminal): C11H11T5NO8
chondroitin sulphate: C14H15T4NO14SNa
hyaluronate: C14H15T5NO11Na
keratan sulphate: C14H17T5NO13SNa

LIPIDS:

DLPE: C29H55T3NO8P
DMPC: C36H72NO8P
DMPC-D52: C36H20D52NO8P
cholesteral: C27H45TO
methylene: CH2
methylene-D: CD2
oleate: C45H78O2
palmitate ester: C39H77T2N2O2P
phospholipid headgroup: C10H18NO8P
triglyceride headgroup: C6H5O6
trioleate form: C57H104O6

RNA_BASES:

A:adenosine
C:cytidine
G:guanosine
U:uridine

DNA_BASES:

A:adenosine
C:cytidine
G:guanosine
T:thymidine

class periodictable.fasta.Molecule(name, formula, cell_volume=None, density=None, charge=0)

Bases: object

Specify a biomolecule by name, chemical formula, cell volume and charge.

Labile hydrogen positions should be coded using tritium (T) rather than H. That way the tritium can be changed to H[1] for solutions in pure light water, H for solutions in water with the natural isotope abundance, or D for solutions in pure heavy water.

Attributes

formula is the original tritiated formula. You can retrieve the hydrogenated or deuterated forms using isotope_substitution() with formula, periodictable.T and periodictable.H or periodictable.D.

D2Omatch is the % D2O in H2O required to contrast match the molecule, including substitution of labile hydrogen in proportion to the D/H ratio.

sld/Hsld/Dsld are the scattering length densities of the molecule with tritium replaced by naturally occurring H/D ratios, pure H[1] and pure H[2] respectively.

mass/Hmass/Dmass are the masses for the same three conditions.

charge is the charge on the molecule

cell_volume is the estimated cell volume for the molecule

density is the estimated molecule density
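As a rough illustration of how the H and D forms lead to different SLD values, the sketch below computes an SLD from atom counts and a cell volume. It is a hand-rolled calculation under stated assumptions, not the library's implementation: `B_COH` and `sld_from_counts` are hypothetical names, the scattering lengths are standard tabulated bound coherent values, and the 66.4 cubic angstrom cell volume is an illustrative number only.

```python
# Bound coherent neutron scattering lengths in fm (standard tabulated values).
B_COH = {"H": -3.7390, "D": 6.671, "C": 6.6460, "N": 9.36, "O": 5.803}


def sld_from_counts(counts, cell_volume, labile="H"):
    """Return the SLD in units of 1e-6/A^2, counting every labile T as `labile`."""
    total_b = 0.0
    for atom, n in counts.items():
        symbol = labile if atom == "T" else atom
        total_b += n * B_COH[symbol]
    # b is in fm (1e-5 A) and the volume is in A^3,
    # so 10*b/V is the SLD expressed in 1e-6/A^2.
    return 10.0 * total_b / cell_volume


# Glycine residue with its amide hydrogen marked as labile (T).
glycine_residue = {"C": 2, "H": 2, "T": 1, "N": 1, "O": 1}
volume = 66.4  # A^3, illustrative value only

hsld = sld_from_counts(glycine_residue, volume, labile="H")  # all labile sites are H
dsld = sld_from_counts(glycine_residue, volume, labile="D")  # all labile sites are D
```

Because H has a negative scattering length and D a positive one, swapping even a single labile site raises the SLD noticeably.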

D2Osld(volume_fraction=1.0, D2O_fraction=0.0)

Neutron SLD of the molecule in a %D2O solvent.
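One plausible reading of this calculation is a volume-weighted average of solute and solvent, with the labile hydrogens on the solute exchanging in proportion to the D2O fraction. The sketch below assumes exactly that; `solvated_sld` is a hypothetical stand-alone function, and the water SLD constants are approximate values in 1e-6/A^2 rather than the module's own H2O_SLD and D2O_SLD.

```python
# Approximate water SLDs in 1e-6/A^2 (assumed values for this sketch).
H2O_SLD = -0.56
D2O_SLD = 6.36


def solvated_sld(hsld, dsld, volume_fraction=1.0, d2o_fraction=0.0):
    """Volume-weighted SLD of a solute in an H2O/D2O mixture (assumed model)."""
    # Solvent SLD varies linearly with the D2O fraction.
    solvent = (1 - d2o_fraction) * H2O_SLD + d2o_fraction * D2O_SLD
    # Labile hydrogens on the solute are assumed to exchange in the same proportion.
    solute = (1 - d2o_fraction) * hsld + d2o_fraction * dsld
    return volume_fraction * solute + (1 - volume_fraction) * solvent


# Half solute, half solvent, in a 50% D2O mixture.
mixed = solvated_sld(1.8, 3.0, volume_fraction=0.5, d2o_fraction=0.5)
```

With the default arguments the function simply returns the hydrogenated SLD, which is consistent with a pure solute in H2O.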

class periodictable.fasta.Sequence(name, sequence, type='aa')

Bases: periodictable.fasta.Molecule

Convert FASTA sequence into chemical formula.

name is the sequence name.

sequence is the code string.

type is one of:

aa: amino acid sequence
dna: DNA sequence
rna: RNA sequence

Note: RNA sequence files treat T as U and DNA sequence files treat U as T.
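The T/U convention in the note can be pictured as a normalization step applied before the codes are looked up. The helper below is a hypothetical sketch: `normalize_codes` is not part of the module, and the upper-casing step is an added assumption.

```python
def normalize_codes(sequence, seq_type):
    """Map T/U to the letter expected by the given sequence type."""
    codes = sequence.upper()  # assumption: codes are treated case-insensitively
    if seq_type == "rna":
        return codes.replace("T", "U")  # RNA input: read T as U
    if seq_type == "dna":
        return codes.replace("U", "T")  # DNA input: read U as T
    return codes  # amino acid codes are left unchanged


rna_codes = normalize_codes("ACGT", "rna")
dna_codes = normalize_codes("ACGU", "dna")
```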

D2Osld(volume_fraction=1.0, D2O_fraction=0.0)

Neutron SLD of the molecule in a %D2O solvent.

static load(filename, type=None)

Load the first FASTA sequence from a file.

static loadall(filename, type=None)

Iterate over sequences in FASTA file, loading each in turn.

Yields one FASTA sequence each cycle.

periodictable.fasta.D2Omatch(Hsld, Dsld)

Find the %D2O concentration of the solvent at which the neutron SLD of the material matches the neutron SLD of the solvent.

Hsld, Dsld are the SLDs for the hydrogenated and deuterated forms of the material respectively, where D includes all the labile protons swapped for deuterons. Water SLD is calculated at 20 C.

Note that the resulting percentage is only meaningful between 0% and 100%. Beyond 100% you will need an additional contrast agent in the 100% D2O solvent to increase the SLD enough to match.
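The match point follows from setting two linear functions of the D2O fraction equal: the material SLD, which moves from Hsld to Dsld as labile hydrogens exchange, and the solvent SLD, which moves from the H2O value to the D2O value. The sketch below solves that equation directly. `d2o_match_percent` is a hypothetical name and the water SLDs are approximate assumed values in 1e-6/A^2, so the numbers will differ slightly from the module's.

```python
# Approximate water SLDs in 1e-6/A^2 (assumed values for this sketch).
H2O_SLD = -0.56
D2O_SLD = 6.36


def d2o_match_percent(hsld, dsld):
    """Return the %D2O at which material and solvent SLDs are equal."""
    # Solve  hsld + x*(dsld - hsld) == H2O_SLD + x*(D2O_SLD - H2O_SLD)  for x.
    x = (H2O_SLD - hsld) / ((dsld - hsld) - (D2O_SLD - H2O_SLD))
    return 100.0 * x


# A protein-like material: the match point falls inside the 0-100% range.
match = d2o_match_percent(1.8, 3.0)

# A material with a high SLD: the result exceeds 100%, so D2O alone cannot match it.
too_high = d2o_match_percent(6.0, 7.0)
```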

periodictable.fasta.fasta_table()

periodictable.fasta.isotope_substitution(formula, source, target, portion=1)

Substitute one atom/isotope in a formula with another in some proportion.

formula is the formula being updated.

source is the isotope/element to be substituted.

target is the replacement isotope/element.

portion is the proportion of source which is substituted for target.
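To make the role of portion concrete, here is a minimal sketch that performs the same kind of substitution on a plain dictionary of atom counts. It does not use the library's formula objects; `substitute` is a hypothetical helper written for illustration.

```python
def substitute(counts, source, target, portion=1.0):
    """Return a copy of `counts` with `portion` of `source` replaced by `target`."""
    result = dict(counts)  # leave the caller's dictionary untouched
    n = result.pop(source, 0)
    moved = n * portion
    if n - moved > 0:
        result[source] = n - moved  # whatever was not substituted stays put
    if moved > 0:
        result[target] = result.get(target, 0) + moved
    return result


glycine_residue = {"C": 2, "H": 2, "T": 1, "N": 1, "O": 1}

# Fully hydrogenated form: every labile T becomes H.
h_form = substitute(glycine_residue, "T", "H")

# Half of the labile sites exchanged for deuterium.
half_d = substitute({"T": 2, "O": 1}, "T", "D", portion=0.5)
```

Fractional counts are allowed, which is what lets a partially exchanged molecule be represented as an average composition.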

periodictable.fasta.read_fasta(fp)

Iterate over the sequences in a FASTA file.

Each iteration is a pair (sequence name, sequence codes).
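A generator of this shape can be sketched in a few lines. The version below is an assumption about typical FASTA handling, not the module's code: `iter_fasta` is a hypothetical name, the whole header line after ">" is used as the sequence name, and blank lines are skipped.

```python
import io


def iter_fasta(fp):
    """Yield (name, codes) pairs from an open FASTA text stream."""
    name, chunks = None, []
    for line in fp:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        yield name, "".join(chunks)


sample = io.StringIO(">seq1 first example\nACDE\nFGHI\n>seq2\nKLMN\n")
records = list(iter_fasta(sample))
```

Taking only the first item of such a generator gives the behavior described for load, while iterating over all of it corresponds to loadall.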

periodictable.fasta.test()