FASTA format for DNA/RNA and amino acid sequences
periodictable.fasta
Biomolecule support.
Molecule lets you define biomolecules whose labile hydrogen atoms are specified using tritium (T) in the chemical formula. The biomolecule object creates three forms: one with the natural isotope ratio, one with all hydrogen and one with all deuterium. Density can be provided either as the natural density or as the cell volume. A %D2O contrast match value is computed for matching the molecule SLD in the presence of labile hydrogens. Molecule.D2Osld() computes the neutron SLD for the solvated molecule in a %D2O solvent. D2Omatch() computes the %D2O contrast match value given the fully hydrogenated and fully deuterated forms.
Sequence lets you read amino acid and DNA/RNA sequences from FASTA files.
Tables for common molecules are provided [1]:
AMINO_ACID_CODES : amino acids indexed by FASTA code
RNA_CODES, DNA_CODES : nucleic bases indexed by FASTA code
RNA_BASES, DNA_BASES : individual nucleic acid bases
NUCLEIC_ACID_COMPONENTS, LIPIDS, CARBOHYDRATE_RESIDUES
The neutron SLD for water at 20 °C is also provided as H2O_SLD and D2O_SLD.
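For an unmodified protein, you need to add 2*T and O to the summed residue formulas to account for the chain terminations (the N-terminal H and the C-terminal OH).
This assumes that the proteins were produced in an environment with the usual H/D isotope ratio on the non-exchangeable hydrogens.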
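The termination rule above can be illustrated with plain element counts. This is a minimal sketch and not the module's code. Formulas here are `collections.Counter` objects rather than periodictable formulas. The two residue entries for glycine and alanine are hypothetical stand-ins for the AMINO_ACID_CODES table, written out by hand, with the amide hydrogen marked as labile (T).

```python
from collections import Counter

# Hypothetical residue table: backbone residues (-NH-CHR-CO-) with the
# exchangeable amide hydrogen written as T, following this module's convention.
RESIDUES = {
    "G": Counter({"C": 2, "H": 2, "T": 1, "N": 1, "O": 1}),  # glycine residue
    "A": Counter({"C": 3, "H": 4, "T": 1, "N": 1, "O": 1}),  # alanine residue
}

def peptide_formula(codes: str) -> Counter:
    """Sum the residue formulas, then add 2*T + O for the chain termini."""
    total = Counter()
    for code in codes:
        total += RESIDUES[code]
    # N-terminal H plus C-terminal OH; both hydrogens are labile.
    total += Counter({"T": 2, "O": 1})
    return total

formula = peptide_formula("GA")
print(dict(formula))
```

For the dipeptide Gly-Ala this gives C5H6T4N2O3. Counting T as H, that is C5H10N2O3, with four labile hydrogens.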
[1] Perkins, Modern Physical Methods in Biochemistry Part B, 143-265 (1988)
AMINO_ACID_CODES:
-: gap
A: alanine
B: aspartic acid/asparagine
C: cysteine
D: aspartic acid
E: glutamic acid
F: phenylalanine
G: glycine
H: histidine
I: isoleucine
J: leucine/isoleucine
K: lysine
L: leucine
M: methionine
N: asparagine
P: proline
Q: glutamine
R: arginine
S: serine
T: threonine
V: valine
W: tryptophan
X: any
Y: tyrosine
Z: glutamic acid/glutamine
NUCLEIC_ACID_COMPONENTS:
adenine: C5H2T2N5
cytosine: C4H2T2N3O
deoxyribose: C5H7O2
guanine: C5HT3N5O
phosphate: NaPO3
ribose: C5H6TO3
thymine: C5H4TN2O2
uracil: C4H2TN2O2
CARBOHYDRATE_RESIDUES:
Fuc (terminal): C6H7T3O4
Gal: C6H7T3O5
GalNAc: C8H10T3NO5
Glc: C6H7T3O5
GlcNAc: C8H10T3NO5
Man: C6H7T3O5
Man (terminal): C6H7T4O5
NeuNac (terminal): C11H11T5NO8
chondroitin sulphate: C14H15T4NO14SNa
hyaluronate: C14H15T5NO11Na
keratan sulphate: C14H17T5NO13SNa
LIPIDS:
DLPE: C29H55T3NO8P
DMPC: C36H72NO8P
DMPC-D52: C36H20D52NO8P
cholesteral: C27H45TO
methylene: CH2
methylene-D: CD2
oleate: C45H78O2
palmitate ester: C39H77T2N2O2P
phospholipid headgroup: C10H18NO8P
triglyceride headgroup: C6H5O6
trioleate form: C57H104O6
RNA_BASES:
A: adenosine
C: cytidine
G: guanosine
T: uridine
DNA_BASES:
A: adenosine
C: cytidine
G: guanosine
T: thymidine
class periodictable.fasta.Molecule(name, formula, cell_volume=None, density=None, charge=0)
Bases: object
Specify a biomolecule by name, chemical formula, cell volume and charge.
Labile hydrogen positions should be coded using tritium (T) rather than H. The tritium can then be replaced by H[1] for solutions in pure H2O, by H for solutions with a natural isotope abundance of water, or by D for solutions in pure D2O.
Attributes
formula is the original tritiated formula. You can retrieve the hydrogenated or deuterated forms by calling isotope_substitution() with formula, periodictable.T, and either periodictable.H or periodictable.D.
D2Omatch is the % D2O in H2O required to contrast match the molecule, including substitution of labile hydrogen in proportion to the D/H ratio.
sld/Hsld/Dsld are the scattering length densities of the molecule with tritium replaced by the naturally occurring H/D ratio, by pure H[1] and by pure H[2], respectively.
mass/Hmass/Dmass are the masses under the same three conditions.
charge is the charge on the molecule.
cell_volume is the estimated cell volume for the molecule.
density is the estimated molecule density.
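The sld, cell_volume and density attributes are related through the molecular mass and the summed scattering lengths. The sketch below shows that relationship for water. It rests on several assumptions. The coherent scattering lengths (in fm) are rounded standard tabulated values. The masses are approximate, in g/mol. Cell volume is in Å^3. periodictable uses its own data tables, so its numbers may differ slightly. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol

# Assumed, rounded tabulated values: coherent scattering lengths in fm
# and approximate atomic masses in g/mol.
B_COH_FM = {"H": -3.739, "D": 6.671, "C": 6.646, "N": 9.36, "O": 5.803}
MASS = {"H": 1.008, "D": 2.014, "C": 12.011, "N": 14.007, "O": 15.999}

def molar_mass(formula: dict) -> float:
    return sum(MASS[el] * n for el, n in formula.items())

def cell_volume_from_density(formula: dict, density: float) -> float:
    """Volume per molecule in A^3, given density in g/cm^3."""
    # cm^3 per molecule, converted with 1 cm^3 = 1e24 A^3
    return molar_mass(formula) / (density * AVOGADRO) * 1e24

def sld(formula: dict, cell_volume: float) -> float:
    """Neutron SLD in units of 1e-6 / A^2."""
    b_total = sum(B_COH_FM[el] * n for el, n in formula.items())  # fm
    # 1 fm = 1e-5 A; reporting in 1e-6 A^-2 means scaling by 1e-5 / 1e-6 = 10
    return 10 * b_total / cell_volume

water = {"H": 2, "O": 1}
heavy_water = {"D": 2, "O": 1}
v_h2o = cell_volume_from_density(water, 0.998)
v_d2o = cell_volume_from_density(heavy_water, 1.105)
print(round(sld(water, v_h2o), 2), round(sld(heavy_water, v_d2o), 2))
```

With these inputs, water occupies about 30 Å^3 per molecule. The SLD comes out near -0.56 for H2O and near 6.36 for D2O (in 10^-6 Å^-2), in line with commonly quoted values.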
D2Osld(volume_fraction=1.0, D2O_fraction=0.0)
Neutron SLD of the molecule in a %D2O solvent.
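The one-line description of D2Osld does not say how volume_fraction and D2O_fraction combine. The sketch below gives one plausible reading, stated as an assumption and not as the module's definition. Labile hydrogens exchange with the solvent, so the molecule's SLD moves linearly from Hsld at 0% D2O to Dsld at 100% D2O. The molecule and the solvent are then mixed by volume. The name solvated_sld and the rounded water SLDs are hypothetical.

```python
# Assumed, rounded water SLDs in 1e-6 / A^2.
H2O_SLD = -0.56
D2O_SLD = 6.36

def solvated_sld(hsld: float, dsld: float,
                 volume_fraction: float = 1.0, d2o_fraction: float = 0.0) -> float:
    """Hypothetical model: exchange-weighted molecule SLD mixed with solvent by volume."""
    solvent = (1 - d2o_fraction) * H2O_SLD + d2o_fraction * D2O_SLD
    # Labile H exchanged in proportion to the D2O fraction of the solvent.
    molecule = (1 - d2o_fraction) * hsld + d2o_fraction * dsld
    return volume_fraction * molecule + (1 - volume_fraction) * solvent
```

With volume_fraction=1.0 (the default in the signature above), only the molecule's exchange-weighted SLD remains. With volume_fraction=0.0, the result reduces to the solvent SLD.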
class periodictable.fasta.Sequence(name, sequence, type='aa')
Bases: periodictable.fasta.Molecule
Convert a FASTA sequence into a chemical formula.
name is the sequence name.
sequence is the code string.
type is one of:
aa: amino acid sequence
dna: DNA sequence
rna: RNA sequence
Note: RNA sequence files treat T as U, and DNA sequence files treat U as T.
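The T/U convention in the note above amounts to a simple normalization step. The sketch below is hypothetical: normalize_bases is not part of the module, and it does not validate gap characters or unknown codes.

```python
def normalize_bases(codes: str, seq_type: str = "aa") -> str:
    """Hypothetical helper: apply the T/U convention for RNA and DNA input."""
    codes = codes.upper()
    if seq_type == "rna":
        # RNA input: a T in the file is read as uracil.
        return codes.replace("T", "U")
    if seq_type == "dna":
        # DNA input: a U in the file is read as thymine.
        return codes.replace("U", "T")
    if seq_type == "aa":
        # Amino acid codes are left as-is (T means threonine here).
        return codes
    raise ValueError(f"unknown sequence type {seq_type!r}")
```

Amino acid sequences are returned untouched because T is threonine in that alphabet, so the substitution only makes sense for nucleic acids.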
D2Osld(volume_fraction=1.0, D2O_fraction=0.0)
Neutron SLD of the molecule in a %D2O solvent.
static load(filename, type=None)
Load the first FASTA sequence from a file.
static loadall(filename, type=None)
Iterate over the sequences in a FASTA file, loading each in turn.
Yields one FASTA sequence per iteration.
periodictable.fasta.D2Omatch(Hsld, Dsld)
Find the %D2O concentration of the solvent at which the neutron SLD of the material matches the neutron SLD of the solvent.
Hsld, Dsld are the SLDs of the hydrogenated and deuterated forms of the material, respectively, where the deuterated form has all the labile protons swapped for deuterons. Water SLD is calculated at 20 °C.
Note that the resulting percentage is only meaningful between 0% and 100%. Beyond 100%, you will need an additional contrast agent in the 100% D2O solvent to raise its SLD enough to match.
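The match point can be found by treating both SLDs as straight lines in the D2O fraction x. This sketch assumes the solvent SLD and the labile-exchanged molecule SLD each vary linearly, so the match solves (1 - x)·Hsld + x·Dsld = (1 - x)·H2O_SLD + x·D2O_SLD. The rounded water SLDs and the name d2o_match_percent are assumptions for illustration, not the module's values or code.

```python
# Assumed, rounded water SLDs in 1e-6 / A^2.
H2O_SLD = -0.56
D2O_SLD = 6.36

def d2o_match_percent(hsld: float, dsld: float) -> float:
    """Return the %D2O at which the material and solvent SLDs are equal."""
    # Solve hsld + x*(dsld - hsld) == H2O_SLD + x*(D2O_SLD - H2O_SLD) for x.
    slope_diff = (D2O_SLD - H2O_SLD) - (dsld - hsld)
    if slope_diff == 0:
        raise ValueError("SLD lines are parallel; no unique match point")
    return 100 * (hsld - H2O_SLD) / slope_diff

x = d2o_match_percent(1.8, 3.0) / 100
print(round(100 * x, 1))
```

A result below 0 or above 100 corresponds to the out-of-range case described in the note above.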
periodictable.fasta.fasta_table()
periodictable.fasta.isotope_substitution(formula, source, target, portion=1)
Substitute one atom/isotope in a formula with another in some proportion.
formula is the formula being updated.
source is the isotope/element to be replaced.
target is the replacement isotope/element.
portion is the fraction of the source atoms that are replaced by target.
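The effect of portion can be shown with a dict-based analogue. This is a hypothetical sketch and not periodictable's function, which works on formula objects and element/isotope objects. Here formulas are plain dicts of element counts, and the counts become fractional when portion lies between 0 and 1.

```python
def substitute(formula: dict, source: str, target: str, portion: float = 1.0) -> dict:
    """Hypothetical analogue: replace `portion` of the `source` atoms with `target`."""
    result = dict(formula)  # leave the input untouched
    moved = result.get(source, 0) * portion
    if moved == 0:
        return result
    remaining = result.pop(source) - moved
    if remaining:
        result[source] = remaining
    result[target] = result.get(target, 0) + moved
    return result

# Tritiated glycine residue: the T marks the labile amide hydrogen.
gly = {"C": 2, "H": 2, "T": 1, "N": 1, "O": 1}
print(substitute(gly, "T", "D"))       # fully deuterated labile site
print(substitute(gly, "T", "D", 0.5))  # half exchanged
```

A fractional portion models labile hydrogens exchanging in proportion to the solvent's D/H ratio, as described for the D2Omatch attribute.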
periodictable.fasta.read_fasta(fp)
Iterate over the sequences in a FASTA file.
Each iteration yields a pair (sequence name, sequence codes).
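A generator with the same (name, codes) shape can be written in a few lines. This is a minimal sketch, not the module's parser. It assumes the name is the full text after '>', that sequence lines are concatenated with whitespace removed, and that lines starting with ';' are old-style comments to skip. The name iter_fasta is hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

def iter_fasta(fp: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, codes) pairs from FASTA text; lines before any header are ignored."""
    name, chunks = None, []
    for line in fp:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or old-style comment
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append("".join(line.split()))  # drop embedded whitespace
    if name is not None:
        yield name, "".join(chunks)

example = io.StringIO(">seq1 test protein\nMKT\nAYI\n>seq2\nACGT\n")
print(list(iter_fasta(example)))
```

Calling next() on the generator returns only the first record. That mirrors the relationship between load (first sequence) and loadall (every sequence) described above.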
periodictable.fasta.test()