FASTA format for DNA/RNA and amino acid sequences


Biomolecule support.

Molecule lets you define biomolecules with labile hydrogen atoms specified using H[1] in the chemical formula. The biomolecule object creates forms with the natural isotope ratio, all hydrogen, and all deuterium. Density can be provided either as natural density or as cell volume. A %D2O contrast match value is computed, giving the solvent composition that matches the molecule SLD when the labile hydrogens exchange with the solvent. Molecule.D2Osld() computes the neutron SLD for the solvated molecule in a %D2O solvent.

Sequence lets you read amino acid and DNA/RNA sequences from FASTA files.

Tables for common molecules are provided[1]:

AMINO_ACID_CODES : amino acids indexed by FASTA code

RNA_CODES, DNA_CODES : nucleic bases indexed by FASTA code

RNA_BASES, DNA_BASES : individual nucleic acid bases


Neutron SLD for water at 20 C is also provided as H2O_SLD and D2O_SLD.

For an unmodified protein, add 2*H[1] and O for the terminations.

The tables assume that proteins were created in an environment with the usual H/D isotope ratio on the non-swappable hydrogens.
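The termination rule can be illustrated with plain atom counts. This is a minimal sketch, not the library's implementation: the residue entries for glycine and alanine (with only the backbone amide hydrogen marked as labile) and the helper name peptide_counts are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical residue table: atom counts per residue, with "H[1]" marking
# labile hydrogen (assumed here to be only the backbone amide hydrogen).
RESIDUES = {
    "G": Counter({"C": 2, "H": 2, "H[1]": 1, "N": 1, "O": 1}),  # glycine
    "A": Counter({"C": 3, "H": 4, "H[1]": 1, "N": 1, "O": 1}),  # alanine
}

# Terminations for an unmodified chain: two labile H and one O.
TERMINATIONS = Counter({"H[1]": 2, "O": 1})


def peptide_counts(codes, terminated=True):
    """Sum residue atom counts, optionally adding the chain terminations."""
    total = Counter()
    for code in codes:
        total.update(RESIDUES[code])
    if terminated:
        total.update(TERMINATIONS)
    return total


# Gly-Ala with terminations: 10 hydrogens in total, 4 of them labile.
dipeptide = peptide_counts("GA")
```

With the terminations included, the total matches the formula of the free dipeptide (C5H10N2O3); without them, the sum describes only the residues inside a longer chain.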

[1] Perkins, S.J., 1985. Chapter 6 X-Ray and Neutron Solution Scattering, in: New Comprehensive Biochemistry. Elsevier, pp. 143-265. https://doi.org/10.1016/S0167-7306(08)60575-X


-: gap
A: alanine
B: aspartic acid/asparagine
C: cysteine
D: aspartic acid
E: glutamic acid
F: phenylalanine
G: glycine
H: histidine
I: isoleucine
J: leucine/isoleucine
K: lysine
L: leucine
M: methionine
N: asparagine
P: proline
Q: glutamine
R: arginine
S: serine
T: threonine
V: valine
W: tryptophan
X: any
Y: tyrosine
Z: glutamic acid/glutamine


adenine: C5H2H[1]2N5
cytosine: C4H2H[1]2N3O
deoxyribose: C5H7O2
guanine: C5HH[1]3N5O
phosphate: NaPO3
ribose: C5H6H[1]O3
thymine: C5H4H[1]N2O2
uracil: C4H2H[1]N2O2


Fuc (terminal): C6H7H[1]3O4
Gal: C6H7H[1]3O5
GalNAc: C8H10H[1]3NO5
Glc: C6H7H[1]3O5
GlcNAc: C8H10H[1]3NO5
Man: C6H7H[1]3O5
Man (terminal): C6H7H[1]4O5
NeuNac (terminal): C11H11H[1]5NO8
chondroitin sulphate: C14H15H[1]4NO14SNa
hyaluronate: C14H15H[1]5NO11Na
keratan sulphate: C14H17H[1]5NO13SNa


DLPE: C29H55H[1]3NO8P
DMPC-D52: C36H20D52NO8P
cholesteral: C27H45H[1]O
methylene: CH2
methylene-D: CD2
oleate: C45H78O2
palmitate ester: C39H77H[1]2N2O2P
phospholipid headgroup: C10H18NO8P
triglyceride headgroup: C6H5O6
trioleate form: C57H104O6




class periodictable.fasta.Molecule(name, formula, cell_volume=None, density=None, charge=0)

Bases: object

Specify a biomolecule by name, chemical formula, cell volume and charge.

Labile hydrogen positions should be coded using H[1] rather than H. H[1] will be substituted with H for solutions with natural water or D for solutions with heavy water. Any deuterated non-labile hydrogen can be marked with D, and it will stay as D regardless of the solvent.

name is the molecule name.

formula is the chemical formula as string or atom dictionary, with H[1] for labile hydrogen.

cell_volume is the volume of the molecule. If None, cell volume will be inferred from the natural density of the molecule. Cell volume is assumed to be independent of isotope.

density is the natural density of the molecule. If None, density will be inferred from cell volume.

charge is the overall charge on the molecule.
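The relationship between cell_volume and density can be sketched as a unit conversion. The units here are assumptions, since the text above does not state them: molar mass in g/mol, density in g/cm^3 and cell volume in cubic angstroms per molecule. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
CM3_PER_A3 = 1e-24        # one cubic angstrom expressed in cm^3


def density_from_volume(molar_mass, cell_volume):
    """Density (g/cm^3) from molar mass (g/mol) and volume per molecule (A^3)."""
    return molar_mass / (AVOGADRO * cell_volume * CM3_PER_A3)


def volume_from_density(molar_mass, density):
    """Volume per molecule (A^3) from molar mass (g/mol) and density (g/cm^3)."""
    return molar_mass / (AVOGADRO * density * CM3_PER_A3)


# Water at roughly 1 g/cm^3 occupies about 30 cubic angstroms per molecule.
water_volume = volume_from_density(18.015, 1.0)
```

Because the two functions are inverses of each other, supplying either quantity determines the other, which is why only one of cell_volume and density needs to be given.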


labile_formula is the original formula, with H[1] for the labile H. You can retrieve the deuterated form using:

molecule.labile_formula.replace(elements.H[1], elements.D)

natural_formula has H substituted for H[1] in labile_formula.

D2Omatch is the percentage of D2O by volume in H2O required to match the SLD of the molecule, including substitution of labile hydrogen in proportion to the D/H ratio in the solvent. Values will be outside the range [0, 100] if the contrast match is impossible.

sld/Dsld are the SLDs of the molecule with H[1] replaced by naturally occurring H/D ratios and pure D respectively.

mass/Dmass are the masses for natural H/D and pure D respectively.

charge is the charge on the molecule

cell_volume is the estimated cell volume for the molecule

density is the estimated molecule density

Change 1.5.3: drop Hmass and Hsld. Move formula to labile_formula. Move Hnatural to formula.

D2Osld(volume_fraction=1.0, D2O_fraction=0.0)

Neutron SLD of the molecule in a deuterated solvent.

Changed 1.5.3: fix errors in SLD calculations.
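As an illustration of how the two parameters interact, the sketch below mixes the SLDs linearly. This is an assumption made for illustration, not a description of what Molecule.D2Osld does internally (the calculation was revised in 1.5.3). Here volume_fraction is taken to be the fraction of the volume occupied by the molecule, and the function name solvated_sld is hypothetical.

```python
# Water SLD values at 20 C, copied from the module constants listed
# at the end of this page.
H2O_SLD = -0.5595112084983276
D2O_SLD = 6.390934026937301


def solvated_sld(sld, Dsld, volume_fraction=1.0, D2O_fraction=0.0):
    """Linear-mixing estimate of the SLD of a molecule plus its solvent.

    sld and Dsld are the molecule SLDs with labile hydrogen as H and as D.
    """
    # Solvent SLD interpolates between H2O and D2O.
    solvent = (1 - D2O_fraction) * H2O_SLD + D2O_fraction * D2O_SLD
    # Labile hydrogen is assumed to exchange in proportion to the solvent.
    solute = (1 - D2O_fraction) * sld + D2O_fraction * Dsld
    # Weight solute and solvent by the volume each occupies.
    return volume_fraction * solute + (1 - volume_fraction) * solvent


# With the defaults, the result is just the molecule SLD in H2O.
dry = solvated_sld(2.0, 3.0)
```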

class periodictable.fasta.Sequence(name, sequence, type='aa')

Bases: periodictable.fasta.Molecule

Convert FASTA sequence into chemical formula.

name is the sequence name.

sequence is the code string.

type is one of:

aa: amino acid sequence
dna: DNA sequence
rna: RNA sequence

Note: RNA sequence files treat T as U and DNA sequence files treat U as T.
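The T/U equivalence in the note can be pictured as a simple code substitution applied before lookup. This is a sketch only; the helper name normalize_nucleic is hypothetical and the library may handle the equivalence differently.

```python
def normalize_nucleic(sequence, kind):
    """Map T/U codes onto the base expected for the given sequence type."""
    if kind == "rna":
        return sequence.replace("T", "U")  # RNA: read T as uracil
    if kind == "dna":
        return sequence.replace("U", "T")  # DNA: read U as thymine
    raise ValueError("kind must be 'rna' or 'dna', got %r" % (kind,))


as_rna = normalize_nucleic("ACGT", "rna")
as_dna = normalize_nucleic("ACGU", "dna")
```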

D2Osld(volume_fraction=1.0, D2O_fraction=0.0)

Neutron SLD of the molecule in a deuterated solvent.

Changed 1.5.3: fix errors in SLD calculations.

static load(filename, type=None)

Load the first FASTA sequence from a file.

static loadall(filename, type=None)

Iterate over sequences in FASTA file, loading each in turn.

Yields one FASTA sequence each cycle.

periodictable.fasta.D2Omatch(Hsld, Dsld)

Find the D2O% concentration of solvent such that neutron SLD of the material matches the neutron SLD of the solvent.

Hsld, Dsld are the SLDs for the hydrogenated and deuterated forms of the material respectively, where D includes all the labile protons swapped for deuterons. Water SLD is calculated at 20 C.

Note that the resulting percentage is only meaningful between 0% and 100%. Beyond 100% you will need an additional contrast agent in the 100% D2O solvent to increase the SLD enough to match.
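The match point can be derived by assuming that both the solvent SLD and the material SLD vary linearly with the D2O fraction f, then solving Hsld + f*(Dsld - Hsld) = H2O_SLD + f*(D2O_SLD - H2O_SLD) for f. The sketch below implements that linear model; the function name d2o_match_percent is hypothetical, and the linearity is an assumption rather than a statement about the library's current calculation.

```python
# Water SLD values at 20 C, copied from the module constants listed
# at the end of this page.
H2O_SLD = -0.5595112084983276
D2O_SLD = 6.390934026937301


def d2o_match_percent(Hsld, Dsld):
    """Percent D2O at which a linearly varying material SLD equals the solvent SLD."""
    material_slope = Dsld - Hsld        # change in material SLD from 0% to 100% D2O
    solvent_slope = D2O_SLD - H2O_SLD   # change in solvent SLD from 0% to 100% D2O
    return 100 * (H2O_SLD - Hsld) / (material_slope - solvent_slope)


inside = d2o_match_percent(2.0, 3.0)    # falls between 0 and 100
outside = d2o_match_percent(7.0, 7.5)   # exceeds 100: no match in pure D2O
```

A material whose SLD already exceeds that of pure D2O gives a value above 100, which is the case the note above describes.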

Deprecated since version 1.5.3: Use periodictable.nsf.D2O_match(formula) instead.

Change 1.5.3: corrected D2O sld, which will change the computed match point.

periodictable.fasta.isotope_substitution(formula, source, target, portion=1)

Substitute one atom/isotope in a formula with another in some proportion.

formula is the formula being updated.

source is the isotope/element to be substituted.

target is the replacement isotope/element.

portion is the proportion of source which is substituted for target.

Deprecated since version 1.5.3: Use formula.replace(source, target, portion) instead.
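The effect of the portion argument can be shown on a plain dictionary of atom counts. This sketch does not use the library's formula objects; the function name substitute and the use of string keys such as "H[1]" are assumptions for illustration.

```python
def substitute(counts, source, target, portion=1):
    """Return a copy of counts with a portion of source replaced by target."""
    result = dict(counts)
    moved = result.get(source, 0) * portion
    if moved == 0:
        return result  # nothing to substitute
    remaining = result.pop(source) - moved
    if remaining != 0:
        result[source] = remaining
    result[target] = result.get(target, 0) + moved
    return result


labile = {"C": 2, "H": 3, "H[1]": 4}
natural = substitute(labile, "H[1]", "H")          # all labile H become H
deuterated = substitute(labile, "H[1]", "D")       # all labile H become D
partial = substitute(labile, "H[1]", "D", 0.25)    # one quarter become D
```

A fractional portion leaves non-integer counts, which is how a partially exchanged molecule in a mixed H2O/D2O solvent can be represented.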


Iterate over the sequences in a FASTA file.

Each iteration is a pair (sequence name, sequence codes).

Change 1.5.3: Now uses H[1] rather than T for labile hydrogen.
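Iteration over (sequence name, sequence codes) pairs can be sketched with a small generator. This is not the library's reader: the function name iter_fasta is hypothetical, and the handling of blank lines and of text before the first header is an assumption.

```python
import io


def iter_fasta(lines):
    """Yield (name, codes) pairs from an iterable of FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record; emit the previous one first.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        yield name, "".join(chunks)


example = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Because the function is a generator, a large file can be processed one sequence at a time without holding every record in memory.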

periodictable.fasta.D2O_SLD = 6.390934026937301

real portion of D2O sld at 20 C

Change 1.5.2: Use correct density in SLD calculation

periodictable.fasta.H2O_SLD = -0.5595112084983276

real portion of H2O sld at 20 C