The SeqProp Class

This section gives an overview of the methods that can be run on a single protein sequence.

Available functions

Sequence-based predictions

  • Secondary structure and solvent accessibilities – Predictions of secondary structure and relative solvent accessibilities per residue. Module: scratch. External software: SCRATCH.
  • Thermostability – Free energy of unfolding (ΔG), adapted from Oobatake (Oobatake & Ooi 1993) and Dill (Dill et al. 2011). Module: thermostability.
  • Transmembrane domains – Prediction of transmembrane domains from sequence. Module: tmhmm. External software: TMHMM.
  • Aggregation propensity – Consensus method to predict the aggregation propensity of proteins, specifically the number of aggregation-prone segments on an unfolded protein sequence. Module: aggregation_propensity. Web server: AMYLPRED2.

Sequence-based calculations

  • Various sequence properties – Basic properties of the sequence, such as percent of polar, non-polar, hydrophobic or hydrophilic residues. External software: EMBOSS pepstats.
  • Sequence alignment – Basic functions to run pairwise or multiple sequence alignments. External software: EMBOSS needle.



class ssbio.protein.sequence.seqprop.SeqProp(seq, id, name='<unknown name>', description='<unknown description>', sequence_path=None, metadata_path=None, feature_path=None)[source]

Generic class to represent information for a protein sequence.

Extends the Biopython SeqRecord class. The main functionality added is the ability to set and load directly from sequence, metadata, and feature files. Additionally, methods are provided to calculate and store sequence properties in the annotations and letter_annotations fields of a SeqProp. These can then be accessed for a range of residue numbers.


str – Unique identifier for this protein sequence


Seq – Protein sequence as a Biopython Seq object


str – Optional name for this sequence


str – Optional description for this sequence


str, list – BiGG IDs mapped to this sequence


str, list – KEGG IDs mapped to this sequence


str, list – RefSeq IDs mapped to this sequence


str, list – UniProt IDs mapped to this sequence


str, list – Gene names mapped to this sequence


list – PDB IDs mapped to this sequence


str, list – GO terms mapped to this sequence


str, list – PFAMs mapped to this sequence


str, list – EC numbers mapped to this sequence


str – FASTA file for this sequence


str – Metadata file (any format) for this sequence


str – GFF file for this sequence


list – List of protein sequence features, which define regions of the protein


dict – Annotations of this protein sequence, which summarize global properties


RestrictedDict – Residue-level annotations, which describe single residue properties


add_point_feature(resnum, feat_type=None, feat_id=None)[source]

Add a feature to the features list describing a single residue.

  • resnum (int) – Protein sequence residue number
  • feat_type (str, optional) – Optional description of the feature type (e.g. ‘catalytic residue’)
  • feat_id (str, optional) – Optional ID of the feature type (e.g. ‘TM1’)
add_region_feature(start_resnum, end_resnum, feat_type=None, feat_id=None)[source]

Add a feature to the features list describing a region of the protein sequence.

  • start_resnum (int) – Start residue number of the protein sequence feature
  • end_resnum (int) – End residue number of the protein sequence feature
  • feat_type (str, optional) – Optional description of the feature type (e.g. ‘binding domain’)
  • feat_id (str, optional) – Optional ID of the feature type (e.g. ‘TM1’)
blast_pdb(seq_ident_cutoff=0, evalue=0.0001, display_link=False, outdir=None, force_rerun=False)[source]

BLAST this sequence against the PDB.


Test if the sequence is equal to another SeqProp’s sequence

Parameters:seq_prop – SeqProp object
Returns:If the sequences are the same
Return type:bool

Copy features to memory and remove the association of the feature file.


list – Get the features stored in memory or in the GFF file

get_aggregation_propensity(email, password, cutoff_v=5, cutoff_n=5, run_amylmuts=False, outdir=None)[source]

Run the AMYLPRED2 web server to calculate the aggregation propensity of this protein sequence, which is the number of aggregation-prone segments on the unfolded protein sequence.

Stores statistics in the annotations attribute, under the key aggprop-amylpred.

See for instructions and details.


Run Biopython’s built in ProteinAnalysis module and store statistics in the annotations attribute.

get_dict(only_attributes=None, exclude_attributes=None, df_format=False)[source]

Get a dictionary of this object’s attributes. Optional format for storage in a Pandas DataFrame.

  • only_attributes (str, list) – Attributes that should be returned. If not provided, all are returned.
  • exclude_attributes (str, list) – Attributes that should be excluded.
  • df_format (bool) – If dictionary values should be formatted for a dataframe (everything possible is transformed into strings, int, or float - if something can’t be transformed it is excluded)

Dictionary of attributes
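
The filtering behaviour described above can be illustrated with a small, self-contained sketch. This is not the ssbio implementation: attributes_to_dict and ExampleRecord are hypothetical names, and joining lists into a ';'-separated string for df_format is an assumption about what "transformed into strings" might look like.

```python
def _as_list(value):
    """Accept None, a single string, or an iterable of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def attributes_to_dict(obj, only_attributes=None, exclude_attributes=None,
                       df_format=False):
    """Hypothetical stand-in for an attribute export with optional filtering."""
    only = _as_list(only_attributes)
    exclude = _as_list(exclude_attributes)
    scalar_types = (str, int, float)

    result = {}
    for key, value in vars(obj).items():
        if only and key not in only:
            continue
        if key in exclude:
            continue
        if df_format:
            is_scalar_list = (isinstance(value, (list, tuple)) and
                              all(isinstance(v, scalar_types) for v in value))
            if is_scalar_list:
                # Assumption: lists of simple values are flattened to one string
                value = ';'.join(str(v) for v in value)
            elif not isinstance(value, scalar_types):
                # Anything that cannot be represented simply is dropped
                continue
        result[key] = value
    return result


class ExampleRecord:
    """Hypothetical object with a mix of simple and complex attributes."""
    def __init__(self):
        self.id = 'tester'
        self.seq_len = 5
        self.pdbs = ['1abc', '2xyz']
        self.handle = object()


record = ExampleRecord()
flat = attributes_to_dict(record, df_format=True)
```

With df_format=True the handle attribute is dropped because it has no simple representation, which keeps the result safe to load into a DataFrame row.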

Run the EMBOSS pepstats program on the protein sequence.

Stores statistics in the annotations attribute. Saves a .pepstats file of the results where the sequence file is located.

get_kinetic_folding_rate(secstruct, at_temp=None)[source]

Run the FOLD-RATE web server to calculate the kinetic folding rate given an amino acid sequence and its structural classification (alpha/beta/mixed).

Stores statistics in the annotations attribute, under the key kinetic_folding_rate_<TEMP>-foldrate.

See for instructions and details.

get_residue_annotations(start_resnum, end_resnum=None)[source]

Retrieve letter annotations for a residue or a range of residues

  • start_resnum (int) – Residue number
  • end_resnum (int) – Optional residue number, specify if a range is desired

Letter annotations for this residue or residues
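
As a rough illustration of how residue numbers map onto per-residue annotation lists, the sketch below slices a plain dictionary of annotations. It assumes residue numbers are 1-based and that end_resnum is inclusive; residue_annotations is a hypothetical helper, not the method itself, and the return shape of the real method may differ.

```python
def residue_annotations(letter_annotations, start_resnum, end_resnum=None):
    """Slice every per-residue annotation for one residue or an inclusive range.

    Assumption: residue numbers start at 1, so residue N lives at index N - 1.
    """
    if end_resnum is None:
        end_resnum = start_resnum
    if start_resnum < 1 or end_resnum < start_resnum:
        raise ValueError('Residue numbers must satisfy 1 <= start <= end')

    start_index = start_resnum - 1
    return {key: values[start_index:end_resnum]
            for key, values in letter_annotations.items()}


annotations = {'a_key': [2, 2, 3, 1, 0]}
single = residue_annotations(annotations, 5)
window = residue_annotations(annotations, 2, 4)
```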

Get a subsequence as a new SeqProp object given a list of residue numbers

get_subsequence_from_property(property_key, property_value, condition, return_resnums=False)[source]

Get a subsequence as a new SeqProp object given a certain property you want to find in the original SeqProp’s letter_annotation

This can be used to do something like extract the subsequence of exposed residues, so you can run calculations on that subsequence. Useful if you have questions like “are there any predicted surface exposed cysteines in my protein sequence?”


>>> sp = SeqProp(id='tester', seq='MQSLE')
>>> sp.letter_annotations['a_key'] = [2, 2, 3, 1, 0]
>>> pk = 'a_key'
>>> pv = 2
>>> cond = '<'
>>> new_sp = sp.get_subsequence_from_property(pk, pv, cond)
>>> new_sp.letter_annotations[pk]
[1, 0]
>>> new_sp
SeqProp(seq=Seq('LE', ExtendedIUPACProtein()), id='tester_a_key_<_2_extracted', name='<unknown name>', description='<unknown description>', dbxrefs=[])
  • property_key (str) – Property key in the letter_annotations attribute that you want to filter using
  • property_value (str) – Property value that you want to filter by
  • condition (str) – Comparison operator used to filter the values: <, =, >, >=, or <=

New SeqProp object that you can run computations on or just extract its properties
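
The filtering idea can be sketched without ssbio by pairing each residue with its annotation value and keeping those that satisfy the condition. The helper name filter_by_property and its return shape are hypothetical; it only mirrors the doctest above using plain strings and lists, and assumes 1-based residue numbers.

```python
import operator

# Map the condition strings listed above onto comparison functions
CONDITIONS = {
    '<': operator.lt,
    '<=': operator.le,
    '=': operator.eq,
    '>=': operator.ge,
    '>': operator.gt,
}


def filter_by_property(sequence, values, property_value, condition):
    """Keep residues whose annotation value satisfies the condition.

    Returns the subsequence, the matching values, and the 1-based residue
    numbers that were kept (assumed numbering).
    """
    compare = CONDITIONS[condition]
    kept = [(index + 1, residue, value)
            for index, (residue, value) in enumerate(zip(sequence, values))
            if compare(value, property_value)]
    resnums = [item[0] for item in kept]
    subsequence = ''.join(item[1] for item in kept)
    subvalues = [item[2] for item in kept]
    return subsequence, subvalues, resnums


subseq, subvalues, resnums = filter_by_property('MQSLE', [2, 2, 3, 1, 0], 2, '<')
```

Here subseq is 'LE' and subvalues is [1, 0], matching the doctest, and resnums records that residues 4 and 5 were kept.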

Run the thermostability calculator using either the Dill or Oobatake methods.

Stores calculated (dG, Keq) tuple in the annotations attribute, under the key thermostability_<TEMP>-<METHOD_USED>.

See for instructions and details.
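
For orientation, a free energy and an equilibrium constant are linked by the standard thermodynamic relation ΔG = −RT ln K. The sketch below applies that relation only; it assumes ΔG in kcal/mol and temperature in degrees Celsius, and the units and sign convention used by the thermostability module may differ. The function name equilibrium_constant is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def equilibrium_constant(dg_kcal_per_mol, temp_celsius=37.0):
    """Convert a free energy to an equilibrium constant via K = exp(-dG / RT).

    Assumptions: dG is given in kcal/mol and temperature in degrees Celsius.
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(-dg_kcal_per_mol / (GAS_CONSTANT_KCAL * temp_kelvin))


keq_stable = equilibrium_constant(5.0)     # positive dG gives K below 1
keq_unstable = equilibrium_constant(-5.0)  # negative dG gives K above 1
```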


int – Report the number of PDB IDs stored in the pdbs attribute


Seq – Dynamically loaded Seq object from the sequence file


int – Get the sequence length


str – Get the sequence formatted as a string

write_fasta_file(outfile, force_rerun=False)[source]

Write a FASTA file for the protein sequence. After writing, seq loads directly from this file.

  • outfile (str) – Path to new FASTA file to be written to
  • force_rerun (bool) – If an existing file should be overwritten
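
The force_rerun behaviour can be pictured with a standard-library sketch: an existing file is left untouched unless an overwrite is requested. The write_fasta function below is a hypothetical stand-in, not the ssbio method, and the 60-character line width is an assumption.

```python
import os
import tempfile


def write_fasta(seq_id, sequence, outfile, force_rerun=False, width=60):
    """Write a single-record FASTA file, skipping the write if it already exists."""
    if os.path.exists(outfile) and not force_rerun:
        # Existing results are reused unless an overwrite is requested
        return outfile

    with open(outfile, 'w') as handle:
        handle.write('>{}\n'.format(seq_id))
        # Wrap the sequence into fixed-width lines
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + '\n')
    return outfile


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'tester.faa')
write_fasta('tester', 'MQSLE', demo_path)
```

Calling write_fasta again with a different sequence leaves the file unchanged unless force_rerun=True is passed.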
write_gff_file(outfile, force_rerun=False)[source]

Write a GFF file for the protein features. After writing, features loads directly from this file.

  • outfile (str) – Path to new GFF file to be written to
  • force_rerun (bool) – If an existing file should be overwritten