The SeqProp Class¶

Introduction¶

This section will give an overview of the methods that can be executed for a single protein sequence.

Tutorials¶

SeqProp - Protein Sequence Properties

Available functions¶

Sequence-based predictions¶

Function	Description	Internal Python class used and functions provided	External software to install	Web server
Secondary structure and solvent accessibilities	Predictions of secondary structure and relative solvent accessibilities per residue	`scratch module`	SCRATCH
Thermostability	Free energy of unfolding (ΔG), adapted from Oobatake (Oobatake & Ooi 1993) and Dill (Dill et al. 2011)	`thermostability module`
Transmembrane domains	Prediction of transmembrane domains from sequence	`tmhmm module`	TMHMM
Aggregation propensity	Consensus method to predict the aggregation propensity of proteins, specifically the number of aggregation-prone segments on an unfolded protein sequence	`aggregation_propensity module`		AMYLPRED2

Sequence-based calculations¶

Function	Description	Internal Python class used and functions provided	External software to install	Web server	Alternate external software to install
Various sequence properties	Basic properties of the sequence, such as percent of polar, non-polar, hydrophobic or hydrophilic residues.	Biopython ProteinAnalysis `sequence residues module`			EMBOSS pepstats
Sequence alignment	Basic functions to run pairwise or multiple sequence alignments	Biopython pairwise2 `alignment module`			EMBOSS needle

API¶

SeqProp¶

class ssbio.protein.sequence.seqprop.SeqProp(seq, id, name='<unknown name>', description='<unknown description>', sequence_path=None, metadata_path=None, feature_path=None)[source]¶

Generic class to represent information for a protein sequence.

Extends the Biopython SeqRecord class. The main functionality added is the ability to set and load directly from sequence, metadata, and feature files. Additionally, methods are provided to calculate and store sequence properties in the annotations and letter_annotations field of a SeqProp. These can then be accessed for a range of residue numbers.

id¶: str – Unique identifier for this protein sequence

seq¶: Seq – Protein sequence as a Biopython Seq object

name¶: str – Optional name for this sequence

description¶: str – Optional description for this sequence

bigg¶: str, list – BiGG IDs mapped to this sequence

kegg¶: str, list – KEGG IDs mapped to this sequence

refseq¶: str, list – RefSeq IDs mapped to this sequence

uniprot¶: str, list – UniProt IDs mapped to this sequence

gene_name¶: str, list – Gene names mapped to this sequence

pdbs¶: list – PDB IDs mapped to this sequence

go¶: str, list – GO terms mapped to this sequence

pfam¶: str, list – PFAMs mapped to this sequence

ec_number¶: str, list – EC numbers mapped to this sequence

sequence_file¶: str – FASTA file for this sequence

metadata_file¶: str – Metadata file (any format) for this sequence

feature_file¶: str – GFF file for this sequence

features¶: list – List of protein sequence features, which define regions of the protein

annotations¶: dict – Annotations of this protein sequence, which summarize global properties

letter_annotations¶: RestrictedDict – Residue-level annotations, which describe single residue properties

Todo

Properly inherit methods from the Object class…

add_point_feature(resnum, feat_type=None, feat_id=None)[source]¶

Add a feature to the features list describing a single residue.

Parameters:	resnum (int) – Protein sequence residue number feat_type (str, optional) – Optional description of the feature type (ie. ‘catalytic residue’) feat_id (str, optional) – Optional ID of the feature type (ie. ‘TM1’)

add_region_feature(start_resnum, end_resnum, feat_type=None, feat_id=None)[source]¶

Add a feature to the features list describing a region of the protein sequence.

Parameters:	start_resnum (int) – Start residue number of the protein sequence feature end_resnum (int) – End residue number of the protein sequence feature feat_type (str, optional) – Optional description of the feature type (ie. ‘binding domain’) feat_id (str, optional) – Optional ID of the feature type (ie. ‘TM1’)

blast_pdb(seq_ident_cutoff=0, evalue=0.0001, display_link=False, outdir=None, force_rerun=False)[source]¶: BLAST this sequence to the PDB

equal_to(seq_prop)[source]¶

Test if the sequence is equal to another SeqProp’s sequence

Parameters:	seq_prop – SeqProp object
Returns:	If the sequences are the same
Return type:	bool

feature_path_unset()[source]¶: Copy features to memory and remove the association of the feature file.

features: list – Get the features stored in memory or in the GFF file

get_aggregation_propensity(email, password, cutoff_v=5, cutoff_n=5, run_amylmuts=False, outdir=None)[source]¶

Run the AMYLPRED2 web server to calculate the aggregation propensity of this protein sequence, which is the number of aggregation-prone segments on the unfolded protein sequence.

Stores statistics in the annotations attribute, under the key aggprop-amylpred.

See ssbio.protein.sequence.properties.aggregation_propensity for instructions and details.

get_biopython_pepstats()[source]¶: Run Biopython’s built in ProteinAnalysis module and store statistics in the annotations attribute.

get_dict(only_attributes=None, exclude_attributes=None, df_format=False)[source]¶

Get a dictionary of this object’s attributes. Optional format for storage in a Pandas DataFrame.

Parameters:	only_attributes (str, list) – Attributes that should be returned. If not provided, all are returned. exclude_attributes (str, list) – Attributes that should be excluded. df_format (bool) – If dictionary values should be formatted for a dataframe (everything possible is transformed into strings, int, or float - if something can’t be transformed it is excluded)
Returns:	Dictionary of attributes
Return type:	dict

get_emboss_pepstats()[source]¶

Run the EMBOSS pepstats program on the protein sequence.

Stores statistics in the annotations attribute. Saves a .pepstats file of the results where the sequence file is located.

get_kinetic_folding_rate(secstruct, at_temp=None)[source]¶

Run the FOLD-RATE web server to calculate the kinetic folding rate given an amino acid sequence and its structural classficiation (alpha/beta/mixed)

Stores statistics in the annotations attribute, under the key kinetic_folding_rate_<TEMP>-foldrate.

See ssbio.protein.sequence.properties.kinetic_folding_rate.get_foldrate() for instructions and details.

get_residue_annotations(start_resnum, end_resnum=None)[source]¶

Retrieve letter annotations for a residue or a range of residues

Parameters:	start_resnum (int) – Residue number end_resnum (int) – Optional residue number, specify if a range is desired
Returns:	Letter annotations for this residue or residues
Return type:	dict

get_thermostability(at_temp)[source]¶

Run the thermostability calculator using either the Dill or Oobatake methods.

Stores calculated (dG, Keq) tuple in the annotations attribute, under the key thermostability_<TEMP>-<METHOD_USED>.

See ssbio.protein.sequence.properties.thermostability.get_dG_at_T() for instructions and details.

num_pdbs¶: int – Report the number of PDB IDs stored in the pdbs attribute

seq: Seq – Dynamically loaded Seq object from the sequence file

seq_len¶: int – Get the sequence length

seq_str¶: str – Get the sequence formatted as a string

write_fasta_file(outfile, force_rerun=False)[source]¶

Write a FASTA file for the protein sequence, seq will now load directly from this file.

Parameters:	outfile (str) – Path to new FASTA file to be written to force_rerun (bool) – If an existing file should be overwritten

write_gff_file(outfile, force_rerun=False)[source]¶

Write a GFF file for the protein features, features will now load directly from this file.

Parameters:	outfile (str) – Path to new FASTA file to be written to force_rerun (bool) – If an existing file should be overwritten