The StructProp Class

The StructProp Class

Introduction

This section will give an overview of the methods that can be executed for a single protein structure.

Available functions

Sequence & structure-based predictions

Function Description Internal Python class
used and functions provided
External software
to install
Web server Alternate external
software to install
Homology modeling Preparation scripts and parsers for
executing homology modeling algorithms
I-TASSER    
Transmembrane
orientation
Prediction of transmembrane domains and
orientation in a membrane
opm module   OPM  
Kinetic folding rate Prediction of protein folding rates
from amino acid sequence
kinetic_folding_rate module   FOLD-RATE  

Structure-based calculations or functions

Function Description Internal Python class
used and functions provided
External software
to install
Web server Alternate external
software to install
Secondary structure Calculations of secondary structure DSSP   STRIDE
Solvent accessibilities Calculations of per-residue absolute and
relative solvent accessibilities
DSSP   FreeSASA
Residue depths Calculations of residue depths MSMS    
Structural similarity Pairwise calculations of 3D structural
similarity
fatcat module FATCAT    
Various structure
properties
Basic properties of the structure, such
as distance measurements between residues
or number of disulfide bridges
     
Quality Custom functions to allow ranking of
structures by percent identity to a defined
sequence, structure resolution, and other structure
quality metrics
set_representative_structure function      
Structure cleaning,
mutating
Custom functions to allow for the preparation
of structure files for molecular modeling, with
options to remove hydrogens/waters/heteroatoms,
select specific chains, or mutate specific residues.
AmberTools    

API

StructProp

class ssbio.protein.structure.structprop.StructProp(ident, description=None, chains=None, mapped_chains=None, is_experimental=False, structure_path=None, file_type=None)[source]

Generic class to represent information for a protein structure.

The main utilities of this class are to:

  • Provide access to the 3D coordinates using a Biopython Structure object through the method parse_structure.
  • Run predictions and computations on the structure
  • Analyze specific chains using the mapped_chains attribute
  • Provide wrapper methods to nglview to view the structure in a Jupyter notebook
Parameters:
  • ident (str) – Unique identifier for this structure
  • description (str) – Optional human-readable description
  • chains (str, list) – Chain ID or list of IDs
  • mapped_chains (str, list) – A chain ID or IDs to indicate what chains should be analyzed
  • is_experimental (bool) – Flag to indicate if structure is an experimental or computational model
  • structure_path (str) – Path to structure file
  • file_type (str) – Type of structure file - pdb, pdb.gz, mmcif, cif, cif.gz, xml.gz, mmtf, mmtf.gz
add_chain_ids(chains)[source]

Add chains by ID into the chains attribute

Parameters:chains (str, list) – Chain ID or list of IDs
add_mapped_chain_ids(mapped_chains)[source]

Add chains by ID into the mapped_chains attribute

Parameters:mapped_chains (str, list) – Chain ID or list of IDs
add_residues_highlight_to_nglview(view, structure_resnums, chain=None, res_color='red')[source]

Add a residue number or numbers to an NGLWidget view object.

Parameters:
  • view (NGLWidget) – NGLWidget view object
  • structure_resnums (int, list) – Residue number(s) to highlight, structure numbering
  • chain (str, list) – Chain ID or IDs of which residues are a part of. If not provided, all chains in the mapped_chains attribute will be used. If that is also empty, and exception is raised.
  • res_color (str) – Color to highlight residues with
add_scaled_residues_highlight_to_nglview(view, structure_resnums, chain=None, color='red', unique_colors=False, opacity_range=(0.5, 1), scale_range=(0.7, 10))[source]
Add a list of residue numbers (which may contain repeating residues) to a view, or add a dictionary of
residue numbers to counts. Size and opacity of added residues are scaled by counts.
Parameters:
  • view (NGLWidget) – NGLWidget view object
  • structure_resnums (int, list, dict) – Residue number(s) to highlight, or a dictionary of residue number to frequency count
  • chain (str, list) – Chain ID or IDs of which residues are a part of. If not provided, all chains in the mapped_chains attribute will be used. If that is also empty, and exception is raised.
  • color (str) – Color to highlight residues with
  • unique_colors (bool) – If each mutation should be colored uniquely (will override color argument)
  • opacity_range (tuple) – Min/max opacity values (residues that have higher frequency counts will be opaque)
  • scale_range (tuple) – Min/max size values (residues that have higher frequency counts will be bigger)
chains = None

DictList – A DictList of chains have their sequence stored in them, along with residue-specific

clean_structure(out_suffix='_clean', outdir=None, force_rerun=False, remove_atom_alt=True, keep_atom_alt_id='A', remove_atom_hydrogen=True, add_atom_occ=True, remove_res_hetero=True, keep_chemicals=None, keep_res_only=None, add_chain_id_if_empty='X', keep_chains=None)[source]

Clean the structure file associated with this structure, and save it as a new file. Returns the file path.

Parameters:
  • out_suffix (str) – Suffix to append to original filename
  • outdir (str) – Path to output directory
  • force_rerun (bool) – If structure should be re-cleaned if a clean file exists already
  • remove_atom_alt (bool) – Remove alternate positions
  • keep_atom_alt_id (str) – If removing alternate positions, which alternate ID to keep
  • remove_atom_hydrogen (bool) – Remove hydrogen atoms
  • add_atom_occ (bool) – Add atom occupancy fields if not present
  • remove_res_hetero (bool) – Remove all HETATMs
  • keep_chemicals (str, list) – If removing HETATMs, keep specified chemical names
  • keep_res_only (str, list) – Keep ONLY specified resnames, deletes everything else!
  • add_chain_id_if_empty (str) – Add a chain ID if not present
  • keep_chains (str, list) – Keep only these chains
Returns:

Path to cleaned PDB file

Return type:

str

file_type = None

str – Type of structure file

find_disulfide_bridges(threshold=3.0)[source]

Run Biopython’s search_ss_bonds to find potential disulfide bridges for each chain and store in ChainProp.

get_dict_with_chain(chain, only_keys=None, chain_keys=None, exclude_attributes=None, df_format=False)[source]

get_dict method which incorporates attributes found in a specific chain. Does not overwrite any attributes in the original StructProp.

Parameters:
  • chain
  • only_keys
  • chain_keys
  • exclude_attributes
  • df_format
Returns:

attributes of StructProp + the chain specified

Return type:

dict

get_dssp_annotations(outdir, force_rerun=False)[source]

Run DSSP on this structure and store the DSSP annotations in the corresponding ChainProp SeqRecords

Calculations are stored in the ChainProp’s letter_annotations at the following keys:

  • SS-dssp
  • RSA-dssp
  • ASA-dssp
  • PHI-dssp
  • PSI-dssp
Parameters:
  • outdir (str) – Path to where DSSP dataframe will be stored.
  • force_rerun (bool) – If DSSP results should be recalculated

Todo

  • Also parse global properties, like total accessible surface area. Don’t think Biopython parses those?
get_freesasa_annotations(outdir, include_hetatms=False, force_rerun=False)[source]

Run freesasa on this structure and store the calculated properties in the corresponding ChainProps

get_msms_annotations(outdir, force_rerun=False)[source]

Run MSMS on this structure and store the residue depths/ca depths in the corresponding ChainProp SeqRecords

get_structure_seqs(model)[source]

Gather chain sequences and store in their corresponding ChainProp objects in the chains attribute.

Parameters:model (Model) – Biopython Model object of the structure you would like to parse
is_experimental = None

bool – Flag to note if this structure is an experimental model or a homology model

load_structure_path(structure_path, file_type)[source]

Load a structure file and provide pointers to its location

Parameters:
  • structure_path (str) – Path to structure file
  • file_type (str) – Type of structure file
mapped_chains = None

list – A simple list of chain IDs (strings) that will be used to subset analyses

parse_structure(store_in_memory=False)[source]

Read the 3D coordinates of a structure file and return it as a Biopython Structure object. Also create ChainProp objects in the chains attribute for each chain in the first model.

Parameters:store_in_memory (bool) – If the Biopython Structure object should be stored in the attribute structure.
Returns:Biopython Structure object
Return type:Structure
parsed = None

bool – Simple flag to track if this structure has had its structure + chain sequences parsed

structure = None

Structure – Biopython Structure object, only used if store_in_memory option of parse_structure is set to True

structure_file = None

str – Name of the structure file

view_structure(only_chains=None, opacity=1.0, recolor=False, gui=False)[source]

Use NGLviewer to display a structure in a Jupyter notebook

Parameters:
  • only_chains (str, list) – Chain ID or IDs to display
  • opacity (float) – Opacity of the structure
  • recolor (bool) – If structure should be cleaned and recolored to silver
  • gui (bool) – If the NGLview GUI should show up
Returns:

NGLviewer object