The StructProp Class¶

Introduction¶

This section will give an overview of the methods that can be executed for a single protein structure.

Tutorials¶

Available functions¶

Sequence & structure-based predictions¶

Function	Description	Internal Python class used and functions provided	External software to install	Web server
Homology modeling	Preparation scripts and parsers for executing homology modeling algorithms	`itasserprep module` `itasserprop module`	I-TASSER
Transmembrane orientation	Prediction of transmembrane domains and orientation in a membrane	`opm module`		OPM
Kinetic folding rate	Prediction of protein folding rates from amino acid sequence	`kinetic_folding_rate module`		FOLD-RATE

Structure-based calculations or functions¶

Function	Description	Internal Python class used and functions provided	External software to install	Alternate external software to install
Secondary structure	Calculations of secondary structure	Biopython DSSP `dssp module` `stride module`	DSSP	STRIDE
Solvent accessibilities	Calculations of per-residue absolute and relative solvent accessibilities	Biopython DSSP `dssp module` `freesasa module`	DSSP	FreeSASA
Residue depths	Calculations of residue depths	Biopython ResidueDepth `msms module`	MSMS
Structural similarity	Pairwise calculations of 3D structural similarity	`fatcat module`	FATCAT
Various structure properties	Basic properties of the structure, such as distance measurements between residues or number of disulfide bridges	Biopython Struct `structure residues module`
Quality	Custom functions to allow ranking of structures by percent identity to a defined sequence, structure resolution, and other structure quality metrics	`set_representative_structure function`
Structure cleaning, mutating	Custom functions to allow for the preparation of structure files for molecular modeling, with options to remove hydrogens/waters/heteroatoms, select specific chains, or mutate specific residues.	Biopython Select `cleanpdb module` `mutatepdb module`	AmberTools

API¶

StructProp¶

class ssbio.protein.structure.structprop.StructProp(ident, description=None, chains=None, mapped_chains=None, is_experimental=False, structure_path=None, file_type=None)[source]¶

Generic class to represent information for a protein structure.

The main utilities of this class are to:

Provide access to the 3D coordinates using a Biopython Structure object through the method parse_structure.
Run predictions and computations on the structure
Analyze specific chains using the mapped_chains attribute
Provide wrapper methods to nglview to view the structure in a Jupyter notebook

Parameters:

ident (str) – Unique identifier for this structure
description (str) – Optional human-readable description
chains (str, list) – Chain ID or list of IDs
mapped_chains (str, list) – A chain ID or IDs to indicate what chains should be analyzed
is_experimental (bool) – Flag to indicate if structure is an experimental or computational model
structure_path (str) – Path to structure file
file_type (str) – Type of structure file - pdb, pdb.gz, mmcif, cif, cif.gz, xml.gz, mmtf, mmtf.gz

add_chain_ids(chains)[source]¶

Add chains by ID into the chains attribute

Parameters:	chains (str, list) – Chain ID or list of IDs

add_mapped_chain_ids(mapped_chains)[source]¶

Add chains by ID into the mapped_chains attribute

Parameters:	mapped_chains (str, list) – Chain ID or list of IDs

add_residues_highlight_to_nglview(view, structure_resnums, chain=None, res_color='red')[source]¶

Add a residue number or numbers to an NGLWidget view object.

Parameters:

view (NGLWidget) – NGLWidget view object
structure_resnums (int, list) – Residue number(s) to highlight, structure numbering
chain (str, list) – Chain ID or IDs of which residues are a part of. If not provided, all chains in the mapped_chains attribute will be used. If that is also empty, and exception is raised.
res_color (str) – Color to highlight residues with

add_scaled_residues_highlight_to_nglview(view, structure_resnums, chain=None, color='red', unique_colors=False, opacity_range=(0.5, 1), scale_range=(0.7, 10))[source]¶

Add a list of residue numbers (which may contain repeating residues) to a view, or add a dictionary of: residue numbers to counts. Size and opacity of added residues are scaled by counts.

Parameters:

view (NGLWidget) – NGLWidget view object
structure_resnums (int, list, dict) – Residue number(s) to highlight, or a dictionary of residue number to frequency count
chain (str, list) – Chain ID or IDs of which residues are a part of. If not provided, all chains in the mapped_chains attribute will be used. If that is also empty, and exception is raised.
color (str) – Color to highlight residues with
unique_colors (bool) – If each mutation should be colored uniquely (will override color argument)
opacity_range (tuple) – Min/max opacity values (residues that have higher frequency counts will be opaque)
scale_range (tuple) – Min/max size values (residues that have higher frequency counts will be bigger)

chains = None¶: DictList – A DictList of chains have their sequence stored in them, along with residue-specific

clean_structure(out_suffix='_clean', outdir=None, force_rerun=False, remove_atom_alt=True, keep_atom_alt_id='A', remove_atom_hydrogen=True, add_atom_occ=True, remove_res_hetero=True, keep_chemicals=None, keep_res_only=None, add_chain_id_if_empty='X', keep_chains=None)[source]¶

Clean the structure file associated with this structure, and save it as a new file. Returns the file path.

Parameters:	out_suffix (str) – Suffix to append to original filename outdir (str) – Path to output directory force_rerun (bool) – If structure should be re-cleaned if a clean file exists already remove_atom_alt (bool) – Remove alternate positions keep_atom_alt_id (str) – If removing alternate positions, which alternate ID to keep remove_atom_hydrogen (bool) – Remove hydrogen atoms add_atom_occ (bool) – Add atom occupancy fields if not present remove_res_hetero (bool) – Remove all HETATMs keep_chemicals (str, list) – If removing HETATMs, keep specified chemical names keep_res_only (str, list) – Keep ONLY specified resnames, deletes everything else! add_chain_id_if_empty (str) – Add a chain ID if not present keep_chains (str, list) – Keep only these chains
Returns:	Path to cleaned PDB file
Return type:	str

file_type = None¶: str – Type of structure file

find_disulfide_bridges(threshold=3.0)[source]¶: Run Biopython’s search_ss_bonds to find potential disulfide bridges for each chain and store in ChainProp.

get_dict_with_chain(chain, only_keys=None, chain_keys=None, exclude_attributes=None, df_format=False)[source]¶

get_dict method which incorporates attributes found in a specific chain. Does not overwrite any attributes in the original StructProp.

Parameters:	chain – only_keys – chain_keys – exclude_attributes – df_format –
Returns:	attributes of StructProp + the chain specified
Return type:	dict

get_dssp_annotations(outdir, force_rerun=False)[source]¶

Run DSSP on this structure and store the DSSP annotations in the corresponding ChainProp SeqRecords

Calculations are stored in the ChainProp’s letter_annotations at the following keys:

SS-dssp

RSA-dssp

ASA-dssp

PHI-dssp

PSI-dssp

Parameters:	outdir (str) – Path to where DSSP dataframe will be stored. force_rerun (bool) – If DSSP results should be recalculated

Todo

Also parse global properties, like total accessible surface area. Don’t think Biopython parses those?

get_freesasa_annotations(outdir, include_hetatms=False, force_rerun=False)[source]¶: Run freesasa on this structure and store the calculated properties in the corresponding ChainProps

get_msms_annotations(outdir, force_rerun=False)[source]¶: Run MSMS on this structure and store the residue depths/ca depths in the corresponding ChainProp SeqRecords

get_structure_seqs(model)[source]¶

Gather chain sequences and store in their corresponding ChainProp objects in the chains attribute.

Parameters:	model (Model) – Biopython Model object of the structure you would like to parse

is_experimental = None¶: bool – Flag to note if this structure is an experimental model or a homology model

load_structure_path(structure_path, file_type)[source]¶

Load a structure file and provide pointers to its location

Parameters:	structure_path (str) – Path to structure file file_type (str) – Type of structure file

mapped_chains = None¶: list – A simple list of chain IDs (strings) that will be used to subset analyses

parse_structure(store_in_memory=False)[source]¶

Read the 3D coordinates of a structure file and return it as a Biopython Structure object. Also create ChainProp objects in the chains attribute for each chain in the first model.

Parameters:	store_in_memory (bool) – If the Biopython Structure object should be stored in the attribute `structure`.
Returns:	Biopython Structure object
Return type:	Structure

parsed = None¶: bool – Simple flag to track if this structure has had its structure + chain sequences parsed

structure = None¶: Structure – Biopython Structure object, only used if store_in_memory option of parse_structure is set to True

structure_file = None¶: str – Name of the structure file

view_structure(only_chains=None, opacity=1.0, recolor=False, gui=False)[source]¶

Use NGLviewer to display a structure in a Jupyter notebook

Parameters:	only_chains (str, list) – Chain ID or IDs to display opacity (float) – Opacity of the structure recolor (bool) – If structure should be cleaned and recolored to silver gui (bool) – If the NGLview GUI should show up
Returns:	NGLviewer object