StructProp(ident, description=None, chains=None, mapped_chains=None, is_experimental=False, structure_path=None, file_type=None)
Generic class to represent information for a protein structure.
Provides access to the 3D coordinates as a Biopython Structure object through the parse_structure method. The main functionality added is the ability to set and load directly from any supported structure and metadata file. Additionally, the mapped_chains attribute restricts analyses to a subset of chains, typically those that map to a gene of interest. Also provides methods, through nglview, to view the structure in a Jupyter notebook.
str – Unique identifier for this protein structure
str – Optional name for this structure
str – Optional description for this structure
bool – Flag to note if this structure is an experimental model or a homology model
DictList – A DictList of chains, each of which stores its sequence along with residue-specific annotations
list – A simple list of chain IDs (strings) that will be used to subset analyses
str – Type of structure file
str – Name of the structure file
Add chains by ID into the chains attribute
Parameters: chains (str, list) – Chain ID or list of IDs
Add chains by ID into the mapped_chains attribute
Parameters: mapped_chains (str, list) – Chain ID or list of IDs
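Both methods accept either a single chain ID or a list of IDs. As a minimal sketch of how such an argument could be normalized before being stored, the snippet below uses two hypothetical helpers, as_id_list and add_ids; they are illustrative only and are not part of the documented API.

```python
def as_id_list(ids):
    """Return a list of chain IDs from a single ID or an iterable of IDs."""
    if isinstance(ids, str):
        return [ids]
    return list(ids)


def add_ids(existing, new_ids):
    """Append chain IDs to `existing`, skipping IDs that are already present."""
    for chain_id in as_id_list(new_ids):
        if chain_id not in existing:
            existing.append(chain_id)
    return existing


mapped = add_ids([], "A")              # a single chain ID
mapped = add_ids(mapped, ["A", "B"])   # a list; the duplicate "A" is skipped
print(mapped)
```

Skipping duplicates is an assumption made here for tidiness; the class itself may handle repeated IDs differently.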
add_residues_highlight_to_nglview(view, structure_resnums, chain=None, res_color='red')
Add a residue number or numbers to an NGLWidget view object.
- view (NGLWidget) – NGLWidget view object
- structure_resnums (int, list) – Residue number(s) to highlight, structure numbering
- chain (str, list) – Chain ID or IDs that the residues belong to. If not provided, all chains in the mapped_chains attribute will be used. If that is also empty, an exception is raised.
- res_color (str) – Color to highlight residues with
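The chain fallback described above (explicit chain, then mapped chains, then an exception) can be sketched as follows. The helper build_selection is hypothetical, and the output assumes NGL's selection language, in which :A selects chain A and a bare integer selects a residue number. It only builds the selection string and does not touch an NGLWidget.

```python
def build_selection(structure_resnums, chain=None, mapped_chains=None):
    """Build an NGL-style selection string for residues on the given chains."""
    if isinstance(structure_resnums, int):
        structure_resnums = [structure_resnums]
    if isinstance(chain, str):
        chain = [chain]
    # Fall back to the mapped chains when no chain is given explicitly
    chains = chain or mapped_chains
    if not chains:
        raise ValueError("Provide a chain ID or set mapped chains first")
    chain_part = " or ".join(":" + c for c in chains)
    res_part = " or ".join(str(r) for r in structure_resnums)
    return "({}) and ({})".format(chain_part, res_part)


print(build_selection([10, 25], chain="A"))           # (:A) and (10 or 25)
print(build_selection(42, mapped_chains=["A", "B"]))  # (:A or :B) and (42)
```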
add_scaled_residues_highlight_to_nglview(view, structure_resnums, chain=None, color='red', unique_colors=False, opacity_range=(0.5, 1), scale_range=(0.7, 10))
Add a list of residue numbers (which may contain repeating residues) to a view, or add a dictionary of residue numbers to counts. Size and opacity of added residues are scaled by counts.
- view (NGLWidget) – NGLWidget view object
- structure_resnums (int, list, dict) – Residue number(s) to highlight, or a dictionary of residue number to frequency count
- color (str) – Color to highlight residues with
- unique_colors (bool) – If each mutation should be colored uniquely (will override color argument)
- opacity_range (tuple) – Min/max opacity values (residues that have higher frequency counts will be more opaque)
- scale_range (tuple) – Min/max size values (residues that have higher frequency counts will be bigger)
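To illustrate how counts might translate into display values, here is a small sketch that assumes simple linear interpolation between the minimum and maximum of each range. The function scale_by_count is hypothetical, and the actual scaling used by the method may differ.

```python
from collections import Counter


def scale_by_count(structure_resnums, opacity_range=(0.5, 1), scale_range=(0.7, 10)):
    """Map each residue number to an (opacity, scale) pair based on its count."""
    if isinstance(structure_resnums, dict):
        counts = dict(structure_resnums)
    else:
        if isinstance(structure_resnums, int):
            structure_resnums = [structure_resnums]
        counts = dict(Counter(structure_resnums))
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    scaled = {}
    for resnum, count in counts.items():
        # Fraction of the way from the rarest to the most frequent residue
        frac = (count - lowest) / span if span else 1.0
        opacity = opacity_range[0] + frac * (opacity_range[1] - opacity_range[0])
        scale = scale_range[0] + frac * (scale_range[1] - scale_range[0])
        scaled[resnum] = (opacity, scale)
    return scaled


# Residue 5 appears three times, 12 twice, 40 once
scaled = scale_by_count([5, 5, 5, 12, 12, 40])
for resnum, (opacity, scale) in sorted(scaled.items()):
    print(resnum, round(opacity, 2), round(scale, 2))
```

Passing a dictionary such as {5: 3, 12: 2, 40: 1} gives the same result, since a list is first collapsed into counts.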
clean_structure(out_suffix='_clean', outdir=None, force_rerun=False, remove_atom_alt=True, keep_atom_alt_id='A', remove_atom_hydrogen=True, add_atom_occ=True, remove_res_hetero=True, keep_chemicals=None, keep_res_only=None, add_chain_id_if_empty='X', keep_chains=None)
Clean the structure file associated with this structure, and save it as a new file. Returns the file path.
- out_suffix (str) – Suffix to append to original filename
- outdir (str) – Path to output directory
- force_rerun (bool) – If structure should be re-cleaned if a clean file exists already
- remove_atom_alt (bool) – Remove alternate positions
- keep_atom_alt_id (str) – If removing alternate positions, which alternate ID to keep
- remove_atom_hydrogen (bool) – Remove hydrogen atoms
- add_atom_occ (bool) – Add atom occupancy fields if not present
- remove_res_hetero (bool) – Remove all HETATMs
- keep_chemicals (str, list) – If removing HETATMs, keep specified chemical names
- keep_res_only (str, list) – Keep ONLY specified resnames, deletes everything else!
- add_chain_id_if_empty (str) – Add a chain ID if not present
- keep_chains (str, list) – Keep only these chains
Path to cleaned PDB file
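To make the effect of a few of these flags concrete, the sketch below filters fixed-column PDB records in memory. It is not the method's implementation: keep_record and pdb_atom_line are hypothetical helpers, only a subset of the options is covered, and no file is written. Column positions follow the standard PDB ATOM/HETATM layout.

```python
def pdb_atom_line(record, serial, name, altloc, resname, chain, resseq, element):
    """Format a fixed-column PDB ATOM/HETATM record with dummy coordinates."""
    return ("{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2}").format(
        record, serial, name, altloc, resname, chain, resseq,
        0.0, 0.0, 0.0, 1.0, 0.0, element)


def keep_record(line, remove_atom_alt=True, keep_atom_alt_id="A",
                remove_atom_hydrogen=True, remove_res_hetero=True,
                keep_chemicals=None, keep_chains=None):
    """Return True if a PDB line survives cleaning under the given flags."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return True  # non-coordinate records pass through unchanged
    if isinstance(keep_chemicals, str):
        keep_chemicals = [keep_chemicals]
    if isinstance(keep_chains, str):
        keep_chains = [keep_chains]
    altloc, resname, chain = line[16], line[17:20].strip(), line[21]
    element = line[76:78].strip().upper()
    if remove_atom_alt and altloc not in (" ", keep_atom_alt_id):
        return False
    if remove_atom_hydrogen and element == "H":
        return False
    if remove_res_hetero and record == "HETATM" and resname not in (keep_chemicals or []):
        return False
    if keep_chains and chain not in keep_chains:
        return False
    return True


lines = [
    pdb_atom_line("ATOM", 1, "N", " ", "MET", "A", 1, "N"),
    pdb_atom_line("ATOM", 2, "H", " ", "MET", "A", 1, "H"),       # hydrogen
    pdb_atom_line("ATOM", 3, "CA", "B", "MET", "A", 1, "C"),      # alternate position B
    pdb_atom_line("HETATM", 4, "O", " ", "HOH", "A", 101, "O"),   # water
    pdb_atom_line("HETATM", 5, "FE", " ", "HEM", "A", 102, "FE"), # kept chemical
    "END",
]
cleaned = [line for line in lines if keep_record(line, keep_chemicals="HEM")]
print("\n".join(cleaned))
```

With the defaults plus keep_chemicals="HEM", only the backbone nitrogen, the heme iron, and the END record remain.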
Run Biopython’s search_ss_bonds to find potential disulfide bridges for each chain and store in ChainProp.
get_dict_with_chain(chain, only_keys=None, chain_keys=None, exclude_attributes=None, df_format=False)
get_dict method which incorporates attributes found in a specific chain. Does not overwrite any attributes in the original StructProp.
- chain –
- only_keys –
- chain_keys –
- exclude_attributes –
- df_format –
attributes of StructProp + the chain specified
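One way to read "does not overwrite" is that the structure's own attributes are left untouched and win over chain attributes with the same name. The sketch below follows that interpretation, which is an assumption; merge_without_overwrite is a hypothetical helper operating on plain dictionaries, not the method itself.

```python
def merge_without_overwrite(struct_attrs, chain_attrs, chain_keys=None):
    """Combine structure and chain attributes into a new dictionary."""
    merged = dict(struct_attrs)  # copy, so the original dict is not modified
    for key, value in chain_attrs.items():
        if chain_keys is not None and key not in chain_keys:
            continue  # only pull in the requested chain attributes
        merged.setdefault(key, value)  # existing structure keys are kept
    return merged


struct_attrs = {"id": "1abc", "is_experimental": True}
chain_attrs = {"id": "A", "seq_len": 120, "notes": "example"}
merged = merge_without_overwrite(struct_attrs, chain_attrs, chain_keys=["id", "seq_len"])
print(merged)
```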
Run DSSP on this structure and store the DSSP annotations in the corresponding ChainProp SeqRecords
Calculations are stored in the ChainProp’s letter_annotations at the following keys:
- outdir (str) – Path to where DSSP dataframe will be stored.
- force_rerun (bool) – If DSSP results should be recalculated
- Todo: also parse global properties, like total accessible surface area; Biopython does not appear to parse those.
get_freesasa_annotations(outdir, include_hetatms=False, force_rerun=False)
Run freesasa on this structure and store the calculated properties in the corresponding ChainProps
Run MSMS on this structure and store the residue depths/ca depths in the corresponding ChainProp SeqRecords
Gather chain sequences and store them in their corresponding ChainProp objects in the chains attribute
Parameters: model (Model) – Biopython Model object of the structure you would like to parse
Load a structure file and provide pointers to its location
- structure_path (str) – Path to structure file
- file_type (str) – Type of structure file
Read the 3D coordinates of a structure file and return it as a Biopython Structure object
Also create ChainProp objects in the chains attribute
Returns: Biopython Structure object
Return type: Structure
view_structure(opacity=1.0, recolor=False, gui=False)
Use NGLviewer to display a structure in a Jupyter notebook
- opacity (float) – Opacity of the structure
- recolor (bool) – If structure should be cleaned and recolored to silver
- gui (bool) – If the NGLview GUI should show up