SCRATCH

Secondary structure

Description

SCRATCH is a suite of tools that predicts many types of structural properties directly from sequence. ssbio contains wrappers to execute and parse results from SSpro/SSpro8 (predictors of secondary structure) and ACCpro/ACCpro20 (predictors of solvent accessibility).

Installation instructions (Ubuntu)

Note

These instructions were created on an Ubuntu 17.04 system.

  1. Download the source and install it using the perl script:

    mkdir /path/to/my/software/scratch
    cd /path/to/my/software/scratch
    wget http://download.igb.uci.edu/SCRATCH-1D_1.1.tar.gz
    tar -zxf SCRATCH-1D_1.1.tar.gz
    cd SCRATCH-1D_1.1
    perl install.pl
    
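  2. To run it from the command line directly:

    /path/to/my/software/scratch/SCRATCH-1D_1.1/bin/run_SCRATCH-1D_predictors.sh  input_fasta  output_prefix  [num_threads]
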
  3. ssbio also provides wrappers to run it and parse the results; see the API section below for details.

Program execution

In the shell

To run the program on its own in the shell:

    /path/to/my/software/scratch/SCRATCH-1D_1.1/bin/run_SCRATCH-1D_predictors.sh  input_fasta  output_prefix  [num_threads]

With ssbio

To run the program using the ssbio Python wrapper, see: ssbio.protein.sequence.properties.scratch.SCRATCH.run_scratch()
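
For orientation, a wrapper of this kind essentially has to assemble the shell command shown earlier and execute it. The following is a minimal sketch under that assumption; it is not ssbio's own code, the helper names build_scratch_command and run_scratch_cli are hypothetical, and the paths in the usage note are placeholders.

```python
import subprocess


def build_scratch_command(path_to_scratch, input_fasta, output_prefix, num_threads=1):
    """Return the argument list for run_SCRATCH-1D_predictors.sh.

    Argument order follows the documented usage:
    run_SCRATCH-1D_predictors.sh input_fasta output_prefix [num_threads]
    """
    return [str(path_to_scratch), str(input_fasta), str(output_prefix), str(num_threads)]


def run_scratch_cli(path_to_scratch, input_fasta, output_prefix, num_threads=1):
    """Run SCRATCH; raises subprocess.CalledProcessError on a non-zero exit status."""
    cmd = build_scratch_command(path_to_scratch, input_fasta, output_prefix, num_threads)
    # A list of arguments (no shell=True) avoids quoting problems with unusual paths.
    return subprocess.run(cmd, check=True)
```

Calling run_scratch_cli("/path/to/my/software/scratch/SCRATCH-1D_1.1/bin/run_SCRATCH-1D_predictors.sh", "my_protein.fasta", "results/my_protein", 4) would then run the predictors with four threads, assuming SCRATCH is installed at that location.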

FAQs

  • How do I cite SCRATCH?

    • Cheng J, Randall AZ, Sweredoski MJ & Baldi P (2005) SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 33: W72–6. Available at: http://dx.doi.org/10.1093/nar/gki396
  • I’m having issues running SCRATCH…

    • See the ssbio wiki for (hopefully) some solutions - or add yours in when you find the answer!

API

class ssbio.protein.sequence.properties.scratch.SCRATCH(project_name, seq_file=None, seq_str=None)[source]

Provide wrappers for running and parsing SCRATCH on a sequence file or sequence string.

To run from the command line:

./run_SCRATCH-1D_predictors.sh  input_fasta  output_prefix  [num_threads]

SCRATCH predicts:

  • Secondary structure

    • 3 classes (helix, strand, other) using SSpro
    • 8 classes (standard DSSP definitions) using SSpro8
  • Relative solvent accessibility (RSA, also known as relative accessible surface area)

    • At a 25% exposed RSA cutoff (<25% RSA means the residue is buried) using ACCpro
    • At all cutoffs in 5% increments from 0 to 100 using ACCpro20
accpro20_results()[source]

Parse the ACCpro20 output file and return a dict of relative solvent accessibility predictions.

accpro20_summary(cutoff)[source]

Parse the ACCpro20 output file and return a summary of percent exposed/buried residues based on a cutoff.

  • Below the cutoff = buried
  • Equal to or greater than the cutoff = exposed

The default cutoff used in ACCpro is 25%.

The output file is just a FASTA-formatted file, so you can get residue-level
information by parsing it like a normal sequence file.
Parameters: cutoff (float) – Cutoff for defining a buried or exposed residue.
Returns: Percentage of buried and exposed residues
Return type: dict
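
The cutoff rule above can be illustrated with a short, self-contained sketch. It assumes the per-residue accessibility values have already been read into a list of numbers between 0 and 100; the function name summarize_rsa and the dictionary keys are hypothetical and are not necessarily those returned by accpro20_summary.

```python
def summarize_rsa(rsa_values, cutoff=25):
    """Return percent buried/exposed for a list of per-residue RSA values (0-100).

    Residues below the cutoff count as buried; residues at or above it count as exposed.
    """
    if not rsa_values:
        raise ValueError("rsa_values is empty")
    total = len(rsa_values)
    exposed = sum(1 for value in rsa_values if value >= cutoff)
    return {
        "percent_exposed": 100.0 * exposed / total,
        "percent_buried": 100.0 * (total - exposed) / total,
    }
```

For example, summarize_rsa([0, 10, 25, 50], cutoff=25) counts the residues at 25 and 50 as exposed and the other two as buried.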
accpro_results()[source]

Parse the ACCpro output file and return a dict of relative solvent accessibility predictions.

accpro_summary()[source]

Parse the ACCpro output file and return a summary of percent exposed/buried residues.

The output file is just a FASTA-formatted file, so you can get residue-level
information by parsing it like a normal sequence file.
Returns: Percentage of buried and exposed residues
Return type: dict
run_scratch(path_to_scratch, num_cores=1, outname=None, outdir=None, force_rerun=False)[source]

Run SCRATCH on the sequence file that was loaded into the class.

Parameters:
  • path_to_scratch – Path to the SCRATCH executable, run_SCRATCH-1D_predictors.sh
  • num_cores – Number of threads to use, passed to the script as num_threads (default: 1)
  • outname – Prefix to name the output files
  • outdir – Directory to store the output files
  • force_rerun – Flag to force rerunning of SCRATCH even if the output files exist

Returns:

sspro8_results()[source]

Parse the SSpro8 output file and return a dict of secondary structure compositions.

sspro8_summary()[source]

Parse the SSpro8 output file and return a summary of secondary structure composition.

The output file is just a FASTA-formatted file, so you can get residue-level
information by parsing it like a normal sequence file.
Returns: Percentage of:

  • H: alpha-helix
  • G: 3₁₀-helix
  • I: pi-helix (extremely rare)
  • E: extended strand
  • B: beta-bridge
  • T: turn
  • S: bend
  • C: the rest

Return type: dict
sspro_results()[source]

Parse the SSpro output file and return a dict of secondary structure compositions.

Returns: Keys are sequence IDs, values are the lists of secondary structure predictions:

  • H: helix
  • E: strand
  • C: the rest

Return type: dict
sspro_summary()[source]

Parse the SSpro output file and return a summary of secondary structure composition.

The output file is just a FASTA-formatted file, so you can get residue-level
information by parsing it like a normal sequence file.
Returns: Percentage of:

  • H: helix
  • E: strand
  • C: the rest

Return type: dict
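
As a rough illustration of what a composition summary involves, the sketch below computes the percentage of each class from a prediction string that has one letter per residue. The function name ss_composition is hypothetical, and the actual structure of the dict returned by sspro_summary and sspro8_summary may differ.

```python
from collections import Counter


def ss_composition(prediction, classes="HEC"):
    """Return the percentage of each secondary structure class in a prediction string.

    `classes` lists the one-letter codes to report: "HEC" for the 3-class
    SSpro alphabet, or "HGIEBTSC" for the 8-class SSpro8 alphabet.
    """
    if not prediction:
        raise ValueError("prediction is empty")
    counts = Counter(prediction)
    # Counter returns 0 for classes that never occur, so every class gets an entry.
    return {code: 100.0 * counts[code] / len(prediction) for code in classes}
```

The same function covers both predictors: pass classes="HGIEBTSC" to summarize an 8-class prediction string.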
ssbio.protein.sequence.properties.scratch.read_accpro20(infile)[source]

Read the accpro20 output (.acc20) and return the parsed FASTA records.

Keeps the spaces between the accessibility numbers.

Parameters: infile – Path to .acc20 file
Returns: Dictionary of accessibilities with keys as the ID
Return type: dict
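
The following sketch shows one way such a file could be parsed by hand. It assumes a FASTA-like layout in which each ">" header line is followed by one or more lines of space-separated integers; that layout is an assumption based on the description above, not a verified specification of the .acc20 format. The function name parse_acc20 is hypothetical, and unlike read_accpro20, which keeps the spaces between the numbers, this version converts the values to integers.

```python
def parse_acc20(lines):
    """Parse FASTA-like ACCpro20 text into {sequence_id: [int, ...]}.

    `lines` is any iterable of strings, for example an open file handle.
    """
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = []
        elif current_id is not None:
            # Accessibility values may wrap over several lines; collect them all.
            records[current_id].extend(int(token) for token in line.split())
    return records
```

With a real file this could be used as `with open("my_protein.acc20") as handle: records = parse_acc20(handle)`, where the file name is a placeholder.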