SCRATCH
Description
SCRATCH is a suite of tools that predicts several types of structural properties directly from sequence. ssbio contains wrappers to execute and parse results from SSpro/SSpro8 (predictors of secondary structure) and ACCpro/ACCpro20 (predictors of relative solvent accessibility).
Installation instructions (Ubuntu)
Note
These instructions were created on an Ubuntu 17.04 system.
Download the source and install it using the Perl script:
mkdir /path/to/my/software/scratch
cd /path/to/my/software/scratch
wget http://download.igb.uci.edu/SCRATCH-1D_1.1.tar.gz
tar -zxf SCRATCH-1D_1.1.tar.gz
cd SCRATCH-1D_1.1
perl install.pl
To run it from the command line directly, see "In the shell" under Program execution. ssbio also provides wrappers to run it and parse the results; see "With ssbio" for details.
Program execution
In the shell
To run the program on its own in the shell…
/path/to/my/software/scratch/SCRATCH-1D_1.1/bin/run_SCRATCH-1D_predictors.sh input_fasta output_prefix [num_threads]
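The same invocation can be scripted from Python. The sketch below is a minimal illustration and not ssbio's implementation: build_scratch_command and run_scratch_cli are hypothetical helpers, and the file names are placeholders. Actually running the command assumes SCRATCH is installed at the path shown.

```python
import subprocess


def build_scratch_command(script_path, input_fasta, output_prefix, num_threads=None):
    """Assemble the argument list for run_SCRATCH-1D_predictors.sh.

    Hypothetical helper for illustration; it mirrors the usage line
    `input_fasta output_prefix [num_threads]`.
    """
    cmd = [script_path, input_fasta, output_prefix]
    if num_threads is not None:
        # num_threads is optional on the command line, so append it only when given
        cmd.append(str(num_threads))
    return cmd


def run_scratch_cli(script_path, input_fasta, output_prefix, num_threads=None):
    """Run the script; raises CalledProcessError on a non-zero exit code."""
    cmd = build_scratch_command(script_path, input_fasta, output_prefix, num_threads)
    return subprocess.run(cmd, check=True)


# Placeholder paths; nothing is executed here, the command is only assembled
example_cmd = build_scratch_command(
    "/path/to/my/software/scratch/SCRATCH-1D_1.1/bin/run_SCRATCH-1D_predictors.sh",
    "my_protein.fasta",
    "my_protein",
    num_threads=4,
)
print(" ".join(example_cmd))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when paths contain spaces.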
With ssbio
To run the program using the ssbio Python wrapper, see: ssbio.protein.sequence.properties.scratch.SCRATCH.run_scratch()
FAQs
How do I cite SCRATCH?
- Cheng J, Randall AZ, Sweredoski MJ & Baldi P (2005) SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 33: W72–6 Available at: http://dx.doi.org/10.1093/nar/gki396
I’m having issues running SCRATCH…
- See the ssbio wiki for (hopefully) some solutions - or add yours in when you find the answer!
API
class ssbio.protein.sequence.properties.scratch.SCRATCH(project_name, seq_file=None, seq_str=None)
Provide wrappers for running and parsing SCRATCH on a sequence file or sequence string.
To run from the command line:
./run_SCRATCH-1D_predictors.sh input_fasta output_prefix [num_threads]
SCRATCH predicts:
Secondary structure
- 3 classes (helix, strand, other) using SSpro
- 8 classes (standard DSSP definitions) using SSpro8
Relative solvent accessibility (RSA, also known as relative accessible surface area)
- @ 25% exposed RSA cutoff (<25% RSA means it is buried)
- @ all cutoffs in 5% increments from 0 to 100
accpro20_results()
Parse the ACCpro20 output file and return a dict of relative solvent accessibility predictions.
accpro20_summary(cutoff)
Parse the ACCpro20 output file and return a summary of percent exposed/buried residues based on a cutoff.
- Below the cutoff = buried
- Equal to or greater than the cutoff = exposed
- The default cutoff used in ACCpro is 25%.
The output file is just a FASTA-formatted file, so you can get residue-level information by parsing it like a normal sequence file.
Parameters: cutoff (float) – Cutoff for defining a buried or exposed residue.
Returns: Percentage of buried and exposed residues
Return type: dict
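The cutoff rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ssbio implementation: summarize_rsa is a hypothetical helper, the RSA values are made up, and the dictionary keys "buried" and "exposed" are chosen for the example and may differ from what accpro20_summary returns.

```python
def summarize_rsa(rsa_values, cutoff=25):
    """Return the percentage of buried and exposed residues.

    Below the cutoff = buried; equal to or greater than the cutoff = exposed.
    Hypothetical helper; the key names are assumptions for this example.
    """
    total = len(rsa_values)
    if total == 0:
        raise ValueError("rsa_values must contain at least one residue")
    exposed = sum(1 for value in rsa_values if value >= cutoff)
    buried = total - exposed
    return {
        "buried": 100.0 * buried / total,
        "exposed": 100.0 * exposed / total,
    }


# Made-up per-residue RSA predictions (percent), for illustration only
rsa = [0, 5, 10, 25, 30, 60, 90, 100]
summary = summarize_rsa(rsa, cutoff=25)
print(summary)  # {'buried': 37.5, 'exposed': 62.5}
```

Note that a residue exactly at the cutoff (25 here) counts as exposed.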
accpro_results()
Parse the ACCpro output file and return a dict of relative solvent accessibility predictions.
accpro_summary()
Parse the ACCpro output file and return a summary of percent exposed/buried residues.
The output file is just a FASTA-formatted file, so you can get residue-level information by parsing it like a normal sequence file.
Returns: Percentage of buried and exposed residues
Return type: dict
run_scratch(path_to_scratch, num_cores=1, outname=None, outdir=None, force_rerun=False)
Run SCRATCH on the sequence_file that was loaded into the class.
Parameters:
- path_to_scratch – Path to the SCRATCH executable, run_SCRATCH-1D_predictors.sh
- num_cores – Number of cores to use (default 1)
- outname – Prefix to name the output files
- outdir – Directory to store the output files
- force_rerun – Flag to force rerunning of SCRATCH even if the output files exist
Returns:
sspro8_results()
Parse the SSpro8 output file and return a dict of secondary structure compositions.
sspro8_summary()
Parse the SSpro8 output file and return a summary of secondary structure composition.
The output file is just a FASTA-formatted file, so you can get residue-level information by parsing it like a normal sequence file.
Returns: Percentage of:
- H: alpha-helix
- G: 3₁₀-helix
- I: pi-helix (extremely rare)
- E: extended strand
- B: beta-bridge
- T: turn
- S: bend
- C: the rest
Return type: dict
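As an illustration of what such a composition summary involves, the sketch below computes class percentages from one 8-class prediction string. It is a minimal sketch and not the ssbio implementation: ss8_composition is a hypothetical helper and the prediction string is made up.

```python
from collections import Counter

# The eight classes listed above, in the same order
SS8_CLASSES = "HGIEBTSC"


def ss8_composition(prediction):
    """Return the percentage of each of the 8 classes in a prediction string.

    Hypothetical helper for illustration; classes that do not occur get 0.0.
    """
    if not prediction:
        raise ValueError("prediction must not be empty")
    counts = Counter(prediction)
    unknown = set(counts) - set(SS8_CLASSES)
    if unknown:
        raise ValueError(f"unexpected class symbols: {sorted(unknown)}")
    total = len(prediction)
    return {cls: 100.0 * counts[cls] / total for cls in SS8_CLASSES}


# Made-up 20-residue prediction, for illustration only
prediction = "CCHHHHHHTTEEEESCCCGG"
composition = ss8_composition(prediction)
print(composition["H"], composition["E"], composition["C"])  # 30.0 20.0 25.0
```

The same approach works for the 3-class SSpro output by restricting the alphabet to H, E, and C.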
sspro_results()
Parse the SSpro output file and return a dict of secondary structure compositions.
Returns: Keys are sequence IDs, values are the lists of secondary structure predictions.
- H: helix
- E: strand
- C: the rest
Return type: dict
sspro_summary()
Parse the SSpro output file and return a summary of secondary structure composition.
The output file is just a FASTA-formatted file, so you can get residue-level information by parsing it like a normal sequence file.
Returns: Percentage of:
- H: helix
- E: strand
- C: the rest
Return type: dict
ssbio.protein.sequence.properties.scratch.read_accpro20(infile)
Read the accpro20 output (.acc20) and return the parsed FASTA records.
Keeps the spaces between the accessibility numbers.
Parameters: infile – Path to .acc20 file
Returns: Dictionary of accessibilities with keys as the ID
Return type: dict
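To make the idea concrete, here is a minimal sketch of parsing such a file by hand. It is not the ssbio implementation: parse_acc20_text is a hypothetical helper, and it assumes each record is a FASTA header line followed by whitespace-separated integer accessibility values, which may span several lines. The sample data is made up.

```python
def parse_acc20_text(text):
    """Parse FASTA-like text into {sequence ID: list of integer accessibilities}.

    Hypothetical helper; assumes whitespace-separated integers after each header.
    """
    records = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record ID
            header_words = line[1:].split()
            current_id = header_words[0] if header_words else ""
            records[current_id] = []
        elif current_id is not None:
            # Values stay separate because the spaces between them are kept
            records[current_id].extend(int(token) for token in line.split())
    return records


# Made-up example content, for illustration only
example_text = """>seq1
0 5 25 60
95 100
>seq2
10 15
"""
parsed = parse_acc20_text(example_text)
print(parsed)
```

Keeping the spaces matters because values such as 5 and 100 have different widths, so the numbers cannot be split apart by position alone.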