EMBOSS

Description

EMBOSS is the European Molecular Biology Open Software Suite, a wide array of general-purpose bioinformatics programs. The GEM-PRO pipeline mainly needs two of them: needle, the pairwise alignment tool (which can be replaced with Biopython’s built-in pairwise alignment function), and pepstats, the protein sequence statistics tool.

Installation instructions (Ubuntu)

Note

These instructions were created on an Ubuntu 17.04 system.

  1. Install the EMBOSS package, which contains many programs:

    sudo apt-get install emboss
    
  2. Once the installation finishes, try running the needle program:

    needle
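
To confirm from Python that the installation put the programs on the PATH, a minimal sketch along these lines can help. The helper name find_emboss_programs is hypothetical and not part of ssbio; it only uses the standard library.

```python
import shutil

# The two EMBOSS programs the GEM-PRO pipeline relies on.
REQUIRED_PROGRAMS = ("needle", "pepstats")


def find_emboss_programs(programs=REQUIRED_PROGRAMS):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, path in find_emboss_programs().items():
        print(f"{name}: {path or 'NOT FOUND, check the EMBOSS installation'}")
```

A value of None for either program usually means EMBOSS is not installed or its bin directory is missing from the PATH.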
    

Installation instructions (Mac OSX, other Unix)

  1. Download the EMBOSS source code, then build and install it from the source directory:

    ./configure
    make
    sudo make install
    

Program execution

In the shell

To run the program on its own in the shell…

needle
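
Called with no arguments, needle prompts interactively for its inputs. For scripted use it is usually more convenient to pass everything on the command line. The sketch below builds such a command from Python; the function names and file names are hypothetical, the gap penalties shown are needle's usual protein defaults, and this is not how ssbio itself calls needle.

```python
import shutil
import subprocess


def build_needle_command(a_fasta, b_fasta, outfile, gapopen=10.0, gapextend=0.5):
    """Return the needle command line as a list of arguments."""
    return [
        "needle",
        "-asequence", str(a_fasta),
        "-bsequence", str(b_fasta),
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-outfile", str(outfile),
        "-auto",  # never prompt; fail instead of waiting for input
    ]


def run_needle(a_fasta, b_fasta, outfile):
    """Run needle on two FASTA files and return the path of the alignment file."""
    if shutil.which("needle") is None:
        raise FileNotFoundError("needle not found on PATH; is EMBOSS installed?")
    subprocess.run(build_needle_command(a_fasta, b_fasta, outfile), check=True)
    return outfile
```

For example, run_needle("a.faa", "b.faa", "a_b.needle") would write the global alignment of the two sequences to a_b.needle.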

With ssbio

To run the program using the ssbio Python wrapper, see:

FAQs

  • How do I cite EMBOSS?

  • I’m having issues running EMBOSS programs…

    • See the ssbio wiki for (hopefully) some solutions - or add yours in when you find the answer!

API

ssbio.protein.sequence.properties.residues.biopython_protein_analysis(inseq)

Use Biopython’s ProteinAnalysis class to return general sequence properties of an amino acid string.

For full definitions see: http://biopython.org/DIST/docs/api/Bio.SeqUtils.ProtParam.ProteinAnalysis-class.html

Parameters: inseq – Amino acid sequence
Returns: Dictionary of sequence properties. Some definitions include:
  • instability_index: Any value above 40 means the protein is unstable (has a short half life).
  • secondary_structure_fraction: Fraction of the protein in helix, turn, or sheet.
Return type: dict

Todo

Finish definitions of dictionary
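
As a rough illustration of the kind of values involved, the sketch below calls Biopython's ProteinAnalysis directly and applies the instability threshold described above. The function names and dictionary keys are hypothetical and are not claimed to match what ssbio returns; Biopython must be installed for summarize_sequence to work.

```python
def is_predicted_unstable(instability_index, threshold=40.0):
    """Apply the rule of thumb: values above 40 suggest an unstable protein."""
    return instability_index > threshold


def summarize_sequence(inseq):
    """Collect a few ProteinAnalysis properties into a plain dict."""
    # Imported here so the helper above works without Biopython installed.
    from Bio.SeqUtils.ProtParam import ProteinAnalysis

    analysis = ProteinAnalysis(str(inseq))
    # Biopython returns three fractions in the order helix, turn, sheet.
    helix, turn, sheet = analysis.secondary_structure_fraction()
    instability = analysis.instability_index()
    return {
        "molecular_weight": analysis.molecular_weight(),
        "isoelectric_point": analysis.isoelectric_point(),
        "aromaticity": analysis.aromaticity(),
        "gravy": analysis.gravy(),
        "instability_index": instability,
        "predicted_unstable": is_predicted_unstable(instability),
        "fraction_helix": helix,
        "fraction_turn": turn,
        "fraction_sheet": sheet,
    }
```

Splitting the secondary structure tuple into named keys keeps the result flat, which is convenient when the dictionary is later loaded into a table.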

ssbio.protein.sequence.properties.residues.emboss_pepstats_on_fasta(infile, outfile='', outdir='', outext='.pepstats', force_rerun=False)

Run EMBOSS pepstats on a FASTA file.

Parameters:
  • infile – Path to FASTA file
  • outfile – Name of output file without extension
  • outdir – Path to output directory
  • outext – Extension of results file, default is “.pepstats”
  • force_rerun – Flag to rerun pepstats
Returns: Path to output file.
Return type: str
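
The parameters above suggest a simple pattern: derive the output path, then call pepstats only if needed. The following is a minimal sketch of that pattern, not ssbio's actual implementation. It assumes that the output name defaults to the input file's base name, that the output directory defaults to the input file's directory, and that an existing output file is reused unless force_rerun is set.

```python
import os
import subprocess


def pepstats_output_path(infile, outfile="", outdir="", outext=".pepstats"):
    """Build the results path (assumed defaults: input base name and directory)."""
    base = outfile or os.path.splitext(os.path.basename(infile))[0]
    directory = outdir or os.path.dirname(infile)
    return os.path.join(directory, base + outext)


def run_pepstats(infile, outfile="", outdir="", outext=".pepstats", force_rerun=False):
    """Run EMBOSS pepstats on a FASTA file and return the path to the results."""
    result = pepstats_output_path(infile, outfile, outdir, outext)
    # Assumption: skip the run when the results already exist.
    if force_rerun or not os.path.exists(result):
        subprocess.run(
            ["pepstats", "-sequence", infile, "-outfile", result, "-auto"],
            check=True,
        )
    return result
```

Keeping the path logic in its own function makes it possible to check where results will land without EMBOSS being installed.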

ssbio.protein.sequence.properties.residues.emboss_pepstats_parser(infile)

Get dictionary of pepstats results.

Parameters: infile – Path to pepstats outfile
Returns: Parsed information from pepstats
Return type: dict

Todo

Currently only the bottom of the file is parsed, for the percentages of properties.
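
To make the idea concrete, here is a minimal sketch of parsing the property table at the bottom of a pepstats report. It is not the ssbio parser: the function name is hypothetical, the sample text is an abbreviated report with made-up numbers, and the code assumes a single-sequence report whose final table starts with a "Property" header row.

```python
def parse_pepstats_properties(text):
    """Return {property_name: mole_percent} from the trailing property table."""
    percentages = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Property":
            # Header row of the table: Property, Residues, Number, Mole%
            in_table = True
            continue
        if in_table and len(fields) == 4 and fields[1].startswith("("):
            name, _residues, _count, mole_percent = fields
            percentages[name.lower()] = float(mole_percent)
    return percentages


# Abbreviated, illustrative report; the values are invented for the example.
SAMPLE = """\
Molecular weight = 11000.00   Residues = 100

Property        Residues                Number          Mole%
Tiny            (A+C+G+S+T)             30              30.000
Small           (A+B+C+D+G+N+P+S+T+V)   52              52.000
Charged         (B+D+E+H+K+R+Z)         25              25.000
"""

if __name__ == "__main__":
    print(parse_pepstats_properties(SAMPLE))
```

For a FASTA file with several sequences, pepstats writes one report per sequence, so a fuller parser would need to split the file per sequence before applying this step.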

ssbio.protein.sequence.properties.residues.flexibility_index(aa_one)

From Smith DK, Radivojac P, Obradovic Z, et al. Improved amino acid flexibility parameters. Protein Sci. 2003, 12:1060.

Author: Ke Chen

Parameters:aa_one

Returns:

ssbio.protein.sequence.properties.residues.grantham_score(ref_aa, mut_aa)

https://github.com/ashutoshkpandey/Annotation/blob/master/Grantham_score_calculator.py