A simple Taverna tutorial
Bioinformatics workflows are an extremely effective way of reducing the tedium of the traditional ‘cut & paste’ approach to using distributed services on the web. The Taverna Workbench is a fantastic tool for automating in-silico experiments. Automating these large-scale analysis pipelines not only saves time and reduces human error, it also enables provenance tracking of intermediate results and allows for method re-use through publishing the workflows you have designed. If you are a bioinformatician and you are not familiar with any kind of workflow technology, then you need to catch up.
This is a simple tutorial on how to build a pipeline for the analysis of protein sequences. In this example I will demonstrate how to automate numerous BLASTp jobs on a large list of protein sequences.
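First, add a workflow input named ‘proteins’. The BLAST program requires each protein to be in FASTA format, so the input has to be split into individual sequences. The split is done on the ‘>’ character that starts every FASTA record, and a small piece of code is needed first to remove the leading ‘>’ so that the split does not begin with an empty item: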
public class RemoveFirstCharacter {
    // Drops the first character of the input (the leading '>' of the FASTA list).
    public String removeFirstCharacter(String fastaList) {
        return fastaList.substring(1);
    }
}
To use this class as a web service, save the file with a .jws extension in your jakarta/webapps/axis directory. To add it to Taverna, right-click on the Available processors, select ‘add new WSDL scavenger’ and paste in the URL of the WSDL file (Axis normally serves the WSDL at the .jws URL with ?wsdl appended). Alternatively, place the code in a Beanshell script. Connect the ‘proteins’ input to this processor, then follow these instructions:
- Connect the output of RemoveFirstCharacter to a ‘Split string into string list by regular expression’ processor (found in the Available Services window > Local Services > Local Java Widgets > text menu).
- A second input is required for this processor so you need to add another Local Service processor known as a String Constant to your workflow and edit the string value to ‘>’.
- This needs to be connected to the ‘Regex’ port.
- Each sequence must then be converted back into FASTA format by adding another processor ‘Concatenate_two_strings’.
- Connect the output of ‘Split string into string list by regular expression’ to the ‘string2’ input port of the concatenate processor, and connect the same String Constant (‘>’) to the ‘string1’ port. Each protein is now a separate FASTA record in a list.
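The steps above can be sketched in plain Java to make the data flow concrete. This is only an illustration of what the three processors do together, not Taverna code: the class name `FastaSplitSketch`, the method `splitFasta` and the sample sequences are made up for this example, and it assumes that ‘>’ only ever appears at the start of a record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the three workflow processors; not part of Taverna.
class FastaSplitSketch {

    static List<String> splitFasta(String fastaList) {
        // RemoveFirstCharacter: drop the leading '>' so the split
        // below does not start with an empty string.
        String withoutFirst = fastaList.substring(1);

        // 'Split string into string list by regular expression' with regex ">".
        String[] pieces = withoutFirst.split(">");

        // 'Concatenate_two_strings': string1 = ">", string2 = each piece.
        List<String> records = new ArrayList<>();
        for (String piece : pieces) {
            records.add(">" + piece);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up two-record input, for illustration only.
        String input = ">protein_one\nMKTAYIAK\n>protein_two\nGAVLIPFW\n";
        for (String record : splitFasta(input)) {
            System.out.print(record);
        }
    }
}
```

Without the first step, Java's `split` would return an empty string as the first item, which the concatenate step would turn into a lone ‘>’.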
The output of the concatenate processor can then be connected to any number of services, as each protein will be sent in turn to the next processor. In this example I will demonstrate using BLASTp. The WSDL file for this BLAST service can be found here.
- Select the searchSimple processor from the list once it appears in the Available Services window.
- Connect the output port of the concatenate processor to the ‘query’ input of the BLAST service.
- This service requires two extra inputs to define the database and the program. Therefore, add another two String Constants to the workflow.
- These can be renamed, for example to ‘Database’ and ‘Program’. In this example edit the string value of the Database processor to ‘SWISS’ and the Program to ‘blastp’.
- Connect these processors to the relevant input ports of the searchSimple processor.
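Because the ‘query’ port receives a list while the database and program are single values, the workflow runs one BLAST job per protein and reuses the two constants each time. The sketch below shows that pairing in plain Java. It does not call the real service: `BlastJobSketch`, `BlastJob` and `buildJobs` are hypothetical names for this illustration, and the actual signature of searchSimple is not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of how the inputs are paired; makes no service call.
class BlastJobSketch {

    // Holds the three inputs named in the tutorial for one invocation.
    static final class BlastJob {
        final String program;
        final String database;
        final String query;

        BlastJob(String program, String database, String query) {
            this.program = program;
            this.database = database;
            this.query = query;
        }
    }

    // One job per FASTA record; the two constants are reused for every job.
    static List<BlastJob> buildJobs(List<String> fastaRecords, String program, String database) {
        List<BlastJob> jobs = new ArrayList<>();
        for (String record : fastaRecords) {
            jobs.add(new BlastJob(program, database, record));
        }
        return jobs;
    }

    public static void main(String[] args) {
        // Made-up records, for illustration only.
        List<String> records = Arrays.asList(">protein_one\nMKTAYIAK\n", ">protein_two\nGAVLIPFW\n");
        for (BlastJob job : buildJobs(records, "blastp", "SWISS")) {
            System.out.println(job.program + " against " + job.database + ": " + job.query.length() + " characters");
        }
    }
}
```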
In my next post I will show, with examples, how to extract specific information from the BLAST reports and how to store this data in a relational database.
4 comments so far
Thanks for this tutorial. I will check it out.
Hi,
This tutorial appears to be good, but I can’t get the simple code that was posted in the link. The link goes to an empty page. How can I obtain this code to complete the tutorial?
Thanks, Guilherme
Hi,
Thank you for this tutorial. I set the database to SWISS and I obtain the swissprot BLAST result. I would like to know how I can extract the sequences from the BLAST result to do a multiple sequence alignment. I have tried a lot of things, such as seqret, which reads different files. Do you have any idea?
Thank you,
Mariana
Great tutorial!
Note that you shouldn’t need to make your own web service for such little shims as removeFirstCharacter(); in most cases the code should also work fine as a Beanshell:
Add a Beanshell processor called “removeFirstCharacter”. Create an input port called “input” and an output “output”. Set the script to:
String output = input.substring(1);