Author: N. Gossmann
ConCysFind is a pipeline tool that searches for conserved amino acids in protein sequences of the plant kingdom.

ConCysFind was developed on behalf of the department "Plant Biochemistry and Physiology" at the University of Bielefeld, with support from the department "Computational Metagenomics" at the same university. It is based on the pipeline of A. Sahm, who searched for conserved cysteines in transcription factors of the Plant Transcription Factor Database (PlantTFDB) and showed conservation for several transcription factors. Previous versions searched for conserved cysteines only. The current version can also search for further conserved amino acids: tryptophan, serine, threonine, tyrosine and methionine.

The search for conserved amino acids is limited to the plant kingdom. For this purpose, 20 plant species were selected to cover as many families of the plant kingdom as possible. A phylogenetic tree of these species was created with the Tree of Life Web Project (Maddison 2007). All protein sequences of these plant species were downloaded from UniProt in FASTA format, and ConCysFind searches against this sequence set.

Working steps:

  1. Find homologues of the query sequence with the bundled blastp tool, then keep the best homologue from each species so that the conservation across all species is represented.
  2. Build a multiple alignment of all filtered homologues (one homologue per species) with a heuristic progressive method (greedy algorithm).
  3. Calculate a degree of conservation (Cysteine Score and P-value) from the computed multiple alignment.
  4. Additionally, build a phylogenetic guide tree from the multiple alignment with the Neighbor Joining algorithm (Saitou and Nei 1987).
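
The filtering in step 1 and the idea behind step 3 can be illustrated with a minimal Python sketch. It is not ConCysFind's code: the function names, the (species, subject ID, bit score) hit format and the simple "fraction of sequences sharing the residue" measure are assumptions made for illustration. In particular, the fraction is a stand-in and not the actual Cysteine Score or P-value, which are not defined in this description.

```python
def best_hit_per_species(hits):
    """Keep the highest-scoring homologue for each species.

    `hits` is an iterable of (species, subject_id, bit_score) tuples.
    This tuple layout is an assumption, not ConCysFind's internal format.
    """
    best = {}
    for species, subject_id, bit_score in hits:
        if species not in best or bit_score > best[species][1]:
            best[species] = (subject_id, bit_score)
    return {species: subject_id for species, (subject_id, _) in best.items()}


def column_conservation(aligned_seqs, residue="C", query_index=0):
    """Fraction of aligned sequences sharing `residue` at each query position.

    Returns {position: fraction}, where position is 1-based in the
    ungapped query sequence. Illustrative stand-in for a conservation
    score, not the score ConCysFind reports.
    """
    query = aligned_seqs[query_index]
    result = {}
    position = 0
    for column, symbol in enumerate(query):
        if symbol == "-":  # gap in the query: no query position here
            continue
        position += 1
        if symbol == residue:
            matches = sum(1 for seq in aligned_seqs if seq[column] == residue)
            result[position] = matches / len(aligned_seqs)
    return result


# Toy data: two hits for one species, one hit for another.
hits = [
    ("A. thaliana", "hit_1", 50.0),
    ("A. thaliana", "hit_2", 80.0),
    ("O. sativa", "hit_3", 60.0),
]
selected = best_hit_per_species(hits)

# Toy alignment with the query in the first row.
alignment = ["MC-AC", "MCTAS", "MC-GC", "-CTAC"]
scores = column_conservation(alignment, residue="C")
```

With this toy alignment, the cysteine at query position 2 is shared by all four sequences and the one at position 4 by three of four.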

ConCysFind has two functions:

  1. The first function takes tab-separated (TSV) input with the UniProt ID in the first column and the protein description in the second column. You can enter several queries, one per line, or upload a TSV input file. For every query, a multiple alignment and the phylogenetic tree are computed, and for every occurrence of the searched amino acid in the query sequence an amino acid score and a P-value are calculated.
  2. The second function takes a single amino acid sequence (without a UniProt ID). You can also upload a file containing exactly one sequence. In the following parameter view you can choose whether to test amino acid conservation for all found positions (by setting the searched position to 0) or for one particular position only. If a particular position is given, the program repeats working steps 2-4 with further homologues (max. 9 iterations). The idea is to account for possible isoforms, mutations or sequencing errors in the protein sequence, and to possibly find a higher amino acid score by looking at the next most similar homologues.
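
The two input modes can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the accessions in the example are placeholders, and 1-based position numbering is assumed because the description does not state how positions are counted.

```python
def parse_queries(tsv_text):
    """Parse TSV input: UniProt ID in column 1, description in column 2.

    Blank lines are skipped. A missing description becomes an empty string.
    """
    queries = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        uniprot_id = fields[0].strip()
        description = fields[1].strip() if len(fields) > 1 else ""
        queries.append((uniprot_id, description))
    return queries


def positions_to_check(sequence, residue="C", searched_position=0):
    """Select which residue positions to test for conservation.

    searched_position == 0 means "all found positions"; any other value
    selects that single position (assumed 1-based here).
    """
    found = [i for i, symbol in enumerate(sequence, start=1) if symbol == residue]
    if searched_position == 0:
        return found
    if searched_position not in found:
        raise ValueError(
            f"no {residue} at position {searched_position} of the query sequence"
        )
    return [searched_position]


# Function 1: several queries, one per line (placeholder IDs).
queries = parse_queries("P12345\tExample protein A\n\nQ67890\tExample protein B\n")

# Function 2: a single sequence, all cysteines or one chosen position.
all_positions = positions_to_check("MCACW", residue="C", searched_position=0)
one_position = positions_to_check("MCACW", residue="C", searched_position=4)
```

Rejecting a position that does not carry the searched amino acid is a design choice of this sketch. How ConCysFind itself handles such input is not specified here.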

