Friday, May 2, 2025

Cloning in silico


EST-based

Databases of pre-clustered ESTs

A shortcut for obtaining either a consensus sequence (TIGR) or a set of ESTs (UniGene) derived from a gene of interest.

  • STACKdb (limited access, tissue-specific splice forms) [1]
  • UniGene (no consensus sequence) [2]
  • TIGR [3]

Search of EST databases using BLAST

  1. Depending on the level of homology, use:
  • the blastn program with a cDNA sequence as the query against an EST DB from the same species (discovery of novel splice forms in that species)
  • the tblastn program with a protein sequence as the query against an EST DB from the same species (paralogue discovery) or from other species (cloning of any homologs)

If possible, use a protein sequence from a closely related species (e.g. a zebrafish protein when looking for a homolog in salmon), although for a large number of proteins homology can be detected even between human and C. elegans.
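The choice between blastn and tblastn described above can be written down as a small helper. This is a minimal sketch, assuming a local NCBI BLAST+ installation rather than the web interface; the function name, the database name and the file names are hypothetical, and the E-value cutoff is only an illustrative starting point.

```python
def build_blast_command(query_fasta, query_type, est_db, out_path, evalue=1e-10):
    """Return an argument list for a BLAST+ search against an EST database.

    query_type: "cdna" selects blastn, "protein" selects tblastn.
    """
    programs = {"cdna": "blastn", "protein": "tblastn"}
    if query_type not in programs:
        raise ValueError("query_type must be 'cdna' or 'protein'")
    return [
        programs[query_type],
        "-query", query_fasta,
        "-db", est_db,            # hypothetical, pre-formatted EST database
        "-evalue", str(evalue),   # illustrative cutoff, adjust to the homology level
        "-outfmt", "6",           # tabular output, one hit per line
        "-out", out_path,
    ]


# Protein query against a (hypothetical) porcine EST database
command = build_blast_command("zebrafish_protein.fasta", "protein", "pig_est", "round1.tsv")
print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the database exists locally.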

  2. Restrict the BLAST search by species, e.g. search only porcine ESTs, to simplify the output.
  3. On the BLAST output page, select reasonable hits by ticking the checkbox to the left of each hit in the alignment section.
  4. Retrieve all checked results as a FASTA file (e.g. pig_Xgene_ESTs_date_round1.fasta).
  5. Check how many sensible hits you got, e.g. by counting the FASTA header lines with grep on Unix/Linux:
grep -c '>' pig_Xgene_ESTs_date_round1.fasta
  6. Assemble all your EST sequences using phrap (on the Unix command line):
phrap pig_Xgene_ESTs_date_round1.fasta

You should get the file pig_Xgene_ESTs_date_round1.fasta.contigs
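The same record count can be taken in Python, both for the retrieved ESTs and for the .contigs file written by the assembler. This is a minimal sketch; the function names are illustrative, and it counts only lines that begin with '>', which is slightly stricter than matching '>' anywhere in a line.

```python
def count_fasta_records(lines):
    """Count FASTA records: every record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Count the records in a FASTA file on disk."""
    with open(path) as handle:
        return count_fasta_records(handle)


# Small in-memory example with two records
example = [">est1\n", "ACGTACGT\n", ">est2\n", "TTGACA\n"]
print(count_fasta_records(example))
```

Calling `count_fasta_file` on the input FASTA file and on the .contigs file shows how many ESTs went into the assembly and how many contigs came out of it.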

If you do not have phrap you may use:

  • CAP3
  • ESSEM (Est’s aSSEmbly using Malig) from the Technical University of Catalonia.

To check how this works, you may download the human SYNGR4 ESTs (http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?UGID=221005&TAXID=9606&SEARCH=), save them as a FASTA file, and feed that file to CAP3 or ESSEM. The suggested assembly sequence:

>assembly: gnl|UG|Hs# -> gnl|UG|Hs# (R)
TTTTTTTTTTTTTTTGTTTTTAGAAACCCTTCTGGAGGGAGGATTCTCTCTTTATTGATTTGGATAAGGATATTTAGTTG
TCAGGCATCATAGCAAGCCGGGGGGACTTTGGAGCGGTCAGACAGGGGGACAGGGCAGAGCTAGCATAACTCAGGCTGTT
GGGGCCAGTGGTGGGCATGTTCACAGGGCTGTTGGCAGAGGGCAAGGGGAGGGTGGTCAGCACCATGCCACCCTCATCCA
GGAAGCGCTTGTAAGGGACTGGAGCATCATTTCGGAGGTCCTGGAATGCCAGGTAGGCCTGGAATATCCAGACAAGGATG
GAGAAGAAGGTGAAGGCGATGGCTGCCTGGCACTGCTGCTCCCCAGGAGGAACTCTTTGGGCGGCGAATGCTGCCATTGG
TTGGCCAGGAAGCAGAAACCCATGAACCAGACAACTGCCCAGAGAACAGCCAGGATGAAGTCCAGGAGCTGGAAGGCTGT
CTTGAAGCGGGTGCCGGCAATGCGGGTCTCCTGTGTGTCCAGGACGAGGAAGGCCAGCCACGCTGAGGAAGGCCAGGAAG
CCGGCTCCCACGGCAAAGCTGCAGGCCACGCTGTTGCTGTTGAGAATGCAGTGGAGCTGCGGAGACTCCATCTTGTTCTG
GTAGCCGTCGGTCAGCAGGGAGGAGAAGACGATCAGGGAGAAGACCCCTGCCTCCCCCACACTCTCCTTCTGCCACCAAA
CC
  7. Mask possible repeats using the RepeatMasker server. EST libraries are notorious for containing unspliced ESTs and contaminating sequences.
  8. Use the masked consensus sequence (MCS) from the step above in the next round of BLAST searching:

blastn program, MCS as query, EST DB from the same species.

Check how many sensible hits you got.
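Before reusing a masked consensus as a query, it can help to see how much of it was masked, since a heavily masked sequence leaves little for BLAST to match. The sketch below assumes that masked positions appear as N or X, or as lowercase letters when soft-masking was requested; the function name is illustrative, and any threshold applied to the result is your own judgment call.

```python
def masked_fraction(sequence):
    """Fraction of bases masked as N/X or soft-masked (lowercase)."""
    seq = "".join(sequence.split())  # drop whitespace and line breaks
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base in "NX" or base.islower())
    return masked / len(seq)


# Half of this toy sequence is hard-masked
print(masked_fraction("ACGTNNNN"))
```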

  9. Repeat the EST assembly and repeat masking, comparing the new EST contigs with the contigs from the previous round, until you get no new hits in the EST database.
  10. After every assembly step, make sure that the contig you use contains the sequence of interest (i.e. compare it with the initial cDNA or protein sequence).
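The stopping rule of the procedure, iterating until a round returns no new ESTs, can be sketched as a loop. This is only an outline under stated assumptions: `search` and `assemble` are hypothetical callables standing in for the BLAST search and for assembly plus repeat masking, and `max_rounds` is an arbitrary safety limit that is not part of the original procedure.

```python
def iterate_est_search(seed_query, search, assemble, max_rounds=10):
    """Repeat search and assembly until a round adds no new EST hits.

    search(query)  -> dict mapping EST identifier to sequence (hypothetical)
    assemble(ests) -> consensus sequence used as the next query (hypothetical)
    """
    collected = {}
    query = seed_query
    for _ in range(max_rounds):
        hits = search(query)
        new_ids = set(hits) - set(collected)
        if not new_ids:
            break  # no new hits: the EST set has converged
        collected.update(hits)
        query = assemble(collected)
    return query, collected


# Toy stand-ins: an EST "hits" when its first two bases occur in the query
toy_db = {"e1": "AAAC", "e2": "AACG", "e3": "ACGT"}

def toy_search(query):
    return {name: seq for name, seq in toy_db.items() if seq[:2] in query}

def toy_assemble(ests):
    return "".join(sorted(ests.values()))

final_query, ests = iterate_est_search("AAT", toy_search, toy_assemble)
print(sorted(ests))
```

In real use the check against the initial cDNA or protein sequence would sit inside the loop, right after each assembly.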

Genome annotation using EST assemblies

  • PASA http://www.tigr.org/tdb/e2k1/ath1/pasa_annot_updates/pasa_annot_updates.shtml

Importing human, mouse, and zebrafish EST trace files

Trace files, and even experiment files, are available for a significant subset of human, mouse, and zebrafish ESTs. For reliable gene cloning we need them because:

  • sequences in GenBank are usually shorter than the reads in the original trace files
  • a sequencing error cannot be detected in a plain-text/FASTA file without looking at the trace file

To get them one can search for relevant trace files using Sanger’s Trace server:

http://trace.ensembl.org/cgi-bin/tracesearch

or NCBI http://www.ncbi.nlm.nih.gov/blast/mmtrace.shtml

After the BLAST search, the trace files can be retrieved as a compressed tar archive in SCF or RCF format. RCF is an encoded and shrunk form of SCF; if you plan to fetch a large number of trace files, obtain and compile the rcf2scf program so that you can download the smaller RCF files and shorten transfer times.
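After unpacking the archive (and converting RCF files where needed), a quick check that each file really is SCF can save time before loading it into a trace viewer. The sketch below relies on the SCF header beginning with the four bytes ".scf"; the function names are illustrative, and the check says nothing about whether the rest of the file is intact.

```python
SCF_MAGIC = b".scf"  # first four bytes of an SCF header


def looks_like_scf(data):
    """True if the byte string starts with the SCF magic number."""
    return data[:4] == SCF_MAGIC


def looks_like_scf_file(path):
    """Read the first four bytes of a file and test them against the magic number."""
    with open(path, "rb") as handle:
        return looks_like_scf(handle.read(4))


print(looks_like_scf(b".scf\x00\x00\x00\x00"))
```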

Genome-based

  • based on homology
  • de novo

This will be covered in the genome annotation guide.

 

 This article is a stub. You can help OpenWetWare by expanding it.
