How do I use Witness for the Whales?
If you have a tissue sample from a cetacean, you can obtain expert and reliable identification of the species in two steps:
- Use standard molecular laboratory techniques to obtain a nucleotide sequence from the mtDNA control region (5' end) OR mtDNA cytochrome b (5' end).
- Submit the sequence to this site and select the appropriate reference sequence dataset for comparison. An advanced cluster search option lets you perform a bootstrap analysis, while the maximum likelihood option performs more rigorous statistical analyses in placing your query sequence on the tree. Both the advanced cluster and maximum likelihood options will send you the results by email.
Be aware that there are issues of interpretation which you must bear in mind when using this site.
Search Strategy
You will have the greatest success if you use a hierarchical or iterative approach to identifying the source of your sequence. There are several reference sets to choose from, and each is available for both the mtDNA control region and cytochrome b.
- Using the simple search strategy, start with the first reference set (All Cetaceans) to determine the suborder or family which is most closely related to your sequence. You may wish to view a summary of the phylogeny of whales, dolphins and porpoises which forms the basis of our reference datasets.
- Then choose one of the more specific and more detailed reference sets to fine-tune your analysis.
- Then, repeat the search using the advanced mode, and use bootstrap resampling to evaluate the robustness of your identification.
- If bootstrap support for the species grouping of your test sequence is low, it may be worth doing a full alignment (versus a profile alignment) with the appropriate reference dataset.
Submitting a Sequence
To submit a sequence for analysis:
- click on the Simple search link
- paste your sequence into the Data Entry window
- select the reference dataset and the genomic locus
- click on the Submit button
>mysample
ACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAGCTGAAGGAATC
GTAGAAATTAAACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAG
CTGAAGGAATCTGTAGAAATTAA
or
ACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAGCTGAAGGAATC
GTAGAAATTAAACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAG
CTGAAGGAATCTGTAGAAATTAA
Only one sequence may be submitted at a time.
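If you prepare submissions with a script, the two input layouts in the example (a name line starting with `>` followed by the sequence, or the bare sequence alone) can be normalised before pasting. The following is a minimal Python sketch, not the site's own code. The function name `parse_submission` is hypothetical, and the conversion to upper case is an assumption made for convenience.

```python
def parse_submission(text):
    """Return (name, sequence) from a named record or a bare sequence.

    name is None when the text has no leading '>' line.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty submission")

    name = None
    if lines[0].startswith(">"):
        # Text after '>' is the sample name; an empty name is treated as None.
        name = lines[0][1:].strip() or None
        lines = lines[1:]

    # A second '>' line would mean more than one record was pasted.
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("only one sequence may be submitted at a time")

    # Join the wrapped lines into one string (upper-casing is an assumption).
    sequence = "".join(lines).upper()
    if not sequence:
        raise ValueError("no sequence data found")
    return name, sequence
```

For example, `parse_submission(">mysample\nACCA\nGTAG")` returns the name `mysample` and the joined sequence `ACCAGTAG`.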
If your sequence contains illegal characters, that is those not included in the IUPAC ambiguity codes, then it will be rejected with an error message. If your sequence does contain any of the ambiguity codes, then they will be used both in aligning the sequence and in calculating evolutionary distances.
Your sequence will be analysed automatically. Please wait about 15 seconds and then click the Retrieve Results button to view your results. It will take longer for results to become available if full alignment and/or bootstrap resampling are requested.
IUPAC Nucleotide Codes
Ambiguous | Symbol | Meaning | Origin of designation |
G | G | Guanine | |
A | A | Adenine | |
T | T | Thymine | |
C | C | Cytosine | |
U | U | Uracil | |
X | R | G or A | puRine |
X | Y | T or C | pYrimidine |
X | M | A or C | aMino |
X | K | G or T | Keto |
X | S | G or C | Strong interaction (3 H bonds) |
X | W | A or T | Weak interaction (2 H bonds) |
X | H | A or C or T | not-G, H follows G in the alphabet |
X | B | G or T or C | not-A, B follows A |
X | V | G or C or A | not-T (not-U), V follows U |
X | D | G or A or T | not-C, D follows C |
X | N | G or A or T or C | aNy |
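To avoid a rejected submission, you can check a sequence against the table above before pasting it in. This is a minimal Python sketch under stated assumptions: the names `IUPAC_CODES`, `illegal_characters` and `is_ambiguous` are hypothetical, and treating lower-case letters as equivalent to upper-case is an assumption about what the site accepts.

```python
# Symbol -> the bases it can stand for, taken from the table above.
IUPAC_CODES = {
    "G": "G", "A": "A", "T": "T", "C": "C", "U": "U",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def illegal_characters(sequence):
    """Return the distinct characters that are not IUPAC nucleotide codes."""
    return sorted(set(sequence.upper()) - set(IUPAC_CODES))


def is_ambiguous(symbol):
    """True when the symbol stands for more than one base."""
    return len(IUPAC_CODES[symbol.upper()]) > 1
```

An empty list from `illegal_characters` means every character in the sequence appears in the table.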
Advanced search and bootstrapping
The Advanced search window adds further functions to the search process:
Bootstrapping
To perform a bootstrap analysis:
- click on the Advanced search link
- paste your sequence into the Data Entry window
- select the reference dataset and genomic locus
- select the number of bootstrap replicates you require
- optionally enter an email address to which the results will be sent
- click on the Submit button
Emailed response
You can choose to have the results sent to you by email. If you enter an optional email address, you can close your browser once the search has been submitted.
Maximum Likelihood Analysis
The reference alignment and its associated phylogenetic tree are treated as prior knowledge about the relationships among the reference organisms. In principle, the query sequence can be joined to that tree on any branch. We seek the connection point with the highest statistical likelihood, which gives the maximum likelihood estimate of the relationship between the query and reference sequences. The maximum likelihood connection point is represented in the output by a dashed branch. For a particular connection point, the likelihood score is the maximum likelihood estimate under the associated topology (that is, all the branch lengths are re-optimised for each connection point).
The Shimodaira-Hasegawa (SH) test is used to assess a confidence limit on the connection point with the highest expected likelihood. The expected likelihood of a connection point is the expectation of its likelihood (treated as a random variable) under the true process of evolution. The SH test calculates such a confidence limit by simulating replicate datasets under an approximation of the least favourable configuration (LFC), in which all connection points have equal expected likelihoods, and comparing the observed differences in likelihood with the expected distribution of likelihood differences under the LFC.
The implementation of the SH test used here simulates 1000 non-parametric bootstrap replicates and uses the RELL approximation (Shimodaira and Hasegawa 1999). Branches that represent connection points within the confidence limit are coloured red. A critical value of α = 0.05 is used (95% confidence limit).
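The general shape of an SH test with RELL resampling can be sketched in a few lines of Python. This illustrates the published procedure under stated assumptions and is not the code used by this site. The function names `sh_test_rell` and `confidence_set` are hypothetical, and the input is assumed to be a table of per-site log-likelihoods, one row per candidate connection point.

```python
import random


def sh_test_rell(site_lnl, replicates=1000, seed=0):
    """Return one SH p-value per connection point.

    site_lnl[i][s] is the log-likelihood of site s under connection point i.
    """
    rng = random.Random(seed)
    n_points = len(site_lnl)
    n_sites = len(site_lnl[0])

    # Observed statistic: how far each point falls below the best one.
    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    observed = [best - t for t in totals]

    # RELL: resample sites with replacement and re-sum the per-site
    # log-likelihoods, without re-optimising any parameters.
    boot = [[0.0] * replicates for _ in range(n_points)]
    for r in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        for i in range(n_points):
            row = site_lnl[i]
            boot[i][r] = sum(row[s] for s in sample)

    # Centring: subtract each point's bootstrap mean so that all points
    # have equal expected likelihood (the least favourable configuration).
    centred = []
    for i in range(n_points):
        mean = sum(boot[i]) / replicates
        centred.append([x - mean for x in boot[i]])

    # p-value: fraction of replicates whose difference from the replicate
    # best is at least as large as the observed difference.
    counts = [0] * n_points
    for r in range(replicates):
        best_r = max(centred[i][r] for i in range(n_points))
        for i in range(n_points):
            if best_r - centred[i][r] >= observed[i]:
                counts[i] += 1
    return [c / replicates for c in counts]


def confidence_set(p_values, alpha=0.05):
    """Indices of connection points that are not rejected at level alpha."""
    return [i for i, p in enumerate(p_values) if p >= alpha]
```

In this sketch the connection points returned by `confidence_set` correspond to the branches that would fall within the confidence limit.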
The Results
The results will be displayed first as a phylogenetic tree in which the differences between sequences are proportional to the lengths of the horizontal branches separating the tips. The names of the reference species are colour-coded to help you identify close relatives. To save a copy of the tree as a PNG-format file, right-click (PC) or control-click (Mac) on the image and choose Download Image to Disk, or similar, from the pop-up menu.
If you have performed a bootstrap analysis, the resulting phylogenetic tree will display numbers at some of the nodes. These numbers are the percentage of bootstrap pseudoreplicates that contain the clade formed by the subtree starting at that node. This measure of bootstrap support is displayed only when at least 50% of the pseudoreplicates contain the clade. The phylogenetic tree displayed is the estimated tree, and not the consensus of the bootstrap pseudoreplicate trees.
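The bootstrap percentages described above can be illustrated with a short Python sketch. It assumes each tree has already been reduced to a set of clades, with each clade written as a `frozenset` of tip names. The function name `bootstrap_support` and this representation are hypothetical choices for illustration, not the site's implementation.

```python
def bootstrap_support(estimated_clades, replicate_trees, threshold=50.0):
    """Return {clade: percent} for clades of the estimated tree.

    estimated_clades: clades (frozensets of tip names) in the estimated tree.
    replicate_trees: one set of clades per bootstrap pseudoreplicate tree.
    Only clades found in at least `threshold` percent of replicates are kept.
    """
    n = len(replicate_trees)
    support = {}
    for clade in estimated_clades:
        # Count the pseudoreplicate trees that contain this exact clade.
        hits = sum(1 for tree in replicate_trees if clade in tree)
        percent = 100.0 * hits / n
        if percent >= threshold:
            support[clade] = percent
    return support
```

Because support is counted for the clades of the estimated tree, the result annotates that tree and does not build a consensus of the replicate trees.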
If you scroll down past the tree, you will also find a table showing the evolutionary distances between the user-submitted sequence and each of the sequences in the reference set. Sites having IUPAC ambiguity codes are included in the calculation of evolutionary distances. To save the contents of the table to disk, select all of the table, copy it, open a text file document on your computer (e.g., Notepad or SimpleText) and then paste it in.
If you scroll down further again, there is a text version of the phylogenetic tree in Newick format. To save this to disk, select the contents of the text box in which it is displayed, copy it, open a text file document on your computer (e.g., Notepad or SimpleText) and then paste it in.
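Once the Newick text has been saved, it can be read by most tree-viewing programs or processed with a script. As a minimal Python sketch, the tip names can be listed with a regular expression. The function name `tip_names` is hypothetical, and the sketch assumes unquoted labels, so names containing brackets, commas, colons or spaces would need a full Newick parser.

```python
import re


def tip_names(newick):
    """Return the tip (leaf) labels of a Newick string, in order.

    A tip label is a name that directly follows '(' or ','. Labels that
    follow ')' are internal node labels or support values and are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)
```

For a tree such as `((A:0.1,B:0.2)90:0.05,C:0.3);` this returns the tips `A`, `B` and `C` and ignores the internal label `90`.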
You can fine-tune your analysis by clicking on the Submit a sequence link to return to the Data Entry page, where you can choose a different reference set.
Issues of Interpretation
Is it a cetacean?
Witness for the Whales is an online service for the identification of cetaceans by phylogenetic analysis. Its scope is limited to the cetaceans, and any submitted sequence will be treated as if it were derived from a cetacean. A simple system has been implemented to flag sequences which might give unreliable results. Nevertheless, it remains the responsibility of the user to decide whether a phylogenetic analysis is appropriate in their individual case. The user should also seek other evidence to corroborate that any DNA sequence which they submit is actually cetacean in origin, perhaps by searching GenBank.
Poorly Resolved Species Groups
Deriving from DNA sequence data a phylogenetic tree that reflects the taxonomy of cetaceans depends on the alignment of the sequences and on the ability of the locus in question to differentiate among the species. This assumes that species recognised on traditional morphological grounds will also possess diagnostic genetic characters distinguishing them from all other species. In some groups (e.g., subfamily Delphininae), this does not seem to be the case, probably because of the recent and rapid diversification of these species. Because of this problem, a warning note will appear on screen for all user-submitted sequences identified as members of the subfamily Delphininae. When in doubt, consult our reference phylogenetic tree.
Due to the rapid rate of mutation of the mtDNA control region and the frequency of insertion/deletion mutations (indels), it can be difficult to align sets of sequences which represent a large proportion of the genetic diversity observed among cetaceans at this locus (i.e., the "All Cetaceans" and "Odontocetes" datasets). Establishing positional homology among all nucleotide sites in alignments of sequences in these datasets is problematic, and multiple alignments are often equally plausible. Consequently, test sequences that are compared to the mtDNA control region "All Cetaceans" and "Odontocetes" datasets may be somewhat "mis-aligned" and, as a result, may be slightly misplaced on the phylogenetic tree. Nevertheless, all test sequences will be placed close to the appropriate group. This problem is resolved as the user searches further down through the hierarchical series of datasets.
When in doubt about the species identification suggested by the phylogenetic analysis, resubmit your sequence using a reference dataset giving finer resolution (e.g., at the family or subfamily level), or use the advanced search mode with the full alignment method and/or bootstrap resampling of the data, or all of the above. Other sources of information, including other loci, may be needed to provide corroborating results.