JaMBW Chapter 3.2.1
Word Comparison
The following applet compares sequences using the word-matching
method commonly known as a "dot plot". Paste one sequence into the horizontal sequence window
and a different one into the vertical sequence window, fill in the word size and the step size,
and press compute. A set of dots will appear, each marking a word of the given
size that is identical in the two sequences. To analyze a single sequence instead, paste it
into the horizontal sequence window, fill in the word size and step size, and press compute.
The dots will then show the repeats of that size present in your sequence.
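The word-matching idea described above can be sketched in a few lines of Python. This is only an illustration, not the applet's actual code: the function name `dot_plot` and its arguments are hypothetical, and the page does not say exactly how the step size is applied, so the sketch assumes the window advances by the step along both sequences.

```python
def dot_plot(hori, vert, word_size, step=1):
    """Return (x, y) pairs where the word starting at hori[x]
    is identical to the word starting at vert[y].

    Assumption: the window advances by `step` along both sequences.
    """
    dots = []
    for x in range(0, len(hori) - word_size + 1, step):
        word = hori[x:x + word_size]
        for y in range(0, len(vert) - word_size + 1, step):
            if vert[y:y + word_size] == word:
                dots.append((x, y))
    return dots


print(dot_plot("GATTACA", "ATTAC", word_size=3))
# → [(1, 0), (2, 1), (3, 2)]
```

The three dots lie on one diagonal, which is how a shared stretch longer than the word size shows up in the plot.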
Word comparison is a conceptually very simple analysis that can nevertheless produce very
useful and deep insights. It can be used to analyze both single sequences and pairs of
sequences:
- Single sequence
Analyzing a single sequence with this applet allows repeats to be identified in a very straightforward manner.
In fact, identical elements located in different parts of the sequence show up as dots
that "join" the occurrences of the same word at different locations along the sequence. Clicking on a dot
reveals (in the upper window) the matching word and its location in the sequence. The labels "horizontal sequence" and
"vertical sequence" are a reminder that the drawing represents word identity between two sequences, one placed
across the page and one down it. Dots identify identical words.
- Pair of sequences
Analyzing a pair of sequences by word matching allows the identification of common elements (defined as short stretches
that are identical in both sequences). This makes the comparison of two sequences very easy and the interpretation of
the results straightforward.
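For the single-sequence case, the repeats that the dots reveal can also be listed directly. The sketch below is an assumption-laden illustration rather than the applet's method: instead of comparing every pair of positions, it indexes each word by its start positions in a dictionary, and the name `find_repeats` is hypothetical.

```python
from collections import defaultdict


def find_repeats(seq, word_size):
    """Map every word that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        positions[seq[i:i + word_size]].append(i)
    # Keep only the words seen at two or more locations.
    return {word: pos for word, pos in positions.items() if len(pos) > 1}


print(find_repeats("ACGTTTACGA", 3))
# → {'ACG': [0, 6]}
```

In a self-comparison plot, the word found at positions 0 and 6 would correspond to the off-diagonal dots at (0, 6) and (6, 0).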
Some experiments to help appreciate the value of the word-matching algorithm
- Word Size
Try varying the word size and see how the pattern changes. Is there any relation between the word size and the "similarity"
of the sequences? What happens when the word size gets very small? How useful does the pattern become? Could the word size
be treated as a "filter" that removes noisy information? When the word size is 1 and there is a single sequence
in the input, what does each dot represent? What is the meaning of the symmetry in the pattern obtained?
- Step size
What happens when you vary the step size? Try changing both the step size and the word size and observe how the pattern changes.
Can you draw any conclusions? Should the step size also be considered a "filter"? How efficient is the filtering when performed by
the step size alone, by the word size alone, and by their combination?
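The experiments above can also be tried outside the applet by counting dots for several word and step sizes. The following is a minimal sketch under stated assumptions: `count_dots` is a hypothetical helper, the two sequences are made up for illustration, and the step is assumed to apply to both sequences, which may differ from what the applet does.

```python
def count_dots(hori, vert, word_size, step=1):
    """Count identical words between hori and vert.

    Assumption: the window advances by `step` along both sequences.
    """
    count = 0
    for x in range(0, len(hori) - word_size + 1, step):
        word = hori[x:x + word_size]
        for y in range(0, len(vert) - word_size + 1, step):
            if vert[y:y + word_size] == word:
                count += 1
    return count


# Made-up example sequences, used only to show the trend.
hori = "ACGTACGTTAGCATGCA"
vert = "TTAGCACGTACGGATCA"

for word_size in (1, 2, 3, 4, 6):
    for step in (1, 2, 3):
        n = count_dots(hori, vert, word_size, step)
        print(f"word size {word_size}, step {step}: {n} dots")
```

With a step of 1, the number of dots can only stay the same or shrink as the word size grows, because every match of a longer word contains a match of each shorter word at the same position.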
This applet accepts the following parameters at startup:
- HORI : the sequence used horizontally in the graph
- VERT : the sequence used in the vertical part of the graph
- WOSI : the word size
- JUMP : the jump step (1 for sliding)
Planned improvements:
- allow a pure GIF or XBM output
- zooming
- ... anything else missing? Let me know!
Last modified: 12 October 1996, by Luca I.G. TOLDO