|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object edu.ucsb.nmsl.tools.LCS
public class LCS
This class is responsible for finding the Longest Common Subsequence (LCS) of a set. A subsequence of a set is defined as sequence, B of obtained by removing elements of sequence A. For example, if A = { 'A', 'B', 'C', 'B', 'D', 'A', 'B' } then {'A', 'B', 'B', 'B'} is a subsequence of A if you remove items 3, 5, 6 from the sequence. Furthermore, the longest common subsequence is a subsequence of two sequences containing the greatest number of common elements between the two sequences. Two sequences may have multiple common subsequences that are of the lognest length. The implementation provide in this class is based on the classic Dynamic Programming approach and achieves O(n2) performance.
The LCS class is capable of calculating an LCS for strings as well as collections of objects. Similarly, the sets returned by the LCS class are either collections of strings or collections of objects. Also, this implementation of the LCS offers a method called "getShortestEmbedding()". An embedding is defined a sequence of indices for a longest common subsequence. For example, an embedding, with respect to the first sequence, for the sequences { 'B', 'A', 'B' }, and { 'A', 'B', 'A' } is { 1, 2 } since { 'B', 'A' } is an LCS for the two sequences. Another embedding is { 2, 3 } since { 'A', 'B' } is also an LCS. The shortest embedding, however, is defined as the embedding with the shortest difference between the first and last index of an embedding. Again, it is possible for two sequences to have more than one shortest embedding, just as they can have more than one LCS.
The LCS algorithm is important to the alignment phase of the AutoCap process. Using this algorithm AutoCap is able to filter out words that do not match between what is recognized and what is actually in the transcript. Once all utterances have been collected during the speech recognition phase, the LCS is calculated between all the utterances and the transcript. The result is all the common words between the two, with the order retained. The time-stamps for the recognized words are then used to determine the time- stamp of each caption withing the transcript.
Future implementations of this algorithm could be done using the either the Hirschberg or Hunt-Szymanski Paradigm. In the case of AutoCap, the Hunt- Szymanski paradigm may be preferred since the Hirschberg paradigm can degrade to worse than O(n2) performance if the number of matches is not small compared to the size of each subsequence. Since this is not likely when using AutoCap, the Hunt-Szymanski paradigmn will perform better for this application
Field Summary | |
---|---|
protected static int[][] |
b
Print matrix used to reconstruct an LCS of two sequences |
protected static int[][] |
c
Distance matrix used to hold the lengths of the LCS |
protected static int |
DIAG
Indicates a move should be diagnal within the Print matrix |
protected static int |
LEFT
Indicates a move should be to the left within the Print matrix |
protected static char[] |
SYMBOLS
Symbols used to print the Print matrix for debuggin purposes |
protected static int |
UP
Indicates a move should be up within the Print matrix |
Constructor Summary | |
---|---|
LCS()
|
Method Summary | |
---|---|
protected static java.util.Collection |
buildLCS(java.util.Collection x)
This method is a helper function that recursively builds the actual LCS for two given collections. |
protected static void |
buildLCS(java.lang.Object[] x,
java.util.Collection col,
int i,
int j)
This recursive method is called by buildLCS and does the actual work of building the LCS from the print matrix. |
static java.util.Collection |
getLCS(java.util.Collection x,
java.util.Collection y)
This method computes the LCS of two collections and returns it as a collection. |
static java.util.Collection |
getLCSIndices(java.util.Collection x,
java.util.Collection y)
This method returns an embedding with respect to x for the sequences x and y. |
protected static void |
getLCSIndices(java.util.Collection col,
int i,
int j)
This method is a helper function that computes the indices of an LCS for a given collection. |
static java.util.Collection |
getShortestEmbedding(java.util.Collection x,
java.util.Collection y)
This method returns the shortest embedding with respect to x between subsequences x and y. |
private static java.util.Collection |
getShortestEmbeddingFrom(java.util.Collection x,
java.util.Collection y,
int startI,
int startJ)
This method returns the shortest embedding with respect to x between subsequences x and y starting from startI and startJ the distance matrix. |
static java.lang.String |
getWordLCS(java.lang.String x,
java.lang.String y)
This method computes the LCS of two stringss and returns it as a string. |
static void |
main(java.lang.String[] args)
This main method is for testing and debugging purposes. |
protected static void |
populateMatrix(java.util.Collection x,
java.util.Collection y)
This method is called to popluate both the distance and print matrix. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static int[][] c
protected static int[][] b
protected static final int DIAG
protected static final int UP
protected static final int LEFT
protected static final char[] SYMBOLS
Constructor Detail |
---|
public LCS()
Method Detail |
---|
public static java.util.Collection getLCS(java.util.Collection x, java.util.Collection y)
x
- A sequence that will be compared to y to find their LCS.y
- A sequence that will be compared to x to find their LCS.
public static java.lang.String getWordLCS(java.lang.String x, java.lang.String y)
x
- A String that will be compared to y to find their LCS.y
- A String that will be compared to x to find their LCS.
public static java.util.Collection getLCSIndices(java.util.Collection x, java.util.Collection y)
x
- A sequence that will be compared to y to find their LCS.y
- A sequence that will be compared to x to find their LCS.
private static java.util.Collection getShortestEmbeddingFrom(java.util.Collection x, java.util.Collection y, int startI, int startJ)
x
- A sequence that will be compared to y to find the shortest
embedding.y
- A sequence that will be compared to x to find the shortest
embedding.startI
- The starting row in the distance matrix for this call.startJ
- The starting column in the distance matrix for this call.
public static java.util.Collection getShortestEmbedding(java.util.Collection x, java.util.Collection y)
x
- A sequence that will be compared to y to find the shortest
embedding.y
- A sequence that will be compared to x to find the shortest
embedding.
protected static void getLCSIndices(java.util.Collection col, int i, int j)
col
- A collection that holds the indices of an embedding.protected static java.util.Collection buildLCS(java.util.Collection x)
x
- One of the collections for which the LCS is being calculated.
protected static void buildLCS(java.lang.Object[] x, java.util.Collection col, int i, int j)
x
- An array of Objects that is one of the collections for which the
LCS is being calculated.col
- The collection that holds the LCS after it is built.i
- The index of the row in the print matrix that is currently being
inspected.j
- The index of the column in the print matrix that is currently
being used inspected.protected static void populateMatrix(java.util.Collection x, java.util.Collection y)
x
- A sequence that will be compared to y to find their LCS.y
- A sequence that will be compared to x to find their LCS.public static void main(java.lang.String[] args)
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |