

PREV CLASS NEXT CLASS  FRAMES NO FRAMES  
SUMMARY: NESTED  FIELD  CONSTR  METHOD  DETAIL: FIELD  CONSTR  METHOD 
java.lang.Object edu.ucsb.nmsl.tools.LCS
public class LCS
This class is responsible for finding the Longest Common Subsequence (LCS) of a set. A subsequence of a set is defined as sequence, B of obtained by removing elements of sequence A. For example, if A = { 'A', 'B', 'C', 'B', 'D', 'A', 'B' } then {'A', 'B', 'B', 'B'} is a subsequence of A if you remove items 3, 5, 6 from the sequence. Furthermore, the longest common subsequence is a subsequence of two sequences containing the greatest number of common elements between the two sequences. Two sequences may have multiple common subsequences that are of the lognest length. The implementation provide in this class is based on the classic Dynamic Programming approach and achieves O(n^{2}) performance.
The LCS class is capable of calculating an LCS for strings as well as collections of objects. Similarly, the sets returned by the LCS class are either collections of strings or collections of objects. Also, this implementation of the LCS offers a method called "getShortestEmbedding()". An embedding is defined a sequence of indices for a longest common subsequence. For example, an embedding, with respect to the first sequence, for the sequences { 'B', 'A', 'B' }, and { 'A', 'B', 'A' } is { 1, 2 } since { 'B', 'A' } is an LCS for the two sequences. Another embedding is { 2, 3 } since { 'A', 'B' } is also an LCS. The shortest embedding, however, is defined as the embedding with the shortest difference between the first and last index of an embedding. Again, it is possible for two sequences to have more than one shortest embedding, just as they can have more than one LCS.
The LCS algorithm is important to the alignment phase of the AutoCap process. Using this algorithm AutoCap is able to filter out words that do not match between what is recognized and what is actually in the transcript. Once all utterances have been collected during the speech recognition phase, the LCS is calculated between all the utterances and the transcript. The result is all the common words between the two, with the order retained. The timestamps for the recognized words are then used to determine the time stamp of each caption withing the transcript.
Future implementations of this algorithm could be done using the either the Hirschberg or HuntSzymanski Paradigm. In the case of AutoCap, the Hunt Szymanski paradigm may be preferred since the Hirschberg paradigm can degrade to worse than O(n^{2}) performance if the number of matches is not small compared to the size of each subsequence. Since this is not likely when using AutoCap, the HuntSzymanski paradigmn will perform better for this application
Field Summary  

protected static int[][] 
b
Print matrix used to reconstruct an LCS of two sequences 
protected static int[][] 
c
Distance matrix used to hold the lengths of the LCS 
protected static int 
DIAG
Indicates a move should be diagnal within the Print matrix 
protected static int 
LEFT
Indicates a move should be to the left within the Print matrix 
protected static char[] 
SYMBOLS
Symbols used to print the Print matrix for debuggin purposes 
protected static int 
UP
Indicates a move should be up within the Print matrix 
Constructor Summary  

LCS()

Method Summary  

protected static java.util.Collection 
buildLCS(java.util.Collection x)
This method is a helper function that recursively builds the actual LCS for two given collections. 
protected static void 
buildLCS(java.lang.Object[] x,
java.util.Collection col,
int i,
int j)
This recursive method is called by buildLCS and does the actual work of building the LCS from the print matrix. 
static java.util.Collection 
getLCS(java.util.Collection x,
java.util.Collection y)
This method computes the LCS of two collections and returns it as a collection. 
static java.util.Collection 
getLCSIndices(java.util.Collection x,
java.util.Collection y)
This method returns an embedding with respect to x for the sequences x and y. 
protected static void 
getLCSIndices(java.util.Collection col,
int i,
int j)
This method is a helper function that computes the indices of an LCS for a given collection. 
static java.util.Collection 
getShortestEmbedding(java.util.Collection x,
java.util.Collection y)
This method returns the shortest embedding with respect to x between subsequences x and y. 
private static java.util.Collection 
getShortestEmbeddingFrom(java.util.Collection x,
java.util.Collection y,
int startI,
int startJ)
This method returns the shortest embedding with respect to x between subsequences x and y starting from startI and startJ the distance matrix. 
static java.lang.String 
getWordLCS(java.lang.String x,
java.lang.String y)
This method computes the LCS of two stringss and returns it as a string. 
static void 
main(java.lang.String[] args)
This main method is for testing and debugging purposes. 
protected static void 
populateMatrix(java.util.Collection x,
java.util.Collection y)
This method is called to popluate both the distance and print matrix. 
Methods inherited from class java.lang.Object 

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Field Detail 

protected static int[][] c
protected static int[][] b
protected static final int DIAG
protected static final int UP
protected static final int LEFT
protected static final char[] SYMBOLS
Constructor Detail 

public LCS()
Method Detail 

public static java.util.Collection getLCS(java.util.Collection x, java.util.Collection y)
x
 A sequence that will be compared to y to find their LCS.y
 A sequence that will be compared to x to find their LCS.
public static java.lang.String getWordLCS(java.lang.String x, java.lang.String y)
x
 A String that will be compared to y to find their LCS.y
 A String that will be compared to x to find their LCS.
public static java.util.Collection getLCSIndices(java.util.Collection x, java.util.Collection y)
x
 A sequence that will be compared to y to find their LCS.y
 A sequence that will be compared to x to find their LCS.
private static java.util.Collection getShortestEmbeddingFrom(java.util.Collection x, java.util.Collection y, int startI, int startJ)
x
 A sequence that will be compared to y to find the shortest
embedding.y
 A sequence that will be compared to x to find the shortest
embedding.startI
 The starting row in the distance matrix for this call.startJ
 The starting column in the distance matrix for this call.
public static java.util.Collection getShortestEmbedding(java.util.Collection x, java.util.Collection y)
x
 A sequence that will be compared to y to find the shortest
embedding.y
 A sequence that will be compared to x to find the shortest
embedding.
protected static void getLCSIndices(java.util.Collection col, int i, int j)
col
 A collection that holds the indices of an embedding.protected static java.util.Collection buildLCS(java.util.Collection x)
x
 One of the collections for which the LCS is being calculated.
protected static void buildLCS(java.lang.Object[] x, java.util.Collection col, int i, int j)
x
 An array of Objects that is one of the collections for which the
LCS is being calculated.col
 The collection that holds the LCS after it is built.i
 The index of the row in the print matrix that is currently being
inspected.j
 The index of the column in the print matrix that is currently
being used inspected.protected static void populateMatrix(java.util.Collection x, java.util.Collection y)
x
 A sequence that will be compared to y to find their LCS.y
 A sequence that will be compared to x to find their LCS.public static void main(java.lang.String[] args)


PREV CLASS NEXT CLASS  FRAMES NO FRAMES  
SUMMARY: NESTED  FIELD  CONSTR  METHOD  DETAIL: FIELD  CONSTR  METHOD 