edu.ucsb.nmsl.autocap
Class CaptionAligner

java.lang.Object
  extended by edu.ucsb.nmsl.autocap.CaptionAligner

public class CaptionAligner
extends java.lang.Object

This class is responsible for aligning captions using utterances recognized by Sphinx. Alignment takes essentially two steps: first, find the longest common substrings and bursts; then, estimate the timing of captions that were not recognized. These two steps will eventually be moved into their own classes. This is a proof-of-concept implementation, not a final one.

Version:
1.0
Author:
Allan Knight

Field Summary
(package private) static int BURST_MIN_LENGTH
          Burst length for alignment.
(package private)  java.util.List bursts
          Linked list of bursts from the utterances in the transcript.
(package private)  java.util.regex.Pattern endPat
          RegEx pattern for matching the end of a time from Sphinx.
(package private)  double finishTime
          Holds finish time of aligning so time can be measured.
(package private)  java.util.LinkedList Raw
          Linked list of all words in the transcript.
(package private)  java.lang.String RawText
          Raw text of all recognized words.
(package private)  double sentencesCovered
          Holds number of sentences with some coverage.
(package private)  DataSetStatistic SpeakingRate
          Collects all the speaking rates for analysis.
(package private)  double speakingTime
          Holds the amount of time speaking during bursts.
(package private)  java.util.regex.Pattern startPat
          RegEx pattern for matching the start of a time from Sphinx.
(package private)  double startTime
          Holds start time of aligning so time can be measured.
(package private)  java.util.LinkedList text
          Linked list of all words in text, lowercase and punctuation removed.
(package private)  java.util.LinkedList Timed
          Linked list of all recognized words and their time-stamps as returned by Sphinx.
(package private)  double totalSentences
          Holds total number of sentences in transcript.
(package private)  double totalWords
          Holds total number of words in transcript.
(package private)  Transcript transcript
          Collection of transcripts of type Caption.
(package private)  DataSetStatistic UncoveredRate
          Collects all speaking rates for unrecognized word bursts.
(package private)  DataSetStatistic UncoveredWords
           
(package private)  double wordsMatched
          Holds number of words from transcript matched.
 
Constructor Summary
CaptionAligner(Transcript t)
          This constructor takes a Transcript that contains captioning information for a particular presentation.
 
Method Summary
 boolean addUtterance(edu.cmu.sphinx.result.Result r)
          Adds utterances as they come in from Sphinx.
protected  boolean collectBursts()
          This method collects all bursts, of at least the minimum length, of recognized (and therefore time-stamped) words.
private  Caption createTimedCaption(java.util.LinkedList burst, java.util.LinkedList raw, java.util.LinkedList time)
          Creates a timed caption from a burst.
protected  void extractSentences(Transcript t)
          Extracts sentences from the XML document that contains captioning information for a presentation.
 Transcript getAlignedCaptions()
          Aligns the captions that we have collected.
private  java.lang.String join(java.util.Collection x)
          Helper function similar to Perl's join function.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

transcript

Transcript transcript
Collection of transcripts of type Caption.


text

java.util.LinkedList text
Linked list of all words in text, lowercase and punctuation removed.


bursts

java.util.List bursts
Linked list of bursts from the utterances in the transcript.


startPat

java.util.regex.Pattern startPat
RegEx pattern for matching the start of a time from Sphinx.


endPat

java.util.regex.Pattern endPat
RegEx pattern for matching the end of a time from Sphinx.
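
As an illustration of how startPat and endPat might be used, here is a minimal sketch. It assumes that each timed token looks like word(start,end), for example hello(0.5,1.25); the documentation above does not state the format, so that format, the two regular expressions, and the class name TimeTokenSketch are all assumptions rather than the actual fields of this class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the token format "word(start,end)" is an assumption.
final class TimeTokenSketch {
    // Matches the number between "(" and "," (assumed start time).
    static final Pattern START = Pattern.compile("\\((\\d+(?:\\.\\d+)?),");
    // Matches the number between "," and ")" (assumed end time).
    static final Pattern END = Pattern.compile(",(\\d+(?:\\.\\d+)?)\\)");

    static double startOf(String token) {
        return extract(START, token);
    }

    static double endOf(String token) {
        return extract(END, token);
    }

    // Applies the pattern and parses the first capture group as seconds.
    private static double extract(Pattern p, String token) {
        Matcher m = p.matcher(token);
        if (!m.find()) {
            throw new IllegalArgumentException("no time found in token: " + token);
        }
        return Double.parseDouble(m.group(1));
    }
}
```

Compiling the patterns once and reusing them, as the constructor is described as doing, avoids recompiling a regular expression for every recognized word.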


speakingTime

double speakingTime
Holds the amount of time speaking during bursts.


wordsMatched

double wordsMatched
Holds number of words from transcript matched.


totalWords

double totalWords
Holds total number of words in transcript.


startTime

double startTime
Holds start time of aligning so time can be measured.


finishTime

double finishTime
Holds finish time of aligning so time can be measured.


totalSentences

double totalSentences
Holds total number of sentences in transcript.


sentencesCovered

double sentencesCovered
Holds number of sentences with some coverage.


SpeakingRate

DataSetStatistic SpeakingRate
Collects all the speaking rates for analysis.


UncoveredRate

DataSetStatistic UncoveredRate
Collects all speaking rates for unrecognized word bursts.


UncoveredWords

DataSetStatistic UncoveredWords

BURST_MIN_LENGTH

static final int BURST_MIN_LENGTH
Burst length for alignment. Any burst must have at least this many words in it.

See Also:
Constant Field Values

Timed

java.util.LinkedList Timed
Linked list of all recognized words and their time-stamps as returned by Sphinx.


Raw

java.util.LinkedList Raw
Linked list of all words in the transcript. This data member, along with RawText, is used to calculate the LCS during the alignment phase of AutoCap.
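
To make the LCS step concrete, the following is a minimal sketch of a word-level longest common substring computed with dynamic programming. It is not the implementation used by CaptionAligner; the class name WordLcsSketch, the method longestCommonRun, and the choice to compare whole words rather than characters are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented CaptionAligner API.
final class WordLcsSketch {
    /** Returns the longest run of consecutive words that appears in both lists. */
    static List<String> longestCommonRun(List<String> transcript, List<String> recognized) {
        int n = transcript.size();
        int m = recognized.size();
        // len[i][j] = length of the common run ending at transcript[i-1] and recognized[j-1].
        int[][] len = new int[n + 1][m + 1];
        int best = 0;
        int bestEnd = 0; // exclusive end index of the best run in transcript
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (transcript.get(i - 1).equals(recognized.get(j - 1))) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        bestEnd = i;
                    }
                }
            }
        }
        return new ArrayList<>(transcript.subList(bestEnd - best, bestEnd));
    }
}
```

The table costs O(n * m) time and memory, which is acceptable for a sketch but may be too expensive for long transcripts.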


RawText

java.lang.String RawText
Raw text of all recognized words.

Constructor Detail

CaptionAligner

public CaptionAligner(Transcript t)
This constructor takes a Transcript that contains captioning information for a particular presentation. It extracts the sentences from the transcript and then compiles the patterns used in other methods.

Parameters:
t - The transcript for which alignment of captions will be performed.
Method Detail

extractSentences

protected void extractSentences(Transcript t)
Extracts sentences from the XML document that contains captioning information for a presentation. All sentences are in the text attribute of the language field. All other fields are ignored.

Parameters:
t - Transcript object that contains captions.

addUtterance

public boolean addUtterance(edu.cmu.sphinx.result.Result r)
Adds utterances as they come in from Sphinx. Finds the bursts and adds them to the burst LinkedList. These are processed later for estimating start times of each caption.

Parameters:
r - The Result object containing the most recent utterance.
Returns:
true if successful, false otherwise.

collectBursts

protected boolean collectBursts()
This method collects all bursts, of at least the minimum length, of recognized (and therefore time-stamped) words. These bursts are used to extract the times at which words were spoken in the video.
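
The idea of keeping only runs that reach a minimum length can be sketched as follows. This is an illustration only: the boolean array marking which transcript words were recognized, the class name BurstSketch, and the minLength parameter (standing in for BURST_MIN_LENGTH, whose value is not given here) are assumptions, not the data structures used by this class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented CaptionAligner API.
final class BurstSketch {
    /**
     * Returns {start, endExclusive} index pairs for every run of consecutive
     * matched words that is at least minLength words long.
     */
    static List<int[]> collectBursts(boolean[] matched, int minLength) {
        List<int[]> bursts = new ArrayList<>();
        int runStart = -1; // -1 means no run is currently open
        // Iterate one step past the end so a run touching the last word is closed.
        for (int i = 0; i <= matched.length; i++) {
            boolean inRun = i < matched.length && matched[i];
            if (inRun && runStart < 0) {
                runStart = i;
            } else if (!inRun && runStart >= 0) {
                if (i - runStart >= minLength) {
                    bursts.add(new int[] {runStart, i});
                }
                runStart = -1;
            }
        }
        return bursts;
    }
}
```

Shorter runs are dropped on the assumption that very short matches are more likely to be coincidental than genuine alignments.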


getAlignedCaptions

public Transcript getAlignedCaptions()
Aligns the captions that we have collected. That is, it iterates through each word and marks those it can identify with the correct time. This method is called once all the audio has been processed. This method should eventually be replaced by a class that is plugged in to provide this functionality.

Returns:
A Transcript with all captions aligned to approximately where they were spoken in a given video or audio file.
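
The documentation does not say how times are estimated for words that were not recognized. One plausible approach, shown here purely as a sketch, is linear interpolation between the nearest time-stamped neighbors. The class name TimeEstimateSketch, the use of NaN to mark unknown times, and interpolation itself are assumptions and may differ from what getAlignedCaptions actually does.

```java
// Hypothetical helper, not part of the documented CaptionAligner API.
final class TimeEstimateSketch {
    /**
     * Returns a copy of times in which each NaN lying between two known values
     * is replaced by a linearly interpolated value. Leading and trailing NaN
     * entries are left unchanged because they have only one known neighbor.
     */
    static double[] interpolate(double[] times) {
        double[] out = times.clone();
        int prev = -1; // index of the most recent known time
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                continue;
            }
            if (prev >= 0 && i - prev > 1) {
                double step = (out[i] - out[prev]) / (i - prev);
                for (int k = prev + 1; k < i; k++) {
                    out[k] = out[prev] + step * (k - prev);
                }
            }
            prev = i;
        }
        return out;
    }
}
```

Interpolating by word index assumes a constant speaking rate within each gap; statistics such as those gathered in SpeakingRate could in principle refine that assumption.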

createTimedCaption

private Caption createTimedCaption(java.util.LinkedList burst,
                                   java.util.LinkedList raw,
                                   java.util.LinkedList time)
Creates a timed caption from a burst. Given all the words of a burst and all the words and timings from an entire recognized utterance, extract the timings for those words that are in the burst.

Parameters:
burst - The raw text of a burst of recognized words.
raw - Raw text of words in transcript.
time - Text and time-stamp of recognized words as returned by Sphinx.
Returns:
A caption with all the words in the burst with their time-stamps.

join

private java.lang.String join(java.util.Collection x)
Helper function similar to Perl's join function. Returns a string with a space between each object in the collection.

Parameters:
x - The collection of objects to be joined as a string.
Returns:
A string with all the objects in x joined by single spaces.
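
A helper with this behavior could look roughly like the sketch below. The class name JoinSketch and the generic Collection<?> signature are assumptions; the original method is private and takes a raw java.util.Collection.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical stand-in for the private join helper described above.
final class JoinSketch {
    /** Joins the string form of each element, separated by a single space. */
    static String join(Collection<?> x) {
        StringBuilder sb = new StringBuilder();
        Iterator<?> it = x.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }
}
```

When every element is already a String, String.join(" ", list) from the standard library (Java 8 and later) gives the same result.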