java.lang.Object edu.ucsb.nmsl.autocap.CaptionAligner
public class CaptionAligner
This class is responsible for aligning captions using the utterances recognized by Sphinx. Alignment is done in essentially two steps: first, find the longest common substrings and bursts; then, estimate the captions that were not recognized. These two steps will eventually be moved into their own classes. This is a proof-of-concept implementation, not a final one.
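The first step mentioned above, finding a longest common substring, can be illustrated at the word level. The following is a minimal sketch only, assuming both the transcript and the recognizer output have already been reduced to lists of normalized words; the class name `LongestCommonRunSketch` and the method `longestCommonRun` are hypothetical and are not part of `CaptionAligner`, whose actual matching strategy is not documented here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of edu.ucsb.nmsl.autocap. */
class LongestCommonRunSketch {

    /**
     * Finds the longest contiguous run of words shared by both lists.
     * Returns {startInTranscript, startInRecognized, length}; length is 0 if
     * the lists share no word. Uses the classic dynamic-programming table for
     * longest common substring, keeping only the previous row.
     */
    static int[] longestCommonRun(List<String> transcript, List<String> recognized) {
        int n = transcript.size();
        int m = recognized.size();
        int[] prev = new int[m + 1];
        int bestLen = 0, bestI = 0, bestJ = 0;
        for (int i = 1; i <= n; i++) {
            int[] cur = new int[m + 1];
            for (int j = 1; j <= m; j++) {
                if (transcript.get(i - 1).equals(recognized.get(j - 1))) {
                    // Extend the run that ended at the previous word pair.
                    cur[j] = prev[j - 1] + 1;
                    if (cur[j] > bestLen) {
                        bestLen = cur[j];
                        bestI = i - bestLen;
                        bestJ = j - bestLen;
                    }
                }
            }
            prev = cur;
        }
        return new int[] {bestI, bestJ, bestLen};
    }

    public static void main(String[] args) {
        List<String> transcript = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        List<String> recognized = Arrays.asList("a", "quick", "brown", "fox", "jumped");
        // Prints [1, 1, 3]: "quick brown fox" starts at index 1 in both lists.
        System.out.println(Arrays.toString(longestCommonRun(transcript, recognized)));
    }
}
```

Because recognized words carry time-stamps, a shared run like this one would give the matching transcript words known times, which is presumably what makes the second, estimating step possible for the words in between.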
Field Summary | |
---|---|
(package private) static int |
BURST_MIN_LENGTH
Burst length for alignment. |
(package private) java.util.List |
bursts
Linked list of bursts from the utterances in the transcript. |
(package private) java.util.regex.Pattern |
endPat
RegEx pattern for matching the end of a time from Sphinx. |
(package private) double |
finishTime
Holds finish time of aligning so time can be measured. |
(package private) java.util.LinkedList |
Raw
Linked list of all words in the transcript. |
(package private) java.lang.String |
RawText
Raw text of all recognized words. |
(package private) double |
sentencesCovered
Holds number of sentences with some coverage. |
(package private) DataSetStatistic |
SpeakingRate
Collects all the speaking rates for analysis. |
(package private) double |
speakingTime
Holds the amount of time speaking during bursts. |
(package private) java.util.regex.Pattern |
startPat
RegEx pattern for matching the start of a time from Sphinx. |
(package private) double |
startTime
Holds start time of aligning so time can be measured. |
(package private) java.util.LinkedList |
text
Linked list of all words in text, lowercase and punctuation removed. |
(package private) java.util.LinkedList |
Timed
Linked list of all recognized words and their time-stamps as returned by Sphinx. |
(package private) double |
totalSentences
Holds total number of sentences in transcript. |
(package private) double |
totalWords
Holds total number of words in transcript. |
(package private) Transcript |
transcript
Collection of transcripts of type Caption. |
(package private) DataSetStatistic |
UncoveredRate
Collects all speaking rates for unrecognized word bursts. |
(package private) DataSetStatistic |
UncoveredWords
|
(package private) double |
wordsMatched
Holds number of words from transcript matched. |
Constructor Summary | |
---|---|
CaptionAligner(Transcript t)
This constructor takes in a DOM Document that contains captioning information for a particular presentation. |
Method Summary | |
---|---|
boolean |
addUtterance(edu.cmu.sphinx.result.Result r)
Adds utterances as they come in from Sphinx. |
protected boolean |
collectBursts()
This method collects all bursts, of at least the minimum length, of recognized (and therefore time-stamped) words. |
private Caption |
createTimedCaption(java.util.LinkedList burst,
java.util.LinkedList raw,
java.util.LinkedList time)
Creates a timed caption from a burst. |
protected void |
extractSentences(Transcript t)
Extracts sentences from XML document that contains captioning information for a presentation. |
Transcript |
getAlignedCaptions()
Aligns the captions that we have collected. |
private java.lang.String |
join(java.util.Collection x)
Helper function similar to Perl's join function. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
Transcript transcript
java.util.LinkedList text
java.util.List bursts
java.util.regex.Pattern startPat
java.util.regex.Pattern endPat
double speakingTime
double wordsMatched
double totalWords
double startTime
double finishTime
double totalSentences
double sentencesCovered
DataSetStatistic SpeakingRate
DataSetStatistic UncoveredRate
DataSetStatistic UncoveredWords
static final int BURST_MIN_LENGTH
java.util.LinkedList Timed
java.util.LinkedList Raw
java.lang.String RawText
Constructor Detail |
---|
public CaptionAligner(Transcript t)
t
- The transcript for which alignment of captions will be performed.
Method Detail |
---|
protected void extractSentences(Transcript t)
t
- Transcript object that contains captions.
public boolean addUtterance(edu.cmu.sphinx.result.Result r)
r
- The Result object containing the most recent utterance.
protected boolean collectBursts()
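The method summary describes bursts as runs of recognized words that reach a minimum length. As a rough sketch of that idea only, the example below scans a per-word "recognized" flag array and keeps the runs that are long enough. The flag-array representation, the class name `BurstCollectorSketch`, and the threshold value 3 are all assumptions made for illustration; the real value of `BURST_MIN_LENGTH` and the internals of `collectBursts()` are not given in this documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of edu.ucsb.nmsl.autocap. */
class BurstCollectorSketch {

    /** Assumed threshold for this example; the real constant's value is not documented. */
    static final int BURST_MIN_LENGTH = 3;

    /**
     * Returns every maximal run of consecutive recognized words whose length is
     * at least minLength, as {startInclusive, endExclusive} index pairs.
     */
    static List<int[]> collectBursts(boolean[] recognized, int minLength) {
        List<int[]> bursts = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a run"
        // Iterate one step past the end so a run touching the last word is closed.
        for (int i = 0; i <= recognized.length; i++) {
            boolean on = i < recognized.length && recognized[i];
            if (on && start < 0) {
                start = i;
            } else if (!on && start >= 0) {
                if (i - start >= minLength) {
                    bursts.add(new int[] {start, i});
                }
                start = -1;
            }
        }
        return bursts;
    }

    public static void main(String[] args) {
        boolean[] recognized = {true, true, false, true, true, true, true, false, true, true, true};
        for (int[] b : collectBursts(recognized, BURST_MIN_LENGTH)) {
            // Prints "[3, 7)" then "[8, 11)"; the two-word run at the start is too short.
            System.out.println("[" + b[0] + ", " + b[1] + ")");
        }
    }
}
```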
public Transcript getAlignedCaptions()
private Caption createTimedCaption(java.util.LinkedList burst, java.util.LinkedList raw, java.util.LinkedList time)
burst
- The raw text of a burst of recognized words.
raw
- Raw text of words in the transcript.
time
- Text and time-stamp of recognized words as returned by Sphinx.
private java.lang.String join(java.util.Collection x)
x
- The collection of objects to be joined as a string.
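For readers unfamiliar with Perl's `join`, a helper of this kind might look roughly like the sketch below. Note the differences from the documented signature: the private method takes only a `Collection`, so the separator it uses is unknown; the explicit `separator` parameter and the class name `JoinSketch` here are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

/** Hypothetical illustration; not part of edu.ucsb.nmsl.autocap. */
class JoinSketch {

    /**
     * Concatenates the string form of each element, placing the separator
     * between consecutive elements (never before the first or after the last).
     */
    static String join(Collection<?> items, String separator) {
        StringBuilder sb = new StringBuilder();
        Iterator<?> it = items.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(separator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "align these words"
        System.out.println(join(Arrays.asList("align", "these", "words"), " "));
    }
}
```

On Java 8 and later, `String.join(" ", words)` covers the same case when the elements are already `CharSequence` values.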