diff -crN julius-3.3p2-multipath/README-iwsp-ja.txt julius-3.3p3-multipath/README-iwsp-ja.txt
*** julius-3.3p2-multipath/README-iwsp-ja.txt Thu Jan 1 09:00:00 1970
--- julius-3.3p3-multipath/README-iwsp-ja.txt Mon Jan 13 14:49:40 2003
***************
*** 0 ****
--- 1,356 ----
+
+ Handling of pauses in Julius
+
+ '03/01/08
+ 李 晃伸 (Akinobu LEE)
+
+ -*-Text-*-
+
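+ □ Introduction
+ ===========
+
+ Silent segments in speech data must be handled at several levels
+ (segment detection algorithm, acoustic model, language model, and
+ decoding), depending on their duration and on where in the utterance
+ they occur. Julius has long implemented functions such as cutting out
+ silent segments, silence words, and short-pause segmentation. In
+ addition, Julius rev. 3.3p3 adds support for inter-word short pauses
+ in the multi-path version. However, because each of these functions
+ for the silent parts of speech data was implemented separately, the
+ division of roles among them and the way to configure them were
+ unclear in places.
+
+ This document gives a unified description of how silent segments are
+ handled in 3.3p3. Silent segments are classified by position and
+ duration, and the processing for each class is described: speech
+ segment detection, silence words, and the inter-word short pause
+ handling newly added in 3.3p3. The settings are then explained in
+ detail for each configuration: normal version / multi-path version
+ and Julius / Julian. Short-pause segmentation is also covered, with
+ its relation to the other functions.
+
+ This document applies to Julius rev. 3.3p3 and later.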
+
+
+
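+ □ Classification of silent segments
+ ===============
+
+ The "silent segments" of input speech data, that is, the parts other
+ than speech, are classified by duration as follows.
+
+ 1) long pause … A long silence between sentences. Julius uses it to
+ delimit the input. Rough guide: about 300ms or more.
+
+ 2) short pause … A silence within a sentence, mainly caused by taking
+ a breath. Long enough to affect the neighboring phones as context.
+ Rough guide: 100ms to 300ms.
+
+ 3) very short pause … A particularly short silence within a sentence,
+ corresponding to a momentary break. Ignored as triphone context.
+ Rough guide: 20 to 100ms.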
+
+
+
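+ □ long pause … handled at the level of speech input segmentation
+ ===============================================
+
+ ○ Description
+
+ Julius/Julian can detect only the speech segments in an input audio
+ stream, cut away the silent segments, and run recognition on each
+ detected speech segment.
+
+ The cutting is done before feature extraction and recognition.
+ "Silence" here is judged by the zero-crossing count within a fixed
+ time window together with a level threshold.
+
+ By default this speech cutting is ON for microphone input and OFF for
+ other inputs. With microphone input, no recognition is performed
+ during the non-speech (= silent) state; cutting begins as soon as
+ speech starts (and the first pass starts at the same time), and when
+ the speech ends (= a long silent segment occurs) the first pass ends
+ and the second pass is executed. With file input, no cutting is done
+ by default, and the whole file is treated as one input and recognized
+ in the order feature extraction → first pass → second pass.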
+
+
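+ ○ Settings
+
+ To turn speech cutting ON or OFF explicitly, use the options
+ "-cutsilence" and "-nocutsilence". For example, file input is not cut
+ by default, but if you have a long audio file containing several
+ utterances and want Julius to cut it and recognize the utterances one
+ after another, without segmenting the file offline beforehand,
+ specify "-cutsilence".
+
+ The speech detection algorithm used for cutting is based on the
+ zero-crossing count and the amplitude level within a fixed time
+ window. Within that window, when the number of zero crossings whose
+ amplitude is at or above the specified level reaches a given count,
+ speech is considered to have started; when the same count falls to or
+ below that value, speech is considered to have ended. In practice,
+ margins are added before the detected start and after the detected
+ end so that the low-amplitude onset and decay of the speech at the
+ head and tail of the segment are not cut off. The amplitude level
+ threshold is given by "-lv" (0-32767), the zero-crossing count by
+ "-zc" (per second), the head margin by "-headmargin" (in
+ milliseconds), and the tail margin by "-tailmargin" (in
+ milliseconds).
+
+ The maximum length of one cut segment is 20 seconds. To change it,
+ edit the value of MAXSPEECHLEN in libsent/include/sent/speech.h in
+ the source archive and recompile. Note that the required memory grows
+ with the specified length.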
+
+
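+ ○ Model support
+
+ A speech segment normally contains silence at its head and tail. In
+ ordinary speech recognition these silences are covered by preparing
+ "silence words" such as /sil/ (or /silB/, /silE/) in the acoustic and
+ language models. In the acoustic model, models such as /silB/ and
+ /silE/ are trained on the silent segments at the beginning and end of
+ the training utterances. In the language model, for Julius (word
+ N-gram), a sentence-start tag <s> and a sentence-end tag </s> are
+ normally attached to the beginning and end of each training sentence
+ of the source corpus before training, and the "silence words" are
+ assigned to them as their pronunciations.
+
+ <s> [] silB
+ </s> [] silE
+
+ In a description grammar (Julian), the silence words at the head and
+ tail of the sentence must be specified explicitly in the grammar.
+
+ S: NS_B SENTENCE NS_E
+
+ % NS_B
+ silB silB
+ % NS_E
+ silE silE
+
+ Julius assumes that the models are built as above, and searches with
+ the word hypotheses at the start and end of the speech data fixed to
+ "<s>" and "</s>" respectively. Therefore words named "<s>" and "</s>"
+ must be registered in the word dictionary when recognizing with
+ Julius (their pronunciations need not be /silB/ and /silE/). The
+ names of these head and tail words can be changed with the options
+ "-silhead" and "-siltail".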
+
+
+
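+ □ short pause … handled at the language model / dictionary level
+ ==================================================
+
+ ○ Description
+
+ In real speech, short silences appear even within a single utterance
+ because of breaths, punctuation, hesitations and so on. They tend to
+ appear more often in natural, everyday speech. Such short silences
+ inside an utterance are called short pauses here.
+
+ A short pause of some length can be handled in the same way as the
+ silence words at the head and tail of a sentence: add a word whose
+ pronunciation is the short pause to the dictionary, define its
+ pattern of occurrence within the sentence in the language model, and
+ define its acoustic characteristics in the acoustic model. The
+ dictionary word corresponding to this short pause is called the
+ "sp word" below.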
+
+
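+ ○ Model support
+
+ The acoustic characteristics and the positions of the sp word are
+ modeled by the acoustic model and the language model respectively.
+ The sp word is normally trained as a silence model /sp/ learned from
+ the silent parts of the training corpus. /sp/ can be trained, for
+ example, as a tee model with /sil/, or from a training database with
+ accurate short pause annotation; Julius does not depend on the
+ training method.
+
+ Like ordinary words, the sp word is subject to inter-word phone
+ context dependency when triphones are used. That is,
+
+ a | sp | b → ..-a+sp | sp | sp-b+..
+
+ so sp is taken into account as context for the words before and
+ after the sp word.
+
+ How the occurrence pattern of the sp word is modeled depends largely
+ on the language. For example, the Japanese N-gram language models
+ distributed by IPA and CSRC assume that a short pause is likely to
+ occur at the comma "、" in read sentences, and predict short pauses
+ by defining the comma as an sp word as follows.
+
+ 、+78 [、] sp
+
+ If the language model does not predict short pauses, simply adding an
+ sp word to the dictionary is also expected to be effective.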
+
+
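+ ○ Settings
+
+ The sp words in the dictionary are detected when Julius/Julian starts
+ up. A "word consisting only of one or more /sp/" is detected and
+ treated as an sp word. The acoustic model name corresponding to sp
+ can be changed with the option "-spmodel".
+
+ Julius and Julian handle sp words differently.
+
+ In Julius, the sp word is treated exactly like an ordinary word. If
+ no sp word is defined in the language model or dictionary, the jconf
+ option "-iwspword" makes Julius add an sp word entry to the
+ dictionary automatically at startup. The content of the added entry
+ can be specified with "-iwspentry". The default is
+ "<UNK> [sp] sp sp".
+
+ In Julian, the positions where the sp word may appear must be
+ specified explicitly in the grammar. If they are defined in the
+ grammar (and dictionary), special recognition processing that allows
+ the sp word to be skipped is applied to it. If no sp word is defined,
+ none of this sp word processing takes place.
+
+ In both Julius and Julian, several sp words may be defined in the
+ dictionary. When "-iwspword" is specified in Julius and the
+ dictionary already contains an sp word, the existing sp word and the
+ one added by "-iwspword" are handled independently.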
+
+
+
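+ □ very short pause … handled at the phone HMM level (multi-path version only)
+ ==================================================================
+
+ ○ Description
+
+ As noted above, short silences caused by breaths and the like appear
+ in real speech. In more natural speech, even shorter silences of a
+ few tens of ms are said to appear between words.
+
+ To handle these brief inter-word silences, the multi-path version of
+ Julius/Julian can decode with a short pause model, which is not
+ considered as context, appended to the end of each word. With the
+ option "-iwsp", the short pause acoustic model /sp/ is automatically
+ appended to the end of every word for recognition. To allow for the
+ case where no sp appears at the word end, a transition that skips the
+ whole /sp/ is added at the same time. That is, /sp/ is appended to
+ each dictionary word as follows.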
+
+
+ <> <> <> <> <> <> <>
+ ○->●->●->●->○ + ○->●->○ => ○->●->●->●--->○
+ \<>/
+ ●
+ /a/ /sp/ /a sp/
+
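+ If /sp/ already has a transition that skips the model, that
+ transition is used. Otherwise a skip transition with probability 1.0
+ is added automatically.
+
+ The major difference from the sp word of the previous section is that
+ this sp is ignored as triphone context. Inter-word phone context
+ dependency is computed by skipping over sp, as shown below (context
+ free). This is because such a brief silence is assumed to have little
+ effect on the phone context between the surrounding words.
+
+ a | sp | b → ..-a+b | sp | a-b+..
+
+ In other words, Julius/Julian changes how inter-word phone context
+ dependency is treated around an inter-word short pause according to
+ the length of the pause. A relatively long pause is matched by the sp
+ word (context aware); a short one is handled by the sp appended to
+ the word end (context free).
+
+ The differences between the word-end sp and the sp word are
+ summarized below.
+
+ ・Target duration: sp word = long (100ms to 300ms),
+ word-end sp = short (20ms to 100ms)
+ ・Occurrence: sp word = specified by the language model,
+ word-end sp = may appear between any words
+ ・Inter-word triphone: sp word = context aware,
+ word-end sp = context free
+
+ This handling of inter-word short pauses by automatically appending
+ /sp/ to word ends is supported only in the multi-path version. The
+ normal version implements skip transitions differently, so it is not
+ supported there at present.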
+
+
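+ ○ Settings
+
+ Word-end sp processing is OFF by default and is turned ON by
+ specifying "-iwsp".
+
+ The name of the short pause acoustic model to append, /sp/, can be
+ changed with "-spmodel". The transition probability (log likelihood)
+ into the sp appended at the word end can be specified with the option
+ "-iwsppenalty". The more negative the value, the more sp insertion is
+ suppressed. (default: -1.0)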
+
+
+
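+ □ Automatic alignment and short pause processing
+ ======================================================
+
+ The automatic alignment functions (-walign, -palign, -salign) also
+ take the short pause handling above into account. The alignment
+ output for a recognition result reflects both sp words and word-end
+ sp.
+
+ The alignment of an sp word is output in the same way as that of an
+ ordinary word. If the recognition result contains an sp word, forced
+ alignment is performed with sp inserted at that position.
+
+ For word-end sp, automatic alignment chooses, at each word end,
+ whichever of "with sp" and "without sp" has the higher likelihood and
+ outputs it. For example, for the word 「今日」 (ky o:), -walign outputs
+ the segment of whichever of "ky o:" and "ky o: sp" has the higher
+ likelihood. -palign outputs "ky" followed by whichever of "o:" and
+ "o: sp" has the higher likelihood. With -salign, the transition into
+ the sp part is output in the form [o:#4 (sp)].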
+
+
+
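+ □ Behavior and settings of Julius/Julian
+ =================================
+
+ Automatic addition of an sp word with "-iwspword" in Julius, and
+ appending sp to word ends with "-iwsp" in the multi-path version of
+ Julius/Julian, are new features supported from Julius-3.3p3.
+ "-iwsp" handles short, context-free short pauses;
+ "-iwspword" handles longer, context-aware short pauses.
+
+ Note that the available behavior differs between Julius and Julian
+ and between the normal and multi-path versions. The settings for each
+ engine are summarized below.
+
+ Julius (normal version):
+ "-iwspword" only. "-iwsp" cannot be used. The occurrence probability
+ of the sp word is either trained beforehand in the N-gram and
+ dictionary, or the word is added automatically with "-iwspword". When
+ it is added automatically, its pronunciation and N-gram entry can be
+ specified with "-iwspentry".
+
+ Julian (normal version):
+ Neither "-iwspword" nor "-iwsp" is supported. The sp word must be
+ defined explicitly in the dictionary, and the positions where it may
+ appear must be specified explicitly in the grammar. If the
+ pronunciation of the sp word uses an acoustic model with a name other
+ than /sp/, that model name must be given with "-spmodel" so that
+ Julian can detect the word as an sp word.
+
+ Julius-multipath:
+ Both "-iwspword" and "-iwsp" are supported. The pronunciation and
+ N-gram entry of the sp word added by "-iwspword" can be specified
+ with "-iwspentry", and the name of the sp model appended to word ends
+ by "-iwsp" can be specified with "-spmodel".
+
+ Julian-multipath:
+ Only "-iwsp" is supported. "-iwspword" cannot be used. The sp word
+ must be defined explicitly in the dictionary, and the positions where
+ it may appear must be specified explicitly in the grammar. If the
+ pronunciation of the sp word uses an acoustic model with a name other
+ than /sp/, that model name must be given with "-spmodel" so that
+ Julian can detect the word as an sp word.
+ The name of the sp model appended to word ends by "-iwsp" is also
+ specified with "-spmodel".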
+
+
+
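+ □ Short-pause segmentation
+ ============================================
+
+ (Note: the specification below is not yet implemented in 3.3p3.
+ It is planned for the next update.)
+
+ Since Julius 3.2, short-pause segmentation has been implemented: the
+ input is split at short pauses and recognized piece by piece. It is
+ enabled by specifying the configure option "--enable-sp-segment" at
+ compile time.
+
+ For audio data input, short-pause segmentation uses the following
+ algorithm.
+
+ (1) The speech segments are cut out of the input audio as described
+ above.
+ (2) Features are extracted from the cut-out speech segment.
+ (3) The following loop runs until the whole speech segment has been
+ processed.
+ (a) Run the first pass of recognition. If the sp word has been the
+ best hypothesis for at least a given number of consecutive
+ frames, suspend the first pass there and go to (b).
+ (b) Run the second pass on the segment processed in (a) and output
+ the recognition result.
+ (c) Resume the first pass from the position where it was suspended
+ in (a). Go to (a).
+
+ The detection condition in (3), the number of consecutive frames for
+ which the sp word is the best hypothesis, is specified with the
+ option "-spdur". The unit is frames and the default is 10.
+
+ The sp words to be detected are, as stated above, "words consisting
+ only of one or more /sp/". The acoustic model name corresponding to
+ sp can be changed with the option "-spmodel".
+
+ Restrictions: the current short-pause segmentation first has to cut
+ out the speech segment and extract its features in (1) and (2), so it
+ cannot be used with microphone input. It is also not supported in
+ Julian.
+
+
+ End.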
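The level and zero-crossing detection that README-iwsp-ja.txt describes for long pauses ("-lv", "-zc", "-headmargin", "-tailmargin") can be sketched roughly as follows. This is a simplified illustration, not the Julius implementation: the 100ms analysis window, the function names, and the margin defaults of 300ms and 400ms are assumptions made for the example, while the level 3000 and the 60 crossings per second follow the defaults quoted in the man page text.

```python
def count_level_crossings(samples, level):
    """Count sign changes between samples whose magnitude is at least `level`."""
    count = 0
    last_sign = 0
    for s in samples:
        if abs(s) < level:
            continue  # too quiet to count as part of a crossing
        sign = 1 if s > 0 else -1
        if last_sign != 0 and sign != last_sign:
            count += 1
        last_sign = sign
    return count


def detect_segments(samples, rate=16000, level=3000, zc_per_sec=60,
                    window_ms=100, head_margin_ms=300, tail_margin_ms=400):
    """Return (start, end) sample ranges judged to be speech.

    window_ms and the two margin defaults are assumptions of this sketch.
    """
    window = rate * window_ms // 1000
    zc_needed = zc_per_sec * window_ms / 1000.0  # threshold scaled to one window
    head = rate * head_margin_ms // 1000
    tail = rate * tail_margin_ms // 1000
    segments = []
    start = None
    for pos in range(0, len(samples), window):
        active = count_level_crossings(samples[pos:pos + window], level) >= zc_needed
        if active and start is None:
            start = max(0, pos - head)            # keep the quiet onset
        elif not active and start is not None:
            segments.append((start, min(len(samples), pos + tail)))  # keep the decay
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


# 0.5 s of silence, 0.5 s of a loud square wave, 1 s of silence.
signal = [0] * 8000 + [5000 if (i // 8) % 2 == 0 else -5000 for i in range(8000)] + [0] * 16000
print(detect_segments(signal))
```

The margins widen each detected segment on both sides, which is why the reported range starts before and ends after the loud part of the test signal.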
diff -crN julius-3.3p2-multipath/Release.txt julius-3.3p3-multipath/Release.txt
*** julius-3.3p2-multipath/Release.txt Thu Jan 1 09:00:00 1970
--- julius-3.3p3-multipath/Release.txt Wed Jan 8 17:21:48 2003
***************
*** 0 ****
--- 1,62 ----
+ 3.3p3 (2003.01.08)
+ ===================
+ - New inter-word short pause handling:
+ - [Julius] New option added for short pause handling. Specifying
+ "-iwspword" adds a short-pause word entry, namely "<UNK> [sp] sp sp",
+ to the dictionary. The entry content can be changed with "-iwspentry".
+ - [multi-path] Supports inter-word context-free short pause handling.
+ "-iwsp" option automatically appends a skippable short pause model at
+ every word end. The added model will also be ignored in context
+ modeling. The short pause model to be appended by "-iwsp" can be
+ specified by the "-spmodel" option. See documents for details.
+ - Fixes for audio input:
+ - Input delay improved: the initial response to mic input now
+ becomes much faster than previous versions (200ms -> 50ms approx.).
+ - No longer blocks when another process is using the audio device;
+ it now outputs an error and exits.
+ - Update support for libsndfile-1.0.x.
+ - Update support for ALSA-0.9.x
+ (to use this, add "--with-mictype=alsa" to configure option.)
+
+ 3.3p2 (2002.11.18)
+ ===================
+ - [multi-path version] Supports model-skip transition. From
+ this version, you can use "any" type of state transition in HTK
+ format acoustic model.
+ - New feature: "-record dir" records speech inputs successively
+ into the specified directory with time-stamp file names.
+ - fix segfault on Solaris with "-input mfcfile".
+ - fix blocking command input when using module mode and adinnet together.
+ - modified the output flush timing to make sure the last recognition
+ result will be output immediately.
+
+ 3.3p1 (2002.10.15)
+ ===================
+ Following bugs are fixed:
+ - Fixed incorrect default value of language weights for second pass (-lmp2).
+ - Fixed occasional failure to read the dictionary file (double spaces now allowed).
+ - Fixed wrong output of "-separatescore" together with monophone model.
+
+ 3.3 (2002.09.12)
+ ==================
+ The updates and new features from rev.3.2 are shown below.
+
+ - New features added:
+ - Server module mode - control Julius (input on/off, grammar switching)
+ from other client process via network.
+ - Online grammar changing and multi-grammar recognition supported.
+ - Noise robustness:
+ - Spectral subtraction incorporated.
+ - Support more variety of acoustic models:
+ - "multi-path version" is available that allows any transition
+ including loop, skip and parallel transition.
+ - A little improvement of recognition performance by bug fixes
+ - Other minor extensions (CMN parameter saving, etc.)
+ - Many bug fixes
+
+ English documents are available in
+ o online manuals (will be installed by default), and
+ o Translated full documentation in PDF format: Julius-3.2-book-e.pdf.
+ We are sorry that current release contains only documents for old rev.3.2.
+ We are now working to update it to catch up with the current rev.3.3 version.
+
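To illustrate the difference between the two short pause mechanisms listed under 3.3p3, the following sketch builds triphone labels for a word sequence. It is only an illustration under stated assumptions: the function name, the "(sp)" notation for the skippable word-end model, and leaving the pause model itself as a context-independent "sp" label are choices made for this example, not Julius internals.

```python
def to_triphones(words, iwsp=False, sp="sp"):
    """Return per-word triphone labels for `words` (a list of phone lists).

    With iwsp=False, an sp word in the sequence acts as context for its
    neighbors (context aware). With iwsp=True, an optional "(sp)" is appended
    after every word and is ignored when contexts are computed (context free).
    """
    flat = [p for w in words for p in w]
    labels = []
    idx = 0
    for w in words:
        out = []
        for p in w:
            if p == sp:
                out.append(sp)  # assumed: the pause model has no context variants
            else:
                name = p
                if idx > 0:
                    name = flat[idx - 1] + "-" + name
                if idx + 1 < len(flat):
                    name = name + "+" + flat[idx + 1]
                out.append(name)
            idx += 1
        if iwsp:
            out.append("(" + sp + ")")  # skippable, not part of `flat`
        labels.append(out)
    return labels


print(to_triphones([["a"], ["sp"], ["b"]]))      # sp word between a and b
print(to_triphones([["a"], ["b"]], iwsp=True))   # word-end sp appended
```

In the first call the neighbors of the sp word see "sp" as their context; in the second, "a" and "b" see each other directly and the appended pause does not change either label.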
diff -crN julius-3.3p2-multipath/Release.txt.ja julius-3.3p3-multipath/Release.txt.ja
*** julius-3.3p2-multipath/Release.txt.ja Thu Jan 1 09:00:00 1970
--- julius-3.3p3-multipath/Release.txt.ja Wed Jan 8 17:21:48 2003
***************
*** 0 ****
--- 1,42 ----
+ 3.3p3 (2003.01.08)
+ ===================
+ - Handling of inter-word short pauses has been newly added.
+ - [Julius] "-iwspword" now automatically adds a word entry
+ corresponding to the short pause to the recognition dictionary. The
+ N-gram entry, output string and acoustic HMM sequence of the added
+ word default to "<UNK> [sp] sp sp" and can be specified with the
+ option "-iwspentry".
+ - [multi-path version] Shorter, context-free inter-word silences are
+ now supported. The option "-iwsp" appends a skippable short pause
+ model to the end of the pronunciation of every word in the
+ dictionary. The appended model can be changed with "-spmodel". See
+ the separate document for details.
+ - Fixes for audio input:
+ - Input delay improved: the initial response to mic input is now
+ around 50ms, down from around 200ms in previous versions.
+ - When the audio device is in use by another process, changed to
+ exit with an error instead of blocking.
+ - Support for the latest device environments:
+ - libsndfile-1.0.x is supported.
+ - ALSA-0.9.x is supported.
+ OSS takes priority in auto-detection, so to use ALSA specify
+ "--with-mictype=alsa" explicitly at configure time.
+
+ 3.3p2 (2002.11.18)
+ ===================
+ - [multi-path version] Model-skip transitions are now supported: a
+ transition from the initial state of an acoustic model to its final
+ state (with no output state on the path) is supported. With this,
+ versions from this one on support "all" state transitions of HTK.
+ - New feature: "-record dir" records all speech data input to Julius
+ successively under the specified directory [...] [sp] sp sp" # default value of the word entry to be added
+
+ ##
+ ## For context-free short-term inter-word pauses:
+ ## append a skippable sp at the end of every word's pronunciation
+ ## (multi-path version only)
+ ##
+ #-iwsp # (valid for multi-path version only)
+ #-spmodel "sp" # name of the acoustic model to append (value shown is the default)
+
+ ######################################################################
#### Short-pause segmentation (enabled with --enable-sp-segment)
######################################################################
#-spdur 10 # number of consecutive sp frames on the 1st pass
***************
*** 173,178 ****
--- 190,198 ----
#-filelist file # list of files to be recognized
+ ######################################################################
+ #### Speech [...] (default: "sp")
##
! #-sp sp
##
## Error [...] (default: "sp")
##
! #-spmodel sp
##
## Error [...] number of states
######################################################################
+ #### Inter-word short pause
+ ######################################################################
+ ##
+ ## For context-free short-term inter-word pauses:
+ ## append a skippable sp at the end of every word's pronunciation
+ ## (multi-path version only)
+ ##
+ #-iwsp # (valid for multi-path version only)
+ # the name of the acoustic model to append is given by -spmodel
+
+ ######################################################################
#### Speech [...] list of files
+ ######################################################################
+ #### Speech [...] [sp] sp sp" # default word entry to be added
+
+ ##
+ ## For context-free short-term inter-word pauses:
+ ## append a skippable short pause model at every word end
+ ## (for multi-path version only)
+ ##
+ #-iwsp # (valid for multi-path version only)
+ #-spmodel "sp" # HMM model name to be appended (default: "sp")
+
+ ######################################################################
#### Short-pause Segmentation (--enable-sp-segment)
######################################################################
#-spdur 10 # sp duration frame on 1st pass
***************
*** 172,177 ****
--- 189,197 ----
#-filelist # specify file list to be recognized in batch mode
+ ######################################################################
+ #### Recording
+ ######################################################################
#-record directory # auto-save recognized speech data into the dir
######################################################################
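The "-spdur" setting in the sample configuration above counts consecutive first-pass frames in which the sp word is the best hypothesis. A minimal sketch of that trigger follows, assuming the decoder exposes one boolean per frame. The function name and the extra rule that a cut requires at least one non-pause frame in the current segment are assumptions of this sketch, not documented Julius behavior.

```python
def split_on_short_pause(best_is_sp, spdur=10):
    """Split a frame sequence where the sp word stayed best for `spdur` frames.

    best_is_sp: list of bools, True when the sp word is the best hypothesis
    at that frame. Returns (start, end) frame ranges, end exclusive.
    """
    segments = []
    seg_start = 0
    run = 0              # length of the current run of sp frames
    seen_speech = False  # assumed rule: never cut a segment that is all pause
    for t, is_sp in enumerate(best_is_sp):
        if is_sp:
            run += 1
        else:
            run = 0
            seen_speech = True
        if seen_speech and run >= spdur:
            segments.append((seg_start, t + 1))  # the second pass would run on this range
            seg_start = t + 1                    # the first pass resumes from here
            run = 0
            seen_speech = False
    if seg_start < len(best_is_sp):
        segments.append((seg_start, len(best_is_sp)))
    return segments


frames = [False] * 20 + [True] * 12 + [False] * 15
print(split_on_short_pause(frames))
```

Pauses shorter than `spdur` frames leave the segment intact, so a larger value makes the segmentation less eager.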
diff -crN julius-3.3p2-multipath/julius/00readme-julian.txt julius-3.3p3-multipath/julius/00readme-julian.txt
*** julius-3.3p2-multipath/julius/00readme-julian.txt Thu Sep 12 07:08:17 2002
--- julius-3.3p3-multipath/julius/00readme-julian.txt Wed Jan 8 17:21:48 2003
***************
*** 94,101 ****
option to read it at run time.
Most are the same as Julius.
! Options only in Julian: -dfa, -penalty1, -penalty2, -sp,
! -looktrellis
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2,
-transp, -silhead, -siltail, -spdur, -sepnum, -sepa-
ratescore
--- 94,101 ----
option to read it at run time.
Most are the same as Julius.
! Options only in Julian: -dfa, -penalty1, -penalty2, -look-
! trellis
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2,
-transp, -silhead, -siltail, -spdur, -sepnum, -sepa-
ratescore
***************
*** 131,136 ****
--- 131,145 ----
(with -input netaudio) set the server name and unit
ID of the Datlink unit.
+ -record directory
+ auto-save recognized speech data under the direc-
+ tory. Each segmented input is recorded one by
+ one, with a filename of "YYYY.MMDD.HHMMSS.raw",
+ which shows the system time when the input begins
+ (YYYY=year, MMDD=month/day, HHMMSS=hour/minute/sec-
+ ond). The file format is RAW, 16bit, 16kHz, mono,
+ big endian.
+
Speech Detection
-cutsilence
***************
*** 231,237 ****
-v dictionary_file
Word dictionary file (required)
! -sp {WORD|WORD[OUTSYM]|#num}
Name of short pause model as defined in the
hmmdefs. (default: "sp")
--- 240,246 ----
-v dictionary_file
Word dictionary file (required)
! -spmodel {WORD|WORD[OUTSYM]|#num}
Name of short pause model as defined in the
hmmdefs. (default: "sp")
***************
*** 312,321 ****
to select from whole monophone states. (default:
24)
Search Parameters (First Pass)
-b beamwidth
! Beam width (Number of HMM nodes). As this value
! increases the precision also increases, however,
processing time and memory usage also increase.
default value: acoustic model dependent
--- 321,338 ----
to select from whole monophone states. (default:
24)
+ Inter-word Short Pause Handling
+ -iwsp (Multi-path version only) Enable inter-word con-
+ text-free short pause handling. This option
+ appends a skippable short pause model for every
+ word end. The added model will also be ignored in
+ context modeling. The model specified by
+ "-spmodel" will be appended.
+
Search Parameters (First Pass)
-b beamwidth
! Beam width (Number of HMM nodes). As this value
! increases the precision also increases, however,
processing time and memory usage also increase.
default value: acoustic model dependent
***************
*** 323,383 ****
800 (triphone,PTM)
1000 (triphone,PTM, setup=v2.1)
! -1pass Only perform the first pass search. This mode is
automatically set when no 3-gram language model has
been specified (-nlr).
-
-
-realtime
-norealtime
Explicitly specify whether real-time (pipeline)
! processing will be done in the first pass or not.
! For file input, the default is OFF (-norealtime),
! for microphone, adinnet and NetAudio input, the
! default is ON (-realtime). This option relates to
! the way CMN is performed: when OFF CMN is calcu-
! lated for each input independently, when the real-
time option is ON the previous 5 second of input is
always used. Also refer to -progout.
-cmnsave filename
Save last CMN parameters computed while recognition
! to the specified file. The parameters will be
! saved to the file in each time a input is recog-
nized, so the output file always keeps the last CMN
! parameters. If output file already exist, it will
be overridden.
-cmnload filename
! Load initial CMN parameters previously saved in a
! file by "-cmnsave". This option enables Julian to
! recognize the first utterance of a live microphone
input or adinnet input with CMN.
Search Parameters (Second Pass)
-b2 hyponum
! Beam width (number of hypothesis) in second pass.
! If the count of word expansion at a certain length
! of hypothesis reaches this limit while search,
! shorter hypotheses are not expanded further. This
! prevents search to fall in breadth-first-like sta-
! tus stacking on the same position, and improve
search failure. (default: 30)
-n candidatenum
! The search continues till 'candidate_num' sentence
! hypotheses have been found. The obtained sentence
hypotheses are sorted by score, and final result is
! displayed in the order (see also the "-output"
option).
! The possibility that the optimum hypothesis is
found increases as this value is increased, but the
processing time also becomes longer.
! Default value depends on the engine setup on com-
pilation time:
10 (standard)
1 (fast, v2.1)
--- 340,398 ----
800 (triphone,PTM)
1000 (triphone,PTM, setup=v2.1)
! -1pass Only perform the first pass search. This mode is
automatically set when no 3-gram language model has
been specified (-nlr).
-realtime
-norealtime
Explicitly specify whether real-time (pipeline)
! processing will be done in the first pass or not.
! For file input, the default is OFF (-norealtime),
! for microphone, adinnet and NetAudio input, the
! default is ON (-realtime). This option relates to
! the way CMN is performed: when OFF CMN is calcu-
! lated for each input independently, when the real-
time option is ON the previous 5 second of input is
always used. Also refer to -progout.
-cmnsave filename
Save last CMN parameters computed while recognition
! to the specified file. The parameters will be
! saved to the file in each time a input is recog-
nized, so the output file always keeps the last CMN
! parameters. If output file already exist, it will
be overridden.
-cmnload filename
! Load initial CMN parameters previously saved in a
! file by "-cmnsave". This option enables Julian to
! recognize the first utterance of a live microphone
input or adinnet input with CMN.
Search Parameters (Second Pass)
-b2 hyponum
! Beam width (number of hypothesis) in second pass.
! If the count of word expansion at a certain length
! of hypothesis reaches this limit while search,
! shorter hypotheses are not expanded further. This
! prevents search to fall in breadth-first-like sta-
! tus stacking on the same position, and improve
search failure. (default: 30)
-n candidatenum
! The search continues till 'candidate_num' sentence
! hypotheses have been found. The obtained sentence
hypotheses are sorted by score, and final result is
! displayed in the order (see also the "-output"
option).
! The possibility that the optimum hypothesis is
found increases as this value is increased, but the
processing time also becomes longer.
! Default value depends on the engine setup on com-
pilation time:
10 (standard)
1 (fast, v2.1)
***************
*** 387,472 ****
end of search. Use with "-n" option. (default: 1)
-sb score
! Score envelope width for enveloped scoring. When
! calculating hypothesis score for each generated
hypothesis, its trellis expansion and viterbi oper-
ation will be pruned in the middle of the speech if
! score on a frame goes under [current maximum score
! of the frame- width]. Giving small value makes
! computation cost of the second pass smaller, but
computation error may occur. (default: 80.0)
-s stack_size
The maximum number of hypothesis that can be stored
on the stack during the search. A larger value may
! give more stable results, but increases the amount
of memory required. (default: 500)
-m overflow_pop_times
! Number of expanded hypotheses required to discon-
tinue the search. If the number of expanded
hypotheses is greater than this threshold, then the
! search is discontinued at that point. The larger
this value is, the longer the search will continue,
! but processing time for search failures will also
increase. (default: 2000)
-lookuprange nframe
! When performing word expansion, this option sets
! the number of frames before and after in which to
! determine next word hypotheses. This prevents the
! omission of short words but, with a large value,
! the number of expanded hypotheses increases and
system becomes slow. (default: 5)
-looktrellis
! Expand only the trellis words instead of grammar-
! permitted words. This option makes second pass
decoding faster, but may increase deletion error of
short words. (default: disabled)
Forced Alignment
-walign
Do viterbi alignment per word units from the recog-
! nition result. The word boundary frames and the
average acoustic scores per frame are calculated.
-palign
Do viterbi alignment per phoneme (model) units from
! the recognition result. The phoneme boundary
! frames and the average acoustic scores per frame
are calculated.
-salign
! Do viterbi alignment per HMM state from the recog-
! nition result. The state boundary frames and the
average acoustic scores per frame are calculated.
Server Module Mode
-module [port]
Run Julian on "Server Module Mode". After startup,
! Julian waits for tcp/ip connection from client.
Once connection is established, Julian start commu-
! nication with the client to process incoming com-
! mands from the client, or to output recognition
results, input trigger information and other system
! status to the client. The multi-grammar mode is
! only supported at this Server Module Mode. The
default port number is 10500.
-outcode [W][L][P][S][w][l][p][s]
! (Only for Server Module Mode) Switch which symbols
! of recognized words to be sent to client. Specify
! 'W' for output symbol, 'L' for grammar entry, 'P'
! for phoneme sequence, 'S' for score, respectively.
! Capital letters are for the second pass (final
! result), and small letters are for results of the
! first pass. For example, if you want to send only
! the output symbols and phone sequences as a recog-
nition result to a client, specify "-outcode WP".
Message Output
! -quiet Omit phoneme sequence and score, only output the
best word sequence hypothesis.
-progout
--- 402,490 ----
end of search. Use with "-n" option. (default: 1)
-sb score
! Score envelope width for enveloped scoring. When
! calculating hypothesis score for each generated
hypothesis, its trellis expansion and viterbi oper-
ation will be pruned in the middle of the speech if
! score on a frame goes under [current maximum score
! of the frame- width]. Giving small value makes
! computation cost of the second pass smaller, but
computation error may occur. (default: 80.0)
-s stack_size
The maximum number of hypothesis that can be stored
on the stack during the search. A larger value may
! give more stable results, but increases the amount
of memory required. (default: 500)
-m overflow_pop_times
! Number of expanded hypotheses required to discon-
tinue the search. If the number of expanded
hypotheses is greater than this threshold, then the
! search is discontinued at that point. The larger
this value is, the longer the search will continue,
! but processing time for search failures will also
increase. (default: 2000)
-lookuprange nframe
! When performing word expansion, this option sets
! the number of frames before and after in which to
! determine next word hypotheses. This prevents the
! omission of short words but, with a large value,
! the number of expanded hypotheses increases and
system becomes slow. (default: 5)
-looktrellis
! Expand only the trellis words instead of grammar-
! permitted words. This option makes second pass
decoding faster, but may increase deletion error of
short words. (default: disabled)
Forced Alignment
-walign
Do viterbi alignment per word units from the recog-
! nition result. The word boundary frames and the
average acoustic scores per frame are calculated.
-palign
Do viterbi alignment per phoneme (model) units from
! the recognition result. The phoneme boundary
! frames and the average acoustic scores per frame
are calculated.
-salign
! Do viterbi alignment per HMM state from the recog-
! nition result. The state boundary frames and the
average acoustic scores per frame are calculated.
Server Module Mode
+
+
+
-module [port]
Run Julian on "Server Module Mode". After startup,
! Julian waits for tcp/ip connection from client.
Once connection is established, Julian start commu-
! nication with the client to process incoming com-
! mands from the client, or to output recognition
results, input trigger information and other system
! status to the client. The multi-grammar mode is
! only supported at this Server Module Mode. The
default port number is 10500.
-outcode [W][L][P][S][w][l][p][s]
! (Only for Server Module Mode) Switch which symbols
! of recognized words to be sent to client. Specify
! 'W' for output symbol, 'L' for grammar entry, 'P'
! for phoneme sequence, 'S' for score, respectively.
! Capital letters are for the second pass (final
! result), and small letters are for results of the
! first pass. For example, if you want to send only
! the output symbols and phone sequences as a recog-
nition result to a client, specify "-outcode WP".
Message Output
! -quiet Omit phoneme sequence and score, only output the
best word sequence hypothesis.
-progout
***************
*** 474,480 ****
the first pass at regular intervals.
-proginterval msec
! set the output time interval of "-progout" in mil-
liseconds.
-demo Equivalent to "-progout -quiet"
--- 492,498 ----
the first pass at regular intervals.
-proginterval msec
! set the output time interval of "-progout" in mil-
liseconds.
-demo Equivalent to "-progout -quiet"
***************
*** 484,495 ****
information.
-C jconffile
! Load the jconf file. The options written in the
! file are included and expanded at the point. This
option can also be used within other jconf file.
-check wchmm
! (For debug) turn on interactive check mode of tree
lexicon structure at startup.
-check triphone
--- 502,513 ----
information.
-C jconffile
! Load the jconf file. The options written in the
! file are included and expanded at the point. This
option can also be used within other jconf file.
-check wchmm
! (For debug) turn on interactive check mode of tree
lexicon structure at startup.
-check triphone
***************
*** 503,514 ****
-help Display a brief description of all options.
EXAMPLES
! For examples of system usage, refer to the tutorial sec-
tion in the Julian documents.
NOTICE
! Note about path names in jconf files: relative paths in a
! jconf file are interpreted as relative to the jconf file
itself, not to the current directory.
SEE ALSO
--- 521,532 ----
-help Display a brief description of all options.
EXAMPLES
! For examples of system usage, refer to the tutorial sec-
tion in the Julian documents.
NOTICE
! Note about path names in jconf files: relative paths in a
! jconf file are interpreted as relative to the jconf file
itself, not to the current directory.
SEE ALSO
***************
*** 519,535 ****
http://sourceforge.jp/projects/julius/ (development site)
DIAGNOSTICS
! Julian normally will return the exit status 0. If an
! error occurs, Julian exits abnormally with exit status 1.
! If an input file cannot be found or cannot be loaded for
! some reason then Julian will skip processing for that
file.
BUGS
! There are some restrictions to the type and size of the
! models Julian can use. For a detailed explanation refer
! to the Julius documentation. For bug-reports, inquiries
! and comments please contact julius@kuis.kyoto-u.ac.jp or
julius@is.aist-nara.ac.jp.
AUTHORS
--- 537,553 ----
http://sourceforge.jp/projects/julius/ (development site)
DIAGNOSTICS
! Julian normally will return the exit status 0. If an
! error occurs, Julian exits abnormally with exit status 1.
! If an input file cannot be found or cannot be loaded for
! some reason then Julian will skip processing for that
file.
BUGS
! There are some restrictions to the type and size of the
! models Julian can use. For a detailed explanation refer
! to the Julius documentation. For bug-reports, inquires
! and comments please contact julius@kuis.kyoto-u.ac.jp or
julius@is.aist-nara.ac.jp.
AUTHORS
***************
*** 550,563 ****
Rev.3.2 (2001/08/15)
Rev.3.3 (2002/09/11)
! Development of above versions by Akinobu LEE (Nara
Institute of Science and Technology)
THANKS TO
! From Rev.3.2 Julian is released in the "Information Pro-
cessing Society, Continuous Speech Consortium".
! The Windows Microsoft Speech API compatible version was
developed by Takashi SUMIYOSHI (Kyoto University).
--- 568,581 ----
Rev.3.2 (2001/08/15)
Rev.3.3 (2002/09/11)
! Development of above versions by Akinobu LEE (Nara
Institute of Science and Technology)
THANKS TO
! From Rev.3.2 Julian is released in the "Information Pro-
cessing Society, Continuous Speech Consortium".
! The Windows Microsoft Speech API compatible version was
developed by Takashi SUMIYOSHI (Kyoto University).
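Note on the NOTICE text in the hunks above: relative paths inside a jconf file are resolved against the location of the jconf file, not the current directory. The following is only a minimal sketch of that rule, assuming POSIX-style '/' separators. The helper name resolve_jconf_path is hypothetical and is not taken from the Julius sources.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Julius): resolve `path`, as written inside
 * the jconf file `jconf`, relative to the directory that holds the jconf file.
 * Absolute paths are copied unchanged.  Returns the snprintf() result. */
int resolve_jconf_path(char *out, size_t len, const char *jconf, const char *path)
{
    const char *slash;

    if (path[0] == '/') {
        /* already absolute: nothing to resolve */
        return snprintf(out, len, "%s", path);
    }
    slash = strrchr(jconf, '/');
    if (slash == NULL) {
        /* jconf lives in the current directory: the path stays as written */
        return snprintf(out, len, "%s", path);
    }
    /* prepend the directory part of the jconf file name */
    return snprintf(out, len, "%.*s/%s", (int)(slash - jconf), jconf, path);
}
```

With this rule, a model path written as "model/hmmdefs" in "/etc/julius/main.jconf" would be read from "/etc/julius/model/hmmdefs" regardless of where the program is started.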
diff -crN julius-3.3p2-multipath/julius/00readme-julius.txt julius-3.3p3-multipath/julius/00readme-julius.txt
*** julius-3.3p2-multipath/julius/00readme-julius.txt Thu Sep 12 07:08:17 2002
--- julius-3.3p3-multipath/julius/00readme-julius.txt Wed Jan 8 17:21:48 2003
***************
*** 115,120 ****
--- 115,129 ----
(with -input netaudio) set the server name and unit
ID of the Datlink unit.
+ -record directory
+ Auto-save recognized speech data under the given
+ directory. Each segmented input is saved to its own
+ file named "YYYY.MMDD.HHMMSS.raw", which encodes the
+ system time at which the input began (YYYY=year,
+ MMDD=month/day, HHMMSS=hour/minute/second). The
+ file format is RAW, 16bit, 16kHz, mono, big
+ endian.
+
Speech Detection
-cutsilence
***************
*** 130,136 ****
this level then it is the end of the speech seg-
ment. (default: 3000)
-
-zc zerocrossnum
Zero crossing threshold per a second (default: 60)
--- 139,144 ----
***************
*** 188,193 ****
--- 196,202 ----
spectrum data should be computed beforehand by
mkss.
+
-ssalpha value
Alpha coefficient of spectral subtraction. Noise
will be subtracted stronger as this value gets
***************
*** 327,347 ****
to select from whole monophone states. (default:
24)
Short-pause Segmentation
! The short pause segmentation can be used for sucessive
! decoding of a long utterance. Enabled when compiled with
'--enable-sp-segment'.
-spdur Set the short-pause duration threshold in number of
! frames. If a short-pause word has the maximum
! likelihood in successive frames longer than this
! value, then interrupt the first pass and start the
second pass. (default: 10)
Search Parameters (First Pass)
-b beamwidth
! Beam width (Number of HMM nodes). As this value
! increases the precision also increases, however,
processing time and memory usage also increase.
default value: acoustic model dependent
--- 336,377 ----
to select from whole monophone states. (default:
24)
+ Inter-word Short Pause Handling
+ -iwspword
+ Add a word entry to the dictionary that corresponds
+ to inter-word short pauses. The content of the
+ word entry can be specified by "-iwspentry".
+
+ -iwspentry
+ Specify the word entry that will be added by
+ "-iwspword". (default: " [sp] sp sp")
+
+ -iwsp (Multi-path version only) Enable inter-word con-
+ text-free short pause handling. This option
+ appends a skippable short pause model for every
+ word end. The added model will also be ignored in
+ context modeling. The model to be appended can be
+ specified by "-spmodel" option.
+
+ -spmodel
+ Specify short-pause model name that will be used in
+ "-iwsp". (default: "sp")
+
Short-pause Segmentation
! The short pause segmentation can be used for sucessive
! decoding of a long utterance. Enabled when compiled with
'--enable-sp-segment'.
-spdur Set the short-pause duration threshold in number of
! frames. If a short-pause word has the maximum
! likelihood in successive frames longer than this
! value, then interrupt the first pass and start the
second pass. (default: 10)
Search Parameters (First Pass)
-b beamwidth
! Beam width (Number of HMM nodes). As this value
! increases the precision also increases, however,
processing time and memory usage also increase.
default value: acoustic model dependent
***************
*** 353,359 ****
Number of high frequency words to be separated from
the lexicon tree. (default: 150)
! -1pass Only perform the first pass search. This mode is
automatically set when no 3-gram language model has
been specified (-nlr).
--- 383,389 ----
Number of high frequency words to be separated from
the lexicon tree. (default: 150)
! -1pass Only perform the first pass search. This mode is
automatically set when no 3-gram language model has
been specified (-nlr).
***************
*** 361,412 ****
-norealtime
Explicitly specify whether real-time (pipeline)
! processing will be done in the first pass or not.
! For file input, the default is OFF (-norealtime),
! for microphone, adinnet and NetAudio input, the
! default is ON (-realtime). This option relates to
! the way CMN is performed: when OFF CMN is calcu-
! lated for each input independently, when the real-
time option is ON the previous 5 second of input is
always used. Also refer to -progout.
-cmnsave filename
Save last CMN parameters computed while recognition
! to the specified file. The parameters will be
! saved to the file in each time a input is recog-
nized, so the output file always keeps the last CMN
! parameters. If output file already exist, it will
be overridden.
-cmnload filename
! Load initial CMN parameters previously saved in a
! file by "-cmnsave". This option enables Julius to
! recognize the first utterance of a live microphone
input or adinnet input with CMN.
Search Parameters (Second Pass)
-b2 hyponum
! Beam width (number of hypothesis) in second pass.
! If the count of word expantion at a certain length
! of hypothesis reaches this limit while search,
! shorter hypotheses are not expanded further. This
! prevents search to fall in breadth-first-like sta-
! tus stacking on the same position, and improve
search failure. (default: 30)
-
-n candidatenum
! The search continues till 'candidate_num' sentence
! hypotheses have been found. The obtained sentence
hypotheses are sorted by score, and final result is
! displayed in the order (see also the "-output"
option).
! The possibility that the optimum hypothesis is
found increases as this value is increased, but the
processing time also becomes longer.
! Default value depends on the engine setup on com-
pilation time:
10 (standard)
1 (fast, v2.1)
--- 391,441 ----
-norealtime
Explicitly specify whether real-time (pipeline)
! processing will be done in the first pass or not.
! For file input, the default is OFF (-norealtime),
! for microphone, adinnet and NetAudio input, the
! default is ON (-realtime). This option relates to
! the way CMN is performed: when OFF CMN is calcu-
! lated for each input independently, when the real-
time option is ON the previous 5 second of input is
always used. Also refer to -progout.
-cmnsave filename
Save last CMN parameters computed while recognition
! to the specified file. The parameters will be
! saved to the file in each time a input is recog-
nized, so the output file always keeps the last CMN
! parameters. If output file already exist, it will
be overridden.
-cmnload filename
! Load initial CMN parameters previously saved in a
! file by "-cmnsave". This option enables Julius to
! recognize the first utterance of a live microphone
input or adinnet input with CMN.
Search Parameters (Second Pass)
-b2 hyponum
! Beam width (number of hypothesis) in second pass.
! If the count of word expantion at a certain length
! of hypothesis reaches this limit while search,
! shorter hypotheses are not expanded further. This
! prevents search to fall in breadth-first-like sta-
! tus stacking on the same position, and improve
search failure. (default: 30)
-n candidatenum
! The search continues till 'candidate_num' sentence
! hypotheses have been found. The obtained sentence
hypotheses are sorted by score, and final result is
! displayed in the order (see also the "-output"
option).
! The possibility that the optimum hypothesis is
found increases as this value is increased, but the
processing time also becomes longer.
! Default value depends on the engine setup on com-
pilation time:
10 (standard)
1 (fast, v2.1)
***************
*** 416,507 ****
end of search. Use with "-n" option. (default: 1)
-sb score
! Score envelope width for enveloped scoring. When
! calculating hypothesis score for each generated
hypothesis, its trellis expansion and viterbi oper-
ation will be pruned in the middle of the speech if
! score on a frame goes under [current maximum score
! of the frame- width]. Giving small value makes
! computation cost of the second pass smaller, but
computation error may occur. (default: 80.0)
-s stack_size
The maximum number of hypothesis that can be stored
on the stack during the search. A larger value may
! give more stable results, but increases the amount
of memory required. (default: 500)
-m overflow_pop_times
! Number of expanded hypotheses required to discon-
tinue the search. If the number of expanded
hypotheses is greater then this threshold then, the
! search is discontinued at that point. The larger
this value is, the longer the search will continue,
! but processing time for search failures will also
increase. (default: 2000)
-lookuprange nframe
! When performing word expansion, this option sets
! the number of frames before and after in which to
! determine next word hypotheses. This prevents the
! omission of short words but, with a large value,
! the number of expanded hypotheses increases and
system becomes slow. (default: 5)
Forced Alignment
-walign
Do viterbi alignment per word units from the recog-
! nition result. The word boundary frames and the
average acoustic scores per frame are calculated.
-palign
Do viterbi alignment per phoneme (model) units from
! the recognition result. The phoneme boundary
! frames and the average acoustic scores per frame
are calculated.
-
-salign
! Do viterbi alignment per HMM state from the recog-
! nition result. The state boundary frames and the
average acoustic scores per frame are calculated.
Server Module Mode
-module [port]
Run Julius on "Server Module Mode". After startup,
! Julius waits for tcp/ip connection from client.
Once connection is established, Julius start commu-
! nication with the client to process incoming com-
! mands from the client, or to output recognition
results, input trigger information and other system
! status to the client. The multi-grammar mode is
! only supported at this Server Module Mode. The
default port number is 10500.
-outcode [W][L][P][S][w][l][p][s]
! (Only for Server Module Mode) Switch which symbols
! of recognized words to be sent to client. Specify
! 'W' for output symbol, 'L' for grammar entry, 'P'
! for phoneme sequence, 'S' for score, respectively.
! Capital letters are for the second pass (final
! result), and small letters are for results of the
! first pass. For example, if you want to send only
! the output symbols and phone sequences as a recog-
nition result to a client, specify "-outcode WP".
Message Output
-separatescore
Output the language and acoustic scores separately.
! -quiet Omit phoneme sequence and score, only output the
best word sequence hypothesis.
-progout
Enable progressive output of the partial results on
the first pass at regular intervals.
-proginterval msec
! set the output time interval of "-progout" in mil-
liseconds.
-demo Equivalent to "-progout -quiet"
--- 445,538 ----
end of search. Use with "-n" option. (default: 1)
-sb score
! Score envelope width for enveloped scoring. When
! calculating hypothesis score for each generated
hypothesis, its trellis expansion and viterbi oper-
ation will be pruned in the middle of the speech if
! score on a frame goes under [current maximum score
! of the frame- width]. Giving small value makes
! computation cost of the second pass smaller, but
computation error may occur. (default: 80.0)
-s stack_size
The maximum number of hypothesis that can be stored
on the stack during the search. A larger value may
! give more stable results, but increases the amount
of memory required. (default: 500)
+
-m overflow_pop_times
! Number of expanded hypotheses required to discon-
tinue the search. If the number of expanded
hypotheses is greater then this threshold then, the
! search is discontinued at that point. The larger
this value is, the longer the search will continue,
! but processing time for search failures will also
increase. (default: 2000)
-lookuprange nframe
! When performing word expansion, this option sets
! the number of frames before and after in which to
! determine next word hypotheses. This prevents the
! omission of short words but, with a large value,
! the number of expanded hypotheses increases and
system becomes slow. (default: 5)
Forced Alignment
-walign
Do viterbi alignment per word units from the recog-
! nition result. The word boundary frames and the
average acoustic scores per frame are calculated.
-palign
Do viterbi alignment per phoneme (model) units from
! the recognition result. The phoneme boundary
! frames and the average acoustic scores per frame
are calculated.
-salign
! Do viterbi alignment per HMM state from the recog-
! nition result. The state boundary frames and the
average acoustic scores per frame are calculated.
Server Module Mode
-module [port]
Run Julius on "Server Module Mode". After startup,
! Julius waits for tcp/ip connection from client.
Once connection is established, Julius start commu-
! nication with the client to process incoming com-
! mands from the client, or to output recognition
results, input trigger information and other system
! status to the client. The multi-grammar mode is
! only supported at this Server Module Mode. The
default port number is 10500.
-outcode [W][L][P][S][w][l][p][s]
! (Only for Server Module Mode) Switch which symbols
! of recognized words to be sent to client. Specify
! 'W' for output symbol, 'L' for grammar entry, 'P'
! for phoneme sequence, 'S' for score, respectively.
! Capital letters are for the second pass (final
! result), and small letters are for results of the
! first pass. For example, if you want to send only
! the output symbols and phone sequences as a recog-
nition result to a client, specify "-outcode WP".
Message Output
-separatescore
Output the language and acoustic scores separately.
! -quiet Omit phoneme sequence and score, only output the
best word sequence hypothesis.
+
+
-progout
Enable progressive output of the partial results on
the first pass at regular intervals.
-proginterval msec
! set the output time interval of "-progout" in mil-
liseconds.
-demo Equivalent to "-progout -quiet"
***************
*** 511,522 ****
information.
-C jconffile
! Load the jconf file. The options written in the
! file are included and expanded at the point. This
option can also be used within other jconf file.
-check wchmm
! (For debug) turn on interactive check mode of tree
lexicon structure at startup.
-check triphone
--- 542,553 ----
information.
-C jconffile
! Load the jconf file. The options written in the
! file are included and expanded at the point. This
option can also be used within other jconf file.
-check wchmm
! (For debug) turn on interactive check mode of tree
lexicon structure at startup.
-check triphone
***************
*** 530,541 ****
-help Display a brief description of all options.
EXAMPLES
! For examples of system usage, refer to the tutorial sec-
tion in the Julius documents.
NOTICE
! Note about path names in jconf files: relative paths in a
! jconf file are interpreted as relative to the jconf file
itself, not to the current directory.
SEE ALSO
--- 561,572 ----
-help Display a brief description of all options.
EXAMPLES
! For examples of system usage, refer to the tutorial sec-
tion in the Julius documents.
NOTICE
! Note about path names in jconf files: relative paths in a
! jconf file are interpreted as relative to the jconf file
itself, not to the current directory.
SEE ALSO
***************
*** 546,565 ****
http://sourceforge.jp/projects/julius/ (development site)
DIAGNOSTICS
! Julius normally will return the exit status 0. If an
! error occurs, Julius exits abnormally with exit status 1.
! If an input file cannot be found or cannot be loaded for
! some reason then Julius will skip processing for that
file.
BUGS
! There are some restrictions to the type and size of the
! models Julius can use. For a detailed explanation refer
! to the Julius documentation. For bug-reports, inquires
! and comments please contact julius@kuis.kyoto-u.ac.jp or
julius@is.aist-nara.ac.jp.
AUTHORS
Rev.1.0 (1998/02/20)
Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
University)
--- 577,598 ----
http://sourceforge.jp/projects/julius/ (development site)
DIAGNOSTICS
! Julius normally will return the exit status 0. If an
! error occurs, Julius exits abnormally with exit status 1.
! If an input file cannot be found or cannot be loaded for
! some reason then Julius will skip processing for that
file.
BUGS
! There are some restrictions to the type and size of the
! models Julius can use. For a detailed explanation refer
! to the Julius documentation. For bug-reports, inquires
! and comments please contact julius@kuis.kyoto-u.ac.jp or
julius@is.aist-nara.ac.jp.
AUTHORS
+
+
Rev.1.0 (1998/02/20)
Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
University)
***************
*** 585,601 ****
Rev.3.2 (2001/08/15)
Rev.3.3 (2002/09/11)
! Development of above versions by Akinobu LEE (Nara
Institute of Science and Technology)
THANKS TO
! From Rev.3.2 Julius is released by the "Information Pro-
cessing Society, Continuous Speech Consortium".
! The Windows DLL version was developed and released by
Hideki BANNO (Nagoya University).
! The Windows Microsoft Speech API compatible version was
developed by Takashi SUMIYOSHI (Kyoto University).
--- 618,634 ----
Rev.3.2 (2001/08/15)
Rev.3.3 (2002/09/11)
! Development of above versions by Akinobu LEE (Nara
Institute of Science and Technology)
THANKS TO
! From Rev.3.2 Julius is released by the "Information Pro-
cessing Society, Continuous Speech Consortium".
! The Windows DLL version was developed and released by
Hideki BANNO (Nagoya University).
! The Windows Microsoft Speech API compatible version was
developed by Takashi SUMIYOSHI (Kyoto University).
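Note on the "-record" option added in the readme diff above: the text describes files named "YYYY.MMDD.HHMMSS.raw" that hold headerless 16bit, 16kHz, mono, big endian samples. The sketch below shows one way to build such a name and to decode one sample. Both function names are hypothetical and this is not the code Julius itself uses.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format "YYYY.MMDD.HHMMSS.raw" from a broken-down time. */
int record_filename(char *buf, size_t len, const struct tm *t)
{
    return snprintf(buf, len, "%04d.%02d%02d.%02d%02d%02d.raw",
                    t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
                    t->tm_hour, t->tm_min, t->tm_sec);
}

/* Hypothetical helper: decode one signed 16-bit big-endian sample.
 * p[0] is the high byte and p[1] is the low byte. */
int16_t be16_sample(const unsigned char *p)
{
    return (int16_t)(uint16_t)(((unsigned)p[0] << 8) | p[1]);
}
```

Decoding byte by byte keeps the reader independent of the host byte order, which matters because the recorded files are big endian while most desktop hosts are little endian.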
diff -crN julius-3.3p2-multipath/julius/configure julius-3.3p3-multipath/julius/configure
*** julius-3.3p2-multipath/julius/configure Mon Nov 18 23:22:36 2002
--- julius-3.3p3-multipath/julius/configure Wed Sep 11 01:13:52 2002
***************
*** 734,740 ****
enableval="$enable_julian"
EXECNAME=julian
PRODUCTNAME=Julian
! VERSION=3.3p2-multipath
cat >> confdefs.h <<\EOF
#define USE_DFA 1
EOF
--- 734,740 ----
enableval="$enable_julian"
EXECNAME=julian
PRODUCTNAME=Julian
! VERSION=3.3-multipath
cat >> confdefs.h <<\EOF
#define USE_DFA 1
EOF
***************
*** 747,753 ****
else
EXECNAME=julius
PRODUCTNAME=Julius
! VERSION=3.3p2-multipath
fi
--- 747,753 ----
else
EXECNAME=julius
PRODUCTNAME=Julius
! VERSION=3.3-multipath
fi
diff -crN julius-3.3p2-multipath/julius/configure.in julius-3.3p3-multipath/julius/configure.in
*** julius-3.3p2-multipath/julius/configure.in Mon Nov 18 23:22:36 2002
--- julius-3.3p3-multipath/julius/configure.in Thu Sep 12 07:12:03 2002
***************
*** 101,107 ****
dnl JULIUS related end
EXECNAME=julian
PRODUCTNAME=Julian
! VERSION=3.3p2-multipath
AC_DEFINE(USE_DFA)
AC_DEFINE(CATEGORY_TREE)
dnl JULIUS related begin
--- 101,107 ----
dnl JULIUS related end
EXECNAME=julian
PRODUCTNAME=Julian
! VERSION=3.3-multipath
AC_DEFINE(USE_DFA)
AC_DEFINE(CATEGORY_TREE)
dnl JULIUS related begin
***************
*** 109,115 ****
dnl JULIAN related end
EXECNAME=julius
PRODUCTNAME=Julius
! VERSION=3.3p2-multipath
dnl JULIAN related begin
)
dnl JULIAN related end
--- 109,115 ----
dnl JULIAN related end
EXECNAME=julius
PRODUCTNAME=Julius
! VERSION=3.3-multipath
dnl JULIAN related begin
)
dnl JULIAN related end
diff -crN julius-3.3p2-multipath/julius/define.h julius-3.3p3-multipath/julius/define.h
*** julius-3.3p2-multipath/julius/define.h Thu Sep 12 07:12:03 2002
--- julius-3.3p3-multipath/julius/define.h Tue Jan 7 01:11:53 2003
***************
*** 4,10 ****
/* define.h --- define some internal options (please do not modify) */
! /* $Id: define.h,v 1.5 2002/09/11 22:02:33 ri Exp $ */
/************************************************************/
/********** DO NOT MODIFY MANUALLY DEFINES BELOW ************/
--- 4,10 ----
/* define.h --- define some internal options (please do not modify) */
! /* $Id: define.h,v 1.6 2003/01/06 16:10:39 ri Exp $ */
/************************************************************/
/********** DO NOT MODIFY MANUALLY DEFINES BELOW ************/
***************
*** 111,113 ****
--- 111,121 ----
/* '01/11/28 by ri: malloc step for startnode */
#define STARTNODE_STEP 300
+
+ /* default value of iwsp penalty */
+ #define IWSP_PENALTY_DEFAULT -1.0
+
+ /* default dict entry for IW-sp word that will be added to dict with -iwspword */
+ #ifdef USE_NGRAM
+ #define IWSPENTRY_DEFAULT " [sp] sp sp"
+ #endif
diff -crN julius-3.3p2-multipath/julius/extern.h julius-3.3p3-multipath/julius/extern.h
*** julius-3.3p2-multipath/julius/extern.h Mon Nov 18 22:05:38 2002
--- julius-3.3p3-multipath/julius/extern.h Tue Dec 24 17:05:18 2002
***************
*** 21,27 ****
--- 21,29 ----
/* factoring_sub.c */
void make_iwcache_index(WCHMM_INFO *wchmm);
+ #ifndef CATEGORY_TREE
void make_sc_index(WCHMM_INFO *wchmm);
+ #endif
void make_successor_list(WCHMM_INFO *wchmm);
#ifdef USE_NGRAM
void max_successor_cache_init(WCHMM_INFO *wchmm);
diff -crN julius-3.3p2-multipath/julius/factoring_sub.c julius-3.3p3-multipath/julius/factoring_sub.c
*** julius-3.3p2-multipath/julius/factoring_sub.c Fri Oct 18 14:03:32 2002
--- julius-3.3p3-multipath/julius/factoring_sub.c Tue Dec 24 17:05:18 2002
***************
*** 259,264 ****
--- 259,265 ----
VERMES("done\n");
}
+ #ifndef CATEGORY_TREE
/* make index to valid factoring node, and make mapping from node ID
for factoring cache */
void
***************
*** 346,351 ****
--- 347,354 ----
wchmm->state[node].scid = wchmm->state[ac->arc].scid;
wchmm->state[ac->arc].scid = -1;
}
+ #ifdef USE_NGRAM
+ #ifdef UNIGRAM_FACTORING
if (wchmm->state[ac->arc].fscore != LOG_ZERO) {
if (wchmm->state[node].fscore != LOG_ZERO && wchmm->state[node].fscore != wchmm->state[ac->arc].fscore) {
j_error("Error: different fscore within word-head phone?\n");
***************
*** 353,363 ****
--- 356,369 ----
wchmm->state[node].fscore = wchmm->state[ac->arc].fscore;
wchmm->state[ac->arc].fscore = LOG_ZERO;
}
+ #endif /* UNIGRAM_FACTORING */
+ #endif /* USE_NGRAM */
}
}
}
}
+ #endif /* CATEGORY_TREE */
/* -------------------------------------------------------------------- */
/* factoring computation */
diff -crN julius-3.3p2-multipath/julius/global.h julius-3.3p3-multipath/julius/global.h
*** julius-3.3p2-multipath/julius/global.h Mon Nov 18 22:05:38 2002
--- julius-3.3p3-multipath/julius/global.h Tue Jan 7 01:10:58 2003
***************
*** 4,10 ****
/* global.h --- global variables */
! /* $Id: global.h,v 1.12 2002/11/06 15:27:10 ri Exp $ */
#ifndef __SENT_EXTERNAL_DEFINITION__
#define __SENT_EXTERNAL_DEFINITION__
--- 4,10 ----
/* global.h --- global variables */
! /* $Id: global.h,v 1.15 2003/01/06 16:10:39 ri Exp $ */
#ifndef __SENT_EXTERNAL_DEFINITION__
#define __SENT_EXTERNAL_DEFINITION__
***************
*** 111,121 ****
/* pause model names */
#ifdef USE_NGRAM
! char *head_silname GLOBAL_VAL(BEGIN_WORD_DEFAULT); /* sil model name to begin search */
! char *tail_silname GLOBAL_VAL(END_WORD_DEFAULT); /* sil model name to end search */
! #else /* USE_DFA */
! char *sp_name GLOBAL_VAL(SP_NAME_DEFAULT); /* sp model name that will be skipped at search */
! #endif
/* search parameters for acoustic computation */
GLOBAL int gprune_method GLOBAL_VAL(GPRUNE_SEL_UNDEF); /* Gaussian pruning method (default: use default of engine configuration) */
--- 111,129 ----
/* pause model names */
#ifdef USE_NGRAM
! GLOBAL char *head_silname GLOBAL_VAL(BEGIN_WORD_DEFAULT); /* sil model name to begin search */
! GLOBAL char *tail_silname GLOBAL_VAL(END_WORD_DEFAULT); /* sil model name to end search */
! /* short pause word auto create */
! GLOBAL boolean enable_iwspword GLOBAL_VAL(FALSE); /* enable inter-word short pause word creation */
! GLOBAL char *iwspentry GLOBAL_VAL(IWSPENTRY_DEFAULT); /* given dict entry for the iwspword */
! #endif
! GLOBAL char *spmodel_name GLOBAL_VAL(SPMODEL_NAME_DEFAULT); /* sp model logical name that will be skipped at search */
!
! /* 1) in DFA mode, a word with only "spmodel_name" model as a pronunciation will be specially handled as "short-pause word" */
! /* 2) if "-iwsp" enabled, the "spmodel_name" model will be attached to every word end within the dictionary */
! /* short pause special handling */
! GLOBAL boolean enable_iwsp GLOBAL_VAL(FALSE); /* enable inter-word short pause handling */
! GLOBAL LOGPROB iwsp_penalty GLOBAL_VAL(IWSP_PENALTY_DEFAULT); /* transition penalty of inter-word short pause */
/* search parameters for acoustic computation */
GLOBAL int gprune_method GLOBAL_VAL(GPRUNE_SEL_UNDEF); /* Gaussian pruning method (default: use default of engine configuration) */
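Note on the global.h hunk above: the changed lines add a leading GLOBAL to declarations that previously carried only GLOBAL_VAL(...). The definitions of the two macros are not part of this diff. The sketch below assumes the common "define once, declare everywhere" idiom, with hypothetical names (DEMO_GLOBAL_DEFINE, demo_enable_iwsp, demo_iwsp_penalty) used purely for illustration.

```c
/* Hypothetical switch: exactly one .c file would define this before
 * including the shared header; every other file would leave it undefined. */
#define DEMO_GLOBAL_DEFINE

#ifdef DEMO_GLOBAL_DEFINE
#define GLOBAL                  /* expands to nothing: this is the definition */
#define GLOBAL_VAL(v) = (v)     /* attach the initializer */
#else
#define GLOBAL extern           /* declaration only */
#define GLOBAL_VAL(v)           /* no initializer on a declaration */
#endif

/* The same header line serves as definition or declaration. */
GLOBAL int demo_enable_iwsp GLOBAL_VAL(0);
GLOBAL float demo_iwsp_penalty GLOBAL_VAL(-1.0f);
```

Under this assumption, a line without the leading GLOBAL would expand to a plain uninitialized definition in every file that includes the header, which may lead to duplicate or common-symbol definitions at link time. That is presumably why the new revision adds the prefix.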
diff -crN julius-3.3p2-multipath/julius/julian.man julius-3.3p3-multipath/julius/julian.man
*** julius-3.3p2-multipath/julius/julian.man Mon Nov 18 23:17:28 2002
--- julius-3.3p3-multipath/julius/julian.man Wed Jan 8 17:21:48 2003
***************
*** 91,97 ****
.PP
Most are the same as Julius.
.br
! Options only in Julian: -dfa, -penalty1, -penalty2, -sp, -looktrellis
.br
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2, -transp,
-silhead, -siltail, -spdur, -sepnum, -separatescore
--- 91,97 ----
.PP
Most are the same as Julius.
.br
! Options only in Julian: -dfa, -penalty1, -penalty2, -looktrellis
.br
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2, -transp,
-silhead, -siltail, -spdur, -sepnum, -separatescore
***************
*** 196,202 ****
.SS Word Dictionary
.Ip "\-v dictionary_file"
Word dictionary file (required)
! .Ip "\-sp {WORD|WORD[OUTSYM]|#num}"
Name of short pause model as defined in the hmmdefs.
(default: "sp")
.sp
--- 196,202 ----
.SS Word Dictionary
.Ip "\-v dictionary_file"
Word dictionary file (required)
! .Ip "\-spmodel {WORD|WORD[OUTSYM]|#num}"
Name of short pause model as defined in the hmmdefs.
(default: "sp")
.sp
***************
*** 341,346 ****
--- 341,352 ----
.Ip "\-gsnum N"
When using GMS, specify number of monophone state to select from whole
monophone states. (default: 24)
+ .SS Inter-word Short Pause Handling
+ .Ip "\-iwsp"
+ (Multi-path version only) Enable inter-word context-free short pause
+ handling. This option appends a skippable short pause model for every
+ word end. The added model will also be ignored in context modeling.
+ The model specified by "-spmodel" will be appended.
.SS Search Parameters (First Pass)
.Ip "\-b beamwidth"
Beam width (Number of HMM nodes).
diff -crN julius-3.3p2-multipath/julius/julian.man.ja julius-3.3p3-multipath/julius/julian.man.ja
*** julius-3.3p2-multipath/julius/julian.man.ja Mon Nov 18 23:17:28 2002
--- julius-3.3p3-multipath/julius/julian.man.ja Wed Jan 8 17:21:48 2003
***************
*** 181,187 ****
.SS Word Dictionary
.Ip "\-v dictionary_file"
Word dictionary file (.dict) (required).
! .Ip "\-sp {WORD|WORD[OUTSYM]|#num}"
Specify the name of the acoustic HMM that corresponds to a short pause within a sentence.
.br
(default: "sp")
--- 181,187 ----
.SS Word Dictionary
.Ip "\-v dictionary_file"
Word dictionary file (.dict) (required).
! .Ip "\-spmodel {WORD|WORD[OUTSYM]|#num}"
Specify the name of the acoustic HMM that corresponds to a short pause within a sentence.
.br
(default: "sp")
***************
*** 317,322 ****
--- 317,329 ----
.Ip "\-gsnum N"
When using GMS, compute triphones only for the top N states selected from
all monophone states. (default: 24)
+ .SS Inter-word Short Pause
+ .Ip "\-iwsp"
+ (Multi-path version only) Enable handling of shorter, context-free
+ inter-word silence. Specifically, a skippable short pause model is appended
+ to the end of the pronunciation of every word in the dictionary. The appended
+ model is excluded from context computation. The model to append is the one given by "-spmodel".
+ See the separate document for details.
.SS Search Parameters (First Pass)
.Ip "\-b beamwidth"
Beam width, specified as the number of HMM nodes.
diff -crN julius-3.3p2-multipath/julius/julius.man julius-3.3p3-multipath/julius/julius.man
*** julius-3.3p2-multipath/julius/julius.man Mon Nov 18 23:17:28 2002
--- julius-3.3p3-multipath/julius/julius.man Wed Jan 8 17:21:48 2003
***************
*** 349,354 ****
--- 349,369 ----
.Ip "\-gsnum N"
When using GMS, specify number of monophone state to select from whole
monophone states. (default: 24)
+ .SS Inter-word Short Pause Handling
+ .Ip "\-iwspword"
+ Add a word entry to the dictionary that corresponds to inter-word
+ short pauses. The content of the word entry can be specified by
+ "-iwspentry".
+ .Ip "\-iwspentry"
+ Specify the word entry that will be added by "-iwspword".
+ (default: " [sp] sp sp")
+ .Ip "\-iwsp"
+ (Multi-path version only) Enable inter-word context-free short pause
+ handling. This option appends a skippable short pause model for every
+ word end. The added model will also be ignored in context modeling.
+ The model to be appended can be specified by "-spmodel" option.
+ .Ip "\-spmodel"
+ Specify short-pause model name that will be used in "-iwsp". (default: "sp")
.SS Short-pause Segmentation
The short pause segmentation can be used for sucessive decoding of a
long utterance. Enabled when compiled with '--enable-sp-segment'.
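Note on the "-iwsp" description in the man page diff above: the appended short pause model is said to be ignored in context modeling. The sketch below illustrates only that idea. When looking for the right-hand context of a phone, it skips any entry that equals the short pause name. The function name and the flat phone-array representation are assumptions made for the example and do not reflect how Julius stores its lexicon.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the nearest phone after position
 * `i` that is not the short pause model `sp`, or -1 if no such phone exists.
 * This models "sp is transparent for context" on a flat phone sequence. */
int right_context_index(const char *const *phones, size_t n, size_t i,
                        const char *sp)
{
    size_t j;

    for (j = i + 1; j < n; j++) {
        if (strcmp(phones[j], sp) != 0) {
            return (int)j;      /* first non-pause phone to the right */
        }
    }
    return -1;                  /* only pauses (or nothing) follow */
}
```

For the sequence k, a, sp, i the right context of "a" is "i", exactly as if the pause were absent.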
diff -crN julius-3.3p2-multipath/julius/julius.man.ja julius-3.3p3-multipath/julius/julius.man.ja
*** julius-3.3p2-multipath/julius/julius.man.ja Mon Nov 18 23:17:28 2002
--- julius-3.3p3-multipath/julius/julius.man.ja Wed Jan 8 17:21:48 2003
***************
*** 336,341 ****
--- 336,360 ----
.Ip "\-gsnum N"
When using GMS, compute triphones only for the top N states selected from
all monophone states. (default: 24)
+ .SS Inter-word Short Pause
+ .Ip "\-iwspword"
+ Automatically add a word entry corresponding to the short pause to the recognition dictionary.
+ The content of the entry can be specified with "-iwspentry".
+ .Ip "\-iwspentry"
+ Specify the word entry to be added by "-iwspword". Give it in the same format
+ as the recognition dictionary, enclosed in quotation marks.
+ .br
+ (default: " [sp] sp sp")
+ .Ip "\-iwsp"
+ (Multi-path version only) Enable handling of shorter, context-free
+ inter-word silence. Specifically, a skippable short pause model is appended
+ to the end of the pronunciation of every word in the dictionary. The appended
+ model is excluded from context computation. The model to append can be specified with "-spmodel".
+ See the separate document for details.
+ .Ip "\-spmodel"
+ Specify the name of the short pause model appended by "-iwsp".
+ .br
+ (default: "sp").
.SS Short-pause Segmentation
.Ip "\-spdur"
(With --enable-sp-segment) Threshold on the sp duration in the first pass (unit: frames
diff -crN julius-3.3p2-multipath/julius/m_fusion.c julius-3.3p3-multipath/julius/m_fusion.c
*** julius-3.3p2-multipath/julius/m_fusion.c Thu Sep 12 07:12:03 2002
--- julius-3.3p3-multipath/julius/m_fusion.c Tue Jan 7 01:10:59 2003
***************
*** 5,11 ****
/* m_fusion.c --- initialize all models, work area and parameters
to make up system */
! /* $Id: m_fusion.c,v 1.9 2002/09/11 22:02:33 ri Exp $ */
#include
--- 5,11 ----
/* m_fusion.c --- initialize all models, work area and parameters
to make up system */
! /* $Id: m_fusion.c,v 1.11 2003/01/06 16:10:39 ri Exp $ */
#include
***************
*** 45,50 ****
--- 45,53 ----
}
#endif
+ /* find short pause model and set to hmminfo->sp */
+ htk_hmm_set_pause_model(hmminfo, spmodel_name);
+
/* set flag for context dependent handling (if not specified in command arg)*/
if (!ccd_flag_force) {
if (hmminfo->is_triphone) {
***************
*** 59,64 ****
--- 62,75 ----
} else {
hmminfo->prefer_cdset_avg = FALSE;
}
+
+ /* find short-pause model */
+ if (enable_iwsp) {
+ if (hmminfo->sp == NULL) {
+ j_error("cannot find short pause model \"%s\" in hmmdefs\n", spmodel_name);
+ }
+ hmminfo->iwsp_penalty = iwsp_penalty;
+ }
}
/* initialize HMM for Gaussian Mixture Selection */
***************
*** 86,93 ****
) {
j_error("ERROR: failed to read dictionary, terminated\n");
}
!
#ifdef USE_NGRAM
/* set {head,tail}_silwid */
winfo->head_silwid = voca_lookup_wid(head_silname, winfo);
if (winfo->head_silwid == WORD_INVALID) { /* not exist */
--- 97,118 ----
) {
j_error("ERROR: failed to read dictionary, terminated\n");
}
!
#ifdef USE_NGRAM
+ /* if necessary, append an IW-sp word to the dict if "-iwspword" specified */
+ if (enable_iwspword) {
+ if (
+ #ifdef MONOTREE
+ voca_append_htkdict(iwspentry, winfo, hmminfo, TRUE)
+ #else
+ voca_append_htkdict(iwspentry, winfo, hmminfo, FALSE)
+ #endif
+ == FALSE) {
+ j_error("Error: failed to make IW-sp word entry \"%s\"\n", iwspentry);
+ } else {
+ j_printerr("1 IW-sp word entry added\n");
+ }
+ }
/* set {head,tail}_silwid */
winfo->head_silwid = voca_lookup_wid(head_silname, winfo);
if (winfo->head_silwid == WORD_INVALID) { /* not exist */
***************
*** 98,109 ****
--- 123,137 ----
j_error("ERROR: tail sil word \"%s\" not exist in voca", tail_silname);
}
#endif
+
#ifdef PASS1_IWCD
if (triphone_check_flag && hmminfo->is_triphone) {
/* go into interactive triphone HMM check mode */
hmm_check(hmminfo, winfo);
}
#endif
+
+
}
***************
*** 154,160 ****
init_dfa(dfa, dfa_filename);
/* the rest preparation is done in multigram.c */
/* make_dfa_voca_ref(dfa, winfo);
! dfa_find_pause_word(dfa, winfo, hmminfo, sp_name);
extract_cpair(dfa);*/
}
--- 182,188 ----
init_dfa(dfa, dfa_filename);
/* the rest preparation is done in multigram.c */
/* make_dfa_voca_ref(dfa, winfo);
! dfa_find_pause_word(dfa, winfo, hmminfo);
extract_cpair(dfa);*/
}
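
The m_fusion.c hunks above resolve the short pause model once at start-up and raise an error only when -iwsp needs a model that cannot be found. A minimal sketch of that check follows, assuming a flat array of named models in place of Julius's HMM information structure. The `Model` type and both function names are hypothetical and are not part of Julius.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a logical HMM entry (Julius uses HMM_Logical). */
typedef struct {
  const char *name;
} Model;

/* Return the model called `name`, or NULL when it is not defined. */
const Model *find_model(const Model *models, size_t n, const char *name)
{
  size_t i;
  assert(name != NULL);
  for (i = 0; i < n; i++) {
    if (strcmp(models[i].name, name) == 0) return &models[i];
  }
  return NULL;
}

/* Return 0 when the configuration is usable, -1 when inter-word short
   pause handling is requested but no short pause model was found. */
int check_iwsp_config(const Model *sp, int enable_iwsp)
{
  return (enable_iwsp && sp == NULL) ? -1 : 0;
}
```

A missing model is tolerated as long as -iwsp is off, which mirrors the patch: the lookup always runs, but the error is raised only inside the `enable_iwsp` branch.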
diff -crN julius-3.3p2-multipath/julius/m_info.c julius-3.3p3-multipath/julius/m_info.c
*** julius-3.3p2-multipath/julius/m_info.c Mon Nov 18 22:05:38 2002
--- julius-3.3p3-multipath/julius/m_info.c Tue Jan 7 01:12:33 2003
***************
*** 4,10 ****
/* m_info.c --- output system information */
! /* $Id: m_info.c,v 1.18 2002/11/07 15:51:47 ri Exp $ */
#include
--- 4,10 ----
/* m_info.c --- output system information */
! /* $Id: m_info.c,v 1.21 2003/01/06 16:10:39 ri Exp $ */
#include
***************
*** 208,220 ****
#ifdef CLASS_NGRAM
j_printf("\t(-clw)class Ngram weight= %f\n", class_weight);
#endif
! #else /* USE_DFA */
! if (dfa->sp_id != WORD_INVALID) {
! j_printf("\t(-sp)shortpause HMM name= %s (wid=%d[\"%s\"])\n", sp_name, dfa->sp_id, winfo->woutput[dfa->sp_id]);
}
{
int i;
! j_printf("\t shortpause category =");
for(i=0;i<dfa->term_num;i++) {
if (dfa->is_sp[i]) {
j_printf(" %d", i);
--- 208,228 ----
#ifdef CLASS_NGRAM
j_printf("\t(-clw)class Ngram weight= %f\n", class_weight);
#endif
! #endif /* USE_NGRAM */
! j_printf("\t(-sp)shortpause HMM name= \"%s\"", spmodel_name);
! if (hmminfo->sp != NULL) {
! j_printf(", used model = \"%s\"", hmminfo->sp->name);
! if (hmminfo->sp->is_pseudo) {
! j_printf(" (pseudo)");
! } else {
! j_printf(" (physical)");
! }
}
+ j_printf("\n");
+ #ifdef USE_DFA
{
int i;
! j_printf("\t found sp category IDs =");
for(i=0;i<dfa->term_num;i++) {
if (dfa->is_sp[i]) {
j_printf(" %d", i);
***************
*** 223,228 ****
--- 231,246 ----
j_printf("\n");
}
#endif
+ if (enable_iwsp) {
+ j_printf("\t inter-word short pause = on (append \"%s\" for each word tail)\n", hmminfo->sp->name);
+ j_printf("\t sp transition penalty = %+2.1f\n", iwsp_penalty);
+ }
+ #ifdef USE_NGRAM
+ if (enable_iwspword) {
+ j_printf("\tIW-sp word added to dict= \"%s\"\n", iwspentry);
+ }
+ #endif
+
j_printf("\t 1st pass method = ");
#ifdef WPAIR
# ifdef WPAIR_KEEP_NLIMIT
diff -crN julius-3.3p2-multipath/julius/m_options.c julius-3.3p3-multipath/julius/m_options.c
*** julius-3.3p2-multipath/julius/m_options.c Mon Nov 18 22:05:38 2002
--- julius-3.3p3-multipath/julius/m_options.c Tue Jan 7 01:10:59 2003
***************
*** 4,10 ****
/* m_options.c --- process options and set values */
! /* $Id: m_options.c,v 1.12 2002/11/06 15:27:10 ri Exp $ */
#include
--- 4,10 ----
/* m_options.c --- process options and set values */
! /* $Id: m_options.c,v 1.14 2003/01/06 16:10:39 ri Exp $ */
#include
***************
*** 322,337 ****
} else if (strmatch(argv[i],"-penalty2")) { /* word insertion penalty (pass2) */
penalty2 = (LOGPROB)atof(NEXTARG);
continue;
- } else if (strmatch(argv[i],"-sp")) { /* name of short pause word */
- sp_name = NEXTARG;
- continue;
#endif
#ifdef USE_NGRAM
} else if (strmatch(argv[i],"-silhead")) { /* head silence word name */
head_silname = NEXTARG;
continue;
} else if (strmatch(argv[i],"-siltail")) { /* tail silence word name */
tail_silname = NEXTARG;
continue;
#ifdef CLASS_NGRAM
} else if (strmatch(argv[i],"-class")) {
--- 322,349 ----
} else if (strmatch(argv[i],"-penalty2")) { /* word insertion penalty (pass2) */
penalty2 = (LOGPROB)atof(NEXTARG);
continue;
#endif
+ } else if (strmatch(argv[i],"-spmodel") || strmatch(argv[i], "-sp")) { /* name of short pause model */
+ spmodel_name = NEXTARG;
+ continue;
+ } else if (strmatch(argv[i],"-iwsp")) { /* enable inter-word short pause handling */
+ enable_iwsp = TRUE;
+ continue;
+ } else if (strmatch(argv[i],"-iwsppenalty")) { /* set inter-word short pause transition penalty */
+ iwsp_penalty = atof(NEXTARG);
+ continue;
#ifdef USE_NGRAM
} else if (strmatch(argv[i],"-silhead")) { /* head silence word name */
head_silname = NEXTARG;
continue;
} else if (strmatch(argv[i],"-siltail")) { /* tail silence word name */
tail_silname = NEXTARG;
+ continue;
+ } else if (strmatch(argv[i],"-iwspword")) { /* add short pause word */
+ enable_iwspword = TRUE;
+ continue;
+ } else if (strmatch(argv[i],"-iwspentry")) { /* content of the iwspword */
+ iwspentry = NEXTARG;
continue;
#ifdef CLASS_NGRAM
} else if (strmatch(argv[i],"-class")) {
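
The m_options.c hunk above moves the short pause model name out of the grammar-only branch, keeps "-sp" as an alias of "-spmodel", and adds "-iwsp" and "-iwsppenalty". The sketch below shows the same parsing in isolation. It assumes exact string comparison (the behaviour of Julius's `strmatch` is not shown in this patch), and the `SpOptions` struct and `parse_sp_options` function are hypothetical; Julius keeps these settings in global variables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three settings touched by the patch. */
typedef struct {
  const char *spmodel_name;  /* name of the short pause HMM */
  int enable_iwsp;           /* inter-word short pause handling on/off */
  double iwsp_penalty;       /* transition penalty toward the inserted sp */
} SpOptions;

/* Parse the short pause options. Returns 0 on success, -1 when an option
   is missing its argument. Unknown options are ignored in this sketch. */
int parse_sp_options(int argc, char *argv[], SpOptions *opt)
{
  int i;
  assert(opt != NULL);
  for (i = 1; i < argc; i++) {
    if (strcmp(argv[i], "-spmodel") == 0 || strcmp(argv[i], "-sp") == 0) {
      if (++i >= argc) return -1;
      opt->spmodel_name = argv[i];
    } else if (strcmp(argv[i], "-iwsp") == 0) {
      opt->enable_iwsp = 1;
    } else if (strcmp(argv[i], "-iwsppenalty") == 0) {
      if (++i >= argc) return -1;
      opt->iwsp_penalty = atof(argv[i]);
    }
  }
  return 0;
}
```

Because the option argument is consumed right after its flag, a negative penalty such as "-1.5" is read as a value and never mistaken for an option.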
diff -crN julius-3.3p2-multipath/julius/m_usage.c julius-3.3p3-multipath/julius/m_usage.c
*** julius-3.3p2-multipath/julius/m_usage.c Mon Nov 18 22:05:38 2002
--- julius-3.3p3-multipath/julius/m_usage.c Tue Jan 7 01:10:59 2003
***************
*** 4,10 ****
/* m_usage.c --- print help */
! /* $Id: m_usage.c,v 1.14 2002/11/06 15:27:10 ri Exp $ */
#include
--- 4,10 ----
/* m_usage.c --- print help */
! /* $Id: m_usage.c,v 1.17 2003/01/06 16:10:39 ri Exp $ */
#include
***************
*** 104,114 ****
#ifdef USE_NGRAM
j_printf(" [-silhead wordname] specify beginning-of-sentence word (%s)\n", head_silname);
j_printf(" [-siltail wordname] specify end-of-sentence word (%s)\n", tail_silname);
- #else
- j_printf(" [-sp HMMname] name of short pause model (\"%s\")\n",sp_name);
#endif
j_printf(" [-forcedict] not terminate on error words, just ignore\n");
!
j_printf("\n Acoustic Model:\n");
j_printf(" -h hmmdefsfile HMM definition file name\n");
j_printf(" [-hlist HMMlistfile] HMMlist filename (must for triphone model)\n");
--- 104,116 ----
#ifdef USE_NGRAM
j_printf(" [-silhead wordname] specify beginning-of-sentence word (%s)\n", head_silname);
j_printf(" [-siltail wordname] specify end-of-sentence word (%s)\n", tail_silname);
#endif
j_printf(" [-forcedict] not terminate on error words, just ignore\n");
! #ifdef USE_NGRAM
! j_printf(" [-iwspword] add an sp-word to the vocabulary for inter-word CD sp\n");
! j_printf(" [-iwspentry dictentry] specify content of the iwspword (%s)\n", iwspentry);
! #endif
!
j_printf("\n Acoustic Model:\n");
j_printf(" -h hmmdefsfile HMM definition file name\n");
j_printf(" [-hlist HMMlistfile] HMMlist filename (must for triphone model)\n");
***************
*** 123,128 ****
--- 125,131 ----
j_printf(" [-force_ccd] force to handle IWCD\n");
j_printf(" [-no_ccd] don't handle IWCD\n");
j_printf(" [-notypecheck] don't check input parameter type\n");
+ j_printf(" [-spmodel HMMname] name of short pause model (\"%s\")\n",spmodel_name);
j_printf("\n Acoustic Computation Method:\n");
j_printf(" [-gprune methodname] select Gaussian pruning method:\n");
***************
*** 188,193 ****
--- 191,198 ----
j_printf(" [-oldiwcd] use full lcdset\n");
#endif
#endif
+ j_printf(" [-iwsp] turn on inter-word short pause handling (off)\n");
+ j_printf(" [-iwsppenalty] trans. penalty for inter-word sp (%.1f)\n", iwsp_penalty);
j_printf("\n On-the-fly Decoding: (default: on=mic/net off=files)\n");
j_printf(" [-realtime] turn on, input streamed with last CMN\n");
diff -crN julius-3.3p2-multipath/julius/multi-gram.c julius-3.3p3-multipath/julius/multi-gram.c
*** julius-3.3p2-multipath/julius/multi-gram.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/julius/multi-gram.c Mon Jan 6 18:11:09 2003
***************
*** 4,10 ****
/* multi_gram.c --- multiple grammar handling */
! /* $Id: multi-gram.c,v 1.3 2002/09/11 22:02:33 ri Exp $ */
#include
--- 4,10 ----
/* multi_gram.c --- multiple grammar handling */
! /* $Id: multi-gram.c,v 1.5 2003/01/06 09:11:02 ri Exp $ */
#include
***************
*** 354,361 ****
if (m->newbie) {
/* map dict item to dfa terminal symbols */
make_dfa_voca_ref(m->dfa, m->winfo);
! /* set dfa->sp_id and dfa->is_sp from sp_name */
! dfa_find_pause_word(m->dfa, m->winfo, hmminfo, sp_name);
/* build catergory-pair information */
extract_cpair(m->dfa);
}
--- 354,361 ----
if (m->newbie) {
/* map dict item to dfa terminal symbols */
make_dfa_voca_ref(m->dfa, m->winfo);
! /* set dfa->sp_id and dfa->is_sp */
! dfa_find_pause_word(m->dfa, m->winfo, hmminfo);
/* build catergory-pair information */
extract_cpair(m->dfa);
}
diff -crN julius-3.3p2-multipath/julius/search.h julius-3.3p3-multipath/julius/search.h
*** julius-3.3p2-multipath/julius/search.h Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/julius/search.h Tue Nov 26 12:54:01 2002
***************
*** 46,51 ****
--- 46,52 ----
LOGPROB *g_prev; /* viterbi score on 1 phoneme before connection point */
#endif
HMM_Logical *last_ph; /* the last triphone of a hypo */
+ boolean last_ph_sp_attached;
#ifdef USE_NGRAM
# ifndef PASS2_STRICT_IWCD
LOGPROB lscore; /* N-gram score of last word */
diff -crN julius-3.3p2-multipath/julius/search_bestfirst_v1.c julius-3.3p3-multipath/julius/search_bestfirst_v1.c
*** julius-3.3p2-multipath/julius/search_bestfirst_v1.c Thu Oct 17 15:26:44 2002
--- julius-3.3p3-multipath/julius/search_bestfirst_v1.c Mon Jan 6 18:05:12 2003
***************
*** 54,59 ****
--- 54,60 ----
if (ccd_flag) {
memcpy(dst->g_prev, src->g_prev, sizeof(LOGPROB)*peseqlen);
dst->last_ph = src->last_ph;
+ dst->last_ph_sp_attached = src->last_ph_sp_attached;
#ifdef USE_NGRAM
dst->lscore = src->lscore;
#endif
***************
*** 81,86 ****
--- 82,88 ----
tmp->prev=NULL;
tmp->g = (LOGPROB *)mymalloc(sizeof(LOGPROB)*peseqlen);
tmp->last_ph = NULL;
+ tmp->last_ph_sp_attached = FALSE;
if (ccd_flag) {
tmp->g_prev = (LOGPROB *)mymalloc(sizeof(LOGPROB)*peseqlen);
#ifdef USE_NGRAM
***************
*** 109,114 ****
--- 111,117 ----
static LOGPROB *g;
static HMM_Logical **phmmseq;
static int phmmlen_max;
+ static boolean *has_sp;
/* １単語分のトレリス計算用のワークエリア wordtrellis[t][]を確保 */
/* allocate work area 'wordtrellis[t][]' for trellis computation of a word */
***************
*** 131,136 ****
--- 134,140 ----
phmmlen_max = winfo->maxwlen + 2;
phmmseq = (HMM_Logical **)mymalloc(sizeof(HMM_Logical *) * phmmlen_max);
+ has_sp = (boolean *)mymalloc(sizeof(boolean) * phmmlen_max);
}
/* 上記のワークエリア wordtrellis[t][] を解放 */
/* free the 'wordtrellis[t][]' */
***************
*** 141,161 ****
free(wordtrellis);
free(g);
free(phmmseq);
}
/* 最終状態への遷移の確率の最大値を返す */
/* return the maximum transition log probability to final state */
static LOGPROB
! max_out_arc(HMM_Logical *l)
{
! int state_num;
! HTK_HMM_Trans *tr;
int afrom;
LOGPROB a;
! LOGPROB max_a;
!
! state_num = hmm_logical_state_num(l);
! tr = hmm_logical_trans(l);
max_a = LOG_ZERO;
for (afrom = 0; afrom < state_num - 1; afrom++) {
a = tr->a[afrom][state_num-1];
--- 145,162 ----
free(wordtrellis);
free(g);
free(phmmseq);
+ free(has_sp);
}
/* 最終状態への遷移の確率の最大値を返す */
/* return the maximum transition log probability to final state */
static LOGPROB
! get_max_out_arc(HTK_HMM_Trans *tr, int state_num)
{
! LOGPROB max_a;
int afrom;
LOGPROB a;
!
max_a = LOG_ZERO;
for (afrom = 0; afrom < state_num - 1; afrom++) {
a = tr->a[afrom][state_num-1];
***************
*** 163,168 ****
--- 164,174 ----
}
return(max_a);
}
+ static LOGPROB
+ max_out_arc(HMM_Logical *l)
+ {
+ return(get_max_out_arc(hmm_logical_trans(l), hmm_logical_state_num(l)));
+ }
/**********************************************************************/
***************
*** 269,275 ****
}
#ifdef TCD
if (now->last_ph != NULL) {
! j_printf("inherited last_ph: %s\n", (now->last_ph)->name);
} else {
j_printf("no last_ph inherited\n");
}
--- 275,283 ----
}
#ifdef TCD
if (now->last_ph != NULL) {
! j_printf("inherited last_ph: %s", (now->last_ph)->name);
! if (now->last_ph_sp_attached) j_printf(" (sp attached)");
! j_printf("\n");
} else {
j_printf("no last_ph inherited\n");
}
***************
*** 291,296 ****
--- 299,305 ----
}
for (i=0;iwseq[word][i];
+ has_sp[i] = FALSE;
}
/* 最終単語と last_ph 間の単語間triphoneを考慮 */
/* consider cross-word context dependency between the last word and now->last_ph */
***************
*** 308,313 ****
--- 317,323 ----
} else {
phmmseq[phmmlen-2] = ret;
}
+
ret = get_left_context_HMM(now->last_ph, wend->name, hmminfo);
if (ret == NULL) {
/* fallback to the original bi/mono-phone */
***************
*** 321,342 ****
phmmseq[phmmlen-1] = ret;
}
#ifdef TCD
j_printf("w=");
for(i=0;i<winfo->wlen[word];i++) {
j_printf(" %s",(winfo->wseq[word][i])->name);
}
! j_printf(" | %s\n", (now->last_ph)->name);
! j_printf("scan for:");
for (i=0;iname);
}
j_printf("\n");
#endif
/* 単語HMMを作る */
/* make word HMM */
! whmm = new_make_word_hmm(hmminfo, phmmseq, phmmlen);
/* backscan なので，計算前の g[] 初期値は now->g_prev[] を使用 */
/* As backscan enabled, the initial forward score g[] is set by
--- 331,364 ----
phmmseq[phmmlen-1] = ret;
}
+ has_sp[phmmlen-2] = has_sp[phmmlen-1] = FALSE;
+ if (enable_iwsp) {
+ has_sp[phmmlen-2] = TRUE;
+ if (now->last_ph_sp_attached) {
+ has_sp[phmmlen-1] = TRUE;
+ }
+ }
+
+
#ifdef TCD
j_printf("w=");
for(i=0;i<winfo->wlen[word];i++) {
j_printf(" %s",(winfo->wseq[word][i])->name);
+ if (has_sp[i]) j_printf("(sp)");
}
! j_printf(" | %s", (now->last_ph)->name);
! if (now->last_ph_sp_attached) j_printf("(sp)");
! j_printf("\nscan for:");
for (i=0;iname);
+ if (has_sp[i]) j_printf("(sp)");
}
j_printf("\n");
#endif
/* 単語HMMを作る */
/* make word HMM */
! whmm = new_make_word_hmm(hmminfo, phmmseq, phmmlen, has_sp);
/* backscan なので，計算前の g[] 初期値は now->g_prev[] を使用 */
/* As backscan enabled, the initial forward score g[] is set by
***************
*** 351,360 ****
--- 373,391 ----
in the HMM */
store_point = hmm_logical_state_num(phmmseq[0]) - 2;
store_point_maxarc = max_out_arc(phmmseq[0]);
+ if (enable_iwsp && has_sp[0]) {
+ store_point += hmm_logical_state_num(hmminfo->sp) - 2;
+ if (store_point_maxarc < max_out_arc(hmminfo->sp)) {
+ store_point_maxarc = max_out_arc(hmminfo->sp);
+ }
+ }
/* scan中に直前単語とこの単語をまたぐ場所を設定 */
/* set where is the connection point of the last word in the HMM */
crossword_point = whmm->len - hmm_logical_state_num(phmmseq[phmmlen-1]);
+ if (enable_iwsp && has_sp[phmmlen-1]) {
+ crossword_point -= hmm_logical_state_num(hmminfo->sp) - 2;
+ }
} else { /* not backscan mode */
***************
*** 370,376 ****
/* $BC18l(BHMM$B$r:n$k(B */
/* make word HMM */
! whmm = new_make_word_hmm(hmminfo, winfo->wseq[word], winfo->wlen[word]);
/* 計算前の g[] 初期値は now->g[] を使用 */
/* the initial forward score g[] is set by now->g[] */
--- 401,414 ----
/* 単語HMMを作る */
/* make word HMM */
! for(i=0;i<winfo->wlen[word];i++) {
! has_sp[i] = FALSE;
! }
! if (enable_iwsp) {
! has_sp[winfo->wlen[word]-1] = TRUE;
! }
!
! whmm = new_make_word_hmm(hmminfo, winfo->wseq[word], winfo->wlen[word], has_sp);
/* 計算前の g[] 初期値は now->g[] を使用 */
/* the initial forward score g[] is set by now->g[] */
***************
*** 383,388 ****
--- 421,432 ----
in the HMM */
store_point = hmm_logical_state_num(winfo->wseq[word][0]) - 2;
store_point_maxarc = max_out_arc(winfo->wseq[word][0]);
+ if (enable_iwsp && has_sp[0]) {
+ store_point += hmm_logical_state_num(hmminfo->sp) - 2;
+ if (store_point_maxarc < max_out_arc(hmminfo->sp)) {
+ store_point_maxarc = max_out_arc(hmminfo->sp);
+ }
+ }
/* scan中に直前単語とこの単語をまたぐ場所は，なし */
/* the connection point of the last word is not exist in the HMM */
***************
*** 393,399 ****
/* 音素環境非依存の場合は単純に最終単語分の HMM を作成 */
/* for monophone: simple make HMM for the last word */
! whmm = new_make_word_hmm(hmminfo, winfo->wseq[word], winfo->wlen[word]);
/* 計算前の g[] 初期値は now->g[] を使用 */
/* the initial forward score g[] is set by now->g[] */
--- 437,449 ----
/* 音素環境非依存の場合は単純に最終単語分の HMM を作成 */
/* for monophone: simple make HMM for the last word */
! for(i=0;i<winfo->wlen[word];i++) {
! has_sp[i] = FALSE;
! }
! if (enable_iwsp) {
! has_sp[winfo->wlen[word]-1] = TRUE;
! }
! whmm = new_make_word_hmm(hmminfo, winfo->wseq[word], winfo->wlen[word], has_sp);
/* $B7W;;A0$N(B g[] $B=i4|CM$O(B now->g[] $B$r;HMQ(B */
/* the initial forward score g[] is set by now->g[] */
***************
*** 614,625 ****
} else {
now->last_ph = winfo->wseq[word][0];
}
}
/* free work area */
free_hmm(whmm);
#ifdef TCD
! if (ccd_flag) j_printf("last_ph = %s\n", (now->last_ph)->name);
#endif
}
--- 664,680 ----
} else {
now->last_ph = winfo->wseq[word][0];
}
+ now->last_ph_sp_attached = has_sp[0];
}
/* free work area */
free_hmm(whmm);
#ifdef TCD
! if (ccd_flag) {
! j_printf("last_ph = %s", (now->last_ph)->name);
! if (now->last_ph_sp_attached) j_printf(" (sp attached)");
! j_printf("\n");
! }
#endif
}
***************
*** 687,692 ****
--- 742,748 ----
/* 元仮説をscanした時の末端音素HMM -> 新仮説の直前音素HMM */
/* inherit last_ph */
new->last_ph = now->last_ph;
+ new->last_ph_sp_attached = now->last_ph_sp_attached;
#ifdef USE_NGRAM
/* set current LM score */
***************
*** 882,887 ****
--- 938,944 ----
newphone = winfo->wseq[word][winfo->wlen[word]-1];
if (ccd_flag) {
new->last_ph = NULL;
+ new->last_ph_sp_attached = FALSE;
#ifdef USE_NGRAM
new->lscore = nword->lscore;
#endif
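
The search_bestfirst_v1.c hunks above split the "maximum arc into the final state" computation into a helper that takes a transition matrix directly, and shift `store_point` by the number of emitting short pause states when an sp is attached to the first phone. The sketch below restates both steps under simplifying assumptions: `Trans` is a fixed-size stand-in for HTK_HMM_Trans, LOGPROB is a double, the LOG_ZERO value is arbitrary, and the comparison inside the loop is assumed because that line lies outside the hunk context. `store_point` is a hypothetical helper name.

```c
#include <assert.h>

#define LOG_ZERO (-1000000.0)  /* stand-in value, not taken from Julius */
#define MAX_STATE 8            /* arbitrary limit for this sketch */

typedef double LOGPROB;

/* Simplified transition matrix: a[from][to] holds log probabilities. */
typedef struct {
  LOGPROB a[MAX_STATE][MAX_STATE];
} Trans;

/* Return the largest log probability among arcs entering the final state. */
LOGPROB get_max_out_arc(const Trans *tr, int state_num)
{
  LOGPROB max_a = LOG_ZERO;
  int afrom;
  assert(state_num >= 2 && state_num <= MAX_STATE);
  for (afrom = 0; afrom < state_num - 1; afrom++) {
    LOGPROB a = tr->a[afrom][state_num - 1];
    if (max_a < a) max_a = a;
  }
  return max_a;
}

/* Index offset of the store point: emitting states of the first phone,
   plus the emitting states of the sp model when one is attached.
   Each model has two non-emitting states (entry and exit). */
int store_point(int first_phone_state_num, int sp_state_num, int has_sp0)
{
  int p = first_phone_state_num - 2;
  if (has_sp0) p += sp_state_num - 2;
  return p;
}
```

For a 5-state phone and a 3-state sp, the store point moves from 3 to 4 once the short pause is attached.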
diff -crN julius-3.3p2-multipath/julius/search_bestfirst_v2.c julius-3.3p3-multipath/julius/search_bestfirst_v2.c
*** julius-3.3p2-multipath/julius/search_bestfirst_v2.c Thu Oct 17 15:27:25 2002
--- julius-3.3p3-multipath/julius/search_bestfirst_v2.c Mon Jan 6 17:44:45 2003
***************
*** 53,58 ****
--- 53,59 ----
dst->tre = src->tre;
if (ccd_flag) {
dst->last_ph = src->last_ph;
+ dst->last_ph_sp_attached = src->last_ph_sp_attached;
}
#ifdef USE_NGRAM
dst->totallscore = src->totallscore;
***************
*** 77,82 ****
--- 78,84 ----
tmp->prev=NULL;
tmp->g = (LOGPROB *)mymalloc(sizeof(LOGPROB)*peseqlen);
tmp->last_ph = NULL;
+ tmp->last_ph_sp_attached = FALSE;
if (ccd_flag) {
#ifdef USE_NGRAM
tmp->totallscore = LOG_ZERO;
***************
*** 103,108 ****
--- 105,111 ----
static HMM_Logical **phmmseq;
static int phmmlen_max;
static HMM_Logical *tailph;
+ static boolean *has_sp;
/* １単語分のトレリス計算用のワークエリア wordtrellis[t][]を確保 */
/* allocate work area 'wordtrellis[t][]' for trellis computation of a word */
***************
*** 125,130 ****
--- 128,134 ----
phmmlen_max = winfo->maxwlen + 2;
phmmseq = (HMM_Logical **)mymalloc(sizeof(HMM_Logical *) * phmmlen_max);
+ has_sp = (boolean *)mymalloc(sizeof(boolean) * phmmlen_max);
}
/* 上記のワークエリア wordtrellis[t][] を解放 */
/* free the 'wordtrellis[t][]' */
***************
*** 135,140 ****
--- 139,145 ----
free(wordtrellis);
free(g);
free(phmmseq);
+ free(has_sp);
}
***************
*** 180,186 ****
g_new[0..framelen-1]. Scan should not terminate at least it reaches
'least_frame'. */
static void
! do_viterbi(LOGPROB *g, LOGPROB *g_new, HMM_Logical **phmmseq, int phmmlen, HTK_Param *param, int framelen, int least_frame, LOGPROB *final_g)
{
HMM *whmm; /* HMM */
int wordhmmnum; /* length of above */
--- 185,191 ----
g_new[0..framelen-1]. Scan should not terminate at least it reaches
'least_frame'. */
static void
! do_viterbi(LOGPROB *g, LOGPROB *g_new, HMM_Logical **phmmseq, boolean *has_sp, int phmmlen, HTK_Param *param, int framelen, int least_frame, LOGPROB *final_g)
{
HMM *whmm; /* HMM */
int wordhmmnum; /* length of above */
***************
*** 200,206 ****
/* 単語HMMを作る */
/* make word HMM */
! whmm = new_make_word_hmm(hmminfo, phmmseq, phmmlen);
wordhmmnum = whmm->len;
if (wordhmmnum >= winfo->maxwn + 10) {
j_error("scan_word: word too long\n");
--- 205,211 ----
/* 単語HMMを作る */
/* make word HMM */
! whmm = new_make_word_hmm(hmminfo, phmmseq, phmmlen, has_sp);
wordhmmnum = whmm->len;
if (wordhmmnum >= winfo->maxwn + 10) {
j_error("scan_word: word too long\n");
***************
*** 313,326 ****
/* for next_word(), do Viterbi computation for the last phone ('lastphone'),
with now->g[] as initial value. Results are stored in new->g[] */
static void
! do_viterbi_next_word(NODE *now, NODE *new, HMM_Logical *lastphone, HTK_Param *param)
{
int t, n;
for(t=0; tg[t];
/* do viterbi computation for the last phone */
! do_viterbi(g, new->g, &lastphone, 1, param, peseqlen, now->estimated_next_t, &(new->final_g));
}
/* 文仮説の前向き尤度を更新する
--- 318,333 ----
/* for next_word(), do Viterbi computation for the last phone ('lastphone'),
with now->g[] as initial value. Results are stored in new->g[] */
static void
! do_viterbi_next_word(NODE *now, NODE *new, HMM_Logical *lastphone, boolean sp, HTK_Param *param)
{
int t, n;
for(t=0; tg[t];
/* do viterbi computation for the last phone */
! phmmseq[0] = lastphone;
! has_sp[0] = sp;
! do_viterbi(g, new->g, phmmseq, has_sp, 1, param, peseqlen, now->estimated_next_t, &(new->final_g));
}
/* 文仮説の前向き尤度を更新する
***************
*** 332,337 ****
--- 339,345 ----
int i,t;
WORD_ID word;
int phmmlen;
+ boolean tail_ph_sp_attached;
/* ----------------------- prepare phoneme sequence ------------------ */
/* triphoneなら先頭の1音素はここでは対象外(あとでnext_wordでやる) */
***************
*** 373,383 ****
--- 381,393 ----
} else {
tailph = winfo->wseq[word][winfo->wlen[word]-1];
}
+
/* $BD9$5(B1$B$NC18l$Owlen[word] == 1) {
now->last_ph = tailph;
+ now->last_ph_sp_attached = TRUE;
#ifdef TCD
j_printf("suspended as %s\n", (now->last_ph)->name);
#endif
***************
*** 392,402 ****
}
for (i=0;iwseq[word][i+1];
}
phmmseq[phmmlen-1] = tailph;
} else {
phmmlen = winfo->wlen[word];
! for (i=0;iwseq[word][i];
}
/* 元のg[]をいったん待避しておく */
--- 402,418 ----
}
for (i=0;iwseq[word][i+1];
+ has_sp[i] = FALSE;
}
phmmseq[phmmlen-1] = tailph;
+ has_sp[phmmlen-1] = (enable_iwsp) ? TRUE : FALSE;
} else {
phmmlen = winfo->wlen[word];
! for (i=0;iwseq[word][i];
! has_sp[i] = FALSE;
! }
! if (enable_iwsp) has_sp[phmmlen-1] = TRUE;
}
/* 元のg[]をいったん待避しておく */
***************
*** 405,416 ****
/* viterbi$B$rg[] $B$r99?7$9$k(B */
/* do viterbi computation for phmmseq from g[] to now->g[] */
! do_viterbi(g, now->g, phmmseq, phmmlen, param, peseqlen, now->estimated_next_t, &(now->final_g));
if (ccd_flag) {
/* $Blast_ph $B$r99?7(B */
/* update 'now->last_ph' for future scan_word() */
now->last_ph = winfo->wseq[word][0];
#ifdef TCD
j_printf("last_ph = %s\n", (now->last_ph)->name);
#endif
--- 421,433 ----
/* viterbi$B$rg[] $B$r99?7$9$k(B */
/* do viterbi computation for phmmseq from g[] to now->g[] */
! do_viterbi(g, now->g, phmmseq, has_sp, phmmlen, param, peseqlen, now->estimated_next_t, &(now->final_g));
if (ccd_flag) {
/* $Blast_ph $B$r99?7(B */
/* update 'now->last_ph' for future scan_word() */
now->last_ph = winfo->wseq[word][0];
+ now->last_ph_sp_attached = FALSE; /* wlen > 1 here */
#ifdef TCD
j_printf("last_ph = %s\n", (now->last_ph)->name);
#endif
***************
*** 498,509 ****
/* $B@b$NMzNr>pJs$H$7$FJ]B8(B */
/* keep the lastphone for next scan_word() */
new->last_ph = lastphone;
}
if (ccd_flag) {
/* 最後の1音素(lastphone)分をscanし，更新したスコアを new に保存 */
/* scan the lastphone and set the updated score to new->g[] */
! do_viterbi_next_word(now, new, lastphone, param);
g_src = new->g;
} else {
g_src = now->g;
--- 515,527 ----
/* $B@b$NMzNr>pJs$H$7$FJ]B8(B */
/* keep the lastphone for next scan_word() */
new->last_ph = lastphone;
+ new->last_ph_sp_attached = now->last_ph_sp_attached;
}
if (ccd_flag) {
/* 最後の1音素(lastphone)分をscanし，更新したスコアを new に保存 */
/* scan the lastphone and set the updated score to new->g[] */
! do_viterbi_next_word(now, new, lastphone, now->last_ph_sp_attached, param);
g_src = new->g;
} else {
g_src = now->g;
***************
*** 723,729 ****
if (ccd_flag) {
/* 最終音素分を viterbi して最終スコアを設定 */
/* scan the last phone and update the final score */
! do_viterbi_next_word(now, new, now->last_ph, param);
new->score = new->final_g;
} else {
new->score = now->final_g;
--- 741,747 ----
if (ccd_flag) {
/* 最終音素分を viterbi して最終スコアを設定 */
/* scan the last phone and update the final score */
! do_viterbi_next_word(now, new, now->last_ph, now->last_ph_sp_attached, param);
new->score = new->final_g;
} else {
new->score = now->final_g;
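
Throughout the search_bestfirst_v2.c hunks above, a `has_sp[]` array runs parallel to the phone sequence and is set so that only the word-final phone carries a trailing short pause, and only when -iwsp is enabled. A minimal sketch of that marking step follows. The function name `mark_word_tail_sp` is hypothetical, and `int` stands in for Julius's `boolean` type.

```c
#include <assert.h>

/* Fill has_sp[0..phmmlen-1]: every phone is cleared, then the last phone
   of the word is flagged when inter-word short pause handling is on. */
void mark_word_tail_sp(int *has_sp, int phmmlen, int enable_iwsp)
{
  int i;
  assert(has_sp != 0 && phmmlen > 0);
  for (i = 0; i < phmmlen; i++) {
    has_sp[i] = 0;
  }
  if (enable_iwsp) {
    has_sp[phmmlen - 1] = 1;
  }
}
```

Clearing the whole array first matters because, as in the patch, the array is a reused work area and may still hold flags from a previously scanned word.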
diff -crN julius-3.3p2-multipath/julius/wchmm.c julius-3.3p3-multipath/julius/wchmm.c
*** julius-3.3p2-multipath/julius/wchmm.c Fri Oct 18 14:11:36 2002
--- julius-3.3p3-multipath/julius/wchmm.c Mon Jan 6 18:10:40 2003
***************
*** 243,249 ****
/* ある単語のある位置の音素から外へ出る遷移のリストを得る */
/* make outgoing transition list for given phone position of a word */
static void
! get_outtrans_list(WCHMM_INFO *wchmm, WORD_ID w, int pos, int *node, LOGPROB *a, int *num, int maxnum)
{
HMM_Logical *ltmp;
int states;
--- 243,249 ----
/* ある単語のある位置の音素から外へ出る遷移のリストを得る */
/* make outgoing transition list for given phone position of a word */
static void
! get_outtrans_list(WCHMM_INFO *wchmm, WORD_ID w, int pos, int *node, LOGPROB *a, int *num, int maxnum, boolean insert_sp)
{
HMM_Logical *ltmp;
int states;
***************
*** 262,273 ****
ltmp = wchmm->winfo->wseq[w][pos];
states = hmm_logical_state_num(ltmp);
!
/* check initial->final state */
if ((hmm_logical_trans(ltmp))->a[0][states-1] != LOG_ZERO) {
/* recursive call for previous phone */
oldnum = *num;
! get_outtrans_list(wchmm, w, pos-1, node, a, num, maxnum);
/* add probability of the skip transition to all the previous ones */
for(k=oldnum;k<*num;k++) {
a[k] += (hmm_logical_trans(ltmp))->a[0][states-1];
--- 262,273 ----
ltmp = wchmm->winfo->wseq[w][pos];
states = hmm_logical_state_num(ltmp);
!
/* check initial->final state */
if ((hmm_logical_trans(ltmp))->a[0][states-1] != LOG_ZERO) {
/* recursive call for previous phone */
oldnum = *num;
! get_outtrans_list(wchmm, w, pos-1, node, a, num, maxnum, FALSE); /* previous phone should not be an sp-inserted phone */
/* add probability of the skip transition to all the previous ones */
for(k=oldnum;k<*num;k++) {
a[k] += (hmm_logical_trans(ltmp))->a[0][states-1];
***************
*** 285,290 ****
--- 285,309 ----
(*num)++;
}
}
+ /* for -iwsp, add outgoing arc from the tail sp model
+ only if insert_sp == TRUE.
+ insert_sp should be TRUE only when the connecting [pos] phone is also an end phone of the to-be-added word (i.e. homophone word)
+ */
+ /* */
+ if (enable_iwsp && insert_sp) {
+ /* consider sp */
+ for (k = 1; k < hmm_logical_state_num(wchmm->hmminfo->sp) - 1; k++) {
+ prob = hmm_logical_trans(wchmm->hmminfo->sp)->a[k][hmm_logical_state_num(wchmm->hmminfo->sp)-1];
+ if (prob != LOG_ZERO) {
+ if (*num >= maxnum) {
+ j_error("Maximum outtrans list num exceeded! (%d)\n", maxnum);
+ }
+ node[*num] = wchmm->offset[w][pos] + (states - 2) + k - 1;
+ a[*num] = prob;
+ (*num)++;
+ }
+ }
+ }
}
/*printf(" %d(%s)-%d:\"%s\", num=%d\n", w, wchmm->winfo->woutput[w], pos,
(pos < 0) ? "BGN" : wchmm->winfo->wseq[w][pos]->name, *num);*/
***************
*** 316,321 ****
--- 335,341 ----
int ltmp_state_num;
int ato, kkk;
LOGPROB prob;
+ int ntmp;
/*
* if (matchlen > 0) {
***************
*** 377,390 ****
out_num_prev = 1;
} else {
/*printf("%d(%s)\n", word, wchmm->winfo->woutput[word]);*/
! get_outtrans_list(wchmm, matchword, add_to, out_from, out_a, &out_num_prev, wchmm->winfo->maxwn);
/*printf("NUM=%d\n", out_num_prev);*/
}
if (add_tail - add_head + 1 > 0) { /* there are new phones to be created */
{
- LOGPROB prob;
- int ntmp = n;
#ifdef PASS1_IWCD
CD_Set *lcd = NULL;
#endif
--- 397,409 ----
out_num_prev = 1;
} else {
/*printf("%d(%s)\n", word, wchmm->winfo->woutput[word]);*/
! /* on -iwsp, trailing sp is needed only when no phone will be created */
! get_outtrans_list(wchmm, matchword, add_to, out_from, out_a, &out_num_prev, wchmm->winfo->maxwn, (add_tail - add_head + 1 > 0) ? FALSE : TRUE);
/*printf("NUM=%d\n", out_num_prev);*/
}
if (add_tail - add_head + 1 > 0) { /* there are new phones to be created */
{
#ifdef PASS1_IWCD
CD_Set *lcd = NULL;
#endif
***************
*** 527,533 ****
} /* end of phone loop */
}
} /* new phone node creation loop for this word */
!
/* make mapping: word <-> node on wchmm */
for (j=0;j 0) { /* there are new phones to be created */
! int ntmp_bak;
!
! /* set short pause state info */
! ntmp_bak = ntmp;
! if (hmminfo->sp->is_pseudo) {
! for(k = 1;k < hmm_logical_state_num(hmminfo->sp) - 1; k++) {
! wchmm->state[ntmp].outstyle = AS_LSET;
! wchmm->state[ntmp].out.lset = &(hmminfo->sp->body.pseudo->stateset[k]);
! wchmm->state[ntmp].ac = NULL;
! wchmm->stend[ntmp] = WORD_INVALID;
! ntmp++;
! if (ntmp >= wchmm->maxwcn) wchmm_extend(wchmm);
! }
! } else {
! for(k = 1;k < hmm_logical_state_num(hmminfo->sp) - 1; k++) {
! wchmm->state[ntmp].outstyle = AS_STATE;
! wchmm->state[ntmp].out.state = hmminfo->sp->body.defined->s[k];
! wchmm->state[ntmp].ac = NULL;
! wchmm->stend[ntmp] = WORD_INVALID;
! ntmp++;
! if (ntmp >= wchmm->maxwcn) wchmm_extend(wchmm);
! }
! }
! ntmp = ntmp_bak;
! /* connect incoming arcs from previous phone */
! out_num_next = 0;
! for (ato = 1; ato < hmm_logical_state_num(hmminfo->sp); ato++) {
! prob = hmm_logical_trans(hmminfo->sp)->a[0][ato];
! if (prob != LOG_ZERO) {
! /* to control short pause insertion, transition probability toward
! the word-end short pause will be given a penalty */
! prob += hmminfo->iwsp_penalty;
! if (ato == hmm_logical_state_num(hmminfo->sp) - 1) {
! /* model has a model skip transition, just inherit them to next */
! for(kkk=0; kkksp)->a[0][hmm_logical_state_num(hmminfo->sp)-1] == LOG_ZERO) {
! /* to make insertion sp model to have no effect on the original path,
! the skip transition probability should be 0.0 (=100%) */
! prob = 0.0;
! for(kkk=0; kkksp) - 1; k++) {
! for (ato = 1; ato < hmm_logical_state_num(hmminfo->sp); ato++) {
! prob = hmm_logical_trans(hmminfo->sp)->a[k][ato];
! if (prob != LOG_ZERO) {
! if (ato == hmm_logical_state_num(hmminfo->sp) - 1) {
! out_from_next[out_num_next] = ntmp;
! out_a_next[out_num_next] = prob;
! out_num_next++;
! } else {
! add_wacc(wchmm, ntmp, prob, ntmp + ato - k);
! }
! }
! }
! ntmp++;
! }
! /* swap work area for next */
! for(kkk=0;kkk node on wchmm */
for (j=0;jwinfo->wseq[word][j]) - 2;
}
}
/* make word-end node (always create for each new word) */
wchmm->wordend[word] = n; /* tail node of 'word' is 'n' */
wchmm->stend[n] = word; /* node 'k' is a tail node of 'word' */
wchmm->state[n].ac = NULL;
wchmm->state[n].out.state = NULL;
! /* the last outgoing arcs are in out_from[] */
for(k = 0; k < out_num_prev; k++) {
add_wacc(wchmm, out_from[k], out_a[k], n);
}
--- 642,657 ----
n += hmm_logical_state_num(wchmm->winfo->wseq[word][j]) - 2;
}
}
+ if (enable_iwsp && add_tail - add_head + 1 > 0) {
+ n += hmm_logical_state_num(hmminfo->sp) - 2;
+ if (n != ntmp) j_error("Algorithm Error! cannot match\n");
+ }
/* make word-end node (always create for each new word) */
wchmm->wordend[word] = n; /* tail node of 'word' is 'n' */
wchmm->stend[n] = word; /* node 'k' is a tail node of 'word' */
wchmm->state[n].ac = NULL;
wchmm->state[n].out.state = NULL;
! /* connect the final outgoing arcs in out_from[] to the word end node */
for(k = 0; k < out_num_prev; k++) {
add_wacc(wchmm, out_from[k], out_a[k], n);
}
***************
*** 555,561 ****
/* check if the new word has whole word-skipping transition */
/* (use out_from and out_num_prev temporary) */
out_num_prev = 0;
! get_outtrans_list(wchmm, word, word_len-1, out_from, out_a, &out_num_prev, wchmm->winfo->maxwn);
for(k=0;k<out_num_prev;k++) {
if (out_from[k] == wchmm->wordbegin[word]) {
j_printerr("\n*** ERROR: WORD SKIPPING TRANSITION NOT ALLOWED ***\n");
--- 662,668 ----
/* check if the new word has whole word-skipping transition */
/* (use out_from and out_num_prev temporary) */
out_num_prev = 0;
! get_outtrans_list(wchmm, word, word_len-1, out_from, out_a, &out_num_prev, wchmm->winfo->maxwn, TRUE);
for(k=0;k<out_num_prev;k++) {
if (out_from[k] == wchmm->wordbegin[word]) {
j_printerr("\n*** ERROR: WORD SKIPPING TRANSITION NOT ALLOWED ***\n");
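Note on the node arithmetic in the hunk above: each logical HMM contributes its state count minus 2 to the node index, because the non-emitting entry and exit states are not materialized, and the inter-word short pause adds the same amount for the sp model when enable_iwsp is set and at least one phone was newly added. The following is a minimal sketch of that counting only. The helper name and its array argument are hypothetical and do not appear in Julius; the real code walks winfo->wseq[word][j] and hmminfo->sp.

```c
#include <assert.h>

/* Hypothetical helper (not part of Julius): count the emitting states that
   a newly added phone sequence contributes to the lexicon tree.
   phone_state_num[i] is the total state count of phone i, including the
   non-emitting entry and exit states, hence the "- 2". */
static int
count_emitting_states(const int *phone_state_num, int phone_count,
                      int sp_state_num, int enable_iwsp)
{
  int i, n = 0;

  assert(phone_count >= 0);
  for (i = 0; i < phone_count; i++) {
    assert(phone_state_num[i] >= 2);
    n += phone_state_num[i] - 2;
  }
  /* the short pause is appended only when something was actually added,
     mirroring the "add_tail - add_head + 1 > 0" test in the patch */
  if (enable_iwsp && phone_count > 0) {
    assert(sp_state_num >= 2);
    n += sp_state_num - 2;
  }
  return n;
}
```

For three 5-state phones and a 3-state sp model this gives 9 nodes without the short pause and 10 with it, which is the equality the patch checks with "if (n != ntmp) j_error(...)".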
diff -crN julius-3.3p2-multipath/julius/word_align.c julius-3.3p3-multipath/julius/word_align.c
*** julius-3.3p2-multipath/julius/word_align.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/julius/word_align.c Mon Jan 6 18:05:07 2003
***************
*** 14,25 ****
/* build sentence HMM from word sequence */
static HMM_Logical **
! make_phseq(WORD_ID *wseq, short num, int *num_ret, int **end_ret, int per_what)
{
HMM_Logical **ph; /* phoneme sequence */
int phnum; /* num of above */
WORD_ID tmpw, w;
! int i, j, pn, st, endn;
HMM_Logical *tmpp, *ret;
/* make ph[] from wseq[] */
--- 14,26 ----
/* build sentence HMM from word sequence */
static HMM_Logical **
! make_phseq(WORD_ID *wseq, short num, boolean **has_sp_ret, int *num_ret, int **end_ret, int per_what)
{
HMM_Logical **ph; /* phoneme sequence */
+ boolean *has_sp;
int phnum; /* num of above */
WORD_ID tmpw, w;
! int i, j, k, pn, st, endn;
HMM_Logical *tmpp, *ret;
/* make ph[] from wseq[] */
***************
*** 27,32 ****
--- 28,34 ----
phnum = 0;
for (w=0;w<num;w++) phnum += winfo->wlen[wseq[w]];
ph = (HMM_Logical **)mymalloc(sizeof(HMM_Logical *) * phnum);
+ has_sp = (boolean *)mymalloc(sizeof(boolean) * phnum);
/* 2. make phoneme sequence */
st = 1;
pn = 0;
***************
*** 51,68 ****
}
}
}
! ph[pn++] = tmpp;
if (per_what == PER_STATE) {
for (j=0;jwlen[tmpw] - 1) {
! has_sp[pn] = TRUE;
! } else {
! has_sp[pn] = FALSE;
! }
!
if (per_what == PER_STATE) {
for (j=0;jsp)-2;k++) {
+ (*end_ret)[endn++] = st + j + k;
+ }
+ }
}
st += hmm_logical_state_num(tmpp) - 2;
+ if (enable_iwsp && has_sp[pn]) {
+ st += hmm_logical_state_num(hmminfo->sp) - 2;
+ }
if (per_what == PER_PHONEME) (*end_ret)[endn++] = st - 1;
+
+ pn++;
}
if (per_what == PER_WORD) (*end_ret)[endn++] = st - 1;
}
*num_ret = phnum;
+ *has_sp_ret = has_sp;
return ph;
}
***************
*** 73,78 ****
--- 92,98 ----
do_align(WORD_ID *words, short wnum, HTK_Param *param, int per_what)
{
HMM_Logical **phones; /* phoneme sequence */
+ boolean *has_sp;
int phonenum; /* num of above */
HMM *shmm; /* sentence HMM */
int *end_state; /* state number of word ends */
***************
*** 108,118 ****
for (i=0;iwlen[words[w]]; i++) {
end_num += hmm_logical_state_num(winfo->wseq[words[w]][i]) - 2;
}
}
phloc = (int *)mymalloc(sizeof(int)*end_num);
stloc = (int *)mymalloc(sizeof(int)*end_num);
{
! int j,n,p;
n = 0;
p = 0;
for(w=0;wwlen[words[w]]; i++) {
end_num += hmm_logical_state_num(winfo->wseq[words[w]][i]) - 2;
}
+ if (enable_iwsp) {
+ end_num += hmm_logical_state_num(hmminfo->sp) - 2;
+ }
}
phloc = (int *)mymalloc(sizeof(int)*end_num);
stloc = (int *)mymalloc(sizeof(int)*end_num);
{
! int j,n,p,k;
n = 0;
p = 0;
for(w=0;wwlen[words[w]] - 1) {
+ for(k=0;k<hmm_logical_state_num(hmminfo->sp)-2;k++) {
+ phloc[n] = p;
+ stloc[n] = j + 1 + k + end_num;
+ n++;
+ }
+ }
p++;
}
}
***************
*** 132,140 ****
end_state = (int *)mymalloc(sizeof(int) * end_num);
/* make phoneme sequence word sequence */
! phones = make_phseq(words, wnum, &phonenum, &end_state, per_what);
/* build the sentence HMMs */
! shmm = new_make_word_hmm(hmminfo, phones, phonenum);
/* call viterbi segmentation function */
allscore = viterbi_segment(shmm, param, end_state, end_num, &id_seq, &end_frame, &end_score, &rlen);
--- 162,170 ----
end_state = (int *)mymalloc(sizeof(int) * end_num);
/* make phoneme sequence word sequence */
! phones = make_phseq(words, wnum, &has_sp, &phonenum, &end_state, per_what);
/* build the sentence HMMs */
! shmm = new_make_word_hmm(hmminfo, phones, phonenum, has_sp);
/* call viterbi segmentation function */
allscore = viterbi_segment(shmm, param, end_state, end_num, &id_seq, &end_frame, &end_score, &rlen);
***************
*** 178,184 ****
} else {
j_printf(" %s[%s]", phones[n]->name, phones[n]->body.defined->name);
}
! j_printf(" #%d", stloc[id_seq[i]]);
break;
}
j_printf("\n");
--- 208,218 ----
} else {
j_printf(" %s[%s]", phones[n]->name, phones[n]->body.defined->name);
}
! if (enable_iwsp && stloc[id_seq[i]] > end_num) {
! j_printf(" #%d (sp)", stloc[id_seq[i]] - end_num);
! } else {
! j_printf(" #%d", stloc[id_seq[i]]);
! }
break;
}
j_printf("\n");
***************
*** 188,193 ****
--- 222,228 ----
free(id_seq);
free(phones);
+ free(has_sp);
free(end_score);
free(end_frame);
free(end_state);
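Note on the state location output in the word_align.c hunks above: short pause states are stored in stloc[] with end_num added, and the output loop subtracts end_num again and appends "(sp)" when the stored value exceeds end_num. A minimal sketch of that tag and untag scheme follows. Both helper names are hypothetical; only the offset by end_num and the two output formats are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not in the patch): mark a local state index as
   belonging to the inter-word short pause by lifting it above end_num. */
static int
tag_sp_state(int local_index, int end_num)
{
  assert(local_index >= 1 && end_num >= 0);
  return local_index + end_num;
}

/* Hypothetical helper: render a stored location the way the patched
   output loop does, " #N (sp)" for a tagged value and " #N" otherwise.
   Returns the number of characters snprintf would have written. */
static int
format_state_loc(char *buf, size_t buflen, int stloc, int end_num,
                 int enable_iwsp)
{
  if (enable_iwsp && stloc > end_num) {
    return snprintf(buf, buflen, " #%d (sp)", stloc - end_num);
  }
  return snprintf(buf, buflen, " #%d", stloc);
}
```

The scheme relies on ordinary state locations never exceeding end_num, so a single integer array can carry both kinds of entries without an extra flag array.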
diff -crN julius-3.3p2-multipath/libsent/configure julius-3.3p3-multipath/libsent/configure
*** julius-3.3p2-multipath/libsent/configure Mon Nov 18 23:22:36 2002
--- julius-3.3p3-multipath/libsent/configure Mon Nov 25 21:55:26 2002
***************
*** 553,559 ****
ac_configure=$ac_aux_dir/configure # This should be Cygnus configure.
! VERSION=3.3p2-multipath
# specify mic type
# Check whether --with-mictype or --without-mictype was given.
--- 553,559 ----
ac_configure=$ac_aux_dir/configure # This should be Cygnus configure.
! VERSION=3.3-multipath
# specify mic type
# Check whether --with-mictype or --without-mictype was given.
***************
*** 1937,1944 ****
wavefile_support="RAW and WAV only"
! echo $ac_n "checking for sf_open_read in -lsndfile""... $ac_c" 1>&6
! echo "configure:1942: checking for sf_open_read in -lsndfile" >&5
ac_lib_var=`echo sndfile'_'sf_open_read | sed 'y%./+-%__p_%'`
if eval "test \"`echo '$''{'ac_cv_lib_$ac_lib_var'+set}'`\" = set"; then
echo $ac_n "(cached) $ac_c" 1>&6
--- 1937,2035 ----
wavefile_support="RAW and WAV only"
! have_libsndfile=no
! echo $ac_n "checking for sf_open in -lsndfile""... $ac_c" 1>&6
! echo "configure:1943: checking for sf_open in -lsndfile" >&5
! ac_lib_var=`echo sndfile'_'sf_open | sed 'y%./+-%__p_%'`
! if eval "test \"`echo '$''{'ac_cv_lib_$ac_lib_var'+set}'`\" = set"; then
! echo $ac_n "(cached) $ac_c" 1>&6
! else
! ac_save_LIBS="$LIBS"
! LIBS="-lsndfile $LIBS"
! cat > conftest.$ac_ext <&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
! rm -rf conftest*
! eval "ac_cv_lib_$ac_lib_var=yes"
! else
! echo "configure: failed program was:" >&5
! cat conftest.$ac_ext >&5
! rm -rf conftest*
! eval "ac_cv_lib_$ac_lib_var=no"
! fi
! rm -f conftest*
! LIBS="$ac_save_LIBS"
!
! fi
! if eval "test \"`echo '$ac_cv_lib_'$ac_lib_var`\" = yes"; then
! echo "$ac_t""yes" 1>&6
! for ac_hdr in sndfile.h
! do
! ac_safe=`echo "$ac_hdr" | sed 'y%./+-%__p_%'`
! echo $ac_n "checking for $ac_hdr""... $ac_c" 1>&6
! echo "configure:1981: checking for $ac_hdr" >&5
! if eval "test \"`echo '$''{'ac_cv_header_$ac_safe'+set}'`\" = set"; then
! echo $ac_n "(cached) $ac_c" 1>&6
! else
! cat > conftest.$ac_ext <
! EOF
! ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
! { (eval echo configure:1991: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
! ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
! if test -z "$ac_err"; then
! rm -rf conftest*
! eval "ac_cv_header_$ac_safe=yes"
! else
! echo "$ac_err" >&5
! echo "configure: failed program was:" >&5
! cat conftest.$ac_ext >&5
! rm -rf conftest*
! eval "ac_cv_header_$ac_safe=no"
! fi
! rm -f conftest*
! fi
! if eval "test \"`echo '$ac_cv_header_'$ac_safe`\" = yes"; then
! echo "$ac_t""yes" 1>&6
! ac_tr_hdr=HAVE_`echo $ac_hdr | sed 'y%abcdefghijklmnopqrstuvwxyz./-%ABCDEFGHIJKLMNOPQRSTUVWXYZ___%'`
! cat >> confdefs.h <> confdefs.h <<\EOF
! #define HAVE_LIBSNDFILE 1
! EOF
!
! cat >> confdefs.h <<\EOF
! #define HAVE_LIBSNDFILE_VER1 1
! EOF
!
! EXTRALIB="$EXTRALIB -lsndfile"
! have_libsndfile=yes
! else
! echo "$ac_t""no" 1>&6
! fi
! done
!
! else
! echo "$ac_t""no" 1>&6
! fi
!
! if test $have_libsndfile = no; then
! echo $ac_n "checking for sf_open_read in -lsndfile""... $ac_c" 1>&6
! echo "configure:2033: checking for sf_open_read in -lsndfile" >&5
ac_lib_var=`echo sndfile'_'sf_open_read | sed 'y%./+-%__p_%'`
if eval "test \"`echo '$''{'ac_cv_lib_$ac_lib_var'+set}'`\" = set"; then
echo $ac_n "(cached) $ac_c" 1>&6
***************
*** 1946,1952 ****
ac_save_LIBS="$LIBS"
LIBS="-lsndfile $LIBS"
cat > conftest.$ac_ext < conftest.$ac_ext <&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
rm -rf conftest*
eval "ac_cv_lib_$ac_lib_var=yes"
else
--- 2048,2054 ----
sf_open_read()
; return 0; }
EOF
! if { (eval echo configure:2052: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
rm -rf conftest*
eval "ac_cv_lib_$ac_lib_var=yes"
else
***************
*** 1976,1992 ****
do
ac_safe=`echo "$ac_hdr" | sed 'y%./+-%__p_%'`
echo $ac_n "checking for $ac_hdr""... $ac_c" 1>&6
! echo "configure:1980: checking for $ac_hdr" >&5
if eval "test \"`echo '$''{'ac_cv_header_$ac_safe'+set}'`\" = set"; then
echo $ac_n "(cached) $ac_c" 1>&6
else
cat > conftest.$ac_ext <
EOF
ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
! { (eval echo configure:1990: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
if test -z "$ac_err"; then
rm -rf conftest*
--- 2067,2083 ----
do
ac_safe=`echo "$ac_hdr" | sed 'y%./+-%__p_%'`
echo $ac_n "checking for $ac_hdr""... $ac_c" 1>&6
! echo "configure:2071: checking for $ac_hdr" >&5
if eval "test \"`echo '$''{'ac_cv_header_$ac_safe'+set}'`\" = set"; then
echo $ac_n "(cached) $ac_c" 1>&6
else
cat > conftest.$ac_ext <
EOF
ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
! { (eval echo configure:2081: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
if test -z "$ac_err"; then
rm -rf conftest*
***************
*** 2006,2021 ****
cat >> confdefs.h <> confdefs.h <<\EOF
#define HAVE_LIBSNDFILE 1
EOF
! EXTRALIB="$EXTRALIB -lsndfile"
else
echo "$ac_t""no" 1>&6
- echo "configure: warning: libsndfile enables AIFF AU SND NIST reading.
- It's available at http://www.zip.com.au/~erikd/libsndfile/" 1>&2
fi
done
--- 2097,2111 ----
cat >> confdefs.h <> confdefs.h <<\EOF
#define HAVE_LIBSNDFILE 1
EOF
! EXTRALIB="$EXTRALIB -lsndfile"
! have_libsndfile=yes
else
echo "$ac_t""no" 1>&6
fi
done
***************
*** 2023,2028 ****
--- 2113,2123 ----
echo "$ac_t""no" 1>&6
fi
+ fi
+ if test $have_libsndfile = no; then
+ echo "configure: warning: libsndfile enables AIFF AU SND NIST reading.
+ It's available at http://www.zip.com.au/~erikd/libsndfile/" 1>&2
+ fi
diff -crN julius-3.3p2-multipath/libsent/configure.in julius-3.3p3-multipath/libsent/configure.in
*** julius-3.3p2-multipath/libsent/configure.in Mon Nov 18 23:22:36 2002
--- julius-3.3p3-multipath/libsent/configure.in Mon Nov 25 21:55:27 2002
***************
*** 2,8 ****
dnl Copyright (c) 2000-2002 Speech and Acoustics Processing Lab., NAIST
dnl All rights reserved
dnl
! dnl $Id: configure.in,v 1.8 2002/09/11 22:01:50 ri Exp $
dnl
dnl Process this file with autoconf to produce a configure script.
--- 2,8 ----
dnl Copyright (c) 2000-2002 Speech and Acoustics Processing Lab., NAIST
dnl All rights reserved
dnl
! dnl $Id: configure.in,v 1.9 2002/11/19 04:03:29 ri Exp $
dnl
dnl Process this file with autoconf to produce a configure script.
***************
*** 10,16 ****
AC_CONFIG_HEADER(include/sent/config.h)
AC_CONFIG_AUX_DIR(../support)
! VERSION=3.3p2-multipath
dnl Checks for options
# specify mic type
--- 10,16 ----
AC_CONFIG_HEADER(include/sent/config.h)
AC_CONFIG_AUX_DIR(../support)
! VERSION=3.3-multipath
dnl Checks for options
# specify mic type
***************
*** 254,266 ****
dnl libsndfile
wavefile_support="RAW and WAV only"
! AC_CHECK_LIB(sndfile, sf_open_read,
AC_CHECK_HEADERS(sndfile.h,
! wavefile_support='RAW WAV AU SND NIST ADPCM and more by libsndfiles'
AC_DEFINE(HAVE_LIBSNDFILE)
! EXTRALIB="$EXTRALIB -lsndfile",
! AC_MSG_WARN([libsndfile enables AIFF AU SND NIST reading.
! It's available at http://www.zip.com.au/~erikd/libsndfile/])))
AC_SUBST(wavefile_support)
AC_SUBST(EXTRAOBJ)
--- 254,279 ----
dnl libsndfile
wavefile_support="RAW and WAV only"
! have_libsndfile=no
! AC_CHECK_LIB(sndfile, sf_open,
AC_CHECK_HEADERS(sndfile.h,
! wavefile_support='a large number of formats by libsndfile ver.1'
AC_DEFINE(HAVE_LIBSNDFILE)
! AC_DEFINE(HAVE_LIBSNDFILE_VER1)
! EXTRALIB="$EXTRALIB -lsndfile"
! have_libsndfile=yes))
! if test $have_libsndfile = no; then
! AC_CHECK_LIB(sndfile, sf_open_read,
! AC_CHECK_HEADERS(sndfile.h,
! wavefile_support='a large number of formats by libsndfile ver.0'
! AC_DEFINE(HAVE_LIBSNDFILE)
! EXTRALIB="$EXTRALIB -lsndfile"
! have_libsndfile=yes))
! fi
! if test $have_libsndfile = no; then
! AC_MSG_WARN([libsndfile enables AIFF AU SND NIST reading.
! It's available at http://www.zip.com.au/~erikd/libsndfile/])
! fi
AC_SUBST(wavefile_support)
AC_SUBST(EXTRAOBJ)
diff -crN julius-3.3p2-multipath/libsent/include/sent/adin.h julius-3.3p3-multipath/libsent/include/sent/adin.h
*** julius-3.3p2-multipath/libsent/include/sent/adin.h Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/include/sent/adin.h Mon Nov 25 21:55:27 2002
***************
*** 4,10 ****
/* adin.h --- audio input */
! /* $Id: adin.h,v 1.5 2002/09/11 22:01:50 ri Exp $ */
#ifndef __SENT_ADIN__
#define __SENT_ADIN__
--- 4,10 ----
/* adin.h --- audio input */
! /* $Id: adin.h,v 1.6 2002/11/25 02:23:29 ri Exp $ */
#ifndef __SENT_ADIN__
#define __SENT_ADIN__
***************
*** 83,89 ****
void end_count_zc_e();
int count_zc_e(SP16 *buf,int step);
int count_zc_e_level(SP16 *buf,int step,int *levelp);
! void zc_copy_buffer(SP16 *newbuf, int len);
/* adin/zmean.c */
void sub_zmean(SP16 *speech, int samplenum);
--- 83,89 ----
void end_count_zc_e();
int count_zc_e(SP16 *buf,int step);
int count_zc_e_level(SP16 *buf,int step,int *levelp);
! void zc_copy_buffer(SP16 *newbuf, int *len);
/* adin/zmean.c */
void sub_zmean(SP16 *speech, int samplenum);
diff -crN julius-3.3p2-multipath/libsent/include/sent/config.h.in julius-3.3p3-multipath/libsent/include/sent/config.h.in
*** julius-3.3p2-multipath/libsent/include/sent/config.h.in Tue Aug 14 15:45:22 2001
--- julius-3.3p3-multipath/libsent/include/sent/config.h.in Mon Nov 25 21:55:27 2002
***************
*** 10,16 ****
byte first (like Motorola and SPARC, unlike Intel and VAX). */
#undef WORDS_BIGENDIAN
! /* $Id: config.h.in,v 1.1 2001/08/14 06:45:22 ri Exp $ */
/* use microphone input */
#undef USE_MIC
--- 10,16 ----
byte first (like Motorola and SPARC, unlike Intel and VAX). */
#undef WORDS_BIGENDIAN
! /* $Id: config.h.in,v 1.2 2002/11/19 04:03:29 ri Exp $ */
/* use microphone input */
#undef USE_MIC
***************
*** 19,24 ****
--- 19,27 ----
/* libsndfile support */
#undef HAVE_LIBSNDFILE
+
+ /* libsndfile support (ver.1) */
+ #undef HAVE_LIBSNDFILE_VER1
/* use integer word WORD_ID (for over 60k word) */
#undef WORDS_INT
diff -crN julius-3.3p2-multipath/libsent/include/sent/dfa.h julius-3.3p3-multipath/libsent/include/sent/dfa.h
*** julius-3.3p2-multipath/libsent/include/sent/dfa.h Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/include/sent/dfa.h Tue Jan 7 01:10:59 2003
***************
*** 4,10 ****
/* dfa.h --- DFA structures for fa-parser and word-pair info for 1st pass */
! /* $Id: dfa.h,v 1.6 2002/09/11 22:01:50 ri Exp $ */
#ifndef __SENT_DFA_H__
#define __SENT_DFA_H__
--- 4,10 ----
/* dfa.h --- DFA structures for fa-parser and word-pair info for 1st pass */
! /* $Id: dfa.h,v 1.8 2003/01/06 16:10:39 ri Exp $ */
#ifndef __SENT_DFA_H__
#define __SENT_DFA_H__
***************
*** 16,23 ****
#define INITIAL_S 0x10000000 /* initial status flag */
#define ACCEPT_S 0x00000001 /* accept status flag */
- #define SP_NAME_DEFAULT "sp" /* default name of short pause model */
-
/* DFA state */
typedef struct __dfa_state__ {
int number; /* unique ID */
--- 16,21 ----
***************
*** 93,99 ****
void make_terminfo(TERM_INFO *tinfo, DFA_INFO *dinfo, WORD_INFO *winfo);
void terminfo_append(TERM_INFO *dst, TERM_INFO *src, int coffset, int woffset);
#include
! void dfa_find_pause_word(DFA_INFO *dfa, WORD_INFO *winfo, HTK_HMM_INFO *hmminfo, char *sp_name);
void dfa_pause_word_append(DFA_INFO *dst, DFA_INFO *src, int coffset);
#endif /* __SENT_DFA_H__ */
--- 91,97 ----
void make_terminfo(TERM_INFO *tinfo, DFA_INFO *dinfo, WORD_INFO *winfo);
void terminfo_append(TERM_INFO *dst, TERM_INFO *src, int coffset, int woffset);
#include
! void dfa_find_pause_word(DFA_INFO *dfa, WORD_INFO *winfo, HTK_HMM_INFO *hmminfo);
void dfa_pause_word_append(DFA_INFO *dst, DFA_INFO *src, int coffset);
#endif /* __SENT_DFA_H__ */
diff -crN julius-3.3p2-multipath/libsent/include/sent/hmm.h julius-3.3p3-multipath/libsent/include/sent/hmm.h
*** julius-3.3p2-multipath/libsent/include/sent/hmm.h Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/include/sent/hmm.h Tue Nov 26 12:58:56 2002
***************
*** 39,46 ****
/* mkwhmm.c */
! HMM *new_make_word_hmm(HTK_HMM_INFO *, HMM_Logical **, int);
! HMM *new_make_word_hmm_with_lm(HTK_HMM_INFO *, HMM_Logical **, int, LOGPROB *);
void free_hmm(HMM *);
/* vsegment.c */
LOGPROB viterbi_segment(HMM *hmm, HTK_Param *param, int *endstates, int ulen, int **id_ret, int **seg_ret, LOGPROB **uscore_ret, int *retlen);
--- 39,46 ----
/* mkwhmm.c */
! HMM *new_make_word_hmm(HTK_HMM_INFO *, HMM_Logical **, int, boolean *);
! HMM *new_make_word_hmm_with_lm(HTK_HMM_INFO *, HMM_Logical **, int, boolean *, LOGPROB *);
void free_hmm(HMM *);
/* vsegment.c */
LOGPROB viterbi_segment(HMM *hmm, HTK_Param *param, int *endstates, int ulen, int **id_ret, int **seg_ret, LOGPROB **uscore_ret, int *retlen);
diff -crN julius-3.3p2-multipath/libsent/include/sent/htk_defs.h julius-3.3p3-multipath/libsent/include/sent/htk_defs.h
*** julius-3.3p2-multipath/libsent/include/sent/htk_defs.h Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/include/sent/htk_defs.h Mon Nov 25 22:44:18 2002
***************
*** 62,65 ****
--- 62,67 ----
} OptionStr;
+ #define SP_NAME_DEFAULT "sp" /* default name of short pause model */
+
#endif /* __SENT_HTK_DEFS_H__ */
diff -crN julius-3.3p2-multipath/libsent/include/sent/htk_hmm.h julius-3.3p3-multipath/libsent/include/sent/htk_hmm.h
*** julius-3.3p2-multipath/libsent/include/sent/htk_hmm.h Tue Oct 29 14:44:31 2002
--- julius-3.3p3-multipath/libsent/include/sent/htk_hmm.h Tue Jan 7 01:10:59 2003
***************
*** 4,10 ****
/* htk_hmm.h --- HMM structure for HTK format */
! /* $Id: htk_hmm.h,v 1.4 2002/09/11 22:01:50 ri Exp $ */
#ifndef __SENT_HTK_HMM_2_H__
#define __SENT_HTK_HMM_2_H__
--- 4,10 ----
/* htk_hmm.h --- HMM structure for HTK format */
! /* $Id: htk_hmm.h,v 1.6 2003/01/06 16:10:39 ri Exp $ */
#ifndef __SENT_HTK_HMM_2_H__
#define __SENT_HTK_HMM_2_H__
***************
*** 25,30 ****
--- 25,31 ----
#define HMM_RC_DLIM_C '+' /* right-context delimiter */
#define HMM_LC_DLIM_C '-' /* right-context delimiter */
+ #define SPMODEL_NAME_DEFAULT "sp" /* default logical name of short pause model */
/* options info */
***************
*** 185,190 ****
--- 186,193 ----
boolean is_triphone; /* TRUE if this is triphone model */
boolean is_tied_mixture; /* TRUE if this is tied-mixture model */
boolean prefer_cdset_avg; /* compute average of lcdset instead of maximum */
+ HMM_Logical *sp; /* short pause model */
+ LOGPROB iwsp_penalty; /* transition penalty for interword sp */
int totalmixnum; /* total mixture num */
int totalstatenum; /* total state num */
***************
*** 199,204 ****
--- 202,209 ----
+ /* init_phmm.c */
+ void htk_hmm_set_pause_model(HTK_HMM_INFO *hmminfo, char *spmodel_name);
/* rdhmmdef.c */
void rderr(char *str);
char *read_token(FILE *fp);
diff -crN julius-3.3p2-multipath/libsent/include/sent/vocabulary.h julius-3.3p3-multipath/libsent/include/sent/vocabulary.h
*** julius-3.3p2-multipath/libsent/include/sent/vocabulary.h Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/include/sent/vocabulary.h Tue Jan 7 01:10:59 2003
***************
*** 4,10 ****
/* vocabulary.h --- defines for vocabulary */
! /* $Id: vocabulary.h,v 1.5 2002/09/11 22:01:50 ri Exp $ */
#ifndef __SENT_VOCA_H__
#define __SENT_VOCA_H__
--- 4,10 ----
/* vocabulary.h --- defines for vocabulary */
! /* $Id: vocabulary.h,v 1.6 2003/01/06 16:10:39 ri Exp $ */
#ifndef __SENT_VOCA_H__
#define __SENT_VOCA_H__
***************
*** 40,45 ****
--- 40,46 ----
boolean init_voca(WORD_INFO *winfo, char *filename, HTK_HMM_INFO *hmminfo, boolean, boolean);
boolean voca_load_htkdict(FILE *, WORD_INFO *, HTK_HMM_INFO *, boolean);
boolean voca_load_htkdict_fd(int, WORD_INFO *, HTK_HMM_INFO *, boolean);
+ boolean voca_append_htkdict(char *entry, WORD_INFO *winfo, HTK_HMM_INFO *hmminfo, boolean ignore_tri_conv);
void voca_append(WORD_INFO *dstinfo, WORD_INFO *srcinfo, int coffset, int woffset);
boolean voca_load_htkdict_line(char *buf, int vnum, WORD_INFO *winfo, HTK_HMM_INFO *hmminfo, boolean ignore_tri_conv, boolean do_conv, boolean *ok_flag);
diff -crN julius-3.3p2-multipath/libsent/src/adin/adin-cut.c julius-3.3p3-multipath/libsent/src/adin/adin-cut.c
*** julius-3.3p2-multipath/libsent/src/adin/adin-cut.c Mon Nov 18 22:05:38 2002
--- julius-3.3p3-multipath/libsent/src/adin/adin-cut.c Mon Nov 25 21:55:27 2002
***************
*** 4,10 ****
/* adin-cut.c --- read audio input from device with skipping silence */
! /* $Id: adin-cut.c,v 1.6 2002/11/06 08:36:49 ri Exp $ */
/* use zerocross & level threshold for silence detection */
/* process one speech segment till next silence at a call */
--- 4,10 ----
/* adin-cut.c --- read audio input from device with skipping silence */
! /* $Id: adin-cut.c,v 1.7 2002/11/25 02:23:29 ri Exp $ */
/* use zerocross & level threshold for silence detection */
/* process one speech segment till next silence at a call */
***************
*** 138,144 ****
static int bpmax; /* maximum size of buffer */
static int bp; /* current point to store data */
static int current_len; /* current length of stored samples */
! static SP16 *lastdata = NULL; /* for processing the end samples */
/* purge processed samples */
static void
--- 138,144 ----
static int bpmax; /* maximum size of buffer */
static int bp; /* current point to store data */
static int current_len; /* current length of stored samples */
! static SP16 *cbuf;
/* purge processed samples */
static void
***************
*** 159,169 ****
adin_cut(int (*ad_process)(SP16 *, int), int (*ad_check)())
{
static int i;
- static int pre_length;
static boolean is_valid_data;
int ad_process_ret;
! int imax, stlen, cnt;
! static boolean ending;
static int end_status;
/* variables for zero-cross routines*/
--- 159,168 ----
adin_cut(int (*ad_process)(SP16 *, int), int (*ad_check)())
{
static int i;
static boolean is_valid_data;
int ad_process_ret;
! int imax, len, cnt;
! static boolean end_of_stream;
static int end_status;
/* variables for zero-cross routines*/
***************
*** 189,201 ****
init_count_zc_e(thres, c_length, c_offset);
is_valid_data = FALSE;
}
! ending = FALSE;
wstep = DEFAULT_WSTEP;
nc = 0;
! if (adin_cut_on) pre_length = c_length;
! }
! if (lastdata == NULL) {
! lastdata = (SP16 *)mymalloc(sizeof(SP16) * c_length);
}
/* resume input */
--- 188,197 ----
init_count_zc_e(thres, c_length, c_offset);
is_valid_data = FALSE;
}
! end_of_stream = FALSE;
wstep = DEFAULT_WSTEP;
nc = 0;
! cbuf = (SP16 *)mymalloc(sizeof(SP16) * c_length);
}
/* resume input */
***************
*** 210,216 ****
/* Read samples as 16bit shorts (SP16) */
/* bp: pointer to samples left in queue buffer */
/* all samples in device are read at offset [bp] -> [0..len-1] */
! if (ending) { /* already reaches end of stream */
/* just return */
current_len = bp;
} else {
--- 206,212 ----
/* Read samples as 16bit shorts (SP16) */
/* bp: pointer to samples left in queue buffer */
/* all samples in device are read at offset [bp] -> [0..len-1] */
! if (end_of_stream) { /* already reaches end of stream */
/* just return */
current_len = bp;
} else {
***************
*** 220,226 ****
if (cnt < 0) { /* end of stream or error */
if (cnt == -2) end_status = -1; /* error */
else if (cnt == -1) end_status = 0; /* end of stream */
! ending = TRUE; /* mark as end of stream */
cnt = 0; /* no new input */
/* in case the first ad_read() fails */
if (bp == 0) break;
--- 216,222 ----
if (cnt < 0) { /* end of stream or error */
if (cnt == -2) end_status = -1; /* error */
else if (cnt == -1) end_status = 0; /* end of stream */
! end_of_stream = TRUE; /* mark as end of stream */
cnt = 0; /* no new input */
/* in case the first ad_read() fails */
if (bp == 0) break;
***************
*** 228,246 ****
/* strip off invalid samples */
if (cnt > 0) {
if (strip_flag) {
! stlen = strip_zero(&(buffer[bp]), cnt);
! if (stlen != cnt) cnt = stlen;
}
}
/* len = current samples in buffer */
current_len = bp + cnt;
}
#ifdef THREAD_DEBUG
! printf("input: get %d samples\n", current_len - bp);
#endif
#ifdef HAVE_PTHREAD
! if (ad_check != NULL && !enable_thread) {
/* call check callback (called for every period) */
if ((i = (*ad_check)()) < 0) {
if ((i == -1 && current_len == 0) || i == -2) {
--- 224,248 ----
/* strip off invalid samples */
if (cnt > 0) {
if (strip_flag) {
! len = strip_zero(&(buffer[bp]), cnt);
! if (len != cnt) cnt = len;
}
}
/* len = current samples in buffer */
current_len = bp + cnt;
}
#ifdef THREAD_DEBUG
! if (end_of_stream) {
! printf("stream already ended\n");
! }
! printf("input: get %d samples [%d-%d]\n", current_len - bp, bp, current_len);
#endif
+ if (ad_check != NULL
#ifdef HAVE_PTHREAD
! && !enable_thread
! #endif
! ) {
/* call check callback (called for every period) */
if ((i = (*ad_check)()) < 0) {
if ((i == -1 && current_len == 0) || i == -2) {
***************
*** 249,267 ****
}
}
}
- #else
- if (ad_check != NULL) {
- if ((i = (*ad_check)()) < 0) {
- if ((i == -1 && current_len == 0) || i == -2) {
- end_status = -2;
- goto break_input;
- }
- }
- }
- #endif
/* set process step */
! if (ending && wstep > current_len) wstep = current_len;
/* set maximum number of processed samples per loop */
#ifdef HAVE_PTHREAD
--- 251,259 ----
}
}
}
/* set process step */
! if (end_of_stream && wstep > current_len) wstep = current_len;
/* set maximum number of processed samples per loop */
#ifdef HAVE_PTHREAD
***************
*** 271,301 ****
imax = (current_len < wstep) ? current_len : wstep;
#endif
/* proceed for each 'wstep' steps */
i = 0;
while (i + wstep <= imax) {
if (adin_cut_on) {
! /* count zero-cross and flip switches */
! /* swap samples in buffer to that of (length) step ago */
zc = count_zc_e(&(buffer[i]), wstep);
! if (zc > noise_zerocross) {
nc = 0;
! if (is_valid_data == FALSE) {
is_valid_data = TRUE; /* start to record */
}
} else if (is_valid_data == TRUE) {
/* processing tailing silence */
nc++;
}
}
if(
! (!adin_cut_on || (is_valid_data == TRUE && pre_length <= 0))
#ifdef HAVE_PTHREAD
&& (!enable_thread || transfer_online)
#endif
) {
! /* store data */
if ( ad_process != NULL ) {
/* call external function */
ad_process_ret = (*ad_process)(&(buffer[i]), wstep);
switch(ad_process_ret) {
--- 263,338 ----
imax = (current_len < wstep) ? current_len : wstep;
#endif
+ #ifdef THREAD_DEBUG
+ printf("process %d samples by %d step\n", imax, wstep);
+ #endif
+
/* proceed for each 'wstep' steps */
i = 0;
while (i + wstep <= imax) {
if (adin_cut_on) {
! /* count zero-cross */
zc = count_zc_e(&(buffer[i]), wstep);
! if (zc > noise_zerocross) { /* triggering */
nc = 0;
! if (is_valid_data == FALSE) { /* triggered */
is_valid_data = TRUE; /* start to record */
+ #ifdef THREAD_DEBUG
+ printf("detect on\n");
+ #endif
+ /* process stored samples in cycle buffer */
+ if ( ad_process != NULL
+ #ifdef HAVE_PTHREAD
+ && (!enable_thread || transfer_online)
+ #endif
+ ) {
+ zc_copy_buffer(cbuf, &len);
+ if (len - wstep > 0) {
+ #ifdef THREAD_DEBUG
+ printf("callback for buffered samples (%d bytes)\n", len - wstep);
+ #endif
+ ad_process_ret = (*ad_process)(cbuf, len - wstep);
+ switch(ad_process_ret) {
+ case 1: /* segmented */
+ end_status = 1;
+ adin_purge(i);
+ #ifdef HAVE_PTHREAD
+ if (enable_thread) { /* just stop transfer */
+ pthread_mutex_lock(&mutex);
+ transfer_online = FALSE;
+ pthread_mutex_unlock(&mutex);
+ } else {
+ goto break_input;
+ }
+ #else
+ goto break_input;
+ #endif
+ case -1: /* error */
+ end_status = -1;
+ goto break_input;
+ }
+ }
+ }
}
} else if (is_valid_data == TRUE) {
/* processing tailing silence */
+ #ifdef THREAD_DEBUG
+ printf("trailing silence\n");
+ #endif
nc++;
}
}
if(
! (!adin_cut_on || is_valid_data == TRUE)
#ifdef HAVE_PTHREAD
&& (!enable_thread || transfer_online)
#endif
) {
! /* process data */
if ( ad_process != NULL ) {
+ #ifdef THREAD_DEBUG
+ printf("callback for input sample [%d-%d]\n", i, i+wstep);
+ #endif
/* call external function */
ad_process_ret = (*ad_process)(&(buffer[i]), wstep);
switch(ad_process_ret) {
***************
*** 320,344 ****
}
}
if (adin_cut_on && is_valid_data && nc >= nc_max) {
/* end input by silence */
is_valid_data = FALSE;
- #if 0
- if ( ad_process != NULL ) {
- /* process samples in cycle buffer */
- zc_copy_buffer(lastdata, c_length); /* process them but keep */
- /* last [pre_length] samples in buffer is invalid */
- ad_process_ret = (*ad_process)(&(lastdata[pre_length]), c_length-pre_length);
- switch(ad_process_ret) {
- /*case 1:
- end_status = 1;
- adin_purge(i+wstep);
- goto break_input;*/
- case -1: /* error */
- end_status = -1;
- goto break_input;
- }
- }
- #endif
adin_purge(i+wstep);
end_status = 1;
#ifdef HAVE_PTHREAD
--- 357,367 ----
}
}
if (adin_cut_on && is_valid_data && nc >= nc_max) {
+ #ifdef THREAD_DEBUG
+ printf("detect off\n");
+ #endif
/* end input by silence */
is_valid_data = FALSE;
adin_purge(i+wstep);
end_status = 1;
#ifdef HAVE_PTHREAD
***************
*** 353,364 ****
goto break_input;
#endif
}
- if (adin_cut_on) {
- if (pre_length > 0) {
- pre_length -= wstep;
- if (pre_length < 0) pre_length = 0;
- }
- }
i += wstep;
}
--- 376,381 ----
***************
*** 367,391 ****
adin_purge(i);
/* end of input by end of stream */
! if (ending && bp == 0) {
! if (adin_cut_on && is_valid_data) {
! /* flush samples in cycle buffer */
! if (adin_cut_on) zc = count_zc_e(buffer, c_length);
! if ( ad_process != NULL ) {
! /* first [pre_length] samples in buffer is invalid when short input */
! ad_process_ret = (*ad_process)(&(buffer[pre_length]), c_length-pre_length);
! switch(ad_process_ret) {
! case 1: /* segmented */
! end_status = 1;
! goto break_input;
! case -1: /* error */
! end_status = -1;
! goto break_input;
! }
! }
! }
! break;
! }
}
break_input:
--- 384,390 ----
adin_purge(i);
/* end of input by end of stream */
! if (end_of_stream && bp == 0) break;
}
break_input:
***************
*** 398,405 ****
}
}
! if (ending) { /* input already ends */
! if (bp == 0) { /* rest buffer flushed */
/* reset status */
if (adin_cut_on) end_count_zc_e();
free(buffer);
--- 397,404 ----
}
}
! if (end_of_stream) { /* input already ends */
! if (bp == 0) { /* rest buffer successfully flushed */
/* reset status */
if (adin_cut_on) end_count_zc_e();
free(buffer);
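Note on the adin-cut.c change above: when speech is triggered, the patch now calls zc_copy_buffer(cbuf, &len) and hands the first len - wstep samples to the callback, so the samples buffered just before the trigger are processed ahead of the current step. The sketch below shows one way a cycle buffer can report its valid length through an out-parameter. It is an assumption for illustration only: the CycleBuf type and both functions are hypothetical, and the actual zc_copy_buffer implementation is not shown in this part of the patch.

```c
#include <assert.h>

typedef short SP16;  /* stand-in for the 16-bit sample type used in the patch */

/* Hypothetical cycle buffer that keeps the most recent 'size' samples. */
typedef struct {
  SP16 *data;   /* storage for 'size' samples */
  int size;     /* capacity in samples */
  int head;     /* next write position */
  int filled;   /* number of valid samples, at most 'size' */
} CycleBuf;

/* Append n samples, overwriting the oldest ones once the buffer is full. */
static void
cycle_put(CycleBuf *cb, const SP16 *in, int n)
{
  int i;

  assert(cb->size > 0);
  for (i = 0; i < n; i++) {
    cb->data[cb->head] = in[i];
    cb->head = (cb->head + 1) % cb->size;
    if (cb->filled < cb->size) cb->filled++;
  }
}

/* Copy the valid samples to 'out' in chronological order and store their
   count in *len, in the spirit of the new "int *len" argument. */
static void
cycle_copy(const CycleBuf *cb, SP16 *out, int *len)
{
  int i, start;

  start = (cb->head - cb->filled + cb->size) % cb->size;
  for (i = 0; i < cb->filled; i++) {
    out[i] = cb->data[(start + i) % cb->size];
  }
  *len = cb->filled;
}
```

With this shape, a caller that has already pushed the current step of wstep samples can pass the first len - wstep copied samples to its callback and then process the current step from the main buffer as usual, without processing any sample twice.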
diff -crN julius-3.3p2-multipath/libsent/src/adin/adin_mic_linux_alsa.c julius-3.3p3-multipath/libsent/src/adin/adin_mic_linux_alsa.c
*** julius-3.3p2-multipath/libsent/src/adin/adin_mic_linux_alsa.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/src/adin/adin_mic_linux_alsa.c Mon Nov 25 21:55:27 2002
***************
*** 4,51 ****
/* adin_mic_linux_alsa.c --- adin microphone library for ALSA native API */
! /* $Id: adin_mic_linux_alsa.c,v 1.3 2002/09/11 22:01:50 ri Exp $ */
/*
! * From rev.3.0: catch up for alsa-driver-0.5.5.
*
*/
/*
- * NOTE: the ALSA sound driver supports 2 APIs below:
- *
- * 1) ALSA native API, for new ALSA-aware sound applications.
- * 2) OSS emulation API, for legacy OSS applications.
- *
- * As though Julius supports both, you'd better use the 2) in Julius.
- * The 1) is still under heavy development now (2000/03/08) and I don't
- * want to spend much time catching up with the still-varying driver APIs...
- * :-( This implementation will work for 0.5.5, but is unstable.
- *
- * The ALSA native API will be fully supported when ALSA becomes
- * stable and full documentation becomes available. Of course patches for
- * this file or a new maintainer are always welcomed :-).
- *
- * You can explicitly select OSS API by specifying "--with-mictype=oss" option
- * at configuration.
- *
- */
-
- /*
* Use mixer program like alsamixer or alsactl to setup mic device
* (mute/unmute, volume control, etc.)
*
* !!Note that ALSA drivers first mute all audio devices by default!!
*/
! /* see http://www.alsa-prpject.org/ for information about ALSA, */
/* Advanced Linux Sound Architecture */
-
- /*
- * When multiple sound cards are found, the first one will be used.
- */
-
#include
#include
--- 4,26 ----
/* adin_mic_linux_alsa.c --- adin microphone library for ALSA native API */
! /* $Id: adin_mic_linux_alsa.c,v 1.5 2002/11/25 01:05:38 ri Exp $ */
/*
! * From rev.3.3p2: alsa-0.9.0 (tested on rc2)
*
*/
/*
* Use mixer program like alsamixer or alsactl to setup mic device
* (mute/unmute, volume control, etc.)
*
* !!Note that ALSA drivers first mute all audio devices by default!!
*/
! /* see http://www.alsa-project.org/ for information about ALSA, */
/* Advanced Linux Sound Architecture */
#include
#include
***************
*** 54,246 ****
#include
#include
! static int bufsize; /* buffer size in bytes */
static boolean need_swap; /* whether samples need byte swap */
! static int samplefreq; /* sampling frequency */
- /* sound header for ALSA */
- #include
- static int cardid, devid;
- static snd_pcm_format_t format;
- static snd_pcm_t *handle;
boolean
adin_mic_standby(int sfreq, void *dummy)
{
! int ret;
! char *cardname;
! snd_ctl_t *chandle;
! struct snd_ctl_hw_info info;
! snd_pcm_info_t pinfo;
! snd_pcm_channel_info_t cinfo;
!
! samplefreq = sfreq;
!
! cardid = 0;
! devid = 0;
!
! /* prepare recording format */
! memset(&format, 0, sizeof(snd_pcm_format_t));
! format.interleave = 1;
! #ifdef WORDS_BIGENDIAN
! format.format = SND_PCM_SFMT_S16_BE;
! need_swap = FALSE;
! #else /* little endian */
! format.format = SND_PCM_SFMT_S16_LE;
! need_swap = FALSE;
! #endif
! format.rate = samplefreq;
! format.voices = 1; /* monoral */
!
! /* determine which soundcard / device / subdevice to use */
! /* currently, first found record-capable device will be used */
! if ((ret = snd_cards()) <= 0) {
! j_printerr("Error: adin_mic_standby: no soundcards!\n");
! return FALSE;
! }
! if (ret > 1) {
! j_printf("Warning: adin_mic_standby: multiple soundcards found, using first one.\n");
! }
! cardid = 0;
! /* create handle and open communication with kernel sound control interface */
! if ((ret = snd_ctl_open(&chandle, cardid)) < 0) {
! j_printerr("Error: adin_mic_standby: ctl_open(%d): %s\n", cardid, snd_strerror(ret));
return(FALSE);
}
!
! /* get sound hardware resource and display info */
! if ( (ret = snd_ctl_hw_info( chandle, &info )) < 0 ) {
! j_printerr("Error: adin_mic_standby: ctl_hw_info(%d): %s\n", cardid, snd_strerror(ret));
! snd_ctl_close(chandle);
return(FALSE);
}
! j_printf("Sound Card %d: %s [%s]\n", cardid + 1, info.name, info.id);
! /* search for recordable device */
! j_printf("Installed PCM devices: %i\n", info.pcmdevs );
! if (info.pcmdevs < 1) {
! j_printerr("Error: no PCM devices available on this card\n");
return(FALSE);
}
- for (devid = 0; devid < info.pcmdevs; devid++) {
- if ( (ret = snd_ctl_pcm_info( chandle, devid, &pinfo )) < 0 ) {
- j_printerr("Error: adin_mic_standby: ctl_pcm_info(%d): %s\n", cardid, snd_strerror(ret));
- snd_ctl_close(chandle);
- return(FALSE);
- }
- j_printf("#%d: %s", devid, pinfo.name);
- if (pinfo.flags & SND_PCM_INFO_CAPTURE) {
- j_printf("\n");
- break; /* use first found recordable device */
- } else {
- j_printf(" --- not recordable, skipped\n");
- }
- }
- /* check for record subdevices */
- j_printf("Record subdevices in this device: %d\n", pinfo.capture + 1);
- {
- int i;
-
- for(i=0;i<=pinfo.capture; i++) {
- memset(&cinfo, 0, sizeof(snd_pcm_channel_info_t));
- cinfo.channel = SND_PCM_CHANNEL_CAPTURE;
- if ( (ret = snd_ctl_pcm_channel_info( chandle, devid, SND_PCM_CHANNEL_CAPTURE, i, &cinfo )) < 0 ) {
- j_printerr("Error: adin_mic_standby: ctl_pcm_channel_info(%d): %s\n", cardid, snd_strerror(ret));
- snd_ctl_close(chandle);
- return(FALSE);
- }
- j_printf(" #%d: %s\n", i, cinfo.subname);
- break;
- }
- }
! /* close control interface */
! if ((ret = snd_ctl_close(chandle)) < 0) {
! j_printerr("Error: adin_mic_standby: %s\n", snd_strerror(ret));
return(FALSE);
}
- return(TRUE);
- }
! /* start recording */
! boolean
! adin_mic_start()
! {
! int ret;
! snd_pcm_channel_info_t cinfo;
! static struct snd_pcm_channel_params params;
!
! /* open device */
! if ((ret = snd_pcm_open(&handle, cardid, devid, SND_PCM_OPEN_CAPTURE)) < 0 ) {
! j_printerr("Error: adin_mic_start: open error: %s\n", snd_strerror(ret));
return(FALSE);
}
! /* check recording capability */
! memset(&cinfo, 0, sizeof(snd_pcm_channel_info_t));
! cinfo.channel = SND_PCM_CHANNEL_CAPTURE;
! if ( (ret = snd_pcm_plugin_info(handle, &cinfo )) < 0 ) {
! j_printerr("Error: adin_mic_start: ctl_pcm_channel_info(%d): %s\n", cardid, snd_strerror(ret));
return(FALSE);
}
! if (cinfo.min_rate > samplefreq || cinfo.max_rate < samplefreq) {
! j_printerr("-- cannot set frequency to %dHz\n", samplefreq);
return(FALSE);
}
! if (!(cinfo.formats & (1 << format.format))) {
! j_printerr("-- 16bit recording not supported\n");
return(FALSE);
}
! /* set format */
! memset(&params, 0, sizeof(params));
! params.channel = SND_PCM_CHANNEL_CAPTURE;
! params.mode = SND_PCM_MODE_BLOCK;
! memcpy(&params.format, &format, sizeof(format));
! params.start_mode = SND_PCM_START_DATA;
! params.stop_mode = SND_PCM_STOP_STOP;
! params.buf.block.frag_size = 8192;
! params.buf.block.frags_max = -1;
! params.buf.block.frags_min = 1;
! if ((ret = snd_pcm_plugin_params(handle, &params)) < 0) {
! j_printerr("Error: adin_mic_start: unable to set channel params: %s\n",snd_strerror(ret));
return(FALSE);
}
! if (snd_pcm_plugin_prepare(handle,SND_PCM_CHANNEL_CAPTURE) < 0) {
! j_printerr("Error: adin_mic_start: unable to prepare channel\n");
return(FALSE);
}
! {
! struct snd_pcm_channel_setup setup;
! memset(&setup, 0, sizeof(setup));
! setup.channel = SND_PCM_CHANNEL_CAPTURE;
! setup.mode = SND_PCM_MODE_BLOCK;
! if (snd_pcm_plugin_setup(handle, &setup) < 0) {
! j_printerr("Error: adin_mic_start: unable to obtain setup\n");
! return(FALSE);
! }
! bufsize = setup.buf.block.frag_size;
}
!
! /* set nonblock */
! if (snd_pcm_nonblock_mode(handle, 1) < 0) {
! j_printerr("Error: adin_mic_start: unable to prepare channel\n");
return(FALSE);
}
return(TRUE);
}
/* stop recording */
boolean
adin_mic_stop()
{
- int ret;
- /* close device */
- if ((ret = snd_pcm_close(handle)) < 0) {
- j_printerr("Error: adin_mic_stop: unable to stop recording: %s\n",snd_strerror(ret));
- return(FALSE);
- }
return(TRUE);
}
--- 29,251 ----
#include
#include
! #include
!
! static snd_pcm_t *handle; /* audio handler */
! static snd_pcm_hw_params_t *hwparams; /* store device hardware parameters */
! static char *pcm_name = "hw:0,0"; /* name of the PCM device */
!
static boolean need_swap; /* whether samples need byte swap */
! static int latency = 50; /* latency time (msec) */
!
! static struct pollfd *ufds;
! static int count;
boolean
adin_mic_standby(int sfreq, void *dummy)
{
! int err;
! int exact_rate; /* sample rate returned by hardware */
! int dir; /* comparison result of exact rate and given rate */
!
! /* allocate hwparam structure */
! snd_pcm_hw_params_alloca(&hwparams);
!
! /* open device (for resource test, open in non-block mode) */
! if ((err = snd_pcm_open(&handle, pcm_name, SND_PCM_STREAM_CAPTURE, SND_PCM_NONBLOCK)) < 0) {
! j_printerr("Error: cannot open PCM device %s (%s)\n", pcm_name, snd_strerror(err));
return(FALSE);
}
!
! /* switch device back to block mode (it was opened non-blocking only for the resource test) */
! if ((err = snd_pcm_nonblock(handle, 0)) < 0) {
! j_printerr("Error: cannot set PCM device to block mode\n");
return(FALSE);
}
!
! /* initialize hwparam structure */
! if ((err = snd_pcm_hw_params_any(handle, hwparams)) < 0) {
! j_printerr("Error: cannot initialize PCM device parameter structure (%s)\n", snd_strerror(err));
return(FALSE);
}
! /* set interleaved read/write format */
! if ((err = snd_pcm_hw_params_set_access(handle, hwparams, SND_PCM_ACCESS_RW_INTERLEAVED)) < 0) {
! j_printerr("Error: cannot set PCM device access mode (%s)\n", snd_strerror(err));
return(FALSE);
}
! /* set sample format */
! #ifdef WORDS_BIGENDIAN
! /* try big endian, then little endian with byte swap */
! if ((err = snd_pcm_hw_params_set_format(handle, hwparams, SND_PCM_FORMAT_S16_BE)) >= 0) {
! need_swap = FALSE;
! } else if ((err = snd_pcm_hw_params_set_format(handle, hwparams, SND_PCM_FORMAT_S16_LE)) >= 0) {
! need_swap = TRUE;
! } else {
! j_printerr("Error: cannot set PCM device format to 16bit-signed (%s)\n", snd_strerror(err));
return(FALSE);
}
! #else /* LITTLE ENDIAN */
! /* try little endian, then big endian with byte swap */
! if ((err = snd_pcm_hw_params_set_format(handle, hwparams, SND_PCM_FORMAT_S16_LE)) >= 0) {
! need_swap = FALSE;
! } else if ((err = snd_pcm_hw_params_set_format(handle, hwparams, SND_PCM_FORMAT_S16_BE)) >= 0) {
! need_swap = TRUE;
! } else {
! j_printerr("Error: cannot set PCM device format to 16bit-signed (%s)\n", snd_strerror(err));
return(FALSE);
}
! #endif
!
! /* set sample rate (if the exact rate is not supported by the hardware, use nearest possible rate) */
! exact_rate = snd_pcm_hw_params_set_rate_near(handle, hwparams, sfreq, &dir);
! if (exact_rate < 0) {
! j_printerr("Error: cannot set PCM device sample rate to %d (%s)\n", sfreq, snd_strerror(exact_rate));
return(FALSE);
}
! if (dir != 0) {
! j_printerr("Warning: the rate %d Hz is not supported by your PCM hardware.\n ==> Using %d Hz instead.\n", sfreq, exact_rate);
! }
!
! /* set number of channels */
! if ((err = snd_pcm_hw_params_set_channels(handle, hwparams, 1)) < 0) {
! j_printerr("Error: cannot set PCM monoral channel (%s)\n", snd_strerror(err));
return(FALSE);
}
!
! /* set period size */
! {
! int periodsize; /* period size (bytes) */
! int exact_size;
! int maxsize, minsize;
! /* get hardware max/min size */
! dir = 0;
! maxsize = snd_pcm_hw_params_get_period_size_max(hwparams, &dir);
! minsize = snd_pcm_hw_params_get_period_size_min(hwparams, &dir);
!
! /* set appropriate period size */
! periodsize = exact_rate * latency / 1000 * sizeof(SP16);
! if (periodsize < minsize) {
! j_printerr("Warning: PCM latency of %d ms (%d bytes) too small, use device minimum %d bytes\n", latency, periodsize, minsize);
! periodsize = minsize;
! } else if (periodsize > maxsize) {
! j_printerr("Warning: PCM latency of %d ms (%d bytes) too large, use device maximum %d bytes\n", latency, periodsize, maxsize);
! periodsize = maxsize;
! }
!
! /* set size (near value will be used) */
! exact_size = snd_pcm_hw_params_set_period_size_near(handle, hwparams, periodsize, &dir);
! if (exact_size < 0) {
! j_printerr("Error: cannot set PCM record period size to %d (%s)\n", periodsize, snd_strerror(exact_size));
! return(FALSE);
! }
! if (dir != 0) {
! j_printerr("Warning: PCM period size: %d bytes (%d ms) -> %d bytes\n", periodsize, latency, exact_size);
! }
! /* set number of periods ( = 2) */
! if ((err = snd_pcm_hw_params_set_periods(handle, hwparams, 2, 0)) < 0) {
! j_printerr("Error: cannot set PCM number of periods to %d (%s)\n", 2, snd_strerror(err));
return(FALSE);
}
+ }
! /* apply the configuration to the PCM device */
! if ((err = snd_pcm_hw_params(handle, hwparams)) < 0) {
! j_printerr("Error: cannot set PCM hardware parameters (%s)\n", snd_strerror(err));
return(FALSE);
}
! /* prepare for recording */
! if ((err = snd_pcm_prepare(handle)) < 0) {
! j_printerr("Error: cannot prepare audio interface (%s)\n", snd_strerror(err));
}
!
! /* prepare for polling */
! count = snd_pcm_poll_descriptors_count(handle);
! if (count <= 0) {
! j_printerr("Error: invalid PCM poll descriptors count\n");
! return(FALSE);
! }
! ufds = mymalloc(sizeof(struct pollfd) * count);
!
! if ((err = snd_pcm_poll_descriptors(handle, ufds, count)) < 0) {
! j_printerr("Error: unable to obtain poll descriptors for PCM recording (%s)\n", snd_strerror(err));
return(FALSE);
}
return(TRUE);
}
+
+ static int
+ xrun_recovery(snd_pcm_t *handle, int err)
+ {
+ if (err == -EPIPE) { /* under-run */
+ err = snd_pcm_prepare(handle);
+ if (err < 0)
+ j_printerr("Can't recover from PCM buffer underrun, prepare failed: %s\n", snd_strerror(err));
+ return 0;
+ } else if (err == -ESTRPIPE) {
+ while ((err = snd_pcm_resume(handle)) == -EAGAIN)
+ sleep(1); /* wait until the suspend flag is released */
+ if (err < 0) {
+ err = snd_pcm_prepare(handle);
+ if (err < 0)
+ j_printerr("Can't recover from PCM buffer suspend, prepare failed: %s\n", snd_strerror(err));
+ }
+ return 0;
+ }
+ return err;
+ }
+
+ /* start recording */
+ boolean
+ adin_mic_start()
+ {
+ int err;
+ snd_pcm_state_t status;
+
+ /* check hardware status */
+ while(1) { /* wait till prepared */
+ status = snd_pcm_state(handle);
+ switch(status) {
+ case SND_PCM_STATE_PREPARED: /* prepared for operation */
+ if ((err = snd_pcm_start(handle)) < 0) {
+ j_printerr("Error: cannot start PCM (%s)\n", snd_strerror(err));
+ return (FALSE);
+ }
+ return(TRUE);
+ break;
+ case SND_PCM_STATE_RUNNING: /* capturing the samples of other application */
+ if ((err = snd_pcm_drop(handle)) < 0) { /* discard the existing samples */
+ j_printerr("Error: cannot drop PCM (%s)\n", snd_strerror(err));
+ return (FALSE);
+ }
+ break;
+ case SND_PCM_STATE_XRUN: /* buffer overrun */
+ if ((err = xrun_recovery(handle, -EPIPE)) < 0) {
+ j_printerr("Error: PCM XRUN recovery failed (%s)\n", snd_strerror(err));
+ return(FALSE);
+ }
+ break;
+ case SND_PCM_STATE_SUSPENDED: /* suspended by power management system */
+ if ((err = xrun_recovery(handle, -ESTRPIPE)) < 0) {
+ j_printerr("Error: PCM XRUN recovery failed (%s)\n", snd_strerror(err));
+ return(FALSE);
+ }
+ break;
+ }
+ }
+
+ return(TRUE);
+ }
/* stop recording */
boolean
adin_mic_stop()
{
return(TRUE);
}
***************
*** 250,285 ****
adin_mic_read(SP16 *buf, int sampnum)
{
int cnt;
! int tsamp, size;
! tsamp = 0;
! if (tsamp + (bufsize / sizeof(SP16)) > sampnum) bufsize = (sampnum - tsamp) * sizeof(SP16);
! /* wait till first input */
! do {
! cnt = snd_pcm_plugin_read(handle, buf, bufsize);
! if (cnt > 0) break;
! if (cnt == -EPIPE || cnt == -EAGAIN) {
! cnt = 0;
! }
! if (cnt < 0) {
! j_printerr("Error: adin_mic_read: %s\n", snd_strerror(cnt));
! return(-2);
! }
! usleep(10000);
! } while (cnt == 0);
!
! do {
! tsamp += cnt / sizeof(SP16);
! if (tsamp >= sampnum) break;
! if (tsamp + (bufsize / sizeof(SP16)) > sampnum) bufsize = (sampnum - tsamp) * sizeof(SP16);
! cnt = snd_pcm_plugin_read(handle, buf + tsamp, bufsize);
! } while (cnt > 0);
! if (cnt == -EPIPE || cnt == -EAGAIN) {
! cnt = 0;
}
if (cnt < 0) {
! j_printerr("Error: adin_mic_read: %s\n", snd_strerror(cnt));
return(-2);
}
! return(tsamp);
}
--- 255,278 ----
adin_mic_read(SP16 *buf, int sampnum)
{
int cnt;
! snd_pcm_sframes_t avail;
! while ((avail = snd_pcm_avail_update(handle)) <= 0) {
! usleep(latency * 1000);
! }
! if (avail < sampnum) {
! cnt = snd_pcm_readi(handle, buf, avail);
! } else {
! cnt = snd_pcm_readi(handle, buf, sampnum);
}
+
if (cnt < 0) {
! j_printerr("Error: PCM read failed (%s)\n", snd_strerror(cnt));
return(-2);
}
!
! if (need_swap) {
! swap_sample_bytes(buf, cnt);
! }
! return(cnt);
}
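
Note on the period-size arithmetic in adin_mic_standby() above: the patch
derives a buffer size from the sampling rate and the target latency, then
clamps it to the range reported by the device. The following is only a
minimal sketch of that arithmetic under stated assumptions. The function
name period_bytes_for_latency and its parameters are hypothetical and are
not part of Julius or ALSA; the device limits are passed in as plain
integers instead of being queried from the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in Julius): convert a latency in milliseconds
   into a buffer size in bytes and clamp it to [min_bytes, max_bytes]. */
static int
period_bytes_for_latency(int rate_hz, int latency_ms, size_t sample_bytes,
                         int min_bytes, int max_bytes)
{
  int bytes;

  assert(min_bytes <= max_bytes);
  /* same evaluation order as the patch: rate * latency / 1000 * sample size */
  bytes = (int)(rate_hz * latency_ms / 1000 * sample_bytes);
  if (bytes < min_bytes) bytes = min_bytes;      /* too small: use device minimum */
  else if (bytes > max_bytes) bytes = max_bytes; /* too large: use device maximum */
  return bytes;
}
```

With 16-bit samples at 16000 Hz and the default latency of 50 ms, the
unclamped value is 16000 * 50 / 1000 * 2 = 1600 bytes.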
diff -crN julius-3.3p2-multipath/libsent/src/adin/adin_mic_linux_oss.c julius-3.3p3-multipath/libsent/src/adin/adin_mic_linux_oss.c
*** julius-3.3p2-multipath/libsent/src/adin/adin_mic_linux_oss.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/src/adin/adin_mic_linux_oss.c Mon Nov 25 21:55:27 2002
***************
*** 4,10 ****
/* adin_mic_linux_oss.c --- adin microphone library for OSS API */
! /* $Id: adin_mic_linux_oss.c,v 1.3 2002/09/11 22:01:50 ri Exp $ */
/* for standard sound drivers in linux-2.0.x, 2.2.x */
/* for OSS/Linux, OSS/Free or other OSS compatible API */
--- 4,10 ----
/* adin_mic_linux_oss.c --- adin microphone library for OSS API */
! /* $Id: adin_mic_linux_oss.c,v 1.4 2002/11/24 14:14:33 ri Exp $ */
/* for standard sound drivers in linux-2.0.x, 2.2.x */
/* for OSS/Linux, OSS/Free or other OSS compatible API */
***************
*** 38,44 ****
struct pollfd fds[1]; /* structure for polling */
#define FREQALLOWRANGE 200 /* acceptable sampling frequency width around 16kHz */
! #define POLLINTERVAL 200 /* in miliseconds */
/* check audio port resource and initialize */
--- 38,44 ----
struct pollfd fds[1]; /* structure for polling */
#define FREQALLOWRANGE 200 /* acceptable sampling frequency width around 16kHz */
! #define POLLINTERVAL 50 /* in miliseconds */
/* check audio port resource and initialize */
***************
*** 53,59 ****
int stereo; /* mono */
/* open device */
! if ((audio_fd = open("/dev/dsp", O_RDONLY)) == -1) {
perror("adin_mic_standby: open /dev/dsp");
return(FALSE);
}
--- 53,59 ----
int stereo; /* mono */
/* open device */
! if ((audio_fd = open("/dev/dsp", O_RDONLY|O_NONBLOCK)) == -1) {
perror("adin_mic_standby: open /dev/dsp");
return(FALSE);
}
diff -crN julius-3.3p2-multipath/libsent/src/adin/adin_sndfile.c julius-3.3p3-multipath/libsent/src/adin/adin_sndfile.c
*** julius-3.3p2-multipath/libsent/src/adin/adin_sndfile.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/src/adin/adin_sndfile.c Mon Nov 25 21:55:27 2002
***************
*** 4,12 ****
/* adin_sndfile.c --- A/D-in functions wave file using libsndfile */
! /* $Id: adin_sndfile.c,v 1.4 2002/09/11 22:01:50 ri Exp $ */
! /* requires libsndfile (tested on 0.0.23)
http://www.zip.com.au/~erikd/libsndfile/ */
#include
--- 4,12 ----
/* adin_sndfile.c --- A/D-in functions wave file using libsndfile */
! /* $Id: adin_sndfile.c,v 1.5 2002/11/19 04:03:29 ri Exp $ */
! /* requires libsndfile (tested on 0.0.23 and 1.0.1)
http://www.zip.com.au/~erikd/libsndfile/ */
#include
***************
*** 35,44 ****
--- 35,51 ----
j_printerr("adin_sndfile: channel num != 1, it has %d channels\n", s->channels);
return FALSE;
}
+ #ifdef HAVE_LIBSNDFILE_VER1
+ if ((s->format & SF_FORMAT_SUBMASK) != SF_FORMAT_PCM_16) {
+ j_printerr("adin_sndfile1: not 16-bit data\n");
+ return FALSE;
+ }
+ #else
if (s->pcmbitwidth != 16) {
j_printerr("adin_sndfile: not 16-bit data, it's %d bit\n", s->pcmbitwidth);
return FALSE;
}
+ #endif
return TRUE;
}
***************
*** 51,65 ****
case SF_FORMAT_WAV: j_printf("Microsoft WAV"); break;
case SF_FORMAT_AIFF: j_printf("Apple/SGI AIFF"); break;
case SF_FORMAT_AU: j_printf("Sun/NeXT AU"); break;
case SF_FORMAT_AULE: j_printf("DEC AU"); break;
case SF_FORMAT_RAW: j_printf("RAW"); break;
case SF_FORMAT_PAF: j_printf("Ensoniq PARIS"); break;
case SF_FORMAT_SVX: j_printf("Amiga IFF / SVX8 / SV16"); break;
case SF_FORMAT_NIST: j_printf("Sphere NIST"); break;
! case SF_FORMAT_WMA: j_printf("Windows Media Audio"); break;
! case SF_FORMAT_SMPLTD: j_printf("Sekd Samplitude"); break;
}
switch(s->format & SF_FORMAT_SUBMASK) {
case SF_FORMAT_PCM: j_printf(", PCM"); break;
case SF_FORMAT_FLOAT: j_printf(", floats"); break;
case SF_FORMAT_ULAW: j_printf(", U-Law"); break;
--- 58,97 ----
case SF_FORMAT_WAV: j_printf("Microsoft WAV"); break;
case SF_FORMAT_AIFF: j_printf("Apple/SGI AIFF"); break;
case SF_FORMAT_AU: j_printf("Sun/NeXT AU"); break;
+ #ifndef HAVE_LIBSNDFILE_VER1
case SF_FORMAT_AULE: j_printf("DEC AU"); break;
+ #endif
case SF_FORMAT_RAW: j_printf("RAW"); break;
case SF_FORMAT_PAF: j_printf("Ensoniq PARIS"); break;
case SF_FORMAT_SVX: j_printf("Amiga IFF / SVX8 / SV16"); break;
case SF_FORMAT_NIST: j_printf("Sphere NIST"); break;
! #ifdef HAVE_LIBSNDFILE_VER1
! case SF_FORMAT_VOC: j_printf("VOC file"); break;
! case SF_FORMAT_IRCAM: j_printf("Berkeley/IRCAM/CARL"); break;
! case SF_FORMAT_W64: j_printf("Sonic Foundry's 64bit RIFF/WAV"); break;
! case SF_FORMAT_MAT4: j_printf("Matlab (tm) V4.2 / GNU Octave 2.0"); break;
! case SF_FORMAT_MAT5: j_printf("Matlab (tm) V5.0 / GNU Octave 2.1"); break;
! #endif
! default: j_printf("UNKNOWN TYPE"); break;
}
switch(s->format & SF_FORMAT_SUBMASK) {
+ #ifdef HAVE_LIBSNDFILE_VER1
+ case SF_FORMAT_PCM_U8: j_printf(", Unsigned 8 bit PCM"); break;
+ case SF_FORMAT_PCM_S8: j_printf(", Signed 8 bit PCM"); break;
+ case SF_FORMAT_PCM_16: j_printf(", Signed 16 bit PCM"); break;
+ case SF_FORMAT_PCM_24: j_printf(", Signed 24 bit PCM"); break;
+ case SF_FORMAT_PCM_32: j_printf(", Signed 32 bit PCM"); break;
+ case SF_FORMAT_FLOAT: j_printf(", 32bit float"); break;
+ case SF_FORMAT_DOUBLE: j_printf(", 64bit float"); break;
+ case SF_FORMAT_ULAW: j_printf(", U-Law"); break;
+ case SF_FORMAT_ALAW: j_printf(", A-Law"); break;
+ case SF_FORMAT_IMA_ADPCM: j_printf(", IMA ADPCM"); break;
+ case SF_FORMAT_MS_ADPCM: j_printf(", Microsoft ADPCM"); break;
+ case SF_FORMAT_GSM610: j_printf(", GSM 6.10, "); break;
+ case SF_FORMAT_G721_32: j_printf(", 32kbs G721 ADPCM"); break;
+ case SF_FORMAT_G723_24: j_printf(", 24kbs G723 ADPCM"); break;
+ case SF_FORMAT_G723_40: j_printf(", 40kbs G723 ADPCM"); break;
+ #else
case SF_FORMAT_PCM: j_printf(", PCM"); break;
case SF_FORMAT_FLOAT: j_printf(", floats"); break;
case SF_FORMAT_ULAW: j_printf(", U-Law"); break;
***************
*** 75,82 ****
--- 107,127 ----
case SF_FORMAT_GSM610: j_printf(", GSM 6.10, "); break;
case SF_FORMAT_G721_32: j_printf(", 32kbs G721 ADPCM"); break;
case SF_FORMAT_G723_24: j_printf(", 24kbs G723 ADPCM"); break;
+ #endif
+ default: j_printf(", UNKNOWN SUBTYPE"); break;
}
+
+ #ifdef HAVE_LIBSNDFILE_VER1
+ switch(s->format & SF_FORMAT_ENDMASK) {
+ case SF_ENDIAN_FILE: j_printf(", file native endian"); break;
+ case SF_ENDIAN_LITTLE: j_printf(", forced little endian"); break;
+ case SF_ENDIAN_BIG: j_printf(", forced big endian"); break;
+ case SF_ENDIAN_CPU: j_printf(", forced CPU native endian"); break;
+ }
+ j_printf(", %d Hz, %d channels\n", s->samplerate, s->channels);
+ #else
j_printf(", %d bit, %d Hz, %d channels\n", s->pcmbitwidth, s->samplerate, s->channels);
+ #endif
}
***************
*** 132,148 ****
if (speechfilename == NULL) return (FALSE); /* end of input */
}
/* open input file */
sinfo.samplerate = sfreq;
sinfo.pcmbitwidth = 16;
sinfo.channels = 1;
sinfo.format = 0x0;
! if ((sp = sf_open_read(speechfilename, &sinfo)) == NULL) {
/* retry assuming raw format */
sinfo.samplerate = sfreq;
- sinfo.pcmbitwidth = 16;
sinfo.channels = 1;
sinfo.format = SF_FORMAT_RAW | SF_FORMAT_PCM_BE;
! if ((sp = sf_open_read(speechfilename, &sinfo)) == NULL) {
sf_perror(sp);
j_printerr("Error in opening speech data: \"%s\"\n",speechfilename);
}
--- 177,211 ----
if (speechfilename == NULL) return (FALSE); /* end of input */
}
/* open input file */
+ #ifndef HAVE_LIBSNDFILE_VER1
sinfo.samplerate = sfreq;
sinfo.pcmbitwidth = 16;
sinfo.channels = 1;
+ #endif
sinfo.format = 0x0;
! if ((sp =
! #ifdef HAVE_LIBSNDFILE_VER1
! sf_open(speechfilename, SFM_READ, &sinfo)
! #else
! sf_open_read(speechfilename, &sinfo)
! #endif
! ) == NULL) {
/* retry assuming raw format */
sinfo.samplerate = sfreq;
sinfo.channels = 1;
+ #ifdef HAVE_LIBSNDFILE_VER1
+ sinfo.format = SF_FORMAT_RAW | SF_FORMAT_PCM_16 | SF_ENDIAN_BIG;
+ #else
+ sinfo.pcmbitwidth = 16;
sinfo.format = SF_FORMAT_RAW | SF_FORMAT_PCM_BE;
! #endif
! if ((sp =
! #ifdef HAVE_LIBSNDFILE_VER1
! sf_open(speechfilename, SFM_READ, &sinfo)
! #else
! sf_open_read(speechfilename, &sinfo)
! #endif
! ) == NULL) {
sf_perror(sp);
j_printerr("Error in opening speech data: \"%s\"\n",speechfilename);
}
***************
*** 150,155 ****
--- 213,219 ----
if (sp != NULL) { /* open success */
if (! check_format(&sinfo)) {
j_printerr("Error: invalid format: \"%s\"\n",speechfilename);
+ print_format(&sinfo);
} else {
j_printf("\ninput speechfile: %s\n",speechfilename);
print_format(&sinfo);
diff -crN julius-3.3p2-multipath/libsent/src/adin/adin_tcpip.c julius-3.3p3-multipath/libsent/src/adin/adin_tcpip.c
*** julius-3.3p2-multipath/libsent/src/adin/adin_tcpip.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/src/adin/adin_tcpip.c Tue Dec 3 11:35:26 2002
***************
*** 4,16 ****
/* adin_tcpip.c --- adin via TCP/IP network */
! /* $Id: adin_tcpip.c,v 1.3 2002/09/11 22:01:50 ri Exp $ */
/* ignore sampling frequency config. ... incoming speech stream via socket is undoubtedly accepted without qualification. Be care! */
#include
#include
#include
static int adinnet_sd = -1; /* socket for adinserv */
static int adinnet_asd = -1; /* socket for adinserv */
--- 4,17 ----
/* adin_tcpip.c --- adin via TCP/IP network */
! /* $Id: adin_tcpip.c,v 1.4 2002/12/02 06:02:25 ri Exp $ */
/* ignore sampling frequency config. ... incoming speech stream via socket is undoubtedly accepted without qualification. Be care! */
#include
#include
#include
+ #include
static int adinnet_sd = -1; /* socket for adinserv */
static int adinnet_asd = -1; /* socket for adinserv */
***************
*** 80,95 ****
adin_tcpip_read(SP16 *buf, int sampnum)
{
int cnt, ret;
! ret = rd(adinnet_asd, (char *)buf, &cnt, sampnum * sizeof(SP16));
! if (ret == 0) {
! /* end of segment mark */
! last_is_segmented = TRUE;
! return -1;
}
! if (ret < 0) {
! /* end of input, mark */
! last_is_segmented = FALSE;
! return -1;
}
cnt /= sizeof(SP16);
return cnt;
--- 81,111 ----
adin_tcpip_read(SP16 *buf, int sampnum)
{
int cnt, ret;
! struct pollfd p;
! int status;
!
! /* check if some commands are waiting in queue */
! p.fd = adinnet_asd;
! p.events = POLLIN;
! status = poll(&p, 1, 50);
! if (status < 0) {
! j_printerr("adin_tcpip_read: cannot poll\n");
! return -2; /* error */
}
! if (status > 0) { /* there are some data */
! ret = rd(adinnet_asd, (char *)buf, &cnt, sampnum * sizeof(SP16));
! if (ret == 0) {
! /* end of segment mark */
! last_is_segmented = TRUE;
! return -1;
! }
! if (ret < 0) {
! /* end of input, mark */
! last_is_segmented = FALSE;
! return -1;
! }
! } else {
! cnt = 0; /* no data */
}
cnt /= sizeof(SP16);
return cnt;
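
Note on the change to adin_tcpip_read() above: the read is now guarded by
poll() with a 50 ms timeout, so the caller gets "no data" back instead of
blocking on the socket. The sketch below illustrates the same pattern on an
arbitrary file descriptor. The names wait_readable and read_if_ready are
hypothetical, and a plain read() stands in for the rd() routine that Julius
uses, so this is an illustration of the technique rather than the actual
implementation.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd became readable within timeout_ms,
   0 on timeout, -1 if poll() itself failed. */
static int
wait_readable(int fd, int timeout_ms)
{
  struct pollfd p;
  int status;

  assert(fd >= 0);
  p.fd = fd;
  p.events = POLLIN;
  p.revents = 0;
  status = poll(&p, 1, timeout_ms);
  if (status < 0) return -1;
  return (status > 0) ? 1 : 0;
}

/* Hypothetical helper: number of bytes read (> 0), 0 if nothing arrived
   within the timeout, -1 on error or end of stream. */
static long
read_if_ready(int fd, void *buf, size_t len, int timeout_ms)
{
  ssize_t n;
  int r = wait_readable(fd, timeout_ms);

  if (r <= 0) return r;            /* 0: no data yet, -1: poll error */
  n = read(fd, buf, len);
  return (n > 0) ? (long)n : -1;   /* treat EOF like an error here */
}
```

Returning 0 for "nothing yet" keeps the caller's loop running, which is the
same role the "cnt = 0" branch plays in the patched function.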
diff -crN julius-3.3p2-multipath/libsent/src/adin/zc-e.c julius-3.3p3-multipath/libsent/src/adin/zc-e.c
*** julius-3.3p2-multipath/libsent/src/adin/zc-e.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/src/adin/zc-e.c Mon Nov 25 21:55:27 2002
***************
*** 4,10 ****
/* zc-e.c --- count zerocross and level */
! /* $Id: zc-e.c,v 1.2 2002/09/11 22:01:50 ri Exp $ */
/**check no sound in monoral input data **/
/* Sat Feb 19 13:48:00 JST 1994 */
--- 4,10 ----
/* zc-e.c --- count zerocross and level */
! /* $Id: zc-e.c,v 1.3 2002/11/25 02:23:29 ri Exp $ */
/**check no sound in monoral input data **/
/* Sat Feb 19 13:48:00 JST 1994 */
***************
*** 29,34 ****
--- 29,35 ----
static int sign; /* sign of sample */
static int is_trig; /* threshold */
static int top; /* current pointer of buffer */
+ static int valid_len; /* valid samples in buffer (for short input) */
/* initialize all parameter and buffer */
void
***************
*** 44,49 ****
--- 45,51 ----
is_trig = FALSE;
sign = POSITIVE;
top = 0;
+ valid_len = 0;
/* data spool for header-margin */
data = (SP16 *)mymalloc(length * sizeof(SP16));
***************
*** 52,58 ****
for (i=0; itrigger) {
is_trig = TRUE;
}
- tmp = data[top];
data[top] = buf[i];
! buf[i] = tmp;
! if (++top==length) {
top = 0;
}
}
--- 95,104 ----
if (abs(tmp)>trigger) {
is_trig = TRUE;
}
data[top] = buf[i];
! top++;
! if (valid_len < top) valid_len = top;
! if (top >= length) {
top = 0;
}
}
***************
*** 107,120 ****
/* count zerocross and level for 'buf[0..step-1]' */
/* also calculates max level and store to '*levelp' */
- /* the contents are swapped by cycle-buffer */
/* returns number of zero-crosses on the (swapped) buffer */
int
count_zc_e_level(SP16 *buf,int step,int *levelp)
{
int i;
! SP16 tmp;
! SP16 level;
level = 0;
for (i=0; ilevel) level = abs(tmp);
- tmp = data[top];
data[top] = buf[i];
! buf[i] = tmp;
! if (++top==length) {
top = 0;
}
}
--- 139,148 ----
is_trig = TRUE;
}
if (abs(tmp)>level) level = abs(tmp);
data[top] = buf[i];
! top++;
! if (valid_len < top) valid_len = top;
! if (top >= length) {
top = 0;
}
}
***************
*** 153,164 ****
}
void
! zc_copy_buffer(SP16 *newbuf, int len)
{
int i, t;
! t = top;
! for(i=0;i
#include
--- 4,10 ----
/* init_dfa.c --- initialize DFA */
! /* $Id: init_dfa.c,v 1.5 2003/01/06 08:05:37 ri Exp $ */
#include
#include
***************
*** 58,84 ****
j_printerr("done\n");
}
! /* set dfa->sp_id and dfa->is_sp from sp_name */
void
! dfa_find_pause_word(DFA_INFO *dfa, WORD_INFO *winfo, HTK_HMM_INFO *hmminfo, char *sp_name)
{
HMM_Logical *sphmm;
! int i, t;
WORD_ID w;
dfa->sp_id = WORD_INVALID;
dfa->is_sp = (boolean *)mymalloc(sizeof(boolean) * dfa->term_num);
for(t=0;t<dfa->term_num;t++) dfa->is_sp[t] = FALSE;
! if ((sphmm = htk_hmmdata_lookup_logical(hmminfo, sp_name)) != NULL) {
! for(t=0;t<dfa->term_num;t++) {
! for(i=0;i<dfa->term.wnum[t]; i++) {
! w = dfa->term.tw[t][i];
! if (winfo->wlen[w]==1 && winfo->wseq[w][0] == sphmm) {
! if (dfa->sp_id == WORD_INVALID) dfa->sp_id = w;
! dfa->is_sp[t] = TRUE;
! break;
! }
}
}
}
--- 58,84 ----
j_printerr("done\n");
}
! /* set dfa->sp_id and dfa->is_sp[cate] from hmminfo->sp */
void
! dfa_find_pause_word(DFA_INFO *dfa, WORD_INFO *winfo, HTK_HMM_INFO *hmminfo)
{
HMM_Logical *sphmm;
! int i, t,p;
WORD_ID w;
dfa->sp_id = WORD_INVALID;
dfa->is_sp = (boolean *)mymalloc(sizeof(boolean) * dfa->term_num);
for(t=0;t<dfa->term_num;t++) dfa->is_sp[t] = FALSE;
! for(t=0;t<dfa->term_num;t++) {
! for(i=0;i<dfa->term.wnum[t]; i++) {
! w = dfa->term.tw[t][i];
! p = 0;
! while(p < winfo->wlen[w] && winfo->wseq[w][p] == hmminfo->sp) p++;
! if (p >= winfo->wlen[w]) { /* w consists of only hmminfo->sp model */
! dfa->is_sp[t] = TRUE;
! if (dfa->sp_id == WORD_INVALID) dfa->sp_id = w;
! break; /* mark this category if at least 1 sp_word was found */
}
}
}
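
Note on the change to dfa_find_pause_word() above: a word now counts as a
pause word when every model in its sequence is the short-pause model,
rather than only when it is exactly one model long. The sketch below shows
that test in isolation. It is a simplified stand-in, assuming models are
represented by plain ints instead of HMM_Logical pointers; the name
word_is_only_sp is hypothetical and does not exist in Julius.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scan in dfa_find_pause_word():
   returns 1 if all wlen entries of wseq equal sp, otherwise 0.
   An empty sequence also yields 1, as the loop in the patch would. */
static int
word_is_only_sp(const int *wseq, int wlen, int sp)
{
  int p = 0;

  assert(wseq != NULL || wlen == 0);
  while (p < wlen && wseq[p] == sp) p++;  /* skip leading sp models */
  return p >= wlen;                       /* reached the end: only sp found */
}
```

A sequence such as "sp sp" is accepted by this test, while "sp a" is not.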
diff -crN julius-3.3p2-multipath/libsent/src/dfa/mkcpair.c julius-3.3p3-multipath/libsent/src/dfa/mkcpair.c
*** julius-3.3p2-multipath/libsent/src/dfa/mkcpair.c Thu Sep 12 07:12:04 2002
--- julius-3.3p3-multipath/libsent/src/dfa/mkcpair.c Mon Jan 6 17:07:47 2003
***************
*** 4,10 ****
/* mkcpair.c --- extract category-pair constraint from DFA */
! /* $Id: mkcpair.c,v 1.5 2002/09/11 22:01:50 ri Exp $ */
#include
#include
--- 4,10 ----
/* mkcpair.c --- extract category-pair constraint from DFA */
! /* $Id: mkcpair.c,v 1.6 2003/01/06 08:05:37 ri Exp $ */
#include
#include
***************
*** 32,38 ****
if ((dinfo->st[i].status & INITIAL_S) != 0) { /* arc from initial state */
for (arc_r = dinfo->st[i].arc; arc_r; arc_r = arc_r->next) {
if (dinfo->is_sp[arc_r->label]) {
! j_error("Error: skipable sp should not appear at end of sentence\n");
}
set_dfa_cp_end(dinfo, arc_r->label, TRUE);
}
--- 32,38 ----
if ((dinfo->st[i].status & INITIAL_S) != 0) { /* arc from initial state */
for (arc_r = dinfo->st[i].arc; arc_r; arc_r = arc_r->next) {
if (dinfo->is_sp[arc_r->label]) {
! j_error("Error: skippable sp should not appear at end of sentence\n");
}
set_dfa_cp_end(dinfo, arc_r->label, TRUE);
}
***************
*** 41,47 ****
left = arc_l->label;
if ((dinfo->st[arc_l->to_state].status & ACCEPT_S) != 0) {/* arc to accept state */
if (dinfo->is_sp[left]) {
! j_error("Error: skipable sp should not appear at beginning of sentence\n");
}
set_dfa_cp_begin(dinfo, left, TRUE);
}
--- 41,47 ----
left = arc_l->label;
if ((dinfo->st[arc_l->to_state].status & ACCEPT_S) != 0) {/* arc to accept state */
if (dinfo->is_sp[left]) {
! j_error("Error: skippable sp should not appear at beginning of sentence\n");
}
set_dfa_cp_begin(dinfo, left, TRUE);
}
***************
*** 52,58 ****
if (dinfo->is_sp[right]) {
for (arc_r2 = dinfo->st[arc_r->to_state].arc; arc_r2; arc_r2 = arc_r2->next) {
if (dinfo->is_sp[arc_r2->label]) { /* sp model continues twice */
! j_error("Error: skipable sp should not repeat\n");
}
set_dfa_cp(dinfo, arc_r2->label, left, TRUE);
}
--- 52,58 ----
if (dinfo->is_sp[right]) {
for (arc_r2 = dinfo->st[arc_r->to_state].arc; arc_r2; arc_r2 = arc_r2->next) {
if (dinfo->is_sp[arc_r2->label]) { /* sp model continues twice */
! j_error("Error: skippable sp should not repeat\n");
}
set_dfa_cp(dinfo, arc_r2->label, left, TRUE);
}
diff -crN julius-3.3p2-multipath/libsent/src/hmminfo/init_phmm.c julius-3.3p3-multipath/libsent/src/hmminfo/init_phmm.c
*** julius-3.3p2-multipath/libsent/src/hmminfo/init_phmm.c Tue Oct 29 14:42:45 2002
--- julius-3.3p3-multipath/libsent/src/hmminfo/init_phmm.c Tue Jan 7 22:39:42 2003
***************
*** 4,10 ****
/* init_phmm.c --- read in hmmdefs file & initialize */
! /* $Id: init_phmm.c,v 1.3 2002/09/11 22:01:50 ri Exp $ */
#include
#include
--- 4,10 ----
/* init_phmm.c --- read in hmmdefs file & initialize */
! /* $Id: init_phmm.c,v 1.4 2003/01/06 08:05:37 ri Exp $ */
#include
#include
***************
*** 37,45 ****
if (fclose_readfile(fp) < 0) {
j_error("failed to close %s\n", hmmfilename);
}
- /* extract basephone */
- make_hmm_basephone_list(hmminfo);
- j_printerr(" base phones: %5d\n", hmminfo->basephone.num);
j_printerr(" defined HMMs: %5d\n", hmminfo->totalhmmnum);
--- 37,42 ----
***************
*** 62,67 ****
--- 59,68 ----
j_printerr(" logical names: %5d\n", hmminfo->totallogicalnum);
}
+ /* extract basephone */
+ make_hmm_basephone_list(hmminfo);
+ j_printerr(" base phones: %5d used in logical\n", hmminfo->basephone.num);
+
/* Guess we need to handle context dependency */
/* (word-internal CD is done statically, cross-word CD dynamically */
if (guess_if_cd_hmm(hmminfo)) {
***************
*** 69,74 ****
--- 70,89 ----
} else {
hmminfo->is_triphone = FALSE;
}
+
+ hmminfo->sp = NULL;
j_printerr("done\n");
+ }
+
+ void
+ htk_hmm_set_pause_model(HTK_HMM_INFO *hmminfo, char *spmodel_name)
+ {
+ HMM_Logical *l;
+
+ l = htk_hmmdata_lookup_logical(hmminfo, spmodel_name);
+ if (l == NULL) {
+ j_printerr("Warning: no model named as \"%s\"\n", spmodel_name);
+ }
+ hmminfo->sp = l;
}
diff -crN julius-3.3p2-multipath/libsent/src/phmm/mkwhmm.c julius-3.3p3-multipath/libsent/src/phmm/mkwhmm.c
*** julius-3.3p2-multipath/libsent/src/phmm/mkwhmm.c Thu Oct 17 21:54:23 2002
--- julius-3.3p3-multipath/libsent/src/phmm/mkwhmm.c Mon Jan 6 18:10:04 2003
***************
*** 11,23 ****
/* calculate state length length */
static int
! totalstatelen(HMM_Logical **hdseq, int hdseqlen)
{
int i, len;
len = 0;
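for (i=0;i<hdseqlen;i++) {
len += hmm_logical_state_num(hdseq[i]) - 2;
}
return(len+2);
}
--- 11,27 ----
/* calculate state length length */
static int
! totalstatelen(HMM_Logical **hdseq, int hdseqlen, boolean *has_sp, HTK_HMM_INFO *hmminfo)
{
int i, len;
len = 0;
for (i=0;i<hdseqlen;i++) {
len += hmm_logical_state_num(hdseq[i]) - 2;
+ if (has_sp[i]) {
+ if (hmminfo->sp == NULL) j_error("Error: no hmminfo->sp!!\n");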
+ len += hmm_logical_state_num(hmminfo->sp) - 2;
+ }
}
return(len+2);
}
***************
*** 39,45 ****
/* LM prob will be assigned for cross-word arcs */
/* new HMM is malloced and returned */
HMM *
! new_make_word_hmm_with_lm(HTK_HMM_INFO *hmminfo, HMM_Logical **hdseq, int hdseqlen, LOGPROB *lscore)
{
HMM *new;
int i,j,n;
--- 43,49 ----
/* LM prob will be assigned for cross-word arcs */
/* new HMM is malloced and returned */
HMM *
! new_make_word_hmm_with_lm(HTK_HMM_INFO *hmminfo, HMM_Logical **hdseq, int hdseqlen, boolean *has_sp, LOGPROB *lscore)
{
HMM *new;
int i,j,n;
***************
*** 50,56 ****
int state_num;
new = (HMM *)mymalloc(sizeof(HMM));
! new->len = totalstatelen(hdseq, hdseqlen);
new->state = (HMM_STATE *)mymalloc(sizeof(HMM_STATE) * new->len);
for (i=0;i<new->len;i++) {
new->state[i].ac = NULL;
--- 54,60 ----
int state_num;
new = (HMM *)mymalloc(sizeof(HMM));
! new->len = totalstatelen(hdseq, hdseqlen, has_sp, hmminfo);
new->state = (HMM_STATE *)mymalloc(sizeof(HMM_STATE) * new->len);
for (i=0;i<new->len;i++) {
new->state[i].ac = NULL;
***************
*** 75,80 ****
--- 79,100 ----
n++;
}
}
+ if (has_sp[i]) {
+ /* append sp at the end of the phone */
+ if (hmminfo->sp->is_pseudo) {
+ for (j = 1; j < hmm_logical_state_num(hmminfo->sp) - 1; j++) {
+ new->state[n].is_pseudo_state = TRUE;
+ new->state[n].out.cdset = &(hmminfo->sp->body.pseudo->stateset[j]);
+ n++;
+ }
+ } else {
+ for (j = 1; j < hmm_logical_state_num(hmminfo->sp) - 1; j++) {
+ new->state[n].is_pseudo_state = FALSE;
+ new->state[n].out.state = hmminfo->sp->body.defined->s[j];
+ n++;
+ }
+ }
+ }
}
/* make transition arcs */
***************
*** 147,153 ****
--- 167,237 ----
out_a[j] = out_a_next[j];
}
out_num_prev = out_num_next;
+
+ /* inter-word short pause handling */
+ if (has_sp[i]) {
+
+ out_num_next = 0;
+
+ /* arc from initial state */
+ for (ato = 1; ato < hmm_logical_state_num(hmminfo->sp); ato++) {
+ logprob = hmm_logical_trans(hmminfo->sp)->a[0][ato];
+ if (logprob != LOG_ZERO) {
+ /* to control short pause insertion, transition probability toward
+ the word-end short pause will be given a penalty */
+ logprob += hmminfo->iwsp_penalty;
+ /* expand arc */
+ if (ato == hmm_logical_state_num(hmminfo->sp)-1) {
+ /* from initial to final ... register all previously registered arcs for next expansion */
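+ for(j=0;j<out_num_prev;j++) {
+ out_from_next[out_num_next] = out_from[j];
+ out_a_next[out_num_next] = out_a[j] + logprob;
+ out_num_next++;
+ }
+ } else {
+ for(j=0;j<out_num_prev;j++) {
+ add_arc(&(new->state[out_from[j]]), n + ato,
+ out_a[j] + logprob);
+ }
+ }
+ }
+ }
+ /* if short pause model doesn't have a model skip transition, also add it */
+ if (hmm_logical_trans(hmminfo->sp)->a[0][hmm_logical_state_num(hmminfo->sp)-1] == LOG_ZERO) {
+ /* to make insertion sp model to have no effect on the original path,
+ the skip transition probability should be 0.0 (=100%) */
+ logprob = 0.0;
+ for(j=0; j<out_num_prev; j++) {
+ out_from_next[out_num_next] = out_from[j];
+ out_a_next[out_num_next] = out_a[j] + logprob;
+ out_num_next++;
+ }
+ }
+ /* arc from output state */
+ for (afrom = 1; afrom < hmm_logical_state_num(hmminfo->sp) - 1; afrom++) {
+ for (ato = 1; ato < hmm_logical_state_num(hmminfo->sp); ato++) {
+ logprob = hmm_logical_trans(hmminfo->sp)->a[afrom][ato];
+ if (logprob != LOG_ZERO) {
+ if (ato == hmm_logical_state_num(hmminfo->sp) - 1) {
+ /* from output state to final ... register the arc for next expansion */
+ out_from_next[out_num_next] = n+afrom;
+ out_a_next[out_num_next++] = logprob;
+ } else {
+ add_arc(&(new->state[n+afrom]), n + ato, logprob);
+ }
+ }
+ }
+ }
+ n += hmm_logical_state_num(hmminfo->sp) - 2;
+ for(j=0;j<out_num_next;j++) {
+ out_from[j] = out_from_next[j];
+ out_a[j] = out_a_next[j];
+ }
+ out_num_prev = out_num_next;
+ }
}
for(j=0;j<out_num_prev;j++) {
add_arc(&(new->state[out_from[j]]), new->len-1, out_a[j]);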
}
***************
*** 162,170 ****
/* make word(phrase) HMM from HTK_HMM_INFO with no LM */
HMM *
! new_make_word_hmm(HTK_HMM_INFO *hmminfo, HMM_Logical **hdseq, int hdseqlen)
{
! return(new_make_word_hmm_with_lm(hmminfo, hdseq, hdseqlen, NULL));
}
/* free HMM */
--- 246,254 ----
/* make word(phrase) HMM from HTK_HMM_INFO with no LM */
HMM *
! new_make_word_hmm(HTK_HMM_INFO *hmminfo, HMM_Logical **hdseq, int hdseqlen, boolean *has_sp)
{
! return(new_make_word_hmm_with_lm(hmminfo, hdseq, hdseqlen, has_sp, NULL));
}
/* free HMM */
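The mkwhmm.c changes above size the word HMM by adding the emitting states of the short-pause model after every phone flagged in `has_sp`. The sketch below illustrates only that arithmetic, under simplifying assumptions. Each model is reduced to its state count (including the non-emitting entry and exit states), `total_state_len` is a hypothetical name, and the NULL check on `has_sp` is an assumption of this sketch rather than something the patch is shown to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for totalstatelen(): phone_states[i] is the number
   of states of phone i including its entry and exit states; sp_states is
   the same count for the short-pause model. */
static int total_state_len(const int *phone_states, size_t nphones,
                           const int *has_sp, int sp_states)
{
    int len = 0;
    size_t i;

    for (i = 0; i < nphones; i++) {
        assert(phone_states[i] >= 2);
        len += phone_states[i] - 2;      /* emitting states of the phone */
        /* assumption: a NULL has_sp means "no short pause anywhere" */
        if (has_sp != NULL && has_sp[i]) {
            assert(sp_states >= 2);
            len += sp_states - 2;        /* emitting states of the sp model */
        }
    }
    return len + 2;                      /* shared entry and exit states */
}
```

For three 5-state phones and a 3-state sp model appended after the last phone, this gives 3 * 3 + 1 + 2 = 12 states, versus 11 without the pause.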
diff -crN julius-3.3p2-multipath/libsent/src/voca/voca_load_htkdict.c julius-3.3p3-multipath/libsent/src/voca/voca_load_htkdict.c
*** julius-3.3p2-multipath/libsent/src/voca/voca_load_htkdict.c Thu Oct 17 15:25:29 2002
--- julius-3.3p3-multipath/libsent/src/voca/voca_load_htkdict.c Tue Jan 7 01:10:59 2003
***************
*** 4,10 ****
/* voca_load_htkdict.c --- read in vocabulary data */
! /* $Id: voca_load_htkdict.c,v 1.7 2002/10/15 07:17:46 ri Exp $ */
/* format is HTK Dictionary format */
--- 4,10 ----
/* voca_load_htkdict.c --- read in vocabulary data */
! /* $Id: voca_load_htkdict.c,v 1.8 2003/01/06 16:10:39 ri Exp $ */
/* format is HTK Dictionary format */
***************
*** 126,131 ****
--- 126,132 ----
#define PHONEMELEN_STEP 10 /* malloc base */
+ static char buf[MAXLINELEN]; /* read buffer */
/* read in vocabulary file */
boolean /* TRUE on success, FALSE on any error word */
***************
*** 137,143 ****
{
boolean ok_flag = TRUE;
WORD_ID vnum;
- static char buf[MAXLINELEN];
boolean do_conv = FALSE;
if (hmminfo != NULL && hmminfo->is_triphone && (! ignore_tri_conv))
--- 138,143 ----
***************
*** 147,155 ****
vnum = 0;
while (getl(buf, sizeof(buf), fp) != NULL) {
if (voca_load_htkdict_line(buf, vnum, winfo, hmminfo, ignore_tri_conv, do_conv, &ok_flag) == FALSE) break;
vnum++;
- if (vnum >= winfo->maxnum) winfo_expand(winfo);
}
winfo->num = vnum;
--- 147,155 ----
vnum = 0;
while (getl(buf, sizeof(buf), fp) != NULL) {
+ if (vnum >= winfo->maxnum) winfo_expand(winfo);
if (voca_load_htkdict_line(buf, vnum, winfo, hmminfo, ignore_tri_conv, do_conv, &ok_flag) == FALSE) break;
vnum++;
}
winfo->num = vnum;
***************
*** 171,177 ****
{
boolean ok_flag = TRUE;
WORD_ID vnum;
- static char buf[MAXLINELEN];
boolean do_conv = FALSE;
if (hmminfo != NULL && hmminfo->is_triphone && (! ignore_tri_conv))
--- 171,176 ----
***************
*** 181,189 ****
vnum = 0;
while(getl_fd(buf, MAXLINELEN, fd) != NULL) {
if (voca_load_htkdict_line(buf, vnum, winfo, hmminfo, ignore_tri_conv, do_conv, &ok_flag) == FALSE) break;
vnum++;
- if (vnum >= winfo->maxnum) winfo_expand(winfo);
}
winfo->num = vnum;
--- 180,188 ----
vnum = 0;
while(getl_fd(buf, MAXLINELEN, fd) != NULL) {
+ if (vnum >= winfo->maxnum) winfo_expand(winfo);
if (voca_load_htkdict_line(buf, vnum, winfo, hmminfo, ignore_tri_conv, do_conv, &ok_flag) == FALSE) break;
vnum++;
}
winfo->num = vnum;
***************
*** 194,199 ****
--- 193,227 ----
return(ok_flag);
}
+ /* append a single entry to the existing dictionary */
+ boolean /* TRUE on success, FALSE on any error word */
+ voca_append_htkdict(
+ char *entry, /* dictionary entry string to be appended */
+ WORD_INFO *winfo,
+ HTK_HMM_INFO *hmminfo, /* if NULL, phonemes are ignored */
+ boolean ignore_tri_conv) /* TRUE if convert to triphone should be ignored */
+ {
+ boolean ok_flag = TRUE;
+ boolean do_conv = FALSE;
+
+ if (hmminfo != NULL && hmminfo->is_triphone && (! ignore_tri_conv))
+ do_conv = TRUE;
+
+ if (winfo->num >= winfo->maxnum) winfo_expand(winfo);
+ strcpy(buf, entry); /* const buffer not allowed in voca_load_htkdict_line() */
+ voca_load_htkdict_line(buf, winfo->num, winfo, hmminfo, ignore_tri_conv, do_conv, &ok_flag);
+
+ if (ok_flag == TRUE) {
+ winfo->num++;
+ /* re-compute maxwn */
+ set_maxwn(winfo);
+ set_maxwlen(winfo);
+ }
+
+ return(ok_flag);
+ }
+
+
boolean
voca_load_htkdict_line(char *buf, int vnum,
WORD_INFO *winfo,
***************
*** 299,305 ****
if (tmplg->is_pseudo) {
j_printerr(" use pseudo monophone \"%s\"\n", cbuf);
} else {
! j_printerr(" use monophone \"%s\"\n", cbuf);
}
}
}
--- 327,333 ----
if (tmplg->is_pseudo) {
j_printerr(" use pseudo monophone \"%s\"\n", cbuf);
} else {
! j_printerr(" use defined monophone \"%s\"\n", cbuf);
}
}
}
***************
*** 380,387 ****
{
WORD_ID n, w;
int i;
- static char buf[MAXLINELEN];
- boolean do_conv = FALSE;
n = woffset;
for(w=0;w<srcinfo->num;w++) {
--- 408,413 ----