gentlesnow

03 Jul 2019

【kaldi】 001 Installing and deploying Kaldi

Installation

Before installing, your Linux system needs some preparation. The required packages are:

  1. subversion
  2. automake
  3. autoconf
  4. libtool
  5. g++
  6. zlib
  7. libatlas
  8. gawk
  9. wget

Install them with apt-get as follows:

    sudo apt-get install autoconf automake gcc g++ libtool subversion gawk wget
    sudo apt-get install libatlas-dev libatlas-base-dev gfortran zlib1g-dev

Cloning the code

git clone https://github.com/kaldi-asr/kaldi.git
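If you only need the latest code, a shallow clone saves time and bandwidth; this is a plain git option, not Kaldi-specific:

# fetch only the most recent commit instead of the full history
git clone --depth 1 https://github.com/kaldi-asr/kaldi.git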

Repository contents

ls
COPYING egs INSTALL misc README.md src tools windows

The README file

cat README.md

[![Build Status](https://travis-ci.com/kaldi-asr/kaldi.svg?branch=master)](https://travis-ci.com/kaldi-asr/kaldi)
Kaldi Speech Recognition Toolkit
================================

To build the toolkit: see `./INSTALL`.  These instructions are valid for UNIX
systems including various flavors of Linux; Darwin; and Cygwin (has not been
tested on more "exotic" varieties of UNIX).  For Windows installation
instructions (excluding Cygwin), see `windows/INSTALL`.

To run the example system builds, see `egs/README.txt`

If you encounter problems (and you probably will), please do not hesitate to
contact the developers (see below). In addition to specific questions, please
let us know if there are specific aspects of the project that you feel could be
improved, that you find confusing, etc., and which missing features you most
wish it had.

Kaldi information channels
--------------------------

For HOT news about Kaldi see [the project site](http://kaldi-asr.org/).

[Documentation of Kaldi](http://kaldi-asr.org/doc/):
- Info about the project, description of techniques, tutorial for C++ coding.
- Doxygen reference of the C++ code.

[Kaldi forums and mailing lists](http://kaldi-asr.org/forums.html):

We have two different lists
- User list kaldi-help
- Developer list kaldi-developers:

To sign up to any of those mailing lists, go to
[http://kaldi-asr.org/forums.html](http://kaldi-asr.org/forums.html):


Development pattern for contributors
------------------------------------

1. [Create a personal fork](https://help.github.com/articles/fork-a-repo/)
   of the [main Kaldi repository](https://github.com/kaldi-asr/kaldi) in GitHub.
2. Make your changes in a named branch different from `master`, e.g. you create
   a branch `my-awesome-feature`.
3. [Generate a pull request](https://help.github.com/articles/creating-a-pull-request/)
   through the Web interface of GitHub.
4. As a general rule, please follow [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html).
   There are a [few exceptions in Kaldi](http://kaldi-asr.org/doc/style.html).
   You can use the [Google's cpplint.py](https://raw.githubusercontent.com/google/styleguide/gh-pages/cpplint/cpplint.py)
   to verify that your code is free of basic mistakes.

Platform specific notes
-----------------------

### PowerPC 64bits little-endian (ppc64le)

- Kaldi is expected to work out of the box in RHEL >= 7 and Ubuntu >= 16.04 with
  OpenBLAS, ATLAS, or CUDA.
- CUDA drivers for ppc64le can be found at [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
- An [IBM Redbook](https://www.redbooks.ibm.com/abstracts/redp5169.html) is
  available as a guide to install and configure CUDA.

### Android

- Kaldi supports cross compiling for Android using Android NDK, clang++ and
  OpenBLAS.
- See [this blog post](http://jcsilva.github.io/2017/03/18/compile-kaldi-android/)
  for details.

The INSTALL file

cat INSTALL

This is the official Kaldi INSTALL. Look also at INSTALL.md for the git mirror installation.
[for native Windows install, see windows/INSTALL]

(1)
go to tools/  and follow INSTALL instructions there.

(2)
go to src/ and follow INSTALL instructions there.

Step 1: go into tools/

cd tools/
cat INSTALL

To check the prerequisites for Kaldi, first run

  extras/check_dependencies.sh

and see if there are any system-level installations you need to do. Check the
output carefully. There are some things that will make your life a lot easier
if you fix them at this stage. If your system default C++ compiler is not
supported, you can do the check with another compiler by setting the CXX
environment variable, e.g.

  CXX=g++-4.8 extras/check_dependencies.sh

Then run

  make

which by default will install ATLAS headers, OpenFst, SCTK and sph2pipe.
OpenFst requires a relatively recent C++ compiler with C++11 support, e.g.
g++ >= 4.7, Apple clang >= 5.0 or LLVM clang >= 3.3. If your system default
compiler does not have adequate support for C++11, you can specify a C++11
compliant compiler as a command argument, e.g.

  make CXX=g++-4.8

If you have multiple CPUs and want to speed things up, you can do a parallel
build by supplying the "-j" option to make, e.g. to use 4 CPUs

  make -j 4

In extras/, there are also various scripts to install extra bits and pieces that
are used by individual example scripts.  If an example script needs you to run
one of those scripts, it will tell you what to do.

Check the dependencies

extras/check_dependencies.sh

extras/check_dependencies.sh: all OK.

Check how many CPU cores the machine has

nproc

4

Set the job count accordingly to speed up compilation

make -j 4
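The two steps can be combined so the job count always matches the machine; this is plain shell substitution, nothing Kaldi-specific:

# use one make job per CPU core
make -j "$(nproc)"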

Install IRSTLM

extras/install_irstlm.sh
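The installer records IRSTLM's location in tools/env.sh and asks you to source it; if a recipe later complains that IRSTLM is missing, this (path assumed from the installer's closing message) usually fixes it:

# make the IRSTLM binaries visible in the current shell (run from tools/)
source env.sh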

Step 2: go into src/

cd ../src/
cat INSTALL


These instructions are valid for UNIX-like systems (these steps have
been run on various Linux distributions; Darwin; Cygwin).  For native Windows
compilation, see ../windows/INSTALL.

You must first have completed the installation steps in ../tools/INSTALL
(compiling OpenFst; getting ATLAS and CLAPACK headers).

The installation instructions are

  ./configure --shared
  make depend -j 8
  make -j 8

Note that we added the "-j 8" to run in parallel because "make" takes a long
time.  8 jobs might be too many for a laptop or small desktop machine with not
many cores.

For more information, see documentation at http://kaldi-asr.org/doc/
and click on "The build process (how Kaldi is compiled)".

./configure --shared
make depend -j 8
make -j 8
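As an optional sanity check, the source tree also has a test target; assuming it is still wired up in src/Makefile, running it compiles and executes the unit tests (this can take a while):

# optional: build and run the unit tests under src/
make test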

Example (to verify the installation)

yesno

cd ..
cd egs
cd yesno
cd s5
./run.sh
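When run.sh finishes it prints a WER line for the tiny yes/no task; the same summary can be regenerated afterwards with a loop like this (directory layout assumed from the recipe):

# print the best WER of every decode directory under exp/
for x in exp/*/decode*; do
  [ -d $x ] && grep WER $x/wer_* | utils/best_wer.sh
done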

Running thchs30

Public Mandarin speech recognition datasets for Kaldi:

  1. aishell: 178 hours of open-source Mandarin speech from AISHELL, with basic training scripts; see kaldi-master/egs/aishell
  2. gale_mandarin: Mandarin broadcast news dataset (LDC2013S08, LDC2013T20)
  3. hkust: Mandarin telephone speech dataset (LDC2005S15, LDC2005T32)
  4. thchs30: a 30-hour dataset from Tsinghua University, downloadable from http://www.openslr.org/18/

Three files:

  1. data_thchs30.tgz [6.4G] ( speech data and transcripts )
  2. test-noise.tgz [1.9G] ( standard 0db noisy test data )
  3. resource.tgz [24M] ( supplementary resources, incl. lexicon for training data, noise samples )

After downloading, create a folder thchs30-openslr under egs/thchs30/s5 and extract the three archives into it.
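For example, assuming the archive URLs follow the usual openslr.org layout (verify them on http://www.openslr.org/18/ first), run from egs/thchs30/s5:

# hypothetical fetch-and-unpack; check the URLs before running
mkdir -p thchs30-openslr && cd thchs30-openslr
for f in data_thchs30 test-noise resource; do
  wget http://www.openslr.org/resources/18/$f.tgz
  tar xzf $f.tgz
done
cd ..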

First, edit the cmd.sh script under s5: comment out the original queue.pl lines and switch to local execution with run.pl:

#export train_cmd=queue.pl
#export decode_cmd="queue.pl --mem 4G"
#export mkgraph_cmd="queue.pl --mem 8G"
#export cuda_cmd="queue.pl --gpu 1"
export train_cmd=run.pl
export decode_cmd="run.pl --mem 4G"
export mkgraph_cmd="run.pl --mem 8G"
export cuda_cmd="run.pl --gpu 1"

Then edit the run.sh script under s5. Two changes are needed.

The first is the number of parallel jobs, which you can set according to your CPU count:

#n=4      #parallel jobs
n=2      #parallel jobs

The second is the dataset location; for example, I changed it to:

#thchs=/nfs/public/materials/data/thchs30-openslr
thchs=/root/kaldi-trunk/egs/thchs30/s5/thchs30-openslr

Run ./run.sh under s5 and training will start.
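Training takes several hours, so it is convenient to run it in the background and keep a log; this is plain shell, not part of the recipe:

# run in the background; follow progress with: tail -f run.log
nohup ./run.sh > run.log 2>&1 &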

Running the ASpIRE Chain Model

Download the model archive and extract it into kaldi/egs/aspire/s5/
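The pretrained chain model is distributed as a tarball on the kaldi-asr.org models page; the exact URL below is an assumption, so confirm it at http://kaldi-asr.org/models.html before downloading:

# URL assumed; verify it on the models page first
cd kaldi/egs/aspire/s5
wget http://kaldi-asr.org/models/0001_aspire_chain_model.tar.gz
tar xzf 0001_aspire_chain_model.tar.gz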

cat README.md

This README documents this tar archive that contains some chain models.

you should un-tar this inside the egs/aspire/s5 directory of kaldi;
it relates to things that are produced by those scripts.
This was created around when kaldi's master branch was at git log
503e5ded49281a57c4b60bb53e4787100e35e0ce
which was on Sep 30 2016.

We also ran the following (and you should, too):


### RUN the following command ###
steps/online/nnet3/prepare_online_decoding.sh \
  --mfcc-config conf/mfcc_hires.conf data/lang_chain exp/nnet3/extractor exp/chain/tdnn_7b exp/tdnn_7b_chain_online

### RUN the following command ###
utils/mkgraph.sh --self-loop-scale 1.0 data/lang_pp_test exp/tdnn_7b_chain_online exp/tdnn_7b_chain_online/graph_pp

### run the following command if you need the 4-gram language models (e.g. you
### plan to do lattice rescoring
utils/build_const_arpa_lm.sh \
   data/local/lm/4gram-mincount/lm_unpruned.gz data/lang_pp_test data/lang_pp_test_fg



to create the directory exp/tdnn_7b_chain_online, and then we ran
the following (but don't bother if you don't have the data):

steps/online/nnet3/decode.sh --cmd "$decode_cmd" --nj 25 \
     --acwt 1.0 --post-decode-acwt 10.0 \
      exp/tdnn_7b_chain_online/graph_pp data/dev exp/tdnn_7b_chain_online/decode_dev


Now, you won't be able to do the above if you don't have the ASPIRE dev data, but you
should be able to run similar things on your own data.
Also, look in the files in exp/tdnn_7b_chain_online/decode_dev/log
(which are included in this archive) to see what the individual command lines
look like:

head exp/tdnn_7b_chain_online/decode_dev/log/decode.1.log
# Running on g01
# Started at Fri Sep 30 21:15:06 EDT 2016
# online2-wav-nnet3-latgen-faster --online=true --do-endpointing=false --frame-subsampling-factor=3 --config=exp/tdnn_7b_online/conf/online.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=exp/chain/tdnn_7b/graph_pp/words.txt exp/tdnn_7b_online/final.mdl exp/chain/tdnn_7b/graph_pp/HCLG.fst ark:data/dev/split25/1/spk2utt "ark,s,cs:extract-segments scp,p:data/dev/split25/1/wav.scp data/dev/split25/1/segments ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/tdnn_7b_online/decode_dev/lat.1.gz"
online2-wav-nnet3-latgen-faster --online=true --do-endpointing=false --frame-subsampling-factor=3 --config=exp/tdnn_7b_online/conf/online.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=exp/chain/tdnn_7b/graph_pp/words.txt exp/tdnn_7b_online/final.mdl exp/chain/tdnn_7b/graph_pp/HCLG.fst ark:data/dev/split25/1/spk2utt 'ark,s,cs:extract-segments scp,p:data/dev/split25/1/wav.scp data/dev/split25/1/segments ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/tdnn_7b_online/decode_dev/lat.1.gz'
LOG (online2-wav-nnet3-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (online2-wav-nnet3-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:204) Done.
extract-segments scp,p:data/dev/split25/1/wav.scp data/dev/split25/1/segments ark:-
lattice-scale --acoustic-scale=10.0 ark:- ark:-
fe_03_00001-A-000376-000554 and generally prefer
LOG (online2-wav-nnet3-latgen-faster:main():online2-wav-nnet3-latgen-faster.cc:272) Decoded utterance fe_03_00001-A-000376-000554
...


#note on WER:
grep WER exp/tdnn_7b_chain_online/decode_dev/wer* | utils/best_wer.sh
%WER 15.60 [ 6045 / 38753, 908 ins, 1751 del, 3386 sub ] exp/tdnn_7b_chain_online/decode_dev/wer_10

# .. note, data/dev isn't the actual dev data, it's a small amount of data held
# out from training data.
