Open-source Whisper speech-to-text

2023-12-19
#Python #Unix

1. Introduction

Tools 🔧 for converting video or audio to text include:

Open source, free, and usable offline: whisper, an automatic speech recognition system from OpenAI [1, 2]

2. Installation and testing

Install the latest whisper on macOS [1]:

# install the latest whisper from GitHub
pip install git+https://github.com/openai/whisper.git

# install the CPU build of torch (nightly)
pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu

# install ffmpeg on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# install setuptools-rust (Rust build support, in case a dependency has to be compiled from source)
pip install setuptools-rust
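
Before running the first transcription, it can help to confirm that the pieces installed above are actually visible from Python. The following is a minimal sketch using only the standard library; the helper name check_whisper_setup is hypothetical and not part of whisper.

```python
import importlib.util
import shutil


def check_whisper_setup():
    """Report whether the pieces installed above are visible from Python.

    Hypothetical helper for illustration; it is not part of whisper.
    """
    return {
        # whisper calls the ffmpeg binary to decode audio, so it must be on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when a top-level package cannot be imported
        "torch": importlib.util.find_spec("torch") is not None,
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


for name, ok in check_whisper_setup().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

Any entry reported as missing points back to the corresponding install command above.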

Command-line test:

$ whisper 20231218.wav --language Chinese --model large
100%|█████████████████████████████████████| 2.88G/2.88G [06:12<00:00, 8.29MiB/s]
/Users/name/anaconda3/lib/python3.9/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
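
The same transcription can also be run from Python, where fp16=False can be passed explicitly so that the FP16 warning should not appear on CPU. This is a minimal sketch, assuming whisper is installed as above; the function name transcribe_file and its defaults are illustrative, while load_model and transcribe are the entry points described in the whisper README.

```python
def transcribe_file(path, model_name="large", language="zh", device="cpu"):
    """Transcribe one audio file and return the recognized text.

    Illustrative wrapper; the name and defaults are assumptions.
    """
    # Imported lazily so that defining this function does not require whisper
    import whisper

    # Downloads the model on first use (to ~/.cache/whisper by default)
    model = whisper.load_model(model_name, device=device)
    # Request FP16 only on CUDA; on CPU whisper falls back to FP32 anyway
    use_fp16 = device == "cuda"
    result = model.transcribe(path, language=language, fp16=use_fp16)
    return result["text"]
```

Calling transcribe_file("20231218.wav") would correspond to the command shown above. The large model is slow on CPU, so a smaller model such as small may be a more practical first test.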

Run whisper --help to see what each option means.

Some of the core options:

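  • --model selects the model to use; the default is --model small. Available models: tiny, base, small, medium, large, large-v2. English-only models append .en to the name (for example, small.en)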

  • --model_dir MODEL_DIR sets the directory where model files are saved; ~/.cache/whisper is used by default (default: None)

  • --device selects the device used for PyTorch inference (default: cpu): cuda for a GPU, cpu for the CPU, mps for Apple M-series chips

  • --output_dir OUTPUT_DIR, -o OUTPUT_DIR sets the directory where the outputs are saved (default: .)

  • --output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all} sets the format of the output file; if not specified, all available formats are produced (default: all)

  • --task {transcribe,translate} sets the task: --task transcribe (the default) performs X->X speech recognition, while --task translate performs X->English translation, i.e. it translates speech in other languages into English

  • --language sets the transcription language (the language spoken in the audio); accepted values are:

af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,
ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,
pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,
zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,
Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,
English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,
Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,
Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,
Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,
Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,
Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,
Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,
Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba
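
To tie the options above together, here is a small sketch that assembles a whisper command line from Python. It only uses flags listed above; the helper name build_whisper_command and its default values are assumptions chosen for illustration.

```python
import shlex


def build_whisper_command(audio, model="small", language=None,
                          task="transcribe", output_dir=".",
                          output_format="all", device=None):
    """Return the argv list for a whisper CLI call (hypothetical helper)."""
    cmd = ["whisper", audio, "--model", model, "--task", task,
           "--output_dir", output_dir, "--output_format", output_format]
    # Optional flags are appended only when a value was given
    if language is not None:
        cmd += ["--language", language]
    if device is not None:
        cmd += ["--device", device]
    return cmd


cmd = build_whisper_command("20231218.wav", model="large", language="Chinese",
                            output_format="srt")
# Print the command in a form that can be pasted into a shell
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start the transcription.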

3. Further reading

  1. openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  2. Introducing Whisper
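  3. Accurate transcription: an incomplete guide to converting audio and video to text with Whisper - 少数派
  4. Can't find ready-made subtitles? Whisper lets you follow Japanese dramas even if you don't know the language - 少数派
  5. A 5-minute Whisper review: after this, nobody will understand "speech recognition" better than you - 少数派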