开源Whisper语音转文字
1. 前言
实现视频或音频转文字的在线工具🔧有:
- 飞书妙记(抖音) https://www.feishu.cn/product/minutes
- 剪映 (抖音) https://www.capcut.cn ,用法:点击导航栏「文本」,选择「智能字幕」
开源免费,离线使用 —— whipser ,由 OpenAI 出品的自动语音识别系统 1, 2。
2. 安装和测试
macOS 安装最新的 whisper 1 :
# install newest whisper on github.com
pip install git+https://github.com/openai/whisper.git
# install torch cpu/
pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# install rust
pip install setuptools-rust
命令行测试:
$ whisper 20231218.wav --language Chinese --model large
100%|█████████████████████████████████████| 2.88G/2.88G [06:12<00:00, 8.29MiB/s]
/Users/name/anaconda3/lib/python3.9/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
可以用 whisper --help
查看具体参数的含义。
部分核心参数说明:
-
--model
指定使用模型,默认为--model small
,可选有:tiny
、base
、small
、medium
、large
、large-v2
,英文专用模型是在名称后加上.en
-
--model_dir MODEL_DIR
指定模型路径,the path to save model files; uses~/.cache/whisper
by default (default: None) -
--device
指定硬件加速 (device to use for PyTorch inference (default: cpu)), cuda 则为显卡,cpu 就是 CPU, mps 为苹果 M 芯片 -
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
指定输出路径,directory to save the outputs (default: .) -
--output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
指定输出格式,format of the output file; if not specified, all available formats will be produced (default: all) -
--task {transcribe,translate}
指定转录方式,默认为--task transcribe
转录模式,--task translate
为翻译模式【将其他语言翻译成英文】,whether to perform X->X speech recognition (’transcribe’) or X->English translation (’translate’) (default: transcribe) -
--language
指定转录语言,可选项有:
af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,
ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,
pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,
zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,
Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,
English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,
Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,
Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,
Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,
Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,
Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,
Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,
Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba