Whisper Deep Dive: Speaker Diarization
Table of Contents
Learning objective: how to use Whisper
Part 1: Whisper speech-to-text
  1.1 Download and load a model with whisper.load_model()
  1.2 Transcribe a file with the model instance
  1.3 Hands-on
Part 2: Speaker diarization (pyannote.audio)
  Step 1: Install dependencies
  Step 2: Create an access token
  Step 3: Test pyannote.audio
Part 3: Putting it all together
Learning objective: how to use Whisper
Part 1: Whisper speech-to-text
1.1 Download and load a model with whisper.load_model()
model = whisper.load_model(name, device=None, download_root=None, in_memory=False)
Parameters:
name: the model to load — one of the official model names returned by whisper.available_models(), or a path to a checkpoint file
device: by default, uses the GPU if one is available, otherwise the CPU
download_root: download directory; defaults to ~/.cache/whisper
in_memory: whether to preload the model weights into host memory

Return value:
model : Whisper — the Whisper ASR model instance
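For example, a minimal call that picks the device automatically and pins the download cache to a custom directory (the model name and path below are placeholders; adjust them to your setup):

import whisper

# "base" is a placeholder; see whisper.available_models() for the full list
model = whisper.load_model(
    "base",
    device=None,                                      # None -> CUDA if available, otherwise CPU
    download_root="/root/autodl-tmp/whisper-models",  # defaults to ~/.cache/whisper when omitted
)

The underlying implementation in openai-whisper looks like this: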
def load_model(
    name: str,
    device: Optional[Union[str, torch.device]] = None,
    download_root: str = None,
    in_memory: bool = False,
) -> Whisper:
    """
    Load a Whisper ASR model

    Parameters
    ----------
    name : str
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
    device : Union[str, torch.device]
        the PyTorch device to put the model into
    download_root: str
        path to download the model files; by default, it uses "~/.cache/whisper"
    in_memory: bool
        whether to preload the model weights into host memory

    Returns
    -------
    model : Whisper
        The Whisper ASR model instance
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if download_root is None:
        default = os.path.join(os.path.expanduser("~"), ".cache")
        download_root = os.path.join(os.getenv("XDG_CACHE_HOME", default), "whisper")

    if name in _MODELS:
        checkpoint_file = _download(_MODELS[name], download_root, in_memory)
        alignment_heads = _ALIGNMENT_HEADS[name]
    elif os.path.isfile(name):
        checkpoint_file = open(name, "rb").read() if in_memory else name
        alignment_heads = None
    else:
        raise RuntimeError(
            f"Model {name} not found; available models = {available_models()}"
        )

    with (
        io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")
    ) as fp:
        checkpoint = torch.load(fp, map_location=device)
    del checkpoint_file

    dims = ModelDimensions(**checkpoint["dims"])
    model = Whisper(dims)
    model.load_state_dict(checkpoint["model_state_dict"])

    if alignment_heads is not None:
        model.set_alignment_heads(alignment_heads)

    return model.to(device)
1.2 Transcribe a file with the model instance
result = model.transcribe(file_path)
def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"'“¿([{-",
    append_punctuations: str = "\"'.。,,!!??::”)]}、",
    **decode_options,
):
    """
    Transcribe audio into text.

    Parameters:
    - model: the Whisper model
    - audio: path to an audio file, a NumPy array, or a PyTorch tensor
    - verbose: whether to print progress details; defaults to None
    - temperature: sampling temperature(s); defaults to (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
    - compression_ratio_threshold: compression ratio threshold; defaults to 2.4
    - logprob_threshold: average log-probability threshold; defaults to -1.0
    - no_speech_threshold: no-speech probability threshold; defaults to 0.6
    - condition_on_previous_text: whether to condition decoding on the previous text; defaults to True
    - initial_prompt: initial prompt; defaults to None
    - word_timestamps: whether to return word-level timestamps; defaults to False
    - prepend_punctuations: punctuation attached to the following word; defaults to "\"'“¿([{-"
    - append_punctuations: punctuation attached to the preceding word; defaults to "\"'.。,,!!??::”)]}、"
    - **decode_options: additional decoding options

    Returns:
    - the transcription result
    """
1.3 Hands-on
Recommendations: pass download_root to load_model (the download directory, which defaults to ~/.cache/whisper), and pass word_timestamps=True to transcribe.
import whisper
import arrow

# model name, audio file path, and the wall-clock time when the recording started
def execute(model_name, file_path, start_time):
    model = whisper.load_model(model_name)
    result = model.transcribe(file_path, word_timestamps=True)
    for segment in result["segments"]:
        now = arrow.get(start_time)
        start = now.shift(seconds=segment["start"]).format("YYYY-MM-DD HH:mm:ss")
        end = now.shift(seconds=segment["end"]).format("YYYY-MM-DD HH:mm:ss")
        print("【" + start + "->" + end + "】:" + segment["text"])

if __name__ == '__main__':
    execute("large", "/root/autodl-tmp/no/test.mp3", "2022-10-24 16:23:00")
Part 2: Speaker diarization (pyannote.audio)
pyannote.audio is an open-source speaker-diarization toolkit distributed through Hugging Face.
Step 1: Install dependencies
pip install pyannote.audio
Step 2: Create an access token
https://huggingface.co/settings/tokens
Step 3: Test pyannote.audio
Create a pipeline instance: Pipeline.from_pretrained(...)
Enable GPU acceleration: import torch, then pipeline.to(torch.device("cuda"))
Run the pipeline on an audio file: pipeline("test.wav")
from_pretrained() parameters:
cache_dir: Path or str, optional. Path to the model cache directory; when unset, it falls back to pyannote's default cache directory ("~/.cache/torch/pyannote").
pipeline() parameters:
file_path: the recording to diarize
num_speakers: the number of speakers; optional and may be omitted
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1",
                                    use_auth_token="your Hugging Face access token")

# send pipeline to GPU (when available)
import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
pipeline.to(torch.device(device))

# apply pretrained pipeline
diarization = pipeline("test.wav")
print(diarization)

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
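The returned diarization object is a pyannote Annotation, so it can also be saved in the standard RTTM format for later reuse (the file name below is a placeholder):

# persist the diarization result as RTTM, one speech turn per line
with open("test.rttm", "w") as rttm:
    diarization.write_rttm(rttm)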
Part 3: Putting it all together
Here we rely on an open-source project, pyannote-whisper, to merge the results produced by the two tools above.
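Conceptually, the merge assigns each Whisper segment to whichever diarization speaker overlaps it the most. The project's diarize_text helper does this for you; the sketch below only illustrates the idea and is not the project's actual code:

def assign_speakers(asr_segments, diarization):
    # naive merge: label each ASR segment with the speaker that overlaps it most
    merged = []
    for seg in asr_segments:  # each seg has "start", "end" and "text"
        overlaps = {}
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            overlap = min(seg["end"], turn.end) - max(seg["start"], turn.start)
            if overlap > 0:
                overlaps[speaker] = overlaps.get(speaker, 0.0) + overlap
        best = max(overlaps, key=overlaps.get) if overlaps else "unknown"
        merged.append((seg["start"], seg["end"], best, seg["text"]))
    return merged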
If you get the error No module named 'pyannote_whisper', install the project as shown below. If you are using the AutoDL platform, you can enable its academic proxy to speed up the download:
source /etc/network_turbo
git clone https://github.com/yinruiqing/pyannote-whisper.git
cd pyannote-whisper
pip install -r requirements.txt
An error mentioning sndfile is likely caused by a missing or incorrectly installed libsndfile, the native library used to read and write audio files in many formats.
You can try installing it as follows:
On Ubuntu: sudo apt-get install libsndfile1-dev
On CentOS: sudo yum install libsndfile-devel
On macOS (Homebrew): brew install libsndfile
Then rerun the commands above.
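To confirm that Python can now see the native library, a quick check through the soundfile package (installed as a pyannote.audio dependency) should print a libsndfile version instead of raising an import error:

import soundfile as sf
print(sf.__libsndfile_version__)  # e.g. "1.0.31" when libsndfile is correctly installed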
Writing your script inside the cloned project directory works, or you can copy the pyannote_whisper.utils module into your own project.
import os
import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
import concurrent.futures
import subprocess
import torch

print("Loading speaker-diarization pipeline")
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1",
                                    use_auth_token="your Hugging Face access token")
output_dir = '/root/autodl-tmp/no/out'

print("Loading whisper model")
model = whisper.load_model("large", device="cuda")

# convert MP3 (or any non-wav input) to wav
def convert_to_wav(path):
    if path[-3:] != 'wav':
        new_path = '.'.join(path.split('.')[:-1]) + '.wav'
        try:
            subprocess.call(['ffmpeg', '-i', path, new_path, '-y', '-an'])
        except Exception:
            return path, 'Error: Could not convert file to .wav'
    else:
        new_path = path
    return new_path, None

def process_audio(file_path):
    file_path, retmsg = convert_to_wav(file_path)
    print(f"===={file_path}=======")
    asr_result = model.transcribe(file_path, initial_prompt="語(yǔ)音轉(zhuǎn)換")
    pipeline.to(torch.device('cuda'))
    diarization_result = pipeline(file_path, num_speakers=2)
    final_result = diarize_text(asr_result, diarization_result)
    output_file = os.path.join(output_dir, os.path.basename(file_path)[:-4] + '.txt')
    with open(output_file, 'w') as f:
        for seg, spk, sent in final_result:
            line = f'{seg.start:.2f} {seg.end:.2f} {spk} {sent}\n'
            f.write(line)

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

wave_dir = '/root/autodl-tmp/no'
# collect all mp3 files in the recording directory
wav_files = [os.path.join(wave_dir, file) for file in os.listdir(wave_dir) if file.endswith('.mp3')]

# process each file sequentially (the thread-pool variant is left commented out)
# with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
#     executor.map(process_audio, wav_files)
for wav_file in wav_files:
    process_audio(wav_file)

print('Processing complete!')