Whisper in depth: speaker diarization

Contents

Learning goal: how to use Whisper
Part 1: Speech-to-text with Whisper
    1.1 Download and load a model with whisper.load_model()
    1.2 Transcribe a file with the model instance
    1.3 Hands-on
Part 2: Speaker diarization (pyannote.audio)
    Step 1: Install the dependency
    Step 2: Create an access token
    Step 3: Test pyannote.audio
Part 3: Putting it together

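Learning goal: how to use Whisper

Part 1: Speech-to-text with Whisper

1.1 Download and load a model with whisper.load_model()

model = whisper.load_model(args)

Parameters:

- name: the model to load, either one of the official names listed by whisper.available_models() or a path to a checkpoint file
- device: the PyTorch device to put the model on; if omitted, CUDA is used when available, otherwise the CPU
- download_root: the directory the model files are downloaded to; defaults to ~/.cache/whisper
- in_memory: whether to preload the model weights into host memory

Return value:

- model (Whisper): the Whisper speech recognition model instance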

def load_model(
    name: str,
    device: Optional[Union[str, torch.device]] = None,
    download_root: str = None,
    in_memory: bool = False,
) -> Whisper:
    """
    Load a Whisper ASR model

    Parameters
    ----------
    name : str
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
    device : Union[str, torch.device]
        the PyTorch device to put the model into
    download_root: str
        path to download the model files; by default, it uses "~/.cache/whisper"
    in_memory: bool
        whether to preload the model weights into host memory

    Returns
    -------
    model : Whisper
        The Whisper ASR model instance
    """

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if download_root is None:
        default = os.path.join(os.path.expanduser("~"), ".cache")
        download_root = os.path.join(os.getenv("XDG_CACHE_HOME", default), "whisper")

    if name in _MODELS:
        checkpoint_file = _download(_MODELS[name], download_root, in_memory)
        alignment_heads = _ALIGNMENT_HEADS[name]
    elif os.path.isfile(name):
        checkpoint_file = open(name, "rb").read() if in_memory else name
        alignment_heads = None
    else:
        raise RuntimeError(
            f"Model {name} not found; available models = {available_models()}"
        )

    with (
        io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")
    ) as fp:
        checkpoint = torch.load(fp, map_location=device)
    del checkpoint_file

    dims = ModelDimensions(**checkpoint["dims"])
    model = Whisper(dims)
    model.load_state_dict(checkpoint["model_state_dict"])

    if alignment_heads is not None:
        model.set_alignment_heads(alignment_heads)

    return model.to(device)
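To see where a model will land on disk, the download_root fallback in load_model can be reproduced on its own. The sketch below mirrors that branch using only the standard library. The helper name resolve_download_root and its env parameter are hypothetical, introduced for illustration; they are not part of Whisper.

```python
import os


def resolve_download_root(download_root=None, env=None):
    """Mirror the cache-directory fallback used by load_model (illustrative helper)."""
    env = os.environ if env is None else env
    if download_root is not None:
        # An explicit argument always wins.
        return download_root
    # Otherwise use $XDG_CACHE_HOME/whisper, falling back to ~/.cache/whisper.
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(env.get("XDG_CACHE_HOME", default), "whisper")


if __name__ == "__main__":
    print(resolve_download_root())
    print(resolve_download_root(env={"XDG_CACHE_HOME": "/data/cache"}))
    print(resolve_download_root("/root/autodl-tmp/models"))
```

On a machine with a small home partition, passing download_root explicitly (or setting XDG_CACHE_HOME) keeps the large checkpoints off the system disk.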

1.2 Transcribe a file with the model instance

result = model.transcribe(file_path)

def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"'“¿([{-",
    append_punctuations: str = "\"'.。,,!!??::”)]}、",
    **decode_options,
):
    """
    Transcribe audio to text.

    Parameters:
    - model: the Whisper model instance
    - audio: path to an audio file, a NumPy array, or a PyTorch tensor
    - verbose: whether to print detailed output; default None
    - temperature: sampling temperature, or a tuple of temperatures tried in order
      as fallbacks; default (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
    - compression_ratio_threshold: compression ratio threshold; default 2.4
    - logprob_threshold: average log-probability threshold; default -1.0
    - no_speech_threshold: no-speech probability threshold; default 0.6
    - condition_on_previous_text: whether to condition decoding on the previously
      decoded text; default True
    - initial_prompt: optional text prompt for the first window; default None
    - word_timestamps: whether to return word-level timestamps; default False
    - prepend_punctuations: punctuation merged with the following word;
      default "\"'“¿([{-"
    - append_punctuations: punctuation merged with the preceding word;
      default "\"'.。,,!!??::”)]}、"
    - **decode_options: additional decoding options

    Returns:
    - a dict containing the transcribed text ("text"), the list of segments
      ("segments"), and the language ("language")
    """

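The compression_ratio_threshold parameter is easier to reason about with numbers. Whisper measures how well the decoded text compresses with zlib: highly repetitive output, a common failure mode, compresses very well and so exceeds the threshold, which makes the decoder retry at the next temperature. The following sketch reproduces that ratio with the standard library only. The sample strings are made up for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


THRESHOLD = 2.4  # default compression_ratio_threshold in transcribe()

# Illustrative samples: an ordinary sentence and a looping hallucination.
normal = "The meeting starts at nine and the agenda covers three topics."
looping = "thank you " * 40

for label, text in (("normal", normal), ("looping", looping)):
    ratio = compression_ratio(text)
    print(f"{label}: ratio={ratio:.2f} exceeds_threshold={ratio > THRESHOLD}")
```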
1.3 Hands-on

Recommended settings:

- load_model: pass download_root to choose the directory the model is downloaded to (default ~/.cache/whisper)
- transcribe: pass word_timestamps=True

import arrow
import whisper


# Arguments: model name, audio file path, and the time the recording started.
def execute(model_name, file_path, start_time):
    model = whisper.load_model(model_name)
    result = model.transcribe(file_path, word_timestamps=True)
    # Parse the start time once; each segment offset is added to it.
    recording_start = arrow.get(start_time)
    for segment in result["segments"]:
        start = recording_start.shift(seconds=segment["start"]).format("YYYY-MM-DD HH:mm:ss")
        end = recording_start.shift(seconds=segment["end"]).format("YYYY-MM-DD HH:mm:ss")
        print("【" + start + "->" + end + "】:" + segment["text"])


if __name__ == '__main__':
    execute("large", "/root/autodl-tmp/no/test.mp3", "2022-10-24 16:23:00")
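If arrow is not installed, the same offset arithmetic can be done with the standard library. The sketch below assumes the start time uses the "2022-10-24 16:23:00" format from the example and runs on a hand-written list standing in for result["segments"]. The helper name and the sample data are illustrative.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_wall_clock(start_time: str, offset_seconds: float) -> str:
    """Add a segment offset (seconds from the start of the audio) to the recording start time."""
    moment = datetime.strptime(start_time, TIME_FORMAT) + timedelta(seconds=offset_seconds)
    return moment.strftime(TIME_FORMAT)


# Hand-written stand-in for result["segments"]; real segments carry more keys.
segments = [
    {"start": 0.0, "end": 4.5, "text": "hello"},
    {"start": 65.0, "end": 70.0, "text": "world"},
]

for segment in segments:
    start = to_wall_clock("2022-10-24 16:23:00", segment["start"])
    end = to_wall_clock("2022-10-24 16:23:00", segment["end"])
    print(f"[{start} -> {end}]: {segment['text']}")
```

Fractional seconds are dropped by the format string, so the printed times are truncated to whole seconds.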

Part 2: Speaker diarization (pyannote.audio)

pyannote.audio is an open-source speaker diarization toolkit; its pretrained pipelines are hosted on Hugging Face.

Step 1: Install the dependency

pip install pyannote.audio

Step 2: Create an access token

https://huggingface.co/settings/tokens

Step 3: Test pyannote.audio

- Create the pipeline: Pipeline.from_pretrained(args)
- Run on the GPU: import torch, then call pipeline.to(torch.device("cuda"))
- Process an audio file: pipeline("test.wav")

from_pretrained(args)

- cache_dir: Path or str, optional. Path to the model cache directory. Defaults to …/pyannote when unset.

pipeline(args)

- file_path: the audio file
- num_speakers: the number of speakers; optional

import torch
from pyannote.audio import Pipeline

# Replace the placeholder with the token created at https://huggingface.co/settings/tokens
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token="YOUR_HF_TOKEN")

# Send the pipeline to the GPU (when available).
device = 'cuda' if torch.cuda.is_available() else 'cpu'
pipeline.to(torch.device(device))

# Apply the pretrained pipeline.
diarization = pipeline("test.wav")
print(diarization)

# Print the result, one line per speaker turn.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
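A quick sanity check on a diarization result is to total the speaking time per speaker. The sketch below works on plain (start, end, speaker) tuples. The sample turns are copied from the example output comments, and the helper name is illustrative.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration of every turn, grouped by speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Sample turns taken from the example output.
turns = [
    (0.2, 1.5, "speaker_0"),
    (1.8, 3.9, "speaker_1"),
    (4.2, 5.7, "speaker_0"),
]

for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"{speaker}: {seconds:.1f}s")
```

With a real result, the list can be built as [(turn.start, turn.end, speaker) for turn, _, speaker in diarization.itertracks(yield_label=True)].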

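Part 3: Putting it together

An open-source project, pyannote-whisper, is used here to merge the results of the two steps above.

If you get the error No module named 'pyannote_whisper', install the project from source with the commands below. The first command enables the academic proxy on the AutoDL platform to speed up the download; skip it on other platforms.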

source /etc/network_turbo

git clone https://github.com/yinruiqing/pyannote-whisper.git

cd pyannote-whisper

pip install -r requirements.txt

If the installation fails with an error that mentions sndfile, the sndfile library is probably missing or incorrectly installed. sndfile is a library for handling audio files that supports reading and writing many formats.

To install it:

- Ubuntu: sudo apt-get install libsndfile1-dev
- CentOS: sudo yum install libsndfile-devel
- macOS (Homebrew): brew install libsndfile

Then run the commands above again.
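Before reinstalling, it can help to check whether the dynamic loader can find libsndfile at all. The following sketch uses ctypes.util.find_library from the standard library. A None result suggests the library is missing, although find_library does not search every possible location, so treat it as a hint only. The function name is illustrative.

```python
from ctypes.util import find_library


def sndfile_status():
    """Return a short message saying whether libsndfile was located."""
    location = find_library("sndfile")
    if location is None:
        return "libsndfile not found; install it with your package manager"
    return f"libsndfile found: {location}"


print(sndfile_status())
```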

After that, write your code inside the cloned project, or copy the pyannote_whisper.utils module from the repository into your own project.

import concurrent.futures  # only needed for the optional thread-pool variant below
import os
import subprocess

import torch
import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text

output_dir = '/root/autodl-tmp/no/out'
wave_dir = '/root/autodl-tmp/no'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

print("Loading the diarization model")
# Replace the placeholder with your own Hugging Face token; never publish a real token.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token="YOUR_HF_TOKEN")
pipeline.to(torch.device(device))

print("Loading the Whisper model")
model = whisper.load_model("large", device=device)


# Convert an audio file (for example MP3) to WAV with ffmpeg.
def convert_to_wav(path):
    if path.lower().endswith('.wav'):
        return path, None  # already WAV, nothing to convert
    new_path = os.path.splitext(path)[0] + '.wav'
    try:
        # Options must come before the output path; -y overwrites an existing file.
        # The original trailing -an (disable audio) is dropped because the audio is needed.
        subprocess.run(['ffmpeg', '-y', '-i', path, new_path], check=True)
    except (OSError, subprocess.CalledProcessError):
        return path, 'Error: Could not convert file to .wav'
    return new_path, None


def process_audio(file_path):
    file_path, retmsg = convert_to_wav(file_path)
    if retmsg is not None:
        print(retmsg)
        return
    print(f"===={file_path}=======")
    # The prompt is Chinese for "speech conversion".
    asr_result = model.transcribe(file_path, initial_prompt="语音转换")
    diarization_result = pipeline(file_path, num_speakers=2)
    final_result = diarize_text(asr_result, diarization_result)
    output_file = os.path.join(output_dir, os.path.splitext(os.path.basename(file_path))[0] + '.txt')
    with open(output_file, 'w', encoding='utf-8') as f:
        for seg, spk, sent in final_result:
            line = f'{seg.start:.2f} {seg.end:.2f} {spk} {sent}\n'
            f.write(line)


os.makedirs(output_dir, exist_ok=True)

# Collect every MP3 file in wave_dir.
audio_files = [os.path.join(wave_dir, name) for name in os.listdir(wave_dir) if name.endswith('.mp3')]

# Process the files one at a time (the thread-pool variant is left commented out).
# with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
#     executor.map(process_audio, audio_files)
for audio_file in audio_files:
    process_audio(audio_file)

print('Done!')
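The merging step is easier to follow with a toy version. The sketch below assigns each transcribed segment to the speaker whose turn overlaps it the most. It is a simplified illustration of the idea, not the actual diarize_text implementation from pyannote-whisper, and the function names and sample data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(asr_segments, speaker_turns):
    """Label each (start, end, text) segment with the speaker of the most-overlapping turn."""
    labelled = []
    for seg_start, seg_end, text in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # best_speaker stays None when no turn overlaps the segment.
        labelled.append((seg_start, seg_end, best_speaker, text))
    return labelled


# Hypothetical sample data standing in for the Whisper and pyannote results.
asr_segments = [(0.0, 1.6, "Hello."), (1.7, 4.0, "Hi, how are you?"), (4.1, 5.5, "Fine, thanks.")]
speaker_turns = [(0.2, 1.5, "SPEAKER_00"), (1.8, 3.9, "SPEAKER_01"), (4.2, 5.7, "SPEAKER_00")]

for start, end, speaker, text in assign_speakers(asr_segments, speaker_turns):
    print(f"{start:.2f} {end:.2f} {speaker} {text}")
```

Picking the turn with the largest overlap keeps a segment attached to one speaker even when the two models disagree slightly about where a turn begins or ends.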
