Python: openai-whisper Speech To Text (STT) complete tutorial; pip install openai-whisper; how to buy an OpenAI API key and how to generate a subtitle file

Free local model vs. paid cloud API: a hands-on test gives you the answer!

🏠 Method 1: Local Whisper (recommended)

Install the packages

# Local Whisper
pip install openai-whisper

# OpenAI API
pip install openai

# Optional: store api_key in a .env file and load it with dotenv
pip install python-dotenv

Full code

import whisper
import os
import time


tick = time.time()

# Add the ffmpeg folder to PATH
ffmpeg_path = r"D:\user\Python\speech\ffmpeg-7.0.2-essentials_build\bin"
os.environ["PATH"] += os.pathsep + ffmpeg_path
# Setting it in code is more portable; otherwise every new computer needs
# the environment variable configured in the Control Panel

# Runs locally, no API key needed
model = whisper.load_model("base")
# Available sizes: tiny, base, small, medium, large, large-v2, large-v3
# Raw string so the backslash in the Windows path is not read as an escape
result = model.transcribe(r".\Chinese_test.m4a", language="zh")
print(result["text"])
tock = time.time()
print("Elapsed time (sec):\t", tock - tick)

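Files in ffmpeg_path: the folder must contain the ffmpeg executable (ffmpeg.exe on Windows).

Output:

The full output repeats “中文測試” (“Chinese test”) several times.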
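
The PATH step in the script above appends the ffmpeg folder unconditionally. As a minimal sketch, assuming you would rather check first, the helper below uses the standard library's shutil.which to see whether ffmpeg is already reachable and extends PATH for the current process only when it is not. The name ensure_ffmpeg_on_path is hypothetical and not part of whisper.

```python
import os
import shutil


def ensure_ffmpeg_on_path(ffmpeg_dir):
    """Return the ffmpeg executable path, adding ffmpeg_dir to PATH only if needed."""
    found = shutil.which("ffmpeg")
    if found:
        return found  # already reachable, leave PATH untouched
    # Append the folder for this process only; system settings are not changed.
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + ffmpeg_dir
    # Returns None if ffmpeg_dir does not contain an ffmpeg executable.
    return shutil.which("ffmpeg")
```

Call it once before model.transcribe, passing your own ffmpeg bin folder. A return value of None means ffmpeg is still not reachable, so transcription would likely fail when Whisper tries to decode the audio.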

☁️ Method 2: OpenAI API (paid)

Prepare the API key

Create D:\user\Python\GPT\json\api_key.json with this content:

{"api_key": "sk-proj-HAXTIlKFn...the API key you purchased from OpenAI"}

JSON does not allow comments, so keep the file to this single object. Alternatively, store api_key in a .env file and read it with dotenv (pip install python-dotenv).

Add billing credit to pay for API usage:
https://platform.openai.com/settings/organization/billing/overview
View or create an API key:
https://platform.openai.com/settings/organization/api-keys
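
Since the key can live either in a JSON file or in a .env file, here is a minimal sketch of one way to support both. It assumes the variable is named OPENAI_API_KEY (the name the openai package reads by default) and that the JSON file uses the "api_key" field shown above. The function name load_api_key is hypothetical, and python-dotenv is treated as optional.

```python
import json
import os


def load_api_key(json_path=None, env_var="OPENAI_API_KEY"):
    """Return the API key from an environment variable, else from a JSON file."""
    try:
        # Optional: python-dotenv copies KEY=value pairs from a .env file
        # (searched from the current working directory upward) into os.environ.
        from dotenv import find_dotenv, load_dotenv
        load_dotenv(find_dotenv(usecwd=True))
    except ImportError:
        pass  # python-dotenv is not installed; rely on the real environment

    key = os.environ.get(env_var)
    if key:
        return key
    if json_path and os.path.exists(json_path):
        with open(json_path, "r", encoding="utf-8") as f:
            return json.load(f)["api_key"]
    raise RuntimeError(f"No API key found in {env_var} or {json_path!r}")
```

With this in place, the client could be created as OpenAI(api_key=load_api_key(r"D:\user\Python\GPT\json\api_key.json")). Keeping the key out of the source code also makes it harder to commit it to version control by accident.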

Full code

import json
import time
from openai import OpenAI

# Start the timer
tick = time.time()

# Read the API key
with open(r"D:\user\Python\GPT\json\api_key.json", "r") as f:
    api_key = json.load(f)["api_key"]

# Initialize the client
client = OpenAI(api_key=api_key)

# Transcribe the audio file
with open("Chinese_test.m4a", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="zh"  # ISO 639-1 code for Chinese (from "Zhongwen")
    )

print("Transcription:", result.text)
print("Elapsed time (sec):", time.time() - tick)

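Output:

“中文測試” (“Chinese test”) is output only once.
The API appears to interpret the content before producing its output,
rather than returning a word-for-word transcript of every repetition.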

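🎯 Which one to choose

Choose local Whisper if you:

🆓 Want to save money (completely free)
⚡ Care about speed (faster in this test)
🔐 Value privacy (audio never leaves your machine)
📦 Have large files to process
🌐 Have an unstable network connection

Choose the OpenAI API if you:

📱 Are building a mobile app
☁️ Are building a cloud service
😴 Do not want to set up an environment
🔄 Use it only occasionally (a few times a month)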
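
The "large files" point can be checked before uploading anything. The sketch below assumes the 25 MB upload limit stated in OpenAI's speech-to-text documentation at the time of writing; verify the current value before relying on it. The function name fits_api_upload_limit and the constant are hypothetical.

```python
import os

# Assumption: 25 MB upload limit from OpenAI's speech-to-text documentation.
API_UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024


def fits_api_upload_limit(audio_path, limit_bytes=API_UPLOAD_LIMIT_BYTES):
    """Return True if the file is small enough to upload in a single request."""
    return os.path.getsize(audio_path) <= limit_bytes
```

A file that fails this check would have to be split or compressed before it can go to the API, whereas local Whisper has no such upload limit.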

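🏁 Summary

💎 Recommendations

Beginners: start directly with local Whisper
Long-term use: the local version is cheaper and faster
Occasional use: the local version is still recommended
Enterprise use: choose according to your privacy requirements

🎉 What the test showed

In this test, local Whisper was not only free but also faster and more accurate than the paid API!

Remember: there really is a free lunch, and it is local Whisper! 🍽️
🚀 Start using local Whisper now for free, fast, and accurate speech-to-text!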

result = model.transcribe(r".\Chinese_test.m4a", language="zh")
The result generated by whisper:

{'text': '中文測試中文測試中文測試中文測試',
 'segments': [{'id': 0,
   'seek': 0,
   'start': 0.0,
   'end': 2.0,
   'text': '中文測試',
   'tokens': [50364, 5975, 17174, 9592, 105, 22099, 50464],
   'temperature': 0.0,
   'avg_logprob': -0.34008565442315464,
   'compression_ratio': 2.0,
   'no_speech_prob': 0.655396044254303},
  {'id': 1,
   'seek': 0,
   'start': 2.0,
   'end': 4.0,
   'text': '中文測試',
   'tokens': [50464, 5975, 17174, 9592, 105, 22099, 50564],
   'temperature': 0.0,
   'avg_logprob': -0.34008565442315464,
   'compression_ratio': 2.0,
   'no_speech_prob': 0.655396044254303},
  {'id': 2,
   'seek': 0,
   'start': 4.0,
   'end': 6.0,
   'text': '中文測試',
   'tokens': [50564, 5975, 17174, 9592, 105, 22099, 50664],
   'temperature': 0.0,
   'avg_logprob': -0.34008565442315464,
   'compression_ratio': 2.0,
   'no_speech_prob': 0.655396044254303},
  {'id': 3,
   'seek': 0,
   'start': 6.0,
   'end': 8.0,
   'text': '中文測試',
   'tokens': [50664, 5975, 17174, 9592, 105, 22099, 50764],
   'temperature': 0.0,
   'avg_logprob': -0.34008565442315464,
   'compression_ratio': 2.0,
   'no_speech_prob': 0.655396044254303}],
 'language': 'zh'}

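⏰ Time segments explained

# First segment (id: 0)
{
    'id': 0,
    'start': 0.0,        # start time: 0 seconds
    'end': 2.0,          # end time: 2 seconds
    'text': '中文測試',   # text of this segment
}

# Second segment (id: 1)
{
    'id': 1,
    'start': 2.0,        # start time: 2 seconds
    'end': 4.0,          # end time: 4 seconds
    'text': '中文測試',   # text of this segment
}

# Third segment (id: 2)
{
    'id': 2,
    'start': 4.0,        # start time: 4 seconds
    'end': 6.0,          # end time: 6 seconds
    'text': '中文測試',   # text of this segment
}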
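
To make the segment structure concrete, here is a minimal sketch that walks over result["segments"] and builds one readable line per segment. The helper name describe_segments is hypothetical, and the sample dictionary only mimics the fields shown above rather than coming from a real transcription.

```python
def describe_segments(result):
    """Return one line per segment in the form "[start - end] text"."""
    lines = []
    for seg in result["segments"]:
        lines.append(
            f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text'].strip()}"
        )
    return lines


# Hand-written sample with the same keys as a Whisper result.
sample = {
    "segments": [
        {"id": 0, "start": 0.0, "end": 2.0, "text": "中文測試"},
        {"id": 1, "start": 2.0, "end": 4.0, "text": "中文測試"},
    ]
}

for line in describe_segments(sample):
    print(line)
# [0.0s - 2.0s] 中文測試
# [2.0s - 4.0s] 中文測試
```

The same loop works on a real result from model.transcribe, since only the 'start', 'end', and 'text' keys are used.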

🎬 Convert to a subtitle file

Generate an SRT subtitle file

import whisper

def convert_whisper_to_srt(result, output_file="subtitle.srt"):
    """Convert a Whisper result into an SRT subtitle file."""

    srt_content = ""

    # SRT cue numbers start at 1
    for i, segment in enumerate(result["segments"], 1):
        # Format the timestamps
        start_time = format_time_srt(segment["start"])
        end_time = format_time_srt(segment["end"])
        text = segment["text"].strip()

        # Build one SRT cue: index, time range, text, blank line
        srt_content += f"{i}\n"
        srt_content += f"{start_time} --> {end_time}\n"
        srt_content += f"{text}\n\n"

    # Save the file
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(srt_content)

    print(f"✅ SRT subtitle file generated: {output_file}")
    return srt_content

def format_time_srt(seconds):
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    # Work in whole milliseconds so rounding can never yield "60,000" seconds
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

# Use your own result
# This assumes the output of model.transcribe is stored in the variable result
convert_whisper_to_srt(result)
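
Some web players expect WebVTT rather than SRT. As a hedged sketch, the same segments can be written in that format, which differs mainly in the "WEBVTT" header line and in using a period instead of a comma before the milliseconds. The names format_time_vtt and segments_to_vtt are hypothetical and not part of whisper.

```python
def format_time_vtt(seconds):
    """Convert seconds to the WebVTT timestamp format HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(result):
    """Build WebVTT text from the 'segments' list of a Whisper result."""
    blocks = ["WEBVTT"]  # mandatory header line
    for seg in result["segments"]:
        time_range = f"{format_time_vtt(seg['start'])} --> {format_time_vtt(seg['end'])}"
        blocks.append(f"{time_range}\n{seg['text'].strip()}")
    # Cues are separated from the header and from each other by a blank line.
    return "\n\n".join(blocks) + "\n"
```

The returned string can be written to a .vtt file with open(..., "w", encoding="utf-8"), just as the SRT content is saved above.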