This project is built on faster-whisper throughout the entire Python codebase and is implemented as a WebSocket server that enables real-time audio transcription and translation.

Whisper Streaming

Whisper Streaming is a real-time speech-to-text transcription and translation system built on OpenAI's Whisper model with WebSocket support and optional Google Translate integration. The system is powered exclusively by faster-whisper for optimal performance and speed. This project aims to be robust, scalable, and FAST for production use.

Demos

Real-time Whisper Transcription

This demo showcases the core WebSocket-based transcription functionality with near-instant results.

Please note: In the video, you'll notice the WebSocket connects to an IP address; this is an AWS EC2 GPU instance where the server was deployed for optimal real-time performance and faster processing.

Streaming Transcription with Translation

This demo demonstrates the combined transcription and translation capabilities using Google Translate API.

Please note: The processing appears slower in this video because it was running on a laptop with CPU-only processing (no GPU). Even with CPU limitations, the system delivers near real-time performance with approximately 5-second latency, which is acceptable for CPU-based processing.

Chrome Extension - Tab Audio Capture

The Chrome extension captures audio from any browser tab and provides real-time transcription and translation in a persistent side panel. Perfect for YouTube videos, podcasts, and online meetings, it works with any audio content playing in your browser tabs.

Features

  • Real-time streaming - WebSocket-based audio streaming with low latency processing
  • Voice Activity Detection - Optimized processing with VAD/VAC for better performance
  • Live Translation - Real-time Google Translate integration for multilingual support
  • Chrome Extension - Capture and translate audio from any browser tab
  • Web Interface - Browser-based UI with language switching capabilities

Streaming support: The application processes audio chunks in real-time as they arrive. You don't need to wait for the complete audio to be processed before receiving transcription results.
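
To illustrate the streaming flow, here is a minimal client sketch using the websockets and soundfile dependencies. It assumes the server listens on ws://localhost:43007 and accepts raw 16 kHz mono 16-bit PCM chunks as binary messages; adapt the chunk format and URL to whatever your deployment expects.

import asyncio
import soundfile as sf
import websockets

async def stream_file(path, url="ws://localhost:43007"):
    # Hypothetical example: load a local file as 16-bit PCM and stream it in
    # ~0.5 s chunks, printing whatever partial transcripts the server sends back.
    audio, sr = sf.read(path, dtype="int16")   # mono 16 kHz audio is assumed
    chunk = sr // 2
    async with websockets.connect(url) as ws:
        for start in range(0, len(audio), chunk):
            await ws.send(audio[start:start + chunk].tobytes())
            try:
                print(await asyncio.wait_for(ws.recv(), timeout=0.1))
            except asyncio.TimeoutError:
                pass

asyncio.run(stream_file("sample.wav"))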

Faster-Whisper Backend: Built exclusively with faster-whisper for optimal performance and speed. This implementation provides the best balance of accuracy and processing speed for real-time applications.

WebSocket Architecture: Built on WebSocket protocol for real-time bidirectional communication between client and server, enabling live audio streaming and instant results.
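
The server side of that loop can be sketched in a few lines. This is not the project's actual server, only a minimal illustration assuming binary messages carry raw 16-bit PCM at 16 kHz; the real implementation adds proper streaming buffering, VAD, and translation.

import asyncio
import numpy as np
import websockets
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")

async def handle(ws):
    buffer = np.zeros(0, dtype=np.float32)
    async for message in ws:
        # Convert the incoming 16-bit PCM chunk to float32 and append it.
        pcm = np.frombuffer(message, dtype=np.int16).astype(np.float32) / 32768.0
        buffer = np.concatenate([buffer, pcm])
        if len(buffer) >= 16000:               # re-transcribe roughly every second
            segments, _ = model.transcribe(buffer)
            await ws.send(" ".join(s.text for s in segments).strip())
            buffer = buffer[-4000:]            # keep a short tail for context

async def main():
    async with websockets.serve(handle, "0.0.0.0", 43007):
        await asyncio.Future()                 # run until interrupted

asyncio.run(main())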

Voice Activity Detection: Integrated VAD/VAC reduces processing overhead by only transcribing when speech is detected, improving performance and accuracy.
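
As a sketch of the gating idea, the Silero VAD model (which is why torch/torchaudio are optional dependencies) can decide whether a chunk contains speech before it is handed to the transcriber; the project's built-in VAD/VAC logic may differ in detail.

import torch

# Load the Silero VAD model once at startup.
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps = utils[0]

def has_speech(chunk_f32, sample_rate=16000):
    # Returns True if Silero VAD detects any speech in this float32 numpy chunk,
    # so silent chunks can be skipped instead of being transcribed.
    audio = torch.from_numpy(chunk_f32)
    return len(get_speech_timestamps(audio, vad_model, sampling_rate=sample_rate)) > 0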

Translation Integration: Optional Google Translate API integration provides real-time translation alongside transcription for multilingual applications.
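
A hedged sketch of the translation step, using the requests dependency against the official Cloud Translation v2 REST endpoint; the project's actual integration may call a different Google Translate endpoint, and GOOGLE_API_KEY is a placeholder you would supply yourself.

import os
import requests

def translate(text, target="es"):
    # Translate one transcript segment into the target language.
    resp = requests.post(
        "https://translation.googleapis.com/language/translate/v2",
        params={"key": os.environ["GOOGLE_API_KEY"]},   # placeholder API key
        data={"q": text, "target": target, "format": "text"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["data"]["translations"][0]["translatedText"]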

Chrome Extension: Capture audio from any browser tab (YouTube, podcasts, meetings) and get live transcription and translation in a persistent side panel.

Easy Configuration: Simple launcher script with extensive command-line options for quick setup and customization.

Please note: This project supports all Whisper-compatible languages and can be easily extended with additional features. Contributions and feature suggestions are welcome!

Installation

Required Dependencies

Install the core dependencies first:

pip install librosa soundfile websockets

Faster-Whisper Backend (Recommended)

The server is built exclusively on faster-whisper. For optimal performance, GPU acceleration is highly recommended.

CPU Installation

pip install faster-whisper

GPU Installation (Recommended)

For GPU support, follow the faster-whisper instructions for installing the NVIDIA libraries. We successfully tested with cuDNN 8.5.0 and CUDA 11.7.

pip install faster-whisper

GPU Configuration: After installation, navigate to whisper_online.py and find the model variable around line 119. Uncomment the GPU model line to enable GPU acceleration.
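
For reference, the CPU and GPU variants of a faster-whisper model typically look like the lines below; the exact model size and surrounding code in whisper_online.py may differ in your copy.

from faster_whisper import WhisperModel

# CPU variant (default):
# model = WhisperModel("large-v2", device="cpu", compute_type="int8")

# GPU variant (uncomment the corresponding line in whisper_online.py to enable CUDA):
model = WhisperModel("large-v2", device="cuda", compute_type="float16")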

Optional Features

  • Voice Activity Detection: pip install torch torchaudio
  • Translation Support: pip install requests (for Google Translate integration)

Quick Start

Once installed, start the server with:

python start_whisper.py

Then open index.html in your browser or navigate to http://localhost:43007