Voice Transcription

Yolocode provides audio transcription powered by NVIDIA's Parakeet model.

Endpoints

POST /api/transcribe-stream — Streaming transcription
POST /api/transcribe — Non-streaming transcription

Supported formats

WAV
FLAC
PCM (raw audio)

Usage

JSON (base64 audio)

curl -X POST https://api.yolocode.ai/api/transcribe-stream \
  -H "Content-Type: application/json" \
  -d '{
    "audio": "<base64_encoded_audio>",
    "languageCode": "en-US"
  }'
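
The `<base64_encoded_audio>` placeholder stands for the audio file's base64 encoding as a single unbroken string. A minimal shell sketch of one way to build that request body follows. `build_payload` is a hypothetical helper name, not part of the Yolocode API; the sketch assumes a `base64` utility is on the PATH and that the language code contains no characters that need JSON escaping.

```shell
# Hypothetical helper (not part of the Yolocode API): prints the JSON request
# body for a given audio file and an optional language code.
build_payload() {
  audio_file=$1
  language_code=${2:-en-US}   # documented default

  # Read via stdin so the call works with both GNU and BSD/macOS base64,
  # and strip the line wrapping that some implementations add.
  audio_b64=$(base64 < "$audio_file" | tr -d '\n')

  # Base64 output needs no JSON escaping; the language code is assumed
  # to be a plain BCP-47 tag such as en-US.
  printf '{"audio": "%s", "languageCode": "%s"}\n' "$audio_b64" "$language_code"
}

# Usage (requires network access):
#   build_payload recording.wav en-US |
#     curl -X POST https://api.yolocode.ai/api/transcribe-stream \
#       -H "Content-Type: application/json" \
#       -d @-
```

Piping the body into `curl -d @-` keeps the encoded audio off the command line, where a long recording could exceed the system's argument-length limit.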

Multipart form data

curl -X POST https://api.yolocode.ai/api/transcribe-stream \
  -F "audio=@recording.wav" \
  -F "languageCode=en-US"

Parameters

Parameter	Type	Default	Description
`audio`	string/file	required	Base64-encoded audio string (JSON) or uploaded audio file (multipart)
`languageCode`	string	`en-US`	BCP-47 language code

Notes

Max request duration: 300 seconds
Audio format is auto-detected from the data
The streaming endpoint returns results as they're processed
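
The duration limit and the streaming behavior noted above map onto two standard curl options. The sketch below is one possible client-side setup, not an official recipe: `transcribe_stream` is a hypothetical wrapper name, `--no-buffer` makes curl print output as it arrives, and `--max-time 300` mirrors the documented maximum request duration on the client side. The shape of the streamed response is not specified here, so the sketch simply passes it through to standard output.

```shell
# Hypothetical wrapper (not part of the Yolocode API) around the documented
# multipart request to the streaming endpoint.
transcribe_stream() {
  audio_file=$1
  language_code=${2:-en-US}   # documented default

  # --no-buffer: write response data to stdout as soon as it arrives.
  # --max-time 300: abort after 300 seconds, matching the documented limit.
  curl --no-buffer --max-time 300 \
    -X POST https://api.yolocode.ai/api/transcribe-stream \
    -F "audio=@${audio_file}" \
    -F "languageCode=${language_code}"
}

# Usage (requires network access):
#   transcribe_stream recording.wav en-US
```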
