Yolocode

Voice Transcription

Convert speech to text with NVIDIA Parakeet

Yolocode provides audio transcription powered by NVIDIA's Parakeet model.

Endpoints

  • POST /api/transcribe-stream — Streaming transcription
  • POST /api/transcribe — Non-streaming transcription

Supported formats

  • WAV
  • FLAC
  • PCM (raw audio)
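
A client can run a rough sanity check on a file before uploading it. The sketch below is a hypothetical helper, `detect_format`, that is not part of Yolocode. It relies only on the well-known file signatures: a WAV file is a RIFF container whose form type is `WAVE`, and a FLAC stream begins with `fLaC`. Raw PCM has no header, so it cannot be told apart from other unknown data this way. How the service itself detects formats is not documented here.

```shell
# Hypothetical client-side check; not part of the Yolocode API.
detect_format() {
  # Read the first 4 bytes; drop NUL bytes so the shell can hold the value.
  magic=$(head -c 4 "$1" | tr -d '\0')
  case $magic in
    RIFF)
      # In a RIFF container, bytes 9-12 hold the form type ("WAVE" for WAV).
      form=$(tail -c +9 "$1" | head -c 4 | tr -d '\0')
      if [ "$form" = WAVE ]; then echo wav; else echo unknown; fi
      ;;
    fLaC) echo flac ;;
    # Raw PCM carries no signature, so it lands here with everything else.
    *) echo raw-or-unknown ;;
  esac
}

# Usage: detect_format recording.wav
```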

Usage

JSON (base64 audio)

curl -X POST https://api.yolocode.ai/api/transcribe-stream \
  -H "Content-Type: application/json" \
  -d '{
    "audio": "<base64_encoded_audio>",
    "languageCode": "en-US"
  }'
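
The `<base64_encoded_audio>` placeholder above has to be produced from a real file. One way to do that is sketched here with a hypothetical helper, `build_payload`, which base64-encodes a file and prints the JSON body. The helper name and the choice to pipe the body into curl are illustrative assumptions; only the `audio` and `languageCode` fields come from this page.

```shell
# Hypothetical helper: print the JSON request body for a given audio file.
build_payload() {
  audio_file=$1
  language=${2:-en-US}
  # Strip newlines, since some base64 implementations wrap their output.
  encoded=$(base64 < "$audio_file" | tr -d '\n')
  # Base64 text contains no characters that need JSON escaping.
  printf '{"audio":"%s","languageCode":"%s"}' "$encoded" "$language"
}

# Usage (reads the body from stdin, avoiding argument-length limits):
# build_payload recording.wav | curl -X POST \
#   https://api.yolocode.ai/api/transcribe-stream \
#   -H "Content-Type: application/json" --data-binary @-
```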

Multipart form data

curl -X POST https://api.yolocode.ai/api/transcribe-stream \
  -F "audio=@recording.wav" \
  -F "languageCode=en-US"

Parameters

Parameter     Type         Default   Description
audio         string/file  required  Base64 audio (JSON) or file (multipart)
languageCode  string       en-US     BCP-47 language code

Notes

  • Max request duration: 300 seconds
  • Audio format is auto-detected from the data
  • The streaming endpoint returns results as they're processed
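
The notes above suggest two client-side settings when calling the streaming endpoint with curl: print output as it arrives, and stop waiting once the 300-second limit has passed. The wrapper below is a minimal sketch under those assumptions. The function name `transcribe_stream` and the `CURL` override variable are hypothetical, and the format of the streamed response is not documented here, so the output is passed through unparsed.

```shell
# Hypothetical wrapper around the streaming endpoint.
# CURL may be overridden (for example, for a dry run); it defaults to curl.
transcribe_stream() {
  audio_file=$1
  language=${2:-en-US}
  # --no-buffer: write each chunk to stdout as soon as it is received.
  # --max-time 300: give up client-side after the documented 300 seconds.
  "${CURL:-curl}" --no-buffer --max-time 300 \
    -X POST https://api.yolocode.ai/api/transcribe-stream \
    -F "audio=@${audio_file}" \
    -F "languageCode=${language}"
}

# Usage: transcribe_stream recording.wav en-US
```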
