Dolva accepts audio files uploaded as multipart/form-data in the audio field. For best results, your audio should meet the quality and format guidelines described on this page.

Poor audio quality (excessive background noise, very low bitrate, or very short clips) can reduce signal accuracy.

Dolva works with common audio formats. The following are recommended for best compatibility and analysis quality:

| Format | Extension | Notes |
| --- | --- | --- |
| WAV (PCM) | .wav | Best quality; uncompressed |
| MP3 | .mp3 | Widely supported; moderate compression |
| M4A / AAC | .m4a | Good quality; common on mobile devices |
| FLAC | .flac | Lossless compression; larger files than lossy formats |
| OGG Vorbis | .ogg | Open format; good compression |

When possible, use WAV or FLAC to preserve the full acoustic signal. Heavily compressed formats like low-bitrate MP3 can reduce analysis accuracy.
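
As a rough pre-upload check, the format recommendations above can be sketched as a small helper that looks only at the file extension. The function name and the three labels are illustrative and not part of Dolva; an extension also does not prove how a file is actually encoded.

```python
from pathlib import Path

# Extensions taken from the recommended-formats table on this page.
RECOMMENDED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}
# WAV and FLAC are the two formats the page suggests preferring.
LOSSLESS_EXTENSIONS = {".wav", ".flac"}


def classify_audio_file(path: str) -> str:
    """Return 'lossless', 'lossy', or 'unlisted' based on the extension only.

    Hypothetical helper: it does not inspect the file contents.
    """
    ext = Path(path).suffix.lower()
    if ext in LOSSLESS_EXTENSIONS:
        return "lossless"
    if ext in RECOMMENDED_EXTENSIONS:
        return "lossy"
    return "unlisted"
```

A result of "unlisted" does not necessarily mean the file will be rejected; it only means the extension is not in the table above.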

Recording Guidelines

For accurate cognitive and emotion signal extraction, follow these recording best practices:
  • Minimize background noise: Record in a quiet environment. Loud background noise (traffic, music, crowds) reduces signal quality.
  • Use a close microphone: A headset or phone microphone held close to the speaker produces better results than far-field recording.
  • Avoid excessive clipping: Recording volume should be high enough to be clear but not so high that the audio distorts.
  • Include full utterances: Clips of at least a few seconds that contain natural speech yield the most reliable signals.
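
Two of the guidelines above (clipping and clip length) can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module, assuming 16-bit PCM WAV input. The thresholds (`min_seconds=3.0`, `max_clipped_fraction=0.001`) are illustrative assumptions, not limits published by Dolva.

```python
import array
import sys
import wave


def inspect_wav(source, min_seconds=3.0, max_clipped_fraction=0.001):
    """Report duration and clipping for a 16-bit PCM WAV file.

    `source` is a file path or a binary file object. The default
    thresholds are assumptions chosen for illustration only.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frame_rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples sitting at the 16-bit limits are treated as clipped.
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    clipped_fraction = clipped / len(samples) if samples else 0.0
    duration = n_frames / frame_rate

    return {
        "duration_seconds": duration,
        "clipped_fraction": clipped_fraction,
        "too_short": duration < min_seconds,
        "too_clipped": clipped_fraction > max_clipped_fraction,
    }
```

Background noise and microphone distance are harder to measure from the file alone, so this check covers only the two guidelines that reduce to simple arithmetic.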

File Size

Keep audio files to a reasonable size for upload performance. For most conversational recordings, a few minutes of audio is sufficient. Very long files (e.g., several hours) should be split into segments before uploading.
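
One way to split a long recording before upload is sketched below with Python's standard `wave` module. It assumes PCM WAV input; the five-minute default segment length and the output naming scheme are illustrative choices, not Dolva requirements.

```python
import wave


def split_wav(path, segment_seconds=300, out_prefix="segment"):
    """Split a PCM WAV file into consecutive fixed-length segments.

    Returns the list of files written. The 300-second default is an
    assumption; pick a length that suits your recordings.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(segment_seconds * src.getframerate())
        if frames_per_segment <= 0:
            raise ValueError("segment_seconds is too small")

        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate; the frame count
                # in the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Fixed-length cuts can land mid-utterance. If that matters for your analysis, consider cutting at pauses instead, since the recording guidelines favor full utterances.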

Mono vs. Stereo

Dolva accepts both mono and stereo audio. If you have a stereo file (e.g., two speakers on separate channels), consider whether you want to analyze the full mix or extract individual channels for per-speaker analysis.
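
For per-speaker analysis, a stereo file can be separated into two mono files before upload. The sketch below assumes a 16-bit stereo PCM WAV and uses only the Python standard library; the function name and output paths are hypothetical.

```python
import array
import wave


def split_stereo_wav(path, left_path, right_path):
    """Write the left and right channels of a 16-bit stereo WAV as mono files."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit stereo PCM WAV")
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)

    # Stereo samples are interleaved as L, R, L, R, ... so a stride of 2
    # picks out one channel. Samples are only rearranged, never modified,
    # so no byte-order conversion is needed.
    channels = ((left_path, samples[0::2]), (right_path, samples[1::2]))
    for out_path, channel in channels:
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(frame_rate)
            dst.writeframes(channel.tobytes())
```

Each output file can then be uploaded separately, which keeps the two speakers' signals from being mixed in a single analysis.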

Language and Accent

Dolva’s acoustic models are language-agnostic: they analyze audio signal properties rather than words, so they work across languages and accents without additional configuration.

If you’re unsure whether your audio format is supported, test with a short clip first using the /v1/analyze/cognitive or /v1/analyze/emotion endpoint. A 422 Unprocessable Entity response may indicate a format issue.
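
The short-clip test described above could look roughly like the following sketch, which uses only the Python standard library. The `audio` field name, the endpoint paths, and the 422 hint come from this page; the base URL `https://api.example.com`, the bearer-token header, and all function names are placeholders assumed for illustration, so check your account documentation for the real values.

```python
import mimetypes
import os
import urllib.error
import urllib.request
import uuid

API_BASE = "https://api.example.com"  # placeholder, not a real Dolva URL


def build_multipart(field, filename, data):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def explain_status(status):
    """Map an HTTP status code to a short hint for a format probe."""
    if 200 <= status < 300:
        return "accepted"
    if status == 422:
        return "possible format issue; try WAV or FLAC"
    return f"unexpected status {status}"


def probe_format(path, endpoint="/v1/analyze/emotion", api_key=None):
    """Upload a short clip in the `audio` field and return the HTTP status code."""
    with open(path, "rb") as f:
        data = f.read()
    body, content_type = build_multipart("audio", os.path.basename(path), data)
    headers = {"Content-Type": content_type}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # assumed auth scheme
    request = urllib.request.Request(
        API_BASE + endpoint, data=body, headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
```

Calling `explain_status(probe_format("short_clip.wav"))` would then give a one-line verdict. A 422 is only a hint, so inspect the response body for details before converting files.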