Dolva Audio Requirements: Formats, Quality, and Length

Dolva accepts audio uploaded as `multipart/form-data`, with the file in the `audio` field. For best results, your audio should meet the format and quality guidelines on this page. Poor audio (excessive background noise, very low bitrate, or very short clips) can reduce signal accuracy.
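A minimal sketch of the upload using only the Python standard library. It builds, but does not send, a `multipart/form-data` POST with the file in the `audio` field. `BASE_URL` is a hypothetical placeholder, the helper names are illustrative, and authentication is omitted because this page does not describe it; check the Dolva API reference for the real host and credentials.

```python
import urllib.request
import uuid
from pathlib import Path

# Hypothetical host: replace with the real Dolva API base URL and add the
# authentication header your account requires (not described on this page).
BASE_URL = "https://api.dolva.example"


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode a single file part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex  # random boundary unlikely to occur in the data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(path: str, endpoint: str = "/v1/analyze/emotion"):
    """Prepare (but do not send) a POST carrying the file in the `audio` field."""
    p = Path(path)
    body, ctype = build_multipart("audio", p.name, p.read_bytes())
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
```

Once the real host and credentials are filled in, pass the request to `urllib.request.urlopen`. A third-party client such as `requests` builds the same body with `requests.post(url, files={"audio": f})`.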

Recommended Formats

Dolva works with common audio formats. The following are recommended for best compatibility and analysis quality:

Format	Extension	Notes
WAV (PCM)	`.wav`	Best quality; uncompressed
MP3	`.mp3`	Widely supported; moderate compression
M4A / AAC	`.m4a`	Good quality; common on mobile devices
FLAC	`.flac`	Lossless compression; smaller than WAV but larger than lossy formats
OGG Vorbis	`.ogg`	Open format; good compression

When possible, use WAV or FLAC to preserve the full acoustic signal. Heavily compressed formats like low-bitrate MP3 can reduce analysis accuracy.
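Because a file extension says nothing about bitrate, or whether a `.wav` really holds PCM, any pre-upload check on it can only be advisory. The sketch below classifies a path against the recommended-formats table; `format_advice` and its messages are hypothetical, not part of a Dolva SDK.

```python
from pathlib import Path

# Extensions from the recommended-formats table.
RECOMMENDED = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}
LOSSLESS = {".wav", ".flac"}


def format_advice(path: str) -> str:
    """Return a rough recommendation based only on the file extension."""
    ext = Path(path).suffix.lower()
    if ext not in RECOMMENDED:
        return "not in the recommended list; test with a short clip first"
    if ext in LOSSLESS:
        return "recommended (lossless)"
    # Lossy formats work, but low bitrates can reduce analysis accuracy.
    return "recommended (lossy; prefer a higher bitrate)"
```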

Recording Guidelines

For accurate cognitive and emotion signal extraction, follow these recording best practices:

Minimize background noise: Record in a quiet environment. Loud background noise (traffic, music, crowds) reduces signal quality.
Use a close microphone: A headset, or a phone microphone held close to the speaker, produces better results than far-field recording.
Avoid clipping: Set the recording level high enough for clear speech but not so high that the audio distorts.
Include full utterances: Clips of at least a few seconds that contain natural speech yield the most reliable signals.
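The clipping and length guidelines can be checked roughly before upload. This sketch assumes 16-bit PCM WAV input; the 3-second minimum and the 0.1% clipped-sample threshold are illustrative assumptions rather than Dolva limits, and counting near-full-scale samples is only a heuristic for distortion. Background noise is harder to measure and is not checked here.

```python
import array
import sys
import wave


def check_recording(path: str, min_seconds: float = 3.0,
                    clip_ratio_limit: float = 0.001) -> dict:
    """Rough duration and clipping checks for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = w.getframerate()
        n_frames = w.getnframes()
        raw = w.readframes(n_frames)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    duration = n_frames / rate
    # Heuristic: samples at (or one step from) full scale count as clipped.
    clipped = sum(1 for s in samples if s >= 32766 or s <= -32767)
    ratio = clipped / len(samples) if samples else 0.0
    return {
        "duration_s": duration,
        "too_short": duration < min_seconds,   # assumed minimum, not a Dolva rule
        "clipped_ratio": ratio,
        "likely_clipping": ratio > clip_ratio_limit,
    }
```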

File Size

Keep audio files to a reasonable size for upload performance. For most conversational recordings, a few minutes of audio is sufficient. Very long files (e.g., several hours) should be split into segments before uploading.
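For long WAV recordings, the standard-library `wave` module is enough to cut fixed-length segments. This is a sketch: `split_wav` is a hypothetical helper, the 5-minute default segment length is an assumption based on the "few minutes" guidance, and fixed cuts can split an utterance, so cutting at pauses may give better results. Compressed formats such as MP3 need to be decoded (for example with ffmpeg) before this approach applies.

```python
import wave
from pathlib import Path


def split_wav(path: str, out_dir: str, segment_seconds: float = 300.0) -> list:
    """Write consecutive fixed-length segments of a PCM WAV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(segment_seconds * src.getframerate())
        index = 0
        while True:
            chunk = src.readframes(frames_per_segment)
            if not chunk:
                break  # end of file
            dst_path = out / f"{Path(path).stem}_{index:03d}.wav"
            with wave.open(str(dst_path), "wb") as dst:
                # Copy format; the header's frame count is corrected on write.
                dst.setparams(params)
                dst.writeframes(chunk)
            written.append(str(dst_path))
            index += 1
    return written
```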

Mono vs. Stereo

Dolva accepts both mono and stereo audio. If you have a stereo file (e.g., two speakers on separate channels), consider whether you want to analyze the full mix or extract individual channels for per-speaker analysis.
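If each speaker is on a separate channel, the channels can be separated by de-interleaving the samples. A minimal sketch for 16-bit stereo PCM WAV; `split_stereo` is a hypothetical helper, not part of any Dolva SDK.

```python
import array
import sys
import wave


def split_stereo(path: str, left_out: str, right_out: str) -> None:
    """Write each channel of a 16-bit stereo PCM WAV to its own mono file."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit stereo PCM")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Stereo frames are interleaved as L, R, L, R, ...
    for channel, out_path in ((samples[0::2], left_out), (samples[1::2], right_out)):
        if sys.byteorder == "big":
            channel.byteswap()  # back to little-endian for writing
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

Upload each mono file separately for per-speaker results, or upload the original file to analyze the full mix.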

Language and Accent

Dolva’s acoustic models are language-agnostic: they analyze properties of the audio signal rather than words, so they work across languages and accents without additional configuration.

If you’re unsure whether your audio format is supported, test with a short clip first using the `/v1/analyze/cognitive` or `/v1/analyze/emotion` endpoint. A `422 Unprocessable Entity` response may indicate a format issue.
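A quick probe can separate a likely format problem from other failures. This sketch takes any prepared `urllib.request.Request` aimed at `/v1/analyze/cognitive` or `/v1/analyze/emotion` and treats a 422 as a hint to check the format; `probe` is a hypothetical helper, and since a 422 may have other causes, inspect the response body if one is returned.

```python
import urllib.error
import urllib.request


def probe(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a prepared upload request and classify the outcome."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return f"ok ({resp.status})"
    except urllib.error.HTTPError as err:
        if err.code == 422:
            # Per this page, a 422 may indicate an unsupported or unreadable format.
            return "422: check the file format, e.g. re-export as WAV"
        return f"HTTP error {err.code}"
```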
