Choose Rivya audio workflows for voice, text to speech, dialogue, sound effects, cleanup, music drafts, credits, and Studio iteration.

Use this AI audio workflow guide before you choose between voice, text to speech, dialogue, sound effects, cleanup, music drafts, or lyric-first work in Rivya.

The easiest way to get audio wrong in Rivya is to think “audio” is one workflow.

It is not.

The audio category currently covers several distinct kinds of work side by side.

This page is the workflow reference for the audio area. If you want the more decision-oriented guide about how to start the first real voice or sound task, How to Start Your First AI Audio Workflow in Rivya is the better paired read.

Right now, the part most users will touch first is still the voice-and-sound side: voice, multilingual readout, dialogue, sound effects, and cleanup. But the catalog also includes a live music branch built around Suno Music, Suno Sounds, and Suno Lyrics, so the category is broader than "TTS plus audio cleanup."

Start With the Job Shape

Before you choose an audio model, decide which of these problems you are actually solving:

single-speaker voice or narration
multilingual spoken output
multi-speaker dialogue
generated sound effects
cleanup of an uploaded recording
a full song draft or instrumental-first track
lyric ideation before audio generation

Those are different workflows, not one workflow with slightly different settings.

What the Current Audio Catalog Actually Covers

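The current audio catalog spans two clusters.

Voice, dialogue, sound effects, and cleanup (in this guide: ElevenLabs Turbo 2.5, ElevenLabs Multilingual V2, ElevenLabs Dialogue V3, ElevenLabs Sound Effect V2, and ElevenLabs Audio Isolation)

Music and music-adjacent work (in this guide: Suno Music, Suno Sounds, and Suno Lyrics)

The important point is not that these models happen to sit under the same category. It is that they have different form shapes and different cost patterns.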

Spoken Voice and Narration

If the task is a single voice reading one script, ElevenLabs Turbo 2.5 is still the clean default.

That is the best place to start for:

narration
voice-over
quick TTS drafts
simple spoken tracks

If the spoken delivery has to work across languages, ElevenLabs Multilingual V2 is the better fit.

If the script already has two or more speakers, ElevenLabs Dialogue V3 is the better path because dialogue is structurally different from one-person readout.
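The three spoken-audio choices above can be written down as a small rule. This is only a sketch of the guidance in this section, not a Rivya API: the function name and parameters are hypothetical, and letting dialogue take precedence over the multilingual choice when both apply is an assumption, since the guide does not cover that case.

```python
def pick_voice_model(speaker_count: int, multilingual: bool = False) -> str:
    """Return the model this guide suggests for a spoken-audio job.

    Hypothetical helper for illustration; it only encodes the prose above.
    """
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    if speaker_count >= 2:
        # Assumption: multi-speaker scripts go to the dialogue model
        # even when they are also multilingual.
        return "ElevenLabs Dialogue V3"
    if multilingual:
        return "ElevenLabs Multilingual V2"
    return "ElevenLabs Turbo 2.5"
```

For example, `pick_voice_model(1)` gives the single-speaker default, while `pick_voice_model(2)` routes to the dialogue model.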

If you already know the job is narrower than the whole voice area, the paired decision pages are Best Text to Speech Generator in 2026 for plain readout, AI Narration Generator for one-speaker explainers, and AI Dubbing Generator for localized or replaced spoken tracks.

Sound Design and Cleanup

If the task is "generate a sound," ElevenLabs Sound Effect V2 is the relevant path.

If the task is "fix this recording I already have," ElevenLabs Audio Isolation is the right path.

That distinction matters because the first is prompt-first generation, while the second is upload-first cleanup.

The Live Music Branch

The music side of the audio catalog is already live, but it is intentionally narrower than a full music-production suite.

If the goal is song structure, lyric-led ideation, or music-style output, it helps to start from the music side of the audio catalog instead of from the voice guides.

Suno Music is for first track drafts

Suno Music is the better path when you need a playable track draft with or without vocals.

That makes it the clearest start for:

first song drafts
instrumental-first concept tracks
rough music for videos, demos, or podcasts

Successful results can continue through Extend Music, and the current result-based follow-ups also include WAV conversion and vocal separation.

Suno Sounds is for short sound sketches

Suno Sounds is a better fit when the real job is a shorter sonic sketch, ambience bed, loop idea, or background texture rather than a complete song structure.

It is the more useful place to start when BPM, key, or looping matter more than verses and choruses do.

Successful results can continue into a Vocal Separation action.

Suno Lyrics is for words before audio

Suno Lyrics is the words-first path.

It is useful when the hook, title, chorus direction, or verse shape matters before you spend on track generation. The important boundary is that it returns text results, not playable audio.
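The music branch described above can be summarized as a lookup of what each entry returns and which result-based follow-ups this guide mentions for it. This is a sketch for orientation only: the dictionary and function names are hypothetical, and the follow-up lists reflect this page, not a complete or authoritative catalog.

```python
# Follow-up actions this guide mentions for each music entry.
FOLLOW_UPS = {
    "Suno Music": ("Extend Music", "WAV conversion", "Vocal Separation"),
    "Suno Sounds": ("Vocal Separation",),
    "Suno Lyrics": (),  # returns text, so no audio follow-ups are described
}

# Whether the entry returns playable audio (True) or text (False).
RETURNS_AUDIO = {
    "Suno Music": True,
    "Suno Sounds": True,
    "Suno Lyrics": False,
}


def can_follow_up(model: str, action: str) -> bool:
    """Return True if this guide lists `action` as a follow-up for `model`."""
    return action in FOLLOW_UPS.get(model, ())
```

A check such as `can_follow_up("Suno Lyrics", "Vocal Separation")` returns `False`, which mirrors the boundary that lyric results are text rather than audio.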

If you want the music branch broken out in more detail, read Music Workflows in Rivya.

Why the Forms Change So Much

The audio surface is intentionally model-shaped.

The forms differ because the jobs differ:

voice models ask for text
dialogue models ask for turns and speaker assignment
sound effects ask for cue-like generation input
cleanup models expect uploaded audio
music models introduce their own prompt patterns and follow-up actions
lyric-first tools can return structured text instead of media files

That is not inconsistency. It is Rivya exposing the real shape of each workflow instead of pretending everything works the same way under one form.
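One way to picture "model-shaped" forms is as separate request types rather than one form with optional fields. The sketch below is illustrative only: every class and field name is an assumption made for this example and does not describe Rivya's actual forms or any API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class VoiceRequest:
    text: str  # one script, one speaker


@dataclass
class DialogueTurn:
    speaker: str
    text: str


@dataclass
class DialogueRequest:
    turns: List[DialogueTurn]  # ordered turns with speaker assignment


@dataclass
class SoundEffectRequest:
    cue: str  # short cue-like description of the sound


@dataclass
class CleanupRequest:
    audio_path: str  # an uploaded recording, not a prompt


AudioRequest = Union[VoiceRequest, DialogueRequest, SoundEffectRequest, CleanupRequest]


def describe(request: AudioRequest) -> str:
    """Label a request as prompt-first or upload-first."""
    if isinstance(request, CleanupRequest):
        return "upload-first: audio file"
    if isinstance(request, DialogueRequest):
        return "prompt-first: speaker turns"
    return "prompt-first: text"
```

Keeping the shapes separate makes a mismatch visible early: a cleanup job has no text field to fill, and a dialogue job cannot be expressed as a single string.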

What the Music Branch Is Not

The right way to describe the current music branch is "live and useful, but intentionally narrow."

It is not:

a full DAW
a deep mastering or multi-stem editing suite
the entire Suno family exposed at once
a reason to treat all audio work as music work

That boundary matters because Rivya's current strength is still the broader multimodal workflow, not a music-only specialist stack.

Why Audio Costs Feel Different

Audio work in Rivya does not always behave like fixed-cost image generation.

Cost can depend much more directly on variables such as:

script length
output duration
uploaded audio duration
result-based follow-up actions on music tasks

Some audio entries, especially on the live music branch, are documented with fixed per-run pricing. Others scale with duration or text length.

That is why the credits hint is especially worth reading on audio models. In many cases it describes a cost pattern rather than promising one flat number.
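The difference between a flat number and a cost pattern can be made concrete with a small estimator. This is a sketch under stated assumptions: the three pattern kinds are a simplification of the variables listed above, and every rate used here is invented for illustration. Real credit costs come from the credits hint on each model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostPattern:
    kind: str    # "fixed", "per_second", or "per_character" (assumed categories)
    rate: float  # credits per run, per second, or per character (made-up values)


def estimate_credits(pattern: CostPattern, *, seconds: float = 0.0,
                     characters: int = 0) -> float:
    """Estimate credits for one run under a hypothetical cost pattern."""
    if pattern.kind == "fixed":
        return pattern.rate
    if pattern.kind == "per_second":
        return pattern.rate * seconds
    if pattern.kind == "per_character":
        return pattern.rate * characters
    raise ValueError(f"unknown cost pattern: {pattern.kind}")


# Illustrative numbers only, not Rivya pricing.
flat = estimate_credits(CostPattern("fixed", 10.0))
by_duration = estimate_credits(CostPattern("per_second", 2.0), seconds=30)
by_length = estimate_credits(CostPattern("per_character", 0.5), characters=200)
```

Under a fixed pattern the estimate does not change with input size; under the other two, doubling the recording length or the script length doubles the estimate, which is why shorter drafts are a cheaper way to test a direction.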

The Most Common Audio Mistakes

The most common wrong turns are:

choosing voice when the real task is cleanup
treating dialogue like single-speaker narration
choosing sound effects when the real task is to repair an existing recording
starting with Suno Sounds when the real need is a full song draft
starting with Suno Lyrics when the real need is a playable result
ignoring duration or follow-up actions as part of the cost picture

Most of those mistakes disappear once you sort by workflow shape first.

A Fast Way to Choose

If you want the shortest reliable decision path:

decide whether the input is text, structured dialogue, uploaded audio, a music brief, or a lyric brief
decide whether the output is voice, multilingual voice, dialogue, sound design, cleanup, a full track, a short sound sketch, or lyric text
choose the matching model
only then tune the parameters or result-based follow-up actions

That sequence prevents most bad fits before you spend time or credits.
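The decision path above can be sketched as a table lookup that checks the input shape before returning a model. The model names come from this guide; the function, the table, and the exact pairing of input kinds to outputs are assumptions for illustration (for example, sound effects are treated here as text input).

```python
# output kind -> (expected input kind, model named in this guide)
MODEL_BY_OUTPUT = {
    "voice": ("text", "ElevenLabs Turbo 2.5"),
    "multilingual voice": ("text", "ElevenLabs Multilingual V2"),
    "dialogue": ("structured dialogue", "ElevenLabs Dialogue V3"),
    "sound design": ("text", "ElevenLabs Sound Effect V2"),
    "cleanup": ("uploaded audio", "ElevenLabs Audio Isolation"),
    "full track": ("music brief", "Suno Music"),
    "short sound sketch": ("music brief", "Suno Sounds"),
    "lyric text": ("lyric brief", "Suno Lyrics"),
}


def choose_model(input_kind: str, output_kind: str) -> str:
    """Pick a model from the job shape; raise if input and output do not fit."""
    if output_kind not in MODEL_BY_OUTPUT:
        raise ValueError(f"unknown output kind: {output_kind}")
    expected_input, model = MODEL_BY_OUTPUT[output_kind]
    if input_kind != expected_input:
        raise ValueError(
            f"{output_kind} expects {expected_input} input, got {input_kind}"
        )
    return model
```

A call like `choose_model("text", "cleanup")` raises an error, which is the same wrong turn as choosing a voice model when the real task is repairing an uploaded recording.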

Public Audio Pages vs Studio

Use the public audio pages when you want a first run, a quick comparison, or a search landing page that gets you to the right branch.

Use Studio when you want repeated iteration, saved continuity, fuller account context, or a steadier place to keep pushing the same audio task forward.

For the most useful next reads, go to Music Workflows in Rivya, How to Create AI Music with Rivya, How to Start Your First AI Audio Workflow in Rivya, AI Narration Generator, AI Voiceover for Videos, AI Dubbing Generator, or Studio.

Audio Workflow Checklist

Start here when the input or output is sound:

Decide whether the job is voice, dialogue, sound effect, cleanup, music, or lyrics.
Separate generating new audio from repairing uploaded audio.
Confirm voice, language, and speaker count, and complete any commercial-use review, before delivery.
Use shorter drafts before spending on longer or higher-risk audio tasks.
Keep scripts and pronunciation notes separate from general creative direction.

Recheck When Audio Changes Shape

Recheck when a voiceover becomes dubbing, a music idea becomes lyrics-first writing, or cleanup becomes re-recording. Audio tasks drift quickly if the job shape is not named.
