RAVE
Training a RAVE Model
Tutorials and Documentation
forum.ircam.fr/article/detail/training-rave-models-on-custom-data/
youtube.com/watch?v=MlbkSMLoWBk
Preparation
[wip]
Install dependencies
- miniconda
- ffmpeg
Install RAVE
- Create a project folder and cd into this directory.
- Create a virtual environment with the necessary Python version using conda, then activate it.
- Install RAVE (see the example commands below).
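A minimal sketch of these steps; the folder name raveProject and Python 3.9 are assumptions, the environment name RAVE is reused later in this guide, and acids-rave is the RAVE package on PyPI:
mkdir raveProject && cd raveProject
# create and activate the conda environment
conda create -n RAVE python=3.9
conda activate RAVE
# install ffmpeg into the environment (if it is not installed system-wide)
conda install -c conda-forge ffmpeg
# install RAVE
pip install acids-rave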
Prepare audio data set
Script (macOS) to batch-convert all .mp3 and .wav files in a given folder to the required .wav format, saving the copies to a dedicated folder. This example targets a stereo model at 44.1 kHz.
#!/bin/bash
shopt -s nullglob
# folder for the converted copies
output_dir="./converted"
mkdir -p "$output_dir"
for i in ./*.{mp3,wav}; do
    [ -f "$i" ] || continue
    output_file="$output_dir/$(basename "${i%.*}.wav")"
    # convert to 16-bit PCM wav, 44.1 kHz, 2 channels
    ffmpeg -i "$i" -c:a pcm_s16le -ar 44100 -ac 2 -y "$output_file"
    # keep the original file's modification time
    touch -r "$i" "$output_file"
done
(Remember to make the script executable: chmod +x yourscript.sh)
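To run it, place the script in the folder containing the source audio files and call it from there:
cd /path/to/audio/files
./yourscript.sh
# the converted copies end up in ./converted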
Preprocess the audio data set
# for a mono model:
rave preprocess --input_path /path/to/audio/files --output_path /output/path/for/preprocessed/dataset --channels 1
# for a stereo model:
rave preprocess --input_path /path/to/audio/files --output_path /output/path/for/preprocessed/dataset --channels 2
This creates three files in the output folder:
- data.mdb (the compressed audio data)
- lock.mdb
- metadata.yaml (metadata of the set, like total duration)
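To check the result, list the output folder:
ls /output/path/for/preprocessed/dataset
# data.mdb  lock.mdb  metadata.yaml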
Set up Terminal to detach processes from terminal windows
Using screen:
Create a new screen to detach the process from the terminal window, then resume it later as needed (see the example below).
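A minimal example; the session name raveTraining is just a placeholder:
# create a new named screen session:
screen -S raveTraining
# detach with Ctrl+a, then d; the process keeps running in the background
# list existing sessions:
screen -ls
# resume (reattach to) a certain session:
screen -r raveTraining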
Training the Model
See --help for all available training parameters:
# type:
rave train --help
# and read:
--augment: augmentation configurations to use;
repeat this option to specify a list of values
(default: '[]')
--batch: Batch size
(default: '8')
(an integer)
--channels: number of audio channels
(default: '0')
(an integer)
--ckpt: Path to previous checkpoint of the run
--config: RAVE configuration to use;
repeat this option to specify a list of values
(default: "['v2.gin']")
--db_path: Preprocessed dataset path
--[no]derivative: Train RAVE on the derivative of the signal
(default: 'false')
--ema: Exponential weight averaging factor (optional)
(a number)
--gpu: GPU to use;
repeat this option to specify a list of values
(an integer)
--max_steps: Maximum number of training steps
(default: '6000000')
(an integer)
--n_signal: Number of audio samples to use during training
(default: '131072')
(an integer)
--name: Name of the run
--[no]normalize: Train RAVE on normalized signals
(default: 'false')
--out_path: Output folder
(default: 'runs/')
--override: Override gin binding;
repeat this option to specify a list of values
(default: '[]')
--[no]progress: Display training progress bar
(default: 'true')
--rand_pitch: activates random pitch
(a comma separated list)
--save_every: save every n steps (default: just last)
(default: '500000')
(an integer)
--[no]smoke_test: Run training with n_batches=1 to test the model
(default: 'false')
--val_every: Checkpoint model every n steps
(default: '10000')
(an integer)
--workers: Number of workers to spawn for dataset loading
(default: '8')
(an integer)
Compose the training command as needed, for example:
rave train --name aNameForTheModel --db_path /path/to/preprocessed/dataset --out_path /path/for/model/output/ --config v2 --config noise --channels 2 --augment mute --save_every 50000 --val_every 20000
See the Documentation for all --config options.
Warning
It seems it is necessary to use --config discrete if you plan to train a prior using msprior.
See: github.com/caillonantoine/msprior
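A possible training command for such a run (only a sketch; the model name and paths are placeholders, and further --config / --augment options can be combined as shown above):
rave train --name aDiscreteModel --db_path /path/to/preprocessed/dataset --out_path ./ --config v2 --config discrete --channels 1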
To use the current directory as output path: --out_path ./
Set --channels 1 for mono, --channels 2 for stereo (or --channels 4 for 4-channel training).
Use --config causal to enable causality; this reduces the latency of the model in real-time use, but costs quality in the generated output.
Decide whether --config wasserstein may be a good choice to provide a better reconstruction of the training samples.
Important: Use --config discrete when training a prior.
Add augmentations to expand the dataset: --augment mute adds random silence, --augment compress randomly compresses waveforms, --augment gain applies random gain variations between -6 and 3.
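Multiple augmentations can be combined by repeating the flag (see the --help output above), for example:
rave train --name aNameForTheModel --db_path /path/to/preprocessed/dataset --out_path ./ --config v2 --channels 2 --augment mute --augment compress --augment gain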
Monitor the training with TensorBoard
cd to the directory of the model and activate the conda environment. It is also possible to set up a second screen for this:
screen -S tensorboardScreen
conda activate RAVE
# launch TensorBoard:
tensorboard --logdir . --bind_all
Click on the given URL to open Tensorboard in a web browser.
Stop Training
Type Ctrl+C in the training terminal window to stop the training process.
Resume Training
rave train --name nameOfTheModel --db_path /path/to/preprocessed/dataset --out_path /output/path --config v2 --config noise --channels 2 --augment mute --save_every 50000 --val_every 20000 --ckpt /path/to/checkpoint.ckpt
Important: Keep the most important flags consistent, especially --channels. Others, like --augment, can be altered.
If --ckpt is given the base path of the run, training resumes from the most recent checkpoint; use --ckpt /path/to/checkpoint.ckpt to start from a specific checkpoint.
Export the Model
Stop training, then run:
rave export --run /path/to/the/run/to/export/model/from --channels 2 --name nameOfTheGeneratedModel --output . --streaming True --fidelity 0.98
The fidelity default is 0.95 and can be set between 0 and 1.
Resampling can be set with -rs 48000, but this increases latency at the input/output of the model!
ATTENTION: EXPORTING A STEREO MODEL WHEN TRAINED IN V2
ONLY CHANGE THIS WHEN TRAINING/EXPORTING A STEREO MODEL!
Add the line
model.RAVE.n_channels = 2
to the file config.gin, in the section # Parameters for model.RAVE:
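The relevant part of config.gin then looks roughly like this (surrounding bindings omitted):
# Parameters for model.RAVE:
model.RAVE.n_channels = 2
# (the existing bindings in this section stay unchanged)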
Train a prior
[...]
Training a prior with RAVE
See rave train_prior --help for all settings.
Training a prior with msprior
See: github.com/caillonantoine/msprior
Needs a base model (.ts file) trained with --config discrete and exported WITHOUT the streaming option.