RAVE
Training a RAVE Model
Tutorials and Documentation
forum.ircam.fr/article/detail/training-rave-models-on-custom-data/
youtube.com/watch?v=MlbkSMLoWBk
Preparation
[wip]
Install dependencies
- miniconda
- ffmpeg
Install RAVE
- Create a project folder and cd into this directory.
- Create a virtual environment with the necessary Python version using conda, then activate it.
- Install RAVE (see the example commands below).
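A minimal sketch of these steps; the folder name raveProject and Python 3.9 are assumptions, the environment name RAVE is reused later in this guide, and acids-rave is the RAVE package on PyPI:
mkdir raveProject && cd raveProject
# create and activate the conda environment
conda create -n RAVE python=3.9
conda activate RAVE
# install ffmpeg into the environment (if it is not installed system-wide)
conda install -c conda-forge ffmpeg
# install RAVE
pip install acids-rave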
Prepare audio data set
Script (macOS) to batch-convert all .mp3 and .wav files in a given folder to the required .wav format, saving the copies to a dedicated folder. This example targets a stereo model at 44.1 kHz.
#!/bin/bash
shopt -s nullglob
# folder for the converted copies
output_dir="./converted"
mkdir -p "$output_dir"
for i in ./*.{mp3,wav}; do
    [ -f "$i" ] || continue
    output_file="$output_dir/$(basename "${i%.*}.wav")"
    # convert to 16-bit PCM wav, 44.1 kHz, 2 channels
    ffmpeg -i "$i" -c:a pcm_s16le -ar 44100 -ac 2 -y "$output_file"
    # keep the original file's modification time
    touch -r "$i" "$output_file"
done
(Remember to make the script executable: chmod +x yourscript.sh)
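To run it, place the script in the folder containing the source audio files and call it from there:
cd /path/to/audio/files
./yourscript.sh
# the converted copies end up in ./converted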
Preprocess the audio data set
# for a mono model:
rave preprocess --input_path /path/to/audio/files --output_path /output/path/for/preprocessed/dataset --channels 1
# for a stereo model:
rave preprocess --input_path /path/to/audio/files --output_path /output/path/for/preprocessed/dataset --channels 2
This creates three files in the output folder:
- data.mdb (the compressed audio data)
- lock.mdb
- metadata.yaml (metadata of the set, like total duration)
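To check the result, list the output folder:
ls /output/path/for/preprocessed/dataset
# data.mdb  lock.mdb  metadata.yaml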
Set up Terminal to detach processes from terminal windows
Using screen:
Create a new screen to detach the process from the terminal window, then resume it later as needed (see the example below).
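A minimal example; the session name raveTraining is just a placeholder:
# create a new named screen session:
screen -S raveTraining
# detach with Ctrl+a, then d; the process keeps running in the background
# list existing sessions:
screen -ls
# resume (reattach to) a certain session:
screen -r raveTraining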
Training the Model
See --help for all available training parameters:
# type:
rave train --help
# and read:
--augment: augmentation configurations to use;
repeat this option to specify a list of values
(default: '[]')
--batch: Batch size
(default: '8')
(an integer)
--channels: number of audio channels
(default: '0')
(an integer)
--ckpt: Path to previous checkpoint of the run
--config: RAVE configuration to use;
repeat this option to specify a list of values
(default: "['v2.gin']")
--db_path: Preprocessed dataset path
--[no]derivative: Train RAVE on the derivative of the signal
(default: 'false')
--ema: Exponential weight averaging factor (optional)
(a number)
--gpu: GPU to use;
repeat this option to specify a list of values
(an integer)
--max_steps: Maximum number of training steps
(default: '6000000')
(an integer)
--n_signal: Number of audio samples to use during training
(default: '131072')
(an integer)
--name: Name of the run
--[no]normalize: Train RAVE on normalized signals
(default: 'false')
--out_path: Output folder
(default: 'runs/')
--override: Override gin binding;
repeat this option to specify a list of values
(default: '[]')
--[no]progress: Display training progress bar
(default: 'true')
--rand_pitch: activates random pitch
(a comma separated list)
--save_every: save every n steps (default: just last)
(default: '500000')
(an integer)
--[no]smoke_test: Run training with n_batches=1 to test the model
(default: 'false')
--val_every: Checkpoint model every n steps
(default: '10000')
(an integer)
--workers: Number of workers to spawn for dataset loading
(default: '8')
(an integer)
Compose the training command as needed, for example:
rave train --name aNameForTheModel --db_path /path/to/preprocessed/dataset --out_path /path/for/model/output/ --config v2 --config noise --channels 2 --augment mute --save_every 50000 --val_every 20000
See the Documentation for all --config options.
Warning
It seems it is necessary to use --config discrete if you plan to train a prior using msprior.
See: github.com/caillonantoine/msprior
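A possible training command for such a run (only a sketch; the model name and paths are placeholders, and further --config / --augment options can be combined as shown above):
rave train --name aDiscreteModel --db_path /path/to/preprocessed/dataset --out_path ./ --config v2 --config discrete --channels 1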
To use the current directory as output path: --out_path ./
Set --channels 1 for mono, --channels 2 for stereo (or --channels 4 for 4-channel training).
Use --config causal to enable causality; this reduces the latency of the model in real-time use, but costs quality in the generated output.
Decide whether --config wasserstein may be a good choice to provide a better reconstruction of the training samples.
Important: Use --config discrete when training a prior.
Add augmentations to expand the dataset: --augment mute adds random silence, --augment compress randomly compresses waveforms, --augment gain applies random gain variations between -6 and 3.
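Multiple augmentations can be combined by repeating the flag (see the --help output above), for example:
rave train --name aNameForTheModel --db_path /path/to/preprocessed/dataset --out_path ./ --config v2 --channels 2 --augment mute --augment compress --augment gain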
Monitor the training with TensorBoard
cd to the directory of the model and activate the conda environment. It is also possible to set up a second screen for this:
screen -S tensorboardScreen
conda activate RAVE
# launch TensorBoard:
tensorboard --logdir . --bind_all
Click on the given URL to open Tensorboard in a web browser.
Stop Training
Type Ctrl+C in the training terminal window to stop the training process.
Resume Training
rave train --name nameOfTheModel --db_path /path/to/preprocessed/dataset --out_path /output/path --config v2 --config noise --channels 2 --augment mute --save_every 50000 --val_every 20000 --ckpt /path/to/checkpoint.ckpt
Important: Keep the most important flags consistent, especially --channels. Others, like --augment, can be altered.
If --ckpt is given the base path of the run, training resumes from the most recent checkpoint; use --ckpt /path/to/checkpoint.ckpt to start from a specific checkpoint.
Export the Model
Stop training, then run:
rave export --run /path/to/the/run/to/export/model/from --channels 2 --name nameOfTheGeneratedModel --output . --streaming True --fidelity 0.98
The fidelity default is 0.95 and can be set between 0 and 1.
Resampling can be set with -rs 48000, but this increases latency at the input/output of the model!
ATTENTION: EXPORTING A STEREO MODEL WHEN TRAINED IN V2
ONLY CHANGE THIS WHEN TRAINING/EXPORTING A STEREO MODEL!
Add the line
model.RAVE.n_channels = 2
to the file config.gin, in the section # Parameters for model.RAVE:
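The relevant part of config.gin then looks roughly like this (surrounding bindings omitted):
# Parameters for model.RAVE:
model.RAVE.n_channels = 2
# (the existing bindings in this section stay unchanged)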
Train a prior
[...]
Training a prior with RAVE
See rave train_prior --help for all settings.
Training a prior with msprior
See: github.com/caillonantoine/msprior
Needs a base model (.ts file) trained with --config discrete and exported WITHOUT the streaming option.