The RVC v2 model-compatible version is now available
– Create your own AI voice changer!
On May 30, 2023, I published a tutorial article and sample code for a version that supports the RVC v2 model.
The program is under active development and changes quickly…
It may take some time, but I am considering creating a tutorial video for the RVC v2 model-compatible version.
The Gradio-related error that appeared on June 16, 2023 has already been addressed, so please save a copy in Drive (“Save a copy in Drive”) and launch the RVC WebUI with the tutorial code last updated after June 16, 2023.
RVC v2 model supported RVC WebUI Tutorial:
【RVC v2 model supported】How to use RVC WebUI – Getting Started with AI Voice Changer for beginners
*A tutorial video was also released on June 18, 2023.
【Confirmed event: July 18, 2023】
The error when starting the RVC WebUI,
ModuleNotFoundError: No module named ‘faiss’
has been fixed so that it no longer appears.
【Confirmed event: August 12, 2023】
Fixed bugs when training with the RVC WebUI, such as
“Pre-trained models are not reflected and training does not proceed.”
【Confirmed event: Dec 26, 2023】
I have modified the code in 【Step 2: Installation of dependencies】.
→ I have not yet been able to verify whether the RVC WebUI can be used after this code change.
According to feedback in the comments, “Train” was still working as of December 26, 2023 (depending on the type of runtime).
→ Added: December 27, 2023
“Train” worked, but voice conversion failed due to errors in “Model inference”.
*The free version cannot be used. At this time (as of December 2023), billing is required.
【Confirmed event: Feb 8, 2024】
As of February 8, 2024, both “Train” and “Model Inference” seem to be possible with the following tutorial code for the RVC v2 model.
RVC v2 model supported version tutorial article:
【RVC v2 model supported】How to use RVC WebUI – Getting Started with AI Voice Changer for beginners
【Confirmed event: Feb 11, 2024】
As of February 11, 2024, both “Train” and “Model Inference” seem to be possible, according to a commenter on the RVC WebUI’s tutorial video.
*The free version cannot be used. At this time (as of February 2024), billing is required.
【Confirmed event: Mar 9, 2024】
Corrected the code in step 2 for “pydantic-core” & “typing-extensions”, “chex” & “numpy” related errors.
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sqlalchemy 2.0.28 requires typing-extensions>=4.6.0, but you have typing-extensions 4.5.0 which is incompatible.
pydantic-core 2.16.3 requires typing-extensions!=4.7.0,>=4.6.0, but you have typing-extensions 4.5.0 which is incompatible.
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chex 0.1.85 requires numpy>=1.24.1, but you have numpy 1.23.5 which is incompatible.
*The free version cannot be used. At this time (as of Mar 2024), billing is required.
【Confirmed event: Mar 18, 2024】
Corrected the code in step 2 for “PyTorch” related errors.
*The free version cannot be used. At this time (as of Mar 2024), billing is required.
【Confirmed event: Apr 4, 2024】
Corrected the code in step 2 for “optax” & “pandas-stubs” related errors.
*The free version cannot be used. At this time (as of Apr 2024), billing is required.
【Confirmed event: Jun 13, 2024】
Corrected the code in step 2 for “numba” related errors.
*The free version cannot be used. At this time (as of Jun 2024), billing is required.
【Confirmed event: Jun 14, 2024】
The code for Steps 2, 3, 5, and 6 was added and modified with reference to the dependencies of the current “fumiama/Retrieval-based-Voice-Conversion-WebUI” (a fork of the RVC WebUI).
From what I saw in the code, the RVC v2 model now appears to be supported, but it may not differ much from the “RVC v2 model compatible tutorial code” (cloned from the official RVC WebUI repository), so if this notebook does not work well, please use the “RVC v2 model compatible tutorial code”.
*The free version cannot be used. At this time (as of Jun 2024), billing is required.
【Confirmed event: Jun 21, 2024】
The code in 【Step 10: Launch the RVC WebUI】 has been modified.
*The free version cannot be used. At this time (as of Jun 2024), billing is required.
【Confirmed event: Jun 29, 2024】
Because updates to “fumiama/Retrieval-based-Voice-Conversion-WebUI” (a fork of the RVC WebUI) were causing frequent errors, the notebook has been changed to use an archive of that repository as of April 2024.
(Only the RVC v1 model is supported.)
*The free version cannot be used. At this time (as of Jun 2024), billing is required.
【Confirmed event: July 27, 2024】
Fixed “omegaconf” related dependency errors such as “ERROR: Cannot install fairseq and fairseq==0.12.2”.
*The free version cannot be used. At this time (as of July 2024), billing is required.
【Confirmed event: Sep 2, 2024】
The preinstalled version of “numpy” was updated compared to a month ago, so the code was modified to downgrade to “numpy==1.23.5”.
*The free version cannot be used. At this time (as of September 2024), billing is required.
【Confirmed event: Sep 3, 2024】
Since dependency conflict errors cannot be avoided with “numpy==1.23.5”, the code now downgrades to “numpy==1.24.4” instead.
Also, the preinstalled version of “tensorflow” was updated to “tensorflow==2.17.0” compared to one month ago, so the code was modified to downgrade to “tensorflow==2.15.0”.
*The free version cannot be used. At this time (as of September 2024), billing is required.
→ As of September 9, 2024:
According to information from viewers of the tutorial video, it is recommended to perform the following steps individually instead of using “One-click training”:
・Click each processing button in turn.
* Steps: Process data → Feature extraction → Train model
I have also received a report of successful voice conversion using
・a one-minute audio file
for inference.
→ As of September 12, 2024:
I have received reports that even if an error occurs in “Train model”, ignoring the error and pressing “Train feature index” still produces a trained model (pth file) that can be used for inference in “Model Inference”.
I have also received a report that, comparing the v1 and v2 model-compatible versions, the v1 model-compatible tutorial code produced voice conversions that sounded “more like the real voice”.
The Benefits of Machine Learning! But are you still on the threshold?
RVC WebUI – AI Voice Changer Tutorial
Some of you may want to use the RVC WebUI,
the AI voice changer released in April 2023, but feel discouraged because you have no idea how to use it…
Much like ChatGPT (Generative Pre-trained Transformer), the familiar artificial intelligence chatbot published by OpenAI, it is surprising how accurate voice conversion (voice quality conversion) technology has become: with pre-trained models and only a small dataset, even individuals can now handle it.
That said, without some knowledge of how to use the program, such as
・the conventions of machine learning programs
・the file structure, to some extent
・how to specify file paths
I suspect you may not be able to get a handle on it.
So, in order to make it easier for Japanese people who are interested in RVC WebUI to enjoy AI voice changers, I will summarize how to use Google Colaboratory to launch RVC WebUI and create an original AI voice changer.
I hope that this series of information will provide you with an opportunity to get in touch with AI Voice Changer.
Sample Code Links & Program Licenses
I have released sample code with explanations on how to use it so that you can easily try RVC WebUI.
I hope this will be of some help to first-time AI students around the world who find the original RVC WebUI Google Colaboratory code difficult to understand.
English Version
RVC WebUI Tutorial:
RVC-WebUI-for-AI-beginners.ipynb(The MIT License)| Google Colaboratory
License for the sample code “RVC-WebUI-for-AI-beginners.ipynb”:
The MIT License
Copyright 2023 child programmer
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Download the audio material used in this tutorial
【About the audio material used in this tutorial】
Before creating your own original AI voice changer of your choice, many of you may first want to learn a series of operating procedures to try out.
I searched for information and found voice-material creators who grant permission for their audio to be used in voice changers, so I used the following audio for training in this tutorial.
AI Voice Changer Training Voice (female voice) audio download:
Bulk download of Amitaro’s voice materials (dialogue materials) | あみたろの声素材工房 (Amitaro’s Voice Material Studio)
(PCM44,100Hz/16-bit/monaural WAV format)
Credit Information: あみたろの声素材工房 https://amitaro.net/
In addition, the audio material from before conversion to “Amitaro’s” voice is available for download on this page, so please use it if necessary.
Download audio material of a voice (male voice) for testing inference:
Download: Voice material (male voice) to try model inference (AI voice change)
(Sample rate 48,000Hz/24-bit/stereo WAV format)
* License of the sample audio (male voice material for testing inference):
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
The voice material before conversion was partially extracted from this audio.
The source of the voice material for testing model inferences:
People’s Speech Dataset(CC-BY and CC-BY-SA)| MLCommons
How to start and install RVC WebUI
– Last update: September 3, 2024
【Step 1: Check the GPU】
If the following commands cannot detect a GPU, go to the Google Colaboratory menu
「Runtime – Change runtime type – Hardware accelerator」,
select “GPU”, save, and try running the code again.
Run Code
!nvidia-smi
!nvcc -V
!free -h
Output Result
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 51C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
total used free shared buff/cache available
Mem: 12Gi 636Mi 9.0Gi 1.0Mi 3.1Gi 11Gi
Swap: 0B 0B 0B
【Step 2: Installation of dependencies】
As of September 3, 2024:
The code has been modified to downgrade to “numpy==1.24.4”, because the dependency conflict error cannot be avoided with “numpy==1.23.5”.
Run Code
# As of September 3, 2024: Specify "tensorflow==2.15.0" (downgrade from "tensorflow==2.17.0") and adjust dependencies
!pip3 install tensorflow==2.15.0 tf-keras==2.15.1 tensorstore==0.1.45 orbax-checkpoint==0.4.4
# As of July 27, 2024: Downgrade Google Colaboratory's pip (pip==24.1.2) to work around errors such as "fairseq 0.12.2 depends on omegaconf<2.1"
!python3 -m pip install --upgrade pip==22.3
# As of July 27, 2024: Specify the version of "omegaconf"
!pip3 install omegaconf==2.0.6
!pip3 install jedi==0.19.1
!apt-get -y install build-essential python3-dev ffmpeg
!pip3 install torch==2.1.0 torchtext==0.16.0 torchvision==0.16.0 torchaudio==2.1.0 # Downgrade PyTorch to 2.1.0 + adjust dependencies
!pip3 install optax==0.2.1 # Specify a version of optax that chex can use
!pip3 install chex==0.1.7 # Specify a version of chex that works with numpy==1.23.5 / 1.25.2 / 1.24.4
!pip3 install pandas-stubs==2.0.1.230501 # Specify a version of pandas-stubs that works with numpy==1.23.5 / 1.25.2 / 1.24.4
# As of September 2, 2024:Disable the following codes
# !pip3 install typeguard==3.0.2 inflect==6.0.5 albumentations==1.3.1 albucore==0.0.5 # As of July 27, 2024:"typeguard" related error countermeasures
# As of September 3, 2024: Specify "numpy==1.24.4", "numba>=0.57.0", and "llvmlite==0.43.0" to avoid dependency conflict errors
# (the ">=" specifier is quoted so the shell does not treat ">" as output redirection)
!pip3 install numpy==1.24.4 "numba>=0.57.0" llvmlite==0.43.0
# As of September 2, 2024: albucore removed due to “numpy==1.23.5” conflict error
# As of September 3, 2024: → Restored albucore with changes to “numpy==1.24.4”.
!pip3 install typeguard==3.0.2 inflect==6.0.5 albumentations==1.3.1 albucore==0.0.5
# As of September 2, 2024:The following measures
# rmm-cu12 24.4.0 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.
# cudf-cu12 24.4.1 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.
# albucore 0.0.5 requires numpy>=1.24.4, but you have numpy 1.23.5 which is incompatible.
# Uninstall rmm-cu12, cudf-cu12, and albucore
# As of September 3, 2024:When uninstalled, the following display will appear when starting RVC WebUI
# Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
# Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
# Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
# Therefore, disable the following code to stop uninstallation
# !pip3 uninstall rmm-cu12 cudf-cu12 albucore -y
!pip3 install kaleido==0.2.1 tensorflow-probability==0.20.1 typing-extensions==4.6.0 faiss-cpu==1.7.2 fairseq==0.12.2 gradio==3.14.0
# As of September 2, 2024:Disable the following code
# # Changed "numba==0.56.4" to "numba" to avoid numba conflict error. numpy version changed to unspecified.
# !pip3 install ffmpeg==1.4 ffmpeg-python>=0.2.0 praat-parselmouth==0.4.3 pyworld==0.3.2 numpy numba librosa==0.9.2 tensorboardX==2.6.2.2 onnx==1.15.0
# As of September 2, 2024:Since the version of numpy has been upgraded since the end of August 2024, “numpy==1.23.5” and “numba==0.56.4” should be specified as the version (downgrade).
# !pip3 install ffmpeg==1.4 ffmpeg-python>=0.2.0 praat-parselmouth==0.4.3 pyworld==0.3.2 numpy==1.23.5 numba==0.56.4 librosa==0.9.2 tensorboardX==2.6.2.2 onnx==1.15.0
# As of September 3, 2024:I get the following conflict error
# rmm-cu12 24.4.0 requires numba>=0.57
# cudf-cu12 24.4.1 requires numba>=0.57
# albucore 0.0.5 requires numpy>=1.24.4
# Remove numpy/numba description from here and move code
# (the ">=" specifier is quoted so the shell does not treat ">" as output redirection)
!pip3 install ffmpeg==1.4 "ffmpeg-python>=0.2.0" praat-parselmouth==0.4.3 pyworld==0.3.2 librosa==0.9.2 tensorboardX==2.6.2.2 onnx==1.15.0
print('(Version at the time of execution)') # As of execution on September 2, 2024:python 3.10.12
import platform
print('python ' + platform.python_version())
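If you want to confirm that the pinned versions actually took effect, the following optional cell prints the installed version of a few of the packages pinned above (a minimal sketch; the package list is just an example).
Run Code (optional)
# Optional sanity check: print the installed versions of a few pinned packages
import importlib.metadata as metadata
for pkg in ['numpy', 'tensorflow', 'torch', 'gradio', 'fairseq', 'faiss-cpu']:
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, 'not installed')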
【Step 3: Clone the RVC WebUI repository from GitHub】
Copy the program “Retrieval-based-Voice-Conversion-WebUI” from GitHub to Google Colaboratory.
Run Code
# Clone a repository containing an archive (as of April 2024) of "https://github.com/fumiama/Retrieval-based-Voice-Conversion-WebUI" (original, RVC v1 model)
!git clone https://github.com/ChildProgrammerJP/v1-Retrieval-based-Voice-Conversion-WebUI
# Move the "Retrieval-based-Voice-Conversion-WebUI.zip" file to directly under /content
import os
import shutil
new_path = shutil.move('/content/v1-Retrieval-based-Voice-Conversion-WebUI/Retrieval-based-Voice-Conversion-WebUI.zip', '/content/')
# Unzip "Retrieval-based-Voice-Conversion-WebUI.zip"
!unzip /content/Retrieval-based-Voice-Conversion-WebUI.zip
%cd /content/Retrieval-based-Voice-Conversion-WebUI
# !mkdir -p pretrained uvr5_weights
【Step 4: Update to the latest status】
Run Code
!git pull
【Step 5: Preparation of Pre-trained models, etc.】
Run Code
!apt -y install -qq aria2
# Pre-trained v1 models
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G48k.pth
# Trained models for sound source separation (vocal removal)
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2-人声vocals+非人声instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP2-人声vocals+非人声instrumentals.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP5-主旋律人声vocals+其他instrumentals.pth
【Step 6: Download “hubert_base.pt”】
Run Code
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -d /content/Retrieval-based-Voice-Conversion-WebUI -o hubert_base.pt
【Step 7: Mount Google Drive】
This allows you to use folders and files on Google Drive.
After executing the following code, you will be asked for permission; click “Allow” with the Google account for the Google Drive you want to use.
Run Code
from google.colab import drive
drive.mount('/content/drive')
【Step 8: Prepare the dataset and the audio file to be voice-converted in Google Drive】
In “MyDrive” on Google Drive, prepare a folder named
・”dataset”
containing the training audio.
Also, upload to “MyDrive” the audio file (WAV or MP3 format) on which you would like to try inference (voice conversion).
【Folder Structure】
dataset
|— 〜1.wav
|— 〜2.wav
|— 〜3.wav
|— 〜4.wav
・
・
・
|— 〜10.wav
*As an example, the “dataset” folder should contain several WAV audio files, each split at short sentence boundaries (one sentence per file, up to the sentence-ending punctuation).
*In this tutorial, I trained with 10 audio files of about 1–3 seconds each. If you want to convert audio in earnest, increasing the number of audio files and the number of training epochs may reduce the robotic sound and other problems.
(A GPU with more memory also seems to be better.)
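Before training, it can help to confirm what is actually in the dataset folder. The following optional cell is a minimal sketch (assuming the example path “/content/drive/MyDrive/dataset” used in this tutorial) that prints each WAV file’s duration, sampling rate, and channel count.
Run Code (optional)
# Optional: inspect the dataset folder before training
import glob
import wave

for path in sorted(glob.glob('/content/drive/MyDrive/dataset/*.wav')):
    with wave.open(path, 'rb') as wf:
        duration = wf.getnframes() / wf.getframerate()  # frames / frames-per-second
        print(f'{path}: {duration:.2f} s, {wf.getframerate()} Hz, {wf.getnchannels()} ch')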
To create the audio files for training, I have seen many people use audio editing applications such as
・Audacity
How long should each audio file in the dataset be? I am not certain, but the developer explains:
“Use less than 10 minutes vocal to fast train a voice conversion model!”
Looking up related information on the Internet and examining currently distributed audio corpora, I found that
・each file typically contains only a few seconds of audio
So, after learning how to use the RVC WebUI to some extent, please experiment through trial and error.
【Step 9: Conversion of duplicate file names】
Rename duplicate files in the dataset (“dataset” folder).
Run Code
# List the files in the dataset folder
!ls -a /content/drive/MyDrive/dataset
# Rename duplicate-suffix files such as "voice.wav~1" to "voice_1.wav"
!rename 's/(\w+)\.(\w+)~(\d*)/$1_$3.$2/' /content/drive/MyDrive/dataset/*.*~*
【Step 10: Launch the RVC WebUI】
Training and model inference (voice conversion) are performed in the web interface.
After running the following code, click the
Running on public URL: https://〜.gradio.live
link that appears to open the RVC WebUI.
Run Code
%cd /content/Retrieval-based-Voice-Conversion-WebUI
# %load_ext tensorboard
# %tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs
!python3 infer-web.py --colab --pycmd python3
How to use RVC WebUI: Training
(Creating a trained model with original dataset) – Train
Click on the “Train” tab and configure as follows
(This is an example.)
【Step:1】
Fill in the experimental configuration. The experimental data is placed under logs, and each experiment has a folder. You need to manually enter the experimental name path, which contains the experimental configuration, logs, and model files obtained from training.
Input experiment name:
(Name of the output trained model)
amitaro
Target sample rate:
(Sampling Rates)
40k
*Match the sampling rate of your audio files. (If needed, a resampling sketch follows this settings block.)
Does the model have pitch guidance (singing must, voice can not.):
yes
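The preprocessing script receives the target rate itself (the training output later in this article shows “trainset_preprocess_pipeline_print.py … 40000”), so resampling in advance should not be required. Still, if you want the dataset files themselves to match the “40k” setting, here is a minimal optional sketch using librosa and soundfile (installed in Step 2); the “dataset_40k” output folder is a hypothetical example.
Run Code (optional)
# Optional: resample the dataset to 40 kHz mono to match "Target sample rate: 40k"
import glob
import os
import librosa
import soundfile as sf

src_dir = '/content/drive/MyDrive/dataset'
dst_dir = '/content/drive/MyDrive/dataset_40k'  # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)
for path in glob.glob(os.path.join(src_dir, '*.wav')):
    audio, _ = librosa.load(path, sr=40000, mono=True)  # load and resample to 40 kHz mono
    sf.write(os.path.join(dst_dir, os.path.basename(path)), audio, 40000, subtype='PCM_16')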
【Step:2a】
Automatically traverse all files that can be decoded into audio in the training folder and perform slice normalization, and generate 2 wav folders in the experiment directory; only single-person training is supported for the time being.
Input training folder path:
(Specify the path of the dataset folder to be trained)
/content/drive/MyDrive/dataset
*Example of placing a dataset folder named “dataset” in “MyDrive” in Google Drive
Please specify speaker ID:
(Identification number setting)
0
【Step:2b】
Use CPU to extract pitch (if the model has pitch), use GPU to extract features (select card number)
Enter the card numbers used separated by -, for example 0-1-2 use card 0 and card 1 and card 2
0
GPU information:
0 Tesla T4
Number of CPU processes used for pitch extraction:
2
*This is an example.
Select pitch extraction algorithm: Use ‘pm’ for faster processing of singing voice, ‘dio’ for high-quality speech but slower processing, and ‘harvest’ for the best quality but slowest processing.
harvest
*This is an example.
【Step:3】
Fill in the training settings, start training the model and index
Save frequency (save_every_epoch):
(Frequency of storage of trained status)
5
*Some people may hit the disk capacity limit with this setting when training for several hundred to a thousand epochs on Google Colaboratory’s free tier.
Below is a summary of how to change the upper limit in the WebUI, which I hope you can refer to if necessary.
How to change the upper limit of “Save frequency (save_every_epoch)” (save frequency of the training model)
:【How to use RVC WebUI】Changing the save frequency upper limit: adjusting Save frequency to cope with the disk capacity limit of Google Colaboratory’s free tier
Total training epochs (total_epoch):
(Number of training – Number of epochs)
10
*If you have more time, increasing the number of training epochs may improve the quality of voice conversion.
*If you want to improve the quality of the audio conversion, try increasing the “number of audio files” and “audio duration”.
batch_size for every GPU:
(Batch size for each graphics card/GPU)
*How many samples from the dataset are processed at once per training step.
3
Whether to save only the latest ckpt file to save disk space:
no
Whether to cache all training sets to video memory. Small data under 10 minutes can be cached to speed up training, and large data cache will blow up video memory and not increase the speed much:
no
Load pre-trained base model G path.:
(Specify the path to the file for the pre-trained model G)
pretrained/G40k.pth
Load pre-trained base model D path.:
(Specify the path to the file for the pre-trained model D)
pretrained/D40k.pth
Enter the card numbers used separated by -, for example 0-1-2 use card 0 and card 1 and card 2:
0
Once you have made the settings, click the
・“One-click training.”
button.
After a few moments, you should see output like the following.
Output results on RVC WebUI
* “Output information” in “Train”
step 1: processing data
python3 trainset_preprocess_pipeline_print.py /content/drive/MyDrive/dataset 40000 2 /content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro False
step2a:正在提取音高 (extracting pitch)
python3 extract_f0_print.py /content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro 2 harvest
step 2b: extracting features
python3 extract_feature_print.py cuda:0 1 0 0 /content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro
step 3a: training the model
write filelist done
python3 train_nsf_sim_cache_sid_load_pretrain.py -e amitaro -sr 40k -f0 1 -bs 3 -g 0 -te 10 -se 5 -pg pretrained/f0G40k.pth -pd pretrained/f0D40k.pth -l 0 -c 0
Training completed, you can view the training logs in the console or the train.log within the experiement folder
(1611, 256),41
training index
adding index
成功构建索引 (successfully built index), added_IVF41_Flat_nprobe_1.index
all processes have been completed!
Output results on RVC-WebUI-for-AI-beginners.ipynb
*Output result of “【Step 10: Launch the RVC WebUI】”
INFO:amitaro:====> Epoch: 1
/usr/local/lib/python3.9/dist-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
INFO:amitaro:====> Epoch: 2
INFO:amitaro:====> Epoch: 3
INFO:amitaro:====> Epoch: 4
INFO:amitaro:Saving model and optimizer state at epoch 5 to ./logs/amitaro/G_15.pth
INFO:amitaro:Saving model and optimizer state at epoch 5 to ./logs/amitaro/D_15.pth
INFO:amitaro:====> Epoch: 5
INFO:amitaro:====> Epoch: 6
INFO:amitaro:====> Epoch: 7
INFO:amitaro:====> Epoch: 8
INFO:amitaro:====> Epoch: 9
INFO:amitaro:Saving model and optimizer state at epoch 10 to ./logs/amitaro/G_30.pth
INFO:amitaro:Saving model and optimizer state at epoch 10 to ./logs/amitaro/D_30.pth
INFO:amitaro:====> Epoch: 10
INFO:amitaro:Training is done. The program is closed.
INFO:amitaro:saving final ckpt:Success.
The trained model is output in the “weights” folder and can be downloaded from Google Colaboratory.
The trained model is named
・「amitaro.pth」
*Example when the name “amitaro” is set in “Input experiment name:”.
If necessary, download it to your local environment (your own computer) and try real-time voice changes, etc.
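Files on the Colab runtime disappear when the session ends, so you may also want to copy the trained model to Google Drive. Below is a minimal optional sketch, assuming the example experiment name “amitaro”.
Run Code (optional)
# Optional: copy the trained model from the runtime disk to Google Drive
import shutil
shutil.copy('/content/Retrieval-based-Voice-Conversion-WebUI/weights/amitaro.pth',
            '/content/drive/MyDrive/amitaro.pth')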
VC Client – client software for AI-based real-time voice changer:
w-okada/voice-changer(The MIT License)| GitHub
Let’s try Text-To-Speech synthesis using the RVC trained model
:【VG WebUI】How to TTS (Text-To-Speech) with the RVC WebUI trained model – Introduction to TTS with RVC
How to use RVC WebUI: Inference (Voice Conversion)
– Model inference
Click on the “Model inference” tab and configure as follows
(This is an example.)
After clicking the “Refresh timbre list” button, you can set
Inferencing timbre:
amitaro.pth
*Example when the name “amitaro” was set in “Input experiment name:”.
Please select a speaker id:
(Identification number setting)
0
*Example when “Please specify speaker ID:” (identification ID setting) is set to “0” during training
It is recommended +12key for male to female conversion, and -12key for female to male conversion. If the sound range explodes and the timbre is distorted, you can also adjust it to the appropriate range by yourself.
transpose(integer, number of semitones, octave sharp 12 octave flat -12)
+12
*In this tutorial, the voice is set to +12 to convert from a male voice to a female voice.
Enter the path of the audio file to be processed (the default is the correct format example):
/content/drive/MyDrive/originalvoice.wav
*Example of placing an audio file named “originalvoice.wav” in “MyDrive” of Google Drive.
Select the algorithm for pitch extraction. Use ‘pm’ to speed up for singing voices, or use ‘harvest’ for better low-pitched voices, but it is extremely slow.:
harvest
*If you want to improve the quality of the audio conversion, select “harvest”.
Feature search database file path:
/content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro/added_IVF20_Flat_nprobe_2.index
(This is an example.)
*Example of a case where “Input experiment name:” (the name of the output training model) is set to “amitaro” during training.
*Copy and paste the path to the “added〜.index” file in the “logs” folder of the “Retrieval-based-Voice-Conversion-WebUI” folder into the input field (a sketch for locating this file follows the settings below).
Search feature ratio:
1
*The closer the “Search feature ratio” is to “1”, the more the output is biased toward the features (timbre) of the newly trained model.
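If you are unsure of the exact file name, the following optional sketch lists the “added_〜.index” paths for the example experiment name “amitaro”. (You can also browse to the file in the Colab file browser; to run this cell, stop the WebUI cell first and relaunch it afterwards.)
Run Code (optional)
# Optional: locate the feature index file(s) for the experiment
import glob
print(glob.glob('/content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro/added_*.index'))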
When settings are complete, click the
・Convert
button.
Inference is completed in a few seconds to 10 seconds.
I trained with only 10 files of about 1–3 seconds each, but when I listened to the result, it had been converted to a feminine voice.
To make the voice conversion even more robust, it may be necessary to fine-tune the keys and increase the number of files and the number of times they are trained.
Please try to create your own original voice changer through trial and error, referring to the instructions in this article.
The output voice file can be downloaded from
「ダウンロード (Download)」or
「Export audio (three dots in the lower right corner, click to download).」
In addition, it is saved in the “TEMP” folder under a file name like “tmp395k6s3v.wav” (example file name),
so please download it to your local environment (your own computer) if necessary.
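If you prefer to fetch the converted file with code rather than through the browser, here is a minimal optional sketch. It assumes the “TEMP” folder sits inside the repository directory, as the example file name above suggests; stop the WebUI cell before running it.
Run Code (optional)
# Optional: list converted audio files in the TEMP folder, newest first
import glob
import os

files = glob.glob('/content/Retrieval-based-Voice-Conversion-WebUI/TEMP/*.wav')
for f in sorted(files, key=os.path.getmtime, reverse=True):
    print(f)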
When converting the male voice in this tutorial to “Amitaro” (female voice), setting the transpose to +12 keys or more seemed to work well.
I am grateful that Google Colaboratory allows me to learn and convert speech through trial and error without worrying about the load on my computer.
If you know how to do it, you can make an original AI voice changer…
An age where cosplaying your own voice is a matter of course…?
Amazing, right?
Comments on RVC WebUI explanatory videos and examples of responses
Here are some comments on the Japanese-language RVC WebUI tutorial videos.
I hope they help answer questions you may currently have.
:【Q&A Collection】How to use RVC WebUI – Comments on the tutorial videos and example responses
Train & Model inference: How can I reuse a set of files related to previously trained models in the RVC WebUI?
【RVC WebUI Tutorial】
Reuse of previously trained models from RVC WebUI – How to use AI Voice changer
Video viewing time: 4 min. 47 sec.
I have found a way to reuse a set of files related to previously trained models in the RVC WebUI, and I have summarized the procedure in a video.
【Contents:Reuse of previously trained models from RVC WebUI】
0:00 Introduction
0:31 Step 1: How to use a past trained model
1:06 Step 2: Launch the RVC WebUI
1:36 Step 3: Place files
2:50 Step 4: Perform Model inference
by 子供プログラマー – Child Programmer
How to use the RVC WebUI – AI Voice Changer Tutorial by Child Programmer
An Introductory Course for Japanese Artificial Intelligence Programmers (Machine Learning) by Child Programmer
How to use RVC v2 model supported RVC WebUI
:【RVC v2 model supported】How to use RVC WebUI – Getting Started with AI Voice Changer for beginners
How to use Vocal Remover in RVC WebUI
:【How to use RVC WebUI】Vocal Remover edition: Separating Vocals and Music for Creating a Training Dataset for an AI Voice Changer