【AI Tutorial】How to use RVC WebUI: Easy creation of original AI voice changer for beginners


 

RVC v2 model-compatible version is now available
– Create your own AI voice changer!

 

 

On May 30, 2023, I published a tutorial article and sample code for a version that supports the RVC v2 model.


The program is under active development and changes quickly…
It may take some time, but I’m considering creating a tutorial video for the RVC v2 model-compatible version.

The Gradio-related error that appeared on June 16, 2023 has already been addressed, so please use “Save a copy in Drive” in Google Colaboratory and launch RVC WebUI with the latest version of the tutorial code (updated after June 16, 2023).

RVC v2 model supported RVC WebUI Tutorial:
【RVC v2 model supported】How to use RVC WebUI – Getting Started with AI Voice Changer for beginners
*A tutorial video was also released on June 18, 2023.

 

【Confirmed event: July 18, 2023】

Fixed the following error that appeared when starting RVC WebUI:

ModuleNotFoundError: No module named ‘faiss’

 

【Confirmed event: August 12, 2023】

Fixed bugs that occurred when training with RVC WebUI, such as:

“Pre-trained models are not reflected and training does not proceed.”

 

【Confirmatory event: Dec 26, 2023】

I have modified the code in 【Step 2: Installation of dependencies】.

→ I have not been able to verify whether RVC WebUI works with the modified code.
According to a commenter, as of December 26, 2023, “Train” worked (depending on the type of runtime).

→ Added: December 27, 2023
“Train” worked, but voices could not be converted because of errors in “Model inference”.

*The free tier of Google Colaboratory cannot be used. At this time (as of December 2023), a paid plan is required.

 

【Confirmed event: Feb 8, 2024】

As of February 8, 2024, both “Train” and “Model Inference” seem to be possible with the following tutorial code for the RVC v2 model.

RVC v2 model supported version tutorial article:
【RVC v2 model supported】How to use RVC WebUI – Getting Started with AI Voice Changer for beginners

 

【Confirmed event: Feb 11, 2024】

As of February 11, 2024, both “Train” and “Model Inference” seem to be possible, according to a commenter on the RVC WebUI’s tutorial video.

*The free tier of Google Colaboratory cannot be used. At this time (as of February 2024), a paid plan is required.

 

【Confirmed event: Mar 9, 2024】

Corrected the code in Step 2 to fix errors related to “pydantic-core” & “typing-extensions” and “chex” & “numpy”.

ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sqlalchemy 2.0.28 requires typing-extensions>=4.6.0, but you have typing-extensions 4.5.0 which is incompatible.
pydantic-core 2.16.3 requires typing-extensions!=4.7.0,>=4.6.0, but you have typing-extensions 4.5.0 which is incompatible.

ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chex 0.1.85 requires numpy>=1.24.1, but you have numpy 1.23.5 which is incompatible.

*The free tier of Google Colaboratory cannot be used. At this time (as of Mar 2024), a paid plan is required.

 

【Confirmed event: Mar 18, 2024】

Corrected the code in Step 2 to fix “PyTorch”-related errors.

*The free tier of Google Colaboratory cannot be used. At this time (as of Mar 2024), a paid plan is required.

 

【Confirmed event: Apr 4, 2024】

Corrected the code in Step 2 to fix errors related to “optax” & “pandas-stubs”.

*The free tier of Google Colaboratory cannot be used. At this time (as of Apr 2024), a paid plan is required.

 




 

Enjoy the Benefits of Machine Learning! Still Hesitating to Get Started?
RVC WebUI – AI Voice Changer Tutorial

 

 

Some of you may want to try RVC WebUI, the AI voice changer released in April 2023,

RVC WebUI:
RVC-Project (previously known as liujing04)/Retrieval-based-Voice-Conversion-WebUI (The MIT License) | GitHub

but are discouraged because you have no idea how to use it….

Artificial intelligence such as the chatbot published by OpenAI

・ChatGPT (Generative Pre-trained Transformer)

has become familiar, and it is surprising how accurately voice conversion (voice quality conversion) can now be done, even by individuals, using pre-trained models and only a small dataset.

To use the program, you need some understanding of

・the conventions of machine learning programs
・the file structure
・how to specify file paths

If you are unfamiliar with these, you may find it hard to get a handle on the program.

So, in order to make it easier for Japanese people who are interested in RVC WebUI to enjoy AI voice changers, I will summarize how to use Google Colaboratory to launch RVC WebUI and create an original AI voice changer.

I hope this series of articles gives you an opportunity to get started with AI voice changers.

 

 

Sample Code Links & Program Licenses

 

 

I have released sample code with explanations on how to use it so that you can easily try RVC WebUI.
I hope this will be of some help to AI beginners around the world who find the original RVC WebUI Google Colaboratory code difficult to understand.

English Version
RVC WebUI Tutorial:

RVC-WebUI-for-AI-beginners.ipynb(The MIT License)| Google Colaboratory

 

License for the sample code “RVC-WebUI-for-AI-beginners.ipynb”:

The MIT License

Copyright 2023 child programmer

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

 

 

Download the audio material used in this tutorial

 

 

【About the audio material used in this tutorial】

Before creating your own AI voice changer with a voice of your choice, many of you may first want to try out the whole procedure.

When I searched, I found voice material whose creator permits its use for AI voice changers, so I used the following audio for training in this tutorial.

AI Voice Changer Training Voice (female voice) audio download:
Amitaro’s voice material (spoken lines) bulk download | あみたろの声素材工房 (Amitaro’s Voice Material Workshop)
(PCM 44,100 Hz/16-bit/monaural WAV format)
Credit: あみたろの声素材工房 https://amitaro.net/

I have also made the source audio (the voice before conversion to “Amitaro’s” voice) available for download from this page, so please use it if necessary.

Download the audio material (male voice) for testing model inference:

Download: Voice material (male voice) to try model inference (AI voice change)
(Sample rate 48,000Hz/24-bit/stereo WAV format)
*License of the sample audio (male voice material for testing model inference):
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

The pre-conversion voice material is an excerpt from the following audio.

The source of the voice material for testing model inferences:
People’s Speech Dataset(CC-BY and CC-BY-SA)| MLCommons

 

 

How to start and install RVC WebUI
– Last update: Apr 4, 2024

 

 

【Step 1: Check the GPU】

 

 

If you are unable to check your GPU with the following commands, go to the Google Colaboratory menu

「Runtime – Change runtime type – Hardware accelerator」

and select “GPU”, then save and try running the code again.

Run Code

!nvidia-smi
!nvcc -V
!free -h

 

Output Result

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
              total        used        free      shared  buff/cache   available
Mem:           12Gi       636Mi       9.0Gi       1.0Mi       3.1Gi        11Gi
Swap:            0B          0B          0B
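If you would rather check the GPU from Python (for example, right before a long training run), here is a minimal sketch that calls `nvidia-smi` with its `--query-gpu` option and returns the GPU name and total memory, or `None` when no NVIDIA GPU is visible. The function name `gpu_summary` is my own and is not part of RVC WebUI.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return 'name, total memory' for each visible NVIDIA GPU, or None if there is none."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools on PATH: most likely a CPU-only runtime
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU found: change the runtime type to GPU")
```

On a T4 runtime like the one above, this would print the GPU name together with its memory size.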

 

 

【Step 2: Installation of dependencies】

 

 

At the time of confirmation, the following warning appeared on the first run, but it seemed OK to continue as is:

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: The following packages were previously imported in this runtime: [numpy] You must restart the runtime in order to use newly installed versions.

RESTART RUNTIME

If you are concerned, click the “RESTART RUNTIME” button displayed on the screen to restart the runtime, and then run the code again starting from “Step 1”.

Run Code

!pip3 install jedi==0.19.1
!apt-get -y install build-essential python3-dev ffmpeg
!pip3 install torch==2.1.0 torchtext==0.16.0 torchvision==0.16.0 torchaudio==2.1.0 # Downgrade PyTorch to 2.1.0 together with matching companion packages
!pip3 install optax==0.2.1 # A version of optax that works with the chex version below
!pip3 install chex==0.1.7 # A version of chex compatible with numpy==1.23.5
!pip3 install pandas-stubs==2.0.1.230501 # A version of pandas-stubs compatible with numpy==1.23.5
!pip3 install kaleido==0.2.1 tensorflow-probability==0.20.1 typing-extensions==4.6.0 faiss-cpu==1.7.2 fairseq==0.12.2 gradio==3.14.0
!pip3 install ffmpeg==1.4 "ffmpeg-python>=0.2.0" praat-parselmouth==0.4.3 pyworld==0.3.2 numpy==1.23.5 numba==0.56.4 librosa==0.9.2 tensorboardX==2.6.2.2 onnx==1.15.0 # ">=" is quoted so the shell does not treat it as an output redirect

print('(Version at the time of execution)') # As of execution on April 4, 2024: python 3.10.12
import platform
print('python ' + platform.python_version())
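Because several packages above are pinned to exact versions, it can help to confirm what was actually installed before moving on. This is a minimal sketch using Python’s standard `importlib.metadata`. The `PINNED` dictionary is only a subset of the versions in the cell above (as of April 4, 2024), and `check_pins` is a hypothetical helper name. It ignores a local build tag such as “+cu121”, because pip treats a version like “2.1.0+cu121” as satisfying “torch==2.1.0”.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the versions pinned in Step 2 (as of April 4, 2024)
PINNED = {
    "torch": "2.1.0",
    "numpy": "1.23.5",
    "fairseq": "0.12.2",
    "gradio": "3.14.0",
    "faiss-cpu": "1.7.2",
    "librosa": "0.9.2",
}


def check_pins(pins):
    """Return {package: (expected, installed or None)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Compare without a local build tag such as "+cu121"
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

If nothing is printed, the pinned versions in this subset are in place.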

 

 

【Step 3: Clone the RVC WebUI repository from GitHub】

 

 

Copy the program “Retrieval-based-Voice-Conversion-WebUI” from GitHub to Google Colaboratory.

Run Code

!git clone --depth=1 -b stable https://github.com/fumiama/Retrieval-based-Voice-Conversion-WebUI
%cd /content/Retrieval-based-Voice-Conversion-WebUI
!mkdir -p pretrained uvr5_weights

 

 

【Step 4: Update to the latest status】

 

 

Run Code

!git pull

 

 

【Step 5: Preparation of Pre-trained models, etc.】

 

 

Run Code

!apt -y install -qq aria2

# Pre-trained models (v1)
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G48k.pth

# Trained models for sound source separation (vocal remover)
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2-人声vocals+非人声instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP2-人声vocals+非人声instrumentals.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP5-主旋律人声vocals+其他instrumentals.pth
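The twelve pre-trained model downloads above follow a simple naming pattern: an optional “f0” prefix (models with pitch guidance), then “D” or “G” (discriminator or generator), then the sample rate. As a sketch, the list can be generated in Python and written as an aria2c input file, in which each URL is followed by indented `dir=` and `out=` options. The file name `pretrained_v1.txt` and the helper names are my own assumptions; the URLs are the same as in the cell above.

```python
from pathlib import Path

BASE_URL = "https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main"
RVC_DIR = Path("/content/Retrieval-based-Voice-Conversion-WebUI")


def pretrained_v1_urls():
    """Return the 12 pre-trained model URLs in the same order as the cell above."""
    urls = []
    for prefix in ("", "f0"):                   # "" = no pitch guidance, "f0" = pitch guidance
        for net in ("D", "G"):                  # D = discriminator, G = generator
            for sample_rate in ("32k", "40k", "48k"):
                urls.append(f"{BASE_URL}/pretrained/{prefix}{net}{sample_rate}.pth")
    return urls


def write_aria2_input(list_path, dest_dir):
    """Write an aria2c input file: each URL followed by indented per-download options."""
    with open(list_path, "w", encoding="utf-8") as f:
        for url in pretrained_v1_urls():
            f.write(url + "\n")
            f.write(f"  dir={dest_dir}\n")
            f.write(f"  out={url.rsplit('/', 1)[1]}\n")


# Only write the list when running inside the cloned repository on Colab
if RVC_DIR.is_dir():
    write_aria2_input("/content/pretrained_v1.txt", RVC_DIR / "pretrained")
```

You could then download everything with `!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M -i /content/pretrained_v1.txt`, although the explicit commands above work just as well.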

 

 

【Step 6: Download hubert_base.pt】

 

 

Run Code

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -d /content/Retrieval-based-Voice-Conversion-WebUI -o hubert_base.pt
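Before launching the WebUI, it may be worth confirming that all of the files from Steps 5 and 6 actually arrived. Here is a minimal sketch that reports every expected file that is missing or empty. It does not verify checksums, so a partially downloaded file with a nonzero size would still pass. `missing_files` and `EXPECTED_FILES` are my own hypothetical names.

```python
from pathlib import Path

RVC_DIR = Path("/content/Retrieval-based-Voice-Conversion-WebUI")

# Files downloaded in Steps 5 and 6, relative to RVC_DIR
EXPECTED_FILES = [
    f"pretrained/{prefix}{net}{sample_rate}.pth"
    for prefix in ("", "f0")
    for net in ("D", "G")
    for sample_rate in ("32k", "40k", "48k")
] + [
    "uvr5_weights/HP2-人声vocals+非人声instrumentals.pth",
    "uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth",
    "hubert_base.pt",
]


def missing_files(root, relative_paths):
    """Return the paths that do not exist under root or are empty (e.g. a failed download)."""
    missing = []
    for rel in relative_paths:
        path = Path(root) / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if RVC_DIR.is_dir():
    problems = missing_files(RVC_DIR, EXPECTED_FILES)
    print("All files are present." if not problems else f"Missing or empty: {problems}")
```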

 

 

【Step 7: Mount Google Drive】

 

 

This step makes the folders and files on Google Drive available in Google Colaboratory.
After running the following code, you will be asked for permission. Click “Allow” with the Google account that owns your Google Drive.

Run Code

from google.colab import drive
drive.mount('/content/drive')

 

 

【Step 8: Prepare the dataset and the audio file to be converted to voice in Google Drive.】

 

 

In the “MyDrive” folder of Google Drive, create a folder named

・“dataset”

and put the training audio files in it.

Also, upload the voice file (WAV or MP3 format) on which you would like to try inference (voice conversion).

 

【Folder Structure】


dataset
 |— 〜1.wav
 |— 〜2.wav
 |— 〜3.wav
 |— 〜4.wav
 ・
 ・
 ・
 |— 〜10.wav

*As an example, the “dataset” folder should contain several WAV audio files, each containing one short sentence (up to a punctuation mark).
*In this tutorial, I trained with “10” audio files of “1-3 seconds” each. If you want to convert audio in earnest, increasing the number of audio files and the number of training epochs may reduce the mechanical-sounding voice and other problems.
(A GPU with more memory seems to be better.)
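The “Target sample rate” in training should match your audio, and the total amount of audio affects quality, so a quick look at the dataset can help. This sketch uses Python’s standard `wave` module to list each WAV file’s sample rate, channel count and duration. It assumes plain PCM WAV files like the ones used here. On Python versions before 3.12, the `wave` module may reject some WAV variants such as WAVE_FORMAT_EXTENSIBLE. `describe_wavs` is a hypothetical helper.

```python
import wave
from pathlib import Path

DATASET_DIR = Path("/content/drive/MyDrive/dataset")


def describe_wavs(folder):
    """Return (name, sample rate in Hz, channels, duration in seconds) for each WAV file."""
    rows = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            rate = w.getframerate()
            rows.append((path.name, rate, w.getnchannels(), w.getnframes() / rate))
    return rows


if DATASET_DIR.is_dir():
    rows = describe_wavs(DATASET_DIR)
    for name, rate, channels, seconds in rows:
        print(f"{name}: {rate} Hz, {channels} ch, {seconds:.2f} s")
    print(f"{len(rows)} files, {sum(r[3] for r in rows):.1f} s in total")
```

Note that `glob("*.wav")` is case-sensitive on Colab, so files ending in “.WAV” would not be listed.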

 

I have seen a number of people create the audio files for training with audio editing applications such as

・Audacity

How long should each audio file in the dataset be? I am not sure, but the developers explain:

“Use less than 10 minutes vocal to fast train a voice conversion model!”

I looked up related information on the Internet and also checked currently distributed audio corpora, and found that

・each file contains only a few seconds of audio

So, once you have learned how to use RVC WebUI to some extent, please try various approaches through trial and error.

 

 

【Step 9: Rename duplicate file names】

Rename duplicate files in the dataset (“dataset” folder). A file named like “〜.wav~1” becomes “〜_1.wav”.

Run Code

!ls -a /content/drive/MyDrive/dataset
!rename 's/(\w+)\.(\w+)~(\d*)/$1_$3.$2/' /content/drive/MyDrive/dataset/*.*~*
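The `rename` command above uses a Perl regular expression, which can be hard to read. The following is a Python sketch of the same substitution, turning a name like “voice.wav~1” into “voice_1.wav”. I am assuming that files with a “~N” suffix are the duplicates in question. The helper names are hypothetical, and this version skips a file instead of overwriting one that already has the new name.

```python
import re
from pathlib import Path

DATASET_DIR = Path("/content/drive/MyDrive/dataset")

# Same substitution as the rename command: "name.ext~N" -> "name_N.ext"
DUPLICATE = re.compile(r"(\w+)\.(\w+)~(\d*)")


def fixed_name(name):
    return DUPLICATE.sub(r"\1_\3.\2", name)


def rename_duplicates(folder):
    """Rename files matching "*.*~*" and return (old name, new name) pairs."""
    renamed = []
    for path in sorted(Path(folder).glob("*.*~*")):
        target = path.with_name(fixed_name(path.name))
        # Skip rather than overwrite if the target name already exists
        if target != path and not target.exists():
            path.rename(target)
            renamed.append((path.name, target.name))
    return renamed


if DATASET_DIR.is_dir():
    print(rename_duplicates(DATASET_DIR))
```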

 

 

【Step 10: Launch the RVC WebUI】

 

 

Training and model inference (voice conversion) are performed in the web interface.

After running the following code, click the URL shown as

Running on public URL: https://〜.gradio.live

to use RVC WebUI.

Run Code

%cd /content/Retrieval-based-Voice-Conversion-WebUI
# %load_ext tensorboard
# %tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs
!python3 infer-web.py --colab --pycmd python3

 

 

How to use RVC WebUI: Training
(Creating a trained model with original dataset) – Train

 

 

 

Click on the “Train” tab and configure as follows
(This is an example.)

 

【Step:1】
Fill in the experimental configuration. The experimental data is placed under logs, and each experiment has a folder. You need to manually enter the experimental name path, which contains the experimental configuration, logs, and model files obtained from training.

 

Input experiment name:
(Name of the output trained model)
amitaro

 


Target sample rate:
(Sampling Rates)
40k
*Match the sampling rate of the audio file.

 

Does the model have pitch guidance (singing must, voice can not.):
yes

 

【Step:2a】
Automatically traverse all files that can be decoded into audio in the training folder and perform slice normalization, and generate 2 wav folders in the experiment directory; only single-person training is supported for the time being.

 


Input training folder path:
(Specify the path of the dataset folder to be trained)
/content/drive/MyDrive/dataset
*Example of placing a dataset folder named “dataset” in “MyDrive” in Google Drive

 


Please specify speaker ID:
(Identification number setting)
0

 

【Step:2b】
Use CPU to extract pitch (if the model has pitch), use GPU to extract features (select card number)

 

Enter the card numbers used separated by -, for example 0-1-2 use card 0 and card 1 and card 2
0

 

GPU information:
0 Tesla T4

 


Number of CPU processes used for pitch extraction:
2
*This is an example.

Select pitch extraction algorithm: Use ‘pm’ for faster processing of singing voice, ‘dio’ for high-quality speech but slower processing, and ‘harvest’ for the best quality but slowest processing.
harvest
*This is an example.

 

【Step:3】
Fill in the training settings, start training the model and index

 

Save frequency (save_every_epoch):
(How often the training state is saved)
5

*If you want to train for several hundred to a thousand epochs within Google Colaboratory’s free tier, this setting can become a problem, because the saved checkpoints may exceed the free tier’s disk capacity.
Below is a summary of how to change the upper limit of this setting in the WebUI. Please refer to it if necessary.

How to change the upper limit of “Save frequency (save_every_epoch)” (how often the training model is saved)
【How to use RVC WebUI】Changing the upper limit of the training save frequency: Changing Save frequency to cope with the disk capacity limit of Google Colaboratory’s free tier
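To get a feel for how “Save frequency” and “Total training epochs” interact, here is a small sketch. It assumes a checkpoint pair is written whenever the epoch number is a multiple of save_every_epoch, which matches the training log later in this article (saves at epochs 5 and 10 with 10 epochs and a save frequency of 5). I am also assuming that “save only the latest ckpt” keeps a single pair. The size of one G/D checkpoint pair is something you would measure yourself; the 100 MB below is only a placeholder, not a real RVC figure.

```python
def checkpoint_epochs(total_epoch, save_every_epoch):
    """Epochs at which a checkpoint pair (G_*.pth and D_*.pth) is assumed to be written."""
    return [epoch for epoch in range(1, total_epoch + 1) if epoch % save_every_epoch == 0]


def estimated_disk_mb(total_epoch, save_every_epoch, pair_size_mb, latest_only=False):
    """Rough disk usage of saved checkpoints; pair_size_mb is a value you measure yourself."""
    saves = len(checkpoint_epochs(total_epoch, save_every_epoch))
    if latest_only:
        saves = min(saves, 1)
    return saves * pair_size_mb


print(checkpoint_epochs(10, 5))                              # the setting used in this tutorial
print(estimated_disk_mb(1000, 5, pair_size_mb=100.0))        # placeholder pair size
print(estimated_disk_mb(1000, 5, pair_size_mb=100.0, latest_only=True))
```

With 1,000 epochs, raising the save frequency or keeping only the latest checkpoint cuts the number of saved pairs proportionally, which is the idea behind changing the upper limit.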

 


Total training epochs (total_epoch):
(Number of training – Number of epochs)
10
*If you have more time, increasing the number of training may improve the quality of voice conversion.
*If you want to improve the quality of the audio conversion, try increasing the “number of audio files” and “audio duration”.

 


batch_size for every GPU:
(Batch size for each graphics card/GPU)
*The number of samples from the training dataset processed at once.
3
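As a rough illustration of what batch_size means, this sketch computes the number of training steps per epoch for a simple batched data loader. The sample count of 10 is hypothetical. RVC slices the audio into segments and groups them by length, so the real number of steps may differ from this simple formula.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of training steps in one epoch for a simple batched loader."""
    if drop_last:
        # An incomplete final batch is discarded
        return num_samples // batch_size
    # An incomplete final batch still counts as one step
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(10, 3))                  # 10 hypothetical samples, batch size 3
print(steps_per_epoch(10, 3, drop_last=True))
```

A larger batch size means fewer, larger steps per epoch and more GPU memory per step.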

 

Whether to save only the latest ckpt file to save disk space:
no

 

Whether to cache all training sets to video memory. Small data under 10 minutes can be cached to speed up training, and large data cache will blow up video memory and not increase the speed much:
no

 

Load pre-trained base model G path.:
(Specify the path to the file for the pre-trained model G)
pretrained/f0G40k.pth
*With pitch guidance set to “yes”, the f0 version is used (the training log below shows “-pg pretrained/f0G40k.pth”).

 

Load pre-trained base model D path.:
(Specify the path to the file for the pre-trained model D)
pretrained/f0D40k.pth
*With pitch guidance set to “yes”, the f0 version is used (the training log below shows “-pd pretrained/f0D40k.pth”).

 

Enter the card numbers used separated by -, for example 0-1-2 use card 0 and card 1 and card 2:
0

 

Once you have made the settings, click the “One-click training” button.

 

After a few moments, you should see something like the following

 

Output results on RVC WebUI
“Output information” in “Train”

step 1: processing data
python3 trainset_preprocess_pipeline_print.py /content/drive/MyDrive/dataset 40000 2 /content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro False
step2a:正在提取音高
python3 extract_f0_print.py /content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro 2 harvest
step 2b: extracting features
python3 extract_feature_print.py cuda:0 1 0 0 /content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro
step 3a: training the model
write filelist done
python3 train_nsf_sim_cache_sid_load_pretrain.py -e amitaro -sr 40k -f0 1 -bs 3 -g 0 -te 10 -se 5 -pg pretrained/f0G40k.pth -pd pretrained/f0D40k.pth -l 0 -c 0
Training completed, you can view the training logs in the console or the train.log within the experiement folder
(1611, 256),41
training index
adding index
成功构建索引, added_IVF41_Flat_nprobe_1.index
all processes have been completed!

(In this log, “正在提取音高” means “extracting pitch” and “成功构建索引” means “index built successfully”.)

 

Output results on RVC-WebUI-for-AI-beginners.ipynb
*Output result of “【Step 10: Launch the RVC WebUI】”


INFO:amitaro:====> Epoch: 1
/usr/local/lib/python3.9/dist-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
INFO:amitaro:====> Epoch: 2
INFO:amitaro:====> Epoch: 3
INFO:amitaro:====> Epoch: 4
INFO:amitaro:Saving model and optimizer state at epoch 5 to ./logs/amitaro/G_15.pth
INFO:amitaro:Saving model and optimizer state at epoch 5 to ./logs/amitaro/D_15.pth
INFO:amitaro:====> Epoch: 5
INFO:amitaro:====> Epoch: 6
INFO:amitaro:====> Epoch: 7
INFO:amitaro:====> Epoch: 8
INFO:amitaro:====> Epoch: 9
INFO:amitaro:Saving model and optimizer state at epoch 10 to ./logs/amitaro/G_30.pth
INFO:amitaro:Saving model and optimizer state at epoch 10 to ./logs/amitaro/D_30.pth
INFO:amitaro:====> Epoch: 10
INFO:amitaro:Training is done. The program is closed.
INFO:amitaro:saving final ckpt:Success.

 

The trained model is output to the “weights” folder and can be downloaded from Google Colaboratory.
For example, a trained model named

・「amitaro.pth」
*Example when the name “amitaro” is entered in “Input experiment name:”.

is output to the “weights” folder.
If necessary, download it to your local environment (your own computer) and try real-time voice changing, etc.

VC Client – client software for AI-based real-time voice changer:
w-okada/voice-changer(The MIT License)| GitHub

 

Let’s try Text-To-Speech synthesis using the RVC trained model
【VG WebUI】How to TTS(Text-To-Speech) with the RVC WebUI trained model – Introduction to TTS with RVC

 

 

How to use RVC WebUI: Inference (Voice Conversion)
– Model inference

 

 

Click on the “Model inference” tab and configure as follows
(This is an example.)

 


After you click the “Refresh timbre list” button, the trained model can be selected:

Inferencing timbre:
amitaro.pth
*Example when the name “amitaro” is entered in “Input experiment name:”.

 


Please select a speaker id:
(Identification number setting)
0
*Example when “Please specify speaker ID:” (identification number setting) was set to “0” during training

 

It is recommended +12key for male to female conversion, and -12key for female to male conversion. If the sound range explodes and the timbre is distorted, you can also adjust it to the appropriate range by yourself.

 

transpose(integer, number of semitones, octave sharp 12 octave flat -12)
+12
*In this tutorial, the voice is set to +12 to convert from a male voice to a female voice.
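The transpose value is in semitones. In 12-tone equal temperament each semitone multiplies the frequency by 2^(1/12), so +12 doubles the pitch and -12 halves it. As far as I can tell, RVC applies this same multiplier to the extracted pitch (f0). Here is a minimal sketch; the 120 Hz and 220 Hz starting pitches are illustrative values, not measurements from this tutorial’s audio.

```python
def pitch_ratio(semitones):
    """Frequency multiplier for a transpose of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12)


def transpose_hz(f0_hz, semitones):
    """Shift a fundamental frequency (Hz) by the given number of semitones."""
    return f0_hz * pitch_ratio(semitones)


print(transpose_hz(120.0, 12))    # illustrative male pitch raised one octave
print(transpose_hz(220.0, -12))   # illustrative female pitch lowered one octave
```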

 

Enter the path of the audio file to be processed (the default is the correct format example):
/content/drive/MyDrive/originalvoice.wav
*Example of placing an audio file named “originalvoice.wav” in “MyDrive” of Google Drive.

 

Select the algorithm for pitch extraction. Use ‘pm’ to speed up for singing voices, or use ‘harvest’ for better low-pitched voices, but it is extremely slow.:
harvest
*If you want to improve the quality of the audio conversion, select “harvest”.

 


Feature search database file path:
/content/Retrieval-based-Voice-Conversion-WebUI/logs/amitaro/added_IVF20_Flat_nprobe_2.index
(This is an example.)
*Example of a case where “Input experiment name:” (the name of the output training model) is set to “amitaro” during training.
*Copy and paste the path to the “added〜.index” file in the “logs” folder of the “Retrieval-based-Voice-Conversion-WebUI” folder into the input field
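Because the index file name depends on your data (the training log above produced “added_IVF41_Flat_nprobe_1.index”, while the example path here is “added_IVF20_Flat_nprobe_2.index”), copying the path by hand is error-prone. This sketch looks for the newest “added_*.index” file in an experiment’s folder under “logs”. The folder may also contain other .index files depending on the RVC version, so the pattern only matches the “added_” prefix. `find_added_index` is a hypothetical helper.

```python
from pathlib import Path

LOGS_DIR = Path("/content/Retrieval-based-Voice-Conversion-WebUI/logs")


def find_added_index(logs_dir, experiment_name):
    """Return the newest added_*.index file for an experiment, or None if there is none."""
    folder = Path(logs_dir, experiment_name)
    if not folder.is_dir():
        return None
    candidates = sorted(folder.glob("added_*.index"), key=lambda p: p.stat().st_mtime)
    return str(candidates[-1]) if candidates else None


# "amitaro" is the experiment name used in this tutorial
print(find_added_index(LOGS_DIR, "amitaro"))
```

The printed path can be pasted into “Feature search database file path”.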

 

Search feature ratio:
1
*The closer the “Search feature ratio” is to “1”, the more the output seems to lean toward the features (voice quality) of the trained model created this time.
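To illustrate what “Search feature ratio” does, here is a simplified sketch of the idea as I understand it from RVC’s inference code. Each feature of the input voice is matched against features from the training data stored in the index, and the match is mixed with the original feature according to the ratio. In this sketch, a brute-force nearest-neighbour search stands in for the faiss index, and only one neighbour is used (some RVC versions average several). The two-dimensional vectors are made-up toy values.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, database):
    """Brute-force nearest neighbour (a stand-in for the faiss index search)."""
    return min(database, key=lambda vec: squared_distance(query, vec))


def blend(original, retrieved, ratio):
    """Mix retrieved and original features: ratio=1 uses only the retrieved feature."""
    return [ratio * r + (1 - ratio) * o for o, r in zip(original, retrieved)]


database = [[1.0, 1.0], [5.0, 5.0]]   # hypothetical features from the training data
query = [0.0, 0.0]                    # hypothetical feature of the input voice
match = nearest(query, database)
for ratio in (0.0, 0.5, 1.0):
    print(ratio, blend(query, match, ratio))
```

With a ratio of 1, the feature is replaced entirely by its closest match from the training data, which is why the output leans toward the trained voice.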

 

When the settings are complete, click the “Convert” button.

Inference completes in anywhere from a few seconds to about 10 seconds.
I trained with only 10 files of about 1-3 seconds each, but when I listened to the result, the voice had been converted to a feminine voice.
To make the voice conversion even more robust, it may be necessary to fine-tune the key and increase the number of files and training epochs.

Please try to create your own original voice changer through trial and error, referring to the instructions in this article.
The converted audio file can be downloaded from “Export audio” (click the three dots in the lower right corner and choose “Download”).

It is also saved in the “TEMP” folder with a file name like “tmp395k6s3v.wav” (example),

so please download it to your local environment (your computer) if necessary and use it.
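If you prefer to fetch the result from code, this sketch finds the most recently written WAV file in the TEMP folder. I am assuming the TEMP folder is inside the cloned repository (“/content/Retrieval-based-Voice-Conversion-WebUI/TEMP”); adjust the path if yours differs. `latest_wav` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: the TEMP folder sits inside the cloned repository
TEMP_DIR = Path("/content/Retrieval-based-Voice-Conversion-WebUI/TEMP")


def latest_wav(folder):
    """Return the most recently modified .wav file in folder, or None."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    wavs = [p for p in folder.glob("*.wav") if p.is_file()]
    return max(wavs, key=lambda p: p.stat().st_mtime) if wavs else None


print(latest_wav(TEMP_DIR))
```

On Google Colaboratory, running `from google.colab import files` followed by `files.download(str(path))` should then start a browser download of that file.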

 

 

When converting the male voice in this tutorial to “Amitaro” (female voice), a transpose of +12 or more seemed to work well.
I am grateful that Google Colaboratory allows me to learn and convert speech through trial and error without worrying about the load on my computer.

 

Once you know how, you can make your own original AI voice changer…

Are we entering an age where cosplaying your own voice is a matter of course…?

Amazing, right?

 

 

Comments on the RVC WebUI tutorial videos and example responses

 


 

Here are some comments on the Japanese version of the RVC WebUI tutorial videos, along with example responses.
I hope they help you resolve any questions you may have.

【Q&A Collection】How to use RVC WebUI – Comments on the tutorial videos and example responses

 

 

Train & Model inference:How can I reuse a set of files related to previous trained models in the RVC WebUI?

 

 

【RVC WebUI Tutorial】
Reuse of previously trained models from RVC WebUI – How to use AI Voice changer


Video length: 4 min 47 sec

I have found a way to reuse a set of files related to previous trained models in the RVC WebUI, and I have summarized the procedure in a video.

 

【Contents:Reuse of previously trained models from RVC WebUI】

0:00 Introduction
0:31 Step 1: How to use past trained model
1:06 Step 2: Launch the RVC WebUI
1:36 Step 3: Place files
2:50 Step 4: Perform Model inference

 

 

by 子供プログラマー – Child Programmer

 

How to use the RVC WebUI – AI Voice Changer Tutorial by Child Programmer

 

An Introductory Course for Japanese Artificial Intelligence Programmers (Machine Learning) by Child Programmer

 

How to use RVC v2 model supported RVC WebUI
【RVC v2 model supported】How to use RVC WebUI – Getting Started with AI Voice Changer for beginners

 

How to use Vocal Remover in RVC WebUI
【How to use RVC WebUI】Vocal Remover: Separating Vocals and Music for Creating a Training Dataset for AI Voice Changer