Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1803.00860

PDF

https://arxiv.org/pdf/1803.00860

Paper information

Authors: Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen
Published: 2018-03-02
Affiliation: National Institute of Informatics
Country of affiliation: Japan
Conference: Speaker and Language Recognition Workshop (Odyssey)

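Labels estimated by AI

Speech enhancement techniques

Speech recognition systems

Data collection methods

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".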

Abstract

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not considered in the ASVspoof2015 database are appearing. Such examples include direct waveform modelling and generative adversarial networks. We also need to investigate the feasibility of training spoofing systems using only low-quality found data. For that purpose, we developed a generative adversarial network-based speech enhancement system that improves the quality of speech data found in publicly available sources. Using the enhanced data, we trained state-of-the-art text-to-speech and voice conversion models and evaluated them in terms of perceptual speech quality and speaker similarity. The results show that the enhancement models significantly improved the SNR of low-quality degraded data found in publicly available sources and that they significantly improved the perceptual cleanliness of the source speech without significantly degrading the naturalness of the voice. However, the results also show limitations when generating speech with the low-quality found data.
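The abstract reports enhancement results in terms of SNR. As a rough illustration of what that measurement involves, the following is a minimal sketch of a signal-to-noise ratio computation in decibels. It assumes a clean reference signal is available and time-aligned with the degraded signal, which is generally only true for simulated degradations and not for found data. The function name `snr_db` and the toy sine-wave signals are hypothetical and are not taken from the paper.

```python
import math


def snr_db(clean, degraded):
    """Return the SNR in dB of `degraded` relative to the reference `clean`.

    Assumption: both sequences are time-aligned and have the same length.
    """
    if len(clean) == 0 or len(clean) != len(degraded):
        raise ValueError("signals must be non-empty and of equal length")
    # Mean power of the reference signal.
    signal_power = sum(c * c for c in clean) / len(clean)
    # Mean power of the residual (everything that is not the reference).
    noise_power = sum((d - c) ** 2 for c, d in zip(clean, degraded)) / len(clean)
    if signal_power == 0.0:
        raise ValueError("reference signal has zero power")
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# Toy example (hypothetical data): a 440 Hz tone with an additive 3 kHz
# interference, sampled at 16 kHz for one second.
SAMPLE_RATE = 16000
clean = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
noise = [0.1 * math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE + 1.0) for n in range(SAMPLE_RATE)]
degraded = [c + n for c, n in zip(clean, noise)]

# Stand-in for an enhancement system: here the noise amplitude is simply halved.
enhanced = [c + 0.5 * n for c, n in zip(clean, noise)]

improvement = snr_db(clean, enhanced) - snr_db(clean, degraded)
print(f"SNR improvement: {improvement:.2f} dB")
```

Halving the noise amplitude cuts the noise power by a factor of four, so the improvement in this toy case is 20·log10(2), about 6.02 dB. A real enhancement model would also alter the speech itself, which is why the abstract reports perceptual quality and speaker similarity alongside SNR.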

External datasets

ASVspoof2015

CSTR VCTK corpus

Noisy VCTK

Reverberant VCTK

Noisy and reverberant VCTK