Six examples, each with the same layout:
Source Portrait | Result: SadTalker | EAT | PD-FGC | AniTalker | EDTalker | EchoMimic | VQTalker (Ours)
Video 1 to Video 4, each with the same columns:
H.264 (347 kbps) | FOMM (48 kbps) | DPE (16 kbps) | MTIA (48 kbps) | Vid2Vid (36 kbps) | LIA (16 kbps) | FADM (36 kbps) | AniTalker (16 kbps) | LivePortrait (50 kbps) | FaceTokenizer (Ours, 11 kbps)
(1) Only the first codebook level, masking the subsequent three: Source | Driven Feature (providing VQ features for driving) | Result
(2) The first two codebook levels, masking the latter two: Source | Driven Feature (providing VQ features for driving) | Result
(3) The first three codebook levels, masking only the final one: Source | Driven Feature (providing VQ features for driving) | Result
(4) All codebook levels: Source | Driven Feature (providing VQ features for driving) | Result
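The level-masking comparison above can be illustrated with a toy residual quantizer. This is only a minimal sketch under stated assumptions, not the VQTalker implementation: it quantizes a single scalar rather than grouped feature vectors, and the function names (`residual_quantize`, `decode`) and the step sizes in `STEPS` are hypothetical. It shows the general idea that each level encodes what the earlier levels left over, so decoding with later levels masked gives a coarser reconstruction.

```python
def residual_quantize(x, steps):
    """Quantize x level by level; each level rounds what the previous levels left over."""
    codes = []
    residual = x
    for step in steps:
        q = round(residual / step) * step  # nearest grid point at this level
        codes.append(q)
        residual -= q
    return codes


def decode(codes, keep_levels):
    """Reconstruct from the first keep_levels levels; later levels are masked (treated as zero)."""
    return sum(codes[:keep_levels])


# Hypothetical step sizes, coarse to fine; these are not values from VQTalker.
STEPS = [1.0, 0.25, 0.0625, 0.015625]
x = 0.7
codes = residual_quantize(x, STEPS)
errors = [abs(x - decode(codes, k)) for k in range(1, len(STEPS) + 1)]
for k, err in enumerate(errors, start=1):
    print(f"levels kept: {k}, absolute error: {err:.6f}")
```

In this sketch the reconstruction error never grows as more levels are kept, because each level rounds the remaining residual to its nearest grid point and zero is always one of the candidates.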
Video 1 to Video 4 and Video 5 (Cross Video Driven), each with the same columns:
Driven Video | VQ | GVQ | RVQ | GRVQ | GRFSQ #1 | GRFSQ #2 | GRFSQ #3 | GRFSQ #4 (Ours)
Video 1 to Video 4, each with the same columns:
C-to-C (Whisper Continuous Vector to Continuous Vector) | D-to-C (CosyVoice Speech Tokens to Continuous Vector) | C-to-D (Whisper Continuous Vector to Discrete Vector) | D-to-D (VQ-wav2vec Speech Tokens to Discrete Vector) | D-to-D (CosyVoice Speech Tokens to Discrete Vector) (Ours)
The rapid advancement of digital human technology, particularly in the creation of highly realistic virtual faces, presents significant ethical challenges. There are genuine concerns about the potential misuse of this technology for malicious purposes, such as deepfakes, identity theft, or the propagation of misinformation. To address these issues, it is crucial that developers and organizations establish comprehensive ethical guidelines before deploying such technologies. These guidelines should encompass principles of user privacy, data protection, and responsible use. Furthermore, to enhance accountability and prevent misuse, it is recommended to implement robust verification systems and content attribution methods for all digitally generated human representations. This could include blockchain-based authentication or secure metadata tagging. By proactively addressing these ethical considerations, we can foster the positive potential of digital human technology while minimizing its risks to individuals and society.