October 23, 2023 / AIResearch Articles / Read Time: 25 Min

From Jackson Wan's New Song to the Legal Issues of AI Voice

Examines whether training and using AI voice models constitutes infringement under Chinese Civil Code voice rights protections, analyzing the difference between reproducing a specific person's voice versus generating composite voices from multiple sources.

AI voice

Has become one of the hottest topics in AI this year

During the 2023 National Day holiday, the author happened to hear a new song by Jackson Wan — “Dear Myself”

Because I have rarely paid attention to new songs from Hong Kong in recent years

I just felt the style was very different from Jackson Wan’s previous work

Later, I searched for the music video and lyrics

Only then did I discover it’s a song about Jackson Wan’s personal journey and aspirations for new technology

The biggest innovation (perhaps the only one of its kind in the current Chinese music scene)

Is that he sings a duet with his own AI voice

Hence the “feat. Wan K.” in the title

Wan K. is the codename for Jackson Wan’s AI voice

AI voice, as one of the mainstream outputs of AIGC

Has rights ownership that remains in a gray area

Who owns the rights to AI voice (or AI voice training files)?

Whether using trained AI voices constitutes infringement

Remains controversial


1、Jackson Wan and His New Song

Readers from Cantonese-speaking regions, especially older readers from the Pearl River Delta

Will know who Jackson Wan is

Jackson Wan (real name Lyu Xiyang), a Vietnamese-Chinese Hong Kong male singer and actor, known as the “King of Temple Street”

His songs are known for being “vulgar” and “earthy,” but he is also very adventurous in trying new musical styles

If you don’t know him, you may have heard these lyrics (since they are QQ Music VIP songs, audio cannot be shared directly in the article):

《Hollywood》
Hey hey Hollywood has a big hotel
Three fat ladies learning to kick a ball, learning to kick a ball
You kick, she kicks, and finally kick it into the river

《Your Mom’s Big Sale》
Your mom, big sale, benefits your dad
Supporting the family day and night is really not easy
Your mom, big sale, benefits your dad
No allowance and still three meals a day

In June 2023, a Hong Kong netizen created “AI Jackson Wan” and used it to cover songs like Hong Kong singer Keung To’s “Dear My Friend” and Terence Lam’s “One Man’s Space,” which went viral online. Because AI Jackson Wan’s singing was astonishing, netizens said it “sounds better than the original,” and they even urged the real Jackson Wan to cover these songs. In response, the “real” Jackson Wan posted on social media on June 9:

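“I’ve received so many messages from everyone over the past two days. Thank you all very much! Jackson Wan is always Jackson Wan.
#From start to finish we have simply been ourselves
#Sometimes sad, sometimes happy, sometimes a little knife stabs the thigh
#But you still have me and I still have you
#Jackson Wan is still Jackson Wan”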

Source: Hong Kong media outlet “Sing Tao Headlines”

In September 2023, Jackson Wan officially released his new song “Dear Myself”

Besides stringing together the titles of Jackson Wan’s famous songs to look back on his life

It also broke new ground by having Jackson Wan sing a duet with his own AI voice

The MV also showcased the visual image of his AI voice

Netizen-reposted MV link on Bilibili:
【【真·光B】尹光 feat.Wan K.《Dear Myself》|Official MV】 https://www.bilibili.com/video/BV1RN4y1Q74r/?share_source=copy_web&vd_source=1dd9c2c50090b91a5043ac50c2d4c857

The most touching line in the lyrics is

I sing boringly, never too shining
I sing night and day, hoping to make you smile
Now that my voice has depreciated too
I have passed this mission to AI

Jackson Wan has never received mainstream awards in his career (“never too shining”)

He sang “vulgar” songs to make people laugh (“hoping to make you smile”)

When he gets old and can’t sing anymore (he’s actually 74 now)

AI Jackson Wan will continue the mission of bringing joy to everyone

Unlike many mainstream “creators” who resist AI

Jackson Wan, as an older-generation singer, chose to embrace AI directly

And affirmed in the song AI’s role in continuing his “mission,” staying true to his usual “trendiness”

Can AI Jackson Wan replace Jackson Wan?

Not really

But people will remember Jackson Wan himself

Because of AI Jackson Wan

Of course, at his concert press conference, Jackson Wan announced that he is the first Hong Kong singer to register an AI voice. Anyone who wants to use Jackson Wan’s AI voice in the future must first obtain his consent.

Jackson Wan happily thanked everyone for the honor of letting him be the first: “The company said we need to register, otherwise there will be too many versions and no one will know which one is the real one. For now, having a few songs like ‘One Man’s Space’ and ‘Dear My Friend’ is enough to listen to. Too many would make it worthless, haha!”

2、What Is AI Voice

Before analyzing the legal issues of AI voice

Let’s first understand what “AI voice” is

Currently, AI voice is similar to large language models and AI painting

All are based on algorithm models for training and generation (inference)

【Click below to understand diffusion algorithms】https://www.lslby.com/%e7%ae%97%e6%b3%95%e6%a8%a1%e5%9e%8b%e5%ba%94%e6%88%90%e4%b8%ba%e4%ba%ba%e5%b7%a5%e6%99%ba%e8%83%bd%ef%bc%88ai%ef%bc%89%e4%be%b5%e6%9d%83%e5%ae%a1%e6%9f%a5%e7%9a%84%e6%a0%b8%e5%bf%83/

There are two common approaches for AI voice: TTS and SVC

TTS (Text-to-Speech) is a very mature approach. Mainstream AI dubbing solutions (like those robot voices in videos) are basically based on TTS. However, TTS has obvious drawbacks: its handling of pauses, intonation, tone, and polyphonic characters makes it clear that it’s an AI voice (the “mechanical feel”).

SVC (Singing Voice Conversion) is a relatively new approach. Unlike TTS which reads text directly, it works more like a “voice changer,” directly transforming a segment of audio into a different timbre. This preserves the original audio’s intonation and other “human characteristics,” reducing the “mechanical voice” feel. Currently, mainstream AI singer solutions are based on this.

Whether TTS or SVC, the materials and process required for training are similar

Both involve obtaining clean audio files, then running the algorithm model chosen by the trainer

To calculate the voice features and generate a model file that matches them (the AI voice model)

After training is complete

TTS uses the model to directly “read” text

SVC analyzes the difference between the original audio’s vocals and the model’s features, “transforming” the original vocals to better match the model’s “voice”
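The difference between the two workflows can be shown with a deliberately tiny toy sketch. This is not how any real TTS or SVC system works: a real model learns far richer features than a single number. Every name here (`VoiceModel`, `train_voice_model`, `tts_read`, `svc_convert`) and the idea of representing audio as a list of per-frame pitch values are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VoiceModel:
    """Toy stand-in for a trained AI voice model: a single feature,
    the speaker's average pitch in Hz (an illustrative assumption)."""
    mean_pitch_hz: float


def train_voice_model(clean_clips):
    """'Training': calculate a voice feature from clean audio.
    Each clip is a list of per-frame pitch values in Hz (hypothetical format)."""
    all_frames = [frame for clip in clean_clips for frame in clip]
    return VoiceModel(mean_pitch_hz=mean(all_frames))


def tts_read(model, text):
    """TTS-style inference: text in, pitch contour out.
    One flat frame per character, which mimics the 'mechanical feel'."""
    return [model.mean_pitch_hz for _ in text]


def svc_convert(model, source_contour):
    """SVC-style inference: rescale an existing vocal toward the model's
    voice while keeping the source's rises and falls (its intonation)."""
    ratio = model.mean_pitch_hz / mean(source_contour)
    return [frame * ratio for frame in source_contour]


# Usage: "train" on two clean clips, then run both kinds of inference.
model = train_voice_model([[100.0, 120.0], [110.0, 110.0]])
flat = tts_read(model, "hi")                     # same pitch on every frame
converted = svc_convert(model, [200.0, 240.0])   # contour shape is preserved
```

In this toy, the TTS output is flat because nothing but text goes in, while the SVC output inherits the shape of the source performance. That is the intuition behind why SVC-based AI singers sound less mechanical.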

3、Legal Protection of AI Voice in China

China’s Civil Code already has provisions for protecting natural persons’ voices:

Civil Code, Article 1023
Licensing of names shall refer to the relevant provisions on licensing of portraits.
Protection of natural persons’ voices shall refer to the relevant provisions on protection of portrait rights.

By mechanically swapping “voice” for “portrait” in the Civil Code’s portrait rights provisions, we can roughly derive the following content about “voice rights” (note: this is for convenient reading and is not actual legal text):

Natural persons have the right to voice, and are entitled to produce, use, disclose, or license others to use their voice in accordance with the law.

No organization or individual may infringe upon others’ right to voice by means of defamation, damage, or fabrication using information technology, among others. Without the consent of the voice right holder, no one may produce, use, or disclose the voice right holder’s voice, unless otherwise provided by law.

Without the consent of the voice right holder, the owner of the voice work may not use or disclose the voice right holder’s voice through publication, reproduction, distribution, rental, exhibition, etc.

The following acts, when carried out reasonably, may be done without the consent of the voice right holder:
(1) Using the voice right holder’s already disclosed voice within the necessary scope for personal study, artistic appreciation, classroom teaching, or scientific research;
(2) Inevitably producing, using, or disclosing the voice right holder’s voice for news reporting;
(3) Producing, using, or disclosing the voice right holder’s voice within the necessary scope for state organs to perform their duties according to law;
(4) Inevitably producing, using, or disclosing the voice right holder’s voice for displaying a specific public environment;
(5) Other acts of producing, using, or disclosing the voice right holder’s voice for safeguarding the public interest or the legitimate rights and interests of the voice right holder.

The core of whether AI voice constitutes infringement, in the author’s opinion, comes down to two points:

  • Whether training AI voice constitutes “fabrication”

  • Whether using AI voice models to generate audio files constitutes “using the voice right holder’s voice”

“Fabrication” Issue

Unlike AI painting

If AI painting is a reproduction based on “commonality”

Generating a combination of certain “common” painting styles or content

Then AI voice is a reproduction of a certain voice’s “features”

The goal is to generate a voice consistent with the “features” (and the ultimate optimization goal of AI voice is necessarily the absolute reproduction of a specific voice)

Similar to this is “AI face-swapping” technology

If targeting a specific person, AI voice training aims to reproduce that person’s voice, using this voice to output audio files of specific content (presumably no one would train it without outputting)

Then, whether from a technical or usage purpose perspective, it may meet the standard of “fabrication,” constituting infringement on the voice source.

However

When using multiple different voice sources and training them together into a voice model such as “the most beautiful voice” or “common male/female voices from a certain region,” where the new voice model is based on the “features” of the voice sources and the results are not similar to any single source, the author believes this would be an exception and would not result in infringement.

Usage Issues

Regarding the use of AI voice, the author believes there are currently several directions:

  • Impersonating a person to commit illegal acts (there have been cases of fraud using AI face-swapping + AI voice)

  • Entertainment (such as AI singers)

  • Reducing commercial costs (using professional voice actors’ voices to dub one’s own works)

The first one doesn’t need elaboration—everyone knows it’s infringement

The second one, at first glance, seems like fair use

Especially the large number of AI singer videos on video sites

Uploaders don’t get financial gain from it (excluding video view incentives and donations for now)

It seems more like videos made just “for personal appreciation and learning”

But unfortunately

In the AI era, the “use” of “portrait rights” (or “voice rights”) can no longer be generalized as before

In the past, “use” meant simple “use” as per the “literal meaning”

But from the above, we understand that using AI voice requires first training it.

When using AI voice for covers (or other uses), there will inevitably be a “training” process, and the purpose of training is to reproduce the singer’s voice, which clearly meets the criteria of “fabrication using information technology means.”

Therefore, under current law

Using others’ voices to train voice models that are consistent or significantly similar to the original and then using them

Will very likely constitute civil infringement

And require bearing tort liability

As for the third situation, similarly

It also constitutes infringement

And depending on actual usage, may give rise to additional liability:

  • Directly advertising as “dubbed by someone (whether real name or nickname)” may constitute infringement of name rights and false advertising;

  • If the voice is not disclosed as AI-generated, with the intent to skirt the rules using “similar-sounding voices,” it may violate the “Regulations on the Management of Deep Synthesis of Internet Information Services.”

Article 17: Deep synthesis service providers providing the following deep synthesis services that may cause public confusion or misidentification shall make prominent markings at reasonable positions or areas of the generated or edited information content, to inform the public of the deep synthesis situation:

(2) Voice generation such as synthetic voices and voice imitation, or editing services that significantly change personal identity characteristics;

Deep synthesis service providers providing deep synthesis services other than those specified above shall provide prominent marking functions and remind deep synthesis service users that they can make prominent markings.


However

If you directly use a trained model + clearly mark it as AI-generated

But don’t explicitly state the training voice source

Even if it’s clearly infringement

Under current technical conditions

It’s not easy to find evidence of infringement

After all, voices and even faces

Can inevitably be similar to some extent around the world

Even Jackson Wan said

Registration is just to avoid “too many versions and not knowing which one is real,” so you have to register a specific version

How to prove simply and convincingly that “this AI voice is me”

Will, I believe, become a shared goal for law and technology in the next stage

4、Personal Thoughts

Having discussed the legal issues

I also want to share some personal thoughts

Jackson Wan’s song

Besides letting listeners know about AI voice

Also brings a thought

As the lyrics say

“The mission has been passed to AI”

Can we train our own AI voice or even AI face data

To let ourselves convey emotions or express love in another way?

For example

Letting an old father who has difficulty expressing emotions read his letter to his children using AI voice, making it easier;

For example

Letting an elder lost to an accident or illness have the chance to “personally” sing a birthday song to their descendants through an AI voice model they left behind;

For example

This little story from “Animal Crossing”

If a mother’s AI voice could be included in the letter

Would the emotional communication be better?

A widely circulated “Animal Crossing” story

Another story:

After all, training your own personal AI voice definitely won’t constitute infringement

And you can also follow Jackson Wan’s example

Register your AI voice model in advance

To prevent others from infringing on it

Instead of resisting AI

Why not let AI become your avatar

Become “the me in the mirror”

The me in the mirror and me
Are close friends
—— “Dear Myself”

Boyang Li
Author

Chinese Attorney — Beijing Longan (Guangzhou) Law Firm

A lawyer focused on game law, AI regulation, data compliance, and digital content rights. I write about practical legal insights for innovative tech teams.
