Xiaocha, reporting from Aofei Temple
QbitAI report | Official account: QbitAI
If you watch CCTV news, you surely remember the “duanzishou” (joke master) Zhu Guangquan driving the sign language interpreter to distraction.
Sign language news helps hearing-impaired people better understand the world .
But have you ever wondered: now that automatic subtitle generation is mature and AI can quickly convert speech into text, why does TV news still need sign language?
A group of programmers builds a sign language anchor for deaf people
Among China's 1.4 billion people, 27 million are hearing-impaired.
These people vary widely in age and education. Many hearing-impaired people have limited formal education, and many are more comfortable with sign language than with written text.
Sign language also differs from spoken language in its way of thinking and its word order.
For example, we would generally say “no drinking when driving,” but in sign language this is expressed as three gestures in order: “drive,” “drink,” “not allowed.”
Most TV programs follow ordinary spoken word order and pay little attention to the special structure of sign language. As a result, the vast majority of hearing-impaired viewers understand less than 60% of the content.
In particular, when breaking news such as an epidemic outbreak is broadcast, there are often no real-time subtitles, which makes it even harder for these viewers to get information.
While the rest of us scroll through short videos and watch the news on our phones, hearing-impaired people cannot take in as much of this information because sign language is missing. Many of them struggle to integrate into society and risk being forgotten.
So a group of Sogou programmers wanted to do something for deaf people.
At this year's Sohu 5G&AI Summit, Sogou released the latest generation of its AI synthetic anchor: “Xiao Cong,” the world's first sign language AI synthetic anchor.
AI sign language is not that simple
In 2018, Sogou partnered with the new media arm of Xinhua News Agency and, using Xinhua anchors Qiu Hao and Qu Meng as prototypes, created the world's first AI synthetic anchors, “New Xiaohao” and “New Xiaomeng.”
Now, with an upgrade to Sogou's avatar technology, the “duanzishou” Zhu Guangquan has met a real AI “rival”: the sign language AI synthetic anchor “Xiao Cong,” which can turn all kinds of complex language into sign language that is easier for hearing-impaired people to understand.
“Xiao Cong” uses industry-leading 3D relight scan reconstruction, facial muscle driving, and facial-expression and body-gesture capture technology to produce a digital human model with faithfully reproduced hair and skin, a vivid appearance, and natural, lifelike movements. This substantial gain in realism can markedly improve the authenticity and approachability of sign language broadcasts, and with them the viewing experience.
Sogou said that in evaluations “Xiao Cong” reached an intelligibility of more than 85%, a significant improvement in information transfer over plain text that can effectively help hearing-impaired people overcome comprehension barriers.
Going from an AI news anchor to an AI sign language anchor looks like a routine upgrade, but many difficulties lie behind it.
First, the programmers developing the sign language AI synthetic anchor are all hearing people and did not know enough about sign language.
At first they thought they only needed to build a model that converts speech into visuals, but in practice the problem was not as simple as it seemed.
A Sogou employee who took part in the development said they faced three major difficulties: first, the word order of sign language differs from that of spoken language; second, some spoken words have no counterpart in sign language; and third, facial expression and manner are also a very important part of signing.
All of these factors mean that building a sign language AI synthetic anchor is not easy.
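The first two difficulties can be made concrete with a toy sketch. The Python below is purely illustrative and is not Sogou's method: the vocabulary, the "move negation to the end" rule, and the function name `to_sign_glosses` are all assumptions made up for this example, loosely based on the “drive,” “drink,” “not allowed” ordering described earlier. It drops words that have no sign in a hypothetical vocabulary and moves the negation to the end.

```python
# Toy illustration only. The vocabulary and the reordering rule are
# assumptions for this example, not Sogou's system and not a real
# description of Chinese Sign Language grammar.

NEGATIONS = {"not"}                          # hypothetical negation words
SIGN_VOCABULARY = {"drive", "drink", "not"}  # hypothetical words that have a sign


def to_sign_glosses(words):
    """Turn spoken-order words into a sign-like gloss sequence.

    Step 1: drop words with no sign in the (hypothetical) vocabulary.
    Step 2: move negation words after the content words.
    """
    known = [w for w in words if w in SIGN_VOCABULARY]
    content = [w for w in known if w not in NEGATIONS]
    negation = [w for w in known if w in NEGATIONS]
    return content + negation


# "when" has no sign in the toy vocabulary and is dropped;
# "not" moves to the end of the sequence.
print(to_sign_glosses(["when", "drive", "not", "drink"]))
```

A real system would need far more than a fixed rule like this, and the third difficulty (facial expression and manner) is not captured by word reordering at all.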
Second, the national sign language standard was only established in 2019, and there was no ready-made sign language video dataset available in the industry. So Sogou invited an “advisory group” made up of three kinds of people to offer advice and suggestions.
It included experts on the sign language standard, sign language teachers, and hearing-impaired people who use sign language.
Sogou collected their sign language data and listened to how they use it. After more than a year of refinement, “Xiao Cong” finally went live.
Among so many AI technology companies, why was Sogou the first to launch a sign language AI synthetic anchor?
It is no accident. Sogou has been exploring AI digital human technology since 2018 and already has 3 years of successful deployment experience with AI synthetic anchors.
This time, Sogou released not only the sign language AI synthetic anchor but also a “digital human” modeled on Liu Yan, which can switch seamlessly among multiple dialects within a single news item, even though the real Liu Yan does not speak those dialects.
This technology needs only a small amount of real voice and video data to customize a high-fidelity avatar model, and it has been successfully used by Xinhua News Agency, CCTV, and other media.
The Sogou AI team has continued its research in hyper-realistic 3D digital humans and made new breakthroughs. For the sign language digital human “Xiao Cong” released this time, combined with force technology, industry-leading 3D relight scanning was used to reconstruct the high-precision model of the sign language digital human and the captured animation data. Assisted by in-house facial-expression and body-gesture capture technology, this created a digital human model with faithfully reproduced hair and skin, a vivid appearance, and natural, lifelike movements.
While actively exploring real-world AI deployment, the Sogou technical team has also “quietly” built up a lot of foundational technology.
One example is multimodal language processing, which uses images and video to improve AI's ability to process language.
For example, in 2019 a Sogou paper on using lip reading to improve speech recognition accuracy was presented at ICASSP, a top academic conference in signal processing.
Sogou has also explored digital human body-driving technology extensively, and last year it published a paper at ACM MM 2020 on a virtual human dancing to the rhythm of music.
Of course, the most important factor is probably the Sogou AI team's sense of mission about technology.
Chen Wei, general manager of Sogou's AI Interaction Technology Department, said one incident moved him deeply.
One night in 2019, he saw a deaf user on Weibo complaining about Sogou's voice conversion feature. After some back-and-forth, Sogou solved the technical problem.
In fact, long before the sign language anchor, these hearing-impaired people were already using Sogou's speech recognition technology to communicate with hearing people.
So after 2020, as Sogou's 3D digital human technology matured, the Sogou team came up with an idea: to create a truly valuable AI anchor.
As the creator of the AI synthetic anchor, Sogou stands at the forefront of technology while also thinking about technology's social responsibility.
As the world's first sign language AI synthetic anchor, “Xiao Cong” can help hearing-impaired people receive information better and live better, and it reflects the humanistic care behind Sogou's AI technology.
As for when we will see “Xiao Cong” on TV, Chen Wei said: “Large-scale application is expected by the end of this year.”