Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text
Pulkit Tandon
Stanford University
Workshop on Video Analytics 2022
Videos, videos, everywhere…
References:
Cisco, “Cisco visual networking index: global mobile data traffic forecast update, 2017–2022,” accessed 2021. [Online]. Available: https://s3.amazonaws.com/media.mediapost.com/uploads/CiscoForecast.pdf
N. Pandey, A. Pal et al., “Impact of digital surge during Covid-19 pandemic: A viewpoint on research and practice,” International Journal of Information Management, vol. 55, p. 102171, 2020.
Cisco, “Cisco Annual Internet Report (2018–2023) White Paper,” accessed 2021. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
M. Candela, V. Luconi, and A. Vecchio, “Impact of the covid-19 pandemic on the internet latency: A large-scale study,” Computer Networks, vol. 182, p. 107495, 2020.
G. S. Ford, “Covid-19 and broadband speeds: A multi-country analysis,” Available at SSRN 3689044, 2020.
Alice: Hello Bob, how are you doing?
Bob: Hello Alice, I am doing great. What about you?
Alice: Your stream is freezing, Bob!
Bob: OK, let me try switching off the video. Can you hear me better now?
Alice: Much better, wish we could also see each other while talking though.
Can we compress AV content generated via webcams to text and recover videos with similar QoE compared to standard codecs in a low bitrate regime?
YES!
~100,000 bps AV stream
H.264 (95 kbps) + AAC (5 kbps)
~100 bps text stream decoded using Txt2Vid
~100-1000X compression at iso-quality against AVC + AAC
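The compression figure follows from simple arithmetic on the bitrates quoted above. The sketch below is only an illustration: the function and constant names are hypothetical, and the 10 kbps baseline used for the lower end is an assumption taken from the low end of the conventional video stream range, not a number stated for this comparison.

```python
def compression_ratio(baseline_bps: float, compressed_bps: float) -> float:
    """Return how many times smaller the compressed stream is than the baseline."""
    if compressed_bps <= 0:
        raise ValueError("compressed_bps must be positive")
    return baseline_bps / compressed_bps


# Bitrates quoted on the slide, in bits per second.
H264_BPS = 95_000   # H.264 (AVC) video
AAC_BPS = 5_000     # AAC audio
TEXT_BPS = 100      # approximate Txt2Vid text stream

baseline_bps = H264_BPS + AAC_BPS
upper_ratio = compression_ratio(baseline_bps, TEXT_BPS)

# Assumption: a very low-bitrate conventional stream of about 10 kbps.
lower_ratio = compression_ratio(10_000, TEXT_BPS)

print(f"baseline: {baseline_bps} bps")
print(f"ratio vs. 100 kbps baseline: {upper_ratio:.0f}x")
print(f"ratio vs. 10 kbps baseline: {lower_ratio:.0f}x")
```

Under these assumptions the two ratios bracket the ~100-1000X range quoted above.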
Subjective Study (~240 participants)
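[Figure: Conventional Approach vs. Txt2Vid Approach, each between a Sender and a Receiver]
Conventional Approach: Sender → Encoder → ~10-100 kbps Video Stream + ~1-5 kbps Audio Stream → Decoder → Receiver
Txt2Vid Approach, One-Time: Transmission Package with User ID and Driving Video
Txt2Vid Approach, Typical Operation: Speech-to-Text at the Sender → Transmission Package with User ID and text (“Hello, how is it going?”) as a ~100 bps Text Stream → Text-to-Speech and Lip-Sync Video Generation at the Receiver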
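The ~100 bps text stream figure can be sanity-checked with a toy transmission package. This is a minimal sketch, not the Txt2Vid wire format: the `TextPacket` class, the tab-separated encoding, the `"alice"` user ID, and the 2-second utterance duration are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextPacket:
    """Hypothetical transmission package: a user ID plus the transcribed text."""
    user_id: str
    text: str

    def to_bytes(self) -> bytes:
        # Assumed encoding: tab-separated fields, UTF-8, no compression.
        return f"{self.user_id}\t{self.text}".encode("utf-8")


def bitrate_bps(payload: bytes, duration_s: float) -> float:
    """Average bitrate needed to send the payload over the utterance duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return 8 * len(payload) / duration_s


packet = TextPacket(user_id="alice", text="Hello, how is it going?")
payload = packet.to_bytes()

# Assumption: the sentence takes roughly 2 seconds to speak.
rate = bitrate_bps(payload, duration_s=2.0)
print(f"{len(payload)} bytes over 2.0 s is about {rate:.0f} bps")
```

Even without any text compression, this toy package lands in the same order of magnitude as the text stream bitrate shown in the diagram.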
Visit Poster Session to learn more!
Generative ML models at decoder
Lots of Potential Applications
Get Involved: https://github.com/tpulkit/txt2vid