This is totally theoretical. But say you trained a NN to take the previous 120 frames of a video and predict the next 120, and you could run it 60 times a second. You could then have a playback loop that displays the NN's predicted next frame, then re-predicts based on the frames that actually came in. My theory is that the predictions would be good enough to essentially produce smooth (fake) video from a potentially choppy feed, if it could run fast enough.
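That predict-then-correct loop might look something like this sketch. Everything here is a placeholder assumption: `predict_next_frames` stands in for the trained NN, frames are plain numbers instead of images, and the window size is the 120 frames mentioned above.

```python
from collections import deque

WINDOW = 120  # frames of history the hypothetical model sees

def predict_next_frames(history):
    """Hypothetical stand-in for the trained NN: given the last
    WINDOW frames, return WINDOW predicted future frames. Here it
    just repeats the latest frame so the sketch actually runs."""
    return [history[-1]] * WINDOW

def playback_step(history, incoming_frame):
    """One tick of the predict-then-correct loop: show the model's
    guess for the current frame, then fold the real frame (if one
    arrived) back into the history so later predictions adjust."""
    shown = predict_next_frames(list(history))[0]
    # Fall back to the prediction when the live feed drops a frame.
    history.append(incoming_frame if incoming_frame is not None else shown)
    return shown

# Tiny demo with scalar "frames": the feed drops a frame and the
# loop papers over it with the prediction.
history = deque(range(WINDOW), maxlen=WINDOW)
frame = playback_step(history, None)
```

The key design point is that the history buffer always prefers real frames when they arrive, so the model's predictions are continuously re-anchored to reality rather than drifting.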
The cool thing about this is that you wouldn't actually be watching a live person; you'd be watching the AI's prediction of a live person, projected X frames into the future. This would effectively mean there is no lag, because the AI predicts what you're about to say and do, and is always adjusting based on what you actually said and did. We're talking sub-second here, so it isn't predicting whole sentences; it's predicting changes in tone and pitch, and where your face can get to in the next second. That part is totally possible. The part I think would be impossible is running this fast enough to display the result. Maybe quantum computers.