Well, of course, it's not magic
No software can guess what happens between the frame with no beer and the frame where the person is holding the beer. There is some academic research going on in this area (use machine learning to create animation from a still frame or frames) but although the results look interesting, it's far from production ready. For now, you will have to do like animators do it and draw each frame by hand.