Find out which of the problems you defined fall into which category (compression, diff images, motion extrapolation, scalable network topology, etc.). Also work out which features you may implement later will affect modules you created earlier.
That is why I really support Beni's suggestion: create a basic framework that supports the most important features you want. You don't need to implement all of these features now, but your framework needs to be adaptable and extendable so it can support them in the future.
And don't think about the network delay too much. If you create a nice extendable framework, you can always worry about delays later. You can't do any more magic than follow the tips given in the papers I posted earlier. The bottleneck will be somewhere you don't expect it anyway.
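To make the "extendable framework" idea a bit more concrete, here is a minimal sketch of one way it could look. It is only an illustration, not a recommendation for your actual design: the names `Stage`, `ZlibStage` and `Pipeline` are made up for this example, and Python is used only because it keeps the sketch short. The point is that features such as compression or diff images become stages you can plug in later without touching the code that sends and receives packets.

```python
import zlib
from abc import ABC, abstractmethod


class Stage(ABC):
    """One processing step applied to every packet (hypothetical interface)."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes:
        """Transform outgoing data."""

    @abstractmethod
    def decode(self, data: bytes) -> bytes:
        """Reverse the transformation on incoming data."""


class ZlibStage(Stage):
    """Example stage: plain zlib compression, added without changing Pipeline."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class Pipeline:
    """Runs packets through all registered stages; starts out empty."""

    def __init__(self) -> None:
        self._stages: list[Stage] = []

    def add_stage(self, stage: Stage) -> "Pipeline":
        self._stages.append(stage)
        return self

    def outgoing(self, data: bytes) -> bytes:
        # Apply stages in registration order before sending.
        for stage in self._stages:
            data = stage.encode(data)
        return data

    def incoming(self, data: bytes) -> bytes:
        # Undo the stages in reverse order after receiving.
        for stage in reversed(self._stages):
            data = stage.decode(data)
        return data
```

You could start with an empty `Pipeline` that passes data through unchanged, and call `add_stage(ZlibStage())` (or a diff or extrapolation stage) once you actually get to that feature.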

I like the questions you ask and the problems you bring to the community, because they show that you are thinking about the right things. If you get stuck, never hesitate to ask us, because networking really is no trivial task.