OK, after a quick look at the specs it seems clear that rsync is not a good solution: the file format is a compressed one, so a single new character gives us a completely new file. I thought this was a technique of the 90s, when "distributed" was still something unknown to many programmers.
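To make the rsync point concrete, here is a rough sketch, not a measurement of real ZIM files. It uses zlib as a stand-in compressor (what ZIM actually uses is not checked here) and a simplified, aligned-block version of rsync-style matching. The sample "articles" and the names `block_hashes` and `reusable_fraction` are made up for illustration.

```python
import hashlib
import zlib


def block_hashes(data: bytes, block_size: int = 512) -> list[str]:
    """Hash each fixed-size, aligned block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def reusable_fraction(old: bytes, new: bytes, block_size: int = 512) -> float:
    """Fraction of blocks in `new` that already exist somewhere in `old`."""
    known = set(block_hashes(old, block_size))
    new_blocks = block_hashes(new, block_size)
    return sum(h in known for h in new_blocks) / len(new_blocks)


# Made-up stand-in for a dump of articles (assumption, not real data).
old_text = "\n".join(
    f"Article {i}: some text about topic {(i * 7919) % 1000}" for i in range(5000)
).encode()

# Change one single character near the start of the dump.
changed = bytearray(old_text)
changed[10] = ord("X")
new_text = bytes(changed)

raw_fraction = reusable_fraction(old_text, new_text)
compressed_fraction = reusable_fraction(
    zlib.compress(old_text, 9), zlib.compress(new_text, 9)
)

print(f"uncompressed: {raw_fraction:.1%} of blocks reusable")
print(f"compressed:   {compressed_fraction:.1%} of blocks reusable")
```

On the uncompressed data only the one block containing the edit has to be sent. On the compressed data the bytes after the edit tend to change or shift, so fewer blocks line up; how many survive depends on the compressor and on where the change lands. Real rsync also slides a rolling checksum over every byte offset, which this sketch leaves out.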
Looking at the openzim.org website, I also cannot find a single sentence about synchronization or updating. The concept of git, however, does not seem to be unknown to the developers, as they use it for their own code, so I wonder why they did not make the small jump to understanding that decentralised distribution of content might have something to do with differential updates...
So it seems to be true: they invented a file format for distributing content without thinking about how to update it. In fact, on the kiwix website you find a single 10 GB file for the 2013-01 Wikipedia - so in February one million users will have to download 10.5 GB all over again?
I know ranting is evil, but I am playing this role as a sacrifice for the open source community - we have to speak clearly about this kind of extremely dumb decision to keep hundreds of programmers from making the same error ever again. There is no sense in ignoring this elephant - which is in fact a mammoth! The inventors of this file format should get the prize for the "dumbest programming decision of the month", nothing else. Doing content distribution without thinking about a sane solution for updates is an unbelievably short-sighted WTF.
To be at least a little bit constructive:
- what parts of git can be used to help these guys? Or:
- why is git not good for them?
Can kiwix, instead of using a braindead file format with no support for updates, be rebuilt as a reader / interpreter for Wikipedia markup that sits on top of a git repository? If not, why not?
Using git as a "file backend", transparent to the user, the update problem could be solved very quickly. The outcome might also be usable for many more things if we keep the markup-interpreter part pluggable.
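To illustrate why a git-like backend would help with updates, here is a minimal sketch of the idea only. Nothing here is kiwix or openZIM code. It stores each article as its own object, named the way git names a blob by default (SHA-1 over a short header plus the content), so an update only needs the objects the reader does not have yet. The article texts and the helpers `snapshot` and `objects_to_fetch` are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 over a small header plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(articles: dict[str, bytes]) -> dict[str, str]:
    """Map every article title to the id of its content (hypothetical helper)."""
    return {title: git_blob_id(text) for title, text in articles.items()}


def objects_to_fetch(local: dict[str, str], remote: dict[str, str]) -> set[str]:
    """Ids the reader is missing; everything else is already on disk."""
    return set(remote.values()) - set(local.values())


# Made-up monthly dumps (assumption, not real data).
january = {
    "Git": b"Git is a version control system.",
    "Rsync": b"rsync synchronizes files.",
    "Zip": b"Zip is an archive format.",
}
february = dict(january)
february["Git"] = b"Git is a distributed version control system."  # one edit
february["Kiwix"] = b"Kiwix is an offline reader."  # one new article

missing = objects_to_fetch(snapshot(january), snapshot(february))
print(f"{len(missing)} of {len(february)} objects need to be downloaded")
```

With these sample dumps only the edited article and the new one have to be fetched (2 of 4); the unchanged ones keep their ids. Whether real git scales to millions of articles in one repository is a separate question that this sketch does not answer.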
Forks welcome, Good Luck!
From the first sentence I was wondering "how do they manage incremental updates?" - something you would want a solution for before even starting... The "job offers" section at the end of the interview then reveals: they did not think about it. Can this be true? It simply does not make sense to roll out a new file format without thinking about the key problems BEFORE! But why exactly does rsync not work for this?