The Mental Blog

Software with Intellect

iWork’s New File Formats

Nick Heer had an interesting post pointing out that iWork has a new file format. This was subsequently picked up by Michael Tsai on his blog. The gist of the post is that Apple’s new format comprises many small binary files, which is not nearly as useful to power users as the old XML files. The post concludes that it is unclear why Apple would take this apparently backward step.

I don’t know for sure, but I think I can take a pretty good guess at why they have done it. It has nothing to do with being malicious, or trying to stop people seeing into the document’s format, and it has everything to do with iCloud and iOS devices.

Funnily enough, I was involved in discussions on Twitter just last week about how iWork ’09 stored its data. I expressed skepticism that an iOS device could load a large Mac-built document fully into memory, which is what must happen if you are using an atomic document format like an XML file. There is no option to load an XML document partially; it’s all or nothing. (Update: Some have pointed out that you can partially load an XML file. Indeed you can, but saving it with changes becomes difficult.)
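
To illustrate the point in the update, here is a minimal Python sketch of reading one slide out of a single XML file by streaming. The element names (`presentation`, `slide`, `title`) and the `read_slide_title` helper are invented for the example; they are not the real iWork ’09 schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical single-file document; the element names are made up.
DOC = b"""<presentation>
  <slide id="1"><title>Intro</title></slide>
  <slide id="2"><title>Details</title></slide>
  <slide id="3"><title>Summary</title></slide>
</presentation>"""


def read_slide_title(xml_bytes, slide_id):
    """Stream through the XML and return one slide's title."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "slide":
            if elem.get("id") == slide_id:
                return elem.findtext("title")
            elem.clear()  # drop the contents of slides we do not need
    return None


print(read_slide_title(DOC, "2"))  # prints "Details"
```

Reading this way keeps memory use low, but the parser still has to scan every byte ahead of the slide it wants, and writing a change back means rewriting the file from that point on.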

This is the first reason I think Apple have changed formats. They have split that potentially large XML document into many small binary files. Each file can now be loaded in isolation, and this is much better for iOS. Effectively, they have built a partial-loading document format. Closer inspection shows that each slide is a separate file, so they can just load what is on the current slide, and leave the rest on disk.
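
As a rough sketch of what partial loading from many small files could look like, here is some Python. The `slide-N.bin` naming and the `LazyDeck` class are assumptions made up for the example, not Apple's actual layout or code.

```python
import tempfile
from pathlib import Path


class LazyDeck:
    """Loads each slide file only when it is first requested."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self._cache = {}

    def slide_names(self):
        # Listing names reads the directory, not the file contents.
        return sorted(p.name for p in self.folder.glob("slide-*.bin"))

    def load(self, name):
        # Read from disk on first access, then serve from the cache.
        if name not in self._cache:
            self._cache[name] = (self.folder / name).read_bytes()
        return self._cache[name]

    def loaded(self):
        return sorted(self._cache)


def demo():
    # Build a throwaway three-slide document and open only slide 2.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(1, 4):
            (Path(tmp) / f"slide-{i}.bin").write_bytes(f"slide {i} data".encode())
        deck = LazyDeck(tmp)
        deck.load("slide-2.bin")
        return deck.slide_names(), deck.loaded()


names, in_memory = demo()
print(names)
print(in_memory)
```

After the demo runs, all three slides are listed but only `slide-2.bin` has been read into memory. The rest stay on disk.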

Why binary? Probably just because it is smaller, and faster to load. Again, big benefits on iOS devices.

But there is one other reason I suspect they have gone this route: iCloud. These new iWork apps are designed to work seamlessly with iCloud. The problem with a large XML file is that when it changes, that whole file will probably have to be uploaded to iCloud, and downloaded to all other devices. That could be quite a lot of data transfer for every save.

By splitting the format into many files, Apple only have to upload the files that have changed since the last save, e.g., the slides you have been working on. This is much more friendly to iCloud.
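
One simple way to work out which files have changed is to compare content hashes between saves. The Python below is only a sketch of that idea, with hypothetical file names and helper functions. I have no knowledge of how iCloud actually decides what to upload.

```python
import hashlib


def manifest(files):
    """Map each file name to a SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def files_to_upload(previous, current):
    """Names that are new, or whose digest differs from the previous save."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )


# Hypothetical document contents before and after editing slide 2.
last_save = {"slide-1.bin": b"title", "slide-2.bin": b"chart", "slide-3.bin": b"end"}
this_save = {**last_save, "slide-2.bin": b"chart v2"}

print(files_to_upload(manifest(last_save), manifest(this_save)))  # ['slide-2.bin']
```

Only the edited slide needs to travel over the network; the other two are untouched.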

(It is worth noting that these small files are actually all packed into a single zip archive, so it could be argued you still effectively have one file. But I suspect the iCloud diffing algorithm is capable of uploading just the parts of a zip archive that have changed, and these will be reasonably isolated when many files in the archive are unchanged.)
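
A zip archive stores each member separately, with its own CRC-32 checksum in the archive directory, so the changed members can be picked out without unpacking anything. The Python below sketches that using the standard `zipfile` module. The file names and the `build_archive` and `changed_members` helpers are invented for the example, and this is not a claim about how iCloud really diffs archives.

```python
import io
import zipfile


def build_archive(files):
    """Pack a dict of name -> bytes into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def changed_members(old_zip, new_zip):
    """Compare the per-member CRC-32 values recorded in each archive."""

    def crcs(blob):
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            return {info.filename: info.CRC for info in archive.infolist()}

    old, new = crcs(old_zip), crcs(new_zip)
    return sorted(name for name in new if old.get(name) != new[name])


# Hypothetical document contents before and after editing slide 2.
before = {"slide-1.bin": b"title", "slide-2.bin": b"chart", "slide-3.bin": b"end"}
after = {**before, "slide-2.bin": b"chart v2"}

print(changed_members(build_archive(before), build_archive(after)))
```

Here only `slide-2.bin` is reported as changed, even though both documents are technically a single file on disk.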

  1. folkwolf reblogged this from mentalfaculty and added:
    Apparently Apple has not heard of diff or patch…
  2. mentalfaculty posted this