So I came across the following passage in section 3.1 of Pro Git:


"Let’s assume that you have a directory containing three files, and you stage them all and commit. Staging the files computes a checksum for each one (the SHA-1 hash we mentioned in Getting Started), stores that version of the file in the Git repository (Git refers to them as blobs), and adds that checksum to the staging area"


My question is this: Why does git "store a version of the file in the Git repository" prior to me committing those files?


There's a very mechanical answer (which I see siride mentioned in a comment): Git's index, the data structure Git uses to build up the next commit, records each file's content only as a blob hash ID (alongside the file's path and mode). Therefore, in order to have a copy of the file in the index (so that it will be in the next commit), the content must already be in the repository as a blob object.
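To make "blob hash ID" concrete: a blob's ID is the SHA-1 of a short header (the word "blob", the content size in bytes, and a NUL byte) followed by the file's content. Here is a minimal Python sketch of that computation. The function name blob_hash_id is my own, not anything in Git, and the sketch assumes a repository using the default SHA-1 object format.

```python
import hashlib


def blob_hash_id(content: bytes) -> str:
    """Return the hash ID Git would assign to a blob with this content.

    Assumes the default SHA-1 object format. The name blob_hash_id is
    illustrative only; it is not part of Git.
    """
    # Git hashes a header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # An empty file always gets the same, well-known blob ID.
    print(blob_hash_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

For the same bytes, this should agree with what git hash-object prints. Note that the file's name plays no part in the hash, which is why two files with identical content share one blob.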


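There's a performance answer: because the index already holds the hash IDs of blobs that are in the repository, a plain git commit does not have to read, hash, or compress any file content. It only has to write the tree objects and the commit object, so Git makes new commits very quickly.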
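A toy model may help show how the work is divided between the two commands. This is only a sketch: the ToyRepo class, its method names, and the simplified tree and commit payloads are all invented for illustration and do not match Git's real on-disk formats. What it does mirror is the split described above: add stores the blob and records its hash ID in the index, and commit works from the index alone.

```python
import hashlib


class ToyRepo:
    """Toy model of add/commit. Not Git's real formats; names are hypothetical."""

    def __init__(self):
        self.objects = {}  # hash ID -> stored bytes (the "repository")
        self.index = {}    # path -> blob hash ID (the "staging area")

    def _store(self, kind: str, payload: bytes) -> str:
        # Hash a "<kind> <size>\0" header plus payload, then store the result.
        data = kind.encode() + b" " + str(len(payload)).encode() + b"\0" + payload
        oid = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(oid, data)
        return oid

    def add(self, path: str, content: bytes) -> None:
        # Staging writes the blob immediately and records only its hash ID.
        self.index[path] = self._store("blob", content)

    def commit(self, message: str) -> str:
        # Committing never touches file content: it reads hash IDs from the
        # index, writes one (simplified) tree, then one (simplified) commit.
        listing = "".join(f"{oid} {path}\n" for path, oid in sorted(self.index.items()))
        tree = self._store("tree", listing.encode())
        return self._store("commit", f"tree {tree}\n\n{message}\n".encode())


if __name__ == "__main__":
    repo = ToyRepo()
    repo.add("README", b"hello\n")
    print(len(repo.objects))  # the blob exists before any commit is made
    repo.commit("first commit")
    print(len(repo.objects))  # commit added only a tree and a commit object
```

In this model the expensive part (reading and hashing content) happens once per add, while commit costs about the same no matter how large the staged files are.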

There's a data-recovery answer (which is kind of weak): by storing the blob in the repository in advance, you can get it back for a while (until git gc prunes it as unreferenced), via git fsck --lost-found, if you accidentally do something bad to it. (The weaknesses here are, or include, that if the content matches a blob that is already referenced in the repository, it is not dangling and so does not show up in the lost-found search; and you lose the file's name, which is often important to understanding its content.)

There's a design-aesthetic answer: perhaps Linus thought that git add file copying the file into the repository early was prettier than having git commit do it later.


You can choose any of these answers, or make up your own!


