Staging area, working tree, and HEAD commit
Until now, we have only mentioned the staging area (also known as the index) in passing, while preparing files for a new commit with the git add command.
Well, this is exactly what the staging area is for. When you change the content of a file, add a new one, or delete an existing one, you have to tell Git which of these modifications will be part of the next commit: the staging area is the container for this kind of data.
Let's focus on this right now; move to the master branch, if you are not already there, then type the git status command, which shows the current status of the working tree and the staging area:
[1] ~/grocery (master)
$ git status
On branch master
nothing to commit, working tree clean
Git says there's nothing to commit, our working tree is clean. But what's a working tree? Is it the same as the working directory we talked about? Well, yes and no, and it's confusing, I know.
Git had (and still has) some trouble with names; in fact, as we said a couple of lines ago, even the staging area has two names (the other one is index). Git uses both in its messages and command output, and so do people, blogs, and books like this one when talking about Git. Having two names for exactly the same thing is not always a good idea, but being aware of it is enough (time will give us a less confusing Git, I'm sure).
For the working tree and working directory, the story is this. At some point, someone argued: if I'm in the root of the repository I'm in a working directory, but if I move into a subfolder, I'm in another working directory. This is technically true from a filesystem perspective, but in Git, operations such as checkout or reset do not affect only the current working directory: they affect the entire... working tree. So, to avoid confusion, Git stopped talking about the working directory in its messages and "renamed" it to working tree. This is the commit in the Git repository that made this change, if you want to dig deeper: https://github.com/git/git/commit/2a0e6cdedab306eccbd297c051035c13d0266343. I hope this has clarified things a little.
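If you want to see the difference with your own eyes, here is a minimal sketch to run in a throwaway repository. The temporary folder and the fruits/citrus subfolder are invented for the example and are not part of our grocery repository. It relies on git rev-parse, which can report both the root of the working tree and where you currently are inside it.

```shell
# Throwaway repository in a temporary folder (not the grocery one).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical subfolders, created only to move away from the root.
mkdir -p fruits/citrus
cd fruits/citrus

# The shell's working directory is the subfolder...
pwd

# ...but the working tree is the whole project, starting from its root.
toplevel=$(git rev-parse --show-toplevel)
echo "working tree root: $toplevel"

# Where we are, relative to the root of the working tree.
prefix=$(git rev-parse --show-prefix)
echo "position inside the working tree: $prefix"   # fruits/citrus/
```

Whatever subfolder you run it from, the working tree root stays the same; only the position inside it changes.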
Back on topic now.
Add a peach to the shoppingList.txt file:
[2] ~/grocery (master) $ echo "peach" >> shoppingList.txt
Then use the newly learned git status command again:
[3] ~/grocery (master)
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   shoppingList.txt
no changes added to commit (use "git add" and/or "git commit -a")
Okay, now it's time to learn about staged changes; with the word staged, Git means modifications we already added to the staging area, so they will be part of the next commit. In the current situation, we modified the shoppingList.txt file, but we have not added it yet to the staging area (using the good old git add command).
So, Git informs us: it tells us that there is a modified file (shown in red), and then offers two possibilities: stage it (add it to the staging area), or discard the modification using the git checkout -- <file> command.
Let's try to add it; we will see the second option later.
So, try a git add command, with nothing more:
[4] ~/grocery (master)
$ git add
Nothing specified, nothing added.
Maybe you wanted to say 'git add .'?
Okay, new thing learned: git add wants you to specify something to add. A common choice is the dot ., which stands for the current folder and by default means: add all the changes in this folder and its subfolders to the staging area. When you run it from the root of the repository, this is the same as git add -A (or --all), and by "all" I mean:
- Files in this folder and sub-folders I added in the past at least one time: This set of files is also known as the tracked files
- New files: These are called untracked files
- Files marked for deletion
Be aware that this behavior changed over time: before Git 2.x, git add . and git add -A had different effects. Here is a table for quickly understanding the differences.
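Git version 1.x:
Command                       New files  Modified files  Deleted files
git add -A                    yes        yes             yes
git add .                     yes        yes             no
git add -u                    no         yes             yes
Git version 2.x:
Command                       New files  Modified files  Deleted files
git add -A                    yes        yes             yes
git add .                     yes        yes             yes
git add --ignore-removal .    yes        yes             no
git add -u                    no         yes             yes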
As you can see, in Git 2.x there's a new way to stage new and modified files only, the git add --ignore-removal . way, and then git add . became the same as git add -A. If you are wondering, the -u option is the equivalent of --update.
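The Git 2.x behavior can be checked in a throwaway repository. What follows is just a sketch under a few assumptions: Git 2.x or later is installed, the file names are invented for the test, and staged_by is a hypothetical helper function that stages with the given options, prints what ended up in the staging area, and then unstages everything again.

```shell
# Throwaway repository with two committed files (names are made up).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "banana" > modified.txt
echo "apple" > deleted.txt
git add .
git commit -q -m "Initial commit"

# One change of each kind: a modification, a deletion, and a new file.
echo "peach" >> modified.txt
rm deleted.txt
echo "onion" > new.txt

# Hypothetical helper: stage with the given options, list the staged
# paths on one line, then reset the staging area to the last commit.
staged_by() {
  git add "$@"
  git diff --cached --name-only | xargs
  git reset -q
}

with_all=$(staged_by -A)
with_dot=$(staged_by .)
with_ignore_removal=$(staged_by --ignore-removal .)
with_update=$(staged_by -u)

echo "git add -A                 : $with_all"             # deleted.txt modified.txt new.txt
echo "git add .                  : $with_dot"             # deleted.txt modified.txt new.txt
echo "git add --ignore-removal . : $with_ignore_removal"  # modified.txt new.txt
echo "git add -u                 : $with_update"          # deleted.txt modified.txt
```

The expected results in the comments hold for Git 2.x when the commands are run from the root of the repository, as they are here.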
Another basic usage is to specify the file we want to add; let's give it a try:
[5] ~/grocery (master) $ git add shoppingList.txt
As you can see, when git add goes right, Git says nothing, no messages: let's consider it a tacit approval.
Other ways to add files are specifying a directory, to add all the changed files within it, or using wildcards such as the star *, with or without something else (for example, *.txt for adding all txt files, foo* for adding all files starting with foo, and so on).
Please refer to https://git-scm.com/docs/git-add#git-add-ltpathspecgt82308203 for all the information.
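Here is a small sketch of these forms, again in a throwaway repository with invented file names. One detail is worth noticing: an unquoted star is expanded by the shell before Git sees it, so it only matches files in the current folder, while a quoted star is handed to Git as is and also matches files in subfolders.

```shell
# Throwaway repository with an empty first commit, so that
# "git reset" can always bring the staging area back to it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

# Invented files, some of them in a subfolder.
mkdir notes
echo "banana" > shoppingList.txt
echo "todo" > foo1.md
echo "more" > foo2.md
echo "idea" > notes/ideas.txt
echo "log" > notes/build.log

# Unquoted star: the shell expands it, only the current folder matches.
git add *.txt
by_shell=$(git diff --cached --name-only | xargs)
git reset -q

# Quoted star: Git expands it, files in subfolders match too.
git add '*.txt'
by_git=$(git diff --cached --name-only | xargs)
git reset -q

# Star after a prefix: all files starting with foo.
git add foo*
by_prefix=$(git diff --cached --name-only | xargs)
git reset -q

# A directory: everything changed inside it.
git add notes
by_dir=$(git diff --cached --name-only | xargs)

echo "git add *.txt   : $by_shell"    # shoppingList.txt
echo "git add '*.txt' : $by_git"      # notes/ideas.txt shoppingList.txt
echo "git add foo*    : $by_prefix"   # foo1.md foo2.md
echo "git add notes   : $by_dir"      # notes/build.log notes/ideas.txt
```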
Okay, time to look back at our repository; go with a git status now:
[6] ~/grocery (master)
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        modified:   shoppingList.txt
Nice! Our file has been added to the staging area, and now it is one of the changes that will be part of the next commit, the only one actually.
Now take a look at what else Git says: if you want to unstage the change, you can use the git reset HEAD <file> command. What does it mean? To unstage is to remove a change from the staging area, for example because we realized we want that change to go not into the next commit, but into a later one.
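If you are curious about what unstaging does without touching the grocery repository, here is a minimal sketch in a throwaway one (the temporary folder is an assumption of the example). It stages a change, unstages it with git reset HEAD, and checks that the edit is still in the working tree. Recent Git versions (2.23 and later) suggest git restore --staged <file> in the status hint for the same purpose.

```shell
# Throwaway repository with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'banana\napple\norange\n' > shoppingList.txt
git add shoppingList.txt
git commit -q -m "Add a shopping list"

# Modify the file and stage the change.
echo "peach" >> shoppingList.txt
git add shoppingList.txt
staged_before=$(git diff --cached --name-only)

# Unstage: the staging area goes back to the version in HEAD...
git reset -q HEAD shoppingList.txt
staged_after=$(git diff --cached --name-only)

# ...while the file in the working tree keeps the edit.
last_line=$(tail -n 1 shoppingList.txt)

echo "staged before reset: $staged_before"           # shoppingList.txt
echo "staged after reset : ${staged_after:-nothing}" # nothing
echo "last line on disk  : $last_line"               # peach
```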
For now, leave things how they are, and do a commit:
[7] ~/grocery (master)
$ git commit -m "Add a peach"
[master 603b9d1] Add a peach
 1 file changed, 1 insertion(+)
Check the status:
[8] ~/grocery (master)
$ git status
On branch master
nothing to commit, working tree clean
Okay, now we have a new commit and our working tree is clean again; that's because git commit creates a new commit from the content of the staging area, and once that is done no staged change is left waiting.
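A small sketch can make this more concrete. It runs in a throwaway repository (an assumption of the example, like the commit messages) and lists the staged changes before and after a commit. It also shows that the committed file is still recorded in the index: what disappears after the commit is the difference between the staging area and the last commit, not the file entries themselves.

```shell
# Throwaway repository with a first commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "banana" > shoppingList.txt
git add shoppingList.txt
git commit -q -m "Add a banana"

# Stage a new change.
echo "peach" >> shoppingList.txt
git add shoppingList.txt
before=$(git diff --cached --name-only)

# Commit it: the staged change becomes part of the new commit.
git commit -q -m "Add a peach"
after=$(git diff --cached --name-only)

# The file is still tracked in the index after the commit.
tracked=$(git ls-files)

echo "staged before commit: $before"            # shoppingList.txt
echo "staged after commit : ${after:-nothing}"  # nothing
echo "tracked in the index: $tracked"           # shoppingList.txt
```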
Now we can make some experiments and see how to deal with the staging area and working tree, undoing changes when needed.
So, follow me and let's make things more interesting; add an onion to the shopping list and add the change to the staging area, then add garlic to the list and see what happens:
[9] ~/grocery (master)
$ echo "onion" >> shoppingList.txt
[10] ~/grocery (master)
$ git add shoppingList.txt
[11] ~/grocery (master)
$ echo "garlic" >> shoppingList.txt
[12] ~/grocery (master)
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        modified:   shoppingList.txt
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   shoppingList.txt
Okay, good! We are in a very interesting state now. Our shoppingList.txt file has been modified twice, and only the first modification has been added to the staging area. This means that if we committed at this point, only the onion modification would be part of the commit, not the garlic one. This is worth underlining, as in other versioning systems this kind of work is not so simple.
To highlight the modifications we made, and take a brief look at them, we can use the git diff command; for example, if you want to see the difference between the working tree version and the staging area one, run git diff without any option or argument:
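To convince yourself without committing anything in the grocery repository, you can replay the same situation in a throwaway one. The following is only a sketch (the temporary folder and the commit messages are assumptions): it commits while garlic is still unstaged, then compares the last line of the committed file with the last line of the file on disk. The short form of git status is used here because it shows the two states side by side: the first column refers to the staging area, the second to the working tree.

```shell
# Throwaway repository that mimics the shopping list.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'banana\napple\norange\npeach\n' > shoppingList.txt
git add shoppingList.txt
git commit -q -m "Add a shopping list"

# Stage the onion, then add garlic without staging it.
echo "onion" >> shoppingList.txt
git add shoppingList.txt
echo "garlic" >> shoppingList.txt

status_before=$(git status --short)
echo "$status_before"     # MM shoppingList.txt (staged and unstaged changes)

# Commit: only what is in the staging area goes into the commit.
git commit -q -m "Add an onion"

committed_last=$(git show HEAD:shoppingList.txt | tail -n 1)
working_last=$(tail -n 1 shoppingList.txt)
status_after=$(git status --short)

echo "last line in the commit: $committed_last"   # onion
echo "last line on disk      : $working_last"     # garlic
echo "$status_after"      # " M shoppingList.txt" (garlic is still unstaged)
```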
[13] ~/grocery (master)
$ git diff
diff --git a/shoppingList.txt b/shoppingList.txt
index f961a4c..20238b5 100644
--- a/shoppingList.txt
+++ b/shoppingList.txt
@@ -3,3 +3,4 @@ apple
 orange
 peach
 onion
+garlic
As you can see, Git highlights the fact that the working tree version has one more line, garlic, than the staging area version.
The last part of the output of the git diff command is not difficult to understand: green lines starting with a plus + symbol are newly added lines (there would be red lines starting with a minus - for deleted lines). A modified line is usually shown by Git as a red deleted line with a minus, followed by a green added line with a plus; to be honest, Git can be instructed to use different diff algorithms, but this is beyond the scope of this book.
Apart from this, the first part of the git diff output is a bit too involved to explain in a few words; please refer to https://git-scm.com/docs/git-diff for all the details.
But what if you want to see the differences between the last committed version of the shoppingList.txt file and the one added into the staging area?
We have to use the git diff --cached HEAD command:
[14] ~/grocery (master)
$ git diff --cached HEAD
diff --git a/shoppingList.txt b/shoppingList.txt
index 175eeef..f961a4c 100644
--- a/shoppingList.txt
+++ b/shoppingList.txt
@@ -2,3 +2,4 @@ banana
 apple
 orange
 peach
+onion
Let's dissect this command to better understand its purpose. By appending the HEAD argument, we ask Git to use the last commit we made as the term of comparison. To be honest, in this case the HEAD reference is optional, as it is the default: git diff --cached would return the same result.
Instead, the --cached option says, compare the argument (HEAD in this case) with the version in the staging area.
Yes, dear friends: the staging area, also known as the index, is sometimes called the cache too, hence the --cached option.
The last experiment that we can do is compare the HEAD version with the working tree one; let's do it with a git diff HEAD:
[15] ~/grocery (master)
$ git diff HEAD
diff --git a/shoppingList.txt b/shoppingList.txt
index 175eeef..20238b5 100644
--- a/shoppingList.txt
+++ b/shoppingList.txt
@@ -2,3 +2,5 @@ banana
 apple
 orange
 peach
+onion
+garlic
Okay, it works as expected.
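The three comparisons can also be put side by side. Here is a minimal sketch that rebuilds the same state in a throwaway repository (an assumption of the example, so the grocery one stays untouched) and uses the --numstat option of git diff to count the added lines in each comparison, instead of reading the full output.

```shell
# Throwaway repository that mimics the shopping list.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'banana\napple\norange\npeach\n' > shoppingList.txt
git add shoppingList.txt
git commit -q -m "Add a shopping list"

# Same state as in the text: onion staged, garlic not staged.
echo "onion" >> shoppingList.txt
git add shoppingList.txt
echo "garlic" >> shoppingList.txt

# --numstat prints "added<TAB>deleted<TAB>path"; keep the first field.
worktree_vs_index=$(git diff --numstat | cut -f1)
index_vs_head=$(git diff --cached --numstat | cut -f1)
worktree_vs_head=$(git diff --numstat HEAD | cut -f1)

echo "working tree vs staging area: +$worktree_vs_index"   # +1 (garlic)
echo "staging area vs HEAD        : +$index_vs_head"       # +1 (onion)
echo "working tree vs HEAD        : +$worktree_vs_head"    # +2 (onion and garlic)
```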
Now it's time to take a break from the console and spend a couple of words to talk about these three locations we compared.