Getting over my fear of git
I remember that before starting college, Git was a scary word to me, mainly because I didn’t understand what it was or why it was important. All the Git commands seemed strange and obscure. I was even more confused when people brought up GitHub. What’s the difference between Git and GitHub? Aren’t they the same thing? Why is one in the terminal and the other a website? I had lots of unanswered questions.
During a meeting with a University of Chicago grad who was working on GitHub developer docs, I told him how daunting Git felt to me. I asked him how he had learned Git, and whether there was a class or a specific resource for getting familiar with it. He told me not to worry too much, and that I would eventually pick it up as a student in college. I was a little skeptical.
But he turned out to be right.
One of the first things I learned in my intro CS class was how to use Git to clone, commit, and push my local work to a remote repository hosted on GitHub. This helped me understand the difference between Git and GitHub. Git is the version control system itself, and it lives on your computer. GitHub is simply a service built around Git that hosts your code and helps you share your Git-managed repositories. We could do without GitHub, but we cannot do without Git: that’s the tool you should know how to use.
To get a local git repository started, you can clone (copy) an existing remote repository:
git clone https://github.com/davdma/davdma.github.io.git
You can see the state of your files (modified, staged, untracked) using git status. Then you can commit changes by running:
git add file.py
git commit -m "my first commit"
To upload to the remote repository you just run:
git push
To pull in changes from the repo (when professors would add files we needed), run:
git pull
These commands were all that was needed for class.
Note: If you want to start a new Git repository locally on your computer instead of cloning an existing one, you can just run git init in the project directory. You can always add a remote later using the git remote command.
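The shell script below runs the whole loop end to end without touching GitHub. It is a minimal sketch that assumes Git 2.28 or newer, which is needed for git init -b. Everything in it is a stand-in: a local bare repository plays the role of the GitHub remote, and the directory names, file names, and user identities are hypothetical.

```shell
set -eu
tmp=$(mktemp -d)   # scratch directory for the whole demo
cd "$tmp"

# A bare repository stands in for the GitHub remote.
git init -q --bare -b main remote.git

# Start a new project with git init instead of cloning.
git init -q -b main project
cd project
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo 'print("hello")' > file.py
git add file.py
git commit -q -m "my first commit"

# Add the remote afterwards, then push and set the upstream in one go.
git remote add origin "$tmp/remote.git"
git remote -v
git push -q -u origin main

# A classmate clones the same remote, commits, and pushes.
cd "$tmp"
git clone -q remote.git classmate
cd classmate
git config user.name "Classmate"           # hypothetical identity
git config user.email "classmate@example.com"
echo "# notes" > README.md
git add README.md
git commit -q -m "add README"
git push -q

# Back in the original project, git pull brings the classmate's commit in.
cd "$tmp/project"
git pull -q
git log --oneline
```

Using a bare repository keeps the demo offline. A bare repository has no working tree, which is also how hosted remotes store your code.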
Some things that helped me better wrap my head around what was going on in Git:
- A file is tracked if Git knows about it, so you will want to make sure your most important pieces of code are being tracked. Untracked files are everything else in the directory (if you lose untracked files, Git cannot recover them for you). You can check the status of your files using git status.
- When you run git add, it takes a snapshot of the files you name in your working tree and puts it in the index (also called the staging area). The index lets you selectively stage changes for the next commit instead of committing everything at once.
- Changes flow from working tree -> index -> Git history. The first two are bridged by git add, and the last two by git commit.
- You can list your repository’s remotes with git remote -v (the flag displays each remote’s URL next to its name). You can push to and pull from any of these remotes! To add a new one, run git remote add <name> <url>. You can give it any name you want as long as it’s short and easy to remember.
- To push a branch to a particular remote, run git push <remote> <branch>, e.g. git push origin master.
- Your local repository keeps its own copies of the remote’s branches, called remote-tracking branches (e.g. if the remote is named origin and the branch is main, it is typically referenced as origin/main). By default, each time you git pull, Git actually runs two separate commands behind the scenes: git fetch and git merge. git fetch (kind of like a download) updates your remote-tracking branches to match the hosted remote. To get those updates into your local files, git merge then merges the updated remote-tracking branch into your local branch.
Undoing changes is one of the things that stumped me for a while. How do I undo a git add? What about a particular commit? How do I revert a specific file back to a previous state rather than my entire repo?
If you want to unstage a file you have staged with git add, run:
git reset HEAD <file>
OR
git restore --staged <file>
If you want to discard changes to a tracked file and restore it to its last staged version (which is the last committed version if nothing is staged), run:
git checkout -- <file>
OR
git restore <file>
(Note: HEAD is not moved when you restore a single file like this. To restore a file to its state in an older commit instead, use git restore --source <commit> <file>.)
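The sketch below unstages and then discards an edit in a throwaway repository. The file name and identity are hypothetical. It assumes Git 2.28+ for git init -b; git restore itself needs 2.23+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "version 1" > file.txt
git add file.txt
git commit -q -m "add file"

# Edit the file and stage it by mistake.
echo "version 2" > file.txt
git add file.txt
git status --short                 # "M  file.txt": the change is staged

# Unstage: the edit stays in the working tree.
git restore --staged file.txt      # same effect here as: git reset HEAD file.txt
after_unstage=$(git status --porcelain)
echo "$after_unstage"              # " M file.txt": modified, not staged

# Discard the edit: file.txt goes back to the committed version.
git restore file.txt               # same effect here as: git checkout -- file.txt
git status --short                 # prints nothing: the working tree is clean
```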
For resetting the entire repo state (i.e. including your index and tracked files) to a previous commit, use:
git reset <mode> <commit>
Definitely be careful with commands that discard changes, as those changes may not be recoverable. For instance, when using git reset <commit> you need to be aware that there are three distinct modes: --soft, --mixed (the default), and --hard. The soft mode does not change your index or working tree. It simply moves HEAD (and the current branch) to the given commit, so the changes from the commits you reset past, along with anything you had already staged, remain staged and ready to be committed. The mixed mode resets the index but not the working tree, so you still have all your changes, just not staged for commit. The hard mode resets both the index and the working tree, permanently discarding all uncommitted changes to tracked files and reverting everything to that specific commit. This is the most dangerous mode, so use it with caution.
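The small sketch below shows the three modes side by side. It resets the same commit three times in a throwaway repository and records what ends up staged or unstaged after each mode. The file name and identity are hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo one > notes.txt
git add notes.txt
git commit -q -m "first"
echo two >> notes.txt
git commit -q -am "second"

# --soft: only HEAD moves; the "second" change is still staged.
git reset -q --soft HEAD~1
soft_staged=$(git diff --cached --name-only)
git commit -q -m "second"                  # recommit to try the next mode

# --mixed (default): HEAD and index move; the change is unstaged but still on disk.
git reset -q --mixed HEAD~1
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)
git commit -q -am "second"

# --hard: HEAD, index, and working tree all move; the change is gone.
git reset -q --hard HEAD~1
cat notes.txt                              # prints only "one"
```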
You might wonder why there are both git reset and git restore. git restore is the newer command (added in Git 2.23) and is the preferred way to restore files. But there are some intricacies to them, which I learned while trying to revert just a single file to a previous state. They sound similar, but they actually do different things: git reset <commit> moves HEAD, while git restore never does. It only modifies your working tree (or, with --staged, your index). This is why git restore is the safer operation.
A useful command for probing file states is git diff, which shows the differences between files at particular states. Without any arguments, it shows the changes between your working tree and the index (staging area). To look at changes between your index and a commit, use git diff --staged <commit> (or --cached, a synonym of --staged); the commit defaults to HEAD. For changes between your working tree and a particular commit, use git diff <commit>. Adding --merge-base compares against the merge base of <commit> and HEAD instead. Again, here you see why it is good to know the difference between the working tree and the index.
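The sketch below stages one edit and leaves another unstaged, then compares what each form of git diff reports. The file name is hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "line 1" > app.txt
git add app.txt
git commit -q -m "base"

echo "line 2" >> app.txt
git add app.txt                 # "line 2" is now in the index
echo "line 3" >> app.txt        # "line 3" is only in the working tree

echo "--- git diff (working tree vs index) ---"
git diff                        # shows +line 3 only
echo "--- git diff --staged (index vs HEAD) ---"
git diff --staged               # shows +line 2 only
echo "--- git diff HEAD (working tree vs HEAD) ---"
git diff HEAD                   # shows +line 2 and +line 3
```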
If you just want to see the changes introduced at a commit before you undo it, run:
git show <commit-sha>
Branching is the most powerful feature in Git, and I wish I had learned why sooner. During my software development class, I was working alongside multiple teams of students developing new features for the codebase. When many people are all working on different versions of the code on the main branch, things can get hairy quickly. This is why it is essential to work with different branches. Branching allows you to work on the codebase separately without affecting the main code if something breaks. In this workflow, the main branch (which you might be accustomed to working with) is typically protected, and developers cannot commit to it directly. Instead, you branch off of main, work on that branch and make commits to it separately, and then incorporate your code when it is ready through a pull request (PR). The PR must be reviewed by other developers before it is finally merged into the main branch.
Branching and merging are essential to Continuous Integration (CI), a common practice in software development. CI prevents "integration hell" by frequently merging each developer’s work into the mainline branch.
To create a new branch from your current commit, run:
git branch <name>
OR
git checkout -b <name>
git checkout -b also switches you to the new branch, while git branch only creates it. To switch to an existing branch, use:
git checkout <name>
If you just created a branch locally, you will need to set its upstream (the remote branch it tracks) before a plain git push will work. To set it while pushing, run:
git push -u <remote> <branch>
Note: -u is short for --set-upstream. You can also explicitly set the upstream of your local branch david with git branch -u origin/david (once origin/david exists).
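The sketch below creates a branch and publishes it with an upstream. As before, a local bare repository stands in for GitHub. The branch name david comes from the text; the other names are hypothetical. It assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare -b main remote.git     # stand-in for the GitHub remote
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"
git remote add origin "$tmp/remote.git"
git push -q -u origin main

# Create the branch and switch to it in one step.
git checkout -q -b david
echo work > work.txt
git add work.txt
git commit -q -m "david's work"

# First push of the new branch: -u records origin/david as its upstream.
git push -q -u origin david
git rev-parse --abbrev-ref "david@{upstream}"   # prints origin/david
git branch -vv                                   # shows [origin/david] next to david
```

After this, a plain git push or git pull on david knows which remote branch to use.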
Merging branches takes all the changes from one branch and adds them to another, usually by creating a merge commit (or by simply fast-forwarding, if the branch being merged into has no new commits of its own). If you are on the david branch and you call git merge main, you merge the changes from the main branch into your david branch. If, on the other hand, you want to merge david into main, you need to check out main and run git merge david from there. Often you will want to bring the newest changes from main into the side branch you are working on. To do this, check out main and call git pull to get the recent updates to main, then check out the david side branch again and merge the updates from main in.
Merge commits are special in that they have more than one parent commit (usually two, but more is possible!). When merging two branches, Git looks at the snapshot of the common ancestor of the two branches and the snapshots of the two branch tips, and performs a three-way merge.
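The sketch below merges main into the david side branch after both have moved on. It then checks that the result is a merge commit whose two parents are the two branch tips. It skips the git pull step because there is no remote here. The file names and identity are hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"                    # the common ancestor

git checkout -q -b david
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work on david"
david_tip=$(git rev-parse HEAD)

git checkout -q main
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix on main"
main_tip=$(git rev-parse HEAD)

# From david, merge in main: a three-way merge of base, david's tip, and main's tip.
git checkout -q david
git merge -q --no-edit main                # --no-edit keeps the default message
git log --oneline --graph
```

HEAD^1 is the branch you were on (david), and HEAD^2 is the branch you merged in (main).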
Often you will encounter merge conflicts. This can be daunting at first, but it is actually straightforward once you learn how to resolve them in the editor. At each conflict, Git inserts conflict markers: <<<<<<<, =======, and >>>>>>>. Most editors give you the option to choose between the current and the incoming change, but you are free to modify the lines as you wish (e.g. if you want to take the incoming change but modify it, or combine the current and incoming changes). A common misconception is that you have to choose one or the other, when in fact you can rewrite the section however you like. Once the conflict is resolved and all conflict markers are removed, save the file and stage it with git add (staging marks it as resolved), then finish with git merge --continue.
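The conflict workflow can be scripted, as in the sketch below. It deliberately edits the same line on two branches, then resolves the conflict with a combination of both sides rather than picking one. The file contents are hypothetical. GIT_EDITOR=true simply accepts the default merge message, so the script runs without opening an editor.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "greeting = hello" > config.txt
git add config.txt
git commit -q -m "base"

# Change the same line differently on two branches.
git checkout -q -b david
echo "greeting = hi" > config.txt
git commit -q -am "david: hi"
git checkout -q main
echo "greeting = hey" > config.txt
git commit -q -am "main: hey"

# The merge stops with a conflict and exits non-zero, so run it inside "if".
if git merge --no-edit david; then
  echo "expected a conflict" >&2
  exit 1
fi
cat config.txt        # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by writing any combination you like, not just one side.
echo "greeting = hi and hey" > config.txt
git add config.txt                      # staging marks the conflict as resolved
GIT_EDITOR=true git merge --continue    # accept the default merge message
```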
Once you’ve merged a local branch (e.g. bugFix) and no longer need it, delete it with git branch -d bugFix. This keeps your list of working branches organized. To delete the branch from the remote server, run git push origin --delete bugFix.
During a talk, a software developer at Slack recommended that students learn to use interactive rebase, git rebase -i. Apparently nobody knows how to use it, but it’s a superpower. This piqued my interest, so I started learning more about rebasing. So… what is rebasing?
A rebase is another way to combine work between branches besides merging. A merge joins two branches together in an entangled fashion (a commit with two or more parents), while a rebase creates a linear commit history. It works by taking the commits on one branch since its common ancestor with the other branch and replaying them, as new copies, on top of the branch you are rebasing onto. In real-life codebases, merging work from many developers can get gross really fast, and rebasing keeps the history much cleaner and easier to follow.
Usually you would run git rebase <upstream> <branch>. But if you run git rebase <arg>, Git assumes that the argument is the upstream branch and that you want to rebase your currently checked-out branch onto it. For example, calling git rebase main from david moves the series of commits from the david branch on top of the main branch.
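The small sketch below runs git rebase main from david. The result is a straight line with no merge commit, and main's tip becomes the new base of david. The files are hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b david
for f in a b; do
  echo "$f" > "$f.txt"
  git add "$f.txt"
  git commit -q -m "david: add $f"
done

git checkout -q main
echo m > m.txt
git add m.txt
git commit -q -m "main: add m"

# From david: replay david's two commits on top of main's tip.
git checkout -q david
git rebase -q main
git log --oneline --graph     # a straight line, no merge commit
```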
If you want to copy a specific series of commits onto your current location (HEAD) and you know exactly which commits you want, you can use cherry-pick instead of rebase:
git cherry-pick <commit1> <commit2> <commit3>
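The sketch below copies only the first and third of three commits from a hypothetical experiment branch onto main. The file names are also hypothetical, and it assumes Git 2.28+. Note that the copies get new hashes.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b experiment
echo 1 > one.txt
git add one.txt
git commit -q -m "one"
one=$(git rev-parse HEAD)
echo 2 > two.txt
git add two.txt
git commit -q -m "two"
echo 3 > three.txt
git add three.txt
git commit -q -m "three"
three=$(git rev-parse HEAD)

# Copy just "one" and "three" onto main, in that order.
git checkout -q main
git cherry-pick "$one" "$three"
git log --oneline
```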
But if you are not sure which commits you want or what their hashes are, that is where interactive rebase comes in (and it is more powerful). Interactive rebase lets you reorder, drop, keep, squash, and even edit commits. When you run git rebase -i <upstream>, you will first be dropped into your default editor (in my case vim) with lines like the following:
pick 1a2b3c Commit message A
pick 4d5e6f Commit message B
pick 7g8h9i Commit message C
You can then choose what to do with each commit by changing the command at the start of its line. For instance, pick means you want to keep the commit. You can also use drop to omit the commit, edit to pause and make changes, reword to keep the commit but modify its commit message, or squash to fold the commit into the one before it. Once you’ve edited the rebase instructions, save and exit, and the rebase will run. You may run into merge conflicts along the way, in which case the rebase pauses at that point. You will have to resolve these conflicts manually, add the resolutions to your index with git add, and then continue by running git rebase --continue. If things get messy, you can always give up and return to where you started with git rebase --abort.
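Interactive rebase normally opens an editor, but Git also honors the GIT_SEQUENCE_EDITOR environment variable, so the process can be sketched in a script. In the sketch below, a tiny stand-in editor rewrites the todo list: it keeps the first commit, squashes the second into it, and drops the third. The commits and files are hypothetical. The sketch assumes the todo list starts with the three pick lines, which is Git's normal layout.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"
for name in a b c; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "add $name"
done

# Stand-in "editor": Git passes the todo file path as $1.
# Line 1 stays pick, line 2 becomes squash, line 3 becomes drop.
cat > "$tmp/edit-todo.sh" <<'EOF'
#!/bin/sh
sed -e '2s/^[a-z]* /squash /' -e '3s/^[a-z]* /drop /' "$1" > "$1.new"
mv "$1.new" "$1"
EOF
chmod +x "$tmp/edit-todo.sh"

# GIT_EDITOR=true accepts the combined message that squash would let you edit.
GIT_SEQUENCE_EDITOR="$tmp/edit-todo.sh" GIT_EDITOR=true git rebase -q -i HEAD~3
git log --oneline              # "add a" (now including b.txt) on top of "base"
```

In everyday use you would just edit the same pick/squash/drop lines by hand in vim. The script only automates that step.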
If you have completed the rebase but want to go back, you can still undo it. Look up where your branch pointed before the rebase with git reflog, then git reset to that entry (typically with --hard, which also throws away any uncommitted changes).
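The sketch below undoes a finished rebase using the branch's own reflog. Right after the rebase, david@{1} is where david pointed before it. The branch contents are hypothetical, and it assumes Git 2.28+. Keep in mind that reset --hard discards uncommitted work.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b david
echo d > d.txt
git add d.txt
git commit -q -m "david work"
before=$(git rev-parse HEAD)        # remember the pre-rebase tip for comparison

git checkout -q main
echo m > m.txt
git add m.txt
git commit -q -m "main work"

git checkout -q david
git rebase -q main

# david@{0} is the rebased tip; david@{1} is where david was before the rebase.
git reflog show david
git reset -q --hard "david@{1}"
```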
Some important caveats: if a rebase is so much cleaner, shouldn’t we always do a git rebase then? Not necessarily! It is safe to rebase commits that are local to your computer and not yet shared with other developers. But it can quickly become a nightmare if you rebase commits you have already pushed to the server and other people have started to base their work on. This is because rebasing abandons those original commits and replaces them with new ones that have different hashes. (More on how to deal with this situation can be found in the Pro Git book.) The best practice here is to rebase local changes before pushing, and never to rebase commits that have already been pushed.
Another good way to make your Git history cleaner is to squash your commits, i.e. combine multiple commits into one. You can squash commits inside an interactive rebase. You can also squash commits during a merge with the --squash flag. For example, git merge --squash feature takes the changes from the feature branch and stages them as one big change set, without committing them or creating a merge commit; you then run git commit yourself. I highly recommend using this. Doing this eliminates the long history of the feature branch and simply adds a new squashed commit onto your main branch. Suppose feature has the history A (main) -> B -> C (feature). If you squash merge feature onto main at A, you get A -> D (main), where D is a single commit combining both B and C.
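The sketch below reproduces the A -> B -> C example. The squash merge stages the changes from B and C, and a separate git commit records them as D. The commit contents are hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo A > a.txt
git add a.txt
git commit -q -m "A"

git checkout -q -b feature
echo B > b.txt
git add b.txt
git commit -q -m "B"
echo C > c.txt
git add c.txt
git commit -q -m "C"

# Stage B and C's changes on main as one change set; nothing is committed yet.
git checkout -q main
git merge -q --squash feature
git status --short             # b.txt and c.txt staged as new files

# Record them as a single ordinary commit D.
git commit -q -m "D: feature (squashed B and C)"
git log --oneline              # D on top of A, no merge commit
```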
If you want to jump between commits but have uncommitted changes you are not ready to commit, use git stash. Git refuses to switch to another commit if doing so would overwrite your uncommitted changes. Suppose you want to go back 3 commits to HEAD~3. If you don’t want to commit your current changes, stash them first:
git stash
git checkout HEAD~3 # now you can switch
git checkout feature # come back to your work
git stash pop # restores the changes you stashed
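Once you call git stash, those changes disappear from your working directory and are stashed away. When you call git stash pop, they are put back in your working directory. Note that by default git stash only saves changes to tracked files; add -u (--include-untracked) to stash untracked files too.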
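Below is a runnable version of the stash example. It uses main instead of the feature branch and a hypothetical file log.txt, and assumes Git 2.28+. It also shows that without stashing, Git refuses the checkout because the uncommitted change would be overwritten.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
for i in 1 2 3 4; do
  echo "$i" > log.txt
  git add log.txt
  git commit -q -m "commit $i"
done
echo "work in progress" >> log.txt   # uncommitted change

# Without stashing, the checkout is refused: log.txt would be overwritten.
if git checkout -q HEAD~3 2>/dev/null; then
  echo "expected checkout to be refused" >&2
  exit 1
fi

git stash -q                  # working tree is clean again
git checkout -q HEAD~3        # now you can switch (detached HEAD at "commit 1")
older=$(cat log.txt)
git checkout -q main          # come back to your work
git stash pop -q              # restores the stashed change
cat log.txt
```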
You can inspect your reference history with git reflog. The reference logs, or reflogs, record where HEAD was several moves ago and how branch references moved in your local repository. Each entry also notes the action that caused the move (commit, checkout, reset, rebase, and so on). The reflog syntax @{} is important to know in addition to the ^ and ~ symbols. HEAD@{n} refers to where HEAD was n moves ago according to the reflog, whereas HEAD~n follows the commit ancestry (the nth first-parent ancestor). Check what HEAD@{n} points to with git reflog before using it.
Move HEAD back to an earlier reflog entry, keeping all the changes since then staged:
git reset --soft "HEAD@{2}"
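The sketch below shows how HEAD@{2} lines up with the reflog after three commits, and what reset --soft leaves staged. The file names are hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "$n" > "f$n.txt"
  git add "f$n.txt"
  git commit -q -m "commit $n"
done
first=$(git rev-parse HEAD~2)

git reflog -n 3                  # HEAD@{0}: commit 3, HEAD@{1}: commit 2, HEAD@{2}: commit 1
git reset -q --soft "HEAD@{2}"
git status --short               # f2.txt and f3.txt are staged
```

Here HEAD@{2} and HEAD~2 happen to be the same commit, because every move was a new commit. After a checkout or reset they can differ.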
Options I use a lot:
- Untracked files can clutter the git status output. Using the flag git status -uno skips displaying untracked files, making the output easier to read.
- To stage all modified tracked files and commit in one step, use the git commit -a flag. You don’t even need to git add! (Brand-new, untracked files still need a git add first.)
- No need to run git add path/to/file a dozen times. Just use the git add -i option! The interface is a little less intuitive than interactive rebase, but it’s easy to follow. You can also interactively stage specific parts of files if you want to split changes to the same file across multiple commits. You can jump straight into that mode with git add --patch.
- When pulling, you can rebase your local commits on top of the fetched changes instead of creating a merge commit by using the git pull --rebase option.
- To see the full diff introduced by each commit in the log, use git log -p. Abbreviated stats (lines modified, etc.) can be shown with --stat, and an ASCII graph of the history with --graph. You can also limit the log to specific files with git log -- path/to/file. To focus on a specific string of interest, use git log -S, e.g. git log -S some_func -p, which finds the commits whose diffs add or remove that string.
- To see more context around changes in a diff, use the -U flag followed by the number of lines you want, for example git diff -U8 for 8 lines of context instead of the default 3.
- To list the files Git tracks in the current directory, run git ls-files . (the trailing dot means the current directory).
- Use git branch --column for neater output, with branches organized into columns instead of one long list, if you are juggling many branches.
- To see which remote branch each local branch tracks, and how far ahead or behind it is, use git branch -vv. Make sure you run git fetch --all first so that you have the most up-to-date remote refs.
Other things I learned of note, but use less frequently:
- To see who last modified each line of a file, use git blame. The flags in git blame -w -C ignore whitespace changes and detect lines moved or copied from other files modified in the same commit. It’s actually recommended to use git blame -w -C -C -C. Each extra -C makes Git search further for where moved or copied lines originally came from (first in the commit that created the file, then in any commit). That way lines are attributed to whoever actually wrote them rather than whoever moved them. You can use the -L flag to narrow the output to specific line numbers of interest, e.g. git blame -L 59,100 path/to/file for lines 59-100. You can do something similar with git log -L 59,100:path/to/file to show the evolution of that line range through commits.
- When you have to force push after amending (or rebasing) commits you already pushed, git push --force-with-lease is always recommended over --force. It refuses to push if the remote branch has moved since you last fetched it.
- Running git maintenance start once in any repo is highly recommended! It speeds things up by letting Git perform maintenance tasks in the background.
For commonly used commands, it’s also a good idea to make an alias so that you can type short names. For example, if you just want to type git co instead of git checkout, run:
git config --global alias.co checkout
It’s also cool to create your own commands from Git command and flag combos. For example, this would let you see only the last commit with git last:
git config --global alias.last 'log -1 HEAD'
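The sketch below tries both aliases without touching your real settings. It defines them in a throwaway repository's local config instead of with --global. Everything else in it is hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo 1 > f.txt
git add f.txt
git commit -q -m "first"
echo 2 > f.txt
git commit -q -am "second"

# Same aliases as above, but stored in this repo's config instead of --global.
git config alias.co checkout
git config alias.last 'log -1 HEAD'

git co -q -b experiment          # expands to: git checkout -q -b experiment
git last --format=%s             # expands to: git log -1 HEAD --format=%s, prints "second"
```

Extra arguments after an alias are appended to its expansion, which is why git last --format=%s still works.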
A powerful notation within Git is range notation, which can be used with many Git commands. The most common syntax is the double dot A..B, which means commits reachable from B but not from A. So if you only want to see the commits in branch featureB but not in featureA since they diverged, you can run git log featureA..featureB. In many cases you care only about what is in your feature branch that you haven’t merged into main yet, in which case you might run git log main..feature. You can preview newly fetched changes to feature with git log feature..origin/feature, or preview what you are about to push to the remote with git log origin/main..HEAD.
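Ranges also work with diffs, with a twist. git diff main..feature simply compares the tips of the two branches, so changes made on main since they diverged show up too. To see only the changes a pull request has introduced, use the triple dot, git diff main...feature, which diffs feature against its common ancestor with main.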
With git log, the triple dot syntax A...B selects commits reachable from either A or B but not from both. So you can look at the commits on both branches since they diverged; adding --left-right marks which side each commit comes from.
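The sketch below builds two diverging branches and counts what each range selects. It uses git rev-list --count as a compact stand-in for git log. The branch and file names are hypothetical, and it assumes Git 2.28+.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
for f in f1 f2; do
  echo "$f" > "$f.txt"
  git add "$f.txt"
  git commit -q -m "feature: $f"
done
git checkout -q main
echo m1 > m1.txt
git add m1.txt
git commit -q -m "main: m1"

git rev-list --count main..feature     # 2: on feature but not main
git rev-list --count feature..main     # 1: on main but not feature
git log --oneline --left-right main...feature   # all 3; "<" from main, ">" from feature
git diff --name-only main..feature     # tip vs tip: f1.txt, f2.txt and m1.txt
git diff --name-only main...feature    # since the common ancestor: f1.txt, f2.txt
```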
Here are some useful resources I consulted (and I still often go back to) on my journey to learning git: