Many of us create a few Git commits daily, either through GUI tools or the command line. It can be as simple as following these steps:
shell
# 1. Modify or create a file in your working directory.
echo '# my change' > 'test.sh'
# 2. Add the modification to the staging area of Git.
git add test.sh
# 3. Commit the staged changes.
git commit -m "initial commit"
Here, we've used Git's high-level commands (also known as Porcelain commands), such as git add and git commit.
However, there is another group of Git commands, known as Plumbing commands, that handle the low-level operations.
In this blog post, we want to create a Git commit using these low-level operations, and not the git commit command.
Before diving into low-level commands and crafting a Git commit, we need to understand a few Git basics. Let's start with the states of a file in Git.
The Basics
Files in Git can be in one of these three states:
- Modified: The file has changed but has not been committed to the Git database.
- Staged: The current version of the modified file is staged to be included in the next commit.
- Committed: Data is safely stored in the Git database.
Similarly, a Git project has three sections:
- Working Directory: These are the files that are pulled out of the Git database so you can easily modify them. Modified files reside here.
- Staging Area (Index): A file inside the .git directory that holds the information about what will go into your next commit. Staged files reside here.
- Git directory: It's where Git stores all the objects and metadata of your repository. This directory is essentially what you copy when you clone a Git project. Committed files reside here.
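The three states can be observed with git status. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file name notes.txt and the identity are placeholders, not part of the original walkthrough.

```shell
set -e
# Scratch repository; notes.txt and the identity below are hypothetical.
cd "$(mktemp -d)"
git init -q
git config user.name 'Example'
git config user.email 'example@example.com'

echo 'first version' > notes.txt
git add notes.txt
git commit -q -m 'add notes'

# Modified: the change exists only in the working directory.
echo 'second version' > notes.txt
modified=$(git status --porcelain)

# Staged: the change is recorded in the index.
git add notes.txt
staged=$(git status --porcelain)

# Committed: the change is stored in the Git directory, nothing is pending.
git commit -q -m 'update notes'
committed=$(git status --porcelain)

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```

In the short status format, the first column describes the index and the second column describes the working directory, which is why the M moves from the second column to the first after git add.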
Now that we understand the different sections of a Git project, we need to know what exactly a Git commit is.
Git Objects: Commits, Trees, and Blobs
A Git commit is a Git object. There are several types of objects in Git, including blob, tree, and commit objects.
These objects can be found inside the .git/objects folder.
If you look inside that folder, you'll see that everything is stored using a SHA-1 hash of the object's content rather than file names. This approach helps Git track all changes to the content of a file or directory and makes it impossible to alter the content without Git detecting it.
Blob Objects
We can store any blob (binary file) inside Git's database, making it a powerful
content-addressable file system with a version-control system built on top of it.
This can easily be done using one of Git's plumbing commands called git hash-object:
shell
echo 'hello world' | git hash-object -w --stdin
The -w flag tells Git not only to return the hash of the content passed to it via standard input (--stdin) but also to store that content inside the .git/objects folder as a blob. Essentially, Git writes a binary file with this content:
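Using JavaScript-style pseudocode for clarity (sha1hash is a placeholder for any SHA-1 function):
const blobFileContent = `blob ${Buffer.byteLength(content)}\0${content}`
const blobFileName = sha1hash(blobFileContent)
In the "hello world" case, echo appends a trailing newline, so the content is 12 bytes long and the blob file content becomes: blob 12\0hello world\n.
Git then calculates the SHA-1 hash of this content, compresses the content with zlib, and stores the file using the hash as its name (the first two hex characters become a subdirectory of .git/objects, and the remaining 38 become the filename).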
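The blob format can be checked by hand. The sketch below rebuilds the hash outside of Git and compares it with what git hash-object reports. It assumes the sha1sum utility from GNU coreutils is available (on macOS, shasum is the usual substitute) and that the repository uses Git's default SHA-1 object format.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Let Git hash and store the content. echo appends a newline, so this is 12 bytes.
git_hash=$(echo 'hello world' | git hash-object -w --stdin)

# Rebuild the same hash by hand: the header "blob 12", a NUL byte, then the content.
manual_hash=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d ' ' -f 1)

echo "git:    $git_hash"
echo "manual: $manual_hash"

# The object is stored under .git/objects/<first 2 hex chars>/<remaining 38 chars>.
ls ".git/objects/$(echo "$git_hash" | cut -c 1-2)"

# Ask Git for the type and the content of the stored object.
git cat-file -t "$git_hash"
git cat-file -p "$git_hash"
```

Both hashes should be identical, which shows that the object name depends only on the header and the content, not on any file name.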
Tree Objects
Tree objects allow us to store file names for one or more files. You can think of tree objects as representing directories, while blob objects represent file contents.
Essentially, a tree is a collection of references to blobs along with their file names, or other trees.
This is the content of the tree object shown in the image above:
100644 blob 8b137891791fe96927ad78e64b0aad7bded08bdc README
100644 blob 8b137891791fe96927ad78e64b0aad7bded08bdc package.json
040000 tree 9c422c2393ba5463772797e780e1d4c00400374c src
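A listing like this can be reproduced in a scratch repository. The following is a rough sketch, assuming hypothetical file names and contents; it uses git write-tree, a plumbing command covered later in this post, to build the tree from the index, so the hashes will differ from the ones shown above.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Hypothetical project layout: two files and one subdirectory.
mkdir src
echo '# Example' > README
echo '{}' > package.json
echo 'console.log("hi")' > src/index.js
git add .

# Build a tree object from the index and print its entries.
root_tree=$(git write-tree)
git cat-file -p "$root_tree"

# The src entry is itself a tree; resolve its hash and print its entries too.
src_tree=$(git rev-parse "$root_tree:src")
git cat-file -p "$src_tree"
```

The root tree lists two blob entries and one tree entry, and the nested tree lists the single blob for src/index.js.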
Commit Objects
A Git commit is essentially an object that contains a reference to a Git tree, along with information such as who made the changes (author), when they were made, and why they were made (commit message). A commit can also have zero parents (initial commit), one parent (normal commits), or multiple parents (merge commits).
This is the content of an example commit object:
Note: Commit message is separated from metadata via an empty line
tree 5fb4d17478fc270ea606c010293c97bb76dec583
author Avestura <me@avestura.dev> 1725466118 +0330
committer Avestura <me@avestura.dev> 1725466118 +0330

initial commit
Now that we understand blob, tree, and commit objects, we can visualize their relationships. Consider a simple scenario like this:
shell
git init # initialize the .git repository
echo 'Readme' > README
echo 'License' > LICENSE
git add README LICENSE
git commit -m 'initial commit'
In this case, a total of four objects are created in Git:
- 1 README blob object
- 1 LICENSE blob object
- 1 tree object that contains references to the previous blobs and their names
- 1 commit object that references the tree and includes the author information
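The object count can be verified directly. This is a small sketch, assuming a fresh scratch repository with a placeholder identity; it relies on the --batch-all-objects option of git cat-file, which is available in Git 2.6 and later.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example'          # placeholder identity
git config user.email 'example@example.com'

echo 'Readme' > README
echo 'License' > LICENSE
git add README LICENSE
git commit -q -m 'initial commit'

# Print every object in the database as: <hash> <type> <size>
objects=$(git cat-file --batch-check --batch-all-objects)
echo "$objects"
```

The output should contain two blob lines, one tree line, and one commit line.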
If we add another commit, the new commit will have a parent entry in its metadata, pointing to the initial commit.
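The parent link can be inspected with git cat-file. Below is a minimal sketch, assuming a scratch repository with a placeholder identity and two ordinary commits.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example'          # placeholder identity
git config user.email 'example@example.com'

echo 'Readme' > README
git add README
git commit -q -m 'initial commit'
first=$(git rev-parse HEAD)

echo 'License' > LICENSE
git add LICENSE
git commit -q -m 'add license'

# The second commit object carries a "parent" line with the first commit's hash.
git cat-file -p HEAD
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "first commit: $first"
echo "parent field: $parent"
```

The initial commit has no parent line at all, while the second commit's parent line holds the hash of the initial commit.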
Craft a Commit, the hard way
Now that we understand the Git objects related to a commit and their relationships, we can easily create a commit using low-level Git plumbing commands.
First of all, we need to initialize a new repository:
shell
$ git init
Initialized empty Git repository in E:/Projects/git/git-playground/.git/
Now we have to create a blob object. As we already know, we can do this using the hash-object command:
shell
$ echo 'This is the content of my file' | git hash-object -w --stdin
6b59acb69a04903bfa9189e3c482fb57f77393f9
We have stored our blob object and know its hash. Now we need to create a tree object.
Git normally uses the staging area (index) to create tree objects. We can create an index with a single entry (our previously created blob) using the git update-index command:
shell
git update-index --add --cacheinfo 100644 6b59acb69a04903bfa9189e3c482fb57f77393f9 myfile.txt
Explanation of the above command:
- --add adds the file to the index, as it isn’t already there.
- --cacheinfo <mode> <object> <path> is used because the file is not in our working directory, but inside Git's database.
  - The number represents the file mode. 100644 means it's a normal file. Other modes include executable files and symbolic links.
  - 6b59acb69a04903bfa9189e3c482fb57f77393f9 is the hash of the blob.
  - myfile.txt is the name of the file.
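To confirm what ended up in the index, the entry can be listed with git ls-files. The sketch below assumes a fresh scratch repository and captures the blob hash in a shell variable (blob is just an illustrative name) instead of copying it by hand.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Store the blob and remember its hash.
blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)

# Register the blob in the index under the name myfile.txt.
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# Show index entries as: <mode> <object> <stage> <path>
git ls-files --stage

# The working directory is still empty: the content lives only in Git's database.
ls
```

The stage number 0 in the listing means the entry is a normal, non-conflicted one.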
Now that we have the index file ready, we can create a tree object from it using write-tree:
shell
$ git write-tree
de53417c67393f9ef09239709759ecbbd5ebfb97
Git now outputs the hash of the tree object.
You can check its content using the cat-file command:
shell
$ git cat-file -p de53417c67393f9ef09239709759ecbbd5ebfb97
100644 blob 6b59acb69a04903bfa9189e3c482fb57f77393f9 myfile.txt
Now that our tree object is ready and connected to the underlying blob, we can simply create the commit object using the git commit-tree command:
shell
$ echo 'My commit message' | git commit-tree de53417c67393f9ef09239709759ecbbd5ebfb97
409399744678c13717b30c103feef9451c9103bf
Finally, we have created a commit without using any of the high-level Git commands (e.g. git commit).
You can view the content of the newly created commit:
shell
$ git cat-file -p 409399744678c13717b30c103feef9451c9103bf
tree de53417c67393f9ef09239709759ecbbd5ebfb97
author Avestura <me@avestura.dev> 1725470340 +0330
committer Avestura <me@avestura.dev> 1725470340 +0330

My commit message
You can also view the log of the commit using git log:
shell
$ git log --stat 409399744678c13717b30c103feef9451c9103bf
commit 409399744678c13717b30c103feef9451c9103bf
Author: Avestura <me@avestura.dev>
Date:   Wed Sep 4 20:49:00 2024 +0330

    My commit message

 myfile.txt | 1 +
 1 file changed, 1 insertion(+)
If you want to see the files in your working directory, you can reset your current branch to point to the newly created commit using git reset:
shell
$ git reset --hard 409399744678c13717b30c103feef9451c9103bf
HEAD is now at 4093997 My commit message
$ ls
myfile.txt
$ cat myfile.txt
This is the content of my file
🥳 Hooray! We have crafted our commit and seen it in our working directory!
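For reference, the whole walkthrough can be condensed into one script. This is a sketch under a few assumptions: it runs in a fresh temporary directory, uses a placeholder identity, and passes hashes between steps through shell variables (blob, tree, commit) instead of copying them by hand, so the hashes will differ from the ones shown above.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example'          # placeholder identity, needed by commit-tree
git config user.email 'example@example.com'

# 1. Store the file content as a blob.
blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)

# 2. Add an index entry that points to the blob.
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# 3. Write the index out as a tree object.
tree=$(git write-tree)

# 4. Create a commit object that references the tree.
commit=$(echo 'My commit message' | git commit-tree "$tree")

# 5. Point the current branch at the commit and check out its files.
git reset -q --hard "$commit"

git log --oneline
cat myfile.txt
```

Because each step feeds its hash to the next one, the script shows the same chain as before: the commit points to the tree, and the tree points to the blob.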
Conclusion
Git has two sets of commands: Porcelain (high-level commands) such as git add, git commit, git remote, etc., and low-level Plumbing commands, which are used by higher-level commands to manipulate Git objects and references.
We used these low-level commands to craft a commit by creating its underlying tree and blob objects.
References
Resources I've used to write this blog post:
- Chacon, S., & Straub, B. (2014). Pro Git (2nd ed.). Apress.