Creating a Git Commit: The Hard Way

Original link: https://avestura.dev/blog/creating-a-git-commit-the-hard-way

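Git users create plenty of commits every day, through GUI tools or the command line. To illustrate the process, follow these basic steps:

1. Edit or create a file in the working directory, for example: echo "# my change" > test.sh
2. Stage the modification in Git: git add test.sh
3. Commit the staged changes: git commit -m "initial commit"

The commands used here, such as git add and git commit, are Git's high-level (Porcelain) commands. Git also provides low-level (Plumbing) commands, which handle the core operations beneath the Porcelain commands. The goal of this article is to create a Git commit with these low-level operations instead of the usual git commit command.

Doing so requires familiarity with a few basic Git concepts. The article starts with the three possible states of a file in Git:

- Modified: the file has changed but has not been committed.
- Staged: the updated file is marked for inclusion in the next commit.
- Committed: the data is safely stored in the Git repository.

Likewise, a Git project consists of three parts:

- Working directory: files pulled out for easy editing. Modified files live here.
- Staging area (index): holds information about what goes into the next commit. Staged files live here.
- Git repository: stores all of the repository's objects and metadata.

With the parts of a Git project understood, the next step is to pin down what a Git commit is. Git uses the following objects (commits, trees, and blobs):

- Commit: a Git object representing a commit, containing metadata such as author, date, and commit message. A commit can have no parents (the first commit), one parent (a regular commit), or multiple parents (a merge commit).
- Blob: any binary file stored in Git's database, which provides a powerful content-addressable file system with version control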

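Git tracks changes made to files and maintains their state across stages: unmodified, staged, and committed. Staged files are ready to be included in the next commit, and changes are staged with "git add". However, changing a file after "git add" and before committing makes both descriptions inaccurate. "git add -p", for instance, is helpful but takes extra steps to unstage a file. Rather than adding a whole file first and then using "git add -p" for later changes, it is more convenient to stage specific parts of a file interactively. Tools such as Magit offer this: they let users stage individual lines or hunks of a file, with no need to stage linearly. When users want to drop a staged part, they can unstage it (the "ds" command in Magit). Understanding Git's internals improves one's ability to use and reason about it, yet the documentation can sometimes be overlooked. The author finds this surprising given how detailed the Git documentation is.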

Original text

Many of us create a few Git commits daily, either through GUI tools or the command line. It can be as simple as following these steps:

shell

# 1. Modify or create a file in your working directory.
echo '# my change' > 'test.sh'

# 2. Add the modification to the staging area of git.
git add test.sh

# 3. Commit the staged changes.
git commit -m "initial commit"

Here, we've used Git high-level commands (also known as Porcelain commands) like git add, and git commit. However, there is another group of Git commands, known as Plumbing commands, that handle the low-level operations.

In this blog post, we want to create a Git commit using these low-level operations, and not the git commit command.

Before diving into low-level commands and crafting a Git commit, we need to understand a few Git basics. Let's start with the states of a file in Git.

The Basics

Files in Git can be in one of these three states:

Modified: The file has changed but has not been committed to the Git database.
Staged: The current version of the modified file is staged to be included in the next commit.
Committed: Data is safely stored in the Git database.

Similarly, a Git project has three sections:

Working Directory: These are the files that are pulled out of the Git database so you can easily modify them. Modified files reside here.
Staging Area (Index): A file inside the .git directory that holds the information about what will go into your next commit. Staged files reside here.
Git directory: It's where Git stores all the objects and metadata of your repository. This directory is essentially what you copy when you clone a git project. Committed files reside here.
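These states are not mutually exclusive: a file that is edited again after being staged is both staged and modified. The following is a minimal sketch of that, assuming a throwaway repository created with mktemp and a placeholder identity (neither comes from the original post). It uses git status --porcelain, whose first column describes the index and whose second column describes the working directory.

```shell
# Throwaway repository; the path and the identity below are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo '# my change' > test.sh
untracked=$(git status --porcelain)             # file exists only in the working directory

git add test.sh
staged=$(git status --porcelain)                # staged, nothing committed yet

echo '# another change' >> test.sh
staged_and_modified=$(git status --porcelain)   # staged version differs from the file on disk

git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'initial commit'
after_commit=$(git status --porcelain)          # first version committed, second edit still unstaged

printf '%s\n' "$untracked" "$staged" "$staged_and_modified" "$after_commit"
```

The commit only records what was in the index at the time, so the second edit stays behind as a plain modification.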

Now that we understand the different sections of a Git project, we need to know what exactly a Git commit is.

Git Objects: Commits, Trees, and Blobs

A git commit is a git object. There are several types of objects in git, including blob, tree, and commit objects. These objects can be found inside the .git/objects folder.

If you look inside that folder, you'll see that objects are stored under names derived from the SHA-1 hash of their content rather than under file names. This approach lets Git detect any change to the content of a file or directory, so the content cannot be altered without Git noticing.

Blob Objects

We can store any blob (binary file) inside Git's database, making it a powerful content-addressable file system with a version-control system built on top of it. This can easily be done using one of Git's plumbing commands called git hash-object:

shell

echo 'hello world' | git hash-object -w --stdin

The -w flag tells Git not only to return the hash of the content passed to it via standard input (--stdin) but also to store that content inside the .git/objects folder as a blob. Essentially, Git writes a binary file with this content:

JavaScript's template literal used for clarity:

const blobFileContent = `blob ${content.bytesize}\0${content}`
const blobFileName    = sha1hash(blobFileContent)

In the "hello world" case, echo appends a newline, so the content is 12 bytes long and the blob file becomes: blob 12\0hello world\n. Git then calculates the SHA-1 hash of this content and stores the file (zlib-compressed) using the hash as the filename.
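The naming scheme can be checked by hand. This is a small sketch, assuming a repository that uses Git's default SHA-1 object format and a system with the coreutils sha1sum tool (on macOS, shasum would be the closest equivalent). It rebuilds the header and content manually and compares the result with what git hash-object reports.

```shell
# echo appends a newline, so the hashed content is 12 bytes long.
git_hash=$(echo 'hello world' | git hash-object --stdin)

# Rebuild the same bytes by hand: "blob <size>", a NUL byte, then the content.
manual_hash=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d ' ' -f 1)

echo "git hash-object: $git_hash"
echo "manual SHA-1:    $manual_hash"
```

Both lines should print the same 40-character hash. Without -w, git hash-object only computes the hash and writes nothing to .git/objects.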

Tree Objects

Tree objects allow us to store file names for one or more files. You can think of tree objects as representing directories, while blob objects represent file contents. Essentially, a tree is a collection of references to blobs along with their file names, or other trees.

This is the content of an example tree object:

100644 blob 8b137891791fe96927ad78e64b0aad7bded08bdc    README
100644 blob 8b137891791fe96927ad78e64b0aad7bded08bdc    package.json
040000 tree 9c422c2393ba5463772797e780e1d4c00400374c    src
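To see a listing like this for a real repository, git ls-tree prints the entries of a tree object. The sketch below assumes a throwaway repository and made-up file names (README and src/main.txt are placeholders). It uses git write-tree, covered later in this post, to turn the index into tree objects.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir src
echo 'docs' > README
echo 'code' > src/main.txt
git add README src/main.txt

# write-tree turns the current index into tree objects and prints the root tree's hash.
root_tree=$(git write-tree)
git ls-tree "$root_tree"      # one blob entry (README) and one tree entry (src)

# The src entry is itself a tree that lists the blobs inside that directory.
src_tree=$(git rev-parse "$root_tree:src")
git ls-tree "$src_tree"
```

The nested tree is how Git represents subdirectories: the root tree points to the src tree, which in turn points to the blob for main.txt.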

Commit Objects

A Git commit is essentially an object that contains a reference to a Git tree, along with information such as who made the changes (author), when they were made, and why they were made (commit message). A commit can also have zero parents (initial commit), one parent (normal commits), or multiple parents (merge commits).

This is the content of an example commit object:

Note: The commit message is separated from the metadata by an empty line.

tree 5fb4d17478fc270ea606c010293c97bb76dec583
author Avestura <me@avestura.dev> 1725466118 +0330
committer Avestura <me@avestura.dev> 1725466118 +0330

initial commit

Now that we understand blob, tree, and commit objects, we can visualize their relationships. Consider a simple scenario like this:

shell

git init # initialize the .git repository
echo 'Readme' > README
echo 'License' > LICENSE
git add README LICENSE
git commit -m 'initial commit'

In this case, a total of four objects are created in Git:

1 README blob object
1 LICENSE blob object
1 tree object that contains references to the previous blobs and their names
1 commit object that references the tree and includes the author information
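The object count can be confirmed directly. This is a sketch under the same scenario, assuming a fresh temporary repository and a placeholder identity. It lists every object in the database with git cat-file --batch-all-objects --batch-check and tallies them by type.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'Readme' > README
echo 'License' > LICENSE
git add README LICENSE
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'initial commit'

# List every object in the database as "<hash> <type> <size>".
objects=$(git cat-file --batch-all-objects --batch-check)
echo "$objects"

# Tally the objects by type.
echo "$objects" | awk '{ count[$2]++ } END { for (t in count) print count[t], t }'
```

If both files had identical content, they would share a single blob and the total would drop to three objects, since blobs are addressed by content and not by file name.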

If we add another commit, the new commit will have a parent header pointing to the initial commit.
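The parent link is visible in the raw commit object. The sketch below assumes a temporary repository and a placeholder identity set through Git's standard GIT_AUTHOR_* and GIT_COMMITTER_* environment variables. It creates two commits and reads the parent header of the second one.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity so the example does not depend on global Git configuration.
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

echo 'Readme' > README
git add README
git commit -q -m 'initial commit'
first=$(git rev-parse HEAD)

echo 'More text' >> README
git add README
git commit -q -m 'second commit'

# Extract the parent header from the raw commit object.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
# The initial commit has no parent header at all.
root_parents=$(git cat-file -p "$first" | grep -c '^parent ')

echo "first commit:     $first"
echo "parent of second: $parent"
echo "parent headers on the first commit: $root_parents"
```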

Craft a Commit, the hard way

Now that we understand the Git objects related to a commit and their relationships, we can easily create a commit using low-level Git plumbing commands.

First of all, we need to initialize a new repository:

shell

$ git init
Initialized empty Git repository in E:/Projects/git/git-playground/.git/

Now we have to create a blob object. As we already know, we can do this using the hash-object command:

shell

$ echo 'This is the content of my file' | git hash-object -w --stdin
6b59acb69a04903bfa9189e3c482fb57f77393f9
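To check what was stored, git cat-file can report an object's type, size, and content. This is a small sketch, assuming a throwaway repository. The hash you get may differ from the one shown above if your shell writes different bytes (for example, a different line ending), so the sketch captures it in a variable.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)

blob_type=$(git cat-file -t "$blob")      # object type
blob_size=$(git cat-file -s "$blob")      # content size in bytes, including the trailing newline
blob_content=$(git cat-file -p "$blob")   # pretty-printed content

# Loose objects live under .git/objects/<first two hex digits>/<remaining digits>.
object_path=".git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"

echo "$blob_type, $blob_size bytes: $blob_content"
ls -l "$object_path"
```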

We have stored our blob object and know its hash. Now we need to create a tree object. Git normally uses the staging area (index) to create tree objects. We can create an index with a single entry (our previously created blob) using the git update-index command:

shell

git update-index --add --cacheinfo 100644 6b59acb69a04903bfa9189e3c482fb57f77393f9 myfile.txt

Explanation of the above command:

--add adds the file to the index, as it isn’t already there.
--cacheinfo <mode> <object> <path> is used because the file is not in our working directory but inside Git's database:
- The number represents the file mode. 100644 means it's a normal file. Other modes include executable files and symbolic links.
- 6b59acb69a04903bfa9189e3c482fb57f77393f9 is the hash of the blob
- myfile.txt is the name of the file
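The resulting index entry can be inspected with git ls-files --stage. The following sketch assumes a fresh temporary repository and repeats the two steps so far before printing the index. It also shows that the file exists only in the index at this point, not on disk.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# Each index entry prints as: <mode> <object> <stage><TAB><path>
index_entry=$(git ls-files --stage)
echo "$index_entry"

# The file is recorded in the index but was never written to the working directory.
ls myfile.txt 2>/dev/null || echo 'myfile.txt is not in the working directory'
```

The stage number is 0 for ordinary entries; other values only appear while a merge conflict is unresolved.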

Now that we have the index file ready, we can create a tree object from it using write-tree:

shell

$ git write-tree
de53417c67393f9ef09239709759ecbbd5ebfb97

Git now outputs the hash of the tree object. You can check its content using the cat-file command:

shell

$ git cat-file -p de53417c67393f9ef09239709759ecbbd5ebfb97
100644 blob 6b59acb69a04903bfa9189e3c482fb57f77393f9    myfile.txt

Now that our tree object is ready and connected to the underlying blob, we can simply create the commit object using the git commit-tree command:

shell

$ echo 'My commit message' | git commit-tree de53417c67393f9ef09239709759ecbbd5ebfb97
409399744678c13717b30c103feef9451c9103bf

Finally, we have created a commit without using any of the high-level git commands (e.g. git commit). You can view the content of the newly created commit:

shell

$ git cat-file -p 409399744678c13717b30c103feef9451c9103bf
tree de53417c67393f9ef09239709759ecbbd5ebfb97
author Avestura <me@avestura.dev> 1725470340 +0330
committer Avestura <me@avestura.dev> 1725470340 +0330

My commit message
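The commit above has no parent because it is the first one. As a hedged sketch of the next step, git commit-tree accepts -p to name a parent, which is how a second hand-crafted commit would be chained onto the first. The repository, identity, file contents, and messages below are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity; commit-tree needs an author and a committer.
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

blob1=$(echo 'first version' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob1" myfile.txt
commit1=$(echo 'first commit' | git commit-tree "$(git write-tree)")

blob2=$(echo 'second version' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob2" myfile.txt
# -p names the parent commit; passing it more than once creates a merge commit.
commit2=$(echo 'second commit' | git commit-tree "$(git write-tree)" -p "$commit1")

git cat-file -p "$commit2"
git log --oneline "$commit2"
```

Neither commit is attached to a branch yet: commit-tree only writes the object, and no reference is updated.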

You can also view the log of the commit using git log:

shell

$ git log --stat 409399744678c13717b30c103feef9451c9103bf
commit 409399744678c13717b30c103feef9451c9103bf
Author: Avestura <me@avestura.dev>
Date:   Wed Sep 4 20:49:00 2024 +0330

    My commit message

 myfile.txt | 1 +
 1 file changed, 1 insertion(+)

If you want to see the files in your working directory, you can reset your current branch to point to the newly created commit using git reset:

shell

$ git reset --hard 409399744678c13717b30c103feef9451c9103bf
HEAD is now at 4093997 My commit message

$ ls
myfile.txt

$ cat myfile.txt
This is the content of my file

🥳 Hooray! We have crafted our commit and seen it in our working directory!
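The steps above can be collected into one short script. This is a sketch, assuming a temporary repository and a placeholder identity. It captures each hash in a variable instead of copying it by hand, so it works even if your hashes differ from the ones shown in this post.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity; replace with your own or rely on your Git configuration.
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)   # 1. blob
git update-index --add --cacheinfo 100644 "$blob" myfile.txt                 # 2. index entry
tree=$(git write-tree)                                                       # 3. tree
commit=$(echo 'My commit message' | git commit-tree "$tree")                 # 4. commit
git reset -q --hard "$commit"                                                # 5. point the branch at it and check out the files

cat myfile.txt
git log -1 --format='%h %s'
```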

Conclusion

Git has two sets of commands: Porcelain (high-level commands) such as git add, git commit, git remote, etc., and low-level Plumbing commands, which are used by higher-level commands to manipulate Git objects and references. We used these low-level commands to craft a commit by creating its underlying tree and blob objects.

References

Resources I've used to write this blog post:

Chacon, S., & Straub, B. (2014). Pro Git (2nd ed.). Apress.