Git Internals
The .git Folder
Every Git repository contains a hidden .git directory, which serves as the heart of the repository. This directory contains all the metadata, configuration, and data necessary to manage version control. The .git folder is created when you initialize a repository (git init) or clone an existing repository (git clone).
The .git directory structure includes several important subdirectories and files:
- config: Stores repository-specific configuration settings (like remote repository URL, branches, user name, email).
- HEAD: A pointer to the currently checked-out branch. This file holds the reference to the branch or commit that the working directory is currently at.
- index: Also known as the staging area or cache, this file stores the index (snapshot) of the files that are staged for the next commit.
- objects/: Contains all the objects that Git uses to store data, such as commits, trees, and blobs. Git stores data in the form of these objects to reduce redundancy and optimize storage.
- refs/: This folder contains references to heads (branches) and tags. Under this folder, you’ll find references to local branches and remote-tracking branches.
- logs/: Stores logs for branch references, showing changes made to branches and how they evolved over time.
- hooks/: Contains sample scripts for Git hooks (like pre-commit, post-commit) that allow for custom actions to be triggered at specific points during the Git workflow.
The Role of SHA in Git
Git uses SHA-1 (Secure Hash Algorithm 1) to generate identifiers for its objects. Each identifier is a 40-character hexadecimal string that acts as a cryptographic fingerprint of the object's contents, whether that object is a commit, a file, or anything else Git stores.
For example, Git assigns a SHA-1 hash to:
- Commits: Each commit is represented by a SHA-1 hash. The commit hash is generated based on the commit content, including the author, timestamp, commit message, and the tree object that represents the file snapshot.
- Files: Files are stored as blobs in Git. A blob object is assigned a SHA-1 hash based on the content of the file.
- Directories: Git represents directories as tree objects. A tree object groups blobs (files) and other tree objects (subdirectories) under a particular directory structure.
- Tags: Annotated tags are stored as tag objects, each with its own SHA-1 hash, and point to a specific commit. Lightweight tags are plain references that hold a commit hash.
SHA-1 hashing is deterministic: the same input always produces the same hash. Because SHA-1 is also a cryptographic hash, even a one-byte change in a file or commit produces a completely different value, which lets Git detect changes and corruption.
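To make the hashing rule concrete, here is a minimal Python sketch of how an object ID can be computed. It relies on the fact that Git hashes a small header ("<type> <size>" followed by a NUL byte) together with the raw content. The function name git_object_id is our own, chosen for illustration.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the SHA-1 ID Git assigns to `data` stored as an object of `obj_type`."""
    # Git hashes "<type> <size>\0" followed by the raw content, not the content alone.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same input, same hash; a one-character change gives an unrelated hash.
    print(git_object_id("blob", b"hello world\n"))
    print(git_object_id("blob", b"hello world!\n"))
```

For a blob, the result should match what git hash-object prints for a file with the same bytes.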
Commit Storage and Structure
A commit in Git is an object that contains a snapshot of your project at a given point in time, along with metadata such as the commit message, author, and timestamp. The commit is stored in Git's object database as a commit object, and it points to a tree object, which in turn points to blob objects (files).
A commit object typically contains the following:
- Commit Metadata:
  - Author name and email
  - Timestamp of the commit
  - Commit message
- Tree Object:
  - A reference to the tree object that represents the state of the file system at the time of the commit.
- Parent Commits:
  - A reference to each parent commit. The first commit in a repository has none, an ordinary commit has one, and a merge commit has several.
The commit object is created by Git when a new commit is made, and it is identified by a SHA-1 hash that uniquely identifies that specific commit.
Commit Object Example (the commit's own hash is its name in the object database and is not stored inside the object):
tree <sha1-hash of tree object>
parent <sha1-hash of parent commit>
author <name> <email> <timestamp>
committer <name> <email> <timestamp>

<commit message>
The commit object points to a tree object (which represents the snapshot of the files), and each file in the tree object is represented by a blob object (holding the file's content).
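The layout above is simple enough to parse by hand. The following Python sketch splits the text of a commit object into its header fields and message. The function name parse_commit is hypothetical, the sample values are the ones used in this article's cat-file example, and less common headers (such as signatures) are simply ignored.

```python
def parse_commit(text: str) -> dict:
    """Split the text of a commit object into its header fields and message."""
    # Headers and message are separated by the first blank line.
    header_part, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in header_part.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # a merge commit has several parents
        elif key in ("tree", "author", "committer"):
            commit[key] = value
        # Any other header is skipped in this sketch.
    return commit


SAMPLE_COMMIT = (
    "tree f16b0e45e0a5c6e61c8b3d43234578db23818d57\n"
    "author Abhishek <abhishek@example.com> 1672527600 +0000\n"
    "committer Abhishek <abhishek@example.com> 1672527600 +0000\n"
    "\n"
    "Initial commit\n"
)

if __name__ == "__main__":
    print(parse_commit(SAMPLE_COMMIT))
```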
Example Commit Process:
- Creating a Commit:
  - When you execute git commit, Git first creates a snapshot of the staged changes and stores the result as a tree object (which contains references to the blobs representing individual files).
- Storing the Commit:
  - Git then creates a commit object, which points to this tree object and contains the commit message, author, timestamp, and any parent commits.
- Hashing:
  - Git calculates the SHA-1 hash of the commit object and stores the object in the .git/objects directory under that hash. The hash is a cryptographic fingerprint of all the data in the commit object.
- Referencing Commits:
  - After the commit is created, Git writes the commit’s SHA-1 hash to the file for the current branch under .git/refs/heads/ (e.g., master or main). This is how Git tracks which commit each branch points to.
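The four steps above can be imitated in a few lines of Python. This is only a sketch under simplifying assumptions: two dictionaries stand in for .git/objects and .git/refs/heads/, the staged files live in a single flat directory, every file has mode 100644, and the names store, write_tree, and make_commit are ours rather than Git's.

```python
import hashlib

objects = {}  # object ID -> raw bytes; stands in for .git/objects
refs = {}     # branch name -> commit ID; stands in for .git/refs/heads/


def store(obj_type, body):
    """Hash an object with its "<type> <size>\\0" header and keep it under that ID."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def write_tree(staged):
    """Turn {file name: content} into blobs plus one tree that references them."""
    body = b""
    for name in sorted(staged):  # tree entries are kept sorted by name
        blob_id = store("blob", staged[name])
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary object ID.
        body += b"100644 " + name.encode("utf-8") + b"\x00" + bytes.fromhex(blob_id)
    return store("tree", body)


def make_commit(staged, message, when, branch="main",
                author="Example Author <author@example.com>"):
    tree_id = write_tree(staged)                 # step 1: snapshot -> tree
    lines = [f"tree {tree_id}"]
    if branch in refs:                           # previous tip becomes the parent
        lines.append(f"parent {refs[branch]}")
    stamp = f"{author} {when} +0000"
    lines += [f"author {stamp}", f"committer {stamp}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    commit_id = store("commit", body)            # steps 2 and 3: build and hash
    refs[branch] = commit_id                     # step 4: move the branch
    return commit_id


first = make_commit({"README.md": b"Welcome to the project!\n"},
                    "Initial commit", when=1672527600)
second = make_commit({"README.md": b"Welcome to the project!\n",
                      "main.py": b"print('hello')\n"},
                     "Add main.py", when=1672531200)
```

After the second call, refs["main"] holds the ID of the second commit, and that commit's text contains a parent line naming the first.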
Git Objects in .git/objects
All Git objects (commits, blobs, trees) are stored in the .git/objects/ directory in a compressed, binary format. The directory is organized into a two-level hierarchy:
- The first two characters of a SHA-1 hash represent a subdirectory (e.g., .git/objects/ab/).
- The remaining 38 characters are the object filename (e.g., .git/objects/ab/cde1234...).
Git uses a pack file format to optimize storage. When a repository grows large, Git bundles multiple objects into a single pack file (.pack), reducing disk usage and speeding up operations.
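As an illustration of the two-level layout, the sketch below writes and reads a loose object in a scratch directory. It assumes the loose-object format is the zlib-compressed header plus content, and it ignores pack files entirely. The function names are hypothetical, and the directory passed in plays the role of .git.

```python
import hashlib
import os
import tempfile
import zlib


def loose_object_path(git_dir, oid):
    """Map an object ID to objects/<first 2 chars>/<remaining 38 chars>."""
    return os.path.join(git_dir, "objects", oid[:2], oid[2:])


def write_loose_object(git_dir, obj_type, body):
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    path = loose_object_path(git_dir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))  # loose objects are stored zlib-compressed
    return oid


def read_loose_object(git_dir, oid):
    with open(loose_object_path(git_dir, oid), "rb") as f:
        return zlib.decompress(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        oid = write_loose_object(scratch, "blob", b"hello world\n")
        print(loose_object_path(scratch, oid))
        print(read_loose_object(scratch, oid))
```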
How Git Tracks Changes
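Git's efficiency comes from the way it stores content. Conceptually, every commit is a full snapshot of the project, not a set of diffs. Storage stays small because objects are addressed by their content: a file that did not change between commits hashes to the same blob, so the new tree simply references the existing object. Delta compression between similar objects happens only later, inside pack files. Snapshots are built from a combination of object types: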
- Blob: Stores the raw contents of files.
- Tree: Stores a snapshot of the file structure, including references to blobs (files) and other trees (subdirectories).
- Commit: Stores the metadata (author, timestamp, message) and references to a tree object and its parent commits.
Git’s approach allows it to efficiently store a large number of commits and file changes with minimal storage overhead.
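A small Python sketch shows why unchanged files cost nothing extra. Two snapshots are recorded in a dictionary keyed by blob ID, and the unchanged README ends up stored once. The helper names blob_id and snapshot are invented for this example, and trees and commits are left out to keep it short.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = f"blob {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict, store: dict) -> dict:
    """Record one snapshot; return {path: blob ID} and add any new blobs to store."""
    listing = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # identical content is stored only once
        listing[path] = oid
    return listing


store = {}
v1 = snapshot({"README.md": b"Welcome\n", "main.py": b"print('v1')\n"}, store)
v2 = snapshot({"README.md": b"Welcome\n", "main.py": b"print('v2')\n"}, store)

if __name__ == "__main__":
    print(len(store), "blobs stored for 4 file versions")
```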
The git cat-file Command
The git cat-file command is a powerful tool that allows you to inspect Git objects in the .git/objects directory. With this command, you can explore the details of various Git objects such as blobs (file content), trees (directory structure), and commits.
Syntax:
git cat-file -p <object-hash>
- -p (pretty-print): Displays the content of the object in a human-readable format.
- -t (type): Shows the type of the object (blob, tree, commit, or tag).
- -s (size): Displays the size of the object in bytes.
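To see where the type and size reported by -t and -s come from, here is a rough Python equivalent that works on the bytes of one loose object file. It assumes the zlib-compressed "<type> <size>\0<content>" layout, and the function name cat_file is ours. The real command also handles packed objects, which this sketch does not.

```python
import zlib


def cat_file(compressed: bytes):
    """Return (type, size, body) for the bytes of one loose object file."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")       # header ends at the first NUL
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match the body length")
    return obj_type, int(size), body


# Build the on-disk form of a small blob by hand, then inspect it.
content = b"Welcome to the project!\n"
EXAMPLE_OBJECT = zlib.compress(b"blob %d\x00" % len(content) + content)

if __name__ == "__main__":
    obj_type, size, body = cat_file(EXAMPLE_OBJECT)
    print(obj_type)       # what -t reports
    print(size)           # what -s reports
    print(body.decode())  # what -p prints for a blob
```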
Viewing the Entire Project from a Commit Hash
Using the git cat-file command, you can navigate through the contents of a repository by starting with a commit hash and exploring its associated tree and blob objects.
Steps to Explore a Commit and Its Contents:
- Start with a Commit Hash:
  - When you run git log, each commit has a SHA-1 hash associated with it. Use this hash to start exploring.
  Example:
    git log --oneline
  Output:
    3f62c0a Initial commit
- Inspect the Commit Object:
  - Use the commit hash to view the details of the commit object.
    git cat-file -p 3f62c0a
  Output:
    tree f16b0e45e0a5c6e61c8b3d43234578db23818d57
    author Abhishek <abhishek@example.com> 1672527600 +0000
    committer Abhishek <abhishek@example.com> 1672527600 +0000

    Initial commit
  - The tree hash (f16b0e4...) represents the snapshot of the project directory.
- Inspect the Tree Object:
  - Use the tree hash to explore the directory structure and see the files and subdirectories.
    git cat-file -p f16b0e45e0a5c6e61c8b3d43234578db23818d57
  Output:
    100644 blob bbc4a75e0a8d72d67ec9c1c0f5d0b47ea93e2d88 README.md
    100644 blob 9da3b8e6e10333f63cf575dcee73e8d22a769b9e main.py
    040000 tree 1c7a28f342d9dbd1d21c9865c1d08e6cf4a0c2b7 src
  - The tree object lists:
    - Mode: File type and permissions (e.g., 100644 for regular files, 040000 for directories).
    - Object Type: Blob for files, Tree for subdirectories.
    - SHA-1 Hash: Points to the content of the file (blob) or subdirectory (tree).
    - Filename: Name of the file or directory.
- Inspect a Blob Object (File Content):
  - Use the hash of a blob to view the content of a specific file.
    git cat-file -p bbc4a75e0a8d72d67ec9c1c0f5d0b47ea93e2d88
  Output:
    # README.md content
    Welcome to the project!
- Traverse Subdirectories:
  - For subdirectories, use the tree hash to explore further.
    git cat-file -p 1c7a28f342d9dbd1d21c9865c1d08e6cf4a0c2b7
  Output:
    100644 blob a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0 utils.py
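The listing printed for a tree is a rendering of a compact binary format. The sketch below decodes that format in Python, assuming each entry is "<mode> <name>", a NUL byte, and then the 20-byte binary object ID. The function name parse_tree is hypothetical, and the sample bytes are assembled from the illustrative hashes used above.

```python
def parse_tree(body: bytes):
    """Decode a tree object's body into (mode, type, object ID, name) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the mode
        nul = body.index(b"\x00", space)   # end of the name
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        oid = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex characters
        if mode == "40000":
            obj_type = "tree"               # subdirectory
        elif mode == "160000":
            obj_type = "commit"             # submodule entry
        else:
            obj_type = "blob"
        # The stored mode has no leading zero; pad it the way the listing shows it.
        entries.append((mode.zfill(6), obj_type, oid, name))
        pos = nul + 21
    return entries


SAMPLE_TREE = (
    b"100644 README.md\x00" + bytes.fromhex("bbc4a75e0a8d72d67ec9c1c0f5d0b47ea93e2d88")
    + b"100644 main.py\x00" + bytes.fromhex("9da3b8e6e10333f63cf575dcee73e8d22a769b9e")
    + b"40000 src\x00" + bytes.fromhex("1c7a28f342d9dbd1d21c9865c1d08e6cf4a0c2b7")
)

if __name__ == "__main__":
    for mode, obj_type, oid, name in parse_tree(SAMPLE_TREE):
        print(mode, obj_type, oid, name)
```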
Example: Viewing the Entire Project from a Commit
Given a commit hash (3f62c0a), you can reconstruct the entire project as follows:
- Use the commit hash to find the tree hash:
    git cat-file -p 3f62c0a
  Output:
    tree f16b0e45e0a5c6e61c8b3d43234578db23818d57
- Use the tree hash to list the files and directories:
    git cat-file -p f16b0e45e0a5c6e61c8b3d43234578db23818d57
  Output:
    100644 blob bbc4a75e0a8d72d67ec9c1c0f5d0b47ea93e2d88 README.md
    100644 blob 9da3b8e6e10333f63cf575dcee73e8d22a769b9e main.py
    040000 tree 1c7a28f342d9dbd1d21c9865c1d08e6cf4a0c2b7 src
- For each blob, use its hash to display the file content:
    git cat-file -p bbc4a75e0a8d72d67ec9c1c0f5d0b47ea93e2d88
  Output:
    Welcome to the project!
- For subdirectories, use their tree hash to navigate deeper.
By following this process, we can manually traverse and reconstruct the entire project at the state of a given commit, directly from the .git folder and object hashes.
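The same walk can be automated. The Python sketch below builds a tiny object store in memory and then rebuilds the project from a commit ID by following commit to tree to blobs, recursing into subdirectories. It is an illustration only: the store is a dictionary rather than a real .git/objects directory, the sample commit omits author lines for brevity, and all function names and file contents are made up.

```python
import hashlib


def put(store, obj_type, body):
    """Add an object to the in-memory store and return its ID."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    store[oid] = (obj_type, body)
    return oid


def tree_entries(body):
    """Yield (mode, name, object ID) for each entry of a tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        yield (body[pos:space].decode("ascii"),
               body[space + 1:nul].decode("utf-8"),
               body[nul + 1:nul + 21].hex())
        pos = nul + 21


def checkout(store, commit_id):
    """Return {path: content} for the snapshot that a commit points to."""
    obj_type, body = store[commit_id]
    if obj_type != "commit":
        raise ValueError("not a commit")
    root = body.decode("utf-8").split("\n", 1)[0].split(" ")[1]  # "tree <id>"
    files = {}

    def walk(tree_id, prefix):
        for mode, name, oid in tree_entries(store[tree_id][1]):
            if mode == "40000":                 # subdirectory: go one level deeper
                walk(oid, prefix + name + "/")
            else:                               # file: the blob body is the content
                files[prefix + name] = store[oid][1]

    walk(root, "")
    return files


store = {}
utils = put(store, "blob", b"def helper():\n    pass\n")
src = put(store, "tree", b"100644 utils.py\x00" + bytes.fromhex(utils))
readme = put(store, "blob", b"Welcome to the project!\n")
root = put(store, "tree",
           b"100644 README.md\x00" + bytes.fromhex(readme)
           + b"40000 src\x00" + bytes.fromhex(src))
commit_id = put(store, "commit", f"tree {root}\n\nInitial commit\n".encode("utf-8"))
project = checkout(store, commit_id)
```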
How Branches Are Stored in the .git Folder
In Git, branches are lightweight pointers to specific commits. Internally, a branch is nothing more than a file that contains the SHA-1 hash of the commit it points to. These branch files are stored in the .git/refs/heads/ directory.
Storing Branches in .git/refs/heads/
When we create a branch in Git, a new file is added to the .git/refs/heads/ directory. The name of the file corresponds to the branch name, and its content is the SHA-1 hash of the commit that the branch is currently pointing to.
Example:
Let’s say we have a branch called main.
- Inside .git/refs/heads/, there will be a file named main.
    cat .git/refs/heads/main
  Output:
    3f62c0a7a7e1a81c4a92b9b9cf56d2a44ea18cf2
  - This file contains the hash of the latest commit on the main branch (3f62c0a...).
- If we create another branch (e.g., feature), Git will create a new file .git/refs/heads/feature with the same structure, containing the commit hash that the feature branch points to.
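Because a branch is just a small file, creating and reading one takes only a few lines. The following Python sketch mimics that with plain file operations in a directory that plays the role of .git. The function names are our own, and the sketch skips the locking and validation that Git itself performs, so it is meant for study rather than for editing a live repository.

```python
import os
import tempfile


def branch_path(git_dir, branch):
    # A branch name such as "feature/login" maps to nested directories.
    return os.path.join(git_dir, "refs", "heads", *branch.split("/"))


def create_branch(git_dir, branch, commit_id):
    """Create (or move) a branch by writing a commit hash into its file."""
    path = branch_path(git_dir, branch)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii") as f:
        f.write(commit_id + "\n")


def read_branch(git_dir, branch):
    """Return the commit hash that a branch file contains."""
    with open(branch_path(git_dir, branch), encoding="ascii") as f:
        return f.read().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        create_branch(scratch, "main", "3f62c0a7a7e1a81c4a92b9b9cf56d2a44ea18cf2")
        create_branch(scratch, "feature", read_branch(scratch, "main"))
        print(read_branch(scratch, "feature"))
```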
Relationship Between HEAD and Branches
Git uses the HEAD file to determine the currently active branch or commit. The HEAD file typically contains a reference to the branch we’re currently working on.
Example:
If we’re on the main branch, the HEAD file will contain:
ref: refs/heads/main
This indicates that HEAD is pointing to the main branch. When we make a new commit, Git updates the main file inside .git/refs/heads/ to point to the new commit hash.
Detached HEAD State
In some cases, HEAD might point to a specific commit hash instead of a branch. This is known as a detached HEAD state.
Example:
If we check out a specific commit using git checkout <commit-hash>, the HEAD file will contain:
3f62c0a7a7e1a81c4a92b9b9cf56d2a44ea18cf2
In this case, we are not on any branch, but rather on a specific commit.
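Both cases can be told apart by looking at how the HEAD file starts. Here is a short Python sketch, with a hypothetical function name, that returns the reference HEAD points at (or None when detached) together with the commit hash. It only looks for the branch as a loose file and does not consult packed references.

```python
import os


def resolve_head(git_dir):
    """Return (ref name or None, commit hash or None) for a .git-like directory."""
    with open(os.path.join(git_dir, "HEAD"), encoding="utf-8") as f:
        content = f.read().strip()
    if content.startswith("ref: "):
        # Attached: HEAD names a branch, and the branch file holds the hash.
        ref = content[len("ref: "):]
        ref_path = os.path.join(git_dir, *ref.split("/"))
        commit_id = None  # stays None if the branch has no commit yet
        if os.path.isfile(ref_path):
            with open(ref_path, encoding="ascii") as f:
                commit_id = f.read().strip()
        return ref, commit_id
    # Detached: HEAD holds a commit hash directly.
    return None, content
```

For a repository on main, the first element would be refs/heads/main; after checking out a commit hash directly, it would be None.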
Remote Branches in .git/refs/remotes/
Remote-tracking branches are stored in .git/refs/remotes/. For example, if we clone a repository with a main branch from a remote named origin, Git will create the file .git/refs/remotes/origin/main to track the remote branch.
Example:
cat .git/refs/remotes/origin/main
Output:
3f62c0a7a7e1a81c4a92b9b9cf56d2a44ea18cf2
Packed References
When a repository has many branches, tags, or references, Git might store them in a more optimized format called a packed reference file. Instead of having individual files for each branch in .git/refs/heads/, Git consolidates them into a single file called .git/packed-refs.
Example:
Contents of .git/packed-refs:
# pack-refs with: peeled fully-peeled
3f62c0a7a7e1a81c4a92b9b9cf56d2a44ea18cf2 refs/heads/main
1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b refs/heads/feature
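Git packs references automatically, typically during garbage collection (git gc), and git pack-refs does it on demand. A loose file under .git/refs/ takes precedence over the packed entry for the same reference, so the logical structure of branches remains the same.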
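To round this off, here is a Python sketch of how a reference lookup might combine both storage forms: it checks for a loose file first and falls back to packed-refs. The function names are ours, the sample text uses placeholder hashes, and lines starting with "^" (peeled tag values) are skipped rather than interpreted.

```python
import os


def parse_packed_refs(text):
    """Parse the contents of a packed-refs file into {ref name: hash}."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#^":
            continue  # header comment, or the peeled value of the preceding tag
        oid, _, name = line.partition(" ")
        refs[name] = oid
    return refs


def lookup_ref(git_dir, name):
    """Resolve a ref such as "refs/heads/main"; a loose file wins over packed-refs."""
    loose_path = os.path.join(git_dir, *name.split("/"))
    if os.path.isfile(loose_path):
        with open(loose_path, encoding="ascii") as f:
            return f.read().strip()
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed_path):
        with open(packed_path, encoding="utf-8") as f:
            return parse_packed_refs(f.read()).get(name)
    return None


SAMPLE_PACKED_REFS = (
    "# pack-refs with: peeled fully-peeled\n"
    "3f62c0a7a7e1a81c4a92b9b9cf56d2a44ea18cf2 refs/heads/main\n"
    "1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b refs/heads/feature\n"
)

if __name__ == "__main__":
    print(parse_packed_refs(SAMPLE_PACKED_REFS))
```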