patch-id Git Command Guide

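The git patch-id command reads a patch from standard input and computes a patch ID: a sum of SHA-1 hashes of the file diffs, with line numbers and whitespace ignored. The ID is reasonably stable and reasonably unique, so the same change keeps the same ID when it moves to different line numbers. This makes it useful for identifying duplicate or equivalent patches.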

git patch-id Syntax:

git patch-id [--stable | --unstable | --verbatim]

Algorithm Options:

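Option	Description
`--stable`	Hash each file diff separately and sum the results, so reordering the file diffs within a patch does not change the ID (default when `patchid.stable` is true)
`--unstable`	Hash the whole patch in order; the result is compatible with Git 1.9 and older (default)
`--verbatim`	Hash the input as given, without stripping whitespace (default when `patchid.verbatim` is true)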
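
The difference between --stable and --unstable only shows up when the file diffs inside a patch appear in a different order. The following is a minimal sketch of that case, not a definitive test: the file names and contents are made up, and the scratch repository exists only so that patch-id runs in a known context.

```shell
# Sketch: the same two file diffs, concatenated in both orders.
workdir=$(mktemp -d)
git init -q "$workdir"   # scratch repository (hypothetical, for illustration only)

cat > "$workdir/first.diff" <<'EOF'
diff --git a/a.txt b/a.txt
--- a/a.txt
+++ b/a.txt
@@ -1,1 +1,1 @@
-old a
+new a
EOF

cat > "$workdir/second.diff" <<'EOF'
diff --git a/b.txt b/b.txt
--- a/b.txt
+++ b/b.txt
@@ -1,1 +1,1 @@
-old b
+new b
EOF

cat "$workdir/first.diff" "$workdir/second.diff" > "$workdir/ab.patch"
cat "$workdir/second.diff" "$workdir/first.diff" > "$workdir/ba.patch"

patch_id() {  # usage: patch_id <option> <patch-file>
    git -C "$workdir" patch-id "$1" < "$2" | cut -d' ' -f1
}

stable_ab=$(patch_id --stable "$workdir/ab.patch")
stable_ba=$(patch_id --stable "$workdir/ba.patch")
unstable_ab=$(patch_id --unstable "$workdir/ab.patch")
unstable_ba=$(patch_id --unstable "$workdir/ba.patch")

echo "stable:   $stable_ab / $stable_ba"
echo "unstable: $unstable_ab / $unstable_ba"

rm -rf "$workdir"
```

The two --stable values should match, while the two --unstable values should differ, because only the stable algorithm is independent of file order.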

Understanding Patch ID Computation:

Patch ID Algorithm:

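Patch ID = sum of SHA-1 hashes of the file diffs with:
├── Hunk headers (line numbers) ignored
├── Index lines and commit messages ignored
├── Whitespace stripped from every line (except with --verbatim)
└── File names, context lines, and added/removed lines all hashed

Stable: Per-file hashes are summed, so file order within the patch does not matter
Unstable: One running hash over the whole patch; matches Git 1.9 and older (default)
Verbatim: Whitespace is not stripped, so whitespace-only differences change the ID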

Patch ID Properties:

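Stable across:
├── Line number changes (hunk position in the file)
├── Whitespace changes (unless --verbatim is used)
├── Commit message, author, and date changes
├── File diff order within the patch (with --stable)

Changes with:
├── Actual code changes
├── Added/removed lines
├── Different context lines (for example, diffs made with a different -U<n>)
├── File renames (paths are part of the hashed diff)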
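
To see which differences affect the ID, one option is to derive several variants of a single hand-written patch and compare the results. This is a sketch under stated assumptions: the patch content and file names are invented, the default algorithm is used, and the scratch repository only provides a known context for patch-id.

```shell
# Sketch: derive variants of one hand-written patch and compare their IDs.
workdir=$(mktemp -d)
git init -q "$workdir"   # scratch repository (hypothetical, for illustration only)

cat > "$workdir/base.patch" <<'EOF'
diff --git a/hello.txt b/hello.txt
--- a/hello.txt
+++ b/hello.txt
@@ -1,3 +1,3 @@
 alpha
-beta
+BETA
 gamma
EOF

# Same change, located ten lines further down the file
sed 's/^@@ -1,3 +1,3 @@$/@@ -11,3 +11,3 @@/' "$workdir/base.patch" > "$workdir/moved.patch"
# Same change with extra whitespace in the added line
sed 's/^+BETA$/+   BETA/' "$workdir/base.patch" > "$workdir/spaced.patch"
# Different added text
sed 's/^+BETA$/+Beta/' "$workdir/base.patch" > "$workdir/edited.patch"
# Different context line
sed 's/^ alpha$/ ALPHA/' "$workdir/base.patch" > "$workdir/context.patch"

patch_id_of() {  # usage: patch_id_of <patch-file>
    git -C "$workdir" patch-id < "$1" | cut -d' ' -f1
}

id_base=$(patch_id_of "$workdir/base.patch")
id_moved=$(patch_id_of "$workdir/moved.patch")
id_spaced=$(patch_id_of "$workdir/spaced.patch")
id_edited=$(patch_id_of "$workdir/edited.patch")
id_context=$(patch_id_of "$workdir/context.patch")

printf '%-8s %s\n' base "$id_base" moved "$id_moved" spaced "$id_spaced" \
    edited "$id_edited" context "$id_context"

rm -rf "$workdir"
```

The base, moved, and spaced variants should print the same ID; the edited and context variants should each print a different one.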

Basic Usage Examples:

Compute Patch ID from Diff:

# Generate patch ID from git diff output
git diff HEAD~1 | git patch-id

# Output: <patch-id> <commit-id>
# (the commit ID is all zeros when the input has no commit header, as here)

# Compute ID for specific commit
git show <commit> | git patch-id

# Get patch ID for staged changes
git diff --cached | git patch-id

Compare Patches for Duplicates:

# Compare two commits for duplicate patches
# Keep only the first field: the second is the commit ID, which always differs
id1=$(git show commit1 | git patch-id | cut -d' ' -f1)
id2=$(git show commit2 | git patch-id | cut -d' ' -f1)

if [ -n "$id1" ] && [ "$id1" = "$id2" ]; then
    echo "Patches are identical"
else
    echo "Patches are different"
fi

Process Multiple Patches:

# Process multiple patches from mailbox
# patch-id reads the whole mbox and prints one line per patch it finds
# (git am would apply the patches instead of just reading them)
git patch-id < patches.mbox

# Batch process patch files
for patch in *.patch; do
    echo "Processing $patch:"
    git patch-id < "$patch"
done

Advanced Patch Analysis Scenarios:

Duplicate Commit Detection:

#!/bin/bash
# Find duplicate commits in repository

find_duplicate_commits() {
    echo "Searching for duplicate commits..."

    # Create mapping of patch IDs to commits
    declare -A patch_map

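    # Process all non-merge commits (a merge has no single patch)
    # tformat ends every line with a newline, so read sees the last commit too
    git log --all --no-merges --pretty=tformat:"%H" | while read -r commit; do
        patch_id=$(git show "$commit" | git patch-id | cut -d' ' -f1)
        [ -n "$patch_id" ] || continue  # Skip commits with an empty diff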

        if [ -n "${patch_map[$patch_id]}" ]; then
            echo "Duplicate found:"
            echo "  Original: ${patch_map[$patch_id]}"
            echo "  Duplicate: $commit"
            echo "  Patch ID: $patch_id"
            echo ""
        else
            patch_map[$patch_id]="$commit"
        fi
    done
}

find_duplicate_commits

Patch Series Analysis:

# Analyze patch series for duplicates or conflicts
analyze_patch_series() {
    local patch_dir="$1"

    echo "Analyzing patch series in $patch_dir"

    declare -A seen_patches

    for patch_file in "$patch_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        patch_name=$(basename "$patch_file")

        if [ -n "${seen_patches[$patch_id]}" ]; then
            echo "Warning: Duplicate patch content"
            echo "  Original: ${seen_patches[$patch_id]}"
            echo "  Duplicate: $patch_name"
            echo "  Patch ID: $patch_id"
        else
            seen_patches[$patch_id]="$patch_name"
            echo "✓ $patch_name (ID: ${patch_id:0:8})"
        fi
    done
}

analyze_patch_series "/path/to/patches"

Email Thread Deduplication:

# Remove duplicate patches from email threads
deduplicate_email_patches() {
    local mbox_file="$1"

    echo "Deduplicating patches in $mbox_file"

    declare -A processed_patches
    temp_file=$(mktemp)
    split_dir=$(mktemp -d)  # Keep split messages out of the working directory

    # Split the mbox into one file per message
    git mailsplit -o"$split_dir" "$mbox_file" >/dev/null

    for msg_file in "$split_dir"/[0-9]*; do
        [ -f "$msg_file" ] || continue

        # git patch-id skips mail headers and the commit message on its own;
        # its output is empty when the message contains no patch
        patch_id=$(git patch-id < "$msg_file" | cut -d' ' -f1)

        if [ -n "$patch_id" ]; then
            if [ -z "${processed_patches[$patch_id]}" ]; then
                processed_patches[$patch_id]="$msg_file"
                cat "$msg_file" >> "$temp_file"
                echo "" >> "$temp_file"  # Message separator
            else
                echo "Skipping duplicate patch: $msg_file (matches ${processed_patches[$patch_id]})"
            fi
        else
            # Not a patch message, include as-is
            cat "$msg_file" >> "$temp_file"
            echo "" >> "$temp_file"
        fi
    done

    # Clean up only the files this function created
    rm -rf "$split_dir"

    # Save the deduplicated version next to the original
    mv "$temp_file" "$mbox_file.deduplicated"

    echo "Deduplicated mbox saved as: $mbox_file.deduplicated"
}

deduplicate_email_patches "patches.mbox"

Integration with Development Workflows:

Patch Review Workflow:

#!/bin/bash
# Patch review workflow with duplicate detection

review_patches() {
    local patch_dir="$1"

    echo "Reviewing patches in $patch_dir"

    declare -A reviewed_patches

    for patch_file in "$patch_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)

        if [ -n "${reviewed_patches[$patch_id]}" ]; then
            echo "⚠ Duplicate patch detected:"
            echo "  Original: ${reviewed_patches[$patch_id]}"
            echo "  Current: $(basename "$patch_file")"
            echo "  Consider rejecting duplicate"
        else
            echo "✓ New patch: $(basename "$patch_file")"
            reviewed_patches[$patch_id]="$(basename "$patch_file")"

            # Apply and test patch
            if git apply --check "$patch_file" 2>/dev/null; then
                echo "  ✓ Patch applies cleanly"
            else
                echo "  ✗ Patch has conflicts"
            fi
        fi
    done
}

review_patches "/path/to/review/patches"

Automated Patch Management:

# Automated patch management system
manage_patches() {
    local incoming_dir="$1"
    local processed_dir="$2"
    local duplicate_dir="$3"

    mkdir -p "$processed_dir" "$duplicate_dir"

    declare -A known_patches

    # Load existing patch database
    if [ -f patch-database.txt ]; then
        while IFS='|' read -r patch_id filename; do
            known_patches[$patch_id]="$filename"
        done < patch-database.txt
    fi

    # Process incoming patches
    for patch_file in "$incoming_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        filename=$(basename "$patch_file")

        if [ -n "${known_patches[$patch_id]}" ]; then
            echo "Duplicate patch: $filename (matches ${known_patches[$patch_id]})"
            mv "$patch_file" "$duplicate_dir/"
        else
            echo "New patch: $filename"
            mv "$patch_file" "$processed_dir/"
            known_patches[$patch_id]="$filename"

            # Apply patch if it applies cleanly
            if git apply --check "$processed_dir/$filename" 2>/dev/null; then
                git am "$processed_dir/$filename"
                echo "✓ Patch applied successfully"
            else
                echo "⚠ Patch needs manual review"
            fi
        fi
    done

    # Save updated database
    > patch-database.txt
    for patch_id in "${!known_patches[@]}"; do
        echo "$patch_id|${known_patches[$patch_id]}" >> patch-database.txt
    done
}

manage_patches "incoming" "processed" "duplicates"

CI/CD Patch Validation:

# Validate patches in CI/CD pipeline
validate_patches_ci() {
    echo "CI/CD patch validation"

    # Check for duplicate patches in PR
    declare -A pr_patches

    for patch_file in *.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)

        if [ -n "${pr_patches[$patch_id]}" ]; then
            echo "❌ Duplicate patches detected in PR:"
            echo "  ${pr_patches[$patch_id]}"
            echo "  $patch_file"
            exit 1
        else
            pr_patches[$patch_id]="$patch_file"
        fi

        # Validate patch applies
        if ! git apply --check "$patch_file" 2>/dev/null; then
            echo "❌ Patch does not apply cleanly: $patch_file"
            exit 1
        fi
    done

    echo "✅ All patches validated successfully"
}

validate_patches_ci

Configuration and Best Practices:

Patch ID Algorithm Selection:

# Choose the default algorithm based on use case
# For patch deduplication that should ignore file order (most common)
git config patchid.stable true

# For IDs compatible with Git 1.9 and older (the built-in default)
git config patchid.stable false

# For exact matching that keeps whitespace
git config patchid.verbatim true

Performance Optimization:

# Cache patch IDs for repeated operations
cache_patch_ids() {
    local cache_file=".patch-id-cache"

    # Rebuild when the cache is missing or any patch file is newer than it
    if [ ! -f "$cache_file" ] || [ -n "$(find . -maxdepth 1 -name "*.patch" -newer "$cache_file" 2>/dev/null | head -1)" ]; then
        echo "Building patch ID cache..."

        > "$cache_file"
        for patch_file in *.patch; do
            [ -f "$patch_file" ] || continue
            patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
            echo "$patch_id|$patch_file" >> "$cache_file"
        done

        echo "Cache built with $(wc -l < "$cache_file") entries"
    fi
}

cache_patch_ids

Troubleshooting Common Issues:

Inconsistent Patch IDs:

# Debug patch ID inconsistencies
debug_patch_id() {
    local patch_file="$1"

    echo "Debugging patch ID for: $patch_file"

    # Show patch content summary (approximate; "+++" and "---" file headers are excluded)
    echo "Patch statistics:"
    grep -c "^@@" "$patch_file" | xargs echo "Hunks:"
    grep -v "^+++ " "$patch_file" | grep -c "^+" | xargs echo "Additions:"
    grep -v "^--- " "$patch_file" | grep -c "^-" | xargs echo "Deletions:"

    # Compute with different algorithms
    echo "Patch IDs:"
    echo "  Stable:   $(git patch-id --stable < "$patch_file" | cut -d' ' -f1)"
    echo "  Unstable: $(git patch-id --unstable < "$patch_file" | cut -d' ' -f1)"
    echo "  Verbatim: $(git patch-id --verbatim < "$patch_file" | cut -d' ' -f1)"
}

debug_patch_id "problematic.patch"

Large Patch Performance:

# Handle large patches efficiently
process_large_patches() {
    local patch_dir="$1"

    echo "Processing large patches..."

    # Process in parallel for performance
    # Double quotes around the cut delimiter keep the single-quoted script intact
    find "$patch_dir" -name "*.patch" -print0 | \
    xargs -0 -n 1 -P "$(nproc)" bash -c '
        patch_file="$1"
        patch_id=$(git patch-id < "$patch_file" | cut -d" " -f1)
        echo "$patch_id|$patch_file"
    ' _ | sort > patch-ids.txt

    echo "Processed $(wc -l < patch-ids.txt) patches"
}

process_large_patches "/large/patch/collection"

Encoding and Format Issues:

# Handle different patch encodings
normalize_patch_encoding() {
    local patch_file="$1"

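    # Detect encoding
    encoding=$(file -b --mime-encoding "$patch_file")

    # us-ascii is already valid UTF-8, so only convert other encodings
    if [ "$encoding" != "utf-8" ] && [ "$encoding" != "us-ascii" ]; then
        echo "Converting $patch_file from $encoding to UTF-8"
        # Replace the file only if the conversion succeeds
        iconv -f "$encoding" -t utf-8 "$patch_file" > "${patch_file}.utf8" &&
            mv "${patch_file}.utf8" "$patch_file"
    fi

    # Normalize line endings (GNU sed syntax; BSD/macOS sed needs -i '')
    sed -i 's/\r$//' "$patch_file"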
}

normalize_patch_encoding "encoded.patch"

Real-World Usage Examples:

Open Source Contribution Management:

#!/bin/bash
# Manage open source contributions with patch deduplication

manage_contributions() {
    local contribution_dir="$1"

    echo "Managing contributions in $contribution_dir"

    declare -A contribution_map
    declare -A duplicate_map

    # Process all contribution patches. Process substitution keeps the loop in
    # the current shell, so both arrays are still filled for the report below.
    while IFS= read -r patch_file; do
        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        [ -n "$patch_id" ] || continue  # Skip files that contain no diff
        contributor=$(basename "$(dirname "$patch_file")")

        if [ -n "${contribution_map[$patch_id]}" ]; then
            echo "Duplicate contribution detected:"
            echo "  Original: ${contribution_map[$patch_id]}"
            echo "  Duplicate: $contributor/$(basename "$patch_file")"
            duplicate_map[$patch_id]="${duplicate_map[$patch_id]} $contributor"
        else
            contribution_map[$patch_id]="$contributor/$(basename "$patch_file")"
            echo "✓ New contribution: $contributor/$(basename "$patch_file")"
        fi
    done < <(find "$contribution_dir" -name "*.patch")

    # Report duplicates
    if [ ${#duplicate_map[@]} -gt 0 ]; then
        echo ""
        echo "Duplicate summary:"
        for patch_id in "${!duplicate_map[@]}"; do
            echo "Patch ID $patch_id: ${duplicate_map[$patch_id]}"
        done
    fi
}

manage_contributions "/contributions"

Patch-Based Code Review:

# Code review workflow with patch analysis
review_with_patch_analysis() {
    local pr_number="$1"

    echo "Reviewing PR #$pr_number with patch analysis"

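    # Get PR patches in mbox format (patch_url rather than diff_url, so that
    # git mailsplit can split them); -L follows the redirect to the patch file
    curl -s "https://api.github.com/repos/owner/repo/pulls/$pr_number" |
    jq -r '.patch_url' | xargs curl -sL > pr.patch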

    # Split into individual patches
    git mailsplit -o. pr.patch >/dev/null

    # Analyze each patch
    for patch_file in [0-9]*; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)

        echo "Analyzing patch: $patch_file (ID: ${patch_id:0:8})"

        # Check for duplicates in codebase (one log pass; skip if no ID was computed)
        if [ -n "$patch_id" ] && git log --all --no-merges -p | git patch-id | grep -q "^$patch_id "; then
            echo "  ⚠ Same change already exists in codebase"
        fi

        # Check patch quality (approximate counts; file header lines are excluded)
        hunks=$(grep -c "^@@" "$patch_file")
        additions=$(grep -v "^+++ " "$patch_file" | grep -c "^+")
        deletions=$(grep -v "^--- " "$patch_file" | grep -c "^-")

        echo "  Stats: $hunks hunks, +$additions -$deletions lines"

        # Apply and test
        if git apply --check "$patch_file" 2>/dev/null; then
            echo "  ✓ Applies cleanly"
        else
            echo "  ✗ Has conflicts"
        fi
    done

    # Cleanup
    rm -f [0-9]* pr.patch
}

review_with_patch_analysis "123"

Automated Patch Testing:

# Automated patch testing and validation
automated_patch_testing() {
    local patch_dir="$1"
    local test_script="$2"

    echo "Automated patch testing for $patch_dir"

    for patch_file in "$patch_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        echo "Testing: $(basename "$patch_file")"

        # Backup current state
        git branch backup-testing 2>/dev/null || true

        # Apply patch to the index and working tree, so the reset below also
        # removes any files the patch adds
        if git apply --index "$patch_file"; then
            echo "  ✓ Patch applied successfully"

            # Run tests
            if [ -x "$test_script" ]; then
                if "$test_script"; then
                    echo "  ✓ Tests passed"
                else
                    echo "  ✗ Tests failed"
                    patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
                    echo "  Patch ID: $patch_id (for investigation)"
                fi
            fi

            # Revert changes
            git reset --hard HEAD
        else
            echo "  ✗ Patch failed to apply"
        fi

        # Restore backup
        git reset --hard backup-testing 2>/dev/null || true
    done

    git branch -D backup-testing 2>/dev/null || true
}

automated_patch_testing "/patches" "./run-tests.sh"

What’s the difference between patch-id algorithms?

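--stable sums per-file hashes, so the order of file diffs within a patch does not affect the ID; --unstable (the default) hashes the patch in order and stays compatible with Git 1.9 and older; --verbatim hashes the input as given, without stripping whitespace.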

How stable are patch IDs across different Git versions?

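With the same algorithm, IDs are intended to match across Git versions. --unstable output is compatible with Git 1.9 and older, while --stable IDs differ from those older values. If you store IDs, record which algorithm produced them.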

Can patch-id work with binary patches?

Only with caution. patch-id is built around textual diffs, and its handling of binary diffs has changed between Git releases, so verify the behavior with your Git version before relying on IDs for binary changes.

What’s the performance impact of patch-id on large patches?

Roughly linear in patch size, since each diff line is hashed once. The choice of algorithm affects which patches compare equal, not how fast they are processed.

How do I use patch-id with git format-patch?

Write the patch to standard output and pipe it to patch-id: git format-patch -1 --stdout | git patch-id

Can patch-id detect reordered patches?

Each patch gets its own ID, so the order of patches in a series does not affect any individual ID. Within a single patch, reordering the file diffs changes the ID under --unstable but not under --stable.

What’s the relationship between patch-id and commit SHA?

Different concepts: patch-id identifies patch content; commit SHA includes author, date, and parent information.
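
A cherry-pick illustrates the distinction: the copied commit gets a new commit hash but keeps the patch ID. The sketch below builds a throwaway repository to show this. The repository layout, file names, and identity settings are all hypothetical, and it assumes the cherry-pick applies without conflicts.

```shell
# Sketch: a cherry-picked commit has a new hash but the same patch ID.
repo=$(mktemp -d)
git init -q "$repo"

g() {  # run git in the scratch repository with a placeholder identity
    git -C "$repo" -c user.name="Example Author" -c user.email="author@example.com" "$@"
}

printf 'one\ntwo\nthree\n' > "$repo/file.txt"
g add file.txt
g commit -q -m "Base commit"
base_branch=$(g symbolic-ref --short HEAD)

# Make the change on a topic branch
g checkout -q -b topic
printf 'one\nTWO\nthree\n' > "$repo/file.txt"
g commit -q -a -m "Uppercase second line"
topic_commit=$(g rev-parse HEAD)

# Advance the base branch so the cherry-pick gets a different parent
g checkout -q "$base_branch"
printf 'readme\n' > "$repo/README"
g add README
g commit -q -m "Add README"
g cherry-pick "$topic_commit" >/dev/null
picked_commit=$(g rev-parse HEAD)

patch_id_of() {  # usage: patch_id_of <commit>
    g diff-tree -p "$1" | g patch-id | cut -d' ' -f1
}

topic_patch_id=$(patch_id_of "$topic_commit")
picked_patch_id=$(patch_id_of "$picked_commit")

echo "commit hashes: $topic_commit / $picked_commit"
echo "patch IDs:     $topic_patch_id / $picked_patch_id"

rm -rf "$repo"
```

The two commit hashes should differ while the two patch IDs match.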

How do I handle patch-id in scripts?

Use cut -d' ' -f1 to extract just the patch ID. patch-id prints nothing when the input contains no diff, so check for empty output as well as exit codes.

Can patch-id work with partial patches?

Yes, computes ID from whatever diff content is provided. Useful for analyzing partial changes.

What’s the collision rate for patch IDs?

Accidental collisions are about as unlikely as SHA-1 collisions. Matching IDs for patches that differ only in line numbers or whitespace are intentional, not collisions.

How do I compare patches from different sources?

Compute patch IDs for both with the same algorithm and compare the first field. The same ID means the patches make the same change, apart from line numbers and whitespace.

Can patch-id work with merge commits?

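Not in a meaningful way. A merge has no single patch: git log -p and git diff-tree -p show no diff for merges by default, and the combined diff from git show is not an ordinary patch. Skip merges with --no-merges, or diff against one parent: git diff <merge>^1 <merge> | git patch-id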

What’s the output format of patch-id?

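One line per patch with two hexadecimal strings: the patch ID, then the commit ID. The commit ID comes from a commit header in the input (as produced by git diff-tree, git log -p, or git format-patch) and is all zeros when there is none, as with plain git diff output.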
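
The two output columns can be inspected with a hand-written diff, once as-is and once with a format-patch style header in front of it. This is a sketch under stated assumptions: the diff content, the subject line, and the commit hash in the header are made-up placeholders.

```shell
# Sketch: the same diff with and without a "From <commit>" header line.
workdir=$(mktemp -d)
git init -q "$workdir"   # scratch repository (hypothetical, for illustration only)

cat > "$workdir/plain.diff" <<'EOF'
diff --git a/note.txt b/note.txt
--- a/note.txt
+++ b/note.txt
@@ -1,1 +1,1 @@
-draft
+final
EOF

# Placeholder hash; a real format-patch file carries the actual commit hash here
fake_commit=1234567890abcdef1234567890abcdef12345678
{
    echo "From $fake_commit Mon Sep 17 00:00:00 2001"
    echo "Subject: [PATCH] Finalize note"
    echo
    cat "$workdir/plain.diff"
} > "$workdir/mail.patch"

plain_line=$(git -C "$workdir" patch-id < "$workdir/plain.diff")
mail_line=$(git -C "$workdir" patch-id < "$workdir/mail.patch")

# Split each output line into its two fields
plain_id=${plain_line%% *}
plain_commit=${plain_line#* }
mail_id=${mail_line%% *}
mail_commit=${mail_line#* }

echo "plain diff:  patch ID $plain_id, commit ID $plain_commit"
echo "with header: patch ID $mail_id, commit ID $mail_commit"

rm -rf "$workdir"
```

Both inputs should yield the same patch ID; only the second field changes, from all zeros to the hash taken from the header line.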

How do I batch process many patches?

Use find + xargs for parallel processing, or write a loop that caches results to avoid recomputation.

Applications of the git patch-id command

Duplicate Detection: Identify duplicate patches and commits across repositories and patch series
Patch Management: Organize and deduplicate large collections of patches
Code Review: Detect when similar changes are proposed multiple times
Automated Testing: Validate patch uniqueness in CI/CD pipelines
Contribution Tracking: Manage open source contributions and avoid duplicate work
Patch Series Analysis: Analyze relationships between patches in complex patch sets