patch-id Git Command Guide
The git patch-id command reads a patch from standard input and computes a unique patch ID based on the file diffs. Patch IDs are stable across line number changes and can be used to identify duplicate or equivalent patches.
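A minimal sketch of that stability, assuming `git` is on the PATH. The two patches and the file name `greet.txt` are made up for illustration; they differ only in their hunk line numbers and in spacing, so both are expected to hash to the same ID.

```bash
#!/usr/bin/env bash
# Two hypothetical patches that make the same change at different offsets.
patch_a='diff --git a/greet.txt b/greet.txt
--- a/greet.txt
+++ b/greet.txt
@@ -1,1 +1,1 @@
-hello
+hello world
'
patch_b='diff --git a/greet.txt b/greet.txt
--- a/greet.txt
+++ b/greet.txt
@@ -40,1 +42,1 @@
-hello
+hello   world
'

# Print only the patch ID (first output field) for a patch read from stdin.
patch_id_only() {
    git patch-id --stable | cut -d' ' -f1
}

id_a=$(printf '%s' "$patch_a" | patch_id_only)
id_b=$(printf '%s' "$patch_b" | patch_id_only)

if [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]; then
    echo "same patch ID: $id_a"
else
    echo "patch IDs differ: $id_a vs $id_b"
fi
```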
git patch-id Syntax:
```bash
git patch-id [--stable | --unstable | --verbatim]
```
Algorithm Options:
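| Option | Description |
|---|---|
| `--stable` | Sum per-file hashes, so reordering the file diffs in a patch does not change the ID. Default only when `patchid.stable` is true. |
| `--unstable` | Hash the patch as one stream; the result depends on file order and matches IDs produced by Git 1.9 and older. This is the default. |
| `--verbatim` | Hash the input as given, without stripping whitespace. Default when `patchid.verbatim` is true. |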
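A sketch of the practical difference between `--stable` and `--unstable`, assuming `git` is on the PATH. The two single-file diffs and the helper name `id_for_order` are hypothetical; the script concatenates the diffs in both orders and reports whether the ID changes.

```bash
#!/usr/bin/env bash
# Hypothetical single-file diffs for two files, a.txt and b.txt.
diff_a='diff --git a/a.txt b/a.txt
--- a/a.txt
+++ b/a.txt
@@ -1,1 +1,1 @@
-old a
+new a
'
diff_b='diff --git a/b.txt b/b.txt
--- a/b.txt
+++ b/b.txt
@@ -1,1 +1,1 @@
-old b
+new b
'

# Print the patch ID of two file diffs concatenated in the given order.
# $1 is the mode flag (--stable or --unstable); $2 and $3 are diff texts.
id_for_order() {
    printf '%s%s' "$2" "$3" | git patch-id "$1" | cut -d' ' -f1
}

for mode in --stable --unstable; do
    ab=$(id_for_order "$mode" "$diff_a" "$diff_b")
    ba=$(id_for_order "$mode" "$diff_b" "$diff_a")
    if [ "$ab" = "$ba" ]; then
        echo "$mode: file order does not change the ID"
    else
        echo "$mode: file order changes the ID"
    fi
done
```

Order independence is what makes `--stable` the safer choice when the same change may have been generated with different diff ordering settings.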
Understanding Patch ID Computation:
Patch ID Algorithm:
```
Patch ID = SHA-1 sum of the file diffs, with:
├── Line numbers in hunk headers ignored
├── Whitespace ignored (unless --verbatim is given)
├── Commit messages and index lines skipped
└── File paths, context lines, and changed lines all hashed

Stable:   per-file hashes are summed, so file order does not matter
Unstable: one hash over the whole patch, so file order matters
Verbatim: whitespace is kept, so the content must match exactly
```
Patch ID Properties:
```
Stable across:
├── Line number changes (hunks shifted up or down)
├── Whitespace differences (without --verbatim)
├── Commit message, author, and date changes
└── Reordered file diffs (with --stable)

Changes with:
├── Actual code changes
├── Added/removed lines
├── Different context lines or context size
└── Different file paths, including renames
```
Basic Usage Examples:
Compute Patch ID from Diff:
```bash
# Generate patch ID from git diff output
git diff HEAD~1 | git patch-id
# Output: <patch-id> <commit-id>
# (the commit ID is all zeros when the input carries no commit line)

# Compute ID for a specific commit
git show <commit> | git patch-id

# Get patch ID for staged changes
git diff --cached | git patch-id
```
Compare Patches for Duplicates:
```bash
# Compare two commits for duplicate patches
# Keep only the patch ID; the commit ID in the second field always differs
git show commit1 | git patch-id | cut -d' ' -f1 > id1.txt
git show commit2 | git patch-id | cut -d' ' -f1 > id2.txt

if diff id1.txt id2.txt >/dev/null; then
    echo "Patches are identical"
else
    echo "Patches are different"
fi
```
Process Multiple Patches:
```bash
# Process multiple patches from a mailbox
# git patch-id splits the input itself and prints one line per patch
git patch-id < patches.mbox

# Batch process patch files
for patch in *.patch; do
    echo "Processing $patch:"
    git patch-id < "$patch"
done
```
Advanced Patch Analysis Scenarios:
Duplicate Commit Detection:
```bash
#!/bin/bash
# Find duplicate commits in repository

find_duplicate_commits() {
    echo "Searching for duplicate commits..."

    # Create mapping of patch IDs to commits
    declare -A patch_map

    # Process all non-merge commits; process substitution keeps the loop
    # in the current shell so patch_map is not lost in a subshell
    while read -r commit; do
        patch_id=$(git show "$commit" | git patch-id | cut -d' ' -f1)

        # Commits with an empty diff produce no patch ID
        [ -n "$patch_id" ] || continue

        if [ -n "${patch_map[$patch_id]}" ]; then
            echo "Duplicate found:"
            echo "  Original: ${patch_map[$patch_id]}"
            echo "  Duplicate: $commit"
            echo "  Patch ID: $patch_id"
            echo ""
        else
            patch_map[$patch_id]="$commit"
        fi
    done < <(git rev-list --all --no-merges)
}

find_duplicate_commits
```
Patch Series Analysis:
```bash
# Analyze patch series for duplicates or conflicts
analyze_patch_series() {
    local patch_dir="$1"

    echo "Analyzing patch series in $patch_dir"

    declare -A seen_patches

    for patch_file in "$patch_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        patch_name=$(basename "$patch_file")

        # Files without a diff produce no patch ID
        [ -n "$patch_id" ] || continue

        if [ -n "${seen_patches[$patch_id]}" ]; then
            echo "Warning: Duplicate patch content"
            echo "  Original: ${seen_patches[$patch_id]}"
            echo "  Duplicate: $patch_name"
            echo "  Patch ID: $patch_id"
        else
            seen_patches[$patch_id]="$patch_name"
            echo "✓ $patch_name (ID: ${patch_id:0:8})"
        fi
    done
}

analyze_patch_series "/path/to/patches"
```
Email Thread Deduplication:
```bash
# Remove duplicate patches from email threads
deduplicate_email_patches() {
    local mbox_file="$1"
    local temp_file split_dir msg_file patch_id

    echo "Deduplicating patches in $mbox_file"

    declare -A processed_patches
    temp_file=$(mktemp)
    # Split into a scratch directory so cleanup cannot touch other files
    split_dir=$(mktemp -d)

    # Process each message in mbox
    git mailsplit -o"$split_dir" "$mbox_file" >/dev/null

    for msg_file in "$split_dir"/*; do
        [ -f "$msg_file" ] || continue

        # git patch-id skips mail headers and the commit message itself;
        # empty output means the message carries no patch
        patch_id=$(git patch-id < "$msg_file" 2>/dev/null | cut -d' ' -f1)

        if [ -z "$patch_id" ]; then
            # Not a patch message, include as-is
            cat "$msg_file" >> "$temp_file"
            echo "" >> "$temp_file"
        elif [ -z "${processed_patches[$patch_id]}" ]; then
            processed_patches[$patch_id]="$msg_file"
            cat "$msg_file" >> "$temp_file"
            echo "" >> "$temp_file"  # Message separator
        else
            echo "Skipping duplicate patch: $msg_file (matches ${processed_patches[$patch_id]})"
        fi
    done

    # Clean up
    rm -rf "$split_dir"

    # Save the deduplicated version next to the original
    mv "$temp_file" "$mbox_file.deduplicated"

    echo "Deduplicated mbox saved as: $mbox_file.deduplicated"
}

deduplicate_email_patches "patches.mbox"
```
Integration with Development Workflows:
Patch Review Workflow:
```bash
#!/bin/bash
# Patch review workflow with duplicate detection

review_patches() {
    local patch_dir="$1"

    echo "Reviewing patches in $patch_dir"

    declare -A reviewed_patches

    for patch_file in "$patch_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)

        if [ -n "${reviewed_patches[$patch_id]}" ]; then
            echo "⚠ Duplicate patch detected:"
            echo "  Original: ${reviewed_patches[$patch_id]}"
            echo "  Current: $(basename "$patch_file")"
            echo "  Consider rejecting duplicate"
        else
            echo "✓ New patch: $(basename "$patch_file")"
            reviewed_patches[$patch_id]="$(basename "$patch_file")"

            # Check whether the patch applies (nothing is modified)
            if git apply --check "$patch_file" 2>/dev/null; then
                echo "  ✓ Patch applies cleanly"
            else
                echo "  ✗ Patch has conflicts"
            fi
        fi
    done
}

review_patches "/path/to/review/patches"
```
Automated Patch Management:
```bash
# Automated patch management system
manage_patches() {
    local incoming_dir="$1"
    local processed_dir="$2"
    local duplicate_dir="$3"

    mkdir -p "$processed_dir" "$duplicate_dir"

    declare -A known_patches

    # Load existing patch database
    if [ -f patch-database.txt ]; then
        while IFS='|' read -r patch_id filename; do
            known_patches[$patch_id]="$filename"
        done < patch-database.txt
    fi

    # Process incoming patches
    for patch_file in "$incoming_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        filename=$(basename "$patch_file")

        if [ -n "${known_patches[$patch_id]}" ]; then
            echo "Duplicate patch: $filename (matches ${known_patches[$patch_id]})"
            mv "$patch_file" "$duplicate_dir/"
        else
            echo "New patch: $filename"
            mv "$patch_file" "$processed_dir/"
            known_patches[$patch_id]="$filename"

            # Apply patch if it applies cleanly
            if git apply --check "$processed_dir/$filename" 2>/dev/null; then
                git am "$processed_dir/$filename"
                echo "✓ Patch applied successfully"
            else
                echo "⚠ Patch needs manual review"
            fi
        fi
    done

    # Save updated database
    > patch-database.txt
    for patch_id in "${!known_patches[@]}"; do
        echo "$patch_id|${known_patches[$patch_id]}" >> patch-database.txt
    done
}

manage_patches "incoming" "processed" "duplicates"
```
CI/CD Patch Validation:
```bash
# Validate patches in CI/CD pipeline
validate_patches_ci() {
    echo "CI/CD patch validation"

    # Check for duplicate patches in PR
    declare -A pr_patches

    for patch_file in *.patch; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)

        if [ -n "${pr_patches[$patch_id]}" ]; then
            echo "❌ Duplicate patches detected in PR:"
            echo "  ${pr_patches[$patch_id]}"
            echo "  $patch_file"
            exit 1
        else
            pr_patches[$patch_id]="$patch_file"
        fi

        # Validate patch applies
        if ! git apply --check "$patch_file" 2>/dev/null; then
            echo "❌ Patch does not apply cleanly: $patch_file"
            exit 1
        fi
    done

    echo "✅ All patches validated successfully"
}

validate_patches_ci
```
Configuration and Best Practices:
Patch ID Algorithm Selection:
```bash
# Choose the default mode through configuration
# Order-independent IDs, suited to patch deduplication (most common)
git config patchid.stable true

# IDs compatible with Git 1.9 and older (the built-in default)
git config patchid.stable false

# Keep whitespace significant, for exact content matching
git config patchid.verbatim true
```
Performance Optimization:
```bash
# Cache patch IDs for repeated operations
cache_patch_ids() {
    local cache_file=".patch-id-cache"

    # Rebuild when the cache is missing or any patch file is newer than it
    if [ ! -f "$cache_file" ] ||
       [ -n "$(find . -name "*.patch" -newer "$cache_file" 2>/dev/null | head -1)" ]; then
        echo "Building patch ID cache..."

        > "$cache_file"
        for patch_file in *.patch; do
            [ -f "$patch_file" ] || continue
            patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
            echo "$patch_id|$patch_file" >> "$cache_file"
        done

        echo "Cache built with $(wc -l < "$cache_file") entries"
    fi
}

cache_patch_ids
```
Troubleshooting Common Issues:
Inconsistent Patch IDs:
```bash
# Debug patch ID inconsistencies
debug_patch_id() {
    local patch_file="$1"

    echo "Debugging patch ID for: $patch_file"

    # Show patch content summary
    # (the -E patterns skip the ---/+++ file header lines)
    echo "Patch statistics:"
    echo "Hunks: $(grep -c '^@@' "$patch_file")"
    echo "Additions: $(grep -c -E '^\+([^+]|$)' "$patch_file")"
    echo "Deletions: $(grep -c -E '^-([^-]|$)' "$patch_file")"

    # Compute with different modes
    # (--verbatim needs a Git version that supports it)
    echo "Patch IDs:"
    echo "  Stable: $(git patch-id --stable < "$patch_file" | cut -d' ' -f1)"
    echo "  Unstable: $(git patch-id --unstable < "$patch_file" | cut -d' ' -f1)"
    echo "  Verbatim: $(git patch-id --verbatim < "$patch_file" | cut -d' ' -f1)"
}

debug_patch_id "problematic.patch"
```
Large Patch Performance:
```bash
# Handle large patches efficiently
process_large_patches() {
    local patch_dir="$1"

    echo "Processing large patches..."

    # Process in parallel for performance
    # The cut delimiter uses double quotes so it does not end the
    # single-quoted script passed to bash -c
    find "$patch_dir" -name "*.patch" -print0 |
        xargs -0 -n 1 -P "$(nproc)" bash -c '
            patch_file="$1"
            patch_id=$(git patch-id < "$patch_file" | cut -d" " -f1)
            echo "$patch_id|$patch_file"
        ' _ | sort > patch-ids.txt

    echo "Processed $(wc -l < patch-ids.txt) patches"
}

process_large_patches "/large/patch/collection"
```
Encoding and Format Issues:
```bash
# Handle different patch encodings
normalize_patch_encoding() {
    local patch_file="$1"

    # Detect encoding
    encoding=$(file -b --mime-encoding "$patch_file")

    # us-ascii is already valid UTF-8, so only convert other encodings
    if [ "$encoding" != "utf-8" ] && [ "$encoding" != "us-ascii" ]; then
        echo "Converting $patch_file from $encoding to UTF-8"
        iconv -f "$encoding" -t utf-8 "$patch_file" > "${patch_file}.utf8"
        mv "${patch_file}.utf8" "$patch_file"
    fi

    # Normalize line endings (GNU sed syntax; BSD sed needs -i '')
    sed -i 's/\r$//' "$patch_file"
}

normalize_patch_encoding "encoded.patch"
```
Real-World Usage Examples:
Open Source Contribution Management:
```bash
#!/bin/bash
# Manage open source contributions with patch deduplication

manage_contributions() {
    local contribution_dir="$1"

    echo "Managing contributions in $contribution_dir"

    declare -A contribution_map
    declare -A duplicate_map

    # Process all contribution patches; process substitution keeps the loop
    # in the current shell so both maps survive for the report below
    while read -r patch_file; do
        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        contributor=$(basename "$(dirname "$patch_file")")
        [ -n "$patch_id" ] || continue

        if [ -n "${contribution_map[$patch_id]}" ]; then
            echo "Duplicate contribution detected:"
            echo "  Original: ${contribution_map[$patch_id]}"
            echo "  Duplicate: $contributor/$(basename "$patch_file")"
            duplicate_map[$patch_id]="${duplicate_map[$patch_id]} $contributor"
        else
            contribution_map[$patch_id]="$contributor/$(basename "$patch_file")"
            echo "✓ New contribution: $contributor/$(basename "$patch_file")"
        fi
    done < <(find "$contribution_dir" -name "*.patch")

    # Report duplicates
    if [ ${#duplicate_map[@]} -gt 0 ]; then
        echo ""
        echo "Duplicate summary:"
        for patch_id in "${!duplicate_map[@]}"; do
            echo "Patch ID $patch_id: ${duplicate_map[$patch_id]}"
        done
    fi
}

manage_contributions "/contributions"
```
Patch-Based Code Review:
```bash
# Code review workflow with patch analysis
review_with_patch_analysis() {
    local pr_number="$1"
    local work_dir known_ids patch_file patch_id hunks additions deletions

    echo "Reviewing PR #$pr_number with patch analysis"

    # Work in a scratch directory so cleanup cannot touch other files
    work_dir=$(mktemp -d)

    # Get PR patches; patch_url serves mbox-formatted patches that
    # git mailsplit can split (diff_url is a single plain diff)
    curl -s "https://api.github.com/repos/owner/repo/pulls/$pr_number" |
        jq -r '.patch_url' | xargs curl -sL > "$work_dir/pr.patch"

    # Split into individual patches
    git mailsplit -o"$work_dir" "$work_dir/pr.patch" >/dev/null

    # Patch IDs already present in the codebase, computed once
    known_ids=$(git log --all --no-merges -p | git patch-id | cut -d' ' -f1)

    # Analyze each patch
    for patch_file in "$work_dir"/[0-9]*; do
        [ -f "$patch_file" ] || continue

        patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
        [ -n "$patch_id" ] || continue

        echo "Analyzing patch: $(basename "$patch_file") (ID: ${patch_id:0:8})"

        # Check for duplicates in codebase
        if grep -qx "$patch_id" <<< "$known_ids"; then
            echo "  ⚠ Similar changes already exist in codebase"
        fi

        # Check patch quality (the -E patterns skip ---/+++ header lines)
        hunks=$(grep -c '^@@' "$patch_file")
        additions=$(grep -c -E '^\+([^+]|$)' "$patch_file")
        deletions=$(grep -c -E '^-([^-]|$)' "$patch_file")

        echo "  Stats: $hunks hunks, +$additions -$deletions lines"

        # Check whether the patch applies
        if git apply --check "$patch_file" 2>/dev/null; then
            echo "  ✓ Applies cleanly"
        else
            echo "  ✗ Has conflicts"
        fi
    done

    # Cleanup
    rm -rf "$work_dir"
}

review_with_patch_analysis "123"
```
Automated Patch Testing:
```bash
# Automated patch testing and validation
automated_patch_testing() {
    local patch_dir="$1"
    local test_script="$2"

    echo "Automated patch testing for $patch_dir"

    for patch_file in "$patch_dir"/*.patch; do
        [ -f "$patch_file" ] || continue

        echo "Testing: $(basename "$patch_file")"

        # Backup current state
        git branch backup-testing 2>/dev/null || true

        # Apply patch to the working tree (no commit is created)
        if git apply "$patch_file"; then
            echo "  ✓ Patch applied successfully"

            # Run tests
            if [ -x "$test_script" ]; then
                if "$test_script"; then
                    echo "  ✓ Tests passed"
                else
                    echo "  ✗ Tests failed"
                    patch_id=$(git patch-id < "$patch_file" | cut -d' ' -f1)
                    echo "  Patch ID: $patch_id (for investigation)"
                fi
            fi

            # Revert changes to tracked files; files newly created by the
            # patch remain as untracked files
            git reset --hard HEAD
        else
            echo "  ✗ Patch failed to apply"
        fi

        # Restore backup
        git reset --hard backup-testing 2>/dev/null || true
    done

    git branch -D backup-testing 2>/dev/null || true
}

automated_patch_testing "/patches" "./run-tests.sh"
```
What’s the difference between patch-id algorithms?
`--stable` sums per-file hashes, so reordering the file diffs in a patch does not change the ID (recommended for deduplication); `--unstable` is the default, depends on file order, and matches the IDs produced by Git 1.9 and older; `--verbatim` hashes the input without stripping whitespace.
How stable are patch IDs across different Git versions?
Patch IDs computed with the same mode are intended to be identical across Git versions; `--unstable` exists specifically to stay compatible with IDs from Git 1.9 and older. The diff itself still has to match, so generate patches with the same diff options (context size, diff algorithm) before comparing IDs.
Can patch-id work with binary patches?
Only with caution. Patch IDs are designed around textual hunks, and the handling of binary diffs has changed between Git versions, so do not rely on patch IDs to compare binary changes.
What’s the performance impact of patch-id on large patches?
Linear with patch size, since each line is read and hashed once. The choice between `--stable` and `--unstable` is about file-order sensitivity and compatibility, not speed.
How do I use patch-id with git format-patch?
Send the patches to standard output and pipe them to patch-id: `git format-patch -1 --stdout | git patch-id`
Can patch-id detect reordered patches?
Within one patch, yes, when `--stable` is used: reordering the file diffs leaves the ID unchanged. With `--unstable` (the default), the same file diffs in a different order produce a different ID. Patch IDs carry no information about the order of separate patches in a series.
What’s the relationship between patch-id and commit SHA?
Different concepts: the patch ID identifies the content of a change; the commit SHA also covers the author, committer, dates, message, parents, and the full tree.
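One way to see the difference is to cherry-pick a commit and compare both identifiers. The sketch below assumes `git` is on the PATH and builds a throwaway repository; the function name `compare_cherry_pick` and all file, branch, and author names are made up for illustration.

```bash
#!/usr/bin/env bash
# Build a scratch repository, cherry-pick one commit onto another branch,
# and print "<patch-id> <commit-id>" for the original and for the copy.
compare_cherry_pick() {
    local repo
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.name "Example"
        git config user.email "example@example.invalid"

        printf 'one\n' > file.txt
        git add file.txt && git commit -q -m "base"
        base=$(git rev-parse HEAD)

        printf 'one\ntwo\n' > file.txt
        git commit -q -am "add a line"
        orig=$(git rev-parse HEAD)

        # Branch off the base commit, add unrelated history, copy the change.
        git checkout -q -b other "$base"
        printf 'zero\n' > other.txt
        git add other.txt && git commit -q -m "unrelated"
        git cherry-pick "$orig" >/dev/null
        copy=$(git rev-parse HEAD)

        git show "$orig" | git patch-id --stable
        git show "$copy" | git patch-id --stable
    )
    rm -rf "$repo"
}

compare_cherry_pick
```

The two output lines should share the first field (the patch ID) and differ in the second (the commit ID), because the copy has a different parent.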
How do I handle patch-id in scripts?
Use `cut -d' ' -f1` to extract just the patch ID. Treat empty output as "no diff found", since git patch-id prints nothing for input that contains no diff, and check exit codes so that failures are not silently ignored.
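A small wrapper along these lines keeps that handling in one place. This is only a sketch: `patch_id_of` is a hypothetical helper name, not a Git command, and it assumes one patch per file.

```bash
#!/usr/bin/env bash
# Print the patch ID for a patch file, or fail with a message on stderr.
patch_id_of() {
    local file=$1 line
    if [ ! -r "$file" ]; then
        echo "patch_id_of: cannot read $file" >&2
        return 1
    fi
    # Capture the raw output; a non-zero exit from git is passed on.
    line=$(git patch-id --stable < "$file") || return 1
    # git patch-id prints nothing when the input contains no diff.
    if [ -z "$line" ]; then
        echo "patch_id_of: no diff found in $file" >&2
        return 1
    fi
    # Output is "<patch-id> <commit-id>"; keep everything before the first space.
    printf '%s\n' "${line%% *}"
}
```

A caller can then write `id=$(patch_id_of fix.patch) || exit 1` (with `fix.patch` standing in for a real file) and never end up with an empty ID as an array key.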
Can patch-id work with partial patches?
Yes, it computes an ID from whatever diff content is provided, as long as the input is a well-formed diff. A subset of hunks yields a different ID than the full patch.
What’s the collision rate for patch IDs?
Very low. A patch ID is built from SHA-1 hashes, so two different changes are extremely unlikely to share an ID by accident. Patches that differ only in line numbers or whitespace share an ID by design; that is equivalence, not a collision.
How do I compare patches from different sources?
Compute patch IDs for both with the same mode and compare. The same ID means the patches make the same textual change, apart from line numbers and whitespace.
Can patch-id work with merge commits?
With care. `git show` on a merge prints a combined diff, which is empty for a clean merge, so git patch-id may print nothing. To get an ID for a merge, diff it against one parent, for example `git diff <merge>^1 <merge> | git patch-id`.
What’s the output format of patch-id?
One line per patch: `<patch-id> <commit-id>`. The commit ID is taken from the `commit <sha>` or `From <sha>` line that precedes the diff (as in git log -p, git diff-tree, or git format-patch output) and is all zeros when the input has no such line.
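Because the output is line-oriented, it can be post-processed with ordinary text tools. The sketch below assumes each input line has the form `<patch-id> <commit-id>`; `report_duplicates` is a hypothetical helper name that lists every patch ID seen more than once together with the commits that share it.

```bash
#!/usr/bin/env bash
# Read "<patch-id> <commit-id>" lines on stdin and print each patch ID that
# occurs more than once, followed by the commit IDs that share it.
report_duplicates() {
    awk '
        {
            # Append the commit to the list kept for this patch ID.
            if (count[$1]++)
                commits[$1] = commits[$1] " " $2
            else
                commits[$1] = $2
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id ": " commits[id]
        }
    ' | sort
}
```

A possible use is `git log -p --no-merges | git patch-id --stable | report_duplicates`, which hashes every non-merge commit reachable from HEAD in a single pass instead of one `git show` per commit.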
How do I batch process many patches?
Use find + xargs for parallel processing, or write a loop that caches results to avoid recomputation.
Applications of the git patch-id command
- Duplicate Detection: Identify duplicate patches and commits across repositories and patch series
- Patch Management: Organize and deduplicate large collections of patches
- Code Review: Detect when similar changes are proposed multiple times
- Automated Testing: Validate patch uniqueness in CI/CD pipelines
- Contribution Tracking: Manage open source contributions and avoid duplicate work
- Patch Series Analysis: Analyze relationships between patches in complex patch sets