---
name: duplicate-file-cleaner-expert
description: Find, present, and safely remove duplicate image/media files on a user's computer using metadata-aware matching and safe preview/backup workflows. Common triggers: "find duplicate photos", "clean duplicates in ~/Pictures", "show duplicate RAW files by EXIF and resolution".
---

# Skill purpose

This Skill helps users (especially photographers and designers) reclaim disk space by finding duplicate image/media files using metadata-aware matching (EXIF, resolution, camera model, capture date) and content hashing. The Skill emphasizes safety: present candidate groups with thumbnails and metadata, require manual selection for deletions, and automatically create backups (compressed archive or user-specified backup folder) prior to deletion for easy recovery.

# Step-by-step instructions Claude must follow

1. Clarify scope and safety preferences
   - Ask the user what folders or file types to scan (default: Pictures and common image/video extensions).
   - Confirm matching sensitivity (exact hash-only, metadata-priority, or hybrid), whether to include content-similarity heuristics, and maximum file size to consider.
   - Ask where to store automatic backups (default: ~/duplicate_backups/dup_{timestamp}.zip, with the timestamp formatted as YYYYMMDD_HHMMSS) and whether to move files to the OS trash instead of deleting them permanently.

2. Build scanning plan and parameters
   - Determine: file extensions, recursion depth, follow symlinks (yes/no), metadata fields to prioritize (EXIF DateTimeOriginal, resolution, camera model, lens), and thresholds for similarity.
   - Explain that files whose metadata matches but whose content hashes differ are treated as duplicates only in metadata-priority (or hybrid) mode; in exact-hash mode, only byte-identical files are grouped.

3. Execute safe scan (describe or run as appropriate to environment)
   - For each candidate file gather: absolute path, file size, SHA-256 (or user-selected) hash of content, image/video metadata (EXIF fields, dimensions, bitrate/duration for video), and a small thumbnail (e.g., 256px) for preview.
   - Group candidates by the chosen matching strategy: exact-hash groups first; then metadata-similar groups (same capture date within ± N seconds or minutes, and either equal resolution or differing resolution when the user has stated a preference for keeping the higher- or lower-resolution copy); and finally content-similarity groups if enabled.

4. Present results to the user for review
   - For each candidate group show a concise summary: number of files, total reclaimable space, suggested keep file (rule-based: highest resolution, newest edit timestamp, preferred directory), and previews: thumbnails and metadata table (filename, path, size, resolution, capture date, camera model, hash).
   - Provide UI-like actions (or clear CLI prompts): Keep this file, Mark others for deletion, Select files to keep manually, Select all suggested, or Skip group.

5. Confirm deletion workflow and backup
   - Before performing deletions, summarize selected deletions and confirm total space to free and backup location.
   - Create backup: either move selected-to-delete files into a timestamped folder, or compress them into a zip/tar.gz in the backup location. Verify backup integrity (e.g., list archive contents or checksum) before removing originals.
   - If the user prefers the OS trash, move files to the trash instead of deleting them permanently; optionally still create a backup archive.

6. Perform deletion and provide recovery steps
   - Delete originals after backup verification.
   - Report completed actions, reclaimed space, and exact path to backup archive or instructions to restore from OS trash.
   - Provide a one-command or step-by-step restore instruction (e.g., unzip backup and move files back to original paths or restore from trash).

7. Logging and undo window
   - Produce a machine-readable log (JSON) that lists removed files, original paths, backup archive path, timestamps, and hashes; retain the log for at least 30 days by default.
   - Offer a one-click undo instruction that reverts deletions by restoring from the backup archive.
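
The logging and undo steps above could be sketched as follows. This is a minimal illustration, assuming the backup is a zip archive and that each log entry records the archive member name for a removed file. The function names and the log fields (`original_path`, `archive_name`, `sha256`) are hypothetical, not a fixed schema.

```python
import hashlib
import json
import zipfile
from datetime import datetime, timezone
from pathlib import Path


def write_log(log_path, backup_archive, entries):
    """Write a JSON log. `entries` is a list of dicts with the hypothetical
    keys original_path, archive_name, and sha256."""
    record = {
        "created": datetime.now(timezone.utc).isoformat(),
        "backup_archive": str(backup_archive),
        "removed": entries,
    }
    Path(log_path).write_text(json.dumps(record, indent=2), encoding="utf-8")


def restore_from_log(log_path):
    """Restore every logged file to its original path and return the restored paths."""
    record = json.loads(Path(log_path).read_text(encoding="utf-8"))
    restored = []
    with zipfile.ZipFile(record["backup_archive"]) as archive:
        for entry in record["removed"]:
            target = Path(entry["original_path"])
            if target.exists():
                continue  # never overwrite a file that is already in place
            # Reads the whole member into memory; fine for a sketch, not for huge videos.
            data = archive.read(entry["archive_name"])
            if hashlib.sha256(data).hexdigest() != entry["sha256"]:
                raise ValueError(f"hash mismatch for {entry['archive_name']}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            restored.append(str(target))
    return restored
```

With a log like this, the undo instruction given to the user can be a single call such as `restore_from_log("~/duplicate_backups/dup_{timestamp}_log.json")` with the path expanded.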

# Usage examples

Example 1 — Guided scan request (typical)
- User: "Find duplicate photos in ~/Pictures, prefer highest resolution, back up deletions to ~/dup_backups, show me candidates before deleting."
- Skill: Ask confirmation of file types and similarity threshold → run scan → present groups with thumbnails and metadata → user marks files to delete → Skill creates backup zip ~/dup_backups/dup_20260104_120000.zip, verifies archive, deletes originals, returns report and undo command.
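
One way to compute the "suggested keep" file for a group is sketched below. It assumes the three rules named in step 4 are applied as successive tie-breakers (pixel count, then newest modification time, then preferred directory); that ordering, the `Candidate` type, and `suggest_keep` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    width: int
    height: int
    mtime: float  # last-modified time, seconds since the epoch


def suggest_keep(group, preferred_dir=None):
    """Return the candidate to keep: most pixels, then newest, then preferred directory."""
    def rank(candidate):
        in_preferred = (
            preferred_dir is not None and candidate.path.startswith(preferred_dir)
        )
        return (candidate.width * candidate.height, candidate.mtime, in_preferred)

    return max(group, key=rank)
```

The suggestion is only a default; the user still confirms or overrides it for each group.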

Example 2 — Metadata-priority mode for photographers
- User: "Scan external drive /Volumes/SDCard, treat files with same EXIF DateTime and camera model as duplicates even if different file sizes."
- Skill: Run metadata-based grouping, present candidate groups emphasizing EXIF fields, allow user to choose by resolution/ISO/filename, back up and delete upon explicit confirmation.
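
Metadata-based grouping of this kind might look like the sketch below. It assumes each record is a dict with `path`, `Model`, and `DateTimeOriginal` in the usual EXIF `YYYY:MM:DD HH:MM:SS` form; the function name and the `tolerance_seconds` parameter are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def group_by_metadata(records, tolerance_seconds=0):
    """Group paths that share a camera model and have capture times within the tolerance."""
    by_model = defaultdict(list)
    for rec in records:
        taken = datetime.strptime(rec["DateTimeOriginal"], EXIF_TIME_FORMAT)
        by_model[rec["Model"]].append((taken, rec["path"]))

    groups = []
    for items in by_model.values():
        items.sort()
        current = [items[0]]
        for item in items[1:]:
            # Chain items whose gap to the previous one is within the tolerance.
            gap = (item[0] - current[-1][0]).total_seconds()
            if gap <= tolerance_seconds:
                current.append(item)
            else:
                if len(current) > 1:
                    groups.append([path for _, path in current])
                current = [item]
        if len(current) > 1:
            groups.append([path for _, path in current])
    return groups
```

Because consecutive frames are chained, a burst sequence can end up in a single group, which is one reason these groups are presented for manual review rather than deleted automatically.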

Example 3 — Dry-run and JSON log
- User: "Dry-run across ~/ClientWork for duplicates and output JSON of candidate groups."
- Skill: Perform analysis without making backups or deletions, return JSON report of groups with metadata and hashes for offline review.
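
A dry-run report could be assembled along these lines. The JSON field names are illustrative assumptions, and reclaimable space is estimated by assuming the largest file in each group is the one kept.

```python
import json


def build_report(groups, sizes):
    """Build a dry-run JSON report.

    groups: list of lists of file paths; sizes: mapping of path to size in bytes.
    """
    report = {"dry_run": True, "groups": []}
    for index, paths in enumerate(groups, start=1):
        group_sizes = [sizes[p] for p in paths]
        report["groups"].append({
            "group_id": index,
            "files": [{"path": p, "size_bytes": sizes[p]} for p in paths],
            # Assumes the largest file in the group is kept.
            "reclaimable_bytes": sum(group_sizes) - max(group_sizes),
        })
    report["total_reclaimable_bytes"] = sum(
        g["reclaimable_bytes"] for g in report["groups"]
    )
    return json.dumps(report, indent=2)
```

Nothing in this function touches the filesystem, which keeps the dry-run free of side effects.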

# Best practices

- Always run a dry-run first on critical folders before deleting anything.
- Prefer hybrid matching: exact-hash for definite duplicates, metadata for likely duplicates; let the user decide ambiguous groups manually.
- Keep automatic backups for at least 30 days or until user explicitly purges them.
- For large scans, allow incremental runs and resume capability; process files in batches to avoid high memory use.
- Respect user privacy: do not upload files or metadata off-device without explicit consent.
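
For the exact-hash half of hybrid matching, a memory-friendly sketch is shown below. It reads files in fixed-size chunks so large videos are never loaded whole. Grouping by file size before hashing is an added optimization assumed here, not something this Skill prescribes, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so memory use stays constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_groups(paths):
    """Return lists of paths whose contents are byte-identical."""
    # Files of different sizes cannot be identical, so only hash size collisions.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```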

# Implementation notes and configurable defaults (placeholders)

- Default file extensions: jpg, jpeg, png, tif, tiff, nef, cr2, arw, dng, raw, mp4, mov, avi.
- Default thumbnail size: 256px max dimension.
- Default hash algorithm: SHA-256.
- Default backup path template: ~/duplicate_backups/dup_{timestamp}.zip
- Default log path: ~/duplicate_backups/dup_{timestamp}_log.json
- Default undo retention: 30 days.
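
These defaults could be captured in a small configuration object, as in the sketch below. The class and attribute names are hypothetical; the values mirror the list above, and the timestamp format `YYYYMMDD_HHMMSS` follows the backup file name used in Example 1.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass(frozen=True)
class CleanerDefaults:
    extensions: tuple = (
        "jpg", "jpeg", "png", "tif", "tiff", "nef", "cr2",
        "arw", "dng", "raw", "mp4", "mov", "avi",
    )
    thumbnail_max_px: int = 256
    hash_algorithm: str = "sha256"
    backup_template: str = "~/duplicate_backups/dup_{timestamp}.zip"
    log_template: str = "~/duplicate_backups/dup_{timestamp}_log.json"
    undo_retention_days: int = 30

    def paths_for(self, now):
        """Return (backup_path, log_path) for the given datetime."""
        stamp = now.strftime("%Y%m%d_%H%M%S")
        backup = Path(self.backup_template.format(timestamp=stamp)).expanduser()
        log = Path(self.log_template.format(timestamp=stamp)).expanduser()
        return backup, log
```

For example, `CleanerDefaults().paths_for(datetime.now())` yields a matching backup and log path for the current run.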

# When to call this Skill

- User asks to locate or remove duplicate photos/media.
- User requests reclaiming disk space for media-heavy folders.
- User requests a safe, metadata-aware duplicate cleanup with previews and backups.

# Related scripts / integration pointers

- If implemented as a CLI or GUI tool, integrate OS-native trash APIs for safe deletion and use exiftool or native libraries to extract image metadata.
- For performance, extract metadata before hashing; only compute full content hashes for groups that pass initial metadata filters.
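
If exiftool is the chosen extractor, metadata can be read in one batch through its JSON output, as sketched below. This assumes `exiftool` is installed and on the PATH; the helper names are hypothetical, and the tag list is just the subset of fields this Skill prioritizes.

```python
import json
import subprocess

EXIF_TAGS = ("DateTimeOriginal", "Model", "ImageWidth", "ImageHeight")


def exiftool_command(paths):
    """Build an exiftool invocation that prints the chosen tags as JSON."""
    return ["exiftool", "-json", *[f"-{tag}" for tag in EXIF_TAGS], *paths]


def parse_exiftool_json(output):
    """Map each SourceFile to its tags; tags absent from a file become None."""
    parsed = {}
    for entry in json.loads(output):
        parsed[entry["SourceFile"]] = {tag: entry.get(tag) for tag in EXIF_TAGS}
    return parsed


def read_metadata(paths):
    """Run exiftool on absolute paths and return the parsed metadata."""
    result = subprocess.run(
        exiftool_command(paths), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)
```

Passing many paths per invocation avoids starting one process per file. Note that `check=True` raises if exiftool exits with an error, so a production tool may want to handle unreadable files more gently.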

# Step-by-step prompts Claude should use when interacting with the user

1. "Which folders and file types should I scan? (default: ~/Pictures, common image/video extensions)"
2. "Choose matching mode: exact-hash, metadata-priority, or hybrid. Any similarity thresholds?"
3. "Where should I store backups? (default: ~/duplicate_backups/) Use OS trash instead?"
4. "Run a dry-run first? (recommended)"
5. After scan: "I found N candidate groups totaling X GB. Review groups now or receive a JSON report?"
6. Before delete: "Confirm deletion of Y files and backup to /path/to/archive.zip. Proceed?"

# Best-effort safety guarantees

- Never permanently delete a file without a verified backup, unless the user has explicitly opted out of backups.
- Always show thumbnails and metadata for human confirmation prior to deletion.
- Provide easy restore instructions and a machine-readable log to support auditing.
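
The backup-before-delete guarantee can be illustrated with the sketch below, which assumes a zip backup. It writes the archive, verifies it with a CRC check and a size comparison, and only then removes the originals. The function name and the index-prefixed member naming are illustrative choices.

```python
import os
import zipfile
from pathlib import Path


def backup_then_delete(paths, archive_path):
    """Archive the files, verify the archive, then delete the originals.

    Returns a mapping of original path to archive member name.
    """
    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)

    names = {}
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, path in enumerate(paths):
            # Prefix with an index so files with the same name do not collide.
            name = f"{index:04d}_{Path(path).name}"
            archive.write(path, arcname=name)
            names[str(path)] = name

    # Verify before touching any original: CRC check, then per-file size check.
    with zipfile.ZipFile(archive_path) as archive:
        if archive.testzip() is not None:
            raise RuntimeError("backup archive failed its CRC check")
        stored = {info.filename: info.file_size for info in archive.infolist()}
    for path, name in names.items():
        if stored.get(name) != os.path.getsize(path):
            raise RuntimeError(f"backup is missing or truncated: {path}")

    for path in names:
        os.remove(path)
    return names
```

If any verification step raises, no original has been deleted yet, so the failure is recoverable by simply rerunning the backup.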

