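A checksum function reads every byte of a file and runs it through a fixed calculation, producing a short result such as a 32- or 64-character hex string. The key properties are: the same bytes always produce the same checksum; changing even a single byte almost always produces a completely different checksum; and the result has the same length whether the file is a few bytes or many gigabytes.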
Common checksum algorithms include MD5, SHA-1, and SHA-256, which produce hex strings of 32, 40, and 64 characters respectively.
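As a rough illustration of the calculation described above, this Python sketch uses the standard `hashlib` module to stream a file once and feed every byte to MD5, SHA-1, and SHA-256. The function name `file_checksums` and the 1 MiB chunk size are illustrative choices, not taken from any particular tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Read in fixed-size chunks so large files never need to fit in memory.
CHUNK_SIZE = 1 << 20  # 1 MiB; an arbitrary choice for this sketch


def file_checksums(path: str | Path) -> dict[str, str]:
    """Return the MD5, SHA-1 and SHA-256 hex digests of the file at `path`."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        # Feed every byte of the file to all three algorithms in a single pass.
        while chunk := f.read(CHUNK_SIZE):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Reading in chunks keeps memory use flat no matter how large the file is, and the three digests come back as 32, 40, and 64 hex characters.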
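Checksums answer two questions: are these two files byte-for-byte identical, and has this file changed since its checksum was recorded, for example during a download?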
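File size is a weak signal. Two different files can happen to be the same size, so matching sizes do not prove matching content. A checksum depends on every byte of the file, so two files whose contents differ will, in practice, get different checksums even when their sizes are equal. That makes comparing checksums a reliable way to tell whether two files are byte-for-byte identical. Note that it compares bytes, not meaning: two copies of the same document that differ only in a few embedded metadata bytes will get different checksums.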
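To make the size-versus-checksum point concrete, here is a small Python sketch that uses two made-up byte strings in place of files. They have the same length but differ in one character, so a size check treats them as a match while their SHA-256 digests differ. The helper `sha256_hex` is a hypothetical name for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


# Two made-up "files": same size, one differing character.
a = b"invoice total: 100"
b = b"invoice total: 900"

print(len(a) == len(b))                # True: a size check says "match"
print(sha256_hex(a) == sha256_hex(b))  # False: the checksums disagree
```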
For spotting accidental duplicates, or verifying a download from a source you trust, MD5 is fast and good enough. When the check has to resist someone deliberately crafting a file that matches, use SHA-256, which is designed to make that practically infeasible; MD5 and SHA-1 both have known practical collision attacks. Many tools compute all three so you can choose the right one for each situation.
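A download check of the kind described above might look like the following Python sketch, which hashes the downloaded file with SHA-256 and compares the result with a digest supplied by the publisher. It assumes the expected digest is a hex string obtained through a channel you trust; `verify_sha256` is a hypothetical helper, not part of any library.

```python
import hashlib


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 matches the published hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    # Published digests are sometimes upper-case or carry stray whitespace.
    return h.hexdigest() == expected_hex.strip().lower()
```

SHA-256 is used because the check is meant to resist deliberate forgery. On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` can replace the manual chunk loop.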
Compute MD5, SHA-1, and SHA-256 across a whole folder on Mac, then find duplicates by content. Runs on-device.
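A folder-wide duplicate search like this can be sketched in Python as a two-pass scan. The first pass buckets files by size, since files of different sizes cannot be identical. The second pass hashes only the files that share a size and groups them by SHA-256. This is an assumed approach for illustration, not the implementation of the tool mentioned; `find_duplicates` and `sha256_of` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file at `path` through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str | Path) -> list[list[Path]]:
    """Return groups of regular files under `root` with identical content."""
    # Pass 1: bucket by size. Only same-size files can be duplicates.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    # Pass 2: inside each same-size bucket, group files by checksum.
    groups: list[list[Path]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Size serves only as a cheap pre-filter; the checksum makes the final call. Swapping `hashlib.sha256` for `hashlib.md5` would be faster and, for accidental duplicates, adequate.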