File carving is the recovery of files from raw disk data without relying on the filesystem. When a file is deleted, the OS removes its directory entry and marks the space as available — but the actual bytes often remain untouched until overwritten.
Carving works by scanning raw storage for known file signatures (sometimes called "magic bytes") that mark the beginning and end of specific file types, then extracting everything in between.
You'll use this technique when recovering deleted files from unallocated space, working with damaged or corrupt filesystems, or imaging flash media like SD cards where the FAT has been wiped.
⚠️ Fragmentation: Manual carving assumes the file is stored in contiguous clusters. Fragmented files require more advanced recovery. JPEG files on flash media are typically contiguous.
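The header-to-footer idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the file is contiguous and that the first footer after the header is the right one. The function name `carve_first` and the synthetic buffer are hypothetical and are not part of any forensic tool.

```python
from typing import Optional

JPEG_HEADER = bytes.fromhex("FFD8FF")
JPEG_FOOTER = bytes.fromhex("FFD9")


def carve_first(data: bytes, header: bytes, footer: bytes) -> Optional[bytes]:
    """Return the first header..footer span in data, or None if either is missing."""
    start = data.find(header)
    if start == -1:
        return None
    end = data.find(footer, start + len(header))
    if end == -1:
        return None
    # Keep the footer bytes themselves in the carved output.
    return data[start:end + len(footer)]


# Synthetic buffer: slack bytes, a tiny fake "JPEG", then more slack.
raw = b"\x00" * 8 + JPEG_HEADER + b"\xe0payload" + JPEG_FOOTER + b"\x00" * 8
carved = carve_first(raw, JPEG_HEADER, JPEG_FOOTER)
print(carved.hex(" "))
```

Real carvers add safeguards this sketch leaves out, such as maximum file sizes and handling for fragmented data.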
Step 2 — JPEG File Signatures
Many file types begin with a characteristic byte pattern, and some also end with one. These are documented in Gary Kessler's File Signature Table, the go-to reference for forensic analysts.
For JPEG files, the signatures are:
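| Marker | Hex Signature | Description |
|---|---|---|
| SOF | FF D8 FF | Start of File: first 3 bytes of every JPEG (FF D8 is the SOI marker in the JPEG standard) |
| EOF | FF D9 | End of File: last 2 bytes of every JPEG (the EOI marker in the JPEG standard) |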
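⚠️ False EOF warning: FF D9 can appear before the true end of a JPEG, most often because an embedded EXIF thumbnail is itself a complete JPEG with its own FF D9. (Inside compressed scan data, FF bytes are escaped as FF 00, so FF D9 does not occur there.) In this lab's hex dump, use the last FF D9 after your SOF as the true end of file. On a full disk image, limit the search to the data before the next file header, otherwise you will over-carve.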
💡 These same signatures are what tools like Foremost, Scalpel, and PhotoRec are configured to search for. Understanding them manually helps you know why those tools work — and how to recover files when they don't.
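To see why the choice of footer matters, the sketch below lists every FF D9 after the SOF in a synthetic buffer that imitates a JPEG with an embedded thumbnail. The helper `find_all` and the sample bytes are assumptions made for illustration; they are not taken from the lab's evidence image.

```python
from typing import List


def find_all(data: bytes, pattern: bytes, start: int = 0) -> List[int]:
    """Return the offsets of every occurrence of pattern at or after start."""
    offsets = []
    pos = data.find(pattern, start)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


# Synthetic "JPEG": outer header, an embedded thumbnail with its own
# header and footer, placeholder scan data, then the real footer.
thumb = bytes.fromhex("FFD8FFDB") + b"tn" + bytes.fromhex("FFD9")
raw = bytes.fromhex("FFD8FFE1") + thumb + b"scan" + bytes.fromhex("FFD9")

sof = raw.find(bytes.fromhex("FFD8FF"))
footers = find_all(raw, bytes.fromhex("FFD9"), sof)

# Carving to the first footer truncates the file; the last one keeps it whole.
truncated = raw[sof:footers[0] + 2]
complete = raw[sof:footers[-1] + 2]
print(footers, len(truncated), len(complete))
```

Taking the last footer is only safe here because the buffer holds a single file; on a larger image the search window would need an upper bound.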
Step 3 — Examine the Disk Image
The hex dump on the right shows a raw sector from an evidence disk image (evidence.img). This is what you'd see opening a disk image in a hex editor like HxD or WinHex, or through the hex view in Autopsy.
The dimmed bytes at the top and bottom are slack space — unallocated data with no meaningful content. Somewhere in between, a JPEG is embedded. Your job is to find it using its signatures.
The layout follows standard hex editor format: Offset | Hex bytes (16 per row, grouped in 8) | ASCII
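The row layout described above can be reproduced with a short formatter. This is a rough sketch of the general hex editor convention, not the output format of HxD, WinHex, or Autopsy specifically, and the function name `hexdump` is hypothetical.

```python
def hexdump(data: bytes, base: int = 0) -> str:
    """Format data as 'Offset | hex bytes (16 per row, grouped in 8) | ASCII'."""
    lines = []
    for i in range(0, len(data), 16):
        row = data[i:i + 16]
        left = " ".join(f"{b:02X}" for b in row[:8])
        right = " ".join(f"{b:02X}" for b in row[8:])
        # Eight bytes need 23 characters: 16 hex digits plus 7 spaces.
        hex_part = f"{left:<23}  {right:<23}"
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{base + i:04X} | {hex_part} | {ascii_part}")
    return "\n".join(lines)


sample = bytes.fromhex("FFD8FFE0") + b"JFIF"
out = hexdump(sample, base=0x20)
print(out)
```

The `base` argument lets the printed offsets match the position of the bytes in the original image rather than starting at zero.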
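💡 In a Linux terminal you can search for JPEG signatures directly: LC_ALL=C grep -obUaP "\xFF\xD8\xFF" evidence.img (LC_ALL=C makes grep match raw bytes instead of UTF-8 characters, and -b prints the decimal byte offset of each match.)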
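The same search can be done from Python when grep is not available. The sketch below memory-maps a file and reports every offset where a signature occurs. The function name `find_signature` is hypothetical, and the demo writes its own small sample file rather than reading a real evidence.img.

```python
import mmap
import os
import tempfile
from typing import List


def find_signature(path: str, signature: bytes) -> List[int]:
    """Return the byte offsets of every occurrence of signature in the file."""
    offsets = []
    with open(path, "rb") as f:
        # Read-only mapping, so the image is searched without being loaded
        # fully into memory. Note that mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(signature)
            while pos != -1:
                offsets.append(pos)
                pos = mm.find(signature, pos + 1)
    return offsets


# Demo on a synthetic image: 32 bytes of slack, then a JPEG header.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.img")
    with open(sample, "wb") as f:
        f.write(b"\x00" * 32 + bytes.fromhex("FFD8FFE0") + b"\x00" * 12)
    hits = find_signature(sample, bytes.fromhex("FFD8FF"))

print([f"0x{h:04X}" for h in hits])
```

Opening the file in read-only mode matters in casework, since the evidence image must not be altered by the search.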
Step 4 — Locate the SOF
Scan the hex dump and find the JPEG Start of File signature: FF D8 FF
👆 Click the first FF byte of that 3-byte sequence to mark your SOF offset.
✅ SOF located! The JPEG begins at offset 0x0020 (decimal 32). Note this offset — it's your carve starting point.
Step 5 — Locate the EOF
Now find the JPEG End of File signature: FF D9
Scan forward from your SOF at 0x0020 and find the last FF D9 sequence.
👆 Click the FF byte that begins that final FF D9 pair.
✅ EOF located! The JPEG ends at offset 0x00F8–0x00F9 (decimal 248–249). Note this as your carve end point — the last byte of the file is at 0x00F9.
Step 6 — Calculate Size & Carve
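With both offsets confirmed, you can calculate the exact file size and extract the bytes. Because the range is inclusive, the size is the last offset minus the first offset plus one: 0x00F9 − 0x0020 + 1 = 0xDA = 218 bytes.
In practice, you'd select every byte from the SOF offset through the last byte of the EOF signature (inclusive) in your hex editor, copy the selection, paste it into a new file, and save it with a .jpg extension.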
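The size calculation and the inclusive extraction can be checked with a short script. The offsets come from this lab; the 512-byte sector is a synthetic stand-in for the real evidence data, and `carve_range` is a hypothetical helper name.

```python
SOF_OFFSET = 0x0020      # first byte of FF D8 FF
EOF_LAST_BYTE = 0x00F9   # second byte of the final FF D9

# Both end points belong to the file, hence the +1.
size = EOF_LAST_BYTE - SOF_OFFSET + 1


def carve_range(data: bytes, first: int, last: int) -> bytes:
    """Return data[first..last], with both offsets included."""
    return data[first:last + 1]


# Synthetic sector with the signatures placed at the lab's offsets.
sector = bytearray(512)
sector[SOF_OFFSET:SOF_OFFSET + 3] = bytes.fromhex("FFD8FF")
sector[0x00F8:0x00FA] = bytes.fromhex("FFD9")

carved = carve_range(bytes(sector), SOF_OFFSET, EOF_LAST_BYTE)
print(size, len(carved))
```

Python slices exclude their end index, which is why the helper adds one to `last`. Forgetting that adjustment is a common way to lose the final D9 byte.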
Before documenting your findings, verify the carved file is structurally valid and generate hashes for your chain of custody record.
Structural check: Confirm the carved file starts with FF D8 FF and ends with FF D9. ✅
Hash the file: Generate MD5 and/or SHA1 hashes to document the file's integrity. These go in your case notes so anyone can verify the carved file hasn't been modified.
Validate it opens: Run file carved_001.jpg on Linux, or open it in an image viewer. A valid JPEG will be recognized and render correctly.
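The structural check and the hashes can be produced together with Python's standard hashlib module. This is a minimal sketch: the function name `verify_and_hash` and the sample bytes are hypothetical, and the check only confirms the signatures, not that the image decodes.

```python
import hashlib


def verify_and_hash(data: bytes) -> dict:
    """Check the JPEG signatures and compute MD5 and SHA1 for case notes."""
    return {
        "starts_with_sof": data[:3] == bytes.fromhex("FFD8FF"),
        "ends_with_eof": data[-2:] == bytes.fromhex("FFD9"),
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


# Stand-in for the carved bytes. In practice you would read the carved
# file from disk in binary mode and pass its contents in.
sample = bytes.fromhex("FFD8FFE0") + b"image data" + bytes.fromhex("FFD9")
report = verify_and_hash(sample)
for key, value in report.items():
    print(f"{key}: {value}")
```

Recording the size alongside the hashes makes it easier for a reviewer to confirm they are looking at the same carved range.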
Output File: carved_001.jpg
File Size: 218 bytes
Valid JPEG: ✔ Yes
MD5: a3f8e2d1c947b605f3a817e29d04c681
SHA1: 7c4a9f2e1d830b56e4f7a2c3d819f0e4b623a1d7
[Hex dump panel: evidence.img, sector 0x0000. Legend: Slack, SOF, EOF, Carved Range.]
[Carved output panel: carved_001.jpg (218 bytes), bytes extracted from 0x0020 to 0x00F9.]
Step 8 — Common File Type Signatures Reference
As a digital forensics investigator, you'll carve far more than just JPEGs. The table below lists the most commonly encountered file types and their hex signatures. Always cross-reference with the Gary Kessler File Signature Table for the most complete and up-to-date reference.
⚠️ Understanding null and variable footers: Many file types do not have a defined footer signature. On the Gary Kessler (GCK) table and others, these appear as null, blank, or "n/a" in the footer column. This means the file format has no unique closing byte sequence, so a carver cannot find the end of the file by searching for a closing signature.
For these file types, investigators use one of two approaches: (1) size-based carving, where a maximum file size is set and the carver extracts up to that limit, or (2) next-header carving, where the carver assumes the file ends just before the next recognized file signature begins in the data stream. Both methods can produce over-carved or under-carved output, so manual validation is especially important for null-footer file types.
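📷 Images

| File Type | Extension(s) | Header (SOF) | Footer (EOF) | Footer Notes |
|---|---|---|---|---|
| JPEG / JFIF | .jpg .jpeg | FF D8 FF | FF D9 | Fixed footer; an embedded thumbnail can contribute an earlier FF D9, so confirm which occurrence ends the file |
| PNG | .png | 89 50 4E 47 0D 0A 1A 0A | 49 45 4E 44 AE 42 60 82 | Reliable fixed footer (IEND chunk) |
| GIF | .gif | 47 49 46 38 37 61 or 47 49 46 38 39 61 | 00 3B | Fixed footer; 37 = GIF87a, 39 = GIF89a |
| BMP | .bmp | 42 4D | null | No footer; file size is stored in the header at offset 0x02–0x05 (little-endian), and carvers read this field to determine length |
| TIFF | .tif .tiff | 49 49 2A 00 or 4D 4D 00 2A | null | No footer; little-endian (II) or big-endian (MM) variants; size-based carving required |

📄 Documents

| File Type | Extension(s) | Header (SOF) | Footer (EOF) | Footer Notes |
|---|---|---|---|---|
| PDF | .pdf | 25 50 44 46 | 25 25 45 4F 46 | Footer is %%EOF in ASCII; it may be followed by trailing whitespace, and incrementally updated PDFs contain more than one %%EOF, so use the last occurrence |
| DOCX / XLSX / PPTX | .docx .xlsx .pptx | 50 4B 03 04 | 50 4B 05 06 | Office Open XML files are ZIP containers; the footer is the ZIP end-of-central-directory signature, followed by 18 more bytes of fixed fields and an optional variable-length comment |
| Legacy Office (DOC/XLS/PPT) | .doc .xls .ppt | D0 CF 11 E0 A1 B1 1A E1 | null | OLE2 Compound File format; no reliable footer, so size-based carving only |

🗜️ Archives

| File Type | Extension(s) | Header (SOF) | Footer (EOF) | Footer Notes |
|---|---|---|---|---|
| ZIP | .zip | 50 4B 03 04 | 50 4B 05 06 | End-of-central-directory signature; 18 more bytes of fixed fields and an optional variable-length comment follow, so carving may need manual trimming |
| RAR | .rar | 52 61 72 21 1A 07 00 | null | RAR 4.x; RAR 5.x uses 52 61 72 21 1A 07 01 00. No standard footer; size-based carving required |

🎥 Video & Audio

| File Type | Extension(s) | Header (SOF) | Footer (EOF) | Footer Notes |
|---|---|---|---|---|
| MP4 / MOV | .mp4 .mov .m4v | 00 00 00 xx 66 74 79 70 | null | xx is a size byte that varies between files; 66 74 79 70 is the atom type ftyp in ASCII. No footer; size-based or next-header carving required |
| AVI | .avi | 52 49 46 46 | null | RIFF container format; header reads RIFF in ASCII, and bytes 8–11 read AVI followed by a space. No footer; bytes 4–7 store the size of the data that follows them (file size minus 8, little-endian) |
| MP3 | .mp3 | FF FB or FF F3 or FF F2 | null | Frame sync header; first byte is always FF, and the second byte carries the remaining sync bits plus the MPEG version, layer, and CRC-protection flag. No footer; highly prone to over-carving |

⚙️ Executables & System

| File Type | Extension(s) | Header (SOF) | Footer (EOF) | Footer Notes |
|---|---|---|---|---|
| Windows EXE / DLL | .exe .dll .sys | 4D 5A | null | MZ header (DOS stub); 4D 5A is so short and common that it generates many false positives. No footer; size-based carving only |
| SQLite Database | .db .sqlite | 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 | null | Header reads SQLite format 3 in ASCII, followed by a null byte. File size stored internally; no footer signature |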
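A header table like the one above maps naturally onto a lookup structure. The sketch below checks a buffer against a handful of the listed headers. The dictionary covers only part of the table, and the names `SIGNATURES` and `identify` are hypothetical.

```python
from typing import List

# A subset of the header signatures from the table above.
SIGNATURES = {
    "JPEG": bytes.fromhex("FFD8FF"),
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "GIF87a": bytes.fromhex("474946383761"),
    "GIF89a": bytes.fromhex("474946383961"),
    "PDF": bytes.fromhex("25504446"),
    "ZIP / Office Open XML": bytes.fromhex("504B0304"),
    "OLE2 (legacy Office)": bytes.fromhex("D0CF11E0A1B11AE1"),
    "SQLite": b"SQLite format 3\x00",
}


def identify(buf: bytes) -> List[str]:
    """Return the names of all known signatures that buf starts with."""
    return [name for name, sig in SIGNATURES.items() if buf.startswith(sig)]


print(identify(b"%PDF-1.7\n"))
print(identify(bytes.fromhex("504B0304") + b"\x14\x00"))
print(identify(b"\x00\x00\x00\x00"))
```

A header match alone cannot separate a plain ZIP from a DOCX, which is why the function returns a combined label for that signature. Telling them apart requires looking inside the container.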
📚 This table covers the most frequently encountered types in casework, but it is not exhaustive. For the full reference covering hundreds of file types, bookmark the Gary Kessler File Signature Table.
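For formats that store their length in the header, such as BMP (offset 0x02–0x05) and RIFF containers like AVI (bytes 4–7), the carve length can be read directly instead of guessed. The sketch below uses Python's struct module. The function names and the synthetic headers are assumptions, and no validation beyond the signature check is attempted.

```python
import struct
from typing import Optional


def bmp_length(buf: bytes) -> Optional[int]:
    """Read the total file size from a BMP header, or None if not a BMP."""
    if len(buf) < 6 or buf[:2] != b"BM":
        return None
    # 4-byte little-endian file size at offset 2.
    (size,) = struct.unpack_from("<I", buf, 2)
    return size


def riff_length(buf: bytes) -> Optional[int]:
    """Read the total file size from a RIFF header, or None if not RIFF."""
    if len(buf) < 8 or buf[:4] != b"RIFF":
        return None
    # The stored value counts the bytes after the first 8, so add them back.
    (chunk_size,) = struct.unpack_from("<I", buf, 4)
    return chunk_size + 8


bmp = b"BM" + struct.pack("<I", 70) + b"\x00" * 64
avi = b"RIFF" + struct.pack("<I", 36) + b"AVI "
print(bmp_length(bmp), riff_length(avi))
```

A size field read from unallocated space may be stale or corrupt, so it is worth comparing the reported length against the space actually available before carving.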
🎉 Lab Complete! You've identified JPEG file signatures using the GCK table, located SOF and EOF offsets in a raw hex dump, calculated the file size, extracted the bytes, documented hashes for chain of custody, and reviewed common file signatures you'll encounter as a forensic investigator.