CH4: File Systems & Data Recovery

Introduction

In Chapter 3, we mastered the science of preservation, learning how to acquire and hash evidence without altering it. Now, we must look inside those forensic containers.

In the physical world, if you throw a document into a fire, it is gone forever. In the digital world, however, "deletion" is rarely synonymous with destruction. For a digital forensics investigator, understanding how data is written to a disk, how the operating system tracks that data, and how to recover it when the system claims it is gone, is the difference between a closed case and a cold case.

This chapter shifts our focus to the granular technical reality of data storage. We often refer to this as "The Hex Week" because it requires you to look beneath the user interface of Windows or macOS and view the raw data as the computer sees it: in Hexadecimal and Binary. We will explore the architecture of the NTFS file system, the concept of "slack space" where evidence hides in the margins, and the manual art of file carving.

Learning Objectives

By the end of this chapter, you will be able to:

  • Differentiate between physical storage concepts (sectors) and logical storage concepts (clusters).
  • Compare MBR and GPT partition structures and their impact on disk analysis.
  • Analyze the structure of the NTFS Master File Table (MFT) and identify resident vs. non-resident data.
  • Trace the deletion process in the Windows Recycle Bin, identifying the roles of $I and $R files.
  • Calculate Drive Slack and explain its significance in recovering overwritten files.
  • Interpret raw data using Hexadecimal notation and identify file signatures (Magic Bytes).
  • Perform manual data recovery (File Carving) and explain how fragmentation complicates this process.
  • Configure and execute automated carving tools like Foremost, Scalpel, and PhotoRec.

4.1 The Language of Data: Binary and Hexadecimal

Before we can understand how a hard drive stores a photograph or a text file, we must understand the language the computer speaks. At the most fundamental level, computers function on switches: On or Off.

4.1.1 Bits, Bytes, and Nibbles

The fundamental unit of data is the Bit (Binary Digit), which is represented as a 0 (Off) or a 1 (On).

  • Bit: A single 0 or 1.
  • Nibble: A grouping of 4 bits (e.g., 1011).
  • Byte: A grouping of 8 bits (e.g., 10110011).

In forensics, the Byte is the standard unit of measurement. However, reading streams of binary (e.g., 1101001010101110) is incredibly difficult for humans. To make this readable, we use Hexadecimal (Base 16).

4.1.2 Understanding Hexadecimal

Hexadecimal allows us to represent one Byte (8 bits) using just two characters. It uses the numbers 0-9 and the letters A-F.

  • 0-9: Represents values 0 through 9.
  • A-F: Represents values 10 through 15 (A=10, B=11, C=12, D=13, E=14, F=15).

For example, the binary byte 01000001 translates to the decimal number 65. In ASCII (American Standard Code for Information Interchange), 65 represents the capital letter "A." In Hexadecimal, this is written as 41. When you open a forensic tool like FTK Imager or a Hex Editor, you will see the "Hex View" in the center panel and the decoded text on the right.
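The conversion above can be checked with a few lines of Python. This is a minimal sketch; the helper name hex_dump_line is hypothetical and only imitates the layout of a hex editor (hex pairs on the left, decoded text on the right).

```python
# Convert the example byte from the text between binary, decimal, hex, and ASCII.
bits = "01000001"
value = int(bits, 2)             # binary string -> integer (65)
hex_text = format(value, "02X")  # integer -> two hex characters ("41")
char = chr(value)                # integer -> ASCII character ("A")


def hex_dump_line(data: bytes) -> str:
    """Render bytes as hex pairs followed by the decoded text (hypothetical helper)."""
    hex_part = " ".join(f"{b:02X}" for b in data)
    # Non-printable bytes are shown as a dot, as most hex editors do.
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data)
    return f"{hex_part}  |{text_part}|"


print(value, hex_text, char)
print(hex_dump_line(b"ABC\x00"))
```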

4.1.3 Endianness

When data is stored on a drive, the order in which the bytes are written depends on the computer's architecture. This concept is known as Endianness.

  • Big Endian: The "most significant byte" (the big end) is stored first. This is how humans read numbers (left to right).
  • Little Endian: The "least significant byte" (the little end) is stored first.

Forensic Relevance: Most modern Intel and AMD systems (x86/x64 architectures) use Little Endian. This means that if you are looking for a specific value in a hex editor, the bytes might appear reversed. For example, if a file size is recorded as 1024 (which is 04 00 in Hex), a Little Endian system will store it as 00 04. If you do not account for this, your calculations of file sizes or offsets will be incorrect.
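The 1024 example can be reproduced in Python. This is a small sketch: it takes the two bytes exactly as they would appear in the hex editor (00 04) and interprets them both ways, so you can see how far off a Big Endian reading would be.

```python
import struct

raw = bytes.fromhex("0004")  # the two bytes as they appear on disk

little = int.from_bytes(raw, "little")  # correct reading on x86/x64
big = int.from_bytes(raw, "big")        # what you get if you forget to swap

# struct does the same job; "<H" means little-endian unsigned 16-bit.
(unpacked,) = struct.unpack("<H", raw)

# Going the other way: how the value 1024 is laid out on disk.
on_disk = (1024).to_bytes(2, "little")

print(little, big, unpacked, on_disk.hex(" "))
```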


4.2 Storage Concepts: Physical vs. Logical

To investigate a drive, you must understand how it is organized. We divide storage into two categories: Physical (the actual hardware) and Logical (how the Operating System organizes the hardware).

4.2.1 Physical Storage: Sectors

The Sector is the smallest physical storage unit on a hard disk platter.

  • Legacy Standard: Historically, sectors were 512 bytes.
  • Modern Standard: Advanced Format drives use 4096 bytes (4KB) per sector.
  • Addressing: Sectors are located using CHS (Cylinder, Head, Sector) addressing on older drives or LBA (Logical Block Addressing) on modern drives.

4.2.2 Logical Storage: Clusters

The Operating System (like Windows) does not write data to individual sectors; that would be inefficient. Instead, it groups sectors into Clusters (also known as allocation units).

  • A cluster is the smallest amount of disk space that can be allocated to hold a file.
  • A common cluster size is 4KB (4096 bytes). If a sector is 512 bytes, a 4KB cluster consists of 8 contiguous sectors.

Key Forensic Concept: Even if a file is only 1 byte in size, it must take up an entire cluster. The system cannot put two different files in the same cluster. This inefficiency creates a forensic goldmine known as "Slack Space," which we will discuss later in this chapter.

4.2.3 SSDs and The Trim Problem

While traditional Hard Disk Drives (HDDs) overwrite old data in place, Solid State Drives (SSDs) function differently: flash cells must be erased before they can be rewritten. SSDs manage this with a command called Trim, which is issued by the OS, and with Garbage Collection, which is run by the drive's own firmware. When a user deletes a file on an SSD, the OS sends a Trim command telling the drive, "these sectors are no longer needed." The SSD may then wipe those cells, sometimes almost immediately, to prepare them for future writing.

For investigators, this presents a significant challenge: data recovery on modern SSDs is far more difficult than on traditional HDDs because the deleted data may be physically removed by the drive's firmware almost instantly.

4.2.4 Partitioning Schemes: MBR vs. GPT

Before a file system can be laid down, the hard drive must be partitioned. There are two primary standards you will encounter in the lab:

  • MBR (Master Boot Record): The legacy standard.

    • Structure: Located at the very first sector of the disk (Sector 0).
    • Limitations: With standard 512-byte sectors, it can only address drives up to 2TB in size, and it supports only 4 primary partitions.
    • Forensic Note: MBR is prone to corruption because the partition table is stored in only one place.
  • GPT (GUID Partition Table): The modern standard associated with UEFI firmware (the successor to the legacy BIOS).

    • Structure: Uses a "Protective MBR" at Sector 0 for backward compatibility, but the real partition table begins with the GPT header at LBA 1.
    • Advantages: Supports massive drives (Zettabytes) and up to 128 partitions.
    • Redundancy: GPT stores a secondary (backup) partition table at the physical end of the disk. If the header is corrupted, forensic tools can often recover the volume structure from this backup footer.
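To tell the two schemes apart in a raw image, you can inspect the first two sectors. The sketch below assumes 512-byte logical sectors and relies on three well-known markers: the 55 AA boot signature at offset 510, the partition type byte 0xEE that marks a Protective MBR, and the ASCII signature "EFI PART" at the start of the GPT header. The function names and the synthetic test sectors are hypothetical.

```python
SECTOR = 512  # assumed logical sector size


def partition_scheme(first_sectors: bytes) -> str:
    """Classify a disk image from its first two sectors (LBA 0 and LBA 1)."""
    mbr = first_sectors[:SECTOR]
    if mbr[510:512] != b"\x55\xAA":
        return "unknown"  # no boot signature, so not a valid MBR
    # Four 16-byte partition entries start at offset 446;
    # byte 4 of each entry is the partition type.
    types = [mbr[446 + i * 16 + 4] for i in range(4)]
    lba1 = first_sectors[SECTOR:SECTOR * 2]
    if 0xEE in types and lba1[:8] == b"EFI PART":
        return "GPT"
    return "MBR"


def make_sectors(part_type: int, gpt: bool) -> bytes:
    """Build two synthetic sectors for demonstration only."""
    mbr = bytearray(SECTOR)
    mbr[446 + 4] = part_type
    mbr[510:512] = b"\x55\xAA"
    lba1 = bytearray(SECTOR)
    if gpt:
        lba1[:8] = b"EFI PART"
    return bytes(mbr + lba1)


print(partition_scheme(make_sectors(0x07, gpt=False)))
print(partition_scheme(make_sectors(0xEE, gpt=True)))
```

On a real image you would read the first 1024 bytes from the file instead of building them; a full parser would also validate the GPT header checksum, which this sketch skips.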

4.3 The FAT File System (Legacy Context)

While NTFS is the king of Windows, the File Allocation Table (FAT) file system is still ubiquitous in digital forensics because it is the standard for USB flash drives, SD cards, and many IoT devices.

  • FAT32: The most common version. It relies on a "Linked List" structure. The directory table points to the first cluster of a file, and the File Allocation Table (FAT) tells the OS where the next piece of the file is located.
  • Security: Unlike NTFS, FAT has no security permissions. Anyone who accesses the drive can read the files.
  • Forensic Relevance: FAT timestamps are coarse and uneven. In many implementations, it tracks "Created," "Modified," and "Accessed," but each has a different resolution: "Created" is precise to 10 ms, "Modified" to 2 seconds, and "Accessed" records only the date, not the time. This lack of granularity can complicate timeline analysis.
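The "Linked List" behavior of FAT32 can be illustrated with a toy table. In this sketch the FAT is modeled as a Python dictionary (cluster number mapped to the next cluster number) rather than parsed from a real volume; the table contents are invented. It relies on two FAT32 conventions: only the low 28 bits of each entry are used, and values of 0x0FFFFFF8 or higher mark the end of a chain.

```python
END_OF_CHAIN = 0x0FFFFFF8  # FAT32 entries at or above this value end the chain


def follow_chain(fat: dict, first_cluster: int) -> list:
    """Walk the FAT from a file's first cluster and return every cluster it uses."""
    chain = []
    seen = set()  # guards against loops in a corrupted table
    cluster = first_cluster
    while cluster < END_OF_CHAIN and cluster not in seen:
        chain.append(cluster)
        seen.add(cluster)
        cluster = fat[cluster] & 0x0FFFFFFF  # only the low 28 bits are meaningful
    return chain


# Toy table: the directory entry says the file starts at cluster 5.
toy_fat = {5: 6, 6: 9, 9: 0x0FFFFFFF}
print(follow_chain(toy_fat, 5))
```

Note that the file in this toy table is fragmented: cluster 9 does not directly follow cluster 6, and only the table records that.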

4.4 The NTFS File System

We now focus on NTFS (New Technology File System). As the standard file system for modern Windows operating systems, understanding NTFS is the foundational skill required for the majority of corporate and criminal investigations.

4.4.1 The Master File Table (MFT)

The heart of NTFS is the Master File Table (MFT). Think of the MFT as a massive database or card catalog that contains a record for every single file and directory on the volume.

  • Every file has an MFT Record (usually 1024 bytes in size).
  • The MFT record contains "Attributes" that describe the file.

4.4.2 System Metadata Files

The MFT itself is actually a file (named $MFT). It is one of several hidden "Metafiles" that define the volume. You cannot see these in Windows Explorer, but they are visible in forensic tools.

System File | Description | Forensic Value
$MFT | Master File Table | Contains the records for every file on the volume.
$MFTMirr | MFT Mirror | A backup of the first four records of the MFT (critical for recovery).
$LogFile | Transaction Log | Records metadata changes (journaling). Can be used to undo/redo changes and recover deleted events.
$Bitmap | Cluster Bitmap | A map of 0s and 1s representing every cluster on the drive. 0 = Unallocated (Free), 1 = Allocated (Occupied).
$Boot | Boot Sector | Contains the BIOS Parameter Block (BPB) and boot code.
$Volume | Volume Name | Stores the volume label and version information.
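The $Bitmap idea (one bit per cluster) can be sketched in a few lines. This example works on an invented two-byte bitmap, not a real volume, and it assumes the bits are ordered least-significant-bit first within each byte, so cluster 0 is bit 0 of byte 0. The function names are hypothetical.

```python
def is_allocated(bitmap: bytes, cluster: int) -> bool:
    """Return True if the cluster's bit is 1 (allocated), False if 0 (free)."""
    byte_index, bit_index = divmod(cluster, 8)
    # Assumption: cluster N lives in bit (N % 8) of byte (N // 8), LSB first.
    return bool((bitmap[byte_index] >> bit_index) & 1)


def unallocated_clusters(bitmap: bytes, total_clusters: int) -> list:
    """List every free cluster; these are the targets for carving."""
    return [c for c in range(total_clusters) if not is_allocated(bitmap, c)]


# Invented bitmap covering 16 clusters: clusters 0 and 2 are in use in the
# first byte, and all eight clusters of the second byte are in use.
sample_bitmap = bytes([0b00000101, 0b11111111])
print(unallocated_clusters(sample_bitmap, 16))
```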

4.4.3 The $DATA Attribute

In NTFS, the "content" of a file is just another attribute called $DATA.

  • Non-Resident Data: If a file is large (e.g., a photo), its content does not fit inside the MFT record. Instead, the $DATA attribute holds a "Cluster Run" (a list of starting cluster addresses and lengths) that points to where the content is stored on the physical disk.
  • Resident Data: If a file is very small (typically less than 600-700 bytes), NTFS performs an optimization. It stores the actual content inside the MFT record itself, in the space usually reserved for the cluster run.

Forensic Relevance: If a suspect creates a small text file containing a password or a threatening note (e.g., "kill_list.txt") and then deletes it, the content can persist inside the MFT record until that record is reused for a new file, regardless of what happens to the clusters elsewhere on the volume. Furthermore, a logical image of the volume normally includes the $MFT, so resident data is captured along with it. A copy of selected files and folders may not include it, so verify what your acquisition method collects.
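The resident/non-resident distinction is visible in the attribute header. The sketch below walks the attributes of a simplified MFT record and reports how $DATA is stored. It relies on commonly documented offsets: the "FILE" signature at byte 0, the offset of the first attribute at 0x14, attribute type 0x80 for $DATA, and a non-resident flag at byte 8 of each attribute. It is a teaching sketch, not a full parser: it ignores the fixup (update sequence) values that real records use, and build_toy_record is a hypothetical helper that fabricates a record for demonstration.

```python
import struct

DATA_TYPE = 0x80         # attribute type code for $DATA
END_MARKER = 0xFFFFFFFF  # marks the end of the attribute list


def find_data_attribute(record: bytes):
    """Return ('resident', content) or ('non-resident', None) for the first $DATA."""
    if record[:4] != b"FILE":
        raise ValueError("not an MFT record")
    (offset,) = struct.unpack_from("<H", record, 0x14)  # first attribute offset
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == END_MARKER or attr_len == 0:
            break
        if attr_type == DATA_TYPE:
            if record[offset + 8]:  # non-resident flag
                return ("non-resident", None)
            (size,) = struct.unpack_from("<I", record, offset + 0x10)
            (start,) = struct.unpack_from("<H", record, offset + 0x14)
            return ("resident", record[offset + start:offset + start + size])
        offset += attr_len
    return ("none", None)


def build_toy_record(content: bytes) -> bytes:
    """Fabricate a simplified 1024-byte record with one resident $DATA attribute."""
    record = bytearray(1024)
    record[0:4] = b"FILE"
    struct.pack_into("<H", record, 0x14, 0x38)   # first attribute at 0x38
    attr_len = (0x18 + len(content) + 7) & ~7    # header + content, 8-byte aligned
    struct.pack_into("<IIB", record, 0x38, DATA_TYPE, attr_len, 0)
    struct.pack_into("<I", record, 0x38 + 0x10, len(content))
    struct.pack_into("<H", record, 0x38 + 0x14, 0x18)
    record[0x38 + 0x18:0x38 + 0x18 + len(content)] = content
    struct.pack_into("<I", record, 0x38 + attr_len, END_MARKER)
    return bytes(record)


print(find_data_attribute(build_toy_record(b"meet at noon")))
```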

4.4.4 The Recycle Bin Lifecycle ($I and $R)

When a user "deletes" a file to the Recycle Bin, it is not actually deleted. It is moved to a hidden folder named $Recycle.Bin. During this move, Windows performs a specific renaming process to track the file.

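For every deleted file, the Recycle Bin holds a pair of files:

  1. The $I File (Index): This is a small metadata file that Windows creates at deletion time. It contains:
    • The original filename.
    • The original path (where it was deleted from).
    • The original file size.
    • The deletion timestamp.
  2. The $R File (Raw Data): This is the actual file content (the photo or document itself). It is the original file, renamed to $R plus a random string and the original extension (e.g., $R19283.jpg). The matching $I file carries the same random string (e.g., $I19283.jpg), which is how the two are paired.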

Investigation Tip: By linking the $I file to the $R file, an investigator can prove exactly when a suspect deleted a specific piece of evidence and where it originally existed on the disk.
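The fields of a $I file can be decoded with a short parser. This sketch assumes the commonly documented layout: an 8-byte version number, an 8-byte original file size, an 8-byte deletion timestamp in Windows FILETIME format, and then the original path in UTF-16 (a fixed 520-byte field in version 1, or a 4-byte character count followed by the path in version 2). The sample record is fabricated for demonstration and the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


def parse_i_file(data: bytes) -> dict:
    """Decode the metadata stored in a Recycle Bin $I file."""
    version, size, deleted = struct.unpack_from("<QQQ", data, 0)
    if version == 2:
        (chars,) = struct.unpack_from("<I", data, 24)  # path length in characters
        raw_path = data[28:28 + chars * 2]
    else:
        raw_path = data[24:24 + 520]                   # fixed-size path field
    path = raw_path.decode("utf-16-le").split("\x00", 1)[0]
    return {
        "version": version,
        "original_size": size,
        "deleted_utc": filetime_to_datetime(deleted),
        "original_path": path,
    }


# Fabricated version 2 record. 116444736000000000 is the FILETIME value
# for 1970-01-01 00:00:00 UTC, chosen only because it is easy to verify.
sample_path = "C:\\Users\\Bob\\Fake_Invoice_001.txt"
sample = struct.pack("<QQQI", 2, 3000, 116444736000000000, len(sample_path) + 1)
sample += (sample_path + "\x00").encode("utf-16-le")

info = parse_i_file(sample)
print(info["original_path"], info["original_size"], info["deleted_utc"])
```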


4.5 The Forensic Goldmine: Slack Space

One of the most important concepts in data recovery is Slack Space. To understand this, we must clearly define the states of storage.

4.5.1 Storage Terminology Matrix

Term | Definition | Contains Evidence?
Allocated Space | Clusters currently reserved by the OS for active files. | Yes (Active Files)
Unallocated Space | Clusters marked as "Free" in the $Bitmap. The data here is not protected and can be overwritten at any time. | Yes (Deleted Data / Carving targets)
Slack Space | The gap between the end of a file's logical data and the end of the physical cluster it occupies. | Yes (Fragments of old/overwritten data)

4.5.2 Calculating Drive Slack

Recall that a cluster is the smallest unit of allocation. If your computer uses 4096-byte clusters, and you save a file that is only 2000 bytes, the computer must still allocate the entire 4096-byte cluster.

4096 (Cluster) - 2000 (File) = 2096 bytes of Slack Space

This unused space at the end of the cluster is not empty; it contains whatever data was previously written to that area of the disk.
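The same arithmetic generalizes to files that span more than one cluster: only the last cluster has slack. A minimal sketch, assuming a 4096-byte cluster by default (the function names are hypothetical):

```python
def clusters_needed(file_size: int, cluster_size: int = 4096) -> int:
    """Number of whole clusters the file system must allocate (ceiling division)."""
    return -(-file_size // cluster_size)


def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes left between the end of the file and the end of its last cluster."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


for size in (2000, 3000, 4096, 5000):
    print(size, clusters_needed(size), slack_bytes(size))
```

A 5000-byte file needs two clusters (8192 bytes allocated), so its slack is 8192 - 5000 = 3192 bytes, all of it at the end of the second cluster.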

4.5.3 Case Study: The "Cleaned" Invoice

To understand why Slack Space is critical, let's look at a realistic investigation scenario involving corporate fraud.

The Scenario: A suspect, Bob, is creating fake invoices to embezzle money. He creates a file named Fake_Invoice_001.txt which is 3,000 bytes in size. He saves it to the disk.

  • Cluster Size: 4,096 bytes.
  • Result: The file takes up one cluster. The first 3,000 bytes are the invoice data. The remaining 1,096 bytes are slack (empty or old data).

The Attempted Cover-Up: Bob realizes he might get caught. He decides to "delete" the evidence. However, he knows that simple deletion isn't enough, so he tries to be clever. He opens Fake_Invoice_001.txt, deletes all the text, and types "Grocery List: Milk, Eggs, Bread." He saves the file again.

  • New File Size: The new grocery list is only 50 bytes.
  • The Overwrite: The Operating System goes to the same cluster to write the new data. It writes the 50 bytes of the grocery list at the very beginning of the cluster.

The Forensic Discovery: When the investigator analyzes the drive, they look at the cluster allocated to "Grocery List."

  • Bytes 0-49: "Grocery List: Milk, Eggs, Bread" (This is the active file, 50 bytes in total).
  • Bytes 50-4095: This is the Slack Space. Because the new file was smaller than the old file, the OS did not wipe the end of the cluster. It simply stopped writing after "Bread".

Consequently, the remaining 2,950 bytes of the original fake invoice (bytes 50 through 2,999 of the cluster), containing dates, amounts, and bank account numbers, are still sitting in the slack space of the "Grocery List" file. Bob successfully hid the file from Windows Explorer, but he failed to hide it from the Hex Editor.
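Bob's overwrite can be simulated with a single in-memory "cluster". This is a simplified model under stated assumptions: the invoice text is invented, the grocery list is padded with spaces to exactly 50 bytes to match the scenario, and the model ignores any zero-padding an operating system may apply up to the end of the sector when it writes the shorter file.

```python
CLUSTER_SIZE = 4096

# One empty cluster, modeled as a mutable byte buffer.
cluster = bytearray(CLUSTER_SIZE)

# Step 1: Bob saves the 3,000-byte invoice (invented content).
invoice = (b"INVOICE 001 ACCT 12345678 AMOUNT 9000.00 " * 80)[:3000]
cluster[:len(invoice)] = invoice

# Step 2: Bob overwrites the file with a 50-byte grocery list.
grocery = b"Grocery List: Milk, Eggs, Bread".ljust(50)
cluster[:len(grocery)] = grocery
logical_size = len(grocery)

# Step 3: the investigator splits the cluster at the logical file size.
active = bytes(cluster[:logical_size])   # what Windows Explorer shows
slack = bytes(cluster[logical_size:])    # what the hex editor also shows
residue = slack[:len(invoice) - logical_size]

print(len(active), len(slack), len(residue))
print(residue[:40])
```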


4.6 Data Recovery: The Art of "Carving"

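When a user permanently deletes a file in Windows (for example, by emptying the Recycle Bin), the OS does not erase the content. It simply goes to the MFT, flags that file's record as no longer in use, and marks its clusters as free in the $Bitmap. It basically tells the computer, "This space is now available for rent." Until the computer actually writes new data over that specific physical location, the old data remains intact (subject to the SSD Trim behavior described in Section 4.2.3).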

File Carving is the process of recovering this data without referencing the file system (MFT). We do this by searching the raw hex data for specific Signatures.

4.6.1 File Signatures (Magic Bytes)

Most file types have a recognizable digital fingerprint located in the first few bytes of the file header. These are often called "Magic Bytes". Signatures are not guaranteed to exist or to be unique: plain text files, for example, have none.

The best resource to identify specific headers and footers for different file types is Gary Kessler's File Signature Table, which is now hosted on Search.org: https://filesig.search.org/

File Type | Extension | Header Signature (Hex) | Footer/Trailer (Hex)
JPEG Image | .jpg | FF D8 FF | FF D9
PDF Document | .pdf | 25 50 44 46 (%PDF) | 25 25 45 4F 46 (%%EOF)
Windows Executable | .exe | 4D 5A (MZ) | Variable
ZIP Archive | .zip | 50 4B 03 04 (PK..) | Variable

Note: The ASCII representations "MZ" and "PK" are the initials of the formats' creators (Mark Zbikowski for DOS executables and Phil Katz for ZIP).
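Signature matching amounts to comparing the first bytes of a block against a lookup table. The sketch below uses only the four header signatures from the table above; the function name identify is hypothetical, and a real tool would carry a much larger table.

```python
# Header signatures taken from the table in this section.
SIGNATURES = {
    "JPEG": bytes.fromhex("FFD8FF"),
    "PDF": bytes.fromhex("25504446"),
    "EXE": bytes.fromhex("4D5A"),
    "ZIP": bytes.fromhex("504B0304"),
}


def identify(data: bytes):
    """Return the file type whose magic bytes match the start of data, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(identify(b"%PDF-1.7"))
print(identify(b"\xFF\xD8\xFF\xE0"))
print(identify(b"just some text"))
```

Because the check looks at content rather than the file extension, it still identifies a JPEG that a suspect has renamed to something like notes.txt.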

4.6.2 Manual Carving: The Visual Method

The most straightforward way to recover a file is to visually identify it in a Hex Editor (like HxD or FTK Imager) and copy it out. This is often used for quick verification or small recoveries.

  1. Locate the SOF (Start of File): Scroll through the data or use the "Find" function to search for a specific header, such as FF D8 FF (JPEG).
  2. Locate the EOF (End of File): Once you find the header, scroll down carefully until you find the corresponding footer, such as FF D9.
  3. Highlight: Click on the very first byte of the header, hold Shift, and click on the very last byte of the footer. This highlights the entire block of data.
  4. Copy and Paste: Right-click and select "Copy." Open a new file in the Hex Editor, paste the data, and save it as evidence.jpg. If the file was not fragmented (broken into pieces on the drive), it will open immediately.
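The "Find" steps above can be sketched in Python, where bytes.find plays the role of the hex editor's search. The sample data is fabricated, and locate_jpeg is a hypothetical helper. It returns the offset of the first header byte and the offset of the last footer byte, which are the two positions you would click on in step 3.

```python
JPEG_SOF = b"\xFF\xD8\xFF"  # Start of File signature
JPEG_EOF = b"\xFF\xD9"      # End of File signature


def locate_jpeg(image: bytes, start: int = 0):
    """Return (first header byte offset, last footer byte offset), or None."""
    sof = image.find(JPEG_SOF, start)
    if sof == -1:
        return None
    # Search for the footer only after the header we just found.
    eof = image.find(JPEG_EOF, sof + len(JPEG_SOF))
    if eof == -1:
        return None
    return sof, eof + len(JPEG_EOF) - 1


# Fabricated "unallocated space": padding, a tiny fake JPEG, more padding.
blob = b"\x00" * 16 + b"\xFF\xD8\xFF\xE0" + b"picture" + b"\xFF\xD9" + b"\x00" * 8
print(locate_jpeg(blob))
```

This takes the first FF D9 it meets, which is not always the true end: real JPEGs can contain an embedded thumbnail with its own footer, so a carved file that opens as a tiny preview may mean the search stopped too early.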

4.6.3 Manual Carving: Offsets & Calculations

While the visual method works for simple tasks, professional forensics often requires the use of Command Line Interface (CLI) tools (like dd in Linux) or precise documentation for court reports. For this, we cannot just "highlight"; we need the specific Offsets.

What is an Offset? An offset is a precise address on the disk. It tells the computer exactly how many bytes from the beginning of the disk the data resides. In a Hex Editor, the offset is usually listed in the far-left column (e.g., 0x00001A20).

The Calculation Process:

  1. Identify Start Offset: Record the address of the first byte of the header (SOF).
  2. Identify End Offset: Record the address of the last byte of the footer (EOF).
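  3. Calculate Size (Length): Most CLI tools require the starting point and the length of the data to carve. Because both offsets are inclusive (the first byte of the header and the last byte of the footer), the length is the difference plus one. $$\text{Length} = \text{End Offset} - \text{Start Offset} + 1$$
  4. Extraction: You would then input these numbers, converted from hex to decimal, into a tool. With dd, set the block size to 1 byte so that skip and count are measured in bytes rather than in the default 512-byte blocks. For example: dd if=image.dd of=recovered.jpg bs=1 skip=[Start Offset] count=[Length]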
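The calculation can be sketched in Python on an in-memory image. Since the start offset is the address of the first header byte and the end offset is the address of the last footer byte, both ends are inclusive and the length is end minus start plus one. The offsets and data below are fabricated, and carve is a hypothetical helper.

```python
def carve(image: bytes, start_offset: int, end_offset: int) -> bytes:
    """Extract the bytes from start_offset through end_offset, inclusive."""
    length = end_offset - start_offset + 1  # +1 because both ends are inclusive
    return image[start_offset:start_offset + length]


# Hex editors show offsets in hexadecimal; convert before doing arithmetic.
start = int("0x10", 16)  # 16
end = int("0x1C", 16)    # 28

image = b"\x00" * 16 + b"\xFF\xD8\xFF\xE0" + b"picture" + b"\xFF\xD9" + b"\x00" * 8
recovered = carve(image, start, end)

print(len(recovered), recovered[:3].hex(" "), recovered[-2:].hex(" "))
```

Forgetting the +1 would drop the final D9 byte and leave the footer incomplete, which is an easy mistake to miss on large files.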

Why Offsets Matter: In a professional report, you cannot simply say "I found a picture." You must state, "I recovered a JPEG file located at Offset 0x4A200 in the unallocated space." This makes your work reproducible: another examiner can go to the same offset and verify the finding, which supports the kind of testability courts look for under the Daubert Standard.

4.6.4 The Fragmentation Challenge

Manual carving works perfectly if the file is contiguous (stored in sequential clusters). However, as drives fill up, the OS often has to split files into pieces to fit them into available gaps. This is Fragmentation.

If a JPEG is fragmented into three pieces scattered across the disk, manual carving will fail. Why?

  • You find the Header (FF D8) in Cluster 100.
  • You find the Footer (FF D9) in Cluster 500.
  • If you carve everything from 100 to 500, you are also grabbing the "garbage" data (other files) located in clusters 101-499. The resulting image will be corrupted and unreadable.

Solving fragmentation requires "Smart Carving" or File System Forensics (using the MFT run-lists), as simple header/footer carving cannot guess where the middle pieces of the file are located.
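The difference between naive carving and run-list recovery can be shown on a toy disk. This sketch assumes a 4-byte cluster purely to keep the example readable, and it models the run list as (starting cluster, cluster count) pairs; in a real case those values would come from the file's MFT record. All data and names are invented.

```python
CLUSTER = 4  # toy cluster size in bytes, chosen for readability


def read_runs(disk: bytes, runs: list, file_size: int) -> bytes:
    """Reassemble a file from (start_cluster, cluster_count) runs."""
    out = bytearray()
    for start_cluster, count in runs:
        begin = start_cluster * CLUSTER
        out += disk[begin:begin + count * CLUSTER]
    return bytes(out[:file_size])  # trim the slack in the last cluster


# Toy disk: the target file occupies clusters 0, 2, and 4;
# clusters 1 and 3 belong to other files.
disk = b"AAAA" + b"xxxx" + b"BBBB" + b"yyyy" + b"CC\x00\x00"

naive = disk[0:18]  # header-to-footer carve: sweeps up the foreign clusters
smart = read_runs(disk, [(0, 1), (2, 1), (4, 1)], file_size=10)

print(naive)
print(smart)
```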

4.6.5 Automated Carving Tools

In a real investigation with terabytes of data, manual carving is too slow. We rely on automated carving tools to scan the drive and extract files based on known headers.

Foremost & Scalpel (CLI)

  • Foremost: Originally developed by the U.S. Air Force. It scans the drive for headers/footers. It is simple but effective.
  • Scalpel: A faster, more memory-efficient rewrite of Foremost. It uses a configuration file (scalpel.conf) where you must uncomment the file types you want to recover.

PhotoRec

  • Overview: Despite the name, PhotoRec recovers video, documents, and archives, not just photos. It ignores the file system entirely and goes after the underlying data.
  • Pros: It is free, open-source, and arguably the most powerful carver for damaged drives.
  • Cons: It recovers files with generic names (e.g., f00134.jpg, f00135.jpg). It cannot recover the original filename or directory structure, leaving the investigator to manually sort through thousands of files.

Autopsy & Recuva (GUI)

  • Autopsy: The premier open-source forensic suite. It has built-in carving modules (using PhotoRec in the background) that run automatically during "Ingest."
  • Recuva: A user-friendly tool by Piriform (creators of CCleaner). While less "forensically sound" than CLI tools (as it installs directly to the host), it is excellent for quick triage or non-criminal recovery tasks.

4.7 Real-World Case Study: The BTK Killer

The concepts of file systems and metadata recovery are not just academic; they catch serial killers.

The Case: Dennis Rader, known as the BTK (Bind, Torture, Kill) strangler, terrorized Kansas for decades. He went dormant for years but resurfaced in 2004 to taunt the police. In February 2005, he sent a package to a local TV station containing a 1.44MB Floppy Disk.

The Inquiry: Rader had asked the police in a letter, "Can I be traced on a floppy disk?" The police, via a newspaper ad, lied and said "No."

The Evidence: Rader saved a Microsoft Word document to the floppy disk. He thought he had deleted his personal information from the document. However, forensic analysts examined the Metadata embedded within the Word document itself (not just the file system).

  • Recovery: The metadata revealed the document was last saved by a user named "Dennis."
  • Association: It also showed the software was registered to the "Christ Lutheran Church."
  • The Catch: Police found that Dennis Rader was the president of the church council. This digital breadcrumb led directly to his arrest and confession.

This case illustrates that "deletion" is rarely complete and that metadata often tells a story the user never intended to share.


4.8 Summary

This week, we moved away from abstract legal theories and engaged directly with the raw data. You learned that computers are inefficient; they allocate storage in fixed buckets (clusters) that create gaps (slack space) where evidence can hide. You also learned that the file system (NTFS) is essentially a database (MFT) that tracks these files using complex attributes like $DATA.

We explored the lifecycle of deletion—from the $I and $R files of the Recycle Bin to the ultimate persistence of data in Unallocated Space. Finally, you learned to use manual and automated carving techniques to resurrect this data, understanding that even a floppy disk from a serial killer can hold the metadata needed to solve a 30-year-old cold case.

In the next chapter, we will begin Phase 2 of the course: Windows Forensics, starting with the complex nervous system of the OS—the Windows Registry.


Key Terms Checklist

  • Bit / Byte / Nibble: The units of digital data.
  • Hexadecimal: Base-16 numbering system used to represent binary data.
  • Cluster vs. Sector: The difference between logical allocation and physical storage.
  • MBR vs. GPT: Legacy vs. Modern partition tables.
  • MFT (Master File Table): The database that tracks all files in NTFS.
  • $I and $R Files: Metadata and Data files created by the Windows Recycle Bin.
  • Resident Data: Small files stored directly inside the MFT record.
  • Slack Space: Data remaining in the unused portion of a cluster.
  • Magic Bytes: Hex signatures identifying file types (e.g., FF D8 for JPEG).
  • Fragmentation: When a file is broken into non-contiguous clusters, complicating recovery.
  • File Carving: Recovering files based on headers/footers rather than file system metadata.