CH9: Email Forensics

Chapter Overview

In Chapter 8, we explored the fleeting, volatile world of Memory Forensics, capturing evidence that vanishes when the power is cut. Now, we shift our focus to a more persistent and pervasive form of evidence: Email.

Email remains the lifeblood of modern global communication, but it is also one of the most common attack vectors in the cybercrime landscape. From simple phishing schemes to sophisticated Business Email Compromise (BEC) campaigns that defraud organizations of millions, the inbox is often the primary crime scene.

In this chapter, you will learn that an email is much more than the text displayed on the screen; it is a complex container wrapped in a digital "envelope" of metadata. By looking past the visible "From" address—which is easily forged—and mastering the analysis of raw headers, authentication records (SPF, DKIM, DMARC), and server hops, you will develop the skills necessary to trace a malicious message back to its true origin. Furthermore, we will explore how attackers use encoding and obfuscation to bypass security filters, and how you, as an investigator, can safely de-obfuscate and neutralize these threats.

Learning Objectives

By the end of this chapter, students will be able to:

Differentiate between client-side and server-side email storage, and explain the roles of SMTP, POP3, and IMAP protocols.
Identify common email file formats (PST, OST, MBOX, EML) and their forensic implications.
Demonstrate how to access raw email headers in common email clients.
Analyze email headers to trace the true origin of a message using Received fields and X-Originating-IP.
Evaluate email authentication mechanisms (SPF, DKIM, DMARC) and utilize MxToolbox to verify domain records.
Decode obfuscated content using industry-standard tools like CyberChef.
Apply advanced techniques such as sandboxing and safe viewing to investigate malicious attachments and tracking pixels.

9.1 Introduction: The Vector of Choice

While investigators often enjoy the thrill of hunting hackers in memory or carving deleted files, the reality of modern cybercrime is often more mundane but equally devastating: it starts with an email. Whether it is a sophisticated Business Email Compromise (BEC) where a CFO is tricked into wiring millions of dollars, or a widespread Phishing campaign distributing ransomware, email remains the primary attack vector for cybercriminals.

Email forensics is not just about reading messages. It is about analyzing the "envelope" in which the message arrived. Just as a physical letter has a postmark and a return address, a digital email contains hidden metadata that reveals the servers it touched, the time it was sent, and the true identity of the sender.

9.1.1 Email Architecture and Protocols

To investigate email, you must understand how it moves. Email does not travel directly from Sender A to Recipient B; it relies on a store-and-forward model involving multiple servers.

The Protocols:

SMTP (Simple Mail Transfer Protocol): This is the language of sending. When you hit "Send," your email client uses SMTP to push the message to your outgoing mail server. That server then uses SMTP to relay the message across the internet to the recipient's mail server.
POP3 (Post Office Protocol version 3): This is an older retrieval protocol. Historically, POP3 would download emails from the server to the local device and then delete them from the server. Forensically, this is significant because if a user is using POP3, the evidence may exist only on their local laptop, not on the company server.
IMAP (Internet Message Access Protocol): The modern standard. IMAP synchronizes the mail client with the server. The email lives on the server, and the client simply views a copy. Forensically, this means the "source of truth" is usually the cloud or Exchange server, but local cache files (such as Outlook .OST files) may still hold valuable data.

The Importance of Secure Ports

When analyzing network traffic or server logs, an investigator must distinguish between "Cleartext" and "Encrypted" communication. Historically, email protocols transmitted data (including passwords and message bodies) in plain text, making them vulnerable to interception by anyone on the same network. Today, most mail servers support or require encryption using TLS (the successor to SSL). Recognizing the port numbers associated with these protocols helps an investigator judge whether a communication channel was likely secure or whether the user was transmitting sensitive data in the open. Treat the port as a strong hint rather than proof. The "plain text" ports can be upgraded to encryption mid-session with the STARTTLS command, so confirm by inspecting the traffic or the server logs.

Protocol	Function	Plain Text Port (Unsecured)	Secure Port (SSL/TLS)
SMTP	Sending Mail	25 (server-to-server relay)	465 (implicit TLS) or 587 (client submission, upgraded with STARTTLS)
POP3	Downloading Mail	110	995
IMAP	Syncing Mail	143	993
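To apply the port table to a batch of connection records, an investigator might script the lookup. The Python sketch below is illustrative only. The connection tuples and the `classify` helper are hypothetical, and a real investigation would read these values from a packet capture or server log.

```python
# Standard mail ports and what they usually imply about encryption.
MAIL_PORTS = {
    25: ("SMTP", "server relay; plain text unless upgraded with STARTTLS"),
    465: ("SMTP", "implicit TLS"),
    587: ("SMTP", "client submission; normally upgraded with STARTTLS"),
    110: ("POP3", "plain text unless upgraded with STARTTLS"),
    995: ("POP3", "implicit TLS"),
    143: ("IMAP", "plain text unless upgraded with STARTTLS"),
    993: ("IMAP", "implicit TLS"),
}

def classify(port: int) -> str:
    """Describe a destination port using the mail-port table."""
    proto, note = MAIL_PORTS.get(port, ("unknown", "not a standard mail port"))
    return f"{proto} ({note})"

# Hypothetical connection records: (client IP, server IP, destination port).
connections = [
    ("10.0.0.5", "203.0.113.9", 110),
    ("10.0.0.5", "203.0.113.9", 993),
]
for client, server, port in connections:
    print(f"{client} -> {server}:{port}  {classify(port)}")
```

A hit on port 110 flags a POP3 session that is worth examining for cleartext credentials, subject to the STARTTLS caveat above.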

9.1.2 The Evidence Container: File Formats

Different email clients store data in different formats, some of them proprietary. Recognizing these extensions is the first step in the acquisition phase.

.PST (Personal Storage Table): The classic Microsoft Outlook archive. It contains messages, calendar events, and contacts. These are local archives often found on user hard drives.
.OST (Offline Storage Table): Also Outlook, but this is the "cache" of an IMAP or Exchange account. It allows users to read mail while offline. Even if a user deletes an email from the server, a copy might linger in the local .OST file.
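.MBOX: A generic, open format used natively by Thunderbird and produced by Apple Mail exports and Google Takeout. It stores all messages in a single, long text file, concatenated one after another, with each message starting on a line that begins with "From ".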
.EML: A single, individual email message saved as a file. This is the standard format for forensic analysis because it preserves the raw header and body in plain text.
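MBOX is plain text, so it is easy to enumerate programmatically. The sketch below uses Python's standard `mailbox` module to list the Date, From, and Subject of every message. The sample file is fabricated for illustration. Work on a copy of the evidence: the module opens the file read-write when permissions allow, even though merely listing messages does not change it.

```python
import mailbox
import tempfile
from pathlib import Path

def list_mbox(path: str) -> list:
    """Return (Date, From, Subject) for every message in an MBOX file."""
    box = mailbox.mbox(path, create=False)  # raises if the file does not exist
    try:
        return [(m.get("Date", ""), m.get("From", ""), m.get("Subject", "")) for m in box]
    finally:
        box.close()

# Fabricated two-message MBOX; each message starts with a "From " separator line.
SAMPLE_MBOX = (
    "From sender@example.com Tue Nov 18 09:14:55 2025\n"
    "From: a@example.com\nSubject: first\n\nhello\n\n"
    "From sender@example.com Tue Nov 18 09:20:00 2025\n"
    "From: b@example.com\nSubject: second\n\nworld\n"
)
with tempfile.TemporaryDirectory() as tmp:
    mbox_path = Path(tmp) / "sample.mbox"
    mbox_path.write_text(SAMPLE_MBOX)
    rows = list_mbox(str(mbox_path))

for row in rows:
    print(row)
```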

9.2 Header Analysis: Reading the Digital Envelope

When you view an email in Outlook or Gmail, you see the "rendered" view: From, To, Subject, and Date. This is what the sender wants you to see. However, this information is easily forged. To find the truth, investigators must view the Full Source or Internet Headers.

The header is a log of the email's journey. Each server adds its Received line to the top of the header block, so the Received lines are read from bottom to top. The bottom-most lines represent the start of the journey (the sender), and the top-most lines represent the end (the recipient).

9.2.1 Accessing the "Digital Envelope"

Before you can analyze headers, you must access them. Regular email clients hide this messy data by default so as not to confuse average users. While the exact steps vary slightly between software versions, the general methods are consistent:

Webmail (Gmail, Outlook 365 Web, Yahoo): Look for a menu option (usually three dots or an "Actions" dropdown near the specific email) that says something like "Show Original," "View Raw Message," or "View Source." This will typically open a new browser tab displaying the full text content of the email, starting with the headers.
Desktop Clients (Outlook Desktop App, Thunderbird, Apple Mail):
- In Outlook, you often have to open the email in its own window, go to File > Properties, and look for the "Internet headers" box at the bottom of the window.
- In Thunderbird or Apple Mail, look under the View menu for options like "Headers > All" or "Message > Raw Source."

As an investigator, your first step upon receiving a suspicious email is to export it to an .EML file or copy the entire raw source text for analysis.
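Once exported, an .EML file can be parsed with Python's standard `email` package. This minimal sketch writes a small hypothetical message to a temporary file named `suspicious.eml` and parses it back. It then prints each header in file order, which is the same top-to-bottom order you see with "Show Original." Opening the file in binary, read-only mode avoids re-encoding the evidence.

```python
import tempfile
from email import policy
from email.parser import BytesParser
from pathlib import Path

def load_eml(path: str):
    """Parse a saved .EML file opened read-only in binary mode."""
    with open(path, "rb") as fh:
        return BytesParser(policy=policy.default).parse(fh)

def print_headers(msg) -> None:
    """Print every header in file order; the first Received line is the newest hop."""
    for name, value in msg.items():
        print(f"{name}: {str(value)[:70]}")
    print("Received headers found:", len(msg.get_all("Received", [])))

# Hypothetical message used only for this demo.
sample = (
    b"Received: from relay.example.net by mx.example.com; Tue, 18 Nov 2025 09:15:00 -0800\r\n"
    b"Received: from sender-pc by relay.example.net; Tue, 18 Nov 2025 09:14:55 -0800\r\n"
    b"From: alice@example.net\r\n"
    b"Subject: Quarterly report\r\n"
    b"\r\n"
    b"Body text.\r\n"
)
with tempfile.TemporaryDirectory() as tmp:
    eml_path = Path(tmp) / "suspicious.eml"
    eml_path.write_bytes(sample)
    msg = load_eml(str(eml_path))

print_headers(msg)
```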

9.2.2 Critical Header Fields

1. The Received Headers (The Hops)

This is the most important section for tracing. Every time an email passes through a Mail Transfer Agent (MTA), much like a letter passing through a post office sorting facility, that server stamps the email with a Received header.

Structure: Received: from [Sender_Server] by [Receiver_Server] with [Protocol] ...; [Timestamp] (the timestamp follows the semicolon)
Analysis Strategy: Read these from bottom to top. The bottom-most Received header usually reveals the originating server or the sender's IP address. Be aware that an attacker can insert fake Received lines before the message reaches a server you trust, so weigh the lowest hops accordingly.

Example: Tracing the Path

Below is a snippet of a header showing an email traveling from a home user, through their ISP, to a corporate mail server.

Received: from mail.company.com (mx1.company.com [198.51.100.10])
    by internal-relay.company.com with ESMTP id 5678
    for <victim@company.com>; Tue, 18 Nov 2025 09:15:05 -0800

Received: from mail.isp.net (mail.isp.net [203.0.113.55])
    by mail.company.com with ESMTPS id 1234
    for <victim@company.com>; Tue, 18 Nov 2025 09:15:00 -0800

Received: from unknown (Home-PC [192.168.1.5])
    by mail.isp.net (Postfix) with ESMTPSA id 9988
    for <victim@company.com>; Tue, 18 Nov 2025 09:14:55 -0800

How to Read This:

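Hop 1 (The Bottom): The journey starts here. The user sent the email from their computer (Home-PC) to their Internet Service Provider's mail server (mail.isp.net), typically when they click "Send" in a client such as Outlook. The protocol label ESMTPSA indicates that the client authenticated to the server over a TLS-protected session.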
Hop 2 (The Middle): The ISP's server (mail.isp.net) looks up the destination and forwards the email across the internet to the recipient's mail gateway (mail.company.com).
Hop 3 (The Top): The company's gateway accepts the message and passes it to the internal relay (internal-relay.company.com) for final delivery to the user's inbox.
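The bottom-to-top reading can be automated. The Python sketch below parses the example headers above, reverses them into travel order, and reports the time between hops. The regular expression is a simplifying assumption, not a standard. Real Received lines vary widely between mail servers, so read any unmatched lines by hand rather than discarding them.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Simplified pattern (an assumption): host after "from", first bracketed IP
# (usually the connecting IP), host after "by", and the timestamp after ";".
HOP_RE = re.compile(
    r"\bfrom\s+(?P<src>\S+).*?\[(?P<ip>[0-9A-Fa-f:.]+)\].*?"
    r"\bby\s+(?P<dst>\S+).*?;\s*(?P<date>.+)",
    re.DOTALL,
)

def trace_hops(raw_message: str) -> list:
    """Return parsed Received hops oldest-first (bottom header first)."""
    msg = message_from_string(raw_message)
    hops = []
    for value in reversed(msg.get_all("Received", [])):  # servers prepend, so reverse
        match = HOP_RE.search(value)
        if match is None:
            print("Unparsed hop, review manually:", " ".join(value.split()))
            continue
        hops.append({
            "from": match.group("src"),
            "ip": match.group("ip"),
            "by": match.group("dst"),
            "time": parsedate_to_datetime(match.group("date").strip()),
        })
    return hops

SAMPLE = """\
Received: from mail.company.com (mx1.company.com [198.51.100.10])
    by internal-relay.company.com with ESMTP id 5678
    for <victim@company.com>; Tue, 18 Nov 2025 09:15:05 -0800
Received: from mail.isp.net (mail.isp.net [203.0.113.55])
    by mail.company.com with ESMTPS id 1234
    for <victim@company.com>; Tue, 18 Nov 2025 09:15:00 -0800
Received: from unknown (Home-PC [192.168.1.5])
    by mail.isp.net (Postfix) with ESMTPSA id 9988
    for <victim@company.com>; Tue, 18 Nov 2025 09:14:55 -0800
Subject: demo

body
"""

hops = trace_hops(SAMPLE)
for n, hop in enumerate(hops, start=1):
    gap = "" if n == 1 else f" (+{(hop['time'] - hops[n - 2]['time']).total_seconds():.0f}s)"
    print(f"Hop {n}: {hop['from']} [{hop['ip']}] -> {hop['by']}{gap}")
```

Gaps of a few seconds per hop are normal. Unexplained delays of hours, or timestamps that run backward, are worth a closer look, although clock skew between servers is an innocent explanation.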

2. X-Originating-IP

Some mail servers explicitly stamp the IP address of the device that submitted the email. If present, this can be a "smoking gun". The address can be looked up to identify the network owner and an approximate geolocation (e.g., a residential ISP in a different country), although a VPN or proxy can hide the sender's real location.

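3. Return-Path

When an email "bounces" (fails to deliver), the error notification is sent to the address listed in the Return-Path, which the receiving server records from the SMTP envelope sender (MAIL FROM). Phishers often spoof the visible From: address (e.g., CEO@target.com) but leave the Return-Path as their own address (hacker@evil.com), so they receive bounces and can see whether their attacks are getting through. This field therefore often points toward the true actor, although it can also be forged or set to another compromised account.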

4. Message-ID

A globally unique identifier, normally generated by the sender's mail client or the originating mail server.
- Forensic Value: Its format can hint at the software or device that created the message. For example, a Message-ID ending in @iphone might suggest the email was sent from an Apple device, which could contradict a suspect's alibi that they were at their desktop.
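The field checks above can be scripted as a first-pass triage. The Python sketch below is built around a hypothetical `triage_fields` helper. It extracts the Return-Path, Message-ID, and X-Originating-IP and compares their domains with the visible From: domain. A mismatch is a lead, not proof, because legitimate senders often use third-party mailing services whose domains differ from the From: address.

```python
from email import message_from_string
from email.utils import parseaddr

def address_domain(value: str) -> str:
    """Lower-cased domain of an address such as '"Name" <user@host>'."""
    return parseaddr(value)[1].rpartition("@")[2].lower()

def triage_fields(raw: str) -> dict:
    msg = message_from_string(raw)
    from_dom = address_domain(msg.get("From", ""))
    return_dom = address_domain(msg.get("Return-Path", ""))
    msgid_dom = msg.get("Message-ID", "").strip("<> ").rpartition("@")[2].lower()
    return {
        "from_domain": from_dom,
        "return_path_domain": return_dom,
        "message_id_domain": msgid_dom,
        "x_originating_ip": msg.get("X-Originating-IP"),  # None if the server did not add it
        "return_path_mismatch": bool(return_dom) and return_dom != from_dom,
        "message_id_mismatch": bool(msgid_dom) and msgid_dom != from_dom,
    }

# Header fields taken from the "Urgent Wire" case study in this chapter.
SAMPLE = (
    'From: "CEO John Smith" <john.smith@acmecorp.com>\n'
    "Return-Path: <attacker@phish-King.net>\n"
    "Message-ID: <20251118171419.82736@phish-King.net>\n"
    "\n"
    "body\n"
)
for key, value in triage_fields(SAMPLE).items():
    print(f"{key}: {value}")
```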

9.2.3 Authentication: Catching the Spoof

Modern email systems use three DNS-based protocols to verify sender identity. As an investigator, you can look at the Authentication-Results header added by the recipient's mail server, which can quickly tell you whether a message is likely spoofed.

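SPF (Sender Policy Framework): The owner of a domain (e.g., google.com) publishes a DNS record listing which IP addresses are allowed to send mail for it. If a message's envelope sender (the MAIL FROM address, later recorded as the Return-Path) claims to be from google.com but the message arrives from an IP not on that list, SPF fails. Note that SPF checks the envelope domain, not the visible From: address.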
DKIM (DomainKeys Identified Mail): The sending domain's mail server cryptographically signs the message body and selected headers with a private key. The receiver retrieves the matching public key from the domain's DNS. If the body or the signed headers were altered in transit, the signature verification fails.
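DMARC (Domain-based Message Authentication, Reporting, and Conformance): A policy layer on top of SPF and DKIM. DMARC passes only if SPF or DKIM passes for a domain that aligns with the visible From: domain. When neither does, the domain owner's published policy tells the receiving server what to do: p=none ("do nothing"), p=quarantine ("put it in spam"), or p=reject ("reject it entirely").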
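Reading the Authentication-Results header can also be scripted. This Python sketch uses a hypothetical `parse_auth_results` helper. It pulls out the spf, dkim, and dmarc verdicts plus the envelope (smtp.mailfrom) and visible (header.from) domains, which makes an alignment problem easy to see. DMARC's default relaxed alignment also accepts subdomains of the same organizational domain, so the exact comparison here is a simplification. Only trust an Authentication-Results header added by your own organization's receiving server, because an attacker can insert fake ones lower in the header.

```python
import re

VERDICT_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)
PROPERTY_RE = re.compile(r"\b(smtp\.mailfrom|header\.from)=([^\s;]+)", re.IGNORECASE)

def parse_auth_results(value: str) -> dict:
    """Extract verdicts and the two domains that DMARC compares."""
    result = {}
    for method, verdict in VERDICT_RE.findall(value):
        result.setdefault(method.lower(), verdict.lower())  # keep the first verdict per method
    for prop, data in PROPERTY_RE.findall(value):
        result[prop.lower()] = data.rpartition("@")[2].lower()  # keep only the domain part
    return result

# Value copied from the case-study header in section 9.3.1.
header = (
    "mx.acmecorp.com; spf=fail (sender IP is 192.0.2.45) "
    "smtp.mailfrom=attacker@phish-King.net; dkim=none; "
    "dmarc=fail action=none header.from=acmecorp.com;"
)
parsed = parse_auth_results(header)
print(parsed)
aligned = parsed.get("smtp.mailfrom") == parsed.get("header.from")
print("SPF domain aligned with From:", aligned)
```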

9.2.4 The Investigator's Best Friend: MxToolbox

Analyzing raw headers manually is tedious and error-prone. Fortunately, online toolsets exist to assist investigators. The most prominent among these is MxToolbox. It provides two critical functions for email forensics: header parsing and infrastructure verification.

1. Email Header Analyzer

You can copy the entire raw header block from an email and paste it into the MxToolbox Email Header Analyzer.
- Visualization: It automatically parses the Received headers and creates a visual map or table of the "hops," making it much easier to see the path the email took and identify delays between servers.
- Highlighting: It highlights critical fields like the Return-Path and extracts IP addresses for easy lookup.

2. SPF, DKIM, and DMARC Lookups

While the email header tells you if authentication passed or failed for that specific message, MxToolbox allows you to query the domain's actual DNS records to understand their security posture.

Scenario: You receive a phishing email pretending to be from examplebank.com. The header shows SPF failed.
Investigation: You use MxToolbox's SPF Lookup tool against the domain examplebank.com.
Result: You might find that examplebank.com has no SPF record at all, or a very weak one (e.g., ends in +all, which allows anyone to send mail as them). This confirms that the domain is poorly secured and easily spoofed, adding context to your investigation.
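An SPF lookup (in MxToolbox, or with a command-line DNS query for the domain's TXT record) returns the raw SPF record. For spoofing, the most important part of that record is the final "all" mechanism. The sketch below is a simplified, hypothetical helper that only inspects a record string you have already retrieved. It does not perform DNS lookups or follow include: or redirect= terms.

```python
QUALIFIERS = {
    "+": "pass: any server may send for the domain (very weak)",
    "~": "softfail: unlisted senders should be accepted but marked as suspicious",
    "-": "fail: unlisted senders should be rejected",
    "?": "neutral: the domain makes no assertion about unlisted senders",
}

def spf_all_policy(record: str) -> str:
    """Explain how an SPF record's 'all' mechanism treats unlisted senders."""
    terms = record.lower().split()
    if not terms or terms[0] != "v=spf1":
        return "not an SPF record"
    for term in terms[1:]:
        if term.lstrip("+~-?") == "all":
            qualifier = term[0] if term[0] in QUALIFIERS else "+"  # bare "all" means "+all"
            return QUALIFIERS[qualifier]
    return "no 'all' mechanism: unlisted senders get a neutral result unless redirect= applies"

# Hypothetical records of the kind you might copy from an SPF lookup.
for rec in ["v=spf1 include:_spf.example.net -all", "v=spf1 +all", "v=spf1 ip4:192.0.2.0/24"]:
    print(f"{rec!r}: {spf_all_policy(rec)}")
```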

9.3 Case Study: The "Urgent Wire" Investigation

To demonstrate the power of header analysis, let us examine a hypothetical scenario.

The Scenario: An Accounts Payable clerk at "Acme Corp" receives an urgent email appearing to be from her CEO, requesting an immediate payment of an overdue invoice to a new vendor. The email looks legitimate, with the CEO's correct name and email address in the "From" line. Suspicious, she forwards it to the forensic team.

The Evidence: We extract the message as an .EML file and open the raw headers. Below is the artifact we are analyzing.

9.3.1 The Suspicious Header

Delivered-To: clerk@acmecorp.com
Received: by 10.55.12.3 with SMTP id x3csp1293;
        Tue, 18 Nov 2025 09:14:22 -0800 (PST)
Authentication-Results: mx.acmecorp.com;
       spf=fail (sender IP is 192.0.2.45) smtp.mailfrom=attacker@phish-King.net;
       dkim=none;
       dmarc=fail action=none header.from=acmecorp.com;
Received: from mail.suspicious-server.net (mail.suspicious-server.net. [192.0.2.45])
        by mx.acmecorp.com with ESMTP id u12si45823
        for <clerk@acmecorp.com>; Tue, 18 Nov 2025 09:14:21 -0800 (PST)
Received: from [10.0.0.15] (unknown [45.33.22.11])
        by mail.suspicious-server.net (Postfix) with ESMTPSA id 34F2A1
        for <clerk@acmecorp.com>; Tue, 18 Nov 2025 17:14:19 +0000 (UTC)
From: "CEO John Smith" <john.smith@acmecorp.com>
To: "Jane Doe" <clerk@acmecorp.com>
Subject: URGENT: Overdue Invoice - WIRE IMMEDIATELY
Date: Tue, 18 Nov 2025 17:14:19 +0000
Message-ID: <20251118171419.82736@phish-King.net>
Return-Path: <attacker@phish-King.net>
X-Mailer: PHP/7.4.3
Content-Type: text/html; charset="UTF-8"

9.3.2 Analysis Walkthrough

Step 1: Check the Visible "From" vs. "Return-Path"

Visible From: john.smith@acmecorp.com (This is what the clerk saw).
Return-Path: attacker@phish-King.net.
Conclusion: Immediate mismatch. The bounce address strongly suggests the email did not originate from the corporate account.

Step 2: Trace the Hops (Bottom to Top)

Bottom Received Header: Received: from [10.0.0.15] (unknown [45.33.22.11]) by mail.suspicious-server.net...
- This tells us the email originated from the IP 45.33.22.11. A quick WHOIS lookup might reveal this IP belongs to a VPS provider in a foreign country, not Acme Corp's headquarters.
Next Received Header: Received: from mail.suspicious-server.net... by mx.acmecorp.com...
- This shows the handover from the attacker's mail server to the victim's company server.
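The case-study hops are stamped in two time zones: UTC by the attacker's server and PST by Acme's servers. This can make the timeline look inconsistent at first glance. Converting every timestamp to UTC, as in this Python sketch, shows the message moved from the attacker's host to Acme's gateway within a few seconds. Keep in mind that only Received headers added by servers you control (here, Acme's) are trustworthy. The lines below them, written by the attacker's server, could be forged.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Timestamps copied from the case-study Received headers, bottom (oldest) to top.
stamps = [
    "Tue, 18 Nov 2025 17:14:19 +0000 (UTC)",  # attacker host -> mail.suspicious-server.net
    "Tue, 18 Nov 2025 09:14:21 -0800 (PST)",  # suspicious-server -> mx.acmecorp.com
    "Tue, 18 Nov 2025 09:14:22 -0800 (PST)",  # internal hop (by 10.55.12.3)
]
utc = [parsedate_to_datetime(s).astimezone(timezone.utc) for s in stamps]
for original, converted in zip(stamps, utc):
    print(converted.isoformat(), "<-", original)

gaps = [(later - earlier).total_seconds() for earlier, later in zip(utc, utc[1:])]
print("Seconds between hops:", gaps)
```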

Step 3: Check Authentication Results

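spf=fail: The sending IP 192.0.2.45 is not authorized by the SPF record of the envelope sender domain (smtp.mailfrom=phish-King.net). Remember that SPF checks the envelope domain, not the visible From: domain.
dmarc=fail: DMARC requires SPF or DKIM to pass for a domain aligned with the visible From: domain (header.from=acmecorp.com). SPF failed for an unrelated domain and there is no DKIM signature (dkim=none), so the message fails acmecorp.com's DMARC check. The action=none result likely means acmecorp.com publishes a monitor-only (p=none) policy, which would explain why the message was delivered instead of rejected.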

Step 4: The Tool Marks

X-Mailer: PHP/7.4.3: This is a significant finding. Executives typically send mail from clients such as Microsoft Outlook or Apple Mail, not from a raw PHP script. This header indicates the email was generated programmatically, likely by a script running on attacker-controlled or compromised web infrastructure.

Case Conclusion: This is a confirmed spoofing attack. The attacker used a PHP script (PHP/7.4.3) on attacker-controlled infrastructure (mail.suspicious-server.net, with phish-King.net as the envelope sender domain) to send an email pretending to be the CEO. The Return-Path mismatch and the SPF and DMARC failures serve as the primary evidence.

9.4 Content Analysis: Deobfuscating the Payload

Once the header confirms the email is malicious, the investigator must analyze the body. However, you will rarely find malicious code or phishing links in plain sight. Attackers use Encoding Schemes to bypass spam filters and security scanners.

9.4.1 Understanding Encoding Schemes

When you inspect the raw source of an email, you may encounter blocks of text that appear to be random strings of alphanumeric characters. It is important to distinguish this from "encrypted" data or "corrupted" data.

In most cases this is MIME (Multipurpose Internet Mail Extensions) content-transfer encoding. The original SMTP protocol was designed to carry only 7-bit ASCII text. To send binary data (like an image, a PDF attachment, or a malware executable) via email, that binary data must be translated into text characters. Attackers leverage these same standard encodings to obfuscate malicious content, hoping the investigator (or the security filter) fails to decode it.

1. Base64 Encoding

Base64 is the most common method for encoding binary data into text. It converts data into a set of 64 printable characters (A-Z, a-z, 0-9, +, /), with = used for padding.

How to Spot It: A long, continuous string of random-looking characters that may end with one or two equals signs (=), which act as padding when the input length is not a multiple of three bytes.
Example: aHR0cDovL2V2aWwuY29t
Forensic Action: This string decodes to http://evil.com.
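Python's standard `base64` module reproduces this decoding. The sketch decodes the example string and also shows when padding appears. Treat decoded output as untrusted text: read it, but never execute or click it.

```python
import base64

encoded = "aHR0cDovL2V2aWwuY29t"
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # http://evil.com

# Padding appears only when the input length is not a multiple of 3 bytes.
# "evil.co" is 7 bytes, so its encoding ends with "==".
padded = base64.b64encode(b"evil.co").decode("ascii")
print(padded)
```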

2. URL Encoding (Percent Encoding)

URLs often replace special characters with a % followed by their two-digit hexadecimal ASCII value. Attackers use this to hide keywords from filters.

How to Spot It: Frequent use of the % symbol within a web link.
Example: http://site.com/login%20page
Forensic Action: %20 represents a space. Attackers might use %2E for a dot (.) to hide the file extension (e.g., .exe).
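The standard `urllib.parse.unquote` function performs the same decoding. Attackers sometimes encode twice: %252E decodes to %2E, which in turn decodes to a dot. This sketch uses a hypothetical `fully_unquote` helper with illustrative URLs. It decodes repeatedly until the text stops changing.

```python
from urllib.parse import unquote

def fully_unquote(text: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly to peel off double or triple encoding."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:  # nothing left to decode
            break
        text = decoded
    return text

print(fully_unquote("http://site.com/login%20page"))     # http://site.com/login page
print(fully_unquote("http://site.com/invoice%252Eexe"))  # http://site.com/invoice.exe
```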

3. Quoted-Printable

This format is often used for HTML emails. It uses an equals sign = followed by a two-digit hex value to represent specific characters.

How to Spot It: Stray = signs breaking up words, or an = at the end of a line (a "soft line break" that joins the line to the next one).
Example: Click h=65re
Forensic Action: =65 is the hex value for the letter 'e'. The text actually says "Click here."
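Python's standard `quopri` module decodes Quoted-Printable text. The sketch below decodes the example and a soft line break. The second input is an illustrative string, not taken from a real email.

```python
import quopri

print(quopri.decodestring(b"Click h=65re").decode("utf-8"))   # Click here
# A trailing "=" is a soft line break: the next line continues the same word.
print(quopri.decodestring(b"Cli=\nck here").decode("utf-8"))  # Click here
```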

9.4.2 The Investigator's Toolkit: CyberChef and Defanging

While simple online decoders exist, professional investigators rely on robust tools to manipulate and decode data safely. The industry standard for this is CyberChef, often referred to as "The Cyber Swiss Army Knife."

Using CyberChef

CyberChef is a web application (which can also be downloaded and run locally) that allows you to chain together different operations.

The "Recipe": In CyberChef, you drag and drop operations (like "From Base64" or "URL Decode") into a "Recipe" list.
The Process: You paste your obfuscated text into the Input pane. CyberChef processes it through your recipe and displays the result in the Output pane.
Complex Chains: CyberChef's true power is chaining. An attacker might Base64 encode a string and then URL encode it. With CyberChef, you can simply add both operations to the recipe to peel back the layers instantly.
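CyberChef itself is operated through its web interface, but the idea of a recipe is easy to mirror in code, which helps when you must decode hundreds of samples. In this Python sketch, the `bake` helper and the sample string are hypothetical, and a recipe is simply an ordered list of functions. The layered sample is Base64-encoded first and URL-encoded second, so the recipe undoes those steps in reverse order.

```python
import base64
from urllib.parse import quote, unquote

def url_decode(data: str) -> str:
    return unquote(data)

def from_base64(data: str) -> str:
    return base64.b64decode(data).decode("utf-8")

def bake(data: str, recipe) -> str:
    """Apply each operation in order, like a CyberChef recipe (top to bottom)."""
    for operation in recipe:
        data = operation(data)
    return data

# Build a layered sample: Base64 first, then URL encoding (which turns "=" into "%3D").
layered = quote(base64.b64encode(b"http://evil.com/pay").decode("ascii"), safe="")
print(layered)
print(bake(layered, [url_decode, from_base64]))  # http://evil.com/pay
```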

Defanging Indicators

When you identify a malicious URL (like http://evil.com) or IP address in an investigation, you will likely need to email it to a colleague, include it in a report, or submit it to a threat intelligence community.

Never put a live, clickable malicious link in a report. If a reader accidentally clicks it, they could infect their own machine. To prevent this, investigators "Defang" the data.

The Goal: Make the link non-clickable but still readable.
Technique:
- HTTP: Change http to hxxp.
- Dots: Put brackets around the dots. 1.2.3.4 becomes 1.2.3[.]4 or 1[.]2[.]3[.]4.
CyberChef Defanging: CyberChef includes dedicated "Defang URL" and "Defang IP Addresses" operations that automate this process, helping ensure your reports are safe to distribute.
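A minimal defanging function follows the two rules above. This Python sketch (the `defang` name is hypothetical) covers only those rules. CyberChef's operations offer more options.

```python
import re

def defang(indicator: str) -> str:
    """Make a URL, domain, or IP address non-clickable but still readable."""
    text = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)  # http/https -> hxxp/hxxps
    return text.replace(".", "[.]")

for ioc in ["http://evil.com", "https://login.evil.com/verify", "1.2.3.4"]:
    print(defang(ioc))
```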

9.4.3 Attachments and Hashes

Never double-click an attachment in a forensic investigation. Instead, extract the file and calculate its hash value (MD5 or SHA-256) to compare against databases like VirusTotal.
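Extraction and hashing can be done without ever opening the file. This Python sketch decodes each attachment from the raw message into memory and hashes it. The demo message and its harmless "invoice.exe" payload are fabricated for illustration. MD5 is still widely used for lookups, but SHA-256 is the better choice for documenting evidence because practical MD5 collisions exist.

```python
import hashlib
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

def hash_attachments(eml_bytes: bytes) -> list:
    """Return (filename, md5, sha256) for each attachment, decoded in memory only."""
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    results = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""  # strips the Base64 transfer encoding
        results.append((
            part.get_filename() or "(unnamed)",
            hashlib.md5(data).hexdigest(),
            hashlib.sha256(data).hexdigest(),
        ))
    return results

# Demo message with a harmless stand-in attachment.
demo = EmailMessage()
demo["Subject"] = "Invoice"
demo.set_content("See attached.")
demo.add_attachment(b"not really malware", maintype="application",
                    subtype="octet-stream", filename="invoice.exe")

for name, md5, sha256 in hash_attachments(demo.as_bytes()):
    print(name, md5, sha256)
```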

9.4.4 Hands-on Activity: Email Header Analysis

Now apply what you have learned to determine which of the provided email headers are malicious and which are legitimate. The activity also includes decoding Base64 within the body of an email, so have CyberChef open.

Open Hands-On Email Analysis Activity

9.4.5 Hands-on Activity: Defanging for Reporting

Open Hands-On Defanging Activity

9.5 Advanced Investigation Techniques

As cybercriminals evolve, so must our investigation techniques. Beyond header analysis and decoding, modern forensics involves interacting with the malicious components in a safe, controlled manner.

9.5.1 Sandboxing: Detonating Safely

Sometimes, static analysis (looking at the code/headers) isn't enough; you need to see what the malware does. This is where Sandboxing comes in. A sandbox is an isolated virtual environment that mimics a real computer. You "detonate" (open/run) the malware inside the sandbox, and the tool records every action the malware takes.

Key Sandboxing Tools:

Any.Run: An interactive sandbox that allows you to click buttons and interact with the malware in real-time, just like a remote desktop. It is excellent for malware that requires user interaction (like clicking "Enable Content" in a Word doc).
Joe Sandbox: A deep-analysis sandbox that provides incredibly detailed technical reports, including memory dumps and graph-based execution trees.

9.5.2 Safe Viewing and Tracking Pixels

When investigating an email, your instinct is to view it as the user saw it. However, this can be dangerous. Attackers often embed Tracking Pixels (invisible 1x1 images) or malicious scripts in the HTML body.

The Risk: If your forensic workstation loads this image from the attacker's server, the attacker receives a log entry. They now know the email was opened, the IP address of the investigator, and potentially the software version you are using. This "tips your hand" that an investigation is underway.
Best Practice: Always configure your forensic email tools to Block Remote Content by default. View the email in "Plain Text" mode whenever possible to analyze the content without executing the code.
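One safe way to spot tracking pixels is to inspect the HTML as text instead of rendering it. This Python sketch uses the standard library's `html.parser` to list images that would be fetched from a remote server. It flags 0- or 1-pixel images and covers only `<img>` tags, not CSS backgrounds or other tricks. The sample HTML and tracker URL are hypothetical. Parsing the markup this way never contacts the remote server.

```python
from html.parser import HTMLParser

class RemoteImageFinder(HTMLParser):
    """Collect <img> tags whose source would be loaded from the internet."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if not src.lower().startswith(("http://", "https://", "//")):
            return  # e.g. "cid:" images are embedded in the email itself
        tiny = attributes.get("width") in ("0", "1") or attributes.get("height") in ("0", "1")
        self.findings.append((src, "possible tracking pixel" if tiny else "remote image"))

html_body = (
    '<p>Please review the invoice.</p>'
    '<img src="cid:logo123">'
    '<img src="https://tracker.example/open.gif?id=8841" width="1" height="1">'
)
finder = RemoteImageFinder()
finder.feed(html_body)
for src, verdict in finder.findings:
    print(verdict, "->", src)
```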

9.5.3 Forensic Tools for Email Parsing

While manual analysis is great for learning, real-world cases often involve thousands of emails. Investigators use specialized tools to parse email archives (like .PST or .MBOX files) efficiently.

Commercial Tools:

Aid4Mail and Paraben E3: Powerful commercial suites designed to process massive email archives, deduplicate messages, and convert formats for legal review.

Free and Open Source Tools:

FTK Imager: While primarily a disk imaging tool, FTK Imager allows you to mount and browse the structure of Outlook .PST and .OST files for free. It is a standard tool for triage.
Mozilla Thunderbird: An excellent open-source email client. It natively handles .MBOX and .EML files, making it a great free viewer for forensic artifacts exported from other systems.
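NirSoft OutlookAttachView and PstWalker: Lightweight utilities that are excellent for quick tasks, such as extracting all attachments from Outlook data or browsing a PST file. Note that PstWalker is not a NirSoft product. Check each tool's requirements, because some rely on a local Outlook installation.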

9.6 Chapter Summary

Email forensics is a critical skill because email is the "front door" for most cyberattacks. By stripping away the user-friendly interface and examining the raw headers (the "envelope"), an investigator can reveal the true path a message took. Key artifacts like the Received hops, Return-Path, and X-Originating-IP help trace an attack toward a specific source. Authentication failures (SPF/DKIM/DMARC) provide strong technical evidence that an email is a forgery. Tools like MxToolbox, CyberChef, and automated sandboxes are essential for parsing these complex data structures and safely handling malicious indicators.

Key Terms Review

SMTP: Protocol for sending email.
Header: The metadata section of an email containing the routing log.
Return-Path: The address where bounce messages are sent; often reveals the true sender.
Base64: An encoding scheme used to represent binary data as text; often used to obfuscate data or embed attachments.
Defanging: The process of altering a malicious URL or IP to prevent accidental clicking (e.g., changing http to hxxp).
Sandbox: An isolated environment used to safely execute and observe malware.