Reverse Engineering Hyperion: Selective Thread Spawning

In this blog post, we take a deep dive into Hyperion's thread filtering mechanism. We explore how it selectively allows certain threads to spawn, how it identifies and blocks unauthorized ones, and walk through the key checks involved in the process—from instrumentation callbacks to memory validation and control flow resolution. Finally, we examine potential strategies for bypassing these restrictions.

Background Knowledge

Instrumentation Callback

The instrumentation callback is an undocumented Windows mechanism that lets a process intercept transitions from kernel mode back to user mode. A process registers a routine by calling NtSetInformationProcess with the ProcessInstrumentationCallback information class. On x64 the kernel stores this pointer in the process object, not in the Process Environment Block (PEB). Once registered, the routine is invoked whenever a thread of that process returns to user mode, for example after a system call completes, and it runs before execution resumes at the original return address.

In the reverse engineering community this mechanism is commonly known as Nirvana hooking, after Microsoft's Nirvana instrumentation framework. It gives the process a chance to inspect, modify, or even block execution in response to certain system calls. Although it was originally intended for diagnostics and profiling, it is now widely used in anti-debugging, anti-tampering, and runtime monitoring systems.

The instrumentation callback is not part of the official Windows API and is not documented by Microsoft. Its behavior can vary between Windows versions, and using it effectively requires low-level knowledge of Windows internals. Despite this, it remains a powerful technique in software protection and malware analysis.

If you're interested in learning more, here are some useful resources:

Understanding Thread Creation

Before a new thread runs any of its own code in user mode, Windows first sends it through the LdrInitializeThunk function in ntdll.dll, traditionally delivered as an APC (Asynchronous Procedure Call). This routine initializes the thread. For the first thread of a process, it also performs loader initialization, all before the program reaches main or wmain. When WinDbg launches a process, its initial breakpoint is hit inside this initialization path. During this phase, the loader runs TLS callbacks, loads kernel32.dll and other dependencies, sets up .NET where applicable, and applies other system configuration. Some of these steps apply only to process startup, while others (such as TLS callbacks) run for every new thread. Because LdrInitializeThunk is the new thread's first entry into user mode, Hyperion's instrumentation callback can intercept the thread before it starts executing normally.

Reverse Engineering The Process

Before proceeding, it's important to note that all the code presented here has been extensively reverse-engineered and, where applicable, deobfuscated. As a result, the code you encounter in the binary may appear slightly different from what is shown.

This analysis is based on the version-347f4ac346734391 build of Hyperion, which is the latest publicly accessible version at the time of writing.

NtCreateThread Hooks

A well-known feature of Hyperion is its runtime remapping of certain dependencies and the hooking of specific exported functions. One such dependency is ntdll. Upon inspecting its exports, you'll find that both NtCreateThread and NtCreateThreadEx have been replaced with unconditional branches to a hook routine implemented by Hyperion.

For the sake of simplicity and brevity, we’ll focus solely on the NtCreateThread hook, as it has fewer parameters and is overall easier to analyze.

The first step in the NtCreateThread hook routine.

The first system call in the hook is only executed if the provided process handle does not refer to the current process. On closer inspection, it is a call to NtQueryInformationProcess, which populates a PROCESS_BASIC_INFORMATION structure.

The first major condition in the NtCreateThread hook, checking for external thread creation.

Continuing further, we observe a special case for calls involving a handle to an external process, which reflects the behavior of CreateRemoteThread. If the previous system call succeeds and the target process ID differs from that of the current process (as determined through the TEB), a system call to NtCreateThread is made.

The second condition in the NtCreateThread hook, handling internal thread creation.

Here, a read-write lock is acquired, followed by a direct call to NtCreateThread. If the call fails, the lock is released and the NTSTATUS is returned. If it succeeds, something more interesting happens.

As shown above, three functions are called, but we will focus only on the first two. Before the first subroutine is invoked, several values are pushed to and popped from the stack, which explains the slightly unconventional decompiler output. The first subroutine prepares the arguments for Hyperion::Threads::EmplaceThreadEntry, one of which is the ClientId value returned by NtCreateThread.

A short snippet of the Hyperion::Threads::EmplaceThreadEntry function.

Above is a short snippet from the beginning of the EmplaceThreadEntry routine. Here we see a call to NtQueryInformationThread with the ThreadTimes information class, which retrieves the thread's creation time. The function then traverses a map (in this case, located at RobloxPlayerBeta.dll+0x286210) and inserts a new thread entry if one does not already exist. The entries in this map are structured as follows:

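```c
#include <windows.h>

struct ThreadEntry
{
    DWORD ThreadId;               // 0x0 (followed by 4 bytes of alignment padding)
    LARGE_INTEGER TimeCreated;    // 0x8, creation time from the ThreadTimes query
    ULONG_PTR StartAddress;       // 0x10
};
```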
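To make the "insert only if absent" behavior concrete, here is a minimal portable C sketch. It is a model of the idea, not Hyperion's code: the Windows types are swapped for fixed-width equivalents, the real map is replaced by a fixed-size array, and the names thread_map and emplace_thread_entry are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for ThreadEntry (DWORD, LARGE_INTEGER and ULONG_PTR
   replaced by fixed-width types). Hypothetical model only. */
typedef struct {
    uint32_t  thread_id;
    int64_t   time_created;
    uintptr_t start_address;
} thread_entry;

#define MAX_THREADS 64

/* Hypothetical flat container standing in for Hyperion's thread map. */
typedef struct {
    thread_entry entries[MAX_THREADS];
    size_t count;
} thread_map;

/* Adds an entry only if no entry with the same thread ID exists.
   Returns true if a new entry was inserted. */
static bool emplace_thread_entry(thread_map *map, uint32_t tid,
                                 int64_t created, uintptr_t start)
{
    for (size_t i = 0; i < map->count; i++)
        if (map->entries[i].thread_id == tid)
            return false; /* already tracked: keep the existing entry */
    if (map->count == MAX_THREADS)
        return false;     /* this model has a fixed capacity */
    map->entries[map->count++] = (thread_entry){ tid, created, start };
    return true;
}
```

A second emplace with the same thread ID leaves the original entry untouched, which matches the "if one does not already exist" behavior described above.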

You might be wondering why this post is titled "Selective Thread Spawning". In what way is the process actually selective? That's a fair question, especially since everything we've covered so far only explains how Hyperion tracks successfully created threads. In the next section, we’ll examine how threads are filtered and what criteria determine whether they are allowed to spawn.

Locating The Instrumentation Callback

The core of Hyperion’s thread filtering logic resides in the hook triggered during the LdrInitializeThunk stage. This hook is registered through the process’s instrumentation callback and is executed when a new thread begins initialization.

LdrInitializeThunk is a low-level function in ntdll.dll that initializes a newly created thread before it begins execution in user mode. Before we can reverse the hook associated with it, we need to locate that hook. As mentioned earlier, it is registered through the instrumentation callback, so our first step is to find the callback. One way to do this is by scanning for the byte pattern 41 52 50 (push r10; push rax), which marks the start of the callback stub.
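A pattern scan like this is only a few lines of C. The sketch below is a naive scan over an in-memory buffer. It assumes you have already copied the executable section you want to search into `buf`, and the helper name `find_pattern` is hypothetical. Three bytes are short enough to match elsewhere, so each hit should be confirmed in a disassembler.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of pat in buf, or -1. */
static long find_pattern(const unsigned char *buf, size_t len,
                         const unsigned char *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > len)
        return -1;
    for (size_t i = 0; i + pat_len <= len; i++)
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    return -1;
}

/* 41 52 = push r10, 50 = push rax: the stub prologue described above. */
static const unsigned char kCallbackStubPrologue[] = { 0x41, 0x52, 0x50 };
```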

The instrumentation callback stub.

There isn’t much happening here. The stub performs a basic recursion check, saves the current stack and register state, and then calls the main instrumentation callback routine. Once the callback completes, the stub restores the saved state and jumps to the original return address to resume execution.

After entering the Hyperion::Instrumentation::Callback subroutine, you’ll encounter a large amount of code. Most of it is not relevant to our analysis, so we’ll focus on the final blocks, where the core hooking logic resides.

The core hooking logic for the instrumentation callback.

As you can see, if the return address being resumed (in this case, arg1, which is copied to result) matches one of the three function pointers that Hyperion intercepts, a custom instrumentation stub for that function is returned, and execution continues from that stub.
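The dispatch can be modeled as a lookup in a small table. The sketch below is a hypothetical reconstruction. The hook_slot type and select_stub function are illustrative names, and the real targets are encrypted pointers rather than plain values. On x64 the original return address reaches the callback in r10, which is why the stub pushes r10 first.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*instr_stub)(void);

/* Hypothetical table entry: an intercepted return address and its stub. */
typedef struct {
    uintptr_t  target; /* e.g. the address of LdrInitializeThunk */
    instr_stub stub;   /* routine execution is redirected to */
} hook_slot;

/* Returns the slot whose target matches return_address, or NULL if the
   address is not intercepted (execution resumes at the original address). */
static const hook_slot *select_stub(const hook_slot *slots, size_t n,
                                    uintptr_t return_address)
{
    for (size_t i = 0; i < n; i++)
        if (slots[i].target == return_address)
            return &slots[i];
    return NULL;
}
```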

Analyzing The LdrInitializeThunk Hook

Like most of Hyperion’s instrumentation routines, this hook is filled with obfuscation and junk code. Analyzing it can be extremely time-consuming, especially due to the inlined lazy importer stubs.

To save time, I searched for references to the loaded thread map we reversed earlier, and as expected, I found a couple of hits. I jumped to the first one and scrolled to the top of the nearest branching instruction.

The loaded thread list getting queried by the hook.

Here we see some familiar code. The hook is querying the loaded thread list we analyzed earlier, trying to locate the current thread: the same thread that LdrInitializeThunk is in the process of initializing. Since this logic runs inside the instrumentation hook, it executes in the context of the newly created thread itself.

The result of the map lookup.

Looking further, we can see how the result of the lookup is handled. If an entry with the current thread ID is found, its start address is extracted, and a flag (which I named entryNotFound) records the outcome of the lookup. The read-write lock is then released if necessary.
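Combined with the termination path described next, the lookup amounts to the following portable sketch. The linear search and the name lookup_start_address are assumptions made for illustration. Only the STATUS_INVALID_THREAD value comes from the analysis.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_SUCCESS        0x00000000u
#define STATUS_INVALID_THREAD 0xC000071Cu

/* Same fields as ThreadEntry, using portable types. */
typedef struct {
    uint32_t  thread_id;
    int64_t   time_created;
    uintptr_t start_address;
} thread_entry;

/* Finds the current thread's entry. On a miss the hook terminates the
   thread, modeled here by returning STATUS_INVALID_THREAD. */
static uint32_t lookup_start_address(const thread_entry *entries, size_t n,
                                     uint32_t current_tid, uintptr_t *start_out)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].thread_id == current_tid) {
            *start_out = entries[i].start_address;
            return STATUS_SUCCESS;
        }
    }
    return STATUS_INVALID_THREAD; /* untracked thread */
}
```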

After that, a few additional checks are performed, which are not relevant to this post. Hyperion compares the start address against several encrypted pointers that represent memory regions requiring special handling.

The first condition of the hook.

Moving forward, we arrive at the following code. If no entry was found, an unconditional jump leads to a label that terminates the current thread with the status code 0xC000071C (STATUS_INVALID_THREAD). After that, the pointer holding the base address of Roblox Player is decrypted and compared against the start address of the current thread. If the address falls outside of the expected range, an additional check is performed.

Encoded module vector lookup.

In the code above, Hyperion performs a quick search for the start address in a vector of whitelisted and loaded modules. If a match is found, a flag (addressFound) is set and the base address of the target module is saved. If not, execution falls through to the next condition.
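Once the vector has been decoded, the search is a range check over module bounds. The sketch below assumes a decoded list of (base, size) pairs and uses a linear scan. The real vector is bitwise encoded, its decoding is omitted here, and find_module_base is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded module record. */
typedef struct {
    uintptr_t base;
    size_t    size;
} module_range;

/* Sets *base_out and returns true if addr lies in [base, base + size)
   of a whitelisted module (the addressFound case). */
static bool find_module_base(const module_range *mods, size_t n,
                             uintptr_t addr, uintptr_t *base_out)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size) {
            *base_out = mods[i].base;
            return true;
        }
    }
    return false;
}
```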

This encoded list of loaded and whitelisted modules is not new. In my previous post, where I covered Hyperion's CFG, this same list was used for validation. Keep this list in mind—it becomes important later on.

moduleBaseAddress is extracted for unknown memory.

In this condition, we see how moduleBaseAddress is resolved when the start address falls within unknown memory. A call to NtQueryVirtualMemory is made, which retrieves the MEMORY_BASIC_INFORMATION structure for the region containing the start address. If the memory type is MEM_IMAGE, the base address of the allocation is extracted and assigned to moduleBaseAddress.
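The sketch below models this resolution portably. A small table of fake regions stands in for NtQueryVirtualMemory, and only the MEMORY_BASIC_INFORMATION fields used here are reproduced. The MEM_COMMIT and MEM_IMAGE values match the Windows SDK. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_COMMIT 0x00001000u
#define MEM_IMAGE  0x01000000u

/* Subset of MEMORY_BASIC_INFORMATION. */
typedef struct {
    uintptr_t base_address;
    uintptr_t allocation_base;
    size_t    region_size;
    uint32_t  state;
    uint32_t  protect;
    uint32_t  type;
} mem_region;

/* Stand-in for NtQueryVirtualMemory(MemoryBasicInformation). */
static const mem_region *query_region(const mem_region *map, size_t n,
                                      uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= map[i].base_address &&
            addr - map[i].base_address < map[i].region_size)
            return &map[i];
    return NULL;
}

/* Mirrors the described check: only MEM_IMAGE regions yield a base. */
static bool resolve_image_base(const mem_region *map, size_t n,
                               uintptr_t start, uintptr_t *base_out)
{
    const mem_region *r = query_region(map, n, start);
    if (r == NULL || r->type != MEM_IMAGE)
        return false;
    *base_out = r->allocation_base; /* base of the whole mapped image */
    return true;
}
```

Using AllocationBase rather than BaseAddress matters because an image is split into several regions (headers, .text, .data, ...) that share one allocation base.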

The thread start address getting validated.

If moduleBaseAddress was successfully retrieved, a second call to NtQueryVirtualMemory is made using rdx as the base address, because at this point rdx holds the thread's start address. If the memory at that address is committed, a final validation step is performed by calling ValidateStartAddress.

I won't include the full function for the sake of brevity, but its purpose is straightforward: it checks whether the memory pages at the start address have valid executable protection. If they don't, startAddress_1 is set to NULL.
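Because the body of ValidateStartAddress is omitted, the sketch below is only one plausible reading of "valid executable protection": the page must be committed, must not be a guard page, and must carry one of the four execute protections. The constant values match the Windows SDK. Whether Hyperion also rejects guard pages is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#define PAGE_EXECUTE_WRITECOPY 0x80u
#define PAGE_GUARD             0x100u
#define MEM_COMMIT             0x1000u

/* Assumed criteria: committed, not guarded, and executable. */
static bool is_executable(uint32_t state, uint32_t protect)
{
    if (state != MEM_COMMIT || (protect & PAGE_GUARD))
        return false;
    switch (protect & 0xFFu) { /* strip modifier bits */
    case PAGE_EXECUTE:
    case PAGE_EXECUTE_READ:
    case PAGE_EXECUTE_READWRITE:
    case PAGE_EXECUTE_WRITECOPY:
        return true;
    default:
        return false;
    }
}
```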

Moving forward, we encounter some particularly interesting code. It relies on an import that is protected by Hyperion’s lazy importer. After locating the pointer where the import was cached and decrypting it, I found that it resolves to a call to RtlLookupFunctionEntry.

Initial loop logic for extracting a RUNTIME_FUNCTION and image base from the start address.

Here, the code attempts to retrieve a RUNTIME_FUNCTION and the corresponding image base for the current thread’s start address. This information becomes important for code we will examine later on.
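For context, RtlLookupFunctionEntry(ControlPc, &ImageBase, HistoryTable) resolves an address to the module's .pdata table of RUNTIME_FUNCTION entries. These entries are sorted by BeginAddress and hold image-relative addresses. The portable sketch below shows only the core binary search over such a table. The real routine also handles dynamic function tables and the unwind history cache, and leaf functions without unwind data have no entry at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the x64 RUNTIME_FUNCTION (all fields are RVAs). */
typedef struct {
    uint32_t begin_address;
    uint32_t end_address;  /* exclusive */
    uint32_t unwind_data;
} runtime_function;

/* Binary search for the entry whose [begin, end) range contains rva. */
static const runtime_function *lookup_function_entry(
    const runtime_function *table, size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].begin_address)
            hi = mid;
        else if (rva >= table[mid].end_address)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL; /* no unwind information covers this address */
}
```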

Hyperion checking if a thread can get spawned at the current address in Roblox.

If no function entry is found for the current start address, Hyperion checks whether the address resides within the Roblox module. If it does, a binary search is performed on two separate data arrays. The loop terminates if the current offset matches any blacklisted offset within Roblox, effectively blocking the thread from being spawned at that location.
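The analysis doesn't establish how the two arrays relate to each other. The sketch below therefore assumes both are sorted lists of exact blacklisted offsets (RVAs into the Roblox module) and searches each one with the C standard library's bsearch. The name is_blacklisted is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* True if offset appears in either sorted array (assumed semantics). */
static bool is_blacklisted(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb, uint32_t offset)
{
    return bsearch(&offset, a, na, sizeof *a, cmp_u32) != NULL ||
           bsearch(&offset, b, nb, sizeof *b, cmp_u32) != NULL;
}
```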

Logic that handles jump/call patch redirection based on opcode patterns.

This code scans the instruction at startAddress to detect jumps or calls that redirect control flow. It looks for patterns like jmp, call, and jcc, including indirect jumps via memory. If a redirection is found, it resolves the target and updates startAddress. If not, the current address is finalized.
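The sketch below decodes the listed opcode patterns in portable C, given the instruction bytes and the address they were loaded from. It is a simplified illustration that ignores prefixes (for example REX.W before FF 25). For jmp [rip+disp32], it returns the address of the pointer slot, and the real target must then be read from memory. The names are hypothetical.

```c
#include <stdint.h>

enum branch_kind { BRANCH_NONE, BRANCH_DIRECT, BRANCH_INDIRECT };

/* Little-endian signed 32-bit displacement. */
static int32_t read_rel32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (int32_t)v;
}

/* DIRECT: *out is the branch target. INDIRECT: *out is the address
   of the memory slot holding the target. */
static enum branch_kind decode_branch(const uint8_t *code, uint64_t va,
                                      uint64_t *out)
{
    switch (code[0]) {
    case 0xEB:                                   /* jmp rel8 */
        *out = va + 2 + (int8_t)code[1];
        return BRANCH_DIRECT;
    case 0xE8:                                   /* call rel32 */
    case 0xE9:                                   /* jmp rel32 */
        *out = va + 5 + (int64_t)read_rel32(code + 1);
        return BRANCH_DIRECT;
    case 0x0F:                                   /* jcc rel32: 0F 80..8F */
        if (code[1] < 0x80 || code[1] > 0x8F)
            return BRANCH_NONE;
        *out = va + 6 + (int64_t)read_rel32(code + 2);
        return BRANCH_DIRECT;
    case 0xFF:                                   /* jmp [rip+disp32]: FF 25 */
        if (code[1] != 0x25)
            return BRANCH_NONE;
        *out = va + 6 + (int64_t)read_rel32(code + 2);
        return BRANCH_INDIRECT;
    default:
        if (code[0] >= 0x70 && code[0] <= 0x7F) { /* jcc rel8 */
            *out = va + 2 + (int8_t)code[1];
            return BRANCH_DIRECT;
        }
        return BRANCH_NONE;
    }
}
```

Resolving a chain of trampolines is then a loop that calls decode_branch until it returns BRANCH_NONE, which corresponds to the "current address is finalized" case.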

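This likely ensures that a jmp or call stub cannot be used to redirect a thread into arbitrary code. The later checks apply to the resolved target, and threads are only allowed to start at code covered by registered unwind information (a RUNTIME_FUNCTION entry).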

Initial runtime function handling logic.

If no function entry is found, runtimeFunction is set to NULL. Otherwise, a call to NtQueryVirtualMemory is made to query the basic memory information for the region where the function described by runtimeFunction is located. If the syscall fails, the thread is terminated with the NT status code 0xC000012A (STATUS_THREAD_NOT_IN_PROCESS).

The hook tries to look up base address of the memory region in the loaded module list.

This section reuses familiar logic. If the memory region is not an image and not part of the Roblox client, Hyperion queries the loaded module list to find an entry containing the thread’s start address. If a match is found, it validates startAddress by checking whether it falls within the bounds of the runtime function. If it does, runtimeFunction is set to 1. Otherwise, the thread is terminated with STATUS_THREAD_NOT_IN_PROCESS.
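The bounds check itself is simple arithmetic: RUNTIME_FUNCTION fields are image-relative, so the start address is first converted to an RVA against the module base. In the portable sketch below, start_within_function is a hypothetical name and the struct mirrors the x64 RUNTIME_FUNCTION layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the x64 RUNTIME_FUNCTION (all fields are RVAs). */
typedef struct {
    uint32_t begin_address;
    uint32_t end_address;  /* exclusive */
    uint32_t unwind_data;
} runtime_function;

/* True if start lies in [module_base + Begin, module_base + End). */
static bool start_within_function(uintptr_t module_base,
                                  const runtime_function *fn,
                                  uintptr_t start)
{
    if (start < module_base)
        return false;
    uintptr_t rva = start - module_base;
    return rva >= fn->begin_address && rva < fn->end_address;
}
```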

The final termination check of the lookup code.

As shown in the previous code, runtimeFunction is used as a flag. Once the loop completes, if runtimeFunction is not set, the thread is terminated with STATUS_INVALID_THREAD. So, if your thread reaches this point without being terminated, it will be allowed to spawn.

Brainstorming Possible Bypasses

Now that we have a solid understanding of how Hyperion filters threads created within the protected process, let's explore a few ways this mechanism might be bypassed. To begin, here are two common reasons you might want to bypass this protection:

  1. You want to create a thread remotely using CreateRemoteThread or CreateRemoteThreadEx.
  2. You want to create a thread that starts execution in a non-whitelisted region of memory.

To address the first point, the idea behind the bypass is straightforward. You create the remote thread in a suspended state, which avoids immediately triggering LdrInitializeThunk. Next, you extract the necessary information from the thread and add a new entry to the remote thread map. Once the entry is in place, you can resume the thread, and it should execute without issue, as long as the start address is in a memory region whitelisted by the LdrInitializeThunk hook. I built a simple proof-of-concept application that implements this bypass which you can access here.

Before discussing the second point, it's worth addressing a few common misconceptions. The memory whitelisted by the LdrInitializeThunk hook is separate from any other memory whitelist that may exist within Hyperion. Additionally, Hyperion behaves differently across Windows versions. In my experience, many of its checks appear inactive on Windows 11 24H2 compared to earlier versions such as 23H2. Just because you were able to spawn a thread internally without fully bypassing the LdrInitializeThunk logic does not mean the same approach will work on other systems. I have some theories about why this happens, but I am not confident enough in them to share at this time.

Let’s look at how we would approach the second point. This bypass is relatively straightforward in concept, but the implementation is much more complex. The idea is to reverse the bitwise encoded vector, referred to earlier as the module vector, and add our allocation to it. While this may sound simple, it is far from easy. The encoding changes with each new Hyperion build, and reversing it is a tedious process (trust me, I’ve tried).