Huntress 2024 CTF: StackIT XOR Operation Challenge
When working with binary exploitation or reverse engineering, one of the most useful skills is being able to decode hidden data—especially when it’s been obfuscated using techniques like XOR. In this post, we'll walk through how to find XOR operations in the decompiled code of an ELF binary using GHIDRA, and how we can use Python to automate the decryption process. This method not only helps us tackle Capture The Flag (CTF) challenges but also provides a repeatable way to handle similar situations in real-world pen testing scenarios. Let’s dive in!
Introducing XOR Operations (Exclusive OR)
In this reverse engineering challenge, we dive into XOR (exclusive OR), a fundamental operation frequently used for obfuscation and encryption. XOR is a crucial concept for understanding obfuscation patterns in reverse engineering CTFs. XOR operates on binary data, comparing bits from two inputs. The rule is simple: if the bits are different, the result is 1
; if they are the same, the result is 0
. Here's the truth table:
0 XOR 0 = 0 | 0 XOR 1 = 1 | 1 XOR 0 = 1 | 1 XOR 1 = 0
This property of XOR makes it a popular choice for lightweight encryption since it is reversible. Applying the same XOR operation again with the same key restores the original data. For example:
Data: A |
Key: K |
Encrypted: A XOR K |
Decrypt: (A XOR K) XOR K = A
Three Common XOR Techniques in CTF Challenges
In Simple Encryption and Obfuscation scenarios where data (a flag or string in this case) is XORed with a key to make it unreadable. To decrypt that flag, the same XOR operation is applied with the same key, as A XOR B XOR B = A
(XOR is reversible).
In Static Key XOR scenarios, encryption or deobfuscation is done by repeatedly applying a fixed key to the data. This key can be a single byte or multiple bytes, and it’s cycled through until the entire data is processed. A common pattern involves using a modulo operation (key[i % len(key)])
to loop back to the start of the key once its length is exceeded.
If the key is 'ABC' and the data is 'HELLO', the encryption process XORs: 'H' with 'A', 'E' with 'B', 'L' with 'C', repeats the key: 'L' with 'A', and 'O' with 'B'.
This repetition allows the same logic to work on the data of any length, while the fixed key keeps the pattern consistent. When the key is a single-byte, each character in the data is XORed with the same byte. This is simpler but also easier to break because patterns in the data (common letters or file headers) can reveal the key. Multi-byte keys provide slightly more complexity but follow the same cyclic process.
In Dynamic XOR scenarios, a changing or computed key is used, requiring you to understand the logic generating the key. These keys are often derived from algorithms, external inputs, or runtime calculations, adding complexity to the reverse engineering process."
Decoding XOR: Strategies for Success
When analyzing binaries we can look for XOR instructions like (xor
, eax
, ebx
, in x86 assembly) and find references to obfuscated strings or data that make it clear we have a comparison going on. If the key is short (1-4 bytes) we can brute force it by testing all possibilities. XORed data often retains characteristics like text patterns or common structures which can give clues about the key. We also have automated tools like Cyber Chef, Python scripting, or plugins in disassemblers (Ghidra, IDA Pro) to analyze and decode XOR obfuscations.
STACKIT XOR Operations Reverse Engineering Challenge
Below is the challenge prompt we encounter providing the binary for analysis and some ‘hints’ or backstory for the scenario.
The first thing I did was run the file
command to get a quick look at the binary to learn I had a 32-bit ELF binary for an Intel 80386 architecture. It is statically linked, meaning all required libraries are included in the binary itself, which is important because it can make reverse engineering a bit trickier. It is also stripped, which means all debugging information and symbol names (function names) are removed so this requires a more hands-on approach when analyzing it.
Next up, I ran checksec
to check the security configurations in place to get an overview of protections on a binary. The binary lacks several key security protections: it has no RELRO, leaving it vulnerable to memory relocation attacks; no stack canary, making it prone to buffer overflow exploits; and NX is disabled, allowing code execution on the stack and increasing overflow risks. Additionally, with no PIE, the binary’s static memory address is predictable and easier to target, while the absence of RPATH/Runpath means no configuration for locating shared libraries at runtime. Finally, it’s stripped of symbol names, which complicates reverse engineering.
Then I used LTRACE to track library calls; when I provided the binary with an input, specifically the string ‘test’, this can be helpful to see what’s happening under the hood when the binary executes. The output says the binary is likely not using dynamic libraries and is instead running statically. The trace only returns “Hello, World!” and running the binary without input returns the same - so in terms of ‘input handling’ we don’t have much going on.
Looking at our Binary with GHIDRA
We will start by importing our file into GHIDRA, under the ‘file’ tab, and then we will click ‘okay’ through the prompt. Then double click on ‘stack_it.bin’ to open the file.
We are able to analyze the binary automatically now, click okay and let GHIDRA ‘do its thing’, then we will be able to look for XOR operations taking place in the debugger.
In the main function’s window (when we pop into the analyzed program), we can scroll down to find a XOR operation being done between two values for which we can see the bytes in GHIDRA. Where those memory addresses are in red (DAT0804a00f
and DAT0804a02e
) we can show references to the memory address in question to look deeper into this.
We can follow these steps (disregard the unequal numbers for the example I am demonstrating, we will get to the correct place with the correct memory address right-clicked and pointed to).
Here we can see the references of the memory addresses being compared to each other.
Deconstructing the XOR with Python
We do not need to import any libraries to solve this XOR operation as we’re only using Python data structures and operations. Therefore, we will initialize the program with the interpreter to tell the program we’re using Python3 whenever its run (good practice).
Then we need to create an array of bytes to hold the raw data we pull from the dissembled binary, that contains the flag once it is unencrypted. This list will be XORed with another set of bytes we’re going to create (the key) to extract the flag.
Next we will define the key, which is the byte array keybytes
. This array will be used to XOR with the raw data. The key is repeated (or wrapped around) to match the length of raw_data
using the modulo operator later in the script.
Now we will combine the raw data with the key bytes. We do this by XORing each byte of raw_data
with the corresponding byte of key_bytes
. If the key is shorter than the raw data, we’ll use the modulo operator to wrap around the key. Notice we’re using the common pattern from the Static XOR mentioned in the beginning of (key[i % len(key)])
to loop back to the start of the key once its length is exceeded.
What XOR does: XOR is a bitwise operation that compares the bits of two numbers. If the bits are the same, the result is 0
; if they are different, the result is 1
. This is what makes XOR useful for encoding and decoding data.
Now that we’ve decoded the raw data we need to append a closing brace to complete the flag string. The ASCII value for that is 0x7d
so we append that byte to decrypted_bytes
so we have the data wrapped in flag{DATAHERE}
.
Finally, we can construct the full flag string by prefixing the decoded bytes with flag{
and then printing it (it has the ending already). We concatenate the b’flag{‘
with decrypted_bytes
which contains the decoded content. The decode function (decode()
) converts the byte array into a string, which is then printed for the user to see.
Reviewing Our Final Payload and Retrieving the Flag
See the GitHub repository here to retrieve the payload: https://github.com/CommodoreAlex/Python/blob/master/decodeXOR.py
When we run the binary with ./
due to the interpeter being placed at the start to identify we’re using a Python3 script, it constructs our flag from the raw data and key we obtained from the binary in our GHIDRA debugger.
Post XOR Challenge Take Aways
In this post, we explored how examining binaries and using tools like GHIDRA can help us uncover hidden data, such as XOR operations in the decompiled code of an ELF binary. By identifying these patterns, we were able to automate the decryption process with a simple Python script.
This approach is valuable because it saves time and gives us a repeatable method for dealing with similar challenges in the future—whether it’s in CTFs, reverse engineering, or real-world penetration testing.
The more we dig into how data is obfuscated and encrypted, the better equipped we are to tackle more complex problems down the line. Keep experimenting, and you'll continue to sharpen your skills for those bigger challenges ahead!