Automation is key to improving the efficiency and accuracy of your analysis.
In an older post, we examined a stack-based buffer overflow in FreeFloat FTP Server. That step-by-step guide for beginners illustrated the process and techniques of basic exploit development. However, I skipped the part explaining how to determine which bytes or characters can break exploit code. In this post, I’ll discuss how to hunt down bad characters in exploit development. If you are preparing for OSCE, you should find this post useful.
ii. General Approach
The fundamental principle of testing which bytes corrupt your exploit code is to use a byte array: an array of every byte value from “\x00” to “\xff”. The byte array takes the place of the shellcode in the exploit code. After executing the exploit, we locate the byte array in the target application’s memory and examine it for missing or altered bytes. If the bytes found in memory differ from the ones sent in the exploit code, there is a bad character. Once a bad character is identified, you regenerate the byte array without it and run the test again. The examination is repeated until all the bad characters are identified, i.e. until the bytes in the exploit code match the bytes in the memory of the exploited application.
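As a rough illustration of that loop, here is a minimal Python sketch for building the byte array and rebuilding it once bad characters are known. The function name `make_byte_array` is my own for this example, not something taken from the scripts discussed later, and the two excluded bytes are only sample values.

```python
def make_byte_array(bad_chars=b""):
    """Return every byte value 0x00-0xff in order, skipping known bad ones."""
    excluded = set(bytearray(bad_chars))
    return bytes(bytearray(b for b in range(256) if b not in excluded))


# First round: the full array. Later rounds: drop whatever was found so far.
full = make_byte_array()
trimmed = make_byte_array(b"\x00\x0a")  # sample bad characters for illustration
print(len(full), len(trimmed))  # 256 254
```

Keeping the bytes in ascending order matters: it makes a gap or a changed value easy to spot in a memory dump.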
iii. Slight Challenges
Sometimes the memory buffer that holds your initial shellcode is smaller than 256 bytes. For instance, you might get a buffer of only 24 bytes. In that case, start your analysis with a byte array of just 24 bytes, i.e. “\x00” to “\x17”, and repeat the analysis with each subsequent 24 bytes until all 256 byte values have been examined.
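The windowing can be sketched in a few lines of Python. This is only an illustration; `byte_windows` is a hypothetical helper name, and 24 is the example buffer size from the paragraph above.

```python
def byte_windows(window_size, start=0x00, end=0xff):
    """Yield consecutive slices of the byte range, each at most window_size long."""
    all_bytes = bytes(bytearray(range(start, end + 1)))
    for offset in range(0, len(all_bytes), window_size):
        yield all_bytes[offset:offset + window_size]


windows = list(byte_windows(24))
# 256 bytes split into 24-byte rounds: ten full windows plus a final short one.
print(len(windows), len(windows[-1]))  # 11 16
```

Each window is sent as its own test round, so a 24-byte buffer needs eleven rounds to cover every byte value (fewer if you start from “\x01”).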
A common assumption when analysing bad characters, especially among beginners, is that the byte array is always placed on the stack after a crash (EIP overwrite), so the first byte of the array can be referenced through the ESP register. That is not always the case. You might need to do some stack alignment or locate the byte array at a different location in memory. Manual inspection of the byte array in memory can be tedious, and the accuracy of the analysis depends on how carefully you eyeball the dump.
The “\x00” byte is the null terminator for C-style strings (ASCII or UTF-8), and it is usually the first culprit in bad character analysis. A string ends with a null byte, so a null byte inside your payload cuts short whatever string-handling code copies it. You don’t want a null byte anywhere in your exploit code. So, my analysis usually starts by setting the null byte aside and examining the remaining 255 bytes.
iv. My Deft Approach
To make finding the bad characters in memory a bit of fun, I automated the process using Python. There are two main scripts to do the job:
- A simple custom debugger using PyDbg (CrashDbg.py)
- A postmortem analysis script as the crash handler (BadChars.py)
The debugger hooks into the application we are analysing. When a crash happens (it monitors for access violations caused by EIP overwrites), the crash handler invokes the postmortem analysis script, which compares the byte array in memory against the one used in the exploit code.
The address of the byte array in memory is taken from the ESP register at the time of the crash, but you can specify a custom address in hexadecimal format, e.g. 0x7331b00f. Furthermore, the size of the data to be dumped from memory (the length of the byte array) MUST be specified in the scripts. The debugger passes the dumped data to the postmortem analysis script, which does a byte-by-byte comparison to detect any changes. So, for a successful analysis, the postmortem analysis script must hold the same byte array as the one used in the exploit code.
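The byte-by-byte comparison can be sketched as below. This is a simplified stand-in, not the actual BadChars.py code; `first_mismatch` is a name I made up for the example. It reports only the first difference, because once one byte is dropped or the buffer is truncated, everything after it is shifted and no longer tells you anything reliable.

```python
def first_mismatch(expected, dumped):
    """Return (offset, expected_byte, found_byte) for the first difference, or None."""
    expected = bytearray(expected)
    dumped = bytearray(dumped)
    for offset, want in enumerate(expected):
        # A dump shorter than the byte array counts as a mismatch (found_byte is None).
        got = dumped[offset] if offset < len(dumped) else None
        if got != want:
            return offset, want, got
    return None


# Simulated round: the array \x01..\xff was sent, but memory was cut off at \x0a.
expected = bytes(bytearray(range(1, 256)))
dumped = expected[:9] + b"\x00" * (len(expected) - 9)
print(first_mismatch(expected, dumped))  # (9, 10, 0)
```

Here the mismatch at offset 9 points at byte 10 (“\x0a”), which is the candidate bad character for that round.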
To run the CrashDbg script, specify an executable name. The script enumerates the host’s processes and attaches to the running application if it is found. When a crash occurs, it asks for the memory address to read, dumps as many bytes as the byte array contains, and then passes execution to BadChars, which tries to identify a changed byte in the byte array (postmortem analysis).
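The crash-time step of choosing an address and dumping memory might look roughly like this. It is a sketch under assumptions: `resolve_address`, `dump_for_analysis` and the `read_memory` callback are hypothetical names, and `read_memory` stands in for whatever memory-read call your debugger library provides. The demo uses a fake in-process "memory" so it runs without a debugger.

```python
def resolve_address(user_input, esp_value):
    """Use the address typed by the user (hex), or fall back to ESP if blank."""
    text = user_input.strip()
    return int(text, 16) if text else esp_value  # accepts "0x7331b00f" or "7331b00f"


def dump_for_analysis(user_input, esp_value, length, read_memory):
    """Pick the address, then read `length` bytes through the supplied callback."""
    address = resolve_address(user_input, esp_value)
    return address, read_memory(address, length)


# Demo with a fake memory map instead of a real debuggee (values are made up).
fake_memory = {0x7331b00f: b"\x01\x02\x03\x04"}

def fake_read(address, length):
    return fake_memory[address][:length]

address, data = dump_for_analysis("0x7331b00f", 0x0012f9a0, 4, fake_read)
print(hex(address), data)
```

Passing the reader in as a callback keeps the address handling independent of any particular debugger.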
After the bad character analysis, the BadChars script outputs the bad character it found. It also generates a new shellcode and a new byte array that exclude the identified bad characters. Use this newly generated data to repeat the test until all bad characters are identified.
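Regenerating the byte array for the next round can be sketched like this. The output format is my own guess at something convenient to paste into a script; the real BadChars.py output may look different, and `to_escaped_literal` and `next_check_bytes` are hypothetical names. Generating the new shellcode is left out, since that depends on your payload tooling.

```python
def to_escaped_literal(data, per_line=16):
    """Render bytes as quoted "\\xNN" lines ready to paste into a script."""
    data = bytearray(data)
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append('"' + "".join("\\x%02x" % b for b in chunk) + '"')
    return "\n".join(lines)


def next_check_bytes(bad_chars):
    """Build the byte array for the next round, excluding all bad characters so far."""
    excluded = set(bytearray(bad_chars))
    return bytes(bytearray(b for b in range(256) if b not in excluded))


remaining = next_check_bytes(b"\x00\x0a")  # sample bad characters
print("check_bytes = (\n%s\n)" % to_escaped_literal(remaining))
print("bytes to read: %d" % len(remaining))
```

The same literal goes into both the exploit code and the comparison script, so the two stay in sync from one round to the next.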
A sample postmortem analysis using CrashDbg and the identified bad character(s).
In Img. 3, there are a few details to pay attention to:
- After the crash, you can use ESP or another memory address to locate the shellcode,
- The bad characters identified in the analysis are \x00 and \x0a,
- New check_bytes variable to be used in the BadChars.py (line 57) is generated,
- New shellcode to be used in your exploit code is also generated,
- Finally, update the number of bytes to be read to 254 (the new size of the shellcode used) in CrashDbg.py (line 37).
The bad character analysis is repeated until we have identified all the bad characters that affect the shellcode’s structure in memory. You can automate updating the scripts by loading the variables from a separate config file and auto-adjusting the byte array according to the bad characters found so far. I enjoy doing this part of the process manually because any mistake at this point can waste time and have you pulling your hair out!
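If you do want to automate it, one possible shape for the config-driven approach is sketched below. The JSON layout, the `bad_chars` key and the function names are all assumptions made for this example; nothing here reflects how CrashDbg.py or BadChars.py are actually written.

```python
import json


def load_config(path):
    """Read the shared settings file used by both scripts (hypothetical layout)."""
    with open(path) as handle:
        return json.load(handle)


def plan_round(config):
    """Derive the byte array and the dump size from the bad characters found so far."""
    bad = set(int(value, 16) for value in config.get("bad_chars", []))
    check_bytes = bytes(bytearray(b for b in range(256) if b not in bad))
    return check_bytes, len(check_bytes)


def record_bad_char(config, byte_value):
    """Return a copy of the config with one more bad character added."""
    found = set(config.get("bad_chars", []))
    found.add("%02x" % byte_value)
    return dict(config, bad_chars=sorted(found))


config = json.loads('{"bad_chars": ["00"]}')  # inline here; normally load_config(path)
config = record_bad_char(config, 0x0a)
check_bytes, read_size = plan_round(config)
print(config["bad_chars"], read_size)  # ['00', '0a'] 254
```

With the read size derived from the byte array, there is no separate length to update by hand after each round.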
In most cases, you will find me in a fully-fledged debugger like WinDbg or Immunity, with CrashDbg on the side just for the bad character analysis. So, I usually know the memory address of the bytes I want to examine, and they can be just about anything, for instance, an egg-hunter. That flexibility is one thing I like about CrashDbg, along with the fact that it is debugger-agnostic (not a plug-in to another debugger).
v. Similar Approach and Resources
If you have used Immunity Debugger or WinDbg (windbglib), you have probably come across mona.py by Corelanc0d3r (Peter Van Eeckhoutte). Mona is a Python-based script that provides a variety of options for exploit development. The script can be downloaded from https://github.com/corelan/mona
Another interesting project is expdevBadChars by mgeeky (Mariusz B.) and it is more detailed. It highlights and visualises bad characters and their position as they would appear in a memory structure. It incorporates the Longest Common Subsequence based algorithm designed by Peter Van Eeckhoutte (it is featured in Mona). The script can be downloaded from https://github.com/mgeeky/expdevBadChars
Here are some nice articles that also cover the same topic:
- Finding Bad Characters with Immunity Debugger and Mona.py — https://bulbsecurity.com/finding-bad-characters-with-immunity-debugger-and-mona-py/
- 0x7 Exploit Tutorial: Bad Character Analysis — http://www.primalsecurity.net/0x7-exploit-tutorial-bad-character-analysis/
Bad character analysis in exploit development is essential and cannot be ignored. For a stable exploit, the payload to be executed must not be mangled while it sits in the target application’s memory. CrashDbg is a nice tool to help you in crafting your exploit code. Beyond the byte array, it can compare data of just about any length from memory and identify any changed bytes.
If I get time to share my x86 Assembly techniques for crafting a custom encoder for a limited set of valid bytes (as in alphanumeric shellcode), I will do another post. Have fun, and take time to look at Unicode-based exploits and how you could change CrashDbg to handle them.