Demorpheus: Getting Rid Of 
Polymorphic Shellcodes In Your 
Network 

Svetlana Gaivoronski, 

PhD Student, Moscow State University 

Dennis Gamayunov, 

Senior Researcher, Moscow State University 


Memory corruption, 0-days, shellcodes... 


Why should anyone care about 
shellcodes in 2012? 


CONS: 


Old exploitation technique, too 
old for Web-2.0-and-Clouds- 
Everywhere-World (some would 
say...) 

According to Microsoft's 2011 
stats*, user unawareness is #1 
reason for malware propagaion, 
and 0-days are less than 1% 
Endpoint security products deal 
with known malware quite well, 
why should we care about 
unkown?.. 


PROS: 

Memory corruption is still there ;- 

Hey, Microsoft, we're all excited 
with MS12-020 

Tools like Metasploit are widely 
used by pentesters and blackhat 
community 

Targeted attacks of critical 
infrastructure - what about early 
detection? 

Endpoint security is mostly 
signature-based, and does not 
help with 0-days 
It's fun! ;) 


* http://download.microsoft.com/download/0/3/3/0331766E-3FC4-44E5-BlCA- 

2BDEB58211B8/Microsoft_SecurityJntelligence_Report_volume_ll_Zeroingjn_on_Malware_ 

Propagation_Methods_English.pdf! 


CTF Madness 


Teams write O-days from 
scratch 

Game traffic is full of exploits 
all the time 

Detection of shellcode allows 
to get hints about your vulns 
and ways of exploitation... 


^^^^ 


Another POV: Privacy and Trust in 
Digital Era 


We share almost all aspects of our lives 
with digital devices (laptops, cellphones 
and so on) and Internet: 

• Bank accounts 

• Health records 

• Personal information 


Recent privacy issues with social 
networks and cloud providers: 

• Linkedln passwords hashes leak 

• Foursquare vulns 

• What's next?.. 


Maybe the risk of 0-days will fade 

away? 


2000 2001 2002 2003 2004 2005 2006 2007 2008 

Vulnerability Disclosur&s 


• Modern software market for mobile and social applications is 
too competitive for developers to invest in security 

• Programmers work under pressure of time limitation; managers 
who prefer quantity and no quality, etc. 

Despite the fact of significant efforts to improve code quality, the 
number of vulnerabily disclosures continues to grow every year... 


Shellcode detection: what do we have 


Static analysis 


Dynamic analysis 


A 


• Signature matching 

• Content-dependent 

• Behavioral 
(Hamsafl], Polygraph [2]) 

• CFG\IFG analysis (SigFree[3]) 

• NOP-sled detection 

(Race Walk [4] ) 

• APE/57 

The problem: if we simply try to run all methods for each portion of 
data, it would be extremely slow 

Classifierl Classified 

Classified 


Hybrid analysis 


Emulation ([61 [7]) 
Automata analysis 

(IGPSA[8]) 


([9], PonyUnpack[10]) 


Virtues and shortcomings 


Static methods 


Dynamic methods 


+ Complete code coverage (theoretically) 
+ In most cases work faster than dynamic 


+ More resistant to obfuscation techniques 


— The problem of metamorphic shellcode 
detection is undecidable 

— The problem of polymorphic shellcode 
detection is NP-complete 


— Require some overheads 

— Can consider only few control flow paths 

— There are still anti-dynamic analysis 
techniques 


And more: 

• Methods with low computation complexity have high FP rate 

• Methods with low FP have high computation complexity 

• They are also have problems with detection of new types of 0-day 
exploits 


None of them is applicable for real network 
channels in real-time 


STAND BACKi 


I'M GOING TP TRY 

SCIENCE 


Why shellcode detection is feasible at 

all 


xchg %eax,%esi 


35 93 98 24 92 


12 d9 do 


14 d9 74 ; 


23 eb 9c 


$0x92249893, 3 
%ebx 


%ecx,%ecx 
$0xa,%cl 

$Bxfffffffc.%edx 
%esi,0xl3(%edx) 
%ecx,%esp 


OxffffffbE 
(%dx),%es:(%edi) 
%eax,%es: (%edi) 


Activator 


Decryptor Payload 


RA 


As opposed to feature-rich 
viruses, shellcode has certain 
limitations in terms of size 
and structure 


29 bf 6C 1 


34 de lc c 


37 b5 93 


$0xef58f06c,%edi 

%eax, -0x22f35d40(%ebx,%edi, 

0X59 

$0xf5dcd81d,%ebx 


ENCRYPTED 
PAYLOAD 


ficomp ( 

rcrb % 


Proposed approach 


• Given the set of shellcode detection 
algorithms, why don't we try to construct 
optimal data flow graph, so that: 

- the execution time and FP rate are optimized, 

- and the FN rate is not more than some given 
threshold 


Shellcode features 


Generic features 


Specific features 


Correct disassembly into chain at least K 
instruction 


Correct disassembly from each and every offset. (NOP) 


Number of push-call patterns exceeds 
threshold 


Conditional jumps to the lower address offset. 
(Encrypted shellcode) 


u 


Overall shellcode size does not exceed 
threshold 


Ret address lies within certain range of values, (non- 
ASLR systems) 


Stat 


Operands of self-modifying and indirect jmp 
are initialized 


MEL exceeds threshold. (NOP) 


Cleared IFG contains chain with more than 
N instructions 


Presence of GetPC. (Encrypted shellcode) 


Last instruction in the chain ends with branch instruction 
with immediate or absolute addressing targeting lib call 
or valid interruption. (non-ASLR systems) 


u 


Number of near reads within payload 
exceed threshold R 


Control at least once transferred from executed payload 
to previously written address, (non-self-contained 
shellcode) 


Dynam 


Number of unique writes to different 
memory location exceeds threshold W 


Execution of wx-instruction exceeds threshold X (non- 
self-contained shellcode) 


Shellcode classes 


SH= {m 1 , ...,m ns }- set of shellcode features, BEN = {l lf -,l nb } -set of 
benign code features. 

Shellcode space 5is splitted for ^classes with respect to identified 
shellcode features. 


Shellcode classes. Example 


v_orrecc aisassemDiy irom eacn ana every 


opeciTic 


Knop 3 


byte offset 


features 


(contains multi-byte NOP- 


• Multi-byte instructions 


equivalent sled) 


uorrecc aisassemDiy into cnam ot at least 


V.-OIT1 MlUll 


K instructions 


features 


• Overall size does not exceed certain 


K sc 


threshold 


(self-ciphered) 


• Conditional jmps to the lower address 


Specific 


offsets 


features 


• Totally we identified 19 classes 


NOTE: none of existing shellcode detection metods provides complete 
coverage of identified classes 


Demorpheus: shellcode detection 
library and tool 


Common basic steps Elementary classifiers 


Hybrid shellcode detector 


Topology example 


Example of flow reducing 


Building hybrid classifier 


Round 1 Round 2 ... Round 


Evaluation: base line 


One of the important goals - minimization of false positives rate 
Hybrid topology Linear topology 


• Minimum false positives rate 

• No flow reducing 


Experimental results: numbers 


Data set 


Linear 


Hybrid 


FN, *100% 


FP, *100% 


Throughput, 
Mb\sec 


FN, *100% 


FP, *100% 


Throughput, 
Mb\sec 


Exploits 


0.2 


n/a 


0.069 


0.2 


n/a 


0.11 


Benign 
binaries 


n/a 


0.0064 


0.15 


n/a 


0.019 


2.36 


Random 
data 


n/a 


0.11 


n/a 


3.7 


Multimedia 


n/a 


0.005 


0.08 


n/a 


0.04 


3.62 


Experimental results: plots 


A couple of use-cases for hybrid 
classifier 

O-days exploits detection and filtering at network level 
CTF participation experience: 

- could help to increase defense level of team 

- could help to gather ideas from other teams 


Linked list of packets 


TCP reassembly 


Hybrid detector 


Demonstration here 


EXPERIMENT 
IN PROGRESS 


Conclusion 


• Good news everyone! 

• Shellcodes may be now detected up to 45 
times faster than before 

• You could download the Demorpheus source 
code from 

git@gitorious.org:demorpheus/demorpheus.git and 
integrate it with your own tool 


Authors 


• Svetlana Gaivoronski 

- E-mail: s.gaivoronski@gmail.com 

- Skype: svetik.sh 

- GPG key: 0x38428E3B 

16A2 2F9D 5930 7885 F9E5 9DA5 09ED D515 3842 8E3B 

• Dennis Gamayunov 

- E-mail: gamajun@cs.msu.su 

- Skype: jama.dharma 

- GPG key: 0xA642FA98 

14E1 C637 4AEC BFOD 0572 D041 0622 7663 A642 FA98 


References 


1. Zhichun Li, Manan Sanghi, Yan Chen, Mingyang Kao, and Brian Chavez. Hamsa: fast signature generation for zero-day 
polymorphic worms with provable attack resilience. In S&P06: Proceedings of the 2006 IEEE Symposium on Security and 
Privacy, pages 32-47. IEEE Computer Society, 2006. 

2. James Newsome. Polygraph: Automatically generating signatures for polymorphic worms. In Proceedings of the IEEE 
Symposium on Security and Privacy, pages 226-241, 2005. 

3. Xinran Wang, Chi-Chun Pan, Peng Liu, and Sencun Zhu. Sigfree: A signature-free buffer overflow attack blocker. IEEE 
Transactions On Dependable And Secure Computing, 7(l):65-79, 2010. 

4. Dennis Gamayunov, Nguyen Thoi Minh Quan, Fedor Sakharov, and Edward Toroshchin. Racewalk: Fast instruction 
frequency analysis and classification for shellcode detection in network flow. In Proceedings of the 2009 European 
Conference on Computer Network Defense, EC2ND '09, pages 4-12, Washington, DC, USA, 2009. IEEE Computer Society. 

5. Thomas Toth and Christopher Kruegel. Accurate buffer overflow detection via abstract payload execution. In In RAID, 
pages 274-291, 2002. 

6. Michalis Polychronakis, Kostas G. Anagnostakis, and Evangelos P. Markatos. Network-level polymorphic shellcode 
detection using emulation. In In Proceedings of the GI/IEEE SIG SIDAR Conference on Detection of Intrusions and Malware 
and Vulnerability Assessment (DIMVA), pages 54-73, 2006. 

7. Michalis Polychronakis, Kostas G. Anagnostakis, and Evangelos P. Markatos. Emulation-based detection of non-self- 
contained polymorphic shellcode. 

8. Lanjia Wang, Hai-Xin Duan, and Xing Li. Dynamic emulation based modeling and detection of polymorphic shellcode at the 
network level. Science in China Series F: Information Sciences, 51(11):1883-1897, 2008. 

9. Qinghua Zhang, Douglas S. Reeves, Peng Ning, and S. Purushothaman. Analyzing network traffic to detect self-decrypting 
exploit code. In In Proceedings of the ACM Symposium on Information, Computer and Communications Security 
(ASIACCS), 2007. 

10. Paul Royal, Mitch Halpin, David Dagon, Robert Edmonds, and Wenke Lee. Polyunpack: Automating the hidden-code 
extraction of unpack-executing malware. In Proceedings of the 22nd Annual Computer Security Applications Conference, 
pages 289-300, Washington, DC, USA, 2006. IEEE Computer Society.