This is the print version of Reverse Engineering.
The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Reverse_Engineering
Common Solutions
Protection Mechanisms
Few protective measures are available to programmers that prevent most overflow vulnerabilities outright. However, several techniques help.
Bounds Checking
Newer languages such as Java and C# make a point of their "automatic bounds checking" and "memory management" features, mainly because these help prevent buffer overflows (at a small performance cost). C programmers, however, are left to their own devices and need to test the bounds of every array access explicitly. It's tedious, but at least crackers won't break your program, and you won't get fired from your job.
Canary / Cookie
Some compilers help out by placing a flag value called a canary or cookie on the stack, between a function's local variables and its return address (think of the caged bird used in coal mines to detect the buildup of poisonous gases before the workers were poisoned). A function prologue that sets up a canary might look like this:
push CANARY
push ebp
mov ebp, esp
sub esp, 100
Now, when the function wants to return, we can perform the following operation:
add esp, 100
mov esp, ebp
pop ebp
pop ebx ;canary value
cmp ebx, CANARY
jne _STACK_ERROR_FOUND
ret
This way we can detect whether the stack has been overwritten, because the canary value will have changed. A predictable canary value, however, is vulnerable: attackers who insert that value into the stack as part of their overflow data elude detection. For this reason most canary values are randomly generated at run time. Many canary values also contain two null characters at the start or end. String copy functions (like strcpy or wcscpy) stop copying after reaching and writing a null character, so an attacker who reproduces the nulls cannot copy anything past them, while an attacker who omits them changes the canary and is caught.
This method of protection can catch basic overflows and prevent a function from returning to a modified address and executing arbitrary code. However, the subroutine still runs with compromised internal state and variables, since the overflow is detected only when it returns. This can still be exploited by an attacker: for example, a pointer variable can be modified to point to an arbitrary location. If the subroutine then uses this pointer to write to memory, it could overwrite anything in the program's address space.
Pointer Sanity
Many heap overflows work by overwriting the housekeeping data at the start of the next heap chunk, which normally contains at least one linked-list pointer. Allocating or freeing an overwritten chunk can then cause data to be written to an arbitrary address in memory. Most heap implementations now check the pointers in these linked lists to ensure that they point at another heap chunk or at valid data.
This method of protection is also present in the Microsoft Windows "Structured Exception Handling" routines. Before calling an exception handler (the pointer to which resides on the stack, and can be overwritten), it is first checked to ensure that the routine resides within an executable section of memory. If the handler routine does not, then it is not called.
Safe String Libraries
Because the standard library string functions are a common cause of buffer overflows, a number of libraries with "safe" string functions have appeared to address this problem. Most of them require an explicit "string length" parameter in their functions' arguments and limit the data copied to that amount.
The programmer must obviously still be careful and enter accurate string length values; sloppy programming can still cause trouble.
Exercises
We leave it as an exercise for the reader to write a set of safe string functions that take a length parameter and perform simple bounds checking to prevent overflow. Another option would be to take as an argument a pointer to a "maximum" stack position and compare pointers to prevent overflow.
Cracking Windows XP Passwords
This page is about cracking (recovering) passwords on Windows XP machines, which is a computationally difficult process. If you just need to set a new password, without recovering the old one, then this guide is not for you. For that, you can use, for example, the free-software tool Offline NT Password & Registry Editor or other similar programs.
Background
Windows XP passwords are hashed using both LM hash and NTLM hash (passwords of 14 or fewer characters) or NTLM only (passwords of 15 or more characters). The hashes are stored in C:\WINDOWS\system32\config\SAM. The SAM file is encrypted using a key held in C:\WINDOWS\system32\config\system and is locked while Windows is running. The SAM file is a registry hive that is mounted at HKLM\SAM while Windows is running, and the SYSTEM account is the only account that can read this part of the registry. To get the passwords, you need to shut down Windows, decrypt the SAM file, and then crack the hashes. If everything goes well, you'll have the passwords in 15 minutes.
The hashes can also be obtained from a running system using software like pwdump. However, such tools must be run under an account with administrator privileges.
Three ways to recover Windows Password
Usually, a Windows admin password can be recovered in two traditional ways. The first is to change the login password from another admin account; the second is to use a Windows password reset disk that was created before the password was forgotten. Taking Windows XP as an example:
- At the Windows XP login prompt, when the password is entered incorrectly, click the reset button in the login failed window.
- Insert the password reset diskette into the computer and click Next.
- If the diskette is the correct one, Windows XP will open a window prompting for the new password you wish to use.
However, we often ignore the importance of security until we have been locked out of the computer. Fortunately, there is a third way to unlock the computer without reinstalling: erasing the Windows password with a Windows password reset CD, which can reset the admin password on Windows 7/XP/Vista/NT/2000/2003. Taking Windows Password Unlocker as an example, the steps to create the reset CD are as follows:
- Download Windows Password Unlocker from the Password Unlocker official site.
- Decompress the download and note that it contains an .ISO image file. Burn the image file onto a blank CD with the burner freely supported by Password Unlocker.
- Insert the newly created CD into the locked computer and reboot it from the CD drive.
- After the CD has launched, a window pops up listing all account names (if you have several accounts). Select the account whose password you have forgotten and reset it.
Detailed Instructions for LoginRecovery.com Service
- Go to http://loginrecovery.com/ and, from the home page, click the option to download either the floppy disk image or the CD image. If you use the floppy disk image, insert a blank floppy disk into your computer and run the program, and a bootable floppy will be created. If you use the CD version, you will need to burn the ISO image to a CD manually, using software that specifically burns ISO images.
- Insert the floppy disk or CD into the target computer from which you wish to extract the passwords, then boot the computer. You may need to alter the BIOS settings to ensure that the computer boots from the floppy drive or CD.
- If you used the floppy disk, some messages will briefly appear on the screen and then the computer will shut down. The floppy disk will then hold a newly created file called "upload.txt" containing the encrypted passwords. If you used the CD version, the encrypted passwords will be shown on the screen; write them down in a text file.
- If you wish to wait up to 48 hours or pay to get your passwords, then you can upload the file onto the LoginRecovery site. Otherwise, continue reading.
- The file will consist of several 2-line entries, one for each account. Copy the 2 lines for the account you want and paste it into this utility to decode it into the "pwdump" format.
- Use any of the tools in the following section to decode the pwdump hash.
Notes
- If the information retrieved from the pwdump consists of an empty first part, then the LM hash is not stored. This means that the password is blank, in which case it would look like this:
Administrator:500:0: _31,D6,CF,E0,D1,6A,E9,31,B7,3C,59,D7,E0,C0,89,C0,xxxxx:::
If it says anything different, then they implemented better security and force you to crack the NTLM hash, which is much more difficult and out of the scope of this guide.
- This only works if the password is 14 characters or shorter
- If the password in Windows 2000/XP/2003 is 14 characters or shorter, the LM hash pads it to 14 characters and splits it into two halves of seven characters each, which are hashed separately
- An alternative that uses the same method of comparing known hashes against unknown ones is RainbowCrack, available at http://www.antsight.com/zsl/rainbowcrack/, although this program uses rainbow tables that can exceed 64 GB; these tables can be obtained at http://rainbowtables.shmoo.com/
- A comprehensive project for comparing known hashes against an unknown one is at http://www.rainbowcrack.com/; however, it requires that you submit a rainbow table before you can gain access to its server
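The 14-character limit in the notes above follows from the way the LM hash prepares its input: the password is upper-cased, padded to 14 bytes, and cut into two 7-byte blocks that are processed independently. The following is a minimal sketch of that preparation step only (no hashing is performed). The function name lm_halves is made up for illustration, and the sketch assumes an ASCII-only password.

```python
def lm_halves(password: str) -> tuple:
    """Return the two 7-byte blocks that LM hashing processes independently."""
    raw = password.upper().encode("ascii")  # assumption: ASCII-only password
    if len(raw) > 14:
        # Longer passwords get no LM hash at all, only NTLM.
        raise ValueError("no LM hash is stored for passwords longer than 14 characters")
    padded = raw.ljust(14, b"\x00")  # pad with NUL bytes to exactly 14
    return padded[:7], padded[7:]


first, second = lm_halves("Password1")
print(first, second)
```

Because each block is at most seven upper-case characters, each can be looked up on its own, which is why precomputed tables are so effective against this format.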
Defense against attack
- Have a password longer than 14 characters.
- http://support.microsoft.com/kb/299656/ - prevent Windows from storing LM hashes
Mac OS X 10.3
Mac OS X 10.3 (Panther) also stores shadowed LM+NTLM hashes for each user. They can be cracked in the same way as the hashes for Windows above.
- First find the "generateduid" for the user you want with the command
$ niutil -readprop . /users/<username> generateduid
70902C33-AC79-11DA-AFDF-000A95CD9AF8
- The hashes are stored in the file /var/db/shadow/hash/<generateduid>. The file is 104 characters long, consisting of the 64-character NTLM+LM hashes and the 40-character SHA1 hash. To retrieve the NTLM+LM hashes, you can run this command as an administrator for example
$ sudo cut -c1-64 /var/db/shadow/hash/70902C33-AC79-11DA-AFDF-000A95CD9AF8
996E6760CDDD8815A2C24A110CF040FBCC5E9ACBAD1B25C9AAD3B435B51404EE
- The hashes are stored in the reverse order of the pwdump format (NTLM first instead of LM first), so you need to swap the 32-character halves and insert a colon between them
CC5E9ACBAD1B25C9AAD3B435B51404EE:996E6760CDDD8815A2C24A110CF040FB
- Then follow the instructions for Windows passwords
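The reordering step in the list above is easy to get wrong by hand, so it can be sketched in a few lines of Python. The function name shadow_to_pwdump is hypothetical; the only facts assumed are the ones stated above (64 hexadecimal characters, NTLM half first).

```python
import string


def shadow_to_pwdump(hashes: str) -> str:
    """Reorder a 64-character NTLM+LM string into pwdump's LM:NTLM order."""
    hashes = hashes.strip()
    if len(hashes) != 64 or not all(c in string.hexdigits for c in hashes):
        raise ValueError("expected 64 hexadecimal characters")
    ntlm, lm = hashes[:32], hashes[32:]  # the shadow file stores NTLM first
    return f"{lm}:{ntlm}"


print(shadow_to_pwdump(
    "996E6760CDDD8815A2C24A110CF040FBCC5E9ACBAD1B25C9AAD3B435B51404EE"))
```

Run on the example value from the list, this yields the same colon-separated string shown there.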
Mac OS X 10.4
Mac OS X 10.4 (Tiger) improves security by only storing LM+NTLM hashes for users who enable Windows Sharing for their account; and when they do enable it, it asks them to enter their password with a warning that their password is stored in a less secure format. However, for those users with Windows Sharing enabled, the above method will still work. The shadow file format is a little different, but the LM+NTLM hashes are still the first 64 characters. If the hashes are not stored, you will get all 0's when you try to retrieve the hashes.
Samba passwords
In older versions of Samba, the password hashes for Samba users were stored in the file /etc/smbpasswd (location may vary; only root has access) in a format similar to the Windows password hashes discussed above. In newer versions of Samba, run the following as root to get the same information:
pdbedit -L -w
File Formats
This section will talk about reverse-engineering proprietary file formats. Many software developers need to reverse engineer a proprietary file format, especially for the purposes of interoperability. For example, every year the OpenOffice project needs to reverse engineer the Microsoft Office file formats. Furthermore, reverse engineering is required for forensics purposes. The chapters in this section will talk about how to understand a proprietary file format.
Typical Features
File Header
Most file formats begin with a "header," a few bytes that describe the file type and version. Because there are several incompatible file formats with the same extension (for example, ".doc" and ".cod"), the header gives a program enough additional information to see whether the file is one of the formats that the program can handle.
Many programmers package their data in some sort of "container format" before writing it out to disk. If they use the standard gzip format to hold their data in compressed form, the file will begin with the 2 bytes 0x1f 0x8b (in decimal, 31 139). A bare zlib stream has no such fixed signature, but its first byte is usually 0x78.
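As a rough illustration of header checking, the sketch below guesses a container format from the first few bytes of a file. The function name sniff_container is invented for this example. The gzip and ZIP signatures are fixed byte sequences; the zlib test uses the two-byte header rule from RFC 1950 and is only a heuristic, since arbitrary data passes it now and then.

```python
import gzip
import zlib


def sniff_container(head: bytes) -> str:
    """Guess a container format from the first bytes of a file."""
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    if head[:4] == b"PK\x03\x04":
        return "zip"
    # RFC 1950: the low nibble of the first byte is 8 (deflate), and the
    # first two bytes read as a big-endian number are a multiple of 31.
    if (len(head) >= 2 and (head[0] & 0x0F) == 8
            and ((head[0] << 8) | head[1]) % 31 == 0):
        return "zlib"
    return "unknown"


print(sniff_container(gzip.compress(b"hello")))
print(sniff_container(zlib.compress(b"hello")))
```

In practice the first 16 bytes or so of the file are enough input for a check like this.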
Blank Space
Some files are made up largely of blank space, for example the .DS_Store files generated by OS X. Blank space appears as a series of 0's in a hex editor. The creators of a file format may add blank space for a variety of reasons. For example, the author of one study of .DS_Store files speculated that the blank space exists to speed up writing data, since other data would not need to be moved to make room. It could also serve to prevent fragmentation.
For most purposes, blank space can be ignored.
Tools
File format reverse engineering is the domain of hex editors. Typically they are used to display file contents more often than to edit them. Some hex editors allow you to superimpose a data structure on top of the data (a feature sometimes called custom views or similar), which is very helpful. Once a particular structure has been discovered in a file, these mechanisms can be used to document the structure, as well as to provide a more meaningful display of the information than just hex code.
Also useful are Unix/Linux tools like strings(1) and file(1).
- strings
- Finds and prints sequences of printable characters in a file. This can give hints of what data is embedded in the file.
- file
- Attempts to determine a file type. Sometimes file format designers re-use already well known file formats or file compression algorithms. There is a small but notable chance that file(1) can reveal this.
Windows ports of these tools are also available, e.g. as part of the Cygwin environment (strings is part of the binutils, don't ask ...).
Equally important as a hex editor is a brain. File format reverse engineering means reasoning about what the hex editor and other tools display: guessing structures, relations and the meaning of the data, developing theories and then verifying them. Very few tools can help here.
In a few limited cases additional tools are helpful, e.g. for checking by brute force whether a particular part of the file consists of embedded, compressed data. Typically such tools are written or scripted on the fly as custom tools. Another typical set of custom tools are those used to break up a file into separate components, once it has been discovered that the file indeed consists of separate parts and how they are separated. C/C++ and Java, but also scripting languages like Perl, are often used here (Perl because it can handle binary data, while classic Unix scripting tools are often limited to text data).
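Where neither tool is at hand, the core idea of strings(1) can be approximated in a few lines. This is only a sketch: the function name printable_strings is made up, and it looks for runs of printable ASCII only, whereas the real tool has more options (other encodings, offsets, and so on).

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


sample = b"\x00\x01DTA.ARJ\x00\x00ab\x00inflate 1.1.3\xff"
print(printable_strings(sample))
```

The minimum length of four mirrors the usual default and filters out most accidental matches in binary data.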
In some cases a proprietary file format might contain executable code. For example, a firmware update file for some embedded device very likely contains executable code. Typically that code is wrapped into some structure, e.g. a file system, compressed, garnished with boot/flash code etc. In such cases a disassembler/decompiler for the particular executable format might be helpful.
Further, documentation of and familiarity with checksum algorithms, compression algorithms, encoding techniques, and also programming languages is very helpful.
Also very helpful is the availability of the application that produces and reads the proprietary file format. That application can be used to create test files, and also to verify whether a file you generated yourself is correct.
Strategies
Look for the obvious first, e.g. magic numbers, a block structure, or ASCII text in the file. Anything that can be identified more or less clearly can be the entry ticket to more. Once a particular structure has been identified, look for in-file pointers to that data, e.g. places where the data is referenced from some other part of the file with an absolute or relative address. It is also very important to find out the byte order (little endian or big endian).
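One simple way to test a byte-order theory is to read a suspected length or offset field both ways and see which value could possibly fit inside the file. The sketch below assumes a 4-byte unsigned field; the function name plausible_lengths and the sample values are invented for illustration.

```python
import struct


def plausible_lengths(field: bytes, file_size: int) -> dict:
    """Interpret a 4-byte field both ways and keep values that fit the file."""
    candidates = {
        "little": struct.unpack("<I", field)[0],
        "big": struct.unpack(">I", field)[0],
    }
    return {order: value for order, value in candidates.items()
            if 0 < value <= file_size}


# The bytes 00 10 00 00 read as 4096 (little endian) or 1048576 (big endian).
print(plausible_lengths(bytes.fromhex("00100000"), 65536))
```

If both readings survive, or neither does, the field is probably not a length or offset, or more samples are needed.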
Choosing the target
If you have access to the software that created the file, you can always create files with the contents of your choosing. This makes reverse engineering substantially easier. In cryptography terms, you are engaging in a chosen-plaintext attack.
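A typical use of this freedom is to save two files that differ in one known detail and then compare them byte by byte. The following sketch lists the offset ranges where two such files differ. The function name changed_ranges is hypothetical, and the comparison assumes the change did not shift the rest of the file (anything past the shorter file's end is ignored).

```python
def changed_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offset pairs, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges


print(changed_ranges(b"HDR\x00\x05hello\x00\x00", b"HDR\x00\x05jelly\x00\x00"))
```

Ranges that change together with the chosen content are candidates for the data itself; ranges that change elsewhere hint at lengths, checksums or timestamps.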
Probing
Once you formulate a theory as to what some data in the file might mean, you can verify that theory by creating a manipulated file. Replace the data with some other data using a hex editor or a custom tool, then load the manipulated file into the original application. If the application loads the file and displays the intended change, the theory is probably correct. Sometimes it is not trivial to get the application to accept a changed file because of defense mechanisms that may be present. Some applications check a hash or signature of the data before using it, or of code before running it.
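A custom probing tool can be as small as the sketch below, which overwrites a few bytes at a known offset without changing the file length. The function name patch is made up for this example.

```python
def patch(data: bytes, offset: int, replacement: bytes) -> bytes:
    """Return a copy of data with replacement written over the bytes at offset."""
    if offset < 0 or offset + len(replacement) > len(data):
        raise ValueError("patch would fall outside the file")
    return data[:offset] + replacement + data[offset + len(replacement):]


original = b"AAAABBBBCCCC"
print(patch(original, 4, b"ZZ"))
```

Writing the result to a new file and keeping the original untouched makes it easy to repeat the experiment with a different theory.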
Compression, Encryption & Scrambling
Introduction
File formats which are either in part or completely compressed, encrypted or scrambled are among the toughest nuts to crack. Of course, compression is different from encryption, and typically done for a different purpose. However, the resulting file formats often look similar: A bunch of gibberish. This is the intended result when file format designers go for encryption, but it is also often a desired side effect when compression is applied.
Please note: Wikibooks does not give legal opinions
If checking a file with a hex editor or similar reveals that it just contains gibberish and e.g. not any easy to identify text strings, patterns or similar, it might indicate that the particular file is compressed, encrypted or scrambled. The methods for reverse engineering these files are similar. There might, however, be a big difference from a legal point of view. Many countries have laws against circumventing copy protection, and encryption can be seen as some kind of copy protection. See Reverse Engineering/Legal Aspects for some more hints regarding this, and seek qualified legal advice before attempting to reverse engineer an encrypted or otherwise protected file format. Similar issues might arise when a file format just uses scrambling. The format "owner" might argue that the scrambling is used as some kind of copy protection, encryption or whatever, and circumventing it might break some law. Again, seek qualified legal advice.
The remainder of this section only deals with reverse-engineering the compression of a file. This is typically just an initial step in the complete reverse-engineering process. Once it has been successfully decompressed, other reverse engineering methods need to be applied to identify the file contents and structure.
Well-Known Compression Algorithms
Often file format designers apply well-known compression algorithms, either by using a particular, well-known implementation of an algorithm (a well-known tool) or by re-implementing a well-known algorithm unchanged. In the easiest case this has been documented. For example, it is well documented that the OpenOffice file format uses ZIP archives, and therefore there is no point in reverse engineering that container.
Unfortunately, for many formats we don't have this documentation. In case a well-known implementation of a particular algorithm has been used it is often relatively easy to reverse engineer. Such compressed file formats tend to start with a format identifier (magic number), clearly identifying the particular compressed format. The compression tool has left its "fingerprint" in place.
Example:
The following is a hex dump of the first few bytes of a fictitious firmware update file for a particular SOHO router
00000000 60 ea 27 00 1e 06 01 00 10 00 02 84 84 86 dc 34 |`.'............4|
00000010 84 86 dc 34 00 00 00 00 00 00 00 00 00 00 00 00 |...4............|
00000020 00 00 44 54 41 2e 41 52 4a 00 00 b1 18 78 a6 00 |..DTA.ARJ....x..|
00000030 00 60 ea 27 00 1e 06 01 00 10 01 00 84 4c 86 dc |.`.'.........L..|
00000040 34 9b 17 0c 00 e8 a4 25 00 25 10 0d 10 00 00 20 |4......%.%..... |
00000050 00 00 00 44 54 41 2e 4d 45 4d 00 00 50 98 0b 8f |...DTA.MEM..P...|
00000060 00 00 1f 30 84 dd 7b db 48 da 6f fd ee fd bb da |...0..{.H.o.....|
The file is compressed with ARJ. Not only does the string DTA.ARJ give it away for the human eye, but also the first two bytes 60 ea, which are known to identify ARJ-compressed files.
The Unix/Linux tool file(1) is quite aware of many standard compressed file formats.
Example:
file returns the following for the above mentioned firmware file
firmware.bin: ARJ archive data, v6, slash-switched, original name: DTA.ARJ, os: MS-DOS
The next steps after the compression format has been discovered are obvious: obtain a version of the compression tool and use it to decompress the data. The result, however, often needs more reverse engineering. For example, the above-mentioned router firmware might contain separate sections for separate areas of the router's flash memory, each guarded with its own checksum.
A variant of using a well-known compression algorithm and tool can also sometimes be found, which is more difficult to reverse engineer. In such a case the file is prefixed with some additional data, and the actual compression format can't be identified by just checking the start of the file. Let's assume, for example, another fictitious SOHO router's firmware update file, which is built as follows:
Example:
Fictitious structure of another SOHO router firmware update file:
+--------------------------+
| Boot loader              |
+--------------------------+
| Decompression algorithm  |
+--------------------------+
| Compressed data          |
+--------------------------+
Of course, the format can only be known once the file format has been reverse engineered. So how is that done? Well, in the fictitious case we assume that an inspection with the Unix/Linux tool strings(1) reveals the following interesting strings in the file:
Example:
Abridged output of strings:
:
:
unknown compression method
invalid window size
incorrect header check
need dictionary
incorrect data check
invalid block type
invalid stored block lengths
too many length or distance symbols
invalid bit length repeat
inflate 1.1.3 Copyright 1995-1998 Mark Adler
oversubscribed dynamic bit lengths tree
incomplete dynamic bit lengths tree
oversubscribed literal/length tree
incomplete literal/length tree
oversubscribed distance tree
incomplete distance tree
empty distance tree with lengths
invalid literal/length code
invalid distance code
invalid distance code
invalid literal/length code
incompatible version
buffer error
insufficient memory
data error
stream error
file error
stream end
need dictionary
1.1.3
application.bin
:
:
The strings are very revealing, and those knowledgeable will recognize the name Mark Adler as one of the authors of zlib, which implements the same deflate compression used by Info-ZIP and GNU's gzip. Those not so knowledgeable might at least have the idea to search for the name and the keyword compression.
It is a good bet that at least parts of the file are deflate-compressed, as in ZIP. Further probing might reveal that the file does not contain a complete ZIP archive, but just a section which is compressed with the deflate algorithm and is supposed to be decompressed with the matching inflate algorithm (likely from zlib version 1.1.3, as the output of strings revealed). Therefore, the fictitious file might be further separated into its components by using a custom tool which iteratively applies the inflate algorithm to the file until the generated result makes some sense (e.g. until the result contains some recognizable clear-text strings).
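Such a custom tool might be sketched as follows, using Python's zlib module to try inflating from every offset in the file. The function name find_deflate_streams and the minimum output length are assumptions made for this example. Both zlib-wrapped and raw deflate data are tried; raw deflate has no header, so short false positives are expected and the results still need to be judged by eye.

```python
import zlib


def find_deflate_streams(data: bytes, min_output: int = 16) -> list:
    """Try to inflate from every offset; return (offset, kind, output) hits."""
    hits = []
    for offset in range(len(data)):
        # Positive wbits expects a zlib header; negative means raw deflate.
        for kind, wbits in (("zlib", zlib.MAX_WBITS), ("raw", -zlib.MAX_WBITS)):
            inflater = zlib.decompressobj(wbits)
            try:
                output = inflater.decompress(data[offset:])
            except zlib.error:
                continue  # not a valid stream at this offset
            # eof is True only if the end of the compressed stream was reached.
            if inflater.eof and len(output) >= min_output:
                hits.append((offset, kind, output))
                break
    return hits


payload = b"application.bin firmware image " * 4
blob = b"BOOT" * 8 + zlib.compress(payload) + b"\xff" * 5
for offset, kind, output in find_deflate_streams(blob):
    print(offset, kind, len(output))
```

The scan is quadratic in the file size, which is acceptable for small firmware images; for larger files it would make sense to limit the offsets tried.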
Unknown or homemade Compression Algorithms
If the software that either creates or reads the file is available, then it is very possible to reverse the file format. You can use live analysis of the running application while it reads or writes the file. Doing this is likely the easiest way to determine the data structure of the file.
If the software is not available, all bets are off if there is an unknown or homemade/ad-hoc compression algorithm, or a non-standard implementation of a known algorithm. One has to be exceptionally lucky to figure out the details of the applied algorithm so that the accompanying decompression algorithm can be constructed. Homemade encryption is a somewhat different matter: cryptologists strongly discourage the use of ad-hoc encryption schemes, as they typically do not stand up to serious cryptanalysis.
Sometimes additional information can be found. E.g. if a vendor has filed a patent application for a particular algorithm, or is known to have fallen in love with a particular compression technology in other products, e.g. communication protocols. Sometimes it might turn out that the file format actually belongs to some OEM or 3rd party product, and that information about that product is available.
Otherwise, there is a small chance that trial-and-error might reveal something about the file. e.g. run-length encodings are a popular, simple and easy to implement compression algorithm, so they can sometimes be found in homemade implementations. It might be worth a try to investigate if a file might be compressed that way. An investigation of a few other well known compression techniques might also be worth a try.
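To try out the run-length theory on a suspicious block, a throwaway decoder is usually enough. The sketch below assumes the simplest possible layout, pairs of (count, value) bytes; real formats vary widely (escape bytes, literal runs, 16-bit counts), so this is one guess among many. The function name rle_decode is made up.

```python
def rle_decode(data: bytes) -> bytes:
    """Decode (count, value) byte pairs; one of many possible RLE layouts."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out += bytes([value]) * count  # repeat the value byte count times
    return bytes(out)


print(rle_decode(b"\x03A\x01B\x04\x00"))
```

If the decoded output shows recognizable structure, such as text or long blank runs matching the expected content, the theory is worth pursuing.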
Last but not least, cryptanalysis techniques might reveal something interesting about the compression. E.g. recurring blocks of information might point to a particular compression algorithm. However, this requires a lot of effort, time and skill.
Legal Aspects
It is quite often the case that reverse engineering a software product teeters on the border between legal and illegal. Note that reverse engineering a competing car or a weapon is rarely legally challenged, nor was reverse engineering software a few decades ago. So as a reverse engineer, you should know your rights and the rights of the software owner. This chapter will focus on just that, exploring issues surrounding patents, copyrights, and licensed software. Even if you play by the rules, you are not immune to harassment lawsuits. (NB: The material here reflects the legal position in the USA. Other jurisdictions may have different laws.)
Patented Software
Explain the rights of the software owner under the patent law
Copyrighted Software
Copyright law imposes constraints that anyone who reverse-engineers must respect, particularly in open source projects. The common approach to this problem is to divide the programmers into two groups:
- The first group disassembles the code of the program or firmware and writes the specifications.
- The second group writes a program using only these specifications.
Fair Use
Under a few circumstances, fair use allows the reproduction of copyrighted material without the owner's permission. The Copyright Act of 1976, 17 U.S.C. § 107 states specifically:
“
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
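(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
”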
In terms of reverse engineering and fair use, the law tends to favor the reverser. However, negatively affecting the value of the original product will almost never result in it being categorized as "fair use." Also keep in mind that fair use does not permit breaking the user license terms.
It needs to be noted that fair use is not black and white. The line between fair use and copyright infringement is very gray. Unless you are very confident about what you are doing, you shouldn't do it.
Digital Millennium Copyright Act
The Digital Millennium Copyright Act was put into place in 1998 to outlaw any service or device whose purpose is to undermine or remove DRM (Digital Rights Management). The act forbids any service or device from being designed to circumvent, or even being marketed to circumvent, any DRM.
There is, however, an exception in the DMCA stating that reverse engineering can be done for the purpose of interoperability between software components.[1] It states the following:
“
REVERSE ENGINEERING.—
”
Fair use does still apply. However, it is not fair use to gain unauthorized access to copyrighted work.[2]
End User License Agreement
An end user license agreement (or EULA) is a legal contract between the software manufacturer and the user. It explains the terms under which the user may use the software, giving a list of conditions of what the user may and may not do. This contract can state anything from the number of copies that can be made to conditions under which it can be reverse engineered.
EULA and Fair Use
Fair use seems to be safe ground for reverse engineers, who almost always use it as a defense. However, an EULA is a legally binding contract. If a user agrees to terms which are in conflict with fair use, the user has effectively waived their rights to fair use.
In the case of Davidson & Associates v. Jung [3], Ross Combs, Rob Crittenden, and Jim Jung reverse engineered Blizzard's protocol language to allow gamers to play pirated video games online. In this case, the reversers had agreed to an EULA and TOU (Terms of Use) prohibiting reverse engineering. The judge found the EULA and TOU to be enforceable by law and that a user's right to reverse engineer a product can be contractually waived.
Famous Cases
When Nintendo came out with the Nintendo Entertainment System, they designed a program, the 10NES, to prevent unauthorized video games from working on the NES. In order to make an authorized game, you had to become licensed with Nintendo, and the license agreement essentially stated that a company could make up to five games per year and prevented them from selling the same games for other home entertainment systems.
Atari attempted to crack the 10NES to bypass the restrictive license agreement. In 1986, they purchased some NES units and started reverse engineering. By chemically dissolving the top layers of the chip containing the 10NES, they could use a microscope to physically look at the bits and recover some of the object code. The object code was then decompiled to source. However, Atari was unable to completely reverse the 10NES using this method.
In 1988, Atari requested a copy of the 10NES source code from the Copyright Office by falsely saying they were involved in an infringement lawsuit with Nintendo. After completely understanding the 10NES program, they built a program to defeat it. In 1989, Nintendo filed charges against them for unfair competition, patent infringement, copyright infringement, and trade secret violations.
One of Atari's defenses was that reverse engineering was fair use under the copyright law. In the end, the courts decided the act of chemically peeling back the chip and looking at the bits to get the object code on systems they purchased was fair use. It was expected that the courts would find Atari at fault for copyright infringement for stealing the source from the Copyright Office. However, in 1994, Atari and Nintendo settled out of court.
The case of Sega Enterprises Ltd. v. Accolade Inc. concerned Sega's video game console and cartridges. The cartridges contained a 20-25 byte code segment that the console interrogated as a security measure.
Accolade disassembled the code which was common to three different Sega games cartridges, to find the security segment, and included it in competing games cartridges.
The Ninth Circuit held this disassembly to be a permitted "fair use" of the copyright in the games' programs. The disassembly of copyrighted object code, as a necessary step in examination of the unprotected ideas and functional concepts embodied in the code, is a fair use that is privileged by section 107 of the Copyright Act: because disassembly was the only means of gaining access to those unprotected aspects of the program, and because Accolade has a legitimate interest in gaining such access (to determine how to make its cartridges compatible with the Genesis console).
Jon Johansen Case
Give a description of this case
Further Reading
- "The Law and Economics of Reverse Engineering", Pamela Samuelson and Suzanne Scotchmer, Yale Law Journal 111, May 2002, 1575-1663.
References
- Digital Millennium Copyright Act, Public Law 105–304 (1998)
- "The Digital Millennium Copyright Act of 1998" Copyright Office Summary (December 1998)
- "Davidson & Associates v. Jung, 422 F.3d 630 (8th Cir. 2005)"
- "Atari Games Corp. v. Nintendo of America Inc." U.S. Court of Appeals (September 1992)
- "Sega Enterprises Ltd. v. Accolade Inc." U.S. Court of Appeals (October 1992)
Mac OS X
Apple Computer's Mac OS X is the standard Operating System used on Apple Macintosh computers. Other operating systems, primarily Linux, have been ported onto Mac Hardware, and there has been some effort to port OS X onto non-Mac Intel-based hardware, but neither of these efforts has attained the kind of popularity that the "standard bundle" has attained.
Mac OS X has been critically acclaimed by many people in the computer world as being both beautiful and easy to use. OS X is built on a BSD and Mach core but has a certain amount of software that is Mac-specific.
Try hard to keep this on the subject of general reverse engineering for Mac OS X, and not on 'cracking', or reversing only for security purposes. I have created special sections for these subjects, and all material focused on them should be kept there. Thanks! --Macpunk 04:17, 9 July 2007 (UTC)
Hardware Architecture
Before OS X, Macs ran the Mac OS operating system on the Motorola 68000 through 68040 processors and later on the PowerPC architecture. Steve Jobs, meanwhile, had left Apple to create NeXT. After Apple had completed its hardware migration to the PowerPC platform, it looked for a new kernel that could take advantage of the new hardware architecture. Many projects were started and failed, and this and other factors contributed to Apple's decline. To capitalize on the new architecture, Apple first tried to purchase BeOS from Be Inc., but the deal fell through because Be Inc. wanted too much money. Apple then turned to NeXT and acquired not only the NeXT OS but also Steve Jobs. Jobs quickly took control of Apple and made the NeXT architecture the replacement for Apple's aging Mac OS. The replacement product was originally known as Rhapsody, which had the feel of the older Mac OS. Jobs felt that the interface did not do it justice, so his team of ex-NeXT engineers developed Aqua, and Mac OS X was born.
Mac OS X 10.0 "Cheetah" through 10.4.3 "Tiger" would only run on the 4th and 5th generations of the PowerPC architecture. It became clear to Apple that IBM was having trouble with the 5th generation of the PowerPC, known as the PowerPC G5, in both development and manufacturing. In addition, IBM had yet to release a laptop version of the G5 processor a year after it had promised Apple it would. Apple then decided to migrate away from the PowerPC architecture to an Intel-based one. Apple chose the Intel 32-bit Core Duo architecture. Apple's second generation of Intel products appeared less than a year later, running the Intel 64-bit Core 2 Duo architecture.
Apple originally included a Trusted Platform Module (TPM) to help curb piracy of Mac OS X. Later, Apple turned to a simple AES encryption system in which the encryption keys were stored in a kernel device driver. This made it possible to decrypt and even encrypt Mac OS X executable binary files. The TPM is no longer present in any modern Mac. The new encryption system is only available on Intel-based Macs and yields all sorts of errors if attempted on the PowerPC platform.
Apple has committed to supporting both PowerPC and Intel platforms for the next few years. Every Mac OS X system today ships with its binary files in the Universal binary format, which can be run on both PowerPC and Intel-based Macs. A Universal binary is simply the source files compiled multiple times (once for each architecture) and then glued together afterwards. When the OS reads a Universal binary, it selects the proper version of the compiled code and executes it. Since not all binary files are Universal, Apple released a software component for the Intel platform called Rosetta, which dynamically translates PowerPC instructions into Intel instructions, allowing a PowerPC binary to be executed on an Intel Mac.
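To make the "glued together" layout concrete, here is a minimal sketch in C that walks the table at the front of a Universal binary. It assumes the layout declared in Apple's <mach-o/fat.h>: a big-endian magic number 0xCAFEBABE, a slice count, and then one 20-byte record per architecture (cputype, cpusubtype, offset, size, align). The names slice_info, read_be32 and parse_fat_header are made up for this illustration and are not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_BE 0xCAFEBABEu  /* same value as FAT_MAGIC in <mach-o/fat.h> */

/* One slice of a universal binary (illustrative struct, not Apple's fat_arch). */
struct slice_info {
    uint32_t cputype;   /* e.g. 7 = i386, 18 = PowerPC */
    uint32_t offset;    /* file offset of the embedded Mach-O image */
    uint32_t size;      /* size of that image in bytes */
};

/* Read a 32-bit big-endian value regardless of host byte order. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Parse the fat header at the start of buf. Fills up to max_out entries
 * and returns the number of slices declared by the header, or -1 if the
 * buffer does not start with the fat magic or is too short for the table.
 */
int parse_fat_header(const unsigned char *buf, size_t len,
                     struct slice_info *out, size_t max_out)
{
    assert(buf != NULL);

    if (len < 8 || read_be32(buf) != FAT_MAGIC_BE)
        return -1;

    uint32_t count = read_be32(buf + 4);
    if (count > (len - 8) / 20)        /* each record is five 32-bit fields */
        return -1;

    for (uint32_t i = 0; i < count && i < max_out; i++) {
        const unsigned char *rec = buf + 8 + (size_t)i * 20;
        out[i].cputype = read_be32(rec);       /* cputype */
        out[i].offset  = read_be32(rec + 8);   /* cpusubtype at rec + 4 is skipped */
        out[i].size    = read_be32(rec + 12);
    }
    return (int)count;
}
```

Given the first bytes of a file, parse_fat_header reports how many architectures are present and where each embedded image starts, which is roughly the information the loader needs in order to pick a slice.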
Software Architecture
All builds of Mac OS X (OS X) are built on top of an XNU kernel and Mach-O file format. The XNU kernel is a Hybrid kernel. The kernel is divided into 4 sections.
Kernel Sections
- The Hardware Platform Expert
- The Mach 3.0 Subsystem (OSFMK)
- The BSD 4.4 Subsystem
- The IOKit Subsystem and Framework
While the traditional Mach kernel is a Microkernel, Apple has instead implemented its variation of Mach 3.0 with a Monolithic design. The Mach subsystem is only a partial implementation of the Mach 3.0 kernel that was designed by Carnegie Mellon University. This partial implementation consists of the Mach Messaging system, Mach Virtual Memory System and Mach Process Manager.
The BSD 4.4 Subsystem is a micro implementation of the FreeBSD 4.x kernel. Over time Apple has been shrinking the feature set of this kernel subsystem in the hope of eliminating all but the pieces essential for running BSD source code software on the XNU kernel. Originally the subsystem had support for BSD device drivers, which could communicate directly with hardware. Unfortunately, the device driver architecture in the BSD Subsystem is now only able to support direct main memory access and to interface with the user-mode part of a running process.
The IOKit Framework is written in a subset of the C++ programming language known as Embedded C++. The IOKit Subsystem drives the components written with the IOKit Framework. IOKit's purpose is to unify and simplify the driver architecture while maintaining some level of compatibility between major and minor OS releases. IOKit has generally been a resounding success, as some other BSD operating systems (such as DragonFly BSD) have ported it or implemented an IOKit-like system.
The Hardware Platform Expert deals with the hardware differences of the PowerPC (G3, G4 and G5), Intel 32-bit, Intel 64-bit, Intel Xeon 64-bit (Mac Pro and XServe) and ARM (iPhone) architectures.
Commonly Used Tools
The common tools used both to compile/create software and to disassemble/debug it have been titled by Apple the "Developer Tools". The developer tools can be found both on the Installation DVD for Mac OS X 10.4 and higher and on the Apple Developer Connection (ADC) site. Joining ADC is free and is highly recommended. The ADC site has up-to-date documentation, tools and even sample source code. The ADC site should be the first place you do your research. A summary of the developer tools can be found at Apple's official XCode And Tools website. The tools commonly used on the Mac OS X platform for reverse engineering, besides the developer documents, are found in the list below.
Developer Tools Used
- gdb (GNU Debugger)
- nm (Object File Symbol Table Viewer)
- otool (Object File Display Tool)
- fs_usage (File System Monitoring Tool)
- lsof (File Descriptor Table Viewer)
- vmmap (Virtual Memory Regions Viewer)
- lipo (Universal Binary Handler)
- file (Binary File Format Analyzer)
All of the above tools are installed during the Developer Tools installation. As of this writing (3 Aug 2008) the current Developer Tools version is 3.1 (Build 2199).
Third party tools:
- [1] class-dump is useful for parsing Objective-C runtime information.
Reversing Basics
Architecture
Since most target binaries that you wish to reverse engineer on the Mac OS X platform are in the Mach-O Universal Binary format, you should first decide which of the target binary's architectures you wish to reverse engineer. To get a list of the formats a specific binary contains, run the "file" program. Example:
A common example using the file "/bin/ls":
    $ file /bin/ls
    /bin/ls: Mach-O universal binary with 2 architectures
    /bin/ls (for architecture i386): Mach-O executable i386
    /bin/ls (for architecture ppc7400): Mach-O executable ppc
Another example, this time more of a rare one, using the file "/System/Library/Frameworks/ApplicationServices.framework/ApplicationServices"
    $ file /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices
    /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices: Mach-O universal binary with 4 architectures
    /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture ppc7400): Mach-O dynamically linked shared library ppc
    /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture ppc64): Mach-O 64-bit dynamically linked shared library ppc64
    /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture i386): Mach-O dynamically linked shared library i386
    /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture x86_64): Mach-O 64-bit dynamically linked shared library x86_64
Symbols
Once you have identified the architecture you wish to use as your base for reverse engineering, the next step is to dump the symbol table, which will be a handy reference later on. Example:
Common symbol table dump from the i386 architecture:
    $ nm -arch i386 /bin/echo
             U ___error
    00001000 A __mh_execute_header
             U _exit
             U _malloc
             U _strerror$UNIX2003
             U _strlen
             U _write$UNIX2003
             U _writev$UNIX2003
The above symbols can be broken up into 2 major categories:
Symbol Types
- External
- Internal
There is a 3rd category of symbols, called "hidden" or "stripped" symbols. These symbols do not show up in nm output, so it is hard to tell what they do or whether they exist at all.
Each symbol type has a scope. The scope can be either private or public. In the past you could set the dynamic linker to a "flat namespace", which would convert the private symbols to public for your program only; however, it has been reported that this functionality has been disabled on most libraries.
A private symbol is a symbol that is addressable by either the entire program or a section of the program only and can not be addressed by anyone else. A public symbol is one that is commonly known on other platforms as "Exported". The public symbols can be accessed by anything that links to that binary either at compile time or runtime.
Internal Symbols
Internal symbols are symbols that are defined within the program and thus are not imported (dynamically linked) at runtime. An internal symbol can, however, be an external symbol that was linked in at compile time from an object file or a static library. You can identify an internal symbol quickly because its line has a hexadecimal number before the symbol type letter. External symbols have a blank space where the number should be. The number in the symbol table denotes the offset in the file at which that symbol's code or data starts. This value is relative and WILL be different at runtime. One way to locate a symbol in memory at runtime is to take the on-disk positions of 2 symbols, the 1st a well-known symbol and the 2nd an unknown one, and compute the difference between them. Once you have the difference, you can find the 2nd symbol in memory by simply applying the difference to the 1st symbol's address. Example:
Example
Find the 1st and 2nd symbols:
    $ nm /System/Library/Frameworks/QTKit.framework/QTKit
    /System/Library/Frameworks/QTKit.framework/QTKit(single module):
    {...}
    0005a638 T _copyBitmapRepToGWorld
    0008b017 t _createDisplayList
    {...}
In the above example our 1st symbol is "_copyBitmapRepToGWorld", which in a program is known as "copyBitmapRepToGWorld". Our 2nd symbol is "_createDisplayList", which is unknown to a program since it's a private symbol (see private symbols). Once the function definition for the symbol "_createDisplayList" has been determined, it becomes important to define that symbol for your program's use. To do this, let's assume that the C function prototype of "_createDisplayList" would be:
void * createDisplayList(void);
The above prototype would be defined in the source code for QTKit, which is our target. That unfortunately doesn't help us, since both the function prototype and the symbol name are unknown to our program. To resolve this problem we simply compute the difference between the above symbols (0x8b017 - 0x5a638 = 0x309DF) and define our function prototype like this:
void * (* createDisplayList)(void);
Then you would assign that function its address by having another function, (such as main), execute this command before you use that function for the 1st time:
/* C does not allow arithmetic on function pointers, so go through an integer (uintptr_t from <stdint.h>) */
createDisplayList = (void *(*)(void))((uintptr_t)copyBitmapRepToGWorld + 0x309DF);
Some programs can get away with doing the above in 1 command outside of a function. I would NOT recommend this, as the Mac OS X dynamic linker dyld will sometimes change the value of the symbol address before you enter your main function but after the variables' initial values have been defined.
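The arithmetic behind this technique can be sketched as a small helper. This is only an illustration under the assumption stated above, namely that both symbols live in the same loaded image, so their distance on disk equals their distance in memory. The name resolve_by_offset is hypothetical, and addresses are handled as uintptr_t integers because C does not define arithmetic on function pointers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the runtime address of a known (exported)
 * symbol and the nm file offsets of both symbols, return the runtime
 * address of the unexported one. Unsigned wrap-around makes this work
 * even when the target sits below the known symbol.
 */
uintptr_t resolve_by_offset(uintptr_t known_runtime_addr,
                            uintptr_t known_file_off,
                            uintptr_t target_file_off)
{
    assert(known_runtime_addr != 0);
    return known_runtime_addr + (target_file_off - known_file_off);
}

/*
 * Usage sketch with the QTKit symbols from the text (not compiled here,
 * because it needs the real framework):
 *
 *   createDisplayList = (void *(*)(void))
 *       resolve_by_offset((uintptr_t)copyBitmapRepToGWorld,
 *                         0x5a638, 0x8b017);
 */
```

Keeping the two nm offsets as separate arguments, rather than hard-coding 0x309DF, makes it easier to update the values when a new build of the library moves the symbols.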
External Symbols
External symbols are symbols that are defined elsewhere, such as in a library (see library below). To read an external symbol you simply strip the leading "_" off. If the symbol has a "$" in its name, then everything past the 1st "$" is a hint to the dynamic linker that this symbol is an explicit external symbol and should be matched with that exact version of the symbol in the external library. An explicit symbol is very helpful for a program's creator, since it makes it difficult to override the symbol or to trigger a runtime link mismatch error. The letter to the left of the symbol name (in the above example "U") denotes the type of symbol, such as function or data structure.
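The naming rule described here (drop one leading "_", treat everything after the first "$" as the version hint) can be sketched in C as follows. The function name split_symbol and its buffer-based interface are assumptions made for this illustration; the rule itself is taken from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split an nm-style name such as "_write$UNIX2003" into the C-level name
 * ("write") and the text after the first '$' ("UNIX2003").
 * name_out and hint_out are caller-supplied buffers of cap bytes each;
 * hint_out becomes "" when the symbol has no '$'.
 * Returns 0 on success, -1 if either part does not fit in cap bytes.
 */
int split_symbol(const char *sym, char *name_out, char *hint_out, size_t cap)
{
    assert(sym != NULL && name_out != NULL && hint_out != NULL && cap > 0);

    if (sym[0] == '_')              /* drop the single leading underscore */
        sym++;

    const char *dollar = strchr(sym, '$');
    size_t name_len = dollar ? (size_t)(dollar - sym) : strlen(sym);
    const char *hint = dollar ? dollar + 1 : "";
    size_t hint_len = strlen(hint);

    if (name_len >= cap || hint_len >= cap)
        return -1;

    memcpy(name_out, sym, name_len);
    name_out[name_len] = '\0';
    memcpy(hint_out, hint, hint_len + 1);   /* includes the terminating NUL */
    return 0;
}
```

Only one underscore is removed, so a symbol listed as "___error" maps back to the C identifier "__error".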
PowerPC
Basic instructions include li (load immediate) and mr (move register).
The Stack
The PowerPC stack works like any other stack. It's a LIFO structure, and it grows downwards (towards lower memory addresses). The most important detail to remember when reversing PowerPC binaries is that the PowerPC chip has no built-in implementation of a stack. There's no register dedicated to keeping track of where the bottom of the stack is, and there are no instructions to push and pop data off of the stack. Everything is done via a general purpose register and various arithmetic instructions.
(This section will contain PowerPC specific information like how PowerPC function calls are executed, how arguments are passed to functions, the stack format, et cetera.)--Macpunk 06:19, 8 July 2007 (UTC)
Intel
(This section will contain Intel specific information like how Intel function calls are executed, how arguments are passed to functions, the stack format, et cetera.)--Macpunk 06:19, 8 July 2007 (UTC)
This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.
Reversing for security
This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.
Reversing for 'cracking'
This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.
Further Reading
- Wikibooks: PowerPC Assembly
- A Brief Tutorial on Reverse Engineering OS X [2]
- Cocoa Reverse Engineering [3]
- KellogS' Intro to OS X Reversing [4]
- A Non Practical & Non Real World Intro to Kracking for Mac OS X [5]
- What is Mac OS X?[6]
Special Notes
A large section of this document has been prepared and written by JosephC7. While this information has been granted to Wikibooks for free publication, the author asks only that, if you republish it, you provide the author's user name and a link to his user page on wikibooks.org. No fee is required or requested for this information, and it is expected that if this information is republished it will likewise be given away freely, without compensation. This document is a work in progress and should be completed by the end of Aug 2008.
Other Compilers
Stack Overflows
Frequently we hear about malicious code causing a very vague problem called a stack overflow. This page is going to talk about what a stack overflow is, and how to prevent it.
What It Is
A stack-based overflow attack is the act of putting too much information into a buffer in order to overwrite a return address and hijack the control flow. The overwritten return address will, in most cases, point to some function in the program's address space. This function may already be defined in the application, or it can be supplied by the attacker by injecting code into the stack.
If we remember the chapter on the stack, we know a few fundamental facts about the stack when we enter into a new function:
- The stack "grows" downward.
- Local data is pushed on top of the stack.
- The old value for bp is stored below the local data
- The return address is stored below the old bp value
Consider the following buggy C code snippet:
    void MyFunction(void)
    {
        int a[100];
        int i;
        for(i = 0; i <= 100; i++)
        {
            a[i] = 0;
        }
        ...
What happens when i reaches 100? As discussed earlier we know that local arrays are created on the stack. If we try to write above the upper bound of "a", we will be overwriting the previous value on the stack: a[100] overwrites bp, a[101] overwrites the return address.
The program flow will then be redirected to the new address we placed. This is a stack overflow vulnerability, and it stems from bad programming where the programmer doesn't check the array bounds before writing data to the array.
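For comparison, here is one way the loop could be written so that it never leaves the array. This is a minimal sketch rather than the only possible fix; ARRAY_LEN, zero_ints and my_function_fixed are names invented for this example. The essential change is the loop condition: the last valid index of a 100-element array is 99, so the test must be i < 100, not i <= 100.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not valid for pointers). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Zero n ints starting at a; the caller guarantees a has at least n elements. */
void zero_ints(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)   /* '<', not '<=': last valid index is n - 1 */
        a[i] = 0;
}

/* Corrected counterpart of MyFunction from the text. */
int my_function_fixed(void)
{
    int a[100];
    zero_ints(a, ARRAY_LEN(a));      /* the bound is derived from the array itself */
    return a[99];                    /* last element, still inside the array */
}
```

Deriving the bound from the array with sizeof means the loop stays correct if the declaration is later changed to a different size.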
Spotting a Vulnerability
How do reversers spot a stack overflow vulnerability? Let's take a look at some example ASM code:
    push ebp
    mov ebp, esp
    sub esp, 100
This is a standard entry sequence, and we can see that this function is allocating 100 bytes of data on the stack. Either 25 integers worth of data, or an array of some sort. We examine the rest of the function, and see what kind of data it is:
    call _gets
    push eax
    push esp
    call _strcpy
    ...
Clearly we are accessing the data on the stack as an array, specifically an array of chars. The above assembly code fragment gets a text string from the console, and copies that data into the local variable on the stack.
Unfortunately the standard C library string functions we are using have a well-known vulnerability: they do not check the bounds of the input arguments. In fact, the <string.h> functions rarely even ask the programmer to supply the size of an array, or the maximum available memory size!
Some of the most common stack vulnerabilities stem from this fact. Offenders to look out for are strcpy, strcat and sprintf, functions whose output strings can be larger than the buffer supplied to hold them.
The local variable is only 100 chars (1 char = 1 byte) wide. What happens if we input a string 100 characters long? Remember, ASCIIZ strings are terminated by a null char (00h), which requires an extra slot in the array. That means the 101st char will be a null byte written just past the end of the buffer, and part of the saved value for ebp will be overwritten. Now imagine what would happen if we input 104 characters, or even 108 (enough to overwrite the return address). An attacker who inputs just the right values can redirect program execution to a malicious function that may help take over the computer.
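A defensive version of the same operations passes the buffer size to every call that writes into the buffer. The sketch below uses the standard fgets and snprintf functions, both of which take a size argument; the wrapper names copy_bounded and read_line_bounded are made up for this example and are not library functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst without ever writing more than dst_size bytes.
 * Returns 1 if the whole string fit, 0 if it had to be truncated.
 */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    int needed = snprintf(dst, dst_size, "%s", src);  /* always NUL-terminates */
    return needed >= 0 && (size_t)needed < dst_size;
}

/*
 * Read one line from in into buf. fgets() stores at most buf_size - 1
 * characters plus the terminator, unlike gets(), which takes no size at all.
 * Returns buf, or NULL on end of file or error.
 */
char *read_line_bounded(char *buf, size_t buf_size, FILE *in)
{
    assert(buf != NULL && in != NULL && buf_size > 0 && buf_size <= INT_MAX);
    if (fgets(buf, (int)buf_size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline, if any */
    return buf;
}
```

With a 100-byte local buffer, a 200-character input is cut down to 99 characters plus the terminating null byte, so the saved ebp and the return address are never touched.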
Further Reading
"Smashing The Stack For Fun And Profit", Aleph One, Phrack, 7(49), November 1996.
Terminology
"Hackers"
Hacking is a term used in popular culture to describe malicious activities of computer users. The movie Hackers was a large influence on bringing the term into common use by romanticising the Hacker as an idealistic youth seeking freedom from tyranny.
There are some fantastic books that help to explain what a real hacker is like:
- Hackers, by Steven Levy
- The Devouring Fungus, by Karla Jennings
- Free as in Freedom, by Sam Williams
- Just for Fun, by Linus Torvalds
- The Cathedral and the Bazaar, by Eric Raymond
- Code Book, by Simon Singh
- In The Beginning... Was The Command Line, by Neal Stephenson
- the cluetrain manifesto, by Rick Levine, Christopher Locke, Doc Searls, David Weinberger
This wikibook hopes to shed some light on what hackers really do, and who they actually are.
Hackers are people who enjoy playing around with computers to make things happen. This often involves circumventing some security aspects of operating systems or applications in order to gain privileged access.
The first chapter is one of the most important chapters to read. Here the term Hacking is defined, revealing some insight into what hacking really is.
The second covers the history of computing and hackers. This might help correct the false impressions propagated by news media.
The 'hacking-culture' follows up next and 'finally' the real thing is assessed. (Note: These methods are illegal if used wrongly, yet the method to prevent or 'cure' this 'attack' is given as well to remain as objective as possible).
I would appreciate it if anyone posts stuff which will help the world to deal with security issues and how to 'deal' with hackers (mostly crackers and scriptkiddies).
The Jargon File, or New Hacker's Dictionary, defines the term hacker quite nicely. It also does an exceptional job of pointing out that one does not need to be affiliated with computers at all to be considered a hacker.
The excerpt from The Jargon File:
hacker: n. [originally, someone who makes furniture with an axe] 1. A person who enjoys exploring the details of programmable systems and how to stretch their capabilities, as opposed to most users, who prefer to learn only the minimum necessary. RFC1392, the Internet Users' Glossary, usefully amplifies this as: A person who delights in having an intimate understanding of the internal workings of a system, computers and computer networks in particular. 2. One who programs enthusiastically (even obsessively) or who enjoys programming rather than just theorizing about programming. 3. A person capable of appreciating hack value. 4. A person who is good at programming quickly. 5. An expert at a particular program, or one who frequently does work using it or on it; as in ‘a Unix hacker’. (Definitions 1 through 5 are correlated, and people who fit them congregate.) 6. An expert or enthusiast of any kind. One might be an astronomy hacker, for example. 7. One who enjoys the intellectual challenge of creatively overcoming or circumventing limitations. 8. [deprecated] A malicious meddler who tries to discover sensitive information by poking around. Hence password hacker, network hacker. The correct term for this sense is cracker.
Maybe it is helpful to note that the people that program the Linux kernel are called "Linux hackers".
A brief History of Hackers
As long as there have been computers, there have been people to 'hack' them. But this activity really hit the headlines when the Internet arrived. Yet history teaches us that this wasn't an evil thing at all: hackers actually 'maintain' the Internet as it should be. It is hard to imagine the computer and Internet community growing so large without people working at the cutting edge of technology, hacking the Internet's way forward. Just imagine how it would be if there weren't any hackers... Most technology you use today would not exist. Ask yourself: Would your computer be this powerful if it hadn't been pushed to the edge? Would software be as reliable as it looks? Would there be a spiral of cutting-edge innovations?
You can answer all these questions with no. It might seem strange to think positively of hackers, but know that much is done on the edge of society (mostly not in the middle).
The Hacker-Culture
Advanced computer users describe themselves as hackers; those who use their skills for malevolent purposes are termed "crackers". The term crackers implies breaking things, in the sense of cracking the integrity of a computer system; and they work through cracks in security, like climbing through a crack in a wall; but their main act is breaking into computers, for example by figuring out the root password, which is like cracking a safe at a bank. This means that crackers break the law, yet this isn't enough to give in-depth information about the hacker-culture.
In this chapter the main hacker-personalities will be described, in a rather unusual way: the media is used to get to know the real group. This means that you'll be able to understand that some of these people are certainly not worthy of the name hacker.
Terminology
Hack is an onomatopoeic verb describing the noise and actions of chopping at something with a blade (e.g.: He hacked away at the underbrush with a machete), or a particularly nasty cough (e.g.: The chainsmoker hacked up some brown phlegm), but which also came to describe the act of typing on a typewriter, for the same reason (the annoying, incessant HACK HACK HACK, Ding! CRASH! HACK HACK HACK, &c).
From there it became associated not only with the action itself, but also those doing the typing. For example, a "hack"--a bad writer/journalist--would "hack out" a poorly researched or unoriginal story on his typewriter. While the less noisy tapping of computer keyboards began to replace the harsh noise of the typewriter, the old terminology was carried over to the new technology. Thus, the original "hackers," (long before the PC or word processors) were merely called that because they would spend their days "hacking away" at their console keyboards, writing code.
(Note: when computing was in its infancy and console time at the giant mainframes was scarce, programmers would often hand-write code or type it out on typewriters before they manually plugged it into the machine. Also, the earliest consoles were basically automated typewriters.)
Nowadays, a hacker is, within the software development community, any skilled programmer, especially among open-source developers. A hack in turn, is a quick-and-dirty patch, fix, or utility which may not be well documented or necessarily reliable, but which gets the job done, whatever that job may be.
Crackers are skilled programmers who exploit the limitations of computer networks, and write up cracks--malicious hacks--to automate the dirty work. These hacks/cracks may attempt to break into remote machines (e.g. "He hacked into the school's server to increase his phys-ed grade."), crack passwords (the most useful utility for this is simply called Crack), decrypt data, or simply modify proprietary software so the cracker doesn't have to pay for it (e.g. "He downloaded a cracked version of Dreamweaver, because he couldn't afford to buy it.").
Viruses, trojans, and worms are also hacks/cracks of a sort. Crackers are especially fond of worms, which spread without user interaction and can be used to create giant, distributed supercomputers that can then be used to attack other computers (the Code Red worm used the combined power of infected computers to flood the White House web server, making it inaccessible to regular users for a time).
When a hacker finds a security hole in the software they're using, they hack out some code to patch it. When a cracker finds a hole, they hack out code to exploit it, ideally bringing remote computers under their control.
Script-kiddies use hacks and cracks created by real programmers, but they use their software without really understanding the code that's doing the work. They are generally trying to just impress their friends.
As indicated above, not all hacks are "malware" (malicious code). User-created shell scripts and batch files that automate tasks (like workstation startup, permission settings, data backup, &c.) are hacks too. Hacks are tools. They are hacked out to make someone's life easier, but not necessarily yours.
TV news and, of course, the movie "Hackers" brought the label hacker to the attention of the wider public, but failed to acknowledge its broader meaning, instead using it as a buzzword, so as not to confuse the less informed members of their audience ("Computer programs are written by people? Like books? I thought other computers made them!").
Others have latched on to the grit, glamour, and rebellion the buzzword hacker evokes; thus, they think of hacking as something of a religion. But, in short, it's nothing more than playing around with computer code. You don't even need to be connected to the internet to do some hacking; just learn a programming language.