Reverse Engineering for Beginners

Welcome!  This page will serve as a writeup on the reverse engineering online workshop by Ophir Harpaz.  Background info on the workshop can be found on its about page.

There is no official write-up yet as of me writing this, so hopefully any beginners can come here and get some hints if they are stuck, or get explanations to the answers.

Leggo!

Three Preparation Assignments

Before we jump right in, there are three preparation assignments whose purpose is to familiarize the user with x86 assembly, the call stack, and tools such as debuggers and disassemblers. 

Assignment 1

  • Read all the material and go thru the x86 tutorial: here.
  • Exercises
  1. What is foo in the following example? How much space does it occupy in memory?
.data
   foo DW 1,1,2,3,5

Answer: Here foo is a data declaration, preceded by the .DATA directive. The DW specifies a double word, two bytes of data.  Following is a list of integers 1,1,2,3, and 5.  Consecutive declarations in a list represents an array.  So, here foo is a label for an array of integers, each taking up two bytes in memory.  The total space it occupies is 1o bytes.

 

2. What is the value stored in EAX by the end of this code?

mov eax,0x2
mov ebx,eax
shl eax,0x2
add eax,ebx
and eax,0x8

Answer: Let’s go line-by-line.

0x2 is moved into the eax register

     eax = 2

The value in eax is moved into ebx

eax = 2, ebx = 2

eax is shifted left by 0x2; in binary, move each bit left by two spaces and pad the right-most two bits with 0s.

This is effectively multiplying the value by two, each shift. 

0000 0010 shifted left by two is 0000 1000; eax = 8

add ebx to eax, and store result in eax

eax = 10, ebx = 2

Perform a bit-wise AND with eax and 0x8

1010 AND 1000 is 1000

eax = 8

 

Bonus: what should be the value of EAX at the beginning of the following code, such that by the end of it, EAX = 0?

xor eax, eax

 

Answer: The XOR operation sets the resultant bit to 1, if and only if the bits from the operands are different. So, anything XOR itself is 0.

 

Assignment 2

This assignment goes over the call stack and using a debugger.

Riddles

Which register holds the address of the top of the stack?

Answer: The ESP register points to top of the stack

Suppose the function multiply(x, y) creates 2 local variables – tmp1 and tmp2 – during its execution. What instruction do you expect seeing that allocates memory for these variables?

Allocating space for local variables is done by subtracting from stack pointer.  Assuming each variable is 4 bytes (DW), we would expect to see sub esp, 0x8.

In what order will multiply’s parameters, x and y, be pushed onto the stack before the call to multiply?

When a function is called in assembly, the parameters passed to the function are pushed onto the stack first.  Usually (depending on the calling convention), they are pushed in reverse order: first y, then x.

How can each of them be accessed during the function execution? Write the exact expressions.

Order of events: Parameters are pushed to stack, then return address is pushed, and finally EBP is set to be the top of the stack.  EBP is a reference point that doesn’t change while in the function’s scope and so offsets are described in relation to EBP.   Common convention for the stack is high to low.  Older items pushed to stack are at higher memory addresses than newer items.  So, EBP + 4 is the return address…

Y can be accessed by EBP + 0x8, and X can be accessed by EBP + 0xC

The offsets are represented in hex; remember 8 + 4 = 12, but represented as C in hex.

Assignment 3

This assignment goes over IDA disassembler. 

Riddles

Figure out whether your computer is running a 32-bit or a 64-bit version of windows.

Answer: Who uses 32-bit Windows anymore? Unless you have very specific legacy software that needs to be supported… but then you can still run that software on 64-bit Windows.

 

What does the following code print? If you have no idea what these percent-signs stand for, take a look here.

int num = 2;
int *ptr = #
char c = ‘b’;
char *ptr2 = &c;
char mid[] = “|| !”;
printf(“%d %c %s %d %c”, *ptr, c, mid, num, *ptr2);

Answer: 2 b || ! 2 b  (to be or not to be  – Shakespear or something)

The printf statement’s first parameter is a string that has 5 expressions separated by spaces.  Each expression has a %-sign and a letter.  These are placeholders – %d represents digit, %c – character, and %s – string.

The rest of the parameters are (in order) what replaces the placeholders. 

*ptr is the value at the address that ptr holds.  In this case, ptr holds the address of num, and the value of num is 2, so *ptr = 2.

c is self-explanatory, it is just the letter ‘b’.

Great! We just completed the introduction/background material needed for the workshop!

Sessions

Session 1

This slideshow contains a deep overview on x86 Assembly. Please read through it and make sure you understand.

Session 2

This slideshow contains a short introduction to the IDA disassembler along with some tips and tricks. Follow the slides with your IDA open.

Session 3

The Playground session consists of 2 exercises:
Password and Good_Luck.

Download each exercise and try to understand (using IDA) what the program does and what input it expects.

Check yourself with the quiz below.

 

 

Password Exercise

Download the password program and run it:

So we just have to reverse the program and find the password.  Easy!

Let’s fire up IDA and examine this binary.  The first thing I do is search for strings.  This subview can be toggled with Shift+F12.  Here is a list of strings that are found in the program.

Just by examining the list, we can infer that the correct password is ‘cr4ckm3’.  But let’s be sure by examining the disassembled code.  Double-clicking on the ‘Enter password:’ string brings us to the place in memory the string is stored.  This view shows us any cross references to the string.

Double clicking on the DATA XREF: sub_401096 will take us to that function.  Here we have a graph view of the function, separated into blocks, and organized by program flow.

Three blocks are shown, with one in the center top, and two on the bottom.  The left block is pointed to by a red arrow.  This arrow signifies the program will go to the left block if the jump statement before it is false (red), and execution will continue in the right block if the jump statement is true.

The top block has the string “Enter password: “, which is what we saw in the program.  Now we can infer that there are two options after we input the password.  It can either be correct, or wrong.  What signifies a correct input though?  At the bottom of the center block, a string “cr4ckm3” is pushed onto the stack, along with Str1, and then strncmp is called.  This looks like the program compares Str1 with “cr4ckm3”.  If they match, the program will execute the left block.

Great! We understand the program a little better now.  We have high confidence that the password is “cr4ckm3“.

To be sure, launch the program and type in cr4ckm3.

Boo-ya!

Good_Luck Exercise

The Good_Luck program has no input or output.  Let’s load it up in IDA and view the strings like before.  We have two clues: “Very correct!”, and “Hmm nope.”.  These seem like the responses to the first exercise contingent on the password.

Just like before, click on the cross reference to the string and follow it to the function.  This isn’t as easy as the first exercise – the password isn’t in plain text.  But we can see that the third block down from the top has two arrows leading into two blocks.  So somewhere before these two blocks determines whether the password is correct or not.

Let’s start examining from the top.

The first block (start of function) does a comparison, then jumps to one of two possible blocks: the block under it, or the block at the bottom of the window. That bottom block looks like function cleanup and a return statement.

What is the first block checking for? If it doesn’t find what its looking for, then the function returns and the program exits.

The function is checking to see if two command line arguments are provided. The first argument is the name of the program, and I’ll assume the second argument is the password.

Let’s pass another parameter to the program; we expect an output now.  Well, looks like the program just ignores alphabet input and only accepts integers.  We are getting somewhere!

The next block moves memory address [ebp+arg_4] into eax register.  Then pushes [eax+4] onto the stack and calls atoi.  What just happened?

Recall that we push function arguments onto the stack before a function call. Atoi is called, which converts a string to an integer. This converts the number we passed along with the program name on the command line from a string into a number the computer can use.

Then we cleanup that argument from the stack by adding 4 to esp.

Eax is tested against itself; this makes sure that the return value of atoi is not 0.  If it was, then something error’d out and then the program jumps to the end.  Otherwise, if the atoi conversion went successfully, we continue.

Examining the next block, we encounter a lea instruction.  Lea literally writes the address of the second operand into the first, and the second operand already has ‘[ ]’ around it, meaning address.  This line is a bit confusing but what it does is put whatever is in the brackets into eax.  Eax is then compared to 0x181A and if they match, the password is correct.

Effectively, we have [eax + eax * 4] = 181Ah, where eax is the number we pass into the command line after the program name.  Using algebra, we can simplify the left hand side to be

    • eax(1 + 4) = 5*eax.
    • The equation then simplifies to 5*eax = 181Ah.
    • eax = 181Ah / 5
    • eax = 4D2h
    • eax = 1234

Awesome! We’re getting the hang of it!

Julia

As always, start by running the program and see what we can find.  This time, the command line specifies we need to input the password as argument, and tells us that the password we entered was incorrect. 

Open the binary in IDA and view strings again, and follow any relevant strings to their cross-referenced functions.  Here I followed the string “Incorrect Password.\n”, met by the biggest function we’ve seen yet!

Looks like there is a loop that does something with the input, then when loop is over, the green box is executed and that’s where the input string is compared against “VIMwXliFiwx”.  So here we have a loop that modifies our input and when it’s done, it should match “VIMwXliFiwx”.  

The loop calls a function, sub_401160.  This function seems complex, with many blocks and jumps, but we can have IDA reconstruct the C-style pseudocode for this function with F5.  Looks like the outer loop checks each character in the input array and calls this function on that character.  There are two if-statements; the first one is checking if the character is in-between 97 and 122 (inclusive), and the second if-statement checks if the character is in-between 65 and 90.  What do these numbers mean?  Why are we comparing a character type to integers?

This function is checking whether or not a character is in the sets: [a-z] (first if-statement) and [A-Z] (second if-statement). Either way, as long as the character is alphabetic, the function adds dword_403008 to it.  

Double-click the dword_403008 label and it brings us to the section of code it is defined in.  dword_403008 represents 4.  

If my input string starts with a ‘R’, it’s ascii representation is 82.  82 results in the second if-statement resolving to true, and 4 is added to the character ‘R’.  82 + 4 = 86, which is the character ‘V’.  We can manually subtract 4 from each character in the string “VIMwXliFiwx” and get the correct input string that solves this exercise.  

I wrote a quick c++ program that does this for us and returns the result. 

The correct input that solves this exercise is “REIsTheBest”.

Patch Minesweeper

The last and final section of the workshop is a guided session in finding and patching mines in Microsoft’s Minesweeper game.  This guide will show you how to disassemble the binary with IDA and show you a systematic approach to finding the function responsible for creating a new mine field each game. By examining the assembly code, the section of memory representing the mine field can be patched, changing the mines into flags.  

What can you do with a patched version of minesweeper? Get mad high scores, yo (completed a game in 4 seconds)!  

My patched version of minesweeper is available for download here: 

Zip file SHA-256 Hash: 552f08e62ba32988a4e2b536b08337a655cc14b32a30947075749b5b67904e48

Zip file VirusTotal:  https://www.virustotal.com/gui/file/552f08e62ba32988a4e2b536b08337a655cc14b32a30947075749b5b67904e48/detection

.exe file SHA-256 Hash: 3ec4e0cf34e46dcedb2a1d8c746f33c6ecfccfc9fc5f944efc5e4e146d3dbf0e

.exe file VirusTotal: https://www.virustotal.com/gui/file/3ec4e0cf34e46dcedb2a1d8c746f33c6ecfccfc9fc5f944efc5e4e146d3dbf0e/detection

Leave a Reply