int main(){
    printf("Hello!");
    return 0;
}
Assembly Code
.LC0:
    .string "Hello!"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $.LC0, %edi
    movl $0, %eax
    call printf
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
7/19/2020, Hackaday U: Introduction to Software Reverse Engineering
#Assemblers
• Assemblers convert assembly code into binary machine code
  • Each instruction is represented by a binary encoding (its opcode plus operands)
  • mov rax,1 = 0x48C7C001000000
• The assembler will produce an object file
  • Object files contain machine code
  • This file will contain fields to be filled by the linker
#Assemblers: An Example
Assembly Code
.LC0:
    .string "Hello!"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $.LC0, %edi
    movl $0, %eax
    call printf
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
Assembled Machine Code
55 48 89 e5 bf 00 00 00 00 b8 00 00 00 00 e8 00
00 00 00 b8 00 00 00 00 5d c3 48 65 6c 6c 6f 21
#Linking
• More is needed before the object code can be executed
  • The entry point, or starting instruction, must be defined
• Linking is also used to define memory regions on embedded platforms
  • Often done through linker scripts
• The result of linking is the final executable program
#Linking: An Example
gcc -o session1 session1.o
#Output Formats
• The output of the compilation process can take many forms:
  • PE (Windows)
  • ELF (Linux)
  • Mach-O (macOS)
  • COFF/ECOFF
• This output file is often your starting point as a reverse engineer
• For this course we will focus on the ELF format
#ELF Files: An Overview
• ELF = Executable and Linkable Format
• Contains information identifying:
  • OS, endianness, etc.
• ELF files provide the information the OS needs to execute the program
• ELF files can be broken up into three components:
  • ELF header
  • Sections
  • Segments
#ELF Files: Symbols
• Symbols are used to aid in debugging and provide context to the loader
  • Removing (stripping) these symbols makes a binary more difficult to reverse engineer
• ELF objects contain a maximum of two symbol tables
  • .symtab: Symbols used for debugging / labelling (useful for RE!)
  • .dynsym: Contains symbols needed for dynamic linking
#ELF Files: A Review
• ELF files define how the program is laid out in memory
  • Used by the OS loader to create a process
• ELF files contain machine code that we will be reverse engineering
• Many tools exist to analyze and read ELF files:
  • dumpelf
  • readelf
  • objdump
  • elfutils (package containing multiple utilities)
#SE Review: Pixelated Edition
[Figure: the build pipeline in three stages: Compile, Assemble, Link]
#Intermission: Why Review This?
• Information can be limited when performing SRE
  • Understanding core concepts is important
  • File formats can be a treasure trove of information
• Our goal is to work backwards from machine code
  • The ELF file will contain machine code
  • This machine code can be converted BACK into assembly language!
  • Machine code -> Assembly Language = Disassembly!
#Computer Architecture 101
• When a program is running, the following must happen:
  1. An instruction is read from memory
  2. The instruction is decoded and executed by the processor (arithmetic and logic operations go through the Arithmetic Logic Unit)
  3. The result of the operation is stored into registers or memory
• For this course, we'll deconstruct C programs into four core components:
  • Registers
  • Instructions
  • Stack
  • Heap
#x86_64 Architecture
• We will focus on the x86-64 instruction set
  • 64-bit version of Intel's x86 instruction set
  • Contains multiple operating modes for backwards compatibility
  • Original specification was created by AMD in 2000
• Commonly used in desktop and laptop computers
#x86_64: Registers
• Registers are small storage areas used by the processor
• x86_64 assembly uses sixteen 64-bit general-purpose registers (R8-R15 not in table)