Yangbo's Blog

MIT 6.828 Lab1 - Booting a PC (PC Bootstrap & Bootloader)

Boot-up procedure: ROM BIOS -> Boot Loader -> Kernel.

The BIOS sets up an interrupt descriptor table and initializes various devices such as the VGA display. After initializing the PCI bus and all the important devices, it searches for a bootable device, loads the 512-byte boot sector into memory at physical addresses 0x7c00 through 0x7dff, and then uses a jmp instruction to set CS:IP to 0000:7c00, passing control to the boot loader.

The boot loader switches the processor from real mode to 32-bit protected mode. It also reads the kernel from the hard disk by directly accessing the disk device registers via the x86’s special I/O instructions. Then it transfers control to the kernel.

Exercise 2

GDB's si (step instruction) command was used to trace a few instructions into the ROM BIOS and work out what it is doing.

0xffff0: ljmp $0xf000, $0xe05b
0xfe05b: cmpl $0x0, $cs:0x6ac8
0xfe062: jne 0xfd2e1
0xfe066: xor %dx, %dx

The following can be inferred from the instructions above:

  • The IBM PC starts executing at physical address 0x000ffff0, which is at the very top of the 64KB area reserved for the ROM BIOS.
  • The PC starts executing with CS = 0xf000 and IP = 0xfff0.
  • The first instruction to be executed is a jmp instruction, which jumps to the segmented address CS = 0xf000 and IP = 0xe05b.
  • The immediate value 0x0 is compared with the value at memory address 0xf6ac8 (CS * 16 + 0x6ac8).
  • That value equals 0: execution continued at 0xfe066, the instruction right after the jne, rather than at 0xfd2e1, so the jne 0xfd2e1 branch was not taken.
  • The last instruction clears dx to 0.
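The segmented addresses in the bullets above all follow the real-mode rule physical = segment * 16 + offset. A minimal sketch of that arithmetic, where the function name real_mode_address is only an illustrative choice:

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address of segment:offset in 16-bit real mode."""
    return (segment << 4) + offset


# Values taken from the trace above.
print(hex(real_mode_address(0xF000, 0xFFF0)))  # reset vector   -> 0xffff0
print(hex(real_mode_address(0xF000, 0xE05B)))  # ljmp target    -> 0xfe05b
print(hex(real_mode_address(0xF000, 0x6AC8)))  # cmpl operand   -> 0xf6ac8
print(hex(real_mode_address(0x0000, 0x7C00)))  # boot sector    -> 0x7c00
```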
0xfe068: mov %dx, %ss
0xfe06a: mov $0x7000, %esp
0xfe070: mov $0xf34d2, %edx
0xfe076: jmp 0xfd15c
0xfd15c: mov %eax, %ecx
0xfd15f: cli
  • The stack is set up with SS = 0x0 (copied from dx, which was just cleared) and ESP = 0x7000 as the stack top; edx and ecx are loaded as well.
  • The last instruction, cli, disables maskable interrupts before the switch to protected mode.
0xfd160: cld
0xfd161: mov $0x8f, %eax
0xfd167: out %al, $0x70
0xfd169: in $0x71, %al
  • cld clears the direction flag, so string instructions (such as movs and stos, typically used with the REP prefix) advance toward higher addresses.
  • Ports 0x70 and 0x71 are the index and data ports of the CMOS device.
  • Writing 0x8f to port 0x70 sets bit 7, which disables the NMI (non-maskable interrupt), and selects CMOS register 0x0f, whose value is then read from port 0x71.
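To make the 0x8f value concrete, here is a small sketch that splits a byte written to port 0x70 into its NMI-disable bit (bit 7) and its CMOS register index (the low 7 bits). The helper name decode_cmos_index is made up for this illustration.

```python
NMI_DISABLE = 0x80  # bit 7 of the byte written to port 0x70


def decode_cmos_index(value: int) -> tuple:
    """Return (nmi_disabled, register_index) for a byte written to port 0x70."""
    nmi_disabled = bool(value & NMI_DISABLE)
    register_index = value & 0x7F
    return nmi_disabled, register_index


print(decode_cmos_index(0x8F))  # (True, 15)
```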
0xfd16b: in $0x92, %al
0xfd16d: or $0x2, %al
0xfd16f: out %al, $0x92
  • Similarly, bit 1 of port 0x92 is set to 1 to enable the A20 address line, which is needed to address memory above 1MB in protected mode.
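The effect of the A20 line can be sketched as simple bit masking: while A20 is disabled, address bit 20 is forced to 0, so addresses just above 1MB wrap around to low memory. The function names below are illustrative only, and this models the addressing behaviour, not real port I/O.

```python
A20_BIT = 1 << 20


def enable_fast_a20(port_92_value: int) -> int:
    """Mirror of the 'or $0x2, %al' step: set bit 1 of the port 0x92 value."""
    return port_92_value | 0x02


def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address, with bit 20 forced to 0 while A20 is disabled."""
    addr = (segment << 4) + offset
    if not a20_enabled:
        addr &= ~A20_BIT
    return addr


print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=True)))   # 0x100000
```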
0xfd171: lidtw %cs:0x6ab8
0xfd177: lgdtw %cs:0x6a74
  • lidt loads the IDTR, and lgdt loads the GDTR (global descriptor table register), a 48-bit register with two parts: the 32-bit base address and the 16-bit limit of the global descriptor table.
  • The global descriptor table holds the descriptor of each segment, such as the one selected by CS.
  • Each descriptor takes 64 bits, encoding the segment's base address, limit, and attributes.
  • The lgdt instruction loads 6 bytes starting at %cs:0x6a74 into the GDTR.
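As a rough illustration of the 64-bit descriptor layout, the sketch below unpacks the base, limit, access byte, and flags from one descriptor. The sample value 0x00cf9a000000ffff is the widely used flat 4GB code segment descriptor; it is an assumption chosen for the example, not a value read from this BIOS trace, and decode_descriptor is a hypothetical helper name.

```python
def decode_descriptor(raw: int) -> dict:
    """Split a 64-bit x86 segment descriptor into its fields."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)       # limit[19:0]
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)  # base[31:0]
    access = (raw >> 40) & 0xFF   # present, DPL, type bits
    flags = (raw >> 52) & 0xF     # G, D/B, L, AVL
    granular = bool(flags & 0x8)  # G = 1: limit counts 4KB pages
    last_byte = (limit << 12) | 0xFFF if granular else limit
    return {
        "base": base,
        "limit": limit,
        "access": access,
        "flags": flags,
        "last_byte": last_byte,
        "executable": bool(access & 0x08),
    }


flat_code = decode_descriptor(0x00CF9A000000FFFF)  # assumed example value
print(hex(flat_code["base"]), hex(flat_code["last_byte"]))  # 0x0 0xffffffff
```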
0xfd17d: mov %cr0, %eax
0xfd180: or $0x1, %eax
0xfd184: mov %eax, %cr0
  • CR0 is a 32-bit control register whose lowest bit (bit 0) is the PE (Protection Enable) bit.
  • The three instructions above set the PE bit to 1, which enables protected mode.
0xfd187: ljmpl $0x8, $0xfd18f
0xfd18f: mov $0x10, %eax
0xfd194: mov %eax, %ds
0xfd196: mov %eax, %es
0xfd198: mov %eax, %ss
0xfd19a: mov %eax, %fs
0xfd19c: mov %eax, %gs
  • 0x8 and 0x10 are segment selectors. Bits 3 and up of a selector hold the index of an entry in the GDT, so these two select entries 1 and 2 (see p. 197 of the book x86 Assembly Language - from Real Mode to Protected Mode).
  • More information can be found in chapter 11 of the same book.
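The selector format can be sketched in a few lines: bits 0-1 are the requested privilege level, bit 2 chooses between the GDT and the LDT, and the remaining bits are the table index. The helper name decode_selector is an illustrative choice.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into index, table, and RPL."""
    return {
        "index": selector >> 3,
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,
    }


print(decode_selector(0x08))  # {'index': 1, 'table': 'GDT', 'rpl': 0}
print(decode_selector(0x10))  # {'index': 2, 'table': 'GDT', 'rpl': 0}
```

Because each descriptor is 8 bytes, a GDT selector with RPL 0 is also the byte offset of its descriptor within the table (0x8 and 0x10 here).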

Exercise 3

Each ELF (Executable and Linkable Format) file is made up of one ELF header, followed by file data. The file data can include:

  • Program header table, describing zero or more segments
  • Section header table, describing zero or more sections
  • Data referred to by entries in the program header table or section header table

Sections in the kernel ELF are:

$ objdump -h obj/kern/kernel
obj/kern/kernel:     file format elf32-i386
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001871  f0100000  00100000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000714  f0101880  00101880  00002880  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab         000038d1  f0101f94  00101f94  00002f94  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr      000018bb  f0105865  00105865  00006865  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data         0000a300  f0108000  00108000  00009000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  5 .bss          00000644  f0112300  00112300  00013300  2**5
                  ALLOC
  6 .comment      00000034  00000000  00000000  00013300  2**0
                  CONTENTS, READONLY

LMA (load address) is the memory address at which that section should be loaded into memory.
VMA (link address) is the memory address from which the section expects to execute.
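A quick check on the table above: every loaded section has the same distance between its VMA and its LMA. The sketch below copies the VMA/LMA pairs from the objdump output (skipping .comment, which is not loaded) and computes that distance.

```python
# (VMA, LMA) pairs copied from the objdump -h output above.
SECTIONS = {
    ".text":    (0xF0100000, 0x00100000),
    ".rodata":  (0xF0101880, 0x00101880),
    ".stab":    (0xF0101F94, 0x00101F94),
    ".stabstr": (0xF0105865, 0x00105865),
    ".data":    (0xF0108000, 0x00108000),
    ".bss":     (0xF0112300, 0x00112300),
}

offsets = {name: vma - lma for name, (vma, lma) in SECTIONS.items()}
for name, offset in offsets.items():
    print(f"{name:9s} VMA - LMA = {offset:#x}")  # 0xf0000000 for every section
```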

The boot loader (see boot/main.c) uses the ELF program headers to decide how to load the sections. The program headers specify which parts of the ELF object to load into memory and the destination address each should occupy. You can inspect the program headers by executing:

$ objdump -p obj/kern/kernel
obj/kern/kernel:     file format elf32-i386
Program Header:
    LOAD off    0x00001000 vaddr 0xf0100000 paddr 0x00100000 align 2**12
         filesz 0x00007120 memsz 0x00007120 flags r-x
    LOAD off    0x00009000 vaddr 0xf0108000 paddr 0x00108000 align 2**12
         filesz 0x0000a300 memsz 0x0000a944 flags rw-
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
         filesz 0x00000000 memsz 0x00000000 flags rwx

The areas of the ELF object that need to be loaded into memory are those that are marked as “LOAD”. Other information for each program header is given, such as the virtual address (“vaddr”), the physical address (“paddr”), and the size of the loaded area (“memsz” and “filesz”). In boot/main.c, the ph->p_pa field of each program header contains the segment’s destination physical address.
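The gap between filesz and memsz in the second LOAD entry is 0xa944 - 0xa300 = 0x644, which matches the size of .bss in the section listing. The sketch below shows how a generic ELF loader typically handles that: copy filesz bytes from the file, then zero-fill up to memsz. It is an illustration under that assumption, not a transcription of boot/main.c, and load_segment and the toy image are made up for the example.

```python
def load_segment(memory: bytearray, image: bytes, offset: int,
                 paddr: int, filesz: int, memsz: int) -> None:
    """Copy one LOAD segment into memory and zero the memsz - filesz tail."""
    if memsz < filesz:
        raise ValueError("memsz must be at least filesz")
    if offset + filesz > len(image) or paddr + memsz > len(memory):
        raise ValueError("segment does not fit")
    memory[paddr:paddr + filesz] = image[offset:offset + filesz]
    memory[paddr + filesz:paddr + memsz] = bytes(memsz - filesz)


# Toy example: 4 bytes of file data followed by 4 bytes of zero-filled "bss".
memory = bytearray(b"\xaa" * 16)
image = b"HEADERdata!"
load_segment(memory, image, offset=6, paddr=4, filesz=4, memsz=8)
print(bytes(memory[4:12]))    # b'data\x00\x00\x00\x00'

print(hex(0xA944 - 0xA300))   # 0x644, the .bss size from objdump -h
```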

Exercise 5

Change the link address in boot/Makefrag from 0x7c00 to 0x7e00, rebuild, and set the first breakpoint at 0x7c00. Because the BIOS still loads the boot loader at 0x7c00, the first few instructions run without any problem, since they do not depend on the link address. However, the addresses of the labels in the boot loader code have changed, so the first instruction that breaks is lgdt gdtdesc. It now reads its operand from 0x7e64, where nothing was loaded, so all 6 bytes loaded into the GDTR are 0s; the bytes that were meant to be loaded actually sit at 0x7c64.

(gdb) b * 0x7c1e
Breakpoint 2 at 0x7c1e
(gdb) c
Continuing.
[ 0:7c1e] => 0x7c1e: lgdtw 0x7e64
Breakpoint 2, 0x00007c1e in ?? ()
(gdb) x/6xb 0x7e64
0x7e64: 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) x/6xb 0x7c64
0x7c64: 0x17 0x00 0x4c 0x7e 0x00 0x00
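The 6 bytes at 0x7c64 can be decoded as a GDTR image: a 16-bit limit followed by a 32-bit base, both little-endian. A short sketch using the bytes from the GDB output above:

```python
import struct

# Bytes shown by 'x/6xb 0x7c64' in the session above.
gdtdesc = bytes([0x17, 0x00, 0x4C, 0x7E, 0x00, 0x00])

limit, base = struct.unpack("<HI", gdtdesc)
entries = (limit + 1) // 8  # each descriptor is 8 bytes

print(hex(limit), hex(base), entries)  # 0x17 0x7e4c 3
```

Note that the base field reads 0x7e4c: it was also computed from the new link address, so it no longer points at the place where the table was actually loaded.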

The e_entry field in the ELF header holds the link address of the entry point in the program: the memory address in the program’s text section at which the program should begin executing.

$ objdump -f obj/kern/kernel
obj/kern/kernel: file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0010000c

The minimal ELF loader in boot/main.c reads each section of the kernel from disk into memory at the section’s load address and then jumps to the kernel’s entry point.
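To show where e_entry lives, here is a hedged sketch that parses the fixed 52-byte ELF32 header with Python's struct module. The header bytes are synthesized for the example: only the entry point 0x0010000c and the count of three program headers come from the objdump output above, and the other field values are plausible placeholders. parse_elf32_header is a hypothetical helper name.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")  # 52 bytes


def parse_elf32_header(data: bytes) -> dict:
    """Validate the magic and return the fields a loader needs."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("too short for an ELF32 header")
    fields = ELF32_HEADER.unpack_from(data)
    ident, entry, phoff, phnum = fields[0], fields[4], fields[5], fields[10]
    if ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return {"entry": entry, "phoff": phoff, "phnum": phnum}


# Synthetic header (placeholder values except entry and phnum).
ident = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9)
header = ELF32_HEADER.pack(ident, 2, 3, 1, 0x0010000C, 52, 0, 0,
                           52, 32, 3, 40, 0, 0)
info = parse_elf32_header(header)
print(hex(info["entry"]), info["phnum"])  # 0x10000c 3
```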

Exercise 6

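The memory at 0x100000 differs between the two breakpoints: it is all zeros when the BIOS enters the boot loader, and it holds kernel contents when the boot loader enters the kernel. The difference arises because the boot loader has loaded the kernel code at 0x100000 in between.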

(gdb) b * 0x7c00
Breakpoint 1 at 0x7c00
(gdb) c
Continuing.
[ 0:7c00] => 0x7c00: cli
Breakpoint 1, 0x00007c00 in ?? ()
(gdb) x/16xb 0x100000
0x100000: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x100008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) b * 0x10000c
Breakpoint 2 at 0x10000c
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x10000c: movw $0x1234,0x472
Breakpoint 2, 0x0010000c in ?? ()
(gdb) x/16xb 0x100000
0x100000: 0x02 0xb0 0xad 0x1b 0x00 0x00 0x00 0x00
0x100008: 0xfe 0x4f 0x52 0xe4 0x66 0xc7 0x05 0x72
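The first 12 bytes at 0x100000 look like a Multiboot header (magic, flags, checksum), which by the Multiboot convention must sum to zero modulo 2^32. Reading them this way is an interpretation, not something stated above; the sketch below decodes the bytes from the second dump and checks that property.

```python
import struct

# First 12 bytes of the second 'x/16xb 0x100000' dump above.
dump = bytes([0x02, 0xB0, 0xAD, 0x1B, 0x00, 0x00, 0x00, 0x00,
              0xFE, 0x4F, 0x52, 0xE4])

magic, flags, checksum = struct.unpack("<III", dump)
print(hex(magic), hex(flags), hex(checksum))   # 0x1badb002 0x0 0xe4524ffe
print((magic + flags + checksum) & 0xFFFFFFFF) # 0
```

The remaining four bytes of the dump (0x66 0xc7 0x05 0x72) start at 0x10000c, which is where the movw $0x1234,0x472 instruction shown at the breakpoint begins.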