CPU emulators A quick look on their types, principles of design and operation

Grigory Rechistov

Intel-MIPT lab, MDSP project

October 2010

1

▲□▶ ▲□▶ ▲□▶ ▲□▶ □ の 0 0

#### The definition

 $\mathsf{Emulator}-\mathsf{a}$  program that executes models of certain pieces of hardware.

◆□▶ ◆□▶ ◆臣▶ ◆臣▶ 臣 のへで

#### The definition

Emulator — a program that executes models of certain pieces of hardware.

Actually, there is no precise definition.

▲□▶ ▲□▶ ▲□▶ ▲□▶ □ の 0 0

Many meanings exist. I just interchange both terms.  $\ll$ Emulation $\gg$  was introduced in 1957 by IBM and since then it was reinterpreted many times.

In Russian, we usually use simple «Моделирование».

How a simulator is usually designed

4

・<</li>
 ・<</li>
 ・<</li>
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・
 ・</

# How a simulator is usually designed



#### How a simulator is usually designed



... But we can build one step by step!

・ロト ・回ト ・ヨト ・ヨト

э

#### Functional model

Simulates only the functional features of hardware.



イロト イポト イヨト イヨト

э

#### Functional model

Simulates only the functional features of hardware.



Question: what can be included in system state?

イロト イポト イヨト イヨト

## Clock-precise model

Sometimes is called «Performance». Adds a concept of internal temporal state.



## Clock-precise model

Sometimes is called «Performance». Adds a concept of internal temporal state.



Question: what is in temporal state?

э

#### A bit of distraction: a 1 bit summator

7

◆□ > ◆□ > ◆目 > ◆目 > ○ ■ ○ ○ ○ ○

## A bit of distraction: a 1 bit summator

Functional model:



## A bit of distraction: a 1 bit summator

Functional model:



Task: write a table of boolean truth for this adder.

イロト 不同 トイヨト イヨト ヨー うらう

#### Real 1 bit half-summator

8

◆□ > ◆□ > ◆目 > ◆目 > ○ ■ ○ ○ ○ ○

#### Real 1 bit half-summator



Note: clock synchronization circuitry is not shown.

・ロト ・四ト ・ヨト ・ヨト

э

#### 2 bit full summator with latency

◆□ > ◆□ > ◆目 > ◆目 > ○ ■ ○ ○ ○ ○

#### 2 bit full summator with latency



Note: clock synchronization circuitry is not shown.

イロト イポト イヨト イヨト

э

## 2 bit full summator with latency



Note: clock synchronization circuitry is not shown. Question: what type of model this scheme is — functional or clock-precise?

#### A clock-presise variant of model of full 2 bit adder



э

## A clock-presise variant of model of full 2 bit adder



Please note that we don't specify the size of numbers being added or any details of adder. We simply modelled functions **and** timings!

Attempts to be on its own. But for this it needs an entity with notion of time!

◆□▶ ◆□▶ ◆臣▶ ◆臣▶ 臣 のへで

Attempts to be on its own. But for this it needs an entity with notion of time!



◆□▶ ◆□▶ ◆臣▶ ◆臣▶ 臣 のへで

# Full platform functional models

Simulates only the functional features of hardware.



- 1. Hardly can be used to measure speed, performance, power or certain other parameters of HW.
- 2. Are able to emulate quite complex systems from booting OS to airplane complexes.
- 3. Comparatively fast emulation (slowdown  $\times 2 \times 100$ ).

# Full platform functional models

Simulates only the functional features of hardware.



- 1. Hardly can be used to measure speed, performance, power or certain other parameters of HW.
- 2. Are able to emulate quite complex systems from booting OS to airplane complexes.
- 3. Comparatively fast emulation (slowdown  $\times 2 \times 100$ ).

#### Simulation of time

Problem: there are many devices to simulate and only one time line. . .

◆□▶ ◆□▶ ◆ □▶ ◆ □▶ - □ - のへで

#### Simulation of time

Problem: there are many devices to simulate and only one time line...

It is a task of scheduling similar to one performed in an OS.

- Cooperative multitasking
- Preemptive multitasking

イロト 不同 トイヨト イヨト ヨー うらう

#### Simulation of time

Problem: there are many devices to simulate and only one time line. . .

- It is a task of scheduling similar to one performed in an OS.
  - Cooperative multitasking
  - Preemptive multitasking

Another issue is

 Problem of synchronization of all devices' point of view on time.

Application mode provide just minimal set of components able to run the particular workloads, e.g. CPU, memory, IO.
Full platform simulates the complete set of HW found in a particular computing system: CPU, memory, network, sound, display, disk, keyboard/mouse...
Hybrid use a junction of software components with ones modeled with hardware models e.g. on FPGA.
Distributed are placed on several computers and interact over the network.

- Usually the first one to be implemented for a new architecture.
- Cannot simulate any OS booting (as if there is any OS for the new architecture!)
- But applications need an OS to work! Thus such simulator has to implement some minimal non-architectural ABI.
- Can be used to build some performance simulator and let it measure speed etc [6].

#### Usually the first one to be implemented for a new architecture.

- Cannot simulate any OS booting (as if there is any OS for the new architecture!)
- But applications need an OS to work! Thus such simulator has to implement some minimal non-architectural ABI.
- Can be used to build some performance simulator and let it measure speed etc [6].

- Usually the first one to be implemented for a new architecture.
- Cannot simulate any OS booting (as if there is any OS for the new architecture!)
- But applications need an OS to work! Thus such simulator has to implement some minimal non-architectural ABI.
- Can be used to build some performance simulator and let it measure speed etc [6].

- Usually the first one to be implemented for a new architecture.
- Cannot simulate any OS booting (as if there is any OS for the new architecture!)
- But applications need an OS to work! Thus such simulator has to implement some minimal non-architectural ABI.
- Can be used to build some performance simulator and let it measure speed etc [6].

- Usually the first one to be implemented for a new architecture.
- Cannot simulate any OS booting (as if there is any OS for the new architecture!)
- But applications need an OS to work! Thus such simulator has to implement some minimal non-architectural ABI.
- Can be used to build some performance simulator and let it measure speed etc [6].

## General steps in making an emulator [4]

- Model the CPU and memory.
- Emulate an instruction set, create disassembler.
- Stub out the rest of the architecture.
- Get basic IO working.
- Work on virtualizing the remaining hardware.

イロト 不同 トイヨト イヨト ヨー うらう

# The simpliest CPU emulation [3]

```
for(;;) {
    OpCode = Memory[PC];
    PC++; // program counter
    Counter -= Cycles[OpCode];
    switch(OpCode) {
        case OpCode1:
            Simulate1(); break;
        case OpCode2:
            Simulate2(); break;
        //...
    }
    if(Counter <= 0) {
        // check for interrupts and other tasks
        Counter += InterruptPeriod;
        if (ExitRequired) break;
    }
}
```

< ロ > < 同 > < 回 > < 回 > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ > < □ >

#### Memory emulation

The simplest way to access emulated memory is to treat it as a plain array of items:

```
Data = Memory[Address1];
Memory[Address2] = Data;
```

Such simple memory access is not always possible for following reasons:

- Paged Memory.
- Mirrored Memory.
- ROM protection.
- Memory-Mapped I/O. Accesses to such memory locations produce «special effects» and therefore should be tracked.

```
Data=ReadMemory(Address1);
WriteMemory(Address2,Data);
```

・ロット 本語 とうき マ ほう 二日 -

### A bit low on speed?

The common rule: don't do the work already done.

# A bit low on speed?

The common rule: don't do the work already done. Traces



# A bit low on speed?

#### Hyper Simulation



Wide range of techniques is used

- ▶ Not so lame interpretation: hashing, lookup tables.
- Binary translation.
- Just-in-time compilation.
- Direct execution using virtualization.

#### $1. \ \mbox{When was term} \ \mbox{emulation} \gg \ \mbox{invented}?$

#### 1. When was term $\ll\!emulation\!\gg\!invented?$ In 1957.

- 1. When was term  $\ll\!emulation\!\gg\!invented?$  In 1957.
- 2. What type of emulators is the slowest?

- 1. When was term  $\ll\!emulation\!\gg\!invented?$  In 1957.
- 2. What type of emulators is the slowest? Performance

- 1. When was term  $\ll\!emulation\!\gg\!invented?$  In 1957.
- 2. What type of emulators is the slowest? Performance
- 3. What type of emulators is the most accurate?

- 1. When was term  $\ll\!emulation\!\gg\!invented?$  In 1957.
- 2. What type of emulators is the slowest? Performance
- 3. What type of emulators is the most accurate? it depends on task.

- 1. When was term  $\ll\!emulation\!\gg\!invented?$  In 1957.
- 2. What type of emulators is the slowest? Performance
- 3. What type of emulators is the most accurate? it depends on task.

#### More to read

- Крис Касперски. Техника оптимизации программ. Эффективное использование памяти. СПб. БХВ-Петербург, 2003.
- Wikipedia http://en.wikipedia.org/wiki/Emulator
- Marat Fayzullin. How To Write a Computer Emulator http://fms.komkon.org/EMUL8/HOWTO.html
- Tony Gray. How to Write an Emulator http://www.tucs.org.au/how-to-write-an-emulator/
- Daniel Boris. How Do I Write an Emulator?, Part 1, R1.00
- Carole Dulong et al. The Making of a Compiler for the Intel Itanium Processor. Intel Technology Journal Q3, 2001



・ロト ・四ト ・ヨト ・ヨト

#### Emulators



#### running Fedora 13.

#### **Emulators**



25

running ReactOS.

#### Emulators



running MZX running ZX Spectrum.

イロン イボン イヨン イヨン

э

Oracle VirtualBox

◆□ > ◆□ > ◆目 > ◆目 > ○ ■ ○ ○ ○ ○

- Oracle VirtualBox
- Microsoft VirtualPC

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta
- AMD SimNOW!

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta
- AMD SimNOW!
- WindRiver Simics

▲□▶ ▲□▶ ▲□▶ ▲□▶ □ の 0 0

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta
- AMD SimNOW!
- WindRiver Simics
- Parallels

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta
- AMD SimNOW!
- WindRiver Simics
- Parallels
- VMWare Workstation and Server

◆□▶ ◆□▶ ◆三▶ ◆三▶ 三日 うのの

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta
- AMD SimNOW!
- WindRiver Simics
- Parallels
- VMWare Workstation and Server
- ARMware

◆□▶ ◆□▶ ◆三▶ ◆三▶ 三日 うのの

- Oracle VirtualBox
- Microsoft VirtualPC
- Bochs
- Qemu
- Apple Rosetta
- AMD SimNOW!
- WindRiver Simics
- Parallels
- VMWare Workstation and Server
- ARMware
- ...thousands of them! For every and each of architecture including PDP-10, ZX Spectrum, NES and Itanium.

#### Simulation of time



#### Other pecularities

- Simulation of MP systems.
- Endiannes.
- Speed of emulation.