Emulation/How does it work?

The Theory

At the core of computer science, specifically the study of computation and computability, is the Church-Turing thesis. This thesis states that any function regarded as computable can be computed by a theoretical device called a 'Turing machine'. A natural consequence of this thesis is that any general-purpose programming language is sufficient to express any algorithm. In other words, anything that can be considered computable (in the case of emulation, the computations involving the workings of a machine's hardware) can be computed by a program written in any general-purpose programming language. The reason modern emulators are capable of emulating most computers is that the bulk of modern computers are based on the von Neumann architecture. This computer architecture was first described by John von Neumann in 1945, and dictates the basic way in which the modern computer operates: it fetches instructions from memory and executes them, and those instructions can in turn write data back to memory during the course of their execution. All other components, such as video, sound, keyboard, and mouse, are extensions to this basic architecture. In pseudocode, a basic emulator loop is written:

do
  fetch the next instruction from emulated memory
  execute the instruction
  check for interrupts
while WE ARE STILL EXECUTING INSTRUCTIONS
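
The loop above can be sketched in runnable form. The following is a minimal illustration in Python, not a real emulator: the four-opcode machine, its encoding, and the names `ToyMachine` and `pending_interrupts` are all invented for this example.

```python
# Toy machine invented for illustration; these opcodes belong to no real CPU.
HALT, LOAD_IMM, ADD_IMM, STORE = 0x00, 0x01, 0x02, 0x03


class ToyMachine:
    def __init__(self, program, memory_size=256):
        self.memory = bytearray(memory_size)      # emulated memory
        self.memory[:len(program)] = bytes(program)
        self.pc = 0                               # program counter
        self.acc = 0                              # single accumulator register
        self.running = True
        self.pending_interrupts = []              # would be filled by emulated devices
        self.interrupts_handled = 0

    def fetch(self):
        """Read the byte at the program counter and advance past it."""
        byte = self.memory[self.pc]
        self.pc = (self.pc + 1) % len(self.memory)
        return byte

    def execute(self, opcode):
        """Perform one instruction; operands are fetched from memory as needed."""
        if opcode == HALT:
            self.running = False
        elif opcode == LOAD_IMM:
            self.acc = self.fetch()
        elif opcode == ADD_IMM:
            self.acc = (self.acc + self.fetch()) & 0xFF
        elif opcode == STORE:
            self.memory[self.fetch()] = self.acc   # write back to emulated memory
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

    def check_interrupts(self):
        """Drain pending interrupts; a real emulator would run a handler for each."""
        while self.pending_interrupts:
            self.pending_interrupts.pop(0)
            self.interrupts_handled += 1

    def run(self):
        while self.running:
            opcode = self.fetch()      # fetch the next instruction
            self.execute(opcode)       # execute the instruction
            self.check_interrupts()    # check for interrupts


program = [LOAD_IMM, 40, ADD_IMM, 2, STORE, 0x80, HALT]
machine = ToyMachine(program)
machine.pending_interrupts.append("timer")
machine.run()
print(machine.memory[0x80])  # 42
```

Note that the program and its data share one memory array, which is the von Neumann property described above: the `STORE` instruction writes into the same memory the instructions are fetched from.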

Implementation

Utilizing these principles, emulation software (an emulator) essentially converts binary code written for execution on one machine into an equivalent form suitable for execution on another machine. This conversion is usually done by taking each original binary instruction and translating it into one or more equivalent instructions for the other machine. As a 1:1 instruction conversion is usually not possible, the emulator's equivalent of the original program is often much larger.
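
To illustrate why the converted program tends to grow, here is a small sketch in Python. Both instruction sets are hypothetical and invented for this example: the guest can operate directly on memory, while the host can only do arithmetic on registers, so one guest instruction becomes several host instructions.

```python
# Both instruction sets are hypothetical, invented for this example.
# Guest: can modify a memory cell in a single instruction.
# Host: load/store machine, so memory must be loaded into a register first.
TRANSLATION_RULES = {
    "INC_MEM": lambda addr: [("LOAD", "r0", addr),
                             ("ADDI", "r0", 1),
                             ("STORE", "r0", addr)],
    "CLEAR_MEM": lambda addr: [("LOADI", "r0", 0),
                               ("STORE", "r0", addr)],
    "NOP": lambda: [("NOP",)],
}


def translate(guest_program):
    """Replace every guest instruction with its host equivalent(s)."""
    host_program = []
    for mnemonic, *operands in guest_program:
        host_program.extend(TRANSLATION_RULES[mnemonic](*operands))
    return host_program


guest = [("CLEAR_MEM", 0x10), ("INC_MEM", 0x10), ("NOP",)]
host = translate(guest)
print(len(guest), len(host))  # 3 6
```

Only `NOP` maps 1:1 here; the other two guest instructions expand to two and three host instructions respectively.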

Often the host machine has no exact equivalent of a device required by the machine being emulated. For example, the controller for an NES or PlayStation is usually not present on a PC. In such cases a comparable substitute is used, most commonly the keyboard or a joystick. This compensation is part of what makes emulation difficult, as one must have a clear understanding of how the hardware works in order to emulate it. This is especially true for more challenging pieces of hardware like audio and video chips. There are typically two ways to emulate a given piece of software: interpretation and recompilation.
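
Returning to the controller example above, the substitution can be as simple as a lookup table from host keys to emulated buttons. The sketch below is in Python; the button bit positions, the key names, and the bindings are all assumptions chosen for illustration rather than taken from any particular emulator.

```python
# Bit positions for each emulated button: an assumption made for this sketch.
BUTTON_BITS = {"A": 0, "B": 1, "SELECT": 2, "START": 3,
               "UP": 4, "DOWN": 5, "LEFT": 6, "RIGHT": 7}

# Hypothetical keyboard bindings; a real emulator would make these configurable.
KEY_BINDINGS = {"x": "A", "z": "B", "shift": "SELECT", "enter": "START",
                "up": "UP", "down": "DOWN", "left": "LEFT", "right": "RIGHT"}


def controller_state(pressed_keys):
    """Pack the currently pressed host keys into one emulated controller byte."""
    state = 0
    for key in pressed_keys:
        button = KEY_BINDINGS.get(key)
        if button is not None:               # keys with no binding are ignored
            state |= 1 << BUTTON_BITS[button]
    return state


print(bin(controller_state({"x", "right", "q"})))  # 0b10000001
```

The emulated program then reads this byte exactly as it would read the real controller, unaware that a keyboard is behind it.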

Interpretation

In interpretation, the binary data is read one instruction at a time, and each instruction is decoded and then executed immediately. Nothing is saved between visits, so an instruction is decoded again every time it is encountered. This method is typically the easiest to implement, but it is also typically the slowest in terms of execution time. Many older console emulators use interpretation.
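
For readers looking to program an interpreter, the following Python sketch shows one common shape: a dispatch table that maps each opcode to a handler function. The two-byte instruction format and the opcodes are hypothetical, invented for this example, and the `decoded` counter exists only to make the repeated decoding visible.

```python
# Hypothetical guest instruction set: every instruction is (opcode, operand).
LOAD, SUB, JNZ, HALT = 0x01, 0x02, 0x03, 0xFF


def op_load(state, operand):
    state["acc"] = operand


def op_sub(state, operand):
    state["acc"] = (state["acc"] - operand) & 0xFF


def op_jnz(state, operand):
    if state["acc"] != 0:          # jump if the accumulator is not zero
        state["pc"] = operand


def op_halt(state, operand):
    state["running"] = False


DISPATCH = {LOAD: op_load, SUB: op_sub, JNZ: op_jnz, HALT: op_halt}


def interpret(program):
    state = {"pc": 0, "acc": 0, "running": True, "decoded": 0}
    while state["running"]:
        # Fetch the two bytes of the next instruction.
        opcode, operand = program[state["pc"]], program[state["pc"] + 1]
        state["pc"] += 2
        # Decode: look up the handler. This happens on every visit.
        handler = DISPATCH[opcode]
        state["decoded"] += 1
        # Execute.
        handler(state, operand)
    return state


# Count down from 3; the SUB/JNZ pair is decoded on every loop iteration.
program = bytes([LOAD, 3, SUB, 1, JNZ, 2, HALT, 0])
result = interpret(program)
print(result["acc"], result["decoded"])  # 0 8
```

The program contains only four instructions, yet eight decodes are performed, because the loop body is decoded afresh on each pass. This repeated work is the main reason interpretation is slow.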


 


Some examples of emulators that use interpretation are:

Recompilation

Recompilation, also called binary translation, involves a direct conversion of the binary data for the emulated platform into binary data suitable for execution on the target platform. This method usually provides a significant performance boost to emulation software, and it is used by most emulators for 'next-generation' consoles. Recompilation typically comes in two varieties:

  • Dynamic recompilation
    • In dynamic recompilation, or dynamic binary translation, each section of binary data is translated the first time it is executed. The translated equivalent is stored in a cache and reused whenever that section is executed again. The basic idea is to do the expensive translation and decoding of instructions once, and then reference the cached translated code often. This method is the most commonly employed, as it can deal with all forms of code, including self-modifying code. Some examples:
  • Static recompilation
    • In static recompilation, or static binary translation, the binary data is translated once, in a single pass over the code, before the program is run. The code to be emulated can be scanned and the translation can be optimized by applying various algorithms. The translated data is then usually saved either in a file or in memory, where it is referenced by the program upon execution. This can sometimes improve performance even over dynamic recompilation. If the code is self-modifying, however, this method usually fails to translate the executable properly, forcing a fallback to a dynamic recompiler or an interpreter to handle the modified code.
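
The translate-once, reuse-often idea behind dynamic recompilation can be sketched in Python. This is an illustration under strong assumptions: the two-byte guest instruction set is hypothetical, and Python closures stand in for the host machine code that a real recompiler would emit. The `stats` counters exist only to show how often translation happens.

```python
# Hypothetical guest instruction set: every instruction is (opcode, operand).
LOAD, SUB, JNZ, HALT = 0x01, 0x02, 0x03, 0xFF


def translate_block(program, start):
    """Decode from `start` up to the next branch or halt, and return one
    Python function that performs the whole block and yields the next pc."""
    steps = []
    pc = start
    while True:
        opcode, operand = program[pc], program[pc + 1]
        pc += 2
        if opcode == LOAD:
            steps.append(lambda s, v=operand: s.update(acc=v))
        elif opcode == SUB:
            steps.append(lambda s, v=operand: s.update(acc=(s["acc"] - v) & 0xFF))
        elif opcode == JNZ:
            taken, fallthrough = operand, pc
            exit_pc = lambda s: taken if s["acc"] != 0 else fallthrough
            break
        elif opcode == HALT:
            exit_pc = lambda s: None          # None means "stop running"
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

    def block(state):
        for step in steps:
            step(state)
        return exit_pc(state)

    return block


def run(program):
    cache = {}                                 # guest address -> translated block
    state = {"acc": 0}
    stats = {"translations": 0, "executions": 0}
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:                      # first visit: translate and cache
            block = cache[pc] = translate_block(program, pc)
            stats["translations"] += 1
        pc = block(state)                      # later visits reuse the cached code
        stats["executions"] += 1
    return state, stats


# Count down from 200 in a loop.
program = bytes([LOAD, 200, SUB, 1, JNZ, 2, HALT, 0])
state, stats = run(program)
print(state["acc"], stats["translations"], stats["executions"])  # 0 3 201
```

Only three translations are needed for 201 block executions. This sketch never invalidates its cache, so it would not handle self-modifying code; a dynamic recompiler that supports it would need to discard cached blocks whenever the guest writes to memory that has already been translated.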

See also