Memory map of the x86 VM

This is a continuing series of x86 blog posts that basically documents things as needed for both team members and the public.

Unlike the EVM, in x86 memory is used extensively, and gaps in memory is possible. For example, accessing memory at 0x1000 might give an error about there being no memory there, but 0x2000 might have something useful. The EVM is actually the odd one out here as most VM and CPU architectures use this concept to their advantage. Back when 1Mb of memory was rare, the 8086 actually abused this by having most memory located starting at 0, but with the read-only BIOS memory located at 0xF0000, and sometimes with various sections of memory that actually connected to external devices rather than to RAM.

Anyway, Qtum's x86 VM design abuses this feature as well so that certain areas of memory can be shared etc. This is Qtum's core memory map; explanation for each memory section is below:

  • Emergency stack - 0x100, length 64, read-write
  • Contract Code - 0x1000, length 0x10000 (max), read-only
  • Contract Data - 0x100000, length 0x10000 (max), read-write
  • Stack memory - 0x200000, length 8196, read-write
  • Execution data - 0xD0000000, length TBD, read-only
  • Transaction data - 0xD1000000, length dynamic, read-only
  • Blockchain data - 0xD2000000, length TBD, read-only

Note that a memory section can be "init on read". ie, there's not necessarily a need to construct the data for a section before a contract needs to read that data. There may also be additional gas costs for the first access of such a memory area.

Emergency Stack

This is an emergency stack that is used when a double-fault exception occurs. Unused right now since exception support is not implemented

Contract Code

This is where the actual contract code is loaded into. This is read-only for security and optimization purposes. The actual size of this is determined by how much contract code needs to be loaded. Memory beyond the actual size can be read, but it can not be written to since it's read-only, and is fixed at 0

Contract Data

This is where contract data is loaded into. Unlike in the EVM, if you have a variable with some value, there is no need for CPU code to be written to reserve some bit of memory and then set it to that value. Variables in x86 is instead just a simple pointer to a memory address. That memory address is expected by the loader (in normal operating systems, the executable parser and loader.. in Qtum, the VM initialization process). This is nice because we can use fast native code to initialize all the variables in one quick "memcpy" operation. Variable pointers end up pointing to somewhere in this memory section. The actual size is currently fixed at 1Mb, but this will be changed later beyond the prototype.

Note that read-only data is (depending on linker configuration) stored in the code section. Read-only data in common programs includes strings, constants, etc.

Stack memory

This is a section of memory reserved for the x86 call stack. The call stack is used for passing arguments to functions, storing return addresses, local variables etc. The reason for not overloading the data section with stack data is that this section of memory is surrounded by missing memory. This means that if the stack over or under flows it can be immediately "detected" by getting an error thrown. Explicit errors are almost always preferrable to continuing to run with corrupted state.

Execution data

This is data that is read-only and specific to each contract execution. This means that the data will be different when calling a contract within a contract. This data includes things like the sender address (what address initiated this execution), gas limit, and other data.

Transaction data

This is dynamic length data that encodes the complete transaction data which caused the current contract execution. This includes all inputs and outputs. There will be both raw script access, as well as simplified "100 coins was sent to address A" type of data that can be used very easily. There will be helper library functions to assist in script decoding etc.

Blockchain data

This is read-only global blockchain data that is constant for all contract executions in the current block. This includes things like the block gas limit, current block height, previous block hashes, current difficulty, etc.

Rationale

The point of most of this data being available in memory rather than in syscalls is primarily for two reasons:

  • Every syscall has a non-negligible security risk
  • Syscalls are expensive in contract code size, contract code gas overhead, and in VM implementation speeds (it means exiting JIT etc)

Every syscall is an interface for exposing contract code to the outside world. The majority of OS kernel security exploits come from buggy syscall behavior. Thus, there is a big benefit to exposing as few syscalls to the VM as possible. Less ways for the VM to escape the sandbox, less places to audit.

The other factor that encouraged this is that the syscall process is quite expensive in multiple ways. The code size is small, but not negligible. It requires a minimum of about 30 bytes to make a syscall from C. Other languages might be smaller, but are probably bigger. The expensive part of the syscall process is that there must be translation between the syscall and the standard C interface. This involves calling a helper function, preserving a few registers, getting a bunch of items from the stack and storing them in registers, doing the actual syscall, and then finally restoring those previously preserved registers. The 30 byte point previously actually only covers the helper function part. The total number of bytes executed is significantly more but not represented in contract size since that code is written only once and used for every syscall.

And then finally, beyond actual contract ineffeciencies, each syscall can potentially wreck VM performance. In a JIT workflow, the actual hardware CPU can cache the JIT overhead code and JIT compiled contract code very quickly, but upon leaving that section to some other piece of code, it typically must clear at least portions of the cache to make room for this new code. And in a true virtualization workflow using hardware support it becomes even more expensive as it can involve a context switch and hardware level system call. This is a common problem with most languages involving a VM. This includes Webassembly (up until recently, I think that team made a major improvement somehow), Javascript/V8, etc.

Prototype

Qtum x86 is still in prototype. These memory areas are fairly certain, but as with everything in this series, things are still up in the air and this should not be considered a final specification

Posted: 10/27/2018 12:20:38 AM

Exposed Formats in Qtum x86

The following formats are exposed to interface and smart contract code

Runtime Length Encoding Format

On the blockchain all contract data for x86 uses a variant of Runtime Length Encoding (RLE) for bytes that are equal to 0. Non-zero data is not encoded. When a 0 is encountered in the bytestream it becomes a length to indicate how many zero bytes follow it. Having a length of 0 for the RLE is expressly forbidden and will result in an invalid transaction that can not be included in blocks or in the mempool

The RLE payload is prefixed with a 32 bit integer which contains a 24-bit decoded size field. If this size does not exactly match the decoded payload, the transaction is invalid.

the other 8 bits remaining in the 32 bit integer is reserved as a version number for potentially implementing other compression formats

Note that the size of the payload can be verified without allocating additional memory.

Relevant code comment:

//format: decoded payload length (uint32_t) | payload
//the compression only compresses 0 bytes. 
//when 0x00 is encountered in the bytestream, it is transformed converted to a run-length encoding
//encoding examples:
//0x00 00 00 00 -> 0x00 04
//0x00 -> 0x00 01
//0x00 00 -> 0x00 02
//0x00 (repeated 500 times) -> 0x00 0xFF 0x00 0xF5
//as part of the encoding process, a 32 bit length field is prefixed to the payload

The RLE encoding is used on top of any other formats.

Code for encoding and decoding is included in x86Lib, but should be trivial to implement in any language needed.

The length field may be removed at a later time as right now it provides no benefit. The original rationale was so that a constant memory allocation profile could be used, rather than potentially needing multiple allocations. However, the decoded format of data is stored in backing database in Qtum right now, making decoding a one-time operation.

Smart contract bytecode format

The x86 VM works differently from how the EVM encodes and stores contracts. In the EVM, a flat array of opcodes is all that is given. These opcodes are then executed and they decide what part of those opcodes and data is actually stored in permanent storage as the contract bytecode.

With Qtum's x86 VM it's much simpler and generally easier to reason about. A custom binary format with separate sections for options (ie, contract configuration info etc), initialized data, and code. The entire binary is stored directly in permanent storage. This prevents the situation in the EVM where a contract must also contain a contract saver. This is not too complicated in the EVM, but with existing paradigms used in x86 development this would be extremely unusual, probably necessitating the need to write two completely separate programs in order to prevent assumed address memory errors etc.

Tooling is provided in the x86Lib testbench program to convert an ELF program to this custom binary format. The rationale for not using ELF directly is that ELF contains a significant amount of potential complexity, even if most ELF programs are fairly simple to parse. Rather than making a special case "you can't use these features in ELF" type implementation, it is instead easier to use a very simple custom format and providee conversion tools. Because of this, it is also possible to potentially provide flat binary, PE, and other binary format support.

The custom format is a flat byte array with a prefix map that indicates the length of each section:

//The data field available is only a flat data field, so we need some format for storing
//Code, data, and options.
//Thus, it is prefixed with 4 uint32 integers.
//1st, the size of options, 2nd the size of code, 3rd the size of data
//4th is unused (for now) but is kept for padding and alignment purposes
struct ContractMapInfo {
    //This structure is CONSENSUS-CRITICAL
    //Do not add or remove fields nor reorder them!
    uint32_t optionsSize;
    uint32_t codeSize;
    uint32_t dataSize;
    uint32_t reserved;
} __attribute__((__packed__));

After this map, just a flat array of bytes remains.

Flow:

  • LOCATION = 0, which is the first byte after ContractMapInfo
  • copy option data into a buffer from LOCATION to LOCATION + optionsSize. Increment LOCATION by optionsSize
  • copy code data into a buffer from LOCATION to LOCATION + codeSize. Increment LOCATION by codeSize
  • etc etc...

Section description:

  • Options is currently unused and thus optionsSize must be 0. Later it will contain dependency graphs, trusted contract options, and other specialized configuration data. This data is not directly exposed to contract code, unless the option specifically indicates so
  • Code is read-only executable code. This is loaded into the contract's memory space at 0x1000 and currently has a max size of 1Mb. The contract is not allowed to modify the contents of this memory. Their may also be a gas surcharge for executing code outside of this section, since this may make JIT and other optimizations more difficult. Right now this section has a fixed size of 1Mb and any data beyond codeSize is set to 0.
  • Data is initialized and read-write data. This is loaded into the contract's memory space at 0x100000 and also has a 1Mb limit. In some documents this section of memory is also referred to as "scratch memory". Right now this space has a fixed size of 1Mb and any data beyond dataSize is set to 0. The ".BSS" section of ELF files (uninitialized memory) will end up in this memory area as well, but since there is no data associated with it, there is no need to provide any info about .BSS to the VM. The ELF file converter will check that the data size + bss size does not exceed 1Mb.

Smart contract transaction-call format

Contracts have a call stack which can be used to send arguments to called contracts and for called contracts to return data. However, having numerous data entities in a transaction is non-trivial to deal with for verification purposes. So, there will only be 1 actual data field in the transaction format for calling contracts (same as we currently have for the EVM). However, an ABI is provided to simplify the smart contract's responsibiltiy for parsing this. The format can of course by ignored by just having one large argument that is the byte array you want to pass.

Flow:

  • LOCATION = 0, the first byte of the data
  • 32 bit integer, SIZE, is read from LOCATION
  • LOCATION is incremented by 2 (size of integer)
  • Buffer is allocated of size SIZE and then memory is copied to it from LOCATION to LOCATION + SIZE
  • LOCATION is incremented by SIZE
  • Repeat until no data remains

System call interface

Qtum uses syscalls in a very similar way to Linux based systems. The stack is not used at all, and only registers are used. If more arguments are needed than registers exist, then one register needs to point to an area of memory wiht these additional arguments in some kind of structure. The interrupt number 0x40 is used for all Qtum syscalls at this time.

Registers for calling:

  • EAX - system call number
  • EBX, ECX, EDX, ESI, EDI, EBP - arguments 1-6
  • ESP, EFLAGS - unused

Registers upon return from syscall:

  • EAX - return value (0 means success typically, excluding operations that return a length)
  • EBX, ECX, EDX, ESI, EBP, ESP, EFLAGS - not modified

A wrapper function is provided in libqtum to simplify this interface for C ABI using languages:

.global __qtum_syscall
// long syscall(long number, long p1, long p2, long p3, long p4, long p5, long p6)
__qtum_syscall:
  push %ebp
  mov %esp,%ebp
  push %edi
  push %esi
  push %ebx
  mov 8+0*4(%ebp),%eax
  mov 8+1*4(%ebp),%ebx
  mov 8+2*4(%ebp),%ecx
  mov 8+3*4(%ebp),%edx
  mov 8+4*4(%ebp),%esi
  mov 8+5*4(%ebp),%edi
  mov 8+6*4(%ebp),%ebp
  int $0x40
  pop %ebx
  pop %esi
  pop %edi
  pop %ebp
  ret

Most people should never need to do anything with syscalls however, as libqtum provides easy to use wrapper functions for each syscall.

Posted: 10/22/2018 10:52:10 PM