Exposed Formats in Qtum x86

The following formats are exposed to interface and smart contract code

Runtime Length Encoding Format

On the blockchain all contract data for x86 uses a variant of Runtime Length Encoding (RLE) for bytes that are equal to 0. Non-zero data is not encoded. When a 0 is encountered in the bytestream it becomes a length to indicate how many zero bytes follow it. Having a length of 0 for the RLE is expressly forbidden and will result in an invalid transaction that can not be included in blocks or in the mempool

The RLE payload is prefixed with a 32 bit integer which contains a 24-bit decoded size field. If this size does not exactly match the decoded payload, the transaction is invalid.

the other 8 bits remaining in the 32 bit integer is reserved as a version number for potentially implementing other compression formats

Note that the size of the payload can be verified without allocating additional memory.

Relevant code comment:

//format: decoded payload length (uint32_t) | payload
//the compression only compresses 0 bytes. 
//when 0x00 is encountered in the bytestream, it is transformed converted to a run-length encoding
//encoding examples:
//0x00 00 00 00 -> 0x00 04
//0x00 -> 0x00 01
//0x00 00 -> 0x00 02
//0x00 (repeated 500 times) -> 0x00 0xFF 0x00 0xF5
//as part of the encoding process, a 32 bit length field is prefixed to the payload

The RLE encoding is used on top of any other formats.

Code for encoding and decoding is included in x86Lib, but should be trivial to implement in any language needed.

The length field may be removed at a later time as right now it provides no benefit. The original rationale was so that a constant memory allocation profile could be used, rather than potentially needing multiple allocations. However, the decoded format of data is stored in backing database in Qtum right now, making decoding a one-time operation.

Smart contract bytecode format

The x86 VM works differently from how the EVM encodes and stores contracts. In the EVM, a flat array of opcodes is all that is given. These opcodes are then executed and they decide what part of those opcodes and data is actually stored in permanent storage as the contract bytecode.

With Qtum's x86 VM it's much simpler and generally easier to reason about. A custom binary format with separate sections for options (ie, contract configuration info etc), initialized data, and code. The entire binary is stored directly in permanent storage. This prevents the situation in the EVM where a contract must also contain a contract saver. This is not too complicated in the EVM, but with existing paradigms used in x86 development this would be extremely unusual, probably necessitating the need to write two completely separate programs in order to prevent assumed address memory errors etc.

Tooling is provided in the x86Lib testbench program to convert an ELF program to this custom binary format. The rationale for not using ELF directly is that ELF contains a significant amount of potential complexity, even if most ELF programs are fairly simple to parse. Rather than making a special case "you can't use these features in ELF" type implementation, it is instead easier to use a very simple custom format and providee conversion tools. Because of this, it is also possible to potentially provide flat binary, PE, and other binary format support.

The custom format is a flat byte array with a prefix map that indicates the length of each section:

//The data field available is only a flat data field, so we need some format for storing
//Code, data, and options.
//Thus, it is prefixed with 4 uint32 integers.
//1st, the size of options, 2nd the size of code, 3rd the size of data
//4th is unused (for now) but is kept for padding and alignment purposes
struct ContractMapInfo {
    //This structure is CONSENSUS-CRITICAL
    //Do not add or remove fields nor reorder them!
    uint32_t optionsSize;
    uint32_t codeSize;
    uint32_t dataSize;
    uint32_t reserved;
} __attribute__((__packed__));

After this map, just a flat array of bytes remains.

Flow:

  • LOCATION = 0, which is the first byte after ContractMapInfo
  • copy option data into a buffer from LOCATION to LOCATION + optionsSize. Increment LOCATION by optionsSize
  • copy code data into a buffer from LOCATION to LOCATION + codeSize. Increment LOCATION by codeSize
  • etc etc...

Section description:

  • Options is currently unused and thus optionsSize must be 0. Later it will contain dependency graphs, trusted contract options, and other specialized configuration data. This data is not directly exposed to contract code, unless the option specifically indicates so
  • Code is read-only executable code. This is loaded into the contract's memory space at 0x1000 and currently has a max size of 1Mb. The contract is not allowed to modify the contents of this memory. Their may also be a gas surcharge for executing code outside of this section, since this may make JIT and other optimizations more difficult. Right now this section has a fixed size of 1Mb and any data beyond codeSize is set to 0.
  • Data is initialized and read-write data. This is loaded into the contract's memory space at 0x100000 and also has a 1Mb limit. In some documents this section of memory is also referred to as "scratch memory". Right now this space has a fixed size of 1Mb and any data beyond dataSize is set to 0. The ".BSS" section of ELF files (uninitialized memory) will end up in this memory area as well, but since there is no data associated with it, there is no need to provide any info about .BSS to the VM. The ELF file converter will check that the data size + bss size does not exceed 1Mb.

Smart contract transaction-call format

Contracts have a call stack which can be used to send arguments to called contracts and for called contracts to return data. However, having numerous data entities in a transaction is non-trivial to deal with for verification purposes. So, there will only be 1 actual data field in the transaction format for calling contracts (same as we currently have for the EVM). However, an ABI is provided to simplify the smart contract's responsibiltiy for parsing this. The format can of course by ignored by just having one large argument that is the byte array you want to pass.

Flow:

  • LOCATION = 0, the first byte of the data
  • 32 bit integer, SIZE, is read from LOCATION
  • LOCATION is incremented by 2 (size of integer)
  • Buffer is allocated of size SIZE and then memory is copied to it from LOCATION to LOCATION + SIZE
  • LOCATION is incremented by SIZE
  • Repeat until no data remains

System call interface

Qtum uses syscalls in a very similar way to Linux based systems. The stack is not used at all, and only registers are used. If more arguments are needed than registers exist, then one register needs to point to an area of memory wiht these additional arguments in some kind of structure. The interrupt number 0x40 is used for all Qtum syscalls at this time.

Registers for calling:

  • EAX - system call number
  • EBX, ECX, EDX, ESI, EDI, EBP - arguments 1-6
  • ESP, EFLAGS - unused

Registers upon return from syscall:

  • EAX - return value (0 means success typically, excluding operations that return a length)
  • EBX, ECX, EDX, ESI, EBP, ESP, EFLAGS - not modified

A wrapper function is provided in libqtum to simplify this interface for C ABI using languages:

.global __qtum_syscall
// long syscall(long number, long p1, long p2, long p3, long p4, long p5, long p6)
__qtum_syscall:
  push %ebp
  mov %esp,%ebp
  push %edi
  push %esi
  push %ebx
  mov 8+0*4(%ebp),%eax
  mov 8+1*4(%ebp),%ebx
  mov 8+2*4(%ebp),%ecx
  mov 8+3*4(%ebp),%edx
  mov 8+4*4(%ebp),%esi
  mov 8+5*4(%ebp),%edi
  mov 8+6*4(%ebp),%ebp
  int $0x40
  pop %ebx
  pop %esi
  pop %edi
  pop %ebp
  ret

Most people should never need to do anything with syscalls however, as libqtum provides easy to use wrapper functions for each syscall.

Posted: 10/22/2018 10:52:10 PM

Comments

Posting comments is currently disabled(probably due to spam)