I'm Ashley and this site is now frozen

I've now established the online moniker 'ash2q' and similar. Although I'll always keep aliases etc for email, my proper email is now ashley@ash2q.com. I don't have a personal/tech website right now, but if I do in the future it'll be at ash2q.com. My new github is https://github.com/ash2q and my new twitter is https://twitter.com/ash2qx

I've been thinking for a while about how to handle retiring the "earlz" moniker. Due to the great number of links into my twitter, github, website, etc., I've decided that the best solution is to make all 'earlz'-based accounts, sites, etc. read-only and to migrate to new accounts as needed.

This message will be the last (or nearly the last) thing posted on these accounts, sites, etc. For search engine etc reasons: my legal name is Ashley Houston, and I was formerly known as Jordan Earls.

See you on the other side

Tags: meta
Posted: 2/9/2022 5:55:26 AM

The "UniversalAddress" structure

In Qtum (as well as Bitcoin and others) there are multiple "address" formats. These might look like QNPkSpiwVfvMoMK8wqpKN9TdyLgRwJTYKz for a standard pay-to-pubkeyhash address, or the address might begin with the letter M, indicating a pay-to-scripthash address. The prefix of the address (ie, the bit that determines if it starts with a Q or M) determines what format the address is, and thus how coins can be sent to that address. There is actually more data in an address than what goes into the blockchain. An address is constructed like the following:

base58(version + data + checksum)

Version is a 1 byte prefix that determines what letter the base58 address begins with. Data is the actual data that will go on the blockchain. For a pubkeyhash address, this would be a ripemd160-hashed public key. The checksum is the bottom 4 bytes of sha256(sha256(version + data)). Satoshi himself designed this system to prevent mistakes such as sending coins to a mistyped address.
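
As a concrete sketch (this is not code from Qtum or Bitcoin; it assumes only a sha256() helper with the usual input/length/output shape), the checksum computation looks like this:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

//Assumed SHA-256 helper; any implementation with this shape works
void sha256(const uint8_t* data, size_t len, uint8_t out[32]);

//checksum = bottom 4 bytes of sha256(sha256(version + data))
void base58checkChecksum(const uint8_t* versionAndData, size_t len, uint8_t checksum[4]){
    uint8_t round1[32];
    uint8_t round2[32];
    sha256(versionAndData, len, round1);
    sha256(round1, sizeof(round1), round2);
    memcpy(checksum, round2, 4);
}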

Since Qtum was first conceived, the multiple address formats have been difficult to reconcile with the design of the EVM. In the EVM, all addresses are assumed to be 160 bits. This does not allow for any simple method of disambiguating between a pubkeyhash or scripthash form of address; there is no room for the "version" field. Furthermore, it's always been rather confusing because Qtum uses base58 addresses everywhere like Bitcoin does, but the EVM and Solidity naturally use hex addresses like Ethereum. It's easy to convert a base58 address to a hex address, but doing so strips away the version number, and it's easy to make mistakes.

Our best workaround for this problem was basically to have Solidity contracts etc continue to use hex addresses. If the hex address exists in the EVM as a contract, coins or whatever are sent to that contract address. If the address does not exist, then it is assumed to be a pubkeyhash address. With this system it's impossible to make use of pay-to-scripthash or multisig addresses.

With the design of the x86 VM in Qtum, this was one of the more important things to be fixed. It will be possible to use established and safe Bitcoin style multisig addresses to manage x86 contracts, and contracts can send and receive funds to/from these addresses. Exposing the base58 version number was not enough however. To avoid mistakes like sending mainnet Qtum to a testnet network address, the exact version number can vary per network. It would not be ideal to need to compile special versions of a contract that work only on testnet, or only on mainnet. So, instead the "UniversalAddress" structure was created. It is currently defined as the following:

#define ADDRESS_DATA_SIZE 32
typedef struct{
    //Do not modify this struct's fields
    //This is consensus critical!
    uint32_t version;
    uint8_t data[ADDRESS_DATA_SIZE];
} __attribute__((__packed__)) UniversalAddressABI;

//Note: we don't use Params().Base58Prefixes for these version numbers because otherwise
//contracts would need to use two different SDKs since the base58 version for pubkeyhash etc changes
//between regtest, testnet, and mainnet
enum AddressVersion{
    UNKNOWN = 0,
    //legacy is either pubkeyhash or EVM, depending on if the address already exists
    LEGACYEVM = 1,
    PUBKEYHASH = 2,
    EVM = 3,
    X86 = 4,
    SCRIPTHASH = 5
};

The version number was made 4 bytes instead of 1 to allow for future flags and to keep with a 4 byte alignment. Allowing the structure to be 33 bytes could be rather inefficient, since unaligned memory access is more expensive on some processors. The address data is 32 bytes because it can fit a 256 bit hash. No current address format uses a full 256 bit hash though, so there has been some consideration of reducing that size to 20 bytes to fit the standard 160-bit ripemd160 hash... but thus far we are leaning on the side of flexibility.

The version numbers used in UniversalAddresses are not the same as the base58 address versions. These addresses are not intended to be used directly, but rather to be easy to write code around. Each value corresponds to a pubkeyhash, scripthash, EVM, or x86 address. These version numbers are always the same regardless of network, and tools are provided to decode and map a base58 address into a UniversalAddress. Here is an example in a C contract:

UniversalAddressABI a;
//a non-zero return from qtumParseAddress indicates failure
if(qtumParseAddress("xCREYTgvdk3XzFNpdXkj35fZX4hJK2Ckpd", &a)){
    qtumError("Error parsing address");
}
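
Going the other direction, filling in a UniversalAddress by hand is just as simple. A sketch (the hash160 buffer is assumed to already hold the 20-byte ripemd160 result of hashing a public key):

#include <stdint.h>
#include <string.h>

//Sketch: build a pubkeyhash UniversalAddress from a 20 byte hash
UniversalAddressABI toPubkeyhashAddress(const uint8_t hash160[20]){
    UniversalAddressABI a;
    memset(&a, 0, sizeof(a)); //unused data bytes stay zero
    a.version = PUBKEYHASH; //network independent version number
    memcpy(a.data, hash160, 20); //a 160 bit hash easily fits the 32 byte field
    return a;
}
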
Posted: 11/19/2018 5:46:12 AM

Memory map of the x86 VM

This is a continuing series of x86 blog posts that basically documents things as needed for both team members and the public.

Unlike the EVM, in x86 memory is used extensively, and gaps in memory are possible. For example, accessing memory at 0x1000 might give an error about there being no memory there, while 0x2000 might have something useful. The EVM is actually the odd one out here, as most VM and CPU architectures use this concept to their advantage. Back when 1Mb of memory was rare, the 8086 actually abused this by having most memory located starting at 0, but with the read-only BIOS memory located at 0xF0000, and sometimes with various sections of memory that actually connected to external devices rather than to RAM.

Anyway, Qtum's x86 VM design abuses this feature as well so that certain areas of memory can be shared etc. This is Qtum's core memory map; explanations for each memory section are below:

  • Emergency stack - 0x100, length 64, read-write
  • Contract Code - 0x1000, length 0x10000 (max), read-only
  • Contract Data - 0x100000, length 0x10000 (max), read-write
  • Stack memory - 0x200000, length 8196, read-write
  • Execution data - 0xD0000000, length TBD, read-only
  • Transaction data - 0xD1000000, length dynamic, read-only
  • Blockchain data - 0xD2000000, length TBD, read-only

Note that a memory section can be "init on read"; ie, there's not necessarily a need to construct the data for a section before a contract first reads that data. There may also be additional gas costs for the first access of such a memory area.

Emergency Stack

This is an emergency stack that is used when a double-fault exception occurs. It is unused right now since exception support is not implemented.

Contract Code

This is where the actual contract code is loaded. It is read-only for security and optimization purposes. The actual size of this section is determined by how much contract code needs to be loaded. Memory beyond the actual size can be read, but it can not be written to since it's read-only, and is fixed at 0.

Contract Data

This is where contract data is loaded. Unlike in the EVM, if you have a variable with some value, there is no need for CPU code to be written that reserves some bit of memory and then sets it to that value. A variable in x86 is instead just a pointer to a memory address, and that memory address is expected to be populated by the loader (in normal operating systems, the executable parser and loader; in Qtum, the VM initialization process). This is nice because we can use fast native code to initialize all the variables in one quick "memcpy" operation. Variable pointers end up pointing to somewhere in this memory section. The actual size is currently fixed at 1Mb, but this will be changed later beyond the prototype.

Note that read-only data is (depending on linker configuration) stored in the code section. Read-only data in common programs includes strings, constants, etc.
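
As an illustrative fragment (not real SDK code), this is where typical C globals end up under that scheme:

#include <stdint.h>

uint32_t counter = 42; //initialized, read-write: its bytes live in the data section
uint8_t scratch[64]; //uninitialized (.bss): zero-filled at load, no bytes stored on-chain
const char greeting[] = "hello"; //read-only: may be placed with the code section, depending on linker configuration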

Stack memory

This is a section of memory reserved for the x86 call stack. The call stack is used for passing arguments to functions, storing return addresses, local variables, etc. The reason for not overloading the data section with stack data is that this section of memory is surrounded by missing memory. This means that if the stack overflows or underflows, it can be immediately "detected" by an error being thrown. Explicit errors are almost always preferable to continuing to run with corrupted state.

Execution data

This is data that is read-only and specific to each contract execution. This means that the data will be different when calling a contract within a contract. This data includes things like the sender address (what address initiated this execution), gas limit, and other data.
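
Since this section is ordinary memory, a contract can read it through a pointer to its fixed base address. The field layout below is purely hypothetical (the real contents are still TBD); only the 0xD0000000 base comes from the memory map above:

#include <stdint.h>

//Hypothetical layout, for illustration only
typedef struct{
    UniversalAddressABI sender; //address that initiated this execution
    uint64_t gasLimit;
} ExecDataABI;

#define EXEC_DATA ((const ExecDataABI*)0xD0000000)
//e.g. EXEC_DATA->gasLimit reads straight from the read-only section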

Transaction data

This is dynamic length data that encodes the complete transaction data which caused the current contract execution. This includes all inputs and outputs. There will be both raw script access and simplified "100 coins was sent to address A" style data that can be used very easily. There will be helper library functions to assist in script decoding etc.

Blockchain data

This is read-only global blockchain data that is constant for all contract executions in the current block. This includes things like the block gas limit, current block height, previous block hashes, current difficulty, etc.

Rationale

Most of this data is made available in memory rather than through syscalls for two primary reasons:

  • Every syscall has a non-negligible security risk
  • Syscalls are expensive in contract code size, contract code gas overhead, and VM implementation speed (it means exiting JIT etc)

Every syscall is an interface exposing contract code to the outside world. The majority of OS kernel security exploits come from buggy syscall behavior. Thus, there is a big benefit to exposing as few syscalls to the VM as possible: fewer ways for the VM to escape the sandbox, fewer places to audit.

The other factor that encouraged this is that the syscall process is quite expensive in multiple ways. The code size is small, but not negligible. It requires a minimum of about 30 bytes to make a syscall from C. Other languages might manage smaller, but most are probably bigger. The expensive part of the syscall process is that there must be a translation between the syscall and the standard C interface. This involves calling a helper function, preserving a few registers, getting a bunch of items from the stack and storing them in registers, doing the actual syscall, and then finally restoring those previously preserved registers. The 30 byte figure above actually only covers the call into the helper function; the total number of bytes executed is significantly more, but is not represented in contract size, since that code is written only once and used for every syscall.

And then finally, beyond actual contract inefficiencies, each syscall can potentially wreck VM performance. In a JIT workflow, the actual hardware CPU can cache the JIT overhead code and JIT-compiled contract code very quickly, but upon leaving that section for some other piece of code, it typically must clear at least portions of the cache to make room for the new code. And in a true virtualization workflow using hardware support, it becomes even more expensive, as it can involve a context switch and a hardware-level system call. This is a common problem with most languages involving a VM, including WebAssembly (up until recently; I think that team made a major improvement somehow), Javascript/V8, etc.

Prototype

Qtum x86 is still in prototype. These memory areas are fairly certain, but as with everything in this series, things are still up in the air and this should not be considered a final specification.

Posted: 10/27/2018 12:20:38 AM

Exposed Formats in Qtum x86

The following formats are exposed to interface and smart contract code.

Run-Length Encoding Format

On the blockchain, all contract data for x86 uses a variant of run-length encoding (RLE) for bytes that are equal to 0. Non-zero data is not encoded. When a 0 is encountered in the bytestream, the byte that follows it is a length indicating how many zero bytes it expands to. A length of 0 for the RLE is expressly forbidden and will result in an invalid transaction that can not be included in blocks or in the mempool.

The RLE payload is prefixed with a 32 bit integer which contains a 24-bit decoded size field. If this size does not exactly match the decoded payload, the transaction is invalid.

The other 8 bits remaining in the 32 bit integer are reserved as a version number for potentially implementing other compression formats.

Note that the size of the payload can be verified without allocating additional memory.

Relevant code comment:

//format: decoded payload length (uint32_t) | payload
//the compression only compresses 0 bytes. 
//when 0x00 is encountered in the bytestream, it is converted to a run-length encoding
//encoding examples:
//0x00 00 00 00 -> 0x00 04
//0x00 -> 0x00 01
//0x00 00 -> 0x00 02
//0x00 (repeated 500 times) -> 0x00 0xFF 0x00 0xF5
//as part of the encoding process, a 32 bit length field is prefixed to the payload

The RLE encoding is used on top of any other formats.

Code for encoding and decoding is included in x86Lib, but should be trivial to implement in any language needed.
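
For instance, a minimal decoder sketch in C (the placement of the 24-bit size field in the low bits of a little-endian prefix is an assumption here, since the exact bit layout is not specified above):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

//Decodes the zero-byte RLE described above into out
//Returns the decoded length, or -1 on any invalid input
long rleDecode(const uint8_t* in, size_t inlen, uint8_t* out, size_t outcap){
    if(inlen < 4){
        return -1; //missing 32 bit prefix
    }
    uint32_t header;
    memcpy(&header, in, sizeof(header)); //little-endian host assumed
    uint32_t expected = header & 0xFFFFFF; //24 bit decoded size field
    //the top 8 bits are the reserved version field
    size_t o = 0;
    for(size_t i = 4; i < inlen; i++){
        if(in[i] != 0x00){
            if(o >= outcap) return -1;
            out[o++] = in[i]; //non-zero data is copied verbatim
        }else{
            if(++i >= inlen) return -1; //truncated run
            uint8_t run = in[i];
            if(run == 0) return -1; //zero-length runs are expressly forbidden
            if(o + run > outcap) return -1;
            memset(&out[o], 0, run); //expand the run of zero bytes
            o += run;
        }
    }
    return (o == expected) ? (long)o : -1; //decoded size must match exactly
}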

The length field may be removed at a later time, as right now it provides no benefit. The original rationale was that a constant memory allocation profile could be used, rather than potentially needing multiple allocations. However, the decoded form of the data is stored in the backing database in Qtum right now, making decoding a one-time operation.

Smart contract bytecode format

The x86 VM works differently from how the EVM encodes and stores contracts. In the EVM, a flat array of opcodes is all that is given. These opcodes are then executed and they decide what part of those opcodes and data is actually stored in permanent storage as the contract bytecode.

With Qtum's x86 VM it's much simpler and generally easier to reason about. A custom binary format is used, with separate sections for options (ie, contract configuration info etc), initialized data, and code. The entire binary is stored directly in permanent storage. This prevents the situation in the EVM where a contract must also contain a contract saver. This is not too complicated in the EVM, but with existing paradigms used in x86 development it would be extremely unusual, probably necessitating two completely separate programs in order to prevent assumed-address memory errors etc.

Tooling is provided in the x86Lib testbench program to convert an ELF program to this custom binary format. The rationale for not using ELF directly is that ELF contains a significant amount of potential complexity, even if most ELF programs are fairly simple to parse. Rather than making a special-case "you can't use these features in ELF" type of implementation, it is instead easier to use a very simple custom format and provide conversion tools. Because of this, it is also possible to potentially provide flat binary, PE, and other binary format support.

The custom format is a flat byte array with a prefix map that indicates the length of each section:

//The data field available is only a flat data field, so we need some format for storing
//Code, data, and options.
//Thus, it is prefixed with 4 uint32 integers.
//1st, the size of options, 2nd the size of code, 3rd the size of data
//4th is unused (for now) but is kept for padding and alignment purposes
struct ContractMapInfo {
    //This structure is CONSENSUS-CRITICAL
    //Do not add or remove fields nor reorder them!
    uint32_t optionsSize;
    uint32_t codeSize;
    uint32_t dataSize;
    uint32_t reserved;
} __attribute__((__packed__));

After this map, just a flat array of bytes remains.

Flow:

  • LOCATION = 0, which is the first byte after ContractMapInfo
  • copy option data into a buffer from LOCATION to LOCATION + optionsSize. Increment LOCATION by optionsSize
  • copy code data into a buffer from LOCATION to LOCATION + codeSize. Increment LOCATION by codeSize
  • copy data into a buffer from LOCATION to LOCATION + dataSize. Increment LOCATION by dataSize (see the sketch below)
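
In C, that flow amounts to a few pointer increments. A sketch, assuming blob points at the raw stored bytes and the sizes have already been bounds-checked:

#include <stdint.h>

void loadSections(const uint8_t* blob){
    const struct ContractMapInfo* map = (const struct ContractMapInfo*)blob;
    const uint8_t* location = blob + sizeof(struct ContractMapInfo); //LOCATION = 0
    const uint8_t* options = location; //optionsSize bytes (currently must be 0)
    location += map->optionsSize;
    const uint8_t* code = location; //codeSize bytes, destined for 0x1000
    location += map->codeSize;
    const uint8_t* data = location; //dataSize bytes, destined for 0x100000
    location += map->dataSize;
    //... copy each section into its memory area ...
}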

Section description:

  • Options is currently unused, and thus optionsSize must be 0. Later it will contain dependency graphs, trusted contract options, and other specialized configuration data. This data is not directly exposed to contract code, unless an option specifically indicates so.
  • Code is read-only executable code. This is loaded into the contract's memory space at 0x1000 and currently has a max size of 1Mb. The contract is not allowed to modify the contents of this memory. There may also be a gas surcharge for executing code outside of this section, since this may make JIT and other optimizations more difficult. Right now this section has a fixed size of 1Mb and any data beyond codeSize is set to 0.
  • Data is initialized and read-write data. This is loaded into the contract's memory space at 0x100000 and also has a 1Mb limit. In some documents this section of memory is also referred to as "scratch memory". Right now this space has a fixed size of 1Mb and any data beyond dataSize is set to 0. The ".BSS" section of ELF files (uninitialized memory) will end up in this memory area as well, but since there is no data associated with it, there is no need to provide any info about .BSS to the VM. The ELF file converter will check that the data size + bss size does not exceed 1Mb.

Smart contract transaction-call format

Contracts have a call stack which can be used to send arguments to called contracts and for called contracts to return data. However, having numerous data entities in a transaction is non-trivial to deal with for verification purposes. So, there will only be 1 actual data field in the transaction format for calling contracts (same as we currently have for the EVM). However, an ABI is provided to simplify the smart contract's responsibility for parsing this. The format can of course be ignored by just having one large argument that is the byte array you want to pass.

Flow:

  • LOCATION = 0, the first byte of the data
  • 32 bit integer, SIZE, is read from LOCATION
  • LOCATION is incremented by 4 (the size of the 32 bit integer)
  • Buffer is allocated of size SIZE and then memory is copied to it from LOCATION to LOCATION + SIZE
  • LOCATION is incremented by SIZE
  • Repeat until no data remains
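
A sketch of that loop in C (the buffer allocation step is elided in favor of a callback, and the function name is illustrative rather than the real ABI helper):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

//Walks the transaction-call data, invoking onArg once per argument
//Returns 0 on success, or -1 on malformed data
int forEachArgument(const uint8_t* data, size_t len, void (*onArg)(const uint8_t* arg, uint32_t size)){
    size_t location = 0;
    while(location < len){
        if(len - location < 4) return -1; //truncated SIZE field
        uint32_t size;
        memcpy(&size, &data[location], sizeof(size)); //32 bit SIZE
        location += sizeof(size);
        if(len - location < size) return -1; //truncated argument
        onArg(&data[location], size); //callers copy into their own buffer as needed
        location += size;
    }
    return 0;
}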

System call interface

Qtum uses syscalls in a very similar way to Linux-based systems. The stack is not used at all; only registers are used. If more arguments are needed than registers exist, then one register needs to point to an area of memory with these additional arguments in some kind of structure. The interrupt number 0x40 is used for all Qtum syscalls at this time.

Registers for calling:

  • EAX - system call number
  • EBX, ECX, EDX, ESI, EDI, EBP - arguments 1-6
  • ESP, EFLAGS - unused

Registers upon return from syscall:

  • EAX - return value (0 means success typically, excluding operations that return a length)
  • EBX, ECX, EDX, ESI, EBP, ESP, EFLAGS - not modified

A wrapper function is provided in libqtum to simplify this interface for languages using the C ABI:

.global __qtum_syscall
// long syscall(long number, long p1, long p2, long p3, long p4, long p5, long p6)
__qtum_syscall:
  push %ebp             // preserve callee-saved registers
  mov %esp,%ebp
  push %edi
  push %esi
  push %ebx
  mov 8+0*4(%ebp),%eax  // syscall number
  mov 8+1*4(%ebp),%ebx  // arguments 1-6, moved from the C stack into registers
  mov 8+2*4(%ebp),%ecx
  mov 8+3*4(%ebp),%edx
  mov 8+4*4(%ebp),%esi
  mov 8+5*4(%ebp),%edi
  mov 8+6*4(%ebp),%ebp  // arg 6 overwrites %ebp last; it is restored below
  int $0x40             // Qtum syscall interrupt; return value lands in %eax
  pop %ebx              // restore preserved registers
  pop %esi
  pop %edi
  pop %ebp
  ret
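
The matching C declaration, as given in the comment at the top of the assembly, is simply:

long __qtum_syscall(long number, long p1, long p2, long p3, long p4, long p5, long p6);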

Most people should never need to do anything with syscalls however, as libqtum provides easy to use wrapper functions for each syscall.

Posted: 10/22/2018 10:52:10 PM

Proposal to allow free UTXO creation in Qtum

So there is one big thing that is annoying about UTXOs. Namely, that in order to do anything, you first need your very own UTXO. They're easy to get, and come in any size you want above 0... but you can't make them out of thin air if you don't already own one. Getting one involves either being friendly with a staker who can create one for you for free in the coinstake transaction, or having someone else pay the network fees and be willing to give you a few satoshi.

The way contracts figure out who sent the message (and authenticate it) is to check "ok, did this person spend a UTXO in this transaction" (even if they merely spent it and sent all the funds back to themselves). This sounds a bit wasteful, but the worse problem is that without a UTXO it's impossible to authenticate that you're sending a message to the contract. Even if someone else pays the fees and everything else for you, you still need a UTXO to really do anything with most contracts. This issue commonly comes up in other cases as well. Your wallet has multiple addresses and each address has its own set of UTXOs. The blockchain is incapable of knowing that those addresses belong to the same wallet. So, what happens when you choose to send QRC20 tokens to some certain address, but then find out that address has no UTXOs? You will be incapable of spending the QRC20 tokens, since you have no way of proving to the contract that you are that certain address. The official workaround for this problem is to simply send that certain address any amount of coins (even 1 satoshi) so that it has a UTXO, and then you can withdraw your QRC20 tokens or whatever. Later, Qtum Core will (by default) make sure to use change addresses to ensure that address always has a UTXO.

This is wasteful of precious UTXO set space, but most of all it's incredibly annoying. Our Dapp developers have complained about it, as have their users, our users, and myself. It's a problem... and it makes it harder to use a lot of different and otherwise awesome use cases unique to the UTXO model in Qtum.

My (definitely not 100% proven and thought out) proposal is to extend Bitcoin's fundamentals to support an otherwise boring edge case. In Bitcoin (and Qtum right now), in order to even prove a signature or anything like that on the blockchain, you must give a TXID and a Vout number indicating which UTXO you will be spending, and then you provide a signature that basically makes the UTXO's "vout script" evaluate to true. There's other cool stuff about this, but for the case we're interested in, it involves checking the signature of the public key in the UTXO against the hash of all inputs and outputs of the transaction. So, what if we gave another way of looking up that public key to check the signature of?

My proposal: Allow a TXID and Vout of 0 to be permissible and assumed to have 0 value, but otherwise to be infinitely spendable. To prevent duplicate transactions potentially being valid, all complete transactions must consist of at least one actual vin with a TXID that's not 0. It would probably be very ill-advised to use SIGHASH_ANYONECANPAY, as inputs created with this method could be used in any transaction without restriction (ie, duplicated). This poses no threat to the blockchain, but could be a huge security hole for users, with no practical use. Thus, SIGHASH_ANYONECANPAY will most likely be prohibited when using this functionality. To be safest, it would probably be a requirement to only allow the SIGHASH_ALL (default) signature scheme. Otherwise, anyone with early knowledge of the transaction (such as nodes/stakers) could potentially rewrite the transaction to include another output that executes a contract and drains ERC20 balances or whatever.

No new opcodes need to be introduced, and in theory, if allowed, this has no ill effect on the Bitcoin or Qtum blockchains at this moment. Qtum would make use of this functionality, though, by extending the Account Abstraction Layer so that it can run these vin scripts that would otherwise be useless. It could use pattern matching so that P2SH or PUBKEYHASH scripts could be identified and classified as an address, along with being run as a normal script to verify signatures, of course. Thus, it would be possible to take an address never used on the blockchain and construct a vin that uses and verifies that address, but points to no UTXO. In theory this could be used for many purposes. You could use P2SH/multisig addresses without UTXOs, send messages from contracts (pending attack vector verification), and even use it with Segwit addresses.

The pattern matching would probably be the most difficult part and might actually require an additional opcode for simplicity. The opcode could be something like OP_VOUTBEGINSHERE or whatever and actually do nothing other than be a marker for pattern matching.

Practical Use Case

A user wants to use a Dapp but has never used Qtum before. In order to begin using the Dapp, they must register an account on the blockchain. The Dapp maintainer has an off-chain registration portal to allow for someone to register on the Dapp for free. This works by the Dapp maintainer paying for the gas fees of on-chain registration.

This is possible today (though it has not been used by anyone yet), but with a caveat. The new user must have a UTXO to prove their identity and complete the registration... they don't actually need any money, but the only way to prove ownership of a certain address is by spending a UTXO. With this proposal, it'd be possible for the user to create a UTXO from nothing using only their public/private key.

The actual workflow is that the user would register off-chain or whatever to basically prove that they are worthy of the Dapp site paying for their gas fees (remember, spam is still possible and must be handled somehow). The user would then provide their address to the Dapp site. The Dapp site would then send the user a partially completed transaction composed of the following:

  • Input: Created user address UTXO from txid 0. This is unsigned at this point. It is the first input, so that the user address will be the msg.sender to contract code
  • Input: Dapp UTXO that pays the fees; this is signed and marked SIGHASH_ALL
  • Output: Dapp contract execution that does the registration process on-chain etc. The amount of gas given is chosen by the Dapp and is measured to be the correct amount.
  • Output: Change address to send remaining funds back to the Dapp UTXO address

The user would receive this transaction and validate that it is as expected. They would check that the Dapp contract output only does registration. They would also ensure that no additional outputs are added to the transaction other than the change address UTXO (and they would check that to ensure there is not a hidden contract execution).

Once the user is satisfied and has determined the transaction is secure, they will sign the first input using the SIGHASH_ALL signature scheme, to ensure nothing can be added to or removed from the transaction. The user then broadcasts the transaction to the blockchain (or hands it to the Dapp site to broadcast if they aren't running a node).

Once the transaction is executed and confirmed, the address of the user will be registered with the Dapp on-chain, even though this address has never actually received any Qtum coins or had any other activity to "establish" the address.

Many identity-based Dapps in the ecosystem right now struggle with this problem. Users might pay with Bitcoin or a credit card or some other off-chain method of payment for the Dapp to register them, but if the Dapp sends them Qtum (or ETH, whatever) as a stipend for registering on-chain, there is no method of actually enforcing that the stipend is spent on registration. In addition, the Dapp could get in serious legal trouble, since exchanging fiat for cryptocurrency in most countries carries a lot of legal regulation, like money transmission licenses etc. With this method, gas and transaction fees can easily be paid by the Dapp in an enforced way. The user can't just run off with the stipend and choose to cash it out, or spend it on a different contract execution. Thus it is (btw, IANAL, don't take this as legal advice) probably safe legally to take fiat and use it to pay for the fees of a specific on-chain transaction.

Another simpler use case is for an ERC20 token provider to provide the gas fees for sending tokens to an address. If we're working in a world where people might not own Qtum, but do use the Dapps on the Qtum blockchain, then there need to be systems for using Dapps without actually owning any Qtum to pay for your own transaction fees. This proposal is just the first step in making this a practical reality. There are various workarounds in Qtum and Ethereum to solve this problem, but they are clunky and definitely not ideal.

Posted: 10/18/2018 7:33:05 PM

Economics of Fees and Gas

So, I'm terrible at preparing for speeches, so I'm trying something new; I'll write about it before I speak about it.

Gas in Ethereum (and Qtum) is basically the cost of an operation in a smart contract. Different operations have different costs. Some are very cheap, some are not so cheap. This mechanism basically functions as a way of discouraging certain behavior, and also serves to make spam and attacks on the blockchain more expensive. In a Turing-Complete computer, this notion of gas is an absolute requirement. Otherwise, someone could simply write an infinite loop and watch the blockchain come to a halt.

Today I'll discuss specifically what kind of things the gas model encourages in certain conditions, and some of the unintended consequences of that.

Storage costs

So, permanent storage is by far the most expensive resource on the Ethereum blockchain. It is completely impractical to store any significant amount of data on the blockchain. In current network conditions, 1Kb of data costs ~$2.50, depending on how fast you want it to confirm. This means 1Mb is over $2,000. However, this is for traditional storage, and there are unconventional alternatives to the framework Ethereum lays out. I'll present a total of 5 methods of storage, with various behaviors, costs, and benefits:

External Code

This is basically where you create a new contract account using the CREATE opcode in Ethereum. You store whatever data you need to as the contract bytecode of a newly created contract. This is probably the most wasteful way to store data on the blockchain, as once you store it, there is very little incentive to include a way to destroy that data after it is no longer needed.

This is a semi-volatile storage method. Once the data is stored, it can not be updated without recreating it from scratch. It is best for large blocks of data which do not need to be updated at a later point, or which only need to be updated very seldom. It is extremely cheap to load this data once written, beating conventional storage by an order of magnitude.

Internal Code

This is an obvious choice for some use cases and is used internally in Solidity-generated bytecode. Basically, if you have some data you can generate in your constructor and cache, it is typically much cheaper to calculate it and then store it directly in the deployed contract bytecode than it is to include large amounts of data in the create transaction itself. Assuming the data never needs to be updated, this is the best method to use, as it is also the cheapest way to load data.

Storage

This is the conventional storage model provided by Ethereum. It is granular, allowing each individual 32-byte word to be updated in storage, with reduced costs to update existing storage compared to writing to new storage. There is also some incentive to keep this area clean and tidy, as cleaning up storage can provide the contract a gas stipend. This area is easy to update and easy to read, but in most cases it is the most expensive method to access and write to.

Transaction

This is a fully volatile data storage method, though it actually stores no persistent data directly accessible to the contract. Instead of storing data within the EVM itself, the data is provided to the contract with each call that needs it. The contract then hashes the data to ensure that it is correct and matches the expected contract state. This method has some drawbacks, the biggest being that transaction data is extremely expensive. It is by far the most expensive to "access", since access requires putting the data in a transaction that calls the contract. However, it is also the cheapest (other than internal code) to actually "write" to and update, because the data is not actually stored within the blockchain state itself. This method could be extended to use a hash-tree-like structure so that the data provided to the contract is very granular and thus less wasteful of precious transaction space. However, this method also comes with a major drawback: data races must be accounted for. If another user sends a call to the contract that causes the data to be updated, then it invalidates the data in any pending transaction. This, along with the greatly expensive cost of data access, severely limits the potential use cases of this method.

So, how do these methods stack up for overall cost?

Note that the "actual" transaction cost is shown along-side the other methods. This is basically how much it costs to actually call a contract with the set amount of data.

[Charts: data access cost, data access cost with transaction, initial data storage cost, and data update storage cost, each plotted against data size]

These graphs clear some things up for us:

  • External code is preferable for any data greater than 128 bytes that never needs to be updated
  • Internal code should be used anytime pure constant data is needed and is known at deployment time
  • Storage should be used for any commonly updated data
  • Transaction data is usable in some cases despite races, but if using the contract more than once, the overall cost quickly becomes very high.

Economics of Storage

On the topic of the game theory and economics behind this: is it really better for the overall blockchain to store non-volatile data in contract bytecode rather than in the storage database? I guess it depends on several factors. As more accounts are created, does it become more expensive to access account data? I don't have the answers, but I can guarantee that using this method for volatile storage is not only more expensive, but also somewhat wasteful. My graph includes the cost of deploying what is basically a new contract, but subtracts from the gas cost the ~19000 gas refund given by destroying the old data contract. This does have a measurable impact on the left side of the graph, but after about 320 bytes it becomes a moot benefit, other than being a good blockchain citizen.

So, what's the downside of contract storage being so expensive? The primary disadvantage is that it can encourage performing computations that are cheaper in gas but actually take more time to execute, just to avoid using storage as a caching layer.

These costs can also in some cases encourage data to be sent in each transaction. This is probably the most detrimental condition. The database where contract data is stored is in some ways actually a free resource of the blockchain at a protocol level. Outside of SPV (in Qtum) or "fast" (in Ethereum) syncing methods, the size of this database has absolutely no bearing on how large the actual blockchain is, nor how long it takes to process. The size of this database in the end is collapsed into a single 256-bit SHA3 hash called the stateRootHash. Arguably, the most direct resource cost to the blockchain is not this storage, but rather all data that is attached to transactions. This transaction data can not be discarded or ignored without certain security implications.

For the lightweight wallets and nodes present in Qtum, it is actually more detrimental to the overall user experience to have data stored in account bytecode data rather than directly in storage. It is simple to retrieve and validate all storage associated with a single account. However, if a contract is accessing multiple separate accounts for data purposes, then it becomes a completely non-deterministic process, causing significant latency. Without downloading all contract bytecode, it is not possible to retrieve and validate this data in one shot. You instead must begin execution of the contract, then go back to the network, retrieve another account, validate it, etc., and then finally continue on with execution. For just 1 data contract like this, it may not have a large impact, but because of how cheap this data is to access, there is no clear benefit to not using multiple contracts, other than the fixed 32,000 gas flat creation fee. In fact, after a certain point, especially with semi-volatile data, it is greatly encouraged to keep this data fairly granular.

So, basically the high cost for storage is borne out of the need to keep the blockchain easy to interact with for lightweight wallets. However, there is a clear path that contract developers can use to circumvent this cost, and it hurts lightweight wallets much more than storing a lot of data directly in storage would.

Computation Costs

Now let's talk about the less obvious expenses in the Ethereum gas model. Almost every computational opcode has some cost associated with it. These costs are significantly less than those for storage or another direct resource, but with how many opcodes are required to do basic operations in the EVM, they add up quickly. One of my favorite examples to point to is the cost of comparing two strings to determine if they are equal. So, I constructed this basic test contract:

contract GasTest{
    using strings for *;
    function StringUtilsEqual(string tmp1, string tmp2) constant returns (bool){
        return StringUtils.equal(tmp1, tmp2);
    }
    function SliceEquals(string tmp1, string tmp2) constant returns (bool){
        return tmp1.toSlice().equals(tmp2.toSlice());
    }
    function NaiveEquals(string tmp1, string tmp2) constant returns (bool){
        if(bytes(tmp1).length != bytes(tmp2).length) return false;
        uint256 max=bytes(tmp1).length;
        for(uint256 i=0;i<max;i++){ 
            if(bytes(tmp1)[i] != bytes(tmp2)[i]){
                return false;
            }
        }
        return true;
    }
    function Sha3Equals(string tmp1, string tmp2) constant returns (bool){
        return sha3(tmp1) == sha3(tmp2);
    }
}

This measures the gas cost of comparing strings 4 ways:

  1. Using the StringUtils library
  2. Using the StringSlice library
  3. Using a naive loop method to compare if the strings are equal
  4. Using two SHA3 operations to hash the strings, and then comparing if the hashes are equal

Before I go into the chart, the slowest method on a normal computer would be the SHA3 method. SHA3 is cheap, but even using very optimized libraries and fast machines, it hashes about 100 bytes in 0.028s, or 28ms, meaning that 2 hashes would take around 56ms. Meanwhile, comparing 2 strings on an (older) 64bit processor takes about 0.00183s, or 1.83ms. Using traditional methods to compare 2 strings should be significantly cheaper if we're basing gas cost on how long an operation takes to compute and execute. So, here is the chart for the gas cost of these different methods:

[Chart: gas costs of the four string-comparison methods across string lengths]

The Takeaway

So, at this point I assume it's expected for me to talk about how terrible Ethereum's gas schedule is. But no, the problem is not with the values used, it's with the entire model. And I will not pretend to know some perfect answer to this problem. It is definitely a complicated one.

If you ever look into things like the Intel manuals, which describe the operation and timing of various opcodes in desktop x86 processors, and then go and measure the actual performance of these opcodes, the documented timing and the actual timing vary... A LOT. This is because modern processors are extremely complicated and optimized pieces of silicon. There are multiple pipelines, cache levels, branch predictors, register cache sets, etc. Basically, all of this is what allows a sequence of opcodes that might take 1ms on its own to execute 100 iterations in a tight loop in just 2ms, rather than 100ms. This complex caching, pipeline, and branch prediction structure is plainly not possible to represent in an isolated virtual machine. The gas model must work on a wide variety of processors, including processors of different architectures and different bitness (ie, 64bit vs 32bit). So, a gas model is at best an approximation that is applicable to all processors.

However, despite no perfect solution being possible, there are definitely some ideas I have on improving the situation, and these are leads I will be researching in great detail while designing the gas model for the x86 VM in Qtum (coming 2018!)

Rich and Predictable Standard Library

Providing a rich standard library with set gas prices is probably one of the best ways to immediately improve the situation without a great amount of risk. This standard library has no need to actually be executed within the VM itself, though if the VM is efficient enough, there is no harm in it. The important thing is that the gas rate required to execute each function should be set by a reasonable human. This allows for things like branch prediction and caching to be measured and thus taken directly into account in the gas price. Ethereum does this for a VERY limited set of operations, such as SHA3. The base cost of SHA3 is 30 gas, and then 3 gas for each 32-byte word of data hashed. So, one could imagine a built-in StringEqual operation which might have a base gas cost of 5 gas, and then 1 gas for each byte compared.
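
To make that concrete, such a price would just be a human-chosen linear formula rather than a sum of per-opcode charges (the numbers below are the hypothetical rates from the example above, not a real schedule):

#include <stdint.h>
#include <stddef.h>

//Hypothetical: 5 gas base cost, plus 1 gas per byte compared
uint64_t stringEqualGas(size_t bytesCompared){
    return 5 + (uint64_t)bytesCompared;
}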

The biggest problem with this method is that, using the Ethereum model, it requires either pre-compiled contracts or new EVM opcodes. In the x86 VM, I imagine being able to use native x86 code (potentially pre-compiled internally) and then somehow detecting when the VM enters this "special gas" memory area. It's definitely still an idea being figured out. But adding and changing the costs of these operations would require a fork or other network disruption. One big advantage of Qtum in this area is that we could extend the Decentralized Governance Protocol to cover this functionality, so that the base cost and rate cost of these operations could be easily adjusted. However, the way to add new functions under this specialized gas cost model is still an in-progress problem.

Cost for branches and memory, not for opcodes

One idea that may be a way forward is to charge not for opcodes, but rather for branches and memory access. The primary point behind this method is that most operations are so fast that the timing difference is insignificant, and thus most operations could have a fixed or even zero cost. Then, the only cost would be for branches (potentially only backward branches) and memory allocations. The most expensive operations on modern processors are not explicit opcodes, but rather cache misses and wrongly predicted branches. Thus, this model aims to charge an amortized cost for these conditions anytime they are possible. This would greatly encourage loop unrolling, which is faster... sometimes, in certain conditions. If loop unrolling is done too much, it causes slower performance due to filling up the cache, but the additional storage cost for unrolled loops would probably outweigh this pressure.

This method requires a lot of research to determine if it is at all feasible, and what edge cases need particular care. The main advantage of this method is that gas costs can be made to be very predictable when compared to current models. I'm sure this model alone is not a workable solution, but combining this concept with another gas concept can bring predictability for smart contract developers.

Including macro-operations in gas model

Because there is so much identical code generated by modern compilers (ie, they have figured out the most efficient way to compile common operations), it could be beneficial to include not just raw opcodes in the gas model, but also a series of "macro-operations". These macro-operations would need to be carefully assessed and priced, with a way to predict the time cost of their execution. The hard part is that the general execution environment is so volatile that perfectly predicting the time cost may be near impossible. However, any hole in this model is a potential DoS exploit available to attackers.

The problem with all of this

At some point, by placing more and more code in the blockchain consensus system in order to properly measure the true resource cost of an execution, you actually spend more time measuring than you do executing the code. This is one of the hardest problems of gas models in general. You want something consistent that can keep consensus across a variety of implementations, but that also prevents cheap attacks, and finally is friendly to various virtual machine optimizations including pre-compilation and JIT execution. These aspects come together to form the most difficult part of this entire topic, and they are definitely among the reasons why it is still very much an open research problem.

The problem with Turing-completeness is that it is impossible to measure the cost of a smart contract until you actually execute it. The best that can be done is to measure various pieces of code, so that you at least know the gas price of a piece of code up until the next branch. This is actually one of the weaknesses of the x86 architecture for smart contracts: code is data, and can be mutated and overwritten. It may end up being better in the final implementation to have a mixed gas model, so that constant internal bytecode (ie, loaded like EVM code is now) uses a more powerful model, while code loaded and executed dynamically in read-write-execute memory uses a less computationally expensive (albeit less accurate, and thus financially more expensive) gas model that can be applied on-the-fly.

Conclusions

Gas is hard, there are no silver bullets, blockchain is hard.

Notes

Gas cost calculations:

memory(size) = 3 * (size/32) + (size/32) * (size/32) / 512 

# Gas costs of loading existing data in EVM (assuming memory is already allocated)
(not accounting for memory costs to store the data in)
## external code (semi-volatile)
200 + 700 + (3 * size)
sload + extcodecopy + (copy * size)
## internal code (constant)
3 * size
copy * size
## storage (mutable)
200 * size
sload * size
## Transaction (mutable)
200 + (68 * (size * 32)) + 30 + (6 * size) ## note: zero data is only 4 gas
sload + (txnonzeroByte * (size * 32)) + sha3 + (sha3word * size)


# Gas costs of storing data initially to EVM
(not accounting for memory costs data is initially stored in)
## external code (semi-volatile)
32000 + 20000 + (200 * size * 32) + memory(size) + 10
create + sstore + (createByte * (size * 32)) + memory(size) + overhead
## internal code (constant)
(200 * (size * 32))
(createByte * (size * 32))
## storage (mutable)
20000 * size
sstore * size
## Transaction (mutable)
20000 + 30 + (6 * size) ## note: zero data is only 4 gas
sstore + sha3 + (sha3word * size)

# Gas costs of updating data in EVM
(not accounting for memory costs of updated data to store)
## external code (semi-volatile)
32000 + 5000 + (200 * size * 32) + memory(size) + 10
create + sstoreReset + (createByte * (size * 32)) + memory(size) + overhead
## internal code (constant)
N/A (technically could be possible, but would greatly complicate contract design)
## storage (mutable)
5000 * size
sstoreReset * size
## Transaction (mutable)
5000 + 30 + (6 * size) ## note: zero data is only 4 gas
sstoreReset + sha3 + (sha3word * size)

Test contract bytecode including libraries etc:

6060604052341561000f57600080fd5b610f488061001e6000396000f30060606040523615610081576000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff1680633a96fdd71461008657806346bdca9a1461013a5780635d62fdf5146101f2578063674ff51b146102aa57806388ee0b2a146103625780638a0807b71461041a578063ef5a8374146104ce575b600080fd5b341561009157600080fd5b610124600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610586565b6040518082815260200191505060405180910390f35b341561014557600080fd5b6101d8600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190505061084a565b604051808215151515815260200191505060405180910390f35b34156101fd57600080fd5b610290600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610860565b604051808215151515815260200191505060405180910390f35b34156102b557600080fd5b610348600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610874565b604051808215151515815260200191505060405180910390f35b341561036d57600080fd5b610400600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f016020809104026020016040519081016040528093929190818152602001838380828437820191505050505050919050506108a1565b604051808215151515815260200191505060405180910390f35b341561042557600080fd5b6104b8600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f016020809104026020016040519081016040528093929190818152602001838380828437820191505050505050919050506109eb565b6040518082815260200191505060405180910390f35b34156104d957600080fd5b61056c600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610d17565b604051808215151515815260200191505060405180910390f35b6000610590610eee565b610598610eee565b6000808693508592508351915081835110156105b357825191505b600090505b818110156107f65782818151811015156105ce57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916848281518110151561064957fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff191610156106e4577ffffffffffffffffffffffffffffffffffffffffffff
fffffffffffffffffffff9450610840565b82818151811015156106f257fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916848281518110151561076d57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff191611156107e95760019450610840565b80806001019150506105b8565b825184511015610828577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610840565b82518451111561083b5760019450610840565b600094505b5050505092915050565b6000806108578484610586565b14905092915050565b600061086c838361084a565b905092915050565b600061089961088283610df0565b61088b85610df0565b610e1e90919063ffffffff16565b905092915050565b6000806000835185511415156108ba57600092506109e3565b84519150600090505b818110156109de5783818151811015156108d957fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916858281518110151561095457fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19161415156109d157600092506109e3565b80806001019150506108c3565b600192505b505092915050565b60006109f5610eee565b6109fd610eee565b600080869350859250600184511080610a17575060018351105b80610a23575083518351115b15610a50577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610d0d565b6fffffffffffffffffffffffffffffffff84511115610a91577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610d0d565b60009150600090505b8351811015610ce957826000815181101515610ab257fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19168482815181101515610b2d57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19161415610cdc57600191505b825182108015610bb757508351828201105b8015610cb857508282815181101515610bcc57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff191684838301815181101515610c4957fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916145b15610cca578180600101925050610ba5565b8251821415610cdb57809450610d0d565b5b8080600101915050610a9a565b7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff94505b5050505092915050565b6000816040518082805190602001908083835b602083101515610d4f5780518252602082019150602081019050602083039250610d2a565b6001836020036101000a038019825116818451168082178552505050505050905001915050604051809103902060001916836040518082805190602001908083835b602083101515610db65780518252602082019150602081019050602083039250610d91565b6001836020036101000a03801982
511681845116808217855250505050505090500191505060405180910390206000191614905092915050565b610df8610f02565b600060208301905060408051908101604052808451815260200182815250915050919050565b600080610e2b8484610e34565b14905092915050565b60008060008060008060008060008a6000015197508a600001518a600001511015610e6157896000015197505b8a60200151965089602001519550600094505b87851015610ed25786519350855192508284141515610ebb57600185896020030160080260020a03199150818316828516039050600081141515610eba57809850610ee0565b5b602087019650602086019550602085019450610e74565b89600001518b600001510398505b505050505050505092915050565b602060405190810160405280600081525090565b6040805190810160405280600081526020016000815250905600a165627a7a72305820fcb471f7522e763de8c06e6eba885f81d32bc8128e364dc46a35fd7e6c960bb50029

Raw string gas measurements:

Gas used for string "x" (length 1):
StringUtilsEqual 23884
SliceEquals 23928
NaiveEquals 23433
Sha3Equals 23708

Gas used for string "foobar123" (length 9):
StringUtilsEqual 28268
SliceEquals 24952
NaiveEquals 26385
Sha3Equals 24732

Gas used for string "foobarxxx124529898453ifdf" (length 25):
StringUtilsEqual 37036
SliceEquals 27000
NaiveEquals 32289
Sha3Equals 26780

Gas used for string "jhkfdfdsfsdgfdsgdsffgfdsgrsehsfrhfrdggewrrtrewgfdsaffvdsfgdsf" (length 61):
StringUtilsEqual 57032
SliceEquals 32006
NaiveEquals 45841
Sha3Equals 31859

Gas used for string "hjfhjdskhjglkhdsjlkghjlkdshjkglhjdsfkhjglfdsjhgjhldsfhgjlkdsfhjkghjfdsklhgjlkfdsjhgfdsgfdshgfds" (length 95):
StringUtilsEqual 75932
SliceEquals 36756
NaiveEquals 58655
Sha3Equals 36682

Gas used for string "hgjkghjdskhgjlserijhlktjiurewihuygtiuserhjkgfhfdsjkvgfhjklfdshjghfdsjghufdshgjlkfdshjlkghfdsjlkghjlkfdshgjlkhfdseulkghfdslkjhgulkijfdshg" (length 136):
StringUtilsEqual 98936
SliceEquals 42801
NaiveEquals 74320
Sha3Equals 42872
Posted: 10/2/2017 3:50:00 PM

Thoughts and Goals on Qtum's x86 VM

So we've been notoriously quiet about what the x86 VM will allow in Qtum, beyond just more programming language support. This is basically because it's easy to design a mediocre version of this, but hard to design one that is well-optimized, efficient, and easy to use. So here I won't dip into the details of the design just yet, but I'd like to announce the goals we have in mind.

Programming Language Support

Of course, programming language support is a great reason to build this x86 VM. I personally would like to make 2018 the year of smart contracts written in Rust. Rust is incredibly efficient, lightweight, and above all, focused on safety and the avoidance of programmer mistakes. There is a lot more than just Rust, though: bringing easy-to-use languages like C# or Go is also a goal.

The basic gist of the x86 VM's power is that you can take pretty much any existing compiler or programming language and just make some modifications so that it can run in Qtum's operating-system-like environment. Almost every compiler out there already has an x86 backend, so the actual bytecode and architecture support is already there.

Standard Library

One of the common complaints about the EVM is its lack of a standard library. This is not only a developer annoyance, but also a direct consumer of precious blockchain space. Providing a standard library not only allows Qtum's blockchain to be made slimmer and more efficient, it also allows these standard library functions to have special internal implementations, similar to Ethereum's pre-compiled contracts. Unlike pre-compiled contracts, this does not require adding special consensus support for each new function and then waiting for contracts to begin using it. Instead, contracts call the same old unoptimized code; because that code is opaque to them, it can be optimized at will without needing any change to consensus. Optimization of the standard library like this is an implementation detail. However, as the ecosystem's implementations become efficient, the gas model can be tuned so that these functions' gas costs reflect their true resource costs.

Additionally, the standard library need not require a fork to extend. Common functions could easily enter the specialized standard library memory space via the Decentralized Governance Protocol. This mechanism also allows for patching bugs in the standard library, though that power must be especially audited, as smart contracts can come to rely on buggy behavior for proper function. And so, upgrades to standard library functions may be opt-in only, or may not be possible at all. Our goal with the DGP is always to keep it conservative and to ensure that even in the case of complete compromise, smart contract logic is unaffected and users' funds are safe.

Optimized Gas Model

This part is very tricky, and just as a forewarning, we will most likely launch the x86 VM initially with a very simple gas model akin to the EVM's. However, because the ISA and features available in x86 are so much more powerful, there are some fairly straightforward ways to advance this space.

One simple but powerful solution is not only to provide a standard library with common functions used by smart contracts and programs in general, but also to give these standard library functions a human-set cost, rather than relying on the simplistic, general gas model used for arbitrary computation. For example, strlen in the simplistic gas model might cost 90 gas per character in the string. Upon developer inspection, it's discovered that strlen is extremely cheap to actually execute on the Qtum VM. So, Qtum's Decentralized Governance Protocol is used to propose a special gas rule for this function, and the cost of calling it might become a flat initial cost of 10 gas, plus 1 gas per character. It is impossible to make a perfectly accurate gas model, so we want to use the DGP mechanism in Qtum to make this approximation as optimized and efficient as possible. A rough sketch of this two-tier costing follows.
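Here is a minimal sketch of how such a DGP-tuned rule might replace the generic per-instruction cost for a standard library function. The numbers come from the strlen example above; the rule-table lookup (the hasDgpRule flag) is hypothetical:

#include <cstdint>
#include <cstddef>

// Generic model: every character costs the flat per-instruction estimate.
uint64_t strlenGasGeneric(size_t len) {
    return 90 * len;        // 90 gas per character (simplistic model)
}

// DGP-approved special rule: flat cost plus a small per-character cost.
uint64_t strlenGasSpecial(size_t len) {
    return 10 + 1 * len;    // 10 gas base + 1 gas per character
}

// The VM would consult the DGP rule table before falling back to the
// generic model; sketched here as a simple flag.
uint64_t strlenGas(size_t len, bool hasDgpRule) {
    return hasDgpRule ? strlenGasSpecial(len) : strlenGasGeneric(len);
}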

Unlocking the Full Power of the AAL

Right now, we tend to pitch the Account Abstraction Layer as "what was necessary to make the EVM work". However, the AAL has a lot more power hidden in it beyond what was needed to make the EVM work on top of Qtum. Qtum was designed from the start to support multiple virtual machines, and the EVM was just the first one to be supported. That being said, the Account Abstraction Layer is currently limited by what can easily be exposed through the EVM. The x86 VM we are designing will not face such limitations. Some of the powerful things we want to expose because of this:

  • P2SH (ie, Bitcoin style) multi-sig as a first class citizen, for both sending and receiving payments from smart contracts
  • Raw transaction script support to send custom transactions to take full advantage of the Script functionality in Qtum
  • Allowing segwit transactions to contain and execute smart contracts

New Possibilities for Smart Contracts

Using x86, we get the von Neumann architecture of computing. This means that code is data and vice-versa. This feature, as well as things like hardware/software interrupts, allows operating-system-like constructs and features to be integrated into a single smart contract with multiple semi-trusted participants. This includes things like cooperative multitasking, pausing and resuming execution (ie, resuming execution in a later transaction), and watchdog timers (though instead of "time", these would work on gas). This also of course includes a direct mechanism for updating contract bytecode without needing to transfer funds and data to a new contract.

The x86 instruction set also includes many specialized functions to control things like specialized permissions for certain code spaces, paging and memory mapping, as well as system calls. It is not expected for Qtum to expose most of these specialized system-level instructions. They greatly complicate the gas model design, and make everything significantly harder to optimize.

However, despite that let-down, relatively few things in that set of instructions are relevant to smart contracts. There is very little practical need on a public blockchain for separate ring-0 privileged code and ring-3 unprivileged code within a single smart contract. The use cases for these features tend to be prevalent in privileged and semi-privileged blockchains, so when we begin to focus on Qtum's enterprise side, we will of course revisit this.

First-Class Oracles

In the x86 VM model for transactions, there is no need to call a contract if you know the data you need from that contract. It will be possible to load data directly from the external contract's storage space. This allows for first-class oracles, where contracts can establish their own ABI and API mechanisms to standardize their storage space. A contract can then simply load the storage data directly, with no need to do an expensive call that requires loading the entire contract bytecode, creating a new VM environment, etc. This will finally make oracles first-class citizens on the blockchain, instead of being limited by the functionality of smart contracts.
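To illustrate, here's a hypothetical sketch. Neither the call name nor its signature is a published Qtum interface; the point is only that the consumer reads a storage key directly instead of calling into the oracle's code:

#include <cstdint>
#include <cstddef>

// Assumed VM-provided primitive: copy the value stored under `key` in
// `contract`'s storage into `buf`, returning the number of bytes written.
extern "C" int qtum_read_external_storage(const uint8_t contract[20],
                                          const uint8_t* key, size_t keyLen,
                                          uint8_t* buf, size_t bufLen);

// Read a price the oracle contract publishes under a well-known key,
// as part of its self-declared storage ABI (key name is hypothetical).
uint64_t readOraclePrice(const uint8_t oracle[20]) {
    const char key[] = "price/QTUM-USD";
    uint64_t price = 0;
    qtum_read_external_storage(oracle, (const uint8_t*)key, sizeof(key) - 1,
                               (uint8_t*)&price, sizeof(price));
    return price;
}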

Blockchain Analysis

The large memory space available to x86, as well as its efficient set of opcodes for general computation, allows for the potential of blockchain analysis that is just a pipe-dream with the EVM. It's possible to expose full blockchain data for contract analysis. This could allow AI-based smart contracts to automatically monitor the blockchain, potentially functioning as an oracle, so that smart contracts can adjust their own behavior to operate as efficiently as possible in current network conditions. This blockchain data could include full transaction data or statistics computed by the blockchain nodes (in a consensus-critical way). There is little downside to exposing this data, since it is fully constant and only results in a few MB of additional memory usage.

Alternative Data Storage

Currently, the EVM forces everyone to use 32-byte keys that point to 32-byte data. This can be quite painful to manage, especially when considering fragmentation and maintenance of that space. Moreover, there is no great reason for it. So, in the x86 machine we intend to give smart contracts a general purpose key-value store: you can store anything from 1 byte to some large number of bytes as a key, and have it point to a value of similarly flexible size. The proposed gas model for this functionality so far involves a flat fee for writing to or reading from this database, plus a per-byte fee for each byte that needs to be touched (sketched below). This functionality would of course still be covered under the stateRootHash, so that SPV wallets can interact with the smart contracts using this database.
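Sketched in code, the proposed shape of that gas model looks something like this. The constants are invented for illustration; the post only commits to the flat-fee-plus-per-byte shape:

#include <cstdint>
#include <cstddef>

const uint64_t KV_WRITE_FLAT = 5000;  // hypothetical flat cost per write
const uint64_t KV_READ_FLAT  = 500;   // hypothetical flat cost per read
const uint64_t KV_BYTE_RATE  = 20;    // hypothetical cost per byte touched

uint64_t kvWriteGas(size_t keyLen, size_t valueLen) {
    return KV_WRITE_FLAT + KV_BYTE_RATE * (keyLen + valueLen);
}

uint64_t kvReadGas(size_t keyLen, size_t valueLen) {
    return KV_READ_FLAT + KV_BYTE_RATE * (keyLen + valueLen);
}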

Explicit Dependency Trees

Another, somewhat lofty, goal is to allow dependency trees for smart contracts to be explicitly declared and enforced. This would be opt-in only, so that contracts touching unknown smart contracts are still possible. However, contracts that know exactly which dependencies they will rely on can in some cases be executed in parallel, and thus may receive reduced gas costs, among other benefits. This would be a major scaling advantage for x86 based smart contracts that choose to opt in to this feature.

Why x86? Why not ARM?

I've heard many people ask "why x86? Why not ARM?". This is a very good question. We think that x86 is the most well understood platform for virtual machines and emulators. There have been decades of collective mindshare put into making efficient and secure VMs for x86. If you want to really research this point, look no further than the many questions on Stackoverflow about making the Android emulator operate at a reasonable performance level; basically, the solution in most cases is to use an x86 virtual machine instead of an ARM one. There are of course some projects out there for ARM VMs that aren't regarded as terrible, like QEMU, and I'm sure others I don't know about. But the point is that x86 emulation is a known problem with a fairly straightforward solution. The famous Intel manuals are regarded as some of the most well written and clear documents available for a CPU architecture of this kind. And there are even some kids in high school who write x86 emulators for fun (haha, that's me!). The compiler support for ARM is always improving, but it is still nowhere near on par with x86's support.

Now, all that being said, x86 is by no means a simple ISA. It has existed since the 70s with the 8008, and has stayed backwards compatible since the 8088/8086. This really takes a toll on its design, and is why there is such a vast number of opcodes available, including some that are arguably useless and actually execute slower on hardware than if you wrote your code to avoid those instructions. My favorite example is the cursed Binary Coded Decimal set of instructions, which haven't been popular since the 80s. However, this added complexity amounts to a few hours of extra work, and the benefits of using x86 far outweigh the costs.

ARM is still on our radar though, especially for IoT use cases which typically operate natively on ARM processors. Right now we are focused on x86, but after that, who knows, especially for enterprise and permissioned blockchains.

Posted: 10/2/2017 8:01:55 AM

Look-ahead Staking in Qtum - What does that even mean?

Many times while speaking about Qtum, I mention "look-ahead staking" as part of our "smart staking protocol". It sounds like some kind of cool technology, but what exactly is it? I'll answer that here in only slightly technical terms.

So, if you've read my Proof of Stake Version 3 article, then you know that there is a lot to the PoS protocol we use to maintain consensus. However, this is actually fairly old technology, debuting in 2015 in Blackcoin. The "staker", ie, the thing that "mines" proof-of-stake blocks, was initially based heavily on the existing Blackcoin code. Basically, the way it works is that it builds a block from transactions in the mempool, then checks every matured UTXO in your wallet to see if any of them are valid for creating a PoS block; if not, it throws away that in-progress block and does it all again 16 seconds later.

This is somewhat inefficient in Blackcoin; it processes transactions multiple times. However, this is not a problem there, as Blackcoin's transactions are basically the same form as Bitcoin's: very cheap to verify. This became a real problem early on in Qtum, though. We were running tests to see what would happen if we stuffed a ton of slow contract transactions into a single block, and whether the network would handle it. The network actually does handle it, but the staker doesn't. It would take so long to create a block that by the time the block was full and had a valid proof-of-stake UTXO, it would no longer be valid. Additionally, during this time staking caused 100% CPU usage in what was basically an infinite loop of reprocessing the same transactions.

We fixed some of the problems by landing on the obvious idea of not completely re-processing every transaction each time the staker tries to make a block. This was easy. However, we found it severely restricted transaction throughput and increased the rate of "expired" stakes, ie, PoS blocks that could have been made, but that the staker spent too much time creating, so the wallet owner missed out on the stake reward and "missed a block". This affects not only their personal rewards, but can also have a major impact on the overall security of the entire network: people could craft transactions and broadcast them at unlucky times, basically forcing stakers to miss blocks.

This was a serious problem, and it gravely worried us for probably around 2 weeks while we figured out a solution. All of the obvious solutions involved severely restricting transaction throughput. And then the lightbulb went off while digging even deeper into how PoSv3 works (I think this was around the time I was writing the PoSv3 article posted on my blog; it was originally a reference document for our internal developers): we finally arrived at the idea of "look-ahead staking". It wouldn't be an easy change, but it also wouldn't change the consensus model or otherwise compromise transaction throughput or security.

So, to give a quick recap for those who didn't read my grossly technical PoSv3 article: when you try to create a PoS block, you iterate over all of your coins, see if a UTXO matches a particular set of conditions, and then use it in a particular way in a block in order to make a valid PoS block. For security, whether a particular PoS block is valid doesn't depend at all on what transactions are in the block. If it did, people could just iterate through different transaction orderings in order to mine a block, similar to proof-of-work. And the timing requirements of PoS blocks are fairly strict compared to proof-of-work implementations: you can make a valid PoS block, and it will expire 16 seconds later (or less). After that 16 seconds, your UTXO for creating the PoS block is no longer valid, and you need to check through them all again.

Now maybe you see where I'm going with this. Look-ahead staking involved the following fundamental changes to how the staker actually creates blocks:

  • It checks through the UTXOs for staking first, before it ever generates a block or validates any transactions. This way, you're not wasting time processing transactions when you don't even have a valid PoS block in the first place. This one is pretty obvious.
  • There are now soft and hard time limits on how long the staker will spend processing transactions for a block. When the soft time limit approaches, it stops processing contract transactions. When the hard time limit approaches, it stops processing all transactions and considers the block to be as full as it will get.
  • Instead of only checking for blocks that are valid right now, it checks for blocks that are valid right now and up to 120 seconds into the future.

This last point is the important part. The first two changes make for a workable solution, but severely hamper transaction throughput, limiting it to what can be processed in less than 16 seconds. Although no blocks today should take 16 seconds to validate, you must remember the staker can still suffer from thread switches, slow CPU speeds, etc. So, the more time to process transactions, the better. Back when this was first implemented, there was actually a particular transaction that took 30 seconds to process (and hit the block gas limit) due to an exponential complexity bug, so this mattered even more then, but this way is still superior for reliability and throughput.

Basically, the look-ahead staker's core feature is that it checks whether, over the next 120 seconds (in 16-second timeslots), the current wallet has any UTXOs that can create a PoS block. In some cases, it might see that it can in, say, 64 seconds. So, it prepares the block with 64 seconds available to spend processing transactions, and then broadcasts it the moment the network will recognize it as a valid block.
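Here is a condensed sketch of that loop. The helper names stand in for wallet internals (they are not the real Qtum functions), and the structure is heavily simplified from the real staker:

#include <cstdint>
#include <vector>

// Stub types standing in for the real wallet structures.
struct Utxo { int64_t value; };
struct Block {};

// Assumed helpers; these stand in for real wallet/consensus code.
int64_t GetAdjustedTime();
bool CheckKernel(const Utxo& utxo, int64_t blockTime);
Block BuildAndSignBlock(const Utxo& utxo, int64_t blockTime, int64_t deadline);
void BroadcastAt(const Block& block, int64_t blockTime);

const int64_t TIMESLOT = 16;   // PoS block times use 16-second granularity
const int64_t LOOKAHEAD = 120; // search up to 120 seconds ahead

// Find a future timeslot whose kernel check passes before doing any
// transaction processing at all.
bool lookAheadStake(const std::vector<Utxo>& maturedUtxos) {
    int64_t now = GetAdjustedTime() & ~(TIMESLOT - 1); // mask to timeslot
    for (int64_t t = now; t <= now + LOOKAHEAD; t += TIMESLOT) {
        for (const Utxo& utxo : maturedUtxos) {
            if (!CheckKernel(utxo, t))   // cheap check, no txs processed
                continue;
            // A block at time t will be valid: spend the (t - now) window
            // filling it with transactions (bounded by the soft/hard time
            // limits), then broadcast the moment the network accepts t.
            Block block = BuildAndSignBlock(utxo, t, /*deadline=*/t);
            BroadcastAt(block, t);
            return true;
        }
    }
    return false; // nothing stakeable for 120 seconds; sleep and retry
}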

And finally, we did a LOT of optimization compared to the old version of the staker used in Blackcoin. So, to round it all up as an easy-to-digest list, this is a quick summary of what we did to make sure Qtum's staker is the most efficient one in existence:

  • Only processes transactions when a UTXO in the wallet can create a PoS block
  • Uses lookahead so that longer sleeping is possible
  • Has enforced time-limits to ensure that missed stakes are significantly less likely
  • Instead of using the txindex database and reading full block data from disk in order to stake (and to validate stakes at consensus level), Qtum uses the significantly faster and more optimized UTXO set and "chainstate" database, which only contains block headers and some metadata.
  • Qtum caches data which is used to determine if a UTXO can be staked. This means that significantly less data needs to be read from disk.
  • Because of the look-ahead model, the staker can spend more time sleeping than otherwise, potentially keeping your computer in an idle low-power mode

Our implementation is a big improvement over the original version, but it still has a ways to go. One big thing not currently done is that after a look-ahead PoS block is created, no more transactions are added to it, even if there is time to add them. This is definitely possible, but a bit of a nightmare due to how the Bitcoin block creation code is structured. We're always accepting pull requests, though, if this sounds interesting to someone out there :)

Posted: 9/24/2017 5:53:19 PM

The Faults and Shortcomings of the EVM

So, first to give an introduction: I co-founded Qtum, which is a project that basically takes the Ethereum Virtual Machine (EVM) and puts it onto a blockchain that isn't Ethereum (along with a ton of other stuff). On the way to doing that, I was forced against my will to learn far more than I ever wanted to know about the EVM. What is my main takeaway from all this learning? Well, I don't like it. I personally think it's an impractical design with an even more impractical implementation. And as a disclaimer, I intend to see to it that we add another VM to Qtum which fixes at least most of these problems.

Anyway, let's cut to the chase. What was the point of the EVM, and why was it made in the first place? According to the Design Rationale it was designed for:

  1. Simplicity
  2. Determinism
  3. Compact bytecode size
  4. Specialization for blockchain
  5. Simplicity (uhh?)
  6. Optimizable

And if you skim through that document, you see the reasoning for the EVM is pretty well thought out. So where does it go wrong? Well, it doesn't work with today's technology and paradigms. It's a very good design built for a world that doesn't currently exist. I'll circle back around to this, but let's start with my favorite thing to hate in the EVM:

256 bit integers

On most modern processors you have pretty much 4 good choices for fast math:

  1. 8 bit integers
  2. 16 bit integers
  3. 32 bit integers
  4. 64 bit integers

Of course, in some cases 32 bit is faster than 16 bit, and at least on x86, 8 bit math is not fully supported (for instance, multiplication and division only come in limited one-operand forms), but for the most part, if you use one of these sizes you have some guarantees of how many cycles a math operation takes, and it's fast: measured in a couple of nanoseconds if you don't include cache misses and memory latency. Anyway, suffice to say that these are the integer sizes modern processors use "natively", without any translation or other extraneous operations.

So of course, since the EVM is intended to be optimized for speed and efficiency, its choice of integer size is:

  1. 256 bit integers

For reference, here's how to add 2 32 bit integers in x86 assembly (ie, the processor your PC has in it)

mov eax, dword [number1]
add eax, dword [number2]

And here's how to add 2 64 bit integers in x86 assembly assuming your processor is 64 bit capable:

mov rax, qword [number1]
add rax, qword [number2]

And here is how to add 2 256 bit integers on a 32 bit x86 computer:

mov eax, dword [number1]
add dword [number2], eax
mov eax, dword [number1+4]
adc dword [number2+4], eax
mov eax, dword [number1+8]
adc dword [number2+8], eax
mov eax, dword [number1+12]
adc dword [number2+12], eax
mov eax, dword [number1+16]
adc dword [number2+16], eax
mov eax, dword [number1+20]
adc dword [number2+20], eax
mov eax, dword [number1+24]
adc dword [number2+24], eax
mov eax, dword [number1+28]
adc dword [number2+28], eax

Although, adding these 256 bit integers on a 64 bit x86 computer is a bit better:

mov rax, qword [number1]
add qword [number2], rax
mov rax, qword [number1+8]
adc qword [number2+8], rax
mov rax, qword [number1+16]
adc qword [number2+16], rax
mov rax, qword [number1+24]
adc qword [number2+24], rax

Anyway, suffice to say that working with 256 bit integers is significantly more complex and slow than working with an integer length natively supported by the processor.

The EVM embraces this design though, because it is much simpler to support only 256 bit integers than to add additional opcodes for other integer sizes. The only non-256 bit operations are the series of PUSH instructions, which place 1 to 32 byte immediates on the stack, and a few instructions that work with single bytes.

So, the design rationale for using this inefficient integer size for all operations?

"4 or 8 byte words are too restrictive to store addresses and big values for crypto computations, and unlimited values are too hard to make a secure gas model around."

I must admit, being able to compare 2 addresses with a single operation is pretty cool. However, here is how you would do the same in x86 when in 32-bit mode (without SSE and other optimizations):

mov esi, [address1]
mov edi, [address2]
mov ecx, 32 / 4
repe cmpsd 
jne not_equal
; if reach here, then they're equal

Assuming address1 and address2 are hardcoded addresses, that's around 6 + 5 + 5 = 16 bytes of opcodes, or if the addresses were in the stack, it might be something like 6 + 3 + 3 = 12 bytes of opcodes.

The other justification for the large integer size is "big values for cryptography computations". However, since reading that several months ago, I've had trouble finding a single use case for 256 bit integers that doesn't involve comparing whether an address or hash is equal. Custom cryptography is plainly too expensive to execute on a public blockchain. I searched github for over an hour trying to find a Solidity contract that does anything I'd define as cryptography, and I came up with nothing. Almost any form of cryptography is guaranteed to be slow and complex on modern computers, which makes it non-economical to execute on the public Ethereum blockchain due to gas costs (not to mention the effort of porting any real algorithm to Solidity). There are still private blockchains where gas costs don't matter, but if you own your own blockchain, you won't want to do this in a slow EVM contract; you would use C++, Go, or any number of real programming languages to implement the cryptography in native code as a pre-compiled smart contract. So this really blows the entire justification for supporting only 256 bit integers out of the water. This, I feel, is the real foundation of the EVM's problems, but there's a lot more lurking in the less obvious areas.

EVM's Memory Model

The EVM has 3 main places you can put data

  1. Stack
  2. Temporary memory
  3. Permanent memory

The stack has certain limits, so sometimes you need to use temporary memory instead of very expensive permanent memory. There is no allocate instruction or anything like that in the EVM; you claim memory simply by writing to it. This may seem pretty clever, but it's also extremely sinister. For instance, if you write to address 0x10000, your contract just allocated 64Kwords (ie, 64K of 256 bit words) of memory and paid the gas costs as if you had used all 64Kwords of it.

Easy workaround, right? Just track the last memory address you use and increment it when you need more. That works decently, unless you happen to need a lot of memory at one point and then no longer need it. Say some crazy algorithm uses 100 words of memory. You allocate that, use the memory, pay for 100 words... then you exit that function. Now you're back in some other function that needs just 1 word of scratch space, so it allocates another word. You're now paying for 101 words of memory.

There is no way to free memory. You can in theory decrease that special pointer you were keeping track of, but that only works if you know the entire block of memory will never be referenced again and can safely be reused. If, out of those 100 words, you still need the word at 50 and the word at 90, then you must copy them to another location (like the stack) before that memory can be reused. The EVM provides no tools to help with this; the technical term for the problem is memory fragmentation. It is up to you to vet that no function still uses the memory you reclaim, and if you reuse memory and something slipped through your vetting process, your contract now has a potentially critical state corruption bug. So your options are basically: either open yourself up to a large class of memory reuse bugs, or pay more gas for memory even though you have already allocated more than you need.

Additionally, allocating memory does not have a linear cost. If you have already allocated 100 words of memory, allocating 1 more word is significantly more expensive than allocating the 1st word when your program starts. This greatly amplifies the economic cost of staying on the safe side, compared to opening yourself up to more contract bugs in exchange for greatly decreased gas costs.
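This non-linearity is explicit in the EVM's memory cost function from the Ethereum yellow paper, sketched here for comparison. Memory is measured in 32 byte words, and the quadratic term is what makes late allocations dearer:

#include <cstdint>

// Memory cost function from the Ethereum yellow paper: `words` is the
// highest word (32-byte unit) index touched so far.
uint64_t memCost(uint64_t words) {
    return 3 * words + (words * words) / 512;
}

// Gas charged for growing memory from oldWords to newWords.
uint64_t expansionGas(uint64_t oldWords, uint64_t newWords) {
    return newWords > oldWords ? memCost(newWords) - memCost(oldWords) : 0;
}

// expansionGas(0, 1)       == 3   (the first word is cheap)
// expansionGas(100, 101)   == 3   (still in the linear region)
// expansionGas(2048, 2049) == 11  (the quadratic term now dominates)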

So, why use memory at all? Why not use the stack? Well, the stack is ridiculously limited.

EVM's Stack

The EVM is a stack-based machine. That means it uses a stack for most of its operations, rather than a set of registers. Stack-based machines are typically much simpler to optimize, but they need more opcodes for most operations when compared to a similar register-based machine.

Anyway, so the EVM has a lot of different operations, most of which operate on the stack alone. Notice the SWAP and DUP series of instructions; these only go up to 16. Now try to compile this contract:

pragma solidity ^0.4.13;

contract Something{

    function foo(address a1, address a2, address a3, address a4, address a5, address a6){
        address a7;
        address a8;
        address a9;
        address a10;
        address a11;
        address a12;
        address a13;
        address a14;
        address a15;
        address a16;
        address a17;
    }
}

You'll be met with this error:

CompilerError: Stack too deep, try removing local variables.

This error occurs because once an item is 16 levels deep in the stack, it is effectively impossible to access without popping items off the stack. The official "solution" for this problem is to use fewer variables and make functions smaller. Various workarounds include stuffing variables into a struct or array and using the memory keyword (which can't be applied to normal variables for... reasons?). So, let's fix our contract to use some memory based structs:

pragma solidity ^0.4.13;

contract Something{
    struct meh{
        address x;
    }
    function foo(address a1, address a2, address a3, address a4, address a5, address a6){
        address a7;
        address a8;
        address a9;
        address a10;
        address a11;
        address a12;
        address a13;
        meh memory a14;
        meh memory a15;
        meh memory a16;
        meh memory a17;
    }
}

And the result is...

CompilerError: Stack too deep, try removing local variables.

But we replaced these variables with memory? Doesn't that fix it? Well, no. Because now, instead of storing 17 256 bit integers on the stack, we are storing 13 integers and 4 256 bit memory addresses (ie, references) to 256 bit slots of memory. Part of this is a Solidity problem, but the primary problem is that the EVM lacks a way to access arbitrary items on the stack. Every other VM implementation I know of works around this basic problem by either:

  1. Encouraging small stack sizes and making it easy to swap stack items to memory or alternative storage (like local variables, in .NET)
  2. Implementing a pick instruction or similar that allows access to any arbitrary stack slot

However, in the EVM, the stack is the only free place for data and computation; any other location has a direct cost in the form of gas. This directly discourages small stack sizes, because anywhere else is more expensive... and so we arrive at basic language implementation problems like this.

Bytecode size

In the rationale document, it's stated that their goal was for EVM bytecode to be both simple and compact. However, this is like saying you prefer to write code that is both descriptive and concise. They are fundamentally differing goals accomplished in fundamentally different ways. A simple instruction set is achieved by limiting the number of operations and keeping operations concise and simplistic. Meanwhile, compact bytecode that produces small programs is achieved by making an instruction set that performs as many operations as possible in as few bytes of code as possible.

Ultimately, despite "compact bytecode size" being a goal in their rationale, the actual implementation of the EVM does not attain that goal in any sense. It is instead focused on a simplistic instruction set that is easy to build a gas model around. And I'm not saying this is wrong or bad, only that this primary goal of the EVM is fundamentally at odds with its other goals. Also, one number given in that document is that a C program takes over 4000 bytes to implement "hello world". This is definitely not the case, and glosses over the different environments and optimizations in play in C programs. In the C program they measured, I expect there was also ELF data, relocation data, and alignment optimizations; aligning code and data on certain boundaries, such as 32 bytes or 4kb, can have a measurable impact on the performance of a program on physical processors. I personally have built a simplistic bare-bones C program that compiles to 46 bytes of x86 machine code, and a simple greeter-type program that compiles to ~700 bytes, while Solidity's example compiles to over 1000 bytes of EVM bytecode.

I understand the need for a simplistic instruction set for security reasons, but it causes significant bloat on the blockchain. Passing over this as if EVM smart contract bytecode were as small as possible is detrimental. It could clearly be made much smaller by including a standard library and by supporting opcodes that perform a batch of common operations, rather than needing to execute several opcodes for such a thing.

256 bit integers (again)

But really, 256 bit integers are awful. And the most ridiculous part is that they are used in places where they have no reasonable use. It's effectively impossible to use more than 4 billion (32 bits' worth of) units of gas, so what integer size is used for specifying and counting gas? 256 bits, of course. Memory is fairly expensive, so what's the address size for memory addresses in the EVM? 256 bits, of course, for when your contract needs more words of memory than there are atoms in the universe. I would complain about using 256 bit integers for both addresses and values in permanent storage, but this actually provides some interesting abilities, like using a hash as a storage address with no worries about conflicts in the address space, so I guess that gets a pass. In every single instance where an integer size could be chosen, the EVM calls for 256 bits. Even JUMP uses 256 bits, though in their defense the highest jump destination is limited to 0x7FFFFFFFFFFFFFFF, effectively making the jump destination a signed 64 bit integer. And then there are currency values themselves. The smallest unit of ETH is the wei, so the total coin supply (in wei) is 1000000000000000000 * 200000000 (200M is an estimate; the current supply is ~92M). Subtract that number from 2 to the power of 256 (the maximum value storable in a 256 bit integer) and you still have about 1.157920892373162e+77 to spare: enough space to send more wei than will ever exist, with room left over many magnitudes greater than the number of atoms in the universe. Basically, 256 bit integers are incredibly impractical and unnecessary for almost any application the EVM is designed for.
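To make that headroom concrete, here is the arithmetic worked out, using the same 2x10^8 ETH supply estimate as above:

\[
2\times10^{8}\ \text{ETH} \times 10^{18}\ \tfrac{\text{wei}}{\text{ETH}} = 2\times10^{26}\ \text{wei},
\qquad
2^{256} \approx 1.158\times10^{77},
\qquad
\frac{2^{256}}{2\times10^{26}} \approx 5.8\times10^{50}.
\]

Even a 128 bit integer (maximum value about 3.4e+38) would leave roughly a factor of 10^12 of headroom over the largest wei supply that will ever exist.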

Lack of standard library

If you've ever developed a Solidity smart contract, this is probably one of the first problems you encountered. There is no standard library, at all. If you want to determine whether two strings are equal, there is no strcmp or memcmp or anything like that; you must write the code yourself or copy code from the internet. The Zeppelin project is making this situation bearable by providing a standard library that contracts can use (either by including it in the contract itself or by calling an external contract). However, the limitations of this approach are apparent when you consider that it is cheaper to perform two SHA3 operations and compare the resulting hashes than it is to loop through the bytes of a string (32 bytes at a time) to determine if they are equal. Having a standard library of precompiled contracts that use native code, with set, reasonable gas prices, would be greatly beneficial to the entire smart contract ecosystem. Without it, people instead copy and paste code from open source projects, with unknown security implications. On top of that, people optimize their code, hunting for shortcuts and reductions in gas usage, even at the risk of compromising the security profile of their contract.

The economics and game theory of gas

I plan on making a full blog post about this topic, but the EVM doesn't just make good practices hard, it also makes them expensive. For instance, it costs quite a bit of gas to store data on the blockchain. This means it can be incredibly expensive to cache any amount of data within a smart contract, so instead it is recomputed with each contract execution; over time, more gas is consumed and blockchain nodes waste more time executing the same code to compute the same data. Furthermore, there is very little actual cost to data stored on the blockchain. It does not directly increase the size of the blockchain (in either Ethereum or Qtum). The real cost comes from the data which enters the blockchain in the form of data sent to contracts, as that is what directly increases the size of the blockchain. It is almost cheaper in Ethereum to enter 32 bytes of data into the blockchain in the form of a transaction (23176 gas) than it is to store 32 bytes in a contract (20,000 gas), and it is significantly cheaper when scaling to 64 bytes of data (29704 gas for the transaction, compared to 80,000 gas for storage). There is a "virtual" cost to data stored in a contract, but it is much less than most people assume: basically just the cost of iterating through the database that stores data for the entire blockchain. The RLP and LevelDB database system used by both Qtum and Ethereum is very efficient at handling this, however, and ongoing costs are nowhere close to linear.
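For the 32 byte case, the numbers above work out as follows, using the fee schedule constants from the Ethereum yellow paper at the time (21000 gas base per transaction, 68 gas per non-zero byte of transaction data, 20000 gas per fresh 32 byte SSTORE word):

\[
\underbrace{21000}_{\text{tx base}} + 32 \times \underbrace{68}_{\text{per data byte}} = 23176\ \text{gas}
\qquad \text{vs} \qquad
\underbrace{20000\ \text{gas}}_{\text{SSTORE of one word}}.
\]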

Another part of the EVM that encourages inefficient code is that it is not possible to call a specific function in a smart contract. This is for security: being able to directly call a function like withdraw() in an ERC20 contract would be bad. However, this capability is needed for standard libraries to be efficient. Instead of simply being able to load a specific piece of code from an external contract, it's all or nothing, and execution always starts at the first byte of the code; there is no way to jump around and skip all of the Solidity ABI bootstrap code. In the end, this encourages small functions to be duplicated (because they are more expensive to call externally) and as many functions as possible to be deployed in one contract. There is no cost difference between calling a 100 byte contract and a 10,000 byte contract, despite all of the code needing to be loaded into memory either way.

And finally, it is not at all possible to access the storage of a contract directly. The contract code must be fully loaded from disk and executed, the code must load the data from storage that you requested, and then finally return it to the calling contract, all while making sure not to use variable-size arrays. Oh, and if you need some back and forth because you didn't know the exact data you needed: at least it's in cache, so it's cheap for nodes, but there is no discount on the gas price for calling the external contract a second time. It's possible to access an external contract's storage without completely loading its code; in fact, it's just as cheap computationally as accessing the current contract's storage, so why make it so expensive and discourage efficiency?

Lack of debugging and testability

This problem lies not just with the EVM's design, but also with its implementations. Of course, some projects are striving to make this as easy as possible, like Truffle, but the EVM's design does not make it at all easy. The only exception available is "OutOfGas", there are no logging facilities, there is no easy way to call external native code (such as for test helpers and mocking data), and the Ethereum blockchain itself is difficult to create a private testnet with; the private blockchain also has different parameters and behavior. Qtum at least has a leg up here thanks to "regtest" mode, but testing the EVM with mock data is still incredibly hard, since no implementation is really stand-alone. And there are no debuggers I know of that work at the Solidity level; there is at least one EVM assembly debugger I know of, but that is far from user friendly. There is no symbol or debug data format established at all for the EVM and/or Solidity, and I've found no EIPs or other efforts toward a standardized debug format like DWARF.

Floating point numbers

One common thing I see people say when the lack of floating point support comes up is "well, no one should be handling currency values using floating point numbers". This is incredibly narrow-minded though. There are many practical use cases for floating point numbers, such as risk modeling, scientific computations, and cases where ranges and approximations are more important than exact values. Limiting the potential applications of smart contracts to handling currency values is unrealistic and needlessly limiting.

Immutable code

One of the major things contracts need to be designed for is upgradeability, because it's not a matter of if a contract will need to change, but when. In the EVM, code is completely immutable, and because it uses the Harvard architecture of computing, it is not possible to load code into memory and then execute it; code and data are completely separate things, treated differently. So, the only option for upgrading a contract is to deploy a completely new contract, duplicating all of the code, and make the old contract redirect to it. Patching pieces of the contract and partially (or wholly) replacing the code is not possible.

Conclusion

I finished my beer (well, hard cider) and I think my rant is coming to an end. The EVM at this point is a necessary evil. It was the first in this space, and like most firsts (like Javascript), it has many problems. Its design is also very unconventional, which is why I don't think we will see any conventional programming languages ported to the EVM. Its design is actively hostile to many of the common language paradigms that have been established over the past 50+ years. This includes things like JUMPDEST making jump table optimizations difficult, no tail-recursion support, a strange and inflexible memory model, the difficult to understand DELEGATECALL model for external code, the lack of commonly used opcodes such as bitwise shifts, inflexible stack size limits, and of course the 256 bit integers. These aspects make porting traditional languages to the EVM at best inefficient and at worst impossible. This, I assume, is why all EVM languages so far were built specifically for the EVM with all of its unconventional models in mind. It is a sad state of affairs, really.

I mean this entire post not as an assault on the designers of the EVM; it's just how things are. Hindsight is always 20/20, and I've seen many regrets from them about certain aspects of the EVM's design. I don't wish to attack them (even if my sarcastic tone might seem like it sometimes), but rather to bring these faults to the attention of the greater blockchain developer community so that they will not be repeated, and hopefully also to provide some insight into all of the "why can't I do this in Solidity" type questions at the same time. The EVM has an incredible design that we are still learning the benefits and pitfalls of, and it's obvious that we have a long way to go before smart contracts can be as efficient and powerful as we all know they can be. The EVM was the first contender in this space, and ultimately we're still learning and discovering all of the use cases of smart contracts and what kind of design benefits them most. We've come a long way, but there's still a long way to go.

Posted: 8/13/2017 4:51:44 AM

The missing explanation of Proof of Stake Version 3

In every cryptocurrency there must be some consensus mechanism which keeps the entire distributed network in sync. When Bitcoin first came out, it introduced the Proof of Work (PoW) system. PoW is done by cryptographically hashing a piece of data (the block header) over and over. Because of how one-way hashing works, one tiny change in the data can cause a wildly different hash to come out of it. Participants in the network determine if the PoW is valid by judging whether the final hash meets a certain condition, called difficulty. The difficulty is an ever-changing "target" which the hash must meet or exceed. Whenever the network is creating more blocks than scheduled, this target is changed automatically by the network so that it becomes more and more difficult to meet, and thus requires more and more computing power to find a matching hash within the target time of 10 minutes.

Definitions

Some basic definitions used here might be unfamiliar to people who haven't worked with the blockchain code; these are:

  • UTXO - Unspent Transaction Output
  • vin - In a transaction a 'vin' is a UTXO that is being spent as an "input"
  • vout - In a transaction, a 'vout' is a new UTXO that is being created as an "output". The 'vouts' are effectively all of the coins sent once the transaction is complete
  • hashing - The process of creating a hash. This takes an arbitrary amount of data as input, and outputs a fixed size "digest" which is impossible to reverse. Additionally, if you change anything about the input data, it drastically changes the output digest.
  • hash - The result of a hashing algorithm.
  • script - The computer program that determines how a vout/UTXO is spent.
  • pay-to-pubkeyhash script - The most common script used to send money in Bitcoin and other cryptocurrencies. In order to send money, all you need to know is the hash of the recipient's public key (commonly represented as a base58 address); in order to spend the received money, all that is needed is a signature from the public key, and the public key itself.
  • pay-to-pubkey script - A very simple script which has very similar functionality to pubkeyhash scripts. However, instead of sending money to the hash of a public key, the money is sent to the public key itself. All that is needed for spending is a cryptographic signature proving ownership of the public key.
  • prevout - The vout which is spent (as a vin) in a transaction
  • OP_RETURN script - OP_RETURN is an operation used in script which effectively makes an output provably unspendable. It is commonly used to hold small amounts of data on the blockchain without polluting the UTXO set.

Proof of Work and Blockchain Consensus Systems

Proof of Work is a proven consensus mechanism that has kept Bitcoin secure and trustworthy for 8 years now. However, it is not without its fair share of problems. PoW's major drawbacks are:

  1. PoW wastes a lot of electricity, harming the environment.
  2. PoW benefits greatly from economies of scale, so it tends to benefit big players the most, rather than small participants in the network.
  3. PoW provides no incentive to use or keep the tokens.
  4. PoW has some centralization risks, because it tends to encourage miners to participate in the biggest mining pool (a group of miners who share the block reward), thus the biggest mining pool operator holds a lot of control over the network.

Proof of Stake was invented to solve many of these problems by allowing participants to create and mine new blocks (and thus also get a block reward) simply by holding coins in their wallet and allowing the wallet to do automatic "staking". Proof of Stake was originally invented by Sunny King and implemented in Peercoin. It has since been improved and adapted by many other people. This includes "Proof of Stake Version 2" by Pavel Vasin, "Proof of Stake Velocity" by Larry Ren, and most recently CASPER by Vlad Zamfir, as well as countless other experiments and lesser-known projects.

For Qtum we have decided to build upon "Proof of Stake Version 3", an improvement over version 2 that was also made by Pavel Vasin and implemented in the Blackcoin project. This version of PoS as implemented in Blackcoin is what we will be describing here. Some minor details of it have been modified in Qtum, but the core consensus model is identical.

For many community members and developers alike, proof of stake is a difficult topic, because very little has been written on how it manages to keep the network safe using only proof of ownership of tokens on the network. This blog post will go into fine detail about Proof of Stake Version 3: how its blocks are created and validated, and ultimately how a pure Proof of Stake blockchain is possible to secure. This will assume some technical knowledge, but I will try to explain things so that most of the knowledge can be gathered from context. You should at least be familiar with the concept of a UTXO-based blockchain.

Before we talk about PoS, it helps to understand how the much simpler PoW consensus mechanism works. Its mining process can be described in just a few lines of pseudo-code:

blockhash = sha256(sha256(block))
while(blockhash > difficulty) {
  block.nonce = block.nonce + 1
  blockhash = sha256(sha256(block))
}

A hash is a cryptographic algorithm which takes an arbitrary amount of input data, does a lot of processing on it, and outputs a fixed-size "digest" of that data. It is impossible to figure out the input data from just the digest. So, PoW tends to function like a lottery: you find out if you won by creating the hash and checking it against the target, and you create another ticket by changing some piece of data in the block. In Bitcoin's case, the nonce is used for this, as well as some other fields (usually called "extraNonce"). Once a block hash is found which is less than the difficulty target, the block is valid and can be broadcast to the rest of the distributed network. Miners will then see it and start building the next block on top of it.

Proof of Stake's Protocol Structures and Rules

Now enter Proof of Stake. We have these goals for PoS:

  1. Impossible to counterfeit a block
  2. Big players do not get disproportionately bigger rewards
  3. More computing power is not useful for creating blocks
  4. No one member of the network can control the entire blockchain

The core concept of PoS is very similar to PoW: a lottery. However, the big difference is that it is not possible to "get more tickets" in the lottery by simply changing some data in the block. Instead of the "block hash" being the lottery ticket judged against a target, PoS introduces the notion of a "kernel hash".

The kernel hash is composed of several pieces of data that are not readily modifiable in the current block. Because miners do not have an easy way to modify the kernel hash, they cannot simply iterate through a large number of hashes like in PoW.

Proof of Stake blocks add many additional consensus rules in order to realize PoS's goals. First, unlike in PoW, the coinbase transaction (the first transaction in the block) must be empty and reward 0 tokens. Instead, to reward stakers, there is a special "stake transaction" which must be the 2nd transaction in the block. A stake transaction is defined as any transaction that:

  1. Has at least 1 valid vin
  2. Its first vout must be an empty script
  3. Its second vout must not be empty

Furthermore, staking transactions must abide by these rules to be valid in a block:

  1. The second vout must be either a pubkey (not pubkeyhash!) script, or an OP_RETURN script that is unspendable (data-only) but stores data for a public key
  2. The timestamp in the transaction must be equal to the block timestamp
  3. The total output value of a stake transaction must be less than or equal to the total inputs plus the PoS block reward plus the block's total transaction fees: output <= (input + block_reward + tx_fees)
  4. The first spent vin's output must be confirmed by at least 500 blocks (in other words, the coins being spent must be at least 500 blocks old)
  5. Though more vins can be used and spent in a staking transaction, the first vin is the only one used for consensus parameters.

These rules ensure that the stake transaction is easy to identify, and that it gives the blockchain enough info to validate the block. The empty vout method is not the only way staking transactions could have been identified, but this was the original design from Sunny King and has worked well enough. The three identification rules above translate almost directly into code, as sketched below.
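Here is a rough illustration with stub types; this is a sketch of the three rules, not the actual Qtum validation code:

#include <cstdint>
#include <cstddef>
#include <vector>

// Stub transaction types standing in for the real Bitcoin-style structures.
struct TxOut { std::vector<uint8_t> scriptPubKey; int64_t value; };
struct Tx    { std::size_t vinCount; std::vector<TxOut> vout; };

// A stake transaction has at least one vin, an empty first vout,
// and a non-empty second vout.
bool isStakeTransaction(const Tx& tx) {
    return tx.vinCount >= 1
        && tx.vout.size() >= 2
        && tx.vout[0].scriptPubKey.empty()    // first vout: empty script
        && !tx.vout[1].scriptPubKey.empty();  // second vout: not empty
}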

Now that we understand what a staking transaction is, and what rules they must abide by, the next piece is to cover the rules for PoS blocks:

  • Must have exactly 1 staking transaction
  • The staking transaction must be the second transaction in the block
  • The coinbase transaction must have 0 output value and a single empty vout
  • The block timestamp must have its bottom 4 bits set to 0 (referred to as a "mask" in the source code). This effectively means the block time can only be represented in 16 second intervals, decreasing its granularity
  • The version of the block must be 7
  • A block's "kernel hash" must meet the weighted difficulty for PoS
  • The block hash must be signed by the public key in the staking transaction's second vout. The signature data is placed in the block (but is not included in the formal block hash)
  • The signature stored in the block must be "LowS", meaning it consists of only a single piece of data and is as compressed as possible (no extra leading 0s in the data, or other opcodes)
  • Most other rules for standard PoW blocks apply (valid merkle hash, valid transactions, timestamp is within time drift allowance, etc)

There are a lot of details here that we'll cover in a bit. The most important part, the one that really makes PoS effective, lies in the "kernel hash". The kernel hash is used similarly to PoW (if the hash meets the difficulty, then the block is valid). However, the kernel hash is not directly modifiable in the context of the current block. We will first cover exactly what goes into these structures and mechanisms, and later explain why this design is exactly the way it is, and what unexpected consequences can come from minor changes to it.

The Proof of Stake Kernel Hash

The kernel hash specifically consists of the following exact pieces of data (in order):

  • Previous block's "stake modifier" (more detail on this later)
  • Timestamp from "prevout" transaction (the transaction output that is spent by the first vin of the staking transaction)
  • The hash of the prevout transaction
  • The output number of the prevout (ie, which output of the transaction is spent by the staking transaction)
  • Current block time, with the bottom 4 bits set to 0 to reduce granularity. This is the only thing that changes during the staking process

The stake modifier of a block is a hash of exactly:

  • The hash of the prevout transaction in PoS blocks, OR the block hash in PoW blocks.
  • The previous block's stake modifier (the genesis block's stake modifier is 0)

The only way to change the current kernel hash (in order to mine a block) is thus either to change your "prevout", or to change the current block time.
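A compact sketch of the stake modifier chaining defined above, with a stand-in hash type and an assumed HashConcat helper (the real code hashes a serialized stream of the two fields):

#include <cstdint>
#include <array>

using Hash256 = std::array<uint8_t, 32>; // stand-in for a 256-bit hash

// Assumed helper: hash the concatenation of the two fields.
Hash256 HashConcat(const Hash256& a, const Hash256& b);

// Each block's modifier hashes either the prevout tx hash (PoS block) or
// the block hash (PoW block) together with the previous block's modifier.
// The genesis modifier is 0; every later block links back through this chain.
Hash256 NextStakeModifier(const Hash256& prevoutOrBlockHash,
                          const Hash256& prevStakeModifier) {
    return HashConcat(prevoutOrBlockHash, prevStakeModifier);
}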

A single wallet typically contains many UTXOs. The balance of the wallet is basically the total amount of all the UTXOs that can be spent by it. This is of course the same in a PoS wallet. This matters because any output can be used for staking: one of these outputs is what can become the "prevout" in a staking transaction to form a valid PoS block.

Finally, there is one more aspect that changes in the mining process of a PoS block. The difficulty is weighted against the number of coins in the staking transaction: the PoS difficulty is twice as easy to achieve when staking 2 coins as when staking just 1 coin. If this were not the case, it would encourage creating many tiny UTXOs for staking, which would bloat the size of the blockchain, ultimately cause the entire network to require more resources to maintain, and potentially compromise the blockchain's overall security.

So, if we were to show some pseudo-code for finding a valid kernel hash now, it would look like:

while(true){
    foreach(utxo in wallet){
        blockTime = currentTime - currentTime % 16
        posDifficulty = difficulty * utxo.value
        hash = hash(previousStakeModifier << utxo.time << utxo.hash << utxo.n << blockTime)
        if(hash < posDifficulty){
            done
        }
    } 
    wait 16s -- wait 16 seconds, until the block time can be changed
}

This code isn't as easy to understand as our PoW example, so I'll attempt to explain it in plain English:

Do the following over and over for infinity:
Calculate the blockTime to be the current time minus (the current time modulo 16). Modulo is like dividing, but taking the remainder instead of the quotient; this rounds the time down to the nearest 16 second interval.
For each UTXO in the wallet, calculate the posDifficulty as the network difficulty multiplied by the number of coins held by the UTXO.
Then calculate a SHA256 hash using the previous block's stake modifier, some data from the UTXO, and finally the blockTime. Compare this hash to the posDifficulty. If the hash is less than the posDifficulty, the kernel hash is valid and you can create a new block.
After going through all UTXOs, if no hash produced is less than the posDifficulty, wait 16 seconds and do it all over again.

Now that we have found a valid kernel hash using one of the UTXOs we can spend, we can create a staking transaction. This staking transaction will have 1 vin, which spends the UTXO we found that has a valid kernel hash. It will have (at least) 2 vouts. The first vout will be empty, identifying to the blockchain that it is a staking transaction. The second vout will either contain an OP_RETURN data output that contains a single public key, or it will contain a pay-to-pubkey script. The latter is usually used for simplicity, but using a data output for this allows for some advanced use cases (such as a separate block signing machine) without needlessly cluttering the UTXO set.
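
For illustration, here is a rough Python sketch of the shape of such a staking (coinstake) transaction. The dictionary layout and the names kernel_utxo, reward, and staker_pubkey are hypothetical stand-ins, not an actual wallet API; only the script construction follows the real pay-to-pubkey form:

def pay_to_pubkey(pubkey):
    # <push pubkey> OP_CHECKSIG (0xAC) -- the classic pay-to-pubkey script
    return bytes([len(pubkey)]) + pubkey + b'\xac'

coinstake = {
    "vin":  [kernel_utxo],                        # spends the UTXO with the valid kernel
    "vout": [
        {"value": 0, "script": b""},              # empty first output marks a coinstake
        {"value": kernel_utxo.value + reward,     # returns the stake plus the block reward
         "script": pay_to_pubkey(staker_pubkey)},  # or an OP_RETURN <pubkey> data output
    ],
}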

Finally, any transactions from the mempool are added to the block. The only thing left to do now is to create a signature, proving that we have approved the otherwise valid PoS block. The signature must use the public key that is encoded (either as a pay-to-pubkey script, or as an OP_RETURN data script) in the second vout of the staking transaction. The actual data signed is the block hash. After the signature is applied, the block can be broadcast to the network. Nodes in the network will then validate the block, and if they find it valid and there is no better blockchain, they will accept it into their own blockchain and broadcast the block to all the nodes they are connected to.
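
Putting that last step in code form, signing might look like the following sketch; serialize_header and ecdsa_sign_low_s are hypothetical helper names, not real APIs:

import hashlib

def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# The private key must correspond to the public key embedded in the second
# vout of the staking transaction, and the signature must be LowS/canonical.
block_hash = double_sha256(serialize_header(block))            # serialize_header: hypothetical
block.signature = ecdsa_sign_low_s(staker_privkey, block_hash)  # ecdsa_sign_low_s: hypothetical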

Now we have a fully functional and secure PoSv3 blockchain. PoSv3 is what we determined to be the most resistant to attack while maintaining a purely decentralized consensus system (ie, without master nodes or curators). To understand how we arrived at this conclusion, however, we must understand its history.

PoSv3's History

Proof of Stake has a fairly long history. I won't cover every detail, but will broadly cover what changed between each version to arrive at PoSv3, for historical purposes:

PoSv1 - This version is implemented in Peercoin. It relied heavily on the notion of "coin age", or how long a UTXO has sat unspent on the blockchain. Its implementation basically made it so that the higher the coin age, the more the difficulty is reduced. This had the bad side-effect, however, of encouraging people to open their wallet for staking only once a month or less often. Assuming the coins were all relatively old, they would almost instantaneously produce new staking blocks. This makes double-spend attacks extremely easy to execute. Peercoin itself is not affected by this because it is a hybrid PoW and PoS blockchain, so the PoW blocks mitigated this effect.

PoSv2 - This version removed coin age completely from consensus and used a completely different stake modifier mechanism from v1. The changes are too numerous to list here. All of this was done to remove coin age from consensus and make PoS a safe consensus mechanism on its own, without requiring a PoW/PoS hybrid blockchain to mitigate various attacks.

PoSv3 - PoSv3 is really more of an incremental improvement over PoSv2. In PoSv2 the stake modifier also included the previous block time. This was removed to prevent a "short-range" attack where it was possible to iteratively mine an alternative blockchain by iterating through previous block times. PoSv2 also used block and transaction times to determine the age of a UTXO; this is not the same as coin age, but rather the minimum number of confirmations required before a UTXO can be used for staking. This was changed to a much simpler mechanism where the age of a UTXO is determined by its depth in the blockchain. This no longer incentivizes inaccurate timestamps on the blockchain, and is also more immune to "timewarp" attacks. PoSv3 also added support for OP_RETURN coinstake transactions, which allow a vout to contain the public key for signing the block without requiring a full pay-to-pubkey script.

References:

  1. https://peercoin.net/assets/paper/peercoin-paper.pdf
  2. https://blackcoin.co/blackcoin-pos-protocol-v2-whitepaper.pdf
  3. https://www.reddcoin.com/papers/PoSV.pdf
  4. https://blog.ethereum.org/2015/08/01/introducing-casper-friendly-ghost/
  5. https://github.com/JohnDolittle/blackcoin-old/blob/master/src/kernel.h#L11
  6. https://github.com/JohnDolittle/blackcoin-old/blob/master/src/main.cpp#L2032
  7. https://github.com/JohnDolittle/blackcoin-old/blob/master/src/main.h#L279
  8. http://earlz.net/view/2017/07/27/1820/what-is-a-utxo-and-how-does-it
  9. https://en.bitcoin.it/wiki/Script#Obsolete_pay-to-pubkey_transaction
  10. https://en.bitcoin.it/wiki/Script#Standard_Transaction_to_Bitcoin_address_.28pay-to-pubkey-hash.29
  11. https://en.bitcoin.it/wiki/Script#Provably_Unspendable.2FPrunable_Outputs
Posted: 7/27/2017 7:04:13 PM