The Faults and Shortcomings of the EVM

So, first to give an introduction. I co-founded Qtum which is a project that basically takes the Ethereum Virtual Machine (EVM) and puts it onto a blockchain that isn't Ethereum (along with a ton of other stuff). On my way to doing that I was forced against my will to learn way more than I ever wanted to know about the EVM. What is my main take away from all this learning? Well, I don't like it. I personally think it's an impractical design with an even more impractical implementation. And as a disclaimer, I intend to see through that we add another VM to Qtum which fixes at least most of these problems.

Anyway, so lets cut to the chase. What was the point of the EVM and why was it made in the first place? According to the Design Rationale it was designed for:

  1. Simplicity
  2. Determinism
  3. Compact bytecode size
  4. Specialization for blockchain
  5. Simplicity (uhh?)
  6. Optimizable

And if you skim through that document, you see the reasoning for the EVM is pretty well thought out. So where does it go wrong? Well, it doesn't work with today's technology and paradigms. It's a very good design built for a world that doesn't currently exist. I'll circle back around to this, but lets start with my favorite thing to hate in the EVM

256 bit integers

On most modern processors you have pretty much 4 good choices for fast speedy math:

  1. 8 bit integers
  2. 16 bit integers
  3. 32 bit integers
  4. 64 bit integers

Of course, in some cases 32 bit is faster than 16 bit, and at least in x86 8 bit math is not fully supported (ie, no native division or multiplication), but for the most part if you use one of these sizes you have some guarantees of how many cycles a math operation takes, and it's fast, measured in a couple of nanoseconds if you don't include cache misses and memory latency. Anyway, suffice to say that these are the size of integers that modern processor use "natively", without any translation or other things requiring extraneous operations.

So of course, since the EVM is intended to be optimized for speed and efficiency, it's choice for integer size is:

  1. 256 bit integers

For reference, here's how to add 2 32 bit integers in x86 assembly (ie, the processor your PC has in it)

mov eax, dword [number1]
add eax, dword [number2]

And here's how to add 2 64 bit integers in x86 assembly assuming your processor is 64 bit capable:

mov rax, qword [number1]
add rax, qword [number2]

And here is how to add 2 256 bit integers on a 32 bit x86 computer

mov eax, dword [number]
add dword [number2], eax
mov eax, dword [number1+4]
adc dword [number2+4], eax
mov eax, dword [number1+8]
adc dword [number2+8], eax
mov eax, dword [number1+12]
adc dword [number2+12], eax
mov eax, dword [number1+16]
adc dword [number2+16], eax
mov eax, dword [number1+20]
adc dword [number2+20], eax
mov eax, dword [number1+24]
adc dword [number2+24], eax
mov eax, dword [number1+28]
adc dword [number2+28], eax

Although, adding these 256 bit integers on a 64 bit x86 computer is a bit better

mov rax, qword [number]
add qword [number2], rax
mov rax, qword [number1+8]
adc qword [number2+8], rax
mov rax, qword [number1+16]
adc qword [number2+16], rax
mov rax, qword [number1+24]
adc qword [number2+24], rax

Anyway, suffice to say that working with 256 bit integers is significantly more complex and slow than working with an integer length natively supported by the processor.

The EVM embraces this design though because it is much simpler to only support 256 bit integers, than to add additional opcodes for working with other integer sizes. The only non-256 bit operations are a series of push instructions for pulling data from 1-32 bytes from memory, and a few instructions that work with 8 bit integers.

So, the design rationale for using this inefficient integer size for all operations?

"4 or 8 byte words are too restrictive to store addresses and big values for crypto computations, and unlimited values are too hard to make a secure gas model around."

I must admit, being able to compare 2 addresses with a single operation is pretty cool. However, here is how you would do the same in x86 when in 32-bit mode (without SSE and other optimizations):

mov esi, [address1]
mov edi, [address2]
mov ecx, 32 / 4
repe cmpsd 
jne not_equal
; if reach here, then they're equal

Assuming address1 and address2 are hardcoded addresses, that's around 6 + 5 + 5 = 16 bytes of opcodes, or if the addresses were in the stack, it might be something like 6 + 3 + 3 = 12 bytes of opcodes.

The other justification for the large integer size is "big values for cryptography computations", however, since reading that several months ago I've had a problem figuring out a single use case for 256 bit integers that doesn't involve comparing if an address or hash is equal. Custom cryptography is plainly too expensive to execute on the public blockchain. I searched for over an hour on github trying to find a solidity contract that does anything I'd define as cryptography and I came up with nothing. Almost any form of cryptography is guaranteed to be slow and complex on modern computers, and this makes it non-economical to execute on the public Ethereum blockchain due to gas costs (not to mention the effort of porting any real algorithm to Solidity). However, there are still private blockchains where gas costs do not matter. But if you own your own blockchain you won't want to do this as part of a slow EVM contract, you would use C++, or Go, or any number of real programming languages to implement the cryptography in native code as a pre-compiled smart contract. So this really blows the entire justification for supporting only 256 bit integers out of the water. This I feel is the real foundation of problems with the EVM, but there's a lot more lurking in the less obvious areas.

EVM's Memory Model

The EVM has 3 main places you can put data

  1. Stack
  2. Temporary memory
  3. Permanent memory

The stack has certain limits, so sometimes you need to use temporary memory instead of very expensive permanent memory. There is no allocate instruction or anything like that in the EVM. You claim memory by writing to it. This may seem pretty clever, but it's also extremely sinister. For instance, if you write to address 0x10000, your contract just allocated 64Kwords (ie, 64K of 256 bit words) of memory and paid the gas costs as if you had used all 64Kwords of memory. Well, easy workaround, just track the last memory address you use and increment it when you need more. That works decently, unless you happen to need at one point a lot of memory and then you don't need that memory anymore. Let's say you do some crazy algorithm that uses 100 words of memory. So, you allocate that, use the memory, whatever and pay for 100 words of memory... then you exit that function. Now you're back in some other function and it needs just 1 word of memory for scratch space or something, so it allocates another word. You're now using 101 words of memory. There is no way to free memory. You can in theory decrease that special pointer you were keeping track of for the last space of memory, but that only works if you know that entire block of memory will never be referenced again and can safely be reused. If out of those 100 words, you need the word at 50 and the word at 90, then you must copy those to another location (like the stack) and then that memory can be freed. There is no tools provided by the EVM to help with this. The technical term for it is memory fragmentation. It is up to you to vet that each function doesn't use that memory that was allocated and globally accessible, and if you reuse that memory and something got through your vetting process, then your contract now has a potentially critical state corruption bug. So your options are basically either open yourself up to a large class of memory reuse bugs, or pay more gas for memory even though you have already allocated more than you need.

Additionally, allocating memory does not have a linear cost. If you have allocated 100 words of memory and you allocate 1 more word, it is significantly more expensive than allocating that 1st word of memory when your program starts. This aspect greatly amplifies the economic cost of being on the safe side, compared to opening yourself up to more contract bugs for greatly decreased gas costs.

So, why use memory at all? Why not use the stack? Well, the stack is ridiculously limited.

EVM's Stack

The EVM is a stack-based machine. That means it uses a stack for most of it's operations, rather than a set of registers. Stack based machines are typically much simpler to optimize, but result in more opcodes being needed for most operations when compared to a similar register based machine.

Anyway, so the EVM has a lot of different operations, most of which operate on the stack alone. Notice the SWAP and DUP series of instructions. These go up to 16. Now try to compile this contract:

pragma solidity ^0.4.13;

contract Something{

    function foo(address a1, address a2, address a3, address a4, address a5, address a6){
        address a7;
        address a8;
        address a9;
        address a10;
        address a11;
        address a12;
        address a13;
        address a14;
        address a15;
        address a16;
        address a17;
    }
}

You'll be met with this error:

CompilerError: Stack too deep, try removing local variables.

This error occurs because once an item is 16 levels deep in the stack, it is effectively impossible to access without popping items off the stack. The official "solution" for this problem is to use less variables and make functions smaller. Various workarounds also include stuffing variables into a struct or array and using the memory keyword (which isn't able to be applied to normal variables for... reasons?). So, lets fix our contract to use some memory based structs:

pragma solidity ^0.4.13;

contract Something{
    struct meh{
        address x;
    }
    function foo(address a1, address a2, address a3, address a4, address a5, address a6){
        address a7;
        address a8;
        address a9;
        address a10;
        address a11;
        address a12;
        address a13;
        meh memory a14;
        meh memory a15;
        meh memory a16;
        meh memory a17;
    }
}

And the result is..

CompilerError: Stack too deep, try removing local variables.

But we replaced these variables with memory? Doesn't that fix it? Well, no. Because now instead of storing 17 256 bit integers on the stack, we are storing 13 integers and 4 256 bit memory addresses (ie, references) to a 256 bit slot of memory. Part of this is a Solidity problem, but the primary problem is that the EVM is missing a way to access arbitrary items on the stack. Every other VM implementation I know of works around this basic problem by either

  1. Encouraging small stack sizes and making it easy to swap stack items to memory or alternative storage (like local variables, in .NET)
  2. Implementing a pick instruction or similar that allows access to any arbitrary stack slot

However, in the EVM, the stack is the only free place of memory for data and computation, any other place has a direct cost in the form of gas. So, this directly discourages small stack sizes, because anywhere else is more expensive... so we arrive at basic language implementation problems like this.

Bytecode size

In the rationale document it's stated their goal was for EVM bytecode to be both simple and compact. However, this is like saying that you prefer to write code that is both descriptive and concise. They are fundamentally differing goals accomplished in fundamentally different ways. A simple instruction set is accomplished by limiting the number of operations, and keeping operations concise and simplistic. Meanwhile, a compact bytecode that produces small programs is accomplished by making an instruction set that performs as many operations as possible in as few bytes of code as possible.

Ultimately, despite "compact bytecode size" being a goal in their rationale, the actual implementation of the EVM does not attain that goal in any sense. It is instead focused on a simplistic instruction set that is easy to create a gas model around. And I'm not saying this is wrong or bad, only that one of their primary goals of the EVM is fundamentally at ends with the other goals of the EVM. Also, one number given in that document is that a C program takes over 4000 bytes in order to implement "hello world". This is definitely not the case and glosses over the different environments and optimizations that take place in C programs. In the C program they measured, I expect there was also ELF data, relocation data, and alignment optimizations - aligning code and data on certain boundaries such as 32 byte or 4kb can have a measurable impact on the performance of the program on physical processors. I personally have built a simplistic bare bones C program that compiles to 46 bytes of x86 machine code, and a simple greeter type program which compiles to ~700 bytes, while Solidity's example compiles to over 1000 bytes of EVM bytecode.

I understand the need for a simplistic instruction set for security reasons, but it causes significant bloat on the blockchain. Passing over this as if EVM smart contract bytecode is as small as possible is detrimental. It could clearly be made much smaller by including a standard library and supporting opcodes that do a batch of common operations rather than needing to execute several opcodes for such a thing.

256 bit integers (again)

But really, 256 bit integers are awful. And the most ridiculous part is they are used in places where they have no reasonable use. It's effectively impossible to use more than 4B (32 bits) units of gas, so what integer size is used for specifying and counting gas? 256 bits of course. Memory is fairly expensive, so what's the address size for memory addresses in the EVM? 256 bit of course, for when your contract needs more words of memory than there are atoms in the universe. I would complain about using 256 bit integers for both addresses and values in permanent storage, but this actually provides some interesting abilities to use a hash for some data and have no worries about conflicts in the address space, so I guess that gets a pass. Every single instance where you could use any integer size, the EVM calls for 256 bits. Even JUMP uses 256bit, but in their defense they do limit the highest jump destination to 0x7FFFFFFFFFFFFFFF and effectively limit the jump destination to a signed 64 bit integer. And then for currency values themselves. The smallest unit of ETH is wei, so we arrive at the total coin supply (in wei) is 1000000000000000000 * 200000000 (200M is an estimate, currently the supply is ~92M).. And so, if we subtract that number from 2 to the power of 256 (maximum value storable by a 256 bit integer), we get.. 1.157920892373162e+77. Just enough space to send more wei than will ever exist plus a magnitude greater than the number of atoms in the universe. Basically, 256 bit integers are incredibly impractical and unnecessary for almost any application that the EVM is designed for.

Lack of standard library

If you've ever developed a Solidity smart contract, this is probably one of the first things you encountered as a problem. There is no standard library, at all. If you want to determine if two strings are equal, there is no strcmp or memcmp or anything like that, you must write the code yourself or copy code from the internet. The Zepplin Project is making this situation bearable by providing a standard library that contracts can use (either by including it in the contract itself or by calling an external contract). However, the limitations of this approach is apparent when considering that it is cheaper to use two SHA3 operations and then compare the resulting hashes, than it is to loop through the bytes of a string (32 bytes at a time) to determine if they are equal. Having a standard library of precompiled contracts that use native code with set, reasonable gas prices would be greatly beneficial to the entire smart contract ecosystem. Without this though, people instead copy and paste code from open source code, with unknown security implications. In addition to this people will optimize their code, trying to find shortcuts and reductions in gas usage, even at the risk of potentially compromising the security profile of their contract.

The economics and game theory of gas

I plan on making a full blog post about this topic, but the EVM doesn't just make good practices hard, but also expensive. For instance, it costs quite a bit of gas to store data on the blockchain. This means it can be incredibly expensive to cache any amount of data within a smart contract. So, instead it is computed with each contract execution. Over time more gas is consumed and blockchain nodes waste more time executing the same code to compute the same data. Furthermore, there is very little actual cost to data stored on the blockchain. It does not directly increase the size of the blockchain (in either Ethereum or Qtum). The real cost is the data which enters the blockchain in the form of data sent to contracts, as that is what directly increases the size of the blockchain. It is almost cheaper in Etheruem to enter 32 bytes of data into the blockchain in the form of a transaction (23176 gas) than it costs to store 32 bytes in a contract (20,000), and it is significantly cheaper when scaling that 64 bytes of data (29704 gas for tx compared to 80,000 gas for storage). There is a "virtual" cost to data stored in a contract, but it is much less than most people assume. It is basically just the cost of iterating through the database storing data for the entire blockchain. The RLP and LevelDB database system used by both Qtum and Ethereum is very efficient at handling this however, and ongoing costs are no where close to linear.

Another part of the EVM that encourages inefficient code is that it is not possible to call a specific function in a smart contract. This is for security, as being able to directly call a function like withdraw() in an ERC20 contract would be bad. However, this is needed for standard libraries to be efficient. Instead of simply being able to load a specific piece of code from an external contract, it's all or nothing, and execution always starts at the first byte of the code, there is no way to jump around and skip all of the Solidity ABI bootstrap code. So, in the end this encourages for small functions to be duplicated (because they are more expensive to call externally), and to deploy as many functions in a contract as possible. There is no cost difference for calling a 100 byte contract or a 10,000 byte contract, despite all of the code needing to be loaded into memory either way.

And finally, it is not at all possible to access the storage of a contract directly. The contract code must be fully loaded from disk, executed, the code must load the data from the storage that you requested, and then finally return it to the calling contract while making sure not to use variable size arrays. Oh, and if you need some back and forth because you didn't know the exact data you needed, at least it's in cache so it's cheap for nodes, but there is no discount on the gas price for calling the external contract a second time. It's possible to access an external contract's storage without needing to completely load its code. In fact, it's just as cheap computationally as accessing the current contract's storage, so why make it so expensive and discourage efficiency?

Lack of debugging and testability

This problem lies not just on the fault of the EVM's design, but also its implementations. Of course, some projects are striving to make this as easy as possible, like Truffle. However, the EVM's design does not make this at all easy. The only exception available is "OutOfGas", there are no logging facilities, no easy way to call external native code (such as for test helpers and mocking data), and the Ethereum blockchain itself is difficult to create a private testnet with, and the private blockchain has different parameters and behavior. Qtum at least has a leg up here thanks to "regtest" mode, but testing the EVM with mock data etc is still incredibly hard since no implementation is really stand-alone. And there are no debuggers I know of that work at Solidity level, there are at least 1 EVM assembly debuggers I know of though, but that is far from user friendly. There is no symbol format or debug data format established at all for EVM and/or Solidity, and I've found no EIPs or other effort to begin working toward a standardized debug format like DWARF.

Floating point numbers

One common thing I see people say when the lack of floating point support comes up is "well no one should be handling currency values using floating point numbers". This is incredibly narrow-minded though. There are many practical use cases for floating point numbers such as risk modeling, scientific computations, and cases where ranges and approximations are more important than exact values. Saying the potential applications of smart contracts to only handling currency values is unrealistic and needlessly limiting.

Immutable code

One of the major things that contracts need to be designed for is upgradeability, because it's not a matter of if a contract needs changed, but rather when. In the EVM code is completely immutable, and because it uses the Harvard Architecture of computing, it is not possible to load code into memory and then execute it. Code and data are completely separate things treated differently. So, the only option for upgrading a contract is to deploy a completely new contract, duplicating all of the code and make the old contract redirect to it. Patching pieces of the contract and partially (or wholly) replacing the code is not possible.

Conclusion

I finished my beer (well, hard cider) and I think my rant is coming to an end. The EVM at this point is a necessary evil. It was the first in this space, and like most things that come first (like Javascript), there are many problems. And it's design is very unconventional, and is why I don't think we will see any conventional programming languages ported to the EVM. It's design is actively hostile to the many common language paradigms that have been established over the past 50+ years. This includes things like JUMPDEST making jump table optimizations difficult, no tail-recursion support, strange and inflexible memory model, difficult to understand DELEGATECALL model for external code, lack of commonly used opcodes such as bitwise shifts, inflexible stack size limits, and of course the 256 bit integers. These aspects make porting traditional languages to the EVM at best inefficient and at worst impossible. This I assume is why all EVM languages currently are built specifically for the EVM and with all of it's unconventional models in mind. It is a sad state of affairs really.

I mean this entire post not as an assault or anything to the designers of the EVM, it's just how things are. Hindsight is always 20/20 and I know I've seen many regrets from them about certain aspects of the EVM's design. I don't wish to attack them (even if my sarcastic tone might seem like it sometimes), but rather I want to bring these faults to the attention of the greater blockchain developer community, so that they will not be repeated, and hopefully also provide some insight into all of the "why can't I do this in Solidity" type questions at the same time. The EVM has an incredible design that we are still learning the benefits and pitfalls of, and it's obvious that we have a long way to go before smart contracts can be as efficient and powerful as we all know they can be. The EVM was the first contender in this space, and ultimately we're still learning and discovering all of the use cases of smart contracts and what kind of design benefits them most. We've come a long way, but there's still a long way to go.

Posted: 8/13/2017 4:51:44 AM

The missing explanation of Proof of Stake Version 3

In every cryptocurrency there must be some consensus mechanism which keeps the entire distributed network in sync. When Bitcoin first came out, it introduced the Proof of Work (PoW) system. PoW is done by cryptographically hashing a piece of data (the block header) over and over. Because of how one-way hashing works. One tiny change in the data can cause an extremely different hash to come of it. Participants in the network determine if the PoW is valid complete by judging if the final hash meets a certain condition, called difficulty. The difficulty is an ever changing "target" which the hash must meet or exceed. Whenever the network is creating more blocks than scheduled, this target is changed automatically by the network so that the target becomes more and more difficult to meet. And thus, requires more and more computing power to find a hash that matches the target within the target time of 10 minutes.

Definitions

Some basic definitions might be unfamiliar to some people not familiar with the blockchain code, these are:

  • UTXO - Unspent Transaction Output
  • vin - In a transaction a 'vin' is a UTXO that is being spent as an "input"
  • vout - In a transaction, a 'vout' is the new UTXO that is being created as an "output". The 'vouts' is effectively all of the coins sent after the transaction is complete
  • hashing - The process of creating a hash. This takes an arbritrary amount of data as input, and outputs a fixed size "digest" which is impossible to reverse. Additionally, if you change anything about the input data, it drastically changes the output digest.
  • hash - The result of a hashing algorithm.
  • script - The computer program that determines how a vout/UTXO is spent.
  • pay-to-pubkeyhash script - The most common script used to send money in Bitcoin and other cryptocurrencies. In order to send money, all you need to know is the hash of their public key (commonly represented as a base58 address), and in order to spend the received money all that is needed is a signature from the public key, and the public key itself.
  • pay-to-pubkey script - A very simple script which has very similar functionality to pubkeyhash scripts. However, instead of sending money to the hash of a public key, the money is sent to the public key itself. All that is needed for spending is a cryptographic signature proving ownership of the public key.
  • prevout - The vout which is spent (as a vin) in a transaction
  • OP_RETURN script - OP_RETURN is an operation used in script which effectively makes an output provably unspendable. It is commonly used to hold small amounts of data on the blockchain without polluting the UTXO set.

Proof of Work and Blockchain Consensus Systems

Proof of Work is a proven consensus mechanism that has made Bitcoin secure and trustworthy for 8 years now. However, it is not without it's fair share of problems. PoW's major drawbacks are:

  1. PoW wastes a lot of electricity, harming the environment.
  2. PoW benefits greatly from economies of scale, so it tends to benefit big players the most, rather than small participants in the network.
  3. PoW provides no incentive to use or keep the tokens.
  4. PoW has some centralization risks, because it tends to encourage miners to participate in the biggest mining pool (a group of miners who share the block reward), thus the biggest mining pool operator holds a lot of control over the network.

Proof of Stake was invented to solve many of these problems by allowing participants to create and mine new blocks (and thus also get a block reward), simply by holding onto coins in their wallet and allowing their wallet to do automatic "staking". Proof Of Stake was originally invented by Sunny King and implemented in Peercoin. It has since been improved and adapted by many other people. This includes "Proof of Stake Version 2" by Pavel Vasin, "Proof of Stake Velocity" by Larry Ren, and most recently CASPER by Vlad Zamfir, as well as countless other experiments and lesser known projects.

For Qtum we have decided to build upon "Proof of Stake Version 3", an improvement over version 2 that was also made by Pavel Vasin and implemented in the Blackcoin project. This version of PoS as implemented in Blackcoin is what we will be describing here. Some minor details of it has been modified in Qtum, but the core consensus model is identical.

For many community members and developers alike, proof of stake is a difficult topic, because there has been very little written on how it manages to accomplish keeping the network safe using only proof of ownership of tokens on the network. This blog post will go into fine detail about Proof of Stake Version 3 and how it's blocks are created, validated, and ultimately how a pure Proof of Stake blockchain is possible to secure. This will assume some technical knowledge, but I will try to explain things so that most of the knowledge can be gathered from context. You should at least be familiar with the concept of the UTXO-based blockchain.

Before we talk about PoS, it helps to understand how the much simpler PoW consensus mechanism works. It's mining process can be described in just a few lines of pseudo-code:

while(blockhash > difficulty) {
  block.nonce = block.nonce + 1
  blockhash = sha256(sha256(block))
}

A hash is a cryptographic algorithm which takes an arbritrary amount of input data, does a lot of processing of it, and outputs a fixed-size "digest" of that data. It is impossible to figure out the input data with just the digest. So, PoW tends to function like a lottery, where you find out if you won by creating the hash and checking it against the target, and you create another ticket by changing some piece of data in the block. In Bitcoin's case, nonce is used for this, as well as some other fields (usually called "extraNonce"). Once a blockhash is found which is less than the difficulty target, the block is valid, and can be broadcast to the rest of the distributed network. Miners will then see it and start building the next block on top of this block.

Proof of Stake's Protocol Structures and Rules

Now enter Proof of Stake. We have these goals for PoS:

  1. Impossible to counterfeit a block
  2. Big players do not get disproportionally bigger rewards
  3. More computing power is not useful for creating blocks
  4. No one member of the network can control the entire blockchain

The core concept of PoS is very similar to PoW, a lottery. However, the big difference is that it is not possible to "get more tickets" to the lottery by simply changing some data in the block. Instead of the "block hash" being the lottery ticket to judge against a target, PoS invents the notion of a "kernel hash".

The kernel hash is composed of several pieces of data that are not readily modifiable in the current block. And so, because the miners do not have an easy way to modify the kernel hash, they can not simply iterate through a large amount of hashes like in PoW.

Proof of Stake blocks add many additional consensus rules in order to realize it's goals. First, unlike in PoW, the coinbase transaction (the first transaction in the block) must be empty and reward 0 tokens. Instead, to reward stakers, there is a special "stake transaction" which must be the 2nd transaction in the block. A stake transaction is defined as any transaction that:

  1. Has at least 1 valid vin
  2. It's first vout must be an empty script
  3. It's second vout must not be empty

Furthermore, staking transactions must abide by these rules to be valid in a block:

  1. The second vout must be either a pubkey (not pubkeyhash!) script, or an OP_RETURN script that is unspendable (data-only) but stores data for a public key
  2. The timestamp in the transaction must be equal to the block timestamp
  3. the total output value of a stake transaction must be less than or equal to the total inputs plus the PoS block reward plus the block's total transaction fees. output <= (input + block_reward + tx_fees)
  4. The first spent vin's output must be confirmed by at least 500 blocks (in otherwords, the coins being spent must be at least 500 blocks old)
  5. Though more vins can used and spent in a staking transaction, the first vin is the only one used for consensus parameters.

These rules ensure that the stake transaction is easy to identify, and ensures that it gives enough info to the blockchain to validate the block. The empty vout method is not the only way staking transactions could have been identified, but this was the original design from Sunny King and has worked well enough.

Now that we understand what a staking transaction is, and what rules they must abide by, the next piece is to cover the rules for PoS blocks:

  • Must have exactly 1 staking transaction
  • The staking transaction must be the second transaction in the block
  • The coinbase transaction must have 0 output value and a single empty vout
  • The block timestamp must have it's bottom 4 bits set to 0 (referred to as a "mask" in the source code). This effectively means the blocktime can only be represented in 16 second intervals, decreasing it's granularity
  • The version of the block must be 7
  • A block's "kernel hash" must meet the weighted difficulty for PoS
  • The block hash must be signed by the public key in the staking transaction's second vout. The signature data is placed in the block (but is not included in the formal block hash)
  • The signature stored in the block must be "LowS", which means consisting only of a single piece of data and must be as compressed as possible (no extra leading 0s in the data, or other opcodes)
  • Most other rules for standard PoW blocks apply (valid merkle hash, valid transactions, timestamp is within time drift allowance, etc)

There are a lot of details here that we'll cover in a bit. The most important part that really makes PoS effective lies in the "kernel hash". The kernel hash is used similar to PoW (if hash meets difficulty, then block is valid). However, the kernel hash is not directly modifiable in the context of the current block. We will first cover exactly what goes into these structures and mechanisms, and later explain why this design is exactly this way, and what unexpected consequences can come from minor changes to it.

The Proof of Stake Kernel Hash

The kernel hash specifically consists of the following exact pieces of data (in order):

  • Previous block's "stake modifier" (more detail on this later)
  • Timestamp from "prevout" transaction (the transaction output that is spent by the first vin of the staking transaction)
  • The hash of the prevout transaction
  • The output number of the prevout (ie, which output of the transaction is spent by the staking transaction)
  • Current block time, with the bottom 4 bits set to 0 to reduce granularity. This is the only thing that changes during staking process

The stake modifier of a block is a hash of exactly:

  • The hash of the prevout transaction in PoS blocks, OR the block hash in PoW blocks.
  • The previous block's stake modifier (the genesis block's stake modifier is 0)

The only way to change the current kernel hash (in order to mine a block), is thus to either change your "prevout", or to change the current block time.

A single wallet typically contains many UTXOs. The balance of the wallet is basically the total amount of all the UTXOs that can be spent by the wallet. This is of course the same in a PoS wallet. This is important though, because any output can be used for staking. One of these outputs are what can become the "prevout" in a staking transaction to form a valid PoS block.

Finally, there is one more aspect that is changed in the mining process of a PoS block. The difficulty is weighted against the number of coins in the staking transaction. The PoS difficulty ends up being twice as easy to achieve when staking 2 coins, compared to staking just 1 coin. If this were not the case, then it would encourage creating many tiny UTXOs for staking, which would bloat the size of the blockchain and ultimately cause the entire network to require more resources to maintain, as well as potentially compromise the blockchain's overall security.

So, if we were to show some pseudo-code for finding a valid kernel hash now, it would look like:

while(true){
    foreach(utxo in wallet){
        blockTime = currentTime - currentTime % 16
        posDifficulty = difficulty * utxo.value
        hash = hash(previousStakeModifier << utxo.time << utxo.hash << utxo.n << blockTime)
        if(hash < posDifficulty){
            done
        }
    } 
    wait 16s -- wait 16 seconds, until the block time can be changed
}

This code isn't so easy to understand as our PoW example, so I'll attempt to explain it in plain english:

Do the following over and over for infinity: 
Calculate the blockTime to be the current time minus itself modulus 16 (modulus is like dividing by 16, but then only instead of taking the result, taking the remainder)
Calculate the posDifficulty as the network difficulty, multiplied by the number of coins held by the UTXO. 
Cycle through each UTXO in the wallet. With each UTXO, calculate a SHA256 hash using the previous block's stake modifier, as well as some data from the the UTXO, and finally the blockTime. Compare this hash to the posDifficulty. If the hash is less than the posDifficulty, then the kernel hash is valid and you can create a new block. 
After going through all UTXOs, if no hash produced is less than the posDifficulty, then wait 16 seconds and do it all over again. 

Now that we have found a valid kernel hash using one of the UTXOs we can spend, we can create a staking transaction. This staking transaction will have 1 vin, which spends the UTXO we found that has a valid kernel hash. It will have (at least) 2 vouts. The first vout will be empty, identifying to the blockchain that it is a staking transaction. The second vout will either contain an OP_RETURN data transaction that contains a single public key, or it will contain a pay-to-pubkey script. The latter is usually used for simplicity, but using a data transaction for this allows for some advanced use cases (such as a separate block signing machine) without needlessly cluttering the UTXO set.

Finally, any transactions from the mempool are added to the block. The only thing left to do now is to create a signature, proving that we have approved the otherwise valid PoS block. The signature must use the public key that is encoded (either as pay-pubkey script, or as a data OP_RETURN script) in the second vout of the staking transaction. The actual data signed in the block hash. After the signature is applied, the block can be broadcast to the network. Nodes in the network will then validate the block and if it finds it valid and there is no better blockchain then it will accept it into it's own blockchain and broadcast the block to all the nodes it has connection to.

Now we have a fully functional and secure PoSv3 blockchain. PoSv3 is what we determined to be most resistant to attack while maintaining a pure decentralized consensus system (ie, without master nodes or currators). To understand why we approached this conclusion however, we must understand it's history.

PoSv3's History

Proof of Stake has a fairly long history. I won't cover every detail, but cover broadly what was changed between each version to arrive at PoSv3 for historical purposes:

PoSv1 - This version is implemented in Peercoin. It relied heavily on the notion of "coin age", or how long a UTXO has not been spent on the blockchain. It's implementation would basically make it so that the higher the coin age, the more the difficulty is reduced. This had the bad side-effect however of encouraging people to only open their wallet every month or longer for staking. Assuming the coins were all relatively old, they would almost instantaneously produce new staking blocks. This however makes double-spend attacks extremely easy to execute. Peercoin itself is not affected by this because it is a hybrid PoW and PoS blockchain, so the PoW blocks mitigated this effect.

PoSv2 - This version removes coin age completely from consensus, as well as using a completely different stake modifier mechanism from v1. The number of changes are too numerous to list here. All of this was done to remove coin age from consensus and make it a safe consensus mechanism without requiring a PoW/PoS hybrid blockchain to mitigate various attacks.

PoSv3 - PoSv3 is really more of an incremental improvement over PoSv2. In PoSv2 the stake modifier also included the previous block time. This was removed to prevent a "short-range" attack where it was possible to iteratively mine an alternative blockchain by iterating through previous block times. PoSv2 used block and transaction times to determine the age of a UTXO; this is not the same as coin age, but rather is the "minimum confirmations required" before a UTXO can be used for staking. This was changed to a much simpler mechanism where the age of a UTXO is determined by it's depth in the blockchain. This thus doesn't incentivize inaccurate timestamps to be used on the blockchain, and is also more immune to "timewarp" attacks. PoSv3 also added support for OP_RETURN coinstake transactions which allows for a vout to contain the public key for signing the block without requiring a full pay-to-pubkey script.

References:

  1. https://peercoin.net/assets/paper/peercoin-paper.pdf
  2. https://blackcoin.co/blackcoin-pos-protocol-v2-whitepaper.pdf
  3. https://www.reddcoin.com/papers/PoSV.pdf
  4. https://blog.ethereum.org/2015/08/01/introducing-casper-friendly-ghost/
  5. https://github.com/JohnDolittle/blackcoin-old/blob/master/src/kernel.h#L11
  6. https://github.com/JohnDolittle/blackcoin-old/blob/master/src/main.cpp#L2032
  7. https://github.com/JohnDolittle/blackcoin-old/blob/master/src/main.h#L279
  8. http://earlz.net/view/2017/07/27/1820/what-is-a-utxo-and-how-does-it
  9. https://en.bitcoin.it/wiki/Script#Obsolete_pay-to-pubkey_transaction
  10. https://en.bitcoin.it/wiki/Script#Standard_Transaction_to_Bitcoin_address_.28pay-to-pubkey-hash.29
  11. https://en.bitcoin.it/wiki/Script#Provably_Unspendable.2FPrunable_Outputs
Posted: 7/27/2017 7:04:13 PM

What is a UTXO, and how does it work for a blockchain ledger?

Today I'd like to introduce the basics of how a blockchain works, and how it keeps track of money in a secure manner. I will be covering the UTXO model, as it is used by Bitcoin and Qtum. There is another way of managing funds on the blockchain called the account model, but it will not be covered here.

First I'd like to give some definitions in case you do not know anything about Bitcoin.

  • One-way hash (or just "hash") - A cryptographic algorithm which converts an arbtritrary amount of data into a fixed-length "digest". The algorithm does this in a way that given just the digest it is impossible to determine what the input data was, and furthermore it is impossible to predict what the digest is from the given input data. The most common example is SHA256 which is used extensively in Bitcoin, but there are many others including SHA3, RIPEMD160, scrypt, and many others.
  • Public key cryptography - A cryptographic mechanism by which a "private" secret key can be converted into a "public" key and used to prove ownership of the private key without giving away the secret. Additionally it is possible to encrypt data using the public key so that only the person holding the private key can decrypt it. In Bitcoin this is commonly used to sign transactions. It is possible to prove that the creator of the transaction owns the secret private key by using only the signature data and the public key.
  • Merkle root - A tree data structure that uses one-way hashing to hold multiple pieces of data making it so that any data in the input of the tree can not be modified without changing the final value of the merkle root hash.
  • UTXO - Unspent Transaction Output, an unspent vout from a transaction
  • Block - The smallest verifiable and unforgeable unit on the blockchain. It contains various data to prove it's consensus as well as transactions

So, let's talk about how transactions work in this. Transactions in Bitcoin resemble a cashier's check in some ways. When you want to spend an "output" of a transaction you must spend the entire thing. It's similar to how you can't walk into the bank and say "I want to cash half of this check". However, in this model there is no equivalent of cash or bank accounts. So in order to send money anywhere you must "cash" a check written out to you, and "output" from that cashing process a check to your intended destination, and another check back to yourself.

This "cashing process" is actually a transaction in Bitcoin. In a transaction you spend 1 or more "checks" (actually known as UTXOs) and create 1 or more UTXOs to new destinations from those spent funds. The UTXOs you spend in a transaction are called "vins", and the new UTXOs you create are called "vouts". Once a UTXO is spent by a transaction it can be considered gone and destroyed. You can see it's history in the blockchain, but there is nothing that can done with it.

So, one problem in our system so far is that checks are normally written out to names, such as "Jordan Earls". Anyone of course can say they are any name on the internet. This is where we introduce public key cryptography and programming into UTXOs. In Bitcoin UTXOs contain a script, or a computer program, which are only spendable if you can make that script end by saying "True". Let's look at the most simple script possible that does something useful:

[pubKey] OP_CHECKSIG

This is referred to as a "pay-to-pubkey" script. It was the first standard Bitcoin transaction type. The first item is [pubKey]. This is the data for a public key. Remember that for each public key there is a private key which is kept secret by it's owner. It is safe to publish the public key, but not the private key. The Bitcoin "Script" language is stack based. So imagine you have a stack of papers. You write the public key on a piece of paper and then place it on the stack. The next piece of this script is OP_CHECKSIG. This specific operation will take 2 things off of the top of the stack. The first thing it takes off is the public key. Then, the second thing it takes off is the cryptographic signature.

This is confusing now though. OP_CHECKSIG takes 2 values from the stack (also known as arguments), but our script appears to only have 1 value, pubKey. This is where the vin portion becomes important. You can imagine the vout script as the "pay to" field on a check, and the vin script as the place you sign on the back, proving that you are indeed the intended party from the "pay to" field. In Bitcoin, a script is not executed until it is spent. And when it is spent, it first executes the vin script, and then places the resulting data from the vin stack on to the vout stack. So in actual execution, the script might look rather like:

[signature from vin] [pubKey] OP_CHECKSIG

One could consider the vout script as a challenge, and the vin as the answer to give the vout to satisfy it. Anyway, now that we have a vin providing the signature and attempting to spend these funds, we can actually execute the script. If the signature and public key is valid, then OP_CHECKSIG will push "true" on the stack, resulting in the UTXO being succesfully spent.

So in a transaction, each vin specifies a previous UTXO, and provides an answer that causes the UTXO's script to return "true". If an invalid signature or similar is used, then the scripts will return "false" and the transaction will not be valid. It is not possible to partially spend a UTXO. It must be completely spent or left untouched. This means that if you have a UTXO worth 10 tokens, and you want to send 7 tokens to Bob, then you must make a transaction that spends this 10 token UTXO, and creates 2 outputs. One output to Bob (using his public key), and one output back to yourself (ensuring that you can provide an "answer" to the vout to spend it successfully). This second output back to yourself is called a "change address".

Finally, we have a reasonable way of exchanging tokens using transactions and scripts. However, we face a problem. When someone sends you a transaction output, how can you be sure that their vins for that transaction only use unspent outputs. This is where the concept of the blockchain becomes important.

A block in Bitcoin has a header. The header contains the following:

  • Version
  • Previous block header hash
  • Merkle root hash of all transactions in the block
  • Time of creation
  • Difficulty
  • Nonce

The body of the block is complete transactions (and eventually witnesses as well, but that's another topic).

Because each block includes a reference to the previous block, it is impossible to modify a previous block sereptitiously. To modify a previous block would change the block hash, and thus break the "chain" made of block hashes.

Bitcoin uses the Proof of Work (PoW) consensus system. This will be explained more in a later article, but basically it is a system which requires participants (miners) in the block creation process to put in a certain amount of computational work to solve a difficult puzzle. The first miner to solve the puzzle gets a reward and their created block is added to the network's blockchain. How much work must be done is controlled by the "difficulty" specified in the block.

In PoW, only the block header is actually used for the consensus mechanism. The merkle root hash ensures that despite this, it is possible to validate every transaction in the body of the block, as well as ensure that every transaction has been received.

Once a block has been created, it's transactions can be mostly considered permanent. The only way to "double spend" a UTXO is to replace the block in which the spending transaction took place. This can happen naturally in some cases (known as orphan blocks), but as more blocks are built on top of the transaction containing block, the likelyhood of this becomes exponentially less likely, and furthermore, would require exponentially more work to maliciously attack and replace.

This is why many services that accept Bitcoin wait for 3 or 6 confirmations (blocks placed on top of the transaction containing block). It becomes incredibly unlikely that the blockchain could be broken and those funds spent by another transaction.

We have only one remaining problem. Where do the tokens initially come from? They come from the mining process. As part of mining, the miner adds a special transaction called a "coinbase" transaction. This transaction has no inputs, and is allowed to have outputs worth a set amount (currently 12 in Bitcoin). This coinbase transaction is where all of the tokens in circulation actually come from. Without tokens there would be no transactions to create, and thus nothing to be done.

Now we have a functioning blockchain that is capable of holding it's value securely, ensuring that double spends are extremely difficult to execute (and increasing in difficulty with more confirmations). You should now know enough to understand how Bitcoin, Qtum, and other UTXO cryptocurrencies really work at the protocol level and can begin to look into more advanced topics on the blockchain.

References:

  1. https://en.bitcoin.it/wiki/Script#Obsolete_pay-to-pubkey_transaction
Posted: 7/27/2017 6:20:44 PM