Economics of Fees and Gas

So, I'm terrible at preparing for speeches, so I'm trying something new; I'll write about it before I speak about it.

Gas in Ethereum (and Qtum) is basically the cost of an operation in a smart contract. Different operations have different costs. Some are very cheap, some are not so cheap. This mechanism basically functions as a way of discouraging certain behavior, and also serves to make spam and attacks on the blockchain more expensive. In a Turing-Complete computer, this notion of gas is an absolute requirement. Otherwise, someone could simply write an infinite loop and watch the blockchain come to a halt.

Today I'll discuss specifically what kind of things the gas model encourages in certain conditions, and some of the unintended consequences of that.

Storage costs

So, permanent storage is by and far the most expensive resource on the Ethereum blockchain. It is completely impractical to store any significant amount of data on the blockchain. In current network conditions, 1Kb of data is ~$2.50, depending on how fast you want it to confirm. This means 1Mb is over $2,000. So, it's impractical to store any significant data on the blockchain. However, this is for traditional storage. There are unconventional alternatives to the framework Ethereum lays out. I'll present a total of 5 methods of storage, with various behaviors, costs, and benefits:

External Code

This is basically where you create a new contract account using the CREATE opcode in Ethereum. You store whatever data you need to as the contract bytecode of a newly created contract. This is probably the most wasteful way to store data on the blockchain, as once you store it, there is very little incentive to include a way to destroy that data after it is no longer needed.

This is a semi-volatile storage method. Once the data is stored, it can not be updated without creating it from new. It is best for large blocks of data which do not need to be updated at a later point, or which only need to be updated very seldomly. It is extremely cheap to load this data once written, beating conventional storage by an order of magnitude.

Internal Code

This is an obvious choice for some use cases and used internally in Solidity generated bytecode. Basically if you have some data you can generate in your constructor and cache, it is typically much cheaper to calculate it, and then store it directly in the deployed contract bytecode, than it is to include large amounts of data in the create transaction itself. Assuming the data does not need to ever be updated, this is the best method to use, as it is also the cheapest way to load data.

Storage

This is the conventional storage model provided by Ethereum. It is granular, allowing each individual 32-byte word to be updated in storage, with reduced costs to update than compared to writing to new storage. There is also some incentive to keep this area clean and tidy, as cleaning up storage actually can provide the contract a gas stipend. This area is easy to update, easy to read, but in most cases the most expensive method to access and write to.

Transaction

This is a fully volatile data storage method, though it actually stores no persistent data directly accessible to the contract. Instead of storing data within the EVM itself, the data is provided to the contract by sending contract data. The contract then hashes the data to ensure that it is correct and matches the expected contract state. This method has some drawbacks, the biggest being that transaction data is extremely expensive. It is by far the most expensive to "access", which requires putting the data in a transaction that calls the contract. However, it is also the cheapest (other than internal code) to actually "write" to and update, because it is not actually stored within the blockchain itself. This method could be extended to use a hash-tree like structure so that the data provided to the contract is very granular and thus less wasteful of precious transaction space. However, this method also comes with a major drawback. Data races must be accounted for. If another user sends a call to the contract that causes the data to be updated, then it invalidates the data in any pending transaction. This along with the greatly expensive cost of data access severely limits the potential use cases of this method.

So, how do these methods stack up for overall cost?

Note that the "actual" transaction cost is shown along-side the other methods. This is basically how much it costs to actually call a contract with the set amount of data.

Data access

Data access with transaction

Data Initial Storage

Data Update Storage

These graphs clear some things up for us:

  • External code is preferrable for any data greater than 128 bytes that never needs to be updated
  • Internal code should be used anytime pure constant data is needed and is known at deployment time
  • Storage should be used for any commonly updated data
  • Transaction data is useable in some cases despite races, but if using the contract more than once, the overall cost quickly becomes very high.

Economics of Storage

On the topic of the game theory and economics behind this, is it really better for the overall blockchain to store non-volatile data in contract bytecode rather than in the storage database? I guess it depends on several factors. As more accounts are created, does it become more expensive to access account data? I don't have the answers, but I can guarantee that using this method for volatile storage is not only more expensive, but also somewhat wasteful. My graph includes the cost of deploying basically a new contract, but subtracts from the gas cost the refund given by destroying the old data contract of ~19000 gas. This does have a measurable impact on the graph on the left side of the graph, but after about 320 bytes it becomes a moot benefit other than being a good blockchain citizen.

So, what's the downside of contract storage being so expensive? The primary disadvantage is that it can encourage performing potentially cheaper computations on data that actually take more time to execute, just to prevent using storage as a caching layer.

These costs can also in some cases can encourage for data to be sent in each transaction. This is probably the most detrimental condition. The database where contract data is stored is in some ways actually a free resource of the blockchain at a protocol level. Outside of SPV(in Qtum) or "fast" (in Ethereum) syncing methods, the size of this database has absolutely no bearing on how large the actual blockchain is, nor how long it takes to process. The size of this database in the end is collapsed into a single 256-bit SHA3 hash called the stateRootHash. Arguably, the most direct resource cost to the blockchain is not this storage, but rather all data that is attached to transactions. This transaction data can not be discarded or ignored without certain security implications.

For the lightweight wallets and nodes present in Qtum, it is actually more detrimental to the overall user experience to have data stored in account bytecode data, rather than directly in the storage. It is simple to retreive and validate all storage associated with a single account. However, if this contract is accessing multiple separate accounts for data purposes, then it is a completely non-deterministic process, causing significant latency. Without downloading all contract bytecode, it is not possible to retrieve and validate this data in one-shot. So, you instead must begin execution of the contract, and then go back to the network, retrieve another account, validate it, etc, and then finally continue on with execution. For just 1 data contract like this, it may not have a large impact, but because of how cheap this data is to access, there is no clear benefit to not using multiple contracts, other than the fixed 32,000 gas flat creation fee. In fact, after a certain point, especially with semi-volatile data, it is greatly encouraged to keep this data fairly granular.

So, basically the high cost for storage is born out of the need to keep the blockchain easy to interact with for lightweight wallets. However, there is a clear path that contract developers can use to circumvent this cost, and hurt lightweight wallets much more than storing a lot of data directly in storage.

Computation Costs

Now let's talk about the less obvious expenses to the Ethereum gas model. Almost every computational opcode has some cost associated with it. These costs are significantly less than storage or a direct resource, but with how many opcodes that are required to do basic operations in the EVM, they add up quickly. One of my favorite examples to point to is the cost of comparing two strings to determine if they are equal. So, I constructed this basic test contract:

contract GasTest{
    using strings for *;
    function StringUtilsEqual(string tmp1, string tmp2) constant returns (bool){
        return StringUtils.equal(tmp1, tmp2);
    }
    function SliceEquals(string tmp1, string tmp2) constant returns (bool){
        return tmp1.toSlice().equals(tmp2.toSlice());
    }
    function NaiveEquals(string tmp1, string tmp2) constant returns (bool){
        if(bytes(tmp1).length != bytes(tmp2).length) return false;
        uint256 max=bytes(tmp1).length;
        for(uint256 i=0;i<max;i++){ 
            if(bytes(tmp1)[i] != bytes(tmp2)[i]){
                return false;
            }
        }
        return true;
    }
    function Sha3Equals(string tmp1, string tmp2) constant returns (bool){
        return sha3(tmp1) == sha3(tmp2);
    }
}

This measures the gas cost of comparing strings 4 ways:

  1. Using the StringUtils library
  2. Using the StringSlice library
  3. Using a naive loop method to compare if the strings are equal
  4. Using two SHA3 operations to hash the strings, and then comparing if the hashes are equal

Before I go into the chart, the slowest method on a normal computer would be the SHA3 method. SHA3 is cheap, but even using very optimized libraries and fast machines, it hashes about 100 bytes in 0.028s, or 28ms.. Meaning that 2 hashes would take around 64ms. Meanwhile, comparing 2 strings on an (older) 64bit processor takes about 0.00183s, or 0.0183ms. Using traditional methods to compare 2 strings should be significantly cheaper if we're basing gas cost on how long an operation takes to compute and execute. So, here is the chart for the gas cost of these different methods:

stringequal gas costs

The Takeaway

So, at this point I assume it's expected for me to talk about how terrible Ethereum's gas schedule is. But no, the problem is not with the values used, it's with the entire model. And I will not pretend to know some perfect answer to this problem. It is definitely a complicated one.

If you ever look into things like the Intel Manuals, that describe the operation and timing of various opcodes in desktop x86 processors, and then go and measure the actual performance of these opcodes, the comparison between the documented timing and the actual timing varies... A LOT. This is because modern processors are extremely complicated and optimized pieces of silicon. There are multiple pipelines, cache levels, branch predictors, register cache sets, etc. Basically all of this is what allows a set of opcodes that may take 1ms, to operate in a tight loop and execute 100 executions of this in just 2ms, rather than 100ms. This complex caching, pipeline, and branch prediction structure is plainly not possible to represent in an isolated virtual machine. The gas model must work on a wide variety of processors, including processors of different architectures, and different bitness (ie, 64bit vs 32bit). So, a gas model is at best an approximation of this that is applicable to all processors.

However, despite no perfect solution being possible, there are definitely some ideas I have on improving the situation, and these are leads I will be researching in great detail while designing the gas model for the x86 VM in Qtum (coming 2018!)

Rich and Predictable Standard Library

Providing a rich standard library with set gas prices is probably one of the best ways to immediately improve the situation without a great amount of risk. This standard library has no need to actually be executed without the VM itself, though if the VM is effecient enough, there is no harm in it. The important thing is that the gas rate required to execute the function should be set by a reasonable human. This allows for things like branch prediction and caching to be measured and thus taken directly into account for the gas price. Ethereum does this for a VERY limited set of operations, such as SHA3. The base cost of SHA3 is 30 gas, and then 3 gas for each 32-byte word of data hashed. So, one could imagine a built-in StringEqual operation which might have a base gas cost of 5 gas, and then 1 gas for each byte compared.

The biggest problem with this method, is that using the Ethereum model, this requires using either pre-compiled contracts or adding new EVM opcodes. In the x86 VM I imagine being able to use native x86 code (potentially pre-compiled internally) and then somehow detecting when the VM enters this "special gas" memory area. It's definitely still an idea being figured out. But adding and changing the costs of these operations would require a fork or other network disruption. One big advantage of Qtum in this area is that we could extend the Decentralized Governance Protocol to cover this functionality, so that the base cost and rate cost of these operations could be easily adjusted. However, the way to add new functions under this specialized gas cost model is still an in-progress problem.

Cost for branches and memory, not for opcodes

One idea that may be a way forward is to charge not for opcodes, but rather for branches and memory access. The primary point behind this method is that most operations are so fast, that the timing difference is insignificant, and thus most operations could have a fixed or even zero cost. Then, the only cost would be for branches (potentially only backward branches) and memory allocations. The most expensive operations on modern processors are not explicit opcodes, but rather cache misses and wrongly predicted branches. Thus, this model aims to charge an ammortized cost for these conditions anytime they are possible. This would greatly encourage loop-unrolling, which is faster... sometimes, in certain conditions. If loop unrolling is done too much, it causes slower performance due to filling up cache, but the additional storage cost for unrolled loops would probably outweigh this pressure.

This method requires a lot of research to determine if it is at all feasible, and what edge cases need particular care. The main advantage of this method is that gas costs can be made to be very predictable when compared to current models. I'm sure this model alone is not a workable solution, but combining this concept with another gas concept can bring predictability for smart contract developers.

Including macro-operations in gas model

Because there is so much identical code generated by modern compilers (ie, they figured out the most effecient way to compile common operations) it could be beneficial to include not just raw opcodes into the gas model, but also a series of "macro-operations". These macro-operations would need to be carefully assessed and priced, with a way to predict the time cost of it's execution. The hard part is that the general execution environment is so volatile, perfectly predicting the time cost may be near impossible. However, any hole in this model is a potential DoS exploit available to attackers.

The problem with all of this

At some point, by placing more and more code in the blockchain consensus system in order to properly measure the true resource cost of an execution, you actually spend more time measuring than you do executing the code. This is one of the hardest problems of gas models in general. You want something consistent that can keep consensus across a variety of implementations, but also prevents cheap attacks, and finally is friendly to various virtual machine optimizations including pre-compilation and JIT execution. These aspects come together to really form the most difficult aspect of this entire topic. This is definitely one of the reasons why it is still very much an open research problem. The problem with turing-completeness is that it is impossible to measure the cost of a smart contract until you actually execution it. The best that can be done is to measure various pieces of code, and then you can at least know the gas price of a piece of code up until the next branch. This is actually one of the weaknesses of the x86 architecture for smart contracts. Code is data and can be mutated and overwritten. It may end up being better in the final implementation to have a mixed gas model, so that constant internal (ie, loaded like EVM code is now) bytecode has a more powerful model, but then code loaded and executed dynamically in read-write-execute memory uses a less computationally expensive (albeit less accurate/financially more expensive) gas model which can be used on-the-fly.

Conclusions

Gas is hard, there are no silver bullets, blockchain is hard.

Notes

Gas cost calculations:

memory(size) = 3 * (size/32) + (size/32) * (size/32) / 512 

# Gas costs of loading existing data in EVM (assuming memory is already allocated)
(not accounting for memory costs to store the data in)
## external code (semi-volatile)
200 + 700 + (3 * size)
sload + extcodecopy + (copy * size)
## internal code (constant)
3 * size
copy * size
## storage (mutable)
200 * size
sload * size
## Transaction (mutable)
200 + (68 * (size * 32)) + 30 + (6 * size) ## note: zero data is only 4 gas
sload + (txnonzeroByte * (size * 32)) + sha3 + (sha3word * size)


# Gas costs of storing data initially to EVM
(not accounting for memory costs data is initially stored in)
## external code (semi-volatile)
32000 + 20000 + (200 * size * 32) + memory(size) + 10
create + sstore + (createByte * (size * 32)) + memory(size) + overhead
## internal code (constant)
(200 * (size * 32))
(createByte * (size * 32))
## storage (mutable)
20000 * size
sstore * size
## Transaction (mutable)
20000 + 30 + (6 * size) ## note: zero data is only 4 gas
sstore + sha3 + (sha3word * size)

# Gas costs of updating data in EVM
(not accounting for memory costs of updated data to store)
## external code (semi-volatile)
32000 + 5000 + (200 * size * 32) + memory(size) + 10
create + sstoreReset + (createByte * (size * 32)) + memory(size) + overhead
## internal code (constant)
N/A (technically could be possible, but would greatly complicated contract design)
## storage (mutable)
5000 * size
sstoreReset * size
## Transaction (mutable)
5000 + 30 + (6 * size) ## note: zero data is only 4 gas
sstoreReset + sha3 + (sha3word * size)

Test contract bytecode including libraries etc:

6060604052341561000f57600080fd5b610f488061001e6000396000f30060606040523615610081576000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff1680633a96fdd71461008657806346bdca9a1461013a5780635d62fdf5146101f2578063674ff51b146102aa57806388ee0b2a146103625780638a0807b71461041a578063ef5a8374146104ce575b600080fd5b341561009157600080fd5b610124600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610586565b6040518082815260200191505060405180910390f35b341561014557600080fd5b6101d8600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190505061084a565b604051808215151515815260200191505060405180910390f35b34156101fd57600080fd5b610290600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610860565b604051808215151515815260200191505060405180910390f35b34156102b557600080fd5b610348600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610874565b604051808215151515815260200191505060405180910390f35b341561036d57600080fd5b610400600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f016020809104026020016040519081016040528093929190818152602001838380828437820191505050505050919050506108a1565b604051808215151515815260200191505060405180910390f35b341561042557600080fd5b6104b8600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f016020809104026020016040519081016040528093929190818152602001838380828437820191505050505050919050506109eb565b6040518082815260200191505060405180910390f35b34156104d957600080fd5b61056c600480803590602001908201803590602001908080601f0160208091040260200160405190810160405280939291908181526020018383808284378201915050505050509190803590602001908201803590602001908080601f01602080910402602001604051908101604052809392919081815260200183838082843782019150505050505091905050610d17565b604051808215151515815260200191505060405180910390f35b6000610590610eee565b610598610eee565b6000808693508592508351915081835110156105b357825191505b600090505b818110156107f65782818151811015156105ce57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916848281518110151561064957fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff191610156106e4577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610840565b82818151811015156106f257fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916848281518110151561076d57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff191611156107e95760019450610840565b80806001019150506105b8565b825184511015610828577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610840565b82518451111561083b5760019450610840565b600094505b5050505092915050565b6000806108578484610586565b14905092915050565b600061086c838361084a565b905092915050565b600061089961088283610df0565b61088b85610df0565b610e1e90919063ffffffff16565b905092915050565b6000806000835185511415156108ba57600092506109e3565b84519150600090505b818110156109de5783818151811015156108d957fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916858281518110151561095457fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19161415156109d157600092506109e3565b80806001019150506108c3565b600192505b505092915050565b60006109f5610eee565b6109fd610eee565b600080869350859250600184511080610a17575060018351105b80610a23575083518351115b15610a50577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610d0d565b6fffffffffffffffffffffffffffffffff84511115610a91577fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9450610d0d565b60009150600090505b8351811015610ce957826000815181101515610ab257fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19168482815181101515610b2d57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19161415610cdc57600191505b825182108015610bb757508351828201105b8015610cb857508282815181101515610bcc57fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff191684838301815181101515610c4957fe5b9060200101517f010000000000000000000000000000000000000000000000000000000000000090047f0100000000000000000000000000000000000000000000000000000000000000027effffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff1916145b15610cca578180600101925050610ba5565b8251821415610cdb57809450610d0d565b5b8080600101915050610a9a565b7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff94505b5050505092915050565b6000816040518082805190602001908083835b602083101515610d4f5780518252602082019150602081019050602083039250610d2a565b6001836020036101000a038019825116818451168082178552505050505050905001915050604051809103902060001916836040518082805190602001908083835b602083101515610db65780518252602082019150602081019050602083039250610d91565b6001836020036101000a03801982511681845116808217855250505050505090500191505060405180910390206000191614905092915050565b610df8610f02565b600060208301905060408051908101604052808451815260200182815250915050919050565b600080610e2b8484610e34565b14905092915050565b60008060008060008060008060008a6000015197508a600001518a600001511015610e6157896000015197505b8a60200151965089602001519550600094505b87851015610ed25786519350855192508284141515610ebb57600185896020030160080260020a03199150818316828516039050600081141515610eba57809850610ee0565b5b602087019650602086019550602085019450610e74565b89600001518b600001510398505b505050505050505092915050565b602060405190810160405280600081525090565b6040805190810160405280600081526020016000815250905600a165627a7a72305820fcb471f7522e763de8c06e6eba885f81d32bc8128e364dc46a35fd7e6c960bb50029

Raw string gas measurements:

Gas used for string "x" 1
StringUtilsEqual 23884
SliceEquals 23928
NaiveEquals 23433
Sha3Equals 23708

Gas used for string "foobar123" 9
StringUtilsEqual 28268
SliceEquals 24952
NaiveEquals 26385
Sha3Equals 24732

Gas used for string "foobarxxx124529898453ifdf" 25
StringUtilsEqual 37036
SliceEquals 27000
NaiveEquals 32289
Sha3Equals 26780

Gas used for string "jhkfdfdsfsdgfdsgdsffgfdsgrsehsfrhfrdggewrrtrewgfdsaffvdsfgdsf" 61
StringUtilsEqual 57032
SliceEquals 32006
NaiveEquals 45841
Sha3Equals 31859

Gas used for string "hjfhjdskhjglkhdsjlkghjlkdshjkglhjdsfkhjglfdsjhgjhldsfhgjlkdsfhjkghjfdsklhgjlkfdsjhgfdsgfdshgfds" 95
StringUtilsEqual 75932
SliceEquals 36756
NaiveEquals 58655
Sha3Equals 36682

Gas used for string "hgjkghjdskhgjlserijhlktjiurewihuygtiuserhjkgfhfdsjkvgfhjklfdshjghfdsjghufdshgjlkfdshjlkghfdsjlkghjlkfdshgjlkhfdseulkghfdslkjhgulkijfdshg" 136
StringUtilsEqual 98936
SliceEquals 42801
NaiveEquals 74320
Sha3Equals 42872
Posted: 10/2/2017 3:50:00 PM