Ethereum in Depth, Part 1

 

By Facu Spagnuolo

Welcome to the first part of a two-post series aimed at software developers looking to understand how the EVM works. The idea is to explain and describe in detail the core behavior of the EVM. We will see how contracts are created, how message calls work, and take a look at everything related to data management, such as storage, memory, calldata, and the stack.

To better understand this article, you should be familiar with the basics of the EVM. If you are not, I highly recommend reading these posts first.

Throughout this post we will illustrate some examples and demonstrations using sample contracts you can find in this repository. Please clone it, run npm install, and check it out before beginning.

Enjoy, and please do not hesitate to reach out with questions, suggestions or feedback.

Ethereum Contracts

Basics

Smart contracts are just computer programs, and we can say that Ethereum contracts are smart contracts that run on the Ethereum Virtual Machine. The EVM is the sandboxed runtime and a completely isolated environment for smart contracts in Ethereum. This means that every smart contract running inside the EVM has no access to the network, file system, or other processes running on the computer hosting the VM.

As we already know, there are two kinds of accounts: contracts and external accounts. Every account is identified by an address, and all accounts share the same address space. The EVM handles addresses of 160-bit length. Every account consists of a balance, a nonce, bytecode, and stored data (storage). However, there are some differences between these two kinds of accounts. For instance, the code and storage of external accounts are empty, while contract accounts store their bytecode and the Merkle root hash of their own storage tree. Moreover, while external addresses have a corresponding private key, contract accounts don’t. An external account is controlled by whoever holds its private key and signs transactions with it, whereas the actions of a contract account are controlled by the code it hosts.

Creation

The creation of a contract is simply a transaction in which the receiver address is empty and the data field contains the compiled bytecode of the contract to be created. Contracts can create contracts too, as we will see below. Let’s look at a quick example. Please open the directory of exercise 1; in it you will find a contract called MyContract with the following code:

pragma solidity ^0.4.21;
contract MyContract {
    event Log(address addr);

    function MyContract() public {
        emit Log(this);
    }

    function add(uint256 a, uint256 b) public pure returns(uint256) {
        return a + b;
    }
}

Let’s open a truffle console in develop mode by running truffle develop. Once inside, run the following commands to deploy an instance of MyContract:

truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: MyContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)

We can check that our contract has been deployed successfully by running the following code:

truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> myContract = new MyContract(receipt.contractAddress)
truffle(develop)> myContract.add(10, 2)
{ [String: '12'] s: 1, e: 1, c: [ 12 ] }

Let’s go deeper to analyze what we just did. The first thing that happens when a new contract is deployed to the Ethereum blockchain is that its account is created.¹ This is why we were able to log the address of the contract from the constructor in the example above. You can confirm this by checking that receipt.logs[0].data is the address of the contract padded to 32 bytes, and that receipt.logs[0].topics[0] is the keccak-256 hash of the string “Log(address)”.
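
The padding check can be sketched in plain JavaScript. The helper below is hypothetical (it is not part of web3 or Truffle) and the sample address is made up; it only illustrates how a 20-byte address is left-padded with zeros to fill a 32-byte word, which is the form it takes in the log data.

```javascript
// Hypothetical helper (not a web3 API): left-pad a 20-byte address to a 32-byte word.
function padAddressTo32Bytes(address) {
  const hex = address.toLowerCase().replace(/^0x/, '');
  if (!/^[0-9a-f]{40}$/.test(hex)) {
    throw new Error('expected a 20-byte hex address');
  }
  // 32 bytes are 64 hex characters, so 24 zero characters go in front.
  return '0x' + hex.padStart(64, '0');
}

// Made-up address (20 bytes of 0xab), for illustration only.
const sampleAddress = '0x' + 'ab'.repeat(20);
const paddedAddress = padAddressTo32Bytes(sampleAddress);
console.log(paddedAddress);
```

In the console session above, comparing padAddressTo32Bytes(receipt.contractAddress) with receipt.logs[0].data should give true, assuming both values are lowercase hex strings.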

As the next step, the data sent in with the transaction is executed as bytecode. This will initialize the state variables in storage, and determine the body of the contract being created. This process is executed only once during the lifecycle of a contract. The initialization code is not what is stored in the contract; it actually produces as its return value the bytecode to be stored. Bear in mind that after a contract account has been created, there is no way to change its code.²

Given the fact that the initialization process returns the code of the contract’s body to be stored, it makes sense that this code isn’t reachable from the constructor logic. For example, let’s take a look at the Impossible contract of exercise 1:

contract Impossible {
    function Impossible() public {
        this.test();
    }

    function test() public pure returns(uint256) {
        return 2;
    }
}

If you try to compile this contract, you will get a warning saying you’re referencing this within the constructor function, but it will compile. However, if you try to deploy a new instance, it will revert. This is because it makes no sense to attempt to run code that is not stored yet.³ On the other hand, we were able to access the address of the contract: the account exists, but it doesn’t have any code yet.
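
The order of events can be illustrated with a toy model in JavaScript. This is only a sketch, not how the EVM is implemented: every name in it is hypothetical, and "code" is represented by a plain object of functions. It shows the account existing before its code is stored, and an external call to an account without code failing, which mirrors the check described in footnote 3.

```javascript
// Toy model of contract creation; NOT the real EVM, every name here is hypothetical.
const accounts = new Map();

// An external call only succeeds when the target account already has code.
function externalCall(address, fnName) {
  const account = accounts.get(address);
  if (!account || account.code === null) {
    throw new Error('revert: no code at ' + address);
  }
  return account.code[fnName]();
}

// Creating a contract: (1) create the account, (2) run the init code,
// (3) store whatever the init code returns as the contract's code.
function create(address, initCode) {
  accounts.set(address, { code: null, storage: {} });
  try {
    const runtimeCode = initCode(address);
    accounts.get(address).code = runtimeCode;
    return true;
  } catch (err) {
    accounts.delete(address); // a failed creation leaves no account behind
    return false;
  }
}

const runtime = { test: () => 2 };

// Like MyContract: the constructor can see its own address but calls nothing.
const okCreated = create('0xA', (self) => runtime);

// Like Impossible: the constructor calls this.test() before any code is stored.
const impossibleCreated = create('0xB', (self) => {
  externalCall(self, 'test');
  return runtime;
});

console.log(okCreated, impossibleCreated); // true false
```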

However, the execution of the initialization code can produce other effects, such as altering the storage, creating further accounts, or making further message calls. For example, let’s take a look at the AnotherContract code:

contract AnotherContract {
    MyContract public myContract;

    function AnotherContract() public {
        myContract = new MyContract();
    }
}

Let’s see how it works by running the following commands inside a truffle console:

truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: AnotherContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)
truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> anotherContract = AnotherContract.at(receipt.contractAddress)
truffle(develop)> anotherContract.myContract().then(a => myContractAddress = a)
truffle(develop)> myContract = MyContract.at(myContractAddress)
truffle(develop)> myContract.add(10, 2)
{ [String: '12'] s: 1, e: 1, c: [ 12 ] }

Additionally, contracts can be created using the CREATE opcode, which is what the Solidity new construct compiles down to. Both alternatives work the same way. Let’s continue exploring how message calls work.

Message Calls

Contracts can call other contracts through message calls. Every time a Solidity contract calls a function of another contract, it does so by producing a message call. Every call has a sender, a recipient, a payload, a value, and an amount of gas. Solidity provides a native call method for the address type that works as follows:

address.call.gas(gas).value(value)(data)

gas is the amount of gas to be forwarded, address is the address to be called, value is the amount of Ether to be transferred in wei, and data is the payload to be sent. Bear in mind that value and gas are optional parameters here, but be careful because almost all the remaining gas of the sender will be sent by default in a low-level call.

As you can see, each contract can decide the amount of gas to be forwarded in a call. Given that every call can end in an out-of-gas (OOG) exception, the caller always keeps at least 1/64th of its remaining gas to avoid security issues. This allows a caller to handle an out-of-gas error in an inner call and finish its own execution, instead of running out of gas itself and bubbling the exception up.

Let’s take a look at the Caller contract of exercise 2:

contract Implementation {
    event ImplementationLog(uint256 gas);

    function () public payable {
        emit ImplementationLog(gasleft());
        assert(false);
    }
}
contract Caller {
    event CallerLog(uint256 gas);
    Implementation public implementation;

    function Caller() public {
        implementation = new Implementation();
    }

    function () public payable {
        emit CallerLog(gasleft());
        implementation.call.gas(gasleft()).value(msg.value)(msg.data);
        emit CallerLog(gasleft());
    }
}

The Caller contract has only a fallback function that redirects every received call to an instance of Implementation. This instance simply throws through an assert(false) on every received call, which will consume all the gas given. Then, the idea here is to log the amount of gas in Caller before and right after forwarding a call to Implementation. Let’s open a truffle console and see what happens:

truffle(develop)> compile
truffle(develop)> Caller.new().then(i => caller = i)
truffle(develop)> opts = { gas: 4600000 }
truffle(develop)> caller.sendTransaction(opts).then(r => result = r)
truffle(develop)> logs = result.receipt.logs
truffle(develop)> parseInt(logs[0].data)
// 4578955
truffle(develop)> parseInt(logs[1].data)
// 71495

As you can see, 71495 is approximately 1/64th of 4578955. This example clearly demonstrates that we can handle an OOG exception from an inner call.
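
The arithmetic behind this can be sketched as follows. The function names are hypothetical, and the sketch assumes the retained amount is the integer part of one 64th of the gas available at the time of the call.

```javascript
// Sketch of the "all but one 64th" rule; function names are hypothetical.
function retainedGas(available) {
  return Math.floor(available / 64);
}

function maxForwardedGas(available) {
  return available - retainedGas(available);
}

// Value logged by Caller before the call in the session above.
const gasBeforeCall = 4578955;
const gasRetained = retainedGas(gasBeforeCall);
const gasForwarded = maxForwardedGas(gasBeforeCall);
console.log(gasRetained, gasForwarded); // 71546 4507409
```

The second logged value (71495) is slightly lower than 71546, presumably because Caller spends some gas on its own instructions between the two measurements.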

Solidity also provides the following opcode, allowing us to manage calls with inline assembly:

call(g, a, v, in, insize, out, outsize)

Here, g is the amount of gas to be forwarded, a is the address to be called, v is the amount of Ether to be transferred in wei, in and insize state the memory position and the length in bytes of the call data, and out and outsize state where the return data will be stored in memory. The only difference is that an assembly call allows us to handle return data, while the address.call function only returns a boolean telling us whether the call succeeded or not.
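
To make in and insize more concrete: for a call to a Solidity function, the call data is the 4-byte function selector followed by the arguments, each unsigned integer taking one 32-byte word. The sketch below builds such a payload in plain JavaScript. The helper is hypothetical, it only handles unsigned integers, and the selector is a placeholder rather than the real selector of any function in this post.

```javascript
// Hypothetical helper: build call data as a 4-byte selector followed by
// 32-byte big-endian words, one per unsigned integer argument.
function encodeCallData(selector, args) {
  const sel = selector.replace(/^0x/, '');
  if (!/^[0-9a-fA-F]{8}$/.test(sel)) {
    throw new Error('selector must be 4 bytes');
  }
  const words = args.map((arg) => BigInt(arg).toString(16).padStart(64, '0'));
  return '0x' + sel.toLowerCase() + words.join('');
}

// Placeholder selector, NOT the real selector of any function in this post.
const callData = encodeCallData('0x12345678', [10, 2]);
const callDataSize = (callData.length - 2) / 2; // bytes the call would read (insize)
console.log(callData);
console.log(callDataSize); // 68
```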

The EVM supports a special variant of a message call called delegatecall. Once again, Solidity provides a built-in address method in addition to an inline assembly version of it. The difference with a low-level call is that the target code is executed within the context of the calling contract, and msg.sender and msg.value do not change.⁴

Let’s analyze the following example to understand better how a delegatecall works. Let’s start with the Greeter contract:

contract Greeter {
    event Thanks(address sender, uint256 value);

    function thanks() public payable {
        emit Thanks(msg.sender, msg.value);
    }
}

As you can see, the Greeter contract simply declares a thanks function that emits an event carrying the msg.value and msg.sender data. We can try this method by running the following lines in a truffle console:

truffle(develop)> compile
truffle(develop)> someone = web3.eth.accounts[0]
truffle(develop)> ETH_2 = new web3.BigNumber('2e18')
truffle(develop)> Greeter.new().then(i => greeter = i)
truffle(develop)> opts = { from: someone, value: ETH_2 }
truffle(develop)> greeter.thanks(opts).then(tx => log = tx.logs[0])
truffle(develop)> log.event
//Thanks
truffle(develop)> log.args.sender === someone
//true
truffle(develop)> log.args.value.eq(ETH_2)
//true

Now that we have confirmed its functionality, let’s pay attention to the Wallet contract:

contract Wallet {
    Greeter internal greeter;

    function Wallet() public {
        greeter = new Greeter();
    }

    function () public payable {
        bytes4 methodId = Greeter(0).thanks.selector;
        require(greeter.delegatecall(methodId));
    }
}

This contract only defines a fallback function that executes the Greeter#thanks method through a delegatecall. Let’s see what happens when we call Greeter#thanks through the Wallet contract:

truffle(develop)> Wallet.new().then(i => wallet = i)
truffle(develop)> wallet.sendTransaction(opts).then(r => tx = r)
truffle(develop)> logs = tx.receipt.logs
truffle(develop)> SolidityEvent = require('web3/lib/web3/event.js')
truffle(develop)> Thanks = Object.values(Greeter.events)[0]
truffle(develop)> event = new SolidityEvent(null, Thanks, 0)
truffle(develop)> log = event.decode(logs[0])
truffle(develop)> log.event
// Thanks
truffle(develop)> log.args.sender === someone
// true
truffle(develop)> log.args.value.eq(ETH_2)
// true

As you may have noticed, we have just confirmed that the delegatecall function preserves the msg.value and msg.sender.
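
The difference in context can be summed up with a toy model in JavaScript. It is only a sketch with hypothetical names, not the EVM's actual machinery: a "frame" records what the executing code sees as msg.sender, msg.value, and its own address.

```javascript
// Toy model of call contexts; NOT the real EVM, all names are hypothetical.
function regularCall(frame, target, value) {
  // A regular call opens a fresh context: the caller becomes msg.sender.
  return { sender: frame.self, value: value, self: target, codeFrom: target };
}

function delegateCall(frame, target) {
  // A delegatecall keeps the current context; only the code comes from target.
  return { sender: frame.sender, value: frame.value, self: frame.self, codeFrom: target };
}

// `someone` sends 2 ether (expressed in wei) to the wallet.
const walletFrame = { sender: 'someone', value: 2n * 10n ** 18n, self: 'wallet' };

const viaCall = regularCall(walletFrame, 'greeter', 0n);
const viaDelegateCall = delegateCall(walletFrame, 'greeter');

console.log(viaCall.sender, viaDelegateCall.sender); // wallet someone
```

With a regular call, Greeter would have seen the Wallet as the sender; with a delegatecall it sees the original sender and value, which matches the logs decoded above.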

“This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address. This makes it possible to implement the ‘library’ feature in Solidity.” ⁵

There is one more thing we should explore about delegatecalls. As mentioned above, the storage of the calling contract is the one being accessed by the executed code. Let’s see the Calculator contract:

contract ResultStorage {
    uint256 public result;
}
contract Calculator is ResultStorage {
    Product internal product;
    Addition internal addition;

    function Calculator() public {
        product = new Product();
        addition = new Addition();
    }

    function add(uint256 x) public {
        bytes4 methodId = Addition(0).calculate.selector;
        require(addition.delegatecall(methodId, x));
    }

    function mul(uint256 x) public {
        bytes4 methodId = Product(0).calculate.selector;
        require(product.delegatecall(methodId, x));
    }
}
contract Addition is ResultStorage {
    function calculate(uint256 x) public returns(uint256) {
        uint256 temp = result + x;
        assert(temp >= result);
        result = temp;
        return result;
    }
}
contract Product is ResultStorage {
    function calculate(uint256 x) public returns(uint256) {
        if (x == 0) result = 0;
        else {
            uint256 temp = result * x;
            assert(temp / x == result);
            result = temp;
        }
        return result;
    }
}

The Calculator contract has just two functions: add and mul. It doesn’t know how to add or multiply by itself; instead, it delegates those calls to the Addition and Product contracts respectively. However, all these contracts inherit from ResultStorage, so they all use the same state variable result to store the result of each calculation. Let’s see how this works:

truffle(develop)> Calculator.new().then(i => calculator = i)
truffle(develop)> calculator.addition().then(a => additionAddress=a)
truffle(develop)> addition = Addition.at(additionAddress)
truffle(develop)> calculator.product().then(a => productAddress = a)
truffle(develop)> product = Product.at(productAddress)
truffle(develop)> calculator.add(5)
truffle(develop)> calculator.result().then(r => r.toString())
// 5
truffle(develop)> addition.result().then(r => r.toString())
// 0
truffle(develop)> product.result().then(r => r.toString())
// 0
truffle(develop)> calculator.mul(2)
truffle(develop)> calculator.result().then(r => r.toString())
// 10
truffle(develop)> addition.result().then(r => r.toString())
// 0
truffle(develop)> product.result().then(r => r.toString())
// 0

We have just confirmed that we are using the storage of the Calculator contract, while the code being executed is stored in the Addition and Product contracts.
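
The same session can be mimicked with a toy model in JavaScript. This is a simplified sketch with hypothetical names, not the real EVM: each account has its own storage object, "code" is a function that receives the storage it should operate on, and the overflow checks of the Solidity version are left out.

```javascript
// Toy model; NOT the real EVM, all names are hypothetical.
const additionCode = (storage, x) => { storage.result += x; return storage.result; };
const productCode = (storage, x) => { storage.result *= x; return storage.result; };

const calculatorAccount = { storage: { result: 0 } };
const additionAccount = { storage: { result: 0 }, code: additionCode };
const productAccount = { storage: { result: 0 }, code: productCode };

// delegatecall: run the target's code against the CALLER's storage.
function runWithCallerStorage(caller, target, x) {
  return target.code(caller.storage, x);
}

runWithCallerStorage(calculatorAccount, additionAccount, 5); // calculator.add(5)
runWithCallerStorage(calculatorAccount, productAccount, 2);  // calculator.mul(2)

console.log(calculatorAccount.storage.result); // 10
console.log(additionAccount.storage.result, productAccount.storage.result); // 0 0
```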

Additionally, as for the call function, there is a Solidity assembly opcode version for delegatecall. Let’s take a look at the Delegator contract to see how we can use it:

contract Implementation {
    event ImplementationLog(uint256 gas);

    function () public payable {
        emit ImplementationLog(gasleft());
        assert(false);
    }
}
contract Delegator {
    event DelegatorLog(uint256 gas);
    Implementation public implementation;

    function Delegator() public {
        implementation = new Implementation();
    }

    function () public payable {
        emit DelegatorLog(gasleft());
        address _impl = implementation;
        assembly {
            let ptr := mload(0x40)
            calldatacopy(ptr, 0, calldatasize)
            let result := delegatecall(gas, _impl, ptr, calldatasize, 0, 0)
        }

        emit DelegatorLog(gasleft());
    }
}

This time we are using inline assembly to execute the delegatecall. As you may have noticed, there is no value argument here, since msg.value will not change. You may be wondering why we are loading the 0x40 address, or what calldatacopy and calldatasize are. Don’t panic: we will describe them in the next post of the series. In the meantime, feel free to run the same commands in a truffle console to validate its behavior.

Once again, it’s important to understand clearly how a delegatecall works. Every triggered call will be sent from the current contract and not the delegate-called contract. Additionally, the executed code can read and write to the storage of the caller contract. You can view this as a cliffhanger for the next post, in which storage will be described in more detail.

Thank you for reading this post! We have learned more in-depth details about how smart contracts work in the Ethereum world. Please remember that questions, feedback, and suggestions are welcome! If you liked the post, stay tuned for the second part focused on data management!

¹ “The address of the new account is defined as being the rightmost 160 bits of the Keccak hash of the RLP encoding of the structure containing only the sender and the account nonce.” (Ethereum Yellow Paper)

² One of the pillars of zeppelin_os is contract upgradeability. At Zeppelin, we’ve been exploring different strategies to implement this. You can read more about this here.

³ Solidity will check before calling an external function that the address has bytecode in it, and otherwise revert.

⁴ This is the main strategy being used these days to build upgradeable contracts.

⁵ Solidity 0.4.23 official documentation.