Ethereum in Depth, Part 2

| July 24, 2018

Security Insights

Thank you for your interest in this post! We’re undergoing a rebranding process, so please excuse us if some names are out of date. Also have in mind that this post might not reference the latest version of our products. For up-to-date guides, please check our documentation site.

By Facu Spagnuolo

Welcome to the second part of this guide. If you haven’t read part 1, I highly recommend it to better understand this post. This second article will explain everything about data management. We will see how memory, storage, calldata and stack data is manipulated
To better understand this article, you should be familiar with the basics of the EVM. If you are not, I highly recommend reading these posts first.
Throughout this post we will illustrate some examples and demonstrations using sample contracts you can find in this repository. Please clone it, run npm install, and check it out before beginning.
Enjoy, and please do not hesitate to reach out with questions, suggestions or feedback.

Data Management

The EVM manages different kinds of data depending on their context, and it does that in different ways. We can distinguish at least four main types of data: stack, calldata, memory, and storage, besides the contract code. Let’s analyze each of these:

Stack

The EVM is a stack machine, meaning that it doesn’t operate on registers but on a virtual stack. The stack has a maximum size of 1024. Stack items have a size of 256 bits; in fact, the EVM is a 256-bit word machine (this facilitates Keccak256 hash scheme and elliptic-curve computations). Here is where most opcodes consume their parameters from.
The EVM provides many opcodes to modify the stack directly. Some of these include:

POP removes item from the stack.
PUSH_n_ places the following n bytes item in the stack, with n from 1 to 32.
DUP_n_ duplicates the n_th stack item, with _n from 1 to 32.
SWAP_n_ exchanges the 1st and n_th stack item, with _n from 1 to 32.

Calldata

The calldata is a read-only byte-addressable space where the data parameter of a transaction or call is held. Unlike the stack, to use this data you have to specify an exact byte offset and number of bytes you want to read.
The opcodes provided by the EVM to operate with the calldata include:

CALLDATASIZE tells the size of the transaction data.
CALLDATALOAD loads 32 bytes of the transaction data onto the stack.
CALLDATACOPY copies a number of bytes of the transaction data to memory.

Solidity also provides an inline assembly version of these opcodes. These are calldatasize, calldataload and calldatacopy respectively. The last one expects three arguments (t, f, s): it will copy s bytes of calldata at position f into memory at position t. In addition, Solidity lets you access to the calldata through msg.data.
As you may have noticed, we used some of these opcodes in some examples of the previous post. Let’s take a look at the inline assembly code block of a delegatecall again:

assembly {
  let ptr := mload(0x40)
  calldatacopy(ptr, 0, calldatasize)
  let result := delegatecall(gas, _impl, ptr, calldatasize, 0, 0)
}

To delegate the call to the _impl address, we must forward msg.data. Given that the delegatecall opcode operates with data in memory, we need to copy the calldata into memory first. That’s why we use calldatacopy to copy all the calldata to a memory pointer (note that we are using calldatasize).
Let’s analyze another example using calldata. You will find a Calldata contract in the exercise 3 folder with the following code:

contract Calldata {
    function add(uint256 _a, uint256 _b) public view
    returns(uint256 result) {
        assembly {
            let a: = mload(0x40)
            let b: = add(a, 32)
            calldatacopy(a, 4, 32)
            calldatacopy(b, add(4, 32), 32)
            result: = add(mload(a), mload(b))
        }
    }
}

The idea here is to return the addition of two numbers passed by arguments. As you can see, once again we are loading a memory pointer reading from <code0x40, but please ignore that for now; we will explain it right after this example. We are storing that memory pointer in the variable a and storing in b the following position which is 32-bytes right after a. Then we use calldatacopy to store the first parameter in a. You may have noticed we are copying it from the 4th position of the calldata instead of its beginning. This is because the first 4 bytes of the calldata hold the signature of the function being called, in this case bytes4(keccak256("add(uint256,uint256)")); this is what the EVM uses to identify which function has to be executed on a call. Then, we store the second parameter in b copying the following 32 bytes of the calldata. Finally, we just need to calculate the addition of both values loading them from memory.
You can test this yourself with a truffle console running the following commands:

truffle(develop)> compile
truffle(develop)> Calldata.new().then(i => calldata = i)
truffle(develop)> calldata.add(1, 6).then(r => r.toString()) // 7

Memory

Memory is a volatile read-write byte-addressable space. It is mainly used to store data during execution, mostly for passing arguments to internal functions. Given this is volatile area, every message call starts with a cleared memory. All locations are initially defined as zero. As calldata, memory can be addressed at byte level, but can only read 32-byte words at a time.
Memory is said to “expand” when we write to a word in it that was not previously used. Additionally to the cost of the write itself, there is a cost to this expansion, which increases linearly for the first 724 bytes and quadratically after that.
The EVM provides three opcodes to interact with the memory area:

MLOAD loads a word from memory into the stack.
MSTORE saves a word to memory.
MSTORE8 saves a byte to memory.

Solidity also provides an inline assembly version of these opcodes.
There is another key thing we need to know about memory. Solidity always stores a free memory pointer at position 0x40, i.e. a reference to the first unused word in memory. That’s why we load this word to operate with inline assembly. Since the initial 64 bytes of memory are reserved for the EVM, this is how we can ensure that we are not overwriting memory that is used internally by Solidity. For instance, in the delegatecall example presented above, we were loading this pointer to store the given calldata to forward it. This is because the inline-assembly opcode delegatecall needs to fetch its payload from memory.
Additionally, if you pay attention to the bytecode output by the Solidity compiler, you will notice that all of them start with 0x6060604052…, which means:

PUSH1   :  EVM opcode is 0x60
0x60    :  The free memory pointer
PUSH1   :  EVM opcode is 0x60
0x40    :  Memory position for the free memory pointer
MSTORE  :  EVM opcode is 0x52

You must be very careful when operating with memory at assembly level. Otherwise, you could overwrite a reserved space.

Storage

Storage is a persistent read-write word-addressable space. This is where each contract stores its persistent information. Unlike memory, storage is a persistent area and can only be addressed by words. It is a key-value mapping of 2²⁵⁶ slots of 32 bytes each. A contract can neither read nor write to any storage apart from its own. All locations are initially defined as zero.
The amount of gas required to save data into storage is one of the highest among operations of the EVM. This cost is not always the same. Modifying a storage slot from a zero value to a non-zero one costs 20,000. While storing the same non-zero value or setting a non-zero value to zero costs 5,000. However, in the last scenario when a non-zero value is set to zero, a refund of 15,000 will be given.
The EVM provides two opcodes to operate the storage:

SLOAD loads a word from storage into the stack.
SSTORE saves a word to storage.

These opcodes are also supported by the inline assembly of Solidity.
Solidity will automatically map every defined state variable of your contract to a slot in storage. The strategy is fairly simple — statically sized variables (everything except mappings and dynamic arrays) are laid out contiguously in storage starting from position 0.
For dynamic arrays, this slot (p) stores the length of the array and its data will be located at the slot number that results from hashing p (keccak256(p)). For mappings, this slot is unused and the value corresponding to a key k will be located at keccak256(k,p). Bear in mind that the parameters of keccak256 (k and p) are always padded to 32 bytes.
Let’s take a look at a code example to understand how this works. Inside the exercise 3 contracts folder you will find a Storage contract with the following code:

contract Storage {
    uint256 public number;
    address public account;
    uint256[] private array;
    mapping(uint256 => uint256) private map;

    function Storage() public {
        number = 2;
        account = this;
        array.push(10);
        array.push(100);
        map[1] = 9;
        map[2] = 10;
    }
}

Now, let’s open a truffle console to test its storage structure. First, we will compile and create a new contract instance:

truffle(develop)> compile
truffle(develop)> Storage.new().then(i => storage = i)

Then we can ensure that the address 0 holds a number 2 and the address 1 holds the address of the contract:

truffle(develop)> web3.eth.getStorageAt(storage.address, 0)  // 0x02
truffle(develop)> web3.eth.getStorageAt(storage.address, 1)  // 0x..

We can check that the storage position 2 holds the length of the array as follows:

truffle(develop)> web3.eth.getStorageAt(storage.address, 2)  // 0x02

Finally, we can check that the storage position 3 is unused and the mapping values are stored as we described above:

truffle(develop)> web3.eth.getStorageAt(storage.address, 3) // 0x00
truffle(develop)> mapIndex = ‘0000000000000000000000000000000000000000000000000000000000000003’
truffle(develop)> firstKey = ‘0000000000000000000000000000000000000000000000000000000000000001’
truffle(develop)> firstPosition = web3.sha3(firstKey + mapIndex, { encoding: ‘hex’ })
truffle(develop)> web3.eth.getStorageAt(storage.address, firstPosition) // 0x09
truffle(develop)> secondKey = ‘0000000000000000000000000000000000000000000000000000000000000002’
truffle(develop)> secondPosition = web3.sha3(secondKey + mapIndex, { encoding: ‘hex’ })
truffle(develop)> web3.eth.getStorageAt(storage.address, secondPosition)  // 0x0A

Great! We have demonstrated that the Solidity storage strategy works as we understand it! To learn more about how Solidity maps state variables into the storage, read the official documentation.
Thank you for reading this post, please remember any questions, feedback or suggestions are welcome!

Data Management

Stack

Calldata

Memory

Storage

Related Posts

A Developer’s Guide to Building Safe Noir Circuits

The Notorious Bug Digest #4: Deflationary Token Risks, ERC4626 Override Gaps, and Rust Shift Overflows

Inside ZKStack's Crosschain Architecture — Part II: Gateway Settlement & Recursive Proofs