Deconstructing a Solidity Contract — Part V: Function Bodies

September 20, 2018

Image from [www.snappygoat.com](http://www.snappygoat.com)

By Alejandro Santander in collaboration with Leo Arias

Thank you for your interest in this post! We’re undergoing a rebranding process, so please excuse us if some names are out of date.

Note: This article is part of a series. If you haven’t read the previous article, please have a look at it first. We’re deconstructing the EVM bytecode of a simple Solidity contract.

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
Deconstructing a Solidity Contract — Part V: Function Bodies ⬅
Deconstructing a Solidity Contract — Part VI: The Metadata Hash

Hey there! We’ve come a long way, haven’t we? First, we understood the difference between a contract’s creation time and runtime bytecode; next, we understood how the entry point of execution from any call or transaction is routed to specific functions via the function selector; and finally, we saw how incoming transaction data is unpacked for a function to consume, and the data produced by a function is repacked for the user via function wrappers. In this section, we will (at last) look at the actual execution of a function, or what we’ve been calling so far a “function’s body”.

The function body is precisely what the function wrappers detour to, after unpacking the incoming calldata. By the time a function body is executed, the function’s arguments should be sitting comfortably on the stack (or in memory if the data is dynamic), anxious to be used. Let’s see this in action with the balanceOf(address) function. This function should receive an address and return the corresponding uint256 balance of that address.

Let’s go back to Remix, compile and deploy the contract as we’ve done before, and then call the balanceOf function with the address you used to deploy the contract as the argument. This should return the number 10000, since that is the amount the constructor assigns to whichever address deploys the contract, and 10000 is the value we used when deploying.

Right, now let’s debug the transaction.

The first thing you’ll notice is that the debugger placed us at instruction 252. If you look at the deconstruction diagram, in the wrappers’ blue section, you should see that the balanceOf function wrapper redirects flow at instruction 175 to the JUMPDEST instruction in 251. As we’ve seen multiple times before, Remix places us precisely at the point when a function’s body is about to be executed.

Figure 1. Function wrapper redirects execution to the function body (blue dashed line at instruction 175).

Figure 2. Function body execution, coming from the function’s wrapper (blue dashed line at instruction 251).

Now, if you look at the stack, you’ll notice that its topmost value is the address we called balanceOf with. The wrapper has done its job of unpacking the calldata correctly. So we’re ready to step through instructions 251 to 290, the body of the balanceOf function.

Instruction 252 pushes a 20-byte 0xffffffffffffffffffffffffffffffffffffffff value and uses the AND opcode to “mask” the 32-byte address into its correct type (remember, addresses in Ethereum are 20 bytes long while the stack operates in 32-byte words).
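The masking step can be reproduced outside the EVM. The snippet below is a minimal sketch in Python, not the compiler's output; the address and the "dirty" upper bytes are made-up values used only for illustration.

```python
# Sketch of the AND mask applied by the balanceOf body.
# The holder address and the garbage in the upper bytes are hypothetical.

# 20 bytes of 0xff, the same constant the bytecode pushes.
ADDRESS_MASK = (1 << 160) - 1

def mask_address(word: int) -> int:
    """Keep the low 20 bytes of a 32-byte stack word, zeroing the rest."""
    return word & ADDRESS_MASK

holder = 0x1111111111111111111111111111111111111111  # hypothetical address
dirty = (0xDEADBEEF << 160) | holder                  # junk above the address

print(hex(mask_address(dirty)))
```

Whatever sits in the upper 12 bytes of the word is discarded, so the value that continues down the function body is always a well-formed 20-byte address.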

In instructions 274 to 278, the bytecode will upload the address from the stack to memory. It needs it there for the upcoming SHA3 opcode. If you look in the Yellow Paper, the SHA3 opcode has two parameters: the position in memory to calculate the hash from, and the number of bytes to hash.

But why will the code be using a SHA3 opcode? This function wants to read from the balances mapping. More specifically, it wants to read the value mapped for the incoming address. If you recall how a mapping is laid out in storage, the position of a value is the hash of the key (here, the address) concatenated with the mapping variable’s slot. The slot is 1 in this case, because balances is defined as the second variable (totalSupply_ is the first, at slot 0). SHA3 will need both these values in memory to do its magic, and that is precisely what’s happening here.

So, we’ve got the address in memory, but now we need the slot in memory. And that’s what happens next between instructions 279 and 283.

The number 0x01 is stored at memory position 0x20. Now memory holds the address at the first word, memory position 0x00, and the slot at the second word, memory position 0x20. Yay! We’re ready to call SHA3.
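To picture the memory layout at this point, here is a small Python sketch that models memory as a byte array and mimics two MSTORE operations. The mstore helper and the holder address are assumptions made for illustration; they are not part of Remix or the contract.

```python
# Sketch of EVM memory after the address and the slot have been stored.
# `mstore` is a hypothetical helper that mimics the MSTORE opcode.

def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a value as a 32-byte big-endian word at the given offset."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")

holder = 0x1111111111111111111111111111111111111111  # hypothetical address
memory = bytearray(0x40)      # two 32-byte words, initially zero

mstore(memory, 0x00, holder)  # first word: the (left-padded) address
mstore(memory, 0x20, 0x01)    # second word: the slot of `balances`

print(memory[0x00:0x20].hex())
print(memory[0x20:0x40].hex())
```

Both values are left-padded with zeros to a full word, which is why the 20-byte address occupies bytes 12 to 31 of the first word.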

And so it’s called between instructions 284 and 287.

By the time SHA3 is called in instruction 287, the stack contains 0x00 (start position for SHA3) and 0x40 (length for SHA3), which is basically telling the EVM to hash whatever is in memory in the first two 32-byte words. Thirty-two bytes in hex is 0x20, so 0x20 + 0x20 equals 0x40.

Now, SHA3 leaves the 32-byte hash on the stack, which is an awfully long hexadecimal number, considerably longer than an Ethereum address. This hash is the location in the contract’s storage where the balance of the address passed to balanceOf is stored. You can visualize this using the “Storage completely loaded” panel in Remix’s debugger. You should find a matching location in the second storage object.
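If you want to recompute this location yourself, the following Python sketch hashes the same 64 bytes the bytecode hashes. It includes a compact, unoptimized Keccak-256 adapted from the public reference algorithm, because the EVM's SHA3 opcode is Keccak-256 and Python's standard library does not expose that variant. The function names and the holder address are assumptions for illustration; substitute the address you deployed with and compare the result against Remix.

```python
# Sketch: storage location of balances[holder] = keccak256(pad32(key) + pad32(slot)).
# Pure-Python Keccak, written for clarity rather than speed.

def rol64(a, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & 0xFFFFFFFFFFFFFFFF

def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane matrix."""
    r = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in the round constant, generated by a small LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def permute(state):
    """Run Keccak-f[1600] over a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

def keccak(data, suffix=0x01, rate=136, out_len=32):
    """Sponge construction; suffix 0x01 is the Keccak-256 padding used by the EVM."""
    state = bytearray(200)
    offset, block = 0, 0
    while offset < len(data):
        block = min(len(data) - offset, rate)
        for i in range(block):
            state[i] ^= data[offset + i]
        offset += block
        if block == rate:
            state = permute(state)
            block = 0
    state[block] ^= suffix
    state[rate - 1] ^= 0x80
    return bytes(permute(state)[:out_len])

def mapping_slot(key: int, slot: int) -> int:
    """Hash the 32-byte key followed by the 32-byte slot, as laid out in memory."""
    preimage = key.to_bytes(32, "big") + slot.to_bytes(32, "big")  # 0x40 bytes
    return int.from_bytes(keccak(preimage), "big")

holder = 0x1111111111111111111111111111111111111111  # hypothetical address
print(hex(mapping_slot(holder, 1)))
```

Note that hashlib.sha3_256 in Python is the standardized SHA-3, which pads its input differently and therefore produces a different digest from the EVM's SHA3 opcode. That is why the sketch carries its own Keccak rather than calling the standard library.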

What’s stored at this location? The number 10000, or 0x2710 in hex. At instruction 288, SLOAD takes the argument of where to read from storage (our hash) and pushes 0x2710 to the stack.

Finally, at instruction 289, SWAP1 resurfaces the function wrapper’s JUMPDEST location (0x70, or 112), and a JUMP in instruction 290 takes us back to the outgoing portion of the function wrapper, which will repack 0x2710 for returning it to the user.
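The last three instructions of the body can be traced with a toy stack machine. This is a simplified sketch in Python: the stack only holds the two items relevant here, storage is modeled as a dictionary, and slot_hash is a placeholder rather than the real hash.

```python
# Sketch of the tail of the balanceOf body: SLOAD, SWAP1, JUMP.
# `slot_hash` is a placeholder; in the real trace it is the 32-byte SHA3 result.

def run_tail(stack, storage):
    """Execute SLOAD, SWAP1 and JUMP on a list-based stack (top is the last item)."""
    # SLOAD: pop the storage key, push the stored value (0 if never written).
    key = stack.pop()
    stack.append(storage.get(key, 0))
    # SWAP1: exchange the two topmost items, bringing the return location to the top.
    stack[-1], stack[-2] = stack[-2], stack[-1]
    # JUMP: pop the destination and continue execution there.
    destination = stack.pop()
    return destination, stack

slot_hash = 0xABC                   # placeholder storage key
storage = {slot_hash: 10000}        # the deployer's balance
stack = [0x70, slot_hash]           # wrapper's JUMPDEST below the hash

destination, stack = run_tail(stack, storage)
print(destination, [hex(item) for item in stack])
```

After the jump, the only item left from the body is the balance, which is exactly what the outgoing half of the wrapper expects to find on the stack.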

I strongly advise you to go over the same debugging process we just did with balanceOf, this time with the totalSupply and transfer functions. The former is very straightforward, if not trivial, and the latter is considerably more complex but fundamentally made up of the same building blocks. The secret is understanding how values are read from mappings and written to mappings. There’s really not much more to it.

Now let’s go back to the big picture:

Figure 3. Function bodies after function wrappers.

As we’ve discussed before, the function bodies are all packed together right after the function wrappers. Execution flow jumps to them from the wrappers and returns to the wrappers after performing each function’s instructions.

If you look carefully at the diagram, there’s a chunk of code that comes after the function bodies called the “metadata hash.” This is a very simple structure that we’ll look at next in the final part of the series.

See you there!
