By Alejandro Santander in collaboration with Leo Arias
Note: This article is part of a series. If you haven’t read the previous article, please have a look at it first. We’re deconstructing the EVM bytecode of a simple Solidity contract.
- Deconstructing a Solidity Contract — Part I: Introduction ✔
- Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
- Deconstructing a Solidity Contract — Part III: The Function Selector ✔
- Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
- Deconstructing a Solidity Contract — Part V: Function Bodies ⬅
- Deconstructing a Solidity Contract — Part VI: The Metadata Hash
Hey there! We’ve come a long way, haven’t we? First, we understood the difference between a contract’s creation-time and runtime bytecode. Next, we saw how the entry point of execution for any call or transaction is routed to specific functions via the function selector. Finally, we saw how function wrappers unpack incoming transaction data for a function to consume, and how they repack the data the function produces for the user. In this part, we will (at last) look at the actual execution of a function, or what we’ve been calling so far a “function’s body”.
The function body is precisely what the function wrappers detour to after unpacking the incoming calldata. By the time a function body is executed, the function’s arguments should be sitting comfortably on the stack (or in memory, if the data is dynamic), anxious to be used. Let’s see this in action with the balanceOf(address) function. This function receives an address and returns the corresponding uint256 balance of that address.
Let’s go back to Remix, compile and deploy the contract as we’ve done before, and then call the balanceOf function, passing the address you used to deploy the contract as the argument. This should return the number 10000, since the constructor assigns that initial balance to whichever address deploys the contract.
Right, now let’s debug the transaction.
The first thing you’ll notice is that the debugger placed us at instruction 252. If you look at the deconstruction diagram, in the wrappers’ blue section, you should see that the balanceOf function wrapper redirects flow at instruction 175 to the JUMPDEST instruction at 251. As we’ve seen multiple times before, Remix places us precisely at the point where a function’s body is about to be executed.
Now, if you look at the stack, you’ll notice that its topmost value is the address we called balanceOf with. The wrapper has done its job of unpacking the calldata correctly. So we’re ready to step through instructions 251 to 290, the body of the balanceOf function.
Instruction 252 pushes the 20-byte value 0xffffffffffffffffffffffffffffffffffffffff, and the AND opcode that follows uses it to “mask” the 32-byte stack word down to the address type (remember, addresses in Ethereum are 20 bytes long, while the stack operates on 32-byte words).
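The masking step can be sketched in Python by treating a stack word as an unbounded integer. This is a minimal illustration, not Remix or EVM code. The example address and the junk in its upper 12 bytes are made up.

```python
# 20 bytes = 160 bits of ones: the value pushed by PUSH20 0xff...ff
ADDRESS_MASK = (1 << 160) - 1


def mask_address(word: int) -> int:
    """Emulate AND with the 20-byte mask on a 256-bit stack word."""
    assert 0 <= word < 1 << 256, "stack words are 256 bits"
    return word & ADDRESS_MASK


# Hypothetical dirty word: junk in the upper 12 bytes, an address in the lower 20
dirty = (0xDEADBEEF << 160) | 0xCA35B7D915458EF540ADE6068DFE2F44E8FA733C
clean = mask_address(dirty)
print(hex(clean))  # 0xca35b7d915458ef540ade6068dfe2f44e8fa733c
```

Whatever sits in the upper 12 bytes, only the low 160 bits survive the AND.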
In instructions 274 to 278, the bytecode stores the address from the stack into memory. It needs it there for the upcoming SHA3 opcode. If you look in the Yellow Paper, SHA3 takes two parameters: the position in memory where the data to hash starts, and the number of bytes to hash.
But why is the code using SHA3? This function wants to read from the balances mapping. More specifically, it wants to read the value mapped to the incoming address. Recall how a mapping is laid out in storage. The value for a given key is stored at the position obtained by hashing the key (here, the address, padded to 32 bytes) concatenated with the mapping’s own slot. In this case the slot is 1, because balances is the second state variable declared (totalSupply_ is the first, at slot 0). SHA3 needs both of these values in memory to do its magic, and that is precisely what’s happening here.
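Written as a formula, k is the key (here, the address) and p is the mapping’s declared slot. The symbol ‖ is byte concatenation, and pad_32 is just our shorthand for left-padding a value with zeros to 32 bytes:

```latex
\mathrm{location}(k) = \mathrm{keccak256}\big(\mathrm{pad}_{32}(k) \,\|\, \mathrm{pad}_{32}(p)\big), \qquad p = 1 \text{ for } \texttt{balances}
```

The key comes first and the slot second. That is the same order in which the bytecode lays them out in memory.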
So, we’ve got the address in memory, but we also need the slot in memory. That’s what happens next, between instructions 279 and 283: the number 0x01 is stored at memory position 0x20. Now memory holds the address in the first word, at memory position 0x00, and the slot in the second word, at memory position 0x20. Yay! We’re ready to call SHA3.
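The following Python sketch emulates the two MSTORE writes, with a bytearray standing in for memory. It then slices out the region SHA3 will read. The mstore helper and the example address are hypothetical, and the gas cost of memory expansion is ignored.

```python
WORD = 0x20  # one EVM word is 32 bytes


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Emulate MSTORE: write a 32-byte big-endian word, growing memory if needed."""
    end = offset + WORD
    if len(memory) < end:
        memory.extend(b"\x00" * (end - len(memory)))
    memory[offset:end] = value.to_bytes(WORD, "big")


owner = 0xCA35B7D915458EF540ADE6068DFE2F44E8FA733C  # hypothetical example address
BALANCES_SLOT = 0x01  # balances is the second state variable

memory = bytearray()
mstore(memory, 0x00, owner)          # first word: the key (the address)
mstore(memory, 0x20, BALANCES_SLOT)  # second word: the mapping's slot

# SHA3 will be called with offset 0x00 and length 0x20 + 0x20 = 0x40
sha3_input = bytes(memory[0x00:0x00 + 0x40])
print(sha3_input.hex())
```

The printed 64 bytes are 12 zero bytes, the 20-byte address, then 31 zero bytes and a final 01.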
And so it’s called, between instructions 284 and 287. By the time SHA3 executes at instruction 287, the stack contains 0x00 (the start position for SHA3) and 0x40 (the length for SHA3). Together they tell the EVM to hash whatever is in the first two 32-byte words of memory. Thirty-two in hex is 0x20, so two words span 0x20 + 0x20 = 0x40 bytes.
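The SHA3 opcode actually computes Keccak-256, the pre-standardization variant. It differs from NIST SHA3-256 in its padding byte, so Python’s hashlib.sha3_256 gives a different result.

To reproduce the storage location outside the EVM without third-party packages, below is a compact, unoptimized Keccak-256 written from the Keccak specification. It is followed by a hypothetical helper, balance_location, that hashes the same 64 bytes SHA3 reads from memory. Treat it as an illustration and prefer a vetted library in real code.

```python
MASK64 = (1 << 64) - 1

# Keccak-f[1600] round constants for the iota step
ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation offsets ROTATIONS[x][y] for the rho step
ROTATIONS = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rol(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f(a: list) -> list:
    """Keccak-f[1600] permutation; lane (x, y) lives at index x + 5 * y."""
    for rc in ROUND_CONSTANTS:
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate each lane and move it to a new position
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], ROTATIONS[x][y])
        # chi: non-linear mixing along each row
        a = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
             for i in range(25)]
        # iota: XOR the round constant into lane (0, 0)
        a[0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by the EVM's SHA3 opcode (padding byte 0x01, not 0x06)."""
    rate = 136  # bytes absorbed per permutation (1088-bit rate)
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for start in range(0, len(padded), rate):
        block = padded[start:start + rate]
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def balance_location(owner: int, slot: int = 1) -> int:
    """Storage location of balances[owner]: keccak256(pad32(owner) ++ pad32(slot))."""
    preimage = owner.to_bytes(32, "big") + slot.to_bytes(32, "big")
    return int.from_bytes(keccak256(preimage), "big")


owner = 0xCA35B7D915458EF540ADE6068DFE2F44E8FA733C  # hypothetical example address
print(hex(balance_location(owner)))
```

As a sanity check, keccak256(b"balanceOf(address)") should start with the bytes 70a08231, the function selector discussed in Part III.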
Now, SHA3 leaves the 32-byte hash on the stack: an awfully long hexadecimal number, considerably longer than an Ethereum address. This hash is the location in the contract’s storage where the balance of the address passed to balanceOf is stored. You can visualize this using the “Storage completely loaded” panel in Remix’s debugger, where you should find a matching location in the second storage object.
What’s stored at this location? The number 10000, or 0x2710 in hex. At instruction 288, SLOAD takes the storage location to read from (our hash) off the stack and pushes the value stored there, 0x2710, onto the stack.
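SLOAD can be modeled as a dictionary lookup. In this hypothetical sketch, the storage location is a placeholder integer rather than the real hash, and only the top two stack items are tracked. Unset keys read as zero, as they do in the EVM.

```python
location = 0x1234  # placeholder standing in for the 32-byte SHA3 result
storage = {location: 10000}  # hypothetical storage snapshot: location -> value


def sload(stack: list, storage: dict) -> None:
    """Emulate SLOAD: pop a storage location, push the word stored there (0 if unset)."""
    key = stack.pop()
    stack.append(storage.get(key, 0))


stack = [0x70, location]  # last element is the top; 0x70 is the wrapper's return point
sload(stack, storage)
print(hex(stack[-1]))  # 0x2710
```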
Finally, at instruction 289, SWAP1 brings the function wrapper’s JUMPDEST location (0x70, or 112) back to the top of the stack. A JUMP at instruction 290 then takes us back to the outgoing portion of the function wrapper, which repacks 0x2710 to return it to the user.
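The return sequence is just as small. This sketch models the stack as a Python list whose last element is the top. It skips the check a real JUMP performs that the destination is a valid JUMPDEST.

```python
def swap1(stack: list) -> None:
    """Emulate SWAP1: exchange the top two stack items."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def jump(stack: list) -> int:
    """Emulate JUMP: pop the destination and return it as the new program counter."""
    return stack.pop()


stack = [0x70, 0x2710]     # after SLOAD: return location under the balance
swap1(stack)               # -> [0x2710, 0x70]
pc = jump(stack)           # pops 0x70, leaving the balance for the wrapper
print(pc, hex(stack[-1]))  # 112 0x2710
```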
I strongly advise you to go through the same debugging process we just did with balanceOf, this time with the totalSupply and transfer functions. The former is very straightforward, if not trivial. The latter is considerably more complex but fundamentally made up of the same building blocks. The secret is understanding how values are read from and written to mappings. There’s really not much more to it.
Now let’s go back to the big picture:
As we’ve discussed before, the function bodies are all packed together right after the function wrappers. Execution flow jumps to them from the wrappers and returns to the wrappers after performing each function’s instructions.
If you look carefully at the diagram, there’s a chunk of code after the function bodies called the “metadata hash.” This is a very simple structure that we’ll look at in the next and final part of the series.
See you there!