Deconstructing a Solidity Contract — Part III: The Function Selector

September 5, 2018


By Alejandro Santander in collaboration with Leo Arias.


Note: This article is part of a series. If you haven’t read the previous article, please have a look at it first. We’re deconstructing the EVM bytecode of a simple Solidity contract.

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ⬅
Deconstructing a Solidity Contract — Part IV: Function Wrappers
Deconstructing a Solidity Contract — Part V: Function Bodies
Deconstructing a Solidity Contract — Part VI: The Metadata Hash

In the previous article, we identified the need to separate a contract’s bytecode into creation-time and runtime code. Having made a deep dive into the creation part, it’s now time to begin our quest into the runtime part. If you look at the deconstruction diagram, we’re going to start by looking at the second big split block titled BasicToken.evm (runtime).

This might seem a bit scary at first because the runtime code looks at least four times the size of the creation code! But don’t worry, the skills we’ve developed in the previous articles for understanding EVM code, combined with our infallible divide-and-conquer strategy, will make addressing the challenge surprisingly systematic, and perhaps even easy. It’s simply a matter of starting to look at the code, identifying stand-alone structures, and continuing to split until there’s nothing else to split.

So, to get started, let’s go back to Remix and initiate a debugging session with the runtime bytecode only. How do we do that? Last time, we deployed the contract and debugged the deployment transaction. This time, we’ll interact with the deployed contract’s interface, with one of its functions, and debug that transaction instead. Recall our contract:
https://gist.github.com/ajsantander/dce951a95e7608bc29d7f5deeb6e2ecf#file-basictoken-sol

Deploy it in Remix using the Javascript VM, v0.4.24 of the compiler with optimizations enabled, and 10000 as the initial supply. Once the contract is deployed, you should see it listed in the Deployed Contracts section of Remix’s Run panel. Click on it to expand the contract’s interface.

What is this interface? It’s a list of all the methods of the contract that are either public or external — i.e., that any Ethereum account or contract can interact with. Private and internal methods will not show here and are in fact not reachable from “the outside world”. How to interact with particular portions of the runtime code of a contract will be the focus of this article.

Should we try it out? Click on the totalSupply button in Remix’s Run panel. You should immediately see a response below the button: 0: uint256: 10000, which is what we would expect, since we deployed the contract with 10000 as the initial token supply. Now, in the Console panel, click on the Debug button to start a debugging session with this particular transaction. Notice that there will be multiple Debug buttons in the Console panel; make sure you’re using the latest one.

In this case, we’re not debugging a transaction to the 0x0 address, which as we saw in the previous article creates a contract. Instead, we’re debugging a transaction made to the contract itself — i.e., to its runtime code.

If you pop open the Instructions panel, you should be able to verify that the instructions listed by Remix are identical to those in the BasicToken.evm (runtime) section in the deconstruction diagram. If they don’t match, something went wrong. Try starting over and making sure that you’re using the right settings as described above.

All good? The first thing you may notice is that the debugger placed you at instruction 246 and that the Transaction Slider is positioned at about 60% of the bytecode. Why? Well, because Remix is a really nice and generous program, and it takes you directly to the part in which the EVM is just about to execute the body of the totalSupply function. However, a LOT of things happened before this point, and those are what we want to pay attention to here. In fact, we won’t even look into the execution of the body of the function in this article. Our single concern here is how Solidity’s generated EVM code routes incoming transactions, which is the job of what we will come to understand as a contract’s “Function Selector”.

So, grab that slider and drag it all the way to the left so that we start from instruction zero. As we saw before, the EVM always executes code from instruction 0, no exceptions, and then flows through the rest of the code. Let’s walk through this execution opcode by opcode.

The first structure that appears is something we’ve seen before (and that we’ll see a lot of, actually):

This is something Solidity-generated EVM code will always do before anything else in a call: save a spot in memory to be used later.

Let’s see what happens next:

If you open Remix’s Stack panel in the Debug tab and step past instructions 5 to 7, you will see that the stack now contains the number 4 twice. If you’re having trouble reading these super long numbers, note that you can adjust the width of Remix’s Debug panel so that the numbers fit nicely into single lines. The first one came from a regular push, but the second one is the result of executing the opcode CALLDATASIZE, which as the Yellow Paper states, takes no arguments and returns the size of the “input data in the current environment”, or what we often refer to as the calldata.

What is the calldata? As explained in the ABI specification in Solidity’s documentation, the calldata is an encoded chunk of hexadecimal numbers that contains information about which function of the contract we want to call, and its arguments or data. Simply put, it consists of a “function id”, which is generated by hashing the function’s signature and keeping only the first four bytes of the hash, followed by the packed arguments data. You may study the documentation link in detail if you want, but don’t worry about understanding how this packaging works to the finest detail just yet. It’s explained in the documentation, but is a little hard to grasp all at once. It’s much easier to understand with practical examples.

Let’s see what this calldata is. Open the Call Data panel in Remix’s debugger to see: 0x18160ddd. That’s four bytes produced precisely by applying the keccak256 algorithm on the function signature as a string, "totalSupply()", and performing the said truncation. Since this particular function takes no arguments, it’s just that: a four-byte function id. When CALLDATASIZE executes, it simply pushes the second 4 onto the stack.
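To make the hashing step concrete, here is a minimal Python sketch that derives a function id from a signature string. Python's hashlib offers SHA-3 but not the original Keccak-256 padding that Ethereum uses, so the sketch carries its own small Keccak implementation. The helper names (rol64, keccak_f1600, keccak256, function_id) are illustrative and do not come from Remix or any library.

```python
MASK64 = (1 << 64) - 1


def rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 grid of lanes."""
    lfsr = 1  # generates the round constants on the fly
    for _ in range(24):
        # theta: mix each column with the parity of its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as used by Ethereum."""
    rate = 136  # bytes absorbed per permutation call
    padded = bytearray(data)
    padded.append(0x01)
    while len(padded) % rate:
        padded.append(0x00)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def function_id(signature: str) -> str:
    """First four bytes of the signature hash, as a hex string."""
    return keccak256(signature.encode("ascii"))[:4].hex()


# The two ids discussed in this article
print(function_id("totalSupply()"))
print(function_id("balanceOf(address)"))
```

Note that the signature string contains only the function name and the parameter types, with no spaces and no parameter names.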

Instruction 8 then uses LT to check whether the calldata size is less than four. If it is, the following two instructions perform a JUMPI to instruction 86 (0x0056). Here the calldata is exactly four bytes long, which is not less than four, so there will be no jump, and the execution flow will continue to instruction 13. But before we do that, let’s imagine that we made a call to our contract with empty calldata, that is, 0x instead of 0x18160ddd. You can’t do that with Remix btw, but you could if you constructed the transaction manually.
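The size check can be modelled with a tiny stack machine in Python. This is only a sketch: the function name next_instruction_after_size_check is made up for illustration, and the offsets 9 and 12 for PUSH2 and JUMPI are inferred from the instruction sizes rather than quoted from the article.

```python
def next_instruction_after_size_check(calldata: bytes) -> int:
    """Model instructions 5 to 12 and return where execution continues."""
    stack = []
    stack.append(4)                   # 5: PUSH1 0x04
    stack.append(len(calldata))       # 7: CALLDATASIZE
    a, b = stack.pop(), stack.pop()   # 8: LT pops the top two values...
    stack.append(1 if a < b else 0)   # ...and pushes (calldatasize < 4)
    stack.append(0x0056)              # 9: PUSH2 0x0056 (assumed offset)
    destination, condition = stack.pop(), stack.pop()  # 12: JUMPI (assumed offset)
    # JUMPI jumps only when the condition is non-zero
    return destination if condition else 13


print(next_instruction_after_size_check(bytes.fromhex("18160ddd")))  # continues at 13
print(next_instruction_after_size_check(b""))                        # jumps to 86
```

Any calldata shorter than four bytes cannot contain a full function id, which is why it is sent straight to instruction 86.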

In that case, we’d end up in instruction 86, which basically pushes a couple of zeroes to the stack and feeds them to a REVERT opcode. Why? Well, because this contract doesn’t have a fallback function. If the bytecode doesn’t identify the incoming data, it will divert the flow to the fallback function, and if that structure doesn’t “catch” the call, this revert structure will, terminating the execution with absolutely no regret or compassion. Ruthlessly. If there’s nothing to fall back to, then there is nothing to do and the call is completely reverted.

Now, let’s do something a little more interesting. Go back to Remix’s Run tab, copy the Account address, and use it as a parameter to call balanceOf instead of totalSupply and debug that transaction. This is a completely new debugging session; let’s forget about totalSupply for now. Navigate to instruction 8, where CALLDATASIZE will now have pushed 36 (0x24) to the stack. And if you look at the calldata, it’s now 0x70a08231000000000000000000000000ca35b7d915458ef540ade6068dfe2f44e8fa733c.

This new calldata is actually super easy to break apart: the first four bytes 70a08231 are the first four bytes of the hash of the signature "balanceOf(address)", and the 32 bytes that follow contain the address we passed as a parameter. Why 32 bytes if Ethereum addresses are only 20 bytes long, the inquisitive reader may ask? Because the ABI uses 32-byte “words” or “slots” to hold parameters used in function calls, so the 20-byte address is left-padded with zeros to fill its slot.
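As a quick illustration, the calldata shown above can be taken apart in a few lines of Python. The variable names are only for this sketch.

```python
# The balanceOf calldata from the debugging session above
calldata = bytes.fromhex(
    "70a08231"
    "000000000000000000000000ca35b7d915458ef540ade6068dfe2f44e8fa733c"
)

function_id = calldata[:4]        # first four bytes: which function to call
argument_word = calldata[4:36]    # one 32-byte ABI slot holding the parameter
padding = argument_word[:12]      # 12 zero bytes of left padding
address = argument_word[12:]      # the 20-byte account address

print(len(calldata))              # 36, the value CALLDATASIZE pushes (0x24)
print(function_id.hex())          # 70a08231
print(address.hex())              # ca35b7d915458ef540ade6068dfe2f44e8fa733c
```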

Continuing in the context of our balanceOf call, let’s pick up where we left off at instruction 13, at which point there’s nothing on the stack. Instruction 13 then pushes 0xffffffff to the stack, and the next instruction pushes a 29-byte-long 0x000000001000…000 number to the stack. We’ll see why in just a moment. For now, simply notice that one contains four bytes of f’s, and the other contains four bytes of 0’s.

Next up, CALLDATALOAD takes one parameter (the one pushed to the stack at instruction 48) and reads a chunk of 32 bytes from the calldata starting at that position, which in this case in Yul would be:

calldataload(0)

Basically pushing the first 32 bytes of our calldata to the stack. Now comes the funny part. DIV consumes two arguments from the stack, taking that calldata word and dividing it by that weird 0x000000001000…000 number, effectively filtering everything but the function id from the calldata and leaving that alone on the stack: 0x000…000070a08231. The next instruction uses AND, which also consumes two elements from the stack: our function id and the number with four bytes of f’s. This is to make sure that the function id is exactly four bytes (eight hex digits) long, masking out anything else, if anything were present. A safety measure used by Solidity, I presume.
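The DIV and AND steps are plain integer arithmetic, so they can be reproduced in Python. This sketch assumes the 29-byte constant is a 1 followed by 28 zero bytes (that is, 2**224), which is what shifts a 32-byte word right by 28 bytes. The function names are illustrative.

```python
DIVISOR = int.from_bytes(b"\x01" + bytes(28), "big")  # assumed 29-byte constant, 2**224
MASK = 0xFFFFFFFF                                     # four bytes of f's


def calldataload(calldata: bytes, offset: int) -> int:
    """Read 32 bytes starting at `offset`, zero-padded on the right like the EVM."""
    word = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(word, "big")


def extract_function_id(calldata: bytes) -> int:
    word = calldataload(calldata, 0)   # PUSH1 0x00, CALLDATALOAD
    shifted = word // DIVISOR          # DIV: drop the lowest 28 bytes
    return shifted & MASK              # AND: keep only the lowest four bytes


balance_of_calldata = bytes.fromhex(
    "70a08231"
    "000000000000000000000000ca35b7d915458ef540ade6068dfe2f44e8fa733c"
)
print(hex(extract_function_id(balance_of_calldata)))           # 0x70a08231
print(hex(extract_function_id(bytes.fromhex("18160ddd"))))     # 0x18160ddd
```

Integer division by a power of two is a right shift, which is why dividing leaves only the leading four bytes of the word.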

Long story short, we’ve simply checked if the calldata is too short, and if so, reverted, and then shuffled things a bit so that we have our function id on the stack: 70a08231.

Aaaaand, we’re almost done here. The next part is really easy to understand:

At instruction 53, the code pushes 18160ddd (the function id of totalSupply) to the stack and then uses a DUP2 to duplicate our incoming calldata 70a08231 value, currently present at the second position in the stack. Why the dup? Because the EQ opcode at instruction 59 will consume two values from the stack and we want to keep the 70a08231 value around, as we went through so much trouble to extract it from the calldata.

The code will now try to match the function id from the calldata with one of the known function ids. Since 70a08231 is coming in, it will not match 18160ddd, skipping the JUMPI at instruction 63. But it will match in the next check, so the JUMPI at instruction 74 will take the jump.

Let’s take a moment to observe that there is one of these equality checks for each public or external function of the contract. This is the function selector’s core: acting as some sort of switch statement that simply routes execution to the correct part of the code. It is our “hub”.
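The switch-like behaviour can be sketched in Python as a chain of equality checks. This is a rough model, not the compiler's output: the targets are plain labels rather than bytecode offsets, and the transfer id a9059cbb assumes the contract's function has the signature transfer(address,uint256).

```python
# (function id, label of the wrapper the selector would jump to)
KNOWN_FUNCTIONS = [
    (0x18160DDD, "totalSupply"),
    (0x70A08231, "balanceOf"),
    (0xA9059CBB, "transfer"),   # assumed signature: transfer(address,uint256)
]


def select_function(calldata: bytes) -> str:
    """Return the label of the code the function selector would route to."""
    if len(calldata) < 4:
        return "revert"                       # too short to hold a function id
    function_id = int.from_bytes(calldata[:4], "big")
    for known_id, label in KNOWN_FUNCTIONS:   # one EQ + JUMPI pair per function
        if function_id == known_id:
            return label
    return "revert"                           # no match and no fallback function


balance_of_calldata = bytes.fromhex(
    "70a08231"
    "000000000000000000000000ca35b7d915458ef540ade6068dfe2f44e8fa733c"
)
print(select_function(balance_of_calldata))           # balanceOf
print(select_function(bytes.fromhex("18160ddd")))     # totalSupply
print(select_function(bytes.fromhex("deadbeef")))     # revert
```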

So, since the last case was a match, execution flow takes us to the JUMPDEST at location 130, which is, as we will see in the next part of the series, the ABI “wrapper” of the balanceOf function. As we will see, this wrapper will be in charge of un-packaging the transaction’s data for the function’s body to consume.

Go ahead and try debugging the transfer function this time. There’s really no mystery to the function selector. It’s a simple and effective structure that sits at the gate of every contract out there (at least all those compiled from Solidity) and redirects execution to the appropriate location in the code. It’s the way in which Solidity gives a contract’s bytecode the ability to emulate multiple entry points, and hence, an interface.

Looking at the deconstruction diagram, this is what we’ve just deconstructed: