By Alejandro Santander in collaboration with Leo Arias.
You’re on the road, driving fast in your rare, fully restored 1969 Mustang Mach 1. The sunlight shimmers on the all-original, gorgeous plated rims. It’s just you, the road, the desert, and the never-ending chase of the horizon. Perfection!
In the blink of an eye, your 335 hp beast is engulfed in white smoke, as if transformed into a steam locomotive, and you’re forced to stop on the side of the road. With determination, you pop the hood, only to realize that you have absolutely no idea what you’re looking at. How does this damn machine work? You grab your phone and discover that you have no signal.
Could this perhaps be an analogy for your current knowledge of dApp development? In the analogy, the Mustang is your set of smart contracts, the rims are all those well-thought-out little details and the ❤ you put into them. And the popping of the hood is you looking into your contract’s EVM bytecode and having absolutely no idea what’s going on.
If this sounds familiar, then not to worry! The purpose of this series of articles is to deconstruct a simple Solidity contract, look at its bytecode, and break it apart into identifiable structures down to the lowest level. We’ll pop the hood on Solidity. By the end of the series, you should feel comfortable when looking at or debugging EVM bytecode. The whole point of the series is to demystify the EVM bytecode produced by the Solidity compiler. And it’s really much simpler than it seems.
Note: This series is aimed at developers who already feel comfortable with and have experience in developing Solidity contracts, but want to understand how things work at a slightly deeper/lower level — that is, how Solidity is translated into EVM bytecode by the Solidity compiler, and how the EVM executes such bytecode. If you aren’t there just yet, I recommend reading this great introduction by Facu Spagnuolo: A Gentle Introduction to Ethereum Programming.
Here’s the contract we’ll deconstruct:
pragma solidity ^0.4.24;

contract BasicToken {

  uint256 totalSupply_;
  mapping(address => uint256) balances;

  constructor(uint256 _initialSupply) public {
    totalSupply_ = _initialSupply;
    balances[msg.sender] = _initialSupply;
  }

  function totalSupply() public view returns (uint256) {
    return totalSupply_;
  }

  function transfer(address _to, uint256 _value) public returns (bool) {
    require(_to != address(0));
    require(_value <= balances[msg.sender]);
    balances[msg.sender] = balances[msg.sender] - _value;
    balances[_to] = balances[_to] + _value;
    return true;
  }

  function balanceOf(address _owner) public view returns (uint256) {
    return balances[_owner];
  }
}
Note: This contract is susceptible to an overflow attack, but we’re just keeping it simple here for the purpose at hand.
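To make the overflow remark concrete: Solidity 0.4.x does not check arithmetic, so uint256 values silently wrap around modulo 2^256. In transfer, the second require guards the subtraction, but nothing guards the addition. The Python sketch below only imitates that wrapping behaviour; the helper names add_uint256 and sub_uint256 are made up for illustration.

```python
# Toy model of unchecked uint256 arithmetic (as in Solidity 0.4.x).
# The helper names are hypothetical, chosen only for this illustration.
UINT256_MOD = 2 ** 256


def add_uint256(a, b):
    """Add two uint256 values the way the EVM does: wrap modulo 2**256."""
    return (a + b) % UINT256_MOD


def sub_uint256(a, b):
    """Subtract two uint256 values, wrapping modulo 2**256 on underflow."""
    return (a - b) % UINT256_MOD


max_uint = UINT256_MOD - 1
print(add_uint256(max_uint, 1))        # the largest value plus one wraps to 0
print(sub_uint256(0, 1) == max_uint)   # zero minus one wraps to the largest value
```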
Compiling the contract
To compile the contract, we’ll be using Remix. Go ahead and create a new contract by clicking on the + button on the top left, above the file browser area. Set the filename to BasicToken.sol. Now, paste the above code into the editor section.
In the right-hand section, go to the Settings tab and make sure Enable Optimization is selected. Also, verify that the selected version of the Solidity compiler is “version:0.4.24+commit.e67f0147.Emscripten.clang”. These two details are very important, otherwise you’ll be looking at slightly different bytecode from what will be discussed here.
If you go to the Compile tab and click on the Details button, you should see a popup with all the stuff that the Solidity compiler generates, one of which is a JSON object named BYTECODE that has an “object” property, which is the compiled code of the contract. It looks like this:
608060405234801561001057600080fd5b5060405160208061021783398101604090815290516000818155338152600160205291909120556101d1806100466000396000f3006080604052600436106100565763ffffffff7c010000000000000000000000000000000000000000000000000000000060003504166318160ddd811461005b57806370a0823114610082578063a9059cbb146100b0575b600080fd5b34801561006757600080fd5b506100706100f5565b60408051918252519081900360200190f35b34801561008e57600080fd5b5061007073ffffffffffffffffffffffffffffffffffffffff600435166100fb565b3480156100bc57600080fd5b506100e173ffffffffffffffffffffffffffffffffffffffff60043516602435610123565b604080519115158252519081900360200190f35b60005490565b73ffffffffffffffffffffffffffffffffffffffff1660009081526001602052604090205490565b600073ffffffffffffffffffffffffffffffffffffffff8316151561014757600080fd5b3360009081526001602052604090205482111561016357600080fd5b503360009081526001602081905260408083208054859003905573ffffffffffffffffffffffffffffffffffffffff85168352909120805483019055929150505600a165627a7a72305820a5d999f4459642872a29be93a490575d345e40fc91a7cccb2cf29c88bcdaf3be0029
Yup. That’s completely unreadable (at least for a normal human being).
Deploying the contract
Next, go to the Run section in Remix. At the top, make sure you’re using the Javascript VM. This is basically an embedded Javascript EVM + network, our ideal Ethereum playground. Make sure BasicToken is selected in the ComboBox, and enter the number 10000 in the Deploy input box. Next, click the Deploy button. This should deploy an instance of our BasicToken contract, with an initial supply of 10000 tokens owned by the account currently selected at the top of the account ComboBox, which will hold the totality of our token supply.
Lower in the Run tab, in the Deployed Contracts section, you should see the deployed contract, with fields to interact with its three functions: transfer, balanceOf, and totalSupply. Here, we’ll be able to interact with the instance of the contract we just deployed.
But before that, let’s take a look at exactly what “deploying” the contract means. At the bottom of the page, in the console area, you should see the log “creation of BasicToken pending…”, followed by a transaction entry with various fields: from, to, value, data, logs, and hash. Click on this entry to expand the transaction’s info. Even though abbreviated, you should see that the data/input of the transaction is the same bytecode we presented above, followed by the encoded constructor argument. This transaction has no recipient address (its to field is empty, which is often loosely described as sending to the 0x0 address), and as a result, a new contract instance is created, with its own address and code. We’ll examine this process in detail in the next article.
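One detail you can check for yourself: the constructor argument travels in the same data field, appended after the compiled bytecode as a 32-byte big-endian word. The sketch below shows that encoding for our initial supply of 10000. The helper names encode_uint256 and creation_data are hypothetical, and the short bytecode string in the usage line is only a placeholder prefix, not the full contract.

```python
def encode_uint256(value):
    """Encode an unsigned integer as a 32-byte big-endian hex word."""
    if not 0 <= value < 2 ** 256:
        raise ValueError("value does not fit in a uint256")
    return value.to_bytes(32, "big").hex()


def creation_data(bytecode_hex, initial_supply):
    """Append the encoded constructor argument to the compiled bytecode."""
    return bytecode_hex + encode_uint256(initial_supply)


# 10000 is 0x2710, so the word is 60 zero characters followed by "2710".
print(encode_uint256(10000))

# Placeholder prefix only; the real data field carries the whole bytecode.
print(creation_data("6080604052", 10000))
```

This is why the disassembly ends in a long run of STOP instructions followed by INVALID and LT: the disassembler is decoding the zero padding and the bytes 0x27 and 0x10 as if they were opcodes.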
Disassembling the bytecode
To the right of the transaction’s data, still in the console, click on the Debug button. This will activate the Debugger tab in Remix’s right-hand area. Let’s take a look at the Instructions section. If you scroll down, you should see the following:
000 PUSH1 80
002 PUSH1 40
004 MSTORE
005 CALLVALUE
006 DUP1
007 ISZERO
008 PUSH2 0010
011 JUMPI
012 PUSH1 00
014 DUP1
015 REVERT
016 JUMPDEST
017 POP
018 PUSH1 40
020 MLOAD
021 PUSH1 20
023 DUP1
024 PUSH2 0217
027 DUP4
028 CODECOPY
029 DUP2
030 ADD
031 PUSH1 40
033 SWAP1
034 DUP2
035 MSTORE
036 SWAP1
037 MLOAD
038 PUSH1 00
040 DUP2
041 DUP2
042 SSTORE
043 CALLER
044 DUP2
045 MSTORE
046 PUSH1 01
048 PUSH1 20
050 MSTORE
051 SWAP2
052 SWAP1
053 SWAP2
054 SHA3
055 SSTORE
056 PUSH2 01d1
059 DUP1
060 PUSH2 0046
063 PUSH1 00
065 CODECOPY
066 PUSH1 00
068 RETURN
069 STOP
070 PUSH1 80
072 PUSH1 40
074 MSTORE
075 PUSH1 04
077 CALLDATASIZE
078 LT
079 PUSH2 0056
082 JUMPI
083 PUSH4 ffffffff
088 PUSH29 0100000000000000000000000000000000000000000000000000000000
118 PUSH1 00
120 CALLDATALOAD
121 DIV
122 AND
123 PUSH4 18160ddd
128 DUP2
129 EQ
130 PUSH2 005b
133 JUMPI
134 DUP1
135 PUSH4 70a08231
140 EQ
141 PUSH2 0082
144 JUMPI
145 DUP1
146 PUSH4 a9059cbb
151 EQ
152 PUSH2 00b0
155 JUMPI
156 JUMPDEST
157 PUSH1 00
159 DUP1
160 REVERT
161 JUMPDEST
162 CALLVALUE
163 DUP1
164 ISZERO
165 PUSH2 0067
168 JUMPI
169 PUSH1 00
171 DUP1
172 REVERT
173 JUMPDEST
174 POP
175 PUSH2 0070
178 PUSH2 00f5
181 JUMP
182 JUMPDEST
183 PUSH1 40
185 DUP1
186 MLOAD
187 SWAP2
188 DUP3
189 MSTORE
190 MLOAD
191 SWAP1
192 DUP2
193 SWAP1
194 SUB
195 PUSH1 20
197 ADD
198 SWAP1
199 RETURN
200 JUMPDEST
201 CALLVALUE
202 DUP1
203 ISZERO
204 PUSH2 008e
207 JUMPI
208 PUSH1 00
210 DUP1
211 REVERT
212 JUMPDEST
213 POP
214 PUSH2 0070
217 PUSH20 ffffffffffffffffffffffffffffffffffffffff
238 PUSH1 04
240 CALLDATALOAD
241 AND
242 PUSH2 00fb
245 JUMP
246 JUMPDEST
247 CALLVALUE
248 DUP1
249 ISZERO
250 PUSH2 00bc
253 JUMPI
254 PUSH1 00
256 DUP1
257 REVERT
258 JUMPDEST
259 POP
260 PUSH2 00e1
263 PUSH20 ffffffffffffffffffffffffffffffffffffffff
284 PUSH1 04
286 CALLDATALOAD
287 AND
288 PUSH1 24
290 CALLDATALOAD
291 PUSH2 0123
294 JUMP
295 JUMPDEST
296 PUSH1 40
298 DUP1
299 MLOAD
300 SWAP2
301 ISZERO
302 ISZERO
303 DUP3
304 MSTORE
305 MLOAD
306 SWAP1
307 DUP2
308 SWAP1
309 SUB
310 PUSH1 20
312 ADD
313 SWAP1
314 RETURN
315 JUMPDEST
316 PUSH1 00
318 SLOAD
319 SWAP1
320 JUMP
321 JUMPDEST
322 PUSH20 ffffffffffffffffffffffffffffffffffffffff
343 AND
344 PUSH1 00
346 SWAP1
347 DUP2
348 MSTORE
349 PUSH1 01
351 PUSH1 20
353 MSTORE
354 PUSH1 40
356 SWAP1
357 SHA3
358 SLOAD
359 SWAP1
360 JUMP
361 JUMPDEST
362 PUSH1 00
364 PUSH20 ffffffffffffffffffffffffffffffffffffffff
385 DUP4
386 AND
387 ISZERO
388 ISZERO
389 PUSH2 0147
392 JUMPI
393 PUSH1 00
395 DUP1
396 REVERT
397 JUMPDEST
398 CALLER
399 PUSH1 00
401 SWAP1
402 DUP2
403 MSTORE
404 PUSH1 01
406 PUSH1 20
408 MSTORE
409 PUSH1 40
411 SWAP1
412 SHA3
413 SLOAD
414 DUP3
415 GT
416 ISZERO
417 PUSH2 0163
420 JUMPI
421 PUSH1 00
423 DUP1
424 REVERT
425 JUMPDEST
426 POP
427 CALLER
428 PUSH1 00
430 SWAP1
431 DUP2
432 MSTORE
433 PUSH1 01
435 PUSH1 20
437 DUP2
438 SWAP1
439 MSTORE
440 PUSH1 40
442 DUP1
443 DUP4
444 SHA3
445 DUP1
446 SLOAD
447 DUP6
448 SWAP1
449 SUB
450 SWAP1
451 SSTORE
452 PUSH20 ffffffffffffffffffffffffffffffffffffffff
473 DUP6
474 AND
475 DUP4
476 MSTORE
477 SWAP1
478 SWAP2
479 SHA3
480 DUP1
481 SLOAD
482 DUP4
483 ADD
484 SWAP1
485 SSTORE
486 SWAP3
487 SWAP2
488 POP
489 POP
490 JUMP
491 STOP
492 LOG1
493 PUSH6 627a7a723058
500 SHA3
501 INVALID
502 INVALID
503 SWAP10
504 DELEGATECALL
505 GASLIMIT
506 SWAP7
507 TIMESTAMP
508 DUP8
509 INVALID
510 INVALID
511 INVALID
512 SWAP4
513 LOG4
514 SWAP1
515 JUMPI
516 INVALID
517 CALLVALUE
518 INVALID
519 BLOCKHASH
520 INVALID
521 SWAP2
522 INVALID
523 INVALID
524 INVALID
525 INVALID
526 CALLCODE
527 SWAP13
528 DUP9
529 INVALID
530 INVALID
531 RETURN
532 INVALID
533 STOP
534 INVALID
535 STOP
536 STOP
537 STOP
538 STOP
539 STOP
540 STOP
541 STOP
542 STOP
543 STOP
544 STOP
545 STOP
546 STOP
547 STOP
548 STOP
549 STOP
550 STOP
551 STOP
552 STOP
553 STOP
554 STOP
555 STOP
556 STOP
557 STOP
558 STOP
559 STOP
560 STOP
561 STOP
562 STOP
563 STOP
564 STOP
565 INVALID
566 LT
To make sure that you’re following the same set of opcodes described in this series, please compare what you see in Remix with the bytecode in this gist.
This is the disassembled bytecode of the contract. Disassembly sounds rather intimidating, but it’s quite simple, really. If you scan the raw bytecode one byte (two hex characters) at a time, each byte maps to a specific opcode that the EVM associates with a particular action. For example:
0x60 => PUSH1
0x01 => ADD
0x02 => MUL
0x00 => STOP
...
The disassembled code is still very low-level and difficult to read, but as you will see, we can start making sense out of it.
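To show how little magic is involved, here is a minimal disassembler sketch in Python. It is not Remix’s implementation: the opcode table is deliberately partial (just enough for the first instructions of our contract), and the function name disassemble is made up for this example. The only special case is the PUSH family, whose operand bytes follow the opcode and must be skipped over.

```python
# Partial opcode table, an assumption made for brevity. A real
# disassembler would list every opcode the EVM defines.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x02: "MUL",
    0x15: "ISZERO",
    0x34: "CALLVALUE",
    0x50: "POP",
    0x52: "MSTORE",
    0x57: "JUMPI",
    0x5b: "JUMPDEST",
    0x80: "DUP1",
    0xfd: "REVERT",
}


def disassemble(bytecode_hex):
    """Turn a hex string into lines of 'position OPCODE [operand]'."""
    code = bytes.fromhex(bytecode_hex)
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7f:
            # PUSH1..PUSH32 carry 1..32 operand bytes right after the opcode.
            size = op - 0x5f
            operand = code[pc + 1 : pc + 1 + size].hex()
            lines.append(f"{pc:03d} PUSH{size} {operand}")
            pc += 1 + size
        else:
            # Bytes missing from the partial table are flagged, not guessed.
            name = OPCODES.get(op, f"UNKNOWN_{op:02x}")
            lines.append(f"{pc:03d} {name}")
            pc += 1
    return lines


# The first 17 bytes of the BasicToken bytecode.
for line in disassemble("608060405234801561001057600080fd5b"):
    print(line)
```

Running it on the first 17 bytes of our bytecode reproduces the opening lines of the listing, from 000 PUSH1 80 through 016 JUMPDEST.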
Opcodes
Before we get started on our ambitious endeavour of completely deconstructing the bytecode, you’re going to need a basic tool set for understanding individual opcodes such as PUSH, ADD, SWAP, DUP, etc. An opcode, in the end, can only push or consume items from the EVM’s stack, memory, or storage belonging to the contract. That’s it.
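As a rough illustration of that idea, the toy interpreter below runs the first three instructions of our contract (PUSH1 80, PUSH1 40, MSTORE) against a stack and a memory array. It is only a sketch: it supports three opcodes, ignores gas and storage, and the function name run and the tuple format for instructions are assumptions made for this example.

```python
def run(program):
    """Execute (opcode, operand) pairs on a toy stack machine with memory."""
    stack = []
    memory = bytearray()
    for opcode, operand in program:
        if opcode == "PUSH1":
            # Push the operand onto the stack.
            stack.append(operand)
        elif opcode == "ADD":
            # Consume two items, push their sum (wrapping at 256 bits).
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % 2 ** 256)
        elif opcode == "MSTORE":
            # Consume the offset (top of stack) and then the value,
            # and write the value to memory as a 32-byte word.
            offset, value = stack.pop(), stack.pop()
            if len(memory) < offset + 32:
                memory.extend(bytes(offset + 32 - len(memory)))
            memory[offset : offset + 32] = value.to_bytes(32, "big")
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return stack, memory


stack, memory = run([("PUSH1", 0x80), ("PUSH1", 0x40), ("MSTORE", None)])
print(stack)                                       # the stack is empty again
print(hex(int.from_bytes(memory[0x40:0x60], "big")))  # the word stored at 0x40
```

After the three instructions, the stack is empty and the 32-byte word at memory position 0x40 holds 0x80.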
To see all the available opcodes that the EVM can process, check out this handy gist from Pyethereum showing a list of the opcodes. To understand what each one does and how it works, Solidity’s assembly documentation is a great reference. Even though it doesn’t have a one-to-one relationship with the raw opcodes, it’s pretty close (it’s actually Yul, an intermediate language between Solidity and EVM bytecode). Finally, if you can speak scientician, there’s always the Ethereum Yellow Paper to fall back on.
There’s no point in reading these resources from start to finish right now; just keep them around for reference. We’ll be using them as we go along.
Instructions
Each line in the disassembled code above is an instruction for the EVM to execute. Each instruction contains an opcode. For example, let’s take one of those instructions, instruction 75, which pushes the number 4 to the stack. This particular disassembler interprets instructions as follows:
75 PUSH1 0x04
|  |     |
|  |     Hex value for push.
|  Opcode.
Instruction number.
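You may have noticed that the instruction numbers in the listing skip values. That’s because the number is the byte position of the opcode in the bytecode, and a PUSH instruction also occupies the bytes of the value it pushes. A small sketch of that rule, using a hypothetical helper named instruction_size:

```python
def instruction_size(opcode_byte):
    """Bytes taken by an instruction: the opcode plus any PUSH operand."""
    if 0x60 <= opcode_byte <= 0x7f:      # PUSH1 .. PUSH32
        return 1 + (opcode_byte - 0x5f)  # one opcode byte plus 1..32 operand bytes
    return 1


# In the listing, PUSH2 (opcode 0x61) sits at 008 and JUMPI follows at 011.
print(8 + instruction_size(0x61))

# PUSH20 (opcode 0x73) sits at 217 and the next instruction is at 238.
print(217 + instruction_size(0x73))

# MSTORE (opcode 0x52) has no operand, so 004 is followed directly by 005.
print(4 + instruction_size(0x52))
```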
Even though the disassembled code brings us one step closer to understanding what’s going on, it’s still quite intimidating. We’re going to need a strategy for deconstructing the whole thing, which has more than 300 instructions spread over 567 bytes!
The Strategy
Problems that appear to be overwhelming at first usually succumb to the all-powerful, all-mighty “divide-and-conquer” strategy, and this problem is no exception to the rule. We’ll identify split points in the disassembled code and reduce it bit by bit, until we end up with small, digestible chunks, which we’ll walk through step by step in Remix’s debugger. In the following diagram, we can see the first split we can make on the disassembled code, which we’ll analyze completely in the next article.
You can find the end result of the entire deconstruction in the deconstruction diagram. Don’t worry if you don’t understand the diagram at first. You’re not supposed to. This series will go through it step by step. Keep it around so you can keep track of the big picture as we go along.
The series is divided into the following set of articles. If you’re up for the challenge, get started with the actual deconstruction in Part II. See you there!
- Deconstructing a Solidity Contract — Part I: Introduction ✔
- Deconstructing a Solidity Contract — Part II: Creation vs. Runtime
- Deconstructing a Solidity Contract — Part III: The Function Selector
- Deconstructing a Solidity Contract — Part IV: Function Wrappers
- Deconstructing a Solidity Contract — Part V: Function Bodies
- Deconstructing a Solidity Contract — Part VI: The Metadata Hash