Deconstructing a Solidity Contract —Part I: Introduction

Image from http://www.szzljy.com

By Alejandro Santander in collaboration with Leo Arias.

You’re on the road, driving fast in your rare, fully restored 1969 Mustang Mach 1. The sunlight shimmers on the all-original, gorgeous plated rims. It’s just you, the road, the desert, and the never-ending chase of the horizon. Perfection!

In the blink of an eye, your 335 hp beast is engulfed in white smoke, as if transformed into a steam locomotive, and you’re forced to stop on the side of the road. With determination, you pop the hood, only to realize that you have absolutely no idea what you’re looking at. How does this damn machine work? You grab your phone and discover that you have no signal.

Image from http://www.mustangandfords.com

Could this perhaps be an analogy for your current knowledge of dApp development? In the analogy, the Mustang is your set of smart contracts, the rims are all those well-thought-out little details and the ❤ you put into them. And the popping of the hood is you looking into your contract’s EVM bytecode and having absolutely no idea what’s going on.

If this sounds familiar, then not to worry! The purpose of this series of articles is to deconstruct a simple Solidity contract, look at its bytecode, and break it apart into identifiable structures down to the lowest level. We’ll pop the hood on Solidity. By the end of the series, you should feel comfortable when looking at or debugging EVM bytecode. The whole point of the series is to demystify the EVM bytecode produced by the Solidity compiler. And it’s really much simpler than it seems.

Note: This series is aimed at developers who already feel comfortable with and have experience in developing Solidity contracts, but want to understand how things work at a slightly deeper/lower level — that is, how Solidity is translated into EVM bytecode by the Solidity compiler, and how the EVM executes such bytecode. If you aren’t there just yet, I recommend reading this great introduction by Facu Spagnuolo: A Gentle Introduction to Ethereum Programming.

Here’s the contract we’ll deconstruct:

pragma solidity ^0.4.24;
contract BasicToken {
uint256 totalSupply_;
mapping(address => uint256) balances;
constructor(uint256 _initialSupply) public {
totalSupply_ = _initialSupply;
balances[msg.sender] = _initialSupply;
}
function totalSupply() public view returns (uint256) {
return totalSupply_;
}
function transfer(address _to, uint256 _value) public returns (bool) {
require(_to != address(0));
require(_value <= balances[msg.sender]);
balances[msg.sender] = balances[msg.sender] _value;
balances[_to] = balances[_to] + _value;
return true;
}
function balanceOf(address _owner) public view returns (uint256) {
return balances[_owner];
}
}
view raw
BasicToken.sol
hosted with ❤ by GitHub

Note: This contract is susceptible to an overflow attack, but we’re just keeping it simple here for the purpose at hand.

Compiling the contract

To compile the contract, we’ll be using Remix. Go ahead and create a new contract by clicking on the + button on the top left, above the file browser area. Set the filename to BasicToken.sol. Now, paste the above code into the editor section.

In the right-hand section, go to the Settings tab and make sure Enable Optimization is selected. Also, verify that the selected version of the Solidity compiler is “version:0.4.24+commit.e67f0147.Emscripten.clang”. These two details are very important, otherwise you’ll be looking at slightly different bytecode from what will be discussed here.

If you go to the Compile tab and click on the Details button, you should see a popup with all the stuff that the Solidity compiler generates, one of which is a JSON object named BYTECODE that has an “object” property, which is the compiled code of the contract. It looks like this:

608060405234801561001057600080fd5b5060405160208061021783398101604090815290516000818155338152600160205291909120556101d1806100466000396000f3006080604052600436106100565763ffffffff7c010000000000000000000000000000000000000000000000000000000060003504166318160ddd811461005b57806370a0823114610082578063a9059cbb146100b0575b600080fd5b34801561006757600080fd5b506100706100f5565b60408051918252519081900360200190f35b34801561008e57600080fd5b5061007073ffffffffffffffffffffffffffffffffffffffff600435166100fb565b3480156100bc57600080fd5b506100e173ffffffffffffffffffffffffffffffffffffffff60043516602435610123565b604080519115158252519081900360200190f35b60005490565b73ffffffffffffffffffffffffffffffffffffffff1660009081526001602052604090205490565b600073ffffffffffffffffffffffffffffffffffffffff8316151561014757600080fd5b3360009081526001602052604090205482111561016357600080fd5b503360009081526001602081905260408083208054859003905573ffffffffffffffffffffffffffffffffffffffff85168352909120805483019055929150505600a165627a7a72305820a5d999f4459642872a29be93a490575d345e40fc91a7cccb2cf29c88bcdaf3be0029
view raw
BasicToken.evm
hosted with ❤ by GitHub

Yup. That’s completely unreadable (at least for a normal human being).

Deploying the contract

Next, go to the Run section in Remix. At the top, make sure you’re using the Javascript VM. This is basically an embedded Javascript EVM + network, our ideal Ethereum playground. Make sure BasicToken is selected in the ComboBox, and enter the number 10000 in the Deploy input box. Next, click the Deploy button. This should deploy an instance of our BasicToken contract, with an initial supply of 10000 tokens owned by the account currently selected at the top of the account ComboBox, which will hold the totality of our token supply.

Lower in the Run tab, in the Deployed Contracts section, you should see the deployed contract, with fields to interact with its three functions: transfer, balanceOf, and totalSupply. Here, we’ll be able to interact with the instance of the contract we just deployed.

But before that, let’s take a look at exactly what “deploying” the contract means. At the bottom of the page, in the console area, you should see the log “creation of BasicToken pending…”, followed by a transaction entry with various fields: from, to, value, data, logs, and hash. Click on this entry to expand the transaction’s info. Even though abbreviated, you should see that the data/input of the transaction is the same bytecode we presented above. This transaction is sent to the 0x0 address, and as a result, a new contract instance is created, with its own address and code. We’ll examine this process in detail in the next article.

Disassembling the bytecode

To the right of the transaction’s data, still in the console, click on the Debug button. This will activate the Debugger tab in Remix’s right-hand area. Let’s take a look at the Instructions section. If you scroll down, you should see the following:

000 PUSH1 80
002 PUSH1 40
004 MSTORE
005 CALLVALUE
006 DUP1
007 ISZERO
008 PUSH2 0010
011 JUMPI
012 PUSH1 00
014 DUP1
015 REVERT
016 JUMPDEST
017 POP
018 PUSH1 40
020 MLOAD
021 PUSH1 20
023 DUP1
024 PUSH2 0217
027 DUP4
028 CODECOPY
029 DUP2
030 ADD
031 PUSH1 40
033 SWAP1
034 DUP2
035 MSTORE
036 SWAP1
037 MLOAD
038 PUSH1 00
040 DUP2
041 DUP2
042 SSTORE
043 CALLER
044 DUP2
045 MSTORE
046 PUSH1 01
048 PUSH1 20
050 MSTORE
051 SWAP2
052 SWAP1
053 SWAP2
054 SHA3
055 SSTORE
056 PUSH2 01d1
059 DUP1
060 PUSH2 0046
063 PUSH1 00
065 CODECOPY
066 PUSH1 00
068 RETURN
069 STOP
070 PUSH1 80
072 PUSH1 40
074 MSTORE
075 PUSH1 04
077 CALLDATASIZE
078 LT
079 PUSH2 0056
082 JUMPI
083 PUSH4 ffffffff
088 PUSH29 0100000000000000000000000000000000000000000000000000000000
118 PUSH1 00
120 CALLDATALOAD
121 DIV
122 AND
123 PUSH4 18160ddd
128 DUP2
129 EQ
130 PUSH2 005b
133 JUMPI
134 DUP1
135 PUSH4 70a08231
140 EQ
141 PUSH2 0082
144 JUMPI
145 DUP1
146 PUSH4 a9059cbb
151 EQ
152 PUSH2 00b0
155 JUMPI
156 JUMPDEST
157 PUSH1 00
159 DUP1
160 REVERT
161 JUMPDEST
162 CALLVALUE
163 DUP1
164 ISZERO
165 PUSH2 0067
168 JUMPI
169 PUSH1 00
171 DUP1
172 REVERT
173 JUMPDEST
174 POP
175 PUSH2 0070
178 PUSH2 00f5
181 JUMP
182 JUMPDEST
183 PUSH1 40
185 DUP1
186 MLOAD
187 SWAP2
188 DUP3
189 MSTORE
190 MLOAD
191 SWAP1
192 DUP2
193 SWAP1
194 SUB
195 PUSH1 20
197 ADD
198 SWAP1
199 RETURN
200 JUMPDEST
201 CALLVALUE
202 DUP1
203 ISZERO
204 PUSH2 008e
207 JUMPI
208 PUSH1 00
210 DUP1
211 REVERT
212 JUMPDEST
213 POP
214 PUSH2 0070
217 PUSH20 ffffffffffffffffffffffffffffffffffffffff
238 PUSH1 04
240 CALLDATALOAD
241 AND
242 PUSH2 00fb
245 JUMP
246 JUMPDEST
247 CALLVALUE
248 DUP1
249 ISZERO
250 PUSH2 00bc
253 JUMPI
254 PUSH1 00
256 DUP1
257 REVERT
258 JUMPDEST
259 POP
260 PUSH2 00e1
263 PUSH20 ffffffffffffffffffffffffffffffffffffffff
284 PUSH1 04
286 CALLDATALOAD
287 AND
288 PUSH1 24
290 CALLDATALOAD
291 PUSH2 0123
294 JUMP
295 JUMPDEST
296 PUSH1 40
298 DUP1
299 MLOAD
300 SWAP2
301 ISZERO
302 ISZERO
303 DUP3
304 MSTORE
305 MLOAD
306 SWAP1
307 DUP2
308 SWAP1
309 SUB
310 PUSH1 20
312 ADD
313 SWAP1
314 RETURN
315 JUMPDEST
316 PUSH1 00
318 SLOAD
319 SWAP1
320 JUMP
321 JUMPDEST
322 PUSH20 ffffffffffffffffffffffffffffffffffffffff
343 AND
344 PUSH1 00
346 SWAP1
347 DUP2
348 MSTORE
349 PUSH1 01
351 PUSH1 20
353 MSTORE
354 PUSH1 40
356 SWAP1
357 SHA3
358 SLOAD
359 SWAP1
360 JUMP
361 JUMPDEST
362 PUSH1 00
364 PUSH20 ffffffffffffffffffffffffffffffffffffffff
385 DUP4
386 AND
387 ISZERO
388 ISZERO
389 PUSH2 0147
392 JUMPI
393 PUSH1 00
395 DUP1
396 REVERT
397 JUMPDEST
398 CALLER
399 PUSH1 00
401 SWAP1
402 DUP2
403 MSTORE
404 PUSH1 01
406 PUSH1 20
408 MSTORE
409 PUSH1 40
411 SWAP1
412 SHA3
413 SLOAD
414 DUP3
415 GT
416 ISZERO
417 PUSH2 0163
420 JUMPI
421 PUSH1 00
423 DUP1
424 REVERT
425 JUMPDEST
426 POP
427 CALLER
428 PUSH1 00
430 SWAP1
431 DUP2
432 MSTORE
433 PUSH1 01
435 PUSH1 20
437 DUP2
438 SWAP1
439 MSTORE
440 PUSH1 40
442 DUP1
443 DUP4
444 SHA3
445 DUP1
446 SLOAD
447 DUP6
448 SWAP1
449 SUB
450 SWAP1
451 SSTORE
452 PUSH20 ffffffffffffffffffffffffffffffffffffffff
473 DUP6
474 AND
475 DUP4
476 MSTORE
477 SWAP1
478 SWAP2
479 SHA3
480 DUP1
481 SLOAD
482 DUP4
483 ADD
484 SWAP1
485 SSTORE
486 SWAP3
487 SWAP2
488 POP
489 POP
490 JUMP
491 STOP
492 LOG1
493 PUSH6 627a7a723058
500 SHA3
501 INVALID
502 INVALID
503 SWAP10
504 DELEGATECALL
505 GASLIMIT
506 SWAP7
507 TIMESTAMP
508 DUP8
509 INVALID
510 INVALID
511 INVALID
512 SWAP4
513 LOG4
514 SWAP1
515 JUMPI
516 INVALID
517 CALLVALUE
518 INVALID
519 BLOCKHASH
520 INVALID
521 SWAP2
522 INVALID
523 INVALID
524 INVALID
525 INVALID
526 CALLCODE
527 SWAP13
528 DUP9
529 INVALID
530 INVALID
531 RETURN
532 INVALID
533 STOP
534 INVALID
535 STOP
536 STOP
537 STOP
538 STOP
539 STOP
540 STOP
541 STOP
542 STOP
543 STOP
544 STOP
545 STOP
546 STOP
547 STOP
548 STOP
549 STOP
550 STOP
551 STOP
552 STOP
553 STOP
554 STOP
555 STOP
556 STOP
557 STOP
558 STOP
559 STOP
560 STOP
561 STOP
562 STOP
563 STOP
564 STOP
565 INVALID
566 LT
view raw
BasicToken.asm
hosted with ❤ by GitHub

To make sure that you’re following the same set of opcodes described in this series, please compare what you see in Remix with the bytecode in this gist.

This is the disassembled bytecode of the contract. Disassembly sounds rather intimidating, but it’s quite simple, really. If you scan the raw bytecode by bytes (two characters at a time), the EVM identifies specific opcodes that it associates to particular actions. For example:

0x60 => PUSH
0x01 => ADD
0x02 => MUL
0x00 => STOP
...

The disassembled code is still very low-level and difficult to read, but as you will see, we can start making sense out of it.

Opcodes

Before we get started on our ambitious endeavour of completely deconstructing the bytecode, you’re going to need a basic tool set for understanding individual opcodes such as PUSH, ADD, SWAP, DUP, etc. An opcode, in the end, can only push or consume items from the EVM’s stack, memory, or storage belonging to the contract. That’s it.

To see all the available opcodes that the EVM can process, check out this handy gist from Pyethereum showing a list of the opcodes. To understand what each one does and how it works, Solidity’s assembly documentation is a great reference. Even though it’s not a one-on-one relationship with the raw opcodes, it’s pretty close (it’s actually Yul, an intermediate language between Solidity and EVM bytecode). Finally, if you can speak scientician, there’s always the Ethereum Yellow Paper to fall back on.

There’s no point in reading these resources from start to finish right now; just keep them around for reference. We’ll be using them as we go along.

Instructions

Each line in the disassembled code above is an instruction for the EVM to execute. Each instruction contains an opcode. For example, let’s take one of those instructions, instruction 88, which pushes the number 4 to the stack. This particular disassembler interprets instructions as follows:

88 PUSH1 0x04
|  |     |     
|  |     Hex value for push.
|  Opcode.
Instruction number.

Even though the disassembled code brings us one step closer to understanding what’s going on, it’s still quite intimidating. We’re going to need a strategy for deconstructing the whole thing, which has 596 instructions!

The Strategy

Problems that appear to be overwhelming at first usually succumb to the all-powerful, all-mighty “divide-and-conquer” strategy, and this problem is no exception to the rule. We’ll identify split points in the disassembled code and reduce it bit by bit, until we end up with small, digestible chunks, which we’ll walk through step by step in Remix’s debugger. In the following diagram, we can see the first split we can make on the disassembled code, which we’ll analyze completely in the next article.

You can find the end result of the entire deconstruction in the deconstruction diagram. Don’t worry if you don’t understand the diagram at first. You’re not supposed to. This series will go through it step by step. Keep it around so you can keep track of the big picture as we go along.

The series is divided into the following set of articles. If you’re up for the challenge, get started with the actual deconstruction in Part II. See you there!

  1. Deconstructing a Solidity Contract — Part I: Introduction ✔
  2. Deconstructing a Solidity Contract — Part II: Creation vs. Runtime
  3. Deconstructing a Solidity Contract — Part III: The Function Selector
  4. Deconstructing a Solidity Contract — Part IV: Function Wrappers
  5. Deconstructing a Solidity Contract — Part V: Function Bodies
  6. Deconstructing a Solidity Contract — Part VI: The Metadata Hash