
GST2 Bytecode Deep Dive - OpenZeppelin blog

Written by Eric Decourcy | Mar 18, 2021 4:00:00 AM


This post seeks to break down the bytecode and assembly code used by GST2 (Gas Token 2) in order to understand how it works at a low-level. We will be stepping through the functions used to mint and redeem gas tokens, as well as the bytecode which governs how a gas token “dummy contract” works. Note that this post will assume you understand some Solidity assembly, the concept of a stack machine, and how the EVM reads bytecode.

GST2 functions similarly to GST1 and CHI, two popular and well-known gas tokens. I hope that by illustrating how GST2 works under-the-hood, GST1 and CHI can be better understood as well.

Warning: There are active discussions to remove the SELFDESTRUCT feature prior to the merge to Proof of Stake, as well as gas refunds in general. If this were to happen, gas tokens could no longer be redeemed for gas refunds, which would undermine their value.
This change could occur as early as the London network upgrade (July 2021).
You can read about this (potential) decision in the following articles:

makeChild() bytecode: Initialization data

In this section, we’ll examine exactly how the bytecode deployed in the makeChild function operates and becomes a destructible “dummy” contract. This section specifically covers the initialization data, which is the portion of the bytecode that actually creates the contract, as opposed to the bytecode that is the contract (that portion is discussed in the next section).

The makeChild function contains three lines of assembly code that do all the work.

let solidity_free_mem_ptr := mload(0x40)
mstore(solidity_free_mem_ptr, 0x00756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3)
addr := create(0, add(solidity_free_mem_ptr, 1), 31)

The first line loads the “free memory pointer” stored at address 0x40, which holds the first memory address that is not yet in use and can be safely written to.

The second line stores the child contract creation bytecode in memory, starting at the address held by the free memory pointer. In other words, whatever memory address the free memory pointer references, the bytecode 0x00756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3 will get stored there. Notice that its length is 32 bytes, which fits exactly, because MSTORE always writes one 32-byte word.

The third line creates the child contract using the CREATE instruction, whose arguments are the amount of Ether (in wei) to send, the memory offset of the initialization code, and its length in bytes. The expression add(solidity_free_mem_ptr, 1) evaluates to solidity_free_mem_ptr + 1, the address of the stored bytecode plus 1 byte. This means the contract is created from the 31 bytes of memory that start one byte after the free memory pointer.

This turns the original bytecode:

0x00756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3

Into:

0x756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3

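The 00 at the beginning is gone. It was only padding: this bytecode is 31 bytes long, but mstore always writes a full 32-byte word, so the constant is left-padded with a single zero byte. Skipping that byte matters, because 0x00 is the STOP opcode: if it were included, the creation code would halt immediately and deploy an empty contract.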
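To make the offset arithmetic concrete, here is a minimal Python sketch (not code from GST2) that emulates the mstore followed by create(0, add(ptr, 1), 31) on a zero-initialized memory buffer. It assumes the free memory pointer holds 0x80, the value Solidity initializes it to; in a real call it depends on earlier allocations, but only the relative offsets matter.

```python
# Sketch of the makeChild memory layout: mstore(ptr, WORD), then create(0, ptr + 1, 31).
WORD = 0x00756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a 256-bit value as one big-endian 32-byte word, like MSTORE."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # memory grows with zero bytes
    memory[offset:end] = value.to_bytes(32, "big")


def create_initcode(memory: bytearray, offset: int, size: int) -> bytes:
    """Return the slice of memory that CREATE would run as initcode."""
    return bytes(memory[offset:offset + size])


free_mem_ptr = 0x80  # assumption: Solidity's initial free memory pointer value
memory = bytearray()
mstore(memory, free_mem_ptr, WORD)
initcode = create_initcode(memory, free_mem_ptr + 1, 31)

print(hex(memory[free_mem_ptr]))  # 0x0: the padding byte (STOP) is skipped
print(initcode.hex())             # 756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3
```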

But, what does this bytecode DO?

Well, two things. It will deploy the destructible contract, AND it will be the destructible contract. The instructions for deployment are a necessary portion of the bytecode usually known as initialization code (initcode for short). Embedded within the initcode is the actual contract bytecode that gets deployed and stored on the blockchain, which dictates how the child contract behaves when we call it. This portion of the bytecode is usually called the runtime code.

The initcode

Here is the entire, 31-byte initcode from before:

0x756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3
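Before stepping through it by hand, it can help to see the whole initcode decoded at once. The following is a minimal Python disassembler sketch that only names the handful of opcodes used in this post. The key rule it encodes is that a PUSHn opcode (0x60 to 0x7f) is followed by n bytes of immediate data that are not executed as instructions.

```python
# Minimal disassembler for the opcodes used by the GST2 child bytecode.
OPCODES = {
    0x00: "STOP", 0x18: "XOR", 0x33: "CALLER", 0x52: "MSTORE", 0x57: "JUMPI",
    0x58: "PC", 0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFF: "SELFDESTRUCT",
}


def disassemble(code: bytes):
    """Return one string per instruction: hex offset, mnemonic and any push data."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry n bytes of immediate data
            n = op - 0x5F
            data = code[pc + 1:pc + 1 + n]
            lines.append(f"{pc:02x}: PUSH{n} 0x{data.hex()}")
            pc += 1 + n
        else:
            lines.append(f"{pc:02x}: {OPCODES.get(op, f'UNKNOWN(0x{op:02x})')}")
            pc += 1
    return lines


initcode = bytes.fromhex("756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3")
for line in disassemble(initcode):
    print(line)
# 00: PUSH22 0x6eb3f879cb30fe243b4dfee438691c043318585733ff
# 17: PUSH1 0x00
# 19: MSTORE
# 1a: PUSH1 0x16
# 1c: PUSH1 0x0a
# 1e: RETURN
```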

The first instruction is 0x75, which represents PUSH22: it pushes the next 22 bytes of the bytecode onto the stack as a single item. After this push, the remaining opcodes to evaluate and the stack are:

6000526016600af3

TOP: 6eb3f879cb30fe243b4dfee438691c043318585733ff

The next instruction is 0x60, a PUSH1, which pushes 1 byte to the stack. This byte will be the next two hex digits in the initcode: 00. Our remaining opcodes and stack now look like this:

526016600af3

TOP: 0x00 | 0x6eb3f879cb30fe243b4dfee438691c043318585733ff

Opcode 0x52 is next: this is MSTORE. It will consume our only two stack items, storing the stack item 6eb3f879cb30fe243b4dfee438691c043318585733ff at memory location 0x00.

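But remember that the EVM uses a word size of 32 bytes. MSTORE always writes a full 32-byte word, and it writes the value big-endian, so a shorter value ends up right-aligned. The 22 bytes 6eb3f879cb30fe243b4dfee438691c043318585733ff therefore occupy bytes 10 through 31 of memory, preceded by 10 zero bytes, and the first 32 bytes of EVM memory look like 0x000000000000000000006eb3f879cb30fe243b4dfee438691c043318585733ff.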
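The right-alignment can be checked with a few lines of Python. This sketch only models the layout: a stack item is a 256-bit integer, and MSTORE writes it as a big-endian 32-byte word.

```python
# Sketch: how MSTORE lays out a 22-byte stack value inside a 32-byte memory word.
runtime = bytes.fromhex("6eb3f879cb30fe243b4dfee438691c043318585733ff")

# Stack items are 256-bit integers, so the pushed 22 bytes are just a number.
value = int.from_bytes(runtime, "big")

# MSTORE writes the number as a big-endian 32-byte word, which right-aligns it.
word = value.to_bytes(32, "big")

print(word.hex())  # 000000000000000000006eb3f879cb30fe243b4dfee438691c043318585733ff
print(len(runtime), 32 - len(runtime))  # 22 10: value bytes and zero padding
```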

We still have the following opcodes to execute, and an empty stack:

6016600af3

TOP: (stack is empty)

The 60 16 and 60 0a instructions push (with PUSH1) two new items onto the stack: 0x16 and 0x0a. The remaining opcode and current stack are:

f3

TOP: 0x0a | 0x16

The final opcode f3 is RETURN, which returns a sequence of bytes from memory, popping the starting offset (top of the stack) and then the length of the sequence. In decimal notation, 0x0a is 10, and 0x16 is 22. So RETURN will consume these two elements, returning the memory “starting at byte offset 10, spanning 22 bytes”. This reads bytes 10 through 31 (inclusive), which are none other than 6eb3f879cb30fe243b4dfee438691c043318585733ff. Starting at byte 10 chops off the first 10 bytes, which are just the zero padding explained earlier.

So, this final opcode returns 6eb3f879cb30fe243b4dfee438691c043318585733ff, which is the runtime code that the EVM will store in the newly created account. This is simply how contract creation works in the EVM: whatever the initcode returns becomes the code of the new contract. So this returned data is, by definition, the bytecode of the child contract.
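The whole walk-through above can be reproduced by a tiny interpreter. This is a teaching sketch, not a real EVM: it supports only PUSHn, MSTORE and RETURN, does no gas accounting, and returns whatever bytes RETURN would hand back as the new contract's code.

```python
def run_initcode(code: bytes) -> bytes:
    """Execute initcode built only from PUSHn, MSTORE and RETURN; return the deployed code.

    Teaching sketch, not a real EVM: no gas accounting and no other opcodes.
    """
    stack = []
    memory = bytearray()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32: the next n bytes become one stack item
            n = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += 1 + n
            continue
        if op == 0x52:  # MSTORE: pop offset (top), then value
            offset, value = stack.pop(), stack.pop()
            if len(memory) < offset + 32:
                memory.extend(bytes(offset + 32 - len(memory)))
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0xF3:  # RETURN: pop offset (top), then size
            offset, size = stack.pop(), stack.pop()
            if len(memory) < offset + size:
                memory.extend(bytes(offset + size - len(memory)))
            return bytes(memory[offset:offset + size])
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} is not supported")
        pc += 1
    return b""  # running off the end acts like STOP: nothing is returned


GST2_INITCODE = bytes.fromhex("756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3")
print(run_initcode(GST2_INITCODE).hex())  # 6eb3f879cb30fe243b4dfee438691c043318585733ff
```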

And now we’ll dig into it.

Child contract bytecode

This section digs into the bytecode that actually is the contract once it has been deployed. It is the part of the bytecode passed to create that remains after the initcode has done its job, that is, the runtime code alone. Let’s look at this friendly runtime code and make sense of it:

0x6eb3f879cb30fe243b4dfee438691c043318585733ff

We already know what it’s intended to do: “self-destruct if called by the GST2 token, otherwise throw”.

Upon calling the child contract holding this runtime code, the first instruction executed is 0x6e, which represents PUSH15. It pushes the next 15 bytes of the code, 0xb3f879cb30fe243b4dfee438691c04, onto the stack. Why push these bytes? In case you hadn’t noticed, the GST2 contract address is 0x0000000000b3f879cb30fe243b4dfee438691c04, or 5-bytes-of-zeroes-and-b3f879cb30fe243b4dfee438691c04. Since stack items are 256-bit integers, the leading zero bytes do not change the value and can be left out. So this instruction places the hardcoded GST2 token address on the stack.

At this stage, the bytecode left to be executed and stack are:

0x3318585733ff

TOP: 0xb3f879cb30fe243b4dfee438691c04

0x33 is CALLER, which is akin to msg.sender in regular Solidity. This will put the caller’s address on the stack. Our remaining bytecode and stack now look like this:

0x18585733ff

TOP: msg.sender | 0xb3f879cb30fe243b4dfee438691c04

The next opcode to evaluate is 0x18: XOR. It will consume two stack elements, performing a bitwise XOR operation between the two. Remember that XOR outputs 1 when the bits differ, and 0 when they are equal. So, if msg.sender equals 0xb3f879cb30fe243b4dfee438691c04, the XOR operation will return all zeroes.

Once XOR is applied, the stack contains just one element: 0 if msg.sender is the GST2 contract, and a non-zero value otherwise. For ease of reference, let’s call this result XRES (short for “XOR Result”). It is now the only element on the stack, and we still have some instructions left to execute.

0x585733ff

TOP: XRES
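The comparison trick can be checked in Python. In this sketch the GST2 address is the one quoted above, and 0x1234 is an arbitrary, hypothetical caller used only to show the non-matching case.

```python
# Sketch: the child's access check is CALLER XOR <PUSH15 constant>.
GST2 = 0x0000000000b3F879cb30FE243b4Dfee438691c04  # GST2 token address (20 bytes)

# The 15 bytes pushed by PUSH15, read as a 256-bit stack integer.
PUSHED = int.from_bytes(bytes.fromhex("b3f879cb30fe243b4dfee438691c04"), "big")


def xres(caller: int) -> int:
    """Value left on the stack after CALLER and XOR run."""
    return caller ^ PUSHED


print(PUSHED == GST2)     # True: the five leading zero bytes do not matter
print(xres(GST2))         # 0 when the caller is the GST2 contract
print(xres(0x1234) != 0)  # True for any other caller (0x1234 is an arbitrary example)
```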

The next one is 0x58 (PC), which pushes the value of the program counter onto the stack. The program counter tracks the position of the instruction the EVM is currently executing: it starts at 0 for the very first byte of the code and counts bytes, including the data bytes of PUSH instructions. In other words, it allows the EVM to say “we’re at byte N of the contract bytecode now”. Here, though, this opcode is used for convenience, to serve a different purpose. As we’ll see very soon, all we really need at this point is some value on the stack, and PC is one of the cheapest ways to push one: it costs 2 gas and occupies a single byte, while PUSH1 costs 3 gas and occupies two. In this runtime code the PC instruction sits at byte 18, so the value pushed is 18 (0x12). For now, let’s refer to it as PCRES, for “PC result”.
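The value of PCRES follows from counting bytes, as long as PUSH data is skipped correctly. A small Python sketch that walks the runtime code instruction by instruction and reports where the PC opcode (0x58) sits:

```python
# Sketch: find the value PC pushes, i.e. the byte offset of the PC (0x58) instruction.
runtime = bytes.fromhex("6eb3f879cb30fe243b4dfee438691c043318585733ff")


def instructions(code: bytes):
    """Yield (offset, opcode) for each instruction, skipping PUSH immediate data."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        yield pc, op
        pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)


pcres = next(off for off, op in instructions(runtime) if op == 0x58)
print(pcres)  # 18: PUSH15 + 15 data bytes (0..15), CALLER (16), XOR (17), PC (18)
```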

The remaining bytecode and current stack are:

0x5733ff

TOP: PCRES | XRES

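The next opcode, 0x57, is the JUMPI instruction. It consumes two items from the stack: the jump destination (on top) and a condition. If the condition is any non-zero value, it sets the program counter to the destination; otherwise execution simply continues with the next instruction. Here the destination is PCRES and the condition is XRES.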

If XRES is not zero (when msg.sender IS NOT the GST2 contract), the jump is taken, and the program tries to continue at position PCRES in the contract’s code. This makes execution fail, because in the EVM every jump target must be a JUMPDEST instruction (0x5b). Position PCRES holds the PC instruction, and the bytecode contains no JUMPDEST at all, so the jump is invalid and execution halts with an exception, reverting the call. This is the desired behavior: it is what prevents any caller other than the GST2 contract from destroying child contracts.
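The “no valid JUMPDEST” argument can be made mechanical. This Python sketch computes the set of valid jump destinations the way the EVM defines them (0x5b bytes that are real instructions, not PUSH data) and confirms that offset 18 is not one of them.

```python
# Sketch: compute the valid jump destinations of the child runtime code.
runtime = bytes.fromhex("6eb3f879cb30fe243b4dfee438691c043318585733ff")


def valid_jumpdests(code: bytes) -> set:
    """Offsets of JUMPDEST (0x5b) opcodes that are not inside PUSH data."""
    dests, pc = set(), 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:
            dests.add(pc)
        pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return dests


dests = valid_jumpdests(runtime)
print(dests)        # set()
print(18 in dests)  # False: JUMPI to PCRES (18) is an invalid jump
```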

On the other hand, if XRES is zero (when msg.sender IS the GST2 contract), the jump will not occur. Both elements on the stack will be consumed, and we will continue execution, going now to the last two opcodes, 0x33 and 0xff.

0x33 (which is CALLER) pushes the caller’s address onto the stack, which at this point must be the GST2 contract. Then, 0xff triggers a SELFDESTRUCT, consuming one stack element (the caller’s address pushed just before) as the beneficiary. This destroys the contract, first sending any Ether it holds to the beneficiary, so that balance goes back to the parent GST2 contract. The gas refund for destroying the contract (24,000 gas at the time of writing) is not sent to any contract: it is added to the transaction’s refund counter, which lowers the gas cost of the transaction as a whole.
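Putting the pieces together, here is a minimal Python simulation of the child runtime code for a given caller. It is a sketch, not a real EVM: it supports only the opcodes the child uses and reports the outcome as a string, without modelling balances, gas or refunds. The caller 0x1234 is a hypothetical non-GST2 address.

```python
def run_child(code: bytes, caller: int) -> str:
    """Simulate the child runtime code for a given CALLER and describe the outcome."""
    # Valid JUMPDESTs: 0x5b bytes that are real instructions (not PUSH data).
    jumpdests, i = set(), 0
    while i < len(code):
        if code[i] == 0x5B:
            jumpdests.add(i)
        i += 1 + (code[i] - 0x5F if 0x60 <= code[i] <= 0x7F else 0)

    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSHn
            n = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += 1 + n
            continue
        if op == 0x33:  # CALLER
            stack.append(caller)
        elif op == 0x18:  # XOR
            stack.append(stack.pop() ^ stack.pop())
        elif op == 0x58:  # PC: offset of this very instruction
            stack.append(pc)
        elif op == 0x57:  # JUMPI: pop destination (top), then condition
            dest, cond = stack.pop(), stack.pop()
            if cond != 0:
                if dest not in jumpdests:
                    return f"exceptional halt: invalid jump to {dest}"
                pc = dest
                continue
        elif op == 0xFF:  # SELFDESTRUCT: pop beneficiary
            return f"selfdestruct, beneficiary 0x{stack.pop():040x}"
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} is not supported")
        pc += 1
    return "stop"


RUNTIME = bytes.fromhex("6eb3f879cb30fe243b4dfee438691c043318585733ff")
GST2 = 0x0000000000b3F879cb30FE243b4Dfee438691c04
print(run_child(RUNTIME, GST2))    # selfdestruct, beneficiary 0x0000000000b3f879cb30fe243b4dfee438691c04
print(run_child(RUNTIME, 0x1234))  # exceptional halt: invalid jump to 18
```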

That’s it! Isn’t bytecode fun and easy?

What is the difference between GST2 and CHI?

Here’s the bytecode passed into create2 to deploy child contracts of the CHI contract:

0x746d4946c0e9F43F4Dee607b0eF1fA1c3318585733ff6000526015600bf30000

Essentially, the CHI contract deploys a very similar child contract compared to GST2, with the exception that a few opcodes change.

The CHI contract address has one more byte of leading zeroes than the GST2 contract address, so CHI’s child bytecode can be 1 byte shorter. As a result, the PUSH22 and PUSH15 instructions from GST2 become PUSH21 and PUSH14 instructions in CHI.

Another small difference is that the zero bytes in CHI are at the end of the bytestring rather than at the beginning. This doesn’t make much of a difference, except that we need to pay attention to byte indices. CHI does so in its call to create2. Instead of skipping the first byte and using bytes 1 through 31 (0-indexed), like GST2, the create2 call here uses bytes 0 through 29. Its second and third parameters, 0 and 30, encode this, effectively saying “use the data in memory starting at byte 0 and spanning 30 bytes”. The two trailing zero bytes are simply never read.
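The two (offset, size) choices can be compared side by side. In this Python sketch both 32-byte constants are treated as if stored at offset 0, so the offsets are relative to wherever each contract actually writes the word.

```python
# Sketch: which bytes of each 32-byte constant are used as initcode.
GST2_WORD = 0x00756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3
CHI_WORD = 0x746d4946c0e9F43F4Dee607b0eF1fA1c3318585733ff6000526015600bf30000


def initcode_window(word: int, offset: int, size: int) -> bytes:
    """Bytes read by CREATE/CREATE2 at (offset, size), relative to where the word was stored."""
    return word.to_bytes(32, "big")[offset:offset + size]


gst2 = initcode_window(GST2_WORD, 1, 31)  # bytes 1..31: skips the leading 0x00
chi = initcode_window(CHI_WORD, 0, 30)    # bytes 0..29: skips the trailing 0x0000

print(len(gst2), hex(gst2[0]), hex(gst2[-1]))  # 31 0x75 0xf3
print(len(chi), hex(chi[0]), hex(chi[-1]))     # 30 0x74 0xf3
```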

Although GST2 uses create and CHI uses create2, both share a very important property: the address at which a new contract will be created can be computed deterministically, even from within a smart contract. CHI performs this computation itself when calculating the addresses of the child contracts it destroys.
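For reference, the two address derivations can be written as follows, as specified in the Ethereum Yellow Paper for CREATE and in EIP-1014 for CREATE2. Here ‖ is byte concatenation, 0xff is a single byte, sender is the 20-byte address of the creating contract (GST2 or CHI), nonce is that contract’s account nonce, salt is the 32-byte value passed to create2, and [12:32] keeps the last 20 bytes of the 32-byte Keccak-256 hash.

```latex
\begin{aligned}
\mathrm{addr}_{\mathtt{CREATE}} &= \mathrm{keccak256}\big(\mathrm{rlp}([\,\mathit{sender},\ \mathit{nonce}\,])\big)[12{:}32] \\
\mathrm{addr}_{\mathtt{CREATE2}} &= \mathrm{keccak256}\big(\mathtt{0xff} \,\|\, \mathit{sender} \,\|\, \mathit{salt} \,\|\, \mathrm{keccak256}(\mathit{initcode})\big)[12{:}32]
\end{aligned}
```

Because CHI uses the same initcode for every child, keccak256(initcode) is a constant, so only the salt changes from one child address to the next.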

Also note that the operations of the initcode for the child contract differ, as it returns one byte less from memory. The second-to-last non-zero byte of the bytecode changes from 0x0a to 0x0b, and the fourth-to-last non-zero byte changes from 0x16 to 0x15. These tell the RETURN operation to return the memory starting at byte offset 0x0b (11) and spanning 0x15 (21) bytes, rather than starting at 0x0a (10) and spanning 0x16 (22) bytes as in GST2. This difference is what makes the contract return (and therefore deploy) 21 bytes of runtime code, rather than 22.

And, of course, the 15 address bytes in the GST2 child bytecode are replaced by 14 bytes in the CHI child bytecode, so don’t let that throw you off. They are just a reference to the token contract address.
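Since every difference between the two children follows from the number of leading zero bytes in the token address, both initcodes can be rebuilt from the address alone. The helper below, child_initcode, is hypothetical (it is not taken from either codebase) and assumes the runtime pattern is exactly the one described in this post: PUSHk address, CALLER, XOR, PC, JUMPI, CALLER, SELFDESTRUCT, wrapped in PUSHm runtime, PUSH1 0, MSTORE, PUSH1 m, PUSH1 (32 - m), RETURN.

```python
def child_initcode(token: int) -> bytes:
    """Build GST2/CHI-style child initcode for a 20-byte token address (hypothetical helper)."""
    addr = token.to_bytes(20, "big").lstrip(b"\x00")  # drop leading zero bytes
    # Runtime: PUSHk <addr> CALLER XOR PC JUMPI CALLER SELFDESTRUCT
    runtime = bytes([0x5F + len(addr)]) + addr + bytes.fromhex("3318585733ff")
    m = len(runtime)
    # Initcode: PUSHm <runtime> PUSH1 0 MSTORE PUSH1 m PUSH1 (32 - m) RETURN
    return (bytes([0x5F + m]) + runtime
            + bytes([0x60, 0x00, 0x52, 0x60, m, 0x60, 32 - m, 0xF3]))


GST2 = 0x0000000000b3F879cb30FE243b4Dfee438691c04
CHI = 0x0000000000004946c0e9F43F4Dee607b0eF1fA1c
print(child_initcode(GST2).hex())  # 756eb3f879cb30fe243b4dfee438691c043318585733ff6000526016600af3
print(child_initcode(CHI).hex())   # 746d4946c0e9f43f4dee607b0ef1fa1c3318585733ff6000526015600bf3
```

Each extra leading zero byte in the token address shortens the runtime code by one byte, which in turn lowers both PUSH sizes by one and shifts the RETURN window by one byte.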