EVM Deep Dives: The Path to Shadowy Super Coder 🥷 💻 - Part 2

Let's take a trip down memory lane

Mar 05, 2022

This is the second installment in a series of articles that will deep dive into the EVM and build the foundational knowledge needed to become a “shadowy super coder”. This article builds on the knowledge gained in Part 1, so if you haven’t read it yet, I encourage you to do so.

In Part 1 we explored how the EVM knows which bytecode to run depending on which contract function is called. This helped us build an understanding of the call stack, calldata, function signatures & EVM opcode instructions.

In Part 2 we’ll take a trip down “memory” lane and provide a comprehensive review of what contract memory is and how it works under the EVM hood.

A Trip Down Memory Lane

As you will recall, in Part 1 we looked at the default 1_Storage.sol contract from Remix.

We then generated the bytecode and zoomed in on the part relating to function selection. In this article, we’re going to focus on the first 5 bytes of the contract runtime bytecode.

6080604052

60 80                       =   PUSH1 0x80
60 40                       =   PUSH1 0x40
52                          =   MSTORE
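The following minimal Python sketch of a disassembler shows how those 5 bytes split into 3 instructions. The disassemble helper is hypothetical. It only understands PUSH1 to PUSH32 (opcodes 0x60 to 0x7f, each followed by 1 to 32 bytes of immediate data) and the three memory opcodes covered below. Anything else is reported as unknown, so it is not a full EVM disassembler.

```python
# Only the memory opcodes discussed in this article; everything else is "unknown".
OPCODES = {0x51: "MLOAD", 0x52: "MSTORE", 0x53: "MSTORE8"}


def disassemble(hex_code: str) -> list[str]:
    """Turn a hex string of EVM bytecode into readable instructions (toy version)."""
    code = bytes.fromhex(hex_code)
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            arg = code[pc + 1 : pc + 1 + n]
            out.append(f"PUSH{n} 0x{arg.hex()}")
            pc += 1 + n
        else:
            out.append(OPCODES.get(op, f"UNKNOWN 0x{op:02x}"))
            pc += 1
    return out


print(disassemble("6080604052"))  # ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```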

These 5 bytes represent the initialisation of the “free memory pointer”. To fully understand what that means and what these bytes do, we must first build an understanding of the data structures that govern contract memory.

Memory Data Structure

Contract memory is a simple byte array, where data can be stored in 32-byte (256-bit) or 1-byte (8-bit) chunks and read in 32-byte (256-bit) chunks. The image below illustrates this structure along with the read/write functionality of contract memory.

source: https://takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf

This functionality is determined by the 3 dedicated opcodes that operate on memory.

MSTORE (x, y) - Store a 32 byte (256-bit) value “y” starting at memory location “x”
MLOAD (x) - Load 32 bytes (256-bit) starting at memory location “x” onto the call stack
MSTORE8 (x, y) - Store a 1 byte (8-bit) value “y” at memory location “x” (the least significant byte of the 32-byte stack value).

You can think of the memory location as simply the array index of where to start writing/reading the data. If you want to write/read more than 1 byte of data you simply continue writing or reading from the next array index.
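The byte-array model can be sketched in a few lines of Python. This is a simplified toy, not a real EVM implementation. The hypothetical Memory class below keeps a zero-filled bytearray and treats every offset as a plain array index. Anticipating the next sections, it also grows in whole 32-byte words whenever an access touches bytes past its current end. Words are written big-endian, as the EVM does.

```python
class Memory:
    """Toy model of EVM memory: a zero-initialised, byte-addressed array."""

    def __init__(self) -> None:
        self.data = bytearray()

    def _expand(self, offset: int, size: int) -> None:
        # Grow to cover [offset, offset + size), rounded up to a multiple of 32 bytes.
        new_len = (offset + size + 31) // 32 * 32
        if new_len > len(self.data):
            self.data.extend(bytes(new_len - len(self.data)))

    def mstore(self, offset: int, value: int) -> None:
        """MSTORE: write a 32-byte big-endian word starting at offset."""
        self._expand(offset, 32)
        self.data[offset:offset + 32] = value.to_bytes(32, "big")

    def mstore8(self, offset: int, value: int) -> None:
        """MSTORE8: write only the least significant byte of value at offset."""
        self._expand(offset, 1)
        self.data[offset] = value & 0xFF

    def mload(self, offset: int) -> int:
        """MLOAD: read the 32 bytes starting at offset as a big-endian integer."""
        self._expand(offset, 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")


mem = Memory()
mem.mstore(0x00, 0xFF)     # fills indices 0x00-0x1f; 0xff lands in the last byte (0x1f)
mem.mstore8(0x20, 0x1122)  # only the low byte, 0x22, is written, at index 0x20
print(mem.mload(0x00) == 0xFF, mem.data[0x20] == 0x22)  # True True
```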

EVM Playground

This EVM playground will help solidify your understanding of what these 3 opcodes do and how memory locations work. Click Run and the curled arrow at the top right to jump through the opcodes and see how the stack and memory are altered. (There are comments above the opcodes to describe what each section does)

While walking through the EVM playground above, you may have noticed a few strange occurrences. First, when we wrote a single byte 0x22 using MSTORE8 to memory location 32 (0x20), the memory grew from 32 bytes to 64 bytes.

You may ask: what’s with all the additional zeros when we only added 1 byte?

Memory Expansion

When your contract accesses memory, you pay the base gas cost of the memory opcode. If you access an area of memory that hasn’t been touched before, there is an additional memory expansion cost for using it for the first time.

Memory is expanded in 32-byte (256-bit) increments whenever a read or write touches previously untouched memory space.

Memory expansion costs scale linearly for roughly the first 724 bytes and quadratically after that.
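These numbers follow from the memory cost function in the Ethereum Yellow Paper. For a memory of a words (32-byte chunks), the total charge is 3a + floor(a^2 / 512) gas, and an expansion pays the difference between the new and old totals.

Because of the rounding down, the quadratic term is zero up to 22 words (704 bytes) and first adds gas at 23 words (736 bytes). The 724-byte figure appears to be 32 × √512, the point where a^2 / 512 reaches 1.

The sketch below computes this in Python. The function names are our own.

```python
G_MEMORY = 3        # gas per 32-byte word (linear term)
QUAD_DIVISOR = 512  # divisor of the quadratic term


def memory_cost(size_bytes: int) -> int:
    """Total gas charged for a memory of size_bytes, rounded up to whole words."""
    words = (size_bytes + 31) // 32
    return G_MEMORY * words + words * words // QUAD_DIVISOR


def expansion_cost(old_size: int, new_size: int) -> int:
    """Extra gas paid when memory grows from old_size to new_size bytes."""
    return memory_cost(new_size) - memory_cost(old_size)


print(expansion_cost(32, 64))              # 3: one extra word, purely linear
print(memory_cost(704), memory_cost(736))  # 66 70: quadratic term adds 1 gas at 23 words
```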

Above, our memory was 32 bytes before we wrote 1 byte at location 32. At that point we began writing into untouched memory; as a result, the memory was expanded by another 32-byte increment to 64 bytes.

Note that all locations in memory are initially well-defined as zero, which is why we see 2200000000000000000000000000000000000000000000000000000000000000 added to our memory.

Remember Memory is a Byte Array

The second thing you may have noticed occurred when we ran an MLOAD from memory location 33 (0x21). It returned the following value to the call stack:

3300000000000000000000000000000000000000000000000000000000000000

We were able to start our read from an offset that is not a multiple of 32.

Remember memory is a byte array meaning we can start our reads (and our writes) from any memory location. We are not constrained to multiples of 32. Memory is linear and can be addressed at the byte level.
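Here is a minimal Python sketch of an unaligned MLOAD on a plain bytearray. The hypothetical mload helper zero-extends memory in 32-byte steps, following the expansion rule described above.

The 0x33 byte at index 0x21 is an assumption chosen to reproduce the value shown, since the playground’s write of that byte is not listed here. Reading 32 bytes from 0x21 touches bytes up to 0x40, so in this model the read itself expands memory to 96 bytes.

```python
def mload(memory: bytearray, offset: int) -> int:
    """Read the 32 bytes starting at any byte offset, expanding memory if needed."""
    end = offset + 32
    if end > len(memory):
        # Memory grows to the next multiple of 32 bytes, filled with zeros.
        memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))
    return int.from_bytes(memory[offset:end], "big")


memory = bytearray(64)  # two words, all zero
memory[0x20] = 0x22     # as if MSTORE8(0x20, 0x22)
memory[0x21] = 0x33     # assumed: MSTORE8(0x21, 0x33)
value = mload(memory, 0x21)
print(f"{value:064x}")  # '33' followed by 62 zeros
print(len(memory))      # 96
```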

In Solidity, memory variables can only be newly created inside a function. They are either newly instantiated complex types such as arrays or structs (e.g. via new int[...]) or copies of a storage-referenced variable.

Now we have an understanding of the data structures let’s return to the free memory pointer.

Free Memory Pointer

The free memory pointer is simply a pointer to the location where free memory starts. It ensures smart contracts keep track of which memory locations have been written to and which haven’t.

This protects against a contract overwriting some memory that has been allocated to another variable.

When a variable is written to memory the contract will first reference the free memory pointer to determine where the data should be stored.

It then updates the free memory pointer by noting how much data is to be written to the new location. Adding these 2 values (the current pointer and the data size in bytes) yields where the new free memory will start.

freeMemoryPointer + dataSizeBytes = newFreeMemoryPointer
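The pointer update can be sketched in Python under a simplifying assumption: the pointer lives as a 32-byte big-endian word at memory offset 0x40 and starts at 0x80. These are Solidity’s values, covered in the next section.

The allocate helper is hypothetical. It returns where the caller should write its data and bumps the pointer by the data size. Only the pointer is updated here; the data itself would be written separately.

```python
FMP_SLOT = 0x40  # memory offset where the free memory pointer is kept


def read_fmp(memory: bytearray) -> int:
    """Read the current free memory pointer."""
    return int.from_bytes(memory[FMP_SLOT:FMP_SLOT + 32], "big")


def allocate(memory: bytearray, data_size_bytes: int) -> int:
    """Reserve data_size_bytes and return the start of the reserved region."""
    free_memory_pointer = read_fmp(memory)
    new_free_memory_pointer = free_memory_pointer + data_size_bytes
    memory[FMP_SLOT:FMP_SLOT + 32] = new_free_memory_pointer.to_bytes(32, "big")
    return free_memory_pointer


memory = bytearray(0x80)                                     # reserved area, all zero
memory[FMP_SLOT:FMP_SLOT + 32] = (0x80).to_bytes(32, "big")  # initial pointer value
first = allocate(memory, 32)   # one 32-byte word
second = allocate(memory, 64)  # two words
print(hex(first), hex(second), hex(read_fmp(memory)))  # 0x80 0xa0 0xe0
```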

Bytecode

As mentioned before, the free memory pointer is defined at the start of the runtime bytecode through these 5 bytes (3 opcodes).

60 80                       =   PUSH1 0x80
60 40                       =   PUSH1 0x40
52                          =   MSTORE

These effectively state that the free memory pointer is located in memory at byte 0x40 (64 in decimal) and has a value of 0x80 (128 in decimal).

The immediate question you may have is why the values 0x40 & 0x80 are used above. The answer can be found in the following statement.

Solidity’s memory layout reserves four 32-byte slots (the scratch space spans two of them):
0x00 - 0x3f (64 bytes): scratch space
0x40 - 0x5f (32 bytes): free memory pointer
0x60 - 0x7f (32 bytes): zero slot

We can see that 0x40 is the location predefined by Solidity for the free memory pointer. The value 0x80 is merely the first memory byte that is available to write to after the 4 reserved 32-byte slots.

We’ll quickly run through what each reserved section does.

Scratch space: can be used between statements, i.e. within inline assembly, and for hashing methods.
Free memory pointer: the currently allocated memory size, i.e. the start location of free memory; 0x80 initially.
Zero slot: used as the initial value for dynamic memory arrays and should never be written to.

Memory in a Real Contract

To consolidate what we’ve learned so far we’re going to look at how memory and the free memory pointer update within real solidity code.

I’ve created a MemoryLane contract and intentionally kept it extremely simple. It has a single function that merely declares two bytes32 memory arrays, a and b, of lengths 5 & 2, and then assigns b[0] a value of 1. Despite the simplicity, there is a lot that goes on when these 3 lines of code are executed.

To see how this Solidity code executes within the EVM, copy it into the Remix IDE. Once it’s copied, you can compile the code, deploy it, run the memoryLane() function and then enter debugging mode to step through the opcodes (see here for instructions on how to do this). I have extracted a simplified version into an EVM Playground and will run through it below.

The simplified version organises the opcodes sequentially, removing any JUMPs and any code that isn’t relevant to memory manipulation. Comments have been added to the code to provide context to what is being done. The code is split into 6 distinct sections, which we will delve into.

I cannot stress enough how important it is to use the playground and step through the opcodes yourself. This will greatly enhance your learning. Now let’s dig into the 6 sections.

Free Memory Pointer Initialisation (EVM Playground Lines 1-15)

First, we have “free memory pointer initialisation” which we have discussed above. A value of 0x80 (128 in decimal) is pushed onto the stack. This is the value of the free memory pointer and is determined by Solidity’s memory layout. At this stage, we have nothing in memory.

Next, we push the free memory pointer location 0x40 (64 in decimal), again determined by Solidity’s memory layout.

Finally, we call MSTORE, which pops the top stack item (0x40) as the memory location to write to and the next item (0x80) as the value to write.

This leaves us with an empty stack but we have now populated some memory. This memory representation is in hexadecimal where each character represents 4 bits.

We have 192 hexadecimal characters in memory which means we have 96 bytes (1 byte = 8 bits = 2 hexadecimal characters).

If we refer back to Solidity’s memory layout we were told the first 64 bytes would be allocated as scratch space and the next 32 would be for the free memory pointer.

That’s exactly what we have below.
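The same three steps can be replayed with a tiny Python interpreter. It is a hypothetical sketch that models only PUSH1 and MSTORE, not a full EVM.

The run shows MSTORE popping the offset 0x40 before the value 0x80. The result is an empty stack and 96 bytes (192 hex characters) of memory whose third 32-byte word holds 0x80.

```python
def run(code: bytes) -> tuple[list[int], bytearray]:
    """Execute PUSH1/MSTORE-only bytecode and return the final stack and memory."""
    stack: list[int] = []
    memory = bytearray()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x60:                 # PUSH1: push the next byte
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x52:               # MSTORE: pop offset first, then value
            offset = stack.pop()
            value = stack.pop()
            end = offset + 32
            if end > len(memory):      # expand to a multiple of 32 bytes
                memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))
            memory[offset:end] = value.to_bytes(32, "big")
            pc += 1
        else:
            raise ValueError(f"opcode 0x{op:02x} not modelled")
    return stack, memory


stack, memory = run(bytes.fromhex("6080604052"))
print(stack)                    # []
print(len(memory.hex()))        # 192 hex characters = 96 bytes
print(memory[0x40:0x60].hex())  # 62 zeros followed by '80'
```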

Memory Allocation Variable “a” & Free Memory Pointer Update (EVM Playground Lines 16-34)

For the remaining sections, we’re going to skip to the end state of each section and give a high-level overview of what happened for brevity. The individual opcode steps can be seen via the EVM playground.

Next, memory is allocated for variable “a” (bytes32[5]) and the free memory pointer is updated.

The compiler will have determined how much space is required from the array size and the default array element size.

Remember that elements in memory arrays in Solidity always occupy multiples of 32 bytes (this is even true for bytes1[], but not for bytes and string).

The size of the array multiplied by 32 bytes tells us how much memory we need to allocate.

In this case that calculation 5 * 32 yields 160 or 0xa0 in hex. We can see this being pushed onto the stack and added to the current free memory pointer 0x80 (128 in decimal) to get the new free memory pointer value.

This returns 0x120 (288 in decimal) which we can see has been written to the free memory pointer location.

The memory location of variable “a” (0x80) is kept on the stack so it can be referenced later if needed. 0xffff represents a JUMP location and can be ignored since it isn’t relevant to memory manipulation.

Memory Initialisation Variable “a” (EVM Playground Lines 35-95)

Now that the memory has been allocated and the free memory pointer updated we need to initialise the memory space for variable “a”. Since the variable is just declared and not assigned it will be initialised with the zero value.

To do this, the EVM uses CALLDATACOPY, which takes 3 arguments:

memoryOffset (which memory location to copy the data to)
calldataOffset (byte offset in the calldata to copy from)
size (byte size to copy)

In our case, the memoryOffset is the memory location for variable “a” (0x80). The calldataOffset is the actual size of our calldata. We don’t want to copy any of the calldata, and any bytes read past the end of the calldata are returned as zeros, so copying from that offset initialises the memory with the zero value. Finally, the size is 0xa0 or 160 bytes, since that is the size of the variable.

We can see our memory has expanded to 288 bytes (this includes the zero slot), and the stack again holds the memory location of the variable and a JUMP location.
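Copying from calldataOffset = calldata size produces zeros because CALLDATACOPY pads any bytes read past the end of the calldata with zeros. The Python sketch below models that rule.

It assumes a hypothetical 4-byte calldata (just a function selector; the bytes below are a placeholder, not the real memoryLane() selector). It also starts from the 96 bytes of memory left by the free memory pointer initialisation.

```python
def calldatacopy(memory: bytearray, calldata: bytes,
                 mem_offset: int, data_offset: int, size: int) -> None:
    """Copy calldata into memory, zero-padding any bytes past the end of calldata."""
    chunk = calldata[data_offset:data_offset + size]
    chunk = chunk + bytes(size - len(chunk))  # bytes beyond the calldata read as zero
    end = mem_offset + size
    if end > len(memory):                     # expand memory in 32-byte steps
        memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))
    memory[mem_offset:end] = chunk


calldata = bytes.fromhex("deadbeef")  # hypothetical 4-byte selector (placeholder)
memory = bytearray(96)                # state after the free memory pointer initialisation
calldatacopy(memory, calldata, 0x80, len(calldata), 0xA0)  # calldataOffset = calldata size
print(len(memory))                        # 288
print(memory[0x80:0x120] == bytes(160))   # True
```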

Memory Allocation Variable “b” & Free Memory Pointer Update (EVM Playground Lines 96-112)

This is the same as the memory allocation and free memory pointer update for variable “a” except this time it is for “bytes32[2] memory b”.

The free memory pointer is updated to 0x160 (352 in decimal), which is equal to the previous free memory pointer (288) plus the size of the new variable in bytes (64).

Note that the free memory pointer has updated in memory to 0x160 and we now have the memory location for variable “b” (0x120) on the stack.

Memory Initialisation Variable “b” (EVM Playground Lines 113-162)

Same as memory initialisation of variable “a”.

Note that memory has increased to 352 bytes. The stack still holds memory locations for the 2 variables.

Assign Value to b[0] (EVM Playground Lines 163-207)

Finally, we get to assigning a value to array “b” index 0. The code states that b[0] should have a value of 1.

The value 0x01 is pushed onto the stack. A bit shift left (SHL) occurs next; however, the shift amount is 0, so our value doesn’t change.

Next, the array index to be written to (0x00) is pushed onto the stack, and a check verifies that it is less than the length of the array (0x02). If it isn’t, execution jumps to a different part of the bytecode that handles this error state.

The MUL (multiply) & ADD opcodes are used to determine where in memory the value needs to be written for it to correspond to the correct array index.

0x20 (32 in decimal) * 0x00 (0 in decimal) = 0x00

Remember that memory array elements are 32 bytes each, so this value is the offset from the start of the array to the start of the element. Given we are writing to index 0, we have no offset.

0x00 + 0x120 = 0x120 (288 in decimal)

ADD is used to add this offset value to the memory location for variable “b”. Given our offset was 0 we will write our data straight to the assigned memory location.

Finally, an MSTORE stores the value 0x01 to this memory location 0x120.

The image below shows the system state at the end of the function execution. All the stack items have been popped off.

Note that in Remix a few items are actually left on the stack (a JUMP location and the function signature); however, they are not relevant to memory manipulation and have therefore been omitted from the EVM playground.

Our memory has been updated to include the b[0] = 1 assignment: on the third-to-last line of our memory, a 0 value has turned into a 1.

You can verify that the value is at the correct memory location: b[0] should occupy locations 0x120 - 0x13f (bytes 288 - 319 in decimal).
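To tie the six sections together, here is a Python sketch that replays the memory effects of memoryLane() as described above. The helpers are hypothetical simplifications:

- Zero-initialisation is done word by word instead of with CALLDATACOPY.
- The stored value is the 32-byte word 0x01.
- An out-of-bounds index raises a Python IndexError where the real bytecode jumps to its error-handling code.

```python
WORD = 32
FMP_SLOT = 0x40  # Solidity's free memory pointer location


def read_word(mem: bytearray, offset: int) -> int:
    """Read a 32-byte big-endian word."""
    return int.from_bytes(mem[offset:offset + WORD], "big")


def write_word(mem: bytearray, offset: int, value: int) -> None:
    """Write a 32-byte big-endian word, expanding memory in whole words."""
    end = offset + WORD
    if end > len(mem):
        mem.extend(bytes((end + WORD - 1) // WORD * WORD - len(mem)))
    mem[offset:end] = value.to_bytes(WORD, "big")


def new_fixed_array(mem: bytearray, length: int) -> int:
    """Allocate and zero-initialise a bytes32[length] memory array; return its address."""
    start = read_word(mem, FMP_SLOT)
    write_word(mem, FMP_SLOT, start + length * WORD)  # bump the free memory pointer
    for i in range(length):                            # zero-fill each element
        write_word(mem, start + i * WORD, 0)
    return start


def store_element(mem: bytearray, base: int, length: int, index: int, value: int) -> None:
    """Bounds-check the index, then write at base + 32 * index (the MUL and ADD steps)."""
    if not index < length:
        raise IndexError("array index out of bounds")
    write_word(mem, base + WORD * index, value)


mem = bytearray()
write_word(mem, FMP_SLOT, 0x80)  # 6080604052
a = new_fixed_array(mem, 5)      # bytes32[5] memory a
b = new_fixed_array(mem, 2)      # bytes32[2] memory b
store_element(mem, b, 2, 0, 1)   # b[0] = 1, stored as the word 0x01
print(hex(a), hex(b), hex(read_word(mem, FMP_SLOT)), len(mem))  # 0x80 0x120 0x160 352
```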

There we have it 🎉! That was a lot of information to take in, but we now have a solid understanding of how contract memory works. This will serve us well the next time we need to write some Solidity code.

When you’re jumping through some contract opcodes and see certain memory locations that keep popping up (0x40), you’ll now know exactly what they mean.

Next in the series, we “Demystify Storage Slot Packing” in EVM Deep Dives - Part 3.

Until next time.

noxx

Twitter @noxx3xxon
