On-chain indexing

Problem

One of the first problems you face when working with an event history retrieval in EVM — is the inability of most RPC nodes to properly filter the events' history. eth_getLogs allows you to retrieve events only in ascending order, and all the public RPC nodes have limitations over the number of blocks you can scan at once.

It causes a huge problem when you want to perform a simple task such as "retrieve the last ten events". In general, you will have to backward-traverse all the blockchain history until you find the first ten events. The dramatic downside of this problem is that having limits for the scanned block count of the public RPC nodes means that you will have to make hundreds or even thousands of queries until extracting these events. Moreover, users will have to wait minutes just to see their last ten messages. The "good news" is that you will never have such a problem: any public RPC node bans you after the first hundred such queries, and your user will just close the dApp.

We figured out how to optimize such queries drastically.

First of all, we have to store the information about the last X messages for the certain recipient in our smart contracts. Secondly, we have to optimize our smart contract execution as much as possible to reduce gas costs since executing anything inside the smart contract is very expensive.

Solution

In Solidity you have the default type for numbers - Uint256. Internally, EVM aligns memory cells in the same chunks of 32 bytes, so using such variable means having no memory and gas overheads for working with it.

The great way to speed up retrieval is to know the block numbers of the blocks where messages are stored which significantly reduces amounts of queries to the RPC node and removes redundant queries at all.

Optimization

We decided that storing info about the last ten messages would be enough for the majority of possible UX cases (i.e. that's the amount you usually want to show to your user when she opens the dApp).

The question is — how to put ten block numbers into one 32-byte variable without gas overhead?

We see no blockchain is even close to getting to the 2,100,000,000 blocks in the next few years. However, there are blockchains which already have more than 17,000,000 blocks. It means that the block number does not fit a 3-byte number, but perfectly fits a 4-byte.

School-grade math says that you can't put ten 4-byte numbers into one 32-byte variable. But all RPC nodes are able to scan at least 500 blocks for events per one request. It means that having exact block numbers is not important, so we can shrink the precision by one byte and still be able to retrieve messages efficiently. Therefore, we will put ten 3-byte numbers into a 32-byte variable using it as a Ring Buffer (https://www.geeksforgeeks.org/introduction-and-array-implementation-of-circular-queue/). The first byte is used as a buffer pointer, the second one — is for nothing (for now), and the last 30 bytes are split into the 3-byte chunks (full capacity of ten elements).

The Ring Buffer data structure was used deliberately to reduce the number of operations, hence, gas costs.

But EVM support for the bitwise operations is quite poor, so, writing such a code in a straightforward way leads to higher gas consumption and unhappy users.

That's why we decided to predefine all the bit-masks as constants in our smart contract (the horrifying screenshot above). It helps us to save gas consumption on generating these masks in runtime.

As a result, we are able to create a poorly looking but perfectly optimized function for generating these 32-byte numbers:

function storeBlockNumber(uint256 indexValue, uint256 blockNumber) public pure returns (uint256) {
    blockNumber = blockNumber & 0xffffff; // 3 bytes
    uint256 currIdx = indexValue & indexF;
    if (currIdx == 0) {
        return (indexValue & empty1) | index1 | (blockNumber * shift192);
    } else
    if (currIdx == index1) {
        return (indexValue & empty2) | index2 | (blockNumber * shift168);
    } else
    if (currIdx == index2) {
        return (indexValue & empty3) | index3 | (blockNumber * shift144);
    } else
    if (currIdx == index3) {
        return (indexValue & empty4) | index4 | (blockNumber * shift120);
    } else
    if (currIdx == index4) {
        return (indexValue & empty5) | index5 | (blockNumber * shift096);
    } else
    if (currIdx == index5) {
        return (indexValue & empty6) | index6 | (blockNumber * shift072);
    } else
    if (currIdx == index6) {
        return (indexValue & empty7) | index7 | (blockNumber * shift048);
    } else
    if (currIdx == index7) {
        return (indexValue & empty8) | index8 | (blockNumber * shift024);
    } else
    if (currIdx == index8) {
        return (indexValue & empty9) | index9 | blockNumber;
    } else {
        return (indexValue & empty0) | (blockNumber * shift216);
    }
}

Now, we create mapping from recipient to messages index in our smart contract:

mapping (uint256 => uint256) public recipientToPushIndex;

After that, we add two new lines of code into the message-sending function:

uint256 current = recipientToPushIndex[recipient];
recipientToPushIndex[recipient] = storeBlockNumber(current, block.number / 256);

Done.

A few simple tricks allow us to retrieve users' last messages without the necessity to traverse all historic events.

The client-side code to decode the index value will look like this:

function decodeIndexValue(val: Uint8Array): number[] {
	const idx = val[0];
	const vals = [
		(val[2] << 16) + (val[3] << 8) + val[4],
		(val[5] << 16) + (val[6] << 8) + val[7],
		(val[8] << 16) + (val[9] << 8) + val[10],
		(val[11] << 16) + (val[12] << 8) + val[13],
		(val[14] << 16) + (val[15] << 8) + val[16],
		(val[17] << 16) + (val[18] << 8) + val[19],
		(val[20] << 16) + (val[21] << 8) + val[22],
		(val[23] << 16) + (val[24] << 8) + val[25],
		(val[26] << 16) + (val[27] << 8) + val[28],
		(val[29] << 16) + (val[30] << 8) + val[31],
	];
	
	const res: number[] = [];
	for (let i = idx + 10; i > idx; i -= 1) {
		const ri = i % 10;
		if (vals[ri] === 0) {
			break;
		}
		res.push(vals[ri] * 256);
	}
	
	return res;
}

Moreover, we decided to put the previous value of the index into every emitted event. It means that by reading the event you will get the info about previous ten events as well. Now we can traverse not only the last ten messages, but all the messages history of the user in an efficient and optimized way.

event BroadcastPush(
    address indexed sender,
    uint256 contentId,
    uint256 previousEventsIndex // index for the previous 10 events
);

You can try this approach using our BlockNumberRingBufferIndex.sol library. It could be used for any EVM events.

Last updated