Speed Boost from the Past

tl;dr: just like another Rust reimplementation, this new one is massively faster and, during the optimization, a magical algorithm from the past made everything even faster!

I have not written anything in quite a long time! A lot of work, running, climbing: I have been quite busy! However, I now have a short break, and I was wondering whether there was something cool to share from one of my side projects. So… how to begin…

Grids and Diagonals

Beware: I will spare you all the theoretical technicalities, including where the following problem comes from. Rest assured that it is indeed a fun problem, and I will definitely share the details once I have some results!

The maths problem I’m interested in relates to grids, partitions and similar structures. Consider two fixed-size lists of booleans, $A = (a_0, a_1, \dots, a_n)$ and $B = (b_0, b_1, \dots, b_n)$, for some fixed $n$ (so each list has $n+1$ entries); for simplicity, each entry takes a value in $\{0,1\}$. From the lists, we build a grid $G$ where each entry $g_{i,j}$ is $1$ exactly when both $a_i$ and $b_j$ are $1$, i.e. $g_{i,j} = a_i \cdot b_j$.

For example, fix $n = 4$ and take the lists $A = [1,0,0,1,1]$ and $B=[0,1,1,0,1]$. To build the grid $G$, we put $A$ and $B$ on two sides of a square and fill it in with the appropriate values:

         b0  b1  b2  b3  b4
          0   1   1   0   1
       ---------------------
a0  1 |   0   1   1   0   1
a1  0 |   0   0   0   0   0
a2  0 |   0   0   0   0   0
a3  1 |   0   1   1   0   1
a4  1 |   0   1   1   0   1

Grid from $A$ and $B$.
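To make the construction concrete, here is a minimal Rust sketch (an illustration, not the code from my project; `build_grid` and `render` are hypothetical names) that stores each list as a slice of `0`/`1` bytes and computes $g_{i,j} = a_i \cdot b_j$, with rows following $A$ and columns following $B$:

```rust
/// Builds the grid G with g[i][j] = a[i] * b[j]: a cell is 1 exactly when
/// both a_i and b_j are 1. Rows follow A, columns follow B.
fn build_grid(a: &[u8], b: &[u8]) -> Vec<Vec<u8>> {
    a.iter()
        .map(|&ai| b.iter().map(|&bj| ai * bj).collect::<Vec<u8>>())
        .collect()
}

/// Renders the grid one row per line, e.g. "0 1 1 0 1".
fn render(grid: &[Vec<u8>]) -> String {
    grid.iter()
        .map(|row| row.iter().map(|v| v.to_string()).collect::<Vec<_>>().join(" "))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `build_grid(&[1, 0, 0, 1, 1], &[0, 1, 1, 0, 1])` should give back the five rows of the grid above.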

Clearly, different $A$ and $B$ give different grids $G$, and I’m interested in the grids with a specific property: a grid $G$ is special if each diagonal contains at least one $1$. Concretely, for each $k \in \{-n, \dots, n\}$, we have to check whether $1 \in \{g_{i,j} \mid i-j = k\}$.
“Why?” you might ask, but this is what I need!

The grid from the example has diagonals $(0,1,1,1,1,1,1,0,1)$, listed from $k = n$ down to $k = -n$: two diagonals contain no $1$, so the grid is sadly not special 😢.
But don’t worry: if we instead take $A = [1,0,0,0,1]$ and $B=[1,1,1,1,1]$, we get a special grid!
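The diagonal check can be sketched the same way. The hypothetical `diagonals` helper below (again an illustration, not the project code) scans each $k$ from $n$ down to $-n$ and records whether some $g_{i,j} = a_i \cdot b_j = 1$ has $i - j = k$; a grid is special when every flag is $1$:

```rust
/// For lists a, b of length n + 1, returns one flag per diagonal k = i - j,
/// listed from k = n down to k = -n. A flag is 1 when the diagonal contains
/// at least one cell g[i][j] = a[i] * b[j] equal to 1.
fn diagonals(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len(), "A and B must have the same length");
    let n = a.len() as isize - 1;
    (-n..=n)
        .rev()
        .map(|k| {
            // Walk the cells (i, i - k) that lie inside the grid.
            let hit = (0..=n).any(|i| {
                let j = i - k;
                (0..=n).contains(&j) && a[i as usize] == 1 && b[j as usize] == 1
            });
            hit as u8
        })
        .collect()
}

/// A grid is special when every diagonal contains at least one 1.
fn is_special(a: &[u8], b: &[u8]) -> bool {
    diagonals(a, b).iter().all(|&d| d == 1)
}
```

For the first example, `diagonals` should return $(0,1,1,1,1,1,1,0,1)$, while `is_special` should accept $A = [1,0,0,0,1]$, $B = [1,1,1,1,1]$.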

Good, easy peasy. Anything more? Well… out of all the special grids, the focus is on the ones generated by lists $A,B$ with the minimal total number of $1$s. For example, $A = [1,0,0,0,1]$ and $B=[1,1,1,1,1]$ have a total of $7$ ones and, by brute-forcing all the possible solutions, one can see that this is indeed the minimum. On top of that, many different pairs of lists reach it!

A: (0, 4), B: (0, 1, 2, 3, 4)
A: (0, 1, 4), B: (0, 1, 2, 4)
A: (0, 2, 4), B: (0, 1, 3, 4)
A: (0, 3, 4), B: (0, 2, 3, 4)
A: (0, 1, 2, 4), B: (0, 1, 4)
A: (0, 1, 3, 4), B: (0, 2, 4)
A: (0, 2, 3, 4), B: (0, 3, 4)
A: (0, 1, 2, 3, 4), B: (0, 4)

All the pairs of lists that generate a special grid with the minimal total of $7$ ones for $n = 4$. The numbers denote the indices that contain a $1$, e.g. $A: (0,4)$ is $A = [1,0,0,0,1]$.
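A naive exhaustive search for a fixed $n$ is easy to sketch too. The version below (hypothetical names, no pruning at all) encodes each list as a bitmask, collects the differences $i - j$ over all pairs of $1$s, and keeps the special pairs with the smallest total; for $n = 4$ it should report a minimum of $7$ ones and eight minimal pairs:

```rust
use std::collections::HashSet;

/// Indices of the 1s among the lowest n + 1 bits of `mask`.
fn ones(mask: u32, n: u32) -> Vec<i32> {
    (0..=n).filter(|&i| (mask >> i) & 1 == 1).map(|i| i as i32).collect()
}

/// Naive exhaustive search (no pruning): tries every pair of lists of
/// length n + 1, encoded as bitmasks, and keeps the special pairs with
/// the smallest total number of 1s. Returns (minimum, minimal pairs).
fn brute_force(n: u32) -> (usize, Vec<(Vec<i32>, Vec<i32>)>) {
    assert!(n < 15, "4^(n+1) pairs: only meant for tiny n");
    let mut best = usize::MAX;
    let mut solutions = Vec::new();
    for a in 0u32..(1 << (n + 1)) {
        for b in 0u32..(1 << (n + 1)) {
            let (sa, sb) = (ones(a, n), ones(b, n));
            let size = sa.len() + sb.len();
            // Diagonal k = i - j is covered when some a_i = b_j = 1.
            let covered: HashSet<i32> = sa
                .iter()
                .flat_map(|&i| sb.iter().map(move |&j| i - j))
                .collect();
            // There are 2n + 1 possible differences, from -n to n.
            if covered.len() == (2 * n + 1) as usize {
                if size < best {
                    best = size;
                    solutions.clear();
                }
                if size == best {
                    solutions.push((sa, sb));
                }
            }
        }
    }
    (best, solutions)
}
```

This visits all $2^{2(n+1)}$ pairs, which is exactly the blow-up the tricks below try to avoid.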

Perfect, we are ready for the question: what is the minimal size, i.e. the minimal total number of $1$s in $A$ and $B$, for each $n$?

Brute Forcing

Sadly, I was unable to find an easy solution out there.¹ The only trivial result is that the minimal size cannot be less than $\sqrt{2n+1}$.² For now, brute force looks like the only possible approach.
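One short counting argument gives a bound of this kind. Write $|A|$ and $|B|$ for the number of $1$s in each list: the grid then contains exactly $|A| \cdot |B|$ ones, each lying on exactly one of the $2n+1$ diagonals, and a special grid needs at least one $1$ per diagonal. The AM-GM inequality then bounds the total:

\[
|A| \cdot |B| \;\ge\; 2n+1
\quad\Longrightarrow\quad
|A| + |B| \;\ge\; 2\sqrt{|A| \cdot |B|} \;\ge\; 2\sqrt{2n+1} \;\ge\; \sqrt{2n+1}
\]

For $n = 4$ this gives at least $6$ ones, while the minimum found by brute force is $7$, so the bound is not tight.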

For a given $n$, there are $2^{2\cdot(n-2)}$ choices for $A,B$, which quickly turns into an unreasonable number of pairs for which to build the grid and check whether it is special! To reduce that number, I introduced several tricks:

Store the current minimum size and the corresponding solutions: after selecting $A,B$, if the new size is already bigger than the current minimum, skip the pair and move to the next one. Otherwise, check the grid: if it is special and its size equals the current minimum, store the pair; if its size is smaller, save the new minimum and erase the current solutions. In this way, we skip the grid computation and check for pairs that are already too big, but we still have to iterate over all the pairs.
Make it multi-threaded, clearly!³
Use integers instead of lists: $A,B$ become integers whose bits are the list entries, so all operations are effectively bitwise operations.
Maximize the amount of known information. For example, the first and last entries of both lists must always be $1$, since the two corner diagonals ($k = n$ and $k = -n$) each contain a single cell, $g_{n,0} = a_n b_0$ and $g_{0,n} = a_0 b_n$. Since the computations are done sequentially for increasing $n$, use the minimal size for $(n-1)$ as the starting candidate for the minimal size for $n$! Additionally, there is an upper bound too: the previous minimal size plus one!⁴
Instead of navigating the whole search space in arbitrary order, follow a specific pattern.
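Several of these tricks fit in a short sketch. Assuming $1 \le n \le 30$ so that everything fits in a `u64`, the hypothetical `search` below fixes the four corner bits, encodes $A$ and $B$ as integers, skips pairs whose popcount already exceeds the best size found, and checks every diagonal with shifts and ANDs. It is single-threaded and does not use the $(n-1)$ bounds, so it is only an illustration, not my actual code:

```rust
/// Bitwise special check: bit i of `a` is a_i, bit j of `b` is b_j.
/// Diagonal k = i - j >= 0 is covered iff a_{j+k} = b_j = 1 for some j,
/// i.e. (a & (b << k)) != 0; diagonal -k is the same with a and b swapped.
fn is_special_bits(a: u64, b: u64, n: u32) -> bool {
    (0..=n).all(|k| (a & (b << k)) != 0 && (b & (a << k)) != 0)
}

/// Exhaustive search with the corner bits fixed to 1 and popcount pruning.
/// Returns (minimal size, minimal pairs as bitmasks).
fn search(n: u32) -> (u32, Vec<(u64, u64)>) {
    assert!((1..=30).contains(&n), "sketch assumes 1 <= n <= 30");
    let corners: u64 = 1 | (1 << n); // a_0 = a_n = 1 (same for b)
    let free = n - 1; // interior bits: indices 1..=n-1
    let mut best = u32::MAX;
    let mut solutions = Vec::new();
    for ma in 0u64..(1 << free) {
        let a = corners | (ma << 1);
        for mb in 0u64..(1 << free) {
            let b = corners | (mb << 1);
            let size = a.count_ones() + b.count_ones();
            if size > best {
                continue; // already worse than the current minimum
            }
            if is_special_bits(a, b, n) {
                if size < best {
                    best = size; // new minimum: forget older solutions
                    solutions.clear();
                }
                solutions.push((a, b));
            }
        }
    }
    (best, solutions)
}
```

With the corners fixed, each list has only $n - 1$ free bits, and the special check becomes at most $2(n+1)$ shift-and-AND operations.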

What do I mean by the last point?

Lost in the Archive

After thinking for some time, it became clear that it would be best to iterate only over the elements with a given weight (number of $1$s). Basically, iterate over the number of $1$s in $A$ and then only construct the $B$s that have the remaining allowed number of $1$s.

Concretely, if the previous minimal size for $(n-1)$ is $l$, combinatorics gives the maximal number of possible combinations:

\[\sum_{i=0}^{l-3} \binom{n-2}{i} \cdot \binom{n-1}{l-3-i} = \binom{2n - 3}{l - 3}\]

which is way, way, waaaayyy less than $2^{2\cdot(n-2)}$!
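The equality above is Vandermonde’s identity. As a quick numerical sanity check, the sketch below computes both sides exactly with `u128`; `binom` and `candidate_count` are hypothetical helpers, and $l$ is just an input here, not a value from my runs:

```rust
/// Binomial coefficient C(n, k), or 0 when k > n. After step i the
/// accumulator equals C(n, i + 1), so every division is exact.
fn binom(n: u64, k: u64) -> u128 {
    if k > n {
        return 0;
    }
    let mut r: u128 = 1;
    for i in 0..k {
        r = r * (n - i) as u128 / (i + 1) as u128;
    }
    r
}

/// Left-hand side of the formula: split the l - 3 free 1s between
/// n - 2 positions (first factor) and n - 1 positions (second factor).
/// Assumes n >= 2 and l >= 3.
fn candidate_count(n: u64, l: u64) -> u128 {
    (0..=l - 3)
        .map(|i| binom(n - 2, i) * binom(n - 1, l - 3 - i))
        .sum()
}
```

For any admissible `n` and `l`, `candidate_count(n, l)` should equal `binom(2 * n - 3, l - 3)`, and the result can be compared directly against `1u128 << (2 * (n - 2))`.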

Perfect, a clever theoretical way to reduce the number of checks. Now, how to implement this?

During my search adventure on the web, I found a curious source:⁵ HAKMEM, an MIT AI Lab memo from 1972, in which Bill Gosper gives exactly this trick (shown here in C):

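unsigned nexthi_same_count_ones(unsigned a) {
  /* works for any word length; a must be non-zero (otherwise c == 0) */
  unsigned c = (a & -a);            /* lowest set bit of a */
  unsigned r = a + c;               /* the carry clears the lowest block of 1s */
  return (((r ^ a) >> 2) / c) | r;  /* move the leftover 1s to the bottom */
}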

ITEM 175 (Gosper): To get the next higher number with the same number of 1 bits.

Bingo!
Exactly the trick I needed to quickly iterate!
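For reference, the same trick translates almost verbatim to Rust; the only change is that `-a` on an unsigned integer becomes `a.wrapping_neg()`. The hypothetical `same_weight_masks` helper below uses it to list every mask with exactly `k` ones over `bits` positions, in increasing order, which is the building block for iterating over the $A$s (and then the $B$s) of a given weight:

```rust
/// Gosper's hack (HAKMEM item 175) in Rust: the next larger integer
/// with the same number of 1 bits. `a` must be non-zero.
fn next_same_ones(a: u64) -> u64 {
    let c = a & a.wrapping_neg(); // lowest set bit
    let r = a + c; // the carry clears the lowest block of 1s
    (((r ^ a) >> 2) / c) | r // re-insert the leftover 1s at the bottom
}

/// All masks over `bits` bit positions with exactly `k` ones, increasing.
fn same_weight_masks(bits: u32, k: u32) -> Vec<u64> {
    assert!(bits < 63, "keeps a + c from overflowing u64");
    if k > bits {
        return Vec::new();
    }
    if k == 0 {
        return vec![0]; // Gosper's step would divide by zero here
    }
    let limit = 1u64 << bits;
    let mut a = (1u64 << k) - 1; // smallest mask with k ones
    let mut out = Vec::new();
    while a < limit {
        out.push(a);
        a = next_same_ones(a);
    }
    out
}
```

For instance, `same_weight_masks(5, 2)` should yield the ten masks 3, 5, 6, 9, 10, 12, 17, 18, 20, 24, without ever visiting a mask of the wrong weight.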

Wrapping Up

So, how fast did the code get?

--------------------------------------------------
Code iteration (n = 15)       Time (s)    Gain (x)
--------------------------------------------------
Original Python                 88.336         ---

Rust: mere translation          10.056      8.78 x
Rust: size trick                 1.356     65.15 x

Rust: mini improv.               6.821     12.95 x
Rust: mini + LLM MT              0.703    125.66 x

Rust: all improv.                1.411     62.61 x
Rust: all + my MT                0.348    253.84 x

Timing table for different iterations of the code. "LLM MT" indicates the multi-threading solution from an LLM, "my MT" is my ad-hoc solution.

As expected, the improvements make the code faster, with one small peculiarity: running the size trick on a single thread (1.356 s) is slightly faster than my multi-threaded code restricted to a single thread (1.411 s). This makes sense, since the latter carries extra threading overhead: additional checks, plus spawning the single solving thread and receiving its messages.

However, as soon as the number of threads increases, the gain is definitely noticeable!
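As an illustration only (this is not my actual multi-threaded code), one possible design in the same spirit, with worker threads that search a share of the space and send their results as messages to a collector, can be sketched with `std::thread::scope` (Rust 1.63+) and `std::sync::mpsc`. The round-robin split over the interior bits of $A$, the names, and the $n \le 30$ limit are all assumptions of this sketch:

```rust
use std::sync::mpsc;
use std::thread;

/// Bitwise special check, as in the single-threaded sketch.
fn is_special_bits(a: u64, b: u64, n: u32) -> bool {
    (0..=n).all(|k| (a & (b << k)) != 0 && (b & (a << k)) != 0)
}

/// Splits the interior bits of A round-robin across `threads` workers.
/// Each worker sends its local (min size, pairs) over a channel and the
/// collector merges them into the global minimum.
fn parallel_search(n: u32, threads: u64) -> (u32, Vec<(u64, u64)>) {
    assert!((1..=30).contains(&n) && threads > 0);
    let corners: u64 = 1 | (1 << n);
    let space = 1u64 << (n - 1); // choices for the interior bits of one list
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for t in 0..threads {
            let tx = tx.clone();
            s.spawn(move || {
                let mut best = u32::MAX;
                let mut found = Vec::new();
                for ma in (t..space).step_by(threads as usize) {
                    let a = corners | (ma << 1);
                    for mb in 0..space {
                        let b = corners | (mb << 1);
                        let size = a.count_ones() + b.count_ones();
                        if size > best || !is_special_bits(a, b, n) {
                            continue;
                        }
                        if size < best {
                            best = size;
                            found.clear();
                        }
                        found.push((a, b));
                    }
                }
                tx.send((best, found)).expect("collector is alive");
            });
        }
    });
    drop(tx); // close the channel so the receiving loop ends
    let mut best = u32::MAX;
    let mut solutions = Vec::new();
    for (size, found) in rx {
        if size < best {
            best = size;
            solutions = found;
        } else if size == best {
            solutions.extend(found);
        }
    }
    (best, solutions)
}
```

Each worker keeps only a local minimum, so the collector must compare sizes when merging; the order of the returned pairs depends on thread scheduling.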

--------------------------------------------------
Code iteration (n = 25)       Time (s)    Gain (x)
--------------------------------------------------
Rust: mini + LLM MT           73175.46         ---
Rust: all + my MT              7468.55      9.80 x

Timing table for different iterations of the code and a higher value of $n$.

Concretely, getting the multi-threading right is essential. My solution is almost 10 times faster than the LLM one at $n = 25$ and, most importantly, lets me tailor it to the available resources!⁶

The moral of this coding story?

“If you search well enough, you will find solutions in the past.”

Footnotes