tl;dr: similarly to another rust reimplementation,
a new reimplementation is massively faster and, during the optimization, a magical
algorithm from
the past makes everything even faster!
I have not wrote anything for quite a long time! A lot of working, running, climbing. I have been quite busy! However, I have a small break and I was wondering if there is something cool to share from one of my side-projects. So… how to begin…
Beware, I will spare you all the theoretical technicalities and where the following problem appears. Rest assured that it is indeed a funny problem that I will definitely share once I have some results!
The maths-problem I’m interested relates to grids, partitions and similar structure. Consider two fixed sized lists of booleans, meaning that there are $A = (a_0, a_1, \dots, a_n)$ and $B = (b_0, b_1, \dots, b_n)$ for some fixed length $n$ and each entry is a boolean value, for simplicity consider ${0,1}$. From the lists, we build a grid $G$ where each entry $g_{i,j}$ denotes if both $a_i$ and $b_j$ are $1$.
For example, consider we fix $n = 4$ and have the lists $A = [1,0,0,1,1]$ and $B=[0,1,1,0,1]$. To build the grid $G$, we put $A$ and $B$ on two sides of a square and start filling up with the appropriate values:
b0 b1 b2 b3 b4
0 1 1 0 1
--------------------
a0 1 | 0 1 1 0 1
a1 0 | 0 0 0 0 0
a2 0 | 0 0 0 0 0
a3 1 | 0 1 1 0 1
a4 1 | 0 1 1 0 1
Clearly, different $A$ and $B$ will provide a different grid $G$ and I’m interested in the grids
that achieve a specific property: a grid $G$ is special if each diagonal has at least a $1$.
Basically, for each $k \in [-n,n]$, we have to check if $1 \in [g_{i,j} \mid i-j = k ]$.
“Why?” you might ask but this is what I need!
The grid from the example has diagonals $(0,1,1,1,1,1,1,0,1)$ denoting that there are two
diagonals without a $1$ meaning that the grid is sadly non special 😢.
But you should not be worrying because, if we instead take $A = [1,0,0,0,1]$ and $B=[1,1,1,1,1]$,
we get a special grid!
Good, easy peasy, anything more? Well… out of all the special grid, the focus is in the one that are generated by $A,B$ that has the minimal amount of $1$s. For example, $A = [1,0,0,0,1]$ and $B=[1,1,1,1,1]$ have a total of $7$ ones and, by brute forcing all the possible solutions, one can see that this is indeed the minimum amount. On top of that, we get many possible pairs of special lists!
A: (0, 4), B: (0, 1, 2, 3, 4)
A: (0, 1, 4), B: (0, 2, 3, 4)
A: (0, 2, 4), B: (0, 1, 3, 4)
A: (0, 3, 4), B: (0, 1, 2, 4)
A: (0, 1, 2, 4), B: (0, 3, 4)
A: (0, 1, 3, 4), B: (0, 2, 4)
A: (0, 2, 3, 4), B: (0, 1, 4)
A: (0, 1, 2, 3, 4), B: (0, 4)
Perfect, we are ready for the question: what is the minimal lists size for each $n$?
Sadly, I was unable to find an easy solution out there1. The only trivial result is that the minimal size cannot be less than $\sqrt{2n+1}$.2 For now, brute forcing looks as the only possible solution.
For a given $n$, there are $2^{2\cdot(n-2)}$ choices for $A,B$ which, quite quickly, turns into an “unreasonable to compute” amount of lists to generate, compute the grid and check that it is special! To reduce such an amount, I have been introducing different tricks:
Storing the current size minimum and solutions so that, after selecting $A,B$, if the new size is already bigger than the current minimum, skip the pair and go to the next. If equal, store the pair and if smaller, save the new minimum and erase the current solutions. In this way, we skip the grid computation/check for already bad pairs but we have to still iterate other all the pairs.
Make it multi-thread, clearly!3
Use integers instead of lists meaning that $A,B$ are integers and all operations are effectively bit-wise operation.
Maximize the amount of information known. For example, the first and last entry must always be $1$ since they are the corners in the diagonals which each has a single pair of entry. Since the computation are done sequentially for increasing $n$, use the fact that the minimal size for $(n-1)$ is the possible minimal size for $n$ too! Additionally, there is a upper bound too which is the previous size plus one!4
Instead of navigating all the space at random, follow a distinct pattern.
What do I mean with the last point?
After thinking for some time, it is clear that it would be best to iterate only the elements that have a given weight/size. Basically, iterate on the number of $1$s that $A$ has and then only construct $B$s that has the allowed amount of $1$s too.
Concretely, if the previous minimal size for $(n-1)$ is $l$, then we get the maximal amount of possible combination using combinatorics:
\[\sum_{i=0}^{l-3} \binom{n-2}{i} \cdot \binom{n-1}{l-3-i} = \binom{2n - 3}{l - 3}\]which is way, way, waaaayyy less than $2^{2\cdot(n-2)}$!
Perfect, a clever theoretical way to reduce the number of checks. Now, how to implement this?
During the search adventure on the web, I found a curious source5: HAKMEM a MIT tech report from 1972 where Bill Gosper:
unsigned nexthi_same_count_ones(unsigned a) {
/* works for any word length */
unsigned c = (a & -a);
unsigned r = a+c;
return (((r ^ a) >> 2) / c) | r);
}
Bingo!
Exactly the trick I needed to quickly iterate!
So, how fast did the code got?
---------------------------------------------------
Code Iteration n=15 Time (s) Gain (x)
---------------------------------------------------
Original Python 88.336 ---
Rust: mere translation 10.056 8.78 x
Rust: size trick 1.356 65.15 x
Rust: mini improv. 6.821 12.95 x
Rust: mini + LLM MT 0.703 125.66 x
Rust: all improv. 1.411 62.61 x
Rust: all + my MT 0.348 253.84 x
As expected, improvements makes the code faster except there is a small peculiarity: executing the size trick on a single thread is slightly faster than my multi-thread code with a single thread. This makes sense since this last solution has additional threading checks, spawning and receiving messages from the single solving thread.
However, as soon as the number of thread increases, the gain is definitely noticeable!
---------------------------------------------------
Code Iteration n=25 Time (s) Gain (x)
---------------------------------------------------
Rust: mini + LLM MT 73175.46 ---
Rust: all + my MT 7468.55 9.80 x
Concretely, the multi-threading is so essential to get it right. My solution is way faster than the LLM one and, most importantly, allows to effectively tailor suite the resources!6
The moral of this coding-story?
“If you search well enough, you will find solutions in the past.”
if you have any tip, please leat me know! ↩
Why? Well, there are $2n+1$ diagonals to fill up and the total number of elements in the grid is $l^2$ where $l$ is the length of $A$ (and $B$ too). Hocus pocus and we get that $l^2 = 2n+1$. ↩
on top, make the multi-thread simple. In my case, home-made thread-pool that handles the thread ad-hoc with my algorithm. All LLM were proposing absurd solutions and the known frameworks are either an overkill or does not allow me to do the fine-tuning tricks I need. ↩
quick proof: the $l$ ones for both $A$ and $B$ have $a_{n-1} = b_{n-1} = 1$ otherwise the last diagonal cannot be $1$. For $n$, move these two elements $a_{n-1} \to a_{n}$ and $b_{n-1} \to b_{n}$ which opens the second-to-last diagonal. The intuition is that either shuffling the other elements will fill up everything (thus no added elements for the size) or by adding either $a_{n-1}$ or $b_{n-1}$ will provide a special solution. Thus, either the same or one element more. ↩
the LLM MT was quickly and irresponsibly spawning threads that got my computing-laptop to basically run out of CPU, RAM and swap space! The laptop was so unresponsive that it was listing a $800\%$ of CPU usage despite having only four cores. Free cores, anyone? ↩