Suppose you are trying to transmit a message. Convert each character into bits, and each bit into a signal. Then send it, over copper or fiber or air. Try as you might to be careful, what is received on the other side will not be the same as what you began with. Noise never fails to corrupt.

In the 1940s, computer scientists first confronted the unavoidable problem of noise. Five decades later, they came up with an elegant approach to sidestepping it: What if you could encode a message so that it would be obvious, before your recipient even read it, whether it had been garbled? A book can’t be judged by its cover, but this message could.

They called this property local testability, because such a message can be tested super-fast in just a few spots to ascertain its correctness. Over the next 30 years, researchers made substantial progress toward creating such a test, but their efforts always fell short. Many thought local testability would never be achieved in its ideal form.

Now, in a preprint released on November 8, the computer scientist Irit Dinur of the Weizmann Institute of Science and four mathematicians, Shai Evra, Ron Livne, Alex Lubotzky and Shahar Mozes, all at the Hebrew University of Jerusalem, have found it.

[…]

Their new technique transforms a message into a super-canary, an object that testifies to its health better than any other message yet known. Any significant corruption buried anywhere in its superstructure becomes apparent from simple tests at a few spots.

“This is not something that seems plausible,” said Madhu Sudan of Harvard University. “This result suddenly says you can do it.”

[…]

To work well, a code must have several properties. First, the codewords in it should not be too similar: If a code contained the codewords 0000 and 0001, it would only take one bit-flip’s worth of noise to confuse the two words. Second, codewords should not be too long. Repeating bits may make a message more durable, but they also make it take longer to send.

These two properties are called distance and rate. A good code should have both a large distance (between distinct codewords) and a high rate (of transmitting real information).
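
To make these two quantities concrete, here is a minimal Python sketch (the four-codeword toy code is a made-up example, not one from the article) that measures the minimum distance between codewords and the rate of a code.

```python
from itertools import combinations
from math import log2

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(codewords: list[str]) -> int:
    """Smallest distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

def rate(codewords: list[str]) -> float:
    """Real information carried per transmitted bit: log2(#codewords) / length."""
    return log2(len(codewords)) / len(codewords[0])

# A toy code: 4 codewords of length 6, so each codeword carries
# log2(4) = 2 bits of real information in 6 transmitted bits.
code = ["000000", "000111", "111000", "111111"]
print(minimum_distance(code))  # 3: any one- or two-bit corruption is detectable
print(rate(code))              # 0.333...
```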

[…]

To understand why testability is so hard to obtain, we need to think of a message not just as a string of bits, but as a mathematical graph: a collection of vertices (dots) connected by edges (lines).

[…]

Hamming’s work set the stage for the ubiquitous error-correcting codes of the 1980s. He came up with a rule that each message should be paired with a set of receipts, which keep an account of its bits. More specifically, each receipt is the sum of a carefully chosen subset of bits from the message. When this sum has an even value, the receipt is marked 0, and when it has an odd value, the receipt is marked 1. In other words, each receipt is represented by a single bit, which researchers call a parity check or parity bit.

Hamming specified a procedure for appending the receipts to a message. A recipient could then detect errors by attempting to reproduce the receipts, calculating the sums for themselves. These Hamming codes work remarkably well, and they are the starting point for seeing codes as graphs and graphs as codes.
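
As a rough sketch of the receipt idea, the snippet below computes parity bits over chosen subsets of message bits and lets a recipient recompute them. The particular subsets happen to be those of the classic [7,4] Hamming code, but the bookkeeping is generic.

```python
# Sketch of the "receipts" idea: each receipt is the parity (even/odd sum)
# of a chosen subset of message bits. These subsets match the classic
# [7,4] Hamming code's parity bits.
RECEIPT_SUBSETS = [
    [0, 1, 3],   # receipt 0 covers message bits 0, 1, 3
    [0, 2, 3],   # receipt 1 covers message bits 0, 2, 3
    [1, 2, 3],   # receipt 2 covers message bits 1, 2, 3
]

def receipts(bits: list[int]) -> list[int]:
    """One parity bit per subset: 0 if the sum is even, 1 if odd."""
    return [sum(bits[i] for i in subset) % 2 for subset in RECEIPT_SUBSETS]

def encode(message: list[int]) -> list[int]:
    """Append the receipts to the message."""
    return message + receipts(message)

def check(received: list[int]) -> bool:
    """Recompute the receipts from the message bits and compare."""
    k = len(received) - len(RECEIPT_SUBSETS)
    message, claimed = received[:k], received[k:]
    return receipts(message) == claimed

codeword = encode([1, 0, 1, 1])
print(check(codeword))       # True: the receipts match

corrupted = codeword.copy()
corrupted[2] ^= 1            # flip one message bit
print(check(corrupted))      # False: at least one receipt no longer matches
```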

[…]

Expander graphs are distinguished by two properties that can seem contradictory. First, they are sparse: Each node is connected to relatively few other nodes. Second, they have a property called expandedness — the reason for their name — which means that no set of nodes can act as a bottleneck that only a few edges pass through. In other words, every node is well connected to the rest of the graph, despite having only a few connections of its own.
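
The definition can be checked by brute force on a tiny graph. The sketch below (an illustrative example, not a real expander family) measures the worst bottleneck: for every subset of at most half the nodes, it counts the edges leaving that subset and reports the smallest ratio of boundary edges to subset size. In a genuine expander family, that ratio stays bounded away from zero even as the graphs grow.

```python
from itertools import combinations

def edge_boundary(adj: dict, subset: set) -> int:
    """Number of edges with exactly one endpoint inside `subset`."""
    return sum(1 for u in subset for v in adj[u] if v not in subset)

def expansion(adj: dict) -> float:
    """Brute-force edge expansion: minimum over all subsets S with |S| <= n/2
    of (# boundary edges of S) / |S|. Feasible only for tiny graphs."""
    nodes = list(adj)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            best = min(best, edge_boundary(adj, s) / len(s))
    return best

# A 3-regular graph on 8 vertices: two 4-cycles joined by a perfect matching.
# Sparse (each node touches only 3 others), yet no subset is cut off by
# only a few edges.
adj = {i: [] for i in range(8)}
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
             (0, 5), (1, 6), (2, 7), (3, 4)]:
    adj[u].append(v)
    adj[v].append(u)

print(expansion(adj))  # strictly positive: every subset has many edges leaving it
```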

[…]

However, choosing codewords completely at random would make for an unpredictable dictionary that was excessively hard to sort through. In other words, Shannon showed that good codes exist, but his method for making them didn’t work well.

[…]

However, local testability was not possible. Suppose that you had a valid codeword from an expander code, and you removed one receipt, or parity bit, from one single node. That would constitute a new code, which would have many more valid codewords than the first code, since there would be one fewer receipt they needed to satisfy. For someone working off the original code, those new codewords would satisfy the receipts at most nodes — all of them, except the one where the receipt was erased. And yet, because both codes have a large distance, such a new codeword, which looks correct under nearly every local test, could be extremely far from every codeword of the original code. Local testability was simply incompatible with expander codes.
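
A toy version of this effect can be seen even in a simple repetition code (used here only for illustration; it is not an expander code and has a terrible rate). Its receipts demand that each pair of adjacent bits agree, so the only valid codewords are all-zeros and all-ones. Erase a single receipt, and a word appears that passes every remaining check yet sits far from both genuine codewords.

```python
# Toy illustration using an n-bit repetition code: receipt i says that
# bit i and bit i+1 must agree, so the valid codewords are all-zeros
# and all-ones.
n = 20
checks = [(i, i + 1) for i in range(n - 1)]

def violated(word, checks):
    """Indices of the receipts that the word fails to satisfy."""
    return [k for k, (i, j) in enumerate(checks) if word[i] != word[j]]

# Erase the single receipt in the middle.
erased = n // 2 - 1
remaining = [c for k, c in enumerate(checks) if k != erased]

# This word satisfies every remaining receipt...
word = [0] * (n // 2) + [1] * (n // 2)
print(violated(word, remaining))   # []  -- looks locally fine
print(violated(word, checks))      # [9] -- only the erased receipt fails

# ...yet it is very far from both genuine codewords.
dist_to_zeros = sum(word)              # 10
dist_to_ones = sum(1 - b for b in word)  # 10
print(dist_to_zeros, dist_to_ones)
```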

[…]

Local testability was achieved by 2007, but only at the cost of other parameters, like rate and distance. In particular, these parameters would degrade as a codeword became large. In a world constantly seeking to send and store larger messages, these diminishing returns were a major flaw.

[…]

But in 2017, a new source of ideas emerged. Dinur and Lubotzky began working together while attending a yearlong research program at the Israel Institute for Advanced Studies. They came to believe that a 1973 result by the mathematician Howard Garland might hold just what computer scientists sought. Whereas ordinary expander graphs are essentially one-dimensional structures, with each edge extending in only one direction, Garland had created a mathematical object that could be interpreted as an expander graph that spanned higher dimensions, with, for example, the graph’s edges redefined as squares or cubes.

Garland’s high-dimensional expander graphs had properties that seemed ideal for local testability. They must be deliberately constructed from scratch, making them a natural antithesis of randomness. And their nodes are so interconnected that their local characteristics become virtually indistinguishable from how they look globally.

[…]

In their new work, the authors figured out how to assemble expander graphs to create a new graph that leads to the optimal form of locally testable code. They call their graph a left-right Cayley complex.

As in Garland’s work, the building blocks of their graph are no longer one-dimensional edges, but two-dimensional squares. Each information bit from a codeword is assigned to a square, and parity bits (or receipts) are assigned to edges and corners (which are nodes). Each node therefore constrains the values of the bits (on the squares) that can be connected to it.

To get a sense of what their graph looks like, imagine observing it from the inside, standing on a single edge. They construct their graph such that every edge has a fixed number of squares attached. Therefore, from your vantage point you’d feel as if you were looking out from the spine of a booklet. However, from the other three sides of the booklet’s pages, you’d see the spines of new booklets branching from them as well. Booklets would keep branching out from each edge ad infinitum.
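
To make the booklet picture slightly more concrete, here is a sketch of the bookkeeping under one common formulation of a left-right Cayley complex, in which the vertices are the elements of a group, one set of generators acts on the left, the other acts on the right, and each square has corners {g, ag, gb, agb}. The tiny cyclic group and generator sets below are hypothetical choices for illustration only; the authors’ construction uses carefully chosen non-abelian groups and imposes further technical conditions.

```python
# Sketch of the vertex/edge/square bookkeeping for a left-right Cayley
# complex, here over the cyclic group Z_12 with two small generating sets.
# In this abelian toy example, "left" and "right" multiplication coincide,
# so the distinction is invisible; the real construction needs non-abelian
# groups for it to matter.
N = 12
G = range(N)
A = [1, N - 1]   # "left" generators:  A-edge connects g and a + g
B = [2, N - 2]   # "right" generators: B-edge connects g and g + b

def square(g, a, b):
    """Corners of the square defined by (g, a, b): {g, ag, gb, agb}."""
    return frozenset({g, (a + g) % N, (g + b) % N, (a + g + b) % N})

# Every A-edge is the "spine" of exactly |B| squares (one page per b in B),
# and every B-edge is the spine of exactly |A| squares.
for g in G:
    for a in A:
        pages = {square(g, a, b) for b in B}
        assert len(pages) == len(B)
    for b in B:
        pages = {square(g, a, b) for a in A}
        assert len(pages) == len(A)

print("every edge carries the same fixed number of squares")
```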

“It’s impossible to visualize. That’s the whole point,” said Lubotzky. “That’s why it is so sophisticated.”

Crucially, the complicated graph also shares the properties of an expander graph, like sparseness and connectedness, but with a much richer local structure. For example, an observer sitting at one vertex of a high-dimensional expander could use this structure to straightforwardly infer that the entire graph is strongly connected.

“What’s the opposite of randomness? It’s structure,” said Evra. “The key to local testability is structure.”

To see how this graph leads to a locally testable code, consider that in an expander code, if a bit (which is an edge) is in error, that error can only be detected by checking the receipts at its immediately neighboring nodes. But in a left-right Cayley complex, if a bit (a square) is in error, that error is visible from multiple different nodes, including some that are not even connected to each other by an edge.

In this way, a test at one node can reveal information about errors from far away nodes. By making use of higher dimensions, the graph is ultimately connected in ways that go beyond what we typically even think of as connections.

In addition to testability, the new code maintains rate, distance and other desired properties, even as codewords scale, proving the c³ conjecture true. It establishes a new state of the art for error-correcting codes, and it also marks the first substantial payoff from bringing the mathematics of high-dimensional expanders to bear on codes.

[…]

Source: *Researchers Defeat Randomness to Create Ideal Code | Quanta Magazine*