Weeknotes — September 22, 2023
Review the latest research, protocol updates, and community insights for the Tableland network.
Begin transmission…
SNARKs still don’t scale on their own… but you can kind of scale them enough to be useful
One of the best examples of usable zero-knowledge proof systems in the industry right now is (surprisingly) coming from the Mina protocol. In addition to providing nice libraries to work with, the Mina team has come up with an interesting approach to scaling recursive SNARKs, based on a functional “scan” construct. The functional scan comes up a lot in programming. And you can represent it with data structures like a Merkle Mountain Range, a binary numeral tree, a history tree, or…
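If scans are new to you, here’s a minimal Rust sketch (with plain addition standing in for whatever operation you’re accumulating): a scan is like a fold, except it keeps every intermediate accumulator, which is what makes it representable with those tree structures.

```rust
// A minimal sketch of a functional scan: fold over the input, but keep
// every intermediate accumulator instead of just the final one.
fn scan<T: Copy>(items: &[T], init: T, combine: impl Fn(T, T) -> T) -> Vec<T> {
    let mut acc = init;
    let mut out = Vec::with_capacity(items.len());
    for &x in items {
        acc = combine(acc, x);
        out.push(acc);
    }
    out
}

fn main() {
    // Running sums over [1, 2, 3, 4] yield [1, 3, 6, 10].
    assert_eq!(scan(&[1, 2, 3, 4], 0, |a, b| a + b), vec![1, 3, 6, 10]);
}
```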
What’s cool/useful here is that you can leverage all the work done in those related projects to reduce the compute and space requirements for scaling SNARK computations. This is useful because the most expensive thing about a recursive SNARK is… the recursive part 🤪. So if we can reduce the number of “folds” required to compute a top-level proof, well that’s great, and if we can reduce the amount of space our scan requires, well that’s even better!
It is also pretty much exactly how folks talking about folding schemes plan to parallelize Nova (more on that later): construct a binary tree of computations, where each node executes one step. At the root of the tree, we get a proof/claim that all computations have been executed correctly, in the correct order.
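To make the tree idea concrete, here’s a toy Rust sketch; string concatenation stands in for a real folding step, so nothing cryptographic is happening here:

```rust
// Pairwise-combine "claims" level by level until a single root remains.
// In a real system, `combine` would fold two proofs into one.
fn tree_reduce(mut level: Vec<String>, combine: impl Fn(&str, &str) -> String) -> String {
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [a, b] => combine(a, b),
                [a] => a.clone(), // an odd leftover node carries up unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    level.remove(0)
}

fn main() {
    // Four computation steps; the root "claims" all were executed in order.
    let leaves = vec![
        "s0".to_string(),
        "s1".to_string(),
        "s2".to_string(),
        "s3".to_string(),
    ];
    assert_eq!(
        tree_reduce(leaves, |a, b| format!("({a}+{b})")),
        "((s0+s1)+(s2+s3))"
    );
}
```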
Relying on the Postgres streaming protocol to capture database changes, and Cap’n Proto for efficient data transfer
by Avichal Pandey, Bruno Calza, & Carson Farmer
Postgres offers a streaming protocol for logical replication. Using this protocol, we can subscribe to WAL (write-ahead log) updates. Each WAL message captures a change on the primary database. Wal2Json is a Postgres plugin that deserializes the binary WAL messages into JSON objects. In Basin, we use a locally running daemon to listen to the JSON-encoded WAL messages as they arrive. JSON objects are not amenable to hashing, though, because their key-value pairs don't have a canonical order.
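Here’s a tiny illustration of that problem (using the sha2 crate; nothing Basin-specific): two JSON encodings of the same object hash to different digests.

```rust
use sha2::{Digest, Sha256};

fn main() {
    // The same object with keys in a different order: the serialized bytes
    // differ, so any hash over them differs too.
    let a = r#"{"id":1,"name":"basin"}"#;
    let b = r#"{"name":"basin","id":1}"#;
    assert_ne!(Sha256::digest(a), Sha256::digest(b));
}
```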
Cap’n Proto is a highly performant data interchange format and RPC system. It’s also “zero-copy”, which means there's no overhead to get the bytes for signing and sending across the network, and that's one of the reasons we chose it. There is a small catch, though: Cap’n Proto doesn’t have canonical ordering out of the box either. The good news is that the spec offers an algorithm for libraries that need to implement canonicalization.
So, essentially, we apply the canonicalization algorithm when serializing the data for signing. The Basin client daemon then signs each canonicalized and hashed WAL message on behalf of the user. The user needs to provide their private key, but the key never leaves the daemon running on their machine. We use ECDSA over the secp256k1 curve for signatures, the same as Ethereum. Hence, any key pair generated by an Ethereum wallet can be used for signing WAL messages.
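For a rough idea of what that looks like, here’s a minimal sketch using the k256 crate. The function and payload names are hypothetical, and the details (e.g., Ethereum-style Keccak-256 hashing) may differ from the actual client:

```rust
use k256::ecdsa::{signature::Signer, Signature, SigningKey};
use rand_core::OsRng;

// Sign the canonicalized WAL message bytes with a secp256k1 key.
// (This crate hashes the message internally before signing.)
fn sign_wal_message(key: &SigningKey, canonical_bytes: &[u8]) -> Signature {
    key.sign(canonical_bytes)
}

fn main() {
    // In Basin, the key is supplied by the user and never leaves the daemon.
    let key = SigningKey::random(&mut OsRng);
    let sig = sign_wal_message(&key, b"canonical capnp bytes");
    println!("{sig:?}");
}
```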
The Basin client (Basin is the codename for a set of novel features inside Tableland) then opens a connection with a provider and, using Cap’n Proto's RPC system, sends the messages along. The provider has to apply the same canonicalization algorithm before verifying the signatures, which it does to ensure the authenticity of WAL updates. If an update is authentic, the provider converts it into a SQL query, such as an INSERT statement, and executes it on its (local) database. At regular intervals, the provider backs up the database and sends the backup to Filecoin. It also indexes the Filecoin deal metadata into an FEVM contract. Anyone can query the contract to keep track of the original database backups and the corresponding Filecoin deals.
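And the provider side, sketched under the same assumptions: re-canonicalize the received bytes and check the signature before applying the update.

```rust
use k256::ecdsa::{
    signature::{Signer, Verifier},
    Signature, SigningKey, VerifyingKey,
};
use rand_core::OsRng;

// Verify a signature over the re-canonicalized message bytes.
fn verify_wal_message(key: &VerifyingKey, canonical_bytes: &[u8], sig: &Signature) -> bool {
    key.verify(canonical_bytes, sig).is_ok()
}

fn main() {
    // Round trip: sign as the client would, verify as the provider would.
    let signing_key = SigningKey::random(&mut OsRng);
    let msg = b"canonical capnp bytes"; // hypothetical payload
    let sig: Signature = signing_key.sign(msg);
    assert!(verify_wal_message(signing_key.verifying_key(), msg, &sig));
}
```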
Homomorphic hashing for secure incremental updates is cool, but not quite ready yet
How can we allow a client to make some incremental changes to a file locally, and then publish those updates via an intermediate relayer, such that the updates are verifiable by (eventual) downstream consumers?
Homomorphic hashing, specifically LtHash, allows us to update a hash value based on incremental changes to the data, without rehashing everything. It is just about the silver bullet we’re looking for. But for approximately 200 bits of security, LtHash produces a 2KB hash. That’s an awfully big hash compared to, say, a Blake3 hash (128-bit security and 256-bit default output size). For now, I think we’ll stick with something based on Merkle trees…
Read about it
- Homomorphic Hashing: deep-dive research
- Read Facebook's blog
- LtHash definitions and specifications
Play with it
Here's a small Rust example using the lthash-rs crate to handle incremental inserts (and removals!):
```rust
use lthash_rs::LtHash;
use sha3::Shake128;

type LtHash16 = lthash_rs::LtHash16<Shake128>;

#[test]
fn insert_remove_object() {
    let mut lthash = LtHash16::new();
    let elements = ["apple", "banana", "kiwi"];
    lthash.insert(elements[0]);
    lthash.insert(elements[1]);
    lthash.insert(elements[2]);
    lthash.remove(elements[1]); // Oops, remove that one!

    // Building the same hash without "banana" from scratch...
    let mut lthash_bis = LtHash16::new();
    lthash_bis.insert(elements[0]);
    lthash_bis.insert(elements[2]);

    // ...yields exactly the same digest: no full rehash needed.
    assert_eq!(lthash.into_bytes(), lthash_bis.into_bytes());
}
```
Wait for it
The homomorphic hashing approach is extremely promising, and Facebook is even using LtHash in production. Keep an eye on the homomorphic hashing space; I’m guessing we’re going to see more advances here in the future.
What the heck are folding schemes?
Folks are seriously excited about Nova and folding schemes, and seemingly for good reason! According to Justin Drake, you can think of Nova as a pre-processing step for SNARKs. So it’s not a SNARK in and of itself. But it is a very powerful technique that, when used with a general-purpose SNARK system, can give us the best of both worlds: very fast prover time and very fast, cheap verifier time.
You should probably read this article to get fully up to speed. It jumps around a lot and is kind of manic in some parts… but to be honest, this reflects the current state of the art. Things are moving extremely fast. Also, the folks from Lurk Lab (Lurk’s default backend is Nova) have been keeping an awesome repo mostly up to date, and this proposal for a Nova-based VM gives you a sense of the potential of folding schemes.
While we’re at it, these two papers provide some high-level motivations for incrementally verifiable computation and verifiable state machines, and I’d be remiss if I didn’t provide links to some relevant Nova papers and extensions (HyperNova, SuperNova, ProtoStar, and others).
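If you want a feel for the core trick without any of the actual cryptography, here’s a toy Rust sketch: two instances are combined into one via a random linear combination, so the prover only ever carries a single “running” instance. Real schemes like Nova also track commitments and cross terms, all elided here:

```rust
const MODULUS: u64 = 2_147_483_647; // a small prime, just for the demo

// Fold `other` into `acc` using verifier randomness `r`: each entry of the
// new witness is acc[i] + r * other[i] (mod p).
fn fold(acc: &[u64], other: &[u64], r: u64) -> Vec<u64> {
    acc.iter()
        .zip(other)
        .map(|(a, b)| (a + r * b % MODULUS) % MODULUS)
        .collect()
}

fn main() {
    let running = vec![3, 1, 4];
    let incoming = vec![1, 5, 9];
    // In a real folding scheme, `r` comes from a Fiat-Shamir transcript.
    println!("{:?}", fold(&running, &incoming, 42));
}
```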
Evaluating IPC Subnet Actors for performant application data delivery - early but worth another look
by Andrew Hill
The findings below are mostly from early summer 2023, but they're worth noting before we embark on our second round of IPC discovery work.
Objectives
- Gauge IPC's capacity to handle performant application data from decentralized nodes.
- Develop a proof-of-concept (POC) to assess the efficacy of deploying custom software within IPC subnets.
Notable contributions
- We employed Fendermint-powered FVM for R&D.
- We built a custom FVM actor that incorporates SQLite through a blockstore-backed VFS. The actor exposes APIs for SQL execution and transactional/non-transactional reads.
- That work introduced RPC endpoints (`execute`, `query`) for actor interactions.
- Sander Pick got a working POC that serves geospatial map tiles via a SQLite database hosted within our custom actor! You can check it out here.
- The above work required a modified version of Fendermint, and we had to fork the built-in actors repository due to the lack of a formal SDK for custom actors. Here they are: Fendermint, Built-in Actors.
- We also got a really cool demo of database querying at any blockchain "height," leveraging a blockstore and chain-enabled databases. Here's the demo as a Loom video.
Limitations & unresolved queries
At the time of our initial research, there was still some ambiguity in IPC's roadmap, particularly regarding key features (for us) like custom WASM actors.
Also, it was unclear if IPC would ever support bootstrapping actor state from a Filecoin deal, or snapshotting actor state as mainnet Filecoin deals, both of which were, at the time, pretty much critical for the conceptual network design we had come up with.
Next steps
Well, it's time to take another look! We think a modified approach that embraces some of the limitations outlined above might be a better fit. We are starting a second round of IPC explorations now to see what has changed and how the system might fit into our latest architectural designs.
Other updates this week
- New repo: basin-storage replicates data to Filecoin. It works in conjunction with basin-cli and basin-provider.
- New repo: basin-provider ingests data from basin-cli. Data can be a continuous stream of database updates (currently only Postgres is supported) or manually uploaded files.
End transmission…
Want to dive deeper, ask questions, or just nerd out with us? Jump into our weekly research office hours.