Under the hood#

FRAGMENT is the database we wish we had at Stripe and Robinhood. It's the abstraction we want at the performance we need.

a. Performance#

On February 13, 2024, we ran a load test simulating traffic using Grafana k6. A total of 19,622,609 requests were made over a 15-minute period: 6,769,966 reads and 12,852,643 writes. We observed:

  • 14,578 ledger entries per second average write throughput
  • 7,489 balance reads per second average read throughput
  • 33ms p95 read latency
  • 69ms p95 write latency

By comparison, at Robinhood it took 18 months of dedicated engineering effort to get even remotely close to these numbers.

b. Architecture#

To achieve this performance, FRAGMENT is built on top of two distributed databases:

  • Transactional storage on AWS DynamoDB
  • Indexed storage on Elasticsearch

A two-tier system, as opposed to one built on a single Postgres instance, is harder to build and maintain. But by having separate write-optimized and read-optimized systems, we can tune each system independently and use the best tool for each job.

c. Write path#

When customers call any of our GraphQL mutations, data is synchronously written to DynamoDB, then asynchronously indexed into Elasticsearch.
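As a minimal sketch of this flow (the in-memory stores, the queue, and names like `write_ledger_entry` are illustrative stand-ins, not FRAGMENT's actual code), the mutation blocks only on the transactional write, while indexing happens on a background consumer:

```python
import queue
import threading

# Illustrative in-memory stand-ins for the two stores.
dynamo_table = {}            # transactional store: the source of truth
search_index = {}            # read-optimized index, populated asynchronously
index_queue = queue.Queue()  # decouples the write path from indexing

def write_ledger_entry(entry_id, entry):
    """Synchronous write: the mutation returns once the transactional
    store has the data -- it never waits on the search index."""
    dynamo_table[entry_id] = entry
    index_queue.put(entry_id)  # indexing is deferred to the worker

def index_worker():
    """Background consumer that mirrors committed entries into the index."""
    while True:
        entry_id = index_queue.get()
        if entry_id is None:  # sentinel to stop the worker
            break
        search_index[entry_id] = dynamo_table[entry_id]
        index_queue.task_done()

threading.Thread(target=index_worker, daemon=True).start()

write_ledger_entry("entry-1", {"amount": 100, "account": "cash"})
index_queue.join()  # for the demo only: wait for indexing to catch up
```

The tradeoff this models is the usual one for a two-tier design: the index lags the source of truth slightly, in exchange for writes that are never slowed down by indexing.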

d. Scaling writes#

We optimize DynamoDB by:

  • Storing data in many small partitions. Each ledger entry is its own Dynamo partition. This scales horizontally since each partition can be served from a different DynamoDB server.
  • Using a single-table design with a dataloader. The dataloader batches all reads issued within a single application tick into one DynamoDB request, minimizing round trips.
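The batching behavior can be sketched with a minimal dataloader (the class below and `batch_get` are illustrative; `batch_fn` stands in for a DynamoDB `BatchGetItem` call): every `load()` issued during one event-loop tick is coalesced into a single batch request, and duplicate keys are deduplicated.

```python
import asyncio

class DataLoader:
    """Minimal dataloader sketch: all load() calls made during one
    event-loop tick are coalesced into a single call to batch_fn."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = {}     # key -> Future shared by all callers of that key
        self.scheduled = False

    def load(self, key):
        if key not in self.pending:
            self.pending[key] = asyncio.get_running_loop().create_future()
            if not self.scheduled:
                self.scheduled = True
                # Flush once the current tick finishes, so every read
                # issued in this tick joins the same batch.
                asyncio.get_running_loop().call_soon(self._dispatch)
        return self.pending[key]

    def _dispatch(self):
        batch, self.pending, self.scheduled = self.pending, {}, False
        results = self.batch_fn(list(batch))
        for key, fut in batch.items():
            fut.set_result(results[key])

calls = []  # records one entry per round trip to the "database"

def batch_get(keys):
    calls.append(sorted(keys))
    return {k: f"item:{k}" for k in keys}

async def main():
    loader = DataLoader(batch_get)
    # Three reads in the same tick, with a duplicate key:
    return await asyncio.gather(loader.load("a"), loader.load("b"), loader.load("a"))

results = asyncio.run(main())
```

Here three application-level reads produce exactly one batched request containing two unique keys, which is the property that keeps request counts low even with many small partitions.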

e. Read path#

Depending on the query, GraphQL API requests are served from either DynamoDB or Elasticsearch.

  • DynamoDB serves ledger account balances, single item lookups, and low-volume list queries
  • Elasticsearch serves list queries that may use filtering and pagination
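A sketch of that routing decision (the query shape — `kind`, `filters`, `after` — is hypothetical, not FRAGMENT's actual API):

```python
def route_read(query: dict) -> str:
    """Decide which store serves a read, following the split above."""
    if query["kind"] in ("balance", "get"):
        return "dynamodb"          # balances and single-item lookups
    if query["kind"] == "list":
        if query.get("filters") or query.get("after"):
            return "elasticsearch" # filtered or paginated list queries
        return "dynamodb"          # low-volume, unfiltered lists
    raise ValueError(f"unknown query kind: {query['kind']}")
```

The effect is that the hot, simple reads stay on the write-optimized store, and only the queries that actually need an inverted index pay for one.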

f. Scaling reads#

Our Elasticsearch strategy is based on the idea that each query should hit only a single server. When a list query comes in, it gets routed to a single server, which uses sorting to cut down the search space, applies additional filters on indexed attributes, then returns the results. The results are fully hydrated, so the FRAGMENT API can return data directly from Elasticsearch without hitting DynamoDB again.
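The single-server property comes from Elasticsearch's custom routing: documents indexed with the same `routing` value land on the same shard, chosen as `hash(routing) % num_primary_shards`. A minimal sketch of that placement, using the ledger ID as the routing key (md5 here is a stand-in for Elasticsearch's actual murmur3 hash; only the stable-placement property matters):

```python
import hashlib

NUM_SHARDS = 8  # illustrative primary shard count

def shard_for(routing_key: str) -> int:
    # Stable placement: the same routing key always maps to the same shard.
    digest = hashlib.md5(routing_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# Index every entry for a ledger with the ledger ID as the routing key...
entry_shards = {shard_for("ledger-a") for _entry in range(1000)}

# ...then a list query sent with routing="ledger-a" is answered by that
# single shard instead of fanning out to all NUM_SHARDS servers.
query_shard = shard_for("ledger-a")
```

Because every entry for a ledger lives on one shard, a filtered list query over that ledger touches exactly one server rather than scattering across the cluster and gathering partial results.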

This strategy is opposite to Elasticsearch's default where docs are thrown onto random servers and queries map out to every server in the cluster. Our strategy works well for a highly structured search with a high hit rate: filtering data in a Ledger. The default strategy is better for a fuzzy search with a low hit rate, like searching for a string across millions of documents.