Under the hood#

FRAGMENT is the database we wish we had at Stripe and Robinhood. It's the abstraction we want at the performance we need.

a. Performance#

On February 13, 2024, we ran a load test simulating traffic using Grafana k6. A total of 19,622,609 requests were made over a 15-minute period: 6,769,966 reads and 12,852,643 writes. We observed:

  • 14,578 ledger entries per second average write throughput
  • 7,489 balance reads per second average read throughput
  • 33ms p95 read latency
  • 69ms p95 write latency

By comparison, at Robinhood it took 18 months of dedicated engineering effort to get even remotely close to these numbers.

b. Architecture#

To achieve this performance, FRAGMENT is built on top of two distributed databases:

  • Transactional storage on AWS DynamoDB
  • Indexed storage on Elasticsearch

A two-tier system, as opposed to one built on a single Postgres instance, is harder to build and maintain. But by having separate write-optimized and read-optimized systems, we can tune each system independently and use the best tool for each job.

c. Write path#

When customers call any of our GraphQL mutations, data is synchronously written to DynamoDB, then asynchronously indexed into Elasticsearch.
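As a minimal sketch of this flow (the in-memory stores, the queue, and names like `write_ledger_entry` are illustrative stand-ins, not FRAGMENT's actual code), the mutation blocks only on the transactional write, while indexing happens on a background consumer:

```python
import queue
import threading

# Illustrative in-memory stand-ins for the two stores.
dynamo_table = {}            # transactional store: the source of truth
search_index = {}            # read-optimized index, populated asynchronously
index_queue = queue.Queue()  # decouples the write path from indexing

def write_ledger_entry(entry_id, entry):
    """Synchronous write: the mutation returns once the transactional
    store has the data -- it never waits on the search index."""
    dynamo_table[entry_id] = entry
    index_queue.put(entry_id)  # indexing is deferred to the worker

def index_worker():
    """Background consumer that mirrors committed entries into the index."""
    while True:
        entry_id = index_queue.get()
        if entry_id is None:  # sentinel to stop the worker
            break
        search_index[entry_id] = dynamo_table[entry_id]
        index_queue.task_done()

threading.Thread(target=index_worker, daemon=True).start()

write_ledger_entry("entry-1", {"amount": 100, "account": "cash"})
index_queue.join()  # for the demo only: wait for indexing to catch up
```

The tradeoff this models is the usual one for a two-tier design: the index lags the source of truth slightly, in exchange for writes that are never slowed down by indexing.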

d. Scaling writes#

We optimize DynamoDB by:

  • Storing data in many small partitions. Each ledger entry is its own Dynamo partition. This scales horizontally since each partition can be served from a different DynamoDB server.
  • Using a single-table design with a dataloader. The dataloader batches all reads issued within a single application tick into one DynamoDB request, minimizing round trips.
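The batching behavior can be sketched with a minimal dataloader (the class below and `batch_get` are illustrative; `batch_fn` stands in for a DynamoDB `BatchGetItem` call): every `load()` issued during one event-loop tick is coalesced into a single batch request, and duplicate keys are deduplicated.

```python
import asyncio

class DataLoader:
    """Minimal dataloader sketch: all load() calls made during one
    event-loop tick are coalesced into a single call to batch_fn."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = {}     # key -> Future shared by all callers of that key
        self.scheduled = False

    def load(self, key):
        if key not in self.pending:
            self.pending[key] = asyncio.get_running_loop().create_future()
            if not self.scheduled:
                self.scheduled = True
                # Flush once the current tick finishes, so every read
                # issued in this tick joins the same batch.
                asyncio.get_running_loop().call_soon(self._dispatch)
        return self.pending[key]

    def _dispatch(self):
        batch, self.pending, self.scheduled = self.pending, {}, False
        results = self.batch_fn(list(batch))
        for key, fut in batch.items():
            fut.set_result(results[key])

calls = []  # records one entry per round trip to the "database"

def batch_get(keys):
    calls.append(sorted(keys))
    return {k: f"item:{k}" for k in keys}

async def main():
    loader = DataLoader(batch_get)
    # Three reads in the same tick, with a duplicate key:
    return await asyncio.gather(loader.load("a"), loader.load("b"), loader.load("a"))

results = asyncio.run(main())
```

Here three application-level reads produce exactly one batched request containing two unique keys, which is the property that keeps request counts low even with many small partitions.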

e. Read path#

Depending on the query, GraphQL API requests are served from either DynamoDB or Elasticsearch.

  • DynamoDB serves ledger account balances, single item lookups, and low-volume list queries
  • Elasticsearch serves list queries that may use filtering and pagination
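A sketch of that routing decision (the query shape — `kind`, `filters`, `after` — is hypothetical, not FRAGMENT's actual API):

```python
def route_read(query: dict) -> str:
    """Decide which store serves a read, following the split above."""
    if query["kind"] in ("balance", "get"):
        return "dynamodb"          # balances and single-item lookups
    if query["kind"] == "list":
        if query.get("filters") or query.get("after"):
            return "elasticsearch" # filtered or paginated list queries
        return "dynamodb"          # low-volume, unfiltered lists
    raise ValueError(f"unknown query kind: {query['kind']}")
```

The effect is that the hot, simple reads stay on the write-optimized store, and only the queries that actually need an inverted index pay for one.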

f. Scaling reads#

Our Elasticsearch strategy is based on the idea that each query should hit only a single server. When a list query comes in, it gets routed to a single server, which uses sorting to cut down the search space, applies additional filters on indexed attributes, then returns the results. The results are fully hydrated, so the FRAGMENT API can return data directly from Elasticsearch without hitting DynamoDB again.
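The single-server property comes from Elasticsearch's custom routing: documents indexed with the same `routing` value land on the same shard, chosen as `hash(routing) % num_primary_shards`. A minimal sketch of that placement, using the ledger ID as the routing key (md5 here is a stand-in for Elasticsearch's actual murmur3 hash; only the stable-placement property matters):

```python
import hashlib

NUM_SHARDS = 8  # illustrative primary shard count

def shard_for(routing_key: str) -> int:
    # Stable placement: the same routing key always maps to the same shard.
    digest = hashlib.md5(routing_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# Index every entry for a ledger with the ledger ID as the routing key...
entry_shards = {shard_for("ledger-a") for _entry in range(1000)}

# ...then a list query sent with routing="ledger-a" is answered by that
# single shard instead of fanning out to all NUM_SHARDS servers.
query_shard = shard_for("ledger-a")
```

Because every entry for a ledger lives on one shard, a filtered list query over that ledger touches exactly one server rather than scattering across the cluster and gathering partial results.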

This strategy is opposite to Elasticsearch's default where docs are thrown onto random servers and queries map out to every server in the cluster. Our strategy works well for a highly structured search with a high hit rate: filtering data in a Ledger. The default strategy is better for a fuzzy search with a low hit rate, like searching for a string across millions of documents.